{
    "Selected_candidate": {
        "pr_number": 9655,
        "pr_title": "[MRG+1] Return nan in RadiusNeighborsRegressor for empty neighbor set",
        "pr_body": "#### Reference Issue\r\nFixes #9654 \r\n\r\n\r\n\r\n#### What does this implement/fix? Explain your changes.\r\n`RadiusNeighborsRegressor` behaves differently for a sample with no neighbors depending on whether weights are used. This PR fixes that inconsistency. It also fixes the error raised when no data points are available for `RadiusNeighborsRegressor` with non-uniform weights.",
        "issue_id": 9654,
        "issue_title": "RadiusNeighborRegression error",
        "issue_body": "#### Description\r\n`RadiusNeighborsRegressor` produces inconsistent output depending on whether the weights are uniform or distance-based. \r\n\r\n- With uniform weights, if no observations lie within the specified radius of a query point, it returns `np.nan`. \r\n\r\n- With distance weights, `np.average` raises `ZeroDivisionError: Weights sum to zero, can't be normalized` because there are no weights to use for that query point. The example below demonstrates this; it is adapted from the `RadiusNeighborsRegressor` example, with distance weights specified and a different X used for prediction.\r\n\r\n#### Steps/Code to Reproduce\r\n```python\r\nfrom sklearn.neighbors import RadiusNeighborsRegressor   \r\n\r\nX = [[0], [1], [2], [3]]\r\ny = [[0], [0], [1], [1]]\r\n\r\nneigh = RadiusNeighborsRegressor(radius=1.0, weights='distance')\r\nneigh.fit(X, y) \r\n\r\ny_hat = neigh.predict([[-2],[0]])\r\n```\r\n\r\n#### Expected Results\r\n```python\r\narray([[ nan],\r\n       [  0.]])\r\n```\r\n\r\n#### Actual Results\r\n```\r\n---------------------------------------------------------------------------\r\nZeroDivisionError                         Traceback (most recent call last)\r\n<ipython-input-6-c1cb420157ca> in <module>()\r\n      7 neigh.fit(X, y)\r\n      8 \r\n----> 9 y_hat = neigh.predict([[-1.5],[1]])\r\n\r\n~\\AppData\\Local\\Continuum\\Miniconda3\\lib\\site-packages\\sklearn\\neighbors\\regression.py in predict(self, X)\r\n    294             y_pred = np.array([(np.average(_y[ind, :], axis=0,\r\n    295                                            weights=weights[i]))\r\n--> 296                                for (i, ind) in enumerate(neigh_ind)])\r\n    297 \r\n    298         if self._y.ndim == 1:\r\n\r\n~\\AppData\\Local\\Continuum\\Miniconda3\\lib\\site-packages\\sklearn\\neighbors\\regression.py in <listcomp>(.0)\r\n    294             y_pred = np.array([(np.average(_y[ind, :], axis=0,\r\n    
295                                            weights=weights[i]))\r\n--> 296                                for (i, ind) in enumerate(neigh_ind)])\r\n    297 \r\n    298         if self._y.ndim == 1:\r\n\r\n~\\AppData\\Local\\Continuum\\Miniconda3\\lib\\site-packages\\numpy\\lib\\function_base.py in average(a, axis, weights, returned)\r\n   1138         if np.any(scl == 0.0):\r\n   1139             raise ZeroDivisionError(\r\n-> 1140                 \"Weights sum to zero, can't be normalized\")\r\n   1141 \r\n   1142         avg = np.multiply(a, wgt, dtype=result_dtype).sum(axis)/scl\r\n\r\nZeroDivisionError: Weights sum to zero, can't be normalized\r\n```\r\n\r\n\r\n#### Versions\r\nWindows-10-10.0.15063-SP0\r\nPython 3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)]\r\nNumPy 1.13.1\r\nSciPy 0.19.1\r\nScikit-Learn 0.18.2\r\n\r\nI encountered the same problem on Mac OS and Linux machines.",
        "issue_closed_at": "2017-11-21T09:07:55Z",
        "base_commit": "bbdcd70fcaa875e78a009d89f755bba602a861be",
        "changes": [
            {
                "file": "sklearn/neighbors/regression.py",
                "type": "line",
                "name": "line 5",
                "code": "#          Alexandre Gramfort <alexandre.gramfort@inria.fr>\n#          Sparseness support by Lars Buitinck\n#          Multi-output support by Arnaud Joly <a.joly@ulg.ac.be>\n#\n# License: BSD 3 clause (C) INRIA, University of Amsterdam\n\nimport numpy as np\nfrom scipy.sparse import issparse"
            },
            {
                "file": "sklearn/neighbors/regression.py",
                "type": "function",
                "name": "predict",
                "class_name": "RadiusNeighborsRegressor",
                "code": "def predict(self, X):\n        \"\"\"Predict the target for the provided data\n\n        Parameters\n        ----------\n        X : array-like, shape (n_query, n_features), \\\n                or (n_query, n_indexed) if metric == 'precomputed'\n            Test samples.\n\n        Returns\n        -------\n        y : array of int, shape = [n_samples] or [n_samples, n_outputs]\n            Target values\n        \"\"\"\n        X = check_array(X, accept_sparse='csr')\n\n        neigh_dist, neigh_ind = self.radius_neighbors(X)\n\n        weights = _get_weights(neigh_dist, self.weights)\n\n        _y = self._y\n        if _y.ndim == 1:\n            _y = _y.reshape((-1, 1))\n\n        if weights is None:\n            y_pred = np.array([np.mean(_y[ind, :], axis=0)\n                               for ind in neigh_ind])\n        else:\n            y_pred = np.array([(np.average(_y[ind, :], axis=0,\n                                           weights=weights[i]))\n                               for (i, ind) in enumerate(neigh_ind)])\n\n        if self._y.ndim == 1:\n            y_pred = y_pred.ravel()\n\n        return y_pred"
            },
            {
                "file": "sklearn/neighbors/regression.py",
                "type": "function",
                "name": "predict",
                "class_name": "RadiusNeighborsRegressor",
                "code": "def predict(self, X):\n        \"\"\"Predict the target for the provided data\n\n        Parameters\n        ----------\n        X : array-like, shape (n_query, n_features), \\\n                or (n_query, n_indexed) if metric == 'precomputed'\n            Test samples.\n\n        Returns\n        -------\n        y : array of int, shape = [n_samples] or [n_samples, n_outputs]\n            Target values\n        \"\"\"\n        X = check_array(X, accept_sparse='csr')\n\n        neigh_dist, neigh_ind = self.radius_neighbors(X)\n\n        weights = _get_weights(neigh_dist, self.weights)\n\n        _y = self._y\n        if _y.ndim == 1:\n            _y = _y.reshape((-1, 1))\n\n        if weights is None:\n            y_pred = np.array([np.mean(_y[ind, :], axis=0)\n                               for ind in neigh_ind])\n        else:\n            y_pred = np.array([(np.average(_y[ind, :], axis=0,\n                                           weights=weights[i]))\n                               for (i, ind) in enumerate(neigh_ind)])\n\n        if self._y.ndim == 1:\n            y_pred = y_pred.ravel()\n\n        return y_pred"
            }
        ]
    },
    "Justification": "Candidate E addresses a similar problem with empty-input handling, specifically how `RadiusNeighborsRegressor` behaves when no neighbors are found. Both the CURRENT bug report and Candidate E's report concern the expected output when the input is empty or insufficient. Both patches fix incorrect behavior (an exception or an unexpected value), so the reasoning and experience from fixing Candidate E are relevant for debugging the CURRENT bug in `LabelEncoder`. Both bugs also involve subtleties of data handling and type casting, which matches the CURRENT concern about transforming empty lists of different types."
}