{
  "Selected_candidate": {
    "pr_number": 9655,
    "pr_title": "[MRG+1] Return nan in RadiusNeighborsRegressor for empty neighbor set",
    "pr_body": "#### Reference Issue\r\nFixes #9654 \r\n\r\n\r\n\r\n#### What does this implement/fix? Explain your changes.\r\nRadiusNeighborsRegressor is behaving differently when there are no neighbors for a sample between when weights are or aren't used. This PR fixes this inconsistency. This PR also fixes raised error when no available data points for `RadiusNeighborRegression` using non-uniform weights.",
    "issue_id": 9654,
    "issue_title": "RadiusNeighborRegression error",
    "issue_body": "#### Description\r\nRadiusNeighborRegression has inconsistent output depending whether using weights that are uniform or by distance. \r\n\r\n- When using uniform weights then if no observations are available within the specified radius of an observation point then no it returns `np.nan`, \r\n\r\n- When using distance weights it raises `ZeroDivisionError: Weights sum to zero, can't be normalized` is raised from `np.average` as there are no weights to use for the observation. This is demonstrated with the following example below - copied from `RadiusNeighborsRegressor` where distance is specified and for X used for prediction is different,\r\n\r\n#### Steps/Code to Reproduce\r\n```python\r\nfrom sklearn.neighbors import RadiusNeighborsRegressor   \r\n\r\nX = [[0], [1], [2], [3]]\r\ny = [[0], [0], [1], [1]]\r\n\r\nneigh = RadiusNeighborsRegressor(radius=1.0, weights='distance')\r\nneigh.fit(X, y) \r\n\r\ny_hat = neigh.predict([[-2],[0]])\r\n```\r\n\r\n#### Expected Results\r\n```python\r\narray([[ nan],\r\n       [  0.]])\r\n```\r\n\r\n#### Actual Results\r\n```\r\n---------------------------------------------------------------------------\r\nZeroDivisionError                         Traceback (most recent call last)\r\n<ipython-input-6-c1cb420157ca> in <module>()\r\n      7 neigh.fit(X, y)\r\n      8 \r\n----> 9 y_hat = neigh.predict([[-1.5],[1]])\r\n\r\n~\\AppData\\Local\\Continuum\\Miniconda3\\lib\\site-packages\\sklearn\\neighbors\\regression.py in predict(self, X)\r\n    294             y_pred = np.array([(np.average(_y[ind, :], axis=0,\r\n    295                                            weights=weights[i]))\r\n--> 296                                for (i, ind) in enumerate(neigh_ind)])\r\n    297 \r\n    298         if self._y.ndim == 1:\r\n\r\n~\\AppData\\Local\\Continuum\\Miniconda3\\lib\\site-packages\\sklearn\\neighbors\\regression.py in <listcomp>(.0)\r\n    294             y_pred = np.array([(np.average(_y[ind, :], axis=0,\r\n    295                                            weights=weights[i]))\r\n--> 296                                for (i, ind) in enumerate(neigh_ind)])\r\n    297 \r\n    298         if self._y.ndim == 1:\r\n\r\n~\\AppData\\Local\\Continuum\\Miniconda3\\lib\\site-packages\\numpy\\lib\\function_base.py in average(a, axis, weights, returned)\r\n   1138         if np.any(scl == 0.0):\r\n   1139             raise ZeroDivisionError(\r\n-> 1140                 \"Weights sum to zero, can't be normalized\")\r\n   1141 \r\n   1142         avg = np.multiply(a, wgt, dtype=result_dtype).sum(axis)/scl\r\n\r\nZeroDivisionError: Weights sum to zero, can't be normalized\r\n```\r\n\r\n\r\n#### Versions\r\nWindows-10-10.0.15063-SP0\r\nPython 3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)]\r\nNumPy 1.13.1\r\nSciPy 0.19.1\r\nScikit-Learn 0.18.2\r\n\r\nI encountered the same problem on Mac OS and Linux machines.",
    "issue_closed_at": "2017-11-21T09:07:55Z",
    "base_commit": "bbdcd70fcaa875e78a009d89f755bba602a861be",
    "changes": [
      {
        "file": "sklearn/neighbors/regression.py",
        "type": "line",
        "name": "line 5",
        "code": "#          Alexandre Gramfort <alexandre.gramfort@inria.fr>\n#          Sparseness support by Lars Buitinck\n#          Multi-output support by Arnaud Joly <a.joly@ulg.ac.be>\n#\n# License: BSD 3 clause (C) INRIA, University of Amsterdam\n\nimport numpy as np\nfrom scipy.sparse import issparse"
      },
      {
        "file": "sklearn/neighbors/regression.py",
        "type": "function",
        "name": "predict",
        "class_name": "RadiusNeighborsRegressor",
        "code": "def predict(self, X):\n        \"\"\"Predict the target for the provided data\n\n        Parameters\n        ----------\n        X : array-like, shape (n_query, n_features), \\\n                or (n_query, n_indexed) if metric == 'precomputed'\n            Test samples.\n\n        Returns\n        -------\n        y : array of int, shape = [n_samples] or [n_samples, n_outputs]\n            Target values\n        \"\"\"\n        X = check_array(X, accept_sparse='csr')\n\n        neigh_dist, neigh_ind = self.radius_neighbors(X)\n\n        weights = _get_weights(neigh_dist, self.weights)\n\n        _y = self._y\n        if _y.ndim == 1:\n            _y = _y.reshape((-1, 1))\n\n        if weights is None:\n            y_pred = np.array([np.mean(_y[ind, :], axis=0)\n                               for ind in neigh_ind])\n        else:\n            y_pred = np.array([(np.average(_y[ind, :], axis=0,\n                                           weights=weights[i]))\n                               for (i, ind) in enumerate(neigh_ind)])\n\n        if self._y.ndim == 1:\n            y_pred = y_pred.ravel()\n\n        return y_pred"
      }
    ]
  },
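  "Justification": "Candidate E addresses a closely related problem: how an estimator should behave when an operation receives an empty input. In Candidate E, `RadiusNeighborsRegressor.predict` with `weights='distance'` raises `ZeroDivisionError` from `np.average` when a query point has no neighbors within the radius, whereas uniform weights return `np.nan`. The fix makes both paths return `nan`. In the CURRENT bug, `LabelEncoder.transform([])` succeeds when the encoder was fitted on integers but raises a `TypeError` when it was fitted on strings, because the empty list becomes a `float64` array that `np.searchsorted` cannot safely cast to the `<U32` dtype of `classes_`. Both fixes handle the empty case explicitly instead of letting it reach a NumPy routine that fails on it; the `LabelEncoder` patch returns an empty array early when `_num_samples(y) == 0`. This makes the reasoning behind Candidate E's fix directly relevant to debugging the `LabelEncoder` bug. The parallel has limits: the dtype-casting failure is specific to the CURRENT bug, while Candidate E's failure comes from normalizing weights that sum to zero.",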
  "instance_id": "scikit-learn__scikit-learn-10508",
  "repo": "scikit-learn/scikit-learn",
  "created_at": "2018-01-19T18:00:29Z",
  "problem_statement": "LabelEncoder transform fails for empty lists (for certain inputs)\nPython 3.6.3, scikit_learn 0.19.1\r\n\r\nDepending on which datatypes were used to fit the LabelEncoder, transforming empty lists works or not. Expected behavior would be that empty arrays are returned in both cases.\r\n\r\n```python\r\n>>> from sklearn.preprocessing import LabelEncoder\r\n>>> le = LabelEncoder()\r\n>>> le.fit([1,2])\r\nLabelEncoder()\r\n>>> le.transform([])\r\narray([], dtype=int64)\r\n>>> le.fit([\"a\",\"b\"])\r\nLabelEncoder()\r\n>>> le.transform([])\r\nTraceback (most recent call last):\r\n  File \"[...]\\Python36\\lib\\site-packages\\numpy\\core\\fromnumeric.py\", line 57, in _wrapfunc\r\n    return getattr(obj, method)(*args, **kwds)\r\nTypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'\r\n\r\nDuring handling of the above exception, another exception occurred:\r\n\r\nTraceback (most recent call last):\r\n  File \"<stdin>\", line 1, in <module>\r\n  File \"[...]\\Python36\\lib\\site-packages\\sklearn\\preprocessing\\label.py\", line 134, in transform\r\n    return np.searchsorted(self.classes_, y)\r\n  File \"[...]\\Python36\\lib\\site-packages\\numpy\\core\\fromnumeric.py\", line 1075, in searchsorted\r\n    return _wrapfunc(a, 'searchsorted', v, side=side, sorter=sorter)\r\n  File \"[...]\\Python36\\lib\\site-packages\\numpy\\core\\fromnumeric.py\", line 67, in _wrapfunc\r\n    return _wrapit(obj, method, *args, **kwds)\r\n  File \"[...]\\Python36\\lib\\site-packages\\numpy\\core\\fromnumeric.py\", line 47, in _wrapit\r\n    result = getattr(asarray(obj), method)(*args, **kwds)\r\nTypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'\r\n```\n",
  "patch": "diff --git a/sklearn/preprocessing/label.py b/sklearn/preprocessing/label.py\n--- a/sklearn/preprocessing/label.py\n+++ b/sklearn/preprocessing/label.py\n@@ -126,6 +126,9 @@ def transform(self, y):\n         \"\"\"\n         check_is_fitted(self, 'classes_')\n         y = column_or_1d(y, warn=True)\n+        # transform of empty array is empty array\n+        if _num_samples(y) == 0:\n+            return np.array([])\n \n         classes = np.unique(y)\n         if len(np.intersect1d(classes, self.classes_)) < len(classes):\n@@ -147,6 +150,10 @@ def inverse_transform(self, y):\n         y : numpy array of shape [n_samples]\n         \"\"\"\n         check_is_fitted(self, 'classes_')\n+        y = column_or_1d(y, warn=True)\n+        # inverse transform of empty array is empty array\n+        if _num_samples(y) == 0:\n+            return np.array([])\n \n         diff = np.setdiff1d(y, np.arange(len(self.classes_)))\n         if len(diff):\n"
}