{
  "original_problem": {
    "instance_id": "scikit-learn__scikit-learn-11040",
    "repo": "scikit-learn/scikit-learn",
    "created_at": "2018-04-28T07:18:33Z",
    "problem_statement": "Missing parameter validation in Neighbors estimator for float n_neighbors\n```python\r\nfrom sklearn.neighbors import NearestNeighbors\r\nfrom sklearn.datasets import make_blobs\r\nX, y = make_blobs()\r\nneighbors = NearestNeighbors(n_neighbors=3.)\r\nneighbors.fit(X)\r\nneighbors.kneighbors(X)\r\n```\r\n```\r\n~/checkout/scikit-learn/sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.kd_tree.NeighborsHeap.__init__()\r\n\r\nTypeError: 'float' object cannot be interpreted as an integer\r\n```\r\nThis should be caught earlier and a more helpful error message should be raised (or we could be lenient and cast to integer, but I think a better error might be better).\r\n\r\nWe need to make sure that \r\n```python\r\nneighbors.kneighbors(X, n_neighbors=3.)\r\n```\r\nalso works.\n",
    "patch": "diff --git a/sklearn/neighbors/base.py b/sklearn/neighbors/base.py\n--- a/sklearn/neighbors/base.py\n+++ b/sklearn/neighbors/base.py\n@@ -258,6 +258,12 @@ def _fit(self, X):\n                     \"Expected n_neighbors > 0. Got %d\" %\n                     self.n_neighbors\n                 )\n+            else:\n+                if not np.issubdtype(type(self.n_neighbors), np.integer):\n+                    raise TypeError(\n+                        \"n_neighbors does not take %s value, \"\n+                        \"enter integer value\" %\n+                        type(self.n_neighbors))\n \n         return self\n \n@@ -327,6 +333,17 @@ class from an array representing our data set and ask who's\n \n         if n_neighbors is None:\n             n_neighbors = self.n_neighbors\n+        elif n_neighbors <= 0:\n+            raise ValueError(\n+                \"Expected n_neighbors > 0. Got %d\" %\n+                n_neighbors\n+            )\n+        else:\n+            if not np.issubdtype(type(n_neighbors), np.integer):\n+                raise TypeError(\n+                    \"n_neighbors does not take %s value, \"\n+                    \"enter integer value\" %\n+                    type(n_neighbors))\n \n         if X is not None:\n             query_is_train = False\n"
  },
  "candidates_evaluated": 5,
  "judgment_result": {
    "candidates": [
      {
        "idx": 1,
        "id": "similar_6032",
        "decision": "Not useful",
        "confidence": "Medium",
        "reason": "The issue involves a mismatch in expected output size, unrelated to parameter validation or type checking."
      },
      {
        "idx": 2,
        "id": "similar_8933",
        "decision": "Not useful",
        "confidence": "Medium",
        "reason": "The issue is about label encoding affecting scores, not parameter validation or type checking."
      },
      {
        "idx": 3,
        "id": "similar_2771",
        "decision": "Not useful",
        "confidence": "Medium",
        "reason": "The issue involves incorrect threshold calculation, unrelated to parameter type validation."
      },
      {
        "idx": 4,
        "id": "similar_7346",
        "decision": "Not useful",
        "confidence": "Medium",
        "reason": "The issue is about list manipulation errors, not parameter validation or type checking."
      },
      {
        "idx": 5,
        "id": "similar_8344",
        "decision": "Not useful",
        "confidence": "Medium",
        "reason": "The issue involves incorrect method delegation, unrelated to parameter validation or type checking."
      }
    ]
  },
  "raw_summaries": [
    {
      "similar_issue": {
        "issue_title": "LDA.explained_variance_ratio_ is of the wrong size",
        "issue_body": "The docs say that <a href=\"http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html#sklearn.discriminant_analysis.LinearDiscriminantAnalysis\">LDA.explained_variance_ratio_</a> should have only `n_components_`. But it doesn't.\n\nIt looks like this bug only exists when we use the `eigen` solver, not the `svd` solver.\n\n```\n>>> import numpy as np\n>>> from sklearn.lda import LDA\n>>> from sklearn.utils.testing import assert_equal\n>>>\n>>> state = np.random.RandomState(0)\n>>> X = state.normal(loc=0, scale=100, size=(40, 20))\n>>> y = state.randint(0, 3, size=(40, 1))\n>>>\n>>> # Train the LDA classifier. Use the eigen solver\n>>> lda_eigen = LDA(solver='eigen', n_components=5)\n>>> lda_eigen.fit(X, y)\n>>> assert_equal(lda_eigen.explained_variance_ratio_.shape, (5,))\nAssertionError: Tuples differ: (20,) != (5,)\n\nFirst differing element 0:\n20\n5\n\n- (20,)\n+ (5,)\n```\n\nLooks like we fix either the docs or the code. Which one?\n\nPinging @JPFrancoia.\n\nAddresses an issue in #6031.\n",
        "issue_id": 6032,
        "pr_number": 7632,
        "pr_title": "[MRG+1] Correcting length of explained_variance_ratio_, eigen solver, final PR",
        "pr_body": "<!--\nThanks for contributing a pull request! Please ensure you have taken a look at\nthe contribution guidelines: https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md#Contributing-Pull-Requests\n-->\n#### Reference Issue\n\n<!-- Example: Fixes #1234 -->\n\nFix #6032 \n#### What does this implement/fix? Explain your changes.\n\nAttribute explained_variance_ratio_ from LinearDiscriminantAnalysis class will be of length n_components (eigen solver).\n#### Any other comments?\n\nThis PR follows PR 7616. I mixed up my git history, so it was easier to open a new PR.\n\n<!--\nPlease be aware that we are a loose team of volunteers so patience is\nnecessary; assistance handling other issues is very welcome. We value\nall user contributions, no matter how minor they are. If we are slow to\nreview, either the pull request needs some benchmarking, tinkering,\nconvincing, etc. or more likely the reviewers are simply busy. In either\ncase, we ask for your understanding during the review process.\nFor more information, see our FAQ on this topic:\nhttp://scikit-learn.org/dev/faq.html#why-is-my-pull-request-not-getting-any-attention.\n\nThanks for contributing!\n-->\n",
        "issue_closed_at": "2016-10-25T12:52:13Z",
        "base_commit": "ee3e61754bd4bb10cea8065993e462fc7b112cb3"
      },
      "summary": "### Summary:\n\nThis issue pertains to a discrepancy in the behavior of the `explained_variance_ratio_` attribute of the `LinearDiscriminantAnalysis` (LDA) class in the scikit-learn library. Specifically, when the `eigen` solver is utilized, the attribute's size does not match the expected number of components, `n_components_`, as outlined in the documentation. Instead of returning an array with the length equal to the number of specified components, it returns an array with a length equal to the number of input features. This behavior contrasts with the consistent output when using the `svd` solver, which does not exhibit this issue.\n\nKey symptoms include an assertion error during testing that highlights the unexpected shape of the `explained_variance_ratio_` attribute. The affected component is the `LinearDiscriminantAnalysis` class within the `discriminant_analysis.py` module of scikit-learn.\n\nThe potential impact of this issue is significant for users relying on the `eigen` solver for dimensionality reduction or feature extraction, as it can lead to incorrect assumptions about the variance explained by the components, potentially affecting downstream analysis and decision-making.\n\nTechnical details reveal that the inconsistency likely stems from the internal workings of the `eigen` solver, which necessitates a correction either in the implementation to align with the documented behavior or an update in the documentation to accurately reflect the current behavior. The changes were made in multiple functions across `LinearDiscriminantAnalysis` and `QuadraticDiscriminantAnalysis`, indicating a thorough approach to resolving the issue.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: LDA.explained_variance_ratio_ is of the wrong size\n\nBody:\nThe docs say that <a href=\"http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html#sklearn.discriminant_analysis.LinearDiscriminantAnalysis\">LDA.explained_variance_ratio_</a> should have only `n_components_`. But it doesn't.\n\nIt looks like this bug only exists when we use the `eigen` solver, not the `svd` solver.\n\n```\n>>> import numpy as np\n>>> from sklearn.lda import LDA\n>>> from sklearn.utils.testing import assert_equal\n>>>\n>>> state = np.random.RandomState(0)\n>>> X = state.normal(loc=0, scale=100, size=(40, 20))\n>>> y = state.randint(0, 3, size=(40, 1))\n>>>\n>>> # Train the LDA classifier. Use the eigen solver\n>>> lda_eigen = LDA(solver='eigen', n_components=5)\n>>> lda_eigen.fit(X, y)\n>>> assert_equal(lda_eigen.explained_variance_ratio_.shape, (5,))\nAssertionError: Tuples differ: (20,) != (5,)\n\nFirst differing element 0:\n20\n5\n\n- (20,)\n+ (5,)\n```\n\nLooks like we fix either the docs or the code. Which one?\n\nPinging @JPFrancoia.\n\nAddresses an issue in #6031.\n\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. 
Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\nsklearn/discriminant_analysis.py\n  function: LinearDiscriminantAnalysis._solve_lsqr\n  function: LinearDiscriminantAnalysis._solve_svd\n  function: QuadraticDiscriminantAnalysis.fit\n  function: LinearDiscriminantAnalysis.transform\n"
    },
    {
      "similar_issue": {
        "issue_title": "BUG: BaggingClassifier.oob_score_ should not change with class label",
        "issue_body": "Let us compute the oob score of a bagged classifier.\r\n\r\n```python\r\nimport numpy as np\r\nimport pandas as pd\r\nfrom sklearn.ensemble import BaggingClassifier\r\nfrom sklearn.neighbors import KNeighborsClassifier\r\n\r\nN = 50\r\nrandState = 5\r\nlabel = 'Label'\r\nfeatures = ['A','B','C']\r\n\r\nlabels = np.random.randint(3, size = N) - 1\r\ndf = pd.DataFrame( labels , index=range(N), columns=[label] )\r\nfor col in features:\r\n    df[col] = df[label] + 0.01 * np.random.rand( N )\r\n\r\nclf = BaggingClassifier(base_estimator = KNeighborsClassifier(), n_estimators = 10, oob_score = True, random_state = randState )\r\nclf.fit(df[features], df[label])\r\nprint clf.oob_score_\r\n```\r\n\r\nHere, clf.oob_score_=0.0.\r\n\r\nNow, you would not expect that the OOB accuracy is a function of the class labels...\r\n\r\n```python\r\ndf.loc[ df[label] == -1 , label ] = 2\r\nclf = BaggingClassifier(base_estimator = KNeighborsClassifier(), n_estimators = 10, oob_score = True, random_state = randState )\r\nclf.fit(df[features], df[label])\r\nprint clf.oob_score_\r\n```\r\n\r\nNow, clf.oob_score_=1.0.\r\n\r\nClearly, OOB score should not be a function of the labels arbitrarily chosen for the classes.\r\n\r\nsklearn.__version__: '0.18.1'\r\nnumpy.__version__: '1.11.3'",
        "issue_id": 8933,
        "pr_number": 8936,
        "pr_title": "[MRG+1] fixed OOB_Score bug for bagging classifiers.",
        "pr_body": "Fixes #8933\r\n\r\n<!--\r\nThanks for contributing a pull request! Please ensure you have taken a look at\r\nthe contribution guidelines: https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md#Contributing-Pull-Requests\r\n-->\r\n#### Reference Issue\r\n<!-- Example: Fixes #1234 -->\r\n\r\n\r\n#### What does this implement/fix? Explain your changes.\r\n\r\n\r\n#### Any other comments?\r\n\r\n\r\n<!--\r\nPlease be aware that we are a loose team of volunteers so patience is\r\nnecessary; assistance handling other issues is very welcome. We value\r\nall user contributions, no matter how minor they are. If we are slow to\r\nreview, either the pull request needs some benchmarking, tinkering,\r\nconvincing, etc. or more likely the reviewers are simply busy. In either\r\ncase, we ask for your understanding during the review process.\r\nFor more information, see our FAQ on this topic:\r\nhttp://scikit-learn.org/dev/faq.html#why-is-my-pull-request-not-getting-any-attention.\r\n\r\nThanks for contributing!\r\n-->\r\n",
        "issue_closed_at": "2017-06-08T09:35:49Z",
        "base_commit": "9131f89e6c165fb27dadd37d3168c1ee5ea84f5a"
      },
      "summary": "### Summary:\nThis issue is related to the BaggingClassifier's out-of-bag (OOB) score calculation in the scikit-learn library. The problem arises when the calculated OOB score is unexpectedly influenced by the arbitrary choice of class labels, which should not inherently affect model performance metrics. \n\n1. **Problem Description in General Terms**: The OOB score, a mechanism used to estimate the accuracy of bagged classifiers, should remain consistent regardless of the specific numeric values assigned to class labels. However, an observed discrepancy in the OOB score when class labels are modified indicates a flaw in the scoring mechanism.\n\n2. **Key Symptoms and Behaviors Observed**: The key symptom is the variation in the OOB score when class labels are altered, which is demonstrated by a change from a score of 0.0 to 1.0 when labels are adjusted from -1 to 2. This behavior is contrary to expectations, as the OOB score should reflect model accuracy independent of label encoding.\n\n3. **Affected Components or Systems**: The issue affects the BaggingClassifier component within the scikit-learn library, specifically impacting the `_set_oob_score` function of the `BaggingRegressor` class.\n\n4. **Potential Impact or Severity**: This issue could lead to misleading model evaluation and selection if the OOB score is used as a primary metric for assessing classifier performance. It could result in incorrect assumptions about model accuracy, potentially affecting downstream applications relying on these classifiers.\n\n5. **Relevant Technical Details Abstracted for Broader Understanding**: The problem highlights the importance of label-agnostic evaluation metrics in machine learning systems. The observed behavior suggests that the scoring function might be inadvertently sensitive to label values, which could be due to improper handling or comparison of label encodings during the OOB score computation. 
The fix likely involves ensuring that the scoring mechanism treats labels in a consistent manner, independent of their numeric representation.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: BUG: BaggingClassifier.oob_score_ should not change with class label\n\nBody:\nLet us compute the oob score of a bagged classifier.\r\n\r\n```python\r\nimport numpy as np\r\nimport pandas as pd\r\nfrom sklearn.ensemble import BaggingClassifier\r\nfrom sklearn.neighbors import KNeighborsClassifier\r\n\r\nN = 50\r\nrandState = 5\r\nlabel = 'Label'\r\nfeatures = ['A','B','C']\r\n\r\nlabels = np.random.randint(3, size = N) - 1\r\ndf = pd.DataFrame( labels , index=range(N), columns=[label] )\r\nfor col in features:\r\n    df[col] = df[label] + 0.01 * np.random.rand( N )\r\n\r\nclf = BaggingClassifier(base_estimator = KNeighborsClassifier(), n_estimators = 10, oob_score = True, random_state = randState )\r\nclf.fit(df[features], df[label])\r\nprint clf.oob_score_\r\n```\r\n\r\nHere, clf.oob_score_=0.0.\r\n\r\nNow, you would not expect that the OOB accuracy is a function of the class labels...\r\n\r\n```python\r\ndf.loc[ df[label] == -1 , label ] = 2\r\nclf = BaggingClassifier(base_estimator = KNeighborsClassifier(), n_estimators = 10, oob_score = True, random_state = randState )\r\nclf.fit(df[features], df[label])\r\nprint clf.oob_score_\r\n```\r\n\r\nNow, clf.oob_score_=1.0.\r\n\r\nClearly, OOB score should not be a function of the labels arbitrarily chosen for the classes.\r\n\r\nsklearn.__version__: '0.18.1'\r\nnumpy.__version__: '1.11.3'\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. 
Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\nsklearn/ensemble/bagging.py\n  function: BaggingRegressor._set_oob_score\n"
    },
    {
      "similar_issue": {
        "issue_title": "SelectFdr has serious thresholding bug",
        "issue_body": "The current code reads like:\n\n```\ndef _get_support_mask(self):\n    alpha = self.alpha\n    sv = np.sort(self.pvalues_)\n    threshold = sv[sv < alpha * np.arange(len(self.pvalues_))].max()\n    return self.pvalues_ <= threshold\n```\n\nBut this doesn't actually control FDR at all, the correct implementation should have:\n\n```\n    bf_alpha = alpha / len(self.pvalues_)\n    threshold = sv[sv < bf_alpha * np.arange(len(self.pvalues_))].max()\n```\n\nNote the k / m term in the equation at:\nhttp://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure\n",
        "issue_id": 2771,
        "pr_number": 4146,
        "pr_title": "[MRG + 1] Fdr treshold bug",
        "pr_body": "Continues #2932. Fixes #2771.\nThese are some minor fixes on top of #2932, where @bthirion already gave his +1.\nMaybe @arjoly wants to have a look as he commented there.\nThis is a good bug fix that I think we should include asap.\n\nFYI tests take .5s.\n",
        "issue_closed_at": "2015-02-24T22:15:19Z",
        "base_commit": "f6af4881a5a66fb21688379b39f9304898a11bc0"
      },
      "summary": "### Summary: This issue is related to a thresholding bug in the SelectFdr class within the feature selection module of a software library. The problem specifically lies in the implementation of the _get_support_mask method, which is intended to control the False Discovery Rate (FDR) while selecting features based on p-values. However, the current logic incorrectly calculates the threshold for feature selection, failing to properly implement the Benjamini-Hochberg procedure for FDR control.\n\n1. **Problem Description in General Terms**: The method intended to control the FDR during feature selection is flawed, as it does not correctly calculate the threshold needed for the process. This miscalculation leads to the improper selection of features, thereby compromising the integrity of the feature selection process.\n\n2. **Key Symptoms and Behaviors Observed**: The key symptom observed is the incorrect handling of FDR, which may lead to selecting features that should not be included according to the desired FDR level. This potentially results in a higher rate of false discoveries, affecting the overall validity of the feature selection process.\n\n3. **Affected Components or Systems**: The bug affects the SelectFdr class within the univariate_selection module, which is part of a feature selection library. Specifically, the _get_support_mask method is directly impacted.\n\n4. **Potential Impact or Severity**: The impact of this bug is significant, as it affects the reliability of the feature selection process. If not addressed, it can lead to erroneous data analysis outcomes, particularly in applications where controlling the FDR is critical. This can undermine the conclusions drawn from analytical models relying on this feature selection method.\n\n5. 
**Relevant Technical Details Abstracted for Broader Understanding**: The technical root of the issue is in the threshold calculation logic, which fails to incorporate the correct scaling by the number of tests, as dictated by the Benjamini-Hochberg procedure. The correct implementation should adjust the threshold by dividing the alpha level by the total number of p-values, ensuring proper FDR control. This correction aligns the method with the standard statistical procedure for FDR, enhancing the reliability and accuracy of the feature selection process.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: SelectFdr has serious thresholding bug\n\nBody:\nThe current code reads like:\n\n```\ndef _get_support_mask(self):\n    alpha = self.alpha\n    sv = np.sort(self.pvalues_)\n    threshold = sv[sv < alpha * np.arange(len(self.pvalues_))].max()\n    return self.pvalues_ <= threshold\n```\n\nBut this doesn't actually control FDR at all, the correct implementation should have:\n\n```\n    bf_alpha = alpha / len(self.pvalues_)\n    threshold = sv[sv < bf_alpha * np.arange(len(self.pvalues_))].max()\n```\n\nNote the k / m term in the equation at:\nhttp://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure\n\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\nsklearn/feature_selection/univariate_selection.py\n  function: GenericUnivariateSelect._get_support_mask\n  class: SelectFdr\n  function: GenericUnivariateSelect.__init__\n"
    },
    {
      "similar_issue": {
        "issue_title": "Pop from empty list coming from get_params()",
        "issue_body": "<!--\nIf your issue is a usage question, submit it here instead:\n- StackOverflow with the scikit-learn tag: http://stackoverflow.com/questions/tagged/scikit-learn\n- Mailing List: https://mail.python.org/mailman/listinfo/scikit-learn\nFor more information, see User Questions: http://scikit-learn.org/stable/support.html#user-questions\n-->\n\n<!-- Instructions For Filing a Bug: https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md#filing-bugs -->\n#### Description\n\n I am getting a pop from empty list error from the warnings.filers.pop(0) call in get_params(). I am using Dask to parallelize the computation of fitting a bunch of MeanShift objects. I only get this error on one machine (a remote linux machine), but it works fine on my home compute (running ubuntu 14) \n#### Steps/Code to Reproduce\n\n<!--\n\n-->\n#### Expected Results\n\nShould just fit the MeanShifts and move on\n#### Actual Results\n\nTraceback (most recent call last):\n  File \"tda_profile.py\", line 34, in <module>\n    _tda.fit(train_features, train_targets)\n  File \"/home/ben/tda/tda_parallel_test.py\", line 652, in fit\n    fits = fits.compute()\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/dask/base.py\", line 86, in compute\n    return compute(self, *_kwargs)[0]\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/dask/base.py\", line 179, in compute\n    results = get(dsk, keys, *_kwargs)\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/dask/threaded.py\", line 57, in get\n    **kwargs)\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/dask/async.py\", line 484, in get_async\n    raise(remote_exception(res, tb))\ndask.async.IndexError: pop from empty list\n## Traceback\n\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/dask/async.py\", line 267, in execute_task\n    result = _execute_task(task, data)\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/dask/async.py\", line 249, in _execute_task\n    
return func(*args2)\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/cluster/mean_shift_.py\", line 391, in fit\n    cluster_all=self.cluster_all, n_jobs=self.n_jobs)\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/cluster/mean_shift_.py\", line 191, in mean_shift\n    (seed, X, nbrs, max_iter) for seed in seeds)\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py\", line 800, in **call**\n    while self.dispatch_one_batch(iterator):\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py\", line 658, in dispatch_one_batch\n    self._dispatch(tasks)\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py\", line 566, in _dispatch\n    job = ImmediateComputeBatch(batch)\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py\", line 180, in __init__\n    self.results = batch()\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py\", line 72, in **call**\n    return [func(_args, *_kwargs) for func, args, kwargs in self.items]\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py\", line 72, in <listcomp>\n    return [func(_args, *_kwargs) for func, args, kwargs in self.items]\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/cluster/mean_shift_.py\", line 75, in _mean_shift_single_seed\n    bandwidth = nbrs.get_params()['radius']\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/base.py\", line 227, in get_params\n    warnings.filters.pop(0)\n#### Versions\n\n> > > import platform; print(platform.platform())\n> > > Linux-3.10.0-327.el7.x86_64-x86_64-with-centos-7.2.1511-Core\n> > > import sys; print(\"Python\", sys.version)\n> > > Python 3.5.2 |Anaconda 4.1.1 (64-bit)| (default, Jul  2 2016, 17:53:06) \n> > > [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]\n> > > import 
numpy; print(\"NumPy\", numpy.**version**)\n> > > NumPy 1.11.1\n> > > import scipy; print(\"SciPy\", scipy.**version**)\n> > > SciPy 0.17.1\n> > > import sklearn; print(\"Scikit-Learn\", sklearn.**version**)\n> > > Scikit-Learn 0.17.1\n\n<!-- Thanks for contributing! -->\n",
        "issue_id": 7346,
        "pr_number": 9569,
        "pr_title": "[MRG+2] remove modification of warning registry for no reason",
        "pr_body": "Fixes #9560. Fixes #2755. Fixes #7346.",
        "issue_closed_at": "2017-09-08T15:29:37Z",
        "base_commit": "e1fb03c86d2a2c47ef008ead958e1bc10fb06e77"
      },
      "summary": "### Summary:\n\nThis issue is a software bug encountered in the Scikit-Learn library, specifically related to the `get_params()` method within the `BaseEstimator` class. The problem manifests as an \"IndexError: pop from empty list\" when attempting to retrieve parameters using `get_params()`. This error occurs due to the improper handling of the `warnings.filters` list, which is being manipulated by popping elements without ensuring the list is non-empty.\n\n1. **Problem Description in General Terms:** \n   The issue involves an attempt to manipulate a list within the codebase where the list may be empty, leading to an IndexError. This error arises during parameter retrieval in a machine learning context, specifically when using Dask to parallelize computations involving Scikit-Learn's MeanShift clustering algorithm.\n\n2. **Key Symptoms and Behaviors Observed:**\n   - The error occurs on a remote Linux machine but not on a local Ubuntu machine, indicating potential environment-specific behavior.\n   - The error traceback points to a sequence of calls culminating in an IndexError from the `warnings.filters.pop(0)` operation within the `get_params()` function.\n\n3. **Affected Components or Systems:**\n   - Scikit-Learn's `BaseEstimator` class, particularly its `get_params()` method.\n   - The error is observed in a parallel computing context using Dask, suggesting potential concurrency issues or environmental dependencies.\n\n4. **Potential Impact or Severity:**\n   - The error prevents the successful fitting of MeanShift models in parallel, potentially affecting any application relying on this functionality for large-scale data processing.\n   - Given its occurrence under specific conditions, the issue may not universally impact all users but could significantly hinder those with similar setups.\n\n5. 
**Relevant Technical Details Abstracted for Broader Understanding:**\n   - The issue arises from manipulating a list without checking for its state, a common programming oversight that can lead to runtime errors.\n   - The reliance on external libraries like Dask and the specific Python environment (Linux, CentOS) may contribute to the error's manifestation, highlighting the importance of context in software debugging and patching.\n\nChanges Summary:\nThe patch modifies the `BaseEstimator.get_params` and `BaseEstimator.__setstate__` functions within `sklearn/base.py`; per the PR title, it removes the unnecessary modification of the warning registry, so `warnings.filters.pop(0)` is no longer called on a possibly empty list.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: Pop from empty list coming from get_params()\n\nBody:\n<!--\nIf your issue is a usage question, submit it here instead:\n- StackOverflow with the scikit-learn tag: http://stackoverflow.com/questions/tagged/scikit-learn\n- Mailing List: https://mail.python.org/mailman/listinfo/scikit-learn\nFor more information, see User Questions: http://scikit-learn.org/stable/support.html#user-questions\n-->\n\n<!-- Instructions For Filing a Bug: https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md#filing-bugs -->\n#### Description\n\n I am getting a pop from empty list error from the warnings.filers.pop(0) call in get_params(). I am using Dask to parallelize the computation of fitting a bunch of MeanShift objects. 
I only get this error on one machine (a remote linux machine), but it works fine on my home compute (running ubuntu 14) \n#### Steps/Code to Reproduce\n\n<!--\n\n-->\n#### Expected Results\n\nShould just fit the MeanShifts and move on\n#### Actual Results\n\nTraceback (most recent call last):\n  File \"tda_profile.py\", line 34, in <module>\n    _tda.fit(train_features, train_targets)\n  File \"/home/ben/tda/tda_parallel_test.py\", line 652, in fit\n    fits = fits.compute()\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/dask/base.py\", line 86, in compute\n    return compute(self, *_kwargs)[0]\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/dask/base.py\", line 179, in compute\n    results = get(dsk, keys, *_kwargs)\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/dask/threaded.py\", line 57, in get\n    **kwargs)\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/dask/async.py\", line 484, in get_async\n    raise(remote_exception(res, tb))\ndask.async.IndexError: pop from empty list\n## Traceback\n\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/dask/async.py\", line 267, in execute_task\n    result = _execute_task(task, data)\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/dask/async.py\", line 249, in _execute_task\n    return func(*args2)\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/cluster/mean_shift_.py\", line 391, in fit\n    cluster_all=self.cluster_all, n_jobs=self.n_jobs)\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/cluster/mean_shift_.py\", line 191, in mean_shift\n    (seed, X, nbrs, max_iter) for seed in seeds)\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py\", line 800, in **call**\n    while self.dispatch_one_batch(iterator):\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py\", line 658, in dispatch_one_batch\n    self._dispatch(tasks)\n  File 
\"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py\", line 566, in _dispatch\n    job = ImmediateComputeBatch(batch)\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py\", line 180, in __init__\n    self.results = batch()\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py\", line 72, in **call**\n    return [func(_args, *_kwargs) for func, args, kwargs in self.items]\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py\", line 72, in <listcomp>\n    return [func(_args, *_kwargs) for func, args, kwargs in self.items]\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/cluster/mean_shift_.py\", line 75, in _mean_shift_single_seed\n    bandwidth = nbrs.get_params()['radius']\n  File \"/home/ben/anaconda3/lib/python3.5/site-packages/sklearn/base.py\", line 227, in get_params\n    warnings.filters.pop(0)\n#### Versions\n\n> > > import platform; print(platform.platform())\n> > > Linux-3.10.0-327.el7.x86_64-x86_64-with-centos-7.2.1511-Core\n> > > import sys; print(\"Python\", sys.version)\n> > > Python 3.5.2 |Anaconda 4.1.1 (64-bit)| (default, Jul  2 2016, 17:53:06) \n> > > [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]\n> > > import numpy; print(\"NumPy\", numpy.**version**)\n> > > NumPy 1.11.1\n> > > import scipy; print(\"SciPy\", scipy.**version**)\n> > > SciPy 0.17.1\n> > > import sklearn; print(\"Scikit-Learn\", sklearn.**version**)\n> > > Scikit-Learn 0.17.1\n\n<!-- Thanks for contributing! -->\n\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. 
Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with \"### Summary:\", for example:\n### Summary: This issue is ...\n\nChanges Summary:\nsklearn/base.py\n  function: BaseEstimator.get_params\n  function: BaseEstimator.__setstate__\n"
    },
    {
      "similar_issue": {
        "issue_title": "Bug in BaseSearchCV.inverse_transform",
        "issue_body": "The [delegating code](https://github.com/scikit-learn/scikit-learn/blob/e5ceda88f2a24b3dd4f9a94404828f982cdf52ad/sklearn/utils/validation.py#L650) for `inverse_transform` is\r\n\r\n```python\r\n    def inverse_transform(self, Xt):\r\n        self._check_is_fitted('inverse_transform')\r\n        return self.best_estimator_.transform(Xt)\r\n```\r\n\r\nUnless I'm mistaken, this should be `.inverse_transform(Xt)`",
        "issue_id": 8344,
        "pr_number": 8348,
        "pr_title": "[MRG+1] Bug in BaseSearchCV.inverse_transform",
        "pr_body": "<!--\r\nThanks for contributing a pull request! Please ensure you have taken a look at\r\nthe contribution guidelines: https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md#Contributing-Pull-Requests\r\n-->\r\n#### Reference Issue\r\n<!-- Example: Fixes #1234 -->\r\nFixes #8344 \r\n\r\n#### What does this implement/fix? Explain your changes.\r\nCode for inverse transform function in BaseSearchCV was written incorrect, I have changed it from `.transform(Xt)` to `.inverse_transform(Xt)` \r\n\r\n#### Any other comments?\r\n\r\n\r\n<!--\r\nPlease be aware that we are a loose team of volunteers so patience is\r\nnecessary; assistance handling other issues is very welcome. We value\r\nall user contributions, no matter how minor they are. If we are slow to\r\nreview, either the pull request needs some benchmarking, tinkering,\r\nconvincing, etc. or more likely the reviewers are simply busy. In either\r\ncase, we ask for your understanding during the review process.\r\nFor more information, see our FAQ on this topic:\r\nhttp://scikit-learn.org/dev/faq.html#why-is-my-pull-request-not-getting-any-attention.\r\n\r\nThanks for contributing!\r\n-->\r\n",
        "issue_closed_at": "2017-02-17T16:12:52Z",
        "base_commit": "8694278c027d1017670e67cd3298fc5fd627d4c9"
      },
      "summary": "### Summary:\nThis issue is related to an incorrect method delegation within the `BaseSearchCV` class in the Scikit-learn library. Specifically, the `inverse_transform` method was erroneously calling the `transform` method of `best_estimator_` instead of the intended `inverse_transform` method. This misalignment in method invocation could lead to incorrect data transformation results when users attempt to revert transformed data back to its original form. \n\nKey symptoms and behaviors include unexpected data transformation outcomes when using the `inverse_transform` method, potentially leading to data integrity issues or incorrect analysis results. The main component affected is the `BaseSearchCV` class, a part of the model selection module in Scikit-learn. This bug could significantly impact users relying on accurate inverse transformations, particularly in data preprocessing or model evaluation tasks.\n\nThe severity of the issue is moderate to high, given that incorrect inverse transformations can undermine the validity of machine learning workflows, leading to erroneous conclusions or decisions based on flawed data. \n\nRelevant technical details include the need for correct method delegation, ensuring that the `inverse_transform` method within `BaseSearchCV` accurately references the `inverse_transform` method of `best_estimator_`, thereby maintaining consistency and reliability in data processing operations.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: Bug in BaseSearchCV.inverse_transform\n\nBody:\nThe [delegating code](https://github.com/scikit-learn/scikit-learn/blob/e5ceda88f2a24b3dd4f9a94404828f982cdf52ad/sklearn/utils/validation.py#L650) for `inverse_transform` is\r\n\r\n```python\r\n    def inverse_transform(self, Xt):\r\n        self._check_is_fitted('inverse_transform')\r\n        return self.best_estimator_.transform(Xt)\r\n```\r\n\r\nUnless I'm mistaken, this should be `.inverse_transform(Xt)`\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\nsklearn/model_selection/_search.py\n  function: BaseSearchCV.inverse_transform\n"
    }
  ]
}