{
  "original_problem": {
    "instance_id": "scikit-learn__scikit-learn-13584",
    "repo": "scikit-learn/scikit-learn",
    "created_at": "2019-04-05T23:09:48Z",
    "problem_statement": "bug in print_changed_only in new repr: vector values\n```python\r\nimport sklearn\r\nimport numpy as np\r\nfrom sklearn.linear_model import LogisticRegressionCV\r\nsklearn.set_config(print_changed_only=True)\r\nprint(LogisticRegressionCV(Cs=np.array([0.1, 1])))\r\n```\r\n> ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()\r\n\r\nping @NicolasHug \r\n\n",
    "patch": "diff --git a/sklearn/utils/_pprint.py b/sklearn/utils/_pprint.py\n--- a/sklearn/utils/_pprint.py\n+++ b/sklearn/utils/_pprint.py\n@@ -95,7 +95,7 @@ def _changed_params(estimator):\n     init_params = signature(init_func).parameters\n     init_params = {name: param.default for name, param in init_params.items()}\n     for k, v in params.items():\n-        if (v != init_params[k] and\n+        if (repr(v) != repr(init_params[k]) and\n                 not (is_scalar_nan(init_params[k]) and is_scalar_nan(v))):\n             filtered_params[k] = v\n     return filtered_params\n"
  },
  "candidates_evaluated": 5,
  "judgment_result": {
    "candidates": [
      {
        "idx": 1,
        "id": "similar_9791",
        "decision": "Not useful",
        "confidence": "Medium",
        "reason": "The issue is about deprecated API usage, which is unrelated to the current issue's logic error in parameter comparison."
      },
      {
        "idx": 2,
        "id": "similar_7562",
        "decision": "Not useful",
        "confidence": "Medium",
        "reason": "The issue involves serialization problems, which do not relate to the logic error in parameter comparison in the current issue."
      },
      {
        "idx": 3,
        "id": "similar_9864",
        "decision": "Not useful",
        "confidence": "Medium",
        "reason": "The issue is about handling edge cases in data distribution, unrelated to the logic error in parameter comparison."
      },
      {
        "idx": 4,
        "id": "similar_7976",
        "decision": "Not useful",
        "confidence": "Medium",
        "reason": "The issue involves array shape and memory allocation errors, which do not relate to the logic error in parameter comparison."
      },
      {
        "idx": 5,
        "id": "similar_11951",
        "decision": "Not useful",
        "confidence": "Medium",
        "reason": "The issue is about handling missing input samples, unrelated to the logic error in parameter comparison."
      }
    ]
  },
  "raw_summaries": [
    {
      "similar_issue": {
        "issue_title": "scipy 1.0: TypeError: fmin_cobyla() got an unexpected keyword argument 'iprint'",
        "issue_body": "Several tests in `sklearn.gaussian_process.tests.test_gaussian_process` fail because we use a deprecated argument that was removed in scipy 1.0:\r\n\r\nHere is an example:\r\n\r\n```\r\n======================================================================\r\nERROR: sklearn.gaussian_process.tests.test_gaussian_process.test_2d\r\n----------------------------------------------------------------------\r\nTraceback (most recent call last):\r\n  File \"/volatile/ogrisel/.virtualenvs/py36/lib/python3.6/site-packages/nose/case.py\", line 198, in runTest\r\n    self.test(*self.arg)\r\n  File \"/volatile/ogrisel/code/scikit-learn/sklearn/gaussian_process/tests/test_gaussian_process.py\", line 61, in test_2d\r\n    gp.fit(X, y)\r\n  File \"/volatile/ogrisel/code/scikit-learn/sklearn/gaussian_process/gaussian_process.py\", line 350, in fit\r\n    self._arg_max_reduced_likelihood_function()\r\n  File \"/volatile/ogrisel/code/scikit-learn/sklearn/gaussian_process/gaussian_process.py\", line 723, in _arg_max_reduced_likelihood_function\r\n    iprint=0)\r\nTypeError: fmin_cobyla() got an unexpected keyword argument 'iprint'\r\n```",
        "issue_id": 9791,
        "pr_number": 9793,
        "pr_title": "[MRG] FIX fmin_cobyla: iprint is deprecated, use disp",
        "pr_body": "This should fix #9791 as the `disp` kwarg was already available in scipy 0.13.3. Let see if CI agrees.",
        "issue_closed_at": "2017-09-19T09:42:01Z",
        "base_commit": "e443c05ea3a4c2634611253759ddbeb4367fe70c"
      },
      "summary": "### Summary:\nThis issue is related to compatibility between the scikit-learn library and scipy version 1.0, which arose due to the removal of a deprecated argument in the `fmin_cobyla` function. In general terms, the problem stems from the use of outdated or deprecated parameters in function calls when interfacing with external libraries that have undergone updates. \n\nKey symptoms and behaviors observed include test failures in the `sklearn.gaussian_process` module, specifically within the `test_gaussian_process` test suite. The error manifests as a `TypeError` indicating that the `fmin_cobyla` function received an unexpected keyword argument, `iprint`, which is no longer supported in scipy 1.0.\n\nThe affected components or systems are primarily within the scikit-learn library's Gaussian Process module, specifically in the `gaussian_process.py` file. The function `_arg_max_reduced_likelihood_function`, which is part of the GaussianProcess class, is directly impacted by this issue.\n\nThe potential impact or severity of this issue is significant for developers or users who rely on the Gaussian Process functionalities of scikit-learn, as it prevents the proper execution of related tests and possibly affects any dependent applications or workflows.\n\nRelevant technical details abstracted for broader understanding include the necessity of maintaining compatibility between dependent libraries by updating code to adhere to the latest APIs, especially when interfacing with external libraries like scipy. The resolution of the issue involves modifying the function calls to accommodate the changes in the external library's API, thereby ensuring continued functionality and compatibility.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: scipy 1.0: TypeError: fmin_cobyla() got an unexpected keyword argument 'iprint'\n\nBody:\nSeveral tests in `sklearn.gaussian_process.tests.test_gaussian_process` fail because we use a deprecated argument that was removed in scipy 1.0:\r\n\r\nHere is an example:\r\n\r\n```\r\n======================================================================\r\nERROR: sklearn.gaussian_process.tests.test_gaussian_process.test_2d\r\n----------------------------------------------------------------------\r\nTraceback (most recent call last):\r\n  File \"/volatile/ogrisel/.virtualenvs/py36/lib/python3.6/site-packages/nose/case.py\", line 198, in runTest\r\n    self.test(*self.arg)\r\n  File \"/volatile/ogrisel/code/scikit-learn/sklearn/gaussian_process/tests/test_gaussian_process.py\", line 61, in test_2d\r\n    gp.fit(X, y)\r\n  File \"/volatile/ogrisel/code/scikit-learn/sklearn/gaussian_process/gaussian_process.py\", line 350, in fit\r\n    self._arg_max_reduced_likelihood_function()\r\n  File \"/volatile/ogrisel/code/scikit-learn/sklearn/gaussian_process/gaussian_process.py\", line 723, in _arg_max_reduced_likelihood_function\r\n    iprint=0)\r\nTypeError: fmin_cobyla() got an unexpected keyword argument 'iprint'\r\n```\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\nsklearn/gaussian_process/gaussian_process.py\n  function: GaussianProcess.minus_reduced_likelihood_function\n"
    },
    {
      "similar_issue": {
        "issue_title": "Error unpickling RandomizedSearchCV objects in 0.18 due to masked arrays",
        "issue_body": "#### Description\n\nIn version 0.18, loading pickles of fitted RandomizedSearchCV objects results in a `TypeError` exception (from pickle also created with version 0.18).\n\nThe error seems related to the use of masked arrays in the `RandomizedSearchCV.cv_results_` attribute - clearing this before pickling (i.e. setting to to `None`) allows pickling/unpickling to work.\n#### Steps/Code to Reproduce\n\n```\nimport pickle                                                                   \nfrom sklearn.model_selection import RandomizedSearchCV                          \nfrom sklearn.ensemble import RandomForestClassifier                             \n\nX = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]                            \ny = [1, 1, 1, 0, 0, 0]                                                          \n\nmodel = RandomizedSearchCV(RandomForestClassifier(),                            \n                           {'n_estimators': [5, 10, 20]},                       \n                           n_iter=3)                                            \nmodel.fit(X, y)                                                                 \n\nwith open('model.pkl', 'wb') as fh:                                             \n    pickle.dump(model, fh)                                                      \n\nwith open('model.pkl', 'rb') as fh:                                             \n    model = pickle.load(fh)\n\nprint(model.predict(X))\n```\n#### Expected Results\n\n```\n[1, 1, 1, 0, 0, 0]\n```\n#### Actual Results\n\n```\nTraceback (most recent call last):\n  File \"./t.py\", line 19, in <module>\n    model = pickle.load(fh)\n  File \"/Users/dsc/miniconda3/envs/p3/lib/python3.5/site-packages/numpy/ma/core.py\", line 5863, in __setstate__\n    super(MaskedArray, self).__setstate__((shp, typ, isf, raw))\nTypeError: object pickle not returning list\n```\n#### Versions\n\nPython 3.5.1 |Continuum Analytics, Inc.| (default, Dec  7 2015, 11:24:55) \n[GCC 4.2.1 (Apple Inc. build 5577)]\nNumPy 1.11.1\nSciPy 0.18.1\nScikit-Learn 0.18\n",
        "issue_id": 7562,
        "pr_number": 7594,
        "pr_title": "[MRG+1] FIX Make sure GridSearchCV and RandomizedSearchCV are pickle-able",
        "pr_body": "Fixes #7562 \n- Subclasses the `np.ma.MaskedArray` and overrides the `__getstate__` to make obj dtyped `MaskedArray`s pickle-able.\n- Uses this fixed `utils.fixes.MaskedArray` inside `gs.cv_results_`...\n\nThis is based off of https://github.com/numpy/numpy/pull/8122\n\nPlease review @jnothman @amueller @GaelVaroquaux @davechallis\n",
        "issue_closed_at": "2016-10-10T19:33:44Z",
        "base_commit": "33ed90dc0aa0549a5963000d7d070aa18ca389c4"
      },
      "summary": "### Summary:\nThis issue involves a compatibility problem with the serialization and deserialization (pickling and unpickling) of `RandomizedSearchCV` objects in Scikit-Learn version 0.18. Specifically, when attempting to unpickle a model that was serialized using the same version, a `TypeError` exception occurs. The problem is traced to the handling of masked arrays within the `cv_results_` attribute of the `RandomizedSearchCV` object. By clearing the `cv_results_` attribute before pickling, the issue can be circumvented, allowing the object to be unpickled without error.\n\n1. **Problem description in general terms**:\n   The issue relates to the inability to correctly unpickle certain machine learning model objects due to internal data structures that are not compatible with Python's pickle module. This problem arises in scenarios where model objects have attributes that include masked arrays, which are part of NumPy's functionalities.\n\n2. **Key symptoms and behaviors observed**:\n   - A `TypeError` occurs during the unpickling process of a `RandomizedSearchCV` object.\n   - The error message indicates a problem with the masked array's state, specifically a failure in returning a list during the process.\n   - The serialization process works without error, but deserialization fails.\n\n3. **Affected components or systems**:\n   - The primary component affected is the `RandomizedSearchCV` class within Scikit-Learn, specifically concerning its `cv_results_` attribute.\n   - The issue also involves the interplay between Scikit-Learn and NumPy's masked array functionalities.\n\n4. **Potential impact or severity**:\n   - This issue can significantly impact workflows that rely on saving and later reloading model objects for prediction or further analysis, as it interrupts the persistence mechanism.\n   - Users working with version 0.18 of Scikit-Learn who need to serialize `RandomizedSearchCV` objects may encounter disruptions or need to implement workarounds.\n\n5. **Any relevant technical details abstracted for broader understanding**:\n   - The issue highlights a limitation in the serialization capabilities of complex data structures such as masked arrays when used in machine learning model objects.\n   - The resolution involves modifying internal states before serialization to ensure compatibility with the pickle module, suggesting a need for careful management of object attributes in software that interfaces with serialization libraries.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: Error unpickling RandomizedSearchCV objects in 0.18 due to masked arrays\n\nBody:\n#### Description\n\nIn version 0.18, loading pickles of fitted RandomizedSearchCV objects results in a `TypeError` exception (from pickle also created with version 0.18).\n\nThe error seems related to the use of masked arrays in the `RandomizedSearchCV.cv_results_` attribute - clearing this before pickling (i.e. setting to to `None`) allows pickling/unpickling to work.\n#### Steps/Code to Reproduce\n\n```\nimport pickle                                                                   \nfrom sklearn.model_selection import RandomizedSearchCV                          \nfrom sklearn.ensemble import RandomForestClassifier                             \n\nX = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]                            \ny = [1, 1, 1, 0, 0, 0]                                                          \n\nmodel = RandomizedSearchCV(RandomForestClassifier(),                            \n                           {'n_estimators': [5, 10, 20]},                       \n                           n_iter=3)                                            \nmodel.fit(X, y)                                                                 \n\nwith open('model.pkl', 'wb') as fh:                                             \n    pickle.dump(model, fh)                                                      \n\nwith open('model.pkl', 'rb') as fh:                                             \n    model = pickle.load(fh)\n\nprint(model.predict(X))\n```\n#### Expected Results\n\n```\n[1, 1, 1, 0, 0, 0]\n```\n#### Actual Results\n\n```\nTraceback (most recent call last):\n  File \"./t.py\", line 19, in <module>\n    model = pickle.load(fh)\n  File \"/Users/dsc/miniconda3/envs/p3/lib/python3.5/site-packages/numpy/ma/core.py\", line 5863, in __setstate__\n    super(MaskedArray, self).__setstate__((shp, typ, isf, raw))\nTypeError: object pickle not returning list\n```\n#### Versions\n\nPython 3.5.1 |Continuum Analytics, Inc.| (default, Dec  7 2015, 11:24:55) \n[GCC 4.2.1 (Apple Inc. build 5577)]\nNumPy 1.11.1\nSciPy 0.18.1\nScikit-Learn 0.18\n\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\nsklearn/model_selection/_search.py\n  line: line 30\n  function: BaseSearchCV._store\n\nsklearn/utils/fixes.py\n  function: rankdata\n"
    },
    {
      "similar_issue": {
        "issue_title": "Improve MinCovDet.fit error when covariance is zero",
        "issue_body": "Ok this is extremely weird, can someone run this code and see if it crashes with that error?\r\n\r\n```py\r\nimport numpy as np\r\n\r\nfrom sklearn.covariance import MinCovDet\r\n\r\nclf = MinCovDet()\r\n\r\ndata = np.array([0.5, 0.1, 0.1, 0.1, 0.957, 0.1, 0.1,\r\n                 0.1, 0.4285, 0.1]).reshape(-1, 1)\r\nclf.fit(data)\r\n```\r\n\r\nIf I change the array to this\r\n\r\n```py\r\ndata = np.array([0.5, 0.11, 0.1, 0.1, 0.957, 0.1, 0.1, \r\n                 0.1, 0.4285, 0.1]).reshape(-1, 1)\r\n```\r\n\r\nThen it runs fine\r\n\r\nBut it seems to crash with any array where there are too many of the same values\r\n\r\nThis array crashes as well\r\n\r\n```py\r\ndata = np.array([0.5, 0.3, 0.3, 0.3, 0.957, 0.3, 0.3, \r\n                 0.3, 0.4285, 0.3]).reshape(-1, 1)\r\n```\r\n\r\nI already checked for NANs and everything, there's nothing\r\n\r\nUsing Python 3.6.2\r\nPandas 0.20.3\r\nNumpy 1.13.1\r\nscikit-learn 0.19.0\r\n\r\n\r\nThanks",
        "issue_id": 9864,
        "pr_number": 9910,
        "pr_title": "[MRG] Improve MinCovDet error when covariance of support data is 0",
        "pr_body": "<!--\r\nThanks for contributing a pull request! Please ensure you have taken a look at\r\nthe contribution guidelines: https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md#Contributing-Pull-Requests\r\n-->\r\n#### Reference Issue\r\nFixes #9864 \r\n\r\n\r\n#### What does this implement/fix? Explain your changes.\r\nThis improves the MinCovDet error message when the covariance matrix of the support data is equal to 0. This also adds non-regression tests.\r\n\r\n**EDIT**: When the support data or more lie on a hyperplane, the algorithm described in the original paper returns the minimum covariance determinant estimates of the location and the (singular) covariance matrix. The algorithm then computes the equation of the hyperplane. I don't think (although I'm not certain about it) that `MinCovDet` implements such a particular case and I don't know if this should be handled but we might want to test what happens in this case. I can try the example of the original paper for such a situation and see what happens. My guess is that using `pinvh` makes `MinCovDet` return a solution anyway from which we can compute a Mahalanobis distance.\r\n<!--\r\nPlease be aware that we are a loose team of volunteers so patience is\r\nnecessary; assistance handling other issues is very welcome. We value\r\nall user contributions, no matter how minor they are. If we are slow to\r\nreview, either the pull request needs some benchmarking, tinkering,\r\nconvincing, etc. or more likely the reviewers are simply busy. In either\r\ncase, we ask for your understanding during the review process.\r\nFor more information, see our FAQ on this topic:\r\nhttp://scikit-learn.org/dev/faq.html#why-is-my-pull-request-not-getting-any-attention.\r\n\r\nThanks for contributing!\r\n-->\r\n",
        "issue_closed_at": "2017-10-13T08:33:57Z",
        "base_commit": "fabb4fe4929066041c1c739d40b5fb9b5f514a3a"
      },
      "summary": "### Summary:\nThis issue pertains to the `MinCovDet` class from the `scikit-learn` library, specifically within the `fit` method, which is part of the robust covariance estimation process. The problem arises when the input data contains too many repeated values, leading to a crash during execution. This is an edge case scenario where the covariance matrix calculation can result in a zero value, causing the algorithm to fail.\n\n1. **Problem Description**: The `MinCovDet.fit` method encounters an error when the input data array contains a significant number of repeated values, leading to a zero covariance matrix. This situation is unexpected and causes the method to crash, interrupting the robust covariance estimation process.\n\n2. **Key Symptoms and Behaviors Observed**: The primary symptom is the crashing of the `fit` method when the input data has many identical values, which is confirmed by the successful execution when minor changes are made to the data (e.g., slightly altering one of the repeated values). This suggests a sensitivity in the algorithm to uniform data distributions.\n\n3. **Affected Components or Systems**: The affected component is the `MinCovDet` class in the `scikit-learn` library, specifically within the `robust_covariance` module. The issue resides in the `fit` method, where the covariance matrix calculation is handled.\n\n4. **Potential Impact or Severity**: This issue can significantly impact users relying on robust covariance estimation in data preprocessing or analysis pipelines, particularly when dealing with datasets having uniform value distributions. The severity is considered high for those scenarios as it halts the execution and requires manual intervention to resolve.\n\n5. **Relevant Technical Details Abstracted for Broader Understanding**: The problem highlights a limitation in handling edge cases within robust covariance estimation algorithms. It underscores the importance of accommodating data distributions with repeated values, as these are common in real-world datasets. The fix involves adjustments in the `fast_mcd` function and the `MinCovDet.correct_covariance` method within the `robust_covariance.py` module, ensuring stability and robustness of the covariance estimation process in such scenarios.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: Improve MinCovDet.fit error when covariance is zero\n\nBody:\nOk this is extremely weird, can someone run this code and see if it crashes with that error?\r\n\r\n```py\r\nimport numpy as np\r\n\r\nfrom sklearn.covariance import MinCovDet\r\n\r\nclf = MinCovDet()\r\n\r\ndata = np.array([0.5, 0.1, 0.1, 0.1, 0.957, 0.1, 0.1,\r\n                 0.1, 0.4285, 0.1]).reshape(-1, 1)\r\nclf.fit(data)\r\n```\r\n\r\nIf I change the array to this\r\n\r\n```py\r\ndata = np.array([0.5, 0.11, 0.1, 0.1, 0.957, 0.1, 0.1, \r\n                 0.1, 0.4285, 0.1]).reshape(-1, 1)\r\n```\r\n\r\nThen it runs fine\r\n\r\nBut it seems to crash with any array where there are too many of the same values\r\n\r\nThis array crashes as well\r\n\r\n```py\r\ndata = np.array([0.5, 0.3, 0.3, 0.3, 0.957, 0.3, 0.3, \r\n                 0.3, 0.4285, 0.3]).reshape(-1, 1)\r\n```\r\n\r\nI already checked for NANs and everything, there's nothing\r\n\r\nUsing Python 3.6.2\r\nPandas 0.20.3\r\nNumpy 1.13.1\r\nscikit-learn 0.19.0\r\n\r\n\r\nThanks\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\nsklearn/covariance/robust_covariance.py\n  function: fast_mcd\n  function: MinCovDet.correct_covariance\n"
    },
    {
      "similar_issue": {
        "issue_title": "MLPClasiffier produce error when trying to re-fit",
        "issue_body": "Hi,\r\n\r\nI am training a MLPClasiffier model twice, each time on a different data-set.\r\nOn the second iteration the fit method produce an error. Every time a different error.\r\nI did the processes on various models, this is the only one that produce an error. \r\n\r\nthis are the errors i get (each time a different one) - \r\n\r\n> _lbfgsb.error: failed in converting 7th argument `g' of _lbfgsb.setulb to C/Fortran array\r\n\r\n> ValueError: total size of new array must be unchanged\r\n\r\n > ValueError: operands could not be broadcast together with shapes (154,100) (25,) (154,100) \r\n\r\nthanks",
        "issue_id": 7976,
        "pr_number": 8035,
        "pr_title": "[MRG+1] Catch cases for different class size in MLPClassifier with warm start (#7976) ",
        "pr_body": "<!--\r\nThanks for contributing a pull request! Please ensure you have taken a look at\r\nthe contribution guidelines: https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md#Contributing-Pull-Requests\r\n-->\r\n#### Reference Issue\r\n<!-- Example: Fixes #1234 -->\r\nFixes #7976 \r\n\r\n#### What does this implement/fix? Explain your changes.\r\nThis provides a test for different cases that throws an error when warm_start = True for MLPClassifier. Currently, vague errors are thrown when class size is different between the current fit and the previous fit. This fix will throw a clearer error message. \r\n\r\n#### Any other comments?\r\n\r\n\r\n<!--\r\nPlease be aware that we are a loose team of volunteers so patience is\r\nnecessary; assistance handling other issues is very welcome. We value\r\nall user contributions, no matter how minor they are. If we are slow to\r\nreview, either the pull request needs some benchmarking, tinkering,\r\nconvincing, etc. or more likely the reviewers are simply busy. In either\r\ncase, we ask for your understanding during the review process.\r\nFor more information, see our FAQ on this topic:\r\nhttp://scikit-learn.org/dev/faq.html#why-is-my-pull-request-not-getting-any-attention.\r\n\r\nThanks for contributing!\r\n-->\r\n\r\n",
        "issue_closed_at": "2016-12-29T01:01:13Z",
        "base_commit": "40a1b7a0b10fea2995c7aaa46c90a9633e6d99f6"
      },
      "summary": "### Summary:\nThis issue is related to the `MLPClassifier` from the `sklearn` library, where repeated training attempts on different datasets result in inconsistent errors during the second fit operation. The errors encountered include conversion issues to C/Fortran arrays, mismatched array sizes, and shape broadcasting errors. The specific symptoms observed include various exceptions such as `_lbfgsb.error`, `ValueError` for array size mismatch, and `ValueError` for incompatible array shapes. The affected component is the `MLPClassifier`'s fit method, specifically during its second invocation within the same instance. The potential impact of these errors is significant, as they prevent the successful retraining of the model on new datasets, which could hinder iterative model improvement and validation processes. The technical issues suggest underlying problems with memory allocation, shape validation, and parameter management within the classifier, specifically in functions related to input validation and prediction within the multilayer perceptron module.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: MLPClasiffier produce error when trying to re-fit\n\nBody:\nHi,\r\n\r\nI am training a MLPClasiffier model twice, each time on a different data-set.\r\nOn the second iteration the fit method produce an error. Every time a different error.\r\nI did the processes on various models, this is the only one that produce an error. \r\n\r\nthis are the errors i get (each time a different one) - \r\n\r\n> _lbfgsb.error: failed in converting 7th argument `g' of _lbfgsb.setulb to C/Fortran array\r\n\r\n> ValueError: total size of new array must be unchanged\r\n\r\n > ValueError: operands could not be broadcast together with shapes (154,100) (25,) (154,100) \r\n\r\nthanks\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\nsklearn/neural_network/multilayer_perceptron.py\n  function: MLPRegressor._validate_input\n  function: MLPRegressor.predict\n"
    },
    {
      "similar_issue": {
        "issue_title": "Scoring for dummy classifier does not work without test samples",
        "issue_body": "When using the `Dummy classifier`, it is possible to fit the classifier without providing any examples, which makes sense as the classifier only operates on the targets. However, it is not possible to score the classifier without providing examples. Using `DummyClassifier` without examples is helpful if the examples, but not the targets, are to big to fit into memory. One can still get around this by constructing \r\nartificial examples, say just zeros, but avoiding this would make things a bit easier.\r\n\r\n#### Code to Reproduce\r\n```python\r\nfrom sklearn.dummy import DummyClassifier\r\nimport numpy as np\r\n\r\ny = [1, 1, 2]\r\ny_ = [1, 1, 2]\r\n\r\nd = DummyClassifier()\r\nd.fit(None, y)\r\nprint(d.score(None, y_))\r\n```\r\n\r\n\r\n#### Expected Results\r\nThe score is printed.\r\n\r\n#### Actual Results\r\nAn exception is thrown\r\n```\r\nValueError: Found input variables with inconsistent numbers of samples: [3, 1]\r\n```\r\n\r\nSo one has to do\r\n\r\n```python\r\nfrom sklearn.dummy import DummyClassifier\r\nimport numpy as np\r\n\r\ny = [1, 1, 2]\r\ny_ = [1, 1, 2]\r\nx = np.zeros(shape=(3, 1))\r\n\r\nd = DummyClassifier()\r\nd.fit(None, y)\r\nprint(d.score(x,  y_))\r\n```\r\n\r\nIf this seems like a useful feature, I would be happy to submit a PR.\r\n\r\n#### Versions\r\nLinux-4.15.0-33-generic-x86_64-with-debian-buster-sid\r\nPython 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19) \r\n[GCC 7.2.0]\r\nNumPy 1.14.3\r\nSciPy 1.0.0\r\nScikit-Learn 0.19.2\r\n\r\n<!-- Thanks for contributing! -->\r\n",
        "issue_id": 11951,
        "pr_number": 11957,
        "pr_title": "[MRG] Allow scoring of dummies without testsamples",
        "pr_body": "As DummyClassifier and DummyRegressor operate solely on the targets,\r\nthey can now be used without passing test samples, instead passing None.\r\nAlso includes some minor renaming in the corresponding tests for more\r\nconsistency.\r\n\r\n<!--\r\nThanks for contributing a pull request! Please ensure you have taken a look at\r\nthe contribution guidelines: https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md#pull-request-checklist\r\n-->\r\n\r\n#### Reference Issues/PRs\r\nResolves #11951 \r\n<!--\r\nExample: Fixes #1234. See also #3456.\r\nPlease use keywords (e.g., Fixes) to create link to the issues or pull requests\r\nyou resolved, so that they will automatically be closed when your pull request\r\nis merged. See https://github.com/blog/1506-closing-issues-via-pull-requests\r\n-->\r\n\r\n\r\n\r\n\r\n\r\n<!--\r\nPlease be aware that we are a loose team of volunteers so patience is\r\nnecessary; assistance handling other issues is very welcome. We value\r\nall user contributions, no matter how minor they are. If we are slow to\r\nreview, either the pull request needs some benchmarking, tinkering,\r\nconvincing, etc. or more likely the reviewers are simply busy. In either\r\ncase, we ask for your understanding during the review process.\r\nFor more information, see our FAQ on this topic:\r\nhttp://scikit-learn.org/dev/faq.html#why-is-my-pull-request-not-getting-any-attention.\r\n\r\nThanks for contributing!\r\n-->\r\n",
        "issue_closed_at": "2018-09-13T08:04:00Z",
        "base_commit": "ddf37c75c7b912104df56e1325363cd94a4fdd5f"
      },
      "summary": "### Summary:\n\nThis issue pertains to the `DummyClassifier` in the Scikit-Learn library, specifically related to the functionality of scoring the classifier without providing test samples. The `DummyClassifier` is designed to operate primarily on target data and can be fitted without input samples, which is beneficial when the input data is too large to fit into memory. However, the problem arises when attempting to score the classifier without input samples, which currently results in an exception due to an inconsistency in the expected input format.\n\n1. **Problem Description in General Terms:**\n   The system fails to handle scenarios where the `DummyClassifier` is used to score predictions without input samples, despite such usage being logical and beneficial in instances where input data is memory-intensive.\n\n2. **Key Symptoms and Behaviors Observed:**\n   - When scoring the `DummyClassifier` without input samples, an exception is thrown indicating a mismatch in the number of samples between input variables.\n   - Users are forced to create artificial input samples (e.g., arrays of zeros) to circumvent this limitation, which complicates the process unnecessarily.\n\n3. **Affected Components or Systems:**\n   - The Scikit-Learn library, specifically the `DummyClassifier` class, is affected. This issue impacts users who rely on this class for modeling when input data is too large to handle in memory.\n\n4. **Potential Impact or Severity:**\n   - The impact is primarily on usability and efficiency for users dealing with large datasets. The issue adds an extra step and complexity for users, but it does not affect the core functionality or accuracy of the classifier itself.\n\n5. **Relevant Technical Details Abstracted for Broader Understanding:**\n   - The `DummyClassifier` is intended to provide baseline predictions without requiring input features, focusing solely on the target data.\n   - The error encountered is a `ValueError` related to inconsistent sample sizes, which arises because the current implementation does not accommodate the absence of input features during scoring.\n\nThis issue suggests a need for enhancement in the `DummyClassifier` to allow scoring without mandatory input samples, improving usability for scenarios with large datasets. The proposed fix involves modifying functions such as `DummyClassifier.predict_log_proba` and `DummyRegressor.predict` to handle such cases appropriately.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: Scoring for dummy classifier does not work without test samples\n\nBody:\nWhen using the `Dummy classifier`, it is possible to fit the classifier without providing any examples, which makes sense as the classifier only operates on the targets. However, it is not possible to score the classifier without providing examples. Using `DummyClassifier` without examples is helpful if the examples, but not the targets, are to big to fit into memory. One can still get around this by constructing \r\nartificial examples, say just zeros, but avoiding this would make things a bit easier.\r\n\r\n#### Code to Reproduce\r\n```python\r\nfrom sklearn.dummy import DummyClassifier\r\nimport numpy as np\r\n\r\ny = [1, 1, 2]\r\ny_ = [1, 1, 2]\r\n\r\nd = DummyClassifier()\r\nd.fit(None, y)\r\nprint(d.score(None, y_))\r\n```\r\n\r\n\r\n#### Expected Results\r\nThe score is printed.\r\n\r\n#### Actual Results\r\nAn exception is thrown\r\n```\r\nValueError: Found input variables with inconsistent numbers of samples: [3, 1]\r\n```\r\n\r\nSo one has to do\r\n\r\n```python\r\nfrom sklearn.dummy import DummyClassifier\r\nimport numpy as np\r\n\r\ny = [1, 1, 2]\r\ny_ = [1, 1, 2]\r\nx = np.zeros(shape=(3, 1))\r\n\r\nd = DummyClassifier()\r\nd.fit(None, y)\r\nprint(d.score(x,  y_))\r\n```\r\n\r\nIf this seems like a useful feature, I would be happy to submit a PR.\r\n\r\n#### Versions\r\nLinux-4.15.0-33-generic-x86_64-with-debian-buster-sid\r\nPython 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19) \r\n[GCC 7.2.0]\r\nNumPy 1.14.3\r\nSciPy 1.0.0\r\nScikit-Learn 0.19.2\r\n\r\n<!-- Thanks for contributing! -->\r\n\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\nsklearn/dummy.py\n  function: DummyClassifier.predict_log_proba\n  function: DummyRegressor.predict\n"
    }
  ]
}