{
  "instance_id": "scikit-learn__scikit-learn-13584",
  "repo": "scikit-learn/scikit-learn",
  "created_at": "2019-04-05T23:09:48Z",
  "problem_statement": "bug in print_changed_only in new repr: vector values\n```python\r\nimport sklearn\r\nimport numpy as np\r\nfrom sklearn.linear_model import LogisticRegressionCV\r\nsklearn.set_config(print_changed_only=True)\r\nprint(LogisticRegressionCV(Cs=np.array([0.1, 1])))\r\n```\r\n> ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()\r\n\r\nping @NicolasHug \r\n\n",
  "patch": "diff --git a/sklearn/utils/_pprint.py b/sklearn/utils/_pprint.py\n--- a/sklearn/utils/_pprint.py\n+++ b/sklearn/utils/_pprint.py\n@@ -95,7 +95,7 @@ def _changed_params(estimator):\n     init_params = signature(init_func).parameters\n     init_params = {name: param.default for name, param in init_params.items()}\n     for k, v in params.items():\n-        if (v != init_params[k] and\n+        if (repr(v) != repr(init_params[k]) and\n                 not (is_scalar_nan(init_params[k]) and is_scalar_nan(v))):\n             filtered_params[k] = v\n     return filtered_params\n",
  "similar_bug_items": [
    {
      "pr_number": 9793,
      "pr_title": "[MRG] FIX fmin_cobyla: iprint is deprecated, use disp",
      "pr_body": "This should fix #9791 as the `disp` kwarg was already available in scipy 0.13.3. Let see if CI agrees.",
      "issue_id": 9791,
      "issue_title": "scipy 1.0: TypeError: fmin_cobyla() got an unexpected keyword argument 'iprint'",
      "issue_body": "Several tests in `sklearn.gaussian_process.tests.test_gaussian_process` fail because we use a deprecated argument that was removed in scipy 1.0:\r\n\r\nHere is an example:\r\n\r\n```\r\n======================================================================\r\nERROR: sklearn.gaussian_process.tests.test_gaussian_process.test_2d\r\n----------------------------------------------------------------------\r\nTraceback (most recent call last):\r\n  File \"/volatile/ogrisel/.virtualenvs/py36/lib/python3.6/site-packages/nose/case.py\", line 198, in runTest\r\n    self.test(*self.arg)\r\n  File \"/volatile/ogrisel/code/scikit-learn/sklearn/gaussian_process/tests/test_gaussian_process.py\", line 61, in test_2d\r\n    gp.fit(X, y)\r\n  File \"/volatile/ogrisel/code/scikit-learn/sklearn/gaussian_process/gaussian_process.py\", line 350, in fit\r\n    self._arg_max_reduced_likelihood_function()\r\n  File \"/volatile/ogrisel/code/scikit-learn/sklearn/gaussian_process/gaussian_process.py\", line 723, in _arg_max_reduced_likelihood_function\r\n    iprint=0)\r\nTypeError: fmin_cobyla() got an unexpected keyword argument 'iprint'\r\n```",
      "issue_closed_at": "2017-09-19T09:42:01Z",
      "base_commit": "e443c05ea3a4c2634611253759ddbeb4367fe70c",
      "changes": [
        {
          "file": "sklearn/gaussian_process/gaussian_process.py",
          "type": "function",
          "name": "minus_reduced_likelihood_function",
          "class_name": "GaussianProcess",
          "code": "def minus_reduced_likelihood_function(log10t):\n                return - self.reduced_likelihood_function(\n                    theta=10. ** log10t)[0]"
        }
      ]
    },
    {
      "pr_number": 7594,
      "pr_title": "[MRG+1] FIX Make sure GridSearchCV and RandomizedSearchCV are pickle-able",
      "pr_body": "Fixes #7562 \n- Subclasses the `np.ma.MaskedArray` and overrides the `__getstate__` to make obj dtyped `MaskedArray`s pickle-able.\n- Uses this fixed `utils.fixes.MaskedArray` inside `gs.cv_results_`...\n\nThis is based off of https://github.com/numpy/numpy/pull/8122\n\nPlease review @jnothman @amueller @GaelVaroquaux @davechallis\n",
      "issue_id": 7562,
      "issue_title": "Error unpickling RandomizedSearchCV objects in 0.18 due to masked arrays",
      "issue_body": "#### Description\n\nIn version 0.18, loading pickles of fitted RandomizedSearchCV objects results in a `TypeError` exception (from pickle also created with version 0.18).\n\nThe error seems related to the use of masked arrays in the `RandomizedSearchCV.cv_results_` attribute - clearing this before pickling (i.e. setting to to `None`) allows pickling/unpickling to work.\n#### Steps/Code to Reproduce\n\n```\nimport pickle                                                                   \nfrom sklearn.model_selection import RandomizedSearchCV                          \nfrom sklearn.ensemble import RandomForestClassifier                             \n\nX = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]                            \ny = [1, 1, 1, 0, 0, 0]                                                          \n\nmodel = RandomizedSearchCV(RandomForestClassifier(),                            \n                           {'n_estimators': [5, 10, 20]},                       \n                           n_iter=3)                                            \nmodel.fit(X, y)                                                                 \n\nwith open('model.pkl', 'wb') as fh:                                             \n    pickle.dump(model, fh)                                                      \n\nwith open('model.pkl', 'rb') as fh:                                             \n    model = pickle.load(fh)\n\nprint(model.predict(X))\n```\n#### Expected Results\n\n```\n[1, 1, 1, 0, 0, 0]\n```\n#### Actual Results\n\n```\nTraceback (most recent call last):\n  File \"./t.py\", line 19, in <module>\n    model = pickle.load(fh)\n  File \"/Users/dsc/miniconda3/envs/p3/lib/python3.5/site-packages/numpy/ma/core.py\", line 5863, in __setstate__\n    super(MaskedArray, self).__setstate__((shp, typ, isf, raw))\nTypeError: object pickle not returning list\n```\n#### Versions\n\nPython 3.5.1 |Continuum Analytics, Inc.| (default, Dec  7 2015, 11:24:55) \n[GCC 4.2.1 (Apple Inc. build 5577)]\nNumPy 1.11.1\nSciPy 0.18.1\nScikit-Learn 0.18\n",
      "issue_closed_at": "2016-10-10T19:33:44Z",
      "base_commit": "33ed90dc0aa0549a5963000d7d070aa18ca389c4",
      "changes": [
        {
          "file": "sklearn/model_selection/_search.py",
          "type": "line",
          "name": "line 30",
          "code": "from ..utils import check_random_state\nfrom ..utils.fixes import sp_version\nfrom ..utils.fixes import rankdata\nfrom ..utils.random import sample_without_replacement\nfrom ..utils.validation import indexable, check_is_fitted\nfrom ..utils.metaestimators import if_delegate_has_method"
        },
        {
          "file": "sklearn/model_selection/_search.py",
          "type": "function",
          "name": "_store",
          "class_name": "BaseSearchCV",
          "code": "def _store(key_name, array, weights=None, splits=False, rank=False):\n            \"\"\"A small helper to store the scores/times to the cv_results_\"\"\"\n            array = np.array(array, dtype=np.float64).reshape(n_candidates,\n                                                              n_splits)\n            if splits:\n                for split_i in range(n_splits):\n                    results[\"split%d_%s\"\n                            % (split_i, key_name)] = array[:, split_i]\n\n            array_means = np.average(array, axis=1, weights=weights)\n            results['mean_%s' % key_name] = array_means\n            # Weighted std is not directly available in numpy\n            array_stds = np.sqrt(np.average((array -\n                                             array_means[:, np.newaxis]) ** 2,\n                                            axis=1, weights=weights))\n            results['std_%s' % key_name] = array_stds\n\n            if rank:\n                results[\"rank_%s\" % key_name] = np.asarray(\n                    rankdata(-array_means, method='min'), dtype=np.int32)"
        },
        {
          "file": "sklearn/utils/fixes.py",
          "type": "function",
          "name": "rankdata",
          "class_name": null,
          "code": "def rankdata(a, method='average'):\n        if method not in ('average', 'min', 'max', 'dense', 'ordinal'):\n            raise ValueError('unknown method \"{0}\"'.format(method))\n\n        arr = np.ravel(np.asarray(a))\n        algo = 'mergesort' if method == 'ordinal' else 'quicksort'\n        sorter = np.argsort(arr, kind=algo)\n\n        inv = np.empty(sorter.size, dtype=np.intp)\n        inv[sorter] = np.arange(sorter.size, dtype=np.intp)\n\n        if method == 'ordinal':\n            return inv + 1\n\n        arr = arr[sorter]\n        obs = np.r_[True, arr[1:] != arr[:-1]]\n        dense = obs.cumsum()[inv]\n\n        if method == 'dense':\n            return dense\n\n        # cumulative counts of each unique value\n        count = np.r_[np.nonzero(obs)[0], len(obs)]\n\n        if method == 'max':\n            return count[dense]\n\n        if method == 'min':\n            return count[dense - 1] + 1\n\n        # average method\n        return .5 * (count[dense] + count[dense - 1] + 1)"
        }
      ]
    },
    {
      "pr_number": 9910,
      "pr_title": "[MRG] Improve MinCovDet error when covariance of support data is 0",
      "pr_body": "<!--\r\nThanks for contributing a pull request! Please ensure you have taken a look at\r\nthe contribution guidelines: https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md#Contributing-Pull-Requests\r\n-->\r\n#### Reference Issue\r\nFixes #9864 \r\n\r\n\r\n#### What does this implement/fix? Explain your changes.\r\nThis improves the MinCovDet error message when the covariance matrix of the support data is equal to 0. This also adds non-regression tests.\r\n\r\n**EDIT**: When the support data or more lie on a hyperplane, the algorithm described in the original paper returns the minimum covariance determinant estimates of the location and the (singular) covariance matrix. The algorithm then computes the equation of the hyperplane. I don't think (although I'm not certain about it) that `MinCovDet` implements such a particular case and I don't know if this should be handled but we might want to test what happens in this case. I can try the example of the original paper for such a situation and see what happens. My guess is that using `pinvh` makes `MinCovDet` return a solution anyway from which we can compute a Mahalanobis distance.\r\n<!--\r\nPlease be aware that we are a loose team of volunteers so patience is\r\nnecessary; assistance handling other issues is very welcome. We value\r\nall user contributions, no matter how minor they are. If we are slow to\r\nreview, either the pull request needs some benchmarking, tinkering,\r\nconvincing, etc. or more likely the reviewers are simply busy. In either\r\ncase, we ask for your understanding during the review process.\r\nFor more information, see our FAQ on this topic:\r\nhttp://scikit-learn.org/dev/faq.html#why-is-my-pull-request-not-getting-any-attention.\r\n\r\nThanks for contributing!\r\n-->\r\n",
      "issue_id": 9864,
      "issue_title": "Improve MinCovDet.fit error when covariance is zero",
      "issue_body": "Ok this is extremely weird, can someone run this code and see if it crashes with that error?\r\n\r\n```py\r\nimport numpy as np\r\n\r\nfrom sklearn.covariance import MinCovDet\r\n\r\nclf = MinCovDet()\r\n\r\ndata = np.array([0.5, 0.1, 0.1, 0.1, 0.957, 0.1, 0.1,\r\n                 0.1, 0.4285, 0.1]).reshape(-1, 1)\r\nclf.fit(data)\r\n```\r\n\r\nIf I change the array to this\r\n\r\n```py\r\ndata = np.array([0.5, 0.11, 0.1, 0.1, 0.957, 0.1, 0.1, \r\n                 0.1, 0.4285, 0.1]).reshape(-1, 1)\r\n```\r\n\r\nThen it runs fine\r\n\r\nBut it seems to crash with any array where there are too many of the same values\r\n\r\nThis array crashes as well\r\n\r\n```py\r\ndata = np.array([0.5, 0.3, 0.3, 0.3, 0.957, 0.3, 0.3, \r\n                 0.3, 0.4285, 0.3]).reshape(-1, 1)\r\n```\r\n\r\nI already checked for NANs and everything, there's nothing\r\n\r\nUsing Python 3.6.2\r\nPandas 0.20.3\r\nNumpy 1.13.1\r\nscikit-learn 0.19.0\r\n\r\n\r\nThanks",
      "issue_closed_at": "2017-10-13T08:33:57Z",
      "base_commit": "fabb4fe4929066041c1c739d40b5fb9b5f514a3a",
      "changes": [
        {
          "file": "sklearn/covariance/robust_covariance.py",
          "type": "function",
          "name": "fast_mcd",
          "class_name": null,
          "code": "def fast_mcd(X, support_fraction=None,\n             cov_computation_method=empirical_covariance,\n             random_state=None):\n    \"\"\"Estimates the Minimum Covariance Determinant matrix.\n\n    Read more in the :ref:`User Guide <robust_covariance>`.\n\n    Parameters\n    ----------\n    X : array-like, shape (n_samples, n_features)\n      The data matrix, with p features and n samples.\n\n    support_fraction : float, 0 < support_fraction < 1\n          The proportion of points to be included in the support of the raw\n          MCD estimate. Default is None, which implies that the minimum\n          value of support_fraction will be used within the algorithm:\n          `[n_sample + n_features + 1] / 2`.\n\n    cov_computation_method : callable, default empirical_covariance\n        The function which will be used to compute the covariance.\n        Must return shape (n_features, n_features)\n\n    random_state : int, RandomState instance or None, optional (default=None)\n        If int, random_state is the seed used by the random number generator;\n        If RandomState instance, random_state is the random number generator;\n        If None, the random number generator is the RandomState instance used\n        by `np.random`.\n\n    Notes\n    -----\n    The FastMCD algorithm has been introduced by Rousseuw and Van Driessen\n    in \"A Fast Algorithm for the Minimum Covariance Determinant Estimator,\n    1999, American Statistical Association and the American Society\n    for Quality, TECHNOMETRICS\".\n    The principle is to compute robust estimates and random subsets before\n    pooling them into a larger subsets, and finally into the full data set.\n    Depending on the size of the initial sample, we have one, two or three\n    such computation levels.\n\n    Note that only raw estimates are returned. If one is interested in\n    the correction and reweighting steps described in [RouseeuwVan]_,\n    see the MinCovDet object.\n\n    References\n    ----------\n\n    .. [RouseeuwVan] A Fast Algorithm for the Minimum Covariance\n        Determinant Estimator, 1999, American Statistical Association\n        and the American Society for Quality, TECHNOMETRICS\n\n    .. [Butler1993] R. W. Butler, P. L. Davies and M. Jhun,\n        Asymptotics For The Minimum Covariance Determinant Estimator,\n        The Annals of Statistics, 1993, Vol. 21, No. 3, 1385-1400\n\n    Returns\n    -------\n    location : array-like, shape (n_features,)\n        Robust location of the data.\n\n    covariance : array-like, shape (n_features, n_features)\n        Robust covariance of the features.\n\n    support : array-like, type boolean, shape (n_samples,)\n        A mask of the observations that have been used to compute\n        the robust location and covariance estimates of the data set.\n\n    \"\"\"\n    random_state = check_random_state(random_state)\n\n    X = check_array(X, ensure_min_samples=2, estimator='fast_mcd')\n    n_samples, n_features = X.shape\n\n    # minimum breakdown value\n    if support_fraction is None:\n        n_support = int(np.ceil(0.5 * (n_samples + n_features + 1)))\n    else:\n        n_support = int(support_fraction * n_samples)\n\n    # 1-dimensional case quick computation\n    # (Rousseeuw, P. J. and Leroy, A. M. (2005) References, in Robust\n    #  Regression and Outlier Detection, John Wiley & Sons, chapter 4)\n    if n_features == 1:\n        if n_support < n_samples:\n            # find the sample shortest halves\n            X_sorted = np.sort(np.ravel(X))\n            diff = X_sorted[n_support:] - X_sorted[:(n_samples - n_support)]\n            halves_start = np.where(diff == np.min(diff))[0]\n            # take the middle points' mean to get the robust location estimate\n            location = 0.5 * (X_sorted[n_support + halves_start] +\n                              X_sorted[halves_start]).mean()\n            support = np.zeros(n_samples, dtype=bool)\n            X_centered = X - location\n            support[np.argsort(np.abs(X_centered), 0)[:n_support]] = True\n            covariance = np.asarray([[np.var(X[support])]])\n            location = np.array([location])\n            # get precision matrix in an optimized way\n            precision = linalg.pinvh(covariance)\n            dist = (np.dot(X_centered, precision) * (X_centered)).sum(axis=1)\n        else:\n            support = np.ones(n_samples, dtype=bool)\n            covariance = np.asarray([[np.var(X)]])\n            location = np.asarray([np.mean(X)])\n            X_centered = X - location\n            # get precision matrix in an optimized way\n            precision = linalg.pinvh(covariance)\n            dist = (np.dot(X_centered, precision) * (X_centered)).sum(axis=1)\n# Starting FastMCD algorithm for p-dimensional case\n    if (n_samples > 500) and (n_features > 1):\n        # 1. Find candidate supports on subsets\n        # a. split the set in subsets of size ~ 300\n        n_subsets = n_samples // 300\n        n_samples_subsets = n_samples // n_subsets\n        samples_shuffle = random_state.permutation(n_samples)\n        h_subset = int(np.ceil(n_samples_subsets *\n                       (n_support / float(n_samples))))\n        # b. perform a total of 500 trials\n        n_trials_tot = 500\n        # c. select 10 best (location, covariance) for each subset\n        n_best_sub = 10\n        n_trials = max(10, n_trials_tot // n_subsets)\n        n_best_tot = n_subsets * n_best_sub\n        all_best_locations = np.zeros((n_best_tot, n_features))\n        try:\n            all_best_covariances = np.zeros((n_best_tot, n_features,\n                                             n_features))\n        except MemoryError:\n            # The above is too big. Let's try with something much small\n            # (and less optimal)\n            all_best_covariances = np.zeros((n_best_tot, n_features,\n                                             n_features))\n            n_best_tot = 10\n            n_best_sub = 2\n        for i in range(n_subsets):\n            low_bound = i * n_samples_subsets\n            high_bound = low_bound + n_samples_subsets\n            current_subset = X[samples_shuffle[low_bound:high_bound]]\n            best_locations_sub, best_covariances_sub, _, _ = select_candidates(\n                current_subset, h_subset, n_trials,\n                select=n_best_sub, n_iter=2,\n                cov_computation_method=cov_computation_method,\n                random_state=random_state)\n            subset_slice = np.arange(i * n_best_sub, (i + 1) * n_best_sub)\n            all_best_locations[subset_slice] = best_locations_sub\n            all_best_covariances[subset_slice] = best_covariances_sub\n        # 2. Pool the candidate supports into a merged set\n        # (possibly the full dataset)\n        n_samples_merged = min(1500, n_samples)\n        h_merged = int(np.ceil(n_samples_merged *\n                       (n_support / float(n_samples))))\n        if n_samples > 1500:\n            n_best_merged = 10\n        else:\n            n_best_merged = 1\n        # find the best couples (location, covariance) on the merged set\n        selection = random_state.permutation(n_samples)[:n_samples_merged]\n        locations_merged, covariances_merged, supports_merged, d = \\\n            select_candidates(\n                X[selection], h_merged,\n                n_trials=(all_best_locations, all_best_covariances),\n                select=n_best_merged,\n                cov_computation_method=cov_computation_method,\n                random_state=random_state)\n        # 3. Finally get the overall best (locations, covariance) couple\n        if n_samples < 1500:\n            # directly get the best couple (location, covariance)\n            location = locations_merged[0]\n            covariance = covariances_merged[0]\n            support = np.zeros(n_samples, dtype=bool)\n            dist = np.zeros(n_samples)\n            support[selection] = supports_merged[0]\n            dist[selection] = d[0]\n        else:\n            # select the best couple on the full dataset\n            locations_full, covariances_full, supports_full, d = \\\n                select_candidates(\n                    X, n_support,\n                    n_trials=(locations_merged, covariances_merged),\n                    select=1,\n                    cov_computation_method=cov_computation_method,\n                    random_state=random_state)\n            location = locations_full[0]\n            covariance = covariances_full[0]\n            support = supports_full[0]\n            dist = d[0]\n    elif n_features > 1:\n        # 1. Find the 10 best couples (location, covariance)\n        # considering two iterations\n        n_trials = 30\n        n_best = 10\n        locations_best, covariances_best, _, _ = select_candidates(\n            X, n_support, n_trials=n_trials, select=n_best, n_iter=2,\n            cov_computation_method=cov_computation_method,\n            random_state=random_state)\n        # 2. Select the best couple on the full dataset amongst the 10\n        locations_full, covariances_full, supports_full, d = select_candidates(\n            X, n_support, n_trials=(locations_best, covariances_best),\n            select=1, cov_computation_method=cov_computation_method,\n            random_state=random_state)\n        location = locations_full[0]\n        covariance = covariances_full[0]\n        support = supports_full[0]\n        dist = d[0]\n\n    return location, covariance, support, dist"
        },
        {
          "file": "sklearn/covariance/robust_covariance.py",
          "type": "function",
          "name": "correct_covariance",
          "class_name": "MinCovDet",
          "code": "def correct_covariance(self, data):\n        \"\"\"Apply a correction to raw Minimum Covariance Determinant estimates.\n\n        Correction using the empirical correction factor suggested\n        by Rousseeuw and Van Driessen in [RVD]_.\n\n        Parameters\n        ----------\n        data : array-like, shape (n_samples, n_features)\n            The data matrix, with p features and n samples.\n            The data set must be the one which was used to compute\n            the raw estimates.\n\n        References\n        ----------\n\n        .. [RVD] `A Fast Algorithm for the Minimum Covariance\n            Determinant Estimator, 1999, American Statistical Association\n            and the American Society for Quality, TECHNOMETRICS`\n\n        Returns\n        -------\n        covariance_corrected : array-like, shape (n_features, n_features)\n            Corrected robust covariance estimate.\n\n        \"\"\"\n        correction = np.median(self.dist_) / chi2(data.shape[1]).isf(0.5)\n        covariance_corrected = self.raw_covariance_ * correction\n        self.dist_ /= correction\n        return covariance_corrected"
        }
      ]
    },
    {
      "pr_number": 8035,
      "pr_title": "[MRG+1] Catch cases for different class size in MLPClassifier with warm start (#7976) ",
      "pr_body": "<!--\r\nThanks for contributing a pull request! Please ensure you have taken a look at\r\nthe contribution guidelines: https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md#Contributing-Pull-Requests\r\n-->\r\n#### Reference Issue\r\n<!-- Example: Fixes #1234 -->\r\nFixes #7976 \r\n\r\n#### What does this implement/fix? Explain your changes.\r\nThis provides a test for different cases that throws an error when warm_start = True for MLPClassifier. Currently, vague errors are thrown when class size is different between the current fit and the previous fit. This fix will throw a clearer error message. \r\n\r\n#### Any other comments?\r\n\r\n\r\n<!--\r\nPlease be aware that we are a loose team of volunteers so patience is\r\nnecessary; assistance handling other issues is very welcome. We value\r\nall user contributions, no matter how minor they are. If we are slow to\r\nreview, either the pull request needs some benchmarking, tinkering,\r\nconvincing, etc. or more likely the reviewers are simply busy. In either\r\ncase, we ask for your understanding during the review process.\r\nFor more information, see our FAQ on this topic:\r\nhttp://scikit-learn.org/dev/faq.html#why-is-my-pull-request-not-getting-any-attention.\r\n\r\nThanks for contributing!\r\n-->\r\n\r\n",
      "issue_id": 7976,
      "issue_title": "MLPClasiffier produce error when trying to re-fit",
      "issue_body": "Hi,\r\n\r\nI am training a MLPClasiffier model twice, each time on a different data-set.\r\nOn the second iteration the fit method produce an error. Every time a different error.\r\nI did the processes on various models, this is the only one that produce an error. \r\n\r\nthis are the errors i get (each time a different one) - \r\n\r\n> _lbfgsb.error: failed in converting 7th argument `g' of _lbfgsb.setulb to C/Fortran array\r\n\r\n> ValueError: total size of new array must be unchanged\r\n\r\n > ValueError: operands could not be broadcast together with shapes (154,100) (25,) (154,100) \r\n\r\nthanks",
      "issue_closed_at": "2016-12-29T01:01:13Z",
      "base_commit": "40a1b7a0b10fea2995c7aaa46c90a9633e6d99f6",
      "changes": [
        {
          "file": "sklearn/neural_network/multilayer_perceptron.py",
          "type": "function",
          "name": "_validate_input",
          "class_name": "MLPRegressor",
          "code": "def _validate_input(self, X, y, incremental):\n        X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'],\n                         multi_output=True, y_numeric=True)\n        if y.ndim == 2 and y.shape[1] == 1:\n            y = column_or_1d(y, warn=True)\n        return X, y"
        },
        {
          "file": "sklearn/neural_network/multilayer_perceptron.py",
          "type": "function",
          "name": "predict",
          "class_name": "MLPRegressor",
          "code": "def predict(self, X):\n        \"\"\"Predict using the multi-layer perceptron model.\n\n        Parameters\n        ----------\n        X : {array-like, sparse matrix}, shape (n_samples, n_features)\n            The input data.\n\n        Returns\n        -------\n        y : array-like, shape (n_samples, n_outputs)\n            The predicted values.\n        \"\"\"\n        check_is_fitted(self, \"coefs_\")\n        y_pred = self._predict(X)\n        if y_pred.shape[1] == 1:\n            return y_pred.ravel()\n        return y_pred"
        }
      ]
    },
    {
      "pr_number": 11957,
      "pr_title": "[MRG] Allow scoring of dummies without testsamples",
      "pr_body": "As DummyClassifier and DummyRegressor operate solely on the targets,\r\nthey can now be used without passing test samples, instead passing None.\r\nAlso includes some minor renaming in the corresponding tests for more\r\nconsistency.\r\n\r\n<!--\r\nThanks for contributing a pull request! Please ensure you have taken a look at\r\nthe contribution guidelines: https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md#pull-request-checklist\r\n-->\r\n\r\n#### Reference Issues/PRs\r\nResolves #11951 \r\n<!--\r\nExample: Fixes #1234. See also #3456.\r\nPlease use keywords (e.g., Fixes) to create link to the issues or pull requests\r\nyou resolved, so that they will automatically be closed when your pull request\r\nis merged. See https://github.com/blog/1506-closing-issues-via-pull-requests\r\n-->\r\n\r\n\r\n\r\n\r\n\r\n<!--\r\nPlease be aware that we are a loose team of volunteers so patience is\r\nnecessary; assistance handling other issues is very welcome. We value\r\nall user contributions, no matter how minor they are. If we are slow to\r\nreview, either the pull request needs some benchmarking, tinkering,\r\nconvincing, etc. or more likely the reviewers are simply busy. In either\r\ncase, we ask for your understanding during the review process.\r\nFor more information, see our FAQ on this topic:\r\nhttp://scikit-learn.org/dev/faq.html#why-is-my-pull-request-not-getting-any-attention.\r\n\r\nThanks for contributing!\r\n-->\r\n",
      "issue_id": 11951,
      "issue_title": "Scoring for dummy classifier does not work without test samples",
      "issue_body": "When using the `Dummy classifier`, it is possible to fit the classifier without providing any examples, which makes sense as the classifier only operates on the targets. However, it is not possible to score the classifier without providing examples. Using `DummyClassifier` without examples is helpful if the examples, but not the targets, are to big to fit into memory. One can still get around this by constructing \r\nartificial examples, say just zeros, but avoiding this would make things a bit easier.\r\n\r\n#### Code to Reproduce\r\n```python\r\nfrom sklearn.dummy import DummyClassifier\r\nimport numpy as np\r\n\r\ny = [1, 1, 2]\r\ny_ = [1, 1, 2]\r\n\r\nd = DummyClassifier()\r\nd.fit(None, y)\r\nprint(d.score(None, y_))\r\n```\r\n\r\n\r\n#### Expected Results\r\nThe score is printed.\r\n\r\n#### Actual Results\r\nAn exception is thrown\r\n```\r\nValueError: Found input variables with inconsistent numbers of samples: [3, 1]\r\n```\r\n\r\nSo one has to do\r\n\r\n```python\r\nfrom sklearn.dummy import DummyClassifier\r\nimport numpy as np\r\n\r\ny = [1, 1, 2]\r\ny_ = [1, 1, 2]\r\nx = np.zeros(shape=(3, 1))\r\n\r\nd = DummyClassifier()\r\nd.fit(None, y)\r\nprint(d.score(x,  y_))\r\n```\r\n\r\nIf this seems like a useful feature, I would be happy to submit a PR.\r\n\r\n#### Versions\r\nLinux-4.15.0-33-generic-x86_64-with-debian-buster-sid\r\nPython 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19) \r\n[GCC 7.2.0]\r\nNumPy 1.14.3\r\nSciPy 1.0.0\r\nScikit-Learn 0.19.2\r\n\r\n<!-- Thanks for contributing! -->\r\n",
      "issue_closed_at": "2018-09-13T08:04:00Z",
      "base_commit": "ddf37c75c7b912104df56e1325363cd94a4fdd5f",
      "changes": [
        {
          "file": "sklearn/dummy.py",
          "type": "function",
          "name": "predict_log_proba",
          "class_name": "DummyClassifier",
          "code": "def predict_log_proba(self, X):\n        \"\"\"\n        Return log probability estimates for the test vectors X.\n\n        Parameters\n        ----------\n        X : {array-like, object with finite length or shape}\n            Training data, requires length = n_samples\n\n        Returns\n        -------\n        P : array-like or list of array-like of shape = [n_samples, n_classes]\n            Returns the log probability of the sample for each class in\n            the model, where classes are ordered arithmetically for each\n            output.\n        \"\"\"\n        proba = self.predict_proba(X)\n        if self.n_outputs_ == 1:\n            return np.log(proba)\n        else:\n            return [np.log(p) for p in proba]"
        },
        {
          "file": "sklearn/dummy.py",
          "type": "function",
          "name": "predict",
          "class_name": "DummyRegressor",
          "code": "def predict(self, X, return_std=False):\n        \"\"\"\n        Perform classification on test vectors X.\n\n        Parameters\n        ----------\n        X : {array-like, object with finite length or shape}\n            Training data, requires length = n_samples\n\n        return_std : boolean, optional\n            Whether to return the standard deviation of posterior prediction.\n            All zeros in this case.\n\n        Returns\n        -------\n        y : array, shape = [n_samples]  or [n_samples, n_outputs]\n            Predicted target values for X.\n\n        y_std : array, shape = [n_samples]  or [n_samples, n_outputs]\n            Standard deviation of predictive distribution of query points.\n        \"\"\"\n        check_is_fitted(self, \"constant_\")\n        n_samples = _num_samples(X)\n\n        y = np.full((n_samples, self.n_outputs_), self.constant_,\n                    dtype=np.array(self.constant_).dtype)\n        y_std = np.zeros((n_samples, self.n_outputs_))\n\n        if self.n_outputs_ == 1 and not self.output_2d_:\n            y = np.ravel(y)\n            y_std = np.ravel(y_std)\n\n        return (y, y_std) if return_std else y"
        }
      ]
    }
  ]
}