{
    "Selected_candidate": {
        "pr_number": 19641,
        "pr_title": "Fix Calibrated classifier cv predictions with pipeline",
        "pr_body": "<!--\r\nThanks for contributing a pull request! Please ensure you have taken a look at\r\nthe contribution guidelines: https://github.com/scikit-learn/scikit-learn/blob/main/CONTRIBUTING.md\r\n-->\r\n\r\n#### Reference Issues/PRs\r\n<!--\r\nExample: Fixes #1234. See also #3456.\r\nPlease use keywords (e.g., Fixes) to create link to the issues or pull requests\r\nyou resolved, so that they will automatically be closed when your pull request\r\nis merged. See https://github.com/blog/1506-closing-issues-via-pull-requests\r\n-->\r\nFixes #19637, #8710.\r\n\r\n#### What does this implement/fix? Explain your changes.\r\n\r\nAs suggested in #19641, this PR removed validation from the CalibratedClassifierCV predict_proba function and replaced X.shape[0] with _num_samples(X).\r\n\r\nRegression testing has also been added.\r\n\r\n#### Any other comments?\r\n\r\n\r\n<!--\r\nPlease be aware that we are a loose team of volunteers so patience is\r\nnecessary; assistance handling other issues is very welcome. We value\r\nall user contributions, no matter how minor they are. If we are slow to\r\nreview, either the pull request needs some benchmarking, tinkering,\r\nconvincing, etc. or more likely the reviewers are simply busy. In either\r\ncase, we ask for your understanding during the review process.\r\nFor more information, see our FAQ on this topic:\r\nhttp://scikit-learn.org/dev/faq.html#why-is-my-pull-request-not-getting-any-attention.\r\n\r\nThanks for contributing!\r\n-->\r\n",
        "issue_id": 19637,
        "issue_title": "CalibratedClassifier on pipelines",
        "issue_body": "Looks like CalibratedClassifierCV doesn't support pipelines, see https://github.com/scikit-learn/scikit-learn/issues/19625 and https://github.com/scikit-learn/scikit-learn/discussions/19279\r\n\r\nWe should try to delegate the validation to the base estimator (https://github.com/scikit-learn/scikit-learn/discussions/19279#discussioncomment-313812)",
        "issue_closed_at": "2021-03-10T10:27:24Z",
        "base_commit": "42e90e9ba28fb37c2c9bd3e8aed1ac2387f1d5d5",
        "changes": [
            {
                "file": "sklearn/calibration.py",
                "type": "line",
                "name": "line 24",
                "code": "                   MetaEstimatorMixin)\nfrom .preprocessing import label_binarize, LabelEncoder\nfrom .utils import (\n    check_array,\n    column_or_1d,\n    deprecated,\n    indexable,\n)\nfrom .utils.multiclass import check_classification_targets\nfrom .utils.fixes import delayed\nfrom .utils.validation import check_is_fitted, check_consistent_length\nfrom .utils.validation import _check_sample_weight\nfrom .pipeline import Pipeline\nfrom .isotonic import IsotonicRegression\nfrom .svm import LinearSVC"
            },
            {
                "file": "sklearn/calibration.py",
                "type": "function",
                "name": "predict_proba",
                "class_name": "_CalibratedClassifier",
                "code": "def predict_proba(self, X):\n        \"\"\"Calculate calibrated probabilities.\n\n        Calculates classification calibrated probabilities\n        for each class, in a one-vs-all manner, for `X`.\n\n        Parameters\n        ----------\n        X : ndarray of shape (n_samples, n_features)\n            The sample data.\n\n        Returns\n        -------\n        proba : array, shape (n_samples, n_classes)\n            The predicted probabilities. Can be exact zeros.\n        \"\"\"\n        n_classes = len(self.classes)\n        pred_method = _get_prediction_method(self.base_estimator)\n        predictions = _compute_predictions(pred_method, X, n_classes)\n\n        label_encoder = LabelEncoder().fit(self.classes)\n        pos_class_indices = label_encoder.transform(\n            self.base_estimator.classes_\n        )\n\n        proba = np.zeros((X.shape[0], n_classes))\n        for class_idx, this_pred, calibrator in \\\n                zip(pos_class_indices, predictions.T, self.calibrators):\n            if n_classes == 2:\n                # When binary, `predictions` consists only of predictions for\n                # clf.classes_[1] but `pos_class_indices` = 0\n                class_idx += 1\n            proba[:, class_idx] = calibrator.predict(this_pred)\n\n        # Normalize the probabilities\n        if n_classes == 2:\n            proba[:, 0] = 1. 
- proba[:, 1]\n        else:\n            denominator = np.sum(proba, axis=1)[:, np.newaxis]\n            # In the edge case where for each class calibrator returns a null\n            # probability for a given sample, use the uniform distribution\n            # instead.\n            uniform_proba = np.full_like(proba, 1 / n_classes)\n            proba = np.divide(proba, denominator, out=uniform_proba,\n                              where=denominator != 0)\n\n        # Deal with cases where the predicted probability minimally exceeds 1.0\n        proba[(1.0 < proba) & (proba <= 1.0 + 1e-5)] = 1.0\n\n        return proba"
            }
        ]
    },
    "Justification": "Candidate C is the most relevant because it directly addresses issues with the `CalibratedClassifierCV` and has a very similar context, especially regarding its integration with pipelines. Both the CURRENT bug report and Candidate C's report involve the `predict_proba` function and validation handling in `CalibratedClassifierCV`. Addressing the problem of incorrect predictions when using pipelines or configuration settings, such as transforming outputs, makes Candidate C an ideal choice for debugging the CURRENT bug. The structural and module similarities, along with their common focus on `CalibratedClassifierCV`, provide substantial insights for fixing the CURRENT issue."
}