{
    "Selected_candidate": {
        "pr_number": 9396,
        "pr_title": "[MRG + 1] Fix wrong error message in StratifiedKFold",
        "pr_body": "#### Reference Issue\r\nFixes #9381 \r\n\r\n#### What does this implement/fix? Explain your changes.\r\nFixes wrong error message used while creating folds in StratifiedKFold \r\n\r\n#### Any other comments?\r\nNone\r\n",
        "issue_id": 9381,
        "issue_title": "Wrong error message in StratifiedKFold",
        "issue_body": "``All the n_groups for individual classes are less than n_splits=%d.`` That seem confusing / wrong",
        "issue_closed_at": "2017-07-18T23:25:28Z",
        "base_commit": "eece6d909a8baa03c6f19276f494a7680ae299e1",
        "changes": [
            {
                "file": "sklearn/model_selection/_split.py",
                "type": "function",
                "name": "_make_test_folds",
                "class_name": "StratifiedKFold",
                "code": "def _make_test_folds(self, X, y=None):\n        rng = self.random_state\n        y = np.asarray(y)\n        n_samples = y.shape[0]\n        unique_y, y_inversed = np.unique(y, return_inverse=True)\n        y_counts = np.bincount(y_inversed)\n        min_groups = np.min(y_counts)\n        if np.all(self.n_splits > y_counts):\n            raise ValueError(\"All the n_groups for individual classes\"\n                             \" are less than n_splits=%d.\"\n                             % (self.n_splits))\n        if self.n_splits > min_groups:\n            warnings.warn((\"The least populated class in y has only %d\"\n                           \" members, which is too few. The minimum\"\n                           \" number of groups for any class cannot\"\n                           \" be less than n_splits=%d.\"\n                           % (min_groups, self.n_splits)), Warning)\n\n        # pre-assign each sample to a test fold index using individual KFold\n        # splitting strategies for each class so as to respect the balance of\n        # classes\n        # NOTE: Passing the data corresponding to ith class say X[y==class_i]\n        # will break when the data is not 100% stratifiable for all classes.\n        # So we pass np.zeroes(max(c, n_splits)) as data to the KFold\n        per_cls_cvs = [\n            KFold(self.n_splits, shuffle=self.shuffle,\n                  random_state=rng).split(np.zeros(max(count, self.n_splits)))\n            for count in y_counts]\n\n        test_folds = np.zeros(n_samples, dtype=np.int)\n        for test_fold_indices, per_cls_splits in enumerate(zip(*per_cls_cvs)):\n            for cls, (_, test_split) in zip(unique_y, per_cls_splits):\n                cls_test_folds = test_folds[y == cls]\n                # the test split can be too big because we used\n                # KFold(...).split(X[:max(c, n_splits)]) when data is not 100%\n                # stratifiable for all the classes\n                # (we use a warning instead of raising an exception)\n                # If this is the case, let's trim it:\n                test_split = test_split[test_split < len(cls_test_folds)]\n                cls_test_folds[test_split] = test_fold_indices\n                test_folds[y == cls] = cls_test_folds\n\n        return test_folds"
            }
        ]
    },
    "Justification": "Candidate C is chosen because it fixes a bug (a misleading error message) in `StratifiedKFold`, a splitter closely related to the cross-validation machinery used by `LogisticRegressionCV`. The current bug arises during cross-validation without refitting, which suggests that the handling of data partitions may be relevant to the observed IndexError. The two are structurally similar in that both involve splitting a dataset into folds. Moreover, although this PR only corrects an error message, the code it touches, `_make_test_folds`, is where per-class test-fold indices are assigned; examining it could expose how indices are managed during splitting, potentially shedding light on the current bug's root cause."
}