{
    "Selected_candidate": {
        "pr_number": 11796,
        "pr_title": "[MRG+2] Fix LDA predict_proba() ",
        "pr_body": "<!--\r\nThanks for contributing a pull request! Please ensure you have taken a look at\r\nthe contribution guidelines: https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md#pull-request-checklist\r\n-->\r\n\r\n#### Reference Issues/PRs\r\nFixes #6848\r\ncloses #11727\r\ncloses #5149\r\n<!--\r\nExample: Fixes #1234. See also #3456.\r\nPlease use keywords (e.g., Fixes) to create link to the issues or pull requests\r\nyou resolved, so that they will automatically be closed when your pull request\r\nis merged. See https://github.com/blog/1506-closing-issues-via-pull-requests\r\n-->\r\n\r\n\r\n#### What does this implement/fix? Explain your changes.\r\nFixes the `predict_proba()` method of LinearDiscriminantAnalysis.\r\nAn `if` statement is used to differentiate between the binary and multi-class case, due to the different output format of the `decision_function` method implemented in the `LinearClassifierMixin` class.\r\n\r\n#### Any other comments?\r\nCopying from #6848:\r\nDo we perhaps want to include additional tests checking the output of predict_proba for LDA and QDA both for the binary and multi-class cases?\r\n\r\n<!--\r\nPlease be aware that we are a loose team of volunteers so patience is\r\nnecessary; assistance handling other issues is very welcome. We value\r\nall user contributions, no matter how minor they are. If we are slow to\r\nreview, either the pull request needs some benchmarking, tinkering,\r\nconvincing, etc. or more likely the reviewers are simply busy. In either\r\ncase, we ask for your understanding during the review process.\r\nFor more information, see our FAQ on this topic:\r\nhttp://scikit-learn.org/dev/faq.html#why-is-my-pull-request-not-getting-any-attention.\r\n\r\nThanks for contributing!\r\n-->\r\n",
        "issue_id": 6848,
        "issue_title": "LinearDiscriminantAnalysis predict probability bug",
        "issue_body": "I am pretty confident there is a bug introduced in commit\n7c1101d7c26ba0b77184cce9c0b9be79adb526de\n\nConcretely, line 518 of the current version \nhttps://github.com/scikit-learn/scikit-learn/blob/master/sklearn/discriminant_analysis.py\nshould be removed as it yields wrong results. \n\nThere is no reason why constant 1 should be added to the computed probability after exponentiation and before inversion. \n\nTo verify this, I have run a one-to-one comparison between the outcome of the method and MATLAB's builtin LDA classifier on the Iris dataset. Only after removal of line 518, results match (up to a tolerance).\n\nIf everyone agrees on that, I am happy to submit a PR.\n",
        "issue_closed_at": "2019-03-07T16:44:18Z",
        "base_commit": "b73a51bcda362d94d8907915a382a8eb403554c8",
        "changes": [
            {
                "file": "sklearn/discriminant_analysis.py",
                "type": "line",
                "name": "line 22",
                "code": "from .utils import check_array, check_X_y\nfrom .utils.validation import check_is_fitted\nfrom .utils.multiclass import check_classification_targets\nfrom .preprocessing import StandardScaler\n\n"
            },
            {
                "file": "sklearn/discriminant_analysis.py",
                "type": "function",
                "name": "predict_proba",
                "class_name": "QuadraticDiscriminantAnalysis",
                "code": "def predict_proba(self, X):\n        \"\"\"Return posterior probabilities of classification.\n\n        Parameters\n        ----------\n        X : array-like, shape = [n_samples, n_features]\n            Array of samples/test vectors.\n\n        Returns\n        -------\n        C : array, shape = [n_samples, n_classes]\n            Posterior probabilities of classification per class.\n        \"\"\"\n        values = self._decision_function(X)\n        # compute the likelihood of the underlying gaussian models\n        # up to a multiplicative constant.\n        likelihood = np.exp(values - values.max(axis=1)[:, np.newaxis])\n        # compute posterior probabilities\n        return likelihood / likelihood.sum(axis=1)[:, np.newaxis]"
            }
        ]
    },
    "Justification": "Candidate C is the most relevant because it deals with the `predict_proba` method in LinearDiscriminantAnalysis, which is structurally similar since it involves classifying data like the VotingClassifier bug. The underlying principles of validating estimator configurations in both reports connect with the current issue of handling a None estimator in the voting process. Moreover, the candidate discusses specific code adjustments to improve outcomes, which can provide direct insights into testing similar edge cases in the CURRENT bug. This alignment in handling estimator logic makes it invaluable for debugging the CURRENT bug effectively."
}