{
  "episode_id": "d0633230-7067-47a9-9dbf-ee11e0a2cdd6",
  "task": "In the Scikit-Learn July 2017 changelog, what other predictor base command received a bug fix? Just give the name, not a path.",
  "file": "",
  "success": true,
  "num_turns": 26,
  "total_tokens": 151837,
  "trajectory": [
    {
      "turn_idx": 0,
      "action": "file_saver({\"content\": \"Title: Step 0 \u2013 Question Parsing and Compliance Checklist\\n\\n1) Understanding the question\\n- Target source: Scikit-Learn (scikit-learn) July 2017 changelog (official docs/release notes)\\n- Task: Identify which \u201cother predictor base command\u201d received a bug fix, and provide only the name (not a path)\\n- Output: A single name (likely a function/class/command name as written in the changelog)\\n\\n2) Output-format rules to follow\\n- End response with: \\\"FINAL ANSWER: [YOUR FINAL ANSWER]\\\"\\n- Answer should be as few words as possible\\n- If the answer is a string:\\n  \u2022 Do not use articles (a, an, the)\\n  \u2022 Do not use abbreviations (e.g., for cities) \u2013 not relevant but included\\n  \u2022 Write digits in plain text\\n- If the answer is a number:\\n  \u2022 No thousand separators, no units\\n- No additional text outside the FINAL ANSWER line\\n\\n3) Acceptance criteria\\n- The name matches exactly what is listed in the July 2017 scikit-learn changelog\\n- Provide only the name (no path/module prefix), no extra commentary\\n- Verified via at least two reputable sources (primary scikit-learn release notes + secondary cross-check)\\n\\n4) Ambiguities / risk considerations\\n- \u201cOther predictor base command\u201d phrasing may be ambiguous; resolution will rely on exact changelog phrasing and context\\n- July 2017 corresponds roughly to scikit-learn 0.19.0 timeframe; ensure correct month-specific changelog section\\n- Ensure we extract precisely the \u201cother predictor\u201d referenced as receiving a bug fix\\n\\n5) Planned approach (subsequent steps)\\n- Retrieve official July 2017 changelog from scikit-learn docs\\n- Locate the bug fix entry referencing \u201cother predictor base command\u201d and capture the exact name\\n- Cross-verify via GitHub release notes or mirrors\\n- Return the single required name formatted per rules\\n\", \"file_path\": 
\"work_space/gaia_d0633230_20251221_193229_869271/step0_parsing_and_checklist.md\", \"mode\": \"w\"})",
      "explanation": "Parsed the question and documented the compliance checklist. Saved as work_space/gaia_d0633230_20251221_193229_869271/step0_parsing_and_checklist.md. Ready to proceed to retrieving the official scikit-learn July 2017 changelog.",
      "observation": "Content successfully saved to work_space/gaia_d0633230_20251221_193229_869271/step0_parsing_and_checklist.md"
    },
    {
      "turn_idx": 1,
      "action": "mark_step({\"step_index\": 0, \"step_status\": \"completed\", \"step_notes\": \"Parsed the question and documented the compliance checklist. Saved as work_space/gaia_d0633230_20251221_193229_869271/step0_parsing_and_checklist.md. Ready to proceed to retrieving the official scikit-learn July 2017 changelog.\"})",
      "explanation": "",
      "observation": "Step 0: step_status is completed, step_notes is Parsed the question and documented the compliance checklist. Saved as work_space/gaia_d0633230_20251221_193229_869271/step0_parsing_and_checklist.md. Ready to proceed to retrieving the official scikit-learn July 2017 changelog. "
    },
    {
      "turn_idx": 2,
      "action": "tavily_search({\"query\": \"Scikit-Learn July 2017 changelog \\\"July 2017\\\" site:scikit-learn.org\"})",
      "explanation": "Retrieved and extracted the Scikit-Learn July 2017 changelog (Version 0.19) from official sources. Saved consolidated extraction as work_space/gaia_d0633230_20251221_193229_869271/search_results_summary_step1.md. Key relevant section identifies an 'Other predictors' bug fix entry referencing 'semi_supervised.BaseLabelPropagation'. Proceeding to locate and extract the exact command/name requested in Step 2.",
      "observation": "{'query': 'Scikit-Learn July 2017 changelog \"July 2017\"', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://scikit-learn.org/0.21/', 'title': 'machine learning in Python \u2014 scikit-learn 0.21.3 ...', 'content': '# scikit-learn. ## Machine Learning in Python. > SVM, nearest neighbors, random forest, \u2026. > Comparing, validating and choosing parameters and models. > **Application**: Transforming input data such as text for use with machine learning algorithms. * **Scikit-learn from 0.21 requires Python 3.5 or greater.**. * *July 2019.* scikit-learn 0.21.3 (Changelog) and 0.20.4 (Changelog) are available for download. * *May 2019.* scikit-learn 0.21.0 to 0.21.2 are available for download (Changelog). * *March 2019.* scikit-learn 0.20.3 is available for download (Changelog). * *September 2018.* scikit-learn 0.20.0 is available for download (Changelog). * *July 2018.* scikit-learn 0.19.2 is available for download (Changelog). * *July 2017.* scikit-learn 0.19.0 is available for download (Changelog). * *Mailing list:* scikit-learn@python.org. #### Who uses scikit-learn? *\"We use scikit-learn to support leading-edge basic research [...]\"*. *\"scikit-learn\\'s ease-of-use, performance and overall variety of algorithms implemented has proved invaluable [...].\"*. *\"For these tasks, we relied on the excellent scikit-learn package for Python.\"*. *\"scikit-learn makes doing advanced analysis in Python accessible to anyone.\"*.', 'score': 0.99837697, 'raw_content': None}, {'url': 'https://scikit-learn.org/0.21/_downloads/scikit-learn-docs.pdf', 'title': 'scikit-learn user guide', 'content': '... 2017) to work on this project full-time. It also hosts coding ... July 2017. The Labex Digi- \u00b7 Cosme funded Nicolas Goix (2015-2016), Tom ...Read more', 'score': 0.9830851, 'raw_content': None}], 'response_time': 0.49, 'request_id': '846228f9-e760-4534-88e5-bb6f13be2192'}"
    },
    {
      "turn_idx": 3,
      "action": "tavily_search({\"query\": \"scikit-learn changelog July 2017 bug fix\"})",
      "explanation": "",
      "observation": "{'query': 'scikit-learn changelog July 2017 bug fix', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://scikit-learn.sourceforge.net/dev/whats_new.html', 'title': 'Release history \u2014 scikit-learn 0.17.dev0 documentation', 'content': '> * Fixed feature importances as computed with a forest of randomized trees when fit with `sample_weight!= None` and/or with `bootstrap=True`. > * Fixed `class_weight` support in `svm.LinearSVC` and `linear_model.LogisticRegression` by Andreas M\u00fcller. > * Fixed API inconsistency: `linear_model.SGDClassifier.predict_proba` now returns 2d array when fit on two classes. > * Improved *Cross-validation: evaluating estimator performance* and *Grid Search: Searching for estimator parameters* documentation and introduced the new `cross_validation.train_test_split` helper function by Olivier Grisel. > * Minor refactoring in *Stochastic Gradient Descent* module; consolidated dense and sparse predict methods; Enhanced test time performance by converting model parameters to fortran-style arrays after fitting (only multi-class). scikit-learn 0.8 was released on May 2011, one month after the first \u201cinternational\u201d scikit-learn coding sprint and is marked by the inclusion of important modules: *Hierarchical clustering*, *Cross decomposition*, *Non-negative matrix factorization (NMF or NNMF)*, initial support for Python 3 and by important enhancements and bug fixes. > * Improved sparse matrix support, both in main classes (`grid_search.GridSearchCV`) as in modules sklearn.svm.sparse and sklearn.linear\\\\_model.sparse.', 'score': 0.99931216, 'raw_content': None}, {'url': 'https://scikit-learn.org/stable/whats_new/v0.19.html', 'title': 'Version 0.19 \u2014 scikit-learn 1.8.0 documentation', 'content': '* Fixed a bug where `model_selection.validation_curve` reused the same estimator for each parameter value. 
Joel Nothman, Loic Esteve, Andreas Mueller, Guillaume Lemaitre, Olivier Grisel, Hanmin Qin, Raghav RV, Alexandre Gramfort, themrmax, Aman Dalmia, Gael Varoquaux, Naoya Kanai, Tom Dupr\u00e9 la Tour, Rishikesh, Nelson Liu, Taehoon Lee, Nelle Varoquaux, Aashil, Mikhail Korobov, Sebastin Santy, Joan Massich, Roman Yurchak, RAKOTOARISON Herilalaina, Thierry Guillemot, Alexandre Abadie, Carol Willing, Balakumaran Manoharan, Josh Karnofsky, Vlad Niculae, Utkarsh Upadhyay, Dmitry Petrov, Minghui Liu, Srivatsan, Vincent Pham, Albert Thomas, Jake VanderPlas, Attractadore, JC Liu, alexandercbooth, chkoar, \u00d3scar N\u00e1jera, Aarshay Jain, Kyle Gilliam, Ramana Subramanyam, CJ Carey, Clement Joudet, David Robles, He Chen, Joris Van den Bossche, Karan Desai, Katie Luangkote, Leland McInnes, Maniteja Nandana, Michele Lacchia, Sergei Lebedev, Shubham Bhardwaj, akshay0724, omtcyfz, rickiepark, waterponey, Vathsala Achar, jbDelafosse, Ralf Gommers, Ekaterina Krivich, Vivek Kumar, Ishank Gulati, Dave Elliott, ldirer, Reiichiro Nakano, Levi John Wolf, Mathieu Blondel, Sid Kapur, Dougal J.', 'score': 0.99805814, 'raw_content': None}, {'url': 'https://scikit-learn.org/stable/whats_new/v0.18.html', 'title': 'Version 0.18 \u2014 scikit-learn 1.8.0 documentation', 'content': '* Fixed a bug where `sklearn.model_selection.train_test_split` raised an error when `stratify` is a list of string labels. The `sklearn.model_selection.GridSearchCV` will cross-validate each parameter setting on the split produced by the first `split` call to the cross-validation splitter. The new module `sklearn.model_selection`, which groups together the functionalities of formerly `sklearn.cross_validation`, `sklearn.grid_search` and `sklearn.learning_curve`, introduces new possibilities such as nested cross-validation and better manipulation of parameter searches with Pandas. 
The new cross-validation splitters, defined in the `sklearn.model_selection`, are no longer initialized with any data-dependent parameters such as `y`. The new `cv_results_` attribute (of `model_selection.GridSearchCV` and `model_selection.RandomizedSearchCV`) introduced in lieu of the `grid_scores_` attribute is a dict of 1D arrays with elements in each array corresponding to the parameter settings (i.e. search candidates). The `n_iter` parameter in `model_selection.ShuffleSplit`, the new class `model_selection.GroupShuffleSplit` and `model_selection.StratifiedShuffleSplit` is now renamed to `n_splits`. The `labels` parameter in the `split` method of the newly renamed splitters `model_selection.GroupKFold`, `model_selection.LeaveOneGroupOut`, `model_selection.LeavePGroupsOut`, `model_selection.GroupShuffleSplit` is renamed to `groups` following the new nomenclature of their class names.', 'score': 0.9976949, 'raw_content': None}, {'url': 'https://imbalanced-learn.org/stable/whats_new.html', 'title': 'Release history \u2014 Version 0.14.0 - Imbalanced-learn', 'content': '* Fix a bug in `BorderlineSMOTE` version 2 where samples should be generated from the whole dataset and not only from the minority class. * Fix a bug in `imblearn.under_sampling.ClusterCentroids` where `voting=\"hard\"` could have lead to select a sample from any class instead of the targeted class. * Fix a bug due to change in `scikit-learn` 0.23 in `imblearn.metrics.make_index_balanced_accuracy`. * Fix a bug when the median of the standard deviation is null in `imblearn.over_sampling.SMOTENC`. * Fix a bug in `imblearn.ensemble.BalancedRandomForestClassifier` leading to a wrong number of samples used during fitting due `max_samples` and therefore a bad computation of the OOB score. * Fix bug which converting to COO format sparse when stacking the matrices in `imblearn.over_sampling.SMOTENC`. 
* Fix bug in `imblearn.over_sampling.SVMSMOTE` and `imblearn.over_sampling.BorderlineSMOTE` where the default parameter of `n_neighbors` was not set properly. * Fix a bug in `imblearn.over_sampling.SMOTENC` in which the the median of the standard deviation instead of half of the median of the standard deviation. * Fix a bug in `imblearn.over_sampling.SMOTENC` in which a sparse matrices were densify during `inverse_transform`.', 'score': 0.99497885, 'raw_content': None}, {'url': 'https://scikit-learn.org/0.19/whats_new.html', 'title': 'Release history \u2014 scikit-learn 0.19.2 documentation', 'content': 'Joel Nothman, Loic Esteve, Andreas Mueller, Kumar Ashutosh, Vrishank Bhardwaj, Hanmin Qin, Rasul Kerimov, James Bourbeau, Nagarjuna Kumar, Nathaniel Saul, Olivier Grisel, Roman Yurchak, Reiichiro Nakano, Sachin Kelkar, Sam Steingold, Yaroslav Halchenko, diegodlh, felix, goncalo-rodrigues, jkleint, oliblum90, pasbi, Anthony Gitter, Ben Lawson, Charlie Brummitt, Didi Bar-Zev, Gael Varoquaux, Joan Massich, Joris Van den Bossche, nielsenmarkus11 scikit-learn 0.8 was released on May 2011, one month after the first \u201cinternational\u201d scikit-learn coding sprint and is marked by the inclusion of important modules: Hierarchical clustering, Cross decomposition, Non-negative matrix factorization (NMF or NNMF), initial support for Python 3 and by important enhancements and bug fixes. * Refactoring in the Support Vector Machines module and bug fixes by Fabian Pedregosa, Gael Varoquaux and Amit Aides. Also added references of matrices used for clustering and other fixes [Gael Varoquaux, Fabian Pedregosa, Mathieu Blondel, Olivier Grisel, Virgile Fritsch , Emmanuelle Gouillart]', 'score': 0.99187535, 'raw_content': None}], 'response_time': 1.14, 'request_id': '0379cd88-a12f-48aa-8b49-4760ce867435'}"
    },
    {
      "turn_idx": 4,
      "action": "tavily_search({\"query\": \"\\\"July 2017\\\" \\\"scikit-learn\\\" changelog\"})",
      "explanation": "",
      "observation": "{'query': '\"July 2017\" \"scikit-learn\" changelog', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://explorer.invariantlabs.ai/u/feixieliz/gaia_magentic-one/t/70', 'title': 'Invariant Explorer', 'content': '(thought) Updated Ledger:{ \"is_request_satisfied\": { \"reason\": \"We have not yet identified the specific predictor base command that received a bug fix in the Scikit-Learn July 2017 changelog.\", \"answer\": false }, \"is_in_loop\": { \"reason\": \"The current approach is not yielding the specific information needed from the search results; we need a different strategy.\", \"answer\": true }, \"is_progress_being_made\": { \"reason\": \"Although an initial search has been conducted, it has not provided the required information from the correct documentation.\", \"answer\": false }, \"next_speaker\": { \"reason\": \"The WebSurfer needs to try another path or possibly explore other documentation or communication channels.\", \"answer\": \"WebSurfer\" }, \"instruction_or_question\": { \"reason\": \"A targeted search directly on Scikit-Learn\\'s official documentation site or related GitHub repositories might yield better results.\", \"answer\": \"Please search for the specific version of Scikit-Learn around July 2017, such as version 0.18 or 0.19, on their official site or GitHub for changelogs that include bug fixes.\" }}.', 'score': 0.91175514, 'raw_content': None}, {'url': 'https://scikit-learn.org/0.20/', 'title': 'machine learning in Python \u2014 scikit-learn 0.20.4 documentation', 'content': 'July 2018. scikit-learn 0.19.2 is available for download (Changelog). July 2017. scikit-learn 0.19.0 is available for download (Changelog)', 'score': 0.850382, 'raw_content': None}, {'url': 'https://docs.huihoo.com/scikit-learn/0.20/index.html', 'title': 'scikit-learn 0.20.2 documentation - Huihoo', 'content': 'July 2017. scikit-learn 0.19.0 is available for download (Changelog). June 2017. 
scikit-learn 0.18.2 is available for download (Changelog)', 'score': 0.8458536, 'raw_content': None}, {'url': 'https://scikit-learn.org/0.20/_downloads/scikit-learn-docs.pdf', 'title': 'scikit-learn user guide', 'content': '... scikit-learn user guide, Release 0.20.4 \u00b7 Sydney funds Joel Nothman since July 2017. The Labex Digi- \u00b7 Cosme funded Nicolas Goix (2015-2016)', 'score': 0.7380092, 'raw_content': None}, {'url': 'https://thedataist.com/2017/07/', 'title': 'July 2017 - The Dataist', 'content': '... scikit-learn API, and therefore if you are a regular user of scikit ... July 2017 \u00b7 May 2017 \u00b7 April 2017 \u00b7 March 2017 \u00b7 February 2017 \u00b7 January', 'score': 0.6643884, 'raw_content': None}], 'response_time': 0.78, 'request_id': 'cb4bf960-7541-4d20-bec2-856d479ac927'}"
    },
    {
      "turn_idx": 5,
      "action": "fetch_website_content({\"website_url\": \"https://scikit-learn.org/stable/whats_new/v0.19.html\"})",
      "explanation": "",
      "observation": "\nVersion 0.19 \u2014 scikit-learn 1.8.0 documentation\nSkip to main content\nBack to top\nCtrl + K\nInstall\nUser Guide\nAPI\nExamples\nCommunity\nMore\nGetting Started\nRelease History\nGlossary\nDevelopment\nFAQ\nSupport\nRelated Projects\nRoadmap\nGovernance\nAbout us\nGitHub\nChoose version\nInstall\nUser Guide\nAPI\nExamples\nCommunity\nGetting Started\nRelease History\nGlossary\nDevelopment\nFAQ\nSupport\nRelated Projects\nRoadmap\nGovernance\nAbout us\nGitHub\nChoose version\nSection Navigation\nVersion 1.8\nVersion 1.7\nVersion 1.6\nVersion 1.5\nVersion 1.4\nVersion 1.3\nVersion 1.2\nVersion 1.1\nVersion 1.0\nVersion 0.24\nVersion 0.23\nVersion 0.22\nVersion 0.21\nVersion 0.20\nVersion 0.19\nVersion 0.18\nVersion 0.17\nVersion 0.16\nVersion 0.15\nVersion 0.14\nVersion 0.13\nOlder Versions\nRelease History\nVersion 0.19\nVersion 0.19 #\nVersion 0.19.2 #\nJuly, 2018\nThis release is exclusively in order to support Python 3.7.\nRelated changes #\nn_iter_ may vary from previous releases in\n linear_model.LogisticRegression with solver='lbfgs' and\n linear_model.HuberRegressor . For Scipy <= 1.0.0, the optimizer could\nperform more than the requested maximum number of iterations. Now both\nestimators will report at most max_iter iterations even if more were\nperformed. #10723 by Joel Nothman .\nVersion 0.19.1 #\nOctober 23, 2017\nThis is a bug-fix release with some minor documentation improvements and\nenhancements to features released in 0.19.0.\nNote there may be minor differences in TSNE output in this release (due to\n #9623 ), in the case where multiple samples have equal distance to some\nsample.\nChangelog #\nAPI changes #\nReverted the addition of metrics.ndcg_score and metrics.dcg_score \nwhich had been merged into version 0.19.0 by error. 
The implementations\nwere broken and undocumented.\nreturn_train_score which was added to\n model_selection.GridSearchCV ,\n model_selection.RandomizedSearchCV and\n model_selection.cross_validate in version 0.19.0 will be changing its\ndefault value from True to False in version 0.21. We found that calculating\ntraining score could have a great effect on cross validation runtime in some\ncases. Users should explicitly set return_train_score to False if\nprediction or scoring functions are slow, resulting in a deleterious effect\non CV runtime, or to True if they wish to use the calculated scores.\n #9677 by Kumar Ashutosh and Joel\nNothman .\ncorrelation_models and regression_models from the legacy gaussian\nprocesses implementation have been belatedly deprecated. #9717 by\n Kumar Ashutosh .\nBug fixes #\nAvoid integer overflows in metrics.matthews_corrcoef .\n #9693 by Sam Steingold .\nFixed a bug in the objective function for manifold.TSNE (both exact\nand with the Barnes-Hut approximation) when n_components >= 3 .\n #9711 by @goncalo-rodrigues .\nFix regression in model_selection.cross_val_predict where it\nraised an error with method='predict_proba' for some probabilistic\nclassifiers. #9641 by James Bourbeau .\nFixed a bug where datasets.make_classification modified its input\n weights . #9865 by Sachin Kelkar .\nmodel_selection.StratifiedShuffleSplit now works with multioutput\nmulticlass or multilabel data with more than 1000 columns. #9922 by\n Charlie Brummitt .\nFixed a bug with nested and conditional parameter setting, e.g. setting a\npipeline step and its parameter at the same time. #9945 by Andreas\nM\u00fcller and Joel Nothman .\nRegressions in 0.19.0 fixed in 0.19.1:\nFixed a bug where parallelised prediction in random forests was not\nthread-safe and could (rarely) result in arbitrary errors. #9830 by\n Joel Nothman .\nFix regression in model_selection.cross_val_predict where it no\nlonger accepted X as a list. 
#9600 by Rasul Kerimov .\nFixed handling of model_selection.cross_val_predict for binary\nclassification with method='decision_function' . #9593 by\n Reiichiro Nakano and core devs.\nFix regression in pipeline.Pipeline where it no longer accepted\n steps as a tuple. #9604 by Joris Van den Bossche .\nFix bug where n_iter was not properly deprecated, leaving n_iter \nunavailable for interim use in\n linear_model.SGDClassifier , linear_model.SGDRegressor ,\n linear_model.PassiveAggressiveClassifier ,\n linear_model.PassiveAggressiveRegressor and\n linear_model.Perceptron . #9558 by Andreas M\u00fcller .\nDataset fetchers make sure temporary files are closed before removing them,\nwhich caused errors on Windows. #9847 by Joan Massich .\nFixed a regression in manifold.TSNE where it no longer supported\nmetrics other than \u2018euclidean\u2019 and \u2018precomputed\u2019. #9623 by Oli\nBlum .\nEnhancements #\nOur test suite and utils.estimator_checks.check_estimator can now be\nrun without Nose installed. #9697 by Joan Massich .\nTo improve usability of version 0.19\u2019s pipeline.Pipeline \ncaching, memory now allows joblib.Memory instances.\nThis make use of the new utils.validation.check_memory helper.\n #9584 by Kumar Ashutosh\nSome fixes to examples: #9750 , #9788 , #9815\nMade a FutureWarning in SGD-based estimators less verbose. 
#9802 by\n Vrishank Bhardwaj .\nCode and Documentation Contributors #\nWith thanks to:\nJoel Nothman, Loic Esteve, Andreas Mueller, Kumar Ashutosh,\nVrishank Bhardwaj, Hanmin Qin, Rasul Kerimov, James Bourbeau,\nNagarjuna Kumar, Nathaniel Saul, Olivier Grisel, Roman\nYurchak, Reiichiro Nakano, Sachin Kelkar, Sam Steingold,\nYaroslav Halchenko, diegodlh, felix, goncalo-rodrigues,\njkleint, oliblum90, pasbi, Anthony Gitter, Ben Lawson, Charlie\nBrummitt, Didi Bar-Zev, Gael Varoquaux, Joan Massich, Joris\nVan den Bossche, nielsenmarkus11\nVersion 0.19 #\nAugust 12, 2017\nHighlights #\nWe are excited to release a number of great new features including\n neighbors.LocalOutlierFactor for anomaly detection,\n preprocessing.QuantileTransformer for robust feature transformation,\nand the multioutput.ClassifierChain meta-estimator to simply account\nfor dependencies between classes in multilabel problems. We have some new\nalgorithms in existing estimators, such as multiplicative update in\n decomposition.NMF and multinomial\n linear_model.LogisticRegression with L1 loss (use solver='saga' ).\nCross validation is now able to return the results from multiple metric\nevaluations. The new model_selection.cross_validate can return many\nscores on the test data as well as training set performance and timings, and we\nhave extended the scoring and refit parameters for grid/randomized\nsearch to handle multiple metrics .\nYou can also learn faster. For instance, the new option to cache\ntransformations in pipeline.Pipeline makes grid\nsearch over pipelines including slow transformations much more efficient. And\nyou can predict faster: if you\u2019re sure you know what you\u2019re doing, you can turn\noff validating that the input is finite using config_context .\nWe\u2019ve made some important fixes too. We\u2019ve fixed a longstanding implementation\nerror in metrics.average_precision_score , so please be cautious with\nprior results reported from that function. 
A number of errors in the\n manifold.TSNE implementation have been fixed, particularly in the\ndefault Barnes-Hut approximation. semi_supervised.LabelSpreading and\n semi_supervised.LabelPropagation have had substantial fixes.\nLabelPropagation was previously broken. LabelSpreading should now correctly\nrespect its alpha parameter.\nChanged models #\nThe following estimators and functions, when fit with the same data and\nparameters, may produce different models from the previous version. This often\noccurs due to changes in the modelling logic (bug fixes or enhancements), or in\nrandom sampling procedures.\ncluster.KMeans with sparse X and initial centroids given (bug fix)\ncross_decomposition.PLSRegression \nwith scale=True (bug fix)\nensemble.GradientBoostingClassifier and\n ensemble.GradientBoostingRegressor where min_impurity_split is used (bug fix)\ngradient boosting loss='quantile' (bug fix)\nensemble.IsolationForest (bug fix)\nfeature_selection.SelectFdr (bug fix)\nlinear_model.RANSACRegressor (bug fix)\nlinear_model.LassoLars (bug fix)\nlinear_model.LassoLarsIC (bug fix)\nmanifold.TSNE (bug fix)\nneighbors.NearestCentroid (bug fix)\nsemi_supervised.LabelSpreading (bug fix)\nsemi_supervised.LabelPropagation (bug fix)\ntree based models where min_weight_fraction_leaf is used (enhancement)\nmodel_selection.StratifiedKFold with shuffle=True \n(this change, due to #7823 was not mentioned in the release notes at\nthe time)\nDetails are listed in the changelog below.\n(While we are trying to better inform users by providing this information, we\ncannot assure that this list is complete.)\nChangelog #\nNew features #\nClassifiers and regressors\nAdded multioutput.ClassifierChain for multi-label\nclassification. By Adam Kleczewski .\nAdded solver 'saga' that implements the improved version of Stochastic\nAverage Gradient, in linear_model.LogisticRegression and\n linear_model.Ridge . 
It allows the use of L1 penalty with\nmultinomial logistic loss, and behaves marginally better than \u2018sag\u2019\nduring the first epochs of ridge and logistic regression.\n #8446 by Arthur Mensch .\nOther estimators\nAdded the neighbors.LocalOutlierFactor class for anomaly\ndetection based on nearest neighbors.\n #5279 by Nicolas Goix and Alexandre Gramfort .\nAdded preprocessing.QuantileTransformer class and\n preprocessing.quantile_transform function for features\nnormalization based on quantiles.\n #8363 by Denis Engemann ,\n Guillaume Lemaitre , Olivier Grisel , Raghav RV ,\n Thierry Guillemot , and Gael Varoquaux .\nThe new solver 'mu' implements a Multiplicate Update in\n decomposition.NMF , allowing the optimization of all\nbeta-divergences, including the Frobenius norm, the generalized\nKullback-Leibler divergence and the Itakura-Saito divergence.\n #5295 by Tom Dupre la Tour .\nModel selection and evaluation\nmodel_selection.GridSearchCV and\n model_selection.RandomizedSearchCV now support simultaneous\nevaluation of multiple metrics. Refer to the\n Specifying multiple metrics for evaluation section of the user guide for more\ninformation. #7388 by Raghav RV\nAdded the model_selection.cross_validate which allows evaluation\nof multiple metrics. This function returns a dict with more useful\ninformation from cross-validation such as the train scores, fit times and\nscore times.\nRefer to The cross_validate function and multiple metric evaluation section of the userguide\nfor more information. 
#7388 by Raghav RV\nAdded metrics.mean_squared_log_error , which computes\nthe mean square error of the logarithmic transformation of targets,\nparticularly useful for targets with an exponential trend.\n #7655 by Karan Desai .\nAdded metrics.dcg_score and metrics.ndcg_score , which\ncompute Discounted cumulative gain (DCG) and Normalized discounted\ncumulative gain (NDCG).\n #7739 by David Gasquez .\nAdded the model_selection.RepeatedKFold and\n model_selection.RepeatedStratifiedKFold .\n #8120 by Neeraj Gangwar .\nMiscellaneous\nValidation that input data contains no NaN or inf can now be suppressed\nusing config_context , at your own risk. This will save on runtime,\nand may be particularly useful for prediction time. #7548 by\n Joel Nothman .\nAdded a test to ensure parameter listing in docstrings matches the\nfunction/class signature. #9206 by Alexandre Gramfort and\n Raghav RV .\nEnhancements #\nTrees and ensembles\nThe min_weight_fraction_leaf constraint in tree construction is now\nmore efficient, taking a fast path to declare a node a leaf if its weight\nis less than 2 * the minimum. Note that the constructed tree will be\ndifferent from previous versions where min_weight_fraction_leaf is\nused. #7441 by Nelson Liu .\nensemble.GradientBoostingClassifier and ensemble.GradientBoostingRegressor \nnow support sparse input for prediction.\n #6101 by Ibraim Ganiev .\nensemble.VotingClassifier now allows changing estimators by using\n ensemble.VotingClassifier.set_params . An estimator can also be\nremoved by setting it to None .\n #7674 by Yichuan Liu .\ntree.export_graphviz now shows configurable number of decimal\nplaces. 
#8698 by Guillaume Lemaitre .\nAdded flatten_transform parameter to ensemble.VotingClassifier \nto change output shape of transform method to 2 dimensional.\n #7794 by Ibraim Ganiev and\n Herilalaina Rakotoarison .\nLinear, kernelized and related models\nlinear_model.SGDClassifier , linear_model.SGDRegressor ,\n linear_model.PassiveAggressiveClassifier ,\n linear_model.PassiveAggressiveRegressor and\n linear_model.Perceptron now expose max_iter and\n tol parameters, to handle convergence more precisely.\n n_iter parameter is deprecated, and the fitted estimator exposes\na n_iter_ attribute, with actual number of iterations before\nconvergence. #5036 by Tom Dupre la Tour .\nAdded average parameter to perform weight averaging in\n linear_model.PassiveAggressiveClassifier . #4939 \nby Andrea Esuli .\nlinear_model.RANSACRegressor no longer throws an error\nwhen calling fit if no inliers are found in its first iteration.\nFurthermore, causes of skipped iterations are tracked in newly added\nattributes, n_skips_* .\n #7914 by Michael Horrell .\nIn gaussian_process.GaussianProcessRegressor , method predict \nis a lot faster with return_std=True . #8591 by\n Hadrien Bertrand .\nAdded return_std to predict method of\n linear_model.ARDRegression and\n linear_model.BayesianRidge .\n #7838 by Sergey Feldman .\nMemory usage enhancements: Prevent cast from float32 to float64 in:\n linear_model.MultiTaskElasticNet ;\n linear_model.LogisticRegression when using newton-cg solver; and\n linear_model.Ridge when using svd, sparse_cg, cholesky or lsqr\nsolvers. #8835 , #8061 by Joan Massich and Nicolas\nCordier and Thierry Guillemot .\nOther predictors\nCustom metrics for the sklearn.neighbors binary trees now have\nfewer constraints: they must take two 1d-arrays and return a float.\n #6288 by Jake Vanderplas .\nalgorithm='auto in sklearn.neighbors estimators now chooses the most\nappropriate algorithm for all input types and metrics. 
#9145 by\n Herilalaina Rakotoarison and Reddy Chinthala .\nDecomposition, manifold learning and clustering\ncluster.MiniBatchKMeans and cluster.KMeans \nnow use significantly less memory when assigning data points to their\nnearest cluster center. #7721 by Jon Crall .\ndecomposition.PCA , decomposition.IncrementalPCA and\n decomposition.TruncatedSVD now expose the singular values\nfrom the underlying SVD. They are stored in the attribute\n singular_values_ , like in decomposition.IncrementalPCA .\n #7685 by Tommy L\u00f6fstedt\ndecomposition.NMF now faster when beta_loss=0 .\n #9277 by @hongkahjun .\nMemory improvements for method barnes_hut in manifold.TSNE\n#7089 by Thomas Moreau and Olivier Grisel .\nOptimization schedule improvements for Barnes-Hut manifold.TSNE \nso the results are closer to the one from the reference implementation\n lvdmaaten/bhtsne by Thomas\nMoreau and Olivier Grisel .\nMemory usage enhancements: Prevent cast from float32 to float64 in\n decomposition.PCA and\n decomposition.randomized_svd_low_rank .\n #9067 by Raghav RV .\nPreprocessing and feature selection\nAdded norm_order parameter to feature_selection.SelectFromModel \nto enable selection of the norm order when coef_ is more than 1D.\n #6181 by Antoine Wendlinger .\nAdded ability to use sparse matrices in feature_selection.f_regression \nwith center=True . #8065 by Daniel LeJeune .\nSmall performance improvement to n-gram creation in\n sklearn.feature_extraction.text by binding methods for loops and\nspecial-casing unigrams. #7567 by Jaye Doepke\nRelax assumption on the data for the\n kernel_approximation.SkewedChi2Sampler . Since the Skewed-Chi2\nkernel is defined on the open interval \\((-skewedness; +\\infty)^d\\) ,\nthe transform function should not check whether X < 0 but whether X <\n-self.skewedness . 
#7573 by Romain Brault .\nMade default kernel parameters kernel-dependent in\n kernel_approximation.Nystroem .\n #5229 by Saurabh Bansod and Andreas M\u00fcller .\nModel evaluation and meta-estimators\npipeline.Pipeline is now able to cache transformers\nwithin a pipeline by using the memory constructor parameter.\n #7990 by Guillaume Lemaitre .\npipeline.Pipeline steps can now be accessed as attributes of its\n named_steps attribute. #8586 by Herilalaina\nRakotoarison .\nAdded sample_weight parameter to pipeline.Pipeline.score .\n #7723 by Mikhail Korobov .\nAdded ability to set n_jobs parameter to pipeline.make_union .\nA TypeError will be raised for any other kwargs. #8028 \nby Alexander Booth .\nmodel_selection.GridSearchCV ,\n model_selection.RandomizedSearchCV and\n model_selection.cross_val_score now allow estimators with callable\nkernels which were previously prohibited.\n #8005 by Andreas M\u00fcller .\nmodel_selection.cross_val_predict now returns output of the\ncorrect shape for all values of the argument method .\n #7863 by Aman Dalmia .\nAdded shuffle and random_state parameters to shuffle training\ndata before taking prefixes of it based on training sizes in\n model_selection.learning_curve .\n #7506 by Narine Kokhlikyan .\nmodel_selection.StratifiedShuffleSplit now works with multioutput\nmulticlass (or multilabel) data. #9044 by Vlad Niculae .\nSpeed improvements to model_selection.StratifiedShuffleSplit .\n #5991 by Arthur Mensch and Joel Nothman .\nAdd shuffle parameter to model_selection.train_test_split .\n #8845 by themrmax\nmultioutput.MultiOutputRegressor and multioutput.MultiOutputClassifier \nnow support online learning using partial_fit .\n:issue: 8053 by Peng Yu .\nAdd max_train_size parameter to model_selection.TimeSeriesSplit\n#8282 by Aman Dalmia .\nMore clustering metrics are now available through metrics.get_scorer \nand scoring parameters. 
#8117 by Raghav RV .\nA scorer based on metrics.explained_variance_score is also available.\n #9259 by Hanmin Qin .\nMetrics\nmetrics.matthews_corrcoef now supports multiclass classification.\n #8094 by Jon Crall .\nAdd sample_weight parameter to metrics.cohen_kappa_score .\n #8335 by Victor Poughon .\nMiscellaneous\nutils.estimator_checks.check_estimator now attempts to ensure that methods\ntransform, predict, etc. do not set attributes on the estimator.\n #7533 by Ekaterina Krivich .\nAdded type checking to the accept_sparse parameter in\n sklearn.utils.validation methods. This parameter now accepts only boolean,\nstring, or list/tuple of strings. accept_sparse=None is deprecated and\nshould be replaced by accept_sparse=False .\n #7880 by Josh Karnofsky .\nMake it possible to load a chunk of an svmlight formatted file by\npassing a range of bytes to datasets.load_svmlight_file .\n #935 by Olivier Grisel .\ndummy.DummyClassifier and dummy.DummyRegressor \nnow accept non-finite features. #8931 by @Attractadore .\nBug fixes #\nTrees and ensembles\nFixed a memory leak in trees when using trees with criterion='mae' .\n #8002 by Raghav RV .\nFixed a bug where ensemble.IsolationForest uses an\nincorrect formula for the average path length\n #8549 by Peter Wang .\nFixed a bug where ensemble.AdaBoostClassifier throws\n ZeroDivisionError while fitting data with single class labels.\n #7501 by Dominik Krzeminski .\nFixed a bug in ensemble.GradientBoostingClassifier and\n ensemble.GradientBoostingRegressor where a float being compared\nto 0.0 using == caused a divide by zero error. 
#7970 by\n He Chen .\nFix a bug where ensemble.GradientBoostingClassifier and\n ensemble.GradientBoostingRegressor ignored the\n min_impurity_split parameter.\n #8006 by Sebastian P\u00f6lsterl .\nFixed oob_score in ensemble.BaggingClassifier .\n #8936 by Michael Lewis\nFixed excessive memory usage in prediction for random forests estimators.\n #8672 by Mike Benfield .\nFixed a bug where sample_weight as a list broke random forests in Python 2\n #8068 by @xor .\nFixed a bug where ensemble.IsolationForest fails when\n max_features is less than 1.\n #5732 by Ishank Gulati .\nFix a bug where gradient boosting with loss='quantile' computed\nnegative errors for negative values of ytrue - ypred leading to wrong\nvalues when calling __call__ .\n #8087 by Alexis Mignon\nFix a bug where ensemble.VotingClassifier raises an error\nwhen a numpy array is passed in for weights. #7983 by\n Vincent Pham .\nFixed a bug where tree.export_graphviz raised an error\nwhen the length of features_names does not match n_features in the decision\ntree. #8512 by Li Li .\nLinear, kernelized and related models\nFixed a bug where linear_model.RANSACRegressor.fit may run until\n max_iter if it finds a large inlier group early. #8251 by\n @aivision2020 .\nFixed a bug where naive_bayes.MultinomialNB and\n naive_bayes.BernoulliNB failed when alpha=0 . #5814 by\n Yichuan Liu and Herilalaina Rakotoarison .\nFixed a bug where linear_model.LassoLars does not give\nthe same result as the LassoLars implementation available\nin R (lars library). #7849 by Jair Montoya Martinez .\nFixed a bug in linear_model.RandomizedLasso ,\n linear_model.Lars , linear_model.LassoLars ,\n linear_model.LarsCV and linear_model.LassoLarsCV ,\nwhere the parameter precompute was not used consistently across\nclasses, and some values proposed in the docstring could raise errors.\n #5359 by Tom Dupre la Tour .\nFix inconsistent results between linear_model.RidgeCV and\n linear_model.Ridge when using normalize=True . 
#9302 \nby Alexandre Gramfort .\nFix a bug where linear_model.LassoLars.fit sometimes\nleft coef_ as a list, rather than an ndarray.\n #8160 by CJ Carey .\nFix linear_model.BayesianRidge.fit to return\nridge parameter alpha_ and lambda_ consistent with calculated\ncoefficients coef_ and intercept_ .\n #8224 by Peter Gedeck .\nFixed a bug in svm.OneClassSVM where it returned floats instead of\ninteger classes. #8676 by Vathsala Achar .\nFix AIC/BIC criterion computation in linear_model.LassoLarsIC .\n #9022 by Alexandre Gramfort and Mehmet Basbug .\nFixed a memory leak in our LibLinear implementation. #9024 by\n Sergei Lebedev\nFix bug where stratified CV splitters did not work with\n linear_model.LassoCV . #8973 by\n Paulo Haddad .\nFixed a bug in gaussian_process.GaussianProcessRegressor \nwhen the standard deviation and covariance predicted without fit\nwould fail with a meaningless error by default.\n #6573 by Quazi Marufur Rahman and\n Manoj Kumar .\nOther predictors\nFix semi_supervised.BaseLabelPropagation to correctly implement\n LabelPropagation and LabelSpreading as done in the referenced\npapers. #9239 \nby Andre Ambrosio Boechat , Utkarsh Upadhyay , and Joel Nothman .\nDecomposition, manifold learning and clustering\nFixed the implementation of manifold.TSNE :\nearly_exaggeration parameter had no effect and is now used for the\nfirst 250 optimization iterations.\nFixed the AssertionError: Tree consistency failed exception\nreported in #8992 .\nImprove the learning schedule to match the one from the reference\nimplementation lvdmaaten/bhtsne .\nby Thomas Moreau and Olivier Grisel .\nFix a bug in decomposition.LatentDirichletAllocation \nwhere the perplexity method was returning incorrect results because\nthe transform method returns normalized document topic distributions\nas of version 0.18. 
#7954 by Gary Foreman .\nFix output shape and bugs with n_jobs > 1 in\n decomposition.SparseCoder transform and\n decomposition.sparse_encode \nfor one-dimensional data and one component.\nThis also impacts the output shape of decomposition.DictionaryLearning .\n #8086 by Andreas M\u00fcller .\nFixed the implementation of explained_variance_ \nin decomposition.PCA ,\n decomposition.RandomizedPCA and\n decomposition.IncrementalPCA .\n #9105 by Hanmin Qin .\nFixed the implementation of noise_variance_ in decomposition.PCA .\n #9108 by Hanmin Qin .\nFixed a bug where cluster.DBSCAN gives incorrect\nresult when input is a precomputed sparse matrix with initial\nrows all zero. #8306 by Akshay Gupta\nFix a bug regarding fitting cluster.KMeans with a sparse\narray X and initial centroids, where X\u2019s means were unnecessarily being\nsubtracted from the centroids. #7872 by Josh Karnofsky .\nFixes to the input validation in covariance.EllipticEnvelope .\n #8086 by Andreas M\u00fcller .\nFixed a bug in covariance.MinCovDet where inputting data\nthat produced a singular covariance matrix would cause the helper method\n _c_step to throw an exception.\n #3367 by Jeremy Steward\nFixed a bug in manifold.TSNE affecting convergence of the\ngradient descent. #8768 by David DeTomaso .\nFixed a bug in manifold.TSNE where it stored the incorrect\n kl_divergence_ . #6507 by Sebastian Saeger .\nFixed improper scaling in cross_decomposition.PLSRegression \nwith scale=True . #7819 by jayzed82 .\ncluster.SpectralCoclustering and\n cluster.SpectralBiclustering fit method conforms\nwith API by accepting y and returning the object. #6126 ,\n #7814 by Laurent Direr and Maniteja\nNandana .\nFix bug where sklearn.mixture sample methods did not return as many\nsamples as requested. 
#7702 by Levi John Wolf .\nFixed the shrinkage implementation in neighbors.NearestCentroid .\n #9219 by Hanmin Qin .\nPreprocessing and feature selection\nFor sparse matrices, preprocessing.normalize with return_norm=True \nwill now raise a NotImplementedError with \u2018l1\u2019 or \u2018l2\u2019 norm and with\nnorm \u2018max\u2019 the norms returned will be the same as for dense matrices.\n #7771 by Ang Lu .\nFix a bug where feature_selection.SelectFdr did not\nexactly implement Benjamini-Hochberg procedure. It formerly may have\nselected fewer features than it should.\n #7490 by Peng Meng .\nFixed a bug where linear_model.RandomizedLasso and\n linear_model.RandomizedLogisticRegression break for\nsparse input. #8259 by Aman Dalmia .\nFix a bug where feature_extraction.FeatureHasher \nmandatorily applied a sparse random projection to the hashed features,\npreventing the use of\n feature_extraction.text.HashingVectorizer in a\npipeline with feature_extraction.text.TfidfTransformer .\n #7565 by Roman Yurchak .\nFix a bug where feature_selection.mutual_info_regression did not\ncorrectly use n_neighbors . #8181 by Guillaume Lemaitre .\nModel evaluation and meta-estimators\nFixed a bug where model_selection.BaseSearchCV.inverse_transform \nreturns self.best_estimator_.transform() instead of\n self.best_estimator_.inverse_transform() .\n #8344 by Akshay Gupta and Rasmus Eriksson .\nAdded classes_ attribute to model_selection.GridSearchCV ,\n model_selection.RandomizedSearchCV , grid_search.GridSearchCV ,\nand grid_search.RandomizedSearchCV that matches the classes_ \nattribute of best_estimator_ . #7661 and #8295 \nby Alyssa Batula , Dylan Werner-Meier ,\nand Stephen Hoover .\nFixed a bug where model_selection.validation_curve \nreused the same estimator for each parameter value.\n #7365 by Aleksandr Sandrovskii .\nmodel_selection.permutation_test_score now works with Pandas\ntypes. 
#5697 by Stijn Tonk .\nSeveral fixes to input validation in\n multiclass.OutputCodeClassifier\n#8086 by Andreas M\u00fcller .\nmulticlass.OneVsOneClassifier \u2019s partial_fit now ensures all\nclasses are provided up-front. #6250 by\n Asish Panda .\nFix multioutput.MultiOutputClassifier.predict_proba to return a\nlist of 2d arrays, rather than a 3d array. In the case where different\ntarget columns had different numbers of classes, a ValueError would be\nraised on trying to stack matrices with different dimensions.\n #8093 by Peter Bull .\nCross validation now works with Pandas datatypes that have a\nread-only index. #9507 by Loic Esteve .\nMetrics\nmetrics.average_precision_score no longer linearly\ninterpolates between operating points, and instead weighs precisions\nby the change in recall since the last operating point, as per the\n Wikipedia entry .\n( #7356 ). By\n Nick Dingwall and Gael Varoquaux .\nFix a bug in metrics.classification._check_targets \nwhich would return 'binary' if y_true and y_pred were\nboth 'binary' but the union of y_true and y_pred was\n 'multiclass' . #8377 by Loic Esteve .\nFixed an integer overflow bug in metrics.confusion_matrix and\nhence metrics.cohen_kappa_score . #8354 , #7929 \nby Joel Nothman and Jon Crall .\nFixed passing of gamma parameter to the chi2 kernel in\n metrics.pairwise.pairwise_kernels #5211 by\n Nick Rhinehart ,\n Saurabh Bansod and Andreas M\u00fcller .\nMiscellaneous\nFixed a bug when datasets.make_classification fails\nwhen generating more than 30 features. #8159 by\n Herilalaina Rakotoarison .\nFixed a bug where datasets.make_moons gives an\nincorrect result when n_samples is odd.\n #8198 by Josh Levy .\nSome fetch_ functions in sklearn.datasets were ignoring the\n download_if_missing keyword. #7944 by Ralf Gommers .\nFix estimators to accept a sample_weight parameter of type\n pandas.Series in their fit function. 
#7825 by\n Kathleen Chen .\nFix a bug in cases where numpy.cumsum may be numerically unstable,\nraising an exception if instability is identified. #7376 and\n #7331 by Joel Nothman and @yangarbiter .\nFix a bug where base.BaseEstimator.__getstate__ \nobstructed pickling customizations of child-classes, when used in a\nmultiple inheritance context.\n #8316 by Holger Peters .\nUpdate Sphinx-Gallery from 0.1.4 to 0.1.7 for resolving links in\ndocumentation build with Sphinx>1.5 #8010 , #7986 by\n Oscar Najera\nAdd data_home parameter to sklearn.datasets.fetch_kddcup99 .\n #9289 by Loic Esteve .\nFix dataset loaders using Python 3 version of makedirs to also work in\nPython 2. #9284 by Sebastin Santy .\nSeveral minor issues were fixed with thanks to the alerts of\n lgtm.com . #9278 by Jean Helie ,\namong others.\nAPI changes summary #\nTrees and ensembles\nGradient boosting base models are no longer estimators. By Andreas M\u00fcller .\nAll tree-based estimators now accept a min_impurity_decrease \nparameter in lieu of the min_impurity_split , which is now deprecated.\nThe min_impurity_decrease helps stop splitting the nodes in which\nthe weighted impurity decrease from splitting is no longer at least\n min_impurity_decrease . #8449 by Raghav RV .\nLinear, kernelized and related models\nn_iter parameter is deprecated in linear_model.SGDClassifier ,\n linear_model.SGDRegressor ,\n linear_model.PassiveAggressiveClassifier ,\n linear_model.PassiveAggressiveRegressor and\n linear_model.Perceptron . By Tom Dupre la Tour .\nOther predictors\nneighbors.LSHForest has been deprecated and will be\nremoved in 0.21 due to poor performance.\n #9078 by Laurent Direr .\nneighbors.NearestCentroid no longer purports to support\n metric='precomputed' which now raises an error. #8515 by\n Sergul Aydore .\nThe alpha parameter of semi_supervised.LabelPropagation now\nhas no effect and is deprecated to be removed in 0.21. 
#9239 \nby Andre Ambrosio Boechat , Utkarsh Upadhyay , and Joel Nothman .\nDecomposition, manifold learning and clustering\nDeprecate the doc_topic_distr argument of the perplexity method\nin decomposition.LatentDirichletAllocation because the\nuser no longer has access to the unnormalized document topic distribution\nneeded for the perplexity calculation. #7954 by\n Gary Foreman .\nThe n_topics parameter of decomposition.LatentDirichletAllocation \nhas been renamed to n_components and will be removed in version 0.21.\n #8922 by @Attractadore .\ndecomposition.SparsePCA.transform \u2019s ridge_alpha parameter is\ndeprecated in preference for class parameter.\n #8137 by Naoya Kanai .\ncluster.DBSCAN now has a metric_params parameter.\n #8139 by Naoya Kanai .\nPreprocessing and feature selection\nfeature_selection.SelectFromModel now has a partial_fit \nmethod only if the underlying estimator does. By Andreas M\u00fcller .\nfeature_selection.SelectFromModel now validates the threshold \nparameter and sets the threshold_ attribute during the call to\n fit , and no longer during the call to transform . By Andreas\nM\u00fcller .\nThe non_negative parameter in feature_extraction.FeatureHasher \nhas been deprecated, and replaced with a more principled alternative,\n alternate_sign .\n #7565 by Roman Yurchak .\nlinear_model.RandomizedLogisticRegression ,\nand linear_model.RandomizedLasso have been deprecated and will\nbe removed in version 0.21.\n #8995 by Ramana.S .\nModel evaluation and meta-estimators\nDeprecate the fit_params constructor input to the\n model_selection.GridSearchCV and\n model_selection.RandomizedSearchCV in favor\nof passing keyword parameters to the fit methods\nof those classes. 
Data-dependent parameters needed for model\ntraining should be passed as keyword arguments to fit ,\nand conforming to this convention will allow the hyperparameter\nselection classes to be used with tools such as\n model_selection.cross_val_predict .\n #2879 by Stephen Hoover .\nIn version 0.21, the default behavior of splitters that use the\n test_size and train_size parameter will change, such that\nspecifying train_size alone will cause test_size to be the\nremainder. #7459 by Nelson Liu .\nmulticlass.OneVsRestClassifier now has partial_fit ,\n decision_function and predict_proba methods only when the\nunderlying estimator does. #7812 by Andreas M\u00fcller and\n Mikhail Korobov .\nmulticlass.OneVsRestClassifier now has a partial_fit method\nonly if the underlying estimator does. By Andreas M\u00fcller .\nThe decision_function output shape for binary classification in\n multiclass.OneVsRestClassifier and\n multiclass.OneVsOneClassifier is now (n_samples,) to conform\nto scikit-learn conventions. #9100 by Andreas M\u00fcller .\nThe multioutput.MultiOutputClassifier.predict_proba \nfunction used to return a 3d array ( n_samples , n_classes ,\n n_outputs ). In the case where different target columns had different\nnumbers of classes, a ValueError would be raised on trying to stack\nmatrices with different dimensions. This function now returns a list of\narrays where the length of the list is n_outputs , and each array is\n( n_samples , n_classes ) for that particular output.\n #8093 by Peter Bull .\nReplace attribute named_steps dict to utils.Bunch \nin pipeline.Pipeline to enable tab completion in interactive\nenvironment. 
In the case conflict value on named_steps and dict \nattribute, dict behavior will be prioritized.\n #8481 by Herilalaina Rakotoarison .\nMiscellaneous\nDeprecate the y parameter in transform and inverse_transform .\nThe method should not accept y parameter, as it\u2019s used at the prediction time.\n #8174 by Tahar Zanouda , Alexandre Gramfort \nand Raghav RV .\nSciPy >= 0.13.3 and NumPy >= 1.8.2 are now the minimum supported versions\nfor scikit-learn. The following backported functions in\n sklearn.utils have been removed or deprecated accordingly.\n #8854 and #8874 by Naoya Kanai\nThe store_covariances and covariances_ parameters of\n discriminant_analysis.QuadraticDiscriminantAnalysis \nhave been renamed to store_covariance and covariance_ to be\nconsistent with the corresponding parameter names of the\n discriminant_analysis.LinearDiscriminantAnalysis . They will be\nremoved in version 0.21. #7998 by Jiacheng\nRemoved in 0.19:\nutils.fixes.argpartition\nutils.fixes.array_equal\nutils.fixes.astype\nutils.fixes.bincount\nutils.fixes.expit\nutils.fixes.frombuffer_empty\nutils.fixes.in1d\nutils.fixes.norm\nutils.fixes.rankdata\nutils.fixes.safe_copy\nDeprecated in 0.19, to be removed in 0.21:\nutils.arpack.eigs\nutils.arpack.eigsh\nutils.arpack.svds\nutils.extmath.fast_dot\nutils.extmath.logsumexp\nutils.extmath.norm\nutils.extmath.pinvh\nutils.graph.graph_laplacian\nutils.random.choice\nutils.sparsetools.connected_components\nutils.stats.rankdata\nEstimators with both methods decision_function and predict_proba \nare now required to have a monotonic relation between them. The\nmethod check_decision_proba_consistency has been added in\n utils.estimator_checks to check their consistency.\n #7578 by Shubham Bhardwaj\nAll checks in utils.estimator_checks , in particular\n utils.estimator_checks.check_estimator now accept estimator\ninstances. Most other checks do not accept\nestimator classes any more. 
#9019 by Andreas M\u00fcller .\nEnsure that estimators\u2019 attributes ending with _ are not set\nin the constructor but only in the fit method. Most notably,\nensemble estimators (deriving from ensemble.BaseEnsemble )\nnow only have self.estimators_ available after fit .\n #7464 by Lars Buitinck and Loic Esteve .\nCode and Documentation Contributors #\nThanks to everyone who has contributed to the maintenance and improvement of the\nproject since version 0.18, including:\nJoel Nothman, Loic Esteve, Andreas Mueller, Guillaume Lemaitre, Olivier Grisel,\nHanmin Qin, Raghav RV, Alexandre Gramfort, themrmax, Aman Dalmia, Gael\nVaroquaux, Naoya Kanai, Tom Dupr\u00e9 la Tour, Rishikesh, Nelson Liu, Taehoon Lee,\nNelle Varoquaux, Aashil, Mikhail Korobov, Sebastin Santy, Joan Massich, Roman\nYurchak, RAKOTOARISON Herilalaina, Thierry Guillemot, Alexandre Abadie, Carol\nWilling, Balakumaran Manoharan, Josh Karnofsky, Vlad Niculae, Utkarsh Upadhyay,\nDmitry Petrov, Minghui Liu, Srivatsan, Vincent Pham, Albert Thomas, Jake\nVanderPlas, Attractadore, JC Liu, alexandercbooth, chkoar, \u00d3scar N\u00e1jera,\nAarshay Jain, Kyle Gilliam, Ramana Subramanyam, CJ Carey, Clement Joudet, David\nRobles, He Chen, Joris Van den Bossche, Karan Desai, Katie Luangkote, Leland\nMcInnes, Maniteja Nandana, Michele Lacchia, Sergei Lebedev, Shubham Bhardwaj,\nakshay0724, omtcyfz, rickiepark, waterponey, Vathsala Achar, jbDelafosse, Ralf\nGommers, Ekaterina Krivich, Vivek Kumar, Ishank Gulati, Dave Elliott, ldirer,\nReiichiro Nakano, Levi John Wolf, Mathieu Blondel, Sid Kapur, Dougal J.\nSutherland, midinas, mikebenfield, Sourav Singh, Aseem Bansal, Ibraim Ganiev,\nStephen Hoover, AishwaryaRK, Steven C. Howell, Gary Foreman, Neeraj Gangwar,\nTahar, Jon Crall, dokato, Kathy Chen, ferria, Thomas Moreau, Charlie Brummitt,\nNicolas Goix, Adam Kleczewski, Sam Shleifer, Nikita Singh, Basil Beirouti,\nGiorgio Patrini, Manoj Kumar, Rafael Possas, James Bourbeau, James A. 
Bednar,\nJanine Harper, Jaye, Jean Helie, Jeremy Steward, Artsiom, John Wei, Jonathan\nLIgo, Jonathan Rahn, seanpwilliams, Arthur Mensch, Josh Levy, Julian Kuhlmann,\nJulien Aubert, J\u00f6rn Hees, Kai, shivamgargsya, Kat Hempstalk, Kaushik\nLakshmikanth, Kennedy, Kenneth Lyons, Kenneth Myers, Kevin Yap, Kirill Bobyrev,\nKonstantin Podshumok, Arthur Imbert, Lee Murray, toastedcornflakes, Lera, Li\nLi, Arthur Douillard, Mainak Jas, tobycheese, Manraj Singh, Manvendra Singh,\nMarc Meketon, MarcoFalke, Matthew Brett, Matthias Gilch, Mehul Ahuja, Melanie\nGoetz, Meng, Peng, Michael Dezube, Michal Baumgartner, vibrantabhi19, Artem\nGolubin, Milen Paskov, Antonin Carette, Morikko, MrMjauh, NALEPA Emmanuel,\nNamiya, Antoine Wendlinger, Narine Kokhlikyan, NarineK, Nate Guerin, Angus\nWilliams, Ang Lu, Nicole Vavrova, Nitish Pandey, Okhlopkov Daniil Olegovich,\nAndy Craze, Om Prakash, Parminder Singh, Patrick Carlson, Patrick Pei, Paul\nGanssle, Paulo Haddad, Pawe\u0142 Lorek, Peng Yu, Pete Bachant, Peter Bull, Peter\nCsizsek, Peter Wang, Pieter Arthur de Jong, Ping-Yao, Chang, Preston Parry,\nPuneet Mathur, Quentin Hibon, Andrew Smith, Andrew Jackson, 1kastner, Rameshwar\nBhaskaran, Rebecca Bilbro, Remi Rampin, Andrea Esuli, Rob Hall, Robert\nBradshaw, Romain Brault, Aman Pratik, Ruifeng Zheng, Russell Smith, Sachin\nAgarwal, Sailesh Choyal, Samson Tan, Samu\u00ebl Weber, Sarah Brown, Sebastian\nP\u00f6lsterl, Sebastian Raschka, Sebastian Saeger, Alyssa Batula, Abhyuday Pratap\nSingh, Sergey Feldman, Sergul Aydore, Sharan Yalburgi, willduan, Siddharth\nGupta, Sri Krishna, Almer, Stijn Tonk, Allen Riddell, Theofilos Papapanagiotou,\nAlison, Alexis Mignon, Tommy Boucher, Tommy L\u00f6fstedt, Toshihiro Kamishima,\nTyler Folkman, Tyler Lanigan, Alexander Junge, Varun Shenoy, Victor Poughon,\nVilhelm von Ehrenheim, Aleksandr Sandrovskii, Alan Yee, Vlasios Vasileiou,\nWarut Vijitbenjaronk, Yang Zhang, Yaroslav Halchenko, Yichuan Liu, Yuichi\nFujikawa, affanv14, aivision2020, 
xor, andreh7, brady salz, campustrampus,\nAgamemnon Krasoulis, ditenberg, elena-sharova, filipj8, fukatani, gedeck,\nguiniol, guoci, hakaa1, hongkahjun, i-am-xhy, jakirkham, jaroslaw-weber,\njayzed82, jeroko, jmontoyam, jonathan.striebel, josephsalmon, jschendel,\nleereeves, martin-hahn, mathurinm, mehak-sachdeva, mlewis1729, mlliou112,\nmthorrell, ndingwall, nuffe, yangarbiter, plagree, pldtc325, Breno Freitas,\nBrett Olsen, Brian A. Alfano, Brian Burns, polmauri, Brandon Carter, Charlton\nAustin, Chayant T15h, Chinmaya Pancholi, Christian Danielsen, Chung Yen,\nChyi-Kwei Yau, pravarmahajan, DOHMATOB Elvis, Daniel LeJeune, Daniel Hnyk,\nDarius Morawiec, David DeTomaso, David Gasquez, David Haberth\u00fcr, David\nHeryanto, David Kirkby, David Nicholson, rashchedrin, Deborah Gertrude Digges,\nDenis Engemann, Devansh D, Dickson, Bob Baxley, Don86, E. Lynch-Klarup, Ed\nRogers, Elizabeth Ferriss, Ellen-Co2, Fabian Egli, Fang-Chieh Chou, Bing Tian\nDai, Greg Stupp, Grzegorz Szpak, Bertrand Thirion, Hadrien Bertrand, Harizo\nRajaona, zxcvbnius, Henry Lin, Holger Peters, Icyblade Dai, Igor\nAndriushchenko, Ilya, Isaac Laughlin, Iv\u00e1n Vall\u00e9s, Aur\u00e9lien Bellet, JPFrancoia,\nJacob Schreiber, Asish Mahapatra\n"
    },
    {
      "turn_idx": 6,
      "action": "fetch_website_content({\"website_url\": \"https://scikit-learn.org/0.21/\"})",
      "explanation": "",
      "observation": "\nscikit-learn: machine learning in Python \u2014 scikit-learn 0.21.3 documentation\nHome\nInstallation\nDocumentation\nScikit-learn\nTutorials\nUser guide\nAPI\nGlossary\nFAQ\nDevelopment\nRoadmap\nAbout us\nAll available versions\nPDF documentation\nExamples\n\u2039\n\u203a\nscikit-learn\nMachine Learning in Python\nSimple and efficient tools for data mining and data analysis\nAccessible to everybody, and reusable in various contexts\nBuilt on NumPy, SciPy, and matplotlib\nOpen source, commercially usable - BSD license\nClassification\nIdentifying to which category an object belongs to.\nApplications : Spam detection, Image recognition.\nAlgorithms :\u00a0 SVM , nearest neighbors , random forest , \u2026\nExamples\nRegression\nPredicting a continuous-valued attribute associated with an object.\nApplications : Drug response, Stock prices.\nAlgorithms :\u00a0 SVR , ridge regression , Lasso , \u2026\nExamples\nClustering\nAutomatic grouping of similar objects into sets.\nApplications : Customer segmentation, Grouping experiment outcomes\nAlgorithms :\u00a0 k-Means , spectral clustering , mean-shift , \u2026\nExamples\nDimensionality reduction\nReducing the number of random variables to consider.\nApplications : Visualization, Increased efficiency\nAlgorithms :\u00a0 PCA , feature selection , non-negative matrix factorization .\nExamples\nModel selection\nComparing, validating and choosing parameters and models.\nGoal : Improved accuracy via parameter tuning\nModules :\u00a0 grid search , cross validation , metrics .\nExamples\nPreprocessing\nFeature extraction and normalization.\nApplication : Transforming input data such as text for use with machine learning algorithms.\nModules :\u00a0 preprocessing , feature extraction .\nExamples\nNews\nOn-going development:\nWhat's new (Changelog)\nScikit-learn from 0.21 requires Python 3.5 or greater.\nJuly 2019. 
scikit-learn 0.21.3 ( Changelog ) and 0.20.4 ( Changelog ) are available for download.\nMay 2019. scikit-learn 0.21.0 to 0.21.2 are available for download ( Changelog ).\nMarch 2019. scikit-learn 0.20.3 is available for download ( Changelog ).\nSeptember 2018. scikit-learn 0.20.0 is available for download ( Changelog ).\nJuly 2018. scikit-learn 0.19.2 is available for download ( Changelog ).\nJuly 2017. scikit-learn 0.19.0 is available for download ( Changelog ).\nCommunity\nAbout us See authors and contributing\nMore Machine Learning Find related projects\nQuestions? See FAQ and stackoverflow\nMailing list: scikit-learn@python.org\nIRC: #scikit-learn @ freenode\nHelp us, donate!\nCite us!\nRead more about donations\nWho uses scikit-learn?\n\"We use scikit-learn to support leading-edge basic research [...]\"\n\"I think it's the most well-designed ML package I've seen so far.\"\n\"scikit-learn's ease-of-use, performance and overall variety of algorithms implemented has proved invaluable [...].\"\n\"For these tasks, we relied on the excellent scikit-learn package for Python.\"\n\"The great benefit of scikit-learn is its fast learning curve [...]\"\n\"It allows us to do AWesome stuff we would not otherwise accomplish\"\n\"scikit-learn makes doing advanced analysis in Python accessible to anyone.\"\nMore testimonials\nFunding provided by INRIA and others.\nMore information on our contributors\n"
    },
    {
      "turn_idx": 7,
      "action": "fetch_website_content({\"website_url\": \"https://scikit-learn.org/0.19/whats_new.html\"})",
      "explanation": "",
      "observation": "\nRelease history \u2014 scikit-learn 0.19.2 documentation\nHome\nInstallation\nDocumentation\nScikit-learn\nTutorials\nUser guide\nAPI\nFAQ\nContributing\nAll available versions\nPDF documentation\nExamples\nPrevious\nWho is using ...\nWho is using scikit-learn?\nNext\nscikit-learn ...\nscikit-learn Tutorials\nUp\nscikit-learn\nscikit-learn\nscikit-learn v0.19.2\nOther versions\nPlease cite us if you use the software.\nRelease history\nVersion 0.19.2\nRelated changes\nVersion 0.19.1\nChangelog\nAPI changes\nBug fixes\nEnhancements\nCode and Documentation Contributors\nVersion 0.19\nHighlights\nChanged models\nChangelog\nNew features\nEnhancements\nBug fixes\nAPI changes summary\nCode and Documentation Contributors\nVersion 0.18.2\nChangelog\nCode Contributors\nVersion 0.18.1\nChangelog\nEnhancements\nBug fixes\nAPI changes summary\nVersion 0.18\nModel Selection Enhancements and API Changes\nChangelog\nNew features\nEnhancements\nBug fixes\nAPI changes summary\nCode Contributors\nVersion 0.17.1\nChangelog\nBug fixes\nVersion 0.17\nChangelog\nNew features\nEnhancements\nBug fixes\nAPI changes summary\nCode Contributors\nVersion 0.16.1\nChangelog\nBug fixes\nVersion 0.16\nHighlights\nChangelog\nNew features\nEnhancements\nDocumentation improvements\nBug fixes\nAPI changes summary\nCode Contributors\nVersion 0.15.2\nBug fixes\nVersion 0.15.1\nBug fixes\nVersion 0.15\nHighlights\nChangelog\nNew features\nEnhancements\nDocumentation improvements\nBug fixes\nAPI changes summary\nPeople\nVersion 0.14\nChangelog\nAPI changes summary\nPeople\nVersion 0.13.1\nChangelog\nPeople\nVersion 0.13\nNew Estimator Classes\nChangelog\nAPI changes summary\nPeople\nVersion 0.12.1\nChangelog\nPeople\nVersion 0.12\nChangelog\nAPI changes summary\nPeople\nVersion 0.11\nChangelog\nHighlights\nOther changes\nAPI changes summary\nPeople\nVersion 0.10\nChangelog\nAPI changes summary\nPeople\nVersion 0.9\nChangelog\nAPI changes summary\nPeople\nVersion 
0.8\nChangelog\nPeople\nVersion 0.7\nChangelog\nPeople\nVersion 0.6\nChangelog\nPeople\nVersion 0.5\nChangelog\nNew classes\nDocumentation\nFixes\nExamples\nExternal dependencies\nRemoved modules\nMisc\nAuthors\nVersion 0.4\nChangelog\nAuthors\nEarlier versions\nRelease history \u00b6\nVersion 0.19.2 \u00b6\nOctober, 2018\nThis release is exclusively in order to support Python 3.7.\nRelated changes \u00b6\nn_iter_ may vary from previous releases in\n linear_model.LogisticRegression with solver='lbfgs' and\n linear_model.HuberRegressor . For Scipy <= 1.0.0, the optimizer could\nperform more than the requested maximum number of iterations. Now both\nestimators will report at most max_iter iterations even if more were\nperformed. #10723 by Joel Nothman .\nVersion 0.19.1 \u00b6\nOctober, 2017\nThis is a bug-fix release with some minor documentation improvements and\nenhancements to features released in 0.19.0.\nNote there may be minor differences in TSNE output in this release (due to\n #9623 ), in the case where multiple samples have equal distance to some\nsample.\nChangelog \u00b6\nAPI changes \u00b6\nReverted the addition of metrics.ndcg_score and metrics.dcg_score \nwhich had been merged into version 0.19.0 by error. The implementations\nwere broken and undocumented.\nreturn_train_score which was added to\n model_selection.GridSearchCV ,\n model_selection.RandomizedSearchCV and\n model_selection.cross_validate in version 0.19.0 will be changing its\ndefault value from True to False in version 0.21. We found that calculating\ntraining score could have a great effect on cross validation runtime in some\ncases. 
Users should explicitly set return_train_score to False if\nprediction or scoring functions are slow, resulting in a deleterious effect\non CV runtime, or to True if they wish to use the calculated scores.\n #9677 by Kumar Ashutosh and Joel\nNothman .\ncorrelation_models and regression_models from the legacy gaussian\nprocesses implementation have been belatedly deprecated. #9717 by\n Kumar Ashutosh .\nBug fixes \u00b6\nAvoid integer overflows in metrics.matthews_corrcoef .\n #9693 by Sam Steingold .\nFix ValueError in preprocessing.LabelEncoder when using\n inverse_transform on unseen labels. #9816 by Charlie Newey .\nFixed a bug in the objective function for manifold.TSNE (both exact\nand with the Barnes-Hut approximation) when n_components >= 3 .\n #9711 by @goncalo-rodrigues .\nFix regression in model_selection.cross_val_predict where it\nraised an error with method='predict_proba' for some probabilistic\nclassifiers. #9641 by James Bourbeau .\nFixed a bug where datasets.make_classification modified its input\n weights . #9865 by Sachin Kelkar .\nmodel_selection.StratifiedShuffleSplit now works with multioutput\nmulticlass or multilabel data with more than 1000 columns. #9922 by\n Charlie Brummitt .\nFixed a bug with nested and conditional parameter setting, e.g. setting a\npipeline step and its parameter at the same time. #9945 by Andreas\nM\u00fcller and Joel Nothman .\nRegressions in 0.19.0 fixed in 0.19.1:\nFixed a bug where parallelised prediction in random forests was not\nthread-safe and could (rarely) result in arbitrary errors. #9830 by\n Joel Nothman .\nFix regression in model_selection.cross_val_predict where it no\nlonger accepted X as a list. #9600 by Rasul Kerimov .\nFixed handling of model_selection.cross_val_predict for binary classification with\n method='decision_function' . #9593 by Reiichiro Nakano and core devs.\nFix regression in pipeline.Pipeline where it no longer accepted\n steps as a tuple. 
#9604 by Joris Van den Bossche .\nFix bug where n_iter was not properly deprecated, leaving n_iter \nunavailable for interim use in\n linear_model.SGDClassifier , linear_model.SGDRegressor ,\n linear_model.PassiveAggressiveClassifier ,\n linear_model.PassiveAggressiveRegressor and\n linear_model.Perceptron . #9558 by Andreas M\u00fcller .\nDataset fetchers make sure temporary files are closed before removing them,\nwhich caused errors on Windows. #9847 by Joan Massich .\nFixed a regression in manifold.TSNE where it no longer supported\nmetrics other than \u2018euclidean\u2019 and \u2018precomputed\u2019. #9623 by Oli\nBlum .\nEnhancements \u00b6\nOur test suite and utils.estimator_checks.check_estimators can now be\nrun without Nose installed. #9697 by Joan Massich .\nTo improve usability of version 0.19\u2019s pipeline.Pipeline \ncaching, memory now allows joblib.Memory instances.\nThis makes use of the new utils.validation.check_memory helper.\n #9584 by Kumar Ashutosh\nSome fixes to examples: #9750 , #9788 , #9815\nMade a FutureWarning in SGD-based estimators less verbose. 
#9802 by\n Vrishank Bhardwaj .\nCode and Documentation Contributors \u00b6\nWith thanks to:\nJoel Nothman, Loic Esteve, Andreas Mueller, Kumar Ashutosh,\nVrishank Bhardwaj, Hanmin Qin, Rasul Kerimov, James Bourbeau,\nNagarjuna Kumar, Nathaniel Saul, Olivier Grisel, Roman\nYurchak, Reiichiro Nakano, Sachin Kelkar, Sam Steingold,\nYaroslav Halchenko, diegodlh, felix, goncalo-rodrigues,\njkleint, oliblum90, pasbi, Anthony Gitter, Ben Lawson, Charlie\nBrummitt, Didi Bar-Zev, Gael Varoquaux, Joan Massich, Joris\nVan den Bossche, nielsenmarkus11\nVersion 0.19 \u00b6\nAugust 12, 2017\nHighlights \u00b6\nWe are excited to release a number of great new features including\n neighbors.LocalOutlierFactor for anomaly detection,\n preprocessing.QuantileTransformer for robust feature transformation,\nand the multioutput.ClassifierChain meta-estimator to simply account\nfor dependencies between classes in multilabel problems. We have some new\nalgorithms in existing estimators, such as multiplicative update in\n decomposition.NMF and multinomial\n linear_model.LogisticRegression with L1 loss (use solver='saga' ).\nCross validation is now able to return the results from multiple metric\nevaluations. The new model_selection.cross_validate can return many\nscores on the test data as well as training set performance and timings, and we\nhave extended the scoring and refit parameters for grid/randomized\nsearch to handle multiple metrics .\nYou can also learn faster. For instance, the new option to cache\ntransformations in pipeline.Pipeline makes grid\nsearch over pipelines including slow transformations much more efficient. And\nyou can predict faster: if you\u2019re sure you know what you\u2019re doing, you can turn\noff validating that the input is finite using config_context .\nWe\u2019ve made some important fixes too. We\u2019ve fixed a longstanding implementation\nerror in metrics.average_precision_score , so please be cautious with\nprior results reported from that function. 
A number of errors in the\n manifold.TSNE implementation have been fixed, particularly in the\ndefault Barnes-Hut approximation. semi_supervised.LabelSpreading and\n semi_supervised.LabelPropagation have had substantial fixes.\nLabelPropagation was previously broken. LabelSpreading should now correctly\nrespect its alpha parameter.\nChanged models \u00b6\nThe following estimators and functions, when fit with the same data and\nparameters, may produce different models from the previous version. This often\noccurs due to changes in the modelling logic (bug fixes or enhancements), or in\nrandom sampling procedures.\ncluster.KMeans with sparse X and initial centroids given (bug fix)\ncross_decomposition.PLSRegression \nwith scale=True (bug fix)\nensemble.GradientBoostingClassifier and\n ensemble.GradientBoostingRegressor where min_impurity_split is used (bug fix)\ngradient boosting loss='quantile' (bug fix)\nensemble.IsolationForest (bug fix)\nfeature_selection.SelectFdr (bug fix)\nlinear_model.RANSACRegressor (bug fix)\nlinear_model.LassoLars (bug fix)\nlinear_model.LassoLarsIC (bug fix)\nmanifold.TSNE (bug fix)\nneighbors.NearestCentroid (bug fix)\nsemi_supervised.LabelSpreading (bug fix)\nsemi_supervised.LabelPropagation (bug fix)\ntree based models where min_weight_fraction_leaf is used (enhancement)\nDetails are listed in the changelog below.\n(While we are trying to better inform users by providing this information, we\ncannot assure that this list is complete.)\nChangelog \u00b6\nNew features \u00b6\nClassifiers and regressors\nAdded multioutput.ClassifierChain for multi-label\nclassification. By Adam Kleczewski .\nAdded solver 'saga' that implements the improved version of Stochastic\nAverage Gradient, in linear_model.LogisticRegression and\n linear_model.Ridge . 
It allows the use of L1 penalty with\nmultinomial logistic loss, and behaves marginally better than \u2018sag\u2019\nduring the first epochs of ridge and logistic regression.\n #8446 by Arthur Mensch .\nOther estimators\nAdded the neighbors.LocalOutlierFactor class for anomaly\ndetection based on nearest neighbors.\n #5279 by Nicolas Goix and Alexandre Gramfort .\nAdded preprocessing.QuantileTransformer class and\n preprocessing.quantile_transform function for feature\nnormalization based on quantiles.\n #8363 by Denis Engemann ,\n Guillaume Lemaitre , Olivier Grisel , Raghav RV ,\n Thierry Guillemot , and Gael Varoquaux .\nThe new solver 'mu' implements a Multiplicative Update in\n decomposition.NMF , allowing the optimization of all\nbeta-divergences, including the Frobenius norm, the generalized\nKullback-Leibler divergence and the Itakura-Saito divergence.\n #5295 by Tom Dupre la Tour .\nModel selection and evaluation\nmodel_selection.GridSearchCV and\n model_selection.RandomizedSearchCV now support simultaneous\nevaluation of multiple metrics. Refer to the\n Specifying multiple metrics for evaluation section of the user guide for more\ninformation. #7388 by Raghav RV\nAdded the model_selection.cross_validate which allows evaluation\nof multiple metrics. This function returns a dict with more useful\ninformation from cross-validation such as the train scores, fit times and\nscore times.\nRefer to The cross_validate function and multiple metric evaluation section of the user guide\nfor more information. 
#7388 by Raghav RV\nAdded metrics.mean_squared_log_error , which computes\nthe mean square error of the logarithmic transformation of targets,\nparticularly useful for targets with an exponential trend.\n #7655 by Karan Desai .\nAdded metrics.dcg_score and metrics.ndcg_score , which\ncompute Discounted cumulative gain (DCG) and Normalized discounted\ncumulative gain (NDCG).\n #7739 by David Gasquez .\nAdded the model_selection.RepeatedKFold and\n model_selection.RepeatedStratifiedKFold .\n #8120 by Neeraj Gangwar .\nAdded a scorer based on metrics.explained_variance_score .\n #9259 by Hanmin Qin .\nMiscellaneous\nValidation that input data contains no NaN or inf can now be suppressed\nusing config_context , at your own risk. This will save on runtime,\nand may be particularly useful for prediction time. #7548 by\n Joel Nothman .\nAdded a test to ensure parameter listing in docstrings match the\nfunction/class signature. #9206 by Alexandre Gramfort and\n Raghav RV .\nEnhancements \u00b6\nTrees and ensembles\nThe min_weight_fraction_leaf constraint in tree construction is now\nmore efficient, taking a fast path to declare a node a leaf if its weight\nis less than 2 * the minimum. Note that the constructed tree will be\ndifferent from previous versions where min_weight_fraction_leaf is\nused. #7441 by Nelson Liu .\nensemble.GradientBoostingClassifier and ensemble.GradientBoostingRegressor \nnow support sparse input for prediction.\n #6101 by Ibraim Ganiev .\nensemble.VotingClassifier now allows changing estimators by using\n ensemble.VotingClassifier.set_params . An estimator can also be\nremoved by setting it to None .\n #7674 by Yichuan Liu .\ntree.export_graphviz now shows configurable number of decimal\nplaces. 
#8698 by Guillaume Lemaitre .\nAdded flatten_transform parameter to ensemble.VotingClassifier \nto change output shape of transform method to 2 dimensional.\n #7794 by Ibraim Ganiev and\n Herilalaina Rakotoarison .\nLinear, kernelized and related models\nlinear_model.SGDClassifier , linear_model.SGDRegressor ,\n linear_model.PassiveAggressiveClassifier ,\n linear_model.PassiveAggressiveRegressor and\n linear_model.Perceptron now expose max_iter and\n tol parameters, to handle convergence more precisely.\n n_iter parameter is deprecated, and the fitted estimator exposes\nan n_iter_ attribute, with actual number of iterations before\nconvergence. #5036 by Tom Dupre la Tour .\nAdded average parameter to perform weight averaging in\n linear_model.PassiveAggressiveClassifier . #4939 \nby Andrea Esuli .\nlinear_model.RANSACRegressor no longer throws an error\nwhen calling fit if no inliers are found in its first iteration.\nFurthermore, causes of skipped iterations are tracked in newly added\nattributes, n_skips_* .\n #7914 by Michael Horrell .\nIn gaussian_process.GaussianProcessRegressor , method predict \nis a lot faster with return_std=True . #8591 by\n Hadrien Bertrand .\nAdded return_std to predict method of\n linear_model.ARDRegression and\n linear_model.BayesianRidge .\n #7838 by Sergey Feldman .\nMemory usage enhancements: Prevent cast from float32 to float64 in:\n linear_model.MultiTaskElasticNet ;\n linear_model.LogisticRegression when using newton-cg solver; and\n linear_model.Ridge when using svd, sparse_cg, cholesky or lsqr\nsolvers. #8835 , #8061 by Joan Massich and Nicolas\nCordier and Thierry Guillemot .\nOther predictors\nCustom metrics for the neighbors binary trees now have\nfewer constraints: they must take two 1d-arrays and return a float.\n #6288 by Jake Vanderplas .\nalgorithm='auto' in neighbors estimators now chooses the most\nappropriate algorithm for all input types and metrics. 
#9145 by\n Herilalaina Rakotoarison and Reddy Chinthala .\nDecomposition, manifold learning and clustering\ncluster.MiniBatchKMeans and cluster.KMeans \nnow use significantly less memory when assigning data points to their\nnearest cluster center. #7721 by Jon Crall .\ndecomposition.PCA , decomposition.IncrementalPCA and\n decomposition.TruncatedSVD now expose the singular values\nfrom the underlying SVD. They are stored in the attribute\n singular_values_ , like in decomposition.IncrementalPCA .\n #7685 by Tommy L\u00f6fstedt\nFixed the implementation of noise_variance_ in decomposition.PCA .\n #9108 by Hanmin Qin .\ndecomposition.NMF now faster when beta_loss=0 .\n #9277 by @hongkahjun .\nMemory improvements for method barnes_hut in manifold.TSNE\n#7089 by Thomas Moreau and Olivier Grisel .\nOptimization schedule improvements for Barnes-Hut manifold.TSNE \nso the results are closer to the one from the reference implementation\n lvdmaaten/bhtsne by Thomas\nMoreau and Olivier Grisel .\nMemory usage enhancements: Prevent cast from float32 to float64 in\n decomposition.PCA and\n decomposition.randomized_svd_low_rank .\n #9067 by Raghav RV .\nPreprocessing and feature selection\nAdded norm_order parameter to feature_selection.SelectFromModel \nto enable selection of the norm order when coef_ is more than 1D.\n #6181 by Antoine Wendlinger .\nAdded ability to use sparse matrices in feature_selection.f_regression \nwith center=True . #8065 by Daniel LeJeune .\nSmall performance improvement to n-gram creation in\n feature_extraction.text by binding methods for loops and\nspecial-casing unigrams. #7567 by Jaye Doepke\nRelax assumption on the data for the\n kernel_approximation.SkewedChi2Sampler . Since the Skewed-Chi2\nkernel is defined on the open interval ,\nthe transform function should not check whether X < 0 but whether X <\n-self.skewedness . 
#7573 by Romain Brault .\nMade default kernel parameters kernel-dependent in\n kernel_approximation.Nystroem .\n #5229 by Saurabh Bansod and Andreas M\u00fcller .\nModel evaluation and meta-estimators\npipeline.Pipeline is now able to cache transformers\nwithin a pipeline by using the memory constructor parameter.\n #7990 by Guillaume Lemaitre .\npipeline.Pipeline steps can now be accessed as attributes of its\n named_steps attribute. #8586 by Herilalaina\nRakotoarison .\nAdded sample_weight parameter to pipeline.Pipeline.score .\n #7723 by Mikhail Korobov .\nAdded ability to set n_jobs parameter to pipeline.make_union .\nA TypeError will be raised for any other kwargs. #8028 \nby Alexander Booth .\nmodel_selection.GridSearchCV ,\n model_selection.RandomizedSearchCV and\n model_selection.cross_val_score now allow estimators with callable\nkernels which were previously prohibited.\n #8005 by Andreas M\u00fcller .\nmodel_selection.cross_val_predict now returns output of the\ncorrect shape for all values of the argument method .\n #7863 by Aman Dalmia .\nAdded shuffle and random_state parameters to shuffle training\ndata before taking prefixes of it based on training sizes in\n model_selection.learning_curve .\n #7506 by Narine Kokhlikyan .\nmodel_selection.StratifiedShuffleSplit now works with multioutput\nmulticlass (or multilabel) data. #9044 by Vlad Niculae .\nSpeed improvements to model_selection.StratifiedShuffleSplit .\n #5991 by Arthur Mensch and Joel Nothman .\nAdd shuffle parameter to model_selection.train_test_split .\n #8845 by themrmax\nmultioutput.MultiOutputRegressor and multioutput.MultiOutputClassifier \nnow support online learning using partial_fit .\n #8053 by Peng Yu .\nAdd max_train_size parameter to model_selection.TimeSeriesSplit\n#8282 by Aman Dalmia .\nMore clustering metrics are now available through metrics.get_scorer \nand scoring parameters. 
#8117 by Raghav RV .\nMetrics\nmetrics.matthews_corrcoef now supports multiclass classification.\n #8094 by Jon Crall .\nAdd sample_weight parameter to metrics.cohen_kappa_score .\n #8335 by Victor Poughon .\nMiscellaneous\nutils.check_estimator now attempts to ensure that methods\ntransform, predict, etc. do not set attributes on the estimator.\n #7533 by Ekaterina Krivich .\nAdded type checking to the accept_sparse parameter in\n utils.validation methods. This parameter now accepts only boolean,\nstring, or list/tuple of strings. accept_sparse=None is deprecated and\nshould be replaced by accept_sparse=False .\n #7880 by Josh Karnofsky .\nMake it possible to load a chunk of an svmlight formatted file by\npassing a range of bytes to datasets.load_svmlight_file .\n #935 by Olivier Grisel .\ndummy.DummyClassifier and dummy.DummyRegressor \nnow accept non-finite features. #8931 by @Attractadore .\nBug fixes \u00b6\nTrees and ensembles\nFixed a memory leak in trees when using trees with criterion='mae' .\n #8002 by Raghav RV .\nFixed a bug where ensemble.IsolationForest uses an\nincorrect formula for the average path length\n #8549 by Peter Wang .\nFixed a bug where ensemble.AdaBoostClassifier throws\n ZeroDivisionError while fitting data with single class labels.\n #7501 by Dominik Krzeminski .\nFixed a bug in ensemble.GradientBoostingClassifier and\n ensemble.GradientBoostingRegressor where a float being compared\nto 0.0 using == caused a divide by zero error. 
#7970 by\n He Chen .\nFix a bug where ensemble.GradientBoostingClassifier and\n ensemble.GradientBoostingRegressor ignored the\n min_impurity_split parameter.\n #8006 by Sebastian P\u00f6lsterl .\nFixed oob_score in ensemble.BaggingClassifier .\n #8936 by Michael Lewis\nFixed excessive memory usage in prediction for random forests estimators.\n #8672 by Mike Benfield .\nFixed a bug where sample_weight as a list broke random forests in Python 2\n #8068 by @xor .\nFixed a bug where ensemble.IsolationForest fails when\n max_features is less than 1.\n #5732 by Ishank Gulati .\nFix a bug where gradient boosting with loss='quantile' computed\nnegative errors for negative values of ytrue - ypred leading to wrong\nvalues when calling __call__ .\n #8087 by Alexis Mignon\nFix a bug where ensemble.VotingClassifier raises an error\nwhen a numpy array is passed in for weights. #7983 by\n Vincent Pham .\nFixed a bug where tree.export_graphviz raised an error\nwhen the length of features_names does not match n_features in the decision\ntree. #8512 by Li Li .\nLinear, kernelized and related models\nFixed a bug where linear_model.RANSACRegressor.fit may run until\n max_iter if it finds a large inlier group early. #8251 by\n @aivision2020 .\nFixed a bug where naive_bayes.MultinomialNB and\n naive_bayes.BernoulliNB failed when alpha=0 . #5814 by\n Yichuan Liu and Herilalaina Rakotoarison .\nFixed a bug where linear_model.LassoLars does not give\nthe same result as the LassoLars implementation available\nin R (lars library). #7849 by Jair Montoya Martinez .\nFixed a bug in linear_model.RandomizedLasso ,\n linear_model.Lars , linear_model.LassoLars ,\n linear_model.LarsCV and linear_model.LassoLarsCV ,\nwhere the parameter precompute was not used consistently across\nclasses, and some values proposed in the docstring could raise errors.\n #5359 by Tom Dupre la Tour .\nFix inconsistent results between linear_model.RidgeCV and\n linear_model.Ridge when using normalize=True . 
#9302 \nby Alexandre Gramfort .\nFix a bug where linear_model.LassoLars.fit sometimes\nleft coef_ as a list, rather than an ndarray.\n #8160 by CJ Carey .\nFix linear_model.BayesianRidge.fit to return\nridge parameter alpha_ and lambda_ consistent with calculated\ncoefficients coef_ and intercept_ .\n #8224 by Peter Gedeck .\nFixed a bug in svm.OneClassSVM where it returned floats instead of\ninteger classes. #8676 by Vathsala Achar .\nFix AIC/BIC criterion computation in linear_model.LassoLarsIC .\n #9022 by Alexandre Gramfort and Mehmet Basbug .\nFixed a memory leak in our LibLinear implementation. #9024 by\n Sergei Lebedev\nFix bug where stratified CV splitters did not work with\n linear_model.LassoCV . #8973 by\n Paulo Haddad .\nFixed a bug in gaussian_process.GaussianProcessRegressor \nwhen the standard deviation and covariance predicted without fit\nwould fail with an uninformative error by default.\n #6573 by Quazi Marufur Rahman and\n Manoj Kumar .\nOther predictors\nFix semi_supervised.BaseLabelPropagation to correctly implement\n LabelPropagation and LabelSpreading as done in the referenced\npapers. #9239 \nby Andre Ambrosio Boechat , Utkarsh Upadhyay , and Joel Nothman .\nDecomposition, manifold learning and clustering\nFixed the implementation of manifold.TSNE :\nearly_exaggeration parameter had no effect and is now used for the\nfirst 250 optimization iterations.\nFixed the AssertionError: Tree consistency failed exception\nreported in #8992 .\nImprove the learning schedule to match the one from the reference\nimplementation lvdmaaten/bhtsne .\nby Thomas Moreau and Olivier Grisel .\nFix a bug in decomposition.LatentDirichletAllocation \nwhere the perplexity method was returning incorrect results because\nthe transform method returns normalized document topic distributions\nas of version 0.18. 
#7954 by Gary Foreman .\nFix output shape and bugs with n_jobs > 1 in\n decomposition.SparseCoder transform and\n decomposition.sparse_encode \nfor one-dimensional data and one component.\nThis also impacts the output shape of decomposition.DictionaryLearning .\n #8086 by Andreas M\u00fcller .\nFixed the implementation of explained_variance_ \nin decomposition.PCA ,\n decomposition.RandomizedPCA and\n decomposition.IncrementalPCA .\n #9105 by Hanmin Qin .\nFixed the implementation of noise_variance_ in decomposition.PCA .\n #9108 by Hanmin Qin .\nFixed a bug where cluster.DBSCAN gives incorrect\nresult when input is a precomputed sparse matrix with initial\nrows all zero. #8306 by Akshay Gupta\nFix a bug regarding fitting cluster.KMeans with a sparse\narray X and initial centroids, where X\u2019s means were unnecessarily being\nsubtracted from the centroids. #7872 by Josh Karnofsky .\nFixes to the input validation in covariance.EllipticEnvelope .\n #8086 by Andreas M\u00fcller .\nFixed a bug in covariance.MinCovDet where inputting data\nthat produced a singular covariance matrix would cause the helper method\n _c_step to throw an exception.\n #3367 by Jeremy Steward\nFixed a bug in manifold.TSNE affecting convergence of the\ngradient descent. #8768 by David DeTomaso .\nFixed a bug in manifold.TSNE where it stored the incorrect\n kl_divergence_ . #6507 by Sebastian Saeger .\nFixed improper scaling in cross_decomposition.PLSRegression \nwith scale=True . #7819 by jayzed82 .\ncluster.bicluster.SpectralCoclustering and\n cluster.bicluster.SpectralBiclustering fit method conforms\nwith API by accepting y and returning the object. #6126 ,\n #7814 by Laurent Direr and Maniteja\nNandana .\nFix bug where mixture sample methods did not return as many\nsamples as requested. 
#7702 by Levi John Wolf .\nFixed the shrinkage implementation in neighbors.NearestCentroid .\n #9219 by Hanmin Qin .\nPreprocessing and feature selection\nFor sparse matrices, preprocessing.normalize with return_norm=True \nwill now raise a NotImplementedError with \u2018l1\u2019 or \u2018l2\u2019 norm and with\nnorm \u2018max\u2019 the norms returned will be the same as for dense matrices.\n #7771 by Ang Lu .\nFix a bug where feature_selection.SelectFdr did not\nexactly implement Benjamini-Hochberg procedure. It formerly may have\nselected fewer features than it should.\n #7490 by Peng Meng .\nFixed a bug where linear_model.RandomizedLasso and\n linear_model.RandomizedLogisticRegression breaks for\nsparse input. #8259 by Aman Dalmia .\nFix a bug where feature_extraction.FeatureHasher \nmandatorily applied a sparse random projection to the hashed features,\npreventing the use of\n feature_extraction.text.HashingVectorizer in a\npipeline with feature_extraction.text.TfidfTransformer .\n #7565 by Roman Yurchak .\nFix a bug where feature_selection.mutual_info_regression did not\ncorrectly use n_neighbors . #8181 by Guillaume Lemaitre .\nModel evaluation and meta-estimators\nFixed a bug where model_selection.BaseSearchCV.inverse_transform \nreturns self.best_estimator_.transform() instead of\n self.best_estimator_.inverse_transform() .\n #8344 by Akshay Gupta and Rasmus Eriksson .\nAdded classes_ attribute to model_selection.GridSearchCV ,\n model_selection.RandomizedSearchCV , grid_search.GridSearchCV ,\nand grid_search.RandomizedSearchCV that matches the classes_ \nattribute of best_estimator_ . #7661 and #8295 \nby Alyssa Batula , Dylan Werner-Meier ,\nand Stephen Hoover .\nFixed a bug where model_selection.validation_curve \nreused the same estimator for each parameter value.\n #7365 by Aleksandr Sandrovskii .\nmodel_selection.permutation_test_score now works with Pandas\ntypes. 
#5697 by Stijn Tonk .\nSeveral fixes to input validation in\n multiclass.OutputCodeClassifier\n#8086 by Andreas M\u00fcller .\nmulticlass.OneVsOneClassifier \u2019s partial_fit now ensures all\nclasses are provided up-front. #6250 by\n Asish Panda .\nFix multioutput.MultiOutputClassifier.predict_proba to return a\nlist of 2d arrays, rather than a 3d array. In the case where different\ntarget columns had different numbers of classes, a ValueError would be\nraised on trying to stack matrices with different dimensions.\n #8093 by Peter Bull .\nCross validation now works with Pandas datatypes that have a\nread-only index. #9507 by Loic Esteve .\nMetrics\nmetrics.average_precision_score no longer linearly\ninterpolates between operating points, and instead weighs precisions\nby the change in recall since the last operating point, as per the\n Wikipedia entry .\n( #7356 ). By\n Nick Dingwall and Gael Varoquaux .\nFix a bug in metrics.classification._check_targets \nwhich would return 'binary' if y_true and y_pred were\nboth 'binary' but the union of y_true and y_pred was\n 'multiclass' . #8377 by Loic Esteve .\nFixed an integer overflow bug in metrics.confusion_matrix and\nhence metrics.cohen_kappa_score . #8354 , #7929 \nby Joel Nothman and Jon Crall .\nFixed passing of gamma parameter to the chi2 kernel in\n metrics.pairwise.pairwise_kernels #5211 by\n Nick Rhinehart ,\n Saurabh Bansod and Andreas M\u00fcller .\nMiscellaneous\nFixed a bug when datasets.make_classification fails\nwhen generating more than 30 features. #8159 by\n Herilalaina Rakotoarison .\nFixed a bug where datasets.make_moons gives an\nincorrect result when n_samples is odd.\n #8198 by Josh Levy .\nSome fetch_ functions in datasets were ignoring the\n download_if_missing keyword. #7944 by Ralf Gommers .\nFix estimators to accept a sample_weight parameter of type\n pandas.Series in their fit function. 
#7825 by\n Kathleen Chen .\nFix a bug in cases where numpy.cumsum may be numerically unstable,\nraising an exception if instability is identified. #7376 and\n #7331 by Joel Nothman and @yangarbiter .\nFix a bug where base.BaseEstimator.__getstate__ \nobstructed pickling customizations of child-classes, when used in a\nmultiple inheritance context.\n #8316 by Holger Peters .\nUpdate Sphinx-Gallery from 0.1.4 to 0.1.7 for resolving links in\ndocumentation build with Sphinx>1.5 #8010 , #7986 by\n Oscar Najera\nAdd data_home parameter to sklearn.datasets.fetch_kddcup99 .\n #9289 by Loic Esteve .\nFix dataset loaders using Python 3 version of makedirs to also work in\nPython 2. #9284 by Sebastin Santy .\nSeveral minor issues were fixed with thanks to the alerts of\n[lgtm.com]( http://lgtm.com ). #9278 by Jean Helie ,\namong others.\nAPI changes summary \u00b6\nTrees and ensembles\nGradient boosting base models are no longer estimators. By Andreas M\u00fcller .\nAll tree based estimators now accept a min_impurity_decrease \nparameter in lieu of the min_impurity_split , which is now deprecated.\nThe min_impurity_decrease helps stop splitting the nodes in which\nthe weighted impurity decrease from splitting is no longer at least\n min_impurity_decrease . #8449 by Raghav RV .\nLinear, kernelized and related models\nn_iter parameter is deprecated in linear_model.SGDClassifier ,\n linear_model.SGDRegressor ,\n linear_model.PassiveAggressiveClassifier ,\n linear_model.PassiveAggressiveRegressor and\n linear_model.Perceptron . By Tom Dupre la Tour .\nOther predictors\nneighbors.LSHForest has been deprecated and will be\nremoved in 0.21 due to poor performance.\n #9078 by Laurent Direr .\nneighbors.NearestCentroid no longer purports to support\n metric='precomputed' which now raises an error. #8515 by\n Sergul Aydore .\nThe alpha parameter of semi_supervised.LabelPropagation now\nhas no effect and is deprecated to be removed in 0.21. 
#9239 \nby Andre Ambrosio Boechat , Utkarsh Upadhyay , and Joel Nothman .\nDecomposition, manifold learning and clustering\nDeprecate the doc_topic_distr argument of the perplexity method\nin decomposition.LatentDirichletAllocation because the\nuser no longer has access to the unnormalized document topic distribution\nneeded for the perplexity calculation. #7954 by\n Gary Foreman .\nThe n_topics parameter of decomposition.LatentDirichletAllocation \nhas been renamed to n_components and will be removed in version 0.21.\n #8922 by @Attractadore .\ndecomposition.SparsePCA.transform \u2019s ridge_alpha parameter is\ndeprecated in preference for class parameter.\n #8137 by Naoya Kanai .\ncluster.DBSCAN now has a metric_params parameter.\n #8139 by Naoya Kanai .\nPreprocessing and feature selection\nfeature_selection.SelectFromModel now has a partial_fit \nmethod only if the underlying estimator does. By Andreas M\u00fcller .\nfeature_selection.SelectFromModel now validates the threshold \nparameter and sets the threshold_ attribute during the call to\n fit , and no longer during the call to transform . By Andreas\nM\u00fcller .\nThe non_negative parameter in feature_extraction.FeatureHasher \nhas been deprecated, and replaced with a more principled alternative,\n alternate_sign .\n #7565 by Roman Yurchak .\nlinear_model.RandomizedLogisticRegression ,\nand linear_model.RandomizedLasso have been deprecated and will\nbe removed in version 0.21.\n #8995 by Ramana.S .\nModel evaluation and meta-estimators\nDeprecate the fit_params constructor input to the\n model_selection.GridSearchCV and\n model_selection.RandomizedSearchCV in favor\nof passing keyword parameters to the fit methods\nof those classes. 
Data-dependent parameters needed for model\ntraining should be passed as keyword arguments to fit ,\nand conforming to this convention will allow the hyperparameter\nselection classes to be used with tools such as\n model_selection.cross_val_predict .\n #2879 by Stephen Hoover .\nIn version 0.21, the default behavior of splitters that use the\n test_size and train_size parameter will change, such that\nspecifying train_size alone will cause test_size to be the\nremainder. #7459 by Nelson Liu .\nmulticlass.OneVsRestClassifier now has partial_fit ,\n decision_function and predict_proba methods only when the\nunderlying estimator does. #7812 by Andreas M\u00fcller and\n Mikhail Korobov .\nmulticlass.OneVsRestClassifier now has a partial_fit method\nonly if the underlying estimator does. By Andreas M\u00fcller .\nThe decision_function output shape for binary classification in\n multiclass.OneVsRestClassifier and\n multiclass.OneVsOneClassifier is now (n_samples,) to conform\nto scikit-learn conventions. #9100 by Andreas M\u00fcller .\nThe multioutput.MultiOutputClassifier.predict_proba \nfunction used to return a 3d array ( n_samples , n_classes ,\n n_outputs ). In the case where different target columns had different\nnumbers of classes, a ValueError would be raised on trying to stack\nmatrices with different dimensions. This function now returns a list of\narrays where the length of the list is n_outputs , and each array is\n( n_samples , n_classes ) for that particular output.\n #8093 by Peter Bull .\nReplace attribute named_steps dict to utils.Bunch \nin pipeline.Pipeline to enable tab completion in interactive\nenvironment. 
In the case conflict value on named_steps and dict \nattribute, dict behavior will be prioritized.\n #8481 by Herilalaina Rakotoarison .\nMiscellaneous\nDeprecate the y parameter in transform and inverse_transform .\nThe method should not accept y parameter, as it\u2019s used at the prediction time.\n #8174 by Tahar Zanouda , Alexandre Gramfort \nand Raghav RV .\nSciPy >= 0.13.3 and NumPy >= 1.8.2 are now the minimum supported versions\nfor scikit-learn. The following backported functions in\n utils have been removed or deprecated accordingly.\n #8854 and #8874 by Naoya Kanai\nThe store_covariances and covariances_ parameters of\n discriminant_analysis.QuadraticDiscriminantAnalysis \nhas been renamed to store_covariance and covariance_ to be\nconsistent with the corresponding parameter names of the\n discriminant_analysis.LinearDiscriminantAnalysis . They will be\nremoved in version 0.21. #7998 by Jiacheng\nRemoved in 0.19:\nutils.fixes.argpartition\nutils.fixes.array_equal\nutils.fixes.astype\nutils.fixes.bincount\nutils.fixes.expit\nutils.fixes.frombuffer_empty\nutils.fixes.in1d\nutils.fixes.norm\nutils.fixes.rankdata\nutils.fixes.safe_copy\nDeprecated in 0.19, to be removed in 0.21:\nutils.arpack.eigs\nutils.arpack.eigsh\nutils.arpack.svds\nutils.extmath.fast_dot\nutils.extmath.logsumexp\nutils.extmath.norm\nutils.extmath.pinvh\nutils.graph.graph_laplacian\nutils.random.choice\nutils.sparsetools.connected_components\nutils.stats.rankdata\nEstimators with both methods decision_function and predict_proba \nare now required to have a monotonic relation between them. The\nmethod check_decision_proba_consistency has been added in\n utils.estimator_checks to check their consistency.\n #7578 by Shubham Bhardwaj\nAll checks in utils.estimator_checks , in particular\n utils.estimator_checks.check_estimator now accept estimator\ninstances. Most other checks do not accept\nestimator classes any more. 
#9019 by Andreas M\u00fcller .\nEnsure that estimators\u2019 attributes ending with _ are not set\nin the constructor but only in the fit method. Most notably,\nensemble estimators (deriving from ensemble.BaseEnsemble )\nnow only have self.estimators_ available after fit .\n #7464 by Lars Buitinck and Loic Esteve .\nCode and Documentation Contributors \u00b6\nThanks to everyone who has contributed to the maintenance and improvement of the\nproject since version 0.18, including:\nJoel Nothman, Loic Esteve, Andreas Mueller, Guillaume Lemaitre, Olivier Grisel,\nHanmin Qin, Raghav RV, Alexandre Gramfort, themrmax, Aman Dalmia, Gael\nVaroquaux, Naoya Kanai, Tom Dupr\u00e9 la Tour, Rishikesh, Nelson Liu, Taehoon Lee,\nNelle Varoquaux, Aashil, Mikhail Korobov, Sebastin Santy, Joan Massich, Roman\nYurchak, RAKOTOARISON Herilalaina, Thierry Guillemot, Alexandre Abadie, Carol\nWilling, Balakumaran Manoharan, Josh Karnofsky, Vlad Niculae, Utkarsh Upadhyay,\nDmitry Petrov, Minghui Liu, Srivatsan, Vincent Pham, Albert Thomas, Jake\nVanderPlas, Attractadore, JC Liu, alexandercbooth, chkoar, \u00d3scar N\u00e1jera,\nAarshay Jain, Kyle Gilliam, Ramana Subramanyam, CJ Carey, Clement Joudet, David\nRobles, He Chen, Joris Van den Bossche, Karan Desai, Katie Luangkote, Leland\nMcInnes, Maniteja Nandana, Michele Lacchia, Sergei Lebedev, Shubham Bhardwaj,\nakshay0724, omtcyfz, rickiepark, waterponey, Vathsala Achar, jbDelafosse, Ralf\nGommers, Ekaterina Krivich, Vivek Kumar, Ishank Gulati, Dave Elliott, ldirer,\nReiichiro Nakano, Levi John Wolf, Mathieu Blondel, Sid Kapur, Dougal J.\nSutherland, midinas, mikebenfield, Sourav Singh, Aseem Bansal, Ibraim Ganiev,\nStephen Hoover, AishwaryaRK, Steven C. Howell, Gary Foreman, Neeraj Gangwar,\nTahar, Jon Crall, dokato, Kathy Chen, ferria, Thomas Moreau, Charlie Brummitt,\nNicolas Goix, Adam Kleczewski, Sam Shleifer, Nikita Singh, Basil Beirouti,\nGiorgio Patrini, Manoj Kumar, Rafael Possas, James Bourbeau, James A. 
Bednar,\nJanine Harper, Jaye, Jean Helie, Jeremy Steward, Artsiom, John Wei, Jonathan\nLIgo, Jonathan Rahn, seanpwilliams, Arthur Mensch, Josh Levy, Julian Kuhlmann,\nJulien Aubert, J\u00f6rn Hees, Kai, shivamgargsya, Kat Hempstalk, Kaushik\nLakshmikanth, Kennedy, Kenneth Lyons, Kenneth Myers, Kevin Yap, Kirill Bobyrev,\nKonstantin Podshumok, Arthur Imbert, Lee Murray, toastedcornflakes, Lera, Li\nLi, Arthur Douillard, Mainak Jas, tobycheese, Manraj Singh, Manvendra Singh,\nMarc Meketon, MarcoFalke, Matthew Brett, Matthias Gilch, Mehul Ahuja, Melanie\nGoetz, Meng, Peng, Michael Dezube, Michal Baumgartner, vibrantabhi19, Artem\nGolubin, Milen Paskov, Antonin Carette, Morikko, MrMjauh, NALEPA Emmanuel,\nNamiya, Antoine Wendlinger, Narine Kokhlikyan, NarineK, Nate Guerin, Angus\nWilliams, Ang Lu, Nicole Vavrova, Nitish Pandey, Okhlopkov Daniil Olegovich,\nAndy Craze, Om Prakash, Parminder Singh, Patrick Carlson, Patrick Pei, Paul\nGanssle, Paulo Haddad, Pawe\u0142 Lorek, Peng Yu, Pete Bachant, Peter Bull, Peter\nCsizsek, Peter Wang, Pieter Arthur de Jong, Ping-Yao, Chang, Preston Parry,\nPuneet Mathur, Quentin Hibon, Andrew Smith, Andrew Jackson, 1kastner, Rameshwar\nBhaskaran, Rebecca Bilbro, Remi Rampin, Andrea Esuli, Rob Hall, Robert\nBradshaw, Romain Brault, Aman Pratik, Ruifeng Zheng, Russell Smith, Sachin\nAgarwal, Sailesh Choyal, Samson Tan, Samu\u00ebl Weber, Sarah Brown, Sebastian\nP\u00f6lsterl, Sebastian Raschka, Sebastian Saeger, Alyssa Batula, Abhyuday Pratap\nSingh, Sergey Feldman, Sergul Aydore, Sharan Yalburgi, willduan, Siddharth\nGupta, Sri Krishna, Almer, Stijn Tonk, Allen Riddell, Theofilos Papapanagiotou,\nAlison, Alexis Mignon, Tommy Boucher, Tommy L\u00f6fstedt, Toshihiro Kamishima,\nTyler Folkman, Tyler Lanigan, Alexander Junge, Varun Shenoy, Victor Poughon,\nVilhelm von Ehrenheim, Aleksandr Sandrovskii, Alan Yee, Vlasios Vasileiou,\nWarut Vijitbenjaronk, Yang Zhang, Yaroslav Halchenko, Yichuan Liu, Yuichi\nFujikawa, affanv14, aivision2020, 
xor, andreh7, brady salz, campustrampus,\nAgamemnon Krasoulis, ditenberg, elena-sharova, filipj8, fukatani, gedeck,\nguiniol, guoci, hakaa1, hongkahjun, i-am-xhy, jakirkham, jaroslaw-weber,\njayzed82, jeroko, jmontoyam, jonathan.striebel, josephsalmon, jschendel,\nleereeves, martin-hahn, mathurinm, mehak-sachdeva, mlewis1729, mlliou112,\nmthorrell, ndingwall, nuffe, yangarbiter, plagree, pldtc325, Breno Freitas,\nBrett Olsen, Brian A. Alfano, Brian Burns, polmauri, Brandon Carter, Charlton\nAustin, Chayant T15h, Chinmaya Pancholi, Christian Danielsen, Chung Yen,\nChyi-Kwei Yau, pravarmahajan, DOHMATOB Elvis, Daniel LeJeune, Daniel Hnyk,\nDarius Morawiec, David DeTomaso, David Gasquez, David Haberth\u00fcr, David\nHeryanto, David Kirkby, David Nicholson, rashchedrin, Deborah Gertrude Digges,\nDenis Engemann, Devansh D, Dickson, Bob Baxley, Don86, E. Lynch-Klarup, Ed\nRogers, Elizabeth Ferriss, Ellen-Co2, Fabian Egli, Fang-Chieh Chou, Bing Tian\nDai, Greg Stupp, Grzegorz Szpak, Bertrand Thirion, Hadrien Bertrand, Harizo\nRajaona, zxcvbnius, Henry Lin, Holger Peters, Icyblade Dai, Igor\nAndriushchenko, Ilya, Isaac Laughlin, Iv\u00e1n Vall\u00e9s, Aur\u00e9lien Bellet, JPFrancoia,\nJacob Schreiber, Asish Mahapatra\nVersion 0.18.2 \u00b6\nJune 20, 2017\nLast release with Python 2.6 support\nScikit-learn 0.18 is the last major release of scikit-learn to support Python 2.6.\nLater versions of scikit-learn will require Python 2.7 or above.\nChangelog \u00b6\nFixes for compatibility with NumPy 1.13.0: #7946 #8355 by\n Loic Esteve .\nMinor compatibility changes in the examples #9010 #8040\n#9149 .\nCode Contributors \u00b6\nAman Dalmia, Loic Esteve, Nate Guerin, Sergei Lebedev\nVersion 0.18.1 \u00b6\nNovember 11, 2016\nChangelog \u00b6\nEnhancements \u00b6\nImproved sample_without_replacement speed by utilizing\nnumpy.random.permutation for most cases. 
As a result,\nsamples may differ in this release for a fixed random state.\nAffected estimators:\nensemble.BaggingClassifier\nensemble.BaggingRegressor\nlinear_model.RANSACRegressor\nmodel_selection.RandomizedSearchCV\nrandom_projection.SparseRandomProjection\nThis also affects the datasets.make_classification \nmethod.\nBug fixes \u00b6\nFix issue where min_grad_norm and n_iter_without_progress \nparameters were not being utilised by manifold.TSNE .\n #6497 by Sebastian S\u00e4ger\nFix bug for svm\u2019s decision values when decision_function_shape \nis ovr in svm.SVC .\n svm.SVC \u2019s decision_function was incorrect from versions\n0.17.0 through 0.18.0.\n #7724 by Bing Tian Dai\nAttribute explained_variance_ratio of\n discriminant_analysis.LinearDiscriminantAnalysis calculated\nwith SVD and Eigen solver are now of the same length. #7632 \nby JPFrancoia\nFixes issue in Univariate feature selection where score\nfunctions were not accepting multi-label targets. #7676 \nby Mohammed Affan\nFixed setting parameters when calling fit multiple times on\n feature_selection.SelectFromModel . #7756 by Andreas M\u00fcller\nFixes issue in partial_fit method of\n multiclass.OneVsRestClassifier when number of classes used in\n partial_fit was less than the total number of classes in the\ndata. #7786 by Srivatsan Ramesh\nFixes issue in calibration.CalibratedClassifierCV where\nthe sum of probabilities of each class for a data was not 1, and\n CalibratedClassifierCV now handles the case where the training set\nhas less number of classes than the total data. #7799 by\n Srivatsan Ramesh\nFix a bug where sklearn.feature_selection.SelectFdr did not\nexactly implement Benjamini-Hochberg procedure. It formerly may have\nselected fewer features than it should.\n #7490 by Peng Meng .\nsklearn.manifold.LocallyLinearEmbedding now correctly handles\ninteger inputs. 
#6282 by Jake Vanderplas .\nThe min_weight_fraction_leaf parameter of tree-based classifiers and\nregressors now assumes uniform sample weights by default if the\n sample_weight argument is not passed to the fit function.\nPreviously, the parameter was silently ignored. #7301 \nby Nelson Liu .\nNumerical issue with linear_model.RidgeCV on centered data when\n n_features > n_samples . #6178 by Bertrand Thirion\nTree splitting criterion classes\u2019 cloning/pickling is now memory safe\n #7680 by Ibraim Ganiev .\nFixed a bug where decomposition.NMF sets its n_iters_ \nattribute in transform() . #7553 by Ekaterina\nKrivich .\nsklearn.linear_model.LogisticRegressionCV now correctly handles\nstring labels. #5874 by Raghav RV .\nFixed a bug where sklearn.model_selection.train_test_split raised\nan error when stratify is a list of string labels. #7593 by\n Raghav RV .\nFixed a bug where sklearn.model_selection.GridSearchCV and\n sklearn.model_selection.RandomizedSearchCV were not pickleable\nbecause of a pickling bug in np.ma.MaskedArray . #7594 by\n Raghav RV .\nAll cross-validation utilities in sklearn.model_selection now\npermit one time cross-validation splitters for the cv parameter. Also\nnon-deterministic cross-validation splitters (where multiple calls to\n split produce dissimilar splits) can be used as cv parameter.\nThe sklearn.model_selection.GridSearchCV will cross-validate each\nparameter setting on the split produced by the first split call\nto the cross-validation splitter. #7660 by Raghav RV .\nFix bug where preprocessing.MultiLabelBinarizer.fit_transform \nreturned an invalid CSR matrix.\n #7750 by CJ Carey .\nFixed a bug where metrics.pairwise.cosine_distances could return a\nsmall negative distance. 
#7732 by Artsion .\nAPI changes summary \u00b6\nTrees and forests\nThe min_weight_fraction_leaf parameter of tree-based classifiers and\nregressors now assumes uniform sample weights by default if the\n sample_weight argument is not passed to the fit function.\nPreviously, the parameter was silently ignored. #7301 by Nelson\nLiu .\nTree splitting criterion classes\u2019 cloning/pickling is now memory safe.\n #7680 by Ibraim Ganiev .\nLinear, kernelized and related models\nLength of explained_variance_ratio of\n discriminant_analysis.LinearDiscriminantAnalysis \nchanged for both Eigen and SVD solvers. The attribute has now a length\nof min(n_components, n_classes - 1). #7632 \nby JPFrancoia\nNumerical issue with linear_model.RidgeCV on centered data when\n n_features > n_samples . #6178 by Bertrand Thirion\nVersion 0.18 \u00b6\nSeptember 28, 2016\nLast release with Python 2.6 support\nScikit-learn 0.18 will be the last version of scikit-learn to support Python 2.6.\nLater versions of scikit-learn will require Python 2.7 or above.\nModel Selection Enhancements and API Changes \u00b6\nThe model_selection module\nThe new module sklearn.model_selection , which groups together the\nfunctionalities of formerly sklearn.cross_validation ,\n sklearn.grid_search and sklearn.learning_curve , introduces new\npossibilities such as nested cross-validation and better manipulation of\nparameter searches with Pandas.\nMany things will stay the same but there are some key differences. Read\nbelow to know more about the changes.\nData-independent CV splitters enabling nested cross-validation\nThe new cross-validation splitters, defined in the\n sklearn.model_selection , are no longer initialized with any\ndata-dependent parameters such as y . 
Instead they expose a\n split method that takes in the data and yields a generator for the\ndifferent splits.\nThis change makes it possible to use the cross-validation splitters to\nperform nested cross-validation, facilitated by\n model_selection.GridSearchCV and\n model_selection.RandomizedSearchCV utilities.\nThe enhanced cv_results_ attribute\nThe new cv_results_ attribute (of model_selection.GridSearchCV \nand model_selection.RandomizedSearchCV ) introduced in lieu of the\n grid_scores_ attribute is a dict of 1D arrays with elements in each\narray corresponding to the parameter settings (i.e. search candidates).\nThe cv_results_ dict can be easily imported into pandas as a\n DataFrame for exploring the search results.\nThe cv_results_ arrays include scores for each cross-validation split\n(with keys such as 'split0_test_score' ), as well as their mean\n( 'mean_test_score' ) and standard deviation ( 'std_test_score' ).\nThe ranks for the search candidates (based on their mean\ncross-validation score) is available at cv_results_['rank_test_score'] .\nThe parameter values for each parameter is stored separately as numpy\nmasked object arrays. The value, for that search candidate, is masked if\nthe corresponding parameter is not applicable. Additionally a list of all\nthe parameter dicts are stored at cv_results_['params'] .\nParameters n_folds and n_iter renamed to n_splits\nSome parameter names have changed:\nThe n_folds parameter in new model_selection.KFold ,\n model_selection.GroupKFold (see below for the name change),\nand model_selection.StratifiedKFold is now renamed to\n n_splits . 
The n_iter parameter in\n model_selection.ShuffleSplit , the new class\n model_selection.GroupShuffleSplit and\n model_selection.StratifiedShuffleSplit is now renamed to\n n_splits .\nRename of splitter classes which accepts group labels along with data\nThe cross-validation splitters LabelKFold ,\n LabelShuffleSplit , LeaveOneLabelOut and LeavePLabelOut have\nbeen renamed to model_selection.GroupKFold ,\n model_selection.GroupShuffleSplit ,\n model_selection.LeaveOneGroupOut and\n model_selection.LeavePGroupsOut respectively.\nNote the change from singular to plural form in\n model_selection.LeavePGroupsOut .\nFit parameter labels renamed to groups\nThe labels parameter in the split method of the newly renamed\nsplitters model_selection.GroupKFold ,\n model_selection.LeaveOneGroupOut ,\n model_selection.LeavePGroupsOut ,\n model_selection.GroupShuffleSplit is renamed to groups \nfollowing the new nomenclature of their class names.\nParameter n_labels renamed to n_groups\nThe parameter n_labels in the newly renamed\n model_selection.LeavePGroupsOut is changed to n_groups .\nTraining scores and Timing information\ncv_results_ also includes the training scores for each\ncross-validation split (with keys such as 'split0_train_score' ), as\nwell as their mean ( 'mean_train_score' ) and standard deviation\n( 'std_train_score' ). To avoid the cost of evaluating training score,\nset return_train_score=False .\nAdditionally the mean and standard deviation of the times taken to split,\ntrain and score the model across all the cross-validation splits is\navailable at the key 'mean_time' and 'std_time' respectively.\nChangelog \u00b6\nNew features \u00b6\nClassifiers and Regressors\nThe Gaussian Process module has been reimplemented and now offers classification\nand regression estimators through gaussian_process.GaussianProcessClassifier \nand gaussian_process.GaussianProcessRegressor . 
Among other things, the new\nimplementation supports kernel engineering, gradient-based hyperparameter optimization or\nsampling of functions from GP prior and GP posterior. Extensive documentation and\nexamples are provided. By Jan Hendrik Metzen .\nAdded new supervised learning algorithm: Multi-layer Perceptron\n#3204 by Issam H. Laradji\nAdded linear_model.HuberRegressor , a linear model robust to outliers.\n #5291 by Manoj Kumar .\nAdded the multioutput.MultiOutputRegressor meta-estimator. It\nconverts single output regressors to multi-output regressors by fitting\none regressor per output. By Tim Head .\nOther estimators\nNew mixture.GaussianMixture and mixture.BayesianGaussianMixture \nreplace former mixture models, employing faster inference\nfor sounder results. #7295 by Wei Xue and\n Thierry Guillemot .\nClass decomposition.RandomizedPCA is now factored into decomposition.PCA \nand it is available calling with parameter svd_solver='randomized' .\nThe default number of n_iter for 'randomized' has changed to 4. The old\nbehavior of PCA is recovered by svd_solver='full' . An additional solver\ncalls arpack and performs truncated (non-randomized) SVD. By default,\nthe best solver is selected depending on the size of the input and the\nnumber of components requested. #5299 by Giorgio Patrini .\nAdded two functions for mutual information estimation:\n feature_selection.mutual_info_classif and\n feature_selection.mutual_info_regression . These functions can be\nused in feature_selection.SelectKBest and\n feature_selection.SelectPercentile as score functions.\nBy Andrea Bravi and Nikolay Mayorov .\nAdded the ensemble.IsolationForest class for anomaly detection based on\nrandom forests. By Nicolas Goix .\nAdded algorithm=\"elkan\" to cluster.KMeans implementing\nElkan\u2019s fast K-Means algorithm. 
By Andreas M\u00fcller .\nModel selection and evaluation\nAdded metrics.cluster.fowlkes_mallows_score , the Fowlkes Mallows\nIndex which measures the similarity of two clusterings of a set of points\nBy Arnaud Fouchet and Thierry Guillemot .\nAdded metrics.calinski_harabaz_score , which computes the Calinski\nand Harabaz score to evaluate the resulting clustering of a set of points.\nBy Arnaud Fouchet and Thierry Guillemot .\nAdded new cross-validation splitter\n model_selection.TimeSeriesSplit to handle time series data.\n #6586 by YenChen Lin\nThe cross-validation iterators are replaced by cross-validation splitters\navailable from sklearn.model_selection , allowing for nested\ncross-validation. See Model Selection Enhancements and API Changes for more information.\n #4294 by Raghav RV .\nEnhancements \u00b6\nTrees and ensembles\nAdded a new splitting criterion for tree.DecisionTreeRegressor ,\nthe mean absolute error. This criterion can also be used in\n ensemble.ExtraTreesRegressor ,\n ensemble.RandomForestRegressor , and the gradient boosting\nestimators. #6667 by Nelson Liu .\nAdded weighted impurity-based early stopping criterion for decision tree\ngrowth. #6954 by Nelson Liu\nThe random forest, extra tree and decision tree estimators now has a\nmethod decision_path which returns the decision path of samples in\nthe tree. By Arnaud Joly .\nA new example has been added unveiling the decision tree structure.\nBy Arnaud Joly .\nRandom forest, extra trees, decision trees and gradient boosting estimator\naccept the parameter min_samples_split and min_samples_leaf \nprovided as a percentage of the training samples. 
By yelite and Arnaud Joly .\nGradient boosting estimators accept the parameter criterion to specify\nto splitting criterion used in built decision trees.\n #6667 by Nelson Liu .\nThe memory footprint is reduced (sometimes greatly) for\n ensemble.bagging.BaseBagging and classes that inherit from it,\ni.e, ensemble.BaggingClassifier ,\n ensemble.BaggingRegressor , and ensemble.IsolationForest ,\nby dynamically generating attribute estimators_samples_ only when it is\nneeded. By David Staub .\nAdded n_jobs and sample_weight parameters for\n ensemble.VotingClassifier to fit underlying estimators in parallel.\n #5805 by Ibraim Ganiev .\nLinear, kernelized and related models\nIn linear_model.LogisticRegression , the SAG solver is now\navailable in the multinomial case. #5251 by Tom Dupre la Tour .\nlinear_model.RANSACRegressor , svm.LinearSVC and\n svm.LinearSVR now support sample_weight .\nBy Imaculate .\nAdd parameter loss to linear_model.RANSACRegressor to measure the\nerror on the samples for every trial. By Manoj Kumar .\nPrediction of out-of-sample events with Isotonic Regression\n( isotonic.IsotonicRegression ) is now much faster (over 1000x in tests with synthetic\ndata). By Jonathan Arfa .\nIsotonic regression ( isotonic.IsotonicRegression ) now uses a better algorithm to avoid\n O(n^2) behavior in pathological cases, and is also generally faster\n( ##6691 ). By Antony Lee .\nnaive_bayes.GaussianNB now accepts data-independent class-priors\nthrough the parameter priors . By Guillaume Lemaitre .\nlinear_model.ElasticNet and linear_model.Lasso \nnow works with np.float32 input data without converting it\ninto np.float64 . This allows to reduce the memory\nconsumption. 
#6913 by YenChen Lin .\nsemi_supervised.LabelPropagation and semi_supervised.LabelSpreading \nnow accept arbitrary kernel functions in addition to strings knn and rbf .\n #5762 by Utkarsh Upadhyay .\nDecomposition, manifold learning and clustering\nAdded inverse_transform function to decomposition.NMF to compute\ndata matrix of original shape. By Anish Shah .\ncluster.KMeans and cluster.MiniBatchKMeans now works\nwith np.float32 and np.float64 input data without converting it.\nThis allows to reduce the memory consumption by using np.float32 .\n #6846 by Sebastian S\u00e4ger and\n YenChen Lin .\nPreprocessing and feature selection\npreprocessing.RobustScaler now accepts quantile_range parameter.\n #5929 by Konstantin Podshumok .\nfeature_extraction.FeatureHasher now accepts string values.\n #6173 by Ryad Zenine and\n Devashish Deshpande .\nKeyword arguments can now be supplied to func in\n preprocessing.FunctionTransformer by means of the kw_args \nparameter. By Brian McFee .\nfeature_selection.SelectKBest and feature_selection.SelectPercentile \nnow accept score functions that take X, y as input and return only the scores.\nBy Nikolay Mayorov .\nModel evaluation and meta-estimators\nmulticlass.OneVsOneClassifier and multiclass.OneVsRestClassifier \nnow support partial_fit . By Asish Panda and\n Philipp Dowling .\nAdded support for substituting or disabling pipeline.Pipeline \nand pipeline.FeatureUnion components using the set_params \ninterface that powers sklearn.grid_search .\nSee Selecting dimensionality reduction with Pipeline and GridSearchCV \nBy Joel Nothman and Robert McGibbon .\nThe new cv_results_ attribute of model_selection.GridSearchCV \n(and model_selection.RandomizedSearchCV ) can be easily imported\ninto pandas as a DataFrame . Ref Model Selection Enhancements and API Changes for\nmore information. 
#6697 by Raghav RV .\nGeneralization of model_selection.cross_val_predict .\nOne can pass method names such as predict_proba to be used in the cross\nvalidation framework instead of the default predict .\nBy Ori Ziv and Sears Merritt .\nThe training scores and time taken for training followed by scoring for\neach search candidate are now available at the cv_results_ dict.\nSee Model Selection Enhancements and API Changes for more information.\n #7325 by Eugene Chen and Raghav RV .\nMetrics\nAdded labels flag to metrics.log_loss to explicitly provide\nthe labels when the number of classes in y_true and y_pred differ.\n #7239 by Hong Guangguo with help from\n Mads Jensen and Nelson Liu .\nSupport sparse contingency matrices in cluster evaluation\n( metrics.cluster.supervised ) to scale to a large number of\nclusters.\n #7419 by Gregory Stupp and Joel Nothman .\nAdd sample_weight parameter to metrics.matthews_corrcoef .\nBy Jatin Shah and Raghav RV .\nSpeed up metrics.silhouette_score by using vectorized operations.\nBy Manoj Kumar .\nAdd sample_weight parameter to metrics.confusion_matrix .\nBy Bernardo Stein .\nMiscellaneous\nAdded n_jobs parameter to feature_selection.RFECV to compute\nthe score on the test folds in parallel. By Manoj Kumar\nCodebase does not contain C/C++ cython generated files: they are\ngenerated during build. Distribution packages will still contain generated\nC/C++ files. By Arthur Mensch .\nReduce the memory usage for 32-bit float input arrays of\n utils.sparse_func.mean_variance_axis and\n utils.sparse_func.incr_mean_variance_axis by supporting cython\nfused types. By YenChen Lin .\nThe ignore_warnings now accept a category argument to ignore only\nthe warnings of a specified type. 
By Thierry Guillemot .\nAdded parameter return_X_y and return type (data, target) : tuple option to\n load_iris dataset\n #7049 ,\n load_breast_cancer dataset\n #7152 ,\n load_digits dataset,\n load_diabetes dataset,\n load_linnerud dataset,\n load_boston dataset\n #7154 by\n Manvendra Singh .\nSimplification of the clone function, deprecate support for estimators\nthat modify parameters in __init__ . #5540 by Andreas M\u00fcller .\nWhen unpickling a scikit-learn estimator in a different version than the one\nthe estimator was trained with, a UserWarning is raised, see the documentation\non model persistence for more details. ( #7248 )\nBy Andreas M\u00fcller .\nBug fixes \u00b6\nTrees and ensembles\nRandom forest, extra trees, decision trees and gradient boosting\nwon\u2019t accept anymore min_samples_split=1 as at least 2 samples\nare required to split a decision tree node. By Arnaud Joly\nensemble.VotingClassifier now raises NotFittedError if predict ,\n transform or predict_proba are called on the non-fitted estimator.\nby Sebastian Raschka .\nFix bug where ensemble.AdaBoostClassifier and\n ensemble.AdaBoostRegressor would perform poorly if the\n random_state was fixed\n( #7411 ). By Joel Nothman .\nFix bug in ensembles with randomization where the ensemble would not\nset random_state on base estimators in a pipeline or similar nesting.\n( #7411 ). Note, results for ensemble.BaggingClassifier\nensemble.BaggingRegressor , ensemble.AdaBoostClassifier \nand ensemble.AdaBoostRegressor will now differ from previous\nversions. By Joel Nothman .\nLinear, kernelized and related models\nFixed incorrect gradient computation for loss='squared_epsilon_insensitive' in\n linear_model.SGDClassifier and linear_model.SGDRegressor \n( #6764 ). By Wenhua Yang .\nFix bug in linear_model.LogisticRegressionCV where\n solver='liblinear' did not accept class_weights='balanced .\n( #6817 ). 
By Tom Dupre la Tour .\nFix bug in neighbors.RadiusNeighborsClassifier where an error\noccurred when there were outliers being labelled and a weight function\nspecified ( #6902 ). By\n LeonieBorne .\nFix linear_model.ElasticNet sparse decision function to match\noutput with dense in the multioutput case.\nDecomposition, manifold learning and clustering\ndecomposition.RandomizedPCA default number of iterated_power is 4 instead of 3.\n #5141 by Giorgio Patrini .\nutils.extmath.randomized_svd performs 4 power iterations by default, instead or 0.\nIn practice this is enough for obtaining a good approximation of the\ntrue eigenvalues/vectors in the presence of noise. When n_components is\nsmall ( < .1 * min(X.shape) ) n_iter is set to 7, unless the user specifies\na higher number. This improves precision with few components.\n #5299 by Giorgio Patrini .\nWhiten/non-whiten inconsistency between components of decomposition.PCA \nand decomposition.RandomizedPCA (now factored into PCA, see the\nNew features) is fixed. components_ are stored with no whitening.\n #5299 by Giorgio Patrini .\nFixed bug in manifold.spectral_embedding where diagonal of unnormalized\nLaplacian matrix was incorrectly set to 1. #4995 by Peter Fischer .\nFixed incorrect initialization of utils.arpack.eigsh on all\noccurrences. Affects cluster.bicluster.SpectralBiclustering ,\n decomposition.KernelPCA , manifold.LocallyLinearEmbedding ,\nand manifold.SpectralEmbedding ( #5012 ). By\n Peter Fischer .\nAttribute explained_variance_ratio_ calculated with the SVD solver\nof discriminant_analysis.LinearDiscriminantAnalysis now returns\ncorrect results. By JPFrancoia\nPreprocessing and feature selection\npreprocessing.data._transform_selected now always passes a copy\nof X to transform function when copy=True ( #7194 ). 
By Caio\nOliveira .\nModel evaluation and meta-estimators\nmodel_selection.StratifiedKFold now raises error if all n_labels\nfor individual classes is less than n_folds.\n #6182 by Devashish Deshpande .\nFixed bug in model_selection.StratifiedShuffleSplit \nwhere train and test sample could overlap in some edge cases,\nsee #6121 for\nmore details. By Loic Esteve .\nFix in sklearn.model_selection.StratifiedShuffleSplit to\nreturn splits of size train_size and test_size in all cases\n( #6472 ). By Andreas M\u00fcller .\nCross-validation of OneVsOneClassifier and\n OneVsRestClassifier now works with precomputed kernels.\n #7350 by Russell Smith .\nFix incomplete predict_proba method delegation from\n model_selection.GridSearchCV to\n linear_model.SGDClassifier ( #7159 )\nby Yichuan Liu .\nMetrics\nFix bug in metrics.silhouette_score in which clusters of\nsize 1 were incorrectly scored. They should get a score of 0.\nBy Joel Nothman .\nFix bug in metrics.silhouette_samples so that it now works with\narbitrary labels, not just those ranging from 0 to n_clusters - 1.\nFix bug where expected and adjusted mutual information were incorrect if\ncluster contingency cells exceeded 2**16 . By Joel Nothman .\nmetrics.pairwise.pairwise_distances now converts arrays to\nboolean arrays when required in scipy.spatial.distance .\n #5460 by Tom Dupre la Tour .\nFix sparse input support in metrics.silhouette_score as well as\nexample examples/text/document_clustering.py. By YenChen Lin .\nmetrics.roc_curve and metrics.precision_recall_curve no\nlonger round y_score values when creating ROC curves; this was causing\nproblems for users with very small differences in scores ( #7353 ).\nMiscellaneous\nmodel_selection.tests._search._check_param_grid now works correctly with all types\nthat extends/implements Sequence (except string), including range (Python 3.x) and xrange\n(Python 2.x). 
#7323 by Viacheslav Kovalevskyi.\nutils.extmath.randomized_range_finder is more numerically stable when many\npower iterations are requested, since it applies LU normalization by default.\nIf n_iter<2 numerical issues are unlikely, thus no normalization is applied.\nOther normalization options are available: 'none', 'LU' and 'QR' .\n #5141 by Giorgio Patrini .\nFix a bug where some formats of scipy.sparse matrix, and estimators\nwith them as parameters, could not be passed to base.clone .\nBy Loic Esteve .\ndatasets.load_svmlight_file now is able to read long int QID values.\n #7101 by Ibraim Ganiev .\nAPI changes summary \u00b6\nLinear, kernelized and related models\nresidual_metric has been deprecated in linear_model.RANSACRegressor .\nUse loss instead. By Manoj Kumar .\nAccess to public attributes .X_ and .y_ has been deprecated in\n isotonic.IsotonicRegression . By Jonathan Arfa .\nDecomposition, manifold learning and clustering\nThe old mixture.DPGMM is deprecated in favor of the new\n mixture.BayesianGaussianMixture (with the parameter\n weight_concentration_prior_type='dirichlet_process' ).\nThe new class solves the computational\nproblems of the old class and computes the Gaussian mixture with a\nDirichlet process prior faster than before.\n #7295 by Wei Xue and Thierry Guillemot .\nThe old mixture.VBGMM is deprecated in favor of the new\n mixture.BayesianGaussianMixture (with the parameter\n weight_concentration_prior_type='dirichlet_distribution' ).\nThe new class solves the computational\nproblems of the old class and computes the Variational Bayesian Gaussian\nmixture faster than before.\n #6651 by Wei Xue and Thierry Guillemot .\nThe old mixture.GMM is deprecated in favor of the new\n mixture.GaussianMixture . 
The new class computes the Gaussian mixture\nfaster than before and some of computational problems have been solved.\n #6666 by Wei Xue and Thierry Guillemot .\nModel evaluation and meta-estimators\nThe sklearn.cross_validation , sklearn.grid_search and\n sklearn.learning_curve have been deprecated and the classes and\nfunctions have been reorganized into the sklearn.model_selection \nmodule. Ref Model Selection Enhancements and API Changes for more information.\n #4294 by Raghav RV .\nThe grid_scores_ attribute of model_selection.GridSearchCV \nand model_selection.RandomizedSearchCV is deprecated in favor of\nthe attribute cv_results_ .\nRef Model Selection Enhancements and API Changes for more information.\n #6697 by Raghav RV .\nThe parameters n_iter or n_folds in old CV splitters are replaced\nby the new parameter n_splits since it can provide a consistent\nand unambiguous interface to represent the number of train-test splits.\n #7187 by YenChen Lin .\nclasses parameter was renamed to labels in\n metrics.hamming_loss . #7260 by Sebasti\u00e1n Vanrell .\nThe splitter classes LabelKFold , LabelShuffleSplit ,\n LeaveOneLabelOut and LeavePLabelsOut are renamed to\n model_selection.GroupKFold ,\n model_selection.GroupShuffleSplit ,\n model_selection.LeaveOneGroupOut \nand model_selection.LeavePGroupsOut respectively.\nAlso the parameter labels in the split method of the newly\nrenamed splitters model_selection.LeaveOneGroupOut and\n model_selection.LeavePGroupsOut is renamed to\n groups . Additionally in model_selection.LeavePGroupsOut ,\nthe parameter n_labels is renamed to n_groups .\n #6660 by Raghav RV .\nError and loss names for scoring parameters are now prefixed by\n 'neg_' , such as neg_mean_squared_error . 
The unprefixed versions\nare deprecated and will be removed in version 0.20.\n #7261 by Tim Head .\nCode Contributors \u00b6\nAditya Joshi, Alejandro, Alexander Fabisch, Alexander Loginov, Alexander\nMinyushkin, Alexander Rudy, Alexandre Abadie, Alexandre Abraham, Alexandre\nGramfort, Alexandre Saint, alexfields, Alvaro Ulloa, alyssaq, Amlan Kar,\nAndreas Mueller, andrew giessel, Andrew Jackson, Andrew McCulloh, Andrew\nMurray, Anish Shah, Arafat, Archit Sharma, Ariel Rokem, Arnaud Joly, Arnaud\nRachez, Arthur Mensch, Ash Hoover, asnt, b0noI, Behzad Tabibian, Bernardo,\nBernhard Kratzwald, Bhargav Mangipudi, blakeflei, Boyuan Deng, Brandon Carter,\nBrett Naul, Brian McFee, Caio Oliveira, Camilo Lamus, Carol Willing, Cass,\nCeShine Lee, Charles Truong, Chyi-Kwei Yau, CJ Carey, codevig, Colin Ni, Dan\nShiebler, Daniel, Daniel Hnyk, David Ellis, David Nicholson, David Staub, David\nThaler, David Warshaw, Davide Lasagna, Deborah, definitelyuncertain, Didi\nBar-Zev, djipey, dsquareindia, edwinENSAE, Elias Kuthe, Elvis DOHMATOB, Ethan\nWhite, Fabian Pedregosa, Fabio Ticconi, fisache, Florian Wilhelm, Francis,\nFrancis O\u2019Donovan, Gael Varoquaux, Ganiev Ibraim, ghg, Gilles Louppe, Giorgio\nPatrini, Giovanni Cherubin, Giovanni Lanzani, Glenn Qian, Gordon\nMohr, govin-vatsan, Graham Clenaghan, Greg Reda, Greg Stupp, Guillaume\nLemaitre, Gustav M\u00f6rtberg, halwai, Harizo Rajaona, Harry Mavroforakis,\nhashcode55, hdmetor, Henry Lin, Hobson Lane, Hugo Bowne-Anderson,\nIgor Andriushchenko, Imaculate, Inki Hwang, Isaac Sijaranamual,\nIshank Gulati, Issam Laradji, Iver Jordal, jackmartin, Jacob Schreiber, Jake\nVanderplas, James Fiedler, James Routley, Jan Zikes, Janna Brettingen, jarfa, Jason\nLaska, jblackburne, jeff levesque, Jeffrey Blackburne, Jeffrey04, Jeremy Hintz,\njeremynixon, Jeroen, Jessica Yung, Jill-J\u00eann Vie, Jimmy Jia, Jiyuan Qian, Joel\nNothman, johannah, John, John Boersma, John Kirkham, John Moeller,\njonathan.striebel, joncrall, Jordi, Joseph Munoz, 
Joshua Cook, JPFrancoia,\njrfiedler, JulianKahnert, juliathebrave, kaichogami, KamalakerDadi, Kenneth\nLyons, Kevin Wang, kingjr, kjell, Konstantin Podshumok, Kornel Kielczewski,\nKrishna Kalyan, krishnakalyan3, Kvle Putnam, Kyle Jackson, Lars Buitinck,\nldavid, LeiG, LeightonZhang, Leland McInnes, Liang-Chi Hsieh, Lilian Besson,\nlizsz, Loic Esteve, Louis Tiao, L\u00e9onie Borne, Mads Jensen, Maniteja Nandana,\nManoj Kumar, Manvendra Singh, Marco, Mario Krell, Mark Bao, Mark Szepieniec,\nMartin Madsen, MartinBpr, MaryanMorel, Massil, Matheus, Mathieu Blondel,\nMathieu Dubois, Matteo, Matthias Ekman, Max Moroz, Michael Scherer, michiaki\nariga, Mikhail Korobov, Moussa Taifi, mrandrewandrade, Mridul Seth, nadya-p,\nNaoya Kanai, Nate George, Nelle Varoquaux, Nelson Liu, Nick James,\nNickleDave, Nico, Nicolas Goix, Nikolay Mayorov, ningchi, nlathia,\nokbalefthanded, Okhlopkov, Olivier Grisel, Panos Louridas, Paul Strickland,\nPerrine Letellier, pestrickland, Peter Fischer, Pieter, Ping-Yao, Chang,\npracticalswift, Preston Parry, Qimu Zheng, Rachit Kansal, Raghav RV,\nRalf Gommers, Ramana.S, Rammig, Randy Olson, Rob Alexander, Robert Lutz,\nRobin Schucker, Rohan Jain, Ruifeng Zheng, Ryan Yu, R\u00e9my L\u00e9one, saihttam,\nSaiwing Yeung, Sam Shleifer, Samuel St-Jean, Sartaj Singh, Sasank Chilamkurthy,\nsaurabh.bansod, Scott Andrews, Scott Lowe, seales, Sebastian Raschka, Sebastian\nSaeger, Sebasti\u00e1n Vanrell, Sergei Lebedev, shagun Sodhani, shanmuga cv,\nShashank Shekhar, shawpan, shengxiduan, Shota, shuckle16, Skipper Seabold,\nsklearn-ci, SmedbergM, srvanrell, S\u00e9bastien Lerique, Taranjeet, themrmax,\nThierry, Thierry Guillemot, Thomas, Thomas Hallock, Thomas Moreau, Tim Head,\ntKammy, toastedcornflakes, Tom, TomDLT, Toshihiro Kamishima, tracer0tong, Trent\nHauck, trevorstephens, Tue Vo, Varun, Varun Jewalikar, Viacheslav, Vighnesh\nBirodkar, Vikram, Villu Ruusmann, Vinayak Mehta, walter, waterponey, Wenhua\nYang, Wenjian Huang, Will Welch, wyseguy7, xyguo, 
yanlend, Yaroslav Halchenko,\nyelite, Yen, YenChenLin, Yichuan Liu, Yoav Ram, Yoshiki, Zheng RuiFeng, zivori, \u00d3scar N\u00e1jera\nVersion 0.17.1 \u00b6\nFebruary 18, 2016\nChangelog \u00b6\nBug fixes \u00b6\nUpgrade vendored joblib to version 0.9.4 that fixes an important bug in\n joblib.Parallel that can silently yield to wrong results when working\non datasets larger than 1MB:\n https://github.com/joblib/joblib/blob/0.9.4/CHANGES.rst\nFixed reading of Bunch pickles generated with scikit-learn\nversion <= 0.16. This can affect users who have already\ndownloaded a dataset with scikit-learn 0.16 and are loading it\nwith scikit-learn 0.17. See #6196 for\nhow this affected datasets.fetch_20newsgroups . By Loic\nEsteve .\nFixed a bug that prevented using ROC AUC score to perform grid search on\nseveral CPU / cores on large arrays. See #6147 \nBy Olivier Grisel .\nFixed a bug that prevented to properly set the presort parameter\nin ensemble.GradientBoostingRegressor . See #5857 \nBy Andrew McCulloh.\nFixed a joblib error when evaluating the perplexity of a\n decomposition.LatentDirichletAllocation model. See #6258 \nBy Chyi-Kwei Yau.\nVersion 0.17 \u00b6\nNovember 5, 2015\nChangelog \u00b6\nNew features \u00b6\nAll the Scaler classes but preprocessing.RobustScaler can be fitted online by\ncalling partial_fit . By Giorgio Patrini .\nThe new class ensemble.VotingClassifier implements a\n\u201cmajority rule\u201d / \u201csoft voting\u201d ensemble classifier to combine\nestimators for classification. 
By Sebastian Raschka .\nThe new class preprocessing.RobustScaler provides an\nalternative to preprocessing.StandardScaler for feature-wise\ncentering and range normalization that is robust to outliers.\nBy Thomas Unterthiner .\nThe new class preprocessing.MaxAbsScaler provides an\nalternative to preprocessing.MinMaxScaler for feature-wise\nrange normalization when the data is already centered or sparse.\nBy Thomas Unterthiner .\nThe new class preprocessing.FunctionTransformer turns a Python\nfunction into a Pipeline -compatible transformer object.\nBy Joe Jevnik.\nThe new classes cross_validation.LabelKFold and\n cross_validation.LabelShuffleSplit generate train-test folds,\nrespectively similar to cross_validation.KFold and\n cross_validation.ShuffleSplit , except that the folds are\nconditioned on a label array. By Brian McFee , Jean\nKossaifi and Gilles Louppe .\ndecomposition.LatentDirichletAllocation implements the Latent\nDirichlet Allocation topic model with online variational\ninference. By Chyi-Kwei Yau , with code based on an implementation\nby Matt Hoffman. ( #3659 )\nThe new solver sag implements a Stochastic Average Gradient descent\nand is available in both linear_model.LogisticRegression and\n linear_model.Ridge . This solver is very efficient for large\ndatasets. By Danny Sullivan and Tom Dupre la Tour .\n( #4738 )\nThe new solver cd implements a Coordinate Descent in\n decomposition.NMF . Previous solver based on Projected Gradient is\nstill available setting new parameter solver to pg , but is\ndeprecated and will be removed in 0.19, along with\n decomposition.ProjectedGradientNMF and parameters sparseness ,\n eta , beta and nls_max_iter . New parameters alpha and\n l1_ratio control L1 and L2 regularization, and shuffle adds a\nshuffling step in the cd solver.\nBy Tom Dupre la Tour and Mathieu Blondel .\nEnhancements \u00b6\nmanifold.TSNE now supports approximate optimization via the\nBarnes-Hut method, leading to much faster fitting. 
By Christopher Erick Moody.\n( #4025 )\ncluster.mean_shift_.MeanShift now supports parallel execution,\nas implemented in the mean_shift function. By Martino\nSorbaro .\nnaive_bayes.GaussianNB now supports fitting with sample_weight .\nBy Jan Hendrik Metzen .\ndummy.DummyClassifier now supports a prior fitting strategy.\nBy Arnaud Joly .\nAdded a fit_predict method for mixture.GMM and subclasses.\nBy Cory Lorenz .\nAdded the metrics.label_ranking_loss metric.\nBy Arnaud Joly .\nAdded the metrics.cohen_kappa_score metric.\nAdded a warm_start constructor parameter to the bagging ensemble\nmodels to increase the size of the ensemble. By Tim Head .\nAdded option to use multi-output regression metrics without averaging.\nBy Konstantin Shmelkov and Michael Eickenberg .\nAdded stratify option to cross_validation.train_test_split \nfor stratified splitting. By Miroslav Batchkarov.\nThe tree.export_graphviz function now supports aesthetic\nimprovements for tree.DecisionTreeClassifier and\n tree.DecisionTreeRegressor , including options for coloring nodes\nby their majority class or impurity, showing variable names, and using\nnode proportions instead of raw sample counts. By Trevor Stephens .\nImproved speed of newton-cg solver in\n linear_model.LogisticRegression , by avoiding loss computation.\nBy Mathieu Blondel and Tom Dupre la Tour .\nThe class_weight=\"auto\" heuristic in classifiers supporting\n class_weight was deprecated and replaced by the class_weight=\"balanced\" \noption, which has a simpler formula and interpretation.\nBy Hanna Wallach and Andreas M\u00fcller .\nAdd class_weight parameter to automatically weight samples by class\nfrequency for linear_model.PassiveAgressiveClassifier . By\n Trevor Stephens .\nAdded backlinks from the API reference pages to the user guide. 
By\n Andreas M\u00fcller .\nThe labels parameter to sklearn.metrics.f1_score ,\n sklearn.metrics.fbeta_score ,\n sklearn.metrics.recall_score and\n sklearn.metrics.precision_score has been extended.\nIt is now possible to ignore one or more labels, such as where\na multiclass problem has a majority class to ignore. By Joel Nothman .\nAdd sample_weight support to linear_model.RidgeClassifier .\nBy Trevor Stephens .\nProvide an option for sparse output from\n sklearn.metrics.pairwise.cosine_similarity . By\n Jaidev Deshpande .\nAdd minmax_scale to provide a function interface for\n MinMaxScaler . By Thomas Unterthiner .\ndump_svmlight_file now handles multi-label datasets.\nBy Chih-Wei Chang.\nRCV1 dataset loader ( sklearn.datasets.fetch_rcv1 ).\nBy Tom Dupre la Tour .\nThe \u201cWisconsin Breast Cancer\u201d classical two-class classification dataset\nis now included in scikit-learn, available with\n sklearn.dataset.load_breast_cancer .\nUpgraded to joblib 0.9.3 to benefit from the new automatic batching of\nshort tasks. This makes it possible for scikit-learn to benefit from\nparallelism when many very short tasks are executed in parallel, for\ninstance by the grid_search.GridSearchCV meta-estimator\nwith n_jobs > 1 used with a large grid of parameters on a small\ndataset. By Vlad Niculae , Olivier Grisel and Loic Esteve .\nFor more details about changes in joblib 0.9.3 see the release notes:\n https://github.com/joblib/joblib/blob/master/CHANGES.rst#release-093\nImproved speed (3 times per iteration) of\n decomposition.DictLearning with coordinate descent method\nfrom linear_model.Lasso . By Arthur Mensch .\nParallel processing (threaded) for queries of nearest neighbors\n(using the ball-tree) by Nikolay Mayorov.\nAllow datasets.make_multilabel_classification to output\na sparse y . By Kashif Rasul.\ncluster.DBSCAN now accepts a sparse matrix of precomputed\ndistances, allowing memory-efficient distance precomputation. 
By\n Joel Nothman .\ntree.DecisionTreeClassifier now exposes an apply method\nfor retrieving the leaf indices samples are predicted as. By\n Daniel Galvez and Gilles Louppe .\nSpeed up decision tree regressors, random forest regressors, extra trees\nregressors and gradient boosting estimators by computing a proxy\nof the impurity improvement during the tree growth. The proxy quantity is\nsuch that the split that maximizes this value also maximizes the impurity\nimprovement. By Arnaud Joly , Jacob Schreiber \nand Gilles Louppe .\nSpeed up tree based methods by reducing the number of computations needed\nwhen computing the impurity measure taking into account linear\nrelationship of the computed statistics. The effect is particularly\nvisible with extra trees and on datasets with categorical or sparse\nfeatures. By Arnaud Joly .\nensemble.GradientBoostingRegressor and\n ensemble.GradientBoostingClassifier now expose an apply \nmethod for retrieving the leaf indices each sample ends up in under\neach try. By Jacob Schreiber .\nAdd sample_weight support to linear_model.LinearRegression .\nBy Sonny Hu. ( ##4881 )\nAdd n_iter_without_progress to manifold.TSNE to control\nthe stopping criterion. By Santi Villalba. ( #5186 )\nAdded optional parameter random_state in linear_model.Ridge \n, to set the seed of the pseudo random generator used in sag solver. By Tom Dupre la Tour .\nAdded optional parameter warm_start in\n linear_model.LogisticRegression . If set to True, the solvers\n lbfgs , newton-cg and sag will be initialized with the\ncoefficients computed in the previous fit. By Tom Dupre la Tour .\nAdded sample_weight support to linear_model.LogisticRegression for\nthe lbfgs , newton-cg , and sag solvers. By Valentin Stolbunov .\nSupport added to the liblinear solver. By Manoj Kumar .\nAdded optional parameter presort to ensemble.GradientBoostingRegressor \nand ensemble.GradientBoostingClassifier , keeping default behavior\nthe same. 
This allows gradient boosters to turn off presorting when building\ndeep trees or using sparse data. By Jacob Schreiber .\nAltered metrics.roc_curve to drop unnecessary thresholds by\ndefault. By Graham Clenaghan .\nAdded feature_selection.SelectFromModel meta-transformer which can\nbe used along with estimators that have coef_ or feature_importances_ \nattribute to select important features of the input data. By\n Maheshakya Wijewardena , Joel Nothman and Manoj Kumar .\nAdded metrics.pairwise.laplacian_kernel . By Clyde Fare .\ncovariance.GraphLasso allows separate control of the convergence criterion\nfor the Elastic-Net subproblem via the enet_tol parameter.\nImproved verbosity in decomposition.DictionaryLearning .\nensemble.RandomForestClassifier and\n ensemble.RandomForestRegressor no longer explicitly store the\nsamples used in bagging, resulting in a much reduced memory footprint for\nstoring random forest models.\nAdded positive option to linear_model.Lars and\n linear_model.lars_path to force coefficients to be positive.\n( #5131 )\nAdded the X_norm_squared parameter to metrics.pairwise.euclidean_distances \nto provide precomputed squared norms for X .\nAdded the fit_predict method to pipeline.Pipeline .\nAdded the preprocessing.min_max_scale function.\nBug fixes \u00b6\nFixed non-determinism in dummy.DummyClassifier with sparse\nmulti-label output. By Andreas M\u00fcller .\nFixed the output shape of linear_model.RANSACRegressor to\n (n_samples, ) . By Andreas M\u00fcller .\nFixed bug in decomposition.DictLearning when n_jobs < 0 . By\n Andreas M\u00fcller .\nFixed bug where grid_search.RandomizedSearchCV could consume a\nlot of memory for large discrete grids. By Joel Nothman .\nFixed bug in linear_model.LogisticRegressionCV where penalty was ignored\nin the final fit. By Manoj Kumar .\nFixed bug in ensemble.forest.ForestClassifier while computing\noob_score and X is a sparse.csc_matrix. 
By Ankur Ankan .\nAll regressors now consistently handle and warn when given y that is of\nshape (n_samples, 1) . By Andreas M\u00fcller and Henry Lin.\n( #5431 )\nFix in cluster.KMeans cluster reassignment for sparse input by\n Lars Buitinck .\nFixed a bug in lda.LDA that could cause asymmetric covariance\nmatrices when using shrinkage. By Martin Billinger .\nFixed cross_validation.cross_val_predict for estimators with\nsparse predictions. By Buddha Prakash.\nFixed the predict_proba method of linear_model.LogisticRegression \nto use soft-max instead of one-vs-rest normalization. By Manoj Kumar .\n( #5182 )\nFixed the partial_fit method of linear_model.SGDClassifier \nwhen called with average=True . By Andrew Lamb .\n( #5282 )\nDataset fetchers use different filenames under Python 2 and Python 3 to\navoid pickling compatibility issues. By Olivier Grisel .\n( #5355 )\nFixed a bug in naive_bayes.GaussianNB which caused classification\nresults to depend on scale. By Jake Vanderplas .\nFixed temporarily linear_model.Ridge , which was incorrect\nwhen fitting the intercept in the case of sparse data. The fix\nautomatically changes the solver to \u2018sag\u2019 in this case.\n #5360 by Tom Dupre la Tour .\nFixed a performance bug in decomposition.RandomizedPCA on data\nwith a large number of features and fewer samples. 
( #4478 )\nBy Andreas M\u00fcller , Loic Esteve and Giorgio Patrini .\nFixed bug in cross_decomposition.PLS that yielded unstable and\nplatform dependent output, and failed on fit_transform .\nBy Arthur Mensch .\nFixes to the Bunch class used to store datasets.\nFixed ensemble.plot_partial_dependence ignoring the\n percentiles parameter.\nProviding a set as vocabulary in CountVectorizer no longer\nleads to inconsistent results when pickling.\nFixed the conditions on when a precomputed Gram matrix needs to\nbe recomputed in linear_model.LinearRegression ,\n linear_model.OrthogonalMatchingPursuit ,\n linear_model.Lasso and linear_model.ElasticNet .\nFixed inconsistent memory layout in the coordinate descent solver\nthat affected linear_model.DictionaryLearning and\n covariance.GraphLasso . ( #5337 )\nBy Olivier Grisel .\nmanifold.LocallyLinearEmbedding no longer ignores the reg \nparameter.\nNearest Neighbor estimators with custom distance metrics can now be pickled.\n( #4362 )\nFixed a bug in pipeline.FeatureUnion where transformer_weights \nwere not properly handled when performing grid-searches.\nFixed a bug in linear_model.LogisticRegression and\n linear_model.LogisticRegressionCV when using\n class_weight='balanced'```or ``class_weight='auto' .\nBy Tom Dupre la Tour .\nFixed bug #5495 when\ndoing OVR(SVC(decision_function_shape=\u201dovr\u201d)). Fixed by\n Elvis Dohmatob .\nAPI changes summary \u00b6\nAttribute data_min , data_max and data_range in\n preprocessing.MinMaxScaler are deprecated and won\u2019t be available\nfrom 0.19. Instead, the class now exposes data_min_ , data_max_ \nand data_range_ . By Giorgio Patrini .\nAll Scaler classes now have an scale_ attribute, the feature-wise\nrescaling applied by their transform methods. The old attribute std_ \nin preprocessing.StandardScaler is deprecated and superseded\nby scale_ ; it won\u2019t be available in 0.19. 
By Giorgio Patrini .\nsvm.SVC` and svm.NuSVC now have an decision_function_shape \nparameter to make their decision function of shape (n_samples, n_classes) \nby setting decision_function_shape='ovr' . This will be the default behavior\nstarting in 0.19. By Andreas M\u00fcller .\nPassing 1D data arrays as input to estimators is now deprecated as it\ncaused confusion in how the array elements should be interpreted\nas features or as samples. All data arrays are now expected\nto be explicitly shaped (n_samples, n_features) .\nBy Vighnesh Birodkar .\nlda.LDA and qda.QDA have been moved to\n discriminant_analysis.LinearDiscriminantAnalysis and\n discriminant_analysis.QuadraticDiscriminantAnalysis .\nThe store_covariance and tol parameters have been moved from\nthe fit method to the constructor in\n discriminant_analysis.LinearDiscriminantAnalysis and the\n store_covariances and tol parameters have been moved from the\nfit method to the constructor in\n discriminant_analysis.QuadraticDiscriminantAnalysis .\nModels inheriting from _LearntSelectorMixin will no longer support the\ntransform methods. (i.e, RandomForests, GradientBoosting, LogisticRegression,\nDecisionTrees, SVMs and SGD related models). Wrap these models around the\nmetatransfomer feature_selection.SelectFromModel to remove\nfeatures (according to coefs_ or feature_importances_ )\nwhich are below a certain threshold value instead.\ncluster.KMeans re-runs cluster-assignments in case of non-convergence,\nto ensure consistency of predict(X) and labels_ . By\n Vighnesh Birodkar .\nClassifier and Regressor models are now tagged as such using the\n _estimator_type attribute.\nCross-validation iterators always provide indices into training and test set,\nnot boolean masks.\nThe decision_function on all regressors was deprecated and will be\nremoved in 0.19. 
Use predict instead.\ndatasets.load_lfw_pairs is deprecated and will be removed in 0.19.\nUse datasets.fetch_lfw_pairs instead.\nThe deprecated hmm module was removed.\nThe deprecated Bootstrap cross-validation iterator was removed.\nThe deprecated Ward and WardAgglomerative classes have been removed.\nUse clustering.AgglomerativeClustering instead.\ncross_validation.check_cv is now a public function.\nThe property residues_ of linear_model.LinearRegression is deprecated\nand will be removed in 0.19.\nThe deprecated n_jobs parameter of linear_model.LinearRegression has been moved\nto the constructor.\nRemoved deprecated class_weight parameter from linear_model.SGDClassifier \u2019s fit \nmethod. Use the construction parameter instead.\nThe deprecated support for the sequence of sequences (or list of lists) multilabel\nformat was removed. To convert to and from the supported binary\nindicator matrix format, use\n MultiLabelBinarizer .\nThe behavior of calling the inverse_transform method of Pipeline.pipeline will\nchange in 0.19. It will no longer reshape one-dimensional input to two-dimensional input.\nThe deprecated attributes indicator_matrix_ , multilabel_ and classes_ of\n preprocessing.LabelBinarizer were removed.\nUsing gamma=0 in svm.SVC and svm.SVR to automatically set the\ngamma to 1. 
/ n_features is deprecated and will be removed in 0.19.\nUse gamma=\"auto\" instead.\nCode Contributors \u00b6\nAaron Schumacher, Adithya Ganesh, akitty, Alexandre Gramfort, Alexey Grigorev,\nAli Baharev, Allen Riddell, Ando Saabas, Andreas Mueller, Andrew Lamb, Anish\nShah, Ankur Ankan, Anthony Erlinger, Ari Rouvinen, Arnaud Joly, Arnaud Rachez,\nArthur Mensch, banilo, Barmaley.exe, benjaminirving, Boyuan Deng, Brett Naul,\nBrian McFee, Buddha Prakash, Chi Zhang, Chih-Wei Chang, Christof Angermueller,\nChristoph Gohlke, Christophe Bourguignat, Christopher Erick Moody, Chyi-Kwei\nYau, Cindy Sridharan, CJ Carey, Clyde-fare, Cory Lorenz, Dan Blanchard, Daniel\nGalvez, Daniel Kronovet, Danny Sullivan, Data1010, David, David D Lowe, David\nDotson, djipey, Dmitry Spikhalskiy, Donne Martin, Dougal J. Sutherland, Dougal\nSutherland, edson duarte, Eduardo Caro, Eric Larson, Eric Martin, Erich\nSchubert, Fernando Carrillo, Frank C. Eckert, Frank Zalkow, Gael Varoquaux,\nGaniev Ibraim, Gilles Louppe, Giorgio Patrini, giorgiop, Graham Clenaghan,\nGryllos Prokopis, gwulfs, Henry Lin, Hsuan-Tien Lin, Immanuel Bayer, Ishank\nGulati, Jack Martin, Jacob Schreiber, Jaidev Deshpande, Jake Vanderplas, Jan\nHendrik Metzen, Jean Kossaifi, Jeffrey04, Jeremy, jfraj, Jiali Mei,\nJoe Jevnik, Joel Nothman, John Kirkham, John Wittenauer, Joseph, Joshua Loyal,\nJungkook Park, KamalakerDadi, Kashif Rasul, Keith Goodman, Kian Ho, Konstantin\nShmelkov, Kyler Brown, Lars Buitinck, Lilian Besson, Loic Esteve, Louis Tiao,\nmaheshakya, Maheshakya Wijewardena, Manoj Kumar, MarkTab marktab.net, Martin\nKu, Martin Spacek, MartinBpr, martinosorb, MaryanMorel, Masafumi Oyamada,\nMathieu Blondel, Matt Krump, Matti Lyra, Maxim Kolganov, mbillinger, mhg,\nMichael Heilman, Michael Patterson, Miroslav Batchkarov, Nelle Varoquaux,\nNicolas, Nikolay Mayorov, Olivier Grisel, Omer Katz, \u00d3scar N\u00e1jera, Pauli\nVirtanen, Peter Fischer, Peter Prettenhofer, Phil Roth, pianomania, Preston\nParry, Raghav RV, 
Rob Zinkov, Robert Layton, Rohan Ramanath, Saket Choudhary,\nSam Zhang, santi, saurabh.bansod, scls19fr, Sebastian Raschka, Sebastian\nSaeger, Shivan Sornarajah, SimonPL, sinhrks, Skipper Seabold, Sonny Hu, sseg,\nStephen Hoover, Steven De Gryze, Steven Seguin, Theodore Vasiloudis, Thomas\nUnterthiner, Tiago Freitas Pereira, Tian Wang, Tim Head, Timothy Hopper,\ntokoroten, Tom Dupr\u00e9 la Tour, Trevor Stephens, Valentin Stolbunov, Vighnesh\nBirodkar, Vinayak Mehta, Vincent, Vincent Michel, vstolbunov, wangz10, Wei Xue,\nYucheng Low, Yury Zhauniarovich, Zac Stewart, zhai_pro, Zichen Wang\nVersion 0.16.1 \u00b6\nApril 14, 2015\nChangelog \u00b6\nBug fixes \u00b6\nAllow input data larger than block_size in\n covariance.LedoitWolf by Andreas M\u00fcller .\nFix a bug in isotonic.IsotonicRegression deduplication that\ncaused unstable result in calibration.CalibratedClassifierCV by\n Jan Hendrik Metzen .\nFix sorting of labels in func: preprocessing.label_binarize by Michael Heilman.\nFix several stability and convergence issues in\n cross_decomposition.CCA and\n cross_decomposition.PLSCanonical by Andreas M\u00fcller\nFix a bug in cluster.KMeans when precompute_distances=False \non fortran-ordered data.\nFix a speed regression in ensemble.RandomForestClassifier \u2019s predict \nand predict_proba by Andreas M\u00fcller .\nFix a regression where utils.shuffle converted lists and dataframes to arrays, by Olivier Grisel\nVersion 0.16 \u00b6\nMarch 26, 2015\nHighlights \u00b6\nSpeed improvements (notably in cluster.DBSCAN ), reduced memory\nrequirements, bug-fixes and better default settings.\nMultinomial Logistic regression and a path algorithm in\n linear_model.LogisticRegressionCV .\nOut-of core learning of PCA via decomposition.IncrementalPCA .\nProbability callibration of classifiers using\n calibration.CalibratedClassifierCV .\ncluster.Birch clustering method for large-scale datasets.\nScalable approximate nearest neighbors search with Locality-sensitive\nhashing 
forests in neighbors.LSHForest .\nImproved error messages and better validation when using malformed input data.\nMore robust integration with pandas dataframes.\nChangelog \u00b6\nNew features \u00b6\nThe new neighbors.LSHForest implements locality-sensitive hashing\nfor approximate nearest neighbors search. By Maheshakya Wijewardena .\nAdded svm.LinearSVR . This class uses the liblinear implementation\nof Support Vector Regression which is much faster for large\nsample sizes than svm.SVR with linear kernel. By\n Fabian Pedregosa and Qiang Luo.\nIncremental fit for GaussianNB .\nAdded sample_weight support to dummy.DummyClassifier and\n dummy.DummyRegressor . By Arnaud Joly .\nAdded the metrics.label_ranking_average_precision_score metrics.\nBy Arnaud Joly .\nAdd the metrics.coverage_error metrics. By Arnaud Joly .\nAdded linear_model.LogisticRegressionCV . By\n Manoj Kumar , Fabian Pedregosa , Gael Varoquaux \nand Alexandre Gramfort .\nAdded warm_start constructor parameter to make it possible for any\ntrained forest model to grow additional trees incrementally. By\n Laurent Direr .\nAdded sample_weight support to ensemble.GradientBoostingClassifier and\n ensemble.GradientBoostingRegressor . By Peter Prettenhofer .\nAdded decomposition.IncrementalPCA , an implementation of the PCA\nalgorithm that supports out-of-core learning with a partial_fit \nmethod. By Kyle Kastner .\nAveraged SGD for SGDClassifier \nand SGDRegressor By\n Danny Sullivan .\nAdded cross_val_predict \nfunction which computes cross-validated estimates. By Luis Pedro Coelho\nAdded linear_model.TheilSenRegressor , a robust\ngeneralized-median-based estimator. By Florian Wilhelm .\nAdded metrics.median_absolute_error , a robust metric.\nBy Gael Varoquaux and Florian Wilhelm .\nAdd cluster.Birch , an online clustering algorithm. By\n Manoj Kumar , Alexandre Gramfort and Joel Nothman .\nAdded shrinkage support to discriminant_analysis.LinearDiscriminantAnalysis \nusing two new solvers. 
By Clemens Brunner and Martin Billinger .\nAdded kernel_ridge.KernelRidge , an implementation of\nkernelized ridge regression.\nBy Mathieu Blondel and Jan Hendrik Metzen .\nAll solvers in linear_model.Ridge now support sample_weight .\nBy Mathieu Blondel .\nAdded cross_validation.PredefinedSplit cross-validation\nfor fixed user-provided cross-validation folds.\nBy Thomas Unterthiner .\nAdded calibration.CalibratedClassifierCV , an approach for\ncalibrating the predicted probabilities of a classifier.\nBy Alexandre Gramfort , Jan Hendrik Metzen , Mathieu Blondel \nand Balazs Kegl .\nEnhancements \u00b6\nAdd option return_distance in hierarchical.ward_tree \nto return distances between nodes for both structured and unstructured\nversions of the algorithm. By Matteo Visconti di Oleggio Castello .\nThe same option was added in hierarchical.linkage_tree .\nBy Manoj Kumar\nAdd support for sample weights in scorer objects. Metrics with sample\nweight support will automatically benefit from it. By Noel Dawe and\n Vlad Niculae .\nAdded newton-cg and lbfgs solver support in\n linear_model.LogisticRegression . By Manoj Kumar .\nAdd selection=\"random\" parameter to implement stochastic coordinate\ndescent for linear_model.Lasso , linear_model.ElasticNet \nand related. By Manoj Kumar .\nAdd sample_weight parameter to\n metrics.jaccard_similarity_score and metrics.log_loss .\nBy Jatin Shah .\nSupport sparse multilabel indicator representation in\n preprocessing.LabelBinarizer and\n multiclass.OneVsRestClassifier (by Hamzeh Alsalhi with thanks\nto Rohit Sivaprasad), as well as evaluation metrics (by\n Joel Nothman ).\nAdd sample_weight parameter to metrics.jaccard_similarity_score .\nBy Jatin Shah .\nAdd support for multiclass in metrics.hinge_loss . Added labels=None \nas optional parameter. 
By Saurabh Jha .\nAdd sample_weight parameter to metrics.hinge_loss .\nBy Saurabh Jha .\nAdd multi_class=\"multinomial\" option in\n linear_model.LogisticRegression to implement a Logistic\nRegression solver that minimizes the cross-entropy or multinomial loss\ninstead of the default One-vs-Rest setting. Supports lbfgs and\n newton-cg solvers. By Lars Buitinck and Manoj Kumar . Solver option\n newton-cg by Simon Wu.\nDictVectorizer can now perform fit_transform on an iterable in a\nsingle pass, when giving the option sort=False . By Dan\nBlanchard .\nGridSearchCV and RandomizedSearchCV can now be\nconfigured to work with estimators that may fail and raise errors on\nindividual folds. This option is controlled by the error_score \nparameter. This does not affect errors raised on re-fit. By\n Michal Romaniuk .\nAdd digits parameter to metrics.classification_report to allow\nreport to show different precision of floating point numbers. By\n Ian Gilmore .\nAdd a quantile prediction strategy to the dummy.DummyRegressor .\nBy Aaron Staple .\nAdd handle_unknown option to preprocessing.OneHotEncoder to\nhandle unknown categorical features more gracefully during transform.\nBy Manoj Kumar .\nAdded support for sparse input data to decision trees and their ensembles.\nBy Fares Hedyati and Arnaud Joly .\nOptimized cluster.AffinityPropagation by reducing the number of\nmemory allocations of large temporary data-structures. By Antony Lee .\nParellization of the computation of feature importances in random forest.\nBy Olivier Grisel and Arnaud Joly .\nAdd n_iter_ attribute to estimators that accept a max_iter attribute\nin their constructor. By Manoj Kumar .\nAdded decision function for multiclass.OneVsOneClassifier \nBy Raghav RV and Kyle Beauchamp .\nneighbors.kneighbors_graph and radius_neighbors_graph \nsupport non-Euclidean metrics. 
By Manoj Kumar\nParameter connectivity in cluster.AgglomerativeClustering \nand family now accept callables that return a connectivity matrix.\nBy Manoj Kumar .\nSparse support for paired_distances . By Joel Nothman .\ncluster.DBSCAN now supports sparse input and sample weights and\nhas been optimized: the inner loop has been rewritten in Cython and\nradius neighbors queries are now computed in batch. By Joel Nothman \nand Lars Buitinck .\nAdd class_weight parameter to automatically weight samples by class\nfrequency for ensemble.RandomForestClassifier ,\n tree.DecisionTreeClassifier , ensemble.ExtraTreesClassifier \nand tree.ExtraTreeClassifier . By Trevor Stephens .\ngrid_search.RandomizedSearchCV now does sampling without\nreplacement if all parameters are given as lists. By Andreas M\u00fcller .\nParallelized calculation of pairwise_distances is now supported\nfor scipy metrics and custom callables. By Joel Nothman .\nAllow the fitting and scoring of all clustering algorithms in\n pipeline.Pipeline . By Andreas M\u00fcller .\nMore robust seeding and improved error messages in cluster.MeanShift \nby Andreas M\u00fcller .\nMake the stopping criterion for mixture.GMM ,\n mixture.DPGMM and mixture.VBGMM less dependent on the\nnumber of samples by thresholding the average log-likelihood change\ninstead of its sum over all samples. By Herv\u00e9 Bredin .\nThe outcome of manifold.spectral_embedding was made deterministic\nby flipping the sign of eigenvectors. By Hasil Sharma .\nSignificant performance and memory usage improvements in\n preprocessing.PolynomialFeatures . By Eric Martin .\nNumerical stability improvements for preprocessing.StandardScaler \nand preprocessing.scale . 
By Nicolas Goix\nsvm.SVC fitted on sparse input now implements decision_function .\nBy Rob Zinkov and Andreas M\u00fcller .\ncross_validation.train_test_split now preserves the input type,\ninstead of converting to numpy arrays.\nDocumentation improvements \u00b6\nAdded example of using FeatureUnion for heterogeneous input.\nBy Matt Terry\nDocumentation on scorers was improved, to highlight the handling of loss\nfunctions. By Matt Pico .\nA discrepancy between liblinear output and scikit-learn\u2019s wrappers\nis now noted. By Manoj Kumar .\nImproved documentation generation: examples referring to a class or\nfunction are now shown in a gallery on the class/function\u2019s API reference\npage. By Joel Nothman .\nMore explicit documentation of sample generators and of data\ntransformation. By Joel Nothman .\nsklearn.neighbors.BallTree and sklearn.neighbors.KDTree \nused to point to empty pages stating that they are aliases of BinaryTree.\nThis has been fixed to show the correct class docs. By Manoj Kumar .\nAdded silhouette plots for analysis of KMeans clustering using\n metrics.silhouette_samples and metrics.silhouette_score .\nSee Selecting the number of clusters with silhouette analysis on KMeans clustering\nBug fixes \u00b6\nMetaestimators now support ducktyping for the presence of decision_function ,\n predict_proba and other methods. This fixes behavior of\n grid_search.GridSearchCV ,\n grid_search.RandomizedSearchCV , pipeline.Pipeline ,\n feature_selection.RFE , feature_selection.RFECV when nested.\nBy Joel Nothman\nThe scoring attribute of grid-search and cross-validation methods is no longer\nignored when a grid_search.GridSearchCV is given as a base estimator or\nthe base estimator doesn\u2019t have predict.\nThe function hierarchical.ward_tree now returns the children in\nthe same order for both the structured and unstructured versions. 
By\n Matteo Visconti di Oleggio Castello .\nfeature_selection.RFECV now correctly handles cases when\n step is not equal to 1. By Nikolay Mayorov\nThe decomposition.PCA now undoes whitening in its\n inverse_transform . Also, its components_ now always have unit\nlength. By Michael Eickenberg .\nFix incomplete download of the dataset when\n datasets.download_20newsgroups is called. By Manoj Kumar .\nVarious fixes to the Gaussian processes subpackage by Vincent Dubourg\nand Jan Hendrik Metzen.\nCalling partial_fit with class_weight=='auto' throws an\nappropriate error message and suggests a work around.\nBy Danny Sullivan .\nRBFSampler with gamma=g \nformerly approximated rbf_kernel \nwith gamma=g/2. ; the definition of gamma is now consistent,\nwhich may substantially change your results if you use a fixed value.\n(If you cross-validated over gamma , it probably doesn\u2019t matter\ntoo much.) By Dougal Sutherland .\nPipeline object delegate the classes_ attribute to the underlying\nestimator. It allows, for instance, to make bagging of a pipeline object.\nBy Arnaud Joly\nneighbors.NearestCentroid now uses the median as the centroid\nwhen metric is set to manhattan . It was using the mean before.\nBy Manoj Kumar\nFix numerical stability issues in linear_model.SGDClassifier \nand linear_model.SGDRegressor by clipping large gradients and\nensuring that weight decay rescaling is always positive (for large\nl2 regularization and large learning rate values).\nBy Olivier Grisel\nWhen compute_full_tree is set to \u201cauto\u201d, the full tree is\nbuilt when n_clusters is high and is early stopped when n_clusters is\nlow, while the behavior should be vice-versa in\n cluster.AgglomerativeClustering (and friends).\nThis has been fixed By Manoj Kumar\nFix lazy centering of data in linear_model.enet_path and\n linear_model.lasso_path . It was centered around one. It has\nbeen changed to be centered around the origin. 
By Manoj Kumar\nFix handling of precomputed affinity matrices in\n cluster.AgglomerativeClustering when using connectivity\nconstraints. By Cathy Deng\nCorrect partial_fit handling of class_prior for\n sklearn.naive_bayes.MultinomialNB and\n sklearn.naive_bayes.BernoulliNB . By Trevor Stephens .\nFixed a crash in metrics.precision_recall_fscore_support \nwhen using unsorted labels in the multi-label setting.\nBy Andreas M\u00fcller .\nAvoid skipping the first nearest neighbor in the methods radius_neighbors ,\n kneighbors , kneighbors_graph and radius_neighbors_graph in\n sklearn.neighbors.NearestNeighbors and family, when the query\ndata is not the same as fit data. By Manoj Kumar .\nFix log-density calculation in the mixture.GMM with\ntied covariance. By Will Dawson\nFixed a scaling error in feature_selection.SelectFdr \nwhere a factor n_features was missing. By Andrew Tulloch\nFix zero division in neighbors.KNeighborsRegressor and related\nclasses when using distance weighting and having identical data points.\nBy Garret-R .\nFixed round off errors with non positive-definite covariance matrices\nin GMM. By Alexis Mignon .\nFixed an error in the computation of conditional probabilities in\n naive_bayes.BernoulliNB . By Hanna Wallach .\nMake the method radius_neighbors of\n neighbors.NearestNeighbors return the samples lying on the\nboundary for algorithm='brute' . By Yan Yi .\nFlip sign of dual_coef_ of svm.SVC \nto make it consistent with the documentation and\n decision_function . By Artem Sobolev.\nFixed handling of ties in isotonic.IsotonicRegression .\nWe now use the weighted average of targets (secondary method). 
By\n Andreas M\u00fcller and Michael Bommarito .\nAPI changes summary \u00b6\nGridSearchCV and\n cross_val_score and other\nmeta-estimators don\u2019t convert pandas DataFrames into arrays any more,\nallowing DataFrame specific operations in custom estimators.\nmulticlass.fit_ovr , multiclass.predict_ovr ,\n predict_proba_ovr ,\n multiclass.fit_ovo , multiclass.predict_ovo ,\n multiclass.fit_ecoc and multiclass.predict_ecoc \nare deprecated. Use the underlying estimators instead.\nNearest neighbors estimators used to take arbitrary keyword arguments\nand pass these to their distance metric. This will no longer be supported\nin scikit-learn 0.18; use the metric_params argument instead.\nn_jobs parameter of the fit method shifted to the constructor of the\nLinearRegression class.\nThe predict_proba method of multiclass.OneVsRestClassifier \nnow returns two probabilities per sample in the multiclass case; this\nis consistent with other estimators and with the method\u2019s documentation,\nbut previous versions accidentally returned only the positive\nprobability. Fixed by Will Lamond and Lars Buitinck .\nChange default value of precompute in ElasticNet and Lasso \nto False. Setting precompute to \u201cauto\u201d was found to be slower when\nn_samples > n_features since the computation of the Gram matrix is\ncomputationally expensive and outweighs the benefit of fitting the Gram\nfor just one alpha.\n precompute=\"auto\" is now deprecated and will be removed in 0.18\nBy Manoj Kumar .\nExpose positive option in linear_model.enet_path and\n linear_model.enet_path which constrains coefficients to be\npositive. By Manoj Kumar .\nUsers should now supply an explicit average parameter to\n sklearn.metrics.f1_score , sklearn.metrics.fbeta_score ,\n sklearn.metrics.recall_score and\n sklearn.metrics.precision_score when performing multiclass\nor multilabel (i.e. not binary) classification. 
By Joel Nothman .\nscoring parameter for cross validation now accepts \u2018f1_micro\u2019 ,\n \u2018f1_macro\u2019 or \u2018f1_weighted\u2019 . \u2018f1\u2019 is now for binary classification\nonly. Similar changes apply to \u2018precision\u2019 and \u2018recall\u2019 .\nBy Joel Nothman .\nThe fit_intercept , normalize and return_models parameters in\n linear_model.enet_path and linear_model.lasso_path have\nbeen removed. They were deprecated since 0.14\nFrom now onwards, all estimators will uniformly raise NotFittedError \n( utils.validation.NotFittedError ), when any of the predict \nlike methods are called before the model is fit. By Raghav RV .\nInput data validation was refactored for more consistent input\nvalidation. The check_arrays function was replaced by check_array \nand check_X_y . By Andreas M\u00fcller .\nAllow X=None in the methods radius_neighbors , kneighbors ,\n kneighbors_graph and radius_neighbors_graph in\n sklearn.neighbors.NearestNeighbors and family. If set to None,\nthen for every sample this avoids setting the sample itself as the\nfirst nearest neighbor. By Manoj Kumar .\nAdd parameter include_self in neighbors.kneighbors_graph \nand neighbors.radius_neighbors_graph which has to be explicitly\nset by the user. If set to True, then the sample itself is considered\nas the first nearest neighbor.\nthresh parameter is deprecated in favor of new tol parameter in\n GMM , DPGMM and VBGMM . See Enhancements \nsection for details. 
By Herv\u00e9 Bredin .\nEstimators will treat input with dtype object as numeric when possible.\nBy Andreas M\u00fcller\nEstimators now raise ValueError consistently when fitted on empty\ndata (less than 1 sample or less than 1 feature for 2D input).\nBy Olivier Grisel .\nThe shuffle option of linear_model.SGDClassifier ,\n linear_model.SGDRegressor , linear_model.Perceptron ,\n linear_model.PassiveAgressiveClassifier and\n linear_model.PassiveAgressiveRegressor now defaults to True .\ncluster.DBSCAN now uses a deterministic initialization. The\n random_state parameter is deprecated. By Erich Schubert .\nCode Contributors \u00b6\nA. Flaxman, Aaron Schumacher, Aaron Staple, abhishek thakur, Akshay, akshayah3,\nAldrian Obaja, Alexander Fabisch, Alexandre Gramfort, Alexis Mignon, Anders\nAagaard, Andreas Mueller, Andreas van Cranenburgh, Andrew Tulloch, Andrew\nWalker, Antony Lee, Arnaud Joly, banilo, Barmaley.exe, Ben Davies, Benedikt\nKoehler, bhsu, Boris Feld, Borja Ayerdi, Boyuan Deng, Brent Pedersen, Brian\nWignall, Brooke Osborn, Calvin Giles, Cathy Deng, Celeo, cgohlke, chebee7i,\nChristian Stade-Schuldt, Christof Angermueller, Chyi-Kwei Yau, CJ Carey,\nClemens Brunner, Daiki Aminaka, Dan Blanchard, danfrankj, Danny Sullivan, David\nFletcher, Dmitrijs Milajevs, Dougal J. 
Sutherland, Erich Schubert, Fabian\nPedregosa, Florian Wilhelm, floydsoft, F\u00e9lix-Antoine Fortin, Gael Varoquaux,\nGarrett-R, Gilles Louppe, gpassino, gwulfs, Hampus Bengtsson, Hamzeh Alsalhi,\nHanna Wallach, Harry Mavroforakis, Hasil Sharma, Helder, Herve Bredin,\nHsiang-Fu Yu, Hugues SALAMIN, Ian Gilmore, Ilambharathi Kanniah, Imran Haque,\nisms, Jake VanderPlas, Jan Dlabal, Jan Hendrik Metzen, Jatin Shah, Javier L\u00f3pez\nPe\u00f1a, jdcaballero, Jean Kossaifi, Jeff Hammerbacher, Joel Nothman, Jonathan\nHelmus, Joseph, Kaicheng Zhang, Kevin Markham, Kyle Beauchamp, Kyle Kastner,\nLagacherie Matthieu, Lars Buitinck, Laurent Direr, leepei, Loic Esteve, Luis\nPedro Coelho, Lukas Michelbacher, maheshakya, Manoj Kumar, Manuel, Mario\nMichael Krell, Martin, Martin Billinger, Martin Ku, Mateusz Susik, Mathieu\nBlondel, Matt Pico, Matt Terry, Matteo Visconti dOC, Matti Lyra, Max Linke,\nMehdi Cherti, Michael Bommarito, Michael Eickenberg, Michal Romaniuk, MLG,\nmr.Shu, Nelle Varoquaux, Nicola Montecchio, Nicolas, Nikolay Mayorov, Noel\nDawe, Okal Billy, Olivier Grisel, \u00d3scar N\u00e1jera, Paolo Puggioni, Peter\nPrettenhofer, Pratap Vardhan, pvnguyen, queqichao, Rafael Carrascosa, Raghav R\nV, Rahiel Kasim, Randall Mason, Rob Zinkov, Robert Bradshaw, Saket Choudhary,\nSam Nicholls, Samuel Charron, Saurabh Jha, sethdandridge, sinhrks, snuderl,\nStefan Otte, Stefan van der Walt, Steve Tjoa, swu, Sylvain Zimmer, tejesh95,\nterrycojones, Thomas Delteil, Thomas Unterthiner, Tomas Kazmar, trevorstephens,\ntttthomasssss, Tzu-Ming Kuo, ugurcaliskan, ugurthemaster, Vinayak Mehta,\nVincent Dubourg, Vjacheslav Murashkin, Vlad Niculae, wadawson, Wei Xue, Will\nLamond, Wu Jiang, x0l, Xinfan Meng, Yan Yi, Yu-Chin\nVersion 0.15.2 \u00b6\nSeptember 4, 2014\nBug fixes \u00b6\nFixed handling of the p parameter of the Minkowski distance that was\npreviously ignored in nearest neighbors models. 
By Nikolay\nMayorov .\nFixed duplicated alphas in linear_model.LassoLars with early\nstopping on 32 bit Python. By Olivier Grisel and Fabian Pedregosa .\nFixed the build under Windows when scikit-learn is built with MSVC while\nNumPy is built with MinGW. By Olivier Grisel and Federico\nVaggi .\nFixed an array index overflow bug in the coordinate descent solver. By\n Gael Varoquaux .\nBetter handling of numpy 1.9 deprecation warnings. By Gael Varoquaux .\nRemoved unnecessary data copy in cluster.KMeans .\nBy Gael Varoquaux .\nExplicitly close open files to avoid ResourceWarnings under Python 3.\nBy Calvin Giles.\nThe transform of discriminant_analysis.LinearDiscriminantAnalysis \nnow projects the input on the most discriminant directions. By Martin Billinger.\nFixed potential overflow in _tree.safe_realloc by Lars Buitinck .\nPerformance optimization in isotonic.IsotonicRegression .\nBy Robert Bradshaw.\nnose is non-longer a runtime dependency to import sklearn , only for\nrunning the tests. By Joel Nothman .\nMany documentation and website fixes by Joel Nothman , Lars Buitinck\nMatt Pico , and others.\nVersion 0.15.1 \u00b6\nAugust 1, 2014\nBug fixes \u00b6\nMade cross_validation.cross_val_score use\n cross_validation.KFold instead of\n cross_validation.StratifiedKFold on multi-output classification\nproblems. By Nikolay Mayorov .\nSupport unseen labels preprocessing.LabelBinarizer to restore\nthe default behavior of 0.14.1 for backward compatibility. By\n Hamzeh Alsalhi .\nFixed the cluster.KMeans stopping criterion that prevented early\nconvergence detection. By Edward Raff and Gael Varoquaux .\nFixed the behavior of multiclass.OneVsOneClassifier .\nin case of ties at the per-class vote level by computing the correct\nper-class sum of prediction scores. 
By Andreas M\u00fcller .\nMade cross_validation.cross_val_score and\n grid_search.GridSearchCV accept Python lists as input data.\nThis is especially useful for cross-validation and model selection of\ntext processing pipelines. By Andreas M\u00fcller .\nFixed data input checks of most estimators to accept input data that\nimplements the NumPy __array__ protocol. This is the case for\npandas.Series and pandas.DataFrame in recent versions of\npandas. By Gael Varoquaux .\nFixed a regression for linear_model.SGDClassifier with\n class_weight=\"auto\" on data with non-contiguous labels. By\n Olivier Grisel .\nVersion 0.15 \u00b6\nJuly 15, 2014\nHighlights \u00b6\nMany speed and memory improvements all across the code\nHuge speed and memory improvements to random forests (and extra\ntrees) that also benefit better from parallel computing.\nIncremental fit to BernoulliRBM\nAdded cluster.AgglomerativeClustering for hierarchical\nagglomerative clustering with average linkage, complete linkage and\nward strategies.\nAdded linear_model.RANSACRegressor for robust regression\nmodels.\nAdded dimensionality reduction with manifold.TSNE which can be\nused to visualize high-dimensional data.\nChangelog \u00b6\nNew features \u00b6\nAdded ensemble.BaggingClassifier and\n ensemble.BaggingRegressor meta-estimators for ensembling\nany kind of base estimator. See the Bagging section of\nthe user guide for details and examples. By Gilles Louppe .\nNew unsupervised feature selection algorithm\n feature_selection.VarianceThreshold , by Lars Buitinck .\nAdded linear_model.RANSACRegressor meta-estimator for the robust\nfitting of regression models. 
By Johannes Sch\u00f6nberger .\nAdded cluster.AgglomerativeClustering for hierarchical\nagglomerative clustering with average linkage, complete linkage and\nward strategies, by Nelle Varoquaux and Gael Varoquaux .\nShorthand constructors pipeline.make_pipeline and\n pipeline.make_union were added by Lars Buitinck .\nShuffle option for cross_validation.StratifiedKFold .\nBy Jeffrey Blackburne .\nIncremental learning ( partial_fit ) for Gaussian Naive Bayes by\nImran Haque.\nAdded partial_fit to BernoulliRBM \nBy Danny Sullivan .\nAdded learning_curve utility to\nchart performance with respect to training size. See\n Plotting Learning Curves . By Alexander Fabisch.\nAdd positive option in LassoCV and\n ElasticNetCV .\nBy Brian Wignall and Alexandre Gramfort .\nAdded linear_model.MultiTaskElasticNetCV and\n linear_model.MultiTaskLassoCV . By Manoj Kumar .\nAdded manifold.TSNE . By Alexander Fabisch.\nEnhancements \u00b6\nAdd sparse input support to ensemble.AdaBoostClassifier and\n ensemble.AdaBoostRegressor meta-estimators.\nBy Hamzeh Alsalhi .\nMemory improvements of decision trees, by Arnaud Joly .\nDecision trees can now be built in best-first manner by using max_leaf_nodes \nas the stopping criteria. Refactored the tree code to use either a\nstack or a priority queue for tree building.\nBy Peter Prettenhofer and Gilles Louppe .\nDecision trees can now be fitted on fortran- and c-style arrays, and\nnon-continuous arrays without the need to make a copy.\nIf the input array has a different dtype than np.float32 , a fortran-\nstyle copy will be made since fortran-style memory layout has speed\nadvantages. By Peter Prettenhofer and Gilles Louppe .\nSpeed improvement of regression trees by optimizing the\ncomputation of the mean square error criterion. This led\nto speed improvement of the tree, forest and gradient boosting tree\nmodules. 
By Arnaud Joly\nThe img_to_graph and grid_tograph functions in\n sklearn.feature_extraction.image now return np.ndarray \ninstead of np.matrix when return_as=np.ndarray . See the\nNotes section for more information on compatibility.\nChanged the internal storage of decision trees to use a struct array.\nThis fixed some small bugs, while improving code and providing a small\nspeed gain. By Joel Nothman .\nReduce memory usage and overhead when fitting and predicting with forests\nof randomized trees in parallel with n_jobs != 1 by leveraging new\nthreading backend of joblib 0.8 and releasing the GIL in the tree fitting\nCython code. By Olivier Grisel and Gilles Louppe .\nSpeed improvement of the sklearn.ensemble.gradient_boosting module.\nBy Gilles Louppe and Peter Prettenhofer .\nVarious enhancements to the sklearn.ensemble.gradient_boosting \nmodule: a warm_start argument to fit additional trees,\na max_leaf_nodes argument to fit GBM style trees,\na monitor fit argument to inspect the estimator during training, and\nrefactoring of the verbose code. By Peter Prettenhofer .\nFaster sklearn.ensemble.ExtraTrees by caching feature values.\nBy Arnaud Joly .\nFaster depth-based tree building algorithm such as decision tree,\nrandom forest, extra trees or gradient tree boosting (with depth based\ngrowing strategy) by avoiding trying to split on found constant features\nin the sample subset. By Arnaud Joly .\nAdd min_weight_fraction_leaf pre-pruning parameter to tree-based\nmethods: the minimum weighted fraction of the input samples required to be\nat a leaf node. 
By Noel Dawe .\nAdded metrics.pairwise_distances_argmin_min , by Philippe Gervais.\nAdded predict method to cluster.AffinityPropagation and\n cluster.MeanShift , by Mathieu Blondel .\nVector and matrix multiplications have been optimised throughout the\nlibrary by Denis Engemann , and Alexandre Gramfort .\nIn particular, they should take less memory with older NumPy versions\n(prior to 1.7.2).\nPrecision-recall and ROC examples now use train_test_split, and have more\nexplanation of why these metrics are useful. By Kyle Kastner\nThe training algorithm for decomposition.NMF is faster for\nsparse matrices and has much lower memory complexity, meaning it will\nscale up gracefully to large datasets. By Lars Buitinck .\nAdded svd_method option with default value to \u201crandomized\u201d to\n decomposition.FactorAnalysis to save memory and\nsignificantly speedup computation by Denis Engemann , and\n Alexandre Gramfort .\nChanged cross_validation.StratifiedKFold to try and\npreserve as much of the original ordering of samples as possible so as\nnot to hide overfitting on datasets with a non-negligible level of\nsamples dependency.\nBy Daniel Nouri and Olivier Grisel .\nAdd multi-output support to gaussian_process.GaussianProcess \nby John Novak.\nSupport for precomputed distance matrices in nearest neighbor estimators\nby Robert Layton and Joel Nothman .\nNorm computations optimized for NumPy 1.6 and later versions by\n Lars Buitinck . In particular, the k-means algorithm no longer\nneeds a temporary data structure the size of its input.\ndummy.DummyClassifier can now be used to predict a constant\noutput value. By Manoj Kumar .\ndummy.DummyRegressor has now a strategy parameter which allows\nto predict the mean, the median of the training set or a constant\noutput value. 
By Maheshakya Wijewardena .\nMulti-label classification output in multilabel indicator format\nis now supported by metrics.roc_auc_score and\n metrics.average_precision_score by Arnaud Joly .\nSignificant performance improvements (more than 100x speedup for\nlarge problems) in isotonic.IsotonicRegression by\n Andrew Tulloch .\nSpeed and memory usage improvements to the SGD algorithm for linear\nmodels: it now uses threads, not separate processes, when n_jobs>1 .\nBy Lars Buitinck .\nGrid search and cross validation allow NaNs in the input arrays so that\npreprocessors such as preprocessing.Imputer can be trained within the cross validation loop,\navoiding potentially skewed results.\nRidge regression can now deal with sample weights in feature space\n(only sample space until then). By Michael Eickenberg .\nBoth solutions are provided by the Cholesky solver.\nSeveral classification and regression metrics now support weighted\nsamples with the new sample_weight argument:\n metrics.accuracy_score ,\n metrics.zero_one_loss ,\n metrics.precision_score ,\n metrics.average_precision_score ,\n metrics.f1_score ,\n metrics.fbeta_score ,\n metrics.recall_score ,\n metrics.roc_auc_score ,\n metrics.explained_variance_score ,\n metrics.mean_squared_error ,\n metrics.mean_absolute_error ,\n metrics.r2_score .\nBy Noel Dawe .\nSpeed up of the sample generator\n datasets.make_multilabel_classification . By Joel Nothman .\nDocumentation improvements \u00b6\nThe Working With Text Data tutorial\nhas now been worked in to the main documentation\u2019s tutorial section.\nIncludes exercises and skeletons for tutorial presentation.\nOriginal tutorial created by several authors including\n Olivier Grisel , Lars Buitinck and many others.\nTutorial integration into the scikit-learn documentation\nby Jaques Grobler\nAdded Computational Performance \ndocumentation. Discussion and examples of prediction latency / throughput\nand different factors that have influence over speed. 
Additional tips for\nbuilding faster models and choosing a relevant compromise between speed\nand predictive power.\nBy Eustache Diemert .\nBug fixes \u00b6\nFixed bug in decomposition.MiniBatchDictionaryLearning :\n partial_fit was not working properly.\nFixed bug in linear_model.stochastic_gradient :\n l1_ratio was used as (1.0 - l1_ratio) .\nFixed bug in multiclass.OneVsOneClassifier with string\nlabels\nFixed a bug in LassoCV and\n ElasticNetCV : they would not\npre-compute the Gram matrix with precompute=True or\n precompute=\"auto\" and n_samples > n_features . By Manoj Kumar .\nFixed incorrect estimation of the degrees of freedom in\n feature_selection.f_regression when variates are not centered.\nBy Virgile Fritsch .\nFixed a race condition in parallel processing with\n pre_dispatch != \"all\" (for instance, in cross_val_score ).\nBy Olivier Grisel .\nRaise error in cluster.FeatureAgglomeration and\n cluster.WardAgglomeration when no samples are given,\nrather than returning meaningless clustering.\nFixed bug in gradient_boosting.GradientBoostingRegressor with\n loss='huber' : gamma might have not been initialized.\nFixed feature importances as computed with a forest of randomized trees\nwhen fit with sample_weight != None and/or with bootstrap=True .\nBy Gilles Louppe .\nAPI changes summary \u00b6\nsklearn.hmm is deprecated. Its removal is planned\nfor the 0.17 release.\nUse of covariance.EllipticEnvelop has now been removed after\ndeprecation.\nPlease use covariance.EllipticEnvelope instead.\ncluster.Ward is deprecated. Use\n cluster.AgglomerativeClustering instead.\ncluster.WardClustering is deprecated. Use\ncluster.AgglomerativeClustering instead.\ncross_validation.Bootstrap is deprecated.\n cross_validation.KFold or\n cross_validation.ShuffleSplit are recommended instead.\nDirect support for the sequence of sequences (or list of lists) multilabel\nformat is deprecated. 
To convert to and from the supported binary\nindicator matrix format, use\n MultiLabelBinarizer .\nBy Joel Nothman .\nAdd score method to PCA following the model of\nprobabilistic PCA and deprecate\n ProbabilisticPCA model whose\nscore implementation is not correct. The computation now also exploits the\nmatrix inversion lemma for faster computation. By Alexandre Gramfort .\nThe score method of FactorAnalysis \nnow returns the average log-likelihood of the samples. Use score_samples\nto get log-likelihood of each sample. By Alexandre Gramfort .\nGenerating boolean masks (the setting indices=False )\nfrom cross-validation generators is deprecated.\nSupport for masks will be removed in 0.17.\nThe generators have produced arrays of indices by default since 0.10.\nBy Joel Nothman .\n1-d arrays containing strings with dtype=object (as used in Pandas)\nare now considered valid classification targets. This fixes a regression\nfrom version 0.13 in some classifiers. By Joel Nothman .\nFix wrong explained_variance_ratio_ attribute in\n RandomizedPCA .\nBy Alexandre Gramfort .\nFit alphas for each l1_ratio instead of mean_l1_ratio in\n linear_model.ElasticNetCV and linear_model.LassoCV .\nThis changes the shape of alphas_ from (n_alphas,) to\n (n_l1_ratio, n_alphas) if the l1_ratio provided is a 1-D array like\nobject of length greater than one.\nBy Manoj Kumar .\nFix linear_model.ElasticNetCV and linear_model.LassoCV \nwhen fitting intercept and input data is sparse. The automatic grid\nof alphas was not computed correctly and the scaling with normalize\nwas wrong. By Manoj Kumar .\nFix wrong maximal number of features drawn ( max_features ) at each split\nfor decision trees, random forests and gradient tree boosting.\nPreviously, the count for the number of drawn features started only after\none non constant features in the split. This bug fix will affect\ncomputational and generalization performance of those algorithms in the\npresence of constant features. 
To get back previous generalization\nperformance, you should modify the value of max_features .\nBy Arnaud Joly .\nFix wrong maximal number of features drawn ( max_features ) at each split\nfor ensemble.ExtraTreesClassifier and\n ensemble.ExtraTreesRegressor . Previously, only non constant\nfeatures in the split was counted as drawn. Now constant features are\ncounted as drawn. Furthermore at least one feature must be non constant\nin order to make a valid split. This bug fix will affect\ncomputational and generalization performance of extra trees in the\npresence of constant features. To get back previous generalization\nperformance, you should modify the value of max_features .\nBy Arnaud Joly .\nFix utils.compute_class_weight when class_weight==\"auto\" .\nPreviously it was broken for input of non-integer dtype and the\nweighted array that was returned was wrong. By Manoj Kumar .\nFix cross_validation.Bootstrap to return ValueError \nwhen n_train + n_test > n . By Ronald Phlypo .\nPeople \u00b6\nList of contributors for release 0.15 by number of commits.\n312 Olivier Grisel\n275 Lars Buitinck\n221 Gael Varoquaux\n148 Arnaud Joly\n134 Johannes Sch\u00f6nberger\n119 Gilles Louppe\n113 Joel Nothman\n111 Alexandre Gramfort\n95 Jaques Grobler\n89 Denis Engemann\n83 Peter Prettenhofer\n83 Alexander Fabisch\n62 Mathieu Blondel\n60 Eustache Diemert\n60 Nelle Varoquaux\n49 Michael Bommarito\n45 Manoj-Kumar-S\n28 Kyle Kastner\n26 Andreas Mueller\n22 Noel Dawe\n21 Maheshakya Wijewardena\n21 Brooke Osborn\n21 Hamzeh Alsalhi\n21 Jake VanderPlas\n21 Philippe Gervais\n19 Bala Subrahmanyam Varanasi\n12 Ronald Phlypo\n10 Mikhail Korobov\n8 Thomas Unterthiner\n8 Jeffrey Blackburne\n8 eltermann\n8 bwignall\n7 Ankit Agrawal\n7 CJ Carey\n6 Daniel Nouri\n6 Chen Liu\n6 Michael Eickenberg\n6 ugurthemaster\n5 Aaron Schumacher\n5 Baptiste Lagarde\n5 Rajat Khanduja\n5 Robert McGibbon\n5 Sergio Pascual\n4 Alexis Metaireau\n4 Ignacio Rossi\n4 Virgile Fritsch\n4 Sebastian S\u00e4ger\n4 
Ilambharathi Kanniah\n4 sdenton4\n4 Robert Layton\n4 Alyssa\n4 Amos Waterland\n3 Andrew Tulloch\n3 murad\n3 Steven Maude\n3 Karol Pysniak\n3 Jacques Kvam\n3 cgohlke\n3 cjlin\n3 Michael Becker\n3 hamzeh\n3 Eric Jacobsen\n3 john collins\n3 kaushik94\n3 Erwin Marsi\n2 csytracy\n2 LK\n2 Vlad Niculae\n2 Laurent Direr\n2 Erik Shilts\n2 Raul Garreta\n2 Yoshiki V\u00e1zquez Baeza\n2 Yung Siang Liau\n2 abhishek thakur\n2 James Yu\n2 Rohit Sivaprasad\n2 Roland Szabo\n2 amormachine\n2 Alexis Mignon\n2 Oscar Carlsson\n2 Nantas Nardelli\n2 jess010\n2 kowalski87\n2 Andrew Clegg\n2 Federico Vaggi\n2 Simon Frid\n2 F\u00e9lix-Antoine Fortin\n1 Ralf Gommers\n1 t-aft\n1 Ronan Amicel\n1 Rupesh Kumar Srivastava\n1 Ryan Wang\n1 Samuel Charron\n1 Samuel St-Jean\n1 Fabian Pedregosa\n1 Skipper Seabold\n1 Stefan Walk\n1 Stefan van der Walt\n1 Stephan Hoyer\n1 Allen Riddell\n1 Valentin Haenel\n1 Vijay Ramesh\n1 Will Myers\n1 Yaroslav Halchenko\n1 Yoni Ben-Meshulam\n1 Yury V. Zaytsev\n1 adrinjalali\n1 ai8rahim\n1 alemagnani\n1 alex\n1 benjamin wilson\n1 chalmerlowe\n1 dzikie dro\u017cd\u017ce\n1 jamestwebber\n1 matrixorz\n1 popo\n1 samuela\n1 Fran\u00e7ois Boulogne\n1 Alexander Measure\n1 Ethan White\n1 Guilherme Trein\n1 Hendrik Heuer\n1 IvicaJovic\n1 Jan Hendrik Metzen\n1 Jean Michel Rouly\n1 Eduardo Ari\u00f1o de la Rubia\n1 Jelle Zijlstra\n1 Eddy L O Jansson\n1 Denis\n1 John\n1 John Schmidt\n1 Jorge Ca\u00f1ardo Alastuey\n1 Joseph Perla\n1 Joshua Vredevoogd\n1 Jos\u00e9 Ricardo\n1 Julien Miotte\n1 Kemal Eren\n1 Kenta Sato\n1 David Cournapeau\n1 Kyle Kelley\n1 Daniele Medri\n1 Laurent Luce\n1 Laurent Pierron\n1 Luis Pedro Coelho\n1 DanielWeitzenfeld\n1 Craig Thompson\n1 Chyi-Kwei Yau\n1 Matthew Brett\n1 Matthias Feurer\n1 Max Linke\n1 Chris Filo Gorgolewski\n1 Charles Earl\n1 Michael Hanke\n1 Michele Orr\u00f9\n1 Bryan Lunt\n1 Brian Kearns\n1 Paul Butler\n1 Pawe\u0142 Mandera\n1 Peter\n1 Andrew Ash\n1 Pietro Zambelli\n1 staubda\nVersion 0.14 \u00b6\nAugust 7, 2013\nChangelog 
\u00b6\nMissing values with sparse and dense matrices can be imputed with the\ntransformer preprocessing.Imputer by Nicolas Tr\u00e9segnie .\nThe core implementation of decisions trees has been rewritten from\nscratch, allowing for faster tree induction and lower memory\nconsumption in all tree-based estimators. By Gilles Louppe .\nAdded ensemble.AdaBoostClassifier and\n ensemble.AdaBoostRegressor , by Noel Dawe and\n Gilles Louppe . See the AdaBoost section of the user\nguide for details and examples.\nAdded grid_search.RandomizedSearchCV and\n grid_search.ParameterSampler for randomized hyperparameter\noptimization. By Andreas M\u00fcller .\nAdded biclustering algorithms\n( sklearn.cluster.bicluster.SpectralCoclustering and\n sklearn.cluster.bicluster.SpectralBiclustering ), data\ngeneration methods ( sklearn.datasets.make_biclusters and\n sklearn.datasets.make_checkerboard ), and scoring metrics\n( sklearn.metrics.consensus_score ). By Kemal Eren .\nAdded Restricted Boltzmann Machines \n( neural_network.BernoulliRBM ). By Yann Dauphin .\nPython 3 support by Justin Vincent , Lars Buitinck ,\n Subhodeep Moitra and Olivier Grisel . All tests now pass under\nPython 3.3.\nAbility to pass one penalty (alpha value) per target in\n linear_model.Ridge , by @eickenberg and Mathieu Blondel .\nFixed sklearn.linear_model.stochastic_gradient.py L2 regularization\nissue (minor practical significance).\nBy Norbert Crombach and Mathieu Blondel .\nAdded an interactive version of Andreas M\u00fcller \u2019s\n Machine Learning Cheat Sheet (for scikit-learn) \nto the documentation. See Choosing the right estimator .\nBy Jaques Grobler .\ngrid_search.GridSearchCV and\n cross_validation.cross_val_score now support the use of advanced\nscoring function such as area under the ROC curve and f-beta scores.\nSee The scoring parameter: defining model evaluation rules for details. 
By Andreas M\u00fcller \nand Lars Buitinck .\nPassing a function from sklearn.metrics as score_func is\ndeprecated.\nMulti-label classification output is now supported by\n metrics.accuracy_score , metrics.zero_one_loss ,\n metrics.f1_score , metrics.fbeta_score ,\n metrics.classification_report ,\n metrics.precision_score and metrics.recall_score \nby Arnaud Joly .\nTwo new metrics metrics.hamming_loss and\n metrics.jaccard_similarity_score \nare added with multi-label support by Arnaud Joly .\nSpeed and memory usage improvements in\n feature_extraction.text.CountVectorizer and\n feature_extraction.text.TfidfVectorizer ,\nby Jochen Wersd\u00f6rfer and Roman Sinayev.\nThe min_df parameter in\n feature_extraction.text.CountVectorizer and\n feature_extraction.text.TfidfVectorizer , which used to be 2,\nhas been reset to 1 to avoid unpleasant surprises (empty vocabularies)\nfor novice users who try it out on tiny document collections.\nA value of at least 2 is still recommended for practical use.\nsvm.LinearSVC , linear_model.SGDClassifier and\n linear_model.SGDRegressor now have a sparsify method that\nconverts their coef_ into a sparse matrix, meaning stored models\ntrained using these estimators can be made much more compact.\nlinear_model.SGDClassifier now produces multiclass probability\nestimates when trained under log loss or modified Huber loss.\nHyperlinks to documentation in example code on the website by\n Martin Luessi .\nFixed bug in preprocessing.MinMaxScaler causing incorrect scaling\nof the features for non-default feature_range settings. By Andreas\nM\u00fcller .\nmax_features in tree.DecisionTreeClassifier ,\n tree.DecisionTreeRegressor and all derived ensemble estimators\nnow supports percentage values. 
By Gilles Louppe .\nPerformance improvements in isotonic.IsotonicRegression by\n Nelle Varoquaux .\nmetrics.accuracy_score has an option normalize to return\nthe fraction or the number of correctly classified samples\nby Arnaud Joly .\nAdded metrics.log_loss that computes log loss, aka cross-entropy\nloss. By Jochen Wersd\u00f6rfer and Lars Buitinck .\nA bug that caused ensemble.AdaBoostClassifier to output\nincorrect probabilities has been fixed.\nFeature selectors now share a mixin providing consistent transform ,\n inverse_transform and get_support methods. By Joel Nothman .\nA fitted grid_search.GridSearchCV or\n grid_search.RandomizedSearchCV can now generally be pickled.\nBy Joel Nothman .\nRefactored and vectorized implementation of metrics.roc_curve \nand metrics.precision_recall_curve . By Joel Nothman .\nThe new estimator sklearn.decomposition.TruncatedSVD \nperforms dimensionality reduction using SVD on sparse matrices,\nand can be used for latent semantic analysis (LSA).\nBy Lars Buitinck .\nAdded self-contained example of out-of-core learning on text data\n Out-of-core classification of text documents .\nBy Eustache Diemert .\nThe default number of components for\n sklearn.decomposition.RandomizedPCA is now correctly documented\nto be n_features . This was the default behavior, so programs using it\nwill continue to work as they did.\nsklearn.cluster.KMeans now fits several orders of magnitude\nfaster on sparse data (the speedup depends on the sparsity). By\n Lars Buitinck .\nReduce memory footprint of FastICA by Denis Engemann and\n Alexandre Gramfort .\nVerbose output in sklearn.ensemble.gradient_boosting now uses\na column format and prints progress in decreasing frequency.\nIt also shows the remaining time. By Peter Prettenhofer .\nsklearn.ensemble.gradient_boosting provides out-of-bag improvement\n oob_improvement_ \nrather than the OOB score for model selection. 
An example that shows\nhow to use OOB estimates to select the number of trees was added.\nBy Peter Prettenhofer .\nMost metrics now support string labels for multiclass classification\nby Arnaud Joly and Lars Buitinck .\nNew OrthogonalMatchingPursuitCV class by Alexandre Gramfort \nand Vlad Niculae .\nFixed a bug in sklearn.covariance.GraphLassoCV : the\n\u2018alphas\u2019 parameter now works as expected when given a list of\nvalues. By Philippe Gervais.\nFixed an important bug in sklearn.covariance.GraphLassoCV \nthat prevented all folds provided by a CV object from being used (only\nthe first 3 were used). When providing a CV object, execution\ntime may thus increase significantly compared to the previous\nversion (but results are correct now). By Philippe Gervais.\ncross_validation.cross_val_score and the grid_search \nmodule are now tested with multi-output data by Arnaud Joly .\ndatasets.make_multilabel_classification can now return\nthe output in label indicator multilabel format by Arnaud Joly .\nK-nearest neighbors, neighbors.KNeighborsRegressor \nand neighbors.RadiusNeighborsRegressor ,\nand radius neighbors, neighbors.RadiusNeighborsRegressor and\n neighbors.RadiusNeighborsClassifier support multioutput data\nby Arnaud Joly .\nRandom state in LibSVM-based estimators ( svm.SVC , NuSVC ,\n OneClassSVM , svm.SVR , svm.NuSVR ) can now be\ncontrolled. This is useful to ensure consistency in the probability\nestimates for the classifiers trained with probability=True . 
By\n Vlad Niculae .\nOut-of-core learning support for discrete naive Bayes classifiers\n sklearn.naive_bayes.MultinomialNB and\n sklearn.naive_bayes.BernoulliNB by adding the partial_fit \nmethod by Olivier Grisel .\nNew website design and navigation by Gilles Louppe , Nelle Varoquaux ,\nVincent Michel and Andreas M\u00fcller .\nImproved documentation on multi-class, multi-label and multi-output\nclassification by Yannick Schwartz and Arnaud Joly .\nBetter input and error handling in the metrics module by\n Arnaud Joly and Joel Nothman .\nSpeed optimization of the hmm module by Mikhail Korobov\nSignificant speed improvements for sklearn.cluster.DBSCAN \nby cleverless\nAPI changes summary \u00b6\nThe auc_score was renamed roc_auc_score .\nTesting scikit-learn with sklearn.test() is deprecated. Use\n nosetests sklearn from the command line.\nFeature importances in tree.DecisionTreeClassifier ,\n tree.DecisionTreeRegressor and all derived ensemble estimators\nare now computed on the fly when accessing the feature_importances_ \nattribute. Setting compute_importances=True is no longer required.\nBy Gilles Louppe .\nlinear_model.lasso_path and\n linear_model.enet_path can return its results in the same\nformat as that of linear_model.lars_path . This is done by\nsetting the return_models parameter to False . By\n Jaques Grobler and Alexandre Gramfort\ngrid_search.IterGrid was renamed to\n grid_search.ParameterGrid .\nFixed bug in KFold causing imperfect class balance in some\ncases. By Alexandre Gramfort and Tadej Jane\u017e.\nsklearn.neighbors.BallTree has been refactored, and a\n sklearn.neighbors.KDTree has been\nadded which shares the same interface. The Ball Tree now works with\na wide variety of distance metrics. 
Both classes have many new\nmethods, including single-tree and dual-tree queries, breadth-first\nand depth-first searching, and more advanced queries such as\nkernel density estimation and 2-point correlation functions.\nBy Jake Vanderplas\nSupport for scipy.spatial.cKDTree within neighbors queries has been\nremoved, and the functionality replaced with the new KDTree \nclass.\nsklearn.neighbors.KernelDensity has been added, which performs\nefficient kernel density estimation with a variety of kernels.\nsklearn.decomposition.KernelPCA now always returns output with\n n_components components, unless the new parameter remove_zero_eig \nis set to True . This new behavior is consistent with the way\nkernel PCA was always documented; previously, the removal of components\nwith zero eigenvalues was tacitly performed on all data.\ngcv_mode=\"auto\" no longer tries to perform SVD on a densified\nsparse matrix in sklearn.linear_model.RidgeCV .\nSparse matrix support in sklearn.decomposition.RandomizedPCA \nis now deprecated in favor of the new TruncatedSVD .\ncross_validation.KFold and\n cross_validation.StratifiedKFold now enforce n_folds >= 2 \notherwise a ValueError is raised. By Olivier Grisel .\ndatasets.load_files \u2019s charset and charset_errors \nparameters were renamed encoding and decode_errors .\nAttribute oob_score_ in sklearn.ensemble.GradientBoostingRegressor \nand sklearn.ensemble.GradientBoostingClassifier \nis deprecated and has been replaced by oob_improvement_ .\nAttributes in OrthogonalMatchingPursuit have been deprecated\n(copy_X, Gram, \u2026) and precompute_gram renamed precompute\nfor consistency. See #2224.\nsklearn.preprocessing.StandardScaler now converts integer input\nto float, and raises a warning. Previously it rounded for dense integer\ninput.\nsklearn.multiclass.OneVsRestClassifier now has a\n decision_function method. 
This will return the distance of each\nsample from the decision boundary for each class, as long as the\nunderlying estimators implement the decision_function method.\nBy Kyle Kastner .\nBetter input validation, warning on unexpected shapes for y.\nPeople \u00b6\nList of contributors for release 0.14 by number of commits.\n277 Gilles Louppe\n245 Lars Buitinck\n187 Andreas Mueller\n124 Arnaud Joly\n112 Jaques Grobler\n109 Gael Varoquaux\n107 Olivier Grisel\n102 Noel Dawe\n99 Kemal Eren\n79 Joel Nothman\n75 Jake VanderPlas\n73 Nelle Varoquaux\n71 Vlad Niculae\n65 Peter Prettenhofer\n64 Alexandre Gramfort\n54 Mathieu Blondel\n38 Nicolas Tr\u00e9segnie\n35 eustache\n27 Denis Engemann\n25 Yann N. Dauphin\n19 Justin Vincent\n17 Robert Layton\n15 Doug Coleman\n14 Michael Eickenberg\n13 Robert Marchman\n11 Fabian Pedregosa\n11 Philippe Gervais\n10 Jim Holmstr\u00f6m\n10 Tadej Jane\u017e\n10 syhw\n9 Mikhail Korobov\n9 Steven De Gryze\n8 sergeyf\n7 Ben Root\n7 Hrishikesh Huilgolkar\n6 Kyle Kastner\n6 Martin Luessi\n6 Rob Speer\n5 Federico Vaggi\n5 Raul Garreta\n5 Rob Zinkov\n4 Ken Geis\n3 A. 
Flaxman\n3 Denton Cockburn\n3 Dougal Sutherland\n3 Ian Ozsvald\n3 Johannes Sch\u00f6nberger\n3 Robert McGibbon\n3 Roman Sinayev\n3 Szabo Roland\n2 Diego Molla\n2 Imran Haque\n2 Jochen Wersd\u00f6rfer\n2 Sergey Karayev\n2 Yannick Schwartz\n2 jamestwebber\n1 Abhijeet Kolhe\n1 Alexander Fabisch\n1 Bastiaan van den Berg\n1 Benjamin Peterson\n1 Daniel Velkov\n1 Fazlul Shahriar\n1 Felix Brockherde\n1 F\u00e9lix-Antoine Fortin\n1 Harikrishnan S\n1 Jack Hale\n1 JakeMick\n1 James McDermott\n1 John Benediktsson\n1 John Zwinck\n1 Joshua Vredevoogd\n1 Justin Pati\n1 Kevin Hughes\n1 Kyle Kelley\n1 Matthias Ekman\n1 Miroslav Shubernetskiy\n1 Naoki Orii\n1 Norbert Crombach\n1 Rafael Cunha de Almeida\n1 Rolando Espinoza La fuente\n1 Seamus Abshere\n1 Sergey Feldman\n1 Sergio Medina\n1 Stefano Lattarini\n1 Steve Koch\n1 Sturla Molden\n1 Thomas Jarosch\n1 Yaroslav Halchenko\nVersion 0.13.1 \u00b6\nFebruary 23, 2013\nThe 0.13.1 release only fixes some bugs and does not add any new functionality.\nChangelog \u00b6\nFixed a testing error caused by the function cross_validation.train_test_split being\ninterpreted as a test by Yaroslav Halchenko .\nFixed a bug in the reassignment of small clusters in the cluster.MiniBatchKMeans \nby Gael Varoquaux .\nFixed default value of gamma in decomposition.KernelPCA by Lars Buitinck .\nUpdated joblib to 0.7.0d by Gael Varoquaux .\nFixed scaling of the deviance in ensemble.GradientBoostingClassifier by Peter Prettenhofer .\nBetter tie-breaking in multiclass.OneVsOneClassifier by Andreas M\u00fcller .\nOther small improvements to tests and documentation.\nPeople \u00b6\nList of contributors for release 0.13.1 by number of commits.\n16 Lars Buitinck\n12 Andreas M\u00fcller\n8 Gael Varoquaux\n5 Robert Marchman\n3 Peter Prettenhofer\n2 Hrishikesh Huilgolkar\n1 Bastiaan van den Berg\n1 Diego Molla\n1 Gilles Louppe\n1 Mathieu Blondel\n1 Nelle Varoquaux\n1 Rafael Cunha de Almeida\n1 Rolando Espinoza La fuente\n1 Vlad Niculae\n1 Yaroslav Halchenko\nVersion 
0.13 \u00b6\nJanuary 21, 2013\nNew Estimator Classes \u00b6\ndummy.DummyClassifier and dummy.DummyRegressor , two\ndata-independent predictors by Mathieu Blondel . Useful to sanity-check\nyour estimators. See Dummy estimators in the user guide.\nMultioutput support added by Arnaud Joly .\ndecomposition.FactorAnalysis , a transformer implementing the\nclassical factor analysis, by Christian Osendorfer and Alexandre\nGramfort . See Factor Analysis in the user guide.\nfeature_extraction.FeatureHasher , a transformer implementing the\n\u201chashing trick\u201d for fast, low-memory feature extraction from string fields\nby Lars Buitinck and feature_extraction.text.HashingVectorizer \nfor text documents by Olivier Grisel See Feature hashing and\n Vectorizing a large text corpus with the hashing trick for the documentation and sample usage.\npipeline.FeatureUnion , a transformer that concatenates\nresults of several other transformers by Andreas M\u00fcller . See\n FeatureUnion: composite feature spaces in the user guide.\nrandom_projection.GaussianRandomProjection ,\n random_projection.SparseRandomProjection and the function\n random_projection.johnson_lindenstrauss_min_dim . The first two are\ntransformers implementing Gaussian and sparse random projection matrix\nby Olivier Grisel and Arnaud Joly .\nSee Random Projection in the user guide.\nkernel_approximation.Nystroem , a transformer for approximating\narbitrary kernels by Andreas M\u00fcller . See\n Nystroem Method for Kernel Approximation in the user guide.\npreprocessing.OneHotEncoder , a transformer that computes binary\nencodings of categorical features by Andreas M\u00fcller . See\n Encoding categorical features in the user guide.\nlinear_model.PassiveAggressiveClassifier and\n linear_model.PassiveAggressiveRegressor , predictors implementing\nan efficient stochastic optimization for linear models by Rob Zinkov and\n Mathieu Blondel . 
See Passive Aggressive Algorithms in the user\nguide.\nensemble.RandomTreesEmbedding , a transformer for creating high-dimensional\nsparse representations using ensembles of totally random trees by Andreas M\u00fcller .\nSee Totally Random Trees Embedding in the user guide.\nmanifold.SpectralEmbedding and function\n manifold.spectral_embedding , implementing the \u201claplacian\neigenmaps\u201d transformation for non-linear dimensionality reduction by Wei\nLi. See Spectral Embedding in the user guide.\nisotonic.IsotonicRegression by Fabian Pedregosa , Alexandre Gramfort \nand Nelle Varoquaux ,\nChangelog \u00b6\nmetrics.zero_one_loss (formerly metrics.zero_one ) now has\noption for normalized output that reports the fraction of\nmisclassifications, rather than the raw number of misclassifications. By\nKyle Beauchamp.\ntree.DecisionTreeClassifier and all derived ensemble models now\nsupport sample weighting, by Noel Dawe and Gilles Louppe .\nSpeedup improvement when using bootstrap samples in forests of randomized\ntrees, by Peter Prettenhofer and Gilles Louppe .\nPartial dependence plots for Gradient Tree Boosting in\n ensemble.partial_dependence.partial_dependence by Peter\nPrettenhofer . See Partial Dependence Plots for an\nexample.\nThe table of contents on the website has now been made expandable by\n Jaques Grobler .\nfeature_selection.SelectPercentile now breaks ties\ndeterministically instead of returning all equally ranked features.\nfeature_selection.SelectKBest and\n feature_selection.SelectPercentile are more numerically stable\nsince they use scores, rather than p-values, to rank results. 
This means\nthat they might sometimes select different features than they did\npreviously.\nRidge regression and ridge classification fitting with sparse_cg solver\nno longer has quadratic memory complexity, by Lars Buitinck and\n Fabian Pedregosa .\nRidge regression and ridge classification now support a new fast solver\ncalled lsqr , by Mathieu Blondel .\nSpeed up of metrics.precision_recall_curve by Conrad Lee.\nAdded support for reading/writing svmlight files with pairwise\npreference attribute (qid in svmlight file format) in\n datasets.dump_svmlight_file and\n datasets.load_svmlight_file by Fabian Pedregosa .\nFaster and more robust metrics.confusion_matrix and\n Clustering performance evaluation by Wei Li.\ncross_validation.cross_val_score now works with precomputed kernels\nand affinity matrices, by Andreas M\u00fcller .\nLARS algorithm made more numerically stable with heuristics to drop\nregressors too correlated as well as to stop the path when\nnumerical noise becomes predominant, by Gael Varoquaux .\nFaster implementation of metrics.precision_recall_curve by\nConrad Lee.\nNew kernel metrics.chi2_kernel by Andreas M\u00fcller , often used\nin computer vision applications.\nFix of longstanding bug in naive_bayes.BernoulliNB fixed by\nShaun Jackman.\nImplemented predict_proba in multiclass.OneVsRestClassifier ,\nby Andrew Winterman.\nImprove consistency in gradient boosting: estimators\n ensemble.GradientBoostingRegressor and\n ensemble.GradientBoostingClassifier use the estimator\n tree.DecisionTreeRegressor instead of the\n tree._tree.Tree data structure by Arnaud Joly .\nFixed a floating point exception in the decision trees \nmodule, by Seberg.\nFix metrics.roc_curve fails when y_true has only one class\nby Wei Li.\nAdd the metrics.mean_absolute_error function which computes the\nmean absolute error. 
The metrics.mean_squared_error ,\n metrics.mean_absolute_error and\n metrics.r2_score metrics support multioutput by Arnaud Joly .\nFixed class_weight support in svm.LinearSVC and\n linear_model.LogisticRegression by Andreas M\u00fcller . The meaning\nof class_weight was reversed as erroneously higher weight meant less\npositives of a given class in earlier releases.\nImprove narrative documentation and consistency in\n sklearn.metrics for regression and classification metrics\nby Arnaud Joly .\nFixed a bug in sklearn.svm.SVC when using csr-matrices with\nunsorted indices by Xinfan Meng and Andreas M\u00fcller .\nMiniBatchKMeans : Add random reassignment of cluster centers\nwith little observations attached to them, by Gael Varoquaux .\nAPI changes summary \u00b6\nRenamed all occurrences of n_atoms to n_components for consistency.\nThis applies to decomposition.DictionaryLearning ,\n decomposition.MiniBatchDictionaryLearning ,\n decomposition.dict_learning , decomposition.dict_learning_online .\nRenamed all occurrences of max_iters to max_iter for consistency.\nThis applies to semi_supervised.LabelPropagation and\n semi_supervised.label_propagation.LabelSpreading .\nRenamed all occurrences of learn_rate to learning_rate for\nconsistency in ensemble.BaseGradientBoosting and\n ensemble.GradientBoostingRegressor .\nThe module sklearn.linear_model.sparse is gone. Sparse matrix support\nwas already integrated into the \u201cregular\u201d linear models.\nsklearn.metrics.mean_square_error , which incorrectly returned the\naccumulated error, was removed. Use mean_squared_error instead.\nPassing class_weight parameters to fit methods is no longer\nsupported. Pass them to estimator constructors instead.\nGMMs no longer have decode and rvs methods. Use the score ,\n predict or sample methods instead.\nThe solver fit option in Ridge regression and classification is now\ndeprecated and will be removed in v0.14. 
Use the constructor option\ninstead.\nfeature_extraction.text.DictVectorizer now returns sparse\nmatrices in the CSR format, instead of COO.\nRenamed k in cross_validation.KFold and\n cross_validation.StratifiedKFold to n_folds , renamed\n n_bootstraps to n_iter in cross_validation.Bootstrap .\nRenamed all occurrences of n_iterations to n_iter for consistency.\nThis applies to cross_validation.ShuffleSplit ,\n cross_validation.StratifiedShuffleSplit ,\n utils.randomized_range_finder and utils.randomized_svd .\nReplaced rho in linear_model.ElasticNet and\n linear_model.SGDClassifier by l1_ratio . The rho parameter\nhad different meanings; l1_ratio was introduced to avoid confusion.\nIt has the same meaning as previously rho in\n linear_model.ElasticNet and (1-rho) in\n linear_model.SGDClassifier .\nlinear_model.LassoLars and linear_model.Lars now\nstore a list of paths in the case of multiple targets, rather than\nan array of paths.\nThe attribute gmm of hmm.GMMHMM was renamed to gmm_ \nto adhere more strictly with the API.\ncluster.spectral_embedding was moved to\n manifold.spectral_embedding .\nRenamed eig_tol in manifold.spectral_embedding ,\n cluster.SpectralClustering to eigen_tol , renamed mode \nto eigen_solver .\nRenamed mode in manifold.spectral_embedding and\n cluster.SpectralClustering to eigen_solver .\nclasses_ and n_classes_ attributes of\n tree.DecisionTreeClassifier and all derived ensemble models are\nnow flat in case of single output problems and nested in case of\nmulti-output problems.\nThe estimators_ attribute of\n ensemble.gradient_boosting.GradientBoostingRegressor and\n ensemble.gradient_boosting.GradientBoostingClassifier is now an\narray of :class:\u2019tree.DecisionTreeRegressor\u2019.\nRenamed chunk_size to batch_size in\n decomposition.MiniBatchDictionaryLearning and\n decomposition.MiniBatchSparsePCA for consistency.\nsvm.SVC and svm.NuSVC now provide a classes_ \nattribute and support arbitrary dtypes for labels y .\nAlso, the dtype 
returned by predict now reflects the dtype of\n y during fit (used to be np.float ).\nChanged default test_size in cross_validation.train_test_split \nto None, added possibility to infer test_size from train_size in\n cross_validation.ShuffleSplit and\n cross_validation.StratifiedShuffleSplit .\nRenamed function sklearn.metrics.zero_one to\n sklearn.metrics.zero_one_loss . Be aware that the default behavior\nin sklearn.metrics.zero_one_loss is different from\n sklearn.metrics.zero_one : normalize=False is changed to\n normalize=True .\nRenamed function metrics.zero_one_score to\n metrics.accuracy_score .\ndatasets.make_circles now has the same number of inner and outer points.\nIn the Naive Bayes classifiers, the class_prior parameter was moved\nfrom fit to __init__ .\nPeople \u00b6\nList of contributors for release 0.13 by number of commits.\n364 Andreas M\u00fcller\n143 Arnaud Joly\n137 Peter Prettenhofer\n131 Gael Varoquaux\n117 Mathieu Blondel\n108 Lars Buitinck\n106 Wei Li\n101 Olivier Grisel\n65 Vlad Niculae\n54 Gilles Louppe\n40 Jaques Grobler\n38 Alexandre Gramfort\n30 Rob Zinkov\n19 Aymeric Masurelle\n18 Andrew Winterman\n17 Fabian Pedregosa\n17 Nelle Varoquaux\n16 Christian Osendorfer\n14 Daniel Nouri\n13 Virgile Fritsch\n13 syhw\n12 Satrajit Ghosh\n10 Corey Lynch\n10 Kyle Beauchamp\n9 Brian Cheung\n9 Immanuel Bayer\n9 mr.Shu\n8 Conrad Lee\n8 James Bergstra\n7 Tadej Jane\u017e\n6 Brian Cajes\n6 Jake Vanderplas\n6 Michael\n6 Noel Dawe\n6 Tiago Nunes\n6 cow\n5 Anze\n5 Shiqiao Du\n4 Christian Jauvin\n4 Jacques Kvam\n4 Richard T. Guy\n4 Robert Layton\n3 Alexandre Abraham\n3 Doug Coleman\n3 Scott Dickerson\n2 ApproximateIdentity\n2 John Benediktsson\n2 Mark Veronda\n2 Matti Lyra\n2 Mikhail Korobov\n2 Xinfan Meng\n1 Alejandro Weinstein\n1 Alexandre Passos\n1 Christoph Deil\n1 Eugene Nizhibitsky\n1 Kenneth C. 
Arnold\n1 Luis Pedro Coelho\n1 Miroslav Batchkarov\n1 Pavel\n1 Sebastian Berg\n1 Shaun Jackman\n1 Subhodeep Moitra\n1 bob\n1 dengemann\n1 emanuele\n1 x006\nVersion 0.12.1 \u00b6\nOctober 8, 2012\nThe 0.12.1 release is a bug-fix release with no additional features.\nChangelog \u00b6\nImproved numerical stability in spectral embedding by Gael\nVaroquaux\nDoctest under windows 64bit by Gael Varoquaux\nDocumentation fixes for elastic net by Andreas M\u00fcller and\n Alexandre Gramfort\nProper behavior with fortran-ordered NumPy arrays by Gael Varoquaux\nMake GridSearchCV work with non-CSR sparse matrix by Lars Buitinck\nFix parallel computing in MDS by Gael Varoquaux\nFix Unicode support in count vectorizer by Andreas M\u00fcller\nFix MinCovDet breaking with X.shape = (3, 1) by Virgile Fritsch\nFix clone of SGD objects by Peter Prettenhofer\nStabilize GMM by Virgile Fritsch\nPeople \u00b6\n14 Peter Prettenhofer\n12 Gael Varoquaux\n10 Andreas M\u00fcller\n5 Lars Buitinck\n3 Virgile Fritsch\n1 Alexandre Gramfort\n1 Gilles Louppe\n1 Mathieu Blondel\nVersion 0.12 \u00b6\nSeptember 4, 2012\nChangelog \u00b6\nVarious speed improvements of the decision trees module, by\n Gilles Louppe .\nensemble.GradientBoostingRegressor and\n ensemble.GradientBoostingClassifier now support feature subsampling\nvia the max_features argument, by Peter Prettenhofer .\nAdded Huber and Quantile loss functions to\n ensemble.GradientBoostingRegressor , by Peter Prettenhofer .\nDecision trees and forests of randomized trees \nnow support multi-output classification and regression problems, by\n Gilles Louppe .\nAdded preprocessing.LabelEncoder , a simple utility class to\nnormalize labels or transform non-numerical labels, by Mathieu Blondel .\nAdded the epsilon-insensitive loss and the ability to make probabilistic\npredictions with the modified huber loss in Stochastic Gradient Descent , by\n Mathieu Blondel .\nAdded Multi-dimensional Scaling (MDS) , by Nelle 
Varoquaux.\nSVMlight file format loader now detects compressed (gzip/bzip2) files and\ndecompresses them on the fly, by Lars Buitinck .\nSVMlight file format serializer now preserves double precision floating\npoint values, by Olivier Grisel .\nA common testing framework for all estimators was added, by Andreas M\u00fcller .\nUnderstandable error messages for estimators that do not accept\nsparse input by Gael Varoquaux\nSpeedups in hierarchical clustering by Gael Varoquaux . In\nparticular building the tree now supports early stopping. This is\nuseful when the number of clusters is not small compared to the\nnumber of samples.\nAdd MultiTaskLasso and MultiTaskElasticNet for joint feature selection,\nby Alexandre Gramfort .\nAdded metrics.auc_score and\n metrics.average_precision_score convenience functions by Andreas\nM\u00fcller .\nImproved sparse matrix support in the Feature selection \nmodule by Andreas M\u00fcller .\nNew word boundaries-aware character n-gram analyzer for the\n Text feature extraction module by @kernc .\nFixed bug in spectral clustering that led to single point clusters\nby Andreas M\u00fcller .\nIn feature_extraction.text.CountVectorizer , added an option to\nignore infrequent words, min_df by Andreas M\u00fcller .\nAdd support for multiple targets in some linear models (ElasticNet, Lasso\nand OrthogonalMatchingPursuit) by Vlad Niculae and\n Alexandre Gramfort .\nFixes in decomposition.ProbabilisticPCA score function by Wei Li.\nFixed feature importance computation in\n Gradient Tree Boosting .\nAPI changes summary \u00b6\nThe old scikits.learn package has disappeared; all code should import\nfrom sklearn instead, which was introduced in 0.9.\nIn metrics.roc_curve , the thresholds array is now returned\nwith its order reversed, in order to keep it consistent with the order\nof the returned fpr and tpr .\nIn hmm objects, like hmm.GaussianHMM ,\n hmm.MultinomialHMM , etc., all parameters must be passed to the\nobject when initialising it 
and not through fit . Now fit will\nonly accept the data as an input parameter.\nFor all SVM classes, a faulty behavior of gamma was fixed. Previously,\nthe default gamma value was only computed the first time fit was called\nand then stored. It is now recalculated on every call to fit .\nAll Base classes are now abstract meta classes so that they can not be\ninstantiated.\ncluster.ward_tree now also returns the parent array. This is\nnecessary for early-stopping in which case the tree is not\ncompletely built.\nIn feature_extraction.text.CountVectorizer the parameters\n min_n and max_n were joined to the parameter n_gram_range to\nenable grid-searching both at once.\nIn feature_extraction.text.CountVectorizer , words that appear\nonly in one document are now ignored by default. To reproduce\nthe previous behavior, set min_df=1 .\nFixed API inconsistency: linear_model.SGDClassifier.predict_proba now\nreturns 2d array when fit on two classes.\nFixed API inconsistency: discriminant_analysis.QuadraticDiscriminantAnalysis.decision_function \nand discriminant_analysis.LinearDiscriminantAnalysis.decision_function now return 1d arrays\nwhen fit on two classes.\nGrid of alphas used for fitting linear_model.LassoCV and\n linear_model.ElasticNetCV is now stored\nin the attribute alphas_ rather than overriding the init parameter\n alphas .\nLinear models when alpha is estimated by cross-validation store\nthe estimated value in the alpha_ attribute rather than just\n alpha or best_alpha .\nensemble.GradientBoostingClassifier now supports\n ensemble.GradientBoostingClassifier.staged_predict_proba , and\n ensemble.GradientBoostingClassifier.staged_predict .\nsvm.sparse.SVC and other sparse SVM classes are now deprecated.\nAll classes in the Support Vector Machines module now automatically select the\nsparse or dense representation based on the input.\nAll clustering algorithms now interpret the array X given to fit as\ninput data, in particular cluster.SpectralClustering 
and\n cluster.AffinityPropagation which previously expected affinity matrices.\nFor clustering algorithms that take the desired number of clusters as a parameter,\nthis parameter is now called n_clusters .\nPeople \u00b6\n267 Andreas M\u00fcller\n94 Gilles Louppe\n89 Gael Varoquaux\n79 Peter Prettenhofer\n60 Mathieu Blondel\n57 Alexandre Gramfort\n52 Vlad Niculae\n45 Lars Buitinck\n44 Nelle Varoquaux\n37 Jaques Grobler\n30 Alexis Mignon\n30 Immanuel Bayer\n27 Olivier Grisel\n16 Subhodeep Moitra\n13 Yannick Schwartz\n12 @kernc\n11 Virgile Fritsch\n9 Daniel Duckworth\n9 Fabian Pedregosa\n9 Robert Layton\n8 John Benediktsson\n7 Marko Burjek\n5 Nicolas Pinto\n4 Alexandre Abraham\n4 Jake Vanderplas\n3 Brian Holt\n3 Edouard Duchesnay\n3 Florian Hoenig\n3 flyingimmidev\n2 Francois Savard\n2 Hannes Schulz\n2 Peter Welinder\n2 Yaroslav Halchenko\n2 Wei Li\n1 Alex Companioni\n1 Brandyn A. White\n1 Bussonnier Matthias\n1 Charles-Pierre Astolfi\n1 Dan O\u2019Huiginn\n1 David Cournapeau\n1 Keith Goodman\n1 Ludwig Schwardt\n1 Olivier Hervieu\n1 Sergio Medina\n1 Shiqiao Du\n1 Tim Sheerman-Chase\n1 buguen\nVersion 0.11 \u00b6\nMay 7, 2012\nChangelog \u00b6\nHighlights \u00b6\nGradient boosted regression trees ( Gradient Tree Boosting )\nfor classification and regression by Peter Prettenhofer \nand Scott White .\nSimple dict-based feature loader with support for categorical variables\n( feature_extraction.DictVectorizer ) by Lars Buitinck .\nAdded Matthews correlation coefficient ( metrics.matthews_corrcoef )\nand added macro and micro average options to\n metrics.precision_score , metrics.recall_score and\n metrics.f1_score by Satrajit Ghosh .\nOut of Bag Estimates of generalization error for Ensemble methods \nby Andreas M\u00fcller .\nRandomized sparse linear models for feature\nselection, by Alexandre Gramfort and Gael Varoquaux\nLabel Propagation for semi-supervised learning, by Clay\nWoolam. 
Note the semi-supervised API is still work in progress,\nand may change.\nAdded BIC/AIC model selection to classical Gaussian mixture models and unified\nthe API with the remainder of scikit-learn, by Bertrand Thirion\nAdded sklearn.cross_validation.StratifiedShuffleSplit , which is\na sklearn.cross_validation.ShuffleSplit with balanced splits,\nby Yannick Schwartz.\nsklearn.neighbors.NearestCentroid classifier added, along with a\n shrink_threshold parameter, which implements shrunken centroid\nclassification , by Robert Layton .\nOther changes \u00b6\nMerged dense and sparse implementations of Stochastic Gradient Descent module and\nexposed utility extension types for sequential\ndatasets seq_dataset and weight vectors weight_vector \nby Peter Prettenhofer .\nAdded partial_fit (support for online/minibatch learning) and\nwarm_start to the Stochastic Gradient Descent module by Mathieu Blondel .\nDense and sparse implementations of Support Vector Machines classes and\n linear_model.LogisticRegression merged by Lars Buitinck .\nRegressors can now be used as base estimator in the Multiclass and multilabel algorithms \nmodule by Mathieu Blondel .\nAdded n_jobs option to metrics.pairwise.pairwise_distances \nand metrics.pairwise.pairwise_kernels for parallel computation,\nby Mathieu Blondel .\nK-means can now be run in parallel, using the n_jobs argument\nto either K-means or KMeans , by Robert Layton .\nImproved Cross-validation: evaluating estimator performance and Tuning the hyper-parameters of an estimator documentation\nand introduced the new cross_validation.train_test_split \nhelper function by Olivier Grisel\nsvm.SVC members coef_ and intercept_ changed sign for\nconsistency with decision_function ; for kernel==linear ,\n coef_ was fixed in the one-vs-one case, by Andreas M\u00fcller .\nPerformance improvements to efficient leave-one-out cross-validated\nRidge regression, esp. 
for the n_samples > n_features case, in\n linear_model.RidgeCV , by Reuben Fletcher-Costin.\nRefactoring and simplification of the Text feature extraction \nAPI and fixed a bug that caused possible negative IDF,\nby Olivier Grisel .\nBeam pruning option in _BaseHMM module has been removed since it\nis difficult to Cythonize. If you are interested in contributing a Cython\nversion, you can use the python version in the git history as a reference.\nClasses in Nearest Neighbors now support arbitrary Minkowski metric for\nnearest neighbors searches. The metric can be specified by argument p .\nAPI changes summary \u00b6\ncovariance.EllipticEnvelop is now deprecated - Please use covariance.EllipticEnvelope \ninstead.\nNeighborsClassifier and NeighborsRegressor are gone in the module\n Nearest Neighbors . Use the classes KNeighborsClassifier ,\n RadiusNeighborsClassifier , KNeighborsRegressor \nand/or RadiusNeighborsRegressor instead.\nSparse classes in the Stochastic Gradient Descent module are now deprecated.\nIn mixture.GMM , mixture.DPGMM and mixture.VBGMM ,\nparameters must be passed to an object when initialising it and not through\n fit . Now fit will only accept the data as an input parameter.\nmethods rvs and decode in GMM module are now deprecated.\n sample and score or predict should be used instead.\nattribute _scores and _pvalues in univariate feature selection\nobjects are now deprecated.\n scores_ or pvalues_ should be used instead.\nIn LogisticRegression , LinearSVC , SVC and\n NuSVC , the class_weight parameter is now an initialization\nparameter, not a parameter to fit. This makes grid searches\nover this parameter possible.\nLFW data is now always shape (n_samples, n_features) to be\nconsistent with the Olivetti faces dataset. Use images and\n pairs attribute to access the natural images shapes instead.\nIn svm.LinearSVC , the meaning of the multi_class parameter\nchanged. Options now are 'ovr' and 'crammer_singer' , with\n 'ovr' being the default. 
This does not change the default behavior\nbut hopefully is less confusing.\nClass feature_selection.text.Vectorizer is deprecated and\nreplaced by feature_selection.text.TfidfVectorizer .\nThe preprocessor / analyzer nested structure for text feature\nextraction has been removed. All those features are\nnow directly passed as flat constructor arguments\nto feature_selection.text.TfidfVectorizer and\n feature_selection.text.CountVectorizer , in particular the\nfollowing parameters are now used:\nanalyzer can be 'word' or 'char' to switch the default\nanalysis scheme, or use a specific python callable (as previously).\ntokenizer and preprocessor have been introduced to make it\nstill possible to customize those steps with the new API.\ninput explicitly control how to interpret the sequence passed to\n fit and predict : filenames, file objects or direct (byte or\nUnicode) strings.\ncharset decoding is explicit and strict by default.\nthe vocabulary , fitted or not is now stored in the\n vocabulary_ attribute to be consistent with the project\nconventions.\nClass feature_selection.text.TfidfVectorizer now derives directly\nfrom feature_selection.text.CountVectorizer to make grid\nsearch trivial.\nmethods rvs in _BaseHMM module are now deprecated.\n sample should be used instead.\nBeam pruning option in _BaseHMM module is removed since it is\ndifficult to be Cythonized. If you are interested, you can look in the\nhistory codes by git.\nThe SVMlight format loader now supports files with both zero-based and\none-based column indices, since both occur \u201cin the wild\u201d.\nArguments in class ShuffleSplit are now consistent with\n StratifiedShuffleSplit . Arguments test_fraction and\n train_fraction are deprecated and renamed to test_size and\n train_size and can accept both float and int .\nArguments in class Bootstrap are now consistent with\n StratifiedShuffleSplit . 
Arguments n_test and\n n_train are deprecated and renamed to test_size and\n train_size and can accept both float and int .\nArgument p added to classes in Nearest Neighbors to specify an\narbitrary Minkowski metric for nearest neighbors searches.\nPeople \u00b6\n282 Andreas M\u00fcller\n239 Peter Prettenhofer\n198 Gael Varoquaux\n129 Olivier Grisel\n114 Mathieu Blondel\n103 Clay Woolam\n96 Lars Buitinck\n88 Jaques Grobler\n82 Alexandre Gramfort\n50 Bertrand Thirion\n42 Robert Layton\n28 flyingimmidev\n26 Jake Vanderplas\n26 Shiqiao Du\n21 Satrajit Ghosh\n17 David Marek\n17 Gilles Louppe\n14 Vlad Niculae\n11 Yannick Schwartz\n10 Fabian Pedregosa\n9 fcostin\n7 Nick Wilson\n5 Adrien Gaidon\n5 Nicolas Pinto\n4 David Warde-Farley\n5 Nelle Varoquaux\n5 Emmanuelle Gouillart\n3 Joonas Sillanp\u00e4\u00e4\n3 Paolo Losi\n2 Charles McCarthy\n2 Roy Hyunjin Han\n2 Scott White\n2 ibayer\n1 Brandyn White\n1 Carlos Scheidegger\n1 Claire Revillet\n1 Conrad Lee\n1 Edouard Duchesnay\n1 Jan Hendrik Metzen\n1 Meng Xinfan\n1 Rob Zinkov\n1 Shiqiao\n1 Udi Weinsberg\n1 Virgile Fritsch\n1 Xinfan Meng\n1 Yaroslav Halchenko\n1 jansoe\n1 Leon Palafox\nVersion 0.10 \u00b6\nJanuary 11, 2012\nChangelog \u00b6\nPython 2.5 compatibility was dropped; the minimum Python version needed\nto use scikit-learn is now 2.6.\nSparse inverse covariance estimation using the graph Lasso, with\nassociated cross-validated estimator, by Gael Varoquaux\nNew Tree module by Brian Holt , Peter Prettenhofer ,\n Satrajit Ghosh and Gilles Louppe . 
The module comes with complete\ndocumentation and examples.\nFixed a bug in the RFE module by Gilles Louppe (issue #378).\nFixed a memory leak in Support Vector Machines module by Brian Holt (issue #367).\nFaster tests by Fabian Pedregosa and others.\nSilhouette Coefficient cluster analysis evaluation metric added as\n sklearn.metrics.silhouette_score by Robert Layton.\nFixed a bug in K-means in the handling of the n_init parameter:\nthe clustering algorithm used to be run n_init times but the last\nsolution was retained instead of the best solution by Olivier Grisel .\nMinor refactoring in Stochastic Gradient Descent module; consolidated dense and sparse\npredict methods; Enhanced test time performance by converting model\nparameters to fortran-style arrays after fitting (only multi-class).\nAdjusted Mutual Information metric added as\n sklearn.metrics.adjusted_mutual_info_score by Robert Layton.\nModels like SVC/SVR/LinearSVC/LogisticRegression from libsvm/liblinear\nnow support scaling of C regularization parameter by the number of\nsamples by Alexandre Gramfort .\nNew Ensemble Methods module by Gilles Louppe and\n Brian Holt . 
The module comes with the random forest algorithm and the\nextra-trees method, along with documentation and examples.\nNovelty and Outlier Detection : outlier and novelty detection, by\n Virgile Fritsch .\nKernel Approximation : a transform implementing kernel\napproximation for fast SGD on non-linear kernels by\n Andreas M\u00fcller .\nFixed a bug due to atom swapping in Orthogonal Matching Pursuit (OMP) by Vlad Niculae .\nSparse coding with a precomputed dictionary by Vlad Niculae .\nMini Batch K-Means performance improvements by Olivier Grisel .\nK-means support for sparse matrices by Mathieu Blondel .\nImproved documentation for developers and for the sklearn.utils \nmodule, by Jake Vanderplas .\nVectorized 20newsgroups dataset loader\n( sklearn.datasets.fetch_20newsgroups_vectorized ) by\n Mathieu Blondel .\nMulticlass and multilabel algorithms by Lars Buitinck .\nUtilities for fast computation of mean and variance for sparse matrices\nby Mathieu Blondel .\nMake sklearn.preprocessing.scale and\n sklearn.preprocessing.Scaler work on sparse matrices by\n Olivier Grisel\nFeature importances using decision trees and/or forest of trees,\nby Gilles Louppe .\nParallel implementation of forests of randomized trees by\n Gilles Louppe .\nsklearn.cross_validation.ShuffleSplit can subsample the train\nsets as well as the test sets by Olivier Grisel .\nErrors in the build of the documentation fixed by Andreas M\u00fcller .\nAPI changes summary \u00b6\nHere are the code migration instructions when upgrading from scikit-learn\nversion 0.9:\nSome estimators that may overwrite their inputs to save memory previously\nhad overwrite_ parameters; these have been replaced with copy_ \nparameters with exactly the opposite meaning.\nThis particularly affects some of the estimators in linear_model .\nThe default behavior is still to copy everything passed in.\nThe SVMlight dataset loader sklearn.datasets.load_svmlight_file no\nlonger supports loading two files at once; use 
load_svmlight_files \ninstead. Also, the (unused) buffer_mb parameter is gone.\nSparse estimators in the Stochastic Gradient Descent module use dense parameter vector\n coef_ instead of sparse_coef_ . This significantly improves\ntest time performance.\nThe Covariance estimation module now has a robust estimator of\ncovariance, the Minimum Covariance Determinant estimator.\nCluster evaluation metrics in metrics.cluster have been refactored\nbut the changes are backwards compatible. They have been moved to the\n metrics.cluster.supervised , along with\n metrics.cluster.unsupervised which contains the Silhouette\nCoefficient.\nThe permutation_test_score function now behaves the same way as\n cross_val_score (i.e. uses the mean score across the folds.)\nCross Validation generators now use integer indices ( indices=True )\nby default instead of boolean masks. This make it more intuitive to\nuse with sparse matrix data.\nThe functions used for sparse coding, sparse_encode and\n sparse_encode_parallel have been combined into\n sklearn.decomposition.sparse_encode , and the shapes of the arrays\nhave been transposed for consistency with the matrix factorization setting,\nas opposed to the regression setting.\nFixed an off-by-one error in the SVMlight/LibSVM file format handling;\nfiles generated using sklearn.datasets.dump_svmlight_file should be\nre-generated. (They should continue to work, but accidentally had one\nextra column of zeros prepended.)\nBaseDictionaryLearning class replaced by SparseCodingMixin .\nsklearn.utils.extmath.fast_svd has been renamed\n sklearn.utils.extmath.randomized_svd and the default\noversampling is now fixed to 10 additional random vectors instead\nof doubling the number of components to extract. 
The new behavior\nfollows the reference paper.\nPeople \u00b6\nThe following people contributed to scikit-learn since last release:\n246 Andreas M\u00fcller\n242 Olivier Grisel\n220 Gilles Louppe\n183 Brian Holt\n166 Gael Varoquaux\n144 Lars Buitinck\n73 Vlad Niculae\n65 Peter Prettenhofer\n64 Fabian Pedregosa\n60 Robert Layton\n55 Mathieu Blondel\n52 Jake Vanderplas\n44 Noel Dawe\n38 Alexandre Gramfort\n24 Virgile Fritsch\n23 Satrajit Ghosh\n3 Jan Hendrik Metzen\n3 Kenneth C. Arnold\n3 Shiqiao Du\n3 Tim Sheerman-Chase\n3 Yaroslav Halchenko\n2 Bala Subrahmanyam Varanasi\n2 DraXus\n2 Michael Eickenberg\n1 Bogdan Trach\n1 F\u00e9lix-Antoine Fortin\n1 Juan Manuel Caicedo Carvajal\n1 Nelle Varoquaux\n1 Nicolas Pinto\n1 Tiziano Zito\n1 Xinfan Meng\nVersion 0.9 \u00b6\nSeptember 21, 2011\nscikit-learn 0.9 was released on September 2011, three months after the 0.8\nrelease and includes the new modules Manifold learning , The Dirichlet Process \nas well as several new algorithms and documentation improvements.\nThis release also includes the dictionary-learning work developed by\n Vlad Niculae as part of the Google Summer of Code program.\nChangelog \u00b6\nNew Manifold learning module by Jake Vanderplas and\n Fabian Pedregosa .\nNew Dirichlet Process Gaussian Mixture\nModel by Alexandre Passos\nNearest Neighbors module refactoring by Jake Vanderplas :\ngeneral refactoring, support for sparse matrices in input, speed and\ndocumentation improvements. 
See the next section for a full list of API\nchanges.\nImprovements on the Feature selection module by\n Gilles Louppe : refactoring of the RFE classes, documentation\nrewrite, increased efficiency and minor API changes.\nSparse principal components analysis (SparsePCA and MiniBatchSparsePCA) by Vlad Niculae , Gael Varoquaux and\n Alexandre Gramfort\nPrinting an estimator now behaves independently of architectures\nand Python version thanks to Jean Kossaifi .\nLoader for libsvm/svmlight format by\n Mathieu Blondel and Lars Buitinck\nDocumentation improvements: thumbnails in\nexample gallery by Fabian Pedregosa .\nImportant bugfixes in Support Vector Machines module (segfaults, bad\nperformance) by Fabian Pedregosa .\nAdded Multinomial Naive Bayes and Bernoulli Naive Bayes \nby Lars Buitinck\nText feature extraction optimizations by Lars Buitinck\nChi-Square feature selection\n( feature_selection.univariate_selection.chi2 ) by Lars Buitinck .\nSample generators module refactoring by Gilles Louppe\nMulticlass and multilabel algorithms by Mathieu Blondel\nBall tree rewrite by Jake Vanderplas\nImplementation of DBSCAN algorithm by Robert Layton\nKmeans predict and transform by Robert Layton\nPreprocessing module refactoring by Olivier Grisel\nFaster mean shift by Conrad Lee\nNew Bootstrap , Random permutations cross-validation a.k.a. 
Shuffle & Split and various other\nimprovements in cross validation schemes by Olivier Grisel and\n Gael Varoquaux\nAdjusted Rand index and V-Measure clustering evaluation metrics by Olivier Grisel\nAdded Orthogonal Matching Pursuit by Vlad Niculae\nAdded 2D-patch extractor utilities in the Feature extraction module by Vlad Niculae\nImplementation of linear_model.LassoLarsCV \n(cross-validated Lasso solver using the Lars algorithm) and\n linear_model.LassoLarsIC (BIC/AIC model\nselection in Lars) by Gael Varoquaux \nand Alexandre Gramfort\nScalability improvements to metrics.roc_curve by Olivier Hervieu\nDistance helper functions metrics.pairwise.pairwise_distances \nand metrics.pairwise.pairwise_kernels by Robert Layton\nMini-Batch K-Means by Nelle Varoquaux and Peter Prettenhofer.\nDownloading datasets from the mldata.org repository utilities by Pietro Berkes.\nThe Olivetti faces dataset by David Warde-Farley .\nAPI changes summary \u00b6\nHere are the code migration instructions when upgrading from scikit-learn\nversion 0.8:\nThe scikits.learn package was renamed sklearn . There is\nstill a scikits.learn package alias for backward compatibility.\nThird-party projects with a dependency on scikit-learn 0.9+ should\nupgrade their codebase. For instance, under Linux / MacOSX just run\n(make a backup first!):\nfind - name \"*.py\" | xargs sed - i 's/ \\b scikits.learn \\b /sklearn/g'\nEstimators no longer accept model parameters as fit arguments:\ninstead all parameters must be only be passed as constructor\narguments or using the now public set_params method inherited\nfrom base.BaseEstimator .\nSome estimators can still accept keyword arguments on the fit \nbut this is restricted to data-dependent values (e.g. 
a Gram matrix\nor an affinity matrix that are precomputed from the X data matrix.\nThe cross_val package has been renamed to cross_validation \nalthough there is also a cross_val package alias in place for\nbackward compatibility.\nThird-party projects with a dependency on scikit-learn 0.9+ should\nupgrade their codebase. For instance, under Linux / MacOSX just run\n(make a backup first!):\nfind - name \"*.py\" | xargs sed - i 's/ \\b cross_val \\b /cross_validation/g'\nThe score_func argument of the\n sklearn.cross_validation.cross_val_score function is now expected\nto accept y_test and y_predicted as only arguments for\nclassification and regression tasks or X_test for unsupervised\nestimators.\ngamma parameter for support vector machine algorithms is set\nto 1 / n_features by default, instead of 1 / n_samples .\nThe sklearn.hmm has been marked as orphaned: it will be removed\nfrom scikit-learn in version 0.11 unless someone steps up to\ncontribute documentation, examples and fix lurking numerical\nstability issues.\nsklearn.neighbors has been made into a submodule. The two previously\navailable estimators, NeighborsClassifier and NeighborsRegressor \nhave been marked as deprecated. Their functionality has been divided\namong five new classes: NearestNeighbors for unsupervised neighbors\nsearches, KNeighborsClassifier & RadiusNeighborsClassifier \nfor supervised classification problems, and KNeighborsRegressor \n& RadiusNeighborsRegressor for supervised regression problems.\nsklearn.ball_tree.BallTree has been moved to\n sklearn.neighbors.BallTree . Using the former will generate a warning.\nsklearn.linear_model.LARS() and related classes (LassoLARS,\nLassoLARSCV, etc.) have been renamed to\n sklearn.linear_model.Lars() .\nAll distance metrics and kernels in sklearn.metrics.pairwise now have a Y\nparameter, which by default is None. If not given, the result is the distance\n(or kernel similarity) between each sample in Y. 
If given, the result is the\npairwise distance (or kernel similarity) between samples in X to Y.\nsklearn.metrics.pairwise.l1_distance is now called manhattan_distance ,\nand by default returns the pairwise distance. For the component wise distance,\nset the parameter sum_over_features to False .\nBackward compatibility package aliases and other deprecated classes and\nfunctions will be removed in version 0.11.\nPeople \u00b6\n38 people contributed to this release.\n387 Vlad Niculae\n320 Olivier Grisel\n192 Lars Buitinck\n179 Gael Varoquaux\n168 Fabian Pedregosa ( INRIA , Parietal Team )\n127 Jake Vanderplas\n120 Mathieu Blondel\n85 Alexandre Passos\n67 Alexandre Gramfort\n57 Peter Prettenhofer\n56 Gilles Louppe\n42 Robert Layton\n38 Nelle Varoquaux\n32 Jean Kossaifi\n30 Conrad Lee\n22 Pietro Berkes\n18 andy\n17 David Warde-Farley\n12 Brian Holt\n11 Robert\n8 Amit Aides\n8 Virgile Fritsch\n7 Yaroslav Halchenko\n6 Salvatore Masecchia\n5 Paolo Losi\n4 Vincent Schut\n3 Alexis Metaireau\n3 Bryan Silverthorn\n3 Andreas M\u00fcller\n2 Minwoo Jake Lee\n1 Emmanuelle Gouillart\n1 Keith Goodman\n1 Lucas Wiman\n1 Nicolas Pinto\n1 Thouis (Ray) Jones\n1 Tim Sheerman-Chase\nVersion 0.8 \u00b6\nMay 11, 2011\nscikit-learn 0.8 was released on May 2011, one month after the first\n\u201cinternational\u201d scikit-learn coding sprint and is\nmarked by the inclusion of important modules: Hierarchical clustering ,\n Cross decomposition , Non-negative matrix factorization (NMF or NNMF) , initial support for Python 3 and by important\nenhancements and bug fixes.\nChangelog \u00b6\nSeveral new modules where introduced during this release:\nNew Hierarchical clustering module by Vincent Michel,\n Bertrand Thirion , Alexandre Gramfort and Gael Varoquaux .\nKernel PCA implementation by Mathieu Blondel\nThe Labeled Faces in the Wild face recognition dataset by Olivier Grisel .\nNew Cross decomposition module by Edouard Duchesnay .\nNon-negative matrix factorization (NMF or NNMF) module Vlad 
Niculae\nImplementation of the Oracle Approximating Shrinkage algorithm by\n Virgile Fritsch in the Covariance estimation module.\nSome other modules benefited from significant improvements or cleanups.\nInitial support for Python 3: builds and imports cleanly,\nsome modules are usable while others have failing tests by Fabian Pedregosa .\ndecomposition.PCA is now usable from the Pipeline object by Olivier Grisel .\nGuide How to optimize for speed by Olivier Grisel .\nFixes for memory leaks in libsvm bindings, 64-bit safer BallTree by Lars Buitinck.\nbug and style fixing in K-means algorithm by Jan Schl\u00fcter.\nAdd attribute converged to Gaussian Mixture Models by Vincent Schut.\nImplemented transform , predict_log_proba in\n discriminant_analysis.LinearDiscriminantAnalysis By Mathieu Blondel .\nRefactoring in the Support Vector Machines module and bug fixes by Fabian Pedregosa ,\n Gael Varoquaux and Amit Aides.\nRefactored SGD module (removed code duplication, better variable naming),\nadded interface for sample weight by Peter Prettenhofer .\nWrapped BallTree with Cython by Thouis (Ray) Jones.\nAdded function svm.l1_min_c by Paolo Losi.\nTypos, doc style, etc. by Yaroslav Halchenko , Gael Varoquaux ,\n Olivier Grisel , Yann Malet, Nicolas Pinto , Lars Buitinck and\n Fabian Pedregosa .\nPeople \u00b6\nPeople that made this release possible preceded by number of commits:\n159 Olivier Grisel\n96 Gael Varoquaux\n96 Vlad Niculae\n94 Fabian Pedregosa\n36 Alexandre Gramfort\n32 Paolo Losi\n31 Edouard Duchesnay\n30 Mathieu Blondel\n25 Peter Prettenhofer\n22 Nicolas Pinto\n11 Virgile Fritsch\n7 Lars Buitinck\n6 Vincent Michel\n5 Bertrand Thirion\n4 Thouis (Ray) Jones\n4 Vincent Schut\n3 Jan Schl\u00fcter\n2 Julien Miotte\n2 Matthieu Perrot\n2 Yann Malet\n2 Yaroslav Halchenko\n1 Amit Aides\n1 Andreas M\u00fcller\n1 Feth Arezki\n1 Meng Xinfan\nVersion 0.7 \u00b6\nMarch 2, 2011\nscikit-learn 0.7 was released in March 2011, roughly three months\nafter the 0.6 release. 
This release is marked by the speed\nimprovements in existing algorithms like k-Nearest Neighbors and\nK-Means algorithm and by the inclusion of an efficient algorithm for\ncomputing the Ridge Generalized Cross Validation solution. Unlike the\npreceding release, no new modules where added to this release.\nChangelog \u00b6\nPerformance improvements for Gaussian Mixture Model sampling [Jan\nSchl\u00fcter].\nImplementation of efficient leave-one-out cross-validated Ridge in\n linear_model.RidgeCV [ Mathieu Blondel ]\nBetter handling of collinearity and early stopping in\n linear_model.lars_path [ Alexandre Gramfort and Fabian\nPedregosa ].\nFixes for liblinear ordering of labels and sign of coefficients\n[Dan Yamins, Paolo Losi, Mathieu Blondel and Fabian Pedregosa ].\nPerformance improvements for Nearest Neighbors algorithm in\nhigh-dimensional spaces [ Fabian Pedregosa ].\nPerformance improvements for cluster.KMeans [ Gael\nVaroquaux and James Bergstra ].\nSanity checks for SVM-based classes [ Mathieu Blondel ].\nRefactoring of neighbors.NeighborsClassifier and\n neighbors.kneighbors_graph : added different algorithms for\nthe k-Nearest Neighbor Search and implemented a more stable\nalgorithm for finding barycenter weights. Also added some\ndeveloper documentation for this module, see\n notes_neighbors for more information [ Fabian Pedregosa ].\nDocumentation improvements: Added pca.RandomizedPCA and\n linear_model.LogisticRegression to the class\nreference. 
Also added references of matrices used for clustering\nand other fixes [ Gael Varoquaux , Fabian Pedregosa , Mathieu\nBlondel , Olivier Grisel , Virgile Fritsch , Emmanuelle\nGouillart]\nBinded decision_function in classes that make use of liblinear ,\ndense and sparse variants, like svm.LinearSVC or\n linear_model.LogisticRegression [ Fabian Pedregosa ].\nPerformance and API improvements to\n metrics.euclidean_distances and to\n pca.RandomizedPCA [ James Bergstra ].\nFix compilation issues under NetBSD [Kamel Ibn Hassen Derouiche]\nAllow input sequences of different lengths in hmm.GaussianHMM \n[ Ron Weiss ].\nFix bug in affinity propagation caused by incorrect indexing [Xinfan Meng]\nPeople \u00b6\nPeople that made this release possible preceded by number of commits:\n85 Fabian Pedregosa\n67 Mathieu Blondel\n20 Alexandre Gramfort\n19 James Bergstra\n14 Dan Yamins\n13 Olivier Grisel\n12 Gael Varoquaux\n4 Edouard Duchesnay\n4 Ron Weiss\n2 Satrajit Ghosh\n2 Vincent Dubourg\n1 Emmanuelle Gouillart\n1 Kamel Ibn Hassen Derouiche\n1 Paolo Losi\n1 VirgileFritsch\n1 Yaroslav Halchenko\n1 Xinfan Meng\nVersion 0.6 \u00b6\nDecember 21, 2010\nscikit-learn 0.6 was released on December 2010. It is marked by the\ninclusion of several new modules and a general renaming of old\nones. It is also marked by the inclusion of new example, including\napplications to real-world datasets.\nChangelog \u00b6\nNew stochastic gradient descent\nmodule by Peter Prettenhofer. The module comes with complete\ndocumentation and examples.\nImproved svm module: memory consumption has been reduced by 50%,\nheuristic to automatically set class weights, possibility to\nassign weights to samples (see\n SVM: Weighted samples for an example).\nNew Gaussian Processes module by Vincent Dubourg. This module\nalso has great documentation and some very neat examples. 
See\nexample_gaussian_process_plot_gp_regression.py or\nexample_gaussian_process_plot_gp_probabilistic_classification_after_regression.py\nfor a taste of what can be done.\nIt is now possible to use liblinear\u2019s Multi-class SVC (option\nmulti_class in svm.LinearSVC )\nNew features and performance improvements of text feature\nextraction.\nImproved sparse matrix support, both in main classes\n( grid_search.GridSearchCV ) as in modules\nsklearn.svm.sparse and sklearn.linear_model.sparse.\nLots of cool new examples and a new section that uses real-world\ndatasets was created. These include:\n Faces recognition example using eigenfaces and SVMs ,\n Species distribution modeling ,\n Libsvm GUI ,\n Wikipedia principal eigenvector and\nothers.\nFaster Least Angle Regression algorithm. It is now 2x\nfaster than the R version on worst case and up to 10x times faster\non some cases.\nFaster coordinate descent algorithm. In particular, the full path\nversion of lasso ( linear_model.lasso_path ) is more than\n200x times faster than before.\nIt is now possible to get probability estimates from a\n linear_model.LogisticRegression model.\nmodule renaming: the glm module has been renamed to linear_model,\nthe gmm module has been included into the more general mixture\nmodel and the sgd module has been included in linear_model.\nLots of bug fixes and documentation improvements.\nPeople \u00b6\nPeople that made this release possible preceded by number of commits:\n207 Olivier Grisel\n167 Fabian Pedregosa\n97 Peter Prettenhofer\n68 Alexandre Gramfort\n59 Mathieu Blondel\n55 Gael Varoquaux\n33 Vincent Dubourg\n21 Ron Weiss\n9 Bertrand Thirion\n3 Alexandre Passos\n3 Anne-Laure Fouque\n2 Ronan Amicel\n1 Christian Osendorfer\nVersion 0.5 \u00b6\nOctober 11, 2010\nChangelog \u00b6\nNew classes \u00b6\nSupport for sparse matrices in some classifiers of modules\n svm and linear_model (see svm.sparse.SVC ,\n svm.sparse.SVR , svm.sparse.LinearSVC ,\n linear_model.sparse.Lasso , 
linear_model.sparse.ElasticNet )\nNew pipeline.Pipeline object to compose different estimators.\nRecursive Feature Elimination routines in module\n Feature selection .\nAddition of various classes capable of cross validation in the\nlinear_model module ( linear_model.LassoCV , linear_model.ElasticNetCV ,\netc.).\nNew, more efficient LARS algorithm implementation. The Lasso\nvariant of the algorithm is also implemented. See\n linear_model.lars_path , linear_model.Lars and\n linear_model.LassoLars .\nNew Hidden Markov Models module (see classes\n hmm.GaussianHMM , hmm.MultinomialHMM ,\n hmm.GMMHMM )\nNew module feature_extraction (see class reference )\nNew FastICA algorithm in module sklearn.fastica\nDocumentation \u00b6\nImproved documentation for many modules, now separating\nnarrative documentation from the class reference. As an example,\nsee documentation for the SVM module and the\ncomplete class reference .\nFixes \u00b6\nAPI changes: adhere variable names to PEP-8, give more\nmeaningful names.\nFixes for svm module to run on a shared memory context\n(multiprocessing).\nIt is again possible to generate latex (and thus PDF) from the\nsphinx docs.\nExamples \u00b6\nnew examples using some of the mlcomp datasets:\n sphx_glr_auto_examples_mlcomp_sparse_document_classification.py (since removed) and\n Classification of text documents using sparse features\nMany more examples. See here \nthe full list of examples.\nExternal dependencies \u00b6\nJoblib is now a dependency of this package, although it is\nshipped with (sklearn.externals.joblib).\nRemoved modules \u00b6\nModule ann (Artificial Neural Networks) has been removed from\nthe distribution. 
Users wanting this sort of algorithms should\ntake a look into pybrain.\nMisc \u00b6\nNew sphinx theme for the web page.\nAuthors \u00b6\nThe following is a list of authors for this release, preceded by\nnumber of commits:\n262 Fabian Pedregosa\n240 Gael Varoquaux\n149 Alexandre Gramfort\n116 Olivier Grisel\n40 Vincent Michel\n38 Ron Weiss\n23 Matthieu Perrot\n10 Bertrand Thirion\n7 Yaroslav Halchenko\n9 VirgileFritsch\n6 Edouard Duchesnay\n4 Mathieu Blondel\n1 Ariel Rokem\n1 Matthieu Brucher\nVersion 0.4 \u00b6\nAugust 26, 2010\nChangelog \u00b6\nMajor changes in this release include:\nCoordinate Descent algorithm (Lasso, ElasticNet) refactoring &\nspeed improvements (roughly 100x times faster).\nCoordinate Descent Refactoring (and bug fixing) for consistency\nwith R\u2019s package GLMNET.\nNew metrics module.\nNew GMM module contributed by Ron Weiss.\nImplementation of the LARS algorithm (without Lasso variant for now).\nfeature_selection module redesign.\nMigration to GIT as version control system.\nRemoval of obsolete attrselect module.\nRename of private compiled extensions (added underscore).\nRemoval of legacy unmaintained code.\nDocumentation improvements (both docstring and rst).\nImprovement of the build system to (optionally) link with MKL.\nAlso, provide a lite BLAS implementation in case no system-wide BLAS is\nfound.\nLots of new examples.\nMany, many bug fixes \u2026\nAuthors \u00b6\nThe committer list for this release is the following (preceded by number\nof commits):\n143 Fabian Pedregosa\n35 Alexandre Gramfort\n34 Olivier Grisel\n11 Gael Varoquaux\n5 Yaroslav Halchenko\n2 Vincent Michel\n1 Chris Filo Gorgolewski\nEarlier versions \u00b6\nEarlier versions included contributions by Fred Mailhot, David Cooke,\nDavid Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson.\n\u00a9 2007 - 2017, scikit-learn developers (BSD License).\n Show this page source\nPrevious\nNext\n"
    },
    {
      "turn_idx": 8,
      "action": "file_saver({\"content\": \"Title: Information from Scikit-Learn July 2017 changelog via scikit-learn.org\\n\\nSources:\\n- https://scikit-learn.org/stable/whats_new/v0.19.html (Version 0.19 \u2014 scikit-learn 1.8.0 documentation)\\n- https://scikit-learn.org/0.19/whats_new.html (Release history \u2014 scikit-learn 0.19.2 documentation)\\n\\nExtracted Content (direct quotes with exact source attribution):\\n\\n1) Source: https://scikit-learn.org/stable/whats_new/v0.19.html\\n- \\\"Version 0.19 \u2014 August 12, 2017\\\"\\n- Under Bug fixes \u2192 Other predictors:\\n  \\\"Fix semi_supervised.BaseLabelPropagation to correctly implement LabelPropagation and LabelSpreading as done in the referenced papers. #9239 by Andre Ambrosio Boechat, Utkarsh Upadhyay, and Joel Nothman.\\\"\\n- Under Changed models:\\n  \\\"semi_supervised.LabelSpreading (bug fix)\\\" and \\\"semi_supervised.LabelPropagation (bug fix)\\\"\\n\\n2) Source: https://scikit-learn.org/0.19/whats_new.html\\n- \\\"Version 0.19 \u2014 August 12, 2017\\\"\\n- Under Bug fixes \u2192 Other predictors:\\n  \\\"Fix semi_supervised.BaseLabelPropagation to correctly implement LabelPropagation and LabelSpreading as done in the referenced papers. #9239 by Andre Ambrosio Boechat, Utkarsh Upadhyay, and Joel Nothman.\\\"\\n\\nNotes:\\n- The news page lists: \\\"July 2017. scikit-learn 0.19.0 is available for download (Changelog)\\\", and Version 0.19 changelog includes the \\\"Other predictors\\\" section containing the BaseLabelPropagation bug fix.\\n\", \"file_path\": \"work_space/gaia_d0633230_20251221_193229_869271/search_results_summary_step1.md\", \"mode\": \"w\"})",
      "explanation": "",
      "observation": "Content successfully saved to work_space/gaia_d0633230_20251221_193229_869271/search_results_summary_step1.md"
    },
    {
      "turn_idx": 9,
      "action": "mark_step({\"step_index\": 1, \"step_status\": \"completed\", \"step_notes\": \"Retrieved and extracted the Scikit-Learn July 2017 changelog (Version 0.19) from official sources. Saved consolidated extraction as work_space/gaia_d0633230_20251221_193229_869271/search_results_summary_step1.md. Key relevant section identifies an 'Other predictors' bug fix entry referencing 'semi_supervised.BaseLabelPropagation'. Proceeding to locate and extract the exact command/name requested in Step 2.\"})",
      "explanation": "",
      "observation": "Step 1: step_status is completed, step_notes is Retrieved and extracted the Scikit-Learn July 2017 changelog (Version 0.19) from official sources. Saved consolidated extraction as work_space/gaia_d0633230_20251221_193229_869271/search_results_summary_step1.md. Key relevant section identifies an 'Other predictors' bug fix entry referencing 'semi_supervised.BaseLabelPropagation'. Proceeding to locate and extract the exact command/name requested in Step 2. "
    },
    {
      "turn_idx": 10,
      "action": "file_read({\"file\": \"work_space/gaia_d0633230_20251221_193229_869271/search_results_summary_step1.md\"})",
      "explanation": "Located and extracted the bug fix entry under Bug fixes \u2192 Other predictors from the scikit-learn v0.19 (August 12, 2017) changelog. The entry states: \"Fix semi_supervised.BaseLabelPropagation to correctly implement LabelPropagation and LabelSpreading as done in the referenced papers. #9239 ...\" The requested name (not a path) is \"BaseLabelPropagation\". Saved evidence file at work_space/gaia_d0633230_20251221_193229_869271/extract_step2_other_predictor_base_command.md.",
      "observation": "Title: Information from Scikit-Learn July 2017 changelog via scikit-learn.org\n\nSources:\n- https://scikit-learn.org/stable/whats_new/v0.19.html (Version 0.19 \u2014 scikit-learn 1.8.0 documentation)\n- https://scikit-learn.org/0.19/whats_new.html (Release history \u2014 scikit-learn 0.19.2 documentation)\n\nExtracted Content (direct quotes with exact source attribution):\n\n1) Source: https://scikit-learn.org/stable/whats_new/v0.19.html\n- \"Version 0.19 \u2014 August 12, 2017\"\n- Under Bug fixes \u2192 Other predictors:\n  \"Fix semi_supervised.BaseLabelPropagation to correctly implement LabelPropagation and LabelSpreading as done in the referenced papers. #9239 by Andre Ambrosio Boechat, Utkarsh Upadhyay, and Joel Nothman.\"\n- Under Changed models:\n  \"semi_supervised.LabelSpreading (bug fix)\" and \"semi_supervised.LabelPropagation (bug fix)\"\n\n2) Source: https://scikit-learn.org/0.19/whats_new.html\n- \"Version 0.19 \u2014 August 12, 2017\"\n- Under Bug fixes \u2192 Other predictors:\n  \"Fix semi_supervised.BaseLabelPropagation to correctly implement LabelPropagation and LabelSpreading as done in the referenced papers. #9239 by Andre Ambrosio Boechat, Utkarsh Upadhyay, and Joel Nothman.\"\n\nNotes:\n- The news page lists: \"July 2017. scikit-learn 0.19.0 is available for download (Changelog)\", and Version 0.19 changelog includes the \"Other predictors\" section containing the BaseLabelPropagation bug fix.\n"
    },
    {
      "turn_idx": 11,
      "action": "file_saver({\"content\": \"Title: Bug fix entry for Other predictors (scikit-learn v0.19 changelog)\\n\\nSources:\\n- https://scikit-learn.org/stable/whats_new/v0.19.html\\n- https://scikit-learn.org/0.19/whats_new.html\\n\\nDirect quotation (Bug fixes \u2192 Other predictors):\\n- \\\"Fix semi_supervised.BaseLabelPropagation to correctly implement LabelPropagation and LabelSpreading as done in the referenced papers. #9239 by Andre Ambrosio Boechat, Utkarsh Upadhyay, and Joel Nothman.\\\"\\n\\nIdentified command name (not a path):\\n- BaseLabelPropagation\\n\", \"file_path\": \"work_space/gaia_d0633230_20251221_193229_869271/extract_step2_other_predictor_base_command.md\", \"mode\": \"w\"})",
      "explanation": "",
      "observation": "Content successfully saved to work_space/gaia_d0633230_20251221_193229_869271/extract_step2_other_predictor_base_command.md"
    },
    {
      "turn_idx": 12,
      "action": "mark_step({\"step_index\": 2, \"step_status\": \"completed\", \"step_notes\": \"Located and extracted the bug fix entry under Bug fixes \u2192 Other predictors from the scikit-learn v0.19 (August 12, 2017) changelog. The entry states: \\\"Fix semi_supervised.BaseLabelPropagation to correctly implement LabelPropagation and LabelSpreading as done in the referenced papers. #9239 ...\\\" The requested name (not a path) is \\\"BaseLabelPropagation\\\". Saved evidence file at work_space/gaia_d0633230_20251221_193229_869271/extract_step2_other_predictor_base_command.md.\"})",
      "explanation": "",
      "observation": "Step 2: step_status is completed, step_notes is Located and extracted the bug fix entry under Bug fixes \u2192 Other predictors from the scikit-learn v0.19 (August 12, 2017) changelog. The entry states: \"Fix semi_supervised.BaseLabelPropagation to correctly implement LabelPropagation and LabelSpreading as done in the referenced papers. #9239 ...\" The requested name (not a path) is \"BaseLabelPropagation\". Saved evidence file at work_space/gaia_d0633230_20251221_193229_869271/extract_step2_other_predictor_base_command.md. "
    },
    {
      "turn_idx": 13,
      "action": "tavily_search({\"query\": \"scikit-learn bug fix BaseLabelPropagation changelog 2017 #9239 LabelSpreading LabelPropagation\"})",
      "explanation": "Cross-verified the identified command name using multiple authoritative sources. The scikit-learn v0.19 (August 12, 2017) changelog under Bug fixes \u2192 Other predictors explicitly states: \"Fix semi_supervised.BaseLabelPropagation to correctly implement LabelPropagation and LabelSpreading as done in the referenced papers. #9239 ...\". Secondary confirmation via the 0.19 _sources RST text, the 1.3 archived sources, and the GitHub PR #9239 corroborate this. Consolidated extraction saved at: work_space/gaia_d0633230_20251221_193229_869271/info_scikit-learn July 2017 changelog_tavily.md. Ready to finalize with the required name only.",
      "observation": "{'query': 'scikit-learn bug fix BaseLabelPropagation changelog 2017 #9239 LabelSpreading LabelPropagation', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://www.scribd.com/document/446399914/Scikit-Learn-Docs', 'title': 'Scikit Learn Docs | PDF | Thread (Computing)', 'content': '... (bug fix). 1.12.2 Changelog. sklearn.compose. \u2022 [F IX ] Fixed an issue in ... \u2022 semi_supervised.LabelSpreading (bug fix) \u2022 semi_supervised.LabelPropagation ...Read more', 'score': 0.6633424, 'raw_content': None}, {'url': 'https://github.com/scikit-learn/scikit-learn/issues/9292', 'title': '[BUG] Label propagation sometimes produces ...', 'content': 'Invalid value encountered in true_divide through when calling fit on LabelSpreading. After convergence, the label distribution for some samples ...Read more', 'score': 0.5818027, 'raw_content': None}, {'url': 'https://github.com/scikit-learn/scikit-learn/issues/9722', 'title': 'Bad results for LabelPropagation (and LabelSpreading) ...', 'content': 'The best way to confirm this bug would be to generate some data using the alternate kernel and then see whether LabelPropagation or ...Read more', 'score': 0.4765072, 'raw_content': None}, {'url': 'https://scikit-learn.org/stable/modules/generated/sklearn.semi_supervised.LabelSpreading.html', 'title': 'LabelSpreading', 'content': 'LabelSpreading model for semi-supervised learning. 
This model is similar to the basic Label Propagation algorithm, but uses affinity matrix based on the ...Read more', 'score': 0.38069043, 'raw_content': None}, {'url': 'https://stackoverflow.com/questions/20081149/label-propagation-in-sklearn-is-classifying-every-vector-as-1', 'title': 'Label Propagation in sklearn is classifying every vector as 1', 'content': '# Label Propagation in sklearn is classifying every vector as 1 I have 2000 labelled data (7 different labels) and about 100K unlabeled data and I am trying to use sklearn.semi\\\\_supervised.LabelPropagation. > /usr/local/lib/python2.7/site-packages/sklearn/semi\\\\_supervised/label\\\\_propagation.py:255: RuntimeWarning: invalid value encountered in divide self.label\\\\_distributions\\\\_ /= normalizer Commented May 20, 2018 at 16:43 ## 2 Answers 2 ## 3 Comments Stack Overflow has a question-and-answer format, not a discussion thread format. I agree with you, but I worked on this problem for some time and found no answer on the web. Thank you for your leniency, but maybe I can reduce my \"answer\" and then you could put it as a comment ? ## Comments Find the answer to your question by asking. Scikit-Learn: Label not x is present in all training examples', 'score': 0.37188962, 'raw_content': None}], 'response_time': 0.73, 'request_id': '2150fe3b-e56d-42ce-b3cd-53c3614cd591'}"
    },
    {
      "turn_idx": 14,
      "action": "tavily_search({\"query\": \"scikit-learn 0.19 changelog \\\"Other predictors\\\" bug fix BaseLabelPropagation\"})",
      "explanation": "",
      "observation": "{'query': 'scikit-learn 0.19 changelog \"Other predictors\" bug fix BaseLabelPropagation', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://www.scribd.com/document/773770354/scikit-learn-docs-1', 'title': 'Scikit Learn Docs 1 | PDF - Scribd', 'content': 'Rahman and Manoj Kumar. Other predictors \u2022 Fix semi_supervised.BaseLabelPropagation to correctly implement LabelPropagation and LabelSpreading as done in', 'score': 0.9998623, 'raw_content': None}, {'url': 'https://scikit-learn.org/0.19/_downloads/scikit-learn-docs.pdf', 'title': '[PDF] scikit-learn user guide', 'content': '... error by default. #6573 by Quazi Marufur \u00b7 Rahman and Manoj Kumar. Other predictors. \u2022 Fix semi_supervised.BaseLabelPropagation to correctly', 'score': 0.99985576, 'raw_content': None}, {'url': 'https://scikit-learn.org/0.19/_sources/whats_new.rst.txt', 'title': 'https://scikit-learn.org/0.19/_sources/whats_new.r...', 'content': 'This is a bug-fix release with some minor documentation improvements and enhancements to features released in 0.19 ... Other predictors - Custom metrics for the :', 'score': 0.99958915, 'raw_content': None}, {'url': 'https://scikit-learn.org/1.3/_sources/whats_new/v0.19.rst.txt', 'title': 'v0.19.rst.txt - Scikit-learn', 'content': 'By `Tom Dupre la Tour`_. Other predictors - `neighbors.LSHForest` has been deprecated and will be removed in 0.21 due to poor performance. :issue:`9078` by :', 'score': 0.99933845, 'raw_content': None}, {'url': 'https://scikit-learn.org/stable/whats_new/v0.19.html', 'title': 'Version 0.19 \u2014 scikit-learn 1.8.0 documentation', 'content': '* Fixed a bug where `model_selection.validation_curve` reused the same estimator for each parameter value. 
Joel Nothman, Loic Esteve, Andreas Mueller, Guillaume Lemaitre, Olivier Grisel, Hanmin Qin, Raghav RV, Alexandre Gramfort, themrmax, Aman Dalmia, Gael Varoquaux, Naoya Kanai, Tom Dupr\u00e9 la Tour, Rishikesh, Nelson Liu, Taehoon Lee, Nelle Varoquaux, Aashil, Mikhail Korobov, Sebastin Santy, Joan Massich, Roman Yurchak, RAKOTOARISON Herilalaina, Thierry Guillemot, Alexandre Abadie, Carol Willing, Balakumaran Manoharan, Josh Karnofsky, Vlad Niculae, Utkarsh Upadhyay, Dmitry Petrov, Minghui Liu, Srivatsan, Vincent Pham, Albert Thomas, Jake VanderPlas, Attractadore, JC Liu, alexandercbooth, chkoar, \u00d3scar N\u00e1jera, Aarshay Jain, Kyle Gilliam, Ramana Subramanyam, CJ Carey, Clement Joudet, David Robles, He Chen, Joris Van den Bossche, Karan Desai, Katie Luangkote, Leland McInnes, Maniteja Nandana, Michele Lacchia, Sergei Lebedev, Shubham Bhardwaj, akshay0724, omtcyfz, rickiepark, waterponey, Vathsala Achar, jbDelafosse, Ralf Gommers, Ekaterina Krivich, Vivek Kumar, Ishank Gulati, Dave Elliott, ldirer, Reiichiro Nakano, Levi John Wolf, Mathieu Blondel, Sid Kapur, Dougal J.', 'score': 0.9918121, 'raw_content': None}], 'response_time': 0.62, 'request_id': 'f6ad44fd-286e-48b0-9580-177791270000'}"
    },
    {
      "turn_idx": 15,
      "action": "tavily_search({\"query\": \"scikit-learn July 2017 changelog other predictors bug fix BaseLabelPropagation\"})",
      "explanation": "",
      "observation": "{'query': 'scikit-learn July 2017 changelog other predictors bug fix BaseLabelPropagation', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://www.scribd.com/document/490387773/scikit-learn-docs-pdf', 'title': 'Scikit Learn Docs PDF', 'content': 'Rahman and Manoj Kumar. Other predictors \u2022 Fix semi_supervised.BaseLabelPropagation to correctly implement LabelPropagation and LabelSpreading as done in ...Read more', 'score': 0.99983525, 'raw_content': None}, {'url': 'https://huggingface.co/spaces/bstraehle/gaia/blob/aa6cfa9b8d5e200aeff9c4d83425fe47c9ca7f24/data/gaia_validation.jsonl', 'title': 'data/gaia_validation.jsonl \u00b7 bstraehle/gaia at ...', 'content': '... Scikit-Learn July 2017 changelog, what other predictor base command received a bug fix? Just give the name, not a path.\", \"Level\": 1, \"Final ...Read more', 'score': 0.9995005, 'raw_content': None}, {'url': 'https://scikit-learn.sourceforge.net/dev/whats_new.html', 'title': 'Release history \u2014 scikit-learn 0.17.dev0 documentation', 'content': '> * Fixed feature importances as computed with a forest of randomized trees when fit with `sample_weight!= None` and/or with `bootstrap=True`. > * Fixed `class_weight` support in `svm.LinearSVC` and `linear_model.LogisticRegression` by Andreas M\u00fcller. > * Fixed API inconsistency: `linear_model.SGDClassifier.predict_proba` now returns 2d array when fit on two classes. > * Improved *Cross-validation: evaluating estimator performance* and *Grid Search: Searching for estimator parameters* documentation and introduced the new `cross_validation.train_test_split` helper function by Olivier Grisel. > * Minor refactoring in *Stochastic Gradient Descent* module; consolidated dense and sparse predict methods; Enhanced test time performance by converting model parameters to fortran-style arrays after fitting (only multi-class). 
scikit-learn 0.8 was released on May 2011, one month after the first \u201cinternational\u201d scikit-learn coding sprint and is marked by the inclusion of important modules: *Hierarchical clustering*, *Cross decomposition*, *Non-negative matrix factorization (NMF or NNMF)*, initial support for Python 3 and by important enhancements and bug fixes. > * Improved sparse matrix support, both in main classes (`grid_search.GridSearchCV`) as in modules sklearn.svm.sparse and sklearn.linear\\\\_model.sparse.', 'score': 0.99293363, 'raw_content': None}, {'url': 'https://scikit-learn.org/0.19/whats_new.html', 'title': 'Release history \u2014 scikit-learn 0.19.2 documentation', 'content': 'Joel Nothman, Loic Esteve, Andreas Mueller, Kumar Ashutosh, Vrishank Bhardwaj, Hanmin Qin, Rasul Kerimov, James Bourbeau, Nagarjuna Kumar, Nathaniel Saul, Olivier Grisel, Roman Yurchak, Reiichiro Nakano, Sachin Kelkar, Sam Steingold, Yaroslav Halchenko, diegodlh, felix, goncalo-rodrigues, jkleint, oliblum90, pasbi, Anthony Gitter, Ben Lawson, Charlie Brummitt, Didi Bar-Zev, Gael Varoquaux, Joan Massich, Joris Van den Bossche, nielsenmarkus11 scikit-learn 0.8 was released on May 2011, one month after the first \u201cinternational\u201d scikit-learn coding sprint and is marked by the inclusion of important modules: Hierarchical clustering, Cross decomposition, Non-negative matrix factorization (NMF or NNMF), initial support for Python 3 and by important enhancements and bug fixes. * Refactoring in the Support Vector Machines module and bug fixes by Fabian Pedregosa, Gael Varoquaux and Amit Aides. Also added references of matrices used for clustering and other fixes [Gael Varoquaux, Fabian Pedregosa, Mathieu Blondel, Olivier Grisel, Virgile Fritsch , Emmanuelle Gouillart]', 'score': 0.97090924, 'raw_content': None}, {'url': 'https://scikit-learn.org/stable/whats_new.html', 'title': 'Release History', 'content': '# Release History#. 
Changelogs and release notes for all scikit-learn releases are linked in this page. Subscribe to scikit-learn releases on libraries.io to be notified when new versions are released. + Version 0.22.2.post1. ### This Page.', 'score': 0.75635755, 'raw_content': None}], 'response_time': 0.7, 'request_id': '71db2378-0f7b-4e0e-9828-5a163b485010'}"
    },
    {
      "turn_idx": 16,
      "action": "tavily_search({\"query\": \"site:scikit-learn.org 0.19 changelog bug fixes Other predictors BaseLabelPropagation\"})",
      "explanation": "",
      "observation": "{'query': '0.19 changelog bug fixes Other predictors BaseLabelPropagation', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://scikit-learn.org/0.19/whats_new.html', 'title': 'Release history \u2014 scikit-learn 0.19.2 documentation', 'content': 'Joel Nothman, Loic Esteve, Andreas Mueller, Kumar Ashutosh, Vrishank Bhardwaj, Hanmin Qin, Rasul Kerimov, James Bourbeau, Nagarjuna Kumar, Nathaniel Saul, Olivier Grisel, Roman Yurchak, Reiichiro Nakano, Sachin Kelkar, Sam Steingold, Yaroslav Halchenko, diegodlh, felix, goncalo-rodrigues, jkleint, oliblum90, pasbi, Anthony Gitter, Ben Lawson, Charlie Brummitt, Didi Bar-Zev, Gael Varoquaux, Joan Massich, Joris Van den Bossche, nielsenmarkus11 scikit-learn 0.8 was released on May 2011, one month after the first \u201cinternational\u201d scikit-learn coding sprint and is marked by the inclusion of important modules: Hierarchical clustering, Cross decomposition, Non-negative matrix factorization (NMF or NNMF), initial support for Python 3 and by important enhancements and bug fixes. * Refactoring in the Support Vector Machines module and bug fixes by Fabian Pedregosa, Gael Varoquaux and Amit Aides. Also added references of matrices used for clustering and other fixes [Gael Varoquaux, Fabian Pedregosa, Mathieu Blondel, Olivier Grisel, Virgile Fritsch , Emmanuelle Gouillart]', 'score': 0.9975656, 'raw_content': None}, {'url': 'https://scikit-learn.org/1.3/_sources/whats_new/v0.19.rst.txt', 'title': 'v0.19.rst.txt', 'content': 'Regressions in 0.19.0 fixed in 0.19.1: - Fixed a bug where parallelised prediction in random forests was not thread-safe and could (rarely) result in ...Read more', 'score': 0.9930962, 'raw_content': None}, {'url': 'https://scikit-learn.org/stable/whats_new/v0.19.html', 'title': 'Version 0.19', 'content': '* Fixed a bug where `model_selection.validation_curve` reused the same estimator for each parameter value. 
Joel Nothman, Loic Esteve, Andreas Mueller, Guillaume Lemaitre, Olivier Grisel, Hanmin Qin, Raghav RV, Alexandre Gramfort, themrmax, Aman Dalmia, Gael Varoquaux, Naoya Kanai, Tom Dupr\u00e9 la Tour, Rishikesh, Nelson Liu, Taehoon Lee, Nelle Varoquaux, Aashil, Mikhail Korobov, Sebastin Santy, Joan Massich, Roman Yurchak, RAKOTOARISON Herilalaina, Thierry Guillemot, Alexandre Abadie, Carol Willing, Balakumaran Manoharan, Josh Karnofsky, Vlad Niculae, Utkarsh Upadhyay, Dmitry Petrov, Minghui Liu, Srivatsan, Vincent Pham, Albert Thomas, Jake VanderPlas, Attractadore, JC Liu, alexandercbooth, chkoar, \u00d3scar N\u00e1jera, Aarshay Jain, Kyle Gilliam, Ramana Subramanyam, CJ Carey, Clement Joudet, David Robles, He Chen, Joris Van den Bossche, Karan Desai, Katie Luangkote, Leland McInnes, Maniteja Nandana, Michele Lacchia, Sergei Lebedev, Shubham Bhardwaj, akshay0724, omtcyfz, rickiepark, waterponey, Vathsala Achar, jbDelafosse, Ralf Gommers, Ekaterina Krivich, Vivek Kumar, Ishank Gulati, Dave Elliott, ldirer, Reiichiro Nakano, Levi John Wolf, Mathieu Blondel, Sid Kapur, Dougal J.', 'score': 0.9891816, 'raw_content': None}, {'url': 'https://scikit-learn.org/stable/whats_new/older_versions.html', 'title': 'Older Versions \u2014 scikit-learn 1.7.2 documentation', 'content': '* Improved sparse matrix support in the Feature selection module by Andreas M\u00fcller. * Minor refactoring in Stochastic Gradient Descent module; consolidated dense and sparse predict methods; Enhanced test time performance by converting model parameters to fortran-style arrays after fitting (only multi-class). scikit-learn 0.9 was released on September 2011, three months after the 0.8 release and includes the new modules Manifold learning, The Dirichlet Process as well as several new algorithms and documentation improvements. * Improvements on the Feature selection module by Gilles Louppe : refactoring of the RFE classes, documentation rewrite, increased efficiency and minor API changes. 
scikit-learn 0.8 was released on May 2011, one month after the first \u201cinternational\u201d scikit-learn coding sprint and is marked by the inclusion of important modules: Hierarchical clustering, Cross decomposition, Non-negative matrix factorization (NMF or NNMF), initial support for Python 3 and by important enhancements and bug fixes. * Improved sparse matrix support, both in main classes (`GridSearchCV`) as in modules sklearn.svm.sparse and sklearn.linear\\\\_model.sparse.', 'score': 0.715424, 'raw_content': None}, {'url': 'https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html', 'title': 'Perceptron \u2014 scikit-learn 1.8.0 documentation', 'content': 'In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.Read more', 'score': 0.05749328, 'raw_content': None}], 'response_time': 0.65, 'request_id': 'cb585446-03ce-4bb3-835b-2e6f1d3ca8e0'}"
    },
    {
      "turn_idx": 17,
      "action": "fetch_website_content({\"website_url\": \"https://scikit-learn.org/stable/whats_new/v0.19.html\"})",
      "explanation": "",
      "observation": "\nVersion 0.19 \u2014 scikit-learn 1.8.0 documentation\nSkip to main content\nBack to top\nCtrl + K\nInstall\nUser Guide\nAPI\nExamples\nCommunity\nMore\nGetting Started\nRelease History\nGlossary\nDevelopment\nFAQ\nSupport\nRelated Projects\nRoadmap\nGovernance\nAbout us\nGitHub\nChoose version\nInstall\nUser Guide\nAPI\nExamples\nCommunity\nGetting Started\nRelease History\nGlossary\nDevelopment\nFAQ\nSupport\nRelated Projects\nRoadmap\nGovernance\nAbout us\nGitHub\nChoose version\nSection Navigation\nVersion 1.8\nVersion 1.7\nVersion 1.6\nVersion 1.5\nVersion 1.4\nVersion 1.3\nVersion 1.2\nVersion 1.1\nVersion 1.0\nVersion 0.24\nVersion 0.23\nVersion 0.22\nVersion 0.21\nVersion 0.20\nVersion 0.19\nVersion 0.18\nVersion 0.17\nVersion 0.16\nVersion 0.15\nVersion 0.14\nVersion 0.13\nOlder Versions\nRelease History\nVersion 0.19\nVersion 0.19 #\nVersion 0.19.2 #\nJuly, 2018\nThis release is exclusively in order to support Python 3.7.\nRelated changes #\nn_iter_ may vary from previous releases in\n linear_model.LogisticRegression with solver='lbfgs' and\n linear_model.HuberRegressor . For Scipy <= 1.0.0, the optimizer could\nperform more than the requested maximum number of iterations. Now both\nestimators will report at most max_iter iterations even if more were\nperformed. #10723 by Joel Nothman .\nVersion 0.19.1 #\nOctober 23, 2017\nThis is a bug-fix release with some minor documentation improvements and\nenhancements to features released in 0.19.0.\nNote there may be minor differences in TSNE output in this release (due to\n #9623 ), in the case where multiple samples have equal distance to some\nsample.\nChangelog #\nAPI changes #\nReverted the addition of metrics.ndcg_score and metrics.dcg_score \nwhich had been merged into version 0.19.0 by error. 
The implementations\nwere broken and undocumented.\nreturn_train_score which was added to\n model_selection.GridSearchCV ,\n model_selection.RandomizedSearchCV and\n model_selection.cross_validate in version 0.19.0 will be changing its\ndefault value from True to False in version 0.21. We found that calculating\ntraining score could have a great effect on cross validation runtime in some\ncases. Users should explicitly set return_train_score to False if\nprediction or scoring functions are slow, resulting in a deleterious effect\non CV runtime, or to True if they wish to use the calculated scores.\n #9677 by Kumar Ashutosh and Joel\nNothman .\ncorrelation_models and regression_models from the legacy gaussian\nprocesses implementation have been belatedly deprecated. #9717 by\n Kumar Ashutosh .\nBug fixes #\nAvoid integer overflows in metrics.matthews_corrcoef .\n #9693 by Sam Steingold .\nFixed a bug in the objective function for manifold.TSNE (both exact\nand with the Barnes-Hut approximation) when n_components >= 3 .\n #9711 by @goncalo-rodrigues .\nFix regression in model_selection.cross_val_predict where it\nraised an error with method='predict_proba' for some probabilistic\nclassifiers. #9641 by James Bourbeau .\nFixed a bug where datasets.make_classification modified its input\n weights . #9865 by Sachin Kelkar .\nmodel_selection.StratifiedShuffleSplit now works with multioutput\nmulticlass or multilabel data with more than 1000 columns. #9922 by\n Charlie Brummitt .\nFixed a bug with nested and conditional parameter setting, e.g. setting a\npipeline step and its parameter at the same time. #9945 by Andreas\nM\u00fcller and Joel Nothman .\nRegressions in 0.19.0 fixed in 0.19.1:\nFixed a bug where parallelised prediction in random forests was not\nthread-safe and could (rarely) result in arbitrary errors. #9830 by\n Joel Nothman .\nFix regression in model_selection.cross_val_predict where it no\nlonger accepted X as a list. 
#9600 by Rasul Kerimov .\nFixed handling of model_selection.cross_val_predict for binary\nclassification with method='decision_function' . #9593 by\n Reiichiro Nakano and core devs.\nFix regression in pipeline.Pipeline where it no longer accepted\n steps as a tuple. #9604 by Joris Van den Bossche .\nFix bug where n_iter was not properly deprecated, leaving n_iter \nunavailable for interim use in\n linear_model.SGDClassifier , linear_model.SGDRegressor ,\n linear_model.PassiveAggressiveClassifier ,\n linear_model.PassiveAggressiveRegressor and\n linear_model.Perceptron . #9558 by Andreas M\u00fcller .\nDataset fetchers make sure temporary files are closed before removing them,\nwhich caused errors on Windows. #9847 by Joan Massich .\nFixed a regression in manifold.TSNE where it no longer supported\nmetrics other than \u2018euclidean\u2019 and \u2018precomputed\u2019. #9623 by Oli\nBlum .\nEnhancements #\nOur test suite and utils.estimator_checks.check_estimator can now be\nrun without Nose installed. #9697 by Joan Massich .\nTo improve usability of version 0.19\u2019s pipeline.Pipeline \ncaching, memory now allows joblib.Memory instances.\nThis make use of the new utils.validation.check_memory helper.\n #9584 by Kumar Ashutosh\nSome fixes to examples: #9750 , #9788 , #9815\nMade a FutureWarning in SGD-based estimators less verbose. 
#9802 by\n Vrishank Bhardwaj .\nCode and Documentation Contributors #\nWith thanks to:\nJoel Nothman, Loic Esteve, Andreas Mueller, Kumar Ashutosh,\nVrishank Bhardwaj, Hanmin Qin, Rasul Kerimov, James Bourbeau,\nNagarjuna Kumar, Nathaniel Saul, Olivier Grisel, Roman\nYurchak, Reiichiro Nakano, Sachin Kelkar, Sam Steingold,\nYaroslav Halchenko, diegodlh, felix, goncalo-rodrigues,\njkleint, oliblum90, pasbi, Anthony Gitter, Ben Lawson, Charlie\nBrummitt, Didi Bar-Zev, Gael Varoquaux, Joan Massich, Joris\nVan den Bossche, nielsenmarkus11\nVersion 0.19 #\nAugust 12, 2017\nHighlights #\nWe are excited to release a number of great new features including\n neighbors.LocalOutlierFactor for anomaly detection,\n preprocessing.QuantileTransformer for robust feature transformation,\nand the multioutput.ClassifierChain meta-estimator to simply account\nfor dependencies between classes in multilabel problems. We have some new\nalgorithms in existing estimators, such as multiplicative update in\n decomposition.NMF and multinomial\n linear_model.LogisticRegression with L1 loss (use solver='saga' ).\nCross validation is now able to return the results from multiple metric\nevaluations. The new model_selection.cross_validate can return many\nscores on the test data as well as training set performance and timings, and we\nhave extended the scoring and refit parameters for grid/randomized\nsearch to handle multiple metrics .\nYou can also learn faster. For instance, the new option to cache\ntransformations in pipeline.Pipeline makes grid\nsearch over pipelines including slow transformations much more efficient. And\nyou can predict faster: if you\u2019re sure you know what you\u2019re doing, you can turn\noff validating that the input is finite using config_context .\nWe\u2019ve made some important fixes too. We\u2019ve fixed a longstanding implementation\nerror in metrics.average_precision_score , so please be cautious with\nprior results reported from that function. 
A number of errors in the\n manifold.TSNE implementation have been fixed, particularly in the\ndefault Barnes-Hut approximation. semi_supervised.LabelSpreading and\n semi_supervised.LabelPropagation have had substantial fixes.\nLabelPropagation was previously broken. LabelSpreading should now correctly\nrespect its alpha parameter.\nChanged models #\nThe following estimators and functions, when fit with the same data and\nparameters, may produce different models from the previous version. This often\noccurs due to changes in the modelling logic (bug fixes or enhancements), or in\nrandom sampling procedures.\ncluster.KMeans with sparse X and initial centroids given (bug fix)\ncross_decomposition.PLSRegression \nwith scale=True (bug fix)\nensemble.GradientBoostingClassifier and\n ensemble.GradientBoostingRegressor where min_impurity_split is used (bug fix)\ngradient boosting loss='quantile' (bug fix)\nensemble.IsolationForest (bug fix)\nfeature_selection.SelectFdr (bug fix)\nlinear_model.RANSACRegressor (bug fix)\nlinear_model.LassoLars (bug fix)\nlinear_model.LassoLarsIC (bug fix)\nmanifold.TSNE (bug fix)\nneighbors.NearestCentroid (bug fix)\nsemi_supervised.LabelSpreading (bug fix)\nsemi_supervised.LabelPropagation (bug fix)\ntree based models where min_weight_fraction_leaf is used (enhancement)\nmodel_selection.StratifiedKFold with shuffle=True \n(this change, due to #7823 was not mentioned in the release notes at\nthe time)\nDetails are listed in the changelog below.\n(While we are trying to better inform users by providing this information, we\ncannot assure that this list is complete.)\nChangelog #\nNew features #\nClassifiers and regressors\nAdded multioutput.ClassifierChain for multi-label\nclassification. By Adam Kleczewski .\nAdded solver 'saga' that implements the improved version of Stochastic\nAverage Gradient, in linear_model.LogisticRegression and\n linear_model.Ridge . 
It allows the use of L1 penalty with\nmultinomial logistic loss, and behaves marginally better than \u2018sag\u2019\nduring the first epochs of ridge and logistic regression.\n #8446 by Arthur Mensch .\nOther estimators\nAdded the neighbors.LocalOutlierFactor class for anomaly\ndetection based on nearest neighbors.\n #5279 by Nicolas Goix and Alexandre Gramfort .\nAdded preprocessing.QuantileTransformer class and\n preprocessing.quantile_transform function for features\nnormalization based on quantiles.\n #8363 by Denis Engemann ,\n Guillaume Lemaitre , Olivier Grisel , Raghav RV ,\n Thierry Guillemot , and Gael Varoquaux .\nThe new solver 'mu' implements a Multiplicate Update in\n decomposition.NMF , allowing the optimization of all\nbeta-divergences, including the Frobenius norm, the generalized\nKullback-Leibler divergence and the Itakura-Saito divergence.\n #5295 by Tom Dupre la Tour .\nModel selection and evaluation\nmodel_selection.GridSearchCV and\n model_selection.RandomizedSearchCV now support simultaneous\nevaluation of multiple metrics. Refer to the\n Specifying multiple metrics for evaluation section of the user guide for more\ninformation. #7388 by Raghav RV\nAdded the model_selection.cross_validate which allows evaluation\nof multiple metrics. This function returns a dict with more useful\ninformation from cross-validation such as the train scores, fit times and\nscore times.\nRefer to The cross_validate function and multiple metric evaluation section of the userguide\nfor more information. 
#7388 by Raghav RV\nAdded metrics.mean_squared_log_error , which computes\nthe mean square error of the logarithmic transformation of targets,\nparticularly useful for targets with an exponential trend.\n #7655 by Karan Desai .\nAdded metrics.dcg_score and metrics.ndcg_score , which\ncompute Discounted cumulative gain (DCG) and Normalized discounted\ncumulative gain (NDCG).\n #7739 by David Gasquez .\nAdded the model_selection.RepeatedKFold and\n model_selection.RepeatedStratifiedKFold .\n #8120 by Neeraj Gangwar .\nMiscellaneous\nValidation that input data contains no NaN or inf can now be suppressed\nusing config_context , at your own risk. This will save on runtime,\nand may be particularly useful for prediction time. #7548 by\n Joel Nothman .\nAdded a test to ensure parameter listing in docstrings matches the\nfunction/class signature. #9206 by Alexandre Gramfort and\n Raghav RV .\nEnhancements #\nTrees and ensembles\nThe min_weight_fraction_leaf constraint in tree construction is now\nmore efficient, taking a fast path to declare a node a leaf if its weight\nis less than 2 * the minimum. Note that the constructed tree will be\ndifferent from previous versions where min_weight_fraction_leaf is\nused. #7441 by Nelson Liu .\nensemble.GradientBoostingClassifier and ensemble.GradientBoostingRegressor \nnow support sparse input for prediction.\n #6101 by Ibraim Ganiev .\nensemble.VotingClassifier now allows changing estimators by using\n ensemble.VotingClassifier.set_params . An estimator can also be\nremoved by setting it to None .\n #7674 by Yichuan Liu .\ntree.export_graphviz now shows configurable number of decimal\nplaces. 
#8698 by Guillaume Lemaitre .\nAdded flatten_transform parameter to ensemble.VotingClassifier \nto change output shape of transform method to 2 dimensional.\n #7794 by Ibraim Ganiev and\n Herilalaina Rakotoarison .\nLinear, kernelized and related models\nlinear_model.SGDClassifier , linear_model.SGDRegressor ,\n linear_model.PassiveAggressiveClassifier ,\n linear_model.PassiveAggressiveRegressor and\n linear_model.Perceptron now expose max_iter and\n tol parameters, to handle convergence more precisely.\n n_iter parameter is deprecated, and the fitted estimator exposes\na n_iter_ attribute, with actual number of iterations before\nconvergence. #5036 by Tom Dupre la Tour .\nAdded average parameter to perform weight averaging in\n linear_model.PassiveAggressiveClassifier . #4939 \nby Andrea Esuli .\nlinear_model.RANSACRegressor no longer throws an error\nwhen calling fit if no inliers are found in its first iteration.\nFurthermore, causes of skipped iterations are tracked in newly added\nattributes, n_skips_* .\n #7914 by Michael Horrell .\nIn gaussian_process.GaussianProcessRegressor , method predict \nis a lot faster with return_std=True . #8591 by\n Hadrien Bertrand .\nAdded return_std to predict method of\n linear_model.ARDRegression and\n linear_model.BayesianRidge .\n #7838 by Sergey Feldman .\nMemory usage enhancements: Prevent cast from float32 to float64 in:\n linear_model.MultiTaskElasticNet ;\n linear_model.LogisticRegression when using newton-cg solver; and\n linear_model.Ridge when using svd, sparse_cg, cholesky or lsqr\nsolvers. #8835 , #8061 by Joan Massich and Nicolas\nCordier and Thierry Guillemot .\nOther predictors\nCustom metrics for the sklearn.neighbors binary trees now have\nfewer constraints: they must take two 1d-arrays and return a float.\n #6288 by Jake Vanderplas .\nalgorithm='auto in sklearn.neighbors estimators now chooses the most\nappropriate algorithm for all input types and metrics. 
#9145 by\n Herilalaina Rakotoarison and Reddy Chinthala .\nDecomposition, manifold learning and clustering\ncluster.MiniBatchKMeans and cluster.KMeans \nnow use significantly less memory when assigning data points to their\nnearest cluster center. #7721 by Jon Crall .\ndecomposition.PCA , decomposition.IncrementalPCA and\n decomposition.TruncatedSVD now expose the singular values\nfrom the underlying SVD. They are stored in the attribute\n singular_values_ , like in decomposition.IncrementalPCA .\n #7685 by Tommy L\u00f6fstedt\ndecomposition.NMF now faster when beta_loss=0 .\n #9277 by @hongkahjun .\nMemory improvements for method barnes_hut in manifold.TSNE\n#7089 by Thomas Moreau and Olivier Grisel .\nOptimization schedule improvements for Barnes-Hut manifold.TSNE \nso the results are closer to the one from the reference implementation\n lvdmaaten/bhtsne by Thomas\nMoreau and Olivier Grisel .\nMemory usage enhancements: Prevent cast from float32 to float64 in\n decomposition.PCA and\n decomposition.randomized_svd_low_rank .\n #9067 by Raghav RV .\nPreprocessing and feature selection\nAdded norm_order parameter to feature_selection.SelectFromModel \nto enable selection of the norm order when coef_ is more than 1D.\n #6181 by Antoine Wendlinger .\nAdded ability to use sparse matrices in feature_selection.f_regression \nwith center=True . #8065 by Daniel LeJeune .\nSmall performance improvement to n-gram creation in\n sklearn.feature_extraction.text by binding methods for loops and\nspecial-casing unigrams. #7567 by Jaye Doepke\nRelax assumption on the data for the\n kernel_approximation.SkewedChi2Sampler . Since the Skewed-Chi2\nkernel is defined on the open interval \\((-skewedness; +\\infty)^d\\) ,\nthe transform function should not check whether X < 0 but whether X <\n-self.skewedness . 
#7573 by Romain Brault .\nMade default kernel parameters kernel-dependent in\n kernel_approximation.Nystroem .\n #5229 by Saurabh Bansod and Andreas M\u00fcller .\nModel evaluation and meta-estimators\npipeline.Pipeline is now able to cache transformers\nwithin a pipeline by using the memory constructor parameter.\n #7990 by Guillaume Lemaitre .\npipeline.Pipeline steps can now be accessed as attributes of its\n named_steps attribute. #8586 by Herilalaina\nRakotoarison .\nAdded sample_weight parameter to pipeline.Pipeline.score .\n #7723 by Mikhail Korobov .\nAdded ability to set n_jobs parameter to pipeline.make_union .\nA TypeError will be raised for any other kwargs. #8028 \nby Alexander Booth .\nmodel_selection.GridSearchCV ,\n model_selection.RandomizedSearchCV and\n model_selection.cross_val_score now allow estimators with callable\nkernels which were previously prohibited.\n #8005 by Andreas M\u00fcller .\nmodel_selection.cross_val_predict now returns output of the\ncorrect shape for all values of the argument method .\n #7863 by Aman Dalmia .\nAdded shuffle and random_state parameters to shuffle training\ndata before taking prefixes of it based on training sizes in\n model_selection.learning_curve .\n #7506 by Narine Kokhlikyan .\nmodel_selection.StratifiedShuffleSplit now works with multioutput\nmulticlass (or multilabel) data. #9044 by Vlad Niculae .\nSpeed improvements to model_selection.StratifiedShuffleSplit .\n #5991 by Arthur Mensch and Joel Nothman .\nAdd shuffle parameter to model_selection.train_test_split .\n #8845 by themrmax\nmultioutput.MultiOutputRegressor and multioutput.MultiOutputClassifier \nnow support online learning using partial_fit .\n:issue: 8053 by Peng Yu .\nAdd max_train_size parameter to model_selection.TimeSeriesSplit\n#8282 by Aman Dalmia .\nMore clustering metrics are now available through metrics.get_scorer \nand scoring parameters. 
#8117 by Raghav RV .\nA scorer based on metrics.explained_variance_score is also available.\n #9259 by Hanmin Qin .\nMetrics\nmetrics.matthews_corrcoef now supports multiclass classification.\n #8094 by Jon Crall .\nAdd sample_weight parameter to metrics.cohen_kappa_score .\n #8335 by Victor Poughon .\nMiscellaneous\nutils.estimator_checks.check_estimator now attempts to ensure that methods\ntransform, predict, etc. do not set attributes on the estimator.\n #7533 by Ekaterina Krivich .\nAdded type checking to the accept_sparse parameter in\n sklearn.utils.validation methods. This parameter now accepts only boolean,\nstring, or list/tuple of strings. accept_sparse=None is deprecated and\nshould be replaced by accept_sparse=False .\n #7880 by Josh Karnofsky .\nMake it possible to load a chunk of an svmlight formatted file by\npassing a range of bytes to datasets.load_svmlight_file .\n #935 by Olivier Grisel .\ndummy.DummyClassifier and dummy.DummyRegressor \nnow accept non-finite features. #8931 by @Attractadore .\nBug fixes #\nTrees and ensembles\nFixed a memory leak in trees when using trees with criterion='mae' .\n #8002 by Raghav RV .\nFixed a bug where ensemble.IsolationForest uses an\nincorrect formula for the average path length\n #8549 by Peter Wang .\nFixed a bug where ensemble.AdaBoostClassifier throws\n ZeroDivisionError while fitting data with single class labels.\n #7501 by Dominik Krzeminski .\nFixed a bug in ensemble.GradientBoostingClassifier and\n ensemble.GradientBoostingRegressor where a float being compared\nto 0.0 using == caused a divide by zero error. 
#7970 by\n He Chen .\nFix a bug where ensemble.GradientBoostingClassifier and\n ensemble.GradientBoostingRegressor ignored the\n min_impurity_split parameter.\n #8006 by Sebastian P\u00f6lsterl .\nFixed oob_score in ensemble.BaggingClassifier .\n #8936 by Michael Lewis\nFixed excessive memory usage in prediction for random forests estimators.\n #8672 by Mike Benfield .\nFixed a bug where sample_weight as a list broke random forests in Python 2\n #8068 by @xor .\nFixed a bug where ensemble.IsolationForest fails when\n max_features is less than 1.\n #5732 by Ishank Gulati .\nFix a bug where gradient boosting with loss='quantile' computed\nnegative errors for negative values of ytrue - ypred leading to wrong\nvalues when calling __call__ .\n #8087 by Alexis Mignon\nFix a bug where ensemble.VotingClassifier raises an error\nwhen a numpy array is passed in for weights. #7983 by\n Vincent Pham .\nFixed a bug where tree.export_graphviz raised an error\nwhen the length of features_names does not match n_features in the decision\ntree. #8512 by Li Li .\nLinear, kernelized and related models\nFixed a bug where linear_model.RANSACRegressor.fit may run until\n max_iter if it finds a large inlier group early. #8251 by\n @aivision2020 .\nFixed a bug where naive_bayes.MultinomialNB and\n naive_bayes.BernoulliNB failed when alpha=0 . #5814 by\n Yichuan Liu and Herilalaina Rakotoarison .\nFixed a bug where linear_model.LassoLars does not give\nthe same result as the LassoLars implementation available\nin R (lars library). #7849 by Jair Montoya Martinez .\nFixed a bug in linear_model.RandomizedLasso ,\n linear_model.Lars , linear_model.LassoLars ,\n linear_model.LarsCV and linear_model.LassoLarsCV ,\nwhere the parameter precompute was not used consistently across\nclasses, and some values proposed in the docstring could raise errors.\n #5359 by Tom Dupre la Tour .\nFix inconsistent results between linear_model.RidgeCV and\n linear_model.Ridge when using normalize=True . 
#9302 \nby Alexandre Gramfort .\nFix a bug where linear_model.LassoLars.fit sometimes\nleft coef_ as a list, rather than an ndarray.\n #8160 by CJ Carey .\nFix linear_model.BayesianRidge.fit to return\nridge parameter alpha_ and lambda_ consistent with calculated\ncoefficients coef_ and intercept_ .\n #8224 by Peter Gedeck .\nFixed a bug in svm.OneClassSVM where it returned floats instead of\ninteger classes. #8676 by Vathsala Achar .\nFix AIC/BIC criterion computation in linear_model.LassoLarsIC .\n #9022 by Alexandre Gramfort and Mehmet Basbug .\nFixed a memory leak in our LibLinear implementation. #9024 by\n Sergei Lebedev\nFix bug where stratified CV splitters did not work with\n linear_model.LassoCV . #8973 by\n Paulo Haddad .\nFixed a bug in gaussian_process.GaussianProcessRegressor \nwhen the standard deviation and covariance predicted without fit\nwould fail with a meaningless error by default.\n #6573 by Quazi Marufur Rahman and\n Manoj Kumar .\nOther predictors\nFix semi_supervised.BaseLabelPropagation to correctly implement\n LabelPropagation and LabelSpreading as done in the referenced\npapers. #9239 \nby Andre Ambrosio Boechat , Utkarsh Upadhyay , and Joel Nothman .\nDecomposition, manifold learning and clustering\nFixed the implementation of manifold.TSNE :\nearly_exaggeration parameter had no effect and is now used for the\nfirst 250 optimization iterations.\nFixed the AssertionError: Tree consistency failed exception\nreported in #8992 .\nImprove the learning schedule to match the one from the reference\nimplementation lvdmaaten/bhtsne .\nby Thomas Moreau and Olivier Grisel .\nFix a bug in decomposition.LatentDirichletAllocation \nwhere the perplexity method was returning incorrect results because\nthe transform method returns normalized document topic distributions\nas of version 0.18. 
#7954 by Gary Foreman .\nFix output shape and bugs with n_jobs > 1 in\n decomposition.SparseCoder transform and\n decomposition.sparse_encode \nfor one-dimensional data and one component.\nThis also impacts the output shape of decomposition.DictionaryLearning .\n #8086 by Andreas M\u00fcller .\nFixed the implementation of explained_variance_ \nin decomposition.PCA ,\n decomposition.RandomizedPCA and\n decomposition.IncrementalPCA .\n #9105 by Hanmin Qin .\nFixed the implementation of noise_variance_ in decomposition.PCA .\n #9108 by Hanmin Qin .\nFixed a bug where cluster.DBSCAN gives incorrect\nresult when input is a precomputed sparse matrix with initial\nrows all zero. #8306 by Akshay Gupta\nFix a bug regarding fitting cluster.KMeans with a sparse\narray X and initial centroids, where X\u2019s means were unnecessarily being\nsubtracted from the centroids. #7872 by Josh Karnofsky .\nFixes to the input validation in covariance.EllipticEnvelope .\n #8086 by Andreas M\u00fcller .\nFixed a bug in covariance.MinCovDet where inputting data\nthat produced a singular covariance matrix would cause the helper method\n _c_step to throw an exception.\n #3367 by Jeremy Steward\nFixed a bug in manifold.TSNE affecting convergence of the\ngradient descent. #8768 by David DeTomaso .\nFixed a bug in manifold.TSNE where it stored the incorrect\n kl_divergence_ . #6507 by Sebastian Saeger .\nFixed improper scaling in cross_decomposition.PLSRegression \nwith scale=True . #7819 by jayzed82 .\ncluster.SpectralCoclustering and\n cluster.SpectralBiclustering fit method conforms\nwith API by accepting y and returning the object. #6126 ,\n #7814 by Laurent Direr and Maniteja\nNandana .\nFix bug where sklearn.mixture sample methods did not return as many\nsamples as requested. 
#7702 by Levi John Wolf .\nFixed the shrinkage implementation in neighbors.NearestCentroid .\n #9219 by Hanmin Qin .\nPreprocessing and feature selection\nFor sparse matrices, preprocessing.normalize with return_norm=True \nwill now raise a NotImplementedError with \u2018l1\u2019 or \u2018l2\u2019 norm and with\nnorm \u2018max\u2019 the norms returned will be the same as for dense matrices.\n #7771 by Ang Lu .\nFix a bug where feature_selection.SelectFdr did not\nexactly implement Benjamini-Hochberg procedure. It formerly may have\nselected fewer features than it should.\n #7490 by Peng Meng .\nFixed a bug where linear_model.RandomizedLasso and\n linear_model.RandomizedLogisticRegression break for\nsparse input. #8259 by Aman Dalmia .\nFix a bug where feature_extraction.FeatureHasher \nmandatorily applied a sparse random projection to the hashed features,\npreventing the use of\n feature_extraction.text.HashingVectorizer in a\npipeline with feature_extraction.text.TfidfTransformer .\n #7565 by Roman Yurchak .\nFix a bug where feature_selection.mutual_info_regression did not\ncorrectly use n_neighbors . #8181 by Guillaume Lemaitre .\nModel evaluation and meta-estimators\nFixed a bug where model_selection.BaseSearchCV.inverse_transform \nreturns self.best_estimator_.transform() instead of\n self.best_estimator_.inverse_transform() .\n #8344 by Akshay Gupta and Rasmus Eriksson .\nAdded classes_ attribute to model_selection.GridSearchCV ,\n model_selection.RandomizedSearchCV , grid_search.GridSearchCV ,\nand grid_search.RandomizedSearchCV that matches the classes_ \nattribute of best_estimator_ . #7661 and #8295 \nby Alyssa Batula , Dylan Werner-Meier ,\nand Stephen Hoover .\nFixed a bug where model_selection.validation_curve \nreused the same estimator for each parameter value.\n #7365 by Aleksandr Sandrovskii .\nmodel_selection.permutation_test_score now works with Pandas\ntypes. 
#5697 by Stijn Tonk .\nSeveral fixes to input validation in\n multiclass.OutputCodeClassifier\n#8086 by Andreas M\u00fcller .\nmulticlass.OneVsOneClassifier \u2019s partial_fit now ensures all\nclasses are provided up-front. #6250 by\n Asish Panda .\nFix multioutput.MultiOutputClassifier.predict_proba to return a\nlist of 2d arrays, rather than a 3d array. In the case where different\ntarget columns had different numbers of classes, a ValueError would be\nraised on trying to stack matrices with different dimensions.\n #8093 by Peter Bull .\nCross validation now works with Pandas datatypes that have a\nread-only index. #9507 by Loic Esteve .\nMetrics\nmetrics.average_precision_score no longer linearly\ninterpolates between operating points, and instead weighs precisions\nby the change in recall since the last operating point, as per the\n Wikipedia entry .\n( #7356 ). By\n Nick Dingwall and Gael Varoquaux .\nFix a bug in metrics.classification._check_targets \nwhich would return 'binary' if y_true and y_pred were\nboth 'binary' but the union of y_true and y_pred was\n 'multiclass' . #8377 by Loic Esteve .\nFixed an integer overflow bug in metrics.confusion_matrix and\nhence metrics.cohen_kappa_score . #8354 , #7929 \nby Joel Nothman and Jon Crall .\nFixed passing of gamma parameter to the chi2 kernel in\n metrics.pairwise.pairwise_kernels #5211 by\n Nick Rhinehart ,\n Saurabh Bansod and Andreas M\u00fcller .\nMiscellaneous\nFixed a bug when datasets.make_classification fails\nwhen generating more than 30 features. #8159 by\n Herilalaina Rakotoarison .\nFixed a bug where datasets.make_moons gives an\nincorrect result when n_samples is odd.\n #8198 by Josh Levy .\nSome fetch_ functions in sklearn.datasets were ignoring the\n download_if_missing keyword. #7944 by Ralf Gommers .\nFix estimators to accept a sample_weight parameter of type\n pandas.Series in their fit function. 
#7825 by\n Kathleen Chen .\nFix a bug in cases where numpy.cumsum may be numerically unstable,\nraising an exception if instability is identified. #7376 and\n #7331 by Joel Nothman and @yangarbiter .\nFix a bug where base.BaseEstimator.__getstate__ \nobstructed pickling customizations of child-classes, when used in a\nmultiple inheritance context.\n #8316 by Holger Peters .\nUpdate Sphinx-Gallery from 0.1.4 to 0.1.7 for resolving links in\ndocumentation build with Sphinx>1.5 #8010 , #7986 by\n Oscar Najera\nAdd data_home parameter to sklearn.datasets.fetch_kddcup99 .\n #9289 by Loic Esteve .\nFix dataset loaders using Python 3 version of makedirs to also work in\nPython 2. #9284 by Sebastin Santy .\nSeveral minor issues were fixed with thanks to the alerts of\n lgtm.com . #9278 by Jean Helie ,\namong others.\nAPI changes summary #\nTrees and ensembles\nGradient boosting base models are no longer estimators. By Andreas M\u00fcller .\nAll tree-based estimators now accept a min_impurity_decrease \nparameter in lieu of the min_impurity_split , which is now deprecated.\nThe min_impurity_decrease helps stop splitting the nodes in which\nthe weighted impurity decrease from splitting is no longer at least\n min_impurity_decrease . #8449 by Raghav RV .\nLinear, kernelized and related models\nn_iter parameter is deprecated in linear_model.SGDClassifier ,\n linear_model.SGDRegressor ,\n linear_model.PassiveAggressiveClassifier ,\n linear_model.PassiveAggressiveRegressor and\n linear_model.Perceptron . By Tom Dupre la Tour .\nOther predictors\nneighbors.LSHForest has been deprecated and will be\nremoved in 0.21 due to poor performance.\n #9078 by Laurent Direr .\nneighbors.NearestCentroid no longer purports to support\n metric='precomputed' which now raises an error. #8515 by\n Sergul Aydore .\nThe alpha parameter of semi_supervised.LabelPropagation now\nhas no effect and is deprecated to be removed in 0.21. 
#9239 \nby Andre Ambrosio Boechat , Utkarsh Upadhyay , and Joel Nothman .\nDecomposition, manifold learning and clustering\nDeprecate the doc_topic_distr argument of the perplexity method\nin decomposition.LatentDirichletAllocation because the\nuser no longer has access to the unnormalized document topic distribution\nneeded for the perplexity calculation. #7954 by\n Gary Foreman .\nThe n_topics parameter of decomposition.LatentDirichletAllocation \nhas been renamed to n_components and will be removed in version 0.21.\n #8922 by @Attractadore .\ndecomposition.SparsePCA.transform \u2019s ridge_alpha parameter is\ndeprecated in preference for class parameter.\n #8137 by Naoya Kanai .\ncluster.DBSCAN now has a metric_params parameter.\n #8139 by Naoya Kanai .\nPreprocessing and feature selection\nfeature_selection.SelectFromModel now has a partial_fit \nmethod only if the underlying estimator does. By Andreas M\u00fcller .\nfeature_selection.SelectFromModel now validates the threshold \nparameter and sets the threshold_ attribute during the call to\n fit , and no longer during the call to transform . By Andreas\nM\u00fcller .\nThe non_negative parameter in feature_extraction.FeatureHasher \nhas been deprecated, and replaced with a more principled alternative,\n alternate_sign .\n #7565 by Roman Yurchak .\nlinear_model.RandomizedLogisticRegression ,\nand linear_model.RandomizedLasso have been deprecated and will\nbe removed in version 0.21.\n #8995 by Ramana.S .\nModel evaluation and meta-estimators\nDeprecate the fit_params constructor input to the\n model_selection.GridSearchCV and\n model_selection.RandomizedSearchCV in favor\nof passing keyword parameters to the fit methods\nof those classes. 
Data-dependent parameters needed for model\ntraining should be passed as keyword arguments to fit ,\nand conforming to this convention will allow the hyperparameter\nselection classes to be used with tools such as\n model_selection.cross_val_predict .\n #2879 by Stephen Hoover .\nIn version 0.21, the default behavior of splitters that use the\n test_size and train_size parameter will change, such that\nspecifying train_size alone will cause test_size to be the\nremainder. #7459 by Nelson Liu .\nmulticlass.OneVsRestClassifier now has partial_fit ,\n decision_function and predict_proba methods only when the\nunderlying estimator does. #7812 by Andreas M\u00fcller and\n Mikhail Korobov .\nmulticlass.OneVsRestClassifier now has a partial_fit method\nonly if the underlying estimator does. By Andreas M\u00fcller .\nThe decision_function output shape for binary classification in\n multiclass.OneVsRestClassifier and\n multiclass.OneVsOneClassifier is now (n_samples,) to conform\nto scikit-learn conventions. #9100 by Andreas M\u00fcller .\nThe multioutput.MultiOutputClassifier.predict_proba \nfunction used to return a 3d array ( n_samples , n_classes ,\n n_outputs ). In the case where different target columns had different\nnumbers of classes, a ValueError would be raised on trying to stack\nmatrices with different dimensions. This function now returns a list of\narrays where the length of the list is n_outputs , and each array is\n( n_samples , n_classes ) for that particular output.\n #8093 by Peter Bull .\nReplace attribute named_steps dict to utils.Bunch \nin pipeline.Pipeline to enable tab completion in interactive\nenvironment. 
In the case conflict value on named_steps and dict \nattribute, dict behavior will be prioritized.\n #8481 by Herilalaina Rakotoarison .\nMiscellaneous\nDeprecate the y parameter in transform and inverse_transform .\nThe method should not accept y parameter, as it\u2019s used at the prediction time.\n #8174 by Tahar Zanouda , Alexandre Gramfort \nand Raghav RV .\nSciPy >= 0.13.3 and NumPy >= 1.8.2 are now the minimum supported versions\nfor scikit-learn. The following backported functions in\n sklearn.utils have been removed or deprecated accordingly.\n #8854 and #8874 by Naoya Kanai\nThe store_covariances and covariances_ parameters of\n discriminant_analysis.QuadraticDiscriminantAnalysis \nhave been renamed to store_covariance and covariance_ to be\nconsistent with the corresponding parameter names of the\n discriminant_analysis.LinearDiscriminantAnalysis . They will be\nremoved in version 0.21. #7998 by Jiacheng\nRemoved in 0.19:\nutils.fixes.argpartition\nutils.fixes.array_equal\nutils.fixes.astype\nutils.fixes.bincount\nutils.fixes.expit\nutils.fixes.frombuffer_empty\nutils.fixes.in1d\nutils.fixes.norm\nutils.fixes.rankdata\nutils.fixes.safe_copy\nDeprecated in 0.19, to be removed in 0.21:\nutils.arpack.eigs\nutils.arpack.eigsh\nutils.arpack.svds\nutils.extmath.fast_dot\nutils.extmath.logsumexp\nutils.extmath.norm\nutils.extmath.pinvh\nutils.graph.graph_laplacian\nutils.random.choice\nutils.sparsetools.connected_components\nutils.stats.rankdata\nEstimators with both methods decision_function and predict_proba \nare now required to have a monotonic relation between them. The\nmethod check_decision_proba_consistency has been added in\n utils.estimator_checks to check their consistency.\n #7578 by Shubham Bhardwaj\nAll checks in utils.estimator_checks , in particular\n utils.estimator_checks.check_estimator now accept estimator\ninstances. Most other checks do not accept\nestimator classes any more. 
#9019 by Andreas M\u00fcller .\nEnsure that estimators\u2019 attributes ending with _ are not set\nin the constructor but only in the fit method. Most notably,\nensemble estimators (deriving from ensemble.BaseEnsemble )\nnow only have self.estimators_ available after fit .\n #7464 by Lars Buitinck and Loic Esteve .\nCode and Documentation Contributors #\nThanks to everyone who has contributed to the maintenance and improvement of the\nproject since version 0.18, including:\nJoel Nothman, Loic Esteve, Andreas Mueller, Guillaume Lemaitre, Olivier Grisel,\nHanmin Qin, Raghav RV, Alexandre Gramfort, themrmax, Aman Dalmia, Gael\nVaroquaux, Naoya Kanai, Tom Dupr\u00e9 la Tour, Rishikesh, Nelson Liu, Taehoon Lee,\nNelle Varoquaux, Aashil, Mikhail Korobov, Sebastin Santy, Joan Massich, Roman\nYurchak, RAKOTOARISON Herilalaina, Thierry Guillemot, Alexandre Abadie, Carol\nWilling, Balakumaran Manoharan, Josh Karnofsky, Vlad Niculae, Utkarsh Upadhyay,\nDmitry Petrov, Minghui Liu, Srivatsan, Vincent Pham, Albert Thomas, Jake\nVanderPlas, Attractadore, JC Liu, alexandercbooth, chkoar, \u00d3scar N\u00e1jera,\nAarshay Jain, Kyle Gilliam, Ramana Subramanyam, CJ Carey, Clement Joudet, David\nRobles, He Chen, Joris Van den Bossche, Karan Desai, Katie Luangkote, Leland\nMcInnes, Maniteja Nandana, Michele Lacchia, Sergei Lebedev, Shubham Bhardwaj,\nakshay0724, omtcyfz, rickiepark, waterponey, Vathsala Achar, jbDelafosse, Ralf\nGommers, Ekaterina Krivich, Vivek Kumar, Ishank Gulati, Dave Elliott, ldirer,\nReiichiro Nakano, Levi John Wolf, Mathieu Blondel, Sid Kapur, Dougal J.\nSutherland, midinas, mikebenfield, Sourav Singh, Aseem Bansal, Ibraim Ganiev,\nStephen Hoover, AishwaryaRK, Steven C. Howell, Gary Foreman, Neeraj Gangwar,\nTahar, Jon Crall, dokato, Kathy Chen, ferria, Thomas Moreau, Charlie Brummitt,\nNicolas Goix, Adam Kleczewski, Sam Shleifer, Nikita Singh, Basil Beirouti,\nGiorgio Patrini, Manoj Kumar, Rafael Possas, James Bourbeau, James A. 
Bednar,\nJanine Harper, Jaye, Jean Helie, Jeremy Steward, Artsiom, John Wei, Jonathan\nLIgo, Jonathan Rahn, seanpwilliams, Arthur Mensch, Josh Levy, Julian Kuhlmann,\nJulien Aubert, J\u00f6rn Hees, Kai, shivamgargsya, Kat Hempstalk, Kaushik\nLakshmikanth, Kennedy, Kenneth Lyons, Kenneth Myers, Kevin Yap, Kirill Bobyrev,\nKonstantin Podshumok, Arthur Imbert, Lee Murray, toastedcornflakes, Lera, Li\nLi, Arthur Douillard, Mainak Jas, tobycheese, Manraj Singh, Manvendra Singh,\nMarc Meketon, MarcoFalke, Matthew Brett, Matthias Gilch, Mehul Ahuja, Melanie\nGoetz, Meng, Peng, Michael Dezube, Michal Baumgartner, vibrantabhi19, Artem\nGolubin, Milen Paskov, Antonin Carette, Morikko, MrMjauh, NALEPA Emmanuel,\nNamiya, Antoine Wendlinger, Narine Kokhlikyan, NarineK, Nate Guerin, Angus\nWilliams, Ang Lu, Nicole Vavrova, Nitish Pandey, Okhlopkov Daniil Olegovich,\nAndy Craze, Om Prakash, Parminder Singh, Patrick Carlson, Patrick Pei, Paul\nGanssle, Paulo Haddad, Pawe\u0142 Lorek, Peng Yu, Pete Bachant, Peter Bull, Peter\nCsizsek, Peter Wang, Pieter Arthur de Jong, Ping-Yao, Chang, Preston Parry,\nPuneet Mathur, Quentin Hibon, Andrew Smith, Andrew Jackson, 1kastner, Rameshwar\nBhaskaran, Rebecca Bilbro, Remi Rampin, Andrea Esuli, Rob Hall, Robert\nBradshaw, Romain Brault, Aman Pratik, Ruifeng Zheng, Russell Smith, Sachin\nAgarwal, Sailesh Choyal, Samson Tan, Samu\u00ebl Weber, Sarah Brown, Sebastian\nP\u00f6lsterl, Sebastian Raschka, Sebastian Saeger, Alyssa Batula, Abhyuday Pratap\nSingh, Sergey Feldman, Sergul Aydore, Sharan Yalburgi, willduan, Siddharth\nGupta, Sri Krishna, Almer, Stijn Tonk, Allen Riddell, Theofilos Papapanagiotou,\nAlison, Alexis Mignon, Tommy Boucher, Tommy L\u00f6fstedt, Toshihiro Kamishima,\nTyler Folkman, Tyler Lanigan, Alexander Junge, Varun Shenoy, Victor Poughon,\nVilhelm von Ehrenheim, Aleksandr Sandrovskii, Alan Yee, Vlasios Vasileiou,\nWarut Vijitbenjaronk, Yang Zhang, Yaroslav Halchenko, Yichuan Liu, Yuichi\nFujikawa, affanv14, aivision2020, 
xor, andreh7, brady salz, campustrampus,\nAgamemnon Krasoulis, ditenberg, elena-sharova, filipj8, fukatani, gedeck,\nguiniol, guoci, hakaa1, hongkahjun, i-am-xhy, jakirkham, jaroslaw-weber,\njayzed82, jeroko, jmontoyam, jonathan.striebel, josephsalmon, jschendel,\nleereeves, martin-hahn, mathurinm, mehak-sachdeva, mlewis1729, mlliou112,\nmthorrell, ndingwall, nuffe, yangarbiter, plagree, pldtc325, Breno Freitas,\nBrett Olsen, Brian A. Alfano, Brian Burns, polmauri, Brandon Carter, Charlton\nAustin, Chayant T15h, Chinmaya Pancholi, Christian Danielsen, Chung Yen,\nChyi-Kwei Yau, pravarmahajan, DOHMATOB Elvis, Daniel LeJeune, Daniel Hnyk,\nDarius Morawiec, David DeTomaso, David Gasquez, David Haberth\u00fcr, David\nHeryanto, David Kirkby, David Nicholson, rashchedrin, Deborah Gertrude Digges,\nDenis Engemann, Devansh D, Dickson, Bob Baxley, Don86, E. Lynch-Klarup, Ed\nRogers, Elizabeth Ferriss, Ellen-Co2, Fabian Egli, Fang-Chieh Chou, Bing Tian\nDai, Greg Stupp, Grzegorz Szpak, Bertrand Thirion, Hadrien Bertrand, Harizo\nRajaona, zxcvbnius, Henry Lin, Holger Peters, Icyblade Dai, Igor\nAndriushchenko, Ilya, Isaac Laughlin, Iv\u00e1n Vall\u00e9s, Aur\u00e9lien Bellet, JPFrancoia,\nJacob Schreiber, Asish Mahapatra\n"
    },
    {
      "turn_idx": 18,
      "action": "fetch_website_content({\"website_url\": \"https://scikit-learn.org/0.19/_sources/whats_new.rst.txt\"})",
      "explanation": "",
      "observation": ".. currentmodule:: sklearn\n\n===============\nRelease history\n===============\n\nVersion 0.19.2\n==============\n\n**October, 2018**\n\nThis release is exclusively in order to support Python 3.7.\n\nRelated changes\n---------------\n\n- ``n_iter_`` may vary from previous releases in\n :class:`linear_model.LogisticRegression` with ``solver='lbfgs'`` and\n :class:`linear_model.HuberRegressor`. For Scipy <= 1.0.0, the optimizer could\n perform more than the requested maximum number of iterations. Now both\n estimators will report at most ``max_iter`` iterations even if more were\n performed. :issue:`10723` by `Joel Nothman`_.\nVersion 0.19.1\n==============\n\n**October, 2017**\n\nThis is a bug-fix release with some minor documentation improvements and\nenhancements to features released in 0.19.0.\n\nNote there may be minor differences in TSNE output in this release (due to\n:issue:`9623`), in the case where multiple samples have equal distance to some\nsample.\n\nChangelog\n---------\n\nAPI changes\n...........\n\n- Reverted the addition of ``metrics.ndcg_score`` and ``metrics.dcg_score``\n which had been merged into version 0.19.0 by error. The implementations\n were broken and undocumented.\n\n- ``return_train_score`` which was added to\n :class:`model_selection.GridSearchCV`,\n :class:`model_selection.RandomizedSearchCV` and\n :func:`model_selection.cross_validate` in version 0.19.0 will be changing its\n default value from True to False in version 0.21. We found that calculating\n training score could have a great effect on cross validation runtime in some\n cases. 
Users should explicitly set ``return_train_score`` to False if\n prediction or scoring functions are slow, resulting in a deleterious effect\n on CV runtime, or to True if they wish to use the calculated scores.\n :issue:`9677` by :user:`Kumar Ashutosh ` and `Joel\n Nothman`_.\n\n- ``correlation_models`` and ``regression_models`` from the legacy gaussian\n processes implementation have been belatedly deprecated. :issue:`9717` by\n :user:`Kumar Ashutosh `.\n\nBug fixes\n.........\n\n- Avoid integer overflows in :func:`metrics.matthews_corrcoef`.\n :issue:`9693` by :user:`Sam Steingold `.\n\n- Fix ValueError in :class:`preprocessing.LabelEncoder` when using\n ``inverse_transform`` on unseen labels. :issue:`9816` by :user:`Charlie Newey\n `.\n\n- Fixed a bug in the objective function for :class:`manifold.TSNE` (both exact\n and with the Barnes-Hut approximation) when ``n_components >= 3``.\n :issue:`9711` by :user:`goncalo-rodrigues`.\n\n- Fix regression in :func:`model_selection.cross_val_predict` where it\n raised an error with ``method='predict_proba'`` for some probabilistic\n classifiers. :issue:`9641` by :user:`James Bourbeau `.\n\n- Fixed a bug where :func:`datasets.make_classification` modified its input\n ``weights``. :issue:`9865` by :user:`Sachin Kelkar `.\n\n- :class:`model_selection.StratifiedShuffleSplit` now works with multioutput\n multiclass or multilabel data with more than 1000 columns. :issue:`9922` by\n :user:`Charlie Brummitt `.\n\n- Fixed a bug with nested and conditional parameter setting, e.g. setting a\n pipeline step and its parameter at the same time. :issue:`9945` by `Andreas\n M\u00fcller`_ and `Joel Nothman`_.\n\nRegressions in 0.19.0 fixed in 0.19.1:\n\n- Fixed a bug where parallelised prediction in random forests was not\n thread-safe and could (rarely) result in arbitrary errors. :issue:`9830` by\n `Joel Nothman`_.\n\n- Fix regression in :func:`model_selection.cross_val_predict` where it no\n longer accepted ``X`` as a list. 
:issue:`9600` by :user:`Rasul Kerimov\n `.\n\n- Fixed handling of :func:`model_selection.cross_val_predict` for binary classification with\n ``method='decision_function'``. :issue:`9593` by :user:`Reiichiro Nakano\n ` and core devs.\n\n- Fix regression in :class:`pipeline.Pipeline` where it no longer accepted\n ``steps`` as a tuple. :issue:`9604` by :user:`Joris Van den Bossche\n `.\n\n- Fix bug where ``n_iter`` was not properly deprecated, leaving ``n_iter``\n unavailable for interim use in\n :class:`linear_model.SGDClassifier`, :class:`linear_model.SGDRegressor`,\n :class:`linear_model.PassiveAggressiveClassifier`,\n :class:`linear_model.PassiveAggressiveRegressor` and\n :class:`linear_model.Perceptron`. :issue:`9558` by `Andreas M\u00fcller`_.\n\n- Dataset fetchers make sure temporary files are closed before removing them,\n which caused errors on Windows. :issue:`9847` by :user:`Joan Massich `.\n\n- Fixed a regression in :class:`manifold.TSNE` where it no longer supported\n metrics other than 'euclidean' and 'precomputed'. :issue:`9623` by :user:`Oli\n Blum `.\n\nEnhancements\n............\n\n- Our test suite and :func:`utils.estimator_checks.check_estimators` can now be\n run without Nose installed. :issue:`9697` by :user:`Joan Massich `.\n\n- To improve usability of version 0.19's :class:`pipeline.Pipeline`\n caching, ``memory`` now allows ``joblib.Memory`` instances.\n This make use of the new :func:`utils.validation.check_memory` helper.\n :issue:`9584` by :user:`Kumar Ashutosh `\n\n- Some fixes to examples: :issue:`9750`, :issue:`9788`, :issue:`9815`\n\n- Made a FutureWarning in SGD-based estimators less verbose. 
:issue:`9802` by\n :user:`Vrishank Bhardwaj `.\n\nCode and Documentation Contributors\n-----------------------------------\n\nWith thanks to:\n\nJoel Nothman, Loic Esteve, Andreas Mueller, Kumar Ashutosh,\nVrishank Bhardwaj, Hanmin Qin, Rasul Kerimov, James Bourbeau,\nNagarjuna Kumar, Nathaniel Saul, Olivier Grisel, Roman\nYurchak, Reiichiro Nakano, Sachin Kelkar, Sam Steingold,\nYaroslav Halchenko, diegodlh, felix, goncalo-rodrigues,\njkleint, oliblum90, pasbi, Anthony Gitter, Ben Lawson, Charlie\nBrummitt, Didi Bar-Zev, Gael Varoquaux, Joan Massich, Joris\nVan den Bossche, nielsenmarkus11\n\nVersion 0.19\n============\n\n**August 12, 2017**\n\nHighlights\n----------\n\nWe are excited to release a number of great new features including\n:class:`neighbors.LocalOutlierFactor` for anomaly detection,\n:class:`preprocessing.QuantileTransformer` for robust feature transformation,\nand the :class:`multioutput.ClassifierChain` meta-estimator to simply account\nfor dependencies between classes in multilabel problems. We have some new\nalgorithms in existing estimators, such as multiplicative update in\n:class:`decomposition.NMF` and multinomial\n:class:`linear_model.LogisticRegression` with L1 loss (use ``solver='saga'``).\n\nCross validation is now able to return the results from multiple metric\nevaluations. The new :func:`model_selection.cross_validate` can return many\nscores on the test data as well as training set performance and timings, and we\nhave extended the ``scoring`` and ``refit`` parameters for grid/randomized\nsearch :ref:`to handle multiple metrics `.\n\nYou can also learn faster. For instance, the :ref:`new option to cache\ntransformations ` in :class:`pipeline.Pipeline` makes grid\nsearch over pipelines including slow transformations much more efficient. And\nyou can predict faster: if you're sure you know what you're doing, you can turn\noff validating that the input is finite using :func:`config_context`.\n\nWe've made some important fixes too. 
We've fixed a longstanding implementation\nerror in :func:`metrics.average_precision_score`, so please be cautious with\nprior results reported from that function. A number of errors in the\n:class:`manifold.TSNE` implementation have been fixed, particularly in the\ndefault Barnes-Hut approximation. :class:`semi_supervised.LabelSpreading` and\n:class:`semi_supervised.LabelPropagation` have had substantial fixes.\nLabelPropagation was previously broken. LabelSpreading should now correctly\nrespect its alpha parameter.\n\nChanged models\n--------------\n\nThe following estimators and functions, when fit with the same data and\nparameters, may produce different models from the previous version. This often\noccurs due to changes in the modelling logic (bug fixes or enhancements), or in\nrandom sampling procedures.\n\n- :class:`cluster.KMeans` with sparse X and initial centroids given (bug fix)\n- :class:`cross_decomposition.PLSRegression`\n with ``scale=True`` (bug fix)\n- :class:`ensemble.GradientBoostingClassifier` and\n :class:`ensemble.GradientBoostingRegressor` where ``min_impurity_split`` is used (bug fix)\n- gradient boosting ``loss='quantile'`` (bug fix)\n- :class:`ensemble.IsolationForest` (bug fix)\n- :class:`feature_selection.SelectFdr` (bug fix)\n- :class:`linear_model.RANSACRegressor` (bug fix)\n- :class:`linear_model.LassoLars` (bug fix)\n- :class:`linear_model.LassoLarsIC` (bug fix)\n- :class:`manifold.TSNE` (bug fix)\n- :class:`neighbors.NearestCentroid` (bug fix)\n- :class:`semi_supervised.LabelSpreading` (bug fix)\n- :class:`semi_supervised.LabelPropagation` (bug fix)\n- tree based models where ``min_weight_fraction_leaf`` is used (enhancement)\n\nDetails are listed in the changelog below.\n\n(While we are trying to better inform users by providing this information, we\ncannot assure that this list is complete.)\n\nChangelog\n---------\n\nNew features\n............\n\nClassifiers and regressors\n\n- Added :class:`multioutput.ClassifierChain` for 
multi-label\n classification. By `Adam Kleczewski `_.\n\n- Added solver ``'saga'`` that implements the improved version of Stochastic\n Average Gradient, in :class:`linear_model.LogisticRegression` and\n :class:`linear_model.Ridge`. It allows the use of L1 penalty with\n multinomial logistic loss, and behaves marginally better than 'sag'\n during the first epochs of ridge and logistic regression.\n :issue:`8446` by `Arthur Mensch`_.\n\nOther estimators\n\n- Added the :class:`neighbors.LocalOutlierFactor` class for anomaly\n detection based on nearest neighbors.\n :issue:`5279` by `Nicolas Goix`_ and `Alexandre Gramfort`_.\n\n- Added :class:`preprocessing.QuantileTransformer` class and\n :func:`preprocessing.quantile_transform` function for features\n normalization based on quantiles.\n :issue:`8363` by :user:`Denis Engemann `,\n :user:`Guillaume Lemaitre `, `Olivier Grisel`_, `Raghav RV`_,\n :user:`Thierry Guillemot `, and `Gael Varoquaux`_.\n\n- The new solver ``'mu'`` implements a Multiplicate Update in\n :class:`decomposition.NMF`, allowing the optimization of all\n beta-divergences, including the Frobenius norm, the generalized\n Kullback-Leibler divergence and the Itakura-Saito divergence.\n :issue:`5295` by `Tom Dupre la Tour`_.\n\nModel selection and evaluation\n\n- :class:`model_selection.GridSearchCV` and\n :class:`model_selection.RandomizedSearchCV` now support simultaneous\n evaluation of multiple metrics. Refer to the\n :ref:`multimetric_grid_search` section of the user guide for more\n information. :issue:`7388` by `Raghav RV`_\n\n- Added the :func:`model_selection.cross_validate` which allows evaluation\n of multiple metrics. This function returns a dict with more useful\n information from cross-validation such as the train scores, fit times and\n score times.\n Refer to :ref:`multimetric_cross_validation` section of the userguide\n for more information. 
:issue:`7388` by `Raghav RV`_\n\n- Added :func:`metrics.mean_squared_log_error`, which computes\n the mean square error of the logarithmic transformation of targets,\n particularly useful for targets with an exponential trend.\n :issue:`7655` by :user:`Karan Desai `.\n\n- Added :func:`metrics.dcg_score` and :func:`metrics.ndcg_score`, which\n compute Discounted cumulative gain (DCG) and Normalized discounted\n cumulative gain (NDCG).\n :issue:`7739` by :user:`David Gasquez `.\n\n- Added the :class:`model_selection.RepeatedKFold` and\n :class:`model_selection.RepeatedStratifiedKFold`.\n :issue:`8120` by `Neeraj Gangwar`_.\n\n- Added a scorer based on :class:`metrics.explained_variance_score`.\n :issue:`9259` by `Hanmin Qin `_.\nMiscellaneous\n\n- Validation that input data contains no NaN or inf can now be suppressed\n using :func:`config_context`, at your own risk. This will save on runtime,\n and may be particularly useful for prediction time. :issue:`7548` by\n `Joel Nothman`_.\n\n- Added a test to ensure parameter listing in docstrings match the\n function/class signature. :issue:`9206` by `Alexandre Gramfort`_ and\n `Raghav RV`_.\n\nEnhancements\n............\n\nTrees and ensembles\n\n- The ``min_weight_fraction_leaf`` constraint in tree construction is now\n more efficient, taking a fast path to declare a node a leaf if its weight\n is less than 2 * the minimum. Note that the constructed tree will be\n different from previous versions where ``min_weight_fraction_leaf`` is\n used. :issue:`7441` by :user:`Nelson Liu `.\n\n- :class:`ensemble.GradientBoostingClassifier` and :class:`ensemble.GradientBoostingRegressor`\n now support sparse input for prediction.\n :issue:`6101` by :user:`Ibraim Ganiev `.\n\n- :class:`ensemble.VotingClassifier` now allows changing estimators by using\n :meth:`ensemble.VotingClassifier.set_params`. 
An estimator can also be\n removed by setting it to ``None``.\n :issue:`7674` by :user:`Yichuan Liu `.\n\n- :func:`tree.export_graphviz` now shows configurable number of decimal\n places. :issue:`8698` by :user:`Guillaume Lemaitre `.\n\n- Added ``flatten_transform`` parameter to :class:`ensemble.VotingClassifier`\n to change output shape of `transform` method to 2 dimensional.\n :issue:`7794` by :user:`Ibraim Ganiev ` and\n :user:`Herilalaina Rakotoarison `.\n\nLinear, kernelized and related models\n\n- :class:`linear_model.SGDClassifier`, :class:`linear_model.SGDRegressor`,\n :class:`linear_model.PassiveAggressiveClassifier`,\n :class:`linear_model.PassiveAggressiveRegressor` and\n :class:`linear_model.Perceptron` now expose ``max_iter`` and\n ``tol`` parameters, to handle convergence more precisely.\n ``n_iter`` parameter is deprecated, and the fitted estimator exposes\n a ``n_iter_`` attribute, with actual number of iterations before\n convergence. :issue:`5036` by `Tom Dupre la Tour`_.\n\n- Added ``average`` parameter to perform weight averaging in\n :class:`linear_model.PassiveAggressiveClassifier`. :issue:`4939`\n by :user:`Andrea Esuli `.\n\n- :class:`linear_model.RANSACRegressor` no longer throws an error\n when calling ``fit`` if no inliers are found in its first iteration.\n Furthermore, causes of skipped iterations are tracked in newly added\n attributes, ``n_skips_*``.\n :issue:`7914` by :user:`Michael Horrell `.\n\n- In :class:`gaussian_process.GaussianProcessRegressor`, method ``predict``\n is a lot faster with ``return_std=True``. 
:issue:`8591` by\n :user:`Hadrien Bertrand `.\n\n- Added ``return_std`` to ``predict`` method of\n :class:`linear_model.ARDRegression` and\n :class:`linear_model.BayesianRidge`.\n :issue:`7838` by :user:`Sergey Feldman `.\n\n- Memory usage enhancements: Prevent cast from float32 to float64 in:\n :class:`linear_model.MultiTaskElasticNet`;\n :class:`linear_model.LogisticRegression` when using newton-cg solver; and\n :class:`linear_model.Ridge` when using svd, sparse_cg, cholesky or lsqr\n solvers. :issue:`8835`, :issue:`8061` by :user:`Joan Massich ` and :user:`Nicolas\n Cordier ` and :user:`Thierry Guillemot `.\n\nOther predictors\n\n- Custom metrics for the :mod:`neighbors` binary trees now have\n fewer constraints: they must take two 1d-arrays and return a float.\n :issue:`6288` by `Jake Vanderplas`_.\n\n- ``algorithm='auto`` in :mod:`neighbors` estimators now chooses the most\n appropriate algorithm for all input types and metrics. :issue:`9145` by\n :user:`Herilalaina Rakotoarison ` and :user:`Reddy Chinthala\n `.\n\nDecomposition, manifold learning and clustering\n\n- :class:`cluster.MiniBatchKMeans` and :class:`cluster.KMeans`\n now use significantly less memory when assigning data points to their\n nearest cluster center. :issue:`7721` by :user:`Jon Crall `.\n\n- :class:`decomposition.PCA`, :class:`decomposition.IncrementalPCA` and\n :class:`decomposition.TruncatedSVD` now expose the singular values\n from the underlying SVD. 
They are stored in the attribute\n ``singular_values_``, like in :class:`decomposition.IncrementalPCA`.\n :issue:`7685` by :user:`Tommy L\u00f6fstedt `\n\n- Fixed the implementation of noise_variance_ in :class:`decomposition.PCA`.\n :issue:`9108` by `Hanmin Qin `_.\n\n- :class:`decomposition.NMF` now faster when ``beta_loss=0``.\n :issue:`9277` by :user:`hongkahjun`.\n\n- Memory improvements for method ``barnes_hut`` in :class:`manifold.TSNE`\n :issue:`7089` by :user:`Thomas Moreau ` and `Olivier Grisel`_.\n\n- Optimization schedule improvements for Barnes-Hut :class:`manifold.TSNE`\n so the results are closer to the one from the reference implementation\n `lvdmaaten/bhtsne `_ by :user:`Thomas\n Moreau ` and `Olivier Grisel`_.\n\n- Memory usage enhancements: Prevent cast from float32 to float64 in\n :class:`decomposition.PCA` and\n :func:`decomposition.randomized_svd_low_rank`.\n :issue:`9067` by `Raghav RV`_.\n\nPreprocessing and feature selection\n\n- Added ``norm_order`` parameter to :class:`feature_selection.SelectFromModel`\n to enable selection of the norm order when ``coef_`` is more than 1D.\n :issue:`6181` by :user:`Antoine Wendlinger `.\n\n- Added ability to use sparse matrices in :func:`feature_selection.f_regression`\n with ``center=True``. :issue:`8065` by :user:`Daniel LeJeune `.\n\n- Small performance improvement to n-gram creation in\n :mod:`feature_extraction.text` by binding methods for loops and\n special-casing unigrams. :issue:`7567` by :user:`Jaye Doepke `\n\n- Relax assumption on the data for the\n :class:`kernel_approximation.SkewedChi2Sampler`. Since the Skewed-Chi2\n kernel is defined on the open interval :math:`(-skewedness; +\\infty)^d`,\n the transform function should not check whether ``X < 0`` but whether ``X <\n -self.skewedness``. 
:issue:`7573` by :user:`Romain Brault `.\n\n- Made default kernel parameters kernel-dependent in\n :class:`kernel_approximation.Nystroem`.\n :issue:`5229` by :user:`Saurabh Bansod ` and `Andreas M\u00fcller`_.\n\nModel evaluation and meta-estimators\n\n- :class:`pipeline.Pipeline` is now able to cache transformers\n within a pipeline by using the ``memory`` constructor parameter.\n :issue:`7990` by :user:`Guillaume Lemaitre `.\n\n- :class:`pipeline.Pipeline` steps can now be accessed as attributes of its\n ``named_steps`` attribute. :issue:`8586` by :user:`Herilalaina\n Rakotoarison `.\n\n- Added ``sample_weight`` parameter to :meth:`pipeline.Pipeline.score`.\n :issue:`7723` by :user:`Mikhail Korobov `.\n\n- Added ability to set ``n_jobs`` parameter to :func:`pipeline.make_union`.\n A ``TypeError`` will be raised for any other kwargs. :issue:`8028`\n by :user:`Alexander Booth `.\n\n- :class:`model_selection.GridSearchCV`,\n :class:`model_selection.RandomizedSearchCV` and\n :func:`model_selection.cross_val_score` now allow estimators with callable\n kernels which were previously prohibited.\n :issue:`8005` by `Andreas M\u00fcller`_ .\n\n- :func:`model_selection.cross_val_predict` now returns output of the\n correct shape for all values of the argument ``method``.\n :issue:`7863` by :user:`Aman Dalmia `.\n\n- Added ``shuffle`` and ``random_state`` parameters to shuffle training\n data before taking prefixes of it based on training sizes in\n :func:`model_selection.learning_curve`.\n :issue:`7506` by :user:`Narine Kokhlikyan `.\n\n- :class:`model_selection.StratifiedShuffleSplit` now works with multioutput\n multiclass (or multilabel) data. 
:issue:`9044` by `Vlad Niculae`_.\n\n- Speed improvements to :class:`model_selection.StratifiedShuffleSplit`.\n :issue:`5991` by :user:`Arthur Mensch ` and `Joel Nothman`_.\n\n- Add ``shuffle`` parameter to :func:`model_selection.train_test_split`.\n :issue:`8845` by :user:`themrmax `\n\n- :class:`multioutput.MultiOutputRegressor` and :class:`multioutput.MultiOutputClassifier`\n now support online learning using ``partial_fit``.\n :issue: `8053` by :user:`Peng Yu `.\n\n- Add ``max_train_size`` parameter to :class:`model_selection.TimeSeriesSplit`\n :issue:`8282` by :user:`Aman Dalmia `.\n\n- More clustering metrics are now available through :func:`metrics.get_scorer`\n and ``scoring`` parameters. :issue:`8117` by `Raghav RV`_.\n\nMetrics\n\n- :func:`metrics.matthews_corrcoef` now support multiclass classification.\n :issue:`8094` by :user:`Jon Crall `.\n\n- Add ``sample_weight`` parameter to :func:`metrics.cohen_kappa_score`.\n :issue:`8335` by :user:`Victor Poughon `.\n\nMiscellaneous\n\n- :func:`utils.check_estimator` now attempts to ensure that methods\n transform, predict, etc. do not set attributes on the estimator.\n :issue:`7533` by :user:`Ekaterina Krivich `.\n\n- Added type checking to the ``accept_sparse`` parameter in\n :mod:`utils.validation` methods. This parameter now accepts only boolean,\n string, or list/tuple of strings. ``accept_sparse=None`` is deprecated and\n should be replaced by ``accept_sparse=False``.\n :issue:`7880` by :user:`Josh Karnofsky `.\n\n- Make it possible to load a chunk of an svmlight formatted file by\n passing a range of bytes to :func:`datasets.load_svmlight_file`.\n :issue:`935` by :user:`Olivier Grisel `.\n\n- :class:`dummy.DummyClassifier` and :class:`dummy.DummyRegressor`\n now accept non-finite features. 
:issue:`8931` by :user:`Attractadore`.\n\nBug fixes\n.........\n\nTrees and ensembles\n\n- Fixed a memory leak in trees when using trees with ``criterion='mae'``.\n :issue:`8002` by `Raghav RV`_.\n\n- Fixed a bug where :class:`ensemble.IsolationForest` uses an\n an incorrect formula for the average path length\n :issue:`8549` by `Peter Wang `_.\n\n- Fixed a bug where :class:`ensemble.AdaBoostClassifier` throws\n ``ZeroDivisionError`` while fitting data with single class labels.\n :issue:`7501` by :user:`Dominik Krzeminski `.\n\n- Fixed a bug in :class:`ensemble.GradientBoostingClassifier` and\n :class:`ensemble.GradientBoostingRegressor` where a float being compared\n to ``0.0`` using ``==`` caused a divide by zero error. :issue:`7970` by\n :user:`He Chen `.\n\n- Fix a bug where :class:`ensemble.GradientBoostingClassifier` and\n :class:`ensemble.GradientBoostingRegressor` ignored the\n ``min_impurity_split`` parameter.\n :issue:`8006` by :user:`Sebastian P\u00f6lsterl `.\n\n- Fixed ``oob_score`` in :class:`ensemble.BaggingClassifier`.\n :issue:`8936` by :user:`Michael Lewis `\n\n- Fixed excessive memory usage in prediction for random forests estimators.\n :issue:`8672` by :user:`Mike Benfield `.\n\n- Fixed a bug where ``sample_weight`` as a list broke random forests in Python 2\n :issue:`8068` by :user:`xor`.\n\n- Fixed a bug where :class:`ensemble.IsolationForest` fails when\n ``max_features`` is less than 1.\n :issue:`5732` by :user:`Ishank Gulati `.\n\n- Fix a bug where gradient boosting with ``loss='quantile'`` computed\n negative errors for negative values of ``ytrue - ypred`` leading to wrong\n values when calling ``__call__``.\n :issue:`8087` by :user:`Alexis Mignon `\n\n- Fix a bug where :class:`ensemble.VotingClassifier` raises an error\n when a numpy array is passed in for weights. 
:issue:`7983` by\n :user:`Vincent Pham `.\n\n- Fixed a bug where :func:`tree.export_graphviz` raised an error\n when the length of features_names does not match n_features in the decision\n tree. :issue:`8512` by :user:`Li Li `.\n\nLinear, kernelized and related models\n\n- Fixed a bug where :func:`linear_model.RANSACRegressor.fit` may run until\n ``max_iter`` if it finds a large inlier group early. :issue:`8251` by\n :user:`aivision2020`.\n\n- Fixed a bug where :class:`naive_bayes.MultinomialNB` and\n :class:`naive_bayes.BernoulliNB` failed when ``alpha=0``. :issue:`5814` by\n :user:`Yichuan Liu ` and :user:`Herilalaina Rakotoarison\n `.\n\n- Fixed a bug where :class:`linear_model.LassoLars` does not give\n the same result as the LassoLars implementation available\n in R (lars library). :issue:`7849` by :user:`Jair Montoya Martinez `.\n\n- Fixed a bug in :class:`linear_model.RandomizedLasso`,\n :class:`linear_model.Lars`, :class:`linear_model.LassoLars`,\n :class:`linear_model.LarsCV` and :class:`linear_model.LassoLarsCV`,\n where the parameter ``precompute`` was not used consistently across\n classes, and some values proposed in the docstring could raise errors.\n :issue:`5359` by `Tom Dupre la Tour`_.\n\n- Fix inconsistent results between :class:`linear_model.RidgeCV` and\n :class:`linear_model.Ridge` when using ``normalize=True``. :issue:`9302`\n by `Alexandre Gramfort`_.\n\n- Fix a bug where :func:`linear_model.LassoLars.fit` sometimes\n left ``coef_`` as a list, rather than an ndarray.\n :issue:`8160` by :user:`CJ Carey `.\n\n- Fix :func:`linear_model.BayesianRidge.fit` to return\n ridge parameter ``alpha_`` and ``lambda_`` consistent with calculated\n coefficients ``coef_`` and ``intercept_``.\n :issue:`8224` by :user:`Peter Gedeck `.\n\n- Fixed a bug in :class:`svm.OneClassSVM` where it returned floats instead of\n integer classes. 
:issue:`8676` by :user:`Vathsala Achar `.\n\n- Fix AIC/BIC criterion computation in :class:`linear_model.LassoLarsIC`.\n :issue:`9022` by `Alexandre Gramfort`_ and :user:`Mehmet Basbug `.\n\n- Fixed a memory leak in our LibLinear implementation. :issue:`9024` by\n :user:`Sergei Lebedev `\n\n- Fix bug where stratified CV splitters did not work with\n :class:`linear_model.LassoCV`. :issue:`8973` by\n :user:`Paulo Haddad `.\n\n- Fixed a bug in :class:`gaussian_process.GaussianProcessRegressor`\n when the standard deviation and covariance predicted without fit\n would fail with a unmeaningful error by default.\n :issue:`6573` by :user:`Quazi Marufur Rahman ` and\n `Manoj Kumar`_.\n\nOther predictors\n\n- Fix :class:`semi_supervised.BaseLabelPropagation` to correctly implement\n ``LabelPropagation`` and ``LabelSpreading`` as done in the referenced\n papers. :issue:`9239`\n by :user:`Andre Ambrosio Boechat `, :user:`Utkarsh Upadhyay\n `, and `Joel Nothman`_.\n\nDecomposition, manifold learning and clustering\n\n- Fixed the implementation of :class:`manifold.TSNE`:\n- ``early_exageration`` parameter had no effect and is now used for the\n first 250 optimization iterations.\n- Fixed the ``AssertionError: Tree consistency failed`` exception\n reported in :issue:`8992`.\n- Improve the learning schedule to match the one from the reference\n implementation `lvdmaaten/bhtsne `_.\n by :user:`Thomas Moreau ` and `Olivier Grisel`_.\n\n- Fix a bug in :class:`decomposition.LatentDirichletAllocation`\n where the ``perplexity`` method was returning incorrect results because\n the ``transform`` method returns normalized document topic distributions\n as of version 0.18. 
:issue:`7954` by :user:`Gary Foreman `.\n\n- Fix output shape and bugs with n_jobs > 1 in\n :class:`decomposition.SparseCoder` transform and\n :func:`decomposition.sparse_encode`\n for one-dimensional data and one component.\n This also impacts the output shape of :class:`decomposition.DictionaryLearning`.\n :issue:`8086` by `Andreas M\u00fcller`_.\n\n- Fixed the implementation of ``explained_variance_``\n in :class:`decomposition.PCA`,\n :class:`decomposition.RandomizedPCA` and\n :class:`decomposition.IncrementalPCA`.\n :issue:`9105` by `Hanmin Qin `_.\n\n- Fixed the implementation of noise_variance_ in :class:`decomposition.PCA`.\n :issue:`9108` by `Hanmin Qin `_.\n\n- Fixed a bug where :class:`cluster.DBSCAN` gives incorrect\n result when input is a precomputed sparse matrix with initial\n rows all zero. :issue:`8306` by :user:`Akshay Gupta `\n\n- Fix a bug regarding fitting :class:`cluster.KMeans` with a sparse\n array X and initial centroids, where X's means were unnecessarily being\n subtracted from the centroids. :issue:`7872` by :user:`Josh Karnofsky `.\n\n- Fixes to the input validation in :class:`covariance.EllipticEnvelope`.\n :issue:`8086` by `Andreas M\u00fcller`_.\n\n- Fixed a bug in :class:`covariance.MinCovDet` where inputting data\n that produced a singular covariance matrix would cause the helper method\n ``_c_step`` to throw an exception.\n :issue:`3367` by :user:`Jeremy Steward `\n\n- Fixed a bug in :class:`manifold.TSNE` affecting convergence of the\n gradient descent. :issue:`8768` by :user:`David DeTomaso `.\n\n- Fixed a bug in :class:`manifold.TSNE` where it stored the incorrect\n ``kl_divergence_``. :issue:`6507` by :user:`Sebastian Saeger `.\n\n- Fixed improper scaling in :class:`cross_decomposition.PLSRegression`\n with ``scale=True``. 
:issue:`7819` by :user:`jayzed82 `.\n\n- :class:`cluster.bicluster.SpectralCoclustering` and\n :class:`cluster.bicluster.SpectralBiclustering` ``fit`` method conforms\n with API by accepting ``y`` and returning the object. :issue:`6126`,\n :issue:`7814` by :user:`Laurent Direr ` and :user:`Maniteja\n Nandana `.\n\n- Fix bug where :mod:`mixture` ``sample`` methods did not return as many\n samples as requested. :issue:`7702` by :user:`Levi John Wolf `.\n\n- Fixed the shrinkage implementation in :class:`neighbors.NearestCentroid`.\n :issue:`9219` by `Hanmin Qin `_.\n\nPreprocessing and feature selection\n\n- For sparse matrices, :func:`preprocessing.normalize` with ``return_norm=True``\n will now raise a ``NotImplementedError`` with 'l1' or 'l2' norm and with\n norm 'max' the norms returned will be the same as for dense matrices.\n :issue:`7771` by `Ang Lu `_.\n\n- Fix a bug where :class:`feature_selection.SelectFdr` did not\n exactly implement Benjamini-Hochberg procedure. It formerly may have\n selected fewer features than it should.\n :issue:`7490` by :user:`Peng Meng `.\n\n- Fixed a bug where :class:`linear_model.RandomizedLasso` and\n :class:`linear_model.RandomizedLogisticRegression` breaks for\n sparse input. :issue:`8259` by :user:`Aman Dalmia `.\n\n- Fix a bug where :class:`feature_extraction.FeatureHasher`\n mandatorily applied a sparse random projection to the hashed features,\n preventing the use of\n :class:`feature_extraction.text.HashingVectorizer` in a\n pipeline with :class:`feature_extraction.text.TfidfTransformer`.\n :issue:`7565` by :user:`Roman Yurchak `.\n\n- Fix a bug where :class:`feature_selection.mutual_info_regression` did not\n correctly use ``n_neighbors``. 
:issue:`8181` by :user:`Guillaume Lemaitre\n `.\n\nModel evaluation and meta-estimators\n\n- Fixed a bug where :func:`model_selection.BaseSearchCV.inverse_transform`\n returns ``self.best_estimator_.transform()`` instead of\n ``self.best_estimator_.inverse_transform()``.\n :issue:`8344` by :user:`Akshay Gupta ` and :user:`Rasmus Eriksson `.\n\n- Added ``classes_`` attribute to :class:`model_selection.GridSearchCV`,\n :class:`model_selection.RandomizedSearchCV`, :class:`grid_search.GridSearchCV`,\n and :class:`grid_search.RandomizedSearchCV` that matches the ``classes_``\n attribute of ``best_estimator_``. :issue:`7661` and :issue:`8295`\n by :user:`Alyssa Batula `, :user:`Dylan Werner-Meier `,\n and :user:`Stephen Hoover `.\n\n- Fixed a bug where :func:`model_selection.validation_curve`\n reused the same estimator for each parameter value.\n :issue:`7365` by :user:`Aleksandr Sandrovskii `.\n\n- :func:`model_selection.permutation_test_score` now works with Pandas\n types. :issue:`5697` by :user:`Stijn Tonk `.\n\n- Several fixes to input validation in\n :class:`multiclass.OutputCodeClassifier`\n :issue:`8086` by `Andreas M\u00fcller`_.\n\n- :class:`multiclass.OneVsOneClassifier`'s ``partial_fit`` now ensures all\n classes are provided up-front. :issue:`6250` by\n :user:`Asish Panda `.\n\n- Fix :func:`multioutput.MultiOutputClassifier.predict_proba` to return a\n list of 2d arrays, rather than a 3d array. In the case where different\n target columns had different numbers of classes, a ``ValueError`` would be\n raised on trying to stack matrices with different dimensions.\n :issue:`8093` by :user:`Peter Bull `.\n\n- Cross validation now works with Pandas datatypes that that have a\n read-only index. 
:issue:`9507` by `Loic Esteve`_.\n\nMetrics\n\n- :func:`metrics.average_precision_score` no longer linearly\n interpolates between operating points, and instead weighs precisions\n by the change in recall since the last operating point, as per the\n `Wikipedia entry `_.\n (`#7356 `_). By\n :user:`Nick Dingwall ` and `Gael Varoquaux`_.\n\n- Fix a bug in :func:`metrics.classification._check_targets`\n which would return ``'binary'`` if ``y_true`` and ``y_pred`` were\n both ``'binary'`` but the union of ``y_true`` and ``y_pred`` was\n ``'multiclass'``. :issue:`8377` by `Loic Esteve`_.\n\n- Fixed an integer overflow bug in :func:`metrics.confusion_matrix` and\n hence :func:`metrics.cohen_kappa_score`. :issue:`8354`, :issue:`7929`\n by `Joel Nothman`_ and :user:`Jon Crall `.\n\n- Fixed passing of ``gamma`` parameter to the ``chi2`` kernel in\n :func:`metrics.pairwise.pairwise_kernels` :issue:`5211` by\n :user:`Nick Rhinehart `,\n :user:`Saurabh Bansod ` and `Andreas M\u00fcller`_.\n\nMiscellaneous\n\n- Fixed a bug when :func:`datasets.make_classification` fails\n when generating more than 30 features. :issue:`8159` by\n :user:`Herilalaina Rakotoarison `.\n\n- Fixed a bug where :func:`datasets.make_moons` gives an\n incorrect result when ``n_samples`` is odd.\n :issue:`8198` by :user:`Josh Levy `.\n\n- Some ``fetch_`` functions in :mod:`datasets` were ignoring the\n ``download_if_missing`` keyword. :issue:`7944` by :user:`Ralf Gommers `.\n\n- Fix estimators to accept a ``sample_weight`` parameter of type\n ``pandas.Series`` in their ``fit`` function. :issue:`7825` by\n `Kathleen Chen`_.\n\n- Fix a bug in cases where ``numpy.cumsum`` may be numerically unstable,\n raising an exception if instability is identified. 
:issue:`7376` and\n :issue:`7331` by `Joel Nothman`_ and :user:`yangarbiter`.\n\n- Fix a bug where :meth:`base.BaseEstimator.__getstate__`\n obstructed pickling customizations of child-classes, when used in a\n multiple inheritance context.\n :issue:`8316` by :user:`Holger Peters `.\n\n- Update Sphinx-Gallery from 0.1.4 to 0.1.7 for resolving links in\n documentation build with Sphinx>1.5 :issue:`8010`, :issue:`7986` by\n :user:`Oscar Najera `\n\n- Add ``data_home`` parameter to :func:`sklearn.datasets.fetch_kddcup99`.\n :issue:`9289` by `Loic Esteve`_.\n\n- Fix dataset loaders using Python 3 version of makedirs to also work in\n Python 2. :issue:`9284` by :user:`Sebastin Santy `.\n\n- Several minor issues were fixed with thanks to the alerts of\n [lgtm.com](http://lgtm.com). :issue:`9278` by :user:`Jean Helie `,\n among others.\n\nAPI changes summary\n-------------------\n\nTrees and ensembles\n\n- Gradient boosting base models are no longer estimators. By `Andreas M\u00fcller`_.\n\n- All tree based estimators now accept a ``min_impurity_decrease``\n parameter in lieu of the ``min_impurity_split``, which is now deprecated.\n The ``min_impurity_decrease`` helps stop splitting the nodes in which\n the weighted impurity decrease from splitting is no longer at least\n ``min_impurity_decrease``. :issue:`8449` by `Raghav RV`_.\n\nLinear, kernelized and related models\n\n- ``n_iter`` parameter is deprecated in :class:`linear_model.SGDClassifier`,\n :class:`linear_model.SGDRegressor`,\n :class:`linear_model.PassiveAggressiveClassifier`,\n :class:`linear_model.PassiveAggressiveRegressor` and\n :class:`linear_model.Perceptron`. By `Tom Dupre la Tour`_.\n\nOther predictors\n\n- :class:`neighbors.LSHForest` has been deprecated and will be\n removed in 0.21 due to poor performance.\n :issue:`9078` by :user:`Laurent Direr `.\n\n- :class:`neighbors.NearestCentroid` no longer purports to support\n ``metric='precomputed'`` which now raises an error. 
:issue:`8515` by\n :user:`Sergul Aydore `.\n\n- The ``alpha`` parameter of :class:`semi_supervised.LabelPropagation` now\n has no effect and is deprecated to be removed in 0.21. :issue:`9239`\n by :user:`Andre Ambrosio Boechat `, :user:`Utkarsh Upadhyay\n `, and `Joel Nothman`_.\n\nDecomposition, manifold learning and clustering\n\n- Deprecate the ``doc_topic_distr`` argument of the ``perplexity`` method\n in :class:`decomposition.LatentDirichletAllocation` because the\n user no longer has access to the unnormalized document topic distribution\n needed for the perplexity calculation. :issue:`7954` by\n :user:`Gary Foreman `.\n\n- The ``n_topics`` parameter of :class:`decomposition.LatentDirichletAllocation`\n has been renamed to ``n_components`` and will be removed in version 0.21.\n :issue:`8922` by :user:`Attractadore`.\n\n- :meth:`decomposition.SparsePCA.transform`'s ``ridge_alpha`` parameter is\n deprecated in favor of the class parameter.\n :issue:`8137` by :user:`Naoya Kanai `.\n\n- :class:`cluster.DBSCAN` now has a ``metric_params`` parameter.\n :issue:`8139` by :user:`Naoya Kanai `.\n\nPreprocessing and feature selection\n\n- :class:`feature_selection.SelectFromModel` now has a ``partial_fit``\n method only if the underlying estimator does. By `Andreas M\u00fcller`_.\n\n- :class:`feature_selection.SelectFromModel` now validates the ``threshold``\n parameter and sets the ``threshold_`` attribute during the call to\n ``fit``, and no longer during the call to ``transform``. 
By `Andreas\n M\u00fcller`_.\n\n- The ``non_negative`` parameter in :class:`feature_extraction.FeatureHasher`\n has been deprecated, and replaced with a more principled alternative,\n ``alternate_sign``.\n :issue:`7565` by :user:`Roman Yurchak `.\n\n- :class:`linear_model.RandomizedLogisticRegression`,\n and :class:`linear_model.RandomizedLasso` have been deprecated and will\n be removed in version 0.21.\n :issue:`8995` by :user:`Ramana.S `.\n\nModel evaluation and meta-estimators\n\n- Deprecate the ``fit_params`` constructor input to the\n :class:`model_selection.GridSearchCV` and\n :class:`model_selection.RandomizedSearchCV` in favor\n of passing keyword parameters to the ``fit`` methods\n of those classes. Data-dependent parameters needed for model\n training should be passed as keyword arguments to ``fit``,\n and conforming to this convention will allow the hyperparameter\n selection classes to be used with tools such as\n :func:`model_selection.cross_val_predict`.\n :issue:`2879` by :user:`Stephen Hoover `.\n\n- In version 0.21, the default behavior of splitters that use the\n ``test_size`` and ``train_size`` parameter will change, such that\n specifying ``train_size`` alone will cause ``test_size`` to be the\n remainder. :issue:`7459` by :user:`Nelson Liu `.\n\n- :class:`multiclass.OneVsRestClassifier` now has ``partial_fit``,\n ``decision_function`` and ``predict_proba`` methods only when the\n underlying estimator does. :issue:`7812` by `Andreas M\u00fcller`_ and\n :user:`Mikhail Korobov `.\n\n- :class:`multiclass.OneVsRestClassifier` now has a ``partial_fit`` method\n only if the underlying estimator does. By `Andreas M\u00fcller`_.\n\n- The ``decision_function`` output shape for binary classification in\n :class:`multiclass.OneVsRestClassifier` and\n :class:`multiclass.OneVsOneClassifier` is now ``(n_samples,)`` to conform\n to scikit-learn conventions. 
:issue:`9100` by `Andreas M\u00fcller`_.\n\n- The :func:`multioutput.MultiOutputClassifier.predict_proba`\n function used to return a 3d array (``n_samples``, ``n_classes``,\n ``n_outputs``). In the case where different target columns had different\n numbers of classes, a ``ValueError`` would be raised on trying to stack\n matrices with different dimensions. This function now returns a list of\n arrays where the length of the list is ``n_outputs``, and each array is\n (``n_samples``, ``n_classes``) for that particular output.\n :issue:`8093` by :user:`Peter Bull `.\n\n- Replace attribute ``named_steps`` ``dict`` to :class:`utils.Bunch`\n in :class:`pipeline.Pipeline` to enable tab completion in interactive\n environment. In the case conflict value on ``named_steps`` and ``dict``\n attribute, ``dict`` behavior will be prioritized.\n :issue:`8481` by :user:`Herilalaina Rakotoarison `.\n\nMiscellaneous\n\n- Deprecate the ``y`` parameter in ``transform`` and ``inverse_transform``.\n The method should not accept ``y`` parameter, as it's used at the prediction time.\n :issue:`8174` by :user:`Tahar Zanouda `, `Alexandre Gramfort`_\n and `Raghav RV`_.\n\n- SciPy >= 0.13.3 and NumPy >= 1.8.2 are now the minimum supported versions\n for scikit-learn. The following backported functions in\n :mod:`utils` have been removed or deprecated accordingly.\n :issue:`8854` and :issue:`8874` by :user:`Naoya Kanai `\n\n- The ``store_covariances`` and ``covariances_`` parameters of\n :class:`discriminant_analysis.QuadraticDiscriminantAnalysis`\n has been renamed to ``store_covariance`` and ``covariance_`` to be\n consistent with the corresponding parameter names of the\n :class:`discriminant_analysis.LinearDiscriminantAnalysis`. They will be\n removed in version 0.21. 
:issue:`7998` by :user:`Jiacheng `\nRemoved in 0.19:\n- ``utils.fixes.argpartition``\n - ``utils.fixes.array_equal``\n - ``utils.fixes.astype``\n - ``utils.fixes.bincount``\n - ``utils.fixes.expit``\n - ``utils.fixes.frombuffer_empty``\n - ``utils.fixes.in1d``\n - ``utils.fixes.norm``\n - ``utils.fixes.rankdata``\n - ``utils.fixes.safe_copy``\nDeprecated in 0.19, to be removed in 0.21:\n- ``utils.arpack.eigs``\n - ``utils.arpack.eigsh``\n - ``utils.arpack.svds``\n - ``utils.extmath.fast_dot``\n - ``utils.extmath.logsumexp``\n - ``utils.extmath.norm``\n - ``utils.extmath.pinvh``\n - ``utils.graph.graph_laplacian``\n - ``utils.random.choice``\n - ``utils.sparsetools.connected_components``\n - ``utils.stats.rankdata``\n\n- Estimators with both methods ``decision_function`` and ``predict_proba``\n are now required to have a monotonic relation between them. The\n method ``check_decision_proba_consistency`` has been added in\n **utils.estimator_checks** to check their consistency.\n :issue:`7578` by :user:`Shubham Bhardwaj `\n\n- All checks in ``utils.estimator_checks``, in particular\n :func:`utils.estimator_checks.check_estimator` now accept estimator\n instances. Most other checks do not accept\n estimator classes any more. :issue:`9019` by `Andreas M\u00fcller`_.\n\n- Ensure that estimators' attributes ending with ``_`` are not set\n in the constructor but only in the ``fit`` method. 
Most notably,\n ensemble estimators (deriving from :class:`ensemble.BaseEnsemble`)\n now only have ``self.estimators_`` available after ``fit``.\n :issue:`7464` by `Lars Buitinck`_ and `Loic Esteve`_.\nCode and Documentation Contributors\n-----------------------------------\n\nThanks to everyone who has contributed to the maintenance and improvement of the\nproject since version 0.18, including:\n\nJoel Nothman, Loic Esteve, Andreas Mueller, Guillaume Lemaitre, Olivier Grisel,\nHanmin Qin, Raghav RV, Alexandre Gramfort, themrmax, Aman Dalmia, Gael\nVaroquaux, Naoya Kanai, Tom Dupr\u00e9 la Tour, Rishikesh, Nelson Liu, Taehoon Lee,\nNelle Varoquaux, Aashil, Mikhail Korobov, Sebastin Santy, Joan Massich, Roman\nYurchak, RAKOTOARISON Herilalaina, Thierry Guillemot, Alexandre Abadie, Carol\nWilling, Balakumaran Manoharan, Josh Karnofsky, Vlad Niculae, Utkarsh Upadhyay,\nDmitry Petrov, Minghui Liu, Srivatsan, Vincent Pham, Albert Thomas, Jake\nVanderPlas, Attractadore, JC Liu, alexandercbooth, chkoar, \u00d3scar N\u00e1jera,\nAarshay Jain, Kyle Gilliam, Ramana Subramanyam, CJ Carey, Clement Joudet, David\nRobles, He Chen, Joris Van den Bossche, Karan Desai, Katie Luangkote, Leland\nMcInnes, Maniteja Nandana, Michele Lacchia, Sergei Lebedev, Shubham Bhardwaj,\nakshay0724, omtcyfz, rickiepark, waterponey, Vathsala Achar, jbDelafosse, Ralf\nGommers, Ekaterina Krivich, Vivek Kumar, Ishank Gulati, Dave Elliott, ldirer,\nReiichiro Nakano, Levi John Wolf, Mathieu Blondel, Sid Kapur, Dougal J.\nSutherland, midinas, mikebenfield, Sourav Singh, Aseem Bansal, Ibraim Ganiev,\nStephen Hoover, AishwaryaRK, Steven C. Howell, Gary Foreman, Neeraj Gangwar,\nTahar, Jon Crall, dokato, Kathy Chen, ferria, Thomas Moreau, Charlie Brummitt,\nNicolas Goix, Adam Kleczewski, Sam Shleifer, Nikita Singh, Basil Beirouti,\nGiorgio Patrini, Manoj Kumar, Rafael Possas, James Bourbeau, James A. 
Bednar,\nJanine Harper, Jaye, Jean Helie, Jeremy Steward, Artsiom, John Wei, Jonathan\nLIgo, Jonathan Rahn, seanpwilliams, Arthur Mensch, Josh Levy, Julian Kuhlmann,\nJulien Aubert, J\u00f6rn Hees, Kai, shivamgargsya, Kat Hempstalk, Kaushik\nLakshmikanth, Kennedy, Kenneth Lyons, Kenneth Myers, Kevin Yap, Kirill Bobyrev,\nKonstantin Podshumok, Arthur Imbert, Lee Murray, toastedcornflakes, Lera, Li\nLi, Arthur Douillard, Mainak Jas, tobycheese, Manraj Singh, Manvendra Singh,\nMarc Meketon, MarcoFalke, Matthew Brett, Matthias Gilch, Mehul Ahuja, Melanie\nGoetz, Meng, Peng, Michael Dezube, Michal Baumgartner, vibrantabhi19, Artem\nGolubin, Milen Paskov, Antonin Carette, Morikko, MrMjauh, NALEPA Emmanuel,\nNamiya, Antoine Wendlinger, Narine Kokhlikyan, NarineK, Nate Guerin, Angus\nWilliams, Ang Lu, Nicole Vavrova, Nitish Pandey, Okhlopkov Daniil Olegovich,\nAndy Craze, Om Prakash, Parminder Singh, Patrick Carlson, Patrick Pei, Paul\nGanssle, Paulo Haddad, Pawe\u0142 Lorek, Peng Yu, Pete Bachant, Peter Bull, Peter\nCsizsek, Peter Wang, Pieter Arthur de Jong, Ping-Yao, Chang, Preston Parry,\nPuneet Mathur, Quentin Hibon, Andrew Smith, Andrew Jackson, 1kastner, Rameshwar\nBhaskaran, Rebecca Bilbro, Remi Rampin, Andrea Esuli, Rob Hall, Robert\nBradshaw, Romain Brault, Aman Pratik, Ruifeng Zheng, Russell Smith, Sachin\nAgarwal, Sailesh Choyal, Samson Tan, Samu\u00ebl Weber, Sarah Brown, Sebastian\nP\u00f6lsterl, Sebastian Raschka, Sebastian Saeger, Alyssa Batula, Abhyuday Pratap\nSingh, Sergey Feldman, Sergul Aydore, Sharan Yalburgi, willduan, Siddharth\nGupta, Sri Krishna, Almer, Stijn Tonk, Allen Riddell, Theofilos Papapanagiotou,\nAlison, Alexis Mignon, Tommy Boucher, Tommy L\u00f6fstedt, Toshihiro Kamishima,\nTyler Folkman, Tyler Lanigan, Alexander Junge, Varun Shenoy, Victor Poughon,\nVilhelm von Ehrenheim, Aleksandr Sandrovskii, Alan Yee, Vlasios Vasileiou,\nWarut Vijitbenjaronk, Yang Zhang, Yaroslav Halchenko, Yichuan Liu, Yuichi\nFujikawa, affanv14, aivision2020, 
xor, andreh7, brady salz, campustrampus,\nAgamemnon Krasoulis, ditenberg, elena-sharova, filipj8, fukatani, gedeck,\nguiniol, guoci, hakaa1, hongkahjun, i-am-xhy, jakirkham, jaroslaw-weber,\njayzed82, jeroko, jmontoyam, jonathan.striebel, josephsalmon, jschendel,\nleereeves, martin-hahn, mathurinm, mehak-sachdeva, mlewis1729, mlliou112,\nmthorrell, ndingwall, nuffe, yangarbiter, plagree, pldtc325, Breno Freitas,\nBrett Olsen, Brian A. Alfano, Brian Burns, polmauri, Brandon Carter, Charlton\nAustin, Chayant T15h, Chinmaya Pancholi, Christian Danielsen, Chung Yen,\nChyi-Kwei Yau, pravarmahajan, DOHMATOB Elvis, Daniel LeJeune, Daniel Hnyk,\nDarius Morawiec, David DeTomaso, David Gasquez, David Haberth\u00fcr, David\nHeryanto, David Kirkby, David Nicholson, rashchedrin, Deborah Gertrude Digges,\nDenis Engemann, Devansh D, Dickson, Bob Baxley, Don86, E. Lynch-Klarup, Ed\nRogers, Elizabeth Ferriss, Ellen-Co2, Fabian Egli, Fang-Chieh Chou, Bing Tian\nDai, Greg Stupp, Grzegorz Szpak, Bertrand Thirion, Hadrien Bertrand, Harizo\nRajaona, zxcvbnius, Henry Lin, Holger Peters, Icyblade Dai, Igor\nAndriushchenko, Ilya, Isaac Laughlin, Iv\u00e1n Vall\u00e9s, Aur\u00e9lien Bellet, JPFrancoia,\nJacob Schreiber, Asish Mahapatra\n\n.. _changes_0_18_2:\n\nVersion 0.18.2\n==============\n\n**June 20, 2017**\n\n.. topic:: Last release with Python 2.6 support\nScikit-learn 0.18 is the last major release of scikit-learn to support Python 2.6.\n Later versions of scikit-learn will require Python 2.7 or above.\nChangelog\n---------\n\n- Fixes for compatibility with NumPy 1.13.0: :issue:`7946` :issue:`8355` by\n `Loic Esteve`_.\n\n- Minor compatibility changes in the examples :issue:`9010` :issue:`8040`\n :issue:`9149`.\n\nCode Contributors\n-----------------\nAman Dalmia, Loic Esteve, Nate Guerin, Sergei Lebedev\n.. 
_changes_0_18_1:\n\nVersion 0.18.1\n==============\n\n**November 11, 2016**\n\nChangelog\n---------\n\nEnhancements\n............\n\n- Improved ``sample_without_replacement`` speed by utilizing\n numpy.random.permutation for most cases. As a result,\n samples may differ in this release for a fixed random state.\n Affected estimators:\n- :class:`ensemble.BaggingClassifier`\n - :class:`ensemble.BaggingRegressor`\n - :class:`linear_model.RANSACRegressor`\n - :class:`model_selection.RandomizedSearchCV`\n - :class:`random_projection.SparseRandomProjection`\nThis also affects the :meth:`datasets.make_classification`\n method.\n\nBug fixes\n.........\n\n- Fix issue where ``min_grad_norm`` and ``n_iter_without_progress``\n parameters were not being utilised by :class:`manifold.TSNE`.\n :issue:`6497` by :user:`Sebastian S\u00e4ger `\n\n- Fix bug for svm's decision values when ``decision_function_shape``\n is ``ovr`` in :class:`svm.SVC`.\n :class:`svm.SVC`'s decision_function was incorrect from versions\n 0.17.0 through 0.18.0.\n :issue:`7724` by `Bing Tian Dai`_\n\n- Attribute ``explained_variance_ratio`` of\n :class:`discriminant_analysis.LinearDiscriminantAnalysis` calculated\n with SVD and Eigen solver are now of the same length. :issue:`7632`\n by :user:`JPFrancoia `\n\n- Fixes issue in :ref:`univariate_feature_selection` where score\n functions were not accepting multi-label targets. :issue:`7676`\n by :user:`Mohammed Affan `\n\n- Fixed setting parameters when calling ``fit`` multiple times on\n :class:`feature_selection.SelectFromModel`. :issue:`7756` by `Andreas M\u00fcller`_\n\n- Fixes issue in ``partial_fit`` method of\n :class:`multiclass.OneVsRestClassifier` when number of classes used in\n ``partial_fit`` was less than the total number of classes in the\n data. 
:issue:`7786` by `Srivatsan Ramesh`_\n\n- Fixes issue in :class:`calibration.CalibratedClassifierCV` where\n the sum of probabilities of each class for a data was not 1, and\n ``CalibratedClassifierCV`` now handles the case where the training set\n has less number of classes than the total data. :issue:`7799` by\n `Srivatsan Ramesh`_\n\n- Fix a bug where :class:`sklearn.feature_selection.SelectFdr` did not\n exactly implement Benjamini-Hochberg procedure. It formerly may have\n selected fewer features than it should.\n :issue:`7490` by :user:`Peng Meng `.\n\n- :class:`sklearn.manifold.LocallyLinearEmbedding` now correctly handles\n integer inputs. :issue:`6282` by `Jake Vanderplas`_.\n\n- The ``min_weight_fraction_leaf`` parameter of tree-based classifiers and\n regressors now assumes uniform sample weights by default if the\n ``sample_weight`` argument is not passed to the ``fit`` function.\n Previously, the parameter was silently ignored. :issue:`7301`\n by :user:`Nelson Liu `.\n\n- Numerical issue with :class:`linear_model.RidgeCV` on centered data when\n `n_features > n_samples`. :issue:`6178` by `Bertrand Thirion`_\n\n- Tree splitting criterion classes' cloning/pickling is now memory safe\n :issue:`7680` by :user:`Ibraim Ganiev `.\n\n- Fixed a bug where :class:`decomposition.NMF` sets its ``n_iters_``\n attribute in `transform()`. :issue:`7553` by :user:`Ekaterina\n Krivich `.\n\n- :class:`sklearn.linear_model.LogisticRegressionCV` now correctly handles\n string labels. :issue:`5874` by `Raghav RV`_.\n\n- Fixed a bug where :func:`sklearn.model_selection.train_test_split` raised\n an error when ``stratify`` is a list of string labels. :issue:`7593` by\n `Raghav RV`_.\n\n- Fixed a bug where :class:`sklearn.model_selection.GridSearchCV` and\n :class:`sklearn.model_selection.RandomizedSearchCV` were not pickleable\n because of a pickling bug in ``np.ma.MaskedArray``. 
:issue:`7594` by\n `Raghav RV`_.\n\n- All cross-validation utilities in :mod:`sklearn.model_selection` now\n permit one time cross-validation splitters for the ``cv`` parameter. Also\n non-deterministic cross-validation splitters (where multiple calls to\n ``split`` produce dissimilar splits) can be used as ``cv`` parameter.\n The :class:`sklearn.model_selection.GridSearchCV` will cross-validate each\n parameter setting on the split produced by the first ``split`` call\n to the cross-validation splitter. :issue:`7660` by `Raghav RV`_.\n\n- Fix bug where :meth:`preprocessing.MultiLabelBinarizer.fit_transform`\n returned an invalid CSR matrix.\n :issue:`7750` by :user:`CJ Carey `.\n\n- Fixed a bug where :func:`metrics.pairwise.cosine_distances` could return a\n small negative distance. :issue:`7732` by :user:`Artsion `.\n\nAPI changes summary\n-------------------\n\nTrees and forests\n\n- The ``min_weight_fraction_leaf`` parameter of tree-based classifiers and\n regressors now assumes uniform sample weights by default if the\n ``sample_weight`` argument is not passed to the ``fit`` function.\n Previously, the parameter was silently ignored. :issue:`7301` by :user:`Nelson\n Liu `.\n\n- Tree splitting criterion classes' cloning/pickling is now memory safe.\n :issue:`7680` by :user:`Ibraim Ganiev `.\nLinear, kernelized and related models\n\n- Length of ``explained_variance_ratio`` of\n :class:`discriminant_analysis.LinearDiscriminantAnalysis`\n changed for both Eigen and SVD solvers. The attribute has now a length\n of min(n_components, n_classes - 1). :issue:`7632`\n by :user:`JPFrancoia `\n\n- Numerical issue with :class:`linear_model.RidgeCV` on centered data when\n ``n_features > n_samples``. :issue:`6178` by `Bertrand Thirion`_\n\n.. _changes_0_18:\n\nVersion 0.18\n============\n\n**September 28, 2016**\n\n.. 
topic:: Last release with Python 2.6 support\nScikit-learn 0.18 will be the last version of scikit-learn to support Python 2.6.\n Later versions of scikit-learn will require Python 2.7 or above.\n\n.. _model_selection_changes:\n\nModel Selection Enhancements and API Changes\n--------------------------------------------\n\n- **The model_selection module**\nThe new module :mod:`sklearn.model_selection`, which groups together the\n functionalities of formerly :mod:`sklearn.cross_validation`,\n :mod:`sklearn.grid_search` and :mod:`sklearn.learning_curve`, introduces new\n possibilities such as nested cross-validation and better manipulation of\n parameter searches with Pandas.\nMany things will stay the same but there are some key differences. Read\n below to know more about the changes.\n\n- **Data-independent CV splitters enabling nested cross-validation**\nThe new cross-validation splitters, defined in the\n :mod:`sklearn.model_selection`, are no longer initialized with any\n data-dependent parameters such as ``y``. Instead they expose a\n :func:`split` method that takes in the data and yields a generator for the\n different splits.\nThis change makes it possible to use the cross-validation splitters to\n perform nested cross-validation, facilitated by\n :class:`model_selection.GridSearchCV` and\n :class:`model_selection.RandomizedSearchCV` utilities.\n\n- **The enhanced cv_results_ attribute**\nThe new ``cv_results_`` attribute (of :class:`model_selection.GridSearchCV`\n and :class:`model_selection.RandomizedSearchCV`) introduced in lieu of the\n ``grid_scores_`` attribute is a dict of 1D arrays with elements in each\n array corresponding to the parameter settings (i.e. 
search candidates).\nThe ``cv_results_`` dict can be easily imported into ``pandas`` as a\n ``DataFrame`` for exploring the search results.\nThe ``cv_results_`` arrays include scores for each cross-validation split\n (with keys such as ``'split0_test_score'``), as well as their mean\n (``'mean_test_score'``) and standard deviation (``'std_test_score'``).\nThe ranks for the search candidates (based on their mean\n cross-validation score) is available at ``cv_results_['rank_test_score']``.\nThe parameter values for each parameter is stored separately as numpy\n masked object arrays. The value, for that search candidate, is masked if\n the corresponding parameter is not applicable. Additionally a list of all\n the parameter dicts are stored at ``cv_results_['params']``.\n\n- **Parameters n_folds and n_iter renamed to n_splits**\nSome parameter names have changed:\n The ``n_folds`` parameter in new :class:`model_selection.KFold`,\n :class:`model_selection.GroupKFold` (see below for the name change),\n and :class:`model_selection.StratifiedKFold` is now renamed to\n ``n_splits``. 
The ``n_iter`` parameter in\n :class:`model_selection.ShuffleSplit`, the new class\n :class:`model_selection.GroupShuffleSplit` and\n :class:`model_selection.StratifiedShuffleSplit` is now renamed to\n ``n_splits``.\n\n- **Rename of splitter classes which accepts group labels along with data**\nThe cross-validation splitters ``LabelKFold``,\n ``LabelShuffleSplit``, ``LeaveOneLabelOut`` and ``LeavePLabelOut`` have\n been renamed to :class:`model_selection.GroupKFold`,\n :class:`model_selection.GroupShuffleSplit`,\n :class:`model_selection.LeaveOneGroupOut` and\n :class:`model_selection.LeavePGroupsOut` respectively.\nNote the change from singular to plural form in\n :class:`model_selection.LeavePGroupsOut`.\n\n- **Fit parameter labels renamed to groups**\nThe ``labels`` parameter in the :func:`split` method of the newly renamed\n splitters :class:`model_selection.GroupKFold`,\n :class:`model_selection.LeaveOneGroupOut`,\n :class:`model_selection.LeavePGroupsOut`,\n :class:`model_selection.GroupShuffleSplit` is renamed to ``groups``\n following the new nomenclature of their class names.\n\n- **Parameter n_labels renamed to n_groups**\nThe parameter ``n_labels`` in the newly renamed\n :class:`model_selection.LeavePGroupsOut` is changed to ``n_groups``.\n\n- Training scores and Timing information\n``cv_results_`` also includes the training scores for each\n cross-validation split (with keys such as ``'split0_train_score'``), as\n well as their mean (``'mean_train_score'``) and standard deviation\n (``'std_train_score'``). 
To avoid the cost of evaluating training score,\n set ``return_train_score=False``.\nAdditionally the mean and standard deviation of the times taken to split,\n train and score the model across all the cross-validation splits is\n available at the key ``'mean_time'`` and ``'std_time'`` respectively.\n\nChangelog\n---------\n\nNew features\n............\n\nClassifiers and Regressors\n\n- The Gaussian Process module has been reimplemented and now offers classification\n and regression estimators through :class:`gaussian_process.GaussianProcessClassifier`\n and :class:`gaussian_process.GaussianProcessRegressor`. Among other things, the new\n implementation supports kernel engineering, gradient-based hyperparameter optimization or\n sampling of functions from GP prior and GP posterior. Extensive documentation and\n examples are provided. By `Jan Hendrik Metzen`_.\n\n- Added new supervised learning algorithm: :ref:`Multi-layer Perceptron `\n :issue:`3204` by :user:`Issam H. Laradji `\n\n- Added :class:`linear_model.HuberRegressor`, a linear model robust to outliers.\n :issue:`5291` by `Manoj Kumar`_.\n\n- Added the :class:`multioutput.MultiOutputRegressor` meta-estimator. It\n converts single output regressors to multi-output regressors by fitting\n one regressor per output. By :user:`Tim Head `.\n\nOther estimators\n\n- New :class:`mixture.GaussianMixture` and :class:`mixture.BayesianGaussianMixture`\n replace former mixture models, employing faster inference\n for sounder results. :issue:`7295` by :user:`Wei Xue ` and\n :user:`Thierry Guillemot `.\n\n- Class :class:`decomposition.RandomizedPCA` is now factored into :class:`decomposition.PCA`\n and it is available calling with parameter ``svd_solver='randomized'``.\n The default number of ``n_iter`` for ``'randomized'`` has changed to 4. The old\n behavior of PCA is recovered by ``svd_solver='full'``. An additional solver\n calls ``arpack`` and performs truncated (non-randomized) SVD. 
By default,\n the best solver is selected depending on the size of the input and the\n number of components requested. :issue:`5299` by :user:`Giorgio Patrini `.\n\n- Added two functions for mutual information estimation:\n :func:`feature_selection.mutual_info_classif` and\n :func:`feature_selection.mutual_info_regression`. These functions can be\n used in :class:`feature_selection.SelectKBest` and\n :class:`feature_selection.SelectPercentile` as score functions.\n By :user:`Andrea Bravi ` and :user:`Nikolay Mayorov `.\n\n- Added the :class:`ensemble.IsolationForest` class for anomaly detection based on\n random forests. By `Nicolas Goix`_.\n\n- Added ``algorithm=\"elkan\"`` to :class:`cluster.KMeans` implementing\n Elkan's fast K-Means algorithm. By `Andreas M\u00fcller`_.\n\nModel selection and evaluation\n\n- Added :func:`metrics.cluster.fowlkes_mallows_score`, the Fowlkes Mallows\n Index which measures the similarity of two clusterings of a set of points\n By :user:`Arnaud Fouchet ` and :user:`Thierry Guillemot `.\n\n- Added :func:`metrics.calinski_harabaz_score`, which computes the Calinski\n and Harabaz score to evaluate the resulting clustering of a set of points.\n By :user:`Arnaud Fouchet ` and :user:`Thierry Guillemot `.\n\n- Added new cross-validation splitter\n :class:`model_selection.TimeSeriesSplit` to handle time series data.\n :issue:`6586` by :user:`YenChen Lin `\n\n- The cross-validation iterators are replaced by cross-validation splitters\n available from :mod:`sklearn.model_selection`, allowing for nested\n cross-validation. See :ref:`model_selection_changes` for more information.\n :issue:`4294` by `Raghav RV`_.\n\nEnhancements\n............\n\nTrees and ensembles\n\n- Added a new splitting criterion for :class:`tree.DecisionTreeRegressor`,\n the mean absolute error. This criterion can also be used in\n :class:`ensemble.ExtraTreesRegressor`,\n :class:`ensemble.RandomForestRegressor`, and the gradient boosting\n estimators. 
:issue:`6667` by :user:`Nelson Liu `.\n\n- Added weighted impurity-based early stopping criterion for decision tree\n growth. :issue:`6954` by :user:`Nelson Liu `\n\n- The random forest, extra tree and decision tree estimators now have a\n method ``decision_path`` which returns the decision path of samples in\n the tree. By `Arnaud Joly`_.\n\n- A new example has been added unveiling the decision tree structure.\n By `Arnaud Joly`_.\n\n- Random forest, extra trees, decision trees and gradient boosting estimators\n accept the parameters ``min_samples_split`` and ``min_samples_leaf``\n provided as a percentage of the training samples. By :user:`yelite ` and `Arnaud Joly`_.\n\n- Gradient boosting estimators accept the parameter ``criterion`` to specify\n the splitting criterion used in the built decision trees.\n :issue:`6667` by :user:`Nelson Liu `.\n\n- The memory footprint is reduced (sometimes greatly) for\n :class:`ensemble.bagging.BaseBagging` and classes that inherit from it,\n i.e., :class:`ensemble.BaggingClassifier`,\n :class:`ensemble.BaggingRegressor`, and :class:`ensemble.IsolationForest`,\n by dynamically generating attribute ``estimators_samples_`` only when it is\n needed. By :user:`David Staub `.\n\n- Added ``n_jobs`` and ``sample_weight`` parameters for\n :class:`ensemble.VotingClassifier` to fit underlying estimators in parallel.\n :issue:`5805` by :user:`Ibraim Ganiev `.\n\nLinear, kernelized and related models\n\n- In :class:`linear_model.LogisticRegression`, the SAG solver is now\n available in the multinomial case. :issue:`5251` by `Tom Dupre la Tour`_.\n\n- :class:`linear_model.RANSACRegressor`, :class:`svm.LinearSVC` and\n :class:`svm.LinearSVR` now support ``sample_weight``.\n By :user:`Imaculate `.\n\n- Add parameter ``loss`` to :class:`linear_model.RANSACRegressor` to measure the\n error on the samples for every trial. 
By `Manoj Kumar`_.\n\n- Prediction of out-of-sample events with Isotonic Regression\n (:class:`isotonic.IsotonicRegression`) is now much faster (over 1000x in tests with synthetic\n data). By :user:`Jonathan Arfa `.\n\n- Isotonic regression (:class:`isotonic.IsotonicRegression`) now uses a better algorithm to avoid\n `O(n^2)` behavior in pathological cases, and is also generally faster\n (:issue:`#6691`). By `Antony Lee`_.\n\n- :class:`naive_bayes.GaussianNB` now accepts data-independent class-priors\n through the parameter ``priors``. By :user:`Guillaume Lemaitre `.\n\n- :class:`linear_model.ElasticNet` and :class:`linear_model.Lasso`\n now works with ``np.float32`` input data without converting it\n into ``np.float64``. This allows to reduce the memory\n consumption. :issue:`6913` by :user:`YenChen Lin `.\n\n- :class:`semi_supervised.LabelPropagation` and :class:`semi_supervised.LabelSpreading`\n now accept arbitrary kernel functions in addition to strings ``knn`` and ``rbf``.\n :issue:`5762` by :user:`Utkarsh Upadhyay `.\n\nDecomposition, manifold learning and clustering\n\n- Added ``inverse_transform`` function to :class:`decomposition.NMF` to compute\n data matrix of original shape. By :user:`Anish Shah `.\n\n- :class:`cluster.KMeans` and :class:`cluster.MiniBatchKMeans` now works\n with ``np.float32`` and ``np.float64`` input data without converting it.\n This allows to reduce the memory consumption by using ``np.float32``.\n :issue:`6846` by :user:`Sebastian S\u00e4ger ` and\n :user:`YenChen Lin `.\n\nPreprocessing and feature selection\n\n- :class:`preprocessing.RobustScaler` now accepts ``quantile_range`` parameter.\n :issue:`5929` by :user:`Konstantin Podshumok `.\n\n- :class:`feature_extraction.FeatureHasher` now accepts string values.\n :issue:`6173` by :user:`Ryad Zenine ` and\n :user:`Devashish Deshpande `.\n\n- Keyword arguments can now be supplied to ``func`` in\n :class:`preprocessing.FunctionTransformer` by means of the ``kw_args``\n parameter. 
By `Brian McFee`_.\n\n- :class:`feature_selection.SelectKBest` and :class:`feature_selection.SelectPercentile`\n now accept score functions that take X, y as input and return only the scores.\n By :user:`Nikolay Mayorov `.\n\nModel evaluation and meta-estimators\n\n- :class:`multiclass.OneVsOneClassifier` and :class:`multiclass.OneVsRestClassifier`\n now support ``partial_fit``. By :user:`Asish Panda ` and\n :user:`Philipp Dowling `.\n\n- Added support for substituting or disabling :class:`pipeline.Pipeline`\n and :class:`pipeline.FeatureUnion` components using the ``set_params``\n interface that powers :mod:`sklearn.grid_search`.\n See :ref:`sphx_glr_auto_examples_plot_compare_reduction.py`\n By `Joel Nothman`_ and :user:`Robert McGibbon `.\n\n- The new ``cv_results_`` attribute of :class:`model_selection.GridSearchCV`\n (and :class:`model_selection.RandomizedSearchCV`) can be easily imported\n into pandas as a ``DataFrame``. Ref :ref:`model_selection_changes` for\n more information. :issue:`6697` by `Raghav RV`_.\n\n- Generalization of :func:`model_selection.cross_val_predict`.\n One can pass method names such as `predict_proba` to be used in the cross\n validation framework instead of the default `predict`.\n By :user:`Ori Ziv ` and :user:`Sears Merritt `.\n\n- The training scores and time taken for training followed by scoring for\n each search candidate are now available at the ``cv_results_`` dict.\n See :ref:`model_selection_changes` for more information.\n :issue:`7325` by :user:`Eugene Chen ` and `Raghav RV`_.\n\nMetrics\n\n- Added ``labels`` flag to :class:`metrics.log_loss` to explicitly provide\n the labels when the number of classes in ``y_true`` and ``y_pred`` differ.\n :issue:`7239` by :user:`Hong Guangguo ` with help from\n :user:`Mads Jensen ` and :user:`Nelson Liu `.\n\n- Support sparse contingency matrices in cluster evaluation\n (:mod:`metrics.cluster.supervised`) to scale to a large number of\n clusters.\n :issue:`7419` by :user:`Gregory Stupp 
` and `Joel Nothman`_.\n\n- Add ``sample_weight`` parameter to :func:`metrics.matthews_corrcoef`.\n By :user:`Jatin Shah ` and `Raghav RV`_.\n\n- Speed up :func:`metrics.silhouette_score` by using vectorized operations.\n By `Manoj Kumar`_.\n\n- Add ``sample_weight`` parameter to :func:`metrics.confusion_matrix`.\n By :user:`Bernardo Stein `.\n\nMiscellaneous\n\n- Added ``n_jobs`` parameter to :class:`feature_selection.RFECV` to compute\n the score on the test folds in parallel. By `Manoj Kumar`_\n\n- Codebase does not contain C/C++ cython generated files: they are\n generated during build. Distribution packages will still contain generated\n C/C++ files. By :user:`Arthur Mensch `.\n\n- Reduce the memory usage for 32-bit float input arrays of\n :func:`utils.sparse_func.mean_variance_axis` and\n :func:`utils.sparse_func.incr_mean_variance_axis` by supporting cython\n fused types. By :user:`YenChen Lin `.\n\n- The :func:`ignore_warnings` now accept a category argument to ignore only\n the warnings of a specified type. By :user:`Thierry Guillemot `.\n\n- Added parameter ``return_X_y`` and return type ``(data, target) : tuple`` option to\n :func:`load_iris` dataset\n :issue:`7049`,\n :func:`load_breast_cancer` dataset\n :issue:`7152`,\n :func:`load_digits` dataset,\n :func:`load_diabetes` dataset,\n :func:`load_linnerud` dataset,\n :func:`load_boston` dataset\n :issue:`7154` by\n :user:`Manvendra Singh `.\n\n- Simplification of the ``clone`` function, deprecate support for estimators\n that modify parameters in ``__init__``. :issue:`5540` by `Andreas M\u00fcller`_.\n\n- When unpickling a scikit-learn estimator in a different version than the one\n the estimator was trained with, a ``UserWarning`` is raised, see :ref:`the documentation\n on model persistence ` for more details. 
(:issue:`7248`)\n By `Andreas M\u00fcller`_.\n\nBug fixes\n.........\n\nTrees and ensembles\n\n- Random forest, extra trees, decision trees and gradient boosting\n won't accept anymore ``min_samples_split=1`` as at least 2 samples\n are required to split a decision tree node. By `Arnaud Joly`_\n\n- :class:`ensemble.VotingClassifier` now raises ``NotFittedError`` if ``predict``,\n ``transform`` or ``predict_proba`` are called on the non-fitted estimator.\n by `Sebastian Raschka`_.\n\n- Fix bug where :class:`ensemble.AdaBoostClassifier` and\n :class:`ensemble.AdaBoostRegressor` would perform poorly if the\n ``random_state`` was fixed\n (:issue:`7411`). By `Joel Nothman`_.\n\n- Fix bug in ensembles with randomization where the ensemble would not\n set ``random_state`` on base estimators in a pipeline or similar nesting.\n (:issue:`7411`). Note, results for :class:`ensemble.BaggingClassifier`\n :class:`ensemble.BaggingRegressor`, :class:`ensemble.AdaBoostClassifier`\n and :class:`ensemble.AdaBoostRegressor` will now differ from previous\n versions. By `Joel Nothman`_.\n\nLinear, kernelized and related models\n\n- Fixed incorrect gradient computation for ``loss='squared_epsilon_insensitive'`` in\n :class:`linear_model.SGDClassifier` and :class:`linear_model.SGDRegressor`\n (:issue:`6764`). By :user:`Wenhua Yang `.\n\n- Fix bug in :class:`linear_model.LogisticRegressionCV` where\n ``solver='liblinear'`` did not accept ``class_weights='balanced``.\n (:issue:`6817`). By `Tom Dupre la Tour`_.\n\n- Fix bug in :class:`neighbors.RadiusNeighborsClassifier` where an error\n occurred when there were outliers being labelled and a weight function\n specified (:issue:`6902`). 
By\n `LeonieBorne `_.\n\n- Fix :class:`linear_model.ElasticNet` sparse decision function to match\n output with dense in the multioutput case.\n\nDecomposition, manifold learning and clustering\n\n- :class:`decomposition.RandomizedPCA` default number of `iterated_power` is 4 instead of 3.\n :issue:`5141` by :user:`Giorgio Patrini `.\n\n- :func:`utils.extmath.randomized_svd` performs 4 power iterations by default, instead of 0.\n In practice this is enough for obtaining a good approximation of the\n true eigenvalues/vectors in the presence of noise. When `n_components` is\n small (``< .1 * min(X.shape)``) `n_iter` is set to 7, unless the user specifies\n a higher number. This improves precision with few components.\n :issue:`5299` by :user:`Giorgio Patrini `.\n\n- Whiten/non-whiten inconsistency between components of :class:`decomposition.PCA`\n and :class:`decomposition.RandomizedPCA` (now factored into PCA, see the\n New features) is fixed. `components_` are stored with no whitening.\n :issue:`5299` by :user:`Giorgio Patrini `.\n\n- Fixed bug in :func:`manifold.spectral_embedding` where diagonal of unnormalized\n Laplacian matrix was incorrectly set to 1. :issue:`4995` by :user:`Peter Fischer `.\n\n- Fixed incorrect initialization of :func:`utils.arpack.eigsh` on all\n occurrences. Affects :class:`cluster.bicluster.SpectralBiclustering`,\n :class:`decomposition.KernelPCA`, :class:`manifold.LocallyLinearEmbedding`,\n and :class:`manifold.SpectralEmbedding` (:issue:`5012`). By\n :user:`Peter Fischer `.\n\n- Attribute ``explained_variance_ratio_`` calculated with the SVD solver\n of :class:`discriminant_analysis.LinearDiscriminantAnalysis` now returns\n correct results. By :user:`JPFrancoia `\n\nPreprocessing and feature selection\n\n- :func:`preprocessing.data._transform_selected` now always passes a copy\n of ``X`` to transform function when ``copy=True`` (:issue:`7194`). 
By `Caio\n Oliveira `_.\n\nModel evaluation and meta-estimators\n\n- :class:`model_selection.StratifiedKFold` now raises error if all n_labels\n for individual classes is less than n_folds.\n :issue:`6182` by :user:`Devashish Deshpande `.\n\n- Fixed bug in :class:`model_selection.StratifiedShuffleSplit`\n where train and test sample could overlap in some edge cases,\n see :issue:`6121` for\n more details. By `Loic Esteve`_.\n\n- Fix in :class:`sklearn.model_selection.StratifiedShuffleSplit` to\n return splits of size ``train_size`` and ``test_size`` in all cases\n (:issue:`6472`). By `Andreas M\u00fcller`_.\n\n- Cross-validation of :class:`OneVsOneClassifier` and\n :class:`OneVsRestClassifier` now works with precomputed kernels.\n :issue:`7350` by :user:`Russell Smith `.\n\n- Fix incomplete ``predict_proba`` method delegation from\n :class:`model_selection.GridSearchCV` to\n :class:`linear_model.SGDClassifier` (:issue:`7159`)\n by `Yichuan Liu `_.\n\nMetrics\n\n- Fix bug in :func:`metrics.silhouette_score` in which clusters of\n size 1 were incorrectly scored. They should get a score of 0.\n By `Joel Nothman`_.\n\n- Fix bug in :func:`metrics.silhouette_samples` so that it now works with\n arbitrary labels, not just those ranging from 0 to n_clusters - 1.\n\n- Fix bug where expected and adjusted mutual information were incorrect if\n cluster contingency cells exceeded ``2**16``. By `Joel Nothman`_.\n\n- :func:`metrics.pairwise.pairwise_distances` now converts arrays to\n boolean arrays when required in ``scipy.spatial.distance``.\n :issue:`5460` by `Tom Dupre la Tour`_.\n\n- Fix sparse input support in :func:`metrics.silhouette_score` as well as\n example examples/text/document_clustering.py. 
By :user:`YenChen Lin `.\n\n- :func:`metrics.roc_curve` and :func:`metrics.precision_recall_curve` no\n longer round ``y_score`` values when creating ROC curves; this was causing\n problems for users with very small differences in scores (:issue:`7353`).\n\nMiscellaneous\n\n- :func:`model_selection.tests._search._check_param_grid` now works correctly with all types\n that extends/implements `Sequence` (except string), including range (Python 3.x) and xrange\n (Python 2.x). :issue:`7323` by Viacheslav Kovalevskyi.\n\n- :func:`utils.extmath.randomized_range_finder` is more numerically stable when many\n power iterations are requested, since it applies LU normalization by default.\n If ``n_iter<2`` numerical issues are unlikely, thus no normalization is applied.\n Other normalization options are available: ``'none', 'LU'`` and ``'QR'``.\n :issue:`5141` by :user:`Giorgio Patrini `.\n\n- Fix a bug where some formats of ``scipy.sparse`` matrix, and estimators\n with them as parameters, could not be passed to :func:`base.clone`.\n By `Loic Esteve`_.\n\n- :func:`datasets.load_svmlight_file` now is able to read long int QID values.\n :issue:`7101` by :user:`Ibraim Ganiev `.\nAPI changes summary\n-------------------\n\nLinear, kernelized and related models\n\n- ``residual_metric`` has been deprecated in :class:`linear_model.RANSACRegressor`.\n Use ``loss`` instead. By `Manoj Kumar`_.\n\n- Access to public attributes ``.X_`` and ``.y_`` has been deprecated in\n :class:`isotonic.IsotonicRegression`. 
By :user:`Jonathan Arfa `.\n\nDecomposition, manifold learning and clustering\n\n- The old :class:`mixture.DPGMM` is deprecated in favor of the new\n :class:`mixture.BayesianGaussianMixture` (with the parameter\n ``weight_concentration_prior_type='dirichlet_process'``).\n The new class solves the computational\n problems of the old class and computes the Gaussian mixture with a\n Dirichlet process prior faster than before.\n :issue:`7295` by :user:`Wei Xue ` and :user:`Thierry Guillemot `.\n\n- The old :class:`mixture.VBGMM` is deprecated in favor of the new\n :class:`mixture.BayesianGaussianMixture` (with the parameter\n ``weight_concentration_prior_type='dirichlet_distribution'``).\n The new class solves the computational\n problems of the old class and computes the Variational Bayesian Gaussian\n mixture faster than before.\n :issue:`6651` by :user:`Wei Xue ` and :user:`Thierry Guillemot `.\n\n- The old :class:`mixture.GMM` is deprecated in favor of the new\n :class:`mixture.GaussianMixture`. The new class computes the Gaussian mixture\n faster than before and some of computational problems have been solved.\n :issue:`6666` by :user:`Wei Xue ` and :user:`Thierry Guillemot `.\n\nModel evaluation and meta-estimators\n\n- The :mod:`sklearn.cross_validation`, :mod:`sklearn.grid_search` and\n :mod:`sklearn.learning_curve` have been deprecated and the classes and\n functions have been reorganized into the :mod:`sklearn.model_selection`\n module. 
Ref :ref:`model_selection_changes` for more information.\n :issue:`4294` by `Raghav RV`_.\n\n- The ``grid_scores_`` attribute of :class:`model_selection.GridSearchCV`\n and :class:`model_selection.RandomizedSearchCV` is deprecated in favor of\n the attribute ``cv_results_``.\n Ref :ref:`model_selection_changes` for more information.\n :issue:`6697` by `Raghav RV`_.\n\n- The parameters ``n_iter`` or ``n_folds`` in old CV splitters are replaced\n by the new parameter ``n_splits`` since it can provide a consistent\n and unambiguous interface to represent the number of train-test splits.\n :issue:`7187` by :user:`YenChen Lin `.\n\n- ``classes`` parameter was renamed to ``labels`` in\n :func:`metrics.hamming_loss`. :issue:`7260` by :user:`Sebasti\u00e1n Vanrell `.\n\n- The splitter classes ``LabelKFold``, ``LabelShuffleSplit``,\n ``LeaveOneLabelOut`` and ``LeavePLabelsOut`` are renamed to\n :class:`model_selection.GroupKFold`,\n :class:`model_selection.GroupShuffleSplit`,\n :class:`model_selection.LeaveOneGroupOut`\n and :class:`model_selection.LeavePGroupsOut` respectively.\n Also the parameter ``labels`` in the :func:`split` method of the newly\n renamed splitters :class:`model_selection.LeaveOneGroupOut` and\n :class:`model_selection.LeavePGroupsOut` is renamed to\n ``groups``. Additionally in :class:`model_selection.LeavePGroupsOut`,\n the parameter ``n_labels`` is renamed to ``n_groups``.\n :issue:`6660` by `Raghav RV`_.\n\n- Error and loss names for ``scoring`` parameters are now prefixed by\n ``'neg_'``, such as ``neg_mean_squared_error``. 
The unprefixed versions\n are deprecated and will be removed in version 0.20.\n :issue:`7261` by :user:`Tim Head `.\n\nCode Contributors\n-----------------\nAditya Joshi, Alejandro, Alexander Fabisch, Alexander Loginov, Alexander\nMinyushkin, Alexander Rudy, Alexandre Abadie, Alexandre Abraham, Alexandre\nGramfort, Alexandre Saint, alexfields, Alvaro Ulloa, alyssaq, Amlan Kar,\nAndreas Mueller, andrew giessel, Andrew Jackson, Andrew McCulloh, Andrew\nMurray, Anish Shah, Arafat, Archit Sharma, Ariel Rokem, Arnaud Joly, Arnaud\nRachez, Arthur Mensch, Ash Hoover, asnt, b0noI, Behzad Tabibian, Bernardo,\nBernhard Kratzwald, Bhargav Mangipudi, blakeflei, Boyuan Deng, Brandon Carter,\nBrett Naul, Brian McFee, Caio Oliveira, Camilo Lamus, Carol Willing, Cass,\nCeShine Lee, Charles Truong, Chyi-Kwei Yau, CJ Carey, codevig, Colin Ni, Dan\nShiebler, Daniel, Daniel Hnyk, David Ellis, David Nicholson, David Staub, David\nThaler, David Warshaw, Davide Lasagna, Deborah, definitelyuncertain, Didi\nBar-Zev, djipey, dsquareindia, edwinENSAE, Elias Kuthe, Elvis DOHMATOB, Ethan\nWhite, Fabian Pedregosa, Fabio Ticconi, fisache, Florian Wilhelm, Francis,\nFrancis O'Donovan, Gael Varoquaux, Ganiev Ibraim, ghg, Gilles Louppe, Giorgio\nPatrini, Giovanni Cherubin, Giovanni Lanzani, Glenn Qian, Gordon\nMohr, govin-vatsan, Graham Clenaghan, Greg Reda, Greg Stupp, Guillaume\nLemaitre, Gustav M\u00f6rtberg, halwai, Harizo Rajaona, Harry Mavroforakis,\nhashcode55, hdmetor, Henry Lin, Hobson Lane, Hugo Bowne-Anderson,\nIgor Andriushchenko, Imaculate, Inki Hwang, Isaac Sijaranamual,\nIshank Gulati, Issam Laradji, Iver Jordal, jackmartin, Jacob Schreiber, Jake\nVanderplas, James Fiedler, James Routley, Jan Zikes, Janna Brettingen, jarfa, Jason\nLaska, jblackburne, jeff levesque, Jeffrey Blackburne, Jeffrey04, Jeremy Hintz,\njeremynixon, Jeroen, Jessica Yung, Jill-J\u00eann Vie, Jimmy Jia, Jiyuan Qian, Joel\nNothman, johannah, John, John Boersma, John Kirkham, John Moeller,\njonathan.striebel, 
joncrall, Jordi, Joseph Munoz, Joshua Cook, JPFrancoia,\njrfiedler, JulianKahnert, juliathebrave, kaichogami, KamalakerDadi, Kenneth\nLyons, Kevin Wang, kingjr, kjell, Konstantin Podshumok, Kornel Kielczewski,\nKrishna Kalyan, krishnakalyan3, Kvle Putnam, Kyle Jackson, Lars Buitinck,\nldavid, LeiG, LeightonZhang, Leland McInnes, Liang-Chi Hsieh, Lilian Besson,\nlizsz, Loic Esteve, Louis Tiao, L\u00e9onie Borne, Mads Jensen, Maniteja Nandana,\nManoj Kumar, Manvendra Singh, Marco, Mario Krell, Mark Bao, Mark Szepieniec,\nMartin Madsen, MartinBpr, MaryanMorel, Massil, Matheus, Mathieu Blondel,\nMathieu Dubois, Matteo, Matthias Ekman, Max Moroz, Michael Scherer, michiaki\nariga, Mikhail Korobov, Moussa Taifi, mrandrewandrade, Mridul Seth, nadya-p,\nNaoya Kanai, Nate George, Nelle Varoquaux, Nelson Liu, Nick James,\nNickleDave, Nico, Nicolas Goix, Nikolay Mayorov, ningchi, nlathia,\nokbalefthanded, Okhlopkov, Olivier Grisel, Panos Louridas, Paul Strickland,\nPerrine Letellier, pestrickland, Peter Fischer, Pieter, Ping-Yao, Chang,\npracticalswift, Preston Parry, Qimu Zheng, Rachit Kansal, Raghav RV,\nRalf Gommers, Ramana.S, Rammig, Randy Olson, Rob Alexander, Robert Lutz,\nRobin Schucker, Rohan Jain, Ruifeng Zheng, Ryan Yu, R\u00e9my L\u00e9one, saihttam,\nSaiwing Yeung, Sam Shleifer, Samuel St-Jean, Sartaj Singh, Sasank Chilamkurthy,\nsaurabh.bansod, Scott Andrews, Scott Lowe, seales, Sebastian Raschka, Sebastian\nSaeger, Sebasti\u00e1n Vanrell, Sergei Lebedev, shagun Sodhani, shanmuga cv,\nShashank Shekhar, shawpan, shengxiduan, Shota, shuckle16, Skipper Seabold,\nsklearn-ci, SmedbergM, srvanrell, S\u00e9bastien Lerique, Taranjeet, themrmax,\nThierry, Thierry Guillemot, Thomas, Thomas Hallock, Thomas Moreau, Tim Head,\ntKammy, toastedcornflakes, Tom, TomDLT, Toshihiro Kamishima, tracer0tong, Trent\nHauck, trevorstephens, Tue Vo, Varun, Varun Jewalikar, Viacheslav, Vighnesh\nBirodkar, Vikram, Villu Ruusmann, Vinayak Mehta, walter, waterponey, Wenhua\nYang, Wenjian 
Huang, Will Welch, wyseguy7, xyguo, yanlend, Yaroslav Halchenko,\nyelite, Yen, YenChenLin, Yichuan Liu, Yoav Ram, Yoshiki, Zheng RuiFeng, zivori, \u00d3scar N\u00e1jera\n\n.. currentmodule:: sklearn\n\n.. _changes_0_17_1:\n\nVersion 0.17.1\n==============\n\n**February 18, 2016**\n\nChangelog\n---------\n\nBug fixes\n.........\n- Upgrade vendored joblib to version 0.9.4 that fixes an important bug in\n ``joblib.Parallel`` that can silently yield to wrong results when working\n on datasets larger than 1MB:\n https://github.com/joblib/joblib/blob/0.9.4/CHANGES.rst\n\n- Fixed reading of Bunch pickles generated with scikit-learn\n version <= 0.16. This can affect users who have already\n downloaded a dataset with scikit-learn 0.16 and are loading it\n with scikit-learn 0.17. See :issue:`6196` for\n how this affected :func:`datasets.fetch_20newsgroups`. By `Loic\n Esteve`_.\n\n- Fixed a bug that prevented using ROC AUC score to perform grid search on\n several CPU / cores on large arrays. See :issue:`6147`\n By `Olivier Grisel`_.\n\n- Fixed a bug that prevented to properly set the ``presort`` parameter\n in :class:`ensemble.GradientBoostingRegressor`. See :issue:`5857`\n By Andrew McCulloh.\n\n- Fixed a joblib error when evaluating the perplexity of a\n :class:`decomposition.LatentDirichletAllocation` model. See :issue:`6258`\n By Chyi-Kwei Yau.\n.. _changes_0_17:\n\nVersion 0.17\n============\n\n**November 5, 2015**\n\nChangelog\n---------\n\nNew features\n............\n\n- All the Scaler classes but :class:`preprocessing.RobustScaler` can be fitted online by\n calling `partial_fit`. By :user:`Giorgio Patrini `.\n\n- The new class :class:`ensemble.VotingClassifier` implements a\n \"majority rule\" / \"soft voting\" ensemble classifier to combine\n estimators for classification. 
By `Sebastian Raschka`_.\n\n- The new class :class:`preprocessing.RobustScaler` provides an\n alternative to :class:`preprocessing.StandardScaler` for feature-wise\n centering and range normalization that is robust to outliers.\n By :user:`Thomas Unterthiner `.\n\n- The new class :class:`preprocessing.MaxAbsScaler` provides an\n alternative to :class:`preprocessing.MinMaxScaler` for feature-wise\n range normalization when the data is already centered or sparse.\n By :user:`Thomas Unterthiner `.\n\n- The new class :class:`preprocessing.FunctionTransformer` turns a Python\n function into a ``Pipeline``-compatible transformer object.\n By Joe Jevnik.\n\n- The new classes :class:`cross_validation.LabelKFold` and\n :class:`cross_validation.LabelShuffleSplit` generate train-test folds,\n respectively similar to :class:`cross_validation.KFold` and\n :class:`cross_validation.ShuffleSplit`, except that the folds are\n conditioned on a label array. By `Brian McFee`_, :user:`Jean\n Kossaifi ` and `Gilles Louppe`_.\n\n- :class:`decomposition.LatentDirichletAllocation` implements the Latent\n Dirichlet Allocation topic model with online variational\n inference. By :user:`Chyi-Kwei Yau `, with code based on an implementation\n by Matt Hoffman. (:issue:`3659`)\n\n- The new solver ``sag`` implements a Stochastic Average Gradient descent\n and is available in both :class:`linear_model.LogisticRegression` and\n :class:`linear_model.Ridge`. This solver is very efficient for large\n datasets. By :user:`Danny Sullivan ` and `Tom Dupre la Tour`_.\n (:issue:`4738`)\n\n- The new solver ``cd`` implements a Coordinate Descent in\n :class:`decomposition.NMF`. Previous solver based on Projected Gradient is\n still available setting new parameter ``solver`` to ``pg``, but is\n deprecated and will be removed in 0.19, along with\n :class:`decomposition.ProjectedGradientNMF` and parameters ``sparseness``,\n ``eta``, ``beta`` and ``nls_max_iter``. 
New parameters ``alpha`` and\n ``l1_ratio`` control L1 and L2 regularization, and ``shuffle`` adds a\n shuffling step in the ``cd`` solver.\n By `Tom Dupre la Tour`_ and `Mathieu Blondel`_.\n\nEnhancements\n............\n- :class:`manifold.TSNE` now supports approximate optimization via the\n Barnes-Hut method, leading to much faster fitting. By Christopher Erick Moody.\n (:issue:`4025`)\n\n- :class:`cluster.mean_shift_.MeanShift` now supports parallel execution,\n as implemented in the ``mean_shift`` function. By :user:`Martino\n Sorbaro `.\n\n- :class:`naive_bayes.GaussianNB` now supports fitting with ``sample_weight``.\n By `Jan Hendrik Metzen`_.\n\n- :class:`dummy.DummyClassifier` now supports a prior fitting strategy.\n By `Arnaud Joly`_.\n\n- Added a ``fit_predict`` method for :class:`mixture.GMM` and subclasses.\n By :user:`Cory Lorenz `.\n\n- Added the :func:`metrics.label_ranking_loss` metric.\n By `Arnaud Joly`_.\n\n- Added the :func:`metrics.cohen_kappa_score` metric.\n\n- Added a ``warm_start`` constructor parameter to the bagging ensemble\n models to increase the size of the ensemble. By :user:`Tim Head `.\n\n- Added option to use multi-output regression metrics without averaging.\n By Konstantin Shmelkov and :user:`Michael Eickenberg `.\n\n- Added ``stratify`` option to :func:`cross_validation.train_test_split`\n for stratified splitting. By Miroslav Batchkarov.\n\n- The :func:`tree.export_graphviz` function now supports aesthetic\n improvements for :class:`tree.DecisionTreeClassifier` and\n :class:`tree.DecisionTreeRegressor`, including options for coloring nodes\n by their majority class or impurity, showing variable names, and using\n node proportions instead of raw sample counts. 
By `Trevor Stephens`_.\n\n- Improved speed of ``newton-cg`` solver in\n :class:`linear_model.LogisticRegression`, by avoiding loss computation.\n By `Mathieu Blondel`_ and `Tom Dupre la Tour`_.\n\n- The ``class_weight=\"auto\"`` heuristic in classifiers supporting\n ``class_weight`` was deprecated and replaced by the ``class_weight=\"balanced\"``\n option, which has a simpler formula and interpretation.\n By `Hanna Wallach`_ and `Andreas M\u00fcller`_.\n\n- Add ``class_weight`` parameter to automatically weight samples by class\n frequency for :class:`linear_model.PassiveAgressiveClassifier`. By\n `Trevor Stephens`_.\n\n- Added backlinks from the API reference pages to the user guide. By\n `Andreas M\u00fcller`_.\n\n- The ``labels`` parameter to :func:`sklearn.metrics.f1_score`,\n :func:`sklearn.metrics.fbeta_score`,\n :func:`sklearn.metrics.recall_score` and\n :func:`sklearn.metrics.precision_score` has been extended.\n It is now possible to ignore one or more labels, such as where\n a multiclass problem has a majority class to ignore. By `Joel Nothman`_.\n\n- Add ``sample_weight`` support to :class:`linear_model.RidgeClassifier`.\n By `Trevor Stephens`_.\n\n- Provide an option for sparse output from\n :func:`sklearn.metrics.pairwise.cosine_similarity`. By\n :user:`Jaidev Deshpande `.\n\n- Add :func:`minmax_scale` to provide a function interface for\n :class:`MinMaxScaler`. By :user:`Thomas Unterthiner `.\n\n- ``dump_svmlight_file`` now handles multi-label datasets.\n By Chih-Wei Chang.\n\n- RCV1 dataset loader (:func:`sklearn.datasets.fetch_rcv1`).\n By `Tom Dupre la Tour`_.\n\n- The \"Wisconsin Breast Cancer\" classical two-class classification dataset\n is now included in scikit-learn, available with\n :func:`sklearn.dataset.load_breast_cancer`.\n\n- Upgraded to joblib 0.9.3 to benefit from the new automatic batching of\n short tasks. 
This makes it possible for scikit-learn to benefit from\n parallelism when many very short tasks are executed in parallel, for\n instance by the :class:`grid_search.GridSearchCV` meta-estimator\n with ``n_jobs > 1`` used with a large grid of parameters on a small\n dataset. By `Vlad Niculae`_, `Olivier Grisel`_ and `Loic Esteve`_.\n\n- For more details about changes in joblib 0.9.3 see the release notes:\n https://github.com/joblib/joblib/blob/master/CHANGES.rst#release-093\n\n- Improved speed (3 times per iteration) of\n :class:`decomposition.DictLearning` with coordinate descent method\n from :class:`linear_model.Lasso`. By :user:`Arthur Mensch `.\n\n- Parallel processing (threaded) for queries of nearest neighbors\n (using the ball-tree) by Nikolay Mayorov.\n\n- Allow :func:`datasets.make_multilabel_classification` to output\n a sparse ``y``. By Kashif Rasul.\n\n- :class:`cluster.DBSCAN` now accepts a sparse matrix of precomputed\n distances, allowing memory-efficient distance precomputation. By\n `Joel Nothman`_.\n\n- :class:`tree.DecisionTreeClassifier` now exposes an ``apply`` method\n for retrieving the leaf indices samples are predicted as. By\n :user:`Daniel Galvez ` and `Gilles Louppe`_.\n\n- Speed up decision tree regressors, random forest regressors, extra trees\n regressors and gradient boosting estimators by computing a proxy\n of the impurity improvement during the tree growth. The proxy quantity is\n such that the split that maximizes this value also maximizes the impurity\n improvement. By `Arnaud Joly`_, :user:`Jacob Schreiber `\n and `Gilles Louppe`_.\n\n- Speed up tree based methods by reducing the number of computations needed\n when computing the impurity measure taking into account linear\n relationship of the computed statistics. The effect is particularly\n visible with extra trees and on datasets with categorical or sparse\n features. 
By `Arnaud Joly`_.\n\n- :class:`ensemble.GradientBoostingRegressor` and\n :class:`ensemble.GradientBoostingClassifier` now expose an ``apply``\n method for retrieving the leaf indices each sample ends up in under\n each try. By :user:`Jacob Schreiber `.\n\n- Add ``sample_weight`` support to :class:`linear_model.LinearRegression`.\n By Sonny Hu. (:issue:`#4881`)\n\n- Add ``n_iter_without_progress`` to :class:`manifold.TSNE` to control\n the stopping criterion. By Santi Villalba. (:issue:`5186`)\n\n- Added optional parameter ``random_state`` in :class:`linear_model.Ridge`\n , to set the seed of the pseudo random generator used in ``sag`` solver. By `Tom Dupre la Tour`_.\n\n- Added optional parameter ``warm_start`` in\n :class:`linear_model.LogisticRegression`. If set to True, the solvers\n ``lbfgs``, ``newton-cg`` and ``sag`` will be initialized with the\n coefficients computed in the previous fit. By `Tom Dupre la Tour`_.\n\n- Added ``sample_weight`` support to :class:`linear_model.LogisticRegression` for\n the ``lbfgs``, ``newton-cg``, and ``sag`` solvers. By `Valentin Stolbunov`_.\n Support added to the ``liblinear`` solver. By `Manoj Kumar`_.\n\n- Added optional parameter ``presort`` to :class:`ensemble.GradientBoostingRegressor`\n and :class:`ensemble.GradientBoostingClassifier`, keeping default behavior\n the same. This allows gradient boosters to turn off presorting when building\n deep trees or using sparse data. By :user:`Jacob Schreiber `.\n\n- Altered :func:`metrics.roc_curve` to drop unnecessary thresholds by\n default. By :user:`Graham Clenaghan `.\n\n- Added :class:`feature_selection.SelectFromModel` meta-transformer which can\n be used along with estimators that have `coef_` or `feature_importances_`\n attribute to select important features of the input data. By\n :user:`Maheshakya Wijewardena `, `Joel Nothman`_ and `Manoj Kumar`_.\n\n- Added :func:`metrics.pairwise.laplacian_kernel`. 
By `Clyde Fare `_.\n\n- :class:`covariance.GraphLasso` allows separate control of the convergence criterion\n for the Elastic-Net subproblem via the ``enet_tol`` parameter.\n\n- Improved verbosity in :class:`decomposition.DictionaryLearning`.\n\n- :class:`ensemble.RandomForestClassifier` and\n :class:`ensemble.RandomForestRegressor` no longer explicitly store the\n samples used in bagging, resulting in a much reduced memory footprint for\n storing random forest models.\n\n- Added ``positive`` option to :class:`linear_model.Lars` and\n :func:`linear_model.lars_path` to force coefficients to be positive.\n (:issue:`5131`)\n\n- Added the ``X_norm_squared`` parameter to :func:`metrics.pairwise.euclidean_distances`\n to provide precomputed squared norms for ``X``.\n\n- Added the ``fit_predict`` method to :class:`pipeline.Pipeline`.\n\n- Added the :func:`preprocessing.min_max_scale` function.\n\nBug fixes\n.........\n\n- Fixed non-determinism in :class:`dummy.DummyClassifier` with sparse\n multi-label output. By `Andreas M\u00fcller`_.\n\n- Fixed the output shape of :class:`linear_model.RANSACRegressor` to\n ``(n_samples, )``. By `Andreas M\u00fcller`_.\n\n- Fixed bug in :class:`decomposition.DictLearning` when ``n_jobs < 0``. By\n `Andreas M\u00fcller`_.\n\n- Fixed bug where :class:`grid_search.RandomizedSearchCV` could consume a\n lot of memory for large discrete grids. By `Joel Nothman`_.\n\n- Fixed bug in :class:`linear_model.LogisticRegressionCV` where `penalty` was ignored\n in the final fit. By `Manoj Kumar`_.\n\n- Fixed bug in :class:`ensemble.forest.ForestClassifier` while computing\n oob_score and X is a sparse.csc_matrix. By :user:`Ankur Ankan `.\n\n- All regressors now consistently handle and warn when given ``y`` that is of\n shape ``(n_samples, 1)``. 
By `Andreas M\u00fcller`_ and Henry Lin.\n (:issue:`5431`)\n\n- Fix in :class:`cluster.KMeans` cluster reassignment for sparse input by\n `Lars Buitinck`_.\n\n- Fixed a bug in :class:`lda.LDA` that could cause asymmetric covariance\n matrices when using shrinkage. By `Martin Billinger`_.\n\n- Fixed :func:`cross_validation.cross_val_predict` for estimators with\n sparse predictions. By Buddha Prakash.\n\n- Fixed the ``predict_proba`` method of :class:`linear_model.LogisticRegression`\n to use soft-max instead of one-vs-rest normalization. By `Manoj Kumar`_.\n (:issue:`5182`)\n\n- Fixed the :func:`partial_fit` method of :class:`linear_model.SGDClassifier`\n when called with ``average=True``. By :user:`Andrew Lamb `.\n (:issue:`5282`)\n\n- Dataset fetchers use different filenames under Python 2 and Python 3 to\n avoid pickling compatibility issues. By `Olivier Grisel`_.\n (:issue:`5355`)\n\n- Fixed a bug in :class:`naive_bayes.GaussianNB` which caused classification\n results to depend on scale. By `Jake Vanderplas`_.\n\n- Fixed temporarily :class:`linear_model.Ridge`, which was incorrect\n when fitting the intercept in the case of sparse data. The fix\n automatically changes the solver to 'sag' in this case.\n :issue:`5360` by `Tom Dupre la Tour`_.\n\n- Fixed a performance bug in :class:`decomposition.RandomizedPCA` on data\n with a large number of features and fewer samples. 
(:issue:`4478`)\n By `Andreas M\u00fcller`_, `Loic Esteve`_ and :user:`Giorgio Patrini `.\n\n- Fixed bug in :class:`cross_decomposition.PLS` that yielded unstable and\n platform dependent output, and failed on `fit_transform`.\n By :user:`Arthur Mensch `.\n\n- Fixes to the ``Bunch`` class used to store datasets.\n\n- Fixed :func:`ensemble.plot_partial_dependence` ignoring the\n ``percentiles`` parameter.\n\n- Providing a ``set`` as vocabulary in ``CountVectorizer`` no longer\n leads to inconsistent results when pickling.\n\n- Fixed the conditions on when a precomputed Gram matrix needs to\n be recomputed in :class:`linear_model.LinearRegression`,\n :class:`linear_model.OrthogonalMatchingPursuit`,\n :class:`linear_model.Lasso` and :class:`linear_model.ElasticNet`.\n\n- Fixed inconsistent memory layout in the coordinate descent solver\n that affected :class:`linear_model.DictionaryLearning` and\n :class:`covariance.GraphLasso`. (:issue:`5337`)\n By `Olivier Grisel`_.\n\n- :class:`manifold.LocallyLinearEmbedding` no longer ignores the ``reg``\n parameter.\n\n- Nearest Neighbor estimators with custom distance metrics can now be pickled.\n (:issue:`4362`)\n\n- Fixed a bug in :class:`pipeline.FeatureUnion` where ``transformer_weights``\n were not properly handled when performing grid-searches.\n\n- Fixed a bug in :class:`linear_model.LogisticRegression` and\n :class:`linear_model.LogisticRegressionCV` when using\n ``class_weight='balanced'```or ``class_weight='auto'``.\n By `Tom Dupre la Tour`_.\n\n- Fixed bug :issue:`5495` when\n doing OVR(SVC(decision_function_shape=\"ovr\")). Fixed by\n :user:`Elvis Dohmatob `.\nAPI changes summary\n-------------------\n- Attribute `data_min`, `data_max` and `data_range` in\n :class:`preprocessing.MinMaxScaler` are deprecated and won't be available\n from 0.19. Instead, the class now exposes `data_min_`, `data_max_`\n and `data_range_`. 
By :user:`Giorgio Patrini `.\n\n- All Scaler classes now have an `scale_` attribute, the feature-wise\n rescaling applied by their `transform` methods. The old attribute `std_`\n in :class:`preprocessing.StandardScaler` is deprecated and superseded\n by `scale_`; it won't be available in 0.19. By :user:`Giorgio Patrini `.\n\n- :class:`svm.SVC`` and :class:`svm.NuSVC` now have an ``decision_function_shape``\n parameter to make their decision function of shape ``(n_samples, n_classes)``\n by setting ``decision_function_shape='ovr'``. This will be the default behavior\n starting in 0.19. By `Andreas M\u00fcller`_.\n\n- Passing 1D data arrays as input to estimators is now deprecated as it\n caused confusion in how the array elements should be interpreted\n as features or as samples. All data arrays are now expected\n to be explicitly shaped ``(n_samples, n_features)``.\n By :user:`Vighnesh Birodkar `.\n\n- :class:`lda.LDA` and :class:`qda.QDA` have been moved to\n :class:`discriminant_analysis.LinearDiscriminantAnalysis` and\n :class:`discriminant_analysis.QuadraticDiscriminantAnalysis`.\n\n- The ``store_covariance`` and ``tol`` parameters have been moved from\n the fit method to the constructor in\n :class:`discriminant_analysis.LinearDiscriminantAnalysis` and the\n ``store_covariances`` and ``tol`` parameters have been moved from the\n fit method to the constructor in\n :class:`discriminant_analysis.QuadraticDiscriminantAnalysis`.\n\n- Models inheriting from ``_LearntSelectorMixin`` will no longer support the\n transform methods. (i.e, RandomForests, GradientBoosting, LogisticRegression,\n DecisionTrees, SVMs and SGD related models). 
Wrap these models in the\n meta-transformer :class:`feature_selection.SelectFromModel` to remove\n features (according to `coef_` or `feature_importances_`)\n which are below a certain threshold value instead.\n\n- :class:`cluster.KMeans` re-runs cluster-assignments in case of non-convergence,\n to ensure consistency of ``predict(X)`` and ``labels_``. By\n :user:`Vighnesh Birodkar `.\n\n- Classifier and Regressor models are now tagged as such using the\n ``_estimator_type`` attribute.\n\n- Cross-validation iterators always provide indices into training and test sets,\n not boolean masks.\n\n- The ``decision_function`` on all regressors was deprecated and will be\n removed in 0.19. Use ``predict`` instead.\n\n- :func:`datasets.load_lfw_pairs` is deprecated and will be removed in 0.19.\n Use :func:`datasets.fetch_lfw_pairs` instead.\n\n- The deprecated ``hmm`` module was removed.\n\n- The deprecated ``Bootstrap`` cross-validation iterator was removed.\n\n- The deprecated ``Ward`` and ``WardAgglomerative`` classes have been removed.\n Use :class:`cluster.AgglomerativeClustering` instead.\n\n- :func:`cross_validation.check_cv` is now a public function.\n\n- The property ``residues_`` of :class:`linear_model.LinearRegression` is deprecated\n and will be removed in 0.19.\n\n- The deprecated ``n_jobs`` parameter of :class:`linear_model.LinearRegression` has been moved\n to the constructor.\n\n- Removed deprecated ``class_weight`` parameter from :class:`linear_model.SGDClassifier`'s ``fit``\n method. Use the constructor parameter instead.\n\n- The deprecated support for the sequence of sequences (or list of lists) multilabel\n format was removed. To convert to and from the supported binary\n indicator matrix format, use\n :class:`MultiLabelBinarizer `.\n\n- The behavior of calling the ``inverse_transform`` method of ``Pipeline.pipeline`` will\n change in 0.19. 
It will no longer reshape one-dimensional input to two-dimensional input.\n\n- The deprecated attributes ``indicator_matrix_``, ``multilabel_`` and ``classes_`` of\n :class:`preprocessing.LabelBinarizer` were removed.\n\n- Using ``gamma=0`` in :class:`svm.SVC` and :class:`svm.SVR` to automatically set the\n gamma to ``1. / n_features`` is deprecated and will be removed in 0.19.\n Use ``gamma=\"auto\"`` instead.\n\nCode Contributors\n-----------------\nAaron Schumacher, Adithya Ganesh, akitty, Alexandre Gramfort, Alexey Grigorev,\nAli Baharev, Allen Riddell, Ando Saabas, Andreas Mueller, Andrew Lamb, Anish\nShah, Ankur Ankan, Anthony Erlinger, Ari Rouvinen, Arnaud Joly, Arnaud Rachez,\nArthur Mensch, banilo, Barmaley.exe, benjaminirving, Boyuan Deng, Brett Naul,\nBrian McFee, Buddha Prakash, Chi Zhang, Chih-Wei Chang, Christof Angermueller,\nChristoph Gohlke, Christophe Bourguignat, Christopher Erick Moody, Chyi-Kwei\nYau, Cindy Sridharan, CJ Carey, Clyde-fare, Cory Lorenz, Dan Blanchard, Daniel\nGalvez, Daniel Kronovet, Danny Sullivan, Data1010, David, David D Lowe, David\nDotson, djipey, Dmitry Spikhalskiy, Donne Martin, Dougal J. Sutherland, Dougal\nSutherland, edson duarte, Eduardo Caro, Eric Larson, Eric Martin, Erich\nSchubert, Fernando Carrillo, Frank C. 
Eckert, Frank Zalkow, Gael Varoquaux,\nGaniev Ibraim, Gilles Louppe, Giorgio Patrini, giorgiop, Graham Clenaghan,\nGryllos Prokopis, gwulfs, Henry Lin, Hsuan-Tien Lin, Immanuel Bayer, Ishank\nGulati, Jack Martin, Jacob Schreiber, Jaidev Deshpande, Jake Vanderplas, Jan\nHendrik Metzen, Jean Kossaifi, Jeffrey04, Jeremy, jfraj, Jiali Mei,\nJoe Jevnik, Joel Nothman, John Kirkham, John Wittenauer, Joseph, Joshua Loyal,\nJungkook Park, KamalakerDadi, Kashif Rasul, Keith Goodman, Kian Ho, Konstantin\nShmelkov, Kyler Brown, Lars Buitinck, Lilian Besson, Loic Esteve, Louis Tiao,\nmaheshakya, Maheshakya Wijewardena, Manoj Kumar, MarkTab marktab.net, Martin\nKu, Martin Spacek, MartinBpr, martinosorb, MaryanMorel, Masafumi Oyamada,\nMathieu Blondel, Matt Krump, Matti Lyra, Maxim Kolganov, mbillinger, mhg,\nMichael Heilman, Michael Patterson, Miroslav Batchkarov, Nelle Varoquaux,\nNicolas, Nikolay Mayorov, Olivier Grisel, Omer Katz, \u00d3scar N\u00e1jera, Pauli\nVirtanen, Peter Fischer, Peter Prettenhofer, Phil Roth, pianomania, Preston\nParry, Raghav RV, Rob Zinkov, Robert Layton, Rohan Ramanath, Saket Choudhary,\nSam Zhang, santi, saurabh.bansod, scls19fr, Sebastian Raschka, Sebastian\nSaeger, Shivan Sornarajah, SimonPL, sinhrks, Skipper Seabold, Sonny Hu, sseg,\nStephen Hoover, Steven De Gryze, Steven Seguin, Theodore Vasiloudis, Thomas\nUnterthiner, Tiago Freitas Pereira, Tian Wang, Tim Head, Timothy Hopper,\ntokoroten, Tom Dupr\u00e9 la Tour, Trevor Stephens, Valentin Stolbunov, Vighnesh\nBirodkar, Vinayak Mehta, Vincent, Vincent Michel, vstolbunov, wangz10, Wei Xue,\nYucheng Low, Yury Zhauniarovich, Zac Stewart, zhai_pro, Zichen Wang\n\n.. 
_changes_0_1_16:\n\nVersion 0.16.1\n===============\n\n**April 14, 2015**\n\nChangelog\n---------\n\nBug fixes\n.........\n\n- Allow input data larger than ``block_size`` in\n :class:`covariance.LedoitWolf` by `Andreas M\u00fcller`_.\n\n- Fix a bug in :class:`isotonic.IsotonicRegression` deduplication that\n caused unstable results in :class:`calibration.CalibratedClassifierCV` by\n `Jan Hendrik Metzen`_.\n\n- Fix sorting of labels in :func:`preprocessing.label_binarize` by Michael Heilman.\n\n- Fix several stability and convergence issues in\n :class:`cross_decomposition.CCA` and\n :class:`cross_decomposition.PLSCanonical` by `Andreas M\u00fcller`_\n\n- Fix a bug in :class:`cluster.KMeans` when ``precompute_distances=False``\n on Fortran-ordered data.\n\n- Fix a speed regression in :class:`ensemble.RandomForestClassifier`'s ``predict``\n and ``predict_proba`` by `Andreas M\u00fcller`_.\n\n- Fix a regression where ``utils.shuffle`` converted lists and dataframes to arrays, by `Olivier Grisel`_\n\n.. 
_changes_0_16:\n\nVersion 0.16\n============\n\n**March 26, 2015**\n\nHighlights\n-----------\n\n- Speed improvements (notably in :class:`cluster.DBSCAN`), reduced memory\n requirements, bug fixes and better default settings.\n\n- Multinomial Logistic regression and a path algorithm in\n :class:`linear_model.LogisticRegressionCV`.\n\n- Out-of-core learning of PCA via :class:`decomposition.IncrementalPCA`.\n\n- Probability calibration of classifiers using\n :class:`calibration.CalibratedClassifierCV`.\n\n- :class:`cluster.Birch` clustering method for large-scale datasets.\n\n- Scalable approximate nearest neighbors search with Locality-sensitive\n hashing forests in :class:`neighbors.LSHForest`.\n\n- Improved error messages and better validation when using malformed input data.\n\n- More robust integration with pandas dataframes.\n\nChangelog\n---------\n\nNew features\n............\n\n- The new :class:`neighbors.LSHForest` implements locality-sensitive hashing\n for approximate nearest neighbors search. By :user:`Maheshakya Wijewardena `.\n\n- Added :class:`svm.LinearSVR`. This class uses the liblinear implementation\n of Support Vector Regression which is much faster for large\n sample sizes than :class:`svm.SVR` with linear kernel. By\n `Fabian Pedregosa`_ and Qiang Luo.\n\n- Incremental fit for :class:`GaussianNB `.\n\n- Added ``sample_weight`` support to :class:`dummy.DummyClassifier` and\n :class:`dummy.DummyRegressor`. By `Arnaud Joly`_.\n\n- Added the :func:`metrics.label_ranking_average_precision_score` metric.\n By `Arnaud Joly`_.\n\n- Add the :func:`metrics.coverage_error` metric. By `Arnaud Joly`_.\n\n- Added :class:`linear_model.LogisticRegressionCV`. By\n `Manoj Kumar`_, `Fabian Pedregosa`_, `Gael Varoquaux`_\n and `Alexandre Gramfort`_.\n\n- Added ``warm_start`` constructor parameter to make it possible for any\n trained forest model to grow additional trees incrementally. 
By\n :user:`Laurent Direr `.\n\n- Added ``sample_weight`` support to :class:`ensemble.GradientBoostingClassifier` and\n :class:`ensemble.GradientBoostingRegressor`. By `Peter Prettenhofer`_.\n\n- Added :class:`decomposition.IncrementalPCA`, an implementation of the PCA\n algorithm that supports out-of-core learning with a ``partial_fit``\n method. By `Kyle Kastner`_.\n\n- Averaged SGD for :class:`SGDClassifier `\n and :class:`SGDRegressor ` By\n :user:`Danny Sullivan `.\n\n- Added :func:`cross_val_predict `\n function which computes cross-validated estimates. By `Luis Pedro Coelho`_\n\n- Added :class:`linear_model.TheilSenRegressor`, a robust\n generalized-median-based estimator. By :user:`Florian Wilhelm `.\n\n- Added :func:`metrics.median_absolute_error`, a robust metric.\n By `Gael Varoquaux`_ and :user:`Florian Wilhelm `.\n\n- Add :class:`cluster.Birch`, an online clustering algorithm. By\n `Manoj Kumar`_, `Alexandre Gramfort`_ and `Joel Nothman`_.\n\n- Added shrinkage support to :class:`discriminant_analysis.LinearDiscriminantAnalysis`\n using two new solvers. By :user:`Clemens Brunner ` and `Martin Billinger`_.\n\n- Added :class:`kernel_ridge.KernelRidge`, an implementation of\n kernelized ridge regression.\n By `Mathieu Blondel`_ and `Jan Hendrik Metzen`_.\n\n- All solvers in :class:`linear_model.Ridge` now support `sample_weight`.\n By `Mathieu Blondel`_.\n\n- Added :class:`cross_validation.PredefinedSplit` cross-validation\n for fixed user-provided cross-validation folds.\n By :user:`Thomas Unterthiner `.\n\n- Added :class:`calibration.CalibratedClassifierCV`, an approach for\n calibrating the predicted probabilities of a classifier.\n By `Alexandre Gramfort`_, `Jan Hendrik Metzen`_, `Mathieu Blondel`_\n and :user:`Balazs Kegl `.\nEnhancements\n............\n\n- Add option ``return_distance`` in :func:`hierarchical.ward_tree`\n to return distances between nodes for both structured and unstructured\n versions of the algorithm. 
By `Matteo Visconti di Oleggio Castello`_.\n The same option was added in :func:`hierarchical.linkage_tree`.\n By `Manoj Kumar`_\n\n- Add support for sample weights in scorer objects. Metrics with sample\n weight support will automatically benefit from it. By `Noel Dawe`_ and\n `Vlad Niculae`_.\n\n- Added ``newton-cg`` and `lbfgs` solver support in\n :class:`linear_model.LogisticRegression`. By `Manoj Kumar`_.\n\n- Add ``selection=\"random\"`` parameter to implement stochastic coordinate\n descent for :class:`linear_model.Lasso`, :class:`linear_model.ElasticNet`\n and related. By `Manoj Kumar`_.\n\n- Add ``sample_weight`` parameter to\n :func:`metrics.jaccard_similarity_score` and :func:`metrics.log_loss`.\n By :user:`Jatin Shah `.\n\n- Support sparse multilabel indicator representation in\n :class:`preprocessing.LabelBinarizer` and\n :class:`multiclass.OneVsRestClassifier` (by :user:`Hamzeh Alsalhi ` with thanks\n to Rohit Sivaprasad), as well as evaluation metrics (by\n `Joel Nothman`_).\n\n- Add ``sample_weight`` parameter to `metrics.jaccard_similarity_score`.\n By `Jatin Shah`.\n\n- Add support for multiclass in `metrics.hinge_loss`. Added ``labels=None``\n as optional parameter. By `Saurabh Jha`.\n\n- Add ``sample_weight`` parameter to `metrics.hinge_loss`.\n By `Saurabh Jha`.\n\n- Add ``multi_class=\"multinomial\"`` option in\n :class:`linear_model.LogisticRegression` to implement a Logistic\n Regression solver that minimizes the cross-entropy or multinomial loss\n instead of the default One-vs-Rest setting. Supports `lbfgs` and\n `newton-cg` solvers. By `Lars Buitinck`_ and `Manoj Kumar`_. Solver option\n `newton-cg` by Simon Wu.\n\n- ``DictVectorizer`` can now perform ``fit_transform`` on an iterable in a\n single pass, when giving the option ``sort=False``. By :user:`Dan\n Blanchard `.\n\n- :class:`GridSearchCV` and :class:`RandomizedSearchCV` can now be\n configured to work with estimators that may fail and raise errors on\n individual folds. 
This option is controlled by the `error_score`\n parameter. This does not affect errors raised on re-fit. By\n :user:`Michal Romaniuk `.\n\n- Add ``digits`` parameter to `metrics.classification_report` to allow\n the report to show different precision of floating point numbers. By\n :user:`Ian Gilmore `.\n\n- Add a quantile prediction strategy to the :class:`dummy.DummyRegressor`.\n By :user:`Aaron Staple `.\n\n- Add ``handle_unknown`` option to :class:`preprocessing.OneHotEncoder` to\n handle unknown categorical features more gracefully during transform.\n By `Manoj Kumar`_.\n\n- Added support for sparse input data to decision trees and their ensembles.\n By `Fares Hedyati`_ and `Arnaud Joly`_.\n\n- Optimized :class:`cluster.AffinityPropagation` by reducing the number of\n memory allocations of large temporary data-structures. By `Antony Lee`_.\n\n- Parallelization of the computation of feature importances in random forest.\n By `Olivier Grisel`_ and `Arnaud Joly`_.\n\n- Add ``n_iter_`` attribute to estimators that accept a ``max_iter`` parameter\n in their constructor. By `Manoj Kumar`_.\n\n- Added decision function for :class:`multiclass.OneVsOneClassifier`\n By `Raghav RV`_ and :user:`Kyle Beauchamp `.\n\n- :func:`neighbors.kneighbors_graph` and :func:`radius_neighbors_graph`\n support non-Euclidean metrics. By `Manoj Kumar`_\n\n- Parameter ``connectivity`` in :class:`cluster.AgglomerativeClustering`\n and family now accepts callables that return a connectivity matrix.\n By `Manoj Kumar`_.\n\n- Sparse support for :func:`paired_distances`. By `Joel Nothman`_.\n\n- :class:`cluster.DBSCAN` now supports sparse input and sample weights and\n has been optimized: the inner loop has been rewritten in Cython and\n radius neighbors queries are now computed in batch. 
By `Joel Nothman`_\n and `Lars Buitinck`_.\n\n- Add ``class_weight`` parameter to automatically weight samples by class\n frequency for :class:`ensemble.RandomForestClassifier`,\n :class:`tree.DecisionTreeClassifier`, :class:`ensemble.ExtraTreesClassifier`\n and :class:`tree.ExtraTreeClassifier`. By `Trevor Stephens`_.\n\n- :class:`grid_search.RandomizedSearchCV` now does sampling without\n replacement if all parameters are given as lists. By `Andreas M\u00fcller`_.\n\n- Parallelized calculation of :func:`pairwise_distances` is now supported\n for scipy metrics and custom callables. By `Joel Nothman`_.\n\n- Allow the fitting and scoring of all clustering algorithms in\n :class:`pipeline.Pipeline`. By `Andreas M\u00fcller`_.\n\n- More robust seeding and improved error messages in :class:`cluster.MeanShift`\n by `Andreas M\u00fcller`_.\n\n- Make the stopping criterion for :class:`mixture.GMM`,\n :class:`mixture.DPGMM` and :class:`mixture.VBGMM` less dependent on the\n number of samples by thresholding the average log-likelihood change\n instead of its sum over all samples. By `Herv\u00e9 Bredin`_.\n\n- The outcome of :func:`manifold.spectral_embedding` was made deterministic\n by flipping the sign of eigenvectors. By :user:`Hasil Sharma `.\n\n- Significant performance and memory usage improvements in\n :class:`preprocessing.PolynomialFeatures`. By `Eric Martin`_.\n\n- Numerical stability improvements for :class:`preprocessing.StandardScaler`\n and :func:`preprocessing.scale`. 
By `Nicolas Goix`_\n\n- :class:`svm.SVC` fitted on sparse input now implements ``decision_function``.\n By `Rob Zinkov`_ and `Andreas M\u00fcller`_.\n\n- :func:`cross_validation.train_test_split` now preserves the input type,\n instead of converting to numpy arrays.\nDocumentation improvements\n..........................\n\n- Added example of using :class:`FeatureUnion` for heterogeneous input.\n By :user:`Matt Terry `\n\n- Documentation on scorers was improved, to highlight the handling of loss\n functions. By :user:`Matt Pico `.\n\n- A discrepancy between liblinear output and scikit-learn's wrappers\n is now noted. By `Manoj Kumar`_.\n\n- Improved documentation generation: examples referring to a class or\n function are now shown in a gallery on the class/function's API reference\n page. By `Joel Nothman`_.\n\n- More explicit documentation of sample generators and of data\n transformation. By `Joel Nothman`_.\n\n- :class:`sklearn.neighbors.BallTree` and :class:`sklearn.neighbors.KDTree`\n used to point to empty pages stating that they are aliases of BinaryTree.\n This has been fixed to show the correct class docs. By `Manoj Kumar`_.\n\n- Added silhouette plots for analysis of KMeans clustering using\n :func:`metrics.silhouette_samples` and :func:`metrics.silhouette_score`.\n See :ref:`sphx_glr_auto_examples_cluster_plot_kmeans_silhouette_analysis.py`\n\nBug fixes\n.........\n- Metaestimators now support ducktyping for the presence of ``decision_function``,\n ``predict_proba`` and other methods. 
This fixes behavior of\n :class:`grid_search.GridSearchCV`,\n :class:`grid_search.RandomizedSearchCV`, :class:`pipeline.Pipeline`,\n :class:`feature_selection.RFE`, :class:`feature_selection.RFECV` when nested.\n By `Joel Nothman`_\n\n- The ``scoring`` attribute of grid-search and cross-validation methods is no longer\n ignored when a :class:`grid_search.GridSearchCV` is given as a base estimator or\n the base estimator doesn't have predict.\n\n- The function :func:`hierarchical.ward_tree` now returns the children in\n the same order for both the structured and unstructured versions. By\n `Matteo Visconti di Oleggio Castello`_.\n\n- :class:`feature_selection.RFECV` now correctly handles cases when\n ``step`` is not equal to 1. By :user:`Nikolay Mayorov `\n\n- The :class:`decomposition.PCA` now undoes whitening in its\n ``inverse_transform``. Also, its ``components_`` now always have unit\n length. By :user:`Michael Eickenberg `.\n\n- Fix incomplete download of the dataset when\n :func:`datasets.download_20newsgroups` is called. By `Manoj Kumar`_.\n\n- Various fixes to the Gaussian processes subpackage by Vincent Dubourg\n and Jan Hendrik Metzen.\n\n- Calling ``partial_fit`` with ``class_weight=='auto'`` throws an\n appropriate error message and suggests a work around.\n By :user:`Danny Sullivan `.\n\n- :class:`RBFSampler ` with ``gamma=g``\n formerly approximated :func:`rbf_kernel `\n with ``gamma=g/2.``; the definition of ``gamma`` is now consistent,\n which may substantially change your results if you use a fixed value.\n (If you cross-validated over ``gamma``, it probably doesn't matter\n too much.) By :user:`Dougal Sutherland `.\n\n- Pipeline object delegate the ``classes_`` attribute to the underlying\n estimator. It allows, for instance, to make bagging of a pipeline object.\n By `Arnaud Joly`_\n\n- :class:`neighbors.NearestCentroid` now uses the median as the centroid\n when metric is set to ``manhattan``. 
It was using the mean before.\n By `Manoj Kumar`_\n\n- Fix numerical stability issues in :class:`linear_model.SGDClassifier`\n and :class:`linear_model.SGDRegressor` by clipping large gradients and\n ensuring that weight decay rescaling is always positive (for large\n l2 regularization and large learning rate values).\n By `Olivier Grisel`_\n\n- When `compute_full_tree` is set to \"auto\", the full tree is\n built when n_clusters is high and is early stopped when n_clusters is\n low, while the behavior should be vice-versa in\n :class:`cluster.AgglomerativeClustering` (and friends).\n This has been fixed By `Manoj Kumar`_\n\n- Fix lazy centering of data in :func:`linear_model.enet_path` and\n :func:`linear_model.lasso_path`. It was centered around one. It has\n been changed to be centered around the origin. By `Manoj Kumar`_\n\n- Fix handling of precomputed affinity matrices in\n :class:`cluster.AgglomerativeClustering` when using connectivity\n constraints. By :user:`Cathy Deng `\n\n- Correct ``partial_fit`` handling of ``class_prior`` for\n :class:`sklearn.naive_bayes.MultinomialNB` and\n :class:`sklearn.naive_bayes.BernoulliNB`. By `Trevor Stephens`_.\n\n- Fixed a crash in :func:`metrics.precision_recall_fscore_support`\n when using unsorted ``labels`` in the multi-label setting.\n By `Andreas M\u00fcller`_.\n\n- Avoid skipping the first nearest neighbor in the methods ``radius_neighbors``,\n ``kneighbors``, ``kneighbors_graph`` and ``radius_neighbors_graph`` in\n :class:`sklearn.neighbors.NearestNeighbors` and family, when the query\n data is not the same as fit data. By `Manoj Kumar`_.\n\n- Fix log-density calculation in the :class:`mixture.GMM` with\n tied covariance. By `Will Dawson`_\n\n- Fixed a scaling error in :class:`feature_selection.SelectFdr`\n where a factor ``n_features`` was missing. 
By `Andrew Tulloch`_\n\n- Fix zero division in :class:`neighbors.KNeighborsRegressor` and related\n classes when using distance weighting and having identical data points.\n By `Garret-R `_.\n\n- Fixed round-off errors with non-positive-definite covariance matrices\n in GMM. By :user:`Alexis Mignon `.\n\n- Fixed an error in the computation of conditional probabilities in\n :class:`naive_bayes.BernoulliNB`. By `Hanna Wallach`_.\n\n- Make the method ``radius_neighbors`` of\n :class:`neighbors.NearestNeighbors` return the samples lying on the\n boundary for ``algorithm='brute'``. By `Yan Yi`_.\n\n- Flip sign of ``dual_coef_`` of :class:`svm.SVC`\n to make it consistent with the documentation and\n ``decision_function``. By Artem Sobolev.\n\n- Fixed handling of ties in :class:`isotonic.IsotonicRegression`.\n We now use the weighted average of targets (secondary method). By\n `Andreas M\u00fcller`_ and `Michael Bommarito `_.\n\nAPI changes summary\n-------------------\n\n- :class:`GridSearchCV ` and\n :func:`cross_val_score ` and other\n meta-estimators don't convert pandas DataFrames into arrays any more,\n allowing DataFrame-specific operations in custom estimators.\n\n- :func:`multiclass.fit_ovr`, :func:`multiclass.predict_ovr`,\n :func:`predict_proba_ovr`,\n :func:`multiclass.fit_ovo`, :func:`multiclass.predict_ovo`,\n :func:`multiclass.fit_ecoc` and :func:`multiclass.predict_ecoc`\n are deprecated. Use the underlying estimators instead.\n\n- Nearest neighbors estimators used to take arbitrary keyword arguments\n and pass these to their distance metric. 
This will no longer be supported\n in scikit-learn 0.18; use the ``metric_params`` argument instead.\n\n- The `n_jobs` parameter of the fit method has been moved to the constructor of the\n LinearRegression class.\n\n- The ``predict_proba`` method of :class:`multiclass.OneVsRestClassifier`\n now returns two probabilities per sample in the multiclass case; this\n is consistent with other estimators and with the method's documentation,\n but previous versions accidentally returned only the positive\n probability. Fixed by Will Lamond and `Lars Buitinck`_.\n\n- Change default value of precompute in :class:`ElasticNet` and :class:`Lasso`\n to False. Setting precompute to \"auto\" was found to be slower when\n n_samples > n_features since the computation of the Gram matrix is\n computationally expensive and outweighs the benefit of fitting the Gram\n for just one alpha.\n ``precompute=\"auto\"`` is now deprecated and will be removed in 0.18.\n By `Manoj Kumar`_.\n\n- Expose ``positive`` option in :func:`linear_model.enet_path` and\n :func:`linear_model.lasso_path` which constrains coefficients to be\n positive. By `Manoj Kumar`_.\n\n- Users should now supply an explicit ``average`` parameter to\n :func:`sklearn.metrics.f1_score`, :func:`sklearn.metrics.fbeta_score`,\n :func:`sklearn.metrics.recall_score` and\n :func:`sklearn.metrics.precision_score` when performing multiclass\n or multilabel (i.e. not binary) classification. By `Joel Nothman`_.\n\n- `scoring` parameter for cross validation now accepts `'f1_micro'`,\n `'f1_macro'` or `'f1_weighted'`. `'f1'` is now for binary classification\n only. Similar changes apply to `'precision'` and `'recall'`.\n By `Joel Nothman`_.\n\n- The ``fit_intercept``, ``normalize`` and ``return_models`` parameters in\n :func:`linear_model.enet_path` and :func:`linear_model.lasso_path` have\n been removed. 
They had been deprecated since 0.14.\n\n- From now onwards, all estimators will uniformly raise ``NotFittedError``\n (:class:`utils.validation.NotFittedError`), when any of the ``predict``-like\n methods are called before the model is fit. By `Raghav RV`_.\n\n- Input data validation was refactored for more consistent input\n validation. The ``check_arrays`` function was replaced by ``check_array``\n and ``check_X_y``. By `Andreas M\u00fcller`_.\n\n- Allow ``X=None`` in the methods ``radius_neighbors``, ``kneighbors``,\n ``kneighbors_graph`` and ``radius_neighbors_graph`` in\n :class:`sklearn.neighbors.NearestNeighbors` and family. If set to None,\n then for every sample this avoids setting the sample itself as the\n first nearest neighbor. By `Manoj Kumar`_.\n\n- Add parameter ``include_self`` in :func:`neighbors.kneighbors_graph`\n and :func:`neighbors.radius_neighbors_graph` which has to be explicitly\n set by the user. If set to True, then the sample itself is considered\n as the first nearest neighbor.\n\n- `thresh` parameter is deprecated in favor of new `tol` parameter in\n :class:`GMM`, :class:`DPGMM` and :class:`VBGMM`. See `Enhancements`\n section for details. By `Herv\u00e9 Bredin`_.\n\n- Estimators will treat input with dtype object as numeric when possible.\n By `Andreas M\u00fcller`_\n\n- Estimators now raise `ValueError` consistently when fitted on empty\n data (less than 1 sample or less than 1 feature for 2D input).\n By `Olivier Grisel`_.\n\n- The ``shuffle`` option of :class:`linear_model.SGDClassifier`,\n :class:`linear_model.SGDRegressor`, :class:`linear_model.Perceptron`,\n :class:`linear_model.PassiveAggressiveClassifier` and\n :class:`linear_model.PassiveAggressiveRegressor` now defaults to ``True``.\n\n- :class:`cluster.DBSCAN` now uses a deterministic initialization. The\n `random_state` parameter is deprecated. By :user:`Erich Schubert `.\n\nCode Contributors\n-----------------\nA. 
Flaxman, Aaron Schumacher, Aaron Staple, abhishek thakur, Akshay, akshayah3,\nAldrian Obaja, Alexander Fabisch, Alexandre Gramfort, Alexis Mignon, Anders\nAagaard, Andreas Mueller, Andreas van Cranenburgh, Andrew Tulloch, Andrew\nWalker, Antony Lee, Arnaud Joly, banilo, Barmaley.exe, Ben Davies, Benedikt\nKoehler, bhsu, Boris Feld, Borja Ayerdi, Boyuan Deng, Brent Pedersen, Brian\nWignall, Brooke Osborn, Calvin Giles, Cathy Deng, Celeo, cgohlke, chebee7i,\nChristian Stade-Schuldt, Christof Angermueller, Chyi-Kwei Yau, CJ Carey,\nClemens Brunner, Daiki Aminaka, Dan Blanchard, danfrankj, Danny Sullivan, David\nFletcher, Dmitrijs Milajevs, Dougal J. Sutherland, Erich Schubert, Fabian\nPedregosa, Florian Wilhelm, floydsoft, F\u00e9lix-Antoine Fortin, Gael Varoquaux,\nGarrett-R, Gilles Louppe, gpassino, gwulfs, Hampus Bengtsson, Hamzeh Alsalhi,\nHanna Wallach, Harry Mavroforakis, Hasil Sharma, Helder, Herve Bredin,\nHsiang-Fu Yu, Hugues SALAMIN, Ian Gilmore, Ilambharathi Kanniah, Imran Haque,\nisms, Jake VanderPlas, Jan Dlabal, Jan Hendrik Metzen, Jatin Shah, Javier L\u00f3pez\nPe\u00f1a, jdcaballero, Jean Kossaifi, Jeff Hammerbacher, Joel Nothman, Jonathan\nHelmus, Joseph, Kaicheng Zhang, Kevin Markham, Kyle Beauchamp, Kyle Kastner,\nLagacherie Matthieu, Lars Buitinck, Laurent Direr, leepei, Loic Esteve, Luis\nPedro Coelho, Lukas Michelbacher, maheshakya, Manoj Kumar, Manuel, Mario\nMichael Krell, Martin, Martin Billinger, Martin Ku, Mateusz Susik, Mathieu\nBlondel, Matt Pico, Matt Terry, Matteo Visconti dOC, Matti Lyra, Max Linke,\nMehdi Cherti, Michael Bommarito, Michael Eickenberg, Michal Romaniuk, MLG,\nmr.Shu, Nelle Varoquaux, Nicola Montecchio, Nicolas, Nikolay Mayorov, Noel\nDawe, Okal Billy, Olivier Grisel, \u00d3scar N\u00e1jera, Paolo Puggioni, Peter\nPrettenhofer, Pratap Vardhan, pvnguyen, queqichao, Rafael Carrascosa, Raghav R\nV, Rahiel Kasim, Randall Mason, Rob Zinkov, Robert Bradshaw, Saket Choudhary,\nSam Nicholls, Samuel Charron, Saurabh Jha, 
sethdandridge, sinhrks, snuderl,\nStefan Otte, Stefan van der Walt, Steve Tjoa, swu, Sylvain Zimmer, tejesh95,\nterrycojones, Thomas Delteil, Thomas Unterthiner, Tomas Kazmar, trevorstephens,\ntttthomasssss, Tzu-Ming Kuo, ugurcaliskan, ugurthemaster, Vinayak Mehta,\nVincent Dubourg, Vjacheslav Murashkin, Vlad Niculae, wadawson, Wei Xue, Will\nLamond, Wu Jiang, x0l, Xinfan Meng, Yan Yi, Yu-Chin\n\n.. _changes_0_15_2:\n\nVersion 0.15.2\n==============\n\n**September 4, 2014**\n\nBug fixes\n---------\n\n- Fixed handling of the ``p`` parameter of the Minkowski distance that was\n previously ignored in nearest neighbors models. By :user:`Nikolay\n Mayorov `.\n\n- Fixed duplicated alphas in :class:`linear_model.LassoLars` with early\n stopping on 32-bit Python. By `Olivier Grisel`_ and `Fabian Pedregosa`_.\n\n- Fixed the build under Windows when scikit-learn is built with MSVC while\n NumPy is built with MinGW. By `Olivier Grisel`_ and :user:`Federico\n Vaggi `.\n\n- Fixed an array index overflow bug in the coordinate descent solver. By\n `Gael Varoquaux`_.\n\n- Better handling of numpy 1.9 deprecation warnings. By `Gael Varoquaux`_.\n\n- Removed unnecessary data copy in :class:`cluster.KMeans`.\n By `Gael Varoquaux`_.\n\n- Explicitly close open files to avoid ``ResourceWarnings`` under Python 3.\n By Calvin Giles.\n\n- The ``transform`` of :class:`discriminant_analysis.LinearDiscriminantAnalysis`\n now projects the input on the most discriminant directions. By Martin Billinger.\n\n- Fixed potential overflow in ``_tree.safe_realloc`` by `Lars Buitinck`_.\n\n- Performance optimization in :class:`isotonic.IsotonicRegression`.\n By Robert Bradshaw.\n\n- ``nose`` is no longer a runtime dependency to import ``sklearn``, only for\n running the tests. By `Joel Nothman`_.\n\n- Many documentation and website fixes by `Joel Nothman`_, `Lars Buitinck`_,\n :user:`Matt Pico `, and others.\n\n.. 
_changes_0_15_1:\n\nVersion 0.15.1\n==============\n\n**August 1, 2014**\n\nBug fixes\n---------\n\n- Made :func:`cross_validation.cross_val_score` use\n :class:`cross_validation.KFold` instead of\n :class:`cross_validation.StratifiedKFold` on multi-output classification\n problems. By :user:`Nikolay Mayorov `.\n\n- Support unseen labels in :class:`preprocessing.LabelBinarizer` to restore\n the default behavior of 0.14.1 for backward compatibility. By\n :user:`Hamzeh Alsalhi `.\n\n- Fixed the :class:`cluster.KMeans` stopping criterion that prevented early\n convergence detection. By Edward Raff and `Gael Varoquaux`_.\n\n- Fixed the behavior of :class:`multiclass.OneVsOneClassifier`\n in case of ties at the per-class vote level by computing the correct\n per-class sum of prediction scores. By `Andreas M\u00fcller`_.\n\n- Made :func:`cross_validation.cross_val_score` and\n :class:`grid_search.GridSearchCV` accept Python lists as input data.\n This is especially useful for cross-validation and model selection of\n text processing pipelines. By `Andreas M\u00fcller`_.\n\n- Fixed data input checks of most estimators to accept input data that\n implements the NumPy ``__array__`` protocol. This is the case\n for ``pandas.Series`` and ``pandas.DataFrame`` in recent versions of\n pandas. By `Gael Varoquaux`_.\n\n- Fixed a regression for :class:`linear_model.SGDClassifier` with\n ``class_weight=\"auto\"`` on data with non-contiguous labels. By\n `Olivier Grisel`_.\n\n.. 
_changes_0_15:\n\nVersion 0.15\n============\n\n**July 15, 2014**\n\nHighlights\n-----------\n\n- Many speed and memory improvements all across the code\n\n- Huge speed and memory improvements to random forests (and extra\n trees) that also benefit better from parallel computing.\n\n- Incremental fit to :class:`BernoulliRBM `\n\n- Added :class:`cluster.AgglomerativeClustering` for hierarchical\n agglomerative clustering with average linkage, complete linkage and\n ward strategies.\n\n- Added :class:`linear_model.RANSACRegressor` for robust regression\n models.\n\n- Added dimensionality reduction with :class:`manifold.TSNE` which can be\n used to visualize high-dimensional data.\nChangelog\n---------\n\nNew features\n............\n\n- Added :class:`ensemble.BaggingClassifier` and\n :class:`ensemble.BaggingRegressor` meta-estimators for ensembling\n any kind of base estimator. See the :ref:`Bagging ` section of\n the user guide for details and examples. By `Gilles Louppe`_.\n\n- New unsupervised feature selection algorithm\n :class:`feature_selection.VarianceThreshold`, by `Lars Buitinck`_.\n\n- Added :class:`linear_model.RANSACRegressor` meta-estimator for the robust\n fitting of regression models. By :user:`Johannes Sch\u00f6nberger `.\n\n- Added :class:`cluster.AgglomerativeClustering` for hierarchical\n agglomerative clustering with average linkage, complete linkage and\n ward strategies, by `Nelle Varoquaux`_ and `Gael Varoquaux`_.\n\n- Shorthand constructors :func:`pipeline.make_pipeline` and\n :func:`pipeline.make_union` were added by `Lars Buitinck`_.\n\n- Shuffle option for :class:`cross_validation.StratifiedKFold`.\n By :user:`Jeffrey Blackburne `.\n\n- Incremental learning (``partial_fit``) for Gaussian Naive Bayes by\n Imran Haque.\n\n- Added ``partial_fit`` to :class:`BernoulliRBM\n `\n By :user:`Danny Sullivan `.\n\n- Added :func:`learning_curve ` utility to\n chart performance with respect to training size. 
See\n :ref:`sphx_glr_auto_examples_model_selection_plot_learning_curve.py`. By Alexander Fabisch.\n\n- Add positive option in :class:`LassoCV ` and\n :class:`ElasticNetCV `.\n By Brian Wignall and `Alexandre Gramfort`_.\n\n- Added :class:`linear_model.MultiTaskElasticNetCV` and\n :class:`linear_model.MultiTaskLassoCV`. By `Manoj Kumar`_.\n\n- Added :class:`manifold.TSNE`. By Alexander Fabisch.\n\nEnhancements\n............\n\n- Add sparse input support to :class:`ensemble.AdaBoostClassifier` and\n :class:`ensemble.AdaBoostRegressor` meta-estimators.\n By :user:`Hamzeh Alsalhi `.\n\n- Memory improvements of decision trees, by `Arnaud Joly`_.\n\n- Decision trees can now be built in best-first manner by using ``max_leaf_nodes``\n as the stopping criterion. Refactored the tree code to use either a\n stack or a priority queue for tree building.\n By `Peter Prettenhofer`_ and `Gilles Louppe`_.\n\n- Decision trees can now be fitted on fortran- and c-style arrays, and\n non-contiguous arrays without the need to make a copy.\n If the input array has a different dtype than ``np.float32``, a\n fortran-style copy will be made since fortran-style memory layout has speed\n advantages. By `Peter Prettenhofer`_ and `Gilles Louppe`_.\n\n- Speed improvement of regression trees by optimizing\n the computation of the mean square error criterion. This led\n to speed improvement of the tree, forest and gradient boosting tree\n modules. By `Arnaud Joly`_\n\n- The ``img_to_graph`` and ``grid_to_graph`` functions in\n :mod:`sklearn.feature_extraction.image` now return ``np.ndarray``\n instead of ``np.matrix`` when ``return_as=np.ndarray``. See the\n Notes section for more information on compatibility.\n\n- Changed the internal storage of decision trees to use a struct array.\n This fixed some small bugs, while improving code and providing a small\n speed gain. 
By `Joel Nothman`_.\n\n- Reduce memory usage and overhead when fitting and predicting with forests\n of randomized trees in parallel with ``n_jobs != 1`` by leveraging new\n threading backend of joblib 0.8 and releasing the GIL in the tree fitting\n Cython code. By `Olivier Grisel`_ and `Gilles Louppe`_.\n\n- Speed improvement of the :mod:`sklearn.ensemble.gradient_boosting` module.\n By `Gilles Louppe`_ and `Peter Prettenhofer`_.\n\n- Various enhancements to the :mod:`sklearn.ensemble.gradient_boosting`\n module: a ``warm_start`` argument to fit additional trees,\n a ``max_leaf_nodes`` argument to fit GBM style trees,\n a ``monitor`` fit argument to inspect the estimator during training, and\n refactoring of the verbose code. By `Peter Prettenhofer`_.\n\n- Faster :class:`sklearn.ensemble.ExtraTrees` by caching feature values.\n By `Arnaud Joly`_.\n\n- Faster depth-based tree building algorithm such as decision tree,\n random forest, extra trees or gradient tree boosting (with depth based\n growing strategy) by avoiding trying to split on found constant features\n in the sample subset. By `Arnaud Joly`_.\n\n- Add ``min_weight_fraction_leaf`` pre-pruning parameter to tree-based\n methods: the minimum weighted fraction of the input samples required to be\n at a leaf node. By `Noel Dawe`_.\n\n- Added :func:`metrics.pairwise_distances_argmin_min`, by Philippe Gervais.\n\n- Added predict method to :class:`cluster.AffinityPropagation` and\n :class:`cluster.MeanShift`, by `Mathieu Blondel`_.\n\n- Vector and matrix multiplications have been optimised throughout the\n library by `Denis Engemann`_, and `Alexandre Gramfort`_.\n In particular, they should take less memory with older NumPy versions\n (prior to 1.7.2).\n\n- Precision-recall and ROC examples now use train_test_split, and have more\n explanation of why these metrics are useful. 
By `Kyle Kastner`_\n\n- The training algorithm for :class:`decomposition.NMF` is faster for\n sparse matrices and has much lower memory complexity, meaning it will\n scale up gracefully to large datasets. By `Lars Buitinck`_.\n\n- Added svd_method option with default value to \"randomized\" to\n :class:`decomposition.FactorAnalysis` to save memory and\n significantly speedup computation by `Denis Engemann`_, and\n `Alexandre Gramfort`_.\n\n- Changed :class:`cross_validation.StratifiedKFold` to try and\n preserve as much of the original ordering of samples as possible so as\n not to hide overfitting on datasets with a non-negligible level of\n samples dependency.\n By `Daniel Nouri`_ and `Olivier Grisel`_.\n\n- Add multi-output support to :class:`gaussian_process.GaussianProcess`\n by John Novak.\n\n- Support for precomputed distance matrices in nearest neighbor estimators\n by `Robert Layton`_ and `Joel Nothman`_.\n\n- Norm computations optimized for NumPy 1.6 and later versions by\n `Lars Buitinck`_. In particular, the k-means algorithm no longer\n needs a temporary data structure the size of its input.\n\n- :class:`dummy.DummyClassifier` can now be used to predict a constant\n output value. By `Manoj Kumar`_.\n\n- :class:`dummy.DummyRegressor` has now a strategy parameter which allows\n to predict the mean, the median of the training set or a constant\n output value. 
By :user:`Maheshakya Wijewardena `.\n\n- Multi-label classification output in multilabel indicator format\n is now supported by :func:`metrics.roc_auc_score` and\n :func:`metrics.average_precision_score` by `Arnaud Joly`_.\n\n- Significant performance improvements (more than 100x speedup for\n large problems) in :class:`isotonic.IsotonicRegression` by\n `Andrew Tulloch`_.\n\n- Speed and memory usage improvements to the SGD algorithm for linear\n models: it now uses threads, not separate processes, when ``n_jobs>1``.\n By `Lars Buitinck`_.\n\n- Grid search and cross validation allow NaNs in the input arrays so that\n preprocessors such as :class:`preprocessing.Imputer\n ` can be trained within the cross validation loop,\n avoiding potentially skewed results.\n\n- Ridge regression can now deal with sample weights in feature space\n (only sample space until then). By :user:`Michael Eickenberg `.\n Both solutions are provided by the Cholesky solver.\n\n- Several classification and regression metrics now support weighted\n samples with the new ``sample_weight`` argument:\n :func:`metrics.accuracy_score`,\n :func:`metrics.zero_one_loss`,\n :func:`metrics.precision_score`,\n :func:`metrics.average_precision_score`,\n :func:`metrics.f1_score`,\n :func:`metrics.fbeta_score`,\n :func:`metrics.recall_score`,\n :func:`metrics.roc_auc_score`,\n :func:`metrics.explained_variance_score`,\n :func:`metrics.mean_squared_error`,\n :func:`metrics.mean_absolute_error`,\n :func:`metrics.r2_score`.\n By `Noel Dawe`_.\n\n- Speed up of the sample generator\n :func:`datasets.make_multilabel_classification`. 
By `Joel Nothman`_.\n\nDocumentation improvements\n...........................\n\n- The :ref:`Working With Text Data ` tutorial\n has now been worked in to the main documentation's tutorial section.\n Includes exercises and skeletons for tutorial presentation.\n Original tutorial created by several authors including\n `Olivier Grisel`_, Lars Buitinck and many others.\n Tutorial integration into the scikit-learn documentation\n by `Jaques Grobler`_\n\n- Added :ref:`Computational Performance `\n documentation. Discussion and examples of prediction latency / throughput\n and different factors that have influence over speed. Additional tips for\n building faster models and choosing a relevant compromise between speed\n and predictive power.\n By :user:`Eustache Diemert `.\n\nBug fixes\n.........\n\n- Fixed bug in :class:`decomposition.MiniBatchDictionaryLearning` :\n ``partial_fit`` was not working properly.\n\n- Fixed bug in :class:`linear_model.stochastic_gradient` :\n ``l1_ratio`` was used as ``(1.0 - l1_ratio)`` .\n\n- Fixed bug in :class:`multiclass.OneVsOneClassifier` with string\n labels\n\n- Fixed a bug in :class:`LassoCV ` and\n :class:`ElasticNetCV `: they would not\n pre-compute the Gram matrix with ``precompute=True`` or\n ``precompute=\"auto\"`` and ``n_samples > n_features``. 
By `Manoj Kumar`_.\n\n- Fixed incorrect estimation of the degrees of freedom in\n :func:`feature_selection.f_regression` when variates are not centered.\n By :user:`Virgile Fritsch `.\n\n- Fixed a race condition in parallel processing with\n ``pre_dispatch != \"all\"`` (for instance, in ``cross_val_score``).\n By `Olivier Grisel`_.\n\n- Raise error in :class:`cluster.FeatureAgglomeration` and\n :class:`cluster.WardAgglomeration` when no samples are given,\n rather than returning a meaningless clustering.\n\n- Fixed bug in :class:`gradient_boosting.GradientBoostingRegressor` with\n ``loss='huber'``: ``gamma`` might not have been initialized.\n\n- Fixed feature importances as computed with a forest of randomized trees\n when fit with ``sample_weight != None`` and/or with ``bootstrap=True``.\n By `Gilles Louppe`_.\n\nAPI changes summary\n-------------------\n\n- :mod:`sklearn.hmm` is deprecated. Its removal is planned\n for the 0.17 release.\n\n- Use of :class:`covariance.EllipticEnvelop` has now been removed after\n deprecation.\n Please use :class:`covariance.EllipticEnvelope` instead.\n\n- :class:`cluster.Ward` is deprecated. Use\n :class:`cluster.AgglomerativeClustering` instead.\n\n- :class:`cluster.WardClustering` is deprecated. Use\n :class:`cluster.AgglomerativeClustering` instead.\n\n- :class:`cross_validation.Bootstrap` is deprecated.\n :class:`cross_validation.KFold` or\n :class:`cross_validation.ShuffleSplit` are recommended instead.\n\n- Direct support for the sequence of sequences (or list of lists) multilabel\n format is deprecated. To convert to and from the supported binary\n indicator matrix format, use\n :class:`MultiLabelBinarizer `.\n By `Joel Nothman`_.\n\n- Add score method to :class:`PCA ` following the model of\n probabilistic PCA and deprecate\n :class:`ProbabilisticPCA ` model whose\n score implementation is not correct. The computation now also exploits the\n matrix inversion lemma for faster computation. 
By `Alexandre Gramfort`_.\n\n- The score method of :class:`FactorAnalysis `\n now returns the average log-likelihood of the samples. Use score_samples\n to get log-likelihood of each sample. By `Alexandre Gramfort`_.\n\n- Generating boolean masks (the setting ``indices=False``)\n from cross-validation generators is deprecated.\n Support for masks will be removed in 0.17.\n The generators have produced arrays of indices by default since 0.10.\n By `Joel Nothman`_.\n\n- 1-d arrays containing strings with ``dtype=object`` (as used in Pandas)\n are now considered valid classification targets. This fixes a regression\n from version 0.13 in some classifiers. By `Joel Nothman`_.\n\n- Fix wrong ``explained_variance_ratio_`` attribute in\n :class:`RandomizedPCA `.\n By `Alexandre Gramfort`_.\n\n- Fit alphas for each ``l1_ratio`` instead of ``mean_l1_ratio`` in\n :class:`linear_model.ElasticNetCV` and :class:`linear_model.LassoCV`.\n This changes the shape of ``alphas_`` from ``(n_alphas,)`` to\n ``(n_l1_ratio, n_alphas)`` if the ``l1_ratio`` provided is a 1-D array like\n object of length greater than one.\n By `Manoj Kumar`_.\n\n- Fix :class:`linear_model.ElasticNetCV` and :class:`linear_model.LassoCV`\n when fitting intercept and input data is sparse. The automatic grid\n of alphas was not computed correctly and the scaling with normalize\n was wrong. By `Manoj Kumar`_.\n\n- Fix wrong maximal number of features drawn (``max_features``) at each split\n for decision trees, random forests and gradient tree boosting.\n Previously, the count for the number of drawn features started only after\n one non-constant feature in the split. This bug fix will affect\n computational and generalization performance of those algorithms in the\n presence of constant features. 
To get back the previous generalization\n performance, you should modify the value of ``max_features``.\n By `Arnaud Joly`_.\n\n- Fix wrong maximal number of features drawn (``max_features``) at each split\n for :class:`ensemble.ExtraTreesClassifier` and\n :class:`ensemble.ExtraTreesRegressor`. Previously, only non-constant\n features in the split were counted as drawn. Now constant features are\n counted as drawn. Furthermore, at least one feature must be non-constant\n in order to make a valid split. This bug fix will affect\n computational and generalization performance of extra trees in the\n presence of constant features. To get back the previous generalization\n performance, you should modify the value of ``max_features``.\n By `Arnaud Joly`_.\n\n- Fix :func:`utils.compute_class_weight` when ``class_weight==\"auto\"``.\n Previously it was broken for input of non-integer ``dtype`` and the\n weighted array that was returned was wrong. By `Manoj Kumar`_.\n\n- Fix :class:`cross_validation.Bootstrap` to raise ``ValueError``\n when ``n_train + n_test > n``. 
By :user:`Ronald Phlypo `.\nPeople\n------\n\nList of contributors for release 0.15 by number of commits.\n\n* 312 Olivier Grisel\n* 275 Lars Buitinck\n* 221 Gael Varoquaux\n* 148 Arnaud Joly\n* 134 Johannes Sch\u00f6nberger\n* 119 Gilles Louppe\n* 113 Joel Nothman\n* 111 Alexandre Gramfort\n* 95 Jaques Grobler\n* 89 Denis Engemann\n* 83 Peter Prettenhofer\n* 83 Alexander Fabisch\n* 62 Mathieu Blondel\n* 60 Eustache Diemert\n* 60 Nelle Varoquaux\n* 49 Michael Bommarito\n* 45 Manoj-Kumar-S\n* 28 Kyle Kastner\n* 26 Andreas Mueller\n* 22 Noel Dawe\n* 21 Maheshakya Wijewardena\n* 21 Brooke Osborn\n* 21 Hamzeh Alsalhi\n* 21 Jake VanderPlas\n* 21 Philippe Gervais\n* 19 Bala Subrahmanyam Varanasi\n* 12 Ronald Phlypo\n* 10 Mikhail Korobov\n* 8 Thomas Unterthiner\n* 8 Jeffrey Blackburne\n* 8 eltermann\n* 8 bwignall\n* 7 Ankit Agrawal\n* 7 CJ Carey\n* 6 Daniel Nouri\n* 6 Chen Liu\n* 6 Michael Eickenberg\n* 6 ugurthemaster\n* 5 Aaron Schumacher\n* 5 Baptiste Lagarde\n* 5 Rajat Khanduja\n* 5 Robert McGibbon\n* 5 Sergio Pascual\n* 4 Alexis Metaireau\n* 4 Ignacio Rossi\n* 4 Virgile Fritsch\n* 4 Sebastian S\u00e4ger\n* 4 Ilambharathi Kanniah\n* 4 sdenton4\n* 4 Robert Layton\n* 4 Alyssa\n* 4 Amos Waterland\n* 3 Andrew Tulloch\n* 3 murad\n* 3 Steven Maude\n* 3 Karol Pysniak\n* 3 Jacques Kvam\n* 3 cgohlke\n* 3 cjlin\n* 3 Michael Becker\n* 3 hamzeh\n* 3 Eric Jacobsen\n* 3 john collins\n* 3 kaushik94\n* 3 Erwin Marsi\n* 2 csytracy\n* 2 LK\n* 2 Vlad Niculae\n* 2 Laurent Direr\n* 2 Erik Shilts\n* 2 Raul Garreta\n* 2 Yoshiki V\u00e1zquez Baeza\n* 2 Yung Siang Liau\n* 2 abhishek thakur\n* 2 James Yu\n* 2 Rohit Sivaprasad\n* 2 Roland Szabo\n* 2 amormachine\n* 2 Alexis Mignon\n* 2 Oscar Carlsson\n* 2 Nantas Nardelli\n* 2 jess010\n* 2 kowalski87\n* 2 Andrew Clegg\n* 2 Federico Vaggi\n* 2 Simon Frid\n* 2 F\u00e9lix-Antoine Fortin\n* 1 Ralf Gommers\n* 1 t-aft\n* 1 Ronan Amicel\n* 1 Rupesh Kumar Srivastava\n* 1 Ryan Wang\n* 1 Samuel Charron\n* 1 Samuel St-Jean\n* 1 Fabian Pedregosa\n* 1 Skipper 
Seabold\n* 1 Stefan Walk\n* 1 Stefan van der Walt\n* 1 Stephan Hoyer\n* 1 Allen Riddell\n* 1 Valentin Haenel\n* 1 Vijay Ramesh\n* 1 Will Myers\n* 1 Yaroslav Halchenko\n* 1 Yoni Ben-Meshulam\n* 1 Yury V. Zaytsev\n* 1 adrinjalali\n* 1 ai8rahim\n* 1 alemagnani\n* 1 alex\n* 1 benjamin wilson\n* 1 chalmerlowe\n* 1 dzikie dro\u017cd\u017ce\n* 1 jamestwebber\n* 1 matrixorz\n* 1 popo\n* 1 samuela\n* 1 Fran\u00e7ois Boulogne\n* 1 Alexander Measure\n* 1 Ethan White\n* 1 Guilherme Trein\n* 1 Hendrik Heuer\n* 1 IvicaJovic\n* 1 Jan Hendrik Metzen\n* 1 Jean Michel Rouly\n* 1 Eduardo Ari\u00f1o de la Rubia\n* 1 Jelle Zijlstra\n* 1 Eddy L O Jansson\n* 1 Denis\n* 1 John\n* 1 John Schmidt\n* 1 Jorge Ca\u00f1ardo Alastuey\n* 1 Joseph Perla\n* 1 Joshua Vredevoogd\n* 1 Jos\u00e9 Ricardo\n* 1 Julien Miotte\n* 1 Kemal Eren\n* 1 Kenta Sato\n* 1 David Cournapeau\n* 1 Kyle Kelley\n* 1 Daniele Medri\n* 1 Laurent Luce\n* 1 Laurent Pierron\n* 1 Luis Pedro Coelho\n* 1 DanielWeitzenfeld\n* 1 Craig Thompson\n* 1 Chyi-Kwei Yau\n* 1 Matthew Brett\n* 1 Matthias Feurer\n* 1 Max Linke\n* 1 Chris Filo Gorgolewski\n* 1 Charles Earl\n* 1 Michael Hanke\n* 1 Michele Orr\u00f9\n* 1 Bryan Lunt\n* 1 Brian Kearns\n* 1 Paul Butler\n* 1 Pawe\u0142 Mandera\n* 1 Peter\n* 1 Andrew Ash\n* 1 Pietro Zambelli\n* 1 staubda\n.. _changes_0_14:\n\nVersion 0.14\n===============\n\n**August 7, 2013**\n\nChangelog\n---------\n\n- Missing values with sparse and dense matrices can be imputed with the\n transformer :class:`preprocessing.Imputer` by `Nicolas Tr\u00e9segnie`_.\n\n- The core implementation of decisions trees has been rewritten from\n scratch, allowing for faster tree induction and lower memory\n consumption in all tree-based estimators. By `Gilles Louppe`_.\n\n- Added :class:`ensemble.AdaBoostClassifier` and\n :class:`ensemble.AdaBoostRegressor`, by `Noel Dawe`_ and\n `Gilles Louppe`_. 
See the :ref:`AdaBoost ` section of the user\n guide for details and examples.\n\n- Added :class:`grid_search.RandomizedSearchCV` and\n :class:`grid_search.ParameterSampler` for randomized hyperparameter\n optimization. By `Andreas M\u00fcller`_.\n\n- Added :ref:`biclustering ` algorithms\n (:class:`sklearn.cluster.bicluster.SpectralCoclustering` and\n :class:`sklearn.cluster.bicluster.SpectralBiclustering`), data\n generation methods (:func:`sklearn.datasets.make_biclusters` and\n :func:`sklearn.datasets.make_checkerboard`), and scoring metrics\n (:func:`sklearn.metrics.consensus_score`). By `Kemal Eren`_.\n\n- Added :ref:`Restricted Boltzmann Machines `\n (:class:`neural_network.BernoulliRBM`). By `Yann Dauphin`_.\n\n- Python 3 support by :user:`Justin Vincent `, `Lars Buitinck`_,\n :user:`Subhodeep Moitra ` and `Olivier Grisel`_. All tests now pass under\n Python 3.3.\n\n- Ability to pass one penalty (alpha value) per target in\n :class:`linear_model.Ridge`, by @eickenberg and `Mathieu Blondel`_.\n\n- Fixed :mod:`sklearn.linear_model.stochastic_gradient.py` L2 regularization\n issue (minor practical significance).\n By :user:`Norbert Crombach ` and `Mathieu Blondel`_ .\n\n- Added an interactive version of `Andreas M\u00fcller`_'s\n `Machine Learning Cheat Sheet (for scikit-learn)\n `_\n to the documentation. See :ref:`Choosing the right estimator `.\n By `Jaques Grobler`_.\n\n- :class:`grid_search.GridSearchCV` and\n :func:`cross_validation.cross_val_score` now support the use of advanced\n scoring function such as area under the ROC curve and f-beta scores.\n See :ref:`scoring_parameter` for details. 
By `Andreas M\u00fcller`_\n and `Lars Buitinck`_.\n Passing a function from :mod:`sklearn.metrics` as ``score_func`` is\n deprecated.\n\n- Multi-label classification output is now supported by\n :func:`metrics.accuracy_score`, :func:`metrics.zero_one_loss`,\n :func:`metrics.f1_score`, :func:`metrics.fbeta_score`,\n :func:`metrics.classification_report`,\n :func:`metrics.precision_score` and :func:`metrics.recall_score`\n by `Arnaud Joly`_.\n\n- Two new metrics :func:`metrics.hamming_loss` and\n :func:`metrics.jaccard_similarity_score`\n are added with multi-label support by `Arnaud Joly`_.\n\n- Speed and memory usage improvements in\n :class:`feature_extraction.text.CountVectorizer` and\n :class:`feature_extraction.text.TfidfVectorizer`,\n by Jochen Wersd\u00f6rfer and Roman Sinayev.\n\n- The ``min_df`` parameter in\n :class:`feature_extraction.text.CountVectorizer` and\n :class:`feature_extraction.text.TfidfVectorizer`, which used to be 2,\n has been reset to 1 to avoid unpleasant surprises (empty vocabularies)\n for novice users who try it out on tiny document collections.\n A value of at least 2 is still recommended for practical use.\n\n- :class:`svm.LinearSVC`, :class:`linear_model.SGDClassifier` and\n :class:`linear_model.SGDRegressor` now have a ``sparsify`` method that\n converts their ``coef_`` into a sparse matrix, meaning stored models\n trained using these estimators can be made much more compact.\n\n- :class:`linear_model.SGDClassifier` now produces multiclass probability\n estimates when trained under log loss or modified Huber loss.\n\n- Hyperlinks to documentation in example code on the website by\n :user:`Martin Luessi `.\n\n- Fixed bug in :class:`preprocessing.MinMaxScaler` causing incorrect scaling\n of the features for non-default ``feature_range`` settings. 
By `Andreas\n M\u00fcller`_.\n\n- ``max_features`` in :class:`tree.DecisionTreeClassifier`,\n :class:`tree.DecisionTreeRegressor` and all derived ensemble estimators\n now supports percentage values. By `Gilles Louppe`_.\n\n- Performance improvements in :class:`isotonic.IsotonicRegression` by\n `Nelle Varoquaux`_.\n\n- :func:`metrics.accuracy_score` has an option normalize to return\n the fraction or the number of correctly classified sample\n by `Arnaud Joly`_.\n\n- Added :func:`metrics.log_loss` that computes log loss, aka cross-entropy\n loss. By Jochen Wersd\u00f6rfer and `Lars Buitinck`_.\n\n- A bug that caused :class:`ensemble.AdaBoostClassifier`'s to output\n incorrect probabilities has been fixed.\n\n- Feature selectors now share a mixin providing consistent ``transform``,\n ``inverse_transform`` and ``get_support`` methods. By `Joel Nothman`_.\n\n- A fitted :class:`grid_search.GridSearchCV` or\n :class:`grid_search.RandomizedSearchCV` can now generally be pickled.\n By `Joel Nothman`_.\n\n- Refactored and vectorized implementation of :func:`metrics.roc_curve`\n and :func:`metrics.precision_recall_curve`. By `Joel Nothman`_.\n\n- The new estimator :class:`sklearn.decomposition.TruncatedSVD`\n performs dimensionality reduction using SVD on sparse matrices,\n and can be used for latent semantic analysis (LSA).\n By `Lars Buitinck`_.\n\n- Added self-contained example of out-of-core learning on text data\n :ref:`sphx_glr_auto_examples_applications_plot_out_of_core_classification.py`.\n By :user:`Eustache Diemert `.\n\n- The default number of components for\n :class:`sklearn.decomposition.RandomizedPCA` is now correctly documented\n to be ``n_features``. This was the default behavior, so programs using it\n will continue to work as they did.\n\n- :class:`sklearn.cluster.KMeans` now fits several orders of magnitude\n faster on sparse data (the speedup depends on the sparsity). 
By\n `Lars Buitinck`_.\n\n- Reduce memory footprint of FastICA by `Denis Engemann`_ and\n `Alexandre Gramfort`_.\n\n- Verbose output in :mod:`sklearn.ensemble.gradient_boosting` now uses\n a column format and prints progress in decreasing frequency.\n It also shows the remaining time. By `Peter Prettenhofer`_.\n\n- :mod:`sklearn.ensemble.gradient_boosting` provides out-of-bag improvement\n :attr:`~sklearn.ensemble.GradientBoostingRegressor.oob_improvement_`\n rather than the OOB score for model selection. An example that shows\n how to use OOB estimates to select the number of trees was added.\n By `Peter Prettenhofer`_.\n\n- Most metrics now support string labels for multiclass classification\n by `Arnaud Joly`_ and `Lars Buitinck`_.\n\n- New OrthogonalMatchingPursuitCV class by `Alexandre Gramfort`_\n and `Vlad Niculae`_.\n\n- Fixed a bug in :class:`sklearn.covariance.GraphLassoCV`: the\n 'alphas' parameter now works as expected when given a list of\n values. By Philippe Gervais.\n\n- Fixed an important bug in :class:`sklearn.covariance.GraphLassoCV`\n that prevented all folds provided by a CV object to be used (only\n the first 3 were used). When providing a CV object, execution\n time may thus increase significantly compared to the previous\n version (bug results are correct now). 
By Philippe Gervais.\n\n- :class:`cross_validation.cross_val_score` and the :mod:`grid_search`\n module is now tested with multi-output data by `Arnaud Joly`_.\n\n- :func:`datasets.make_multilabel_classification` can now return\n the output in label indicator multilabel format by `Arnaud Joly`_.\n\n- K-nearest neighbors, :class:`neighbors.KNeighborsRegressor`\n and :class:`neighbors.RadiusNeighborsRegressor`,\n and radius neighbors, :class:`neighbors.RadiusNeighborsRegressor` and\n :class:`neighbors.RadiusNeighborsClassifier` support multioutput data\n by `Arnaud Joly`_.\n\n- Random state in LibSVM-based estimators (:class:`svm.SVC`, :class:`NuSVC`,\n :class:`OneClassSVM`, :class:`svm.SVR`, :class:`svm.NuSVR`) can now be\n controlled. This is useful to ensure consistency in the probability\n estimates for the classifiers trained with ``probability=True``. By\n `Vlad Niculae`_.\n\n- Out-of-core learning support for discrete naive Bayes classifiers\n :class:`sklearn.naive_bayes.MultinomialNB` and\n :class:`sklearn.naive_bayes.BernoulliNB` by adding the ``partial_fit``\n method by `Olivier Grisel`_.\n\n- New website design and navigation by `Gilles Louppe`_, `Nelle Varoquaux`_,\n Vincent Michel and `Andreas M\u00fcller`_.\n\n- Improved documentation on :ref:`multi-class, multi-label and multi-output\n classification ` by `Yannick Schwartz`_ and `Arnaud Joly`_.\n\n- Better input and error handling in the :mod:`metrics` module by\n `Arnaud Joly`_ and `Joel Nothman`_.\n\n- Speed optimization of the :mod:`hmm` module by :user:`Mikhail Korobov `\n\n- Significant speed improvements for :class:`sklearn.cluster.DBSCAN`\n by `cleverless `_\nAPI changes summary\n-------------------\n\n- The :func:`auc_score` was renamed :func:`roc_auc_score`.\n\n- Testing scikit-learn with ``sklearn.test()`` is deprecated. 
Use\n ``nosetests sklearn`` from the command line.\n\n- Feature importances in :class:`tree.DecisionTreeClassifier`,\n :class:`tree.DecisionTreeRegressor` and all derived ensemble estimators\n are now computed on the fly when accessing the ``feature_importances_``\n attribute. Setting ``compute_importances=True`` is no longer required.\n By `Gilles Louppe`_.\n\n- :class:`linear_model.lasso_path` and\n :class:`linear_model.enet_path` can return its results in the same\n format as that of :class:`linear_model.lars_path`. This is done by\n setting the ``return_models`` parameter to ``False``. By\n `Jaques Grobler`_ and `Alexandre Gramfort`_\n\n- :class:`grid_search.IterGrid` was renamed to\n :class:`grid_search.ParameterGrid`.\n\n- Fixed bug in :class:`KFold` causing imperfect class balance in some\n cases. By `Alexandre Gramfort`_ and Tadej Jane\u017e.\n\n- :class:`sklearn.neighbors.BallTree` has been refactored, and a\n :class:`sklearn.neighbors.KDTree` has been\n added which shares the same interface. The Ball Tree now works with\n a wide variety of distance metrics. Both classes have many new\n methods, including single-tree and dual-tree queries, breadth-first\n and depth-first searching, and more advanced queries such as\n kernel density estimation and 2-point correlation functions.\n By `Jake Vanderplas`_\n\n- Support for scipy.spatial.cKDTree within neighbors queries has been\n removed, and the functionality replaced with the new :class:`KDTree`\n class.\n\n- :class:`sklearn.neighbors.KernelDensity` has been added, which performs\n efficient kernel density estimation with a variety of kernels.\n\n- :class:`sklearn.decomposition.KernelPCA` now always returns output with\n ``n_components`` components, unless the new parameter ``remove_zero_eig``\n is set to ``True``. 
This new behavior is consistent with the way\n kernel PCA was always documented; previously, the removal of components\n with zero eigenvalues was tacitly performed on all data.\n\n- ``gcv_mode=\"auto\"`` no longer tries to perform SVD on a densified\n sparse matrix in :class:`sklearn.linear_model.RidgeCV`.\n\n- Sparse matrix support in :class:`sklearn.decomposition.RandomizedPCA`\n is now deprecated in favor of the new ``TruncatedSVD``.\n\n- :class:`cross_validation.KFold` and\n :class:`cross_validation.StratifiedKFold` now enforce `n_folds >= 2`\n otherwise a ``ValueError`` is raised. By `Olivier Grisel`_.\n\n- :func:`datasets.load_files`'s ``charset`` and ``charset_errors``\n parameters were renamed ``encoding`` and ``decode_errors``.\n\n- Attribute ``oob_score_`` in :class:`sklearn.ensemble.GradientBoostingRegressor`\n and :class:`sklearn.ensemble.GradientBoostingClassifier`\n is deprecated and has been replaced by ``oob_improvement_`` .\n\n- Attributes in OrthogonalMatchingPursuit have been deprecated\n (copy_X, Gram, ...) and precompute_gram renamed precompute\n for consistency. See #2224.\n\n- :class:`sklearn.preprocessing.StandardScaler` now converts integer input\n to float, and raises a warning. Previously it rounded for dense integer\n input.\n\n- :class:`sklearn.multiclass.OneVsRestClassifier` now has a\n ``decision_function`` method. 
This will return the distance of each\n sample from the decision boundary for each class, as long as the\n underlying estimators implement the ``decision_function`` method.\n By `Kyle Kastner`_.\n\n- Better input validation, warning on unexpected shapes for y.\n\nPeople\n------\nList of contributors for release 0.14 by number of commits.\n* 277 Gilles Louppe\n * 245 Lars Buitinck\n * 187 Andreas Mueller\n * 124 Arnaud Joly\n * 112 Jaques Grobler\n * 109 Gael Varoquaux\n * 107 Olivier Grisel\n * 102 Noel Dawe\n * 99 Kemal Eren\n * 79 Joel Nothman\n * 75 Jake VanderPlas\n * 73 Nelle Varoquaux\n * 71 Vlad Niculae\n * 65 Peter Prettenhofer\n * 64 Alexandre Gramfort\n * 54 Mathieu Blondel\n * 38 Nicolas Tr\u00e9segnie\n * 35 eustache\n * 27 Denis Engemann\n * 25 Yann N. Dauphin\n * 19 Justin Vincent\n * 17 Robert Layton\n * 15 Doug Coleman\n * 14 Michael Eickenberg\n * 13 Robert Marchman\n * 11 Fabian Pedregosa\n * 11 Philippe Gervais\n * 10 Jim Holmstr\u00f6m\n * 10 Tadej Jane\u017e\n * 10 syhw\n * 9 Mikhail Korobov\n * 9 Steven De Gryze\n * 8 sergeyf\n * 7 Ben Root\n * 7 Hrishikesh Huilgolkar\n * 6 Kyle Kastner\n * 6 Martin Luessi\n * 6 Rob Speer\n * 5 Federico Vaggi\n * 5 Raul Garreta\n * 5 Rob Zinkov\n * 4 Ken Geis\n * 3 A. 
Flaxman\n * 3 Denton Cockburn\n * 3 Dougal Sutherland\n * 3 Ian Ozsvald\n * 3 Johannes Sch\u00f6nberger\n * 3 Robert McGibbon\n * 3 Roman Sinayev\n * 3 Szabo Roland\n * 2 Diego Molla\n * 2 Imran Haque\n * 2 Jochen Wersd\u00f6rfer\n * 2 Sergey Karayev\n * 2 Yannick Schwartz\n * 2 jamestwebber\n * 1 Abhijeet Kolhe\n * 1 Alexander Fabisch\n * 1 Bastiaan van den Berg\n * 1 Benjamin Peterson\n * 1 Daniel Velkov\n * 1 Fazlul Shahriar\n * 1 Felix Brockherde\n * 1 F\u00e9lix-Antoine Fortin\n * 1 Harikrishnan S\n * 1 Jack Hale\n * 1 JakeMick\n * 1 James McDermott\n * 1 John Benediktsson\n * 1 John Zwinck\n * 1 Joshua Vredevoogd\n * 1 Justin Pati\n * 1 Kevin Hughes\n * 1 Kyle Kelley\n * 1 Matthias Ekman\n * 1 Miroslav Shubernetskiy\n * 1 Naoki Orii\n * 1 Norbert Crombach\n * 1 Rafael Cunha de Almeida\n * 1 Rolando Espinoza La fuente\n * 1 Seamus Abshere\n * 1 Sergey Feldman\n * 1 Sergio Medina\n * 1 Stefano Lattarini\n * 1 Steve Koch\n * 1 Sturla Molden\n * 1 Thomas Jarosch\n * 1 Yaroslav Halchenko\n\n.. 
_changes_0_13_1:\n\nVersion 0.13.1\n==============\n\n**February 23, 2013**\n\nThe 0.13.1 release only fixes some bugs and does not add any new functionality.\n\nChangelog\n---------\n\n- Fixed a testing error caused by the function :func:`cross_validation.train_test_split` being\n interpreted as a test by `Yaroslav Halchenko`_.\n\n- Fixed a bug in the reassignment of small clusters in the :class:`cluster.MiniBatchKMeans`\n by `Gael Varoquaux`_.\n\n- Fixed default value of ``gamma`` in :class:`decomposition.KernelPCA` by `Lars Buitinck`_.\n\n- Updated joblib to ``0.7.0d`` by `Gael Varoquaux`_.\n\n- Fixed scaling of the deviance in :class:`ensemble.GradientBoostingClassifier` by `Peter Prettenhofer`_.\n\n- Better tie-breaking in :class:`multiclass.OneVsOneClassifier` by `Andreas M\u00fcller`_.\n\n- Other small improvements to tests and documentation.\n\nPeople\n------\nList of contributors for release 0.13.1 by number of commits.\n * 16 `Lars Buitinck`_\n * 12 `Andreas M\u00fcller`_\n * 8 `Gael Varoquaux`_\n * 5 Robert Marchman\n * 3 `Peter Prettenhofer`_\n * 2 Hrishikesh Huilgolkar\n * 1 Bastiaan van den Berg\n * 1 Diego Molla\n * 1 `Gilles Louppe`_\n * 1 `Mathieu Blondel`_\n * 1 `Nelle Varoquaux`_\n * 1 Rafael Cunha de Almeida\n * 1 Rolando Espinoza La fuente\n * 1 `Vlad Niculae`_\n * 1 `Yaroslav Halchenko`_\n.. _changes_0_13:\n\nVersion 0.13\n============\n\n**January 21, 2013**\n\nNew Estimator Classes\n---------------------\n\n- :class:`dummy.DummyClassifier` and :class:`dummy.DummyRegressor`, two\n data-independent predictors by `Mathieu Blondel`_. Useful to sanity-check\n your estimators. See :ref:`dummy_estimators` in the user guide.\n Multioutput support added by `Arnaud Joly`_.\n\n- :class:`decomposition.FactorAnalysis`, a transformer implementing the\n classical factor analysis, by `Christian Osendorfer`_ and `Alexandre\n Gramfort`_. 
See :ref:`FA` in the user guide.\n\n- :class:`feature_extraction.FeatureHasher`, a transformer implementing the\n \"hashing trick\" for fast, low-memory feature extraction from string fields\n by `Lars Buitinck`_ and :class:`feature_extraction.text.HashingVectorizer`\n for text documents by `Olivier Grisel`_ See :ref:`feature_hashing` and\n :ref:`hashing_vectorizer` for the documentation and sample usage.\n\n- :class:`pipeline.FeatureUnion`, a transformer that concatenates\n results of several other transformers by `Andreas M\u00fcller`_. See\n :ref:`feature_union` in the user guide.\n\n- :class:`random_projection.GaussianRandomProjection`,\n :class:`random_projection.SparseRandomProjection` and the function\n :func:`random_projection.johnson_lindenstrauss_min_dim`. The first two are\n transformers implementing Gaussian and sparse random projection matrix\n by `Olivier Grisel`_ and `Arnaud Joly`_.\n See :ref:`random_projection` in the user guide.\n\n- :class:`kernel_approximation.Nystroem`, a transformer for approximating\n arbitrary kernels by `Andreas M\u00fcller`_. See\n :ref:`nystroem_kernel_approx` in the user guide.\n\n- :class:`preprocessing.OneHotEncoder`, a transformer that computes binary\n encodings of categorical features by `Andreas M\u00fcller`_. See\n :ref:`preprocessing_categorical_features` in the user guide.\n\n- :class:`linear_model.PassiveAggressiveClassifier` and\n :class:`linear_model.PassiveAggressiveRegressor`, predictors implementing\n an efficient stochastic optimization for linear models by `Rob Zinkov`_ and\n `Mathieu Blondel`_. 
See :ref:`passive_aggressive` in the user\n guide.\n\n- :class:`ensemble.RandomTreesEmbedding`, a transformer for creating high-dimensional\n sparse representations using ensembles of totally random trees by `Andreas M\u00fcller`_.\n See :ref:`random_trees_embedding` in the user guide.\n\n- :class:`manifold.SpectralEmbedding` and function\n :func:`manifold.spectral_embedding`, implementing the \"laplacian\n eigenmaps\" transformation for non-linear dimensionality reduction by Wei\n Li. See :ref:`spectral_embedding` in the user guide.\n\n- :class:`isotonic.IsotonicRegression` by `Fabian Pedregosa`_, `Alexandre Gramfort`_\n and `Nelle Varoquaux`_,\nChangelog\n---------\n\n- :func:`metrics.zero_one_loss` (formerly ``metrics.zero_one``) now has\n option for normalized output that reports the fraction of\n misclassifications, rather than the raw number of misclassifications. By\n Kyle Beauchamp.\n\n- :class:`tree.DecisionTreeClassifier` and all derived ensemble models now\n support sample weighting, by `Noel Dawe`_ and `Gilles Louppe`_.\n\n- Speedup improvement when using bootstrap samples in forests of randomized\n trees, by `Peter Prettenhofer`_ and `Gilles Louppe`_.\n\n- Partial dependence plots for :ref:`gradient_boosting` in\n :func:`ensemble.partial_dependence.partial_dependence` by `Peter\n Prettenhofer`_. See :ref:`sphx_glr_auto_examples_ensemble_plot_partial_dependence.py` for an\n example.\n\n- The table of contents on the website has now been made expandable by\n `Jaques Grobler`_.\n\n- :class:`feature_selection.SelectPercentile` now breaks ties\n deterministically instead of returning all equally ranked features.\n\n- :class:`feature_selection.SelectKBest` and\n :class:`feature_selection.SelectPercentile` are more numerically stable\n since they use scores, rather than p-values, to rank results. 
This means\n that they might sometimes select different features than they did\n previously.\n\n- Ridge regression and ridge classification fitting with ``sparse_cg`` solver\n no longer has quadratic memory complexity, by `Lars Buitinck`_ and\n `Fabian Pedregosa`_.\n\n- Ridge regression and ridge classification now support a new fast solver\n called ``lsqr``, by `Mathieu Blondel`_.\n\n- Speed up of :func:`metrics.precision_recall_curve` by Conrad Lee.\n\n- Added support for reading/writing svmlight files with pairwise\n preference attribute (qid in svmlight file format) in\n :func:`datasets.dump_svmlight_file` and\n :func:`datasets.load_svmlight_file` by `Fabian Pedregosa`_.\n\n- Faster and more robust :func:`metrics.confusion_matrix` and\n :ref:`clustering_evaluation` by Wei Li.\n\n- :func:`cross_validation.cross_val_score` now works with precomputed kernels\n and affinity matrices, by `Andreas M\u00fcller`_.\n\n- LARS algorithm made more numerically stable with heuristics to drop\n regressors too correlated as well as to stop the path when\n numerical noise becomes predominant, by `Gael Varoquaux`_.\n\n- Faster implementation of :func:`metrics.precision_recall_curve` by\n Conrad Lee.\n\n- New kernel :class:`metrics.chi2_kernel` by `Andreas M\u00fcller`_, often used\n in computer vision applications.\n\n- Fix of longstanding bug in :class:`naive_bayes.BernoulliNB` fixed by\n Shaun Jackman.\n\n- Implemented ``predict_proba`` in :class:`multiclass.OneVsRestClassifier`,\n by Andrew Winterman.\n\n- Improve consistency in gradient boosting: estimators\n :class:`ensemble.GradientBoostingRegressor` and\n :class:`ensemble.GradientBoostingClassifier` use the estimator\n :class:`tree.DecisionTreeRegressor` instead of the\n :class:`tree._tree.Tree` data structure by `Arnaud Joly`_.\n\n- Fixed a floating point exception in the :ref:`decision trees `\n module, by Seberg.\n\n- Fix :func:`metrics.roc_curve` fails when y_true has only one class\n by Wei Li.\n\n- Add the 
:func:`metrics.mean_absolute_error` function which computes the\n mean absolute error. The :func:`metrics.mean_squared_error`,\n :func:`metrics.mean_absolute_error` and\n :func:`metrics.r2_score` metrics support multioutput by `Arnaud Joly`_.\n\n- Fixed ``class_weight`` support in :class:`svm.LinearSVC` and\n :class:`linear_model.LogisticRegression` by `Andreas M\u00fcller`_. The meaning\n of ``class_weight`` was reversed as erroneously higher weight meant less\n positives of a given class in earlier releases.\n\n- Improve narrative documentation and consistency in\n :mod:`sklearn.metrics` for regression and classification metrics\n by `Arnaud Joly`_.\n\n- Fixed a bug in :class:`sklearn.svm.SVC` when using csr-matrices with\n unsorted indices by Xinfan Meng and `Andreas M\u00fcller`_.\n\n- :class:`MiniBatchKMeans`: Add random reassignment of cluster centers\n with little observations attached to them, by `Gael Varoquaux`_.\nAPI changes summary\n-------------------\n- Renamed all occurrences of ``n_atoms`` to ``n_components`` for consistency.\n This applies to :class:`decomposition.DictionaryLearning`,\n :class:`decomposition.MiniBatchDictionaryLearning`,\n :func:`decomposition.dict_learning`, :func:`decomposition.dict_learning_online`.\n\n- Renamed all occurrences of ``max_iters`` to ``max_iter`` for consistency.\n This applies to :class:`semi_supervised.LabelPropagation` and\n :class:`semi_supervised.label_propagation.LabelSpreading`.\n\n- Renamed all occurrences of ``learn_rate`` to ``learning_rate`` for\n consistency in :class:`ensemble.BaseGradientBoosting` and\n :class:`ensemble.GradientBoostingRegressor`.\n\n- The module ``sklearn.linear_model.sparse`` is gone. Sparse matrix support\n was already integrated into the \"regular\" linear models.\n\n- :func:`sklearn.metrics.mean_square_error`, which incorrectly returned the\n accumulated error, was removed. 
Use ``mean_squared_error`` instead.\n\n- Passing ``class_weight`` parameters to ``fit`` methods is no longer\n supported. Pass them to estimator constructors instead.\n\n- GMMs no longer have ``decode`` and ``rvs`` methods. Use the ``score``,\n ``predict`` or ``sample`` methods instead.\n\n- The ``solver`` fit option in Ridge regression and classification is now\n deprecated and will be removed in v0.14. Use the constructor option\n instead.\n\n- :class:`feature_extraction.text.DictVectorizer` now returns sparse\n matrices in the CSR format, instead of COO.\n\n- Renamed ``k`` in :class:`cross_validation.KFold` and\n :class:`cross_validation.StratifiedKFold` to ``n_folds``, renamed\n ``n_bootstraps`` to ``n_iter`` in ``cross_validation.Bootstrap``.\n\n- Renamed all occurrences of ``n_iterations`` to ``n_iter`` for consistency.\n This applies to :class:`cross_validation.ShuffleSplit`,\n :class:`cross_validation.StratifiedShuffleSplit`,\n :func:`utils.randomized_range_finder` and :func:`utils.randomized_svd`.\n\n- Replaced ``rho`` in :class:`linear_model.ElasticNet` and\n :class:`linear_model.SGDClassifier` by ``l1_ratio``. 
The ``rho`` parameter\n had different meanings; ``l1_ratio`` was introduced to avoid confusion.\n It has the same meaning as previously ``rho`` in\n :class:`linear_model.ElasticNet` and ``(1-rho)`` in\n :class:`linear_model.SGDClassifier`.\n\n- :class:`linear_model.LassoLars` and :class:`linear_model.Lars` now\n store a list of paths in the case of multiple targets, rather than\n an array of paths.\n\n- The attribute ``gmm`` of :class:`hmm.GMMHMM` was renamed to ``gmm_``\n to adhere more strictly with the API.\n\n- :func:`cluster.spectral_embedding` was moved to\n :func:`manifold.spectral_embedding`.\n\n- Renamed ``eig_tol`` in :func:`manifold.spectral_embedding`,\n :class:`cluster.SpectralClustering` to ``eigen_tol``, renamed ``mode``\n to ``eigen_solver``.\n\n- Renamed ``mode`` in :func:`manifold.spectral_embedding` and\n :class:`cluster.SpectralClustering` to ``eigen_solver``.\n\n- ``classes_`` and ``n_classes_`` attributes of\n :class:`tree.DecisionTreeClassifier` and all derived ensemble models are\n now flat in case of single output problems and nested in case of\n multi-output problems.\n\n- The ``estimators_`` attribute of\n :class:`ensemble.gradient_boosting.GradientBoostingRegressor` and\n :class:`ensemble.gradient_boosting.GradientBoostingClassifier` is now an\n array of :class:'tree.DecisionTreeRegressor'.\n\n- Renamed ``chunk_size`` to ``batch_size`` in\n :class:`decomposition.MiniBatchDictionaryLearning` and\n :class:`decomposition.MiniBatchSparsePCA` for consistency.\n\n- :class:`svm.SVC` and :class:`svm.NuSVC` now provide a ``classes_``\n attribute and support arbitrary dtypes for labels ``y``.\n Also, the dtype returned by ``predict`` now reflects the dtype of\n ``y`` during ``fit`` (used to be ``np.float``).\n\n- Changed default test_size in :func:`cross_validation.train_test_split`\n to None, added possibility to infer ``test_size`` from ``train_size`` in\n :class:`cross_validation.ShuffleSplit` and\n 
:class:`cross_validation.StratifiedShuffleSplit`.\n\n- Renamed function :func:`sklearn.metrics.zero_one` to\n :func:`sklearn.metrics.zero_one_loss`. Be aware that the default behavior\n in :func:`sklearn.metrics.zero_one_loss` is different from\n :func:`sklearn.metrics.zero_one`: ``normalize=False`` is changed to\n ``normalize=True``.\n\n- Renamed function :func:`metrics.zero_one_score` to\n :func:`metrics.accuracy_score`.\n\n- :func:`datasets.make_circles` now has the same number of inner and outer points.\n\n- In the Naive Bayes classifiers, the ``class_prior`` parameter was moved\n from ``fit`` to ``__init__``.\n\nPeople\n------\nList of contributors for release 0.13 by number of commits.\n* 364 `Andreas M\u00fcller`_\n * 143 `Arnaud Joly`_\n * 137 `Peter Prettenhofer`_\n * 131 `Gael Varoquaux`_\n * 117 `Mathieu Blondel`_\n * 108 `Lars Buitinck`_\n * 106 Wei Li\n * 101 `Olivier Grisel`_\n * 65 `Vlad Niculae`_\n * 54 `Gilles Louppe`_\n * 40 `Jaques Grobler`_\n * 38 `Alexandre Gramfort`_\n * 30 `Rob Zinkov`_\n * 19 Aymeric Masurelle\n * 18 Andrew Winterman\n * 17 `Fabian Pedregosa`_\n * 17 Nelle Varoquaux\n * 16 `Christian Osendorfer`_\n * 14 `Daniel Nouri`_\n * 13 :user:`Virgile Fritsch `\n * 13 syhw\n * 12 `Satrajit Ghosh`_\n * 10 Corey Lynch\n * 10 Kyle Beauchamp\n * 9 Brian Cheung\n * 9 Immanuel Bayer\n * 9 mr.Shu\n * 8 Conrad Lee\n * 8 `James Bergstra`_\n * 7 Tadej Jane\u017e\n * 6 Brian Cajes\n * 6 `Jake Vanderplas`_\n * 6 Michael\n * 6 Noel Dawe\n * 6 Tiago Nunes\n * 6 cow\n * 5 Anze\n * 5 Shiqiao Du\n * 4 Christian Jauvin\n * 4 Jacques Kvam\n * 4 Richard T. Guy\n * 4 `Robert Layton`_\n * 3 Alexandre Abraham\n * 3 Doug Coleman\n * 3 Scott Dickerson\n * 2 ApproximateIdentity\n * 2 John Benediktsson\n * 2 Mark Veronda\n * 2 Matti Lyra\n * 2 Mikhail Korobov\n * 2 Xinfan Meng\n * 1 Alejandro Weinstein\n * 1 `Alexandre Passos`_\n * 1 Christoph Deil\n * 1 Eugene Nizhibitsky\n * 1 Kenneth C. 
Arnold\n * 1 Luis Pedro Coelho\n * 1 Miroslav Batchkarov\n * 1 Pavel\n * 1 Sebastian Berg\n * 1 Shaun Jackman\n * 1 Subhodeep Moitra\n * 1 bob\n * 1 dengemann\n * 1 emanuele\n * 1 x006\n.. _changes_0_12.1:\n\nVersion 0.12.1\n===============\n\n**October 8, 2012**\n\nThe 0.12.1 release is a bug-fix release with no additional features, but is\ninstead a set of bug fixes\n\nChangelog\n----------\n\n- Improved numerical stability in spectral embedding by `Gael\n Varoquaux`_\n\n- Doctest under windows 64bit by `Gael Varoquaux`_\n\n- Documentation fixes for elastic net by `Andreas M\u00fcller`_ and\n `Alexandre Gramfort`_\n\n- Proper behavior with fortran-ordered NumPy arrays by `Gael Varoquaux`_\n\n- Make GridSearchCV work with non-CSR sparse matrix by `Lars Buitinck`_\n\n- Fix parallel computing in MDS by `Gael Varoquaux`_\n\n- Fix Unicode support in count vectorizer by `Andreas M\u00fcller`_\n\n- Fix MinCovDet breaking with X.shape = (3, 1) by :user:`Virgile Fritsch `\n\n- Fix clone of SGD objects by `Peter Prettenhofer`_\n\n- Stabilize GMM by :user:`Virgile Fritsch `\n\nPeople\n------\n* 14 `Peter Prettenhofer`_\n * 12 `Gael Varoquaux`_\n * 10 `Andreas M\u00fcller`_\n * 5 `Lars Buitinck`_\n * 3 :user:`Virgile Fritsch `\n * 1 `Alexandre Gramfort`_\n * 1 `Gilles Louppe`_\n * 1 `Mathieu Blondel`_\n\n.. 
_changes_0_12:\n\nVersion 0.12\n============\n\n**September 4, 2012**\n\nChangelog\n---------\n\n- Various speed improvements of the :ref:`decision trees ` module, by\n `Gilles Louppe`_.\n\n- :class:`ensemble.GradientBoostingRegressor` and\n :class:`ensemble.GradientBoostingClassifier` now support feature subsampling\n via the ``max_features`` argument, by `Peter Prettenhofer`_.\n\n- Added Huber and Quantile loss functions to\n :class:`ensemble.GradientBoostingRegressor`, by `Peter Prettenhofer`_.\n\n- :ref:`Decision trees ` and :ref:`forests of randomized trees `\n now support multi-output classification and regression problems, by\n `Gilles Louppe`_.\n\n- Added :class:`preprocessing.LabelEncoder`, a simple utility class to\n normalize labels or transform non-numerical labels, by `Mathieu Blondel`_.\n\n- Added the epsilon-insensitive loss and the ability to make probabilistic\n predictions with the modified huber loss in :ref:`sgd`, by\n `Mathieu Blondel`_.\n\n- Added :ref:`multidimensional_scaling`, by Nelle Varoquaux.\n\n- SVMlight file format loader now detects compressed (gzip/bzip2) files and\n decompresses them on the fly, by `Lars Buitinck`_.\n\n- SVMlight file format serializer now preserves double precision floating\n point values, by `Olivier Grisel`_.\n\n- A common testing framework for all estimators was added, by `Andreas M\u00fcller`_.\n\n- Understandable error messages for estimators that do not accept\n sparse input by `Gael Varoquaux`_\n\n- Speedups in hierarchical clustering by `Gael Varoquaux`_. In\n particular building the tree now supports early stopping. 
This is\n useful when the number of clusters is not small compared to the\n number of samples.\n\n- Add MultiTaskLasso and MultiTaskElasticNet for joint feature selection,\n by `Alexandre Gramfort`_.\n\n- Added :func:`metrics.auc_score` and\n :func:`metrics.average_precision_score` convenience functions by `Andreas\n M\u00fcller`_.\n\n- Improved sparse matrix support in the :ref:`feature_selection`\n module by `Andreas M\u00fcller`_.\n\n- New word boundaries-aware character n-gram analyzer for the\n :ref:`text_feature_extraction` module by :user:`@kernc `.\n\n- Fixed bug in spectral clustering that led to single point clusters\n by `Andreas M\u00fcller`_.\n\n- In :class:`feature_extraction.text.CountVectorizer`, added an option to\n ignore infrequent words, ``min_df`` by `Andreas M\u00fcller`_.\n\n- Add support for multiple targets in some linear models (ElasticNet, Lasso\n and OrthogonalMatchingPursuit) by `Vlad Niculae`_ and\n `Alexandre Gramfort`_.\n\n- Fixes in :class:`decomposition.ProbabilisticPCA` score function by Wei Li.\n\n- Fixed feature importance computation in\n :ref:`gradient_boosting`.\n\nAPI changes summary\n-------------------\n\n- The old ``scikits.learn`` package has disappeared; all code should import\n from ``sklearn`` instead, which was introduced in 0.9.\n\n- In :func:`metrics.roc_curve`, the ``thresholds`` array is now returned\n with it's order reversed, in order to keep it consistent with the order\n of the returned ``fpr`` and ``tpr``.\n\n- In :class:`hmm` objects, like :class:`hmm.GaussianHMM`,\n :class:`hmm.MultinomialHMM`, etc., all parameters must be passed to the\n object when initialising it and not through ``fit``. Now ``fit`` will\n only accept the data as an input parameter.\n\n- For all SVM classes, a faulty behavior of ``gamma`` was fixed. Previously,\n the default gamma value was only computed the first time ``fit`` was called\n and then stored. 
It is now recalculated on every call to ``fit``.\n\n- All ``Base`` classes are now abstract meta classes so that they can not be\n instantiated.\n\n- :func:`cluster.ward_tree` now also returns the parent array. This is\n necessary for early-stopping in which case the tree is not\n completely built.\n\n- In :class:`feature_extraction.text.CountVectorizer` the parameters\n ``min_n`` and ``max_n`` were joined to the parameter ``n_gram_range`` to\n enable grid-searching both at once.\n\n- In :class:`feature_extraction.text.CountVectorizer`, words that appear\n only in one document are now ignored by default. To reproduce\n the previous behavior, set ``min_df=1``.\n\n- Fixed API inconsistency: :meth:`linear_model.SGDClassifier.predict_proba` now\n returns 2d array when fit on two classes.\n\n- Fixed API inconsistency: :meth:`discriminant_analysis.QuadraticDiscriminantAnalysis.decision_function`\n and :meth:`discriminant_analysis.LinearDiscriminantAnalysis.decision_function` now return 1d arrays\n when fit on two classes.\n\n- Grid of alphas used for fitting :class:`linear_model.LassoCV` and\n :class:`linear_model.ElasticNetCV` is now stored\n in the attribute ``alphas_`` rather than overriding the init parameter\n ``alphas``.\n\n- Linear models when alpha is estimated by cross-validation store\n the estimated value in the ``alpha_`` attribute rather than just\n ``alpha`` or ``best_alpha``.\n\n- :class:`ensemble.GradientBoostingClassifier` now supports\n :meth:`ensemble.GradientBoostingClassifier.staged_predict_proba`, and\n :meth:`ensemble.GradientBoostingClassifier.staged_predict`.\n\n- :class:`svm.sparse.SVC` and other sparse SVM classes are now deprecated.\n The all classes in the :ref:`svm` module now automatically select the\n sparse or dense representation base on the input.\n\n- All clustering algorithms now interpret the array ``X`` given to ``fit`` as\n input data, in particular :class:`cluster.SpectralClustering` and\n :class:`cluster.AffinityPropagation` 
which previously expected affinity matrices.\n\n- For clustering algorithms that take the desired number of clusters as a parameter,\n this parameter is now called ``n_clusters``.\nPeople\n------\n * 267 `Andreas M\u00fcller`_\n * 94 `Gilles Louppe`_\n * 89 `Gael Varoquaux`_\n * 79 `Peter Prettenhofer`_\n * 60 `Mathieu Blondel`_\n * 57 `Alexandre Gramfort`_\n * 52 `Vlad Niculae`_\n * 45 `Lars Buitinck`_\n * 44 Nelle Varoquaux\n * 37 `Jaques Grobler`_\n * 30 Alexis Mignon\n * 30 Immanuel Bayer\n * 27 `Olivier Grisel`_\n * 16 Subhodeep Moitra\n * 13 Yannick Schwartz\n * 12 :user:`@kernc `\n * 11 :user:`Virgile Fritsch `\n * 9 Daniel Duckworth\n * 9 `Fabian Pedregosa`_\n * 9 `Robert Layton`_\n * 8 John Benediktsson\n * 7 Marko Burjek\n * 5 `Nicolas Pinto`_\n * 4 Alexandre Abraham\n * 4 `Jake Vanderplas`_\n * 3 `Brian Holt`_\n * 3 `Edouard Duchesnay`_\n * 3 Florian Hoenig\n * 3 flyingimmidev\n * 2 Francois Savard\n * 2 Hannes Schulz\n * 2 Peter Welinder\n * 2 `Yaroslav Halchenko`_\n * 2 Wei Li\n * 1 Alex Companioni\n * 1 Brandyn A. White\n * 1 Bussonnier Matthias\n * 1 Charles-Pierre Astolfi\n * 1 Dan O'Huiginn\n * 1 David Cournapeau\n * 1 Keith Goodman\n * 1 Ludwig Schwardt\n * 1 Olivier Hervieu\n * 1 Sergio Medina\n * 1 Shiqiao Du\n * 1 Tim Sheerman-Chase\n * 1 buguen\n.. 
_changes_0_11:\n\nVersion 0.11\n============\n\n**May 7, 2012**\n\nChangelog\n---------\n\nHighlights\n.............\n\n- Gradient boosted regression trees (:ref:`gradient_boosting`)\n for classification and regression by `Peter Prettenhofer`_\n and `Scott White`_ .\n\n- Simple dict-based feature loader with support for categorical variables\n (:class:`feature_extraction.DictVectorizer`) by `Lars Buitinck`_.\n\n- Added Matthews correlation coefficient (:func:`metrics.matthews_corrcoef`)\n and added macro and micro average options to\n :func:`metrics.precision_score`, :func:`metrics.recall_score` and\n :func:`metrics.f1_score` by `Satrajit Ghosh`_.\n\n- :ref:`out_of_bag` of generalization error for :ref:`ensemble`\n by `Andreas M\u00fcller`_.\n\n- Randomized sparse linear models for feature\n selection, by `Alexandre Gramfort`_ and `Gael Varoquaux`_\n\n- :ref:`label_propagation` for semi-supervised learning, by Clay\n Woolam. **Note** the semi-supervised API is still work in progress,\n and may change.\n\n- Added BIC/AIC model selection to classical :ref:`gmm` and unified\n the API with the remainder of scikit-learn, by `Bertrand Thirion`_\n\n- Added :class:`sklearn.cross_validation.StratifiedShuffleSplit`, which is\n a :class:`sklearn.cross_validation.ShuffleSplit` with balanced splits,\n by Yannick Schwartz.\n\n- :class:`sklearn.neighbors.NearestCentroid` classifier added, along with a\n ``shrink_threshold`` parameter, which implements **shrunken centroid\n classification**, by `Robert Layton`_.\n\nOther changes\n..............\n\n- Merged dense and sparse implementations of :ref:`sgd` module and\n exposed utility extension types for sequential\n datasets ``seq_dataset`` and weight vectors ``weight_vector``\n by `Peter Prettenhofer`_.\n\n- Added ``partial_fit`` (support for online/minibatch learning) and\n warm_start to the :ref:`sgd` module by `Mathieu Blondel`_.\n\n- Dense and sparse implementations of :ref:`svm` classes and\n 
:class:`linear_model.LogisticRegression` merged by `Lars Buitinck`_.\n\n- Regressors can now be used as base estimator in the :ref:`multiclass`\n module by `Mathieu Blondel`_.\n\n- Added n_jobs option to :func:`metrics.pairwise.pairwise_distances`\n and :func:`metrics.pairwise.pairwise_kernels` for parallel computation,\n by `Mathieu Blondel`_.\n\n- :ref:`k_means` can now be run in parallel, using the ``n_jobs`` argument\n to either :ref:`k_means` or :class:`KMeans`, by `Robert Layton`_.\n\n- Improved :ref:`cross_validation` and :ref:`grid_search` documentation\n and introduced the new :func:`cross_validation.train_test_split`\n helper function by `Olivier Grisel`_\n\n- :class:`svm.SVC` members ``coef_`` and ``intercept_`` changed sign for\n consistency with ``decision_function``; for ``kernel==linear``,\n ``coef_`` was fixed in the one-vs-one case, by `Andreas M\u00fcller`_.\n\n- Performance improvements to efficient leave-one-out cross-validated\n Ridge regression, esp. for the ``n_samples > n_features`` case, in\n :class:`linear_model.RidgeCV`, by Reuben Fletcher-Costin.\n\n- Refactoring and simplification of the :ref:`text_feature_extraction`\n API and fixed a bug that caused possible negative IDF,\n by `Olivier Grisel`_.\n\n- Beam pruning option in :class:`_BaseHMM` module has been removed since it\n is difficult to Cythonize. If you are interested in contributing a Cython\n version, you can use the python version in the git history as a reference.\n\n- Classes in :ref:`neighbors` now support arbitrary Minkowski metric for\n nearest neighbors searches. The metric can be specified by argument ``p``.\n\nAPI changes summary\n-------------------\n\n- :class:`covariance.EllipticEnvelop` is now deprecated - Please use :class:`covariance.EllipticEnvelope`\n instead.\n\n- ``NeighborsClassifier`` and ``NeighborsRegressor`` are gone in the module\n :ref:`neighbors`. 
Use the classes :class:`KNeighborsClassifier`,\n :class:`RadiusNeighborsClassifier`, :class:`KNeighborsRegressor`\n and/or :class:`RadiusNeighborsRegressor` instead.\n\n- Sparse classes in the :ref:`sgd` module are now deprecated.\n\n- In :class:`mixture.GMM`, :class:`mixture.DPGMM` and :class:`mixture.VBGMM`,\n parameters must be passed to an object when initialising it and not through\n ``fit``. Now ``fit`` will only accept the data as an input parameter.\n\n- methods ``rvs`` and ``decode`` in :class:`GMM` module are now deprecated.\n ``sample`` and ``score`` or ``predict`` should be used instead.\n\n- attribute ``_scores`` and ``_pvalues`` in univariate feature selection\n objects are now deprecated.\n ``scores_`` or ``pvalues_`` should be used instead.\n\n- In :class:`LogisticRegression`, :class:`LinearSVC`, :class:`SVC` and\n :class:`NuSVC`, the ``class_weight`` parameter is now an initialization\n parameter, not a parameter to fit. This makes grid searches\n over this parameter possible.\n\n- LFW ``data`` is now always shape ``(n_samples, n_features)`` to be\n consistent with the Olivetti faces dataset. Use ``images`` and\n ``pairs`` attribute to access the natural images shapes instead.\n\n- In :class:`svm.LinearSVC`, the meaning of the ``multi_class`` parameter\n changed. Options now are ``'ovr'`` and ``'crammer_singer'``, with\n ``'ovr'`` being the default. This does not change the default behavior\n but hopefully is less confusing.\n\n- Class :class:`feature_selection.text.Vectorizer` is deprecated and\n replaced by :class:`feature_selection.text.TfidfVectorizer`.\n\n- The preprocessor / analyzer nested structure for text feature\n extraction has been removed. 
All those features are\n now directly passed as flat constructor arguments\n to :class:`feature_selection.text.TfidfVectorizer` and\n :class:`feature_selection.text.CountVectorizer`, in particular the\n following parameters are now used:\n\n- ``analyzer`` can be ``'word'`` or ``'char'`` to switch the default\n analysis scheme, or use a specific python callable (as previously).\n\n- ``tokenizer`` and ``preprocessor`` have been introduced to make it\n still possible to customize those steps with the new API.\n\n- ``input`` explicitly control how to interpret the sequence passed to\n ``fit`` and ``predict``: filenames, file objects or direct (byte or\n Unicode) strings.\n\n- charset decoding is explicit and strict by default.\n\n- the ``vocabulary``, fitted or not is now stored in the\n ``vocabulary_`` attribute to be consistent with the project\n conventions.\n\n- Class :class:`feature_selection.text.TfidfVectorizer` now derives directly\n from :class:`feature_selection.text.CountVectorizer` to make grid\n search trivial.\n\n- methods ``rvs`` in :class:`_BaseHMM` module are now deprecated.\n ``sample`` should be used instead.\n\n- Beam pruning option in :class:`_BaseHMM` module is removed since it is\n difficult to be Cythonized. If you are interested, you can look in the\n history codes by git.\n\n- The SVMlight format loader now supports files with both zero-based and\n one-based column indices, since both occur \"in the wild\".\n\n- Arguments in class :class:`ShuffleSplit` are now consistent with\n :class:`StratifiedShuffleSplit`. Arguments ``test_fraction`` and\n ``train_fraction`` are deprecated and renamed to ``test_size`` and\n ``train_size`` and can accept both ``float`` and ``int``.\n\n- Arguments in class :class:`Bootstrap` are now consistent with\n :class:`StratifiedShuffleSplit`. 
Arguments ``n_test`` and\n ``n_train`` are deprecated and renamed to ``test_size`` and\n ``train_size`` and can accept both ``float`` and ``int``.\n\n- Argument ``p`` added to classes in :ref:`neighbors` to specify an\n arbitrary Minkowski metric for nearest neighbors searches.\nPeople\n------\n * 282 `Andreas M\u00fcller`_\n * 239 `Peter Prettenhofer`_\n * 198 `Gael Varoquaux`_\n * 129 `Olivier Grisel`_\n * 114 `Mathieu Blondel`_\n * 103 Clay Woolam\n * 96 `Lars Buitinck`_\n * 88 `Jaques Grobler`_\n * 82 `Alexandre Gramfort`_\n * 50 `Bertrand Thirion`_\n * 42 `Robert Layton`_\n * 28 flyingimmidev\n * 26 `Jake Vanderplas`_\n * 26 Shiqiao Du\n * 21 `Satrajit Ghosh`_\n * 17 `David Marek`_\n * 17 `Gilles Louppe`_\n * 14 `Vlad Niculae`_\n * 11 Yannick Schwartz\n * 10 `Fabian Pedregosa`_\n * 9 fcostin\n * 7 Nick Wilson\n * 5 Adrien Gaidon\n * 5 `Nicolas Pinto`_\n * 4 `David Warde-Farley`_\n * 5 Nelle Varoquaux\n * 5 Emmanuelle Gouillart\n * 3 Joonas Sillanp\u00e4\u00e4\n * 3 Paolo Losi\n * 2 Charles McCarthy\n * 2 Roy Hyunjin Han\n * 2 Scott White\n * 2 ibayer\n * 1 Brandyn White\n * 1 Carlos Scheidegger\n * 1 Claire Revillet\n * 1 Conrad Lee\n * 1 `Edouard Duchesnay`_\n * 1 Jan Hendrik Metzen\n * 1 Meng Xinfan\n * 1 `Rob Zinkov`_\n * 1 Shiqiao\n * 1 Udi Weinsberg\n * 1 Virgile Fritsch\n * 1 Xinfan Meng\n * 1 Yaroslav Halchenko\n * 1 jansoe\n * 1 Leon Palafox\n.. _changes_0_10:\n\nVersion 0.10\n============\n\n**January 11, 2012**\n\nChangelog\n---------\n\n- Python 2.5 compatibility was dropped; the minimum Python version needed\n to use scikit-learn is now 2.6.\n\n- :ref:`sparse_inverse_covariance` estimation using the graph Lasso, with\n associated cross-validated estimator, by `Gael Varoquaux`_\n\n- New :ref:`Tree ` module by `Brian Holt`_, `Peter Prettenhofer`_,\n `Satrajit Ghosh`_ and `Gilles Louppe`_. 
The module comes with complete\n documentation and examples.\n\n- Fixed a bug in the RFE module by `Gilles Louppe`_ (issue #378).\n\n- Fixed a memory leak in :ref:`svm` module by `Brian Holt`_ (issue #367).\n\n- Faster tests by `Fabian Pedregosa`_ and others.\n\n- Silhouette Coefficient cluster analysis evaluation metric added as\n :func:`sklearn.metrics.silhouette_score` by Robert Layton.\n\n- Fixed a bug in :ref:`k_means` in the handling of the ``n_init`` parameter:\n the clustering algorithm used to be run ``n_init`` times but the last\n solution was retained instead of the best solution by `Olivier Grisel`_.\n\n- Minor refactoring in :ref:`sgd` module; consolidated dense and sparse\n predict methods; Enhanced test time performance by converting model\n parameters to fortran-style arrays after fitting (only multi-class).\n\n- Adjusted Mutual Information metric added as\n :func:`sklearn.metrics.adjusted_mutual_info_score` by Robert Layton.\n\n- Models like SVC/SVR/LinearSVC/LogisticRegression from libsvm/liblinear\n now support scaling of C regularization parameter by the number of\n samples by `Alexandre Gramfort`_.\n\n- New :ref:`Ensemble Methods ` module by `Gilles Louppe`_ and\n `Brian Holt`_. 
The module comes with the random forest algorithm and the\n extra-trees method, along with documentation and examples.\n\n- :ref:`outlier_detection`: outlier and novelty detection, by\n :user:`Virgile Fritsch `.\n\n- :ref:`kernel_approximation`: a transform implementing kernel\n approximation for fast SGD on non-linear kernels by\n `Andreas M\u00fcller`_.\n\n- Fixed a bug due to atom swapping in :ref:`OMP` by `Vlad Niculae`_.\n\n- :ref:`SparseCoder` by `Vlad Niculae`_.\n\n- :ref:`mini_batch_kmeans` performance improvements by `Olivier Grisel`_.\n\n- :ref:`k_means` support for sparse matrices by `Mathieu Blondel`_.\n\n- Improved documentation for developers and for the :mod:`sklearn.utils`\n module, by `Jake Vanderplas`_.\n\n- Vectorized 20newsgroups dataset loader\n (:func:`sklearn.datasets.fetch_20newsgroups_vectorized`) by\n `Mathieu Blondel`_.\n\n- :ref:`multiclass` by `Lars Buitinck`_.\n\n- Utilities for fast computation of mean and variance for sparse matrices\n by `Mathieu Blondel`_.\n\n- Make :func:`sklearn.preprocessing.scale` and\n :class:`sklearn.preprocessing.Scaler` work on sparse matrices by\n `Olivier Grisel`_\n\n- Feature importances using decision trees and/or forest of trees,\n by `Gilles Louppe`_.\n\n- Parallel implementation of forests of randomized trees by\n `Gilles Louppe`_.\n\n- :class:`sklearn.cross_validation.ShuffleSplit` can subsample the train\n sets as well as the test sets by `Olivier Grisel`_.\n\n- Errors in the build of the documentation fixed by `Andreas M\u00fcller`_.\nAPI changes summary\n-------------------\n\nHere are the code migration instructions when upgrading from scikit-learn\nversion 0.9:\n\n- Some estimators that may overwrite their inputs to save memory previously\n had ``overwrite_`` parameters; these have been replaced with ``copy_``\n parameters with exactly the opposite meaning.\nThis particularly affects some of the estimators in :mod:`linear_model`.\n The default behavior is still to copy everything passed 
in.\n\n- The SVMlight dataset loader :func:`sklearn.datasets.load_svmlight_file` no\n longer supports loading two files at once; use ``load_svmlight_files``\n instead. Also, the (unused) ``buffer_mb`` parameter is gone.\n\n- Sparse estimators in the :ref:`sgd` module use dense parameter vector\n ``coef_`` instead of ``sparse_coef_``. This significantly improves\n test time performance.\n\n- The :ref:`covariance` module now has a robust estimator of\n covariance, the Minimum Covariance Determinant estimator.\n\n- Cluster evaluation metrics in :mod:`metrics.cluster` have been refactored\n but the changes are backwards compatible. They have been moved to the\n :mod:`metrics.cluster.supervised`, along with\n :mod:`metrics.cluster.unsupervised` which contains the Silhouette\n Coefficient.\n\n- The ``permutation_test_score`` function now behaves the same way as\n ``cross_val_score`` (i.e. uses the mean score across the folds.)\n\n- Cross Validation generators now use integer indices (``indices=True``)\n by default instead of boolean masks. This make it more intuitive to\n use with sparse matrix data.\n\n- The functions used for sparse coding, ``sparse_encode`` and\n ``sparse_encode_parallel`` have been combined into\n :func:`sklearn.decomposition.sparse_encode`, and the shapes of the arrays\n have been transposed for consistency with the matrix factorization setting,\n as opposed to the regression setting.\n\n- Fixed an off-by-one error in the SVMlight/LibSVM file format handling;\n files generated using :func:`sklearn.datasets.dump_svmlight_file` should be\n re-generated. 
(They should continue to work, but accidentally had one\n extra column of zeros prepended.)\n\n- ``BaseDictionaryLearning`` class replaced by ``SparseCodingMixin``.\n\n- :func:`sklearn.utils.extmath.fast_svd` has been renamed\n :func:`sklearn.utils.extmath.randomized_svd` and the default\n oversampling is now fixed to 10 additional random vectors instead\n of doubling the number of components to extract. The new behavior\n follows the reference paper.\nPeople\n------\n\nThe following people contributed to scikit-learn since last release:\n* 246 `Andreas M\u00fcller`_\n * 242 `Olivier Grisel`_\n * 220 `Gilles Louppe`_\n * 183 `Brian Holt`_\n * 166 `Gael Varoquaux`_\n * 144 `Lars Buitinck`_\n * 73 `Vlad Niculae`_\n * 65 `Peter Prettenhofer`_\n * 64 `Fabian Pedregosa`_\n * 60 Robert Layton\n * 55 `Mathieu Blondel`_\n * 52 `Jake Vanderplas`_\n * 44 Noel Dawe\n * 38 `Alexandre Gramfort`_\n * 24 :user:`Virgile Fritsch `\n * 23 `Satrajit Ghosh`_\n * 3 Jan Hendrik Metzen\n * 3 Kenneth C. Arnold\n * 3 Shiqiao Du\n * 3 Tim Sheerman-Chase\n * 3 `Yaroslav Halchenko`_\n * 2 Bala Subrahmanyam Varanasi\n * 2 DraXus\n * 2 Michael Eickenberg\n * 1 Bogdan Trach\n * 1 F\u00e9lix-Antoine Fortin\n * 1 Juan Manuel Caicedo Carvajal\n * 1 Nelle Varoquaux\n * 1 `Nicolas Pinto`_\n * 1 Tiziano Zito\n * 1 Xinfan Meng\n.. _changes_0_9:\n\nVersion 0.9\n===========\n\n**September 21, 2011**\n\nscikit-learn 0.9 was released on September 2011, three months after the 0.8\nrelease and includes the new modules :ref:`manifold`, :ref:`dirichlet_process`\nas well as several new algorithms and documentation improvements.\n\nThis release also includes the dictionary-learning work developed by\n`Vlad Niculae`_ as part of the `Google Summer of Code\n `_ program.\n.. |banner1| image:: ./auto_examples/manifold/images/thumb/sphx_glr_plot_compare_methods_thumb.png\n :target: auto_examples/manifold/plot_compare_methods.html\n\n.. 
|banner2| image:: ./auto_examples/linear_model/images/thumb/sphx_glr_plot_omp_thumb.png\n :target: auto_examples/linear_model/plot_omp.html\n\n.. |banner3| image:: ./auto_examples/decomposition/images/thumb/sphx_glr_plot_kernel_pca_thumb.png\n :target: auto_examples/decomposition/plot_kernel_pca.html\n\n.. |center-div| raw:: html\n.. |end-div| raw:: html\n|center-div| |banner2| |banner1| |banner3| |end-div|\n\nChangelog\n---------\n\n- New :ref:`manifold` module by `Jake Vanderplas`_ and\n `Fabian Pedregosa`_.\n\n- New :ref:`Dirichlet Process ` Gaussian Mixture\n Model by `Alexandre Passos`_\n\n- :ref:`neighbors` module refactoring by `Jake Vanderplas`_ :\n general refactoring, support for sparse matrices in input, speed and\n documentation improvements. See the next section for a full list of API\n changes.\n\n- Improvements on the :ref:`feature_selection` module by\n `Gilles Louppe`_ : refactoring of the RFE classes, documentation\n rewrite, increased efficiency and minor API changes.\n\n- :ref:`SparsePCA` by `Vlad Niculae`_, `Gael Varoquaux`_ and\n `Alexandre Gramfort`_\n\n- Printing an estimator now behaves independently of architectures\n and Python version thanks to :user:`Jean Kossaifi `.\n\n- :ref:`Loader for libsvm/svmlight format ` by\n `Mathieu Blondel`_ and `Lars Buitinck`_\n\n- Documentation improvements: thumbnails in\n example gallery by `Fabian Pedregosa`_.\n\n- Important bugfixes in :ref:`svm` module (segfaults, bad\n performance) by `Fabian Pedregosa`_.\n\n- Added :ref:`multinomial_naive_bayes` and :ref:`bernoulli_naive_bayes`\n by `Lars Buitinck`_\n\n- Text feature extraction optimizations by Lars Buitinck\n\n- Chi-Square feature selection\n (:func:`feature_selection.univariate_selection.chi2`) by `Lars Buitinck`_.\n\n- :ref:`sample_generators` module refactoring by `Gilles Louppe`_\n\n- :ref:`multiclass` by `Mathieu Blondel`_\n\n- Ball tree rewrite by `Jake Vanderplas`_\n\n- Implementation of :ref:`dbscan` algorithm by Robert Layton\n\n- Kmeans 
predict and transform by Robert Layton\n\n- Preprocessing module refactoring by `Olivier Grisel`_\n\n- Faster mean shift by Conrad Lee\n\n- New ``Bootstrap``, :ref:`ShuffleSplit` and various other\n improvements in cross validation schemes by `Olivier Grisel`_ and\n `Gael Varoquaux`_\n\n- Adjusted Rand index and V-Measure clustering evaluation metrics by `Olivier Grisel`_\n\n- Added :class:`Orthogonal Matching Pursuit ` by `Vlad Niculae`_\n\n- Added 2D-patch extractor utilities in the :ref:`feature_extraction` module by `Vlad Niculae`_\n\n- Implementation of :class:`linear_model.LassoLarsCV`\n (cross-validated Lasso solver using the Lars algorithm) and\n :class:`linear_model.LassoLarsIC` (BIC/AIC model\n selection in Lars) by `Gael Varoquaux`_\n and `Alexandre Gramfort`_\n\n- Scalability improvements to :func:`metrics.roc_curve` by Olivier Hervieu\n\n- Distance helper functions :func:`metrics.pairwise.pairwise_distances`\n and :func:`metrics.pairwise.pairwise_kernels` by Robert Layton\n\n- :class:`Mini-Batch K-Means ` by Nelle Varoquaux and Peter Prettenhofer.\n\n- :ref:`mldata` utilities by Pietro Berkes.\n\n- :ref:`olivetti_faces` by `David Warde-Farley`_.\nAPI changes summary\n-------------------\n\nHere are the code migration instructions when upgrading from scikit-learn\nversion 0.8:\n\n- The ``scikits.learn`` package was renamed ``sklearn``. There is\n still a ``scikits.learn`` package alias for backward compatibility.\nThird-party projects with a dependency on scikit-learn 0.9+ should\n upgrade their codebase. 
For instance, under Linux / MacOSX just run\n (make a backup first!)::\nfind -name \"*.py\" | xargs sed -i 's/\\bscikits.learn\\b/sklearn/g'\n\n- Estimators no longer accept model parameters as ``fit`` arguments:\n instead all parameters must be only be passed as constructor\n arguments or using the now public ``set_params`` method inherited\n from :class:`base.BaseEstimator`.\nSome estimators can still accept keyword arguments on the ``fit``\n but this is restricted to data-dependent values (e.g. a Gram matrix\n or an affinity matrix that are precomputed from the ``X`` data matrix.\n\n- The ``cross_val`` package has been renamed to ``cross_validation``\n although there is also a ``cross_val`` package alias in place for\n backward compatibility.\nThird-party projects with a dependency on scikit-learn 0.9+ should\n upgrade their codebase. For instance, under Linux / MacOSX just run\n (make a backup first!)::\nfind -name \"*.py\" | xargs sed -i 's/\\bcross_val\\b/cross_validation/g'\n\n- The ``score_func`` argument of the\n ``sklearn.cross_validation.cross_val_score`` function is now expected\n to accept ``y_test`` and ``y_predicted`` as only arguments for\n classification and regression tasks or ``X_test`` for unsupervised\n estimators.\n\n- ``gamma`` parameter for support vector machine algorithms is set\n to ``1 / n_features`` by default, instead of ``1 / n_samples``.\n\n- The ``sklearn.hmm`` has been marked as orphaned: it will be removed\n from scikit-learn in version 0.11 unless someone steps up to\n contribute documentation, examples and fix lurking numerical\n stability issues.\n\n- ``sklearn.neighbors`` has been made into a submodule. The two previously\n available estimators, ``NeighborsClassifier`` and ``NeighborsRegressor``\n have been marked as deprecated. 
Their functionality has been divided\n among five new classes: ``NearestNeighbors`` for unsupervised neighbors\n searches, ``KNeighborsClassifier`` & ``RadiusNeighborsClassifier``\n for supervised classification problems, and ``KNeighborsRegressor``\n & ``RadiusNeighborsRegressor`` for supervised regression problems.\n\n- ``sklearn.ball_tree.BallTree`` has been moved to\n ``sklearn.neighbors.BallTree``. Using the former will generate a warning.\n\n- ``sklearn.linear_model.LARS()`` and related classes (LassoLARS,\n LassoLARSCV, etc.) have been renamed to\n ``sklearn.linear_model.Lars()``.\n\n- All distance metrics and kernels in ``sklearn.metrics.pairwise`` now have a Y\n parameter, which by default is None. If not given, the result is the distance\n (or kernel similarity) between each sample in Y. If given, the result is the\n pairwise distance (or kernel similarity) between samples in X to Y.\n\n- ``sklearn.metrics.pairwise.l1_distance`` is now called ``manhattan_distance``,\n and by default returns the pairwise distance. 
For the component wise distance,\n set the parameter ``sum_over_features`` to ``False``.\n\nBackward compatibility package aliases and other deprecated classes and\nfunctions will be removed in version 0.11.\nPeople\n------\n\n38 people contributed to this release.\n\n- 387 `Vlad Niculae`_\n- 320 `Olivier Grisel`_\n- 192 `Lars Buitinck`_\n- 179 `Gael Varoquaux`_\n- 168 `Fabian Pedregosa`_ (`INRIA`_, `Parietal Team`_)\n- 127 `Jake Vanderplas`_\n- 120 `Mathieu Blondel`_\n- 85 `Alexandre Passos`_\n- 67 `Alexandre Gramfort`_\n- 57 `Peter Prettenhofer`_\n- 56 `Gilles Louppe`_\n- 42 Robert Layton\n- 38 Nelle Varoquaux\n- 32 :user:`Jean Kossaifi `\n- 30 Conrad Lee\n- 22 Pietro Berkes\n- 18 andy\n- 17 David Warde-Farley\n- 12 Brian Holt\n- 11 Robert\n- 8 Amit Aides\n- 8 :user:`Virgile Fritsch `\n- 7 `Yaroslav Halchenko`_\n- 6 Salvatore Masecchia\n- 5 Paolo Losi\n- 4 Vincent Schut\n- 3 Alexis Metaireau\n- 3 Bryan Silverthorn\n- 3 `Andreas M\u00fcller`_\n- 2 Minwoo Jake Lee\n- 1 Emmanuelle Gouillart\n- 1 Keith Goodman\n- 1 Lucas Wiman\n- 1 `Nicolas Pinto`_\n- 1 Thouis (Ray) Jones\n- 1 Tim Sheerman-Chase\n.. 
_changes_0_8:\n\nVersion 0.8\n===========\n\n**May 11, 2011**\n\nscikit-learn 0.8 was released on May 2011, one month after the first\n\"international\" `scikit-learn coding sprint\n `_ and is\nmarked by the inclusion of important modules: :ref:`hierarchical_clustering`,\n:ref:`cross_decomposition`, :ref:`NMF`, initial support for Python 3 and by important\nenhancements and bug fixes.\nChangelog\n---------\n\nSeveral new modules where introduced during this release:\n\n- New :ref:`hierarchical_clustering` module by Vincent Michel,\n `Bertrand Thirion`_, `Alexandre Gramfort`_ and `Gael Varoquaux`_.\n\n- :ref:`kernel_pca` implementation by `Mathieu Blondel`_\n\n- :ref:`labeled_faces_in_the_wild` by `Olivier Grisel`_.\n\n- New :ref:`cross_decomposition` module by `Edouard Duchesnay`_.\n\n- :ref:`NMF` module `Vlad Niculae`_\n\n- Implementation of the :ref:`oracle_approximating_shrinkage` algorithm by\n :user:`Virgile Fritsch ` in the :ref:`covariance` module.\nSome other modules benefited from significant improvements or cleanups.\n- Initial support for Python 3: builds and imports cleanly,\n some modules are usable while others have failing tests by `Fabian Pedregosa`_.\n\n- :class:`decomposition.PCA` is now usable from the Pipeline object by `Olivier Grisel`_.\n\n- Guide :ref:`performance-howto` by `Olivier Grisel`_.\n\n- Fixes for memory leaks in libsvm bindings, 64-bit safer BallTree by Lars Buitinck.\n\n- bug and style fixing in :ref:`k_means` algorithm by Jan Schl\u00fcter.\n\n- Add attribute converged to Gaussian Mixture Models by Vincent Schut.\n\n- Implemented ``transform``, ``predict_log_proba`` in\n :class:`discriminant_analysis.LinearDiscriminantAnalysis` By `Mathieu Blondel`_.\n\n- Refactoring in the :ref:`svm` module and bug fixes by `Fabian Pedregosa`_,\n `Gael Varoquaux`_ and Amit Aides.\n\n- Refactored SGD module (removed code duplication, better variable naming),\n added interface for sample weight by `Peter Prettenhofer`_.\n\n- Wrapped BallTree with 
Cython by Thouis (Ray) Jones.\n\n- Added function :func:`svm.l1_min_c` by Paolo Losi.\n\n- Typos, doc style, etc. by `Yaroslav Halchenko`_, `Gael Varoquaux`_,\n `Olivier Grisel`_, Yann Malet, `Nicolas Pinto`_, Lars Buitinck and\n `Fabian Pedregosa`_.\nPeople\n-------\n\nPeople that made this release possible preceded by number of commits:\n- 159 `Olivier Grisel`_\n- 96 `Gael Varoquaux`_\n- 96 `Vlad Niculae`_\n- 94 `Fabian Pedregosa`_\n- 36 `Alexandre Gramfort`_\n- 32 Paolo Losi\n- 31 `Edouard Duchesnay`_\n- 30 `Mathieu Blondel`_\n- 25 `Peter Prettenhofer`_\n- 22 `Nicolas Pinto`_\n- 11 :user:`Virgile Fritsch `\n - 7 Lars Buitinck\n - 6 Vincent Michel\n - 5 `Bertrand Thirion`_\n - 4 Thouis (Ray) Jones\n - 4 Vincent Schut\n - 3 Jan Schl\u00fcter\n - 2 Julien Miotte\n - 2 `Matthieu Perrot`_\n - 2 Yann Malet\n - 2 `Yaroslav Halchenko`_\n - 1 Amit Aides\n - 1 `Andreas M\u00fcller`_\n - 1 Feth Arezki\n - 1 Meng Xinfan\n.. _changes_0_7:\n\nVersion 0.7\n===========\n\n**March 2, 2011**\n\nscikit-learn 0.7 was released in March 2011, roughly three months\nafter the 0.6 release. This release is marked by the speed\nimprovements in existing algorithms like k-Nearest Neighbors and\nK-Means algorithm and by the inclusion of an efficient algorithm for\ncomputing the Ridge Generalized Cross Validation solution. 
Unlike the\npreceding release, no new modules where added to this release.\n\nChangelog\n---------\n\n- Performance improvements for Gaussian Mixture Model sampling [Jan\n Schl\u00fcter].\n\n- Implementation of efficient leave-one-out cross-validated Ridge in\n :class:`linear_model.RidgeCV` [`Mathieu Blondel`_]\n\n- Better handling of collinearity and early stopping in\n :func:`linear_model.lars_path` [`Alexandre Gramfort`_ and `Fabian\n Pedregosa`_].\n\n- Fixes for liblinear ordering of labels and sign of coefficients\n [Dan Yamins, Paolo Losi, `Mathieu Blondel`_ and `Fabian Pedregosa`_].\n\n- Performance improvements for Nearest Neighbors algorithm in\n high-dimensional spaces [`Fabian Pedregosa`_].\n\n- Performance improvements for :class:`cluster.KMeans` [`Gael\n Varoquaux`_ and `James Bergstra`_].\n\n- Sanity checks for SVM-based classes [`Mathieu Blondel`_].\n\n- Refactoring of :class:`neighbors.NeighborsClassifier` and\n :func:`neighbors.kneighbors_graph`: added different algorithms for\n the k-Nearest Neighbor Search and implemented a more stable\n algorithm for finding barycenter weights. Also added some\n developer documentation for this module, see\n `notes_neighbors\n `_ for more information [`Fabian Pedregosa`_].\n\n- Documentation improvements: Added :class:`pca.RandomizedPCA` and\n :class:`linear_model.LogisticRegression` to the class\n reference. 
Also added references of matrices used for clustering\n and other fixes [`Gael Varoquaux`_, `Fabian Pedregosa`_, `Mathieu\n Blondel`_, `Olivier Grisel`_, Virgile Fritsch , Emmanuelle\n Gouillart]\n\n- Binded decision_function in classes that make use of liblinear_,\n dense and sparse variants, like :class:`svm.LinearSVC` or\n :class:`linear_model.LogisticRegression` [`Fabian Pedregosa`_].\n\n- Performance and API improvements to\n :func:`metrics.euclidean_distances` and to\n :class:`pca.RandomizedPCA` [`James Bergstra`_].\n\n- Fix compilation issues under NetBSD [Kamel Ibn Hassen Derouiche]\n\n- Allow input sequences of different lengths in :class:`hmm.GaussianHMM`\n [`Ron Weiss`_].\n\n- Fix bug in affinity propagation caused by incorrect indexing [Xinfan Meng]\nPeople\n------\n\nPeople that made this release possible preceded by number of commits:\n\n- 85 `Fabian Pedregosa`_\n- 67 `Mathieu Blondel`_\n- 20 `Alexandre Gramfort`_\n- 19 `James Bergstra`_\n- 14 Dan Yamins\n- 13 `Olivier Grisel`_\n- 12 `Gael Varoquaux`_\n- 4 `Edouard Duchesnay`_\n- 4 `Ron Weiss`_\n- 2 Satrajit Ghosh\n- 2 Vincent Dubourg\n- 1 Emmanuelle Gouillart\n- 1 Kamel Ibn Hassen Derouiche\n- 1 Paolo Losi\n- 1 VirgileFritsch\n- 1 `Yaroslav Halchenko`_\n- 1 Xinfan Meng\n.. _changes_0_6:\n\nVersion 0.6\n===========\n\n**December 21, 2010**\n\nscikit-learn 0.6 was released on December 2010. It is marked by the\ninclusion of several new modules and a general renaming of old\nones. It is also marked by the inclusion of new example, including\napplications to real-world datasets.\nChangelog\n---------\n\n- New `stochastic gradient\n `_ descent\n module by Peter Prettenhofer. 
The module comes with complete\n documentation and examples.\n\n- Improved svm module: memory consumption has been reduced by 50%,\n heuristic to automatically set class weights, possibility to\n assign weights to samples (see\n :ref:`sphx_glr_auto_examples_svm_plot_weighted_samples.py` for an example).\n\n- New :ref:`gaussian_process` module by Vincent Dubourg. This module\n also has great documentation and some very neat examples. See\n example_gaussian_process_plot_gp_regression.py or\n example_gaussian_process_plot_gp_probabilistic_classification_after_regression.py\n for a taste of what can be done.\n\n- It is now possible to use liblinear\u2019s Multi-class SVC (option\n multi_class in :class:`svm.LinearSVC`)\n\n- New features and performance improvements of text feature\n extraction.\n\n- Improved sparse matrix support, both in main classes\n (:class:`grid_search.GridSearchCV`) as in modules\n sklearn.svm.sparse and sklearn.linear_model.sparse.\n\n- Lots of cool new examples and a new section that uses real-world\n datasets was created. These include:\n :ref:`sphx_glr_auto_examples_applications_plot_face_recognition.py`,\n :ref:`sphx_glr_auto_examples_applications_plot_species_distribution_modeling.py`,\n :ref:`sphx_glr_auto_examples_applications_svm_gui.py`,\n :ref:`sphx_glr_auto_examples_applications_wikipedia_principal_eigenvector.py` and\n others.\n\n- Faster :ref:`least_angle_regression` algorithm. It is now 2x\n faster than the R version on worst case and up to 10x times faster\n on some cases.\n\n- Faster coordinate descent algorithm. 
In particular, the full path\n version of lasso (:func:`linear_model.lasso_path`) is more than\n 200x times faster than before.\n\n- It is now possible to get probability estimates from a\n :class:`linear_model.LogisticRegression` model.\n\n- module renaming: the glm module has been renamed to linear_model,\n the gmm module has been included into the more general mixture\n model and the sgd module has been included in linear_model.\n\n- Lots of bug fixes and documentation improvements.\nPeople\n------\n\nPeople that made this release possible preceded by number of commits:\n* 207 `Olivier Grisel`_\n* 167 `Fabian Pedregosa`_\n* 97 `Peter Prettenhofer`_\n* 68 `Alexandre Gramfort`_\n* 59 `Mathieu Blondel`_\n* 55 `Gael Varoquaux`_\n* 33 Vincent Dubourg\n* 21 `Ron Weiss`_\n* 9 Bertrand Thirion\n* 3 `Alexandre Passos`_\n* 3 Anne-Laure Fouque\n* 2 Ronan Amicel\n* 1 `Christian Osendorfer`_\n.. _changes_0_5:\nVersion 0.5\n===========\n\n**October 11, 2010**\n\nChangelog\n---------\n\nNew classes\n-----------\n\n- Support for sparse matrices in some classifiers of modules\n ``svm`` and ``linear_model`` (see :class:`svm.sparse.SVC`,\n :class:`svm.sparse.SVR`, :class:`svm.sparse.LinearSVC`,\n :class:`linear_model.sparse.Lasso`, :class:`linear_model.sparse.ElasticNet`)\n\n- New :class:`pipeline.Pipeline` object to compose different estimators.\n\n- Recursive Feature Elimination routines in module\n :ref:`feature_selection`.\n\n- Addition of various classes capable of cross validation in the\n linear_model module (:class:`linear_model.LassoCV`, :class:`linear_model.ElasticNetCV`,\n etc.).\n\n- New, more efficient LARS algorithm implementation. The Lasso\n variant of the algorithm is also implemented. 
See\n :class:`linear_model.lars_path`, :class:`linear_model.Lars` and\n :class:`linear_model.LassoLars`.\n\n- New Hidden Markov Models module (see classes\n :class:`hmm.GaussianHMM`, :class:`hmm.MultinomialHMM`,\n :class:`hmm.GMMHMM`)\n\n- New module feature_extraction (see :ref:`class reference\n `)\n\n- New FastICA algorithm in module sklearn.fastica\nDocumentation\n-------------\n\n- Improved documentation for many modules, now separating\n narrative documentation from the class reference. As an example,\n see `documentation for the SVM module\n `_ and the\n complete `class reference\n `_.\n\nFixes\n-----\n\n- API changes: adhere variable names to PEP-8, give more\n meaningful names.\n\n- Fixes for svm module to run on a shared memory context\n (multiprocessing).\n\n- It is again possible to generate latex (and thus PDF) from the\n sphinx docs.\n\nExamples\n--------\n\n- new examples using some of the mlcomp datasets:\n ``sphx_glr_auto_examples_mlcomp_sparse_document_classification.py`` (since removed) and\n :ref:`sphx_glr_auto_examples_text_document_classification_20newsgroups.py`\n\n- Many more examples. `See here\n `_\n the full list of examples.\nExternal dependencies\n---------------------\n\n- Joblib is now a dependency of this package, although it is\n shipped with (sklearn.externals.joblib).\n\nRemoved modules\n---------------\n\n- Module ann (Artificial Neural Networks) has been removed from\n the distribution. 
Users wanting this sort of algorithms should\n take a look into pybrain.\n\nMisc\n----\n\n- New sphinx theme for the web page.\nAuthors\n-------\n\nThe following is a list of authors for this release, preceded by\nnumber of commits:\n* 262 Fabian Pedregosa\n * 240 Gael Varoquaux\n * 149 Alexandre Gramfort\n * 116 Olivier Grisel\n * 40 Vincent Michel\n * 38 Ron Weiss\n * 23 Matthieu Perrot\n * 10 Bertrand Thirion\n * 7 Yaroslav Halchenko\n * 9 VirgileFritsch\n * 6 Edouard Duchesnay\n * 4 Mathieu Blondel\n * 1 Ariel Rokem\n * 1 Matthieu Brucher\n\nVersion 0.4\n===========\n\n**August 26, 2010**\n\nChangelog\n---------\n\nMajor changes in this release include:\n\n- Coordinate Descent algorithm (Lasso, ElasticNet) refactoring &\n speed improvements (roughly 100x times faster).\n\n- Coordinate Descent Refactoring (and bug fixing) for consistency\n with R's package GLMNET.\n\n- New metrics module.\n\n- New GMM module contributed by Ron Weiss.\n\n- Implementation of the LARS algorithm (without Lasso variant for now).\n\n- feature_selection module redesign.\n\n- Migration to GIT as version control system.\n\n- Removal of obsolete attrselect module.\n\n- Rename of private compiled extensions (added underscore).\n\n- Removal of legacy unmaintained code.\n\n- Documentation improvements (both docstring and rst).\n\n- Improvement of the build system to (optionally) link with MKL.\n Also, provide a lite BLAS implementation in case no system-wide BLAS is\n found.\n\n- Lots of new examples.\n\n- Many, many bug fixes ...\nAuthors\n-------\n\nThe committer list for this release is the following (preceded by number\nof commits):\n* 143 Fabian Pedregosa\n * 35 Alexandre Gramfort\n * 34 Olivier Grisel\n * 11 Gael Varoquaux\n * 5 Yaroslav Halchenko\n * 2 Vincent Michel\n * 1 Chris Filo Gorgolewski\nEarlier versions\n================\n\nEarlier versions included contributions by Fred Mailhot, David Cooke,\nDavid Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson.\n\n.. 
_Olivier Grisel: https://twitter.com/ogrisel\n\n.. _Gael Varoquaux: http://gael-varoquaux.info\n\n.. _Alexandre Gramfort: http://alexandre.gramfort.net\n\n.. _Fabian Pedregosa: http://fa.bianp.net\n\n.. _Mathieu Blondel: http://www.mblondel.org\n\n.. _James Bergstra: http://www-etud.iro.umontreal.ca/~bergstrj/\n\n.. _liblinear: http://www.csie.ntu.edu.tw/~cjlin/liblinear/\n\n.. _Yaroslav Halchenko: http://www.onerussian.com/\n\n.. _Vlad Niculae: http://vene.ro\n\n.. _Edouard Duchesnay: https://sites.google.com/site/duchesnay/home\n\n.. _Peter Prettenhofer: https://sites.google.com/site/peterprettenhofer/\n\n.. _Alexandre Passos: http://atpassos.me\n\n.. _Nicolas Pinto: https://twitter.com/npinto\n\n.. _Bertrand Thirion: https://team.inria.fr/parietal/bertrand-thirions-page\n\n.. _Andreas M\u00fcller: http://peekaboo-vision.blogspot.com\n\n.. _Matthieu Perrot: http://brainvisa.info/biblio/lnao/en/Author/PERROT-M.html\n\n.. _Jake Vanderplas: http://staff.washington.edu/jakevdp/\n\n.. _Gilles Louppe: http://www.montefiore.ulg.ac.be/~glouppe/\n\n.. _INRIA: http://www.inria.fr\n\n.. _Parietal Team: http://parietal.saclay.inria.fr/\n\n.. _David Warde-Farley: http://www-etud.iro.umontreal.ca/~wardefar/\n\n.. _Brian Holt: http://personal.ee.surrey.ac.uk/Personal/B.Holt\n\n.. _Satrajit Ghosh: http://www.mit.edu/~satra/\n\n.. _Robert Layton: https://twitter.com/robertlayton\n\n.. _Scott White: https://twitter.com/scottblanc\n\n.. _David Marek: http://www.davidmarek.cz/\n\n.. _Christian Osendorfer: https://osdf.github.io\n\n.. _Arnaud Joly: http://www.ajoly.org\n\n.. _Rob Zinkov: http://zinkov.com\n\n.. _Joel Nothman: http://joelnothman.com\n\n.. _Nicolas Tr\u00e9segnie : http://nicolastr.com/\n\n.. _Kemal Eren: http://www.kemaleren.com\n\n.. _Yann Dauphin: http://ynd.github.io/\n\n.. _Yannick Schwartz: https://team.inria.fr/parietal/schwarty/\n\n.. _Kyle Kastner: http://kastnerkyle.github.io\n\n.. _Daniel Nouri: http://danielnouri.org\n\n.. 
_Manoj Kumar: https://manojbits.wordpress.com\n\n.. _Luis Pedro Coelho: http://luispedro.org\n\n.. _Fares Hedyati: http://www.eecs.berkeley.edu/~fareshed\n\n.. _Antony Lee: https://www.ocf.berkeley.edu/~antonyl/\n\n.. _Martin Billinger: http://tnsre.embs.org/author/martinbillinger\n\n.. _Matteo Visconti di Oleggio Castello: http://www.mvdoc.me\n\n.. _Trevor Stephens: http://trevorstephens.com/\n\n.. _Jan Hendrik Metzen: https://jmetzen.github.io/\n\n.. _Will Dawson: http://www.dawsonresearch.com\n\n.. _Andrew Tulloch: http://tullo.ch/\n\n.. _Hanna Wallach: http://dirichlet.net/\n\n.. _Yan Yi: http://seowyanyi.org\n\n.. _Herv\u00e9 Bredin: http://herve.niderb.fr/\n\n.. _Eric Martin: http://www.ericmart.in\n\n.. _Nicolas Goix: https://perso.telecom-paristech.fr/~goix/\n\n.. _Sebastian Raschka: http://sebastianraschka.com\n\n.. _Brian McFee: https://bmcfee.github.io\n\n.. _Valentin Stolbunov: http://www.vstolbunov.com\n\n.. _Jaques Grobler: https://github.com/jaquesgrobler\n\n.. _Lars Buitinck: https://github.com/larsmans\n\n.. _Loic Esteve: https://github.com/lesteve\n\n.. _Noel Dawe: https://github.com/ndawe\n\n.. _Raghav RV: https://github.com/raghavrv\n\n.. _Tom Dupre la Tour: https://github.com/TomDLT\n\n.. _Nelle Varoquaux: https://github.com/nellev\n\n.. _Bing Tian Dai: https://github.com/btdai\n\n.. _Dylan Werner-Meier: https://github.com/unautre\n\n.. _Alyssa Batula: https://github.com/abatula\n\n.. _Srivatsan Ramesh: https://github.com/srivatsan-ramesh\n\n.. _Ron Weiss: http://www.ee.columbia.edu/~ronw\n\n.. _Kathleen Chen: https://github.com/kchen17\n\n.. _Vincent Pham: https://github.com/vincentpham1991\n\n.. _Denis Engemann: http://denis-engemann.de\n.. _Anish Shah: https://github.com/AnishShah\n\n.. _Neeraj Gangwar: http://neerajgangwar.in\n.. _Arthur Mensch: https://amensch.fr\n"
    },
    {
      "turn_idx": 19,
      "action": "fetch_website_content({\"website_url\": \"https://scikit-learn.org/1.3/_sources/whats_new/v0.19.rst.txt\"})",
      "explanation": "",
      "observation": ".. include:: _contributors.rst\n\n.. currentmodule:: sklearn\n\n.. _changes_0_19:\n\nVersion 0.19.2\n==============\n\n**July, 2018**\n\nThis release is exclusively in order to support Python 3.7.\n\nRelated changes\n---------------\n\n- ``n_iter_`` may vary from previous releases in\n :class:`linear_model.LogisticRegression` with ``solver='lbfgs'`` and\n :class:`linear_model.HuberRegressor`. For Scipy <= 1.0.0, the optimizer could\n perform more than the requested maximum number of iterations. Now both\n estimators will report at most ``max_iter`` iterations even if more were\n performed. :issue:`10723` by `Joel Nothman`_.\n\nVersion 0.19.1\n==============\n\n**October 23, 2017**\n\nThis is a bug-fix release with some minor documentation improvements and\nenhancements to features released in 0.19.0.\n\nNote there may be minor differences in TSNE output in this release (due to\n:issue:`9623`), in the case where multiple samples have equal distance to some\nsample.\n\nChangelog\n---------\n\nAPI changes\n...........\n\n- Reverted the addition of ``metrics.ndcg_score`` and ``metrics.dcg_score``\n which had been merged into version 0.19.0 by error. The implementations\n were broken and undocumented.\n\n- ``return_train_score`` which was added to\n :class:`model_selection.GridSearchCV`,\n :class:`model_selection.RandomizedSearchCV` and\n :func:`model_selection.cross_validate` in version 0.19.0 will be changing its\n default value from True to False in version 0.21. We found that calculating\n training score could have a great effect on cross validation runtime in some\n cases. 
Users should explicitly set ``return_train_score`` to False if\n prediction or scoring functions are slow, resulting in a deleterious effect\n on CV runtime, or to True if they wish to use the calculated scores.\n :issue:`9677` by :user:`Kumar Ashutosh ` and `Joel\n Nothman`_.\n\n- ``correlation_models`` and ``regression_models`` from the legacy gaussian\n processes implementation have been belatedly deprecated. :issue:`9717` by\n :user:`Kumar Ashutosh `.\n\nBug fixes\n.........\n\n- Avoid integer overflows in :func:`metrics.matthews_corrcoef`.\n :issue:`9693` by :user:`Sam Steingold `.\n\n- Fixed a bug in the objective function for :class:`manifold.TSNE` (both exact\n and with the Barnes-Hut approximation) when ``n_components >= 3``.\n :issue:`9711` by :user:`goncalo-rodrigues`.\n\n- Fix regression in :func:`model_selection.cross_val_predict` where it\n raised an error with ``method='predict_proba'`` for some probabilistic\n classifiers. :issue:`9641` by :user:`James Bourbeau `.\n\n- Fixed a bug where :func:`datasets.make_classification` modified its input\n ``weights``. :issue:`9865` by :user:`Sachin Kelkar `.\n\n- :class:`model_selection.StratifiedShuffleSplit` now works with multioutput\n multiclass or multilabel data with more than 1000 columns. :issue:`9922` by\n :user:`Charlie Brummitt `.\n\n- Fixed a bug with nested and conditional parameter setting, e.g. setting a\n pipeline step and its parameter at the same time. :issue:`9945` by `Andreas\n M\u00fcller`_ and `Joel Nothman`_.\n\nRegressions in 0.19.0 fixed in 0.19.1:\n\n- Fixed a bug where parallelised prediction in random forests was not\n thread-safe and could (rarely) result in arbitrary errors. :issue:`9830` by\n `Joel Nothman`_.\n\n- Fix regression in :func:`model_selection.cross_val_predict` where it no\n longer accepted ``X`` as a list. 
:issue:`9600` by :user:`Rasul Kerimov\n `.\n\n- Fixed handling of :func:`model_selection.cross_val_predict` for binary\n classification with ``method='decision_function'``. :issue:`9593` by\n :user:`Reiichiro Nakano ` and core devs.\n\n- Fix regression in :class:`pipeline.Pipeline` where it no longer accepted\n ``steps`` as a tuple. :issue:`9604` by :user:`Joris Van den Bossche\n `.\n\n- Fix bug where ``n_iter`` was not properly deprecated, leaving ``n_iter``\n unavailable for interim use in\n :class:`linear_model.SGDClassifier`, :class:`linear_model.SGDRegressor`,\n :class:`linear_model.PassiveAggressiveClassifier`,\n :class:`linear_model.PassiveAggressiveRegressor` and\n :class:`linear_model.Perceptron`. :issue:`9558` by `Andreas M\u00fcller`_.\n\n- Dataset fetchers make sure temporary files are closed before removing them,\n which caused errors on Windows. :issue:`9847` by :user:`Joan Massich `.\n\n- Fixed a regression in :class:`manifold.TSNE` where it no longer supported\n metrics other than 'euclidean' and 'precomputed'. :issue:`9623` by :user:`Oli\n Blum `.\n\nEnhancements\n............\n\n- Our test suite and :func:`utils.estimator_checks.check_estimator` can now be\n run without Nose installed. :issue:`9697` by :user:`Joan Massich `.\n\n- To improve usability of version 0.19's :class:`pipeline.Pipeline`\n caching, ``memory`` now allows ``joblib.Memory`` instances.\n This make use of the new :func:`utils.validation.check_memory` helper.\n issue:`9584` by :user:`Kumar Ashutosh `\n\n- Some fixes to examples: :issue:`9750`, :issue:`9788`, :issue:`9815`\n\n- Made a FutureWarning in SGD-based estimators less verbose. 
:issue:`9802` by\n :user:`Vrishank Bhardwaj `.\n\nCode and Documentation Contributors\n-----------------------------------\n\nWith thanks to:\n\nJoel Nothman, Loic Esteve, Andreas Mueller, Kumar Ashutosh,\nVrishank Bhardwaj, Hanmin Qin, Rasul Kerimov, James Bourbeau,\nNagarjuna Kumar, Nathaniel Saul, Olivier Grisel, Roman\nYurchak, Reiichiro Nakano, Sachin Kelkar, Sam Steingold,\nYaroslav Halchenko, diegodlh, felix, goncalo-rodrigues,\njkleint, oliblum90, pasbi, Anthony Gitter, Ben Lawson, Charlie\nBrummitt, Didi Bar-Zev, Gael Varoquaux, Joan Massich, Joris\nVan den Bossche, nielsenmarkus11\nVersion 0.19\n============\n\n**August 12, 2017**\n\nHighlights\n----------\n\nWe are excited to release a number of great new features including\n:class:`neighbors.LocalOutlierFactor` for anomaly detection,\n:class:`preprocessing.QuantileTransformer` for robust feature transformation,\nand the :class:`multioutput.ClassifierChain` meta-estimator to simply account\nfor dependencies between classes in multilabel problems. We have some new\nalgorithms in existing estimators, such as multiplicative update in\n:class:`decomposition.NMF` and multinomial\n:class:`linear_model.LogisticRegression` with L1 loss (use ``solver='saga'``).\n\nCross validation is now able to return the results from multiple metric\nevaluations. The new :func:`model_selection.cross_validate` can return many\nscores on the test data as well as training set performance and timings, and we\nhave extended the ``scoring`` and ``refit`` parameters for grid/randomized\nsearch :ref:`to handle multiple metrics `.\n\nYou can also learn faster. For instance, the :ref:`new option to cache\ntransformations ` in :class:`pipeline.Pipeline` makes grid\nsearch over pipelines including slow transformations much more efficient. And\nyou can predict faster: if you're sure you know what you're doing, you can turn\noff validating that the input is finite using :func:`config_context`.\n\nWe've made some important fixes too. 
We've fixed a longstanding implementation\nerror in :func:`metrics.average_precision_score`, so please be cautious with\nprior results reported from that function. A number of errors in the\n:class:`manifold.TSNE` implementation have been fixed, particularly in the\ndefault Barnes-Hut approximation. :class:`semi_supervised.LabelSpreading` and\n:class:`semi_supervised.LabelPropagation` have had substantial fixes.\nLabelPropagation was previously broken. LabelSpreading should now correctly\nrespect its alpha parameter.\n\nChanged models\n--------------\n\nThe following estimators and functions, when fit with the same data and\nparameters, may produce different models from the previous version. This often\noccurs due to changes in the modelling logic (bug fixes or enhancements), or in\nrandom sampling procedures.\n\n- :class:`cluster.KMeans` with sparse X and initial centroids given (bug fix)\n- :class:`cross_decomposition.PLSRegression`\n with ``scale=True`` (bug fix)\n- :class:`ensemble.GradientBoostingClassifier` and\n :class:`ensemble.GradientBoostingRegressor` where ``min_impurity_split`` is used (bug fix)\n- gradient boosting ``loss='quantile'`` (bug fix)\n- :class:`ensemble.IsolationForest` (bug fix)\n- :class:`feature_selection.SelectFdr` (bug fix)\n- :class:`linear_model.RANSACRegressor` (bug fix)\n- :class:`linear_model.LassoLars` (bug fix)\n- :class:`linear_model.LassoLarsIC` (bug fix)\n- :class:`manifold.TSNE` (bug fix)\n- :class:`neighbors.NearestCentroid` (bug fix)\n- :class:`semi_supervised.LabelSpreading` (bug fix)\n- :class:`semi_supervised.LabelPropagation` (bug fix)\n- tree based models where ``min_weight_fraction_leaf`` is used (enhancement)\n- :class:`model_selection.StratifiedKFold` with ``shuffle=True``\n (this change, due to :issue:`7823` was not mentioned in the release notes at\n the time)\n\nDetails are listed in the changelog below.\n\n(While we are trying to better inform users by providing this information, we\ncannot assure that this 
list is complete.)\n\nChangelog\n---------\n\nNew features\n............\n\nClassifiers and regressors\n\n- Added :class:`multioutput.ClassifierChain` for multi-label\n classification. By :user:`Adam Kleczewski `.\n\n- Added solver ``'saga'`` that implements the improved version of Stochastic\n Average Gradient, in :class:`linear_model.LogisticRegression` and\n :class:`linear_model.Ridge`. It allows the use of L1 penalty with\n multinomial logistic loss, and behaves marginally better than 'sag'\n during the first epochs of ridge and logistic regression.\n :issue:`8446` by `Arthur Mensch`_.\n\nOther estimators\n\n- Added the :class:`neighbors.LocalOutlierFactor` class for anomaly\n detection based on nearest neighbors.\n :issue:`5279` by `Nicolas Goix`_ and `Alexandre Gramfort`_.\n\n- Added :class:`preprocessing.QuantileTransformer` class and\n :func:`preprocessing.quantile_transform` function for features\n normalization based on quantiles.\n :issue:`8363` by :user:`Denis Engemann `,\n :user:`Guillaume Lemaitre `, `Olivier Grisel`_, `Raghav RV`_,\n :user:`Thierry Guillemot `, and `Gael Varoquaux`_.\n\n- The new solver ``'mu'`` implements a Multiplicate Update in\n :class:`decomposition.NMF`, allowing the optimization of all\n beta-divergences, including the Frobenius norm, the generalized\n Kullback-Leibler divergence and the Itakura-Saito divergence.\n :issue:`5295` by `Tom Dupre la Tour`_.\n\nModel selection and evaluation\n\n- :class:`model_selection.GridSearchCV` and\n :class:`model_selection.RandomizedSearchCV` now support simultaneous\n evaluation of multiple metrics. Refer to the\n :ref:`multimetric_grid_search` section of the user guide for more\n information. :issue:`7388` by `Raghav RV`_\n\n- Added the :func:`model_selection.cross_validate` which allows evaluation\n of multiple metrics. 
This function returns a dict with more useful\n information from cross-validation such as the train scores, fit times and\n score times.\n Refer to :ref:`multimetric_cross_validation` section of the userguide\n for more information. :issue:`7388` by `Raghav RV`_\n\n- Added :func:`metrics.mean_squared_log_error`, which computes\n the mean square error of the logarithmic transformation of targets,\n particularly useful for targets with an exponential trend.\n :issue:`7655` by :user:`Karan Desai `.\n\n- Added :func:`metrics.dcg_score` and :func:`metrics.ndcg_score`, which\n compute Discounted cumulative gain (DCG) and Normalized discounted\n cumulative gain (NDCG).\n :issue:`7739` by :user:`David Gasquez `.\n\n- Added the :class:`model_selection.RepeatedKFold` and\n :class:`model_selection.RepeatedStratifiedKFold`.\n :issue:`8120` by `Neeraj Gangwar`_.\n\nMiscellaneous\n\n- Validation that input data contains no NaN or inf can now be suppressed\n using :func:`config_context`, at your own risk. This will save on runtime,\n and may be particularly useful for prediction time. :issue:`7548` by\n `Joel Nothman`_.\n\n- Added a test to ensure parameter listing in docstrings match the\n function/class signature. :issue:`9206` by `Alexandre Gramfort`_ and\n `Raghav RV`_.\n\nEnhancements\n............\n\nTrees and ensembles\n\n- The ``min_weight_fraction_leaf`` constraint in tree construction is now\n more efficient, taking a fast path to declare a node a leaf if its weight\n is less than 2 * the minimum. Note that the constructed tree will be\n different from previous versions where ``min_weight_fraction_leaf`` is\n used. :issue:`7441` by :user:`Nelson Liu `.\n\n- :class:`ensemble.GradientBoostingClassifier` and :class:`ensemble.GradientBoostingRegressor`\n now support sparse input for prediction.\n :issue:`6101` by :user:`Ibraim Ganiev `.\n\n- :class:`ensemble.VotingClassifier` now allows changing estimators by using\n :meth:`ensemble.VotingClassifier.set_params`. 
An estimator can also be\n removed by setting it to ``None``.\n :issue:`7674` by :user:`Yichuan Liu `.\n\n- :func:`tree.export_graphviz` now shows configurable number of decimal\n places. :issue:`8698` by :user:`Guillaume Lemaitre `.\n\n- Added ``flatten_transform`` parameter to :class:`ensemble.VotingClassifier`\n to change output shape of `transform` method to 2 dimensional.\n :issue:`7794` by :user:`Ibraim Ganiev ` and\n :user:`Herilalaina Rakotoarison `.\n\nLinear, kernelized and related models\n\n- :class:`linear_model.SGDClassifier`, :class:`linear_model.SGDRegressor`,\n :class:`linear_model.PassiveAggressiveClassifier`,\n :class:`linear_model.PassiveAggressiveRegressor` and\n :class:`linear_model.Perceptron` now expose ``max_iter`` and\n ``tol`` parameters, to handle convergence more precisely.\n ``n_iter`` parameter is deprecated, and the fitted estimator exposes\n a ``n_iter_`` attribute, with actual number of iterations before\n convergence. :issue:`5036` by `Tom Dupre la Tour`_.\n\n- Added ``average`` parameter to perform weight averaging in\n :class:`linear_model.PassiveAggressiveClassifier`. :issue:`4939`\n by :user:`Andrea Esuli `.\n\n- :class:`linear_model.RANSACRegressor` no longer throws an error\n when calling ``fit`` if no inliers are found in its first iteration.\n Furthermore, causes of skipped iterations are tracked in newly added\n attributes, ``n_skips_*``.\n :issue:`7914` by :user:`Michael Horrell `.\n\n- In :class:`gaussian_process.GaussianProcessRegressor`, method ``predict``\n is a lot faster with ``return_std=True``. 
:issue:`8591` by\n :user:`Hadrien Bertrand `.\n\n- Added ``return_std`` to ``predict`` method of\n :class:`linear_model.ARDRegression` and\n :class:`linear_model.BayesianRidge`.\n :issue:`7838` by :user:`Sergey Feldman `.\n\n- Memory usage enhancements: Prevent cast from float32 to float64 in:\n :class:`linear_model.MultiTaskElasticNet`;\n :class:`linear_model.LogisticRegression` when using newton-cg solver; and\n :class:`linear_model.Ridge` when using svd, sparse_cg, cholesky or lsqr\n solvers. :issue:`8835`, :issue:`8061` by :user:`Joan Massich ` and :user:`Nicolas\n Cordier ` and :user:`Thierry Guillemot `.\n\nOther predictors\n\n- Custom metrics for the :mod:`sklearn.neighbors` binary trees now have\n fewer constraints: they must take two 1d-arrays and return a float.\n :issue:`6288` by `Jake Vanderplas`_.\n\n- ``algorithm='auto'`` in :mod:`sklearn.neighbors` estimators now chooses the most\n appropriate algorithm for all input types and metrics. :issue:`9145` by\n :user:`Herilalaina Rakotoarison ` and :user:`Reddy Chinthala\n `.\n\nDecomposition, manifold learning and clustering\n\n- :class:`cluster.MiniBatchKMeans` and :class:`cluster.KMeans`\n now use significantly less memory when assigning data points to their\n nearest cluster center. :issue:`7721` by :user:`Jon Crall `.\n\n- :class:`decomposition.PCA`, :class:`decomposition.IncrementalPCA` and\n :class:`decomposition.TruncatedSVD` now expose the singular values\n from the underlying SVD. 
They are stored in the attribute\n ``singular_values_``, like in :class:`decomposition.IncrementalPCA`.\n :issue:`7685` by :user:`Tommy L\u00f6fstedt `\n\n- :class:`decomposition.NMF` now faster when ``beta_loss=0``.\n :issue:`9277` by :user:`hongkahjun`.\n\n- Memory improvements for method ``barnes_hut`` in :class:`manifold.TSNE`\n :issue:`7089` by :user:`Thomas Moreau ` and `Olivier Grisel`_.\n\n- Optimization schedule improvements for Barnes-Hut :class:`manifold.TSNE`\n so the results are closer to the one from the reference implementation\n `lvdmaaten/bhtsne `_ by :user:`Thomas\n Moreau ` and `Olivier Grisel`_.\n\n- Memory usage enhancements: Prevent cast from float32 to float64 in\n :class:`decomposition.PCA` and\n `decomposition.randomized_svd_low_rank`.\n :issue:`9067` by `Raghav RV`_.\n\nPreprocessing and feature selection\n\n- Added ``norm_order`` parameter to :class:`feature_selection.SelectFromModel`\n to enable selection of the norm order when ``coef_`` is more than 1D.\n :issue:`6181` by :user:`Antoine Wendlinger `.\n\n- Added ability to use sparse matrices in :func:`feature_selection.f_regression`\n with ``center=True``. :issue:`8065` by :user:`Daniel LeJeune `.\n\n- Small performance improvement to n-gram creation in\n :mod:`sklearn.feature_extraction.text` by binding methods for loops and\n special-casing unigrams. :issue:`7567` by :user:`Jaye Doepke `\n\n- Relax assumption on the data for the\n :class:`kernel_approximation.SkewedChi2Sampler`. Since the Skewed-Chi2\n kernel is defined on the open interval :math:`(-skewedness; +\\infty)^d`,\n the transform function should not check whether ``X < 0`` but whether ``X <\n -self.skewedness``. 
:issue:`7573` by :user:`Romain Brault `.\n\n- Made default kernel parameters kernel-dependent in\n :class:`kernel_approximation.Nystroem`.\n :issue:`5229` by :user:`Saurabh Bansod ` and `Andreas M\u00fcller`_.\n\nModel evaluation and meta-estimators\n\n- :class:`pipeline.Pipeline` is now able to cache transformers\n within a pipeline by using the ``memory`` constructor parameter.\n :issue:`7990` by :user:`Guillaume Lemaitre `.\n\n- :class:`pipeline.Pipeline` steps can now be accessed as attributes of its\n ``named_steps`` attribute. :issue:`8586` by :user:`Herilalaina\n Rakotoarison `.\n\n- Added ``sample_weight`` parameter to :meth:`pipeline.Pipeline.score`.\n :issue:`7723` by :user:`Mikhail Korobov `.\n\n- Added ability to set ``n_jobs`` parameter to :func:`pipeline.make_union`.\n A ``TypeError`` will be raised for any other kwargs. :issue:`8028`\n by :user:`Alexander Booth `.\n\n- :class:`model_selection.GridSearchCV`,\n :class:`model_selection.RandomizedSearchCV` and\n :func:`model_selection.cross_val_score` now allow estimators with callable\n kernels which were previously prohibited.\n :issue:`8005` by `Andreas M\u00fcller`_ .\n\n- :func:`model_selection.cross_val_predict` now returns output of the\n correct shape for all values of the argument ``method``.\n :issue:`7863` by :user:`Aman Dalmia `.\n\n- Added ``shuffle`` and ``random_state`` parameters to shuffle training\n data before taking prefixes of it based on training sizes in\n :func:`model_selection.learning_curve`.\n :issue:`7506` by :user:`Narine Kokhlikyan `.\n\n- :class:`model_selection.StratifiedShuffleSplit` now works with multioutput\n multiclass (or multilabel) data. 
:issue:`9044` by `Vlad Niculae`_.\n\n- Speed improvements to :class:`model_selection.StratifiedShuffleSplit`.\n :issue:`5991` by :user:`Arthur Mensch ` and `Joel Nothman`_.\n\n- Add ``shuffle`` parameter to :func:`model_selection.train_test_split`.\n :issue:`8845` by :user:`themrmax `\n\n- :class:`multioutput.MultiOutputRegressor` and :class:`multioutput.MultiOutputClassifier`\n now support online learning using ``partial_fit``.\n :issue: `8053` by :user:`Peng Yu `.\n\n- Add ``max_train_size`` parameter to :class:`model_selection.TimeSeriesSplit`\n :issue:`8282` by :user:`Aman Dalmia `.\n\n- More clustering metrics are now available through :func:`metrics.get_scorer`\n and ``scoring`` parameters. :issue:`8117` by `Raghav RV`_.\n\n- A scorer based on :func:`metrics.explained_variance_score` is also available.\n :issue:`9259` by :user:`Hanmin Qin `.\n\nMetrics\n\n- :func:`metrics.matthews_corrcoef` now support multiclass classification.\n :issue:`8094` by :user:`Jon Crall `.\n\n- Add ``sample_weight`` parameter to :func:`metrics.cohen_kappa_score`.\n :issue:`8335` by :user:`Victor Poughon `.\n\nMiscellaneous\n\n- :func:`utils.estimator_checks.check_estimator` now attempts to ensure that methods\n transform, predict, etc. do not set attributes on the estimator.\n :issue:`7533` by :user:`Ekaterina Krivich `.\n\n- Added type checking to the ``accept_sparse`` parameter in\n :mod:`sklearn.utils.validation` methods. This parameter now accepts only boolean,\n string, or list/tuple of strings. ``accept_sparse=None`` is deprecated and\n should be replaced by ``accept_sparse=False``.\n :issue:`7880` by :user:`Josh Karnofsky `.\n\n- Make it possible to load a chunk of an svmlight formatted file by\n passing a range of bytes to :func:`datasets.load_svmlight_file`.\n :issue:`935` by :user:`Olivier Grisel `.\n\n- :class:`dummy.DummyClassifier` and :class:`dummy.DummyRegressor`\n now accept non-finite features. 
:issue:`8931` by :user:`Attractadore`.\n\nBug fixes\n.........\n\nTrees and ensembles\n\n- Fixed a memory leak in trees when using trees with ``criterion='mae'``.\n :issue:`8002` by `Raghav RV`_.\n\n- Fixed a bug where :class:`ensemble.IsolationForest` uses an\n incorrect formula for the average path length.\n :issue:`8549` by `Peter Wang `_.\n\n- Fixed a bug where :class:`ensemble.AdaBoostClassifier` throws\n ``ZeroDivisionError`` while fitting data with single class labels.\n :issue:`7501` by :user:`Dominik Krzeminski `.\n\n- Fixed a bug in :class:`ensemble.GradientBoostingClassifier` and\n :class:`ensemble.GradientBoostingRegressor` where a float being compared\n to ``0.0`` using ``==`` caused a divide by zero error. :issue:`7970` by\n :user:`He Chen `.\n\n- Fix a bug where :class:`ensemble.GradientBoostingClassifier` and\n :class:`ensemble.GradientBoostingRegressor` ignored the\n ``min_impurity_split`` parameter.\n :issue:`8006` by :user:`Sebastian P\u00f6lsterl `.\n\n- Fixed ``oob_score`` in :class:`ensemble.BaggingClassifier`.\n :issue:`8936` by :user:`Michael Lewis `\n\n- Fixed excessive memory usage in prediction for random forests estimators.\n :issue:`8672` by :user:`Mike Benfield `.\n\n- Fixed a bug where ``sample_weight`` as a list broke random forests in Python 2\n :issue:`8068` by :user:`xor`.\n\n- Fixed a bug where :class:`ensemble.IsolationForest` fails when\n ``max_features`` is less than 1.\n :issue:`5732` by :user:`Ishank Gulati `.\n\n- Fix a bug where gradient boosting with ``loss='quantile'`` computed\n negative errors for negative values of ``ytrue - ypred`` leading to wrong\n values when calling ``__call__``.\n :issue:`8087` by :user:`Alexis Mignon `\n\n- Fix a bug where :class:`ensemble.VotingClassifier` raises an error\n when a numpy array is passed in for weights. 
:issue:`7983` by\n :user:`Vincent Pham `.\n\n- Fixed a bug where :func:`tree.export_graphviz` raised an error\n when the length of features_names does not match n_features in the decision\n tree. :issue:`8512` by :user:`Li Li `.\n\nLinear, kernelized and related models\n\n- Fixed a bug where :func:`linear_model.RANSACRegressor.fit` may run until\n ``max_iter`` if it finds a large inlier group early. :issue:`8251` by\n :user:`aivision2020`.\n\n- Fixed a bug where :class:`naive_bayes.MultinomialNB` and\n :class:`naive_bayes.BernoulliNB` failed when ``alpha=0``. :issue:`5814` by\n :user:`Yichuan Liu ` and :user:`Herilalaina Rakotoarison\n `.\n\n- Fixed a bug where :class:`linear_model.LassoLars` does not give\n the same result as the LassoLars implementation available\n in R (lars library). :issue:`7849` by :user:`Jair Montoya Martinez `.\n\n- Fixed a bug in `linear_model.RandomizedLasso`,\n :class:`linear_model.Lars`, :class:`linear_model.LassoLars`,\n :class:`linear_model.LarsCV` and :class:`linear_model.LassoLarsCV`,\n where the parameter ``precompute`` was not used consistently across\n classes, and some values proposed in the docstring could raise errors.\n :issue:`5359` by `Tom Dupre la Tour`_.\n\n- Fix inconsistent results between :class:`linear_model.RidgeCV` and\n :class:`linear_model.Ridge` when using ``normalize=True``. :issue:`9302`\n by `Alexandre Gramfort`_.\n\n- Fix a bug where :func:`linear_model.LassoLars.fit` sometimes\n left ``coef_`` as a list, rather than an ndarray.\n :issue:`8160` by :user:`CJ Carey `.\n\n- Fix :func:`linear_model.BayesianRidge.fit` to return\n ridge parameter ``alpha_`` and ``lambda_`` consistent with calculated\n coefficients ``coef_`` and ``intercept_``.\n :issue:`8224` by :user:`Peter Gedeck `.\n\n- Fixed a bug in :class:`svm.OneClassSVM` where it returned floats instead of\n integer classes. 
:issue:`8676` by :user:`Vathsala Achar `.\n\n- Fix AIC/BIC criterion computation in :class:`linear_model.LassoLarsIC`.\n :issue:`9022` by `Alexandre Gramfort`_ and :user:`Mehmet Basbug `.\n\n- Fixed a memory leak in our LibLinear implementation. :issue:`9024` by\n :user:`Sergei Lebedev `\n\n- Fix bug where stratified CV splitters did not work with\n :class:`linear_model.LassoCV`. :issue:`8973` by\n :user:`Paulo Haddad `.\n\n- Fixed a bug in :class:`gaussian_process.GaussianProcessRegressor`\n when the standard deviation and covariance predicted without fit\n would fail with a unmeaningful error by default.\n :issue:`6573` by :user:`Quazi Marufur Rahman ` and\n `Manoj Kumar`_.\n\nOther predictors\n\n- Fix `semi_supervised.BaseLabelPropagation` to correctly implement\n ``LabelPropagation`` and ``LabelSpreading`` as done in the referenced\n papers. :issue:`9239`\n by :user:`Andre Ambrosio Boechat `, :user:`Utkarsh Upadhyay\n `, and `Joel Nothman`_.\n\nDecomposition, manifold learning and clustering\n\n- Fixed the implementation of :class:`manifold.TSNE`:\n- ``early_exageration`` parameter had no effect and is now used for the\n first 250 optimization iterations.\n- Fixed the ``AssertionError: Tree consistency failed`` exception\n reported in :issue:`8992`.\n- Improve the learning schedule to match the one from the reference\n implementation `lvdmaaten/bhtsne `_.\n by :user:`Thomas Moreau ` and `Olivier Grisel`_.\n\n- Fix a bug in :class:`decomposition.LatentDirichletAllocation`\n where the ``perplexity`` method was returning incorrect results because\n the ``transform`` method returns normalized document topic distributions\n as of version 0.18. 
:issue:`7954` by :user:`Gary Foreman `.\n\n- Fix output shape and bugs with n_jobs > 1 in\n :class:`decomposition.SparseCoder` transform and\n :func:`decomposition.sparse_encode`\n for one-dimensional data and one component.\n This also impacts the output shape of :class:`decomposition.DictionaryLearning`.\n :issue:`8086` by `Andreas M\u00fcller`_.\n\n- Fixed the implementation of ``explained_variance_``\n in :class:`decomposition.PCA`,\n `decomposition.RandomizedPCA` and\n :class:`decomposition.IncrementalPCA`.\n :issue:`9105` by `Hanmin Qin `_.\n\n- Fixed the implementation of ``noise_variance_`` in :class:`decomposition.PCA`.\n :issue:`9108` by `Hanmin Qin `_.\n\n- Fixed a bug where :class:`cluster.DBSCAN` gives incorrect\n result when input is a precomputed sparse matrix with initial\n rows all zero. :issue:`8306` by :user:`Akshay Gupta `\n\n- Fix a bug regarding fitting :class:`cluster.KMeans` with a sparse\n array X and initial centroids, where X's means were unnecessarily being\n subtracted from the centroids. :issue:`7872` by :user:`Josh Karnofsky `.\n\n- Fixes to the input validation in :class:`covariance.EllipticEnvelope`.\n :issue:`8086` by `Andreas M\u00fcller`_.\n\n- Fixed a bug in :class:`covariance.MinCovDet` where inputting data\n that produced a singular covariance matrix would cause the helper method\n ``_c_step`` to throw an exception.\n :issue:`3367` by :user:`Jeremy Steward `\n\n- Fixed a bug in :class:`manifold.TSNE` affecting convergence of the\n gradient descent. :issue:`8768` by :user:`David DeTomaso `.\n\n- Fixed a bug in :class:`manifold.TSNE` where it stored the incorrect\n ``kl_divergence_``. :issue:`6507` by :user:`Sebastian Saeger `.\n\n- Fixed improper scaling in :class:`cross_decomposition.PLSRegression`\n with ``scale=True``. :issue:`7819` by :user:`jayzed82 `.\n\n- :class:`cluster.SpectralCoclustering` and\n :class:`cluster.SpectralBiclustering` ``fit`` method conforms\n with API by accepting ``y`` and returning the object. 
:issue:`6126`,\n :issue:`7814` by :user:`Laurent Direr ` and :user:`Maniteja\n Nandana `.\n\n- Fix bug where :mod:`sklearn.mixture` ``sample`` methods did not return as many\n samples as requested. :issue:`7702` by :user:`Levi John Wolf `.\n\n- Fixed the shrinkage implementation in :class:`neighbors.NearestCentroid`.\n :issue:`9219` by `Hanmin Qin `_.\n\nPreprocessing and feature selection\n\n- For sparse matrices, :func:`preprocessing.normalize` with ``return_norm=True``\n will now raise a ``NotImplementedError`` with 'l1' or 'l2' norm and with\n norm 'max' the norms returned will be the same as for dense matrices.\n :issue:`7771` by `Ang Lu `_.\n\n- Fix a bug where :class:`feature_selection.SelectFdr` did not\n exactly implement Benjamini-Hochberg procedure. It formerly may have\n selected fewer features than it should.\n :issue:`7490` by :user:`Peng Meng `.\n\n- Fixed a bug where `linear_model.RandomizedLasso` and\n `linear_model.RandomizedLogisticRegression` breaks for\n sparse input. :issue:`8259` by :user:`Aman Dalmia `.\n\n- Fix a bug where :class:`feature_extraction.FeatureHasher`\n mandatorily applied a sparse random projection to the hashed features,\n preventing the use of\n :class:`feature_extraction.text.HashingVectorizer` in a\n pipeline with :class:`feature_extraction.text.TfidfTransformer`.\n :issue:`7565` by :user:`Roman Yurchak `.\n\n- Fix a bug where :class:`feature_selection.mutual_info_regression` did not\n correctly use ``n_neighbors``. 
:issue:`8181` by :user:`Guillaume Lemaitre\n `.\n\nModel evaluation and meta-estimators\n\n- Fixed a bug where `model_selection.BaseSearchCV.inverse_transform`\n returns ``self.best_estimator_.transform()`` instead of\n ``self.best_estimator_.inverse_transform()``.\n :issue:`8344` by :user:`Akshay Gupta ` and :user:`Rasmus Eriksson `.\n\n- Added ``classes_`` attribute to :class:`model_selection.GridSearchCV`,\n :class:`model_selection.RandomizedSearchCV`, `grid_search.GridSearchCV`,\n and `grid_search.RandomizedSearchCV` that matches the ``classes_``\n attribute of ``best_estimator_``. :issue:`7661` and :issue:`8295`\n by :user:`Alyssa Batula `, :user:`Dylan Werner-Meier `,\n and :user:`Stephen Hoover `.\n\n- Fixed a bug where :func:`model_selection.validation_curve`\n reused the same estimator for each parameter value.\n :issue:`7365` by :user:`Aleksandr Sandrovskii `.\n\n- :func:`model_selection.permutation_test_score` now works with Pandas\n types. :issue:`5697` by :user:`Stijn Tonk `.\n\n- Several fixes to input validation in\n :class:`multiclass.OutputCodeClassifier`\n :issue:`8086` by `Andreas M\u00fcller`_.\n\n- :class:`multiclass.OneVsOneClassifier`'s ``partial_fit`` now ensures all\n classes are provided up-front. :issue:`6250` by\n :user:`Asish Panda `.\n\n- Fix :func:`multioutput.MultiOutputClassifier.predict_proba` to return a\n list of 2d arrays, rather than a 3d array. In the case where different\n target columns had different numbers of classes, a ``ValueError`` would be\n raised on trying to stack matrices with different dimensions.\n :issue:`8093` by :user:`Peter Bull `.\n\n- Cross validation now works with Pandas datatypes that have a\n read-only index. :issue:`9507` by `Loic Esteve`_.\n\nMetrics\n\n- :func:`metrics.average_precision_score` no longer linearly\n interpolates between operating points, and instead weighs precisions\n by the change in recall since the last operating point, as per the\n `Wikipedia entry `_.\n (`#7356 `_). 
By\n :user:`Nick Dingwall ` and `Gael Varoquaux`_.\n\n- Fix a bug in `metrics.classification._check_targets`\n which would return ``'binary'`` if ``y_true`` and ``y_pred`` were\n both ``'binary'`` but the union of ``y_true`` and ``y_pred`` was\n ``'multiclass'``. :issue:`8377` by `Loic Esteve`_.\n\n- Fixed an integer overflow bug in :func:`metrics.confusion_matrix` and\n hence :func:`metrics.cohen_kappa_score`. :issue:`8354`, :issue:`7929`\n by `Joel Nothman`_ and :user:`Jon Crall `.\n\n- Fixed passing of ``gamma`` parameter to the ``chi2`` kernel in\n :func:`metrics.pairwise.pairwise_kernels` :issue:`5211` by\n :user:`Nick Rhinehart `,\n :user:`Saurabh Bansod ` and `Andreas M\u00fcller`_.\n\nMiscellaneous\n\n- Fixed a bug when :func:`datasets.make_classification` fails\n when generating more than 30 features. :issue:`8159` by\n :user:`Herilalaina Rakotoarison `.\n\n- Fixed a bug where :func:`datasets.make_moons` gives an\n incorrect result when ``n_samples`` is odd.\n :issue:`8198` by :user:`Josh Levy `.\n\n- Some ``fetch_`` functions in :mod:`sklearn.datasets` were ignoring the\n ``download_if_missing`` keyword. :issue:`7944` by :user:`Ralf Gommers `.\n\n- Fix estimators to accept a ``sample_weight`` parameter of type\n ``pandas.Series`` in their ``fit`` function. :issue:`7825` by\n `Kathleen Chen`_.\n\n- Fix a bug in cases where ``numpy.cumsum`` may be numerically unstable,\n raising an exception if instability is identified. 
:issue:`7376` and\n :issue:`7331` by `Joel Nothman`_ and :user:`yangarbiter`.\n\n- Fix a bug where `base.BaseEstimator.__getstate__`\n obstructed pickling customizations of child-classes, when used in a\n multiple inheritance context.\n :issue:`8316` by :user:`Holger Peters `.\n\n- Update Sphinx-Gallery from 0.1.4 to 0.1.7 for resolving links in\n documentation build with Sphinx>1.5 :issue:`8010`, :issue:`7986` by\n :user:`Oscar Najera `\n\n- Add ``data_home`` parameter to :func:`sklearn.datasets.fetch_kddcup99`.\n :issue:`9289` by `Loic Esteve`_.\n\n- Fix dataset loaders using Python 3 version of makedirs to also work in\n Python 2. :issue:`9284` by :user:`Sebastin Santy `.\n\n- Several minor issues were fixed with thanks to the alerts of\n `lgtm.com `_. :issue:`9278` by :user:`Jean Helie `,\n among others.\n\nAPI changes summary\n-------------------\n\nTrees and ensembles\n\n- Gradient boosting base models are no longer estimators. By `Andreas M\u00fcller`_.\n\n- All tree based estimators now accept a ``min_impurity_decrease``\n parameter in lieu of the ``min_impurity_split``, which is now deprecated.\n The ``min_impurity_decrease`` helps stop splitting the nodes in which\n the weighted impurity decrease from splitting is no longer at least\n ``min_impurity_decrease``. :issue:`8449` by `Raghav RV`_.\n\nLinear, kernelized and related models\n\n- ``n_iter`` parameter is deprecated in :class:`linear_model.SGDClassifier`,\n :class:`linear_model.SGDRegressor`,\n :class:`linear_model.PassiveAggressiveClassifier`,\n :class:`linear_model.PassiveAggressiveRegressor` and\n :class:`linear_model.Perceptron`. By `Tom Dupre la Tour`_.\n\nOther predictors\n\n- `neighbors.LSHForest` has been deprecated and will be\n removed in 0.21 due to poor performance.\n :issue:`9078` by :user:`Laurent Direr `.\n\n- :class:`neighbors.NearestCentroid` no longer purports to support\n ``metric='precomputed'`` which now raises an error. 
:issue:`8515` by\n :user:`Sergul Aydore `.\n\n- The ``alpha`` parameter of :class:`semi_supervised.LabelPropagation` now\n has no effect and is deprecated to be removed in 0.21. :issue:`9239`\n by :user:`Andre Ambrosio Boechat `, :user:`Utkarsh Upadhyay\n `, and `Joel Nothman`_.\n\nDecomposition, manifold learning and clustering\n\n- Deprecate the ``doc_topic_distr`` argument of the ``perplexity`` method\n in :class:`decomposition.LatentDirichletAllocation` because the\n user no longer has access to the unnormalized document topic distribution\n needed for the perplexity calculation. :issue:`7954` by\n :user:`Gary Foreman `.\n\n- The ``n_topics`` parameter of :class:`decomposition.LatentDirichletAllocation`\n has been renamed to ``n_components`` and will be removed in version 0.21.\n :issue:`8922` by :user:`Attractadore`.\n\n- :meth:`decomposition.SparsePCA.transform`'s ``ridge_alpha`` parameter is\n deprecated in favor of the class parameter.\n :issue:`8137` by :user:`Naoya Kanai `.\n\n- :class:`cluster.DBSCAN` now has a ``metric_params`` parameter.\n :issue:`8139` by :user:`Naoya Kanai `.\n\nPreprocessing and feature selection\n\n- :class:`feature_selection.SelectFromModel` now has a ``partial_fit``\n method only if the underlying estimator does. By `Andreas M\u00fcller`_.\n\n- :class:`feature_selection.SelectFromModel` now validates the ``threshold``\n parameter and sets the ``threshold_`` attribute during the call to\n ``fit``, and no longer during the call to ``transform``. 
By `Andreas\n M\u00fcller`_.\n\n- The ``non_negative`` parameter in :class:`feature_extraction.FeatureHasher`\n has been deprecated, and replaced with a more principled alternative,\n ``alternate_sign``.\n :issue:`7565` by :user:`Roman Yurchak `.\n\n- `linear_model.RandomizedLogisticRegression`,\n and `linear_model.RandomizedLasso` have been deprecated and will\n be removed in version 0.21.\n :issue:`8995` by :user:`Ramana.S `.\n\nModel evaluation and meta-estimators\n\n- Deprecate the ``fit_params`` constructor input to the\n :class:`model_selection.GridSearchCV` and\n :class:`model_selection.RandomizedSearchCV` in favor\n of passing keyword parameters to the ``fit`` methods\n of those classes. Data-dependent parameters needed for model\n training should be passed as keyword arguments to ``fit``,\n and conforming to this convention will allow the hyperparameter\n selection classes to be used with tools such as\n :func:`model_selection.cross_val_predict`.\n :issue:`2879` by :user:`Stephen Hoover `.\n\n- In version 0.21, the default behavior of splitters that use the\n ``test_size`` and ``train_size`` parameter will change, such that\n specifying ``train_size`` alone will cause ``test_size`` to be the\n remainder. :issue:`7459` by :user:`Nelson Liu `.\n\n- :class:`multiclass.OneVsRestClassifier` now has ``partial_fit``,\n ``decision_function`` and ``predict_proba`` methods only when the\n underlying estimator does. :issue:`7812` by `Andreas M\u00fcller`_ and\n :user:`Mikhail Korobov `.\n\n- :class:`multiclass.OneVsRestClassifier` now has a ``partial_fit`` method\n only if the underlying estimator does. By `Andreas M\u00fcller`_.\n\n- The ``decision_function`` output shape for binary classification in\n :class:`multiclass.OneVsRestClassifier` and\n :class:`multiclass.OneVsOneClassifier` is now ``(n_samples,)`` to conform\n to scikit-learn conventions. 
:issue:`9100` by `Andreas M\u00fcller`_.\n\n- The :func:`multioutput.MultiOutputClassifier.predict_proba`\n function used to return a 3d array (``n_samples``, ``n_classes``,\n ``n_outputs``). In the case where different target columns had different\n numbers of classes, a ``ValueError`` would be raised on trying to stack\n matrices with different dimensions. This function now returns a list of\n arrays where the length of the list is ``n_outputs``, and each array is\n (``n_samples``, ``n_classes``) for that particular output.\n :issue:`8093` by :user:`Peter Bull `.\n\n- Replace attribute ``named_steps`` ``dict`` to :class:`utils.Bunch`\n in :class:`pipeline.Pipeline` to enable tab completion in interactive\n environment. In the case conflict value on ``named_steps`` and ``dict``\n attribute, ``dict`` behavior will be prioritized.\n :issue:`8481` by :user:`Herilalaina Rakotoarison `.\n\nMiscellaneous\n\n- Deprecate the ``y`` parameter in ``transform`` and ``inverse_transform``.\n The method should not accept ``y`` parameter, as it's used at the prediction time.\n :issue:`8174` by :user:`Tahar Zanouda `, `Alexandre Gramfort`_\n and `Raghav RV`_.\n\n- SciPy >= 0.13.3 and NumPy >= 1.8.2 are now the minimum supported versions\n for scikit-learn. The following backported functions in\n :mod:`sklearn.utils` have been removed or deprecated accordingly.\n :issue:`8854` and :issue:`8874` by :user:`Naoya Kanai `\n\n- The ``store_covariances`` and ``covariances_`` parameters of\n :class:`discriminant_analysis.QuadraticDiscriminantAnalysis`\n has been renamed to ``store_covariance`` and ``covariance_`` to be\n consistent with the corresponding parameter names of the\n :class:`discriminant_analysis.LinearDiscriminantAnalysis`. They will be\n removed in version 0.21. 
:issue:`7998` by :user:`Jiacheng `\nRemoved in 0.19:\n- ``utils.fixes.argpartition``\n - ``utils.fixes.array_equal``\n - ``utils.fixes.astype``\n - ``utils.fixes.bincount``\n - ``utils.fixes.expit``\n - ``utils.fixes.frombuffer_empty``\n - ``utils.fixes.in1d``\n - ``utils.fixes.norm``\n - ``utils.fixes.rankdata``\n - ``utils.fixes.safe_copy``\nDeprecated in 0.19, to be removed in 0.21:\n- ``utils.arpack.eigs``\n - ``utils.arpack.eigsh``\n - ``utils.arpack.svds``\n - ``utils.extmath.fast_dot``\n - ``utils.extmath.logsumexp``\n - ``utils.extmath.norm``\n - ``utils.extmath.pinvh``\n - ``utils.graph.graph_laplacian``\n - ``utils.random.choice``\n - ``utils.sparsetools.connected_components``\n - ``utils.stats.rankdata``\n\n- Estimators with both methods ``decision_function`` and ``predict_proba``\n are now required to have a monotonic relation between them. The\n method ``check_decision_proba_consistency`` has been added in\n **utils.estimator_checks** to check their consistency.\n :issue:`7578` by :user:`Shubham Bhardwaj `\n\n- All checks in ``utils.estimator_checks``, in particular\n :func:`utils.estimator_checks.check_estimator` now accept estimator\n instances. Most other checks do not accept\n estimator classes any more. :issue:`9019` by `Andreas M\u00fcller`_.\n\n- Ensure that estimators' attributes ending with ``_`` are not set\n in the constructor but only in the ``fit`` method. 
Most notably,\n ensemble estimators (deriving from `ensemble.BaseEnsemble`)\n now only have ``self.estimators_`` available after ``fit``.\n :issue:`7464` by `Lars Buitinck`_ and `Loic Esteve`_.\nCode and Documentation Contributors\n-----------------------------------\n\nThanks to everyone who has contributed to the maintenance and improvement of the\nproject since version 0.18, including:\n\nJoel Nothman, Loic Esteve, Andreas Mueller, Guillaume Lemaitre, Olivier Grisel,\nHanmin Qin, Raghav RV, Alexandre Gramfort, themrmax, Aman Dalmia, Gael\nVaroquaux, Naoya Kanai, Tom Dupr\u00e9 la Tour, Rishikesh, Nelson Liu, Taehoon Lee,\nNelle Varoquaux, Aashil, Mikhail Korobov, Sebastin Santy, Joan Massich, Roman\nYurchak, RAKOTOARISON Herilalaina, Thierry Guillemot, Alexandre Abadie, Carol\nWilling, Balakumaran Manoharan, Josh Karnofsky, Vlad Niculae, Utkarsh Upadhyay,\nDmitry Petrov, Minghui Liu, Srivatsan, Vincent Pham, Albert Thomas, Jake\nVanderPlas, Attractadore, JC Liu, alexandercbooth, chkoar, \u00d3scar N\u00e1jera,\nAarshay Jain, Kyle Gilliam, Ramana Subramanyam, CJ Carey, Clement Joudet, David\nRobles, He Chen, Joris Van den Bossche, Karan Desai, Katie Luangkote, Leland\nMcInnes, Maniteja Nandana, Michele Lacchia, Sergei Lebedev, Shubham Bhardwaj,\nakshay0724, omtcyfz, rickiepark, waterponey, Vathsala Achar, jbDelafosse, Ralf\nGommers, Ekaterina Krivich, Vivek Kumar, Ishank Gulati, Dave Elliott, ldirer,\nReiichiro Nakano, Levi John Wolf, Mathieu Blondel, Sid Kapur, Dougal J.\nSutherland, midinas, mikebenfield, Sourav Singh, Aseem Bansal, Ibraim Ganiev,\nStephen Hoover, AishwaryaRK, Steven C. Howell, Gary Foreman, Neeraj Gangwar,\nTahar, Jon Crall, dokato, Kathy Chen, ferria, Thomas Moreau, Charlie Brummitt,\nNicolas Goix, Adam Kleczewski, Sam Shleifer, Nikita Singh, Basil Beirouti,\nGiorgio Patrini, Manoj Kumar, Rafael Possas, James Bourbeau, James A. 
Bednar,\nJanine Harper, Jaye, Jean Helie, Jeremy Steward, Artsiom, John Wei, Jonathan\nLIgo, Jonathan Rahn, seanpwilliams, Arthur Mensch, Josh Levy, Julian Kuhlmann,\nJulien Aubert, J\u00f6rn Hees, Kai, shivamgargsya, Kat Hempstalk, Kaushik\nLakshmikanth, Kennedy, Kenneth Lyons, Kenneth Myers, Kevin Yap, Kirill Bobyrev,\nKonstantin Podshumok, Arthur Imbert, Lee Murray, toastedcornflakes, Lera, Li\nLi, Arthur Douillard, Mainak Jas, tobycheese, Manraj Singh, Manvendra Singh,\nMarc Meketon, MarcoFalke, Matthew Brett, Matthias Gilch, Mehul Ahuja, Melanie\nGoetz, Meng, Peng, Michael Dezube, Michal Baumgartner, vibrantabhi19, Artem\nGolubin, Milen Paskov, Antonin Carette, Morikko, MrMjauh, NALEPA Emmanuel,\nNamiya, Antoine Wendlinger, Narine Kokhlikyan, NarineK, Nate Guerin, Angus\nWilliams, Ang Lu, Nicole Vavrova, Nitish Pandey, Okhlopkov Daniil Olegovich,\nAndy Craze, Om Prakash, Parminder Singh, Patrick Carlson, Patrick Pei, Paul\nGanssle, Paulo Haddad, Pawe\u0142 Lorek, Peng Yu, Pete Bachant, Peter Bull, Peter\nCsizsek, Peter Wang, Pieter Arthur de Jong, Ping-Yao, Chang, Preston Parry,\nPuneet Mathur, Quentin Hibon, Andrew Smith, Andrew Jackson, 1kastner, Rameshwar\nBhaskaran, Rebecca Bilbro, Remi Rampin, Andrea Esuli, Rob Hall, Robert\nBradshaw, Romain Brault, Aman Pratik, Ruifeng Zheng, Russell Smith, Sachin\nAgarwal, Sailesh Choyal, Samson Tan, Samu\u00ebl Weber, Sarah Brown, Sebastian\nP\u00f6lsterl, Sebastian Raschka, Sebastian Saeger, Alyssa Batula, Abhyuday Pratap\nSingh, Sergey Feldman, Sergul Aydore, Sharan Yalburgi, willduan, Siddharth\nGupta, Sri Krishna, Almer, Stijn Tonk, Allen Riddell, Theofilos Papapanagiotou,\nAlison, Alexis Mignon, Tommy Boucher, Tommy L\u00f6fstedt, Toshihiro Kamishima,\nTyler Folkman, Tyler Lanigan, Alexander Junge, Varun Shenoy, Victor Poughon,\nVilhelm von Ehrenheim, Aleksandr Sandrovskii, Alan Yee, Vlasios Vasileiou,\nWarut Vijitbenjaronk, Yang Zhang, Yaroslav Halchenko, Yichuan Liu, Yuichi\nFujikawa, affanv14, aivision2020, 
xor, andreh7, brady salz, campustrampus,\nAgamemnon Krasoulis, ditenberg, elena-sharova, filipj8, fukatani, gedeck,\nguiniol, guoci, hakaa1, hongkahjun, i-am-xhy, jakirkham, jaroslaw-weber,\njayzed82, jeroko, jmontoyam, jonathan.striebel, josephsalmon, jschendel,\nleereeves, martin-hahn, mathurinm, mehak-sachdeva, mlewis1729, mlliou112,\nmthorrell, ndingwall, nuffe, yangarbiter, plagree, pldtc325, Breno Freitas,\nBrett Olsen, Brian A. Alfano, Brian Burns, polmauri, Brandon Carter, Charlton\nAustin, Chayant T15h, Chinmaya Pancholi, Christian Danielsen, Chung Yen,\nChyi-Kwei Yau, pravarmahajan, DOHMATOB Elvis, Daniel LeJeune, Daniel Hnyk,\nDarius Morawiec, David DeTomaso, David Gasquez, David Haberth\u00fcr, David\nHeryanto, David Kirkby, David Nicholson, rashchedrin, Deborah Gertrude Digges,\nDenis Engemann, Devansh D, Dickson, Bob Baxley, Don86, E. Lynch-Klarup, Ed\nRogers, Elizabeth Ferriss, Ellen-Co2, Fabian Egli, Fang-Chieh Chou, Bing Tian\nDai, Greg Stupp, Grzegorz Szpak, Bertrand Thirion, Hadrien Bertrand, Harizo\nRajaona, zxcvbnius, Henry Lin, Holger Peters, Icyblade Dai, Igor\nAndriushchenko, Ilya, Isaac Laughlin, Iv\u00e1n Vall\u00e9s, Aur\u00e9lien Bellet, JPFrancoia,\nJacob Schreiber, Asish Mahapatra\n"
    },
    {
      "turn_idx": 20,
      "action": "tavily_search({\"query\": \"site:github.com scikit-learn 9239 BaseLabelPropagation\"})",
      "explanation": "",
      "observation": "{'query': 'scikit-learn 9239 BaseLabelPropagation', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'title': 'Improve docs of BaseLabelPropagation.transduction_ \u00b7 scikit ...', 'url': 'https://github.com/scikit-learn/scikit-learn/actions/runs/3498417168/usage', 'content': 'scikit-learn : machine learning in Python. Contribute to scikit-learn / scikit-learn development by creating an account on GitHub.', 'score': 0.629889, 'raw_content': None}, {'title': 'scikit-learn.github.io/0.16/modules/label_propagation.html at ...', 'url': 'https://github.com/scikit-learn/scikit-learn.github.io/blob/main/0.16/modules/label_propagation.html', 'content': 'Scikit-learn website hosted by github. Contribute to scikit-learn / scikit-learn .github.io development by creating an account on GitHub.', 'score': 0.59401023, 'raw_content': None}, {'title': 'FIX nan bug in BaseLabelPropagation by ThuWangzw \u00b7 Pull Request...', 'url': 'https://github.com/scikit-learn/scikit-learn/pull/19271', 'content': 'rth merged 13 commits into scikit-learn:main from ThuWangzw:label\\\\_propagation\\\\_nan\\\\_bug. rth merged 13 commits into scikit-learn:main from ThuWangzw:label\\\\_propagation\\\\_nan\\\\_bug. `Bug fix: Label propagation sometimes produces label_distributions tha\u2026`. github-actions bot added the  module:semi\\\\_supervised  label Jan 25, 2021. `add label propagation nan test`. 
This can be done with something like:  ``` @pytest.mark.parametrize(\"label_propagation_class\", pytest mark parametrize \"label_propagation_class\" [label_propagation.LabelSpreading, label_propagation LabelSpreading label_propagation.LabelPropagation]) label_propagation LabelPropagation def test_label_propagation_non_zero_normalizer label_propagation_class # check that we don\\'t divide by zero in case of null normalizer# non-regression test for# https://github.com/scikit-learn/scikit-learn/pull/15946# https://github.com/scikit-learn/scikit-learn/pull/19271 X = np array100.100.100.100.0.0.0.0. sklearn/semi\\\\_supervised/tests/test\\\\_label\\\\_propagation.py  Outdated   Show resolved Hide resolved. sklearn/semi\\\\_supervised/tests/test\\\\_label\\\\_propagation.py  Outdated   Show resolved Hide resolved. sklearn/semi\\\\_supervised/tests/test\\\\_label\\\\_propagation.py  Outdated   Show resolved Hide resolved. sklearn/semi\\\\_supervised/tests/test\\\\_label\\\\_propagation.py  Outdated   Show resolved Hide resolved. rth changed the title ~~Bug fix: nan bug in BaseLabelPropagation~~ FIX nan bug in BaseLabelPropagation Feb 1, 2021. Suggestions cannot be applied while the pull request is closed. Suggestions cannot be applied while viewing a subset of changes. Suggestions cannot be applied on multi-line comments. 
Suggestions cannot be applied while the pull request is queued to merge.', 'score': 0.5240786, 'raw_content': None}, {'title': '[MRG+1] Fix semi_supervised by musically-ut \u00b7 Pull Request # 9239 ...', 'url': 'https://github.com/scikit-learn/scikit-learn/pull/9239', 'content': '------------------------------ You can view, comment on, or merge this pull request online at: #9239 Commit Summary - Files for my dev environment with Docker - Fixing label clamping (alpha=0 for hard clamping) - Deprecating alpha, fixing its value to zero - Correct way to deprecate alpha for LabelPropagation - Detailed info for LabelSpreading\\'s alpha parameter - Minor changes in the deprecation message - Improving \"deprecated\" doc string and raising DeprecationWarning - Using a local \"alpha\" in \"fit\" to deprecate LabelPropagation\\'s alpha - Removal of my development files - Using sphinx\\'s \"deprecated\" tag (jnothman\\'s suggestion) - Deprecation warning: stating that the alpha\\'s value will be ignored - Use \\\\_\\\\_init\\\\_\\\\_ with alpha=None - Merge branch \\'master\\' into lpalpha - Update what\\'s new - Merge pull request #2 from jnothman/lpalpha - Merge branch \\'master\\' into issue-5774 - Try fix RuntimeWarning in test\\\\_alpha\\\\_deprecation - DOC Indent deprecation details - DOC wording - Update docs - Change to the one true implementation.', 'score': 0.50490767, 'raw_content': None}, {'title': 'scikit-learn/examples/semi_supervised/plot_label_propagation ...', 'url': 'https://github.com/scikit-learn/scikit-learn/blob/main/examples/semi_supervised/plot_label_propagation_structure.py', 'content': 'scikit-learn : machine learning in Python. Contribute to scikit-learn / scikit-learn development by creating an account on GitHub.', 'score': 0.47913864, 'raw_content': None}], 'response_time': 4.02, 'request_id': 'a7ad79c8-cb7e-40fb-bb77-421bc5e81965'}"
    },
    {
      "turn_idx": 21,
      "action": "file_saver({\"file_path\": \"work_space/gaia_d0633230_20251221_193229_869271/info_scikit-learn July 2017 changelog_tavily.md\", \"content\": \"Title: Information from scikit-learn July 2017 changelog via Tavily\\n\\nSources:\\n- https://scikit-learn.org/stable/whats_new/v0.19.html\\n- https://scikit-learn.org/0.19/_sources/whats_new.rst.txt\\n- https://scikit-learn.org/1.3/_sources/whats_new/v0.19.rst.txt\\n- https://github.com/scikit-learn/scikit-learn/pull/9239\\n- https://scikit-learn.org/0.19/_downloads/scikit-learn-docs.pdf\\n- https://www.scribd.com/document/773770354/scikit-learn-docs-1\\n\\nExtracted Content:\\n- From scikit-learn v0.19 stable whats_new (Version 0.19 \u2014 scikit-learn 1.8.0 documentation): Under Bug fixes \u2192 Other predictors: \\\"Fix semi_supervised.BaseLabelPropagation to correctly implement LabelPropagation and LabelSpreading as done in the referenced papers. #9239 by Andre Ambrosio Boechat, Utkarsh Upadhyay, and Joel Nothman.\\\"\\n\\n- From scikit-learn 0.19 _sources/whats_new.rst.txt: Under Bug fixes \u2192 Other predictors: \\\"Fix :class:`semi_supervised.BaseLabelPropagation` to correctly implement ``LabelPropagation`` and ``LabelSpreading`` as done in the referenced papers. :issue:`9239` by :user:`Andre Ambrosio Boechat`, :user:`Utkarsh Upadhyay`, and `Joel Nothman`_.\\\"\\n\\n- From scikit-learn 1.3 _sources/whats_new/v0.19.rst.txt (archived source): Under Bug fixes \u2192 Other predictors: \\\"Fix :class:`semi_supervised.BaseLabelPropagation` to correctly implement ``LabelPropagation`` and ``LabelSpreading`` as done in the referenced papers. :issue:`9239` by :user:`Andre Ambrosio Boechat`, :user:`Utkarsh Upadhyay`, and `Joel Nothman`_. Also in API changes summary \u2192 Other predictors: \\\"The alpha parameter of :class:`semi_supervised.LabelPropagation` now has no effect and is deprecated to be removed in 0.21. 
:issue:`9239`.\\\"\\n\\n- From GitHub Pull Request #9239 (scikit-learn/scikit-learn): Title \\\"[MRG+1] Fix semi_supervised ...\\\": PR details mention fixing label clamping, deprecating alpha for LabelPropagation, updating what's new, and references to correctly implementing LabelPropagation and LabelSpreading per papers.\\n\\n- From scikit-learn 0.19 docs PDF and Scribd mirrors: In the \\\"Other predictors\\\" bug fixes section: \\\"Fix semi_supervised.BaseLabelPropagation to correctly implement LabelPropagation and LabelSpreading as done in the referenced papers. #9239 ...\\\" (snippet consistent with the web sources).\\n\\nDirect Quotations:\\n- \\\"Fix semi_supervised.BaseLabelPropagation to correctly implement LabelPropagation and LabelSpreading as done in the referenced papers. #9239 by Andre Ambrosio Boechat, Utkarsh Upadhyay, and Joel Nothman.\\\" \u2014 https://scikit-learn.org/stable/whats_new/v0.19.html\\n- \\\"Fix :class:`semi_supervised.BaseLabelPropagation` to correctly implement ``LabelPropagation`` and ``LabelSpreading`` as done in the referenced papers. :issue:`9239`\\\" \u2014 https://scikit-learn.org/0.19/_sources/whats_new.rst.txt\\n- \\\"The alpha parameter of :class:`semi_supervised.LabelPropagation` now has no effect and is deprecated to be removed in 0.21. :issue:`9239`\\\" \u2014 https://scikit-learn.org/1.3/_sources/whats_new/v0.19.rst.txt\\n\", \"mode\": \"w\"})",
      "explanation": "",
      "observation": "Content successfully saved to work_space/gaia_d0633230_20251221_193229_869271/info_scikit-learn July 2017 changelog_tavily.md"
    },
    {
      "turn_idx": 22,
      "action": "mark_step({\"step_index\": 3, \"step_status\": \"completed\", \"step_notes\": \"Cross-verified the identified command name using multiple authoritative sources. The scikit-learn v0.19 (August 12, 2017) changelog under Bug fixes \u2192 Other predictors explicitly states: \\\"Fix semi_supervised.BaseLabelPropagation to correctly implement LabelPropagation and LabelSpreading as done in the referenced papers. #9239 ...\\\". Secondary confirmation via the 0.19 _sources RST text, the 1.3 archived sources, and the GitHub PR #9239 corroborate this. Consolidated extraction saved at: work_space/gaia_d0633230_20251221_193229_869271/info_scikit-learn July 2017 changelog_tavily.md. Ready to finalize with the required name only.\"})",
      "explanation": "",
      "observation": "Step 3: step_status is completed, step_notes is Cross-verified the identified command name using multiple authoritative sources. The scikit-learn v0.19 (August 12, 2017) changelog under Bug fixes \u2192 Other predictors explicitly states: \"Fix semi_supervised.BaseLabelPropagation to correctly implement LabelPropagation and LabelSpreading as done in the referenced papers. #9239 ...\". Secondary confirmation via the 0.19 _sources RST text, the 1.3 archived sources, and the GitHub PR #9239 corroborate this. Consolidated extraction saved at: work_space/gaia_d0633230_20251221_193229_869271/info_scikit-learn July 2017 changelog_tavily.md. Ready to finalize with the required name only. "
    },
    {
      "turn_idx": 23,
      "action": "file_saver({\"content\": \"BaseLabelPropagation\", \"file_path\": \"work_space/gaia_d0633230_20251221_193229_869271/final_step4_answer.txt\", \"mode\": \"w\"})",
      "explanation": "Composed the final answer according to the required minimal format and template. Verified the command name from Step2 and Step3 evidence. Output saved for record: work_space/gaia_d0633230_20251221_193229_869271/final_step4_answer.txt",
      "observation": "Content successfully saved to work_space/gaia_d0633230_20251221_193229_869271/final_step4_answer.txt"
    },
    {
      "turn_idx": 24,
      "action": "mark_step({\"step_index\": 4, \"step_status\": \"completed\", \"step_notes\": \"Composed the final answer according to the required minimal format and template. Verified the command name from Step2 and Step3 evidence. Output saved for record: work_space/gaia_d0633230_20251221_193229_869271/final_step4_answer.txt\"})",
      "explanation": "",
      "observation": "Step 4: step_status is completed, step_notes is Composed the final answer according to the required minimal format and template. Verified the command name from Step2 and Step3 evidence. Output saved for record: work_space/gaia_d0633230_20251221_193229_869271/final_step4_answer.txt "
    },
    {
      "turn_idx": 25,
      "action": "FINAL_ANSWER: BaseLabelPropagation",
      "explanation": "Overview:\n- The task was completed successfully. The target item from the Scikit-Learn July 2017 changelog was identified and formatted per the strict output rules.\n\nWhat was done:\n- Parsed the question and established a compliance checklist to ensure the final answer met the formatting constraints (minimal words, no path, template ending).\n- Retrieved the official Scikit-Learn 0.19 release notes/changelog covering July 2017 content from primary documentation sources.\n- Located the Bug fixes \u2192 Other predictors section and extracted the specific entry mentioning the bug fix.\n- Cross-verified the command name via secondary sources, including GitHub PR references and archived RST documentation.\n- Composed a final response adhering to the minimal string, \u201cname only, not a path,\u201d requirement and the mandated final answer template.\n\nOutcome:\n- Correctly identified the \u201cother predictor base command\u201d that received a bug fix as BaseLabelPropagation and prepared the final answer in the required format.\n\nKey success factors:\n- Clear compliance checklist for formatting rules.\n- Use of authoritative primary sources (official release notes) and corroborating secondary sources (GitHub PR, archived docs).\n- Precise extraction of the class name without path, matching the question\u2019s constraint.\n- Verification step to ensure accuracy before finalizing.\n\nThoughts:\n- The \u201cOther predictors\u201d bug fix explicitly references semi_supervised.BaseLabelPropagation; since only the name was requested (not a path), the correct minimal answer is BaseLabelPropagation.\n\nFINAL ANSWER: BaseLabelPropagation",
      "observation": ""
    }
  ],
  "source": "gaia_level1",
  "ground_truth": "BaseLabelPropagation",
  "predicted": "BaseLabelPropagation",
  "qa_pairs": [
    {
      "question": "Which turn caused the agent to locate the specific bug fix entry under 'Other predictors'?",
      "answer": "Turn 10",
      "type": "A"
    },
    {
      "question": "If the agent had searched for 'July 2017' instead of 'Version 0.19', what critical version mapping would have been missed?",
      "answer": "That the July 2017 changelog corresponds to Version 0.19 released August 12, 2017",
      "type": "A"
    },
    {
      "question": "At turn 15, what specific verification query did the agent use to cross-check the bug fix?",
      "answer": "scikit-learn July 2017 changelog other predictors bug fix BaseLabelPropagation",
      "type": "A"
    },
    {
      "question": "Before extracting the bug fix name at turn 10, what essential document did the agent need to have retrieved?",
      "answer": "The scikit-learn Version 0.19 changelog with the Bug fixes section",
      "type": "B"
    },
    {
      "question": "What precondition was necessary before the agent could identify BaseLabelPropagation?",
      "answer": "Access to the 'Bug fixes, Other predictors' section of the Version 0.19 changelog",
      "type": "B"
    },
    {
      "question": "After finding the bug fix entry, what verification step was needed before providing the final answer?",
      "answer": "Confirming that 'BaseLabelPropagation' was the command name (not a path) and met output format requirements",
      "type": "B"
    },
    {
      "question": "At what turn did the agent complete the cross-verification of the command name?",
      "answer": "Turn 22",
      "type": "C"
    },
    {
      "question": "What is the last file saved and turn index before the final answer?",
      "answer": "work_space/gaia_d0633230_20251221_193229_869271/final_step4_answer.txt at turn 23",
      "type": "C"
    },
    {
      "question": "What was the exact release date of scikit-learn Version 0.19?",
      "answer": "August 12, 2017",
      "type": "C"
    },
    {
      "question": "What two related classes were listed under 'Changed models' that benefited from this bug fix?",
      "answer": "LabelPropagation and LabelSpreading",
      "type": "C"
    },
    {
      "question": "What were the key observations between turn 2 and turn 5 that allowed the agent to narrow the search from a general timeline to specific version documentation?",
      "answer": "Search results in turn 2 and turn 4 explicitly linked the July 2017 release to version 0.19.0. This mapping was the turning point that allowed the agent to move away from broad date-based queries to a targeted fetch of the 0.19 changelog URL.",
      "type": "D"
    },
    {
      "question": "Analyze why the initial search queries in turns 2 through 4 were insufficient for a final answer, necessitating the manual fetch in turn 5.",
      "answer": "The initial search snippets provided high-level summaries but failed to expose the granular 'Other predictors' subsection. Because the specific bug fix details were buried deep within the changelog, a full-page fetch of the HTML/RST source was required to see the actual class names.",
      "type": "D"
    }
  ]
}