{
    "Selected_candidate": {
        "pr_number": 11043,
        "pr_title": "[MRG+2] ENH Passthrough DataFrame in FunctionTransformer",
        "pr_body": "#### Reference Issues/PRs\r\n\r\ncloses #10655\r\n\r\n#### What does this implement/fix? Explain your changes.\r\n\r\nAdded the following changes:\r\n\r\n- [x] Raise FutureWarning that `validate=False` will be the default in the future.\r\n- [x] Convert list to array when `validate=False`.\r\n- [x] Make a what's new entry for the change of behaviour.\r\n\r\n#### Any other comments?\r\n",
        "issue_id": 10655,
        "issue_title": "FunctionTransformer should not convert DataFrames to arrays by default",
        "issue_body": "I would expect a common use of FunctionTransformer to be applying some function to a Pandas DataFrame, ideally using its own methods or accessors. As noted in #10648, users can easily miss that they need to set validate=False to pass a DataFrame through without converting it to a NumPy array. I think it would be more user-friendly to have `validate='array-or-frame'` by default, which would pass DataFrames through to the function unchanged but otherwise convert its input to a 2d array. For strict backwards compatibility, the default should be changed through a deprecation cycle, warning whenever using the default validation means a DataFrame is currently converted to an array.\r\n\r\nDo others agree?",
        "issue_closed_at": "2018-07-10T09:48:04Z",
        "base_commit": "19bc7e8af6ec3468b6c7f4718a31cd5f508528cd",
        "changes": [
            {
                "file": "sklearn/preprocessing/_function_transformer.py",
                "type": "class",
                "name": "FunctionTransformer",
                "code": "class FunctionTransformer(BaseEstimator, TransformerMixin):\n    \"\"\"Constructs a transformer from an arbitrary callable.\n\n    A FunctionTransformer forwards its X (and optionally y) arguments to a\n    user-defined function or function object and returns the result of this\n    function. This is useful for stateless transformations such as taking the\n    log of frequencies, doing custom scaling, etc.\n\n    Note: If a lambda is used as the function, then the resulting\n    transformer will not be pickleable.\n\n    .. versionadded:: 0.17\n\n    Read more in the :ref:`User Guide <function_transformer>`.\n\n    Parameters\n    ----------\n    func : callable, optional default=None\n        The callable to use for the transformation. This will be passed\n        the same arguments as transform, with args and kwargs forwarded.\n        If func is None, then func will be the identity function.\n\n    inverse_func : callable, optional default=None\n        The callable to use for the inverse transformation. This will be\n        passed the same arguments as inverse transform, with args and\n        kwargs forwarded. If inverse_func is None, then inverse_func\n        will be the identity function.\n\n    validate : bool, optional default=True\n        Indicate that the input X array should be checked before calling\n        func. If validate is false, there will be no input validation.\n        If it is true, then X will be converted to a 2-dimensional NumPy\n        array or sparse matrix. If this conversion is not possible or X\n        contains NaN or infinity, an exception is raised.\n\n    accept_sparse : boolean, optional\n        Indicate that func accepts a sparse matrix as input. If validate is\n        False, this has no effect. 
Otherwise, if accept_sparse is false,\n        sparse matrix inputs will cause an exception to be raised.\n\n    pass_y : bool, optional default=False\n        Indicate that transform should forward the y argument to the\n        inner callable.\n\n        .. deprecated::0.19\n\n    check_inverse : bool, default=True\n       Whether to check that or ``func`` followed by ``inverse_func`` leads to\n       the original inputs. It can be used for a sanity check, raising a\n       warning when the condition is not fulfilled.\n\n       .. versionadded:: 0.20\n\n    kw_args : dict, optional\n        Dictionary of additional keyword arguments to pass to func.\n\n    inv_kw_args : dict, optional\n        Dictionary of additional keyword arguments to pass to inverse_func.\n\n    \"\"\"\n    def __init__(self, func=None, inverse_func=None, validate=True,\n                 accept_sparse=False, pass_y='deprecated', check_inverse=True,\n                 kw_args=None, inv_kw_args=None):\n        self.func = func\n        self.inverse_func = inverse_func\n        self.validate = validate\n        self.accept_sparse = accept_sparse\n        self.pass_y = pass_y\n        self.check_inverse = check_inverse\n        self.kw_args = kw_args\n        self.inv_kw_args = inv_kw_args\n\n    def _check_inverse_transform(self, X):\n        \"\"\"Check that func and inverse_func are the inverse.\"\"\"\n        idx_selected = slice(None, None, max(1, X.shape[0] // 100))\n        try:\n            assert_allclose_dense_sparse(\n                X[idx_selected],\n                self.inverse_transform(self.transform(X[idx_selected])))\n        except AssertionError:\n            warnings.warn(\"The provided functions are not strictly\"\n                          \" inverse of each other. 
If you are sure you\"\n                          \" want to proceed regardless, set\"\n                          \" 'check_inverse=False'.\", UserWarning)\n\n    def fit(self, X, y=None):\n        \"\"\"Fit transformer by checking X.\n\n        If ``validate`` is ``True``, ``X`` will be checked.\n\n        Parameters\n        ----------\n        X : array-like, shape (n_samples, n_features)\n            Input array.\n\n        Returns\n        -------\n        self\n        \"\"\"\n        if self.validate:\n            X = check_array(X, self.accept_sparse)\n        if (self.check_inverse and not (self.func is None or\n                                        self.inverse_func is None)):\n            self._check_inverse_transform(X)\n        return self\n\n    def transform(self, X, y='deprecated'):\n        \"\"\"Transform X using the forward function.\n\n        Parameters\n        ----------\n        X : array-like, shape (n_samples, n_features)\n            Input array.\n\n        y : (ignored)\n            .. deprecated::0.19\n\n        Returns\n        -------\n        X_out : array-like, shape (n_samples, n_features)\n            Transformed input.\n        \"\"\"\n        if not isinstance(y, string_types) or y != 'deprecated':\n            warnings.warn(\"The parameter y on transform() is \"\n                          \"deprecated since 0.19 and will be removed in 0.21\",\n                          DeprecationWarning)\n\n        return self._transform(X, y=y, func=self.func, kw_args=self.kw_args)\n\n    def inverse_transform(self, X, y='deprecated'):\n        \"\"\"Transform X using the inverse function.\n\n        Parameters\n        ----------\n        X : array-like, shape (n_samples, n_features)\n            Input array.\n\n        y : (ignored)\n            .. 
deprecated::0.19\n\n        Returns\n        -------\n        X_out : array-like, shape (n_samples, n_features)\n            Transformed input.\n        \"\"\"\n        if not isinstance(y, string_types) or y != 'deprecated':\n            warnings.warn(\"The parameter y on inverse_transform() is \"\n                          \"deprecated since 0.19 and will be removed in 0.21\",\n                          DeprecationWarning)\n        return self._transform(X, y=y, func=self.inverse_func,\n                               kw_args=self.inv_kw_args)\n\n    def _transform(self, X, y=None, func=None, kw_args=None):\n        if self.validate:\n            X = check_array(X, self.accept_sparse)\n\n        if func is None:\n            func = _identity\n\n        if (not isinstance(self.pass_y, string_types) or\n                self.pass_y != 'deprecated'):\n            # We do this to know if pass_y was set to False / True\n            pass_y = self.pass_y\n            warnings.warn(\"The parameter pass_y is deprecated since 0.19 and \"\n                          \"will be removed in 0.21\", DeprecationWarning)\n        else:\n            pass_y = False\n\n        return func(X, *((y,) if pass_y else ()),\n                    **(kw_args if kw_args else {}))"
            },
            {
                "file": "sklearn/preprocessing/_function_transformer.py",
                "type": "function",
                "name": "__init__",
                "class_name": "FunctionTransformer",
                "code": "def __init__(self, func=None, inverse_func=None, validate=True,\n                 accept_sparse=False, pass_y='deprecated', check_inverse=True,\n                 kw_args=None, inv_kw_args=None):\n        self.func = func\n        self.inverse_func = inverse_func\n        self.validate = validate\n        self.accept_sparse = accept_sparse\n        self.pass_y = pass_y\n        self.check_inverse = check_inverse\n        self.kw_args = kw_args\n        self.inv_kw_args = inv_kw_args"
            },
            {
                "file": "sklearn/preprocessing/_function_transformer.py",
                "type": "function",
                "name": "fit",
                "class_name": "FunctionTransformer",
                "code": "def fit(self, X, y=None):\n        \"\"\"Fit transformer by checking X.\n\n        If ``validate`` is ``True``, ``X`` will be checked.\n\n        Parameters\n        ----------\n        X : array-like, shape (n_samples, n_features)\n            Input array.\n\n        Returns\n        -------\n        self\n        \"\"\"\n        if self.validate:\n            X = check_array(X, self.accept_sparse)\n        if (self.check_inverse and not (self.func is None or\n                                        self.inverse_func is None)):\n            self._check_inverse_transform(X)\n        return self"
            },
            {
                "file": "sklearn/preprocessing/_function_transformer.py",
                "type": "function",
                "name": "inverse_transform",
                "class_name": "FunctionTransformer",
                "code": "def inverse_transform(self, X, y='deprecated'):\n        \"\"\"Transform X using the inverse function.\n\n        Parameters\n        ----------\n        X : array-like, shape (n_samples, n_features)\n            Input array.\n\n        y : (ignored)\n            .. deprecated::0.19\n\n        Returns\n        -------\n        X_out : array-like, shape (n_samples, n_features)\n            Transformed input.\n        \"\"\"\n        if not isinstance(y, string_types) or y != 'deprecated':\n            warnings.warn(\"The parameter y on inverse_transform() is \"\n                          \"deprecated since 0.19 and will be removed in 0.21\",\n                          DeprecationWarning)\n        return self._transform(X, y=y, func=self.inverse_func,\n                               kw_args=self.inv_kw_args)"
            }
        ]
    },
    "Justification": "Candidate E is the most relevant because it modifies `FunctionTransformer`, which directly affects how data flows through pipelines. The current bug involves a `ColumnTransformer` that fails when one of its transformer steps receives no features, which may be related to how input types such as DataFrames are handled. Candidate E changes how `FunctionTransformer` treats DataFrames (moving toward passing them through rather than converting them to arrays), which is relevant to ensuring that transformers behave correctly when outputting pandas objects. This shared context of data transformation and pipeline behavior makes Candidate E the best candidate for assistance."
}