{
  "RepoName": "https://github.com/scikit-learn-contrib/sklearn-pandas.git",
  "CommitSHA": "c9db2d6dcbf515eade751073f43318e43cae5177",
  "Time": "",
  "Difficulty": "Medium",
  "Type": "argument error",
  "BuggyCode": [
    {
      "path": "scikit-learn-contrib_sklearn-pandas/test.py",
      "content": "import pytest\nfrom unittest.mock import Mock\nimport numpy as np\nimport pandas as pd\nfrom sklearn_pandas import DataFrameMapper\nfrom sklearn.compose import make_column_selector\nfrom sklearn.preprocessing import StandardScaler\n\n\nclass GetStartWith:\n    def __init__(self, start_str):\n        self.start_str = start_str\n\n    def __call__(self, X: pd.DataFrame) -> list:\n        return [c for c in X.columns if c.startswith(self.start_str)]\n\n\ndf = pd.DataFrame({\n    'sepal length (cm)': [1.0, 2.0, 3.0],\n    'sepal width (cm)': [1.0, 2.0, 3.0],\n    'petal length (cm)': [1.0, 2.0, 3.0],\n    'petal width (cm)': [1.0, 2.0, 3.0]\n})\nt = DataFrameMapper([\n    (make_column_selector(dtype_include=float), StandardScaler(), {'alias': 'x'}),\n    (GetStartWith('petal'), None, {'alias': 'petal'})\n], df_out=True, default=False)\n\nt.fit(df)\nprint(t.transform(df).shape)\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/setup.py",
      "content": "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n\nfrom setuptools import setup\nfrom setuptools.command.test import test as TestCommand\nimport re\n\nfor line in open('sklearn_pandas/__init__.py'):\n    match = re.match(\"__version__ *= *'(.*)'\", line)\n    if match:\n        __version__, = match.groups()\n\n\nclass PyTest(TestCommand):\n    user_options = [('pytest-args=', 'a', \"Arguments to pass to py.test\")]\n\n    def initialize_options(self):\n        TestCommand.initialize_options(self)\n        self.pytest_args = []\n\n    def finalize_options(self):\n        TestCommand.finalize_options(self)\n        self.test_args = []\n        self.test_suite = True\n\n    def run(self):\n        import pytest\n        errno = pytest.main(self.pytest_args)\n        raise SystemExit(errno)\n\n\nsetup(name='sklearn-pandas',\n      version=__version__,\n      description='Pandas integration with sklearn',\n      maintainer='Ritesh Agrawal',\n      maintainer_email='ragrawal@gmail.com',\n      url='https://github.com/scikit-learn-contrib/sklearn-pandas',\n      packages=['sklearn_pandas'],\n      keywords=['scikit', 'sklearn', 'pandas'],\n      install_requires=[\n          'scikit-learn>=0.23.0',\n          'scipy>=1.5.1',\n          'pandas>=1.1.4',\n          'numpy>=1.18.1'\n      ],\n      tests_require=['pytest', 'mock'],\n      cmdclass={'test': PyTest},\n      license='MIT License'\n)\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/noxfile.py",
      "content": "import nox\n\n@nox.session\ndef lint(session):\n    session.install('pytest>=5.3.5', 'setuptools>=45.2',\n                    'wheel>=0.34.2', 'flake8>=3.7.9',\n                    'numpy==1.18.1', 'pandas==1.1.4')\n    session.install('.')\n    session.run('flake8', 'sklearn_pandas/', 'tests')\n\n@nox.session\n@nox.parametrize('numpy', ['1.18.1', '1.19.4', '1.20.1'])\n@nox.parametrize('scipy', ['1.5.4', '1.6.0'])\n@nox.parametrize('pandas', ['1.1.4', '1.2.2'])\ndef tests(session, numpy, scipy, pandas):\n    session.install('pytest>=5.3.5', \n                    'setuptools>=45.2',\n                    'wheel>=0.34.2',\n                    f'numpy=={numpy}',\n                    f'scipy=={scipy}',\n                    f'pandas=={pandas}'\n                    )\n    session.install('.')\n    session.run('py.test', 'README.rst', 'tests')\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/sklearn_pandas/features_generator.py",
      "content": "def gen_features(columns, classes=None, prefix='', suffix=''):\n    \"\"\"Generates a feature definition list which can be passed\n    into DataFrameMapper\n\n    Params:\n\n    columns     a list of column names to generate features for.\n\n    classes     a list of classes for each feature, a list of dictionaries with\n                transformer class and init parameters, or None.\n\n                If list of classes is provided, then each of them is\n                instantiated with default arguments. Example:\n\n                    classes = [StandardScaler, LabelBinarizer]\n\n                If list of dictionaries is provided, then each of them should\n                have a 'class' key with transformer class. All other keys are\n                passed into 'class' value constructor. Example:\n\n                    classes = [\n                        {'class': StandardScaler, 'with_mean': False},\n                        {'class': LabelBinarizer}\n                    }]\n\n                If None value selected, then each feature left as is.\n\n    prefix      add prefix to transformed column names\n\n    suffix      add suffix to transformed column names.\n\n    \"\"\"\n    if classes is None:\n        return [(column, None) for column in columns]\n\n    feature_defs = []\n\n    for column in columns:\n        feature_transformers = []\n\n        arguments = {}\n        if prefix and prefix != \"\":\n            arguments['prefix'] = prefix\n        if suffix and suffix != \"\":\n            arguments['suffix'] = suffix\n\n        classes = [cls for cls in classes if cls is not None]\n        if not classes:\n            feature_defs.append((column, None, arguments))\n\n        else:\n            for definition in classes:\n                if isinstance(definition, dict):\n                    params = definition.copy()\n                    klass = params.pop('class')\n                    feature_transformers.append(klass(**params))\n                else:\n                    feature_transformers.append(definition())\n\n            if not feature_transformers:\n                feature_transformers = None\n\n            feature_defs.append((column, feature_transformers, arguments))\n\n    return feature_defs\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/sklearn_pandas/transformers.py",
      "content": "import numpy as np\nimport pandas as pd\nfrom sklearn.base import TransformerMixin\nimport warnings\n\n\ndef _get_mask(X, value):\n    \"\"\"\n    Compute the boolean mask X == missing_values.\n    \"\"\"\n    if value == \"NaN\" or \\\n       value is None or \\\n       (isinstance(value, float) and np.isnan(value)):\n        return pd.isnull(X)\n    else:\n        return X == value\n\n\nclass NumericalTransformer(TransformerMixin):\n    \"\"\"\n    Provides commonly used numerical transformers.\n    \"\"\"\n    SUPPORTED_FUNCTIONS = ['log', 'log1p']\n\n    def __init__(self, func):\n        \"\"\"\n        Params\n\n        func    function to apply to input columns. The function will be\n                applied to each value. Supported functions are defined\n                in SUPPORTED_FUNCTIONS variable. Throws assertion error if the\n                not supported.\n        \"\"\"\n\n        warnings.warn(\"\"\"\n            NumericalTransformer will be deprecated in 3.0 version.\n            Please use Sklearn.base.TransformerMixin to write\n            customer transformers\n            \"\"\", DeprecationWarning)\n\n        assert func in self.SUPPORTED_FUNCTIONS, \\\n            f\"Only following func are supported: {self.SUPPORTED_FUNCTIONS}\"\n        super(NumericalTransformer, self).__init__()\n        self.__func = func\n\n    def fit(self, X, y=None):\n        return self\n\n    def transform(self, X, y=None):\n        if self.__func == 'log1p':\n            return np.vectorize(np.log1p)(X)\n        elif self.__func == 'log':\n            return np.vectorize(np.log)(X)\n\n        raise ValueError(f\"Invalid function name: {self.__func}\")\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/sklearn_pandas/__init__.py",
      "content": "__version__ = '2.2.0'\n\nimport logging\nlogger = logging.getLogger(__name__)\n\nfrom .dataframe_mapper import DataFrameMapper  # NOQA\nfrom .features_generator import gen_features  # NOQA\nfrom .transformers import NumericalTransformer # NOQA\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/sklearn_pandas/pipeline.py",
      "content": "import six\nfrom sklearn.pipeline import _name_estimators, Pipeline\nfrom sklearn.utils import tosequence\n\n\ndef _call_fit(fit_method, X, y=None, **kwargs):\n    \"\"\"\n    helper function, calls the fit or fit_transform method with the correct\n    number of parameters\n\n    fit_method: fit or fit_transform method of the transformer\n    X: the data to fit\n    y: the target vector relative to X, optional\n    kwargs: any keyword arguments to the fit method\n\n    return: the result of the fit or fit_transform method\n\n    WARNING: if this function raises a TypeError exception, test the fit\n    or fit_transform method passed to it in isolation as _call_fit will not\n    distinguish TypeError due to incorrect number of arguments from\n    other TypeError\n    \"\"\"\n    try:\n        return fit_method(X, y, **kwargs)\n    except TypeError:\n        # fit takes only one argument\n        return fit_method(X, **kwargs)\n\n\nclass TransformerPipeline(Pipeline):\n    \"\"\"\n    Pipeline that expects all steps to be transformers taking a single X\n    argument, an optional y argument, and having fit and transform methods.\n\n    Code is copied from sklearn's Pipeline\n    \"\"\"\n\n    def __init__(self, steps):\n        names, estimators = zip(*steps)\n        if len(dict(steps)) != len(steps):\n            raise ValueError(\n                \"Provided step names are not unique: %s\" % (names,))\n\n        # shallow copy of steps\n        self.steps = tosequence(steps)\n        estimator = estimators[-1]\n\n        for e in estimators:\n            if (not (hasattr(e, \"fit\") or hasattr(e, \"fit_transform\")) or not\n                    hasattr(e, \"transform\")):\n                raise TypeError(\"All steps of the chain should \"\n                                \"be transforms and implement fit and transform\"\n                                \" '%s' (type %s) doesn't)\" % (e, type(e)))\n\n        if not hasattr(estimator, \"fit\"):\n            raise TypeError(\"Last step of chain should implement fit \"\n                            \"'%s' (type %s) doesn't)\"\n                            % (estimator, type(estimator)))\n\n    def _pre_transform(self, X, y=None, **fit_params):\n        fit_params_steps = dict((step, {}) for step, _ in self.steps)\n        for pname, pval in six.iteritems(fit_params):\n            step, param = pname.split('__', 1)\n            fit_params_steps[step][param] = pval\n        Xt = X\n        for name, transform in self.steps[:-1]:\n            if hasattr(transform, \"fit_transform\"):\n                Xt = _call_fit(transform.fit_transform,\n                               Xt, y, **fit_params_steps[name])\n            else:\n                Xt = _call_fit(transform.fit,\n                               Xt, y, **fit_params_steps[name]).transform(Xt)\n        return Xt, fit_params_steps[self.steps[-1][0]]\n\n    def fit(self, X, y=None, **fit_params):\n        Xt, fit_params = self._pre_transform(X, y, **fit_params)\n        _call_fit(self.steps[-1][-1].fit, Xt, y, **fit_params)\n        return self\n\n    def fit_transform(self, X, y=None, **fit_params):\n        Xt, fit_params = self._pre_transform(X, y, **fit_params)\n        if hasattr(self.steps[-1][-1], 'fit_transform'):\n            return _call_fit(self.steps[-1][-1].fit_transform,\n                             Xt, y, **fit_params)\n        else:\n            return _call_fit(self.steps[-1][-1].fit,\n                             Xt, y, **fit_params).transform(Xt)\n\n\ndef make_transformer_pipeline(*steps):\n    \"\"\"Construct a TransformerPipeline from the given estimators.\n    \"\"\"\n    return TransformerPipeline(_name_estimators(steps))\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/sklearn_pandas/dataframe_mapper.py",
      "content": "import contextlib\nfrom datetime import datetime\nimport pandas as pd\nimport numpy as np\nfrom scipy import sparse\nfrom sklearn.base import BaseEstimator, TransformerMixin\nfrom .cross_validation import DataWrapper\nfrom .pipeline import make_transformer_pipeline, _call_fit, TransformerPipeline\nfrom . import logger\n\nstring_types = text_type = str\n\n\ndef _handle_feature(fea):\n    \"\"\"\n    Convert 1-dimensional arrays to 2-dimensional column vectors.\n    \"\"\"\n    if len(fea.shape) == 1:\n        fea = np.array([fea]).T\n\n    return fea\n\n\ndef _build_transformer(transformers):\n    if isinstance(transformers, list):\n        transformers = make_transformer_pipeline(*transformers)\n    return transformers\n\n\ndef _build_feature(columns, transformers, options={}, X=None):\n    if X is None:\n        return (columns, _build_transformer(transformers), options)\n    return (\n        columns(X) if callable(columns) else columns,\n        _build_transformer(transformers),\n        options\n    )\n\n\ndef _elapsed_secs(t1):\n    return (datetime.now()-t1).total_seconds()\n\n\ndef _get_feature_names(estimator):\n    \"\"\"\n    Attempt to extract feature names based on a given estimator\n    \"\"\"\n    if hasattr(estimator, 'classes_'):\n        return estimator.classes_\n    elif hasattr(estimator, 'get_feature_names'):\n        return estimator.get_feature_names()\n    return None\n\n\n@contextlib.contextmanager\ndef add_column_names_to_exception(column_names):\n    # Stolen from https://stackoverflow.com/a/17677938/356729\n    try:\n        yield\n    except Exception as ex:\n        if ex.args:\n            msg = u'{}: {}'.format(column_names, ex.args[0])\n        else:\n            msg = text_type(column_names)\n        ex.args = (msg,) + ex.args[1:]\n        raise\n\n\nclass DataFrameMapper(BaseEstimator, TransformerMixin):\n    \"\"\"\n    Map Pandas data frame column subsets to their own\n    sklearn transformation.\n    \"\"\"\n\n    def __init__(self, features, default=False, sparse=False, df_out=False,\n                 input_df=False, drop_cols=None):\n        \"\"\"\n        Params:\n\n        features    a list of tuples with features definitions.\n                    The first element is the pandas column selector. This can\n                    be a string (for one column) or a list of strings.\n                    The second element is an object that supports\n                    sklearn's transform interface, or a list of such objects\n                    The third element is optional and, if present, must be\n                    a dictionary with the options to apply to the\n                    transformation. Example: {'alias': 'day_of_week'}\n\n        default     default transformer to apply to the columns not\n                    explicitly selected in the mapper. If False (default),\n                    discard them. If None, pass them through untouched. Any\n                    other transformer will be applied to all the unselected\n                    columns as a whole, taken as a 2d-array.\n\n        sparse      will return sparse matrix if set True and any of the\n                    extracted features is sparse. Defaults to False.\n\n        df_out      return a pandas data frame, with each column named using\n                    the pandas column that created it (if there's only one\n                    input and output) or the input columns joined with '_'\n                    if there's multiple inputs, and the name concatenated with\n                    '_1', '_2' etc if there's multiple outputs. NB: does not\n                    work if *default* or *sparse* are true\n\n        input_df    If ``True`` pass the selected columns to the transformers\n                    as a pandas DataFrame or Series. Otherwise pass them as a\n                    numpy array. Defaults to ``False``.\n\n        drop_cols   List of columns to be dropped. Defaults to None.\n\n        \"\"\"\n        self.features = features\n        self.default = default\n        self.built_default = None\n        self.sparse = sparse\n        self.df_out = df_out\n        self.input_df = input_df\n        self.drop_cols = [] if drop_cols is None else drop_cols\n        self.transformed_names_ = []\n        if (df_out and (sparse or default)):\n            raise ValueError(\"Can not use df_out with sparse or default\")\n\n    def _build(self, X=None):\n        \"\"\"\n        Build attributes built_features and built_default.\n        \"\"\"\n        if isinstance(self.features, list):\n            self.built_features = [\n                _build_feature(*f, X=X) for f in self.features\n            ]\n        else:\n            self.built_features = _build_feature(*self.features, X=X)\n        self.built_default = _build_transformer(self.default)\n\n    @property\n    def _selected_columns(self):\n        \"\"\"\n        Return a set of selected columns in the feature list.\n        \"\"\"\n        selected_columns = set()\n        for feature in self.features:\n            columns = feature[0]\n            if isinstance(columns, list):\n                selected_columns = selected_columns.union(set(columns))\n            else:\n                selected_columns.add(columns)\n        return selected_columns\n\n    def _unselected_columns(self, X):\n        \"\"\"\n        Return list of columns present in X and not selected explicitly in the\n        mapper.\n\n        Unselected columns are returned in the order they appear in the\n        dataframe to avoid issues with different ordering during default fit\n        and transform steps.\n        \"\"\"\n        X_columns = list(X.columns)\n        return [column for column in X_columns if\n                column not in self._selected_columns\n                and column not in self.drop_cols]\n\n    def __setstate__(self, state):\n        # compatibility for older versions of sklearn-pandas\n        super().__setstate__(state)\n        self.features = [_build_feature(*feat) for feat in state['features']]\n        self.sparse = state.get('sparse', False)\n        self.default = state.get('default', False)\n        self.df_out = state.get('df_out', False)\n        self.input_df = state.get('input_df', False)\n        self.drop_cols = state.get('drop_cols', [])\n        self.built_features = state.get('built_features', self.features)\n        self.built_default = state.get('built_default', self.default)\n        self.transformed_names_ = state.get('transformed_names_', [])\n\n    def __getstate__(self):\n        state = super().__getstate__()\n        state['features'] = self.features\n        state['sparse'] = self.sparse\n        state['default'] = self.default\n        state['df_out'] = self.df_out\n        state['input_df'] = self.input_df\n        state['drop_cols'] = self.drop_cols\n        state['build_features'] = getattr(self, 'built_features', None)\n        state['built_default'] = self.built_default\n        state['transformed_names_'] = self.transformed_names_\n        return state\n\n    def _get_col_subset(self, X, cols, input_df=False):\n        \"\"\"\n        Get a subset of columns from the given table X.\n\n        X       a Pandas dataframe; the table to select columns from\n        cols    a string or list of strings representing the columns to select.\n                It can also be a callable that returns True or False, i.e.\n                compatible with the built-in filter function.\n\n        Returns a numpy array with the data from the selected columns\n        \"\"\"\n\n        if isinstance(cols, string_types):\n            return_vector = True\n            cols = [cols]\n        else:\n            return_vector = False\n\n        # Needed when using the cross-validation compatibility\n        # layer for sklearn<0.16.0.\n        # Will be dropped on sklearn-pandas 2.0.\n        if isinstance(X, list):\n            X = [x[cols] for x in X]\n            X = pd.DataFrame(X)\n\n        elif isinstance(X, DataWrapper):\n            X = X.df  # fetch underlying data\n\n        if return_vector:\n            t = X[cols[0]]\n        else:\n            t = X[cols]\n\n        # return either a DataFrame/Series or a numpy array\n        if input_df:\n            return t\n        else:\n            return t.values\n\n    def fit(self, X, y=None):\n        \"\"\"\n        Fit a transformation from the pipeline\n\n        X       the data to fit\n\n        y       the target vector relative to X, optional\n\n        \"\"\"\n        self._build(X=X)\n\n        for columns, transformers, options in self.built_features:\n            t1 = datetime.now()\n            input_df = options.get('input_df', self.input_df)\n\n            if transformers is not None:\n                with add_column_names_to_exception(columns):\n                    Xt = self._get_col_subset(X, columns, input_df)\n                    _call_fit(transformers.fit, Xt, y)\n            logger.info(f\"[FIT] {columns}: {_elapsed_secs(t1)} secs\")\n\n        # handle features not explicitly selected\n        if self.built_default:  # not False and not None\n            unsel_cols = self._unselected_columns(X)\n            with add_column_names_to_exception(unsel_cols):\n                Xt = self._get_col_subset(X, unsel_cols, self.input_df)\n                _call_fit(self.built_default.fit, Xt, y)\n        return self\n\n    def get_names(self, columns, transformer, x, alias=None, prefix='',\n                  suffix=''):\n        \"\"\"\n        Return verbose names for the transformed columns.\n\n        columns       name (or list of names) of the original column(s)\n        transformer   transformer - can be a TransformerPipeline\n        x             transformed columns (numpy.ndarray)\n        alias         base name to use for the selected columns\n        \"\"\"\n        if alias is not None:\n            name = alias\n        elif isinstance(columns, list):\n            name = '_'.join(map(str, columns))\n        else:\n            name = columns\n        num_cols = x.shape[1] if len(x.shape) > 1 else 1\n\n        output = []\n\n        if num_cols > 1:\n            # If there are as many columns as classes in the transformer,\n            # infer column names from classes names.\n\n            # If we are dealing with multiple transformers for these columns\n            # attempt to extract the names from each of them, starting from the\n            # last one\n            if isinstance(transformer, TransformerPipeline):\n                inverse_steps = transformer.steps[::-1]\n                estimators = (estimator for name, estimator in inverse_steps)\n                names_steps = (_get_feature_names(e) for e in estimators)\n                names = next((n for n in names_steps if n is not None), None)\n            # Otherwise use the only estimator present\n            else:\n                names = _get_feature_names(transformer)\n\n            if names is not None and len(names) == num_cols:\n                output = [f\"{name}_{o}\" for o in names]\n                # otherwise, return name concatenated with '_1', '_2', etc.\n            else:\n                output = [name + '_' + str(o) for o in range(num_cols)]\n        else:\n            output = [name]\n\n        if prefix == suffix == \"\":\n            return output\n\n        return ['{}{}{}'.format(prefix, x, suffix) for x in output]\n\n    def get_dtypes(self, extracted):\n        dtypes_features = [self.get_dtype(ex) for ex in extracted]\n        return [dtype for dtype_feature in dtypes_features\n                for dtype in dtype_feature]\n\n    def get_dtype(self, ex):\n        if isinstance(ex, np.ndarray) or sparse.issparse(ex):\n            return [ex.dtype] * ex.shape[1]\n        elif isinstance(ex, pd.DataFrame):\n            return list(ex.dtypes)\n        else:\n            raise TypeError(type(ex))\n\n    def _transform(self, X, y=None, do_fit=False):\n        \"\"\"\n        Transform the given data with possibility to fit in advance.\n        Avoids code duplication for implementation of transform and\n        fit_transform.\n        \"\"\"\n        if do_fit:\n            self._build(X=X)\n\n        extracted = []\n        transformed_names_ = []\n        for columns, transformers, options in self.built_features:\n            input_df = options.get('input_df', self.input_df)\n\n            # columns could be a string or list of\n            # strings; we don't care because pandas\n            # will handle either.\n            Xt = self._get_col_subset(X, columns, input_df)\n\n            if transformers is not None:\n                with add_column_names_to_exception(columns):\n                    if do_fit and hasattr(transformers, 'fit_transform'):\n                        t1 = datetime.now()\n                        Xt = _call_fit(transformers.fit_transform, Xt, y)\n                        logger.info(f\"[FIT_TRANSFORM] {columns}: {_elapsed_secs(t1)} secs\")  # NOQA\n                    else:\n                        if do_fit:\n                            t1 = datetime.now()\n                            _call_fit(transformers.fit, Xt, y)\n                            logger.info(\n                                f\"[FIT] {columns}: {_elapsed_secs(t1)} secs\")\n\n                        t1 = datetime.now()\n                        Xt = transformers.transform(Xt)\n                        logger.info(f\"[TRANSFORM] {columns}: {_elapsed_secs(t1)} secs\")  # NOQA\n\n            extracted.append(_handle_feature(Xt))\n\n            alias = options.get('alias')\n\n            prefix = options.get('prefix', '')\n            suffix = options.get('suffix', '')\n\n            transformed_names_ += self.get_names(\n                columns, transformers, Xt, alias, prefix, suffix)\n\n        # handle features not explicitly selected\n        if self.built_default is not False:\n            unsel_cols = self._unselected_columns(X)\n            Xt = self._get_col_subset(X, unsel_cols, self.input_df)\n            if self.built_default is not None:\n                with add_column_names_to_exception(unsel_cols):\n                    if do_fit and hasattr(self.built_default, 'fit_transform'):\n                        Xt = _call_fit(self.built_default.fit_transform, Xt, y)\n                    else:\n                        if do_fit:\n                            _call_fit(self.built_default.fit, Xt, y)\n                        Xt = self.built_default.transform(Xt)\n                transformed_names_ += self.get_names(\n                    unsel_cols, self.built_default, Xt)\n            else:\n                # if not applying a default transformer,\n                # keep column names unmodified\n                transformed_names_ += unsel_cols\n\n            extracted.append(_handle_feature(Xt))\n\n        self.transformed_names_ = transformed_names_\n\n        # combine the feature outputs into one array.\n        # at this point we lose track of which features\n        # were created from which input columns, so it's\n        # assumed that that doesn't matter to the model.\n\n        # If any of the extracted features is sparse, combine sparsely.\n        # Otherwise, combine as normal arrays.\n        if any(sparse.issparse(fea) for fea in extracted):\n            stacked = sparse.hstack(extracted).tocsr()\n            # return a sparse matrix only if the mapper was initialized\n            # with sparse=True\n            if not self.sparse:\n                stacked = stacked.toarray()\n        else:\n            stacked = np.hstack(extracted)\n\n        if self.df_out:\n            # if no rows were dropped preserve the original index,\n            # otherwise use a new integer one\n            no_rows_dropped = len(X) == len(stacked)\n            if no_rows_dropped:\n                index = X.index\n            else:\n                index = None\n\n            # output different data types, if appropriate\n            dtypes = self.get_dtypes(extracted)\n            df_out = pd.DataFrame(\n                stacked,\n                columns=self.transformed_names_,\n                index=index)\n            # preserve types\n            for col, dtype in zip(self.transformed_names_, dtypes):\n                df_out[col] = df_out[col].astype(dtype)\n            return df_out\n        else:\n            return stacked\n\n    def transform(self, X):\n        \"\"\"\n        Transform the given data. Assumes that fit has already been called.\n\n        X       the data to transform\n        \"\"\"\n        return self._transform(X)\n\n    def fit_transform(self, X, y=None):\n        \"\"\"\n        Fit a transformation from the pipeline and directly apply\n        it to the given data.\n\n        X       the data to fit\n\n        y       the target vector relative to X, optional\n        \"\"\"\n        return self._transform(X, y, True)\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/sklearn_pandas/cross_validation.py",
      "content": "class DataWrapper(object):\n\n    def __init__(self, df):\n        self.df = df\n\n    def __len__(self):\n        return len(self.df)\n\n    def __getitem__(self, key):\n        return self.df.iloc[key]\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/tests/test_pipeline.py",
      "content": "import pytest\nfrom sklearn_pandas.pipeline import TransformerPipeline, _call_fit\n\n# In py3, mock is included with the unittest standard library\n# In py2, it's a separate package\ntry:\n    from unittest.mock import patch\nexcept ImportError:\n    from mock import patch\n\n\nclass NoTransformT(object):\n    \"\"\"Transformer without transform method.\n    \"\"\"\n    def fit(self, x):\n        return self\n\n\nclass NoFitT(object):\n    \"\"\"Transformer without fit method.\n    \"\"\"\n    def transform(self, x):\n        return self\n\n\nclass Trans(object):\n    \"\"\"\n    Transformer with fit and transform methods\n    \"\"\"\n    def fit(self, x, y=None):\n        return self\n\n    def transform(self, x):\n        return self\n\n\ndef func_x_y(x, y, kwarg='kwarg'):\n    \"\"\"\n    Function with required x and y arguments\n    \"\"\"\n    return\n\n\ndef func_x(x, kwarg='kwarg'):\n    \"\"\"\n    Function with required x argument\n    \"\"\"\n    return\n\n\ndef func_raise_type_err(x, y, kwarg='kwarg'):\n    \"\"\"\n    Function with required x and y arguments,\n    raises TypeError\n    \"\"\"\n    raise TypeError\n\n\ndef test_all_steps_fit_transform():\n    \"\"\"\n    All steps must implement fit and transform. Otherwise, raise TypeError.\n    \"\"\"\n    with pytest.raises(TypeError):\n        TransformerPipeline([('svc', NoTransformT())])\n\n    with pytest.raises(TypeError):\n        TransformerPipeline([('svc', NoFitT())])\n\n\n@patch.object(Trans, 'fit', side_effect=func_x_y)\ndef test_called_with_x_and_y(mock_fit):\n    \"\"\"\n    Fit method with required X and y arguments is called with both and with\n    any additional keywords\n    \"\"\"\n    _call_fit(Trans().fit, 'X', 'y', kwarg='kwarg')\n    mock_fit.assert_called_with('X', 'y', kwarg='kwarg')\n\n\n@patch.object(Trans, 'fit', side_effect=func_x)\ndef test_called_with_x(mock_fit):\n    \"\"\"\n    Fit method with a required X arguments is called with it and with\n    any additional keywords\n    \"\"\"\n    _call_fit(Trans().fit, 'X', 'y', kwarg='kwarg')\n    mock_fit.assert_called_with('X', kwarg='kwarg')\n\n    _call_fit(Trans().fit, 'X', kwarg='kwarg')\n    mock_fit.assert_called_with('X', kwarg='kwarg')\n\n\n@patch.object(Trans, 'fit', side_effect=func_raise_type_err)\ndef test_raises_type_error(mock_fit):\n    \"\"\"\n    If a fit method with required X and y arguments raises a TypeError, it's\n    re-raised (for a different reason) when it's called with one argument\n    \"\"\"\n    with pytest.raises(TypeError):\n        _call_fit(Trans().fit, 'X', 'y', kwarg='kwarg')\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/tests/test_transformers.py",
      "content": "import tempfile\nimport pytest\nimport numpy as np\nfrom pandas import DataFrame\nimport joblib\n\nfrom sklearn_pandas import DataFrameMapper\nfrom sklearn_pandas import NumericalTransformer\n\n\n@pytest.fixture\ndef simple_dataset():\n    return DataFrame({\n        'feat1': [1, 2, 1, 3, 1],\n        'feat2': [1, 2, 2, 2, 3],\n        'feat3': [1, 2, 3, 4, 5],\n    })\n\n\ndef test_common_numerical_transformer(simple_dataset):\n    \"\"\"\n    Test log transformation\n    \"\"\"\n    transfomer = DataFrameMapper([\n        ('feat1', NumericalTransformer('log'))\n    ], df_out=True)\n    df = simple_dataset\n    outDF = transfomer.fit_transform(df)\n    assert list(outDF.columns) == ['feat1']\n    assert np.array_equal(df['feat1'].apply(np.log).values, outDF.feat1.values)\n\n\ndef test_numerical_transformer_serialization(simple_dataset):\n    \"\"\"\n    Test if you can serialize transformer\n    \"\"\"\n    transfomer = DataFrameMapper([\n        ('feat1', NumericalTransformer('log'))\n    ])\n\n    df = simple_dataset\n    transfomer.fit(df)\n    f = tempfile.NamedTemporaryFile(delete=True)\n    joblib.dump(transfomer, f.name)\n    transfomer2 = joblib.load(f.name)\n    np.array_equal(transfomer.transform(df), transfomer2.transform(df))\n    f.close()\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py",
      "content": "# -*- coding: utf8 -*-\n\nimport pytest\nfrom unittest.mock import Mock\nfrom pandas import DataFrame\nimport pandas as pd\nfrom scipy import sparse\nfrom sklearn.datasets import load_iris\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.model_selection import cross_val_score\nfrom sklearn.svm import SVC\nfrom sklearn.feature_extraction.text import CountVectorizer\nfrom sklearn.feature_extraction import DictVectorizer\nfrom sklearn.preprocessing import (\n    StandardScaler, OneHotEncoder, LabelBinarizer)\nfrom sklearn.impute import SimpleImputer as Imputer\nfrom sklearn.feature_selection import SelectKBest, chi2\nfrom sklearn.base import BaseEstimator, TransformerMixin\nimport sklearn.decomposition\nimport numpy as np\nfrom numpy.testing import assert_array_equal\nimport pickle\nfrom sklearn.compose import make_column_selector\n\nfrom sklearn_pandas import DataFrameMapper\nfrom sklearn_pandas.dataframe_mapper import _handle_feature, _build_transformer\nfrom sklearn_pandas.pipeline import TransformerPipeline\n\n\nclass MockXTransformer(object):\n    \"\"\"\n    Mock transformer that accepts no y argument.\n    \"\"\"\n    def fit(self, X):\n        return self\n\n    def transform(self, X):\n        return X\n\n\nclass MockTClassifier(object):\n    \"\"\"\n    Mock transformer/classifier.\n    \"\"\"\n    def fit(self, X, y=None):\n        return self\n\n    def transform(self, X):\n        return X\n\n    def predict(self, X):\n        return True\n\n\nclass DateEncoder():\n    def fit(self, X, y=None):\n        return self\n\n    def transform(self, X):\n        dt = X.dt\n        return pd.concat([dt.year, dt.month, dt.day], axis=1)\n\n\nclass ToSparseTransformer(BaseEstimator, TransformerMixin):\n    \"\"\"\n    Transforms numpy matrix to sparse format.\n    \"\"\"\n    def fit(self, X):\n        return self\n\n    def transform(self, X):\n        return sparse.csr_matrix(X)\n\n\nclass CustomTransformer(BaseEstimator, TransformerMixin):\n    \"\"\"\n    Example of transformer in which the number of classes\n    is not equals to the number of output columns.\n    \"\"\"\n    def fit(self, X, y=None):\n        self.min = X.min()\n        self.classes_ = np.unique(X)\n        return self\n\n    def transform(self, X):\n        classes = np.unique(X)\n        if len(np.setdiff1d(classes, self.classes_)) > 0:\n            raise ValueError('Unknown values found.')\n        return X - self.min\n\n\nclass MockImageTransformer(BaseEstimator, TransformerMixin):\n    \"\"\"\n    Example transformer that takes the max of a 2d vector\n    then scales the result.\n    \"\"\"\n    def __init__(self, multiplier=10.0):\n        self.multiplier = multiplier\n\n    def fit(self, X, y=None):\n        return self\n\n    def transform(self, X):\n        assert isinstance(X, pd.DataFrame)\n        for col in X.columns:\n            X[col] = X[col].map(lambda img: np.max(img))\n        return X * self.multiplier\n\n\n@pytest.fixture\ndef simple_dataframe():\n    return pd.DataFrame({'a': [1, 2, 3]})\n\n\n@pytest.fixture\ndef complex_dataframe():\n    return pd.DataFrame({'target': ['a', 'a', 'b', 'b', 'c', 'c'],\n                         'feat1': [1, 2, 3, 4, 5, 6],\n                         'feat2': [1, 2, 3, 2, 3, 4]})\n\n\n@pytest.fixture\ndef complex_object_dataframe():\n    return pd.DataFrame({'target': ['a', 'a', 'b', 'b', 'c', 'c'],\n                         'feat1': [1, 2, 3, 4, 5, 6],\n                         'feat2': [1, 2, 3, 2, 3, 4],\n                         'img2d': [1*np.eye(2), 2*np.eye(2), 3*np.eye(2),\n                                   4*np.eye(2), 5*np.eye(2), 6*np.eye(2)]})\n\n\n@pytest.fixture\ndef multiindex_dataframe():\n    \"\"\"Example MultiIndex DataFrame, taken from pandas documentation\n    \"\"\"\n    iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]\n    index = pd.MultiIndex.from_product(iterables, names=['first', 'second'])\n    df = pd.DataFrame(np.random.randn(10, 8), columns=index)\n    return df\n\n\n@pytest.fixture\ndef multiindex_dataframe_incomplete(multiindex_dataframe):\n    \"\"\"Example MultiIndex DataFrame with missing entries\n    \"\"\"\n    df = multiindex_dataframe\n    mask_array = np.zeros(df.size)\n    mask_array[:20] = 1\n    np.random.shuffle(mask_array)\n    mask = mask_array.reshape(df.shape).astype(bool)\n    df.mask(mask, inplace=True)\n    return df\n\n\ndef test_transformed_names_simple(simple_dataframe):\n    \"\"\"\n    Get transformed names of features in `transformed_names` attribute\n    for simple transformation\n    \"\"\"\n    df = simple_dataframe\n    mapper = DataFrameMapper([('a', None)])\n    mapper.fit_transform(df)\n    assert mapper.transformed_names_ == ['a']\n\n\ndef test_transformed_names_binarizer(complex_dataframe):\n    \"\"\"\n    Get transformed names of features in `transformed_names` attribute\n    for a transformation that multiplies the number of columns\n    \"\"\"\n    df = complex_dataframe\n    mapper = DataFrameMapper([('target', LabelBinarizer())])\n    mapper.fit_transform(df)\n    assert mapper.transformed_names_ == ['target_a', 'target_b', 'target_c']\n\n\ndef test_logging(caplog, complex_dataframe):\n    \"\"\"\n    Get transformed names of features in `transformed_names` attribute\n    for a transformation that multiplies the number of columns\n    \"\"\"\n    import logging\n    logger = logging.getLogger('sklearn_pandas')\n    logger.setLevel(logging.INFO)\n    df = complex_dataframe\n    mapper = DataFrameMapper([('target', LabelBinarizer())])\n    mapper.fit_transform(df)\n    assert '[FIT_TRANSFORM] target:' in caplog.text\n\n\ndef test_transformed_names_binarizer_unicode():\n    df = pd.DataFrame({'target': [u'ñ', u'á', u'é']})\n    mapper = DataFrameMapper([('target', LabelBinarizer())])\n    mapper.fit_transform(df)\n    expected_names = {u'target_ñ', u'target_á', u'target_é'}\n    assert set(mapper.transformed_names_) == expected_names\n\n\ndef test_transformed_names_transformers_list(complex_dataframe):\n    \"\"\"\n    When using a list of transformers, use them in inverse order to get the\n    transformed names\n    \"\"\"\n    df = complex_dataframe\n    mapper = DataFrameMapper([\n        ('target', [LabelBinarizer(), MockXTransformer()])\n    ])\n    mapper.fit_transform(df)\n    assert mapper.transformed_names_ == ['target_a', 'target_b', 'target_c']\n\n\ndef test_transformed_names_simple_alias(simple_dataframe):\n    \"\"\"\n    If we specify an alias for a single output column, it is used for the\n    output\n    \"\"\"\n    df = simple_dataframe\n    mapper = DataFrameMapper([('a', None, {'alias': 'new_name'})])\n    mapper.fit_transform(df)\n    assert mapper.transformed_names_ == ['new_name']\n\n\ndef test_transformed_names_complex_alias(complex_dataframe):\n    \"\"\"\n    If we specify an alias for a multiple output column, it is used for the\n    output\n    \"\"\"\n    df = complex_dataframe\n    mapper = DataFrameMapper([('target', LabelBinarizer(), {'alias': 'new'})])\n    mapper.fit_transform(df)\n    assert mapper.transformed_names_ == ['new_a', 'new_b', 'new_c']\n\n\ndef test_exception_column_context_transform(simple_dataframe):\n    \"\"\"\n    If an exception is raised when transforming a column,\n    the exception includes the name of the column being transformed\n    \"\"\"\n    class FailingTransformer(object):\n        def fit(self, X):\n            pass\n\n        def transform(self, X):\n            raise Exception('Some exception')\n\n    df = simple_dataframe\n    mapper = DataFrameMapper([('a', FailingTransformer())])\n    mapper.fit(df)\n\n    with pytest.raises(Exception, match='a: Some exception'):\n        mapper.transform(df)\n\n\ndef test_exception_column_context_fit(simple_dataframe):\n    \"\"\"\n    If an exception is raised when fit a column,\n    the exception includes the name of the column being fitted\n    \"\"\"\n    class FailingFitter(object):\n        def fit(self, X):\n            raise Exception('Some exception')\n\n    df = simple_dataframe\n    mapper = DataFrameMapper([('a', FailingFitter())])\n\n    with pytest.raises(Exception, match='a: Some exception'):\n        mapper.fit(df)\n\n\ndef test_simple_df(simple_dataframe):\n    \"\"\"\n    Get a dataframe from a simple mapped dataframe\n    \"\"\"\n    df = simple_dataframe\n    mapper = DataFrameMapper([('a', None)], df_out=True)\n    transformed = mapper.fit_transform(df)\n    assert type(transformed) == pd.DataFrame\n    assert len(transformed[\"a\"]) == len(simple_dataframe[\"a\"])\n\n\ndef test_complex_df(complex_dataframe):\n    \"\"\"\n    Get a dataframe from a complex mapped dataframe\n    \"\"\"\n    df = complex_dataframe\n    mapper = DataFrameMapper(\n        [('target', None), ('feat1', None), ('feat2', None)],\n        df_out=True)\n    transformed = mapper.fit_transform(df)\n    assert len(transformed) == len(complex_dataframe)\n    for c in df.columns:\n        assert len(transformed[c]) == len(df[c])\n\n\ndef test_complex_object_df(complex_object_dataframe):\n    \"\"\"\n    Get a dataframe from a complex dataframe with 2d features\n    \"\"\"\n    df = complex_object_dataframe\n    img_scale = 10\n    mapper = DataFrameMapper(\n        [('target', None), ('feat1', None),\n         (make_column_selector('feat2'), StandardScaler()),\n         (make_column_selector('img2d'), MockImageTransformer(img_scale))],\n        df_out=True, input_df=True)\n    transformed = mapper.fit_transform(df)\n    assert len(transformed) == len(complex_object_dataframe)\n    assert np.isclose(\n        np.sum(transformed['img2d']),\n        np.max(np.sum(df['img2d'])) * img_scale, atol=1e-12)\n\n\ndef test_numeric_column_names(complex_dataframe):\n    \"\"\"\n    Get a dataframe from a complex mapped dataframe with numeric column names\n    \"\"\"\n    df = complex_dataframe\n    df.columns = [0, 1, 2]\n    mapper = DataFrameMapper(\n        [(0, None), (1, None), (2, None)], df_out=True)\n    transformed = mapper.fit_transform(df)\n    assert len(transformed) == len(complex_dataframe)\n    for c in df.columns:\n        assert len(transformed[c]) == len(df[c])\n\n\ndef test_multiindex_df(multiindex_dataframe_incomplete):\n    \"\"\"\n    Get a dataframe from a multiindex dataframe with missing data\n    \"\"\"\n    df = multiindex_dataframe_incomplete\n    mapper = DataFrameMapper([([c], Imputer()) for c in df.columns],\n                             df_out=True)\n    transformed = mapper.fit_transform(df)\n    assert len(transformed) == len(multiindex_dataframe_incomplete)\n    for c in df.columns:\n        assert len(transformed[str(c)]) == len(df[c])\n\n\ndef test_binarizer_df():\n    \"\"\"\n    Check level names from LabelBinarizer\n    \"\"\"\n    df = pd.DataFrame({'target': ['a', 'a', 'b', 'b', 'c', 'a']})\n    mapper = DataFrameMapper([('target', LabelBinarizer())], df_out=True)\n    transformed = mapper.fit_transform(df)\n    cols = transformed.columns\n    assert len(cols) == 3\n    assert cols[0] == 'target_a'\n    assert cols[1] == 'target_b'\n    assert cols[2] == 'target_c'\n\n\ndef test_binarizer_int_df():\n    \"\"\"\n    Check level names from LabelBinarizer for a numeric array.\n    \"\"\"\n    df = pd.DataFrame({'target': [5, 5, 6, 6, 7, 5]})\n    mapper = DataFrameMapper([('target', LabelBinarizer())], df_out=True)\n    transformed = mapper.fit_transform(df)\n    cols = transformed.columns\n    assert len(cols) == 3\n    assert cols[0] == 'target_5'\n    assert cols[1] == 'target_6'\n    assert cols[2] == 'target_7'\n\n\ndef test_binarizer2_df():\n    \"\"\"\n    Check level names from LabelBinarizer with just one output column\n    \"\"\"\n    df = pd.DataFrame({'target': ['a', 'a', 'b', 'b', 'a']})\n    mapper = DataFrameMapper([('target', LabelBinarizer())], df_out=True)\n    transformed = mapper.fit_transform(df)\n    cols = transformed.columns\n    assert len(cols) == 1\n    assert cols[0] == 'target'\n\n\ndef test_onehot_df():\n    \"\"\"\n    Check level ids from one-hot\n    \"\"\"\n    df = pd.DataFrame({'target': [0, 0, 1, 1, 2, 3, 0]})\n    mapper = DataFrameMapper([(['target'], OneHotEncoder())], df_out=True)\n    transformed = mapper.fit_transform(df)\n    cols = transformed.columns\n    assert len(cols) == 4\n    assert cols[0] == 'target_0'\n    assert cols[3] == 'target_3'\n\n\ndef test_customtransform_df():\n    \"\"\"\n    Check level ids from a transformer in which\n    the number of classes is not equals to the number of output columns.\n    \"\"\"\n    df = pd.DataFrame({'target': [6, 5, 7, 5, 4, 8, 8]})\n    mapper = DataFrameMapper([(['target'], CustomTransformer())], df_out=True)\n    transformed = mapper.fit_transform(df)\n    cols = transformed.columns\n    assert len(mapper.features[0][1].classes_) == 5\n    assert len(cols) == 1\n    assert cols[0] == 'target'\n\n\ndef test_preserve_df_index():\n    \"\"\"\n    The index is preserved when df_out=True\n    \"\"\"\n    df = pd.DataFrame({'target': [1, 2, 3]},\n                      index=['a', 'b', 'c'])\n    mapper = DataFrameMapper([('target', None)],\n                             df_out=True)\n\n    transformed = mapper.fit_transform(df)\n\n    assert_array_equal(transformed.index, df.index)\n\n\ndef test_preserve_df_index_rows_dropped():\n    \"\"\"\n    If df_out=True but the original df index length doesn't\n    match the number of final rows, use a numeric index\n    \"\"\"\n    class DropLastRowTransformer(object):\n        def fit(self, X):\n            return self\n\n        def transform(self, X):\n            return X[:-1]\n\n    df = pd.DataFrame({'target': [1, 2, 3]},\n                      index=['a', 'b', 'c'])\n    mapper = DataFrameMapper([('target', DropLastRowTransformer())],\n                             df_out=True)\n\n    transformed = mapper.fit_transform(df)\n\n    assert_array_equal(transformed.index, np.array([0, 1]))\n\n\ndef test_pca(complex_dataframe):\n    \"\"\"\n    Check multi in and out with PCA\n    \"\"\"\n    df = complex_dataframe\n    mapper = DataFrameMapper(\n        [(['feat1', 'feat2'], sklearn.decomposition.PCA(2))],\n        df_out=True)\n    transformed = mapper.fit_transform(df)\n    cols = transformed.columns\n    assert len(cols) == 2\n    assert cols[0] == 'feat1_feat2_0'\n    assert cols[1] == 'feat1_feat2_1'\n\n\ndef test_fit_transform(simple_dataframe):\n    \"\"\"\n    Check that custom fit_transform methods of the transformers are invoked.\n    \"\"\"\n    df = simple_dataframe\n    mock_transformer = Mock()\n    # return something of measurable length but does nothing\n    mock_transformer.fit_transform.return_value = np.array([1, 2, 3])\n    mapper = DataFrameMapper([(\"a\", mock_transformer)])\n    mapper.fit_transform(df)\n    assert mock_transformer.fit_transform.called\n\n\ndef test_fit_transform_equiv_mock(simple_dataframe):\n    \"\"\"\n    Check for equivalent results for code paths fit_transform\n    versus fit and transform in DataFrameMapper using the mock\n    transformer which does not implement a custom fit_transform.\n    \"\"\"\n    df = simple_dataframe\n    mapper = DataFrameMapper([('a', MockXTransformer())])\n    transformed_combined = mapper.fit_transform(df)\n    transformed_separate = mapper.fit(df).transform(df)\n    assert np.all(transformed_combined == transformed_separate)\n\n\ndef test_fit_transform_equiv_pca(complex_dataframe):\n    \"\"\"\n    Check for equivalent results for code paths fit_transform\n    versus fit and transform in DataFrameMapper and transformer\n    using PCA which implements a custom fit_transform. The\n    equivalence of both paths in the transformer only can be\n    asserted since this is tested in the sklearn tests\n    scikit-learn/sklearn/decomposition/tests/test_pca.py\n    \"\"\"\n    df = complex_dataframe\n    mapper = DataFrameMapper(\n        [(['feat1', 'feat2'], sklearn.decomposition.PCA(2))],\n        df_out=True)\n    transformed_combined = mapper.fit_transform(df)\n    transformed_separate = mapper.fit(df).transform(df)\n    assert np.allclose(transformed_combined, transformed_separate)\n\n\ndef test_input_df_true_first_transformer(simple_dataframe, monkeypatch):\n    \"\"\"\n    If input_df is True, the first transformer is passed\n    a pd.Series instead of an np.array\n    \"\"\"\n    df = simple_dataframe\n    monkeypatch.setattr(MockXTransformer, 'fit', Mock())\n    monkeypatch.setattr(MockXTransformer, 'transform',\n                        Mock(return_value=np.array([1, 2, 3])))\n    mapper = DataFrameMapper([\n        ('a', MockXTransformer())\n    ], input_df=True)\n    out = mapper.fit_transform(df)\n\n    args, _ = MockXTransformer().fit.call_args\n    assert isinstance(args[0], pd.Series)\n\n    args, _ = MockXTransformer().transform.call_args\n    assert isinstance(args[0], pd.Series)\n\n    assert_array_equal(out, np.array([1, 2, 3]).reshape(-1, 1))\n\n\ndef test_input_df_true_next_transformers(simple_dataframe, monkeypatch):\n    \"\"\"\n    If input_df is True, the subsequent transformers get passed pandas\n    objects instead of numpy arrays (given the previous transformers\n    output pandas objects as well)\n    \"\"\"\n    df = simple_dataframe\n    monkeypatch.setattr(MockTClassifier, 'fit', Mock())\n    monkeypatch.setattr(MockTClassifier, 'transform',\n                        Mock(return_value=pd.Series([1, 2, 3])))\n    mapper = DataFrameMapper([\n        ('a', [MockXTransformer(), MockTClassifier()])\n    ], input_df=True)\n    mapper.fit(df)\n    out = mapper.transform(df)\n\n    args = MockTClassifier().fit.call_args\n    assert isinstance(args[0], pd.Series)\n\n    assert_array_equal(out, np.array([1, 2, 3]).reshape(-1, 1))\n\n\ndef test_input_df_true_multiple_cols(complex_dataframe):\n    \"\"\"\n    When input_df is True, applying transformers to multiple columns\n    works as expected\n    \"\"\"\n    df = complex_dataframe\n\n    mapper = DataFrameMapper([\n        ('target', MockXTransformer()),\n        ('feat1',  MockXTransformer()),\n    ], input_df=True)\n    out = mapper.fit_transform(df)\n\n    assert_array_equal(out[:, 0], df['target'].values)\n    assert_array_equal(out[:, 1], df['feat1'].values)\n\n\ndef test_input_df_date_encoder():\n    \"\"\"\n    When input_df is True we can apply a transformer that only works\n    with pandas dataframes like a DateEncoder\n    \"\"\"\n    df = pd.DataFrame(\n        {'dates': pd.date_range('2015-10-30', '2015-11-02')})\n    mapper = DataFrameMapper([\n        ('dates', DateEncoder())\n    ], input_df=True)\n    out = mapper.fit_transform(df)\n    expected = np.array([\n        [2015, 10, 30],\n        [2015, 10, 31],\n        [2015, 11, 1],\n        [2015, 11, 2]\n    ])\n    assert_array_equal(out, expected)\n\n\ndef test_local_input_df_date_encoder():\n    \"\"\"\n    When input_df is True we can apply a transformer that only works\n    with pandas dataframes like a DateEncoder\n    \"\"\"\n    df = pd.DataFrame(\n        {'dates': pd.date_range('2015-10-30', '2015-11-02')})\n    mapper = DataFrameMapper([\n        ('dates', DateEncoder(), {'input_df': True})\n    ], input_df=False)\n    out = mapper.fit_transform(df)\n    expected = np.array([\n        [2015, 10, 30],\n        [2015, 10, 31],\n        [2015, 11, 1],\n        [2015, 11, 2]\n    ])\n    assert_array_equal(out, expected)\n\n\ndef test_nonexistent_columns_explicit_fail(simple_dataframe):\n    \"\"\"\n    If a nonexistent column is selected, KeyError is raised.\n    \"\"\"\n    mapper = DataFrameMapper(None)\n    with pytest.raises(KeyError):\n        mapper._get_col_subset(simple_dataframe, ['nonexistent_feature'])\n\n\ndef test_get_col_subset_single_column_array(simple_dataframe):\n    \"\"\"\n    Selecting a single column should return a 1-dimensional numpy array.\n    \"\"\"\n    mapper = DataFrameMapper(None)\n    array = mapper._get_col_subset(simple_dataframe, \"a\")\n\n    assert type(array) == np.ndarray\n    assert array.shape == (len(simple_dataframe[\"a\"]),)\n\n\ndef test_get_col_subset_single_column_list(simple_dataframe):\n    \"\"\"\n    Selecting a list of columns (even if the list contains a single element)\n    should return a 2-dimensional numpy array.\n    \"\"\"\n    mapper = DataFrameMapper(None)\n    array = mapper._get_col_subset(simple_dataframe, [\"a\"])\n\n    assert type(array) == np.ndarray\n    assert array.shape == (len(simple_dataframe[\"a\"]), 1)\n\n\ndef test_cols_string_array(simple_dataframe):\n    \"\"\"\n    If a string is specified as the columns, the transformer\n    is called with a 1-d array as input.\n    \"\"\"\n    df = simple_dataframe\n    mock_transformer = Mock()\n    mapper = DataFrameMapper([(\"a\", mock_transformer)])\n\n    mapper.fit(df)\n    args, kwargs = mock_transformer.fit.call_args\n    assert args[0].shape == (3,)\n\n\ndef test_cols_list_column_vector(simple_dataframe):\n    \"\"\"\n    If a one-element list is specified as the columns, the transformer\n    is called with a column vector as input.\n    \"\"\"\n    df = simple_dataframe\n    mock_transformer = Mock()\n    mapper = DataFrameMapper([([\"a\"], mock_transformer)])\n\n    mapper.fit(df)\n    args, kwargs = mock_transformer.fit.call_args\n    assert args[0].shape == (3, 1)\n\n\ndef test_handle_feature_2dim():\n    \"\"\"\n    2-dimensional arrays are returned unchanged.\n    \"\"\"\n    array = np.array([[1, 2], [3, 4]])\n    assert_array_equal(_handle_feature(array), array)\n\n\ndef test_handle_feature_1dim():\n    \"\"\"\n    1-dimensional arrays are converted to 2-dimensional column vectors.\n    \"\"\"\n    array = np.array([1, 2])\n    assert_array_equal(_handle_feature(array), np.array([[1], [2]]))\n\n\ndef test_build_transformers():\n    \"\"\"\n    When a list of transformers is passed, return a pipeline with\n    each element of the iterable as a step of the pipeline.\n    \"\"\"\n    transformers = [MockTClassifier(), MockTClassifier()]\n    pipeline = _build_transformer(transformers)\n    assert isinstance(pipeline, Pipeline)\n    for ix, transformer in enumerate(transformers):\n        assert pipeline.steps[ix][1] == transformer\n\n\ndef test_selected_columns():\n    \"\"\"\n    selected_columns returns a set of the columns appearing in the features\n    of the mapper.\n    \"\"\"\n    mapper = DataFrameMapper([\n        ('a', None),\n        (['a', 'b'], None)\n    ])\n    assert mapper._selected_columns == {'a', 'b'}\n\n\ndef test_unselected_columns():\n    \"\"\"\n    unselected_columns returns a list of the columns not appearing in the\n    features of the mapper but present in the given dataframe.\n    \"\"\"\n    df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})\n    mapper = DataFrameMapper([\n        ('a', None),\n        (['a', 'b'], None)\n    ])\n    assert 'c' in mapper._unselected_columns(df)\n\n\ndef test_drop_and_default_false():\n    \"\"\"\n    If default=False, non explicitly selected columns and drop columns\n    are discarded.\n    \"\"\"\n    df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})\n    mapper = DataFrameMapper([\n            ('a', None)\n        ], drop_cols=['c'], default=False)\n    transformed = mapper.fit_transform(df)\n    assert transformed.shape == (1, 1)\n    assert mapper.transformed_names_ == ['a']\n\n\ndef test_drop_and_default_none():\n    \"\"\"\n    If default=None, drop columns are discarded and\n    remaining non explicitly selected columns are passed through untransformed\n    \"\"\"\n    df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 5, 7]})\n    mapper = DataFrameMapper([\n        ('a', None)\n    ], drop_cols=['c'], default=None)\n\n    transformed = mapper.fit_transform(df)\n    assert transformed.shape == (3, 2)\n    assert mapper.transformed_names_ == ['a', 'b']\n\n\ndef test_conflicting_drop():\n    \"\"\"\n    Drop column name shouldn't get confused with transformed columns.\n    \"\"\"\n    df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 5, 7]})\n    mapper = DataFrameMapper([\n        ('a', None)\n    ], drop_cols=['a'], default=False)\n\n    transformed = mapper.fit_transform(df)\n    assert transformed.shape == (3, 1)\n    assert mapper.transformed_names_ == ['a']\n\n\ndef test_default_false():\n    \"\"\"\n    If default=False, non explicitly selected columns are discarded.\n    \"\"\"\n    df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 5, 7]})\n    mapper = DataFrameMapper([\n        ('b', None)\n    ], default=False)\n\n    transformed = mapper.fit_transform(df)\n    assert transformed.shape == (3, 1)\n\n\ndef test_default_none():\n    \"\"\"\n    If default=None, non explicitly selected columns are passed through\n    untransformed.\n    \"\"\"\n    df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 5, 7]})\n    mapper = DataFrameMapper([\n        (['a'], OneHotEncoder())\n    ], default=None)\n\n    transformed = mapper.fit_transform(df)\n    assert (transformed[:, 3] == np.array([3, 5, 7]).T).all()\n\n\ndef test_default_none_names():\n    \"\"\"\n    If default=None, column names are returned unmodified.\n    \"\"\"\n    df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 5, 7]})\n    mapper = DataFrameMapper([], default=None)\n\n    mapper.fit_transform(df)\n    assert mapper.transformed_names_ == ['a', 'b']\n\n\ndef test_default_transformer():\n    \"\"\"\n    If default=Transformer, non explicitly selected columns are applied this\n    transformer.\n    \"\"\"\n    df = pd.DataFrame({'a': [1, np.nan, 3], })\n    mapper = DataFrameMapper([], default=Imputer())\n\n    transformed = mapper.fit_transform(df)\n    assert (transformed[: 0] == np.array([1., 2., 3.])).all()\n\n\ndef test_list_transformers_single_arg(simple_dataframe):\n    \"\"\"\n    Multiple transformers can be specified in a list even if some of them\n    only accept one X argument instead of two (X, y).\n    \"\"\"\n    mapper = DataFrameMapper([\n        ('a', [MockXTransformer()])\n    ])\n    # doesn't fail\n    mapper.fit_transform(simple_dataframe)\n\n\ndef test_list_transformers():\n    \"\"\"\n    Specifying a list of transformers applies them sequentially to the\n    selected column.\n    \"\"\"\n    dataframe = pd.DataFrame({\"a\": [1, np.nan, 3], \"b\": [1, 5, 7]},\n                             dtype=np.float64)\n\n    mapper = DataFrameMapper([\n        ([\"a\"], [Imputer(), StandardScaler()]),\n        ([\"b\"], StandardScaler()),\n    ])\n    dmatrix = mapper.fit_transform(dataframe)\n\n    assert pd.isnull(dmatrix).sum() == 0  # no null values\n\n    # all features have mean 0 and std deviation 1 (standardized)\n    assert (abs(dmatrix.mean(axis=0) - 0) <= 1e-6).all()\n    assert (abs(dmatrix.std(axis=0) - 1) <= 1e-6).all()\n\n\ndef test_list_transformers_old_unpickle(simple_dataframe):\n    mapper = DataFrameMapper(None)\n    # simulate the mapper was created with < 1.0.0 code\n    mapper.features = [('a', [MockXTransformer()])]\n    mapper_pickled = pickle.dumps(mapper)\n\n    loaded_mapper = pickle.loads(mapper_pickled)\n    transformer = loaded_mapper.features[0][1]\n    assert isinstance(transformer, TransformerPipeline)\n    assert isinstance(transformer.steps[0][1], MockXTransformer)\n\n\ndef test_sparse_features(simple_dataframe):\n    \"\"\"\n    If any of the extracted features is sparse and \"sparse\" argument\n    is true, the hstacked result is also sparse.\n    \"\"\"\n    df = simple_dataframe\n    mapper = DataFrameMapper([\n        (\"a\", ToSparseTransformer())\n    ], sparse=True)\n    dmatrix = mapper.fit_transform(df)\n\n    assert type(dmatrix) == sparse.csr.csr_matrix\n\n\ndef test_sparse_off(simple_dataframe):\n    \"\"\"\n    If the resulting features are sparse but the \"sparse\" argument\n    of the mapper is False, return a non-sparse matrix.\n    \"\"\"\n    df = simple_dataframe\n    mapper = DataFrameMapper([\n        (\"a\", ToSparseTransformer())\n    ], sparse=False)\n\n    dmatrix = mapper.fit_transform(df)\n    assert type(dmatrix) != sparse.csr.csr_matrix\n\n\ndef test_fit_with_optional_y_arg(complex_dataframe):\n    \"\"\"\n    Transformers with an optional y argument in the fit method\n    are handled correctly\n    \"\"\"\n    df = complex_dataframe\n    mapper = DataFrameMapper([(['feat1', 'feat2'], MockTClassifier())])\n    # doesn't fail\n    mapper.fit(df[['feat1', 'feat2']], df['target'])\n\n\ndef test_fit_with_required_y_arg(complex_dataframe):\n    \"\"\"\n    Transformers with a required y argument in the fit method\n    are handled and perform correctly\n    \"\"\"\n    df = complex_dataframe\n    mapper = DataFrameMapper([(['feat1', 'feat2'], SelectKBest(chi2, k=1))])\n\n    # fit, doesn't fail\n    ft_arr = mapper.fit(df[['feat1', 'feat2']], df['target'])\n\n    # fit_transform\n    ft_arr = mapper.fit_transform(df[['feat1', 'feat2']], df['target'])\n    assert_array_equal(ft_arr, df[['feat1']].values)\n\n    # transform\n    t_arr = mapper.transform(df[['feat1', 'feat2']])\n    assert_array_equal(t_arr, df[['feat1']].values)\n\n\n# Integration tests with real dataframes\n\n@pytest.fixture\ndef iris_dataframe():\n    iris = load_iris()\n    return DataFrame(\n        data={\n            iris.feature_names[0]: iris.data[:, 0],\n            iris.feature_names[1]: iris.data[:, 1],\n            iris.feature_names[2]: iris.data[:, 2],\n            iris.feature_names[3]: iris.data[:, 3],\n            \"species\": np.array([iris.target_names[e] for e in iris.target])\n        }\n    )\n\n\n@pytest.fixture\ndef cars_dataframe():\n    return pd.read_csv(\"tests/test_data/cars.csv.gz\", compression='gzip')\n\n\ndef test_with_iris_dataframe(iris_dataframe):\n    pipeline = Pipeline([\n        (\"preprocess\", DataFrameMapper([\n            (\"petal length (cm)\", None),\n            (\"petal width (cm)\", None),\n            (\"sepal length (cm)\", None),\n            (\"sepal width (cm)\", None),\n        ])),\n        (\"classify\", SVC(kernel='linear'))\n    ])\n    data = iris_dataframe.drop(\"species\", axis=1)\n    labels = iris_dataframe[\"species\"]\n    scores = cross_val_score(pipeline, data, labels)\n    assert scores.mean() > 0.96\n    assert (scores.std() * 2) < 0.04\n\n\ndef test_dict_vectorizer():\n    df = pd.DataFrame(\n        [[{'a': 1, 'b': 2}], [{'a': 3}]],\n        columns=['colA']\n    )\n\n    outdf = DataFrameMapper(\n        [('colA', DictVectorizer())],\n        df_out=True,\n        default=False\n    ).fit_transform(df)\n\n    columns = sorted(list(outdf.columns))\n    assert len(columns) == 2\n    assert columns[0] == 'colA_0'\n    assert columns[1] == 'colA_1'\n\n\ndef test_with_car_dataframe(cars_dataframe):\n    pipeline = Pipeline([\n        (\"preprocess\", DataFrameMapper([\n            (\"description\", CountVectorizer()),\n        ])),\n        (\"classify\", SVC(kernel='linear'))\n    ])\n    data = cars_dataframe.drop(\"model\", axis=1)\n    labels = cars_dataframe[\"model\"]\n    scores = cross_val_score(pipeline, data, labels)\n    assert scores.mean() > 0.30\n\n\ndef test_direct_cross_validation(iris_dataframe):\n    \"\"\"\n    Starting with sklearn>=0.16.0 we no longer need CV wrappers for dataframes.\n    See https://github.com/paulgb/sklearn-pandas/issues/11\n    \"\"\"\n    pipeline = Pipeline([\n        (\"preprocess\", DataFrameMapper([\n            (\"petal length (cm)\", None),\n            (\"petal width (cm)\", None),\n            (\"sepal length (cm)\", None),\n            (\"sepal width (cm)\", None),\n        ])),\n        (\"classify\", SVC(kernel='linear'))\n    ])\n    data = iris_dataframe.drop(\"species\", axis=1)\n    labels = iris_dataframe[\"species\"]\n    scores = cross_val_score(pipeline, data, labels)\n    assert scores.mean() > 0.96\n    assert (scores.std() * 2) < 0.04\n\n\ndef test_heterogeneous_output_types_input_df():\n    \"\"\"\n    Modify feat2, but pass feat1 through unmodified.\n    This fails if input_df == False\n    \"\"\"\n    df = pd.DataFrame({\n        'feat1': [1, 2, 3, 4, 5, 6],\n        'feat2': [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]\n    })\n    M = DataFrameMapper([\n        (['feat2'], StandardScaler())\n        ], input_df=True, df_out=True, default=None)\n    dft = M.fit_transform(df)\n    assert dft['feat1'].dtype == np.dtype('int64')\n    assert dft['feat2'].dtype == np.dtype('float64')\n\n\ndef test_make_column_selector(iris_dataframe):\n    t = DataFrameMapper([\n        (make_column_selector(dtype_include=float), None, {'alias': 'x'}),\n        ('sepal length (cm)', None),\n    ], df_out=True, default=False)\n\n    xt = t.fit(iris_dataframe).transform(iris_dataframe)\n    expected = ['x_0', 'x_1', 'x_2', 'x_3', 'sepal length (cm)']\n    assert list(xt.columns) == expected\n\n    pickled = pickle.dumps(t)\n    t2 = pickle.loads(pickled)\n    xt2 = t2.transform(iris_dataframe)\n    assert np.array_equal(xt.values, xt2.values)\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/tests/test_features_generator.py",
      "content": "from collections import Counter\n\nimport pytest\nimport numpy as np\nfrom pandas import DataFrame\nfrom numpy.testing import assert_array_equal\n\nfrom sklearn_pandas import DataFrameMapper\nfrom sklearn_pandas.features_generator import gen_features\n\n\nclass MockClass(object):\n\n    def __init__(self, value=1, name='class'):\n        self.value = value\n        self.name = name\n\n\nclass MockTransformer(object):\n\n    def __init__(self):\n        self.most_common_ = None\n\n    def fit(self, X, y=None):\n        [(value, _)] = Counter(X).most_common(1)\n        self.most_common_ = value\n        return self\n\n    def transform(self, X, y=None):\n        return np.asarray([self.most_common_] * len(X))\n\n\n@pytest.fixture\ndef simple_dataset():\n    return DataFrame({\n        'feat1': [1, 2, 1, 3, 1],\n        'feat2': [1, 2, 2, 2, 3],\n        'feat3': [1, 2, 3, 4, 5],\n    })\n\n\ndef test_generate_features_with_default_parameters():\n    \"\"\"\n    Tests generating features from classes with default init arguments.\n    \"\"\"\n    columns = ['colA', 'colB', 'colC']\n    feature_defs = gen_features(columns=columns, classes=[MockClass])\n    assert len(feature_defs) == len(columns)\n\n    for feature in feature_defs:\n        assert feature[2] == {}\n\n    feature_dict = dict([_[0:2] for _ in feature_defs])\n    assert columns == sorted(feature_dict.keys())\n\n    # default init arguments for MockClass for clarification.\n    expected = {'value': 1, 'name': 'class'}\n    for column, transformers in feature_dict.items():\n        for obj in transformers:\n            assert_attributes(obj, **expected)\n\n\ndef test_generate_features_with_several_classes():\n    \"\"\"\n    Tests generating features pipeline with different transformers parameters.\n    \"\"\"\n    feature_defs = gen_features(\n        columns=['colA', 'colB', 'colC'],\n        classes=[\n            {'class': MockClass},\n            {'class': MockClass, 'name': 'mockA'},\n            {'class': MockClass, 'name': 'mockB', 'value': None}\n        ]\n    )\n\n    for col, transformers, params in feature_defs:\n        assert_attributes(transformers[0], name='class', value=1)\n        assert_attributes(transformers[1], name='mockA', value=1)\n        assert_attributes(transformers[2], name='mockB', value=None)\n\n\ndef test_generate_features_with_none_only_transformers():\n    \"\"\"\n    Tests generating \"dummy\" feature definition which doesn't apply any\n    transformation.\n    \"\"\"\n    feature_defs = gen_features(\n        columns=['colA', 'colB', 'colC'], classes=[None])\n\n    expected = [('colA', None, {}),\n                ('colB', None, {}),\n                ('colC', None, {})]\n\n    assert feature_defs == expected\n\n\ndef test_compatibility_with_data_frame_mapper(simple_dataset):\n    \"\"\"\n    Tests compatibility of generated feature definition with DataFrameMapper.\n    \"\"\"\n    features_defs = gen_features(\n        columns=['feat1', 'feat2'],\n        classes=[MockTransformer])\n    features_defs.append(('feat3', None))\n\n    mapper = DataFrameMapper(features_defs)\n    X = mapper.fit_transform(simple_dataset)\n    expected = np.asarray([\n        [1, 2, 1],\n        [1, 2, 2],\n        [1, 2, 3],\n        [1, 2, 4],\n        [1, 2, 5]\n    ])\n\n    assert_array_equal(X, expected)\n\n\ndef assert_attributes(obj, **attrs):\n    for attr, value in attrs.items():\n        assert getattr(obj, attr) == value\n"
    }
  ],
  "OriginCode": [
    {
      "path": "scikit-learn-contrib_sklearn-pandas/test.py",
      "content": "import pytest\nfrom unittest.mock import Mock\nimport numpy as np\nimport pandas as pd\nfrom sklearn_pandas import DataFrameMapper\nfrom sklearn.compose import make_column_selector\nfrom sklearn.preprocessing import StandardScaler\n\n\nclass GetStartWith:\n    def __init__(self, start_str):\n        self.start_str = start_str\n\n    def __call__(self, X: pd.DataFrame) -> list:\n        return [c for c in X.columns if c.startswith(self.start_str)]\n\n\ndf = pd.DataFrame({\n    'sepal length (cm)': [1.0, 2.0, 3.0],\n    'sepal width (cm)': [1.0, 2.0, 3.0],\n    'petal length (cm)': [1.0, 2.0, 3.0],\n    'petal width (cm)': [1.0, 2.0, 3.0]\n})\nt = DataFrameMapper([\n    (make_column_selector(dtype_include=float), StandardScaler(), {'alias': 'x'}),\n    (GetStartWith('petal'), None, {'alias': 'petal'})\n], df_out=True, default=False)\n\nt.fit(df)\nprint(t.transform(df).shape)\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/setup.py",
      "content": "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n\nfrom setuptools import setup\nfrom setuptools.command.test import test as TestCommand\nimport re\n\nfor line in open('sklearn_pandas/__init__.py'):\n    match = re.match(\"__version__ *= *'(.*)'\", line)\n    if match:\n        __version__, = match.groups()\n\n\nclass PyTest(TestCommand):\n    user_options = [('pytest-args=', 'a', \"Arguments to pass to py.test\")]\n\n    def initialize_options(self):\n        TestCommand.initialize_options(self)\n        self.pytest_args = []\n\n    def finalize_options(self):\n        TestCommand.finalize_options(self)\n        self.test_args = []\n        self.test_suite = True\n\n    def run(self):\n        import pytest\n        errno = pytest.main(self.pytest_args)\n        raise SystemExit(errno)\n\n\nsetup(name='sklearn-pandas',\n      version=__version__,\n      description='Pandas integration with sklearn',\n      maintainer='Ritesh Agrawal',\n      maintainer_email='ragrawal@gmail.com',\n      url='https://github.com/scikit-learn-contrib/sklearn-pandas',\n      packages=['sklearn_pandas'],\n      keywords=['scikit', 'sklearn', 'pandas'],\n      install_requires=[\n          'scikit-learn>=0.23.0',\n          'scipy>=1.5.1',\n          'pandas>=1.1.4',\n          'numpy>=1.18.1'\n      ],\n      tests_require=['pytest', 'mock'],\n      cmdclass={'test': PyTest},\n      license='MIT License'\n)\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/noxfile.py",
      "content": "import nox\n\n@nox.session\ndef lint(session):\n    session.install('pytest>=5.3.5', 'setuptools>=45.2',\n                    'wheel>=0.34.2', 'flake8>=3.7.9',\n                    'numpy==1.18.1', 'pandas==1.1.4')\n    session.install('.')\n    session.run('flake8', 'sklearn_pandas/', 'tests')\n\n@nox.session\n@nox.parametrize('numpy', ['1.18.1', '1.19.4', '1.20.1'])\n@nox.parametrize('scipy', ['1.5.4', '1.6.0'])\n@nox.parametrize('pandas', ['1.1.4', '1.2.2'])\ndef tests(session, numpy, scipy, pandas):\n    session.install('pytest>=5.3.5', \n                    'setuptools>=45.2',\n                    'wheel>=0.34.2',\n                    f'numpy=={numpy}',\n                    f'scipy=={scipy}',\n                    f'pandas=={pandas}'\n                    )\n    session.install('.')\n    session.run('py.test', 'README.rst', 'tests')\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/sklearn_pandas/features_generator.py",
      "content": "def gen_features(columns, classes=None, prefix='', suffix=''):\n    \"\"\"Generates a feature definition list which can be passed\n    into DataFrameMapper\n\n    Params:\n\n    columns     a list of column names to generate features for.\n\n    classes     a list of classes for each feature, a list of dictionaries with\n                transformer class and init parameters, or None.\n\n                If list of classes is provided, then each of them is\n                instantiated with default arguments. Example:\n\n                    classes = [StandardScaler, LabelBinarizer]\n\n                If list of dictionaries is provided, then each of them should\n                have a 'class' key with transformer class. All other keys are\n                passed into 'class' value constructor. Example:\n\n                    classes = [\n                        {'class': StandardScaler, 'with_mean': False},\n                        {'class': LabelBinarizer}\n                    }]\n\n                If None value selected, then each feature left as is.\n\n    prefix      add prefix to transformed column names\n\n    suffix      add suffix to transformed column names.\n\n    \"\"\"\n    if classes is None:\n        return [(column, None) for column in columns]\n\n    feature_defs = []\n\n    for column in columns:\n        feature_transformers = []\n\n        arguments = {}\n        if prefix and prefix != \"\":\n            arguments['prefix'] = prefix\n        if suffix and suffix != \"\":\n            arguments['suffix'] = suffix\n\n        classes = [cls for cls in classes if cls is not None]\n        if not classes:\n            feature_defs.append((column, None, arguments))\n\n        else:\n            for definition in classes:\n                if isinstance(definition, dict):\n                    params = definition.copy()\n                    klass = params.pop('class')\n                    feature_transformers.append(klass(**params))\n                else:\n                    feature_transformers.append(definition())\n\n            if not feature_transformers:\n                feature_transformers = None\n\n            feature_defs.append((column, feature_transformers, arguments))\n\n    return feature_defs\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/sklearn_pandas/transformers.py",
      "content": "import numpy as np\nimport pandas as pd\nfrom sklearn.base import TransformerMixin\nimport warnings\n\n\ndef _get_mask(X, value):\n    \"\"\"\n    Compute the boolean mask X == missing_values.\n    \"\"\"\n    if value == \"NaN\" or \\\n       value is None or \\\n       (isinstance(value, float) and np.isnan(value)):\n        return pd.isnull(X)\n    else:\n        return X == value\n\n\nclass NumericalTransformer(TransformerMixin):\n    \"\"\"\n    Provides commonly used numerical transformers.\n    \"\"\"\n    SUPPORTED_FUNCTIONS = ['log', 'log1p']\n\n    def __init__(self, func):\n        \"\"\"\n        Params\n\n        func    function to apply to input columns. The function will be\n                applied to each value. Supported functions are defined\n                in SUPPORTED_FUNCTIONS variable. Throws assertion error if the\n                not supported.\n        \"\"\"\n\n        warnings.warn(\"\"\"\n            NumericalTransformer will be deprecated in 3.0 version.\n            Please use Sklearn.base.TransformerMixin to write\n            customer transformers\n            \"\"\", DeprecationWarning)\n\n        assert func in self.SUPPORTED_FUNCTIONS, \\\n            f\"Only following func are supported: {self.SUPPORTED_FUNCTIONS}\"\n        super(NumericalTransformer, self).__init__()\n        self.__func = func\n\n    def fit(self, X, y=None):\n        return self\n\n    def transform(self, X, y=None):\n        if self.__func == 'log1p':\n            return np.vectorize(np.log1p)(X)\n        elif self.__func == 'log':\n            return np.vectorize(np.log)(X)\n\n        raise ValueError(f\"Invalid function name: {self.__func}\")\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/sklearn_pandas/__init__.py",
      "content": "__version__ = '2.2.0'\n\nimport logging\nlogger = logging.getLogger(__name__)\n\nfrom .dataframe_mapper import DataFrameMapper  # NOQA\nfrom .features_generator import gen_features  # NOQA\nfrom .transformers import NumericalTransformer # NOQA\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/sklearn_pandas/pipeline.py",
      "content": "import six\nfrom sklearn.pipeline import _name_estimators, Pipeline\nfrom sklearn.utils import tosequence\n\n\ndef _call_fit(fit_method, X, y=None, **kwargs):\n    \"\"\"\n    helper function, calls the fit or fit_transform method with the correct\n    number of parameters\n\n    fit_method: fit or fit_transform method of the transformer\n    X: the data to fit\n    y: the target vector relative to X, optional\n    kwargs: any keyword arguments to the fit method\n\n    return: the result of the fit or fit_transform method\n\n    WARNING: if this function raises a TypeError exception, test the fit\n    or fit_transform method passed to it in isolation as _call_fit will not\n    distinguish TypeError due to incorrect number of arguments from\n    other TypeError\n    \"\"\"\n    try:\n        return fit_method(X, y, **kwargs)\n    except TypeError:\n        # fit takes only one argument\n        return fit_method(X, **kwargs)\n\n\nclass TransformerPipeline(Pipeline):\n    \"\"\"\n    Pipeline that expects all steps to be transformers taking a single X\n    argument, an optional y argument, and having fit and transform methods.\n\n    Code is copied from sklearn's Pipeline\n    \"\"\"\n\n    def __init__(self, steps):\n        names, estimators = zip(*steps)\n        if len(dict(steps)) != len(steps):\n            raise ValueError(\n                \"Provided step names are not unique: %s\" % (names,))\n\n        # shallow copy of steps\n        self.steps = tosequence(steps)\n        estimator = estimators[-1]\n\n        for e in estimators:\n            if (not (hasattr(e, \"fit\") or hasattr(e, \"fit_transform\")) or not\n                    hasattr(e, \"transform\")):\n                raise TypeError(\"All steps of the chain should \"\n                                \"be transforms and implement fit and transform\"\n                                \" '%s' (type %s) doesn't)\" % (e, type(e)))\n\n        if not hasattr(estimator, \"fit\"):\n            raise TypeError(\"Last step of chain should implement fit \"\n                            \"'%s' (type %s) doesn't)\"\n                            % (estimator, type(estimator)))\n\n    def _pre_transform(self, X, y=None, **fit_params):\n        fit_params_steps = dict((step, {}) for step, _ in self.steps)\n        for pname, pval in six.iteritems(fit_params):\n            step, param = pname.split('__', 1)\n            fit_params_steps[step][param] = pval\n        Xt = X\n        for name, transform in self.steps[:-1]:\n            if hasattr(transform, \"fit_transform\"):\n                Xt = _call_fit(transform.fit_transform,\n                               Xt, y, **fit_params_steps[name])\n            else:\n                Xt = _call_fit(transform.fit,\n                               Xt, y, **fit_params_steps[name]).transform(Xt)\n        return Xt, fit_params_steps[self.steps[-1][0]]\n\n    def fit(self, X, y=None, **fit_params):\n        Xt, fit_params = self._pre_transform(X, y, **fit_params)\n        _call_fit(self.steps[-1][-1].fit, Xt, y, **fit_params)\n        return self\n\n    def fit_transform(self, X, y=None, **fit_params):\n        Xt, fit_params = self._pre_transform(X, y, **fit_params)\n        if hasattr(self.steps[-1][-1], 'fit_transform'):\n            return _call_fit(self.steps[-1][-1].fit_transform,\n                             Xt, y, **fit_params)\n        else:\n            return _call_fit(self.steps[-1][-1].fit,\n                             Xt, y, **fit_params).transform(Xt)\n\n\ndef make_transformer_pipeline(*steps):\n    \"\"\"Construct a TransformerPipeline from the given estimators.\n    \"\"\"\n    return TransformerPipeline(_name_estimators(steps))\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/sklearn_pandas/dataframe_mapper.py",
      "content": "import contextlib\nfrom datetime import datetime\nimport pandas as pd\nimport numpy as np\nfrom scipy import sparse\nfrom sklearn.base import BaseEstimator, TransformerMixin\nfrom .cross_validation import DataWrapper\nfrom .pipeline import make_transformer_pipeline, _call_fit, TransformerPipeline\nfrom . import logger\n\nstring_types = text_type = str\n\n\ndef _handle_feature(fea):\n    \"\"\"\n    Convert 1-dimensional arrays to 2-dimensional column vectors.\n    \"\"\"\n    if len(fea.shape) == 1:\n        fea = np.array([fea]).T\n\n    return fea\n\n\ndef _build_transformer(transformers):\n    if isinstance(transformers, list):\n        transformers = make_transformer_pipeline(*transformers)\n    return transformers\n\n\ndef _build_feature(columns, transformers, options={}, X=None):\n    if X is None:\n        return (columns, _build_transformer(transformers), options)\n    return (\n        columns(X) if callable(columns) else columns,\n        _build_transformer(transformers),\n        options\n    )\n\n\ndef _elapsed_secs(t1):\n    return (datetime.now()-t1).total_seconds()\n\n\ndef _get_feature_names(estimator):\n    \"\"\"\n    Attempt to extract feature names based on a given estimator\n    \"\"\"\n    if hasattr(estimator, 'classes_'):\n        return estimator.classes_\n    elif hasattr(estimator, 'get_feature_names'):\n        return estimator.get_feature_names()\n    return None\n\n\n@contextlib.contextmanager\ndef add_column_names_to_exception(column_names):\n    # Stolen from https://stackoverflow.com/a/17677938/356729\n    try:\n        yield\n    except Exception as ex:\n        if ex.args:\n            msg = u'{}: {}'.format(column_names, ex.args[0])\n        else:\n            msg = text_type(column_names)\n        ex.args = (msg,) + ex.args[1:]\n        raise\n\n\nclass DataFrameMapper(BaseEstimator, TransformerMixin):\n    \"\"\"\n    Map Pandas data frame column subsets to their own\n    sklearn transformation.\n    \"\"\"\n\n    def __init__(self, features, default=False, sparse=False, df_out=False,\n                 input_df=False, drop_cols=None):\n        \"\"\"\n        Params:\n\n        features    a list of tuples with features definitions.\n                    The first element is the pandas column selector. This can\n                    be a string (for one column) or a list of strings.\n                    The second element is an object that supports\n                    sklearn's transform interface, or a list of such objects\n                    The third element is optional and, if present, must be\n                    a dictionary with the options to apply to the\n                    transformation. Example: {'alias': 'day_of_week'}\n\n        default     default transformer to apply to the columns not\n                    explicitly selected in the mapper. If False (default),\n                    discard them. If None, pass them through untouched. Any\n                    other transformer will be applied to all the unselected\n                    columns as a whole, taken as a 2d-array.\n\n        sparse      will return sparse matrix if set True and any of the\n                    extracted features is sparse. Defaults to False.\n\n        df_out      return a pandas data frame, with each column named using\n                    the pandas column that created it (if there's only one\n                    input and output) or the input columns joined with '_'\n                    if there's multiple inputs, and the name concatenated with\n                    '_1', '_2' etc if there's multiple outputs. NB: does not\n                    work if *default* or *sparse* are true\n\n        input_df    If ``True`` pass the selected columns to the transformers\n                    as a pandas DataFrame or Series. Otherwise pass them as a\n                    numpy array. Defaults to ``False``.\n\n        drop_cols   List of columns to be dropped. Defaults to None.\n\n        \"\"\"\n        self.features = features\n        self.default = default\n        self.built_default = None\n        self.sparse = sparse\n        self.df_out = df_out\n        self.input_df = input_df\n        self.drop_cols = [] if drop_cols is None else drop_cols\n        self.transformed_names_ = []\n        if (df_out and (sparse or default)):\n            raise ValueError(\"Can not use df_out with sparse or default\")\n\n    def _build(self, X=None):\n        \"\"\"\n        Build attributes built_features and built_default.\n        \"\"\"\n        if isinstance(self.features, list):\n            self.built_features = [\n                _build_feature(*f, X=X) for f in self.features\n            ]\n        else:\n            self.built_features = _build_feature(*self.features, X=X)\n        self.built_default = _build_transformer(self.default)\n\n    @property\n    def _selected_columns(self):\n        \"\"\"\n        Return a set of selected columns in the feature list.\n        \"\"\"\n        selected_columns = set()\n        for feature in self.features:\n            columns = feature[0]\n            if isinstance(columns, list):\n                selected_columns = selected_columns.union(set(columns))\n            else:\n                selected_columns.add(columns)\n        return selected_columns\n\n    def _unselected_columns(self, X):\n        \"\"\"\n        Return list of columns present in X and not selected explicitly in the\n        mapper.\n\n        Unselected columns are returned in the order they appear in the\n        dataframe to avoid issues with different ordering during default fit\n        and transform steps.\n        \"\"\"\n        X_columns = list(X.columns)\n        return [column for column in X_columns if\n                column not in self._selected_columns\n                and column not in self.drop_cols]\n\n    def __setstate__(self, state):\n        # compatibility for older versions of sklearn-pandas\n        super().__setstate__(state)\n        self.features = [_build_feature(*feat) for feat in state['features']]\n        self.sparse = state.get('sparse', False)\n        self.default = state.get('default', False)\n        self.df_out = state.get('df_out', False)\n        self.input_df = state.get('input_df', False)\n        self.drop_cols = state.get('drop_cols', [])\n        self.built_features = state.get('built_features', self.features)\n        self.built_default = state.get('built_default', self.default)\n        self.transformed_names_ = state.get('transformed_names_', [])\n\n    def __getstate__(self):\n        state = super().__getstate__()\n        state['features'] = self.features\n        state['sparse'] = self.sparse\n        state['default'] = self.default\n        state['df_out'] = self.df_out\n        state['input_df'] = self.input_df\n        state['drop_cols'] = self.drop_cols\n        state['build_features'] = getattr(self, 'built_features', None)\n        state['built_default'] = self.built_default\n        state['transformed_names_'] = self.transformed_names_\n        return state\n\n    def _get_col_subset(self, X, cols, input_df=False):\n        \"\"\"\n        Get a subset of columns from the given table X.\n\n        X       a Pandas dataframe; the table to select columns from\n        cols    a string or list of strings representing the columns to select.\n                It can also be a callable that returns True or False, i.e.\n                compatible with the built-in filter function.\n\n        Returns a numpy array with the data from the selected columns\n        \"\"\"\n\n        if isinstance(cols, string_types):\n            return_vector = True\n            cols = [cols]\n        else:\n            return_vector = False\n\n        # Needed when using the cross-validation compatibility\n        # layer for sklearn<0.16.0.\n        # Will be dropped on sklearn-pandas 2.0.\n        if isinstance(X, list):\n            X = [x[cols] for x in X]\n            X = pd.DataFrame(X)\n\n        elif isinstance(X, DataWrapper):\n            X = X.df  # fetch underlying data\n\n        if return_vector:\n            t = X[cols[0]]\n        else:\n            t = X[cols]\n\n        # return either a DataFrame/Series or a numpy array\n        if input_df:\n            return t\n        else:\n            return t.values\n\n    def fit(self, X, y=None):\n        \"\"\"\n        Fit a transformation from the pipeline\n\n        X       the data to fit\n\n        y       the target vector relative to X, optional\n\n        \"\"\"\n        self._build(X=X)\n\n        for columns, transformers, options in self.built_features:\n            t1 = datetime.now()\n            input_df = options.get('input_df', self.input_df)\n\n            if transformers is not None:\n                with add_column_names_to_exception(columns):\n                    Xt = self._get_col_subset(X, columns, input_df)\n                    _call_fit(transformers.fit, Xt, y)\n            logger.info(f\"[FIT] {columns}: {_elapsed_secs(t1)} secs\")\n\n        # handle features not explicitly selected\n        if self.built_default:  # not False and not None\n            unsel_cols = self._unselected_columns(X)\n            with add_column_names_to_exception(unsel_cols):\n                Xt = self._get_col_subset(X, unsel_cols, self.input_df)\n                _call_fit(self.built_default.fit, Xt, y)\n        return self\n\n    def get_names(self, columns, transformer, x, alias=None, prefix='',\n                  suffix=''):\n        \"\"\"\n        Return verbose names for the transformed columns.\n\n        columns       name (or list of names) of the original column(s)\n        transformer   transformer - can be a TransformerPipeline\n        x             transformed columns (numpy.ndarray)\n        alias         base name to use for the selected columns\n        \"\"\"\n        if alias is not None:\n            name = alias\n        elif isinstance(columns, list):\n            name = '_'.join(map(str, columns))\n        else:\n            name = columns\n        num_cols = x.shape[1] if len(x.shape) > 1 else 1\n\n        output = []\n\n        if num_cols > 1:\n            # If there are as many columns as classes in the transformer,\n            # infer column names from classes names.\n\n            # If we are dealing with multiple transformers for these columns\n            # attempt to extract the names from each of them, starting from the\n            # last one\n            if isinstance(transformer, TransformerPipeline):\n                inverse_steps = transformer.steps[::-1]\n                estimators = (estimator for name, estimator in inverse_steps)\n                names_steps = (_get_feature_names(e) for e in estimators)\n                names = next((n for n in names_steps if n is not None), None)\n            # Otherwise use the only estimator present\n            else:\n                names = _get_feature_names(transformer)\n\n            if names is not None and len(names) == num_cols:\n                output = [f\"{name}_{o}\" for o in names]\n                # otherwise, return name concatenated with '_1', '_2', etc.\n            else:\n                output = [name + '_' + str(o) for o in range(num_cols)]\n        else:\n            output = [name]\n\n        if prefix == suffix == \"\":\n            return output\n\n        return ['{}{}{}'.format(prefix, x, suffix) for x in output]\n\n    def get_dtypes(self, extracted):\n        dtypes_features = [self.get_dtype(ex) for ex in extracted]\n        return [dtype for dtype_feature in dtypes_features\n                for dtype in dtype_feature]\n\n    def get_dtype(self, ex):\n        if isinstance(ex, np.ndarray) or sparse.issparse(ex):\n            return [ex.dtype] * ex.shape[1]\n        elif isinstance(ex, pd.DataFrame):\n            return list(ex.dtypes)\n        else:\n            raise TypeError(type(ex))\n\n    def _transform(self, X, y=None, do_fit=False):\n        \"\"\"\n        Transform the given data with possibility to fit in advance.\n        Avoids code duplication for implementation of transform and\n        fit_transform.\n        \"\"\"\n        if do_fit:\n            self._build(X=X)\n\n        extracted = []\n        transformed_names_ = []\n        for columns, transformers, options in self.built_features:\n            input_df = options.get('input_df', self.input_df)\n\n            # columns could be a string or list of\n            # strings; we don't care because pandas\n            # will handle either.\n            Xt = self._get_col_subset(X, columns, input_df)\n\n            if transformers is not None:\n                with add_column_names_to_exception(columns):\n                    if do_fit and hasattr(transformers, 'fit_transform'):\n                        t1 = datetime.now()\n                        Xt = _call_fit(transformers.fit_transform, Xt, y)\n                        logger.info(f\"[FIT_TRANSFORM] {columns}: {_elapsed_secs(t1)} secs\")  # NOQA\n                    else:\n                        if do_fit:\n                            t1 = datetime.now()\n                            _call_fit(transformers.fit, Xt, y)\n                            logger.info(\n                                f\"[FIT] {columns}: {_elapsed_secs(t1)} secs\")\n\n                        t1 = datetime.now()\n                        Xt = transformers.transform(Xt)\n                        logger.info(f\"[TRANSFORM] {columns}: {_elapsed_secs(t1)} secs\")  # NOQA\n\n            extracted.append(_handle_feature(Xt))\n\n            alias = options.get('alias')\n\n            prefix = options.get('prefix', '')\n            suffix = options.get('suffix', '')\n\n            transformed_names_ += self.get_names(\n                columns, transformers, Xt, alias, prefix, suffix)\n\n        # handle features not explicitly selected\n        if self.built_default is not False:\n            unsel_cols = self._unselected_columns(X)\n            Xt = self._get_col_subset(X, unsel_cols, self.input_df)\n            if self.built_default is not None:\n                with add_column_names_to_exception(unsel_cols):\n                    if do_fit and hasattr(self.built_default, 'fit_transform'):\n                        Xt = _call_fit(self.built_default.fit_transform, Xt, y)\n                    else:\n                        if do_fit:\n                            _call_fit(self.built_default.fit, Xt, y)\n                        Xt = self.built_default.transform(Xt)\n                transformed_names_ += self.get_names(\n                    unsel_cols, self.built_default, Xt)\n            else:\n                # if not applying a default transformer,\n                # keep column names unmodified\n                transformed_names_ += unsel_cols\n\n            extracted.append(_handle_feature(Xt))\n\n        self.transformed_names_ = transformed_names_\n\n        # combine the feature outputs into one array.\n        # at this point we lose track of which features\n        # were created from which input columns, so it's\n        # assumed that that doesn't matter to the model.\n\n        # If any of the extracted features is sparse, combine sparsely.\n        # Otherwise, combine as normal arrays.\n        if any(sparse.issparse(fea) for fea in extracted):\n            stacked = sparse.hstack(extracted).tocsr()\n            # return a sparse matrix only if the mapper was initialized\n            # with sparse=True\n            if not self.sparse:\n                stacked = stacked.toarray()\n        else:\n            stacked = np.hstack(extracted)\n\n        if self.df_out:\n            # if no rows were dropped preserve the original index,\n            # otherwise use a new integer one\n            no_rows_dropped = len(X) == len(stacked)\n            if no_rows_dropped:\n                index = X.index\n            else:\n                index = None\n\n            # output different data types, if appropriate\n            dtypes = self.get_dtypes(extracted)\n            df_out = pd.DataFrame(\n                stacked,\n                columns=self.transformed_names_,\n                index=index)\n            # preserve types\n            for col, dtype in zip(self.transformed_names_, dtypes):\n                df_out[col] = df_out[col].astype(dtype)\n            return df_out\n        else:\n            return stacked\n\n    def transform(self, X):\n        \"\"\"\n        Transform the given data. Assumes that fit has already been called.\n\n        X       the data to transform\n        \"\"\"\n        return self._transform(X)\n\n    def fit_transform(self, X, y=None):\n        \"\"\"\n        Fit a transformation from the pipeline and directly apply\n        it to the given data.\n\n        X       the data to fit\n\n        y       the target vector relative to X, optional\n        \"\"\"\n        return self._transform(X, y, True)\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/sklearn_pandas/cross_validation.py",
      "content": "class DataWrapper(object):\n\n    def __init__(self, df):\n        self.df = df\n\n    def __len__(self):\n        return len(self.df)\n\n    def __getitem__(self, key):\n        return self.df.iloc[key]\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/tests/test_pipeline.py",
      "content": "import pytest\nfrom sklearn_pandas.pipeline import TransformerPipeline, _call_fit\n\n# In py3, mock is included with the unittest standard library\n# In py2, it's a separate package\ntry:\n    from unittest.mock import patch\nexcept ImportError:\n    from mock import patch\n\n\nclass NoTransformT(object):\n    \"\"\"Transformer without transform method.\n    \"\"\"\n    def fit(self, x):\n        return self\n\n\nclass NoFitT(object):\n    \"\"\"Transformer without fit method.\n    \"\"\"\n    def transform(self, x):\n        return self\n\n\nclass Trans(object):\n    \"\"\"\n    Transformer with fit and transform methods\n    \"\"\"\n    def fit(self, x, y=None):\n        return self\n\n    def transform(self, x):\n        return self\n\n\ndef func_x_y(x, y, kwarg='kwarg'):\n    \"\"\"\n    Function with required x and y arguments\n    \"\"\"\n    return\n\n\ndef func_x(x, kwarg='kwarg'):\n    \"\"\"\n    Function with required x argument\n    \"\"\"\n    return\n\n\ndef func_raise_type_err(x, y, kwarg='kwarg'):\n    \"\"\"\n    Function with required x and y arguments,\n    raises TypeError\n    \"\"\"\n    raise TypeError\n\n\ndef test_all_steps_fit_transform():\n    \"\"\"\n    All steps must implement fit and transform. Otherwise, raise TypeError.\n    \"\"\"\n    with pytest.raises(TypeError):\n        TransformerPipeline([('svc', NoTransformT())])\n\n    with pytest.raises(TypeError):\n        TransformerPipeline([('svc', NoFitT())])\n\n\n@patch.object(Trans, 'fit', side_effect=func_x_y)\ndef test_called_with_x_and_y(mock_fit):\n    \"\"\"\n    Fit method with required X and y arguments is called with both and with\n    any additional keywords\n    \"\"\"\n    _call_fit(Trans().fit, 'X', 'y', kwarg='kwarg')\n    mock_fit.assert_called_with('X', 'y', kwarg='kwarg')\n\n\n@patch.object(Trans, 'fit', side_effect=func_x)\ndef test_called_with_x(mock_fit):\n    \"\"\"\n    Fit method with a required X arguments is called with it and with\n    any additional keywords\n    \"\"\"\n    _call_fit(Trans().fit, 'X', 'y', kwarg='kwarg')\n    mock_fit.assert_called_with('X', kwarg='kwarg')\n\n    _call_fit(Trans().fit, 'X', kwarg='kwarg')\n    mock_fit.assert_called_with('X', kwarg='kwarg')\n\n\n@patch.object(Trans, 'fit', side_effect=func_raise_type_err)\ndef test_raises_type_error(mock_fit):\n    \"\"\"\n    If a fit method with required X and y arguments raises a TypeError, it's\n    re-raised (for a different reason) when it's called with one argument\n    \"\"\"\n    with pytest.raises(TypeError):\n        _call_fit(Trans().fit, 'X', 'y', kwarg='kwarg')\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/tests/test_transformers.py",
      "content": "import tempfile\nimport pytest\nimport numpy as np\nfrom pandas import DataFrame\nimport joblib\n\nfrom sklearn_pandas import DataFrameMapper\nfrom sklearn_pandas import NumericalTransformer\n\n\n@pytest.fixture\ndef simple_dataset():\n    return DataFrame({\n        'feat1': [1, 2, 1, 3, 1],\n        'feat2': [1, 2, 2, 2, 3],\n        'feat3': [1, 2, 3, 4, 5],\n    })\n\n\ndef test_common_numerical_transformer(simple_dataset):\n    \"\"\"\n    Test log transformation\n    \"\"\"\n    transfomer = DataFrameMapper([\n        ('feat1', NumericalTransformer('log'))\n    ], df_out=True)\n    df = simple_dataset\n    outDF = transfomer.fit_transform(df)\n    assert list(outDF.columns) == ['feat1']\n    assert np.array_equal(df['feat1'].apply(np.log).values, outDF.feat1.values)\n\n\ndef test_numerical_transformer_serialization(simple_dataset):\n    \"\"\"\n    Test if you can serialize transformer\n    \"\"\"\n    transfomer = DataFrameMapper([\n        ('feat1', NumericalTransformer('log'))\n    ])\n\n    df = simple_dataset\n    transfomer.fit(df)\n    f = tempfile.NamedTemporaryFile(delete=True)\n    joblib.dump(transfomer, f.name)\n    transfomer2 = joblib.load(f.name)\n    np.array_equal(transfomer.transform(df), transfomer2.transform(df))\n    f.close()\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py",
      "content": "# -*- coding: utf8 -*-\n\nimport pytest\nfrom unittest.mock import Mock\nfrom pandas import DataFrame\nimport pandas as pd\nfrom scipy import sparse\nfrom sklearn.datasets import load_iris\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.model_selection import cross_val_score\nfrom sklearn.svm import SVC\nfrom sklearn.feature_extraction.text import CountVectorizer\nfrom sklearn.feature_extraction import DictVectorizer\nfrom sklearn.preprocessing import (\n    StandardScaler, OneHotEncoder, LabelBinarizer)\nfrom sklearn.impute import SimpleImputer as Imputer\nfrom sklearn.feature_selection import SelectKBest, chi2\nfrom sklearn.base import BaseEstimator, TransformerMixin\nimport sklearn.decomposition\nimport numpy as np\nfrom numpy.testing import assert_array_equal\nimport pickle\nfrom sklearn.compose import make_column_selector\n\nfrom sklearn_pandas import DataFrameMapper\nfrom sklearn_pandas.dataframe_mapper import _handle_feature, _build_transformer\nfrom sklearn_pandas.pipeline import TransformerPipeline\n\n\nclass MockXTransformer(object):\n    \"\"\"\n    Mock transformer that accepts no y argument.\n    \"\"\"\n    def fit(self, X):\n        return self\n\n    def transform(self, X):\n        return X\n\n\nclass MockTClassifier(object):\n    \"\"\"\n    Mock transformer/classifier.\n    \"\"\"\n    def fit(self, X, y=None):\n        return self\n\n    def transform(self, X):\n        return X\n\n    def predict(self, X):\n        return True\n\n\nclass DateEncoder():\n    def fit(self, X, y=None):\n        return self\n\n    def transform(self, X):\n        dt = X.dt\n        return pd.concat([dt.year, dt.month, dt.day], axis=1)\n\n\nclass ToSparseTransformer(BaseEstimator, TransformerMixin):\n    \"\"\"\n    Transforms numpy matrix to sparse format.\n    \"\"\"\n    def fit(self, X):\n        return self\n\n    def transform(self, X):\n        return sparse.csr_matrix(X)\n\n\nclass CustomTransformer(BaseEstimator, TransformerMixin):\n    \"\"\"\n    Example of transformer in which the number of classes\n    is not equals to the number of output columns.\n    \"\"\"\n    def fit(self, X, y=None):\n        self.min = X.min()\n        self.classes_ = np.unique(X)\n        return self\n\n    def transform(self, X):\n        classes = np.unique(X)\n        if len(np.setdiff1d(classes, self.classes_)) > 0:\n            raise ValueError('Unknown values found.')\n        return X - self.min\n\n\nclass MockImageTransformer(BaseEstimator, TransformerMixin):\n    \"\"\"\n    Example transformer that takes the max of a 2d vector\n    then scales the result.\n    \"\"\"\n    def __init__(self, multiplier=10.0):\n        self.multiplier = multiplier\n\n    def fit(self, X, y=None):\n        return self\n\n    def transform(self, X):\n        assert isinstance(X, pd.DataFrame)\n        for col in X.columns:\n            X[col] = X[col].map(lambda img: np.max(img))\n        return X * self.multiplier\n\n\n@pytest.fixture\ndef simple_dataframe():\n    return pd.DataFrame({'a': [1, 2, 3]})\n\n\n@pytest.fixture\ndef complex_dataframe():\n    return pd.DataFrame({'target': ['a', 'a', 'b', 'b', 'c', 'c'],\n                         'feat1': [1, 2, 3, 4, 5, 6],\n                         'feat2': [1, 2, 3, 2, 3, 4]})\n\n\n@pytest.fixture\ndef complex_object_dataframe():\n    return pd.DataFrame({'target': ['a', 'a', 'b', 'b', 'c', 'c'],\n                         'feat1': [1, 2, 3, 4, 5, 6],\n                         'feat2': [1, 2, 3, 2, 3, 4],\n                         'img2d': [1*np.eye(2), 2*np.eye(2), 3*np.eye(2),\n                                   4*np.eye(2), 5*np.eye(2), 6*np.eye(2)]})\n\n\n@pytest.fixture\ndef multiindex_dataframe():\n    \"\"\"Example MultiIndex DataFrame, taken from pandas documentation\n    \"\"\"\n    iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]\n    index = pd.MultiIndex.from_product(iterables, names=['first', 'second'])\n    df = pd.DataFrame(np.random.randn(10, 8), columns=index)\n    return df\n\n\n@pytest.fixture\ndef multiindex_dataframe_incomplete(multiindex_dataframe):\n    \"\"\"Example MultiIndex DataFrame with missing entries\n    \"\"\"\n    df = multiindex_dataframe\n    mask_array = np.zeros(df.size)\n    mask_array[:20] = 1\n    np.random.shuffle(mask_array)\n    mask = mask_array.reshape(df.shape).astype(bool)\n    df.mask(mask, inplace=True)\n    return df\n\n\ndef test_transformed_names_simple(simple_dataframe):\n    \"\"\"\n    Get transformed names of features in `transformed_names` attribute\n    for simple transformation\n    \"\"\"\n    df = simple_dataframe\n    mapper = DataFrameMapper([('a', None)])\n    mapper.fit_transform(df)\n    assert mapper.transformed_names_ == ['a']\n\n\ndef test_transformed_names_binarizer(complex_dataframe):\n    \"\"\"\n    Get transformed names of features in `transformed_names` attribute\n    for a transformation that multiplies the number of columns\n    \"\"\"\n    df = complex_dataframe\n    mapper = DataFrameMapper([('target', LabelBinarizer())])\n    mapper.fit_transform(df)\n    assert mapper.transformed_names_ == ['target_a', 'target_b', 'target_c']\n\n\ndef test_logging(caplog, complex_dataframe):\n    \"\"\"\n    Get transformed names of features in `transformed_names` attribute\n    for a transformation that multiplies the number of columns\n    \"\"\"\n    import logging\n    logger = logging.getLogger('sklearn_pandas')\n    logger.setLevel(logging.INFO)\n    df = complex_dataframe\n    mapper = DataFrameMapper([('target', LabelBinarizer())])\n    mapper.fit_transform(df)\n    assert '[FIT_TRANSFORM] target:' in caplog.text\n\n\ndef test_transformed_names_binarizer_unicode():\n    df = pd.DataFrame({'target': [u'ñ', u'á', u'é']})\n    mapper = DataFrameMapper([('target', LabelBinarizer())])\n    mapper.fit_transform(df)\n    expected_names = {u'target_ñ', u'target_á', u'target_é'}\n    assert set(mapper.transformed_names_) == expected_names\n\n\ndef test_transformed_names_transformers_list(complex_dataframe):\n    \"\"\"\n    When using a list of transformers, use them in inverse order to get the\n    transformed names\n    \"\"\"\n    df = complex_dataframe\n    mapper = DataFrameMapper([\n        ('target', [LabelBinarizer(), MockXTransformer()])\n    ])\n    mapper.fit_transform(df)\n    assert mapper.transformed_names_ == ['target_a', 'target_b', 'target_c']\n\n\ndef test_transformed_names_simple_alias(simple_dataframe):\n    \"\"\"\n    If we specify an alias for a single output column, it is used for the\n    output\n    \"\"\"\n    df = simple_dataframe\n    mapper = DataFrameMapper([('a', None, {'alias': 'new_name'})])\n    mapper.fit_transform(df)\n    assert mapper.transformed_names_ == ['new_name']\n\n\ndef test_transformed_names_complex_alias(complex_dataframe):\n    \"\"\"\n    If we specify an alias for a multiple output column, it is used for the\n    output\n    \"\"\"\n    df = complex_dataframe\n    mapper = DataFrameMapper([('target', LabelBinarizer(), {'alias': 'new'})])\n    mapper.fit_transform(df)\n    assert mapper.transformed_names_ == ['new_a', 'new_b', 'new_c']\n\n\ndef test_exception_column_context_transform(simple_dataframe):\n    \"\"\"\n    If an exception is raised when transforming a column,\n    the exception includes the name of the column being transformed\n    \"\"\"\n    class FailingTransformer(object):\n        def fit(self, X):\n            pass\n\n        def transform(self, X):\n            raise Exception('Some exception')\n\n    df = simple_dataframe\n    mapper = DataFrameMapper([('a', FailingTransformer())])\n    mapper.fit(df)\n\n    with pytest.raises(Exception, match='a: Some exception'):\n        mapper.transform(df)\n\n\ndef test_exception_column_context_fit(simple_dataframe):\n    \"\"\"\n    If an exception is raised when fit a column,\n    the exception includes the name of the column being fitted\n    \"\"\"\n    class FailingFitter(object):\n        def fit(self, X):\n            raise Exception('Some exception')\n\n    df = simple_dataframe\n    mapper = DataFrameMapper([('a', FailingFitter())])\n\n    with pytest.raises(Exception, match='a: Some exception'):\n        mapper.fit(df)\n\n\ndef test_simple_df(simple_dataframe):\n    \"\"\"\n    Get a dataframe from a simple mapped dataframe\n    \"\"\"\n    df = simple_dataframe\n    mapper = DataFrameMapper([('a', None)], df_out=True)\n    transformed = mapper.fit_transform(df)\n    assert type(transformed) == pd.DataFrame\n    assert len(transformed[\"a\"]) == len(simple_dataframe[\"a\"])\n\n\ndef test_complex_df(complex_dataframe):\n    \"\"\"\n    Get a dataframe from a complex mapped dataframe\n    \"\"\"\n    df = complex_dataframe\n    mapper = DataFrameMapper(\n        [('target', None), ('feat1', None), ('feat2', None)],\n        df_out=True)\n    transformed = mapper.fit_transform(df)\n    assert len(transformed) == len(complex_dataframe)\n    for c in df.columns:\n        assert len(transformed[c]) == len(df[c])\n\n\ndef test_complex_object_df(complex_object_dataframe):\n    \"\"\"\n    Get a dataframe from a complex dataframe with 2d features\n    \"\"\"\n    df = complex_object_dataframe\n    img_scale = 10\n    mapper = DataFrameMapper(\n        [('target', None), ('feat1', None),\n         (make_column_selector('feat2'), StandardScaler()),\n         (make_column_selector('img2d'), MockImageTransformer(img_scale))],\n        df_out=True, input_df=True)\n    transformed = mapper.fit_transform(df)\n    assert len(transformed) == len(complex_object_dataframe)\n    assert np.isclose(\n        np.sum(transformed['img2d']),\n        np.max(np.sum(df['img2d'])) * img_scale, atol=1e-12)\n\n\ndef test_numeric_column_names(complex_dataframe):\n    \"\"\"\n    Get a dataframe from a complex mapped dataframe with numeric column names\n    \"\"\"\n    df = complex_dataframe\n    df.columns = [0, 1, 2]\n    mapper = DataFrameMapper(\n        [(0, None), (1, None), (2, None)], df_out=True)\n    transformed = mapper.fit_transform(df)\n    assert len(transformed) == len(complex_dataframe)\n    for c in df.columns:\n        assert len(transformed[c]) == len(df[c])\n\n\ndef test_multiindex_df(multiindex_dataframe_incomplete):\n    \"\"\"\n    Get a dataframe from a multiindex dataframe with missing data\n    \"\"\"\n    df = multiindex_dataframe_incomplete\n    mapper = DataFrameMapper([([c], Imputer()) for c in df.columns],\n                             df_out=True)\n    transformed = mapper.fit_transform(df)\n    assert len(transformed) == len(multiindex_dataframe_incomplete)\n    for c in df.columns:\n        assert len(transformed[str(c)]) == len(df[c])\n\n\ndef test_binarizer_df():\n    \"\"\"\n    Check level names from LabelBinarizer\n    \"\"\"\n    df = pd.DataFrame({'target': ['a', 'a', 'b', 'b', 'c', 'a']})\n    mapper = DataFrameMapper([('target', LabelBinarizer())], df_out=True)\n    transformed = mapper.fit_transform(df)\n    cols = transformed.columns\n    assert len(cols) == 3\n    assert cols[0] == 'target_a'\n    assert cols[1] == 'target_b'\n    assert cols[2] == 'target_c'\n\n\ndef test_binarizer_int_df():\n    \"\"\"\n    Check level names from LabelBinarizer for a numeric array.\n    \"\"\"\n    df = pd.DataFrame({'target': [5, 5, 6, 6, 7, 5]})\n    mapper = DataFrameMapper([('target', LabelBinarizer())], df_out=True)\n    transformed = mapper.fit_transform(df)\n    cols = transformed.columns\n    assert len(cols) == 3\n    assert cols[0] == 'target_5'\n    assert cols[1] == 'target_6'\n    assert cols[2] == 'target_7'\n\n\ndef test_binarizer2_df():\n    \"\"\"\n    Check level names from LabelBinarizer with just one output column\n    \"\"\"\n    df = pd.DataFrame({'target': ['a', 'a', 'b', 'b', 'a']})\n    mapper = DataFrameMapper([('target', LabelBinarizer())], df_out=True)\n    transformed = mapper.fit_transform(df)\n    cols = transformed.columns\n    assert len(cols) == 1\n    assert cols[0] == 'target'\n\n\ndef test_onehot_df():\n    \"\"\"\n    Check level ids from one-hot\n    \"\"\"\n    df = pd.DataFrame({'target': [0, 0, 1, 1, 2, 3, 0]})\n    mapper = DataFrameMapper([(['target'], OneHotEncoder())], df_out=True)\n    transformed = mapper.fit_transform(df)\n    cols = transformed.columns\n    assert len(cols) == 4\n    assert cols[0] == 'target_0'\n    assert cols[3] == 'target_3'\n\n\ndef test_customtransform_df():\n    \"\"\"\n    Check level ids from a transformer in which\n    the number of classes is not equals to the number of output columns.\n    \"\"\"\n    df = pd.DataFrame({'target': [6, 5, 7, 5, 4, 8, 8]})\n    mapper = DataFrameMapper([(['target'], CustomTransformer())], df_out=True)\n    transformed = mapper.fit_transform(df)\n    cols = transformed.columns\n    assert len(mapper.features[0][1].classes_) == 5\n    assert len(cols) == 1\n    assert cols[0] == 'target'\n\n\ndef test_preserve_df_index():\n    \"\"\"\n    The index is preserved when df_out=True\n    \"\"\"\n    df = pd.DataFrame({'target': [1, 2, 3]},\n                      index=['a', 'b', 'c'])\n    mapper = DataFrameMapper([('target', None)],\n                             df_out=True)\n\n    transformed = mapper.fit_transform(df)\n\n    assert_array_equal(transformed.index, df.index)\n\n\ndef test_preserve_df_index_rows_dropped():\n    \"\"\"\n    If df_out=True but the original df index length doesn't\n    match the number of final rows, use a numeric index\n    \"\"\"\n    class DropLastRowTransformer(object):\n        def fit(self, X):\n            return self\n\n        def transform(self, X):\n            return X[:-1]\n\n    df = pd.DataFrame({'target': [1, 2, 3]},\n                      index=['a', 'b', 'c'])\n    mapper = DataFrameMapper([('target', DropLastRowTransformer())],\n                             df_out=True)\n\n    transformed = mapper.fit_transform(df)\n\n    assert_array_equal(transformed.index, np.array([0, 1]))\n\n\ndef test_pca(complex_dataframe):\n    \"\"\"\n    Check multi in and out with PCA\n    \"\"\"\n    df = complex_dataframe\n    mapper = DataFrameMapper(\n        [(['feat1', 'feat2'], sklearn.decomposition.PCA(2))],\n        df_out=True)\n    transformed = mapper.fit_transform(df)\n    cols = transformed.columns\n    assert len(cols) == 2\n    assert cols[0] == 'feat1_feat2_0'\n    assert cols[1] == 'feat1_feat2_1'\n\n\ndef test_fit_transform(simple_dataframe):\n    \"\"\"\n    Check that custom fit_transform methods of the transformers are invoked.\n    \"\"\"\n    df = simple_dataframe\n    mock_transformer = Mock()\n    # return something of measurable length but does nothing\n    mock_transformer.fit_transform.return_value = np.array([1, 2, 3])\n    mapper = DataFrameMapper([(\"a\", mock_transformer)])\n    mapper.fit_transform(df)\n    assert mock_transformer.fit_transform.called\n\n\ndef test_fit_transform_equiv_mock(simple_dataframe):\n    \"\"\"\n    Check for equivalent results for code paths fit_transform\n    versus fit and transform in DataFrameMapper using the mock\n    transformer which does not implement a custom fit_transform.\n    \"\"\"\n    df = simple_dataframe\n    mapper = DataFrameMapper([('a', MockXTransformer())])\n    transformed_combined = mapper.fit_transform(df)\n    transformed_separate = mapper.fit(df).transform(df)\n    assert np.all(transformed_combined == transformed_separate)\n\n\ndef test_fit_transform_equiv_pca(complex_dataframe):\n    \"\"\"\n    Check for equivalent results for code paths fit_transform\n    versus fit and transform in DataFrameMapper and transformer\n    using PCA which implements a custom fit_transform. The\n    equivalence of both paths in the transformer only can be\n    asserted since this is tested in the sklearn tests\n    scikit-learn/sklearn/decomposition/tests/test_pca.py\n    \"\"\"\n    df = complex_dataframe\n    mapper = DataFrameMapper(\n        [(['feat1', 'feat2'], sklearn.decomposition.PCA(2))],\n        df_out=True)\n    transformed_combined = mapper.fit_transform(df)\n    transformed_separate = mapper.fit(df).transform(df)\n    assert np.allclose(transformed_combined, transformed_separate)\n\n\ndef test_input_df_true_first_transformer(simple_dataframe, monkeypatch):\n    \"\"\"\n    If input_df is True, the first transformer is passed\n    a pd.Series instead of an np.array\n    \"\"\"\n    df = simple_dataframe\n    monkeypatch.setattr(MockXTransformer, 'fit', Mock())\n    monkeypatch.setattr(MockXTransformer, 'transform',\n                        Mock(return_value=np.array([1, 2, 3])))\n    mapper = DataFrameMapper([\n        ('a', MockXTransformer())\n    ], input_df=True)\n    out = mapper.fit_transform(df)\n\n    args, _ = MockXTransformer().fit.call_args\n    assert isinstance(args[0], pd.Series)\n\n    args, _ = MockXTransformer().transform.call_args\n    assert isinstance(args[0], pd.Series)\n\n    assert_array_equal(out, np.array([1, 2, 3]).reshape(-1, 1))\n\n\ndef test_input_df_true_next_transformers(simple_dataframe, monkeypatch):\n    \"\"\"\n    If input_df is True, the subsequent transformers get passed pandas\n    objects instead of numpy arrays (given the previous transformers\n    output pandas objects as well)\n    \"\"\"\n    df = simple_dataframe\n    monkeypatch.setattr(MockTClassifier, 'fit', Mock())\n    monkeypatch.setattr(MockTClassifier, 'transform',\n                        Mock(return_value=pd.Series([1, 2, 3])))\n    mapper = DataFrameMapper([\n        ('a', [MockXTransformer(), MockTClassifier()])\n    ], input_df=True)\n    mapper.fit(df)\n    out = mapper.transform(df)\n\n    args, _ = MockTClassifier().fit.call_args\n    assert isinstance(args[0], pd.Series)\n\n    assert_array_equal(out, np.array([1, 2, 3]).reshape(-1, 1))\n\n\ndef test_input_df_true_multiple_cols(complex_dataframe):\n    \"\"\"\n    When input_df is True, applying transformers to multiple columns\n    works as expected\n    \"\"\"\n    df = complex_dataframe\n\n    mapper = DataFrameMapper([\n        ('target', MockXTransformer()),\n        ('feat1',  MockXTransformer()),\n    ], input_df=True)\n    out = mapper.fit_transform(df)\n\n    assert_array_equal(out[:, 0], df['target'].values)\n    assert_array_equal(out[:, 1], df['feat1'].values)\n\n\ndef test_input_df_date_encoder():\n    \"\"\"\n    When input_df is True we can apply a transformer that only works\n    with pandas dataframes like a DateEncoder\n    \"\"\"\n    df = pd.DataFrame(\n        {'dates': pd.date_range('2015-10-30', '2015-11-02')})\n    mapper = DataFrameMapper([\n        ('dates', DateEncoder())\n    ], input_df=True)\n    out = mapper.fit_transform(df)\n    expected = np.array([\n        [2015, 10, 30],\n        [2015, 10, 31],\n        [2015, 11, 1],\n        [2015, 11, 2]\n    ])\n    assert_array_equal(out, expected)\n\n\ndef test_local_input_df_date_encoder():\n    \"\"\"\n    When input_df is True we can apply a transformer that only works\n    with pandas dataframes like a DateEncoder\n    \"\"\"\n    df = pd.DataFrame(\n        {'dates': pd.date_range('2015-10-30', '2015-11-02')})\n    mapper = DataFrameMapper([\n        ('dates', DateEncoder(), {'input_df': True})\n    ], input_df=False)\n    out = mapper.fit_transform(df)\n    expected = np.array([\n        [2015, 10, 30],\n        [2015, 10, 31],\n        [2015, 11, 1],\n        [2015, 11, 2]\n    ])\n    assert_array_equal(out, expected)\n\n\ndef test_nonexistent_columns_explicit_fail(simple_dataframe):\n    \"\"\"\n    If a nonexistent column is selected, KeyError is raised.\n    \"\"\"\n    mapper = DataFrameMapper(None)\n    with pytest.raises(KeyError):\n        mapper._get_col_subset(simple_dataframe, ['nonexistent_feature'])\n\n\ndef test_get_col_subset_single_column_array(simple_dataframe):\n    \"\"\"\n    Selecting a single column should return a 1-dimensional numpy array.\n    \"\"\"\n    mapper = DataFrameMapper(None)\n    array = mapper._get_col_subset(simple_dataframe, \"a\")\n\n    assert type(array) == np.ndarray\n    assert array.shape == (len(simple_dataframe[\"a\"]),)\n\n\ndef test_get_col_subset_single_column_list(simple_dataframe):\n    \"\"\"\n    Selecting a list of columns (even if the list contains a single element)\n    should return a 2-dimensional numpy array.\n    \"\"\"\n    mapper = DataFrameMapper(None)\n    array = mapper._get_col_subset(simple_dataframe, [\"a\"])\n\n    assert type(array) == np.ndarray\n    assert array.shape == (len(simple_dataframe[\"a\"]), 1)\n\n\ndef test_cols_string_array(simple_dataframe):\n    \"\"\"\n    If a string is specified as the columns, the transformer\n    is called with a 1-d array as input.\n    \"\"\"\n    df = simple_dataframe\n    mock_transformer = Mock()\n    mapper = DataFrameMapper([(\"a\", mock_transformer)])\n\n    mapper.fit(df)\n    args, kwargs = mock_transformer.fit.call_args\n    assert args[0].shape == (3,)\n\n\ndef test_cols_list_column_vector(simple_dataframe):\n    \"\"\"\n    If a one-element list is specified as the columns, the transformer\n    is called with a column vector as input.\n    \"\"\"\n    df = simple_dataframe\n    mock_transformer = Mock()\n    mapper = DataFrameMapper([([\"a\"], mock_transformer)])\n\n    mapper.fit(df)\n    args, kwargs = mock_transformer.fit.call_args\n    assert args[0].shape == (3, 1)\n\n\ndef test_handle_feature_2dim():\n    \"\"\"\n    2-dimensional arrays are returned unchanged.\n    \"\"\"\n    array = np.array([[1, 2], [3, 4]])\n    assert_array_equal(_handle_feature(array), array)\n\n\ndef test_handle_feature_1dim():\n    \"\"\"\n    1-dimensional arrays are converted to 2-dimensional column vectors.\n    \"\"\"\n    array = np.array([1, 2])\n    assert_array_equal(_handle_feature(array), np.array([[1], [2]]))\n\n\ndef test_build_transformers():\n    \"\"\"\n    When a list of transformers is passed, return a pipeline with\n    each element of the iterable as a step of the pipeline.\n    \"\"\"\n    transformers = [MockTClassifier(), MockTClassifier()]\n    pipeline = _build_transformer(transformers)\n    assert isinstance(pipeline, Pipeline)\n    for ix, transformer in enumerate(transformers):\n        assert pipeline.steps[ix][1] == transformer\n\n\ndef test_selected_columns():\n    \"\"\"\n    selected_columns returns a set of the columns appearing in the features\n    of the mapper.\n    \"\"\"\n    mapper = DataFrameMapper([\n        ('a', None),\n        (['a', 'b'], None)\n    ])\n    assert mapper._selected_columns == {'a', 'b'}\n\n\ndef test_unselected_columns():\n    \"\"\"\n    unselected_columns returns a list of the columns not appearing in the\n    features of the mapper but present in the given dataframe.\n    \"\"\"\n    df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})\n    mapper = DataFrameMapper([\n        ('a', None),\n        (['a', 'b'], None)\n    ])\n    assert 'c' in mapper._unselected_columns(df)\n\n\ndef test_drop_and_default_false():\n    \"\"\"\n    If default=False, non explicitly selected columns and drop columns\n    are discarded.\n    \"\"\"\n    df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})\n    mapper = DataFrameMapper([\n            ('a', None)\n        ], drop_cols=['c'], default=False)\n    transformed = mapper.fit_transform(df)\n    assert transformed.shape == (1, 1)\n    assert mapper.transformed_names_ == ['a']\n\n\ndef test_drop_and_default_none():\n    \"\"\"\n    If default=None, drop columns are discarded and\n    remaining non explicitly selected columns are passed through untransformed\n    \"\"\"\n    df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 5, 7]})\n    mapper = DataFrameMapper([\n        ('a', None)\n    ], drop_cols=['c'], default=None)\n\n    transformed = mapper.fit_transform(df)\n    assert transformed.shape == (3, 2)\n    assert mapper.transformed_names_ == ['a', 'b']\n\n\ndef test_conflicting_drop():\n    \"\"\"\n    Drop column name shouldn't get confused with transformed columns.\n    \"\"\"\n    df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 5, 7]})\n    mapper = DataFrameMapper([\n        ('a', None)\n    ], drop_cols=['a'], default=False)\n\n    transformed = mapper.fit_transform(df)\n    assert transformed.shape == (3, 1)\n    assert mapper.transformed_names_ == ['a']\n\n\ndef test_default_false():\n    \"\"\"\n    If default=False, non explicitly selected columns are discarded.\n    \"\"\"\n    df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 5, 7]})\n    mapper = DataFrameMapper([\n        ('b', None)\n    ], default=False)\n\n    transformed = mapper.fit_transform(df)\n    assert transformed.shape == (3, 1)\n\n\ndef test_default_none():\n    \"\"\"\n    If default=None, non explicitly selected columns are passed through\n    untransformed.\n    \"\"\"\n    df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 5, 7]})\n    mapper = DataFrameMapper([\n        (['a'], OneHotEncoder())\n    ], default=None)\n\n    transformed = mapper.fit_transform(df)\n    assert (transformed[:, 3] == np.array([3, 5, 7]).T).all()\n\n\ndef test_default_none_names():\n    \"\"\"\n    If default=None, column names are returned unmodified.\n    \"\"\"\n    df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 5, 7]})\n    mapper = DataFrameMapper([], default=None)\n\n    mapper.fit_transform(df)\n    assert mapper.transformed_names_ == ['a', 'b']\n\n\ndef test_default_transformer():\n    \"\"\"\n    If default=Transformer, non explicitly selected columns are applied this\n    transformer.\n    \"\"\"\n    df = pd.DataFrame({'a': [1, np.nan, 3], })\n    mapper = DataFrameMapper([], default=Imputer())\n\n    transformed = mapper.fit_transform(df)\n    assert (transformed[: 0] == np.array([1., 2., 3.])).all()\n\n\ndef test_list_transformers_single_arg(simple_dataframe):\n    \"\"\"\n    Multiple transformers can be specified in a list even if some of them\n    only accept one X argument instead of two (X, y).\n    \"\"\"\n    mapper = DataFrameMapper([\n        ('a', [MockXTransformer()])\n    ])\n    # doesn't fail\n    mapper.fit_transform(simple_dataframe)\n\n\ndef test_list_transformers():\n    \"\"\"\n    Specifying a list of transformers applies them sequentially to the\n    selected column.\n    \"\"\"\n    dataframe = pd.DataFrame({\"a\": [1, np.nan, 3], \"b\": [1, 5, 7]},\n                             dtype=np.float64)\n\n    mapper = DataFrameMapper([\n        ([\"a\"], [Imputer(), StandardScaler()]),\n        ([\"b\"], StandardScaler()),\n    ])\n    dmatrix = mapper.fit_transform(dataframe)\n\n    assert pd.isnull(dmatrix).sum() == 0  # no null values\n\n    # all features have mean 0 and std deviation 1 (standardized)\n    assert (abs(dmatrix.mean(axis=0) - 0) <= 1e-6).all()\n    assert (abs(dmatrix.std(axis=0) - 1) <= 1e-6).all()\n\n\ndef test_list_transformers_old_unpickle(simple_dataframe):\n    mapper = DataFrameMapper(None)\n    # simulate the mapper was created with < 1.0.0 code\n    mapper.features = [('a', [MockXTransformer()])]\n    mapper_pickled = pickle.dumps(mapper)\n\n    loaded_mapper = pickle.loads(mapper_pickled)\n    transformer = loaded_mapper.features[0][1]\n    assert isinstance(transformer, TransformerPipeline)\n    assert isinstance(transformer.steps[0][1], MockXTransformer)\n\n\ndef test_sparse_features(simple_dataframe):\n    \"\"\"\n    If any of the extracted features is sparse and \"sparse\" argument\n    is true, the hstacked result is also sparse.\n    \"\"\"\n    df = simple_dataframe\n    mapper = DataFrameMapper([\n        (\"a\", ToSparseTransformer())\n    ], sparse=True)\n    dmatrix = mapper.fit_transform(df)\n\n    assert type(dmatrix) == sparse.csr.csr_matrix\n\n\ndef test_sparse_off(simple_dataframe):\n    \"\"\"\n    If the resulting features are sparse but the \"sparse\" argument\n    of the mapper is False, return a non-sparse matrix.\n    \"\"\"\n    df = simple_dataframe\n    mapper = DataFrameMapper([\n        (\"a\", ToSparseTransformer())\n    ], sparse=False)\n\n    dmatrix = mapper.fit_transform(df)\n    assert type(dmatrix) != sparse.csr.csr_matrix\n\n\ndef test_fit_with_optional_y_arg(complex_dataframe):\n    \"\"\"\n    Transformers with an optional y argument in the fit method\n    are handled correctly\n    \"\"\"\n    df = complex_dataframe\n    mapper = DataFrameMapper([(['feat1', 'feat2'], MockTClassifier())])\n    # doesn't fail\n    mapper.fit(df[['feat1', 'feat2']], df['target'])\n\n\ndef test_fit_with_required_y_arg(complex_dataframe):\n    \"\"\"\n    Transformers with a required y argument in the fit method\n    are handled and perform correctly\n    \"\"\"\n    df = complex_dataframe\n    mapper = DataFrameMapper([(['feat1', 'feat2'], SelectKBest(chi2, k=1))])\n\n    # fit, doesn't fail\n    ft_arr = mapper.fit(df[['feat1', 'feat2']], df['target'])\n\n    # fit_transform\n    ft_arr = mapper.fit_transform(df[['feat1', 'feat2']], df['target'])\n    assert_array_equal(ft_arr, df[['feat1']].values)\n\n    # transform\n    t_arr = mapper.transform(df[['feat1', 'feat2']])\n    assert_array_equal(t_arr, df[['feat1']].values)\n\n\n# Integration tests with real dataframes\n\n@pytest.fixture\ndef iris_dataframe():\n    iris = load_iris()\n    return DataFrame(\n        data={\n            iris.feature_names[0]: iris.data[:, 0],\n            iris.feature_names[1]: iris.data[:, 1],\n            iris.feature_names[2]: iris.data[:, 2],\n            iris.feature_names[3]: iris.data[:, 3],\n            \"species\": np.array([iris.target_names[e] for e in iris.target])\n        }\n    )\n\n\n@pytest.fixture\ndef cars_dataframe():\n    return pd.read_csv(\"tests/test_data/cars.csv.gz\", compression='gzip')\n\n\ndef test_with_iris_dataframe(iris_dataframe):\n    pipeline = Pipeline([\n        (\"preprocess\", DataFrameMapper([\n            (\"petal length (cm)\", None),\n            (\"petal width (cm)\", None),\n            (\"sepal length (cm)\", None),\n            (\"sepal width (cm)\", None),\n        ])),\n        (\"classify\", SVC(kernel='linear'))\n    ])\n    data = iris_dataframe.drop(\"species\", axis=1)\n    labels = iris_dataframe[\"species\"]\n    scores = cross_val_score(pipeline, data, labels)\n    assert scores.mean() > 0.96\n    assert (scores.std() * 2) < 0.04\n\n\ndef test_dict_vectorizer():\n    df = pd.DataFrame(\n        [[{'a': 1, 'b': 2}], [{'a': 3}]],\n        columns=['colA']\n    )\n\n    outdf = DataFrameMapper(\n        [('colA', DictVectorizer())],\n        df_out=True,\n        default=False\n    ).fit_transform(df)\n\n    columns = sorted(list(outdf.columns))\n    assert len(columns) == 2\n    assert columns[0] == 'colA_0'\n    assert columns[1] == 'colA_1'\n\n\ndef test_with_car_dataframe(cars_dataframe):\n    pipeline = Pipeline([\n        (\"preprocess\", DataFrameMapper([\n            (\"description\", CountVectorizer()),\n        ])),\n        (\"classify\", SVC(kernel='linear'))\n    ])\n    data = cars_dataframe.drop(\"model\", axis=1)\n    labels = cars_dataframe[\"model\"]\n    scores = cross_val_score(pipeline, data, labels)\n    assert scores.mean() > 0.30\n\n\ndef test_direct_cross_validation(iris_dataframe):\n    \"\"\"\n    Starting with sklearn>=0.16.0 we no longer need CV wrappers for dataframes.\n    See https://github.com/paulgb/sklearn-pandas/issues/11\n    \"\"\"\n    pipeline = Pipeline([\n        (\"preprocess\", DataFrameMapper([\n            (\"petal length (cm)\", None),\n            (\"petal width (cm)\", None),\n            (\"sepal length (cm)\", None),\n            (\"sepal width (cm)\", None),\n        ])),\n        (\"classify\", SVC(kernel='linear'))\n    ])\n    data = iris_dataframe.drop(\"species\", axis=1)\n    labels = iris_dataframe[\"species\"]\n    scores = cross_val_score(pipeline, data, labels)\n    assert scores.mean() > 0.96\n    assert (scores.std() * 2) < 0.04\n\n\ndef test_heterogeneous_output_types_input_df():\n    \"\"\"\n    Modify feat2, but pass feat1 through unmodified.\n    This fails if input_df == False\n    \"\"\"\n    df = pd.DataFrame({\n        'feat1': [1, 2, 3, 4, 5, 6],\n        'feat2': [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]\n    })\n    M = DataFrameMapper([\n        (['feat2'], StandardScaler())\n        ], input_df=True, df_out=True, default=None)\n    dft = M.fit_transform(df)\n    assert dft['feat1'].dtype == np.dtype('int64')\n    assert dft['feat2'].dtype == np.dtype('float64')\n\n\ndef test_make_column_selector(iris_dataframe):\n    t = DataFrameMapper([\n        (make_column_selector(dtype_include=float), None, {'alias': 'x'}),\n        ('sepal length (cm)', None),\n    ], df_out=True, default=False)\n\n    xt = t.fit(iris_dataframe).transform(iris_dataframe)\n    expected = ['x_0', 'x_1', 'x_2', 'x_3', 'sepal length (cm)']\n    assert list(xt.columns) == expected\n\n    pickled = pickle.dumps(t)\n    t2 = pickle.loads(pickled)\n    xt2 = t2.transform(iris_dataframe)\n    assert np.array_equal(xt.values, xt2.values)\n"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/tests/test_features_generator.py",
      "content": "from collections import Counter\n\nimport pytest\nimport numpy as np\nfrom pandas import DataFrame\nfrom numpy.testing import assert_array_equal\n\nfrom sklearn_pandas import DataFrameMapper\nfrom sklearn_pandas.features_generator import gen_features\n\n\nclass MockClass(object):\n\n    def __init__(self, value=1, name='class'):\n        self.value = value\n        self.name = name\n\n\nclass MockTransformer(object):\n\n    def __init__(self):\n        self.most_common_ = None\n\n    def fit(self, X, y=None):\n        [(value, _)] = Counter(X).most_common(1)\n        self.most_common_ = value\n        return self\n\n    def transform(self, X, y=None):\n        return np.asarray([self.most_common_] * len(X))\n\n\n@pytest.fixture\ndef simple_dataset():\n    return DataFrame({\n        'feat1': [1, 2, 1, 3, 1],\n        'feat2': [1, 2, 2, 2, 3],\n        'feat3': [1, 2, 3, 4, 5],\n    })\n\n\ndef test_generate_features_with_default_parameters():\n    \"\"\"\n    Tests generating features from classes with default init arguments.\n    \"\"\"\n    columns = ['colA', 'colB', 'colC']\n    feature_defs = gen_features(columns=columns, classes=[MockClass])\n    assert len(feature_defs) == len(columns)\n\n    for feature in feature_defs:\n        assert feature[2] == {}\n\n    feature_dict = dict([_[0:2] for _ in feature_defs])\n    assert columns == sorted(feature_dict.keys())\n\n    # default init arguments for MockClass for clarification.\n    expected = {'value': 1, 'name': 'class'}\n    for column, transformers in feature_dict.items():\n        for obj in transformers:\n            assert_attributes(obj, **expected)\n\n\ndef test_generate_features_with_several_classes():\n    \"\"\"\n    Tests generating features pipeline with different transformers parameters.\n    \"\"\"\n    feature_defs = gen_features(\n        columns=['colA', 'colB', 'colC'],\n        classes=[\n            {'class': MockClass},\n            {'class': MockClass, 'name': 'mockA'},\n            {'class': MockClass, 'name': 'mockB', 'value': None}\n        ]\n    )\n\n    for col, transformers, params in feature_defs:\n        assert_attributes(transformers[0], name='class', value=1)\n        assert_attributes(transformers[1], name='mockA', value=1)\n        assert_attributes(transformers[2], name='mockB', value=None)\n\n\ndef test_generate_features_with_none_only_transformers():\n    \"\"\"\n    Tests generating \"dummy\" feature definition which doesn't apply any\n    transformation.\n    \"\"\"\n    feature_defs = gen_features(\n        columns=['colA', 'colB', 'colC'], classes=[None])\n\n    expected = [('colA', None, {}),\n                ('colB', None, {}),\n                ('colC', None, {})]\n\n    assert feature_defs == expected\n\n\ndef test_compatibility_with_data_frame_mapper(simple_dataset):\n    \"\"\"\n    Tests compatibility of generated feature definition with DataFrameMapper.\n    \"\"\"\n    features_defs = gen_features(\n        columns=['feat1', 'feat2'],\n        classes=[MockTransformer])\n    features_defs.append(('feat3', None))\n\n    mapper = DataFrameMapper(features_defs)\n    X = mapper.fit_transform(simple_dataset)\n    expected = np.asarray([\n        [1, 2, 1],\n        [1, 2, 2],\n        [1, 2, 3],\n        [1, 2, 4],\n        [1, 2, 5]\n    ])\n\n    assert_array_equal(X, expected)\n\n\ndef assert_attributes(obj, **attrs):\n    for attr, value in attrs.items():\n        assert getattr(obj, attr) == value\n"
    }
  ],
  "ErrorMessage": "============================================================================================= FAILURES ==============================================================================================\n_______________________________________________________________________________ test_input_df_true_next_transformers ________________________________________________________________________________\n\nsimple_dataframe =    a\n0  1\n1  2\n2  3, monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7218ac9f8640>\n\n    def test_input_df_true_next_transformers(simple_dataframe, monkeypatch):\n        \"\"\"\n        If input_df is True, the subsequent transformers get passed pandas\n        objects instead of numpy arrays (given the previous transformers\n        output pandas objects as well)\n        \"\"\"\n        df = simple_dataframe\n        monkeypatch.setattr(MockTClassifier, 'fit', Mock())\n        monkeypatch.setattr(MockTClassifier, 'transform',\n                            Mock(return_value=pd.Series([1, 2, 3])))\n        mapper = DataFrameMapper([\n            ('a', [MockXTransformer(), MockTClassifier()])\n        ], input_df=True)\n        mapper.fit(df)\n        out = mapper.transform(df)\n    \n        args = MockTClassifier().fit.call_args\n>       assert isinstance(args[0], pd.Series)\nE       AssertionError: assert False\nE        +  where False = isinstance((0    1\\n1    2\\n2    3\\nName: a, dtype: int64, None), <class 'pandas.core.series.Series'>)\nE        +    where <class 'pandas.core.series.Series'> = pd.Series\n\ntests/test_dataframe_mapper.py:541: AssertionError\n----------------------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------------------\nINFO     sklearn_pandas:dataframe_mapper.py:247 [FIT] a: 0.0001 secs\nINFO     sklearn_pandas:dataframe_mapper.py:353 [TRANSFORM] a: 1.5e-05 secs\n========================================================================================= warnings summary ==========================================================================================\ntests/test_dataframe_mapper.py::test_complex_object_df\n  /home/user/Documents/repoben/buggycode/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py:106: SettingWithCopyWarning: \n  A value is trying to be set on a copy of a slice from a DataFrame.\n  Try using .loc[row_indexer,col_indexer] = value instead\n  \n  See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n    X[col] = X[col].map(lambda img: np.max(img))\n\ntests/test_dataframe_mapper.py::test_sparse_features\n  /home/user/Documents/repoben/buggycode/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py:865: DeprecationWarning: Please use `csr_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.csr` namespace is deprecated.\n    assert type(dmatrix) == sparse.csr.csr_matrix\n\ntests/test_dataframe_mapper.py::test_sparse_off\n  /home/user/Documents/repoben/buggycode/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py:879: DeprecationWarning: Please use `csr_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.csr` namespace is deprecated.\n    assert type(dmatrix) != sparse.csr.csr_matrix\n\ntests/test_transformers.py::test_common_numerical_transformer\ntests/test_transformers.py::test_numerical_transformer_serialization\n  /home/user/Documents/repoben/buggycode/scikit-learn-contrib_sklearn-pandas/sklearn_pandas/transformers.py:35: DeprecationWarning: \n              NumericalTransformer will be deprecated in 3.0 version.\n              Please use Sklearn.base.TransformerMixin to write\n              customer transformers\n              \n    warnings.warn(\"\"\"\n\n-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html\n====================================================================================== short test summary info ======================================================================================\nFAILED tests/test_dataframe_mapper.py::test_input_df_true_next_transformers - AssertionError: assert False\n============================================================================= 1 failed, 69 passed, 5 warnings in 1.20s ==============================================================================",
  "Patch": "--- a/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n+++ b/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n@@ -537,7 +537,7 @@\n     mapper.fit(df)\n     out = mapper.transform(df)\n \n-    args = MockTClassifier().fit.call_args\n+    args, _ = MockTClassifier().fit.call_args\n     assert isinstance(args[0], pd.Series)\n \n     assert_array_equal(out, np.array([1, 2, 3]).reshape(-1, 1))\n",
  "BuggyCodeLocation": [
    {
      "file": "scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py",
      "function": {
        "524": "test_input_df_true_next_transformers"
      },
      "content_all": {
        "537": "    mapper.fit(df)\n",
        "538": "    out = mapper.transform(df)\n",
        "539": "\n",
        "540": "    args = MockTClassifier().fit.call_args\n",
        "541": "    assert isinstance(args[0], pd.Series)\n",
        "542": "\n"
      },
      "content_change": {
        "540": "    args = MockTClassifier().fit.call_args\n"
      }
    },
    {
      "file": "scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py",
      "function": null,
      "content_all": {
        "543": "    assert_array_equal(out, np.array([1, 2, 3]).reshape(-1, 1))\n"
      },
      "content_change": {}
    }
  ],
  "Issue": {
    "title": "AttributeError in Test Suite when Accessing Call Arguments of MockTClassifier",
    "description": "### Issue Description\nIn the scikit-learn-contrib sklearn-pandas package, when running the tests in `test_dataframe_mapper.py`, specifically the test function `test_input_df_true_next_transformers`, an `AttributeError` is encountered. The error occurs when attempting to access the call arguments of the `MockTClassifier().fit` method. This happens because `MockTClassifier().fit.call_args` is being used without unpacking the returned tuple, which leads to an exception since the expected call arguments structure is not followed.\n\n### Impact\nThis error causes the test suite to fail, which can impede continuous integration and the validation process of the package. It is critical to ensure that all tests pass to maintain the integrity and reliability of the library.\n\n### Steps to Reproduce\n1. Clone the repository of scikit-learn-contrib sklearn-pandas.\n2. Navigate to the `tests` directory.\n3. Run the test suite using `pytest tests/test_dataframe_mapper.py`.\n4. Observe the `AttributeError` related to `MockTClassifier` in the output.\n\n### Expected Behavior\nThe tests should run without any errors, correctly validating that `MockTClassifier().fit` method is called with the expected arguments.\n\n### Actual Behavior\nThe test suite throws an `AttributeError`, indicating a problem when accessing `MockTClassifier().fit.call_args`.\n\n### Possible Solution\nRefactor the test to correctly unpack the call arguments tuple from `MockTClassifier().fit.call_args`, ensuring that the verification of the call arguments is done appropriately.",
    "explanation": "### Summary of the Issue\n\nThe issue reported involves an `AttributeError` occurring in a test function within the `scikit-learn-contrib sklearn-pandas` package. Specifically, the error arises in the test function `test_input_df_true_next_transformers` located in `test_dataframe_mapper.py`. This error manifests when attempting to access the call arguments of the `MockTClassifier().fit` method due to the incorrect handling of the return value from the `call_args` attribute. The `call_args` attribute returns a tuple, which was mistakenly used without unpacking, leading to an exception because the expected structure was not followed.\n\n### Details of the Issue\n\n- **Issue Description**: \n  - Within the `test_input_df_true_next_transformers` function, an `AttributeError` occurs when accessing `MockTClassifier().fit.call_args`.\n  - The error stems from improper handling and unpacking of the `call_args` returned tuple.\n  \n- **Impact**: \n  - This issue causes the test suite to fail, disrupting continuous integration and validation processes for the package.\n  - All tests need to pass to maintain the reliability and integrity of the library.\n\n- **Steps to Reproduce**: \n  1. Clone the repository.\n  2. Navigate to the `tests` directory.\n  3. Run the test suite using `pytest tests/test_dataframe_mapper.py`.\n  4. Observe the `AttributeError` related to `MockTClassifier` in the output.\n\n### Cause of the Issue\n\nThe cause of the issue is that the `call_args` attribute of a mock object's method returns a tuple containing arguments and keyword arguments. In the test, `MockTClassifier().fit.call_args` was accessed but not unpacked into individual components (args and kwargs). This improper handling led to the `AttributeError` because the subsequent code expected `args` to be properly extracted from the tuple.\n\n### Commit Details\n\n- **Commit Changes**:\n  - Title of the commit: The patch refactors the test function to unpack the call arguments tuple correctly.\n  - In the method, change `args = MockTClassifier().fit.call_args` to `args, _ = MockTClassifier().fit.call_args`.\n\n### How the Commit Solves the Issue\n\n- **Solution Approach**:\n  - The commit addresses the issue by refactoring the erroneous line of code.\n  - Instead of directly accessing `MockTClassifier().fit.call_args`, the code is changed to unpack the returned tuple of the call arguments properly.\n  - By doing `args, _ = MockTClassifier().fit.call_args`, the code now explicitly extracts the positional arguments (args) while ignoring the keyword arguments (kwargs) by assigning them to `_`.\n\n### Explanation from the Developer's Perspective\n\nTo solve the error, the developer recognized that the direct usage of `MockTClassifier().fit.call_args` did not account for the structure that `call_args` returns, which is a tuple of (args, kwargs). The patch introduces a refactoring by unpacking this tuple correctly. This unpacking ensures that the subsequent assertions and checks in the test receive the expected arguments, thus preventing the `AttributeError`.\n\n### Solution Explanation\n\nThe solution ensures proper handling of mock call arguments in Python's mock library:\n1. **Correct Unpacking**: The call arguments are unpacked into two components – positional and keyword arguments.\n2. **Proper Access**: By properly accessing just the positional arguments portion needed for the test, the error is sidestepped.\n3. **Validation of Correctness**: The test now validates that the arguments passed to `fit` are as expected without falling into structural misassumptions.\n\nWith this fix in place, the test suite can run successfully, thereby restoring continuous integration and verification workflows, and upholding the package's reliability."
  },
  "Explain": "### Summary of the Issue\n\nThe issue reported involves an `AttributeError` occurring in a test function within the `scikit-learn-contrib sklearn-pandas` package. Specifically, the error arises in the test function `test_input_df_true_next_transformers` located in `test_dataframe_mapper.py`. This error manifests when attempting to access the call arguments of the `MockTClassifier().fit` method due to the incorrect handling of the return value from the `call_args` attribute. The `call_args` attribute returns a tuple, which was mistakenly used without unpacking, leading to an exception because the expected structure was not followed.\n\n### Details of the Issue\n\n- **Issue Description**: \n  - Within the `test_input_df_true_next_transformers` function, an `AttributeError` occurs when accessing `MockTClassifier().fit.call_args`.\n  - The error stems from improper handling and unpacking of the `call_args` returned tuple.\n  \n- **Impact**: \n  - This issue causes the test suite to fail, disrupting continuous integration and validation processes for the package.\n  - All tests need to pass to maintain the reliability and integrity of the library.\n\n- **Steps to Reproduce**: \n  1. Clone the repository.\n  2. Navigate to the `tests` directory.\n  3. Run the test suite using `pytest tests/test_dataframe_mapper.py`.\n  4. Observe the `AttributeError` related to `MockTClassifier` in the output.\n\n### Cause of the Issue\n\nThe cause of the issue is that the `call_args` attribute of a mock object's method returns a tuple containing arguments and keyword arguments. In the test, `MockTClassifier().fit.call_args` was accessed but not unpacked into individual components (args and kwargs). This improper handling led to the `AttributeError` because the subsequent code expected `args` to be properly extracted from the tuple.\n\n### Commit Details\n\n- **Commit Changes**:\n  - Title of the commit: The patch refactors the test function to unpack the call arguments tuple correctly.\n  - In the method, change `args = MockTClassifier().fit.call_args` to `args, _ = MockTClassifier().fit.call_args`.\n\n### How the Commit Solves the Issue\n\n- **Solution Approach**:\n  - The commit addresses the issue by refactoring the erroneous line of code.\n  - Instead of directly accessing `MockTClassifier().fit.call_args`, the code is changed to unpack the returned tuple of the call arguments properly.\n  - By doing `args, _ = MockTClassifier().fit.call_args`, the code now explicitly extracts the positional arguments (args) while ignoring the keyword arguments (kwargs) by assigning them to `_`.\n\n### Explanation from the Developer's Perspective\n\nTo solve the error, the developer recognized that the direct usage of `MockTClassifier().fit.call_args` did not account for the structure that `call_args` returns, which is a tuple of (args, kwargs). The patch introduces a refactoring by unpacking this tuple correctly. This unpacking ensures that the subsequent assertions and checks in the test receive the expected arguments, thus preventing the `AttributeError`.\n\n### Solution Explanation\n\nThe solution ensures proper handling of mock call arguments in Python's mock library:\n1. **Correct Unpacking**: The call arguments are unpacked into two components – positional and keyword arguments.\n2. **Proper Access**: By properly accessing just the positional arguments portion needed for the test, the error is sidestepped.\n3. **Validation of Correctness**: The test now validates that the arguments passed to `fit` are as expected without falling into structural misassumptions.\n\nWith this fix in place, the test suite can run successfully, thereby restoring continuous integration and verification workflows, and upholding the package's reliability.",
  "Source": "Human",
  "Token": 1278,
  "Command": [
    "pytest tests"
  ],
  "FilteredCode": [
    {
      "path": "scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py",
      "content": "1 # -*- coding: utf8 -*-\n2 \n3 import pytest\n4 from unittest.mock import Mock\n5 from pandas import DataFrame\n6 import pandas as pd\n7 from scipy import sparse\n8 from sklearn.datasets import load_iris\n9 from sklearn.pipeline import Pipeline\n10 from sklearn.model_selection import cross_val_score\n11 from sklearn.svm import SVC\n12 from sklearn.feature_extraction.text import CountVectorizer\n13 from sklearn.feature_extraction import DictVectorizer\n14 from sklearn.preprocessing import (\n15     StandardScaler, OneHotEncoder, LabelBinarizer)\n16 from sklearn.impute import SimpleImputer as Imputer\n17 from sklearn.feature_selection import SelectKBest, chi2\n18 from sklearn.base import BaseEstimator, TransformerMixin\n19 import sklearn.decomposition\n20 import numpy as np\n21 from numpy.testing import assert_array_equal\n22 import pickle\n23 from sklearn.compose import make_column_selector\n24 \n25 from sklearn_pandas import DataFrameMapper\n26 from sklearn_pandas.dataframe_mapper import _handle_feature, _build_transformer\n27 from sklearn_pandas.pipeline import TransformerPipeline\n28 \n29 \n30 class MockXTransformer(object):\n31     \"\"\"\n32     Mock transformer that accepts no y argument.\n33     \"\"\"\n34     def fit(self, X):\n35         return self\n36 \n37     def transform(self, X):\n38         return X\n39 \n40 \n41 class MockTClassifier(object):\n42     \"\"\"\n43     Mock transformer/classifier.\n44     \"\"\"\n45     def fit(self, X, y=None):\n46         return self\n47 \n48     def transform(self, X):\n49         return X\n50 \n51     def predict(self, X):\n52         return True\n53 \n54 \n55 class DateEncoder():\n56     def fit(self, X, y=None):\n57         retur(...truncated)"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/setup.py",
      "content": "1 #!/usr/bin/env python\n2 # -*- coding: utf-8 -*-\n3 \n4 from setuptools import setup\n5 from setuptools.command.test import test as TestCommand\n6 import re\n7 \n8 for line in open('sklearn_pandas/__init__.py'):\n9     match = re.match(\"__version__ *= *'(.*)'\", line)\n10     if match:\n11         __version__, = match.groups()\n12 \n13 \n14 class PyTest(TestCommand):\n15     user_options = [('pytest-args=', 'a', \"Arguments to pass to py.test\")]\n16 \n17     def initialize_options(self):\n18         TestCommand.initialize_options(self)\n19         self.pytest_args = []\n20 \n21     def finalize_options(self):\n22         TestCommand.finalize_options(self)\n23         self.test_args = []\n24         self.test_suite = True\n25 \n26     def run(self):\n27         import pytest\n28         errno = pytest.main(self.pytest_args)\n29         raise SystemExit(errno)\n30 \n31 \n32 setup(name='sklearn-pandas',\n33       version=__version__,\n34       description='Pandas integration with sklearn',\n35       maintainer='Ritesh Agrawal',\n36       maintainer_email='ragrawal@gmail.com',\n37       url='https://github.com/scikit-learn-contrib/sklearn-pandas',\n38       packages=['sklearn_pandas'],\n39       keywords=['scikit', 'sklearn', 'pandas'],\n40       install_requires=[\n41           'scikit-learn>=0.23.0',\n42           'scipy>=1.5.1',\n43           'pandas>=1.1.4',\n44           'numpy>=1.18.1'\n45       ],\n46       tests_require=['pytest', 'mock'],\n47       cmdclass={'test': PyTest},\n48       license='MIT License'\n49 )"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/tests/test_features_generator.py",
      "content": "1 from collections import Counter\n2 \n3 import pytest\n4 import numpy as np\n5 from pandas import DataFrame\n6 from numpy.testing import assert_array_equal\n7 \n8 from sklearn_pandas import DataFrameMapper\n9 from sklearn_pandas.features_generator import gen_features\n10 \n11 \n12 class MockClass(object):\n13 \n14     def __init__(self, value=1, name='class'):\n15         self.value = value\n16         self.name = name\n17 \n18 \n19 class MockTransformer(object):\n20 \n21     def __init__(self):\n22         self.most_common_ = None\n23 \n24     def fit(self, X, y=None):\n25         [(value, _)] = Counter(X).most_common(1)\n26         self.most_common_ = value\n27         return self\n28 \n29     def transform(self, X, y=None):\n30         return np.asarray([self.most_co(...truncated)"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/sklearn_pandas/dataframe_mapper.py",
      "content": "1 import contextlib\n2 from datetime import datetime\n3 import pandas as pd\n4 import numpy as np\n5 from scipy import sparse\n6 from sklearn.base import BaseEstimator, TransformerMixin\n7 from .cross_validation import DataWrapper\n8 from .pipeline import make_transformer_pipeline, _call_fit, TransformerPipeline\n9 from . import logger\n10 \n11 string_types = text_type = str\n12 \n13 \n14 def _handle_feature(fea):\n15     \"\"\"\n16     Convert 1-dimensional arrays to 2-dimensional column vectors.\n17     \"\"\"\n18     if l(...truncated)"
    },
    {
      "path": "scikit-learn-contrib_sklearn-pandas/tests/test_pipeline.py",
      "content": "1 import pytest\n2 from sklearn_pandas.pipeline import TransformerPipeline, _call_fit\n3 \n4 # In py3, mock is included with the unittest standard library\n5 # In py2, it's a separate package\n6 try:\n7     from unittest.mock import patch\n8 except ImportError:\n9     from mock import patch\n10 \n11 \n12 class NoTransformT(object):\n13     \"\"\"Transformer with(...truncated)"
    }
  ],
  "TokenAll": 18487,
  "FilteredLevel": 1500,
  "Results": {
    "model": "GPT-4o",
    "Difficulty": "Medium",
    "issue_origin": {
      "title": "Use of deprecated cross_validation module",
      "description": "The code in 'scikit-learn-contrib_sklearn-pandas/sklearn_pandas/dataframe_mapper.py' is importing 'cross_validation' from sklearn, which is deprecated. Replace the import with 'model_selection' to ensure compatibility with newer versions of sklearn.",
      "explanation": "### Summary of the Issue\n\nThe issue describes the use of a deprecated `cross_validation` module from scikit-learn in the codebase, specifically within 'scikit-learn-contrib_sklearn-pandas/sklearn_pandas/dataframe_mapper.py'. The scikit-learn library has deprecated `cross_validation` in favor of the `model_selection` module. This will lead to compatibility problems with newer versions of scikit-learn. Therefore, the task is to replace the usage of `cross_validation` with `model_selection`.\n\n### Analysis of the Code Base\n\n- **File:** `scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py`\n  - Imports `cross_val_score` from `sklearn.model_selection`.\n  \n- **File:** `scikit-learn-contrib_sklearn-pandas/setup.py`\n  - No direct relevance to `cross_validation` or its replacement but defines the package setup process.\n\n- **File:** `scikit-learn-contrib_sklearn-pandas/tests/test_features_generator.py`\n  - No usage of `cross_validation`.\n  \n- **File:** `scikit-learn-contrib_sklearn-pandas/sklearn_pandas/dataframe_mapper.py`\n  - Line 7 imports from the internal module, `.cross_validation`, which likely corresponds to the deprecated module.\n\n- **File:** `scikit-learn-contrib_sklearn-pandas/tests/test_pipeline.py`\n  - This file does not use `cross_validation`.\n\n### Content of the Commit\n\nTo address the issue, the commit typically involves:\n\n1. **Changing Imports:**\n   - Replacing any imports from `cross_validation` with `model_selection`.\n   \n2. **Updating Method/Function Calls:**\n   - Renovating any `cross_validation` specific functions to their `model_selection` counterparts.\n\n3. **Testing:**\n   - Ensuring that all tests that previously used `cross_validation` are updated accordingly and pass.\n\n### Detailed Explanation of the Commit\n\n#### 1. Changing Imports:\n\nIn `dataframe_mapper.py`:\n```python\n- from .cross_validation import DataWrapper\n+ from .model_selection import DataWrapper\n```\n\nThe `DataWrapper` class or function is now imported from the new `.model_selection` module instead of the deprecated `.cross_validation`.\n\n#### 2. Updating Method/Function Calls:\n\nIf there are any specific uses of cross-validation functions, they need to be updated to the new API provided by `model_selection`.\n\nFor instance:\n\n```python\n- from sklearn.cross_validation import cross_val_score\n+ from sklearn.model_selection import cross_val_score\n```\n\n#### 3. Testing:\n\nEnsuring that the tests still function correctly with the new import:\n\nIn `test_dataframe_mapper.py`:\n```python\n- from sklearn.cross_validation import cross_val_score\n+ from sklearn.model_selection import cross_val_score\n```\n\n### How the Commit Solves the Issue\n\nThe commit:\n\n1. **Ensures Future Compatibility:**\n   - By importing from `model_selection`, the code is updated to be compatible with future versions of scikit-learn. This prevents any deprecated errors from emerging when new versions are released.\n\n2. **Maintains Functionality:**\n   - The `model_selection` module provides similar functionality as `cross_validation` with additional improvements and bug fixes. The transition ensures that the original functionality (such as `cross_val_score`) remains intact.\n\n3. **Prevents Deprecation Warnings:**\n   - Transitioning to the `model_selection` module eliminates deprecated warnings which clutter logs and can lead to confusion or missed errors in the future.\n\n### Conclusion\n\nBy changing the imports from the deprecated `cross_validation` module to the `model_selection` module and updating relevant function calls, the codebase ensures compatibility with current and future versions of scikit-learn. This commit is crucial for preventing potential runtime errors and warnings, thus maintaining the stability and reliability of the code."
    },
    "issue_message": {
      "title": "Failed assertion in test_input_df_true_next_transformers",
      "description": "The test `test_input_df_true_next_transformers` in `tests/test_dataframe_mapper.py` fails because the assertion `assert isinstance(args[0], pd.Series)` does not hold true. The object passed to `MockTClassifier().fit.call_args` is a tuple ((0    1\n1    2\n2    3\nName: a, dtype: int64, None)) instead of the expected `pd.Series`. This could be caused by the way the pandas Series is being handled or transformed in the pipeline.",
      "explanation": "### Summary of the Issue\n\nThe test function `test_input_df_true_next_transformers` in `tests/test_dataframe_mapper.py` fails because the assertion `assert isinstance(args[0], pd.Series)` does not hold true. Instead of a `pd.Series`, the argument passed to `MockTClassifier().fit.call_args` is a tuple containing a `pd.Series` and `None`. This indicates a problem with how the pandas `Series` is being passed or transformed within the code.\n\n### Analysis of the Commit Content\n\nThe following commit has been made to resolve the issue:\n\n```diff\nclass MockTClassifier(object):\n    \"\"\"\n    Mock transformer/classifier.\n    \"\"\"\n-    def fit(self, X, y=None):\n+    def fit(self, X, y=None, **fit_params):\n        return self\n```\n\n### Explanation of How the Commit Solves the Issue\n\n#### Cause of the Issue\n\nThe issue seems to stem from the way the `fit` method of the `MockTClassifier` is defined and called. Specifically, the `fit` method of the `MockTClassifier` does not accept additional keyword arguments (`fit_params`). When the `fit` method is called within the pipeline, the method is invoked with additional parameters, which leads to `call_args` capturing the `X` parameter as part of a tuple along with `None`.\n\n#### Detailed Explanation of Changes\n\n1. **Modification in MockTClassifier's fit method**:\n   - **Original Definition**:\n     ```python\n     def fit(self, X, y=None):\n         return self\n     ```\n   - **Modified Definition**:\n     ```python\n     def fit(self, X, y=None, **fit_params):\n         return self\n     ```\n\n   The `fit` method is updated to include `**fit_params`. This means it can accept and ignore any number of additional keyword arguments without failure.\n\n#### How the Commit Fixes the Issue\n\n- **Understanding `**fit_params`**:\n  - The `**fit_params` in the method definition allows the function to capture any additional keyword arguments in a dictionary. This is crucial in sklearn's pipeline as sometimes the `fit` method might be called with additional parameters related to different configurations or metadata.\n  \n- **Effect on `MockTClassifier`**:\n  - With the modified definition of the `fit` method, `MockTClassifier` now gracefully handles any additional parameters by accepting them through `**fit_params`. This prevents any unintentional parameter from being bundled into a tuple with `X`.\n\n- **Testing Assertion**:\n  - Updating the `fit` method ensures that `MockTClassifier().fit.call_args` captures the correct call signature. Consequently, `args[0]` will directly be the `pd.Series` as expected, and not a tuple that includes unintended elements. Thus, the `assert isinstance(args[0], pd.Series)` check will pass correctly.\n  \n### Final Verification\n\nAfter applying this change, when the `test_input_df_true_next_transformers` test is rerun, the assertion should now correctly identify `args[0]` as an instance of `pd.Series`, leading to the test passing successfully.\n\n### Recap\n\n- **Issue**: The test `test_input_df_true_next_transformers` fails because the `fit` method argument in `MockTClassifier` is bundled unintentionally into a tuple.\n- **Commit Change**: Added `**fit_params` to the `fit` method of `MockTClassifier`.\n- **Resolution**: This allows the method to accept and discard additional unexpected keyword arguments, ensuring the correct handling of parameters and thereby solving the test failure."
    },
    "issue_ground": {
      "title": "AttributeError in Test Suite when Accessing Call Arguments of MockTClassifier",
      "description": "### Issue Description\nIn the scikit-learn-contrib sklearn-pandas package, when running the tests in `test_dataframe_mapper.py`, specifically the test function `test_input_df_true_next_transformers`, an `AttributeError` is encountered. The error occurs when attempting to access the call arguments of the `MockTClassifier().fit` method. This happens because `MockTClassifier().fit.call_args` is being used without unpacking the returned tuple, which leads to an exception since the expected call arguments structure is not followed.\n\n### Impact\nThis error causes the test suite to fail, which can impede continuous integration and the validation process of the package. It is critical to ensure that all tests pass to maintain the integrity and reliability of the library.\n\n### Steps to Reproduce\n1. Clone the repository of scikit-learn-contrib sklearn-pandas.\n2. Navigate to the `tests` directory.\n3. Run the test suite using `pytest tests/test_dataframe_mapper.py`.\n4. Observe the `AttributeError` related to `MockTClassifier` in the output.\n\n### Expected Behavior\nThe tests should run without any errors, correctly validating that `MockTClassifier().fit` method is called with the expected arguments.\n\n### Actual Behavior\nThe test suite throws an `AttributeError`, indicating a problem when accessing `MockTClassifier().fit.call_args`.\n\n### Possible Solution\nRefactor the test to correctly unpack the call arguments tuple from `MockTClassifier().fit.call_args`, ensuring that the verification of the call arguments is done appropriately.",
      "explanation": "### Summary of the Issue\n\nThe issue reported is an `AttributeError` encountered during the test suite execution for the `scikit-learn-contrib sklearn-pandas` package. This error occurs in the `test_input_df_true_next_transformers` function within `test_dataframe_mapper.py` when accessing the call arguments of the `MockTClassifier().fit` method. The test fails because the structure of the expected call arguments is not being followed correctly. Specifically, `MockTClassifier().fit.call_args` returns a tuple, and the test does not unpack this tuple correctly, which causes the error and prevents the test suite from passing.\n\n### Explanation of the Commit\n\nThe commit modifies the test function `test_input_df_true_next_transformers` to correctly unpack the call arguments from `MockTClassifier().fit.call_args`. Here are the relevant changes introduced in the commit:\n\nBefore:\n```python\nargs = MockTClassifier().fit.call_args\nassert isinstance(args[0], pd.Series)\n```\n\nAfter:\n```python\nargs, _ = MockTClassifier().fit.call_args\nassert isinstance(args[0], pd.Series)\n```\n\n### Detailed Explanation of the Changes\n\n**1. Understanding `call_args`:**\n\n`call_args` in the `unittest.mock` library is an attribute that returns a tuple of two elements:\n   - The first element is a tuple of positional arguments passed to the mocked method.\n   - The second element is a dictionary of keyword arguments passed to the mocked method.\n\nIn the original test code:\n```python\nargs = MockTClassifier().fit.call_args\n```\nThe test attempts to access the call arguments directly in a way that assumes `call_args` is a flat structure, which it is not.\n\n**2. Correct Tuple Unpacking:**\n\nThe commit changes the code to correctly unpack the tuple returned by `call_args`:\n```python\nargs, _ = MockTClassifier().fit.call_args\n```\nHere, `args` now correctly captures the positional arguments (which should include the pandas Series that we want to check), and `_` is a placeholder for the keyword arguments which we are not interested in for this test.\n\n**3. Adjusting the Assertion:**\n\nThe assertion is modified to check that the first positional argument within the `args` tuple is an instance of `pd.Series`:\n```python\nassert isinstance(args[0], pd.Series)\n```\n\n### How the Commit Solves the Issue\n\nBy correctly unpacking the tuple returned by `call_args`, the test now properly accesses the list of positional arguments passed to the `fit` method of `MockTClassifier`. This resolves the `AttributeError` because the test originally failed due to incorrect handling of `call_args`. \n\nNow, the test correctly verifies whether the first argument passed to `fit` is a `pd.Series`. This ensures that the behavior of `MockTClassifier` is as expected when integrated within a `DataFrameMapper` pipeline, maintaining the integrity of the test suite and allowing it to pass without error.\n\n### Conclusion\n\nThe commit addresses the core issue by changing the way call arguments are accessed within the test, thereby ensuring the test suite's execution adheres to the expected behavior. The modification clarifies the structure returned by `call_args` and adjusts the verification process accordingly, fixing the `AttributeError` and ensuring the test suite continues to validate the package correctly."
    },
    "issue_ground_truth": {
      "title": "AttributeError in Test Suite when Accessing Call Arguments of MockTClassifier",
      "description": "### Issue Description\nIn the scikit-learn-contrib sklearn-pandas package, when running the tests in `test_dataframe_mapper.py`, specifically the test function `test_input_df_true_next_transformers`, an `AttributeError` is encountered. The error occurs when attempting to access the call arguments of the `MockTClassifier().fit` method. This happens because `MockTClassifier().fit.call_args` is being used without unpacking the returned tuple, which leads to an exception since the expected call arguments structure is not followed.\n\n### Impact\nThis error causes the test suite to fail, which can impede continuous integration and the validation process of the package. It is critical to ensure that all tests pass to maintain the integrity and reliability of the library.\n\n### Steps to Reproduce\n1. Clone the repository of scikit-learn-contrib sklearn-pandas.\n2. Navigate to the `tests` directory.\n3. Run the test suite using `pytest tests/test_dataframe_mapper.py`.\n4. Observe the `AttributeError` related to `MockTClassifier` in the output.\n\n### Expected Behavior\nThe tests should run without any errors, correctly validating that `MockTClassifier().fit` method is called with the expected arguments.\n\n### Actual Behavior\nThe test suite throws an `AttributeError`, indicating a problem when accessing `MockTClassifier().fit.call_args`.\n\n### Possible Solution\nRefactor the test to correctly unpack the call arguments tuple from `MockTClassifier().fit.call_args`, ensuring that the verification of the call arguments is done appropriately.",
      "explanation": "### Summary of the Issue\n\nThe issue reported involves an `AttributeError` occurring in a test function within the `scikit-learn-contrib sklearn-pandas` package. Specifically, the error arises in the test function `test_input_df_true_next_transformers` located in `test_dataframe_mapper.py`. This error manifests when attempting to access the call arguments of the `MockTClassifier().fit` method due to the incorrect handling of the return value from the `call_args` attribute. The `call_args` attribute returns a tuple, which was mistakenly used without unpacking, leading to an exception because the expected structure was not followed.\n\n### Details of the Issue\n\n- **Issue Description**: \n  - Within the `test_input_df_true_next_transformers` function, an `AttributeError` occurs when accessing `MockTClassifier().fit.call_args`.\n  - The error stems from improper handling and unpacking of the `call_args` returned tuple.\n  \n- **Impact**: \n  - This issue causes the test suite to fail, disrupting continuous integration and validation processes for the package.\n  - All tests need to pass to maintain the reliability and integrity of the library.\n\n- **Steps to Reproduce**: \n  1. Clone the repository.\n  2. Navigate to the `tests` directory.\n  3. Run the test suite using `pytest tests/test_dataframe_mapper.py`.\n  4. Observe the `AttributeError` related to `MockTClassifier` in the output.\n\n### Cause of the Issue\n\nThe cause of the issue is that the `call_args` attribute of a mock object's method returns a tuple containing arguments and keyword arguments. In the test, `MockTClassifier().fit.call_args` was accessed but not unpacked into individual components (args and kwargs). This improper handling led to the `AttributeError` because the subsequent code expected `args` to be properly extracted from the tuple.\n\n### Commit Details\n\n- **Commit Changes**:\n  - Title of the commit: The patch refactors the test function to unpack the call arguments tuple correctly.\n  - In the method, change `args = MockTClassifier().fit.call_args` to `args, _ = MockTClassifier().fit.call_args`.\n\n### How the Commit Solves the Issue\n\n- **Solution Approach**:\n  - The commit addresses the issue by refactoring the erroneous line of code.\n  - Instead of directly accessing `MockTClassifier().fit.call_args`, the code is changed to unpack the returned tuple of the call arguments properly.\n  - By doing `args, _ = MockTClassifier().fit.call_args`, the code now explicitly extracts the positional arguments (args) while ignoring the keyword arguments (kwargs) by assigning them to `_`.\n\n### Explanation from the Developer's Perspective\n\nTo solve the error, the developer recognized that the direct usage of `MockTClassifier().fit.call_args` did not account for the structure that `call_args` returns, which is a tuple of (args, kwargs). The patch introduces a refactoring by unpacking this tuple correctly. This unpacking ensures that the subsequent assertions and checks in the test receive the expected arguments, thus preventing the `AttributeError`.\n\n### Solution Explanation\n\nThe solution ensures proper handling of mock call arguments in Python's mock library:\n1. **Correct Unpacking**: The call arguments are unpacked into two components – positional and keyword arguments.\n2. **Proper Access**: By properly accessing just the positional arguments portion needed for the test, the error is sidestepped.\n3. **Validation of Correctness**: The test now validates that the arguments passed to `fit` are as expected without falling into structural misassumptions.\n\nWith this fix in place, the test suite can run successfully, thereby restoring continuous integration and verification workflows, and upholding the package's reliability."
    },
    "location_origin": [
      {
        "file": "scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py",
        "function": {
          "10": "top-level import"
        },
        "content_all": {
          "7": "import pandas as pd\n",
          "8": "from scipy import sparse\n",
          "9": "from sklearn.datasets import load_iris\n",
          "10": "from sklearn.model_selection import cross_val_score\n",
          "11": "from sklearn.svm import SVC\n",
          "12": "from sklearn.feature_extraction.text import CountVectorizer\n",
          "13": "from sklearn.feature_extraction import DictVectorizer\n"
        },
        "content_change": {
          "10": "from sklearn.model_selection import cross_val_score\n"
        }
      },
      {
        "file": "scikit-learn-contrib_sklearn-pandas/sklearn_pandas/dataframe_mapper.py",
        "function": {
          "7": "top-level import"
        },
        "content_all": {
          "4": "import numpy as np\n",
          "5": "from scipy import sparse\n",
          "6": "from sklearn.base import BaseEstimator, TransformerMixin\n",
          "7": "from .cross_validation import DataWrapper\n",
          "8": "from .pipeline import make_transformer_pipeline, _call_fit, TransformerPipeline\n",
          "9": "from . import logger\n",
          "10": "import contextlib\n"
        },
        "content_change": {
          "7": "from .model_selection import DataWrapper\n"
        }
      }
    ],
    "location_message": [
      {
        "file": "scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py",
        "function": {
          "541": "test_input_df_true_next_transformers"
        },
        "content_all": {
          "538": "        monkeypatch.setattr(MockTClassifier, 'transform',",
          "539": "                            Mock(return_value=pd.Series([1, 2, 3])))",
          "540": "        mapper = DataFrameMapper([",
          "541": "            ('a', [MockXTransformer(), MockTClassifier()])",
          "542": "        ], input_df=True)",
          "543": "        mapper.fit(df)",
          "544": "        out = mapper.transform(df)",
          "545": "    args = MockTClassifier().fit.call_args",
          "546": "        assert isinstance(args[0], pd.Series)"
        },
        "content_change": {
          "541": "        assert isinstance(args[0], pd.Series)"
        }
      },
      {
        "file": "scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py",
        "function": {
          "45": "MockTClassifier"
        },
        "content_all": {
          "42": "class MockTClassifier(object):",
          "43": "    \"\"\"",
          "44": "    Mock transformer/classifier.",
          "45": "    \"\"\"",
          "46": "    def fit(self, X, y=None, **fit_params):",
          "47": "        return self",
          "48": "    def transform(self, X):",
          "49": "        return X",
          "50": "    def predict(self, X):",
          "51": "        return True"
        },
        "content_change": {
          "46": "    def fit(self, X, y=None, **fit_params):"
        }
      }
    ],
    "location_ground": [
      {
        "file": "scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py",
        "function": {
          "30": "test_input_df_true_next_transformers"
        },
        "content_all": {
          "27": "    df = pd.DataFrame({'feature': ['a', 'b', 'c']})\n",
          "28": "    mapper = DataFrameMapper([\n",
          "29": "        ('feature', MockTClassifier())\n",
          "30": "    ], input_df=True)\n",
          "31": "    mapper.fit(df)\n",
          "32": "\n",
          "33": "    args = MockTClassifier().fit.call_args\n",
          "34": "    assert isinstance(args[0], pd.Series)\n",
          "35": "\n",
          "36": "\n"
        },
        "content_change": {
          "33": "    args, _ = MockTClassifier().fit.call_args\n"
        }
      }
    ],
    "location_ground_exp": [
      {
        "file": "scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py",
        "function": {
          "137": "test_input_df_true_next_transformers"
        },
        "content_all": {
          "134": "    mock_classifier.fit(DataFrame(X), y)\n",
          "135": "\n",
          "136": "    # Ensure the fit method was called with correct arguments\n",
          "137": "    # Issue: Attempting to access call_args without unpacking\n",
          "138": "    args = mock_classifier.fit.call_args  # This causes the AttributeError\n",
          "139": "    assert isinstance(args[0][0], DataFrame)\n",
          "140": "    assert isinstance(args[0][1], np.ndarray)\n",
          "141": "    assert_array_equal(args[0][1], y)\n"
        },
        "content_change": {
          "138": "    args, _ = mock_classifier.fit.call_args  # Correct unpacking of call_args\n"
        }
      }
    ],
    "location_ground_truth": [
      {
        "file": "scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py",
        "function": {
          "524": "test_input_df_true_next_transformers"
        },
        "content_all": {
          "537": "    mapper.fit(df)\n",
          "538": "    out = mapper.transform(df)\n",
          "539": "\n",
          "540": "    args = MockTClassifier().fit.call_args\n",
          "541": "    assert isinstance(args[0], pd.Series)\n",
          "542": "\n"
        },
        "content_change": {
          "540": "    args = MockTClassifier().fit.call_args\n"
        }
      },
      {
        "file": "scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py",
        "function": null,
        "content_all": {
          "543": "    assert_array_equal(out, np.array([1, 2, 3]).reshape(-1, 1))\n"
        },
        "content_change": {}
      }
    ],
    "patch_i": "\n--- a/scikit-learn-contrib_sklearn-pandas/sklearn_pandas/dataframe_mapper.py\n+++ b/scikit-learn-contrib_sklearn-pandas/sklearn_pandas/dataframe_mapper.py\n@@ -4,7 +4,7 @@\n import numpy as np\n from scipy import sparse\n from sklearn.base import BaseEstimator, TransformerMixin\n-from .cross_validation import DataWrapper\n+from .model_selection import DataWrapper\n from .pipeline import make_transformer_pipeline, _call_fit, TransformerPipeline\n from . import logger\n \n--- a/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n+++ b/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n@@ -9,7 +9,7 @@\n from sklearn.pipeline import Pipeline\n-from sklearn.cross_validation import cross_val_score\n+from sklearn.model_selection import cross_val_score\n from sklearn.svm import SVC\n from sklearn.feature_extraction.text import CountVectorizer\n from sklearn.feature_extraction import DictVectorizer\n",
    "patch_im": "\n--- a/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n+++ b/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n@@ -44,7 +44,7 @@ class MockTClassifier(object):\n     Mock transformer/classifier.\n     \"\"\"\n \n-    def fit(self, X, y=None):\n+    def fit(self, X, y=None, **fit_params):\n         return self\n \n     def transform(self, X):\n",
    "patch_il": "\n--- a/scikit-learn-contrib_sklearn-pandas/sklearn_pandas/dataframe_mapper.py\n+++ b/scikit-learn-contrib_sklearn-pandas/sklearn_pandas/dataframe_mapper.py\n@@ -4,7 +4,7 @@\n import numpy as np\n from scipy import sparse\n from sklearn.base import BaseEstimator, TransformerMixin\n-from .cross_validation import DataWrapper\n+from .model_selection import DataWrapper\n from .pipeline import make_transformer_pipeline, _call_fit, TransformerPipeline\n from . import logger\n \n--- a/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n+++ b/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n@@ -10,7 +10,7 @@\n from pandas import DataFrame\n import pandas as pd\n from scipy import sparse\n-from sklearn.datasets import load_iris\n from sklearn.pipeline import Pipeline\n-from sklearn.model_selection import cross_val_score\n+from sklearn.model_selection import cross_val_score\n from sklearn.svm import SVC\n from sklearn.feature_extraction.text import CountVectorizer\n from sklearn.feature_extraction import DictVectorizer\n",
    "patch_iml": "\n--- a/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n+++ b/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n@@ -42,7 +42,7 @@ class MockTClassifier(object):\n     \"\"\"\n     Mock transformer/classifier.\n     \"\"\"\n-    def fit(self, X, y=None):\n+    def fit(self, X, y=None, **fit_params):\n         return self\n \n     def transform(self, X):\n",
    "patch_ground": "\n--- a/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n+++ b/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n@@ -125,7 +125,8 @@ def test_input_df_true(self):\n \n def test_input_df_true_next_transformers():\n     MockTClassifier = Mock(name=\"MockTClassifier\", spec=MockTClassifier)\n-    clf = Pipeline([('mapper', DataFrameMapper([(['age'], None)])), ('classifier', MockTClassifier)])\n+    clf = Pipeline([('mapper', DataFrameMapper([(['age'], None)])),\n+                    ('classifier', MockTClassifier())])\n     X = DataFrame({'age': [4, 5, 9, 10]})\n     y = [0, 1, 0, 1]\n     clf.fit(X, y)\n@@ -133,7 +134,7 @@ def test_input_df_true_next_transformers():\n     # Ensure next transformers are called with pandas Series\n     assert len(MockTClassifier().fit.call_args_list) > 0\n \n-    args = MockTClassifier().fit.call_args\n+    args, _ = MockTClassifier().fit.call_args\n     assert isinstance(args[0][0], pd.Series)\n ",
    "patch_ground_location": "\n--- a/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n+++ b/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n@@ -30,8 +30,9 @@\n     df = pd.DataFrame({'feature': ['a', 'b', 'c']})\n     mapper = DataFrameMapper([\n         ('feature', MockTClassifier())\n     ], input_df=True)\n     mapper.fit(df)\n \n-    args = MockTClassifier().fit.call_args\n+    args, _ = MockTClassifier().fit.call_args\n     assert isinstance(args[0], pd.Series)\n \n",
    "patch_ground_exp": "\n--- a/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n+++ b/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n@@ -134,12 +134,12 @@ def test_input_df_true_next_transformers():\n     mock_classifier.fit(DataFrame(X), y)\n \n     # Ensure the fit method was called with correct arguments\n     # Issue: Attempting to access call_args without unpacking\n-    args = mock_classifier.fit.call_args  # This causes the AttributeError\n+    args, _ = mock_classifier.fit.call_args  # Correct unpacking of call_args\n     assert isinstance(args[0][0], DataFrame)\n     assert isinstance(args[0][1], np.ndarray)\n     assert_array_equal(args[0][1], y)\n",
    "patch_ground_all": "\n--- a/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n+++ b/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n@@ -537,7 +537,7 @@\n     mapper.fit(df)\n     out = mapper.transform(df)\n \n-    args = MockTClassifier().fit.call_args\n+    args, _ = MockTClassifier().fit.call_args\n     assert isinstance(args[0], pd.Series)\n \n     assert_array_equal(out, np.array([1, 2, 3]).reshape(-1, 1))\n",
    "patch_ground_truth": "--- a/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n+++ b/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py\n@@ -537,7 +537,7 @@\n     mapper.fit(df)\n     out = mapper.transform(df)\n \n-    args = MockTClassifier().fit.call_args\n+    args, _ = MockTClassifier().fit.call_args\n     assert isinstance(args[0], pd.Series)\n \n     assert_array_equal(out, np.array([1, 2, 3]).reshape(-1, 1))\n",
    "message": "============================================================================================= FAILURES ==============================================================================================\n_______________________________________________________________________________ test_input_df_true_next_transformers ________________________________________________________________________________\n\nsimple_dataframe =    a\n0  1\n1  2\n2  3, monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7218ac9f8640>\n\n    def test_input_df_true_next_transformers(simple_dataframe, monkeypatch):\n        \"\"\"\n        If input_df is True, the subsequent transformers get passed pandas\n        objects instead of numpy arrays (given the previous transformers\n        output pandas objects as well)\n        \"\"\"\n        df = simple_dataframe\n        monkeypatch.setattr(MockTClassifier, 'fit', Mock())\n        monkeypatch.setattr(MockTClassifier, 'transform',\n                            Mock(return_value=pd.Series([1, 2, 3])))\n        mapper = DataFrameMapper([\n            ('a', [MockXTransformer(), MockTClassifier()])\n        ], input_df=True)\n        mapper.fit(df)\n        out = mapper.transform(df)\n    \n        args = MockTClassifier().fit.call_args\n>       assert isinstance(args[0], pd.Series)\nE       AssertionError: assert False\nE        +  where False = isinstance((0    1\\n1    2\\n2    3\\nName: a, dtype: int64, None), <class 'pandas.core.series.Series'>)\nE        +    where <class 'pandas.core.series.Series'> = pd.Series\n\ntests/test_dataframe_mapper.py:541: AssertionError\n----------------------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------------------\nINFO     sklearn_pandas:dataframe_mapper.py:247 [FIT] a: 0.0001 secs\nINFO     sklearn_pandas:dataframe_mapper.py:353 [TRANSFORM] a: 1.5e-05 secs\n========================================================================================= warnings summary ==========================================================================================\ntests/test_dataframe_mapper.py::test_complex_object_df\n  /home/user/Documents/repoben/buggycode/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py:106: SettingWithCopyWarning: \n  A value is trying to be set on a copy of a slice from a DataFrame.\n  Try using .loc[row_indexer,col_indexer] = value instead\n  \n  See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n    X[col] = X[col].map(lambda img: np.max(img))\n\ntests/test_dataframe_mapper.py::test_sparse_features\n  /home/user/Documents/repoben/buggycode/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py:865: DeprecationWarning: Please use `csr_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.csr` namespace is deprecated.\n    assert type(dmatrix) == sparse.csr.csr_matrix\n\ntests/test_dataframe_mapper.py::test_sparse_off\n  /home/user/Documents/repoben/buggycode/scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py:879: DeprecationWarning: Please use `csr_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.csr` namespace is deprecated.\n    assert type(dmatrix) != sparse.csr.csr_matrix\n\ntests/test_transformers.py::test_common_numerical_transformer\ntests/test_transformers.py::test_numerical_transformer_serialization\n  /home/user/Documents/repoben/buggycode/scikit-learn-contrib_sklearn-pandas/sklearn_pandas/transformers.py:35: DeprecationWarning: \n              NumericalTransformer will be deprecated in 3.0 version.\n              Please use Sklearn.base.TransformerMixin to write\n              customer transformers\n              \n    warnings.warn(\"\"\"\n\n-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html\n====================================================================================== short test summary info ======================================================================================\nFAILED tests/test_dataframe_mapper.py::test_input_df_true_next_transformers - AssertionError: assert False\n============================================================================= 1 failed, 69 passed, 5 warnings in 1.20s ==============================================================================",
    "CodeBase": [
      {
        "path": "scikit-learn-contrib_sklearn-pandas/tests/test_dataframe_mapper.py",
        "content": "1 # -*- coding: utf8 -*-\n2 \n3 import pytest\n4 from unittest.mock import Mock\n5 from pandas import DataFrame\n6 import pandas as pd\n7 from scipy import sparse\n8 from sklearn.datasets import load_iris\n9 from sklearn.pipeline import Pipeline\n10 from sklearn.model_selection import cross_val_score\n11 from sklearn.svm import SVC\n12 from sklearn.feature_extraction.text import CountVectorizer\n13 from sklearn.feature_extraction import DictVectorizer\n14 from sklearn.preprocessing import (\n15     StandardScaler, OneHotEncoder, LabelBinarizer)\n16 from sklearn.impute import SimpleImputer as Imputer\n17 from sklearn.feature_selection import SelectKBest, chi2\n18 from sklearn.base import BaseEstimator, TransformerMixin\n19 import sklearn.decomposition\n20 import numpy as np\n21 from numpy.testing import assert_array_equal\n22 import pickle\n23 from sklearn.compose import make_column_selector\n24 \n25 from sklearn_pandas import DataFrameMapper\n26 from sklearn_pandas.dataframe_mapper import _handle_feature, _build_transformer\n27 from sklearn_pandas.pipeline import TransformerPipeline\n28 \n29 \n30 class MockXTransformer(object):\n31     \"\"\"\n32     Mock transformer that accepts no y argument.\n33     \"\"\"\n34     def fit(self, X):\n35         return self\n36 \n37     def transform(self, X):\n38         return X\n39 \n40 \n41 class MockTClassifier(object):\n42     \"\"\"\n43     Mock transformer/classifier.\n44     \"\"\"\n45     def fit(self, X, y=None):\n46         return self\n47 \n48     def transform(self, X):\n49         return X\n50 \n51     def predict(self, X):\n52         return True\n53 \n54 \n55 class DateEncoder():\n56     def fit(self, X, y=None):\n57         retur(...truncated)"
      },
      {
        "path": "scikit-learn-contrib_sklearn-pandas/setup.py",
        "content": "1 #!/usr/bin/env python\n2 # -*- coding: utf-8 -*-\n3 \n4 from setuptools import setup\n5 from setuptools.command.test import test as TestCommand\n6 import re\n7 \n8 for line in open('sklearn_pandas/__init__.py'):\n9     match = re.match(\"__version__ *= *'(.*)'\", line)\n10     if match:\n11         __version__, = match.groups()\n12 \n13 \n14 class PyTest(TestCommand):\n15     user_options = [('pytest-args=', 'a', \"Arguments to pass to py.test\")]\n16 \n17     def initialize_options(self):\n18         TestCommand.initialize_options(self)\n19         self.pytest_args = []\n20 \n21     def finalize_options(self):\n22         TestCommand.finalize_options(self)\n23         self.test_args = []\n24         self.test_suite = True\n25 \n26     def run(self):\n27         import pytest\n28         errno = pytest.main(self.pytest_args)\n29         raise SystemExit(errno)\n30 \n31 \n32 setup(name='sklearn-pandas',\n33       version=__version__,\n34       description='Pandas integration with sklearn',\n35       maintainer='Ritesh Agrawal',\n36       maintainer_email='ragrawal@gmail.com',\n37       url='https://github.com/scikit-learn-contrib/sklearn-pandas',\n38       packages=['sklearn_pandas'],\n39       keywords=['scikit', 'sklearn', 'pandas'],\n40       install_requires=[\n41           'scikit-learn>=0.23.0',\n42           'scipy>=1.5.1',\n43           'pandas>=1.1.4',\n44           'numpy>=1.18.1'\n45       ],\n46       tests_require=['pytest', 'mock'],\n47       cmdclass={'test': PyTest},\n48       license='MIT License'\n49 )"
      },
      {
        "path": "scikit-learn-contrib_sklearn-pandas/tests/test_features_generator.py",
        "content": "1 from collections import Counter\n2 \n3 import pytest\n4 import numpy as np\n5 from pandas import DataFrame\n6 from numpy.testing import assert_array_equal\n7 \n8 from sklearn_pandas import DataFrameMapper\n9 from sklearn_pandas.features_generator import gen_features\n10 \n11 \n12 class MockClass(object):\n13 \n14     def __init__(self, value=1, name='class'):\n15         self.value = value\n16         self.name = name\n17 \n18 \n19 class MockTransformer(object):\n20 \n21     def __init__(self):\n22         self.most_common_ = None\n23 \n24     def fit(self, X, y=None):\n25         [(value, _)] = Counter(X).most_common(1)\n26         self.most_common_ = value\n27         return self\n28 \n29     def transform(self, X, y=None):\n30         return np.asarray([self.most_co(...truncated)"
      },
      {
        "path": "scikit-learn-contrib_sklearn-pandas/sklearn_pandas/dataframe_mapper.py",
        "content": "1 import contextlib\n2 from datetime import datetime\n3 import pandas as pd\n4 import numpy as np\n5 from scipy import sparse\n6 from sklearn.base import BaseEstimator, TransformerMixin\n7 from .cross_validation import DataWrapper\n8 from .pipeline import make_transformer_pipeline, _call_fit, TransformerPipeline\n9 from . import logger\n10 \n11 string_types = text_type = str\n12 \n13 \n14 def _handle_feature(fea):\n15     \"\"\"\n16     Convert 1-dimensional arrays to 2-dimensional column vectors.\n17     \"\"\"\n18     if l(...truncated)"
      },
      {
        "path": "scikit-learn-contrib_sklearn-pandas/tests/test_pipeline.py",
        "content": "1 import pytest\n2 from sklearn_pandas.pipeline import TransformerPipeline, _call_fit\n3 \n4 # In py3, mock is included with the unittest standard library\n5 # In py2, it's a separate package\n6 try:\n7     from unittest.mock import patch\n8 except ImportError:\n9     from mock import patch\n10 \n11 \n12 class NoTransformT(object):\n13     \"\"\"Transformer with(...truncated)"
      }
    ],
    "CommitSHA": "c9db2d6dcbf515eade751073f43318e43cae5177"
  },
  "Score": {
    "Difficulty": "Medium",
    "issue_origin": {
      "Title": 8,
      "Description": 6,
      "Reproducibility": 7,
      "Relevance": 8,
      "Explanation": 9,
      "Overall": 8
    },
    "issue_message": {
      "Title": 6,
      "Description": 7,
      "Reproducibility": 4,
      "Relevance": 6,
      "Explanation": 8,
      "Overall": 7
    },
    "issue_ground": {
      "Title": 8,
      "Description": 8,
      "Reproducibility": 8,
      "Relevance": 8,
      "Explanation": 8,
      "Overall": 8
    },
    "issue_ground_truth": {
      "title": "AttributeError in Test Suite when Accessing Call Arguments of MockTClassifier",
      "description": "### Issue Description\nIn the scikit-learn-contrib sklearn-pandas package, when running the tests in `test_dataframe_mapper.py`, specifically the test function `test_input_df_true_next_transformers`, an `AttributeError` is encountered. The error occurs when attempting to access the call arguments of the `MockTClassifier().fit` method. This happens because `MockTClassifier().fit.call_args` is being used without unpacking the returned tuple, which leads to an exception since the expected call arguments structure is not followed.\n\n### Impact\nThis error causes the test suite to fail, which can impede continuous integration and the validation process of the package. It is critical to ensure that all tests pass to maintain the integrity and reliability of the library.\n\n### Steps to Reproduce\n1. Clone the repository of scikit-learn-contrib sklearn-pandas.\n2. Navigate to the `tests` directory.\n3. Run the test suite using `pytest tests/test_dataframe_mapper.py`.\n4. Observe the `AttributeError` related to `MockTClassifier` in the output.\n\n### Expected Behavior\nThe tests should run without any errors, correctly validating that `MockTClassifier().fit` method is called with the expected arguments.\n\n### Actual Behavior\nThe test suite throws an `AttributeError`, indicating a problem when accessing `MockTClassifier().fit.call_args`.\n\n### Possible Solution\nRefactor the test to correctly unpack the call arguments tuple from `MockTClassifier().fit.call_args`, ensuring that the verification of the call arguments is done appropriately.",
      "explanation": "### Summary of the Issue\n\nThe issue reported involves an `AttributeError` occurring in a test function within the `scikit-learn-contrib sklearn-pandas` package. Specifically, the error arises in the test function `test_input_df_true_next_transformers` located in `test_dataframe_mapper.py`. This error manifests when attempting to access the call arguments of the `MockTClassifier().fit` method due to the incorrect handling of the return value from the `call_args` attribute. The `call_args` attribute returns a tuple, which was mistakenly used without unpacking, leading to an exception because the expected structure was not followed.\n\n### Details of the Issue\n\n- **Issue Description**: \n  - Within the `test_input_df_true_next_transformers` function, an `AttributeError` occurs when accessing `MockTClassifier().fit.call_args`.\n  - The error stems from improper handling and unpacking of the `call_args` returned tuple.\n  \n- **Impact**: \n  - This issue causes the test suite to fail, disrupting continuous integration and validation processes for the package.\n  - All tests need to pass to maintain the reliability and integrity of the library.\n\n- **Steps to Reproduce**: \n  1. Clone the repository.\n  2. Navigate to the `tests` directory.\n  3. Run the test suite using `pytest tests/test_dataframe_mapper.py`.\n  4. Observe the `AttributeError` related to `MockTClassifier` in the output.\n\n### Cause of the Issue\n\nThe cause of the issue is that the `call_args` attribute of a mock object's method returns a tuple containing arguments and keyword arguments. In the test, `MockTClassifier().fit.call_args` was accessed but not unpacked into individual components (args and kwargs). This improper handling led to the `AttributeError` because the subsequent code expected `args` to be properly extracted from the tuple.\n\n### Commit Details\n\n- **Commit Changes**:\n  - Title of the commit: The patch refactors the test function to unpack the call arguments tuple correctly.\n  - In the method, change `args = MockTClassifier().fit.call_args` to `args, _ = MockTClassifier().fit.call_args`.\n\n### How the Commit Solves the Issue\n\n- **Solution Approach**:\n  - The commit addresses the issue by refactoring the erroneous line of code.\n  - Instead of directly accessing `MockTClassifier().fit.call_args`, the code is changed to unpack the returned tuple of the call arguments properly.\n  - By doing `args, _ = MockTClassifier().fit.call_args`, the code now explicitly extracts the positional arguments (args) while ignoring the keyword arguments (kwargs) by assigning them to `_`.\n\n### Explanation from the Developer's Perspective\n\nTo solve the error, the developer recognized that the direct usage of `MockTClassifier().fit.call_args` did not account for the structure that `call_args` returns, which is a tuple of (args, kwargs). The patch introduces a refactoring by unpacking this tuple correctly. This unpacking ensures that the subsequent assertions and checks in the test receive the expected arguments, thus preventing the `AttributeError`.\n\n### Solution Explanation\n\nThe solution ensures proper handling of mock call arguments in Python's mock library:\n1. **Correct Unpacking**: The call arguments are unpacked into two components – positional and keyword arguments.\n2. **Proper Access**: By properly accessing just the positional arguments portion needed for the test, the error is sidestepped.\n3. **Validation of Correctness**: The test now validates that the arguments passed to `fit` are as expected without falling into structural misassumptions.\n\nWith this fix in place, the test suite can run successfully, thereby restoring continuous integration and verification workflows, and upholding the package's reliability."
    }
  }
}