{
  "Selected_candidate": {
    "pr_number": 3953,
    "pr_title": "Fix wrong order of coordinate converted from pd.series with MultiIndex",
    "pr_body": "<!-- Feel free to remove check-list items aren't relevant to your change -->\r\n\r\n - [x] Closes #3951\r\n - [x] Tests added\r\n - [x] Passes `isort -rc . && black . && mypy . && flake8`\r\n - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API\r\n\r\nIt looks like\r\n`dataframe.set_index(index).index == index` is not always true.\r\n\r\nAdded a workaround for this...",
    "issue_id": 3951,
    "issue_title": "series.to_xarray() fails when MultiIndex not sorted in xarray 0.15.1",
    "issue_body": "series.to_xarray() fails when MultiIndex not sorted in xarray 0.15.1\r\n\r\n# Summary\r\nIt seems that `series.to_xarray()` fails (returns incorrect data) in xarray 0.15.1 when the dataframe's MultiIndex dimensions are not sorted\r\n\r\n# Demonstration\r\n\r\nxarray should be able to handle MultiIndices with unsorted dimensions. Using a fresh conda environment with xarray 0.14.1:\r\n\r\n```python\r\n$ conda run -n py37xr14 python test.py\r\n>>> df\r\nalpha  B  A\r\nnum\r\n0      1  4\r\n1      2  5\r\n2      3  6\r\n\r\n>>> df.stack('alpha')\r\nnum  alpha\r\n0    B        1\r\n     A        4\r\n1    B        2\r\n     A        5\r\n2    B        3\r\n     A        6\r\ndtype: int64\r\n\r\n>>> df.stack('alpha').to_xarray()\r\n<xarray.DataArray (num: 3, alpha: 2)>\r\narray([[1, 4],\r\n       [2, 5],\r\n       [3, 6]])\r\nCoordinates:\r\n  * num      (num) int64 0 1 2\r\n  * alpha    (alpha) object 'B' 'A'\r\n```\r\n\r\nThis fails in xarray 0.15.1 - note the data is not merely reordered - the data in column 'B' now has the incorrect values 4, 5, 6 rather than 1, 2, 3:\r\n\r\n```python\r\n$ conda run -n py37xr15 python test.py\r\n>>> df\r\nalpha  B  A\r\nnum\r\n0      1  4\r\n1      2  5\r\n2      3  6\r\n\r\n>>> df.stack('alpha')\r\nnum  alpha\r\n0    B        1\r\n     A        4\r\n1    B        2\r\n     A        5\r\n2    B        3\r\n     A        6\r\ndtype: int64\r\n\r\n>>> df.stack('alpha').to_xarray()\r\n<xarray.DataArray (num: 3, alpha: 2)>\r\narray([[4, 1],\r\n       [5, 2],\r\n       [6, 3]])\r\nCoordinates:\r\n  * num      (num) int64 0 1 2\r\n  * alpha    (alpha) object 'B' 'A'\r\n```\r\n\r\n## Test setup & environment info\r\n\r\n<details>\r\n    <summary>contents of test.py</summary>\r\n\r\n\r\n```python\r\nimport pandas as pd\r\n\r\ndf = pd.DataFrame({'B': [1, 2, 3], 'A': [4, 5, 6]})\r\ndf = df.rename_axis('num').rename_axis('alpha', axis=1)\r\n\r\nprint(\">>> df\")\r\nprint(df)\r\n\r\nprint(\"\\n>>> 
df.stack('alpha')\")\r\nprint(df.stack('alpha'))\r\n\r\nprint(\"\\n>>> df.stack('alpha').to_xarray()\")\r\nprint(df.stack('alpha').to_xarray())\r\n```\r\n\r\n</details>\r\n\r\n<details>\r\n    <summary>packages in py37xr14 environment</summary>\r\n\r\n```bash\r\n$ conda list -n py37xr14\r\n# packages in environment at /Users/delgadom/miniconda3/envs/py37xr14:\r\n#\r\n# Name                    Version                   Build  Channel\r\nca-certificates           2020.4.5.1           hecc5488_0    conda-forge\r\ncertifi                   2020.4.5.1       py37hc8dfbb8_0    conda-forge\r\nlibblas                   3.8.0               16_openblas    conda-forge\r\nlibcblas                  3.8.0               16_openblas    conda-forge\r\nlibcxx                    9.0.1                         2    conda-forge\r\nlibffi                    3.2.1             h4a8c4bd_1007    conda-forge\r\nlibgfortran               4.0.0                         2    conda-forge\r\nliblapack                 3.8.0               16_openblas    conda-forge\r\nlibopenblas               0.3.9                h3d69b6c_0    conda-forge\r\nllvm-openmp               9.0.1                h28b9765_2    conda-forge\r\nncurses                   6.1               h0a44026_1002    conda-forge\r\nnumpy                     1.18.1           py37h7687784_1    conda-forge\r\nopenssl                   1.1.1f               h0b31af3_0    conda-forge\r\npandas                    1.0.3            py37h94625e5_0    conda-forge\r\npip                       20.0.2                     py_2    conda-forge\r\npython                    3.7.6           h90870a6_5_cpython    conda-forge\r\npython-dateutil           2.8.1                      py_0    conda-forge\r\npython_abi                3.7                     1_cp37m    conda-forge\r\npytz                      2019.3                     py_0    conda-forge\r\nreadline                  8.0                  hcfe32e1_0    conda-forge\r\nsetuptools                46.1.3     
      py37hc8dfbb8_0    conda-forge\r\nsix                       1.14.0                     py_1    conda-forge\r\nsqlite                    3.30.1               h93121df_0    conda-forge\r\ntk                        8.6.10               hbbe82c9_0    conda-forge\r\nwheel                     0.34.2                     py_1    conda-forge\r\nxarray                    0.14.1                     py_1    conda-forge\r\nxz                        5.2.5                h0b31af3_0    conda-forge\r\nzlib                      1.2.11            h0b31af3_1006    conda-forge\r\n```\r\n</details>\r\n\r\n<details>\r\n    <summary>packages in py37xr15 environment</summary>\r\n\r\n```bash\r\n$ conda list -n py37xr15\r\n# packages in environment at /Users/delgadom/miniconda3/envs/py37xr15:\r\n#\r\n# Name                    Version                   Build  Channel\r\nca-certificates           2020.4.5.1           hecc5488_0    conda-forge\r\ncertifi                   2020.4.5.1       py37hc8dfbb8_0    conda-forge\r\nlibblas                   3.8.0               16_openblas    conda-forge\r\nlibcblas                  3.8.0               16_openblas    conda-forge\r\nlibcxx                    9.0.1                         2    conda-forge\r\nlibffi                    3.2.1             h4a8c4bd_1007    conda-forge\r\nlibgfortran               4.0.0                         2    conda-forge\r\nliblapack                 3.8.0               16_openblas    conda-forge\r\nlibopenblas               0.3.9                h3d69b6c_0    conda-forge\r\nllvm-openmp               9.0.1                h28b9765_2    conda-forge\r\nncurses                   6.1               h0a44026_1002    conda-forge\r\nnumpy                     1.18.1           py37h7687784_1    conda-forge\r\nopenssl                   1.1.1f               h0b31af3_0    conda-forge\r\npandas                    1.0.3            py37h94625e5_0    conda-forge\r\npip                       20.0.2                     py_2    
conda-forge\r\npython                    3.7.6           h90870a6_5_cpython    conda-forge\r\npython-dateutil           2.8.1                      py_0    conda-forge\r\npython_abi                3.7                     1_cp37m    conda-forge\r\npytz                      2019.3                     py_0    conda-forge\r\nreadline                  8.0                  hcfe32e1_0    conda-forge\r\nsetuptools                46.1.3           py37hc8dfbb8_0    conda-forge\r\nsix                       1.14.0                     py_1    conda-forge\r\nsqlite                    3.30.1               h93121df_0    conda-forge\r\ntk                        8.6.10               hbbe82c9_0    conda-forge\r\nwheel                     0.34.2                     py_1    conda-forge\r\nxarray                    0.15.1                     py_0    conda-forge\r\nxz                        5.2.5                h0b31af3_0    conda-forge\r\nzlib                      1.2.11            h0b31af3_1006    conda-forge\r\n```\r\n</details>\r\n",
    "issue_closed_at": "2020-04-08T02:19:11Z",
    "base_commit": "f07adb293e67ae01d305fd1c8fb42f5bad2238e7",
    "changes": [
      {
        "file": "xarray/core/dataset.py",
        "type": "function",
        "name": "from_dataframe",
        "class_name": "Dataset",
        "code": "def from_dataframe(cls, dataframe: pd.DataFrame, sparse: bool = False) -> \"Dataset\":\n        \"\"\"Convert a pandas.DataFrame into an xarray.Dataset\n\n        Each column will be converted into an independent variable in the\n        Dataset. If the dataframe's index is a MultiIndex, it will be expanded\n        into a tensor product of one-dimensional indices (filling in missing\n        values with NaN). This method will produce a Dataset very similar to\n        that on which the 'to_dataframe' method was called, except with\n        possibly redundant dimensions (since all dataset variables will have\n        the same dimensionality)\n\n        Parameters\n        ----------\n        dataframe : pandas.DataFrame\n            DataFrame from which to copy data and indices.\n        sparse : bool\n            If true, create a sparse arrays instead of dense numpy arrays. This\n            can potentially save a large amount of memory if the DataFrame has\n            a MultiIndex. 
Requires the sparse package (sparse.pydata.org).\n\n        Returns\n        -------\n        New Dataset.\n\n        See also\n        --------\n        xarray.DataArray.from_series\n        \"\"\"\n        # TODO: Add an option to remove dimensions along which the variables\n        # are constant, to enable consistent serialization to/from a dataframe,\n        # even if some variables have different dimensionality.\n\n        if not dataframe.columns.is_unique:\n            raise ValueError(\"cannot convert DataFrame with non-unique columns\")\n\n        idx = remove_unused_levels_categories(dataframe.index)\n        dataframe = dataframe.set_index(idx)\n        obj = cls()\n\n        if isinstance(idx, pd.MultiIndex):\n            dims = tuple(\n                name if name is not None else \"level_%i\" % n\n                for n, name in enumerate(idx.names)\n            )\n            for dim, lev in zip(dims, idx.levels):\n                obj[dim] = (dim, lev)\n        else:\n            index_name = idx.name if idx.name is not None else \"index\"\n            dims = (index_name,)\n            obj[index_name] = (dims, idx)\n\n        if sparse:\n            obj._set_sparse_data_from_dataframe(dataframe, dims)\n        else:\n            obj._set_numpy_data_from_dataframe(dataframe, dims)\n        return obj"
      },
      {
        "file": "xarray/core/indexes.py",
        "type": "line",
        "name": "line 9",
        "code": "from .variable import Variable\n\n\ndef remove_unused_levels_categories(index):\n    \"\"\"\n    Remove unused levels from MultiIndex and unused categories from CategoricalIndex\n    \"\"\""
      },
      {
        "file": "xarray/core/indexes.py",
        "type": "function",
        "name": "remove_unused_levels_categories",
        "class_name": null,
        "code": "def remove_unused_levels_categories(index):\n    \"\"\"\n    Remove unused levels from MultiIndex and unused categories from CategoricalIndex\n    \"\"\"\n    if isinstance(index, pd.MultiIndex):\n        index = index.remove_unused_levels()\n        # if it contains CategoricalIndex, we need to remove unused categories\n        # manually. See https://github.com/pandas-dev/pandas/issues/30846\n        if any(isinstance(lev, pd.CategoricalIndex) for lev in index.levels):\n            levels = []\n            for i, level in enumerate(index.levels):\n                if isinstance(level, pd.CategoricalIndex):\n                    level = level[index.codes[i]].remove_unused_categories()\n                else:\n                    level = level[index.codes[i]]\n                levels.append(level)\n            index = pd.MultiIndex.from_arrays(levels, names=index.names)\n    elif isinstance(index, pd.CategoricalIndex):\n        index = index.remove_unused_categories()\n    return index"
      }
    ]
  },
  "Justification": "Candidate C (PR #3953) is the most helpful because, like the current bug in `to_unstacked_dataset`, it concerns converting data between a stacked (MultiIndex) representation and an unstacked one in xarray. In PR #3953, `Dataset.from_dataframe` returned values in the wrong order when the MultiIndex was not sorted; in the current issue, a `to_stacked_array`/`to_unstacked_dataset` round trip fails with a `MergeError` when the variables have only a single dimension. Both bugs come from how index and coordinate information is carried through the conversion, so the earlier fix shows where such metadata can go wrong. Here the error reports conflicting values for the coordinate `y`, which is consistent with each per-variable selection keeping a scalar `y` coordinate that differs between variables when they are merged back into a Dataset.",
  "instance_id": "pydata__xarray-4094",
  "repo": "pydata/xarray",
  "created_at": "2020-05-26T00:36:02Z",
  "problem_statement": "to_unstacked_dataset broken for single-dim variables\n<!-- A short summary of the issue, if appropriate -->\r\n\r\n\r\n#### MCVE Code Sample\r\n\r\n```python\r\narr = xr.DataArray(\r\n     np.arange(3),\r\n     coords=[(\"x\", [0, 1, 2])],\r\n )\r\ndata = xr.Dataset({\"a\": arr, \"b\": arr})\r\nstacked = data.to_stacked_array('y', sample_dims=['x'])\r\nunstacked = stacked.to_unstacked_dataset('y')\r\n# MergeError: conflicting values for variable 'y' on objects to be combined. You can skip this check by specifying compat='override'.\r\n```\r\n\r\n#### Expected Output\r\nA working roundtrip.\r\n\r\n#### Problem Description\r\nI need to stack a bunch of variables and later unstack them again, however this doesn't work if the variables only have a single dimension.\r\n\r\n#### Versions\r\n\r\n<details><summary>Output of <tt>xr.show_versions()</tt></summary>\r\n\r\nINSTALLED VERSIONS\r\n------------------\r\ncommit: None\r\npython: 3.7.3 (default, Mar 27 2019, 22:11:17) \r\n[GCC 7.3.0]\r\npython-bits: 64\r\nOS: Linux\r\nOS-release: 4.15.0-96-generic\r\nmachine: x86_64\r\nprocessor: x86_64\r\nbyteorder: little\r\nLC_ALL: None\r\nLANG: en_GB.UTF-8\r\nLOCALE: en_GB.UTF-8\r\nlibhdf5: 1.10.4\r\nlibnetcdf: 4.6.2\r\n\r\nxarray: 0.15.1\r\npandas: 1.0.3\r\nnumpy: 1.17.3\r\nscipy: 1.3.1\r\nnetCDF4: 1.4.2\r\npydap: None\r\nh5netcdf: None\r\nh5py: 2.10.0\r\nNio: None\r\nzarr: None\r\ncftime: 1.0.4.2\r\nnc_time_axis: None\r\nPseudoNetCDF: None\r\nrasterio: None\r\ncfgrib: None\r\niris: None\r\nbottleneck: None\r\ndask: 2.10.1\r\ndistributed: 2.10.0\r\nmatplotlib: 3.1.1\r\ncartopy: None\r\nseaborn: 0.10.0\r\nnumbagg: None\r\nsetuptools: 41.0.0\r\npip: 19.0.3\r\nconda: 4.8.3\r\npytest: 5.3.5\r\nIPython: 7.9.0\r\nsphinx: None\r\n\r\n\r\n</details>\r\n\n",
  "patch": "diff --git a/xarray/core/dataarray.py b/xarray/core/dataarray.py\n--- a/xarray/core/dataarray.py\n+++ b/xarray/core/dataarray.py\n@@ -1961,7 +1961,7 @@ def to_unstacked_dataset(self, dim, level=0):\n         # pull variables out of datarray\n         data_dict = {}\n         for k in variables:\n-            data_dict[k] = self.sel({variable_dim: k}).squeeze(drop=True)\n+            data_dict[k] = self.sel({variable_dim: k}, drop=True).squeeze(drop=True)\n \n         # unstacked dataset\n         return Dataset(data_dict)\n"
}