{
  "Selected_candidate": {
    "pr_number": 2603,
    "pr_title": "Support HighLevelGraphs",
    "pr_body": "Fixes https://github.com/dask/dask/issues/4291\r\n\r\n - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API\r\n",
    "issue_id": 4291,
    "issue_title": "resample function gives 0s instead of NaNs",
    "issue_body": "<!-- Please include a self-contained copy-pastable example that generates the issue if possible.\r\n\r\nPlease be concise with code posted. See guidelines below on how to provide a good bug report:\r\n\r\n- Craft Minimal Bug Reports: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports\r\n- Minimal Complete Verifiable Examples: https://stackoverflow.com/help/mcve\r\n\r\nBug reports that follow these guidelines are easier to diagnose, and so are often handled much more quickly.\r\n-->\r\n\r\n**What happened**:\r\nWhen I use `resample(time='1d').sum(dim='time')` to resample a time series with NaNs, the resampled result gives me 0s instead of NaNs, while NaNs should be the correct answer.\r\n\r\n**What you expected to happen**:\r\n\r\nNaNs should be the correct answer.\r\n\r\n**Minimal Complete Verifiable Example**:\r\n\r\n```python\r\nimport xarray as xr\r\n\r\ndates =  pd.date_range('20200101', '20200601', freq='h')\r\ndata = np.linspace(0, 10, num=len(dates))\r\ndata[0:30*24] = np.nan\r\n\r\nda = xr.DataArray(data, coords=[dates], dims='time')\r\nda.plot()\r\n\r\n# Instead of NaNs, the resampled time series in January 20202 give us 0s, which not right.\r\nda.resample(time='1d', skipna=True).sum(dim='time', skipna=True).plot()\r\n```\r\n\r\n**Anything else we need to know?**:\r\n\r\nDid I misunderstand something here? Thanks!\r\n\r\n\r\n**Environment**:\r\nxarray - '0.15.1' \r\n\r\n<details><summary>Output of <tt>xr.show_versions()</tt></summary>\r\n\r\nxarray - '0.15.1' \r\n\r\n\r\n</details>\r\n",
    "issue_closed_at": "2020-08-05T16:55:58Z",
    "base_commit": "82789bc6f72a76d69ace4bbabd00601e28e808da",
    "changes": [
      {
        "file": "xarray/core/dataarray.py",
        "type": "function",
        "name": "__dask_graph__",
        "class_name": "DataArray",
        "code": "def __dask_graph__(self):\n        return self._to_temp_dataset().__dask_graph__()"
      },
      {
        "file": "xarray/core/dataset.py",
        "type": "function",
        "name": "__dask_graph__",
        "class_name": "Dataset",
        "code": "def __dask_graph__(self):\n        graphs = {k: v.__dask_graph__() for k, v in self.variables.items()}\n        graphs = {k: v for k, v in graphs.items() if v is not None}\n        if not graphs:\n            return None\n        else:\n            from dask import sharedict\n            return sharedict.merge(*graphs.values())"
      },
      {
        "file": "xarray/core/variable.py",
        "type": "function",
        "name": "__dask_graph__",
        "class_name": "Variable",
        "code": "def __dask_graph__(self):\n        if isinstance(self._data, dask_array_type):\n            return self._data.__dask_graph__()\n        else:\n            return None"
      }
    ]
  },
  "Justification": "Candidate A is the most relevant to the CURRENT bug report because both involve how xarray handles DataArray objects. In the CURRENT issue, `Dataset.update` causes a chunked, dask-backed DataArray to lose its chunking. This points to how DataArrays are handled internally and how their underlying data is wrapped. The two share strong structural and module-level similarities, since both touch core xarray data-handling code. Additionally, the bug report reflects a confusion similar to the one in Candidate A about when data is computed. Insights from Candidate A's resolution will help clarify the computational behavior in the CURRENT bug context.",
  "instance_id": "pydata__xarray-4493",
  "repo": "pydata/xarray",
  "created_at": "2020-10-06T22:00:41Z",
  "problem_statement": "DataSet.update causes chunked dask DataArray to evalute its values eagerly \n**What happened**:\r\nUsed `DataSet.update` to update a chunked dask DataArray, but the DataArray is no longer chunked after the update.\r\n\r\n**What you expected to happen**:\r\nThe chunked DataArray should still be chunked after the update\r\n\r\n**Minimal Complete Verifiable Example**:\r\n\r\n```python\r\nfoo = xr.DataArray(np.random.randn(3, 3), dims=(\"x\", \"y\")).chunk()  # foo is chunked\r\nds = xr.Dataset({\"foo\": foo, \"bar\": (\"x\", [1, 2, 3])})  # foo is still chunked here\r\nds  # you can verify that foo is chunked\r\n```\r\n```python\r\nupdate_dict = {\"foo\": ((\"x\", \"y\"), ds.foo[1:, :]), \"bar\": (\"x\", ds.bar[1:])}\r\nupdate_dict[\"foo\"][1]  # foo is still chunked\r\n```\r\n```python\r\nds.update(update_dict)\r\nds  # now foo is no longer chunked\r\n```\r\n\r\n**Environment**:\r\n\r\n<details><summary>Output of <tt>xr.show_versions()</tt></summary>\r\n\r\n```\r\ncommit: None\r\npython: 3.8.3 (default, Jul  2 2020, 11:26:31) \r\n[Clang 10.0.0 ]\r\npython-bits: 64\r\nOS: Darwin\r\nOS-release: 19.6.0\r\nmachine: x86_64\r\nprocessor: i386\r\nbyteorder: little\r\nLC_ALL: None\r\nLANG: en_US.UTF-8\r\nLOCALE: en_US.UTF-8\r\nlibhdf5: 1.10.6\r\nlibnetcdf: None\r\n\r\nxarray: 0.16.0\r\npandas: 1.0.5\r\nnumpy: 1.18.5\r\nscipy: 1.5.0\r\nnetCDF4: None\r\npydap: None\r\nh5netcdf: None\r\nh5py: 2.10.0\r\nNio: None\r\nzarr: None\r\ncftime: None\r\nnc_time_axis: None\r\nPseudoNetCDF: None\r\nrasterio: None\r\ncfgrib: None\r\niris: None\r\nbottleneck: None\r\ndask: 2.20.0\r\ndistributed: 2.20.0\r\nmatplotlib: 3.2.2\r\ncartopy: None\r\nseaborn: None\r\nnumbagg: None\r\npint: None\r\nsetuptools: 49.2.0.post20200714\r\npip: 20.1.1\r\nconda: None\r\npytest: 5.4.3\r\nIPython: 7.16.1\r\nsphinx: None\r\n```\r\n\r\n</details>\nDataset constructor with DataArray triggers computation\nIs it intentional that creating a Dataset with a DataArray and dimension names for a single variable causes computation of that variable?  In other words, why does ```xr.Dataset(dict(a=('d0', xr.DataArray(da.random.random(10)))))``` cause the dask array to compute?\r\n\r\nA longer example:\r\n\r\n```python\r\nimport dask.array as da\r\nimport xarray as xr\r\nx = da.random.randint(1, 10, size=(100, 25))\r\nds = xr.Dataset(dict(a=xr.DataArray(x, dims=('x', 'y'))))\r\ntype(ds.a.data)\r\ndask.array.core.Array\r\n\r\n# Recreate the dataset with the same array, but also redefine the dimensions\r\nds2 = xr.Dataset(dict(a=(('x', 'y'), ds.a))\r\ntype(ds2.a.data)\r\nnumpy.ndarray\r\n```\r\n\r\n\n",
  "patch": "diff --git a/xarray/core/variable.py b/xarray/core/variable.py\n--- a/xarray/core/variable.py\n+++ b/xarray/core/variable.py\n@@ -120,6 +120,16 @@ def as_variable(obj, name=None) -> \"Union[Variable, IndexVariable]\":\n     if isinstance(obj, Variable):\n         obj = obj.copy(deep=False)\n     elif isinstance(obj, tuple):\n+        if isinstance(obj[1], DataArray):\n+            # TODO: change into TypeError\n+            warnings.warn(\n+                (\n+                    \"Using a DataArray object to construct a variable is\"\n+                    \" ambiguous, please extract the data using the .data property.\"\n+                    \" This will raise a TypeError in 0.19.0.\"\n+                ),\n+                DeprecationWarning,\n+            )\n         try:\n             obj = Variable(*obj)\n         except (TypeError, ValueError) as error:\n"
}