{
  "Selected_candidate": {
    "pr_number": 2603,
    "pr_title": "Support HighLevelGraphs",
    "pr_body": "Fixes https://github.com/dask/dask/issues/4291\r\n\r\n - [x] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API\r\n",
    "issue_id": 4291,
    "issue_title": "resample function gives 0s instead of NaNs",
    "issue_body": "<!-- Please include a self-contained copy-pastable example that generates the issue if possible.\r\n\r\nPlease be concise with code posted. See guidelines below on how to provide a good bug report:\r\n\r\n- Craft Minimal Bug Reports: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports\r\n- Minimal Complete Verifiable Examples: https://stackoverflow.com/help/mcve\r\n\r\nBug reports that follow these guidelines are easier to diagnose, and so are often handled much more quickly.\r\n-->\r\n\r\n**What happened**:\r\nWhen I use `resample(time='1d').sum(dim='time')` to resample a time series with NaNs, the resampled result gives me 0s instead of NaNs, while NaNs should be the correct answer.\r\n\r\n**What you expected to happen**:\r\n\r\nNaNs should be the correct answer.\r\n\r\n**Minimal Complete Verifiable Example**:\r\n\r\n```python\r\nimport numpy as np\r\nimport pandas as pd\r\nimport xarray as xr\r\n\r\ndates = pd.date_range('20200101', '20200601', freq='h')\r\ndata = np.linspace(0, 10, num=len(dates))\r\ndata[0:30*24] = np.nan\r\n\r\nda = xr.DataArray(data, coords=[dates], dims='time')\r\nda.plot()\r\n\r\n# Instead of NaNs, the resampled time series in January 2020 gives us 0s, which is not right.\r\nda.resample(time='1d', skipna=True).sum(dim='time', skipna=True).plot()\r\n```\r\n\r\n**Anything else we need to know?**:\r\n\r\nDid I misunderstand something here? Thanks!\r\n\r\n\r\n**Environment**:\r\nxarray - '0.15.1' \r\n\r\n<details><summary>Output of <tt>xr.show_versions()</tt></summary>\r\n\r\nxarray - '0.15.1' \r\n\r\n\r\n</details>\r\n",
    "issue_closed_at": "2020-08-05T16:55:58Z",
    "base_commit": "82789bc6f72a76d69ace4bbabd00601e28e808da",
    "changes": [
      {
        "file": "xarray/core/dataarray.py",
        "type": "function",
        "name": "__dask_graph__",
        "class_name": "DataArray",
        "code": "def __dask_graph__(self):\n        return self._to_temp_dataset().__dask_graph__()"
      },
      {
        "file": "xarray/core/dataset.py",
        "type": "function",
        "name": "__dask_graph__",
        "class_name": "Dataset",
        "code": "def __dask_graph__(self):\n        graphs = {k: v.__dask_graph__() for k, v in self.variables.items()}\n        graphs = {k: v for k, v in graphs.items() if v is not None}\n        if not graphs:\n            return None\n        else:\n            from dask import sharedict\n            return sharedict.merge(*graphs.values())"
      },
      {
        "file": "xarray/core/variable.py",
        "type": "function",
        "name": "__dask_graph__",
        "class_name": "Variable",
        "code": "def __dask_graph__(self):\n        if isinstance(self._data, dask_array_type):\n            return self._data.__dask_graph__()\n        else:\n            return None"
      }
    ]
  },
  "Justification": "Candidate A shares structural similarity since both bugs relate to DataArray outputs and their representation in Python interactions. The issue with trailing whitespaces in DatasetGroupBy's representation may be conceptually connected to how other xarray functions, such as resample, generate outputs that aren't as expected, as highlighted by the user concerns over streamlining outputs for consistency with tools like flake8. Addressing formatting inconsistencies, as in both cases, would be critical in the debugging and patching processes, particularly relevant since the fix in Candidate A involves core data structures of xarray. This makes it the most helpful report for understanding and resolving the current issue.",
  "instance_id": "pydata__xarray-5131",
  "repo": "pydata/xarray",
  "created_at": "2021-04-08T09:19:30Z",
  "problem_statement": "Trailing whitespace in DatasetGroupBy text representation\nWhen displaying a DatasetGroupBy in an interactive Python session, the first line of output contains a trailing whitespace. The first example in the documentation demonstrates this:\r\n\r\n```pycon\r\n>>> import xarray as xr, numpy as np\r\n>>> ds = xr.Dataset(\r\n...     {\"foo\": ((\"x\", \"y\"), np.random.rand(4, 3))},\r\n...     coords={\"x\": [10, 20, 30, 40], \"letters\": (\"x\", list(\"abba\"))},\r\n... )\r\n>>> ds.groupby(\"letters\")\r\nDatasetGroupBy, grouped over 'letters' \r\n2 groups with labels 'a', 'b'.\r\n```\r\n\r\nThere is a trailing whitespace in the first line of output which is \"DatasetGroupBy, grouped over 'letters' \". This can be seen more clearly by converting the object to a string (note the whitespace before `\\n`):\r\n\r\n```pycon\r\n>>> str(ds.groupby(\"letters\"))\r\n\"DatasetGroupBy, grouped over 'letters' \\n2 groups with labels 'a', 'b'.\"\r\n```\r\n\r\n\r\nWhile this isn't a problem in itself, it causes an issue for us because we use flake8 in continuous integration to verify that our code is correctly formatted and we also have doctests that rely on DatasetGroupBy textual representation. Flake8 reports a violation on the trailing whitespaces in our docstrings. If we remove the trailing whitespaces, our doctests fail because the expected output doesn't match the actual output. So we have conflicting constraints coming from our tools which both seem reasonable. Trailing whitespaces are forbidden by flake8 because, among other reasons, they lead to noisy git diffs. Doctest wants the expected output to be exactly the same as the actual output and considers a trailing whitespace to be a significant difference. We could configure flake8 to ignore this particular violation for the files in which we have these doctests, but this may cause other trailing whitespaces to creep into our code, which we don't want. Unfortunately it's not possible to just add `# NoQA` comments to get flake8 to ignore the violation only for specific lines because that creates a difference between expected and actual output from doctest's point of view. Flake8 doesn't allow disabling checks for blocks of code either.\r\n\r\nIs there a reason for having this trailing whitespace in DatasetGroupBy representation? Would it be OK to remove it? If so please let me know and I can make a pull request.\n",
  "patch": "diff --git a/xarray/core/groupby.py b/xarray/core/groupby.py\n--- a/xarray/core/groupby.py\n+++ b/xarray/core/groupby.py\n@@ -436,7 +436,7 @@ def __iter__(self):\n         return zip(self._unique_coord.values, self._iter_grouped())\n \n     def __repr__(self):\n-        return \"{}, grouped over {!r} \\n{!r} groups with labels {}.\".format(\n+        return \"{}, grouped over {!r}\\n{!r} groups with labels {}.\".format(\n             self.__class__.__name__,\n             self._unique_coord.name,\n             self._unique_coord.size,\n"
}