{
    "Selected_candidate": {
        "pr_number": 2417,
        "pr_title": "Improve NA robustness in VectorPlotter.comp_data",
        "pr_body": "This PR avoids passing `nan` through the matplotlib converters used to obtain a numeric/computable representation of the data (i.e. `VectorPlotter.comp_data`).\r\n\r\nIt also\r\n- codifies that the converted columns in `comp_data` have a float dtype\r\n- converts `inf` to `nan`, in line with what matplotlib does\r\n\r\nFixes #2295 \r\n\r\nAdditionally this will implicitly address #1971 once the regression plots are refactored to use `comp_data` internally. (@mojones, funny that you opened both issues).",
        "issue_id": 2295,
        "issue_title": "histplot with categorical values crashes with missing data, though numerical values work fine",
        "issue_body": "Not sure if this is intended behaviour, but it caught me out due to the difference in handling numerical/categorical data. I note that drawing histograms of categorical data is labelled as experimental, so ignore/close if that explains it.\r\n\r\nWith numerical data `histplot` ignores NaN and plots the other values, this is the behaviour I would expect:\r\n\r\n```\r\nimport numpy as np\r\nimport seaborn as sns\r\n\r\nsns.histplot(\r\n    [1.1, 1.2, 1.3, 1.4, np.nan]\r\n)\r\n```\r\n\r\nbut with categorical data it crashes:\r\n```\r\nimport numpy as np\r\nimport seaborn as sns\r\n\r\nsns.histplot(\r\n    ['foo', 'foo', 'bar', np.nan]\r\n)\r\n\r\n# output\r\n---------------------------------------------------------------------------\r\nTypeError                                 Traceback (most recent call last)\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/axis.py in convert_units(self, x)\r\n   1519         try:\r\n-> 1520             ret = self.converter.convert(x, self.units, self)\r\n   1521         except Exception as e:\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/category.py in convert(value, unit, axis)\r\n     60         # force an update so it also does type checking\r\n---> 61         unit.update(values)\r\n     62         return np.vectorize(unit._mapping.__getitem__, otypes=[float])(values)\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/category.py in update(self, data)\r\n    210             # OrderedDict just iterates over unique values in data.\r\n--> 211             cbook._check_isinstance((str, bytes), value=val)\r\n    212             if convertible:\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/cbook/__init__.py in _check_isinstance(_types, **kwargs)\r\n   2234         if not isinstance(v, types):\r\n-> 2235             raise TypeError(\r\n   2236                 \"{!r} must be an instance of {}, not a {}\".format(\r\n\r\nTypeError: 'value' must be an instance of str or bytes, not a float\r\n\r\nThe above exception was the direct cause of the following exception:\r\n\r\nConversionError                           Traceback (most recent call last)\r\n<ipython-input-61-b132ea7dca6c> in <module>\r\n      2 import seaborn as sns\r\n      3 \r\n----> 4 sns.histplot(\r\n      5     ['foo', 'foo', 'bar', np.nan]\r\n      6 )\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/distributions.py in histplot(data, x, y, hue, weights, stat, bins, binwidth, binrange, discrete, cumulative, common_bins, common_norm, multiple, element, fill, shrink, kde, kde_kws, line_kws, thresh, pthresh, pmax, cbar, cbar_ax, cbar_kws, palette, hue_order, hue_norm, color, log_scale, legend, ax, **kwargs)\r\n   1420     if p.univariate:\r\n   1421 \r\n-> 1422         p.plot_univariate_histogram(\r\n   1423             multiple=multiple,\r\n   1424             element=element,\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/distributions.py in plot_univariate_histogram(self, multiple, element, fill, common_norm, common_bins, shrink, kde, kde_kws, color, legend, line_kws, estimate_kws, **plot_kws)\r\n    421 \r\n    422         # First pass through the data to compute the histograms\r\n--> 423         for sub_vars, sub_data in self.iter_data(\"hue\", from_comp_data=True):\r\n    424 \r\n    425             # Prepare the relevant data\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/_core.py in iter_data(self, grouping_vars, reverse, from_comp_data)\r\n    965 \r\n    966         if from_comp_data:\r\n--> 967             data = self.comp_data\r\n    968         else:\r\n    969             data = self.plot_data\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/_core.py in comp_data(self)\r\n   1034                 axis = getattr(ax, f\"{var}axis\")\r\n   1035 \r\n-> 1036                 comp_var = axis.convert_units(self.plot_data[var])\r\n   1037                 if axis.get_scale() == \"log\":\r\n   1038                     comp_var = np.log10(comp_var)\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/axis.py in convert_units(self, x)\r\n   1520             ret = self.converter.convert(x, self.units, self)\r\n   1521         except Exception as e:\r\n-> 1522             raise munits.ConversionError('Failed to convert value(s) to axis '\r\n   1523                                          f'units: {x!r}') from e\r\n   1524         return ret\r\n\r\nConversionError: Failed to convert value(s) to axis units: 0    foo\r\n1    foo\r\n2    bar\r\n3    NaN\r\nName: x, dtype: object\r\n```\r\n\r\n",
        "issue_closed_at": "2021-01-05T19:40:57Z",
        "base_commit": "aad96f8d2e36ceceb82a42b69aa3a8f47ef7210d",
        "changes": [
            {
                "file": "seaborn/_core.py",
                "type": "function",
                "name": "comp_data",
                "class_name": "VectorPlotter",
                "code": "def comp_data(self):\n        \"\"\"Dataframe with numeric x and y, after unit conversion and log scaling.\"\"\"\n        if not hasattr(self, \"ax\"):\n            # Probably a good idea, but will need a bunch of tests updated\n            # Most of these tests should just use the external interface\n            # Then this can be re-enabled.\n            # raise AttributeError(\"No Axes attached to plotter\")\n            return self.plot_data\n\n        if not hasattr(self, \"_comp_data\"):\n\n            comp_data = (\n                self.plot_data\n                .copy(deep=False)\n                .drop([\"x\", \"y\"], axis=1, errors=\"ignore\")\n            )\n            for var in \"yx\":\n                if var not in self.variables:\n                    continue\n\n                # Get a corresponding axis object so that we can convert the units\n                # to matplotlib's numeric representation, which we can compute on\n                # This is messy and it would probably be better for VectorPlotter\n                # to manage its own converters (using the matplotlib tools).\n                # XXX Currently does not support unshared categorical axes!\n                # (But see comment in _attach about how those don't exist)\n                if self.ax is None:\n                    ax = self.facets.axes.flat[0]\n                else:\n                    ax = self.ax\n                axis = getattr(ax, f\"{var}axis\")\n\n                comp_var = axis.convert_units(self.plot_data[var])\n                if axis.get_scale() == \"log\":\n                    comp_var = np.log10(comp_var)\n                comp_data.insert(0, var, comp_var)\n\n            self._comp_data = comp_data\n\n        return self._comp_data"
            }
        ]
    },
    "Justification": "Candidate B is chosen as the most helpful because it deals with a similar issue involving the handling of data types, specifically how the presence of NaN values affects the behavior of plotting functions. Both the CURRENT bug and Candidate B's report highlight issues with data processing during plotting, specifically focusing on how different data types and missing values can lead to errors in plotting functions. This commonality in data handling problems makes Candidate B particularly relevant for debugging the CURRENT bug related to boolean data in color mapping."
}