{
    "Selected_candidate": {
        "pr_number": 2417,
        "pr_title": "Improve NA robustness in VectorPlotter.comp_data",
        "pr_body": "This PR avoids passing `nan` through the matplotlib converters used to obtain a numeric/computable representation of the data (i.e. `VectorPlotter.comp_data`).\r\n\r\nIt also\r\n- codifies that the converted columns in `comp_data` have a float dtype\r\n- converts `inf` to `nan`, in line with what matplotlib does\r\n\r\nFixes #2295 \r\n\r\nAdditionally this will implicitly address #1971 once the regression plots are refactored to use `comp_data` internally. (@mojones, funny that you opened both issues).",
        "issue_id": 2295,
        "issue_title": "histplot with categorical values crashes with missing data, though numerical values work fine",
        "issue_body": "Not sure if this is intended behaviour, but it caught me out due to the difference in handling numerical/categorical data. I note that drawing histograms of categorical data is labelled as experimental, so ignore/close if that explains it.\r\n\r\nWith numerical data `histplot` ignores NaN and plots the other values, this is the behaviour I would expect:\r\n\r\n```\r\nimport numpy as np\r\nimport seaborn as sns\r\n\r\nsns.histplot(\r\n    [1.1, 1.2, 1.3, 1.4, np.nan]\r\n)\r\n```\r\n\r\nbut with categorical data it crashes:\r\n```\r\nimport numpy as np\r\nimport seaborn as sns\r\n\r\nsns.histplot(\r\n    ['foo', 'foo', 'bar', np.nan]\r\n)\r\n\r\n# output\r\n---------------------------------------------------------------------------\r\nTypeError                                 Traceback (most recent call last)\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/axis.py in convert_units(self, x)\r\n   1519         try:\r\n-> 1520             ret = self.converter.convert(x, self.units, self)\r\n   1521         except Exception as e:\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/category.py in convert(value, unit, axis)\r\n     60         # force an update so it also does type checking\r\n---> 61         unit.update(values)\r\n     62         return np.vectorize(unit._mapping.__getitem__, otypes=[float])(values)\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/category.py in update(self, data)\r\n    210             # OrderedDict just iterates over unique values in data.\r\n--> 211             cbook._check_isinstance((str, bytes), value=val)\r\n    212             if convertible:\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/cbook/__init__.py in _check_isinstance(_types, **kwargs)\r\n   2234         if not isinstance(v, types):\r\n-> 2235             raise TypeError(\r\n   2236                 \"{!r} must be an instance of {}, not a 
{}\".format(\r\n\r\nTypeError: 'value' must be an instance of str or bytes, not a float\r\n\r\nThe above exception was the direct cause of the following exception:\r\n\r\nConversionError                           Traceback (most recent call last)\r\n<ipython-input-61-b132ea7dca6c> in <module>\r\n      2 import seaborn as sns\r\n      3 \r\n----> 4 sns.histplot(\r\n      5     ['foo', 'foo', 'bar', np.nan]\r\n      6 )\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/distributions.py in histplot(data, x, y, hue, weights, stat, bins, binwidth, binrange, discrete, cumulative, common_bins, common_norm, multiple, element, fill, shrink, kde, kde_kws, line_kws, thresh, pthresh, pmax, cbar, cbar_ax, cbar_kws, palette, hue_order, hue_norm, color, log_scale, legend, ax, **kwargs)\r\n   1420     if p.univariate:\r\n   1421 \r\n-> 1422         p.plot_univariate_histogram(\r\n   1423             multiple=multiple,\r\n   1424             element=element,\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/distributions.py in plot_univariate_histogram(self, multiple, element, fill, common_norm, common_bins, shrink, kde, kde_kws, color, legend, line_kws, estimate_kws, **plot_kws)\r\n    421 \r\n    422         # First pass through the data to compute the histograms\r\n--> 423         for sub_vars, sub_data in self.iter_data(\"hue\", from_comp_data=True):\r\n    424 \r\n    425             # Prepare the relevant data\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/_core.py in iter_data(self, grouping_vars, reverse, from_comp_data)\r\n    965 \r\n    966         if from_comp_data:\r\n--> 967             data = self.comp_data\r\n    968         else:\r\n    969             data = self.plot_data\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/_core.py in comp_data(self)\r\n   1034                 axis = getattr(ax, f\"{var}axis\")\r\n   1035 \r\n-> 1036                 comp_var = 
axis.convert_units(self.plot_data[var])\r\n   1037                 if axis.get_scale() == \"log\":\r\n   1038                     comp_var = np.log10(comp_var)\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/axis.py in convert_units(self, x)\r\n   1520             ret = self.converter.convert(x, self.units, self)\r\n   1521         except Exception as e:\r\n-> 1522             raise munits.ConversionError('Failed to convert value(s) to axis '\r\n   1523                                          f'units: {x!r}') from e\r\n   1524         return ret\r\n\r\nConversionError: Failed to convert value(s) to axis units: 0    foo\r\n1    foo\r\n2    bar\r\n3    NaN\r\nName: x, dtype: object\r\n```\r\n\r\n",
        "issue_closed_at": "2021-01-05T19:40:57Z",
        "base_commit": "aad96f8d2e36ceceb82a42b69aa3a8f47ef7210d",
        "changes": [
            {
                "file": "seaborn/_core.py",
                "type": "function",
                "name": "comp_data",
                "class_name": "VectorPlotter",
                "code": "def comp_data(self):\n        \"\"\"Dataframe with numeric x and y, after unit conversion and log scaling.\"\"\"\n        if not hasattr(self, \"ax\"):\n            # Probably a good idea, but will need a bunch of tests updated\n            # Most of these tests should just use the external interface\n            # Then this can be re-enabled.\n            # raise AttributeError(\"No Axes attached to plotter\")\n            return self.plot_data\n\n        if not hasattr(self, \"_comp_data\"):\n\n            comp_data = (\n                self.plot_data\n                .copy(deep=False)\n                .drop([\"x\", \"y\"], axis=1, errors=\"ignore\")\n            )\n            for var in \"yx\":\n                if var not in self.variables:\n                    continue\n\n                # Get a corresponding axis object so that we can convert the units\n                # to matplotlib's numeric representation, which we can compute on\n                # This is messy and it would probably be better for VectorPlotter\n                # to manage its own converters (using the matplotlib tools).\n                # XXX Currently does not support unshared categorical axes!\n                # (But see comment in _attach about how those don't exist)\n                if self.ax is None:\n                    ax = self.facets.axes.flat[0]\n                else:\n                    ax = self.ax\n                axis = getattr(ax, f\"{var}axis\")\n\n                comp_var = axis.convert_units(self.plot_data[var])\n                if axis.get_scale() == \"log\":\n                    comp_var = np.log10(comp_var)\n                comp_data.insert(0, var, comp_var)\n\n            self._comp_data = comp_data\n\n        return self._comp_data"
            }
        ]
    },
    "Justification": "Candidate E addresses a closely related problem: missing values crashing a plotting function. The CURRENT bug is a crash in PolyFit when the data contain missing values, and Candidate E's fix makes histogram plots robust to NaN by keeping missing values out of the matplotlib unit converters used in VectorPlotter.comp_data. That fix shows how missing data can be handled in a statistical visualization, which makes it useful for debugging the missing-data crash in PolyFit."
}