{
  "original_problem": {
    "instance_id": "mwaskom__seaborn-2848",
    "repo": "mwaskom/seaborn",
    "created_at": "2022-06-11T18:21:32Z",
    "problem_statement": "pairplot fails with hue_order not containing all hue values in seaborn 0.11.1\nIn seaborn < 0.11, one could plot only a subset of the values in the hue column, by passing a hue_order list containing only the desired values. Points with hue values not in the list were simply not plotted.\n```python\nimport seaborn as sns\n\niris = sns.load_dataset(\"iris\")\n# The hue column contains three different species; here we want to plot two\nsns.pairplot(iris, hue=\"species\", hue_order=[\"setosa\", \"versicolor\"])\n```\n\nThis no longer works in 0.11.1. Passing a hue_order list that does not contain some of the values in the hue column raises a long, ugly error traceback. The first exception arises in seaborn/_core.py:\n```\nTypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''\n```\nseaborn version: 0.11.1\nmatplotlib version: 3.3.2\nmatplotlib backends: MacOSX, Agg or jupyter notebook inline.",
    "patch": "diff --git a/seaborn/_oldcore.py b/seaborn/_oldcore.py\n--- a/seaborn/_oldcore.py\n+++ b/seaborn/_oldcore.py\n@@ -149,6 +149,13 @@ def _lookup_single(self, key):\n             # Use a value that's in the original data vector\n             value = self.lookup_table[key]\n         except KeyError:\n+\n+            if self.norm is None:\n+                # Currently we only get here in scatterplot with hue_order,\n+                # because scatterplot does not consider hue a grouping variable\n+                # So unused hue levels are in the data, but not the lookup table\n+                return (0, 0, 0, 0)\n+\n             # Use the colormap to interpolate between existing datapoints\n             # (e.g. in the context of making a continuous legend)\n             try:\n"
  },
  "candidates_evaluated": 5,
  "judgment_result": {
    "candidates": [
      {
        "idx": 1,
        "id": "similar_2307",
        "decision": "Not useful",
        "confidence": "Medium",
        "reason": "The issue involves user-defined function execution in PairGrid, which is unrelated to handling missing hue values in pairplot."
      },
      {
        "idx": 2,
        "id": "similar_2502",
        "decision": "Not useful",
        "confidence": "Low",
        "reason": "The issue is about log scale application in ECDF plots, which does not relate to hue value handling in pairplot."
      },
      {
        "idx": 3,
        "id": "similar_2285",
        "decision": "Not useful",
        "confidence": "Low",
        "reason": "The issue concerns visual rendering of linewidth in boxenplot, unrelated to hue value handling in pairplot."
      },
      {
        "idx": 4,
        "id": "similar_2211",
        "decision": "Not useful",
        "confidence": "Low",
        "reason": "The issue is about color gradient directionality in displot, which does not relate to hue value handling in pairplot."
      },
      {
        "idx": 5,
        "id": "similar_2295",
        "decision": "Useful",
        "confidence": "Medium",
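        "reason": "Both issues are crashes triggered by values the plotting code does not expect (NaN in categorical data there, hue levels absent from hue_order here), so the fix pattern of tolerating such values instead of raising a TypeError may carry over."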
      }
    ]
  },
  "raw_summaries": [
    {
      "similar_issue": {
        "issue_title": "map_* methods of PairGrid broken in 0.11.0",
        "issue_body": "Hi,\r\n\r\nI discovered that the map_* methods of PairGrid seem to be broken in version 0.11.0 for user defined functions. See reproducible example below with a corrfunc defined to plot the pearson correlation value on the lower plots. The function doesn't seem to get evaluated in version 0.11.0. When I pip install seaborn==0.10.1, I get the desired result. Plots from both cases also attached.\r\n\r\n```import numpy as np\r\nfrom scipy import stats\r\nimport pandas as pd\r\nimport seaborn as sns\r\nimport matplotlib.pyplot as plt\r\nsns.set(style=\"white\")\r\n\r\nmean = np.zeros(3)\r\ncov = np.random.uniform(.2, .4, (3, 3))\r\ncov += cov.T\r\ncov[np.diag_indices(3)] = 1\r\ndata = np.random.multivariate_normal(mean, cov, 100)\r\ndf = pd.DataFrame(data, columns=[\"X\", \"Y\", \"Z\"])\r\n\r\ndef corrfunc(x, y,**kws):\r\n    r, _ = stats.pearsonr(x, y)\r\n    ax = plt.gca()\r\n    ax.annotate(\"r = {:.2f}\".format(r),\r\n                xy=(.1, .9), xycoords=ax.transAxes)\r\n\r\ng = sns.PairGrid(df, palette=[\"red\"])\r\ng.map_upper(plt.scatter, s=10)\r\ng.map_diag(sns.distplot, kde=False)\r\ng.map_lower(sns.kdeplot, cmap=\"Blues_d\")\r\ng.map_lower(corrfunc)\r\nplt.show()\r\n\r\n```\r\n\r\n![seaborn-0-11-0](https://user-images.githubusercontent.com/3239171/94969718-380d3e00-04d1-11eb-821b-9aad80ec696e.png)\r\n\r\n![seaborn-0-10-1](https://user-images.githubusercontent.com/3239171/94969722-3a6f9800-04d1-11eb-8ef1-861d9beb1f26.png)\r\n\r\n\r\n\r\n",
        "issue_id": 2307,
        "pr_number": 2368,
        "pr_title": "Fix pairgrid off-diagonal plots with non-string column names",
        "pr_body": "Fixes #2307\r\n\r\nWorking reprex from original issue:\r\n\r\n![image](https://user-images.githubusercontent.com/315810/100743239-108c0200-33aa-11eb-8885-9c61d89b3acc.png)\r\n",
        "issue_closed_at": "2020-12-01T20:48:45Z",
        "base_commit": "2717408b564994002fe08f72ba2dd7e1acf359b6"
      },
      "summary": "### Summary:\nThis issue is a regression problem affecting the functionality of the `map_*` methods in the `PairGrid` class of the Seaborn library, observed in version 0.11.0. Users reported that user-defined functions, specifically for plotting purposes, are not executed as expected when applied to the lower plots of a `PairGrid`. This was identified when attempting to use a custom correlation function to display Pearson correlation values, where the function failed to evaluate in the newer version. The problem does not occur in version 0.10.1, suggesting a change in the newer release has introduced this malfunction.\n\nKey symptoms include the failure of user-defined plotting functions to execute, as evidenced by the absence of expected plot annotations in the generated visualizations. The affected system component is the `PairGrid` class within the Seaborn library, specifically involving its ability to map custom functions to grid plots. The severity of this issue is significant for users relying on custom plot annotations and transformations, as it disrupts their ability to generate expected visual output using the `PairGrid`'s mapping methods.\n\nTechnical details abstracted for general understanding indicate that changes were likely made to the internal methods handling plotting iterations and function mappings within the `PairGrid` class. The patch addresses these changes in functions such as `PairGrid._map_diag_iter_hue` and `PairGrid._plot_bivariate_iter_hue`, suggesting adjustments to how these functions iterate over and apply user-defined functions to the grid elements.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: map_* methods of PairGrid broken in 0.11.0\n\nBody:\nHi,\r\n\r\nI discovered that the map_* methods of PairGrid seem to be broken in version 0.11.0 for user defined functions. See reproducible example below with a corrfunc defined to plot the pearson correlation value on the lower plots. The function doesn't seem to get evaluated in version 0.11.0. When I pip install seaborn==0.10.1, I get the desired result. Plots from both cases also attached.\r\n\r\n```import numpy as np\r\nfrom scipy import stats\r\nimport pandas as pd\r\nimport seaborn as sns\r\nimport matplotlib.pyplot as plt\r\nsns.set(style=\"white\")\r\n\r\nmean = np.zeros(3)\r\ncov = np.random.uniform(.2, .4, (3, 3))\r\ncov += cov.T\r\ncov[np.diag_indices(3)] = 1\r\ndata = np.random.multivariate_normal(mean, cov, 100)\r\ndf = pd.DataFrame(data, columns=[\"X\", \"Y\", \"Z\"])\r\n\r\ndef corrfunc(x, y,**kws):\r\n    r, _ = stats.pearsonr(x, y)\r\n    ax = plt.gca()\r\n    ax.annotate(\"r = {:.2f}\".format(r),\r\n                xy=(.1, .9), xycoords=ax.transAxes)\r\n\r\ng = sns.PairGrid(df, palette=[\"red\"])\r\ng.map_upper(plt.scatter, s=10)\r\ng.map_diag(sns.distplot, kde=False)\r\ng.map_lower(sns.kdeplot, cmap=\"Blues_d\")\r\ng.map_lower(corrfunc)\r\nplt.show()\r\n\r\n```\r\n\r\n![seaborn-0-11-0](https://user-images.githubusercontent.com/3239171/94969718-380d3e00-04d1-11eb-821b-9aad80ec696e.png)\r\n\r\n![seaborn-0-10-1](https://user-images.githubusercontent.com/3239171/94969722-3a6f9800-04d1-11eb-8ef1-861d9beb1f26.png)\r\n\r\n\r\n\r\n\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured 
summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\nseaborn/axisgrid.py\n  function: JointGrid.__init__\n  function: PairGrid._map_diag_iter_hue\n  function: PairGrid._plot_bivariate_iter_hue\n"
    },
    {
      "similar_issue": {
        "issue_title": "displot(kind='ecdf',..., log_scale=True) not working",
        "issue_body": "The following line of code gives an error:\r\n\r\n```\r\nsns.displot(kind='ecdf', data=df, x='col_1', log_scale=True)\r\n\r\nUserWarning: Data has no positive values, and therefore cannot be log-scaled.\r\n```\r\n\r\n\r\nMy data is all positive and kind='hist' or 'kde' works just fine.\r\n\r\n",
        "issue_id": 2502,
        "pr_number": 2504,
        "pr_title": "Fix log scaling in distribution plots",
        "pr_body": "Fixes #2502 \r\n\r\nThis is a huge development footgun; see #2409 for thoughts on how this can be made automatic to reduce the risk of such bugs",
        "issue_closed_at": "2021-03-24T21:54:09Z",
        "base_commit": "ba4bd0fa0a90b2bd00cb62c2b4a5e38013a73ac6"
      },
      "summary": "### Summary: This issue pertains to a problem with the seaborn library when using the `displot` function with the `kind='ecdf'` parameter and enabling the `log_scale` option. The key symptom observed was the generation of a `UserWarning` indicating that the data could not be log-scaled due to the lack of positive values, despite the data being verified as positive. This suggests a discrepancy in how the log scale functionality interprets the data when applied to an empirical cumulative distribution function (ECDF) plot. The affected component is the seaborn library, specifically within its distribution plotting functions. The severity of this issue could impact users who require ECDF plots on a logarithmic scale for data analysis, leading to potential workflow disruptions or incorrect data representations. The technical detail of interest is the malfunction within the function `_DistributionPlotter.plot_univariate_ecdf` in the seaborn distribution module, which likely mismanages the log scale application for ECDF plots.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: displot(kind='ecdf',..., log_scale=True) not working\n\nBody:\nThe following line of code gives an error:\r\n\r\n```\r\nsns.displot(kind='ecdf', data=df, x='col_1', log_scale=True)\r\n\r\nUserWarning: Data has no positive values, and therefore cannot be log-scaled.\r\n```\r\n\r\n\r\nMy data is all positive and kind='hist' or 'kde' works just fine.\r\n\r\n\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\nseaborn/distributions.py\n  line: line 57\n  function: _DistributionPlotter.plot_univariate_ecdf\n  function: _DistributionPlotter._plot_single_rug\n"
    },
    {
      "similar_issue": {
        "issue_title": "boxenplot linewidth bug",
        "issue_body": "I tried changing the linewidth of the boxes' edges with the `linewidth` parameter, but it doesn't seem to do anything. \r\n\r\nThis is actually visible in the documentation:\r\nhttps://seaborn.pydata.org/generated/seaborn.boxenplot.html#seaborn.boxenplot\r\n\r\nThe following example specifies a linewidth of 2.5:\r\n\r\n```python\r\nax = sns.boxenplot(x=\"day\", y=\"total_bill\", hue=\"time\", data=tips, linewidth=2.5)\r\n```\r\n\r\nwhich produces the following image:\r\n![image](https://user-images.githubusercontent.com/10333715/93465245-cdfa7380-f8ea-11ea-8051-ce591f4ead12.png)\r\n\r\nAs you can see the boxes linewidth is unchanged. Instead, the linewidth argument changes the legend boxes. ",
        "issue_id": 2285,
        "pr_number": 2287,
        "pr_title": "Fix: use linewidth in boxenplot elements",
        "pr_body": "Fixes #2285 . Prior to this change, the `linewidth` argument is only used to frame the legend elements and is not applied to boxes and medians.",
        "issue_closed_at": "2020-10-26T22:07:11Z",
        "base_commit": "211cabb09bb9228635bdd80f74f7591d36d7b2a7"
      },
      "summary": "### Summary:\nThis issue is related to a visual rendering problem in the Seaborn data visualization library, specifically within the `boxenplot` function which is intended to create enhanced box plots. The primary concern is that the parameter intended to adjust the linewidth of the boxes' edges, `linewidth`, is not functioning as expected. Instead of altering the width of the plot boxes' edges, the parameter incorrectly affects the legend boxes. This discrepancy is clearly observable in the library's documentation where the expected visual output does not match the actual result when specifying the linewidth.\n\nKey symptoms include the unchanged appearance of the box edges even when a specific linewidth is set, leading to confusion and incorrect visual representation for users relying on this feature for data visualization. \n\nThe affected component is the `boxenplot` function within the Seaborn library, particularly the processes involved in rendering the plot's visual elements. The functions within seaborn/categorical.py, such as `_LVPlotter.__init__`, `_LVPlotter._lvplot`, and `_LVPlotter.horz_perc_box`, have been identified and patched to address this issue.\n\nThe potential impact of this issue is significant for users who depend on accurate visual cues for data interpretation, as improper edge rendering can lead to misinterpretation of data visualizations. This bug may affect the credibility of the library in providing reliable and customizable plotting tools.\n\nRelevant technical details include the need for proper parameter handling within the plotting functions to ensure that user-defined attributes like `linewidth` are applied correctly to the intended elements. The patch modifies the internal plotting logic to fix this discrepancy, ensuring that the linewidth parameter properly adjusts the plot boxes' edges as expected.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: boxenplot linewidth bug\n\nBody:\nI tried changing the linewidth of the boxes' edges with the `linedwidth` parameter, but it doesn't seem to do anything. \r\n\r\nThis is actually visible in the documentation:\r\nhttps://seaborn.pydata.org/generated/seaborn.boxenplot.html#seaborn.boxenplot\r\n\r\nThe following example specifies a linewidth of 2.5:\r\n\r\n```python\r\nax = sns.boxenplot(x=\"day\", y=\"total_bill\", hue=\"time\", data=tips, linewidth=2.5)\r\n```\r\n\r\nwhich produces the following image:\r\n![image](https://user-images.githubusercontent.com/10333715/93465245-cdfa7380-f8ea-11ea-8051-ce591f4ead12.png)\r\n\r\nAs you can see the boxes linewidth is unchanged. Instead, the linewidth argument changes the legend boxes. \n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\nseaborn/categorical.py\n  function: _LVPlotter.__init__\n  function: _LVPlotter._lvplot\n  function: _LVPlotter.horz_perc_box\n  function: _LVPlotter.horz_perc_box\n"
    },
    {
      "similar_issue": {
        "issue_title": "Default color gradient directionality in displot",
        "issue_body": "The new `displot` function, when showing 2D histograms or KDE contours has its color gradient going from dark (few observations) to bright (many observations). To me this is confusing since many times the background is bright as well. In the absence of colorbar it is hard to figure out which of the bright squares has lots of observations and which has none. Consider this example (also note that the colorbar is wrong in the range of [0,1) ):\r\n```python\r\nimport seaborn as sns\r\nsns.set(style=\"dark\")\r\npenguins = sns.load_dataset(\"penguins\")\r\nsns.displot(data=penguins, x=\"flipper_length_mm\", y=\"culmen_length_mm\", cbar=True)\r\n```\r\nWhich results in:\r\n![image](https://user-images.githubusercontent.com/13831112/90917473-9d0e4800-e3eb-11ea-9f6a-39a9c32907fe.png)\r\nThe inner brighter squares contain lots of observations, as opposed to the bright bounding squares.\r\n\r\nFor a default I think that since `Axes` are filled by default with a bright color (either mpl default or seaborn-set color), it makes sense to vary colors by default from bright to dark. Also this is consistent with the way `jointplot(kind=\"hex\")` works and the 2D kdeplots from version 0.10. But if current coloring is desired, perhaps make `cbar=True` the default?",
        "issue_id": 2211,
        "pr_number": 2224,
        "pr_title": "Reverse direction of default distributions sequential colormap",
        "pr_body": "I think the contrast properties of a light-to-dark ramp are somewhat less desirable than dark-to-light, but it's a bit more intuitive and, decisively, is more internally consistent with the old defaults. So it wins out.\r\n\r\n![image](https://user-images.githubusercontent.com/315810/91326296-2c23b300-e792-11ea-8e09-3d47cc9d641d.png)\r\n\r\n\r\nFixes #2211 ",
        "issue_closed_at": "2020-08-27T13:32:07Z",
        "base_commit": "f62a4a51f3195be3860d5282e5f3afc4d438d0f3"
      },
      "summary": "### Summary:\nThis issue pertains to the default color gradient directionality used in visualizations created by the `displot` function in the Seaborn library. Specifically, when rendering 2D histograms or Kernel Density Estimate (KDE) contours, the gradient progresses from dark colors, representing areas with fewer observations, to bright colors, indicating areas with more observations. This choice of color scheme can be confusing to users, particularly when the plot's background is also bright, making it difficult to distinguish between areas with high and low data density without a colorbar. The problem is compounded by the colorbar itself displaying incorrect values in the range [0, 1), causing further confusion in interpreting the visualization.\n\nKey symptoms and behaviors include:\n- Users experiencing difficulty in identifying areas with high data density due to the brightness of both the plot and its background.\n- Misleading colorbar range values, which add to the confusion in data interpretation.\n\nThe affected components include:\n- The `displot` function in Seaborn, particularly its handling of color gradients and colorbar display.\n\nThe potential impact of this issue is moderate, as it can lead to misinterpretation of data visualizations, affecting data analysis and decision-making processes. This is especially critical for users relying on visual clarity to discern data trends accurately.\n\nRelevant technical details include:\n- The suggestion to reverse the default color gradient so that it runs from bright (few observations) to dark (many observations), consistent with the `jointplot(kind=\"hex\")` function and the 2D KDE plots from version 0.10.\n- Consideration to enable the colorbar by default to provide users with an immediate reference for interpreting color intensity in terms of data density, thereby improving usability and reducing confusion.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: Default color gradient directionality in displot\n\nBody:\nThe new `displot` function, when showing 2D histograms or KDE contours has its color gradient going from dark (few observations) to bright (many observations). To me this is confusing since many times the background is bright as well. In the absence of colorbar it is hard to figure out which of the bright squares has lots of observations and which has none. Consider this example (also note that the colorbar is wrong in the range of [0,1) ):\r\n```python\r\nimport seaborn as sns\r\nsns.set(style=\"dark\")\r\npenguins = sns.load_dataset(\"penguins\")\r\nsns.displot(data=penguins, x=\"flipper_length_mm\", y=\"culmen_length_mm\", cbar=True)\r\n```\r\nWhich results with:\r\n![image](https://user-images.githubusercontent.com/13831112/90917473-9d0e4800-e3eb-11ea-9f6a-39a9c32907fe.png)\r\nThe inner brighter squares contain lost of observations, as opposed to the bright bounding squares.\r\n\r\nFor a default I think that since `Axes` are filled by default with a bright color (either mpl default or seaborn-set color), it makes sense to vary colors by default from bright to dark. Also this is consistent with the way `jointplot(kind=\"hex\")` works and the 2D kdeplots from version 0.10. But if current coloring is desired, perhaps make `cbar=True` the default?\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. 
Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\nexamples/layered_bivariate_plot.py\n  line: line 18\n\nseaborn/distributions.py\n  function: _DistributionPlotter._cmap_from_color\n  function: _DistributionPlotter._default_discrete\n"
    },
    {
      "similar_issue": {
        "issue_title": "histplot with categorical values crashes with missing data, though numerical values work fine",
        "issue_body": "Not sure if this is intended behaviour, but it caught me out due to the difference in handling numerical/categorical data. I note that drawing histograms of categorical data is labelled as experimental, so ignore/close if that explains it.\r\n\r\nWith numerical data `histplot` ignores NaN and plots the other values, this is the behaviour I would expect:\r\n\r\n```\r\nimport numpy as np\r\nimport seaborn as sns\r\n\r\nsns.histplot(\r\n    [1.1, 1.2, 1.3, 1.4, np.nan]\r\n)\r\n```\r\n\r\nbut with categorical data it crashes:\r\n```\r\nimport numpy as np\r\nimport seaborn as sns\r\n\r\nsns.histplot(\r\n    ['foo', 'foo', 'bar', np.nan]\r\n)\r\n\r\n# output\r\n---------------------------------------------------------------------------\r\nTypeError                                 Traceback (most recent call last)\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/axis.py in convert_units(self, x)\r\n   1519         try:\r\n-> 1520             ret = self.converter.convert(x, self.units, self)\r\n   1521         except Exception as e:\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/category.py in convert(value, unit, axis)\r\n     60         # force an update so it also does type checking\r\n---> 61         unit.update(values)\r\n     62         return np.vectorize(unit._mapping.__getitem__, otypes=[float])(values)\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/category.py in update(self, data)\r\n    210             # OrderedDict just iterates over unique values in data.\r\n--> 211             cbook._check_isinstance((str, bytes), value=val)\r\n    212             if convertible:\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/cbook/__init__.py in _check_isinstance(_types, **kwargs)\r\n   2234         if not isinstance(v, types):\r\n-> 2235             raise TypeError(\r\n   2236                 \"{!r} must be an instance of {}, not a 
{}\".format(\r\n\r\nTypeError: 'value' must be an instance of str or bytes, not a float\r\n\r\nThe above exception was the direct cause of the following exception:\r\n\r\nConversionError                           Traceback (most recent call last)\r\n<ipython-input-61-b132ea7dca6c> in <module>\r\n      2 import seaborn as sns\r\n      3 \r\n----> 4 sns.histplot(\r\n      5     ['foo', 'foo', 'bar', np.nan]\r\n      6 )\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/distributions.py in histplot(data, x, y, hue, weights, stat, bins, binwidth, binrange, discrete, cumulative, common_bins, common_norm, multiple, element, fill, shrink, kde, kde_kws, line_kws, thresh, pthresh, pmax, cbar, cbar_ax, cbar_kws, palette, hue_order, hue_norm, color, log_scale, legend, ax, **kwargs)\r\n   1420     if p.univariate:\r\n   1421 \r\n-> 1422         p.plot_univariate_histogram(\r\n   1423             multiple=multiple,\r\n   1424             element=element,\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/distributions.py in plot_univariate_histogram(self, multiple, element, fill, common_norm, common_bins, shrink, kde, kde_kws, color, legend, line_kws, estimate_kws, **plot_kws)\r\n    421 \r\n    422         # First pass through the data to compute the histograms\r\n--> 423         for sub_vars, sub_data in self.iter_data(\"hue\", from_comp_data=True):\r\n    424 \r\n    425             # Prepare the relevant data\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/_core.py in iter_data(self, grouping_vars, reverse, from_comp_data)\r\n    965 \r\n    966         if from_comp_data:\r\n--> 967             data = self.comp_data\r\n    968         else:\r\n    969             data = self.plot_data\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/_core.py in comp_data(self)\r\n   1034                 axis = getattr(ax, f\"{var}axis\")\r\n   1035 \r\n-> 1036                 comp_var = 
axis.convert_units(self.plot_data[var])\r\n   1037                 if axis.get_scale() == \"log\":\r\n   1038                     comp_var = np.log10(comp_var)\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/axis.py in convert_units(self, x)\r\n   1520             ret = self.converter.convert(x, self.units, self)\r\n   1521         except Exception as e:\r\n-> 1522             raise munits.ConversionError('Failed to convert value(s) to axis '\r\n   1523                                          f'units: {x!r}') from e\r\n   1524         return ret\r\n\r\nConversionError: Failed to convert value(s) to axis units: 0    foo\r\n1    foo\r\n2    bar\r\n3    NaN\r\nName: x, dtype: object\r\n```\r\n\r\n",
        "issue_id": 2295,
        "pr_number": 2417,
        "pr_title": "Improve NA robustness in VectorPlotter.comp_data",
        "pr_body": "This PR avoids passing `nan` through the matplotlib converters used to obtain a numeric/computable representation of the data (i.e. `VectorPlotter.comp_data`).\r\n\r\nIt also\r\n- codifies that the converted columns in `comp_data` have a float dtype\r\n- converts `inf` to `nan`, in line with what matplotlib does\r\n\r\nFixes #2295 \r\n\r\nAdditionally this will implicitly address #1971 once the regression plots are refactored to use `comp_data` internally. (@mojones, funny that you opened both issues).",
        "issue_closed_at": "2021-01-05T19:40:57Z",
        "base_commit": "aad96f8d2e36ceceb82a42b69aa3a8f47ef7210d"
      },
      "summary": "### Summary:\nThis issue is related to the handling of missing data within the `histplot` function of the Seaborn library when plotting categorical data. The problem arises because the function behaves differently depending on whether the input data is numerical or categorical. When plotting numerical data, `histplot` successfully ignores missing values, represented as NaN, and plots the remaining data. However, when categorical data containing missing values is used, the function crashes due to a `TypeError`, indicating an inability to handle non-string types, specifically the presence of NaN as a float in categorical data.\n\nKey symptoms include:\n- Successful plotting with numerical data containing NaN values.\n- A crash with a traceback error when attempting to plot categorical data with NaN values, highlighting a `TypeError` and `ConversionError`.\n\nThe affected component is the Seaborn library's `histplot` function, particularly in its handling of categorical data in the `VectorPlotter.comp_data` function. This issue impacts the reliability of the `histplot` function for users working with datasets that include categorical variables and missing data.\n\nThe potential impact is significant for users attempting to visualize categorical data with missing values, as it leads to application crashes and prevents the generation of desired plots. This could hinder data analysis and visualization tasks, especially in environments where categorical data is prevalent.\n\nThe technical details suggest that the issue stems from the data type conversion process within the Matplotlib library, which Seaborn relies on for plotting. The conversion process fails when encountering a mixture of data types, specifically strings and floats, leading to the observed crash. The summary of changes indicates that the resolution involved addressing this conversion within the `VectorPlotter.comp_data` function in the Seaborn library.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: histplot with categorical values crashes with missing data, though numerical values work fine\n\nBody:\nNot sure if this is intended behaviour, but it caught me out due to the difference in handling numerical/categorical data. I note that drawing histograms of categorical data is labelled as experimental, so ignore/close if that explains it.\r\n\r\nWith numerical data `histplot` ignores NaN and plots the other values, this is the behaviour I would expect:\r\n\r\n```\r\nimport numpy as np\r\nimport seaborn as sns\r\n\r\nsns.histplot(\r\n    [1.1, 1.2, 1.3, 1.4, np.nan]\r\n)\r\n```\r\n\r\nbut with categorical data it crashes:\r\n```\r\nimport numpy as np\r\nimport seaborn as sns\r\n\r\nsns.histplot(\r\n    ['foo', 'foo', 'bar', np.nan]\r\n)\r\n\r\n# output\r\n---------------------------------------------------------------------------\r\nTypeError                                 Traceback (most recent call last)\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/axis.py in convert_units(self, x)\r\n   1519         try:\r\n-> 1520             ret = self.converter.convert(x, self.units, self)\r\n   1521         except Exception as e:\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/category.py in convert(value, unit, axis)\r\n     60         # force an update so it also does type checking\r\n---> 61         unit.update(values)\r\n     62         return np.vectorize(unit._mapping.__getitem__, otypes=[float])(values)\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/category.py in update(self, data)\r\n    210             # OrderedDict just iterates over 
unique values in data.\r\n--> 211             cbook._check_isinstance((str, bytes), value=val)\r\n    212             if convertible:\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/cbook/__init__.py in _check_isinstance(_types, **kwargs)\r\n   2234         if not isinstance(v, types):\r\n-> 2235             raise TypeError(\r\n   2236                 \"{!r} must be an instance of {}, not a {}\".format(\r\n\r\nTypeError: 'value' must be an instance of str or bytes, not a float\r\n\r\nThe above exception was the direct cause of the following exception:\r\n\r\nConversionError                           Traceback (most recent call last)\r\n<ipython-input-61-b132ea7dca6c> in <module>\r\n      2 import seaborn as sns\r\n      3 \r\n----> 4 sns.histplot(\r\n      5     ['foo', 'foo', 'bar', np.nan]\r\n      6 )\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/distributions.py in histplot(data, x, y, hue, weights, stat, bins, binwidth, binrange, discrete, cumulative, common_bins, common_norm, multiple, element, fill, shrink, kde, kde_kws, line_kws, thresh, pthresh, pmax, cbar, cbar_ax, cbar_kws, palette, hue_order, hue_norm, color, log_scale, legend, ax, **kwargs)\r\n   1420     if p.univariate:\r\n   1421 \r\n-> 1422         p.plot_univariate_histogram(\r\n   1423             multiple=multiple,\r\n   1424             element=element,\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/distributions.py in plot_univariate_histogram(self, multiple, element, fill, common_norm, common_bins, shrink, kde, kde_kws, color, legend, line_kws, estimate_kws, **plot_kws)\r\n    421 \r\n    422         # First pass through the data to compute the histograms\r\n--> 423         for sub_vars, sub_data in self.iter_data(\"hue\", from_comp_data=True):\r\n    424 \r\n    425             # Prepare the relevant data\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/_core.py in iter_data(self, 
grouping_vars, reverse, from_comp_data)\r\n    965 \r\n    966         if from_comp_data:\r\n--> 967             data = self.comp_data\r\n    968         else:\r\n    969             data = self.plot_data\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/_core.py in comp_data(self)\r\n   1034                 axis = getattr(ax, f\"{var}axis\")\r\n   1035 \r\n-> 1036                 comp_var = axis.convert_units(self.plot_data[var])\r\n   1037                 if axis.get_scale() == \"log\":\r\n   1038                     comp_var = np.log10(comp_var)\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/axis.py in convert_units(self, x)\r\n   1520             ret = self.converter.convert(x, self.units, self)\r\n   1521         except Exception as e:\r\n-> 1522             raise munits.ConversionError('Failed to convert value(s) to axis '\r\n   1523                                          f'units: {x!r}') from e\r\n   1524         return ret\r\n\r\nConversionError: Failed to convert value(s) to axis units: 0    foo\r\n1    foo\r\n2    bar\r\n3    NaN\r\nName: x, dtype: object\r\n```\r\n\r\n\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\nseaborn/_core.py\n  function: VectorPlotter.comp_data\n"
    }
  ]
}