{
  "instance_id": "mwaskom__seaborn-3010",
  "repo": "mwaskom/seaborn",
  "created_at": "2022-09-11T19:37:32Z",
  "problem_statement": "PolyFit is not robust to missing data\n```python\r\nso.Plot([1, 2, 3, None, 4], [1, 2, 3, 4, 5]).add(so.Line(), so.PolyFit())\r\n```\r\n\r\n<details><summary>Traceback</summary>\r\n\r\n```python-traceback\r\n---------------------------------------------------------------------------\r\nLinAlgError                               Traceback (most recent call last)\r\nFile ~/miniconda3/envs/seaborn-py39-latest/lib/python3.9/site-packages/IPython/core/formatters.py:343, in BaseFormatter.__call__(self, obj)\r\n    341     method = get_real_method(obj, self.print_method)\r\n    342     if method is not None:\r\n--> 343         return method()\r\n    344     return None\r\n    345 else:\r\n\r\nFile ~/code/seaborn/seaborn/_core/plot.py:265, in Plot._repr_png_(self)\r\n    263 def _repr_png_(self) -> tuple[bytes, dict[str, float]]:\r\n--> 265     return self.plot()._repr_png_()\r\n\r\nFile ~/code/seaborn/seaborn/_core/plot.py:804, in Plot.plot(self, pyplot)\r\n    800 \"\"\"\r\n    801 Compile the plot spec and return the Plotter object.\r\n    802 \"\"\"\r\n    803 with theme_context(self._theme_with_defaults()):\r\n--> 804     return self._plot(pyplot)\r\n\r\nFile ~/code/seaborn/seaborn/_core/plot.py:822, in Plot._plot(self, pyplot)\r\n    819 plotter._setup_scales(self, common, layers, coord_vars)\r\n    821 # Apply statistical transform(s)\r\n--> 822 plotter._compute_stats(self, layers)\r\n    824 # Process scale spec for semantic variables and coordinates computed by stat\r\n    825 plotter._setup_scales(self, common, layers)\r\n\r\nFile ~/code/seaborn/seaborn/_core/plot.py:1110, in Plotter._compute_stats(self, spec, layers)\r\n   1108     grouper = grouping_vars\r\n   1109 groupby = GroupBy(grouper)\r\n-> 1110 res = stat(df, groupby, orient, scales)\r\n   1112 if pair_vars:\r\n   1113     data.frames[coord_vars] = res\r\n\r\nFile ~/code/seaborn/seaborn/_stats/regression.py:41, in PolyFit.__call__(self, data, groupby, orient, scales)\r\n     39 
def __call__(self, data, groupby, orient, scales):\r\n---> 41     return groupby.apply(data, self._fit_predict)\r\n\r\nFile ~/code/seaborn/seaborn/_core/groupby.py:109, in GroupBy.apply(self, data, func, *args, **kwargs)\r\n    106 grouper, groups = self._get_groups(data)\r\n    108 if not grouper:\r\n--> 109     return self._reorder_columns(func(data, *args, **kwargs), data)\r\n    111 parts = {}\r\n    112 for key, part_df in data.groupby(grouper, sort=False):\r\n\r\nFile ~/code/seaborn/seaborn/_stats/regression.py:30, in PolyFit._fit_predict(self, data)\r\n     28     xx = yy = []\r\n     29 else:\r\n---> 30     p = np.polyfit(x, y, self.order)\r\n     31     xx = np.linspace(x.min(), x.max(), self.gridsize)\r\n     32     yy = np.polyval(p, xx)\r\n\r\nFile <__array_function__ internals>:180, in polyfit(*args, **kwargs)\r\n\r\nFile ~/miniconda3/envs/seaborn-py39-latest/lib/python3.9/site-packages/numpy/lib/polynomial.py:668, in polyfit(x, y, deg, rcond, full, w, cov)\r\n    666 scale = NX.sqrt((lhs*lhs).sum(axis=0))\r\n    667 lhs /= scale\r\n--> 668 c, resids, rank, s = lstsq(lhs, rhs, rcond)\r\n    669 c = (c.T/scale).T  # broadcast scale coefficients\r\n    671 # warn on rank reduction, which indicates an ill conditioned matrix\r\n\r\nFile <__array_function__ internals>:180, in lstsq(*args, **kwargs)\r\n\r\nFile ~/miniconda3/envs/seaborn-py39-latest/lib/python3.9/site-packages/numpy/linalg/linalg.py:2300, in lstsq(a, b, rcond)\r\n   2297 if n_rhs == 0:\r\n   2298     # lapack can't handle n_rhs = 0 - so allocate the array one larger in that axis\r\n   2299     b = zeros(b.shape[:-2] + (m, n_rhs + 1), dtype=b.dtype)\r\n-> 2300 x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)\r\n   2301 if m == 0:\r\n   2302     x[...] 
= 0\r\n\r\nFile ~/miniconda3/envs/seaborn-py39-latest/lib/python3.9/site-packages/numpy/linalg/linalg.py:101, in _raise_linalgerror_lstsq(err, flag)\r\n    100 def _raise_linalgerror_lstsq(err, flag):\r\n--> 101     raise LinAlgError(\"SVD did not converge in Linear Least Squares\")\r\n\r\nLinAlgError: SVD did not converge in Linear Least Squares\r\n\r\n```\r\n\r\n</details>\n",
  "patch": "diff --git a/seaborn/_stats/regression.py b/seaborn/_stats/regression.py\n--- a/seaborn/_stats/regression.py\n+++ b/seaborn/_stats/regression.py\n@@ -38,7 +38,10 @@ def _fit_predict(self, data):\n \n     def __call__(self, data, groupby, orient, scales):\n \n-        return groupby.apply(data, self._fit_predict)\n+        return (\n+            groupby\n+            .apply(data.dropna(subset=[\"x\", \"y\"]), self._fit_predict)\n+        )\n \n \n @dataclass\n",
  "similar_bug_items": [
    {
      "pr_number": 2477,
      "pr_title": "Fix histplot shrink with non-discrete bins",
      "pr_body": "Fixes #2476\r\n\r\nThe code for shifting the shrunken bars assumed that discrete binning\r\nwas in effect. This is probably the only situation where shrinking\r\nreally makes sense, but there was no prevention or warning of getting\r\nan innacurate result when using it with continuous bins.\r\n\r\nIt works properly now:\r\n\r\n```python\r\nsns.histplot(data=tips, x=\"total_bill\", binwidth=8)\r\nsns.histplot(data=tips, x=\"total_bill\", binwidth=8, shrink=.6)\r\n```\r\n![image](https://user-images.githubusercontent.com/315810/107373188-3e7d7900-6ab4-11eb-9a35-821bd76fbfd0.png)\r\n\r\n```python\r\nsns.histplot(data=tips, x=\"total_bill\", binwidth=8, color=\".6\")\r\nsns.histplot(data=tips, x=\"total_bill\", hue=\"time\", multiple=\"dodge\", binwidth=8, shrink=.6)\r\n```\r\n![image](https://user-images.githubusercontent.com/315810/107373289-610f9200-6ab4-11eb-990d-727132a53526.png)\r\n",
      "issue_id": 2476,
      "issue_title": "shrink parameter in histplot shifts data",
      "issue_body": "The smaller the value of the `shrink` parameter, the more the values in the histogram get shifted towards positive values.\r\n\r\n`import seaborn as sns`\r\n`import numpy as np`\r\n\r\n`r = np.random.random(100)`\r\n\r\n\r\n`sns.histplot(r);`\r\n\r\n`sns.histplot(r, shrink=0.5);`\r\n\r\n![Screen Shot 2021-02-08 at 10 11 43 PM](https://user-images.githubusercontent.com/35338267/107310557-aacb8e80-6a5a-11eb-991e-d4d4cbfd15b7.png)\r\n\r\n![Screen Shot 2021-02-08 at 10 15 46 PM](https://user-images.githubusercontent.com/35338267/107310839-36ddb600-6a5b-11eb-9f0d-54ae495f14cc.png)\r\n",
      "issue_closed_at": "2021-02-10T00:01:13Z",
      "base_commit": "b1dc1bc336ca2aec8308915836ec0550397e856e",
      "changes": [
        {
          "file": "seaborn/distributions.py",
          "type": "function",
          "name": "plot_univariate_histogram",
          "class_name": "_DistributionPlotter",
          "code": "def plot_univariate_histogram(\n        self,\n        multiple,\n        element,\n        fill,\n        common_norm,\n        common_bins,\n        shrink,\n        kde,\n        kde_kws,\n        color,\n        legend,\n        line_kws,\n        estimate_kws,\n        **plot_kws,\n    ):\n\n        # -- Default keyword dicts\n        kde_kws = {} if kde_kws is None else kde_kws.copy()\n        line_kws = {} if line_kws is None else line_kws.copy()\n        estimate_kws = {} if estimate_kws is None else estimate_kws.copy()\n\n        # --  Input checking\n        _check_argument(\"multiple\", [\"layer\", \"stack\", \"fill\", \"dodge\"], multiple)\n        _check_argument(\"element\", [\"bars\", \"step\", \"poly\"], element)\n\n        if estimate_kws[\"discrete\"] and element != \"bars\":\n            raise ValueError(\"`element` must be 'bars' when `discrete` is True\")\n\n        auto_bins_with_weights = (\n            \"weights\" in self.variables\n            and estimate_kws[\"bins\"] == \"auto\"\n            and estimate_kws[\"binwidth\"] is None\n            and not estimate_kws[\"discrete\"]\n        )\n        if auto_bins_with_weights:\n            msg = (\n                \"`bins` cannot be 'auto' when using weights. 
\"\n                \"Setting `bins=10`, but you will likely want to adjust.\"\n            )\n            warnings.warn(msg, UserWarning)\n            estimate_kws[\"bins\"] = 10\n\n        # Simplify downstream code if we are not normalizing\n        if estimate_kws[\"stat\"] == \"count\":\n            common_norm = False\n\n        # Now initialize the Histogram estimator\n        estimator = Histogram(**estimate_kws)\n        histograms = {}\n\n        # Do pre-compute housekeeping related to multiple groups\n        # TODO best way to account for facet/semantic?\n        if set(self.variables) - {\"x\", \"y\"}:\n\n            all_data = self.comp_data.dropna()\n\n            if common_bins:\n                all_observations = all_data[self.data_variable]\n                estimator.define_bin_edges(\n                    all_observations,\n                    weights=all_data.get(\"weights\", None),\n                )\n\n        else:\n            common_norm = False\n\n        # Estimate the smoothed kernel densities, for use later\n        if kde:\n            # TODO alternatively, clip at min/max bins?\n            kde_kws.setdefault(\"cut\", 0)\n            kde_kws[\"cumulative\"] = estimate_kws[\"cumulative\"]\n            log_scale = self._log_scaled(self.data_variable)\n            densities = self._compute_univariate_density(\n                self.data_variable,\n                common_norm,\n                common_bins,\n                kde_kws,\n                log_scale,\n            )\n\n        # First pass through the data to compute the histograms\n        for sub_vars, sub_data in self.iter_data(\"hue\", from_comp_data=True):\n\n            # Prepare the relevant data\n            key = tuple(sub_vars.items())\n            observations = sub_data[self.data_variable]\n\n            if \"weights\" in self.variables:\n                weights = sub_data[\"weights\"]\n            else:\n                weights = None\n\n            # Do the histogram 
computation\n            heights, edges = estimator(observations, weights=weights)\n\n            # Rescale the smoothed curve to match the histogram\n            if kde and key in densities:\n                density = densities[key]\n                if estimator.cumulative:\n                    hist_norm = heights.max()\n                else:\n                    hist_norm = (heights * np.diff(edges)).sum()\n                densities[key] *= hist_norm\n\n            # Convert edges back to original units for plotting\n            if self._log_scaled(self.data_variable):\n                edges = np.power(10, edges)\n\n            # Pack the histogram data and metadata together\n            index = pd.MultiIndex.from_arrays([\n                pd.Index(edges[:-1], name=\"edges\"),\n                pd.Index(np.diff(edges) * shrink, name=\"widths\"),\n            ])\n            hist = pd.Series(heights, index=index, name=\"heights\")\n\n            # Apply scaling to normalize across groups\n            if common_norm:\n                hist *= len(sub_data) / len(all_data)\n\n            # Store the finalized histogram data for future plotting\n            histograms[key] = hist\n\n        # Modify the histogram and density data to resolve multiple groups\n        histograms, baselines = self._resolve_multiple(histograms, multiple)\n        if kde:\n            densities, _ = self._resolve_multiple(\n                densities, None if multiple == \"dodge\" else multiple\n            )\n\n        # Set autoscaling-related meta\n        sticky_stat = (0, 1) if multiple == \"fill\" else (0, np.inf)\n        if multiple == \"fill\":\n            # Filled plots should not have any margins\n            bin_vals = histograms.index.to_frame()\n            edges = bin_vals[\"edges\"]\n            widths = bin_vals[\"widths\"]\n            sticky_data = (\n                edges.min(),\n                edges.max() + widths.loc[edges.idxmax()]\n            )\n        else:\n      
      sticky_data = []\n\n        # --- Handle default visual attributes\n\n        # Note: default linewidth is determined after plotting\n\n        # Default alpha should depend on other parameters\n        if fill:\n            # Note: will need to account for other grouping semantics if added\n            if \"hue\" in self.variables and multiple == \"layer\":\n                default_alpha = .5 if element == \"bars\" else .25\n            elif kde:\n                default_alpha = .5\n            else:\n                default_alpha = .75\n        else:\n            default_alpha = 1\n        alpha = plot_kws.pop(\"alpha\", default_alpha)  # TODO make parameter?\n\n        hist_artists = []\n\n        # Go back through the dataset and draw the plots\n        for sub_vars, _ in self.iter_data(\"hue\", reverse=True):\n\n            key = tuple(sub_vars.items())\n            hist = histograms[key].rename(\"heights\").reset_index()\n            bottom = np.asarray(baselines[key])\n\n            ax = self._get_axes(sub_vars)\n\n            # Define the matplotlib attributes that depend on semantic mapping\n            if \"hue\" in self.variables:\n                sub_color = self._hue_map(sub_vars[\"hue\"])\n            else:\n                sub_color = color\n\n            artist_kws = self._artist_kws(\n                plot_kws, fill, element, multiple, sub_color, alpha\n            )\n\n            if element == \"bars\":\n\n                # Use matplotlib bar plotting\n\n                plot_func = ax.bar if self.data_variable == \"x\" else ax.barh\n                move = .5 * (1 - shrink)\n                artists = plot_func(\n                    hist[\"edges\"] + move,\n                    hist[\"heights\"] - bottom,\n                    hist[\"widths\"],\n                    bottom,\n                    align=\"edge\",\n                    **artist_kws,\n                )\n\n                for bar in artists:\n                    if self.data_variable == 
\"x\":\n                        bar.sticky_edges.x[:] = sticky_data\n                        bar.sticky_edges.y[:] = sticky_stat\n                    else:\n                        bar.sticky_edges.x[:] = sticky_stat\n                        bar.sticky_edges.y[:] = sticky_data\n\n                hist_artists.extend(artists)\n\n            else:\n\n                # Use either fill_between or plot to draw hull of histogram\n                if element == \"step\":\n\n                    final = hist.iloc[-1]\n                    x = np.append(hist[\"edges\"], final[\"edges\"] + final[\"widths\"])\n                    y = np.append(hist[\"heights\"], final[\"heights\"])\n                    b = np.append(bottom, bottom[-1])\n\n                    if self.data_variable == \"x\":\n                        step = \"post\"\n                        drawstyle = \"steps-post\"\n                    else:\n                        step = \"post\"  # fillbetweenx handles mapping internally\n                        drawstyle = \"steps-pre\"\n\n                elif element == \"poly\":\n\n                    x = hist[\"edges\"] + hist[\"widths\"] / 2\n                    y = hist[\"heights\"]\n                    b = bottom\n\n                    step = None\n                    drawstyle = None\n\n                if self.data_variable == \"x\":\n                    if fill:\n                        artist = ax.fill_between(x, b, y, step=step, **artist_kws)\n                    else:\n                        artist, = ax.plot(x, y, drawstyle=drawstyle, **artist_kws)\n                    artist.sticky_edges.x[:] = sticky_data\n                    artist.sticky_edges.y[:] = sticky_stat\n                else:\n                    if fill:\n                        artist = ax.fill_betweenx(x, b, y, step=step, **artist_kws)\n                    else:\n                        artist, = ax.plot(y, x, drawstyle=drawstyle, **artist_kws)\n                    artist.sticky_edges.x[:] = 
sticky_stat\n                    artist.sticky_edges.y[:] = sticky_data\n\n                hist_artists.append(artist)\n\n            if kde:\n\n                # Add in the density curves\n\n                try:\n                    density = densities[key]\n                except KeyError:\n                    continue\n                support = density.index\n\n                if \"x\" in self.variables:\n                    line_args = support, density\n                    sticky_x, sticky_y = None, (0, np.inf)\n                else:\n                    line_args = density, support\n                    sticky_x, sticky_y = (0, np.inf), None\n\n                line_kws[\"color\"] = to_rgba(sub_color, 1)\n                line, = ax.plot(\n                    *line_args, **line_kws,\n                )\n\n                if sticky_x is not None:\n                    line.sticky_edges.x[:] = sticky_x\n                if sticky_y is not None:\n                    line.sticky_edges.y[:] = sticky_y\n\n        if element == \"bars\" and \"linewidth\" not in plot_kws:\n\n            # Now we handle linewidth, which depends on the scaling of the plot\n\n            # Loop through subsets based only on facet variables\n            for sub_vars, _ in self.iter_data():\n\n                ax = self._get_axes(sub_vars)\n\n                # Needed in some cases to get valid transforms.\n                # Innocuous in other cases?\n                ax.autoscale_view()\n\n                # We will base everything on the minimum bin width\n                hist_metadata = [h.index.to_frame() for _, h in histograms.items()]\n                binwidth = min([\n                    h[\"widths\"].min() for h in hist_metadata\n                ])\n\n                # Convert binwidth from data coordinates to pixels\n                pts_x, pts_y = 72 / ax.figure.dpi * (\n                    ax.transData.transform([binwidth, binwidth])\n                    - ax.transData.transform([0, 0])\n  
              )\n                if self.data_variable == \"x\":\n                    binwidth_points = pts_x\n                else:\n                    binwidth_points = pts_y\n\n                # The relative size of the lines depends on the appearance\n                # This is a provisional value and may need more tweaking\n                default_linewidth = .1 * binwidth_points\n\n                # Set the attributes\n                for bar in hist_artists:\n\n                    # Don't let the lines get too thick\n                    max_linewidth = bar.get_linewidth()\n                    if not fill:\n                        max_linewidth *= 1.5\n\n                    linewidth = min(default_linewidth, max_linewidth)\n\n                    # If not filling, don't let lines dissapear\n                    if not fill:\n                        min_linewidth = .5\n                        linewidth = max(linewidth, min_linewidth)\n\n                    bar.set_linewidth(linewidth)\n\n        # --- Finalize the plot ----\n\n        # Axis labels\n        ax = self.ax if self.ax is not None else self.facets.axes.flat[0]\n        default_x = default_y = \"\"\n        if self.data_variable == \"x\":\n            default_y = estimator.stat.capitalize()\n        if self.data_variable == \"y\":\n            default_x = estimator.stat.capitalize()\n        self._add_axis_labels(ax, default_x, default_y)\n\n        # Legend for semantic variables\n        if \"hue\" in self.variables and legend:\n\n            if fill or element == \"bars\":\n                artist = partial(mpl.patches.Patch)\n            else:\n                artist = partial(mpl.lines.Line2D, [], [])\n\n            ax_obj = self.ax if self.ax is not None else self.facets\n            self._add_legend(\n                ax_obj, artist, fill, element, multiple, alpha, plot_kws, {},\n            )"
        },
        {
          "file": "seaborn/distributions.py",
          "type": "function",
          "name": "plot_univariate_histogram",
          "class_name": "_DistributionPlotter",
          "code": "def plot_univariate_histogram(\n        self,\n        multiple,\n        element,\n        fill,\n        common_norm,\n        common_bins,\n        shrink,\n        kde,\n        kde_kws,\n        color,\n        legend,\n        line_kws,\n        estimate_kws,\n        **plot_kws,\n    ):\n\n        # -- Default keyword dicts\n        kde_kws = {} if kde_kws is None else kde_kws.copy()\n        line_kws = {} if line_kws is None else line_kws.copy()\n        estimate_kws = {} if estimate_kws is None else estimate_kws.copy()\n\n        # --  Input checking\n        _check_argument(\"multiple\", [\"layer\", \"stack\", \"fill\", \"dodge\"], multiple)\n        _check_argument(\"element\", [\"bars\", \"step\", \"poly\"], element)\n\n        if estimate_kws[\"discrete\"] and element != \"bars\":\n            raise ValueError(\"`element` must be 'bars' when `discrete` is True\")\n\n        auto_bins_with_weights = (\n            \"weights\" in self.variables\n            and estimate_kws[\"bins\"] == \"auto\"\n            and estimate_kws[\"binwidth\"] is None\n            and not estimate_kws[\"discrete\"]\n        )\n        if auto_bins_with_weights:\n            msg = (\n                \"`bins` cannot be 'auto' when using weights. 
\"\n                \"Setting `bins=10`, but you will likely want to adjust.\"\n            )\n            warnings.warn(msg, UserWarning)\n            estimate_kws[\"bins\"] = 10\n\n        # Simplify downstream code if we are not normalizing\n        if estimate_kws[\"stat\"] == \"count\":\n            common_norm = False\n\n        # Now initialize the Histogram estimator\n        estimator = Histogram(**estimate_kws)\n        histograms = {}\n\n        # Do pre-compute housekeeping related to multiple groups\n        # TODO best way to account for facet/semantic?\n        if set(self.variables) - {\"x\", \"y\"}:\n\n            all_data = self.comp_data.dropna()\n\n            if common_bins:\n                all_observations = all_data[self.data_variable]\n                estimator.define_bin_edges(\n                    all_observations,\n                    weights=all_data.get(\"weights\", None),\n                )\n\n        else:\n            common_norm = False\n\n        # Estimate the smoothed kernel densities, for use later\n        if kde:\n            # TODO alternatively, clip at min/max bins?\n            kde_kws.setdefault(\"cut\", 0)\n            kde_kws[\"cumulative\"] = estimate_kws[\"cumulative\"]\n            log_scale = self._log_scaled(self.data_variable)\n            densities = self._compute_univariate_density(\n                self.data_variable,\n                common_norm,\n                common_bins,\n                kde_kws,\n                log_scale,\n            )\n\n        # First pass through the data to compute the histograms\n        for sub_vars, sub_data in self.iter_data(\"hue\", from_comp_data=True):\n\n            # Prepare the relevant data\n            key = tuple(sub_vars.items())\n            observations = sub_data[self.data_variable]\n\n            if \"weights\" in self.variables:\n                weights = sub_data[\"weights\"]\n            else:\n                weights = None\n\n            # Do the histogram 
computation\n            heights, edges = estimator(observations, weights=weights)\n\n            # Rescale the smoothed curve to match the histogram\n            if kde and key in densities:\n                density = densities[key]\n                if estimator.cumulative:\n                    hist_norm = heights.max()\n                else:\n                    hist_norm = (heights * np.diff(edges)).sum()\n                densities[key] *= hist_norm\n\n            # Convert edges back to original units for plotting\n            if self._log_scaled(self.data_variable):\n                edges = np.power(10, edges)\n\n            # Pack the histogram data and metadata together\n            index = pd.MultiIndex.from_arrays([\n                pd.Index(edges[:-1], name=\"edges\"),\n                pd.Index(np.diff(edges) * shrink, name=\"widths\"),\n            ])\n            hist = pd.Series(heights, index=index, name=\"heights\")\n\n            # Apply scaling to normalize across groups\n            if common_norm:\n                hist *= len(sub_data) / len(all_data)\n\n            # Store the finalized histogram data for future plotting\n            histograms[key] = hist\n\n        # Modify the histogram and density data to resolve multiple groups\n        histograms, baselines = self._resolve_multiple(histograms, multiple)\n        if kde:\n            densities, _ = self._resolve_multiple(\n                densities, None if multiple == \"dodge\" else multiple\n            )\n\n        # Set autoscaling-related meta\n        sticky_stat = (0, 1) if multiple == \"fill\" else (0, np.inf)\n        if multiple == \"fill\":\n            # Filled plots should not have any margins\n            bin_vals = histograms.index.to_frame()\n            edges = bin_vals[\"edges\"]\n            widths = bin_vals[\"widths\"]\n            sticky_data = (\n                edges.min(),\n                edges.max() + widths.loc[edges.idxmax()]\n            )\n        else:\n      
      sticky_data = []\n\n        # --- Handle default visual attributes\n\n        # Note: default linewidth is determined after plotting\n\n        # Default alpha should depend on other parameters\n        if fill:\n            # Note: will need to account for other grouping semantics if added\n            if \"hue\" in self.variables and multiple == \"layer\":\n                default_alpha = .5 if element == \"bars\" else .25\n            elif kde:\n                default_alpha = .5\n            else:\n                default_alpha = .75\n        else:\n            default_alpha = 1\n        alpha = plot_kws.pop(\"alpha\", default_alpha)  # TODO make parameter?\n\n        hist_artists = []\n\n        # Go back through the dataset and draw the plots\n        for sub_vars, _ in self.iter_data(\"hue\", reverse=True):\n\n            key = tuple(sub_vars.items())\n            hist = histograms[key].rename(\"heights\").reset_index()\n            bottom = np.asarray(baselines[key])\n\n            ax = self._get_axes(sub_vars)\n\n            # Define the matplotlib attributes that depend on semantic mapping\n            if \"hue\" in self.variables:\n                sub_color = self._hue_map(sub_vars[\"hue\"])\n            else:\n                sub_color = color\n\n            artist_kws = self._artist_kws(\n                plot_kws, fill, element, multiple, sub_color, alpha\n            )\n\n            if element == \"bars\":\n\n                # Use matplotlib bar plotting\n\n                plot_func = ax.bar if self.data_variable == \"x\" else ax.barh\n                move = .5 * (1 - shrink)\n                artists = plot_func(\n                    hist[\"edges\"] + move,\n                    hist[\"heights\"] - bottom,\n                    hist[\"widths\"],\n                    bottom,\n                    align=\"edge\",\n                    **artist_kws,\n                )\n\n                for bar in artists:\n                    if self.data_variable == 
\"x\":\n                        bar.sticky_edges.x[:] = sticky_data\n                        bar.sticky_edges.y[:] = sticky_stat\n                    else:\n                        bar.sticky_edges.x[:] = sticky_stat\n                        bar.sticky_edges.y[:] = sticky_data\n\n                hist_artists.extend(artists)\n\n            else:\n\n                # Use either fill_between or plot to draw hull of histogram\n                if element == \"step\":\n\n                    final = hist.iloc[-1]\n                    x = np.append(hist[\"edges\"], final[\"edges\"] + final[\"widths\"])\n                    y = np.append(hist[\"heights\"], final[\"heights\"])\n                    b = np.append(bottom, bottom[-1])\n\n                    if self.data_variable == \"x\":\n                        step = \"post\"\n                        drawstyle = \"steps-post\"\n                    else:\n                        step = \"post\"  # fillbetweenx handles mapping internally\n                        drawstyle = \"steps-pre\"\n\n                elif element == \"poly\":\n\n                    x = hist[\"edges\"] + hist[\"widths\"] / 2\n                    y = hist[\"heights\"]\n                    b = bottom\n\n                    step = None\n                    drawstyle = None\n\n                if self.data_variable == \"x\":\n                    if fill:\n                        artist = ax.fill_between(x, b, y, step=step, **artist_kws)\n                    else:\n                        artist, = ax.plot(x, y, drawstyle=drawstyle, **artist_kws)\n                    artist.sticky_edges.x[:] = sticky_data\n                    artist.sticky_edges.y[:] = sticky_stat\n                else:\n                    if fill:\n                        artist = ax.fill_betweenx(x, b, y, step=step, **artist_kws)\n                    else:\n                        artist, = ax.plot(y, x, drawstyle=drawstyle, **artist_kws)\n                    artist.sticky_edges.x[:] = 
sticky_stat\n                    artist.sticky_edges.y[:] = sticky_data\n\n                hist_artists.append(artist)\n\n            if kde:\n\n                # Add in the density curves\n\n                try:\n                    density = densities[key]\n                except KeyError:\n                    continue\n                support = density.index\n\n                if \"x\" in self.variables:\n                    line_args = support, density\n                    sticky_x, sticky_y = None, (0, np.inf)\n                else:\n                    line_args = density, support\n                    sticky_x, sticky_y = (0, np.inf), None\n\n                line_kws[\"color\"] = to_rgba(sub_color, 1)\n                line, = ax.plot(\n                    *line_args, **line_kws,\n                )\n\n                if sticky_x is not None:\n                    line.sticky_edges.x[:] = sticky_x\n                if sticky_y is not None:\n                    line.sticky_edges.y[:] = sticky_y\n\n        if element == \"bars\" and \"linewidth\" not in plot_kws:\n\n            # Now we handle linewidth, which depends on the scaling of the plot\n\n            # Loop through subsets based only on facet variables\n            for sub_vars, _ in self.iter_data():\n\n                ax = self._get_axes(sub_vars)\n\n                # Needed in some cases to get valid transforms.\n                # Innocuous in other cases?\n                ax.autoscale_view()\n\n                # We will base everything on the minimum bin width\n                hist_metadata = [h.index.to_frame() for _, h in histograms.items()]\n                binwidth = min([\n                    h[\"widths\"].min() for h in hist_metadata\n                ])\n\n                # Convert binwidth from data coordinates to pixels\n                pts_x, pts_y = 72 / ax.figure.dpi * (\n                    ax.transData.transform([binwidth, binwidth])\n                    - ax.transData.transform([0, 0])\n  
              )\n                if self.data_variable == \"x\":\n                    binwidth_points = pts_x\n                else:\n                    binwidth_points = pts_y\n\n                # The relative size of the lines depends on the appearance\n                # This is a provisional value and may need more tweaking\n                default_linewidth = .1 * binwidth_points\n\n                # Set the attributes\n                for bar in hist_artists:\n\n                    # Don't let the lines get too thick\n                    max_linewidth = bar.get_linewidth()\n                    if not fill:\n                        max_linewidth *= 1.5\n\n                    linewidth = min(default_linewidth, max_linewidth)\n\n                    # If not filling, don't let lines dissapear\n                    if not fill:\n                        min_linewidth = .5\n                        linewidth = max(linewidth, min_linewidth)\n\n                    bar.set_linewidth(linewidth)\n\n        # --- Finalize the plot ----\n\n        # Axis labels\n        ax = self.ax if self.ax is not None else self.facets.axes.flat[0]\n        default_x = default_y = \"\"\n        if self.data_variable == \"x\":\n            default_y = estimator.stat.capitalize()\n        if self.data_variable == \"y\":\n            default_x = estimator.stat.capitalize()\n        self._add_axis_labels(ax, default_x, default_y)\n\n        # Legend for semantic variables\n        if \"hue\" in self.variables and legend:\n\n            if fill or element == \"bars\":\n                artist = partial(mpl.patches.Patch)\n            else:\n                artist = partial(mpl.lines.Line2D, [], [])\n\n            ax_obj = self.ax if self.ax is not None else self.facets\n            self._add_legend(\n                ax_obj, artist, fill, element, multiple, alpha, plot_kws, {},\n            )"
        }
      ]
    },
    {
      "pr_number": 2504,
      "pr_title": "Fix log scaling in distribution plots",
      "pr_body": "Fixes #2502 \r\n\r\nThis is a huge development footgun; see #2409 for thoughts on how this can be made automatic to reduce the risk of such bugs",
      "issue_id": 2502,
      "issue_title": "displot(kind='ecdf',..., log_scale=True) not working",
      "issue_body": "The following line of code gives an error:\r\n\r\n```\r\nsns.displot(kind='ecdf', data=df, x='col_1', log_scale=True)\r\n\r\nUserWarning: Data has no positive values, and therefore cannot be log-scaled.\r\n```\r\n\r\n\r\nMy data is all positive and kind='hist' or 'kde' works just fine.\r\n\r\n",
      "issue_closed_at": "2021-03-24T21:54:09Z",
      "base_commit": "ba4bd0fa0a90b2bd00cb62c2b4a5e38013a73ac6",
      "changes": [
        {
          "file": "seaborn/distributions.py",
          "type": "line",
          "name": "line 57",
          "code": "    \"\"\",\n    log_scale=\"\"\"\nlog_scale : bool or number, or pair of bools or numbers\n    Set a log scale on the data axis (or axes, with bivariate data) with the\n    given base (default 10), and evaluate the KDE in log space.\n    \"\"\",\n    legend=\"\"\"\nlegend : bool"
        },
        {
          "file": "seaborn/distributions.py",
          "type": "function",
          "name": "plot_univariate_ecdf",
          "class_name": "_DistributionPlotter",
          "code": "def plot_univariate_ecdf(self, estimate_kws, legend, **plot_kws):\n\n        estimator = ECDF(**estimate_kws)\n\n        # Set the draw style to step the right way for the data variable\n        drawstyles = dict(x=\"steps-post\", y=\"steps-pre\")\n        plot_kws[\"drawstyle\"] = drawstyles[self.data_variable]\n\n        # Loop through the subsets, transform and plot the data\n        for sub_vars, sub_data in self.iter_data(\n            \"hue\", reverse=True, from_comp_data=True,\n        ):\n\n            # Compute the ECDF\n            if sub_data.empty:\n                continue\n\n            observations = sub_data[self.data_variable]\n            weights = sub_data.get(\"weights\", None)\n            stat, vals = estimator(observations, weights=weights)\n\n            # Assign attributes based on semantic mapping\n            artist_kws = plot_kws.copy()\n            if \"hue\" in self.variables:\n                artist_kws[\"color\"] = self._hue_map(sub_vars[\"hue\"])\n\n            # Work out the orientation of the plot\n            if self.data_variable == \"x\":\n                plot_args = vals, stat\n                stat_variable = \"y\"\n            else:\n                plot_args = stat, vals\n                stat_variable = \"x\"\n\n            if estimator.stat == \"count\":\n                top_edge = len(observations)\n            else:\n                top_edge = 1\n\n            # Draw the line for this subset\n            ax = self._get_axes(sub_vars)\n            artist, = ax.plot(*plot_args, **artist_kws)\n            sticky_edges = getattr(artist.sticky_edges, stat_variable)\n            sticky_edges[:] = 0, top_edge\n\n        # --- Finalize the plot ----\n        ax = self.ax if self.ax is not None else self.facets.axes.flat[0]\n        stat = estimator.stat.capitalize()\n        default_x = default_y = \"\"\n        if self.data_variable == \"x\":\n            default_y = stat\n        if self.data_variable == 
\"y\":\n            default_x = stat\n        self._add_axis_labels(ax, default_x, default_y)\n\n        if \"hue\" in self.variables and legend:\n            artist = partial(mpl.lines.Line2D, [], [])\n            alpha = plot_kws.get(\"alpha\", 1)\n            ax_obj = self.ax if self.ax is not None else self.facets\n            self._add_legend(\n                ax_obj, artist, False, False, None, alpha, plot_kws, {},\n            )"
        },
        {
          "file": "seaborn/distributions.py",
          "type": "function",
          "name": "_plot_single_rug",
          "class_name": "_DistributionPlotter",
          "code": "def _plot_single_rug(self, sub_data, var, height, ax, kws):\n        \"\"\"Draw a rugplot along one axis of the plot.\"\"\"\n        vector = sub_data[var]\n        n = len(vector)\n\n        # We'll always add a single collection with varying colors\n        if \"hue\" in self.variables:\n            colors = self._hue_map(sub_data[\"hue\"])\n        else:\n            colors = None\n\n        # Build the array of values for the LineCollection\n        if var == \"x\":\n\n            trans = tx.blended_transform_factory(ax.transData, ax.transAxes)\n            xy_pairs = np.column_stack([\n                np.repeat(vector, 2), np.tile([0, height], n)\n            ])\n\n        if var == \"y\":\n\n            trans = tx.blended_transform_factory(ax.transAxes, ax.transData)\n            xy_pairs = np.column_stack([\n                np.tile([0, height], n), np.repeat(vector, 2)\n            ])\n\n        # Draw the lines on the plot\n        line_segs = xy_pairs.reshape([n, 2, 2])\n        ax.add_collection(LineCollection(\n            line_segs, transform=trans, colors=colors, **kws\n        ))\n\n        ax.autoscale_view(scalex=var == \"x\", scaley=var == \"y\")"
        }
      ]
    },
    {
      "pr_number": 2368,
      "pr_title": "Fix pairgrid off-diagonal plots with non-string column names",
      "pr_body": "Fixes #2307\r\n\r\nWorking reprex from original issue:\r\n\r\n![image](https://user-images.githubusercontent.com/315810/100743239-108c0200-33aa-11eb-8885-9c61d89b3acc.png)\r\n",
      "issue_id": 2307,
      "issue_title": "map_* methods of PairGrid broken in 0.11.0",
      "issue_body": "Hi,\r\n\r\nI discovered that the map_* methods of PairGrid seem to be broken in version 0.11.0 for user-defined functions. See the reproducible example below, where a corrfunc is defined to plot the Pearson correlation value on the lower plots. The function doesn't seem to get evaluated in version 0.11.0. When I pip install seaborn==0.10.1, I get the desired result. Plots from both cases are attached.\r\n\r\n```python\r\nimport numpy as np\r\nfrom scipy import stats\r\nimport pandas as pd\r\nimport seaborn as sns\r\nimport matplotlib.pyplot as plt\r\nsns.set(style=\"white\")\r\n\r\nmean = np.zeros(3)\r\ncov = np.random.uniform(.2, .4, (3, 3))\r\ncov += cov.T\r\ncov[np.diag_indices(3)] = 1\r\ndata = np.random.multivariate_normal(mean, cov, 100)\r\ndf = pd.DataFrame(data, columns=[\"X\", \"Y\", \"Z\"])\r\n\r\ndef corrfunc(x, y, **kws):\r\n    r, _ = stats.pearsonr(x, y)\r\n    ax = plt.gca()\r\n    ax.annotate(\"r = {:.2f}\".format(r),\r\n                xy=(.1, .9), xycoords=ax.transAxes)\r\n\r\ng = sns.PairGrid(df, palette=[\"red\"])\r\ng.map_upper(plt.scatter, s=10)\r\ng.map_diag(sns.distplot, kde=False)\r\ng.map_lower(sns.kdeplot, cmap=\"Blues_d\")\r\ng.map_lower(corrfunc)\r\nplt.show()\r\n```\r\n\r\n![seaborn-0-11-0](https://user-images.githubusercontent.com/3239171/94969718-380d3e00-04d1-11eb-821b-9aad80ec696e.png)\r\n\r\n![seaborn-0-10-1](https://user-images.githubusercontent.com/3239171/94969722-3a6f9800-04d1-11eb-8ef1-861d9beb1f26.png)\r\n\r\n\r\n\r\n",
      "issue_closed_at": "2020-12-01T20:48:45Z",
      "base_commit": "2717408b564994002fe08f72ba2dd7e1acf359b6",
      "changes": [
        {
          "file": "seaborn/axisgrid.py",
          "type": "function",
          "name": "__init__",
          "class_name": "JointGrid",
          "code": "def __init__(\n        self, *,\n        x=None, y=None,\n        data=None,\n        height=6, ratio=5, space=.2,\n        dropna=False, xlim=None, ylim=None, size=None, marginal_ticks=False,\n        hue=None, palette=None, hue_order=None, hue_norm=None,\n    ):\n        # Handle deprecations\n        if size is not None:\n            height = size\n            msg = (\"The `size` parameter has been renamed to `height`; \"\n                   \"please update your code.\")\n            warnings.warn(msg, UserWarning)\n\n        # Set up the subplot grid\n        f = plt.figure(figsize=(height, height))\n        gs = plt.GridSpec(ratio + 1, ratio + 1)\n\n        ax_joint = f.add_subplot(gs[1:, :-1])\n        ax_marg_x = f.add_subplot(gs[0, :-1], sharex=ax_joint)\n        ax_marg_y = f.add_subplot(gs[1:, -1], sharey=ax_joint)\n\n        self.fig = f\n        self.ax_joint = ax_joint\n        self.ax_marg_x = ax_marg_x\n        self.ax_marg_y = ax_marg_y\n\n        # Turn off tick visibility for the measure axis on the marginal plots\n        plt.setp(ax_marg_x.get_xticklabels(), visible=False)\n        plt.setp(ax_marg_y.get_yticklabels(), visible=False)\n        plt.setp(ax_marg_x.get_xticklabels(minor=True), visible=False)\n        plt.setp(ax_marg_y.get_yticklabels(minor=True), visible=False)\n\n        # Turn off the ticks on the density axis for the marginal plots\n        if not marginal_ticks:\n            plt.setp(ax_marg_x.yaxis.get_majorticklines(), visible=False)\n            plt.setp(ax_marg_x.yaxis.get_minorticklines(), visible=False)\n            plt.setp(ax_marg_y.xaxis.get_majorticklines(), visible=False)\n            plt.setp(ax_marg_y.xaxis.get_minorticklines(), visible=False)\n            plt.setp(ax_marg_x.get_yticklabels(), visible=False)\n            plt.setp(ax_marg_y.get_xticklabels(), visible=False)\n            plt.setp(ax_marg_x.get_yticklabels(minor=True), visible=False)\n            
plt.setp(ax_marg_y.get_xticklabels(minor=True), visible=False)\n            ax_marg_x.yaxis.grid(False)\n            ax_marg_y.xaxis.grid(False)\n\n        # Process the input variables\n        p = VectorPlotter(data=data, variables=dict(x=x, y=y, hue=hue))\n        plot_data = p.plot_data.loc[:, p.plot_data.notna().any()]\n\n        # Possibly drop NA\n        if dropna:\n            plot_data = plot_data.dropna()\n\n        def get_var(var):\n            vector = plot_data.get(var, None)\n            if vector is not None:\n                vector = vector.rename(p.variables.get(var, None))\n            return vector\n\n        self.x = get_var(\"x\")\n        self.y = get_var(\"y\")\n        self.hue = get_var(\"hue\")\n\n        for axis in \"xy\":\n            name = p.variables.get(axis, None)\n            if name is not None:\n                getattr(ax_joint, f\"set_{axis}label\")(name)\n\n        if xlim is not None:\n            ax_joint.set_xlim(xlim)\n        if ylim is not None:\n            ax_joint.set_ylim(ylim)\n\n        # Store the semantic mapping parameters for axes-level functions\n        self._hue_params = dict(palette=palette, hue_order=hue_order, hue_norm=hue_norm)\n\n        # Make the grid look nice\n        utils.despine(f)\n        if not marginal_ticks:\n            utils.despine(ax=ax_marg_x, left=True)\n            utils.despine(ax=ax_marg_y, bottom=True)\n        for axes in [ax_marg_x, ax_marg_y]:\n            for axis in [axes.xaxis, axes.yaxis]:\n                axis.label.set_visible(False)\n        f.tight_layout()\n        f.subplots_adjust(hspace=space, wspace=space)"
        },
        {
          "file": "seaborn/axisgrid.py",
          "type": "function",
          "name": "_map_diag_iter_hue",
          "class_name": "PairGrid",
          "code": "def _map_diag_iter_hue(self, func, **kwargs):\n        \"\"\"Put marginal plot on each diagonal axes, iterating over hue.\"\"\"\n        # Plot on each of the diagonal axes\n        fixed_color = kwargs.pop(\"color\", None)\n\n        for var, ax in zip(self.diag_vars, self.diag_axes):\n            hue_grouped = self.data[var].groupby(self.hue_vals)\n\n            plt.sca(ax)\n\n            for k, label_k in enumerate(self.hue_names):\n\n                # Attempt to get data for this level, allowing for empty\n                try:\n                    data_k = hue_grouped.get_group(label_k)\n                except KeyError:\n                    data_k = pd.Series([], dtype=float)\n\n                if fixed_color is None:\n                    color = self.palette[k]\n                else:\n                    color = fixed_color\n\n                if self._dropna:\n                    data_k = utils.remove_na(data_k)\n\n                if str(func.__module__).startswith(\"seaborn\"):\n                    func(x=data_k, label=label_k, color=color, **kwargs)\n                else:\n                    func(data_k, label=label_k, color=color, **kwargs)\n\n            self._clean_axis(ax)\n\n        self._add_axis_labels()\n\n        return self"
        },
        {
          "file": "seaborn/axisgrid.py",
          "type": "function",
          "name": "_plot_bivariate_iter_hue",
          "class_name": "PairGrid",
          "code": "def _plot_bivariate_iter_hue(self, x_var, y_var, ax, func, **kwargs):\n        \"\"\"Draw a bivariate plot while iterating over hue subsets.\"\"\"\n        plt.sca(ax)\n        if x_var == y_var:\n            axes_vars = [x_var]\n        else:\n            axes_vars = [x_var, y_var]\n\n        hue_grouped = self.data.groupby(self.hue_vals)\n        for k, label_k in enumerate(self.hue_names):\n\n            kws = kwargs.copy()\n\n            # Attempt to get data for this level, allowing for empty\n            try:\n                data_k = hue_grouped.get_group(label_k)\n            except KeyError:\n                data_k = pd.DataFrame(columns=axes_vars,\n                                      dtype=float)\n\n            if self._dropna:\n                data_k = data_k[axes_vars].dropna()\n\n            x = data_k[x_var]\n            y = data_k[y_var]\n\n            for kw, val_list in self.hue_kws.items():\n                kws[kw] = val_list[k]\n            kws.setdefault(\"color\", self.palette[k])\n            if self._hue_var is not None:\n                kws[\"label\"] = label_k\n\n            if str(func.__module__).startswith(\"seaborn\"):\n                func(x=x, y=y, **kws)\n            else:\n                func(x, y, **kws)\n\n        self._update_legend_data(ax)\n        self._clean_axis(ax)"
        }
      ]
    },
    {
      "pr_number": 2559,
      "pr_title": "Reduce redundant computation in distplot linewidth",
      "pr_body": "Fixes #2555\r\n\r\n* Move the `binwidth`, `thin_bar_idx`, and `left_edge` calculations out of the loop, since they are invariant across iterations\r\n* Call `set_linewidth` only once per bar, instead of setting every bar's linewidth once per facet\r\n\r\n## Some evidence this works\r\n\r\nI've run the following on this branch and on master:\r\n\r\n```python\r\nimport seaborn as sns\r\nimport matplotlib as mpl\r\nfrom setuptools_scm import get_version\r\n\r\n# To show commit\r\nprint(get_version(root='..', relative_to=sns.__file__))\r\n\r\ndiamonds = sns.load_dataset(\"diamonds\")\r\n```\r\n\r\n```python\r\n%%timeit\r\ng = sns.displot(diamonds, x=\"price\", row=\"cut\", col=\"color\")\r\n```\r\n\r\n```python\r\ng = sns.displot(diamonds, x=\"price\", row=\"cut\", col=\"color\")\r\nprint({rect.get_linewidth() for rect in g.fig.findobj(mpl.patches.Rectangle)})\r\n```\r\n\r\n### This branch\r\n\r\n```\r\n0.10.1.dev198+ga365acc\r\n4.08 s \u00b1 60.8 ms per loop (mean \u00b1 std. dev. of 7 runs, 1 loop each)\r\n{0.0, 0.37466112012984926}\r\n```\r\n\r\n### master\r\n\r\n```\r\n0.10.1.dev197+g66b4783\r\n5.03 s \u00b1 82.5 ms per loop (mean \u00b1 std. dev. of 7 runs, 1 loop each)\r\n{0.0, 0.37466112012984926}\r\n```\r\n\r\n## TODO\r\n\r\n- [x] Tests (manual?)",
      "issue_id": 2555,
      "issue_title": "linewidth calculation slow for histograms",
      "issue_body": "## Example\r\n\r\n```python\r\nimport pandas as pd\r\nimport seaborn as sns\r\n\r\ndiamonds = sns.load_dataset(\"diamonds\")\r\n```\r\n\r\n```python\r\n%%timeit\r\nsns.displot(diamonds, x=\"price\", row=\"cut\", col=\"color\")\r\n# 5.85 s \u00b1 157 ms per loop (mean \u00b1 std. dev. of 7 runs, 1 loop each)\r\n```\r\n\r\n```python\r\n%%timeit\r\nsns.displot(diamonds, x=\"price\", row=\"cut\", col=\"color\", linewidth=0.1)\r\n# 4.37 s \u00b1 81.8 ms per loop (mean \u00b1 std. dev. of 7 runs, 1 loop each)\r\n```\r\n\r\n## Description\r\n\r\nSetting the linewidth here takes roughly 25% of the computation time. I would note that in the case where I caught this, passing `linewidth` cut plot time in half. I believe the issue is due to inefficiencies here: https://github.com/mwaskom/seaborn/blob/66b478390c20089de7f9644ba9965ce5d4f973ff/seaborn/distributions.py#L638-L684\r\n\r\nOne cause is that line widths are being set multiple times on each object. In the loop linked above, line widths are calculated, and then set for all subplots (not just the subplot corresponding to this subset). We can see this is the case since all the rectangles have the same linewidth in the end:\r\n\r\n```python\r\nfrom itertools import chain\r\nimport matplotlib as mpl\r\n\r\nsnsfig = sns.displot(diamonds, x=\"price\", row=\"cut\", col=\"color\")\r\n\r\nchildren = chain.from_iterable(ax.get_children() for ax in snsfig.axes.flat)\r\nrects = filter(lambda x: isinstance(x, mpl.patches.Rectangle), children)\r\n\r\n{rect.get_linewidth() for rect in rects}\r\n```\r\n\r\n```\r\n{0.0, 0.37466112012984926}\r\n```\r\n\r\nSomething a little more dataset dependent is this expression:\r\n\r\nhttps://github.com/mwaskom/seaborn/blob/66b478390c20089de7f9644ba9965ce5d4f973ff/seaborn/distributions.py#L647-L650\r\n\r\n
Where `index.to_frame()` can take a very long time in some circumstances. I'm not sure exactly what these are, but for my own data [`%%snakeviz`](https://jiffyclub.github.io/snakeviz/) noted a lot of time spent doing this. But I believe `hist_metadata` should be the same for each iteration of `sub_vars`, so this could probably just be moved outside the loop.\r\n\r\n## Version info\r\n\r\nThis is using seaborn at commit 66b478390c20089de7f9644ba9965ce5d4f973ff, though I'd initially noticed this using the last release.\r\n\r\n<details>\r\n<summary> sinfo report </summary>\r\n\r\n\r\n```\r\n-----\r\nmatplotlib          3.4.1\r\npandas              1.2.4\r\nseaborn             0.12.0.dev0\r\nsinfo               0.3.1\r\nvega_datasets       0.9.0\r\n-----\r\nPIL                 8.2.0\r\nappnope             0.1.2\r\nbackcall            0.2.0\r\ncffi                1.14.5\r\ncycler              0.10.0\r\ncython_runtime      NA\r\ndateutil            2.8.1\r\ndecorator           4.4.2\r\nipykernel           5.3.4\r\nipython_genutils    0.2.0\r\njedi                0.17.0\r\nkiwisolver          1.3.1\r\nmatplotlib          3.4.1\r\nmpl_toolkits        NA\r\nnumexpr             2.7.3\r\nnumpy               1.20.2\r\npandas              1.2.4\r\nparso               0.8.2\r\npexpect             4.8.0\r\npickleshare         0.7.5\r\nprompt_toolkit      3.0.17\r\nptyprocess          0.7.0\r\npygments            2.8.1\r\npyparsing           2.4.7\r\npytz                2021.1\r\nscipy               1.6.2\r\nseaborn             0.12.0.dev0\r\nsinfo               0.3.1\r\nsix                 1.15.0\r\nsnakeviz            2.1.0\r\nstatsmodels         0.12.2\r\nstoremagic          NA\r\ntornado             6.1\r\ntraitlets           5.0.5\r\nvega_datasets       0.9.0\r\nwcwidth             0.2.5\r\nzmq                 20.0.0\r\n-----\r\n
IPython             7.22.0\r\njupyter_client      6.1.12\r\njupyter_core        4.7.1\r\nnotebook            6.3.0\r\n-----\r\nPython 3.8.8 (default, Apr 13 2021, 12:59:45) [Clang 10.0.0 ]\r\nmacOS-10.15.7-x86_64-i386-64bit\r\n16 logical CPU cores, i386\r\n-----\r\nSession information updated at 2021-04-16 16:25\r\n```\r\n\r\n</details>\r\n",
      "issue_closed_at": "2021-04-23T11:40:36Z",
      "base_commit": "e04b07eb3df135511e71e556c2bd34ef59ba08ba",
      "changes": [
        {
          "file": "seaborn/distributions.py",
          "type": "function",
          "name": "plot_univariate_histogram",
          "class_name": "_DistributionPlotter",
          "code": "def plot_univariate_histogram(\n        self,\n        multiple,\n        element,\n        fill,\n        common_norm,\n        common_bins,\n        shrink,\n        kde,\n        kde_kws,\n        color,\n        legend,\n        line_kws,\n        estimate_kws,\n        **plot_kws,\n    ):\n\n        # -- Default keyword dicts\n        kde_kws = {} if kde_kws is None else kde_kws.copy()\n        line_kws = {} if line_kws is None else line_kws.copy()\n        estimate_kws = {} if estimate_kws is None else estimate_kws.copy()\n\n        # --  Input checking\n        _check_argument(\"multiple\", [\"layer\", \"stack\", \"fill\", \"dodge\"], multiple)\n        _check_argument(\"element\", [\"bars\", \"step\", \"poly\"], element)\n\n        if estimate_kws[\"discrete\"] and element != \"bars\":\n            raise ValueError(\"`element` must be 'bars' when `discrete` is True\")\n\n        auto_bins_with_weights = (\n            \"weights\" in self.variables\n            and estimate_kws[\"bins\"] == \"auto\"\n            and estimate_kws[\"binwidth\"] is None\n            and not estimate_kws[\"discrete\"]\n        )\n        if auto_bins_with_weights:\n            msg = (\n                \"`bins` cannot be 'auto' when using weights. 
\"\n                \"Setting `bins=10`, but you will likely want to adjust.\"\n            )\n            warnings.warn(msg, UserWarning)\n            estimate_kws[\"bins\"] = 10\n\n        # Simplify downstream code if we are not normalizing\n        if estimate_kws[\"stat\"] == \"count\":\n            common_norm = False\n\n        # Now initialize the Histogram estimator\n        estimator = Histogram(**estimate_kws)\n        histograms = {}\n\n        # Do pre-compute housekeeping related to multiple groups\n        # TODO best way to account for facet/semantic?\n        if set(self.variables) - {\"x\", \"y\"}:\n\n            all_data = self.comp_data.dropna()\n\n            if common_bins:\n                all_observations = all_data[self.data_variable]\n                estimator.define_bin_edges(\n                    all_observations,\n                    weights=all_data.get(\"weights\", None),\n                )\n\n        else:\n            common_norm = False\n\n        # Estimate the smoothed kernel densities, for use later\n        if kde:\n            # TODO alternatively, clip at min/max bins?\n            kde_kws.setdefault(\"cut\", 0)\n            kde_kws[\"cumulative\"] = estimate_kws[\"cumulative\"]\n            log_scale = self._log_scaled(self.data_variable)\n            densities = self._compute_univariate_density(\n                self.data_variable,\n                common_norm,\n                common_bins,\n                kde_kws,\n                log_scale,\n            )\n\n        # First pass through the data to compute the histograms\n        for sub_vars, sub_data in self.iter_data(\"hue\", from_comp_data=True):\n\n            # Prepare the relevant data\n            key = tuple(sub_vars.items())\n            observations = sub_data[self.data_variable]\n\n            if \"weights\" in self.variables:\n                weights = sub_data[\"weights\"]\n            else:\n                weights = None\n\n            # Do the histogram 
computation\n            heights, edges = estimator(observations, weights=weights)\n\n            # Rescale the smoothed curve to match the histogram\n            if kde and key in densities:\n                density = densities[key]\n                if estimator.cumulative:\n                    hist_norm = heights.max()\n                else:\n                    hist_norm = (heights * np.diff(edges)).sum()\n                densities[key] *= hist_norm\n\n            # Convert edges back to original units for plotting\n            if self._log_scaled(self.data_variable):\n                edges = np.power(10, edges)\n\n            # Pack the histogram data and metadata together\n            orig_widths = np.diff(edges)\n            widths = shrink * orig_widths\n            edges = edges[:-1] + (1 - shrink) / 2 * orig_widths\n            index = pd.MultiIndex.from_arrays([\n                pd.Index(edges, name=\"edges\"),\n                pd.Index(widths, name=\"widths\"),\n            ])\n            hist = pd.Series(heights, index=index, name=\"heights\")\n\n            # Apply scaling to normalize across groups\n            if common_norm:\n                hist *= len(sub_data) / len(all_data)\n\n            # Store the finalized histogram data for future plotting\n            histograms[key] = hist\n\n        # Modify the histogram and density data to resolve multiple groups\n        histograms, baselines = self._resolve_multiple(histograms, multiple)\n        if kde:\n            densities, _ = self._resolve_multiple(\n                densities, None if multiple == \"dodge\" else multiple\n            )\n\n        # Set autoscaling-related meta\n        sticky_stat = (0, 1) if multiple == \"fill\" else (0, np.inf)\n        if multiple == \"fill\":\n            # Filled plots should not have any margins\n            bin_vals = histograms.index.to_frame()\n            edges = bin_vals[\"edges\"]\n            widths = bin_vals[\"widths\"]\n            sticky_data 
= (\n                edges.min(),\n                edges.max() + widths.loc[edges.idxmax()]\n            )\n        else:\n            sticky_data = []\n\n        # --- Handle default visual attributes\n\n        # Note: default linewidth is determined after plotting\n\n        # Default alpha should depend on other parameters\n        if fill:\n            # Note: will need to account for other grouping semantics if added\n            if \"hue\" in self.variables and multiple == \"layer\":\n                default_alpha = .5 if element == \"bars\" else .25\n            elif kde:\n                default_alpha = .5\n            else:\n                default_alpha = .75\n        else:\n            default_alpha = 1\n        alpha = plot_kws.pop(\"alpha\", default_alpha)  # TODO make parameter?\n\n        hist_artists = []\n\n        # Go back through the dataset and draw the plots\n        for sub_vars, _ in self.iter_data(\"hue\", reverse=True):\n\n            key = tuple(sub_vars.items())\n            hist = histograms[key].rename(\"heights\").reset_index()\n            bottom = np.asarray(baselines[key])\n\n            ax = self._get_axes(sub_vars)\n\n            # Define the matplotlib attributes that depend on semantic mapping\n            if \"hue\" in self.variables:\n                sub_color = self._hue_map(sub_vars[\"hue\"])\n            else:\n                sub_color = color\n\n            artist_kws = self._artist_kws(\n                plot_kws, fill, element, multiple, sub_color, alpha\n            )\n\n            if element == \"bars\":\n\n                # Use matplotlib bar plotting\n\n                plot_func = ax.bar if self.data_variable == \"x\" else ax.barh\n                artists = plot_func(\n                    hist[\"edges\"],\n                    hist[\"heights\"] - bottom,\n                    hist[\"widths\"],\n                    bottom,\n                    align=\"edge\",\n                    **artist_kws,\n                )\n\n  
              for bar in artists:\n                    if self.data_variable == \"x\":\n                        bar.sticky_edges.x[:] = sticky_data\n                        bar.sticky_edges.y[:] = sticky_stat\n                    else:\n                        bar.sticky_edges.x[:] = sticky_stat\n                        bar.sticky_edges.y[:] = sticky_data\n\n                hist_artists.extend(artists)\n\n            else:\n\n                # Use either fill_between or plot to draw hull of histogram\n                if element == \"step\":\n\n                    final = hist.iloc[-1]\n                    x = np.append(hist[\"edges\"], final[\"edges\"] + final[\"widths\"])\n                    y = np.append(hist[\"heights\"], final[\"heights\"])\n                    b = np.append(bottom, bottom[-1])\n\n                    if self.data_variable == \"x\":\n                        step = \"post\"\n                        drawstyle = \"steps-post\"\n                    else:\n                        step = \"post\"  # fillbetweenx handles mapping internally\n                        drawstyle = \"steps-pre\"\n\n                elif element == \"poly\":\n\n                    x = hist[\"edges\"] + hist[\"widths\"] / 2\n                    y = hist[\"heights\"]\n                    b = bottom\n\n                    step = None\n                    drawstyle = None\n\n                if self.data_variable == \"x\":\n                    if fill:\n                        artist = ax.fill_between(x, b, y, step=step, **artist_kws)\n                    else:\n                        artist, = ax.plot(x, y, drawstyle=drawstyle, **artist_kws)\n                    artist.sticky_edges.x[:] = sticky_data\n                    artist.sticky_edges.y[:] = sticky_stat\n                else:\n                    if fill:\n                        artist = ax.fill_betweenx(x, b, y, step=step, **artist_kws)\n                    else:\n                        artist, = ax.plot(y, x, 
drawstyle=drawstyle, **artist_kws)\n                    artist.sticky_edges.x[:] = sticky_stat\n                    artist.sticky_edges.y[:] = sticky_data\n\n                hist_artists.append(artist)\n\n            if kde:\n\n                # Add in the density curves\n\n                try:\n                    density = densities[key]\n                except KeyError:\n                    continue\n                support = density.index\n\n                if \"x\" in self.variables:\n                    line_args = support, density\n                    sticky_x, sticky_y = None, (0, np.inf)\n                else:\n                    line_args = density, support\n                    sticky_x, sticky_y = (0, np.inf), None\n\n                line_kws[\"color\"] = to_rgba(sub_color, 1)\n                line, = ax.plot(\n                    *line_args, **line_kws,\n                )\n\n                if sticky_x is not None:\n                    line.sticky_edges.x[:] = sticky_x\n                if sticky_y is not None:\n                    line.sticky_edges.y[:] = sticky_y\n\n        if element == \"bars\" and \"linewidth\" not in plot_kws:\n\n            # Now we handle linewidth, which depends on the scaling of the plot\n\n            # Loop through subsets based only on facet variables\n            for sub_vars, _ in self.iter_data():\n\n                ax = self._get_axes(sub_vars)\n\n                # Needed in some cases to get valid transforms.\n                # Innocuous in other cases?\n                ax.autoscale_view()\n\n                # We will base everything on the minimum bin width\n                hist_metadata = pd.concat([\n                    # Use .items for generality over dict or df\n                    h.index.to_frame() for _, h in histograms.items()\n                ]).reset_index(drop=True)\n                thin_bar_idx = hist_metadata[\"widths\"].idxmin()\n                binwidth = hist_metadata.loc[thin_bar_idx, \"widths\"]\n    
            left_edge = hist_metadata.loc[thin_bar_idx, \"edges\"]\n\n                # Convert binwidth from data coordinates to pixels\n                pts_x, pts_y = 72 / ax.figure.dpi * abs(\n                    ax.transData.transform([left_edge + binwidth] * 2)\n                    - ax.transData.transform([left_edge] * 2)\n                )\n                if self.data_variable == \"x\":\n                    binwidth_points = pts_x\n                else:\n                    binwidth_points = pts_y\n\n                # The relative size of the lines depends on the appearance\n                # This is a provisional value and may need more tweaking\n                default_linewidth = .1 * binwidth_points\n\n                # Set the attributes\n                for bar in hist_artists:\n\n                    # Don't let the lines get too thick\n                    max_linewidth = bar.get_linewidth()\n                    if not fill:\n                        max_linewidth *= 1.5\n\n                    linewidth = min(default_linewidth, max_linewidth)\n\n                    # If not filling, don't let lines dissapear\n                    if not fill:\n                        min_linewidth = .5\n                        linewidth = max(linewidth, min_linewidth)\n\n                    bar.set_linewidth(linewidth)\n\n        # --- Finalize the plot ----\n\n        # Axis labels\n        ax = self.ax if self.ax is not None else self.facets.axes.flat[0]\n        default_x = default_y = \"\"\n        if self.data_variable == \"x\":\n            default_y = estimator.stat.capitalize()\n        if self.data_variable == \"y\":\n            default_x = estimator.stat.capitalize()\n        self._add_axis_labels(ax, default_x, default_y)\n\n        # Legend for semantic variables\n        if \"hue\" in self.variables and legend:\n\n            if fill or element == \"bars\":\n                artist = partial(mpl.patches.Patch)\n            else:\n                artist = 
partial(mpl.lines.Line2D, [], [])\n\n            ax_obj = self.ax if self.ax is not None else self.facets\n            self._add_legend(\n                ax_obj, artist, fill, element, multiple, alpha, plot_kws, {},\n            )"
        },
        {
          "file": "seaborn/distributions.py",
          "type": "function",
          "name": "plot_univariate_histogram",
          "class_name": "_DistributionPlotter",
          "code": "def plot_univariate_histogram(\n        self,\n        multiple,\n        element,\n        fill,\n        common_norm,\n        common_bins,\n        shrink,\n        kde,\n        kde_kws,\n        color,\n        legend,\n        line_kws,\n        estimate_kws,\n        **plot_kws,\n    ):\n\n        # -- Default keyword dicts\n        kde_kws = {} if kde_kws is None else kde_kws.copy()\n        line_kws = {} if line_kws is None else line_kws.copy()\n        estimate_kws = {} if estimate_kws is None else estimate_kws.copy()\n\n        # --  Input checking\n        _check_argument(\"multiple\", [\"layer\", \"stack\", \"fill\", \"dodge\"], multiple)\n        _check_argument(\"element\", [\"bars\", \"step\", \"poly\"], element)\n\n        if estimate_kws[\"discrete\"] and element != \"bars\":\n            raise ValueError(\"`element` must be 'bars' when `discrete` is True\")\n\n        auto_bins_with_weights = (\n            \"weights\" in self.variables\n            and estimate_kws[\"bins\"] == \"auto\"\n            and estimate_kws[\"binwidth\"] is None\n            and not estimate_kws[\"discrete\"]\n        )\n        if auto_bins_with_weights:\n            msg = (\n                \"`bins` cannot be 'auto' when using weights. 
\"\n                \"Setting `bins=10`, but you will likely want to adjust.\"\n            )\n            warnings.warn(msg, UserWarning)\n            estimate_kws[\"bins\"] = 10\n\n        # Simplify downstream code if we are not normalizing\n        if estimate_kws[\"stat\"] == \"count\":\n            common_norm = False\n\n        # Now initialize the Histogram estimator\n        estimator = Histogram(**estimate_kws)\n        histograms = {}\n\n        # Do pre-compute housekeeping related to multiple groups\n        # TODO best way to account for facet/semantic?\n        if set(self.variables) - {\"x\", \"y\"}:\n\n            all_data = self.comp_data.dropna()\n\n            if common_bins:\n                all_observations = all_data[self.data_variable]\n                estimator.define_bin_edges(\n                    all_observations,\n                    weights=all_data.get(\"weights\", None),\n                )\n\n        else:\n            common_norm = False\n\n        # Estimate the smoothed kernel densities, for use later\n        if kde:\n            # TODO alternatively, clip at min/max bins?\n            kde_kws.setdefault(\"cut\", 0)\n            kde_kws[\"cumulative\"] = estimate_kws[\"cumulative\"]\n            log_scale = self._log_scaled(self.data_variable)\n            densities = self._compute_univariate_density(\n                self.data_variable,\n                common_norm,\n                common_bins,\n                kde_kws,\n                log_scale,\n            )\n\n        # First pass through the data to compute the histograms\n        for sub_vars, sub_data in self.iter_data(\"hue\", from_comp_data=True):\n\n            # Prepare the relevant data\n            key = tuple(sub_vars.items())\n            observations = sub_data[self.data_variable]\n\n            if \"weights\" in self.variables:\n                weights = sub_data[\"weights\"]\n            else:\n                weights = None\n\n            # Do the histogram 
computation\n            heights, edges = estimator(observations, weights=weights)\n\n            # Rescale the smoothed curve to match the histogram\n            if kde and key in densities:\n                density = densities[key]\n                if estimator.cumulative:\n                    hist_norm = heights.max()\n                else:\n                    hist_norm = (heights * np.diff(edges)).sum()\n                densities[key] *= hist_norm\n\n            # Convert edges back to original units for plotting\n            if self._log_scaled(self.data_variable):\n                edges = np.power(10, edges)\n\n            # Pack the histogram data and metadata together\n            orig_widths = np.diff(edges)\n            widths = shrink * orig_widths\n            edges = edges[:-1] + (1 - shrink) / 2 * orig_widths\n            index = pd.MultiIndex.from_arrays([\n                pd.Index(edges, name=\"edges\"),\n                pd.Index(widths, name=\"widths\"),\n            ])\n            hist = pd.Series(heights, index=index, name=\"heights\")\n\n            # Apply scaling to normalize across groups\n            if common_norm:\n                hist *= len(sub_data) / len(all_data)\n\n            # Store the finalized histogram data for future plotting\n            histograms[key] = hist\n\n        # Modify the histogram and density data to resolve multiple groups\n        histograms, baselines = self._resolve_multiple(histograms, multiple)\n        if kde:\n            densities, _ = self._resolve_multiple(\n                densities, None if multiple == \"dodge\" else multiple\n            )\n\n        # Set autoscaling-related meta\n        sticky_stat = (0, 1) if multiple == \"fill\" else (0, np.inf)\n        if multiple == \"fill\":\n            # Filled plots should not have any margins\n            bin_vals = histograms.index.to_frame()\n            edges = bin_vals[\"edges\"]\n            widths = bin_vals[\"widths\"]\n            sticky_data 
= (\n                edges.min(),\n                edges.max() + widths.loc[edges.idxmax()]\n            )\n        else:\n            sticky_data = []\n\n        # --- Handle default visual attributes\n\n        # Note: default linewidth is determined after plotting\n\n        # Default alpha should depend on other parameters\n        if fill:\n            # Note: will need to account for other grouping semantics if added\n            if \"hue\" in self.variables and multiple == \"layer\":\n                default_alpha = .5 if element == \"bars\" else .25\n            elif kde:\n                default_alpha = .5\n            else:\n                default_alpha = .75\n        else:\n            default_alpha = 1\n        alpha = plot_kws.pop(\"alpha\", default_alpha)  # TODO make parameter?\n\n        hist_artists = []\n\n        # Go back through the dataset and draw the plots\n        for sub_vars, _ in self.iter_data(\"hue\", reverse=True):\n\n            key = tuple(sub_vars.items())\n            hist = histograms[key].rename(\"heights\").reset_index()\n            bottom = np.asarray(baselines[key])\n\n            ax = self._get_axes(sub_vars)\n\n            # Define the matplotlib attributes that depend on semantic mapping\n            if \"hue\" in self.variables:\n                sub_color = self._hue_map(sub_vars[\"hue\"])\n            else:\n                sub_color = color\n\n            artist_kws = self._artist_kws(\n                plot_kws, fill, element, multiple, sub_color, alpha\n            )\n\n            if element == \"bars\":\n\n                # Use matplotlib bar plotting\n\n                plot_func = ax.bar if self.data_variable == \"x\" else ax.barh\n                artists = plot_func(\n                    hist[\"edges\"],\n                    hist[\"heights\"] - bottom,\n                    hist[\"widths\"],\n                    bottom,\n                    align=\"edge\",\n                    **artist_kws,\n                )\n\n  
              for bar in artists:\n                    if self.data_variable == \"x\":\n                        bar.sticky_edges.x[:] = sticky_data\n                        bar.sticky_edges.y[:] = sticky_stat\n                    else:\n                        bar.sticky_edges.x[:] = sticky_stat\n                        bar.sticky_edges.y[:] = sticky_data\n\n                hist_artists.extend(artists)\n\n            else:\n\n                # Use either fill_between or plot to draw hull of histogram\n                if element == \"step\":\n\n                    final = hist.iloc[-1]\n                    x = np.append(hist[\"edges\"], final[\"edges\"] + final[\"widths\"])\n                    y = np.append(hist[\"heights\"], final[\"heights\"])\n                    b = np.append(bottom, bottom[-1])\n\n                    if self.data_variable == \"x\":\n                        step = \"post\"\n                        drawstyle = \"steps-post\"\n                    else:\n                        step = \"post\"  # fillbetweenx handles mapping internally\n                        drawstyle = \"steps-pre\"\n\n                elif element == \"poly\":\n\n                    x = hist[\"edges\"] + hist[\"widths\"] / 2\n                    y = hist[\"heights\"]\n                    b = bottom\n\n                    step = None\n                    drawstyle = None\n\n                if self.data_variable == \"x\":\n                    if fill:\n                        artist = ax.fill_between(x, b, y, step=step, **artist_kws)\n                    else:\n                        artist, = ax.plot(x, y, drawstyle=drawstyle, **artist_kws)\n                    artist.sticky_edges.x[:] = sticky_data\n                    artist.sticky_edges.y[:] = sticky_stat\n                else:\n                    if fill:\n                        artist = ax.fill_betweenx(x, b, y, step=step, **artist_kws)\n                    else:\n                        artist, = ax.plot(y, x, 
drawstyle=drawstyle, **artist_kws)\n                    artist.sticky_edges.x[:] = sticky_stat\n                    artist.sticky_edges.y[:] = sticky_data\n\n                hist_artists.append(artist)\n\n            if kde:\n\n                # Add in the density curves\n\n                try:\n                    density = densities[key]\n                except KeyError:\n                    continue\n                support = density.index\n\n                if \"x\" in self.variables:\n                    line_args = support, density\n                    sticky_x, sticky_y = None, (0, np.inf)\n                else:\n                    line_args = density, support\n                    sticky_x, sticky_y = (0, np.inf), None\n\n                line_kws[\"color\"] = to_rgba(sub_color, 1)\n                line, = ax.plot(\n                    *line_args, **line_kws,\n                )\n\n                if sticky_x is not None:\n                    line.sticky_edges.x[:] = sticky_x\n                if sticky_y is not None:\n                    line.sticky_edges.y[:] = sticky_y\n\n        if element == \"bars\" and \"linewidth\" not in plot_kws:\n\n            # Now we handle linewidth, which depends on the scaling of the plot\n\n            # Loop through subsets based only on facet variables\n            for sub_vars, _ in self.iter_data():\n\n                ax = self._get_axes(sub_vars)\n\n                # Needed in some cases to get valid transforms.\n                # Innocuous in other cases?\n                ax.autoscale_view()\n\n                # We will base everything on the minimum bin width\n                hist_metadata = pd.concat([\n                    # Use .items for generality over dict or df\n                    h.index.to_frame() for _, h in histograms.items()\n                ]).reset_index(drop=True)\n                thin_bar_idx = hist_metadata[\"widths\"].idxmin()\n                binwidth = hist_metadata.loc[thin_bar_idx, \"widths\"]\n    
            left_edge = hist_metadata.loc[thin_bar_idx, \"edges\"]\n\n                # Convert binwidth from data coordinates to points\n                pts_x, pts_y = 72 / ax.figure.dpi * abs(\n                    ax.transData.transform([left_edge + binwidth] * 2)\n                    - ax.transData.transform([left_edge] * 2)\n                )\n                if self.data_variable == \"x\":\n                    binwidth_points = pts_x\n                else:\n                    binwidth_points = pts_y\n\n                # The relative size of the lines depends on the appearance\n                # This is a provisional value and may need more tweaking\n                default_linewidth = .1 * binwidth_points\n\n                # Set the attributes\n                for bar in hist_artists:\n\n                    # Don't let the lines get too thick\n                    max_linewidth = bar.get_linewidth()\n                    if not fill:\n                        max_linewidth *= 1.5\n\n                    linewidth = min(default_linewidth, max_linewidth)\n\n                    # If not filling, don't let lines disappear\n                    if not fill:\n                        min_linewidth = .5\n                        linewidth = max(linewidth, min_linewidth)\n\n                    bar.set_linewidth(linewidth)\n\n        # --- Finalize the plot ----\n\n        # Axis labels\n        ax = self.ax if self.ax is not None else self.facets.axes.flat[0]\n        default_x = default_y = \"\"\n        if self.data_variable == \"x\":\n            default_y = estimator.stat.capitalize()\n        if self.data_variable == \"y\":\n            default_x = estimator.stat.capitalize()\n        self._add_axis_labels(ax, default_x, default_y)\n\n        # Legend for semantic variables\n        if \"hue\" in self.variables and legend:\n\n            if fill or element == \"bars\":\n                artist = partial(mpl.patches.Patch)\n            else:\n                artist = 
partial(mpl.lines.Line2D, [], [])\n\n            ax_obj = self.ax if self.ax is not None else self.facets\n            self._add_legend(\n                ax_obj, artist, fill, element, multiple, alpha, plot_kws, {},\n            )"
        },
        {
          "file": "seaborn/distributions.py",
          "type": "function",
          "name": "plot_univariate_histogram",
          "class_name": "_DistributionPlotter",
          "code": "def plot_univariate_histogram(\n        self,\n        multiple,\n        element,\n        fill,\n        common_norm,\n        common_bins,\n        shrink,\n        kde,\n        kde_kws,\n        color,\n        legend,\n        line_kws,\n        estimate_kws,\n        **plot_kws,\n    ):\n\n        # -- Default keyword dicts\n        kde_kws = {} if kde_kws is None else kde_kws.copy()\n        line_kws = {} if line_kws is None else line_kws.copy()\n        estimate_kws = {} if estimate_kws is None else estimate_kws.copy()\n\n        # --  Input checking\n        _check_argument(\"multiple\", [\"layer\", \"stack\", \"fill\", \"dodge\"], multiple)\n        _check_argument(\"element\", [\"bars\", \"step\", \"poly\"], element)\n\n        if estimate_kws[\"discrete\"] and element != \"bars\":\n            raise ValueError(\"`element` must be 'bars' when `discrete` is True\")\n\n        auto_bins_with_weights = (\n            \"weights\" in self.variables\n            and estimate_kws[\"bins\"] == \"auto\"\n            and estimate_kws[\"binwidth\"] is None\n            and not estimate_kws[\"discrete\"]\n        )\n        if auto_bins_with_weights:\n            msg = (\n                \"`bins` cannot be 'auto' when using weights. 
\"\n                \"Setting `bins=10`, but you will likely want to adjust.\"\n            )\n            warnings.warn(msg, UserWarning)\n            estimate_kws[\"bins\"] = 10\n\n        # Simplify downstream code if we are not normalizing\n        if estimate_kws[\"stat\"] == \"count\":\n            common_norm = False\n\n        # Now initialize the Histogram estimator\n        estimator = Histogram(**estimate_kws)\n        histograms = {}\n\n        # Do pre-compute housekeeping related to multiple groups\n        # TODO best way to account for facet/semantic?\n        if set(self.variables) - {\"x\", \"y\"}:\n\n            all_data = self.comp_data.dropna()\n\n            if common_bins:\n                all_observations = all_data[self.data_variable]\n                estimator.define_bin_edges(\n                    all_observations,\n                    weights=all_data.get(\"weights\", None),\n                )\n\n        else:\n            common_norm = False\n\n        # Estimate the smoothed kernel densities, for use later\n        if kde:\n            # TODO alternatively, clip at min/max bins?\n            kde_kws.setdefault(\"cut\", 0)\n            kde_kws[\"cumulative\"] = estimate_kws[\"cumulative\"]\n            log_scale = self._log_scaled(self.data_variable)\n            densities = self._compute_univariate_density(\n                self.data_variable,\n                common_norm,\n                common_bins,\n                kde_kws,\n                log_scale,\n            )\n\n        # First pass through the data to compute the histograms\n        for sub_vars, sub_data in self.iter_data(\"hue\", from_comp_data=True):\n\n            # Prepare the relevant data\n            key = tuple(sub_vars.items())\n            observations = sub_data[self.data_variable]\n\n            if \"weights\" in self.variables:\n                weights = sub_data[\"weights\"]\n            else:\n                weights = None\n\n            # Do the histogram 
computation\n            heights, edges = estimator(observations, weights=weights)\n\n            # Rescale the smoothed curve to match the histogram\n            if kde and key in densities:\n                density = densities[key]\n                if estimator.cumulative:\n                    hist_norm = heights.max()\n                else:\n                    hist_norm = (heights * np.diff(edges)).sum()\n                densities[key] *= hist_norm\n\n            # Convert edges back to original units for plotting\n            if self._log_scaled(self.data_variable):\n                edges = np.power(10, edges)\n\n            # Pack the histogram data and metadata together\n            orig_widths = np.diff(edges)\n            widths = shrink * orig_widths\n            edges = edges[:-1] + (1 - shrink) / 2 * orig_widths\n            index = pd.MultiIndex.from_arrays([\n                pd.Index(edges, name=\"edges\"),\n                pd.Index(widths, name=\"widths\"),\n            ])\n            hist = pd.Series(heights, index=index, name=\"heights\")\n\n            # Apply scaling to normalize across groups\n            if common_norm:\n                hist *= len(sub_data) / len(all_data)\n\n            # Store the finalized histogram data for future plotting\n            histograms[key] = hist\n\n        # Modify the histogram and density data to resolve multiple groups\n        histograms, baselines = self._resolve_multiple(histograms, multiple)\n        if kde:\n            densities, _ = self._resolve_multiple(\n                densities, None if multiple == \"dodge\" else multiple\n            )\n\n        # Set autoscaling-related meta\n        sticky_stat = (0, 1) if multiple == \"fill\" else (0, np.inf)\n        if multiple == \"fill\":\n            # Filled plots should not have any margins\n            bin_vals = histograms.index.to_frame()\n            edges = bin_vals[\"edges\"]\n            widths = bin_vals[\"widths\"]\n            sticky_data 
= (\n                edges.min(),\n                edges.max() + widths.loc[edges.idxmax()]\n            )\n        else:\n            sticky_data = []\n\n        # --- Handle default visual attributes\n\n        # Note: default linewidth is determined after plotting\n\n        # Default alpha should depend on other parameters\n        if fill:\n            # Note: will need to account for other grouping semantics if added\n            if \"hue\" in self.variables and multiple == \"layer\":\n                default_alpha = .5 if element == \"bars\" else .25\n            elif kde:\n                default_alpha = .5\n            else:\n                default_alpha = .75\n        else:\n            default_alpha = 1\n        alpha = plot_kws.pop(\"alpha\", default_alpha)  # TODO make parameter?\n\n        hist_artists = []\n\n        # Go back through the dataset and draw the plots\n        for sub_vars, _ in self.iter_data(\"hue\", reverse=True):\n\n            key = tuple(sub_vars.items())\n            hist = histograms[key].rename(\"heights\").reset_index()\n            bottom = np.asarray(baselines[key])\n\n            ax = self._get_axes(sub_vars)\n\n            # Define the matplotlib attributes that depend on semantic mapping\n            if \"hue\" in self.variables:\n                sub_color = self._hue_map(sub_vars[\"hue\"])\n            else:\n                sub_color = color\n\n            artist_kws = self._artist_kws(\n                plot_kws, fill, element, multiple, sub_color, alpha\n            )\n\n            if element == \"bars\":\n\n                # Use matplotlib bar plotting\n\n                plot_func = ax.bar if self.data_variable == \"x\" else ax.barh\n                artists = plot_func(\n                    hist[\"edges\"],\n                    hist[\"heights\"] - bottom,\n                    hist[\"widths\"],\n                    bottom,\n                    align=\"edge\",\n                    **artist_kws,\n                )\n\n  
              for bar in artists:\n                    if self.data_variable == \"x\":\n                        bar.sticky_edges.x[:] = sticky_data\n                        bar.sticky_edges.y[:] = sticky_stat\n                    else:\n                        bar.sticky_edges.x[:] = sticky_stat\n                        bar.sticky_edges.y[:] = sticky_data\n\n                hist_artists.extend(artists)\n\n            else:\n\n                # Use either fill_between or plot to draw hull of histogram\n                if element == \"step\":\n\n                    final = hist.iloc[-1]\n                    x = np.append(hist[\"edges\"], final[\"edges\"] + final[\"widths\"])\n                    y = np.append(hist[\"heights\"], final[\"heights\"])\n                    b = np.append(bottom, bottom[-1])\n\n                    if self.data_variable == \"x\":\n                        step = \"post\"\n                        drawstyle = \"steps-post\"\n                    else:\n                        step = \"post\"  # fillbetweenx handles mapping internally\n                        drawstyle = \"steps-pre\"\n\n                elif element == \"poly\":\n\n                    x = hist[\"edges\"] + hist[\"widths\"] / 2\n                    y = hist[\"heights\"]\n                    b = bottom\n\n                    step = None\n                    drawstyle = None\n\n                if self.data_variable == \"x\":\n                    if fill:\n                        artist = ax.fill_between(x, b, y, step=step, **artist_kws)\n                    else:\n                        artist, = ax.plot(x, y, drawstyle=drawstyle, **artist_kws)\n                    artist.sticky_edges.x[:] = sticky_data\n                    artist.sticky_edges.y[:] = sticky_stat\n                else:\n                    if fill:\n                        artist = ax.fill_betweenx(x, b, y, step=step, **artist_kws)\n                    else:\n                        artist, = ax.plot(y, x, 
drawstyle=drawstyle, **artist_kws)\n                    artist.sticky_edges.x[:] = sticky_stat\n                    artist.sticky_edges.y[:] = sticky_data\n\n                hist_artists.append(artist)\n\n            if kde:\n\n                # Add in the density curves\n\n                try:\n                    density = densities[key]\n                except KeyError:\n                    continue\n                support = density.index\n\n                if \"x\" in self.variables:\n                    line_args = support, density\n                    sticky_x, sticky_y = None, (0, np.inf)\n                else:\n                    line_args = density, support\n                    sticky_x, sticky_y = (0, np.inf), None\n\n                line_kws[\"color\"] = to_rgba(sub_color, 1)\n                line, = ax.plot(\n                    *line_args, **line_kws,\n                )\n\n                if sticky_x is not None:\n                    line.sticky_edges.x[:] = sticky_x\n                if sticky_y is not None:\n                    line.sticky_edges.y[:] = sticky_y\n\n        if element == \"bars\" and \"linewidth\" not in plot_kws:\n\n            # Now we handle linewidth, which depends on the scaling of the plot\n\n            # Loop through subsets based only on facet variables\n            for sub_vars, _ in self.iter_data():\n\n                ax = self._get_axes(sub_vars)\n\n                # Needed in some cases to get valid transforms.\n                # Innocuous in other cases?\n                ax.autoscale_view()\n\n                # We will base everything on the minimum bin width\n                hist_metadata = pd.concat([\n                    # Use .items for generality over dict or df\n                    h.index.to_frame() for _, h in histograms.items()\n                ]).reset_index(drop=True)\n                thin_bar_idx = hist_metadata[\"widths\"].idxmin()\n                binwidth = hist_metadata.loc[thin_bar_idx, \"widths\"]\n    
            left_edge = hist_metadata.loc[thin_bar_idx, \"edges\"]\n\n                # Convert binwidth from data coordinates to points\n                pts_x, pts_y = 72 / ax.figure.dpi * abs(\n                    ax.transData.transform([left_edge + binwidth] * 2)\n                    - ax.transData.transform([left_edge] * 2)\n                )\n                if self.data_variable == \"x\":\n                    binwidth_points = pts_x\n                else:\n                    binwidth_points = pts_y\n\n                # The relative size of the lines depends on the appearance\n                # This is a provisional value and may need more tweaking\n                default_linewidth = .1 * binwidth_points\n\n                # Set the attributes\n                for bar in hist_artists:\n\n                    # Don't let the lines get too thick\n                    max_linewidth = bar.get_linewidth()\n                    if not fill:\n                        max_linewidth *= 1.5\n\n                    linewidth = min(default_linewidth, max_linewidth)\n\n                    # If not filling, don't let lines disappear\n                    if not fill:\n                        min_linewidth = .5\n                        linewidth = max(linewidth, min_linewidth)\n\n                    bar.set_linewidth(linewidth)\n\n        # --- Finalize the plot ----\n\n        # Axis labels\n        ax = self.ax if self.ax is not None else self.facets.axes.flat[0]\n        default_x = default_y = \"\"\n        if self.data_variable == \"x\":\n            default_y = estimator.stat.capitalize()\n        if self.data_variable == \"y\":\n            default_x = estimator.stat.capitalize()\n        self._add_axis_labels(ax, default_x, default_y)\n\n        # Legend for semantic variables\n        if \"hue\" in self.variables and legend:\n\n            if fill or element == \"bars\":\n                artist = partial(mpl.patches.Patch)\n            else:\n                artist = 
partial(mpl.lines.Line2D, [], [])\n\n            ax_obj = self.ax if self.ax is not None else self.facets\n            self._add_legend(\n                ax_obj, artist, fill, element, multiple, alpha, plot_kws, {},\n            )"
        }
      ]
    },
    {
      "pr_number": 2417,
      "pr_title": "Improve NA robustness in VectorPlotter.comp_data",
      "pr_body": "This PR avoids passing `nan` through the matplotlib converters used to obtain a numeric/computable representation of the data (i.e. `VectorPlotter.comp_data`).\r\n\r\nIt also\r\n- codifies that the converted columns in `comp_data` have a float dtype\r\n- converts `inf` to `nan`, in line with what matplotlib does\r\n\r\nFixes #2295 \r\n\r\nAdditionally this will implicitly address #1971 once the regression plots are refactored to use `comp_data` internally. (@mojones, funny that you opened both issues).",
      "issue_id": 2295,
      "issue_title": "histplot with categorical values crashes with missing data, though numerical values work fine",
      "issue_body": "Not sure if this is intended behaviour, but it caught me out due to the difference in handling numerical/categorical data. I note that drawing histograms of categorical data is labelled as experimental, so ignore/close if that explains it.\r\n\r\nWith numerical data `histplot` ignores NaN and plots the other values, this is the behaviour I would expect:\r\n\r\n```\r\nimport numpy as np\r\nimport seaborn as sns\r\n\r\nsns.histplot(\r\n    [1.1, 1.2, 1.3, 1.4, np.nan]\r\n)\r\n```\r\n\r\nbut with categorical data it crashes:\r\n```\r\nimport numpy as np\r\nimport seaborn as sns\r\n\r\nsns.histplot(\r\n    ['foo', 'foo', 'bar', np.nan]\r\n)\r\n\r\n# output\r\n---------------------------------------------------------------------------\r\nTypeError                                 Traceback (most recent call last)\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/axis.py in convert_units(self, x)\r\n   1519         try:\r\n-> 1520             ret = self.converter.convert(x, self.units, self)\r\n   1521         except Exception as e:\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/category.py in convert(value, unit, axis)\r\n     60         # force an update so it also does type checking\r\n---> 61         unit.update(values)\r\n     62         return np.vectorize(unit._mapping.__getitem__, otypes=[float])(values)\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/category.py in update(self, data)\r\n    210             # OrderedDict just iterates over unique values in data.\r\n--> 211             cbook._check_isinstance((str, bytes), value=val)\r\n    212             if convertible:\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/cbook/__init__.py in _check_isinstance(_types, **kwargs)\r\n   2234         if not isinstance(v, types):\r\n-> 2235             raise TypeError(\r\n   2236                 \"{!r} must be an instance of {}, not a 
{}\".format(\r\n\r\nTypeError: 'value' must be an instance of str or bytes, not a float\r\n\r\nThe above exception was the direct cause of the following exception:\r\n\r\nConversionError                           Traceback (most recent call last)\r\n<ipython-input-61-b132ea7dca6c> in <module>\r\n      2 import seaborn as sns\r\n      3 \r\n----> 4 sns.histplot(\r\n      5     ['foo', 'foo', 'bar', np.nan]\r\n      6 )\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/distributions.py in histplot(data, x, y, hue, weights, stat, bins, binwidth, binrange, discrete, cumulative, common_bins, common_norm, multiple, element, fill, shrink, kde, kde_kws, line_kws, thresh, pthresh, pmax, cbar, cbar_ax, cbar_kws, palette, hue_order, hue_norm, color, log_scale, legend, ax, **kwargs)\r\n   1420     if p.univariate:\r\n   1421 \r\n-> 1422         p.plot_univariate_histogram(\r\n   1423             multiple=multiple,\r\n   1424             element=element,\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/distributions.py in plot_univariate_histogram(self, multiple, element, fill, common_norm, common_bins, shrink, kde, kde_kws, color, legend, line_kws, estimate_kws, **plot_kws)\r\n    421 \r\n    422         # First pass through the data to compute the histograms\r\n--> 423         for sub_vars, sub_data in self.iter_data(\"hue\", from_comp_data=True):\r\n    424 \r\n    425             # Prepare the relevant data\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/_core.py in iter_data(self, grouping_vars, reverse, from_comp_data)\r\n    965 \r\n    966         if from_comp_data:\r\n--> 967             data = self.comp_data\r\n    968         else:\r\n    969             data = self.plot_data\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/seaborn/_core.py in comp_data(self)\r\n   1034                 axis = getattr(ax, f\"{var}axis\")\r\n   1035 \r\n-> 1036                 comp_var = 
axis.convert_units(self.plot_data[var])\r\n   1037                 if axis.get_scale() == \"log\":\r\n   1038                     comp_var = np.log10(comp_var)\r\n\r\n~/.virtualenvs/drawingfromdata/lib/python3.8/site-packages/matplotlib/axis.py in convert_units(self, x)\r\n   1520             ret = self.converter.convert(x, self.units, self)\r\n   1521         except Exception as e:\r\n-> 1522             raise munits.ConversionError('Failed to convert value(s) to axis '\r\n   1523                                          f'units: {x!r}') from e\r\n   1524         return ret\r\n\r\nConversionError: Failed to convert value(s) to axis units: 0    foo\r\n1    foo\r\n2    bar\r\n3    NaN\r\nName: x, dtype: object\r\n```\r\n\r\n",
      "issue_closed_at": "2021-01-05T19:40:57Z",
      "base_commit": "aad96f8d2e36ceceb82a42b69aa3a8f47ef7210d",
      "changes": [
        {
          "file": "seaborn/_core.py",
          "type": "function",
          "name": "comp_data",
          "class_name": "VectorPlotter",
          "code": "def comp_data(self):\n        \"\"\"Dataframe with numeric x and y, after unit conversion and log scaling.\"\"\"\n        if not hasattr(self, \"ax\"):\n            # Probably a good idea, but will need a bunch of tests updated\n            # Most of these tests should just use the external interface\n            # Then this can be re-enabled.\n            # raise AttributeError(\"No Axes attached to plotter\")\n            return self.plot_data\n\n        if not hasattr(self, \"_comp_data\"):\n\n            comp_data = (\n                self.plot_data\n                .copy(deep=False)\n                .drop([\"x\", \"y\"], axis=1, errors=\"ignore\")\n            )\n            for var in \"yx\":\n                if var not in self.variables:\n                    continue\n\n                # Get a corresponding axis object so that we can convert the units\n                # to matplotlib's numeric representation, which we can compute on\n                # This is messy and it would probably be better for VectorPlotter\n                # to manage its own converters (using the matplotlib tools).\n                # XXX Currently does not support unshared categorical axes!\n                # (But see comment in _attach about how those don't exist)\n                if self.ax is None:\n                    ax = self.facets.axes.flat[0]\n                else:\n                    ax = self.ax\n                axis = getattr(ax, f\"{var}axis\")\n\n                comp_var = axis.convert_units(self.plot_data[var])\n                if axis.get_scale() == \"log\":\n                    comp_var = np.log10(comp_var)\n                comp_data.insert(0, var, comp_var)\n\n            self._comp_data = comp_data\n\n        return self._comp_data"
        }
      ]
    }
  ]
}