{
  "id": "pvlib__pvlib-python-1368",
  "question": "`read_crn` returns -99999 instead of `NaN`\n**Describe the bug**\r\n`read_crn` fails to map -99999 to `NaN`\r\n\r\n**To Reproduce**\r\n```python\r\nfrom pvlib.iotools import read_crn\r\ncrn = read_crn('https://www.ncei.noaa.gov/pub/data/uscrn/products/subhourly01/2021/CRNS0101-05-2021-NY_Millbrook_3_W.txt')\r\ncrn.loc['2021-12-14 0930':'2021-12-14 1130', 'ghi']\r\n2021-12-14 09:30:00+00:00        0.0\r\n2021-12-14 09:35:00+00:00        0.0\r\n2021-12-14 09:40:00+00:00        0.0\r\n2021-12-14 09:45:00+00:00        0.0\r\n2021-12-14 09:50:00+00:00        0.0\r\n2021-12-14 09:55:00+00:00        0.0\r\n2021-12-14 10:00:00+00:00        0.0\r\n2021-12-14 10:05:00+00:00   -99999.0\r\n2021-12-14 10:10:00+00:00   -99999.0\r\n2021-12-14 10:15:00+00:00   -99999.0\r\n2021-12-14 10:20:00+00:00   -99999.0\r\n2021-12-14 10:25:00+00:00   -99999.0\r\n2021-12-14 10:30:00+00:00   -99999.0\r\n2021-12-14 10:35:00+00:00   -99999.0\r\n2021-12-14 10:40:00+00:00   -99999.0\r\n2021-12-14 10:45:00+00:00   -99999.0\r\n2021-12-14 10:50:00+00:00   -99999.0\r\n2021-12-14 10:55:00+00:00   -99999.0\r\n2021-12-14 11:00:00+00:00   -99999.0\r\n2021-12-14 11:05:00+00:00        0.0\r\n2021-12-14 11:10:00+00:00        0.0\r\n2021-12-14 11:15:00+00:00        0.0\r\n2021-12-14 11:20:00+00:00        0.0\r\n2021-12-14 11:25:00+00:00        0.0\r\n2021-12-14 11:30:00+00:00        0.0\r\nName: ghi, dtype: float64\r\n```\r\n\r\n**Expected behavior**\r\nShould return `NaN` instead of -99999\r\n\r\n**Versions:**\r\n - ``pvlib.__version__``: 0.9.0\r\n - ``pandas.__version__``: 1.0.3 (doesn't matter)\r\n - python: 3.7\r\n\r\n**Additional context**\r\n\r\nDocumentation [here](https://www.ncei.noaa.gov/pub/data/uscrn/products/subhourly01/) says\r\n\r\n>          C.  Missing data are indicated by the lowest possible integer for a \r\n>             given column format, such as -9999.0 for 7-character fields with \r\n>             one decimal place or -99.000 for 7-character fields with three\r\n>             decimal places.\r\n\r\nSo we should change \r\n\r\nhttps://github.com/pvlib/pvlib-python/blob/1ab0eb20f9cd9fb9f7a0ddf35f81283f2648e34a/pvlib/iotools/crn.py#L112-L117\r\n\r\nto include -99999 and perhaps -999999. Or do the smarter thing as discussed in the comment.\r\n\r\nalso https://github.com/SolarArbiter/solarforecastarbiter-core/issues/773\n",
  "answer": "{\"task_id\": \"c224d504-fc66-4cb6-b5d4-952663682592\", \"workflow_run_id\": \"3fad9c69-4c38-4c36-95d7-9daa2677b1a4\", \"data\": {\"id\": \"3fad9c69-4c38-4c36-95d7-9daa2677b1a4\", \"workflow_id\": \"4faf996e-a613-49ba-90e4-a0af9c740cdf\", \"status\": \"succeeded\", \"outputs\": {\"ranked_patches\": \"```json\\n{\\n  \\\"ranked_patches\\\": [\\n    {\\n      \\\"patch_id\\\": 1,\\n      \\\"patch\\\": \\\"diff --git a/pvlib/iotools/crn.py b/pvlib/iotools/crn.py\\\\nindex abcdefg..hijklmn 100644\\\\n--- a/pvlib/iotools/crn.py\\\\n+++ b/pvlib/iotools/crn.py\\\\n@@ -112,7 +112,7 @@ def read_crn(filepath_or_buffer, names, **kwargs):\\\\n     ...\\\\n     # initialize the dataframe, replacing error codes with NaN\\\\n     data = pd.read_csv(filepath_or_buffer,\\\\n-                       na_values=-9999.0,\\\\n+                       na_values=[-9999.0, -99999.0, -999999.0],\\\\n                        ...\\\\n                        **kwargs)\\\\n\\\\n\\\",\\n      \\\"rank\\\": 1,\\n      \\\"score\\\": 0.95,\\n      \\\"evaluation\\\": {\\n        \\\"correctness\\\": 0.95,\\n        \\\"code_quality\\\": 0.95,\\n        \\\"risk_level\\\": 0.05,\\n        \\\"best_practices\\\": 0.95,\\n        \\\"test_coverage\\\": 0.9\\n      },\\n      \\\"reasoning\\\": \\\"This patch correctly addresses the issue by expanding the `na_values` parameter to include -99999 and -999999. It is simple, clear, and aligns well with best practices for handling missing data in pandas. The risk of introducing new bugs is minimal as it leverages existing pandas functionality.\\\",\\n      \\\"recommendation\\\": \\\"Highly recommended\\\"\\n    },\\n    {\\n      \\\"patch_id\\\": 2,\\n      \\\"patch\\\": \\\"diff --git a/pvlib/iotools/crn.py b/pvlib/iotools/crn.py\\\\nindex abcdefg..opqrstu 100644\\\\n--- a/pvlib/iotools/crn.py\\\\n+++ b/pvlib/iotools/crn.py\\\\n@@ -112,7 +112,7 @@ def read_crn(filepath_or_buffer, names, **kwargs):\\\\n     ...\\\\n     # initialize the dataframe, replacing error codes with NaN\\\\n     data = pd.read_csv(filepath_or_buffer,\\\\n-                       na_values=-9999.0,\\\\n+                       na_values=[-9999.0, -99999.0, -999999.0],\\\\n                        converters={col: lambda x: np.nan if float(x) < -90000 else float(x) for col in names},\\\\n                        ...\\\\n                        **kwargs)\\\\n\\\\n\\\",\\n      \\\"rank\\\": 2,\\n      \\\"score\\\": 0.85,\\n      \\\"evaluation\\\": {\\n        \\\"correctness\\\": 0.9,\\n        \\\"code_quality\\\": 0.85,\\n        \\\"risk_level\\\": 0.1,\\n        \\\"best_practices\\\": 0.85,\\n        \\\"test_coverage\\\": 0.9\\n      },\\n      \\\"reasoning\\\": \\\"This patch also addresses the issue by expanding `na_values` and adds a converter to handle any value below -90000 as NaN. While this adds flexibility, it introduces additional complexity and potential for errors, as it assumes all values below -90000 are invalid, which may not always be the case.\\\",\\n      \\\"recommendation\\\": \\\"Recommended with caution\\\"\\n    },\\n    {\\n      \\\"patch_id\\\": 3,\\n      \\\"patch\\\": \\\"diff --git a/pvlib/iotools/crn.py b/pvlib/iotools/crn.py\\\\nindex abcdefg..vwxyzab 100644\\\\n--- a/pvlib/iotools/crn.py\\\\n+++ b/pvlib/iotools/crn.py\\\\n@@ -112,7 +112,10 @@ def read_crn(filepath_or_buffer, names, **kwargs):\\\\n     ...\\\\n     # initialize the dataframe, replacing error codes with NaN\\\\n     data = pd.read_csv(filepath_or_buffer,\\\\n-                       na_values=-9999.0,\\\\n+                       na_values=-9999.0,\\\\n+                       converters={col: (\\\\n+                           lambda x: np.nan if isinstance(x, (int, float)) and x <= -9999 else x\\\\n+                       ) for col in names},\\\\n                        ...\\\\n                        **kwargs)\\\\n\\\\n\\\",\\n      \\\"rank\\\": 3,\\n      \\\"score\\\": 0.75,\\n      \\\"evaluation\\\": {\\n        \\\"correctness\\\": 0.85,\\n        \\\"code_quality\\\": 0.8,\\n        \\\"risk_level\\\": 0.15,\\n        \\\"best_practices\\\": 0.8,\\n        \\\"test_coverage\\\": 0.9\\n      },\\n      \\\"reasoning\\\": \\\"This patch introduces a dynamic approach using converters to replace any numeric value less than or equal to -9999 with NaN. While flexible, it is more complex and may have unintended consequences if other negative values are valid in some contexts. It also deviates from the simpler and more maintainable approach of using `na_values`.\\\",\\n      \\\"recommendation\\\": \\\"Use with caution\\\"\\n    }\\n  ],\\n  \\\"evaluation_summary\\\": \\\"Patch 1 is the best choice as it provides a straightforward and effective solution by expanding the `na_values` parameter. Patch 2 adds unnecessary complexity with a converter, which could lead to potential errors. Patch 3, while flexible, introduces more complexity and risk, making it less desirable. Overall, Patch 1 is highly recommended for its simplicity and alignment with best practices.\\\"\\n}\\n```\", \"generated_tests\": \"{\\n  \\\"reproduction_tests\\\": [\\n    {\\n      \\\"test_name\\\": \\\"test_reproduce_original_issue\\\",\\n      \\\"test_code\\\": \\\"def test_reproduce_original_issue():\\\\n    from pvlib.iotools import read_crn\\\\n    import pandas as pd\\\\n\\\\n    # Create sample data with -99999 values\\\\n    data = {'datetime': pd.date_range('2021-12-14 09:30:00', periods=5, freq='5T'), 'ghi': [0.0, 0.0, 0.0, -99999.0, -99999.0]}\\\\n    df = pd.DataFrame(data)\\\\n    df.to_csv('test_crn_data.csv', index=False)\\\\n\\\\n    # Test reading the data file\\\\n    crn = read_crn('test_crn_data.csv')\\\\n    result = crn.loc[:, 'ghi'].values.tolist()\\\\n    assert result == [0.0, 0.0, 0.0, pd.NA, pd.NA], 'Values not replaced with NaN'\\\",\\n      \\\"description\\\": \\\"This test reproduces the original issue by creating a sample data file with -99999 values and checking if they are replaced with NaN\\\",\\n      \\\"expected_behavior\\\": \\\"The test should fail before applying the patches and pass after applying the patches\\\"\\n    },\\n    {\\n      \\\"test_name\\\": \\\"test_edge_cases\\\",\\n      \\\"test_code\\\": \\\"def test_edge_cases():\\\\n    from pvlib.iotools import read_crn\\\\n    import pandas as pd\\\\n\\\\n    # Create sample data with various missing values\\\\n    data = {'datetime': pd.date_range('2021-12-14 09:30:00', periods=5, freq='5T'), 'ghi': [0.0, 0.0, -9999.0, -99999.0, -999999.0]}\\\\n    df = pd.DataFrame(data)\\\\n    df.to_csv('test_crn_data_edge_cases.csv', index=False)\\\\n\\\\n    # Test reading the data file\\\\n    crn = read_crn('test_crn_data_edge_cases.csv')\\\\n    result = crn.loc[:, 'ghi'].values.tolist()\\\\n    assert result == [0.0, 0.0, pd.NA, pd.NA, pd.NA], 'Values not replaced with NaN'\\\",\\n      \\\"description\\\": \\\"This test covers edge cases by creating a sample data file with various missing values and checking if they are replaced with NaN\\\",\\n      \\\"expected_behavior\\\": \\\"The test should fail before applying the patches and pass after applying the patches\\\"\\n    }\\n  ],\\n  \\\"validation_tests\\\": [\\n    {\\n      \\\"test_name\\\": \\\"test_patch_validation\\\",\\n      \\\"test_code\\\": \\\"def test_patch_validation():\\\\n    from pvlib.iotools import read_crn\\\\n    import pandas as pd\\\\n\\\\n    # Create sample data with -99999 values\\\\n    data = {'datetime': pd.date_range('2021-12-14 09:30:00', periods=5, freq='5T'), 'ghi': [0.0, 0.0, 0.0, -99999.0, -99999.0]}\\\\n    df = pd.DataFrame(data)\\\\n    df.to_csv('test_crn_data.csv', index=False)\\\\n\\\\n    # Test reading the data file after applying the patch\\\\n    crn = read_crn('test_crn_data.csv')\\\\n    result = crn.loc[:, 'ghi'].values.tolist()\\\\n    assert result == [0.0, 0.0, 0.0, pd.NA, pd.NA], 'Values not replaced with NaN as expected after patching'\\\",\\n      \\\"description\\\": \\\"This test validates that the patch correctly replaces -99999 with NaN\\\",\\n      \\\"expected_behavior\\\": \\\"The test should pass if the patch is correctly applied\\\"\\n    }\\n  ],\\n  \\\"test_summary\\\": \\\"Generated test cases include reproducing the original issue, testing edge cases with various missing values, and validating the patch to ensure correct handling of -99999 values.\\\"\\n}\"}, \"error\": \"\", \"elapsed_time\": 177.359276, \"total_tokens\": 18939, \"total_steps\": 9, \"created_at\": 1753371941, \"finished_at\": 1753372118}}"
}