{
  "original_problem": {
    "instance_id": "django__django-17051",
    "repo": "django/django",
    "created_at": "2023-07-07T11:01:09Z",
    "problem_statement": "Allow returning IDs in QuerySet.bulk_create() when updating conflicts.\nDescription\n\t\nCurrently, when using bulk_create with a conflict handling flag turned on (e.g. ignore_conflicts or update_conflicts), the primary keys are not set in the returned queryset, as documented in bulk_create.\nWhile I understand using ignore_conflicts can lead to PostgreSQL not returning the IDs when a row is ignored (see this SO thread), I don't understand why we don't return the IDs in the case of update_conflicts.\nFor instance:\nMyModel.objects.bulk_create([MyModel(...)], update_conflicts=True, update_fields=[...], unique_fields=[...])\ngenerates a query without a RETURNING my_model.id part:\nINSERT INTO \"my_model\" (...)\nVALUES (...)\n\tON CONFLICT(...) DO UPDATE ...\nIf I append the RETURNING my_model.id clause, the query is indeed valid and the ID is returned (checked with PostgreSQL).\nI investigated a bit and this in Django source is where the returning_fields gets removed.\nI believe we could discriminate the cases differently so as to keep those returning_fields in the case of update_conflicts.\nThis would be highly helpful when using bulk_create as a bulk upsert feature.\n",
    "patch": "diff --git a/django/db/models/query.py b/django/db/models/query.py\n--- a/django/db/models/query.py\n+++ b/django/db/models/query.py\n@@ -1837,12 +1837,17 @@ def _batched_insert(\n         inserted_rows = []\n         bulk_return = connection.features.can_return_rows_from_bulk_insert\n         for item in [objs[i : i + batch_size] for i in range(0, len(objs), batch_size)]:\n-            if bulk_return and on_conflict is None:\n+            if bulk_return and (\n+                on_conflict is None or on_conflict == OnConflict.UPDATE\n+            ):\n                 inserted_rows.extend(\n                     self._insert(\n                         item,\n                         fields=fields,\n                         using=self.db,\n+                        on_conflict=on_conflict,\n+                        update_fields=update_fields,\n+                        unique_fields=unique_fields,\n                         returning_fields=self.model._meta.db_returning_fields,\n                     )\n                 )\n"
  },
  "candidates_evaluated": 5,
  "judgment_result": {
    "candidates": [
      {
        "idx": 1,
        "id": "similar_34177",
        "decision": "Useful",
        "confidence": "High",
        "reason": "Both issues involve handling primary keys and conflict resolution in bulk_create, focusing on ORM expectations versus database schema."
      },
      {
        "idx": 2,
        "id": "similar_34544",
        "decision": "Not useful",
        "confidence": "Low",
        "reason": "The issue is specific to Oracle query generation and does not relate to conflict handling or primary key management."
      },
      {
        "idx": 3,
        "id": "similar_34171",
        "decision": "Useful",
        "confidence": "High",
        "reason": "Both issues deal with SQL query generation errors in bulk_create related to field name handling during conflict resolution."
      },
      {
        "idx": 4,
        "id": "similar_29367",
        "decision": "Not useful",
        "confidence": "Medium",
        "reason": "The issue focuses on instance state management post-bulk_create, not on conflict resolution or returning IDs."
      },
      {
        "idx": 5,
        "id": "similar_28375",
        "decision": "Not useful",
        "confidence": "Low",
        "reason": "The issue is about prefetch_related with non-standard primary keys, unrelated to conflict handling in bulk_create."
      }
    ]
  },
  "raw_summaries": [
    {
      "similar_issue": {
        "issue_title": "QuerySet.bulk_create() crashes on \"pk\" in unique_fields.",
        "issue_body": "QuerySet.bulk_create() crashes on \"pk\" in unique_fields which should be allowed.\nFile \"/django/django/db/backends/utils.py\", line 89, in _execute\n    return self.cursor.execute(sql, params)\ndjango.db.utils.ProgrammingError: column \"pk\" does not exist\nLINE 1: ...S (3127, 3, 3, 'c'), (3128, 4, 4, 'd') ON CONFLICT(\"pk\") DO ...\nBug in 0f6946495a8ec955b471ca1baaf408ceb53d4796.",
        "issue_id": 34177,
        "pr_number": 16317,
        "pr_title": "Fixed #34177 -- Fixed QuerySet.bulk_create() crash on \"pk\" in unique_fields.",
        "pr_body": "Noticed when reviewing #16315.\r\n\r\nBug in 0f6946495a8ec955b471ca1baaf408ceb53d4796.\r\n\r\nticket-34177",
        "issue_closed_at": "2022-11-22T07:27:06",
        "base_commit": "744a1af7f943106e30d538e6ace55c2c66ccd791"
      },
      "summary": "### Summary:\nThis issue is related to the Django framework, specifically within the `QuerySet.bulk_create()` method, where an error occurs due to the handling of primary keys (pk) within the `unique_fields` parameter. The problem arises when attempting to perform a bulk insert operation with the `ON CONFLICT` clause using \"pk\" as a unique field, resulting in a `ProgrammingError` because the column \"pk\" does not exist in the database schema. This error indicates a mismatch between the ORM's expectations and the actual database schema.\n\nKey symptoms and behaviors observed include the method crashing when \"pk\" is specified in `unique_fields`, and the error trace pointing out a missing column \"pk\". This affects the backend database operations in Django, particularly when managing bulk data inserts with potential conflicts on primary keys.\n\nThe affected components are primarily within Django's ORM layer, specifically the `QuerySet` methods related to bulk creation and conflict resolution. The severity of this issue can be considered significant for applications relying on bulk create operations with conflict resolution, as it prevents the use of primary keys in `unique_fields`, potentially causing application downtime or data consistency issues.\n\nRelevant technical details include the specific methods impacted (`QuerySet._check_bulk_create_options` and `QuerySet.bulk_create`) and the need for correct handling of primary keys in the context of unique constraints during bulk operations. The patch addresses this by adjusting the logic to appropriately recognize and handle primary keys, thereby aligning the ORM's behavior with the intended database operations.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: QuerySet.bulk_create() crashes on \"pk\" in unique_fields.\n\nBody:\nQuerySet.bulk_create()\ncrashes on\n\"pk\"\nin\nunique_fields\nwhich should be allowed.\nFile \"/django/django/db/backends/utils.py\", line 89, in _execute\n    return self.cursor.execute(sql, params)\ndjango.db.utils.ProgrammingError: column \"pk\" does not exist\nLINE 1: ...S (3127, 3, 3, 'c'), (3128, 4, 4, 'd') ON CONFLICT(\"pk\") DO ...\nBug in\n0f6946495a8ec955b471ca1baaf408ceb53d4796\n.\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\ndjango/db/models/query.py\n  function: QuerySet._check_bulk_create_options\n  function: QuerySet.bulk_create\n  function: QuerySet.bulk_create\n"
    },
    {
      "similar_issue": {
        "issue_title": "DBMS_LOB.SUBSTR() is unnecessary for \"IS NULL\" condition on Oracle.",
        "issue_body": "Model contains\nsomefield = models.TextField(blank=True, null=True)\nAt 4.1.9:\nprint(MyModel.objects.filter(somefield=None).only('id').query)\nSELECT \"TNAME_MYMODEL\".\"ID\" FROM \"TNAME_MYMODEL\" WHERE \"TNAME_MYMODEL\".\"SOMEFIELD\"  IS NULL\nworks\nAt 4.2.0:\nprint(MyModel.objects.filter(somefield=None).only('id').query)\nSELECT \"TNAME_MYMODEL\".\"ID\" FROM \"TNAME_MYMODEL\" WHERE DBMS_LOB.SUBSTR(\"TNAME_MYMODEL\".\"SOMEFIELD\") IS NULL\n\nDatabaseError: ORA-06502: PL/SQL: numeric or value error: character string buffer too small\nORA-06512: at line 1",
        "issue_id": 34544,
        "pr_number": 16835,
        "pr_title": "Fixed #34544 -- Avoided DBMS_LOB.SUBSTR() wrapping with IS NULL condition on Oracle.",
        "pr_body": "ticket-34544\r\n\r\nRegression in 09ffc5c1212d4ced58b708cbbf3dfbfb77b782ca.\r\n\r\nThanks Michael Smith for the report.\r\n\r\nThis also reverts commit 1e4da439556cdd69eb9f91e07f99cf77997e70d2.",
        "issue_closed_at": "2023-05-08T12:34:53",
        "base_commit": "aaf8c76c567e8311f4a85cf74c82fc3d70cc6f12"
      },
      "summary": "### Summary:\n\nThis issue is centered around a regression in the behavior of database query generation for Oracle databases in a specific version of a software framework. The problem arises when querying a database model field, specifically a text field, to check for null values. In the newer version of the software, an unnecessary and incorrect usage of the Oracle DBMS_LOB.SUBSTR() function is introduced when forming SQL queries to check if a field is null. This leads to a database error due to Oracle's limitations on handling LOB data types with the SUBSTR() function in this context.\n\n1. **Problem description in general terms**: The software framework incorrectly includes a function call in SQL queries for checking null values in text fields on Oracle databases, leading to execution errors.\n\n2. **Key symptoms and behaviors observed**: \n   - In the newer version (4.2.0), when executing a query to filter records where a text field is null, the SQL query includes the function DBMS_LOB.SUBSTR(), which is unnecessary for \"IS NULL\" conditions.\n   - This results in a DatabaseError indicating a numeric or value error due to character string buffer limitations.\n\n3. **Affected components or systems**: The problem affects the Oracle database backend operations in the software framework, particularly the SQL query generation process when filtering for null values in text fields.\n\n4. **Potential impact or severity**: The issue causes SQL queries that worked in previous versions to fail in the new version, potentially affecting any application using this framework to interact with Oracle databases. This could lead to application downtime or data retrieval issues, making it a critical problem for affected systems.\n\n5. 
**Relevant technical details abstracted for broader understanding**: \n   - The regression impacts how SQL queries are constructed for Oracle databases in the newer version of the software.\n   - The use of DBMS_LOB.SUBSTR() in a context where it is not needed or appropriate leads to errors.\n   - Fixes were made in specific parts of the codebase handling database operations, specifically in files related to Oracle and PostgreSQL database backends, suggesting a cross-database relevance in optimization or compatibility improvements.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: DBMS_LOB.SUBSTR() is unnecessary for \"IS NULL\" condition on Oracle.\n\nBody:\nModel contains\nsomefield = models.TextField(blank=True, null=True)\nAt 4.1.9:\nprint(MyModel.objects.filter(somefield=None).only('id').query)\nSELECT \"TNAME_MYMODEL\".\"ID\" FROM \"TNAME_MYMODEL\" WHERE \"TNAME_MYMODEL\".\"SOMEFIELD\"  IS NULL\nworks\nAt 4.2.0:\nprint(MyModel.objects.filter(somefield=None).only('id').query)\nSELECT \"TNAME_MYMODEL\".\"ID\" FROM \"TNAME_MYMODEL\" WHERE DBMS_LOB.SUBSTR(\"TNAME_MYMODEL\".\"SOMEFIELD\") IS NULL\n\nDatabaseError: ORA-06502: PL/SQL: numeric or value error: character string buffer too small\nORA-06512: at line 1\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\ndjango/db/backends/oracle/operations.py\n  function: DatabaseOperations.fetch_returned_insert_columns\n  function: DatabaseOperations.last_insert_id\n\ndjango/db/backends/postgresql/features.py\n  class: DatabaseFeatures\n"
    },
    {
      "similar_issue": {
        "issue_title": "QuerySet.bulk_create() crashes on mixed case columns in unique_fields/update_fields.",
        "issue_body": "Not sure exactly how to phrase this, but when I'm calling bulk_update on the manager for a class with db_column set on fields the SQL is invalid. Ellipses indicate other fields excluded for clarity.\nclass ActivityBlackListed(models.Model):\n    \"\"\"\n    Originally sourced from Activity_BlackListed in /home/josh/PNDS_Interim_MIS-Data.accdb (13 records)\n    \"\"\"\n\n    class Meta:\n        db_table = \"Activity_BlackListed\"\n\n    blacklistid = models.IntegerField(primary_key=True, db_column=\"BlacklistID\")\n    sectorid = models.IntegerField(null=True, blank=True, db_column=\"SectorID\")\n    ...\nqs.bulk_create(instances, update_conflicts=True, update_fields=[\"sectorid\", ...], unique_fields=[\"blacklistid\"])\nThe \"INSERT\" code does take into account the db_columns\nINSERT INTO \"Activity_BlackListed\" (\"BlacklistID\",...) VALUES (%s,  ...),\nThe code which is generated for \"ON CONFLICT\" uses the field name and not the db_column which leads to a syntax error\n'ON CONFLICT(\"blacklistid\") DO UPDATE SET \"sectorid\" = EXCLUDED.\"sectorid\", ...\nPostgreSQL returns\nERROR:  column \"blacklistid\" does not exist at character 1508\nWhat should be generated is I think:\n'ON CONFLICT(\"BlacklistID\") DO UPDATE SET \"SectorID\" = EXCLUDED.\"SectorID\", ...",
        "issue_id": 34171,
        "pr_number": 16315,
        "pr_title": "Fixed #34171 -- Fixed QuerySet.bulk_create() on fields with db_column in unique_fields/update_fields.",
        "pr_body": "[Ticket 34171](https://code.djangoproject.com/ticket/34171)\r\nI think we don't need separate test for this, just adding a ```db_column``` to one of the models in ```bulk_create``` should do the work.",
        "issue_closed_at": "2022-11-22T13:05:04",
        "base_commit": "7d5329852f19c6ae78c6f6f3d3e41835377bf295"
      },
      "summary": "### Summary: This issue pertains to a bug in Django's QuerySet.bulk_create() method, where SQL queries involving mixed-case column names (specified via db_column) in unique_fields or update_fields result in invalid SQL syntax. This problem arises when the generated SQL code for managing database conflicts does not correctly map Django model field names to their corresponding database column names, leading to syntax errors, particularly in PostgreSQL.\n\n1. **Problem description in general terms**: The issue involves a mismatch between model field names and their corresponding database column names during SQL query generation in Django, specifically within the bulk_create method when handling updates and uniqueness constraints. This mismatch results in invalid SQL queries when columns have mixed-case names.\n\n2. **Key symptoms and behaviors observed**: The primary symptom is a syntax error in SQL queries, specifically during operations involving conflict resolution (ON CONFLICT clause) in the bulk_create method. The error occurs because the SQL code incorrectly uses the model field names instead of the defined db_column names, leading to PostgreSQL errors such as \"column does not exist.\"\n\n3. **Affected components or systems**: The affected components include Django's ORM, particularly the bulk_create method within the QuerySet class, and the SQL query generation logic in the SQL compiler. This issue impacts any Django application using bulk_create with models that define db_column attributes for fields, particularly when using PostgreSQL as the database backend.\n\n4. **Potential impact or severity**: The severity of this issue can be considered moderate as it results in application crashes during bulk data operations, which could disrupt data processing workflows. It may affect any applications that rely on bulk updates involving mixed-case column names, potentially leading to data integrity issues if not addressed.\n\n5. 
**Relevant technical details abstracted for broader understanding**: The core of the issue is the discrepancy between model field names and database column names during SQL conflict handling. The patch addresses this by ensuring that the SQL generation process correctly maps field names to their db_column equivalents, thus preventing SQL syntax errors. This fix involves changes in several functions within Django's query handling and SQL compilation components.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: QuerySet.bulk_create() crashes on mixed case columns in unique_fields/update_fields.\n\nBody:\nNot sure exactly how to phrase this, but when I I'm calling\nbulk_update\non the manager for a class with\ndb_column\nset on fields the SQL is invalid. Ellipses indicate other fields excluded for clarity.\nclass ActivityBlackListed(models.Model):\n    \"\"\"\n    Originally sourced from Activity_BlackListed in /home/josh/PNDS_Interim_MIS-Data.accdb (13 records)\n    \"\"\"\n\n    class Meta:\n        db_table = \"Activity_BlackListed\"\n\n    blacklistid = models.IntegerField(primary_key=True, db_column=\"BlacklistID\")\n    sectorid = models.IntegerField(null=True, blank=True, db_column=\"SectorID\")\n    ...\nqs.bulk_create(instances, update_conflicts=True, update_fields=[\"sectorid\", ...], unique_fields=[\"blacklistid\"])\nThe \"INSERT\" code does take into account the db_columns\nINSERT INTO \"Activity_BlackListed\" (\"BlacklistID\",...) VALUES (%s,  ...),\nThe code which is generated for \"ON CONFLICT\" uses the field name and not the db_column which leads to a syntax error\n'ON CONFLICT(\"blacklistid\") DO UPDATE SET \"sectorid\" = EXCLUDED.\"sectorid\", ...\nPostgreSQL returns\nERROR:  column \"blacklistid\" does not exist at character 1508\nWhat should be generated is I think:\n'ON CONFLICT(\"BlacklistID\") DO UPDATE SET \"SectorID\" = EXCLUDED.\"SectorID\", ...\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. 
Affected components or systems\n4. Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\ndjango/db/models/query.py\n  function: QuerySet._check_bulk_create_options\n  function: QuerySet._check_bulk_create_options\n  function: QuerySet.bulk_create\n\ndjango/db/models/sql/compiler.py\n  function: SQLAggregateCompiler.as_sql\n"
    },
    {
      "similar_issue": {
        "issue_title": "bulk_create with manual primary_key don't update instances state",
        "issue_body": "Given a model with manually defined primary keys:\nclass State(models.Model):\n    two_letter_code = models.CharField(max_length=2, primary_key=True)\nPerforming a bulk_create with model instances will not correctly update their state.\nLooping through the instances and calling save() individually will result in instances with different state from instances persisted with bulk_create:\nstate_ca = State(two_letter_code='CA')\nState.objects.bulk_create([state_ca])\nstate_ca._state.adding  # => True\nstate_ca._state.db  # => None\nstate_ny = State(two_letter_code='NY')\nstate_ny.save()\nstate_ny._state.adding  # => False\nstate_ny._state.db  # => 'default'\nOne implication of this behavior is that the instances saved with bulk_create can't be used to build relationships with model instances loaded with other Queryset API methods.\nHere is a demonstration:\nclass Group(models.Model):\n    ext_id = models.CharField(primary_key=True, max_length=32)\nclass Analyst(models.Model):\n    ext_id = models.CharField(primary_key=True, max_length=32)\n    groups = models.ManyToManyField(Group)\ngroup_aaa = Group.objects.get(ext_id='AAA')\nanalyst_eee = Analyst(ext_id='EEE')\nAnalyst.objects.bulk_create([analyst_eee])\nanalyst_eee.groups.set([group_aaa])\n# ValueError: Cannot add \"<Group: AAA>\": instance is on database \"None\", value is on database \"default\"\nIt fails when the ._state.db is compared.\nA current workaround option is to manually set the ._state.db after the bulk_create:\nanalyst_eee = Analyst(ext_id='EEE')\nAnalyst.objects.bulk_create([analyst_eee])\nanalyst_eee._state.db = 'default'\n\nanalyst_eee.groups.set([group_aaa]) # And now it works",
        "issue_id": 29367,
        "pr_number": 9902,
        "pr_title": "Fixed #29367 -- Fixed model state on objects with a primary key created with QuerySet.bulk_create().",
        "pr_body": "Instance of models with manually set primary_key persisted with\r\nbulk_create would not update the state `adding` and `db` attributes.\r\n\r\nSee: ticket 29367\r\n",
        "issue_closed_at": "2018-04-27T17:20:07",
        "base_commit": "3246ad106517e61437f80e8ef3c9d216754039e7"
      },
      "summary": "### Summary: This issue arises in the Django ORM when using the `bulk_create` method on model instances with manually defined primary keys. The primary problem is that `bulk_create` does not update the internal state of the instances in the same way as individual `save()` operations do. This results in discrepancies in the instances' state attributes, specifically `_state.adding` and `_state.db`, leading to operational issues when these instances are used in conjunction with other ORM features such as relationships or when they are compared to instances persisted with `save()`.\n\n1. **Problem Description in General Terms**: When using `bulk_create` to persist instances of Django models with manually defined primary keys, the method does not update the instances' internal state correctly. This causes issues when these instances interact with other database operations, as their state does not reflect that of a fully persisted instance.\n\n2. **Key Symptoms and Behaviors Observed**: \n   - Instances created with `bulk_create` have `_state.adding` set to `True` and `_state.db` set to `None`, whereas instances saved with `save()` have `_state.adding` set to `False` and `_state.db` set to the database alias (e.g., `'default'`).\n   - Attempting to establish relationships between instances created with `bulk_create` and those interacting with the database through other methods can lead to errors, such as `ValueError` due to mismatched database states.\n\n3. **Affected Components or Systems**: The issue affects the Django ORM, specifically the `QuerySet.bulk_create` method, and any system relying on it for creating model instances with manually assigned primary keys. It impacts the integrity of model instance states and their ability to form relationships correctly.\n\n4. 
**Potential Impact or Severity**: The severity of this issue depends on how extensively `bulk_create` is used in the application and whether relationships between models are critical to the application's functionality. It can lead to runtime errors, data integrity problems, and unexpected behavior in applications, especially those relying on complex model relationships.\n\n5. **Relevant Technical Details Abstracted for Broader Understanding**: The internal state of a Django model instance, maintained by the `_state` attribute, is crucial for ORM operations. The discrepancy in this state when using `bulk_create` versus `save()` can lead to mismatches in expected behavior, particularly in database operations and relationships. A workaround involves manually setting the `_state.db` attribute post `bulk_create`, but this is not ideal for large-scale applications or maintaining code simplicity. The patch likely addresses this by ensuring `bulk_create` updates the instance state similarly to individual `save()` operations.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: bulk_create with manual primary_key don't update instances state\n\nBody:\nGiven a model with manually defined primary keys:\nclass\nState\n(\nmodels\n.\nModel\n):\ntwo_letter_code\n=\nmodels\n.\nCharField\n(\nmax_length\n=\n2\n,\nprimary_key\n=\nTrue\n)\nPerforming a bulk_create with model instances will not correctly update their state.\nLooping through the instances and calling save() individually will result in instances with different state from instances persisted with bulk_create:\nstate_ca\n=\nState\n(\ntwo_letter_code\n=\n'CA'\n)\nState\n.\nobjects\n.\nbulk_create\n([\nstate_ca\n])\nstate_ca\n.\n_state\n.\nadding\n# => True\nstate_ca\n.\n_state\n.\ndb\n# => None\nstate_ny\n=\nState\n(\ntwo_letter_code\n=\n'NY'\n)\nstate_ny\n.\nsave\n()\nstate_ny\n.\n_state\n.\nadding\n# => False\nstate_ny\n.\n_state\n.\ndb\n# => 'default'\nOne implication of this behavior is that the instances saved with bulk_create can't be used to build relationships with model instances loaded with other Queryset API methods.\nHere is a demonstration:\nclass\nGroup\n(\nmodels\n.\nModel\n):\next_id\n=\nmodels\n.\nCharField\n(\nprimary_key\n=\nTrue\n,\nmax_length\n=\n32\n)\nclass\nAnalyst\n(\nmodels\n.\nModel\n):\next_id\n=\nmodels\n.\nCharField\n(\nprimary_key\n=\nTrue\n,\nmax_length\n=\n32\n)\ngroups\n=\nmodels\n.\nManyToManyField\n(\nGroup\n)\ngroup_aaa\n=\nGroup\n.\nobjects\n.\nget\n(\next_id\n=\n'AAA'\n)\nanalyst_eee\n=\nAnalyst\n(\next_id\n=\n'EEE'\n)\nAnalyst\n.\nobjects\n.\nbulk_create\n([\nanalyst_eee\n])\nanalyst_eee\n.\ngroups\n.\nset\n([\ngroup_aaa\n])\n# ValueError: Cannot add \"<Group: AAA>\": instance is on database \"None\", value is 
on database \"default\"\nIt fails when the\n._state.db\nis compared.\nA current workaround option is to manually set the\n._state.db\nafter the bulk_create:\nanalyst_eee = Analyst(ext_id='EEE')\nAnalyst.objects.bulk_create([analyst_eee])\nanalyst_eee._state.db = 'default'\n\nanalyst_eee.groups.set([group_aaa]) # And now it works\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\ndjango/db/models/query.py\n  function: QuerySet.bulk_create\n"
    },
    {
      "similar_issue": {
        "issue_title": "QuerySet.prefetch_related() crashes with KeyError if model uses to_field and string primary key",
        "issue_body": "The issue:\nprefetch_related failed if prefetching by char primary key.\nDjango version 1.11.3\nPython 2.7\nreproducible steps:\n1) django-admin startproject pk_string\n2) cd pk_string\n3) django-admin startapp users\n4) update users.models\n# -*- coding: utf-8 -*-\nfrom __future__ import unicode_literals\nfrom django.db import models\n# Create your models here.\nclass User(models.Model):\n    email = models.CharField(max_length=255, unique=True)\nclass UserData(models.Model):\n    email = models.OneToOneField(User, to_field='email', primary_key=True)\n    note = models.CharField(max_length=255);\n5) install app pk_string.settings\n...\nINSTALLED_APPS = [\n    'django.contrib.admin',\n    'django.contrib.auth',\n    'django.contrib.contenttypes',\n    'django.contrib.sessions',\n    'django.contrib.messages',\n    'django.contrib.staticfiles',\n    'users',\n]\n...\n6) ./manage.py makemigrations\n7) ./manage.py migrate\n8) ./manage.py shell\nfrom users.models import User, UserData\nUser.objects.create(email='111111')\nUser.objects.create(email='222222')\nUser.objects.create(email='333333')\nUser.objects.create(email='444444')\nusers = User.objects.all().prefetch_related('userdata')\nusers[0]\n>>> \"<User: User object>\"\nUserData.objects.create(email=users[0], note='111')\nUserData.objects.create(email=users[2], note='222')\nUserData.objects.create(email=users[3], note='333')\nusers = User.objects.all().prefetch_related('userdata')\n>>> users[0]\nTraceback (most recent call last):\n  File \"<console>\", line 1, in <module>\n  File \"/usr/local/lib/python2.7/site-packages/django/db/models/query.py\", line 289, in __getitem__\n    return list(qs)[0]\n  File \"/usr/local/lib/python2.7/site-packages/django/db/models/query.py\", line 250, in __iter__\n    self._fetch_all()\n  File \"/usr/local/lib/python2.7/site-packages/django/db/models/query.py\", line 1120, in _fetch_all\n    self._prefetch_related_objects()\n  File \"/usr/local/lib/python2.7/site-packages/django/db/models/query.py\", line 675, in _prefetch_related_objects\n    prefetch_related_objects(self._result_cache, *self._prefetch_related_lookups)\n  File \"/usr/local/lib/python2.7/site-packages/django/db/models/query.py\", line 1469, in prefetch_related_objects\n    obj_list, additional_lookups = prefetch_one_level(obj_list, prefetcher, lookup, level)\n  File \"/usr/local/lib/python2.7/site-packages/django/db/models/query.py\", line 1582, in prefetch_one_level\n    prefetcher.get_prefetch_queryset(instances, lookup.get_current_queryset(level)))\n  File \"/usr/local/lib/python2.7/site-packages/django/db/models/fields/related_descriptors.py\", line 362, in get_prefetch_queryset\n    instance = instances_dict[rel_obj_attr(rel_obj)]\nKeyError: u'111111'",
        "issue_id": 28375,
        "pr_number": 8905,
        "pr_title": "Fixed #28375 -- Fixed KeyError raised on reverse prefetch of a model with OneToOne primary key to non-pk field",
        "pr_body": "https://code.djangoproject.com/ticket/28375",
        "issue_closed_at": "2017-08-21T15:47:49",
        "base_commit": "b5ad5c628a0327c2208d76e5cacb3cb6f48750b5"
      },
      "summary": "### Summary:\nThis issue pertains to a bug in Django version 1.11.3 where the `QuerySet.prefetch_related()` method crashes with a `KeyError` when models use a `to_field` attribute and a string-based primary key. This error occurs during prefetching operations involving related models, specifically in scenarios where a `OneToOneField` uses a character field as its primary key. The primary symptom is a `KeyError` that arises during the prefetching process, indicating a failure to properly map or retrieve related objects based on the specified field. The affected component is the Django ORM, particularly the function `ManyRelatedManager.get_prefetch_queryset` within `django/db/models/fields/related_descriptors.py`. The potential impact of this bug is significant, as it can lead to application crashes or incorrect data retrieval in any Django application that relies on prefetching related objects using a non-integer primary key. This issue highlights a necessity for robust handling of non-standard primary keys in Django's ORM operations.",
      "prompt_used": "You are an expert in software issue reasoning analysis.\nGiven the following problem report and its fixed code elements, generate a comprehensive summary based on the entire document. Your goal is to abstract the information in the problem description into a more general description.\n\n## Original Issue Report:\nTitle: QuerySet.prefetch_related() crashes with KeyError if model uses to_field and string primary key\n\nBody:\nThe issue:\nprefetch_related failed if prefetching by char primary key.\nDjango version 1.11.3\nPython 2.7\nreproducible steps:\n1) django-admin startproject pk_string\n2) cd pk_string\n3) django-admin startapp users\n4) update  users.models\n# -*- coding: utf-8 -*-\nfrom\n__future__\nimport\nunicode_literals\nfrom\ndjango.db\nimport\nmodels\n# Create your models here.\nclass\nUser\n(\nmodels\n.\nModel\n):\nemail\n=\nmodels\n.\nCharField\n(\nmax_length\n=\n255\n,\nunique\n=\nTrue\n)\nclass\nUserData\n(\nmodels\n.\nModel\n):\nemail\n=\nmodels\n.\nOneToOneField\n(\nUser\n,\nto_field\n=\n'email'\n,\nprimary_key\n=\nTrue\n)\nnote\n=\nmodels\n.\nCharField\n(\nmax_length\n=\n255\n);\n5) install app pk_string.settings\n...\nINSTALLED_APPS\n=\n[\n'django.contrib.admin'\n,\n'django.contrib.auth'\n,\n'django.contrib.contenttypes'\n,\n'django.contrib.sessions'\n,\n'django.contrib.messages'\n,\n'django.contrib.staticfiles'\n,\n'users'\n,\n]\n...\n6) ./manage.py makemigrations\n7) ./manage.py migrate\n8) ./manage.py shell\nfrom\nusers.models\nimport\nUser\n,\nUserData\nUser\n.\nobjects\n.\ncreate\n(\nemail\n=\n'111111'\n)\nUser\n.\nobjects\n.\ncreate\n(\nemail\n=\n'222222'\n)\nUser\n.\nobjects\n.\ncreate\n(\nemail\n=\n'333333'\n)\nUser\n.\nobjects\n.\ncreate\n(\nemail\n=\n'444444'\n)\nusers\n=\nUser\n.\nobjects\n.\nall\n()\n.\nprefetch_related\n(\n'userdata'\n)\nusers\n[\n0\n]\n>>>\n\"<User: User 
object>\"\nUserData.objects.create(email=users[0], note='111')\nUserData.objects.create(email=users[2], note='222')\nUserData.objects.create(email=users[3], note='333')\nusers = User.objects.all().prefetch_related('userdata')\n>>> users[0]\nTraceback (most recent call last):\n  File \"<console>\", line 1, in <module>\n  File \"/usr/local/lib/python2.7/site-packages/django/db/models/query.py\", line 289, in __getitem__\n    return list(qs)[0]\n  File \"/usr/local/lib/python2.7/site-packages/django/db/models/query.py\", line 250, in __iter__\n    self._fetch_all()\n  File \"/usr/local/lib/python2.7/site-packages/django/db/models/query.py\", line 1120, in _fetch_all\n    self._prefetch_related_objects()\n  File \"/usr/local/lib/python2.7/site-packages/django/db/models/query.py\", line 675, in _prefetch_related_objects\n    prefetch_related_objects(self._result_cache, *self._prefetch_related_lookups)\n  File \"/usr/local/lib/python2.7/site-packages/django/db/models/query.py\", line 1469, in prefetch_related_objects\n    obj_list, additional_lookups = prefetch_one_level(obj_list, prefetcher, lookup, level)\n  File \"/usr/local/lib/python2.7/site-packages/django/db/models/query.py\", line 1582, in prefetch_one_level\n    prefetcher.get_prefetch_queryset(instances, lookup.get_current_queryset(level)))\n  File \"/usr/local/lib/python2.7/site-packages/django/db/models/fields/related_descriptors.py\", line 362, in get_prefetch_queryset\n    instance = instances_dict[rel_obj_attr(rel_obj)]\nKeyError: u'111111'\n\n## Code elements fixed by the patch:\n{FIXED_CODE_ELEMENTS}\n\nPlease analyze the above issue report and provide a structured summary that includes:\n1. Problem description in general terms\n2. 
Key symptoms and behaviors observed\n3. Affected components or systems\n4. Potential impact or severity\n5. Any relevant technical details abstracted for broader understanding\n\nPlease return the summary with “### Summary:\", For example:\n### Summary: This issue is ...\n\nChanges Summary:\ndjango/db/models/fields/related_descriptors.py\n  function: ManyRelatedManager.get_prefetch_queryset\n"
    }
  ]
}