{
  "metadata": {
    "database_type": "small",
    "source_stage": "Stage 10 (final documents with proposals already applied and problems filtered)",
    "processing_completed": true,
    "chromadb_populated": true,
    "chunking_enabled": true,
    "chunk_split_size": 500,
    "embedding_functions_used": [
      "default",
      "text-embedding-3-small",
      "text-embedding-3-large"
    ],
    "created_at": "2025-09-11T02:14:23.061231",
    "collections": {
      "default": {
        "collection_name": "default",
        "embedding_function": "DefaultEmbeddingFunction",
        "document_count": 2011,
        "db_path": "./databases/chroma_db/small"
      },
      "text-embedding-3-small": {
        "collection_name": "openai-small",
        "embedding_function": "OpenAIEmbeddingFunction",
        "document_count": 2011,
        "db_path": "./databases/chroma_db/small"
      },
      "text-embedding-3-large": {
        "collection_name": "openai-large",
        "embedding_function": "OpenAIEmbeddingFunction",
        "document_count": 2011,
        "db_path": "./databases/chroma_db/small"
      }
    },
    "subsample_info": {
      "original_document_count": 457,
      "sampled_document_count": 2,
      "sampling_timestamp": "2025-09-13T14:40:43.267048"
    }
  },
  "problem_to_documents": {
    "gsm8k-1588-train": [
      "hobby_club_newsletter_mary_sailboat_announcement",
      "mary_sailboat_project_journal_sail_planning",
      "millers_craft_store_canvas_receipt",
      "mary_sailboat_jib_sail_pattern",
      "mary_sailboat_mizzen_sail_pattern",
      "mary_triangle_area_reference_notes"
    ],
    "gsm8k-1892-train": [
      "tamika_driver_log_charity_relay",
      "tamika_speed_report_charity_relay",
      "logan_drive_completion_certificate",
      "logan_speed_telematics_report"
    ]
  },
  "problem_ids": [
    "gsm8k-1892-train",
    "gsm8k-1588-train"
  ],
  "documents": [
    {
      "question_id": "gsm8k-1588-train",
      "document_ids": [
        "hobby_club_newsletter_mary_sailboat_announcement",
        "mary_sailboat_project_journal_sail_planning",
        "millers_craft_store_canvas_receipt",
        "mary_sailboat_jib_sail_pattern",
        "mary_sailboat_mizzen_sail_pattern",
        "mary_triangle_area_reference_notes"
      ],
      "documents": [
        {
          "id": "hobby_club_newsletter_mary_sailboat_announcement",
          "document": "RIVERSIDE HOBBY CLUB\nMONTHLY NEWSLETTER\nEarly 90s\n\nMEMBER PROJECT SPOTLIGHT\n\nWe're excited to announce that our long-time member has begun work on her newest project - a detailed model sailboat! She shared her plans with the club during our last meeting, and we're all looking forward to seeing her progress in the coming weeks.\n\n\"I've always been fascinated by sailing vessels,\" the member told us, \"and after completing my model train last year, I felt ready to take on this new challenge.\"\n\nFellow members interested in maritime modeling are encouraged to stop by during our Tuesday workshop hours to share tips and techniques with her as she works on this exciting new project.\n\n-------------------\nNewsletter Editor\nRiverside Hobby Club",
          "metadata": {
            "Type": "Newsletter",
            "timestamp": "1992-04-01T00:00:00",
            "identities": {
              "long-time member": "Mary Thompson",
              "Newsletter Editor": "James Wilson"
            },
            "source": "Riverside Hobby Club Newsletter Archive"
          }
        },
        {
          "id": "mary_sailboat_project_journal_sail_planning",
          "document": "Project Journal\nMary Thompson\nModel Sailboat Build\n\nApril 3, 1992\n\nAfter researching traditional sailing vessels and consulting several model-making books from the library, I've decided on the sail configuration for my model sailboat. I'll be adding three sails to create an authentic appearance. This should give the model good proportions and match the historical accuracy I'm aiming for. \n\nTomorrow I'll start working on the sail patterns and material selection. I picked up some canvas from Miller's Craft Store - hopefully it will work well for this scale.\n\nTasks for next session:\n- Draw sail patterns\n- Test canvas with small sample\n- Check rigging requirements for three-sail setup",
          "metadata": {
            "Type": "Project Journal Entry",
            "Timestamp": "1992-04-03T00:00:00",
            "names": [
              "Mary Thompson"
            ]
          }
        },
        {
          "id": "millers_craft_store_canvas_receipt",
          "document": "MILLER'S CRAFT & HOBBY SUPPLIES\n1234 Main Street\nRiverside, CA 92501\nTel: (714) 555-0123\n\nSALES RECEIPT\nDate: 04/05/1992\nTime: 10:23 AM\nCashier: Sarah\n\nCustomer: Mary Thompson\nProject: Model Sailboat\n\n--------------------------------\n1x Custom-Cut Canvas Piece\n    Rectangular - 5\" x 8\"\n    (Main Sail Material)      $3.25\n--------------------------------\nSubtotal:                    $3.25\nTax (6.5%):                  $0.21\n--------------------------------\nTotal:                       $3.46\n\nPayment Method: Cash\nAmount Tendered:             $5.00\nChange Due:                  $1.54\n\nThank you for shopping at \nMiller's Craft & Hobby Supplies!\n\n*All custom-cut materials are \nfinal sale*\n\nReceipt #: 92405-234",
          "metadata": {
            "Type": "Sales Receipt",
            "Timestamp": "1992-04-05T10:23:00",
            "names": [
              "Mary Thompson",
              "Sarah"
            ]
          }
        },
        {
          "id": "mary_sailboat_jib_sail_pattern",
          "document": "SAIL PATTERN SPECIFICATIONS\nProject: Model Sailboat\nComponent: Jib Sail\nDesigner: Mary Thompson\nDate: April 6, 1992\n\n[Technical Drawing]\n    |\\ \n    | \\\n    |  \\\n 4\" |   \\\n    |    \\\n    |     \\\n    --------\n      3\"\n\nSPECIFICATIONS:\n- Shape: Right Triangle\n- Base Length: 3 inches\n- Height: 4 inches\n- Material: Canvas\n- Color: Natural\n- Hem Allowance: Add 1/4\" on all sides\n\nNotes:\nCut along solid lines. Ensure right angle is precise \nfor proper fit. Pattern tested and verified for \ndimensions.\n\nPattern #: MS-92-J1\nVerified by: MT\nScale: 1:1",
          "metadata": {
            "Type": "Technical Drawing",
            "Timestamp": "1992-04-06T14:00:00",
            "names": [
              "Mary Thompson"
            ]
          }
        },
        {
          "id": "mary_sailboat_mizzen_sail_pattern",
          "document": "SAIL PATTERN SPECIFICATIONS\nProject: Model Sailboat\nComponent: Mizzen Sail\nDesigner: Mary Thompson\nDate: April 7, 1992\n\n[Technical Drawing]\n    |\\ \n    | \\\n    |  \\\n    |   \\\n    |    \\\n 6\" |     \\\n    |      \\\n    |       \\\n    |        \\\n    ----------\n       4\"\n\nSPECIFICATIONS:\n- Shape: Right Triangle\n- Base Length: 4 inches\n- Height: 6 inches\n- Material: Canvas\n- Color: Natural\n- Hem Allowance: Add 1/4\" on all sides\n\nNotes:\nCut along solid lines. Ensure right angle is precise \nfor proper fit. Pattern tested and verified for \ndimensions.\n\nPattern #: MS-92-M1\nVerified by: MT\nScale: 1:1",
          "metadata": {
            "Type": "Technical Drawing",
            "Timestamp": "1992-04-07T10:30:00",
            "names": [
              "Mary Thompson"
            ]
          }
        },
        {
          "id": "mary_triangle_area_reference_notes",
          "document": "MODEL MAKING REFERENCE NOTES\nBy: Mary Thompson\nDate: April 4, 1992\n\nCALCULATING SAIL AREAS\n-----------------------\n\nGEOMETRIC PRINCIPLE FOR TRIANGULAR SHAPES:\n\nTo find the area of a triangle:\n1. Take a square with the same height and length as the triangle\n2. Divide the square's area by 2\n\n[Diagram]\n   Square               Triangle\n    ____                  /|\n   |    |                / |\n   |    |     -->       /  |\n   |    |              /   |\n   |____|             /____|\n\nNotes from \"Model Maker's Mathematics\" \n(Library Reference Section)\n\nRemember: This principle works for ANY right triangle!\n* Helpful for calculating material requirements\n* Works regardless of triangle size\n* Keep this handy for future sail projects\n\n-------------------\nPage 12 of Reference Notebook",
          "metadata": {
            "Type": "Reference Notes",
            "Timestamp": "1992-04-04T15:00:00",
            "names": [
              "Mary Thompson"
            ]
          }
        }
      ],
      "question": "How many square inches of canvas does Mary need in April 1992?",
      "premises": [
        {
          "content": "Mary is making a model sailboat."
        },
        {
          "content": "Mary wants to add three sails to her model sailboat."
        },
        {
          "content": "One of the sails Mary wants to add is rectangular and measures 5 inches by 8 inches."
        },
        {
          "content": "One of the sails Mary wants to add is a right triangular sail that's 3 inches long at the bottom and 4 inches tall."
        },
        {
          "content": "One of the sails Mary wants to add is a right triangular sail that's 4 inches long at the bottom and 6 inches tall."
        },
        {
          "content": "The area of a triangle can be found by dividing the area of a square with the same height and length by 2."
        }
      ],
      "final_answer": 58.0,
      "timestamp": "1992-04",
      "original": {
        "question": "Mary is making a model sailboat. She wants to add three sails: a rectangular sail that measures 5 inches by 8 inches and two right triangular sails, one that's 3 inches long at the bottom and 4 inches tall and one that's 4 inches long at the bottom and 6 inches tall. (Remember you can find the area of a triangle by dividing the area of a square with the same height and length by 2). How many square inches of canvas does she need total?",
        "answer": "First find the area of the square sail: 5 inches * 8 inches = <<5*8=40>>40 square inches\nThen find the area of a square sail with the same height and length as the first triangular sail: 3 inches * 4 inches = <<3*4=12>>12 square inches\nThen divide the area in two to find the area of the triangular sail: 12 square inches / 2 = <<12/2=6>>6 square inches\nThen find the area of a square sail with the same height and length as the second triangular sail: 4 inches * 6 inches = <<4*6=24>>24 square inches\nThen divide the area in two to find the area of the triangular sail: 24 square inches / 2 = <<24/2=12>>12 square inches\nThen add up the areas of all the sails to find the total amount of canvas needed: 12 square inches + 6 square inches + 40 square inches = <<12+6+40=58>>58 square inches\n#### 58"
      },
      "entity": {
        "name": "Mary",
        "is_generic": false
      },
      "source_split": "train"
    },
    {
      "question_id": "gsm8k-1892-train",
      "document_ids": [
        "tamika_driver_log_charity_relay",
        "tamika_speed_report_charity_relay",
        "logan_drive_completion_certificate",
        "logan_speed_telematics_report"
      ],
      "documents": [
        {
          "id": "tamika_driver_log_charity_relay",
          "document": "CHARITY RELAY DRIVE\nDRIVER LOG SHEET\n\nDriver Name: Tamika Johnson\nVehicle ID: CR-143\nEvent: Summer Charity Relay 2023\n\nDRIVE TIME LOG:\nStart Time: 8:00 AM\nEnd Time: 4:00 PM\nTotal Drive Duration: 8 hours\n\nDriver Signature: _Tamika Johnson_\nSupervisor Signature: _Mark Chen_\n\nNotes: Successfully completed assigned relay segment. All safety protocols followed.",
          "metadata": {
            "Type": "Driver Log",
            "Timestamp": "2023-06-15T16:00:00",
            "names": [
              "Tamika Johnson",
              "Mark Chen"
            ]
          }
        },
        {
          "id": "tamika_speed_report_charity_relay",
          "document": "RELAY DRIVE GPS TRACKING REPORT\nGenerated by SafeTrack\u2122 Fleet Management\n\nDriver: Tamika Johnson\nVehicle: CR-143\nDate: June 15, 2023\nReport Type: Speed Analysis\n\nSPEED SUMMARY\nAverage Speed Maintained: 45 miles per hour\nRoute: Eastern Charity Relay Route B\nStatus: Within Safety Parameters \u2713\n\nSystem Notes:\nConsistent speed maintenance observed throughout relay segment.\nNo safety violations detected.\nGPS tracking functional throughout journey.\n\nReport Generated by: SafeTrack\u2122 System\nVerification Code: ST23-6154-TJ",
          "metadata": {
            "Type": "GPS Report",
            "Timestamp": "2023-06-15T16:30:00",
            "names": [
              "Tamika Johnson"
            ]
          }
        },
        {
          "id": "logan_drive_completion_certificate",
          "document": "VOLUNTEER DRIVER COMPLETION CERTIFICATE\n\nThis certifies that\n\nLOGAN MARTINEZ\n\nhas successfully completed their assigned relay segment\nfor the Summer Charity Relay 2023\n\nSegment Details:\nDriver Shift Duration: 5 hours\nRoute Section: Western Route A\nVehicle Assignment: CR-144\n\nShift Start: 9:30 AM\nShift End: 2:30 PM\n\nCertificate issued by:\n_Sarah Thompson_\nSarah Thompson\nRelay Coordinator\n\nCertificate ID: SCR-0615-LM-5\nEvent Location: Western District",
          "metadata": {
            "Type": "Completion Certificate",
            "Timestamp": "2023-06-15T14:30:00",
            "names": [
              "Logan Martinez",
              "Sarah Thompson"
            ]
          }
        },
        {
          "id": "logan_speed_telematics_report",
          "document": "VEHICLE TELEMATICS ANALYSIS\nSummer Charity Relay 2023\n\nDRIVER PERFORMANCE REPORT\nGenerated by AutoTrack Pro\u2122\n\nDriver: Logan Martinez\nVehicle ID: CR-144\nRoute: Western Route A\n\nSPEED METRICS\n\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\nAverage Speed: 55 miles per hour\nStatus: Within Approved Range \u2713\n\nVERIFICATION DETAILS\nAnalysis Period: Western Route A Segment\nSystem ID: ATP-0615-LM\nData Quality: 100% Capture Rate\n\nNotes:\n\u2022 Consistent speed maintenance observed\n\u2022 All speed readings within safety parameters\n\u2022 Telematics system functioning normally\n\nGenerated by: AutoTrack Pro\u2122\nReport ID: ATP-2023-0615-144\nData Certified By: _Michael Reed_\nFleet Safety Officer",
          "metadata": {
            "Type": "Telematics Report",
            "Timestamp": "2023-06-15T15:00:00",
            "names": [
              "Logan Martinez",
              "Michael Reed"
            ]
          }
        }
      ],
      "question": "How many miles farther did Tamika drive than Logan?",
      "premises": [
        {
          "content": "Tamika drove for 8 hours."
        },
        {
          "content": "Tamika's average speed was 45 miles per hour."
        },
        {
          "content": "Logan drove for 5 hours."
        },
        {
          "content": "Logan's average speed was 55 miles per hour."
        }
      ],
      "final_answer": 85.0,
      "timestamp": null,
      "original": {
        "question": "Tamika drove for 8 hours at an average speed of 45 miles per hour. Logan drove for 5 hours at 55 miles an hour. How many miles farther did Tamika drive?",
        "answer": "Tamika drove 8 hours * 45 mph = <<8*45=360>>360 miles\nLogan drove 5 hours * 55 mph = <<5*55=275>>275 miles\n360 - 275 = <<360-275=85>>85 miles\nTamika drove 85 miles farther.\n#### 85"
      },
      "entity": {
        "name": [
          "Tamika",
          "Logan"
        ],
        "is_generic": false
      },
      "source_split": "train"
    }
  ],
  "statistics": {
    "total_problems": 2,
    "total_documents": 10,
    "unique_documents": 10,
    "train_problems": 2,
    "test_problems": 0,
    "unknown_split_problems": 0,
    "avg_documents_per_problem": 5.0
  }
}
