STAGE_1_SYSTEM_PROMPT = """You are tasked with filtering a list of domains to identify those most likely to contain educational content, specifically focusing on instruction materials such as exam problems, tutorials, or learning resources across various disciplines like math, science, and engineering. For each domain provided, analyze the content or structure of the domain (e.g., keywords in the domain name, common subpages, and general website purpose) and classify it as either educational or non-educational. Prioritize domains that are likely to offer instructional data, exam problems, study guides, or teaching materials for educational purposes. If a domain appears highly likely to belong to an academic institution, online learning platform, or a repository of educational resources, classify it as educational. If the domain appears more general, commercial, or unrelated to learning (e.g., news sites, entertainment, or e-commerce), classify it as non-educational."""

STAGE_2_SYSTEM_PROMPT = """You are given a set of pre-processed documents, each of which may contain natural question-answer (Q-A) pairs. Your task is to identify and extract these pairs while ignoring unrelated content such as ads, markup, or boilerplate text.

Input: Each document contains multiple sections of text. Some of these sections may have clear questions followed by answers, while others may be irrelevant (e.g., ads or noise).

Output: Extract the Q-A pairs found within each document. A valid Q-A pair must consist of a clearly defined question and its corresponding answer. If no natural Q-A pair exists in the document, return only the word "void" for that document."""

STAGE_3_SYSTEM_PROMPT = """Your task is to refine a collection of educational question-and-answer pairs sourced from the web. The objective is to enhance their quality for educational purposes by focusing on formatting, conciseness, completeness, accuracy, and coherence. Please adhere to the following guidelines for each pair:

1. Formatting Correction:
- Identify and correct any issues with the formatting. This includes ensuring that LaTeX or any mathematical symbols are properly rendered and readable.
- Standardize the presentation of the questions and answers for a uniform look and feel.

2. Coherence and Conciseness:
- Rewrite the content to improve flow and coherence, ensuring that the narrative is logical and easy to follow.
- Make the content concise. Eliminate redundant information, but be cautious not to remove any critical details necessary for understanding the question or answer.

3. Completeness:
- Assess if the question fully captures the concept it intends to test or explore. Add any missing context or details that would make the question clearer and more comprehensive.
- Ensure that the answer provides a complete solution or explanation. Where necessary, add steps, explanations, or information to fill gaps in reasoning or content.

4. Accuracy Verification and Correction:
- Verify the correctness of the answer provided.
- If inaccuracies are found, correct them with the right solution. **ONLY MAKE CHANGES IF YOU ARE VERY CONFIDENT ABOUT YOUR ANSWER!**

Your revisions should ultimately result in content that is well-structured, clear, accurate, and resembles a textbook-quality educational resource.

Question: [{question}]
Answer: [{answer}]

The output format is:
Revised Question:
Revised Answer:"""
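
The Stage 2 prompt defines a `void` sentinel, and the Stage 3 prompt defines `{question}`/`{answer}` placeholders and a `Revised Question:` / `Revised Answer:` output format. A minimal sketch of how a caller might fill the template and read the replies follows. The helper names (`fill_stage3_prompt`, `parse_stage3_output`, `is_void`) are hypothetical and not part of the original file, and the sketch assumes the model follows the requested output format exactly.

```python
import re
from typing import Optional, Tuple


def fill_stage3_prompt(template: str, question: str, answer: str) -> str:
    """Substitute a Q-A pair into a template with {question}/{answer} fields.

    str.format does not re-parse the substituted values, so LaTeX braces
    inside the question or answer are inserted unchanged.
    """
    return template.format(question=question, answer=answer)


# Assumed reply shape: "Revised Question: ... Revised Answer: ..."
_STAGE3_PATTERN = re.compile(
    r"Revised Question:\s*(.*?)\s*Revised Answer:\s*(.*)",
    re.DOTALL,
)


def parse_stage3_output(text: str) -> Optional[Tuple[str, str]]:
    """Return (revised_question, revised_answer), or None if markers are missing."""
    match = _STAGE3_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def is_void(text: str) -> bool:
    """True when a Stage 2 reply is just the word "void" (quotes and case ignored)."""
    return text.strip().strip("\"'").strip().lower() == "void"
```

In use, a caller would pass `STAGE_3_SYSTEM_PROMPT` as `template`, skip any document for which `is_void` returns `True`, and treat a `None` result from `parse_stage3_output` as a malformed reply to retry or drop.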