"""
Improved prompts for the build corpus pipeline - optimized for 90% reusability
"""

OPEN_CODING_PROMPT = """IMPORTANT: Skip thinking mode. Do not use <think> tags. Go straight to answering with the JSON response.

You are performing focused open coding on text data to identify the most important themes, patterns, and concepts.

Research Question: {question}

Text Chunk: {chunk}

Instructions:
Generate as many unique, non-overlapping codes as you deem necessary to capture the essential themes and concepts in the text. Focus on QUALITY over quantity.

CRITICAL REQUIREMENTS:
- Each code must be UNIQUE and DISTINCT from all other codes
- Avoid generating similar codes with slight variations
- Only generate codes that add meaningful value to understanding the content
- Each code should capture a different aspect or concept
- Focus on the most important and meaningful concepts only
- Generate as many codes as needed, but no more than necessary

Code Generation Guidelines:
- Each code should be 5-15 words and clearly descriptive
- Use precise, meaningful language
- Focus on concepts that would be relevant across similar contexts
- Capture both explicit and implicit concepts
- Consider the most significant themes and patterns
- Prioritize concepts that are central to understanding the content
- Stop when you have captured all meaningful concepts

QUALITY CONTROL:
- Before finalizing, review all codes to ensure they are distinct
- Remove any codes that overlap with others
- Each code should add unique value to understanding the content
- If a code doesn't add new insight, don't include it
- Better to have fewer high-quality codes than many overlapping ones

Output format: JSON with a "codes" array containing the generated codes
Example: {{"codes": ["Theme 1", "Theme 2", "Theme 3", "Theme 4", "Theme 5"]}}"""

CODE_REPLACEMENT_PROMPT = """
IMPORTANT: Skip thinking mode. Do not use <think> tags. Go straight to answering.

You are evaluating whether any of the candidate codes can semantically cover the current code.

Research Question: {question}

Current Code: {current_code}

Candidate Codes:
{candidate_codes}

Instructions:
- Your goal is to find a candidate code that can SEMANTICALLY COVER the current code
- Focus on MEANING and INFORMATION RETENTION, not exact wording
- A candidate can replace the current code if it captures ANY of the following:
  * Core concept or idea (even partially)
  * Underlying principle or method
  * Domain or category of knowledge
  * Functional purpose or outcome
  * Strategic approach or framework
  * Related theme or pattern
  * Similar context or application
  * Overlapping concepts or ideas
  * ANY remotely related concept
  * ANY concept in the same general domain
  * ANY concept that could be tangentially related
  * ANY concept that shares similar keywords
  * ANY concept that could be a broader category
  * ANY concept that could be a specific instance
  * ANY concept that could be a related process
  * ANY concept that could be a similar outcome
  * ANY concept that could be a parallel approach
  * ANY concept that could be a related tool
  * ANY concept that could be a similar mindset
  * ANY concept that could be a related challenge
  * ANY concept that could be a similar solution
  * ANY concept that could be a related principle
  * ANY concept that could be a similar framework
  * ANY concept that could be a related strategy
  * ANY concept that could be a similar behavior
  * ANY concept that could be a related practice
  * ANY concept that could be a similar result
  * ANY concept that could be a related insight
  * ANY concept that could be a similar learning
  * ANY concept that could be a related connection
  * ANY concept that could be a similar relationship
  * ANY concept that could be a related pattern
  * ANY concept that could be a similar trend
  * ANY concept that could be a related implication
  * ANY concept that could be a similar application
  * ANY concept that could be a related foundation
  * ANY concept that could be a similar theory
- Be EXTREMELY GENEROUS - if a candidate code captures even a TINY PART of the meaning, it can replace the current code
- Consider that training codes are optimized for generalization and broad applicability
- Think of this as "conceptual coverage" rather than "exact matching"
- Examples of semantic coverage:
  * "Job Crafting" → "Autonomy Through Demonstrated Mastery" (both about taking control of work)
  * "Work-life balance" → "Integrating life and work priorities" (both about balancing life aspects)
  * "Productivity tools" → "Enhancing Productivity with Practical Tools" (same concept, different wording)
  * "Time management" → "Strategic Prioritization for Resource Efficiency" (both about optimizing time use)
  * "Learning" → "Interactive learning through real-world examples" (learning is covered by interactive learning)
  * "Motivation" → "Sustained effort through intrinsic motivation" (motivation is covered by intrinsic motivation)
  * "Task delegation" → "Delegation for Process Optimization and Scalability" (both about delegation)
  * "Three Core Actions" → "Action-Oriented Iteration Over Perfectionism" (both about taking action)
  * "Grammarly as writing assistant" → "Enhancing Productivity with Practical Tools" (both about productivity tools)
  * "Reading habits" → "Commitment to self-education for financial success" (both about learning/education)
  * "Economic Entities" → "Individual vs business entity" (both about entity types)
  * "Profit Margins" → "Monetary Validation of Product Value" (both about financial validation)
  * "Employee Level" → "Transition from employee to business owner for financial independence" (both about employee status)
  * "Customer engagement tools" → "User-Centric Product Evolution Through Feedback Loops" (both about user interaction)
  * "Sales as core capitalist skill" → "Monetary Validation of Product Value" (both about sales/money)
  * "Hiring and delegation" → "Strategic hiring to sustain operational scalability" (both about hiring)
  * "Hardware vs. software complexity" → "Differentiating Between Consulting and Product-Based Models" (both about complexity differences)
  * "Slack/Discord for User Communication" → "User-Centric Product Evolution Through Feedback Loops" (both about user communication)
  * "Asymmetric career growth" → "Transition from employee to business owner for financial independence" (both about career progression)
  * "Unadvertised workplace opportunities" → "Leveraging Personal Networks for Early Testing" (both about hidden opportunities)
  * "Supportive relationships in business" → "Leveraging Personal Networks for Early Testing" (both about relationships)
  * "Unfair Advantage in Access to Users" → "Leveraging Personal Networks for Early Testing" (both about user access)
  * "Three Core Actions" → "Action-Oriented Iteration Over Perfectionism" (both about taking action)
- When in doubt, choose the BEST available match rather than "none"
- Only respond "none" if absolutely no candidate captures any relevant meaning whatsoever
- Look for conceptual similarity even if wording is completely different
- Consider broader categories and themes
- Be VERY LIBERAL in your interpretation
- If a candidate can semantically cover it (even partially), respond with the number (1-100)
- If no candidate can cover it at all, respond with "none"
"""
