## STRATEGIES & INSIGHTS

[sai-00001] helpful=1 harmful=0 :: For Boc protection of amines, especially in molecules with multiple nucleophilic sites (e.g., aliphatic and aromatic amines), di-tert-butyl dicarbonate (Boc2O) is the preferred reagent over tert-butyl chloroformate (Boc-Cl). Boc2O offers superior stability, selectivity, and minimizes side reactions like over-protection.
[sai-00003] helpful=2 harmful=2 :: For oxidation retrosynthesis, identify the oxidized functional groups (aldehydes, ketones, carboxylic acids) and replace them with their precursor states (primary alcohols for aldehydes/carboxylic acids, secondary alcohols for ketones). Always verify the SMILES modification maintains proper aromaticity and connectivity.
[sai-00004] helpful=17 harmful=6 :: In retrosynthetic analysis, prioritize the disconnection of amide bonds (especially those formed via acylation reactions) before considering protecting group manipulations. Amide bonds are stable but are common targets for synthesis via amine + acyl chloride/carboxylic acid derivative coupling. Protecting groups like Boc that are stable under the reaction conditions should remain intact in the precursor molecules.
[sai-00005] helpful=1 harmful=3 :: For molecules containing a beta-ketoamide group (NC(=O)CC(=O)R), perform a retrosynthetic disconnection to the corresponding amine and a beta-keto acid or beta-keto ester. The beta-keto ester is often a more stable and practical synthetic equivalent than the beta-keto acid. This disconnection is typically higher priority than protecting group removal.
[sai-00008] helpful=0 harmful=1 :: In reduction retrosynthesis, when a primary alcohol is directly attached to a heteroatom (especially nitrogen), prioritize its disconnection to a carboxylic acid precursor (e.g., change CCO to CC(=O)O). This takes precedence over reducing other functional groups like ketones, which yield secondary alcohols. Verify the oxidation state change: carboxylic acid (highest) to primary alcohol (reduced).
[sai-00009] helpful=3 harmful=2 :: When performing retrosynthetic disconnection of ether linkages via heteroatom alkylation, carefully evaluate nucleophile strength. For heterocycles containing both nitrogen and oxygen nucleophilic sites (e.g., pyrazoles), prioritize disconnections where nitrogen acts as the nucleophile over oxygen, especially when the nitrogen is adjacent to electron-donating groups. Verify the leaving group in the electrophilic component aligns with common practice (bromide is often preferred over chloride).
[sai-00012] helpful=1 harmful=1 :: For Boc protection retrosynthesis in heterocyclic systems, carefully examine the degree of substitution at nitrogen. Boc protects primary or secondary amines, not tertiary amines. If the protected nitrogen is part of a ring, verify the unprotected precursor has a secondary amine (-NH-) or primary amine (-NH2) group, not a tertiary nitrogen. Use structural analysis (e.g., checking for hydrogen atoms on nitrogen in SMILES) to confirm.
[sai-00015] helpful=2 harmful=2 :: For azide ions in SMILES notation, use the standard representation [N-]=[N+]=[N-], with a formal negative charge on each terminal nitrogen and a positive charge on the central nitrogen (net charge -1). This matches the symmetric resonance structure of the azide anion, which is particularly important when representing sodium azide (NaN3) in reactions like [3+2] cycloadditions for tetrazole formation.
[sai-00016] helpful=0 harmful=0 :: For retrosynthetic transformations involving nitro groups (e.g., reduction to amines), use the standard SMILES representation [N+](=O)[O-] for aromatic nitro groups. While notations like N(=O)=O are chemically equivalent, [N+](=O)[O-] is the conventional format and ensures consistency with typical cheminformatics tools and databases.
[sai-00017] helpful=1 harmful=0 :: For α-functionalized phosphonate esters (e.g., Horner-Wadsworth-Emmons reagents), recognize that electrophilic halogenation (using NBS or Br2) of the phosphonate carbanion is the standard synthesis method. The Arbuzov reaction (phosphite + alkyl halide) is used to form the phosphonate ester itself, not to introduce halogens at the α-position.
[sai-00019] helpful=0 harmful=1 :: In reduction retrosynthesis, systematically evaluate all possible reduction transformations, not just functional group interconversions. When a saturated carbon chain (C-C) is present in the target molecule, consider it may originate from hydrogenation of an alkene precursor (C=C). This alkene-to-alkane reduction is a common transformation that often takes precedence over reducing other functional groups like carboxylic acids or aldehydes.
[sai-00022] helpful=1 harmful=0 :: For oxime functional groups (C=NO), perform a retrosynthetic functional group interconversion (FGI) to the corresponding carbonyl compound (aldehyde C=O or ketone C(C)=O) and hydroxylamine (NO). This transformation is a key step in synthesizing oximes from carbonyl precursors. Always verify the carbonyl type in the precursor (aldehyde if the carbonyl carbon bears a hydrogen, ketone if it bears two alkyl/aryl groups).
[sai-00023] helpful=0 harmful=2 :: For secondary amide bond disconnection (where nitrogen is substituted with one alkyl/aryl group, NC(=O)R), the retrosynthetic transformation should yield an amine precursor with a free primary amine group (NH2) and a carboxylic acid precursor (C(=O)O). This is the standard disconnection for amides formed via acylation reactions, and the carboxylic acid is a valid precursor (often activated in the forward synthesis).
[sai-00024] helpful=1 harmful=3 :: In Functional Group Addition (FGA) retrosynthesis, prioritize disconnection at the most recently added functional group (e.g., halogens like bromine in CBr, hydroxy groups, nitro groups) rather than core bond formations like amides or ethers. Identify groups that are typically introduced via specific reagents (bromination, hydroxylation, nitration) and revert them to their pre-addition state (e.g., CBr to CH, OH to H, NO2 to H) while pairing with the appropriate reagent (e.g., NBS for bromination).
[sai-00026] helpful=1 harmful=0 :: When generating SMILES for cyclic compounds, ensure proper ring closure notation by incorporating functional groups into the ring system. For lactams and other heterocycles, the carbonyl group should be part of the ring closure (e.g., use O=C1NCc2ccc([N+](=O)[O-])cc21, not O=C1N([H])Cc2ccc([N+](=O)[O-])cc2C1=O). Ring closures are indicated by matching numbers, and all atoms in the ring should form a continuous path without duplicating atom labels.
[sai-00027] helpful=0 harmful=0 :: For aromatic nitro groups in SMILES notation, prefer the O=[N+]([O-]) format over [N+](=O)[O-] when the nitro group is attached to an aromatic system. While both representations are chemically equivalent, the oxygen-first notation (O=) is more commonly used in standard cheminformatics tools and databases for aromatic nitro compounds.
[sai-00028] helpful=0 harmful=1 :: Before applying a deprotection strategy (e.g., Boc removal), rigorously verify the presence of the protecting group in the target molecule. For Boc deprotection, confirm the carbamate group [C(=O)OC(C)(C)C]-N- is present. If absent (e.g., the carbonyl is part of an amide O=C(NR)), consider alternative deprotection types (e.g., benzyl ether, silyl ether) or reassess the reaction type. Always parse the SMILES connectivity to avoid misinterpreting functional groups.
[sai-00030] helpful=7 harmful=2 :: When the reaction context specifies 'Acylation and related processes,' interpret this as requiring an activated acylating agent (e.g., acid chloride, acid anhydride) rather than a direct carboxylic acid for ester or amide formation. Acylation reactions typically use these more reactive species to efficiently couple with alcohols or amines, especially with complex or sterically hindered partners.
[sai-00032] helpful=4 harmful=2 :: In heteroatom alkylation and arylation reactions, prioritize disconnection at tertiary amine nitrogens (formed by alkylation of secondary amines) over N-aryl bonds. Tertiary amines typically result from alkylation events, while N-aryl bonds may form through different mechanisms (e.g., SNAr). Evaluate which disconnection yields more feasible and common precursors - secondary amines like piperidine are preferred nucleophiles over heterocyclic NH groups for alkylation.
[sai-00034] helpful=0 harmful=0 :: When generating SMILES for molecules with multiple identical substituents (e.g., phosphonium salts with multiple phenyl groups), make sure every ring-closure digit is opened and closed within its own ring. Giving each ring a distinct digit (e.g., c2ccccc2, c3ccccc3 for the phenyl groups attached to phosphorus) is the simplest way to avoid ambiguity. Reusing a digit such as 1 is valid SMILES only after the earlier ring using it has been closed; reusing it while that ring is still open bonds the wrong atoms and produces a chemically invalid structure.
[sai-00035] helpful=1 harmful=0 :: When generating SMILES for halogenated heterocyclic aromatic compounds, pay attention to conventional ring closure notation. For substituents on aromatic rings, particularly halogens like iodine or bromine, the standard representation typically places the halogen outside the ring closure notation (e.g., 'cc1I' for 2-chloro-5-iodopyridine) rather than explicitly showing it within the ring ('c(I)c'). While chemically equivalent, consistent formatting with standard cheminformatics conventions ensures answer compatibility.
[sai-00036] helpful=0 harmful=0 :: For phenolic alcohol retrosynthesis via ester reduction, prefer benzoate esters (OC(=O)c1ccccc1) over acetate esters (OC(=O)C) as precursors. Benzoate esters are more commonly used in synthetic practice for phenolic groups due to their stability, ease of reduction, and regioselectivity considerations in complex molecules.
[sai-00038] helpful=1 harmful=2 :: In retrosynthesis of protected amino acid derivatives, recognize that removing protecting groups (like Boc) typically yields the amino acid in its zwitterionic form with a protonated ammonium group ([NH3+]), not a neutral amine (N). This protonation state is standard under aqueous or acidic conditions used in protection/deprotection chemistry.
[sai-00040] helpful=0 harmful=0 :: In SMILES notation, always use uppercase 'C' for aliphatic carbon atoms (e.g., methyl groups as C) and lowercase 'c' for aromatic carbon atoms. This distinction is critical for valid chemical representation - for example, a methyl substituent on an aromatic ring should be written as Cc1ccccc1, not cC1ccccc1 (which would incorrectly represent the methyl carbon as aromatic).
[sai-00042] helpful=0 harmful=0 :: When evaluating retrosynthetic answers, recognize that different SMILES notations can represent chemically equivalent compounds, particularly for ionic species like sulfonate salts. For sodium alkyl sulfonates, both 'BrCCCS(=O)(=O)[O-].[Na+]' and 'O=S(=O)([O-])CCCBr.[Na+]' are valid representations of the same molecule. Focus on chemical connectivity and functional groups rather than superficial string differences when comparing predicted and expected precursors.
[sai-00043] helpful=0 harmful=1 :: In retrosynthetic analysis of molecules containing lactams (cyclic amides), prioritize hydrolysis of the lactam as the primary FGI transformation over modifications to secondary functional groups like phenolic OH. Lactam hydrolysis breaks the ring to form the corresponding amino acid precursor, which is a fundamental transformation for amide functional groups and often takes synthetic precedence.
[sai-00046] helpful=0 harmful=1 :: For pyrrole heterocycle formation with specific substitution patterns (non-fused or with alkyl substituents), prioritize Paal-Knorr synthesis using a 1,4-dicarbonyl compound and a primary amine. This approach is particularly appropriate when the target pyrrole lacks the fused benzene ring characteristic of indoles formed via Fischer indole synthesis.
[sai-00049] helpful=0 harmful=0 :: For 1,3-dioxolane rings attached to substituted aromatic systems, consider epoxidation and nucleophilic ring closure as a retrosynthetic strategy instead of ketalization. If the dioxolane ring contains a quaternary carbon and ether linkages (e.g., -OCH2- groups), disconnect to a styrene derivative precursor (with alkene C=C) and a peracid (e.g., mCPBA) for epoxidation, followed by intramolecular ring closure by the phenolic oxygen. Retain all aromatic substituents (e.g., methoxy, benzyloxy) in the precursors, as they are not protecting groups but integral to the structure.
[sai-00052] helpful=0 harmful=2 :: For amine synthesis via reduction, consider azide reduction as a primary FGI strategy when the target contains a free primary amine that could reasonably originate from an azide precursor. In retrosynthesis, convert the amine (N) back to an azide group (N=[N+]=[N-]) rather than assuming other transformations like protecting group removal. Azide reduction yields only primary amines (R-NH2), so this does not apply to secondary amines. It is particularly important when other functional groups (like Boc carbamates) are present and should remain intact.
[sai-00054] helpful=3 harmful=1 :: When performing retrosynthetic analysis for oxidation reactions (e.g., sulfide to sulfoxide, alcohol to aldehyde/ketone), always identify both the reduced form of the organic substrate AND the oxidizing reagent required. Oxidation is fundamentally a bimolecular process requiring an electron acceptor (common oxidants include mCPBA for sulfoxides, Cr(VI) reagents for alcohol oxidations). Return both precursors separated by a period in the SMILES output.
[sai-00055] helpful=3 harmful=2 :: For heteroatom alkylation retrosynthesis involving heterocycles like imidazoles, meticulously analyze the ring substitution pattern and atom ordering in SMILES. Verify that the unprotected precursor (e.g., secondary amine -NH- form) matches the target's substitution, paying close attention to the position of substituents relative to the nitrogen being alkylated. Incorrect atom ordering can lead to invalid precursors.
[sai-00057] helpful=2 harmful=0 :: In retrosynthetic analysis of molecules with alkynes attached to aromatic rings, always evaluate the possibility of a Sonogashira coupling disconnection. Look for a leaving group (e.g., bromine) on the aromatic ring in the precursor. The bond between the alkyne carbon and the aromatic carbon is a prime target for this palladium-catalyzed cross-coupling reaction between an aryl halide and a terminal alkyne.
[sai-00058] helpful=0 harmful=0 :: When disconnecting C-C bonds involving alcohols, rigorously verify the alcohol type (primary, secondary, tertiary). A tertiary alcohol (carbon with OH and three carbon substituents) arises from nucleophilic addition of a carbon nucleophile (e.g., an acetylide or Grignard reagent) to a ketone, whereas addition to an aldehyde gives a secondary alcohol. Misidentifying the alcohol type selects the wrong carbonyl precursor and leads to invalid retrosynthetic steps.
[sai-00061] helpful=3 harmful=2 :: When evaluating multiple functional groups for retrosynthetic disconnection, assess both synthetic accessibility and typical reaction timing. Highly reactive functionalities like chloromethyl groups (-CH2Cl) are typically introduced late in syntheses, via chlorination of hydroxymethyl groups (e.g., SOCl2) or radical chlorination of methyl groups (e.g., Cl2), and should take precedence over more stable groups like azides in FGA analysis. The heuristic 'azides are commonly added via SN2' should not override the principle that reactive, unstable functionalities are usually introduced later.
[sai-00063] helpful=0 harmful=0 :: When analyzing molecules with both alkyne-aryl bonds and biaryl bonds, prioritize disconnection of the biaryl bond via Suzuki-Miyaura coupling if one aromatic system contains a halogen leaving group (Cl, Br, I). Suzuki coupling is generally more robust for biaryl formations than Sonogashira for alkyne-aryl bonds, especially in complex systems with multiple substituents. Always scan aromatic rings for existing halogens to identify strategic disconnection sites.
[sai-00065] helpful=0 harmful=3 :: When performing retrosynthetic analysis for heterocycle formation reactions (e.g., Hantzsch thiazole synthesis, Paal-Knorr pyrrole synthesis), systematically map all substituents on the heterocyclic ring to their corresponding precursors. Each atom in the heterocycle originates from specific components in the forward synthesis, so every substituent must be accounted for in the appropriate precursor molecule. This prevents errors where substituents are incorrectly assumed to be 'already present' rather than being incorporated via one of the starting materials.
[sai-00066] helpful=0 harmful=2 :: For Hantzsch thiazole synthesis retrosynthesis, recognize that: (1) The carbon between nitrogen and sulfur (position 2), together with the nitrogen and sulfur atoms themselves, comes from the thioamide component, so any substituent at position 2 must be present on the thioamide carbon, (2) The carbon adjacent to nitrogen (position 4) originates from the carbonyl carbon of the α-halocarbonyl component, (3) The carbon adjacent to sulfur (position 5) originates from the halogen-bearing α-carbon of the α-halocarbonyl component. Always disconnect to a thioamide that provides the substituent for position 2 and an α-halocarbonyl that contains all substituents needed for positions 4 and 5.
[sai-00067] helpful=0 harmful=0 :: In deprotection retrosynthesis, the target molecule represents the deprotected state. Work backward by identifying functional groups that are typically protected (e.g., phenolic OH, alcohols, amines) and add the appropriate protecting group to create the precursor. For phenolic OH groups (Oc1...), the common protecting group is methyl ether, so the precursor should have COc1... (methoxy group).
[sai-00069] helpful=0 harmful=2 :: When generating SMILES for retrosynthesis precursors involving multiple components separated by periods, recognize that the order of reactants typically does not affect chemical correctness. For heteroatom alkylation and other bimolecular reactions, focus on verifying the chemical connectivity and functional groups of each component rather than strictly matching the ordering convention used in ground truth answers, as different ordering represents the same chemical system.
[sai-00070] helpful=0 harmful=0 :: When evaluating SMILES representations for heterocyclic amines, recognize that different explicit-hydrogen notations (e.g., 'nn([H])' vs 'n[nH]') represent the same compound, and that annular tautomers (N-H on either ring nitrogen) are usually treated as equivalent. The key principle is to focus on chemical connectivity and functional groups rather than superficial string differences, as these forms are functionally equivalent in retrosynthetic analysis.
[sai-00071] helpful=3 harmful=0 :: For molecules containing vinyl groups (C=C) attached to aromatic rings, prioritize Wittig reaction disconnection across the C=C double bond. The precursors are an aromatic aldehyde (with other functional groups like acetals intact) and a phosphonium ylide (typically represented by its phosphonium salt precursor). This disconnection takes precedence over protecting group manipulations when the vinyl group is the key C-C bond formed in the synthesis.
[sai-00074] helpful=1 harmful=1 :: When analyzing tetrazole-containing molecules, recognize that tert-butyl groups (CC(C)(C)) attached to the tetrazole nitrogen are typically permanent substituents, not Boc protecting groups. Tetrazoles commonly feature tert-butyl as a stable substituent to modulate properties like lipophilicity and metabolic stability, unlike Boc groups which are temporary protections for amines.
[sai-00075] helpful=1 harmful=0 :: In deprotection retrosynthesis, prioritize identifying the most acidic or reactive functionality that would require protection. Carboxylic acids (C(=O)O) are commonly protected as methyl esters (COC(=O)) in precursors, which are removed by hydrolysis (typically basic saponification, e.g., LiOH or NaOH). This takes precedence over analyzing tert-butyl groups that may be permanent substituents rather than protecting groups.
[sai-00078] helpful=0 harmful=0 :: When analyzing carboxylic acid deprotection retrosynthesis, carefully parse the SMILES notation to separate the acid's carbon skeleton from the ester alkyl group. In CC(C)C(=O)O (2-methylpropanoic acid), the alpha-carbon bears two methyl groups and one hydrogen, so it is a tertiary (methine) carbon, not a quaternary one; C(C)C(=O)O denotes an alpha-carbon with a single methyl group. The alpha-carbon substitution pattern belongs to the acid and must be unchanged in the precursor. The protecting group is identified by the alkyl group on the ester oxygen: C(=O)OC(C)(C)C for tert-butyl, C(=O)OCC for ethyl, C(=O)OC for methyl. Accurate parsing prevents misidentification of ester protecting groups.
[sai-00081] helpful=3 harmful=0 :: When selecting acylating agents for complex amine substrates with sensitive functional groups or stereocenters, prefer anhydrides (e.g., acetic anhydride) over acid chlorides. While acid chlorides are more reactive, anhydrides offer better selectivity and milder conditions that minimize side reactions with delicate molecular frameworks.
[sai-00082] helpful=0 harmful=3 :: Do not strip stereochemistry from carbons bearing a primary amine in retrosynthetic precursors. Rapid nitrogen inversion means the nitrogen itself is not a stable stereocenter, but it does not affect an adjacent carbon stereocenter. Keep the stereodescriptor whenever the carbon carries four different substituents (e.g., C[C@H](N)C(=O)O), and use achiral notation only when it does not (e.g., CC(C)N).
[sai-00084] helpful=0 harmful=0 :: In Functional Group Interconversion (FGI) retrosynthesis, carefully evaluate which specific functional groups actually require transformation. Not all functional groups in the target molecule need modification; some may already be in their desired synthetic state. Always validate whether a functional group conversion is necessary based on the specific FGI context before applying any transformation.
[sai-00085] helpful=0 harmful=0 :: When interpreting carboxylic acid groups in SMILES notation, recognize that the group is not always written as 'C(=O)O'; depending on atom ordering it may appear as 'OC(=O)...' or 'O=C(O)...', particularly when it starts a string or sits in a branch of a complex ring system. A bare 'CO' or 'COC' pattern without a carbonyl on the same carbon is an alcohol or ether, not a carboxylic acid. Carefully analyze atom connectivity to distinguish carboxylic acids from esters: esters carry a carbon on the acyl oxygen (e.g., 'COC(=O)'), while in carboxylic acids that oxygen is terminal.
[sai-00086] helpful=4 harmful=8 :: In Functional Group Addition (FGA) retrosynthesis, systematically identify ALL functional groups that were likely added in the forward synthesis, not just the most obvious one. For each added group (e.g., benzylic bromides, cyano groups), determine both the precursor molecule before addition AND the specific reagent required. Common transformations like benzylic bromination require NBS as the brominating agent, while cyano groups typically require specific cyanation reagents. The precursor should represent the molecule before functional group installation, and the reagent should be included as a separate component in the SMILES output.
[sai-00087] helpful=1 harmful=3 :: When performing FGA retrosynthesis for molecules with multiple added functional groups, analyze the synthetic timing and compatibility. Highly reactive groups like benzyl bromides are typically introduced late in syntheses via specific reagents (e.g., NBS for benzylic bromination), while more stable groups like cyano substituents may be installed earlier. The retrosynthetic analysis should reflect this logical sequence by identifying the appropriate precursor states and reagents for each transformation.
[sai-00088] helpful=5 harmful=1 :: For common reagents like N-bromosuccinimide (NBS), always use the standard SMILES representation (O=C1CCC(=O)N1Br) rather than attempting to deconstruct or build them from components. NBS is a cyclic imide where bromine is attached to the nitrogen atom in a five-membered succinimide ring. This ensures chemical accuracy and consistency with ground truth representations.
[sai-00089] helpful=2 harmful=3 :: In oxidation retrosynthesis, systematically analyze each oxidized functional group to determine if it was likely formed in the specified oxidation step. Not all highly oxidized groups (sulfones, nitro groups, etc.) are necessarily products of the immediate reaction - some may be pre-existing in the precursor. Focus on identifying the specific oxidation transformation described (e.g., alcohol to ketone, sulfide to sulfoxide) and only apply the reverse reduction to those groups. Other oxidized functionalities should remain unchanged in the precursor.
[sai-00091] helpful=5 harmful=4 :: In retrosynthesis tasks, distinguish between precursor reactants (organic substrates) and reagents. For oxidation reactions, provide only the reduced form of the organic molecule as the precursor. Oxidizing agents (e.g., Dess-Martin periodinane, mCPBA) are reagents used in forward synthesis but are not considered 'precursor reactants' in retrosynthesis outputs. This convention applies specifically when the task asks for 'precursor reactants' for a 'single-step retrosynthesis reaction.'
[sai-00093] helpful=3 harmful=0 :: For beta-ketoamide disconnections (NC(=O)CC(=O)R), avoid using acid chlorides as acylating agents due to the instability of beta-keto acid chlorides which are prone to enolization and decomposition. Instead, use beta-keto esters protected with tert-butyl groups (OC(C)(C)C) as stable synthetic equivalents. The tert-butyl ester provides both activation compatibility and prevents decarboxylation during handling and storage.
[sai-00095] helpful=0 harmful=0 :: When interpreting complex fused heterocycles in SMILES notation, systematically validate ring closure patterns and atom connectivity. For benzimidazolinone derivatives, carefully distinguish between standard benzimidazol-2-one (O=C1NC2=CC=CC=C2N1, five-membered urea ring) and ring-expanded analogues containing a CH2 group (e.g., 3,4-dihydroquinoxalin-2(1H)-one, O=C1CNc2ccccc2N1, six-membered ring). Pay close attention to 'CC(=O)N' patterns in SMILES, which may indicate a methylene group inside the ring rather than a direct ring fusion.
[sai-00096] helpful=0 harmful=0 :: For acylation disconnections, recognize that carboxylic acids (C(=O)O) can serve as valid acylating precursors when appropriate coupling agents (e.g., DCC, HATU, EDC) are used in the forward synthesis. While acid chlorides are common, carboxylic acids are frequently employed in modern peptide coupling and amide formation strategies, especially with sensitive functional groups present.
[sai-00099] helpful=0 harmful=1 :: In reduction retrosynthesis, carefully distinguish between aromatic nitrogen atoms (represented by lowercase 'n' in SMILES) and amine functional groups. Aromatic nitrogens are part of ring systems and should not be modified in reduction transformations, while amines (typically from nitro reduction) should be converted back to nitro groups. Always verify the reduction site by identifying alcohol groups (CCO) that likely originated from carbonyl precursors (CC(=O)OC for esters, CC(=O)O for carboxylic acids), not by modifying aromatic heterocyclic systems.
[sai-00101] helpful=0 harmful=0 :: When disconnecting ether linkages in molecules with multiple oxygen nucleophiles, systematically evaluate nucleophilicity hierarchy: aliphatic OH > heterocyclic OH > phenolic OH. Phenolic oxygens are particularly poor nucleophiles when sterically hindered or part of electron-rich aromatic systems with other substituents. Prioritize disconnection at the most nucleophilic oxygen site, which is typically the heterocyclic hydroxy group over phenolic hydroxy in complex systems.
[sai-00102] helpful=0 harmful=2 :: For ether disconnections involving aromatic systems, carefully evaluate benzylic positions in electrophilic components. Benzylic halides (especially bromides) are highly reactive electrophiles due to adjacent aromatic stabilization of the transition state. When a leaving group is located on a benzylic carbon, this indicates a preferred site for SN2 reactions and should be prioritized in retrosynthetic analysis over less reactive alkyl halides.
[sai-00104] helpful=0 harmful=0 :: For oxidation retrosynthesis targeting aromatic aldehydes, specifically convert the aldehyde group (-CHO) to a primary alcohol (-CH2OH) while maintaining the aromatic ring structure. This applies when the aldehyde is directly attached to an aromatic carbon, as aromatic aldehydes are typically synthesized from aromatic primary alcohols through oxidation. Verify that only the organic precursor is output, excluding oxidizing agents from the SMILES string.
[sai-00105] helpful=0 harmful=0 :: For primary amide retrosynthesis (where nitrogen is attached to two hydrogens, NC(=O)), prioritize disconnection to an ester precursor (C(=O)OC) and ammonia (N) rather than carboxylic acid (C(=O)O). Primary amides are commonly synthesized via ammonolysis of esters, making this the synthetically relevant pathway. Verify the amide type by checking nitrogen connectivity - primary amides have NH2 group while secondary amides have NHR.
[sai-00106] helpful=0 harmful=0 :: When generating SMILES for Boc deprotection precursors in fused heterocyclic systems (e.g., indole-pyrrolidine fused rings), validate the hydrogen count on the nitrogen atom. The unprotected amine must have an explicit hydrogen (e.g., 'N[H]' or 'n1CC[nH]2...') for secondary amines. Avoid simply removing the Boc group from the target SMILES; instead, reconstruct the precursor SMILES to ensure correct ring connectivity and atom ordering, as plain string manipulation of SMILES may not preserve hydrogen counts.
[sai-00108] helpful=0 harmful=2 :: For oxidation retrosynthesis of aldehydes in complex fused ring systems (bicyclic, tricyclic, etc.), apply the same transformation as for aromatic aldehydes: convert the carbonyl group (-CHO) to a primary alcohol (-CH2OH) while preserving the entire ring structure and connectivity. This universal approach works for both aliphatic and aromatic aldehydes, ensuring the organic precursor maintains the same carbon skeleton without oxidizing agents.
[sai-00109] helpful=0 harmful=0 :: For tetrazole synthesis via [3+2] cycloaddition between azide and nitrile, the nitrile component must contain the entire R group that will be attached to the tetrazole carbon (C5) in the final product. The azide component is typically the simple azide ion ([N-]=[N+]=[N-]) or sodium azide. This disconnection is fundamental: the nitrile carbon becomes the ring carbon of the tetrazole, the nitrile nitrogen becomes one ring nitrogen, and the azide provides the other three nitrogen atoms.
[sai-00111] helpful=0 harmful=0 :: For reduction retrosynthesis, systematically identify all functional groups that are commonly formed through reduction (primary amines from nitro groups, alcohols from carbonyls, alkyl chains from alkenes) and apply the reverse transformation to create precursors. Start by converting amines back to nitro groups (N to [N+](=O)[O-]), alcohols to carbonyls, and saturated carbons to alkenes, while preserving non-reducible functional groups and aromatic systems.
[sai-00112] helpful=0 harmful=0 :: When generating SMILES for phosphonate precursors containing alkenes, meticulously preserve the alkene stereochemistry using directional bond notation (/C=C/ for trans, \C=C/ for cis). Check which carbon bears the phosphorus atom: Horner-Wadsworth-Emmons-type reagents carry phosphorus on an sp3 carbon (-CH2-P(=O)), whereas only true vinylphosphonates have phosphorus directly on an alkene carbon (-CH=CH-P(=O)). Validate atom connectivity by comparing carbon skeleton preservation between target and precursor.
[sai-00114] helpful=0 harmful=0 :: When performing reduction retrosynthesis, systematically compare the saturation states between target and precursor carbon chains. Look for fully saturated segments (C-C) in the target that likely originated from hydrogenation of alkenes (C=C) in the precursor. Simultaneously verify protecting group stability: silyl ethers (CO[Si]) typically remain intact under hydrogenation conditions and should be preserved in both target and precursor. This holistic comparison prevents misidentification of reduction sites.
[sai-00115] helpful=1 harmful=0 :: For oxime groups attached to aromatic rings (e.g., c1cc(C=NO)ccc1), the carbonyl precursor is typically an aromatic aldehyde (c1cc(C=O)ccc1) rather than a ketone. Aromatic aldehydes are common precursors for oxime formation due to their accessibility and reactivity. Always verify the aromatic ring connectivity is preserved in the aldehyde precursor.
[sai-00116] helpful=0 harmful=0 :: In retrosynthetic analysis, always prioritize disconnection to the simplest, most fundamental precursors rather than synthetic intermediates. For amide bonds formed via acylation, the fundamental disconnection is amine + carboxylic acid, not amine + activated acylating agent (e.g., acid chloride). While acid chlorides are commonly used in forward synthesis, the carboxylic acid is the appropriate retrosynthetic precursor that would be activated in the synthetic pathway.
[sai-00117] helpful=0 harmful=0 :: In Functional Group Addition (FGA) retrosynthesis, prioritize validating the chemical transformation logic (e.g., correct identification of the precursor state and reagent like NBS for bromination) over exact SMILES string matching with ground truth. Multiple valid SMILES representations exist for the same molecule due to different atom ordering or notation conventions. The precursor molecule's connectivity and functional groups must be chemically equivalent to the expected answer, even if the SMILES string differs superficially.
[sai-00118] helpful=0 harmful=1 :: When analyzing Boc protection in molecules with multiple nitrogen-containing functional groups, carefully distinguish between amide nitrogens (including lactams) and free amine nitrogens. Amide nitrogens (part of O=C-NR) are already acylated and far less nucleophilic, so they are not Boc-protected under standard conditions; by default, assign Boc protection to free primary or secondary amines. Identify the exact site of protection by locating the carbamate group [C(=O)OC(C)(C)C]-N- in the target and ensure the unprotected precursor has a free NH or NH2 group at that specific position.
[sai-00120] helpful=0 harmful=0 :: In functional group interconversion (FGI) retrosynthesis, always preserve the entire molecular scaffold unchanged and only modify the specific functional group being transformed. For transformations like nitro-to-amine reduction reversal, convert the functional group (e.g., -NH2 to -NO2) while maintaining identical ring systems, connectivity, and substituent patterns. Validate that the precursor SMILES differs from the target only at the functional group site, not in the core structure.
[sai-00121] helpful=0 harmful=0 :: For piperazine rings in deprotection retrosynthesis, meticulously analyze both nitrogen atoms: one is typically tertiary (amide nitrogen if part of O=C-N- linkage) and the other may be tertiary (alkyl-substituted) or secondary (-NH-). Only a secondary amine (N-H) can carry a Boc group in the precursor; a tertiary amine has no N-H to protect. Verify the degree of substitution by counting the heavy-atom neighbors of each nitrogen in the SMILES: a tertiary nitrogen has three, while a secondary amine has two and carries one hydrogen, which is usually implicit (N) and only sometimes written explicitly ([NH]). The [nH] notation applies only to aromatic nitrogens.
[sai-00124] helpful=0 harmful=0 :: For ester retrosynthesis, meticulously analyze the carbon atom directly attached to the ester oxygen to determine the alcohol precursor's structure. If the carbon is methine (CH, secondary carbon), the alcohol precursor must be a secondary alcohol; if methylene (CH2, primary carbon), it must be a primary alcohol. Trace atom connectivity carefully to avoid adding extraneous carbon atoms or misidentifying hybridization.
[sai-00126] helpful=3 harmful=1 :: For heteroatom alkylation retrosynthesis, systematically evaluate all nitrogen atoms to identify N-alkyl bonds formed by nucleophilic substitution. Prioritize disconnection at tertiary amine nitrogens (formed by alkylation of secondary amines) when the target contains potential leaving groups (Cl, Br, I) on the alkyl chain attached to nitrogen. The electrophilic precursor should retain the leaving group, while the nucleophilic precursor should be the corresponding secondary amine. This approach takes precedence over N-aryl bond formation via SNAr unless clear activating groups (e.g., nitro, cyano) are present on the aryl ring.
[sai-00129] helpful=0 harmful=2 :: In Wittig retrosynthesis, represent phosphonium salts as ionic compounds (e.g., C[P+](c1ccccc1)(c1ccccc1)c1ccccc1) rather than decomposing them into alkyl halide + triphenylphosphine components. The phosphonium salt is the immediate precursor to the ylide and should be shown intact in retrosynthetic analysis, maintaining proper atom connectivity and charge representation.
[sai-00131] helpful=2 harmful=0 :: In Sonogashira coupling retrosynthesis, always disconnect the bond between the aromatic ring and alkyne carbon to yield a terminal alkyne precursor and an aryl halide precursor. Prioritize disconnection such that the terminal alkyne comes from the simplest, most commercially available compound. For heteroaromatic systems like pyridines, prefer iodine over chlorine as the leaving group due to better reactivity in palladium-catalyzed couplings.
[sai-00133] helpful=0 harmful=0 :: In reduction retrosynthesis, systematically evaluate all oxygen atoms to identify potential protective groups. Phenolic OH groups are commonly protected as esters (especially benzoates OC(=O)c1ccccc1) that can be reduced to reveal the phenol. This takes precedence over reducing ether linkages, which are typically stable under reduction conditions. Always check if a phenolic oxygen in the target might originate from ester reduction rather than assuming ether modification.
[sai-00136] helpful=0 harmful=0 :: For fused heterocyclic systems containing a hydrazine-derived nitrogen bridge (N connecting aromatic and heterocyclic rings), prioritize retrosynthetic disconnection to a phenylhydrazine precursor (aromatic ring with NN group) and a carbonyl compound (ketone or aldehyde). This approach is characteristic of Fischer indole-type syntheses and takes precedence over Paal-Knorr pyrrole synthesis when the target shows indole-like fusion patterns with the hydrazine nitrogen serving as the bridging atom.
[sai-00138] helpful=0 harmful=1 :: For ether formation via heteroatom alkylation involving phenolic compounds, correctly identify the phenolic oxygen (when deprotonated to phenoxide) as the nucleophile and an alkyl halide as the electrophile. The leaving group (typically bromide) should be on the alkyl chain, not on the aromatic ring. This standard SN2 approach takes precedence over nucleophilic aromatic substitution (SNAr) unless strong electron-withdrawing groups are ortho/para to the aromatic halide.
[sai-00140] helpful=0 harmful=0 :: For Functional Group Interconversion (FGI) of lactams, avoid hydrolysis disconnections that break C-N bonds. Instead, consider transformations that preserve the carbon skeleton, such as reduction of the carbonyl to an alcohol or amine, or formation from a hydroxy compound via dehydration (e.g., hemiaminal to lactam). FGI should only modify the functional group without altering core connectivity.
[sai-00001] helpful=0 harmful=0 :: When analyzing nitrogen-containing heterocycles, systematically verify the fusion pattern before selecting retrosynthetic approach. For true indole systems (benzopyrrole with nitrogen fused directly to benzene ring), use Fischer indole synthesis with phenylhydrazine and carbonyl precursors. For pyrrole rings connected to aromatic systems via single bonds (not fused), use Paal-Knorr synthesis with 1,4-dicarbonyl compounds and primary amines. Always check if the nitrogen is part of a fused ring system or connected via a single bond to determine the appropriate synthetic route.
[sai-00002] helpful=0 harmful=0 :: When analyzing heterocycle formation involving quaternary carbons (e.g., in 1,3-dioxolane rings), rigorously verify the alkene precursor's substitution pattern. A quaternary carbon in the cyclic structure requires an alkene carbon that already bears two carbon substituents, i.e. a 1,1-disubstituted or trisubstituted alkene where one substituent is typically methyl (C=C(C)), to generate the quaternary center upon ring closure. This takes precedence over assuming unsubstituted styrene derivatives, which cannot form quaternary carbons.
[sai-00004] helpful=11 harmful=5 :: When performing FGI retrosynthesis, systematically evaluate ALL functional groups in the target molecule for potential transformations, not just the most prominent ones. Create a mental checklist: amines (could come from azide reduction, nitro reduction, or deprotection), carbonyls (could come from alcohol oxidation), alcohols (could come from carbonyl reduction), etc. This prevents fixation on a single transformation type like protecting group removal when other FGIs are more appropriate.
[sai-00005] helpful=0 harmful=1 :: For amine functional groups in retrosynthesis, specifically consider azide reduction ([N-]=[N+]=N to NH2) as a primary FGI option when other protecting groups (like Boc) are present and should remain intact. Azide reduction is a common transformation for introducing primary amines and should be prioritized over protecting group removal when the amine could reasonably originate from an azide precursor.
[sai-00007] helpful=1 harmful=1 :: In retrosynthetic analysis, the term 'precursor reactants' refers to all molecules that are consumed in the forward reaction to form the target. For oxidation reactions, this includes both the reduced form of the organic substrate and the oxidizing agent (e.g., mCPBA, CrO3, H2O2). Both must be included in the output SMILES, separated by a period. This principle applies universally to any transformation where a reagent is stoichiometrically consumed, not just oxidations.
[sai-00008] helpful=0 harmful=0 :: When generating SMILES for heterocyclic precursors in retrosynthesis (e.g., for heteroatom alkylation), preserve the exact ring atom ordering from the target molecule. Only modify the specific atom involved in the transformation (e.g., change alkylated nitrogen 'n1' to protonated nitrogen '[nH]1') without rearranging other ring atoms or substituents. This ensures the substitution pattern remains identical between target and precursor.
[sai-00010] helpful=1 harmful=0 :: For terminal alkynes in SMILES notation, use 'C#C' without an explicit hydrogen prefix. The terminal hydrogen is implied by the implicit-valence rules of SMILES. Representations like 'HC#C' are invalid SMILES, because a hydrogen atom may only be written inside brackets (e.g., '[H]C#C'). This is particularly important in Sonogashira coupling precursors where the terminal alkyne component must be correctly represented.
[sai-00011] helpful=6 harmful=0 :: In Functional Group Addition (FGA) retrosynthesis for single-step transformations like halogenation or oxidation, output only the SMILES of the organic precursor molecule(s) that directly lead to the target upon functional group addition. Reagents (e.g., Cl2 for chlorination, NBS for bromination) are excluded unless the reaction type inherently requires multiple organic precursor components (e.g., nucleophile-electrophile pairs in substitution or coupling reactions). This avoids confusion between retrosynthetic precursors and forward synthesis reagents.
[sai-00012] helpful=0 harmful=0 :: In retrosynthetic analysis of complex molecules with multiple C-C bonds, implement a hierarchical disconnection approach: prioritize simpler, more accessible bonds (like sp3-sp2 alkyl-aryl connections formed via alkylation) before complex bond formations (like biaryl couplings). The methylene bridge between heterocyclic systems and aromatic rings often represents a simpler C-C bond that should be addressed before strategic disconnections like Suzuki couplings, as alkylation reactions are typically more straightforward and robust than complex cross-couplings in multi-step syntheses.
[sai-00014] helpful=0 harmful=0 :: For Hantzsch thiazole synthesis retrosynthesis with aryl substituents at position 5, ensure the aryl group is part of the α-halocarbonyl precursor (e.g., an α-halo-β-ketoester like ethyl 2-bromo-3-arylacetoacetate) rather than the thioamide. The thioamide should be a simple species like thioacetamide (CC(N)=S), which provides the methyl group at position 2. This prevents misassignment of substituents and aligns with the mechanism where the carbonyl carbon of the α-halocarbonyl becomes position 4 and its halogen-bearing alpha carbon becomes position 5.
[sai-00015] helpful=1 harmful=0 :: For heteroatom alkylation retrosynthesis, identify tertiary amine nitrogens as potential disconnection sites where the alkyl group can be removed to reveal a secondary amine nucleophile. The alkylating agent (typically an alkyl halide) serves as the electrophilic partner. The order of reactants in SMILES output doesn't affect chemical meaning as long as the components are correct.
[sai-00017] helpful=0 harmful=0 :: When analyzing molecules with both vinyl groups and heterocyclic rings bearing alkyl substituents, prioritize disconnection at the alkyl-heterocycle junction over vinyl-aromatic bonds. Substituents on heterocyclic rings (like ethyl groups on dioxolanes) often represent strategic C-C bond formation sites via reactions like Wittig, while vinyl groups may be intrinsic to the scaffold.
[sai-00001] helpful=0 harmful=0 :: When interpreting SMILES strings for retrosynthesis, systematically parse atom connectivity to verify the exact attachment point of functional groups before applying transformations. For carboxylic acids, identify whether the C(=O)O group is attached directly to a ring carbon or to a carbon in a side chain by tracing bonds: count atoms between the carbonyl carbon and ring systems. This prevents misapplication of protection/deprotection heuristics to incorrect molecular positions.
[sai-00002] helpful=0 harmful=0 :: When parsing carboxylic acid SMILES notation, systematically analyze the alpha-carbon substitution pattern: CC(C(=O)O) indicates a carbon atom with one methyl substituent (secondary carbon, isopropyl-like), while CC(C)(C)(C(=O)O) would indicate a carbon with three methyl substituents (quaternary carbon, tert-butyl-like). For secondary alpha-carbons, use methyl or ethyl ester protection; for quaternary alpha-carbons, use tert-butyl ester protection.
[sai-00004] helpful=9 harmful=5 :: For acylation disconnections in complex molecules with sensitive functional groups (e.g., Boc-protected amines) and stereocenters, prefer anhydride acylating agents (e.g., acetic anhydride CC(=O)OC(C)=O) over carboxylic acids. While carboxylic acids can work with coupling agents, anhydrides provide better control, minimize racemization risk, and ensure efficient reaction under milder conditions that preserve delicate molecular frameworks.
[sai-00006] helpful=0 harmful=1 :: When performing carboxylic acid deprotection retrosynthesis, systematically compare ethyl ester (CCOC(=O)) and methyl ester (COC(=O)) protections as possible precursors. Common carboxylic acid deprotections can involve either ester type, so do not assume methyl ester by default. Verify the carbon count in the ester alkyl group by carefully parsing the alkyl chain attached to the ester oxygen, on the side opposite the carbonyl carbon.
[sai-00007] helpful=1 harmful=1 :: In deprotection retrosynthesis, carefully analyze which specific functional group transformation occurred by comparing target and precursor structures. For carboxylic acid deprotection, the target should have CC(=O)O while the precursor has CCOC(=O) or COC(=O). For Boc deprotection, the target should have NH while the precursor has NC(=O)OC(C)(C)C. Avoid applying deprotection to groups that remain unchanged between target and precursor.
[sai-00009] helpful=2 harmful=1 :: In oxidation retrosynthesis, systematically evaluate alcohol types before applying oxidation: tertiary alcohols (C attached to OH and three carbons) cannot be oxidized to carbonyls and should be preserved; secondary alcohols (C attached to OH and two carbons) can be oxidized to ketones but only if context supports it (e.g., no competing groups); primary alcohols can be oxidized to aldehydes/carboxylic acids. Prioritize oxidation at sites that are synthetically feasible and align with the reaction context—avoid oxidizing secondary alcohols when the ground truth indicates preservation or when tertiary alcohols are present and might be the intended site for other transformations.
[sai-00010] helpful=1 harmful=0 :: When evaluating SMILES representations for aromatic compounds with multiple substituents, recognize that different atom ordering in the ring notation (e.g., 'c1ccc(C)cc1Br' vs 'Cc1cccc(Br)c1', both 3-bromotoluene) can represent chemically identical molecules. Similarly, functional groups like cyano can be written as N#C- or C#N. Focus on verifying the chemical connectivity and substituent positions rather than requiring exact string matching, especially for retrosynthesis tasks where the logical identification of precursors is paramount.
[sai-00011] helpful=5 harmful=0 :: In Functional Group Addition (FGA) retrosynthesis, differentiate between transformations where the reagent is catalytic/excluded (e.g., oxidation with CrO3 where only the reduced organic substrate is output) and transformations where the reagent is a stoichiometric reactant that must be included as a separate precursor. For halogenation with NBS, the precursors are the organic substrate (with allylic hydrogen instead of halogen) AND NBS (O=C1CCC(=O)N1Br), as NBS is consumed stoichiometrically. This applies to other reagent-involved FGAs like diazonium salt formations or Wittig reactions where the reagent is an integral reactant.
[sai-00012] helpful=0 harmful=0 :: In oxidation retrosynthesis, avoid assuming sulfone (S(=O)(=O)) or nitro ([N+](=O)[O-]) groups are necessarily the oxidation site simply because they appear highly oxidized. These groups are often pre-existing and stable. Instead, prioritize ketones (C(=O)) as common oxidation sites from secondary alcohols, especially when context clues like dioxolane protecting groups (e.g., the ketal ring C1OCCO1) are present, which often protect ketones generated in recent steps.
[sai-00001] helpful=0 harmful=0 :: When performing oxidation retrosynthesis, first identify which functional groups in the target molecule are typical oxidation products (aldehydes, carboxylic acids, ketones, sulfoxides, sulfones, etc.), then systematically work backward to their reduced precursors. This strategic approach helps distinguish between groups formed via oxidation versus pre-existing oxidized functionalities, ensuring only the appropriate transformations are applied.
[sai-00002] helpful=0 harmful=0 :: For beta-ketoamide disconnections (NC(=O)CC(=O)R), always prefer tert-butyl esters (OC(C)(C)C) over ethyl or methyl esters due to the superior stability of tert-butyl beta-keto esters. Tert-butyl esters prevent decarboxylation and enolization issues that plague other ester types in beta-keto systems, making them the synthetically preferred choice.
[sai-00004] helpful=8 harmful=4 :: For fused heterocyclic systems like benzimidazolones in retrosynthesis, prioritize disconnections that form the heterocyclic ring from simpler linear precursors rather than treating the entire heterocycle as a single unit. Specifically for benzimidazol-2-one systems (O=C1NC2=CC=CC=C2N1), disconnect the internal amide bond to reveal the carboxylic acid and o-phenylenediamine derivative precursors that would cyclize to form the heterocycle. This approach takes precedence over disconnecting amide bonds that connect the heterocycle to other molecular fragments.
[sai-00006] helpful=0 harmful=1 :: When analyzing reduction reactions in retrosynthesis, carefully examine terminal hydroxyalkyl groups (especially CH2OH termini on chains attached to heteroatoms like nitrogen) as potential reduction products of ester groups. In medicinal chemistry, esters (e.g., CC(=O)OC) are commonly reduced to primary alcohols (CCO) to modify solubility and properties, while aromatic ketones often remain unchanged as structural features. Prioritize converting terminal primary alcohols back to their ester precursors when they appear in complex molecules with multiple reducible functionalities.
[sai-00007] helpful=1 harmful=1 :: For heteroatom alkylation retrosynthesis involving benzyl ether linkages (O-CH2-Ar), prioritize disconnection at the benzyl ether over N-alkylation of stable heterocyclic substituents like tert-butyl groups. Benzyl ether formation is typically a later synthetic step using reactive electrophiles (e.g., benzyl bromides), while groups like tert-butyl on nitrogen are often pre-existing stable substituents introduced in earlier steps and remain intact during ether formation.
[sai-00009] helpful=2 harmful=0 :: For amides attached to heteroatoms (e.g., N-alkyl amides where the nitrogen is part of a ring or substituent), preserve the heteroatom-containing group as the nucleophile precursor in retrosynthesis. Only convert the carbonyl component to an ester electrophile. The heteroatom group (like N-alkyl) should remain intact as it represents the amine nucleophile that would attack the ester in forward synthesis.
[sai-00011] helpful=5 harmful=0 :: For secondary amines in SMILES notation, especially in ring systems, always ensure explicit hydrogen notation is used (e.g., [H]N or canonical forms like ...NC... for ring nitrogen) to avoid ambiguity. Use standard canonical SMILES representations for molecules, as different notations may be chemically equivalent but explicit forms prevent misinterpretation.
[sai-00012] helpful=0 harmful=0 :: In retrosynthesis outputs, represent ionic reactants (e.g., azide, cyanide, hydroxide ions) without their counterions unless specifically required. Focus on the reactive species that participate in bond formation. For example, use [N-]=[N+]=[N-] for azide rather than [N-]=[N+]=[N-].[Na+] for sodium azide. This convention emphasizes functional group participation over salt representation.
[sai-00013] helpful=0 harmful=0 :: In nitro-to-amine reduction retrosynthesis, preserve all aromatic substituents and ring systems unchanged, including ether linkages (diaryl ethers), halogens (Cl, Br), and alkoxy groups (OCH3). These functional groups are stable under standard reduction conditions (e.g., catalytic hydrogenation, SnCl2/HCl) and should remain intact in both target and precursor molecules.
[sai-00014] helpful=0 harmful=0 :: For phosphonate esters with alkenyl substituents (vinylphosphonates), carefully analyze SMILES connectivity to ensure phosphorus is directly bonded to the alkene carbon. In patterns like 'COP(=O)(/C=C/CBr)OC', the '/C=C/CBr' branch is an alkenyl group whose first alkene carbon is bonded directly to phosphorus and whose bromine sits on the allylic CH2 (P-C=C-CH2Br), not on an alkene carbon. The correct precursor maintains this direct P-C bond while removing the added functional group (bromine) and preserving stereochemistry (/C=C/ for trans).
[sai-00016] helpful=0 harmful=0 :: For reduction retrosynthesis of molecules with ester side chains, specifically check for α,β-unsaturated ester systems where the double bond (indicated by /C=C/ notation in SMILES) would be reduced to a single bond. The precursor should contain the unsaturated ester (e.g., CCCCOC(=O)/C=C/...), while stable protecting groups like silyl ethers remain unchanged.
[sai-00018] helpful=0 harmful=0 :: In Functional Group Addition (FGA) retrosynthesis involving stoichiometric reagents (e.g., NBS for benzylic bromination), the precursor SMILES must include both the organic substrate (with the added group reverted to its precursor state, such as CBr to CH) and the reagent (e.g., O=C1CCC(=O)N1Br for NBS). Different SMILES representations for the same molecule (due to atom ordering, chiral notation, or functional group representation) are chemically equivalent and acceptable, as structural connectivity and functional groups determine validity, not exact string matching.
[sai-00019] helpful=0 harmful=0 :: For fused ring systems in SMILES notation, ensure ring closure digits are placed immediately after the atom they connect to. For lactams, use 'O=C1N...cc21' where the '1' connects the aromatic carbon directly to the carbonyl carbon of the lactam. Avoid explicit notation like 'cc2C1=O' which creates malformed structures by duplicating atom labels and breaking proper ring connectivity.
[sai-00021] helpful=0 harmful=0 :: When the predicted precursor matches the ground truth and the reasoning is chemically sound (e.g., correctly reversing a reduction like amine to nitro), validate the reasoning rather than seeking errors. This is especially important for common transformations like nitro-to-amine reduction in aromatic systems.
[sai-00023] helpful=0 harmful=2 :: When analyzing phenolic OH protection patterns in SMILES notation, recognize that '-c2ccccc2OCc2ccccc2' indicates benzyl ether protection (O-benzyl), not methyl ether protection. Benzyl protection is common for phenols and is removed by hydrogenolysis. Avoid defaulting to methyl protection assumptions; carefully parse the SMILES string to identify the specific protecting group used based on the pattern after the phenolic oxygen atom.
[sai-00025] helpful=0 harmful=0 :: For ester retrosynthesis via acylation involving complex alcohol components with multiple aromatic systems (e.g., dibenzofuran groups), ensure the alcohol precursor preserves the complete aromatic scaffolding intact. When generating SMILES, maintain all aromatic ring systems and connectivity patterns unchanged from the target molecule, focusing only on converting the ester bond to the alcohol functional group while ignoring stereochemistry and atom mapping as per standard constraints.
[sai-00026] helpful=0 harmful=0 :: In heteroatom alkylation/arylation contexts, prioritize disconnection of amide bonds (C(=O)N) over N-aryl bonds when both are present. Amide bonds are typically formed via acylation reactions (amine + acyl chloride/carboxylic acid derivative) and represent more strategic disconnection points than N-aryl bonds formed via nucleophilic substitution. Verify the amine component's exact structure (e.g., 2-methylpyrrolidine CC1CCCN1 vs piperidine C1CCCNC1) to ensure precursor accuracy.
[sai-00029] helpful=0 harmful=0 :: In Wittig retrosynthesis, disconnect the C=C bond: one alkene carbon corresponds to the carbonyl carbon of the aldehyde precursor, and the other alkene carbon, with its substituents, becomes the ylide carbon of the phosphonium salt. Always ensure the aldehyde contains every group attached to its alkene carbon, while the phosphonium salt contains every group attached to the other alkene carbon.
[sai-00031] helpful=1 harmful=0 :: In Sonogashira coupling, always prioritize iodine (I) or bromine (Br) as the leaving group over chlorine (Cl) unless specific reaction conditions (e.g., specialized catalysts or ligands) are known to favor chlorine. The reactivity order I > Br > Cl > F is fundamental for aryl halides in palladium-catalyzed cross-couplings due to faster oxidative addition rates. For heteroaromatic systems like pyridines, iodine is particularly preferred due to enhanced reactivity.
[sai-00033] helpful=0 harmful=0 :: In reduction retrosynthesis, systematically identify all functional groups that are common reduction products, with particular attention to phenolic alcohols which often originate from ester reduction. When a phenolic OH group is present, prioritize converting it back to an ester precursor (e.g., benzoate OC(=O)c1ccccc1) over modifying other groups like alkyl chains. This takes precedence when the ester reduction aligns with the reaction context and other functional groups remain stable under reduction conditions.
[sai-00035] helpful=1 harmful=0 :: For retrosynthesis of protection reactions, identify the unprotected functional group in the precursor and include the appropriate protecting agent as a separate reactant. For Boc protection of amines, the precursor should have a free amine (often protonated as [NH3+] when carboxylic acids are present) and the reagent is di-tert-butyl dicarbonate (Boc2O, SMILES: CC(C)(C)OC(=O)OC(=O)OC(C)(C)C). This applies specifically when the reaction type is 'Protections' - meaning adding protecting groups, not removing them.
[sai-00037] helpful=0 harmful=0 :: For Fischer indole synthesis of fused indole systems, systematically count atoms in SMILES ring closure notation to determine fused ring size. Patterns like 'CCNC3' form 6-atom rings (piperidine) when properly closed, not 5-atom rings (pyrrolidine). Count only the ring atoms between matching ring closure digits, including the two atoms that carry the digits; hydrogens and branch substituents do not count toward ring size. Determine the ring size this way before selecting ketone precursors.
[sai-00038] helpful=1 harmful=0 :: In Fischer indole synthesis for nitrogen-containing fused rings, prefer piperidin-4-one (O=C1CCNCC1) over pyrrolidin-3-one as the ketone precursor. Piperidin-4-one is stable and readily available, forming fused piperidine rings (6-membered with nitrogen) characteristic of tetrahydro-γ-carboline derivatives, while pyrrolidin-3-one is unstable due to tautomerization and not commonly used.
[sai-00039] helpful=0 harmful=0 :: The ketone precursor in Fischer indole synthesis directly determines the size of the fused ring: cyclohexanone gives fused cyclohexane (6-membered carbon ring), piperidin-4-one gives fused piperidine (6-membered ring with nitrogen), while cyclopentanone would give fused cyclopentane (5-membered). Always match the ketone ring size to the target's fused ring size.
[sai-00042] helpful=0 harmful=0 :: In heteroatom alkylation for aryl alkyl ether formation, always assign the aryl oxygen (as phenoxide) as the nucleophile and the alkyl chain as the electrophile (with leaving group). Aryl halides cannot react by SN2 at their sp2 carbon, and weakly activating substituents (e.g., chlorine ortho/para to the halide) do not make them good SNAr electrophiles either, so the leaving group must be on the alkyl component. Verify the ether oxygen attachment: if attached to aryl carbon, the alkyl chain must bear the leaving group.
[sai-00043] helpful=0 harmful=0 :: For lactam retrosynthesis via Functional Group Interconversion (FGI), consider both reduction pathways (carbonyl to methylene) and formation pathways (hemiaminal dehydration). Lactams can be formed from hemiaminal intermediates (amino alcohols with carbonyl groups) through dehydration. The reverse reaction (hydration of lactam to hemiaminal) is a valid FGI transformation. Always compare the target functional group with possible precursors and evaluate common interconversions like dehydration/hydration alongside reduction/oxidation pathways.
[sai-00045] helpful=0 harmful=0 :: For Paal-Knorr pyrrole synthesis, systematically map substituents to precursors: (1) A methyl group at pyrrole position 2 indicates a 1,4-dicarbonyl precursor with a methyl ketone terminus (CH3CO-CH2-CH2-CO-, e.g., a 4-oxopentanal or hexane-2,5-dione derivative), not an acetoacetate derivative which would place COOR at position 3. (2) Complex aryl groups attached to the pyrrole nitrogen (ortho to the connection point) originate from the primary amine component, not the dicarbonyl. Verify the amine precursor contains the entire aryl substituent with appropriate functional groups.
[sai-00048] helpful=0 harmful=0 :: For 1,3-dioxolane rings containing a quaternary carbon adjacent to oxygen, prioritize retrosynthetic disconnection to a methallyl ether precursor (C=C(C)CO-aryl) rather than a styrene derivative. The alkene carbon that becomes the quaternary center must already bear two carbon substituents (a 1,1-disubstituted alkene such as isopropenyl, or a trisubstituted alkene) to generate the quaternary center upon epoxidation with a peracid (e.g., m-CPBA) followed by acid-catalyzed cyclization with a ketone (often derived from oxidation of the allyl group). This mechanism (epoxidation then Prins-type cyclization) is distinct from direct epoxidation/closure of styrenic alkenes.
[sai-00050] helpful=0 harmful=0 :: When performing functional group interconversion (FGI) retrosynthesis, trust standard transformations like azide-to-amine reduction even when other functional groups (e.g., Boc protection, amides) are present. These groups are typically preserved during FGIs as they serve different synthetic purposes. Focus on transformations that clearly modify functional groups while preserving the carbon skeleton intact.
[sai-00051] helpful=0 harmful=0 :: For amines that could originate from azide reduction, prioritize this FGI over protecting group removal when the amine is attached to carbon and other protecting groups (like Boc) are present. Azide reduction delivers a primary amine (-NH2), which may sit on a primary or secondary carbon, so the azide must be carbon-bound at that same position; it cannot directly give an N-substituted (secondary) amine. This approach often yields the correct precursor while preserving other functional groups.
[sai-00052] helpful=0 harmful=0 :: When uncertain about FGI selection, prioritize transformations that clearly interconvert functional groups (azide↔amine, nitro↔amine, alcohol↔carbonyl) over protecting group manipulations. Protecting groups like Boc are typically stable under FGI conditions and should remain intact unless the specific reaction context indicates otherwise.
[sai-00053] helpful=0 harmful=1 :: In oxidation retrosynthesis, always include both the reduced form of the organic substrate AND the oxidizing agent as precursor reactants, separated by a period in the SMILES output. Oxidizing agents (e.g., mCPBA for sulfoxides) are stoichiometric reactants consumed in the transformation, not mere catalysts or conditions to be excluded. This applies universally to oxidation reactions where the reagent is integral to the bond transformation.
[sai-00055] helpful=2 harmful=1 :: For heteroatom alkylation via nucleophilic substitution, always select electrophiles where the leaving group is attached to an sp3 carbon (primary or secondary alkyl halides) to ensure good SN2 reactivity. Avoid vinyl halides (halide on an sp2 carbon): they are poor SN2 electrophiles because the stronger C(sp2)-X bond resists cleavage and the alkene geometry blocks backside attack by the nucleophile.
[sai-00058] helpful=0 harmful=0 :: When multiple alkyne-aryl bonds are present in a target molecule for Sonogashira coupling disconnection, prioritize disconnection that creates the simplest aryl halide precursor and the most complex terminal alkyne precursor. The simpler fragment (often with fewer functional groups or smaller ring systems) should typically be the aryl halide component, while the more complex fragment (with multiple functional groups or complex ring systems) should be the terminal alkyne component. This approach maximizes synthetic efficiency and precursor accessibility.
[sai-00061] helpful=2 harmful=1 :: In retrosynthesis prediction tasks, when ground truth data specifies a particular halogen choice (e.g., chlorine vs bromine in Suzuki coupling) despite multiple chemically valid options, prioritize exact pattern matching to the training example over general chemical preferences. This ensures alignment with validated solutions even when general knowledge might suggest alternative approaches.
[sai-00063] helpful=0 harmful=0 :: In Hantzsch thiazole synthesis, prefer bromo (Br) over chloro (Cl) as the halogen in the α-halocarbonyl precursor due to bromine's superior leaving group ability, which leads to more efficient cyclization through better nucleophilic substitution reactivity. This specificity is particularly important for ensuring high yields in the ring-forming step.
[sai-00064] helpful=0 harmful=1 :: For heteroatom alkylation disconnections, always use the standard SMILES notation [nH] for N-H nitrogens in aromatic heterocycles (pyrrole-type nitrogens), not n([H]). The [nH] notation places the hydrogen inside the nitrogen's own bracket atom and is the conventional representation in cheminformatics tools.
[sai-00065] helpful=0 harmful=1 :: In nucleophilic substitution reactions for heteroatom alkylation, prefer alkyl iodides over chlorides as electrophiles due to superior leaving group ability (I⁻ >> Cl⁻). Methyl iodide (CI) is particularly preferred over methyl chloride (CCl) for amine alkylation reactions because iodine's better nucleofugality leads to faster reaction rates and higher yields under standard conditions.
[sai-00066] helpful=0 harmful=0 :: For N-H nitrogens in aromatic heterocycles, always write the hydrogen inside the nitrogen's own bracket atom (e.g., 'n[nH]' for a ring N next to a ring N-H) rather than using a parenthetical form (e.g., 'nn([nH])'). The parenthetical form is incorrect because '([nH])' is a branch containing an additional nitrogen atom, not a hydrogen on the preceding nitrogen, so it changes the atom count and connectivity. This is particularly critical for protection/deprotection reactions where the correct N-H state must be represented.
[sai-00067] helpful=0 harmful=0 :: When interpreting SMILES notation for complex ring systems, systematically distinguish fused rings from attached rings by parsing ring closure digits and shared atoms. Two rings are fused only if they share a bond (two adjacent atoms belong to both rings); parentheses mark branches, not fusion. In a pattern like 'C2(CC)OCCO2', the first carbon opens ring 2, carries an ethyl branch (CC), and is closed by the final oxygen, so the fragment is a 2-ethyl-1,3-dioxolan-2-yl group attached through that carbon to the preceding atom. Before applying disconnections like Wittig reactions, validate that ring systems remain intact and are not misinterpreted as simple substituents. Use mental mapping or tools to confirm ring membership and avoid incorrect precursor generation.
[sai-00068] helpful=1 harmful=0 :: In tetrazole chemistry, tert-butyl groups (CC(C)(C)) attached to the tetrazole nitrogen are commonly used as protecting groups to mask the acidic NH proton, unlike in amine chemistry where tert-butyl on nitrogen is often permanent. For deprotection reactions involving tetrazoles, the tert-butyl group should be removed to reveal the unprotected tetrazole (NH form), while other functional groups like carboxylic acids may remain protected in the precursor.
[sai-00070] helpful=0 harmful=0 :: In carboxylic acid deprotection retrosynthesis, when the target structure does not specify the ester type (methyl vs ethyl), recognize that both are synthetically valid and commonly used. Avoid arbitrary preference for methyl esters; instead, consider common patterns in the dataset or context clues. For secondary alpha-carbons (e.g., CC(C(=O)O)), both methyl (COC(=O)) and ethyl (CCOC(=O)) ester protections are plausible, and the choice may depend on synthetic convenience or specific reaction conditions rather than structural constraints.
[sai-00071] helpful=2 harmful=0 :: For deprotection retrosynthesis (e.g., Boc removal), the precursor is the protected molecule itself with the protecting group intact. No reagents (like Boc2O for Boc protection) should be included in the precursor SMILES. The output should be a single molecule SMILES representing the protected state before deprotection.
[sai-00073] helpful=0 harmful=0 :: When performing functional group interconversion (FGI) retrosynthesis on molecules containing ether linkages, systematically evaluate whether the ether could be part of an acetal protecting group that should be disconnected. Look for patterns where a single carbon is bonded to two oxygen atoms (O-C-O), particularly in cyclic systems; a lone C-O-C linkage is an ordinary ether, not an acetal. In retrosynthesis, acetals should be converted back to their precursor carbonyl compound and alcohol components, often appearing as hemiacetal forms (COC(O)) in the ground truth.
[sai-00076] helpful=0 harmful=0 :: In Functional Group Addition (FGA) retrosynthesis, validate predictions by focusing on chemical equivalence of SMILES representations rather than exact string matching. For benzylic bromination with NBS, the precursor should have a methyl group (CH3, written C in SMILES) in place of the bromomethyl group (CH2Br, written CBr), and NBS must be included as a separate stoichiometric reagent. Different SMILES notations (e.g., N#Cc1ccc(C)cc1Br vs Cc1ccc(C#N)c(Br)c1) can represent the same chemical structure when atom ordering or functional group representation varies, but connectivity and functional groups must align.
[sai-00077] helpful=0 harmful=0 :: When comparing SMILES strings for chemical equivalence, recognize that different atom ordering conventions can produce distinct-looking strings that represent the same molecule. Focus on verifying chemical connectivity, functional groups, and stereochemistry (e.g., using directional bonds like / and \) rather than requiring exact string matching. This is particularly important for complex molecules with alkenes, esters, or stereocenters, where a fragment may be written from either end (e.g., a methyl ester appearing as a leading COC(=O)/C=C(/OC)... or as a trailing .../C(=C\C(=O)OC)OC) without any chemical difference. Still confirm that the atom inventory matches: COC(=O)/C=C(/OC)C(C)Br and CC/C(=C\C(=O)OC)OC differ by a bromine and are therefore distinct molecules, not notation variants.
[sai-00079] helpful=0 harmful=0 :: For protection reactions in retrosynthesis, identify the protecting group and its attachment point, remove it to reveal the unprotected substrate, and include the appropriate protecting reagent as a separate precursor reactant. This applies to common protections like Boc (using Boc2O), ester protection of carboxylic acids, and other protecting group additions.
[sai-00080] helpful=0 harmful=0 :: When disconnecting beta-ketoamides (NC(=O)CC(=O)R) with complex substituents at the beta-position (especially conjugated aromatic or heteroaromatic systems), ensure the beta-keto acid derivative synthon includes the entire substituent attached to the beta-carbon. This preserves critical carbon-carbon bonds and maintains conjugation in the resulting precursor. The disconnection should not isolate the beta-keto ester portion from the aromatic system; instead, the aromatic substituent must remain attached to the beta-carbon in the synthon.
[sai-00082] helpful=0 harmful=3 :: In retrosynthesis for acylation reactions, when the ground truth specifies a carboxylic acid as the acylating precursor, prefer the free acid (C(=O)O) over activated derivatives like acid chlorides. The activation (e.g., to acid chloride) is implied in the forward reaction and need not be explicitly represented in the retrosynthetic precursors. This aligns with synthetic practice where carboxylic acids are commonly used with coupling agents.
[sai-00083] helpful=0 harmful=0 :: In reduction retrosynthesis for primary alcohols, systematically verify carbon chain length preservation when selecting between carboxylic acid, aldehyde, or ester precursors. For an alcohol R-CH2OH, the precursor must keep the same R and the same carbinol carbon: carboxylic acid R-COOH, aldehyde R-CHO, or ester R-COOR'. The alkoxy group R' of an ester (e.g., the methyl of a methyl ester) is lost as R'OH during reduction, so it is the only place where the precursor may carry carbons absent from the target; the acyl portion R-C(=O) must match the target exactly. This prevents misidentification of reduction pathways, especially when the alcohol-bearing chain is attached to a heteroatom like nitrogen.
[sai-00085] helpful=0 harmful=0 :: When disconnecting ether linkages via heteroatom alkylation, systematically verify that the proposed nucleophilic heteroatom (O or N) still carries a hydrogen (O-H or N-H) in the precursor. For nitrogen nucleophiles, check that the precursor nitrogen is secondary (-NH-) or primary (-NH2), not tertiary (fully alkylated) or part of an amide (O=C-N). A tertiary nitrogen would give a quaternary ammonium salt rather than the neutral target, and an amide nitrogen is a poor SN2 nucleophile, so neither should be targeted for this disconnection.
[sai-00087] helpful=0 harmful=0 :: In retrosynthesis tasks, recognize that different SMILES notations for primary alcohol groups (e.g., 'C[OH]' and 'CO') are chemically equivalent, as both represent a hydroxymethyl group (-CH2OH). Focus on verifying the correct chemical connectivity and functional group transformation logic rather than requiring exact string matching with ground truth representations. This is particularly important for oxidation/reduction retrosynthesis where the alcohol group is the key transformation site.
[sai-00088] helpful=3 harmful=0 :: For carbamate groups (NC(=O)OC) in retrosynthesis, prioritize disconnection to an amine precursor and a carbonyl compound (typically a chloroformate, ClC(=O)OR, or a carbonate). Carbamates are commonly formed via nucleophilic acyl substitution where the amine attacks the carbonyl carbon of the chloroformate/carbonate, releasing chloride/alkoxide. This disconnection takes precedence over lactam formation or hydrolysis when the carbamate is the key functional group being interconverted.
[sai-00091] helpful=4 harmful=2 :: When generating SMILES for fused heterocyclic systems containing secondary amine nitrogens (e.g., indole-pyrrolidine fused rings), use standard notation without explicit hydrogens for the amine nitrogen. Represent secondary amines in rings as 'N' (e.g., 'c1ccc2c(c1)cc1n2CCNC1' for an unprotected amine precursor) rather than non-standard forms like '[H]N1...' or 'n1CC[nH]2...'. Ensure ring closure numbers correctly represent the atom connectivity and fusion pattern, with the nitrogen atom properly integrated into the ring system without redundant hydrogen specification.
[sai-00092] helpful=0 harmful=0 :: When the predicted answer matches ground truth and the reasoning demonstrates correct chemical principles (e.g., proper functional group transformations, valid SMILES manipulation), validate the reasoning rather than searching for non-existent errors. This is especially important when the instruction phrasing assumes an error exists but the model has actually performed correctly.
[sai-00094] helpful=0 harmful=0 :: When performing retrosynthesis on complex molecules with multiple functional groups and stable ring systems (e.g., benzoxazole, benzimidazole, other aromatic heterocycles), always preserve the stable ring systems intact and only disconnect the most reactive or recently formed functional groups. For tetrazole formation via [3+2] cycloaddition, identify the exact carbon where the tetrazole is attached and replace only that portion with a nitrile group, leaving stable heterocyclic cores unchanged.
[sai-00095] helpful=0 harmful=0 :: For reduction retrosynthesis targeting aromatic amines, systematically preserve all stable aromatic substituents and ring systems unchanged, including ether linkages (e.g., diaryl ethers), halogens (Cl, Br), and alkoxy groups (OCH3). These groups are stable under standard reduction conditions (e.g., catalytic hydrogenation, SnCl2/HCl) and should remain intact in both target and precursor molecules. Focus only on converting the reduced functional group (e.g., amine) back to its oxidized precursor (e.g., nitro).
[sai-00096] helpful=0 harmful=0 :: For allylic bromination retrosynthesis using NBS, carefully analyze the molecular connectivity to identify the allylic position. When the target contains a pattern like COP(=O)(/C=C/CBr)OC, the bromine is attached to an allylic carbon adjacent to a double bond where phosphorus is directly attached to the other carbon. The correct precursor replaces the CH2Br group with CH3 while preserving the direct P-C bond and alkene stereochemistry (/C=C/ for trans). Verify that phosphorus remains bonded directly to the alkene carbon in the precursor and that the directional bonds still encode the trans alkene, as in C/C=C/P(=O)(OC)OC.
[sai-00097] helpful=0 harmful=0 :: When performing retrosynthesis for reduction reactions, systematically identify all reducible functional groups (alkenes, alkynes, carbonyls, nitro groups, etc.) and work backward to their pre-reduced forms while simultaneously verifying which protecting groups and stable functionalities remain unaffected by the reduction conditions. This holistic approach ensures correct precursor identification by considering both the transformation sites and the preservation of stable molecular frameworks.
[sai-00098] helpful=0 harmful=0 :: For oxime Functional Group Interconversion (FGI) retrosynthesis, always include both the carbonyl precursor (aldehyde or ketone) and hydroxylamine (NO) as separate reactants in the SMILES output, separated by a period. Oxime formation is a bimolecular reaction requiring both components, and the retrosynthetic step must account for this by yielding both precursors, not just the carbonyl compound.
[sai-00099] helpful=0 harmful=0 :: When evaluating retrosynthesis predictions, recognize that SMILES strings can vary in atom ordering and stereochemistry placement while representing chemically identical molecules. For complex molecules with chiral centers, different notations (e.g., NC(=O)... with stereochemistry on carbon vs C(=O)N... with stereochemistry on adjacent carbon) may be equivalent if the core connectivity and stereochemical configuration are preserved. Focus on chemical equivalence through connectivity analysis rather than requiring exact string matching, especially for molecules with multiple functional groups and stereocenters.
[sai-00101] helpful=0 harmful=0 :: For reduction retrosynthesis targeting aromatic amines in fused heterocyclic systems, prioritize converting the amine group (-NH2) back to a nitro group (-NO2) while preserving the entire heterocyclic core structure and stable substituents like halogens (Cl, Br). This transformation takes precedence over modifying aromatic rings or halogen substituents, as these remain unchanged under standard reduction conditions like catalytic hydrogenation or SnCl2/HCl.
[sai-00102] helpful=0 harmful=0 :: In deprotection retrosynthesis, systematically compare the target molecule with potential precursors to identify the actual protected group. Look for the specific pattern 'OCc1ccccc1' in SMILES notation, which indicates O-benzyl protection of phenolic OH groups. Benzyl is a common protecting group for phenols that is removed via hydrogenolysis, and its presence in the precursor suggests the target's phenolic OH was deprotected.
[sai-00103] helpful=0 harmful=0 :: When both amine and phenolic OH groups are present in deprotection retrosynthesis, prioritize analyzing phenolic OH protection over amine protection if the phenolic oxygen is attached to an aromatic system. Phenolic OH groups are more commonly protected with benzyl groups in complex molecules, while amines typically use carbamate protections like Boc. Verify the protection pattern by checking for benzyl ether motifs ('OCc1ccccc1') near phenolic oxygens.
[sai-00105] helpful=0 harmful=0 :: When generating SMILES for alcohols with complex branching patterns involving multiple substituents on a central carbon, always use parentheses to group atoms correctly. For example, a secondary alcohol whose central carbon bears the OH and two CH2O-aryl groups should be represented as OC(COc1ccc2ccccc2c1)COc1ccc2ccccc2c1, not OCC(Oc1ccc2ccccc2c1)COc1ccc2ccccc2c1, which instead places the OH on a terminal CH2 and one aryloxy group directly on the central carbon. This ensures proper connectivity and prevents structural misinterpretation.
[sai-00106] helpful=0 harmful=0 :: In heteroatom alkylation retrosynthesis, before disconnecting, verify which nitrogen is alkylated by analyzing the target structure. If a tertiary amine is present, identify the alkyl group attached to nitrogen and check if the corresponding electrophilic precursor already contains a leaving group (e.g., Cl, Br) on that carbon. The nucleophile should be a secondary amine or other heteroatom nucleophile that matches the target's structure without modification. Avoid adding leaving groups to precursors if they are already present in the target.
[sai-00108] helpful=0 harmful=0 :: In Wittig retrosynthesis for complex phosphonium salts containing benzyl-protected aromatic systems, ensure the methylene carbon adjacent to phosphorus remains directly connected to the aromatic ring system that appears in the final product. The phosphonium salt precursor should be represented as a single ionic entity where the benzyl ether group (OCc1ccccc1) is preserved as part of the phosphonium structure, not fragmented into separate molecules. This maintains proper atom connectivity for the ylide formation.
[sai-00109] helpful=0 harmful=0 :: In Sonogashira coupling retrosynthesis, ensure that the leaving group on the aryl halide precursor is not already present as a substituent in the target molecule. Existing substituents (e.g., chlorine atoms) are part of the final product structure and cannot serve as both substituents and leaving groups simultaneously. A separate, appropriate leaving group (typically iodine or bromine) must be introduced on the coupling partner.
[sai-00110] helpful=0 harmful=0 :: In retrosynthesis for protection reactions, the 'precursor reactants' include both the unprotected substrate AND the protecting reagent (e.g., Boc2O for Boc protection). Unlike other reaction types where reagents may be excluded, protection reactions fundamentally require stoichiometric consumption of both components in the forward synthesis, so both must be provided as separate precursors in the output.
[sai-00112] helpful=0 harmful=0 :: When salt forms (e.g., hydrochloride .Cl) are present in the target molecule, preserve them in retrosynthetic precursors if they are chemically relevant to the reaction mechanism. For Fischer indole synthesis, phenylhydrazine is typically used as the hydrochloride salt to provide acidic conditions required for the reaction. The salt form should be maintained in the precursor to ensure the prediction aligns with standard synthetic practice.
[sai-00113] helpful=0 harmful=0 :: In retrosynthetic analysis of ethers containing anionic functional groups (e.g., sulfonates, carboxylates), disconnect the ether bond such that the anionic group remains with the nucleophilic fragment (typically as its salt form, e.g., sodium sulfonate). Anionic groups stabilize the nucleophile (e.g., alkoxide or phenoxide) and are poor leaving groups, making them unsuitable for the electrophilic partner. The electrophile should be a simple alkyl halide without the anionic functionality.
[sai-00115] helpful=1 harmful=0 :: In Functional Group Interconversion (FGI) retrosynthesis, always distinguish from Functional Group Addition (FGA) by verifying that the carbon skeleton remains unchanged. FGI targets functional group transformations (e.g., alcohol↔carbonyl, amine↔amide) while preserving all carbon atoms and substituents like halogens. For lactam targets, prioritize FGI transformations such as oxidation of amino alcohols or hemiaminal intermediates to form the lactam ring, rather than breaking core bonds.
[sai-00116] helpful=0 harmful=0 :: When analyzing retrosynthesis problems, first identify the reaction type (FGI vs FGA) by checking if the instruction specifies functional group interconversion. For FGI problems, systematically preserve the entire carbon skeleton including aromatic substituents like bromine, and focus only on converting the key functional group (e.g., lactam carbonyl) to its precursor state through oxidation/reduction or condensation pathways.
[sai-00119] helpful=0 harmful=0 :: For 1,3-dioxolane rings containing a quaternary carbon adjacent to oxygen, prioritize retrosynthetic disconnection to an allyl alcohol precursor with trisubstituted alkene (C=C(C)CO-aryl) and a carbonyl source (typically from a peracid like mCPBA). The quaternary center forms via acid-catalyzed cyclization (Prins-type reaction) between the allyl alcohol and carbonyl compound, not through epoxidation pathways. This mechanism requires retaining all aromatic substituents in the precursors as they are integral to the structure.
[sai-00121] helpful=0 harmful=0 :: In Functional Group Interconversion (FGI) retrosynthesis, prioritize amine-to-azide conversion as a key transformation for primary amines, especially when other protecting groups (e.g., Boc) are present and should remain intact. Primary amines can be synthesized via azide reduction (e.g., Staudinger reduction), so the precursor should have an azide group (N=[N+]=[N-], attached through the first nitrogen) in place of the amine. This transformation often takes precedence over amide bond disconnection or protecting group removal when the amine is not part of a more complex functional group like an amide.
[sai-00123] helpful=0 harmful=0 :: For oxidation reactions using common reagents like meta-chloroperoxybenzoic acid (mCPBA), always use the correct SMILES representation: O=C(OO)c1cccc(Cl)c1. This ensures structural accuracy, as incorrect representations (e.g., O=C1C2CCCC2C(=O)O1, which is a bicyclic anhydride with no peroxy O-O bond) can lead to invalid precursors despite correct chemical logic. Verify SMILES for standard reagents to avoid misrepresentation.
[sai-00124] helpful=0 harmful=0 :: When generating nucleophile precursors for heteroatom alkylation retrosynthesis, preserve the exact atom ordering and substituent positions from the target molecule's SMILES representation. Only modify the specific atom involved in the transformation (e.g., change alkylated nitrogen 'n' to protonated nitrogen '[nH]') without rearranging other ring atoms or substituents. This ensures the substitution pattern remains identical between target and precursor and prevents invalid structures due to SMILES parsing errors.
[sai-00126] helpful=1 harmful=1 :: When performing retrosynthetic analysis on aromatic systems for coupling reactions like Sonogashira, carefully analyze the exact substitution pattern in the SMILES notation. Do not assume common patterns like para-substitution based solely on functional group presence; count positions precisely from the attachment point. For a SMILES pattern like 'Rc1cccc(X)c1', where the ring atom bearing R is position 1, the substituent X is at position 5 (meta to position 1), not position 4 (para). Always verify the numbering by tracing the ring atoms from the attachment point to ensure correct precursor identification.
[sai-00128] helpful=0 harmful=0 :: When performing retrosynthetic analysis for nucleophilic substitution reactions (e.g., azide introduction), carefully evaluate the electronic environment of the electrophilic carbon. Benzyl halides (halogen attached to CH2 group directly bonded to aromatic ring) are highly reactive due to aromatic stabilization of the transition state and typically undergo SN2 reactions more readily than chloromethyl groups attached to electron-withdrawing atoms like sulfur (-SCH2Cl). Prioritize disconnection at benzyl positions over other alkyl halide sites when both are present, as benzyl halides represent more synthetically accessible electrophilic sites for functional group additions.
[sai-00129] helpful=0 harmful=1 :: In Suzuki coupling retrosynthesis for molecules with both simple aromatic and complex heterocyclic fragments, prioritize placing the boronic acid group on the simpler aromatic system and the halogen leaving group on the complex heterocycle. Complex nitrogen-rich heterocycles (e.g., tetrazoles, imidazoles) are challenging to convert to boronic acids and are better suited as electrophilic partners with appropriate leaving groups (Cl, Br, I).
[sai-00130] helpful=0 harmful=0 :: For tetrazole-containing systems requiring C-C bond formation, evaluate nucleophilic substitution disconnections (using chloro-tetrazole as electrophile) as an alternative to Suzuki coupling. Tetrazoles often feature chlorine as a leaving group for nucleophilic aromatic substitution with carbon nucleophiles, which can be more feasible than converting the tetrazole to a boronic acid partner for Suzuki reactions.
[sai-00132] helpful=0 harmful=0 :: When generating SMILES for α-halocarbonyl compounds in retrosynthesis (e.g., for Hantzsch thiazole synthesis), use a notation that clearly shows the halogen on the alpha-carbon, i.e., the carbon directly bonded to the carbonyl. Prefer formats like CCOC(=O)C(Br)C(=O)Ar. Do not write BrCC(C(=O)OCC)C(=O)Ar: the extra C after Br inserts a CH2 and moves the bromine off the alpha-carbon, so it encodes a different molecule rather than an alternative notation. The pattern C(=O)C(Br)X unambiguously represents the alpha-halo substitution pattern and aligns with common cheminformatics conventions, ensuring better compatibility with ground truth representations.
[sai-00133] helpful=0 harmful=0 :: For phenolic OH deprotection retrosynthesis, evaluate multiple common protecting groups (methyl, benzyl, and silyl ethers) rather than defaulting to a single option. Methyl ethers (aryl-OC, e.g., COc1ccccc1) are common and cleaved with BBr3 or other Lewis acids, benzyl ethers (OCc1ccccc1) are common and cleaved via hydrogenolysis, and silyl ethers (e.g., O[Si]) are also used. Analyze the target structure and context for clues about the specific protection used, as different protecting groups have distinct synthetic preferences and stability profiles.
[sai-00135] helpful=0 harmful=0 :: For heteroatom alkylation retrosynthesis, prefer methyl iodide (CI) over methyl chloride (CCl) as the alkylating agent due to iodine's superior leaving group ability and higher reactivity in nucleophilic substitution reactions with amines. Methyl iodide is particularly preferred for alkylating heterocyclic amines where reaction efficiency is critical.
[sai-00136] helpful=0 harmful=0 :: For secondary amine nitrogens in heterocyclic systems, always use the standard SMILES notation [nH] (not n[H]) to represent the protonated nitrogen. The [nH] notation correctly indicates a hydrogen atom directly attached to the nitrogen and is the conventional representation in cheminformatics tools, ensuring compatibility with ground truth answers.
[sai-00137] helpful=0 harmful=0 :: When generating SMILES for heterocyclic rings containing N-H nitrogens with explicit hydrogens (e.g., [nH]), write the hydrogen inside the bracket of the ring nitrogen that carries it (e.g., 'c2cn[nH]c(=O)c2Br') rather than appending a parenthesized '[nH]' to a neighboring atom (e.g., 'c2cnn([nH])c(=O)c2Br'). The parenthesized form adds an extra nitrogen atom as a branch instead of a hydrogen, which changes the atom count and ring connectivity. Always validate SMILES strings using chemical toolkits to ensure proper syntax for heterocyclic systems.
[sai-00138] helpful=0 harmful=0 :: When interpreting SMILES notation for complex cyclic substituents (e.g., C2(CC)OCCO2), recognize that parentheses indicate branching points for substituents, not ring fusion. Patterns like C2(CC)OCCO2 represent a cyclic substituent (2-ethyl-1,3-dioxolane) attached to the main chain, where the carbon with substituents (CC) is the attachment point. Always verify the connectivity by ensuring the cyclic system is treated as a distinct branch rather than a fused ring. This is critical for accurate retrosynthetic analysis, especially when disconnecting bonds near such substituents.
[sai-00139] helpful=0 harmful=0 :: For ester groups in SMILES notation, recognize that both 'C(=O)OC' and 'COC(=O)' are chemically equivalent representations. However, for consistency with common cheminformatics conventions and to avoid false mismatches in automated validation, prefer the 'COC(=O)' notation (e.g., methyl ester as COC(=O) rather than C(=O)OC). This stylistic preference applies specifically to ester protection in retrosynthesis, but chemical connectivity remains the primary validation criterion.
[sai-00140] helpful=0 harmful=0 :: For carboxylic acid deprotection retrosynthesis, avoid defaulting to methyl ester protection without specific justification. Consider that ethyl, benzyl, or other esters may be used depending on context and training data patterns. When ground truth specifies a particular protecting group (e.g., ethyl ester CCOC(=O)), follow that pattern rather than making arbitrary choices based on commonality alone. This is especially important for secondary alpha-carbon carboxylic acids (e.g., CC(C(=O)O)) where both methyl and ethyl esters are synthetically valid options.
[sai-00141] helpful=0 harmful=0 :: When disconnecting amide bonds to form primary amine precursors, check each stereo tag near the amine before copying it into the precursor SMILES. Nitrogen inversion affects only stereogenic nitrogen and does not racemize a neighboring carbon, so a carbon stereocenter that survives the disconnection normally keeps its configuration. Drop the tag (e.g., write CC(N)C(O) instead of CC(N)[C@H](O)) only when the carbon is genuinely not stereogenic in the precursor or when the ground truth leaves that center unspecified. This is particularly critical when the target molecule has multiple stereocenters; preserve stereochemistry at every carbon that remains chiral in the precursor.
[sai-00142] helpful=0 harmful=0 :: When predicting carboxylic acid deprotection precursors, avoid assuming a specific ester type (methyl vs ethyl) without contextual evidence. Both methyl (COC(=O)) and ethyl (CCOC(=O)) esters are synthetically common. Analyze ground truth patterns or molecular context to determine the appropriate protection, especially for secondary alpha-carbon carboxylic acids where both options are valid.
[sai-00144] helpful=0 harmful=0 :: When performing functional group interconversion (FGI) retrosynthesis, prioritize ether hydrolysis (converting C-O-C to C-OH) over alcohol oxidation/reduction when both functional groups are present in complex molecules. Ether hydrolysis is a common and synthetically accessible transformation, especially for benzylic or complex ethers, and often takes precedence over modifying secondary alcohols which may be stable or require protection. Always verify the carbon skeleton remains unchanged and only the ether linkage is modified to an alcohol.
[sai-00146] helpful=0 harmful=0 :: In Functional Group Addition (FGA) retrosynthesis, recognize that different SMILES notations for the same organic molecule are chemically equivalent and acceptable. Variations in atom ordering (e.g., 'N#Cc1ccc(C)cc1Br' vs 'Cc1ccc(C#N)c(Br)c1') or functional group representation (e.g., 'N#C-' vs 'C#N') do not indicate chemical differences. Focus on verifying the precursor's connectivity and functional groups match the expected chemical structure, not exact string matching with ground truth.
[sai-00147] helpful=0 harmful=0 :: In oxidation retrosynthesis, when multiple oxidized functional groups are present (e.g., sulfone, ketone, dioxolane), prioritize reversing oxidation at ketone groups (C(=O)) as they commonly originate from secondary alcohol oxidation. Stable oxidized groups like sulfones (S(=O)(=O)) and fused heterocycles (e.g., benzodioxole OCO2) are typically pre-existing and should remain unchanged in the precursor. Use the presence of dioxolane rings as context clues—they are often stable structural motifs rather than protecting groups for ketones in complex molecules.
[sai-00150] helpful=0 harmful=0 :: For Boc protection retrosynthesis in molecules with both aliphatic and aromatic amines, prioritize protection of the aliphatic amine due to its higher nucleophilicity. The aromatic amine typically remains unprotected as it is less reactive toward Boc reagents like Boc2O. This selectivity is particularly important when the target molecule shows Boc protection on an aliphatic chain while an aromatic amine remains free.
[sai-00151] helpful=0 harmful=0 :: When generating SMILES for retrosynthesis precursors involving multiple reactants separated by periods, recognize that the order of components does not affect chemical meaning. Different ordering conventions (e.g., amine precursor first vs Boc2O first) represent the same chemical system as long as all required components are present. Focus on chemical connectivity and functional group accuracy rather than strict ordering matching with ground truth answers.
[sai-00152] helpful=0 harmful=0 :: In oxidation retrosynthesis, carefully distinguish between functional groups that were oxidized in the specific reaction step and pre-existing oxidized groups. Only reverse the oxidation transformation for groups that were actually formed (e.g., aldehyde from primary alcohol, ketone from secondary alcohol), while leaving stable oxidized groups (e.g., sulfones, nitro groups) unchanged in the precursor. Use reaction context and molecular features (e.g., protecting groups, stability) to identify the likely oxidation site.
[sai-00154] helpful=0 harmful=0 :: When generating SMILES for heterocyclic systems with adjacent heteroatoms (e.g., isoxazoles, pyrazoles), ensure proper ring notation that correctly represents atom connectivity. For isoxazole rings, use the 'on' pattern (e.g., 'on1' for ring closure) to indicate oxygen-nitrogen adjacency in the five-membered ring. Always validate SMILES syntax against known representations of complex heterocycles to ensure accuracy, as incorrect notation can lead to chemically invalid structures despite correct retrosynthetic logic.
[sai-00156] helpful=0 harmful=0 :: When parsing complex fused heterocycles in SMILES notation (e.g., C(=O)N2CC(=O)Nc3ccccc32), systematically count atoms and bonds to avoid misinterpreting the structure. Carefully distinguish the five-membered benzimidazol-2-one (O=C1NC2=CC=CC=C2N1) from the six-membered 3,4-dihydroquinoxalin-2(1H)-one, whose ring contains an additional methylene group (e.g., O=C1CNc2ccccc2N1). Pay close attention to 'CC(=O)N' patterns, which indicate a ring methylene next to the carbonyl rather than a carbonyl bridging the two nitrogens directly, and verify ring closure digits to ensure correct atom connectivity.
[sai-00157] helpful=0 harmful=0 :: For acylation disconnections, while activated acylating agents (acid chlorides, anhydrides) are commonly used, carboxylic acids (C(=O)O) can serve as valid precursors when appropriate coupling agents (e.g., DCC, HATU, EDC) are employed in the forward synthesis. This is particularly relevant when the ground truth specifies carboxylic acid precursors or when sensitive functional groups (e.g., stereocenters, protecting groups) are present that might be compromised by highly reactive acylating agents. Always verify the reaction context to determine the appropriate level of activation needed.
[sai-00158] helpful=0 harmful=0 :: For primary alcohols attached to heteroatoms (e.g., nitrogen in N-alkyl chains), verify the attachment point: the alcohol carbon must correspond to the carbonyl carbon in the carboxylic acid precursor. Avoid modifying the heteroatom attachment; instead, transform the carbon chain from -CCO (alcohol) to -CC(=O)O (carboxylic acid) while preserving the heteroatom connectivity. This ensures the reduction reversal correctly targets the carbon skeleton, not the heteroatom bond.
[sai-00160] helpful=0 harmful=0 :: For primary amide synthesis via functional group interconversion (FGI), systematically evaluate both nitrile hydrolysis (C#N precursor) and ester aminolysis (C(=O)OR precursor + nitrogen nucleophile) as potential pathways. When ester aminolysis is selected, include the nucleophile (e.g., ammonia N or an amine) as a separate reactant in the precursor SMILES, as this transformation requires a bimolecular reaction. Prioritize ester aminolysis when contextual clues (e.g., presence of ester groups in ground truth) or synthetic commonality suggest it is the preferred route.
[sai-00161] helpful=0 harmful=0 :: For secondary amines in fused heterocyclic systems (e.g., tetrahydropyrazino[1,2-a]indoles), recognize that SMILES strings written from different starting atoms (e.g., N1CCn2c(cc3ccccc32)C1 and c1ccc2c(c1)cc1n2CCNC1) are chemically equivalent. Both represent the same molecular structure, and in both the N-H hydrogen is implicit, assumed from valence. When comparing predicted and ground truth SMILES, prioritize chemical connectivity analysis over exact string matching to avoid false mismatches.
[sai-00163] helpful=0 harmful=0 :: In tetrazole synthesis via [3+2] cycloaddition of a nitrile with an azide, ensure the nitrile precursor contains the complete organic substituent that will be attached to the tetrazole ring carbon (C5), including any complex scaffolds (e.g., heterocycles, aromatic systems, or extended chains). Do not truncate the nitrile to only the immediate alkyl chain; the entire R group in R-C≡N must match the substituent on the tetrazole carbon in the target molecule. This prevents errors where critical structural elements are omitted from the precursor.
[sai-00164] helpful=0 harmful=0 :: In reduction retrosynthesis of aromatic nitro groups to amines, chlorine substituents on the aromatic ring generally survive chemoselective reduction conditions (e.g., SnCl2/HCl or Fe/acid; catalytic hydrogenation over Pd/C can cause hydrodechlorination) and must remain unchanged in the precursor. Only the nitro group should be modified (amine in the target, nitro in the precursor).
[sai-00165] helpful=0 harmful=0 :: For allylic bromination retrosynthesis in phosphonate esters, carefully distinguish between vinylphosphonates (where phosphorus is directly bonded to an alkene carbon, e.g., P-C=C) and phosphonates with allylic substituents (where the alkene is part of a chain attached to phosphorus, e.g., P-CH2-CH=CH2). The precursor should maintain the same phosphorus-alkene connectivity as the target, with the bromine replaced by hydrogen at the allylic position. Verify the stereochemistry notation: the directional bonds / and \ describe only double-bond geometry (e.g., C/C=C/C is trans and C/C=C\C is cis); they do not encode configuration at a tetrahedral center such as phosphorus, which is written with @ or @@.
[sai-00167] helpful=0 harmful=0 :: In retrosynthesis, always preserve stereochemistry in precursor molecules when specified in the ground truth or chemically justified, even if the reaction erases it. For alkenes, particularly in conjugated systems like α,β-unsaturated esters, use explicit stereochemistry indicators (/ and \) in SMILES to denote trans (E) or cis (Z) configuration as required. This ensures accuracy in precursor representation, as stereochemical details are often critical for synthetic planning and database matching.
[sai-00168] helpful=0 harmful=0 :: For oxime Functional Group Interconversion (FGI) in retrosynthesis, after converting the oxime to the corresponding carbonyl and hydroxylamine, rigorously verify that the carbon skeleton and all other functional groups remain completely unchanged in the precursor molecules. This ensures the transformation only affects the oxime group and prevents accidental modification of the molecular scaffold, which is critical for accurate precursor prediction.
[sai-00169] helpful=0 harmful=0 :: When disconnecting amide bonds via acylation, evaluate the stability of all functional groups in the target molecule before selecting the acylating agent. Carboxylic acids are preferred over acid chlorides when sensitive or reducible groups (e.g., dithiolane rings, nitro groups, stereocenters) are present, to avoid side reactions. Use acid chlorides only when explicitly indicated by context or when no sensitive groups exist.
[sai-00171] helpful=0 harmful=0 :: When performing Functional Group Addition (FGA) retrosynthesis on complex molecules with fused ring systems, rigorously verify the exact molecular structure and atom connectivity of both the target and precursor. Use canonical SMILES representations to ensure accuracy, and confirm that only the added functional group (e.g., CBr to CH for benzylic bromination) is modified, while the core structure (including ring fusion patterns and functional group placements like amides) remains unchanged. This prevents structural misrepresentation errors that can occur due to incorrect parsing or canonicalization.
[sai-00172] helpful=0 harmful=0 :: In SMILES notation for retrosynthesis, represent secondary amines as 'N' with two carbon neighbors (e.g., CNC, or N1CCCCC1 in a ring) without explicit hydrogens; the N-H is implied by valence. Bracketed hydrogen counts are needed only where valence alone is ambiguous, such as aromatic [nH] or charged species like [NH3+]. This ensures correct atom representation and avoids misclassifying the amine's degree of substitution in precursor generation.
[sai-00174] helpful=0 harmful=0 :: When generating SMILES for molecules with functional group substituents (e.g., nitro, cyano, halogen), avoid using ring closure tokens (like '1', '2') to represent the substituent. Substituents should be attached to the parent structure without altering the ring notation. For example, use 'O=[N+]([O-])c1cccc2c(Cl)nccc12' for a nitro substituent, not 'O=[N+]1c2ccc(Cl)cc2nccc1[O-]' which incorrectly incorporates the nitro group into the ring system. This ensures chemically valid representations and prevents structural errors.
[sai-00175] helpful=0 harmful=0 :: In deprotection retrosynthesis for molecules containing piperazine rings, carefully distinguish between amide nitrogens (part of O=C-N linkage) and free amine nitrogens. The amide nitrogen in piperazine systems is highly stable due to resonance and typically does not require protection, while secondary amine nitrogens in the same ring may need protection. Always verify the specific nitrogen hybridization before applying protection strategies - only secondary amines (-NH-) should be considered for protection, not amide nitrogens.
[sai-00176] helpful=0 harmful=0 :: For retrosynthesis predictions involving multiple precursor reactants (e.g., acylation reactions with alcohol and acylating agent), the order of SMILES strings separated by a period does not affect chemical correctness. Focus on generating accurate SMILES for each component rather than matching a specific ordering convention, as the instruction only requires separation by a period without ordering constraints.
[sai-00177] helpful=0 harmful=0 :: In heteroatom alkylation disconnections, recognize that amines that already carry alkyl groups can still act as nucleophiles toward alkyl halides: a secondary amine gives a tertiary amine, and a tertiary amine (e.g., 1-methylpiperidine) gives a quaternary ammonium salt. The leaving group must be on a carbon atom (e.g., chloro on an aromatic carbon) rather than on a heteroatom. Verify the target's nitrogen substitution: if it is tertiary, the nucleophilic precursor was most likely the corresponding secondary amine (N-H); if it is secondary, it may have come from a primary amine or belong to a different disconnection. Always inspect SMILES for leaving groups (Cl, Br) on carbon atoms to identify the electrophilic component.
[sai-00178] helpful=0 harmful=0 :: In retrosynthetic analysis using SMILES notation, represent ionic compounds (e.g., phosphonium salts, ammonium salts) by their main charged species without explicitly including counterions as separate molecules. For phosphonium salts used in Wittig reactions, use the cationic form (e.g., C[P+](c1ccccc1)(c1ccccc1)c1ccccc1) rather than decomposing into alkyl halide and phosphine components or showing the counterion separately. Follow the constraint 'Output only the SMILES string of the reactants' literally - output the actual reactant molecules that would be combined in the forward synthesis.
[sai-00179] helpful=0 harmful=0 :: In retrosynthetic analysis for cross-coupling reactions (e.g., Sonogashira, Suzuki), meticulously verify the exact position of substituents on heterocyclic rings like pyridine. The leaving group (halogen) must be at the precise carbon where the new bond will form. Trace the ring atoms in the SMILES to confirm positions; the digits in a SMILES string are ring-closure labels, not position numbers. For pyridine (c1ccncc1), a chlorine at position 2 (adjacent to the nitrogen) is 'Clc1ccccn1', while a chlorine at position 3 is 'Clc1cccnc1'. Incorrect regiochemistry is a common error that invalidates the precursor.
[sai-00180] helpful=0 harmful=1 :: For palladium-catalyzed cross-coupling reactions (Sonogashira, Suzuki, etc.), prioritize iodine over bromine over chlorine as the leaving group in retrosynthetic precursors, especially for electron-deficient heteroaromatic systems like pyridines. Iodides undergo oxidative addition more readily than bromides or chlorides, leading to higher yields and broader substrate scope under standard conditions. While chlorides can be used with specialized catalysts, they are less reactive and should be avoided in retrosynthesis unless specific context indicates otherwise.
[sai-00183] helpful=0 harmful=0 :: For Boc protection reactions involving amino acid derivatives or other amine salts, the precursor should typically be the protonated ammonium form (e.g., [NH3+]) rather than the free amine. This reflects common synthetic practice where amines are handled as stable salts (like hydrochlorides) to prevent side reactions, and deprotonation occurs in situ with base during the protection step. Verify the protonation state by checking for carboxylic acids or other acidic groups that might influence the amine's state.
[sai-00184] helpful=0 harmful=0 :: When generating SMILES for phenylhydrazine derivatives (common in Fischer indole synthesis), ensure the hydrazine group (-NN) is attached externally to the aromatic ring without breaking ring connectivity. Use patterns like 'c1ccccc1NN' or 'c1ccc(NN)cc1' (with appropriate substituents) where the ring remains a continuous 6-membered system. Avoid writing the hydrazine nitrogens between the ring-closure atoms (e.g., 'c1ccccNN1'), as this incorrectly makes them members of the ring. For hydrochloride salts, append '.Cl' externally.
[sai-00186] helpful=0 harmful=0 :: In heteroatom alkylation reactions involving sulfonate groups (R-SO₃⁻), recognize that sulfonate esters (e.g., alkyl sulfonates) are excellent electrophiles because the sulfonate is a very good leaving group. The nucleophile (e.g., phenoxide, alkoxide) attacks the carbon bearing the sulfonate ester oxygen, displacing the sulfonate. This is a common method for ether formation and should be prioritized when the target contains an ether linkage and a sulfonate group on the alkyl chain.
[sai-00188] helpful=0 harmful=0 :: For lactam (cyclic amide) functional group interconversion (FGI) in retrosynthesis, prioritize hydration/dehydration transformations of the carbonyl group. The lactam carbonyl can be derived from a geminal diol (hydrated carbonyl) or hemiaminal intermediate via dehydration. When the target contains a benzimidazolone or similar lactam system (O=C1N...), consider the precursor with a hydroxy group on the carbonyl carbon (C1(O)...) that dehydrates to form the lactam. This transformation takes precedence over less common FGIs like N-oxide reduction on tertiary amine nitrogens in aromatic systems.
[sai-00190] helpful=0 harmful=0 :: For pyrrole heterocycles with acetyl or other acyl groups on the nitrogen, recognize this as a strong indicator of Paal-Knorr synthesis using an acetylated aniline precursor rather than Fischer indole synthesis. The acyl group on the pyrrole nitrogen originates from the amine component (e.g., N-acetyl aniline) in Paal-Knorr synthesis, not from N-alkyl phenylhydrazine as in Fischer indole. This distinction is critical for correct retrosynthetic analysis of N-acyl pyrroles.
[sai-00192] helpful=0 harmful=0 :: For 1,3-dioxolane rings containing a quaternary carbon, the retrosynthetic disconnection must yield two precursor reactants: an allyl aryl ether with a trisubstituted alkene (e.g., C=C(C)CO-aryl) and a peracid (e.g., mCPBA, O=C(OO)c1cccc(Cl)c1). The quaternary center forms via epoxidation of the alkene by the peracid followed by acid-catalyzed cyclization. Never simplify stable aromatic substituents (e.g., benzyloxy OCc1ccccc1) in the precursors.
[sai-00194] helpful=0 harmful=0 :: For organic azide functional groups in SMILES notation (where the azide is covalently bonded to carbon), use N=[N+]=[N-] where the first nitrogen is neutral, the central nitrogen has a +1 formal charge, and the terminal nitrogen has a -1 formal charge. Avoid using [N-]=[N+]=[N-] which incorrectly charges the first nitrogen and is reserved for azide ions. This distinction is critical for accurately representing azides attached to organic molecules versus ionic azide salts.
[sai-00195] helpful=0 harmful=1 :: For oxidation retrosynthesis tasks, always verify if the dataset or context expects the oxidizing agent as a precursor reactant. If the ground truth includes it (e.g., mCPBA for sulfoxide formation), include both the reduced organic substrate and the oxidizing agent in the output, separated by a period. This overrides general retrosynthesis conventions that exclude reagents, ensuring alignment with specific task requirements.
[sai-00196] helpful=0 harmful=0 :: In retrosynthesis of heterocyclic systems, prioritize chemical connectivity and functional group analysis over exact SMILES string matching. Different SMILES notations (e.g., COC(=O)c1c(C=O)c(C)c(C)[nH]1 vs COC(=O)c1[nH]c(C)c(C)c1C=O) can represent identical molecules when they share the same atom connectivity, bond order, and stereochemistry. Validate precursors by ensuring they would react to form the target through the specified reaction mechanism, not by requiring identical SMILES representation to ground truth answers.
[sai-00197] helpful=0 harmful=0 :: For heteroatom alkylation of imidazole rings, systematically verify the nucleophile precursor by: (1) Identifying the alkylated nitrogen in the target, (2) Converting it to the protonated form ([nH]) while preserving all other substituents and ring connectivity, (3) Ensuring the electrophile precursor contains the appropriate leaving group (typically chloride or bromide) on the alkyl chain. The exact atom ordering in the SMILES representation of the imidazole nucleophile may vary while remaining chemically valid, as long as the substitution pattern matches the target structure.
[sai-00198] helpful=0 harmful=0 :: For retrosynthetic outputs involving multiple reactants (e.g., Sonogashira coupling precursors), the order of components in the SMILES string (separated by periods) does not affect chemical correctness unless explicitly specified in the instructions. Focus on generating chemically valid precursors with correct connectivity and functional groups, as different ordering conventions represent the same chemical system.
[sai-00199] helpful=0 harmful=0 :: In Functional Group Addition (FGA) retrosynthesis, prioritize reverting the most reactive and unstable functional groups first, as they are typically introduced later in synthetic sequences. Chloromethyl groups (-CH2Cl) are highly reactive and should take precedence over more stable groups like azides when deciding which FGA to perform in single-step retrosynthesis. This reactivity-based prioritization ensures the retrosynthetic step aligns with typical synthetic timing where unstable functionalities are added late to minimize decomposition risks.
[sai-00200] helpful=0 harmful=0 :: For Suzuki-Miyaura coupling involving complex fused heterocyclic systems, meticulously validate the ring connectivity and halogen placement in the aryl halide precursor. Fused systems like pyrimidine-imidazole combinations require precise atom ordering in SMILES to avoid invalid structures. The halogen must be attached directly to the correct carbon in the heterocyclic ring (e.g., pyrimidine 2- or 4-position) rather than misassigned to adjacent aromatic rings. Use cheminformatics tools to generate canonical SMILES ensuring proper ring closure numbering and unique atom labels.
[sai-00201] helpful=0 harmful=0 :: When generating SMILES for fused heterocycles with multiple ring systems (e.g., Nc1nc(Cl)c2nnn(Cc3cccc(C4(O)CCC4)n3)c2n1), ensure each ring closure digit appears exactly twice per ring (once to open, once to close) and is placed immediately after the atom it connects to. Avoid unpaired labels (e.g., an odd number of 'c2' or 'c3' atoms), which break ring connectivity. Validate SMILES syntax by checking that all rings are properly closed and functional groups are correctly positioned relative to the fusion points.
[sai-00202] helpful=0 harmful=0 :: In Suzuki coupling retrosynthesis for targets with biaryl bonds involving nitrogen-rich heterocycles (e.g., pyrimidines, imidazoles), prioritize placing the halogen leaving group on the heterocyclic fragment rather than the boronic acid. Heterocycles like pyrimidines are challenging to convert to boronic acids and are better suited as electrophilic partners. Ensure the halogen is at the synthetically accessible position (e.g., 2-chloropyrimidine) for efficient coupling.
[sai-00203] helpful=0 harmful=0 :: For Hantzsch thiazole synthesis retrosynthesis, the order of precursor SMILES strings in the output does not affect chemical correctness as long as both correct components (α-halocarbonyl and thioamide) are included. Focus on accurately representing each precursor's structure rather than matching a specific ordering convention.
[sai-00205] helpful=0 harmful=0 :: For phenolic OH deprotection retrosynthesis, systematically evaluate multiple common protecting groups (methyl ether, benzyl ether, silyl ether) rather than defaulting to a single type. Methyl ethers (CO) are cleaved with BBr3 or Lewis acids, benzyl ethers (OCc1ccccc1) with hydrogenolysis, and silyl ethers with fluoride. Consider molecular context - electron-withdrawing groups or acid-sensitive functionalities may influence protecting group choice. Verify the specific protection pattern in SMILES notation before selecting the precursor.
[sai-00206] helpful=0 harmful=0 :: For heteroatom alkylation reactions, particularly with nitrogen nucleophiles in heterocyclic systems, prefer methyl iodide (CI) over methyl chloride (CCl) as the alkylating agent. Iodide (I⁻) is a better leaving group than chloride (Cl⁻) because the C-I bond is weaker than the C-Cl bond and iodide is more polarizable, leading to faster reaction rates and higher yields under standard conditions. This specificity is especially important for methyl group transfer, where methyl iodide is commonly used in synthetic practice to ensure efficient alkylation.
[sai-00207] helpful=0 harmful=0 :: In retrosynthesis predictions for protection reactions (and other bimolecular reactions), the order of precursor reactants in the SMILES string (separated by periods) does not affect chemical correctness. Different ordering conventions (e.g., unprotected substrate first vs. protecting reagent first) represent the same chemical system. Focus on ensuring all required components are present with correct structures, not on matching a specific order.
[sai-00208] helpful=0 harmful=0 :: When interpreting SMILES strings with complex ring systems, carefully analyze ring-closure digits and parentheses to distinguish fused rings from cyclic substituents. For dioxolane-type systems (e.g., C2(CC)OCCO2), the ring-closure carbon belongs to the parent skeleton, so the five-membered ring shares that atom with the rest of the molecule rather than hanging off it as a separate substituent; a benzodioxole, by contrast, shares two aromatic carbons with the benzene ring (e.g., c1ccc2OCOc2c1). Always mentally reconstruct the ring system to verify fusion patterns before performing retrosynthetic disconnections, as misinterpreting fused rings as substituents leads to incorrect precursor generation.
[sai-00209] helpful=0 harmful=0 :: When generating SMILES for protected heteroatoms (e.g., tetrazole nitrogen protected with tert-butyl), ensure the protecting group is attached directly to the atom without implying a hydrogen. For tert-butyl-protected tetrazole nitrogen, use 'nC(C)(C)C' notation (e.g., in a ring: CC(C)(C)n1nnc(...)n1, with the tert-butyl on a ring nitrogen rather than on the ring carbon) rather than 'n[nH]...' which incorrectly represents a deprotected NH form. This applies to any protecting group on nitrogen (Boc, tert-butyl, etc.); the atom should not have an explicit or implicit hydrogen when protected.
[sai-00211] helpful=0 harmful=0 :: When generating SMILES for retrosynthesis tasks, prioritize syntactic consistency with ground truth notation patterns over alternative valid representations. For ester groups, study the ground truth's placement (e.g., CCOC(=O)C(C)... vs CC(CCOC(=O))...) and mimic it exactly, as evaluation systems often require string matching, not just chemical equivalence. This ensures compatibility with expected answers, especially for common transformations like carboxylic acid protection.
[sai-00212] helpful=0 harmful=0 :: In retrosynthesis, when generating precursor SMILES, remove chirality designations for all carbons that are or become stereolabile (e.g., adjacent to primary amines) or where the stereochemistry is not specified or defined in the target context. This includes carbons bearing hydroxyl groups that may not retain chiral integrity in the precursor due to synthetic steps or undefined configuration, ensuring the precursor representation aligns with synthetic reality and ground truth expectations.
[sai-00214] helpful=0 harmful=0 :: In carboxylic acid deprotection retrosynthesis, carefully analyze the ester protection pattern in the precursor. For carboxylic acids attached to carbon chains (e.g., -CH2COOH), common protecting groups include methyl (COC(=O)), ethyl (CCOC(=O)), benzyl (C(=O)OCc1ccccc1), and tert-butyl (C(=O)OC(C)(C)C) esters. Do not assume methyl ester by default; consider molecular context and ground truth patterns. Verify the attachment point: the ester alkoxy oxygen should be bonded to the same carbonyl carbon that forms the carboxylic acid in the target, not to adjacent atoms like ring nitrogens.
[sai-00216] helpful=0 harmful=0 :: In retrosynthesis for oxidation, carefully distinguish between ether (C-O-C) and primary alcohol (C-OH) functional groups in SMILES notation, especially when the group is part of a chain with chiral centers (e.g., 'COC[C@H]...'). In 'COC[C@H]', the oxygen sits between two carbons, so the fragment is a methyl ether (CH3-O-CH2-), not a free alcohol; a primary alcohol (-CH2OH) is written 'OC[C@H]' and is the group that can be oxidized to a carboxylic acid (-COOH, 'OC(=O)' or 'C(=O)O' in SMILES). Do not treat the methyl ether as an ester (COC(=O)) or invoke ester hydrolysis unless a carbonyl is actually attached to the oxygen. Verify the connectivity of the oxygen: if it is a hydroxyl on a methylene carbon (CH2OH), prioritize alcohol-to-carboxylic acid oxidation; if it lies between two carbons (ether), consider alternative transformations only if context supports them.
[sai-00217] helpful=0 harmful=0 :: In Functional Group Addition (FGA) retrosynthesis for benzylic bromination, recognize that different SMILES notations for the same precursor molecule are chemically equivalent and acceptable. For cyano groups, both 'N#C-' and 'C#N' representations are valid (e.g., N#Cc1ccc(C)cc1Br and Cc1ccc(C#N)c(Br)c1 represent the same molecule). Similarly, aromatic substitution patterns can vary in atom ordering (e.g., 'c1ccc(C)cc1Br' vs 'Cc1cccc(Br)c1') without changing the chemical structure. Focus on verifying the correct precursor connectivity (methyl group instead of bromomethyl) and including NBS as the reagent, rather than requiring exact string matching with ground truth.
[sai-00218] helpful=0 harmful=0 :: In retrosynthesis, when removing a functional group (e.g., bromine in allylic bromination), preserve the exact carbon skeleton and stereochemistry of the precursor without simplifying or altering the molecular framework beyond the specific functional group change. For allylic bromination with NBS, the precursor is the alkene with hydrogen at the bromination site, maintaining all other structural features including tertiary carbons and substituents.
[sai-00219] helpful=0 harmful=0 :: In oxidation retrosynthesis, systematically compare the oxidation states of all functional groups between target and precursor to identify which were actually transformed. Ketones (C(=O)) often originate from secondary alcohol oxidation and should be prioritized for reduction over stable pre-existing oxidized groups like sulfones (S(=O)(=O)) or nitro groups ([N+](=O)[O-]). Use the presence of protecting groups (e.g., dioxolane OCO2) as context clues—they often indicate recent ketone formation that was protected, suggesting the ketone is the likely oxidation site rather than other highly oxidized functionalities.
## FORMULAS & CALCULATIONS

## CODE SNIPPETS & TEMPLATES
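
Several entries above note that the order of period-separated reactants does not change chemical meaning. A minimal sketch of an order-independent comparison follows. The helper names `split_reactants` and `same_reactant_set` are hypothetical, and the benzylamine SMILES in the usage lines is only an illustrative substrate. The check compares component strings exactly, so it assumes each component is already written identically in both inputs; it does not canonicalize alternative atom orderings, which would need a cheminformatics toolkit.

```python
from collections import Counter


def split_reactants(smiles: str) -> Counter:
    """Split a period-separated reactant string into a multiset of components."""
    # Counter keeps duplicates, so two equivalents of one reagent stay distinct
    # from a single equivalent.
    return Counter(part for part in smiles.strip().split(".") if part)


def same_reactant_set(predicted: str, reference: str) -> bool:
    """Return True when both strings list the same components in any order."""
    return split_reactants(predicted) == split_reactants(reference)


# Hypothetical usage: Boc2O plus an illustrative amine, written in two orders.
BOC2O = "CC(C)(C)OC(=O)OC(=O)OC(C)(C)C"
AMINE = "NCc1ccccc1"
print(same_reactant_set(f"{AMINE}.{BOC2O}", f"{BOC2O}.{AMINE}"))
```

Because salts written with an appended component (for example '.Cl') are also split on the period, a missing counterion shows up as a mismatch rather than being silently ignored.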

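The ring-closure advice in the strategies above (every ring label must be opened and closed) can be spot-checked without a toolkit. This is a rough sketch, assuming only standard SMILES syntax: single-digit and `%nn` ring labels, with digits inside square brackets treated as isotopes, hydrogen counts, or charges. The function name `unclosed_ring_labels` is hypothetical. It checks label pairing only and says nothing about valence, aromaticity, or whether the structure is the intended one.

```python
def unclosed_ring_labels(smiles: str) -> set:
    """Return the ring-closure labels that are opened but never closed."""
    open_labels = set()
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip bracket atoms: digits inside are not ring closures.
            end = smiles.find("]", i)
            if end == -1:
                raise ValueError("unterminated bracket atom")
            i = end + 1
            continue
        if ch == "%":
            # Two-digit ring label written as %nn.
            label = int(smiles[i + 1:i + 3])
            i += 3
        elif ch in "0123456789":
            label = int(ch)
            i += 1
        else:
            i += 1
            continue
        # First occurrence opens the ring, second occurrence closes it.
        open_labels ^= {label}
    return open_labels


# Usage with a SMILES string quoted in the strategies list.
print(unclosed_ring_labels("Nc1nc(Cl)c2nnn(Cc3cccc(C4(O)CCC4)n3)c2n1"))
```

An empty result means every label is paired; a non-empty result points at the digit to inspect first.
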
## COMMON MISTAKES TO AVOID

[err-00002] helpful=0 harmful=0 :: Avoid using tert-butyl chloroformate (Boc-Cl) for Boc protection when other nucleophilic groups (like aromatic amines) are present. Boc-Cl is more reactive and can lead to undesired side products. The standard and safer choice is di-tert-butyl dicarbonate (Boc2O).
[err-00006] helpful=3 harmful=5 :: Avoid performing premature protecting group removal (e.g., Boc deprotection) as the primary retrosynthetic step when a more strategic bond disconnection (like an amide formed by acylation) is available. This can lead to an incorrect and less synthetically efficient precursor.
[err-00010] helpful=0 harmful=4 :: Avoid assuming O-alkylation over N-alkylation when disconnecting ether linkages in heterocyclic systems. When a heterocycle contains both nitrogen and oxygen nucleophilic sites, incorrectly prioritizing oxygen alkylation can lead to invalid precursors. Always assess relative nucleophilicity, considering that nitrogen in heterocycles like pyrazoles is often the preferred nucleophilic site.
[err-00011] helpful=0 harmful=0 :: Avoid misclassifying amide types (primary vs. secondary) during functional group interconversion (FGI). Secondary amides (where the nitrogen bears one carbon substituent) are typically formed from esters and amines (aminolysis), not from nitrile hydrolysis, which yields primary amides. Carefully parse SMILES connectivity to verify whether the amide nitrogen carries only hydrogens (primary, -C(=O)NH2) or also a carbon substituent (secondary).
[err-00013] helpful=0 harmful=2 :: Avoid misidentifying the unprotected amine precursor in Boc protection. Do not assume removal of the Boc group yields a tertiary amine; Boc is used for primary/secondary amines. For heterocycles, ensure the unprotected nitrogen is correctly hybridized (e.g., secondary amine NH, not tertiary N).
[err-00014] helpful=1 harmful=0 :: Avoid providing incorrect SMILES for common reagents like di-tert-butyl dicarbonate (Boc2O). The correct SMILES is CC(C)(C)OC(=O)OC(=O)OC(C)(C)C (with two carbonyl groups), not CC(C)(C)OC(=O)OC(C)(C)C. Always double-check reagent structures to prevent invalid precursors.
[err-00018] helpful=1 harmful=0 :: Avoid misapplying the Arbuzov reaction (phosphite + alkyl halide) to synthesize α-halophosphonates. This reaction creates phosphonate esters but does not introduce halogens at the α-carbon. α-Halophosphonates are synthesized via electrophilic halogenation (e.g., with NBS) of the corresponding phosphonate carbanion.
[err-00020] helpful=0 harmful=1 :: Avoid focusing exclusively on obvious functional groups (like alcohols or carbonyls) when identifying reduction sites. Overlooking alkene-to-alkane reductions is a common error. Always check for saturated carbon chains in the target that could result from hydrogenation of alkenes in the precursor.
[err-00025] helpful=2 harmful=4 :: Avoid prioritizing standard bond disconnections (e.g., amide, ether) over Functional Group Addition (FGA) when the target molecule contains obvious 'added' functional groups like halogens (CBr, CCl), nitro groups (NO2), or hydroxy groups (OH). These groups often indicate a recent synthetic addition step (bromination, nitration, oxidation) that should be retrosynthetically removed first.
[err-00029] helpful=1 harmful=3 :: Avoid assuming a deprotection reaction (like Boc removal) is applicable without first confirming the presence of the specific protecting group in the target molecule. Misidentifying amides (O=C(NR)) as carbamates (O=C(OC(C)(C)C)NR) is a common error that leads to invalid precursor predictions.
[err-00031] helpful=0 harmful=4 :: Avoid using carboxylic acids as precursors when the reaction context specifies 'Acylation.' Acylation implies the use of activated acylating agents like acid chlorides or anhydrides. Using carboxylic acids instead can lead to incorrect precursor predictions, as direct esterification may not be feasible or efficient under typical acylation conditions.
[err-00033] helpful=0 harmful=2 :: Avoid assuming N-aryl bond formation is the primary disconnection site in heteroatom alkylation/arylation contexts. When a molecule contains both tertiary alkylamines and N-aryl groups, incorrectly prioritizing arylation over alkylation can lead to invalid precursors. Always assess which nitrogen site is most likely formed by alkylation (typically tertiary amines from secondary amine nucleophiles).
[err-00037] helpful=0 harmful=1 :: Avoid making arbitrary choices between valid ester precursors (e.g., acetate vs. benzoate) when reducing to phenolic alcohols. In phenolic systems, incorrectly selecting acetate esters over the more standard benzoate esters can lead to precursor mismatches. Always consider synthetic conventions and stability factors when multiple ester types are chemically possible.
[err-00039] helpful=1 harmful=2 :: Avoid representing unprotected amino acid precursors as neutral amines (N) when performing Boc deprotection retrosynthesis. The correct precursor should typically be a zwitterion with a protonated ammonium group ([NH3+]), especially when the carboxylic acid is also present or esterified.
[err-00041] helpful=0 harmful=0 :: Avoid using neutral forms of reagents when salt forms are standard practice for specific reactions. For Fischer indole synthesis, phenylhydrazine is typically used as the hydrochloride salt (.Cl in SMILES) to provide the acidic conditions required for the reaction and prevent oxidation. Using the neutral form can lead to incorrect precursor predictions.
[err-00044] helpful=0 harmful=0 :: Avoid prioritizing phenolic OH modifications (e.g., ester reduction) over lactam hydrolysis when both functional groups are present. Lactam hydrolysis to amino acids is a standard and synthetically fundamental FGI that should take precedence over secondary functional group transformations in retrosynthetic planning.
[err-00047] helpful=0 harmful=1 :: Avoid defaulting to Fischer indole synthesis for all nitrogen-containing heterocycles. Fischer indole synthesis specifically requires an arylhydrazine (e.g., phenylhydrazine) and an enolizable ketone or aldehyde to form indoles. Misapplying it to pyrroles that should be formed via Paal-Knorr synthesis (using 1,4-dicarbonyls and primary amines) leads to incorrect precursor predictions.
[err-00050] helpful=0 harmful=1 :: Avoid assuming ketalization (e.g., with catechol and acetone) for 1,3-dioxolane ring formation when the aromatic ring has substituents like methoxy or benzyloxy groups. These are not protections but permanent substituents that must be retained in the precursors. Incorrectly simplifying to catechol leads to invalid disconnections; instead, consider alternative mechanisms like epoxidation and ring closure.
[err-00053] helpful=0 harmful=1 :: Avoid overlooking azide-amine interconversions as potential FGI steps when analyzing amines in retrosynthesis. Do not default to protecting group removal (e.g., Boc deprotection) when the amine could be synthesized via azide reduction. Always check if converting the amine to an azide (N=[N+]=[N-]) is a more appropriate retrosynthetic step, especially when other protecting groups are present and should be preserved in the precursor.
[err-00056] helpful=0 harmful=6 :: Avoid overcomplicating alkyl halide precursors in heteroatom alkylation by assuming iodine is necessary as a leaving group. Chlorine is often sufficient for alkylation reactions, especially when the nucleophile is a good nucleophile such as an amine. Unnecessarily substituting chlorine with iodine in the alkyl halide precursor can lead to incorrect SMILES and synthetic inefficiency.
[err-00059] helpful=0 harmful=1 :: Avoid disconnecting the C-C bond between an alkyne and an aliphatic chain (like a cyclohexanol) when the alkyne is also attached to an aromatic ring. Instead, prioritize disconnection at the alkyne-aryl bond if a leaving group (e.g., Br) is plausible on the aromatic ring, indicating a Sonogashira coupling. Incorrectly choosing the aliphatic disconnection overlooks a standard cross-coupling strategy.
[err-00060] helpful=0 harmful=0 :: Avoid misclassifying tertiary alcohols as ketones in retrosynthetic analysis. A tertiary alcohol has a carbon atom bonded to an OH group and three carbon atoms, while a ketone has a carbonyl carbon bonded to two carbon atoms. Addition of a terminal alkyne (as its acetylide) to a ketone does produce a tertiary alcohol, so in that disconnection the ketone belongs to the precursor and the tertiary alcohol to the target. Confusing these functional groups leads to flawed precursor proposals.
[err-00062] helpful=0 harmful=3 :: Avoid misprioritizing functional group addition disconnections based solely on common transformation patterns without considering functional group reactivity and synthetic timing. Do not automatically disconnect azide groups before chloromethyl groups; chloromethylation typically represents a later synthetic step than azide installation. Evaluate which functionality is more reactive and less stable - these are usually introduced later in synthetic sequences.
[err-00064] helpful=0 harmful=3 :: Avoid defaulting to Sonogashira coupling for alkyne-aryl bond disconnection when a biaryl bond is present and one aromatic system contains a halogen leaving group. Incorrectly prioritizing Sonogashira over Suzuki coupling can lead to synthetically inefficient precursors, especially for complex molecules where Suzuki is more suitable for biaryl bond formation.
[err-00068] helpful=0 harmful=0 :: Avoid searching for protecting groups in the target molecule when analyzing deprotection retrosynthesis. The target shows deprotected functional groups; the precursor contains the protected versions. Misapplying standard protecting group verification (e.g., looking for Boc in target) leads to incorrect conclusions about whether deprotection is needed.
[err-00072] helpful=0 harmful=0 :: Avoid misprioritizing acetal/ketal disconnection over Wittig reaction when both dioxolane rings and vinyl groups are present. Dioxolane rings often serve as protecting groups that remain intact during C-C bond formation steps like Wittig reactions. Incorrectly disconnecting the dioxolane instead of the vinyl-aromatic bond leads to invalid precursor predictions.
[err-00076] helpful=0 harmful=0 :: Avoid assuming all tert-butyl groups (CC(C)(C)) attached to nitrogen are Boc protecting groups. In tetrazole systems, tert-butyl is commonly a permanent substituent, not a protecting group. Misidentifying permanent substituents as protecting groups leads to incorrect deprotection retrosynthetic steps and invalid precursors.
[err-00077] helpful=0 harmful=0 :: Avoid overlooking carboxylic acid groups (C(=O)O) as sites for protection/deprotection when analyzing molecules with multiple functional groups. Carboxylic acids are commonly protected as methyl esters (COC(=O)), which are usually cleaved by hydrolysis (most often saponification); this deprotection should be prioritized over analyzing potential protecting groups on other functionalities like tetrazole substituents.
[err-00079] helpful=1 harmful=0 :: Avoid misreading the substitution of the alpha carbon in carboxylic acid SMILES notation. In CC(C(=O)O) the carbon alpha to the carbonyl carries a single methyl branch; a quaternary alpha carbon needs two further carbon branches, as in CC(C)(C(=O)O)C (2,2-dimethylpropanoic acid). Misreading the branching leads to incorrect ester protecting group assumptions, such as confusing tert-butyl ester deprotection with ethyl/methyl ester deprotection. Always verify the exact connectivity around the carboxylic acid carbon before selecting the protecting group type.
[err-00090] helpful=1 harmful=5 :: Avoid automatically reducing all oxidized functional groups in oxidation retrosynthesis. Do not assume sulfones, nitro groups, or other highly oxidized functionalities must be reduced simply because they appear oxidized. Only reduce groups that were specifically formed in the oxidation reaction described. Misidentifying pre-existing oxidized groups as oxidation products leads to incorrect precursors with unnecessary functional group modifications.
[err-00092] helpful=2 harmful=7 :: Avoid including oxidizing or reducing agents as 'precursor reactants' in retrosynthesis predictions. While these reagents are essential in forward synthesis, retrosynthesis focuses on identifying the organic precursor molecules. Including reagents like Dess-Martin periodinane or sodium borohydride in the output reflects confusion between retrosynthetic analysis (structural precursors) and forward synthesis conditions (reagents).
[err-00094] helpful=0 harmful=1 :: Avoid using acid chlorides or other highly reactive acylating agents for beta-keto systems in retrosynthesis. Beta-keto acids are unstable and typically handled as protected esters (especially tert-butyl esters) in synthetic practice. Incorrectly choosing acid chlorides leads to invalid precursors that would decompose under synthetic conditions.
[err-00097] helpful=0 harmful=0 :: Avoid assuming all acylation reactions require acid chlorides as precursors. Overlooking carboxylic acids as valid acylating agents when coupling conditions are feasible leads to incorrect precursor predictions. Always consider the reaction context and whether direct carboxylic acid activation would be synthetically reasonable.
[err-00098] helpful=0 harmful=0 :: Avoid misinterpreting fused ring systems by assuming standard heterocyclic structures without verifying atom connectivity. Complex SMILES patterns like 'CC(=O)N' within rings may indicate saturated components (e.g., CH2 groups) rather than direct ring fusions. Always validate proposed precursor structures against the target's exact atom connectivity.
[err-00100] helpful=0 harmful=0 :: Avoid misidentifying aromatic nitrogen atoms in heterocycles (represented by 'n' in SMILES) as amine functional groups during reduction retrosynthesis. This error leads to incorrect application of nitro group reduction when the actual transformation was ester-to-alcohol reduction. Always parse SMILES carefully to distinguish between aromatic ring components and modifiable functional groups.
[err-00103] helpful=0 harmful=0 :: Avoid defaulting to phenolic oxygen as the nucleophile in ether disconnections when multiple oxygen nucleophiles are present. Phenolic OH groups are less nucleophilic than aliphatic or heterocyclic OH groups, especially when sterically hindered or part of complex aromatic systems with electron-donating substituents. Incorrectly prioritizing phenolic oxygen alkylation over heterocyclic oxygen alkylation leads to invalid precursor predictions.
[err-00107] helpful=0 harmful=0 :: Avoid generating SMILES for Boc deprotection precursors by mechanically removing the Boc group (CC(C)(C)OC(=O)) from the target SMILES string. This approach often fails for fused heterocycles due to hydrogen specification issues and ring numbering complexities. Instead, derive the precursor structure chemically (ensuring the nitrogen is secondary or primary with explicit hydrogens) and generate the SMILES from scratch, validating with a chemical toolkit if possible.
[err-00110] helpful=0 harmful=1 :: Avoid placing complex substituents on the azide component when disconnecting tetrazoles formed via [3+2] cycloaddition. The azide should be the simple ion ([N-]=[N+]=[N-]), while the nitrile must contain all carbon chains and functional groups that will become the R group on the tetrazole ring carbon (C5). Misassigning the complex structure to the azide instead of the nitrile leads to fundamentally incorrect precursors.
[err-00113] helpful=0 harmful=0 :: Avoid generating SMILES for phosphonate precursors with incorrect atom connectivity where phosphorus is directly bonded to alkene carbons instead of the proper carbon chain. This error commonly occurs when removing functional groups (like halogens) from complex alkenylphosphonates without preserving the underlying vinylphosphonate structure.
[err-00119] helpful=0 harmful=0 :: Avoid assuming lactam or amide nitrogens can be Boc-protected. Lactam nitrogens (in cyclic amides) are part of an amide bond and cannot undergo Boc protection, which is only applicable to free amines. Misidentifying the protection site leads to incorrect precursors where the lactam structure is incorrectly modified instead of protecting a separate amine functionality.
[err-00122] helpful=0 harmful=0 :: Avoid assuming methyl protection (OCH3) for phenolic OH groups without considering alternative protecting groups like benzyl (OBn). The ground truth precursor may use benzyl protection (OCc1ccccc1 pattern in SMILES) instead of methyl. Always verify the specific protecting group used in the context rather than defaulting to the most common option.
[err-00125] helpful=0 harmful=0 :: Avoid assuming primary alcohol precursors for esters derived from secondary alcohols. Misidentifying a methine carbon (CH) as methylene (CH2) leads to incorrect alcohol precursors with extra carbon atoms. Always verify the substitution (number of attached hydrogens and carbons) of the carbon bonded to the ester oxygen by carefully parsing SMILES connectivity.
[err-00127] helpful=0 harmful=5 :: Avoid overlooking chlorine as a valid leaving group in heteroatom alkylation retrosynthesis. Chlorine is often sufficient for nucleophilic substitution with strong nucleophiles like amines, especially in activated systems (e.g., benzylic positions or electron-deficient heterocycles). Unnecessarily substituting chlorine with bromine or iodine in the electrophilic precursor can lead to incorrect SMILES and synthetic inefficiency.
[err-00130] helpful=1 harmful=0 :: Avoid generating structurally invalid alkyl halide precursors when disconnecting Wittig reactions. The phosphonium salt component must preserve the exact atom connectivity of the target molecule's vinyl substituent. Incorrectly splitting the phosphonium salt into separate alkyl halide and phosphine components leads to chemically invalid precursors that don't match the target structure.
[err-00132] helpful=0 harmful=1 :: Avoid disconnecting the wrong C-C bond in Sonogashira coupling retrosynthesis. Do not disconnect the bond between the alkyne and phenyl group when the target contains both pyridine and phenyl rings attached to an alkyne. The correct disconnection is between the heteroaromatic ring (e.g., pyridine) and the alkyne, not between the alkyne and simpler aromatic ring (e.g., phenyl).
[err-00134] helpful=0 harmful=0 :: Avoid assuming reduction targets ether linkages when phenolic OH groups are present in the target molecule. Phenolic OH often originates from reduction of protective ester groups (e.g., benzoates), not from ether reduction. Overlooking phenolic ester protection leads to incorrect identification of reduction sites and invalid precursors.
[err-00135] helpful=0 harmful=0 :: Avoid predicting free amine (NH2) precursors when reversing Boc protection of amino acids. Under typical acidic deprotection conditions (TFA, HCl), the resulting amine is protonated to form an ammonium salt ([NH3+]). This applies specifically to amino acid systems where the carboxylic acid group may also influence protonation state.
[err-00137] helpful=0 harmful=0 :: Avoid defaulting to Paal-Knorr pyrrole synthesis when the target contains a fused heterocyclic system with a hydrazine-derived nitrogen bridge (N connecting aromatic and heterocyclic rings). Such systems typically require Fischer indole-type synthesis using phenylhydrazine and carbonyl precursors. Misapplying Paal-Knorr leads to incorrect precursor predictions that cannot form the required fused ring system.
[err-00139] helpful=0 harmful=0 :: Avoid misapplying nucleophilic aromatic substitution (SNAr) principles to ether formation when standard SN2 alkylation is appropriate. Do not assume aromatic halides (like chlorides on phenol rings) are electrophilic sites for ether formation with alkyl alcohols - instead, the phenolic oxygen should act as nucleophile attacking an alkyl halide. SNAr requires strong electron-withdrawing groups ortho/para to the leaving group, which are typically absent in simple chlorophenols.
[err-00141] helpful=0 harmful=0 :: Avoid breaking carbon-heteroatom bonds (e.g., C-N in amides) during Functional Group Interconversion (FGI). FGI should only transform functional groups (e.g., carbonyl to alcohol via reduction) without changing the carbon skeleton. Applying retrosynthetic disconnections like hydrolysis in FGI contexts leads to invalid precursors.
[err-00003] helpful=0 harmful=1 :: Avoid proposing unsubstituted alkenes (C=C) as precursors for heterocycles containing quaternary carbons. Quaternary centers require trisubstituted alkene precursors with specific substituents (e.g., methyl group C=C(C)) to generate the quaternary carbon during ring-forming reactions like epoxidation followed by nucleophilic attack. Overlooking this substitution requirement leads to invalid precursors that cannot form the target structure.
[err-00006] helpful=1 harmful=5 :: Avoid assuming that prominent protecting groups (like Boc) are necessarily the site of FGI transformation. Stable protecting groups often remain intact during other FGIs, and the actual transformation may involve a different functional group (e.g., azide reduction to amine). Always verify which specific transformation is indicated by comparing target and precursor structures.
[err-00009] helpful=0 harmful=0 :: Avoid generating SMILES for heterocyclic precursors by mechanically modifying functional groups without preserving the exact ring atom ordering from the target. Incorrect atom rearrangement (e.g., changing the sequence of 'c' and 'n' atoms in aromatic rings) alters the substitution pattern and creates invalid precursors.
[err-00018] helpful=0 harmful=0 :: Avoid defaulting to vinyl-aromatic disconnection via Wittig reaction when heterocyclic rings with alkyl substituents are present. The alkyl substituent on the heterocycle often represents the actual site of C-C bond formation, and incorrectly prioritizing the vinyl group leads to invalid precursors.
[err-00003] helpful=0 harmful=1 :: Avoid misinterpreting CC(C(=O)O) SMILES patterns as representing quaternary carbons requiring tert-butyl ester protection. This notation specifically indicates a carboxylic acid attached to a carbon with one methyl substituent (secondary carbon), which typically uses methyl or ethyl ester protection, not tert-butyl protection.
[err-00005] helpful=0 harmful=0 :: Avoid using carboxylic acids as acylating agents in complex molecular contexts with stereocenters and sensitive protecting groups (e.g., Boc). While technically possible with coupling agents, this choice increases racemization risk and side reactions. Prefer anhydride acylating agents which offer superior control and compatibility with delicate functional groups.
[err-00008] helpful=0 harmful=1 :: Avoid assuming methyl ester protection (COC(=O)) for carboxylic acid deprotection without considering ethyl ester (CCOC(=O)) as an equally valid possibility. Ethyl esters are commonly used in synthetic practice and misidentifying the ester type leads to incorrect precursor SMILES with wrong carbon counts in the protecting group.
[err-00013] helpful=0 harmful=0 :: Avoid fixating on sulfone or nitro groups as oxidation sites without verifying all functional groups. Sulfones and nitro groups are frequently stable and not transformed in oxidation steps; misidentifying them leads to incorrect precursors. Instead, check for ketones (which often come from alcohol oxidation) and use protecting groups (e.g., dioxolanes) as indicators of recent ketone formation.
[err-00003] helpful=0 harmful=1 :: Avoid using ethyl esters (OCC) or methyl esters (COC) for beta-keto ester precursors in beta-ketoamide disconnections. These esters are less stable than tert-butyl esters and can lead to decomposition issues during handling or synthesis. Always default to tert-butyl ester protection for beta-keto systems.
[err-00005] helpful=0 harmful=0 :: Avoid treating complex fused heterocycles like benzimidazolones as pre-formed units in retrosynthesis. Do not disconnect the amide bond connecting the heterocycle to other molecular fragments when the heterocycle itself should be formed via cyclization from simpler precursors. This error leads to incorrect precursor predictions that don't align with standard synthetic approaches for heterocycle formation.
[err-00008] helpful=0 harmful=1 :: Avoid prioritizing N-alkylation disconnections for stable heterocyclic substituents (e.g., tert-butyl groups on pyrazole nitrogen) over O-alkylation for benzyl ether formation. Stable groups like tert-butyl on nitrogen are typically permanent substituents, not formed via alkylation in the same step as ether linkages. Incorrect prioritization leads to invalid precursors that don't align with synthetic timing and stability considerations.
[err-00010] helpful=0 harmful=1 :: Avoid converting entire heteroatom-containing groups (like N-alkyl chains) to esters when disconnecting amides attached to heteroatoms. The heteroatom group represents the nucleophile precursor and should be preserved intact; only the carbonyl component should be modified to an ester. Misapplying esterification to the nucleophile component instead of the electrophile component leads to fundamentally incorrect precursors.
[err-00015] helpful=0 harmful=0 :: Avoid misinterpreting vinylphosphonate SMILES patterns as having a saturated carbon between phosphorus and the alkene. In 'COP(=O)(/C=C/CBr)OC', the '/C=C/CBr' branch places the alkene carbon directly on phosphorus and ends in a bromomethyl group (P-C=C-C-Br); it is not a chain with a saturated carbon next to phosphorus (P-C-C=C-Br). This error leads to incorrect precursor generation where the phosphorus connectivity is fundamentally wrong.
[err-00017] helpful=0 harmful=0 :: Avoid assuming that protected alcohols (like silyl ethers) must be reduced in the precursor when analyzing reduction reactions. Silyl protecting groups are often stable under hydrogenation conditions and may remain intact. Instead, systematically check for saturated carbon chains that could originate from alkene reduction, particularly in ester side chains where α,β-unsaturated systems are common.
[err-00020] helpful=0 harmful=0 :: Avoid placing ring closure digits after functional group notations (like 'C1=O') in fused ring systems. This creates malformed SMILES with incorrect atom connectivity. Instead, place the digit immediately after the atom involved in ring closure (e.g., use 'cc21' not 'cc2C1=O' for lactam fusion to aromatic rings).
[err-00022] helpful=0 harmful=0 :: Avoid assuming reasoning is flawed when the predicted answer matches the ground truth and follows standard chemical principles (e.g., amine-to-nitro conversion for reduction retrosynthesis). Over-correcting correct predictions can introduce errors where none exist.
[err-00024] helpful=0 harmful=0 :: Avoid assuming methyl ether protection (OCH3) for phenolic OH groups without verifying the specific protection pattern in the SMILES string. The pattern 'OCc1ccccc1' indicates benzyl ether protection, which requires different deprotection conditions (hydrogenolysis) than methyl ethers (strong acid or Lewis acid). Misidentifying benzyl as methyl protection leads to incorrect precursor predictions.
[err-00027] helpful=0 harmful=0 :: Avoid disconnecting N-aryl bonds via nucleophilic substitution when the target contains amide groups that could be disconnected via acylation. Incorrectly prioritizing N-arylation over amide bond formation leads to invalid precursors, especially when the tertiary amine nitrogen is part of an N-alkyl group (from alkylation) rather than an N-aryl group.
[err-00030] helpful=0 harmful=0 :: Avoid reversing the roles of fragments in Wittig retrosynthesis: the aldehyde precursor should contain the fragment that becomes the 'R group' of the vinyl substituent, while the phosphonium salt precursor should contain the fragment attached to the vinyl carbon. Incorrectly assigning these roles leads to chemically invalid precursors that don't match the target structure.
[err-00032] helpful=0 harmful=0 :: Avoid defaulting to chlorine as the leaving group in Sonogashira coupling retrosynthesis when iodine or bromine is synthetically preferred. Chlorine has significantly lower reactivity in palladium-catalyzed couplings, especially in electron-rich or heteroaromatic systems like pyridines. Incorrect halogen selection leads to invalid precursors that would not undergo efficient coupling under standard conditions.
[err-00034] helpful=0 harmful=0 :: Avoid focusing reduction retrosynthesis on the wrong functional groups without systematic analysis. Do not modify alkyl groups (like converting ethyl to vinyl) when phenolic alcohols are present and could be reduction products from ester precursors. Always prioritize identifying groups that are typical reduction products (alcohols from carbonyls, amines from nitro/azide, alkanes from alkenes) and apply transformations accordingly.
[err-00036] helpful=0 harmful=0 :: Avoid misinterpreting 'Protections' reaction type as deprotection. When the context specifies 'Protections', it means adding protecting groups, not removing them. For Boc protection retrosynthesis, do not output precursors with removed Boc groups; instead, output the molecule with the unprotected functional group (free amine) and the protecting agent (Boc2O).
[err-00040] helpful=0 harmful=0 :: Avoid selecting unstable ketones like pyrrolidin-3-one (O=C1CCNC1) for Fischer indole synthesis. These compounds tautomerize to enol forms and are not practical synthetic reagents. Instead, use stable, commercially available ketones like piperidin-4-one (O=C1CCNCC1) for nitrogen-containing fused rings or cyclohexanone for carbon-only fused rings.
[err-00041] helpful=0 harmful=0 :: Avoid misparsing SMILES ring closure notation when determining fused ring sizes. Patterns like 'CCNC3' with proper closure form 6-atom rings (including the nitrogen), not 5-atom rings. Always count all atoms between ring closure points, considering that each non-hydrogen atom contributes to the ring size regardless of its element type.
[err-00044] helpful=0 harmful=0 :: Avoid defaulting to lactam reduction (carbonyl to methylene) when analyzing lactam targets in FGI retrosynthesis. Incorrectly applying reduction transformations when formation pathways (e.g., hemiaminal dehydration) are more appropriate leads to invalid precursors. Always evaluate both reduction and formation possibilities for lactam functional groups, considering that lactams can be synthesized through condensation reactions rather than reduced from pre-existing lactams.
[err-00046] helpful=0 harmful=0 :: Avoid using acetoacetate derivatives (CH3CO-CH2-CO-), which are 1,3-dicarbonyls, for Paal-Knorr pyrrole synthesis when the target has a methyl group at pyrrole position 2. This incorrectly places COOR at position 3 instead of the methyl group. Use 1,4-dicarbonyl derivatives (CH3CO-CH2-CH2-CO-) to correctly position methyl at position 2 and the substituent at position 3.
[err-00049] helpful=0 harmful=0 :: Avoid using styrene-like alkenes (with the double bond directly attached to an aromatic ring) as precursors for 1,3-dioxolane rings containing quaternary carbons. These cannot form the quaternary center correctly. Instead, the alkene must be an allyl ether (C=C(C)CO-aryl) where the double bond is part of the chain, allowing epoxidation and subsequent cyclization with a ketone to generate the quaternary carbon.
[err-00054] helpful=0 harmful=0 :: Avoid excluding oxidizing agents from retrosynthesis outputs for oxidation reactions. Despite being termed 'reagents' in forward synthesis, they are essential precursor reactants in retrosynthesis as they are stoichiometrically consumed. Failing to include them (e.g., omitting mCPBA in sulfide to sulfoxide oxidation) results in incomplete and incorrect precursors.
[err-00056] helpful=0 harmful=4 :: Avoid using vinyl halides (e.g., ClC=CCl2) as electrophiles in heteroatom alkylation reactions. Vinyl halides are poor electrophiles for SN2 due to the halide being a poor leaving group on sp2 carbon and the double bond character preventing proper orbital overlap for backside attack. Always prefer alkyl halides with leaving groups on sp3 carbons for reliable nucleophilic substitution.
[err-00059] helpful=0 harmful=0 :: Avoid disconnecting the wrong alkyne-aryl bond in Sonogashira coupling when multiple alkynes are attached to the same aromatic ring. Do not disconnect the bond to the simpler alkyne fragment (e.g., cyanoalkyne N#C-) to make it the terminal alkyne; instead, disconnect the bond to the simpler fragment to make it the aryl halide precursor. The more complex alkyne fragment should be the terminal alkyne component for optimal synthetic efficiency.
[err-00062] helpful=0 harmful=1 :: Avoid applying general chemical preferences (e.g., bromine over chlorine in Suzuki coupling) when ground truth data specifies a particular pattern. Overriding training data patterns with general knowledge leads to incorrect predictions even when the reasoning is chemically sound.
[err-00069] helpful=1 harmful=0 :: Avoid assuming that tert-butyl groups on tetrazole nitrogen are permanent substituents. In tetrazole chemistry, tert-butyl is commonly used as a protecting group for the acidic NH proton and should be considered removable in deprotection retrosynthesis. Misclassifying tetrazole tert-butyl groups as permanent leads to incorrect identification of the deprotection site.
[err-00072] helpful=0 harmful=0 :: Avoid adding reagents or performing additional functional group modifications in deprotection retrosynthesis. The precursor should only be the protected molecule; do not add protecting agents (e.g., Boc2O) or modify other groups (e.g., adding ester protection to carboxylic acids) when the target is the deprotected form.
[err-00074] helpful=0 harmful=0 :: Avoid focusing exclusively on obvious oxidation/reduction transformations (like alcohol to ketone) when ether linkages are present in the target molecule. Overlooking that ethers can represent acetal protecting groups that need to be disconnected back to carbonyl and alcohol precursors is a common error that leads to incorrect FGI predictions. Always verify if COC patterns might indicate acetal chemistry requiring reverse transformation to COC(O) hemiacetal forms.
[err-00081] helpful=0 harmful=0 :: Avoid disconnecting beta-ketoamides by isolating the beta-keto ester portion from complex aromatic substituents attached to the beta-carbon. This error breaks critical carbon-carbon bonds and fails to preserve conjugated systems, leading to incorrect precursors that cannot form the target molecule. Always ensure the beta-keto acid derivative synthon includes the entire aromatic or heteroaromatic system attached to the beta-carbon.
[err-00084] helpful=0 harmful=0 :: Avoid assuming ester reduction for primary alcohols without verifying carbon chain compatibility. Misidentifying carboxylic acid reduction as ester reduction is common when the alcohol has a short chain (e.g., -CCO from ethanolamine-like structures). Always check that the precursor ester would have the same carbon count as the alcohol chain; if not, prefer carboxylic acid or aldehyde precursors that preserve the carbon skeleton.
[err-00086] helpful=0 harmful=0 :: Avoid assuming alkylated heteroatoms (e.g., tertiary amines, N-alkyl pyrazoles) can act as nucleophiles in ether disconnection via heteroatom alkylation. A fully alkylated heteroatom has no N-H or O-H to replace, so alkylating it would give a charged (quaternized) product rather than the neutral target. Instead, identify free heteroatoms (phenolic OH, secondary amines) as the correct nucleophilic sites for disconnection.
[err-00090] helpful=0 harmful=3 :: Avoid misidentifying the functional group interconversion target by fixating on common transformations (e.g., lactam formation via hemiaminal dehydration) without verifying against the specific molecular context. Incorrectly assuming a lactam needs formation when it is already present in the precursor leads to invalid retrosynthetic steps. Always check if highly oxidized groups (like lactams) are pre-existing before proposing their formation.
[err-00093] helpful=0 harmful=0 :: Avoid assuming reasoning is flawed when the predicted answer matches ground truth and follows standard chemical principles. Over-correcting correct predictions can introduce errors where none exist, particularly when the instruction context misleadingly suggests an error should be present.
[err-00100] helpful=0 harmful=0 :: Avoid placing ring closure digits after functional group notations (like 'C1=O') in fused ring systems. This creates malformed SMILES with incorrect atom connectivity. Instead, place the digit immediately after the atom involved in ring closure (e.g., use 'cc21' not 'cc2C1=O' for lactam fusion to aromatic rings). The ring closure should connect only the main chain atoms, and functional groups like carbonyl (=O) should not be included in the ring closure specification.
[err-00104] helpful=0 harmful=0 :: Avoid assuming amine deprotection (e.g., Boc removal) when phenolic OH groups are present in the target molecule. The phenolic OH may be the actual deprotected functionality, with O-benzyl protection being common. Always check for benzyl ether patterns ('OCc1ccccc1') in potential precursors and verify that the phenolic oxygen in the target corresponds to a protected form in the precursor before considering amine deprotection pathways.
[err-00107] helpful=0 harmful=0 :: Avoid assuming the electrophile needs modification (e.g., adding bromine) when the target already contains a leaving group (e.g., chlorine) on the carbon attached to the heteroatom. This error often occurs when misidentifying which nitrogen was alkylated. Always check the target for existing leaving groups before proposing precursor structures.
[err-00111] helpful=0 harmful=0 :: Avoid interpreting 'precursor reactants' in protection reactions as only the deprotected substrate. For protection reactions (e.g., adding Boc groups), the protecting reagent (like Boc2O) is also a necessary precursor reactant that must be included in the output alongside the unprotected substrate.
[err-00114] helpful=0 harmful=0 :: Avoid assigning anionic functional groups (e.g., sulfonates, carboxylates) to the electrophilic fragment when disconnecting ether bonds via heteroatom alkylation. These groups are stable anions that should remain with the nucleophile fragment (as the alkoxide or phenoxide salt) to ensure chemical feasibility in the forward SN2 reaction. Placing anionic groups on the electrophile creates poor leaving groups and incorrect precursor assignments.
[err-00117] helpful=0 harmful=0 :: Avoid applying Functional Group Addition (FGA) strategies to Functional Group Interconversion (FGI) problems. Do not disconnect or modify substituents like aromatic bromine in FGI contexts—these are part of the core structure. FGI requires preserving the carbon skeleton while only transforming functional groups (e.g., converting lactams to amino alcohols via reduction or formation pathways).
[err-00118] helpful=0 harmful=0 :: Avoid misidentifying the primary functional group for interconversion in complex molecules. When lactams are present, they should be prioritized for FGI over peripheral substituents like halogens. Lactam interconversion (e.g., via oxidation of cyclic amino alcohols) is often the correct FGI pathway, while halogen groups typically remain unchanged in the carbon skeleton.
[err-00120] helpful=0 harmful=0 :: Avoid using epoxidation strategies for 1,3-dioxolane formation when a quaternary carbon is present in the ring. Epoxidation of styrene derivatives cannot generate quaternary centers. Instead, use Prins-type cyclization with trisubstituted allyl alcohols (isopropenyl derivatives) and carbonyl compounds from peracids. Incorrectly applying epoxidation leads to invalid precursors that cannot form the target structure.
[err-00122] helpful=0 harmful=0 :: Avoid fixating on amide bond disconnection as the primary FGI transformation when the target contains primary amines that could originate from azide reduction. Overlooking amine-to-azide conversion leads to incorrect precursors, especially when stable protecting groups (like Boc) are present and should be preserved. Always evaluate all functional groups systematically, and consider azide reduction as a common and synthetically relevant pathway for amine introduction.
[err-00125] helpful=0 harmful=0 :: Avoid substituting halogen leaving groups in electrophilic precursors when the target molecule already contains a viable leaving group (e.g., chlorine). Chlorine is sufficient for nucleophilic substitution with strong nucleophiles like amines, especially in activated systems. Unnecessarily replacing chlorine with bromine or iodine leads to incorrect SMILES and synthetic inefficiency when the original halogen is functionally adequate and matches the target structure.
[err-00127] helpful=0 harmful=2 :: Avoid assuming common substitution patterns (e.g., para-substitution) in aromatic systems without precise SMILES analysis. When disconnecting bonds to aromatic rings, carefully count atom positions from the attachment point to determine the correct substitution pattern. Incorrect assumptions about meta vs para relationships lead to wrong precursor identification, particularly in coupling reactions like Sonogashira where the aryl halide must match the target's substitution pattern exactly.
[err-00131] helpful=0 harmful=0 :: Avoid placing boronic acid groups on complex nitrogen-rich heterocycles (e.g., tetrazoles, imidazoles with multiple nitrogens) in Suzuki coupling disconnections. These systems are synthetically challenging to functionalize with boronic acids and are better utilized as electrophilic components with halogen leaving groups. Instead, place the boronic acid on simpler aromatic fragments that are more amenable to borylation reactions.
[err-00134] helpful=0 harmful=0 :: Avoid assuming methyl protection (CO) is the only or default option for phenolic OH groups in deprotection retrosynthesis. Benzyl protection (OCc1ccccc1) is equally common and synthetically relevant, and silyl protection may also be used. Failing to consider multiple protection strategies can lead to incorrect precursor predictions, as the ground truth may use benzyl or other protections instead of methyl.
[err-00143] helpful=0 harmful=0 :: Avoid defaulting to methyl ester protection (COC(=O)) for carboxylic acid deprotection without specific justification. Ethyl esters (CCOC(=O)) are equally valid and commonly used in synthetic practice. Arbitrarily choosing methyl ester based on prevalence alone can lead to incorrect precursor predictions when ground truth or context indicates ethyl protection.
[err-00145] helpful=0 harmful=0 :: Avoid prioritizing alcohol oxidation/reduction transformations over ether hydrolysis when both functional groups are present in the target molecule. Incorrectly focusing on secondary alcohol interconversion (e.g., oxidizing to ketone) instead of hydrolyzing an ether group (e.g., converting COC to COH) leads to invalid precursors. Ether hydrolysis is often the intended FGI, especially in complex molecules with multiple functional groups.
[err-00148] helpful=0 harmful=0 :: Avoid reducing stable oxidized groups like sulfones or altering fused heterocycles (e.g., benzodioxole) in oxidation retrosynthesis. These groups are typically not the site of oxidation and should be preserved. Instead, focus on ketone groups (C(=O)) attached to aromatic or aliphatic chains, as they are common products of secondary alcohol oxidation.
[err-00153] helpful=0 harmful=0 :: Avoid reducing all oxidized functional groups (e.g., sulfones, nitro groups) in oxidation retrosynthesis without considering the specific reaction context. Not all highly oxidized groups are products of the oxidation step; some may be pre-existing and stable. Misidentifying stable oxidized groups as oxidation targets leads to incorrect precursors with unnecessary functional group modifications.
[err-00155] helpful=0 harmful=0 :: Avoid misrepresenting heterocyclic ring systems in SMILES notation, particularly for rings with adjacent heteroatoms like isoxazoles. Incorrect patterns like 'c2cc(C)no2' (which suggests a pyridine-like structure) instead of the correct 'on1' notation for isoxazoles lead to chemically invalid precursors. Always verify ring connectivity and use standard notations for heterocycles to prevent structural errors.
[err-00159] helpful=0 harmful=0 :: Avoid over-prioritizing N-alkylation over O-alkylation in heterocyclic systems without first verifying the presence of hydroxyl groups. When an ether linkage is adjacent to a heterocycle (e.g., COc2cnn...), carefully examine if the oxygen connects to a carbon that could originate from a hydroxyl group on the heterocycle (e.g., pyrazole with OH substituent). Incorrectly assuming N-alkylation when O-alkylation is correct leads to invalid precursors, especially when stable substituents like tert-butyl on nitrogen are present and should remain unchanged.
[err-00166] helpful=0 harmful=0 :: Avoid misidentifying the connectivity in phosphonate esters during allylic bromination retrosynthesis. Do not assume phosphorus is directly bonded to the alkene carbon (vinylphosphonate structure) when the target shows the alkene as part of a chain attached to phosphorus (allylic substituent). This error leads to incorrect precursors with fundamentally wrong atom connectivity, such as proposing COP(=O)(/C=C/C)OC instead of C/C=C\P(=O)(OC)OC for a target like COP(=O)(/C=C/CBr)OC.
[err-00170] helpful=0 harmful=0 :: Avoid defaulting to acid chlorides as acylating agents without considering functional group compatibility. Sensitive groups (e.g., sulfides, nitro groups, oxidizable functionalities) may require milder conditions with carboxylic acids and coupling agents. Overlooking this can lead to incorrect precursor predictions due to potential side reactions.
[err-00173] helpful=0 harmful=0 :: Avoid using explicit hydrogens ([H]) in SMILES strings for secondary amines (e.g., N[H]C), as this incorrectly represents a primary amine. Secondary amines should be denoted with 'N' and two carbon attachments (e.g., CNC) without explicit hydrogens to maintain correct chemical meaning and connectivity.
[err-00181] helpful=0 harmful=0 :: Avoid assuming halogen position on heterocyclic rings without precise verification. For pyridine and pyrimidine derivatives, common SMILES patterns like 'Clc1ncncc1' (a chloropyrimidine, with two ring nitrogens) place chlorine on a carbon adjacent to nitrogen (ortho), but the coupling may require position 5 (meta to nitrogen). Always map the target's substitution pattern to ensure the halogen in the precursor matches the exact coupling site. Misplaced halogens lead to regiochemically incorrect precursors that cannot form the target molecule.
[err-00185] helpful=0 harmful=0 :: Avoid generating SMILES for phenylhydrazine derivatives with the hydrazine group incorporated into the aromatic ring notation (e.g., 'c(NN)c'). This breaks the ring connectivity and misrepresents the molecule. The hydrazine group must be attached externally to preserve the aromatic ring structure, using patterns like 'cc1NN' or 'c1ccc(NN)cc1'.
[err-00187] helpful=0 harmful=0 :: Avoid misassigning sulfonate groups (R-SO₃⁻) as nucleophiles in heteroatom alkylation reactions. Sulfonate groups are excellent leaving groups, not nucleophiles. Incorrectly identifying a sulfonated species (e.g., sulfonated phenol) as the nucleophile attacking an alkyl halide leads to invalid precursors. Instead, the sulfonate should be on the electrophilic component (alkyl sulfonate) attacked by a nucleophile like phenoxide.
[err-00189] helpful=0 harmful=0 :: Avoid misidentifying the FGI site in lactam-containing heterocycles (e.g., benzimidazolones) by focusing on tertiary amine nitrogens for transformations like N-oxide reduction. The lactam carbonyl is the primary FGI site, commonly interconverted via hydration/dehydration pathways. Incorrectly prioritizing nitrogen-centered FGIs over carbonyl hydration/dehydration leads to invalid precursors that don't align with synthetic strategies for lactam formation.
[err-00191] helpful=0 harmful=0 :: Avoid misapplying Fischer indole synthesis to pyrrole systems with acyl groups (e.g., N-acetyl) on the nitrogen. These groups typically originate from acetylated aniline precursors in Paal-Knorr synthesis, not from N-alkyl phenylhydrazines. Incorrectly using Fischer indole synthesis for such cases leads to fundamentally wrong precursor predictions that cannot form the target molecule.
[err-00193] helpful=0 harmful=0 :: Avoid omitting the peracid reactant when disconnecting 1,3-dioxolane rings formed via epoxidation/cyclization. Epoxidation requires an external oxidant (e.g., mCPBA) which is stoichiometrically consumed and must be included as a separate precursor in the retrosynthetic output. Failing to include the peracid results in an incomplete and incorrect precursor set.
[err-00210] helpful=0 harmful=0 :: Avoid using explicit hydrogen notation (e.g., [nH]) for nitrogen atoms that are protected with substituents like tert-butyl or Boc groups. This incorrectly implies a deprotected state (free NH). For protected azole nitrogens (tetrazoles, pyrazoles), the SMILES should show the protecting group attached directly (e.g., nC(C)(C)C) without hydrogen, as in the N-tert-butyl pyrazole precursor COC(=O)C1CCc2c(c(-c3ccncc3)nn2C(C)(C)C)C1.
[err-00213] helpful=0 harmful=0 :: Avoid retaining stereochemistry designations ([C@H] or [C@@H]) for hydroxyl-bearing carbons in retrosynthetic precursors when the ground truth or synthetic context indicates an achiral representation (C). This carbon may be stereolabile or its configuration undefined in the precursor, and preserving chirality can lead to mismatches with expected answers, even if the carbon is chiral in the target molecule.
[err-00215] helpful=0 harmful=0 :: Avoid misplacing the ester protection on adjacent atoms (e.g., ring nitrogens) instead of the carbon bearing the carboxylic acid. In deprotection retrosynthesis, the protecting group must be added to the exact atom that was deprotected. For example, a target with CC(=O)O (carboxylic acid on a carbon chain) should have a precursor with CCOC(=O)C (ethyl ester on the same carbon), not NCCOC(=O) (ester on nitrogen).
[err-00220] helpful=0 harmful=0 :: Avoid assuming all highly oxidized groups (e.g., sulfones, nitro groups) must be reduced in oxidation retrosynthesis. These groups are often pre-existing and stable. Only reduce groups that align with the specific reaction context—typically ketones from secondary alcohols or aldehydes from primary alcohols. Misidentifying stable oxidized groups as oxidation products leads to incorrect precursors with unnecessary functional group modifications.
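
Several entries above (err-00100, err-00155, err-00185) describe SMILES in which ring-closure digits end up in the wrong place. The sketch below is one possible pre-submission sanity check, assuming plain Python with no cheminformatics toolkit available. The helper name `unpaired_ring_closures` is hypothetical and not part of any library. It only verifies that every ring-closure label is opened and then closed; it does not check valence, aromaticity, or whether the ring is the intended one, so a toolkit-based parse would still be the stronger test where one is available.

```python
import re

# Bracket atoms such as [nH], [C@@H] or [13CH3] can contain digits that are
# not ring-closure labels, so they are blanked out before scanning.
_BRACKET_ATOM = re.compile(r"\[[^\]]*\]")
# A ring-closure label is either %nn (two digits) or a single digit.
_RING_LABEL = re.compile(r"%(\d\d)|(\d)")


def unpaired_ring_closures(smiles):
    """Return the set of ring-closure labels that are opened but never closed.

    Hypothetical helper for illustration: it checks label pairing only.
    """
    open_labels = set()
    stripped = _BRACKET_ATOM.sub("*", smiles)
    for match in _RING_LABEL.finditer(stripped):
        label = int(match.group(1) or match.group(2))
        # The first occurrence opens a ring, the next one closes it.
        if label in open_labels:
            open_labels.remove(label)
        else:
            open_labels.add(label)
    return open_labels


# Externally attached hydrazine (pattern from err-00185): all labels paired.
print(unpaired_ring_closures("c1ccc(NN)cc1"))
# A truncated ring leaves label 1 open.
print(unpaired_ring_closures("c1ccccc"))
```

An empty set means only that the labels balance. A non-empty set is a cheap signal that the string should be rebuilt before it is proposed as a precursor.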
## PROBLEM-SOLVING HEURISTICS

[ph-00007] helpful=0 harmful=0 :: When multiple disconnection sites are available, prioritize the simplest and most accessible bond first. Standard amide bonds (C(=O)N) formed by acylation should take precedence over complex systems like beta-ketoamides. The precursor should logically result from amine nucleophile + carboxylic acid derivative acylating agent, with the amine component typically having a free NH group.
[ph-00021] helpful=0 harmful=0 :: When analyzing reduction retrosynthesis, create a systematic checklist of all potential reduction sites: (1) alkenes to alkanes, (2) carbonyls to alcohols, (3) nitriles to amines, etc. Prioritize based on the molecular context - saturated chains often indicate alkene reduction takes precedence over functional group reductions.
[ph-00045] helpful=0 harmful=0 :: When multiple functional groups are present, prioritize FGI based on synthetic fundamentality: (1) Amide/lactam hydrolysis to carboxylic acid + amine, (2) Carbonyl reductions/alcohol oxidations, (3) Aromatic substitutions/modifications. This hierarchy ensures the most synthetically basic transformations are addressed first.
[ph-00048] helpful=0 harmful=0 :: When analyzing heterocycle formation in retrosynthesis, systematically evaluate the ring system: (1) Identify the heterocycle type (pyrrole, indole, pyrazole, etc.), (2) Check for characteristic substitution patterns and ring fusion, (3) Match to appropriate formation mechanisms (Paal-Knorr for pyrroles, Fischer for indoles, etc.), (4) Verify precursor compatibility with the target's substitution pattern.
[ph-00051] helpful=0 harmful=1 :: When analyzing heterocycles with ether linkages (e.g., dioxolanes), evaluate ring formation mechanisms: (1) Check for substituents on adjacent rings—if present (e.g., methoxy, benzyloxy), ketalization is unlikely; (2) Look for quaternary carbons in the ring, suggesting epoxidation precursors; (3) Disconnect to alkene and peracid if epoxidation is indicated; (4) Ensure all substituents are retained in precursors without unnecessary simplification.
[ph-00073] helpful=1 harmful=1 :: When multiple C-C bond disconnection options exist, prioritize based on functional group characteristics: (1) Vinyl groups (C=C attached to rings) suggest Wittig or Heck reactions, (2) Alkynes suggest Sonogashira coupling, (3) Biaryl bonds suggest Suzuki coupling. Acetals/ketals (dioxolane rings) are typically protecting groups and should not be disconnected unless they are the obvious site of recent bond formation.
[ph-00080] helpful=0 harmful=1 :: When performing carboxylic acid deprotection retrosynthesis, use this heuristic for ester selection: (1) If alpha-carbon is quaternary (no hydrogens, e.g., CC(C(=O)O)), use tert-butyl ester protection (CC(C)(C)OC(=O)); (2) If alpha-carbon is a methine (one hydrogen, e.g., C(C)C(=O)O), consider methyl (COC(=O)) or ethyl (CCOC(=O)) esters based on context; (3) If alpha-carbon is a methylene (two hydrogens, e.g., CCC(=O)O), multiple ester types are possible. Always match the carbon count in the ester group to the target's substitution pattern.
[ph-00083] helpful=1 harmful=0 :: When performing deprotection retrosynthesis, work backward systematically: (1) Identify which functional groups in the target are deprotected (e.g., free amine NH2, free carboxylic acid COOH), (2) Add the appropriate protecting group to these functionalities in the precursor (e.g., Boc for amines, methyl ester for carboxylic acids), (3) Ensure other functional groups that might be affected by deprotection conditions are also protected in the precursor. Remember: deprotection in retrosynthesis means adding protections back to the precursor.
[ph-00123] helpful=0 harmful=0 :: When analyzing nitrogen-containing heterocycles for protection/deprotection, systematically verify each nitrogen's substitution level: (1) Check if nitrogen is part of amide bond (O=C-N) - these cannot be protected, (2) Count substituents - tertiary nitrogens have three bonds to carbon, secondary have two carbon bonds and one hydrogen, (3) For piperazine rings, both nitrogens are often tertiary (one amide, one alkyl-substituted), (4) Only ring nitrogens present as secondary amines (-NH-) are eligible for Boc protection.
[ph-00128] helpful=0 harmful=0 :: When analyzing heteroatom alkylation/arylation targets, use this systematic approach: (1) Identify all nitrogen atoms and their connectivity (N-alkyl vs N-aryl), (2) Scan for potential leaving groups (Cl, Br, I) on alkyl chains attached to nitrogen, (3) Prioritize disconnection at N-alkyl bonds with leaving groups present, (4) For N-aryl bonds, only consider SNAr if strong electron-withdrawing groups are ortho/para to the nitrogen attachment point, (5) Verify the nucleophilic precursor has the correct substitution level (secondary amine for tertiary N-alkyl products).
[ph-00013] helpful=0 harmful=0 :: When multiple C-C bond disconnection options exist in complex molecules, evaluate synthetic accessibility hierarchy: (1) Simple alkyl-aryl bonds (sp3-sp2) formed via alkylation, (2) Biaryl bonds formed via Suzuki coupling, (3) Alkyne-aryl bonds formed via Sonogashira coupling. Prioritize disconnections that lead to the most synthetically accessible fragments, considering that alkylation reactions are generally more robust and less sensitive to functional groups than transition metal-catalyzed cross-couplings.
[ph-00016] helpful=0 harmful=0 :: When analyzing Boc protection in complex heterocyclic systems, use this verification sequence: (1) Identify all nitrogen atoms in the target, (2) Check for carbamate pattern [C(=O)OC(C)(C)C]-N- to confirm Boc presence, (3) Verify the nitrogen is NOT part of an amide bond (O=C-NR) or lactam - only free primary/secondary amines can be protected, (4) Ensure the unprotected precursor has the correct substitution level (NH or NH2) at that specific position, (5) Preserve all other functional groups and ring systems unchanged in the precursor.
[ph-00014] helpful=0 harmful=0 :: When analyzing oxidation retrosynthesis, systematically compare oxidation states: (1) Identify all oxidized groups (ketones, aldehydes, carboxylic acids, sulfones, nitro groups), (2) Check for protecting groups (e.g., dioxolane OCO2 for ketones) that hint at recent transformations, (3) Prioritize ketone-to-alcohol reduction reversal over sulfone/nitro reduction, as ketones are more commonly formed via oxidation in synthetic sequences. Only reduce groups that align with the reaction context.
[ph-00028] helpful=0 harmful=0 :: When analyzing heteroatom alkylation/arylation targets with both amide and N-aryl/alkyl bonds, use this priority: (1) Disconnect amide bonds first (amine + acyl chloride/carboxylic acid derivative), (2) For tertiary amines, disconnect N-alkyl bonds (secondary amine + alkyl halide) only if no amide is present, (3) For N-aryl bonds, consider SNAr only if strong electron-withdrawing groups are ortho/para to the nitrogen attachment point. Always verify the amine component's substituents (e.g., N-methyl vs N-H) in precursors.
[ph-00047] helpful=0 harmful=0 :: When analyzing Paal-Knorr pyrrole synthesis targets, use this substituent mapping heuristic: (1) Identify pyrrole positions 2 and 3 - position 2 substituent comes from the methyl end of the 1,4-dicarbonyl, position 3 substituent comes from the carbonyl end. (2) The amine component provides the N-substituent and any ortho-substituents on attached aryl rings. (3) For methyl at position 2, require a 1,4-pentanedione derivative; for other alkyl groups, adjust the dicarbonyl chain length accordingly.
[ph-00057] helpful=0 harmful=0 :: When analyzing heteroatom alkylation targets, verify the electrophile's carbon hybridization: (1) Check if the leaving group (Cl, Br, I) is on an sp3 carbon (alkyl halide) - these are good electrophiles for SN2, (2) Avoid sp2 carbon halides (vinyl halides) as they are poor electrophiles, (3) For allylic systems, ensure the leaving group is on an sp3 carbon adjacent to the double bond, not directly on the vinyl carbon.
[ph-00060] helpful=0 harmful=0 :: When performing Functional Group Addition (FGA) retrosynthesis with multiple added groups, prioritize reverting the most reactive/unstable functionalities first as they are typically introduced late in syntheses. Use this reactivity hierarchy: chloromethyl (-CH2Cl), acyl chloride (-C(=O)Cl) > alkyl halides > azide (-N3), nitro (-NO2) > other stable groups. Stable groups like azides are often pre-existing and should remain unchanged when reverting more reactive late-stage additions.
[ph-00075] helpful=0 harmful=0 :: When analyzing complex molecules for FGI retrosynthesis, use this systematic approach for ether linkages: (1) Identify all C-O-C patterns in the target, (2) Check if they could represent acetal/hemiacetal functionality (particularly in cyclic systems or near carbonyl groups), (3) Prioritize acetal disconnection over other FGIs when the ether appears to be a protecting group, (4) Convert acetal ethers back to carbonyl + alcohol precursors, which may appear as hemiacetal forms in the ground truth. This takes precedence over standard alcohol oxidation when both transformations are possible.
[ph-00078] helpful=0 harmful=0 :: When performing oxidation retrosynthesis, systematically analyze all functional groups to identify oxidation sites: (1) Identify all oxidized groups (ketones, aldehydes, carboxylic acids, sulfoxides, sulfones, nitro groups) in the target molecule, (2) Compare with the precursor structure to determine which groups were formed in the oxidation step (e.g., ketone from secondary alcohol, sulfone from sulfide), (3) Only apply the reverse reduction to groups that were oxidized in the forward reaction, leaving pre-existing oxidized groups unchanged. This prevents misidentification of stable oxidized functionalities as oxidation products.
[ph-00089] helpful=0 harmful=0 :: When identifying the functional group interconversion (FGI) target, systematically analyze ALL functional groups in the target molecule to determine which are likely pre-existing versus those that were transformed. Prioritize FGI on groups that are typically synthetic products (e.g., carbamates, amides, esters) over stable core functionalities (e.g., lactams, aromatic systems). Use context clues like the presence of '.N' in ground truth to indicate amine reactants for carbamate/amide formation.
[ph-00149] helpful=0 harmful=0 :: When analyzing oxidation retrosynthesis with multiple oxidized groups, use this priority: (1) Identify ketones (C(=O)) as they often come from alcohol oxidation, (2) Check for alcohols in the precursor by replacing ketone with C(O) for secondary alcohols, (3) Preserve sulfones, nitro groups, and stable heterocycles (e.g., dioxolanes) as they are rarely reduced in standard oxidation steps, (4) Use molecular context (e.g., dioxolane fused to aromatic ring indicates stable benzodioxole, not a protecting group).
[ph-00162] helpful=0 harmful=0 :: For oxidation retrosynthesis targeting aldehydes, convert the carbonyl group (-CHO) to a primary alcohol (-CH2OH) while preserving the entire carbon skeleton unchanged. Exclude oxidizing agents from the precursor output, as they are reagents, not organic reactants. This transformation is universal for both aliphatic and aromatic aldehydes and takes precedence over modifying other oxidized groups like sulfones or nitro groups, which are typically pre-existing.
[ph-00182] helpful=0 harmful=0 :: When performing reduction retrosynthesis, systematically evaluate phenolic OH groups (O attached to aromatic carbon) as potential reduction products from ester precursors, particularly benzoate esters (OC(=O)c1ccccc1). This transformation using reducing agents like LiAlH4 is a common single-step approach for introducing phenolic alcohols and should be prioritized over reducing other functional groups when the molecular context supports it.
[ph-00204] helpful=0 harmful=0 :: When performing Hantzsch thiazole synthesis retrosynthesis, systematically map all substituents on the thiazole ring: (1) Position 2 substituent comes from the thioamide component, which also supplies the ring sulfur and nitrogen, (2) Positions 4 and 5 substituents come from the α-halocarbonyl component (the carbonyl carbon becomes C4 and the halogen-bearing carbon becomes C5). Verify that the α-halocarbonyl precursor contains all necessary substituents for both positions without truncation.
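
The oxidation heuristics above (ph-00078, ph-00149, ph-00162) share one decision pattern: revert only the groups plausibly formed in the oxidation step and leave stable oxidized groups alone. The sketch below illustrates that triage under stated assumptions. The group labels, the two lookup tables, and the function name `plan_oxidation_reversal` are all hypothetical, and detecting the groups in a real target (for example by substructure matching) is not shown.

```python
# Hypothetical group labels; a real pipeline would derive them from the
# target structure, which is outside the scope of this sketch.
REVERSIBLE = {
    "aldehyde": "primary alcohol",   # -CHO  <- -CH2OH (ph-00162)
    "ketone": "secondary alcohol",   # C(=O) <- C(O)   (ph-00149)
}
# Groups the heuristics describe as usually pre-existing and stable.
USUALLY_PRESERVED = {"sulfone", "nitro", "benzodioxole"}


def plan_oxidation_reversal(groups):
    """Sort detected groups into revert / preserve / review buckets."""
    plan = {"revert": [], "preserve": [], "review": []}
    for group in groups:
        if group in REVERSIBLE:
            plan["revert"].append((group, REVERSIBLE[group]))
        elif group in USUALLY_PRESERVED:
            plan["preserve"].append(group)
        else:
            # Anything unrecognized is left for manual inspection.
            plan["review"].append(group)
    return plan


plan = plan_oxidation_reversal(["sulfone", "ketone", "nitro", "ester"])
print(plan["revert"])
print(plan["preserve"])
```

The "review" bucket is a deliberate choice: a group missing from both tables is flagged instead of being silently reverted or preserved, which matches the advice to consider the specific reaction context.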
## CONTEXT CLUES & INDICATORS

## OTHERS