{"rule":"MORFOLOGIK_RULE_EN_US","sentence":"^\\QIt implements the misreporting heuristic based on token indices, runs it on prompts (taken from the LMSYS dataset) for multiple iterations, determining the plausibility in the last step, and returns the number of plausible longer tokenizations found.\\E$"}
{"rule":"MORFOLOGIK_RULE_EN_US","sentence":"^\\Q\\E(?:Dummy|Ina|Jimmy-)[0-9]+\\Q contains auxiliary functions for tokenization operations, including finding all possible tokenizations of a string, computing the cumulative autoregressive probability of a token sequence, or verifying if a token sequence is top-p/k plausible.\\E$"}
{"rule":"MORFOLOGIK_RULE_EN_US","sentence":"^\\Q\\E(?:Dummy|Ina|Jimmy-)[0-9]+\\Q analyzes the effect of a random policy the randomly splits tokens, plotting the increase in overcahrged tokens, and the likelihood of finding plausible tokenizations.\\E$"}
