<! $Id: classes-format.5,v 1.3 2007/12/19 22:08:05 stolcke Exp $>
<HTML>
<HEADER>
<TITLE>classes-format</TITLE>
<BODY>
<H1>classes-format</H1>
<H2> NAME </H2>
classes-format - File format for word class definitions
<H2> SYNOPSIS </H2>
<PRE>
<I>class</I> [<I>p</I>] <I>word1</I> <I>word2</I> ...
</PRE>
<H2> DESCRIPTION </H2>
Various programs dealing with word classes use this format to define
the possible expansions of classes and their respective probabilities.
Each expansion appears on a separate line as in 
the synopsis, where
<I> class </I>
names a word class,
<I> p </I>
gives the probability for the class expansion, and
<I> word1 word2 ... </I>
defines the word string that the class expands to.
If 
<I> p </I>
is omitted it is assumed to be 1.
(All expansion probabilities for a given class should sum to one.
The software does not necessarily enforce this, and probabilities
that do not sum to one would lead to improper models.)
<P>
Note that the concept of word class here is generalized to include
``multi-words'', or phrases consisting of more than one word.
All expansions must have at least one word.
Certain models might impose more restrictive formats.
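<P>
The rules above can be sketched as a small Python checker.
This is an illustration only, not the parser used by the software:
the function names
<I> parse_class_line </I>
and
<I> classes_with_bad_sums </I>
are hypothetical, and treating the second field as
<I> p </I>
whenever it parses as a number is an assumption made for this sketch.

```python
import math
from collections import defaultdict


def parse_class_line(line):
    """Split one definition line into (class, probability, words).

    Assumption for this sketch: the second field is taken as p
    if it parses as a number; otherwise p defaults to 1.
    """
    fields = line.split()
    if not fields:
        return None  # blank line: nothing to parse
    name, rest = fields[0], fields[1:]
    prob = 1.0  # default when p is omitted
    if rest:
        try:
            prob = float(rest[0])
            rest = rest[1:]
        except ValueError:
            pass  # second field is a word, not a probability
    if not rest:
        # every expansion must have at least one word
        raise ValueError("expansion for class %r has no words" % name)
    return name, prob, rest


def classes_with_bad_sums(lines, tol=1e-6):
    """Return {class: total} for classes whose probabilities do not sum to one."""
    totals = defaultdict(float)
    for line in lines:
        parsed = parse_class_line(line)
        if parsed is not None:
            totals[parsed[0]] += parsed[1]
    return {c: t for c, t in totals.items()
            if not math.isclose(t, 1.0, abs_tol=tol)}


# Made-up sample definitions, including a multi-word expansion.
sample = [
    "CITY 0.5 new york",
    "CITY 0.5 boston",
    "DAY monday",
    "MONTH 0.3 january",
]
print(classes_with_bad_sums(sample))
```

In this made-up sample, CITY has two expansions, one of them a multi-word,
DAY relies on the default probability, and MONTH is reported because its
single expansion carries a probability other than one.
A word that happens to look like a number in the second field would be
misread as
<I> p </I>
by this heuristic.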
<H2> SEE ALSO </H2>
<A HREF="ngram.1.html">ngram(1)</A>, <A HREF="ngram-class.1.html">ngram-class(1)</A>, <A HREF="disambig.1.html">disambig(1)</A>, <A HREF="training-scripts.1.html">training-scripts(1)</A>, <A HREF="pfsg-scripts.5.html">pfsg-scripts(5)</A>.
<H2> AUTHOR </H2>
Andreas Stolcke &lt;stolcke@speech.sri.com&gt;.
<BR>
Copyright 1999 SRI International
</BODY>
</HTML>
