<! $Id: nbest-scripts.1,v 1.52 2019/09/09 22:35:37 stolcke Exp $>
<HTML>
<HEADER>
<TITLE>nbest-scripts</TITLE>
<BODY>
<H1>nbest-scripts</H1>
<H2> NAME </H2>
nbest-scripts, combine-rover-controls, compare-sclite, compute-best-rover-mix, compute-sclite-nbest, compute-sclite, fix-ctm, merge-nbest, nbest-error, nbest-posteriors, nbest-rover, nbest-vocab, nbest-words, nbest2-to-nbest1, rescore-acoustic, rescore-decipher, rescore-reweight, rover-control-tying, rover-control-weights, search-rover-combo, sentid-to-sclite - rescore and evaluate N-best lists
<H2> SYNOPSIS </H2>
<PRE>
<B>rescore-decipher</B> [ <B>-bytelog</B> ] [ <B>-nodecipherlm</B> ] [ <B>-multiwords</B> ] \
	[ <B>-multi-char</B> <I>C</I> ] [ <B>-pretty</B> <I>mapfile</I> ] \
	[ <B>-ngram-tool</B> <I>program</I> ][ <B>-filter</B> <I>command</I> ] \
	[ <B>-norescore</B> ] [ <B>-lm-only</B> ] [ <B>-count-oovs</B> ] [ <B>-limit-vocab</B> ] \
	[ <B>-vocab-aliases</B> <I>mapfile</I> ] [ <B>-fast</B> ] \
	<I>nbest-file-list</I> <I>score-dir</I> <B>-lm</B> ... <I>lm-options</I> ...
<B>rescore-acoustic</B> <I>old-nbest-dir</I>|<I>old-file-list</I> <I>old-ac-weight</I> \
	<I>new-score-dir1</I> <I>new-ac-weight1</I> ... <I>new-nbest-dir</I> [ <I>max-nbest</I> ]
<B>rescore-reweight</B> [ <B>-multiwords</B> ] [ <B>-multi-char</B> <I>C</I> ] <I>score-dir</I>|<I>file-list</I> \
	[ <I>lmw</I> [ <I>wtw</I> [ <I>score-dir1 score-weight1</I> ... ] [ <I>max-nbest</I> ]]]
<B>rescore-minimize-wer</B> <I>score-dir</I> [ <I>lmw</I> [ <I>wtw</I> [ <I>max-nbest</I> ]]]
<B>nbest2-to-nbest1</B> [ <I>nbest-file</I> ]
<B>nbest-rover</B> [ <I>sentid-list</I> | <B>-</B> ] <I>control-file</I> \
	[ <I>posterior-file</I> [ <I>nbest-lattice-options</I> ] ]
<B>combine-rover-controls</B> [ <B>lambda=</B><I>weights</I> ] [ <B>postscale=</B><I>S</I> ] \
	[ <B>keeppaths=1</B> ] <I>rover-control</I> [ ... ]
<B>rover-control-weights</B> [ <B>weights=</B>"<I>w1 ... wn</I>" ] <I>control-file</I>
<B>rover-control-tying</B> <I>control-file</I>
<B>nbest-optimize-args-from-rover-control</B> [ <B>print_weights=1</B> ] [ <B>print_dirs=1</B> ] <I>control-file</I>
<B>compute-best-rover-mix</B> [ <B>lambda=</B><I>weights</I> ] [ <B>addone=</B><I>c</I> ] [ <B>precision=</B><I>p</I> ] \
	[ <B>tying=</B>"<I>b1 ... bn</I>" ] [ <B>write_weights=</B><I>file</I> ] <I>reference-posteriors-output</I>
<B>search-rover-combo</B> [ <B>-scorer</B> <I>script</I> ] [ <B>-datadir</B> <I>dir</I> ] \
	[ <B>-weights</B> <I>weights</I> ] [ <B>-sentids</B> <I>list</I> ] \
	[ <B>-refs</B> <I>refs</I> ] [ <B>-smooth-weight</B> <I>S</I> ] \
	[ <B>-J</B> <I>n</I> ] <I>list-of-control-files</I>
<B>nbest-posteriors</B> [ <B>weight=</B><I>W</I> ] [ <B>lmw=</B><I>lmw</I> ] [ <B>wtw=</B><I>wtw</I> ] [ <B>postscale=</B><I>S</I> ] \
	[ <B>max_nbest=</B><I>M</I> ] <I>nbest-file</I>
<B>merge-nbest</B> [ <B>multiwords=1</B> ] [ <B>multichar=</B><I>C</I> ] [ <B>nopauses=1</B> ] \
	[ <B>max_nbest=</B><I>M</I> ] <I>nbest-file</I> ...
<B>nbest-vocab</B> [ <I>nbest-list</I> ... ]
<B>nbest-words</B> [ <I>nbest-list</I> ... ]
<B>nbest-oov-counts</B> <B>vocab=</B><I>vocabfile</I> [ <B>vocab_aliases=</B><I>aliasfile</I> ] <I>nbest-list</I>
<B>nbest-error</B> <I>score-dir</I>|<I>file-list</I> <I>refs</I> [ <I>nbest-lattice-option</I> ... ]
<B>sentid-to-sclite</B> <I>hyps</I>
<B>sentid-to-ctm</B> <I>hyps</I>
<B>fix-ctm</B> <I>ctmfile</I>
<B>compute-sclite</B> <B>-r</B> <I>refs</I> <B>-h</B> <I>hyps</I> [ <B>-h</B> <I>hyps</I> ... ] [ <B>-S</B> <I>subset</I> ... ] \
	[ <B>-multiwords</B>|<B>-M</B> ] [ <B>-noperiods</B> ] [ <B>-R</B> ] [ <B>-g</B> <I>glmfile</I> ] [ <B>-H</B> ] \
	[ <B>-v</B> ] [ <I>sclite-options</I> ...]
<B>compute-sclite-nbest</B> <I>file-list</I> <I>output-dir</I> -r <I>refs</I> [ <B>-filter</B> <I>script</I> ] [ <I>sclite-options</I> ...]
<B>compare-sclite</B> <B>-r</B> <I>refs</I> <B>-h1</B> <I>hyps1</I> <B>-h2</B> <I>hyps2</I> [ <B>-S</B> <I>subset</I> ] \
	[ <I>compute-sclite-options</I> ... ]
</PRE>
<H2> DESCRIPTION </H2>
These scripts perform common tasks on N-best hypotheses in 
<A HREF="nbest-format.5.html">nbest-format(5)</A>,
especially those needed for rescoring and extracting and evaluating
1-best hypotheses.
<P>
<B> rescore-decipher </B>
applies a language model implemented by 
<A HREF="ngram.1.html">ngram(1)</A>
to the N-best lists listed in
<I>nbest-file-list</I>.<I></I><I></I><I></I>
The N-best files may be in compressed format.
The rescored N-best lists are stored in directory
<I>score-dir</I>.<I></I><I></I><I></I>
All following arguments are passed to 
<A HREF="ngram.1.html">ngram(1)</A>
and are used to control the language model.
The following options are handled by 
<B> rescore-decipher </B>
itself:
<DL>
<DT><B> -bytelog </B>
<DD>
causes scores to be output on the bytelog scale
(see 
<A HREF="nbest-format.1.html">nbest-format(1)</A>).
<DT><B> -nodecipherlm </B>
<DD>
indicates that the recognizer language model is not being provided
(with
<B>-decipher-lm</B>).<B></B><B></B><B></B>
(This is only possible if the N-best lists are not in ``NBestList1.0'' format.)
<DT><B> -multiwords </B>
<DD>
specifies that N-best lists contain words joined by underscores, which are
to be split into their component prior to rescoring.
<DT><B>-multi-char</B><I> C</I><B></B><I></I><B></B><I></I><B></B>
<DD>
defines a multiword separator character.
The default is underscore ``_''.
<DT><B>-pretty</B><I> mapfile</I><B></B><I></I><B></B><I></I><B></B>
<DD>
specifies a word mapping file that allows individual words to be globally
replaced by strings of zero or more other words, e.g., to remove vocabulary
mismatches between the input N-best lists and the rescoring LM.
The 
<I> mapfile </I>
contains one mapping per line, the first field specifying the word to be
replaced and subsequent fields forming the replacement string.
<DT><B>-ngram-tool</B><I> program</I><B></B><I></I><B></B><I></I><B></B>
<DD>
specifies a non-standard
<I> program </I>
to perform the actual LM evaluation
(by default, 
<A HREF="ngram.1.html">ngram(1)</A>
is used).
Such a program must understand
<B>ngram</B>'s<B></B><B></B><B></B>
command-line options related to N-best rescoring.
<DT><B>-filter</B><I> command</I><B></B><I></I><B></B><I></I><B></B>
<DD>
specifies a
<I> command </I>
that is used to filter the N-best hypotheses prior to
evaluating the language model.
This may be used for more general textual rewriting so that non-standard
LMs can be applied.
The output N-best lists will contain the filtered hypotheses.
<DT><B> -norescore </B>
<DD>
causes N-best lists to be simply reformatted from one of the Decipher formats
into the SRILM N-best format, separating acoustic and LM scores, without
replacing the existing LM scores.
In this case only the 
<A HREF="ngram.1.html">ngram(1)</A>
options
<B>-decipher-lmw</B><B></B><B></B><B></B>
and 
<B>-decipher-wtw</B><B></B><B></B><B></B>
are relevant, and others are ignored.
<B> -norescore </B>
and 
<B> -filter </B>
may be used together to perform textual rewriting of N-best lists.
<DT><B> -lm-only </B>
<DD>
dumps out LM scores only, instead of complete N-best lists.
<DT><B>-count-oovs</B><B></B><B></B><B></B>
<DD>
writes the count of out-of-vocabulary and zero-probability words to
the output score files (instead of rescored N-best lists).
<DT><B> -limit-vocab </B>
<DD>
saves memory by arranging for
<A HREF="ngram.1.html">ngram(1)</A> 
to load only those N-gram parameters that are relevant to the vocabulary
of the N-best lists to be rescored.
After determining the N-best vocabulary the 
<B> -limit-vocab </B>
option is passed to 
<A HREF="ngram.1.html">ngram(1)</A>.
<DT><B>-vocab-aliases</B><I> map</I><B></B><I></I><B></B><I></I><B></B>
<DD>
declares that certain words are to be treated as alternative spellings 
of the same word for LM evaluation; see the same option for 
<A HREF="ngram.1.html">ngram(1)</A>.
The 
<I> map </I>
is filtered of unused words when used in conjunction with
<B>-limit-vocab</B>,<B></B><B></B><B></B>
and then passed on to 
<A HREF="ngram.1.html">ngram(1)</A>.
<DT><B> -fast </B>
<DD>
performs rescoring using only functions built into
<A HREF="ngram.1.html">ngram(1)</A>.
This avoids some computational and I/O overhead and therefore runs faster,
but the options
<B>-filter</B>,<B></B><B></B><B></B>
<B>-pretty</B>,<B></B><B></B><B></B>
and 
<B> -lm-only </B>
are not supported, and 
<B> -nodecipherlm </B>
is obligatory.
</DD>
</DL>
<P>
<B> rescore-acoustic </B>
replaces the acoustic scores in a set of N-best lists by a weighted 
combination of new scores.
The old N-best lists are given by either a directory
<I> old-score-dir </I>
or a filelist
<I>old-file-list</I>;<I></I><I></I><I></I>
<I> old-ac-weight </I>
is the weight given to the old scores.
Directories containing the new scores are listed alternating with the
corresponding weights; each score directory must contain one 
file per waveform segment, each having the same file basenames as 
the original N-best lists.
The new scores should appear in a single column per file, one per line.
The N-best lists containing the new combined acoustic scores are written to 
<I>new-nbest-dir</I>.<I></I><I></I><I></I>
The optional
<I> max-nbest </I>
argument can be used to limit the length of the N-best lists output.
Also, When a new score file is encountered containing fewer than
<I> max-nbest </I>
lines, the missing scores are set to the lowest score encountered so far.
<P>
<B> rescore-reweight </B>
combines the scores in N-best lists with a set of weights and outputs
the 1-best hypotheses.
The N-best files are found in directory
<I> score-dir </I>
or listed in
<I>file-list</I>.<I></I><I></I><I></I>
Optional arguments set the language model weight
<I> lmw </I>
(default 8),
the word transition weight
<I> wtw </I>
(default 0),
and the maximum number
<I> max-nbest </I>
of hypotheses to consider (default all).
Optionally, any number of additional score directories and associated
weights
<I> score-dir1 score-weight1 score-dir2 score-weight2 </I>
... can be specified, following the
<I> wtw </I>
parameter.
These additional scores are combined with those contained in the
N-best lists themselves as in
<B> rescore-acoustic </B>
(using unit weight for the original acoustic scores).
The
<B> -multiwords </B>
and
<B> -multi-char </B>
options have the same function as for
<B>rescore-decipher</B>.<B></B><B></B><B></B>
The output format for 1-best hypotheses is
<PRE>
	<I>sentid</I> <I>w1</I> <I>w2</I> ...
</PRE>
where
<I> sentid </I>
is the sentence ID derived from the N-best filename, followed by 
the words.
<P>
<B> rescore-minimize-wer </B>
is similar to 
<B> rescore-reweight </B>
but picks hypotheses using the word error minimization algorithm
of 
<A HREF="nbest-lattice.1.html">nbest-lattice(1)</A>.
<P>
<B> nbest2-to-nbest1 </B>
converts an N-best list in ``NBestList2.0'' format to ``NBestlist1.0'',
for the benefit of programs that have not yet been updated to deal with 
the new format.
<P>
<B> nbest-rover </B>
combines hypotheses from multiple N-best lists at the word level,
by performing the same kind of word error minimization as 
<A HREF="nbest-lattice.1.html">nbest-lattice(1)</A>,
in a generalization of the ROVER algorithm.
<I> sentid-list </I>
is a file listing sentence IDs.
These must match the filenames in a set of N-best directories,
which are specified in a
<I>control-file</I>.<I></I><I></I><I></I>
The format for the latter is
<PRE>
	<I>dir1</I> <I>lmw1</I> <I>wtw1</I> <I>w1</I> [<I>n1</I> [<I>s1</I>]]
	<I>dir2</I> <I>lmw2</I> <I>wtw2</I> <I>w2</I> [<I>n2</I> [<I>s2</I>]]
	...
</PRE>
Each line specifies an N-best directory, the language model and word transition
weights to be used in score combination, and a weight to be applied to the
posterior probabilities.
A weight of "=" denotes a value equal to the previous system and is used to encode
weight tying.
An optional next-to-last parameter for each N-best list allows the lists to be 
truncated to the top <I>n1</I>, <I>n2</I>, etc., hypotheses.
The final optional parameter sets the posterior distribution scaling factor,
which defaults to the language model weight.
Optionally,
<I> control-file </I>
can also contain lines of the form
</PRE>
	<I>dir</I> <I>w</I> <B>+</B>
</PRE>
These indicate that additional score files can be found in directory
<I> dir </I>
and that the scores found therein should be added to the following 
N-best list set with weight
<I>w</I>.<I></I><I></I><I></I>
Several lines of this form may occur preceding a regular N-best
directory specification; the corresponding additive combination of multiple
scores is performed.
<BR>
If ``-'' is specified for
<I>sentid-list</I>,<I></I><I></I><I></I>
the sentence IDs are inferred from
the contents of the first directory <I>dir1</I> specified in
<I>control-file</I>.<I></I><I></I><I></I>
If
<I> posterior-file </I>
is specified on the command line, posterior word probability estimates are
written to that file.
<P>
Additional arguments are treated as options, in particular
<DL>
<DT><B> -missing-nbest </B>
<DD>
indicates that empty hypotheses are to be used for N-best lists that are missing
from the directory specified in the control file.
</DD>
</DL>
<P>
Any other additional arguments are passed to the underlying
<A HREF="nbest-lattice.1.html">nbest-lattice(1)</A>
invocation.
<BR>
<B> nbest-rover </B>
can process N-best lists in any of the formats described in
<A HREF="nbest-format.5.html">nbest-format(5)</A>,
<I>as long as all N-best lists for a given utterance are in the same format</I>.
When Decipher formats are used only their acoustic scores are used.
<P>
<B> combine-rover-controls </B>
takes one or more
<B> nbest-rover </B>
control files as arguments and outputs a new control file that specifies
the combination of the input files.
Directory names in the input files are adjusted to reflect the relative
location of the input files,
unless the 
<B> keeppaths=1 </B>
option is used.
Each input system is given equal weight,
unless the optional
<B>lambda=</B><I>weights</I><B></B><I></I><B></B><I></I><B></B>
argument is used to specify a space-separated list of system weights
(spaces in the weight vector need to be quoted on the command line).
The 
<B>postscale=</B><I>S</I><B></B><I></I><B></B><I></I><B></B>
argument overrides the posterior scaling factor in all input systems with the value
<I>S</I>.<I></I><I></I><I></I>
<P>
<B> rover-control-weights </B>
either retrieves or changes the weights in an nbest-rover control file.
If the 
<B> weights= </B>
argument is specified, the weights in the input control file are altered to the
values in the argument and a new control file is written to stdout.
Otherwise, the list of current weights is output as a single line.
<P>
<B> compute-best-rover-mix </B>
estimates the best weighting of a set of nbest system outputs for 
combination with
<B>nbest-rover</B>.<B></B><B></B><B></B>
The required input file 
<I> reference-posteriors-output </I>
is produced by running 
<B> nbest-rover </B>
to record the posteriors of the reference word strings on a tuning set:
<BR>
	<B>nbest-rover -</B> <I>control-file</I> /dev/null \
<BR>
	    <B>-refs</B> <I>references</I> \
<BR>
	    \fB-write-ref-posteriors\f <I>reference-posteriors-output</I>
<BR>
Initial weights are specified with
<B>lambda=</B><I>weights.</I><B></B><I></I><B></B><I></I><B></B>
<BR>
An additive constant for Laplace smoothing can be specified with 
<B>addone=</B><I>c.</I><B></B><I></I><B></B><I></I><B></B>
<BR>
The
<B> tying= </B>
argument allows the system weights to be tied. 
It should specify a string of positive integers (the bin numbers) with one value
for each system weight.
For example 
<B> tying='1 1 2 3 3' </B>
means that the first two and the last two of five weights are to be tied (put in the same bin).
<BR>
The estimated weight vector can optionally be written to a file using
<B>write_weights=</B><I>file.</I><B></B><I></I><B></B><I></I><B></B>
The weights can then be inserted into the original
<I>control-file</I>,<I></I><I></I><I></I>
e.g., using 
<B>rover-control-weights</B>.<B></B><B></B><B></B>
<P>
<B> rover-control-tying </B>
extracts the value for the 
<B> compute-best-rover-mix </B>
<B> tying= </B>
argument from an existing nbest-rover control file.
<P>
<B> nbest-optimize-args-from-rover-control </B>
extracts information from existing nbest-rover control files that can be passed as 
arguments to 
<A HREF="nbest-optimize.1.html">nbest-optimize(1)</A>
for initializing the search.
Options allow printing only the score weights, 
or only the list of additional scores directories.
<P>
<B> search-rover-combo </B>
searches for a good subset of systems to combine via 
<B>nbest-rover</B>.<B></B><B></B><B></B>
It performs a greedy search starting with the system that gives the lowest individual error, 
and then adds one system at a time until no further error reduction is possible.
The required argument <I>list-of-control-files</I> is a file listing the nbest-rover control files
representating the individual systems to be combined.
An nbest-rover control file is written to stdout representing the combined system.
Options are:
<DL>
<DT><B>-scorer</B><I> script</I><B></B><I></I><B></B><I></I><B></B>
<DD>
Specifies a script that evaluates a hypothesis file.
The script must take a single argument that is a hypothesis file in sentid format and output a single number
(the error rate) to stdout.
For example, the script could be based on parsing 
<B> compute-sclite </B>
output, but must know where to find the reference file etc.
<DT><B>-weights</B><I> list-of-weights</I><B></B><I></I><B></B><I></I><B></B>
<DD>
Specifies the list of system weights to try when adding a system.
By default this is just 1, but can be a space-separated list of weights, such as "1 0.5 0.2 0.1".
<DT><B>-sentids</B><I> list</I><B></B><I></I><B></B><I></I><B></B>
<DD>
A list of sentence IDs to perform the evaluation on (as in the first argument to
<B>nbest-rover</B>),<B></B><B></B><B></B>
<DT><B>-datadir</B><I> dir</I><B></B><I></I><B></B><I></I><B></B>
<DD>
Where to place auxiliary data files.
By default this is 
<B> SEARCH-DATA </B>
in the current directory.
<DT><B>-refs</B><I> refs</I><B></B><I></I><B></B><I></I><B></B>
<DD>
Triggers system weight optimization using 
<B> compute-best-rover-mix </B>
for each system combination before evaluating
its error rate.
The file 
<I> refs </I>
should point to a reference file in sentid format.
Note that these references are not used to evaluate the error rate of a system 
(which is done within the scorer script, see above) but only to be 
passed to 
<B>compute-best-rover-mix</B>.<B></B><B></B><B></B>
This option overrides the 
<B> -weights </B>
option since system weights are estimated.
<DT><B>-smooth-weight</B><I> S</I><B></B><I></I><B></B><I></I><B></B>
<DD>
Enables hierarchical weight smoothing.
Each weight estimate is interpolated with the previous estimate (with one fewer systems);
the previous weight vector gets weight
<I>S</I>.<I></I><I></I><I></I>
<DT><B>-J</B><I> n</I><B></B><I></I><B></B><I></I><B></B>
<DD>
Parallelize the evalation of system combinations with up to 
<I> n </I>
parallel jobs.
This uses the included parallelization script
<B>rexport.gnumake</B>,<B></B><B></B><B></B>
but the environment variable 
<B> REXPORT </B>
may be set to a command that takes a list of command lines as argument and executes them in an appropriate manner.
</DD>
</DL>
<P>
<B> nbest-posteriors </B>
rescales the scores in an N-best list to reflect (weighted) posterior
probabilities.
The output is the same N-best list with acoustic scores set to
the log (base 10) of the posterior hyp probabilities and LM scores set to zero.
<B>postscale=</B><I>S</I><B></B><I></I><B></B><I></I><B></B>
attenuates the posterior distribution by dividing combined log 
scores by
<I> S </I>
(the default is
<I>S</I>=<I>lmw</I>).<I></I><I></I>
If
<B>weight=</B><I>W</I><B></B><I></I><B></B><I></I><B></B>
is specified the posteriors are multiplied by
<I>W</I>.<I></I><I></I><I></I>
<B>max_nbest=</B><I>M</I><B></B><I></I><B></B><I></I><B></B>
limits the number of hypotheses used to the top 
<I>M</I>.<I></I><I></I><I></I>
This script is used mostly as a helper in
<B>nbest-rover</B>.<B></B><B></B><B></B>
<P>
<B> merge-nbest </B>
merges hypotheses from one or more N-best lists into a single list,
collapsing hypotheses that occur in more than one input list.
If all input lists use the same 
<A HREF="nbest-format.5.html">nbest-format(5)</A>
then the output will also be in that format and contain the information
from the first list in which a hypothesis was encountered.
Otherwise, the output will be in SRI Decipher(TM) NBestList1.0 format
and contain acoustic scores and word strings only.
The
<B>max_nbest=</B><I>M</I><B></B><I></I><B></B><I></I><B></B>
option limits input to the first 
<I> M </I>
hypotheses from each input list.
<B> multiwords=1 </B>
merges hypotheses that are identical after resolving multiwords, with 
<B>multichar=</B><I>C</I><B></B><I></I><B></B><I></I><B></B>
defining a non-default multiword separator character.
<B> nopauses=1 </B>
merges hypotheses that are identical after removal of pause words.
<P>
<B> nbest-vocab </B>
outputs the vocabulary used in a set of N-best lists.
(The N-best files cannot be compressed, but may be concatenated and
supplied via stdin.)
<P>
<B> nbest-words </B>
strips any score and alignment information from N-best lists and outputs
only the words, one hypothesis per line.
<P>
<B> nbest-oov-counts </B>
computes the number of out-of-vocabulary words for each hypothesis in an N-best list,
relative to a vocabulary listed in
<I>vocabfile</I>.<I></I><I></I><I></I>
Optionally a vocabulary mapping from
<I> aliasfile </I>
is applied (as with the 
<A HREF="ngram.1.html">ngram(1)</A>
<B> -vocab-aliases </B>
option).
The OOV counts are output on stdout and can be used as a score file for
N-best rescoring.
<P>
<B> nbest-error </B>
computes the overall oracle word error rate of a set of N-best lists
in directory
<I> score-dir </I>
or listed in
<I>file-list</I>.<I></I><I></I><I></I>
The reference answers are given in
<I> refs </I>
in the format output by 
<B> rescore-reweight </B>
(see above).
Additional arguments are passed to the underlying invocation of
<A HREF="nbest-lattice.1.html">nbest-lattice(1)</A>,
and can be used to limit the depth of the N-best list,
compute lattice error rather than N-best error, etc.
<P>
<B> sentid-to-sclite </B>
converts 1-best hypotheses and references in the format used here to
the ``trn'' format expected by the NIST
<A HREF="sclite.1.html">sclite(1)</A>
scoring software.
<P>
<B> sentid-to-ctm </B>
converts 1-best hypotheses and references in the format used here to NIST
<A HREF="ctm.5.html">ctm(5)</A>
format.
The script relies on an encoding of conversation IDs, channel, and utterance
time marks in the sentence IDs and may need adjustment to local conventions.
<P>
<B> fix-ctm </B>
converts output produced by the
<B> -output-ctm </B>
option of 
<A HREF="nbest-lattice.1.html">nbest-lattice(1)</A>
and
<A HREF="lattice-tool.1.html">lattice-tool(1)</A>
to a format suitable for scoring with NIST
<A HREF="sclite.1.html">sclite(1)</A>.
It, too, relies on information encoded in the sentids IDs and may need
adjustments.
<P>
<B> compute-sclite </B>
is a wrapper around 
the NIST 
<A HREF="sclite.1.html">sclite(1)</A>
scoring tool.
<I> refs </I>
and
<I> hyps </I>
are the reference and hypothesized transcripts, respectively. 
The
<I> refs </I>
file can be either in "sentid" format or in 
<A HREF="stm.5.html">stm(5)</A> 
format.  In the latter case,
<I> hyps </I>
will be converted to 
<A HREF="ctm.5.html">ctm(5)</A>
format using the 
<B> sentid-to-ctm </B>
helper script.
The
<I> hyps </I>
file can be either in "sentid" format or in 
<A HREF="ctm.5.html">ctm(5)</A>
format.
More than one 
<B> -h </B>
option can be given to combine the contents of multiple hypotheses files.
Optionally, 
<B> -S </B>
specifies a
sorted list of sentence IDs
<I> subset </I>
to score.
Multiple 
<B> -S </B>
options may be given, to form the intersection of several subsets.
<B> -multiwords </B>
or
<B> -M </B>
splits ``multiwords'' joined by underscores into their component words
prior to scoring.
<B> -noperiods </B>
deletes periods from the hypotheses prior to scoring
(typically used to bridge different conventions for spelled letters).
<B> -R </B>
preserves reject words in the hypotheses for scoring (as appropriate if
references also contain rejects).
<B> -g </B>
<I> glmfile </I>
enables filtering of references and hypotheses by the NIST
<B> csrfilt.sh </B>
script, controlled by the filter file 
<I> glmfile </I>
(this is only possible with an stm reference file).
In that case, the
<B> -H </B>
option causes hesitations (as defined by the filter)
to be deleted from the output for scoring purposes.
<B> -v </B>
displays the complete command used to invoke
<B>sclite</B>.<B></B><B></B><B></B>
Any additional options are passed to
<B>sclite</B>,<B></B><B></B><B></B>
e.g., to control its output actions or alignment mode.
<P>
<B> compute-sclite-nbest </B>
runs 
<B> compute-sclite </B>
on a set of N-best lists specified by 
<I> file-list </I>
and deposits the error counts in a directory
<I>output-dir</I>.<I></I><I></I><I></I>
These error counts may be used with the 
<A HREF="nbest-optimize.1.html">nbest-optimize(1)</A>
<B> -errors </B>
option to specify the hypothesis-level errors explicitly.
The references must be given in a file
<I> refs </I>
one per line, with the first word in each line matching
the file basename of the corresponding N-best list.
Additional options to be passed to 
<B> compute-sclite </B>
(and ultimately to 
<A HREF="sclite.1.html">sclite(1)</A>)
may be specified at the end of the command line.
The
<B> -filter </B>
option specifies a filtering
<I> script </I>
that edits the hypotheses before error computation.
<P>
<B> compare-sclite </B>
scores two sets of hypotheses 
<I> hyps1 </I>
and
<I> hyps2 </I>
for the same test set and computes in
how many cases the first or second set had lower word error.
The remaining options are as for
<B>compute-sclite</B>.<B></B><B></B><B></B>
The script ignores hypotheses for sentence that do not appear in both
hypothesis files, to ensure comparable scoring results.
<H2> SEE ALSO </H2>
<A HREF="nbest-format.5.html">nbest-format(5)</A>, <A HREF="ngram.1.html">ngram(1)</A>, <A HREF="nbest-lattice.1.html">nbest-lattice(1)</A>, <A HREF="nbest-optimize.1.html">nbest-optimize(1)</A>, <A HREF="sclite.1.html">sclite(1)</A>,
<A HREF="stm.5.html">stm(5)</A>, <A HREF="ctm.5.html">ctm(5)</A>.
<BR>
J.G. Fiscus, A Post-Processing System to Yield Reduced Word Error Rates:
Recognizer Output Voting Error Reduction (ROVER),
<I>Proc. IEEE Automatic Speech Recognition and Understanding Workshop</I>,
Santa Barbara, CA, 347-352, 1997.
<BR>
A. Stolcke et al., "The SRI March 2000 Hub-5 Conversational Speech
Transcription System",
<I>Proc. NIST Speech Transcription Workshop</I>, College Park, MD, 2000.
<H2> BUGS </H2>
<B> sentid-to-sclite </B>
has some assumptions about the structure of sentence IDs built-in and
may need to be modified for 
<B> compute-sclite </B>
and 
<B> compare-sclite </B>
to work.
<P>
<B> rescore-decipher </B>
<B> -pretty </B>
may not work correctly with the
<B> -limit-vocab </B>
option if the word mapping adds to the vocabulary subset used in the N-best
lists.
<H2> AUTHOR </H2>
Andreas Stolcke &lt;stolcke@icsi.berkeley.edu&gt;
<BR>
Copyright (c) 1995-2006 SRI International
<BR>
Copyright (c) 2011-2019 Andreas Stolcke
<BR>
Copyright (c) 2011-2018 Microsoft Corp.
</BODY>
</HTML>
