.. _UE Manager:

UE Manager
==========

``UEManager`` is the central class of the evaluation pipeline: it calculates the underlying statistics required by the estimators from model generations, estimates uncertainty scores, and stores the results. It is invoked by the ``polygraph_eval`` script, and after a successful benchmark evaluation it holds the statistics and results in the following attributes:

- ``stats`` is a defaultdict where keys are names of statistics and values are the statistics themselves. Values of the statistics are not restricted to any particular type and can be anything from a single number to a complex object. Most (but not all) statistics are outputs of ``StatCalculator`` objects that were used during the evaluation.
- ``estimations`` stores outputs of the ``Estimator`` objects that were specified at manager creation. It is a defaultdict where keys are ``(level, estimator_name)`` tuples and values are the corresponding estimator's outputs. The ``level`` is one of ``sequence``, ``token`` or ``claim`` and represents the type of uncertainty estimation method. For sequence-level estimators, values are 1D numpy arrays with length equal to the number of examples in the dataset. For token-level estimators, values are lists of numpy arrays, where the length of the outer list is the number of examples, and each inner array has length equal to the number of tokens generated by the model for the corresponding example, excluding the EOS token. For claim-level estimators, values are lists of numpy arrays, where the length of the outer list is the number of examples, and each inner array has length equal to the number of claims generated by the model for the corresponding example.
- ``gen_metrics`` keeps quality metrics of generated sequences. It is a defaultdict where keys are ``(level, metric_name)`` tuples and values hold the metric values. The ``level`` is one of ``sequence`` or ``claim`` and represents the type of quality metric. For sequence-level metrics, values are 1D numpy arrays with length equal to the number of examples in the dataset. For claim-level metrics, values are lists of numpy arrays, where the length of the outer list is the number of examples, and each inner array has length equal to the number of claims generated by the model for the corresponding example.
- ``metrics`` stores comparative scores of the uncertainty estimation methods under evaluation. It is a dict that holds a score for each combination of compatible estimator, generation metric and uncertainty estimation metric (e.g. PRR or RCC). Only pairs of estimators and generation metrics that have the same ``level`` are included.
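
The level-matching rule for ``metrics`` can be illustrated with a small standalone sketch. It does not use the library itself: the dictionaries below are hypothetical stand-ins for ``estimations`` and ``gen_metrics``, plain lists replace numpy arrays, and the names ``EstimatorA``, ``EstimatorB``, ``MetricX`` and ``compatible_pairs`` are made up for illustration.

.. code-block:: python

   from collections import defaultdict

   # Hypothetical stand-ins for man.estimations and man.gen_metrics.
   # Plain lists are used in place of numpy arrays; all names are invented.
   estimations = defaultdict(list)
   estimations[("sequence", "EstimatorA")] = [0.12, 0.80, 0.33]
   estimations[("token", "EstimatorB")] = [[0.1, 0.4], [0.9], [0.2, 0.2, 0.5]]

   gen_metrics = defaultdict(list)
   gen_metrics[("sequence", "MetricX")] = [1.0, 0.0, 1.0]


   def compatible_pairs(estimations, gen_metrics):
       """Return (estimator key, metric key) pairs whose levels match."""
       return [
           (e_key, g_key)
           for e_key in estimations
           for g_key in gen_metrics
           if e_key[0] == g_key[0]  # first tuple element is the level
       ]


   pairs = compatible_pairs(estimations, gen_metrics)
   print(pairs)

In this toy setup only the sequence-level estimator is paired with the sequence-level metric; the token-level estimator has no generation metric at the same level, so it would not appear in ``metrics``.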

The UEManager object can be persisted using the ``save`` method:

.. code-block:: python

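   # Create the manager with the desired configuration
   man = UEManager(*args, **kwargs)
   # Run the evaluation
   man()
   # Persist the collected results
   man.save('path/to/save')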

When using the ``polygraph_eval`` script, the manager object is saved automatically to the directory specified by the ``save_path`` config parameter. The manager does not serialize itself in full; instead, it stores the attributes described above as a dict using ``torch.save``. The saved object can therefore be loaded with ``torch.load``, or directly with the ``load`` method of ``UEManager`` itself:

.. code-block:: python

   man = UEManager.load('path/to/save')
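
After loading, it can be useful to check that the stored estimations have the expected layout. The helper below is a minimal sketch, not part of the library: ``describe_estimations`` is a hypothetical name, and it relies only on ``len``, so it works on numpy arrays as well as on the plain lists used in the example data.

.. code-block:: python

   def describe_estimations(estimations):
       """Map each (level, estimator_name) key to a short shape summary."""
       summary = {}
       for (level, name), values in estimations.items():
           if level == "sequence":
               # One score per example.
               summary[(level, name)] = {"n_examples": len(values)}
           else:
               # One array per example; inner length counts tokens or claims.
               summary[(level, name)] = {
                   "n_examples": len(values),
                   "inner_lengths": [len(v) for v in values],
               }
       return summary


   # Invented example data standing in for a loaded ``estimations`` dict.
   example = {
       ("sequence", "EstimatorA"): [0.12, 0.80, 0.33],
       ("token", "EstimatorB"): [[0.1, 0.4], [0.9], [0.2, 0.2, 0.5]],
   }
   summary = describe_estimations(example)
   print(summary)

With a loaded manager, the same helper could presumably be called as ``describe_estimations(man.estimations)``, assuming ``load`` restores that attribute as described above.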
