{
  "InnerLevel": {
    "multiple models": "Indicates whether the approach employs more than one central shared model. \n\n*Example hint*:\nMultiple employed models could each solve a supervised or unsupervised task. In a supervised setup at the level of multiple models, an additional mechanism could be required to index the appropriate model for prediction. If the correct model is automatically queried for prediction, independently of which task an input belongs to, this level is unsupervised. Finally, no visual mark on the compass corresponds to the scenario where only a single model is used.",
    "federated": "Leaves the training data distributed on devices and learns one or more shared models by communicating locally computed updates or quantities. \n\n*Example hint*:\nA network could contain a large number of devices, with groups corresponding to different regions or different processing devices, labels on how many devices are actively participating, and so on. In that sense, communication in federated learning could provide additional supervised information that could be exploited to steer the model. Conversely, a fully unsupervised route could simply average communicated updates without any labels for the aforementioned characteristics.",
    "online": "A learning paradigm in which training data points arrive in sequential order and the existing model is updated immediately; data is typically not revisited. \n\n*Example hint*:\nThe common offline continual learning setting considers several tasks in sequence, but trains for several epochs to convergence within each task. In an online setting, the increased stochasticity alongside data drift becomes a greater challenge, requiring a mechanism to ensure that the model stays consistent across updates. For example, one could regularize parameter deviations over time according to a supervised importance measure or rely on various unsupervised quantities, such as exponential moving averages.",
    "open world": "In an open world, the additional challenge for a learner is to robustly identify unknown, sometimes corrupted or perturbed, and potentially meaningless data instances. \n\n*Example hint*:\nBlack-box deep learning methods do not inherently possess robust identification mechanisms. If a mechanism is included to recognize instances that deviate from the observed data distribution, it is supervised if its construction requires class labels, e.g. a classifier's predictive entropy. A corresponding unsupervised example could be a deviation in e.g. a reconstruction loss or a divergence measure with respect to an arbitrary feature space.",
    "multiple modalities": "Indicates whether the approach handles multiple modalities at once. \n\n*Example hint*:\nIf an approach only learns on one modality, such as text or images, then no indication of multi-modality is marked in the CLEVA-Compass. If the constructed system is able to handle multiple modalities, whether the multi-modality aspect is supervised or unsupervised reduces to whether or not the system requires a label indicating which modality an instance originates from, e.g. to condition a modality-specific computation.",
    "active data query": "The learning algorithm is allowed and able to choose the data from which it learns. \n\n*Example hint*:\nA traditional fixed-sequence benchmark setting has no active data query component. An alternative is to actively query data for inclusion in the optimization, regardless of whether data is available in a pool, a stream, etc. To this end, a utility measure for the prospective inclusion of a data instance is generally constructed. Correspondingly, this utility measure can either require supervision or be entirely unsupervised.",
    "task order discovery": "Indicates whether the method discovers improved task orders, in contrast to fixed or random ordering. \n\n*Example hint*:\nA fixed sequence of benchmark data corresponds to no mark for this compass element. Alternatively, an approach can decide which task is meaningful to learn next. In the example of a classifier, a method that chooses an improved task order does so in a supervised fashion if it e.g. makes use of prospective class labels to decide which class would be best to learn next. Conversely, if the model discovers an improved task order based only on e.g. divergences or distances in a constructed feature space that do not require labels to compute, the task order discovery can be said to be unsupervised.",
    "task agnostic": "A method is said to be task agnostic if, for prediction in a deployed model, it does not require any additional information about which task a data instance originates from. \n\n*Example hint*:\nA supervised manifestation would be to explicitly include a time step or task label in the learning process to condition the prediction. A fully unsupervised variant would inherently provide a correct prediction for any data instance from any previously observed task without any such information. In stark contrast, no mark on either the supervised or unsupervised portion of the task agnostic star plot element indicates that an approach cannot solve the task assignment challenge at all, implying that a task oracle is required for prediction.",
    "episodic memory": "Indicates whether an episodic memory is constructed to effectively rehearse so-called exemplars or prototypical data instances. \n\n*Example hint*:\nIntuitively, if a method does not employ an auxiliary episodic memory, no mark is indicated on the CLEVA-Compass. If the construction mechanism for this episodic memory relies on labels in the data, e.g. by approximating a per-class mean, then it is supervised. A straightforward unsupervised example would be to fill the episodic memory by sampling random data instances.",
    "generative": "Indicates whether the approach involves a generative model. \n\n*Example hint*:\nA typical deep classifier is discriminative, modeling p(y|x), where x denotes data and y denotes labels. A supervised generative variant would correspond to a model that learns the joint distribution p(x, y) instead, whereas unsupervised generative models learn or approximate only p(x). Even if the objective is classification, the supervised generative variant then also bases its decisions on the underlying data distribution. Conversely, not every unsupervised task necessarily requires a generative model.",
    "uncertainty": "Indicates whether the approach quantifies and uses uncertainty. \n\n*Example hint*:\nSome uncertainty measures, e.g. the entropy of classifier predictions, can require calibration in order to provide meaningful values. Such calibration procedures can be interpreted as providing a supervised signal of what uncertainty 'should look like'. In contrast, if the method provides inherent uncertainties, it can be said to be unsupervised. Naturally, many approaches do not provide any uncertainty estimates at all."
  },
  "OuterLevel": {
    "compute time": "Computation time used in practice. Different algorithms and operations can consume dramatically different amounts of compute time, which additionally depends on the hardware, even when implemented in the same software.",
    "mac operations": "The number of multiply-accumulate (MAC) operations is an alternative way of reporting compute requirements that is not inherently tied to specific software and hardware.",
    "communication": "Communication costs play a critical role in distributed or decentralized federated settings, where the time spent on many rounds of communication can rapidly exceed that of model computations.",
    "forgetting": "The amount of forgetting quantifies the difference between the maximum knowledge gained about a task at any point in the past learning process and the knowledge that is currently still held about it.",
    "forward transfer": "Forward transfer determines the influence that an observed task has on a future task, quantifying the capacity for zero-shot learning.",
    "backward transfer": "Backward transfer captures the improvement or deterioration that an already observed task experiences when a new task is learned.",
    "openness": "Openness of the world describes the ratio between data points that can be assumed to originate from the investigated data distribution and potentially unknown, corrupted, or perturbed instances.",
    "parameters": "Total number of parameters. A trivial solution to continual learning would be to allocate increasing amounts of separate, independent parameters over time, which motivates the need for parameter efficiency.",
    "memory": "How much memory is used. Provides a combined perspective on data storage and model parameter efficiency.",
    "stored data": "Amount of original data retained in a buffer, if any. Rehearsal becomes an increasingly trivial solution as the buffer size approaches that of the original dataset.",
    "generated data": "Amount of data that is generated, if any. The quality and number of data instances sampled from a generative model determine the effectiveness of rehearsal.",
    "optimization steps": "The number of optimization steps is crucial to gauge empirical convergence. The number of optimization steps on revisited data also distinguishes continual offline sequences from truly online scenarios.",
    "per task metric": "Task-specific parts of reported losses or metrics allow for a deeper assessment of each task’s evolution over time, e.g. base and new for the first and most recent task, respectively, in addition to the overall average.",
    "task order": "The order in which tasks are introduced, even if randomly sampled in practice. Depending on the constructed curriculum, the order has a significant impact on the attainable continual learning performance.",
    "data per task": "What data is introduced sequentially. The number of data instances is a primary indicator of sample efficiency and provides context for e.g. few-shot settings."
  }
}
