Cluster-trace-v2018 covers about 4000 machines over a period of 8 days and consists of 6 tables (each stored as a file). A brief introduction to the tables follows; the details of each table are given in section 2.2 Schema.

machine_meta.csv: the meta information and event information of machines.
machine_usage.csv: the resource usage of each machine.
container_meta.csv: the meta information and event information of containers.
container_usage.csv: the resource usage of each container.
batch_instance.csv: information about instances in the batch workloads.
batch_task.csv: information about tasks in the batch workloads. Note that the DAG information of each job's tasks is described in the task_name field.

A job typically consists of several tasks whose dependencies are expressed as a DAG (Directed Acyclic Graph). Each task has a number of instances, and a task is considered "finished" only when all of its instances are completed. For example, if task-2 depends on task-1, no instance of task-2 can start before all instances of task-1 are completed. The DAG of the tasks in a job can be deduced from the task_name field of all tasks in that job, as explained in the following example.

The DAG of Job-A is shown in the following figure. Job-A consists of 5 tasks with some dependencies. The DAG of the 5 tasks is expressed through their task_name values. For each task:

M1: task1's task_name is M1, which means that task1 is an independent task and can start without waiting for any other task. The remaining tasks follow the same convention.
M2_1: task2 depends on the completion of task1.
M3_1: task3 depends on the completion of task1.
R4_2: task4 depends on the completion of task2.
M5_3_4: task5 depends on both task3 and task4, that is, task5 cannot start before all instances of both task3 and task4 are completed.
Note that for DAG information, only the numbers in the task_name matter, while the first character (e.g. M or R in the example) has nothing to do with dependencies.
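
The naming convention above can be turned into a dependency map with a few lines of Python. This is a minimal sketch, not part of the trace tooling: the function names parse_task_name and build_dag are hypothetical, and it assumes every task_name follows the letter-prefix-plus-numbers pattern of the Job-A example. Names that do not match raise a ValueError rather than being guessed at.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def parse_task_name(task_name):
    """Return (task_id, [dependency ids]) for a DAG-style task_name.

    Assumes the pattern from the Job-A example: one or more leading
    letters, then numbers separated by underscores.
    """
    match = re.fullmatch(r"[A-Za-z]+(\d+(?:_\d+)*)", task_name)
    if match is None:
        raise ValueError(f"unrecognized task_name: {task_name!r}")
    # The leading letters are dropped; only the numbers carry DAG information.
    ids = [int(part) for part in match.group(1).split("_")]
    # The first number identifies the task, the rest are its dependencies.
    return ids[0], ids[1:]


def build_dag(task_names):
    """Map each task id to the list of task ids it depends on."""
    return dict(parse_task_name(name) for name in task_names)


job_a = ["M1", "M2_1", "M3_1", "R4_2", "M5_3_4"]
dag = build_dag(job_a)
print(dag)  # {1: [], 2: [1], 3: [1], 4: [2], 5: [3, 4]}

# One valid execution order: every task appears after its dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

TopologicalSorter accepts a mapping from each node to its predecessors, which is exactly the shape build_dag returns, so no extra conversion is needed to obtain an order in which the tasks of a job may run.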

The number of instances of each task is given in a separate field, instance_num.

Note that this is the information for the raw data. It will be provided as a single time-series