{"@context":{"@language":"en","@vocab":"https://schema.org/","citeAs":"cr:citeAs","column":"cr:column","conformsTo":"dct:conformsTo","cr":"http://mlcommons.org/croissant/","data":{"@id":"cr:data","@type":"@json"},"dataBiases":"cr:dataBiases","dataCollection":"cr:dataCollection","dataType":{"@id":"cr:dataType","@type":"@vocab"},"dct":"http://purl.org/dc/terms/","extract":"cr:extract","field":"cr:field","fileProperty":"cr:fileProperty","fileObject":"cr:fileObject","fileSet":"cr:fileSet","format":"cr:format","includes":"cr:includes","isEnumeration":"cr:isEnumeration","isLiveDataset":"cr:isLiveDataset","jsonPath":"cr:jsonPath","key":"cr:key","md5":"cr:md5","parentField":"cr:parentField","path":"cr:path","personalSensitiveInformation":"cr:personalSensitiveInformation","recordSet":"cr:recordSet","references":"cr:references","regex":"cr:regex","repeated":"cr:repeated","replace":"cr:replace","sc":"https://schema.org/","separator":"cr:separator","source":"cr:source","subField":"cr:subField","transform":"cr:transform","wd":"https://www.wikidata.org/wiki/"},"alternateName":"Ransomware \u0026 Benign NVMe streams - labeled per-command","conformsTo":"http://mlcommons.org/croissant/1.0","license":{"@type":"sc:CreativeWork","name":"CC BY-NC-SA 4.0","url":"https://creativecommons.org/licenses/by-nc-sa/4.0/"},"distribution":[{"contentUrl":"https://www.kaggle.com/api/v1/datasets/download/johndoenvme/clear-command-level-annotated-ransomware","contentSize":"131.357 GB","encodingFormat":"application/zip","@id":"archive.zip","@type":"cr:FileObject","name":"archive.zip","description":"Archive containing all the contents of the CLEAR - Command LEvel Annotated Ransomware dataset"},{"includes":"*.json","containedIn":{"@id":"archive.zip"},"encodingFormat":"application/json","@id":"application-json_fileset","@type":"cr:FileSet","name":"application/json files","description":"application/json files contained in 
archive.zip"},{"includes":"*.csv","containedIn":{"@id":"archive.zip"},"encodingFormat":"text/csv","@id":"text-csv_fileset","@type":"cr:FileSet","name":"text/csv files","description":"text/csv files contained in archive.zip"},{"includes":"*.txt","containedIn":{"@id":"archive.zip"},"encodingFormat":"text/plain","@id":"text-plain_fileset","@type":"cr:FileSet","name":"text/plain files","description":"text/plain files contained in archive.zip"}],"keywords":["data type \u003E tabular","subject \u003E cyber security","task \u003E binary-classification","technique \u003E time series analysis"],"isAccessibleForFree":true,"isLiveDataset":true,"includedInDataCatalog":{"@type":"sc:DataCatalog","name":"Kaggle","url":"https://www.kaggle.com"},"creator":{"@type":"sc:Person","name":"JohnDoeNVMe","url":"/johndoenvme","image":"https://storage.googleapis.com/kaggle-avatars/thumbnails/default-thumb.png"},"publisher":{"@type":"sc:Organization","name":"Kaggle","url":"https://www.kaggle.com/organizations/kaggle","image":"https://storage.googleapis.com/kaggle-organizations/4/thumbnail.png"},"thumbnailUrl":"https://storage.googleapis.com/kaggle-datasets-images/6509812/10516971/fe8600f782ccf1c6e7e804cdb2ff67d0/dataset-card.jpg?t=2025-01-20-09-24-10","dateModified":"2025-05-15T14:01:08.51","datePublished":"2025-01-21T10:12:39.5048019","@type":"sc:Dataset","name":"CLEAR - Command LEvel Annotated Ransomware","url":"https://www.kaggle.com/datasets/johndoenvme/clear-command-level-annotated-ransomware","description":"# **Introduction**\nThe CLEAR dataset is designed to support research in ransomware detection using NVMe stream analysis. This dataset contains recordings of benign and ransomware activities on various disk sizes, with per-command labels. 
Specifically, our collection includes recordings of mixed workloads, where both benign and ransomware activities coexist in the same recording.\n\n# **Dataset Structure**\nThe dataset consists of two main folders:\n\n- Benign: Contains recordings of normal disk activity.\n- Ransomware (RW): Contains recordings of ransomware attacks.\n\n# **Recording Structure**\nEach recording is stored in a separate folder, which contains:\n\n## recording_metadata.json\nA JSON file with metadata about the recording, including:\n- Type: The type of recording (RW / Benign)\n- Virus Family: The associated virus family, if applicable\n- SHA-256: Hash of the virus strain (if applicable)\n- Benign Tasks: Description of background benign process(es) (if applicable)\n- Duration: The recording\u0027s duration, in seconds\n- No. of Commands: The number of NVMe commands in the recording\n- Read Traffic: The total volume of data requested by read commands in the recording (GiB)\n- Write Traffic: The total volume of data requested by write commands in the recording (GiB)\n- Disk Size: Size of the disk where the simulation took place\n- Victim Data: The data set on which the recorded workload operated (one of 9 sets, see below)\n- Source: Whether the recording is in-house or from the Storage Networking Industry Association (SNIA)\n- OS: The operating system on which the recording was made\n\n### Victim Data Sets\n- Set 1, 2: NapierTiny (17.8GB) \u002B in-house curation of user files (6GB)\n- Set 3, 4: NapierSmall (157GB) \u002B in-house curation of user files (11.3GB) \u002B downloads (10GB)\n- Set 5: As in Sets 3, 4 \u002B png files (57GB) from the DiffusionBM-2M dataset repository\n- Set 6: png files (57GB) from the DiffusionBM-2M dataset repository \u002B in-house curation of user files (11.3GB)\n- Set 7: First set of 25GB png files from the DiffusionBM-2M dataset repository \u002B in-house curation of user files (11.3GB)\n- Set 8: Second set of 25GB png files from the DiffusionBM-2M dataset repository \u002B in-house curation of user 
files (11.3GB)\n- Set 9: In-house curation of user files (11.3GB) \u002B downloads (10GB)\n\n## recording.parquet\nA Parquet file containing the recording data, with the following columns:\n\n| Column Name      | Description                                                                                                                         |\n|------------------|-------------------------------------------------------------------------------------------------------------------------------------|\n| Timestamp         | The timestamp in seconds                                                                                                           |\n| OpCode            | The operation code: 1 for write, 2 for read                                                                                        |\n| Offset (Bytes)    | The starting point of the data operation on disk                                                                                   |\n| Size (Bytes)      | The amount of data transferred                                                                                                     |\n| Label             | The ransomware label: 0 for Benign, 1 for Ransomware                                                                               |\n| WaR               | Write after Read overlap: how many bytes of a particular write command overlap previous read commands within the same disk region   |\n| RaR               | Read after Read overlap: how many bytes of a particular read command overlap previous read commands within the same disk region     |\n| RaW               | Read after Write overlap: how many bytes of a particular read command overlap previous write commands within the same disk region   |\n| WaW               | Write after Write overlap: how many bytes of a particular write command overlap previous write commands within the same disk region |\n| WaR Lapse         | Time lapsed in seconds between the first associated read command 
and the current write command                                     |\n| RaR Lapse         | Time lapsed in seconds between the first associated read command and the current read command                                      |\n| RaW Lapse         | Time lapsed in seconds between the previous write command and the current read command                                             |\n| WaW Lapse         | Time lapsed in seconds between the previous write command and the current write command                                            |\n| Process ID        | The process ID of the command                                                                                                      |\n| Process Name      | The process name of the command                                                                                                    |\n| Path Name         | The particular file initiated by the command                                                                                       |\n\n## process_tree.parquet (if available)\nA Parquet file containing the process tree data, with the following columns:\n\n| Column Name         | Description                                           |\n|----------------------|-------------------------------------------------------|\n| Process Name         | The process name                                     |\n| Process ID           | The process ID                                       |\n| Parent Process ID    | The process ID that initiated the current process    |\n| Process Tree         | The process progression until the current process    |\n| Command Line         | The command that initiated this process              |\n\n\n\n# **Additional Files**\nThe dataset also includes:\n\n1. sample.csv: A 1000-row sample of a recording\n2. 
metadata.csv: A combined description of the entire dataset (the contents of all recordings\u0027 recording_metadata.json files)\n\n# **Usage and Integration Guidelines**\nExample use cases for the dataset:\n* Exploratory Data Analysis (EDA): Visualize data distributions, trends, and outliers.\n* Binary Classification: Build models to classify data as ransomware or benign.\n* Feature Engineering: Explore effective ways to represent ransomware.\n\nTo import this dataset into a full training and inference pipeline, follow the instructions under the dataset\u0027s [code](https://github.com/ravensorioles/rwdetection/blob/main/README.md).\n\n**We hope you find this dataset useful. If you have any questions or need additional information, please don\u0027t hesitate to contact us.**"}