
## FruitBin

Because the dataset is hosted at a non-anonymous link, examples of the data are provided in the "data" folder.

The datasets for the 8 scenarios are currently available as zip archives under the "scenarios" section. The full large-scale dataset, which covers a broader range of data, will be released at the same location within a few weeks. Because of its size, users will be able to download only the subsets they need, such as RGB, depth, and other features.

Details of the dataset generation process are available in two code folders: "PickSim" covers the PickSim generation, and "fruitbin" contains the processing code and procedures used to create FruitBin, along with the scenarios designed for training.

To help reproduce the benchmark results, we provide the code folders for PVNet and DenseFusion, with detailed instructions on how to apply each model to the FruitBin dataset.


## Getting started

The processing script is run with the following command:

```
python main.py --World_begin="$id_begin" --Nb_world="$Nb" --dataset_id="$id_dataset" --occlusion_target_min=$occlusion_min --occlusion_target_max=$occlusion_max --rearrange=$rearrange --compute=$compute
```
For example:
```
python main.py --World_begin=1 --Nb_world=10000 --dataset_id=1 --occlusion_target_min=0.7 --occlusion_target_max=1.0 --rearrange=yes --compute=yes
```

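Large runs can be split into several invocations by varying `World_begin` and `Nb_world`. The sketch below only builds the command strings and does not run anything. It assumes that scene IDs are contiguous and that a run covers scenes `World_begin` to `World_begin + Nb_world - 1`. The helper name `chunked_commands` and the chunk size are hypothetical, and whether chunked runs are safe for the compute step has not been verified here.

```python
import shlex


def chunked_commands(first_scene, total_scenes, chunk_size,
                     dataset_id=1, occ_min=0.7, occ_max=1.0):
    """Yield one main.py command line per chunk of scenes (hypothetical helper)."""
    end = first_scene + total_scenes  # exclusive upper bound on scene IDs
    for begin in range(first_scene, end, chunk_size):
        # The last chunk may be shorter than chunk_size.
        count = min(chunk_size, end - begin)
        args = [
            "python", "main.py",
            f"--World_begin={begin}",
            f"--Nb_world={count}",
            f"--dataset_id={dataset_id}",
            f"--occlusion_target_min={occ_min}",
            f"--occlusion_target_max={occ_max}",
            "--rearrange=yes",
            "--compute=yes",
        ]
        yield shlex.join(args)


commands = list(chunked_commands(first_scene=1, total_scenes=10000, chunk_size=2500))
for command in commands:
    print(command)
```
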
The following table presents the different input parameters:

| Parameter |                  Description    |
| :---:   | :---: | 
| World_begin             | The ID of the first scene to be processed. |
| Nb_world                | The number of scenes to be processed.   |
| dataset_id              | The ID of the dataset to be processed. It is used to determine the path of the data to be processed. |
| rearrange               | Determines whether the script should perform data rearrangement from the data generated by PickSim.  |
| compute                 | Determines whether the script should perform post-processing for a specific scenario.|
| occlusion_target_min    | In the case of scenario pre-processing, specifies the lower bound of the desired visibility rate for filtering. |
| occlusion_target_max    | In the case of scenario pre-processing, specifies the upper bound of the desired visibility rate for filtering. |

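The effect of `occlusion_target_min` and `occlusion_target_max` can be illustrated with a minimal sketch. It assumes that the visibility rate is a float between 0 and 1 and that both bounds are inclusive; neither is confirmed by the script itself. The function name and the instance names are hypothetical.

```python
def within_visibility(rate, target_min, target_max):
    """Return True when a visibility rate lies inside the requested range.

    Assumption: both bounds are inclusive.
    """
    return target_min <= rate <= target_max


# Hypothetical visibility rates for three instances in one image.
instances = {"apple_3": 0.95, "banana_1": 0.42, "pear_7": 0.70}

# Keep only the instances whose visibility falls in [0.7, 1.0].
kept = {
    name: rate
    for name, rate in instances.items()
    if within_visibility(rate, target_min=0.7, target_max=1.0)
}
print(sorted(kept))
```
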

Furthermore, additional parameters can be modified in the main.py file, such as:

| Parameters |                  Description    |
| :---:   | :---: | 
| Nb_camera              | The number of cameras in the dataset                                           |
| dataset_src             | The path to the data generated by PickSim that needs to be processed |
| dataset_path            | The destination path for the rearranged dataset |
| choice               | Determines whether the script performs post-processing for depth sensor data or RGB sensor data |
| list_categories       | The list of categories to be considered in the dataset|
| new_size             | The desired size for resizing features for future training |


## Rearrange step


The rearrange step moves data from the original PickSim organization:

```
├── [Scene Id]
|   ├── Meta.json
│   ├── [Camera ID]
|   |    ├── color
|   |    │   └── image
|   |    ├── depth
|   |    │   ├── depth_map
|   |    │   ├── image
|   |    │   ├── normals_map
|   |    │   ├── pointcloud
|   |    │   └── reflectance_map
|   |    ├── ground_truth_depth
|   |    │   ├── 2d_detection
|   |    │   ├── 2d_detection_loose
|   |    │   ├── 3d_detection
|   |    │   ├── 3d_pose
|   |    │   ├── id_map
|   |    │   ├── instance_map
|   |    │   ├── occlusion
|   |    │   └── semantic_map
|   |    ├── ground_truth_rgb
|   |    │   ├── 2d_detection
|   |    │   ├── 2d_detection_loose
|   |    │   ├── 3d_detection
|   |    │   ├── 3d_pose
|   |    │   ├── id_map
|   |    │   ├── instance_map
|   |    │   ├── occlusion
|   |    │   └── semantic_map
|   |    ├── infra1
|   |    │   └── image
|   |    └── infra2
|   |        └── image

```

to the following structured organization:

```
├── Bbox_2d
├── Bbox_2d_loose
├── Bbox_3d
├── Depth
├── Instance_Segmentation
├── Meta
├── Occlusion
├── Pose
├── RGB
├── Semantic_Segmentation
```

The table below provides information about the folders in the PickSim generation, which correspond to different sensors in the Gazebo simulation, aligning with the various sensors of the RealSense camera D415.


| Parameters |                  Description    |
| :---:   | :---: | 
| Meta.json | Scene-oriented meta file that enumerates all the recorded data and provides the list of categories and their corresponding instance IDs. |
| Scene ID              | The ID of the scene for which the data was generated.                                          |
| Camera ID             | The ID of the camera for which the data was generated.    |
| color            | The name of the sensor recording RGB images with a resolution of 1920x1080, matching the data from the RealSense camera D415. |
| depth            | The name of the sensor recording depth data or point cloud with a resolution of 1280x720, matching the data from the RealSense camera D415. |
| ground_truth_rgb               | Recorded features from a new vision plugin for the color sensor with a resolution of 1920x1080. |
| ground_truth_depth              | Recorded features from a new vision plugin for the depth sensor with a resolution of 1280x720. |
| infra1       | Black and white infra channel 1 with a resolution of 1280x720.  |
| infra2    | Black and white infra channel 2 with a resolution of 1280x720. |



The resulting data is organized based on the type of features. The following table presents information about the resulting data and its corresponding raw data from PickSim:


| Parameters |  Equivalent PickSim |   Description    |
| :---:   | :---: | :---: | 
| Meta  |  [Scene Id]/Meta.json | Scene-oriented metadata file that enumerates all recorded data and provides a list of categories and instance IDs.|
| Bbox_2d | [Scene Id]/[camera_i]/ground_truth_depth/2d_detection | 2D bounding boxes of objects in the scene. |
| Bbox_2d_loose | [Scene Id]/[camera_i]/ground_truth_depth/2d_detection_loose | Loose 2D bounding boxes of objects in the scene. |
| Bbox_3d  | [Scene Id]/[camera_i]/ground_truth_depth/3d_detection | 3D bounding boxes of objects in the scene. |
| Depth | [Scene Id]/[camera_i]/depth/images  | Depth map captured by the depth sensor.  |
| Instance_Segmentation |  [Scene Id]/[camera_i]/ground_truth_depth/id_map | Instance segmentation mask of objects in the scene. |
| Semantic_Segmentation | [Scene Id]/[camera_i]/ground_truth_depth/semantic_map  | Semantic segmentation mask of objects in the scene. |
| Occlusion | [Scene Id]/[camera_i]/ground_truth_depth/occlusion | Occlusion rate of each instance in the scene.  |
| Pose  | [Scene Id]/[camera_i]/ground_truth_depth/3d_pose | 6D pose (position and orientation) of each instance in the scene.   |
| RGB  | [Scene Id]/[camera_i]/depth/image | RGB image captured by the depth sensor. |

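The correspondence between rearranged folders and PickSim paths can be expressed as a lookup table. The sketch below covers only a few rows and only resolves paths, without copying files. The function name `source_path`, the root directory, and the scene and camera folder names in the usage line are hypothetical. The actual rearrange code in the "fruitbin" folder may be organized differently.

```python
from pathlib import PurePosixPath

# Rearranged folder -> sub-path below [Scene Id]/[Camera ID] in the PickSim layout.
# Only a subset of the features is listed here.
CAMERA_LEVEL_SOURCES = {
    "Bbox_2d": "ground_truth_depth/2d_detection",
    "Bbox_3d": "ground_truth_depth/3d_detection",
    "Occlusion": "ground_truth_depth/occlusion",
    "Pose": "ground_truth_depth/3d_pose",
    "Semantic_Segmentation": "ground_truth_depth/semantic_map",
}


def source_path(root, scene_id, camera_id, feature):
    """Resolve where a rearranged feature comes from in the PickSim tree."""
    scene_dir = PurePosixPath(root) / str(scene_id)
    if feature == "Meta":
        # Meta.json is stored once per scene, not once per camera.
        return scene_dir / "Meta.json"
    return scene_dir / str(camera_id) / CAMERA_LEVEL_SOURCES[feature]


# Hypothetical scene and camera folder names.
print(source_path("picksim_output", 12, "camera_3", "Pose"))
```
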
The ground truth annotations use only the ground_truth_depth data, which keeps them consistent with RGB-D data of the same resolution. All other features are nevertheless included in the dataset in case the community needs them. You can access the dataset, including these features, at our link.

## Compute step for PVNet and the scenarios

The compute step takes the rearranged data as input and processes it for training. It formats the data into the structure required to train the PVNet 6D pose estimation model. For training purposes, all the data is organized by fruit category, as illustrated in the following structure:

```
├── Fruit_i
│   ├── Bbox
│   ├── Bbox_3d_Gen
│   ├── Depth_Gen
│   ├── Depth_resized
│   ├── FPS
│   ├── FPS_resized
│   ├── Instance_Mask
│   ├── Instance_Mask_resized
│   ├── Labels
│   ├── Meta_Gen
│   ├── Models
│   ├── Pose_transformed
│   ├── RGB_Gen
│   ├── RGB_resized
│   └── Splitting
```

The table below describes the generated features. The processed data reflects the filtering parameters specified in the main.py script, such as the desired level of occlusion. For FruitBin, the scene scenarios are divided into 6000 scenes for training, 2000 scenes for evaluation, and 10000 scenes for testing. In addition, 9 cameras are allocated to training and 3 cameras each to evaluation and testing, for a total of 15 cameras.


| Parameters |                  Description    |
| :---:   | :---: | 
| Fruit_i | The fruit category being considered        |          
| Meta_Gen | Metadata describing fruit-specific information such as Scene ID, Camera ID, a list of instance IDs related to the fruit, and associated occlusion rates | 
| Bbox    | 2D bounding boxes |
| Bbox_3d_Gen    | 3D bounding boxes |
| Depth_Gen    | Depth map data with a resolution of 1280x720 |
| Depth_resized   | Resized depth map data with a resolution of 640x480 for training |
| FPS    | Farthest Point Sampling (FPS) key points for the 1280x720 image used in PVNet |
| FPS_resized    | Resized FPS data with a resolution of 640x480 for training in PVNet |
| Instance_Mask    | Instance mask data with a resolution of 1280x720 |
| Instance_Mask_resized | Resized instance mask data with a resolution of 640x480 for training |
| Labels    | Instance mask in the YOLOv8 format (generated using the 'compute label' script explained below) |
| Models    | Meshes of the 8 fruits in a common PLY format |
| Pose_transformed   | 6D pose annotations in the PVNet format |
| RGB_Gen   | RGB image data with a resolution of 1280x720 |
| RGB_resized    | Resized RGB image data with a resolution of 640x480 for training |
| Splitting    | This folder is only available when the dataset is downloaded online. It contains a list of .txt splitting files for different scenarios, describing the train/eval/test split. |

This step fully prepares the data for PVNet training. For more details, refer to the code folder "pvnet_fruitbin".

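For readers unfamiliar with Farthest Point Sampling, the sketch below shows the generic greedy algorithm on a small 3D point set. It is not the code used to produce the FPS folders: the starting index, the toy points, and the function name are assumptions made for illustration only.

```python
import math


def farthest_point_sampling(points, k, start_index=0):
    """Greedily pick k indices so that each new point is far from those already picked."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    selected = [start_index]
    # min_dist[i] is the distance from points[i] to its nearest selected point.
    min_dist = [math.dist(p, points[start_index]) for p in points]
    while len(selected) < k:
        # Pick the point that is farthest from the current selection.
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_index)
        # Update the nearest-selected distances with the newly added point.
        for i, p in enumerate(points):
            d = math.dist(p, points[next_index])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy point set standing in for mesh vertices.
points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (5, 5, 5), (0.1, 0, 0)]
keypoint_indices = farthest_point_sampling(points, k=3)
print(keypoint_indices)
```

Starting from a fixed index keeps the result deterministic. Other implementations may start from a different point, such as the one closest to the object center, which changes the keypoints that are selected.
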
## Compute step for DenseFusion

The final step involves preparing the data for DenseFusion training. To accomplish this, you can use the following command as an example:

```
python3 compute_label.py --path_dataset=Path/FruitBin1/FruitBin_low_1_0.7_1.0/ --target_folder=Generated_Cameras --path_DF_data=Path/DenseFusion01_Cameras/datasets/linemod/Linemod_preprocessed/data --occ_data=""
```

The following table presents the input parameters:

| Parameters |                  Description    |
| :---:   | :---: | 
| path_dataset              | The path to the preprocessed dataset generated during the previous compute step.   |
| target_folder             | The specific scenario to be considered for DenseFusion training. |
| path_DF_data                | The path to the DenseFusion folder where the training data will be placed.|
| occ_data              | An additional parameter that modifies the name of the resulting .txt DenseFusion splitting file when multiple scenarios are considered within the same folder. |

The preprocessing for DenseFusion is now complete. For more detailed information, please refer to the code folder "DenseFusion".


## License
The dataset is licensed under the CC BY-NC-SA license.
