Key point is key in resolving the offline three-dimensional bin packing problem

Xinyuan Tian; Pengzhan Qu; Hongsheng Qi

22 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission

Keywords: deep reinforcement learning; three-dimensional offline packing

Abstract: With the recent surge in deep learning and reinforcement learning, researchers have begun to apply deep reinforcement learning to the offline three-dimensional bin packing problem. However, the valid action space in this problem is very large, which makes it difficult for a model to converge as the number of boxes increases. Reducing the number of valid actions is therefore crucial. Previous studies have developed heuristic rules for this purpose. Some of these rules reduce the number of valid actions so drastically that they may exclude the optimal solution, while others do not reduce it enough, so the model still struggles to converge when there are many boxes. In response, we propose a heuristic rule under which boxes are placed only at certain specific locations, which we refer to as Key Points, while all other locations are masked. This method integrates well with existing deep reinforcement learning models for the offline three-dimensional bin packing problem. We demonstrate the efficacy of this heuristic rule theoretically and show empirically that, when our method is combined with existing models, they can readily be trained with four times the number of boxes. The model converges ten times faster than before, and its performance also improves. Interestingly, even without retraining the model, applying our method only in the testing phase yields better results than the original method. We also compare our method with other heuristic rules. Experimental results show that our approach strikes a balance between convergence speed and performance.
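
The masking idea in the abstract can be illustrated with a minimal sketch. The abstract does not define how Key Points are computed, so the candidate rule below (the bin origin plus the two footprint corners next to each placed box) is only a stand-in assumption, not the authors' definition. The function names `candidate_points`, `action_mask`, and `masked_softmax` are hypothetical, and the bin is reduced to a 2D floor grid for brevity.

```python
import math

def candidate_points(placed, bin_len, bin_wid):
    """Stand-in candidate rule (an assumption, NOT the paper's Key Point rule):
    the origin plus the two footprint corners adjacent to each placed box."""
    points = {(0, 0)}
    for x, y, l, w in placed:
        points.add((x + l, y))
        points.add((x, y + w))
    # Keep only points that lie inside the bin floor.
    return {(x, y) for x, y in points if x < bin_len and y < bin_wid}

def action_mask(placed, box, bin_len, bin_wid):
    """True only at candidate points where the box footprint stays inside the bin.
    Overlap and stability checks are omitted in this sketch."""
    l, w = box
    mask = [[False] * bin_wid for _ in range(bin_len)]
    for x, y in candidate_points(placed, bin_len, bin_wid):
        if x + l <= bin_len and y + w <= bin_wid:
            mask[x][y] = True
    return mask

def masked_softmax(logits, mask):
    """Softmax over the flattened grid, with masked positions given probability 0."""
    flat = [v if m else -math.inf
            for row_l, row_m in zip(logits, mask)
            for v, m in zip(row_l, row_m)]
    top = max(flat)
    if top == -math.inf:
        raise ValueError("no valid action under the mask")
    exps = [math.exp(v - top) for v in flat]
    total = sum(exps)
    return [e / total for e in exps]

# One 2x2 box already sits at the origin of a 4x4 bin; place another 2x2 box.
mask = action_mask([(0, 0, 2, 2)], (2, 2), 4, 4)
probs = masked_softmax([[0.0] * 4 for _ in range(4)], mask)
```

In this toy case only 3 of the 16 grid positions remain selectable, which is the kind of action-space reduction the abstract describes; a policy network's logits would take the place of the uniform zeros used here.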

Supplementary Material: zip

Primary Area: general machine learning (i.e., none of the above)

Submission Number: 4839
