Abstract: Random Forest (RF) is one of the state-of-the-art supervised learning methods in Machine Learning and inherently consists of two steps: the training step and the evaluation step. In applications where the system needs to be updated periodically, the training step becomes the bottleneck of the system, imposing hard constraints on its adaptability to a changing environment. In this work, a novel FPGA architecture for accelerating the RF training step is presented, exploring key features of the device. By combining fine-grained data-flow processing at the low level with the high-level parallelism inherent in the algorithm, significant acceleration factors are achieved. Key to these gains are a novel FIFO-based merge sorter module for FPGAs, a core component of the architecture that exhibits high efficiency in memory utilisation, and a batch training strategy that enables full exploitation of the high memory bandwidth offered by the on-chip memory of FPGA devices. The proposed system achieves speed-up factors of up to 230x over a 3 GHz Intel Core i5 processor when an Altera Stratix IV device is utilised on classification problems drawn from VOC2007.
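The abstract does not describe the internals of the merge sorter, so the following is only a minimal software sketch of the general idea of merge sorting through FIFOs, not the proposed hardware module. It assumes each sorted run is held in a queue whose head is the only element that can be read, and the names `merge_fifos` and `fifo_merge_sort` are hypothetical, chosen for illustration.

```python
from collections import deque


def merge_fifos(a, b):
    """Merge two sorted FIFOs into one sorted FIFO, reading only their heads."""
    out = deque()
    while a and b:
        # Compare the two head elements and dequeue the smaller one
        # (on a tie, the element from `a` is taken).
        out.append(a.popleft() if a[0] <= b[0] else b.popleft())
    # At most one input still holds data; drain it in order.
    out.extend(a)
    a.clear()
    out.extend(b)
    b.clear()
    return out


def fifo_merge_sort(values):
    """Sort `values` by repeatedly merging pairs of sorted runs held in FIFOs."""
    # Start with one single-element run per input value.
    runs = deque(deque([v]) for v in values)
    if not runs:
        return []
    while len(runs) > 1:
        # Take the two oldest runs, merge them, and queue the result.
        runs.append(merge_fifos(runs.popleft(), runs.popleft()))
    return list(runs[0])
```

Because each merge only compares and dequeues the two head elements, data moves through the queues as a stream, which is the property that makes this style of sorter a natural fit for FIFO storage.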