- Abstract: Many machine learning systems are implemented as pipelines. A pipeline is essentially a chain/network of information processing units. As information flows in and out and gradients vice versa, ideally, a pipeline can be trained end-to-end via backpropagation provided with the right supervision and loss function. However, this is usually impossible in practice, because either the loss function itself may be non-differentiable, or there may exist some non-differentiable units. One popular way to superficially resolve this issue is to separate a pipeline into a set of differentiable sub-pipelines and train them with isolated loss functions. Yet, from a decision-theoretical point of view, this is equivalent to making myopic decisions using ad hoc heuristics along the pipeline while ignoring the real utility, which prevents the pipeline from behaving optimally. In this paper, we show that by converting a pipeline into a stochastic counterpart, it can then be trained end-to-end in the presence of non-differentiable parts. Thus, the resulting pipeline is optimal under certain conditions with respect to any criterion attached to it. In experiments, we apply the proposed approach - reinforced pipeline optimization - to Faster R-CNN, a state-of-the-art object detection pipeline, and obtain empirically near-optimal object detectors consistent with its base design in terms of mean average precision.
- Keywords: Pipeline Optimization, Reinforcement Learning, Stochastic Computation Graph, Faster R-CNN
- TL;DR: By converting an originally non-differentiable pipeline into a stochastic counterpart, we can then train the converted pipeline completely end-to-end while optimizing any criterion attached to it.