Keywords: Network compression, Pruning, Quantization, Fluid Turbulence, Network inference
TL;DR: In this work we examine gradient-conserving pruning and quantization methods and apply them to problems of scientific and engineering relevance.
Abstract: Multi-scale, multi-fidelity numerical simulations form the pillar of scientific applications
related to numerically modeling fluids. However, simulating fluid
behavior characterized by the non-linear Navier-Stokes equations is often
computationally expensive. Physics-informed machine learning methods are a viable
alternative and as such have seen great interest in the community [refer to
Kutz (2017); Brunton et al. (2020); Duraisamy et al. (2019) for a detailed review
of this topic]. For full physics emulators, the cost of network inference is often
trivial. However, in the current paradigm of data-driven fluid mechanics, models
are built as surrogates for complex sub-processes. These models are then used in
conjunction with Navier-Stokes solvers, which makes ML model inference an
important factor in algorithmic latency. With the ever-growing size
of networks, which are often overparameterized, exploring effective network
compression techniques becomes not only relevant but critical for engineering
systems design. In this study, we explore the applicability of pruning and quantization
(FP32 to int8) methods for one such application relevant to modeling fluid
turbulence. Post-compression, we demonstrate the improvement in the accuracy
of network predictions and build intuition in the process by comparing the compressed
network to its original state.
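The abstract names two compression steps: pruning and FP32-to-int8 quantization. The following is a minimal NumPy sketch of these techniques in their generic form (global magnitude pruning and symmetric per-tensor quantization), not the paper's specific gradient-conserving method; the sparsity level and weight shapes are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (generic magnitude pruning)."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # Threshold at the k-th smallest absolute value.
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def quantize_int8(w):
    """Symmetric per-tensor quantization: FP32 weights -> int8 codes plus a scale."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 weights for accuracy comparison."""
    return q.astype(np.float32) * scale

# Illustrative weight matrix standing in for one layer of a surrogate model.
rng = np.random.default_rng(0)
w = rng.standard_normal((32, 8)).astype(np.float32)

w_pruned = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_int8(w_pruned)
w_hat = dequantize(q, scale)
```

The round-trip error per weight is bounded by half a quantization step (`scale / 2`), which is the quantity one would compare against the original network state when assessing post-compression accuracy.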