- Keywords: Network compression, Pruning, Quantization, Fluid Turbulence, Network inference
- TL;DR: In this work, we look at gradient-conserving pruning and quantization methods and apply them to problems of scientific and engineering relevance.
- Abstract: Multi-scale, multi-fidelity numerical simulations form the pillar of scientific applications that involve numerically modeling fluids. However, simulating fluid behavior governed by the non-linear Navier-Stokes equations is often computationally expensive. Physics-informed machine learning methods are a viable alternative and, as such, have seen great interest in the community [refer to Kutz (2017); Brunton et al. (2020); Duraisamy et al. (2019) for a detailed review of this topic]. For full physics emulators, the cost of network inference is often trivial. However, in the current paradigm of data-driven fluid mechanics, models are built as surrogates for complex sub-processes. These models are then used in conjunction with Navier-Stokes solvers, which makes ML model inference an important factor in terms of algorithmic latency. With the ever-growing size of networks, and their frequent overparameterization, exploring effective network compression techniques becomes not only relevant but critical for engineering systems design. In this study, we explore the applicability of pruning and quantization (FP32 to int8) methods for one such application relevant to modeling fluid turbulence. Post-compression, we demonstrate an improvement in the accuracy of network predictions and build intuition in the process by comparing the compressed network state to the original.
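
For readers unfamiliar with the two compression steps named in the abstract, the following is a minimal NumPy sketch of generic magnitude-based pruning followed by symmetric per-tensor FP32-to-int8 quantization. It is an illustration under stated assumptions, not the gradient-conserving method studied in this work: the function names (`magnitude_prune`, `quantize_int8`, `dequantize`), the 50% sparsity level, and the random weight matrix are all hypothetical choices made for the example.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Generic magnitude pruning (an assumption for illustration), not the
    gradient-conserving criterion referred to in the TL;DR.
    """
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # Magnitude of the k-th smallest weight; everything at or below it is removed.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask


def quantize_int8(weights):
    """Symmetric per-tensor quantization from FP32 to int8."""
    max_abs = float(np.max(np.abs(weights)))
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover an FP32 approximation of the original weights."""
    return q.astype(np.float32) * np.float32(scale)


# Hypothetical weight matrix standing in for one layer of a surrogate model.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)

w_pruned = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_int8(w_pruned)
w_hat = dequantize(q, scale)

print("fraction of zero weights:", np.mean(w_pruned == 0))
print("max quantization error:  ", np.max(np.abs(w_hat - w_pruned)))
```

With round-to-nearest, the reconstruction error of each retained weight is bounded by roughly half the quantization step `scale`, and pruned weights map exactly to the integer zero, so sparsity survives quantization.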