LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation

Abstract: Transformer models have achieved remarkable results in various natural language tasks, but they are often prohibitively large, requiring massive memories and computational resources. To reduce th...
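The abstract is cut off above, so the method itself is not described here; as a rough illustration of the idea named in the title, the sketch below approximates a weight matrix W by a low-rank product U·V plus a sparse residual S. The function name low_rank_plus_sparse, the rank, and the sparsity level are illustrative assumptions, not the paper's actual algorithm.

    # Minimal sketch (assumed, not the paper's implementation): split a weight
    # matrix W into a low-rank factor U @ V and a sparse residual S.
    import numpy as np

    def low_rank_plus_sparse(W: np.ndarray, rank: int, keep_ratio: float):
        """Return (U, V, S) such that W is approximately U @ V + S."""
        # Low-rank part via truncated SVD.
        U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
        U = U_full[:, :rank] * s[:rank]   # shape (m, rank)
        V = Vt[:rank, :]                  # shape (rank, n)
        residual = W - U @ V
        # Sparse part: keep only the largest-magnitude residual entries.
        k = int(keep_ratio * residual.size)
        if k == 0:
            S = np.zeros_like(residual)
        else:
            thresh = np.partition(np.abs(residual), -k, axis=None)[-k]
            S = np.where(np.abs(residual) >= thresh, residual, 0.0)
        return U, V, S

    # Example: compress a random 768x768 weight matrix (sizes are illustrative).
    W = np.random.randn(768, 768)
    U, V, S = low_rank_plus_sparse(W, rank=32, keep_ratio=0.05)
    approx = U @ V + S
    print("relative error:", np.linalg.norm(W - approx) / np.linalg.norm(W))

Storing U, V, and the nonzero entries of S in place of the dense W is what makes such a decomposition a structured compression of the model's weights.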