Optimization
Unstructured Pruning
Zeroes individual weights to reach high sparsity; speedups require sparse kernels.
Definition
Unstructured pruning sets individual weights to zero according to a criterion such as magnitude, gradient sensitivity, or a second-order estimate of each weight's importance. It can reach very high sparsity with little accuracy loss, but because the zeroed weights are scattered irregularly through each weight matrix, a dense kernel still performs every multiplication; realizing actual speedups requires sparse computation kernels. SparseGPT and Wanda are notable methods that apply it to large language models.
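Example
A minimal sketch of the simplest criterion named above, weight magnitude, assuming NumPy. The helper name `magnitude_prune` and the toy matrix are hypothetical and chosen only for illustration; this is not the procedure used by SparseGPT or Wanda, which rely on more elaborate criteria.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-magnitude entries zeroed.

    `sparsity` is the fraction of entries to zero, between 0.0 and 1.0.
    Hypothetical helper for illustration only.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")

    pruned = weights.copy()
    k = int(round(sparsity * weights.size))  # number of weights to zero
    if k == 0:
        return pruned

    # Flat indices of the k entries with the smallest absolute value.
    flat_magnitudes = np.abs(weights).ravel()
    smallest = np.argpartition(flat_magnitudes, k - 1)[:k]

    # Zero those entries wherever they fall; no row or column structure is kept.
    pruned.flat[smallest] = 0.0
    return pruned


# Toy usage: prune half of a small random matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 5))
W_pruned = magnitude_prune(W, sparsity=0.5)
mask = W_pruned != 0  # irregular pattern of surviving weights
```

The resulting `mask` has no regular shape, which is why a dense matrix multiply on `W_pruned` runs no faster than on `W`. A speedup would need a sparse storage format and a kernel that skips the zeros.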