Do low-rank or block-diagonal matrices come up in LLMs often? What about banded or block-tridiagonal ones? Intuitively, banded matrices seem like they ought to be good at encoding things about the world… everything is connected, but not randomly so.
Yep!
Think of LoRA for network fine-tuning. Monarch (linked above) leans heavily on block diagonality. These ideas are also part of what makes FlashAttention flash.
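For concreteness, here's a minimal sketch of the low-rank trick behind LoRA, assuming PyTorch (the dimensions, rank, and init here are illustrative, not the paper's exact recipe): the frozen d×d weight gets a trainable rank-r correction B @ A, so you only train O(d·r) extra parameters instead of O(d²).

```python
import torch

d, r = 1024, 8                # hidden size, adapter rank (illustrative; r << d)
W = torch.randn(d, d)         # frozen pretrained weight
A = torch.randn(r, d) * 0.01  # trainable r x d factor
B = torch.zeros(d, r)         # trainable d x r factor (zero init: adapter starts as a no-op)

x = torch.randn(d)
y = W @ x + B @ (A @ x)       # adapted forward pass; extra cost is O(d*r), not O(d^2)
```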
I haven't seen banded matrices as much, though (with weight sharing) they're just convolutions. One nice feature of block diagonality is that you can express it as batched matrix multiplication, reusing all the existing matmul kernels.
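To make the banded-matrix-as-convolution point concrete, here's a small sanity check (again a PyTorch sketch with made-up sizes): a banded matrix whose diagonals each carry one shared weight computes exactly a 1D convolution with that kernel.

```python
import torch
import torch.nn.functional as F

n, k = 16, 3             # sequence length, band width (= kernel size)
w = torch.randn(k)       # one shared weight per diagonal

# Build the banded matrix: weight w[i] fills the diagonal at offset i - k//2.
M = torch.zeros(n, n)
for i, off in enumerate(range(-(k // 2), k // 2 + 1)):
    M += torch.diag(w[i].expand(n - abs(off)), off)

x = torch.randn(n)
y_band = M @ x

# torch's conv1d is cross-correlation, which matches the diagonal layout directly.
y_conv = F.conv1d(x.view(1, 1, n), w.view(1, 1, k), padding=k // 2).flatten()
assert torch.allclose(y_band, y_conv, atol=1e-5)
```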
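And here's the batched-matmul view of a block-diagonal multiply, as a quick sketch (shapes illustrative): the same product computed two ways, once through the materialized block-diagonal matrix and once as b independent k×k matvecs in a single bmm call.

```python
import torch

b, k = 4, 64                   # number of blocks, block size
blocks = torch.randn(b, k, k)  # the b diagonal blocks
x = torch.randn(b * k)

# Dense view: materialize the full (b*k) x (b*k) block-diagonal matrix.
M = torch.block_diag(*blocks)
y_dense = M @ x

# Batched view: b independent k x k matvecs, one torch.bmm call.
y_batched = torch.bmm(blocks, x.view(b, k, 1)).flatten()
assert torch.allclose(y_dense, y_batched, atol=1e-5)
```

The batched view never materializes the off-block zeros, which is why it can just reuse the existing batched-matmul kernels at full efficiency.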