| | KV Sharing, MHC, and Compressed Attention (sebastianraschka.com) |
| 35 points by gmays 5 days ago | past | 3 comments |
|
| | Developments in LLM Architectures: KV Sharing, MHC, and Compressed Attention (sebastianraschka.com) |
| 4 points by ibobev 6 days ago | past | discuss |
|
| | Recent Developments in LLM Architectures: KV Sharing, MHC, Compressed Attention (sebastianraschka.com) |
| 3 points by pretext 8 days ago | past | discuss |
|
| | Recent Developments in LLM Architectures: KV Sharing, MHC, Compressed Attention (sebastianraschka.com) |
| 2 points by vismit2000 8 days ago | past | discuss |
|
| | My Workflow for Understanding LLM Architectures (sebastianraschka.com) |
| 4 points by ibobev 27 days ago | past |
|
| | Components of a Coding Agent (sebastianraschka.com) |
| 300 points by MindGods 50 days ago | past | 90 comments |
|
| | Claude Code's Real Secret Sauce Isn't the Model (sebastianraschka.com) |
| 6 points by ModelForge 54 days ago | past |
|
| | A Visual Guide to Attention Variants in Modern LLMs (sebastianraschka.com) |
| 9 points by Brajeshwar 62 days ago | past |
|
| | A Visual Guide to Attention Variants in Modern LLMs (sebastianraschka.com) |
| 23 points by Anon84 63 days ago | past | 1 comment |
|
| | LLM Architecture Gallery (sebastianraschka.com) |
| 586 points by tzury 70 days ago | past | 41 comments |
|
| | A Round Up and Comparison of 10 Open-Weight LLM Releases in Spring 2026 (sebastianraschka.com) |
| 4 points by MindGods 88 days ago | past |
|
| | Categories of Inference-Time Scaling for Improved LLM Reasoning (sebastianraschka.com) |
| 1 point by ibobev 3 months ago | past |
|
| | Understanding and Coding the Self-Attention Mechanism of LLMs from Scratch (sebastianraschka.com) |
| 1 point by onurkanbkrc 3 months ago | past | 1 comment |
|
| | The State of LLMs 2025: Progress, Problems, and Predictions (sebastianraschka.com) |
| 1 point by nsainsbury 4 months ago | past |
|
| | The State of LLMs 2025: Progress, Problems, and Predictions (sebastianraschka.com) |
| 3 points by ModelForge 4 months ago | past |
|
| | The State of LLMs 2025: Progress, Problems, and Predictions (sebastianraschka.com) |
| 4 points by ibobev 4 months ago | past |
|
| | The State of LLMs 2025: Progress, Problems, and Predictions (sebastianraschka.com) |
| 9 points by vismit2000 4 months ago | past |
|
| | New LLM Pre-Training and Post-Training Paradigms (sebastianraschka.com) |
| 2 points by lr0 4 months ago | past | 1 comment |
|
| | Understanding Encoder and Decoder LLMs (sebastianraschka.com) |
| 1 point by jeffjeffbear 5 months ago | past |
|
| | A Technical Tour of the DeepSeek Models from V3 to V3.2 (sebastianraschka.com) |
| 23 points by ibobev 5 months ago | past | 1 comment |
|
| | A Technical Tour of the DeepSeek Models from V3 to V3.2 (sebastianraschka.com) |
| 5 points by mzl 5 months ago | past | 1 comment |
|
| | Recommendations for Getting the Most Out of a Technical Book (sebastianraschka.com) |
| 2 points by naves 5 months ago | past |
|
| | A Technical Tour of the DeepSeek Models from V3 to V3.2 (sebastianraschka.com) |
| 8 points by giuliomagnifico 5 months ago | past |
|
| | Getting the Most Out of a Technical Book (sebastianraschka.com) |
| 4 points by quietlearning 6 months ago | past |
|
| | Beyond Standard LLMs (sebastianraschka.com) |
| 1 point by vismit2000 6 months ago | past |
|
| | Beyond Standard LLMs (sebastianraschka.com) |
| 1 point by ibobev 6 months ago | past |
|
| | A Researcher's Field Guide to Non-Standard LLM Architectures (sebastianraschka.com) |
| 2 points by ModelForge 6 months ago | past |
|
| | Understanding the 4 Main Approaches to LLM Evaluation (From Scratch) (sebastianraschka.com) |
| 1 point by ibobev 7 months ago | past |
|
| | Popular Attention Alternatives: GQA, MLA, SWA (sebastianraschka.com) |
| 4 points by ModelForge 7 months ago | past |
|
| | Multi-Head Latent Attention (sebastianraschka.com) |
| 4 points by ModelForge 7 months ago | past |
|
|