Table of Contents

1. Context Extension
   1.1 Rotary Position Embedding (RoPE)
   1.2 LongLoRA
2. Evaluation of Long-Context LLMs
   2.1 The Lost in the Middle Phenomenon
   2.2 Long-Context Benchmarks: NIAH, LongBench
3. Efficient Attention Mechanisms
   3.1 KV Cache
   3.2 StreamingLLM and Attention Sinks (key)
   3.3 DuoAttention: Retrieval Heads and Streaming Heads (key)
   3.4 Quest: Query-Aware Sparsity (key)
4. Beyond Transformers
   4.1 State-Space Models (SSMs): Mamba
   4.2 Hybrid Models: Jamba