  • Make Your LLM Fully Utilize the Context
    Paper Review / Context length · 2024. 5. 17. 15:04

    Proposes an approach to overcome the lost-in-the-middle problem (failing to use information located in the middle of a long context) that commonly occurs in LLMs.

    Model used: Mistral-7B-Instruct-v0.2 (Jiang et al., 2023)

     

    IN2 training (INformation-INtensive training, an instruction-tuning approach)

    https://huggingface.co/datasets/In2Training/VaLProbing-32K
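
    The link above is the VaLProbing-32K dataset repository on the Hugging Face Hub. A minimal sketch for pulling it locally with the standard huggingface_hub client (how the files are organized inside the repository is an assumption to verify after download):

    # Download the VaLProbing-32K probing dataset repo from the Hugging Face Hub.
    # Sketch only: the internal file layout (JSON vs. parquet, per-task splits, etc.)
    # is an assumption and should be inspected after download.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id="In2Training/VaLProbing-32K",
        repo_type="dataset",   # it is a dataset repo, not a model repo
    )
    print("Downloaded to:", local_dir)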

     

    • The long contexts and questions are used as instructions, and the loss on the answer parts is used to update the model (a rough sketch of this answer-only loss follows below).
    To avoid data contamination at the evaluation stage (Section 4 of the paper), a pre-filtering strategy is applied while sampling the raw texts for the IN2 training data: if a sampled context Ci has a 10-gram overlap with any example in the evaluation data (probing tasks, real-world tasks, and short-context tasks), it is used neither for generating question-answer pairs nor as one of the random segments [rj].
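
    A rough sketch of the answer-only loss described above (not the authors' code; the prompt/answer strings and chat formatting are placeholders): context and question tokens get the label -100, which cross-entropy ignores, so only answer tokens contribute to the loss.

    # Sketch of answer-only loss masking for instruction tuning (not the paper's code).
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    model_name = "mistralai/Mistral-7B-Instruct-v0.2"   # base model used in the paper
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prompt = "<long context> ... <question>"   # placeholder long-context instruction
    answer = "<answer text>"                   # placeholder supervision target

    prompt_ids = tokenizer(prompt, add_special_tokens=False).input_ids
    answer_ids = tokenizer(answer, add_special_tokens=False).input_ids

    input_ids = torch.tensor([prompt_ids + answer_ids])
    labels = torch.tensor([[-100] * len(prompt_ids) + answer_ids])  # mask the instruction part

    loss = model(input_ids=input_ids, labels=labels).loss  # computed on answer tokens only

    And a minimal illustration of the 10-gram overlap pre-filter (my own sketch, not the paper's implementation; whitespace tokenization is an assumption):

    # A candidate raw-text segment is rejected if any of its 10-grams also appears
    # in any evaluation example.
    def ngrams(text: str, n: int = 10) -> set[tuple[str, ...]]:
        tokens = text.split()   # tokenization scheme is an assumption
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def build_eval_index(eval_texts: list[str], n: int = 10) -> set[tuple[str, ...]]:
        index: set[tuple[str, ...]] = set()
        for text in eval_texts:
            index |= ngrams(text, n)
        return index

    def is_contaminated(candidate: str, eval_index: set[tuple[str, ...]], n: int = 10) -> bool:
        return not ngrams(candidate, n).isdisjoint(eval_index)

    # Usage: keep only candidate segments with no 10-gram overlap with evaluation data.
    eval_index = build_eval_index(["evaluation example text ..."])
    candidates = ["raw text segment ..."]
    clean_segments = [c for c in candidates if not is_contaminated(c, eval_index)]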

     

    • Training Examples for IN2 Training

    • Prompts For Data Generation and Training

    VArious Long-context (VAL) Probing: probing tasks covering three context styles (a rough construction sketch follows the list)

    • document
    • code
    • structured-data context
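
    As a rough illustration of this style of probing (my own sketch, not the dataset's construction script): place a target "needle" sentence at a controlled depth inside a long filler context, ask for it, and record the depth so accuracy can later be broken down by position in the context.

    # Rough sketch of building one document-style probing example (not the paper's script).
    def build_probe(filler_sentences: list[str], needle: str, depth: float,
                    question: str) -> dict:
        """depth in [0, 1]: 0 = beginning of the context, 1 = end."""
        pos = int(depth * len(filler_sentences))
        context = filler_sentences[:pos] + [needle] + filler_sentences[pos:]
        return {
            "context": " ".join(context),
            "question": question,
            "answer": needle,
            "depth": depth,   # kept so accuracy can be analyzed per context position
        }

    filler = [f"This is filler sentence number {i}." for i in range(1000)]
    example = build_probe(
        filler,
        needle="The special access code mentioned in the report is 7342.",
        depth=0.5,   # middle of the context
        question="What is the special access code mentioned in the report?",
    )
    print(example["question"], "->", example["answer"])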

    Related Work

    1. Long-context LLMs

    • Training large models with extended context windows
      • Mistral 7B
      • GLM: General language model pretraining with autoregressive blank infilling
      • How Long Can Context Length of Open-Source LLMs Truly Promise?
      • Together Team, LLaMA-2-7B-32K (2023) - https://huggingface.co/togethercomputer/LLaMA-2-7B-32K
      • Effective long-context scaling of foundation models
      • Hierarchical context merging: Better long context understanding for pre-trained LLMs
      • Focused transformer: Contrastive training for context scaling
      • Yi: Open foundation models by 01.AI
      • InternLM2 technical report
    • Data engineering
      • Data balancing
        • Data Engineering for Scaling Language Models to 128K Context
      • Data order arrangement
        • In-Context Pretraining: Language Modeling Beyond Document Boundaries
      • Instruction data collection
        • LongAlign: A recipe for long context alignment of large language models
      • Data quality measurement
        • LongWanjuan: Towards Systematic Measurement for Long Text Quality
    • Effective and efficient training (to optimize the training of a long-context model)
      • Position encoding
        • Extending context window of large language models via positional interpolation
        • Scaling laws of RoPE-based extrapolation
        • YaRN: Efficient context window extension of large language models
        • LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
      • Batching strategy
        • LongAlign: A recipe for long context alignment of large language models
      • Parameter-efficient training
        • LongLoRA: Efficient fine-tuning of long-context large language models
      • The development of new model architectures
        • RWKV: Reinventing RNNs for the transformer era
        • Mamba: Linear-time sequence modeling with selective state spaces

    2. Long-context evaluations

    • Real-world benchmarks (e.g., long-context QA, summarization, and language modeling)
      • NarrativeQA
      • LongBench - LongBench: A bilingual, multitask benchmark for long context understanding
      • ZeroSCROLLS - ZeroSCROLLS: A zero-shot benchmark for long text understanding
      • L-Eval - L-Eval: Instituting standardized evaluation for long context language models
      • LooGLE - LooGLE: Can Long-Context Language Models Understand Long Contexts?
      • ∞Bench - ∞Bench: Extending Long Context Evaluation Beyond 100K Tokens
      • perplexity evaluation
        • Longformer: The long-document transformer
        • Efficient content-based sparse attention with routing transformers
        • Train short, test long: Attention with linear biases enables input length extrapolation
        • Extending context window of large language models via positional interpolation
        • Scaling laws of RoPE-based extrapolation
        • YaRN: Efficient context window extension of large language models
        • LongLoRA: Efficient fine-tuning of long-context large language models
        • LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
        • Random-access infinite context length for transformers
    • Probing tasks that provide a more concise reflection of long-context utilization across different context lengths and positions
      • Needle-in-the-Haystack, passkey retrieval - Random-access infinite context length for transformers
      • synthesized document QA - Lost in the middle: How language models use long contexts
      • S3Eval - S3Eval: A synthetic, scalable, systematic evaluation suite for large language models
      • Discovery - Long-context LLMs Struggle with Long In-context Learning
      • RULER - RULER: What's the Real Context Size of Your Long-Context Language Models?

    Paper: https://arxiv.org/pdf/2404.16811

     
