  • Make Your LLM Fully Utilize the Context
    Paper Review / Context length · 2024. 5. 17. 15:04

    Proposes an approach to overcome the lost-in-the-middle problem (failing to use information located in the middle of a long context) that commonly occurs in LLMs.

    Model used: Mistral-7B-Instruct-v0.2 (Jiang et al., 2023)

     

    IN2 training (INformation-INtensive training, an instruction-tuning approach)

    https://huggingface.co/datasets/In2Training/VaLProbing-32K
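
    The link above is the VaLProbing-32K dataset repository on the Hugging Face Hub. A minimal sketch for pulling it locally with the standard huggingface_hub client (how the files are organized inside the repository is an assumption to verify after download):

    # Download the VaLProbing-32K probing dataset repo from the Hugging Face Hub.
    # Sketch only: the internal file layout (JSON vs. parquet, per-task splits, etc.)
    # is an assumption and should be inspected after download.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id="In2Training/VaLProbing-32K",
        repo_type="dataset",   # it is a dataset repo, not a model repo
    )
    print("Downloaded to:", local_dir)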

     

    • The long contexts and questions are used as instructions, and the loss on the answer parts is used to update the model (a rough sketch of this answer-only loss follows below).
    To avoid data contamination at the evaluation stage (Section 4 of the paper), a pre-filtering strategy is applied while sampling the raw texts for the IN2 training data: if a sampled context Ci has a 10-gram overlap with any example in the evaluation data (probing tasks, real-world tasks, and short-context tasks), it is used neither for generating question-answer pairs nor as one of the random segments [rj].
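
    A rough sketch of the answer-only loss described above (not the authors' code; the prompt/answer strings and chat formatting are placeholders): context and question tokens get the label -100, which cross-entropy ignores, so only answer tokens contribute to the loss.

    # Sketch of answer-only loss masking for instruction tuning (not the paper's code).
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    model_name = "mistralai/Mistral-7B-Instruct-v0.2"   # base model used in the paper
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prompt = "<long context> ... <question>"   # placeholder long-context instruction
    answer = "<answer text>"                   # placeholder supervision target

    prompt_ids = tokenizer(prompt, add_special_tokens=False).input_ids
    answer_ids = tokenizer(answer, add_special_tokens=False).input_ids

    input_ids = torch.tensor([prompt_ids + answer_ids])
    labels = torch.tensor([[-100] * len(prompt_ids) + answer_ids])  # mask the instruction part

    loss = model(input_ids=input_ids, labels=labels).loss  # computed on answer tokens only

    And a minimal illustration of the 10-gram overlap pre-filter (my own sketch, not the paper's implementation; whitespace tokenization is an assumption):

    # A candidate raw-text segment is rejected if any of its 10-grams also appears
    # in any evaluation example.
    def ngrams(text: str, n: int = 10) -> set[tuple[str, ...]]:
        tokens = text.split()   # tokenization scheme is an assumption
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def build_eval_index(eval_texts: list[str], n: int = 10) -> set[tuple[str, ...]]:
        index: set[tuple[str, ...]] = set()
        for text in eval_texts:
            index |= ngrams(text, n)
        return index

    def is_contaminated(candidate: str, eval_index: set[tuple[str, ...]], n: int = 10) -> bool:
        return not ngrams(candidate, n).isdisjoint(eval_index)

    # Usage: keep only candidate segments with no 10-gram overlap with evaluation data.
    eval_index = build_eval_index(["evaluation example text ..."])
    candidates = ["raw text segment ..."]
    clean_segments = [c for c in candidates if not is_contaminated(c, eval_index)]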

     

    • Training Examples for IN2 Training

    • Prompts For Data Generation and Training

    VArious Long-context (VAL) Probing: probing tasks covering three context styles (a rough construction sketch follows the list)

    • document
    • code
    • structured-data context
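
    As a rough illustration of this style of probing (my own sketch, not the dataset's construction script): place a target "needle" sentence at a controlled depth inside a long filler context, ask for it, and record the depth so accuracy can later be broken down by position in the context.

    # Rough sketch of building one document-style probing example (not the paper's script).
    def build_probe(filler_sentences: list[str], needle: str, depth: float,
                    question: str) -> dict:
        """depth in [0, 1]: 0 = beginning of the context, 1 = end."""
        pos = int(depth * len(filler_sentences))
        context = filler_sentences[:pos] + [needle] + filler_sentences[pos:]
        return {
            "context": " ".join(context),
            "question": question,
            "answer": needle,
            "depth": depth,   # kept so accuracy can be analyzed per context position
        }

    filler = [f"This is filler sentence number {i}." for i in range(1000)]
    example = build_probe(
        filler,
        needle="The special access code mentioned in the report is 7342.",
        depth=0.5,   # middle of the context
        question="What is the special access code mentioned in the report?",
    )
    print(example["question"], "->", example["answer"])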

    Related Work

    1. Long-context LLMs

    • Training large models with extended context windows
      • Mistral 7B
      • GLM: General language model pretraining with autoregressive blank infilling
      • How Long Can Context Length of Open-Source LLMs Truly Promise?
      • Together Team, LLaMA-2-7B-32K (2023) - https://huggingface.co/togethercomputer/LLaMA-2-7B-32K
      • Effective long-context scaling of foundation models
      • Hierarchical context merging: Better long context understanding for pre-trained LLMs
      • Focused transformer: Contrastive training for context scaling
      • Yi: Open foundation models by 01.AI
      • InternLM2 technical report
    • Data engineering
      • Data balancing
        • Data Engineering for Scaling Language Models to 128K Context
      • Data order arrangement
        • In-Context Pretraining: Language Modeling Beyond Document Boundaries
      • Instruction data collection
        • LongAlign: A recipe for long context alignment of large language models
      • Data quality measurement
        • LongWanjuan: Towards Systematic Measurement for Long Text Quality
    • Effective and efficient training (to optimize the training of a long-context model)
      • Position encoding
        • Extending context window of large language models via positional interpolation
        • Scaling laws of RoPE-based extrapolation
        • YaRN: Efficient context window extension of large language models
        • LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
      • Batching strategy
        • LongAlign: A recipe for long context alignment of large language models
      • Parameter-efficient training
        • LongLoRA: Efficient fine-tuning of long-context large language models
      • The development of new model architectures
        • RWKV: Reinventing RNNs for the transformer era
        • Mamba: Linear-time sequence modeling with selective state spaces

    2. Long-context evaluations

    • Real-world benchmarks (e.g., long-context QA, summarization, and language modeling)
      • NarrativeQA
      • LongBench - LongBench: A bilingual, multitask benchmark for long context understanding
      • ZeroSCROLLS - ZeroSCROLLS: A zero-shot benchmark for long text understanding
      • L-Eval - L-Eval: Instituting standardized evaluation for long context language models
      • LooGLE - LooGLE: Can Long-Context Language Models Understand Long Contexts?
      • ∞Bench - ∞Bench: Extending Long Context Evaluation Beyond 100K Tokens
      • perplexity evaluation
        • Longformer: The long-document transformer
        • Efficient content-based sparse attention with routing transformers
        • Train short, test long: Attention with linear biases enables input length extrapolation
        • Extending context window of large language models via positional interpolation
        • Scaling laws of RoPE-based extrapolation
        • YaRN: Efficient context window extension of large language models
        • LongLoRA: Efficient fine-tuning of long-context large language models
        • LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
        • Random-access infinite context length for transformers
    • Probing tasks that provide a more concise reflection of long-context utilization across different context lengths and positions
      • Needle-in-the-Haystack, passkey retrieval - Random-access infinite context length for transformers
      • synthesized document QA - Lost in the middle: How language models use long contexts
      • S3Eval - S3Eval: A synthetic, scalable, systematic evaluation suite for large language models
      • Discovery - Long-context LLMs Struggle with Long In-context Learning
      • RULER - RULER: What's the Real Context Size of Your Long-Context Language Models?

    Paper: https://arxiv.org/pdf/2404.16811

     
