Chart to Code

Paper Review/Multimodal 2025. 4. 2. 19:18

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

Paper Summary

Authors and affiliations: Xuanle Zhao, Xianzhen Luo, et al. / Tsinghua University, Harbin Institute of Technology
Submitted: January 11, 2025
Zhao, Xuanle, et al. "ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation." arXiv preprint arXiv:2501.06598 (2025).
Paper link (GitHub repository): https://github.com/thunlp/ChartCoder

Proposes the first MLLM (multimodal LLM) dedicated to chart-to-code generation.
Uses a Code LLM based on DeepSeek Coder 6.7B as the backbone, which substantially improves the executability of the generated code.
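
As a rough illustration of what executability means in practice, the sketch below runs each generated script in a separate Python process and reports the fraction that exit without error. This is an assumption about how such a rate could be measured, not the paper's evaluation harness; `runs_without_error` and `execution_rate` are hypothetical helper names.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def runs_without_error(code: str, timeout_s: float = 30.0) -> bool:
    """Return True if `code` exits with status 0 when run as a standalone script.

    Hypothetical helper: the snippet is written to a temporary directory and
    executed in a fresh interpreter so that crashes cannot affect the caller.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                cwd=tmp,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # A script that hangs is counted as not executable.
            return False
        return result.returncode == 0


def execution_rate(snippets: list[str]) -> float:
    """Fraction of snippets that run to completion without raising."""
    if not snippets:
        return 0.0
    return sum(runs_without_error(s) for s in snippets) / len(snippets)
```

For real chart code, the interpreter running the check would also need the plotting libraries the scripts import (for example matplotlib) and a non-interactive backend so that figure windows do not block execution.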

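Limitations

Insufficient text recognition accuracy
- Because OCR-oriented training is limited, extracting and reproducing the text inside a chart is weaker than with GPT-4o.
- The gap shows up in the "text content" item of the high-level evaluation metrics.
Small model size due to resource constraints
- ChartCoder is limited to 7B parameters, which may cap its performance relative to larger models.
- Scaling to larger models has not yet been attempted.
Real-world chart OCR and context-based chart understanding not covered
- Captions and surrounding sentences from scientific documents are not included in training, so real-world contextual grounding is weak.
Code generation bias
- Much of the generated code follows specific templates, so repetitive patterns may appear.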
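
One way to get a feel for the template-repetition concern is to compare generated scripts after masking their numeric data. The sketch below is only an illustrative assumption, not a metric from the paper: it replaces numbers with a placeholder and averages the `difflib.SequenceMatcher` similarity over all pairs of scripts. `normalize` and `mean_pairwise_similarity` are hypothetical names.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(script: str) -> str:
    """Replace every numeric literal with 0 so only the code skeleton remains."""
    return re.sub(r"\d+(\.\d+)?", "0", script)


def mean_pairwise_similarity(scripts: list[str]) -> float:
    """Average similarity ratio (0.0 to 1.0) over all pairs of normalized scripts.

    Values close to 1.0 suggest the scripts share nearly the same template.
    """
    pairs = list(combinations([normalize(s) for s in scripts], 2))
    if not pairs:
        return 0.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)
```

Two bar-chart scripts that differ only in their data values would score 1.0 under this measure, while scripts with unrelated structure would score much lower. Character-level matching is a crude proxy; comparing parsed syntax trees would be a stricter alternative.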
