DocDancer: Towards Agentic Document-Grounded Information Seeking

January 8, 2026

11 authors

Abstract

Document Question Answering (DocQA) focuses on answering questions grounded in given documents, yet existing DocQA agents lack effective tool utilization and largely rely on closed-source models. In this work, we introduce DocDancer, an end-to-end trained open-source Doc agent. We formulate DocQA as an information-seeking problem and propose a tool-driven agent framework that explicitly models document exploration and comprehension. To enable end-to-end training of such agents, we introduce an Exploration-then-Synthesis data synthesis pipeline that addresses the scarcity of high-quality training data for DocQA. Models trained on the synthesized data prove effective on two long-context document understanding benchmarks, MMLongBench-Doc and DocBench. Further analysis offers insights into agentic tool design and synthetic data.

Categories

cs.CL

Authors

Qintong Zhang, Xinjie Lv, Jialong Wu, Baixuan Li, Zhengwei Tao, Guochen Yan, Huanyao Zhang, Bin Wang, Jiahao Xu, Haitao Mi, Wentao Zhang