TongUI

A large-scale trajectory data construction effort that transforms multimodal web tutorials into agent training data across operating systems and application types.

Data Construction, Evaluation, Multimodal Training

Problem

General GUI agents need large, diverse trajectory data, but high-quality real-world demonstrations are difficult to collect at scale.

Key Contributions

Contributed to benchmark evaluation, including offline and online testing pipelines.
Supported training experiments based on Qwen2.5-VL and LoRA-style fine-tuning.
Helped validate the resulting agent against established GUI benchmarks.
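
As a rough illustration of what an offline evaluation step can look like, the sketch below scores predicted actions against recorded ground-truth steps. It is not the project's actual pipeline: the `Action` and `Target` types, the `step_correct` and `step_success_rate` helpers, and the matching rules (a click counts if it lands inside the target box, typed text is compared case-insensitively) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Action:
    """A predicted agent action (hypothetical schema for this sketch)."""
    kind: str                                     # e.g. "click" or "type"
    point: Optional[Tuple[float, float]] = None   # click coordinates (x, y)
    text: Optional[str] = None                    # typed text, if any


@dataclass
class Target:
    """A ground-truth step (hypothetical schema for this sketch)."""
    kind: str
    box: Optional[Tuple[float, float, float, float]] = None  # (x1, y1, x2, y2)
    text: Optional[str] = None


def step_correct(pred: Action, gold: Target) -> bool:
    """Return True if the predicted action matches the ground-truth step."""
    if pred.kind != gold.kind:
        return False
    if gold.kind == "click":
        # Assumed rule: a click is correct if it falls inside the target box.
        if pred.point is None or gold.box is None:
            return False
        x, y = pred.point
        x1, y1, x2, y2 = gold.box
        return x1 <= x <= x2 and y1 <= y <= y2
    if gold.kind == "type":
        # Assumed rule: typed text matches after trimming and lowercasing.
        return (pred.text or "").strip().lower() == (gold.text or "").strip().lower()
    # Other action kinds: matching on the kind alone is enough in this sketch.
    return True


def step_success_rate(preds: List[Action], golds: List[Target]) -> float:
    """Fraction of steps where the prediction matches the ground truth."""
    if len(preds) != len(golds):
        raise ValueError("predictions and ground truth must have equal length")
    if not golds:
        return 0.0
    return sum(step_correct(p, g) for p, g in zip(preds, golds)) / len(golds)
```

Offline scoring of this kind replays fixed trajectories, so it is cheap and repeatable. Online testing instead runs the agent in a live environment and judges whether the whole task was completed.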

Results

Supported the construction and validation of a million-scale GUI trajectory dataset.
Helped link large-scale data generation to measurable gains on downstream GUI benchmarks.

TongUI showed me how much infrastructure is required before a large-scale dataset becomes genuinely useful: filtering, validation, benchmark alignment, and careful error analysis all matter. My role was especially close to the evaluation side, where we had to connect the data pipeline to meaningful agent improvements.