Large Language Models are reshaping task automation, yet they remain limited on complex, multi-step
real-world tasks that require alignment with vague user intent and support for dynamic user override. From a formative
study with 12 participants, we found that end-users actively seek to shape task-oriented interfaces rather
than relying on one-shot outputs. To address this, we introduce the human-agent co-generation paradigm,
materialized in DuetUI. In this LLM-empowered system, the interface unfolds alongside task progress through a
bidirectional context loop: the agent scaffolds the interface by decomposing the task, while the user's direct
manipulations implicitly steer the agent's next generation step. In a technical ablation study and a user
study with 24 participants, DuetUI improved task efficiency and interface usability, supporting more
seamless human-agent collaboration. We contribute the paradigm itself, a proof-of-concept DuetUI prototype
that embodies it, and empirical and technical insights from an initial evaluation of how the bidirectional
loop may help align agents with human intent and inform future development.
Key Contributions
Defines the human-agent co-generation paradigm grounded in a bidirectional context loop.
Implements DuetUI, enabling agents to decompose tasks into editable scaffolds that welcome
user overrides.
Reports improved task efficiency and interface usability in a technical ablation study
and a user study with 24 participants.
How the Bidirectional Loop Works
1. Goal Decomposition
The agent interprets high-level intents and produces an interface scaffold with
structured layout, task modules, and actionable suggestions.
2. Direct Manipulation Feedback
Users adjust components directly on the canvas. These manipulations provide implicit
feedback that the agent converts into updated context.
3. Iterative Co-Generation
With refreshed context, the agent proposes refined UI states or complementary
components, sustaining a fluid collaboration rhythm that respects user agency.
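
The three steps above can be sketched as a small loop over a shared context. This is a minimal illustration under stated assumptions, not DuetUI's implementation: the names Component, Manipulation, Context, decompose, scaffold, apply_manipulation, and propose_next are all hypothetical, the three edit kinds are assumed, and the comma-splitting decompose function is only a stand-in for the agent's LLM-based decomposition.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One task module on the canvas (hypothetical structure)."""
    name: str
    pinned: bool = False  # set when the user has directly edited the component


@dataclass
class Manipulation:
    """A direct user edit: 'remove', 'rename', or 'add' (assumed edit kinds)."""
    kind: str
    target: str
    new_name: str = ""


@dataclass
class Context:
    """Shared state that both the agent and the user write to."""
    goal: str
    components: list[Component] = field(default_factory=list)
    rejected: set[str] = field(default_factory=set)


def decompose(goal: str) -> list[str]:
    """Stand-in for the agent's LLM-based task decomposition (assumption)."""
    return [step.strip() for step in goal.split(",") if step.strip()]


def scaffold(goal: str) -> Context:
    """Step 1: turn a high-level goal into an initial interface scaffold."""
    return Context(goal, [Component(name) for name in decompose(goal)])


def apply_manipulation(ctx: Context, edit: Manipulation) -> None:
    """Step 2: fold a direct manipulation into the shared context."""
    if edit.kind == "remove":
        ctx.components = [c for c in ctx.components if c.name != edit.target]
        ctx.rejected.add(edit.target)  # implicit signal: do not suggest again
    elif edit.kind == "rename":
        for c in ctx.components:
            if c.name == edit.target:
                c.name, c.pinned = edit.new_name, True
    elif edit.kind == "add":
        ctx.components.append(Component(edit.target, pinned=True))


def propose_next(ctx: Context, candidates: list[str]) -> list[str]:
    """Step 3: suggest components the user has neither kept nor rejected."""
    present = {c.name for c in ctx.components}
    return [n for n in candidates if n not in present and n not in ctx.rejected]


ctx = scaffold("book flights, reserve hotel, plan budget")
apply_manipulation(ctx, Manipulation("remove", "plan budget"))
apply_manipulation(ctx, Manipulation("rename", "reserve hotel", "find hostel"))
suggestions = propose_next(ctx, ["plan budget", "rent car", "book flights"])
print([c.name for c in ctx.components], suggestions)
# prints: ['book flights', 'find hostel'] ['rent car']
```

In this sketch the user never issues an instruction to the agent. Removing "plan budget" is enough to keep it out of later proposals, which is one simple way direct manipulation could act as implicit feedback.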