Hero — 10–15s looping video of the Claude Code prototype in action. A builder types a natural-language prompt → the agent responds with a proposal → app structure updates on canvas. Muted autoplay, no controls.

Role: Design DRI
Timeline: Jan 2026 – ongoing
Team: Ilana Djemal (PM), Robin Pauli (eng lead), Alisa Faingold (data science)
Tools: Figma, Claude Code, GitHub

Tulip Interfaces

AI App Authoring

Designing for builders who now have a collaborator.

Framing

Every builder tool is racing to add an AI agent. The more interesting design problem isn’t the agent itself. It’s the editor underneath it.

Tulip’s app editor is a canvas-based no-code tool for manufacturing engineers: people who build apps to digitize shop-floor processes without writing code. Building an app today still requires technical knowledge: configuring steps, triggers, logic, and data models.

It’s time-consuming for experienced builders and intimidating for new users.

However, bolting a chat window onto an existing editor doesn’t solve the real problem, which is that the app editor itself was designed for a solo builder. When a user suddenly has a collaborator, the product they thought they knew needs to work differently. As we introduce AI into our app authoring experience, I’m trying to answer three questions: how changes get reviewed, what the agent can see, and how disagreement gets handled.

A simple schematic diagram — two side-by-side states. Left: "Editor, today" showing the current app editor with a user manipulating it directly. Right: "Editor, with agent" showing the same canvas with two actors — the user and the agent — both able to make changes.

Approach — why I didn’t start in Figma

When I joined this project in January, the engineering prototype was a few months old and partially working. My default instinct was to start designing screens.

Instead, I discovered that the prototype was unstable in a way that made high-fidelity design actively wasteful. Features were being added and removed week by week. The agent’s capabilities were expanding. Design decisions made against a moving target could easily have encouraged the team to rally around visuals instead of debating the underlying product.

So I started with the current-state user journey. I mapped what a builder does today in the app editor, from opening a blank app to shipping a working version. I posted the map in #p-ai-builder with a specific ask: correct or add to the journey, so the artifact can evolve with evidence for why each pain point matters.

This gave the team a shared artifact to argue against. Instead of debating UI choices in the abstract, we could point at specific journey moments and ask: where is an agent most valuable here? Where does it get in the way? Which pain points are worth an agent trying to solve, and which are better left alone?

Screenshot of the Agentic AI Figma board — the user journey section. Annotated with 3–4 callouts showing specific pain points the team debated. Before-and-after versions of the journey with added stickies show the artifact doing its job.

Prototyping in code, not Figma

I built the design prototype in code.

By February, the journey map had done its job. The team had alignment on the highest-value agent moments. But the next question (what does the agent chat UX actually feel like?) couldn’t be answered in Figma. Agent interactions are fundamentally about timing, state changes, and the specific rhythm of the user-to-agent exchange, and a static Figma prototype can only approximate those.

So I opened Claude Code and built a working prototype on my local app editor branch. Over two weeks I touched 42 files, experimenting with:

How the agent’s “thinking” state should appear
How the agent’s proposals should be surfaced for review before committing changes to the app
What happens when the agent misreads intent
How the editor canvas should update as the agent builds

2–3 screen captures or short 5-second clips from the prototype. (1) Agent thinking state, folded and unfolded; (2) Proposal acceptance flow; (3) Canvas updating live during an agent response. Each labeled with what it demonstrates.

I was explicit with the team that this was throwaway code. My goal was to create a living prototype and playground to test ideas.

Key design decisions

Thinking state as foldable detail, not hidden reasoning. AI products face a recurring tension: users want to trust the agent (which requires seeing its reasoning), but they also don’t want to read walls of text. The default industry pattern is to hide reasoning behind an expand toggle, which trains users not to look. My choice: surface a condensed “thinking box” inline with the agent’s response, folded by default but with a visible preview of what’s inside. The user can expand it when they want detail and skip it when they don’t. The key constraint: the reasoning is always visibly present and one click away, never buried.

Side-by-side comparison — the "hidden reasoning" default pattern vs. the "foldable thinking box" pattern, with arrows showing what's discoverable in each.

Trigger-level diff granularity, not change summaries. For users to trust agent-generated apps, they need to see exactly what changed, not read a summary. This is a direct extension of my earlier work on app version diffs: builders of manufacturing apps cannot ship a version based on “the agent updated your trigger logic.” They need trigger-level specificity. I’m pushing for the same granularity standard in agent-generated diffs as in human-authored ones.
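As a rough illustration of what trigger-level granularity could look like in data terms, here is a minimal Python sketch. Everything in it is an assumption for illustration: the model of an app version as step names mapped to named triggers, the `TriggerChange` record, and the `diff_triggers` function are hypothetical and do not describe Tulip’s actual schema or diffing code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Hypothetical model: app version = {step name: {trigger name: trigger config}}.
TriggerConfig = Dict[str, Any]
AppVersion = Dict[str, Dict[str, TriggerConfig]]


@dataclass(frozen=True)
class TriggerChange:
    """One reviewable change, scoped to a single trigger on a single step."""
    step: str
    trigger: str
    kind: str  # "added", "removed", or "modified"
    before: Optional[TriggerConfig]
    after: Optional[TriggerConfig]


def diff_triggers(old: AppVersion, new: AppVersion) -> List[TriggerChange]:
    """List every trigger that differs between two versions, in stable order."""
    changes: List[TriggerChange] = []
    for step in sorted(set(old) | set(new)):
        old_triggers = old.get(step, {})
        new_triggers = new.get(step, {})
        for name in sorted(set(old_triggers) | set(new_triggers)):
            before = old_triggers.get(name)
            after = new_triggers.get(name)
            if before == after:
                continue  # unchanged triggers never appear in the diff
            if before is None:
                kind = "added"
            elif after is None:
                kind = "removed"
            else:
                kind = "modified"
            changes.append(TriggerChange(step, name, kind, before, after))
    return changes


# Illustrative data only: the step, trigger, and action names are made up.
old_version: AppVersion = {
    "Inspect part": {
        "On step enter": {"action": "start_timer"},
        "On button press": {"action": "go_to_step", "target": "Pack"},
    }
}
new_version: AppVersion = {
    "Inspect part": {
        "On step enter": {"action": "start_timer"},
        "On button press": {"action": "go_to_step", "target": "Rework"},
        "On timer fire": {"action": "show_message"},
    }
}

changes = diff_triggers(old_version, new_version)
for change in changes:
    print(f"{change.step} / {change.trigger}: {change.kind}")
```

The sketch makes no distinction between agent-authored and human-authored versions: both would pass through the same function, and the reviewer would see individual triggers with their before and after states instead of a prose summary.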

Grid canvas as default for new apps only. The team is debating whether to introduce a grid-based canvas structure that would help both users and agents place widgets consistently. My position: opt new apps in by default, leave existing apps alone. Retroactively restructuring existing apps is messy and error-prone, and the cost of breaking builders’ trust in their existing work outweighs the consistency benefit. This pushes back against the instinct to make the new model universal.

Decision matrix for chat panel placement — rows are placements, columns are the criteria. Filled in with scores and short annotations.

Operating inside engineering

Designing an agent product means living in the codebase, not the Figma file.

I opened GitHub PRs directly and paired regularly with developers on the team during implementation. When the agent was producing verbose markdown output that no front-end design could fix, I edited its system instructions via a SQL migration to slim the response at the source.

The boundary between “design” and “engineering” collapses when you’re working with a medium where the interaction model is inseparable from the backend behavior.

Screenshot of a GitHub PR — authored PR list view, or the common/npm/chat folder structure. Real artifacts, not decorative.

What I’d do differently — and open questions

A few honest notes from the work so far:

The journey map should have happened sooner. The prototype existed for three months before I mapped the current state. If I’d done the mapping first, the prototype scope would have been tighter from the start, and we’d have avoided some early engineering work that the journey later showed was premature.

Agent versioning is unsolved and gating GA. How do you version an app that was partially generated by an agent? What happens when the agent’s model changes underneath a shipped app? The team is still working through this. I don’t have a design answer yet, but I have a strong opinion that the versioning model for agent-generated content has to be functionally identical to human-authored content — users shouldn’t need to know which parts of their app came from where.

Progressive disclosure between beginner and advanced builders. The same agent interface needs to serve a first-time citizen developer and a power user who’s built 50 apps. I don’t have a great answer yet. Likely direction: the agent’s behavior adapts based on explicit user posture (“I’m new” vs “I know what I’m doing”) rather than inferring from usage patterns, which would feel like surveillance.