I’ve been building software for over twenty years, most of it customer-facing products: mobile apps, SaaS platforms. When coding agents started gaining traction, I was skeptical at first, then curious, then increasingly productive. What started with solving small, scoped problems with Claude Code turned into a big shift.
And I’m not alone in this. Friends and colleagues are seeing the same thing. The wider circle is starting to follow. This isn’t an early-adopter curiosity anymore; it’s becoming how people work.
Over the winter holidays I decided to go all-in. One side project, built entirely with coding agents, from planning to production. It changed how I think about my job and the job of every software engineer.
The Coloring Book
My kids are six and three, and they love coloring. At that age they can sit down with a page and disappear into it for a surprisingly long time. I 3D printed coloring templates for them for a while, but they kept asking for things I couldn’t easily make. A unicorn swimming in a lake. A dragon riding a skateboard. The kind of specifics only a kid comes up with.
That’s where the idea started. If AI can generate images, it can generate coloring pages in any shape. I parked the idea over the summer and picked it up again in December with a plan: build it with coding agents, start to finish, not as an experiment but as a product I’d actually ship. I called it Colorburst.
I set constraints for myself to stay close to the systems I work with day to day, not a playground with every shiny tool plugged in. It’s a greenfield project with an AI-friendly stack, and I know that gives me an advantage.
The core was done on the side between Christmas and New Year: coloring page generation, automatic coloring, remixing images based on instructions. The progress surprised me, not just the speed but the confidence I have in the result. The code is structured, the tests are there, there’s more documentation than I have ever seen, and the app feels production-ready. My years of experience in building software shaped the decisions, but the agent wrote most of the code.
The real test came from my kids. They ask me every day to make something new, sitting on my lap, explaining what they want, watching while my little animated loading illustration draws on the screen. When the result appears they can’t wait to run to the printer. Friends with kids are getting the same reactions.
Colorburst isn’t public yet because I need to cover the operational costs, but it should be open for everyone soon. More importantly, the workflow I developed while building it turned into something I want to share.
The Problem: Process as Tax
Every codebase that involves a team has coordination complexity, and structure helps to tame it. Someone has to communicate what needs to be built, make sure the code stays consistent, and keep track of what changed and why. These aren’t bureaucratic questions. They’re engineering questions, and the way most teams answer them hasn’t changed much. You write tickets, name branches, fill out PR templates, update changelogs, go through review. It works, but it’s manual, repetitive, and the first thing that gets skipped when you’re under pressure.
Solo developers skip most of this and that’s fine. You don’t need a PR template when you’re the only one contributing. That said, good documentation and tests are an investment in yourself. Come back to a side project a year from now and you’ll wish you had left some breadcrumbs.
When I started using coding agents seriously, the first thing that impressed me was the speed. You describe what you want, the agent builds it, you move on. But there are gaps. The agent doesn’t know your naming conventions or your commit format. It won’t put tests next to the source files just because that’s how your team does it. Basically, it’s a fast colleague who didn’t read the team wiki.
You can get impressive demos that way. Building something with a team requires more.
The Shift
At some point during the Colorburst build I realized the value wasn’t just in “AI writes my function.” It was also in “AI follows my development process.”
I want the agent to write code the way I, and my team, would write it. Follow my conventions, my patterns, my architectural decisions. Not generic best practices from the training data, but the specific way I build software. One-shot code generation can’t do that. You prompt, you get code, you fix what’s wrong, you prompt again. It plateaus fast.
What actually compounds is encoding your workflow into reusable steps the agent can follow. You’re not writing prompts for answers anymore. You’re writing prompts that encode how you work.
That’s the shift from using an AI assistant to doing agentic engineering. The agent doesn’t just help you code. It executes your workflow.
Building the Forge
I packaged what I learned into Forge, a set of skills that follow the Agent Skills open standard. As Mario Zechner put it about his own coding agent: “There are many workflow skills, but it’s mine.”
The skills form a pipeline: set up the project foundation, plan what to build, implement it, self-review, respond to feedback, document what shipped. Each step feeds into the next, and the conventions you define once flow through everything from issue titles to branch names to commit messages. That kind of consistency is tedious to maintain manually but effortless when the workflow handles it.
One design decision turned out to be non-negotiable: every skill explores the codebase before asking questions. An agent that reads your code first and then asks what’s unclear produces fundamentally better results. When something is unknown, the skill leaves a TODO marker instead of inventing content. I’d rather have an honest gap than hallucinated boilerplate.
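To make this concrete, here is a minimal sketch of what one of these skills could look like, assuming the SKILL.md layout from the Agent Skills standard. The skill name and the instructions are illustrative, not Forge’s actual content:

```markdown
---
name: plan-feature
description: Turn a feature discussion into a well-scoped issue that follows the project's conventions.
---

1. Explore first: read the relevant modules, the existing tests, and docs/
   before asking the user anything.
2. Ask only about what is still unclear after exploring.
3. Draft the issue using the project's title, branch, and commit conventions.
4. If a detail is unknown, leave a TODO marker instead of inventing content.
```

The frontmatter describes when the skill applies; the body is the workflow the agent walks through.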
Well-Scoped Issues Enable Parallel Execution
Peter Steinberger described his approach as running five to ten agents in parallel, like managing a StarCraft game. He plans thoroughly before starting execution, and I think that’s exactly right. The parallelism works because of what happens before it.
The planning step is conversational. You discuss the problem with the agent, explore options, and what comes out is a well-defined issue with acceptance criteria, alternatives considered, and implementation details. That artifact can be picked up and executed with minimal attention. Fill a pipeline with well-scoped issues and you can run them in parallel naturally.
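For illustration, the artifact coming out of that conversation might look something like this. The feature and the details are made up, but the shape is what matters:

```markdown
## Limit upload size for image remixing

**Acceptance criteria**
- [ ] Uploads over 10 MB are rejected with a clear error message
- [ ] The existing remix flow is unchanged for smaller images

**Alternatives considered**
- Client-side resizing: rejected, line art loses quality

**Implementation details**
- Add a size check in the upload handler, reuse the existing error component
- TODO: confirm the exact limit with the hosting provider's request cap
```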
The planning step is where the human adds the most value. The execution step is where the agent shines.
The Collaborative Loop
When an issue is well-scoped, agents produce solid code. The problems show up in the things that aren’t part of the feature spec: rate limiting, security headers, error handling patterns that are specific to your project. These cross-cutting concerns get lost when both you and the agent are too focused on the broader feature at hand.
This is where the collaboration becomes iterative. You review what the agent built, notice what’s missing, and nudge it. I found it highly effective to review in GitHub, give line-based feedback, and ask the agent to address it afterwards. The important part is what comes next: you instruct the agent to update its own documentation so it remembers next time. Add the rate limiting pattern to docs/. Document the security checklist. Update the coding guidelines with the error handling convention.
Early on, this is effort. You’re essentially onboarding a new team member and the documentation you write for the agent is the documentation you’d want for any new hire. But at a certain point something shifts. The agent starts keeping these things in mind on its own. Not perfectly, and not when the context gets too polluted, but consistently enough that you notice the difference.
There’s a compounding effect that goes beyond the explicit documentation. As the codebase grows with consistent patterns, the agent orients itself on what’s already there and produces more of the same, which happens to be exactly what you want. The existing code becomes implicit guidance. Models are improving at this too, and the difference between generations is noticeable. The agent needs less explicit guidance when the codebase itself carries the conventions.
A recent ETH Zurich study confirms this: when they benchmarked context files like AGENTS.md on real GitHub issues, the files were largely redundant with the existing documentation in the repo. The context file is a bootstrap for when the codebase can’t speak for itself yet.
Mitchell Hashimoto calls this “engineering the harness”: when agents repeat mistakes, the fix isn’t better prompting but better documentation and tooling around them.
A 2025 study found that, in its benchmark setup, critical flaws in AI-generated code rose 37.6% after five rounds of edits without human intervention. The collaborative loop is what prevents that kind of degradation.
You’re not just reviewing code. You’re training a collaborator.
Why the PR Still Matters
Code review has always served multiple purposes, and with agents writing the code, all of them become more important.
Agents miss cross-cutting concerns. Not bugs primarily, but the things that live between features: security, performance, architectural consistency. The problem is that speed removes the bottlenecks, like code review and team oversight, that used to catch these issues. Well-scoped issues produce working features, but the space between features still needs a human eye.
Armin Ronacher calls human review “the final bottleneck”: code creation now outpaces review capacity, and when that gap compounds you get projects sitting on unreviewed pull requests. The answer isn’t removing review, it’s making the code more reviewable so the bottleneck doesn’t break.
PRs also give you composability. In my workflow, forge-reflect-pr runs the local quality gate: lint, build, test, self-review. Moving this to a PR lets me compose it with other review tools like GitHub Copilot review or CodeRabbit. The agent does the first pass, I make the final call. For teams introducing agentic engineering this becomes essential because it’s not just a technical step, it’s part of the change process.
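As a rough sketch, that local gate boils down to something like the following. The npm scripts are assumptions for illustration, not Forge’s actual implementation:

```bash
#!/usr/bin/env bash
# Local quality gate before a PR is opened (sketch, assumed npm scripts).
set -euo pipefail

npm run lint    # conventions and static analysis
npm run build   # the project still compiles
npm test        # the test suite still passes

# Finally, the agent reviews its own diff against the issue's
# acceptance criteria before pushing the branch and opening the PR.
```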
And then there’s knowledge sharing. If nobody reviews what shipped, nobody on the team understands what shipped.
The PR is the last reliable mechanism for shared understanding when humans didn’t write the code.
Philipp Spiess argues we should move past code-level collaboration entirely, toward a future where the specification and the output are the only human-readable artifacts. I can see that future. Maybe we won’t even have code in higher-level languages anymore and line-by-line review won’t be possible or meaningful. But the organizations I know aren’t there yet, and the PR is the mechanism we have for the current change process. We can evolve past it once AI literacy in the org reaches commodity level.
The Engineer’s Role
The conversation around AI and engineering depends on where you look. On X, everyone is shipping at 100x and the future arrived yesterday. On Bluesky and Mastodon the picture is more nuanced, more skeptical, more concerned about what’s being lost. I can only evaluate my direct experience.
Colorburst showed me I can build much more alone in less time. I wouldn’t have gone as far without agents. At work I see the same thing: I can contribute meaningfully to tech stacks I’m less familiar with and move faster through problems that would have taken days. The software engineer’s job is not what it used to be, and if you miss this shift, you’ll be left behind.
But the role didn’t disappear, it moved. So what does the engineer actually do when agents write the code?
You own the architecture and decide what gets built in what order. The day-to-day is watching the flow, intervening when something drifts, reviewing and nudging. And at the end you decide what goes to production. That responsibility doesn’t change, and for now it can’t be delegated. Accountability for shipped code still requires a human who owns the decision.
For a solo developer working on a fun project you can be looser with this. I certainly was with Colorburst. But for production software in organizations, the human has to remain in the loop as quality gate and decision maker. Two-thirds of organizations currently operate without formal governance policies for AI tooling. The org structures aren’t ready for autonomous agents even if the technology were.
This is the environment I usually work in, and it’s why Forge is built the way it is. Not because agents can’t be trusted with more autonomy, but because the organizations around them aren’t there yet. The change process has to happen first.
What I Learned
Structure Amplifies Capability
An unstructured agent is fast but unpredictable. It can produce impressive output one moment and ignore your conventions the next. Give it a well-structured project, consistent patterns, existing tests, clear conventions in the code, and the output changes. This was the single biggest lesson from building Colorburst. That ETH study saw the same thing: agents didn’t get better from instructions in a context file, they performed better when the codebase already carried clear patterns. The agent got better because the project it worked in was well-structured.
The other kind of structure is the workflow: smaller steps, human checkpoints, feedback that flows back into the project. The study tested agents solving issues autonomously and found that more instructions alone didn’t lead to better outcomes. That tracks. A well-structured project gives the agent something to orient on. A structured workflow keeps it from drifting.
It’s a Relationship, Not a Transaction
The human-agent collaboration is iterative. You teach it your patterns, your security concerns, your architectural preferences, not in one shot but over time through nudges and documentation updates. Agents will get better and need less of this. But right now the collaboration is what produces the quality.
The Forge Is Open
Forge is my personal starting point. I shared it with a few friends and colleagues, and they started building their own workflows on top of it. It’s not a framework you have to adopt wholesale. Fork it, change it, throw out the parts that don’t fit. Teams will need to find common ground, but the skills should adapt to the team, not the other way around.
You can install it with a single command:
npx skills add mgratzer/forge
The tools will get better, the agents will get “smarter”, and the workflows we build today will look primitive in a few months. But the pattern of encoding how you work into reusable steps that an agent can follow is, I think, here to stay for a while.
I’ve already started bringing this into my day job, working on that change process across the engineering org. It’s early, but I’ll report back how it goes.
My kids don’t care about any of this. They care that the dragon riding a skateboard looks cool. The workflow is what let me build that for them in no time.