{"id":13828,"date":"2026-05-27T14:52:48","date_gmt":"2026-05-27T14:52:48","guid":{"rendered":"https:\/\/cheesecakelabs.com\/blog\/"},"modified":"2026-05-27T14:52:50","modified_gmt":"2026-05-27T14:52:50","slug":"harness-engineering","status":"publish","type":"post","link":"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/","title":{"rendered":"Harness Engineering: Why &#8220;Done&#8221; Isn&#8217;t the Agent Saying So"},"content":{"rendered":"\n<p>In November 2025, Justin Young at Anthropic published a <a href=\"https:\/\/www.anthropic.com\/engineering\/effective-harnesses-for-long-running-agents\" target=\"_blank\" rel=\"noreferrer noopener\">post<\/a> on what they had learned running long-running coding agents. One observation in that post stuck with me. He described a failure mode where Claude, after some progress on a project, would look at the state of the work, see that things had been built, and declare the job done. The verb was &#8220;declare.&#8221; The agent ran a curl, got a 200 back, called the integration finished, and moved on.<\/p>\n\n\n\n<p>Five months later, Birgitta B\u00f6ckeler at Thoughtworks published the cleanest writeup I have read of <a href=\"https:\/\/martinfowler.com\/articles\/harness-engineering.html\" target=\"_blank\" rel=\"noreferrer noopener\">what we should build around the model<\/a> to stop that from happening. 
She called the layer the harness, and she split it into two halves: <strong>guides<\/strong> (feed-forward controls that anticipate the agent&#8217;s behavior and steer it before it acts) and <strong>sensors<\/strong> (feedback controls that observe the result and help it self-correct).<\/p>\n\n\n\n<p>Two months before B\u00f6ckeler&#8217;s piece, Kief Morris had published the loop framework that gave the layer its strategic shape.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What the harness actually is<\/h2>\n\n\n\n<p>If you say &#8220;harness&#8221; to most engineers in 2026, they will nod and tell you they are doing it. Ask what they mean and you get a list of three things: linters, tests, CI. That is not a harness. That is a small slice of the sensor layer, with no guides and no judgment.<\/p>\n\n\n\n<p>B\u00f6ckeler&#8217;s framing is the one I have adopted internally at Cheesecake Labs. The harness has two halves.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Guides are feed-forward<\/h3>\n\n\n\n<p>Guides shape what the agent does before it acts: the CLAUDE.md file at the project root, the PR template the agent reads before opening a pull request, the spec it must implement against, the skill that encodes how this team runs database migrations, and the architectural conventions, written in plain English, that the agent reads in every session.<\/p>\n\n\n\n<p>Guides are cheap and they compound. They are also the part of the harness most teams under-invest in, because no guide ships a feature on its own.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Sensors are feedback<\/h3>\n\n\n\n<p>Sensors observe what the agent did and tell it whether it worked. That means linters and type checkers, yes, but also test suites that actually run, separate review agents that read the diff, faithfulness checks that compare the implementation back to the spec, hooks on commit and on PR open, and the judge model that grades the work against the acceptance criteria. 
Sensors are how the agent learns it was wrong.<\/p>\n\n\n\n<p><strong>B\u00f6ckeler&#8217;s sharpest point is that you need both. <\/strong>Sensors alone leave you with an agent that keeps making the same mistake because nothing told it the right rule upstream. Guides alone leave you with an agent that follows the rules but never finds out whether they produced the right outcome. A real harness is the closed loop between the two.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Read more: <\/strong><a href=\"https:\/\/cheesecakelabs.com\/blog\/spec-driven-development\/\" target=\"_blank\" rel=\"noreferrer noopener\">Spec-Driven Development: How to Capture Intent Before You Burn Tokens<\/a><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">The maturity ladder. Where most teams sit.<\/h2>\n\n\n\n<p>Morris&#8217;s in-, on-, and out-of-the-loop framing is the most useful diagnostic tool I have for talking to engineering leaders right now. I ask them where their team sits, and the answer tells me what to invest in next.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">In the loop<\/h3>\n\n\n\n<p>The engineer reviews every line the agent produces. They are the gatekeeper on the innermost loop, where code gets generated. In Morris&#8217;s words: &#8220;the challenge when we insist on being too closely involved in the process is that we become a bottleneck.&#8221;<\/p>\n\n\n\n<p><strong>Most teams I see live here: <\/strong>they use Claude Code, they ship features, and their senior engineers spend half their day reviewing AI-generated diffs by hand. The agent got faster, and the reviewer became the constraint.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">On the loop<\/h3>\n\n\n\n<p>The engineer designs and maintains the mechanisms that produce and validate the agent&#8217;s work. 
Morris again: &#8220;Rather than personally inspecting what the agents produce, we can make them better at producing it.&#8221;<\/p>\n\n\n\n<p>This is where harness engineering becomes a real category of work. You stop fixing individual bad PRs and start fixing the system that produced them. The senior engineer&#8217;s job shifts from line reviewer to harness builder.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Out of the loop<\/h3>\n\n\n\n<p>The harness is mature enough that the agent runs largely autonomously and the human audits aggregate outputs. Morris calls this the natural home for what people loosely term &#8220;vibe coding,&#8221; but only when the harness is strong enough <strong>to keep vibe-coded output safe<\/strong>. Without that harness, &#8220;out of the loop&#8221; is just shipping bugs faster.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Flywheel<\/h3>\n\n\n\n<p>There is a fourth rung B\u00f6ckeler implies but does not name: the harness improving from its own outputs. Failed gates become CLAUDE.md updates, rejected PRs become new tests, and the harness compounds. This is where the leverage is.<\/p>\n\n\n\n<p>The leap from in the loop to on the loop is the single biggest career move of the next two years for senior engineers, and the single biggest architecture move for engineering leaders. The next leap, from on the loop to out of it, requires the harness to actually be good.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Completion gates. The cascade.<\/h2>\n\n\n\n<p>This is the part of the harness most teams have not built and most need. The premise is that &#8220;done&#8221; is not the agent saying so; it is the system proving it. Before a task is accepted, it moves through a cascade of checks, cheapest first. Each gate catches a specific failure mode and, when it fails, returns a classified reason and an actionable fix.<\/p>\n\n\n\n<p>The cascade I run at Cheesecake Labs has five gates. None of them is novel on its own. 
The order and the framing are mine, built on top of B\u00f6ckeler&#8217;s guides-and-sensors model and Anthropic&#8217;s failure-mode taxonomy.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Gate 1: Structural<\/h3>\n\n\n\n<p>Lint, type checking, and unit tests: cheap and deterministic. This gate catches what Anthropic calls the &#8220;marks complete without verification&#8221; failure: the agent ran a curl, got a 200, and declared the integration working. In my experience, this gate fails roughly 30 to 50% of first-shot agent PRs. That is a healthy signal: it means the gate is doing its job.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Gate 2: File integrity<\/h3>\n\n\n\n<p>This gate checks that critical files were left untouched. Did the agent silently delete tests to make them pass? Rewrite an API contract instead of conforming to it? Modify a config file outside the change scope? These failures ship as silent regressions. A simple allow-list of the files a task is permitted to touch catches most of them. The fix takes one line of YAML, and not enough teams have it.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Gate 3: Sufficiency<\/h3>\n\n\n\n<p>Does the diff actually cover the <a href=\"https:\/\/cheesecakelabs.com\/blog\/spec-driven-development\/\" target=\"_blank\" rel=\"noreferrer noopener\">scope of the spec<\/a>? An agent that ships half the feature and declares the rest &#8220;follow-up work&#8221; is the most common failure mode I see in <a href=\"https:\/\/cheesecakelabs.com\/blog\/plan-mode-claude-code\/\" type=\"post\" id=\"13797\" target=\"_blank\" rel=\"noreferrer noopener\">plan mode<\/a> workflows. The fix is mechanical: every task in tasks.md has acceptance criteria, every PR maps to one task, and the gate verifies that the diff touched what the task said it would.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Gate 4: Faithfulness<\/h3>\n\n\n\n<p>Does the implementation actually do what the design said it would do? This is the gate most teams skip. 
The mechanism is borrowed from RAG evaluation tools such as RAGAS and adapted: compute the semantic similarity between the diff and the design markdown. It is cheap, embedding-based, and runs in seconds. It is a filter, not a verdict: if the similarity falls below a threshold, the PR fails before you ever pay for a judge model.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Gate 5: Judge LLM<\/h3>\n\n\n\n<p>A separate model, ideally a different one, reads the spec, the design, the tests, and the diff, and produces an explicit accept or reject with reasons. Zheng et al. (2023) showed that strong LLM judges such as GPT-4 agree with human preferences more than 80% of the time, &#8220;the same level of agreement between humans.&#8221;<\/p>\n\n\n\n<p>The <strong>Agent-as-a-Judge work <\/strong>from Meta in 2024 extended this to coding agents specifically and found that it &#8220;dramatically outperforms LLM-as-a-Judge and is as reliable as our human evaluation baseline.&#8221;<\/p>\n\n\n\n<p>The order matters because the judge is expensive in tokens and latency. It is the most expensive gate, but you pay for it least often: only on PRs that have already cleared the structural, integrity, sufficiency, and faithfulness checks. By the time the judge runs, it is evaluating work that is plausibly correct, not work that already failed lint.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Read more: <\/strong><a href=\"https:\/\/cheesecakelabs.com\/blog\/three-eras-of-software\/\" type=\"post\" id=\"13791\" target=\"_blank\" rel=\"noreferrer noopener\">The Three Eras of Software: From Autocomplete to Agentic Development<\/a><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">LLM-as-Judge. 
Never let the executor grade itself.<\/h2>\n\n\n\n<p>I want to underline the structural argument here, because it is the one most teams resist most stubbornly.<\/p>\n\n\n\n<p>The agent that implemented the feature does not get to decide whether it is done. It is the same conflict of interest as a developer approving their own PR. The executor agent has an incentive (built into how it was prompted) to converge on &#8220;complete.&#8221; If you let it grade itself, it will grade itself generously. The only fix is a separate evaluator.<\/p>\n\n\n\n<p>In practice, this means two agents, ideally running on two different models. At Cheesecake Labs we run Sonnet 4.6 for implementation and Opus 4.7 for judging. The judge sees only the spec, the design, the tests, and the diff. It does not see the executor&#8217;s reasoning or the chat history. It produces a structured verdict: accept, reject (with classified reasons), or request clarification (with specific questions).<\/p>\n\n\n\n<p>The first-pass rejection rate runs <strong>between 15 and 25%, depending on the team and the quality of the spec. <\/strong>That is not a sign that the executor is bad. It is a sign that the judge is doing its job. Without the judge, that 15 to 25% of work would have shipped as &#8220;done&#8221; and been caught later: in QA, in the next sprint, or in production.<\/p>\n\n\n\n<p>The <a href=\"https:\/\/cloud.google.com\/resources\/content\/2025-dora-ai-assisted-software-development-report\" target=\"_blank\" rel=\"noreferrer noopener\">2025 DORA State of AI-assisted Software Development report<\/a> puts the wider point most directly: &#8220;AI doesn&#8217;t fix a team. It amplifies what&#8217;s already there.&#8221; If your definition of &#8220;done&#8221; was already loose, AI lets you ship more loosely defined work, faster. The judge tightens the definition.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Where the harness learns<\/h2>\n\n\n\n<p>The fourth rung, the flywheel, is where this work compounds. 
Most teams never get there because they treat each rejected PR as a one-off. The PR gets fixed, merged, and forgotten.<\/p>\n\n\n\n<p>The pattern that gets you to the flywheel is mechanical and unglamorous. <strong>Every rejected PR generates a record: <\/strong>what failed, why, and what the fix was. Every week, the team reviews those records and asks one question: is there a guide or sensor we could add that would have prevented this? A <strong>CLAUDE.md <\/strong>entry, a skill, a new gate, a new test. If yes, add it and commit it. Now the next PR should not fail the same way.<\/p>\n\n\n\n<p>Run a thirty-minute weekly harness retro on the projects where the harness is mature enough to support it. The first month, it feels like overhead. By month three, it feels like the most leveraged thirty minutes on the calendar. The cost of building the harness is paid down by the harness itself.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Closing thought<\/h2>\n\n\n\n<p>The discourse on agentic coding is converging fast. Anthropic&#8217;s harness post, <a href=\"https:\/\/martinfowler.com\/articles\/harness-engineering.html\" target=\"_blank\" rel=\"noreferrer noopener\">Birgitta B\u00f6ckeler&#8217;s guides and sensors<\/a>, <a href=\"https:\/\/martinfowler.com\/articles\/exploring-gen-ai\/humans-and-agents.html\" target=\"_blank\" rel=\"noreferrer noopener\">Kief Morris&#8217;s loop positions<\/a>, the SWE-Bench Pro gap, and the DORA 2025 finding all point the same way. The framing is settling. The model is not the bottleneck. The harness is.<\/p>\n\n\n\n<p><strong>On the loop is where you change the harness that produced the artifact.<\/strong> That is the line from Morris&#8217;s piece I keep coming back to. The senior engineer who was the bottleneck in the in-the-loop world becomes the most leveraged person in the company in the on-the-loop world.<\/p>\n\n\n\n<p>The job description changes. The output of a great senior engineer is no longer code. 
It is the system that makes the next hundred features ship correctly with much less of their time.<\/p>\n\n\n\n<p>At Cheesecake Labs, we help engineering organizations move from &#8220;we use Claude Code&#8221; to &#8220;we built the harness that lets us trust Claude Code.&#8221; Gates, judges, classified failure logs: the unglamorous infrastructure that turns agentic coding into a delivery system.<\/p>\n\n\n\n<p>If your senior engineers are spending most of their week reviewing agent diffs by hand, <a href=\"https:\/\/cheesecakelabs.com\/contact\/\" target=\"_blank\" rel=\"noreferrer noopener\">talk with us<\/a>. The fix is usually a harness fix, and it pays back in weeks.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"http:\/\/cheesecakelabs.com\/services\" target=\"_blank\" rel=\" noreferrer noopener\"><img decoding=\"async\" width=\"1200\" height=\"409\" src=\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/06\/legacy-app-ckl-1200x409.jpg\" alt=\"\" class=\"wp-image-13491\" srcset=\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/06\/legacy-app-ckl-1200x409.jpg 1200w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/06\/legacy-app-ckl-600x205.jpg 600w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/06\/legacy-app-ckl-768x262.jpg 768w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/06\/legacy-app-ckl-1536x524.jpg 1536w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/06\/legacy-app-ckl-760x259.jpg 760w, https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2023\/06\/legacy-app-ckl.jpg 1920w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><\/a><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>In November 2025, Justin Young at Anthropic published a post on what they had learned running long-running coding agents. One observation in that post stuck with me. 
He described a failure mode where Claude, after some progress on a project, would look at the state of the work, see that things had been built, and [&hellip;]<\/p>\n","protected":false},"author":92,"featured_media":13833,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1288,432],"tags":[1399,1395,1400],"class_list":["post-13828","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","category-engineering","tag-agents","tag-ai-agent","tag-harness"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.1.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Harness Engineering: Why &quot;Done&quot; Isn&#039;t the Agent Saying So<\/title>\n<meta name=\"description\" content=\"Here is how to build the harness that turns your agentic coding into shippable software for your business.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Harness Engineering: Why &quot;Done&quot; Isn&#039;t the Agent Saying So\" \/>\n<meta property=\"og:description\" content=\"Here is how to build the harness that turns your agentic coding into shippable software for your business.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/\" \/>\n<meta property=\"og:site_name\" content=\"Cheesecake Labs\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cheesecakelabs\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-27T14:52:48+00:00\" \/>\n<meta property=\"article:modified_time\" 
content=\"2026-05-27T14:52:50+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/05\/harness-cover.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1536\" \/>\n\t<meta property=\"og:image:height\" content=\"689\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Cheesecake Labs\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@cheesecakelabs\" \/>\n<meta name=\"twitter:site\" content=\"@cheesecakelabs\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/\"},\"author\":{\"name\":\"Douglas da Silva\"},\"headline\":\"Harness Engineering: Why &#8220;Done&#8221; Isn&#8217;t the Agent Saying So\",\"datePublished\":\"2026-05-27T14:52:48+00:00\",\"dateModified\":\"2026-05-27T14:52:50+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/\"},\"wordCount\":1995,\"image\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/05\/harness-cover.jpg\",\"keywords\":[\"agents\",\"AI agent\",\"harness\"],\"articleSection\":[\"Artificial 
Intelligence\",\"Engineering\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/\",\"url\":\"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/\",\"name\":\"Harness Engineering: Why \\\"Done\\\" Isn't the Agent Saying So\",\"isPartOf\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/05\/harness-cover.jpg\",\"datePublished\":\"2026-05-27T14:52:48+00:00\",\"dateModified\":\"2026-05-27T14:52:50+00:00\",\"author\":{\"@type\":\"person\",\"name\":\"Douglas da Silva\"},\"description\":\"Here is how to build the harness that turns your agentic coding into shippable software for your business.\",\"breadcrumb\":{\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/#primaryimage\",\"url\":\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/05\/harness-cover.jpg\",\"contentUrl\":\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/05\/harness-cover.jpg\",\"width\":1536,\"height\":689},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/cheesecakelabs.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Harness Engineering: Why &#8220;Done&#8221; Isn&#8217;t the Agent Saying 
So\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/#website\",\"url\":\"https:\/\/cheesecakelabs.com\/blog\/\",\"name\":\"Cheesecake Labs\",\"description\":\"Nearshore outsourcing company for Web and Mobile design and engineering services, and staff augmentation for startups and enterprises..\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/cheesecakelabs.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"name\":\"Douglas da Silva\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/cheesecakelabs.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2017\/06\/douglas-da-silva.jpeg\",\"contentUrl\":\"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2017\/06\/douglas-da-silva.jpeg\",\"caption\":\"Douglas da Silva\"},\"url\":\"https:\/\/cheesecakelabs.com\/blog\/autor\/douglasgimli\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Harness Engineering: Why \"Done\" Isn't the Agent Saying So","description":"Here is how to build the harness that turns your agentic coding into shippable software for your business.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/","og_locale":"en_US","og_type":"article","og_title":"Harness Engineering: Why \"Done\" Isn't the Agent Saying So","og_description":"Here is how to build the harness that turns your agentic coding into shippable software for your business.","og_url":"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/","og_site_name":"Cheesecake Labs","article_publisher":"https:\/\/www.facebook.com\/cheesecakelabs","article_published_time":"2026-05-27T14:52:48+00:00","article_modified_time":"2026-05-27T14:52:50+00:00","og_image":[{"width":1536,"height":689,"url":"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/05\/harness-cover.jpg","type":"image\/jpeg"}],"author":"Cheesecake Labs","twitter_card":"summary_large_image","twitter_creator":"@cheesecakelabs","twitter_site":"@cheesecakelabs","twitter_misc":{"Written by":null,"Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/#article","isPartOf":{"@id":"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/"},"author":{"name":"Douglas da Silva"},"headline":"Harness Engineering: Why &#8220;Done&#8221; Isn&#8217;t the Agent Saying So","datePublished":"2026-05-27T14:52:48+00:00","dateModified":"2026-05-27T14:52:50+00:00","mainEntityOfPage":{"@id":"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/"},"wordCount":1995,"image":{"@id":"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/#primaryimage"},"thumbnailUrl":"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/05\/harness-cover.jpg","keywords":["agents","AI agent","harness"],"articleSection":["Artificial Intelligence","Engineering"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/","url":"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/","name":"Harness Engineering: Why \"Done\" Isn't the Agent Saying So","isPartOf":{"@id":"https:\/\/cheesecakelabs.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/#primaryimage"},"image":{"@id":"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/#primaryimage"},"thumbnailUrl":"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/05\/harness-cover.jpg","datePublished":"2026-05-27T14:52:48+00:00","dateModified":"2026-05-27T14:52:50+00:00","author":{"@type":"person","name":"Douglas da Silva"},"description":"Here is how to build the harness that turns your agentic coding into shippable software for your 
business.","breadcrumb":{"@id":"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/#primaryimage","url":"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/05\/harness-cover.jpg","contentUrl":"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2026\/05\/harness-cover.jpg","width":1536,"height":689},{"@type":"BreadcrumbList","@id":"https:\/\/cheesecakelabs.com\/blog\/harness-engineering\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/cheesecakelabs.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Harness Engineering: Why &#8220;Done&#8221; Isn&#8217;t the Agent Saying So"}]},{"@type":"WebSite","@id":"https:\/\/cheesecakelabs.com\/blog\/#website","url":"https:\/\/cheesecakelabs.com\/blog\/","name":"Cheesecake Labs","description":"Nearshore outsourcing company for Web and Mobile design and engineering services, and staff augmentation for startups and enterprises..","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/cheesecakelabs.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","name":"Douglas da Silva","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/cheesecakelabs.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2017\/06\/douglas-da-silva.jpeg","contentUrl":"https:\/\/ckl-website-static.s3.amazonaws.com\/wp-content\/uploads\/2017\/06\/douglas-da-silva.jpeg","caption":"Douglas da 
Silva"},"url":"https:\/\/cheesecakelabs.com\/blog\/autor\/douglasgimli\/"}]}},"_links":{"self":[{"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/posts\/13828","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/users\/92"}],"replies":[{"embeddable":true,"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/comments?post=13828"}],"version-history":[{"count":2,"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/posts\/13828\/revisions"}],"predecessor-version":[{"id":13830,"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/posts\/13828\/revisions\/13830"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/media\/13833"}],"wp:attachment":[{"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/media?parent=13828"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/categories?post=13828"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cheesecakelabs.com\/blog\/wp-json\/wp\/v2\/tags?post=13828"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}