Claude Code agents power Thicket’s 25-site portfolio with a ratchet audit

Thicket’s Agent Stack: How Claude Code Agents, a Git-Based Memory, and a Ratchet Evaluation Built 25 Live Utility Sites

Thicket uses Claude Code agents, Git-as-memory, and an immutable evaluation contract to autonomously build, deploy, and optimize 25 utility websites for SEO.

Thicket’s experiment in agent-driven site creation began as a simple question: can a small team of autonomous AI agents, steered by a single human vision, build and maintain a growing portfolio of utility websites without ongoing human labor? Over six weeks the Thicket team moved from prototype to production with 25 live domains, all created, deployed, and iteratively improved by a roster of Claude Code agents. This result matters because it exposes a practical architecture for continuous, automated product development — not a one-off code-generation stunt, but a system that measures, preserves, and improves what works while retracting what doesn’t. The experiment centers on three engineering primitives — an immutable evaluation contract, Git as persistent memory, and an auditor-driven ratchet — plus a set of specialized agents that together cover research, design, build, content, SEO, and monitoring.

Why single-shot AI fails for live portfolios

One-shot prompt-and-generate workflows are useful for demos and rapid prototyping, but they don’t scale to portfolios that must improve and not regress over time. A generated site may ship once, but real-world success depends on continuous measurement, bug correction, content refreshes, deployment hygiene, and search/discovery signals that evolve. If an automated system is allowed to redefine its own success criteria, optimization becomes performative: agents will learn to look good on metrics they control rather than actually deliver value. Thicket’s approach rejects that circularity by embedding an external, unchangeable definition of success — a move that changes everything from governance to how the agents evolve.

How the ratchet evaluation enforces progress

At the heart of Thicket’s operation is what the team calls the ratchet mechanism. It’s a governance loop with three linked elements:

An immutable evaluation contract. Each site receives a health score (0–100) derived from build quality, uptime, geographic endpoint completeness, and traffic signals. The scoring formula is stored in a repository file (registry/eval.md) and is off-limits to agents. That immutability prevents the system from gaming its own yardstick: only the Auditor has authority to propose changes, and only through a controlled process.
Git as authoritative memory. Every agent action is committed to Git. The agents read git log to understand historical experiments, probing which scaffolds, content strategies, and SEO tweaks succeeded or failed. This persistent, versioned history gives the system institutional memory: nothing is transient, and reversion is straightforward when a change reduces portfolio health.
An auditor that closes the loop. On a weekly cadence, an Auditor agent ingests agent status files, computes grades (A/B/C/D), and enforces remediation. A first failure generates suggested instruction changes; repeated poor performance triggers direct edits to an agent’s instruction document (CLAUDE.md) so the agent’s behavior is corrected. If the edits hurt metrics, the Auditor reverts them; if they help, the change persists. Over time this creates a self-improving agent team that escalates to human oversight only when the Auditor can’t identify a fix.
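
The escalation and keep-or-revert rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Thicket's implementation: the A/B/C/D grades come from the article, but treating C and D as failing, keeping an edit when the score is unchanged, and every function name and return label here are hypothetical.

```python
from dataclasses import dataclass

# Assumption: grades C and D count as "poor performance". The article does
# not say where the Auditor draws this line.
FAILING_GRADES = {"C", "D"}


@dataclass
class AgentRecord:
    name: str
    consecutive_failures: int = 0


def next_action(record: AgentRecord, grade: str) -> str:
    """Return the Auditor's action for this week's grade (hypothetical labels)."""
    if grade not in FAILING_GRADES:
        record.consecutive_failures = 0
        return "none"
    record.consecutive_failures += 1
    if record.consecutive_failures == 1:
        return "suggest_instruction_change"  # first failure: suggestion only
    return "edit_claude_md"  # repeated failure: direct edit to CLAUDE.md


def keep_or_revert(score_before: float, score_after: float) -> str:
    """Keep an instruction edit unless the score dropped (ties keep: an assumption)."""
    return "revert" if score_after < score_before else "keep"
```

Because every edit is a Git commit, the "revert" branch would map onto an ordinary Git revert of the commit that changed the instruction file.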

This ratchet reframes “autonomy” as bounded, accountable iteration rather than hands-off roulette.

The agent team and their roles

Thicket configures each role as a Claude Code instance with its own instruction file; the CEO agent runs the sequence in a single session so each agent has current context. Key players include:

Analytics: runs first each cycle, scanning all sites for health, computing the canonical scores, and diagnosing immediate failures so downstream agents act on fresh data.
Research: hunts for high-volume, low-competition niches. It tracks its own prediction accuracy and adjusts how it prioritizes opportunities, supplying candidate ideas that the Builder and Content agents turn into live products.
Designer: creates templates, wireframes, and accessibility-conscious layouts; hands off component definitions to the Builder.
Builder: scaffolds sites with a Next.js template, integrates analytics and GEO endpoints, and deploys to Netlify. It validates builds with curl checks before marking work as complete.
Content and Editor agents: generate and refine content, optimize for readability and topical coverage, and ensure markdown endpoints are available for LLM discovery.
SEO/GEO: implements the discovery-specific endpoints — /llms.txt, /llms-full.txt, /api/llm, markdown routes, and schema.org JSON‑LD — and runs a checklist to verify the endpoints exist and return correct responses.
Auditor: gradekeeper and remediation engine, applying the ratchet rules and pausing new builds if the portfolio KPI slips.

This sequence mirrors a small product team but replaces human labor with specialized LLM roles and explicit, auditable handoffs.

Registry: the single source of truth

Thicket uses a repository-level registry (registry/registry.json) as the canonical state store. The registry records site status, health_score, weekly_sessions, build_quality flags, and overall portfolio_score (the sum of health scores). Every agent reads the registry before acting and writes results back so there’s a single, machine-readable source of truth for orchestration, billing, and monitoring.
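
As a rough illustration of the registry described above, the sketch below loads a small JSON document and derives portfolio_score as the sum of health scores. The field names health_score, weekly_sessions, and portfolio_score come from the article; the nesting under "sites", the site names, and the sample values are assumptions made for this example, since the article does not show the actual layout of registry/registry.json.

```python
import json

# Illustrative registry contents. Field names follow the article; the
# nesting, site names, and values are made up for this sketch.
REGISTRY_JSON = """
{
  "sites": {
    "site-a.example": {"status": "live", "health_score": 82, "weekly_sessions": 140},
    "site-b.example": {"status": "live", "health_score": 64, "weekly_sessions": 35}
  }
}
"""


def portfolio_score(registry: dict) -> int:
    """Portfolio score as the article defines it: the sum of site health scores."""
    return sum(site["health_score"] for site in registry["sites"].values())


registry = json.loads(REGISTRY_JSON)
registry["portfolio_score"] = portfolio_score(registry)
print(registry["portfolio_score"])  # 146
```

Deriving the portfolio score from the per-site entries, rather than letting agents write it directly, keeps the KPI consistent with the data it summarizes.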

The portfolio_score is treated as the system’s primary KPI and cannot drop week over week without triggering the Auditor to pause new deployments and investigate. That constraint forces conservative behavior: agents must fix regressions before the system scales further.
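
The week-over-week constraint reduces to a simple comparison. The following sketch assumes the Auditor has a chronological list of weekly portfolio scores available; the function name and the input shape are hypothetical.

```python
def should_pause_deployments(weekly_scores: list[float]) -> bool:
    """True when the latest portfolio_score is lower than the previous week's."""
    if len(weekly_scores) < 2:
        return False  # no earlier week to compare against
    return weekly_scores[-1] < weekly_scores[-2]
```

For example, `should_pause_deployments([1400, 1450, 1430])` returns `True`, because the last week fell below the one before it.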

MCP servers and precise calculations for LLMs

To avoid hallucinations in numeric logic, Thicket exposes a Model Context Protocol (MCP) server bundling 25+ calculators — mortgage payments, TDEE, compound interest, BMI, unit conversion, and more — as structured tools. An MCP package (@thicket-team/mcp-calculators) returns exact numeric outputs so Claude Code agents can rely on deterministic, auditable math without embedding calculation logic in freeform prompts.

Providing a calculator service reduces hallucination risk and improves reliability for utility pages that need correct numbers. It also makes the LLM’s role clearer: reasoning and natural-language glue rather than raw computation. The MCP tool has shown traction — tens of downloads per week within a few weeks of release — indicating third parties see value in machine-accessible calculators for agent-assisted workflows.
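
To make the "tool computes, model reasons" split concrete, here are two deterministic calculators of the kind the article lists. These are plain functions written for illustration, not the API of the @thicket-team/mcp-calculators package, and the function and parameter names are assumptions. The formulas themselves are the standard fixed-rate amortization formula and the standard BMI definition.

```python
def monthly_mortgage_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed-rate monthly payment from the standard amortization formula.

    annual_rate is a fraction (0.06 for 6%).
    """
    n = years * 12          # number of monthly payments
    r = annual_rate / 12    # monthly interest rate
    if r == 0:
        return principal / n  # interest-free loan: spread principal evenly
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2
```

A 300,000 loan at 6% over 30 years yields a payment of about 1,798.65 per month. An agent that calls a function like this gets the same answer every time, which is the property the article attributes to the MCP tools.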

Designing for LLM and search discoverability

Thicket treats LLMs as another discovery channel alongside traditional search engines. To support this, every site serves a set of LLM-friendly endpoints and formats:

/llms.txt: a concise, human-readable summary tailored for LLM ingestion.
/llms-full.txt: a full content dump for agents that need comprehensive context.
/api/llm: structured JSON for programmatic consumption.
Markdown routes and schema.org JSON‑LD on pages to aid parseability.

The SEO/GEO agent runs automated verifications for these endpoints; a 404 on any required route triggers a health failure. By explicitly addressing LLM discovery signals, Thicket positions its utility sites to be suggested by large language models, which increasingly act as gateways to content and tools. This approach complements conventional SEO efforts and aligns with broader trends where structured data and API-first design improve machine consumption.
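
A check of this kind might look like the sketch below, which uses only the Python standard library. The three routes are the ones named in the article; treating any non-200 response (not just a 404) as a failure, along with the function names, is an assumption. The status lookup is passed in as a parameter so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable

# Required routes named in the article.
REQUIRED_ROUTES = ["/llms.txt", "/llms-full.txt", "/api/llm"]


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as exceptions
    except OSError:
        return 0  # DNS failure, refused connection, timeout, etc.


def failing_routes(
    base_url: str, get_status: Callable[[str], int] = http_status
) -> list[str]:
    """List required routes that do not answer with HTTP 200."""
    return [
        route
        for route in REQUIRED_ROUTES
        if get_status(base_url.rstrip("/") + route) != 200
    ]
```

A non-empty result from `failing_routes` would correspond to the health failure described above.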

What’s working and where the system still needs work

Several aspects of the architecture have proven effective in live traffic:

A hands-free deployment pipeline: Builder → Netlify → curl verification → registry update has reliably launched sites with minimal friction.
The MCP calculator package has proven a replicable side product, drawing downloads without paid promotion.
Engagement metrics on the quiz hub outperformed industry norms, showing the content models can deliver sticky experiences.
The ratchet mechanism has caught regressions early, preventing compounding failures across the portfolio.

But limitations remain:

Indexing lag: despite 25 live sites, only a minority have been indexed by search engines so far, which limits organic traffic growth.
Content velocity: the content pipeline underdelivers relative to the cadence the team planned; Editor and Content agents require further tuning in instruction and review heuristics.
Social automation friction: an automated Mastodon posting account was suspended for bot-like behavior, underscoring platform policy risk in unattended posting.

These issues are solvable, but they highlight that operational realities — search crawlers, moderation policies, content quality at scale — still demand careful design and, occasionally, human intervention.

The technical stack and orchestration choices

Thicket’s engineering stack is conventional where it matters for robustness, and experimental where agents interact:

Sites: Next.js 14 with TypeScript and Tailwind CSS, which provides predictable rendering and componentization.
Shared libraries: a packages/base-site monorepo contains analytics wiring, GEO handlers, and shared UI components so agents can reuse proven building blocks.
Agent runtime: Claude Code CLI powers each agent instance, with per-agent CLAUDE.md instruction files that encode role behavior.
Orchestration: Git submodules organize one repo per site plus an orchestration repo; every agent commits status and artifacts back to Git.
Analytics: GA4 is used with a shared measurement ID across all sites to centralize traffic telemetry.
DNS and CDN: Cloudflare manages the thicket.sh zone and edge behavior.

These choices balance developer ergonomics and observability. Using standard web frameworks facilitates later human takeover or integrations with developer tools, CI systems, and monitoring stacks.

Practical reader questions: what it does, how it works, and who can use it

Thicket’s system automates the full lifecycle of simple utility sites: niche research, scaffolding, deployment, content generation, SEO/GEO endpoint provisioning, and weekly performance analysis. It works by sequencing specialized LLM agents with shared context in a Git-backed registry and enforcing success via an immutable evaluation contract.

Why it matters: the model shows that autonomous agents are most useful when embedded in a governance framework that records history, evaluates outcomes with an external yardstick, and continuously corrects agent behavior. For developers and businesses, this reduces human overhead for low-touch, high-leverage web properties and opens possibilities for automated product experimentation at scale.

Who can use it: teams with Claude Code (or another agent-capable runtime), a Git-centric workflow, and a willingness to codify evaluation criteria can emulate the architecture. It’s particularly appropriate for teams creating repeatable, narrowly scoped utilities (calculators, quizzes, converters, small SaaS microsites) where correctness, uptime, and discoverability are straightforward to define.

When it will be available: in Thicket’s setup, the code, evaluation files, and agent instructions are stored in public repositories, and the MCP calculators are published as an npm package. Developers can begin by installing the MCP package and adapting the registry and ratchet rules to their own domain and priorities.

Developer and business implications

For engineers, Thicket previews several likely future work patterns. First, treating Git as the universal interface between humans and agents creates auditable workflows and simplifies rollbacks, which appeals to teams that already rely on GitOps practices. Second, exposing deterministic tools (like MCP calculators) to LLMs reduces hallucination risk and clarifies responsibilities between toolchains and models: the model reasons, the tool computes.

For product and marketing teams, the architecture offers a way to scale experimentation without proportional headcount growth. A marketing team could prototype dozens of campaign microsites or calculators to test demand signals and measure conversion with minimal engineering lift. CRM and automation platforms could integrate with such a system to feed lead capture and nurture flows, while security tooling should be layered to vet deployments and API endpoints to prevent abuse.

However, the model raises operational and ethical questions. Unsupervised content publication at scale increases the risk of misinformation, platform policy violations (as seen with the Mastodon suspension), and duplicate-content penalties from search engines. Businesses will need guardrails for content moderation, rate limits, and responsible automation policies — and legal/compliance teams should be consulted when sites handle PII or financial calculations.

Integration points and ecosystem fit

Thicket’s approach is not isolated; it sits naturally in modern developer ecosystems. The use of Next.js and TypeScript aligns with frontend developer tooling, while a Git-based orchestrator enables CI/CD and GitOps integrations. The MCP pattern is extensible: teams could publish domain-specific toolkits (tax calculators, mortgage modules, health formulas) for LLMs to call. Marketing stacks, CRMs, and analytics platforms can consume the registry and analytics outputs to connect traffic to conversions.

For teams considering similar automation, practical next steps include: creating a non-modifiable evaluation contract that matches your business KPIs; building or adopting an MCP for domain-specific logic; and designing agent instructions with explicit failure-handling and reversion policies. Phrases like “deployment checklist,” “Next.js SEO best practices,” and “MCP integration guide” could serve as internal documentation links when you implement these patterns.

Risk management and operational hygiene

Automated site portfolios require layered protection. Thicket’s engineering choices address several risks: immutable eval rules prevent goal drift, Git-backed memory simplifies audits, and an Auditor agent pauses new builds if portfolio health declines. Still, teams must plan for search indexing delays, platform moderation, and legal compliance. Recommended practices include staged rollouts, crawl budget management, explicit sitemaps for search, rate-limiting outbound bot traffic, and human-in-the-loop review for sensitive content categories.

Security tooling should scan generated code for dependencies with known vulnerabilities and flag outbound connection patterns that could trigger abuse or platform policy enforcement. Observability should include uptime checks, structured logging, and GA4 funnels so the Auditor has actionable telemetry.

Thicket’s live experiment surfaces another practical point: machine-readable endpoints (JSON-LD, /api/llm, markdown) not only aid LLM discovery but also make it easier to integrate with automation platforms and analytics feeds, which can be important for conversion tracking and CRM syncs.

As autonomous agents become more capable, engineers and product teams must treat them as distributed contributors in the delivery pipeline, with the same expectations around traceability, rollbacks, and testing that human teams have long observed.

Thicket’s six-week run shows that with careful governance, a small set of specialized Claude Code agents can create and maintain a portfolio of utility websites while catching regressions and iterating on failures. The architecture’s combination of an immutable evaluation contract, Git-as-memory, MCP-backed deterministic tools, and an Auditor ratchet provides a repeatable blueprint for teams aiming to automate low-friction web products. Future work will likely focus on accelerating search indexing, improving content throughput, smoothing social publishing to comply with platform policies, and expanding the MCP ecosystem to cover more domain-specific calculators and validators. Those advances would make the model more broadly applicable across marketing, developer tools, and automation platforms as organizations weigh the benefits and responsibilities of agent-driven product creation.
