The Real Framework Is the One You Build on Top
On the restart as production technique, and what the CCGS community built that no repository could ship.
elysi trimmed Claude Code Game Studios to 49% of its original size, wrote 15 bash helpers, capped tech-debt at 50 items, and built a 1,081-test suite, none of which shipped with the framework.
The LinearB 2026 Software Engineering Benchmarks Report finds AI-assisted teams see technical debt increase 30-41% after adoption; the top 20% impose quality gates and governance constraints before the debt compounds.
The Toyota Production System calls it jidoka: detect the problem, stop the process, fix the immediate condition, prevent recurrence. elysi did all four, on day 19, with a migration script and a new constraint set.
The framework is the substrate. The harness is the production system. One can be downloaded. The other can’t.
Nineteen days into a run of Claude Code Game Studios, a community user named elysi stopped.
The agents were arguing. The tech-debt register was bloating. The studio was generating output, but the quality of that output was degrading with each sprint. At this point, most users do one of two things: they prompt harder, or they blame the framework. elysi did something else. They treated the degradation as diagnostic data, and they stopped the line.
Their account of the decision, posted in GitHub Issue #46, is the most instructive production sequence in the CCGS community data, and it has very little to do with the framework itself.
What the data says about this moment
The degradation elysi experienced is the predictable second act of AI-assisted development, and the research describes it consistently.
The LinearB 2026 Software Engineering Benchmarks Report, drawing on analysis of 8.1 million pull requests from 4,800 teams, found that technical debt increases 30-41% after AI coding tool adoption. AI-generated code contains 1.7 times more issues than human-written code. Developers working with AI tools report feeling 25% more productive while being 19% slower on end-to-end tasks. The Stack Overflow Developer Survey 2025 (n=49,000+) found that 66% of developers are spending more time fixing “almost-right” AI code. Forrester projects that 75% of technology decision-makers will face technical debt at a severe level by 2026.
The pattern is consistent: AI generates volume. Debt accumulates invisibly. By the time it is visible, it has compounded and become structural.
The CCGS version of this is the deferred items register. CCGS ships with a tech-debt tracking mechanism, but the register has no cap by default. As a 49-agent studio produces output across multiple sprints, the register grows. Agents read it to understand project context. The larger it grows, the more tokens every read operation consumes, and the more likely agents are to work from an incomplete picture of the project’s actual state. Issue #64 documents what happens when architecture decision records exceed 25k tokens: the Read tool triggers offset-and-limit retries, doubling the token cost of a single read operation. The defect propagates downstream before anyone names it.
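The arithmetic behind that doubling can be sketched in a few lines. This is a rough cost model, not CCGS code: the 25k threshold and the doubling come from the description of Issue #64 above, while the function names and the assumption that a read under the threshold costs exactly the document's size are illustrative.

```python
# Rough cost model for reading a growing document (hypothetical names).
READ_LIMIT_TOKENS = 25_000  # threshold described for Issue #64


def read_cost(doc_tokens: int, limit: int = READ_LIMIT_TOKENS) -> int:
    """Estimate the tokens spent on a single read of a document.

    Assumption: at or under the limit, the read costs the document's size.
    Over the limit, the retry doubles the cost.
    """
    if doc_tokens < 0:
        raise ValueError("doc_tokens must be non-negative")
    return doc_tokens if doc_tokens <= limit else 2 * doc_tokens


def sprint_read_cost(doc_tokens: int, reads_per_sprint: int) -> int:
    """Estimate the total read cost when agents re-read the document each task."""
    return read_cost(doc_tokens) * reads_per_sprint


if __name__ == "__main__":
    for size in (20_000, 25_000, 30_000):
        print(size, read_cost(size), sprint_read_cost(size, 10))
```

Under these assumptions, a 30k-token record read ten times in a sprint costs 600,000 tokens, three times what a 20k-token record costs for the same ten reads.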
The IBM Institute for Business Value 2025 study put numbers on the consequence: teams that ignored technical debt saw project returns drop 18-29% and timelines expand by as much as 22%. That is the cost of not stopping.
The four steps
The Toyota Production System has a principle for exactly this situation. Jidoka, sometimes translated as “automation with a human touch,” is the right to stop the line. When a worker detects a defect or an abnormality in output quality, they stop the production line immediately. Not at the end of the sprint. Not at the next retrospective. Immediately. Then they fix the immediate condition, and they prevent recurrence by investigating the root cause and installing a countermeasure.
The principle originated in Sakichi Toyoda’s automatic loom, which was designed to stop the moment a thread broke rather than continuing to produce defective cloth. The insight was that defects cost less to fix at the point of production than after they have been woven into the fabric of the finished product. Stopping the line always feels expensive. Continuing with known defects is always more expensive.
elysi followed these four steps without knowing they were following them.
Detect. “Agents were arguing, tech-debt kept creeping, it wasn’t iterating.” That is the signal. Not frustration, not a hunch. A specific observation about system behaviour. Output volume was maintained while output quality declined. That gap is the defect.
Stop. elysi told Claude the run was not working and decided to restart. This is the move most users don’t make. The sunk cost of 19 days, the token spend, the design documents and sprint artefacts: all of it creates pressure to keep prompting. elysi resisted that pressure and pulled the cord.
Fix. Rather than scrapping the entire run, elysi had Opus write a migration script. Not to preserve everything, but to extract what had worked and move it to a clean folder. A migration is not a copy. It is a curation decision: what from this system is worth carrying forward, and what should stay behind?
Prevent recurrence. This is the part that separates a restart from a reset. elysi trimmed CCGS to 49% of its original size. They imposed a 50-item cap on the deferred items register, with a standing rule to run a mid-sprint polish pass when the count reached 40. They wrote 15 bash helper scripts to manage context consumption, including partial-read-helper.sh, which prevented the framework from loading 40k tokens of context before any actual work began. They built a 1,081-test suite running via Godot Unit Testing at every subsequent code iteration.
“I got Opus to write a migration script to a new folder, copy what worked, and trimmed CCGS down to 49% of the repository. We have a written memory, no more than 50 open deferred items, and when we get to 40 we do a mid-sprint story to polish and fix.” — elysi, GitHub Issue #46
None of that came from the repository.
The constraint is the product
I’ve run into the same pattern from a different direction. Early in a complex UI project, the instinct is to solve the whole thing at once: one Miro board, one comprehensive design, everything visible and connected. It feels like progress. What it actually produces is a tangle of assumptions, interdependencies, and hidden constraints that only become visible when you try to build from the design.
Breaking it down changes the result. Personas first. Low-fidelity wireframes. Black and white mockups before any colour decisions. UX flow charts before any visual treatment. Each stage is a constraint: a deliberate reduction of scope that forces the work to surface what it actually requires. The hidden constraints don’t disappear. They appear early, where they are cheap to address, rather than late, where they are expensive.
The constraint is the instrument that makes the actual shape of the work visible.
elysi’s 50-item tech-debt cap works the same way. The number is somewhat arbitrary; 49 or 51 would also function. But the cap forces a reckoning. When the register hits 40, the team stops and resolves debt before adding more. The constraint makes the accumulation visible at a point where it is still manageable. Without the cap, the register grows until it is structural: until the defects are baked into the architecture and addressing them costs a restart.
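The rule is simple enough to express as a gate. What follows is a minimal sketch, assuming the register can be reduced to a count of open items; the thresholds of 40 and 50 come from elysi's account, and every name in the code is hypothetical rather than taken from their tooling.

```python
from enum import Enum

POLISH_THRESHOLD = 40  # at this count, schedule a mid-sprint polish story
HARD_CAP = 50          # no more than this many open deferred items


class RegisterState(Enum):
    OPEN = "open"      # room to defer new items
    POLISH = "polish"  # still under the cap, but a polish pass is due
    FULL = "full"      # cap reached: resolve debt before deferring more


def register_state(open_items: int) -> RegisterState:
    """Classify the deferred items register by its open-item count."""
    if open_items < 0:
        raise ValueError("open_items must be non-negative")
    if open_items >= HARD_CAP:
        return RegisterState.FULL
    if open_items >= POLISH_THRESHOLD:
        return RegisterState.POLISH
    return RegisterState.OPEN


def can_defer(open_items: int) -> bool:
    """Return True if one more item can be deferred without exceeding the cap."""
    return open_items < HARD_CAP


if __name__ == "__main__":
    for count in (12, 40, 50):
        print(count, register_state(count).value, can_defer(count))
```

The gap between the two thresholds is the design choice worth noticing: the polish pass is triggered ten items before the cap, so the team is prompted to pay down debt while new work can still be deferred.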
The partial-read-helper.sh script is a constraint of a different kind. It prevents the framework from performing full-context reads when partial reads would serve the same purpose at a fraction of the token cost. The script does not change what CCGS can do. It changes how much of the budget gets spent on infrastructure overhead before any production work begins. elysi open-sourced it in Issue #46 because other users were burning 30-40% of their weekly token quota on context reads before the first line of game code was produced.
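The idea of a partial read can be illustrated without reproducing elysi's script, which is bash and is not quoted here. The Python sketch below is an assumption about the general technique only: read a bounded window of lines instead of the whole file. The function name, parameters, and default window size are hypothetical.

```python
from itertools import islice
from pathlib import Path


def read_partial(path, offset: int = 0, limit: int = 200) -> list[str]:
    """Return at most `limit` lines, starting at zero-based line `offset`.

    Lines outside the window are skipped rather than returned, so the caller
    only pays for the slice it actually asked for.
    """
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    with Path(path).open(encoding="utf-8") as handle:
        return list(islice(handle, offset, offset + limit))


if __name__ == "__main__":
    sample = Path("sample_register.md")  # hypothetical file for the demo
    sample.write_text("".join(f"item {i}\n" for i in range(1000)), encoding="utf-8")
    window = read_partial(sample, offset=100, limit=5)
    print("".join(window), end="")
    sample.unlink()
```

A caller that needs only the most recent entries of a long register would request a small window rather than the full document, which is the budget saving the paragraph above describes.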
“I trimmed CCGS down to 49% of the repository. No more than 50 open deferred items, and when we get to 40 we do a mid-sprint story to polish and fix.” — elysi, GitHub Issue #46
The LinearB research identifies the top 20% of AI-assisted teams, the ones that avoid the technical debt crisis. They share three practices: tracking AI-generated code separately with specialised quality gates, measuring quality alongside output speed rather than treating them as a trade-off, and enforcing governance standards that catch AI’s predictable failure modes before they merge. These are production decisions, made in response to specific failure modes encountered on a specific project. They cannot be packaged. They have to be earned.
What frameworks can and can’t ship
McKinsey’s 2026 data finds that 62% of organisations experiment with AI agents while fewer than 25% scale to production. Gartner projects that by the end of 2027, more than 40% of agentic AI projects will be put on hold, citing unclear business value and insufficient risk controls. The common thread is that frameworks are used as shipped, without the constraint layer that makes production at scale viable.
CCGS at full capacity (49 agents, 73 skills, 41 templates, 12 hooks) is a starting position. It is well-engineered and thoughtfully structured for a broad range of users across Godot, Unity, and Unreal, across novice and experienced developers, across casual prototypes and serious production runs. A generic framework cannot anticipate the specific failure modes of a specific project. It can only give you the surface to work from.
What elysi built (the trimmed hierarchy, the bash tooling, the test suite, the capped register) is the production system. It is specific to their project, their failure modes, their context budget, their engine, their codebase. It emerged from 19 days of running the official framework until the official framework broke. The restart was the method. The constraint set that came out of the restart is the product.
You can download CCGS in seconds. You can’t download the 19 days that taught elysi what to cut.



