
The tools are real, the productivity gains are real, and the risks are real. AI coding agents like Claude Code, Cursor, and Codex can compress hours of work into minutes. They can scaffold entire features, refactor large codebases, write tests, and debug problems with a level of fluency that would have seemed implausible just a few years ago. Even just a few months ago.
But capability is not the same as discipline. The teams getting the most durable value from these tools are not the ones using them most aggressively. They are the ones using them most deliberately. They have developed team-wide habits and norms that keep AI output trustworthy, keep engineers engaged and skilled, and keep codebases from becoming mazes of generated code that nobody fully understands.
What follows are the principles those teams have converged on. They are not rules handed down from a committee. They are observations from the field, distilled into a framework that reflects how productive agentic engineering actually works.
Principle 1: You Own Everything the AI Writes
This principle is #1 because it is the bedrock of all other principles. When an AI agent generates code and you accept it, that code is yours. The bugs are yours. The security vulnerabilities are yours. The architectural decisions embedded in that code are yours. The Git commit shows your name, not "Claude Code".
This ownership is not a burden to resist. It is the correct mental posture. Engineers who approach AI-generated code as something the AI is responsible for make worse decisions about what to accept, what to modify, and what to reject. The framing matters: the AI is a tool. You are the engineer. The output belongs to you.
Practically, this means reviewing AI-generated code with the same rigor you would apply to a pull request from a colleague. In many cases, more rigor is warranted, because a colleague understands your system and your constraints in ways the AI does not. The AI is productive and often correct, but it is not accountable. You are.
Principle 2: Decompose Before You Prompt
The single most common failure mode in agentic engineering is prompting for too much at once. “Implement the entire payment processing module” produces a large block of code that addresses an imagined version of your requirements. “Write a function that validates a credit card number using the Luhn algorithm and returns a structured result” produces something you can evaluate, test, and integrate confidently.
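To make that concrete, here is roughly the shape of output the second prompt invites. This is a minimal sketch; the function name and result type are illustrative, not prescribed:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CardValidationResult:
    valid: bool
    reason: Optional[str] = None  # populated only when validation fails

def validate_card_number(number: str) -> CardValidationResult:
    """Validate a card number using the Luhn checksum."""
    digits = [c for c in number if not c.isspace()]
    if not digits or not all(c.isdigit() for c in digits):
        return CardValidationResult(False, "number must contain only digits and spaces")

    total = 0
    # Walk the digits right to left, doubling every second one.
    for i, char in enumerate(reversed(digits)):
        d = int(char)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d

    if total % 10 != 0:
        return CardValidationResult(False, "failed Luhn checksum")
    return CardValidationResult(True)
```

A unit this size can be read in full, tested against known-valid numbers, and integrated without guesswork, which is the point of decomposing first.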
This is not a limitation of AI capability. It is a reflection of a general principle: any task that is too vague to specify precisely is too vague to evaluate correctly. If you cannot describe what correct looks like before you run the prompt, you will not recognize incorrect when it arrives.
Decomposition also limits the blast radius of mistakes. A 20-line function that is wrong costs minutes to correct. A 500-line module that is architecturally wrong costs days. Agentic tools make large outputs easy to generate. That ease creates the temptation to skip decomposition. Resist it.
Break work down to the point where each unit has clear inputs, clear outputs, and a clear definition of success. Then prompt for that unit. Then verify it. Then move to the next.
This discipline extends to session-level scope as well. Agentic tools make large-scale transformations feel accessible – refactor the entire data access layer, rewrite the authentication module, migrate all error handling in a single pass. The problem is not the ambition; it is the verification. How do you know a 2000-line diff is correct? How do you know the refactoring preserved all the edge-case behavior that was embedded in the original code? With large generated changes, the answer is often that you don’t know, and you will find out in production.
A refactoring that moves in ten steps of 200 lines each, verified after every step, is far more reliable than a single large transformation. This is why the strangler fig pattern works so well with agentic development: incremental replacement of legacy components gives you a real checkpoint at every stage. A full rewrite attempted in a single agentic session does not.
Principle 3: Context Is Your Most Valuable Asset
The quality of an AI agent's output is directly proportional to the quality of the context it receives. A prompt given to an agent with no knowledge of your codebase, your conventions, or your constraints will produce generic, plausible-looking output that may not fit your system at all. The same prompt given to an agent with a well-structured project context file, access to relevant code, and clear instructions about style and patterns will produce something much closer to what you actually need.
In tools like Claude Code and Cursor, this means investing seriously in your CLAUDE.md or equivalent project-level context files. These files should tell the agent:
- What the project is and who it is for
- What technologies and frameworks are in use
- What conventions the codebase follows
- Which patterns to prefer and which to avoid
- What areas of the codebase are particularly sensitive or complex
Think of this file as an onboarding document for a very capable contractor who has no prior context about your project. The better your onboarding documentation, the better the output you get.
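As a rough sketch of what that can look like in practice (the project, stack, and paths below are hypothetical):

```markdown
# Project context for AI agents

## What this is
Internal invoicing service for the billing team. Python 3.12, FastAPI, PostgreSQL via SQLAlchemy.

## Conventions
- All money amounts are integers in cents; never use floats for currency.
- Use the repository classes in `app/repositories/` for data access; do not write raw SQL in handlers.
- Errors are raised as subclasses of `AppError` and mapped to HTTP responses in one place.

## Patterns to avoid
- No new global state or singletons.
- Do not add dependencies without asking.

## Sensitive areas
- `app/auth/` and anything touching `payment_events` requires human-written changes and extra review.
```

Keep it short enough that it stays current; a stale context file misleads the agent as surely as a missing one.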
Context also means being deliberate about what code you include in the conversation. When asking an agent to implement something that interacts with existing code, provide that existing code. When asking for something that must match an existing pattern, point to examples of that pattern. The agent cannot use context it does not have.
Go beyond the prompt with carefully crafted comments throughout your codebase. Comments are one of the best ways to provide specific context to AI tools right where it matters most.
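For example, a comment like the following (the function and policy are hypothetical) gives an agent constraints it could never infer from the code alone:

```python
def apply_discount(total_cents: int, discount_percent: int) -> int:
    # Invariant: all monetary values in this module are integers in cents.
    # The discount amount rounds *down* to the nearest cent; finance reconciles
    # on this behavior, so do not change the rounding without checking with billing.
    return total_cents - (total_cents * discount_percent) // 100
```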
Principle 4: Never Rubber-Stamp AI Output
Accept nothing without reading it. This discipline needs to be explicit because the speed and fluency of AI output create pressure to accept it uncritically. The code looks complete. The tests pass. The diff looks reasonable. Approve and move on.
This is how subtle bugs, security issues, and architectural problems enter codebases. AI models are trained to produce code that looks correct. They are not trained to be correct. The distinction matters enormously.
What to look for when reviewing AI-generated code:
Hidden Assumptions
What is the code assuming about inputs, state, or the environment? Are those assumptions documented? Are they correct?
Error Handling Gaps
AI-generated code often handles the happy path well and treats error paths superficially. Trace through failure scenarios explicitly.
Performance Implications
Generated code is frequently correct but naive about performance. Review database query patterns, loop structures, and memory usage with fresh eyes.
Security Concerns
Input handling, authentication checks, and data exposure are all areas where AI-generated code can be subtly wrong. Do not assume these are handled correctly.
Fit with the Surrounding System
Does this code use the existing abstractions? Does it follow the project conventions? Or has the agent invented new patterns that diverge from the rest of the codebase?
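To illustrate how several of these issues hide in code that looks finished, here is a hypothetical snippet annotated with the review questions it should trigger:

```python
import requests

def fetch_user_profile(user_id: str) -> dict:
    # Hidden assumption: the service is reachable, fast, and always returns JSON.
    # Error handling gap: no timeout, no check of response.status_code, and
    # response.json() raises if the body is not JSON.
    response = requests.get(f"https://api.internal.example/users/{user_id}")
    data = response.json()

    # Fit with the surrounding system: most codebases have a shared HTTP client
    # with retries, auth, and base URLs; calling requests directly may bypass it.
    # Security concern: passing fields through unfiltered can expose more data
    # than the caller should see.
    return {"name": data["name"], "email": data["email"]}
```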
Reading generated code carefully takes time. It is still dramatically faster than writing it from scratch. The discipline pays for itself.
Principle 5: Architecture Belongs to Humans
AI agents are effective implementers. They are poor architects. The distinction is important.
Given a well-defined task in a well-understood context, an agent can produce a correct implementation efficiently. But given broad architectural latitude, an agent will make structural decisions based on patterns from its training data rather than on your specific requirements, your team's expertise, or your long-term maintainability goals. Those decisions can look reasonable in isolation and be wrong for your system.
Architectural decisions include: what services exist, how they communicate, what owns what data, where boundaries should be drawn, what abstractions to build, and how the system should evolve. These decisions belong to human engineers. AI agents should implement the architecture you have chosen, not choose it for you.
This does not mean you cannot use AI to explore architectural options. A conversation with an agent about the tradeoffs between two approaches can be useful. But the decision and the accountability for it need to rest with your team. Once the architecture is decided, prompt the agent to implement it, with that decision clearly stated as context.
Principle 6: When in Doubt, Go Manual
AI is a force multiplier for engineers who already understand what they are building. That framing contains an important implication: the multiplier only helps when the understanding is already there. When it is not, prompting an agent doesn’t clarify the problem – it outsources the thinking, and you get back something that looks like an answer but may not be.
There are categories of work where this distinction matters most, and where the instinct to reach for an AI tool should be examined before acting on it.
Security-critical code is the highest-stakes category. Authentication flows, authorization logic, cryptographic operations, session management, input sanitization – these areas are where subtle implementation errors cause serious harm, and where AI models are most likely to produce code that is functionally correct in the happy path but wrong in the edge cases that matter for security. Timing attacks, injection vectors, privilege escalation paths, and improper validation are all things that well-written AI-generated code can still get wrong in ways that a surface review will not catch. If the code touches auth or handles sensitive data, human-first development is the right posture.
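As one concrete example of the subtlety involved: comparing secrets with == short-circuits on the first mismatch, which leaks timing information. Python's standard library has a constant-time comparison for exactly this case:

```python
import hmac

def token_matches(supplied_token: str, stored_token: str) -> bool:
    # A plain == comparison returns as soon as a character differs, so response
    # timing can reveal how much of the token an attacker has guessed correctly.
    # hmac.compare_digest takes the same time regardless of where the mismatch is.
    return hmac.compare_digest(supplied_token, stored_token)
```

The == version passes every functional test, which is exactly why a surface review will not catch it.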
Novel domain logic is a less obvious case, but equally important. When you are working through domain logic for the first time – figuring out what the rules actually are, not just how to implement rules you already understand – prompting an AI before you have clarity means the agent fills the gap with reasonable-sounding defaults from its training data. You get plausible logic instead of correct logic, and the difference may not surface until the business discovers the code doesn’t reflect how their domain actually works. Work the logic out yourself first. Write it down. Then use an agent to implement what you have already understood.
Performance-sensitive paths are a third category worth flagging. Generated code tends toward clarity and correctness at the expense of efficiency. That tradeoff is acceptable in most of the codebase. In the paths that run under high load – tight loops, frequently called service boundaries, database-heavy operations – naive but functionally correct code causes production incidents. These paths deserve the attention of an engineer who has profiled the system and understands the performance constraints, not code accepted because it looked reasonable at a glance.
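A small illustration of naive-but-correct, with hypothetical data shapes: both functions below return the same result, but the first degrades badly as the inputs grow.

```python
def active_orders_naive(orders: list[dict], active_ids: list[int]) -> list[dict]:
    # Correct, but each membership test scans the whole list:
    # O(len(orders) * len(active_ids)) overall.
    return [o for o in orders if o["id"] in active_ids]

def active_orders(orders: list[dict], active_ids: list[int]) -> list[dict]:
    # Same behavior; a set makes each membership test O(1) on average.
    active = set(active_ids)
    return [o for o in orders if o["id"] in active]
```

Neither is wrong in isolation; knowing which paths justify the second version is the judgment that comes from profiling, not from the agent.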
None of this means refusing to use AI in these areas. It means reversing the default. In most of the codebase, the default is to prompt the agent and verify what it produces. In these categories, the default is to write it yourself and consider using the agent for specific pieces once you are confident in the overall design.
The signal to watch for is uncertainty that persists after review. If you read the generated code, cannot find a specific problem, but still feel uneasy about whether it is right – trust that feeling. It usually means your mental model of the problem is not solid enough to evaluate the output. That is not a reason to accept anyway. It is a reason to step back, build understanding, and return to the agent when you know what correct looks like.
Principle 7: Tests Are Not Optional, and AI-Generated Tests Need Scrutiny
Automated tests are essential in agentic development, not as a nice-to-have but as the primary mechanism for verifying that generated code is correct. Without tests, you have no systematic way to know whether the code the agent wrote does what you intended.
But AI-generated tests require scrutiny just as AI-generated production code does. Common problems include:
Tests That Verify What the Code Does, Not What It Should Do
An agent asked to write tests for code it also wrote may produce tests that pass because they match the implementation, not because the implementation is correct. The test becomes a specification of observed behavior, not a validation of intended behavior.
Incomplete Coverage
Generated tests often cover the happy path and obvious edge cases while missing subtle failure modes. Review test coverage with the same rigor you apply to production code.
Circular Verification
If you use an agent to write code, then use the same agent to write tests, and then use the test results to decide whether the code is correct, you have created a loop that provides false confidence. Include human-authored tests for critical behavior.
Write your tests before you prompt for the implementation whenever possible. Tests written first describe what you actually want. Tests written after describe what the agent produced.
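A minimal sketch of what that looks like, reusing the hypothetical card validation function from Principle 2; the module name and cases are illustrative:

```python
# Written before prompting: these tests encode what "correct" means,
# independent of whatever the agent goes on to produce.
from card_validation import validate_card_number  # hypothetical module

def test_known_valid_number_passes():
    assert validate_card_number("4539 1488 0343 6467").valid

def test_single_digit_change_fails_checksum():
    assert not validate_card_number("4539 1488 0343 6468").valid

def test_non_digit_input_is_rejected_with_a_reason():
    result = validate_card_number("not a card number")
    assert not result.valid
    assert result.reason is not None
```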
Principle 8: Track What the Agent Touches
Know which parts of your codebase have been significantly AI-generated. This is not about stigma. It is about risk management and maintenance awareness.
AI-generated code may have embedded assumptions or subtle patterns that are not immediately obvious. When bugs appear in that code, knowing its provenance is useful diagnostic information. When you need to extend or modify that code, knowing that it was generated means approaching it with appropriate skepticism about whether it fully reflects intentional design decisions.
Practically, this can be as simple as a comment or commit message convention that flags AI-generated files or significant AI-assisted changes. Some teams use git attributes or tags. The mechanism matters less than the discipline of tracking.
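One lightweight option, shown here only as an illustration of the idea, is a commit message trailer that reviewers and scripts can search for later:

```text
Add Luhn-based card number validation

Initial implementation generated with Claude Code; input handling and
error paths tightened by hand during review.

AI-assisted: Claude Code (implementation), human-reviewed
```

Something like `git log --grep="AI-assisted:"` then surfaces those commits when you are debugging or auditing how the tools are being used.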
This information also helps with calibrating AI tool usage over time. If you notice that AI-generated code in certain areas produces a disproportionate share of bugs, that is a signal about either the quality of prompting in those areas or the suitability of agentic approaches for that class of problem.
Principle 9: Establish Team Norms Before You Need Them
Individual engineers using AI tools ad hoc produce incoherent results. One engineer uses agents for everything and accepts output uncritically. Another refuses to use agents at all. A third uses them heavily but with good discipline. The result is a codebase with wildly varying quality and consistency.
Agentic engineering practices need to be team-level decisions, not individual choices. This means agreeing as a team on:
Which Tools Are Sanctioned
Not for vendor lock-in reasons, but so that context files, conventions, and practices can be standardized across the team.
The Review Process for AI-Generated Code
Is there a higher bar? A different checklist? These questions should be answered before they arise in a code review, not during one.
How to Handle Context Files
Who owns the CLAUDE.md or equivalent? How is it updated? Who reviews changes to it? These files are load-bearing artifacts, not afterthoughts.
What Work Is Well-Suited to Agentic Approaches
Boilerplate generation, test scaffolding, and repetitive transformations are good candidates. Core business logic, security-critical code, and performance-sensitive paths warrant more caution.
How Skills Will Be Maintained
If the team is shifting toward agentic development, what is the plan for ensuring engineers remain capable of working without AI assistance when needed?
These conversations are uncomfortable because they require engineering teams to confront tradeoffs they would rather avoid. But teams that have them upfront build better norms than teams that patch them together after problems emerge.
Principle 10: The Goal Is Sustainable Velocity, Not Maximum Output
The organizations misusing agentic tools are optimizing for output volume. Lines of code generated. Features shipped per sprint. Tickets closed. These metrics look good in the short term. They look very different after twelve months, when the codebase is full of generated code that nobody fully understands, when bugs are appearing in unexpected places, and when engineers feel less capable and less engaged than they did before.
Sustainable agentic engineering optimizes for something different: code your team can understand, extend, and maintain; engineers who grow rather than atrophy; velocity that holds up over time rather than accelerating briefly and then collapsing under accumulated complexity.
This is the harder goal. It requires discipline in the face of tools that make undisciplined approaches feel productive. It requires investment in practices that slow you down slightly in the short term and pay back substantially over time.
The teams that get this right treat agentic tools as a significant force multiplier for capable engineers, not as a replacement for engineering capability. They understand that the value of these tools compounds when engineers are skilled, and erodes when engineers are not.
That is the through-line of this manifesto. Use the tools. Use them aggressively. But use them in ways that leave your team stronger, your codebase cleaner, and your organization more capable than before.
Principle 11: Maintain Your Craft
The productivity gains from AI-assisted development are real, but they come with a cost that most organizations are not measuring: the gradual erosion of engineering capability. When engineers stop writing code from scratch, stop debugging without assistance, and stop designing systems without an agent to scaffold the thinking, the skills that make them effective as engineers – and effective as users of AI tools – quietly weaken.
This is not a hypothetical risk. It follows from how expertise works. Engineering proficiency is built through deliberate practice: encountering hard problems and working through them, debugging something obscure until you understand it, designing something from constraints you cannot hand off. When AI tools absorb that practice, they absorb the vehicle for skill development along with the work. Engineers who rely heavily on agents without maintaining independent practice become increasingly dependent over time, and increasingly unable to catch the problems in AI-generated output that require deep technical judgment to spot.
The practical response is to be deliberate about what you practice without AI assistance. Read code regularly -- including AI-generated code -- without asking the agent to explain it. Debug problems without requesting a fix first. Design systems on a whiteboard before prompting for scaffolding. The goal is not to use AI less; it is to keep the part of your engineering mind that reasons independently sharp. That independence is what makes AI tools valuable in the first place. The best agentic engineers are engineers who would be excellent without the tools. The tools amplify capability that is already there.
Principle 12: Garbage In, Garbage Out
The quality of AI output is a direct function of the quality of input. This is true of every system that processes information, but it matters more with agentic tools because the consequences are so immediate and the output volume is so high. A bad prompt in traditional development produces one developer’s mediocre implementation. A bad prompt with an agentic tool produces 500 lines of plausible-looking code that now needs to be reviewed, questioned, corrected, or thrown away. The investment you make in inputs – prompts, context files, comments, codebase hygiene – pays back in proportion to how much you use these tools.
Prompt quality is the most visible input, but it is not the only one. The context file you maintain for your project is foundational. If it is sparse, out of date, or internally contradictory, the agent is operating from a distorted map of your system. The code surrounding whatever you’re asking the agent to touch is itself a form of input: a codebase full of inconsistent patterns, poor naming, or no explanatory comments at all gives the agent a weak signal to work from. An agent working in a well-structured, well-documented codebase naturally produces output that fits. An agent working in a codebase that nobody has taken care of produces output that fits that codebase – which is not a compliment.
The practical implication is that improving your AI output often does not mean prompting differently -- it means improving the environment the agent is working in. Investing in CLAUDE.md or equivalent context files, maintaining clean and well-commented code, and writing precise prompts that include constraints not just goals are all upstream inputs that determine downstream output quality. The teams consistently getting the best results from agentic tools are not prompting more cleverly in the moment; they have built an environment where the agent has what it needs to succeed before the session starts.
How VergeOps Can Help
VergeOps works with engineering organizations to build agentic engineering practices that capture real productivity gains without creating long-term capability or quality debt.
Agentic Engineering Readiness Assessment. We evaluate your team’s current AI tool usage, identify gaps in practices and norms, and provide a concrete roadmap for improvement.
Team Norms and Playbooks. We facilitate the conversations teams need to have and help establish the guidelines, context files, and review practices that make agentic development reliable.
Training Programs. Our hands-on workshops build the discipline and judgment engineers need to use AI tools effectively. We cover everything from prompt engineering to code review practices for AI-generated output.
Architecture and Code Review. We provide experienced oversight on AI-assisted development efforts, catching problems early and building team capability through direct collaboration.
Contact us to discuss how your organization can use agentic tools at their full potential, without the hidden costs.
The Three Questions Before Committing
Before accepting any significant block of AI-generated code, run through these three checks. If any answer is no, keep working.
Can I explain this code to a junior developer?
If you cannot walk someone through the logic and the decisions embedded in it, the code is not ready to commit. Understanding -- not just approval -- is the standard.
Have I tested the edge cases and error paths?
Generated code handles the happy path reliably. Edge cases and failure modes require deliberate attention. If you have not traced through them, you have not finished the review.
Does this follow our team's standards and patterns?
Agents work from training data, not your team's conventions. Verify that naming, structure, error handling, and patterns are consistent with what the rest of the codebase expects.
Red Flags: Stop and Reconsider
Any of these conditions is a signal to pause before moving forward.
You accepted code you do not understand
If you cannot explain why the code is correct, you cannot know that it is. This is the most common way subtle bugs reach production in agentic development.
You are accepting suggestions without reading them
Speed is the enemy of rigor here. If the output looks complete and the tests pass, the temptation is to move on. That is when things go quietly wrong.
The code "just works" but you cannot explain why
Correct-looking behavior in development is not the same as correct behavior under production conditions. If you cannot explain how it works, you cannot predict how it will fail.
You are using AI for auth, authorization, or encryption
Security-critical paths are where AI-generated code is most dangerous. Generated code here looks correct and fails in ways that surface checks will not reveal. Go manual.
The generated code has no error handling
Agents optimize for the happy path. If a function makes no attempt to handle failures, assume failure modes were not considered -- not that they do not exist.
You have been stuck at "90% done" for hours
When iterating with an agent stops producing progress, the prompt or the approach is wrong. Stop, reassess from first principles, and reframe the problem before continuing.
The AI is generating deprecated or non-existent APIs
Hallucinated library methods and outdated API calls are a common failure mode. Verify any external API usage against current documentation before accepting it.
Remember
AI coding tools are like autocomplete on steroids: useful, fast, and occasionally brilliant — but also capable of confidently suggesting terrible ideas. Your judgment is the irreplaceable part.