
There is a quiet skill gap emerging in software teams, and most organizations haven’t noticed it yet. It’s not about knowing how to use an AI coding tool. Most developers have figured that out. The gap is in knowing how to prepare your codebase so that those tools produce work that actually fits.
AI coding agents like Claude Code, Cursor, GitHub Copilot, and others are not magic. They are context machines. They look at what surrounds the code they’re being asked to touch, and they make inferences. When that surrounding context is thin, the inferences are generic. When it is rich, the output is often surprisingly precise. Comments are a large part of that context, and writing them well is becoming a genuinely important engineering discipline.
This article is aimed at technology leaders and architects who want to get the most out of AI-assisted development across their teams. We’ll dig into what AI agents actually use when they generate code, where comments fit in, and how to write them in ways that steer the output in a meaningful direction. Python examples are used throughout.
Why AI Agents Depend on Comments More Than Humans Do
A senior developer joining your team can ask questions, read tickets, sit in on standups, and gradually build a mental model of your system. An AI agent has none of that. It sees the file, maybe a few surrounding files, and whatever context window it’s been given. That’s it.
In that environment, comments are doing work that would otherwise require institutional knowledge. A comment explaining why a particular approach was chosen, what assumptions the function makes about its inputs, or what the code must never do is directly actionable context for an agent. Without those comments, the agent falls back on what looks statistically common in its training data – which often means conventional, reasonable-looking code that doesn’t account for the specific constraints of your system.
There’s a subtle shift required here. Historically, comments explained what the code was doing to other humans. For AI agents, the more valuable information is why, what not to do, what the data looks like, and what design decisions have already been made. Comments that answer those questions reduce the chance of the agent doing something technically correct but architecturally wrong.
The Six Comment Patterns That Actually Help AI Agents
1. Intent Comments
The most overlooked comment is a plain statement of purpose at the top of a function or module. Not a restatement of the function signature – a description of why this code exists and what problem it solves.
# BAD: Restates the obvious
# Loops through items and sums the prices
def calculate_order_total(items: list) -> int:
    return sum(item.price for item in items)
# GOOD: States intent and a non-obvious constraint
# Calculate the total for a checkout order.
# Prices are stored and returned as integer cents to avoid floating-point
# rounding errors in downstream billing calculations. Never convert to float
# here -- the payment processor expects integer cents end-to-end.
def calculate_order_total(items: list) -> int:
    return sum(item.price for item in items)
In the second version, an AI agent asked to modify or extend this function now knows the integer-cents constraint. It won’t helpfully “improve” the function by converting to a float for readability. That’s the difference comments make.
2. Data Contract Comments
AI agents frequently mishandle data shapes because the types in the code don’t tell the full story. A dict is just a dict from a type-checking perspective, but your system has very specific expectations about what’s inside. Spell those out.
# Process an inbound webhook from the payment provider.
#
# `payload` is a pre-parsed dict from the provider's SDK -- do not attempt
# to parse or decode it again. Expected structure:
#
#   {
#     "type": "payment_intent.succeeded",
#     "data": {
#       "object": { ...PaymentIntent fields... }
#     }
#   }
#
# Unknown event types should be silently ignored, not raised as errors.
# The provider sends many event types we don't handle, and that is normal.
def process_payment_webhook(payload: dict) -> None:
    event_type = payload.get("type", "")
    if event_type == "payment_intent.succeeded":
        _handle_successful_payment(payload["data"]["object"])
When an AI agent encounters this function and is asked to add support for a new event type, it knows exactly what the payload structure looks like and what to do with unrecognized types. Without the comment, both of those things are guesswork.
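For example, asked to also handle failed payments, an agent guided by that comment would extend the dispatch rather than re-parse the payload or treat unknown events as errors. A minimal sketch of what that extension might look like; the payment_intent.payment_failed event name and the _handle_failed_payment helper are illustrative, not part of the original module.
def process_payment_webhook(payload: dict) -> None:
    event_type = payload.get("type", "")
    if event_type == "payment_intent.succeeded":
        _handle_successful_payment(payload["data"]["object"])
    elif event_type == "payment_intent.payment_failed":
        _handle_failed_payment(payload["data"]["object"])
    # All other event types fall through silently, as the comment requires.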
3. Architectural Boundary Comments
This is perhaps the most valuable comment pattern for large systems, and the most consistently absent. Module-level or class-level comments that describe the role a component plays in the overall architecture save AI agents from casually crossing boundaries that were deliberately drawn.
# Anti-corruption layer: Salesforce CRM integration
#
# This module is the single point of translation between our internal domain
# model and the Salesforce API. Its job is translation only.
#
# Rules:
# - No business logic here. Translate shapes, not decisions.
# - If the Salesforce API changes its response format, fix it in this file.
# Nothing outside this module should need to change.
# - Our domain objects go in, CRM-shaped dicts come out (and vice versa).
#
# See: docs/architecture/crm-integration.md
class SalesforceAdapter:
    def to_crm_contact(self, user: User) -> dict:
        return {
            "FirstName": user.first_name,
            "LastName": user.last_name,
            "Email": user.email,
            "AccountId": user.org.crm_account_id,
        }

    def from_crm_contact(self, crm_data: dict) -> User:
        return User(
            first_name=crm_data["FirstName"],
            last_name=crm_data["LastName"],
            email=crm_data["Email"],
        )
An AI agent working anywhere in the vicinity of this class now understands that it sits on an anti-corruption boundary. It won’t slip business logic in here, because the comment says not to. And it won’t let the domain layer bypass the adapter to skip the translation step, because the purpose is explicit.
4. Decision Log Comments
These are the comments that save future developers (and AI agents) from re-litigating decisions that were made after hard-won experience. They answer the question: “Why isn’t this done the obvious way?”
# NOTE: Raw SQL is intentional here. Do not convert to ORM.
#
# The ORM generates a cartesian join when filtering across this particular
# many-to-many relationship with the current data volume. At scale (>500k users),
# that query caused a 40x slowdown in production (discovered Q3 2024, ticket #4821).
#
# If the data model changes significantly, benchmark before switching back.
def get_users_with_active_subscriptions(db_conn, org_id: int) -> list:
    query = """
        SELECT DISTINCT u.id, u.email, u.created_at
        FROM users u
        JOIN subscription_seats ss ON ss.user_id = u.id
        JOIN subscriptions s ON s.id = ss.subscription_id
        WHERE s.org_id = %s
          AND s.status = 'active'
          AND s.expires_at > NOW()
    """
    cursor = db_conn.cursor()
    cursor.execute(query, (org_id,))
    return cursor.fetchall()
Without this comment, an AI agent asked to “clean up” or “modernize” this module will almost certainly replace the raw SQL with a clean ORM query. It’s reasonable. It looks better. And it will blow up in production under load. The comment prevents that from happening.
5. Execution Context Comments
Some of the most painful AI-generated bugs come from code that looks correct but violates the assumptions of the environment it runs in. Background workers, async contexts, and request-scoped operations all have constraints that aren’t obvious from the code itself.
# IMPORTANT: This function runs in a Celery worker with no HTTP request context.
#
# Do NOT reference:
# - flask.request or flask.g (no active request)
# - current_user (Flask-Login is unavailable outside a request)
# - session (same reason)
#
# All data this function needs must be passed as explicit arguments.
# If you find yourself wanting to use request context here, the caller
# should resolve it before queuing the task.
def send_welcome_email(user_id: int, org_name: str, invite_url: str) -> None:
    user = User.query.get(user_id)
    if not user:
        return
    EmailService.send(
        to=user.email,
        template="welcome",
        context={"org_name": org_name, "invite_url": invite_url},
    )
This comment prevents a whole category of errors. The “do not use X here” pattern is especially powerful because AI agents often learn from patterns that work in one context and apply them everywhere.
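The other half of that contract lives at the call site. Here is a minimal sketch of the enqueueing side, assuming the function is registered as a Celery task; the route and the create_user_from_form and build_invite_url helpers are illustrative, not part of the original code.
@app.route("/signup", methods=["POST"])
def signup():
    # Resolve everything that depends on request context here, inside the
    # request, then hand the worker plain values. The task itself never
    # touches flask.request, current_user, or session.
    user = create_user_from_form(request.form)
    send_welcome_email.delay(
        user_id=user.id,
        org_name=user.org.name,
        invite_url=build_invite_url(user),
    )
    return redirect(url_for("welcome"))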
6. Test Scope and Fixture Comments
AI agents frequently help write tests, and tests are an area where comments dramatically affect output quality. Knowing whether a test is a unit test, an integration test, or an end-to-end test changes what the agent should generate. Knowing what fixtures are required and what shouldn’t be mocked matters enormously.
# Integration test suite for the UserRepository class.
#
# These tests require a live PostgreSQL instance loaded with test fixtures.
# Run with: pytest -m integration
#
# Do NOT mock the database connection here. The entire point of these tests
# is to validate the actual SQL queries and index usage against a real database.
# For unit tests with mocked dependencies, see tests/unit/test_user_service.py.
class TestUserRepository:
    def test_get_active_users_returns_only_confirmed_accounts(self, db_session):
        # Fixture: db_session provides a test transaction that rolls back after each test
        repo = UserRepository(db_session)
        users = repo.get_active_users()
        assert all(u.is_confirmed for u in users)
        assert all(u.status == "active" for u in users)
Without this comment, an AI agent asked to add a new test will quite reasonably mock the database – because that’s standard practice for unit tests. With this comment, it understands that mocking defeats the purpose.
Patterns to Avoid: Comments That Mislead AI Agents
Not all comments help. Some comments actively mislead, because they set expectations that the code doesn’t match, or they suggest that a codebase is simpler than it is.
Stale Comments
Comments that describe what the code used to do are worse than no comments. They cause the AI to make changes that match the comment rather than the actual behavior. Audit and remove them when refactoring.
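A contrived illustration, reusing the order-total function from earlier: the comment describes behavior the code no longer has, and an agent that trusts it is likely to reintroduce floats or “correct” callers in the wrong direction.
# Returns the order total in dollars as a float, rounded to two decimal places.
def calculate_order_total(items: list) -> int:
    # The comment above is stale: the function actually returns integer cents.
    return sum(item.price for item in items)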
What-Comments on Obvious Code
Comments like # increment counter above counter += 1 add noise without signal. They bury the comments that carry real information and make it harder for an agent (or a reviewer) to spot the ones that actually matter.
Aspirational Comments
A comment like # TODO: make this more generic, left in place for months, signals a design direction the codebase has not actually taken. AI agents may try to implement that aspiration in ways that break existing behavior.
Over-Documented Internals
Exhaustively commenting every line of a simple helper function buries the comment that matters -- the one explaining why this helper exists at all. Comment at the right level of abstraction.
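A quick contrast with a hypothetical slug helper: one header comment stating intent and the constraint is worth more than a narration of each line.
import re

# Normalize user-supplied titles into URL slugs. Slugs appear in public
# permalinks, so changing these rules breaks existing links.
def slugify(title: str) -> str:
    cleaned = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return cleaned.strip("-")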
A Practical Before-and-After: Pulling It Together
Here is a realistic module with before-and-after commenting to show the full effect. The module handles user session validation in a web application.
Before: Sparse Comments
import time
import hmac
import hashlib
SECRET_KEY = "changeme"
SESSION_TTL = 3600
def create_session_token(user_id: int) -> str:
    payload = f"{user_id}:{int(time.time())}"
    signature = hmac.new(SECRET_KEY.encode(), payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{signature}"
def validate_session_token(token: str) -> int | None:
    parts = token.split(":")
    if len(parts) != 3:
        return None
    user_id, timestamp, signature = parts
    expected = hmac.new(SECRET_KEY.encode(), f"{user_id}:{timestamp}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return None
    if int(time.time()) - int(timestamp) > SESSION_TTL:
        return None
    return int(user_id)
An AI agent asked to “add refresh token support” or “extend the session on activity” to this module would likely invent a storage mechanism, possibly add a database dependency, and produce something functionally complete but architecturally at odds with how this token is intended to work.
After: Well-Commented Module
# Stateless session token module
#
# Tokens are self-contained and signed -- there is no server-side session store.
# This is intentional. The application runs across multiple stateless containers
# and we deliberately avoid shared session state.
#
# Token format: "{user_id}:{unix_timestamp}:{hmac_sha256_signature}"
#
# Constraints:
# - Do NOT add database calls or cache reads here. Keep this stateless.
# - Do NOT log token strings -- they are equivalent to passwords.
# - SECRET_KEY comes from environment in production (see config.py).
# The hardcoded fallback is for local dev only and will not be accepted
# by the auth middleware in staging or production environments.
#
# To "refresh" a session, simply issue a new token. There is no refresh
# token concept; the caller should re-issue on each authenticated request
# if the token is within the last 15 minutes of its TTL.
import time
import hmac
import hashlib
import os
SECRET_KEY = os.environ.get("SESSION_SECRET", "changeme-local-dev-only")
SESSION_TTL = 3600 # seconds; tokens older than this are rejected regardless of signature
def create_session_token(user_id: int) -> str:
"""Issue a signed session token for the given user ID."""
payload = f"{user_id}:{int(time.time())}"
signature = hmac.new(SECRET_KEY.encode(), payload.encode(), hashlib.sha256).hexdigest()
return f"{payload}:{signature}"
def validate_session_token(token: str) -> int | None:
"""
Validate a session token and return the user ID if valid, or None if not.
Returns None on any validation failure (bad format, bad signature, expired).
Callers should treat None as "unauthenticated" and not attempt to distinguish
the reason -- we intentionally do not expose why a token failed.
"""
parts = token.split(":")
if len(parts) != 3:
return None
user_id, timestamp, signature = parts
# Constant-time comparison prevents timing attacks against the signature
expected = hmac.new(
SECRET_KEY.encode(),
f"{user_id}:{timestamp}".encode(),
hashlib.sha256
).hexdigest()
if not hmac.compare_digest(expected, signature):
return None
if int(time.time()) - int(timestamp) > SESSION_TTL:
return None
return int(user_id)
Now an AI agent has everything it needs. It won’t add a database, because the comment says this is stateless by design. It won’t log token values, because the comment says not to. It understands what “refresh” means in this context and won’t introduce a refresh token flow that requires server-side storage.
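To make the payoff concrete, here is the kind of extension an agent could now produce when asked to extend a session on activity: a stateless re-issue helper rather than a server-side refresh token. This is a minimal sketch, not part of the module above; the helper name and the 15-minute window simply follow the guidance in the module comment.
REFRESH_WINDOW = 900  # seconds; re-issue when under 15 minutes of TTL remain

def maybe_reissue_token(token: str) -> str | None:
    """Return a fresh token if the current one is valid and near expiry, else None."""
    user_id = validate_session_token(token)
    if user_id is None:
        return None
    issued_at = int(token.split(":")[1])
    remaining = SESSION_TTL - (int(time.time()) - issued_at)
    if remaining <= REFRESH_WINDOW:
        # Stateless "refresh": simply issue a new self-contained token.
        return create_session_token(user_id)
    return None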
Making This a Team Practice
Individual developers writing good comments is valuable. A team-wide standard is worth much more, because it means AI agents encounter consistent context signals across the entire codebase rather than only in the files a conscientious developer happened to touch.
Establish a Commenting Standard
Document which comment patterns your team uses and where. Include module-level intent, data contracts, and architectural boundary comments in your definition of "done" for code review.
Use AI Agents to Identify Gaps
Ask your AI agent to review modules and flag functions that lack intent or constraint comments. Agents are good at spotting missing context once they understand what good looks like in your codebase.
Treat Comments as Living Documentation
Include comment accuracy in code reviews the same way you include test coverage. A stale comment is a bug. Stale comments in an AI-assisted codebase produce incorrect code at the speed of AI generation.
Capture Architecture Decisions at the Source
Architecture Decision Records (ADRs) belong in external docs, but a pointer and a one-line summary belong in the code. If a module exists because of a specific architectural decision, say so where the code lives.
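A one-line pointer is usually enough; the ADR number, path, and class below are illustrative.
# Read model for the reporting dashboard. Denormalized on purpose: we accept
# eventual consistency here to keep heavy report queries off the transactional DB.
# Decision and trade-offs: docs/adr/0012-reporting-read-model.md
class ReportingSnapshot:
    ...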
The Bigger Picture: Comments as System Specification
There is a broader shift happening here that is worth naming. As AI agents take on more of the implementation work, comments increasingly function as specifications. They are where the architect’s intent meets the agent’s execution. A codebase with thoughtful, constraint-aware comments is a codebase that an AI agent can extend in a way that respects the design decisions already made.
This is not a new idea. The most experienced developers have always written comments that explained the “why” rather than the “what.” AI assistance has simply raised the stakes. The consequences of thin context are now immediate and compounding: an agent that doesn’t understand your constraints will generate a large volume of code that doesn’t honor them, and that code will need to be reviewed, corrected, and eventually refactored.
Getting ahead of this is a leadership decision as much as a technical one. It means recognizing that comment quality is part of code quality, and that teams working with AI tools need to be as deliberate about the context they create as they are about the code itself.
How VergeOps Can Help
Adopting AI coding agents is the easy part. Preparing your engineering organization to get consistent, high-quality output from those agents is a deeper challenge that touches coding standards, architectural governance, and team habits.
VergeOps works with engineering organizations to assess how AI-ready their codebases and development practices are, then build the standards, training, and tooling to close the gap. Whether that means establishing commenting standards, conducting AI-assisted development workshops, or embedding with your architecture team to document system boundaries and constraints, we help teams get from “we have the tools” to “the tools actually work for us.”
If your teams are using AI coding agents but not seeing the consistency and quality you expected, let’s talk.