Co-creating Code with LLMs: A Practical Workflow
In the rapidly evolving field of AI engineering, we're constantly seeking ways to move beyond simple code completion and leverage Large Language Models (LLMs) for more complex, project-specific development tasks. The challenge isn't just about generating snippets of code; it's about integrating the LLM into our development lifecycle as a true collaborator.

For a recent project, we were tasked with building a system to help streamline the review of social security benefit applications. This process involves complex business rules, detailed evidence review, and strict data consistency requirements. Manually building the boilerplate for data models, APIs, and tests for each type of benefit and condition would be time-consuming and error-prone. This challenge was a perfect candidate for our LLM co-creation workflow, allowing us to rapidly generate robust and consistent code based on a set of well-defined patterns.
This post introduces a structured, iterative workflow that positions the engineer as an architect and the LLM as a highly skilled pair programmer. The core of this collaboration hinges on a simple but powerful mantra: "Apply Pattern X to our specific Context Y." By defining clear patterns and providing focused context, we can unlock a new level of productivity and shift our focus from writing boilerplate to designing elegant systems.
This workflow is a "happy path" that has worked well for us. We encourage you to experiment with it and adapt it to your own needs.
Our Development Workflow: A 5-Step Cycle
We've distilled our LLM-driven development process into a five-step cycle. This cycle is designed to take an idea from a rough concept to a fully implemented and tested feature, all in close collaboration with an LLM.
- Create a Blueprint: Translate project requirements into a formal, machine-readable schema.
- Define the Data Model: Apply a consistent architectural pattern to generate data models from the schema.
- Implement the Endpoint: Use the data models to create API endpoints.
- Implement the Test: Generate property-based tests to ensure the endpoint is correct and robust.
- Iterate and Expand: Re-apply the established patterns to build out the rest of the application.
Let's walk through each step.
Step 1: Create a Blueprint From The Key Ideas (The "Memory")
Every project starts with ideas, often scattered across meeting notes, design documents, or whiteboard sketches. The first step is to consolidate these unstructured thoughts into a formal database schema. This schema becomes the foundational "memory" document for all subsequent steps.
- Goal: Translate project requirements into a formal database schema.
- Pattern: We use a general LLM capability: generating an entity-relationship diagram (ERD) in Mermaid-markdown from unstructured text.
- Context: Our project-specific notes about the required entities and their relationships.
- Example Prompt: "Given these notes about benefit types, applications, and eligibility criteria, generate a Mermaid ERD for a database schema."
The outcome is a `database_schema.md` file that serves as a single source of truth for our data structures.
Example Blueprint: The Schema
The machine-readable version:
erDiagram
    BENEFIT_TYPES {
        UUID id PK
        STRING name
        TEXT description
    }
    APPLICATIONS {
        UUID id PK
        UUID benefit_type_id FK
        ENUM status
        TIMESTAMP submitted_at
    }
    ELIGIBILITY_CRITERIA {
        UUID id PK
        UUID benefit_type_id FK
        STRING criterion
    }
    APPLICATION_DATA {
        UUID id PK
        UUID application_id FK
        STRING data_type
        STRING value
    }
    BENEFIT_TYPES ||--o{ APPLICATIONS : "are for"
    BENEFIT_TYPES ||--o{ ELIGIBILITY_CRITERIA : "have"
    APPLICATIONS ||--o{ APPLICATION_DATA : "contain"
The human-readable version is this same schema rendered as a Mermaid ERD diagram.
Step 2: Define The Data Model Using The Command-Query Model Pattern
With a blueprint in hand, the next step is to implement the database models. To ensure consistency and quality, we use a predefined architectural pattern for our SQLModel classes.
- Goal: Implement a single, high-quality database model.
- Pattern: The Command-Query Model Pattern (`Base`, `Create`, `Table`, `Update`), an architectural pattern we defined for SQLModel.
- Context: A table definition from our `database_schema.md`.
- Example Prompt: "Using the 'EligibilityCriteria' table from `@database_schema.md` and our documented Command-Query model pattern, generate the corresponding SQLModel classes in `@models.py`."
For this to work, the LLM needs access to both the database schema and a clear explanation of our model pattern. This pattern is heavily inspired by the official SQLModel tutorial on multiple models with FastAPI. This tutorial is a perfect document to provide as context for the LLM.
Example Command-Query Model
The LLM might initially produce code that is correct but verbose. For example, it might place fields like `id`, `created_at`, and `updated_at` directly into each model.
LLM's First Pass
# LLM's first pass is functional, but repetitive.
import uuid
from datetime import UTC, datetime

from sqlmodel import Field, SQLModel


class EligibilityCriterionBase(SQLModel):
    criterion: str
    benefit_type_id: uuid.UUID = Field(foreign_key="benefit_types.id")


class EligibilityCriterionCreate(EligibilityCriterionBase):
    pass


class EligibilityCriterion(EligibilityCriterionCreate, table=True):
    __tablename__: str = "eligibility_criteria"

    id: uuid.UUID = Field(default_factory=uuid.uuid4, primary_key=True, index=True, nullable=False)
    created_at: datetime | None = Field(default_factory=lambda: datetime.now(UTC), nullable=False)
    updated_at: datetime | None = Field(default_factory=lambda: datetime.now(UTC), nullable=False)


class EligibilityCriterionUpdate(SQLModel):
    """For a PATCH request, all fields should be optional."""

    criterion: str | None = None
    benefit_type_id: uuid.UUID | None = None
This works, but it violates the Don't Repeat Yourself (DRY) principle. If we need to change how IDs or timestamps are handled, we'd have to edit every model. A better approach is to extract these common fields into a `BaseRecord` model. We can instruct the LLM to do this, or do it ourselves.
Human-Refined Code
Here's an example of the pattern applied to the `EligibilityCriterion` model. It separates the base fields, the creation schema, the database table model, and the update schema.
class BaseRecord(SQLModel):
    id: uuid.UUID = Field(default_factory=uuid.uuid4, primary_key=True, index=True, nullable=False)
    created_at: datetime | None = Field(default_factory=lambda: datetime.now(UTC), nullable=False)
    updated_at: datetime | None = Field(default_factory=lambda: datetime.now(UTC), nullable=False)


class EligibilityCriterionBase(SQLModel):
    criterion: str
    benefit_type_id: uuid.UUID = Field(foreign_key="benefit_types.id")


class EligibilityCriterionCreate(EligibilityCriterionBase):
    pass


class EligibilityCriterion(EligibilityCriterionCreate, BaseRecord, table=True):
    __tablename__: str = "eligibility_criteria"


class EligibilityCriterionUpdate(SQLModel):
    """For a PATCH request, all fields should be optional."""

    criterion: str | None = None
Step 3: Building the Endpoints
Once we have our data models, we can create the API endpoints to interact with them. This step also follows a standard pattern.
- Goal: Create generic CRUD functions and a specific API endpoint for a model.
- Pattern: A standard FastAPI router with GET, POST, PATCH, and DELETE endpoints.
- Context: Our Command-Query `EligibilityCriterion` models from `models.py`.
- Example Prompt: "Implement standard RESTful CRUD operations in a FastAPI router for the 'EligibilityCriterion' table. Use the appropriate Schemas from `@models.py`."
The SQLModel tutorial on integrating with FastAPI provides a complete example of how to connect these models to CRUD endpoints. Of course, this requires some initial setup, like a database dependency for FastAPI. We recommend using an in-memory SQLite database during development for speed and simplicity.
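To keep the snippets below self-contained, here is a minimal sketch of what that setup could look like. The names `get_db` and `AsyncSessionDep` match the ones used later in this post; the `aiosqlite` driver and the `StaticPool` setting are our assumptions for sharing a single in-memory database, not requirements from the tutorial.

# db.py (illustrative): an async in-memory SQLite engine plus a session dependency.
from collections.abc import AsyncGenerator
from typing import Annotated

from fastapi import Depends
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.pool import StaticPool
from sqlmodel.ext.asyncio.session import AsyncSession

# StaticPool keeps every session on the same connection, so the single
# in-memory database is shared across requests during development.
engine = create_async_engine("sqlite+aiosqlite:///:memory:", poolclass=StaticPool)


async def get_db() -> AsyncGenerator[AsyncSession, None]:
    async with AsyncSession(engine) as session:
        yield session


# Dependency alias used in the endpoint signatures below.
AsyncSessionDep = Annotated[AsyncSession, Depends(get_db)]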
Example CRUD-Router: From Specific to Generic
The LLM's initial output may define the API endpoints but leave their implementations missing. When asked to implement them, it might then generate specific, verbose logic for each function.
LLM's First Pass: Specific CRUD Logic
@criterion_router.post("/")
async def create_criterion(
    session: AsyncSessionDep, criterion: EligibilityCriterionCreate
) -> EligibilityCriterion:
    """Creates a new criterion. This is verbose and will be repeated for every model."""
    db_criterion = EligibilityCriterion.model_validate(criterion)
    session.add(db_criterion)
    await session.commit()
    await session.refresh(db_criterion)
    return db_criterion


# ... (and imagine similar verbose implementations for read, update, and delete)
This is highly repetitive. A much cleaner approach is to define generic functions that can handle CRUD operations for any SQLModel table. The human programmer can create those generics.
Human-Refined Code: Generic CRUD Functions
The programmer can write a set of generic functions to handle the core operations, and then use them in the specific endpoints.
from typing import TypeVar

from fastapi import HTTPException
from pydantic import BaseModel, ValidationError
from sqlmodel import SQLModel
from sqlmodel.ext.asyncio.session import AsyncSession

T = TypeVar("T", bound=SQLModel)


async def generic_create(schema: type[T], data: BaseModel, session: AsyncSession) -> T:
    new_data = data.model_dump(exclude_unset=True)
    try:
        insert = schema.model_validate(new_data)
    except ValidationError as e:
        raise HTTPException(status_code=422, detail=str(e)) from e
    session.add(insert)
    await session.commit()
    await session.refresh(insert)
    return insert


# ... imagine similar generics for Read, Update, Delete ...
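For illustration, here is one possible shape for those remaining generics. It is a sketch under the same assumptions as the snippet above (an async SQLModel session and HTTP-friendly error handling), not the project's exact implementation.

from uuid import UUID

from sqlmodel import select


async def generic_get_all(schema: type[T], session: AsyncSession, skip: int = 0, limit: int = 100) -> list[T]:
    result = await session.exec(select(schema).offset(skip).limit(limit))
    return list(result.all())


async def generic_get(schema: type[T], record_id: UUID, session: AsyncSession) -> T:
    record = await session.get(schema, record_id)
    if record is None:
        raise HTTPException(status_code=404, detail=f"{schema.__name__} not found")
    return record


async def generic_update(schema: type[T], record_id: UUID, data: BaseModel, session: AsyncSession) -> T:
    record = await generic_get(schema, record_id, session)
    # PATCH semantics: only apply the fields the client actually sent.
    for key, value in data.model_dump(exclude_unset=True).items():
        setattr(record, key, value)
    session.add(record)
    await session.commit()
    await session.refresh(record)
    return record


async def generic_delete(schema: type[T], record_id: UUID, session: AsyncSession) -> T:
    record = await generic_get(schema, record_id, session)
    await session.delete(record)
    await session.commit()
    return record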
With these generics in place, the API router becomes much simpler and easier to maintain.
Refactored Router
criterion_router = APIRouter(prefix="/criteria", tags=["criteria"])


@criterion_router.post("/")
async def create_criterion(
    session: AsyncSessionDep, criterion: EligibilityCriterionCreate
) -> EligibilityCriterion:
    return await generic_create(EligibilityCriterion, criterion, session)


@criterion_router.get("/")
async def read_criteria(
    session: AsyncSessionDep, skip: int = 0, limit: int = 100
) -> list[EligibilityCriterion]:
    return await generic_get_all(EligibilityCriterion, session, skip, limit)


@criterion_router.get("/{criterion_id}")
async def read_criterion(
    session: AsyncSessionDep, criterion_id: UUID
) -> EligibilityCriterion:
    return await generic_get(EligibilityCriterion, criterion_id, session)


@criterion_router.patch("/{criterion_id}")
async def update_criterion(
    session: AsyncSessionDep,
    criterion_id: UUID,
    criterion_update: EligibilityCriterionUpdate,
) -> EligibilityCriterion:
    return await generic_update(EligibilityCriterion, criterion_id, criterion_update, session)


@criterion_router.delete("/{criterion_id}")
async def delete_criterion(
    session: AsyncSessionDep, criterion_id: UUID
) -> EligibilityCriterion:
    return await generic_delete(EligibilityCriterion, criterion_id, session)
Step 4: Implement High-Fidelity, Property-Based Tests
A robust API needs robust tests. We use property-based testing to ensure our endpoints are reliable and correct under a wide range of inputs.
- Goal: Ensure the API is robust, reliable, and correct.
- Pattern: Property-based testing using `pytest` and `Hypothesis`.
- Context: The 'create criterion' endpoint, its `EligibilityCriterionCreate` schema, examples of using `httpx.AsyncClient`, and examples of `Hypothesis`.
- Example Prompt: "Write tests for the criterion router as defined in `@crud.py`. Use Hypothesis to generate test data based on the Schemas defined in `@models.py` and use the async `test_client` to call the API."
This step relies on having some testing infrastructure in place, like a `pytest` fixture that provides an `httpx.AsyncClient` as a test client.
Example Property-Based Testing
"""Generate Objects to Match the `Create` Model"""
@st.composite
def criterion_strategy(draw: st.DrawFn) -> dict[str, Any]:
"""Generate valid EligibilityCriterion data."""
return {
"criterion": draw(st.text(min_size=1, max_size=200)),
"benefit_type_id": uuid.uuid4(),
}
"""
In a production test suite, we'd replace uuid.uuid4()
with a pytest fixture that creates a BENEFIT_TYPES record
and provides its ID, ensuring our foreign key constraint
is always satisfied during testing.
"""
"""Supply the Object factory to the test to quickly test all properties"""
@given(new_criterion=criterion_strategy())
@pytest.mark.asyncio
async def test_create_criterion(
self, new_criterion: dict[str, Any], test_client: AsyncClient
) -> None:
"""Test creating an EligibilityCriterion."""
# In a real test, you'd ensure the benefit_type_id exists.
validated_input = models.EligibilityCriterionCreate.model_validate(new_criterion)
response = await test_client.post(
"/criteria/", content=validated_input.model_dump_json()
)
assert response.status_code == HTTPStatus.OK
created = response.json()
assert "id" in created
assert "created_at" in created
assert "updated_at" in created
assert created["criterion"] == new_criterion["criterion"]
assert created["benefit_type_id"] == str(new_criterion["benefit_type_id"])
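As flagged in the comment above, the clean way to satisfy the foreign-key constraint is a fixture that creates the parent record first. Here is a hypothetical sketch, using the `test_client` fixture from the next section and assuming a `/benefit-types/` router built with the same CRUD pattern (neither the route nor the payload appears earlier in this post):

import uuid

import pytest_asyncio
from httpx import AsyncClient


@pytest_asyncio.fixture
async def benefit_type_id(test_client: AsyncClient) -> uuid.UUID:
    """Create a BenefitType through the (assumed) API and return its id."""
    response = await test_client.post(
        "/benefit-types/",
        json={"name": "Disability", "description": "Example benefit type"},
    )
    return uuid.UUID(response.json()["id"])

The strategy would then draw this fixture's value instead of a random `uuid.uuid4()`, so every example Hypothesis generates points at a real row.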
Example Test Client Fixture
from collections.abc import AsyncGenerator

import pytest_asyncio
from httpx import ASGITransport, AsyncClient

from main import app, get_db

# ... other fixtures or test DB setup etc. ...


@pytest_asyncio.fixture
async def test_client() -> AsyncGenerator[AsyncClient, None]:
    # Swap the real database dependency for the test one (sketched below).
    app.dependency_overrides[get_db] = get_test_db
    transport = ASGITransport(app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        yield client
    app.dependency_overrides.clear()
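The `get_test_db` override itself isn't part of the snippet above. Reusing the in-memory SQLite idea from Step 3, a minimal sketch might look like this; the `test_engine` name and the on-demand `create_all` call are assumptions about setup that would normally live in `conftest.py`.

from collections.abc import AsyncGenerator

from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.pool import StaticPool
from sqlmodel import SQLModel
from sqlmodel.ext.asyncio.session import AsyncSession

# A dedicated engine so tests never touch the development database.
test_engine = create_async_engine("sqlite+aiosqlite:///:memory:", poolclass=StaticPool)


async def get_test_db() -> AsyncGenerator[AsyncSession, None]:
    # Create the tables on demand; cheap for an in-memory database.
    async with test_engine.begin() as conn:
        await conn.run_sync(SQLModel.metadata.create_all)
    async with AsyncSession(test_engine) as session:
        yield session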
Step 5: LLM Goes Brrrr (Iterate and Expand)
Here's where the magic happens. Having established our patterns by building one high-quality implementation, we can scale incredibly efficiently: we now ask the LLM to apply those same patterns to the other tables in our schema.
Prompt 1: Create Models
- Pattern: Our Command-Query Model Pattern.
- Context: All other models in `@database_schema.md`, using `@models.py` as an example.
- Prompt: "Based on the examples in `@models.py`, implement all other models as defined in `@database_schema.md`."
Prompt 2: Create CRUD Logic
- Pattern: Our FastAPI router pattern.
- Context: All other models in `@models.py`, using `@crud.py` as an example.
- Prompt: "Based on the examples in `@crud.py`, implement CRUD logic for all models defined in `@models.py`."
Prompt 3: Create Tests
- Pattern: Our property-based testing strategy.
- Context: All new endpoints in `@crud.py`, using `@test_crud.py` as an example.
- Prompt: "Based on the examples in `@test_crud.py`, implement tests for all endpoints defined in `@crud.py`."
A few well-crafted prompts can generate a significant portion of our application's boilerplate, giving us more time to focus on more complex logic.
Guiding Principles for LLM Co-Creation
This workflow is supported by a few key principles that we've found essential for success.
Design Code, Don't Write It
Your role shifts from writing code to designing it. By establishing a high-quality example of a pattern ("First One, Then Many"), you provide the LLM with a template to follow. Your job becomes reviewing, refining, and guiding the LLM, rather than typing out boilerplate. You spend more time thinking about architecture and robust patterns.
Control Context and Use Sources
LLMs perform best with small, focused contexts. Start each new step with a "clean room" context window. Use external documents (like our `database_schema.md`) as a persistent "memory" that you can feed to the LLM. When implementing a library, feed the LLM its official documentation and tutorials. For instance, when asking it to generate SQLModel classes and FastAPI endpoints, we provide it with the official tutorial on multiple models. And don't be afraid to restart a chat if the LLM gets sidetracked.
Let the AI Clean Its Own Mess
Use static analysis and automated testing to your advantage. An LLM can generate code that looks right but fails under scrutiny.
- DLTLLMRI: Don't Let the LLM Repeat Itself. If you see the LLM writing repetitive code, stop and work with it to create a generic function or a factory instead.
- Instruct the LLM to verify its own work. A simple prompt addition like "Verify your work by running `make ci`" can work wonders.

Here's the `Makefile` we use to run our static analysis and test suite:
.PHONY: static_analysis
static_analysis:
	@uvx ruff format .
	@uvx ruff check . --fix
	@uvx complexipy --details low src/app tests/
	@uv run --all-groups --with pip-audit pip-audit -l
	@uv run --all-groups --with pyright pyright src/app tests/

.PHONY: test
test:
	@uv run --group testing pytest -n auto -m "not slow" --cov=src/app --cov-report=xml
	@uvx diff-cover coverage.xml --fail-under=80 --compare-branch=main

.PHONY: ci
ci: | static_analysis test
This Makefile lets you run all checks at once with `make ci`, but it also lets you run only the static analysis with `make static_analysis`, which is useful when the tests are not written yet. Note that we make use of `uv` and its tool runner `uvx`, but this can just as easily be done with equivalent tools.
Conclusion
This 5-step workflow for co-creating code with LLMs represents a significant shift in the development process. By acting as architects who define patterns and guide the LLM, we can automate the generation of high-quality boilerplate code. This allows us to dedicate our expertise to the most complex and unique aspects of our applications.
Adopting a structured approach like this one transforms the LLM from a simple code completion tool into a powerful and productive development partner, bridging the gap between AI capabilities and the demands of real-world software engineering.
“This Vincent guy really, really knows his shit!”
As stated by one happy customer