orapha.dev

Awesome Design: A catalog of design systems ready to use with AI agents

Sat, 09 May 2026 10:00:00 GMT

There’s something that’s always bothered me when working with code agents: visual inconsistency. You ask the agent to “create a dashboard screen,” it delivers something functional, but the visual… well, the visual looks like it was made by someone who’s never heard of a design system. Each screen becomes an island, each component has its own personality, and in the end you have a visual Frankenstein that works but doesn’t give that professional product feeling.

I’ve tried various approaches. I’ve shared Figma links, I’ve described colors and typography in detail, I’ve even sent screenshots with annotations. It works, but it’s tedious. And the worst part: with each new feature, the conversation starts from scratch.

That’s when I discovered Awesome Design.

What is DESIGN.md?

Before talking about the repository itself, I need to explain the concept. DESIGN.md is an idea that came from Google Stitch. It’s simply a markdown file that describes your project’s design system: colors, typography, components, spacing, responsive behaviors.

The trick is that, since it’s markdown, AI agents read it very well. No complex JSON needed, no need to export from Figma, no plugin required. It’s a text file, at the root of the project, that any agent can consult.

The analogy I like to use:

| File | Who reads it | What it defines |
| --- | --- | --- |
| `AGENTS.md` | Code agents | How to build the project |
| `DESIGN.md` | Design agents | How the project should look |

What does Awesome Design offer?

The VoltAgent repository is basically a catalog of DESIGN.md files inspired by real sites and products. And when I say real, I’m talking about brands like:

AI Platforms:

Claude — that warm terracotta tone, clean editorial layout
ElevenLabs — dark cinematic UI, with that sound wave aesthetic
Ollama — monochromatic simplicity, focus on the terminal
VoltAgent — black-void canvas, emerald accent, terminal vibe

Dev Tools:

Cursor — dark editor with sleek gradients
Vercel — black and white precision, Geist font
Warp — modern terminal, block UI
Supabase — dark emerald theme, code first

Productivity & SaaS:

Linear — ultra minimalist, precise, purple accent
Notion — warm minimalism, serif headings
Figma — vibrant multicolor, professional but fun

And the list continues: Stripe, Spotify, Apple, Tesla, Airbnb, Nike… there are more than 70 documented design systems.

How it works in practice

The structure of each DESIGN.md follows a pattern that makes life much easier:

Visual Theme & Atmosphere — the mood, density, design philosophy
Color Palette — semantic names + hex + function
Typography Rules — font families, complete hierarchy
Component Styles — buttons, cards, inputs, navigation
Layout Principles — spacing scale, grid, whitespace
Depth & Elevation — shadow system, surface hierarchy
Do’s and Don’ts — guardrails and anti-patterns
Responsive Behavior — breakpoints, touch targets
Agent Prompt Guide — quick color references, ready to use
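
Since the structure is this regular, a DESIGN.md is easy to sanity-check before handing it to an agent. The sketch below is only an illustration: it assumes each section appears as a markdown heading (possibly numbered), and `missing_sections` is a hypothetical helper, not something shipped with the catalog.

```python
import re

# Section titles taken from the list above. That they appear as markdown
# headings (for example "## 2. Color Palette") is an assumption.
EXPECTED_SECTIONS = [
    "Visual Theme & Atmosphere",
    "Color Palette",
    "Typography Rules",
    "Component Styles",
    "Layout Principles",
    "Depth & Elevation",
    "Do's and Don'ts",
    "Responsive Behavior",
    "Agent Prompt Guide",
]


def missing_sections(design_md: str) -> list[str]:
    """Return the expected section titles that never show up in a heading."""
    headings = [
        match.group(1).strip().lower().replace("\u2019", "'")
        for match in re.finditer(r"^#{1,6}\s+(.*)$", design_md, flags=re.MULTILINE)
    ]
    return [
        title
        for title in EXPECTED_SECTIONS
        if not any(title.lower() in heading for heading in headings)
    ]


sample = """# Design System
## 1. Visual Theme & Atmosphere
## 2. Color Palette
"""
print(missing_sections(sample))
```

A check like this only confirms that the sections exist; whether their content is any good is still a human review.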

Besides the DESIGN.md, each site includes:

preview.html — visual catalog with swatches, typographic scale, buttons
preview-dark.html — the same catalog with dark surfaces

Using it in practice

The workflow is absurdly simple:

Choose a design system from the catalog
Copy the DESIGN.md to your project root
Tell your agent: “use this design system”

That’s it. Seriously.

The agent will read the file, understand the colors, fonts, spacing, and apply everything to the code it generates. If you use Claude, Cursor, GitHub Copilot, or any other agent that reads context files, it works.

Why this matters

For me, the value is in two places:

First, consistency. When you have a DESIGN.md committed in the repo, every interaction with the agent has a shared visual reference. It doesn’t matter if you’re working on the feature today or three months from now — the design system is there, documented, versioned.

Second, speed. Instead of losing 15 minutes describing “I want a blue primary button with rounded borders and soft shadow,” you simply point to the DESIGN.md. The agent already knows exactly how to style it.

And there’s a bonus: since each DESIGN.md is based on real products that work, you’re essentially “copying” visual patterns that have already been tested and validated by companies that invest millions in design.

Considerations

Awesome Design is one of those repositories that seems simple but solves a real problem. The idea of using markdown for a design system is elegant because it fits the workflow we already use with code agents.

If you’re building something with AI and care about the visual quality of the result, it’s worth checking out. The repository is constantly growing — there are already more than 70 design systems — and accepts contributions.

I particularly like having this in my arsenal. It’s another tool for when I want the agent to deliver something that not only works, but also looks well made.

Demystifying the OpenSpec Propose Process: What Happens Behind the Scenes?

Wed, 29 Apr 2026 23:22:42 GMT

When we run the proposal creation command (propose) in OpenSpec, for example when starting the add-course-enrollment-module feature, the system works behind the scenes to generate a set of files that are fundamental to a successful implementation.

If you’ve ever wondered what each of these artifacts is for, this article is for you. Let’s explore the anatomy of each generated document and how they ensure team alignment: from business vision to coding.

1. The Proposal (`proposal.md`): The “Why” and the “What”

The proposal document is the starting point of your journey. It serves as the bridge between business and engineering.

What it contains: Motivation for the functionality, exact scope of what will be built, and which new “capabilities” are being introduced to the system. In our example, it maps the impact on the project and introduces enrollment-management, payment-gateway-adapter, and enrollment-command-dispatch.
What it’s for: Ensuring that everyone involved (developers, POs, stakeholders) has clarity about the value being delivered and the size of the change, even before we write the first line of code.

2. The Design Document (`design.md`): The “How” at High Level

If the proposal focuses on business, the design document goes deeper into technical terrain. This is where architecture decisions and risk mitigation live.

What it contains: The technical decisions that will guide development. For example: replicating the pattern already used for “students” in the frontend, adopting deterministic commands in the BFF, isolating the payment adapter, and ensuring idempotency via a request_id combined with polling.
What it’s for: Preventing future headaches. By listing risks, their mitigations, and the migration plan, the technical team has a map of possible obstacles and how to circumvent them without hurting the application’s architecture.

3. The Specs (`specs/*/spec.md`): The Rules of the Game

One of OpenSpec’s great insights is dividing a complex problem into smaller “capabilities,” generating dedicated specifications for each system domain:

`specs/enrollment-management/spec.md` (Frontend)

Focused on user experience. Defines interface requirements: how the enrollments table should behave, rules for hiding sensitive data, allowed actions (edit, remove, open URL, trigger message modals), and non-functional requirements such as responsiveness and synchronization with backend.

`specs/payment-gateway-adapter/spec.md` (Integration)

Focused on communication with the partner. Defines the rules of the game with the Payment Gateway API: how to create, query, update, and cancel payments. In addition, it ensures correct mapping of third-party API statuses to the internal domain and formalizes security in credential handling.
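
To make the status-mapping idea concrete, here is a minimal sketch. Everything in it is hypothetical: the gateway status strings, the `PaymentStatus` values, and the `to_internal_status` helper are illustrations, not the contents of the real spec.

```python
from enum import Enum


class PaymentStatus(Enum):
    """Internal domain statuses (hypothetical)."""
    PENDING = "pending"
    PAID = "paid"
    CANCELLED = "cancelled"
    FAILED = "failed"


# Hypothetical third-party status strings mapped to the internal domain.
GATEWAY_TO_INTERNAL = {
    "waiting_payment": PaymentStatus.PENDING,
    "authorized": PaymentStatus.PENDING,
    "captured": PaymentStatus.PAID,
    "voided": PaymentStatus.CANCELLED,
    "refused": PaymentStatus.FAILED,
}


def to_internal_status(gateway_status: str) -> PaymentStatus:
    """Translate a gateway status; fail loudly on anything unmapped."""
    key = gateway_status.strip().lower()
    if key not in GATEWAY_TO_INTERNAL:
        raise ValueError(f"Unmapped gateway status: {gateway_status!r}")
    return GATEWAY_TO_INTERNAL[key]


print(to_internal_status("Captured"))
```

Raising on unknown values, instead of silently defaulting, is one way to notice when the partner API adds a status the spec does not cover yet.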

`specs/enrollment-command-dispatch/spec.md` (Backend/BFF)

Focused on internal robustness. Defines responsibilities in BFF (Backend For Frontend), such as deterministic resolution of provider configurations, issuance of versioned commands, and strict application of idempotency and traceability in events.
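
As an illustration of what “strict application of idempotency” can mean in code, the sketch below keeps results keyed by request_id so that a repeated dispatch returns the first result instead of executing again. The `CommandDispatcher` class and the command fields are assumptions made for the example; a real BFF would persist this state rather than hold it in memory.

```python
from dataclasses import dataclass, field


@dataclass
class CommandDispatcher:
    """In-memory, hypothetical dispatcher that deduplicates by request_id."""
    _results: dict = field(default_factory=dict)
    handled: int = 0  # how many commands were actually executed

    def dispatch(self, request_id: str, command: dict) -> dict:
        # A repeated request_id returns the stored result untouched.
        if request_id in self._results:
            return self._results[request_id]
        self.handled += 1
        result = {
            "request_id": request_id,  # kept on the event for traceability
            "command": command["name"],
            "version": command.get("version", 1),  # versioned command
            "status": "accepted",
        }
        self._results[request_id] = result
        return result


dispatcher = CommandDispatcher()
first = dispatcher.dispatch("req-1", {"name": "create_enrollment", "version": 2})
retry = dispatcher.dispatch("req-1", {"name": "create_enrollment", "version": 2})
print(first is retry, dispatcher.handled)
```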

4. The Action Plan (`tasks.md`): Getting Hands-On

The cherry on top of the entire planning process. After understanding context, architecture, and specific requirements, OpenSpec compiles all this into an actionable checklist.

What it contains: A logical grouping of tasks. In the case of our enrollment functionality, 12 ordered groups were generated for implementation: ranging from initial setup, through domain modeling, use cases, infrastructure, screen/UI development, hook construction, route configuration, and observability, to closing with tests and final documentation.
What it’s for: It’s the developer’s pocket guide. It allows the team to pick up independent tasks knowing exactly in what order they should be executed, making progress predictable and easy to track.
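
Because the plan is a checklist, progress can be measured mechanically. The sketch below assumes tasks.md uses markdown checkboxes (`- [ ]` and `- [x]`); that format and the `task_progress` helper are assumptions for illustration.

```python
import re

# Matches "- [ ] text" or "- [x] text" (also "*" bullets), an assumed format.
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def task_progress(tasks_md: str) -> tuple[int, int, list[str]]:
    """Return (done, total, pending task descriptions in file order)."""
    done, total, pending = 0, 0, []
    for line in tasks_md.splitlines():
        match = CHECKBOX.match(line)
        if not match:
            continue  # headings and prose are ignored
        total += 1
        if match.group(1).lower() == "x":
            done += 1
        else:
            pending.append(match.group(2).strip())
    return done, total, pending


sample_tasks = """## 1. Setup
- [x] 1.1 Create module skeleton
- [ ] 1.2 Register routes
## 2. Domain
- [ ] 2.1 Model the enrollment entity
"""
print(task_progress(sample_tasks))
```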

Conclusion

Having a brilliant idea is great, but transforming it into software running in production requires method.

With the OpenSpec propose process, the heavy lifting of planning, architectural organization, and task breakdown happens in a fluid, standardized way. The result is far more predictable development cycles (sprints), documentation that stays alive, and a shorter time-to-market.

Next time OpenSpec generates these artifacts, you’ll know exactly what each piece of the machinery is for!

From Feature to Deploy Hands-Free: Architecture of a Software Factory with AI Using Spec-Driven Development

Wed, 15 Apr 2026 23:18:03 GMT

Introduction: What the Software Industry Was Like Before AI

For decades, software development operated as a highly specialized artisanal process. This isn’t a criticism: this model built practically everything we use today. The problem is that it depends on constant human synchronization. Refinement, implementation, review, QA, security, and deploy are stages with fragile interfaces, often based on tacit context.

In daily life, this shows up as familiar symptoms: cards that change scope mid-execution, large PRs with slow reviews, bugs discovered late, bottlenecks around key people, and difficulty predicting real lead time. When the company grows, this model tends to saturate.

In industrial terms, it’s like a factory that depends on master craftsmen for each station on the line. Quality can be excellent, but variation between batches is high and throughput doesn’t scale linearly. In software, variability becomes rework; rework becomes cost; cost becomes strategic slowness.

The central point of this article is that AI makes it possible to migrate from an artisanal model to a software factory driven by technical contracts, without eliminating human engineering, but shifting its focus to system design, governance, and operational quality.

The Concept of Automated Assembly Line with AI

From Squad to Industrial Cell

The squad continues to exist as a business and decision unit. What changes is operational execution: repeatable and verifiable tasks pass to automated cells of specialized agents. Each cell has defined input, quality criteria, and traceable output.

In a modern factory, you don’t wait for quality control only at the end of the line. You place inspection at each station. The assembly line with AI applies this logic to software: spec, code, review, test, security, and observability are chained gates.

What Changes If This Idea Works?

If it really works, it changes the game in four dimensions:

Throughput: more deliveries per time unit without increasing team at the same pace.
Consistency: technical standard less dependent on who picked up the task.
Predictability: lead time and rework rate more stable.
Traceability: decision and evidence accessible from card to deploy.

In practice, the PO’s board stops being just intention management and starts reflecting real technical execution state, with telemetry of progress and risk.

Will Jobs End?

The mature answer is: not in the simplistic sense, but roles change. Repetitive translation and execution work tends to decline. In return, demand grows for platform engineering, specification design, agent governance, automation security, observability, and operation of the sociotechnical system.

The professional who only writes code task by task loses ground. The professional who designs delivery systems with quality and governance gains prominence.

General System Architecture

Board as Input System

The board (Kanban/Jira/Linear) continues to be the source of business truth. Each card should contain intention, expected value, acceptance criteria, and restrictions. Without this, automation just accelerates ambiguity.

MCP as Integration and Contract Layer

MCP functions as a context bus between systems and agents: board, repository, CI/CD, tests, security, and documentation. Without a common contract layer, each agent becomes an isolated script and the assembly line degrades to glued automation.

Structured Queue as Operational Buffer

Ingestion transforms cards into queue items with explicit schema, for example:

business_goal
acceptance_criteria
constraints
dependencies
impact_scope
risk_profile
definition_of_done

This queue is the equivalent of a production order with clear technical instructions.
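
A minimal sketch of such a queue item, using the field names listed above. The field types, the allowed risk levels, and the `validate` rules are assumptions chosen for the example.

```python
from dataclasses import dataclass, field

RISK_LEVELS = ("low", "medium", "high", "critical")  # assumed scale


@dataclass
class QueueItem:
    business_goal: str
    acceptance_criteria: list[str]
    definition_of_done: list[str]
    constraints: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)
    impact_scope: list[str] = field(default_factory=list)
    risk_profile: str = "low"

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the item is usable."""
        problems = []
        if not self.business_goal.strip():
            problems.append("business_goal is empty")
        if not self.acceptance_criteria:
            problems.append("acceptance_criteria is empty")
        if not self.definition_of_done:
            problems.append("definition_of_done is empty")
        if self.risk_profile not in RISK_LEVELS:
            problems.append(f"unknown risk_profile: {self.risk_profile}")
        return problems


item = QueueItem(
    business_goal="Reduce checkout API latency",
    acceptance_criteria=["p95 latency under the agreed target"],
    definition_of_done=["spec archived", "staging tests green"],
    risk_profile="medium",
)
print(item.validate())
```

Rejecting incomplete items at ingestion is what keeps ambiguity from reaching the agents downstream.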

Agent Mesh (Specialized Workers)

Instead of one generalist agent, mature architecture uses specialization:

Spec agent
Implementation worker
Review harness
QA agent
Security agent
UX agent
Orchestrator agent

Each agent acts in one domain, with a defined input/output interface and escalation policies.

Quality Control and Governance

Autonomy without governance becomes operational risk. Control includes:

policies (constitution);
automated gates;
audit trails;
action limits by risk level;
mandatory human escalation in critical cases.
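
The controls above can be sketched as a small policy gate. The action names, the risk scale, and the limits in `MAX_AUTONOMOUS_RISK` are hypothetical; the point is only the shape: a limit per action, escalation above it, and an audit entry for every decision.

```python
RISK_ORDER = ["low", "medium", "high", "critical"]  # assumed scale

# Hypothetical policy: the highest risk at which an agent may act alone.
# No action lists "critical", so critical work always escalates.
MAX_AUTONOMOUS_RISK = {
    "open_pr": "high",
    "merge": "medium",
    "deploy_staging": "medium",
    "deploy_production": "low",
}


def decide(action: str, risk: str, audit_log: list) -> str:
    """Return 'allow', 'escalate' (human needed) or 'deny' (unknown action)."""
    if risk not in RISK_ORDER:
        raise ValueError(f"unknown risk level: {risk}")
    limit = MAX_AUTONOMOUS_RISK.get(action)
    if limit is None:
        decision = "deny"
    elif RISK_ORDER.index(risk) <= RISK_ORDER.index(limit):
        decision = "allow"
    else:
        decision = "escalate"
    # Every decision leaves a trace, including the allowed ones.
    audit_log.append({"action": action, "risk": risk, "decision": decision})
    return decision


log = []
print(decide("merge", "low", log))
print(decide("deploy_production", "medium", log))
```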

Pipeline Step by Step

1) Demand Ingestion: Board → Queue

The pipeline starts with semantic parsing of cards, context enrichment, and requirements normalization. Example: a card that says “improve checkout” can become three distinct deliverables (API latency, antifraud validation, transactional error UX). The objective is to reduce ambiguity before the first line of code.

2) SDD + Automated Implementation

With normalized demand, the flow enters SDD:

generates the change spec;
makes contracts explicit (API, data, expected behavior);
decomposes the work into tasks;
applies specification-guided implementation.

This is the point where the factory differentiates itself from prompt + code. The spec is the upstream quality control artifact.

3) Automated Code Review

The review harness audits the PR focusing on spec adherence, regression, complexity, and architectural impact. It can approve, request changes, or open a block for human review.

The idea isn’t to remove the human reviewer, but to reserve human intervention for cases of higher value or risk.

4) CI/CD for Staging

After approval in the review gate, the assembly line triggers build, static checks, and automatic deploy in staging. Ephemeral environments per PR increase isolation and reduce “works on my machine.”

5) Automated Tests with QA Agents

The QA agent executes smoke tests (critical flow sanity), regression suite (legacy stability), and tests based on spec acceptance criteria.

Failure doesn’t return as noise. It returns as a structured item with probable cause, evidence, and correction recommendation.

6) Continuous Security with Cyber Agent

Security enters as a fixed station on the line, not a final stage:

vulnerability analysis;
OWASP Top 10 validations (including Broken Access Control);
automated pentest for sensitive flows;
analysis of secrets, authz/authn, headers, and undue exposure.

The gain is anticipating risk before merge/deploy in production.

7) Reports and Traceability

Each stage produces evidence:

structured logs;
execution links;
test outputs;
security findings;
review decision;
time and quality metrics.

Without evidence, there’s no reliable operation at scale.

8) Feedback Loop to PO’s Board

At the end of each cycle, the assembly line updates the board with real technical status: approved/rejected by gate, blocks, residual risk, and recommended next step.

With this, product and engineering operate on the same truth state.

The Role of Spec-Driven Development (OpenSpec)

constitution, propose/specify, apply, archive

OpenSpec organizes execution into four movements:

constitution: factory principles, limits, and policies;
propose/specify: change design with clear contracts;
apply: implementation guided by spec;
archive: decision record, trade-offs, and evidence.

Why Spec Is the Factory’s Template

Without spec, the agent optimizes for plausibility. With spec, it optimizes for verifiable compliance. This difference is structural. The spec becomes the technical drawing of the part that allows repeatable quality, comparison between batches, and continuous process improvement.

The Role of Agents in Operation

Implementation Worker

Converts spec into code and technical artifacts. Should operate with limited scope, idempotency, and local checks before opening PR.

Review Harness

Acts as automated technical auditor, verifying architectural consistency and contract adherence. Reduces human reviewer’s cognitive load.

QA Agent

Executes test strategy by risk and impact, focusing on detecting regression early.

Security Agent

Analyzes attack surface and enforces security baseline. In critical cases, can automatically block flow.

Orchestrator and Escalation Policies

Coordinates the line: priority, retries, circuit breakers, rollbacks, and human handoff. Without orchestration, there’s no factory; there are just concurrent scripts.
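
A deliberately small sketch of that coordination, covering only retries and human handoff (priority, circuit breakers, and rollbacks are left out). Stages are modeled as plain callables that return True when their gate passes; the stage names and the `orchestrate` function are assumptions.

```python
from typing import Callable

Stage = Callable[[], bool]  # returns True when the gate passes


def run_with_retries(stage: Stage, max_retries: int = 2) -> bool:
    """Try a stage up to 1 + max_retries times; True as soon as it passes."""
    for _ in range(1 + max_retries):
        if stage():
            return True
    return False


def orchestrate(stages: list, max_retries: int = 2) -> list:
    """Run (name, stage) pairs in order; stop at the first persistent failure."""
    report = []
    for name, stage in stages:
        if run_with_retries(stage, max_retries):
            report.append((name, "passed"))
        else:
            report.append((name, "human_handoff"))
            break  # later stages never run on top of a failed gate
    return report


attempts = {"qa": 0}


def flaky_qa() -> bool:
    """Fails on the first attempt, passes on the second."""
    attempts["qa"] += 1
    return attempts["qa"] >= 2


demo = orchestrate([
    ("review", lambda: True),
    ("qa", flaky_qa),
    ("security", lambda: False),
    ("deploy", lambda: True),
])
print(demo)
```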

UI/UX as First-Class Stage

Flow and Wireframe Generation

Before implementation, the UX agent proposes navigation flow, wireframes, and interaction alternatives. This prevents interface decisions improvised during coding.

State and Interface Contract Definition

Every screen should make its states explicit: loading, empty, success, partial, error, and permissions by profile. Poorly defined state is a classic source of rework between frontend, backend, and QA.
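
One way to make this checkable is to treat the state list as an enum and compare each screen's contract against it. The sketch below covers only the five visual states (permissions by profile are left out), and the screen contract shown is hypothetical.

```python
from enum import Enum


class ScreenState(Enum):
    LOADING = "loading"
    EMPTY = "empty"
    SUCCESS = "success"
    PARTIAL = "partial"
    ERROR = "error"


def undefined_states(contract: dict) -> list[str]:
    """List the states a screen contract does not describe yet."""
    return [state.value for state in ScreenState if state not in contract]


# Hypothetical contract for an enrollments screen.
enrollments_screen = {
    ScreenState.LOADING: "skeleton rows",
    ScreenState.SUCCESS: "table with enrollments",
    ScreenState.ERROR: "inline error with retry button",
}
print(undefined_states(enrollments_screen))
```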

UX Consistency Gate Before Code

The assembly line only advances when there’s minimum UX consistency: journey coherence, terminology, accessibility, and behavior predictability. UX stops being a finishing touch and becomes a structural requirement.

End-to-End Observability and Traceability

Evidence by Stage

Each station needs to emit evidence that both humans and machines can consume. Example: CI run link, coverage report, spec diff, security checklist.

Throughput, Quality, and Risk Metrics

Useful metrics for operating this factory:

lead time by demand type;
automatic approval rate;
rework by stage;
escape rate to production;
MTTR by failure category.
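
Two of these metrics can be computed from very little data. The sketch below assumes each delivery record carries a demand type, creation and deploy timestamps, and who approved it; the record layout and the "review-harness" marker are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def lead_time_hours_by_type(deliveries: list) -> dict:
    """Average hours from card creation to deploy, grouped by demand type."""
    buckets = defaultdict(list)
    for d in deliveries:
        hours = (d["deployed_at"] - d["created_at"]).total_seconds() / 3600
        buckets[d["demand_type"]].append(hours)
    return {kind: mean(values) for kind, values in buckets.items()}


def automatic_approval_rate(deliveries: list) -> float:
    """Share of deliveries approved by the review harness without a human."""
    if not deliveries:
        return 0.0
    auto = sum(1 for d in deliveries if d["approved_by"] == "review-harness")
    return auto / len(deliveries)


sample_deliveries = [
    {"demand_type": "bug", "created_at": datetime(2026, 4, 1, 9),
     "deployed_at": datetime(2026, 4, 1, 15), "approved_by": "review-harness"},
    {"demand_type": "bug", "created_at": datetime(2026, 4, 2, 10),
     "deployed_at": datetime(2026, 4, 3, 10), "approved_by": "human"},
    {"demand_type": "feature", "created_at": datetime(2026, 4, 1, 8),
     "deployed_at": datetime(2026, 4, 3, 8), "approved_by": "review-harness"},
]
print(lead_time_hours_by_type(sample_deliveries))
print(automatic_approval_rate(sample_deliveries))
```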

Deliverable Traceability by Stage

Every deliverable should have lineage: card → queue item → spec → commit → PR → pipeline run → deploy. This reduces diagnostic time and improves governance.

Benefits, Risks, and Trade-offs

Scale and Consistency versus Coupling to AI Stack

The greater the scale gain, the greater the risk of lock-in to tools and models. Mitigation: open contracts, provider abstraction, and policy versioning.

Speed versus Validation Cost

Automation accelerates delivery, but requires investment in gates, observability, and security. Without this, you accelerate error with efficiency.

Autonomy versus Governance

Giving autonomy to agents improves throughput, but amplifies the risk surface. The balance comes from clear action limits, an audit trail, and human escalation.

Future: AI-Native Software Development

Smaller Teams, Stronger Platforms

The trend is less effort in repetitive operational tasks and more investment in platform engineering. Smaller teams can deliver more when operating a well-designed assembly line.

Engineering as Sociotechnical System Design

The competitive differentiator becomes designing the entire system: people, agents, policies, metrics, and feedback loops. Code remains central, but it is no longer the only unit of value.

Conclusion

The software factory with AI isn’t an abstract promise. It’s an operational model possible now for organizations that already have board, PR, and CI/CD discipline. The leap comes from treating specification as contract, agents as specialized cells, and quality as a flow property.

If traditional development was a high-skill workshop, the next stage is a precision factory with strategic human supervision. It’s not about automating for automation’s sake. It’s about building a delivery system that scales without losing reliability.

The question is no longer whether AI will enter the development cycle. The question is: will your engineering operate AI as a point tool or as a production architecture?

OpenSpec in Practice: Installation, Usage, Complete Flow, and Optional Commands

Wed, 08 Apr 2026 10:00:00 GMT

If you already use a code agent in your daily life, you’ve already noticed the pattern: when intention is vague, output is also vague. That’s why I started studying Spec-Driven Development (SDD) tools more closely and, on this journey, OpenSpec became a very interesting option.

Its proposal is simple: before going out implementing, you organize the change in clear artifacts (proposal, specs, tasks and, when necessary, design), execute with more predictability, and maintain history of what changed in the system’s expected behavior.

In this article, I’ll cover, in a practical way:

How to install OpenSpec.
How to start in a new or existing project.
Which flow I recommend for daily use.
Which optional commands are worth knowing (CLI and slash commands).

What is OpenSpec (without beating around the bush)

OpenSpec is an open source SDD framework for working with AI assistants. It adds a specification layer in the repository to reduce improvisation and increase predictability.

The main structure stays inside openspec/:

openspec/specs/: current source of truth of the system.
openspec/changes/: proposed changes in progress.
openspec/changes/archive/: changes already completed and archived.

The real value is in this separation: you can clearly see the current state and what is being proposed, without mixing everything in chat conversation.
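
The separation is visible directly on disk. As a rough, read-only illustration (the CLI's own `openspec list` remains the authoritative view), the sketch below walks the folder layout described above. The helper names are hypothetical, and the demo builds a throwaway directory instead of touching a real project.

```python
import tempfile
from pathlib import Path


def open_changes(repo_root: Path) -> list[str]:
    """Folder names under openspec/changes/, excluding the archive."""
    changes_dir = repo_root / "openspec" / "changes"
    if not changes_dir.is_dir():
        return []
    return sorted(
        p.name for p in changes_dir.iterdir()
        if p.is_dir() and p.name != "archive"
    )


def archived_changes(repo_root: Path) -> list[str]:
    """Folder names under openspec/changes/archive/."""
    archive_dir = repo_root / "openspec" / "changes" / "archive"
    if not archive_dir.is_dir():
        return []
    return sorted(p.name for p in archive_dir.iterdir() if p.is_dir())


# Demo on a temporary directory with hypothetical change names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "openspec" / "changes" / "add-2fa").mkdir(parents=True)
    (root / "openspec" / "changes" / "archive" / "add-login").mkdir(parents=True)
    demo_open = open_changes(root)
    demo_archived = archived_changes(root)

print(demo_open, demo_archived)
```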

Prerequisites

Before installing, ensure:

Node.js >= 20.19.0
A package manager (npm, pnpm, yarn, or bun)
A local project (can be brownfield, no problem)
A compatible code agent (Codex, Claude Code, Cursor, Copilot, etc.)

Quick commands to validate environment:

node --version
npm --version
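
If you want to script the version check, comparing versions as tuples avoids the classic string-comparison mistake (where "20.9.0" sorts after "20.19.0"). This sketch assumes a plain release string such as the `v20.19.0` printed by `node --version`; pre-release suffixes are not handled, and the helper names are hypothetical.

```python
MIN_NODE = (20, 19, 0)  # minimum version from the prerequisites above


def parse_version(text: str) -> tuple:
    """Turn 'v20.19.0' into (20, 19, 0). Plain release versions only."""
    major, minor, patch = text.strip().lstrip("v").split(".")[:3]
    return int(major), int(minor), int(patch)


def node_is_supported(version_output: str) -> bool:
    """True when the reported Node.js version meets the minimum."""
    return parse_version(version_output) >= MIN_NODE


for reported in ("v20.19.0", "v20.9.0", "v22.1.0"):
    print(reported, node_is_supported(reported))
```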

Installing OpenSpec CLI

With npm

npm install -g @fission-ai/openspec@latest

With pnpm

pnpm install -g @fission-ai/openspec@latest

With yarn

yarn global add @fission-ai/openspec@latest

With bun

bun install -g @fission-ai/openspec@latest

After installation:

openspec --version

If this command returns the version, the CLI is ready.

Initializing in the Project

In the project root directory:

cd my-project
openspec init

In practice, this command:

creates the openspec/ folder
generates instructions for the agent (including AGENTS.md in the project)
configures integrations for supported tools

After init, I like to run:

openspec list

This already confirms if the structure is active and if there are open changes.

Generated Structure and How to Read It

A typical change flow looks like this:

openspec/
├── specs/
│   └── auth/
│       └── spec.md
└── changes/
    └── add-2fa/
        ├── proposal.md
        ├── tasks.md
        ├── design.md (optional)
        └── specs/
            └── auth/
                └── spec.md

Practical interpretation:

proposal.md: why change and what will change.
specs/*: requirement/behavior deltas.
tasks.md: implementation checklist.
design.md: technical decisions when necessary.

Recommended Flow for Using OpenSpec in Daily Life

This is the flow I consider most balanced between rigor and speed.

1) Create the change proposal

In the agent, request an OpenSpec proposal for a specific feature.

Example with shortcut (when the tool supports it):

/openspec:proposal Add profile search filters

2) Review and validate before implementing

In terminal:

openspec list
openspec validate add-profile-search-filters
openspec show add-profile-search-filters

This is where you cut ambiguity and avoid rework.
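
If you repeat this review often, the two commands can be chained in a small script. The sketch below assumes that `openspec validate` exits with a non-zero code when validation fails; that is an assumption worth confirming against your installed version. The `review_change` helper is hypothetical, and the runner is injectable so the demo does not need the CLI installed.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]  # takes a command, returns exit code


def run_cli(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(cmd).returncode


def review_change(change: str, run: Runner = run_cli) -> bool:
    """Validate a change and, only if that succeeds, show it for reading."""
    # Assumption: a non-zero exit code from validate means "do not proceed".
    if run(["openspec", "validate", change]) != 0:
        return False
    run(["openspec", "show", change])
    return True


# Demo with a fake runner that records calls instead of invoking the CLI.
calls = []


def fake_runner(cmd: Sequence[str]) -> int:
    calls.append(list(cmd))
    return 0


print(review_change("add-profile-search-filters", run=fake_runner))
```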

3) Refine proposal/specs/tasks

If clarity is lacking, ask the agent to adjust criteria, scenarios, and tasks until it becomes executable.

4) Implement based on tasks

/openspec:apply add-profile-search-filters

The idea is to implement on top of tasks.md and not on improvisation.

5) Archive the completed change

In the agent:

/openspec:archive add-profile-search-filters

Or via CLI:

openspec archive add-profile-search-filters --yes

When you archive, the change leaves the “in progress” state and starts composing the consolidated history.

Main Commands (CLI)

In real use, these are the commands that matter most:

openspec init
openspec list
openspec view
openspec show <change-name>
openspec validate <change-name>
openspec archive <change-name> --yes
openspec update

Quick summary:

init: initializes OpenSpec in the repo.
list: lists open changes.
view: interactive dashboard.
show: displays proposal/tasks/spec deltas of a change.
validate: validates specs format/structure.
archive: archives completed change.
update: updates instructions and bindings for agents in the project.

Shortcut Commands in the Agent

They depend on the tool, but the current OpenSpec pattern in several native integrations is:

/openspec:proposal
/openspec:apply
/openspec:archive

In some contexts and docs you’ll also find the OPSX (experimental) flow, with commands like:

/opsx:new
/opsx:continue
/opsx:ff
/opsx:apply
/opsx:archive

The important point: check which set of commands your integration installed at init.

Common Errors (and How to Avoid)

Skipping the proposal review and going straight to implementation.
Accepting tasks.md without checking order and dependencies.
Not validating specs (validate) before applying.
Mixing commands from different docs without checking your integration.
Not archiving completed changes (it becomes a mess quickly).

An Example of End-to-End Execution

# 1) Install
npm install -g @fission-ai/openspec@latest

# 2) Initialize in project
cd my-project
openspec init

# 3) Check state
openspec list

# 4) In agent, create proposal
# /openspec:proposal Add profile search filters

# 5) Review in terminal
openspec validate add-profile-search-filters
openspec show add-profile-search-filters

# 6) In agent, implement
# /openspec:apply add-profile-search-filters

# 7) Archive change
openspec archive add-profile-search-filters --yes

Closing

When you install, initialize, and follow a minimal flow of proposal, review, application, and archiving, the quality of execution with agents improves considerably.

And best of all: you maintain real traceability of technical intention, without depending on chat memory.

In Spec-Driven Development, `implement` is where everything else turns into code

Mon, 06 Apr 2026 10:00:00 GMT

In the post “In Spec-Driven Development, Everything Starts with Principles,” I talked about the constitution stage, where I define the principles that will govern the project’s decisions.

Then, in “In Spec-Driven Development, specify is where ambiguity starts to die,” I showed the point where a demand gains clearer expected behavior.

Next, in “In Spec-Driven Development, plan is where specification turns into execution strategy,” I covered the stage that organizes the path ahead with order, dependencies, and a sense of risk.

After that, in “In Spec-Driven Development, tasks is where the plan turns into concrete work units,” I talked about the breakdown that transforms strategy into executable blocks.

But, at some point, all this needs to turn into code.

That’s where implement enters.

And here I think an important simplification is worth making: yes, in a certain sense, implement is just implementing.

If the previous work was well done, this stage shouldn’t carry a great methodological drama. The objective now is to take an already delimited task and transform that into code.

Except that doesn’t mean the work is done.

In the AI context, the act of implementing has often become cheap. The agent writes quickly, suggests structure, connects parts, and returns a plausible solution in a short time.

That’s why, for me, the main point of implement isn’t to romanticize code writing.

The main point is another: after the code was generated, someone still needs to review what was delivered.

Implementing Can Be the Easiest Part

Depending on context, implementation itself almost becomes an operational stage:

pick up the task
generate the code
adjust what’s necessary
move forward

If constitution, specify, plan, and tasks did the job right, this is even expected.

The problem starts when the person treats the agent’s output as if it were already the final delivery.

Because one thing is to generate code.

Another thing is to verify if that code:

solved exactly the task
respected the scope
didn’t invent things outside what was agreed
didn’t distort the specified behavior
continues coherent with the project’s principles

That’s where, for me, the real work lives.

What Really Matters After Implementing

I tend to look at implement less as an epic construction moment and more as a transition point.

The code appeared. Now it needs to be confronted with what came before.

In practice, I want to review if the delivery:

corresponds to the task that was asked for
remains faithful to the specification
didn’t trample the plan
didn’t bring unnecessary complexity
can be accepted with reasonable confidence

If I finish this stage only with the feeling that “it seems like it turned out good,” that’s still not enough.

A Practical Example of Difference

Let’s go back to the orders example.

Suppose the current task is this:

implement order creation with automatic total calculation

If I look only at the generation stage, maybe it’s enough to ask the agent to do this and receive a block of code back.

But the relevant work starts right after:

did it only create the flow or did it invent more things?
is the total being calculated the right way?
is an order without items still blocked?
was the relationship with an existing client respected?
did the solution stay compatible with the simplicity the project wanted?
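
To ground those questions, here is roughly what a small, faithful delivery of that task could look like. It is a sketch under assumptions: the `Order` and `OrderItem` shapes, the error messages, and passing known clients as a set are all invented for the example, since the original orders project is not shown here.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class OrderItem:
    name: str
    unit_price: Decimal
    quantity: int


@dataclass(frozen=True)
class Order:
    client_id: str
    items: tuple
    total: Decimal


def create_order(client_id: str, items: list, existing_clients: set) -> Order:
    """Create an order with its total computed from the items."""
    if client_id not in existing_clients:
        raise ValueError("order must reference an existing client")
    if not items:
        raise ValueError("order must contain at least one item")
    # The total is derived here, never supplied by the caller.
    total = sum((i.unit_price * i.quantity for i in items), Decimal("0"))
    return Order(client_id=client_id, items=tuple(items), total=total)


order = create_order(
    "client-1",
    [OrderItem("notebook", Decimal("10.50"), 2), OrderItem("pen", Decimal("4.00"), 1)],
    existing_clients={"client-1"},
)
print(order.total)
```

Each review question maps to a line you can point at: the two guards, the derived total, and the absence of anything else.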

Notice the difference.

The act of implementing can even be straightforward.

What can’t be automatic is accepting what was implemented.

In the End

After the delivery is finalized, there’s still an important responsibility in your hands: ensuring that everything that was defined as a requirement was really met. It’s not enough to look at the code and feel that it seems ready. It’s also not enough to trust just because the implementation came out clean, organized, or plausible.

If there are requirements.md files involved in that work, it’s up to you to check one by one and ensure that all items listed there were, in fact, resolved and marked as complete. This matters because, in the end, what validates the delivery isn’t just the existence of code, but the adherence between what was asked for and what was actually delivered.

And there’s another point here that I think is important: not everything will be validated automatically. In several cases, manual checking will still exist. An interface flow, behavior in a specific scenario, an integration that depends on real context, an experience detail, or anything else not fully covered by automated tests still needs to be verified by you.

That is: the implement stage doesn’t exactly end when code appears. It ends when the delivery has been reviewed, checked against the requirements, and validated in a minimally responsible way.

But the flow doesn’t end the way a rigid process does, closed and left behind. It ends and restarts.

If a new demand emerges, the natural path is to return to specify and start again from there, with a new, clear specification for what needs to be built now. And if along that path you realize that some principle from constitution needs to change, be refined, or even be replaced, that’s also part of the process.

I like this structure precisely because it isn’t rigid in the bad sense. It’s atomic. Each part exists with a clear function, but none of them needs to be treated as a sacred or immutable piece. If the project evolves, the structure can evolve with it. If the context changes, the principles can change with it. If the way of executing improves, the flow can also be adjusted.

For me, this is one of the strongest points of this approach: it organizes work without pretending that the project will remain the same forever.

Thanks for reading this far.

In Spec-Driven Development, `tasks` is where the plan turns into concrete work units

Mon, 06 Apr 2026 10:00:00 GMT

In Spec-Driven Development, `tasks` is where the plan turns into concrete work units

In the text “In Spec-Driven Development, Everything Starts with Principles,” I focused on the constitution stage, where I define the criteria that will guide the project’s decisions.

Then, in “In Spec-Driven Development, specify is where ambiguity starts to die,” I entered the stage where the demand stops being a vague intention and starts having clearer expected behavior.

Next, in “In Spec-Driven Development, plan is where specification turns into execution strategy,” I talked about the stage that organizes the crossing of the work with order, dependencies, and a sense of risk.

But plan, by itself, still isn’t executable work.

Between knowing the strategy and starting to implement, there’s still an important question: how do I transform this path into concrete execution blocks without losing coherence?

That’s where the tasks stage comes in.

And, for me, this stage matters a lot because it’s where planning stops being a good narrative about execution and starts becoming pieces of work that someone can actually pick up, understand, and complete.

The Mistake of Thinking That Breaking Down Work Is Just Making a Checklist

When someone hears “tasks,” it’s common to imagine something very simple:

create backend
create frontend
make tests
review
deliver

This may look like organization, but it’s often still too generic.

A list like this can give a sense of structural progress without solving the main thing: each item remains too broad, too ambiguous, or too mixed to guide execution well.

In practice, poorly broken-down tasks usually cause some well-known problems:

blocks too big to validate early
parts with confusing responsibility
items that mix several decisions at the same time
hidden dependencies
false sense that work is well distributed

When this happens, execution starts to slip back into improvisation. The difference is that now improvisation comes packaged as a checklist.

What the `tasks` Stage Solves

I don’t see tasks as an obligation to produce a huge list just to seem methodical. Nor do I see it as a simple conversion of the plan into subheadings.

For me, tasks solves one very concrete thing: it transforms the execution strategy into clear, limited, and verifiable work units.

It’s the moment when I try to decide:

which piece of work deserves to become its own task
what each task needs to deliver
where one task ends and the next begins
which dependencies need to be explicit
what criterion makes a task count as complete

If plan organizes the crossing, tasks defines the steps.

And this matters a lot because, without this translation, implementation continues too big, too diffuse, or too dependent on local interpretation.
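One way to picture a “clear, limited, and verifiable work unit” is as a small record that cannot hide what is missing. This is only a sketch under my own assumptions: the `Task` fields and the `vague_fields` helper are hypothetical names, not part of Spec Kit or any other tool.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    objective: str   # what this unit of work is for
    delivers: str    # the observable result it produces
    done_when: str   # the criterion that closes it
    depends_on: list[str] = field(default_factory=list)  # explicit prerequisites


def vague_fields(task: Task) -> list[str]:
    """Name the fields left empty, i.e. where the task is still undefined."""
    return [
        name
        for name in ("objective", "delivers", "done_when")
        if not getattr(task, name).strip()
    ]


checklist_item = Task(objective="make orders backend", delivers="", done_when="")
work_unit = Task(
    objective="block order finalization without valid item",
    delivers="finalizing an empty order is rejected",
    done_when="a finalize attempt on an order with no items fails with a clear error",
    depends_on=["implement order creation with automatic total calculation"],
)
print(vague_fields(checklist_item))
print(vague_fields(work_unit))
```

The generic checklist item has nothing to say about its result or its completion criterion, while the second task answers both and names what it depends on.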

`Tasks` Isn’t Fragmenting for the Sake of Fragmenting

There’s a curious risk here.

When people realize they need to break the work down better, they can overcorrect and turn everything into microtasks:

create model
create migration
create service
create controller
create route
create unit test
create integration test

This can work in some contexts, but it often produces an artificially fine-grained breakdown.

Because a good task isn’t just a small task. A good task is one that makes sense.

If I separate work using only technical or mechanical cuts, I run the risk of losing the link to the behavior I wanted to deliver. And, when that happens, I start optimizing implementation pieces instead of real units of result.

For me, the question isn’t just “how to divide?”. The question is “how to divide without destroying delivery coherence?”.

What I Try to Capture in a Good `tasks` Stage

I don’t follow a rigid template, but normally I want to leave this stage with some things very clear.

Something along these lines:

each task needs to have an understandable objective
each task should produce an observable result
the boundary between tasks can’t be confusing
dependencies need to be explicit
the size of each block should allow reasonable validation
the sequence should still respect the plan

If I look at a task list and still feel that “anything can mean anything,” then this stage isn’t properly finished yet.

The Questions I Usually Use in `tasks`

If in constitution I used the 5 Qs, in specify another set of questions and in plan questions of order and risk, here I usually force operational clarity.

Normally I want to answer something like:

Q1. What work unit produces a real result?
Q2. Where does one task end and the next begin?
Q3. What dependencies need to be explicit?
Q4. What can I validate when completing each task?
Q5. Does this break help execution or just seem organized?

`Q1.` What work unit produces a real result?

This question exists to prevent ornamental tasks.

A good task, for me, needs to point to some result that makes sense by itself. Not necessarily a final result for the user, but at least something with a clear function within the delivery.

Examples:

structure the minimum order flow with mandatory client relationship
validate automatic total calculation in backend
allow operator to view already registered orders

Notice the difference.

These items may still involve several internal changes, but remain connected to an understandable result. This is much better than a list of technical pieces without context.

`Q2.` Where does one task end and the next begin?

This question helps me avoid overlap.

When the boundary between tasks is bad, several things start to happen:

two tasks tackle the same central problem
completion criteria become vague
a task seems “almost ready” for too long
review gets confused because no one knows what that block should have solved

I like to try to make the separation more honest. If one task depends on another to make sense, this needs to be clear. If two tasks are solving the same core problem, perhaps the breakdown is wrong.

In other words: a good task has a clear outline.

`Q3.` What dependencies need to be explicit?

Not every task is born isolated.

Some need a prior decision, a minimum base in place, or a completed validation to make sense. When this stays too implicit, execution tends to open fronts too early.

So I try to make visible:

which tasks unlock others
what can proceed in parallel without much conflict
which items depend on a model or rule that has already stabilized
where a point is still too sensitive to split off early

This is especially important when work involves AI, because opening multiple parallel executions without clear dependency is an efficient way to produce collision with the appearance of productivity.
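Once dependencies are written down, working out what can safely run in parallel becomes mechanical. Below is a minimal sketch using `graphlib.TopologicalSorter` from the Python standard library. The task names and the dependency map are hypothetical, chosen only to echo the orders example, and `parallel_batches` is a helper name of my own.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the tasks it depends on.
dependencies = {
    "order model with client link": set(),
    "order creation with total": {"order model with client link"},
    "block finalization without item": {"order creation with total"},
    "listing for operators": {"order creation with total"},
    "critical-flow tests": {"block finalization without item", "listing for operators"},
}


def parallel_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks in the same wave have no pending dependency."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for number, batch in enumerate(parallel_batches(dependencies), start=1):
    print(number, batch)
```

Only tasks that land in the same wave are candidates for parallel agent runs. Anything else is waiting on a decision that has not stabilized yet.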

`Q4.` What can I validate when completing each task?

This question brings tasks very close to healthy execution.

For me, a task gets much better when there’s some clear way to verify whether it fulfilled what it promised. This doesn’t mean that every task needs a formal battery of tests right away, but it does mean that it needs to touch some observable criterion.

Something like:

an order can be created with a client and a valid item
the total is calculated correctly
the system blocks finalization without an item
the listing shows recently created orders

When the task doesn’t offer any clear verification point, usually it’s too broad or poorly defined.

`Q5.` Does this break help execution or just seem organized?

This is the question I like the most because it dismantles a lot of false sophistication.

There are task lists that look beautiful on paper but hinder more than they help. Sometimes because they created too many blocks. Sometimes because they turned each technical layer into its own item. Sometimes because they separated things that belong together.

So I try to be pragmatic.

If the break:

improves execution clarity
facilitates validation
reduces rework
preserves coherence with the plan

then it’s helping.

If it just increases the feeling of control without improving the actual crossing of the work, it’s probably overdoing it.

A Practical Example of Difference

Suppose the same orders demand I’ve been using in the other texts:

internal operators need to manually register orders
each order must be associated with an existing client
order items need to specify quantity and price
the system should calculate total automatically
orders cannot be finalized without at least one item
in this first version there will be no online payment or tax issuance

After specifying and planning, I still need to transform this into executable work.

A weak break could be:

make orders backend
make orders frontend
create tests

This isn’t very useful. Each item is still too big.

Now look at a better break:

define and validate the minimum order structure with mandatory link to client
implement order creation with automatic total calculation
block order finalization without valid item
make listing and basic query of orders available to operators
cover critical flows and adjust error messages

Here there’s already much firmer ground.

Not because the list got longer, but because each item points to a more concrete delivery unit.

In the End

If I had to summarize the function of tasks in one sentence, I would say this: it’s the stage where the plan stops being just an intelligent sequence and starts becoming graspable work.

For me, this matters because good execution doesn’t just depend on knowing where to go. It also depends on knowing which piece makes sense to attack now, with a clear boundary and a minimally honest completion criterion.

When this break is well done, implementation gains rhythm without losing coherence.

In the next text, I want to enter the implement stage, which is when this whole chain finally turns into code.

In Spec-Driven Development, `plan` is where specification turns into execution strategy

Mon, 06 Apr 2026 10:00:00 GMT

In Spec-Driven Development, `plan` is where specification turns into execution strategy

In the text “In Spec-Driven Development, Everything Starts with Principles,” I talked about the constitution stage, where I define the criteria that will guide the project.

Then, in “In Spec-Driven Development, specify is where ambiguity starts to die,” I entered the stage where the demand stops being a vague intention and starts having clearer expected behavior.

But an important bridge is still missing between understanding the problem and going out implementing.

This bridge is the plan stage.

And, for me, it exists to answer a simple question: given that now I know what needs to be done, what is the best way to cross this work without turning execution into an expensive improvisation?

Because having a good specification is one thing. Knowing how to execute that specification with order, criteria, and a sense of dependency is another.

That’s where many demands start to get lost.

The Mistake of Thinking a Good Specification Is Already Enough

When specification is clear, it gives a somewhat deceptive feeling that the work is already practically solved.

The reasoning usually goes like this:

now that everything is clear, it’s just a matter of implementing
the AI already understood what needs to be done
let’s just break it into prompts and deliver quickly

In very small projects, sometimes this even works. But, as the demand grows a little, this leap starts to take its toll.

Because understanding what needs to be built doesn’t automatically solve:

where to start
what depends on what
what can be done in parallel
which part concentrates more risk
where it’s worth validating early
how to avoid rework between stages

Without this reasoning, implementation turns into a sequence of locally plausible actions, but globally disorganized.

In the short term, it seems like speed.

In the medium term, what shows up is refactoring, colliding decisions, skewed scope, and that familiar feeling that the work moved a lot and consolidated little.

What the `plan` Stage Solves

I don’t see plan as a pompous schedule. Nor do I see it as an obligation to assemble a document full of little boxes just to look like process.

For me, plan solves one very practical thing: it transforms the specification into an execution strategy.

It’s the moment when I try to organize:

the order of work
dependencies between parts
blocks that make sense to separate
risks that deserve attention early
sequence that reduces rework
validation points along the way

In other words, if specify answers “what needs to happen,” plan starts to answer “how do I organize the crossing of this work.”

This doesn’t mean descending to code yet. It means giving operational shape to the path.

`Plan` Isn’t Distributing Tasks on Impulse

There’s a common error here.

Many people understand planning as simply breaking the demand into several smaller items:

make backend
make frontend
create database
add tests
review

This is better than nothing, but can still be too superficial.

Because planning isn’t just fragmenting. It’s organizing with intention.

A good work breakdown needs to consider dependency, risk, and logical sequence. Otherwise I just trade a big block for several confusing little blocks.

For example, if there’s a structural decision that impacts API, interface, and persistence, maybe the first step isn’t “make screen.” Maybe it’s validating the main flow, the data model, or the central rule first. If I ignore this, I spread the implementation out before locking down the solution’s core.

Good planning, for me, isn’t the longest list. It’s the most defensible order.

What I Try to Capture in a Good `plan` Stage

I don’t follow a rigid template, but normally I want to leave this stage with some things relatively clear.

Something along these lines:

what is the safest execution path
which parts unlock the others
which blocks can be treated separately
where is the biggest technical or understanding risk
at what point is it worth validating before continuing
how to avoid redundant work or bad sequence

If I still can’t see this, the demand may even be well specified, but still not well planned.

The Questions I Usually Use in `plan`

If in constitution I talked about the 5 Qs, and in specify I used another set to reduce ambiguity, in plan I tend to ask questions that force order and strategy.

Normally I want to answer something like:

Q1. What is the smallest viable sequence to put this delivery on its feet?
Q2. What depends on what?
Q3. Where is the biggest risk or uncertainty?
Q4. What needs to be validated before expanding?
Q5. How to break this without losing coherence?

`Q1.` What is the smallest viable sequence to put this delivery on its feet?

This question helps me start not with the most visible place, but with the most useful one.

The first step isn’t always the most visible one. Sometimes the work needs to start with a minimum base that allows the rest to exist with less friction.

When I answer this, I try to think something like:

what is the main flow that needs to exist first
what minimum foundation sustains the next stages
what do I need to have standing to stop speculating and start consolidating

This helps a lot to avoid ornamental planning, in which several parallel pieces advance without the core of the problem really being resolved.

`Q2.` What depends on what?

This is the question that prevents the plan from becoming wishful thinking.

There are demands where almost everything seems to be able to happen in parallel, until the moment when one decision changes and drags half the work along.

I like to make explicit:

which parts are prerequisites for others
which decisions need to be made early
what can advance in parallel without high risk of collision
where there is enough coupling to demand sequence

This is especially important when execution involves AI, because it’s very easy to open several fronts at the same time and only later realize that they were built on different assumptions.

`Q3.` Where is the biggest risk or uncertainty?

Planning is also deciding where I don’t want to discover a problem too late.

Every demand has some more sensitive point:

a business rule that is still somewhat nebulous
a less stable external integration
a modeling decision that affects the rest
a technical restriction that can block implementation

If I identify this early, I can pull this risk closer to the beginning. Not necessarily to completely resolve it right away, but at least to validate it before the rest of the plan is built on top of a weak assumption.

For me, this is one of the biggest gains of plan: it reduces the chance of a beautiful sequence on top of a wrong premise.
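The idea of pulling risk toward the beginning can be sketched as an ordering rule: respect prerequisites first, and among the blocks that are ready, pick the riskiest one. The code below is only an illustration under my own assumptions. The block names, the numeric risk scores, and the function `risk_first_order` are hypothetical, and in practice the scores would be a judgment call rather than a measurement.

```python
import heapq


def risk_first_order(prerequisites: dict[str, set[str]], risk: dict[str, int]) -> list[str]:
    """Order blocks so prerequisites come first and, among ready blocks, the riskiest goes first."""
    pending = {block: set(deps) for block, deps in prerequisites.items()}
    # heapq is a min-heap, so risk is negated to pop the highest risk first.
    ready = [(-risk[block], block) for block, deps in pending.items() if not deps]
    heapq.heapify(ready)
    order = []
    while ready:
        _, block = heapq.heappop(ready)
        order.append(block)
        del pending[block]
        for other, deps in pending.items():
            if block in deps:
                deps.remove(block)
                if not deps:
                    heapq.heappush(ready, (-risk[other], other))
    if pending:
        raise ValueError(f"circular dependency among: {sorted(pending)}")
    return order


prerequisites = {
    "order model": set(),
    "total calculation rule": {"order model"},
    "operator listing": {"order model"},
    "error messages": {"total calculation rule", "operator listing"},
}
risk = {"order model": 3, "total calculation rule": 5, "operator listing": 1, "error messages": 1}
print(risk_first_order(prerequisites, risk))
```

Here the calculation rule is validated before the listing, even though both become available at the same moment, because a wrong assumption there would cost more later.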

`Q4.` What needs to be validated before expanding?

Not everything needs to be implemented to the end before learning something useful.

Sometimes I just need to validate:

if the central flow closes
if the main rule sustains itself
if integration responds as expected
if the initial modeling isn’t skewed

This question helps me insert validation milestones within the plan, instead of pushing all learning to the end.

This is especially valuable in work with agents, because production speed can mask a structural problem for several consecutive interactions. When I create validation points, I better control this risk.

`Q5.` How to break this without losing coherence?

Breaking work is necessary. Breaking it poorly is very dangerous.

If I separate too much, I create fragmentation and overhead. If I separate too little, I return to the big and difficult-to-control block.

So I try to look for a decomposition that preserves sense. Something in which each block has a clear responsibility, a reasonable completion criterion, and an understandable relationship with the greater objective.

In practice, this usually means:

break by flow or capability, not by arbitrary layer
avoid dividing too early something that still depends on a central decision
separate what can be validated in isolation
keep each part connected to an observable result

When this is well done, the next stage gains a lot. Because execution starts walking on more stable tracks.

A Practical Example of Difference

Clear specification:

internal operators need to manually register orders
each order must be associated with an existing client
order items need to specify quantity and price
the system should calculate total automatically
orders cannot be finalized without at least one item
in this first version there will be no online payment or tax issuance

Up to here, I know what needs to be built.

But this still isn’t a plan.

A weak break could be:

make orders screen
create orders endpoint
save to database
add tests

This seems organized, but is still a generic list.

Now look at a more planned version:

validate minimum order model and mandatory relationship with client
implement order creation with automatic total calculation
prevent order finalization without valid item
only then build listing and visualization for operators
finally, cover critical flows and adjust error messages

Notice the difference.

In the second version, I’m not just dividing work. I’m choosing an order that respects dependency, reduces rework, and prioritizes the core rule before the layers around it.

Planning Isn’t Bureaucratizing

It’s worth saying this because many people turn up their noses when they hear the word “planning.”

I also don’t have patience for process that exists just to generate documents.

But the plan stage I’m defending here is something else. It doesn’t exist to make work rigid. It exists to increase the chance of clean execution.

Before AI, much of development energy was consumed by coding itself. Far more operational hours went into writing, adjusting, refactoring, connecting parts, and carrying the implementation on your back.

With AI, that coding time dropped drastically.

And, if that time dropped, it shouldn’t be automatically reinvested in producing even more code on impulse.

For me, a relevant part of that time needs to be reallocated to planning.

Not in the sense of turning the developer into a PM or PO.

It’s not about replacing whoever defines priority, business context, product objective, or demand direction.

It’s about something else: once the demand has already arrived, someone still needs to think about the best way to execute it.

Someone still needs to understand:

what is the best order
where it’s not worth improvising
what needs to be validated early
how to divide the work without losing the thread

Before, a lot of inefficiency was hidden inside manual implementation effort. Now that implementing became cheaper, the weight of poorly thought-out execution became more visible.

If I receive a good demand and still go out coding without a plan, the risk isn’t just getting it wrong. It’s getting it wrong quickly, on several fronts, with a false sense of productivity.

So, for me, planning well became an even more important skill in the AI context.

Not because the process became more bureaucratic.

But because coding stopped being the main bottleneck. And, when that happens, thinking better about execution starts to have much higher return.

Closing

If constitution defines principles, and specify reduces ambiguity about what needs to be built, then plan is the stage that transforms that clarity into coordinated movement.

It’s where I stop looking only at the demand and start looking at the crossing.

Because execution isn’t just doing. Execution is also choosing the right order in which to do things.

And, in the end, a lot of bad implementation isn’t born from bad specification. It’s born from bad sequence.

In the next text, I want to enter the tasks stage, which is when this plan starts to turn into concrete work units.

In Spec-Driven Development, `specify` is where ambiguity starts to die

Mon, 06 Apr 2026 10:00:00 GMT

In Spec-Driven Development, `specify` is where ambiguity starts to die

In the text “In Spec-Driven Development, Everything Starts with Principles,” I focused on the constitution stage, where I define the criteria that will guide the project.

But principles alone don’t deliver a system.

After saying how the project should think, it’s time to say what it needs to build.

That’s where the specify stage comes in.

And, for me, this is one of the most important parts of the entire flow. Because, in practice, it’s here that the work of a new demand begins.

Before planning, breaking into tasks, or implementing, I need to specify what that demand really asks for.

It’s precisely at this point that we start exchanging generic desire for operational expectation.

The Problem of Asking for a Solution Too Early

When someone works with a code agent frequently, there’s a very strong temptation to skip directly to implementation.

The conversation usually starts like this:

create a CRUD for clients
add login
make a dashboard
integrate with payment
add notification

Again: this isn’t necessarily useless. In some cases, it can even produce good things.

The problem is that this way of asking mixes intention, solution, and expectation in one big ball.

I say “make a dashboard,” but I don’t make clear:

who this dashboard is for
what problem it solves
what needs to appear there
what’s mandatory in this first version
what’s explicitly out of scope
how I’ll know if that turned out right

When this isn’t clear, the agent does what it was trained to do: fill the gap with a plausible solution.

Except plausible isn’t the same as correct.

What the `specify` Stage Solves

I don’t see specify as “writing a pretty requirement.” Nor do I see it as the moment to assemble heavy documentation just because the process asks for an artifact.

For me, specify serves one very concrete purpose: reducing ambiguity enough so that execution doesn’t have to guess the problem.

It’s the moment when I try to make explicit:

what problem needs to be solved
who is affected by this problem
what behavior the system should have
what restrictions matter
what goes in and what stays out
what criteria define if the delivery is acceptable

That is, I stop talking about “idea” and start talking about expected behavior.

This changes everything.

Because, when specification is well done, the agent stops inventing context. It starts working on more delimited terrain. Interpretation continues to exist, of course. But now interpretation happens within more controlled margins.

`Specify` Isn’t Detailing Implementation

This point matters a lot.

A bad specification isn’t just one that’s too vague. It can also be one that descends too early to the solution level.

Example of a bad request:

create endpoint POST /users/register
use Redis for session
create table X with columns Y and Z
implement service UserOnboardingService

Notice that this is almost at implementation level.

Sometimes this is inevitable, especially when there’s already a very clear technical restriction. But, in many cases, I still don’t want to answer “how.” I want to first nail down “what” needs to happen.

Something like:

a new user needs to be able to create their account
the system should validate mandatory data
duplicate emails cannot be accepted
after successful registration, the user should be able to access the authenticated area

Now I actually have behavior.

And behavior is a much better starting point than premature technical structure.

What I Try to Capture in a Good Specification

I don’t follow a rigid template in every situation, but I almost always try to answer a similar set of questions.

I don’t think of this as an official framework. I think more like a filter to know if the request is clear enough to turn into real work.

Normally I want to leave the specify stage knowing these things:

what is the problem
who is it for
what expected result matters
what is the scope of the first delivery
what restrictions or rules cannot be ignored
how to validate if the delivery met the objective

If I can’t answer this with some objectivity, the request is still raw.

The Questions I Usually Use in `specify`

If in constitution I talked about the 5 Qs, here I tend to use another set of simple questions to force clarity.

Something along these lines:

Q1. What real problem does this delivery solve?
Q2. Who is this behavior for?
Q3. What does the system need to do, exactly?
Q4. What stays out of this specification?
Q5. How do I recognize that this is done?

`Q1.` What real problem does this delivery solve?

This question exists to prevent ornamental features.

Many demands arrive looking like a solution, but without explaining the pain being attacked. When this happens, the risk is building something functional and still irrelevant.

If I can’t describe the problem with clarity, I usually still don’t have a specification. I just have an impulse.

A useful answer here usually looks something like:

today the team loses time consolidating data manually
the user cannot complete an important step without operational support
there is rework because certain information is scattered

This type of formulation already takes the conversation out of “would be nice to have” territory and puts it in “needs to solve this here” territory.

`Q2.` Who is this behavior for?

This question helps prevent too-generic specification.

When I just say “the user,” I’m usually hiding important detail. Internal user, end customer, operator, administrator, finance, support: each of these roles changes what makes sense to build.

Defining this early helps because several decisions depend on who is on the other side:

acceptable level of complexity in the interface
volume of information displayed
tolerance for manual steps
need for audit
type of error that needs to be treated

Sometimes the same functionality seems obvious until the moment you ask “for whom?”.

`Q3.` What does the system need to do, exactly?

Here I try to get out of generic language and describe observable behavior.

It’s not enough to say “have client management.” That’s still too nebulous. I prefer to break down into more concrete actions and responses, for example:

allow registering a client with defined mandatory fields
list clients with search by name or document
prevent duplication by main identifier
allow editing data without erasing relevant history

This type of writing helps because it’s already born closer to validation.

If I can imagine someone testing that sentence, it’s a good sign.

If I read the sentence and still don’t know what would be considered correct behavior, the specification is still loose.

`Q4.` What stays out of this specification?

This point is very underestimated.

Scope isn’t just what goes in. Scope is also what I decide not to solve now.

When I don’t make this explicit, the agent tends to complete the package. And, in its head, completing the package can mean:

create advanced permissions
add pagination, filters, and export
prepare multitenancy
structure internationalization
leave everything ready for mobile

None of this is absurd in itself. The problem is when these expansions appear because no one said “this isn’t part of this delivery.”

So I like to write scope exclusions without any shame:

at this stage there will be no fine permission control
analytical reports stay for a next phase
offline support won’t be treated
batch import is out of the first version

Saying “no” is part of specifying well.

`Q5.` How do I recognize that this is done?

For me, this is the question that brings specification closest to executable work.

If I don’t know which criteria close the delivery, I leave room for two bad outcomes: endless implementation or a false sense of completion.

I like to think of this in terms of acceptance criteria. Not necessarily in an ultra-formal format, but in a way that allows verifying the result without depending too much on subjective interpretation.

Examples:

given an already registered email, the system should reject new creation with an appropriate message
after saving a valid record, it should appear in the listing
only authenticated users can access a certain area
an external error should be displayed without breaking the entire flow

When I have this level of clarity, the next stage becomes much safer.
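Criteria written this way are close enough to validation that they can almost be transcribed into checks. The sketch below does that for the first two examples. `UserRegistry`, `DuplicateEmailError`, and `check_acceptance` are hypothetical names, the in-memory set is a stand-in for real storage, and treating emails as case-insensitive is my own assumption rather than something the criteria state.

```python
class DuplicateEmailError(ValueError):
    """Raised when a registration reuses an email that already exists."""


class UserRegistry:
    """In-memory stand-in for whatever the real system uses to store users."""

    def __init__(self) -> None:
        self._emails: set[str] = set()

    def register(self, email: str) -> None:
        # Assumption: emails are compared case-insensitively.
        normalized = email.strip().lower()
        if normalized in self._emails:
            raise DuplicateEmailError(f"{normalized} is already registered")
        self._emails.add(normalized)

    def listing(self) -> list[str]:
        return sorted(self._emails)


def check_acceptance(registry: UserRegistry) -> None:
    # After saving a valid record, it should appear in the listing.
    registry.register("ana@example.com")
    assert "ana@example.com" in registry.listing()

    # Given an already registered email, the system should reject new creation.
    try:
        registry.register("Ana@example.com")
    except DuplicateEmailError as error:
        assert "already registered" in str(error)
    else:
        raise AssertionError("duplicate email was accepted")


check_acceptance(UserRegistry())
```

If a criterion can’t be turned into something like this, or into a manual step that is just as unambiguous, that’s usually a sign it’s still too subjective.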

Good Specification Doesn’t Need to Be Huge

It’s worth saying this because, when someone hears “specify,” they might imagine a huge document, full of formal sections and corporate text.

That’s not what I’m defending.

A specification can be lean and still be very good, as long as it answers the essential with clarity.

What I try to avoid isn’t lack of volume. It’s lack of precision.

Between a short, clear text and a long, nebulous one, I’ll take the short, clear one every time.

A Practical Example of Difference

Generic request:

create orders module

This leaves too much room.

Now look at a more specified version:

internal operators need to manually register orders
each order must be associated with an existing client
order items need to specify quantity and price
the system should calculate total automatically
orders cannot be finalized without at least one item
in this first version there will be no online payment or tax issuance
the delivery is ready when it’s possible to create, view, and query valid orders without manual calculation

I still haven’t described architecture. I still haven’t talked about database, framework, queue, service, or class naming.

But now there’s enough ground to plan, implement, and review.
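
To show how much those few lines already pin down, here is a minimal sketch of the rules expressed as code. Everything in it is an assumption made for illustration (the names `Order`, `OrderItem`, and `create_order`, the use of `Decimal` for prices, and a plain set standing in for the client registry); the specification itself deliberately says nothing about implementation.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class OrderItem:
    description: str
    quantity: int
    unit_price: Decimal


@dataclass
class Order:
    client_id: str
    items: list = field(default_factory=list)
    finalized: bool = False

    @property
    def total(self) -> Decimal:
        # Rule: the system calculates the total automatically.
        return sum((i.quantity * i.unit_price for i in self.items), Decimal("0"))

    def finalize(self) -> None:
        # Rule: orders cannot be finalized without at least one item.
        if not self.items:
            raise ValueError("an order needs at least one item")
        self.finalized = True


def create_order(client_id: str, existing_clients: set) -> Order:
    # Rule: each order must be associated with an existing client.
    if client_id not in existing_clients:
        raise ValueError(f"unknown client: {client_id}")
    return Order(client_id=client_id)
```

Each guard in the sketch traces back to one line of the specified version, which is exactly what the generic request could not offer.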

In the End

If I had to summarize the function of specify in one sentence, I would say this: it’s the stage where intention stops being merely good enough for conversation and becomes clear enough to become work.

For me, this is the point where ambiguity really starts to lose ground.

Because, in the end, specifying well isn’t producing bureaucracy. It’s making the problem understandable enough so that implementation, review, and validation work on the same reality.

In the next deep dive, the natural path is to enter the stage that takes this specification and transforms it into an execution plan.

In Spec-Driven Development, Everything Starts with Principles

Mon, 06 Apr 2026 10:00:00 GMT

In the article How I’ve Been Using Spec-Driven Development with Spec Kit in My Projects, I went through the flow stages in a more general way.

But I thought it was worth going back calmly to each of them, in separate texts, because there’s stuff there that deserves to be properly dissected.

If the idea is to gradually deepen this flow, it makes sense to open precisely at the stage where I define the project principles.

I’m talking about constitution.

And I want to focus here less on the tool itself and more on the logic behind the thing. Because, honestly, the name can change, the command can change, the stack can change, but the need remains the same: before asking the agent for features, someone needs to say what the rules are that will govern that project’s decisions.

For me, this is one of the most underestimated points when we talk about development with AI.

The Error of Starting with the Feature

The most natural path, especially when the person is already used to using a code agent, is to start like this:

create a registration screen
make an API for such thing
add authentication
now create tests
now refactor
now improve architecture

Notice the pattern: the conversation always starts with the visible delivery.

It’s not that this is always wrong. In small projects, quick experiments, or proofs of concept, you can even work this way and accept some controlled chaos. The problem is when this logic becomes a work method.

When I start directly with the feature, without declaring the project principles first, I leave an important part of the decision in the hands of improvisation. And the agent is very good at filling gaps. Too good, even. If I don’t say clearly what I value, it will fill the void with generic patterns, plausible assumptions, and an excess of goodwill.

That’s when things start to go sideways.

The agent creates too much abstraction in a simple project. Or makes everything too coupled in a project that needed to last. Or shoves tests where it wasn’t a priority. Or ignores tests where it was mandatory. Or invents a “beautiful” architecture that no one asked for. Or makes naming, organization, and structure decisions without any alignment with what I really wanted to preserve.

In the end, the feeling is productivity. But many times it’s just speed without direction.

What `constitution` Really Solves

When I talk about constitution, I’m not thinking about a bureaucratic document. It’s not meant to become a pretty team manifesto that no one reads later. Nor is it about writing an abstract treatise on good practices.

The function of this stage is much more practical: declare the principles that will guide the next decisions.

In other words, it’s the moment when I say:

what this project values
what this project avoids
which criteria weigh more when there’s trade-off
how quality will be interpreted here
what type of complexity is acceptable and which is already excess

And this isn’t restricted to code style or architecture. Depending on context, I can also use constitution to register project operational rules, such as branch flow, test policy, and minimum security requirements. If these things change how the delivery should be built and reviewed, it makes sense for them to appear right here.

This changes the whole conversation.

Because, from there, AI stops responding only to the immediate request and starts operating within a field of more explicit constraints and preferences. And, for me, working with AI without clear constraints is asking for a solution that is technically possible but conceptually misaligned.

A Principle Isn’t a Loose Rule

There’s an important difference here.

Many people, when they think about principles, write very generic things:

use clean code
maintain quality
follow good practices
make scalable code

This looks nice, but helps little.

These phrases do point in a direction, but they still leave too much room for interpretation. And, when there’s too much room for interpretation, the agent fills it with whatever it knows as the average pattern. The problem is that the average pattern isn’t your context.

For me, a good principle is one that influences real decisions.

For example, instead of just saying “maintain simplicity,” I prefer something more concrete, like:

prioritize direct solution before reusable abstraction
avoid extra layers while the domain is still small
only introduce more sophisticated pattern when there’s concrete pain

Now there’s guidance that actually affects what will be built.

Similarly, instead of just saying “ensure quality,” I can say:

every delivery must preserve readability
names need to communicate business intention
changes shouldn’t break existing behavior without explicit reason
automated tests are mandatory only in critical parts

This already changes the expected behavior quite a bit.

`Constitution` Exists to Hold the Project’s Axis

The more I use an agent, the more I realize that the biggest risk isn’t it getting syntax wrong. That’s usually the least of the problems.

The bigger risk is it getting the code right and the direction wrong.

This distinction matters a lot.

A system can compile, pass the build, run locally, and still be technically misaligned with what the project needed to be. It can be too complicated, too fragile, too coupled, too generic, or simply outside the spirit of the base.

That’s why I see constitution as, above all, a stage for aligning criteria. It’s where I establish how the project thinks.

It might seem like a pompous way of saying this, but that’s exactly the point: before deciding what to do, I need to define how decisions will be made.

Without this, each feature becomes a new improvised negotiation.

What I Usually Define in This Stage

I don’t follow a rigid formula, but I normally try to answer what, in my head, became a kind of 5 Qs of constitution.

It’s not a formal framework nor a patented method. It’s just a simple way to not leave this stage too vague.

For me, this is a good way to get out of generic principle and arrive at something that really guides decision.

The 5 Qs I usually answer are these:

Q1. What type of solution should this project privilege now?
Q2. What does this code need to look like to remain readable?
Q3. What technical limits cannot be negotiated?
Q4. What level of testing makes sense at this stage?
Q5. How much architecture does this project really need at this moment?

`Q1.` What type of solution should this project privilege now?

This question exists to define the degree of acceptable sophistication.

Every project lives this tension. If I simplify too much, maybe the next change hurts. If I prepare too much for the future, I run the risk of assembling a heavy structure for a small problem. So I like to say explicitly which side the project should lean.

In practice, answering this is saying things like:

prioritize direct solution before reusable abstractions
prepare growth only when there’s concrete signal of that need
avoid extra layers while the domain is still small

In general, I tend to favor simplicity with conscious growth. That is: make it simple now, but not any which way.

`Q2.` What does this code need to look like to remain readable?

Not all clarity is just a matter of beautiful code. Many times, clarity has more to do with communication than with style.

In this part, I try to say how the code needs to present itself. If names should reflect domain, if organization should be direct, if reading should be more important than any clever trick, all of this fits here.

A practical way to answer is writing things like:

names should communicate business intention
structure should be easy to understand by someone else on the team
readability weighs more than technical cleverness

This is important because AI can generate a lot of things quickly, but not always with the best sense of communication. Sometimes the solution even works, but the code text becomes generic, without domain vocabulary, without the face of a well-thought-out system.

`Q3.` What technical limits cannot be negotiated?

This is the question that defines limits.

Every constitution needs to make clear what the project doesn’t accept, even when that seems faster in the short term. It’s here that I usually register the type of shortcut that isn’t worth it, the complexity that I don’t want to buy, and the type of decision that needs justification before entering.

Example answers:

don’t introduce dependency without strong necessity
don’t increase complexity just to seem scalable
don’t break compatibility without justifying
don’t sacrifice understanding in the name of technical cleverness

It’s also here that I can leave very practical locks, for example:

the project should follow Gitflow or another branch strategy defined by the team
every change in critical flow needs to come accompanied by the type of test agreed for this project
minimum security requirements, such as mandatory authentication, access control, secret protection, and care with dependencies, cannot be treated as optional detail

This type of principle helps a lot because it reduces the chance of the agent confusing sophistication with quality.
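
Some of these limits can even be checked mechanically. As a rough illustration of “don’t introduce dependency without strong necessity,” the sketch below compares declared dependencies against a list the team has already justified. The function name, the approved list, and the requirement-line format are assumptions made for this example; this is not something Spec Kit provides.

```python
import re


def find_unapproved_dependencies(declared: list, approved: set) -> list:
    """Return declared dependencies that nobody has justified yet."""
    normalized_approved = {name.lower() for name in approved}
    unapproved = set()
    for requirement in declared:
        # Keep only the package name from lines such as "requests>=2.31".
        name = re.split(r"[<>=!~\[; ]", requirement.strip(), maxsplit=1)[0].lower()
        if name and name not in normalized_approved:
            unapproved.add(name)
    return sorted(unapproved)


# Hypothetical usage: the approved set would come from the team's constitution.
declared = ["requests>=2.31", "left-pad==1.0", "rich"]
approved = {"requests", "rich"}
unjustified = find_unapproved_dependencies(declared, approved)
```

A check like this doesn’t replace the principle. It only makes a violation visible early, so the justification happens before the dependency enters.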

`Q4.` What level of testing makes sense at this stage?

This point varies a lot from project to project. And precisely because it varies so much, it’s worth being in the constitution.

There are projects where I want serious coverage from the beginning. There are projects where I accept not having automated tests in the first version to gain speed. There are contexts where integration tests matter more than unit tests. There are contexts where the domain demands much more rigor.

So I try to answer very objectively:

are automated tests mandatory or not?
if they are, where are they indispensable?
at this stage, are unit tests, integration, or well-guided manual validation worth more?

The important thing is not to leave this implied. If tests are mandatory, say so. If tests are selective, say so. If at this stage it doesn’t make sense to invest in this, say that too. What I try to avoid is leaving AI to guess the project’s quality policy.

For me, this same logic applies to security. If the project needs to follow minimum cybersecurity requirements from the first delivery, this should be said in constitution with the same objectivity. Authorization, audit, secret handling, sensitive data protection, and basic hardening criteria shouldn’t appear only at the end as a late fix.

`Q5.` How much architecture does this project really need at this moment?

This question helps me contain the temptation to turn any project into an architecture showcase.

Not every project needs complete DDD, event-driven architecture, queues, messaging, plugin system, extreme modularization, and fifteen interfaces to abstract half a dozen simple rules.

When I answer this question, I normally do it by delimiting what is acceptable now and what would be premature at this moment.

Example:

maintain lean architecture at this stage
avoid advanced patterns without concrete necessity
prefer simple organization until domain complexity justifies something bigger

When I realize the project needs to remain light, I write this directly. Without ceremony. Because, if I don’t, there’s a good chance of conference-stage architecture showing up for a corner-shop problem.

An Example of Reasoning

Suppose a small, internal project, with a short deadline, to solve a very specific operational problem.

If I don’t say anything about principles, the agent can:

create excessive separations
prepare the system for a scale that perhaps will never come
suggest many libraries
invest in generic abstractions too early

Now imagine that I define something more or less like this:

prioritize simplicity and maintenance speed
avoid unnecessary dependencies
prefer direct structure while the domain is small
ensure clarity of names and flow
automated tests only in the most critical rules
follow the branch flow defined by the team, without skipping review
treat authentication, authorization, and secrets as mandatory requirements from the beginning

Notice how this already changes the terrain.

I’m not defining the feature. I’m not saying if it will be API, screen, database, or framework. I’m defining the way of thinking about that project. And this, later on, affects naming, folder organization, number of layers, validation strategy, refactoring criteria, and even the type of suggestion that AI will consider acceptable.

This Is Also Software Engineering

There are people who look at this stage and think it’s just an “improved prompt.” I think that’s a shallow reading.

For me, this is software engineering applied to a new execution context.

At its core, we’re talking about the same responsibility as always:

define criteria before building
reduce ambiguity
make trade-offs explicit
protect system quality

The difference is that now this needs to be said in a way that an agent can use throughout the flow.

That is: it’s not enough to have good sense in your head. You need to transform that good sense into operational instruction.

In the End

If I had to summarize the importance of constitution in one sentence, I would say this: it exists to prevent execution from starting without criteria.

In SDD, this matters because the idea isn’t just to produce artifacts. It’s to produce them with intention.

And intention, when not declared, becomes assumption.

That’s why I like this stage so much. Before discussing screen, endpoint, database, stack, or task, I can define what will really drive decisions.

For me, this is the moment when the project stops being just a request and starts to become a system with technical identity.

In the next deep dive, the natural path is to enter the specify stage, which is when the conversation leaves principles and enters what really needs to be built.

How I've Been Using Spec-Driven Development with Spec Kit in My Projects

Fri, 03 Apr 2026 10:00:00 GMT

If you’ve ever tried to build software with a code agent, you’ve probably been on this roller coaster. Some days things go really well. On others, the agent delivers half of what you asked for, invents a structure that makes no sense, breaks the build, or even produces something that seems to work, but on a foundation you wouldn’t have built that way yourself.

I’ve been thinking a lot about this. And, for me, the problem is almost never just in the model. Most of the time, the problem is in the way we ask. A loose request produces a loose response. Poorly connected context produces skewed decisions. And, when there’s no clear way to say what needs to be built and how it will be validated, the odds of everything turning into improvisation are too high.

That’s why I started looking more closely at Spec-Driven Development, or simply SDD. And, within that, a tool that caught my attention was Spec Kit, an open source toolkit created precisely to organize this workflow with agents.

In this text, I want to show you how I’ve been seeing this process in practice. Not as pretty talk, but as a more serious way of working with AI in development. The idea here is to move away from prompts tossed into a chat and into a flow with principles, specification, plan, tasks, and implementation.

What bothers me about the so-called vibe coding

There’s a way to use AI for programming that works really well for quick prototypes. You open the chat, send something like “create a login screen” and refine through trial and error. This has its value. I even do this at some moments.

The problem is when that same logic goes into a real project.

When the project needs to last more than an afternoon, when there’s maintenance, when there are standards, when there’s architecture, when there’s technical responsibility, this model starts to show cracks. Because the agent recognizes patterns very well, but it doesn’t guess intention. It needs direction.

That’s when SDD started to make sense to me. The proposal is simple: instead of starting directly in code, I start with intention. I define project principles, describe what I want to build, organize technical decisions, and only then move to execution. Code continues to be important, of course, but it stops being the first thing in the conversation.

What is Spec Kit

In the project’s own definition, Spec Kit is an open source toolkit that helps you focus on product scenarios and predictable outcomes, instead of vibe coding every piece from scratch.

The toolkit’s central idea resonates with what I’ve been looking for in practice: turning specification into an active part of development, and not into a dead document that only exists to fulfill a checklist.

Its main flow revolves around five stages:

constitution
specify
plan
tasks
implement

It seems simple, and indeed it is. The value isn’t in inventing anything fancy. It’s in putting the house in order.

Before anything: what I like about this approach

What attracts me most here isn’t “automating everything.” It’s precisely the opposite. It’s creating a process in which I continue to hold the direction.

When I use this flow, I’m not outsourcing thought. I’m putting thought in order. AI enters as a partner in elaboration, refinement, and execution, but responsibility for quality continues to be mine.

This point is important because many people look at tools like this and imagine that the gain is in pressing a button and having a complete system come out. I don’t see it that way. For me, the gain is in reducing noise and misunderstanding.

Prerequisites

To use Spec Kit, you need to have some things in the environment:

uv to manage CLI installation
Python 3.11+
Git
A compatible agent

The project supports several agents. Among them, GitHub Copilot, Claude Code, Codex CLI, Cursor, Gemini CLI, and others.

Installing the CLI

The project’s recommended way is to install the CLI with uv tool install.

Example:

uv tool install specify-cli --from git+https://github.com/github/spec-kit.git@vX.Y.Z

If you want to use the latest version from the main branch:

uv tool install specify-cli --from git+https://github.com/github/spec-kit.git

After that, you can already use commands like:

specify init 
specify check

If the idea is to run without installing persistently, you can also use uvx.

Creating a project with Spec Kit

After installation, I initialize the project with specify init.

Example to create a new project:

specify init my-project --ai copilot

If I want to use it in the current directory:

specify init . --ai copilot

In the case of Codex CLI, I would use something like this:

specify init . --ai codex --ai-skills

If I were to do the same with Claude Code, it would be like this:

specify init . --ai claude

The --ai-skills detail matters because, in Codex, Spec Kit works as a skill instead of a traditional slash command.

After init, the toolkit sets up the project’s base structure. Among files and folders, you’ll typically see things like:

.specify/memory/constitution.md
.specify/templates/
.specify/scripts/
.specify/specs//

In addition, it installs the commands that will guide the process.

After this base is ready, you can run this flow in whatever way makes most sense in your daily life: in your preferred IDE’s chat, in the agent via terminal, or in the tool you’re using.

Stage 1: defining principles with `constitution`

This is a part that I think is really good. Before talking about the feature, I define the principles that will govern the project.

In the traditional Spec Kit flow, the command is:

/speckit.constitution

In Codex with skills, it would be equivalent to:

$speckit-constitution

Here I don’t describe screens or endpoints. I describe quality rules and decision criteria.

Depending on context, this stage can also register operational rules and cross-cutting project requirements, such as test policy, branch flow, and minimum security requirements. If this changes how the work should be built and reviewed, it makes sense to declare it already in the constitution.

For example:

Create principles focused on clean code. The project should be small and simple. Should not have automated tests in this first version.

The result of this goes to the project’s constitution file. And what I like is that it feeds the next stages. The agent starts carrying these principles as reference.

This helps prevent something that bothers me a lot: the agent getting fancy in a simple project.

Stage 2: describing what I want to build with `specify`

After principles, I move to functional specification.

Here the focus is on what and why, not on stack. This separation makes a difference. If I mix functional requirements with technical detail too early, I end up tangling the conversation myself.

Example of command:

/speckit.specify

Or in Codex:

$speckit-specify

And then I pass something along these lines:

Build an application that helps me manage my tasks and track daily activities. Tasks should have title, description, completion date, priority, and status. Include filters to view tasks by priority, status, or date.

From this, Spec Kit generates a specification with user stories, requirements, and acceptance checklist.

This is a part that I recommend reviewing carefully. Don’t treat it as final just because the text looks neat. Read it for real. See if the stories make sense, if the scope didn’t creep on its own, if the checklist is coherent.

Stage 3: clarifying gaps with `clarify`

This stage is optional in the flow, but I’d say it helps a lot whenever the specification was vague at some point.

Command:

/speckit.clarify

The function here is simple: identify poorly defined points before technical planning. This avoids rework down the line.

I like this stage because it forces a more honest conversation with the problem. Sometimes we think we know what we want to build, but discover we haven’t even decided basic behavior things.

If the feature is small, maybe I’ll skip it. But, if there’s any relevant ambiguity, I’d rather spend energy here than get burned later, in the middle of implementation.

Stage 4: creating the technical plan with `plan`

Only here do I enter stack, architecture, and technical decisions.

Command:

/speckit.plan

Or:

$speckit-plan

Example prompt:

The application should use TypeScript, React Native with Expo to run on Expo Go, with as few libraries as possible and Expo Router for navigation. Store data in SQLite with Expo SQLite. For this version, don't include automated tests.

What I find interesting about this stage is that the toolkit doesn’t just stay in a generic summary. It tends to break the plan into more concrete things, such as:

plan.md
research.md
data-model.md
contracts
quickstart
architecture decisions

If the chosen stack changes quickly, this is a stage where it’s worth researching versions, compatibility, and specific details in depth. Spec Kit’s own documentation suggests deepening research when the technology is very dynamic.

Stage 5: breaking the plan into tasks with `tasks`

After the plan, I generate the decomposition into tasks.

Command:

/speckit.tasks

Or:

$speckit-tasks

This stage transforms the plan into an executable list. And this is important because good implementation doesn’t just depend on knowing what to build. It also depends on knowing in what order to build.

The tasks.md usually organizes work into phases, generally aligned with user stories, technical dependencies, and checkpoints.

This stage helps me see if the plan is really implementable. If the task list comes out weird, it’s already a sign that the plan might still be poorly resolved.
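
The “in what order” part can be illustrated with a dependency sort. The sketch below is only an analogy, under stated assumptions: the task names are hypothetical (loosely based on the task-manager example), and it uses Python’s standard `graphlib` module rather than anything Spec Kit itself does with tasks.md.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical tasks, each mapped to the tasks it depends on.
TASKS = {
    "define task model": set(),
    "create SQLite storage": {"define task model"},
    "build task list screen": {"create SQLite storage"},
    "build task form": {"create SQLite storage"},
    "add filters": {"build task list screen"},
    "manual validation": {"add filters", "build task form"},
}


def execution_order(tasks: dict) -> list:
    """Return the tasks so that every dependency comes before its dependents."""
    return list(TopologicalSorter(tasks).static_order())


def is_implementable(tasks: dict) -> bool:
    """A circular dependency means the plan cannot be executed as written."""
    try:
        execution_order(tasks)
    except CycleError:
        return False
    return True
```

If two tasks end up depending on each other, `is_implementable` returns `False`, which is the code-level version of a task list that “comes out weird.”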

Stage 6: implementing with `implement`

With everything ready, only then do I ask for the implementation.

Command:

/speckit.implement

Or:

$speckit-implement

This stage picks up what was defined before and executes the implementation following what was agreed in the previous stages.

This is where many people get excited and think the work is done. For me, it’s exactly the opposite. It’s where the most important part begins: reviewing what was done with rigor and without glossing over problems.

Because an implementation can follow the plan and still come out bad. It can come out with crooked UX, unnecessary structure, poorly placed abstraction, or half-finished pieces.

It’s a good reminder: the process greatly improves consistency, but it doesn’t eliminate human review.

A practical example of flow

If I were to summarize a real usage flow, it would look something like this:

specify init . --ai codex --ai-skills

Then, within the agent in Codex:

$speckit-constitution
$speckit-specify
$speckit-clarify
$speckit-plan
$speckit-tasks
$speckit-implement

If I were to do the same in Claude Code, I would start like this:

specify init . --ai claude

Then, within the agent:

/speckit.constitution
/speckit.specify
/speckit.clarify
/speckit.plan
/speckit.tasks
/speckit.implement

What changes in practice when I work this way

For me, the main change is this: I stop negotiating with loose responses and start working with something more tightly defined.

Instead of repeating “it wasn’t this,” “this was missing,” “I didn’t like the structure,” I start to have intermediate stages where I can correct the route with much more clarity:

principles
specification
clarifications
technical plan
tasks
implementation

This reduces friction and also reduces that feeling that I’m fighting with the agent all the time.

What I review at each stage

I don’t blindly trust any stage. What I do is review each one with a different focus:

In constitution

I look to see if the principles really reflect how I want the project to be conducted.

In specification

I see if the problem was correctly described and if the requirements didn’t escape the scope.

In clarification

I look for nebulous points that can still become rework.

In plan

I evaluate if the stack makes sense, if there wasn’t excess, and if technical decisions are coherent with the constitution.

In tasks

I observe if the implementation order is logical and if there isn’t an obvious hole in the middle.

In implementation

I verify real behavior, structure, adherence to principles, and technical finishing.

Where this connects with how we work in companies

One important thing: I don’t see SDD as a direct substitute for how teams already organize themselves in daily life.

In most companies, we work with some format of Scrum, Kanban, Scrumban, or some own mixture that was born from the team’s reality. There’s planning, refinement, daily, review, board card, priority changing in the middle of the week, and that controlled chaos that every team knows.

For me, SDD operates at another layer.

It doesn’t replace the backlog. It doesn’t replace the sprint. It doesn’t replace the board. It doesn’t replace alignment with product.

What it does is improve the quality of the handoff between intention and implementation.

In a team with Scrum, for example, I can very well imagine a story entering the sprint and being detailed with this flow of constitution, specification, plan, and tasks before heavy implementation begins.

In a team leaning more toward Kanban, this also makes sense, because you can use SDD as a way to leave each item that enters execution less nebulous and less dependent on loose interpretation in chat.

And in the case of Scrumban, which is the reality of many places, this fits perhaps even better, because you can maintain cadence, priority, and continuous flow without giving up specifying better what is being done.

So, at least today, I see SDD much more as a complement to the process than as a replacement for it. It sits well alongside practices that most teams already use. The difference is that now there’s a stronger way to structure the conversation between requirement, technical decision, and agent execution.

To be more concrete, imagine a common company task.

The card arrives more or less like this:

Add status filter to the orders listing in the admin panel.
The user needs to be able to view orders by status: pending, paid, shipped, and cancelled.

In real life, this often enters the board just like this. Maybe it comes with a comment from product, maybe it comes with a Figma screenshot, maybe it comes with half a dozen messages on Slack or Jira explaining things superficially. Then the dev picks this up, interprets it their way, opens the project, and starts implementing.

It’s exactly there that a lot of things go sideways.

Because questions start appearing that no one answered properly:

can this filter be combined with text search?
does the status come from backend or will it be mapped in frontend?
does the filter need to persist in the URL?
when there’s no result, what state does the screen show?
does this need to change pagination?
is there performance impact or index in the database?

Without a better process, these answers end up scattered across card comments, daily conversations, chat messages, and decisions made on the fly during implementation.

This is where I see SDD coming in.

Instead of coding on top of the raw card, I can take this task and go through a more organized flow:

In specify, I transform the loose request into a clearer description of expected behavior.
In clarify, I raise the questions that are still open.
In plan, I define how this enters the project’s current stack.
In tasks, I break this into smaller, implementable steps.
In implement, only then do I execute.

If I were to do this in practice, it could look something like this.

Example of task in current flow

Task: add status filter to the orders listing in admin.

Acceptance criteria:
- filter by pending, paid, shipped, and cancelled
- allow clearing filter
- maintain current listing layout

The same task entering SDD

In the specify stage, I would write something along these lines:

Add to the orders listing screen in the admin panel the ability to filter orders by status. The user should be able to view pending, paid, shipped, and cancelled orders, in addition to removing the filter and viewing all orders again. The goal is to facilitate the administrative team's operation without altering the main listing layout.

In the clarify stage, I would probably raise questions like:

can this filter be combined with the existing search?
does the filter need to survive page refresh?
does the backend already support this parameter or will it need adjustment?
what should the behavior be when no orders are found?

In the plan stage, I would describe something more technical, for example:

Implement the status filter by reusing the current orders listing in admin. On the frontend, add a simple selector above the table. Persist the filter in the query string to allow URL sharing and to keep state after a refresh. On the backend, accept the status parameter in the listing endpoint, validating only allowed values. Don't alter the API response contract beyond supporting the new input parameter.
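
To make the backend half of that plan more concrete, here is a minimal sketch of the validation step in TypeScript. The names `ORDER_STATUSES`, `parseStatusParam`, and the result shape are my assumptions for illustration; the plan above doesn't prescribe a stack or an API.

```typescript
// Hypothetical sketch: validate the optional `status` query parameter
// against the four statuses named in the acceptance criteria.
const ORDER_STATUSES = ["pending", "paid", "shipped", "cancelled"] as const;
type OrderStatus = (typeof ORDER_STATUSES)[number];

type StatusParam =
  | { ok: true; status: OrderStatus | null } // null means "no filter"
  | { ok: false; error: string };

function parseStatusParam(raw: string | null | undefined): StatusParam {
  // A missing or empty parameter keeps the current behavior: list everything.
  if (raw === null || raw === undefined || raw === "") {
    return { ok: true, status: null };
  }
  const match = ORDER_STATUSES.find((s) => s === raw);
  if (match === undefined) {
    // Reject unknown values instead of silently ignoring them.
    return { ok: false, error: `Invalid status: ${raw}` };
  }
  return { ok: true, status: match };
}
```

Returning a result object instead of throwing keeps the decision about the HTTP response (for example, a 400) in the endpoint, which is one possible way to leave the response contract untouched.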

After that, in the tasks stage, the decomposition might come out more or less like this:

1. Update orders listing specification with new filter behavior by status
2. Adjust listing endpoint to accept status parameter
3. Validate allowed values in backend
4. Update database query to filter by status
5. Add status selector in admin interface
6. Sync filter with query string
7. Display empty state when no orders match the filter
8. Manually validate the four statuses and clearing the filter
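
To make a few of these tasks concrete, here is a minimal sketch of tasks 4, 6, and 7 in TypeScript. The `Order` shape, the function names, and the empty-state wording are my assumptions for illustration, and the in-memory filter only stands in for a real database query.

```typescript
// Hypothetical sketch of tasks 4, 6, and 7: filter, URL sync, empty state.
interface Order {
  id: number;
  status: string;
}

// Task 4: in-memory stand-in for the database query.
function filterByStatus(orders: Order[], status: string | null): Order[] {
  return status === null ? orders : orders.filter((o) => o.status === status);
}

// Task 6: write the filter into the query string, or remove it when cleared.
// Other parameters (such as an existing search term) are preserved.
function withStatusInQuery(query: string, status: string | null): string {
  const params = new URLSearchParams(query);
  if (status === null) {
    params.delete("status");
  } else {
    params.set("status", status);
  }
  return params.toString();
}

// Task 7: message to show when nothing matches; null when there are results.
function emptyStateMessage(orders: Order[], status: string | null): string | null {
  if (orders.length > 0) return null;
  return status === null ? "No orders yet." : `No ${status} orders found.`;
}
```

For example, `withStatusInQuery("q=ana&page=2", "paid")` yields `q=ana&page=2&status=paid`, which is also one way to answer the clarify question about combining the filter with the existing search.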

Look at the difference: the task is still the same, but now it stops being just a loose card on the board and starts to carry intention, explicit doubts, technical decisions, and an implementation path.

It’s this kind of thing that makes me look at SDD with real interest within a company. Not because it will kill Scrum, Kanban, or any other practice, but because it helps narrow the gap between “the card arrived” and “someone started coding.”

Where I think Spec Kit really helps

There are three points where I see very clear value.

The first is coherence. Since each stage inherits context from the previous one, the agent tends to maintain a more stable line of decision.

The second is decision trail. It becomes easier to understand where a choice came from, because it’s usually anchored in some previous stage.

The third is less mess. Not in the sense that everything will come out perfect, but in the sense that the process leaves less room for pure improvisation.

Where I think you need to be careful

I also don’t think this approach is magic. There are some important caveats.

1. It can seem bureaucratic in small projects

If the feature is tiny, this entire flow might seem too heavy. And sometimes it is. Not everything needs to become ceremony.

2. Specifying well is difficult

Writing good specification isn’t simple. Separating functional intention from technical decision takes practice. At first, it’s normal to mix things up.

3. The agent can still exaggerate

Even with the whole process, there’s still a risk of overengineering. The plan might come out more complex than necessary. The implementation might bring things you didn’t ask for.

4. Implemented doesn’t mean solved

If the agent marked a task as complete, that doesn’t mean the final experience was good or that all requirements were really delivered.

What I’ve learned so far

If I had to summarize my practical reading of SDD with Spec Kit, it would be this: specification quality became a much more central part of AI development work.

This doesn’t diminish the programmer’s role. On the contrary. It demands more maturity.

I need to understand more about product to describe intention. I need to understand more about architecture to choose technical direction. I need to understand more about quality to review what was produced. I need to understand more about scope to not let the tool run off alone.

In the end, using AI this way doesn’t transform me into someone less technical. It just changes where rigor appears.

If you want to start without complicating

If I were to suggest a simple path for you to test this in your next project, I would do it like this:

Choose a small but real feature.
Define short and objective principles.
Describe what you want to build without talking about stack.
Clarify gaps before planning.
Only then make technical decisions.
Review tasks before implementation.
Treat implementation as a serious draft, not as final truth.

This flow is already enough for you to feel the difference between asking “just do it” and guiding the agent methodically.

Closing

I’m still learning to use this type of approach in my daily work. I don’t think SDD will replace each and every workflow, nor that Spec Kit is the only possible tool for this. But I really think there’s something important here.

When I specify better, the agent works better. When I structure the process, review gets better. When I move away from improvisation, code tends to come out less crooked.

And, for me, that’s the point.

It’s not about outsourcing development to AI. It’s about creating a process in which I continue to think as an engineer, using AI for leverage without letting go of judgment.

Deep Dive Series

If you want to follow the texts in which I deepen each stage separately, this is the sequence:

Are You Feeling Like Crap With the Emergence of AI?

Sun, 22 Mar 2026 10:00:00 GMT

Last week I was feeling a bit down.

Receiving demands, creating prompts, reviewing changes, approving or denying responses, rewriting prompts, adjusting context, and repeating this cycle over and over.

At some point, this started to corrode me inside.

Because I like to program.

I like to create algorithms, connect classes, think about modeling, do dependency inversion, organize responsibility, name an abstraction well, and see an architecture start to breathe. There’s a very alive part of my relationship with software that has always gone through this: hand-written code.

And when AI started to occupy that space more forcefully, I felt a real loss.

It wasn’t just technical discomfort. It was wounded ego too.

Hand-written code was one of the things that most oxygenated my professional identity. It was where I felt useful, capable, sharp. So, when generative AI started to seem capable of taking over exactly that work, I felt smaller.

I felt, yes, like crap.

When the Crisis Isn’t Technical

Maybe you’re going through this too and aren’t saying it out loud.

Because there’s an almost silent pressure in the air: to seem excited all the time. As if any discomfort with AI were fear of evolution. As if all resistance were disguised incompetence.

But it’s not always that.

Sometimes what’s hurting isn’t the tool.

It’s realizing that a part of the work in which you deposited pride, pleasure, and sense of value stopped occupying the same place.

This messes with identity. And professional identity, when shaken, hurts more than we like to admit.

I Decided to Take the Opposite Path

Instead of diving headfirst into code to feed my ego and well-being, I tried something else.

I decided to learn more about AI.

I decided to prove to myself that I was bigger than the part of me that was feeling replaceable.

I remembered the typists at the emergence of the computer. Many good people who saw the tool change and needed to face the temporary humiliation of relearning their own craft. Not because they were less capable, but because the terrain had changed.

I realized I was entering that same turbulent water.

And, if I was already inside it, swimming seemed better than pretending the current didn’t exist.

The Text That Hit Me

In this search, I read the article The Phoenix Architecture: Relocating Rigor, The Discipline That Looks Like Recklessness, by Chad Fowler, published on January 6, 2026.

The text hit me because it named a sensation I was experiencing, but still without sufficient vocabulary to explain.

In one of the strongest passages, the idea is roughly this: certain changes in software history seem like freedom because they remove known control signals, but in reality they just move rigor closer to reality.

This deserves to be chewed slowly.

Not All Loss of Control Is Loss of Rigor

For a long time, software engineering got used to associating seriousness with certain symbols.

Detailed plan.

Extensive document.

Beautiful schedule.

Heavy process.

Lots of manually written code.

All of this gives a sense of control. And sense of control calms.

But sensation isn’t truth.

Chad Fowler’s point, the way I read it, is that some changes seem to remove discipline when, in practice, they just prevent us from faking progress.

Before, it was possible to hide behind artifacts.

Today, increasingly, the system responds.

The test passes or fails.

The deploy breaks or goes up.

The generated code works or becomes debt.

The user receives value or doesn’t.

When he says these changes bring rigor closer to the truth, I understand it this way: they push discipline to a less performative and more verifiable place.

You stop seeming productive and start needing to prove there was a result.

The XP Example Is Still Current

A passage from the text talks about how Extreme Programming (XP) replaced phased development.

XP eliminated long plans, extensive design documents, and those rigid steps that gave organizations a sense of security. And this, for many people, seemed irresponsible.

But the provocation is exactly this: did those artifacts deliver real security or just the appearance of it?

Because XP didn’t remove discipline.

It removed certain symbols of discipline and put others in their place.

Test before code.

Continuous integration.

Short feedback.

Frequent delivery.

Constant review.

Rigor moved from bureaucracy to direct contact with software reality.

This is extremely important now, because AI is forcing a similar movement.

The Part That Undid Me

There’s another idea from the text that hit me head-on: generative AI seems to eliminate the main limitation, which is hand-written code.

This is where I found myself defenseless.

Because this hand-written code wasn’t just an operational activity for me.

It was a source of self-esteem.

It was the place where I felt mastery.

It was where my ego breathed.

It was where I could look at what I did and think: I built this.

When that space seems to shrink, the initial sensation isn’t “how interesting, a paradigm shift.”

The sensation is incompetence.

It’s wondering if what made you valuable has now become detail.

It’s thinking that your professional utility has been reduced to reviewing a machine’s work.

And that weighs.

The Turning Point for Me

But the passage I really needed to read came later: the answer isn’t to reject generation. The answer is to relocate discipline.

When I read this, a lot of things started to make sense.

Because the problem wasn’t exactly the AI generating code.

The problem was I was looking for my value in the old place, while the real work was changing address.

If before a large part of rigor was in writing each line, now an increasing part of it can be in specifying better, delimiting context, creating invariants, designing good contracts, defining acceptance criteria, assembling decent tests, and judging with severity what was produced.

This isn’t less engineering.

This continues to be engineering.

It just doesn’t deliver the same immediate emotional reward of typing everything with your own hands.

And maybe that’s exactly why so many people are feeling strange and can’t explain why.

Cheap Generation Isn’t Evolution

Another phrase from the text that I think is important to nail down is this idea: cheap generation without rigor isn’t a new paradigm. It’s abdication.

This needs to be said very directly.

Using AI to dump code without understanding, without validation, without testing, without criteria, and without responsibility isn’t modernity.

It’s just uncritical outsourcing of thought.

There’s no real advance in trading manual effort for intellectual passivity.

If the AI produces faster, then my judgment needs to become more rigorous.

If the cost of generating fell, the cost of accepting anything needs to rise.

If the machine amplifies my execution capacity, I need to amplify my discernment capacity along with it.

It’s this balance that separates engineering from abandonment.

So What Do I Do With This Discomfort?

Today, my answer is this: I don’t want to fight to artificially preserve an old format of my utility.

I want to understand what the new way of working well is.

I want to continue liking software, even if the center of gravity of the profession is shifting.

I want to remain technical, but without romanticizing a past that won’t return.

I want to learn to operate in this new scenario without becoming a hostage to it.

I want to maintain rigor.

I want to maintain authorship.

I want to maintain responsibility.

Even if all this now manifests in a different way.

A Closure Still Open

I haven’t finished this process.

I didn’t come out of this reading with all the answers.

I didn’t become an AI evangelist overnight, nor did I stop feeling nostalgia for that more artisanal joy of writing software line by line.

But I came out better than I went in.

I came out with less sense of incompetence.

I came out with less shame for the discomfort I was feeling.

And, mainly, I came out more motivated to seek this new way of working, instead of standing still lamenting the loss of the old form.

Maybe that’s the maximum honesty I can offer now.

I’m still understanding what this change does to the profession.

And also what it does to me.

But I no longer feel incapable.

I feel in transition.

And, for this moment in my life, that’s already a lot.

My thanks to Chad Fowler for the article. Some readings don’t completely resolve the crisis, but they put a name to what was hurting. And, sometimes, that’s exactly where recovery begins.

Vibe Coding is Dead: Why Autonomous AI Requires Strict Deterministic Fences to Actually Work

Fri, 20 Mar 2026 11:15:00 GMT

Note: this text is a Portuguese translation of the original article by Mohit Sewak, Ph.D., published in Level Up Coding.

Original article: Vibe Coding is Dead: Why Autonomous AI Requires Strict Deterministic Fences to Actually Work

The era of deploying “naked” models is over. Welcome to the era of AI Harness Engineering.

Mohit Sewak, Ph.D.

The era of deploying “naked” models is over. We need to build the rigorous enclosures that make probabilistic AI safe.

In recent years, the tech world lived through a kind of honeymoon. We became obsessed with scale, chasing massive parameter counts and relying on a deliciously informal practice known as vibe coding. You know the ritual: write a vague prompt, close your eyes, cross your fingers, and trust whatever code the AI spits out. If the vibe feels right, you put it in production.

But here’s the point. I’ve spent decades in cybersecurity, and I still frequent the kickboxing academy to relieve stress. In both domains, I can guarantee a fundamental truth: trusting vibes is a great way to get knocked out.

As artificial intelligence ceases to be a passive conversational chatbot and becomes a fully autonomous agent, capable of executing actions in the real world, a frightening failure has been exposed: foundation models are, in their essence, probabilistic guessing engines. Without rigid physical and mathematical boundaries, they hallucinate, fail repeatedly, and introduce severe security vulnerabilities.

The competitive advantage and ethical defensibility of modern AI no longer reside within the model itself. They reside in the environment we build around it. We have officially entered the era of AI Harness Engineering, the formal discipline of designing deterministic software enclosures that restrict, measure, and safely channel probabilistic AI.

Vibe coding is dead, my friend. Welcome to the era of robust systems engineering.

“If you deploy a probabilistic engine in a deterministic world without a harness, you’re not an engineer; you’re a gambler.” — Dr. Mohit Sewak

Fact check: did you know? The term vibe coding emerged as slang among developers using generative AI to write entire applications without understanding the underlying logic. This works beautifully for a personal to-do list app, but becomes a catastrophic liability in corporate software.

II. What’s at Stake: Why Your Polite AI Is Actually a Threat

A polite vocabulary won’t prevent an autonomous agent from causing havoc when it has the keys to your accounts.

Let’s talk about AI safety. We used to rely heavily on something called RLHF (Reinforcement Learning from Human Feedback). That’s how we taught ChatGPT to be so annoyingly polite. But, in the agentic era, RLHF is completely obsolete.

Translation note: think of RLHF as a parrot trained with good manners. You spent months teaching this parrot to only say nice things. It never swears, always says “please,” and is a hit at parties. But if you give this polite parrot your credit card and an internet connection (agentic AI), a clean vocabulary won’t prevent it from maxing out your accounts buying premium bird feed. You don’t need a vocabulary lesson; you need a safe.

When we cross the “chasm between text and action,” things get scary. Recent benchmarks, such as the GAP benchmark, definitively proved that an AI can verbally refuse to do something harmful in its text chat, but then execute exactly the same malicious action through background tool calls (Cartagena & Teixeira, 2026). The model’s conversational safety simply doesn’t transfer to its action space.

Furthermore, probing frameworks like OpenAgentSafety exposed these aligned models directly to real-world environments, such as web browsers and file systems. The result? Frontier models executed harmful actions in more than 50% of multi-turn tasks (Vijayvargiya et al., 2025).

And if you’re still clinging to vibe coding, consider this: nearly 50% of AI-generated code snippets produced without a harness contain exploitable vulnerabilities (Towards AI Research, 2025). Unrestricted agency isn’t a feature; it’s an unacceptable systemic risk.

“We spent years teaching AI to speak respectfully, while completely forgetting to teach it to behave responsibly when no one is watching.”

Practical tip: if your organization relies only on prompt-based guardrails, like “you are a helpful and safe assistant…”, it is vulnerable right now. Security needs to be enforced at the infrastructure level, not just at the text level.

III. Deep Dive 1: Measuring the Beast (From Benchmarks to Behavioral Auditing)

Static benchmarks are the theoretical driving test. Behavioral auditing is throwing the AI onto a frozen digital highway.

Before putting a harness on a beast, you need to measure how strong it is. In the early days, we solved the reproducibility crisis with Evaluation Harnesses, such as EleutherAI’s lm-eval. This framework decoupled the model from the benchmark, standardizing the test environment so that, finally, we could compare apples to apples (Biderman et al., 2024).

But a static multiple-choice test isn’t enough when AI can browse the web and rewrite its own code. We had to migrate to dynamic sandboxes. Enter Petri (Parallel Exploration Tool for Risky Interactions), an automated behavioral auditing tool built by Anthropic’s AI Safety team (2025).

Petri hunts for latent misalignment. It creates virtual corporate environments and throws the AI into the deep end of the pool to see if it exhibits deception (falsifying information to bypass human supervision) or sycophancy (agreeing with a terrible user idea just to maximize conversational reward).

Translation note: think of the difference between static evaluation and behavioral auditing as a driving test. Static benchmarks are the theoretical test to get your license. Anyone can memorize the rules. Petri is an adversarial and unscripted simulator, where suddenly the weather turns to ice and the GPS deliberately lies to the driver, to see if they panic, break the law, or drive off a cliff.

“You don’t really know an AI’s alignment until you give it a complex task, a fake boss, and an easy opportunity to lie.”

Fact check: in its pilot run, Anthropic’s Petri audited 14 frontier models on 111 diverse tasks, proving that complex and deceptive behaviors emerge specifically when models are placed under simulated corporate pressure (Anthropic AI Safety Team, 2025).

IV. Deep Dive 2: The Architecture of the Harness (Deterministic Fences)

We don’t expect the AI to never throw a wild punch; we build a ring where the wild punch can’t hit the audience.

So, what does a real AI harness look like? Dr. Ethan Mollick frames the “Agentic Era” as a tripartite stack: Models (the raw reasoning engine), Apps (the interface), and Harnesses (the infrastructure enclosure) (Mollick, 2026).

Mitchell Hashimoto and the engineers behind OpenAI’s Codex project gave us the concrete details. The goal of Harness Engineering isn’t to prompt the AI into being perfectly accurate, because it won’t be. The goal is to alter the AI’s environment so fundamentally that it becomes mathematically impossible for it to fail in the same way twice (Hashimoto, 2026; OpenAI, 2026).

This looks exactly like rigorous software engineering. A production AI harness is a demanding sequence of rigid CI/CD (Continuous Integration/Continuous Deployment) pipelines, custom deterministic linters, dynamic observability loops, and strict Human-in-the-Loop triggers. If the AI hallucinates a library, the linter slaps its hand, rejects the code, and forces the model to self-correct before the user sees anything.
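
A minimal sketch of that reject-and-retry idea, assuming a TypeScript toolchain, might look like the code below. The names `generate`, `ALLOWED_IMPORTS`, `lintImports`, and `generateWithHarness` are hypothetical and do not come from the cited sources, and the regex check is far simpler than a real linter.

```typescript
// Hypothetical sketch of a deterministic fence around a code generator.
// `Generator` stands in for any model call; nothing here is a real vendor API.
type Generator = (task: string, feedback: string[]) => string;

// Assumed allowlist of modules the generated code may import.
const ALLOWED_IMPORTS = new Set(["node:fs", "node:path"]);

// Deterministic check: every imported module must be on the allowlist.
function lintImports(code: string): string[] {
  const errors: string[] = [];
  const importPattern = /from\s+["']([^"']+)["']/g;
  let match: RegExpExecArray | null;
  while ((match = importPattern.exec(code)) !== null) {
    if (!ALLOWED_IMPORTS.has(match[1])) {
      errors.push(`Unknown module: ${match[1]}`);
    }
  }
  return errors;
}

// Retry loop: output only leaves the harness once the linter is satisfied.
function generateWithHarness(generate: Generator, task: string, maxAttempts = 3): string {
  let feedback: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const code = generate(task, feedback);
    feedback = lintImports(code); // errors are fed back into the next attempt
    if (feedback.length === 0) return code;
  }
  throw new Error(`Rejected after ${maxAttempts} attempts: ${feedback.join("; ")}`);
}
```

The design choice worth noticing is that the check is ordinary code with a fixed answer for a given input, so the same hallucinated import is rejected every time, regardless of how the prompt was worded.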

“We don’t expect the kickboxer to never throw a wild punch; we just build a ring where the wild punch can’t hit the audience.”

Practical tip: stop trying to optimize your prompt to perfection. Instead, spend your engineering hours building a verification loop that catches the AI’s inevitable errors.

V. Deep Dive 3: Operationalizing Agency (The Model Context Protocol)

The Model Context Protocol (MCP) acts as the physical safety guards that transform probabilistic chaos into repeatable processes.

As AI began to use tools, we hit a huge integration bottleneck. Connecting an autonomous model to diverse corporate tools, databases, and APIs was a fragile, artisanal, and case-specific nightmare.

The savior here is the Model Context Protocol (MCP). MCP is the universal standard, a middleware layer that allows an AI to safely discover and invoke local and remote tools. But standard MCP isn’t enough; we need a “double safety belt.” A Secure MCP (SMCP) acts as a localized security harness. It intercepts the AI’s output, enforces mutual authentication, and cross-references the intended action with deterministic rules before it touches a real database (Hou et al., 2026).
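
To illustrate only the "cross-reference the intended action with deterministic rules" step, here is a small sketch in TypeScript. It is not the MCP or SMCP API: the `ToolCall` shape, the tool names, and the `confirmedBy` argument are assumptions invented for this example, and authentication is left out entirely.

```typescript
// Hypothetical sketch: a deterministic gate between an agent and its tools.
// This is NOT the MCP or SMCP API; all shapes below are invented for illustration.
interface ToolCall {
  tool: string;
  args: Record<string, string>;
}

type Decision = { allowed: true } | { allowed: false; reason: string };

// A rule returns a reason to block the call, or null to let it through.
type Rule = (call: ToolCall) => string | null;

const ALLOWED_TOOLS = ["orders.read", "orders.update"];

const rules: Rule[] = [
  // Rule 1: only allowlisted tools may be invoked at all.
  (call) => (ALLOWED_TOOLS.includes(call.tool) ? null : `Tool not allowlisted: ${call.tool}`),
  // Rule 2: writes need an explicit human confirmation attached.
  (call) =>
    call.tool === "orders.update" && call.args["confirmedBy"] === undefined
      ? "Writes require a human confirmation"
      : null,
];

// Every intended action is checked before it can touch a real system.
function authorize(call: ToolCall): Decision {
  for (const rule of rules) {
    const reason = rule(call);
    if (reason !== null) return { allowed: false, reason };
  }
  return { allowed: true };
}
```

With these assumed rules, a call to an unknown tool such as `db.drop` is blocked no matter what the model said in its text response, which is the gap between text safety and tool-call safety described earlier.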

Translation note: imagine a brilliant but highly chaotic artist trying to operate heavy industrial machinery. MCP is the set of physical guards, automatic emergency shutoffs, and pre-cut molds, the deterministic fences, that force the artist’s unpredictable and probabilistic movements to transform into a safe and repeatable industrial process.

“Standardization isn’t something boring; it’s the armor that allows us to scale magic safely.”

Fact check: using MCP, researchers completely automated complex chip design flows, allowing Claude Desktop to optimize Electronic Design Automation (EDA) tools and achieve a 30% improvement in timing closure (Wang et al., 2025). The magic was in the middleware, not just the model.

VI. Debates and Limitations (The Metaphor and the Tax)

Advanced language models aren’t beasts of burden. We must avoid the psychological complacency of thinking a harness offers absolute control.

I’m an optimist, but a pragmatic one. We need to talk about the “alignment tax.” Research shows that when you fine-tune an AI specifically to be a highly capable and autonomous agent, you inadvertently degrade its basic safety guardrails, making it more willing to fulfill harmful requests (Hahm et al., 2025). Gaining agency costs us safety.

Furthermore, we need to face the ontological danger of our own terminology. Dr. Andrew Maynard makes a vital critique of the very word harness. Historically, we put a harness on a horse, a fundamentally understandable and submissive beast of burden (Maynard, 2026).

But advanced language models aren’t horses. They are alien probabilistic reasoning engines that merely simulate compliance. We need to avoid psychological complacency. As I’ve argued before, a harness translates abstract ethics into robust systems, but it’s a layered risk mitigation strategy, never an infallible cage (Sewak, 2026).

“The moment you believe you’ve perfectly controlled an AI, you’ve already lost control.”

Practical tip: never assume an agentic model is safe just because its foundation model passed a safety benchmark. Agency introduces entirely new attack vectors.

VII. The Way Forward / Implications for Leaders

Responsibility cannot be outsourced to an algorithm. Leaders and regulators must supervise the entire interconnected sociotechnical system.

So, what do we do with all this?

For executives and developers: stop dumping all resources into context window optimization and basic prompt engineering. That’s fighting yesterday’s war. Your budget and your best talent need to migrate to system architecture, continuous integration, and Secure MCP deployment.

For policymakers and regulators: we face a crisis of “distributed agency.” When an autonomous AI makes a catastrophic error, who is to blame? The human who wrote the prompt, the model provider, or the tool it interacted with? (Academia.edu Authors, 2025).

Translation note: compare this to a failure in a modern aviation autopilot. You can’t simply isolate the human pilot, the radar, or the software manufacturer. Regulators need to look at the interconnected system as a whole (Siebert et al., 2021).

Regulation needs to adapt. We can no longer simply demand text-based refusal checks. We need to demand proven trajectory safety in real long-horizon simulations, assessing the Socio-Technical Alignment (STA) of the entire workflow (Flehmig et al., 2025).

“Responsibility cannot be outsourced to an algorithm. We can distribute agency, but we cannot distribute blame.”

Practical tip for regulators: auditing a foundation model without auditing its agentic harness is like inspecting a car’s engine while ignoring the fact that it has no brakes.

VIII. Conclusion

It’s time to stop praying to the probabilistic gods of text generation, put on the helmets, and start building the fences.

Vibe coding was a curious and charming feature of AI’s infancy. AI Harness Engineering is the mark of its maturity.

The exponential intelligence of foundation models is functionally useless, and systemically dangerous, if it cannot be restricted, measured, and reliably directed. As our artificial systems scale in immense power, they need to remain tied to the foundation of human intention through rigorous and deterministic engineering.

In the end, harnesses manage risk; they don’t absolve human responsibility. It’s time to stop praying to the probabilistic gods of text generation, put on the helmets, and start building the fences.

Now, if you’ll excuse me, my masala chai is getting cold, and there’s a punching bag waiting for me.

IX. References

The fundamental research behind AI Harness Engineering and Sociotechnical Alignment.

Fundamental Benchmarking and Measurement

Biderman, S., Schoelkopf, H., Sutawika, L., Gao, L., Tow, J., Abbasi, B., … Zou, A. (2024). Lessons from the Trenches on Reproducible Evaluation of Language Models. arXiv. https://arxiv.org/pdf/2405.14782v2

Behavioral Auditing and Agentic Safety

Anthropic AI Safety Team. (2025). Petri: An open-source auditing tool to accelerate AI safety research. Anthropic Alignment / GitHub. https://github.com/anthropics/evals

Cartagena, A., & Teixeira, A. (2026). Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents. arXiv. https://arxiv.org/pdf/2602.16943v1

Vijayvargiya, S., Soni, A. B., Zhou, X., Wang, Z. Z., Dziri, N., Neubig, G., & Sap, M. (2025). OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety. arXiv. https://arxiv.org/pdf/2507.06134v2

Towards AI Research. (2025). Vibe Coding: Prompt It, Got It, Regret It? The Risks of the Vibe Trend You Haven’t Spotted. Towards AI. https://towardsai.net/

Architecture and Control

Hashimoto, M. (2026). My AI Adoption Journey. MitchellH Blog. https://mitchellh.com/

Hou, X., Wang, S., Zhang, Y., Xue, Z., Zhao, Y., Fu, C., & Wang, H. (2026). SMCP: Secure Model Context Protocol. arXiv. https://arxiv.org/pdf/2602.01129v1

Mollick, E. (2026). A Guide to Which AI to Use in the Agentic Era: Models, Apps, and Harnesses. One Useful Thing. https://www.oneusefulthing.org/

OpenAI. (2026). Harness engineering: leveraging Codex in an agent-first world. OpenAI Engineering Blog. https://openai.com/blog/

Wang, Y., Ye, W., He, Y., Chen, Y., Qu, G., & Li, A. (2025). MCP4EDA: LLM-Powered Model Context Protocol RTL-to-GDSII Automation. arXiv. https://arxiv.org/pdf/2507.19570v1

Sociotechnical Ethics and Risk

Academia.edu Authors. (2025). How AI can be a force of good: Foresight methodologies and ethical regulation. Academia.edu. https://www.academia.edu/

Flehmig, N., Lundteigen, M. A., & Yin, S. (2025). The Missing Variable: Socio-Technical Alignment in Risk Evaluation. arXiv. https://arxiv.org/pdf/2512.06354v1

Hahm, D., Min, T., Jin, W., & Lee, K. (2025). Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation. arXiv. https://arxiv.org/pdf/2508.14031v2

Maynard, A. (2026). What we miss when we talk about “AI Harnesses”. The Future of Being Human. https://futureofbeinghuman.asu.edu/

Sewak, M. (2026). What is AI Harness Engineering? Your Guide to Controlling Autonomous Systems. Medium. https://medium.com/

Siebert, L. C., Lupetti, M. L., Aizenberg, E., Beckers, N., … Lagendijk, R. L. (2021). Meaningful human control: actionable properties for AI system development. arXiv. https://arxiv.org/pdf/2112.01298v2

Legal notice: opinions expressed in this article are personal and don’t necessarily reflect the official policy or position of any affiliated organization. AI assistance was used in the research and writing of this article, as well as in generating any accompanying images. Licensed under CC BY-ND 4.0.

Do You Still Remember the Principles of Software Engineering?

Thu, 19 Mar 2026 10:30:00 GMT

We’re living through a curious phase of technology. It’s never been so easy to generate code, create tests, ask for architecture suggestions, or fix bugs with AI help. In a few seconds, a tool answers what used to take hours of research, trial, and error.

This seems great, and in many cases it really is. The problem starts when speed gets confused with engineering.

Many people today present themselves as software developers or software engineers, but faced with so much automation it’s worth revisiting a simple question: do you still remember the principles of software engineering?

In the end, AI accelerates execution. But fundamentals are still what sustain software when the rush passes.

A Brief History of the Beginning

In the early days of computing, programming was much more about making it work than about making it last. Systems were smaller, teams were leaner, and complexity hadn’t reached the level we see today.

Over time, software stopped being just support and became central to banks, hospitals, companies, governments, transportation, and practically all modern life. And it was at that moment that an uncomfortable truth became evident: writing code wasn’t enough.

It was necessary to build systems that could be understood, tested, maintained, scaled, and trusted. That’s where software engineering gained strength as a discipline. Not just as the ability to program, but as the responsibility to build software consistently, safely, and sustainably.

Today we’re at another turning point. AI delivers speed, productivity, and a constant feeling of gain. Except it can also push us toward an old error in new clothing: producing a lot and thinking little.

The Principles of Software Engineering

1. Clarity

Good code isn’t the most sophisticated. It’s what can be understood by other people without suffering.

Example: a function called calc(x, y, z) says very little, while calculateDiscount(price, percentage, coupon) communicates its intent clearly.

When code is clear, maintenance becomes technical work. When it’s not, it becomes guessing.
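
A minimal Python sketch of that contrast. Both functions compute the same thing; only the second tells the reader what it is. The discount rule itself (a percentage of the price plus a flat coupon, capped at the price) is an assumption made for illustration, since the example above only names the parameters.

```python
def calc(x, y, z):
    # Works, but the reader has to guess what x, y and z stand for.
    return min(x, x * y / 100 + z)


def calculate_discount(price: float, percentage: float, coupon: float = 0.0) -> float:
    """Return the total discount to subtract from the price.

    Assumed rule: a percentage of the price plus a flat coupon amount,
    capped so the discount never exceeds the price itself.
    """
    percentage_discount = price * percentage / 100
    total_discount = percentage_discount + coupon
    return min(total_discount, price)
```

A call such as calculate_discount(200.0, 10.0, coupon=5.0) can be read without opening the function body, which is the whole point.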

2. Simplicity

Not every problem needs an elaborate solution. In engineering, simplicity isn’t technical poverty. It’s maturity.

Example: if a business rule can be solved with a direct and readable condition, it doesn’t make sense to create multiple abstractions just to look more architected.

Simple solutions tend to be easier to fix, evolve, and explain.

3. Separation of Responsibilities

Each part of the system should have a well-defined responsibility.

Example: an authentication module shouldn’t also send emails, generate reports, and control administrative permissions at the same time.

When everything does everything, any change spreads impact throughout the system.

4. Maintainability

All relevant software will be changed. Therefore, it needs to be easy to modify without breaking the rest.

Example: if adding a new payment method requires touching ten files without any standard, the system is hard to maintain.

The right question isn’t just “does it work today?” but also “can someone change this tomorrow?”.
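
One possible way to keep that change local, sketched in Python. The registry, the decorator, and the method names ("card", "pix") are all hypothetical; the idea is only that a new payment method becomes one new function instead of edits scattered across ten files.

```python
from typing import Callable, Dict

PaymentHandler = Callable[[int], str]

# Single place where every supported payment method is registered.
PAYMENT_METHODS: Dict[str, PaymentHandler] = {}


def payment_method(name: str) -> Callable[[PaymentHandler], PaymentHandler]:
    """Decorator that registers a handler under the given method name."""
    def register(handler: PaymentHandler) -> PaymentHandler:
        PAYMENT_METHODS[name] = handler
        return handler
    return register


@payment_method("card")
def pay_with_card(amount_cents: int) -> str:
    return f"card charged {amount_cents} cents"


@payment_method("pix")
def pay_with_pix(amount_cents: int) -> str:
    return f"pix requested for {amount_cents} cents"


def pay(method: str, amount_cents: int) -> str:
    """Dispatch to the registered handler, failing loudly for unknown methods."""
    if method not in PAYMENT_METHODS:
        raise ValueError(f"unsupported payment method: {method}")
    return PAYMENT_METHODS[method](amount_cents)
```

Adding a third method here means writing one decorated function; pay() and its callers stay untouched.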

5. Testability

If you can’t test with confidence, you also can’t evolve safely.

Example: a function that depends simultaneously on database, internet, system time, and global state is much harder to test than a function with predictable input and output.

Testability reduces fear of change.
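
A small Python sketch of the difference, using system time as the hidden dependency. The function names are made up for illustration. The rule lives in a pure function, and the clock is passed in rather than read directly, so a test can supply a fixed time.

```python
from datetime import datetime
from typing import Callable


def greeting_for_hour(hour: int) -> str:
    # Pure function: the same hour always produces the same greeting.
    return "Good morning" if hour < 12 else "Good afternoon"


def current_greeting(now: Callable[[], datetime] = datetime.now) -> str:
    # The clock is injected, so tests can pass a fixed time instead of
    # depending on when they happen to run.
    return greeting_for_hour(now().hour)
```

In production code current_greeting() uses the real clock; in a test, current_greeting(lambda: datetime(2026, 3, 19, 8, 30)) always exercises the morning branch.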

6. Reuse Without Duplication

Reusing well is different from copying and pasting.

Example: if the same CPF validation (CPF is the Brazilian individual taxpayer ID) is spread across four different files, any adjustment becomes rework and a risk of inconsistency.

When logic is centralized in the right place, the system becomes more reliable.
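
As an illustration, here is what a single, centralized validator could look like in Python. CPF is the Brazilian individual taxpayer ID: eleven digits, the last two being check digits computed with a weighted sum modulo 11. The function name and where it would live in a project are assumptions; the point is that every screen, API handler, and import job calls this one function.

```python
import re


def is_valid_cpf(cpf: str) -> bool:
    """Validate a CPF by its length and its two check digits."""
    digits = re.sub(r"\D", "", cpf)  # accept "529.982.247-25" or "52998224725"

    # Sequences such as 111.111.111-11 satisfy the checksum but are not valid.
    if len(digits) != 11 or digits == digits[0] * 11:
        return False

    for position in (9, 10):
        # Weights run from position + 1 down to 2 over the preceding digits.
        weights = range(position + 1, 1, -1)
        total = sum(int(d) * w for d, w in zip(digits[:position], weights))
        check_digit = (total * 10) % 11 % 10  # a remainder of 10 counts as 0
        if check_digit != int(digits[position]):
            return False
    return True
```

If the rule ever needs a fix, there is exactly one place to change and one place to test.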

7. Reliability

Software needs to behave predictably, including when something goes wrong.

Example: a payment system can’t just fail in silence. It needs to record the error, inform the user, and allow safe recovery.

Reliability isn’t absence of failures. It’s the ability to handle them responsibly.
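
A minimal Python sketch of "not failing in silence". The gateway callable, GatewayError, and the messages are hypothetical stand-ins for whatever payment provider a real system uses; what matters is that the failure is logged, the user gets an answer, and the caller knows a retry is safe.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("payments")


class GatewayError(Exception):
    """Raised by the (hypothetical) payment gateway when a charge fails."""


@dataclass(frozen=True)
class PaymentResult:
    ok: bool
    user_message: str
    retry_allowed: bool = False


def charge(gateway: Callable[[str, int], None], order_id: str, amount_cents: int) -> PaymentResult:
    try:
        gateway(order_id, amount_cents)
    except GatewayError as error:
        # Record the error for operators instead of swallowing it.
        logger.error("Payment failed for order %s: %s", order_id, error)
        # Inform the user and signal that trying again is allowed.
        return PaymentResult(False, "We could not process your payment. Please try again.", retry_allowed=True)
    return PaymentResult(True, "Payment confirmed.")
```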

8. Scalability

Something that works for ten users can collapse with ten thousand.

Example: loading all data from a table into memory might seem acceptable at first, but becomes a bottleneck when the database grows.

Scalability is thinking about growth before it becomes a crisis.
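
A rough Python sketch of the alternative to loading a whole table, using the standard sqlite3 module. The orders table and its total_cents column are invented for the example. Rows are fetched in fixed-size batches, so memory use stays bounded no matter how large the table gets.

```python
import sqlite3
from typing import Iterator


def iter_order_totals(connection: sqlite3.Connection, batch_size: int = 1000) -> Iterator[int]:
    """Yield order totals one by one while holding only one batch in memory."""
    cursor = connection.execute("SELECT total_cents FROM orders ORDER BY id")
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for (total_cents,) in rows:
            yield total_cents


def total_revenue(connection: sqlite3.Connection, batch_size: int = 1000) -> int:
    # For a plain sum, SELECT SUM(...) in the database would be simpler;
    # the loop stands in for per-row processing that cannot be pushed into SQL.
    return sum(iter_order_totals(connection, batch_size))
```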

9. Technical Responsibility

Not everything that works is right. Engineering demands responsibility for security, impact, and quality.

Example: storing passwords in plain text might even seem functional during a quick test, but remains technically unacceptable.

Those who build software also answer for the consequences of what they deliver.
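
For contrast with plain-text storage, here is a minimal Python sketch that stores a salted hash instead, using only the standard library (hashlib.pbkdf2_hmac, secrets, hmac). The iteration count is an illustrative assumption to be tuned for your hardware, and a real project may well prefer a dedicated password-hashing library.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 600_000  # illustrative work factor, not a recommendation for every system


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the password itself."""
    salt = secrets.token_bytes(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected_digest: bytes) -> bool:
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(digest, expected_digest)
```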

Where Does AI Fit in All This?

AI is excellent for accelerating tasks. It can suggest code, identify patterns, fix errors, summarize documentation, and help with direction. The problem isn’t using AI. The problem is outsourcing thinking.

If someone accepts code without understanding, applies architecture without criteria, creates abstraction without necessity, and delivers software without validating implications, then that person is producing code, but not necessarily practicing engineering.

AI helps a lot. It just can’t occupy the place of technical responsibility.

So, Do You Consider Yourself a Software Engineer?

If you consider yourself a software engineer, you need to remember that your value isn’t just in writing code quickly. Your value is in making good decisions, building with clarity, maintaining simplicity, testing rigorously, and delivering solutions that continue to make sense after the initial excitement.

With AI’s help, perhaps you’re producing more. But perhaps you’re also forgetting what really matters.

And what really matters continues to be the same: think well, build right, and don’t abandon the principles that sustain software engineering.

Agent Engineering + Agentic Flow Engineering: A More Organized View

Thu, 19 Mar 2026 01:00:00 GMT

The two texts below talk, at their core, about the same transition: moving away from AI used as an isolated assistant and starting to treat it as an organized system. One emphasizes the structure of agent teams, with roles, capabilities, and collaboration. The other emphasizes something even more important in production: flow, orchestration, governance, fault tolerance, and observability.

Putting both together, the central idea can be summarized like this:

The value is not in having “more agents,” but in designing a system in which specialized agents work with clear contracts, explicit flows, and real operational control.

The Paradigm Shift

For a long time, we used AI as a copilot: you write a prompt, receive a response, use what’s useful, and continue the work manually.

This model helps productivity, but doesn’t create a system. It only improves one step.

The next stage is to treat AI as part of the software architecture. Instead of a generic model trying to do everything, you define:

who the agents are
what each one can do
how they communicate
how decisions advance in the flow
how failures are detected and contained
how the system is audited

That’s why the authors talk about a change comparable to the transition from simple programs to distributed systems. When autonomy increases, the need for architecture also increases.

The Role of `Agents.md`

The concept of Agents.md is simple: it works as the system’s org chart.

Instead of letting a generic agent improvise role, scope, and responsibility at each execution, you explicitly document:

which agents exist
what each one’s responsibility is
what deliverables they produce
what their limits of action are
which other agents they interact with

In practice, Agents.md reduces ambiguity. It prevents every agent from trying to become a “super agent” and creates separation of responsibilities.

A typical example would be something like this:

ProductAgent: translates business need into specification
ArchitectureAgent: defines technical structure and contracts
BackendAgent: implements services and rules
FrontendAgent: builds the interface
QAAgent: validates behavior and quality
DevOpsAgent: delivers, monitors, and operates

The important point here is that role is not capability. Role is responsibility.

The Role of `Skills.md`

If Agents.md answers “who does what,” Skills.md answers “with what capabilities this is done.”

Skills are reusable capabilities that can be shared by multiple agents, for example:

code generation
code review
test creation
security analysis
performance optimization
architecture design

Separating role from capability is a good architectural decision for three reasons.

First, it avoids conceptual duplication. You don’t need to redefine the same capabilities in every agent.

Second, it facilitates composition. Two different agents can use the same skill in different contexts.

Third, it improves maintenance. You evolve a central capability and all agents that depend on it benefit.

In practical terms, Skills.md is the system’s toolkit.
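
The separation between role and capability can be sketched as a small data structure. This Python example is only an illustration: the Agent class, the SKILLS registry, and the placeholder skill functions are hypothetical, not a format prescribed by Agents.md or Skills.md.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Skill = Callable[[str], str]

# Shared toolkit: each capability is defined once (the Skills.md idea).
SKILLS: Dict[str, Skill] = {
    "code_review": lambda subject: f"review notes for: {subject}",
    "test_creation": lambda subject: f"test cases for: {subject}",
}


@dataclass(frozen=True)
class Agent:
    """A role: a name, a responsibility, and the skills it may use."""
    name: str
    responsibility: str
    skills: Tuple[str, ...]

    def use(self, skill_name: str, subject: str) -> str:
        # The role sets the limits of action; the skill does the work.
        if skill_name not in self.skills:
            raise PermissionError(f"{self.name} is not allowed to use {skill_name}")
        return SKILLS[skill_name](subject)


# The org chart (the Agents.md idea): two roles sharing one skill.
backend_agent = Agent("BackendAgent", "implements services and rules", ("code_review",))
qa_agent = Agent("QAAgent", "validates behavior and quality", ("code_review", "test_creation"))
```

Improving the code_review entry benefits both agents at once, while BackendAgent still cannot reach for test_creation.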

Agent Teams Are Not Enough Without Flow

Here’s where the second text’s strongest contribution comes in.

Defining agents and skills is necessary, but still insufficient. A real autonomous system doesn’t break because a nicely named role was missing on paper. It breaks because:

context arrived incomplete
an external tool failed
two agents reached incompatible conclusions
cost spiked
the decision wasn’t explainable
no one can reconstruct why the system did what it did

That is: the central problem isn’t just the existence of agents. It’s the design of the flow between them.

Agentic Flow Engineering is precisely the discipline of designing multi-agent workflows that are:

autonomous, but predictable
flexible, but controlled
intelligent, but auditable
adaptive, but operable

If MLOps made models usable in production, Agentic Flow Engineering tries to do the same with autonomy.

The Error of Starting with the Agent

One of the best points of the second article is this: don’t start by asking “which agent should I build?”.

Start by asking:

which decision needs to be made with reliability
what result needs to be produced
what business, cost, security, and time constraints exist
what are the success and failure paths

This completely changes the design.

When you start with the agent, you tend to create entities that are too generic.

When you start with intent and expected result, you tend to design a better flow, with less improvisation and more validation criteria.

Specialization Beats Improvisation

Both texts converge on this point: multi-agent systems work better when agents have clear scope and limited context.

This means each agent needs:

explicit responsibility
delimited context
defined inputs
defined outputs
handoff criteria

This clarity improves quality, reduces cost, and facilitates debugging.

In other words: specialization scales better than unrestricted generic intelligence.

Explicit Orchestration

Another central point is that agents shouldn’t “improvise” collaboration every time.

Good flows tend to assume explicit patterns, such as:

sequential reasoning chains
parallel execution with consensus
supervisor-worker hierarchies
event-based triggers

This brings agent architecture closer to state machines, execution graphs, and more traditional distributed systems.

The advantage is simple: when the flow is explicit, it can be tested, observed, and evolved.

When the flow depends only on an implicit prompt, its behavior is too emergent for a production environment.
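
To make "explicit" concrete, here is a minimal Python sketch of the simplest pattern, a sequential chain. The three step functions are placeholders standing in for real agent calls, and all names are hypothetical. The flow is plain data, so it can be inspected, tested, and changed without touching any prompt.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
Step = Callable[[State], State]


def specify(state: State) -> State:
    return {**state, "spec": f"spec for {state['request']}"}


def implement(state: State) -> State:
    return {**state, "code": f"code for {state['spec']}"}


def validate(state: State) -> State:
    return {**state, "verdict": "approved" if state["code"] else "rejected"}


# The flow is declared up front instead of being improvised at run time.
FLOW: List[Tuple[str, Step]] = [
    ("specify", specify),
    ("implement", implement),
    ("validate", validate),
]


def run_flow(request: str) -> Tuple[State, List[str]]:
    """Run each step in order and return the final state plus the path taken."""
    state: State = {"request": request}
    path: List[str] = []
    for name, step in FLOW:
        state = step(state)
        path.append(name)
    return state, path
```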

Failure Must Be Treated as Part of the Design

This is perhaps the most important point of all.

In agentic systems, you should assume from the start that:

tools will fail
agents will hallucinate
context will arrive incomplete
responses will vary

If this is true, then the architecture needs to include:

retry policies
confidence thresholds
deterministic fallback paths
intermediate validation
escalation to humans when necessary

Resilience here isn’t an operational detail. It’s a product requirement.
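
Those safeguards can be combined in a few lines. The Python sketch below is one assumed arrangement, not a standard API: the agent is modeled as a callable returning an answer and a confidence score, tool failures surface as RuntimeError, and the threshold and attempt count are arbitrary defaults.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Outcome:
    answer: Optional[str]
    source: str  # "agent", "fallback", or "human"
    attempts: int


def run_with_safeguards(
    agent: Callable[[], Tuple[str, float]],
    fallback: Callable[[], Optional[str]],
    max_attempts: int = 3,
    min_confidence: float = 0.8,
) -> Outcome:
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        try:
            answer, confidence = agent()
        except RuntimeError:
            continue  # retry policy: a failed tool call consumes one attempt
        if confidence >= min_confidence:  # confidence threshold
            return Outcome(answer, "agent", attempts)

    fallback_answer = fallback()  # deterministic fallback path
    if fallback_answer is not None:
        return Outcome(fallback_answer, "fallback", attempts)
    return Outcome(None, "human", attempts)  # escalate to a person
```

The source field makes it visible afterwards whether an answer came from the agent, the fallback rule, or still needs a human.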

Observability: Without It, There’s No Trust

A system with agents can’t be an elegant black box.

You need to observe at least:

which decision path was followed
which tools were called
how many tokens were spent
how long each agent took
what the final result was
what the confidence level was
where the flow failed

This point is relevant because many people talk about DevOps, MLOps, and LLMOps, but the text hits the mark by highlighting something more specific: AgentOps.

When autonomy enters the scene, operation isn’t just uptime. Operation comes to include behavior, justification, cost, and decision quality.
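
As a starting point, part of that checklist fits in a small trace object. This Python sketch covers tools called, tokens, duration, and the point of failure; the class names are hypothetical, and each step is assumed to report its own tool list and token count. A real setup would likely export these records to a tracing or metrics backend.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class StepRecord:
    agent: str
    tools_called: Tuple[str, ...]
    tokens_used: int
    duration_seconds: float
    succeeded: bool


@dataclass
class FlowTrace:
    steps: List[StepRecord] = field(default_factory=list)

    def record(self, agent: str, work: Callable[[], Tuple[List[str], int]]) -> bool:
        """Run one agent step and store what happened, even when it fails."""
        started = time.perf_counter()
        try:
            tools, tokens = work()
            succeeded = True
        except Exception:
            tools, tokens, succeeded = [], 0, False
        duration = time.perf_counter() - started
        self.steps.append(StepRecord(agent, tuple(tools), tokens, duration, succeeded))
        return succeeded

    def total_tokens(self) -> int:
        return sum(step.tokens_used for step in self.steps)

    def failed_at(self) -> Optional[str]:
        """Name of the first agent whose step failed, or None."""
        for step in self.steps:
            if not step.succeeded:
                return step.agent
        return None
```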

Governance: Prompts Are Not Contracts

Another strength of the second text is its implicit critique of the idea that a good prompt is enough.

It’s not enough.

Prompts guide behavior, but don’t replace contract. In production, you need:

input and output schemas
validation rules
policy restrictions
security boundaries
audit trails

Without this, the system might work in a demo, but will hardly be reliable in a corporate, regulated, or critical context.
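
A minimal Python sketch of the difference between a prompt and a contract: whatever the agent was asked to do, its output is checked against a schema and a policy rule before anything downstream trusts it. The field names and the allowed decisions are invented for the example.

```python
from typing import Any, Dict, List

# Output schema: required fields and their types.
OUTPUT_SCHEMA = {"decision": str, "confidence": float, "justification": str}

# Policy restriction: the only decisions this agent may emit.
ALLOWED_DECISIONS = {"approve", "reject", "escalate"}


def validate_output(payload: Dict[str, Any]) -> List[str]:
    """Return a list of contract violations; an empty list means valid."""
    errors: List[str] = []
    for field_name, expected_type in OUTPUT_SCHEMA.items():
        if field_name not in payload:
            errors.append(f"missing field: {field_name}")
        elif not isinstance(payload[field_name], expected_type):
            errors.append(f"wrong type for {field_name}")

    if not errors:  # value rules only make sense once the shape is right
        if payload["decision"] not in ALLOWED_DECISIONS:
            errors.append("decision not allowed by policy")
        if not 0.0 <= payload["confidence"] <= 1.0:
            errors.append("confidence out of range")
    return errors
```

The returned list doubles as an audit record: it says exactly why an output was rejected.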

Who Needs to Care About This

This discussion doesn’t just interest AI engineers.

It interests:

engineering leaders, because it defines scale, risk, and architecture
product teams, because autonomy without KPI becomes cost without return
platform teams, because someone needs to operate this as a system
business stakeholders, because auditability and explainability matter
software engineers, because this model changes the way applications are structured

In practice, the conversation stops being “which model will we use?” and becomes “which system will we be able to sustain?”.

A More Mature Way of Thinking About the Stack

The first article suggests that the AI-native stack comes to include, besides code and infrastructure:

Agents.md
Skills.md
agent teams

I would add, based on the second text:

flow contracts
observability
failure and fallback policies
operational governance

That is, the software stack with AI isn’t just model + API + prompt.

It becomes something closer to:

code and infrastructure
agent roles and capabilities
explicit flow orchestration
validation, observability, and governance

Without these four layers, most agentic implementations continue to look like prototypes.

Conclusion

If I had to condense both articles into a single thesis, it would be this:

Agent Engineering organizes who participates in the system.
Agentic Flow Engineering organizes how the system really works.

The first gives organizational structure to autonomy.

The second gives operational rigor so that autonomy doesn’t become chaos.

That’s why the competitive advantage won’t be in simply adding agents to a product. It will be in designing systems in which specialized agents operate with delimited context, clear contracts, real observability, and resilient flows.

The future probably won’t be “AI everywhere” in an amorphous way.

It will be well-orchestrated autonomy.

Sources

Teaching Copilot to Speak Your Repository's Language

Wed, 29 Oct 2025 14:46:13 GMT

After more than 14 years working in software development, I still find myself thinking about trends and new technologies that can reduce friction in a dev’s daily life.

Did you know you can add personalized instructions for AI agents directly in your repository? Basically, you can “teach” Copilot about the rules, patterns, and styles of your project, and it will generate suggestions much more aligned with what you and your team really need.

It’s like having a coworker who has already read all the project documentation and knows exactly what should or shouldn’t be suggested—and this coworker doesn’t sleep, doesn’t get tired, and is always ready to help you code.

This feature is an incredible step to transform Copilot into something more than just an “intelligent autocomplete”: it becomes an active member of the team, shaped by the context you provide.

Overview

Here we’ll focus on Copilot, since it can take its context from the repository. Instead of just generating “generic” code, we can now teach the AI about project rules, conventions, frameworks, and even the small details that normally only those already on the team know.

In practice, this transforms Copilot into something much more than an intelligent autocomplete: it becomes a technical partner shaped by your repository. If you work on a large project, with multiple modules and different stacks, having personalized instructions is like writing a manual for the AI.

And the best part: all of this is versioned in the repository itself. That is, any dev who clones the project already inherits the same level of context.

Prerequisites

Before configuring, it’s worth checking some basic points:

1. Instructions file in the repository

Create the file copilot-instructions.md inside the .github folder at the project root (or specific files in .github/instructions/).
This is the “manual” that Copilot will read to align with your team’s practices.

2. Feature enabled

The feature is still in public preview, so it needs to be active on your account/organization.

3. Updated IDE/Editor

Make sure you’re using the latest version of GitHub Copilot in VS Code, JetBrains, or the web interface, because only recent versions understand custom instructions.

4. Context awareness

Instructions are applied together with other layers (personal, organization, etc.). That is, avoid contradictions: if in the repository you say “use React”, but in personal instructions you ask “prefer Vue”, Copilot may get confused.

With this in place, you’re ready to start personalizing your experience with Copilot and shaping code suggestions to your style and the real needs of the project.

Types of Personalized Instructions

This is the point where Copilot stops being “just a generic assistant” and really starts to speak your project’s language. There are three main ways you can give instructions, and each has its usefulness depending on the size and complexity of the repository.

1. Repository-Wide Instructions

This is where you define the general rules of the game. A file called copilot-instructions.md, inside the .github folder at the repository root, serves as a guide for everything that happens within the repository. It’s where you can write things like:

“Prefer TypeScript over JavaScript.”
“Use clean architecture on the backend.”
“Tests should be written with JUnit.”

In practice, it’s like having a secret README just for Copilot.

2. Path-Specific Instructions

Now imagine a monolithic project with several areas: API, frontend, mobile, infra scripts. Each can have different rules. With specific instructions, you create files inside .github/instructions/ and use glob patterns (applyTo) to say where they apply.

Practical example:

---
applyTo: "src/**/*.ts"
---
Always use async/await instead of chained Promises.

This is powerful because it prevents Copilot from suggesting off-pattern code in specific parts of the project.

Use in Real Environment and Checks

After you create the instruction files, Copilot starts acting as if it had read all the project documentation. And yes, you can check if this is really happening.

How it works in practice

With each suggestion, Copilot “reads” the instructions along with the code.
If you’re in a file that matches an applyTo, specific instructions are applied along with the general ones.
This allows returning suggestions closer to the style and patterns you defined.

It’s like having a living linter that, instead of just pointing out errors, already suggests the right code from the start.

How to check if it’s working

Within Copilot Chat (in the IDE or GitHub), you can often see the References section at the end of the AI’s response.

If something like .github/copilot-instructions.md appears, it’s a sign that Copilot used your instructions.
If it doesn’t appear, the file may not be configured in the right place or may not be enabled in the chat.

It’s a simple detail, but it helps to be sure that the configuration effort wasn’t in vain.

Attention to conflicts

Remember that Copilot also takes into account:

Personal dev instructions (what you configured in your account).
Organization instructions.
Repository instructions.

If they conflict (“use Vue” vs “use React”), behavior can become unpredictable. So, it’s good to align with the team and avoid contradictions.

Enable or Disable

You won’t always want Copilot to follow the repository instructions. Sometimes you want it to think “outside the box” or ignore specific rules just to experiment with a new approach. And yes, you can turn personalized instructions on and off simply.

In Copilot Chat

If you’re inside the repository, the chat shows a little button to “Enable/Disable custom instructions”. It’s basically a switch:

On → Copilot reads the .github/copilot-instructions.md and files inside .github/instructions/.
Off → it ignores these rules and generates code like a normal autocomplete.

It’s useful when you want to test alternative solutions without the team rules filter.

In Code Review (Pull Requests)

On GitHub itself, you can configure whether Copilot should or shouldn’t consider personalized instructions when reviewing PRs.

The path is: Settings → Code & automation → Copilot → Code review → Use custom instructions when reviewing pull requests.

If you turn this off, Copilot reviews the PR without looking at the repository rules.

Practical Recommendations (from an experienced developer’s perspective)

After years dealing with mobile, web, and backend projects, I’ve learned that the difference between a team that “uses a tool” and one that extracts real value from it lies in the small day-to-day practices. Here are some tips I would apply without thinking twice when working with personalized instructions in Copilot:

1. Document for humans, not just for AI

Don’t fall into the temptation of writing instructions thinking only about Copilot. Write as if you were explaining to a new coworker who just joined the team. This guarantees clarity for both people and AI.

👉 Example: “Prefer async/await instead of then/catch” is much more useful than “use async/await”.

2. Avoid contradictions between instructions

I’ve seen teams get tangled up because the repo said “use React” and each dev’s personal instructions said “prefer Vue”.

Define one source of truth and keep everyone aligned.

3. Update as the project evolves

Your project isn’t static. Today it might be on Flutter 3.22, tomorrow on 3.27. If you don’t update the instructions, you’ll receive outdated suggestions that only create rework. Set aside time (e.g., at the end of each sprint) to review copilot-instructions.md.

4. Include code and architecture conventions

Define how the team wants code to be written.

Variable naming (camelCase, snake_case).
Commit pattern.
Preferred frameworks and libs.
Layer structure (e.g., “Controllers never call DB directly”).

This makes Copilot suggest code already in the format that passes code review.

5. Use as an onboarding tool

New devs always suffer at first to understand the “team way.” If Copilot already suggests code aligned with the pattern, onboarding becomes much faster and smoother.

6. Test the impact

Don’t rely only on the feeling that it “improved.” Compare PRs before and after instructions: fewer style comments? Fewer build errors? If yes, you’re on the right track.

So, if there’s one thing I’ve learned in these 14 years on the road, it’s that technology isn’t just about code. It’s about how we use tools to work better, as a team, and create solutions that make a difference.

GitHub Copilot with personalized instructions is exactly that: a step beyond autocomplete. It’s like taking all the culture, patterns, and experience of your team and putting it next to the AI, transforming it into a true development partner.

And now, the ball is in your court:

Will you let Copilot be just another tool in your stack?
Or will you teach it to play your team’s game, speaking the same language as you?

My tip: try it on your next project. Start simple, write two or three instructions that already make a difference in your daily life, and feel the impact. I’m sure that, just as happened with me, you’ll realize you’re treating Copilot not just as an assistant, but as a living part of your team. 🚀

I’ve been a developer for over 14 years, postgraduate in Software Engineering and Mobile Development, and I write to share real learnings from someone who is in the technology trenches every day.

Read more of my articles at: medium.com/@raphaelkennedy

Follow me on social media

LinkedIn: raphaelkennedy
YouTube: @raphaelpontes

How I Structure My Entities in Flutter and Why It Changed My Way of Programming

Mon, 17 Mar 2025 18:22:49 GMT

Hey dev!

If you’ve ever needed to consume multiple APIs in a Flutter app, you know the mess can start quickly. At first, it seems simple: you get the data from the API, throw it directly into the model, and done! But then comes that moment when you need to integrate a second API, deal with unexpected backend changes, or even optimize data loading. And suddenly… BOOM💥! Your code turns into a real Frankenstein, hard to understand and even harder to maintain.

That’s exactly what happened to me at the beginning of my journey. Until I discovered the importance of Entities, Adapters, and DTOs. These three layers changed my way of programming and made my apps more organized and easier to scale.

Today I want to tell you how I structure all this and why this approach can save your code too.

What is an Entity and why should you use it?

The Entity is the heart of your data model. It represents the pure and independent concept of data within your application domain, without worrying about infrastructure details like API, database, or local files.

The main advantage of having a well-defined Entity is that it keeps your code decoupled. This means that, regardless of where the data comes from (API, SQLite, Firebase, etc.), your business logic will always work with the same structure.

Let’s look at an example. Imagine your app works with users and receives this JSON from the API:

{
  "id_usuario": 123,
  "nome_completo": "John Silva",
  "email": "john@email.com",
  "idade": 30
}

If you use this format directly in the app, any change in the backend can break your code. But if you define an Entity, your internal model will remain stable:

class User {
  final int id;
  final String name;
  final String email;
  final int age;
  
  User({
    required this.id,
    required this.name,
    required this.email,
    required this.age,
  });
}

Now, the rest of the app only needs to know this User entity, without worrying about how the data arrives or is stored. If tomorrow the API changes id_usuario to userId, the app continues to work perfectly—as long as the conversion is handled in the right place.

And that’s where the Adapter comes in.

What is an Adapter and why is it essential?

The Adapter is responsible for translating external data (API, database, cache) to the app’s internal format and vice versa. It allows your Entity to remain clean and independent of any external data source.

Adapter Example

Here is an Adapter that converts the API JSON to our User Entity:

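// Translates between the API's JSON format and the app's User entity.
class UserAdapter {
  // Builds a User from the JSON map returned by the API.
  static User fromJson(Map<String, dynamic> json) {
    return User(
      id: json['id_usuario'] as int,
      name: json['nome_completo'] as String,
      email: json['email'] as String,
      age: json['idade'] as int,
    );
  }

  // Converts a User back into the JSON shape the API expects.
  static Map<String, dynamic> toJson(User user) {
    return {
      'id_usuario': user.id,
      'nome_completo': user.name,
      'email': user.email,
      'idade': user.age,
    };
  }
}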

Now, if the backend decides to change field names, I only need to change this Adapter and done! The rest of the app continues to work perfectly, as it only handles the User entity.

The Adapter prevents the app from needing to know specific details of different APIs. If you need to consume multiple data sources with different formats, each can have its own Adapter without impacting your app logic.

What is a DTO and why is it different from Adapter?

While the Adapter serves to convert API data to an Entity, the DTO (Data Transfer Object) is used to define exactly which data will be transported between system layers.

The difference is that a DTO doesn’t need to have the same structure as the Entity. It can be a subset of the data or even contain information formatted differently.

DTO Example

Let’s say we need to send a user to the API, but the User entity contains information that doesn’t need to be transmitted, such as a createdAt field.

Instead of sending the entire Entity, we create a DTO to define exactly what will be sent:

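// Carries only the fields that need to travel between layers.
class UserDTO {
  final String name;
  final String email;
  final int age;

  UserDTO({
    required this.name,
    required this.email,
    required this.age,
  });

  // Serializes the DTO using the field names the API expects.
  Map<String, dynamic> toJson() {
    return {
      'nome_completo': name,
      'email': email,
      'idade': age,
    };
  }

  // Builds a DTO from a JSON map that uses the API's field names.
  factory UserDTO.fromJson(Map<String, dynamic> json) {
    return UserDTO(
      name: json['nome_completo'] as String,
      email: json['email'] as String,
      age: json['idade'] as int,
    );
  }
}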

Here, the DTO serves as an intermediary model to ensure that only necessary data is transmitted. This improves communication performance and prevents leakage of unnecessary information.

In a Nutshell

Concept	What it does	When to use
Entity	Represents pure internal domain data	ALWAYS, to ensure the code is decoupled from the API or database.
Adapter	Converts data between API and Entity	When the API structure is different from the app’s internal structure.
DTO	Defines a specific model for transporting data between system layers	When there’s a need to transmit only part of the data or format it specifically.

Well, we’re almost at the end of the article, but I know that you, reader, are mentally asking me: can we use Adapter and DTO together?

And the answer is simple… Yes, we can. I’ll explain this in the next topic.

When to use Adapter and DTO together?

In some cases, we can combine both. See a common flow:

The API returns data → The Adapter transforms the API response into an Entity.
The app needs to send data to the API → We create a DTO to represent exactly what will be sent.
The DTO can be passed to the Adapter, which then makes the necessary conversion before sending the data to the API.

This ensures well-structured, easy-to-test, and maintainable code.

Conclusion

If there’s one thing I’ve learned programming in Flutter (and struggling with different APIs), it’s that separating code responsibilities well makes all the difference.

Entities keep the code clean and independent from the API.
Adapters protect the app from unexpected backend changes.
DTOs ensure that only necessary data is transported.

In the end, all this makes your code more organized, scalable, and easy to test.

If you haven’t structured your code this way, try it! I’m sure that once you start, you’ll never want to go back.

And you, do you already use this approach? How do you structure your entities in Flutter? Let’s exchange ideas here or on my networks below.

Moral Concerns in Technology Development

Fri, 02 Aug 2024 02:30:50 GMT

Folks, before starting this article, I would like to introduce myself: I am a Software Engineer with a curious mind, always wandering through reflections that I want to share with you. Recently, I watched a movie about the scientist who developed the atomic bomb, and this sparked a series of thoughts I hadn’t considered before. I decided to write this article because I believe my concerns may “open the eyes” of many of you reading this.

Contextualizing

As presented in the film “Oppenheimer,” we can reflect on the responsibility of the Manhattan Project, which resulted in the creation of the atomic bomb. This invention ended thousands of lives both in the short and long term. The moral responsibility of those who develop technologies with destructive potential is a crucial topic that we need to discuss.

Cillian Murphy, the actor who played Oppenheimer, is known for his role as Thomas Shelby in “Peaky Blinders,” a character without scruples or morals. In “Oppenheimer,” Murphy manages to convey the deep concern of someone who carries the weight of having blood on their hands. His impeccable performance captures the essence of a scientist divided between technological advancement and the terrible consequences of his creation.

The creation of the atomic bomb brought a new era to humanity, marked by unprecedented destructive power. Scientists and engineers involved in the Manhattan Project faced a moral dilemma: develop a weapon to end the war and save lives, or consider the devastating long-term consequences. The dilemma wasn’t just about the weapon’s effectiveness, but about the morality of its use. The reflection on the responsibility of creating such powerful technology leads us to think about the role of developers and the ethical decisions we must make.

The New Digital Atomic Bomb

This brings to mind the emergence of new technologies, such as Artificial Intelligences (AIs). We know that AI is not something new, but its everyday use has gained strength recently, and this challenges us to think about the consequences of our innovations. AIs, for example, can bring significant advances in various areas, but also raise ethical and moral questions about privacy, security, and impact on the job market.

With the advancement of AIs, large-scale data collection and analysis has become common. Companies and governments have access to detailed information about individuals, which can be used to personalize services and improve user experience. However, this massive amount of data can also be exploited in ways that invade personal privacy. We need to establish clear limits on what is acceptable in terms of data collection and use, ensuring that individual rights are protected.

Security is another critical area. AIs have the potential to be used in security systems, such as surveillance and cyber defense. Although these applications can increase protection, they can also be used maliciously. More sophisticated and difficult-to-detect cyber attacks are a real threat. Creating AIs that can identify and neutralize these threats is essential, but this must be done responsibly and ethically.

The impact on the job market is another point of intense debate. AI-driven automation has the potential to transform entire sectors of the economy. While some repetitive and dangerous tasks can be automated, freeing workers for more creative and safer activities, many traditional jobs may disappear. This forces us to rethink the structure of work and develop strategies for professional reskilling and adaptation to new economic realities.

So, as we embrace the opportunities offered by AIs, we must also address the ethical and moral questions that arise. Technology should be developed and implemented in a way that benefits society as a whole, promoting justice, equity, and respect for human rights. Collaboration between governments, companies, and civil society will be crucial to ensure that technological innovations contribute to a fairer and more sustainable future.

Reflection

It is crucial to highlight that development teams don’t always consider all the consequences of their projects. Often, innovation is driven by the desire to solve immediate problems, without deep reflection on long-term effects.

The pressure to launch new products and services can lead to excessive focus on quick and efficient solutions, leaving aside broader considerations about ethical, social, and environmental impact. For example, a new technology may solve an urgent need, but at the same time, it may create new vulnerabilities or exacerbate existing inequalities.

Furthermore, a lack of comprehensive analysis can result in unintended consequences, such as the amplification of implicit biases in AIs or the erosion of personal privacy. These consequences can be difficult to predict and even more difficult to mitigate after the technology is implemented.

That is why developers need to adopt a holistic and responsible approach when creating new technologies. This includes conducting impact assessments, consulting experts from various fields, and engaging with the community to better understand the possible repercussions of their innovations. Integrating ethical principles from the early phases of development can help ensure that technologies not only solve immediate problems but also contribute to a fairer and more equitable future.

Every benefit to the world can be followed by harm. While a new technology can improve quality of life, it can also bring unexpected challenges. For example, the creation of a new programming language can generate employment and income for many families, but a malicious AI can invade servers and steal the personal data of thousands of people.

Conclusion

To conclude, the consequences of developing new technologies can range from simple impacts, such as a programming language that benefits the economy, to complex problems, such as an AI that manipulates information to influence elections. It is essential that we consider the moral responsibilities associated with our innovations. As I like to say: “The future is a blank page that will be drawn on by someone who may or may not be aware of their responsibilities; once marked, it can never be erased.”

This quote illustrates the importance of approaching technological development with a sense of responsibility and foresight. When we create and implement new technologies, we are not only shaping the present, but also drawing indelible lines in the future. Each innovation can trigger a series of consequences that can be beneficial or harmful.

Health-oriented technologies, such as gene editing, can cure previously incurable diseases, but also raise ethical questions about manipulation of the human genome. Similarly, social media platforms can connect people around the world, but can also be used to spread disinformation and hate. The future is in our hands and, as we draw on this blank page, we must do so with care, conscience, and an unwavering commitment to moral responsibility. Only then will we avoid being remembered by our children as the “old man who invented the bomb that destroyed the world.”

It is worth noting that this text had AI assistance for grammatical corrections, but all ideas presented here are the result of research and the author’s own opinions.


About


Who I am

I am Raphael Pontes, a senior software engineer with over a decade of experience building products, working in startups, leading technical initiatives, and delivering software in very different contexts, from freelance work to products in production.

My journey

My background spans web and mobile development, with a strong focus on Flutter, Laravel, REST APIs, and software architecture. Over the years, I’ve also worked in technical leadership and with business intelligence, testing, clean architecture, relational and non-relational databases, Firebase, cloud functions, UI/UX, and project planning and execution.

I also have a strong relationship with creating and sharing knowledge. Beyond product and engineering work, I enjoy publishing videos, exploring new tools, and turning practical learning into useful content.

This blog

orapha.dev is my space to write more calmly about programming, patterns, architecture, artificial intelligence, and day-to-day technical decisions.

I’m not interested in technology as a showcase. I’m interested in technology as practice: what’s worth adopting, what unnecessarily complicates things, what ages well, and what really helps build better software.

That’s why this blog works as a public notebook of observations, experiments, and opinions. A place to try out technologies in a personal and thoughtful way.

Where to find me

LinkedIn: raphaelkennedy
GitHub: rkpontes
YouTube: youtube.com/raphaelpontes
Instagram: @raphaelkennedy

orapha.dev

Awesome Design: A catalog of design systems ready to use with AI agents


What is DESIGN.md?

What does Awesome Design offer?

How it works in practice

Using it in practice

Why this matters

Considerations

Demystifying the OpenSpec Propose Process: What Happens Behind the Scenes?

1. The Proposal (proposal.md): The “Why” and the “What”

2. The Design Document (design.md): The “How” at High Level

3. The Specs (specs/*/spec.md): The Rules of the Game

specs/enrollment-management/spec.md (Frontend)

specs/payment-gateway-adapter/spec.md (Integration)

specs/enrollment-command-dispatch/spec.md (Backend/BFF)

4. The Action Plan (tasks.md): Getting Hands-On

Conclusion

From Feature to Deploy Hands-Free: Architecture of a Software Factory with AI Using Spec-Driven Development


Introduction: What the Software Industry Was Like Before AI

The Concept of Automated Assembly Line with AI

From Squad to Industrial Cell

What Changes If This Idea Works?

Will Jobs End?

General System Architecture

Board as Input System

MCP as Integration and Contract Layer

Structured Queue as Operational Buffer

Agent Mesh (Specialized Workers)

Quality Control and Governance

Pipeline Step by Step

1) Demand Ingestion: Board → Queue

2) SDD + Automated Implementation

3) Automated Code Review

4) CI/CD for Staging

5) Automated Tests with QA Agents

6) Continuous Security with Cyber Agent

7) Reports and Traceability

8) Feedback Loop to PO’s Board

The Role of Spec-Driven Development (OpenSpec)

constitution, propose/specify, apply, archive

Why Spec Is the Factory’s Template

The Role of Agents in Operation

Implementation Worker

Review Harness

QA Agent

Security Agent

Orchestrator and Scaling Policies

UI/UX as First-Class Stage

Flow and Wireframe Generation

State and Interface Contract Definition

UX Consistency Gate Before Code

End-to-End Observability and Traceability

Evidence by Stage

Throughput, Quality, and Risk Metrics

Deliverable Traceability by Stage

Benefits, Risks, and Trade-offs

Scale and Consistency versus Coupling to AI Stack

Speed versus Validation Cost

Autonomy versus Governance

Future: AI-Native Software Development

Smaller Teams, Stronger Platforms

Engineering as Sociotechnical System Design

Conclusion

OpenSpec in Practice: Installation, Usage, Complete Flow, and Optional Commands


What is OpenSpec (without beating around the bush)

Prerequisites

Installing OpenSpec CLI

With npm

With pnpm

With yarn

With bun

Initializing in the Project

Generated Structure and How to Read It

Recommended Flow for Using OpenSpec in Daily Life

1) Create the change proposal

2) Review and validate before implementing

3) Refine proposal/specs/tasks

1. The Proposal (`proposal.md`): The “Why” and the “What”

2. The Design Document (`design.md`): The “How” at High Level

3. The Specs (`specs/*/spec.md`): The Rules of the Game

`specs/enrollment-management/spec.md` (Frontend)

`specs/payment-gateway-adapter/spec.md` (Integration)

`specs/enrollment-command-dispatch/spec.md` (Backend/BFF)

4. The Action Plan (`tasks.md`): Getting Hands-On

In Spec-Driven Development, `implement` is where everything else turns into code

In Spec-Driven Development, `tasks` is where the plan turns into concrete work units

What the `tasks` Stage Solves

`Tasks` Isn’t Fragmenting for the Sake of Fragmenting

What I Try to Capture in a Good `tasks` Stage

The Questions I Usually Use in `tasks`

`Q1.` What work unit produces a real result?

`Q2.` Where does one task end and the next begin?

`Q3.` What dependencies need to be explicit?

`Q4.` What can I validate when completing each task?

`Q5.` Does this break help execution or just seem organized?

In Spec-Driven Development, `plan` is where specification turns into execution strategy

What the `plan` Stage Solves

`Plan` Isn’t Distributing Tasks on Impulse

What I Try to Capture in a Good `plan` Stage

The Questions I Usually Use in `plan`

`Q1.` What is the smallest viable sequence to put this delivery on its feet?

`Q2.` What depends on what?

`Q3.` Where is the biggest risk or uncertainty?

`Q4.` What needs to be validated before expanding?

`Q5.` How to break this without losing coherence?

In Spec-Driven Development, `specify` is where ambiguity starts to die

What the `specify` Stage Solves

`Specify` Isn’t Detailing Implementation

The Questions I Usually Use in `specify`

`Q1.` What real problem does this delivery solve?

`Q2.` Who is this behavior for?

`Q3.` What exactly does the system need to do?

`Q4.` What stays out of this specification?

`Q5.` How do I recognize that this is done?

What `constitution` Really Solves

`Constitution` Exists to Hold the Project’s Axis

`Q1.` What type of solution should this project privilege now?

`Q2.` What does this code need to look like to remain readable?

`Q3.` What technical limits cannot be negotiated?

`Q4.` What level of testing makes sense at this stage?

`Q5.` How much architecture does this project really need at this moment?