Internal docs are where good engineering goes to die. Not because people do not care, but because docs have no CI, no reviewers, and no definition of done.

Notebook and pen on a desk

Same standards

If a function needs a test, a procedure needs a verification step. If code needs a review, a runbook needs a second pair of eyes. The content is different; the workflow should not be.

“If it’s not written down, it didn’t happen.” — Lab proverb

Treat that literally. Undocumented tribal knowledge is a single-resignation away from being lost.

Small and close

The best documentation lives next to the thing it describes. A README in the repo beats a Confluence page linked from a wiki linked from a Slack pin.

What belongs in-repo:

  1. README.md — how to run, test, and deploy
    • Prerequisites with version pins
    • Common failure modes and fixes
    • Link to deeper docs, not duplicate them
  2. docs/ — architecture decisions and runbooks
    • One topic per file
    • Frontmatter with lastVerified date
  3. Inline comments — only for non-obvious why
    • Not what the code does — that’s what the code is for
    • Link to the ADR when the reason is architectural

Proximity keeps docs honest — when the code changes, the diff is one tab away.

A runbook template

markdown
---
title: Restore database from backup
lastVerified: 2026-06-01
owner: platform-team
---

## When to use
- Primary DB unreachable for > 5 minutes
- Data corruption confirmed by on-call

## Steps
1. Confirm backup age: `bun run db:backup:list`
2. Stop write traffic: `kubectl scale deploy/api --replicas=0`
3. Restore: `bun run db:restore --date 2026-06-14`
4. Verify row counts match staging
5. Resume traffic and monitor error rate for 15 min

## Rollback
If restore fails, escalate to #incidents and page @platform-lead.

Copy-pasteable commands beat prose paragraphs every time.

Delete aggressively

Outdated docs are worse than no docs. They erode trust faster than a missing page. Add a lastVerified date to every runbook and treat anything older than 90 days as a bug.

The rule is simple: if you wouldn’t merge code without a test, don’t merge a runbook without a verification date.