The Signal
A solo developer built Pardonned.com — a searchable database of US presidential pardons — in their spare time. The entire stack: Playwright scrapes the DOJ website, SQLite stores the data locally, and Astro 6 generates a static site directly from the database. All code is open source on GitHub. It hit 427 points on the Hacker News front page. No backend. No cloud database. No monthly bill. Just a scraper, a flat file, and a static site generator.
Builder's Take
This is the platonic ideal of a one-person data product. Let's break down the leverage math:
- Hosting cost: ~$0/month. Static site on Netlify, Vercel, or Cloudflare Pages. No server. No database connection pool. No cold starts.
- Data freshness: Run the Playwright scraper on a cron job (GitHub Actions is free for public repos; private repos get 2,000 minutes/month on the free tier). Re-generate the static site. Deploy. Done.
- Moat: Low, but that's fine. The value isn't the tech — it's the curation and search UX layered on top of a painful government interface. That's the actual product.
The insight here is the SQLite-as-build-artifact pattern. You don't run SQLite at query time. Astro reads it at build time, generates static HTML, and you never expose a database to the internet. Zero attack surface. Zero ops.
This pattern also means you can embed AI on top cheaply. Run a local embedding pass over your SQLite rows during the build step. Write the vectors to a JSON file or a second SQLite table. Serve semantic search via a tiny edge function. Total marginal cost per query: near zero.
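A minimal sketch of that build step using `better-sqlite3` and the OpenAI Node SDK. The `pardons` table and `offense` column are illustrative, and a local model via Ollama would slot in the same way:

```js
// scripts/embed.js: build-time embedding pass (illustrative schema)
const fs = require('fs');
const Database = require('better-sqlite3');
const OpenAI = require('openai');

const db = new Database('pardons.db');
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const rows = db.prepare('SELECT id, offense FROM pardons').all();

(async () => {
  // one batched call handles up to ~2,000 inputs; chunk larger sets
  const { data } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: rows.map((r) => r.offense),
  });
  const vectors = rows.map((r, i) => ({ id: r.id, embedding: data[i].embedding }));
  fs.writeFileSync('vectors.json', JSON.stringify(vectors));
})();
```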
The cost/capability curve here is steep in your favor: government data is ugly, inaccessible, and nobody is going to clean it up. Every hour you spend scraping and normalizing is an hour a VC-backed team won't bother with because the TAM looks too small. That's your opening.
Tools & Stack
Scraping: Playwright
Playwright is the right call for DOJ-style government sites — they often require JS rendering, session cookies, or pagination that curl won't handle.
- Install: `npm install playwright` (then `npx playwright install` to download browsers) or `pip install playwright`
- Pricing: Free, open source (Microsoft). Self-hosted scraping = $0.
- Alternative: Puppeteer (Node only, smaller API surface). For simpler HTML, `cheerio` + `axios` is faster and lighter.
```js
// minimal Playwright scrape
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://www.justice.gov/pardon/clemency-statistics');
  // extract every table row as an array of cell strings
  const data = await page.evaluate(() =>
    Array.from(document.querySelectorAll('table tr')).map(row =>
      Array.from(row.querySelectorAll('td')).map(td => td.innerText)
    )
  );
  await browser.close();
  console.log(data);
})();
```
Storage: SQLite
SQLite is criminally underused in indie projects. No server process. Single file. Queryable with standard SQL. Ships with Python's stdlib (`import sqlite3`). For Node: use `better-sqlite3` (synchronous, fast).
- Install: `npm install better-sqlite3`
- Pricing: Free. The database is a `.db` file sitting in your repo or on disk.
- Tip: Commit the `.db` file to Git for small datasets (<100MB). For larger sets, generate it in CI.
- Alternative: DuckDB if you need analytical queries over large CSVs. Turso if you need edge-distributed SQLite with a hosted option (free tier available — check current pricing).
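Loading the scraped rows is a few lines. A sketch, with a made-up schema matching the Playwright output above:

```js
// load scraped rows into a local SQLite file (schema is illustrative)
const Database = require('better-sqlite3');
const db = new Database('pardons.db');

db.exec(`CREATE TABLE IF NOT EXISTS pardons (
  id INTEGER PRIMARY KEY,
  name TEXT,
  offense TEXT,
  date TEXT
)`);

const insert = db.prepare(
  'INSERT INTO pardons (name, offense, date) VALUES (?, ?, ?)'
);
// better-sqlite3 transactions make bulk inserts near-instant
const insertMany = db.transaction((rows) => {
  for (const [name, offense, date] of rows) insert.run(name, offense, date);
});
insertMany(data); // `data` = the row arrays from the Playwright snippet
```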
Frontend: Astro 6
Astro's core trick: it runs your data fetching and templating at build time, not request time. You import your SQLite file in a .astro component, query it, render static HTML pages — one per pardon, one for search, whatever you need.
- Install: `npm create astro@latest`
- Version: Astro 6 (current stable as of this writing — verify at astro.build)
- Pricing: Free, open source. Deploy to Cloudflare Pages, Vercel, or Netlify free tiers.
- Killer feature for this pattern: Content Collections + dynamic route generation from database rows = zero per-page runtime cost.
```astro
---
// src/pages/pardon/[id].astro
import Database from 'better-sqlite3';

export function getStaticPaths() {
  // runs once at build time; the .db file never ships to production
  const db = new Database('pardons.db');
  const pardons = db.prepare('SELECT * FROM pardons').all();
  return pardons.map((p) => ({
    params: { id: String(p.id) },
    props: { pardon: p },
  }));
}

const { pardon } = Astro.props;
---
<h1>{pardon.name}</h1>
<p>{pardon.offense}</p>
```
Automation: GitHub Actions
Schedule a weekly scrape + rebuild + deploy for free. GitHub Actions is free for public repos; the 2,000-minute/month cap only applies to private repos on the free plan.
```yaml
# .github/workflows/scrape-and-deploy.yml
on:
  schedule:
    - cron: '0 6 * * 1' # every Monday 6am UTC
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install
      - run: node scrape.js
      - run: npm run build
      - uses: cloudflare/pages-action@v1 # or vercel deploy
        with:
          apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          accountId: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
          projectName: pardonned # your Pages project name
          directory: dist
```
Ship It This Week
Pick a painful government or institutional dataset in your country or niche. Build the searchable version that doesn't exist yet.
Concrete idea you can start today: FDA drug approval database with semantic search. The FDA publishes approval data as ugly HTML tables. Scrape it with Playwright. Store in SQLite. Add an AI twist: during build, run each drug's indication text through OpenAI embeddings (or a local model via Ollama — free). Store vectors in SQLite using the sqlite-vss extension. Add a search box that hits a Cloudflare Worker for semantic lookup. Total infrastructure cost: check current OpenAI embedding pricing for your dataset size, but for a one-time build pass over a few thousand records it's negligible. Ongoing: $0 hosting, $0 database, pennies for re-embedding on updates.
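A sketch of the edge-function side, assuming the build step wrote a `vectors.json` of `{ id, text, embedding }` records (bundled at deploy time) and that `OPENAI_API_KEY` is set as a Worker secret:

```js
// workers/search.js: hypothetical semantic-search Cloudflare Worker
import vectors from './vectors.json';

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

export default {
  async fetch(request, env) {
    const q = new URL(request.url).searchParams.get('q') ?? '';
    // embed the query with the same model used at build time
    const res = await fetch('https://api.openai.com/v1/embeddings', {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${env.OPENAI_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ model: 'text-embedding-3-small', input: q }),
    });
    const { data } = await res.json();
    const qVec = data[0].embedding;
    // brute-force cosine similarity is fine for a few thousand rows
    const top = vectors
      .map((v) => ({ id: v.id, text: v.text, score: cosine(qVec, v.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, 10);
    return Response.json(top);
  },
};
```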
The playbook is reusable: ugly public data → Playwright → SQLite → Astro → static site with search. Swap the data source and you have a new product every time. Court records. SEC filings. Patent databases. Building permits. Every one of these is a potential niche tool that someone will pay for or that drives newsletter signups.
Fork the Pardonned repo, swap the scraper target, and you're 80% of the way there before lunch.