Stock photos are a lie. They look professional, but every developer has seen the same three “hands on laptop in coffee shop” images recycled across a hundred different articles. The image has nothing to do with the post. It’s just filling a rectangle.
Here’s what I built instead.
The Concept Before the Image
The mistake with AI image generation is treating it like a search engine. You describe what you want (“dark moody tech blog cover”), you get something vague that matches the description, and it still has nothing to do with what you actually wrote.
The right approach is abstraction first. Every technical post has an underlying concept — something the words are pointing at. Before generating anything, translate that concept into a visual metaphor:
- A post about git submodules in a monorepo → nested translucent cubes connected by thin blue threads
- A post about constraint-driven architecture → walls forming a corridor, the constraints themselves becoming the passage
- A post about giving AI context in a large codebase → a constellation of 100+ nodes, one central light giving them structure
- A post about a dev workflow that follows you everywhere → a single luminous thread weaving through multiple distinct spaces
The metaphor becomes the prompt. The prompt produces an image that a reader can understand before reading a word — not because it illustrates the content, but because it is the concept in a different medium.
The Stack: Three Tools
The pipeline uses three CLI tools chained together.
1. generate-image — NanoClaw image skill
A wrapper around the Gemini image generation API (free tier, ~500 requests/day). Text-to-image, 16:9 aspect ratio:
generate-image "Three abstract geometric structures converging at a luminous point..." \
  --aspect 16:9 \
  --output /path/to/cover.png
The command outputs JSON metadata to stdout:
{
  "path": "/path/to/cover.png",
  "size": 1323727,
  "model": "gemini-2.5-flash-image",
  "aspect_ratio": "16:9",
  "mode": "generate"
}
Generated images come out as PNG at around 1–1.5 MB.
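Downstream steps can read that metadata without extra dependencies. A minimal sed-based sketch (the metadata string is hard-coded here for illustration; in the real run it would be captured from the command's stdout):

```shell
# Hypothetical metadata matching the shape shown above.
meta='{"path":"/path/to/cover.png","size":1323727,"model":"gemini-2.5-flash-image","aspect_ratio":"16:9","mode":"generate"}'

# Extract the output path and byte size with sed (a jq-free sketch).
png_path=$(printf '%s' "$meta" | sed -n 's/.*"path": *"\([^"]*\)".*/\1/p')
png_size=$(printf '%s' "$meta" | sed -n 's/.*"size": *\([0-9]*\).*/\1/p')

echo "generated: $png_path ($png_size bytes)"
```

The extracted path is what the conversion step consumes next.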
2. sharp — WebP conversion
The sharp Node.js library handles the conversion. Quality 85 is the target; it typically brings a 1.3 MB PNG down to a 20–80 KB WebP, well under the 200 KB performance budget.
node -e "
  const sharp = require('./node_modules/sharp');
  sharp('cover.png')
    .webp({ quality: 85 })
    .toFile('cover.webp')
    .then(i => console.log('WebP size:', (i.size/1024).toFixed(1), 'KB'));
"
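The 200 KB budget can also be enforced mechanically rather than eyeballed. A small guard function, my own sketch rather than part of the pipeline's tooling:

```shell
# Fail loudly if a converted file blows the performance budget.
check_budget() {
  file=$1
  budget_kb=${2:-200}
  kb=$(( $(wc -c < "$file") / 1024 ))
  if [ "$kb" -gt "$budget_kb" ]; then
    echo "FAIL: $file is ${kb} KB (budget ${budget_kb} KB)" >&2
    return 1
  fi
  echo "OK: $file is ${kb} KB"
}

# Usage: check_budget cover.webp
```

A non-zero exit status makes it easy to abort the upload step when a conversion comes out too heavy.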
3. wrangler r2 object put — Cloudflare R2 upload
Wrangler uploads directly to a Cloudflare R2 bucket. The image becomes immediately available via Cloudflare’s CDN on a custom domain.
wrangler r2 object put {your-bucket}/blog/covers/{slug}-cover.webp \
  --file cover.webp \
  --content-type image/webp \
  --remote
The Prompt Structure
Every cover follows the same visual identity. The style system is fixed; only the subject metaphor changes.
[SUBJECT — the post's concept abstracted into a visual metaphor]
STYLE AND COMPOSITION:
Dark atmospheric digital art. Deep indigo-blue void (#0f0f23 to #1a1b2e).
Abstract, geometric, architectural forms. Three depth layers.
Single soft accent glow (#7aa2f7). No text, no people, no faces.
Edge-blended, cinematic, matte painting quality. 16:9 wide composition.
AVOID:
Realistic scenes, text, faces, neon cyberpunk, busy effects, stock photo feel.
The palette — deep indigo background, soft blue accent — matches the Tokyo Night colour scheme the site runs on. The images feel native to the design rather than imported from somewhere else.
Two hard rules: no text in images (titles belong in HTML, not burned into pixels), and no people or faces (abstract and structural, always).
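Because only the subject line changes, the whole prompt can be templated. A sketch of how the fixed style block composes with a per-post metaphor:

```shell
# Fixed style system + variable subject metaphor -> full prompt.
build_prompt() {
  subject=$1
  cat <<EOF
$subject

STYLE AND COMPOSITION:
Dark atmospheric digital art. Deep indigo-blue void (#0f0f23 to #1a1b2e).
Abstract, geometric, architectural forms. Three depth layers.
Single soft accent glow (#7aa2f7). No text, no people, no faces.
Edge-blended, cinematic, matte painting quality. 16:9 wide composition.

AVOID:
Realistic scenes, text, faces, neon cyberpunk, busy effects, stock photo feel.
EOF
}

build_prompt "Nested translucent cubes connected by thin blue threads"
```

The output of build_prompt is what gets passed straight to generate-image.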
The Frontmatter
Three fields in the blog post markdown:
coverImage: 'blog/covers/{slug}-cover.webp'
coverAlt: 'Accurate description of what the image shows'
coverCaption: 'prompt · "the visual metaphor that became this image" — NanoBanana'
coverAlt is for accessibility: describe what’s actually in the image, not what the post is about. coverCaption carries the provenance: the prompt that produced the image, attributed to NanoBanana (the image generation system). The caption is the image’s creative DNA, visible in the rendered figcaption.
Astro renders these via Cloudflare Image Transformations — the same R2 object serves at different sizes and formats depending on context (400px for stream card thumbnails, 1200px for the cover hero, cropped 1200×630 for OG/social). The CDN handles resizing at the edge; you only store one file.
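The variant URLs follow Cloudflare's standard /cdn-cgi/image/{options}/{source} format. A sketch of the three variants mentioned above, with placeholder domains:

```shell
# One stored object, many variants via URL options (domains are placeholders).
transform_url() {
  opts=$1
  key=$2
  echo "https://example.com/cdn-cgi/image/${opts}/https://images.example.com/${key}"
}

transform_url "width=400,format=auto" "blog/covers/my-post-cover.webp"                        # stream card thumbnail
transform_url "width=1200,format=auto" "blog/covers/my-post-cover.webp"                       # cover hero
transform_url "width=1200,height=630,fit=cover,format=auto" "blog/covers/my-post-cover.webp"  # OG/social crop
```

Each variant is resized and re-encoded at the edge on first request; the R2 bucket only ever holds the one WebP.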
The Full Run
For each new blog post:
# 1. Understand the post concept, compose the visual metaphor
# 2. Generate
generate-image "..." --aspect 16:9 --output {slug}-cover.png
# 3. Convert
node -e "const sharp = require('./node_modules/sharp'); sharp('{slug}-cover.png').webp({quality:85}).toFile('{slug}-cover.webp').then(i => console.log((i.size/1024).toFixed(1), 'KB'))"
# 4. Upload
wrangler r2 object put {your-bucket}/blog/covers/{slug}-cover.webp \
  --file {slug}-cover.webp --content-type image/webp --remote
# 5. Add frontmatter to blog post, push
The whole thing takes two minutes. The bottleneck is thinking of a good metaphor — the tools don’t slow you down.
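The steps above collapse naturally into one function. A sketch, not the pipeline's actual script: DRY_RUN defaults to 1 and only prints the commands, since generate-image and wrangler are assumed to exist only in the real environment, and the bucket name is a placeholder.

```shell
# Print commands by default (DRY_RUN=1); set DRY_RUN=0 to execute for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

publish_cover() {
  slug=$1
  bucket=${BUCKET:-my-bucket}   # placeholder; substitute your R2 bucket
  run generate-image "the post's visual metaphor" --aspect 16:9 --output "${slug}-cover.png"
  run node -e "const sharp = require('./node_modules/sharp'); sharp('${slug}-cover.png').webp({quality:85}).toFile('${slug}-cover.webp')"
  run wrangler r2 object put "${bucket}/blog/covers/${slug}-cover.webp" \
    --file "${slug}-cover.webp" --content-type image/webp --remote
}

publish_cover demo-post
```

The dry-run wrapper doubles as a preview: you can sanity-check the object key and file names before anything touches the bucket.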
What the Pipeline Doesn’t Do
It doesn’t generate architecture diagrams or flow charts. For anything that needs to be precise — request lifecycles, crate dependency graphs, data flow between services — AI image generation is the wrong tool. Those get Mermaid or SVG.
The pipeline is specifically for atmospheric concept art: images that capture a mood or an idea, not a technical specification. That distinction matters. Knowing which type of image a section needs keeps you from reaching for the wrong tool.
The Meta Moment
The cover image for this post was generated by this pipeline. The visual metaphor: a Gemini-generated abstract composition flowing through a conversion stage and arriving at a storage bucket — three geometric forms in sequence, connected by light.
That’s the loop. The pipeline describes itself.
The views expressed here are my own. Examples and scenarios are drawn from my personal projects and do not represent any specific organization, product, or system.