StepX — Attention Explorer

Inspectable generation traces for open diffusion models

StepX turns a Stable Diffusion run into a guided visual explanation: generate an image, inspect where each prompt word's attention falls on the image at each denoising step, and export the evidence. It explains the open pipeline used here, not private black-box image models.
Primary audience
AI image researchers and educators: use StepX to demonstrate cross-attention and denoising behavior in an inspectable model.
Creative technologists and designers: use it to debug prompt-image grounding when working with open diffusion workflows.
Students and reviewers: use the guided outputs as evidence for presentations, reports, and critique sessions.
Recommended workflow
1. Generate: start with one prompt and seed so the run is reproducible.
2. Inspect a word: select Global or one prompt word to see where attention concentrates.
3. Move through steps: compare early layout formation with later detail refinement.
4. Export evidence: save a GIF or ZIP for analysis, teaching, or project documentation.
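Step 1 depends on the run being repeatable from its inputs. As a minimal sketch, assuming a hypothetical `RunConfig` record (not part of StepX as described on this page), the model label, prompt, seed, and step count can be stored together and saved next to any exported evidence; the same record can also list the Focus Word choices. The `"sd-1.5"` label, the 50-step default, and the regex word split are illustrative assumptions; a real pipeline would map words through the model's tokenizer.

```python
import json
import re
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical record of the inputs that pin down one generation run."""

    model_id: str  # hypothetical label, e.g. "sd-1.5"
    prompt: str
    seed: int
    num_inference_steps: int = 50  # assumed default, for illustration only

    def focus_words(self) -> list[str]:
        # "Global" first, then each distinct prompt word in order of appearance.
        # Assumption: a simple regex split, not the model's CLIP tokenizer.
        words = re.findall(r"[a-z0-9']+", self.prompt.lower())
        return ["Global"] + list(dict.fromkeys(words))

    def to_json(self) -> str:
        # Sorted keys make identical configs serialize to identical text.
        return json.dumps(asdict(self), sort_keys=True)


config = RunConfig(model_id="sd-1.5", prompt="A red fox in the snow", seed=42)
print(config.focus_words())  # ['Global', 'a', 'red', 'fox', 'in', 'the', 'snow']
print(config.to_json())
```

Because the keys are sorted, two runs with the same inputs produce identical records, which makes it easy to check that an exported GIF or ZIP came from the run it claims.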
Main path: Generate & Inspect is the default route for token-level attention over denoising steps.
Follow-up analysis: Image Structure helps compare region-to-region relationships after generation.
Advanced tools: Object discovery and ZoeDepth are optional companions, not the primary explanation.

Who is StepX for?

StepX is for researchers, educators, creative technologists, and designers working with open diffusion models who need to explain what happened inside a generation run. It is not a general-purpose image generator, and it does not claim to explain private black-box models such as ChatGPT image generation.

Recommended path: generate an image in Generate & Inspect, choose one prompt word, move the Step slider, then export a GIF or ZIP if you want evidence for a presentation, paper, design review, or committee demo.

Select a different model only if you want to switch from the default SD 1.5. The model loads automatically when you click Generate.
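Loading the model on the first Generate click is a lazy-loading pattern. A minimal sketch, assuming a hypothetical `get_pipeline` loader cached with Python's `functools.lru_cache`; the dictionary it returns stands in for a real diffusion pipeline, and nothing here reflects StepX's actual loading code.

```python
from functools import lru_cache

DEFAULT_MODEL = "sd-1.5"  # hypothetical identifier for the default checkpoint
load_log: list[str] = []  # records each real load so the caching is visible


@lru_cache(maxsize=1)  # keep at most one model resident; switching evicts the old one
def get_pipeline(model_id: str) -> dict:
    # Stand-in for the expensive step (downloading weights, moving them to the GPU).
    # Hypothetical: StepX's real loader is not described on this page.
    load_log.append(model_id)
    return {"model_id": model_id}


def on_generate(prompt: str, model_id: str = DEFAULT_MODEL) -> str:
    pipe = get_pipeline(model_id)  # first click loads; later clicks reuse the cache
    return f"{pipe['model_id']}: {prompt}"


on_generate("a red fox")
on_generate("a grey wolf")               # same model, no reload
on_generate("a red fox", "other-model")  # switching models triggers a load
print(load_log)  # ['sd-1.5', 'other-model']
```

`maxsize=1` keeps only one model in memory, so switching back to SD 1.5 after trying another model reloads it; a larger cache would trade memory for faster switching.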

Model

The open diffusion model whose internal attention signals will be inspected.

Generate an image, then inspect prompt-word attention

Main workflow; use this first. StepX generates one Stable Diffusion image, saves the decoded image from each denoising step, and overlays the DAAM (Diffusion Attentive Attribution Maps) cross-attention heat map on the matching step image. Start with one prompt, then choose a word and move the Step slider.
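DAAM builds a word's heat map from the cross-attention between image positions and that word's text token(s): maps from the UNet's cross-attention layers, which sit at different resolutions, are upsampled to a common size and summed (in DAAM, over heads, layers, and time steps). The sketch below shows only the combine step for one word at one denoising step, under simplifying assumptions: the per-layer maps are already extracted and summed over heads, nearest-neighbour upsampling replaces the bicubic interpolation described for DAAM, and max-normalization to [0, 1] is chosen for display. The function names are hypothetical.

```python
def upsample_nearest(grid: list[list[float]], size: int) -> list[list[float]]:
    """Nearest-neighbour upsampling of a square 2-D grid to size x size."""
    n = len(grid)
    return [[grid[r * n // size][c * n // size] for c in range(size)] for r in range(size)]


def word_heat_map(layer_maps: list[list[list[float]]], size: int) -> list[list[float]]:
    """Combine one word's cross-attention maps into a single heat map in [0, 1].

    layer_maps holds square maps, possibly at different resolutions, e.g. one per
    cross-attention layer (already summed over heads) for the chosen step.
    """
    total = [[0.0] * size for _ in range(size)]
    for grid in layer_maps:
        up = upsample_nearest(grid, size)
        for r in range(size):
            for c in range(size):
                total[r][c] += up[r][c]
    peak = max(max(row) for row in total)
    # Max-normalize for display; an all-zero map is returned unchanged.
    return total if peak == 0 else [[v / peak for v in row] for row in total]


coarse = [[1.0, 0.0],
          [0.0, 0.0]]
fine = [[1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]
heat = word_heat_map([coarse, fine], size=4)
print(heat[0][0], heat[3][3], heat[0][3])  # 1.0 0.5 0.0
```

Cells where both resolutions agree (the top-left block) end up strongest, which is the intuition behind summing layers: evidence that appears at several scales dominates the final map.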

📝 Input


🖼️ Generated Image


🎛️ Attention Controls

Pick a word, move through denoising steps, and adjust opacity. Each heatmap is overlaid on the decoded image for the selected step when available.
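An opacity setting on an overlay is ordinarily an alpha blend: each output pixel is `(1 - opacity) * base + opacity * heat_colour`. A minimal sketch in plain Python, assuming RGB tuples, a heat map already resized to the step image, and a made-up blue-to-red colour ramp (StepX's actual colormap is not stated here).

```python
Pixel = tuple[int, int, int]


def heat_to_rgb(h: float) -> Pixel:
    # Hypothetical two-colour ramp (blue at 0, red at 1); StepX's colormap is not specified.
    return (round(255 * h), 0, round(255 * (1 - h)))


def overlay_pixel(base: Pixel, h: float, opacity: float) -> Pixel:
    # Alpha blend: (1 - opacity) * base + opacity * heat colour, per channel.
    color = heat_to_rgb(h)
    return tuple(round((1 - opacity) * b + opacity * c) for b, c in zip(base, color))


def overlay(image: list[list[Pixel]], heat: list[list[float]], opacity: float = 0.5) -> list[list[Pixel]]:
    """Blend a heat map (same height and width as the image, values in [0, 1]) onto an RGB image."""
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    return [
        [overlay_pixel(px, h, opacity) for px, h in zip(img_row, heat_row)]
        for img_row, heat_row in zip(image, heat)
    ]


step_image = [[(100, 100, 100), (200, 200, 200)]]
print(overlay(step_image, [[1.0, 0.0]], opacity=1.0))  # [[(255, 0, 0), (0, 0, 255)]]
```

At opacity 1 only the heat colours remain; lowering it lets more of the decoded step image show through, which is what makes early, noisy steps readable under the overlay.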
Focus Word

Choose Global for aggregate attention, or choose a prompt word to inspect its spatial grounding.
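One way to model the Focus Word choice, as a sketch: each prompt word has its own heat map, and Global is taken here to be the element-wise mean of all of them. That aggregation rule is an assumption; a sum would differ only by a constant factor, so either gives the same picture after normalization. `select_focus` and `peak_cell` are hypothetical helpers, the latter reporting where a map concentrates.

```python
Grid = list[list[float]]


def global_map(word_maps: dict[str, Grid]) -> Grid:
    """Element-wise mean of every word's heat map (assumed meaning of "Global")."""
    maps = list(word_maps.values())
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / len(maps) for c in range(cols)] for r in range(rows)]


def select_focus(word_maps: dict[str, Grid], focus: str) -> Grid:
    """Return the map for the Focus Word choice: "Global" or one prompt word."""
    if focus == "Global":
        return global_map(word_maps)
    if focus not in word_maps:
        raise ValueError(f"{focus!r} is not a word in this prompt")
    return word_maps[focus]


def peak_cell(grid: Grid) -> tuple[int, int]:
    """(row, column) of the strongest attention value in the grid."""
    cells = ((r, c) for r in range(len(grid)) for c in range(len(grid[0])))
    return max(cells, key=lambda rc: grid[rc[0]][rc[1]])


maps = {"fox": [[1.0, 0.25], [0.0, 0.0]], "snow": [[0.0, 0.25], [0.5, 1.0]]}
print(peak_cell(select_focus(maps, "fox")))   # (0, 0)
print(peak_cell(select_focus(maps, "snow")))  # (1, 1)
print(select_focus(maps, "Global"))           # [[0.5, 0.25], [0.25, 0.5]]
```

The Global view blends both words, so it shows where the prompt as a whole lands but hides which word drives each region; that is why single-word inspection is the recommended next step.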


🎬 GIF Animation


📦 Export All

🔥 Attention Map

Export tip: use Download GIF to animate the selected word's attention over denoising time, or Export All to save a ZIP containing every step × token overlay, the denoising base images, the attention values as CSV, and a GIF for each token.
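The CSV part of Export All can be thought of as one row per step × token pair. As a sketch, assuming a hypothetical in-memory layout `{step: {token: heat_map}}`, mean and max as the per-row summary values, and the file name `attention_values.csv` (none of which is confirmed by this page), Python's standard `csv` and `zipfile` modules are enough to build the archive in memory.

```python
import csv
import io
import zipfile


def attention_rows(heat_maps):
    """Yield one summary row per (step, token) pair.

    heat_maps: {step: {token: 2-D list of attention values}} (hypothetical layout).
    """
    for step in sorted(heat_maps):
        for token, grid in heat_maps[step].items():
            values = [v for row in grid for v in row]
            yield {"step": step, "token": token,
                   "mean": sum(values) / len(values), "max": max(values)}


def export_zip(heat_maps) -> bytes:
    """Return ZIP bytes holding attention_values.csv (file name is an assumption)."""
    text = io.StringIO()
    writer = csv.DictWriter(text, fieldnames=["step", "token", "mean", "max"])
    writer.writeheader()
    writer.writerows(attention_rows(heat_maps))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("attention_values.csv", text.getvalue())
    return buffer.getvalue()


heat_maps = {
    0: {"fox": [[0.0, 1.0]], "snow": [[0.5, 0.5]]},
    1: {"fox": [[0.0, 0.5]]},
}
data = export_zip(heat_maps)
```

Overlay images, base images, and per-token GIFs would go into the same archive through further `writestr` calls; they are left out here to keep the sketch free of imaging dependencies.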
Depth is separate from attention. ZoeDepth estimates near/far scene structure for the final generated image. Use it only when you want to compare attention with figure-ground, spatial hierarchy, or focus.
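A simple way to compare attention with figure-ground is to ask how much of a word's attention falls on the nearer half of the scene. A minimal sketch, assuming the heat map and depth map share one grid and that smaller depth values are closer, as with a metric depth estimate such as ZoeDepth produces; splitting at the median depth is an illustrative choice, and `near_attention_share` is a hypothetical name.

```python
from statistics import median

Grid = list[list[float]]


def near_attention_share(heat: Grid, depth: Grid) -> float:
    """Fraction of attention on pixels at or nearer than the median depth.

    Assumes heat and depth have the same shape and that smaller depth values
    are closer to the camera (as with metric depth).
    """
    flat_heat = [h for row in heat for h in row]
    flat_depth = [d for row in depth for d in row]
    cutoff = median(flat_depth)
    total = sum(flat_heat)
    if total == 0:
        return 0.0  # no attention mass to split
    near = sum(h for h, d in zip(flat_heat, flat_depth) if d <= cutoff)
    return near / total


heat = [[0.75, 0.25], [0.0, 0.0]]
depth = [[1.0, 5.0], [2.0, 6.0]]  # left column is the foreground
print(near_attention_share(heat, depth))  # 0.75
```

A value near 1 suggests the word is grounded on the foreground subject, while a value near 0 points to background placement; either way it is a comparison aid, not an explanation of the attention itself.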