SkinAtlas, part 2: the atlas grows while I sleep

SkinAtlas’s knowledge base started at 225 ingredients. At the latest count it’s 2,402, with three unresolved tokens left in the worklist. Most of that growth happened in agent-run batch sessions, and the machinery behind them is the most reusable thing the project has produced.

State lives in a file, not in the conversation

The core problem with long agent jobs is that sessions end. Context fills up, laptops sleep, weekly usage limits arrive mid-batch. So the mass-ingestion pipeline is designed around one rule: all state lives in a JSON progress file, and every stage is resumable. A fresh session reads the progress file, sees that batch 003 is at step four of seven, and continues. The skill document that drives it is written for whoever wakes up next — model-agnostic, assumptions spelled out, stop conditions explicit.

The pipeline stages: ingest raw product data, source images, verify them, upload, validate, research the unknown ingredients, ship. Each batch ends as a pull request with the knowledge-base delta in the title.
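The resume rule is easier to see in code. What follows is a minimal sketch, not the project's actual implementation: the stage names come from the list above, but the file layout (a `batches` map with a `completed` list per batch) and the helper names `load_progress`, `save_progress`, `next_stage` and `mark_done` are assumptions made for illustration.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional

# Stage names follow the post; the JSON layout below is an assumed one.
STAGES = ["ingest", "source_images", "verify", "upload", "validate", "research", "ship"]


def load_progress(path: Path) -> dict:
    """Read the progress file, or start an empty one if it does not exist yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"batches": {}}


def save_progress(path: Path, progress: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves half a file."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(progress, fh, indent=2)
    os.replace(tmp, path)


def next_stage(progress: dict, batch_id: str) -> Optional[str]:
    """Return the first stage not yet completed for this batch, or None if done."""
    done = progress["batches"].get(batch_id, {}).get("completed", [])
    for stage in STAGES:
        if stage not in done:
            return stage
    return None


def mark_done(path: Path, progress: dict, batch_id: str, stage: str) -> None:
    """Record a finished stage and persist immediately, before doing anything else."""
    batch = progress["batches"].setdefault(batch_id, {"completed": []})
    if stage not in batch["completed"]:
        batch["completed"].append(stage)
    save_progress(path, progress)
```

A fresh session would call `load_progress`, ask `next_stage` where the batch stands, run that stage, and call `mark_done`. Nothing depends on what the previous session remembered, which is the whole point.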

[Image: a campaign card showing the SkinAtlas catalog interface with product rows and category chips. Caption: the catalog the pipeline feeds.]

The 90% wrong problem

The humbling number: automated image lookup for niche Korean and Japanese skincare brands is wrong roughly 9 times out of 10. One audit found 9 of 9 images wrong. Products in a line look nearly identical, retailer pages mix variants, and a confident agent will happily attach the toner’s photo to the essence.

The fix wasn’t a better scraper; it was a QA hierarchy. Candidate images get tiled into a montage for cheap visual spot-checks. A small model does a first pass, but the measured finding, recorded in the skill doc, is that it catches only about 10% of the errors. So an authoritative vision pass with a stronger model does the real audit, comparing each candidate against a reference image before anything is accepted. Verify-before-apply became the pipeline’s law, and “which model is allowed to make this judgment” is written down per stage.
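Here is a rough sketch of what a verify-before-apply gate with a per-stage judge policy could look like. Everything named here is hypothetical: the `Candidate` type, the `ALLOWED_JUDGE` table, the tier labels `small` and `strong`, and the `audit` function are illustrations, and the judges are plain callables standing in for whatever vision models are actually used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A judge takes (candidate_image, reference_image) and returns True on a match.
Judge = Callable[[str, str], bool]


@dataclass
class Candidate:
    product_id: str
    image: str      # path or URL of the candidate image
    reference: str  # path or URL of the trusted reference image


# Assumed policy table: which model tier may make the call at each stage.
ALLOWED_JUDGE: Dict[str, str] = {"spot_check": "small", "verify": "strong"}


def audit(
    candidates: List[Candidate],
    judges: Dict[str, Judge],
    stage: str = "verify",
) -> Tuple[List[Candidate], List[Candidate]]:
    """Split candidates into (accepted, rejected) using only the authorised tier."""
    tier = ALLOWED_JUDGE[stage]
    judge = judges[tier]  # KeyError if the authorised tier is unavailable: stop, don't downgrade
    accepted: List[Candidate] = []
    rejected: List[Candidate] = []
    for cand in candidates:
        if judge(cand.image, cand.reference):
            accepted.append(cand)
        else:
            rejected.append(cand)
    return accepted, rejected
```

The design choice worth noting is the failure mode. If the strong judge is missing, the lookup raises rather than quietly falling back to the small one, so nothing is applied on the say-so of a model the policy does not trust for that stage.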

Cost segmentation

The paid scraping API is reserved for one thing: end-user requests in the live app. All internal catalog building runs on free tooling — a self-hosted scraper built on a headless-browser render engine and a metasearch library that fails over across search backends. The internal tool has a clean seam where the render engine sits, so it can be swapped without touching business logic. The reason is plain economics: burning paid credits on bulk ops work would be paying retail for a wholesale job.
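The seam and the failover can both be shown in a few lines. This is a sketch under assumptions, not the internal tool's real code: `RenderEngine`, `Scraper`, `page_title` and `search_with_failover` are invented names, and the search backends are modelled as plain callables rather than any particular library's API.

```python
from typing import Callable, List, Protocol, Sequence


class RenderEngine(Protocol):
    """The seam: anything that can turn a URL into rendered HTML."""

    def render(self, url: str) -> str: ...


class Scraper:
    """Business logic depends only on the RenderEngine interface."""

    def __init__(self, engine: RenderEngine) -> None:
        self.engine = engine

    def page_title(self, url: str) -> str:
        html = self.engine.render(url)
        start = html.find("<title>")
        end = html.find("</title>")
        if start == -1 or end == -1:
            return ""
        return html[start + len("<title>"):end].strip()


SearchBackend = Callable[[str], List[str]]


def search_with_failover(query: str, backends: Sequence[SearchBackend]) -> List[str]:
    """Try each backend in order; return the first non-empty result list."""
    errors: List[Exception] = []
    for backend in backends:
        try:
            results = backend(query)
        except Exception as exc:  # a failing backend must not sink the batch
            errors.append(exc)
            continue
        if results:
            return results
    raise RuntimeError(f"all {len(backends)} backends failed or returned nothing: {errors}")
```

Swapping the render engine then means passing a different object to `Scraper`, and a test can pass a fake that returns canned HTML without launching a browser.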

The pipeline also finds real bugs in the product. A fuzzy-deduplication pass once merged two different single-ingredient patches into one product; the false merge was root-caused and fixed in both the pipeline and the live app’s catalog code. Bulk operations are adversarial testing you didn’t have to write.
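To make the failure mode concrete, here is an illustrative sketch of how a name-only fuzzy match can merge two distinct products, together with one plausible guard. It is not the actual root cause or the fix that shipped, which the post does not detail. The token-overlap measure, the 0.5 threshold, the brand "Acme", and the product records are all made up for the example.

```python
from typing import Dict, Set


def tokens(name: str) -> Set[str]:
    """Lowercased whitespace tokens of a product name."""
    return set(name.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-set overlap: |intersection| / |union|."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_duplicate_naive(a: Dict, b: Dict, threshold: float = 0.5) -> bool:
    """Name similarity alone: prone to merging sibling products in one line."""
    return jaccard(a["name"], b["name"]) >= threshold


def is_duplicate_guarded(a: Dict, b: Dict, threshold: float = 0.5) -> bool:
    """Assumed guard: never merge products whose ingredient sets differ."""
    if a["ingredients"] != b["ingredients"]:
        return False
    return is_duplicate_naive(a, b, threshold)


# Hypothetical records: same line, different single ingredient.
patch_a = {"name": "Acme Spot Patch Centella", "ingredients": {"centella asiatica extract"}}
patch_b = {"name": "Acme Spot Patch Niacinamide", "ingredients": {"niacinamide"}}
```

With these records the two names share three of five distinct tokens, so `is_duplicate_naive(patch_a, patch_b)` returns `True`, while `is_duplicate_guarded(patch_a, patch_b)` returns `False`.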