Applied AI
ArtLens
Computer vision and iconographic deduction for art identification, with a self-training compounding loop.
- 209 Iconographic items in cache
- 5 Cutout + quadrant variants per item
- 16 Variants curated overnight
- 92 Excluded-lists generated
- 87–93% Accuracy on cutout images
ArtLens identifies religious and mythological iconography in art-auction photographs by combining VLM evaluation, expert-curated subject variants, and a forced-choice prompting pattern. Confirmed observations feed back into the diagnostic library — the system gets sharper the more it's used.
Decisions
- 01
Haiku primary, Opus on uncertainty
Benchmarking both models on the same art-identification task showed Haiku reaching 87–93% accuracy on cutout images, statistically equivalent to Opus in this domain. Running Haiku first and escalating to Opus only when the answer is uncertain (low confidence, conflicting elements, edge cases) cuts cost by ~70% with no measurable accuracy loss. The rule for "uncertain" is explicit and tested rather than left to model-default judgment.
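A minimal sketch of what an explicit "uncertain" rule could look like. The `Verdict` shape, the `CONFIDENCE_FLOOR` value, and the `identify` helper are all assumptions made for illustration, not the project's actual code. The two models are passed in as plain callables so the routing logic can be tested without any API. The "edge cases" trigger is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

CONFIDENCE_FLOOR = 0.75  # hypothetical threshold, not taken from the project


@dataclass
class Verdict:
    subject: str
    confidence: float          # 0.0 to 1.0, as reported by the model
    answers: Dict[str, str]    # discriminator -> "yes" | "no" | "uncertain"


def is_uncertain(verdict: Verdict, required: List[str]) -> bool:
    """Escalate on low confidence, an unresolved check, or a conflicting element."""
    low_confidence = verdict.confidence < CONFIDENCE_FLOOR
    unresolved = "uncertain" in verdict.answers.values()
    # Conflict: a discriminator the claimed subject requires was answered "no".
    conflicting = any(verdict.answers.get(name) == "no" for name in required)
    return low_confidence or unresolved or conflicting


Model = Callable[[str], Verdict]


def identify(image_id: str, primary: Model, fallback: Model,
             required: List[str]) -> Tuple[Verdict, str]:
    """Run the cheap model first; call the expensive one only when the rule fires."""
    verdict = primary(image_id)
    if is_uncertain(verdict, required):
        return fallback(image_id), "fallback"
    return verdict, "primary"
```

Keeping `is_uncertain` as a pure function is what makes the rule testable: each trigger can be exercised with a hand-built `Verdict` and no model call.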
- 02
Image prep before VLM, every time
Raw thumbnails hide the discriminating features. A rembg cutout (subject isolated on white) plus four quadrant crops at full resolution surface details (Sol/Luna faces, INRI, Agnus Dei) that no VLM sees in the source image. The image-prep stage runs once per item, costs ~12 MB of cached derivatives, and lifts identification accuracy by a wider margin than any model swap I've benchmarked.
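The quadrant step can be sketched in a few lines. This is an illustration under assumptions: the function names and the file-naming scheme are hypothetical, and the rembg cutout itself is left out. The boxes use the `(left, upper, right, lower)` order that Pillow's `Image.crop` accepts, so they could be applied directly to the full-resolution image.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower)


def quadrant_boxes(width: int, height: int) -> Dict[str, Box]:
    """Split an image into four non-overlapping crops that cover every pixel."""
    mid_x, mid_y = width // 2, height // 2
    return {
        "top_left": (0, 0, mid_x, mid_y),
        "top_right": (mid_x, 0, width, mid_y),
        "bottom_left": (0, mid_y, mid_x, height),
        "bottom_right": (mid_x, mid_y, width, height),
    }


def derivative_names(item_id: str) -> List[str]:
    """Hypothetical cache layout: one cutout plus four quadrants per item."""
    quadrants = quadrant_boxes(2, 2)  # only the keys are needed here
    return [f"{item_id}_cutout.png"] + [f"{item_id}_{q}.png" for q in quadrants]
```

With odd dimensions the extra pixel row or column goes to the right and bottom quadrants, so no part of the image is dropped.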
- 03
Forced-choice VLM prompting, not open-ended
Open-ended prompts get open-ended answers that are confidently wrong. Yes/no/uncertain checklists against named discriminators, with concrete examples instead of schema placeholders, work measurably better. On this format qwen3-vl returns an empty response about 30% of the time, which an auto-retry handles cleanly. The pattern is documented in feedback_forced_choice_vlm_pattern.md and reused across every VLM call in the system.
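A rough sketch of the forced-choice pattern with an auto-retry on empty replies. The prompt wording, the reply format, and the function names are assumptions for illustration; the actual pattern lives in feedback_forced_choice_vlm_pattern.md and may differ. The model is passed in as a callable that takes a prompt string and returns text.

```python
from typing import Callable, Dict, List

ALLOWED = ("yes", "no", "uncertain")


def build_prompt(discriminators: List[str]) -> str:
    """Checklist prompt with a concrete example line instead of a schema placeholder."""
    lines = [
        "For each feature, answer with exactly one of: yes, no, uncertain.",
        "Reply one per line in the form '<number>: <answer>'. Example: '1: yes'.",
        "",
    ]
    lines += [f"{i}. {d}" for i, d in enumerate(discriminators, start=1)]
    return "\n".join(lines)


def parse_reply(reply: str, discriminators: List[str]) -> Dict[str, str]:
    """Anything missing or outside the allowed answers stays 'uncertain'."""
    answers = {d: "uncertain" for d in discriminators}
    for line in reply.splitlines():
        number, sep, answer = line.partition(":")
        number, answer = number.strip(), answer.strip().lower()
        if sep and number.isdecimal() and answer in ALLOWED:
            index = int(number) - 1
            if 0 <= index < len(discriminators):
                answers[discriminators[index]] = answer
    return answers


def ask_with_retry(call_model: Callable[[str], str], discriminators: List[str],
                   max_attempts: int = 3) -> Dict[str, str]:
    """Retry on empty replies; fall back to all-'uncertain' if every attempt is empty."""
    prompt = build_prompt(discriminators)
    for _ in range(max_attempts):
        reply = call_model(prompt)
        if reply and reply.strip():
            return parse_reply(reply, discriminators)
    return {d: "uncertain" for d in discriminators}
```

Defaulting to "uncertain" rather than raising means a flaky response degrades into an escalation signal instead of a crash.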
- 04
Self-training compounding loop
When a reviewer approves an item's identification, every depiction element present in the image but not in the winning variant's diagnostic list gets recorded as an observation. After three independent corroborations of the same element on the same subject, it auto-promotes into the variant's diagnostic library. The library gets sharper with use — the iNaturalist research-grade pattern, transposed to art iconography.
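The promotion rule described above can be sketched as a small in-memory structure. The class and method names are hypothetical, and "independent" is modeled here as "seen on distinct approved items", which is an assumption; the real system may define independence differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

PROMOTION_THRESHOLD = 3  # three independent corroborations, per the text


class DiagnosticLibrary:
    def __init__(self, diagnostics: Dict[str, Set[str]]):
        # variant name -> elements already considered diagnostic
        self.diagnostics = {name: set(items) for name, items in diagnostics.items()}
        # (variant, element) -> ids of approved items that showed the element
        self.observations: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def record_approval(self, variant: str, item_id: str,
                        elements_present: Iterable[str]) -> Set[str]:
        """Record unlisted elements as observations; return any that were promoted."""
        promoted: Set[str] = set()
        known = self.diagnostics.setdefault(variant, set())
        for element in set(elements_present) - known:
            witnesses = self.observations[(variant, element)]
            witnesses.add(item_id)  # a set, so re-approving one item adds nothing
            if len(witnesses) >= PROMOTION_THRESHOLD:
                known.add(element)
                del self.observations[(variant, element)]
                promoted.add(element)
        return promoted
```

Storing witness item ids rather than a bare counter keeps a single item from corroborating itself three times.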
- 05
Two pipelines, not one
The variant-ranking pipeline works for the ~4% of items with religious or mythological iconography. The other 96% (portraits, landscapes, genre paintings, modern works, artist-named pieces) need an artist-based pipeline instead. Conflating them was a real source of bad classifications and lost rule firings until the two were separated. Honest scope: ArtLens currently solves the 4% case well, and the 96% case is its own project.
ArtLens started as a single question — given a photograph of a religious painting, can a model identify the iconographic subject (Pietà, Saint Jerome, Vesperbild, etc.) with enough specificity to inform an attribution or appraisal? The answer was “almost”, and the gap between almost and reliable turned into the system you see.
The interesting part isn’t the model. The interesting part is everything that has to be true before the model can do its job — the image-prep stage that makes small features visible, the expert-curated subject variants that define what “Pietà” can and cannot look like, the forced-choice prompting pattern that beats open-ended questions on every benchmark I’ve run, and the self-training loop that turns each approved item into evidence that strengthens the library.
I also learned to be honest about what the system doesn’t do. Iconographic deduction over expert variants is powerful when there is iconography to deduce. For the other 96% of items, that approach is the wrong tool. Naming that explicitly in the project saved months of fitting square problems into round pipelines.