Anthology of Computers and the Humanities · Volume 3

Beyond Accuracy: Investigating Vision Model Perception on 19th-Century Decorative Arts

Albina Toumarkine1 and Chahan Vidal-Gorène1,2

  • 1 École nationale des chartes-PSL University, Paris
  • 2 Centre Jean Mabillon, Paris

Permanent Link: https://doi.org/10.63744/k4Jy3Lr4rdK5

Published: 21 November 2025

Keywords: Islamic ornament, 19th-century ceramics, Unsupervised image analysis, Feature clustering, Interpretability, Vision transformers, Zero-shot learning

Abstract

This paper examines how pretrained vision models perceive and organize a corpus of 19th-century decorative artefacts and printed materials. Using a zero-shot approach, we combine feature extraction, dimensionality reduction, and clustering to explore how convolutional and transformer architectures respond to historical visual material. We present two complementary experiments: the first analyzes corpus-level organization through unsupervised clustering of VGG16 embeddings; the second uses similarity retrieval from individual queries to compare interpretability across five models (VGG16, EfficientNet, ViT, DINOv2, and CLIP). By visualizing and aggregating activation maps, we discuss biases in how models attend to shape, ornament, and layout: they often emphasize background contrast or framing over meaningful decorative structure. Rather than measuring accuracy, this study focuses on interpretability and bias, highlighting the challenges of adapting art-historical imagery to contemporary vision pipelines.
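
To make the idea of similarity retrieval from an individual query concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that each image has already been reduced to an embedding vector by some pretrained model and that neighbours are ranked by cosine similarity; the abstract does not state which similarity measure is used, so cosine is an assumption. The function names, item names, and toy four-dimensional vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_similar(query, corpus, top_k=3):
    """Rank corpus items by cosine similarity to the query and keep the top_k."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; real model embeddings have hundreds of dimensions.
toy_corpus = {
    "tile_a": [0.9, 0.1, 0.0, 0.2],
    "tile_b": [0.8, 0.2, 0.1, 0.1],
    "plate_c": [0.0, 0.9, 0.7, 0.1],
    "print_d": [0.1, 0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.1, 0.0, 0.1]

# The two nearest neighbours of the query in embedding space.
results = retrieve_similar(toy_query, toy_corpus, top_k=2)
```

In this toy setting the two tile vectors rank above the plate and the print because they point in nearly the same direction as the query. With real embeddings, inspecting which images are retrieved for a given query is one way to see what a model treats as visually similar.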