Visual Descriptions for the Entire Collection
Describing the Collection: Accessibility at the Cleveland Museum of Art
A visit to an art museum depends heavily on visual perception. For patrons who are blind or have low vision, this dependence can significantly diminish the experience. Descriptive textual representations of artworks help close this gap, giving blind and low vision visitors access to the composition and content of a piece in ways that parallel how sighted visitors experience it. With that goal in mind, CMA set out to write visual descriptions for the primary images of all 68,000+ objects in its collection, to be used both online and on-site. Explore any artwork on Collection Online and open its visual description modal to take a look.
Laying the groundwork
In the summer of 2023, CMA launched a project to create visual descriptions for the entire collection, building on the momentum of the museum's accessibility-centric website that had debuted several months earlier. That site was developed with substantial support from Prime Access Consulting (PAC), an accessibility firm led by disabled experts. With a sharpened awareness of the needs of disabled patrons, CMA redirected its partnership with PAC toward this new effort.
The first step was defining what makes a high-quality visual description. Over several months, CMA worked with PAC to develop a comprehensive style guide tailored to the museum's collection, one that outlines how descriptions should be organized, what visual details to prioritize, and what standards to uphold. From there, the museum's team began implementing the guide by writing descriptions for collection highlights and a broader range of objects, then refining them based on feedback from PAC.
Support from AI
Manually writing descriptions for the remaining tens of thousands of objects in any reasonable timeframe was not realistic, so the museum turned to AI to support generation at scale. The project went into hibernation for over a year, however, as the available technology was not yet capable of producing results consistent with the style guide.
By 2025, advances in multimodal large language models changed that calculus. CMA partnered with Google Arts & Culture (GA&C) to bring the project to completion, using Gemini, Google's multimodal AI, to generate the remaining descriptions. The team spent months refining prompts, folding the style guide's rules into the instructions and supplying manually written descriptions as reference examples, while generous funding from GA&C helped offset the costs of text generation at that scale.
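For illustration, the sketch below shows one way style-guide rules and manually written reference descriptions might be combined into a single Gemini prompt. It assumes the google-generativeai Python SDK; the model name, rule text, file names, and example pairing are placeholders rather than CMA's actual configuration.

```python
# A minimal sketch of style-guided description generation, assuming the
# google-generativeai SDK and PIL; everything specific here (model choice,
# rules text, reference files) is illustrative, not CMA's production setup.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")  # hypothetical model choice

STYLE_GUIDE_RULES = """
Describe the artwork objectively, moving from overall composition to detail.
Note medium, palette, and spatial relationships; avoid interpretation.
"""  # placeholder standing in for the museum's actual style guide

# Manually written descriptions serve as few-shot reference examples.
reference_examples = [
    (Image.open("reference_work.jpg"), "A wide landscape painting in muted greens..."),
]

def describe(image_path: str) -> str:
    parts = [STYLE_GUIDE_RULES]
    for ref_image, ref_text in reference_examples:
        parts += ["Example image:", ref_image, "Example description:", ref_text]
    parts += ["Describe this image following the rules above:", Image.open(image_path)]
    return model.generate_content(parts).text
```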
Bringing it all together
Each visual description is attached to a specific image rather than an artwork record. This distinction matters because many objects in the collection have multiple views, and a description written for a front-facing image would be inaccurate for an alternate angle or detail shot. Attaching the description to the image directly ensures it only travels with the image it actually describes, wherever that image appears across CMA's platforms.
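In data-model terms, the description is a property of the image record, not the artwork record. The sketch below uses hypothetical record types to illustrate that shape; CMA's actual DAM and CMS schemas will differ.

```python
# A simplified sketch of image-level attachment, with hypothetical record types.
from dataclasses import dataclass, field

@dataclass
class ImageRecord:
    image_id: str
    view: str                               # e.g. "primary", "reverse", "detail"
    visual_description: str | None = None   # travels only with this image

@dataclass
class ArtworkRecord:
    accession_number: str
    title: str
    images: list[ImageRecord] = field(default_factory=list)

# The front-facing view and a detail shot carry independent descriptions,
# so a description can never be reused for a view it does not depict.
```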
Constructing this system required significant changes across the museum's technological infrastructure. CMA's Digital Asset Management (DAM) system and Collection Management System (CMS) both received new fields to store the visual description text and to indicate whether each description was written by a human or generated by AI. The museum's API surfaces those fields from the CMS, and the website draws on them to assemble the visual description modal that visitors see today.
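Those fields might surface through the API roughly as follows; the endpoint URL and field names in this sketch are illustrative stand-ins, not CMA's actual API schema.

```python
# A hedged sketch of consuming the (hypothetical) API fields and preparing
# the modal payload; the endpoint and field names are assumptions.
import requests

def get_visual_description(image_id: str) -> dict | None:
    resp = requests.get(f"https://api.example.org/images/{image_id}")  # placeholder URL
    resp.raise_for_status()
    record = resp.json()
    text = record.get("visual_description")
    if not text:
        return None  # the modal is simply omitted for undescribed images
    return {
        "text": text,
        # Flag distinguishing human-written from AI-generated descriptions.
        "source": record.get("description_source", "unknown"),
    }
```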
Before any description goes live, it passes through an automated quality check based on cosine similarity, computed between embeddings of the image and the generated text to measure how closely the description aligns with the image it describes. Descriptions that meet the threshold are published directly; those that fall short are flagged for human review.
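As a rough illustration of how such a check could work, the sketch below embeds the image and the candidate description in a shared space with a CLIP-style model and compares them with cosine similarity. The library, model choice, and threshold are assumptions, not CMA's production pipeline.

```python
# A minimal sketch of an image-text cosine-similarity gate, assuming the
# sentence-transformers library and a CLIP-style model; the threshold value
# is illustrative only.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")  # embeds both images and text
THRESHOLD = 0.25  # hypothetical cutoff, tuned against reviewed examples

def passes_quality_check(image_path: str, description: str) -> bool:
    image_emb = model.encode(Image.open(image_path), convert_to_tensor=True)
    # Note: CLIP-style text encoders truncate long inputs, so a production
    # check would need to handle lengthy descriptions explicitly.
    text_emb = model.encode(description, convert_to_tensor=True)
    score = util.cos_sim(image_emb, text_emb).item()
    return score >= THRESHOLD  # below threshold -> flag for human review
```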
On the website, descriptions power screen reader-accessible alt-text and appear in a dedicated visual description modal, making the content available to visitors of all abilities. Each modal includes a link to a feedback form where visitors can share thoughts on any description, whether manually written or AI-generated, supporting CMA's commitment to ongoing improvement. The system is also designed with future extensibility in mind, with the ArtLens gallery and its corresponding app among the planned next deployments.
From the Cleveland Museum of Art
Digital Innovation and Technology Services: Jane Alexander (Chief Digital Information Officer), Tara Bobinac (Associate Developer), Andrea Bour (Collections Information Data Analyst), Delaney Marrs (Walton Family Endowment Fellowship in Digital Innovations and Technology Services), Samuel Thompson (Lead Developer)
External Partners
Prime Access Consulting: [Fill in]
Google Arts & Culture: Lynn Cherny, Tom Granger, Lane Lytle, Surya Tubach