PixPlot facilitates the dynamic exploration of tens of thousands of images. Inspired by Benoît Seguin et al.'s paper at DH Krakow (2016), PixPlot uses the penultimate layer of a pre-trained convolutional neural network for image captioning to derive a robust featurization space in 2,048 dimensions.
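As a rough sketch of what that featurization step looks like, the snippet below uses an ImageNet-pretrained Inception V3, one common CNN whose penultimate pooling layer yields exactly 2,048 dimensions; the model choice, helper name `featurize`, and file paths are illustrative assumptions, not necessarily PixPlot's exact implementation.

```python
# A minimal featurization sketch, assuming an ImageNet-pretrained Inception V3;
# its penultimate (global average pooling) layer yields 2,048-d vectors.
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.preprocessing import image

# include_top=False with "avg" pooling exposes the 2,048-d penultimate layer
model = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def featurize(path):
    """Return a 2,048-dimensional feature vector for one image (hypothetical helper)."""
    img = image.load_img(path, target_size=(299, 299))  # Inception V3's input size
    arr = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(arr)[0]  # shape: (2048,)

# hypothetical file names; in practice this runs over the whole collection
features = np.stack([featurize(p) for p in ["img_0001.jpg", "img_0002.jpg"]])
```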
Improved Dimensionality Reduction
To collapse those 2,048 dimensions into something that can be rendered on a computer screen, we turned to Uniform Manifold Approximation and Projection (UMAP), a dimensionality reduction technique similar to t-Distributed Stochastic Neighbor Embedding (t-SNE) that seeks to preserve both local clusters and an interpretable global shape.
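The projection itself is a short step with the umap-learn library. In the sketch below, `features` is the (n_images, 2048) array from the featurization step above, and the parameter values shown are illustrative defaults rather than PixPlot's tuned settings.

```python
# A minimal UMAP sketch using the umap-learn library; parameter values
# shown are illustrative defaults, not PixPlot's exact configuration.
import umap

reducer = umap.UMAP(
    n_components=2,   # collapse 2,048 dimensions to a 2D screen layout
    n_neighbors=15,   # balances local cluster detail against global shape
    min_dist=0.1,     # minimum spacing between points in the embedding
)
positions = reducer.fit_transform(features)  # shape: (n_images, 2)
```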
To visualize the results, we looked to approaches more commonly deployed in 3D game design. The resulting WebGL-powered visualization consists of a two-dimensional projection within which similar images cluster together. Users can navigate the space by panning and zooming in and out of clusters of interest, or they can jump to designated “hotspots” that feature a representative image from each cluster, as identified by the computer.
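To give a sense of how such representative "hotspot" images might be chosen, here is a hedged sketch using KMeans from scikit-learn: cluster the 2D positions, then take the image nearest each centroid. This is one plausible approach, not necessarily PixPlot's exact method, and the cluster count `n_hotspots` is an assumption.

```python
# One possible hotspot-selection sketch (an assumption, not PixPlot's
# confirmed algorithm): KMeans clusters over the 2D embedding, with the
# image closest to each centroid serving as that cluster's representative.
import numpy as np
from sklearn.cluster import KMeans

n_hotspots = 10  # illustrative; tune to the size and diversity of the collection
km = KMeans(n_clusters=n_hotspots, random_state=0).fit(positions)

# index of the image nearest each cluster centroid
hotspots = [
    int(np.argmin(np.linalg.norm(positions - c, axis=1)))
    for c in km.cluster_centers_
]
```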
PixPlot provides new ways of engaging with large-scale visual collections. Initial experiments underway at Yale use the tool to explore thousands of cultural heritage images held in the Beinecke Rare Book & Manuscript Library, the Yale Center for British Art, and the Medical Historical Library.
We’re currently working on new enhancements to the software that will include:
- animations between layouts
- on-click metadata, with options for filtering
- high-resolution images
- single-vertex primitives, enabling users to display many more images at once
For more on the underlying code, visit the DHLab’s GitHub repository. The code was authored by Yale Digital Humanities Lab Developer Douglas Duhaime.