Occuprint Image Analysis
Click on images for full size visualizations
On day two of the hackathon, we had a mini tutorial from ImagePlot’s author, Lev Manovich. Last spring, we used a smaller set of features for image processing. After a bit of a demo, we were able to extract and process many more, including one representation that approximates each image’s qualities as a matrix instead of a point.
Approach: Using almost 400 Occuprint posters, we used ImagePlot and R’s pca package to extract image features, identify principal components, and cluster images. A principal component is a vector that can be used to describe variability in a feature matrix, and the first principal component describes the most variability. Figure 1 shows the Occuprint collection plotted by the first and second principal components.
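To make the idea of a first principal component concrete, here is a minimal sketch in Python using only the standard library. It is not the R workflow described above: the feature values are made up for illustration, the function and variable names are hypothetical, and power iteration is simply one easy way to find the direction of greatest variance.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (component, explained_ratio, means) for equal-length feature rows.

    Assumes the features are not all constant and that the starting vector
    is not exactly orthogonal to the leading direction.
    """
    n, d = len(rows), len(rows[0])
    # Center each feature column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalize; the vector converges to the leading eigenvector.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along v, divided by total variance, is the share explained.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance, means

def project(rows, component, means):
    """Score each row along the component (one axis of a plot like Figure 1)."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(component)))
            for r in rows]

# Hypothetical features per poster: (brightness, saturation, hue), scaled 0..1.
posters = [
    [0.82, 0.15, 0.10],
    [0.75, 0.22, 0.14],
    [0.40, 0.60, 0.35],
    [0.35, 0.66, 0.41],
    [0.55, 0.45, 0.22],
]

component, explained, means = first_principal_component(posters)
scores = project(posters, component, means)
print("first component:", component)
print("share of variance explained:", explained)
print("poster scores:", scores)
```

Repeating the procedure on what is left after removing the first component would give the second component, the other axis of a plot like Figure 1.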
We also performed non-parametric Bayesian clustering using hue, saturation and brightness features and Python’s scikit-learn module. Clustering is an unsupervised learning method that seeks to partition a data set so that observations in the same cluster are similar to each other and distinct from members of other clusters. Seven clusters were returned and are shown in Figure 2, with each of the seven rows representing one cluster.
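For a feel of how a clustering method can choose the number of clusters on its own, here is a small standard-library Python sketch. It deliberately swaps in a simpler technique, DP-means (a k-means-like procedure that opens a new cluster whenever a point is too far from every existing center), rather than the Bayesian model run through scikit-learn. The feature values, the penalty value, and all names are hypothetical.

```python
def dp_means(points, penalty, max_iter=100):
    """Cluster points without fixing the number of clusters in advance.

    A point whose squared distance to every center exceeds `penalty`
    starts a new cluster. Returns (labels, centers).
    """
    dims = len(points[0])
    # Start with a single cluster centered on the global mean.
    centers = [[sum(p[j] for p in points) / len(points) for j in range(dims)]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            best = min(range(len(centers)), key=dists.__getitem__)
            if dists[best] > penalty:
                # Too far from everything seen so far: open a new cluster here.
                centers.append(list(p))
                best = len(centers) - 1
            if labels[i] != best:
                labels[i] = best
                changed = True
        # Group points by label, drop empty clusters, renumber contiguously.
        members = {}
        for label, p in zip(labels, points):
            members.setdefault(label, []).append(p)
        kept = sorted(members)
        remap = {old: new for new, old in enumerate(kept)}
        labels = [remap[label] for label in labels]
        # Move each center to the mean of its members.
        centers = [[sum(p[j] for p in members[old]) / len(members[old])
                    for j in range(dims)] for old in kept]
        if not changed:
            break
    return labels, centers

# Hypothetical (hue, saturation, brightness) features per poster, scaled 0..1.
# Simplification: hue is treated as a linear value, ignoring its wraparound.
features = [
    (0.30, 0.80, 0.70), (0.33, 0.75, 0.72), (0.28, 0.82, 0.68),  # saturated
    (0.00, 0.20, 0.25), (0.02, 0.18, 0.22), (0.01, 0.22, 0.27),  # dark, muted
]

labels, centers = dp_means(features, penalty=0.1)
print("cluster labels:", labels)
print("number of clusters found:", len(centers))
```

The penalty plays the role of a granularity knob: a smaller value yields more, tighter clusters, and a larger value yields fewer.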
Lastly, we plotted only those posters with ‘occ’ in the poster’s title. This collection is shown in Figure 3.
Conclusions: Based on preliminary analysis (our final output was generated only today) and our related work last year, several themes emerge: (1) the spring inspires a more colorful Occupy poster palette, with the most works relating to nature and rebirth, and (2) Anonymous and strike-themed posters typically have fewer colors, gravitating towards primary red, white and black.
In addition, we have a human pattern recognition task underway in the gallery, shown in Figure 4. So far, it appears that gallery visitors are using the same types of features to understand and visualize patterns in the Occuprint poster set. However, one distinction is incredibly striking: speed. Relative to computer efficiency, human work feels more like geologic time.
Next steps: One nice feature we got to see in action is the ability of ImagePlot to visualize image sequences. For future work, we’d like to further examine narrative and event histories. Some ideas for content are the Sandy Storyline and the eviction of Zuccotti Park.
Please share any of your own insights about these visualizations or anything else that comes to mind about this work below.
-by srt + Christo de Klerk