Narratives of Occupy Day 2
We wrote a script to download text content and metadata from the “We Are the 99%” Tumblr, and used built-in TextMate features to remove HTML tags and fix other data issues. The final CSV had 3413 posts, of which 2522 had text (many of the posts have not yet been transcribed). From this, we were able to draw a quick frequency chart showing posts to the site over time:
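The cleaning and counting steps are simple enough to sketch. This is not our original script (the tag stripping was actually done in TextMate). It is a minimal Python sketch that assumes a CSV with hypothetical `date` and `text` columns, with dates beginning in an ISO-style `YYYY-MM-DD` form. It strips HTML tags with the standard-library `HTMLParser`, counts a post as "having text" if anything is left after stripping, and tallies posts per month. Those tallies are the kind of counts a posts-over-time chart is built from.

```python
import csv
import io
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into plain characters
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


def load_posts(csv_file):
    """Read posts from a CSV with (hypothetical) 'date' and 'text' columns and clean the text."""
    rows = list(csv.DictReader(csv_file))
    for row in rows:
        row["text"] = strip_html(row.get("text") or "")
    return rows


def posts_per_month(rows):
    """Count posts per 'YYYY-MM', assuming dates start with an ISO-style 'YYYY-MM-DD'."""
    return Counter(row["date"][:7] for row in rows)


# Invented sample rows standing in for the real export.
sample = io.StringIO(
    "date,text\n"
    '2011-09-14 12:00:00 GMT,"<p>I am a student &amp; I owe $40,000.</p>"\n'
    "2011-09-20 08:30:00 GMT,\n"
    '2011-10-02 17:45:00 GMT,"<p>I lost my <b>job</b>.</p>"\n'
)
posts = load_posts(sample)
with_text = [p for p in posts if p["text"]]  # untranscribed posts end up empty
print(len(posts), len(with_text))  # 3 2
print(sorted(posts_per_month(posts).items()))  # [('2011-09', 2), ('2011-10', 1)]
```

Using a real HTML parser rather than a regex such as `<[^>]+>` means character entities like `&amp;` are decoded as well as tags being removed.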
There was a lot of activity around September and October last year, and only sporadic updates now. No great surprises, but the overall chronology of the data set is worth bearing in mind while we search for themes across it.
We used IBM ManyEyes to generate some initial visualisations:
Obviously, many of the issues described in the Tumblr are financial (rent, bills, work, food, job, money, debt) and personal (family, life, depression, friends). Health is also a big one, and many of the stories mention the cost, unfairness and inefficacy of the health system. Education also appears frequently (college, school, degree), and a word net showing “* is not *” reveals a recurring theme: that the education was not worth the debt.
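Word-frequency views like these come down to counting tokens after dropping common stop words, and the “* is not *” view comes down to counting the words on either side of a fixed phrase. The sketch below is not how ManyEyes computes its visualisations internally. It is a minimal Python approximation that uses a deliberately tiny, hypothetical stop-word list and invented example sentences.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {
    "a", "am", "an", "and", "but", "for", "have", "i", "in", "is",
    "it", "me", "my", "not", "of", "that", "the", "to", "was",
}


def tokenize(text):
    """Lower-case the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(texts, top_n=10):
    """Return the top_n most common non-stop-words across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in tokenize(text) if w not in STOP_WORDS)
    return counts.most_common(top_n)


def phrase_pairs(texts, connector=("is", "not")):
    """Count (left word, right word) pairs around a connector, e.g. '* is not *'."""
    pairs = Counter()
    k = len(connector)
    for text in texts:
        tokens = tokenize(text)
        # i marks the start of the connector; we need a word on each side of it.
        for i in range(1, len(tokens) - k):
            if tuple(tokens[i:i + k]) == connector:
                pairs[(tokens[i - 1], tokens[i + k])] += 1
    return pairs


# Invented example sentences, not posts from the Tumblr.
texts = [
    "My degree is not worth the debt.",
    "College is not affordable and my debt keeps growing.",
    "Healthcare is not optional.",
]
print(word_frequencies(texts, top_n=3))
print(phrase_pairs(texts).most_common())
```

Swapping in a different `connector` tuple, such as `("and",)`, would count pairs for other phrase patterns in the same way.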
Next, we wanted to categorise the stories and get some overall data on the issues raised. We wrote a script that uses Latent Dirichlet allocation (LDA) to detect topics automatically based on word usage. Setting it to find 4 topics gave the following, shown here with the names we assigned them (bold) and the most common words in each:
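The script we used is not reproduced here. As a rough illustration of the technique, the following is a from-scratch collapsed Gibbs sampler for LDA in plain Python. Each word's topic is repeatedly resampled with probability proportional to (topic count in its document + `alpha`) × (count of that word in the topic + `beta`) / (words in the topic + V × `beta`), where V is the vocabulary size. The hyperparameters (`alpha`, `beta`, `iterations`) and the toy corpus are illustrative assumptions, not the settings or data we used. A real run over thousands of posts would more sensibly use an established library such as gensim or MALLET.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics=4, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenised documents by collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): doc_topic[d][k] estimates P(topic k | doc d)
    and topic_word[k] maps each word to an estimate of P(word | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]               # topic counts per document
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                                # total words per topic

    # Start from a random topic for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Take the current token out of the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional for this token's topic, up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    return doc_topic, topic_word


def top_words(topic_word, n=5):
    """The n most probable words in each topic."""
    return [sorted(dist, key=dist.get, reverse=True)[:n] for dist in topic_word]


# Toy, invented corpus; real input would be the cleaned, tokenised posts.
docs = [
    "rent bills rent job money".split(),
    "college degree loan debt college".split(),
    "job money bills rent debt".split(),
    "degree college loan school debt".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, iterations=100)
for k, words in enumerate(top_words(topic_word, n=3)):
    print(k, words)
```

The topics come out unlabelled; naming them (as we did) is a manual step after reading each topic's top words.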
One quick pie chart later, and we can see which topics were most common (based on treating each post as belonging to its single most-likely category):
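The “single most-likely category” rule is just an argmax over each post's topic distribution, followed by a count. A minimal sketch, using a made-up document-topic matrix (not our results) and numeric topic indices in place of the names we assigned:

```python
from collections import Counter


def dominant_topic_shares(doc_topic):
    """Assign each document to its most probable topic; return each topic's share of documents.

    doc_topic[d][k] is the probability of topic k in document d. Ties go to the
    lower topic index. Topics that win no documents get a share of 0.0.
    """
    if not doc_topic:
        return {}
    num_topics = len(doc_topic[0])
    winners = Counter(max(range(num_topics), key=row.__getitem__) for row in doc_topic)
    total = len(doc_topic)
    return {k: winners[k] / total for k in range(num_topics)}


# Made-up document-topic probabilities for four posts and four topics (not our results).
doc_topic = [
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.80, 0.10, 0.05],
    [0.60, 0.20, 0.10, 0.10],
    [0.10, 0.10, 0.20, 0.60],
]
print(dominant_topic_shares(doc_topic))  # {0: 0.5, 1: 0.25, 2: 0.0, 3: 0.25}
```

Hard assignment matches a pie chart's one-slice-per-post reading, but it hides posts that mix topics (say, medical debt that is both financial and health-related). Averaging each topic's probability across all posts would be a softer alternative.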
By the end of the hackathon, we had collected the above visualisations into a preliminary infographic overview of the Tumblr stories, which we presented to the rest of the group. Our (slightly messy!) working notes from the day are online.
The prevalence of healthcare- and education-related posts is interesting. The mainstream understanding of Occupy (particularly in terms of party politics) has centred on income inequality, but the folk narratives so far paint a slightly different picture. It appears that many people identify with “The 99%” because of everyday struggles: although money is important, it could perhaps be seen as an expression of an underlying demand for a minimum standard of living, and for a responsible collective solution to shared personal crises. Our preliminary findings from “We are the 99%” suggest that messaging and outreach built on this idea of mutual support may be more effective than continuing to try to rally around abstract issues.