Preparing for the Hackathon: #OpenDataDay for #OccupySandy Wrap-up

Data sets produced by the Occupy Sandy mutual aid/disaster relief effort

A couple dozen data wranglers, activists, computer scientists, and social researchers came together on Saturday to digitize data, articulate needs, share data sources, and prepare data sets for this week’s OccupyData Hackathon.

As a grassroots, coordinated disaster response network, Occupy Sandy drew on a variety of data practices. Some of these were tactical and innovative, such as fulfilling immediate community needs by placing items on an Amazon wedding registry, or coordinating logistics by deploying a full-scale Enterprise Resource Planning system.

In sorting through the data produced by Occupy Sandy, we identified the types of data sets produced and the potential outcomes of transforming each one. We formed groups around the collections and checked in as a larger group once an hour to share progress.

Assessments. Just after the storm, volunteers began collecting data through neighborhood canvasses. Although the collected data exists in several formats, with varying levels of completeness, it is available for parts of New Jersey, Staten Island, Red Hook, and the Rockaways. Part of the group worked on digitizing paper forms that had not been entered into a system, and another part worked on identifying a common data model.

Forming an aggregated data set is challenging because the forms used, and the level of detail captured, vary between the data sets collected at each site. A common data model can help standardize future data collection and support the allocation of data management and reporting resources. This model should also be flexible, so that different hubs can customize it for their specific needs and interests.
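To make the idea of a flexible common data model concrete, here is a minimal Python sketch. It is an assumption for illustration, not the model the group produced: the record type `AssessmentRecord`, the core fields `address` and `needs`, and the helper `normalize` are all hypothetical names. The shared core stays small, missing values are allowed, and anything site-specific is kept in an `extras` dictionary so hubs can extend the record without changing the core.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Hypothetical core field names; the real model would be agreed on by the hubs.
CORE_FIELDS = ("address", "needs")


@dataclass
class AssessmentRecord:
    """Illustrative common record: a small shared core plus hub-specific extras."""
    site: str
    address: Optional[str] = None   # may be missing on incomplete forms
    needs: Optional[str] = None
    extras: Dict[str, Any] = field(default_factory=dict)


def normalize(site: str, row: Dict[str, Any], mapping: Dict[str, str]) -> AssessmentRecord:
    """Map one site's column names onto the core fields; keep the rest as extras."""
    core: Dict[str, Any] = {}
    extras: Dict[str, Any] = {}
    for column, value in row.items():
        target = mapping.get(column)
        if target in CORE_FIELDS:
            core[target] = value
        else:
            extras[column] = value  # preserved, not discarded
    return AssessmentRecord(site=site, extras=extras, **core)


# Example: one site's form uses its own column names and an extra "Floor" field.
record = normalize(
    "Red Hook",
    {"Street Address": "1 Example St", "Needs": "heat", "Floor": 3},
    {"Street Address": "address", "Needs": "needs"},
)
```

Each site would supply its own `mapping`, so differently designed forms can feed the same aggregated data set while keeping their local detail.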

Privacy in the transformation of the data sets is a concern. The group discussed ways to de-identify and anonymize data, such as reporting by census block, or drawing perimeters around random longitude/latitude points so that each perimeter contains multiple addresses.
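One simple way to approximate this kind of de-identification is sketched below in Python. Note that it swaps in a plain latitude/longitude grid rather than census blocks or random perimeters, and that the function names, the cell size of 0.01 degrees, and the threshold `k` are assumptions for illustration. Each point is snapped to the corner of its grid cell, and cells with fewer than `k` addresses are suppressed so that no published cell stands for a single household.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Cell = Tuple[float, float]


def to_cell(lat: float, lon: float, size: float = 0.01) -> Cell:
    """Snap a point to the south-west corner of its grid cell (size in degrees)."""
    return (
        round(math.floor(lat / size) * size, 6),
        round(math.floor(lon / size) * size, 6),
    )


def aggregate(points: Iterable[Tuple[float, float]], size: float = 0.01, k: int = 3) -> Dict[Cell, int]:
    """Count addresses per cell and drop cells with fewer than k addresses."""
    counts = Counter(to_cell(lat, lon, size) for lat, lon in points)
    return {cell: n for cell, n in counts.items() if n >= k}


# Made-up coordinates: three nearby points share a cell, one point is isolated.
points = [
    (40.6751, -74.0102),
    (40.6759, -74.0111),
    (40.6755, -74.0150),
    (40.5801, -73.8302),
]
cells = aggregate(points, size=0.01, k=3)
```

Here the isolated point is left out of `cells` entirely. A grid is only a rough stand-in: cell size and `k` would need to be chosen with the communities involved, and coarse location alone does not guarantee anonymity when other fields are detailed.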

Volunteers. These are data sets on volunteers from Red Hook and the Rockaways. This group reviewed the data collected, the collection process, and how the data was used, in order to design a more effective system for volunteer skills matching and site placement/scheduling.

Donations and Receipts. An obvious goal would be to compare, dollar for dollar, the responsibility and effectiveness of OccupySandy financial and in-kind donations with those of other networks, but the data appears too incomplete for that. We have data on donations coming from parts of NJ and receipts coming from parts of NY. This group is attempting to collect complementary data sets for analysis.

Locations & Logistics. There are data sets from across the city on locations that were used for a variety of relief work. This group is working to merge the data sets used by the OccupySandy network, assess the data model behind them, and design a more effective location decision support and logistics system for grassroots disaster response, one that also supports better day-to-day action and resource sharing. For now, the plan is to import the consolidated locations data into Sahana, an information management system for disaster relief.

But there’s even more data! Reflecting on the data sets collected from OccupySandy, it is interesting that we neglected to mention the media content produced: hundreds of videos on YouTube, thousands of tweets, and countless DIY wayfinding signs. These are certainly data sets that we can revisit and work on during the hackathon.

The Challenges. The challenges we identified on Saturday include the following. First, to look for ways the data sets can be related to one another, so that sites can be better interfaced to strengthen mutual aid and disaster relief. Second, to identify better ways to safeguard the privacy of personal data, recognizing that data practices around volunteers, donations, and assessments can be rather relaxed when solutions are improvised under extreme conditions. Third, to work out how best to safeguard privacy while recognizing that different “tiers” of access are needed (e.g., volunteers on the ground might need the specific address of a home without heat, but analysts can use more aggregated, de-identified data and still get interesting results).
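The idea of access tiers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tier names `field` and `analyst`, the field names, and the function `view_for` are hypothetical, and a real system would also need authentication and audit logging. Each tier is given an explicit list of fields it may see, and anything not listed is withheld.

```python
from typing import Any, Dict, Set

# Hypothetical tiers and field names, for illustration only.
VISIBLE_FIELDS: Dict[str, Set[str]] = {
    "field": {"address", "neighborhood", "need"},   # volunteers on the ground
    "analyst": {"neighborhood", "need"},            # de-identified view
}


def view_for(record: Dict[str, Any], tier: str) -> Dict[str, Any]:
    """Return only the fields the given tier may see; unknown tiers see nothing."""
    allowed = VISIBLE_FIELDS.get(tier, set())
    return {key: value for key, value in record.items() if key in allowed}


# Made-up record for a home without heat.
record = {"address": "1 Example St", "neighborhood": "Red Hook", "need": "heat"}
field_view = view_for(record, "field")       # includes the address
analyst_view = view_for(record, "analyst")   # address withheld
```

Using an allow-list rather than a block-list means a newly added field stays hidden from every tier until someone deliberately exposes it, which suits data collected under improvised conditions.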

As with previous OccupyData hackathons, two thematic outcomes were expressed. One, to understand what happened by looking at the data and comparing it with that of established response networks such as the American Red Cross. Two, to make and keep the data actionable. While the clearest action agenda for OccupySandy data is to support future grassroots disaster relief networks, reviewing the data and the unique data practices of communities also reveals opportunities to support and synchronize ongoing efforts.