The Data Anywhere Project
The problem: Public data is often available only in scattered pieces, while the aggregated versions are held by companies that charge for access. Other data is free in aggregate form but cannot be queried or accessed live. This project aims to solve both problems, one data set at a time.
Proposed solution: Set up a simple, self-replicating database, along with simple scrapers, on various virtual machines.
Virtual machines are fairly cheap (about $20/mo), and many go unused or underutilized. We’ll be setting up two Linode VPSs at the hackathon, and there will be several ongoing meetups.
What to expect: Participants will receive simple instructions on how to set up and secure a server, and on how to set up databases that maintain and replicate themselves. Examples of how this is done, along with sample code to get you up and running, will be provided.
Immediate goal: Use these servers to aggregate any type of data and make it accessible to the public at large through a simple RESTful web interface.
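As a rough idea of what such an interface might look like, here is a minimal sketch rather than the project's actual code. To stay runnable with no installation, it uses a plain WSGI function from the Python standard library instead of Flask, and an in-memory list instead of MongoDB. The `/records` path, the `dataset` and `date` query parameters, and the sample records are all hypothetical.

```python
import json
from urllib.parse import parse_qs

# Hypothetical sample records standing in for scraped data in a database.
RECORDS = [
    {"dataset": "rainfall", "date": "2013-01-01", "value": 0.4},
    {"dataset": "rainfall", "date": "2013-01-02", "value": 0.0},
    {"dataset": "transit", "date": "2013-01-01", "value": 1520},
]


def query_records(records, dataset=None, date=None):
    """Return the records that match the optional dataset and date filters."""
    return [
        r for r in records
        if (dataset is None or r["dataset"] == dataset)
        and (date is None or r["date"] == date)
    ]


def json_response(start_response, status, payload):
    """Encode payload as JSON and send it with the given HTTP status."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def app(environ, start_response):
    """WSGI app: GET /records?dataset=...&date=... returns matching JSON."""
    is_get = environ.get("REQUEST_METHOD") == "GET"
    if not is_get or environ.get("PATH_INFO") != "/records":
        return json_response(start_response, "404 Not Found",
                             {"error": "not found"})
    params = parse_qs(environ.get("QUERY_STRING", ""))
    dataset = params.get("dataset", [None])[0]
    date = params.get("date", [None])[0]
    return json_response(start_response, "200 OK",
                         query_records(RECORDS, dataset, date))
```

To try it locally, `app` can be passed to `wsgiref.simple_server.make_server("127.0.0.1", 8000, app)` and queried at `/records?dataset=rainfall`. A Flask version would follow the same shape, with the route handler reading from the database instead of a list.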
This has the potential to help many organizations. If one machine is shut down, nothing is lost, since the database will have replicated itself to several other machines.
Ultimate vision: Use the data for a Freakonomics-style analysis that compares seemingly unrelated data sets, chronologically at first, though they could be compared along any index.
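To make the idea of comparing data sets along a shared index concrete, here is a small sketch under stated assumptions. The two data sets, their field names, and the `align_by_key` helper are hypothetical and only illustrate lining up records by date; any other shared field could serve as the key.

```python
def align_by_key(series_a, series_b, key="date"):
    """Pair records from two data sets that share the same key value."""
    # Index the second data set by key so each lookup is a dictionary hit.
    index_b = {row[key]: row for row in series_b}
    return [
        (row, index_b[row[key]])
        for row in series_a
        if row[key] in index_b
    ]


# Hypothetical sample data: daily rainfall and daily transit ridership.
rainfall = [
    {"date": "2013-01-01", "inches": 0.4},
    {"date": "2013-01-02", "inches": 0.0},
    {"date": "2013-01-03", "inches": 1.2},
]
ridership = [
    {"date": "2013-01-01", "riders": 1520},
    {"date": "2013-01-03", "riders": 1310},
]

pairs = align_by_key(rainfall, ridership)
for rain, rides in pairs:
    print(rain["date"], rain["inches"], rides["riders"])
```

Only dates present in both data sets are paired, so gaps in either source simply drop out of the comparison.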
About the instructor: An all-around AWESOME software developer and Linux admin, teaching basic Linux system administration, MongoDB setup and usage, and building a web API with Flask.
Knowledge of Linux and/or Python is helpful but not necessary. Patience and a willingness to learn are much more important.