Intro

Open Big Data was created to seek out and mirror any and all legal, open data sets.

Big Data

Data defines our age. Nearly everything we do produces it. We choose to think of it as an abundant resource to store, access, manage, and study.

Open Data

We believe in a free society built on the whole of modern knowledge. We believe that knowledge empowers people to uplift the human condition.

The era of open government data is just beginning. We can look forward to even larger data sets being released in the future. We view aggregating and organizing those data sets as a public good.

Data Responsibility

The world is awash in data. Bruce Schneier says that data is the pollution of the information age. Long after we are gone the data will remain.

Our aim is to be a public steward of these data sets. We intend to store them indefinitely and to make them available to the public via an API and a GUI.

Data Integrity and Preservation

Historians have pieced together an incredibly rich picture of our history with the bits and pieces that have survived. The data sets of our time tell in vivid detail the story of our lives. Historians of future generations will get a much more accurate picture of our day and age from the data sets that we hope to help preserve.


Though we agree with much of the Guerilla Open Access Manifesto, its ways are not our ways. We aim to be 100% compliant with the law and will provide a way for copyright holders to issue takedown requests.

We will post the copyright notice, if available, at the top of each directory.

Some may find this to be a watered-down stance. Our feeling is that the amount of open and legal data available is so large, and growing daily, that there is room for an organization to focus 100% on it.


Q: If this data is already open and freely available to anyone, why bother mirroring it?

A: Firstly, we unfortunately can't be sure that this climate of open government data will last forever. Even in the best-case scenario, we know that political winds change direction. In the worst-case scenario, governments could remove all open data.

In either case, we believe that a mirror of these data sets is a public asset.

Secondly, though all this data is available, it is spread across the web. Anyone interested would need to know the location of each piece of data and then take the time to seek it all out.

We speed that process up by having it all indexed and available in one place. We believe that the more data sets are aggregated in one place, the greater the chance that connections between them will be made.

Q: Who's behind this?

A: Open Big Data was created and is run by Carlos Cardona.

Thanks for reading! Follow me on Twitter and/or G+ for more.

See something wrong or got something to add/change? Fork this Git repo and send me a pull request.



31 January 2013