Transparency Talk

Beyond Alphabet Soup: 5 Guidelines For Data Sharing
August 29, 2013

(Andy Isaacson is a Forward Deployed Engineer at Palantir Technologies. This post is re-posted from the Markets for Good blog. Please see the accompanying reference document: Open Data Done Right: Five Guidelines – available for download and for you to add your own thoughts and comments.)

The Batcomputer was ingenious. In the 1960s Batman television series, the machine took any input, digested it instantly, and automagically spat out a profound insight or prescient answer – always in the nick of time (watch what happens when Batman feeds it alphabet soup). Sadly, of course, it was fictional. So why do we still cling to the notion that we can feed in just any kind of data and expect revelatory output? As the saying goes, garbage in yields garbage out; so, if we want quality results, we need to begin with high quality input. Open Data initiatives promise just such a rich foundation.

Presented with a thorny problem, any single data source is a great start – it gives you one facet of the challenge ahead. However, to paint a rich analytical picture with data, to solve a truly testing problem, you need as many other facets as you can muster. You can often get these by taking openly available data sets and integrating them with your original source. This is why the Open Data movement is so exciting. It fills in the blanks that lead us to critical insights: informing disaster relief efforts with up-to-the-minute weather data, augmenting agricultural surveys with soil sample data, or predicting the best locations for Internally Displaced Persons camps using rainfall data.
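As a rough illustration of what integrating an open data set with your original source can look like, here is a minimal Python sketch. The records, the field names (plot_id, yield_t, soil_ph) and the left_join helper are all hypothetical, invented for this example; real integration work usually also involves cleaning and reconciling keys.

```python
# Hypothetical records: an agricultural survey and an openly released soil data set.
survey = [
    {"plot_id": "P1", "crop": "maize", "yield_t": 3.2},
    {"plot_id": "P2", "crop": "wheat", "yield_t": 2.1},
]
soil = [
    {"plot_id": "P1", "soil_ph": 6.4},
    {"plot_id": "P3", "soil_ph": 5.1},
]


def left_join(primary, extra, key):
    """Add fields from `extra` to each `primary` row that shares the same key value."""
    index = {row[key]: row for row in extra}
    # Rows with no match in `extra` are kept unchanged.
    return [{**row, **index.get(row[key], {})} for row in primary]


joined = left_join(survey, soil, "plot_id")
for row in joined:
    print(row)
```

The join only works because both sources carry the same plot identifier; a shared, well-documented key is what lets a second data set add a new facet to the first.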

High quality, freely available data means hackers everywhere, from Haiti to Hurricane Sandy, are now building the kinds of analytical tools we need to solve the world’s hardest problems. But great tools and widely-released data aren’t the end of the story.

At Palantir, we believe that with great data comes great responsibility, both to make the information usable, and also to protect the privacy and civil liberties of the people involved. Too often, we are confronted with data that’s been released in a haphazard way, making it nearly impossible to work with. Thankfully, I’ve got one of the best engineering teams in the world backing me up – there’s almost nothing we can’t handle. But Palantir engineers are data integration and analysis pros – and Open Data isn’t about catering to us.

It is, or should be, about the democratization of data, allowing anybody on the web to extract, synthesize, and build from raw materials – and effect change. In a recent talk to a G-8 Summit on Open Data for Agriculture, I outlined the ways we can help make this happen:

#1 – Release structured raw data others can use

#2 – Make your data machine-readable

#3 – Make your data human-readable

#4 – Use an open-data format

#5 – Release responsibly and plan ahead

Abbreviated explanations below. Download the full version here: Open Data, Done Right: Five Guidelines.

#1 – Release structured raw data others can use

One of the most productive side effects of data collection is being able to re-purpose a set collected for one goal and use it towards a new end. This solution-focused effort is at the heart of Open Data. One person solves one problem; someone else takes the exact same dataset and re-aggregates, re-correlates, and remixes it into novel and more powerful work. When data is captured thoroughly and published well, it can be used and re-used in the future too; it will have staying power.

Release data in a raw, structured way – think a table of individual values rather than words – to enable its best use, and re-use.
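To make the point about re-use concrete, here is a small Python sketch, assuming a hypothetical raw release with one row per observation (the column names and figures are invented). Because the values are published individually rather than as a prose summary, the same rows can be re-aggregated along whichever column a new user cares about.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw release: one row per individual observation.
RAW_CSV = """region,crop,year,yield_tonnes
North,maize,2012,120
North,wheat,2012,80
South,maize,2012,95
South,maize,2013,105
"""


def total_by(rows, key):
    """Sum yield_tonnes grouped by any column name."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row["yield_tonnes"])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
by_region = total_by(rows, "region")  # one user's question
by_crop = total_by(rows, "crop")      # someone else's question, same data
print(by_region)
print(by_crop)
```

Had the publisher released only a pre-computed total per region, the second question could not be answered at all.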

#2 – Make your data machine-readable

Once structured, raw data points are integrated into an analysis tool (like one of the Palantir platforms), a machine needs to know how to pick apart the individual pieces.

Even if the data is structured and machine-readable, building tools to extract the relevant bits takes time, so another aspect of this rule is that a dataset’s structure should be consistent from one release to the next. Unless there’s a really good reason to change it, next month’s data should be in the exact same format as this month’s, so that the same extraction tools can be used again and again.

Use machine-readable, structured formats like CSV, XML, or JSON to allow the computer to easily parse the structure of data, now and in future.
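One way a publisher might guard against accidental format drift between releases is to check each new file's header against the expected columns before publishing. The sketch below is only an illustration: the schema, the sample releases and the check_release function are assumptions made up for this example.

```python
import csv
import io

# Hypothetical schema that every monthly release is expected to follow.
EXPECTED_COLUMNS = ["station_id", "date", "rainfall_mm"]


def check_release(csv_text, expected=EXPECTED_COLUMNS):
    """Return a list of header problems; an empty list means the format is unchanged."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    problems = []
    missing = [name for name in expected if name not in header]
    extra = [name for name in header if name not in expected]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if not problems and header != expected:
        problems.append("columns reordered")
    return problems


this_month = "station_id,date,rainfall_mm\nA1,2013-08-01,4.2\n"
next_month = "station,date,rainfall_mm\nA1,2013-09-01,0.0\n"  # a column was renamed

print(check_release(this_month))
print(check_release(next_month))
```

A renamed column like the one in the second release is exactly the kind of small change that silently breaks every consumer's extraction tool.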

#3 – Make your data human-readable

Now that the data can be fed into an analysis tool, it is vital for humans, as well as machines, to understand what it actually means. This is where PDFs come in handy. They are an awful format for a data release as they can be baffling for automatic extraction programs. But, as documentation, they can explain the data clearly to those who are using it.

Assume nothing – document and explain your data as if the reader has no context.
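Documentation does not have to be written entirely by hand. As a sketch, a publisher could keep a small data dictionary next to the data and generate the human-readable description from it, while also checking that no column goes unexplained. The dictionary entries and both helper functions below are hypothetical, written only for this example.

```python
# Hypothetical data dictionary: one entry per column in the release.
DATA_DICTIONARY = [
    {"column": "station_id", "unit": "", "description": "Identifier of the weather station."},
    {"column": "date", "unit": "ISO 8601", "description": "Day the reading was taken."},
    {"column": "rainfall_mm", "unit": "millimetres", "description": "Total rainfall for that day."},
]


def render_docs(dictionary):
    """Render the dictionary as plain text suitable for a README or a PDF."""
    lines = []
    for field in dictionary:
        unit = f" [{field['unit']}]" if field["unit"] else ""
        lines.append(f"{field['column']}{unit}: {field['description']}")
    return "\n".join(lines)


def undocumented_columns(header, dictionary):
    """List the columns in a release that the dictionary does not explain."""
    documented = {field["column"] for field in dictionary}
    return [name for name in header if name not in documented]


print(render_docs(DATA_DICTIONARY))
print(undocumented_columns(["station_id", "date", "rainfall_mm", "qc_flag"], DATA_DICTIONARY))
```

The second check flags qc_flag, a column a reader with no context would otherwise have to guess at.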

#4 – Use an open-data format

Proprietary data formats are fine for internal use, but don’t force them on the world. Prefer CSV files to Excel, KMLs to SHPs, and XML or JSON to database dumps. It might sound overly simplistic, but you never know what programming ecosystem your data consumers will favor, so plainness and openness are key.

Choose to make data as simple and available as possible: When releasing it to the world, use an open data format.
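Part of the appeal of open formats is that nearly every language can read and write them without extra software. As an illustration, the following Python sketch exports the same (hypothetical) records as both CSV and JSON using only the standard library; the record contents and function names are invented for this example.

```python
import csv
import io
import json

# Hypothetical records to publish.
records = [
    {"site": "A", "ph": 6.5},
    {"site": "B", "ph": 7.1},
]


def to_csv(rows):
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize the same rows as indented JSON text."""
    return json.dumps(rows, indent=2)


print(to_csv(records))
print(to_json(records))
```

Either output can be opened in a text editor, a spreadsheet, or a parser in almost any programming ecosystem, which is not true of a proprietary binary file.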

#5 – Release responsibly and plan ahead

Now that the data is structured, documented, and open, it needs to be released to the world. Simply posting files on a website is a good start, but we can do better, for example by offering a REST API.

Measures that protect privacy and civil liberties are hugely important in any release of data. Beyond simply keeping things up-to-date, programmatic API access to your data allows you to go to the next level of data responsibility. By knowing who is requesting the data, you can implement audit logging and access controls, understanding what was accessed when and by whom, and limiting exposure of any possibly sensitive information to just the select few that need to see it.

Allow API access to data, to responsibly provide consumers the latest information – perpetually.
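The access control and audit logging described above can be sketched without committing to any particular web framework. In the Python example below, the API keys, dataset names, permission table and get_dataset function are all hypothetical; a real service would sit behind an HTTP layer, authenticate keys properly and store its log durably.

```python
from datetime import datetime, timezone

# Hypothetical permission table: which API key may read which dataset.
PERMISSIONS = {
    "key-public": {"rainfall"},
    "key-trusted": {"rainfall", "household_survey"},
}
# Hypothetical datasets; the survey stands in for possibly sensitive information.
DATASETS = {
    "rainfall": [{"station_id": "A1", "rainfall_mm": 4.2}],
    "household_survey": [{"household": "H17", "members": 5}],
}
AUDIT_LOG = []


def get_dataset(api_key, name):
    """Return a dataset if the key is allowed to see it, logging every attempt."""
    allowed = name in PERMISSIONS.get(api_key, set())
    AUDIT_LOG.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "who": api_key,
        "what": name,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{api_key!r} may not read {name!r}")
    return DATASETS[name]


print(get_dataset("key-public", "rainfall"))
try:
    get_dataset("key-public", "household_survey")
except PermissionError as error:
    print("refused:", error)
print(len(AUDIT_LOG), "requests logged")
```

Note that the refused request is logged as well, so the publisher can see what was attempted, when, and by whom, not only what was served.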

...

These guidelines seem simple, almost too simple. You might wonder why in this high tech world we need to keep things so basic when we have an abundance of technological solutions to overcome data complexity.

Sure, it’s all theoretically possible. However, in practice, anybody working with these technologies knows that they can be brittle, inaccurate, and labor intensive. Batman’s engineers can pull off extracting data from pasta, but for the rest of us, relying on heroic efforts means a massive, unnecessary time commitment – time taken away from achieving the fundamental goal: rapid, actionable insight to solve the problem.

There’s no magic wand here, but there are some simple steps to make sure we can share data easily, safely and effectively. As a community of data consumers and providers, together we can make the decisions that will make Open Data work.

-- Andy Isaacson

Glasspockets Find: 2012 Annual Letter from Bill Gates
February 2, 2012

Bill Gates speaks candidly about his work at the Bill & Melinda Gates Foundation in his fourth Annual Letter. As a tool for transparency, the letter is a unique glimpse into the mind of a foundation donor, revealing Gates' critical thinking with respect to the foundation's activity, what has worked, what setbacks have been encountered, and what lessons have been learned by the foundation and its partners and grantees. The need for innovation continues to be a central element to his thinking. This year's letter is "an argument for making the choice to keep on helping extremely poor people build self-sufficiency." The foundation will continue to encourage innovation in areas, including agriculture and public health, "where there is less profit opportunity but where the impact for those in need is very high."

Gates devotes a significant portion of this year's annual letter to innovation in agriculture. This is clearly an area that he believes holds great promise to improve the lives of billions of people in a relatively short period of time with rather modest commitments of resources. He cites many reasons for optimism, including exciting new understanding of plant genes that should greatly accelerate the pace of agricultural innovation.

Most of the foundation's resources go to global health issues. He shares many positive developments in this area, including a milestone in the fight to eradicate polio: on January 13, 2012, India marked its first anniversary of being polio-free. This was a huge accomplishment, calling for the coordination of many players. The effort reveals many lessons that will hopefully lead to successful campaigns in the three countries where the virus remains endemic: Afghanistan, Nigeria and Pakistan.

The foundation's domestic work focuses on U.S. education. Here, Gates is impressed by the technique of peer evaluation among teachers that has been tested in the Tampa, Florida, school district and hopes it may serve as a model that can be replicated. Interestingly, the concept of learning from one's peers arises again when Gates later discusses the first of what will be an annual meeting of those who have taken the Giving Pledge. He would like to focus attention on how the web can be used to allow "givers of all sizes to connect to causes and see the results of their giving."

One of the perennial challenges that Gates admits facing is the common belief that development money is wasteful or doesn't produce lasting results. But he is "convinced that when people hear stories of the lives they've helped improve, they want to do more, not less." Given this, Gates attempts to put into perspective the news that some of the money provided to the Global Fund to Fight AIDS, Tuberculosis and Malaria was diverted for corrupt purposes. The Gates Foundation is the largest non-governmental donor to the Global Fund.

Gates concludes by making a plea for continued funding from the world's wealthiest nations, even in challenging economic times, for development that benefits the world's poorest. A "relatively small amount of money invested in development," in his words, "has changed the future prospects of billions of people – and it can do the same for billions more if we make the choice to continue investing in innovation."

To read or download the letter (available in Arabic, Chinese, English, French, German, and Spanish), click here.

Those interested may send feedback about the annual letter to annualletter@gatesfoundation.org.

Tweet using #billsletter to join the conversation. Follow Bill Gates on Twitter: @BillGates.

-- Mark Foley

About Transparency Talk

  • Transparency Talk, the Glasspockets blog, is a platform for candid and constructive conversation about foundation transparency and accountability. In this space, Foundation Center highlights strategies, findings, and best practices on the web and in foundations–illuminating the importance of having "glass pockets."

    The views expressed in this blog do not necessarily reflect the views of the Foundation Center.

    Questions and comments may be directed to:

    Janet Camarena
    Director, Transparency Initiatives
    Foundation Center

    If you are interested in being a guest contributor, contact:
    glasspockets@foundationcenter.org
