The GOOD Data framework: to share data with care

They say “sharing is caring”. Indeed it is, but only if done the proper way: sharing data that people can benefit from with the least hassle possible. We deem data to be so if it is well described, clean, and constantly available. GOOD therefore proposes four principles for a successful data-sharing experience on the Web.

G for Guided

Like the user guide of a piece of software or a product, data has to be accompanied by some information (metadata) that helps users get a sense of what it is and what it contains. This includes its description, its format and structure (e.g., a meaningful header in a CSV file), who provided it, when it was created, published and/or modified, which version it is if it evolved from a previous one, etc.
> Data that people cannot grasp or trust is NOT GOOD data.
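To make the idea of accompanying metadata a bit more concrete, here is a minimal sketch in Python. It assumes a tiny CSV dataset and a made-up metadata layout; the function name `describe_csv`, the field names, and the sample values are all hypothetical and only mirror the items listed above (description, format, structure, provider, dates, version). Established metadata vocabularies exist and would be preferable in practice.

```python
import csv
import io
import json
from datetime import date


def describe_csv(csv_text, description, provider, version, created, modified):
    """Build a small metadata record for a CSV dataset (illustrative schema)."""
    # The CSV header doubles as a description of the data's structure.
    header = next(csv.reader(io.StringIO(csv_text)))
    return {
        "description": description,
        "format": "text/csv",
        "columns": header,
        "provider": provider,
        "version": version,
        "created": created.isoformat(),
        "modified": modified.isoformat(),
    }


# Toy data and toy values, invented purely for illustration.
sample = "city,year,population\nExampleville,2020,1000\n"
metadata = describe_csv(
    sample,
    description="Population per city and year (toy example)",
    provider="Example Data Office",
    version="1.1",
    created=date(2020, 1, 15),
    modified=date(2020, 6, 1),
)

# The record could be published next to the data, e.g. as a JSON file.
print(json.dumps(metadata, indent=2))
```

Publishing such a record alongside the file lets a user see what the dataset is before downloading and opening it.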

O for Open 

Publishing open data is a pretty old topic around which plenty of standards and best practices have been established. Data is badly needed to fight diseases, preserve health and well-being, empower education and equality, improve mobility and circulation, fuel research and science in whatever form, etc.
> Blocked or locked data can be good for its owners or privileged users, but it is NOT GOOD for the rest of the world.

O for Optimized

We refer here to data that is at its highest POSSIBLE level of readiness for use. It has to be clean, clear, uniform, and simple to use.
> Data that forces the user to go through stressful data transformation pipelines just to make it ready for first contact is NOT GOOD data.
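As an illustration of the kind of preparation a publisher can do once so that every user does not have to, here is a small Python sketch. The cleaning rules (uniform header names, trimmed cells, one single marker for missing values, no blank rows) are assumptions chosen for the example, and `clean_csv` and `MISSING` are hypothetical names; what "clean" means will depend on the dataset.

```python
import csv
import io

# Spellings that this sketch treats as "missing value" (an assumption).
MISSING = {"", "na", "n/a", "null", "-"}


def clean_csv(raw_text):
    """Return (header, rows) with uniform column names and trimmed cells."""
    reader = csv.reader(io.StringIO(raw_text))
    # Uniform header: trimmed, lower-case, underscores instead of spaces.
    header = [h.strip().lower().replace(" ", "_") for h in next(reader)]
    rows = []
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip rows that carry no data at all
        cells = [cell.strip() for cell in row]
        # Map every missing-value spelling to one marker: the empty string.
        cells = ["" if cell.lower() in MISSING else cell for cell in cells]
        rows.append(dict(zip(header, cells)))
    return header, rows


# Toy input, invented for illustration.
raw = " City , Pop Count \n Exampleville , N/A \n,\nSampletown,10\n"
header, rows = clean_csv(raw)
print(header)
print(rows)
```

The point is not these particular rules but that they are applied before publication, so the shared file is ready for first contact.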

D for Durable

More often than not, data is made available online without a long-term monitoring plan. The server hosting the data can go down, references/aliases can break, the database server storing the metadata can go offline, etc. These are all phenomena most of us have faced at some point. Data shared on the Web has to be supported by long-term monitoring, such as mirroring downloads across several CDNs and setting up health-check notifications on every hosting server involved. We do not mean here data that goes ‘permanently’ unavailable, which is non-data; we rather refer to data that goes offline temporarily.
> Like a friend, data that is not there when you need it is NOT GOOD data.
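A health check with a fallback to mirrors can be sketched in a few lines of Python using only the standard library. This is a rough illustration, not a monitoring solution: the function names `is_up` and `first_available` are hypothetical, the mirror URLs are placeholders, and a real setup would also run the check on a schedule and notify someone when it fails.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=5.0):
    """Return True if a HEAD request to the URL succeeds without an error."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # HTTP errors, network failures, timeouts, and malformed URLs
        # all count as "not available" here.
        return False


def first_available(mirrors, check=is_up):
    """Return the first mirror that passes the check, or None if all fail."""
    for url in mirrors:
        if check(url):
            return url
    return None


# Placeholder mirror URLs (assumptions, not real download locations).
mirrors = [
    "https://mirror-a.example.org/data.csv",
    "https://mirror-b.example.org/data.csv",
]

# Offline demonstration with a stubbed check: pretend only mirror B is up.
chosen = first_available(mirrors, check=lambda url: "mirror-b" in url)
print(chosen)
```

Passing the check as a parameter keeps the fallback logic testable without network access; in real use, `first_available(mirrors)` would probe each mirror with `is_up`.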

It is worth adding that:

  • GOOD advocates sharing data at the highest level of openness possible, so usage licences are considered out of scope.
  • GOOD looks more into the technical side of data sharing than the legal one. This is another reason why aspects like licensing and privacy are not considered, without underestimating their importance in any way.
  • GOOD suggests four high-level, simple and memorable requirements for beneficial data sharing on the Web. Each requirement can, of course, have many sub-requirements underneath. We leave diving into the details to interested readers, and refer to some pointers that can be useful: link 1, link 2.


