#103 WIP 5 levels #104
Closed
8 commits
d7537e4 #103 WIP 5 levels (joepio)
2a6bb66 #103 add verifiable data (joepio)
9a4cf79 #103 add class to type-safety (joepio)
5226f11 #103 add level 0, proprietary data (joepio)
5c8040d #103 CC fix (joepio)
b1d053c #103 open database license (joepio)
dbfc511 Spell, re-use (joepio)
46ecf1f #103 less politics (joepio)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
# 5 Levels of data reusability | ||
|
||
Not all data are created equal. | ||
There are notable differences in how much you can do with data and how much effort it takes. | ||
The more reusable data is, the easier it will be to use it as a developer, researcher or other type of data user. | ||
Reusability is about being able to transform, sort, query, serialize, modify, render and audit data without requiring too much work. | ||
|
||
_This list is inspired by Tim Berners-Lee's [5-star open data](https://5stardata.info/en/)_. | ||
|
||
## Level 0: proprietary data | ||
|
||
If you don't give others the _rights_ to read, use or modify your data, its reusability is zero. | ||
|
||
That's why it's important to have a _license_ that allows others to use your data, like the [Open Database License](https://opendatacommons.org/licenses/odbl/summary/) | ||
or one of the Creative Commons licenses. | ||
|
||
It's also important to use _open formats_ (such as `CSV`, `JSON` or `PNG`), instead of _proprietary formats_ (tied to specific vendors, such as `PSD` or `RAR`). | ||
|
||
|
||
## Level 1: unstructured data | ||
|
||
_Examples: images, videos, plain text_ | ||
|
||
Unstructured data is the least usable. | ||
Humans can read it, and AI / Machine Learning systems can draw more conclusions from it than ever, | ||
but it's hard to build an actual application or graphic from only unstructured data. | ||
|
||
``` | ||
Hi! I'm Joep, I'm born in 1991. | ||
``` | ||
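
To make the difficulty concrete, here is a minimal Python sketch that tries to pull a name and birth year out of the sentence above. The regular expression and the `extract` helper are hypothetical and only work for this exact phrasing, which is the point: a slightly different sentence yields nothing.

```python
import re

text = "Hi! I'm Joep, I'm born in 1991."

# Hypothetical pattern: assumes the name follows "I'm" and the year follows "born in".
PATTERN = re.compile(r"I'm (?P<name>\w+),.*born in (?P<year>\d{4})")


def extract(sentence: str):
    """Return a small record if the sentence matches the assumed phrasing, else None."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return {"name": match["name"], "birthYear": int(match["year"])}


print(extract(text))
# The same facts, phrased differently, are no longer found.
print(extract("Joep here, born 1991."))
```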
|
||
## Level 2: structured data | ||
|
||
_Examples: CSV, XML, JSON, TOML, Excel_ | ||
|
||
Structured data can be read by machines, and this allows us to do all sorts of useful things. | ||
We can _query_, _sort_ and _filter_. | ||
But still, this type of data often requires human input when it needs to be processed. | ||
And we don't have guarantees about which fields will be filled, or what their datatypes are. | ||
One time, a `birthYear` can be a string, and the next time it can be a number. | ||
Data can be _structured_, but still _unpredictable_. | ||
|
||
```json | ||
{ | ||
"name": "Joep", | ||
"birthYear": 1991 | ||
} | ||
``` | ||
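
The unpredictability described above can be shown with a short Python sketch. The extra records (`Ada`, `Unknown`) are made-up sample data, and `try_sort_by_birth_year` is a hypothetical helper. Every record is valid JSON, yet a simple sort fails as soon as a field is missing or changes type.

```python
import json

# Made-up sample data: all valid JSON, but birthYear is a number, a string, or absent.
records = json.loads("""
[
  {"name": "Joep", "birthYear": 1991},
  {"name": "Ada", "birthYear": "1815"},
  {"name": "Unknown"}
]
""")


def try_sort_by_birth_year(items):
    """Sort by birthYear, or report which kind of error made sorting impossible."""
    try:
        return sorted(items, key=lambda item: item["birthYear"])
    except (KeyError, TypeError) as error:
        return f"cannot sort: {type(error).__name__}"


print(try_sort_by_birth_year(records))      # a record has no birthYear
print(try_sort_by_birth_year(records[:2]))  # a number is compared with a string
```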
|
||
If we want predictability, we need to make it _type-safe_. | ||
|
||
## Level 3: type-safe data | ||
|
||
_Examples: SQL + DB schema, JSON + JSON Schema, XML + XSD, RDF + SHACL, in-memory data in type-safe programming languages_ | ||
|
||
Type-safe data means that every value of the data has an explicit datatype. | ||
It is _strongly typed_ and has a clear _schema_ that describes which properties you can expect in a Resource. | ||
This means that someone re-using type-safe data can know for certain that it conforms to a specification, a set of rules. | ||
The shape of the data is _predictable_. | ||
This predictability means that developers can safely re-use it in their system without worrying about missing fields or datatype errors. | ||
|
||
Lots of software has _internal_ type safety, especially if you use type-safe programming languages like TypeScript, Kotlin or Rust. | ||
However, when the data _leaves the system_, a lot of type-related information is lost. | ||
Even if this schema-related information is described, the schema itself is often not machine-readable. | ||
The best way to have type-safe data is to describe the schema in a machine-readable format. | ||
|
||
In SQL, we can use a DB schema. In JSON, we can add a JSON Schema file. For XML, we have XSD. | ||
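
As a rough illustration of what a machine-readable schema buys you, the Python sketch below checks a record against a tiny hand-written schema. The schema format, `PERSON_SCHEMA` and `validate` are hypothetical and far simpler than JSON Schema, XSD or a DB schema. They only show the idea that a program, rather than a person, decides whether data conforms.

```python
# Hypothetical mini-schema: field name -> (expected Python type, required?).
# This is not JSON Schema, only an illustration of the idea.
PERSON_SCHEMA = {
    "name": (str, True),
    "birthYear": (int, True),
    "worksOn": (str, False),
}


def validate(record: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, (expected_type, required) in schema.items():
        if field not in record:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        value = record[field]
        # Exact type match, so that True/False are not accepted as integers.
        if type(value) is not expected_type:
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


print(validate({"name": "Joep", "birthYear": 1991}, PERSON_SCHEMA))
print(validate({"name": "Joep", "birthYear": "1991"}, PERSON_SCHEMA))
```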
|
||
In Atomic Data, the Properties themselves (the links in the keys in JSON-AD) describe the required datatypes, which helps developers who reuse the data understand what to expect from a value. | ||
|
||
```json | ||
{ | ||
"https://atomicdata.dev/properties/isA": ["https://atomicdata.dev/classes/Agent"], | ||
"https://atomicdata.dev/properties/name": "Joep", | ||
"https://atomicdata.dev/properties/birthYear": 1991, | ||
"https://atomicdata.dev/properties/worksOn": "Atomic Data" | ||
} | ||
``` | ||
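
The sketch below illustrates how a key that is itself a Property link can determine the datatype of its value. It is not the Atomic Data implementation: `PROPERTY_DATATYPES` is a hypothetical local table standing in for definitions that would normally be looked up via the property URLs, and the datatype names in it are assumptions made for this example.

```python
# Hypothetical local copy of Property definitions: property URL -> datatype name.
# In practice this information would come from resolving the property URLs.
PROPERTY_DATATYPES = {
    "https://atomicdata.dev/properties/isA": "resourceArray",
    "https://atomicdata.dev/properties/name": "string",
    "https://atomicdata.dev/properties/birthYear": "integer",
    "https://atomicdata.dev/properties/worksOn": "string",
}

# One checker per (assumed) datatype name.
CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "resourceArray": lambda v: isinstance(v, list) and all(isinstance(i, str) for i in v),
}


def type_errors(resource: dict) -> list:
    """Return one message per value that does not match its property's datatype."""
    errors = []
    for property_url, value in resource.items():
        datatype = PROPERTY_DATATYPES.get(property_url)
        if datatype is None:
            errors.append(f"{property_url}: unknown property")
        elif not CHECKS[datatype](value):
            errors.append(f"{property_url}: expected {datatype}")
    return errors


resource = {
    "https://atomicdata.dev/properties/isA": ["https://atomicdata.dev/classes/Agent"],
    "https://atomicdata.dev/properties/name": "Joep",
    "https://atomicdata.dev/properties/birthYear": 1991,
    "https://atomicdata.dev/properties/worksOn": "Atomic Data",
}
print(type_errors(resource))
```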
|
||
## Level 4: browsable data | ||
|
||
_Examples: Atomic Data, properly hosted RDF_ | ||
|
||
If your data is _connected_ to other pieces of machine-readable data, it becomes browsable, similar to how websites link to each other. | ||
This effectively creates a _web of data_, and allows for a whole new way to think about the internet. | ||
This is what makes decentralized applications and true data ownership possible. | ||
|
||
```json | ||
{ | ||
"https://atomicdata.dev/properties/isA": ["https://atomicdata.dev/classes/Agent"], | ||
"https://atomicdata.dev/properties/name": "Joep", | ||
"https://atomicdata.dev/properties/birthYear": 1991, | ||
"https://atomicdata.dev/properties/worksOn": "https://atomicdata.dev" | ||
} | ||
``` | ||
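
Browsing means that a program can follow a link in one resource to reach another. The Python sketch below simulates this with an in-memory dictionary instead of real HTTP requests. The `STORE` contents, the subject URL `https://example.com/people/joep`, and the `fetch` and `follow` helpers are all hypothetical; a real client would fetch each URL over the network.

```python
NAME = "https://atomicdata.dev/properties/name"
WORKS_ON = "https://atomicdata.dev/properties/worksOn"

# Hypothetical in-memory "web": URL -> resource. Contents are illustrative only.
STORE = {
    "https://example.com/people/joep": {
        NAME: "Joep",
        WORKS_ON: "https://atomicdata.dev",
    },
    "https://atomicdata.dev": {
        NAME: "Atomic Data",
    },
}


def fetch(url: str) -> dict:
    """Stand-in for an HTTP request that returns the resource at a URL."""
    return STORE[url]


def follow(subject_url: str, *property_urls: str) -> dict:
    """Start at a resource and hop through one property link per step."""
    resource = fetch(subject_url)
    for property_url in property_urls:
        resource = fetch(resource[property_url])
    return resource


# Because worksOn is a link rather than a plain string, we can look up the project itself.
project = follow("https://example.com/people/joep", WORKS_ON)
print(project[NAME])
```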
|
||
## Level 5: verifiable data | ||
|
||
_Examples: Atomic Data + Atomic Commits_ | ||
|
||
When your data is _verifiable_, other people can verify who created and modified it. | ||
They can use cryptography to validate signatures, which proves that a specific person or machine created a piece of data. | ||
|
||
```json | ||
{ | ||
"https://atomicdata.dev/properties/isA": ["https://atomicdata.dev/classes/Agent"], | ||
"https://atomicdata.dev/properties/name": "Joep", | ||
"https://atomicdata.dev/properties/birthYear": 1991, | ||
"https://atomicdata.dev/properties/worksOn": "https://atomicdata.dev", | ||
"https://atomicdata.dev/properties/previousCommit": "https://atomicdata.dev/commits/EF18751AE781" | ||
} | ||
``` |
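
The general shape of signing and verifying can be sketched with Python's standard library. Note the substitution: this example uses HMAC-SHA256 with a shared secret as a stand-in, not the public-key signatures that a scheme like Atomic Commits relies on, so here the verifier needs the same secret as the signer. The helper names and the secret are hypothetical. What carries over is the principle: serialize the data deterministically, sign it, and any later modification makes verification fail.

```python
import hashlib
import hmac
import json


def canonical_bytes(resource: dict) -> bytes:
    """Serialize deterministically so the same data always yields the same bytes."""
    return json.dumps(resource, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(resource: dict, secret: bytes) -> str:
    """HMAC-SHA256 stand-in for a real public-key signature."""
    return hmac.new(secret, canonical_bytes(resource), hashlib.sha256).hexdigest()


def verify(resource: dict, signature: str, secret: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(resource, secret), signature)


BIRTH_YEAR = "https://atomicdata.dev/properties/birthYear"
secret = b"demo-secret-not-for-real-use"  # hypothetical key, for illustration only
resource = {
    "https://atomicdata.dev/properties/name": "Joep",
    BIRTH_YEAR: 1991,
}
signature = sign(resource, secret)

# Changing a single value invalidates the signature.
tampered = {**resource, BIRTH_YEAR: 1990}
print(verify(resource, signature, secret))
print(verify(tampered, signature, secret))
```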
drop the trailing dot here (or capitalize first letter in following line).