diff --git a/.github/workflows/blank.yml b/.github/workflows/deploy.yml similarity index 74% rename from .github/workflows/blank.yml rename to .github/workflows/deploy.yml index e38898f..f28b1da 100644 --- a/.github/workflows/blank.yml +++ b/.github/workflows/deploy.yml @@ -1,21 +1,24 @@ name: github pages +permissions: write-all + on: push: branches: - - master + - main jobs: deploy: - runs-on: ubuntu-18.04 + runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - name: Setup mdBook uses: peaceiris/actions-mdbook@v1 with: - mdbook-version: '0.3.7' - # mdbook-version: 'latest' + mdbook-version: 'latest' + + - run: cargo install mdbook-open-on-gh - run: mdbook build diff --git a/.gitignore b/.gitignore index 5a0bf03..41d4f2a 100644 --- a/.gitignore +++ b/.gitignore @@ -1 +1,2 @@ /book +.DS_Store diff --git a/.vscode/tasks.json b/.vscode/tasks.json new file mode 100644 index 0000000..d57ebc6 --- /dev/null +++ b/.vscode/tasks.json @@ -0,0 +1,10 @@ +{ + "version": "2.0.0", + "tasks": [ + { + "type": "shell", + "label": "docs atomic data (mdbook serve)", + "command": "mdbook serve", + } + ] +} diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..cf62a6d --- /dev/null +++ b/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2022 Joep Meindertsma + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. 
+ +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/README.md b/README.md index d4a85f6..2117450 100644 --- a/README.md +++ b/README.md @@ -1,31 +1 @@ -![Atomic Data](src/assets/atomic_data_logo_stroke.svg) - -# `atomic-data-docs` - -_Atomic Data is a specification for sharing, modifying and modeling graph data._ - -View it on [docs.atomicdata.dev](https://docs.atomicdata.dev). -If you're looking for an (early) implementation of Atomic data, check out [atomic](https://github.com/joepio/atomic) (server + cli + lib) and [atomic-data-browser](https://github.com/joepio/atomic-data-browser) (react / typescript). - -## About this repo - -This repository holds the markdown book for the Atomic Data standard. - -You can run it locally using [mdBook](https://github.com/rust-lang/mdBook) - -```sh -# This requires at least Rust 1.39 and Cargo to be installed. Once you have installed Rust, type the following in the terminal: -cargo install mdbook -# Install mdbook-linkcheck to prevent broken links in your markdown. -cargo install mdbook-linkcheck -# Serve at localhost:3000, updates when files change. -mdbook serve -``` - -Publishing is done with Github actions - simply push the master branch. - -## Contributing - -Add an issue or open a PR! -All thoughts are welcome. -Also, check out the [Discord](https://discord.gg/a72Rv2P). 
+# MOVED TO [ATOMIC-SERVER REPO](https://github.com/atomicdata-dev/atomic-server/tree/develop/docs) diff --git a/book.toml b/book.toml index af3a367..3848b54 100644 --- a/book.toml +++ b/book.toml @@ -5,8 +5,15 @@ description = "Documentation for the Atomic Data standard." language = "en" [output.html] -google-analytics = "UA-121994595-2" -git-repository-url = "https://github.com/ontola/atomic-data" git-repository-icon = "fa-github" +git-repository-url = "https://github.com/atomicdata-dev/atomic-data-docs" +additional-css = ["open-in.css"] [output.linkcheck] +optional = true + +[preprocessor.open-on-gh] +command = "mdbook-open-on-gh" +renderer = ["html"] +text = "mdbook-open-on-gh" +open-on-text = "[Suggest edits for this page on GitHub.]" diff --git a/open-in.css b/open-in.css new file mode 100644 index 0000000..7c46d76 --- /dev/null +++ b/open-in.css @@ -0,0 +1,6 @@ +footer { + font-size: 0.8em; + text-align: center; + border-top: 1px solid black; + padding: 5px 0; +} diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 55dc8ee..25f3a52 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -2,39 +2,67 @@ * [Atomic Data Overview](atomic-data-overview.md) * [Motivation](motivation.md) + * [Strategy, history and roadmap](roadmap.md) * [When (not) to use it](when-to-use.md) -* [Core](core/intro.md) - * [Concepts](core/concepts.md) + +# Specification (core) + +* [What is Atomic Data?](core/concepts.md) * [Serialization](core/serialization.md) * [JSON-AD](core/json-ad.md) * [Querying](core/querying.md) * [Paths](core/paths.md) -* [Schema](schema/intro.md) - * [Classes](schema/classes.md) - * [Datatypes](schema/datatypes.md) - * [Translations](schema/translations.md) - * [FAQ](schema/faq.md) -* [Collections, filtering, sorting](schema/collections.md) -* [Agents and authentication](agents.md) -* [Invitations and sharing](invitations.md) -* [Commits (writing data)](commits/intro.md) - * [Concepts](commits/concepts.md) - * [Compared to](commits/compare.md) -* [Hierarchy and 
authorization](hierarchy.md) -* [Endpoints](endpoints.md) -* [Interoperability](interoperability/intro.md) + * [Schema](schema/intro.md) + * [Classes](schema/classes.md) + * [Datatypes](schema/datatypes.md) + * [FAQ](schema/faq.md) + +# Specification (extended) + +* [Atomic Data Extended](extended.md) + * [Agents](agents.md) + * [Hierarchy and authorization](hierarchy.md) + * [Authentication](authentication.md) + * [Invitations and sharing](invitations.md) + * [Commits (writing data)](commits/intro.md) + * [Concepts](commits/concepts.md) + * [Compared to](commits/compare.md) + * [WebSockets](websockets.md) + * [Endpoints](endpoints.md) + * [Collections, filtering, sorting](schema/collections.md) + * [Uploading and downloading files](files.md) + +# Create Atomic Data + +* [Atomizing](atomizing.md) + * [Using Atomic-Server](atomic-server.md) + * [Creating a JSON-AD file](create-json-ad.md) + * [Upgrade your existing project](interoperability/upgrade.md) + +# Use Atomic Data + +* [Interoperability and comparisons](interoperability/intro.md) * [RDF](interoperability/rdf.md) * [Solid](interoperability/solid.md) * [JSON](interoperability/json.md) * [IPFS](interoperability/ipfs.md) * [SQL](interoperability/sql.md) * [Graph Databases](interoperability/graph-database.md) - * [Your existing project](interoperability/upgrade.md) -* [Possible use cases](usecases/intro.md) +* [Potential use cases](usecases/intro.md) + * [As a Headless CMS](usecases/headless-cms.md) + * [In a React project](usecases/react.md) * [Personal Data Store](usecases/personal-data-store.md) + * [Artificial Intelligence](usecases/ai.md) + * [E-commerce & marketplaces](usecases/e-commerce.md) * [Surveys](usecases/surveys.md) -* [Software and libraries](tooling.md) + * [Verifiable Credentials](usecases/verifiable-credentials.md) + * [Data Catalog](usecases/data-catalog.md) + * [Education](usecases/education.md) + * [Food labels](usecases/food-labels.md) +* [**Software and libraries**](tooling.md) 
-----------
+[Acknowledgements](acknowledgements.md) |
+[Newsletter](newsletter.md) | [Get involved](get-involved.md)
diff --git a/src/acknowledgements.md b/src/acknowledgements.md
new file mode 100644
index 0000000..e88ec6b
--- /dev/null
+++ b/src/acknowledgements.md
@@ -0,0 +1,19 @@
+# Acknowledgements
+
+## Authors:
+
+- **Joep Meindertsma** ([joepio](https://github.com/joepio/) from [Ontola.io](https://ontola.io/))
+
+## Special thanks to:
+
+- **Thom van Kalkeren** (my colleague, friend and programming mentor who came up with many great ideas on how to work with RDF, such as [HexTuples](https://github.com/ontola/hextuples) and [linked-delta](https://github.com/ontola/linked-delta))
+- **Tim Berners-Lee** (for everything he did for linked data and the web)
+- **Ruben Verborgh** (for doing great work with RDF, such as the TPF spec)
+- **Pat McBennett** (for lots of valuable feedback on initial Atomic Data docs)
+- **Manu Sporny** (for his work on JSON-LD, which was an important inspiration for JSON-AD)
+- **Jonas Smedegaard** (for the various interesting talks we had and the feedback he provided)
+- **Arthur Dingemans** (for sharing his thoughts, providing feedback and his valuable suggestions)
+- **Anja Koopman** (for all her support, even when this project ate away days and nights of our time together)
+- **Alex Mikhalev** (for sharing many inspiring projects and ideas)
+- **Daniel Lutrha** (for inspiring me to be more ambitious and for providing lots of technical ideas)
+- All the other people who contributed to linked data related standards
diff --git a/src/agents.md b/src/agents.md
index a1b9d0a..35d9b41 100644
--- a/src/agents.md
+++ b/src/agents.md
@@ -1,9 +1,10 @@
+{{#title Atomic Data Agents - Users and identities }}
 # Atomic Agents
 
-Atomic Agents are used for _authentication_: to set an identity and prove who an actor actually is.
+Atomic Agents are used for [authentication](./authentication.md): to set an identity and prove who an actor actually is.
Agents can represent both actual individuals, or machines that interact with data. Agents are the entities that can get write / read rights. -Agents are used to sign [Commits](commits/intro.md) and to accept [Invites](invitations.md). +Agents are used to sign Requests and [Commits](commits/intro.md) and to accept [Invites](invitations.md). ## Design goals @@ -11,6 +12,7 @@ Agents are used to sign [Commits](commits/intro.md) and to accept [Invites](invi - **Easy**: It should be easy to work with, code with, and use - **Privacy-friendly**: Agents should allow for privacy friendly workflows - **Verifiable**: Others should be able to verify who did what +- **Secure**: Resistant to attacks by malicious others ## The Agent model diff --git a/src/assets/venn.svg b/src/assets/venn.svg new file mode 100644 index 0000000..29269cd --- /dev/null +++ b/src/assets/venn.svg @@ -0,0 +1,16 @@ + + + + + + + + + + + + + + + + diff --git a/src/assets/venn_old.svg b/src/assets/venn_old.svg new file mode 100644 index 0000000..9bff990 --- /dev/null +++ b/src/assets/venn_old.svg @@ -0,0 +1,58 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/src/atomic-data-overview.md b/src/atomic-data-overview.md index 486271e..468bd5d 100644 --- a/src/atomic-data-overview.md +++ b/src/atomic-data-overview.md @@ -1,44 +1,57 @@ +{{#title Atomic Data}} ![# Atomic Data Docs - Overview](assets/atomic_data_logo_stroke.svg) -Atomic Data is a specification for sharing, modifying and modeling graph data. +**Atomic Data is a modular specification for sharing, modifying and modeling graph data. It combines the ease of use of JSON, the connectivity of RDF (linked data) and the reliability of type-safety.** -It uses links to connect pieces of data, and therefore makes it easier to connect datasets to each other - even when these datasets exist on separate machines. 
+![Venn diagram showing Atomic Data is the combination of JSON, RDF and Type-Safety](assets/venn.svg) -Atomic Data is especially suitable for knowledge graphs, distributed datasets, semantic data, p2p applications, decentralized apps and linked open data. -It is designed to be highly extensible, easy to use, and to make the process of domain specific standardization as simple as possible. +Atomic Data uses links to connect pieces of data, and therefore makes it easier to connect datasets to each other - even when these datasets exist on separate machines. -Atomic Data is [Linked Data](https://ontola.io/what-is-linked-data/), as it is a more strict subset of RDF. -It is typed (you know if something is a `string`, `number`, `date`, `URL`, etc.) and extensible through [Atomic Schema](schema/intro.md), which means that you can define your own Classes, Properties and Datatypes. +Atomic Data has been designed with [the following goals in mind](motivation.md): + +- Give people more control over their data +- Make linked data easier to use +- Make it easier for developers to build highly interoperable apps +- Make standardization easier and cheaper + +Atomic Data is [Linked Data](https://ontola.io/what-is-linked-data/), as it is a [strict subset of RDF](interoperability/rdf.md). +It is type-safe (you know if something is a `string`, `number`, `date`, `URL`, etc.) and extensible through [Atomic Schema](schema/intro.md), which means that you can re-use or define your own Classes, Properties and Datatypes. The default serialization format for Atomic Data is [JSON-AD](core/json-ad.md), which is simply JSON where each key is a URL of an Atomic Property. These Properties are responsible for setting the `datatype` (to ensure type-safety) and setting `shortnames` (which help to keep names short, for example in JSON serialization) and `descriptions` (which provide semantic explanations of what a property should be used for). 
-Atomic Data has a standard for communicating state changes called [Commits](commits/intro.md). -These Commits are signed using cryptographic keys, which ensures that every change can be audited. -Commits are also used to construct a history of versions. +[Read more about Atomic Data Core](core/concepts.md) + +## Atomic Data Extended + +Atomic Data Extended is a set of extra modules (on top of Atomic Data Core) that deal with data that changes over time, authentication, and authorization. + +{{#include extended-table.md}} + +## Atomizing: how to create, convert and host Atomic Data + +Atomic Data has been designed to be very easy to create and host. +In the Atomizing section, we'll show you how you can create Atomic Data in three ways: -[Agents](agents.md) are Users that enable authentication. -Atomic Data can be traversed using [Paths](core/paths.md), or queried using [Collections](schema/collections.md). -[Hierarchies](hierarchy.md) are used for authorization and keeping data organized. -[Invites](invitations.md) can be used to easily create new users and provide them with rights. +- [Using Atomic Server, from your browser](atomic-server.md) +- [By creating JSON-AD (and optionally importing it)](create-json-ad.md) +- [By upgrading your existing application](interoperability/upgrade.md) -## Get Started +## Tools & libraries -If you want to read more about how Atomic Data works - read on. -If you'd rather play and discover for yourself, play with the existing open source [tooling](tooling.md): +- Browser app [atomic-data-browser](https://github.com/atomicdata-dev/atomic-data-browser) ([demo on atomicdata.dev](https://atomicdata.dev)) +- Build a react app using [typescript & react libraries](https://github.com/atomicdata-dev/atomic-data-browser). 
Start with the [react template on codesandbox](https://codesandbox.io/s/atomic-data-react-template-4y9qu?file=/src/MyResource.tsx) +- Host your own [atomic-server](https://github.com/atomicdata-dev/atomic-server) (powers [atomicdata.dev](https://atomicdata.dev), run with `docker run -p 80:80 -v atomic-storage:/atomic-storage joepmeneer/atomic-server`) +- Discover the command line tool: [atomic-cli](https://github.com/atomicdata-dev/atomic-server) (`cargo install atomic-cli`) +- Use the Rust library: [atomic-lib](https://github.com/atomicdata-dev/atomic-server) -- Browser app [atomic-data-browser](https://github.com/joepio/atomic-data-browser) ([demo on atomicdata.dev](https://atomicdata.dev)) -- Build a react app using [typescript & react libraries](https://github.com/joepio/atomic-data-ts). Start with the [react template on codesandbox](https://codesandbox.io/s/atomic-data-react-template-4y9qu?file=/src/MyResource.tsx) -- Host your own [atomic-server](https://github.com/joepio/atomic) (powers [atomicdata.dev](https://atomicdata.dev), run with `docker run -p 80:80 -p 443:443 -v atomic-storage:/atomic-storage joepmeneer/atomic-server`) -- Discover the command line tool: [atomic-cli](https://github.com/joepio/atomic) (`cargo install atomic-cli`) -- Use the Rust library: [atomic-lib](https://github.com/joepio/atomic) +## Get involved Make sure to [join our Discord](https://discord.gg/a72Rv2P) if you'd like to discuss Atomic Data with others. -Keep in mind that none of the Atomic Data project has reached a v1, which means that breaking changes can happen. ## Status -Keep in mind that none of the Atomic Data project has reached a v1, which means that breaking changes can happen. +Keep in mind that none of the Atomic Data projects has reached a v1, which means that breaking changes can happen. 
 ## Reading these docs
diff --git a/src/atomic-server.md b/src/atomic-server.md
new file mode 100644
index 0000000..cfe0295
--- /dev/null
+++ b/src/atomic-server.md
@@ -0,0 +1,124 @@
+# Creating Atomic Data using Atomic-Server
+
+Here is everything you need to get started:
+
+- [Atomic-Server and its features](#atomic-server-and-its-features)
+- [Running Atomic-Server locally (optional)](#running-atomic-server-locally-optional)
+- [Creating an Agent](#creating-an-agent)
+- [Creating your first Atomic Data](#creating-your-first-atomic-data)
+- [There's more!](#theres-more)
+
+## Atomic-Server and its features
+
+[`Atomic-Server`](https://github.com/atomicdata-dev/atomic-server/blob/master/server/README.md) is the _reference implementation_ of the Atomic Data Core + Extended specification.
+It was developed in parallel with this specification, and it served as a testing ground for various ideas (some of which didn't work, and some of which ended up in the spec).
+
+Atomic-Server is a graph database server for storing and sharing typed linked data.
+It's free, open source (MIT license), and has a ton of features:
+
+- ⚛️ **Dynamic schema validation** / type checking using [Atomic Schema](https://docs.atomicdata.dev/schema/intro.html). Combines the safety of structured data with the flexibility of graphs.
+- 🚀 **Fast** (1ms responses on my laptop)
+- 🪶 **Lightweight** (15MB binary, no runtime dependencies)
+- 💻 **Runs everywhere** (linux, windows, mac, arm)
+- 🌐 **Embedded server** with support for HTTP / HTTPS / HTTP2.0 and Built-in LetsEncrypt handshake.
+- 🎛️ **Browser GUI included** powered by [atomic-data-browser](https://github.com/atomicdata-dev/atomic-data-browser). Features dynamic forms, tables, authentication, theming and more.
+- 💾 **Event-sourced versioning** / history powered by [Atomic Commits](https://docs.atomicdata.dev/commits/intro.html)
+- 🔄 **Synchronization using websockets**: communicates state changes with a client. Send a `wss` request to `/ws` to open a websocket.
+- 🧰 **Many serialization options**: to JSON, [JSON-AD](https://docs.atomicdata.dev/core/json-ad.html), and various Linked Data / RDF formats (RDF/XML, N-Triples / Turtle / JSON-LD).
+- 🔎 **Full-text search** with fuzzy search and various operators, often <3ms responses.
+- 📖 **Pagination, sorting and filtering** using [Atomic Collections](https://docs.atomicdata.dev/schema/collections.html)
+- 🔐 **Authorization** (read / write permissions) and Hierarchical structures powered by [Atomic Hierarchy](https://docs.atomicdata.dev/hierarchy.html)
+- 📲 **Invite and sharing system** with [Atomic Invites](https://docs.atomicdata.dev/invitations.html)
+- 📂 **File management**: Upload, download and preview attachments.
+- 🖥️ **Desktop app**: Easy desktop installation, with status bar icon, powered by [tauri](https://github.com/tauri-apps/tauri/).
+
+## Running Atomic-Server locally (optional)
+
+In this guide, we can simply use `atomicdata.dev` in our browser without installing anything.
+So you can skip this step and go to _Creating your first Atomic Data_.
+But if you want to, you can run Atomic-Server on your machine in a couple of ways:
+
+- **Using a desktop installer**: download a desktop release from the [`releases`](https://github.com/atomicdata-dev/atomic-server/releases) page and install it using your desktop GUI.
+- **Using a binary**: download a binary release from the [`releases`](https://github.com/atomicdata-dev/atomic-server/releases) page and open it using a terminal.
+- **Using Docker** is probably the quickest: `docker run -p 80:80 -v atomic-storage:/atomic-storage joepmeneer/atomic-server`.
+- **Using Cargo**: `cargo install atomic-server` and then run `atomic-server` to start.
+
+_[Atomic-Server's README](https://github.com/atomicdata-dev/atomic-server) contains more (and up-to-date) information about how to use it!_
+
+Open your server in your browser.
+By default, that's [`http://localhost:9883`](http://localhost:9883).
+Fun fact: `&#9883;` is the HTML entity code for the Atom icon: ⚛.
+
+The first screen should show you your [_Drive_](https://atomicdata.dev/classes/Drive).
+You can think of this as your root folder.
+It is the resource hosted at the root URL, effectively being the home page of your server.
+
+There's an instruction on the screen about the `/setup` page.
+Click this, and you'll get a screen showing an [_Invite_](https://atomicdata.dev/classes/Invite).
+Normally, you could `Accept as new user`, but since you're running on `localhost`, you won't be able to use the newly created Agent on non-local Atomic-Servers.
+Therefore, it may be best to create an Agent on some _other_ running server, such as the [demo Invite on AtomicData.dev](https://atomicdata.dev/invites/1).
+After that, copy the Secret from the `User settings` panel on AtomicData.dev, go back to your `localhost` version, and press `sign in`.
+Paste the Secret, and voila! You're signed in.
+
+Now, again go to `/setup`. This time, you can `Accept as {user}`.
+After clicking, your Agent has been granted `write` rights for the Drive!
+You can verify this by hovering over the description field, clicking the edit icon, and making a few changes.
+You can also press the menu button (three dots, top left) and press `Data view` to see your Agent listed in the `write` field.
+Note that you can now edit every field.
+You can also now fetch your data in various formats.
+
+Try the other features in the menu bar, and check out the `collections`.
+
+Again, check out the [README](https://github.com/atomicdata-dev/atomic-server) for more information and guides!
+
+Now, let's create some data.
+
+## Creating an Agent
+
+Before you can create new things on AtomicData.dev, you'll need an _Agent_.
+This is your virtual User, which can create, sign and own things.
+
+Simply open the [demo invite](https://atomicdata.dev/invites/1) and press accept.
+And you're done!
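Under the hood, an Agent is essentially an Ed25519 keypair tied to a subject URL: the private key (wrapped in the Secret) signs things, and the public key lets anyone verify those signatures. Here is a minimal sketch of that model, using Node's built-in `crypto` module for illustration only — the message and URL below are made-up examples, and the Atomic libraries handle all of this for you:

```typescript
import { generateKeyPairSync, sign, verify } from "crypto";

// An Agent is essentially an Ed25519 keypair (plus a subject URL).
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// The Agent signs messages (such as Commits) with its private key.
// The "{subject} {timestamp}" message shape mirrors the signing
// format used elsewhere in these docs; the URL is hypothetical.
const message = Buffer.from("https://example.com/myResource 1661757470002");
const signature = sign(null, message, privateKey);

// Anyone holding the public key can verify who signed the message.
console.log(verify(null, message, publicKey, signature)); // true
```

Signing Commits and signing authentication requests (described later in these docs) both follow this same pattern.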
+
+
+## Creating your first Atomic Data
+
+Now let's create a [_Class_](https://atomicdata.dev/classes/Class).
+A Class represents an abstract concept, such as a `BlogPost` (which we'll do here).
+We can do this in a couple of ways:
+
+- Pressing the `+ icon` button on the left menu (only visible when logged in) and selecting Class
+- Opening [Class](https://atomicdata.dev/classes/Class) and pressing `new class`
+- Going to the [Classes Collection](https://atomicdata.dev/classes/) and pressing the plus icon
+
+The result is the same: we end up with a form in which we can fill in some details.
+
+Let's add a shortname (singular), and then a description.
+
+After that, we'll add the `required` properties.
+This form you're looking at is constructed by using the `required` and `recommended` Properties defined in `Class`.
+We can use these same fields to generate our BlogPost resource!
+Which fields would be required in a `BlogPost`?
+A `name` and a `description`, probably.
+
+So click on the `+ icon` under `requires` and search for these Properties to add them.
+
+Now, we can skip the `recommended` properties and get right to saving our newly created `BlogPost` class.
+So, press save, and now look at what you created.
+
+Notice a couple of things:
+
+- Your Class has its own URL.
+- It has a `parent`, shown at the top of the screen. This affects the visibility and rights of your Resource. We'll get to that [later in the documentation](./hierarchy.md).
+
+Now, go to the navigation bar, which is by default at the bottom of the window. Use its context menu to open the `Data View`.
+This view gives you some more insight into your newly created data, and various ways in which you can serialize it.
+
+## There's more!
+
+This was just a very brief introduction to Atomic-Server and its features.
+There's quite a bit that we didn't dive into, such as versioning, file uploads, the collaborative document editor and more...
+But by clicking around you're likely to discover these features for yourself.
+
+On the next page, we'll dive into how you can create and publish JSON-AD files.
diff --git a/src/atomizing.md b/src/atomizing.md
new file mode 100644
index 0000000..c483f81
--- /dev/null
+++ b/src/atomizing.md
@@ -0,0 +1,21 @@
+# Atomizing: How to create and publish Atomic Data
+
+Now that we're familiar with the basics of Atomic Data Core and its Schema, it's time to create some Atomic Data!
+We call the process of turning data into Atomic Data _Atomizing_.
+During this process, we **upgrade the data quality**.
+Our information becomes more valuable.
+Let's summarize what the advantages are:
+
+- Your data becomes **available on the web** (publicly, if you want it to)
+- It can now **link to other data**, and become part of a bigger web of data
+- It becomes **strictly typed**, so developers can easily and safely re-use it in their software
+- It becomes **easier to understand**, because people can look at the Properties and see what they mean
+- It can be **easily converted** into many formats (JSON, Turtle, CSV, XML, more...)
+
+## Three ways to Atomize data
+
+In general, there are three ways to create Atomic Data:
+
+- [Using the **Atomic-Server** app + GUI](./atomic-server.md) (easy, only for direct user input)
+- [Creating an **importable JSON-AD file**](./create-json-ad.md) (medium, useful if you want to convert existing data)
+- [Making your existing service / app **host and serialize Atomic Data**](./interoperability/upgrade.md) (hard, if you want to make your entire app be part of the Atomic Web!)
diff --git a/src/authentication.md b/src/authentication.md
new file mode 100644
index 0000000..fee1d6b
--- /dev/null
+++ b/src/authentication.md
@@ -0,0 +1,140 @@
+# Authentication in Atomic Data
+
+Authentication means knowing _who_ is doing something, either getting access or creating some new data.
+When an Agent wants to _edit_ a resource, they have to send a signed [Commit](commits/intro.md), and the signatures are checked in order to authorize a Commit.
+
+But how do we deal with _reading_ data? How do we know who is trying to get access?
+There are three ways users can authenticate themselves:
+
+- Signing an `Authentication Resource` and using that as a cookie
+- Opening a WebSocket, and passing an `Authentication Resource`
+- Signing every single HTTP request (more secure, less flexible)
+
+## Design goals
+
+- **Secure**: Because, what's the point of authentication if it's not?
+- **Easy to use**: Setting up an identity should not require _any_ effort, and proving identity should be minimal effort.
+- **Anonymity allowed**: Users should be able to have multiple identities, some of which are fully anonymous.
+- **Self-sovereign**: No dependency on servers that users don't control. Or at least, minimise this.
+- **Dummy-proof**: We need a mechanism for dealing with forgetting passwords / client devices losing data.
+- **Compatible with Commits**: Atomic Commits require clients to sign things. Ideally, this functionality / strategy would also fit with the new model.
+- **Fast**: Of course, authentication will always slow things down. But let's keep that to a minimum.
+
+## Authentication Resources
+
+An _Authentication Resource_ is a JSON-AD object containing all the information a Server needs to make sure a valid Agent requests a session at some point in time.
+These are used both in Cookie-based auth and in [WebSockets](websockets.md).
+
+We use the following fields (be sure to use the full URLs in the resource, see the example below):
+
+- `requestedSubject`: The URL of the requested resource.
+  - If we're authenticating a *WebSocket*, we use the `wss` address as the `requestedSubject`. (e.g. `wss://example.com/ws`)
+  - If we're authenticating a *Cookie* or *Bearer token*, we use the origin of the server (e.g.
`https://example.com`)
+  - If we're authenticating a *single HTTP request*, use the same URL as the `GET` address (e.g. `https://example.com/myResource`)
+- `agent`: The URL of the Agent requesting the subject and signing this Authentication Resource.
+- `publicKey`: base64 serialized ED25519 public key of the agent.
+- `signature`: base64 serialized ED25519 signature of the following string: `{requestedSubject} {timestamp}` (without the brackets), signed by the private key of the Agent.
+- `timestamp`: Unix timestamp of when the Authentication was signed
+- `validUntil` (optional): Unix timestamp of when the Authentication should no longer be valid. If not provided, the server will default to 30 seconds from the `timestamp`.
+
+Here's what a JSON-AD Authentication Resource looks like for a WebSocket:
+
+```json
+{
+  "https://atomicdata.dev/properties/auth/agent": "http://example.com/agents/N32zQnZHoj1LbTaWI5CkA4eT2AaJNBPhWcNriBgy6CE=",
+  "https://atomicdata.dev/properties/auth/requestedSubject": "wss://example.com/ws",
+  "https://atomicdata.dev/properties/auth/publicKey": "N32zQnZHoj1LbTaWI5CkA4eT2AaJNBPhWcNriBgy6CE=",
+  "https://atomicdata.dev/properties/auth/timestamp": 1661757470002,
+  "https://atomicdata.dev/properties/auth/signature": "19Ce38zFu0E37kXWn8xGEAaeRyeP6EK0S2bt03s36gRrWxLiBbuyxX3LU9qg68pvZTzY3/P3Pgxr6VrOEvYAAQ=="
+}
+```
+
+## Atomic Cookies Authentication
+
+In this approach, the client creates and signs a Resource that proves that an Agent wants to access a certain server for some amount of time.
+This Authentication Resource is stored as a cookie, and passed along in every HTTP request to the server.
+
+### Setting the cookie
+
+1. Create a signed Authentication object, as described above.
+2. Serialize it as JSON-AD, then as a base64 string.
+3. Store it in a Cookie:
+   1. Name the cookie `atomic_session`
+   2. The expiration date of the cookie should be set, and should match the expiration date of the Authentication Resource.
+   3.
Set the `Secure` attribute to prevent Man-in-the-middle attacks over HTTP.
+
+## Bearer Token Authentication
+
+Similar to creating the Cookie, except that we pass the base64 serialized Authentication Resource as a Bearer token in the `Authorization` header.
+
+```http
+GET /myResource HTTP/1.1
+Authorization: Bearer {base64 serialized Authentication Resource}
+```
+
+In the Data Browser, you can create a token in the `token` tab at `/app/token`.
+
+## Authenticating Websockets
+
+After [opening a WebSocket connection](websockets.md), create an Authentication Resource.
+Send a message like so: `AUTHENTICATE {authenticationResource}`.
+The server will only respond if there is something wrong.
+
+## Per-Request Signing
+
+Atomic Data allows **signing every HTTP request**.
+This method is the most secure, since a MITM attack would only give access to the specific resource requested, and only for a short amount of time.
+Note that signing every single request takes a bit of time.
+We picked a fast algorithm (Ed25519) to minimize this cost.
+
+### HTTP Headers
+
+All of the following headers are required if you need authentication.
+
+- `x-atomic-public-key`: The base64 public key (Ed25519) of the Agent sending the request
+- `x-atomic-signature`: A base64 signature of the following string: `{subject} {timestamp}`
+- `x-atomic-timestamp`: The current time (when sending the request) as milliseconds since unix epoch
+- `x-atomic-agent`: The subject URL of the Agent sending the request.
+
+### Sending a request
+
+Here's an example (JS) client-side implementation with comments:
+
+```ts
+// The Private Key of the agent is used for signing
+// https://atomicdata.dev/properties/privateKey
+const privateKey = "someBase64Key";
+const timestamp = Math.round(new Date().getTime());
+// This is what you will need to sign.
+// The timestamp is to limit the harm of a man-in-the-middle attack.
+// The `subject` is the full HTTP URL that is to be fetched.
+const message = `${subject} ${timestamp}`;
+// Sign using Ed25519, see example implementation here: https://github.com/atomicdata-dev/atomic-data-browser/blob/30b2f8af59d25084de966301cb6bd1ed90c0eb78/lib/src/commit.ts#L176
+const signed = await signToBase64(message, privateKey);
+// Set all of these headers
+const headers = new Headers();
+headers.set('x-atomic-public-key', await agent.getPublicKey());
+headers.set('x-atomic-signature', signed);
+headers.set('x-atomic-timestamp', timestamp.toString());
+headers.set('x-atomic-agent', agent?.subject);
+const response = await fetch(subject, {headers});
+```
+
+## Verifying an Authentication
+
+- If none of the `x-atomic` HTTP headers are present, the server assigns the [PublicAgent](https://atomicdata.dev/agents/publicAgent) to the request. This Agent represents any guest who is not signed in.
+- If some (but not all) of the `x-atomic` headers are present, the server will return with a `500`.
+- The server must check if the `validUntil` has not yet passed.
+- The server must check whether the public key matches the one from the Agent.
+- The server must check if the signature is valid.
+- The server should check if the requested resource can be accessed by the Agent using [hierarchy](hierarchy.md) (e.g. check the `read` right on the resource or its parents).
+
+## Hierarchies for authorization
+
+Atomic Data uses [Hierarchies](hierarchy.md) to describe who gets to access some resource, and who can edit it.
+
+## Limitations / considerations
+
+- Since we need the Private Key to sign Commits and requests, the client should have this available. This means the client software as well as the user should deal with key management, and that can be a security risk in some contexts (such as a web browser). [See issue #49](https://github.com/ontola/atomic-data-docs/issues/49).
+- When using the Agent's subject to authenticate somewhere, the authorizer must be able to check what the public key of the agent is.
This means the agent must be publicly resolvable. This is one of the reasons we should work towards a server-independent identifier, probably a base64 string that contains the public key (and, optionally, also the https identifier). See [issue #59 on DIDs](https://github.com/ontola/atomic-data-docs/issues/59).
+- We'll probably also introduce some form of token-based authentication created server side in the future. [See #87](https://github.com/ontola/atomic-data-docs/issues/87)
diff --git a/src/commits/compare.md b/src/commits/compare.md
index 0bf4f64..cfa776f 100644
--- a/src/commits/compare.md
+++ b/src/commits/compare.md
@@ -1,3 +1,4 @@
+{{#title Atomic Commits compared to other (RDF) delta models}}
 # Atomic Commits compared to other (RDF) delta models
 
 Let's compare the [Atomic Commit](concepts.md) approach with some existing protocols for communicating state changes / patches / mutations / deltas in linked data, JSON and text files.
@@ -22,9 +23,26 @@ It is designed for collaborating on open source projects, which means dealing wi
 
 ## RDF mutation systems
 
+Let's move on to specifications that mutate RDF specifically:
+
+### .n3 Patch
+
+N3 Patch is [part of the Solid spec](https://solidproject.org/TR/protocol#writing-resources), since December 2021.
+
+It uses the N3 serialization format to describe changes to RDF documents.
+
+```
+@prefix solid: 
+
+<> solid:patches ;
+  solid:where { ?a . };
+  solid:inserts { ?a . };
+  solid:deletes { ?a . }.
+```
+
 ### RDF-Delta
 
-[https://afs.github.io/rdf-delta/]()
+[https://afs.github.io/rdf-delta/](https://afs.github.io/rdf-delta/)
 
 Describes changes (RDF Patches) in a specialized turtle-like serialization format.
@@ -138,7 +156,7 @@ An N-Quads serialized delta format.
 Methods are URLs, which means they are extensible.
 Does not specify how to bundle lines.
 Used in production of a web app that we're working on ([Argu.co](https://argu.co)).
-Designed with simplicity (no new serialization format, simple to parse) and performance in mind.
+Designed by my colleague Thom van Kalkeren with simplicity (no new serialization format, simple to parse) and performance in mind.
 
 ```
 Initial state:
 ```
@@ -155,7 +173,7 @@ New state:
 ```
 
-## JSON-LD-PATCH
+### JSON-LD-PATCH
 
 [JSON-LD-PATCH](https://github.com/digibib/ls.ext/wiki/JSON-LD-PATCH)
@@ -243,8 +261,13 @@ The result
 It uses the [JSON-Pointer spec](http://tools.ietf.org/html/rfc6901) for denoting `path`s.
 It has quite a bunch of implementations, in various languages.
+
-## Atomic Commits
+## Atomic Commits - how it's different and why it exists
 
 Let's talk about the differences between the concepts above and Atomic Commits.
diff --git a/src/commits/concepts.md b/src/commits/concepts.md
index 75de9aa..8584814 100644
--- a/src/commits/concepts.md
+++ b/src/commits/concepts.md
@@ -1,3 +1,4 @@
+{{#title Atomic Commits: Concepts}}
 # Atomic Commits: Concepts
 
 ## Commit
@@ -10,7 +11,7 @@ It is cryptographically signed by an [Agent](https://atomicdata.dev/classes/Agen
 
 The **required fields** are:
 
-- `subject` - The thing being changed. A Resource Subject URL that the Commit is providing information about.
+- `subject` - The thing being changed. A Resource Subject URL (HTTP identifier) that the Commit is changing. A Commit Subject must not contain query parameters, as these are reserved for dynamic resources.
 - `signer` - Who's making the change. The Atomic URL of the Author's profile - which in turn must contain a `publicKey`.
 - `signature` - Cryptographic proof of the change. A hash of the JSON-AD serialized Commit (without the `signature` field), signed by the Agent's `private-key`. This proves that the author is indeed the one who created this exact commit. The signature of the Commit is also used as the identifier of the commit.
 - `created-at` - When the change was made. A UNIX timestamp number of when the commit was created.
@@ -20,6 +21,7 @@ The **optional method fields** describe how the data must be changed:
 
 - `destroy` - If true, the existing Resource will be removed.
 - `remove` - an array of Properties that need to be removed (including their values).
 - `set` - a Nested Resource which contains all the new or edited fields.
+- `push` - a Nested Resource which contains all the fields whose values are _appended_. This means adding items to a new or existing ResourceArray.
 
 These commands are executed in the order above.
 This means that you can set `destroy` to `true` and include `set`, which empties the existing resource and sets new values.
@@ -49,6 +51,7 @@ Let's look at an example Commit:
 },
 "https://atomicdata.dev/properties/signature": "3n+U/3OvymF86Ha6S9MQZtRVIQAAL0rv9ZQpjViht4emjnqKxj4wByiO9RhfL+qwoxTg0FMwKQsNg6d0QU7pAw==",
 "https://atomicdata.dev/properties/signer": "https://surfy.ddns.net/agents/9YCs7htDdF4yBAiA4HuHgjsafg+xZIrtZNELz4msCmc=",
+  "https://atomicdata.dev/properties/previousCommit": "https://surfy.ddns.net/commits/9YCs7htDdF4yBAiA4HuHgjsafg+xZIrtZNELz4msCmc=",
 "https://atomicdata.dev/properties/subject": "https://atomicdata.dev/test"
 }
 ```
@@ -80,15 +83,35 @@ Congratulations, you've just created a valid Commit!
 
 Here are currently working implementations of this process, including serialization and signing (links are permalinks).
 
-- [in Rust (atomic-lib)](https://github.com/joepio/atomic/blob/ceb88c1ae58811f2a9e6bacb7eaa39a2a7aa1513/lib/src/commit.rs#L81).
-- [in Typescript / Javascript (atomic-data-browser)](https://github.com/joepio/atomic-data-browser/blob/fc899bb2cf54bdff593ee6b4debf52e20a85619e/src/atomic-lib/commit.ts#L51).
+- [in Rust (atomic-lib)](https://github.com/atomicdata-dev/atomic-server/blob/ceb88c1ae58811f2a9e6bacb7eaa39a2a7aa1513/lib/src/commit.rs#L81).
+- [in Typescript / Javascript (atomic-data-browser)](https://github.com/atomicdata-dev/atomic-data-browser/blob/fc899bb2cf54bdff593ee6b4debf52e20a85619e/src/atomic-lib/commit.ts#L51).
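The serialize-and-sign process described above can be sketched in TypeScript with Node's built-in Ed25519 support. This is an illustrative sketch, not the reference implementation: the key pair is freshly generated, the URLs are placeholders, and `canonicalize` is a simplified JCS-style serializer written for this example.

```typescript
import { generateKeyPairSync, sign } from "node:crypto";

// JCS-style deterministic serialization: objects get alphabetically
// sorted keys, and the output is minified (no spaces or newlines).
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Throwaway key pair for illustration; a real Agent already has one.
const { privateKey } = generateKeyPairSync("ed25519");

// The Commit, without its `signature` field. Subject and signer URLs
// are placeholders.
const commit = {
  "https://atomicdata.dev/properties/subject": "https://example.com/myResource",
  "https://atomicdata.dev/properties/signer": "https://example.com/agents/alice",
  "https://atomicdata.dev/properties/createdAt": 1661757470002,
  "https://atomicdata.dev/properties/set": {
    "https://atomicdata.dev/properties/description": "Hello world",
  },
};

// Sign the deterministic serialization; the base64 result becomes
// the value of the `signature` field.
const signature = sign(null, Buffer.from(canonicalize(commit)), privateKey)
  .toString("base64");
```

The resulting base64 string would then be added to the Commit as its `signature`.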
If you want to validate your implementation, check out the tests for these two projects.
+
+### Applying the Commit
+
+If you're on the receiving end of a Commit (e.g. if you're writing a server or a client that has to parse Commits), you will _apply_ the Commit to your Store.
+If you have to _persist_ the Commit, you must perform all of the checks.
+If you're writing a client, and you trust the source of the Commit, you can probably skip the validation steps.
+
+Here's how you apply a Commit:
+
+1. Check if the Subject URL is valid.
+2. Validate the signature. This means serializing the Commit deterministically (see above), checking the Agent's public key (you might need to fetch this one), and verifying whether the signature matches.
+3. Check if the timestamp is OK; an acceptable window is about 10 seconds.
+4. If the Commit is for an existing resource, get it.
+5. Validate the Rights of the one making the Commit.
+6. Check if the `previousCommit` of the Commit matches the `previousCommit` of the Resource.
+7. Iterate over the `set` fields. Overwrite existing Values, or add the new ones. Make sure the Datatypes match the respective Properties.
+8. Iterate over the `remove` fields. Remove existing properties.
+9. If the Resource has one or more classes, check if the required Properties are there.
+10. You might want to perform some custom validations now (e.g. if you accept an Invite, you should make sure that the one creating the Invite has the correct rights to actually make it!)
+11. Store the created Commit as a Resource, and store the modified Resource!
+
 ## Limitations
 
-- Commits adjust only one Resource at a time, which means that you cannot change multiple in one commit.
-The one creating the Commit will need to sign it, which may make clients that write data more complicated than you'd like.
-Commits require signatures, which means key management. Doing this securely is no trivial matter.
-The signatures require JSON-AD serialization
-If your implementation stores all Commits, this means
+- Commits adjust **only one Resource at a time**, which means that you cannot change multiple in one commit. ([issue](https://github.com/atomicdata-dev/atomic-data-docs/issues/130))
+- The one creating the Commit will **need to sign it**, which may make clients that write data more complicated than you'd like. You can also let Servers write Commits, but this makes them less verifiable / decentralized.
+- Commits require signatures, which means **key management**. Doing this securely is no trivial matter.
+- The signatures **require JSON-AD** serialization.
+- If your implementation persists all Commits, you might need to **store a lot of data**.
diff --git a/src/commits/intro.md b/src/commits/intro.md
index b231cc9..b9ee58c 100644
--- a/src/commits/intro.md
+++ b/src/commits/intro.md
@@ -1,8 +1,9 @@
+{{#title Atomic Commits - Event standard for Atomic Data}}
 # Atomic Commits
 
 _Disclaimer: Work in progress, prone to change._
 
-Atomic Commits is a proposed standard for communicating state changes (events / transactions / patches / deltas / mutations) of [Atomic Data](../core/intro.md).
+Atomic Commits is a specification for communicating _state changes_ (events / transactions / patches / deltas / mutations) of [Atomic Data](../core/concepts.md).
 It is the part of Atomic Data that is concerned with writing, editing, removing and updating information.
 
 ## Design goals
diff --git a/src/core/concepts.md b/src/core/concepts.md
index 19241f2..faf2519 100644
--- a/src/core/concepts.md
+++ b/src/core/concepts.md
@@ -1,17 +1,33 @@
-# Atomic Data Core: Concepts
+{{#title What is Atomic Data?}}
+# What is Atomic Data?
 
-## Atomic Data
+## Atomic Data Core
 
-Atomic Data is a data model for sharing information on the web.
-It can be used to express any type of information, including personal data, vocabularies, metadata, documents, files and more.
+Atomic Data is a modular specification for sharing information on the web.
+Since Atomic Data is a _modular_ specification, you can mostly take what you want to use, and ignore the rest.
+The _Core_ part, however, is the _only required_ part of the specification, as all others depend on it.
+
+Atomic Data Core can be used to express any type of information, including personal data, vocabularies, metadata, documents, files and more.
 It's designed to be easily serializable to both JSON and linked data formats.
-It is _typed_ data model, which means that every value should be validated and predictable.
+It is a _typed_ data model, which means that every value must be validated by its datatype.
+
+## Design goals
+
+* **Browsable**: Data should explicitly link to other pieces of data, and these links should be followable.
+* **Semantic**: Every data Atom and relation has a clear semantic meaning.
+* **Interoperable**: Plays nice with other data formats (e.g. JSON, XML, and all RDF formats).
+* **Open**: Free to use, open source, no strings attached.
+* **Clear Ownership**: The data shows who (or which domain) is in control of the data, so new versions of the data can easily be retrieved.
+* **Mergeable**: Any two sets of Atoms can be merged into a single graph without any merge conflicts / name collisions.
+* **Extensible**: Anyone can define their own data types and create Atoms with it.
+* **ORM-friendly**: Navigate a _decentralized_ graph by using `dot.syntax`, similar to how you navigate a JSON object in javascript.
+* **Type-safe**: All valid Atomic data has an unambiguous, static datatype.
 
-It is a directed, labeled graph, similar to RDF, so contrary to some other (labeled) graph data models (e.g. NEO4j), a relationship between two items (Resources) does not have attributes.
+# Concepts
 
 ## Resource
 
-A Resource is a bunch of information about a thing, referenced by a single link (the Subject).
+A _Resource_ is a bunch of information about a thing, referenced by a single link (the _Subject_).
 Formally, it is a set of Atoms (i.e. a Graph) that share a Subject URL.
 
 You can think of a Resource as a single row in a spreadsheet or database.
 In practice, Resources can be anything - a Person, a Blogpost, a Todo item.
@@ -20,13 +36,13 @@ A Property can only occur once in every Resource.
 
 ## Atom (or Atomic Triple)
 
-Every Resource is composed of Atoms.
-The Atom is the smallest possible piece of _meaningful_ data / information.
+Every Resource is composed of _Atoms_.
+The Atom is the smallest possible piece of _meaningful_ data / information (hence the name).
 You can think of an Atom as a single cell in a spreadsheet or database.
 
 An Atom consists of three fields:
 
-* **[Subject](#subject-field)**: the Thing that the atom is providing information about.
-* **[Property](#property-field)**: the property of the Thing that the atom is about (will always be a URL to a [Property](../schema/classes.md#property)).
+* **[Subject](#subject-field)**: the thing that the atom is providing information about. This is typically also the URL where we can find more information about it.
+* **[Property](#property-field)**: the property of the thing that the atom is about (will always be a URL to a [Property](../schema/classes.md#property)).
 * **[Value](#value-field)**: the new piece of information about the Subject.
 
 If you're familiar with RDF, you'll notice similarities.
@@ -74,9 +90,9 @@ The `@id` field denotes the Subject of each Resource, which is also the URL that
 
 In the JSON-AD example above, we have:
 
 - two **Resources**, describing two different **Subjects**: `https://example.com/arnold` and `https://example.com/britta`.
-three different **Properties** (`https://example.com/properties/bornAt`, `https://example.com/properties/firstName`, and `https://example.com/properties/bestFriend`)
-four **Values** (`1991-01-20`, `Arnold`, `https://example.com/britta` and `Britta`)
-four **Atoms**
+- three different **Properties** (`https://example.com/properties/lastname`, `https://example.com/properties/birthDate`, and `https://example.com/properties/bestFriend`)
+- four **Values** (`Peters`, `1991-01-20`, `https://example.com/britta` and `Smalls`)
+- four **Atoms** - every row is one Atom.
 
 All Subjects and Properties are Atomic URLs: they are links that point to more Atomic Data.
 One of the Values is a URL, too, but we also have values like `Peters` and `1991-01-20`.
@@ -103,9 +119,11 @@ In JSON-AD, the Subject is denoted by `@id`.
 
 The Property field is the second part of an Atom.
 It is a URL that points to an Atomic [Property](../schema/classes.md#Property).
-For example `https://example.com/createdAt` or `https://example.com/firstName`.
+Examples can be found at https://atomicdata.dev/properties.
 
-The Property field MUST be a URL, and that URL MUST resolve to an Atomic Property, which contains information about the Datatype.
+The Property field MUST be a URL, and that URL MUST resolve (it must be publicly available) to an Atomic Property.
+The Property is perhaps the most important concept in Atomic Data, as it is what enables the type safety (thanks to [`datatype`](https://atomicdata.dev/properties/datatype)) and the JSON compatibility (thanks to [`shortname`](https://atomicdata.dev/properties/shortname)).
+We also use Properties for rendering fields in a form, because the Datatype, shortname and description help us to create an intuitive, easy-to-understand input for users.
 
 ## Value field
@@ -117,27 +135,16 @@ This includes URLs, strings, integers, dates and more.
 
 ## Graph
 
 A Graph is a collection of Atoms.
-A Graph can describe various subjects, and may or may not be related.
+A Graph can describe various subjects, which may or may not be related.
 Graphs can have several characteristics (Schema Complete, Valid, Closed).
+In mathematical graph terminology, a graph consists of _nodes_ and _edges_.
+The Atomic Data model is a so-called _directed graph_, which means that relationships are by default one-way.
+In Atomic Data, every node is a `Resource`, and every edge is a `Property`.
+
 ## Nested Resource
 
 A Nested Resource only exists inside of another resource.
 It does not have its own subject.
-In the following JSON-AD example, the `address` is a nested resource:
-
-```json
-{
-  "@id": "https://example.com/arnold",
-  "https://example.com/properties/address": {
-    "https://example.com/properties/firstLine": "Longstreet 22",
-    "https://example.com/properties/city": "Watertown",
-    "https://example.com/properties/country": "the Netherlands",
-  }
-}
-```
-
-A Nested Resource often does not have its own subject (`@id`), but it _does_ have its own unique [path](./paths.md), which can be used as its identifier.
-
 In the next chapter, we'll explore how Atomic Data is serialized.
diff --git a/src/core/intro.md b/src/core/intro.md
deleted file mode 100644
index eec4855..0000000
--- a/src/core/intro.md
+++ /dev/null
@@ -1,15 +0,0 @@
-# Atomic Data Core
-
-The Atomic Data Core describes the fundamental data model of Atomic Data.
-
-## Design goals
-
-* **Browsable**: Data should explicitly link to other pieces of data, and these links should be followable.
-* **Semantic**: Every data Atom and relation has a clear semantic meaning.
-* **Interoperable**: Plays nice with other data formats (e.g. JSON, XML, and all RDF formats).
-* **Open**: Free to use, open source, no strings attached.
-* **Clear Ownership**: The data shows who is in control of the data, so new versions of the data can easily be retrieved.
-* **Mergeable**: Any two sets of Atoms can be merged into a single graph without any merge conflicts / name collisions.
-* **Extensible**: Anyone can define their own data types and create Atoms with it.
-* **ORM-friendly**: Navigate a _decentralized_ graph by using dot.syntax, similar to how you navigate a JSON object in javascript.
-* **Typed**: All valid Atomic data has an unambiguous, static datatype. Models expressed in Atomic Data can be mapped to programming language models, such as `structs` or `interfaces` in Typescript / Rust / Go.
diff --git a/src/core/json-ad.md b/src/core/json-ad.md
index a375529..c69bf13 100644
--- a/src/core/json-ad.md
+++ b/src/core/json-ad.md
@@ -1,18 +1,22 @@
+{{#title JSON-AD: The Atomic Data serialization format}}
 # JSON-AD: The Atomic Data serialization format
 
-`JSON-AD` is the _default_ serialization format for Atomic Data.
-It is what the current [Rust](https://github.com/joepio/atomic) and [Typescript / React](https://github.com/joepio/atomic-data-browser) implementation use to communicate.
-It is a [JSON](https://www.ecma-international.org/publications-and-standards/standards/ecma-404/) with a lot of links in it and the following rules:
+Although you can use various serialization formats for Atomic Data, `JSON-AD` is the _default_ and _only required_ serialization format.
+It is what the current [Rust](https://github.com/atomicdata-dev/atomic-server) and [Typescript / React](https://github.com/atomicdata-dev/atomic-data-browser) implementations use to communicate.
+It is designed to feel familiar to developers and to be easy and performant to parse and serialize.
+It is inspired by [JSON-LD](https://json-ld.org/).
 
-- Every Object is a Resource.
-- Every Key is a Property URL.
-- The `@id` field is special: it defines the Subject of the Resource.
+It uses [JSON](https://www.ecma-international.org/publications-and-standards/standards/ecma-404/), but has some additional constraints:
+
+- Every single Object is a `Resource`.
+- Every Key is a [`Property`](https://atomicdata.dev/classes/Property) URL. Other keys are invalid.
Each Property URL must resolve to an online Atomic Data Property.
+- The `@id` field is special: it defines the `Subject` of the `Resource`. If you send an HTTP GET request there with an `Accept: application/ad+json` header, you should get the full JSON-AD resource.
 - JSON arrays are mapped to [Resource Arrays](https://atomicdata.dev/datatypes/resourceArray)
 - Numbers can be [Integers](https://atomicdata.dev/datatypes/integer), [Timestamps](https://atomicdata.dev/datatypes/timestamp) or [Floats](https://atomicdata.dev/datatypes/float).
 - JSON booleans map to [Booleans](https://atomicdata.dev/datatypes/boolean).
 - JSON strings can be many datatypes, including [String](https://atomicdata.dev/datatypes/string), [Markdown](https://atomicdata.dev/datatypes/markdown), [Date](https://atomicdata.dev/datatypes/date) or other.
-- Nested JSON Objects are Nested Resources. A Nested Resource can either be anonymous (without an `@id` subject) or a regular Nested Resource with an `@id` subject.
-When you want to describe multiple Resources in one JSON-AD document, use an array as the root item.
+- Nested JSON Objects are Nested Resources. A Nested Resource can either be _Anonymous_ (without an `@id` subject) or a Named Nested Resource (with an `@id` subject). Everywhere a Subject URL can be used as a value (i.e. all properties with the datatype [atomicURL](https://atomicdata.dev/datatypes/atomicURL)), a Nested Resource can be used instead. This also means that an item in a `ResourceArray` can be a Nested Resource.
+- The root data structure must either be a Named Resource (with an `@id`), or an Array containing Named Resources. When you want to describe multiple Resources in one JSON-AD document, use an array as the root item.
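As an aside (not part of the spec), the first two constraints can be checked mechanically. A parser could reject plain JSON keys with a sketch like the following, where `looksLikeJsonAd` is a made-up helper name:

```typescript
// Illustrative structural check for a single JSON-AD object:
// `@id` must be a string, and every other key must parse as a URL.
// This does not verify that the URLs actually resolve to Properties.
function looksLikeJsonAd(resource: Record<string, unknown>): boolean {
  for (const key of Object.keys(resource)) {
    if (key === "@id") {
      if (typeof resource[key] !== "string") return false;
      continue;
    }
    try {
      // Property keys must be URLs (which should resolve to Atomic Properties).
      new URL(key);
    } catch {
      return false;
    }
  }
  return true;
}

console.log(looksLikeJsonAd({ "@id": "https://example.com/arnold" })); // true
console.log(looksLikeJsonAd({ name: "Arnold" })); // false - plain keys are invalid
```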
Let's look at an example JSON-AD Resource:
@@ -28,6 +32,35 @@ Let's look at an example JSON-AD Resource:
 }
 ```
 
+The mime type (for HTTP content negotiation) is `application/ad+json` ([registration ongoing](https://github.com/ontola/atomic-data-docs/issues/60)).
+
+## Nested, Anonymous and Named resources
+
+In JSON-AD, a Resource can be represented in multiple ways:
+
+- **Subject**: A URL string, such as `https://atomicdata.dev/classes/Class`.
+- **Named Resource**: A JSON Object with an `@id` field containing the Subject.
+- **Anonymous Nested Resource**: A JSON Object without an `@id` field. This is only possible if it is a Nested Resource, which means that it has a parent Resource.
+
+Note that this is also valid for `ResourceArrays`, which usually only contain Subjects, but are allowed to contain Nested Resources.
+
+In the following JSON-AD example, the `address` is a nested resource:
+
+```json
+{
+  "@id": "https://example.com/arnold",
+  "https://example.com/properties/address": {
+    "https://example.com/properties/firstLine": "Longstreet 22",
+    "https://example.com/properties/city": "Watertown",
+    "https://example.com/properties/country": "the Netherlands"
+  }
+}
+```
+
+Nested Resources can be _named_ or _anonymous_. An _Anonymous Nested Resource_ does not have its own `@id` field.
+It _does_ have its own unique [path](./paths.md), which can be used as its identifier.
+The `path` of the anonymous resource in the example above is `https://example.com/arnold https://example.com/properties/address`.
+
 ## JSON-AD Parsers, serializers and other libraries
 
 - **Typescript / Javascript**: [@tomic/lib](https://www.npmjs.com/package/@tomic/lib) JSON-AD parser + in-memory store. Works with [@tomic/react](https://www.npmjs.com/package/@tomic/react) for rendering Atomic Data in React.
@@ -43,3 +76,7 @@ When you need deterministic serialization of Atomic Data (e.g. when calculating
 1. The JSON-AD is minified: no newlines, no spaces.
The last two steps of this process are more formally defined by the JSON Canonicalization Scheme (JCS, [rfc8785](https://tools.ietf.org/html/rfc8785)).
+
+## Interoperability with JSON and JSON-LD
+
+[Read more about this subject](../interoperability/json.md).
diff --git a/src/core/paths.md b/src/core/paths.md
index 1f82f05..60f78a9 100644
--- a/src/core/paths.md
+++ b/src/core/paths.md
@@ -1,3 +1,4 @@
+{{#title Atomic Data Paths}}
 # Atomic Paths
 
 An Atomic Path is a string that consists of at least one URL, followed by one or more URLs or Shortnames.
@@ -109,4 +110,4 @@ This means that we still have a unique, globally resolvable identifier - yay!
 
 ## Try for yourself
 
-Install the [`atomic-cli`](https://github.com/joepio/atomic/blob/master/cli/README.md) software and run `atomic-cli get https://atomicdata.dev/classes/Class description`.
+Install the [`atomic-cli`](https://github.com/atomicdata-dev/atomic-server/blob/master/cli/README.md) software and run `atomic-cli get https://atomicdata.dev/classes/Class description`.
diff --git a/src/core/querying.md b/src/core/querying.md
index 5423325..1e45763 100644
--- a/src/core/querying.md
+++ b/src/core/querying.md
@@ -1,17 +1,13 @@
+{{#title Querying Atomic Data}}
 # Querying Atomic Data
 
 There are multiple ways of getting Atomic Data into some system:
 
-- [**Atomic Collections**](../schema/collections.md) is a simple way to traverse Atomic Graphs and target specific values
-- [**Atomic Paths**](paths.md) is a simple way to traverse Atomic Graphs and target specific values
 - [**Subject Fetching**](#subject-fetching-http) requests a single subject right from its source
+- [**Atomic Collections**](../schema/collections.md) can filter, sort and paginate resources
+- [**Atomic Paths**](paths.md) is a simple way to traverse Atomic Graphs and target specific values
 - [**Triple Pattern Fragments**](#triple-pattern-fragments) allows querying for specific (combinations of) Subject, Property and Value.
-- [**SRARQL**](#SPARQL) is a powerful Query language for traversing linked data graphs
-
-## Atomic Paths
-
-An Atomic Path is a string that consist of one or more URLs, which when traversed point to an item.
-For more information, see [Atomic Paths](paths.md).
+- [**SPARQL**](#SPARQL) is a powerful Query language for traversing linked data graphs
 
 ## Subject fetching (HTTP)
 
@@ -38,38 +34,26 @@ Connection: Closed
 
 The server MAY also include other resources, if they are deemed relevant.
 
-## Triple Pattern Fragments
-
-[Triple Pattern Fragments](https://linkeddatafragments.org/specification/triple-pattern-fragments/) (TPF) is an interface for querying RDF.
-It works great for Atomic Data as well.
-
-An HTTP implementation of a TPF endpoint might accept a GET request to a URL such as this:
-
-`http://example.org/tpf?subject={subject}&property={property}&value={value}`
+## Atomic Collections
 
-Make sure to URL encode the `subject`, `property`, `value` strings.
+Collections are Resources that provide simple query options, such as filtering by Property or Value, and sorting.
+They also paginate resources.
+Under the hood, Collections are powered by Triple Pattern Fragments.
+Use query parameters to traverse pages, filter, or sort.
 
-For example, let's search for all Atoms where the value is `test`.
+[Read more about Collections](../schema/collections.md)
 
-```HTTP
-GET https://atomicdata.dev/tpf?value=0 HTTP/1.1
-Content-Type: text/turtle
-```
-
-This is the HTTP response:
+## Atomic Paths
 
-```HTTP
-HTTP/1.1 200 OK
-Content-Type: text/turtle
-Connection: Closed
+An Atomic Path is a string that consists of one or more URLs, which when traversed point to an item.
 
- "0"^^ .
-```
+[Read more about Atomic Paths](paths.md)
 
 ## SPARQL
 
 [SPARQL](https://www.w3.org/TR/rdf-sparql-query/) is a powerful RDF query language.
 Since all Atomic Data is also valid RDF, it should be possible to query Atomic Data using SPARQL.
+None of the existing implementations support a SPARQL endpoint, though.
 
-- Convert / serialize Atomic Data to RDF (for example by using the `/tpf` endpoint and an `accept` header: `curl -i -H "Accept: text/turtle" "https://atomicdata.dev/tpf"`)
-- Load it into a SPARQL engine (e.g. )
+- Convert / serialize Atomic Data to RDF (for example by using an `accept` header: `curl -i -H "Accept: text/turtle" "https://atomicdata.dev"`)
+- Load it into a SPARQL engine of your choice
diff --git a/src/core/serialization.md b/src/core/serialization.md
index 48a693a..63985fc 100644
--- a/src/core/serialization.md
+++ b/src/core/serialization.md
@@ -1,16 +1,49 @@
+{{#title Serialization of Atomic Data}}
 # Serialization of Atomic Data
 
 Atomic Data is not necessarily bound to a single serialization format.
 It's fundamentally a data model, and that's an important distinction to make.
+It can be serialized in different ways, but there is only one required: `JSON-AD`.
 
 ## JSON-AD
 
-However, it's recommended to use [`JSON-AD`](json-ad.md) (more about that on the next page), which is specifically designed to be a simple, complete and performant format for Atomic Data.
+[`JSON-AD`](json-ad.md) (more about that on the next page) is specifically designed to be a simple, complete and performant format for Atomic Data.
+
+```json
+{
+  "@id": "https://atomicdata.dev/properties/description",
+  "https://atomicdata.dev/properties/datatype": "https://atomicdata.dev/datatypes/markdown",
+  "https://atomicdata.dev/properties/description": "A textual description of something. When making a description, make sure that the first few words tell the most important part. Give examples.
Since the text supports markdown, you're free to use links and more.",
+  "https://atomicdata.dev/properties/isA": [
+    "https://atomicdata.dev/classes/Property"
+  ],
+  "https://atomicdata.dev/properties/parent": "https://atomicdata.dev/properties",
+  "https://atomicdata.dev/properties/shortname": "description"
+}
+```
+
+[Read more about JSON-AD](json-ad.md)
 
 ## JSON (simple)
 
 Atomic Data is designed to be serializable to clean, simple [JSON](../interoperability/json.md), for usage in (client) apps that don't need to know the full URLs of properties.
 
+```json
+{
+  "@id": "https://atomicdata.dev/properties/description",
+  "datatype": "https://atomicdata.dev/datatypes/markdown",
+  "description": "A textual description of something. When making a description, make sure that the first few words tell the most important part. Give examples. Since the text supports markdown, you're free to use links and more.",
+  "is-a": [
+    "https://atomicdata.dev/classes/Property"
+  ],
+  "parent": "https://atomicdata.dev/properties",
+  "shortname": "description"
+}
+```
+
+[Read more about JSON and Atomic Data](../interoperability/json.md)
+
 ## RDF serialization formats
 
 Since Atomic Data is a strict subset of RDF, RDF serialization formats can be used to communicate and store Atomic Data, such as N-Triples, Turtle, HexTuples, JSON-LD and [other RDF serialization formats](https://ontola.io/blog/rdf-serialization-formats/).
 However, not all valid RDF is valid Atomic Data.
 Atomic Data is more strict.
 Read more about serializing Atomic Data to RDF in the [RDF interoperability section](../interoperability/rdf.md).
-## Experimental serialization formats
+JSON-LD:
+
+```json
+{
+  "@context": {
+    "datatype": {
+      "@id": "https://atomicdata.dev/properties/datatype",
+      "@type": "@id"
+    },
+    "description": "https://atomicdata.dev/properties/description",
+    "is-a": {
+      "@container": "@list",
+      "@id": "https://atomicdata.dev/properties/isA"
+    },
+    "parent": {
+      "@id": "https://atomicdata.dev/properties/parent",
+      "@type": "@id"
+    },
+    "shortname": "https://atomicdata.dev/properties/shortname"
+  },
+  "@id": "https://atomicdata.dev/properties/description",
+  "datatype": "https://atomicdata.dev/datatypes/markdown",
+  "description": "A textual description of something. When making a description, make sure that the first few words tell the most important part. Give examples. Since the text supports markdown, you're free to use links and more.",
+  "is-a": [
+    "https://atomicdata.dev/classes/Property"
+  ],
+  "parent": "https://atomicdata.dev/properties",
+  "shortname": "description"
+}
+```
+
+Turtle / N-Triples:

-Some experimental ideas for Atomic Data serialization are [written here](https://github.com/ontola/atomic-data/blob/master/src/experimental-serialization.md).
+```turtle
+<https://atomicdata.dev/properties/description> <https://atomicdata.dev/properties/datatype> <https://atomicdata.dev/datatypes/markdown> .
+<https://atomicdata.dev/properties/description> <https://atomicdata.dev/properties/parent> <https://atomicdata.dev/properties> .
+<https://atomicdata.dev/properties/description> <https://atomicdata.dev/properties/shortname> "description"^^<https://atomicdata.dev/datatypes/slug> .
+<https://atomicdata.dev/properties/description> <https://atomicdata.dev/properties/isA> "https://atomicdata.dev/classes/Property"^^<https://atomicdata.dev/datatypes/atomicURL> .
+<https://atomicdata.dev/properties/description> <https://atomicdata.dev/properties/description> "A textual description of something. When making a description, make sure that the first few words tell the most important part. Give examples. Since the text supports markdown, you're free to use links and more."^^<https://atomicdata.dev/datatypes/markdown> .
+```
diff --git a/src/create-json-ad.md b/src/create-json-ad.md
new file mode 100644
index 0000000..fa47583
--- /dev/null
+++ b/src/create-json-ad.md
@@ -0,0 +1,137 @@
+# How to create and publish a JSON-AD file
+
+[JSON-AD](core/json-ad.md) is the default serialization format of Atomic Data.
+It's just JSON, but with some extra requirements.
+
+Most notably, all keys are links to [Atomic Properties](https://atomicdata.dev/classes/Property).
+These Properties must actually be hosted somewhere on the web, so other people can visit them to read more about them.
+
+Ideally, in JSON-AD, each Resource has its own `@id`.
+This is the URL of the resource.
+This means that if someone visits that `@id`, they should get the resource they are requesting.
+That's great for people re-using your data, but as a data provider, implementing this can be a bit of a hassle.
+That's why there is a different way that allows you to create Atomic Data _without manually hosting every resource_.
+
+## Creating JSON-AD without hosting individual resources yourself
+
+In this section, we'll create a single JSON-AD file containing various resources.
+This file can then be published, shared and stored like any other.
+
+The goal of this preparation is to ultimately import it somewhere else.
+We'll be importing it to Atomic-Server.
+Atomic-Server will create URLs for every single resource upon importing it.
+This way, we only deal with the JSON-AD and the data structure, and we let Atomic-Server take care of hosting the data.
+
+Let's create a BlogPost.
+We know the fields that we need: a `name` and some `body`.
+But we can't use these keys in Atomic Data; we should use URLs that point to Properties.
+We can either create new Properties (see the Atomic-Server tutorial), or we can use existing ones, for example by searching on [AtomicData.dev/properties](https://atomicdata.dev/properties).
+
+## Setting the first values
+
+```json
+{
+  "https://atomicdata.dev/properties/name": "Writing my first blogpost",
+  "https://atomicdata.dev/properties/description": "Hi! I'm a blogpost. I'm also machine readable!"
+}
+```
+
+## Adding a Class
+
+Classes help others understand what a Resource's type is, such as BlogPost or Person.
+In Atomic Data, Resources can have multiple classes, so we should use an Array, like so:
+
+```json
+{
+  "https://atomicdata.dev/properties/name": "Writing my first blogpost",
+  "https://atomicdata.dev/properties/description": "Hi! I'm a blogpost. I'm also machine readable!",
+  "https://atomicdata.dev/properties/isA": ["https://atomicdata.dev/classes/Article"]
+}
+```
+
+Adding a Class helps people to understand the data, and it can provide guarantees to the data users about the _shape_ of the data: they now know which fields are _required_ or _recommended_.
+We can also use Classes to render Forms, which can be useful when the data should be edited later.
+
+## Using existing Ontologies and Classes
+
+Ontologies are groups of concepts that describe some domain.
+For example, we could have an Ontology for Blogs that links to a bunch of related _Classes_, such as BlogPost and Person.
+Or we could have a Recipe Ontology that describes Ingredients, Steps and more.
+
+At this moment, there are relatively few Classes created in Atomic Data.
+You can find most on [atomicdata.dev/classes](https://atomicdata.dev/classes).
+
+So possibly the best way forward for you is to define a Class using the Atomic Data Browser's tools for making resources.
+
+## Multiple items
+
+If we want to have _multiple_ items, we can simply use a JSON Array at the root, like so:
+
+```json
+[{
+  "https://atomicdata.dev/properties/name": "Writing my first blogpost",
+  "https://atomicdata.dev/properties/description": "Hi! I'm a blogpost. 
I'm also machine readable!",
+  "https://atomicdata.dev/properties/isA": ["https://atomicdata.dev/classes/Article"]
+},{
+  "https://atomicdata.dev/properties/name": "Another blogpost",
+  "https://atomicdata.dev/properties/description": "I'm writing so much my hands hurt.",
+  "https://atomicdata.dev/properties/isA": ["https://atomicdata.dev/classes/Article"]
+}]
+```
+
+## Preventing duplication with `localId`
+
+When we want to _publish_ Atomic Data, we also want someone else to be able to _import_ it.
+An important thing to prevent is _data duplication_.
+If you're importing a list of Blog posts, for example, you'd want to import every article only _once_.
+
+The way to prevent duplication is by adding a `localId`.
+This `localId` is used by the importer to find out if it has already imported the resource before.
+So we, as data producers, need to make sure that our `localId` is _unique_ and _does not change_!
+We can use any type of string that we like, as long as it conforms to these requirements.
+Let's use a unique _slug_, a short name that is often used in URLs.
+
+```json
+{
+  "https://atomicdata.dev/properties/name": "Writing my first blogpost",
+  "https://atomicdata.dev/properties/description": "Hi! I'm a blogpost. I'm also machine readable!",
+  "https://atomicdata.dev/properties/isA": ["https://atomicdata.dev/classes/Article"],
+  "https://atomicdata.dev/properties/localId": "my-first-blogpost"
+}
+```
+
+## Describing relationships between resources using `localId`
+
+Let's say we also want to describe the `author` of the BlogPost, and give them an e-mail, a profile picture and some biography.
+This means we need to create a new Resource for each Author, and again have to think about the properties relevant for Author.
+We'll also need to create a link from BlogPost to Author, and perhaps the other way around, too.
+
+Normally, when we link things in Atomic Data, we can only use full URLs.
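The duplication check described above can be sketched in a few lines. This is not Atomic-Server's actual implementation; the plain dict standing in for the server's database and the function name are assumptions of this sketch:

```python
LOCAL_ID = "https://atomicdata.dev/properties/localId"

def import_resources(resources: list[dict], store: dict) -> int:
    """Import JSON-AD resources, skipping any localId seen before.

    `store` maps localId -> resource; it stands in for the server's database.
    Returns the number of newly imported resources.
    """
    imported = 0
    for resource in resources:
        local_id = resource[LOCAL_ID]
        if local_id in store:
            continue  # already imported: skip, so nothing is duplicated
        store[local_id] = resource
        imported += 1
    return imported

store: dict = {}
posts = [{LOCAL_ID: "my-first-blogpost"}, {LOCAL_ID: "another-blogpost"}]
print(import_resources(posts, store))  # first import: 2 new resources
print(import_resources(posts, store))  # re-import of the same file: 0
```

This is why the `localId` must be unique and stable: if it changed between exports, the importer would treat the same article as a new resource.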
+But, since we don't have URLs yet for our Resources, we'll need a different solution.
+Again, this is where we can use `localId`!
+We can simply refer to the `localId`, instead of some URL that does not exist yet.
+
+```json
+[{
+  "https://atomicdata.dev/properties/name": "Writing my first blogpost",
+  "https://atomicdata.dev/properties/description": "Hi! I'm a blogpost. I'm also machine readable!",
+  "https://atomicdata.dev/properties/author": "jon",
+  "https://atomicdata.dev/properties/isA": ["https://atomicdata.dev/classes/Article"],
+  "https://atomicdata.dev/properties/localId": "my-first-blogpost"
+},{
+  "https://atomicdata.dev/properties/name": "Another blogpost",
+  "https://atomicdata.dev/properties/description": "I'm writing so much my hands hurt.",
+  "https://atomicdata.dev/properties/author": "jon",
+  "https://atomicdata.dev/properties/isA": ["https://atomicdata.dev/classes/Article"],
+  "https://atomicdata.dev/properties/localId": "another-blogpost"
+},{
+  "https://atomicdata.dev/properties/name": "Jon Author",
+  "https://atomicdata.dev/properties/isA": ["https://atomicdata.dev/classes/Person"],
+  "https://atomicdata.dev/properties/localId": "jon"
+}]
+```
+
+## Importing data using Atomic-Server
+
+_currently [under development](https://github.com/atomicdata-dev/atomic-server/issues/390)_
diff --git a/src/endpoints.md b/src/endpoints.md
index fad3082..cefbc13 100644
--- a/src/endpoints.md
+++ b/src/endpoints.md
@@ -1,3 +1,4 @@
+{{#title Atomic Data Endpoints - describe how RESTful HTTP APIs behave}}
 # Atomic Endpoints

 _URL: https://atomicdata.dev/classes/Endpoint_
@@ -12,9 +13,21 @@ The most important property in an Endpoint is [`parameters`](https://atomicdata.

 You can find a list of Endpoints supported by Atomic-Server on [atomicdata.dev/endpoints](https://atomicdata.dev/endpoints).

+Endpoint Resources are _dynamic_, because their properties could be calculated server-side.
+When a Property tends to be calculated server-side, it will have an [`isDynamic` property](https://atomicdata.dev/properties/isDynamic) set to `true`, which tells the client that it's probably useless to try to overwrite it.
+
+## Incomplete resources
+
+A Server can also send one or more partial Resources for an Endpoint to the client, which means that some properties may be missing.
+When this is the case, the Resource will have an [`incomplete`](https://atomicdata.dev/properties/incomplete) property set to `true`.
+This tells the client that it has to individually fetch the resource from the server to get the full body.
+
+One scenario where this happens is when fetching Collections that have other Collections as members.
+If we did not have incomplete resources, the server would have to perform expensive computations even if the data is not needed by the client.
+
 ## Design Goals

 - **Familiar API**: should look like something that most developers already know
 - **Auto-generate forms**: a front-end app should present Endpoints as forms that non-developers can interact with

-[Discussion in issue tracker](https://github.com/ontola/atomic-data-docs/issues/15).
+([Discussion](https://github.com/atomicdata-dev/atomic-data-docs/issues/15))
diff --git a/src/extended-table.md b/src/extended-table.md
new file mode 100644
index 0000000..4a57309
--- /dev/null
+++ b/src/extended-table.md
@@ -0,0 +1,9 @@
+- [Commits](commits/intro.md) communicate state changes. These Commits are signed using cryptographic keys, which ensures that every change can be audited. Commits are also used to construct a history of versions.
+- [Agents](agents.md) are Users that enable [authentication](authentication.md). They are Resources with their own Public and Private keys, which they use to identify themselves.
+- [Collections](schema/collections.md): querying, filtering, sorting and pagination.
+- [Paths](core/paths.md): traverse graphs.
+- [Hierarchies](hierarchy.md) are used for authorization and for keeping data organized. Similar to folder structures on file-systems.
+- [Invites](invitations.md): create new users and provide them with rights.
+- [WebSockets](websockets.md): real-time updates.
+- [Endpoints](endpoints.md): provide machine-readable descriptions of web services.
+- [Files](files.md): upload, download and metadata for files.
diff --git a/src/extended.md b/src/extended.md
new file mode 100644
index 0000000..d779038
--- /dev/null
+++ b/src/extended.md
@@ -0,0 +1,10 @@
+{{#title Atomic Data Extended specification}}
+# Atomic Data Extended
+
+Atomic Data is a _modular_ specification, which means that you can choose to implement parts of it.
+All parts of Extended are _optional_ to implement.
+The _Core_ of the specification (described in the previous chapter) is required for all of the Extended spec to work, but not the other way around.
+
+However, many of the parts of Extended do depend on _each other_.
+
+{{#include extended-table.md}}
diff --git a/src/files.md b/src/files.md
new file mode 100644
index 0000000..42cf2b3
--- /dev/null
+++ b/src/files.md
@@ -0,0 +1,33 @@
+{{#title Uploading, downloading and describing files with Atomic Data}}
+# Uploading, downloading and describing files with Atomic Data
+
+The Atomic Data model (Atomic Schema) is great for describing structured data, but for many types of existing data, we already have a different way to represent them: files.
+In Atomic Data, files have two URLs.
+One _describes_ the file and its metadata, and the other is a URL that downloads the file.
+This allows us to present a better view when a user wants to take a look at some file, and learn about its context before downloading it.
+
+## The File class
+
+_url: [https://atomicdata.dev/classes/File](https://atomicdata.dev/classes/File)_
+
+Files always have a downloadURL.
+They often also have a filename, a filesize, a checksum, a mimetype, and an internal ID (more on that later).
+They also often have a [`parent`](https://atomicdata.dev/properties/parent), which can be used to set permissions / rights.
+
+## Uploading a file
+
+In `atomic-server`, a `/upload` endpoint exists for uploading a file.
+
+- Decide where you want to add the file in the [hierarchy](hierarchy.md) of your server. You can add a file to any resource - your file will refer to this resource as its [`parent`](https://atomicdata.dev/properties/parent). Make sure you have `write` rights on this parent.
+- Use that parent to add a query parameter to the server's `/upload` endpoint, e.g. `/upload?parent=https%3A%2F%2Fatomicdata.dev%2Ffiles`.
+- Send an HTTP `POST` request to the server's `/upload` endpoint containing [`multi-part-form-data`](https://developer.mozilla.org/en-US/docs/Web/API/FormData/Using_FormData_Objects). You can upload multiple files in one request. Add [authentication](authentication.md) headers, and sign the HTTP request with your Agent's private key.
+- The server will check your authentication headers and your permissions, and will persist your uploaded file(s). It will now create File resources.
+- The server will reply with an array of the created Atomic Data Files.
+
+## Downloading a file
+
+Simply send an HTTP GET request to the File's [`download-url`](https://atomicdata.dev/properties/downloadURL) (make sure to authenticate this request).
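The `parent` query parameter in the upload steps above must be percent-encoded. A small sketch of building that URL (the server and parent subjects are just the examples used on this page):

```python
from urllib.parse import urlencode

def upload_url(server: str, parent: str) -> str:
    """Build the /upload endpoint URL, percent-encoding the parent subject."""
    # urlencode percent-encodes the ":" and "/" characters in the parent URL
    return f"{server}/upload?{urlencode({'parent': parent})}"

print(upload_url("https://atomicdata.dev", "https://atomicdata.dev/files"))
# https://atomicdata.dev/upload?parent=https%3A%2F%2Fatomicdata.dev%2Ffiles
```

The resulting URL matches the `/upload?parent=…` example shown in the upload steps.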
+ +- [Discussion on specification](https://github.com/ontola/atomic-data-docs/issues/57) +- [Discussion on Rust server implementation](https://github.com/atomicdata-dev/atomic-server/issues/72) +- [Discussion on Typescript client implementation](https://github.com/atomicdata-dev/atomic-data-browser/issues/121) diff --git a/src/get-involved.md b/src/get-involved.md index 53f140c..c1d131d 100644 --- a/src/get-involved.md +++ b/src/get-involved.md @@ -1,3 +1,4 @@ +{{#title Atomic Data - Get Involved}} # Get involved Atomic Data is an open specification, and that means that you're very welcome to share your thoughts and help make this standard as good as possible. @@ -5,21 +6,15 @@ Atomic Data is an open specification, and that means that you're very welcome to Things you can do: - Join the [Discord server](https://discord.gg/a72Rv2P) for voice / text chat -- Start playing with / contributing to the [`atomic-server / atomic-cli`](https://github.com/joepio/atomic) implementation written in Rust. 
-- Clone the [Book Repo](https://github.com/ontola/atomic-data/) and read some of the inline comments, which might help start some discussions
-- Drop an [issue on Github](https://github.com/ontola/atomic-data/issues) to share your suggestions or criticism
+- Start playing with / contributing to [the implementations](tooling.md)
+- Drop an [issue on Github](https://github.com/ontola/atomic-data-docs/issues) to share your suggestions or criticism of this book / spec
+- Subscribe to the [newsletter](newsletter.md)
 - Join our [W3C Community Group](https://www.w3.org/community/atomic-data/)

-## Authors:
-
-- Joep Meindertsma ([joepio](https://github.com/joepio/) from [Ontola.io](https://ontola.io/))
-
-## Special thanks to:
-
-- **Thom van Kalkeren** (my colleague, friend and programming mentor who came up with many great ideas on how to work with RDF, such as [HexTuples](https://github.com/ontola/hextuples) and [linked-delta](https://github.com/ontola/linked-delta))
-- **Tim Berners-Lee** (for everything he did for linked data and the web)
-- **Ruben Verborgh** (for doing great work with RDF, such as the TPF spec)
-- **Pat McBennett** (for lots of valuable feedback on initial Atomic Data docs)
-- **Manu Sporny** (for his work on JSON-LD, which was an important inspiration for JSON-AD)
-- **Jonas Smedegaard** (for the various interesting talks we had and the feedback he provided)
-- All the other people who contributed to linked data related standards
+
+
diff --git a/src/get-started.md b/src/get-started.md
new file mode 100644
index 0000000..e50bedb
--- /dev/null
+++ b/src/get-started.md
@@ -0,0 +1,41 @@
+{{#title Get started with Atomic Data}}
+# Get started with Atomic Data
+
+There are a couple of levels at which you can start working with Atomic Data (from easy to hard):
+
+- **Play with the demo**: Create an Agent, edit a document.
+- **Host your own Atomic-Server**.
+- **Create a react app with the template**.
+- **Set up the full dev environment**.
+- **Create a library for Atomic Data**.
+
+## Play with the demo
+
+- Open [the Invite](https://atomicdata.dev/invites/1) on `atomicdata.dev`
+- Press `Accept`. Now, the front-end app will generate a Private-Public Key pair. The public key will be sent to the server, which creates an Agent for you.
+- You're now signed in! You can edit the document on your screen.
+- Edit your Agent by going to [user settings](https://atomicdata.dev/app/agent)
+- Copy your `secret`, and save it somewhere safe. You can use this to sign in on a different machine.
+- Press `edit user` to add your name and perhaps a bio.
+- When you're done, visit user settings again and press `sign out` to erase your credentials and end the session.
+
+## Host your own Atomic-Server (locally)
+
+- If you have docker running, you can use this one-liner: `docker run -p 80:80 -v atomic-storage:/atomic-storage joepmeneer/atomic-server` (or use `cargo install atomic-server`, or the [binaries](https://github.com/atomicdata-dev/atomic-server/releases/))
+- Now, visit `localhost` in your browser to access your server.
+- It's now only available locally. If you want to get it on the _internet_, you need to set up a domain name, and make sure its traffic is routed to your computer (search `port forwarding`).
+
+## Host your own Atomic-Server (on a VPS)
+
+- **Set up a domain name** by using one of the many services that do this for you.
+- **Get a virtual private server (VPS)** on which you can run `atomic-server`. We are running atomicdata.dev on the cheapest VPS we could find: $3.50 / month at [Vultr.com (use this link to give us $10 of hosting credit)](https://www.vultr.com/?ref=8970814-8H).
+
+
+
+- Browser app [atomic-data-browser](https://github.com/atomicdata-dev/atomic-data-browser) ([demo on atomicdata.dev](https://atomicdata.dev))
+- Build a react app using [typescript & react libraries](https://github.com/atomicdata-dev/atomic-data-browser). 
Start with the [react template on codesandbox](https://codesandbox.io/s/atomic-data-react-template-4y9qu?file=/src/MyResource.tsx)
+- Host your own [atomic-server](https://github.com/atomicdata-dev/atomic-server) (powers [atomicdata.dev](https://atomicdata.dev), run with `docker run -p 80:80 -v atomic-storage:/atomic-storage joepmeneer/atomic-server`)
+- Discover the command line tool: [atomic-cli](https://github.com/atomicdata-dev/atomic-server) (`cargo install atomic-cli`)
+- Use the Rust library: [atomic-lib](https://github.com/atomicdata-dev/atomic-server)
+
+Make sure to [join our Discord](https://discord.gg/a72Rv2P) if you'd like to discuss Atomic Data with others.
diff --git a/src/headless-cms.md b/src/headless-cms.md
new file mode 100644
index 0000000..0af7c1f
--- /dev/null
+++ b/src/headless-cms.md
@@ -0,0 +1 @@
+# Using as a Headless CMS
diff --git a/src/hierarchy.md b/src/hierarchy.md
index c96c914..da94a71 100644
--- a/src/hierarchy.md
+++ b/src/hierarchy.md
@@ -1,3 +1,4 @@
+{{#title Atomic Data Hierarchy, rights and authorization }}
 # Hierarchy, rights and authorization

 Hierarchies help make information easier to find and understand.
@@ -6,7 +7,7 @@ Your computer probably has a bunch of _drives_ and deeply nested _folders_ that

 We generally use these hierarchical elements to keep data organized, and to keep a tighter grip on rights management.
 For example, sharing a specific folder with a team, but a different folder could be private.

-Although you are free to use Atomic Data with your own custom authorization system, we have a standardized model that is currently being used by some of the tools that we've built.
+Although you are free to use Atomic Data with your own custom authorization system, we have a standardized model that is currently being used by Atomic-Server.
## Design goals
@@ -16,18 +17,35 @@ Although you are free to use Atomic Data with your own custom authorization syst

 ## Atomic Hierarchy Model

-- Every Resource SHOULD have a [`parent`](https://atomicdata.dev/properties/parent).
+- Every Resource SHOULD have a [`parent`](https://atomicdata.dev/properties/parent). There are some exceptions to this, which are discussed below.
 - Any Resource can be a `parent` of some other Resource, as long as both Resources exist on the same Atomic Server.
-- Inversely, every Resource could have `children`.
-- Only [`Drive`](https://atomicdata.dev/classes/Drive)s (Resources with the class `Drive`) are allowed to be a top-level parent.
-- Any Resource might have `read` and `write` Atoms. These both contain a list of Agents. These Agents will be granted the rights to edit (using Commits) or read / use the Resources.
+- Grants / rights given in a `parent` also apply to all children, and their children.
+- A few Classes do not require a `parent`; these are listed below.
+
+## Authorization
+
+- Any Resource might have [`read`](https://atomicdata.dev/properties/read) and [`write`](https://atomicdata.dev/properties/write) Atoms. These both contain a list of Agents. These Agents will be granted the rights to edit (using Commits) or read / use the Resources.
 - Rights are _additive_, which means that the rights add up. If a Resource itself has no `write` Atom containing your Agent, but its `parent` _does_ have one, you will still get the `write` right.
 - Rights cannot be removed by children or parents - they can only be added.
+- `Commits` cannot be edited. They can be `read` if the Agent has rights to read the [`subject`](https://atomicdata.dev/properties/subject) of the `Commit`.
+
+## Top-level resources
+
+Some resources are special, as they do not require a `parent`:
+
+- [`Drive`](https://atomicdata.dev/classes/Drive)s are top-level items in the hierarchy: they do not have a `parent`.
+- [`Agent`](https://atomicdata.dev/classes/Agent)s are top-level items because they are not `owned` by anything. They can always `read` and `write` themselves.
+- [`Commit`](https://atomicdata.dev/classes/Commit)s are immutable, so they should never be edited by anyone. That's why they don't have a place in the hierarchy. Their `read` rights are determined by their subject.
+
+## Authentication
+
+Authentication is about proving _who you are_, which is often the first step for authorization. See [authentication](./authentication.md).
+
+## Current limitations of the Authorization model

-## Limitations of the current Authorization model
+The specification is growing (and please contribute via the [docs repo](https://github.com/atomicdata-dev/atomic-data-docs/issues)), but it currently lacks some features:

-- Rights can only be added, but not removed in a higher item of a hierarchy. This means that you cannot have a secret folder inside a public folder.
-- No model for representing groups of Agents, or other runtime checks for authorization.
-- No way to limit access to reading / writing specific properties
-- No way to limit delete access
+- Rights can only be added, but not removed in the hierarchy. This means that you cannot have a secret folder inside a public folder.
+- No model for representing groups of Agents, or other runtime checks for authorization. 
([issue](https://github.com/atomicdata-dev/atomic-data-docs/issues/73))
+- No way to limit delete access or invite rights separately from write rights ([issue](https://github.com/atomicdata-dev/atomic-data-docs/issues/82))
 - No way to request a set of rights for a Resource
diff --git a/src/interoperability/git.md b/src/interoperability/git.md
index 5af2229..870aa1e 100644
--- a/src/interoperability/git.md
+++ b/src/interoperability/git.md
@@ -1,4 +1,5 @@
-# Atomic data and Git
+{{#title Atomic Data and Git}}
+# Atomic Data and Git

 ## How to manage Atomic Data using GIT
diff --git a/src/interoperability/graph-database.md b/src/interoperability/graph-database.md
index d8e4e74..bfc66cb 100644
--- a/src/interoperability/graph-database.md
+++ b/src/interoperability/graph-database.md
@@ -1,9 +1,23 @@
+{{#title How does Atomic Data relate to Graph Databases?}}
 # Atomic Data and Graph Databases

-Atomic Data fundamentally is a _graph data model_.
+Atomic Data is fundamentally a _graph data model_.
 We can think of Atomic Resources as _nodes_, and links to other resources through _properties_ as _edges_.

-In this section, we'll explore how Atomic Data relates to some graph technologies.
+In the first section, we'll take a look at Atomic-Server as a Graph Database.
+After that, we'll explore how Atomic Data relates to some graph technologies.
+
+## Atomic-Server as a database
+
+- **Built-in REST**. Everything is done over HTTP; there's no new query language or serialization to learn. It's all JSON.
+- **All resources have HTTP URLs**. This means that every single thing is identified by where it can be found. Makes it easy to share data, if you want to!
+- **Sharable and re-usable data models**. Atomic Schema helps you share and re-use data models by simply pointing to URLs.
+- **Authorization built-in**. Managing rights in a hierarchy (similar to how tools like Google Drive or filesystems work) enables you to have a high degree of control over read / write rights.
+- **Built-in, easy-to-use GUI**. Managing content on Atomic-Server can be done by anyone, as its GUI is intuitive and has a ton of features.
+- **Dynamic indexing**. Indexes are created by performing Queries, resulting in great performance - without needing to manually configure indexing.
+- **Synchronization over WebSockets**. All changes (called [Commits](../commits/intro.md)) can be synchronized over WebSockets, allowing you to build realtime collaborative tools.
+- **Event-sourced**. All changes are stored and reversible, giving you a full versioned history.
+- **Open source**. All code is MIT-licensed.

 ## Comparing Atomic Data to Neo4j
@@ -44,7 +58,7 @@ This means that with Atomic Data, we get _versioning + audit trails_ for all dat

 ### Schema language and type safety

 In Neo4j, constraints can be added to the database by
-Atomic Data uses Atomic Schema for validating datatypes and required properties in resources.
+Atomic Data uses [Atomic Schema](../schema/intro.md) for validating datatypes and required properties in [Classes](../schema/classes.md).

 ### Other differences
diff --git a/src/interoperability/intro.md b/src/interoperability/intro.md
index 87c6e85..8c33a1e 100644
--- a/src/interoperability/intro.md
+++ b/src/interoperability/intro.md
@@ -1,8 +1,13 @@
+{{#title Atomic Data Interoperability - Relationship and comparison to other technology}}
 # Interoperability: Relation to other technology

 Atomic data is designed to be easy to use in existing projects, and be interoperable with existing formats.
 This section will discuss how Atomic Data differs from or is similar to various data formats and paradigms, and how it can interoperate.

+## Upgrade guide
+
+* [Upgrade](upgrade.md): How to make your existing (server-side) application serve Atomic Data. From easy to hard.
+
 ## Data formats

 * [JSON](json.md): Atomic Data is designed to be easily serializable to clean, idiomatic JSON.
However, if you want to turn JSON into Atomic Data, you'll have to make sure that all keys in the JSON object are URLs that link to Atomic Properties, and the data itself also has to be available at its Subject URL.
@@ -10,12 +15,10 @@ This section will discuss how Atomic Data differs from or is similar to various

 ## Protocols

+* [Solid](solid.md): A set of specifications that has many similarities with Atomic Data
 * [IPFS](ipfs.md): Content-based addressing to prevent 404s and centralization

 ## Database paradigms

 * [SQL](sql.md): How Atomic Data differs from and could interact with SQL databases
-
-## Upgrade guide
-
-* [Upgrade](upgrade.md): How to make your existing server-side application compatible with Atomic Data
+* [Graph](graph-database.md): How it differs from some labeled property graphs, such as Neo4j
diff --git a/src/interoperability/ipfs.md b/src/interoperability/ipfs.md
index 0e165e7..ccddcfc 100644
--- a/src/interoperability/ipfs.md
+++ b/src/interoperability/ipfs.md
@@ -1,3 +1,4 @@
+{{#title How does Atomic Data relate to IPFS?}}
 # Atomic Data and IPFS

 ## What is IPFS
@@ -7,22 +8,36 @@ Instead of using an HTTP URL like `http://example.com/helloworld`, it uses the I

 IPFS identifies things based on their unique content hash (the long, seemingly random string) using a thing called a Merkle DAG ([this great article](https://medium.com/textileio/whats-really-happening-when-you-add-a-file-to-ipfs-ae3b8b5e4b0f#:~:text=In%20practice%2C%20content%20addressing%20systems,function%2C%20to%20produce%20a%20digest.&text=From%20raw%20image%20to%20cryptographic%20digest%20to%20content%20id%20(multihash).) explains it nicely).
 This is called a [CID](https://github.com/multiformats/cid), or Content ID.
 This simple idea (plus some not so simple network protocols) allows for decentralized, tamper-proof storage of data.
-This fixes some issues with HTTP that are related to its centralized philosophy: no more 404s!
+This fixes some issues with HTTP that are related to its centralized philosophy: **no more 404s**!

-## Why is IPFS especially interesting for Atomic Data
+## Why is IPFS interesting for Atomic Data

 Atomic Data is highly dependent on the availability of Resources, especially Properties and Datatypes.
-These resources are meant to be re-used a lot, and that would make everything expensive.
+These resources are meant to be re-used a lot, and when they go offline or change (for whatever reason), this can cause issues and confusion.
+IPFS guarantees that these resources are entirely static, which means that they cannot change.
+This is useful when dealing with Properties, as a change in datatype could break things.
+IPFS also allows for location-independent fetching, which means that resources can be retrieved from any location, as long as a copy is online.
+This peer-to-peer functionality is a fundamental advantage of IPFS over HTTP, especially when resources are likely to be re-used, which is certainly the case for Atomic Data Properties.

 ## Considerations using IPFS URLs

-They are static, their contents can never change.
-This is great for some types of data, but horrible for others.
-If you're describing a time-dependent thing (such as a person's job),
-If you're describing personal, private information, its also a bad idea to use IPFS, because it's designed to be permanent.
-Also, IPFS is not as fast as HTTP - at least for now.
+IPFS URLs are **static**, which means that their contents can never change.
+This is great for some types of data, but not so much for others.
+If you're describing a time-dependent thing (such as a person's job), you'll probably want to know what the _current_ value is, and that is not possible when you only have an IPFS identifier.
+This can be fixed by including an HTTP URL in IPFS bodies.
+
+IPFS data is also **hard to remove**, as it tends to be replicated across machines.
+If you're describing personal, private information, it can therefore be a bad idea to use IPFS.
+
+And finally, its **performance** is typically not as good as HTTP.
+If you know the IPFS gateway that hosts the IPFS resource that you're looking for, things improve drastically.
+Luckily for Atomic Data, this is often the case, as we know the HTTP URL of the server and could try whether that server has an IPFS gateway.

## Atomic Data and IPLD

IPLD (not IPFS) stands for InterPlanetary Linked Data, but is not related to RDF.
The scope seems fundamentally different from RDF, too, but I have to read more about this.
+
+## Share your thoughts
+
+Discuss on [this issue](https://github.com/ontola/atomic-data-docs/issues/42).
diff --git a/src/interoperability/json.md b/src/interoperability/json.md
index 9a333f0..7e5a600 100644
--- a/src/interoperability/json.md
+++ b/src/interoperability/json.md
@@ -1,3 +1,4 @@
+{{#title How does Atomic Data relate to JSON?}}
# How does Atomic Data relate to JSON?

Because JSON is so popular, Atomic Data is designed with JSON in mind.
@@ -70,7 +71,7 @@ The Property keys (e.g. "https://example.com/properties/name") need to resolve t
}
```

-In practice, the easiest approach to make this conversion, is to create the data and host it using software like [Atomic Server](https://github.com/joepio/atomic/blob/master/server/README.md).
+In practice, the easiest approach to make this conversion is to create the data and host it using software like [Atomic Server](https://github.com/atomicdata-dev/atomic-server/blob/master/server/README.md).

## From Atomic Data to JSON-LD
diff --git a/src/interoperability/rdf.md b/src/interoperability/rdf.md
index 975f11c..08638c4 100644
--- a/src/interoperability/rdf.md
+++ b/src/interoperability/rdf.md
@@ -1,3 +1,4 @@
+{{#title How does Atomic Data relate to RDF?}}
# How does Atomic Data relate to RDF?
RDF (the [Resource Description Framework](https://www.w3.org/TR/rdf-primer/)) is a W3C specification from 1999 that describes the original data model for linked data. @@ -15,17 +16,17 @@ However, it does differ in some fundamental ways. - Atomic requires URL (not URI) values in its `subjects` and `properties` (predicates), which means that they should be resolvable. Properties must resolve to an `Atomic Property`, which describes its datatype. - Atomic only allows those who control a resource's `subject` URL endpoint to edit the data. This means that you can't add triples about something that you don't control. - Atomic has no separate `datatype` field, but it requires that `Properties` (the resources that are shown when you follow a `predicate` value) specify a datatype. However, it is allowed to serialize the datatype explicitly, of course. -- Atomic has no separate `language` field, but it does support [Translation Resources](../schema/translations.md). +- Atomic has no separate `language` field. - Atomic has a native Event (state changes) model ([Atomic Commits](../commits/intro.md)), which enables communication of state changes - Atomic has a native Schema model ([Atomic Schema](../schema/intro.md)), which helps developers to know what data types they can expect (string, integer, link, array) - Atomic does not support Named Graphs. These should not be needed, because all statements should be retrievable by fetching the Subject of a resource. However, it _is_ allowed to include other resources in a response. ## Why these changes? -I love RDF, and have been working with it for quite some time now. -I started a company that specializes in Linked Data, and we use it extensively in our products and services. +I have been working with RDF for quite some time now, and absolutely believe in some of the core premises of RDF. +I started a company that specializes in Linked Data ([Ontola](https://ontola.io)), and we use it extensively in our products and services. 
Using URIs (and more-so URLs, which are URIs that can be fetched) for everything is a great idea, since it helps with interoperability and enables truly decentralized knowledge graphs.
-However, some of the characteristics of RDF might have contributed to its relative lack of adoption.
+However, some of the characteristics of RDF make it hard to use, and have probably contributed to its relative lack of adoption.

### It's too hard to select a specific value (object) in RDF
@@ -100,13 +101,20 @@ This more closely resembles common CS terminology. ([discussion](https://github.

### Subject + Predicate uniqueness

-In RDF, it's very much possible for a graph to contain multiple statements that share both a `subject` and a `predicate`.
-One of the reasons this is possible, is because RDF graphs should always be mergeable.
-However, this introduces some extra complexity for data users.
+As discussed above, in RDF, it's very much possible for a graph to contain multiple statements that share both a `subject` and a `predicate`.
+There are probably two reasons for this:
+
+1. RDF graphs must always be **mergeable** (just like Atomic Data).
+1. Anyone can make **any statement** about **any subject** (_unlike_ Atomic Data, see next section).
+
+However, this introduces a lot of extra complexity for data users (see above), which makes it not very attractive to use RDF in any client.
Whereas most languages and datatypes have `key-value` uniqueness that allow for unambiguous value selection, RDF clients have to deal with the possibility that multiple triples with the same `subject-predicate` combination might exist.
+It also introduces a different problem: How should you interpret a set of `subject-predicate` combinations?
+Does this represent a non-ordered collection, or did something go wrong while setting values?\
+In the RDF world, I've seen many occurrences of both.
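The difference can be illustrated with a small sketch in Python (the subject and property URLs are made up for illustration):

```python
# RDF: a graph may contain several triples that share subject + predicate,
# so selecting "the" value is ambiguous.
triples = [
    ("https://example.com/alice", "https://example.com/properties/name", "Alice"),
    ("https://example.com/alice", "https://example.com/properties/name", "Alice B."),
]
names = [o for (s, p, o) in triples
         if s == "https://example.com/alice"
         and p == "https://example.com/properties/name"]
assert len(names) == 2  # two candidates: a set of names, or a data error?

# Atomic Data: subject-property uniqueness makes a resource behave like a
# plain key-value map, so value selection is unambiguous.
resource = {
    "@id": "https://example.com/alice",
    "https://example.com/properties/name": "Alice",
}
assert resource["https://example.com/properties/name"] == "Alice"
```

The second half is exactly why traversing Atomic Data feels like traversing plain JSON.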
-However, in order to guarantee this, and still retain _graph merge-ability_ we also need to limit who creates statements about a subject: +Atomic Data requires `subject-property` uniqueness, which means that these issues are no more. +However, in order to guarantee this, and still retain _graph merge-ability_, we also need to limit who creates statements about a subject: ### Limiting subject usage @@ -130,11 +138,11 @@ This means that someone using RDF data about domain B cannot know that domain B Knowing _where data comes from_ is one of the great things about URIs, but RDF does not require that you can think of subjects as the source of data. Many subjects in RDF don't actually resolve to all the known triples of the statement. It would make the conceptual model way simpler if statements about a subject could only be made from the source of the domain owner of the subject. -When triples are created about a resource in a place other than where the subject is hosted, these triples are hard to share. +When triples are created about a resource, in a place other than where the subject is hosted, these triples are hard to share. The way RDF projects deal with this, is by using _named graphs_. As a consequence, all systems that use these triples should keep track of another field for every atom. -To make things worse, it makes `subject-predicate` _impossible_ to guarantee. +To make things worse, it makes `subject-predicate` uniqueness _impossible_ to guarantee. That's a high price to pay. I've asked two RDF developers (who did not know each other) working on RDF about limiting subject usage, and both were critical. @@ -150,7 +158,7 @@ In RDF, an `object` can either be a `named node`, `blank node` or `literal`. A ` Although RDF statements are often called `triples`, a single statement can consist of five fields: `subject`, `predicate`, `object`, `language`, `datatype`. Having five fields is way more than most information systems. Usually we have just `key` and `value`. 
This difference leads to compatibility issues when using RDF in applications.
-In practice, clients have to run a lot of checks before they can use the data - which makes RDF in most contexts harder to use than something such as JSON.
+In practice, clients have to run a lot of checks before they can use the data - which makes RDF in most contexts harder to use than something like JSON.

Atomic Data drops the `named node` / `literal` distinction.
We just have `values`, and they are interpreted by looking at the `datatype`, which is defined in the `property`.
@@ -158,13 +166,18 @@ When a value is a URL, we don't call it a named node, but we simply use a URL da

### Requiring URLs

-RDF allows any type of URIs for `subject` and `predicate` value, which means they can be URLs, but don't have to be. This means they don't always resolve, or even function as locators. The links don't work, and that restricts how useful the links are. Atomic Data takes a different approach: these links MUST Resolve. Requiring Properties to resolve is part of what enables the type system of Atomic Schema - they provide the `shortname` and `datatype`.
+A URL (Uniform Resource _Locator_) is a specific and cooler version of a URI (Uniform Resource _Identifier_), because a URL tells you where you can find more information about this thing (hence _Locator_).
+
+RDF allows any type of URI for `subject` and `predicate` values, which means they can be URLs, but don't have to be.
+This means they don't always resolve, or even function as locators.
+The links don't work, and that restricts how useful the links are.
+Atomic Data takes a different approach: these links MUST resolve. Requiring [Properties](https://atomicdata.dev/classes/Property) to resolve is part of what enables the type system of Atomic Schema - they provide the `shortname` and `datatype`.

-Requiring URLs makes things easier for data users, at the cost of the data producer.
-With Atomic Data, the data producer MUST offer the triples at the URL of the subject.
-This is a challenge - especially with the current (lack of) tooling.
+Requiring URLs makes things easier for data users, but makes things a bit more difficult for the data producer.
+With Atomic Data, the data producer MUST offer the data at the URL of the subject.
+This is a challenge that requires tooling, which is why I've built [Atomic-Server](https://crates.io/crates/atomic-server): an easy-to-use, performant, open source data management system.

-However - making sure that links _actually work_ offer tremendous benefits for data consumers, and that advantage is often worth the extra trouble.
+Making sure that links _actually work_ offers tremendous benefits for data consumers, and that advantage is often worth the extra trouble.

### Replace blank nodes with paths
@@ -262,7 +275,7 @@ This tooling should help to create URLs, Properties, and host everything on an e

## Convert Atomic data to RDF

Since all Atomic Data is also valid RDF, it's trivial to convert / serialize Atoms to RDF.
-This is why [atomic](https://github.com/joepio/atomic) can serialize Atomic Data to RDF. (For example, try `atomic-cli get https://atomicdata.dev/properties/description --as n3`)
+This is why [atomic](https://github.com/atomicdata-dev/atomic-data-browser) can serialize Atomic Data to RDF. (For example, try `atomic-cli get https://atomicdata.dev/properties/description --as n3`)

However, contrary to Atomic Data, RDF has optional Language and Datatype elements in every statement.
It is good practice to use these RDF concepts when serializing Atomic Data into Turtle / RDF/XML, or other [RDF serialization formats](https://ontola.io/blog/rdf-serialization-formats/).
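As a sketch (not the actual `atomic-cli` implementation), serializing a JSON-AD-style resource to N-Triples could look like this. The `DATATYPES` map stands in for resolving each Property URL and reading its datatype:

```python
# Sketch only: serialize a JSON-AD-style resource to N-Triples.
# In real Atomic Data, the datatype comes from resolving the Property URL;
# here a hard-coded lookup table stands in for that step.
DATATYPES = {
    "https://atomicdata.dev/properties/description":
        "http://www.w3.org/2001/XMLSchema#string",
}

def to_ntriples(resource):
    """Emit one N-Triples line per property-value pair."""
    subject = resource["@id"]
    lines = []
    for prop, value in resource.items():
        if prop == "@id":
            continue
        datatype = DATATYPES.get(prop)
        obj = f'"{value}"^^<{datatype}>' if datatype else f'"{value}"'
        lines.append(f"<{subject}> <{prop}> {obj} .")
    return lines

resource = {
    "@id": "https://example.com/resource",
    "https://atomicdata.dev/properties/description": "A thing.",
}
for line in to_ntriples(resource):
    print(line)
```

Each emitted line is a valid RDF triple with an explicit datatype, which is what the "serialize the datatype explicitly" good practice above refers to.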
diff --git a/src/interoperability/solid.md b/src/interoperability/solid.md index dd2d66c..274cd9b 100644 --- a/src/interoperability/solid.md +++ b/src/interoperability/solid.md @@ -1,3 +1,4 @@ +{{#title How does Atomic Data relate to Solid?}} # Atomic Data and Solid The [Solid project](https://solidproject.org/) is an initiative by the inventor of linked data and the world wide web: sir Tim Berners-Lee. @@ -9,18 +10,21 @@ In many ways, it has **similar goals** to Atomic Data: Technically, both are also similar: -- Usage of personal servers, or PODs (Personal Online Datastores). Both Atomic Data and Solid aim to provide users with a highly personal server where all sorts of data can be stored. +- Usage of **personal servers**, or PODs (Personal Online Datastores). Both Atomic Data and Solid aim to provide users with a highly personal server where all sorts of data can be stored. - Usage of **linked data**. All Atomic Data is valid RDF, which means that **all Atomic Data is compatible with Solid**. However, the other way around is more difficult. In other words, if you choose to use Atomic Data, you can always put it in your Solid Pod. But there are some important **differences**, too, which will be explained in more detail below. -- Atomic Data uses a strict built-in schema to ensure type safety. 
+- Atomic Data uses a strict built-in schema to ensure type safety
- Atomic Data standardizes state changes (which also provides version control / history, audit trails)
- Atomic Data is more easily serializable to other formats (like JSON)
-- Atomic Data is less mature, and currently lacks things like authentication and hierarchy
+- Atomic Data has different models for authentication, authorization and hierarchies
+- Atomic Data does not depend on existing semantic web specifications
+- Atomic Data is a smaller and younger project, and as of now a one-man show

-Disclaimer: I've been quite involved in the development of Solid, and have a lot of respect for all the people who are working on it.
-The following is not meant as a critique on Solid, let alone the individuals working on it.
+_Disclaimer: I've been quite involved in the development of Solid, and have a lot of respect for all the people who are working on it.
+Solid and RDF have been important inspirations for the design of Atomic Data.
+The following is not meant as a critique of Solid, let alone of the individuals working on it._

## Atomic Data is type-safe, because of its built-in schema
@@ -51,9 +55,11 @@ Atomic Data has a **uniform write API**. All changes to data are done by posting
Commits to the `/commits` endpoint of a Server.
This removes the need to think about differences between all sorts of HTTP methods like POST / PUT / PATCH, and how servers should reply to that.

+_EDIT: as of December 2021, Solid has introduced `.n3 patch` for standardizing state changes. Although this adds a uniform way of describing changes, it still lacks the power of Atomic Commits. It does not specify signatures, mention versioning, or deal with persisting changesets. On top of that, it is quite difficult to read or parse, being `.n3`._

## Atomic Data is more easily serializable to other formats (like JSON)

-Atomic Data is designed with the modern developer in mind.
+Atomic Data is designed with the modern (web) developer in mind.
One of the things that developers expect, is to be able to traverse (JSON) objects easily.
Doing this with RDF is not easily possible, because doing this requires _subject-predicate uniqueness_.
Atomic Data does not have this problem (properties _must_ be unique), which means that traversing objects becomes easy.
@@ -63,13 +69,80 @@ Atomic Data uses `shortnames` to map properties to short, human-readable strings

For more information about these differences, see the previous [RDF chapter](./rdf.md).

-## Solid is more mature

-Atomic Data has significant gaps at this moment - not just in the implementations, but also in the spec.
-This makes it not yet usable for most applications.
+## Authentication
+
+Both Solid and Atomic Data use URLs to refer to individuals / users / Agents.
+
+Solid's identity system is called WebID.
+There are multiple supported authentication protocols, the most common being [WebID-OIDC](https://github.com/solid/webid-oidc-spec).
+
+Atomic Data's [authentication model](../authentication.md) is more similar to how SSH works.
+Atomic Data identities (Agents) are both HTTP based and cryptography (public / private key) based.
+In Atomic, all actions (from GET requests to Commits) are signed using the private key of the Agent.
+This makes Atomic Data a bit more unconventional, but also makes its auth mechanism very decentralized and lightweight.
+
+## Hierarchy and authorization
+
+Atomic Data uses `parent-child` [hierarchies](../hierarchy.md) to model data structures and perform authorization checks.
+This closely resembles how filesystems work (including things like Google Drive).
+Per resource, `write` and `read` rights can be defined, which both contain lists of Agents.
+
+Solid is working on the [Shape Trees](https://shapetrees.org/TR/specification/) spec, which also describes hierarchies.
+It uses ShEx to perform shape validation, similar to how Atomic Schema does.
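A rough sketch of how such parent-child authorization checks could work (the resource and agent URLs are invented, and this is not Atomic-Server's actual code):

```python
# Sketch of parent-child authorization, resembling the hierarchy model
# described above. Resources without their own rights inherit from their parent.
resources = {
    "https://example.com/drive": {
        "write": ["https://example.com/agents/alice"],
        "read": ["https://example.com/agents/alice",
                 "https://example.com/agents/bob"],
    },
    "https://example.com/drive/doc1": {
        # No own rights defined: inherit them from the parent.
        "parent": "https://example.com/drive",
    },
}

def allowed(subject, agent, right):
    """Walk up the parent chain until a resource defines the requested right."""
    while subject is not None:
        resource = resources[subject]
        if right in resource:
            return agent in resource[right]
        subject = resource.get("parent")
    return False

assert allowed("https://example.com/drive/doc1",
               "https://example.com/agents/bob", "read")
assert not allowed("https://example.com/drive/doc1",
                   "https://example.com/agents/bob", "write")
```

Just as in a filesystem, granting a right on a folder-like resource grants it on everything below, unless a child overrides it.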
+
+
+## No dependency on existing semantic web specifications
+
+The Solid specification (although still in draft) builds on a 20+ year legacy of committee meetings on semantic web standards such as RDF, SPARQL, OWL and XML.
+I think the process of designing specifications in [various (fragmented) committees](https://en.wikipedia.org/wiki/Design_by_committee) has led to a set of specifications that lack simplicity and consistency.
+Many of these specifications have been written long before there were actual implementations.
+Much of the effort was spent on creating highly formal and abstract descriptions of common concepts, but too little was spent on making specs that are easy to use and solve actual problems for developers.
+
+Aaron Swartz (co-founder of Reddit, co-creator of RSS and Markdown) wrote this in his [unfinished book 'A Programmable Web'](https://ieeexplore.ieee.org/document/6814657):
+
+> Instead of the “let’s just build something that works” attitude that made the Web (and the Internet) such a roaring success, they brought the formalizing mindset of mathematicians and the institutional structures of academics and defense contractors.
+> They formed committees to form working groups to write drafts of ontologies that carefully listed (in 100-page Word documents) all possible things in the universe and the various properties they could have, and they spent hours in Talmudic debates over whether a washing machine was a kitchen appliance or a household cleaning device.

+(The book is a great read on this topic, by the way!)
+
+So, in a nutshell, I think this legacy makes Solid unnecessarily hard to use for developers, for the following reasons:
+
+- **RDF Quirks**: Solid has to deal with all the [complexities of the RDF data model](./rdf.md), such as blank nodes, named graphs, subject-predicate duplication.
+- **Multiple (uncommon) serialization formats** need to be understood, such as `n3`, `shex` and potentially all the various RDF serialization formats. These will feel foreign to most (even very experienced) developers and can have a high degree of complexity.
+- **A heritage of broken URLs**. Although a lot of RDF data exists, only a small part of it is actually resolvable as machine-readable RDF. The large majority won't give you the data when sending an HTTP GET request with the correct `Accept` headers to the subject's URL. Much of it is stored in documents on a different URL (`named graphs`), or behind some SPARQL endpoint that you will first need to find. Solid builds on a lot of standards that have these problems.
+- **Confusing specifications**. Reading up on RDF, Solid, and the Semantic Web can be a daunting (yet adventurous) task. I've seen many people traverse a similar path as I did: read the RDF specs, dive into OWL, install Protégé, create ontologies, try doing things that OWL doesn't do (validate data), read more complicated specs that don't help clear things up, become frustrated... It's a bit of a rabbit hole, and I'd like to prevent people from falling into it. There's a lot of interesting ideas there, but it is not a pragmatic framework to develop interoperable apps with.
+
+## Atomic Data and Solid server implementations
+
+Both Atomic Data and Solid are specifications that have different implementations.
+Some open source Solid implementations are the [Node Solid Server](https://github.com/solid/node-solid-server), the [Community Solid Server](https://github.com/solid/community-server) (also nodejs based) and the [DexPod](https://gitlab.com/ontola/dexpod) (Ruby on Rails based).
+
+[Atomic-Server](https://github.com/atomicdata-dev/atomic-server/) is a database + server written in the Rust programming language, that can be considered an alternative to Solid Pod implementations.
+It was definitely built to be one, at least.
+It implements every part of the Atomic Data specification.
+I believe that as of today (February 2022), Atomic-Server has quite a few advantages over existing Solid implementations:
+
+- **Dynamic schema validation** / type checking using [Atomic Schema](https://docs.atomicdata.dev/schema/intro.html), combining the best of RDF, JSON and type safety.
+- **Fast** (1ms responses on my laptop)
+- **Lightweight** (8MB download, no runtime dependencies)
+- **HTTPS + HTTP2 support** with built-in LetsEncrypt handshake.
+- **Browser GUI included** powered by [atomic-data-browser](https://github.com/atomicdata-dev/atomic-data-browser). Features dynamic forms, tables, authentication, theming and more. Easy to use!
+- **Event-sourced versioning** / history powered by [Atomic Commits](https://docs.atomicdata.dev/commits/intro.html)
+- **Many serialization options**: to JSON, [JSON-AD](https://docs.atomicdata.dev/core/serialization.html#json-ad), and various Linked Data / RDF formats (RDF/XML, N-Triples / Turtle / JSON-LD).
+- **Full-text search** with fuzzy search and various operators, often <3ms responses.
+- **Pagination, sorting and filtering** using [Atomic Collections](https://docs.atomicdata.dev/schema/collections.html)
+- **Invite and sharing system** with [Atomic Invites](https://docs.atomicdata.dev/invitations.html)
+- **Desktop app**. Easy desktop installation, with status bar icon, powered by [tauri](https://github.com/tauri-apps/tauri/).
+- **MIT licensed**. So fully open-source and free forever!
+
+## Things that Atomic Data misses, but Solid has
+
+Atomic Data is not even two years old, and although progress has been fast, it does lack some specifications.
Here's a list of things missing in Atomic Data, with links to their open issues and links to their existing Solid counterpart.
Also, [No hierarchy model](https://github.com/ontola/atomic-data/issues/18). [ShapeTrees in Solid](https://shapetrees.org/TR/specification/index.html#ecosystem). (We're working on an implementation of a hierarchy with authorization, see [issue](https://github.com/ontola/atomic-data/issues/18)) -- No inbox or [notifications](https://www.w3.org/TR/ldn/) ([issue](https://github.com/ontola/atomic-data/issues/28)) -- No way to discover content from user ID. +- No inbox or [notifications](https://www.w3.org/TR/ldn/) yet ([issue](https://github.com/ontola/atomic-data/issues/28)) +- No OIDC support yet. ([issue](https://github.com/atomicdata-dev/atomic-server/issues/277)) - No support from a big community, a well-funded business or the inventor of the world wide web. diff --git a/src/interoperability/sql.md b/src/interoperability/sql.md index 7061a60..dc582d1 100644 --- a/src/interoperability/sql.md +++ b/src/interoperability/sql.md @@ -1,19 +1,41 @@ +{{#title How does Atomic Data relate to SQL?}} # Atomic Data and SQL Atomic Data has some characteristics that make it similar and different from SQL. -- Atomic Data has a _dynamic_ schema. Any Resource could have different properties. However, the properties themselves are validated (contrary to most NOSQL solutions) +- Atomic Data has a _dynamic_ schema. Any Resource could have different properties, so you can **add new properties** to your data without performing any migrations. However, the properties themselves are still validated (contrary to most NoSQL solutions) +- Atomic Data uses **HTTP URLs** in its data, which means it's easy to **share and reuse**. - Atomic Data separates _reading_ and _writing_, whereas SQL has one language for both. -- Atomic Data has a standardized way of storing changes ([Commits](../commits/intro.md)) +- Atomic Data has a standardized way of **storing changes** ([Commits](../commits/intro.md)) -## Dynamic vs static schema +## Tables and Rows vs. 
Classes and Properties

At its core, SQL is a query language based around _tables_ and _rows_.

-The _tables_ in SQL are similar to _classes_ in Atomic Data: they both define a set of _properties_ which an item could have.
+The _tables_ in SQL are similar to `Classes` in Atomic Data: they both define a set of `properties` which an item could have.
+Every single item in a table is called a _row_ in SQL, and a `Resource` in Atomic Data.
+One difference is that in Atomic Data, you can add new properties to resources, without making changes to any tables (migrations).
+
+## Dynamic vs static schema
+
In SQL, the schema of the database defines which shape the data can have, which properties are required, what datatypes they have.
In Atomic Data, the schema exists as a Resource on the web, which means that they can be retrieved using HTTP.
-SQL is a centralized, closed system.
-Atomic Data is a decentralized, open system.
+An Atomic Database (such as [Atomic-Server](https://crates.io/crates/atomic-server)) uses a _dynamic schema_,
+which means that any Resource can have different properties, and the properties themselves can be validated, even when the server is not aware of these properties beforehand.
+In SQL, you'd have to manually adjust the schema of your database to add a new property.
+Atomic Data is a decentralized, open system, which can read new schema data from other sources.
+SQL is a centralized, closed system, which relies on the DB manager to define the schema.
+
+## Identifiers: numbers vs. URLs
+
+In SQL, rows have numbers as identifiers, whereas in Atomic Data, every resource has a resolvable HTTP URL as an identifier.
+URLs are great identifiers, because you can open them and get more information about something.
+This means that with Atomic Data, other systems can re-use your data by referencing it, and you can re-use data from other systems, too.
+With Atomic Data, you're making your data part of a bigger _web of data_, which opens up a lot of possibilities.
+
+## Atomic Server combines server and database
+
+If you're building an app with SQL, you will always need some server that connects to your database.
+If you're building an app with Atomic Server, the database can function as your server, too. It deals with authentication, authorization, and more.

## Querying
@@ -27,6 +49,21 @@ In SQL, the one creating the query basically defines the shape of a table that i

Atomic Data does not offer such functionality.
So if you need to create custom tables at runtime, you might be better off using SQL, or move your Atomic Data to a query system.

+## Convert an SQL database to Atomic Data
+
+If you want to make your existing SQL project serve Atomic Data, you can keep your existing SQL database, see [the upgrade guide](upgrade.md).
+It basically boils down to mapping the columns (properties) in your SQL tables to Atomic Data [Properties](https://atomicdata.dev/classes/Property).
+
+When you want to _import arbitrary Atomic Data_, though, it might be easier to use `atomic-server`.
+If you want to store arbitrary Atomic Data in a SQL database, you might be best off by creating a `Resources` table with a `subject` and a `propertyValues` column, or create both a `properties` table and a `resources` one.
+
+## Limitations of Atomic Data
+
+- SQL is far more common, many people will know how to use it.
+- SQL databases are battle-tested and have been powering countless products for decades, whereas Atomic Server is at this moment in beta.
+- SQL databases have a more powerful and expressive query language, where you can define tables in your query and combine resources.
+- Atomic Data doesn't have a [multi-node / distributed option](https://github.com/atomicdata-dev/atomic-server/issues/213)
+
## FAQ

### Is Atomic Data NOSQL or SQL?
@@ -40,12 +77,11 @@ So in a way, Atomic Data tries to combine best of both worlds: the extendibility

### Is Atomic Data transactional / ACID?

-Well, if you use Atomic-Server, then you can only write to the server by using Atomic Commits, which are in fact transactions.
+Yes, if you use Atomic-Server, then you can only write to the server by using Atomic Commits, which are in fact transactions.
This means that if part of the transaction fails, it is reverted - transactions are only applied when they are 100% OK.
This prevents inconsistent DB states.

-### Can I use a SQL database with Atomic Data?
+### How does Atomic Server build indexes for its resources if the schema is not known in advance?

-Yes, if you want to make your existing project serve Atomic Data, you can keep your existing SQL database, see [the upgrade guide](upgrade.md).
-When you want to _import arbitrary Atomic Data_, it might be easier to use `atomic-server`.
-If you want to store arbitrary Atomic Data in a SQL database, you might be best off by creating a `Resources` table with a `subject` and a `propertyValues` column, or create both a `properties` table and a `resources` one.
+It creates indexed collections when users perform queries.
+This means that the first time you perform some type of query (that sorts and filters by some properties), it will be slow, but the next time you perform a similar query, it will be fast.
diff --git a/src/interoperability/upgrade.md b/src/interoperability/upgrade.md
index 75851e0..49fe51c 100644
--- a/src/interoperability/upgrade.md
+++ b/src/interoperability/upgrade.md
@@ -1,13 +1,49 @@
-# Upgrade your existing application to Atomic Data
+{{#title Upgrade your existing application to serve Atomic Data}}
+# Upgrade your existing application to serve Atomic Data
+
+You don't have to use [Atomic-Server](https://crates.io/crates/atomic-server) and ditch your existing projects or apps in order to adhere to the Atomic Data specs.
+
+As the Atomic Data spec is modular, you can start out simply and conform to more specs as needed:
+
+1. Map your JSON keys to new or existing Atomic Data properties
+2. Add `@id` fields to your resources, make sure these URLs resolve using HTTP
+3. Implement parts of the [Extended spec](../extended.md)
+
+There are a couple of levels you can go to when adhering to the Atomic Data spec.
+
+## Easy: map your JSON keys to Atomic Data Properties

If you want to make your existing project compatible with Atomic Data, you probably don't have to get rid of your existing storage / DB implementation.
-The only thing that matters, is how you make the data accessible to others: the serialization.
+The only thing that matters is how you make the data accessible to others: the _serialization_.
You can keep your existing software and logic, but simply change the last little part of your API.
+
In short, this is what you'll have to do:

-- Map all properties of resources to Atomic Properties. Either use existing ones, or create new ones and make them accessible (using any Atomic Server, as long as the URLs of the properties resolve).
-- Make sure that when the user requests some URL, that you return that resource as a JSON-AD object (at the very least if the user requests it using an HTTP `Accept` header).
+Map all properties of resources to Atomic Properties.
+Either use [existing ones](https://atomicdata.dev/properties), or [create new ones](https://atomicdata.dev/app/new?classSubject=https%3A%2F%2Fatomicdata.dev%2Fclasses%2FProperty&parent=https%3A%2F%2Fatomicdata.dev%2Fagents%2F8S2U%2FviqkaAQVzUisaolrpX6hx%2FG%2FL3e2MTjWA83Rxk%3D&newSubject=https%3A%2F%2Fatomicdata.dev%2Fproperty%2Fsu98ox6tvkh).
+This means: take your JSON objects, and change things like `name` to `https://atomicdata.dev/properties/name`.
+
+That's it, you've done the most important step!
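This mapping step can be sketched in a few lines (the key map is something you define for your own data; the `example.com` subject is a placeholder):

```python
# Sketch of the mapping step: rename plain JSON keys to Atomic Data Property
# URLs. Only /properties/name and /properties/description are real Properties
# on atomicdata.dev; the subject URL below is an invented placeholder.
KEY_MAP = {
    "name": "https://atomicdata.dev/properties/name",
    "description": "https://atomicdata.dev/properties/description",
}

def to_atomic(json_obj, subject):
    """Return a JSON-AD-style object with Property URLs as keys."""
    atomic = {"@id": subject}
    for key, value in json_obj.items():
        atomic[KEY_MAP[key]] = value
    return atomic

record = {"name": "Some project", "description": "An example record"}
atomic = to_atomic(record, "https://example.com/projects/1")
assert atomic["https://atomicdata.dev/properties/name"] == "Some project"
```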
+
+Now your data is already more interoperable:
+
+- Every field has a clear **semantic meaning** and **datatype**
+- Your data can now be **easily imported** by Atomic Data systems
+
+## Medium: add `@id` URLs that properly resolve
+
+Make sure that when the user requests some URL, you return that resource as a [JSON-AD](../core/json-ad.md) object (at the very least if the user requests it using an HTTP `Accept: application/ad+json` header).
+
+- Your data can now be **linked to** by external data sources, it can become part of a **web of data**!
+
+## Hard: implement Atomic Data Extended protocols
+
+You can go all out, and implement Commits, Hierarchies, Authentication, Collections and [more](https://docs.atomicdata.dev/extended.html).
+I'd suggest starting with [Commits](../commits/intro.md), as these allow users to modify data whilst maintaining versioning and auditability.
+Check out the [Atomic-Server source code](https://github.com/atomicdata-dev/atomic-server/tree/master/server) to get inspired on how to do this.
+
+## Reach out for help

-Don't feel obliged to implement all parts of the Atomic Data spec, such as Collections and Commits.
+If you need any help, join our [Discord](https://discord.gg/a72Rv2P).

-If you need any help, get in touch in our [Discord](https://discord.gg/a72Rv2P)
+Also, share your thoughts on creating Atomic Data in [this issue on github](https://github.com/ontola/atomic-data-docs/issues/95).
diff --git a/src/interoperability/vc-old.md b/src/interoperability/vc-old.md
new file mode 100644
index 0000000..52a9919
--- /dev/null
+++ b/src/interoperability/vc-old.md
@@ -0,0 +1,32 @@
+# Atomic Data and Verifiable Credentials
+
+Verifiable Credentials are pieces of information that have cryptographic proof by some reliable third party.
+For example, you could have a credential that proves your degree, signed by your educational institution.
+These credentials can enable privacy-friendly transactions where a credential owner can prove being part of some group, without needing to actually identify themselves.
+For example, you could prove that you're over 18 by showing a credential issued by your government, without actually having to show your ID card with your birthdate.
+Verifiable Credentials are still not that widely used, but various projects exist that have had moderate success in implementing them.
+
+In Atomic Data, _all information created with Atomic Commits is verifiable_.
+Atomic Commits are signed by specific individuals, and these signatures can be verified with the Public Key from the Agent who signed the Commit.
+
+## W3C Verifiable Credentials spec
+
+The W3C Verifiable Credentials (W3CVC) specification has helped to create a spec to describe credentials.
+However, the approach is fundamentally different from how Atomic Data works.
+In the W3CVC spec, every credential is a resource.
+In Atomic Data, having a new type of `Credential` class that maps to W3CVC Credentials is definitely possible, but it is also highly redundant, as Commits already provide the same information.
+That's why we've opted for only signing Commits.
+
+In Atomic Commits, the _change in information_ is signed, instead of the _state_ of the data.
+This is by design, as storing signed state changes allows for fully verifiable and reversible history / version control with audit logs.
+
+## Verifying data with Atomic Commits
+
+If you want to know whether a specific value that you see is signed by a specific Agent, you need to find the Commit that created the value.
+
+This can be achieved by using a Collection.
+The easiest way to do this is by using the [`/all-versions` Endpoint](https://atomicdata.dev/all-versions) and finding the Signer of the version that is relevant to your question.
+
+In the near future, we will introduce a `/verify` Endpoint that will allow you to verify a specific value.
+ +Visit the [issue on github](https://github.com/ontola/atomic-data-docs/issues/22) to join the discussion about this subject. diff --git a/src/interoperability/verifiable-credentials.md b/src/interoperability/verifiable-credentials.md index 52f53a3..751fb01 100644 --- a/src/interoperability/verifiable-credentials.md +++ b/src/interoperability/verifiable-credentials.md @@ -1,20 +1,2 @@ -# Atomic Data and Verifiable Credentials - -Verifiable Credentials are pieces of information that have cryptographic proof by some reliable third party. -For example, you could have a credential that proves your degree, signed by your education. -These credentials an enable privacy-friendly transactions where a credential owner can prove being part of some group, without needing to actually identify themselves. -For example, you could prove that you're over 18 by showing a credential issued by your government, without actually having to show your ID card with your birthdate. -Verifiable Credentials are still not that widely used, but various projects exists that have had moderate success in implementing it. - -## W3C Verifiable Credentials spec - -The W3C Verifiable Credentials specification has helped to create a spec to describe credentials, but it still leaves some important work to be done. -Most - -## Self-sovereign identity - -Atomic Data is designed to give people more control over their own personal data. -Part of this, is being able to prove things about your identity to others, without relying on some third party to acknowledge this every single time. -This is where verifiable credentials come in. - -Visit the [issue on github](https://github.com/ontola/atomic-data-docs/issues/22) to join the discussion about this subject. 
+{{#title How does Atomic Data relate to Verifiable Credentials?}}
+# Verifiable Credentials
diff --git a/src/invitations.md b/src/invitations.md
index 7b6f487..278e5e9 100644
--- a/src/invitations.md
+++ b/src/invitations.md
@@ -1,6 +1,7 @@
+{{#title Atomic Data Invitations - Sharing using Tokens }}
 # Invitations & Tokens

-_Discussion: https://github.com/ontola/atomic-data/issues/23_
+([Discussion](https://github.com/ontola/atomic-data/issues/23))

 At some point on working on something in a web application, you're pretty likely to share that, often not with the entire world.
 In order to make this process of inviting others as simple as possible, we've come up with an Invitation standard.
@@ -13,10 +14,15 @@ In order to make this process of inviting others as simple as possible, we've co

 ## Flow

-1. The Owner or a resource creates an [Invite](https://atomicdata.dev/classes/Invite). This Invite points to a `target` Resource, provides read writes by default but can additionally add `write` rights, contains a bunch of `usagesLeft`.
+1. The Owner of a resource creates an [Invite](https://atomicdata.dev/classes/Invite). This Invite points to a `target` Resource, provides `read` rights by default but can additionally add `write` rights, and contains a number of `usagesLeft`.
 1. The Guest opens the Invite URL. This returns the Invite resource, which provides the client with the information needed to do the next request which adds the actual rights.
 1. The browser client app might generate a set of keys, or use an existing one. It sends the Agent URL to the Invite in a query param.
 1. The server will respond with a Redirect resource, which links to the newly granted `target` resource.
 1. The Guest will now be able to access the Resource.

 Try it on [https://atomicdata.dev/invites/1](https://atomicdata.dev/invites/1)
+
+## Limitations and gotchas
+
+- The one creating the Invite has to take security into consideration. Some URLs can be easily guessed!
When implementing Invitations, make sure to use a good amount of randomness when creating the Subject.
+- Make sure that the invite is not publicly discoverable (e.g. through a Collection); this can happen if you set the `parent` of the invite to a public resource.
diff --git a/src/links.md b/src/links.md
new file mode 100644
index 0000000..009836f
--- /dev/null
+++ b/src/links.md
@@ -0,0 +1,4 @@
+# List of links of Atomic Data on the web
+
+- [Awesome Knowledge Graph](https://github.com/totogo/awesome-knowledge-graph)
+- [Awesome Semantic Web](https://github.com/semantalytics/awesome-semantic-web)
diff --git a/src/motivation.md b/src/motivation.md
index 2e6804d..9feddcc 100644
--- a/src/motivation.md
+++ b/src/motivation.md
@@ -1,19 +1,91 @@
+{{#title Motivation for creating Atomic Data}}
 # Motivation: Why Atomic Data?
-Linked data (RDF / the semantic web) enables us to use the web as a large, decentralized graph database.
+
+
+## Give people more control over their data
+
+The world wide web was designed by Tim Berners-Lee to be a decentralized network of servers that help people share information.
+As I'm writing this, it has been exactly 30 years since the first website was launched.
+Unfortunately, the web today is not the decentralized network it was supposed to be.
+A handful of large tech companies are in control of how the internet is evolving, and where and how our data is being stored.
+The various services that companies like Google and Microsoft offer (often for free) integrate really well with their other services, but are mostly designed to _lock you in_.
+Vendor lock-in means that it is often difficult to take your information from one app to another.
+This limits innovation, and limits users' freedom to decide how they want to interact with their data.
+Companies often have incentives that are not fully aligned with what users want.
+For example, Facebook sorts your newsfeed not to make you satisfied, but to make you spend as much time looking at ads.
+They don't want you to be able to control your own newsfeed.
+Even companies like Apple, which don't have an ad-revenue model, still have a reason to (and very much do) lock you in.
+To make things even worse, even open-source projects made by volunteers often don't work well together.
+That's not because of bad intentions, that's because it is _hard_ to make things interoperable.
+
+If we want to change this, we need open tech that works really well together.
+And if we want that, we need to _standardize_.
+The existing standards are well-suited for documents and webpages, but not for structured personal data.
+If we want to have that, we need to standardize the _read-write web_, which includes standardizing how items are changed, how their types are checked, how we query lists, and more.
+I want all people to have a (virtual) private server that contains their own data, that they control.
+This [Personal Data Store](usecases/personal-data-store.md) could very well be an old smartphone with a broken screen that is always on, running next to your router.
+
+Atomic Data is designed to be a standard that achieves this.
+But we need more than a standard to get adoption - we need implementations.
+That's why I've been working on a server, various libraries, a GUI and [more](tooling.md) - all MIT licensed.
+If Atomic Data becomes successful, there will likely be other, better implementations.
+
+## Linked data is awesome, but it is too difficult for developers in its current form
+
+[Linked data](https://ontola.io/what-is-linked-data/) (RDF / the semantic web) enables us to use the web as a large, decentralized graph database.
 Using links everywhere in data has amazing merits: links remove ambiguity, they enable exploration, they enable connected datasets.
-Linked Data could help to democratize the web by decentralizing information storage, and giving people more control.
-The Solid Project by Tim Berners-Lee is a great example of why linked data can help to create a more decentralized web. +But the existing specs are too difficult to use, and that is harming adoption. -At [Ontola](https://ontola.io/), we've been working with linked data quite intensely for the last couple of years. +At my company [Ontola](https://ontola.io/), we've been working with linked data quite intensely for the last couple of years. We went all-in on RDF, and challenged ourselves to create software that communicates exclusively using it. That has been an inspiring, but at times also a frustrating journey. -While building various production grade apps (e.g. our e-democracy platform [Argu.co](https://argu.co/), which is used by various governments), we had to [solve many problems](https://ontola.io/blog/full-stack-linked-data/). -How to properly model data in RDF? How to deal with sequences? How to communicate state changes? Converting RDF to HTML? Typing? CORS? +While building our e-democracy platform [Argu.co](https://argu.co/), we had to [solve many RDF related problems](https://ontola.io/blog/full-stack-linked-data/). +How to properly model data in RDF? How to deal with [sequences](https://ontola.io/blog/ordered-data-in-rdf/)? How to communicate state changes? Which [serialization format](https://ontola.io/blog/rdf-serialization-formats/) to use? How to convert [RDF to HTML, and build a front-end](https://ontola.io/blog/rdf-solid-react-tutorial-link/)? We tackled some of these problems by having a tight grip on the data that we create (e.g. we know the type of data, because we control the resources), and another part is creating new protocols, formats, tools, and libraries. But it took a long time, and it was hard. It's been almost 15 years since the [introduction of linked data](https://www.w3.org/DesignIssues/LinkedData.html), and its adoption has been slow. We know that some of its merits are undeniable, and we truly want the semantic web to succeed. 
-We believe the lack of growth partially has to do with a lack of tooling, but also with [some problems that lie in the RDF data model](interoperability/rdf.md#why-these-changes).
+I believe the lack of growth partially has to do with a lack of tooling, but also with some problems that lie in the RDF data model.
 Atomic Data aims to take the best parts from RDF, and learn from the past to make a more developer-friendly, performant and reliable data model to achieve a truly linked web.
+Read more about [how Atomic Data relates to RDF, and why these changes have been made](interoperability/rdf.md).
+
+## Make standardization easier and cheaper
+
+Standards for data sharing are great, but creating one can be a very costly endeavor.
+Committees with stakeholders write endless documents describing the intricacies of domain models, which fields are allowed and which are required, and how data is serialized.
+In virtually all cases, these documents are only written for humans - and not for computers.
+Machine-readable ways to describe data models, like UML diagrams and OpenAPI specifications (also known as Swagger), help to have machine-readable descriptions, but these are still not _really_ used by machines - they are mostly only used to generate _visualizations for humans_.
+This ultimately means that implementations of a standard have to be _manually checked_ for compliance, which often results in small (yet important) differences that severely limit interoperability.
+These implementations will also often want to _extend_ the original definitions, but they are almost always unable to describe _what_ they have extended.
+
+Standardizing with Atomic Data solves these issues.
+Atomic Data takes the semantic value of ontologies, and merges it with machine-readable [schemas](schema/intro.md).
+This makes standards created using Atomic Data easy to read for humans, and easy to validate for computers (which guarantees interoperability).
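To make "easy to validate for computers" concrete, here is a minimal sketch of what machine validation against a Class could look like: check that a resource contains every required Property of its Class. This is an illustrative simplification, not the actual Atomic Schema validation algorithm, and the `person_requires` list is an assumed example.

```python
# Minimal validation sketch: report which required Properties are missing.
# The `requires` list and example resource are illustrative.
def missing_required(resource, requires):
    """Return the required Property URLs absent from the resource."""
    return [prop for prop in requires if prop not in resource]

person_requires = ["https://atomicdata.dev/properties/name"]
resource = {
    "@id": "https://example.com/alice",
    "https://atomicdata.dev/properties/name": "Alice",
}
errors = missing_required(resource, person_requires)  # empty list means valid
```

Because Properties are identified by URLs, two independent implementations running a check like this will agree on what "valid" means - which is exactly what prose-only standards cannot guarantee.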
+Atomic Data has a highly standardized protocol for fetching data, which means that Atomic Schemas can link to each other, and _re-use existing Properties_. +For developers (the people who need to actually implement and use the data that has been standardized), this means their job becomes easier. +Because Properties have URLs, it becomes trivial to _add new Properties_ that were initially not in the main specification, without sacrificing type safety and validation abilities. + +## Make it easier for developers to build feature-rich, interoperable apps + +Every time a developer builds an application, they have to figure a lot of things out. +How to design the API, how to implement forms, how to deal with authentication, authorization, versioning, search... +A lot of time is essentially wasted on solving these issues time and time again. + +By having a more complete, strict standard, Atomic Data aims to decrease this burden. +[Atomic Schema](schema/intro.md) enables developers to easily share their datamodels, and re-use those from others. +[Atomic Commits](commits/intro.md) helps developers to deal with versioning, history, undo and audit logs. +[Atomic Hierarchies](hierarchy.md) provides an intuitive model for authorization and access control. +And finally, the [existing open source Atomic Data software](tooling.md) (such as a server + database, a browser GUI, various libraries and React templates) help developers to have these features without having to do the heavy lifting themselves. 
diff --git a/src/newsletter.md b/src/newsletter.md
new file mode 100644
index 0000000..4491efd
--- /dev/null
+++ b/src/newsletter.md
@@ -0,0 +1,12 @@
+{{#title Official Atomic Data newsletter}}
+# Subscribe to the Atomic Data newsletter
+
+We'll send you an update (max once per month) when there's something relevant to share, such as:
+
+- Major changes to the specification
+- Major new releases (with new features)
+- Use-cases, implementations
+- Tutorials, blog posts
+- Organizational / funding news
+
+[Click here to sign up to the Atomic Data Newsletter](http://eepurl.com/hHcRA1)
diff --git a/src/plugins.md b/src/plugins.md
index 8041fe0..7076f91 100644
--- a/src/plugins.md
+++ b/src/plugins.md
@@ -26,3 +26,10 @@ When a plugin is installed, the Server needs to be aware of when the functionali
 - Periodically (if so, when?)
 - On a certain endpoint (which endpoint? One or multiple?)
 - As a middleware when (specific) resources are created / read / updated.
+
+## Hooks
+
+### BeforeCommit
+
+Is run before a Commit is applied.
+Useful for performing authorization or data shape checks.
diff --git a/src/roadmap.md b/src/roadmap.md
new file mode 100644
index 0000000..efa8b41
--- /dev/null
+++ b/src/roadmap.md
@@ -0,0 +1,60 @@
+# Strategy, history and roadmap for Atomic Data
+
+We have the ambition to make the internet more interoperable.
+We want Atomic Data to be a commonly used specification, enabling a vast number of applications to work together and share information.
+This means we need a lot of people to understand and contribute to Atomic Data.
+In this document, we discuss the strategic principles we use, the steps we took, and the path forward.
+This should help you understand how and where you may be able to contribute.
+
+## Strategy for adoption
+
+- **Work on both specification and implementations (both client and server side) simultaneously** to make sure all ideas are both easily explainable and properly implementable.
Don't design a spec with a large committee over many months, only to learn that it has implementation issues later on.
+- **Create libraries whenever possible.** Enable other developers to re-use the technology in their own stacks. Keep the code as modular as possible.
+- **Document everything**. Not just your APIs - also your ideas, considerations and decisions.
+- **Do everything in public**. All code is open source, all issues are publicly visible. Allow outsiders to learn everything and start contributing.
+- **Make an all-in-one workspace app that stands on its own**. Atomic Data may be an abstract, technical story, but we still need end-user friendly applications that solve actual problems if we want to get as much adoption as possible.
+- **Let realistic use cases guide API design**. Don't fall victim to spending too much time on extremely rare edge-cases, while ignoring more common issues and wishes.
+- **Familiarity first**. Make tools and specs that feel familiar, build libraries for popular frameworks, and stick to conventions whenever possible.
+
+## History
+
+- **First draft of specification** (2020-06). Atomic Data started as an unnamed bundle of ideas and best practices to improve how we work with linked data, but quickly turned into a single (draft) specification. The idea was to start with cohesive and easy-to-understand documentation, and use that as a stepping stone for writing the first code. After this, the code and specification should both be worked on simultaneously to make sure ideas are both easily explainable and properly implementable. Many of the earliest ideas were changed to make implementation easier.
+- **[atomic-cli](https://crates.io/crates/atomic-cli) + [atomic-lib](https://docs.rs/atomic_lib/0.32.1/atomic_lib/)** (2020-07). The CLI functioned as the first platform to explore some of the most core ideas of Atomic Data, such as Properties and fetching. `atomic_lib` is the place where most logic resides. Written in Rust.
+- **[AtomicServer](https://github.com/atomicdata-dev/atomic-server/)** (2020-08). The server (using the same `atomic_lib` as the CLI) should be a fast, lightweight server that must be easy to set up. Functions as a graph database with no dependencies.
+- **[Collections](schema/collections.md)** (2020-10). Allows users to perform basic queries, filtering, sorting and pagination.
+- **[Commits](commits/intro.md)** (2020-11). Allow keeping track of an event-sourced log of all activities that mutate resources, which in turn allows for versioning and adding new types of indexes later on.
+- **[JSON-AD](core/json-ad.md)** (2021-02). Instead of the earlier proposed serialization format `.ad3`, we moved to the more familiar `json-ad`.
+- **[Atomic-Data-Browser](https://github.com/atomicdata-dev/atomic-data-browser)** (2021-02). We wanted TypeScript and React libraries, as well as a nice interactive GUI that works in the browser. It should implement all relevant parts of the specification.
+- **[Endpoints](endpoints.md)** (2021-03). Machine readable API endpoints (think Swagger / OpenAPI spec) for things like versioning, path traversal and more.
+- **Classes and Properties editable from the browser** (2021-04). The data-browser is now powerful enough to use for managing the core ontological data of the project.
+- **[Hierarchies](hierarchy.md) & [Invitations](invitations.md)** (2021-06). Users can set rights, structure Resources and invite new people to collaborate.
+- **[Websockets](websockets.md)** (2021-08). Live synchronization between client and server.
+- **Use case: Document Editor** (2021-09). Notion-like editor with real-time synchronization.
+- **Full-text search** (2021-11). Powered by Tantivy.
+- **Authentication for read access** (2021-11). Allows for private data.
+- **Desktop support** (2021-12). Run Atomic-Server on the desktop, powered by Tauri. Easier install UX, system tray icon.
+- **File management** (2021-12). Upload, download and view Files.
+- **Indexed queries** (2022-01). Huge performance increase for queries. Allows for far bigger datasets.
+- **Use case: ChatRoom** (2022-04). Group chat application. To make this possible, we had to extend the Commit model with a `push` action, and allow Plugins to create new Commits.
+- **[JSON-AD Publishing and Importing](create-json-ad.md)** (2022-08). Creating and consuming Atomic Data becomes a whole lot easier.
+- **[@tomic/svelte](https://github.com/atomicdata-dev/atomic-svelte)** (2022-12). Library for integrating Atomic Data with Svelte(Kit).
+
+## Where we're at
+
+Most of the specification is becoming pretty stable.
+The implementations are working better every day, although 1.0 releases are still quite far away.
+At this point, the most important thing is to get developers to try out Atomic Data and provide feedback.
+That means not only making it easy to install the tools, but also allowing people to make Atomic Data _without_ using any of our own tools.
+That's why we're now working on the JSON-AD and Atomizer projects (see below).
+
+## Roadmap
+
+- **Video(s) about Atomic Data** (2023 Q1). Explain what Atomic Data is, why we're doing this, and how to get started.
+- **[Atomic Tables](https://github.com/atomicdata-dev/atomic-data-browser/issues/25)** (2023 Q2). A powerful table editor with keyboard / copy / paste / sort support that makes it easier to model and edit data.
+- **[E-mail registration](https://github.com/atomicdata-dev/atomic-server/issues/276)** (2023 Q1). This makes it easier for users to get started, and de-emphasizes the importance of private key management, as users can register new Private Keys using their e-mail address.
+- **Headless CMS tooling** (2023). Use Atomic-Server to host and edit data that is being read by a front-end Jamstack type of tool, such as NextJS or SvelteKit.
+- **[Atomizer](https://github.com/atomicdata-dev/atomic-server/issues/434)** (2023). Import files and automatically turn these into Atomic Data.
+- **Model Marketplace** (2023 Q4). A place where users can easily find, compare and use Classes, Properties and Ontologies.
+- **[Atomic-server plugins](https://github.com/atomicdata-dev/atomic-server/issues/73)** (2024). Let developers design new features without having to make PRs in Atomic-Server, and let users install apps without re-compiling (or even restarting) anything.
+- **Atomic-browser plugins** (2024). Create new views for Classes.
+- **1.0 release** (2024). Mark the specification, the server [(tracking issue)](https://github.com/atomicdata-dev/atomic-server/milestone/5) and the browser as _stable_. It is possible that the Spec will become 1.0 before any implementation is stable. Read the [STATUS.md](https://github.com/atomicdata-dev/atomic-server/blob/master/server/STATUS.md) document for an up-to-date list of features that are already stable.
diff --git a/src/schema/classes.md b/src/schema/classes.md
index 61cf5d1..4de4ebc 100644
--- a/src/schema/classes.md
+++ b/src/schema/classes.md
@@ -1,17 +1,9 @@
+{{#title Atomic Data Classes}}
 # Atomic Schema: Classes

-## How to read classes
+The following Classes are some of the most fundamental concepts in Atomic Data, as they make data validation possible.

-Example:
-
-- `description` - (required, AtomicURL, TranslationBox) human readable explanation of what the Class represents.
-
-Means:
-
-This class has a _required_ property with shortname `description`.
-This Property has a Datatype of `AtomicURL`, and these should point to `TranslationBox` instances.
-
-_Note: the URLs for properties are missing and will be added at a later time._
+Click the URLs of the classes to read the most up-to-date data, and discover their properties!

 ## Property

@@ -41,11 +33,11 @@ Properties of a Property instance:
 }
 ```

-Visit https://atomicdata.dev/collections/property for a list of example Properties.
+Visit [https://atomicdata.dev/properties](https://atomicdata.dev/properties) for a list of example Properties.
## Datatype -_URL: `https://atomicdata.dev/classes/Datatype`_ +_URL: [`https://atomicdata.dev/classes/Datatype`](https://atomicdata.dev/classes/Datatype)_ A Datatype specifies how a `Value` value should be interpreted. Datatypes are concepts such as `boolean`, `string`, `integer`. @@ -60,11 +52,11 @@ Properties: - `binarySerialization` - (optional, AtomicURL, TranslationBox) how the datatype should be parsed / serialized as a byte array. - `binaryExample` - (optional, string) an example `binarySerialization` that should be parsed correctly. Should have the same contents as the stringExample. Required if binarySerialization is present on the DataType. -Visit https://atomicdata.dev/collections/datatype for a list of example Datatypes. +Visit [https://atomicdata.dev/datatypes](https://atomicdata.dev/datatypes) for a list of example Datatypes. ## Class -_URL: `https://atomicdata.dev/classes/Class`_ +_URL: [`https://atomicdata.dev/classes/Class`](https://atomicdata.dev/classes/Class)_ A Class is an abstract type of Resource, such as `Person`. It is convention to use an Uppercase in its URI. @@ -106,4 +98,4 @@ Example: } ``` -Visit https://atomicdata.dev/collections/class for the a list of example Classes. +Check out a [list of example Classes](https://atomicdata.dev/classes/). diff --git a/src/schema/collections.md b/src/schema/collections.md index 705bc9c..cdf5f3c 100644 --- a/src/schema/collections.md +++ b/src/schema/collections.md @@ -1,3 +1,4 @@ +{{#title Atomic Data Collections - Filtering, Sorting & Querying}} # Atomic Collections _URL: [https://atomicdata.dev/classes/Collection](https://atomicdata.dev/classes/Collection)_ @@ -9,18 +10,18 @@ For dealing with these problems, we have Atomic Collections. An Atomic Collection is a Resource that links to a set of resources. Note that Collections are designed to be _dynamic resources_, often (partially) generated at runtime. 
+Collections are [Endpoints](../endpoints.md), which means that some of their properties are calculated server-side.
 Collections have various filters (`subject`, `property`, `value`) that can help to build a useful query.

-- `members`: How many items (members) are visible per page.
-- `subject`: Filter results by a property URL.
-- `property`: Filter results by a property URL.
-- `value`: Filter results by a Value.
-- `sort_by`: A property URL by which to sort.
-- `sort_desc`: Sort descending, instead of ascending. Defaults to `false`.
-- `current_page`: The number of the current page.
-- `page_size`: How many items (members) are visible per page.
-- `total_pages`: How many pages there are for the current collection.
-- `total_items`: How many items (members) are visible per page.
+- [`members`](https://atomicdata.dev/properties/collection/members): The members (resources) in the current page of the Collection.
+- [`property`](https://atomicdata.dev/properties/collection/property): Filter results by a property URL.
+- [`value`](https://atomicdata.dev/properties/collection/value): Filter results by a Value. Combined with `property`, you can create powerful queries.
+- [`sort_by`](https://atomicdata.dev/properties/collection/sortBy): A property URL by which to sort. Defaults to the `subject`.
+- [`sort_desc`](https://atomicdata.dev/properties/collection/sortDesc): Sort descending, instead of ascending. Defaults to `false`.
+- [`current_page`](https://atomicdata.dev/properties/collection/currentPage): The number of the current page.
+- [`page_size`](https://atomicdata.dev/properties/collection/pageSize): How many items (members) are visible per page.
+- [`total_pages`](https://atomicdata.dev/properties/collection/totalPages): How many pages there are for the current collection.
+- [`total_members`](https://atomicdata.dev/properties/collection/totalMembers): The total number of members (items) in the entire Collection.
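As a sketch of how the filters in the list above can be combined, a client could pass them as URL query parameters. This is an illustrative assumption about the request shape (the base URL and parameter handling are examples); see the Endpoints documentation for the authoritative behavior.

```python
from urllib.parse import urlencode

# Sketch: build a Collection query that filters by property/value,
# sorts by shortname, and requests 30 members per page.
# Base URL and exact parameter semantics are illustrative assumptions.
base = "https://atomicdata.dev/collections"
params = {
    "property": "https://atomicdata.dev/properties/isA",
    "value": "https://atomicdata.dev/classes/Class",
    "sort_by": "https://atomicdata.dev/properties/shortname",
    "page_size": 30,
}
url = f"{base}?{urlencode(params)}"  # urlencode percent-escapes the URL values
```

Fetching `url` with an `Accept: application/ad+json` header would then return a page of matching members, calculated server-side.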
## Persisting Properties vs Query Parameters diff --git a/src/schema/datatypes.md b/src/schema/datatypes.md index 3f49a81..a450928 100644 --- a/src/schema/datatypes.md +++ b/src/schema/datatypes.md @@ -1,8 +1,9 @@ +{{#title Atomic Data: Datatypes}} # Atomic Schema: Datatypes The Atomic Datatypes consist of some of the most commonly used [Datatypes](classes.md#Datatype). -Please visit for the latest list of official Datatypes. +_Note: Please visit for the latest list of official Datatypes._ ## Slug @@ -32,7 +33,6 @@ _URL: `https://atomicdata.dev/datatypes/string`_ UTF-8 String, no max character count. Newlines use backslash escaped `\n` characters. -Should not contain language specific data, use a [TranslationBox](translations.md) instead. e.g. `String time! \n Second line!` @@ -88,7 +88,7 @@ Use a single bit one boolean. ## Date ISO date _without time_. -YYYY-MM-DD. +`YYYY-MM-DD`. e.g. `1991-01-20` @@ -97,7 +97,7 @@ e.g. `1991-01-20` _URL: `https://atomicdata.dev/datatypes/timestamp`_ Similar to [Unix Timestamp](https://www.unixtimestamp.com/). -Milliseconds since midnight UTC 1970 jan 01 (aka the [Unix Epoch](https://en.wikipedia.org/wiki/Unix_time)). +Milliseconds since midnight UTC 1970 Jan 01 (aka the [Unix Epoch](https://en.wikipedia.org/wiki/Unix_time)). Use this for most DateTime fields. Signed 64 bit integer (instead of 32 bit in Unix systems). @@ -110,6 +110,7 @@ _URL: `https://atomicdata.dev/datatypes/resourceArray`_ Sequential, ordered list of Atomic URIs. Serialized as a JSON array with strings. Note that other types of arrays are not included in this spec, but can be perfectly valid. -([discussion]()) + +([Discussion](https://github.com/atomicdata-dev/atomic-data-docs/issues/127)) - e.g. 
`["https://example.com/1", "https://example.com/1"]`
diff --git a/src/schema/faq.md b/src/schema/faq.md index e0a1085..4f619f9 100644 --- a/src/schema/faq.md +++ b/src/schema/faq.md @@ -1,3 +1,4 @@
+{{#title Atomic Schema FAQ}}
# Atomic Schema FAQ

## How do I create a Property that supports multiple Datatypes?
@@ -6,11 +7,16 @@
A property only has one single Datatype.
However, feel free to create a new kind of Datatype that, in turn, refers to other Datatypes.
Perhaps Generics, or Option like types should be part of the Atomic Base Datatypes.

+## Do you have an `enum` datatype?
+
+In Atomic Data, `enum` is not a datatype, but a constraint that can be added to a Property.
+You can set [`allows-only`](https://atomicdata.dev/properties/allowsOnly) on a Property, and use that to limit which values are allowed.
+
## How should a client deal with Shortname collisions?

Atomic Data guarantees Subject-Property uniqueness, which means that Valid Resources are guaranteed to have only one of each Property.
Properties offer Shortnames, which are short strings.
-These strings SHOULD be unique inside Classes, but these are not guaranteed to be unique inside all Resources.
+These strings should be unique inside Classes, but these are not guaranteed to be unique inside all Resources.
Note that Resources can have multiple Classes, and through that, they can have colliding Shortnames.
Resources are also free to include Properties from other Classes, and their Shortnames, too, might collide.
@@ -29,7 +35,7 @@ Let's assume that `https://example.com/name` and `https://another.example.com/so
What if a client tries something such as `people123.name`?
To consistently return a single value, we need some type of _precedence_:

-1. The earlier Class mentioned in the [`isA`](https://atomicdata.dev/properties/isA) Property of the resource. Resources can have multiple classes, but they appear in an ordered ResourceArray.
Classes, internally SHOULD have no key collisions in required and recommended properties, which means that they might have. If these exist internally, sort the properties by how they are ordered in the `isA` array - first item is preferred.
+1. The earlier Class mentioned in the [`isA`](https://atomicdata.dev/properties/isA) Property of the resource. Resources can have multiple classes, but they appear in an ordered ResourceArray. Classes internally should have no key collisions in required and recommended properties, but such collisions can still occur. If these exist internally, sort the properties by how they are ordered in the `isA` array - the first item is preferred.
1. When the Properties are not part of any of the mentioned Classes, use Alphabetical sorting of the Property URL.

When shortname collisions are possible, it's recommended to not use the shortname, but use the URL of the Property:
@@ -53,8 +59,28 @@ Another approach, is using [foreign keys (see issue)](https://github.com/ontola/

## How does Atomic Schema relate to RDF / SHACL / SheX / OWL / RDFS?

-Atomic Schema is _the_ schema language for atomic data, whereas RDF has a couple of competing ones, which all vary greatly.
+Atomic Schema is _the_ schema language for Atomic Data, whereas RDF has a couple of competing ones, which all vary greatly.
In short, OWL is not designed for schema validation, but SHACL and SheX can maybe be compared to Atomic Schema.
An important difference is that SHACL and SheX have to deal with all the complexities of RDF, whereas Atomic Data is more constrained.

For more information, see [RDF interoperability](../interoperability/rdf.md).
+
+## What are the risks of using Schema data hosted somewhere else?
+
+Every time you use an external URL in your data, you kind of create a dependency.
+This is fundamental to linked data.
+In Atomic Data, not having access to the Property in some JSON-AD resource will lead to not knowing how to interpret the data itself.
+You will no longer know what the Datatype was (other than the native JSON datatype, of course), or what the semantic meaning of the relationship was.
+
+There are multiple ways we can deal with this:
+
+- **Cache dependencies**: Atomic Server already stores a copy of every class and property that it uses by default. The `/path` endpoint then allows clients to fetch these from servers that have cached it. If the source goes offline, the validations can still be performed by the server. However, it might be a good idea to migrate the data to a hosted ontology, e.g. by cloning the cached ontology.
+- **Content-addressing**: using non-HTTP identifiers, such as with [IPFS](../interoperability/ipfs.md).
+
+([Discussion](https://github.com/ontola/atomic-data-docs/issues/99))
+
+## How do I deal with subclasses / inheritance?
+
+Atomic Data does not have a concept of inheritance.
+However, you can use the `isA` property to link to _multiple Classes_ from a single resource.
+This effectively lets a single resource combine the required and recommended Properties of multiple Classes.
diff --git a/src/schema/intro.md b/src/schema/intro.md index c664e7c..537bdba 100644 --- a/src/schema/intro.md +++ b/src/schema/intro.md @@ -1,8 +1,10 @@
+{{#title Atomic Data Schema - modelling Atomic Data}}
# Atomic Schema

Atomic Schema is the proposed standard for specifying classes, properties and datatypes in Atomic Data.
-You can compare it to what XSD is for XML.
-Atomic Schema deals with validating and constraining the shape of data - it checks if all required properties are present, and whether the values conform to the datatype requirements (e.g. `datetime`, or `URL`).
+You can compare it to UML diagrams, or what XSD is for XML.
+Atomic Schema deals with validating and constraining the shape of data.
+It is designed for checking if all the required properties are present, and whether the values conform to the datatype requirements (e.g. `datetime`, or `URL`).
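The validation described here - required properties present, values matching datatypes - can be sketched in a few lines. This is an illustrative reading of the spec, not a reference implementation: the `example.com` Property URLs are made up, and real datatype checking covers many more cases than these three:

```python
import re

# Illustrative checks for a few Datatypes - far from the full spec.
DATATYPE_CHECKS = {
    "https://atomicdata.dev/datatypes/string": lambda v: isinstance(v, str),
    "https://atomicdata.dev/datatypes/boolean": lambda v: isinstance(v, bool),
    "https://atomicdata.dev/datatypes/date": lambda v: isinstance(v, str)
        and re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None,
}

def validate(resource: dict, required: list, datatypes: dict) -> list:
    """Return validation errors for a resource (a property-URL -> value mapping)."""
    errors = [f"missing required property: {p}" for p in required if p not in resource]
    for prop, value in resource.items():
        check = DATATYPE_CHECKS.get(datatypes.get(prop))
        if check and not check(value):
            errors.append(f"{prop}: invalid value {value!r}")
    return errors

person = {
    "https://example.com/properties/name": "Alice",
    "https://example.com/properties/birthdate": "1991-01-20",
}
errors = validate(
    person,
    required=["https://example.com/properties/name"],
    datatypes={"https://example.com/properties/birthdate": "https://atomicdata.dev/datatypes/date"},
)
assert errors == []
```

In a real implementation, the `required` list and the property-to-datatype mapping would be fetched from the resource's Classes and Properties, rather than passed in by hand.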
This section will define various Classes, Properties and Datatypes (discussed in [Atomic Core: Concepts](../core/concepts.md)). @@ -20,12 +22,13 @@ This section will define various Classes, Properties and Datatypes (discussed in In short, Atomic Schema works like this: -The Property _field_ in an Atom links to a **Property _Resource_**. It is important that the URL to the Property Resource resolves. +The Property _field_ in an Atom, or the _key_ in a JSON-AD object, links to a **Property _Resource_**. +It is important that the URL to the Property Resource resolves, as others can re-use it and check its datatype. This Property does three things: -1. it tells something about its semantic meaning, and links to a Datatype. -1. it links to a Datatype or Class, which indicates which Value is acceptable. -1. it provides a Shortname, which is used for ORM. +1. it links to a **Datatype** which indicates which Value is acceptable. +1. it has a **description** which tells you what the property means, what the relationship between the Subject and the Value means. +1. it provides a **Shortname**, which is sometimes used as an alternative to the full URL of the Property. **DataTypes** define the shape of the Value, e.g. a Number (`124`) or Boolean (`true`). diff --git a/src/schema/translations.md b/src/schema/translations.md index a8b31be..dc1193a 100644 --- a/src/schema/translations.md +++ b/src/schema/translations.md @@ -1,15 +1,17 @@ +{{#title Atomic Data Translations}} # Atomic Translations _Status: design / concept stage_ Dealing with translations can be hard. -([See discussion on this subject here.](https://github.com/ontola/atomic-data/issues/6)) + +([Discussion](https://github.com/ontola/atomic-data/issues/6)) ## TranslationBox _URL: `https://atomicdata.dev/classes/TranslationBox` (does not resolve yet)_ -A TranslationBox is a collection of translated strings, uses to provide multiple translations. 
+A TranslationBox is a collection of translated strings, used to provide multiple translations. It has a long list of optional properties, each corresponding to some language. Each possible language Property uses the following URL template: `https://atomicdata.dev/languages/{langguageTag}`. Use a [BCP 47](http://www.rfc-editor.org/rfc/bcp/bcp47.txt) language tag, e.g. `nl` or `en-US`. @@ -26,6 +28,6 @@ For example: } ``` -Every single property used for Translation strings are instances of the Translation class. +Every single value used for Translation strings is an instance of the Translation class. A translation string uses the [MDString](https://atomicdata.dev/datatypes/markdown) datatype, which means it allows Markdown syntax. diff --git a/src/tooling.md b/src/tooling.md index 5453143..df26799 100644 --- a/src/tooling.md +++ b/src/tooling.md @@ -1,10 +1,13 @@ -# Software and libraries +{{#title Software and libraries for Atomic Data}} +# Software and libraries for Atomic Data + +Although Atomic Data is a specification, it also has reference implementations: Open source (MIT licenced) software for Atomic Data: -- Server: [atomic-server](https://github.com/joepio/atomic) -- Front-end browser: [atomic-data-browser](https://github.com/joepio/atomic-data-browser) -- CLI (atomic-cli): [atomic-cli](https://github.com/joepio/atomic) +- **Server + Database**: [atomic-server](https://github.com/atomicdata-dev/atomic-server) +- **GUI**: [atomic-data-browser](https://github.com/atomicdata-dev/atomic-data-browser) +- **CLI**: [atomic-cli](https://github.com/atomicdata-dev/atomic-server) Libraries (MIT licenced) to build apps with: @@ -23,11 +26,11 @@ Server for hosting Atomic Data. Uses `atomic-lib`. - Authorization, authentication, versioning, collections, pagination - Browser-friendly HTML presentation, JSON serialization, RDF serialization. 
-One liner: `$ docker run -p 80:80 -p 443:443 -v atomic-storage:/atomic-storage joepmeneer/atomic-server`
+One liner: `$ docker run -p 80:80 -v atomic-storage:/atomic-storage joepmeneer/atomic-server`

[demo](https://atomicdata.dev/)

-[repository + issue tracker](https://github.com/joepio/atomic).
+[repository + issue tracker](https://github.com/atomicdata-dev/atomic-server).

### `atomic-data-browser`
@@ -39,7 +42,7 @@ Data browser, powered by `@tomic/lib` and `@tomic/react`.

[demo](https://atomicdata.dev/) (same as `atomic-server`)

-[repository + issue tracker](https://github.com/joepio/atomic-data-browser).
+[repository + issue tracker](https://github.com/atomicdata-dev/atomic-data-browser).

### `atomic-cli`
@@ -68,10 +71,15 @@ SUBCOMMANDS: set Update an Atom's value. Uses Commits. tpf Finds Atoms using Triple Pattern Fragments.
-Visit https://github.com/joepio/atomic for more info
+Visit https://github.com/atomicdata-dev/atomic-server for more info
```

-[repository + issue tracker](https://github.com/joepio/atomic).
+[repository + issue tracker](https://github.com/atomicdata-dev/atomic-server).
+
+
+### Raycast extension: Full-text search from your desktop
+
+[Install here](https://www.raycast.com/atomicdata-dev/atomic-data-browser).

## Libraries
@@ -88,12 +96,11 @@ Library that powers `atomic-server` and `atomic-cli`. Features:

- An in-memory store
- Parsing (JSON-AD) / Serialization (JSON-AD, JSON-LD, TTL, N-Triples)
- Commit validation and processing
-- TPF queries
- Constructing Collections
- Path traversal
- Basic validation

-[repository + issue tracker](https://github.com/joepio/atomic).
+[repository + issue tracker](https://github.com/atomicdata-dev/atomic-server).

## Want to add to this list? Some ideas for tooling
diff --git a/src/trust.md b/src/trust.md index d67328c..5bbee08 100644 --- a/src/trust.md +++ b/src/trust.md @@ -5,7 +5,7 @@ _status: just an idea_

Not all information on the web can be trusted.
Instead of relying on some centralized authority to say which content is to be trusted, we can leverage our existing trust networks to determine what we can trust or not.

-Atomic Trust is a proposed standard to share which actors, domains and resources we trust on the web.
+Atomic Trust is a specification to share which actors, domains and resources we trust on the web.
It's a decentralized model defined with Atomic Schema to create explicit trust networks.
It can be used to calculate a score about a resource (such as a webpage).
diff --git a/src/usecases/ai.md b/src/usecases/ai.md new file mode 100644 index 0000000..e2535df --- /dev/null +++ b/src/usecases/ai.md @@ -0,0 +1,31 @@
+# Atomic Data & Artificial Intelligence
+
+Recent developments in machine learning (and specifically deep neural networks) have shown how powerful and versatile AI can be.
+Both Atomic Data and AI can be used to store and query knowledge, but we think of these technologies as complementary due to their unique characteristics:
+
+- Artificial Intelligence can make sense of (unstructured) data, so you can feed it any type of data. However, AIs often produce unpredictable and sometimes incorrect results.
+- Atomic Data helps to make data interoperable, reliable and predictable. However, it requires very strict inputs.
+
+There are three ways in which Atomic Data and AI can help each other:
+
+- AI can help to make creating Atomic Data easier.
+- Atomic Data can help train AIs.
+- Atomic Data can provide AIs with reliable, machine-readable data for answering questions.
+
+## Make it easier to create Atomic Data using AI
+
+While writing text, an AI might help make suggestions to disambiguate whatever it is you're writing about.
+For example, you may mention `John` and your knowledge graph editor (like `atomic-server`) could suggest `John Wayne` or `John Cena`.
+When making your selection, a link will be created which helps to make your knowledge graph more easily browsable.
+AI could help make these suggestions through context-aware _entity recognition_.
+
+## Train AIs with Atomic Data
+
+During training, you could feed Atomic Data to your AI to help it construct a reliable, consistent model of the knowledge relevant to your organization or domain.
+You could use `atomic-server` as the knowledge store, and iterate over your resources and let your AI parse them.
+
+## Provide AI with query access to answer questions
+
+Instead of training your AI, you might provide your AI with an interface to perform queries.
+Note that at this moment, I'm not aware of any AIs that can autonomously construct and execute queries, but recent advancements (e.g. ChatGPT) show that AIs can already create SQL queries based on human text.
+In the future, you might let your AI query your `atomic-server` to find reliable and up-to-date answers to your questions.
diff --git a/src/usecases/crud.md b/src/usecases/crud.md new file mode 100644 index 0000000..4189252 --- /dev/null +++ b/src/usecases/crud.md @@ -0,0 +1 @@
+# Atomic-Server for CRUD applications
diff --git a/src/usecases/data-catalog.md b/src/usecases/data-catalog.md new file mode 100644 index 0000000..a6ad231 --- /dev/null +++ b/src/usecases/data-catalog.md @@ -0,0 +1,44 @@
+{{#title Atomic Server as a Data Catalog}}
+# Using Atomic-Server as a Data Catalog
+
+A data catalog is a system that collects metadata - data about data.
+It is an inventory of datasets.
+
+Data catalogs are often used to:
+
+- **Increase data-reuse of (open) datasets**. By making descriptions of datasets, you increase their discoverability.
+- **Manage data quality**. The more datasets you have, the more you'll want to make sure they are usable. This could mean setting serialization requirements or schema compliance.
+- **Manage compliance with privacy laws**.
If you have datasets that contain GDPR-relevant data (personal data), you're legally required to maintain a list of where that data is stored, what you need it for and what you're doing with it.
+
+## Why Atomic Server could be great for Data Catalogs
+
+[Atomic-Server](https://docs.atomicdata.dev/atomic-server.html) is a powerful Database that can be used as a modern, powerful data catalog. It has a few advantages over others:
+
+- Free & **open source**. MIT licensed!
+- Many built-in features, like **full-text search**, **history**, **live synchronization** and **rights management**.
+- Great **performance**. Requests take nanoseconds to milliseconds.
+- Very **easy to set up**. One single binary, no weird runtime dependencies.
+- Everything is linked data. Not just the datasets themselves, but also everything around them (users, comments, implementations).
+- Powerful **CMS capabilities**. With built-in support for Tables and Documents, you can easily create webpages with articles or other types of resources using Atomic Server.
+- [Atomic Schema](../schema/intro.md) can be used to describe the **shape of your datasets**: the properties you use, which fields are required - things like that. Because Atomic Schema uses URLs, we can easily re-use properties and class definitions. This helps to make your datasets highly interoperable.
+
+## When Atomic-Server is used for hosting the data, too
+
+Most data catalogs only have metadata. However, if you convert your existing CSV / JSON / XML / ... datasets to _Atomic Data_, you can host them on Atomic-Server as well. This has a few advantages:
+
+- **Data previews** in the browser: users can navigate through the data without leaving the catalog.
+- Data itself becomes **browseable**, too, which means you can traverse a graph by clicking on link values.
+- **Standardized Querying** means you can easily filter and sort the data right from the data catalog.
+- **Cross-dataset search**.
Search queries can be performed over multiple Atomic Data servers at once, enabling searching over multiple datasets. This is also called _federated search_.
+
+## Atomic Server compared to CKAN
+
+- Atomic-Server is MIT licensed - which is more permissive than CKAN's AGPL license.
+- Whereas CKAN needs an external database, a Python runtime, Solr and an HTTPS server, Atomic-Server has all of these built-in!
+- CKAN uses plain RDF, which has some [very important drawbacks](../interoperability/rdf.md).
+- But... Atomic-Server still misses a few essentials right now:
+
+## What we should add to Atomic-Server before it's a decent Data Catalog
+
+- Add a model for datasets. This is absolutely essential. It could be based on (and link to) DCAT, but needs to be described using Atomic Schema. This step means we can generate forms for Datasets and we can validate their fields.
+- Add views for datasets. Atomic-Server already renders decent views for unknown resources, but a specific view should be created for Datasets. [Add a PR](https://github.com/atomicdata-dev/atomic-data-browser) if you have a React view!
diff --git a/src/usecases/e-commerce.md b/src/usecases/e-commerce.md index de98cb8..98d7df4 100644 --- a/src/usecases/e-commerce.md +++ b/src/usecases/e-commerce.md @@ -1,14 +1,51 @@
-# Atomic Data for e-commerce
+{{#title Atomic Data for e-commerce & marketplaces}}
+# Atomic Data for e-commerce & marketplaces
+
+Buying goods and services on the internet is currently responsible for about 15% of all commerce, and is steadily climbing.
+The internet makes it easier to find products, compare prices, get information and reviews, and finally order something.
+But the current e-commerce situation is far from perfect, as large corporations tend to monopolize, leaving us with less competition, which ultimately harms prices and quality for consumers.
+Atomic Data can help empower smaller businesses, make searching for specific things way easier and ultimately make things cheaper for everyone.
+
+## Decentralize platform / sharing economy service marketplaces
+
+Platforms like Uber, AirBNB and SnapCar are virtual marketplaces that help people share and find services.
+These platforms are responsible for:
+
+1. providing an interface for **managing offers** (e.g. describe your car, add specifications and pricing)
+2. **hosting** the data of the offers themselves (make the data available on the internet)
+3. providing a **search interface** (which means indexing the data from all the existing offers)
+4. facilitating the **transaction** / payments
+5. providing **trust** through reviews and warranties (e.g. refunds if the seller fails to deliver)
+
+The fact that these responsibilities are almost always combined in a single platform leads to vendor lock-in and an uncompetitive landscape, which ultimately harms consumers.
+Currently, if you want to manage your listing / offer on various platforms, you need to manually adjust it on all these various platforms.
+Some companies even prohibit offering on multiple platforms (which is a legal problem, not a technical one).
+This means that the biggest (most known) platforms have the most listings, so if you're looking for a house / car / rental / meal, you're likely to go for the biggest business - because that's the one that has the biggest assortment.
+
+Compare this to how the web works: every browser should support every type of webpage, and it does not matter where the webpage is hosted.
+I can browse a webpage written on a Mac on my Windows machine, and I can read a webpage hosted by Amazon on a Google device.
+It does not matter, because the web is _standardized_ and _open_, instead of being _centralized_ and managed by one single company as _proprietary_ data.
+This openness of the web means that we get search engines like Google and Bing that _scrape_ the web and add it to their index.
+This results in a dynamic where those who want to sell their stuff will need to share their stuff using an open standard (for webpages things like HTML and sometimes a bit of metadata), so crawlers can properly index the webpages.
+We could do the same thing for _structured data_ instead of _pages_, and that's what Atomic Data is all about.
+
+Let's discuss a more practical example of what this could mean.
+Consider a restaurant owner who currently uses UberEats as their delivery platform.
+Using Atomic Data, they could define their menu on their own website.
+The Atomic Schema specification makes it easy to standardize what the data of a menu item looks like (e.g. price, image, title, allergens, vegan...).
+Several platforms (potentially modern variants of platforms like JustEat / UberEats) could then crawl this standardized Atomic Data, index it, and make it easily searchable.
+The customer would use one (or multiple) of these platforms, which would probably have the _exact same_ offers.
+Where these platforms might differ is in their own service offering, such as delivery speed or price.
+This would result in a more competitive and free market, where customers would be able to pick a platform based on their service price and quality, instead of their list of offerings.
+It would empower the small business owner to be far more flexible in which service they will do business with.

## Highly personalized and customizable search

-Searching for things on the internet is still not that great.
+Searching for products on the internet is mostly limited to text search.
If we want to buy a jacket, we see tonnes of jackets that are not even available in our own size.
Every single website has its own way of searching and filtering.
-Imagine creating a search description in _one_ application, and sending that to _multiple suppliers_, after you'll receive a fully personalized and optimized list of articles.
-No duplicate articles, every article with price comparison.
-
+Imagine making a search query in _one_ application, and sending that to _multiple suppliers_, after which you'll receive a fully personalized and optimized list of products.
Browsing in an application that you like to use, not bound to any one specific store, that doesn't track you, and doesn't show advertisements.
It is a tool that helps you to find what you need, and it is the job of producers to accurately describe their products in a format that your product browser can understand.
@@ -21,15 +58,30 @@ Describing this in a machine-readable and predictable format as data is the next
This is, of course, where Atomic Schema could help.
Atomic-server could be the connected, open source database that suppliers use to describe their products as data.
+Then we'll also need to build a search interface that performs federated queries, and product-dependent filter options.
+
+## Product lifecycle & supply chain insights
+
+Imagine buying a product, and being able to see where each part came from.
+The car that you buy might contain a list of all the maintenance moments, and every replaced part.
+The raw materials used could be traced back to their origins.
+
+This requires a high degree of coordination from each step in the supply chain.
+This is exactly where Atomic Data shines, though, as it provides a highly standardized way of structuring, querying, authenticating and authorizing data.
+
+Before we get to this point, we'll need to:
+
+- Describe domain-specific product Classes using Atomic Schema, and their Properties.
+
## Product specific updates after purchase

Imagine buying an external battery pack with a production error.
All units with a serial number between 1561168 and 1561468 have a serious error, where overcharging could lead to spontaneous combustion.
This is something that you'd like to know.
But how would the _manufacturer_ of that resource know where to find you?

-Well, if your Atomic Server would have a list of all the things that you've bought, it could subscribe to safety updates from all manufacturers.
+Well, if your Atomic Server had a list of all the things that you've bought, it could _automatically_ subscribe to safety updates from all manufacturers.
When any of these manufacturers would publish a safety warning about a product that you possess, you'll get an alert.

-## Product lifecycle insights
+Before we have this, we'll need to:

-Imagine buying a product, and being able to see where each part came from.
+- Build notifications support (see [issue](https://github.com/atomicdata-dev/atomic-server/issues/77))
diff --git a/src/usecases/education.md b/src/usecases/education.md new file mode 100644 index 0000000..e319de7 --- /dev/null +++ b/src/usecases/education.md @@ -0,0 +1,37 @@
+# Atomic Data for Education - standardized, modular e-learning
+
+The Atomic Data specification can help make online educational content more **modular**. This has two direct benefits:
+
+- **Separate learning goals from how they are achieved**. Some might prefer watching a video, others may want to read. Both can describe the same topic, and share the same test.
+- **Improve discoverability**. Create links between topics so students know which knowledge is needed to advance to the next topic.
+
+## Modular educational content - a model
+
+We can think of **Knowledge** as building blocks that we need to do certain things.
+And we can think of **Lessons** as _teaching_ certain pieces of knowledge, while at the same time _requiring_ other pieces of knowledge.
+For example, an algebra class might require that you already know how to multiply, add, etc.
+We can think of **Tests** as _verifying_ if a piece of knowledge is properly understood.
+
+Now there's also a relationship between the **Student** and all of these things.
+A student follows a bunch of Lessons in which they've made some progress, and has done some Tests which resulted in Scores.
+
+Describing our educational content in this fashion has a bunch of advantages.
+For students, this means they can know in advance if they can get started with a course, or if they need to learn something else first.
+Conversely, they can also discover new topics that depend on knowledge they already have.
+For teachers, this means they can re-use existing lessons for their courses.
+
+## What makes Atomic-Server a great tool for creating online courseware
+
+- Powerful built-in document editor
+- Drag & drop file support
+- Versioning
+- Open source, so no vendor lock-in, and full customizability
+- Real-time updates, great for collaboration
+- Online by default, so no extra hassle with putting courses on the internet
+
+However, there is still a lot to do!
+
+- Turn the model described above into an actual Atomic Schema data model
+- Build the GUI for the application
+- Add plugins / extenders for things like doing tests (without giving the answer to students!)
+- Create educational content
diff --git a/src/usecases/food-labels.md b/src/usecases/food-labels.md new file mode 100644 index 0000000..a4efc9d --- /dev/null +++ b/src/usecases/food-labels.md @@ -0,0 +1,38 @@
+# Atomic Data for food label standardization
+
+In most countries, food producers are required to provide nutritional information on the packages of products, which helps citizens to make informed decisions about what to eat.
+But how about we upgrade these labels to machine-readable, atomic data?
+We could describe products using Atomic Data, and put their identifiers (Subject URLs) as QR codes on packages.
+Imagine these scenarios:
+
+## Scan labels to get detailed, reliable, interactive information
+
+You want to know more about some new cereal you've just bought.
+You scan the QR code on the package.
+A web app opens that shows detailed, yet highly visual information about its nutritional value.
+The screen is no longer limited to what realistically fits on a package.
+The elements are interactive, and provide explanations.
+Everything is translated to the user's language.
+If the food is (soon to be) expired, the app will clearly and visually alert you.
+Click on the question mark next to `granulated sugars`, and you get an explanation of what this means to your health.
+E-numbers are clickable, too, and help you instantly understand far more about what they represent.
+When AR glasses become technologically feasible, you could even help people make better decisions while doing grocery shopping.
+
+Using _links_ instead of _names_ helps to guide consumers to _trustworthy_ pages that communicate clearly.
+The alternative is that they use search engines, and maybe end up reading misinformation.
+
+## Provide nutritional advice based on shopping behavior
+
+You order a bunch of products on your favorite groceries delivery app.
+When going to the payment screen, you are shown a nutritional overview of your order.
+You see that with this diet, you might have a deficit of the lysine amino acid.
+The shopping cart suggests adding egg, dairy or soy to your diet.
+This can be done, because the groceries app can easily check detailed information about the food in your shopping cart, and reason about your dietary intake.
+
+## How to achieve all this
+
+1. The governing body (e.g. the European Commission) should set up an [Atomic Server](https://github.com/atomicdata-dev/atomic-server/) and host it on some recognizable domain.
+1. 
Create the [Class](https://atomicdata.dev/classes/Class) for a food product, containing the same (or more) information that is shown on food packages.
+1. Create the Class for Ingredient.
+1. Create instances for various Ingredients. Start with the E-numbers, work your way up to all kinds of used ingredients. Add Translations.
+1. Give instructions to Producers on how to describe their Products. Give them the option to host their own Server and control their own data, and give them the option to use some EU server.
diff --git a/src/usecases/governmental-open-data.md b/src/usecases/governmental-open-data.md new file mode 100644 index 0000000..27a6ba0 --- /dev/null +++ b/src/usecases/governmental-open-data.md @@ -0,0 +1,4 @@
+# Publishing governmental Open Data as Atomic Data
+
+- More information is better
+-
diff --git a/src/usecases/headless-cms.md b/src/usecases/headless-cms.md new file mode 100644 index 0000000..a58984e --- /dev/null +++ b/src/usecases/headless-cms.md @@ -0,0 +1,50 @@
+# Using Atomic-Server as an open source headless CMS
+
+## Why people are switching to Headless CMS
+
+Traditionally, content management systems were responsible for both managing the content as well as producing the actual HTML views that the user saw.
+This approach has some issues regarding performance and flexibility that headless CMS tools solve.
+
+- **Great performance**. We want pages to load in milliseconds, not seconds. Headless CMS tools + JAMSTACK style architectures are designed to give both performant initial page loads, as well as consecutive / dynamic loads.
+- **High flexibility**. Designs change, and front-end developers want to use the tools that they know and love to create these designs effectively. With a headless CMS, you can build the front-end with the tools that you want, and make it look exactly like you want.
+- **Easier content management**. Not every CMS is as pleasant and easy for admins to use as others.
Headless CMS tools focus on the admin side of things, so the front-end devs don't have to work on the back-end as well.
+
+## Atomic Server
+
+The [Atomic-Server](https://github.com/atomicdata-dev/atomic-server/blob/master/server/README.md) project may be the right choice for you if you're looking for a Headless CMS:
+
+- **Free and open source**. MIT licensed, no strings attached.
+- **Easy to use API**. Atomic-Server is built using the [Atomic Data specification](../atomic-data-overview.md). It is well-documented, and uses conventions that most web developers are already familiar with.
+- **TypeScript & React libraries**. Use the existing React hooks to make your own fully editable, live-reloaded web application.
+- **Fast**. 1ms responses on my laptop. It's written in Rust, so it squeezes out every cycle of your server.
+- **Lightweight**. It's a single 8MB binary, no external dependencies needed.
+- **Easy to set up**. Just run the binary and open the address. Even HTTPS support is built-in.
+- **Clean, powerful admin GUI**. The Atomic-Data-Browser front-end gives you a very easy interface to manage your content.
+- **Share your data models**. Atomic Data is designed to achieve a more decentralized web. You can easily re-use existing data models, or share the ones you built.
+- **Files / Attachments**. Upload and preview files.
+- **Pagination / sorting / filtering**. Query your data.
+- **Versioning**. Built-in history, where each transaction is saved.
+- **Websockets**. If you need live updates and highly interactive apps (collaborative documents and chatrooms), we've got your back.
+- **Full-text search**. No need for a big Elasticsearch server - atomic-server has one built-in.
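+To give a feel for the API conventions mentioned above, here is a short sketch of reading a resource in the JSON-AD format that Atomic Data uses. The `article` resource itself is a made-up example, not actual server output:
+
+```typescript
+// A JSON-AD resource maps property URLs to values, plus an "@id" subject.
+type JsonAdResource = { "@id": string } & Record<string, unknown>;
+
+// Look up a value by its property URL - properties are links, not plain names.
+function getValue(resource: JsonAdResource, property: string): unknown {
+  return resource[property];
+}
+
+// Hypothetical resource, shaped like what an Atomic-Server could return:
+const article: JsonAdResource = {
+  "@id": "https://example.com/articles/1",
+  "https://atomicdata.dev/properties/name": "Hello world",
+  "https://atomicdata.dev/properties/description": "My first blog post.",
+};
+
+console.log(getValue(article, "https://atomicdata.dev/properties/name")); // "Hello world"
+```
+
+Because every property is a URL, the data documents itself: paste a property URL into your browser to see its description and datatype.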
+
+## Limitations
+
+- No support for image resizing, [as of now](https://github.com/atomicdata-dev/atomic-server/issues/257)
+- No GraphQL support [(see issue)](https://github.com/atomicdata-dev/atomic-server/issues/251)
+
+## Setting up the server
+
+- One-liners: `cargo install atomic-server` or `docker run -p 80:80 -v atomic-storage:/atomic-storage joepmeneer/atomic-server`
+- Check out the [readme](https://github.com/atomicdata-dev/atomic-server)!
+
+## Using the data in your (React / NextJS) app
+
+The `@tomic/lib` and `@tomic/react` TypeScript NPM libraries can be used in any JS project.
+
+In the next section, we'll discuss how to use Atomic-Server in your React project.
+
+## Compared to alternative open source headless CMS software
+
+- **Strapi**: Atomic-Server doesn't need an external database, is easier to set up, has live synchronization support and is way faster. However, Strapi has a plugin system, is more polished, and has GraphQL support.
+- **
diff --git a/src/usecases/intro.md b/src/usecases/intro.md
index a4116f9..de75bc9 100644
--- a/src/usecases/intro.md
+++ b/src/usecases/intro.md
@@ -1,5 +1,18 @@
-# Atomic Data Use Cases
+{{#title Various Use Cases for Atomic Data}}
+# Various Use Cases for Atomic Data

Most of this book is either abstract or technical, but this section aims to be different.
In this section, we'll present concrete examples of things that can be built with Atomic Data.
Although you could use Atomic Data for pretty much any type of application, it is especially valuable where **data re-use**, **standardization**, and **data ownership** are important.
+
+
+* [As a Headless CMS](headless-cms.md)
+* [In a React project](react.md)
+* [Personal Data Store](personal-data-store.md)
+* [Artificial Intelligence](ai.md)
+* [E-commerce & marketplaces](e-commerce.md)
+* [Surveys](surveys.md)
+* [Verifiable Credentials](verifiable-credentials.md)
+* [Data Catalog](data-catalog.md)
+* [Education](education.md)
+* [Food labels](food-labels.md)
diff --git a/src/usecases/job-matching.md b/src/usecases/job-matching.md
new file mode 100644
index 0000000..1394c28
--- /dev/null
+++ b/src/usecases/job-matching.md
@@ -0,0 +1,6 @@
+WIP
+
+# Atomic Data for job matching and vacancies
+
+- https://sfia-online.org/en/about-sfia/sfia-guiding-principles
+-
diff --git a/src/usecases/knowledge-management.md b/src/usecases/knowledge-management.md
new file mode 100644
index 0000000..344f1a8
--- /dev/null
+++ b/src/usecases/knowledge-management.md
@@ -0,0 +1,41 @@
+# Atomic Data for (semantic) knowledge graph management
+
+Knowledge **management** is about making valuable knowledge easily findable, so everybody in an organization can be as effective as possible.
+Knowledge **graphs** are information structures that help organizations organize their knowledge using a graph model.
+Graphs are especially useful for structuring knowledge, as they allow links between resources, which makes relationships understandable and data browsable.
+
+Atomic Data is a Graph structure, and [Atomic-Server](https://crates.io/crates/atomic-server/) is an open source Graph database / knowledge management system.
+
+## Knowledge management systems
+
+How do organizations store and share knowledge?
+Some rely completely on their brains and social networks: if you want to know how the copier works, ask Sara.
+But most use digital documents - more often than not in the cloud.
+If your knowledge is digital and available online, people can retrieve it from anywhere at great speed.
+Being able to search and browse through information is essential to making it effortless to retrieve.
+
+But good knowledge management systems are not just static: they have lives of their own.
+Knowledge changes over time.
+People add documents, make changes, move things.
+
+## Why use Atomic-Server as a knowledge management system
+
+### The entire web as one knowledge graph
+
+Atomic Data uses URLs to identify resources.
+This means that it
+
+### Type-safe, decentralized data structures
+
+Contrary to many other types of graph systems, Atomic Data ensures type-safety by having a built-in schema language ([Atomic Schema](../schema/intro.md)).
+This means that it is very easy to share and re-use data models, which helps you standardize the classes and properties that you use.
+
+## Non-goals of Atomic-Server
+
+- Deep, specific query requirements
+- Time-series data / data visualization
+
+## Alternatives
+
+- **LinkedDataHub** by Atomgraph (unrelated, don't mind the name similarities): a knowledge graph management tool that also supports RDF. Open source.
+- **
diff --git a/src/usecases/personal-data-store.md b/src/usecases/personal-data-store.md
index 8b720d3..1e8ddb9 100644
--- a/src/usecases/personal-data-store.md
+++ b/src/usecases/personal-data-store.md
@@ -1,3 +1,4 @@
+{{#title Atomic Data for personal data stores}}
# Atomic Data for personal data stores

A Personal Data Store (or personal data service) is a place where you store all sorts of personal information.
@@ -10,7 +11,7 @@
Many services don't even provide export functionality, and even if they do, the

Atomic Data could help to re-introduce data ownership.
Because the specification helps to standardize information, it becomes easier to make data interoperable.
-And even more important: Apps don't need their own back-end - they can use the same personal data store: an Atomic Server (such as [this one](https://github.com/joepio/atomic/blob/master/server/README.md)).
+And even more important: Apps don't need their own back-end - they can use the same personal data store: an Atomic Server (such as [this one](https://github.com/atomicdata-dev/atomic-server/blob/master/server/README.md)).

Realizing this goal requires quite a bit of work, though.
This specification needs to mature, and we need reliable implementations.
diff --git a/src/usecases/react.md b/src/usecases/react.md
new file mode 100644
index 0000000..6635019
--- /dev/null
+++ b/src/usecases/react.md
@@ -0,0 +1,18 @@
+# Using Atomic Data in a JS / TS React project
+
+Atomic Data has been designed with front-end development in mind.
+The open source [Atomic-Data-Browser](https://github.com/atomicdata-dev/atomic-data-browser), which is feature-packed with chatrooms, a real-time collaborative rich text editor, tables and more, is powered by two libraries:
+
+- `@tomic/lib` ([docs](https://atomicdata-dev.github.io/atomic-data-browser/docs/modules/_tomic_lib.html)) is the core library, containing logic for fetching and storing data, keeping things in sync using websockets, and signing [commits](../commits/intro.md).
+- `@tomic/react` ([docs](https://atomicdata-dev.github.io/atomic-data-browser/docs/modules/_tomic_react.html)) is the React library, featuring various useful hooks that mimic `useState`, giving you real-time updates through your app.
+
+Check out the [template on CodeSandbox](https://codesandbox.io/s/atomic-data-react-template-4y9qu?file=/src/MyResource.tsx:0-1223):
+
+
+
+Feeling stuck? [Post an issue](https://github.com/atomicdata-dev/atomic-data-browser/issues/new) or [join the discord](https://discord.gg/a72Rv2P).
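+To give a feel for what these hooks do under the hood, here is a self-contained sketch of the subscribe-and-notify pattern they are built on. This is illustrative only; the real `@tomic/lib` Store has a much richer API:
+
+```typescript
+// Minimal store: resources indexed by subject URL, with change listeners.
+type Resource = Record<string, unknown>;
+type Listener = (resource: Resource) => void;
+
+class MiniStore {
+  private resources = new Map<string, Resource>();
+  private listeners = new Map<string, Set<Listener>>();
+
+  // A hook like useResource subscribes on mount, and unsubscribes on cleanup.
+  subscribe(subject: string, listener: Listener): () => void {
+    const set = this.listeners.get(subject) ?? new Set<Listener>();
+    set.add(listener);
+    this.listeners.set(subject, set);
+    return () => set.delete(listener);
+  }
+
+  // Called when new data arrives, e.g. a Commit pushed over a websocket.
+  setResource(subject: string, resource: Resource): void {
+    this.resources.set(subject, resource);
+    this.listeners.get(subject)?.forEach((l) => l(resource));
+  }
+
+  getResource(subject: string): Resource | undefined {
+    return this.resources.get(subject);
+  }
+}
+
+// Usage: a subscriber sees updates as soon as they arrive.
+const store = new MiniStore();
+const subject = "https://example.com/documents/1";
+store.subscribe(subject, (r) => console.log("updated:", r));
+store.setResource(subject, { name: "First draft" });
+```
+
+In the React hooks, the unsubscribe function returned by `subscribe` is what a `useEffect` cleanup would call, so components stop listening when they unmount.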
diff --git a/src/usecases/science.md b/src/usecases/science.md
index 574c876..9eb9c00 100644
--- a/src/usecases/science.md
+++ b/src/usecases/science.md
@@ -9,4 +9,4 @@
- Scientific publications are a slow moving field
- Publications tend to favor PDF, which is hard to make machine readable
- Extending a syntax like LaTeX might provide a short path to referring to atomic data in publications
-- Getting scientists to host the atomic data is going to be one of the most difficult aspects
+- Getting scientists to host the atomic data is going to be one of the most difficult aspects
diff --git a/src/usecases/self-integrating-applications.md b/src/usecases/self-integrating-applications.md
new file mode 100644
index 0000000..78f35ec
--- /dev/null
+++ b/src/usecases/self-integrating-applications.md
@@ -0,0 +1,12 @@
+# Self-integrating applications with Atomic Data
+
+Our digital workspaces are increasingly dependent on integrations.
+Receive ticket updates in your group chat app, send an e-mail when a new lead is added, save uploaded files to a specific folder in your cloud storage.
+Many SaaS platforms offer these types of integrations as simple click-to-enable features.
+However, writing integrations as a developer pretty much always involves manual labor.
+Integrations are costly, as developers will need to read API specifications, interpret them correctly,
+map the properties, implement the integration and then add tests to make sure it doesn't break.
+
+Tools like the OpenAPI specification make it easier to render
+
+We call an application _self-integrating_ if there is no labor involved in linking one app to another.
diff --git a/src/usecases/self-integrating-systems.md b/src/usecases/self-integrating-systems.md
new file mode 100644
index 0000000..6d7d7aa
--- /dev/null
+++ b/src/usecases/self-integrating-systems.md
@@ -0,0 +1 @@
+See https://www.nitrd.gov/nitrdgroups/images/b/ba/Steven_ray_the_future_of_software.pdf
diff --git a/src/usecases/semantic-web.md b/src/usecases/semantic-web.md
new file mode 100644
index 0000000..2c97a2e
--- /dev/null
+++ b/src/usecases/semantic-web.md
@@ -0,0 +1,92 @@
+# Atomic Data for the Semantic Web
+
+The term 'Semantic Web' was popularized in [a paper of the same name](https://www-sop.inria.fr/acacia/cours/essi2006/Scientific%20American_%20Feature%20Article_%20The%20Semantic%20Web_%20May%202001.pdf) published in 2001 by three people, including the inventor of the World Wide Web: Tim Berners-Lee.
+In this paper, a vision was shared for how a higher degree of standardization on the internet could lead to a bunch of interesting innovations.
+For example, it describes how an appointment for a doctor is scheduled automatically by a "semantic agent", by checking the location of the person, comparing that to doctors in the area, getting reviews and checking the availability in the calendar.
+By making the web machine-readable, we could get far more interoperability and therefore new applications that make our lives easier.
+All of this would be made possible by using linked data.
+
+It has been 20 years since this paper, and it is indeed easier than ever to make an appointment with a professional.
+I can yell "hairdresser" at my phone, and I instantly see the nearest one with a high rating and a 'book now' button that checks our calendars.
+So... we made it?
+Unfortunately, this problem and many similar ones have not been solved by the semantic web: they have been solved by big companies that know everything about us, and have monopolized so much of the data on the internet.
+Tech giants like Google and Microsoft have created ecosystems that integrate many types of (free) services, have huge databases containing all kinds of related stuff, and as a result, provide us with nice user experiences.
+A high degree of _centralization_, instead of _standardization_, turned out to be a sufficient solution, too.
+But of course, this centralized approach comes at a serious cost.
+The first problem is that we get _vendor lock-in_, which means that it becomes harder to switch from service to service.
+We can't take our data from WhatsApp to Telegram, for example, or our Twitter feed to Mastodon.
+The second problem is that our usage goals do not align with those of the tech giants.
+We might want to see a list of recent activity from our friends when we open Facebook, but Facebook's investors might want us to simply look at as many ads as possible.
+
+But of course, the internet isn't just tech giants - there are a lot of enthusiasts that really want to see the decentralized, semantic web succeed.
+
+The Semantic Web wasn't just an idea and a paper - there were a lot of standards involved, all of which were properly defined and managed by the W3C, the organization that standardizes so much of our web.
+But the adoption of most of these standards is pretty low, unfortunately.
+
+## Why the semantic web didn't take off
+
+Before we discuss why Semantic Web standards (most importantly its core data model, RDF) aren't being used much, you should know that I have a company called Ontola, which specializes in semantic web technologies.
+We love this vision of a semantic web, and are strongly dedicated to making it a reality.
+We've built many libraries, full stack apps and services on RDF, and I really do think we've built technically unique products.
+By going through this process, we discovered how technologically hard it is to actually build semantic web apps.
+I'm actually pretty sure that we're one of the very few companies that have built a full SaaS platform (the e-democracy platform [Argu.co](https://argu.co/)) that communicates exclusively with its back-end by using RDF.
+You can read more about this journey in [full-stack linked data](https://ontola.io/blog/full-stack-linked-data/), but here I'll summarize why this was such a challenging and costly endeavor.
+
+### Standards without working implementations
+
+The Semantic Web community actually built
+
+### A lack of proper RDF tools
+
+A lack
+
+### No business incentive to make data highly accessible
+
+If you're a software company that builds a product, you probably want people to keep using your product.
+Investing in an awesome export feature where your customer can easily switch to a competitor is often a risky move.
+This problem is of course not unique to the semantic web, but it is
+
+### Quirks in the RDF data model
+
+- No native support for arrays, which leads to a lot of confusion. I've written an [article comparing various approaches](https://ontola.io/blog/ordered-data-in-rdf/) on how to deal with this as an RDF developer.
+- Subject-predicate combinations in RDF are not necessarily unique (contrary to key-value combinations in any Map or JSON object, for example), which makes querying and storing RDF hard.
+- Named Graphs add another layer of complexity for identifying where data comes from, and make querying and storing RDF even harder.
+
+### Too much academic focus on reasoning, not enough on data models
+
+> Instead of the “let’s just build something that works” attitude that made the Web (and the Internet) such a roaring success, they brought the formalizing mindset of mathematicians and the institutional structures of academics and defense contractors.
They formed committees to form working groups to write drafts of ontologies that carefully listed (in 100-page Word documents) all possible things in the universe and the various properties they could have, and they spent hours in Talmudic debates over whether a washing machine was a kitchen appliance or a household cleaning device
+
+- https://en.wikisource.org/wiki/Page:Aaron_Swartz_s_A_Programmable_Web_An_Unfinished_Work.pdf/15
+
+### No schema language
+
+Being able to _check and validate_ the types of data is very useful when you want people to reach consensus on how to model things.
+RDF Schema was not really a schema language.
+
+### Confusing terminology and documentation
+
+While learning the Semantic Web, you need to learn a whole bunch of new concepts.
+ Terms like
+
+### Too many new languages and serialization formats
+
+The Semantic Web and RDF are both older than JSON, and focused mostly on XML.
+The first RDF serialization format (RDF/XML) was hard to read, hard to parse, very confusing and basically tried to combine the worst of graph-based and document-based data models.
+After that, many new serialization formats appeared (N3, Turtle, JSON-LD) that made it even more confusing for developers to adopt this technology.
+[Read this](https://ontola.io/blog/rdf-serialization-formats/) if you want to know more about RDF serialization formats.
+
+### Other reading
+
+- http://inamidst.com/whits/2008/ditching
+- https://en.wikisource.org/wiki/Page:Aaron_Swartz_s_A_Programmable_Web_An_Unfinished_Work.pdf/15
+- https://twobithistory.org/2018/05/27/semantic-web.html
+
+## Why Atomic Data might give the Semantic Web a second chance
+
+When creating Atomic Data, I tried to learn from what went wrong with the Semantic Web.
+
+- Focus on developer experience from the start.
+- **Minimize new serialization formats / languages**. Use things that people love.
That's why Atomic Data uses JSON as its core serialization format, and keeps export support for all RDF formats.
+- **Build applications, libraries and tools while writing the spec**. As a process, this means that every time the spec might result in a bad developer experience, I can update the specification.
+- Have a schema language built in, include it in reference libraries. This results in all data being fully type safe.
+- Have Subject-predicate / key-value uniqueness.
diff --git a/src/usecases/standardization-bodies.md b/src/usecases/standardization-bodies.md
new file mode 100644
index 0000000..feb99ea
--- /dev/null
+++ b/src/usecases/standardization-bodies.md
@@ -0,0 +1,5 @@
+{{#title Atomic Data for Standardization bodies}}
+# Atomic Data for Standardization bodies
+
+Most industry domains work on standardization.
+ (health, pharmaceutical studies
diff --git a/src/usecases/surveys.md b/src/usecases/surveys.md
index e77596d..fb215be 100644
--- a/src/usecases/surveys.md
+++ b/src/usecases/surveys.md
@@ -1,3 +1,4 @@
+{{#title Atomic Data for Surveys}}
# Atomic Data for Surveys

Surveys and Questionnaires haven't been evolving that much over the past few years.
@@ -49,6 +50,6 @@
The user only sees invitations that are highly relevant, without sharing _any_ i

The Atomic Data specification solves at least part of this problem.
[Paths](../core/paths.md) are used to describe the queries that researchers make.
-[Atomic Server](https://github.com/joepio/atomic/blob/master/server/README.md) can be used as the personal online data store.
+[AtomicServer](https://github.com/atomicdata-dev/atomic-server/blob/master/server/README.md) can be used as the personal online data store.
However, we still need to specify the process of sending a request to an individual (probably by introducing an [inbox](https://github.com/ontola/atomic-data/issues/28))
diff --git a/src/usecases/verifiable-credentials.md b/src/usecases/verifiable-credentials.md
new file mode 100644
index 0000000..79c58e1
--- /dev/null
+++ b/src/usecases/verifiable-credentials.md
@@ -0,0 +1,42 @@
+{{#title Atomic Data and Verifiable Credentials / SSI}}
+# Atomic Data and Verifiable Credentials / SSI
+
+## What are Verifiable Credentials / Self-Sovereign Identity
+
+Verifiable Credentials are pieces of information that carry cryptographic proof from some reliable third party.
+For example, you could have a credential that proves your degree, signed by your educational institution.
+These credentials can enable privacy-friendly transactions where a credential owner can prove being part of some group, without needing to actually identify themselves.
+For example, you could prove that you're over 18 by showing a credential issued by your government, without actually having to show your ID card with your birthdate.
+Verifiable Credentials are still not that widely used, but various projects exist that have had moderate success in implementing them.
+
+## What makes Atomic Data suitable for this
+
+Firstly, [Atomic Commits](../commits/intro.md) are already verifiable using signatures that contain all the needed information.
+Secondly, [Atomic Schema](../schema/intro.md) can be used for standardizing Credential Schemas.
+
+## Every Atomic Commit is a Verifiable Credential
+
+Every time an Agent updates a Resource, an [Atomic Commit](../commits/intro.md) is made.
+This Commit is cryptographically signed by an Agent, just like how Verifiable Credentials are signed.
+In essence, this means that _all atomic data created through commits is fully verifiable_.
+
+How could this verification work?
+
+- **Find the Commit** that has created / edited the value that you want to verify.
This can be made easier with a specialized Endpoint that takes a `resource`, `property` and `signer` and returns the associated Commit(s). +- **Check the signer of the Commit**. Is that an Agent that you trust? +- **Verify the signature** of the Commit using the public key of the Agent. + +Sometimes, credentials need to be revoked. +How could revocation work? + +- **Find the Commit** (see above) +- **Get the signer** (see above) +- **Find the `/isRevoked` Endpoint of that signer**, send a Request there to make sure the linked Commit is still valid and not revoked. + +([Discussion](https://github.com/ontola/atomic-data-docs/issues/22)) + +## Use Atomic Schema for standardizing Credentials + +If you are a Verifier who wants to check someone's _birthdate_, you'll probably expect a certain datatype in return, such as a [date](https://atomicdata.dev/datatypes/date) that is formatted in some specific way. +[Atomic Schema](../schema/intro.md) makes it possible to express which _properties_ are [required](https://atomicdata.dev/properties/requires) in a certain [Class](https://atomicdata.dev/classes/Class), and it also makes it possible to describe which [datatype](https://atomicdata.dev/classes/Datatype) is linked to a specific [Property](https://atomicdata.dev/classes/Property). +Combined, they allow for fine-grained descriptions of models / classes / schemas. 
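+The sign-and-verify flow above can be sketched in a few lines. The field names and the JSON serialization here are illustrative; real Atomic Commits define their own deterministic serialization and signing rules:
+
+```typescript
+import { generateKeyPairSync, sign, verify } from "node:crypto";
+
+// An Agent's keypair (Atomic Data uses Ed25519 signatures).
+const { publicKey, privateKey } = generateKeyPairSync("ed25519");
+
+// The Agent signs a serialized Commit (illustrative fields, not the real spec).
+const commitBody = JSON.stringify({
+  subject: "https://example.com/agents/alice",
+  property: "https://atomicdata.dev/properties/birthDate",
+  value: "1990-01-01",
+});
+const signature = sign(null, Buffer.from(commitBody), privateKey);
+
+// A Verifier who trusts this Agent checks the signature with the public key.
+const valid = verify(null, Buffer.from(commitBody), publicKey, signature);
+// Any tampering with the signed data invalidates the credential.
+const tampered = verify(null, Buffer.from(commitBody + "x"), publicKey, signature);
+
+console.log(valid, tampered); // true false
+```
+
+Revocation is the part that signatures alone cannot express, which is why the `/isRevoked` check above still requires asking the signer.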
diff --git a/src/usecases/vocabulary-management.md b/src/usecases/vocabulary-management.md
new file mode 100644
index 0000000..30125a7
--- /dev/null
+++ b/src/usecases/vocabulary-management.md
@@ -0,0 +1,4 @@
+{{#title Atomic Data for Vocabulary & Taxonomy management}}
+# Atomic Data for Vocabulary & Taxonomy management
+
+Describing abstract concepts, data models, and terms in a consistent way helps organisations
diff --git a/src/usecases/zettelkasten.md b/src/usecases/zettelkasten.md
new file mode 100644
index 0000000..8d1b18d
--- /dev/null
+++ b/src/usecases/zettelkasten.md
@@ -0,0 +1,2 @@
+{{#title Atomic Data for Zettelkasten}}
+# Atomic Data for Zettelkasten
diff --git a/src/websockets.md b/src/websockets.md
new file mode 100644
index 0000000..6248697
--- /dev/null
+++ b/src/websockets.md
@@ -0,0 +1,34 @@
+{{#title Atomic Data Websockets - live synchronization}}
+# WebSockets in Atomic Data
+
+WebSockets are a very fast and efficient way to have a client and server communicate in an asynchronous fashion.
+They are used in Atomic Data to allow real-time updates, which makes it possible to create things like collaborative applications and multiplayer games.
+These have been implemented in `atomic-server` and `atomic-data-browser` (powered by `@tomic/lib`).
+
+## Initializing a WebSocket connection
+
+Send an HTTP `GET` request to the `/ws` endpoint of an `atomic-server`. The Server should upgrade that request to a secure WebSocket (`wss`) connection.
+Use `x-atomic` [authentication headers (read more here)](./authentication.md) and use `ws` as a subject when signing.
+The `WebSocket-Protocol` is `AtomicData`.
+
+## Client to server messages
+
+- `SUBSCRIBE ${subject}` tells the Server that you'd like to receive Commits about this Subject.
+- `UNSUBSCRIBE ${subject}` tells the Server that you'd like to stop receiving Commits about this Subject.
+- `GET ${subject}` fetches an individual resource.
+- `AUTHENTICATE ${authenticationResource}` to set a user session for this websocket and allow authorized messages. The `authenticationResource` is a JSON-AD resource containing the signature and more, see [Authentication](../src/authentication.md).
+
+## Server to client messages
+
+- `COMMIT ${CommitBody}` an entire [Commit](../src/commits/concepts.md) for a resource that you're subscribed to.
+- `RESOURCE ${Resource}` a JSON-AD Resource as a response to a `GET` message. If there is something wrong with this request (e.g. 404), return an `Error` Resource with the requested subject, similar to how the HTTP server does this.
+- `ERROR ${ErrorBody}` an Error resource is sent whenever something goes wrong. The `ErrorBody` is a plaintext, typically English description of what went wrong.
+
+## Considerations
+
+- For many messages, there is no response to give if things are processed correctly. If a message is unknown or there is a different problem, return an `ERROR`.
+
+## Example implementations
+
+- [Example client implementation in Typescript (@tomic/lib)](https://github.com/atomicdata-dev/atomic-data-browser/blob/main/lib/src/websockets.ts)
+- [Example server implementation in Rust using Actix-Web](https://github.com/atomicdata-dev/atomic-server/blob/master/server/src/handlers/web_sockets.rs)
diff --git a/src/when-to-use.md b/src/when-to-use.md
index f52b24c..59e8410 100644
--- a/src/when-to-use.md
+++ b/src/when-to-use.md
@@ -1,17 +1,17 @@
+{{#title When (not) to use Atomic Data}}
# When (not) to use Atomic Data

## When should you use Atomic Data

- **Flexible schemas**. When dealing with structured wikis or semantic data, various instances of things will have different attributes. Atomic Data allows _any_ kind of property on _any_ resource.
-- **High-value open data**. Atomic Data is a bit harder to create than plain JSON, for example, but it is easier to re-use and understand. It's use of URLs for properties makes data self-documenting.
+- **Open data**. Atomic Data is a bit harder to create than plain JSON, for example, but it is easier to re-use and understand. Its use of URLs for properties makes data self-documenting.
- **High interoperability requirements**. When multiple groups of people have to use the same schema, Atomic Data provides easy ways to constrain and validate the data and ensure type safety.
-- **Multi-class / multi-model**. Contrary to (SQL) tables, Atomic Data allows a single thing to have multiple classes, each with their own properties.
- **Connected / decentralized data**. With Atomic Data, you use URLs to point to things on other computers. This makes it possible to connect datasets very explicitly, without creating copies. Very useful for decentralized social networks, for example.
-- **Audibility & Versioning**. Using Atomic Commits, we can store all changes to data as transactions that can be replayed. This creates a complete audit log and history.
+- **Auditability & Versioning**. Using Atomic Commits, we can store all changes to data as transactions that can be replayed. This creates a complete audit log and history.
- **JSON or RDF as Output**. Atomic Data serializes to idiomatic, clean JSON as well as various RDF formats (Turtle / JSON-LD / n-triples / RDF/XML).

## When not to use Atomic Data

- **Internal use only**. If you're not sharing structured data, Atomic Data will probably only make things harder for you.
- **Big Data**. If you're dealing with TeraBytes of data, you probably don't want to use Atomic Data. The added cost of schema validation and the lack of distributed / large scale persistence tooling makes it not the right choice.
-- **Video / Audio / 3D**. These should have unique, optimized binary representations and have very strict, static schemas. The advantages of linke data do little to improve this, unless it's just for metadata.
+- **Video / Audio / 3D**. 
These should have unique, optimized binary representations and have very strict, static schemas. The advantages of atomic / linked data do little to improve this, unless it's just for metadata. diff --git a/theme/index.hbs b/theme/index.hbs index f9d3c5a..8e893c9 100644 --- a/theme/index.hbs +++ b/theme/index.hbs @@ -222,24 +222,20 @@ {{/if}} - {{#if google_analytics}} + - {{/if}} {{#if playground_line_numbers}} {{/if}} {{/if}} + {{!-- Discord widget "Crate" by WidgetBot --}} +