feat(data): Datastore Docs (aws-amplify#9753)
david-mcafee authored Apr 11, 2022
1 parent 0c9c401 commit 4eb824f
Showing 14 changed files with 476 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
Expand Up @@ -128,3 +128,7 @@ If you can't migrate to [aws-sdk-js-v3](https://github.com/aws/aws-sdk-js-v3) or
```js
import { Auth } from 'aws-amplify';
```

### DataStore Docs

For more information on how DataStore works and how to contribute to it, see the [DataStore Docs](packages/datastore/README.md).
154 changes: 154 additions & 0 deletions packages/datastore/README.md
@@ -0,0 +1,154 @@
# AWS Amplify DataStore Docs

[Amplify DataStore](https://docs.amplify.aws/lib/datastore/getting-started/q/platform/js/) provides a programming model for leveraging shared and distributed data without writing additional code for offline and online scenarios, which makes working with distributed, cross-user data just as simple as working with local-only data.

---

| package | version | open issues | closed issues |
| ---------------------- | --------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| @aws-amplify/datastore | ![npm](https://img.shields.io/npm/v/@aws-amplify/datastore.svg) | [![Open Issues](https://img.shields.io/github/issues/aws-amplify/amplify-js/DataStore?color=red)](https://github.com/aws-amplify/amplify-js/issues?q=is%3Aissue+label%3ADataStore+is%3Aopen) | [![Closed Issues](https://img.shields.io/github/issues-closed/aws-amplify/amplify-js/DataStore)](https://github.com/aws-amplify/amplify-js/issues?q=is%3Aissue+label%3ADataStore+is%3Aclosed) |

---

## **👋 Note For Contributors: 👋**

_**Please update these docs any time you find something that is incorrect or lacking. In particular, if a line in the docs prompts a question, take a moment to figure out the answer, then update the docs with the necessary detail.**_

---

## Getting Started

Before you start reading through these docs, take a moment to understand [how DataStore works at a high level](https://docs.amplify.aws/lib/datastore/how-it-works/q/platform/js/). Additionally, we recommend first reading through [docs.amplify.aws](https://docs.amplify.aws/lib/datastore/getting-started/q/platform/js/). The purpose of these docs is to dive deep into the codebase itself and understand the inner workings of DataStore for the purpose of contributing. Understanding these docs is **not** necessary for using DataStore. Lastly, before reading, take a look at [the diagrams below](#diagrams).

---

## Docs

- [Conflict Resolution](docs/conflict-resolution.md)
- [Contributing](docs/contributing.md)
- [DataStore Lifecycle Events ("Start", "Stop", "Clear")](docs/datastore-lifecycle-events.md)
  - This explains how DataStore fundamentally works, and is a great place to start.
- [Getting Started](docs/getting-started.md) (Running against a sample app, etc.)
- [Namespaces](docs/namespaces.md)
- [How DataStore uses Observables](docs/observables.md)
- [Schema Changes](docs/schema-changes.md)
- [Storage](docs/storage.md)
- [Sync Engine](docs/sync-engine.md)
- ["Unsupported hacks" / workarounds](docs/workarounds.md)

---

# Diagrams

_Note: relationships with dotted lines are explained more in a separate diagram._

## How the DataStore API and Storage Engine Interact

```mermaid
flowchart TD
%% API and Storage
api[[DS API]]-- observe -->storage{Storage Engine}
storage-- next -->adapter[[Adapter]]
adapter-->db[[Local DB]]
db-->api
sync[[Sync Engine*]]-.-storage
sync-.-appSync[(AppSync)]
```

## How the Sync Engine Observes Changes in Storage and AppSync

_Note: All green nodes belong to the Sync Engine._

\* Merger first checks outbox

\*\* Outbox sends outgoing messages to AppSync

```mermaid
flowchart TD
subgraph SyncEngine
index{index.ts}-- observe -->reach[Core reachability]
subgraph processors
mp[Mutation Processor]
sp[Subscription Processor]
syp[Sync Processor]
end
reach--next-->mp[Mutation Processor]
reach--next-->sp[Subscription Processor]
reach--next-->syp[Sync Processor]
subgraph outbox / merger
outbox[Outbox]
merger[Merger]
outbox---merger
end
end
api[DS API]-.->storage
mp-- 1. observe -->storage{Storage Engine}
storage-- 2. next -->merger[merger*]-- next -->storage
sp-- observe -->appsync[(AppSync)]
appsync-- next -->sp
syp---appsync
mp-->outbox[outbox**]
appsync<--->outbox
%% styling
classDef syncEngineClass fill:#8FB,stroke:#333,stroke-width:4px,color:#333;
class index,mp,sp,syp,merger,outbox syncEngineClass;
```

---

# Project Structure

<pre>
amplify-js/packages/datastore/src
├── authModeStrategies
│   └── defaultAuthStrategy.ts
│   └── index.ts
│   └── multiAuthStrategy.ts
├── datastore
│   └── datastore.ts # Entry point for DataStore
├── predicates
│   └── index.ts
│   └── sort.ts
├── ssr
├── storage # Storage Engine
│   └── adapter # Platform-specific Storage Adapters
│       └── getDefaultAdapter
│       └── AsyncStorageAdapter.ts
│       └── AsyncStorageDatabase.ts
│       └── index.ts
│       └── IndexedDBAdapter.ts
│       └── InMemoryStore.native.ts
│       └── InMemoryStore.ts
│   └── storage.ts # Entry point for Storage
├── sync # Sync Engine
│   └── dataStoreReachability
│       └── index.native.ts
│       └── index.ts
│   └── processors # Sync Engine Processors
│       └── mutation.ts
│       └── subscription.ts
│       └── sync.ts
│   └── datastoreConnectivity.ts # Subscribe to reachability monitor
│   └── index.ts # Entry point for Sync Engine
│   └── merger.ts # <a href="https://github.com/aws-amplify/amplify-js/blob/datastore-docs/packages/datastore/docs/sync-engine.md#merger" title="merger doc">doc</a>
│   └── outbox.ts # <a href="https://github.com/aws-amplify/amplify-js/blob/datastore-docs/packages/datastore/docs/sync-engine.md#outbox" title="outbox doc">doc</a>
</pre>

---

## Other Resources:

- [High-level overview of how DataStore works](https://docs.amplify.aws/lib/datastore/how-it-works/q/platform/js/)
- [DataStore Docs](https://docs.amplify.aws/lib/datastore/getting-started/q/platform/js/)
- [re:Invent talk](https://www.youtube.com/watch?v=KcYl6_We0EU)
8 changes: 8 additions & 0 deletions packages/datastore/docs/conflict-resolution.md
@@ -0,0 +1,8 @@
# Conflict Resolution
- **AppSync is the source of truth for conflict resolution**
1. In the event AppSync fails to resolve a conflict, the network response will contain an error message (`conflict unhandled`). This is how we give customers the chance to make an update, or try again.
2. We use jittered retry (10x).
- TODO: add more detail / links to how this retry logic occurs.
- We err on the side of not deleting customer data when performing conflict resolution.
- Auto-merge is the default resolution strategy. This relies on the version, and will attempt to merge fields that changed when possible.
- For more, see [the AppSync docs](https://docs.aws.amazon.com/appsync/latest/devguide/conflict-detection-and-sync.html)
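
The jittered retry mentioned above can be illustrated with a minimal sketch. This is not DataStore's actual retry code: the helper names (`jitteredDelayMs`, `retryWithJitter`), the base and maximum delays, and the "full jitter" strategy are all assumptions chosen for illustration. Only the limit of 10 attempts comes from the note above.

```typescript
// Hypothetical helper (not part of DataStore): exponential backoff with "full jitter".
// baseMs and maxMs are illustrative values, not DataStore's real settings.
function jitteredDelayMs(
  attempt: number,
  baseMs = 100,
  maxMs = 5000,
  random: () => number = Math.random
): number {
  // The ceiling doubles on every attempt, capped at maxMs.
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // Pick a random delay in [0, ceiling) so that clients do not retry in lockstep.
  return Math.floor(random() * ceiling);
}

// Hypothetical helper: runs `operation` up to `maxAttempts` times,
// sleeping for a jittered delay between failed attempts.
async function retryWithJitter<T>(
  operation: () => Promise<T>,
  maxAttempts = 10,
  sleep: (ms: number) => Promise<void> = ms =>
    new Promise<void>(resolve => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // No need to wait after the final attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(jitteredDelayMs(attempt));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

Randomizing the delay spreads retries from many clients over time, which matters when several devices hit the same conflict at once.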
17 changes: 17 additions & 0 deletions packages/datastore/docs/contributing.md
@@ -0,0 +1,17 @@
# Contributing

## Formatting
- We use Prettier to format our code. We recommend installing it as an IDE extension (for example, the [VS Code Extension](https://marketplace.visualstudio.com/items?itemName=esbenp.prettier-vscode)) rather than running the Prettier CLI directly, so that you don't accidentally reformat code within other Amplify packages.

## Testing DataStore changes locally
- On first build:
  - Within **amplify-js**: `yarn && yarn build && yarn link-all && yarn build:esm:watch`
  - Within sample app: `yarn && yarn link aws-amplify && yarn link @aws-amplify/datastore && yarn start`
- On subsequent builds (useful if something isn't working):
  - Within **amplify-js**: `yarn clean && yarn build && yarn link-all && yarn build:esm:watch`
  - Within sample app: `rm -rf node_modules && yarn && yarn link aws-amplify && yarn link @aws-amplify/datastore && yarn start`

## Contributing to these docs
- Do not link from these docs to specific lines of code, as line numbers change frequently. Instead, do the opposite: link from the code to the relevant documentation, as the docs are less likely to change.
- Prefer small, self-contained sections over large, monolithic documents.
- Do not use permalinks - instead, link to the most current files.
88 changes: 88 additions & 0 deletions packages/datastore/docs/datastore-lifecycle-events.md
@@ -0,0 +1,88 @@
# DataStore Lifecycle Events ("Start", "Stop", "Clear")

# DataStore Initialization ("Start")

**Understanding how DataStore starts is critical to understanding how DataStore fundamentally works.** At a high level, starting DataStore does the following things in order for each model:

> 1. Init Schema
> 2. Init the Storage Engine
> 3. Migrate schema changes
> 4. Sync Engine Operations
> 5. Empty the Outbox / process the mutation queue
> 6. Begin processing the subscription buffer
> 7. DataStore is now in "ready" state
- _We can eagerly start DataStore by calling `DataStore.start`. Otherwise, invoking a method (query, save, delete, observe) will start it up_
- _When importing a model, DS consumes `schema.js`, and creates the IndexedDB store._
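
The difference between eager and lazy start described above can be sketched as follows. `LazyStore` is a hypothetical stand-in written for this example, not the real DataStore class. It only shows the pattern: an explicit `start()` and the first data operation both funnel into the same one-time initialization.

```typescript
// Hypothetical stand-in for DataStore's start-up behaviour (not the real class).
class LazyStore {
  private startPromise?: Promise<void>;
  private records: string[] = [];
  startCount = 0;

  // Eager start: safe to call repeatedly, initialization only runs once.
  start(): Promise<void> {
    if (!this.startPromise) {
      this.startPromise = this.initialize();
    }
    return this.startPromise;
  }

  private async initialize(): Promise<void> {
    this.startCount++;
    // In the real lifecycle, this is where the schema, storage engine,
    // migrations, and sync engine would be set up (steps 1-7 above).
  }

  // Lazy start: every data operation awaits start() before doing any work.
  async save(record: string): Promise<void> {
    await this.start();
    this.records.push(record);
  }

  async query(): Promise<string[]> {
    await this.start();
    return [...this.records];
  }
}
```

Caching the promise, rather than a boolean flag, means concurrent callers all wait on the same in-flight initialization.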

## **How it works:**

1. ### **Init schema**
- **1.1** First we call `initSchema` [here](packages/datastore/src/datastore/datastore.ts)
- **1.2** Codegen then generates `schema.js` from the schema
- **1.3** DataStore consumes `schema.js`

2. ### **Init the storage engine**
- **2.1** The adapter is initialized
- **2.2** The local database gets created if it doesn’t exist already
- **2.3** The adapter [has a `setUp`](packages/datastore/src/storage/adapter/IndexedDBAdapter.ts#L82) method that then calls the database's `init` method
- **2.4** Relations are established
- _Nothing happens until the first interaction with DataStore_

3. ### **Migrate schema changes (if needed)**
- See [schema-changes.md](./schema-changes.md)
- If the user has updated the schema, we perform the migration here

4. ### **Sync Engine Operations**
- #### 4.1 Instantiate Sync Engine (`this.sync = new SyncEngine(`)
- The Sync Engine is only instantiated if there is a GraphQL endpoint (meaning we’ve already provisioned the backend). Otherwise, DataStore is in local-only mode. See [datastore.ts](packages/datastore/src/datastore/datastore.ts#L735)
- **Note: at this step, we do not yet process the buffer**
- There are three subscriptions per model: `create`, `update`, and `delete`

- #### 4.2 Sync Engine is started (`syncSubscription = this.sync.start(`)
- **4.2.1 Subscribe to the Sync Engine**
- Messages from this subscription are emitted as Hub events for DataStore
- When ready, we call `initResolve`
- If unauthorized, DS keeps working, as we may have publicly readable models
- If a validation error occurs, DataStore Sync breaks entirely
- Without subscriptions, DataStore doesn’t work (when there is an endpoint present)
- Subscriptions are the only component that update the local store from remote
- If updates come in, they’re buffered to be processed after sync is complete
- Prepares the sync predicates (similar to adapter setup)

- **4.2.2 Subscribe to DataStore connectivity observable (notifications about network status)**
- `this.datastoreConnectivity.status().subscribe`
- Subscribe to the Amplify Core component that monitors network reachability
- We do this here because we need the ability to stop or start the sync process. When offline, we disconnect the websocket and stop syncing. Once online, we reconnect the websocket and start base / delta syncing
- Sync engine subscribes to the Storage Engine
- Every write may need to get translated to a mutation in the outbox
- Storage engine is local source of truth for DataStore, all other pieces are observing

- **4.2.3 Run the Sync Queries (when online)**
- _Note: We perform a topological sort of the data: we sync the children first, so that when we query the parent, the children are already present. We also use an optimisation to parallelize this process where possible (i.e. for non-dependent models)._
- Sync queries are GraphQL queries that are necessary to hydrate the local store initially
- The first time we run the app, we run a sync query that performs a scan of DynamoDB, returning up to 10k records per table. This populates the local store. With selective sync, we perform a query instead of a scan against DynamoDB.
- Subsequent changes after the initial sync query come in through subscriptions
- There are two mechanisms:
- Base sync - retrieve all records up to total sync value
- Delta sync - one table per model, one delta sync table per DS store
- AppSync makes the final decision regarding which sync (base vs delta) to perform
- The client sends the last sync param with the sync query, service then compares the diff
- There is a TTL on all delta sync table records
- To find the TTL within the AppSync Console, see "Update Data Source"

5. ### **Empty the Outbox / process the mutation queue**
- Example: when performing mutations offline, records are added to the queue. Once there is connectivity, we start sending these **ONE BY ONE**.
- **Note: No batch API is exposed to consumers**
- Mutation events have ids
- Syncs get applied before mutations are sent
6. ### **Begin processing the subscription buffer**
- If we receive subscription messages any time in the process of initializing subscriptions, performing sync queries, or processing the mutation queue, we buffer the subscription messages until everything else is completed. Once we have completed processing the mutation queue, we then process the subscription buffer
7. ### **DataStore is now in "ready" state**
- For additional reference, and how the above are published as Hub events, see [the docs](https://docs.amplify.aws/lib/datastore/datastore-events/q/platform/js/)
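
The ordering of steps 5 to 7 can be sketched in a few lines. `StartupSequencer`, `OutboxMutation`, and `SubscriptionMessage` are hypothetical names invented for this example, and the real Sync Engine is considerably more involved. The sketch only shows the ordering described above: mutations are sent one at a time, subscription messages received in the meantime are buffered, and the buffer is drained before the "ready" state is reached.

```typescript
// Hypothetical types for illustration only.
type OutboxMutation = { id: string; model: string };
type SubscriptionMessage = { model: string; version: number };

class StartupSequencer {
  private readonly buffer: SubscriptionMessage[] = [];
  private ready = false;
  readonly log: string[] = [];

  // Messages that arrive before "ready" are buffered instead of applied.
  onSubscriptionMessage(message: SubscriptionMessage): void {
    if (this.ready) {
      this.apply(message);
    } else {
      this.buffer.push(message);
    }
  }

  // Steps 5-7: drain the outbox one mutation at a time, then the buffer.
  async finishStartup(
    outbox: OutboxMutation[],
    send: (mutation: OutboxMutation) => Promise<void>
  ): Promise<void> {
    let next = outbox[0];
    while (next !== undefined) {
      await send(next); // one request per mutation, no batching
      outbox.shift(); // dequeue only after the send succeeded
      this.log.push(`sent:${next.id}`);
      next = outbox[0];
    }
    // The drain is synchronous, so no new message can slip in between.
    for (const message of this.buffer.splice(0)) {
      this.apply(message);
    }
    this.ready = true;
    this.log.push("ready");
  }

  private apply(message: SubscriptionMessage): void {
    this.log.push(`applied:${message.model}@${message.version}`);
  }
}
```

Leaving a mutation at the head of the queue until its request succeeds means a failed send can be retried without losing the mutation or reordering the queue.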

## Stop
- Stops the DataStore sync process. This will close the real-time subscription connection when your app is no longer interested in updates. You will typically call DataStore.stop() just before your application is closed. You can also force your DataStore sync expressions to be re-evaluated at runtime by calling stop(), followed by start()

## Clear
- Clears local data from DataStore. DataStore will now require a full sync (not a delta sync) to populate the local store with data
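
To see why clearing forces a full sync, consider how a client might choose the "last sync" value it sends with each sync query. The sketch below is an assumption-laden illustration, not the Sync Engine's actual code. The field names mirror the `sync_ModelMetadata` record shown in the namespaces doc, but the function name `lastSyncParam` and the convention that `0` requests a full (base) sync are hypothetical.

```typescript
// Field names follow the sync_ModelMetadata record; everything else is assumed.
interface ModelMetadata {
  model: string;
  fullSyncInterval: number; // milliseconds between forced full syncs
  lastFullSync?: number; // epoch ms of the last base sync
  lastSync?: number; // epoch ms of the last sync of any kind
}

// Hypothetical helper: returns the timestamp to send with the sync query.
// Assumption: 0 means "no previous sync", i.e. request a full (base) sync.
function lastSyncParam(metadata: ModelMetadata | undefined, now: number): number {
  // After a clear there is no metadata left, so a full sync is required.
  if (
    metadata === undefined ||
    metadata.lastFullSync === undefined ||
    metadata.lastSync === undefined
  ) {
    return 0;
  }
  // Once the full sync interval has elapsed, fall back to a full sync.
  const nextFullSync = metadata.lastFullSync + metadata.fullSyncInterval;
  return nextFullSync < now ? 0 : metadata.lastSync;
}
```

The service still makes the final decision between base and delta sync. In this sketch the client's only influence is the timestamp it sends.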
26 changes: 26 additions & 0 deletions packages/datastore/docs/getting-started.md
@@ -0,0 +1,26 @@
# Onboarding with DataStore

## Understand the primary DataStore events
- Read the [DataStore lifecycle events doc](docs/datastore-lifecycle-events.md)

## Building a sample app with DataStore to understand how it works
1. Build a basic DataStore sample app (no @auth, just 1-2 models).
2. Build a similar sample app with the API category.
3. Add @auth rules to the DataStore app:
1. `amplify update api`
2. Modify schema according to auth rules docs (https://docs.amplify.aws/lib/datastore/setup-auth-rules/q/platform/js).
3. `amplify push`
4. `amplify codegen models`
4. Add [selective sync](https://docs.amplify.aws/lib/datastore/sync/q/platform/js#selectively-syncing-a-subset-of-your-data).
5. Enable [real-time changes](https://docs.amplify.aws/lib/datastore/real-time/q/platform/js).
6. While interacting with your app, examine the IndexedDB tables (Application > IndexedDB within Chrome dev tools):
1. Check out the different stores in IDB that get created for your schema. Note the internal stores prefixed with sync_, and the stores corresponding to your models prefixed with user_.
2. Familiarize yourself with how actions taken in the UI affect the data stored in IDB. This may be easier to do while throttling the network connection. You'll be able to see how outgoing mutations first get persisted into the corresponding store, then added to the mutation queue / outbox (sync_MutationEvent), and then updated in the store with data from AppSync.
7. Turn on DEBUG logging (`Amplify.Logger.LOG_LEVEL = "DEBUG";`) at the root of your project, and inspect the logs in the console while using your app. Additionally, [enable hub events](https://docs.amplify.aws/lib/datastore/datastore-events/q/platform/js#usage) for DataStore.
8. The best way to understand DataStore events is to place several debuggers or breakpoints throughout DataStore.
- With logging / Hub events enabled, you can see what operations DataStore is performing (i.e. start, sync, etc.) as you step through with the debugger.
9. Testing offline scenarios / concurrent user sessions is a useful way to test the full functionality of DataStore, and to fully understand how the sync process actually works.
10. Next steps:
- Create a React Native example (uses a different storage type)
- Try more complex schema types
- Observe changes in records within DynamoDB (for instance, soft deletion).
52 changes: 52 additions & 0 deletions packages/datastore/docs/namespaces.md
@@ -0,0 +1,52 @@
# Namespaces
- `datastore`
- Settings
- `user`
- Models that came from user schema
- `sync`
- Metadata (last time ran query, etc.)
- `storage`
- Deprecated
# Local database examples:

- *Note: Anything prepended with `sync_` is an internal table.*

- ## datastore_Setting
- Used for schema versioning
- See the [schema changes doc](docs/schema-changes.md)
```
{
id: "01FYABF3DMBZZJ46W1CC214NH2"
key: "schemaVersion"
value: "\"4401034582a70c60713e1f7f9da3b752\""
}
```
- ## sync_ModelMetadata
- Sync Engine metadata
- Includes information about the last time we synced a model
```
{
fullSyncInterval: 86400000
id: "01FYABF3DMBZZJ46W1CC214NH3"
lastFullSync: 1647467532307
lastSync: 1647467532307
lastSyncPredicate: null
model: "Todo"
namespace: "user"
}
```
- ## sync_MutationEvent
- ## user_[Model Name]
- The actual records themselves.
```
{
createdAt: "2022-03-16T21:52:07.718Z"
description: null
id: "6f69055b-b081-4225-8fc4-1d6d52732660"
name: "name 1647467527489"
updatedAt: "2022-03-16T21:52:07.718Z"
_deleted: null
_lastChangedAt: 1647467527754
_version: 1
}
```
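
The store names in the examples above follow a `<namespace>_<model name>` pattern. The sketch below makes that convention explicit. The helper names `storeName` and `isInternalStore` are hypothetical and only illustrate the naming scheme described in this doc.

```typescript
// The four namespaces listed at the top of this doc.
type StoreNamespace = "datastore" | "user" | "sync" | "storage";

// Hypothetical helper: builds a local store name such as "user_Todo".
function storeName(namespace: StoreNamespace, modelName: string): string {
  return `${namespace}_${modelName}`;
}

// Hypothetical helper: per the note above, stores prefixed with "sync_" are internal.
function isInternalStore(name: string): boolean {
  return name.indexOf("sync_") === 0;
}
```

For example, `storeName("sync", "MutationEvent")` yields `sync_MutationEvent`, which `isInternalStore` reports as internal, while `storeName("user", "Todo")` yields `user_Todo`, which it does not.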
11 changes: 11 additions & 0 deletions packages/datastore/docs/observables.md
@@ -0,0 +1,11 @@
# How DataStore uses Observables
- Internally, all of DataStore uses event-driven methods (observables) to handle everything from the sync process to observing online connectivity. **This makes the Storage Engine the single source of truth for DataStore.**
- Examples:
- The Sync Engine observes DataStore Connectivity
- The Sync Engine observes the Storage Engine
- The client observes DataStore with `observe` and `observeQuery`:
- https://docs.amplify.aws/lib/datastore/real-time/q/platform/js/
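
As a rough illustration of the first example (the Sync Engine observing DataStore Connectivity), here is a self-contained sketch. It deliberately does not use `zen-observable`: `SimpleSubject` is a minimal hand-rolled stand-in, and `SyncEngineSketch` and `ConnectionStatus` are hypothetical names. It only shows the observe / next relationship, not the real implementation.

```typescript
type Teardown = { unsubscribe(): void };

// Minimal stand-in for an observable source that can push values to observers.
class SimpleSubject<T> {
  private observers: Array<(value: T) => void> = [];

  subscribe(observer: (value: T) => void): Teardown {
    this.observers.push(observer);
    return {
      unsubscribe: () => {
        this.observers = this.observers.filter(o => o !== observer);
      },
    };
  }

  next(value: T): void {
    this.observers.forEach(observer => observer(value));
  }
}

type ConnectionStatus = { online: boolean };

// Hypothetical sketch: starts or stops syncing as connectivity changes.
class SyncEngineSketch {
  syncing = false;
  readonly transitions: string[] = [];

  observe(connectivity: SimpleSubject<ConnectionStatus>): Teardown {
    return connectivity.subscribe(({ online }) => {
      if (online && !this.syncing) {
        this.syncing = true;
        this.transitions.push("start sync");
      } else if (!online && this.syncing) {
        this.syncing = false;
        this.transitions.push("stop sync");
      }
    });
  }
}
```

The connectivity source knows nothing about the Sync Engine. It only calls `next`, and whoever has subscribed reacts, which is the same decoupling that lets the Storage Engine act as the single source of truth.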

## Understanding Observables
- DataStore uses [`zen-observable`](https://github.com/zenparsing/zen-observable)
- [The RxJS docs](https://rxjs.dev/guide/observable) do a good job of describing observables in more detail.