first commit

cloudant-labs · Mar 27, 2020 · commit ce978ce
Showing 157 changed files with 1,576 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1 @@
+.DS_Store
diff --git a/Part 01 - What is Cloudant.md b/Part 01 - What is Cloudant.md
@@ -0,0 +1,61 @@
+![](slides/Slide0.png)
+
+Welcome to the Introduction to Cloudant course, an eighteen-part video series that gives you an overview of the IBM Cloudant database-as-a-service.
+
+![](slides/Slide1.png)
+
+
+---
+
+This is part 1: "What is Cloudant?"
+
+![](slides/Slide4.png)
+
+--- 
+
+Cloudant is a database, run as a service in the IBM Cloud. Its job is to store your application's data securely and allow you to retrieve it quickly and efficiently. Cloudant's key features are that it is:
+
+- a database - it stores and retrieves data.
+- more specifically, a JSON document store. JSON comes from JavaScript and represents simple objects in a universal file format. The "document" is the unit of storage in Cloudant. Documents are added, updated and deleted in their entirety.
+- accessible through an HTTP API. Any Cloudant operation can be achieved using HTTP. HTTP is the protocol that powers the World Wide Web, and Cloudant is a database built for the web.
+
+Most databases are hidden in a private network, accessible only to a handful of machines, but the Cloudant service sits (mainly) on the public internet, where it can be accessed by anyone with an internet connection (and permission to do so!).
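To make "any Cloudant operation can be achieved using HTTP" concrete, here is a minimal sketch that builds (but does not send) a request to read one document, using the CouchDB-style `GET /{database}/{document id}` path. The host name, database name and document id are made-up placeholders, and authentication is left out entirely.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical account URL and database name, used only for illustration.
BASE_URL = "https://example-account.cloudant.example"
DATABASE = "people"


def build_get_document_request(doc_id: str) -> Request:
    """Build (but do not send) an HTTP GET request for a single document."""
    # Percent-encode both path segments so characters such as "@" are URL-safe.
    url = f"{BASE_URL}/{quote(DATABASE, safe='')}/{quote(doc_id, safe='')}"
    return Request(url, method="GET", headers={"Accept": "application/json"})


request = build_get_document_request("liz@example.com")
print(request.get_method(), request.full_url)
```

Because the request is plain HTTP, the same call could be made from a browser, a command-line tool or a mobile app.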
+
+![](slides/Slide5.png)
+
+---
+
+Cloudant wasn't written entirely by IBM. It is based on Apache CouchDB, an open-source project run by the Apache Software Foundation. Cloudant employs a number of CouchDB contributors but, by the rules of Apache, they cannot monopolise its development.
+
+Much of what you see in this course is as applicable to Apache CouchDB as it is to Cloudant. Their APIs are 99% the same - I'll point out where they diverge.
+
+Cloudant can be thought of as CouchDB run "as-a-service". A Cloudant service is easily deployed and is managed by IBM engineers 24/7. There's no software to install, no servers to manage, no configuration to understand. The user need not be a CouchDB expert to use and manage it.
+
+Cloudant being built on truly open-source foundations means that you can be sure that your data layer is not _locked in_ to a particular platform, cloud or vendor. Cloudant can also be used in concert with CouchDB to create hybrid applications that share data through replication, as we'll see.
+
+![](slides/Slide6.png)
+
+---
+
+Later on in the course we'll look "under the hood" to see how Cloudant works, but initially we'll treat Cloudant as a "black box".
+
+![](slides/Slide7.png)
+
+---
+
+To summarise:
+
+- Cloudant is based on Apache CouchDB, an open-source project.
+- It stores JSON documents.
+- It is accessed with an HTTP API and can therefore be accessed by any device on the internet that speaks HTTP: application code, web browser, IoT device or mobile phone.
+- Cloudant is a highly available managed service, able to continue operating through multiple hardware failures.
+
+![](slides/Slide8.png)
+
+---
+
+That's the end of this part. The next part is called ["The Document"](Part\ 02\ -\ The\ Document.md)
+
+![](slides/Slide0.png)
+
+---
+
+
+
diff --git a/Part 02 - The Document.md b/Part 02 - The Document.md
@@ -0,0 +1,121 @@
+![](slides/Slide0.png)
+
+Welcome to the Introduction to Cloudant course, an eighteen-part video series that gives you an overview of the IBM Cloudant database-as-a-service.
+
+![](slides/Slide1.png)
+
+---
+
+This is part 2: "The Document"
+
+We've seen that Cloudant is a JSON document store. Let's find out what that means in practice and how that compares to other types of database.
+
+![](slides/Slide9.png)
+
+--- 
+
+Most databases store their data in collections called _tables_, where each unit of data is a _row_, each with identical, fixed columns. The schema of each table is predefined: a list of columns with their name, data type, value constraints and relations to other tables carefully defined. Each new record forms a row in a table.
+
+Cloudant is quite different!
+
+A Cloudant service has collections called _databases_ (instead of _tables_), each of which contains any number of documents.
+
+The example on this slide shows the same data expressed in a traditional tabular database and how the same data would be stored in Cloudant as JSON documents. 
+
+So if you come from a relational database background: tables are "databases" in Cloudant, and rows are "documents".
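As a rough illustration of the table-to-document mapping, the sketch below turns rows of a tabular dataset into JSON-style documents. The column names and values are invented for the example and are not taken from the slide.

```python
import json

# Invented tabular data: one list of column names, one tuple per row.
columns = ["id", "name", "joined"]
rows = [
    (1, "Liz", "2018-10-30"),
    (2, "Bob", "2019-01-15"),
]


def row_to_document(column_names, row):
    """Pair each column name with the row's value to form one document."""
    return dict(zip(column_names, row))


documents = [row_to_document(columns, row) for row in rows]
print(json.dumps(documents[0], indent=2))
```

Each resulting dictionary carries its own field names, which is what lets documents in the same database differ in shape.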
+
+![](slides/Slide10.png)
+
+---
+
+A Cloudant document must be a JSON object, starting and ending with curly braces and containing a number of key/value attributes.
+
+JSON objects must be less than 1 megabyte in size and can contain any number of strings, numbers, booleans, arrays and objects. The nesting of objects within objects can continue to any depth.
+
+The keys used can be as brief or verbose as you like.
+
+Here are some simple example documents showing how each data type is used.
+
+- the first example shows a person object, storing strings, booleans and an array of tags.
+- the second example uses very brief attribute names, to save on storage, and represents a web event such as a click on a website.
+- the last example shows how the document may itself contain sub-objects.
+
+A note on dates: JSON has no native date type, so dates are usually stored in a format such as 2018-10-30 - we will come back to dates later.
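The rules above can be sketched as a small client-side check. This is only an illustration: the document contents are invented, and reading "1 megabyte" as 1,000,000 bytes of serialised JSON is an assumption, not a statement of how Cloudant measures size.

```python
import json
from datetime import date

# Assumption: "1 megabyte" is interpreted here as 1,000,000 bytes of JSON text.
MAX_DOCUMENT_BYTES = 1_000_000


def check_document(doc) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]
    problems = []
    size = len(json.dumps(doc).encode("utf-8"))
    if size >= MAX_DOCUMENT_BYTES:
        problems.append(f"document is {size} bytes, which is too large")
    return problems


# Invented example using every JSON data type mentioned above.
person = {
    "name": "Liz",                              # string
    "age": 42,                                  # number
    "verified": True,                           # boolean
    "tags": ["admin", "beta"],                  # array
    "address": {"city": "London"},              # nested object
    "joined": date(2018, 10, 30).isoformat(),   # date stored as a string
}

print(check_document(person), person["joined"])
```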
+
+![](slides/Slide11.png)
+
+---
+
+Now for your first practical exercise. Visit www.ibm.com/cloud and register an account with the IBM Cloud, if you don't have one already.
+
+Once registered, you may click on "services", search for the "Cloudant" database and provision a new service.
+
+The Cloudant "Lite" service provides a free plan to allow users to try Cloudant in a limited capacity while in development. Its bigger brother, the "Standard Plan", is a paid-for service where you specify the number of reads, writes and queries per second your application needs, and that capacity is reserved for you. You pay for the capacity you provision and your data storage usage.
+
+The Lite plan operates in a similar way but has only a small provisioned capacity and a fixed storage size. It is fine for "kicking the tyres".
+
+![](slides/Slide12.png)
+
+---
+
+Cloudant is often referred to as a "schemaless" database - but we have to be careful how we define that term.
+
+It's true to say that there's no need to define your schema (field names, types, constraints and relationships) ahead of time in a Cloudant database - you may simply write a JSON document of your own design to a database.
+
+This flexibility is well liked by developers because they can design their data in their code, turn it into JSON and write it to the database. 
+
+It's still important to think about the "shape of your data", especially in terms of how you are going to query and index it, as we'll see later. 
+
+Data design is still required, but strictly speaking the database doesn't need to know about your schema.
+
+![](slides/Slide13.png)
+
+---
+
+Let's say we want to create a database of US presidents. We can simply devise our "model" of the data in our app, turn it into JSON and write it to the database. In this case we are using a common CouchDB convention: the "type" field indicating the data type of the document. 
+
+![](slides/Slide14.png)
+
+---
+
+If at a future date we decide we want to add additional data to our "schema", we can simply write a new object to the database with no complaints from Cloudant. We could decide to add the "address" object only to:
+
+- documents that are created from now on, or
+- documents that we know addresses for.
+
+In other words, documents of the same type can have fields present or missing.
+
+Your database's schema can evolve over time to match your application's needs and you don't (necessarily) need to tell the database about the schema change - just write new documents in the new format.
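A minimal sketch of what "fields present or missing" means for application code: the reader has to tolerate documents written before and after the change. The field names inside "address" are assumptions chosen for the example.

```python
# Two documents of the same "type": one written before the "address" object
# was introduced and one written after. Field names are illustrative only.
presidents = [
    {"type": "president", "name": "George Washington"},
    {
        "type": "president",
        "name": "John Adams",
        "address": {"state": "Massachusetts"},
    },
]


def state_of(doc):
    """Return the state from the optional address object, or None if absent."""
    return doc.get("address", {}).get("state")


for doc in presidents:
    print(doc["name"], "->", state_of(doc))
```

The database accepts both shapes; it is the application that decides how to treat a missing field.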
+
+![](slides/Slide15.png)
+
+---
+
+
+We can even store multiple document "types" in the same database. In this case, people/books/places reside in the same database. We know which is which because of the "type" field (this is a convention and not something that means anything to Cloudant).
+
+An alternative to this is to have three databases (people/books/places) and keep each data type in its own database. Both approaches are fine: you would choose to have multiple types together in the same database if you need to perform queries _across types_ or if you need to replicate all data types together; otherwise the _separate databases_ approach may be better.
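To show that the "type" field is purely an application-level convention, this sketch separates a mixed set of documents in client code. The documents are invented, and a real application would more likely rely on the querying and indexing features covered later in the course rather than filtering in memory.

```python
from collections import defaultdict

# Invented documents of three different "types" living side by side.
docs = [
    {"type": "person", "name": "Liz"},
    {"type": "book", "title": "Emma"},
    {"type": "place", "name": "Bath"},
    {"type": "person", "name": "Bob"},
]

# Group documents by the value of their "type" field.
by_type = defaultdict(list)
for doc in docs:
    by_type[doc["type"]].append(doc)

print({doc_type: len(group) for doc_type, group in by_type.items()})
```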
+
+![](slides/Slide16.png)
+
+---
+
+Although Cloudant is "schemaless", this doesn't absolve you of the need to do detailed data design to get the best performance.
+
+Here are some tips, especially relevant if you have some relational database experience.
+
+- Avoid thinking in joins - a Cloudant document should contain everything you need to know about that object, so that it can be retrieved in one API call.
+- Normalisation goes out of the window in a JSON store: some repeated values can be tolerated if they make data retrieval more efficient.
+- Although we have a 1MB limit on document size, your documents should be much smaller than that - a few KB is typical.
+- If your application can embrace a "write only" design pattern, where data is only ever added to a database, then it may make your life easier. You should definitely avoid patterns that rely on updating the same document over and over in a small time window.
+
+
+![](slides/Slide17.png)
+
+---
+
+That's the end of this part. The next part is called ["The Document _id"](Part\ 03\ -\ The\ Document\ _id.md)
+
+![](slides/Slide0.png)
+
+---
diff --git a/Part 03 - The Document _id.md b/Part 03 - The Document _id.md
@@ -0,0 +1,66 @@
+![](slides/Slide0.png)
+
+Welcome to the Introduction to Cloudant course, an eighteen-part video series that gives you an overview of the IBM Cloudant database-as-a-service.
+
+![](slides/Slide1.png)
+
+---
+
+This is part 3: "The Document _id"
+
+We've seen how data is stored in Cloudant documents with flexibility on how your application stores JSON objects in Cloudant databases. There are, however, a few hard and fast rules.
+
+![](slides/Slide18.png)
+
+--- 
+
+One rule is that every document must have a unique identifier called `_id`, which is a string. No two documents in the same database can have the same `_id` field. In other databases, you specify which column is the unique identifier, but in Cloudant it's always `_id` and can't be changed.
+
+![](slides/Slide19.png)
+
+---
+
+Also, unlike relational databases, Cloudant does not have "auto-incrementing ids", that is, an id field that starts at 1 and increments for each document added.
+
+Cloudant's `_id` field is either:
+
+- a 32-character string generated by Cloudant - a sequence of numbers and letters that is guaranteed to be unique and has no meaning other than that it uniquely identifies a document in a database
+- a string defined by you (if you know something unique about your data)
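For a feel of what a generated identifier looks like, the sketch below produces a 32-character string of numbers and letters in application code. Using a random UUID is an assumption made for illustration; it is not a claim about how Cloudant generates its own ids.

```python
import uuid


def new_document_id() -> str:
    """Return 32 hexadecimal characters taken from a random (version 4) UUID."""
    return uuid.uuid4().hex


example_id = new_document_id()
print(example_id, len(example_id))
```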
+
+![](slides/Slide20.png)
+
+---
+
+Here are some examples of supplying your own document `_id`:
+
+- Using it to store something that you know is unique, e.g. the email address of a user. Your registration mechanism can enforce a one-user-per-email-address policy.
+- Some users choose to encode the document type in the `_id`, e.g. user:56, book:55.
+- The last example shows using a 32-digit string (generated in your app) that is designed to sort in approximate date/time order, making it easy to retrieve the latest documents from the database without a secondary index.
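The second and third styles of `_id` can be sketched as follows. Both helpers are hypothetical: the layout of the time-sortable id (12 hexadecimal digits of millisecond timestamp followed by 20 random hexadecimal digits) is an assumption picked to reach 32 characters, not the scheme shown on the slide.

```python
import secrets
import time


def typed_id(doc_type: str, key) -> str:
    """Encode the document type in the id, e.g. 'user:56'."""
    return f"{doc_type}:{key}"


def time_sortable_id(now_ms=None) -> str:
    """Return a 32-character id whose prefix grows with the clock.

    The first 12 hex digits are the zero-padded millisecond timestamp, so ids
    sort in approximate creation order; the last 20 hex digits are random to
    keep ids created in the same millisecond distinct.
    """
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    return f"{now_ms:012x}" + secrets.token_hex(10)


earlier = time_sortable_id(1_000)
later = time_sortable_id(2_000)
print(typed_id("user", 56), earlier < later)
```

The ordering is only approximate because two ids created in the same millisecond are ordered by their random suffix.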
+
+![](slides/Slide21.png)
+
+---
+
+Cloudant takes your document `_id`s and stores them in an index (like the contents page of a book). This primary index is in `_id` order and is used to allow Cloudant to retrieve documents by `_id` - thus behaving like a key/value store.
+
+By careful design of your `_id` field, you can make use of the primary index to keep data that belongs together next to each other in the index, which makes it quicker to retrieve that data. We've already seen that using time-sortable `_id`s means that data can be retrieved in date/time order.
+
+We'll see this later when it comes to retrieving ranges of document ids.
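The idea of a sorted primary index can be modelled with a sorted list and a binary search. This is a toy model only: Python's plain string ordering stands in for the index's sort order, which is an assumption, and the ids are invented.

```python
from bisect import bisect_left

# Invented ids; sorting them models an index kept in _id order.
index = sorted(["book:55", "user:12", "user:56", "book:7", "place:3"])


def ids_in_range(sorted_ids, start, end):
    """Return every id with start <= id < end using two binary searches."""
    low = bisect_left(sorted_ids, start)
    high = bisect_left(sorted_ids, end)
    return sorted_ids[low:high]


# ";" is the character after ":" in ASCII, so this range covers every id
# that begins with "user:".
users = ids_in_range(index, "user:", "user;")
print(users)
```

Because ids that share a prefix sit next to each other, one contiguous slice of the index answers the question without scanning every document.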
+
+![](slides/Slide22.png)
+
+---
+
+In conclusion, each document must have an `_id` field that is unique in the database. It can be auto-generated by Cloudant, or can be supplied by your application, which must then take responsibility for the uniqueness of the data.
+
+The `_id` field is the basis of the database's primary index which, as we'll see, can be used for key/value and range lookups.
+
+![](slides/Slide23.png)
+
+---
+
+That's the end of this part. The next part is called ["The rev token"](Part\ 04\ -\ The\ rev\ token.md)
+
+![](slides/Slide0.png)
+
+---
diff --git a/Part 04 - The rev token.md b/Part 04 - The rev token.md
@@ -0,0 +1,89 @@
+![](slides/Slide0.png)
+
+Welcome to the Introduction to Cloudant course, an eighteen-part video series that gives you an overview of the IBM Cloudant database-as-a-service.
+
+![](slides/Slide1.png)
+
+---
+
+This is part 4: "The rev token"
+
+The second fundamental Cloudant rule is that each document revision is given its own unique revision token. Let's find out what that means.
+
+
+![](slides/Slide24.png)
+
+---
+
+You never need to generate a revision token - one is created for you when you add/update/delete a document using the API.
+
+A revision token consists of two parts:
+
+- a number: 1, 2, 3, etc.
+- a cryptographic hash of the document's body
+
+(For the uninitiated, a hash is a digital "fingerprint" of some data. If the data changes, the fingerprint changes. In practice no two fingerprints are the same, i.e. two documents with different content will not have the same hash.)
+
+You can see from the example on the right that our document has a revision token (the key starting `_rev`) that starts with a "1" followed by a dash. That tells us that this is the first revision of the document. The digits starting 04aa8... are the cryptographic hash of the document.
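The two-part structure of a revision token can be sketched like this. The token is built locally for the demonstration; SHA-256 over sorted-key JSON is a stand-in fingerprint chosen for the example, not the hash that Cloudant actually computes.

```python
import hashlib
import json


def fingerprint(body: dict) -> str:
    """Illustrative fingerprint of a document body (not Cloudant's algorithm)."""
    canonical = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:32]


def parse_rev(rev: str):
    """Split a token of the form '<number>-<hash>' into its two parts."""
    number, _, digest = rev.partition("-")
    return int(number), digest


body = {"name": "Liz"}
rev = f"1-{fingerprint(body)}"  # a locally made token in the same shape
number, digest = parse_rev(rev)
print(number, digest)

# Changing the body changes the fingerprint.
changed = fingerprint({"name": "Elizabeth"})
print(changed != digest)
```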
+
+![](slides/Slide25.png)
+
+---
+
+If we follow the lifecycle of a document, it starts with a "revision 1". When it's modified later, it gets a "revision 2" and so on. With each incrementing revision number, the hash also changes because the content of the document is being modified too.
+
+One thing to note:
+
+> It is *possible* for a document to have more than one revision with the same number, i.e. two "revision 3s". This is called a "conflict" and is "normal" in some circumstances. We'll see why later in the course, but for now we can assume that the revision number will increment with each update to a document.
+
+![](slides/Slide26.png)
+
+--- 
+
+Let's follow the lifecycle of an example Cloudant document.
+
+When a new document is created (auto-generated `_id` or user-supplied `_id`), it is allocated a "revision 1". You will be sent the token in the response to your API request. Normally you can discard the token (UNLESS you intend to modify the document in the near future, as we'll see).
+
+![](slides/Slide27.png)
+
+--- 
+
+When we modify a document whose `_rev` is at "revision 1" (notice we've changed the name from Liz to Elizabeth), the document is saved and a "revision 2" token is generated and returned to you in the API response.
+
+All simple enough so far.
+
+![](slides/Slide28.png)
+
+--- 
+
+If we delete the document later, a "revision 3" is created!
+
+Unlike almost any other database, Cloudant keeps a reference for deleted documents. A deletion is just another document revision - a special one where `_deleted: true` replaces the document body.
+
+In fact, the document's recent revision history is kept (the tree of revisions - remember we could have more than one of each revision number).
+
+Note:
+
+> You can't use Cloudant's revision tree as a version control system to retrieve or "roll back" to an old revision. Once a revision is superseded, the document _body_ of the older revision is deleted and its disk space recovered in a process called "compaction". Compaction occurs automatically in Cloudant, so it's not safe to assume that old revisions will be available to be retrieved.
+
+![](slides/Slide29.png)
+
+--- 
+
+To summarise:
+
+- revision tokens are generated by the database on add/edit/delete. (you never need to create your own revision tokens).
+- generally, the revision number increases by one each time, but more complex scenarios are possible (we'll cover this later).
+- older document bodies are discarded or _compacted_ (don't rely on being able to get them back).
+- all Cloudant operations that change a document need the document's `_id` AND its `_rev` (this is unlike most databases).
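The lifecycle described in this part (create, update, delete, each producing a new revision) can be modelled with a small in-memory toy. Everything here is an assumption made for illustration: the class, its method names and the hashing are not Cloudant's implementation, and the toy ignores conflicts and compaction.

```python
import hashlib
import json


class ToyStore:
    """In-memory toy model of revision handling; not Cloudant's implementation."""

    def __init__(self):
        self.docs = {}  # maps _id to the latest stored revision of the document

    def _next_rev(self, old_rev, body):
        # Revision number goes up by one; the hash part is an illustrative digest.
        number = int(old_rev.split("-")[0]) + 1 if old_rev else 1
        encoded = json.dumps(body, sort_keys=True).encode("utf-8")
        return f"{number}-{hashlib.sha256(encoded).hexdigest()[:32]}"

    def write(self, doc):
        """Create or update a document; updates must quote the current _rev."""
        current = self.docs.get(doc["_id"])
        current_rev = current["_rev"] if current else None
        if doc.get("_rev") != current_rev:
            raise ValueError("conflict: _rev does not match the current revision")
        body = {key: value for key, value in doc.items() if key != "_rev"}
        new_rev = self._next_rev(current_rev, body)
        self.docs[doc["_id"]] = dict(body, _rev=new_rev)
        return new_rev

    def delete(self, doc_id, rev):
        """A deletion is just another revision whose body is _deleted: true."""
        return self.write({"_id": doc_id, "_rev": rev, "_deleted": True})


store = ToyStore()
rev1 = store.write({"_id": "liz", "name": "Liz"})
rev2 = store.write({"_id": "liz", "_rev": rev1, "name": "Elizabeth"})
rev3 = store.delete("liz", rev2)
print(rev1[:2], rev2[:2], rev3[:2])
```

Writing with a stale token, for example reusing `rev1` after the update, raises an error in this toy, which is why the token from the latest response is worth keeping if another change is planned.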
+
+![](slides/Slide30.png)
+
+--- 
+
+
+That's the end of this part. The next part is called ["Authentication"](Part\ 05\ -\ Authentication.md)
+
+![](slides/Slide0.png)
+
+---