Add fixed IDs to each notebook cell (stellargraph#1842)
nbformat recently released 5.1.0, which adds a randomly generated cell ID to
every cell: https://nbformat.readthedocs.io/en/latest/changelog.html#id2

> - Implemented CellIds from
>   [JEP-62](https://github.com/jupyter/enhancement-proposals/blob/master/62-cell-id/cell-id.md)

We therefore need to update our notebooks, or else our CI validation of
their formatting fails
(https://github.com/stellargraph/stellargraph/runs/1699039930?check_suite_focus=true). Using
the default mode doesn't work, because the IDs seem to be regenerated randomly on
every run (even after the notebook has been saved once), and thus the 'formatting'
changes each time `format_notebook.py` is run. This PR therefore implements a basic
hashing scheme to give each cell a fixed ID:

- hash the cell's source code with SHA256 and use the first 8 hex digits
  (e.g. `abcd1234`)
- add a counter if this is not the first cell with the given source code in the
  notebook (e.g. `abcd1234-1`)

This is idempotent: reformatting a notebook without changes will give the same
result. Thus, it can be easily used as a check on CI.
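
The scheme described above can be sketched roughly as follows. This is an
illustration only, not the code in `format_notebook.py`: the function name
`assign_fixed_ids` and the plain-dict cell representation are assumptions, and
the IDs visible in the `loading-numpy.ipynb` diff on this page are plain
sequential strings, so the merged implementation evidently differs.

```python
import hashlib
from collections import Counter


def assign_fixed_ids(cells):
    """Give each cell a deterministic ID derived from its source text.

    Hypothetical helper: `cells` is a list of dicts shaped like notebook
    cells, each with a "source" key.
    """
    seen = Counter()
    for cell in cells:
        source = cell["source"]
        # Notebook JSON stores source either as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        # First 8 hex digits of the SHA256 of the cell's source.
        prefix = hashlib.sha256(source.encode("utf-8")).hexdigest()[:8]
        # Append a counter for the second and later cells sharing a prefix.
        # Counting per prefix (rather than per source) also keeps IDs unique
        # in the unlikely case of a prefix collision.
        count = seen[prefix]
        seen[prefix] += 1
        cell["id"] = prefix if count == 0 else f"{prefix}-{count}"
    return cells
```

Because the ID depends only on the cell's source and its position among
cells with the same source, running the function twice on an unchanged
notebook produces the same IDs, which is the property the CI check relies on.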
huonw authored Jan 15, 2021
1 parent db0e2f7 commit 70f2611
Showing 57 changed files with 3,072 additions and 0 deletions.
52 changes: 52 additions & 0 deletions demos/basics/loading-networkx.ipynb

Large diffs are not rendered by default.

49 changes: 49 additions & 0 deletions demos/basics/loading-numpy.ipynb
@@ -2,6 +2,7 @@
"cells": [
{
"cell_type": "markdown",
"id": "0",
"metadata": {},
"source": [
"# Loading data into StellarGraph from NumPy\n",
@@ -11,6 +12,7 @@
},
{
"cell_type": "markdown",
"id": "1",
"metadata": {
"nbsphinx": "hidden",
"tags": [
@@ -23,6 +25,7 @@
},
{
"cell_type": "markdown",
"id": "2",
"metadata": {},
"source": [
"[The StellarGraph library](https://github.com/stellargraph/stellargraph) supports loading graph information from NumPy. [NumPy](https://www.numpy.org) is a library for working with data arrays.\n",
@@ -49,6 +52,7 @@
{
"cell_type": "code",
"execution_count": 1,
"id": "3",
"metadata": {
"nbsphinx": "hidden",
"tags": [
@@ -66,6 +70,7 @@
{
"cell_type": "code",
"execution_count": 2,
"id": "4",
"metadata": {
"nbsphinx": "hidden",
"tags": [
@@ -88,6 +93,7 @@
{
"cell_type": "code",
"execution_count": 3,
"id": "5",
"metadata": {},
"outputs": [],
"source": [
@@ -96,6 +102,7 @@
},
{
"cell_type": "markdown",
"id": "6",
"metadata": {},
"source": [
"## Loading via NumPy\n",
@@ -111,6 +118,7 @@
{
"cell_type": "code",
"execution_count": 4,
"id": "7",
"metadata": {},
"outputs": [],
"source": [
@@ -120,6 +128,7 @@
},
{
"cell_type": "markdown",
"id": "8",
"metadata": {},
"source": [
"## Sequential numeric graph structure\n",
@@ -139,6 +148,7 @@
{
"cell_type": "code",
"execution_count": 5,
"id": "9",
"metadata": {},
"outputs": [
{
@@ -219,6 +229,7 @@
},
{
"cell_type": "markdown",
"id": "10",
"metadata": {},
"source": [
"## Homogeneous graph with sequential IDs and feature vectors\n",
@@ -229,6 +240,7 @@
{
"cell_type": "code",
"execution_count": 6,
"id": "11",
"metadata": {},
"outputs": [
{
@@ -254,6 +266,7 @@
},
{
"cell_type": "markdown",
"id": "12",
"metadata": {},
"source": [
"Because our nodes have IDs `0`, `1`, ..., we can construct the `StellarGraph` by passing in the feature array directly, along with the edges:"
@@ -262,6 +275,7 @@
{
"cell_type": "code",
"execution_count": 7,
"id": "13",
"metadata": {},
"outputs": [],
"source": [
@@ -270,6 +284,7 @@
},
{
"cell_type": "markdown",
"id": "14",
"metadata": {},
"source": [
"The `info` method ([docs](https://stellargraph.readthedocs.io/en/stable/api.html#stellargraph.StellarGraph.info)) gives a high-level summary of a `StellarGraph`:"
@@ -278,6 +293,7 @@
{
"cell_type": "code",
"execution_count": 8,
"id": "15",
"metadata": {},
"outputs": [
{
@@ -305,6 +321,7 @@
},
{
"cell_type": "markdown",
"id": "16",
"metadata": {},
"source": [
"On this square, it tells us that there's 4 nodes of type `default` (a homogeneous graph still has node and edge types, but they default to `default`), with 2 features, and one type of edge that touches it. It also tells us that there's 5 edges of type `default` that go between nodes of type `default`. This matches what we expect: it's a graph with 4 nodes and 5 edges and one type of each.\n",
@@ -315,6 +332,7 @@
{
"cell_type": "code",
"execution_count": 9,
"id": "17",
"metadata": {},
"outputs": [
{
@@ -348,6 +366,7 @@
},
{
"cell_type": "markdown",
"id": "18",
"metadata": {},
"source": [
"## Non-sequential graph structure\n",
@@ -367,6 +386,7 @@
{
"cell_type": "code",
"execution_count": 10,
"id": "19",
"metadata": {},
"outputs": [
{
@@ -447,6 +467,7 @@
},
{
"cell_type": "markdown",
"id": "20",
"metadata": {},
"source": [
"## Homogeneous graph with non-numeric IDs and feature vectors\n",
@@ -457,6 +478,7 @@
{
"cell_type": "code",
"execution_count": 11,
"id": "21",
"metadata": {},
"outputs": [],
"source": [
@@ -466,6 +488,7 @@
{
"cell_type": "code",
"execution_count": 12,
"id": "22",
"metadata": {},
"outputs": [],
"source": [
@@ -475,6 +498,7 @@
{
"cell_type": "code",
"execution_count": 13,
"id": "23",
"metadata": {},
"outputs": [
{
@@ -505,13 +529,15 @@
},
{
"cell_type": "markdown",
"id": "24",
"metadata": {},
"source": [
"As before, there's 4 nodes, each with features of length 2."
]
},
{
"cell_type": "markdown",
"id": "25",
"metadata": {},
"source": [
"## Homogeneous graph with non-numeric IDs and feature tensors\n",
@@ -526,6 +552,7 @@
{
"cell_type": "code",
"execution_count": 14,
"id": "26",
"metadata": {},
"outputs": [
{
@@ -569,6 +596,7 @@
{
"cell_type": "code",
"execution_count": 15,
"id": "27",
"metadata": {},
"outputs": [],
"source": [
@@ -578,6 +606,7 @@
{
"cell_type": "code",
"execution_count": 16,
"id": "28",
"metadata": {},
"outputs": [
{
@@ -608,13 +637,15 @@
},
{
"cell_type": "markdown",
"id": "29",
"metadata": {},
"source": [
"We can see that the features of the `corner` nodes are now listed as a tensor, with shape 3 × 2, matching the array we created above."
]
},
{
"cell_type": "markdown",
"id": "30",
"metadata": {},
"source": [
"## Heterogeneous graphs\n",
@@ -644,6 +675,7 @@
{
"cell_type": "code",
"execution_count": 17,
"id": "31",
"metadata": {},
"outputs": [],
"source": [
@@ -653,6 +685,7 @@
{
"cell_type": "code",
"execution_count": 18,
"id": "32",
"metadata": {},
"outputs": [
{
@@ -676,6 +709,7 @@
{
"cell_type": "code",
"execution_count": 19,
"id": "33",
"metadata": {},
"outputs": [],
"source": [
@@ -684,6 +718,7 @@
},
{
"cell_type": "markdown",
"id": "34",
"metadata": {},
"source": [
"We have the information for the two node types `foo` and `bar` in separate DataFrames, so we can now put them in a dictionary to create a `StellarGraph`. Notice that `info()` is now reporting multiple node types, as well as information specific to each."
@@ -692,6 +727,7 @@
{
"cell_type": "code",
"execution_count": 20,
"id": "35",
"metadata": {},
"outputs": [
{
@@ -729,6 +765,7 @@
},
{
"cell_type": "markdown",
"id": "36",
"metadata": {},
"source": [
"Node IDs (the DataFrame index) needs to be unique across all types. For example, renaming the `a` corner to `b` like `square_foo_overlap` in the next cell, is not accepted and a `StellarGraph(...)` call will throw an error"
@@ -737,6 +774,7 @@
{
"cell_type": "code",
"execution_count": 21,
"id": "37",
"metadata": {},
"outputs": [],
"source": [
@@ -746,6 +784,7 @@
{
"cell_type": "code",
"execution_count": 22,
"id": "38",
"metadata": {},
"outputs": [],
"source": [
@@ -755,6 +794,7 @@
},
{
"cell_type": "markdown",
"id": "39",
"metadata": {},
"source": [
"If the node IDs aren't unique across types, one way to make them unique is to add a string prefix. You'll need to add the same prefix to the node IDs used in the edges too. Adding a prefix can be done by replacing the index:"
@@ -763,6 +803,7 @@
{
"cell_type": "code",
"execution_count": 23,
"id": "40",
"metadata": {},
"outputs": [],
"source": [
@@ -774,6 +815,7 @@
{
"cell_type": "code",
"execution_count": 24,
"id": "41",
"metadata": {},
"outputs": [],
"source": [
@@ -784,6 +826,7 @@
},
{
"cell_type": "markdown",
"id": "42",
"metadata": {},
"source": [
"### Feature tensors\n",
@@ -794,6 +837,7 @@
{
"cell_type": "code",
"execution_count": 25,
"id": "43",
"metadata": {},
"outputs": [
{
@@ -817,6 +861,7 @@
{
"cell_type": "code",
"execution_count": 26,
"id": "44",
"metadata": {},
"outputs": [],
"source": [
@@ -826,6 +871,7 @@
{
"cell_type": "code",
"execution_count": 27,
"id": "45",
"metadata": {},
"outputs": [
{
@@ -865,13 +911,15 @@
},
{
"cell_type": "markdown",
"id": "46",
"metadata": {},
"source": [
"We can now see that the `foo` node is listed as having a feature tensor, as desired."
]
},
{
"cell_type": "markdown",
"id": "47",
"metadata": {},
"source": [
"## Conclusion\n",
@@ -887,6 +935,7 @@
},
{
"cell_type": "markdown",
"id": "48",
"metadata": {
"nbsphinx": "hidden",
"tags": [