Add fixed IDs to each notebook cell (stellargraph#1842)

nbformat recently released 5.1.0, which includes randomly generated cell IDs in every cell: https://nbformat.readthedocs.io/en/latest/changelog.html#id2

> - Implemented CellIds from
>   [JEP-62](https://github.com/jupyter/enhancement-proposals/blob/master/62-cell-id/cell-id.md)

This requires us to update our notebooks, or else our CI validation of their formatting fails (https://github.com/stellargraph/stellargraph/runs/1699039930?check_suite_focus=true).

Using the default mode doesn't work, because the IDs seem to be randomly generated on every run (even after saving them once), and thus the 'formatting' changes each time `format_notebook.py` is run.

This PR implements a basic hashing scheme to give fixed IDs for a cell:

- hash the cell's source code with SHA256 and use the first 8 hex digits (e.g. `abcd1234`)
- add a counter if this is not the first cell with the given source code in the notebook (e.g. `abcd1234-1`)

This is idempotent: reformatting a notebook without changes will give the same result. Thus, it can easily be used as a check on CI.
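
The hashing scheme described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in `format_notebook.py`: the function name `fixed_cell_ids` is hypothetical, and it assumes each cell's source is already available as a single string.

```python
import hashlib
from collections import Counter


def fixed_cell_ids(sources):
    """Return one deterministic ID per cell source string (hypothetical helper)."""
    seen = Counter()  # how many times each 8-digit prefix has appeared so far
    ids = []
    for source in sources:
        # First 8 hex digits of the SHA256 of the cell's source code.
        prefix = hashlib.sha256(source.encode("utf-8")).hexdigest()[:8]
        count = seen[prefix]
        seen[prefix] += 1
        # The first occurrence keeps the bare prefix; repeats get "-1", "-2", ...
        ids.append(prefix if count == 0 else f"{prefix}-{count}")
    return ids
```

Because the IDs depend only on the cell sources and their order, calling the function twice on an unchanged notebook gives the same list, which is the property the CI check relies on.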
gokunwu · Jan 15, 2021 · 70f2611
1 parent db0e2f7
Showing 57 changed files with 3,072 additions and 0 deletions.
diff --git a/demos/basics/loading-networkx.ipynb b/demos/basics/loading-networkx.ipynb
diff --git a/demos/basics/loading-numpy.ipynb b/demos/basics/loading-numpy.ipynb
@@ -2,6 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
+   "id": "0",
    "metadata": {},
    "source": [
     "# Loading data into StellarGraph from NumPy\n",
@@ -11,6 +12,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "1",
    "metadata": {
     "nbsphinx": "hidden",
     "tags": [
@@ -23,6 +25,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "2",
    "metadata": {},
    "source": [
     "[The StellarGraph library](https://github.com/stellargraph/stellargraph) supports loading graph information from NumPy. [NumPy](https://www.numpy.org) is a library for working with data arrays.\n",
@@ -49,6 +52,7 @@
   {
    "cell_type": "code",
    "execution_count": 1,
+   "id": "3",
    "metadata": {
     "nbsphinx": "hidden",
     "tags": [
@@ -66,6 +70,7 @@
   {
    "cell_type": "code",
    "execution_count": 2,
+   "id": "4",
    "metadata": {
     "nbsphinx": "hidden",
     "tags": [
@@ -88,6 +93,7 @@
   {
    "cell_type": "code",
    "execution_count": 3,
+   "id": "5",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -96,6 +102,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "6",
    "metadata": {},
    "source": [
     "## Loading via NumPy\n",
@@ -111,6 +118,7 @@
   {
    "cell_type": "code",
    "execution_count": 4,
+   "id": "7",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -120,6 +128,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "8",
    "metadata": {},
    "source": [
     "## Sequential numeric graph structure\n",
@@ -139,6 +148,7 @@
   {
    "cell_type": "code",
    "execution_count": 5,
+   "id": "9",
    "metadata": {},
    "outputs": [
     {
@@ -219,6 +229,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "10",
    "metadata": {},
    "source": [
     "## Homogeneous graph with sequential IDs and feature vectors\n",
@@ -229,6 +240,7 @@
   {
    "cell_type": "code",
    "execution_count": 6,
+   "id": "11",
    "metadata": {},
    "outputs": [
     {
@@ -254,6 +266,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "12",
    "metadata": {},
    "source": [
     "Because our nodes have IDs `0`, `1`, ..., we can construct the `StellarGraph` by passing in the feature array directly, along with the edges:"
@@ -262,6 +275,7 @@
   {
    "cell_type": "code",
    "execution_count": 7,
+   "id": "13",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -270,6 +284,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "14",
    "metadata": {},
    "source": [
     "The `info` method ([docs](https://stellargraph.readthedocs.io/en/stable/api.html#stellargraph.StellarGraph.info)) gives a high-level summary of a `StellarGraph`:"
@@ -278,6 +293,7 @@
   {
    "cell_type": "code",
    "execution_count": 8,
+   "id": "15",
    "metadata": {},
    "outputs": [
     {
@@ -305,6 +321,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "16",
    "metadata": {},
    "source": [
     "On this square, it tells us that there's 4 nodes of type `default` (a homogeneous graph still has node and edge types, but they default to `default`), with 2 features, and one type of edge that touches it. It also tells us that there's 5 edges of type `default` that go between nodes of type `default`. This matches what we expect: it's a graph with 4 nodes and 5 edges and one type of each.\n",
@@ -315,6 +332,7 @@
   {
    "cell_type": "code",
    "execution_count": 9,
+   "id": "17",
    "metadata": {},
    "outputs": [
     {
@@ -348,6 +366,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "18",
    "metadata": {},
    "source": [
     "## Non-sequential graph structure\n",
@@ -367,6 +386,7 @@
   {
    "cell_type": "code",
    "execution_count": 10,
+   "id": "19",
    "metadata": {},
    "outputs": [
     {
@@ -447,6 +467,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "20",
    "metadata": {},
    "source": [
     "## Homogeneous graph with non-numeric IDs and feature vectors\n",
@@ -457,6 +478,7 @@
   {
    "cell_type": "code",
    "execution_count": 11,
+   "id": "21",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -466,6 +488,7 @@
   {
    "cell_type": "code",
    "execution_count": 12,
+   "id": "22",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -475,6 +498,7 @@
   {
    "cell_type": "code",
    "execution_count": 13,
+   "id": "23",
    "metadata": {},
    "outputs": [
     {
@@ -505,13 +529,15 @@
   },
   {
    "cell_type": "markdown",
+   "id": "24",
    "metadata": {},
    "source": [
     "As before, there's 4 nodes, each with features of length 2."
    ]
   },
   {
    "cell_type": "markdown",
+   "id": "25",
    "metadata": {},
    "source": [
     "## Homogeneous graph with non-numeric IDs and feature tensors\n",
@@ -526,6 +552,7 @@
   {
    "cell_type": "code",
    "execution_count": 14,
+   "id": "26",
    "metadata": {},
    "outputs": [
     {
@@ -569,6 +596,7 @@
   {
    "cell_type": "code",
    "execution_count": 15,
+   "id": "27",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -578,6 +606,7 @@
   {
    "cell_type": "code",
    "execution_count": 16,
+   "id": "28",
    "metadata": {},
    "outputs": [
     {
@@ -608,13 +637,15 @@
   },
   {
    "cell_type": "markdown",
+   "id": "29",
    "metadata": {},
    "source": [
     "We can see that the features of the `corner` nodes are now listed as a tensor, with shape 3 × 2, matching the array we created above."
    ]
   },
   {
    "cell_type": "markdown",
+   "id": "30",
    "metadata": {},
    "source": [
     "## Heterogeneous graphs\n",
@@ -644,6 +675,7 @@
   {
    "cell_type": "code",
    "execution_count": 17,
+   "id": "31",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -653,6 +685,7 @@
   {
    "cell_type": "code",
    "execution_count": 18,
+   "id": "32",
    "metadata": {},
    "outputs": [
     {
@@ -676,6 +709,7 @@
   {
    "cell_type": "code",
    "execution_count": 19,
+   "id": "33",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -684,6 +718,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "34",
    "metadata": {},
    "source": [
     "We have the information for the two node types `foo` and `bar` in separate DataFrames, so we can now put them in a dictionary to create a `StellarGraph`. Notice that `info()` is now reporting multiple node types, as well as information specific to each."
@@ -692,6 +727,7 @@
   {
    "cell_type": "code",
    "execution_count": 20,
+   "id": "35",
    "metadata": {},
    "outputs": [
     {
@@ -729,6 +765,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "36",
    "metadata": {},
    "source": [
     "Node IDs (the DataFrame index) needs to be unique across all types. For example, renaming the `a` corner to `b` like `square_foo_overlap` in the next cell, is not accepted and a `StellarGraph(...)` call will throw an error"
@@ -737,6 +774,7 @@
   {
    "cell_type": "code",
    "execution_count": 21,
+   "id": "37",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -746,6 +784,7 @@
   {
    "cell_type": "code",
    "execution_count": 22,
+   "id": "38",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -755,6 +794,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "39",
    "metadata": {},
    "source": [
     "If the node IDs aren't unique across types, one way to make them unique is to add a string prefix. You'll need to add the same prefix to the node IDs used in the edges too. Adding a prefix can be done by replacing the index:"
@@ -763,6 +803,7 @@
   {
    "cell_type": "code",
    "execution_count": 23,
+   "id": "40",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -774,6 +815,7 @@
   {
    "cell_type": "code",
    "execution_count": 24,
+   "id": "41",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -784,6 +826,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "42",
    "metadata": {},
    "source": [
     "### Feature tensors\n",
@@ -794,6 +837,7 @@
   {
    "cell_type": "code",
    "execution_count": 25,
+   "id": "43",
    "metadata": {},
    "outputs": [
     {
@@ -817,6 +861,7 @@
   {
    "cell_type": "code",
    "execution_count": 26,
+   "id": "44",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -826,6 +871,7 @@
   {
    "cell_type": "code",
    "execution_count": 27,
+   "id": "45",
    "metadata": {},
    "outputs": [
     {
@@ -865,13 +911,15 @@
   },
   {
    "cell_type": "markdown",
+   "id": "46",
    "metadata": {},
    "source": [
     "We can now see that the `foo` node is listed as having a feature tensor, as desired."
    ]
   },
   {
    "cell_type": "markdown",
+   "id": "47",
    "metadata": {},
    "source": [
     "## Conclusion\n",
@@ -887,6 +935,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "48",
    "metadata": {
     "nbsphinx": "hidden",
     "tags": [