Collectives notebook

Xilinx · Jul 20, 2022 · d3f4f3b · d3f4f3b
1 parent 43730a1
commit d3f4f3b
Show file tree

Hide file tree

Showing 2 changed files with 172 additions and 1 deletion.
diff --git a/src/pyaccl/notebooks/collectives.ipynb b/src/pyaccl/notebooks/collectives.ipynb
@@ -0,0 +1,172 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# ACCL Collectives (Emulator/Simulator)\n",
+    "In a system of more than one ACCL-enabled FPGAs, we can execute MPI-like collectives (scatter, gather, broadcast, reductions, etc). This notebook illustrates how to initialize the ACCL instances and run collectives. Usually, each ACCL instance runs in a separate process on a distinct compute node in a network, but for purposes of demonstration, we utilize multithreading in a single process to create and operate multiple ACCL instances"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Setting up the ACCL ranks descriptor\n",
+    "Each ACCL instance requires a dictionary describing the ranks involved in communication. This dictionary describes, for each rank, the IP address and port of the rank, the session ID, and the maximum size of buffers the rank can receive."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "START_PORT = 5500\n",
+    "WORLD_SIZE = 4\n",
+    "RXBUF_SIZE = 16*1024\n",
+    "\n",
+    "ranks = []\n",
+    "for i in range(WORLD_SIZE):\n",
+    "    ranks.append({\"ip\": \"127.0.0.1\", \"port\": START_PORT+WORLD_SIZE+i, \"session_id\":i, \"max_segment_size\": RXBUF_SIZE})"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Initializing ACCL emulator/simulator instances\n",
+    "We are now ready to initialize our ACCL instances. We assume that a simulator or emulator session has been started with the appropriate number of ranks (see ACCL documentation). Our ACCL instances will connect to the simulator or emulator."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pyaccl import accl\n",
+    "\n",
+    "accl_instances = []\n",
+    "for i in range(WORLD_SIZE):\n",
+    "    accl_instances.append(accl(ranks, i, bufsize=RXBUF_SIZE, protocol=\"TCP\",\n",
+    "                                sim_sock=\"tcp://localhost:\"+str(START_PORT+i)   ))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Creating ACCL buffers\n",
+    "With the ACCL instances ready, we can allocate buffers in each of the instances' memories. We allocate one source buffer and one result buffer, and paint the source with floating point data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "COUNT = 1000\n",
+    "\n",
+    "op0_buffers = []\n",
+    "op1_buffers = []\n",
+    "res_buffers = []\n",
+    "for i in range(WORLD_SIZE):    \n",
+    "    op0_buffers.append(accl_instances[i].allocate((COUNT,)))\n",
+    "    res_buffers.append(accl_instances[i].allocate((COUNT,)))\n",
+    "    op0_buffers[i][:] = [1.0*i for i in range(COUNT)]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Run an all-reduce collective\n",
+    "We are now ready to execute collectives. Since collectives require communication between the ACCL instances, we must start the collectives in each of the instances in parallel, utilizing threads. Each thread executes an all-reduce sum collective."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import threading\n",
+    "from pyaccl import ACCLReduceFunctions\n",
+    "import numpy as np\n",
+    "\n",
+    "def allreduce(n):\n",
+    "    accl_instances[n].allreduce(op0_buffers[n], res_buffers[n], COUNT, ACCLReduceFunctions.SUM)\n",
+    "\n",
+    "threads = []\n",
+    "for i in range(WORLD_SIZE):\n",
+    "    threads.append(threading.Thread(target=allreduce, args=(i,)))\n",
+    "    threads[i].start()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Check results\n",
+    "All-reduce should produce in each of the result buffers the sum of all the input buffers from each of the ACCL instances involved in the collective. We can compare all-reduce outputs with the expected outputs, element by element, to make sure this is the case."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "for i in range(WORLD_SIZE):\n",
+    "    threads[i].join()\n",
+    "    assert np.isclose(res_buffers[i], sum(op0_buffers)).all()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## De-Initialize ACCL instances\n",
+    "The `deinit()` function clears all internal data structures in the ACCL instance."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "for i in range(WORLD_SIZE):\n",
+    "    accl_instances[i].deinit()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.10"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/src/pyaccl/notebooks/multirank.ipynb b/src/pyaccl/notebooks/multirank.ipynb