forked from jamescalam/transformers
Commit
more changes to attention and visuals
1 parent 8fd24ee · commit 1ff8a88
Showing 16 changed files with 44 additions and 4 deletions.
Binary files not shown.
@@ -0,0 +1,40 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Self Attention\n",
"\n",
"With dot-product attention, we calculated the alignment between word vectors from two different sequences - perfect for translation. Self-attention takes a different approach, here we compare words to previous words in the *same sequence*. So, where with dot-product attention we took our queries **Q** and keys **K** from two different sequences, self-attention takes them from the same sequence. Transformer models that look at previous tokens and try to predict the next include both text generation, and summarization.\n", | ||
"\n", | ||
"So, just like before with dot-product attention, we calculate the dot-product again - this time taking **Q** and **K** from the same sequence.\n", | ||
"\n", | ||
"![Self attention visual](../../assets/images/self_attention.png)\n", | ||
"\n", | ||
"After calculating the dot-product across all items in the sequence, we apply a mask to remove all values calculated for future words - leaving us with the dot-product between past words only. Next, we take the softmax just as before, and multiply the result by **V** to get our attention **Z**." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "ML", | ||
"language": "python", | ||
"name": "ml" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.8.5" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |
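To make the notebook cell above concrete, here is a minimal NumPy sketch of the masked (causal) self-attention it describes: **Q**, **K**, and **V** are all projected from the same sequence, the dot-product scores for future positions are masked out, and a softmax over the remaining scores weights **V** to give the attention output **Z**. The toy dimensions, the random projection matrices, and the scaling by sqrt(d_k) (standard scaled dot-product attention) are illustrative assumptions, not details taken from the commit.

import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Masked self-attention over a single sequence x of shape (seq_len, d_model)."""
    q = x @ w_q                        # queries Q, taken from the same sequence
    k = x @ w_k                        # keys K, also from the same sequence
    v = x @ w_v                        # values V
    d_k = k.shape[-1]
    scores = (q @ k.T) / np.sqrt(d_k)  # dot-product alignment between every pair of positions
    # mask future positions so each word only attends to itself and earlier words
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -1e9, scores)
    # softmax over the key positions, then weight the values to get the attention output Z
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# toy example: a sequence of 4 token vectors with model dimension 8 (assumed sizes)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = [rng.normal(size=(8, 8)) for _ in range(3)]
z = causal_self_attention(x, w_q, w_k, w_v)
print(z.shape)  # (4, 8) - one attention vector per token, built from past tokens only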