added attention visuals and notebooks
jamescalam committed Apr 8, 2021
1 parent 5a1fd61 commit 8fd24ee
Showing 7 changed files with 36 additions and 58 deletions.
Binary file modified assets/images/dot_product_attention.fla
Binary file not shown.
Binary file modified assets/images/dot_product_attention.png
Binary file not shown.
23 changes: 23 additions & 0 deletions course/attention/00_summary.ipynb
@@ -1,5 +1,28 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Attention\n",
"\n",
"In this section we'll cover four different types of attention mechanisms:\n",
"\n",
"* Dot-product (encoder-decoder) attention\n",
"\n",
"* Self attention\n",
"\n",
"* Bidirectional attention\n",
"\n",
"* Multihead attention\n",
"\n",
"Each of these mechanisms lend well to our understanding of modern day transformer models, which typically use a combination of these mechanisms - for example BERT which uses the dot-product attention, adapted for encoder-encoder mappings using self-attention, which is modified to bidirectional attention - and this operation is performed several times due to multihead attention.\n",
"\n",
"![Visual showing the focus of each attention mechanism](../../assets/images/attention_overview.png)\n",
"\n",
"Each row in the visual above corresponds to dot-product (encoder-decoder), self, bidirectional, and multihead attention respectively."
]
},
{
"cell_type": "code",
"execution_count": null,
9 changes: 8 additions & 1 deletion course/attention/01_dot_product_attention.ipynb
@@ -6,7 +6,7 @@
"source": [
"# Dot-Product Attention\n",
"\n",
"The first attention mechanism we will focus on is dot-product attention. When we perform many NLP tasks we would typically convert a word into a vector (*word2vec*), with transformers we perform the same operation. These vectors allows us to represent meaning numerically (eg days of the week may be clustered together, or we can perform logical arithmetic on the vectors - *King - Man + Woman = Queen*).\n",
"The first attention mechanism we will focus on is dot-product (encoder-decoder) attention. When we perform many NLP tasks we would typically convert a word into a vector (*word2vec*), with transformers we perform the same operation. These vectors allows us to represent meaning numerically (eg days of the week may be clustered together, or we can perform logical arithmetic on the vectors - *King - Man + Woman = Queen*).\n",
"\n",
"Because of this, we would expect sentences with similar meaning to have a similar set of values. For example, in neural machine translation, the phrase *\"Hello, how are you?\"*, and the Italian equivalent *\"Ciao, come va?\"* should share a similar matrix representation.\n",
"\n",
@@ -187,6 +187,13 @@
"\n",
"Once we calculate the dot product, we apply a softmax function to convert the dot product alignment into probabilities. These are then multiplied by *V* to give us the attention tensor **z**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
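The notebook above stops at the conceptual description, so here is a minimal NumPy sketch of the operation it walks through: the dot product of Q and K produces alignment scores, a softmax turns them into probabilities, and multiplying by V gives the attention tensor z. The shapes and random inputs are illustrative placeholders, not values taken from the notebook.

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row max before exponentiating, for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(q, k, v):
    # alignment scores between every query position and every key position,
    # scaled by sqrt(d_k) so the softmax inputs stay well-behaved
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # alignment -> probabilities
    return weights @ v                  # weighted sum of values: the attention tensor z

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))  # e.g. 4 decoder positions, dimension 8
k = rng.normal(size=(6, 8))  # e.g. 6 encoder positions
v = rng.normal(size=(6, 8))
z = dot_product_attention(q, k, v)
print(z.shape)  # (4, 8): one attention output per query position
```

The 1/sqrt(d_k) scaling is the standard refinement from the transformer literature; drop it and you have plain dot-product attention as described in the notebook.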
45 changes: 0 additions & 45 deletions course/attention/02_causal_attention.ipynb

This file was deleted.

13 changes: 2 additions & 11 deletions course/attention/03_bidirectional_attention.ipynb
@@ -6,19 +6,10 @@
"source": [
"# Bi-directional Attention\n",
"\n",
"TK explain\n",
"We've explored both dot-product attention, and self-attention. Where dot-product compared two sequences, and causal attention compared previous tokens from the *same sequence*, bidirectional attention compares tokens from the *same sequence* in both directions, subsequent and previous. This is as simple as performing the exact same operation that we performed for *self-attention*, but excluding the masking operation - allowing each word to be mapped to every other word in the same sequence. So, we could call this *bi-directional **self** attention*. This is particularly useful for masked language modeling - and is used in BERT (**Bidirectional Encoder** Representations from Transformers) - bidirectional self-attention refers to the *bidirectional encoder*, or the *BE* of BERT.\n",
"\n",
"## From Scratch in Numpy\n",
"\n",
"TK work through example in Numpy"
"![Bidirectional Attention](../../assets/images/bidirectional_attention.png)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
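To make the "self-attention minus the mask" point concrete, here is a short NumPy sketch: the same scoring routine run once with a causal mask and once without. Using x directly in place of learned Q/K/V projections is a simplifying assumption to keep the sketch minimal.

```python
import numpy as np

def softmax(x, axis=-1):
    # same numerically-stable softmax as in the previous sketch
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, causal=False):
    # in a real model q, k, v come from learned projections of x;
    # here x stands in for all three to keep the example minimal
    scores = x @ x.T / np.sqrt(x.shape[-1])
    if causal:
        # mask out subsequent (future) tokens with -inf before the softmax
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    return softmax(scores, axis=-1) @ x

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                 # 5 tokens, dimension 8
causal_z = self_attention(x, causal=True)   # each token sees only previous tokens
bidir_z = self_attention(x, causal=False)   # each token sees the whole sequence
```

The only difference between the two calls is the mask, which is exactly the point the notebook makes about bidirectional self-attention.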
4 changes: 3 additions & 1 deletion course/attention/04_multihead_attention.ipynb
@@ -6,7 +6,9 @@
"source": [
"# Multihead Attention\n",
"\n",
"TK explain\n",
"\n",
"\n",
"![Flow in multihead attention](../../assets/images/multihead_attention.png)\n",
"\n",
"## From Scratch in Numpy\n",
"\n",
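The multihead notebook is still a stub in this commit, so the following is only a sketch of the idea its title names: several attention heads computed independently and concatenated. The random projection matrices stand in for learned weights, and the final output projection a real implementation would apply is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multihead_self_attention(x, n_heads, rng):
    d_model = x.shape[-1]
    d_head = d_model // n_heads  # each head works in a smaller subspace
    heads = []
    for _ in range(n_heads):
        # random projections stand in for each head's learned W_q, W_k, W_v
        wq, wk, wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        q, k, v = x @ wq, x @ wk, x @ wv
        weights = softmax(q @ k.T / np.sqrt(d_head), axis=-1)
        heads.append(weights @ v)
    # concatenate the head outputs back to d_model columns
    return np.concatenate(heads, axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))  # 5 tokens, model dimension 16
z = multihead_self_attention(x, n_heads=4, rng=rng)
print(z.shape)  # (5, 16)
```

Each head is free to attend to different relationships in the sequence, which is the motivation for performing the operation several times in parallel.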
