docs: keep implementing reviewer demands
ucaiado committed Oct 13, 2016
1 parent 2c23936 commit aa6be6e
Showing 1 changed file with 39 additions and 26 deletions.
65 changes: 39 additions & 26 deletions learning_trade.ipynb
@@ -53,39 +53,27 @@
"- Have you thoroughly discussed how you will attempt to solve the problem?\n",
"- Is an anticipated solution clearly defined? Will the reader understand what results you are looking for?\n",
"```\n",
"[Algo trading](http://goo.gl/b9jAqE) strategies usually are programs that follow a predefined set of instructions to place its orders. The primary challenge to this approach is building these rules in a way that it can consistently generate profit without being too sensitive to market conditions. \n",
"[Algo trading](http://goo.gl/b9jAqE) strategies usually are programs that follow a predefined set of instructions to place its orders. \n",
"\n",
"Thus, the goal of this project is to develop an adaptative learning model that can learn by itself those rules and trade a particular asset using reinforcement learning framework under an environment that replays historical high-frequency data.\n",
"\n",
"As \\cite{chan2001electronic} describe, reinforcement learning can be considered as a model-free approximation of dynamic programming. The knowledge of the underlying processes is not assumed but learned from experience.\n",
"\n",
"The agent can access some information about the environment state as the order flow imbalance, the sizes of the best bid and offer and so on. At each time step $t$, It should generate some valid action, as buy stocks or insert a limit order at the Ask side. All inputs and actions will be detailed in the next sections.\n",
"\n",
"The agent also should receive a reward or a penalty at each time step if it is already carrying a position from previous rounds or if it has made a trade (the cost of the operations are computed as a penalty).\n",
"\n",
"Based on the rewards and penalties it gets, the agent should learn an optimal policy for trade this particular stock, maximizing the profit it receives from its actions and resulting positions.\n",
"The primary challenge to this approach is building these rules in a way that it can consistently generate profit without being too sensitive to market conditions. Thus, the goal of this project is to develop an adaptive learning model that can learn by itself those rules and trade a particular asset using reinforcement learning framework under an environment that replays historical high-frequency data.\n",
"\n",
"As \\cite{chan2001electronic} described, reinforcement learning can be considered as a model-free approximation of dynamic programming. The knowledge of the underlying processes is not assumed but learned from experience. The agent can access some information about the environment state as the order flow imbalance, the sizes of the best bid and offer and so on. At each time step $t$, It should generate some valid action, as buy stocks or insert a limit order at the Ask side. The agent also should receive a reward or a penalty at each time step if it is already carrying a position from previous rounds or if it has made a trade (the cost of the operations are computed as a penalty). Based on the rewards and penalties it gets, the agent should learn an optimal policy for trade this particular stock, maximizing the profit it receives from its actions and resulting positions.\n",
"\n",
"```\n",
"Udacity Reviewer:\n",
"\n",
"This is really quite close! I'm marking as not meeting specifications because you should fully outline your solution here. You've outlined your strategy regarding reinforcement learning, but you should also address things like data preprocessing, choosing your state space etc. Basically, this section should serve as an outline for your entire solution. Just add a paragraph or two to fully outline your proposed methodology and you're good to go.\n",
"```\n",
"\n",
"\n",
"This project starts with an overview of the dataset and show how the environment state will be represented in Section 2. The same section also dive in the reinforcement learning framework and defines the benchmark used at the end of the project. \n",
"\n",
"Section 3 discretizes the environment states by transforming its variables and clustering them into six groups. Also describes the implementation of the model and the process of improvement made upon the algorithm used.\n",
"\n",
"Section 4 presents the final model and compares statistically its performance to the benchmark chosen. Section 5 concludes the project with some final remarks and possible improvements.\n"
"This project starts with an overview of the dataset and shows how the environment states will be represented in Section 2. The same section also dives in the reinforcement learning framework and defines the benchmark used at the end of the project. Section 3 discretizes the environment states by transforming its variables and clustering them into six groups. Also describes the implementation of the model and the environments, as well as and the process of improvement made upon the algorithm used. Section 4 presents the final model and compares statistically its performance to the benchmark chosen. Section 5 concludes the project with some closing remarks and possible improvements.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"### 1.4. Metrics\n",
"### 1.3. Metrics\n",
"```\n",
"Udacity:\n",
"\n",
@@ -94,9 +82,20 @@
"- Have you provided reasonable justification for the metrics chosen based on the problem and solution?\n",
"```\n",
"\n",
"In 1988, the Wall Street Journal created a [Dartboard Contest](http://www.automaticfinances.com/monkey-stock-picking/), where Journal staffers threw darts at a stock table to select their assets, while investment experts picked their own stocks. After six months, they compared the results of the two methods. After adjusting the results to risk level, they found out that the pros barely have beaten the random pickers.\n",
"```\n",
"Udacity Reviewer:\n",
"\n",
"The section on metrics should address any statistics or metrics that you'll be using in your report. What you've written in your benchmark section is roughly what we're looking for for the metrics section and vice versa. I'd recommend changing the subtitles to clarify this. If it's more logical to introduce the benchmark before explaining your metrics, you could combine the 'Benchmark' and 'Metrics' subsections into a single 'Benchmark and Metrics' section. \n",
"```\n",
"\n",
"Different metrics are used to support the decisions made throughout the project. We use the mean [Silhouette Coefficient](http://scikit-learn.org/stable/modules/clustering.html#silhouette-coefficient) of all samples to justify the clustering method chosen to reduce the state space representation of the environment. As exposed in the scikit-learn documentation, this coefficient is composed by the mean intra-cluster distance ($a$) and the mean nearest-cluster distance ($b$) for each sample. The score for a single cluster is given by $s = \\frac{b-a}{\\max{a, \\, b}}$.This scores are so average down to all samples and varying between $1$ (the best one) and $-1$ (the worst value).\n",
"\n",
"Then, we use [sharpe ratio](https://en.wikipedia.org/wiki/Sharpe_ratio) to help us understanding the performance impact of different values to the model parameters. The Sharpe is measure upon the first difference ($\\Delta r$) of the accumulated PnL curve of the model. So, the first difference is defined as $\\Delta r = PnL_t - PnL_{t-1}$.\n",
"\n",
"Given that, the metric used to measure the performance of the learner will be the amount of money made by a random agent. So, my goal will be to outperform this agent, that should just produce some random action from a set of allowed action at each time $t$. In the next section, I will detail the behavior of this agent."
"Finally, as we shall justify latter, the performance of my agent will be compared to the performance of a random agent. These performances will be measured primarily of Reais made (the Brazilian currency) by the agents. To compared the final PnL of both agents in the simulations, we will perform a one-sided [t-student](https://en.wikipedia.org/wiki/Student%27s_t-test) test for the null hypothesis that the learner agent has the expected PnL smaller than a random agent. In the next section, I will detail the behavior of this agent.\n",
"\n",
"\n",
"Finally, as we shall justify latter, the performance of my agent will be compared to the performance of a random agent. These performances will be measured primarily of Reais made (the Brazilian currency) by the agents. To compared the final PnL of both agents in the simulations, we will perform a one-sided [Welch's unequal variances t-test](https://goo.gl/Je2ZLP) for the null hypothesis that the learning agent has the expected PnL greater than the random agent. In the next section, I will detail the behavior of learning agent."
]
},
{
@@ -450,7 +449,7 @@
"\n",
"$$e_n = \\mathbb{1}_{P_{n}^{B} \\geq P_{n-1}^{B}} q^{B}_{n} - \\mathbb{1}_{P_{n}^{B} \\leq P_{n-1}^{B}} q^{B}_{n-1} - \\mathbb{1}_{P_{n}^{A} \\leq P_{n-1}^{A}} q^{A}_{n} + \\mathbb{1}_{P_{n}^{A} \\geq P_{n-1}^{A}} q^{A}_{n-1}$$\n",
"\n",
"Where $q^{B}_{n}$ and $q^{A}_{n}$ are linked to the cumulated quantities at the best bid and ask in the time $n$. The subscript $n-1$ is related to the last observation. $\\mathbb{1}$ is an [indicator](https://en.wikipedia.org/wiki/Indicator_function) function. In the figure below is ploted the 10-second log-return of PETR4 against the contemporaneous OFI."
"Where $q^{B}_{n}$ and $q^{A}_{n}$ are linked to the cumulated quantities at the best bid and ask in the time $n$. The subscript $n-1$ is related to the last observation. $\\mathbb{1}$ is an [indicator](https://en.wikipedia.org/wiki/Indicator_function) function. In the figure below is ploted the 10-second log-return of PETR4 against the contemporaneous OFI. [Log-return](https://quantivity.wordpress.com/2011/02/21/why-log-returns/) is defined as $\\ln{r_t} = \\ln{\\frac{P_t}{P_{t-1}}}$, where $P_t$ is the current price of PETR4 and $P_{t-1}$ is the previous one."
]
},
{
@@ -597,11 +596,15 @@
"- Is it clear how this result or value was obtained (whether by data or by hypothesis)?\n",
"```\n",
"\n",
"As described before, the performance of my agent will be compared to the performance of a random agent. This random agent should select a random action from a set of valid actions taken from $A$ at each time step $t$.\n",
"In 1988, the Wall Street Journal created a [Dartboard Contest](http://www.automaticfinances.com/monkey-stock-picking/), where Journal staffers threw darts at a stock table to select their assets, while investment experts picked their own stocks. After six months, they compared the results of the two methods. After adjusting the results to risk level, they found out that the pros barely have beaten the random pickers.\n",
"\n",
"Given that, the benchmark used to measure the performance of the learner will be the amount of money made, in Reais, by a random agent. So, my goal will be to outperform this agent, that should just produce some random action from a set of allowed actions taken from $A$ at each time step $t$.\n",
"\n",
"Just like my learner, the set of action can change over time depending on the open position, that is limited to $100$ stocks at most, on any side. When it reaches its limit, it will be allowed just to perform actions that decrease its position. So, for instance, if it already [long](https://goo.gl/GgXJgR) in $100$ shares, the possible moves would be $\\left (None,\\, sell,\\, best\\_ask \\right)$. If it is [short](https://goo.gl/XFR7q3), it just can perform $\\left (None,\\, buy,\\, best\\_bid\\right)$.\n",
"\n",
"The performance will be measured primarily in the money made by the agents (that will be optimized by the learner). First, I will analyze if the learning agent was able to improve its performance on the same dataset after different trials. Later on, I will use the policy learned to simulate the learning agent behavior in a different dataset and then I will compare the final Profit and Loss and volatility of the returns of both agents. All data analyzed will be obtained by simulation."
"The performance will be measured primarily in the money made by the agents (that will be optimized by the learner). First, I will analyze if the learning agent was able to improve its performance on the same dataset after different trials. Later on, I will use the policy learned to simulate the learning agent behavior in a different dataset and then I will compare the final Profit and Loss and volatility of the returns of both agents. All data analyzed will be obtained by simulation.\n",
"\n",
"As a last benchmark, more like a reference, ...."
]
},
{
@@ -674,7 +677,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The scale of the variables is very different, and, in the case of the $BOOK\\_RATIO$, it presents a logarithmic distribution. I will apply a logarithm transformation on this variable and rescale both to lie between a given minimum and maximum value of each feature using the function [MinMaxScaler](http://scikit-learn.org/stable/modules/preprocessing.html) from scikit-learn. The result of the transformation can be seen in the figure below."
"```\n",
"Udacity Reviewer:\n",
"\n",
"Please be sure to specify how you are doing this (I'd recommend giving the formula).\n",
"```\n",
"The scale of the variables is very different, and, in the case of the $BOOK\\_RATIO$, it presents a logarithmic distribution. I will apply a logarithm transformation on this variable and rescale both to lie between a given minimum and maximum value of each feature using the function [MinMaxScaler](http://scikit-learn.org/stable/modules/preprocessing.html) from scikit-learn. So, both variable will be scaled to lie between $0$ and $1$ by applying the formula $z_{i} =\\frac{x_i - \\min{X}}{\\max{X} - \\min{X}}$. Where $z$ is the transformed variable, $x_i$ is the variable to be transformed and $X$ is a vector with all $x$ that will be transformed. The result of the transformation can be seen in the figure below."
]
},
{
@@ -1060,7 +1068,12 @@
"\n",
"The agent will be allowed to take action every $2$ seconds and, due to this delay, every time it decides to insert limit orders, it will place it 1 cent worst than the best price. So, if the best bid is $12.00$ and the best ask is $12.02$, if the agent chooses the action $BEST\\_BOTH$, it should include a buy order at $11.99$ and a sell order at $12.03$. It will be allowed to cancel these orders after 2 seconds. However, if these orders are filled in the mean time, the environment will inform the agent so it can update its current position. Even though, it just will take new actions after passed those 2 seconds.\n",
"\n",
"In the next subsection, I will try different configurations of $k$ and $\\gamma$ to try to improve the performance of the learning agent over the same trial.\n"
"```\n",
"Udacity Reviewer:\n",
"\n",
"Please be sure to note any complications that occurred during the coding process. Otherwise, this section is simply excellent\n",
"```\n",
"One of the biggest complication of the approach proposed in this project was to find out a reasonable representation of the environment state that wasn't too big to visit each state-action pair sufficiently often but was still useful in the learning process. In the next subsection, I will try different configurations of $k$ and $\\gamma$ to try to improve the performance of the learning agent over the same trial.\n"
]
},
{
@@ -2087,9 +2100,9 @@
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [conda env:python2]",
"display_name": "Python [conda root]",
"language": "python",
"name": "conda-env-python2-py"
"name": "conda-root-py"
},
"language_info": {
"codemirror_mode": {