updates

utsekaj42 · Jan 27, 2017 · 07a9aba · 07a9aba
1 parent 2e0416c
commit 07a9aba
Showing 1 changed file with 212 additions and 4 deletions.
diff --git a/sensitivity_introduction.ipynb b/sensitivity_introduction.ipynb
@@ -9,7 +9,7 @@
     "<!-- dom:AUTHOR: Leif Rune Hellevik at Department of Structural Engineering, NTNU -->\n",
     "<!-- Author: --> **Leif Rune Hellevik**, Department of Structural Engineering, NTNU\n",
     "\n",
-    "Date: **Jan 17, 2017**"
+    "Date: **Jan 27, 2017**"
    ]
   },
   {
@@ -937,9 +937,217 @@
     "    error.append(LA.norm((s_pc-s**2)/s**2))\n",
     " \n",
     "plt.figure()\n",
-    "plt.semilogy(Npc_list,error)\n",
-    "plt.xlabel('Nr samples')\n",
-    "plt.ylabel('L2-norm of error in Sobol indices')"
+    "plt.semilogy(Npc_list,error);\n",
+    "plt.xlabel('Nr samples');\n",
+    "plt.ylabel('L2-norm of error in Sobol indices');"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Conditional variances\n",
+    "\n",
+    "As noted previously, the importance of a factor $Z_i$ is manifested\n",
+    "the existence of a 'shape' or 'pattern' in the model outputs\n",
+    "$Y$. Conversely, a uniform cloud of output points $Y$ as a function of\n",
+    "$Z_i$ is a symptom, albeit not a proof, indicating that $Z_i$ is a\n",
+    "noninfluential factor. In this section we seek to demonstrate that\n",
+    "conditional variances is a usefull means to quantify the 'shape' or\n",
+    "'pattern' in the outputs.\n",
+    "\n",
+    "The shape in the outputs $Y$ for a given $Z_i$, may be seen in the\n",
+    "scatterplot as of $Y$ versus $Z_i$. In particular, we may cut the\n",
+    "$Z_i$-axis into slices and assess how the distribution of the outputs\n",
+    "$Y$ changes from slice to slice. This is illustrated in the code\n",
+    "snippet below, where the slices are identified with vertical dashed\n",
+    "lines at equidistant locations on each $Z_i$-axis, $i=1, \\ldots,4$."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "# # Scatter plots of data, z-slices, and linear model \n",
+    "plt.figure()\n",
+    "\n",
+    "Ndz=10  #Number of slices of the Z-axes\n",
+    "\n",
+    "Zslice = np.zeros((r,Ndz))   # array for mean-values in the slices\n",
+    "ZBndry = np.zeros((r,Ndz+1)) # array for boundaries of the slices\n",
+    "dz=np.zeros(r)\n",
+    "\n",
+    "for k in range(r):\n",
+    "    plt.subplot(2,2,k+1)\n",
+    "    \n",
+    "    zmin=np.min(Z[k,:]); zmax=np.max(Z[k,:]) # each Z[k,:] may have different extremas\n",
+    "    dz[k] = (zmax-zmin )/Ndz\n",
+    "    \n",
+    "    ZBndry[k,:] = np.linspace(zmin,zmax, Ndz+1)\n",
+    "    Zslice[k,:] = np.linspace(zmin+dz[k]/2.,zmax-dz[k]/2., Ndz)\n",
+    "    \n",
+    "    # Plot the slices\n",
+    "    for i in range(Ndz):\n",
+    "        plt.axvline(ZBndry[k,i], np.amin(Y), np.amax(Y),linestyle='--', color='.75')\n",
+    "        \n",
+    "    # Plot the data\n",
+    "    plt.plot(Z[k,:],Y[:],'.')\n",
+    "    xlbl='Z'+str(k)\n",
+    "    plt.xlabel(xlbl)\n",
+    "    plt.ylabel('Y')        \n",
+    "\n",
+    "    Ymodel=linear_model(w[k],Zslice[k,:],Ndz)\n",
+    "    \n",
+    "    plt.plot(Zslice[k,:],Ymodel)\n",
+    "    \n",
+    "    ymin=np.amin(Y); ymax=np.amax(Y)\n",
+    "    plt.ylim([ymin,ymax])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note, that average value of $Y$ in a very thin slice, corresponds to keeping $Z_i$ fixed whiel while averaging over all output values of $Y$ due to all-but $Z_i$, which corresponds to the conditional expected value:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "$$\n",
+    "\\begin{equation}\n",
+    "E_{Z_{\\sim i}} (Y\\;|\\;Z_i) \n",
+    "\\label{}\n",
+    "\\end{equation}\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For convenience we denote 'all-but $Z_i$' by $Z_{\\sim\n",
+    "i}$. Naturally, a measure of how much $E_{Z_{\\sim i}} (Y\\;|\\;Z_i)$\n",
+    "varies in the range of $Z_i$ is given by the conditional variance:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "$$\n",
+    "\\begin{equation}\n",
+    "\\text{V}_{Z_i}(E_{Z_{\\sim i}} (Y\\;|\\;Z_i))\n",
+    "\\label{}\n",
+    "\\end{equation}\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Further, the variance the output $Y$ may be decomposed into:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<!-- Equation labels as ordinary links -->\n",
+    "<div id=\"eq:VarDecomp\"></div>\n",
+    "\n",
+    "$$\n",
+    "\\begin{equation}\n",
+    "\\text{V}(Y) = E_{Z_i} ( V_{Z_{\\sim i}} (Y \\; | Z_{i})) + \\text{V}_{Z_i}(E_{Z_{\\sim i}} (Y\\;|\\;Z_i))\n",
+    "\\label{eq:VarDecomp} \\tag{19}\n",
+    "\\end{equation}\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A large $\\text{V}_{Z_i}(E_{Z_{\\sim i}} (Y\\;|\\;Z_i))$ will imply that\n",
+    "$Z_i$ is an important factor and is therefore coined the first-order\n",
+    "effect of $Z_i$ on $Y$, and its fraction of the total variation of $Y$ is expressed by $S_i$, `the first-order sensitivity index` of $Z_i$ on $Y$:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "$$\n",
+    "\\begin{equation}\n",
+    "S_i = \\frac{\\text{V}_{Z_i}(E_{Z_{\\sim i}} (Y\\;|\\;Z_i))}{\\text{V}(Y)}\n",
+    "\\label{}\n",
+    "\\end{equation}\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "By ([eq:VarDecomp](#eq:VarDecomp)), $S_i$ is number always in the range $[0,1]$,\n",
+    "and a high value implies an important factor."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "# # Scatter plots of averaged y-values per slice, with averaged data\n",
+    "\n",
+    "Zsorted=np.zeros_like(Z)\n",
+    "Ysorted=np.zeros_like(Z)\n",
+    "YsliceMean=np.zeros((r,Ndz))\n",
+    "\n",
+    "plt.figure()\n",
+    "for k in range(r):\n",
+    "    plt.subplot(2,2,k+1)\n",
+    "\n",
+    "    # sort data\n",
+    "    sidx=np.argsort(Z[k,:])\n",
+    "    Zsorted[k,:]=Z[k,sidx].copy()   \n",
+    "    Ysorted[k,:]=Y[sidx].copy()   \n",
+    "    \n",
+    "    for i in range(Ndz):\n",
+    "        plt.axvline(ZBndry[k,i], np.amin(Y), np.amax(Y),linestyle='--', color='.75')\n",
+    "\n",
+    "        # find indexes of z-values in the current slice        \n",
+    "        zidx_range=np.logical_and(Zsorted[k,:]>=ZBndry[k,i],Zsorted[k,:]<ZBndry[k,i+1])\n",
+    "                    \n",
+    "        if np.any(zidx_range):    #check if range has elements\n",
+    "            YsliceMean[k,i]=np.mean(Ysorted[k,zidx_range])\n",
+    "        else:                     #set value to None if noe elements in z-slice\n",
+    "            YsliceMean[k,i]=None\n",
+    "\n",
+    "    plt.plot(Zslice[k,:],YsliceMean[k,:],'.')\n",
+    "\n",
+    "# # Plot linear model\n",
+    "    Nmodel=3\n",
+    "    zmin=np.min(Zslice[k,:])\n",
+    "    zmax=np.max(Zslice[k,:])\n",
+    "    \n",
+    "    zvals=np.linspace(zmin,zmax,Nmodel)\n",
+    "    Ymodel=linear_model(w[k],zvals,Nmodel)\n",
+    "    plt.plot(zvals,Ymodel)\n",
+    "    \n",
+    "    xlbl='Z'+str(k)\n",
+    "    plt.xlabel(xlbl)\n",
+    "    \n",
+    "    plt.ylim(ymin,ymax)"
    ]
   },
   {