Benchmarking

The performance of Python applications that use TACO can be measured using
Python's built-in `time.perf_counter` function with minimal changes to the
applications. As an example, we can benchmark the performance of the
scientific computing application shown here as follows:
```python
import pytaco as pt
from pytaco import compressed, dense
import numpy as np
import time

csr = pt.format([dense, compressed])
dv = pt.format([dense])

A = pt.read("pwtk.mtx", csr)
x = pt.from_array(np.random.uniform(size=A.shape[1]))
z = pt.from_array(np.random.uniform(size=A.shape[0]))
y = pt.tensor([A.shape[0]], dv)

i, j = pt.get_index_vars(2)
y[i] = A[i, j] * x[j] + z[i]

# Tell TACO to generate code to perform the SpMV computation
y.compile()

# Benchmark the actual SpMV computation
start = time.perf_counter()
y.compute()
end = time.perf_counter()

print("Execution time: {0} seconds".format(end - start))
```
In order to accurately measure TACO's computational performance, only the
time it takes to actually perform a computation should be measured. The time
it takes to generate code under the hood for that computation should not be
measured, since this overhead can be quite variable yet can often be amortized
in practice. By default, however, TACO only generates and compiles the code
needed for a computation immediately before the computation actually has to be
performed. As the example above demonstrates, by manually calling the result
tensor's `compile` method we can tell TACO to generate that code before
benchmarking starts, letting us measure only the performance of the
computation itself.
Warning

`pytaco.evaluate` and `pytaco.einsum` should not be used to benchmark
TACO's computational performance, since timing those functions will
include the time it takes to generate code for performing the computation.
The time it takes to construct the initial operand tensors should also not be
measured, since again this overhead can often be amortized in practice. By
default, `pytaco.read` and the functions for converting NumPy arrays and SciPy
matrices to TACO tensors return fully constructed tensors. If you add nonzero
elements to an operand tensor by invoking its `insert` method though, then
`pack` must also be explicitly invoked before any benchmarking is done:
```python
import pytaco as pt
from pytaco import compressed, dense
import numpy as np
import random
import time

csr = pt.format([dense, compressed])
dv = pt.format([dense])

A = pt.read("pwtk.mtx", csr)
x = pt.tensor([A.shape[1]], dv)
z = pt.tensor([A.shape[0]], dv)
y = pt.tensor([A.shape[0]], dv)

# Insert random values into x and z and pack them into dense arrays
for k in range(A.shape[1]):
    x.insert([k], random.random())
x.pack()
for k in range(A.shape[0]):
    z.insert([k], random.random())
z.pack()

i, j = pt.get_index_vars(2)
y[i] = A[i, j] * x[j] + z[i]

y.compile()

start = time.perf_counter()
y.compute()
end = time.perf_counter()

print("Execution time: {0} seconds".format(end - start))
```
Note, though, that TACO avoids regenerating code for the same computation as
long as the computation is redefined with the same index variables and with the
same operand and result tensors. Thus, if your application executes the same
computation many times in a loop and the computation is executed on
sufficiently large data sets, TACO will naturally amortize the overhead
associated with generating code for performing the computation. In such
scenarios, it is acceptable to include the initial code generation overhead
in the performance measurement:
```python
import pytaco as pt
from pytaco import compressed, dense
import random
import time

csr = pt.format([dense, compressed])
dv = pt.format([dense])

A = pt.read("pwtk.mtx", csr)
x = pt.tensor([A.shape[1]], dv)
z = pt.tensor([A.shape[0]], dv)
y = pt.tensor([A.shape[0]], dv)

for k in range(A.shape[1]):
    x.insert([k], random.random())
x.pack()
for k in range(A.shape[0]):
    z.insert([k], random.random())
z.pack()

i, j = pt.get_index_vars(2)

# Benchmark the iterative SpMV computation, including overhead for
# generating code in the first iteration to perform the computation
start = time.perf_counter()
for k in range(1000):
    y[i] = A[i, j] * x[j] + z[i]
    y.evaluate()
    x[i] = y[i]
    x.evaluate()
end = time.perf_counter()

print("Execution time: {0} seconds".format(end - start))
```
Warning

In order to avoid regenerating code for a computation, the computation
must be redefined with the exact same index variable objects and with the
exact same tensor objects for operands and result. In the example above,
every loop iteration redefines the computations of `y` and `x` using the
same tensor and index variable objects constructed outside the loop, so
TACO will only generate code to compute `y` and `x` in the first
iteration. If the index variables were constructed inside the loop though,
TACO would regenerate code to compute `y` and `x` in every loop iteration,
and the compilation overhead would not be amortized.
Note

As a rough rule of thumb, if a computation takes on the order of seconds or
more in total to perform across all invocations with identical operands and
result (and is always redefined with identical index variables), then it is
acceptable to include the overhead associated with generating code for
performing the computation in performance measurements.
Computations

Tensor algebra computations can be expressed in TACO with tensor index
notation, which at a high level describes how each element in the output tensor
can be computed from elements in the input tensors. As an example, matrix
addition can be expressed in index notation as

```c++
A(i,j) = B(i,j) + C(i,j)
```
where `A`, `B`, and `C` denote order-2 tensors (i.e. matrices) while `i` and
`j` are index variables that represent abstract indices into the corresponding
dimensions of the tensors. In words, the example above essentially states that,
for every `i` and `j`, the element in the `i`-th row and `j`-th column of
`A` should be assigned the sum of the corresponding elements in `B` and `C`.
Similarly, element-wise multiplication of three order-3 tensors can be
expressed in index notation as follows:
+The syntax shown above corresponds to exactly what you would have to write in +C++ with TACO to define tensor algebra computations. Note, however, that prior +to defining a tensor algebra computation, all index variables have to be +declared. This can be done as shown below:
+IndexVar i, j, k; // Declare index variables for previous example
+In both of the previous examples, all of the index variables are used to index +into both the output and the inputs. However, it is possible for an index +variable to be used to index into the inputs only, in which case the index +variable is reduced (summed) over. For instance, the following example
+y(i) = A(i,j) * x(j)
+can be rewritten with the summation more explicit as and demonstrates how matrix-vector multiplication can be expressed +in index notation.
+Note that, in TACO, reductions are assumed to be over the smallest +subexpression that captures all uses of the corresponding reduction variable. +For instance, the following computation
+y(i) = A(i,j) * x(j) + z(i)
+can be rewritten with the summation more explicit as
++ +
+whereas the following computation
+y(i) = A(i,j) * x(j) + z(j)
+can be rewritten with the summation more explicit as
++ +
+Once a tensor algebra computation has been defined (and all of the inputs have
+been initialized), you can simply invoke the
+output tensor's evaluate
method to perform the actual computation:
A.evaluate(); // Perform the computation defined previously for output tensor A
Under the hood, when you invoke the `evaluate` method, TACO first invokes the
output tensor's `compile` method to generate kernels that assemble the output
indices (if the tensor contains any sparse dimensions) and that perform the
actual computation. TACO then calls the two generated kernels by invoking
the output tensor's `assemble` and `compute` methods. You can manually invoke
these methods instead of calling `evaluate`, as demonstrated below:

```c++
A.compile();  // Generate output assembly and compute kernels
A.assemble(); // Invoke the output assembly kernel to assemble the output indices
A.compute();  // Invoke the compute kernel to perform the actual computation
```
This can be useful if you want to perform the same computation multiple times,
in which case it suffices to invoke `compile` once before the first time the
computation is performed.
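For instance, the kernels generated by a single `compile` call can be reused
across repeated invocations of `assemble` and `compute`. The sketch below is
illustrative only: it assumes `A`, `B`, `C`, and the index variables are
declared as in the earlier matrix addition example, and the loop bound and the
idea of updating operands between runs are assumptions, not part of the text
above.

```c++
// Define the computation once for the output tensor A.
A(i,j) = B(i,j) + C(i,j);

// Generate and compile the assembly and compute kernels a single time.
A.compile();

// Reuse the compiled kernels; no code generation happens inside this loop.
for (int run = 0; run < 10; run++) {
  // ... update the values stored in B and/or C here ...
  A.assemble(); // re-assemble the output indices for the new operand values
  A.compute();  // re-run the generated compute kernel
}
```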
It is also possible to compute on tensors without explicitly invoking
`compile`, `assemble`, or `compute`. Once you attempt to modify or view the
output of a computation, TACO will automatically invoke those methods if
necessary in order to compute the values in the output tensor. If the input to
a computation is itself the output of another computation, then TACO will also
automatically ensure that the latter computation is fully executed first.
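As a minimal sketch of this behavior (the tensor names, shapes, and formats
here are illustrative assumptions, not taken from the text above): `B` is
defined as the output of one computation and then consumed by another, and
simply viewing the final result triggers both computations in order.

```c++
Format csr({Dense, Sparse});
Tensor<double> C("C", {5, 10}, csr), D("D", {5, 10}, csr);
Tensor<double> B("B", {5, 10}, csr);
Tensor<double> x("x", {10}, Dense), y("y", {5}, Dense);
IndexVar i, j;
// ... initialize C, D, and x with insert() ...

B(i,j) = C(i,j) + D(i,j); // B is itself the output of a computation
y(i) = B(i,j) * x(j);     // y consumes B as an input

// No explicit compile/assemble/compute calls: viewing y forces TACO to
// fully execute the computation of B first and then compute y.
std::cout << y << std::endl;
```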
Controlling Memory

When using the TACO C++ library, the typical usage is to declare your input
`taco::Tensor` structures, then add data to these structures using the `insert`
method. This is wasteful if the data is already loaded into memory in a
compatible format; TACO can use this data directly without copying it. Below
are some usage examples for common situations where a user may want to do this.

A two-dimensional CSR matrix can be created using three arrays:
- `rowptr` (array of `int`): list of indices in `colidx` representing starts of rows
- `colidx` (array of `int`): list of column indices of non-zero values
- `vals` (array of `T` for `Tensor<T>`): list of non-zero values corresponding to columns in `colidx`
The `taco::makeCSR<T>` function takes these arrays and creates a
`taco::Tensor<T>`. The following example constructs a 5x10 matrix populated
with a few values.
```c++
// rowptr has numRows+1 = 6 entries; colidx and values each hold the 7 non-zeros.
int *rowptr = new int[6]{0, 2, 4, 4, 4, 7};
int *colidx = new int[7]{3, 5, 0, 7, 7, 8, 9};
double *values = new double[7]{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7};
Tensor<double> A = makeCSR("A", {5, 10}, rowptr, colidx, values);
```
Similarly, a two-dimensional CSC matrix can be created from the appropriate
arrays using the `taco::makeCSC<T>` function. This example constructs the same
5x10 matrix from the CSR example above, but in CSC format.

```c++
// colptr has numCols+1 = 11 entries; rowidx and values hold the same 7 non-zeros.
int *colptr = new int[11]{0, 1, 1, 1, 2, 2, 3, 3, 5, 6, 7};
int *rowidx = new int[7]{1, 0, 0, 1, 4, 4, 4};
double *values = new double[7]{0.3, 0.1, 0.2, 0.4, 0.5, 0.6, 0.7};
Tensor<double> B = makeCSC("B", {5, 10}, colptr, rowidx, values);
```
For single-dimension dense vectors, you can use an array of values (of type `T`
for a `Tensor<T>`). There is no helper function for this (like `makeCSR` or
`makeCSC`), but it can be done. This example constructs a dense vector with 10
entries.
```c++
// Create an array of double values.
double *x_values = new double[10];
for (int i = 0; i < 10; i++) {
  x_values[i] = i;
}

// Create the Tensor and set its storage to our array of values.
Tensor<double> x({10}, Dense);
Array x_array = makeArray<double>(x_values, 10);
TensorStorage x_storage = x.getStorage();
x_storage.setValues(x_array);
x.setStorage(x_storage);
```
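A tensor constructed this way can then be used like any other. As a brief
sketch, reusing `A` from the `makeCSR` example and `x` from above (the output
tensor `y` and its dense format are assumptions for illustration), the
zero-copy structures can feed directly into a computation:

```c++
// Multiply the zero-copy CSR matrix A (5x10) by the dense vector x (size 10).
Tensor<double> y({5}, Dense);
IndexVar i, j;
y(i) = A(i, j) * x(j);
y.evaluate(); // compile, assemble, and compute the SpMV
```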