From 289dfb0da2a7eededc591e1c99fc9ca941ad98b8 Mon Sep 17 00:00:00 2001 From: Dave Welsch Date: Tue, 21 Jan 2025 18:13:50 -0800 Subject: [PATCH] Edit feature guide analysis tool pages. Signed-off-by: Dave Welsch --- Docs/featureguide/analysis tools/index.rst | 25 +- .../interactive_visualization.rst | 21 +- .../layer_output_generation.rst | 125 ++++++---- .../analysis tools/quant_analyzer.rst | 219 +++++++++--------- 4 files changed, 218 insertions(+), 172 deletions(-) diff --git a/Docs/featureguide/analysis tools/index.rst b/Docs/featureguide/analysis tools/index.rst index 66804948c57..97a2ac3da43 100644 --- a/Docs/featureguide/analysis tools/index.rst +++ b/Docs/featureguide/analysis tools/index.rst @@ -11,23 +11,20 @@ Analysis tools Quantization analyzer Layer output generation +AIMET offers these tools to view and analyze a model's internal quantization results. -:ref:`Interactive visualization ` ------------------------------------------------------------------------- +Interactive visualization +------------------------- -Produces an interactive HTML to view the statistics collected by each quantizer during calibration. +:ref:`Interactive visualization ` produces an interactive HTML console showing the statistics collected by each quantizer during calibration. -:ref:`Quantization analyzer ` ---------------------------------------------------------- +Quantization analyzer +--------------------- -QuantAnalyzer analyzes your pre-trained model and points out sensitive layers to quantization -in the model. It checks model sensitivity to weight and activation quantization, performs per -layer sensitivity and MSE analysis. It also exports per layer encodings min and max ranges and -statistics histogram for every layer. +:ref:`Quantization analyzer ` (QuantAnalyzer) analyzes your pre-trained model and identifies layers sensitive to quantization. 
It checks model sensitivity to weight and activation quantization, and performs per-layer sensitivity and mean square error analysis. It also exports per-layer encoding min and max ranges and +statistics histograms for every layer. -:ref:`Layer output generation ` ---------------------------------------------------------------------- +Layer output generation +----------------------- -This API captures and saves intermediate layer-outputs of a model. This allows layer-output -comparison between quantization simulated model (QuantSim object) and actually -quantized model on target-device to debug accuracy miss-match issues at the layer level. +:ref:`Layer output generation ` is an API that captures and saves intermediate layer model outputs. This allows layer-output comparison between a quantization simulated model (QuantSim object) and an actual quantized model on a target device in order to debug accuracy mismatch issues at the layer level. diff --git a/Docs/featureguide/analysis tools/interactive_visualization.rst b/Docs/featureguide/analysis tools/interactive_visualization.rst index 163d41fe7bf..72f7666a64c 100644 --- a/Docs/featureguide/analysis tools/interactive_visualization.rst +++ b/Docs/featureguide/analysis tools/interactive_visualization.rst @@ -7,14 +7,13 @@ Interactive visualization Context ======= -Creates an interactive visualization of min and max activations/weights of all quantized modules -in the Quantization simulation :class:`QuantizationSimModel` object. +Interactive visualization displays the range (min and max values) of activations and weights for all quantized modules +in the quantization simulation :class:`QuantizationSimModel` object. -The features include: +Interactive visualization functionality includes: -- Adjustable threshold values to flag layers whose min or max activations/weights exceed the set thresholds - -- Tables containing names and ranges for layers exceeding threshold values. 
+- Adjustable threshold values to flag layers for which min or max activations or weights exceed these values +- Tables containing names and ranges for layers exceeding threshold values Workflow @@ -33,3 +32,13 @@ API .. include:: ../../apiref/torch/interactive_visualization.rst :start-after: # start-after + + .. tab-item:: TensorFlow + :sync: tf + + Interactive visualization does not support TensorFlow. + + .. tab-item:: ONNX + :sync: onnx + + Interactive visualization does not support ONNX. diff --git a/Docs/featureguide/analysis tools/layer_output_generation.rst b/Docs/featureguide/analysis tools/layer_output_generation.rst index 59313896bd1..2e2c2a91277 100644 --- a/Docs/featureguide/analysis tools/layer_output_generation.rst +++ b/Docs/featureguide/analysis tools/layer_output_generation.rst @@ -9,24 +9,33 @@ Layer output generation Context ======= -This API captures and saves intermediate layer-outputs of your pre-trained model. The model -can be original (FP32) or :class:`QuantizationSimModel`. +Layer output generation is an API that captures and saves intermediate layer-outputs of your pre-trained model. The model +can be original (FP32) or a :class:`QuantizationSimModel`. -The layer-outputs are named according to the exported PyTorch/ONNX/TensorFlow model by the +The layer outputs are named according to the exported model (PyTorch, ONNX, or TensorFlow) by the QuantSim export API :func:`QuantizationSimModel.export`. -This allows layer-output comparison amongst quantization simulated model (QuantSim) -and actually quantized model on target-runtimes like |qnn|_ to debug accuracy miss-match +This enables layer output comparison between quantization simulated (QuantSim) models +and quantized models on target runtimes like |qnn|_ to debug accuracy mismatch issues at the layer level (per operation). 
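The per-layer comparison described in the Context above can be illustrated with a minimal sketch. This is plain Python, not the AIMET API; the toy model, `fake_quantize`, and every other name here are hypothetical. The idea: run the same input through a model once in float and once with simulated quantization, capture each intermediate output, and use the per-layer error to localize where accuracy mismatch creeps in.

```python
# Conceptual sketch only -- not the AIMET layer-output-generation API.
# All names (fake_quantize, run_with_capture, the toy layers) are illustrative.

def fake_quantize(x, scale=0.05):
    """Simulate uniform quantization: snap a value to a grid of step `scale`."""
    return round(x / scale) * scale

def run_with_capture(layers, x, quantized=False):
    """Run `x` through `layers`, recording every intermediate output."""
    outputs = []
    for fn in layers:
        x = fn(x)
        if quantized:
            x = fake_quantize(x)  # stand-in for a quantized op on target
        outputs.append(x)
    return outputs

# Toy two-layer "model".
layers = [lambda x: 2.0 * x + 0.3, lambda x: x * x]

fp32_outs = run_with_capture(layers, 1.07)
quant_outs = run_with_capture(layers, 1.07, quantized=True)

# Comparing saved outputs layer by layer shows where quantization error grows.
per_layer_err = [(a - b) ** 2 for a, b in zip(fp32_outs, quant_outs)]
for i, err in enumerate(per_layer_err):
    print(f"layer {i}: squared error = {err:.6f}")
```

In the real workflow below, the captured outputs come from the QuantSim model and from the quantized model on the target runtime, and the comparison is done per operation rather than on a toy function chain.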
Workflow ======== -Code example ------------- +The layer output generation framework follows the same workflow for all model frameworks: -Step 1 Obtain Original or QuantSim model from AIMET Export Artifacts -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +1. Imports +2. Load a model from AIMET +3. Obtain inputs +4. Generate layer outputs + + +Choose your framework below for code examples. + +Step 1: Importing the API +------------------------- + +Import the API. .. tab-set:: :sync-group: platform @@ -39,13 +48,6 @@ Step 1 Obtain Original or QuantSim model from AIMET Export Artifacts :start-after: # Step 0. Import statements :end-before: # End step 0 - **Obtain Original or QuantSim model from AIMET Export Artifacts** - - .. literalinclude:: ../../legacy/torch_code_examples/layer_output_generation_code_example.py - :language: python - :start-after: # Step 1. Obtain original or quantsim model - :end-before: # End step 1 - .. tab-item:: TensorFlow :sync: tf @@ -54,13 +56,6 @@ Step 1 Obtain Original or QuantSim model from AIMET Export Artifacts :start-after: # Step 0. Import statements :end-before: # End step 0 - **Obtain Original or QuantSim model from AIMET Export Artifacts** - - .. literalinclude:: ../../legacy/keras_code_examples/layer_output_generation_code_example.py - :language: python - :start-after: # Step 1. Obtain original or quantsim model - :end-before: # End step 1 - .. tab-item:: ONNX :sync: onnx @@ -69,15 +64,44 @@ Step 1 Obtain Original or QuantSim model from AIMET Export Artifacts :start-after: # Step 0. Import statements :end-before: # End step 0 - **Obtain Original or QuantSim model from AIMET Export Artifacts** + +Step 2: Loading a model +----------------------- + +Export the original or QuantSim model from AIMET. + +.. tab-set:: + :sync-group: platform + + .. tab-item:: PyTorch + :sync: torch + + .. 
literalinclude:: ../../legacy/torch_code_examples/layer_output_generation_code_example.py + :language: python + :start-after: # Step 1. Obtain original or quantsim model + :end-before: # End step 1 + + .. tab-item:: TensorFlow + :sync: tf + + .. literalinclude:: ../../legacy/keras_code_examples/layer_output_generation_code_example.py + :language: python + :start-after: # Step 1. Obtain original or quantsim model + :end-before: # End step 1 + + .. tab-item:: ONNX + :sync: onnx .. literalinclude:: ../../legacy/onnx_code_examples/layer_output_generation_code_example.py :language: python :start-after: # Step 1. Obtain original or quantsim model :end-before: # End step 1 -Step 2 Generate layer-outputs -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Step 3: Obtaining inputs +------------------------ + +Obtain inputs from which to generate intermediate layer outputs. .. tab-set:: :sync-group: platform @@ -85,54 +109,61 @@ Step 2 Generate layer-outputs .. tab-item:: PyTorch :sync: torch - **Obtain inputs for which we want to generate intermediate layer-outputs** - .. literalinclude:: ../../legacy/torch_code_examples/layer_output_generation_code_example.py :language: python :start-after: # Step 2. Obtain pre-processed inputs :end-before: # End step 2 - **Generate layer-outputs** - - .. literalinclude:: ../../legacy/torch_code_examples/layer_output_generation_code_example.py - :language: python - :start-after: # Step 3. Generate outputs - :end-before: # End step 3 - .. tab-item:: TensorFlow :sync: tf - **Obtain inputs for which we want to generate intermediate layer-outputs** - .. literalinclude:: ../../legacy/keras_code_examples/layer_output_generation_code_example.py :language: python :start-after: # Step 2. Obtain pre-processed inputs :end-before: # End step 2 - **Generate layer-outputs** - - .. literalinclude:: ../../legacy/keras_code_examples/layer_output_generation_code_example.py - :language: python - :start-after: # Step 3. Generate outputs - :end-before: # End step 3 - .. 
tab-item:: ONNX :sync: onnx - **Obtain inputs for which we want to generate intermediate layer-outputs** - .. literalinclude:: ../../legacy/onnx_code_examples/layer_output_generation_code_example.py :language: python :start-after: # Step 2. Obtain pre-processed inputs :end-before: # End step 2 - **Generate layer-outputs** + +Step 4: Generating layer outputs +-------------------------------- + +Generate the specified layer outputs. + +.. tab-set:: + :sync-group: platform + + .. tab-item:: PyTorch + :sync: torch + + .. literalinclude:: ../../legacy/torch_code_examples/layer_output_generation_code_example.py + :language: python + :start-after: # Step 3. Generate outputs + :end-before: # End step 3 + + .. tab-item:: TensorFlow + :sync: tf + + .. literalinclude:: ../../legacy/keras_code_examples/layer_output_generation_code_example.py + :language: python + :start-after: # Step 3. Generate outputs + :end-before: # End step 3 + + .. tab-item:: ONNX + :sync: onnx .. literalinclude:: ../../legacy/onnx_code_examples/layer_output_generation_code_example.py :language: python :start-after: # Step 3. Generate outputs :end-before: # End step 3 + API === diff --git a/Docs/featureguide/analysis tools/quant_analyzer.rst b/Docs/featureguide/analysis tools/quant_analyzer.rst index 1aa311ec384..bd254e249f8 100644 --- a/Docs/featureguide/analysis tools/quant_analyzer.rst +++ b/Docs/featureguide/analysis tools/quant_analyzer.rst @@ -7,89 +7,82 @@ Quantization analyzer Context ======= -The Quantization analyzer (QuantAnalyzer) performs several analyses to identify sensitive areas and -hotspots in your model. These analyses are performed automatically. To use QuantAnalyzer, you pass -in callbacks to perform forward passes and evaluations, and optionally a dataloader for MSE loss -analysis. +The Quantization analyzer (QuantAnalyzer) automatically performs several analyses to identify sensitive areas in your model. 
To use QuantAnalyzer, you pass in callbacks to perform forward passes and evaluations, and optionally a dataloader for mean square error (MSE) loss analysis. -For each analysis, QuantAnalyzer outputs JSON and/or HTML files containing data and plots for +For each analysis, QuantAnalyzer generates JSON and/or HTML files containing the data, and plots for visualization. -Detailed analysis descriptions -============================== +Analysis descriptions +===================== -QuantAnalyzer performs the following analyses: +QuantAnalyzer performs the following analyses. -1. Sensitivity analysis to weight and activation quantization -------------------------------------------------------------- +1: Sensitivity to weight and activation quantization +---------------------------------------------------- QuantAnalyzer compares the accuracies of the original FP32 model, an activation-only quantized model, and a weight-only quantized model. This helps determine which AIMET quantization technique(s) will -be more beneficial for the model. +be more effective in the model. -For example, in situations where the model is more sensitive to activation quantization, Post-training -quantization (PTQ) techniques like Adaptive Rounding (Adaround) or Cross-layer equalization (CLE) might +For example, in situations where the model is more sensitive to activation quantization, post-training +quantization (PTQ) techniques like Adaptive Rounding (Adaround) or Cross-layer Equalization (CLE) might not be very helpful. -Quantized accuracy metric for your model are printed as part of AIMET logging. +Quantized accuracy metrics for your model are printed as part of AIMET logging. -2. Per-layer quantizer enablement analysis ------------------------------------------- +2: Per-layer quantizer enablement +--------------------------------- Sometimes the accuracy drop incurred from quantization can be attributed to only a subset of layers within the model. 
QuantAnalyzer finds such layers by enabling and disabling individual quantizers to observe how the quantized model accuracy metric changes. -The following two types of quantizer enablement analyses are performed: +Two types of quantizer enablement analyses are performed: -1. Disable all quantizers across the model and, for each layer, enable only that layer's output quantizer -and perform evaluation with the provided callback. This results in accuracy values obtained for each -layer in the model when only that layer's quantizer is enabled, exposing the effects of individual -layer quantization and pinpointing culprit layer(s) and hotspots. +1. **One at a time**: Disable all quantizers across the model and, for each layer, enable only that layer's output quantizer. Perform evaluation with the provided callback, giving accuracy values for each +layer in the model when it's the sole quantized layer. This pinpoints hotspots by exposing the effects of individual +layer quantization. -2. Enable all quantizers across the model and, for each layer, disable only that layer's output quantizer -and perform evaluation with the provided callback. Once again, accuracy values are produced for each +2. **Elimination**: Enable all quantizers across the model and, for each layer, disable only that layer's output quantizer. Perform evaluation with the provided callback, giving accuracy values for each layer in the model when only that layer's quantizer is disabled. -As a result of these analyses, AIMET outputs `per_layer_quant_enabled.html` and -`per_layer_quant_disabled.html` respectively, containing plots mapping layers on the x-axis to quantized -model accuracy metrics on the y-axis. +AIMET outputs the results of these analyses as `per_layer_quant_enabled.html` and +`per_layer_quant_disabled.html` respectively. These files contain plots of the quantized +model accuracy metrics for each layer. 
JSON files `per_layer_quant_enabled.json` and `per_layer_quant_disabled.json` are also produced, containing the data shown in the .html plots. -3. Per-layer encodings min-max range analysis ---------------------------------------------- +3: Per-layer encodings min-max range +------------------------------------ -As part of quantization, encoding parameters for each quantizer must be obtained. -These parameters include scale, offset, min, and max, and are used to map floating point values to -quantized integer values. +As part of quantization, encoding parameters for each quantizer must be calculated. +These parameters are used to map floating point values to +quantized integer values and include scale, offset, min, and max. QuantAnalyzer tracks the min and max encoding parameters computed by each quantizer in the model as a result of forward passes through the model with representative data (from which the scale and offset values can be directly obtained). -As a result of this analysis, AIMET outputs html plots and json files for each activation quantizer -and each parameter quantizer (contained in the min_max_ranges folder) containing the encoding min/max -values for each. +AIMET outputs HTML plots and JSON files to the min_max_ranges folder for each activation quantizer +and each parameter quantizer, containing the encoding min/max values for each. -If Per-channel quantization (PCQ) is enabled, encoding min and max values for all the channels -of each weight parameters are shown. +If per-channel quantization (PCQ) is enabled, encoding min and max values are shown for all the channels +of each weight parameter. -4. Per-layer statistics histogram +4: Per-layer statistics histogram --------------------------------- -Under the TF-enhanced quantization scheme, encoding min/max values for each quantizer are obtained -by collecting a histogram of tensor values seen at that quantizer and deleting outliers. 
+Under the TF-enhanced quantization scheme, min/max encoding values for each quantizer are obtained +by deleting outliers from the histogram of tensor values seen at the quantizer. -When this quantization scheme is selected, QuantAnalyzer outputs plots for each quantizer in the model, -displaying the histogram of tensor values seen at that quantizer. +When this quantization scheme is selected, QuantAnalyzer outputs the histogram of tensor values seen at each quantizer in the model. -These plots are available as part of the `activations_pdf` and `weights_pdf` folders, containing a +These plots are available as part of the `activations_pdf` and `weights_pdf` folders. There is a separate .html plot for each quantizer. -5. Per layer mean-square-error (MSE) loss ------------------------------------------ +5: Per-layer mean-square-error loss +----------------------------------- QuantAnalyzer can monitor each layer's output in the original FP32 model as well as the corresponding layer output in the quantized model and calculate the MSE loss between the two. @@ -101,15 +94,15 @@ Approximately **256 samples** are sufficient for the analysis. A `per_layer_mse_loss.html` file is generated containing a plot that maps layer quantizers on the x-axis to MSE loss on the y-axis. A corresponding `per_layer_mse_loss.json` file is generated -containing data corresponding to the .html file. +containing data used in the .html file. Prerequisites ============= -To call the QuantAnalyzer API, you must provide the following: +To call the QuantAnalyzer API, provide the following: - An FP32 pre-trained model for analysis -- A dummy input for the model that can contain random values but which must match the shape of the model's expected input +- A dummy input for the model. 
This can contain random values but it must match the shape of the model's expected input - A user-defined function for passing 500-1000 representative data samples through the model for quantization calibration - A user-defined function for passing labeled data through the model for evaluation, returning an accuracy metric - (Optional, for running MSE loss analysis) A dataloader providing unlabeled data to be passed through the model @@ -122,11 +115,10 @@ To call the QuantAnalyzer API, you must provide the following: Workflow ======== -Code example ------------- +Step 1 Importing libraries +-------------------------- -Step 1 Prepare callback for calibration -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Import required libraries. .. tab-set:: :sync-group: platform @@ -134,14 +126,37 @@ Step 1 Prepare callback for calibration .. tab-item:: PyTorch :sync: torch - **Required imports** - .. literalinclude:: ../../legacy/torch_code_examples/quant_analyzer_code_example.py :language: python :start-after: # Step 0. Import statements :end-before: # End step 0 - **Prepare forward pass callback** + .. tab-item:: TensorFlow + :sync: tf + + .. literalinclude:: ../../legacy/keras_code_examples/quant_analyzer_code_example.py + :language: python + :lines: 39-47 + + .. tab-item:: ONNX + :sync: onnx + + .. literalinclude:: ../../legacy/onnx_code_examples/quant_analyzer_code_example.py + :language: python + :start-after: # Step 0. Import statements + :end-before: # End step 0 + + +Step 2 Preparing the calibration callback +----------------------------------------- + +Prepare the callback for calibration. + +.. tab-set:: + :sync-group: platform + + .. tab-item:: PyTorch + :sync: torch .. literalinclude:: ../../legacy/torch_code_examples/quant_analyzer_code_example.py :language: python @@ -151,20 +166,14 @@ Step 1 Prepare callback for calibration .. tab-item:: TensorFlow :sync: tf - **Required imports** - - .. 
literalinclude:: ../../legacy/keras_code_examples/quant_analyzer_code_example.py - :language: python - :lines: 39-47 - - **Prepare toy dataset to run example code** + **2.1 Prepare toy dataset to run example code** .. literalinclude:: ../../legacy/keras_code_examples/quant_analyzer_code_example.py :language: python :start-after: # Step 0. Prepare toy dataset to run example code :end-before: # End step 0 - **Prepare forward pass callback** + **2.2 Prepare forward pass callback** .. literalinclude:: ../../legacy/keras_code_examples/quant_analyzer_code_example.py :language: python @@ -174,22 +183,15 @@ Step 1 Prepare callback for calibration .. tab-item:: ONNX :sync: onnx - **Required imports** - - .. literalinclude:: ../../legacy/onnx_code_examples/quant_analyzer_code_example.py - :language: python - :start-after: # Step 0. Import statements - :end-before: # End step 0 - - **Prepare forward pass callback** - .. literalinclude:: ../../legacy/onnx_code_examples/quant_analyzer_code_example.py :language: python :start-after: # Step 1. Prepare forward pass callback :end-before: # End step 1 -Step 2 Prepare callback for quantized model evaluation -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Step 3 Preparing the evaluation callback +---------------------------------------- + +Prepare the callback for quantized model evaluation. .. tab-set:: :sync-group: platform @@ -197,8 +199,6 @@ Step 2 Prepare callback for quantized model evaluation .. tab-item:: PyTorch :sync: torch - **Prepare eval callback** - .. literalinclude:: ../../legacy/torch_code_examples/quant_analyzer_code_example.py :language: python :start-after: # Step 2. Prepare eval callback @@ -207,8 +207,6 @@ Step 2 Prepare callback for quantized model evaluation .. tab-item:: TensorFlow :sync: tf - **Prepare eval callback** - .. literalinclude:: ../../legacy/keras_code_examples/quant_analyzer_code_example.py :language: python :start-after: # Step 2. 
Prepare eval callback @@ -217,15 +215,16 @@ Step 2 Prepare callback for quantized model evaluation .. tab-item:: ONNX :sync: onnx - **Prepare eval callback** - .. literalinclude:: ../../legacy/onnx_code_examples/quant_analyzer_code_example.py :language: python :start-after: # Step 2. Prepare eval callback :end-before: # End step 2 + +Step 4 Preparing the model +-------------------------- + +Prepare the model, callback functions, and dataloader as required per platform. .. tab-set:: :sync-group: platform @@ -233,7 +232,7 @@ Step 3 Prepare model and callback functions .. tab-item:: PyTorch :sync: torch - **Prepare model and callback functions** + **Prepare model, callback functions, and data** .. literalinclude:: ../../legacy/torch_code_examples/quant_analyzer_code_example.py :language: python @@ -243,6 +242,8 @@ Step 3 Prepare model and callback functions .. tab-item:: TensorFlow :sync: tf + **Prepare the model** + .. literalinclude:: ../../legacy/keras_code_examples/quant_analyzer_code_example.py :language: python :start-after: # Step 3. Prepare model :end-before: # End step 3 @@ -258,8 +259,10 @@ Step 3 Prepare model and callback functions :start-after: # Step 3. Prepare model, callback functions and dataloader :end-before: # End step 3 -Step 4 Create QuantAnalyzer and run analysis -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Step 5 Creating the QuantAnalyzer +--------------------------------- + +Create the QuantAnalyzer. .. tab-set:: :sync-group: platform @@ -267,48 +270,54 @@ Step 4 Create QuantAnalyzer and run analysis .. tab-item:: PyTorch :sync: torch - **Create QuantAnalyzer object** - .. literalinclude:: ../../legacy/torch_code_examples/quant_analyzer_code_example.py :language: python :start-after: # Step 4. Create QuantAnalyzer object :end-before: # End step 4 - **Run QuantAnalyzer** - - .. 
literalinclude:: ../../legacy/torch_code_examples/quant_analyzer_code_example.py - :language: python - :start-after: # Step 5. Run QuantAnalyzer - :end-before: # End step 5 - .. tab-item:: TensorFlow :sync: tf - **Create QuantAnalyzer object** - .. literalinclude:: ../../legacy/keras_code_examples/quant_analyzer_code_example.py :language: python :start-after: # Step 4. Create QuantAnalyzer object :end-before: # End step 4 - **Run QuantAnalyzer** - - .. literalinclude:: ../../legacy/keras_code_examples/quant_analyzer_code_example.py - :language: python - :start-after: # Step 5. Run QuantAnalyzer - :end-before: # End step 5 - .. tab-item:: ONNX :sync: onnx - **Create QuantAnalyzer object** - .. literalinclude:: ../../legacy/onnx_code_examples/quant_analyzer_code_example.py :language: python :start-after: # Step 4. Create QuantAnalyzer object :end-before: # End step 4 - **Run QuantAnalyzer** + +Step 6 Running the analysis +--------------------------- + +Finally, run the QuantAnalyzer to analyze the data. + +.. tab-set:: + :sync-group: platform + + .. tab-item:: PyTorch + :sync: torch + + .. literalinclude:: ../../legacy/torch_code_examples/quant_analyzer_code_example.py + :language: python + :start-after: # Step 5. Run QuantAnalyzer + :end-before: # End step 5 + + .. tab-item:: TensorFlow + :sync: tf + + .. literalinclude:: ../../legacy/keras_code_examples/quant_analyzer_code_example.py + :language: python + :start-after: # Step 5. Run QuantAnalyzer + :end-before: # End step 5 + + .. tab-item:: ONNX + :sync: onnx .. literalinclude:: ../../legacy/onnx_code_examples/quant_analyzer_code_example.py :language: python