Merge pull request #743 from AaltoSciComp/yu/update-deeplearning #2728

Triggered via push September 30, 2024 07:16
Status Success
Total duration 1m 4s
Matrix: check-warnings

Annotations

10 warnings
check-warnings (3.12): triton/apps/llms.rst#L2
Explicit markup ends without a blank line; unexpected unindent.
check-warnings (3.12): triton/apps/python.rst#L56
Explicit markup ends without a blank line; unexpected unindent.
check-warnings (3.12): triton/tut/gpu.rst#L277
Explicit markup ends without a blank line; unexpected unindent.
check-warnings (3.12): triton/tut/storage.rst#L2
Explicit markup ends without a blank line; unexpected unindent.
check-warnings (3.12): triton/usage/localstorage.rst#L2
Explicit markup ends without a blank line; unexpected unindent.
check-warnings (3.12): triton/usage/lustre.rst#L2
Explicit markup ends without a blank line; unexpected unindent.
check-warnings (3.12): triton/usage/profiling.rst#L466
Explicit markup ends without a blank line; unexpected unindent.
check-warnings (3.12): triton/usage/smallfiles.rst#L2
Explicit markup ends without a blank line; unexpected unindent.
check-warnings (3.12): triton/usage/profiling.rst#L401
Could not lex literal_block ' Calculating pi using 1000000000 stochastic trials\n ==PROF== Connected to process 3944692 (/home/username/hpc-examples/slurm/pi-gpu)\n ==PROF== Profiling "throw_dart": 0%....50%....100% - 19 passes\n Throws: 785390400/1000000000 Pi: 3.141561508\n ==PROF== Disconnected from process 3944692\n [3944692] [email protected]\nthrow_dart(curandStateXORWOW *, int *, unsigned long *) (512, 1, 1)x(128, 1, 1), Context 1, Stream 7, Device 0, CC 7.0\n Section: GPU Speed Of Light Throughput\n ----------------------- ------------- ------------\n Metric Name Metric Unit Metric Value\n ----------------------- ------------- ------------\n DRAM Frequency cycle/usecond 877.96\n SM Frequency cycle/nsecond 1.24\n Elapsed Cycles cycle 10544719\n Memory Throughput % 49.01\n DRAM Throughput % 0.04\n Duration msecond 8.52\n L1/TEX Cache Throughput % 73.75\n L2 Cache Throughput % 49.01\n SM Active Cycles cycle 8850096.11\n Compute (SM) Throughput % 37.09\n ----------------------- ------------- ------------\n\n OPT This kernel grid is too small to fill the available resources on this device, resulting in only 0.4 full\n waves across all SMs. 
Look at Launch Statistics for more details.\n\n Section: Launch Statistics\n -------------------------------- --------------- ---------------\n Metric Name Metric Unit Metric Value\n -------------------------------- --------------- ---------------\n Block Size 128\n Function Cache Configuration CachePreferNone\n Grid Size 512\n Registers Per Thread register/thread 21\n Shared Memory Configuration Size byte 0\n Driver Shared Memory Per Block byte/block 0\n Dynamic Shared Memory Per Block byte/block 0\n Static Shared Memory Per Block byte/block 0\n Threads thread 65536\n Waves Per SM 0.40\n -------------------------------- --------------- ---------------\n\n Section: Occupancy\n ------------------------------- ----------- ------------\n Metric Name Metric Unit Metric Value\n ------------------------------- ----------- ------------\n Block Limit SM block 32\n Block Limit Registers block 21\n Block Limit Shared Mem block 32\n Block Limit Warps block 16\n Theoretical Active Warps per SM warp 64\n Theoretical Occupancy % 100\n Achieved Occupancy % 39.97\n Achieved Active Warps Per SM warp 25.58\n ------------------------------- ----------- ------------\n\n OPT Estimated Speedup: 60.03%\n This kernel\'s theoretical occupancy is not impacted by any block limit. The difference between calculated\n theoretical (100.0%) and measured achieved occupancy (40.0%) can be the result of warp scheduling overheads\n or workload imbalances during the kernel execution. Load imbalances can occur between warps within a block\n as well as across blocks of the same kernel. See the CUDA Best Practices Guide\n (https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#occupancy) for more details on\n optimizing occupancy.' as "bash". Highlighting skipped.
check-warnings (3.12): triton/usage/quotas.rst#L11
more than one target found for 'any' cross-reference 'smallfiles': could be :doc:`the page on small files` or :std:ref:`the page on small files`
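
The three kinds of warning above usually have small, local fixes. The following reStructuredText is a minimal sketch only, not the actual content of the listed files: the label name `example-label` and the title are hypothetical, and it assumes that the unindent warnings come from a label placed directly above a title, that the profiler output block is declared with bash highlighting, and that the reference in quotas.rst is written with the default `any` role.

```rst
.. "Explicit markup ends without a blank line": leave a blank line
   between an explicit markup block (here a label) and the text after it.

.. _example-label:

Example title
=============

.. "Could not lex literal_block ... as bash": mark program output as
   plain text so that no lexer is applied to it.

.. code-block:: none

   ==PROF== Disconnected from process 3944692

.. "more than one target found for 'any' cross-reference": name the
   role explicitly so that only one target can match.

See :doc:`the page on small files <smallfiles>`.
```

Using `:ref:` instead of `:doc:` in the last line would point at the label rather than the document; either choice removes the ambiguity.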