Avoid LDGSTS routing by changing default copy to be universalcopy #1674

Open
wants to merge 1 commit into
base: main
Avoid LDGSTG instructions by changing default copy to be universalcopy
ZelboK committed Aug 1, 2024
commit 8724aeeeedf934b04dba270bb1a87a183395a3f8
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,3 +1,5 @@
# PyCache files
__pycache__/
-cutlass_library.egg-info/
+cutlass_library.egg-info/
+build/
+.vscode/
Author
I'm hoping this is okay to add. I have a habit of running git add . in small repos, and it gets annoying when I accidentally commit my build directory.

@@ -485,7 +485,7 @@ int main(int argc, char const **args) {
// Tiled copy from Smem to Registers
// Note : CuTe will vectorize this copy if the tiling + swizzling above were right
using TiledCopyS2R = TiledCopy<
-Copy_Atom<DefaultCopy, ElementAcc>,
+Copy_Atom<AutoVectorizingCopyWithAssumedAlignment<128>, ElementAcc>,
Author
DefaultCopy used to be an alias of AutoVectorizingCopyWithAssumedAlignment (instantiated with <8> in include/cute/arch/copy.hpp), so the auto-vectorizing copy must now be named explicitly here to keep this copy auto-vectorized. I believe this is the easiest way to limit the scope of this PR and keep the changes minimal.

Layout< Shape<_128,_16>,
Stride<_16,_1>>,
TileShapeS2R>;
@@ -496,9 +496,9 @@ int main(int argc, char const **args) {
cutlass::gemm::TagToStrideC_t<LayoutD>,
cutlass::epilogue::thread::LinearCombination<int32_t, 1, int32_t, int32_t>,
SmemLayout,
-Copy_Atom<DefaultCopy, ElementAcc>,
+Copy_Atom<AutoVectorizingCopyWithAssumedAlignment<128>, ElementAcc>,
TiledCopyS2R,
-Copy_Atom<DefaultCopy, ElementOutput>>>;
+Copy_Atom<AutoVectorizingCopyWithAssumedAlignment<128>, ElementOutput>>>;

//
// Assembling the GemmKernel
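
The call-site changes in this file name the copy operation explicitly instead of relying on the DefaultCopy alias. The following self-contained sketch illustrates why that distinction matters when an alias is redefined. `FixedWidthCopy`, `AutoVecCopy`, `CopyAtom`, and the `before`/`after` namespaces are hypothetical stand-ins introduced only for this illustration; they are not the CuTe definitions, and the sketch makes no claim about how CuTe dispatches on these types.

```cpp
#include <type_traits>

// Hypothetical stand-ins for two kinds of copy operation (NOT CuTe types).
template <int Bits>
struct FixedWidthCopy {};   // models a copy that always moves one Bits-wide value

template <int MaxAlignBits>
struct AutoVecCopy {};      // models a copy that may widen up to MaxAlignBits

// Hypothetical atom that records which operation it was built with.
template <class CopyOp, class Element>
struct CopyAtom {
  using Op = CopyOp;
};

// The alias before and after a redefinition like the one in this PR (assumed shapes).
namespace before { using DefaultCopy = AutoVecCopy<8>; }
namespace after  { using DefaultCopy = FixedWidthCopy<128>; }

// A call site that names the alias silently changes meaning with the alias.
using ViaAliasBefore = CopyAtom<before::DefaultCopy, int>;
using ViaAliasAfter  = CopyAtom<after::DefaultCopy, int>;
static_assert(!std::is_same<ViaAliasBefore::Op, ViaAliasAfter::Op>::value,
              "alias users pick up the new operation");

// A call site that names the operation explicitly is unaffected by the alias.
using Explicit = CopyAtom<AutoVecCopy<128>, int>;
static_assert(std::is_same<Explicit::Op, AutoVecCopy<128>>::value,
              "explicit users keep the operation they spelled out");
```

Under these assumptions, only the call sites that spell out the operation are insulated from the alias change, which is the effect the edits in this file aim for.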
3 changes: 2 additions & 1 deletion include/cute/arch/copy.hpp
@@ -90,7 +90,8 @@ using AutoVectorizingCopy = AutoVectorizingCopyWithAssumedAlignment<128>;
// DefaultCopy alias does not assume alignment of pointers or dynamic strides.
//

-using DefaultCopy = AutoVectorizingCopyWithAssumedAlignment<8>;
+
+using DefaultCopy = UniversalCopy<uint_bit_t<128>>;

//
// Global memory prefetch into L2
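
The alias change above swaps an auto-vectorizing copy with a small assumed alignment for a fixed 128-bit universal copy. As a rough illustration of how an assumed-alignment bound can limit vector width, here is a simplified model. `pick_vector_bits` is a hypothetical helper written for this note; it is not CuTe's selection logic, which also inspects the source and destination layouts.

```cpp
// Simplified, hypothetical model of vector-width selection (NOT CuTe's code).
// Returns the widest power-of-two width in bits that
//   (a) does not exceed the assumed alignment, and
//   (b) evenly divides the contiguous run of elements,
// and never drops below the width of a single element.
constexpr int pick_vector_bits(int assumed_align_bits,
                               int elem_bits,
                               int contiguous_elems) {
  int run_bits = elem_bits * contiguous_elems;
  int vec = assumed_align_bits;
  // Halve the candidate width until it divides the run or reaches one element.
  while (vec > elem_bits && run_bits % vec != 0) {
    vec /= 2;
  }
  return vec < elem_bits ? elem_bits : vec;
}

// 16 contiguous 32-bit elements with 128-bit assumed alignment: full 128-bit vectors.
static_assert(pick_vector_bits(128, 32, 16) == 128, "wide copy");
// Same data with only 8-bit assumed alignment: falls back to one element at a time.
static_assert(pick_vector_bits(8, 32, 16) == 32, "element-wise copy");
// 3 contiguous 32-bit elements (96 bits) cannot be split into 64- or 128-bit pieces.
static_assert(pick_vector_bits(128, 32, 3) == 32, "run too short to widen");
```

In this model, an assumed alignment of 8 bits never yields anything wider than one element, whereas 128 bits allows the widest vectors when the contiguous run permits it.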
2 changes: 1 addition & 1 deletion test/unit/cute/volta/vectorization_auto.cu
@@ -109,7 +109,7 @@ template <class T, class GmemLayout, class RmemTiler>
void
test_copy_vectorization(GmemLayout gmem_layout, RmemTiler rmem_tiler)
{
-test_copy_vectorization<T>(DefaultCopy{}, gmem_layout, rmem_tiler);
+test_copy_vectorization<T>(AutoVectorizingCopyWithAssumedAlignment<128>{}, gmem_layout, rmem_tiler);
Author
DefaultCopy used to be an alias of AutoVectorizingCopyWithAssumedAlignment (instantiated with <8> in include/cute/arch/copy.hpp), so the auto-vectorizing copy must now be named explicitly here to keep this test exercising auto-vectorization. I believe this is the easiest way to limit the scope of this PR and keep the changes minimal.

}

TEST(SM70_CuTe_Volta, SimpleVec)