Avoid LDGSTS routing by changing default copy to be universalcopy #1674
cutlass_library.egg-info/
build/
.vscode/
I'm hoping this is okay to add? I have a habit of just running git add . for small repos, and it gets annoying when I accidentally commit my build directory.
@@ -485,7 +485,7 @@ int main(int argc, char const **args) {
 // Tiled copy from Smem to Registers
 // Note : CuTe will vectorize this copy if the tiling + swizzling above were right
 using TiledCopyS2R = TiledCopy<
-Copy_Atom<DefaultCopy, ElementAcc>,
+Copy_Atom<AutoVectorizingCopyWithAssumedAlignment<128>, ElementAcc>,
Since DefaultCopy used to be AutoVectorizingCopyWithAssumedAlignment<128>, that type must now be specified explicitly here to preserve behavior. I believe this is the easiest way to reduce the scope of this PR and keep changes minimal.
@@ -109,7 +109,7 @@ template <class T, class GmemLayout, class RmemTiler>
 void
 test_copy_vectorization(GmemLayout gmem_layout, RmemTiler rmem_tiler)
 {
-test_copy_vectorization<T>(DefaultCopy{}, gmem_layout, rmem_tiler);
+test_copy_vectorization<T>(AutoVectorizingCopyWithAssumedAlignment<128>{}, gmem_layout, rmem_tiler);
Since DefaultCopy used to be AutoVectorizingCopyWithAssumedAlignment<128>, that type must now be specified explicitly here to preserve behavior. I believe this is the easiest way to reduce the scope of this PR and keep changes minimal.
cc @thakkarV Any ideas on other places in CUTLASS that might need help? Perhaps any Ampere-related issues?
@ccecka I like this change. wdyt?
how about removing |
Or (un)setting All of these changes look very dangerous to me. |
The problem as I understood it was that logically it's a bit unintuitive that
I'm probably missing some context here. Could you give an example of a situation where this change could go wrong? (Might be a stupid question 😅)
I think this could work too, actually. IIUC, if downstream code specifies that it wants a specific async copy, it'll go through a different code path from this anyway. It also wouldn't require changing the default type, so there would be fewer code changes overall.
@ccecka and I had a conversation about this internally, and have some thoughts.
As for the MR itself, you are replacing a copy with a universal copy that assumes 128-bit alignment. This is very dangerous because not every tensor is guaranteed to have that alignment. We are working on a nicer fix for this internally that entails adding a simple assignment copy type that is only used when no atom is specified.
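To make the alignment concern concrete, here is a minimal host-side C++ sketch. It is not CuTe code: `is_aligned` and `copy_floats` are hypothetical helpers written for this illustration. The idea it shows is that a 128-bit vectorized copy is only safe when both pointers really sit on a 16-byte boundary, so a copy that merely assumes this alignment can go wrong on a tensor view that starts at an offset.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of CuTe): true if `p` is aligned to `bits` bits.
inline bool is_aligned(const void* p, std::size_t bits) {
  return reinterpret_cast<std::uintptr_t>(p) % (bits / 8) == 0;
}

// Hypothetical helper: copy `n` floats from `src` to `dst`.
// 16-byte (128-bit) chunks are used only when both pointers are verified to be
// 128-bit aligned; otherwise every element is copied one at a time.
// Returns the number of 128-bit chunks that were copied.
inline std::size_t copy_floats(const float* src, float* dst, std::size_t n) {
  std::size_t chunks = 0;
  std::size_t i = 0;
  if (is_aligned(src, 128) && is_aligned(dst, 128)) {
    for (; i + 4 <= n; i += 4) {
      std::memcpy(dst + i, src + i, 16);  // stands in for one 128-bit vector access
      ++chunks;
    }
  }
  for (; i < n; ++i) {
    dst[i] = src[i];  // element-wise path: valid for any float-aligned pointer
  }
  return chunks;
}
```

With a 16-byte-aligned array `a`, calling `copy_floats(a, b, 8)` takes the chunked path, while `copy_floats(a + 1, b, 8)` starts 4 bytes past the boundary and falls back to element-wise copies. A copy that assumes alignment skips the check entirely, which is why the assumption has to hold for every tensor that reaches it.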
I actually made a mistake I believe and I meant to do |
This PR has been labeled |
This PR has been labeled |
#1672
This PR changes the default copy to be UniversalCopy so that the LDGSTS instruction is avoided. Downstream users will need to specify the copy type explicitly if they want to use it, which is more intuitive.
Note: I imagine all of the tests and examples that are configured to use DefaultCopy will now need to transition over to AutoVectorizingCopyWithAssumedAlignment<128>, as that was the previous copy type. This way we can preserve behavior across benchmarks and tests. Before I make a bunch of changes across files, though, I'd like to get feedback now in case I'm missing anything.