[Backend] Improve dot support to target FMA #4516
base: main
Branch updated from 68350e9 to 8e620d3
Branch updated from 8e620d3 to 9d01eab
Branch updated from 9d01eab to 3033970
Branch updated from 3033970 to 6907073
This PR:
- Refactors FMA dot implementation
- Supports dot3d in FMA path
- Fixes several issues in operand offset computation
- Enables small dot operands
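For readers unfamiliar with the FMA path, here is a plain C++ reference for what a 3D (batched) dot computes when it is lowered to scalar fused multiply-adds instead of MMA instructions. It is an illustration under stated assumptions, not the PR's code: the function name `dot3dFMA`, the row-major `[batch, M, K] x [batch, K, N]` layouts, and the loop order are all hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference (not from the PR): batched dot
//   D[b][m][n] = C[b][m][n] + sum_k A[b][m][k] * B[b][k][n]
// with every multiply-add issued as one fused multiply-add (std::fma).
// Assumed layouts, all row-major: A is [batch, m, k], B is [batch, k, n],
// C and D are [batch, m, n].
std::vector<float> dot3dFMA(const std::vector<float>& a,
                            const std::vector<float>& b,
                            const std::vector<float>& c, std::size_t batch,
                            std::size_t m, std::size_t n, std::size_t k) {
  std::vector<float> d(c);  // the accumulator starts as a copy of C
  for (std::size_t bi = 0; bi < batch; ++bi) {
    for (std::size_t i = 0; i < m; ++i) {
      for (std::size_t j = 0; j < n; ++j) {
        float acc = d[(bi * m + i) * n + j];
        for (std::size_t kk = 0; kk < k; ++kk) {
          // One FMA per (batch, m, n, k) point.
          acc = std::fma(a[(bi * m + i) * k + kk], b[(bi * k + kk) * n + j], acc);
        }
        d[(bi * m + i) * n + j] = acc;
      }
    }
  }
  return d;
}
```

In this sketch the number of FMA operations is batch × M × N × K, one per innermost iteration, with no shared-memory staging of any kind.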
…ompilation time and reduce the number of instructions in assembly; fix bug with wrong order field used for shared memory load size computation
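The `order` fix mentioned in this commit can be illustrated with a small, hypothetical offset helper. It assumes the convention that `order[0]` names the fastest-varying (contiguous) dimension, as in Triton's shared layouts; `linearOffset` is an illustrative name, not a function from SharedToDotOperandFMA.cpp. Reading the wrong order array changes which dimension is treated as contiguous, and with it every computed offset and contiguous load size.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): linear offset of an N-d element
// inside a densely packed tile. `order` lists dimensions from fastest- to
// slowest-varying (assumed Triton convention: order[0] is contiguous).
std::size_t linearOffset(const std::vector<std::size_t>& idx,
                         const std::vector<std::size_t>& shape,
                         const std::vector<unsigned>& order) {
  std::size_t offset = 0;
  std::size_t stride = 1;
  for (unsigned dim : order) {  // walk from the fastest to the slowest dimension
    offset += idx[dim] * stride;
    stride *= shape[dim];
  }
  return offset;
}
```

For a 4×8 tile, element (1, 2) lands at offset 10 with order {1, 0} (row-major) but at offset 9 with order {0, 1} (column-major). The same loop covers the 3D (batched) case by adding a third entry to `order`.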
Branch updated from 35bae87 to fe8d557
First batch of comments; I still need to review SharedToDotOperandFMA.cpp more carefully.
lib/Conversion/TritonGPUToLLVM/ConvertLayoutOpToLLVM/SharedToDotOperandFMA.cpp (outdated, resolved)
lib/Conversion/TritonGPUToLLVM/ConvertLayoutOpToLLVM/SharedToDotOperandFMA.cpp (outdated, resolved)
When addressing comments, please make sure to add new commits rather than squashing them into existing ones. Otherwise it's hard to re-review.
Branch updated from fe8d557 to 4d70a5e
Hmm, I did not notice this comment in time; let me rework this with merge commits. Typically I rebase my changes on top of the main branch when I have conflicts, because that keeps the history clean and easier to review, but I see that you prefer merge updates, so I'll continue doing it that way.
Branch updated from 4d70a5e to 04678cc
Thanks for improving the implementation! I still have a few nits.
The change is quite extensive and I'm not super familiar with shared layout indexing and such. @zhanglx13, please take another look.
lib/Conversion/TritonGPUToLLVM/ConvertLayoutOpToLLVM/SharedToDotOperandFMA.cpp (outdated, resolved)
lib/Conversion/TritonGPUToLLVM/ConvertLayoutOpToLLVM/SharedToDotOperandFMA.cpp (outdated, resolved)
This now looks good to me, but please wait for @zhanglx13 to double-check the logic there, given it's fairly involved. I'll approve after Lixun approves.
This PR is part of a series of PRs. The final goal is to improve the efficiency of small dot operations and bypass as many shared memory accesses as possible.
Rough list of PRs: