
[Backend] Improve dot support to target FMA #4516

Open
wants to merge 13 commits into base: main
Conversation

binarman (Contributor) commented on Aug 14, 2024

This PR:

  • Refactors FMA dot implementation
  • Supports dot3d in FMA path
  • Fixes several issues in operand offset computation
  • Enables small dot operands

This PR is part of a PR series. The final goal is to improve the efficiency of small dot operations and bypass as many shared memory accesses as possible.
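To make the "small dot operands" item concrete, below is a minimal sketch (not taken from this PR or its test suite) of the kind of kernel the FMA path is meant to serve: a `tl.dot` whose K dimension is below the usual tensor-core minimum. The shapes, kernel name, and the assumption that such a dot is lowered through the FMA path are illustrative only; on Triton builds without this change the frontend may reject such a small K.

```python
# Illustrative sketch of a "small dot" workload (assumed shapes/thresholds,
# not values taken from the Triton source or this PR).
import torch
import triton
import triton.language as tl


@triton.jit
def small_dot_kernel(a_ptr, b_ptr, c_ptr,
                     M: tl.constexpr, N: tl.constexpr, K: tl.constexpr):
    # Load full (small) tiles; with a tiny K the backend may lower tl.dot
    # through FMA instructions instead of MMA/WMMA.
    offs_m = tl.arange(0, M)
    offs_n = tl.arange(0, N)
    offs_k = tl.arange(0, K)
    a = tl.load(a_ptr + offs_m[:, None] * K + offs_k[None, :])  # (M, K), row-major
    b = tl.load(b_ptr + offs_k[:, None] * N + offs_n[None, :])  # (K, N), row-major
    c = tl.dot(a, b)
    tl.store(c_ptr + offs_m[:, None] * N + offs_n[None, :], c)


def run():
    M, N, K = 16, 16, 4  # small K: the operand shape this PR aims to support
    a = torch.randn((M, K), device="cuda", dtype=torch.float32)
    b = torch.randn((K, N), device="cuda", dtype=torch.float32)
    c = torch.empty((M, N), device="cuda", dtype=torch.float32)
    small_dot_kernel[(1,)](a, b, c, M=M, N=N, K=K)
    torch.testing.assert_close(c, a @ b, rtol=1e-4, atol=1e-4)


if __name__ == "__main__":
    run()
```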

Rough list of PRs:

antiagainst (Collaborator) left a comment


First batch of comments; I still need to review SharedToDotOperandFMA.cpp more carefully.

Review threads (outdated, resolved):

  • include/triton/Dialect/TritonGPU/IR/Dialect.h
  • lib/Dialect/TritonGPU/IR/Dialect.cpp
  • third_party/nvidia/backend/compiler.py
  • third_party/amd/backend/compiler.py
  • lib/Conversion/TritonGPUToLLVM/DotOpToLLVM/FMA.cpp (two threads)
  • python/test/unit/language/test_core.py
antiagainst (Collaborator) commented on Sep 23, 2024

When addressing comments, please make sure to add new commits rather than squashing into existing ones. Otherwise it's hard to re-review.

antiagainst changed the title from "Relax dot operand constrains with FMA based dot" to "[Backend] Improve dot support to target FMA" on Sep 23, 2024
binarman (Contributor, Author) commented on Oct 2, 2024

> When addressing comments, please make sure to add new commits rather than squashing into existing ones. Otherwise it's hard to re-review.

Hmm, let me rework this with merge commits; I did not notice this comment in time.

Typically I rebase my changes on top of the main branch when I have conflicts, because that way the history looks clean and is easier to review. But I see that you prefer merge updates, so I'll continue doing it that way.

antiagainst (Collaborator) left a comment


Thanks for improving the implementation! I still have a few nits.

The change is quite extensive and I'm not super familiar with shared layout indexing and such. @zhanglx13, please take another look.

Review threads (outdated, resolved):

  • include/triton/Conversion/TritonGPUToLLVM/Utility.h
  • third_party/nvidia/backend/compiler.py
  • python/test/unit/language/test_core.py
antiagainst (Collaborator) left a comment


This now looks good to me. But please wait for @zhanglx13 to double check the logic there given it's fairly involved. I'll approve after Lixun approves.

antiagainst marked this pull request as ready for review on October 5, 2024, 04:00