
[triton][tool] Add support for printing shared memory layouts in the triton-tensor-layout tool #4839

Merged — 8 commits, Oct 3, 2024

Conversation

@SamGinzburg (Contributor)

The triton-tensor-layout CLI tool currently only supports printing distributed layouts, although support for printing shared memory layouts existed previously (see the earlier commit). This PR adds support for printing shared memory layouts again (using linear layouts).

I've added several unit tests based on the old tests to ensure correctness.

Example usage:

./triton-tensor-layout -l "#triton_gpu.shared<{vec = 4, perPhase = 32, maxPhase = 1, order = [1,0], hasLeadingOffset = false}>" -t "tensor<8x32xf16>"
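For readers unfamiliar with the shared-encoding parameters in the example above, here is a hypothetical sketch (not the tool's actual code; the function name and the row-major `order = [1,0]` assumption are mine) of how `vec`, `perPhase`, and `maxPhase` XOR-swizzle a `(row, col)` element into a shared-memory offset:

```cpp
#include <cstdint>

// Hypothetical illustration of Triton's shared-memory swizzle, assuming
// order = [1, 0] (row-major) and a tensor with `cols` columns.
int64_t swizzledOffset(int64_t row, int64_t col, int64_t cols,
                       int64_t vec, int64_t perPhase, int64_t maxPhase) {
  int64_t phase = (row / perPhase) % maxPhase; // phase advances every perPhase rows
  int64_t group = col / vec;                   // columns swizzle in vec-sized groups
  int64_t swizzledGroup = group ^ phase;       // XOR-swizzle the group index
  int64_t swizzledCol = swizzledGroup * vec + col % vec;
  return row * cols + swizzledCol;
}
```

Note that with `maxPhase = 1`, as in the example command line, the phase is always 0 and the mapping degenerates to a plain row-major layout.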

================================================================
The core Triton team is small, and we receive many PRs (thank
you!). To help us review your code more quickly, if you are a new
contributor (fewer than 3 PRs merged) we ask that you complete the following
tasks and include the filled-out checklist in your PR description.

Complete the following tasks before sending your PR, and replace [ ] with
[x] to indicate you have done them.

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these
    rules.

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • Select one of the following.

    • I have added tests.
      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
    • This PR does not need a test because FILL THIS IN.
  • Select one of the following.

    • I have not added any lit tests.
    • The lit tests I have added follow these best practices,
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)

Sam Ginzburg and others added 3 commits October 2, 2024 14:35
…layout` CLI tool.

Add several tests for shmem printing based on previously committed
tests + new ones.
@ThomasRaoux (Collaborator)

Looks nice!

@@ -152,6 +152,12 @@ void dumpHWLayout(RankedTensorType tensorType);
// Return a string representation of the layout of the tensor.
std::string getLayoutStr(RankedTensorType tensorType, bool useHWPointOfView);

// Helper functions for getLayoutStr
std::string getSharedLayoutStr(RankedTensorType tensorType,
@fywkevin (Contributor) commented on Oct 2, 2024:

What is the purpose of exposing these two lower-level APIs here, since getLayoutStr dispatches to one of them? Isn't getLayoutStr enough to capture their full functionality?

@SamGinzburg (Contributor, Author) replied:
You're right; we probably don't need to export these.

Fixed.
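For context, the design the reviewer is suggesting is the classic "public dispatcher, private helpers" pattern. A simplified sketch with stand-in types (an enum instead of the real MLIR encoding attributes, and placeholder bodies) looks like this:

```cpp
#include <string>

// Stand-in for the tensor's layout encoding (the real code inspects
// MLIR attributes; this enum is purely illustrative).
enum class Encoding { Shared, Distributed };

namespace {
// File-local helpers: not declared in the header, so they cannot be
// called from other translation units.
std::string getSharedLayoutStr() { return "shared layout"; }
std::string getDistributedLayoutStr() { return "distributed layout"; }
} // namespace

// The only public entry point; it forwards to the right helper.
std::string getLayoutStr(Encoding enc) {
  return enc == Encoding::Shared ? getSharedLayoutStr()
                                 : getDistributedLayoutStr();
}
```

Keeping the helpers in an anonymous namespace (or as static functions) shrinks the public API surface while preserving the full functionality through the single dispatcher.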

auto sharedLayout = shared(/* vec */ 1, /* perPhase */ 1, /* maxPhase */ 4,
/* hasLeadingOffset */ false,
/* CTALayout, single CTA layout */
{1}, /* cpg */
@fywkevin (Contributor) commented on Oct 3, 2024:

Minor: this comment is a little confusing since it mixes two styles: in the first half, the comment comes before the argument, and in the second half it comes after. Better to make them consistent.

}
}
} else {
// For the HW view here, print the (block, offset) --> (r,c) mapping
// For the HW view here, print the (block, offset) --> (r,c) mapping

A contributor commented:
I don't have a deep understanding of linear layouts, but from my superficial understanding, they can represent registers -> tensor elements or shared memory -> tensor elements. This seems to only print the layout from the shared-memory perspective? What about the register one?

@SamGinzburg (Contributor, Author) replied:
This is only executed in the getSharedLayoutStr function, so it can only represent shared memory in this case; the register perspective is handled at line 3353.

@fywkevin (Contributor) replied on Oct 3, 2024:
Oh, I just saw your reply. Got it, thanks! The register view is for the distributed encoding.

@fywkevin (Contributor) left a review comment:

It is a nice utility that extends the capability of the layout printing tool! Although I don't have deep knowledge of linear layouts, the heavy lifting all seems to happen inside ll->apply(). This implementation looks pretty nice to me, thank you!

@ThomasRaoux (Collaborator) left a review comment:

Thanks

@SamGinzburg SamGinzburg marked this pull request as ready for review October 3, 2024 13:07
@embg embg enabled auto-merge (squash) October 3, 2024 13:49
@embg embg merged commit 5f77e8c into triton-lang:main Oct 3, 2024
7 checks passed
4 participants