Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster elide middle implementation #2487

Merged
merged 5 commits into from
Sep 2, 2024

Conversation

digit-google
Copy link
Contributor

Drastically speed up the implementation of the ElideMiddle() function by getting rid of std::regex usage abd unnecessary string allocations / reallocations. This also removes about 65 kiB from the stripped Linux binary.

  • Move the function to its own source file and add a performance test.
  • Fix a bug in the implementation that removed color sequences from elided results + avoid allocations (x1.8 speedup)
  • Remove std::regex usage completely, using custom parsing classes (x5.8 speedup)
  • Add fast path to avoid ANSI sequence stripping when not needed.

Overall performance test speedup is x10.6 (from 159ms to 15,0ms) on my machine

@jhasse jhasse added this to the 1.13.0 milestone Aug 30, 2024
@jhasse
Copy link
Collaborator

jhasse commented Aug 30, 2024

ElideMiddle is only used in the unit tests? Can you move the declaration there then?

I'm also not sure if it's a good idea to move so much code around because it breaks the Git history. Especially for the unit tests it would be helpful to see, that your changes don't change any behavior just by looking if you touched any tests.

@jhasse
Copy link
Collaborator

jhasse commented Aug 30, 2024

Forgot to say: Big thanks! I was thinking that std::regex would probably to slow and we should replace it before releasing 1.13.0.

@digit-google
Copy link
Contributor Author

I can keep the unit-test at the same location, but I would rather move the function to its own source file, given its complexity (especially after the refactor).

Another option is to make the unit-tests more readable first, using string macros to encode the ANSI sequences, so things look like:

#define MAGENTA "\x1B[0;35m"
#define NOTHING "\33[m"
#define RED "\x1b[1;31m"
#define RESET "\x1b[0m"

TEST(ElideMiddle, ElideAnsiEscapeCodes) {
  std::string input = "012345" MAGENTA "67890123456789";
  EXPECT_EQ("012..." MAGENTA "6789", ElideMiddle(input, 10));
  EXPECT_EQ("012345" MAGENTA "67...23456789", ElideMiddle(input, 19));

  EXPECT_EQ("Nothing " NOTHING " string.", ElideMiddle("Nothing "NOTHING" string.", 18));
  EXPECT_EQ("0" NOTHING "12...6789", ElideMiddle("0" NOTHING "1234567890123456789", 10));

  input = "abcd" RED "efg" RESET "hlkmnopqrstuvwxyz";
  EXPECT_EQ("", ElideMiddle(input, 0));
  EXPECT_EQ(".", ElideMiddle(input, 1));
  EXPECT_EQ("..", ElideMiddle(input, 2));
  EXPECT_EQ("...", ElideMiddle(input, 3));
  EXPECT_EQ("..." RED RESET "z", ElideMiddle(input, 4));
  EXPECT_EQ("a..." RED RESET "z", ElideMiddle(input, 5));
  EXPECT_EQ("a..." RED RESET "yz", ElideMiddle(input, 6));
  EXPECT_EQ("ab..." RED RESET "yz", ElideMiddle(input, 7));
  EXPECT_EQ("ab..." RED RESET "xyz", ElideMiddle(input, 8));
  EXPECT_EQ("abc..." RED RESET "xyz", ElideMiddle(input, 9));
  EXPECT_EQ("abc..." RED RESET "wxyz", ElideMiddle(input, 10));
  EXPECT_EQ("abcd" RED "..." RESET "wxyz", ElideMiddle(input, 11));
  EXPECT_EQ("abcd" RED "..." RESET "vwxyz", ElideMiddle(input, 12));

  EXPECT_EQ("abcd" RED "ef..." RESET "uvwxyz" ElideMiddle(input, 15));
  EXPECT_EQ("abcd" RED "ef..." RESET "tuvwxyz", ElideMiddle(input, 16));
  EXPECT_EQ("abcd" RED "efg" RESET "...tuvwxyz", ElideMiddle(input, 17));
  EXPECT_EQ("abcd" RED "efg" RESET "...stuvwxyz", ElideMiddle(input, 18));
  EXPECT_EQ("abcd" RED "efg" RESET "h...stuvwxyz", ElideMiddle(input, 19));

  input = "abcdef" RED "A" RESET "BC";
  EXPECT_EQ("..." RED RESET "C", ElideMiddle(input, 4));
  EXPECT_EQ("a..." RED RESET "C", ElideMiddle(input, 5));
  EXPECT_EQ("a..." RED RESET "BC", ElideMiddle(input, 6));
  EXPECT_EQ("ab..." RED RESET "BC", ElideMiddle(input, 7));
  EXPECT_EQ("ab..." RED "A" RESET "BC", ElideMiddle(input, 8));
  EXPECT_EQ("abcdef" RED "A" RESET "BC", ElideMiddle(input, 9));
}

Which is easier to read, then move to a different file. Wdyt?

Use C string macros to hold ANSI escape sequences in order
to make the test much easier to understand and follow.

NOTE: This does not change the test in any way!
Move the function to its own source file since it is
no longer trivial (and will get optimized in future
patches).

+ Add a new ElideMiddleInPlace() function that doesn't
  do anything if no elision is required + use it in
  status.cc.

+ Add a new elide_middle_perftest program to benchmark
  the performance of the function.

  On my machine, the 'avg' result is 159.9ms.
Add a fast-path for the case where there is no escape sequence
in the input, that avoids un-necessary string allocations.

Properly handle ANSI color sequences that appear in the elided
part of the input string. They *must* be preserved to ensure
that the right span is printed with the right color.

For example, consider the following input:

    |GREEN|aaaaaaaaaaaaa|RED|bbbbbbbbbbb|BLACK|

With different levels of elision:

    max_width=0   -> |GREEN|RED|BLACK| instead of ""

    max_width=5   -> |GREEN|a..|RED|b|BLACK|
      (this was |GREEN|a..b|BLACK| before this fix)

Moreoever, do not call std::regex_iterator::str() to avoid
un-necessary string allocations and concatenations.

With this patch, 'elide_middle_perftest' goes from 159.9ms
to 87.1 ms on my machine (x1.8 faster)
Get rid of std::regex and std::vector usage in ElideMiddleInPlace()
function. These are replaced with custom iterator classes that do
not perform any memory allocation at runtime.

Instead the input string will be parsed twice (once to determine the
visible width, and another time to create the elided result string).

With this patch, elide_middle_perftest goes from 87.1ms to 15.0ms
on my machine (x5.8 faster!). This also removes 65 kiB from the
stripped Linux binary for Ninja (!)
Just check for the ESC character in the `output` string,
if none is present, or if the terminal supports color,
avoid unnecessary busy work (string allocations and
parsing).
@jhasse
Copy link
Collaborator

jhasse commented Sep 2, 2024

I like it :) And since we now have to touch all lines anyway, moving it to its own file is fine.

@jhasse jhasse merged commit b2ae865 into ninja-build:master Sep 2, 2024
11 checks passed
@jhasse
Copy link
Collaborator

jhasse commented Sep 2, 2024

Thanks!

1 similar comment
@digit-google
Copy link
Contributor Author

Thanks!

@digit-google digit-google deleted the faster-elide-middle branch September 5, 2024 11:06
digit-google added a commit to digit-google/ninja that referenced this pull request Sep 26, 2024
PR ninja-build#2487 introduced a regression, where a completed command without
an output would force a newline, preventing the next status update
to appear on the same line in smart terminals.

This fixes the issue by adding the missing `!outputs.empty()`
condition + adding a proper regression test to catch future
breaks.

Fixed: ninja-build#2499
digit-google added a commit to digit-google/ninja that referenced this pull request Sep 30, 2024
PR ninja-build#2487 introduced a regression, where a completed command without
an output would force a newline, preventing the next status update
to appear on the same line in smart terminals.

This fixes the issue by adding the missing `!outputs.empty()`
condition + adding a proper regression test to catch future
breaks.

Fixed: ninja-build#2499
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants