difference between java implementation and python #47
Comments
Looks strange - are you able to share the data? |
It's attached in the second line of the issue, or at this link: https://github.com/CamDavidsonPilon/tdigest/files/2613099/my_set.zip |
Hi, I have the same issue here.
```
t.batch_update([1649.0, 69.0, 69.0, 69.0, 69.0, 69.0, 69.0, 69.0])
t.percentile(25)
```
The result was `-523.5`. |
Hi, I noticed this too. Thanks for everything you've done! In the cases it does work, it seems to be doing a great job! However, with this simple example posted above I'm now a little worried. Do you know when this error case occurs? Is this something inherent to TDigests? When should we watch out for something like this happening? |
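One way to spot this kind of failure is to compare the digest's answer with an exact percentile computed from the raw data, and to check that the estimate lies between the smallest and largest input values. The sketch below is a minimal illustration in plain Python; `exact_percentile` and `is_plausible` are hypothetical helper names, not part of the tdigest package, and linear interpolation between ranks is only one of several common percentile definitions.

```python
def exact_percentile(values, p):
    """Exact p-th percentile (0 <= p <= 100), using linear interpolation
    between the two nearest ranks of the sorted data."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be in [0, 100]")
    ordered = sorted(values)
    rank = (p / 100) * (len(ordered) - 1)  # fractional index into the sorted data
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def is_plausible(estimate, values):
    """A percentile estimate can never fall outside [min, max] of the data."""
    return min(values) <= estimate <= max(values)


# The data from the example above.
data = [1649.0, 69.0, 69.0, 69.0, 69.0, 69.0, 69.0, 69.0]

print(exact_percentile(data, 25))  # exact baseline for p25
print(is_plausible(-523.5, data))  # the reported result fails the range check
```

For this data set the exact 25th percentile is 69.0, so a result of -523.5, which is below the minimum input, cannot be explained by approximation error alone.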
@CamDavidsonPilon , @idanmoradarthas , @phucbui95 - I found where this regression was introduced:
The commit before returns the correct results:
Fix: roll the implementation of `def percentile(self, p)` back to the one in 9cd2536. @CamDavidsonPilon , are you able to explain this further? I still have to run your test cases to confirm, but this seems promising based on some manual tests. Examples:
Output with 1f76113 (broken):
Output with 9cd2536 (working):
|
Addendum - this may not be as simple as it seems, but it may be a step in the right direction. I also compared against a test case (bigJump) in the reference implementation (https://github.com/tdunning/t-digest/blob/ff3232bc25a69961fc7bf4911f8de0026bd28c44/core/src/test/java/com/tdunning/math/stats/TDigestTest.java). Here is the Java test - notice the assertions, then compare them with the values from the working Python implementation (9cd2536):
Working Python implementation results (9cd2536):
Output:
Notice that the p94.9999999 and p95 results are still quite different from the reference implementation. This will probably only happen in extreme scenarios, but I thought I'd point it out. Perhaps the reference implementation has some other conditions that handle these extreme cases. I don't think the Python implementation is wrong; this may just be one of the accuracy tradeoffs made with TDigest (just a guess). |
@rlele5 unfortunately I'm not involved much in this repo anymore - if you suggest a fix, with an appropriate test, I'd be happy to review and merge it. |
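As a starting point for such a test, here is a rough sketch of two invariants any percentile implementation should satisfy: every estimate stays within the range of the input data, and estimates never decrease as the percentile increases. `find_violations` is a hypothetical helper, and `nearest_rank` and `broken` are stand-ins so the example runs without the tdigest package; in a real test one would presumably pass the digest's own `percentile` method instead.

```python
def find_violations(percentile_fn, values, percentiles=(0, 5, 25, 50, 75, 95, 100)):
    """Return a list of human-readable invariant violations (empty if none)."""
    lo, hi = min(values), max(values)
    violations = []
    previous = None
    for p in percentiles:
        estimate = percentile_fn(p)
        # Invariant 1: an estimate must lie within the observed data range.
        if not lo <= estimate <= hi:
            violations.append(f"p{p}={estimate} is outside [{lo}, {hi}]")
        # Invariant 2: estimates must be non-decreasing in p.
        if previous is not None and estimate < previous:
            violations.append(f"p{p}={estimate} is below the previous estimate {previous}")
        previous = estimate
    return violations


data = [1649.0, 69.0, 69.0, 69.0, 69.0, 69.0, 69.0, 69.0]
ordered = sorted(data)


def nearest_rank(p):
    """Stand-in for a correct implementation: exact nearest-rank lookup."""
    return ordered[round(p / 100 * (len(ordered) - 1))]


def broken(p):
    """Stand-in that reproduces the reported p25 result, for demonstration only."""
    return -523.5 if p == 25 else nearest_rank(p)


print(find_violations(nearest_rank, data))  # no violations expected
for message in find_violations(broken, data):
    print(message)
```

These checks do not measure accuracy against the reference implementation; they only catch results that are impossible for the given data, such as the negative percentile reported above.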
@CamDavidsonPilon , sounds good. I'll have to go back and understand why the change was made in the first place and will see if there are any further changes needed. |
I have my_set.zip.
When I use the Java code:
I'm getting the results:
But when I use this code:
I'm getting the results:
How come there's such a large difference between the 0.95 quantiles?
P.S. I get the same results when I use: