
Expose pt_image_add? #37

Open
vext01 opened this issue Mar 6, 2018 · 10 comments

vext01 commented Mar 6, 2018

Hi Markus,

Is there a way to load code into the image from a memory address of the current process?

It looks like pt_image_add might be what I want, but this is not exposed in libipt.

I'm trying to avoid having to dump the VDSO to file, just to immediately load it back in.

Come to think of it, since my process is tracing itself, I could load all of the code sections from memory and avoid filesystem accesses entirely.

Thanks

@markus-metzger

Hello Edd,

you could register a read memory callback in struct pt_config. The decoder will call you for all addresses it cannot find in its image.

That won't work together with the block cache, though, since the cache is organized per image section. Decoding will be significantly slower without it.

We already support mmap()-based and fread()-based sections. We could add a third type for in-memory sections. But I would not do it without good reason. What's your motivation?

@ck-on-github

I guess adding support for in-memory sections with the block cache would also be beneficial for JITted code. It would eliminate the need for dumping the code to files just for PT decoding purposes.

vext01 commented Mar 6, 2018

> We could add a third type for in-memory sections. But I would not do it without good reason. What's your motivation?

I'm writing a tracing JIT using PT for the trace collection component.

I'll have a profiling interpreter for some language which decides which parts of a user program are frequently executed. Once a location becomes "hot" I'll collect a PT trace, decode it, optimise it, and compile a trace for later executions of the same location.

Under this scenario, the traced and the tracing process are the same (the process traces itself), and all of the code needed to decode the trace is already in virtual memory. Ideally I'd just point libipt at the memory containing the code rather than reading anything from disk.

@markus-metzger

You're also decoding in-process?

If I understood correctly, you'd just want to generate a single section spanning the entire address space. I see how this would be more convenient. But it would require a different organization of the block cache. I'd also expect the lookup to be slower.

If you're willing to create individual sections matching your code layout, we could keep the current block cache organization. But I'm not sure we really gain a lot by not dumping the memory into files first. Of course, this is extra work, but how much is it really compared to decode and other overhead?

Adding an in-memory section type shouldn't be too difficult but it would still be a pity if we didn't gain anything.

Another aspect is self-modifying code. If the JITer is going to overwrite older versions of a JITed function (or otherwise re-use the memory), we'd have to dump them into files, anyway, unless we can make sure that they won't appear in any trace anymore.

vext01 commented Mar 7, 2018

Hi Markus,

Yes, I'm decoding in-process.

Ideally I'd be able to tell libipt that all code comes from the current virtual address space, but I don't mind loading the individual sections into the image if that fits better with the architecture you already have.

As for tracing JITted code: in my use case that will never happen. We will only trace code that was statically compiled.

@markus-metzger

What improvements do you expect from an in-memory section? Have you profiled the code?

vext01 commented Mar 7, 2018

The code is still being written, but I'll take it as a given that reading from memory is going to be faster than reading from disk.

Also, under the current API, I have to dump the VDSO to disk and then have libipt read it back in, which seems odd and inefficient to me.

vext01 commented Mar 21, 2018

What did you think of this, Markus? Is my use case too niche for inclusion in libipt?

@markus-metzger

I'm hesitating because it is not clear to me that this will result in a noticeable performance improvement.

vext01 commented Mar 27, 2018

Having thought about this some more, I think the performance improvement would only be visible when loading the image, which is a one-time event for most use cases.

I suppose I should have been spinning this as a usability improvement. It's a bit awkward having to dump the VDSO just to load it back in later.

I've also noticed that libipt lazily reads from the VDSO file on demand during decoding. Because I'm using Rust (and I know this is my problem, not yours!) I have to pass the file handle around for the sole purpose of keeping the temporary file alive long enough: as soon as the handle falls out of scope, the file is deleted.
