PTRACE/DEBUGGER GUERILLA TACTICS FOR ADDING CODE TO A PROCESS POST STARTUP
--------------------------------------------------------------------------
(aside: This should really improve our ability to deal with Solaris/other
proprietary environments).

Several cases allow us to simply activate our code and then detach ourselves.
We would like to have all the system-dependent stuff handled by gdb or some
system debugger so that we get tons of work and portability for free.

dynamically linked (or statically linked with libdl)
    Can just call putenv(PCT, LD_PRELOAD), dlopen() from the debugger.
    All is then identical to the pre-exec environment variable case.

dynamically linked, exporting a sub-dlopen interface to the dynamic linker
    Similar to the above.  Can still call some kind of _dl_ function to get
    our code properly bound into the address space.  This simply relies on
    the dynamic linker actually being linked in with available symbols and
    a decent interface.  Not sure if Linux supports this.

dynamically linked or statically linked with symbols (at least open, mmap)
    We can open our code file, mmap() its text and mmap(ANON) a data region.
    We must rewrite our discovery/setup/signal code to explicitly indirect
    through a global symbol table.  Pretty easy since this code is so short
    and uses so little external functionality.

    Use the debugger's understanding of symbol resolution to bootstrap a
    symbol table for our dynamically added code.  E.g., call functions in
    our added code from the debugger:
    (mmap() + offset_of_add_to_table)(&open).

    Even if ldd does not report load addresses we can discover them if some
    debugger knows shared libs.  For example, 'nm libc', 'size libc', and
    the dynamic value of &open from the debugger give us what we need for
    libc.

    Can call putenv(PCT, LD_PRELOAD).

    It should be pretty easy to make this idea portable to a wide range of
    systems (e.g. everywhere some scriptable debugger understands .so's).
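
A rough sketch of the symbol-table indirection and the load-address arithmetic
from the "with symbols" case.  Only add_to_table is named in these notes;
sym_slot, sym_table, lookup_sym, pct_open, load_base and runtime_addr_of are
made-up names, and the slot-index argument is an assumption about how the
table might be keyed.  It also assumes the values printed by 'nm libc' are
offsets from the library's load base, which is typical for a shared object but
not guaranteed everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One slot per external function the added code needs (hypothetical set). */
enum sym_slot { SYM_OPEN, SYM_MMAP, SYM_WRITE, SYM_COUNT };

/* Global symbol table, filled in by the debugger after the code is mapped. */
static uintptr_t sym_table[SYM_COUNT];

/* Called from the debugger, once per slot, with the address the debugger
 * resolved for that symbol.  Returns 0 on success, -1 for a bad slot. */
int add_to_table(int slot, uintptr_t addr)
{
    if (slot < 0 || slot >= SYM_COUNT)
        return -1;
    sym_table[slot] = addr;
    return 0;
}

/* The added code fetches addresses from the table instead of relying on
 * link-time binding.  A result of 0 means "not registered yet". */
uintptr_t lookup_sym(int slot)
{
    assert(slot >= 0 && slot < SYM_COUNT);
    return sym_table[slot];
}

/* Example of a typed call site inside the added code. */
typedef int (*open_fn)(const char *path, int flags, ...);
static inline open_fn pct_open(void)
{
    return (open_fn)lookup_sym(SYM_OPEN);
}

/* Load base of a library: the runtime address of one known symbol (e.g.
 * &open as printed by the debugger) minus that symbol's value from nm. */
uintptr_t load_base(uintptr_t runtime_addr, uintptr_t nm_value)
{
    assert(runtime_addr >= nm_value);
    return runtime_addr - nm_value;
}

/* Runtime address of any other symbol in the same library. */
uintptr_t runtime_addr_of(uintptr_t base, uintptr_t nm_value)
{
    return base + nm_value;
}
```

The debugger would call add_to_table() once per slot right after the mmap(),
and the discovery/setup/signal code would then go through accessors like
pct_open() rather than naming open() directly.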

Bear in mind for these next two that the sort of profiling reports one can
get is limited.  Annotated disassembly is the best one can hope for.  It is
not at all clear that they are worth the trouble.  It's still an interesting
challenge to see how far ptrace() guerilla tactics can take us.  Also,
profil() is sufficient since all code is in one big region.

statically linked, stripped (the hardest case for a detachable controller)
    We need to make open() and mmap() syscalls in the context of the process
    before we can even get started.  These probably only require syscall
    numbers and the debugger's ability to manipulate registers.  This is the
    tricky and highly non-portable part of the problem.  E.g., one could
    read and update the stack pointers to extend the stack and use whatever
    kernel calling convention is appropriate.  Yuck.  At least this is only
    necessary for two functions.  It isn't even clear if manual function
    calls can be done this way.

    Another idea is to disassemble the libc stubs for open() and mmap(),
    identify the syscall convention and relocation issues (e.g. errno),
    create an offline assembly snippet, extend the stack, copy the snippet
    onto the stack, and call the address on the stack as if it were a
    function.  Should work in theory if the stack is executable, which I
    think is ancient tradition...

    Once we have open() and mmap(), we can add code, and that added code can
    operate independently provided it has its own system call stubs and
    doesn't use libc.  On a lot of systems it should be possible to fake out
    this stuff.  GNU libc surely documents syscall conventions.

    If that doesn't pan out, we may be able to statically link our code
    against libc.  There are relocatability issues.  Most linkers allow one
    to tweak the load address of data.  With an mmap() whose first argument
    works we could simply carve out the same portion of VM every time.
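
A minimal sketch of the "first argument works" check, assuming a POSIX-ish
mmap() with anonymous mappings.  map_at_hint is a made-up helper name.  It
passes the desired address as a plain hint and reports whether the kernel
honored it.  MAP_FIXED is deliberately not used, since it would silently
replace whatever the target already has mapped there.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Ask for len bytes of anonymous memory at hint.  Returns the mapping, or
 * NULL on failure.  *honored is set to 1 if the kernel placed the mapping
 * exactly at hint, 0 otherwise. */
void *map_at_hint(void *hint, size_t len, int *honored)
{
    assert(len > 0 && honored != NULL);

    void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        *honored = 0;
        return NULL;
    }
    *honored = (p == hint);
    return p;
}
```

If *honored comes back 1 for the chosen address, the added code can be linked
once for that address and reused as-is.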
    If the first argument does not work then we could actually re-link the
    object file *after* mmap() tells us the value (it's obviously critical
    that re-linking not change the size of the file).  We could write on top
    of the existing file so that the kernel just updates the mmap()d data.
    Then presto, we have another copy of libc.  There is duplication of some
    things like errno that in principle should be shared.  In practice, it
    would probably be ok to have a pair of them, syscalls in our code using
    one, syscalls in the main code using another.

    Cannot call putenv(PCT, LD_PRELOAD).  In fact we cannot even reliably
    know where the 'environ' which the target process's exec() sees is
    located in VM, [ we might be able to disassemble libc:getenv() and do an
    approximate string search for the code and infer the address.  Hmm. ].
    Thus, while we can confer the property of being profiled to children, it
    will not be preserved across exec() unless we actually trace fork()s and
    exec()s and go through our hackish code addition process a number of
    times.  That requires staying attached.  Oh well.  gdb, at least, can
    trace fork()s and exec()s, so this may still be possible in a debugger
    script context, though it is dubious that "portability" is a concern
    when we are hacking syscall stubs onto a new stack frame...

statically linked, stripped (non-portable)
    Finally, we may be willing to stay attached and add one process per
    target process to the system load.  In this case we would reimplement a
    small portion of debugger functionality for any platform we need to work
    on.  All we really need to do is query the program counter/instruction
    pointer, which is a pretty easy target process register inspection.
    However, the timers and signals must still operate in the scheduling
    context of the target process.  [ It isn't clear how valuable real-time
    sampling is compared to scheduled-time sampling. ]

    If we can figure out how to call setitimer(2), profil(2) and/or
    open/mmap/write to store data, then we may as well just do the last idea
    and detach the controller.  The only motivation to stay attached seems
    to be forcing inheritance of profiling upon exec()d children which are
    also statically linked.
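
For the stay-attached variant, a sketch of the two pieces the controller would
need: reading the target's program counter and binning it.  All names
(pc_hist, hist_record, read_pc) are made up.  The histogram is only
profil(2)-like in spirit, with a simple power-of-two bin size rather than
profil's scale argument.  read_pc is shown for Linux on x86-64 only and
assumes the target is already attached and stopped; other platforms would
need their own register-fetch call.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__) && defined(__x86_64__)
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#endif

/* Histogram over one contiguous text region. */
struct pc_hist {
    uintptr_t       base;   /* lowest address covered */
    size_t          nbins;  /* number of counters in bins[] */
    unsigned        shift;  /* each bin covers (1 << shift) bytes */
    unsigned short *bins;   /* saturating counters */
};

/* Count one sample.  Returns 0 if pc fell inside the region, -1 if not. */
int hist_record(struct pc_hist *h, uintptr_t pc)
{
    assert(h != NULL && h->bins != NULL);

    if (pc < h->base)
        return -1;
    size_t idx = (size_t)((pc - h->base) >> h->shift);
    if (idx >= h->nbins)
        return -1;
    if (h->bins[idx] != USHRT_MAX)  /* saturate instead of wrapping */
        h->bins[idx]++;
    return 0;
}

#if defined(__linux__) && defined(__x86_64__)
/* Fetch the instruction pointer of a traced, stopped process.
 * Returns 0 on success, -1 if ptrace() fails. */
int read_pc(pid_t pid, uintptr_t *pc)
{
    struct user_regs_struct regs;

    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
        return -1;
    *pc = (uintptr_t)regs.rip;
    return 0;
}
#endif
```

A controller loop would stop the target, call read_pc(), pass the result to
hist_record(), and resume it.  Note that this samples on the controller's
clock, so it gives real-time sampling rather than sampling in the target's
scheduling context.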