PTRACE/DEBUGGER GUERILLA TACTICS FOR ADDING CODE TO A PROCESS POST STARTUP
--------------------------------------------------------------------------

  (aside: This should really improve our ability to deal with Solaris/other
   proprietary environments).

Several cases allow us to simply activate our code and then detach ourselves.
We would like to have all the system dependent stuff handled by gdb or some
system debugger so that we get tons of work and portability for free.

  dynamically linked (or statically linked with libdl)

    We can just call putenv() (for PCT and LD_PRELOAD) and dlopen() from the
    debugger.  Everything is then identical to the pre-exec environment
    variable case.
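
    A minimal sketch of driving this from a script, assuming gdb is the
    debugger and that it can resolve putenv() and dlopen() in the target.
    The helper name, the pid, the library path and the PCT value are all
    hypothetical; RTLD_NOW = 2 is the glibc/Linux value and may differ
    elsewhere.  It only builds the command line, it does not run gdb.

```python
import shlex

RTLD_NOW = 2  # assumed value: matches glibc on Linux, check <dlfcn.h> elsewhere


def gdb_inject_argv(pid, env, library):
    """Build a gdb command line that sets env vars and dlopen()s a library."""
    argv = ["gdb", "-p", str(pid), "-batch"]
    for name, value in env.items():
        # putenv() takes a single "NAME=value" string.
        # No escaping is done here, so values must not contain quotes.
        argv += ["-ex", 'call (int) putenv("%s=%s")' % (name, value)]
    argv += ["-ex", 'call (void *) dlopen("%s", %d)' % (library, RTLD_NOW)]
    argv += ["-ex", "detach"]
    return argv


if __name__ == "__main__":
    # Hypothetical pid, path and PCT value, for illustration only.
    env = {"LD_PRELOAD": "/tmp/libpct.so", "PCT": "1"}
    argv = gdb_inject_argv(1234, env, "/tmp/libpct.so")
    print(" ".join(shlex.quote(a) for a in argv))
```

    The resulting list can be handed to subprocess.run() once the values are
    real.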

  dynamically linked, exporting a sub-dlopen interface to the dynamic linker.

    Similar to the above.  Can still call some kind of _dl_ function to get
    our code properly bound into the address space.  This simply relies on
    the dynamic linker actually being linked in with available symbols and
    a decent interface.  Not sure if Linux supports this.

  dynamically linked or statically linked with symbols (at least open,mmap)

    We can open our code file, mmap() its text and mmap(ANON) a data region.
    We must rewrite our discovery/setup/signal code to explicitly indirect
    through a global symbol table.  Pretty easy since this code is so short
    and uses so little external functionality.

    Use debugger's understanding of symbol resolution to bootstrap a symbol
    table for our dynamically added code.  E.g., call functions in our added
    code from the debugger (mmap() + offset_of_add_to_table)(&open).
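
    The bootstrap calls could be generated along these lines.  This is only
    a sketch: it assumes gdb expression syntax, an add_to_table entry point
    taking a single pointer, and a known offset of that entry point within
    the mmap()ed region.  The function name and all numbers are made up.

```python
def gdb_register_calls(region_base, add_to_table_offset, symbols):
    """Return one gdb 'call' command per symbol to register in the table.

    region_base is the address mmap() returned in the target, and
    add_to_table_offset is where our (hypothetical) add_to_table routine
    sits inside the mapped code.
    """
    entry = region_base + add_to_table_offset
    # Cast the raw address to a function pointer and call it with &symbol,
    # letting the debugger do the symbol resolution for us.
    return ["call ((void (*)(void *)) %#x)(&%s)" % (entry, name)
            for name in symbols]


if __name__ == "__main__":
    # Hypothetical mapping address and offset.
    for cmd in gdb_register_calls(0x7F0000000000, 0x40, ["open", "mmap"]):
        print(cmd)
```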

    Even if ldd does not report load addresses we can discover them if some
    debugger knows shared libs.  For example, 'nm libc', 'size libc', and the
    dynamic value of &open from the debugger give us what we need for libc.
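
    The arithmetic is simple enough to sketch.  This assumes nm prints
    "address type name" lines and that the addresses it reports for a shared
    library are offsets from the load base, which is typical but not
    guaranteed.  The function names and sample values are illustrative only.

```python
def parse_nm(text):
    """Map symbol name -> file offset from lines like '000f7e40 T open'."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        # Undefined symbols have no address column, so they give 2 fields.
        if len(parts) == 3 and parts[1] in {"T", "t", "W", "w"}:
            # Drop any version suffix such as 'open@@GLIBC_2.2.5'.
            table[parts[2].split("@")[0]] = int(parts[0], 16)
    return table


def load_base(table, name, runtime_addr):
    """Load address of the library, given one symbol's runtime address."""
    return runtime_addr - table[name]


def resolve(table, base, name):
    """Runtime address of any other symbol in the same library."""
    return base + table[name]


if __name__ == "__main__":
    # Made-up nm output and a made-up runtime value of &open.
    sample = "0000000000001000 T open\n0000000000002000 W mmap\n"
    syms = parse_nm(sample)
    base = load_base(syms, "open", 0x7F0000001000)
    print(hex(base), hex(resolve(syms, base, "mmap")))
```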

    We can call putenv() for PCT and LD_PRELOAD.

    It should be pretty easy to make this idea portable to a wide range of
    systems (e.g. everywhere some scriptable debugger understands shared
    objects).

Bear in mind for these next two that the sort of profiling report one can get
is limited.  Annotated disassembly is the best one can hope for.  It is not at
all clear that they are worth the trouble.  It's still an interesting challenge
to see how far ptrace() guerilla tactics can take us.  Also, profil() is
sufficient since all code is in one big region.

  statically linked, stripped (the hardest case for a detachable controller)
    
    We need to make open() and mmap() syscalls in the context of the process
    before we can even get started.  These probably only require syscall
    numbers and the debugger's ability to manipulate registers.  This is the
    tricky and highly non-portable part of the problem.  E.g., one could read
    and update the stack pointers to extend the stack and use whatever kernel
    calling convention is appropriate.  Yuck.  At least this is only necessary
    for two functions.
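
    To give a feel for how little is needed, here is a sketch of the register
    values for one platform.  It assumes x86-64 Linux only (syscall number in
    rax, arguments in rdi, rsi, rdx, r10, r8, r9; open is 2, mmap is 9).
    Other systems differ, which is exactly the non-portable part.  It only
    computes values; getting the target to actually execute a syscall
    instruction with them is the tricky bit and is not shown.

```python
# x86-64 Linux only; every other platform needs its own tables.
SYSCALL_NR = {"open": 2, "mmap": 9}
ARG_REGS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")


def syscall_registers(name, *args):
    """Return the register values a debugger would have to load."""
    if len(args) > len(ARG_REGS):
        raise ValueError("too many syscall arguments")
    regs = {"rax": SYSCALL_NR[name]}
    regs.update(zip(ARG_REGS, args))
    return regs


if __name__ == "__main__":
    PROT_READ, PROT_WRITE = 1, 2            # Linux values
    MAP_PRIVATE, MAP_ANONYMOUS = 0x02, 0x20  # Linux x86-64 values
    # mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
    print(syscall_registers("mmap", 0, 4096, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0))
```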

    It isn't even clear if manual function calls can be done this way.  Another
    idea is to disassemble the libc stubs for open() and mmap(), identify the
    syscall convention and relocation issues (e.g. errno), create an offline
    assembly snippet, extend the stack, copy the snippet onto the stack, and
    call the address on the stack as if it were a function.  Should work in
    theory if the stack is executable, which I think is ancient tradition...

    Once we have open() and mmap(), we can add code and that added code can
    operate independently provided it has its own system call stubs and doesn't
    use libc.  On a lot of systems it should be possible to fake out this
    stuff.  GNU libc surely documents syscall conventions.

    If that doesn't pan out, we may be able to statically link our code against
    libc.  There are relocatability issues.  Most linkers allow one to tweak
    the load address of data.  With an mmap() whose first argument works we
    could simply carve out the same portion of VM every time.  If the first
    argument does not work then we could actually re-link the object file
    *after* mmap() tells us the value (it's obviously critical that re-linking
    not change the size of the file).  We could write on top of the existing
    file so that the kernel just updates the mmap()d data.  Then presto, we
    have another copy of libc.  There is duplication of some things like errno
    that in principle should be shared.  In practice, it would probably be ok
    to have a pair of them, syscalls in our code using one, syscalls in the
    main code using another.

    We cannot call putenv() for PCT and LD_PRELOAD.  In fact we cannot even
    reliably know where the 'environ' which the target process exec() sees is
    located in VM.  [ We might be able to disassemble libc:getenv() and do an
    approximate string search for the code and infer the address. Hmm. ]

    Thus, while we can confer the property of being profiled to children, it
    will not be preserved across exec() unless we actually trace fork()s and
    exec()s and go through our hackish code addition process a number of times.
    That requires staying attached.  Oh well.  gdb, at least, can trace fork()s
    and exec()s, so this may still be possible in a debugger script context,
    though it is dubious that "portability" is a concern when we are hacking
    syscall stubs onto a new stack frame...

  statically linked, stripped (non-portable)

    Finally, we may be willing to stay attached and add one process per target
    process to the system load.  In this case we would reimplement a small
    portion of debugger functionality for any platform we need to work on.
    All we really need to do is query the program counter/instruction pointer,
    which is a pretty easy target process register inspection.
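
    What the attached controller would do with those program counter samples
    can be sketched as a histogram.  The bucket mapping below follows the
    profil() convention as I understand it (scale 65536 means one counter per
    2 bytes of text); treat that, and the function names, as assumptions.
    How the samples are obtained from the target is not shown.

```python
def bucket_index(pc, offset, scale):
    """Map a program counter value to a histogram slot, profil()-style."""
    return ((pc - offset) // 2) * scale // 65536


def histogram(samples, offset, scale, nbuckets):
    """Count sampled pc values that fall inside the profiled region."""
    counts = [0] * nbuckets
    for pc in samples:
        if pc < offset:
            continue  # below the start of the region
        i = bucket_index(pc, offset, scale)
        if i < nbuckets:
            counts[i] += 1  # samples past the end are silently dropped
    return counts


if __name__ == "__main__":
    # Made-up samples around a region starting at 0x1000.
    print(histogram([0x1000, 0x1001, 0x1002, 0x2000], 0x1000, 65536, 4))
```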

    However the timers and signals must still operate in the scheduling context
    of the target process. [ It isn't clear how valuable real-time sampling is
    compared to scheduled time sampling. ] If we can figure out how to call
    setitimer(2), profil(2) and/or open/mmap/write to store data, then we may
    as well just do the last idea and detach the controller.  The only
    motivation to stay attached seems to be forcing inheritance of profiling
    upon exec()d children which are also statically linked.