Tuesday, 23 March 2010

Libraries - Your Own and The System's - Part 3

This post briefly covers a range of non-essential information about shared libraries, starting with the dynamic linker and then playing around with programs like objdump and strip.

Then we'll do something a bit more interesting - with some knowledge of how we can change the behaviour of the dynamic linker, we can "intercept" calls to a shared library and do whatever we want, including gathering statistics, passing through the arguments to the real function or pretending an error occurred to see what would happen. This means we can intercept standard functions like puts(), malloc() and pthread_create(), and we don't even need to recompile either the program or the library in order to do this.


The Linux Dynamic Linker



The dynamic linker for Linux is, at the time of writing, ld-linux.so.2, and is usually found in the /lib/ directory, along with other essential system libraries like the standard C library. The main job of a dynamic linker is to find the shared libraries a program needs, and to load the relevant parts of those shared libraries into memory, ready for the program to use.

Obviously, if ld-linux has to find libraries, you have to have some way of telling it where those libraries are. First off, there are built-in library search paths it looks in, usually /lib/ and /usr/lib/. Then there are two main ways of specifying library search paths after build-time. The first is to use the environment variable LD_LIBRARY_PATH, which is a ':'-separated list of directories the dynamic linker should look in, much like the PATH environment variable used to tell bash where to look for executables.
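
To make the ':'-separated format concrete, here is a minimal sketch in C that walks such a list one directory at a time. It is only an illustration of the format, not how ld-linux itself is implemented, and the names for_each_dir, dir_visitor and print_dir are made up for this example:

```c
#include <stdio.h>
#include <string.h>

/* Called once per entry; dir is NOT NUL-terminated, so len is needed. */
typedef void (*dir_visitor)(const char *dir, size_t len);

/* Call visit() for every entry in a ':'-separated list such as the
 * value of LD_LIBRARY_PATH, and return the number of entries seen.
 * (Hypothetical helper, for illustration only.) */
size_t for_each_dir(const char *list, dir_visitor visit)
{
    size_t n = 0;

    if (list == NULL || *list == '\0')
        return 0;

    for (;;)
    {
        size_t len = strcspn(list, ":");   /* length up to next ':' or end */

        if (visit != NULL)
            visit(list, len);
        n++;

        if (list[len] == '\0')
            break;
        list += len + 1;                   /* step over the ':' */
    }

    return n;
}

/* Example visitor: print one directory per line. */
void print_dir(const char *dir, size_t len)
{
    printf("%.*s\n", (int)len, dir);
}
```

Calling for_each_dir(getenv("LD_LIBRARY_PATH"), print_dir) from a program (with stdlib.h included for getenv) would list whatever directories are currently set, one per line.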

Generally, you'd just use LD_LIBRARY_PATH for quick-and-dirty tests, like when we need to specify where a newly created library is in this post.

Note that LD_LIBRARY_PATH is searched first, so you could play havoc trying to run any program if you get things wrong. Here, we put an empty C library into the current directory and set LD_LIBRARY_PATH to search there, so it'll find that first, which certainly won't be healthy:

$ ldd `which ls` | grep libc
libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb75f4000)
$ touch libc.so.6
$ LD_LIBRARY_PATH=`pwd` ls
ls: error while loading shared libraries: /home/john/test/libc.so.6:
file too short


Using /etc/ld.so.conf is much neater if you need a more permanent solution. It contains a list of places to search for shared libraries at run-time, with one on each line, not ':'-separated. Unlike LD_LIBRARY_PATH, however, the directories in this file aren't searched every time you run a program.

Instead, the ldconfig program searches the paths it finds in /etc/ld.so.conf and caches their names and locations in /etc/ld.so.cache. This means that any time you add a new directory with new shared libraries to /etc/ld.so.conf, or even if you add new shared libraries to the directories already in /etc/ld.so.conf, you have to re-run the ldconfig program, and you have to run it as root.


Affecting the dynamic linker



Aside from telling the dynamic linker where to find extra libraries, there are a couple of interesting environment variables that'll affect its behaviour.

Firstly, you can get the linker to produce debugging output by setting LD_DEBUG in the environment. The most useful thing to set this to is libs, as this will show you where it's searching for library dependencies, and exactly what libraries the linker finds and uses. You can also set it to symbols, which will show you exactly which library the linker gets each and every symbol from. These will produce a lot of output and send it to stderr, unless you specify somewhere else you want the output in LD_DEBUG_OUTPUT. You can find out what other kinds of debug output are available by setting LD_DEBUG=help and running any dynamic executable.

Then there's LD_BIND_NOW, which I always set in my environment when I'm developing (it doesn't matter what you set it to, so long as it's a non-empty string). The normal behaviour of the linker is to ensure that all the libraries that the executable says it needs are present before running the program. Then, when the program first tries to call a function from one of the libraries, the linker goes and searches through the aforementioned libraries for that symbol. If it finds it, your program continues running in the usual way - if it can't, you get an unavoidable crash.

Setting LD_BIND_NOW changes this behaviour - instead of resolving symbols as they're needed, setting this will force the linker to resolve all symbols before control is handed over to your program. This is incredibly useful to a programmer - it means you can be sure that your compile-time library and your run-time library are compatible, and any differences will get pointed out to you as soon as possible. Otherwise, an unresolved symbol might go unnoticed indefinitely, simply because some error condition never occurs when an executable actually runs.

You can see the effect of LD_DEBUG and LD_BIND_NOW by running a simple program:

$ cat > prog.c << 'eof'
> #include <stdio.h>
> #include <unistd.h>
>
> int main()
> {
>     puts("hello...");
>     sleep(2);
>     puts("...goodbye");
> }
> eof
$ gcc prog.c -o prog
$ ./prog
hello...
...goodbye


And now we'll run the same program, but we'll get the dynamic linker to tell us when it resolves a symbol. You get lots of output from the linker, so only the very end of it is shown below:

$ LD_DEBUG=symbols ./prog
(lots of output...)
3347: transferring control: ./prog
3347:
3347: symbol=puts; lookup in file=./prog [0]
3347: symbol=puts; lookup in file=/lib/tls/i686/cmov/libc.so.6 [0]
hello...
3347: symbol=sleep; lookup in file=./prog [0]
3347: symbol=sleep; lookup in file=/lib/tls/i686/cmov/libc.so.6 [0]
...goodbye
3347:
3347: calling fini: ./prog [0]
3347:
3347:
3347: calling fini: /lib/tls/i686/cmov/libc.so.6 [0]
3347:


3347 is the program's PID. As you can see, the output shown starts when control is transferred to prog. The next lines show the linker looking for the symbol puts, starting in the executable and then moving onto libc.so.6, where it is of course found - it can then be called, and we see the hello... message from the call to puts. The linker then goes on a similar search, this time looking for sleep, and then we see ...goodbye without a symbol search - puts has already been found and stored internally, ready to be quickly used again (you can change that behaviour though, as we'll do later in this post).
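
The "look it up once, then remember it" behaviour can be mimicked in plain C. This is only a toy model, assuming nothing about the linker's internals (the real linker typically patches an entry in a table rather than testing a pointer on every call), and every name in it is made up for illustration:

```c
#include <stdio.h>

typedef int (*puts_fn)(const char *);

static puts_fn resolved_puts = NULL;   /* remembered result of the search */
static int lookups = 0;                /* how many searches we have done */

/* Stand-in for the linker's symbol search (hypothetical). */
static puts_fn lookup_puts(void)
{
    lookups++;
    return puts;
}

/* First call: search for the symbol and remember where it is.
 * Later calls: go straight to the remembered address. */
int lazy_puts(const char *s)
{
    if (resolved_puts == NULL)
        resolved_puts = lookup_puts();

    return resolved_puts(s);
}

/* Lets a caller see how many searches were needed. */
int lookup_count(void)
{
    return lookups;
}
```

Calling lazy_puts() twice only performs one search, which mirrors the single puts lookup in the LD_DEBUG output above.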

What if the linker hadn't been able to find sleep? Well, you'd have half a program run - we can prevent this using LD_BIND_NOW:

$ LD_DEBUG=symbols LD_BIND_NOW=yes ./prog
(lots of output...)
3392: symbol=__libc_start_main; lookup in file=./prog [0]
3392: symbol=__libc_start_main; lookup in file=/lib/tls/i686/cmov/libc.so.6 [0]
3392: symbol=sleep; lookup in file=./prog [0]
3392: symbol=sleep; lookup in file=/lib/tls/i686/cmov/libc.so.6 [0]
3392: symbol=puts; lookup in file=./prog [0]
3392: symbol=puts; lookup in file=/lib/tls/i686/cmov/libc.so.6 [0]
3392: symbol=_r_debug; lookup in file=./prog [0]
3392: symbol=_r_debug; lookup in file=/lib/tls/i686/cmov/libc.so.6 [0]
3392: symbol=_r_debug; lookup in file=/lib/ld-linux.so.2 [0]
(...more output...)
3392: transferring control: ./prog
3392:
hello...
...goodbye
3392:
3392: calling fini: ./prog [0]
3392:
3392:
3392: calling fini: /lib/tls/i686/cmov/libc.so.6 [0]
3392:


Here, puts and sleep are resolved before control is transferred to prog - if there's a problem with any symbol the program uses, no matter if it's from an esoteric, never-used error-handler, you're guaranteed to know about it as soon as you run your program.

There are other environment variables that affect the dynamic linker - they're detailed in the ld-linux(8) man page. We'll be using one of these extra ones, LD_BIND_NOT, later.


Symbol visibility and stripping libraries



Not all symbols in a shared library are available for linking against - just because it's there, doesn't mean you can use it.

Let's take an example - a simple library with two dummy functions:

$ cat > library.c << 'eof'
static void foo() {}
void bar() { foo(); }
eof
$ gcc library.c -shared -o library.so -fPIC
$ nm library.so | egrep 'foo|bar'
00000431 T bar
0000042c t foo


So, bar() is an ordinary function, and ends up in library.so as a symbol of type "T", which means it's in the text (code) section of the resulting shared library. Nothing new there. foo(), on the other hand, has been declared static. This is not just a language-level construct that exists only in C - it's also enforced at the level of the linker. The symbol foo is shown to be of type "t". This is exactly the same as "T", except that it means that foo is a local symbol, and so a linker can't (in reality, will refuse to) link against it. It can still be used internally by anything in library.so, though.

A couple of examples (that we'll also use a bit later) show that we can use bar():

$ cat > use-bar.c << 'eof'
> int main() { bar(); }
> eof
$ gcc use-bar.c -o use-bar -L. -lrary
$ LD_LIBRARY_PATH=. ./use-bar


But we can't even link if we try and use foo():

$ cat > use-foo.c << 'eof'
> int main() { foo(); }
> eof
$ gcc use-foo.c -o use-foo -L. -lrary
/tmp/ccsw1ueO.o: In function `main':
use-foo.c:(.text+0x7): undefined reference to `foo'
collect2: ld returned 1 exit status


Even though there is definitely a symbol foo in library.so, our linker has refused to use it.

And now onto stripping programs and libraries, which will be fairly simple if you've understood the previous discussion.

Because programs and libraries contain some (usually lots of) symbols they simply don't need, we can just get rid of them. For a program, the symbols it exposes don't matter - all internal references are against sections and/or offsets. And through the magic of virtual memory, the program can always be loaded at the same virtual address, regardless of where it actually is in physical memory:

$ objdump -x use-bar | egrep -w 'start address|_start'
start address 0x08048410
08048410 g F .text 00000000 _start


The above shows that the symbol _start will always be located at virtual address 08048410, and that this is the start address of the program. Having this particular symbol present is purely a mechanism that allows us to inspect object files and libraries more easily - it's never used when executing the program, only the hard-coded address is.
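
The start address that objdump reports comes from the e_entry field of the ELF header, so it can be read without any symbol table at all. The following is a rough sketch, assuming a Linux system with elf.h available and a file whose byte order matches the machine running the code. read_entry_point is a made-up helper name:

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Read the entry point ("start address") from an ELF file's header.
 * Returns 0 on success, -1 if the file can't be read or isn't ELF.
 * Assumes the file has the same endianness as the host. */
int read_entry_point(const char *path, unsigned long long *entry)
{
    unsigned char hdr[sizeof(Elf64_Ehdr)];   /* big enough for either class */
    FILE *f = fopen(path, "rb");
    size_t got;

    if (f == NULL)
        return -1;

    got = fread(hdr, 1, sizeof hdr, f);
    fclose(f);

    /* Every ELF file starts with the four magic bytes "\177ELF". */
    if (got < sizeof(Elf32_Ehdr) || memcmp(hdr, ELFMAG, SELFMAG) != 0)
        return -1;

    if (hdr[EI_CLASS] == ELFCLASS32)
    {
        Elf32_Ehdr eh;
        memcpy(&eh, hdr, sizeof eh);
        *entry = eh.e_entry;
    }
    else if (hdr[EI_CLASS] == ELFCLASS64 && got == sizeof(Elf64_Ehdr))
    {
        Elf64_Ehdr eh;
        memcpy(&eh, hdr, sizeof eh);
        *entry = eh.e_entry;
    }
    else
        return -1;

    return 0;
}
```

Because this only looks at the header, it should give the same answer before and after the file has been stripped.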

So, let's use the strip program to cut down the size of use-bar. This is how it starts out:

$ du -bh use-bar
8.1K use-bar
$ nm use-bar
08049f18 d _DYNAMIC
08049ff4 d _GLOBAL_OFFSET_TABLE_
0804859c R _IO_stdin_used
w _Jv_RegisterClasses
08049f08 d __CTOR_END__
08049f04 d __CTOR_LIST__
08049f10 D __DTOR_END__
08049f0c d __DTOR_LIST__
080485a0 r __FRAME_END__
08049f14 d __JCR_END__
08049f14 d __JCR_LIST__
0804a014 A __bss_start
0804a00c D __data_start
08048550 t __do_global_ctors_aux
08048440 t __do_global_dtors_aux
0804a010 D __dso_handle
w __gmon_start__
0804854a T __i686.get_pc_thunk.bx
08049f04 d __init_array_end
08049f04 d __init_array_start
080484e0 T __libc_csu_fini
080484f0 T __libc_csu_init
U __libc_start_main@@GLIBC_2.0
0804a014 A _edata
0804a01c A _end
0804857c T _fini
08048598 R _fp_hw
0804839c T _init
08048410 T _start
U bar
0804a014 b completed.6990
0804a00c W data_start
0804a018 b dtor_idx.6992
080484a0 t frame_dummy
080484c4 T main


8.1 kB in size, and 35 symbols. To strip it, just pass it as an argument to strip:

$ strip use-bar
$ du -bh use-bar
5.4K use-bar
$ nm use-bar
nm: use-bar: no symbols


So we've reduced our executable's size by a third, and gotten rid of everything in its regular symbol table. At this stage, use-bar does still contain some symbols - they're all symbols it won't know the value of until runtime, such as where certain sections are, and what undefined symbols it has - i.e. they're all dynamic:

$ nm --dynamic use-bar
0804859c R _IO_stdin_used
w _Jv_RegisterClasses
0804a014 A __bss_start
w __gmon_start__
U __libc_start_main
0804a014 A _edata
0804a01c A _end
0804857c T _fini
0804839c T _init
U bar


Now we want to do the same for our library - but there's a problem. We can't remove all the symbols - if we did that, both the link-time linker and the dynamic linker would have no way of finding out if the symbols it needs are even in our library, let alone how to actually get hold of the code/data it needs at run-time!

We can, however, get rid of just the unnecessary symbols by passing --strip-unneeded when we strip:

$ du -bh library.so
6.6K library.so
$ strip --strip-unneeded library.so
$ du -bh library.so
5.3K library.so


And we can still link and run our use-bar program, just as before.

The savings for library.so are around 20%, but these figures scale well when you strip larger libraries, especially if they're written in C++ since name-mangling can make for ludicrously long symbol names. For example, I just tried stripping the unneeded symbols from a C++ library I'm developing at work - it reduced it from 481 kB to 106 kB.

So why would you even want to keep these unneeded symbols, if they're taking up that much space? Debugging is probably the main thing you'd want them for - a debugger can associate offsets and locations within an executable with its symbols to show you the function you're in or the variable you're accessing. Without these symbols, it can just say what address you're accessing, which is basically useless. This is why libraries and programs are usually only stripped when they're released.


Wrapping library functions called from pre-compiled binaries



Now we'll do something a bit off the wall - we're going to create a wrapper for a standard shared library function that we'll be able to use without having to recompile either the library we're wrapping or the executable whose calls we're intercepting. Our function will be able to do whatever it wants before/after calling the real library function, so the functionality provided is exactly the same, as far as the program's concerned. This method, of course, can only work with shared libraries.

This can be useful, for example, for finding out when memory is allocated or freed with malloc()/free(), when threads are created, and so on, when you don't want to (or simply can't) recompile the program from source. You could also use this technique for error testing - want to know how your program will react if a standard library call fails? Just wrap it up in something that'll pretend to fail, returning an appropriate error code (don't forget to emulate the full behaviour of the call, like setting errno to something appropriate!).
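
As a rough idea of what "pretending to fail" looks like, here is a sketch of a stand-in for read() that always reports an I/O error. It is called fake_read (a made-up name) so that it can be compiled and tried on its own - a real interposer would have to use the name of the function it replaces, and be built as a shared library in the way described in the rest of this post:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in for read() that always fails the way a real I/O error
 * would: it returns -1 and sets errno, just as the man page says
 * the real call does on error. */
ssize_t fake_read(int fd, void *buf, size_t count)
{
    (void)fd;      /* unused: we never touch the descriptor */
    (void)buf;
    (void)count;

    errno = EIO;   /* "Input/output error" */
    return -1;
}
```

The choice of EIO is arbitrary here - any error the real call is documented to return would do, depending on which failure you want to test.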

So, on with it...

When the dynamic linker first encounters an as-yet unresolved symbol, it searches through the list of libraries the executable needs and checks each one for that symbol, until it finds one with the correct symbol. However, the list of shared libraries a program needs is embedded into the executable at link-time, and our library won't be on that list. We get around this by setting the LD_PRELOAD environment variable, which specifies a list of extra libraries to load and search before any others.

So, to demonstrate the first stage (i.e. getting our function called instead of the standard library function) we'll need a program with a call to, say, the puts function that we can intercept, and some code to put in a shared library to masquerade as puts. Note that we'll compile a simple test program now, and we won't have to recompile it again. Also note that we don't have to put anything exotic on the command-line when we compile our program (...and the cards are neatly shuffled, and there's nothing up my sleeves...). Afterwards, we'll intercept calls from a more complicated program - cat.

So, our test program is:

$ cat > program.c << 'eof'
#include <stdio.h>

int main(void)
{
    puts("Don't stop me now");
}

eof
$ gcc program.c -o program
$ ./program
Don't stop me now


Okay, so we just need our "stooge" puts implementation, that'll just print an extra space between each character:

$ cat > wrapper.c << 'eof'
> #include <stdio.h>
>
> int puts(const char *s)
> {
>     while (*s)
>     {
>         putchar(*s++);
>         putchar(' ');
>     }
>     putchar('\n');
>     return 0;
> }
>
> eof
$ gcc wrapper.c -shared -o wrapper.so -fPIC


And now we can just LD_PRELOAD our library, and our function will be the preferred version of puts:

$ LD_PRELOAD=./wrapper.so ./program
D o n ' t s t o p m e n o w


One important thing to note is that the function signature should be the same as for the real puts, including the return type. If you know how arguments and return values are passed between functions, you'll know how important that is - if not, that's something I plan to go into at some other point. Either way, you can't go wrong if you just use the same signature, so include the relevant header file and the compiler will complain if you get it wrong.

The next stage is to pass the function call through to the real function, so we can call arbitrary functions without having to do too much work. Now, we obviously can't just call the real function, because the dynamic linker would think that's a call to our faux function, and we'd end up in an infinite loop. The programming interface to the dynamic linker, however, has a feature that's exactly what we need.

In short, the programming interface to the dynamic linker allows us to request, at run-time, to open a dynamic library and resolve symbols from it. This is mostly used for plugins - a program can search for dynamic libraries in some directory it knows contains plugins, open them, query them, and access their symbols. Of course, it only allows you to deal with raw symbols, so all type-safety is lost, but it'll let us do what we need for this purpose.
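
A minimal sketch of the plugin pattern, using dlopen(), dlsym() and dlclose() from dlfcn.h, might look like the following. The helper name call_plugin and the idea that a plugin exports a function taking and returning an int are assumptions made purely for the example:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Open the shared library at `path`, look up `name` and call it as a
 * function taking and returning an int. Returns 0 on success and -1
 * if the library or the symbol can't be found. */
int call_plugin(const char *path, const char *name, int arg, int *result)
{
    /* RTLD_NOW: resolve all of the library's symbols before returning */
    void *handle = dlopen(path, RTLD_NOW);
    int (*fn)(int);

    if (handle == NULL)
        return -1;

    /* dlsym() hands back a plain void * - nothing checks that the
     * symbol really has the type we are about to call it with. */
    *(void **)(&fn) = dlsym(handle, name);

    if (fn == NULL)
    {
        dlclose(handle);
        return -1;
    }

    *result = fn(arg);
    dlclose(handle);
    return 0;
}
```

Depending on the C library version, a program using this may need to be linked with -ldl. The odd-looking cast on the dlsym() line is the usual way of storing its void * result in a function pointer without upsetting the compiler.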

The function dlsym() (declared in dlfcn.h) returns a pointer to the location of a symbol in memory. It takes two arguments - a handle to a shared library to search, and the name of a symbol to search for. Normally, a handle is obtained from dlopen(), but it's more convenient for us to use the special handle RTLD_NEXT (this is a GNU extension, so we need to define _GNU_SOURCE to use this). As its name implies, this special handle directs dlsym() to start searching for the symbol from the next library in the dynamic linker's search path onwards - this is exactly the symbol that would have been found if our function hadn't barged in and been found first.

So, we can put this into practice with a new version of wrapper.c, which we now have to link against libdl.so:

$ cat > wrapper.c << 'eof'
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>

int (*__real_puts)(const char *s) = NULL;
int lines = 0;

int puts(const char *s)
{
    if (__real_puts == NULL)
    {
        __real_puts = dlsym(RTLD_NEXT, "puts");

        if (__real_puts == NULL)
        {
            return EOF;
        }
    }

    printf("line %d: ", ++lines);

    return __real_puts(s);
}

eof
$ gcc wrapper.c -shared -o wrapper.so -fPIC -ldl
$ LD_PRELOAD=./wrapper.so ./program
line 1: Don't stop me now


And there you have it - library functions intercepted with no recompiling.

We can make this slightly neater by using gcc's function attributes - the section on function attributes in the GCC manual has the details. We'll use the constructor attribute, which tells gcc to run a function before main() is called, and the destructor attribute, which tells gcc to run a function after main(), or when exit() is called:

$ cat > wrapper.c << 'eof'
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int (*__real_puts)(const char *s) = NULL;
int lines = 0;

void __attribute__ ((__constructor__))
wrapper_construct(void)
{
    __real_puts = dlsym(RTLD_NEXT, "puts");

    if (__real_puts == NULL)
        exit(EXIT_FAILURE);
}

void __attribute__ ((__destructor__))
wrapper_destruct(void)
{
    printf("Wrote %d line(s)\n", lines);
}

int puts(const char *s)
{
    printf("line %d: ", ++lines);

    return __real_puts(s);
}

eof
$ gcc wrapper.c -shared -o wrapper.so -fPIC -ldl
$ LD_PRELOAD=./wrapper.so ./program
line 1: Don't stop me now
Wrote 1 line(s)
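
The constructor and destructor attributes aren't specific to shared libraries, so they can be tried out in isolation. A small sketch, with made-up names throughout, that lets main() check the constructor really has already run:

```c
#include <stdio.h>

static int setup_ran = 0;

/* Runs before main(), because of the constructor attribute. */
static void __attribute__ ((__constructor__))
demo_construct(void)
{
    setup_ran = 1;
}

/* Runs after main() returns, or when exit() is called. */
static void __attribute__ ((__destructor__))
demo_destruct(void)
{
    puts("destructor ran");
}

/* Lets a caller check, from main(), whether the constructor has run. */
int setup_has_run(void)
{
    return setup_ran;
}
```

If setup_has_run() is called as the very first thing in main(), it should already return 1, and "destructor ran" should be the last thing the program prints.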



Example - changing the behaviour of cat



Now we'll have some fun - we'll use this method to massively over-complicate the common beginner's exercise of printing a program's source code along with its line numbers. We'll do this by finding and intercepting whatever call the standard UNIX utility cat uses to print to the screen.

To find out what function call cat uses to print lines, we can use LD_DEBUG=symbols to find out what symbols the dynamic linker needs to resolve, and also LD_BIND_NOT (set to anything non-empty). Normally, when the linker is asked to look up a symbol, it stores its value so it doesn't need to look it up again - LD_BIND_NOT inhibits this behaviour. This means we can look at the symbol that's been looked up just before cat actually prints any output, and know that that's the call we need to intercept:

$ echo 'This is our output' | LD_DEBUG=symbols LD_BIND_NOT=yes cat
(...lots of output...)
3032: symbol=write; lookup in file=cat [0]
3032: symbol=write; lookup in file=/lib/tls/i686/cmov/libc.so.6 [0]
This is our output
(...more output...)


So, the call we obviously need to intercept here is write, since that occurs just before our output is printed. This is using cat from GNU Coreutils version 7.4.

If you're not familiar with write(), have a look at its man page. In short, its signature is ssize_t write(int fd, const void *buf, size_t count). It writes up to count bytes from buf to the file descriptor fd, and returns the number of bytes actually written (which can be fewer than count), or -1 after setting errno.
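
Because write() is allowed to write fewer bytes than it was asked to, callers that need everything written usually loop. This isn't needed for the wrapper in this post, but as a sketch of how the return value is normally handled (write_all is a made-up helper name):

```c
#include <errno.h>
#include <unistd.h>

/* Keep calling write() until all `count` bytes have gone out.
 * Returns 0 on success, or -1 with errno set by the failing write(). */
int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;

    while (count > 0)
    {
        ssize_t n = write(fd, p, count);

        if (n < 0)
        {
            if (errno == EINTR)   /* interrupted before writing: retry */
                continue;
            return -1;
        }

        p += n;                   /* short write: carry on from here */
        count -= (size_t)n;
    }

    return 0;
}
```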

Here's an implementation of write-wrapper.so that does what we want:

/* write-wrapper.c
*
* Copyright (c) 2010 John Graham (johngavingraham@gmail.com).
*
* Wrapper for the write() system call that prints the line
* number of anything written to stdout using this call,
* along with the text that would have been printed.
*
* See http://prognix.blogspot.com/.
*/

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <dlfcn.h>


/* pointer to the "real" write() */
ssize_t (*__real_write)(int fd,
                        const void *buf,
                        size_t count);



/* __constructor__ means this is called before main() */
void __attribute__ ((__constructor__))
write_wrapper_construct(void)
{
    __real_write = dlsym(RTLD_NEXT, "write");

    if (__real_write == NULL)
    {
        puts("Error: could not find symbol \"write\" - exiting");
        exit(EXIT_FAILURE);
    }
}



int current_line = 1;

ssize_t write(int fd, const void *buf, size_t count)
{
    // 1 is stdout - pass writes to any other file
    // descriptor through to the real write()
    if (fd != 1)
        return __real_write(fd, buf, count);

    // strsep alters the string we pass it,
    // so make a copy
    char *buf_cpy = strndup(buf, count);

    if (buf_cpy == NULL)
        return -1;

    // purely so we can free() later
    char *buf_start = buf_cpy;
    char *buf_ptr = strsep(&buf_cpy, "\n");

    do
    {
        int ret = printf("Line %d: %s\n", current_line++, buf_ptr);

        if (ret < 0)
        {
            free(buf_start);
            return -1;
        }

        buf_ptr = strsep(&buf_cpy, "\n");
    }
    while (buf_cpy != NULL);

    free(buf_start);

    return count;
}

/* end write-wrapper.c */


Compile as we did before, and test on its own source code:

$ gcc write-wrapper.c -shared -o write-wrapper.so -fPIC -ldl
$ LD_PRELOAD=./write-wrapper.so cat write-wrapper.c
Line 1: /* write-wrapper.c
Line 2: *
Line 3: * Copyright (c) 2010 John Graham (johngavingraham@gmail.com).
Line 4: *
Line 5: * Wrapper for the write() system call that prints the line
Line 6: * number of anything written to stdout using this call,
Line 7: * along with the text that would have been printed.
Line 8: *
Line 9: * See http://prognix.blogspot.com/.
Line 10: */

(...and so on...)

Line 81:
Line 82: /* end write-wrapper.c */


As a bonus, we get to use the features of cat for free, like its ability to take many files on the command-line, or to read from stdin by using the file - (assuming our implementation of the above is correct... corrections welcome!).
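
One such possible correction: the buffer passed to write() is a run of bytes, not a C string, so strndup() will stop early if the data happens to contain a zero byte. A sketch of scanning the buffer with memchr() instead, which doesn't care about zero bytes, might start from something like this (count_newlines is a made-up helper, and this is not a drop-in replacement for the wrapper above):

```c
#include <stddef.h>
#include <string.h>

/* Count the '\n' bytes in the first `count` bytes of buf, treating
 * the buffer as raw bytes rather than as a C string. */
size_t count_newlines(const void *buf, size_t count)
{
    const char *p = buf;
    const char *end = p + count;
    size_t lines = 0;

    while (p < end)
    {
        const char *nl = memchr(p, '\n', (size_t)(end - p));

        if (nl == NULL)     /* trailing text with no newline yet */
            break;

        lines++;
        p = nl + 1;         /* continue after the newline */
    }

    return lines;
}
```

The same loop could print each piece between p and nl with a line number, using the "%.*s" form of printf to give the length explicitly.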
