Suppose you have a very efficient algorithm that performs a certain action (say, floating-point division in O(n)), and you want to use it in 10 different programs in your real-time embedded project. What is the best approach in terms of performance and size? Or suppose you want to use an open source algorithm, packaged as a library, in one specific program in your project. What is the best approach there?
In this article, I will explain what libraries are at a high level, and which approach to take in common usage scenarios.
What is a library?
A library is a container of one or more objects which export some functionality (an API). Examples of libraries are the C library and the OpenSSL cryptographic library. The interface to a library is a header file (or more than one) which contains the library’s function prototypes, data types and macros. A program which needs the services of a library must include this header file at compile time, and link with the library’s binary file at link time.
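The split between header and binary can be sketched as follows. This is only an illustration: the library name fastdiv, the function fastdiv_divide and the status macros are hypothetical names invented for this example, and the three parts are shown in one listing although they would normally live in separate files.

```c
#include <stdio.h>

/* ---- fastdiv.h: hypothetical public header (the library's interface) ---- */
#define FASTDIV_OK        0
#define FASTDIV_EDIVZERO  1

typedef struct {
    double quotient;  /* valid only when status == FASTDIV_OK */
    int    status;    /* FASTDIV_OK or FASTDIV_EDIVZERO */
} fastdiv_result;

fastdiv_result fastdiv_divide(double numerator, double denominator);

/* ---- fastdiv.c: hypothetical implementation, compiled into the library ---- */
fastdiv_result fastdiv_divide(double numerator, double denominator)
{
    fastdiv_result r = { 0.0, FASTDIV_OK };

    if (denominator == 0.0) {
        r.status = FASTDIV_EDIVZERO;
        return r;
    }
    r.quotient = numerator / denominator;
    return r;
}

/* ---- caller: a program that includes fastdiv.h and links with the library ---- */
int print_ratio(double a, double b)
{
    fastdiv_result r = fastdiv_divide(a, b);

    if (r.status != FASTDIV_OK) {
        fprintf(stderr, "division failed (status %d)\n", r.status);
        return -1;
    }
    printf("%g / %g = %g\n", a, b, r.quotient);
    return 0;
}
```

The caller only needs the prototypes and types from the header to compile. The body of fastdiv_divide is needed only at link time, which is where the choice between a shared and a static library comes in.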
Some general library facts
- There are two types of libraries: Shared and Static.
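- Static libraries are not linked when created; they are just archives of object files. Shared libraries are produced by the linker, but their references to other libraries are resolved only when a program is loaded.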
- Libraries do not have a main( ) function.
- A library does not have a context of its own. It runs in the context of the calling program.
- A library can rely on other libraries for functionality it requires. In this case, the calling program must link with all the needed libraries. This is often called “dependency hell”, because the chain of dependencies can result in a very long link list (library A requires library B, which requires library C, etc.).
- Shared libraries are compiled with the Position Independent Code switch on (discussed later). Static libraries need it only when their objects may end up inside a shared library.
A shared library is a library which can be used by more than one program at the same time. The library’s code is shared between all the programs and is not duplicated, while the library’s data is private to each program. Therefore, each program can call a function in a shared library with no fear of corrupting another program’s data, while saving precious space by not duplicating the same code. Because the same code is mapped into several processes, and each process in the Linux user space has its own address space in which the library may land at a different address, shared libraries must be compiled with the Position Independent Code switch on. Each process accesses the shared library’s code in its own context, in a different address space.
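The per-program data isolation can be sketched in a few lines. Note the assumptions: this does not build or load a real shared library. A plain global variable (lib_counter, a hypothetical name) stands in for the library’s data, and fork() stands in for two separate programs. The underlying principle is the same, since writable data pages are private to each process.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stands in for a global variable that lives in a library's data section. */
static int lib_counter = 0;

int lib_increment(void) { return ++lib_counter; }
int lib_get(void)       { return lib_counter; }

/*
 * Run `times` increments in a child process and return the counter value
 * the child ended up with (passed back through its exit status, so it must
 * fit in 0..255). Returns -1 on error.
 */
int run_in_child(int times)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: modifies its own private copy of the data. */
        for (int i = 0; i < times; i++)
            lib_increment();
        _exit(lib_get());
    }

    /* Parent: wait for the child and collect its final counter value. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

After run_in_child() returns, lib_get() in the parent still reports the value it had before the call: the child’s writes landed in the child’s own copy of the data, just as one program’s writes to a shared library’s data are invisible to other programs.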
The shared library’s code resides in the target’s file system, typically in the /lib or /usr/lib directory. The library’s binary file contains the entire contents of the library, because at the time it is created, there is no way to know which functions will not be used by any program. The library’s code is loaded on demand. The shared library may be big, but if a program is using only some of it, only the required part will be brought into RAM. This reduces the amount of RAM a program consumes, but also increases the time it takes to load and run the program the first time. Note that there is a way to reduce the final library size by removing unused functions. This will be described in an advanced article.
Calling a function in a shared library is slightly less efficient than calling the same function if it were part of the program. The call is resolved at run time and goes through an extra level of indirection, and since the library’s code is located “somewhere else”, at a distant address, a cache miss is also more likely.
The naming convention of shared libraries is libxxxxx.so, where “.so” stands for Shared Object. When a program wants to link with a shared library, it should specify the -l flag with the library’s name without the “lib” prefix and “.so” suffix (i.e. -lxxxxx). See the article about specifying compilation flags for more details.
The list of all the shared libraries a program is using is exposed by the kernel in the proc file /proc/<pid>/maps. See the article about getting the proc filesystem for more details.
Here’s an example of the contents of a program’s maps file:
# cat /proc/152/maps
00008000-0002a000 r-xp 00000000 1f:02 35 /bin/busybox
00031000-00032000 rw-p 00021000 1f:02 35 /bin/busybox
00032000-00037000 rwxp 00032000 00:00 0 [heap]
04000000-04005000 r-xp 00000000 1f:02 122 /lib/ld-uClibc-0.9.29.so
04005000-04007000 rw-p 04005000 00:00 0
0400c000-0400d000 r--p 00004000 1f:02 122 /lib/ld-uClibc-0.9.29.so
0400d000-0400e000 rw-p 00005000 1f:02 122 /lib/ld-uClibc-0.9.29.so
0400e000-04049000 r-xp 00000000 1f:02 117 /lib/libuClibc-0.9.29.so
04049000-04050000 ---p 04049000 00:00 0
04050000-04051000 r--p 0003a000 1f:02 117 /lib/libuClibc-0.9.29.so
04051000-04052000 rw-p 0003b000 1f:02 117 /lib/libuClibc-0.9.29.so
04052000-04057000 rw-p 04052000 00:00 0
0ec79000-0ec8e000 rwxp 0ec79000 00:00 0 [stack]
We can see that this program links with the uClibc (micro C library for embedded systems) loader and C library. We can also see the virtual address ranges through which the program accesses these libraries.
See the article about generating executables with GNU Compiler collection, for details about the creation of the Shared libraries, and how to link them to your program.
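A program can extract the same information by parsing the maps file line by line. The following is a minimal sketch, assuming the field layout seen in the listing (address range, permissions, offset, device, inode, optional path). The struct map_entry type and the two helper functions are hypothetical names chosen for this example, and the “.so” substring test is only a rough heuristic.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of /proc/<pid>/maps (only the fields used here). */
struct map_entry {
    unsigned long start;     /* first address of the mapping */
    unsigned long end;       /* address just past the mapping */
    char          perms[5];  /* e.g. "r-xp" */
    char          path[256]; /* backing file, or "" for anonymous mappings */
};

/*
 * Parse a single maps line. The offset, device and inode fields are
 * skipped. Returns 0 on success, -1 if the line does not look like a
 * maps entry.
 */
int parse_maps_line(const char *line, struct map_entry *out)
{
    out->path[0] = '\0';

    int n = sscanf(line, "%lx-%lx %4s %*s %*s %*s %255[^\n]",
                   &out->start, &out->end, out->perms, out->path);

    /* Anonymous mappings have no path, so only 3 fields are assigned. */
    return (n >= 3) ? 0 : -1;
}

/* Rough heuristic: treat any mapped file whose name contains ".so" as a shared object. */
int is_shared_object(const struct map_entry *e)
{
    return strstr(e->path, ".so") != NULL;
}
```

To list the libraries of the current process, one would open /proc/self/maps with fopen(), read it with fgets(), and pass each line to parse_maps_line(), printing the path of every entry for which is_shared_object() returns non-zero.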
Shared libraries can also be loaded at run time (as opposed to load time). In this case, a program does not link with the library, but loads it explicitly at run time using the dl family of functions (dlopen, dlsym, dlclose). This form of usage is also referred to as “plug-ins”. Basically, a generic program can provide “hooks” which can be filled by functions from a plug-in. The implementation of each hook can be different in each plug-in, according to the requirements of the system. An article about plug-ins will be provided in the future.
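Run-time loading can be sketched with the standard dlopen(), dlsym(), dlerror() and dlclose() calls. The function name call_plugin_hook and the hook signature (one double in, one double out) are assumptions made for this example; a real plug-in interface would define its own hook types.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Assumed hook signature for this sketch: takes a double, returns a double. */
typedef double (*unary_hook)(double);

/*
 * Load the shared library at lib_path, look up `symbol`, call it with
 * `arg` and store the result in *result.
 * Returns 0 on success, -1 if the library or the symbol cannot be found.
 */
int call_plugin_hook(const char *lib_path, const char *symbol,
                     double arg, double *result)
{
    void *handle = dlopen(lib_path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error before calling dlsym */

    unary_hook hook;
    /* Cast through void ** to convert the object pointer to a function pointer. */
    *(void **)(&hook) = dlsym(handle, symbol);

    const char *err = dlerror();
    if (err != NULL || hook == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    *result = hook(arg);
    dlclose(handle);
    return 0;
}
```

As a usage example, on a glibc-based system call_plugin_hook("libm.so.6", "cos", 0.0, &r) would load the math library and call cos(); the library file name is an assumption and differs on other C libraries such as uClibc. With older glibc versions the program must also be linked with -ldl.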
Static libraries, unlike shared libraries, are just archives of objects which export functionality (that’s why they are also referred to as archives). They are not needed at run time and therefore do not reside in the target’s file system. Each program which requires some of the library’s functions will copy the whole object containing the functions’ binary code. That’s why you should link with a static library when only one program in your project requires it, or only a very few programs and performance is more important than space consumption. Otherwise, you’ll end up with multiple copies of the same code wasting your storage space. There is a way to optimize this to a function-level copy, and it will be described in an advanced article. If more than one copy of the same functionality is required (for example, two different programs want the same function), you should consider using the shared library format. As mentioned, the performance of a function in a static library is better than in a shared library, because the function is actually copied into the program itself.
Static libraries should be compiled with Position Independent Code as well whenever a shared library might require one or more functions from the archive.
The naming convention of static libraries is libxxxxx.a, where “.a” stands for Archive. When a program wants to link with a static library, it should specify the -l flag with the library’s name without the “lib” prefix and “.a” suffix (i.e. -lxxxxx). See the article about specifying compilation flags for more details.
See the article about generating executables with GNU Compiler collection, for details about the creation of the Static libraries, and how to link them to your program.
Now you know the difference between the library types at a high level, and it is clearer which approach to take for each given requirement. There is much more going on behind the scenes, but this will be covered in an advanced article.