In embedded systems, size does matter. Embedded products usually have limited RAM and storage (usually flash), and cost pressure forces you to find creative ways to reduce the overall size of binary applications and libraries without reducing features and functionality. In the years I’ve been working in the embedded systems business, I have often had to reduce the overall size of an application because of system limitations. I will share that experience in this “Size optimization” series, which covers general tips and the optimization of applications, static libraries, shared libraries, file systems and the Linux kernel.
Background
Embedded systems are usually equipped with a limited amount of RAM (32MB – 128MB) and an even more limited amount of flash memory (1MB – 16MB), and device sizes grow in powers of 2. So if your application image requires 4.1MB of storage, your product must be equipped with an 8MB flash device, whereas a slightly smaller image would fit a cheaper 4MB device. In this series of articles, I’m going to show some techniques that reduce the binary size without changing the code itself. Changing the code could yield even better results, and I will dedicate one or more articles to writing more efficient code as well. For the purpose of this series, the source code is treated as a “black box”.
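The flash sizing argument above can be sketched in a few lines of Python. This is only an illustration: the function name and the assumption that the smallest available part is 1MB are mine, and real part catalogs may differ.

```python
MIB = 1024 * 1024  # bytes in one MB as used for flash parts

def smallest_flash_mib(image_bytes, smallest_part_mib=1):
    """Return the smallest power-of-two flash size (in MB) that holds the image."""
    size_mib = smallest_part_mib
    # Double the candidate part size until the image fits.
    while size_mib * MIB < image_bytes:
        size_mib *= 2
    return size_mib

for image_mib in (3.9, 4.1):
    part = smallest_flash_mib(image_mib * MIB)
    print(f"{image_mib}MB image needs a {part}MB flash device")
```

A 3.9MB image fits the 4MB part, while the 4.1MB image from the example is pushed to the 8MB part.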
General tips for size optimization
The following tips will yield smaller binary output.
Configure the compiler
The compiler can be configured to produce smaller binaries by applying optimizations that favor smaller output. As described in the “using the gcc” article, enable this by adding the “-Os” flag to the command line. If another optimization level is already in use (-Ox, where x is 0, 1, 2 or 3), replace it with -Os. Note that switching to size optimization may cost performance, especially if level 2 or 3 was used before. Those levels tell the compiler to transform the code for better performance, and level 3 may even increase the output size in order to do so (through loop unrolling, for example). Size optimization works in the other direction: performance matters less than the actual output size. This option is mostly used when compiling a boot loader, which must fit in the first sector or first few sectors of the flash, but it can also be used for compiling the kernel and user space applications. The following table shows the output size (in bytes) of an application compiled with various optimization levels:
| No optimization | -O2 | -O3 | -Os |
| 503,751 | 466,469 | 522,369 | 449,817 |
As the table shows, level 2 optimization offers the best trade-off between size and performance compared with no optimization: it reduces the size by about 7%. Level 3 optimization actually increases the size by about 4% (in exchange for the expected performance gain), and -Os reduces the size by about 11% (with some impact on performance).
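As a rough sketch of this section, the Python snippet below swaps whatever optimization level a flag list contains for -Os, and then recomputes the relative size changes directly from the byte counts in the table. The helper names and the file name main.c are hypothetical; only the gcc flags and the sizes come from the text.

```python
def use_size_optimization(cflags):
    """Return a copy of cflags with every -O<level> option replaced by one -Os."""
    # Options starting with an uppercase "-O" select the optimization level;
    # the lowercase "-o" (output file) is left untouched.
    kept = [flag for flag in cflags if not flag.startswith("-O")]
    return kept + ["-Os"]

def percent_change(size, baseline):
    """Relative size change versus the baseline, in percent (negative = smaller)."""
    return (size - baseline) / baseline * 100.0

# Hypothetical build line, before and after the swap.
flags = ["-O2", "-g", "-Wall"]
print(" ".join(["gcc"] + use_size_optimization(flags) + ["-c", "main.c"]))

# Byte counts taken from the table above.
SIZES = {"no optimization": 503_751, "-O2": 466_469, "-O3": 522_369, "-Os": 449_817}
baseline = SIZES["no optimization"]
for level in ("-O2", "-O3", "-Os"):
    print(f"{level}: {percent_change(SIZES[level], baseline):+.1f}%")
```

Replacing the flag rather than appending a second one keeps the build line unambiguous, since gcc only honors the last -O option it sees.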
Stripping the output
By default, and especially when the -g flag is used on the compilation command line, the output contains symbols and debug information that are useful for debugging and analysis. On your final target, however, these are not required and can be removed (stripped) to reduce the size of the image. To remove them, use the strip application from the binutils package with the --strip-unneeded flag. The following table shows the size difference (in bytes) before and after stripping:
| Before stripping | After stripping |
| 449,817 | 162,916 |
Stripping the output is safe and has no effect on the performance. It does not change the behavior of the application, just removes unused data.
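A minimal sketch of how stripping could be scripted in a build step is shown below. The helper names are mine, and the default tool name "strip" is an assumption: a cross toolchain normally ships its own prefixed strip binary, which you would pass in instead. The helper strips a copy, so the unstripped binary stays available for debugging on the host.

```python
import os
import shutil
import subprocess

def strip_command(binary_path, strip_tool="strip"):
    """Build the strip command line used in this article."""
    return [strip_tool, "--strip-unneeded", binary_path]

def strip_copy(binary_path, stripped_path, strip_tool="strip"):
    """Strip a copy of the binary and return (size_before, size_after) in bytes."""
    shutil.copy2(binary_path, stripped_path)
    subprocess.run(strip_command(stripped_path, strip_tool), check=True)
    return os.path.getsize(binary_path), os.path.getsize(stripped_path)

def percent_saved(before, after):
    """Share of the original size removed by stripping, in percent."""
    return (before - after) / before * 100.0

# Byte counts taken from the table above.
print(f"{percent_saved(449_817, 162_916):.1f}% saved")
```

For the sizes in the table, stripping removes close to two thirds of the file.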
Compiling in Thumb mode (ARM platforms only)
The ARM family of CPUs provides a reduced instruction set whose instructions are 16 bits long instead of 32 bits. This instruction set is referred to as “Thumb mode”. Because it is effectively a compressed encoding of a subset of the ARM instructions, it results in smaller code: Thumb mode may reduce the output size by up to 30%. The performance picture is less clear. On a system with a 16-bit memory bus, Thumb code will usually yield better performance. On a 32-bit system, however, Thumb mode may either reduce performance or not affect it at all (it is application dependent). In a mixed instruction system, you also need to configure the compiler to produce interwork code, which is required when switching from ARM mode to Thumb mode and vice versa. Therefore, to produce Thumb output, use the “-mthumb -mthumb-interwork” flags. Note that the Linux kernel and most boot loaders can’t be compiled in Thumb mode, due to their use of ARM assembly code and other reasons. However, Thumb mode can be very useful for user space applications and libraries.
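One way to apply this rule in a build script is to add the Thumb flags per component, as in the sketch below. The component names and the function are hypothetical; only the two flags and the user-space-only restriction come from the text above.

```python
THUMB_FLAGS = ["-mthumb", "-mthumb-interwork"]

# Hypothetical component kinds that are safe to build in Thumb mode.
THUMB_COMPONENTS = ("application", "library")

def arm_cflags(component, base_flags=("-Os",)):
    """Return compiler flags for a component, adding Thumb flags for user space only."""
    flags = list(base_flags)
    if component in THUMB_COMPONENTS:
        flags += THUMB_FLAGS
    return flags

for component in ("application", "library", "kernel", "bootloader"):
    print(component, " ".join(arm_cflags(component)))
```

The kernel and boot loader keep the plain ARM flags, while applications and libraries get the Thumb and interwork flags on top of -Os.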
Resources:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0135a/BABFCAHG.html
http://www.rt-embedded.com/blog/archives/using-gcc-part-2/