Compiler magic


Forum: Programming and Scripting
started by: curaga

Posted by curaga on Dec. 15 2007,10:54
I thought I'd gather some tips here, along with explanations.

Optimization levels:
-O3     Highest
-Os     Like -O2, but leaves out the optimizations that tend to increase code size. This is a very good option; smaller programs take less RAM too. The Linux kernel is a good example: recent 2.6 kernels have an experimental option to turn -Os on. For me it produced a kernel that took an entire MB less RAM! And because this was for a 16 MB machine, it really showed: free RAM right after boot went up from 12 MB to 13 MB (with 3 gettys running).
-O2     The usual level
-O       Lowest optimization level (same as -O1)
-O0     That's O zero. No optimization at all.

Optimizing for a certain processor: Tests have shown this (with -O3) makes programs about 15% faster when run on the CPU they were optimized for. The difference is biggest on i586 chips: because of their unique design, they gain about 20%.
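-march=pentium2     Sets the minimum processor to Pentium II and also optimizes for it.
-mcpu=pentium2     Optimizes for the P2 but doesn't touch the minimum, i.e. the result still runs on an i386. (On x86, GCC 3.4 and later call this -mtune; -mcpu is kept there as a deprecated alias.)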

All the above go into CFLAGS/CXXFLAGS. Everything below goes into LDFLAGS.

-Wl,-rpath -Wl,/opt/prog/lib    A linker option that adds /opt/prog/lib to the runtime library search path. This saves you from writing the usual wrapper script that sets LD_LIBRARY_PATH to that dir so the app finds its libraries.

-Wl,--as-needed     This is a really interesting option. Let's say we have a gtk1 app that only uses libgdk. But gtk-config --libs (which the makefile uses) makes it link against all the gtk1 libs, which slows startup because all of those libraries get loaded into memory. With this flag, the program is only linked against the libraries it actually uses, resulting in a faster startup and a cleaner-looking ldd output. *Warning* this requires binutils 2.17.

-s     Strips the resulting binary at link time, saving you a separate strip run.
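
A minimal sketch of how the flags above might be wired into a build, assuming a POSIX shell and an autoconf-style package that reads CFLAGS, CXXFLAGS and LDFLAGS from the environment. The particular combination (-Os plus -march=pentium2) is only an illustration, and --as-needed is the long spelling of the as-needed option as listed in the ld manual.

```shell
# Compiler flags: optimization level and target CPU (CFLAGS/CXXFLAGS).
CFLAGS='-Os -march=pentium2'
CXXFLAGS=$CFLAGS

# Linker flags: rpath, as-needed and strip (LDFLAGS).
LDFLAGS='-Wl,-rpath -Wl,/opt/prog/lib -Wl,--as-needed -s'

export CFLAGS CXXFLAGS LDFLAGS

# Print what a configure script or makefile would inherit.
printf 'CFLAGS=%s\nCXXFLAGS=%s\nLDFLAGS=%s\n' "$CFLAGS" "$CXXFLAGS" "$LDFLAGS"
```

Run from the same shell, './configure && make' would then be expected to pick these up. Note that ld applies as-needed only to libraries that come after it on the link line, so it has to end up before the -l options.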



PS: for compile speed, if you ever build GCC yourself, be sure to do it with profiling optimization (the 'make profiledbootstrap' target). It speeds up C compilation by up to 9%.

Posted by WDef on Dec. 15 2007,11:34
Hi Curaga, nice summary.

Quote
Tests have shown


Do you have a ref for that?  Conventional wisdom has it that performance differences using optimizations within the x86 family would usually be much less than this, more like <=4% for most progs, or so I had been led to believe.  But you might have a better source.

Posted by curaga on Dec. 15 2007,12:16
http://wnd.katei.fi/gccopt/

He got a 10-16% speed increase in Vorbis enc/dec on 32-bit, and going 64-bit added about 30% on top of that (32-bit unoptimized vs 64-bit optimized).


My own experience also supports a figure of about 15%.

Posted by WDef on Dec. 15 2007,12:24
Ok, that ref is consistent with the conventional wisdom.  He got that big improvement with multimedia encoding, which is cpu intensive.  Cpu intensive progs can be expected to benefit more from optimization than most other progs.

Quote
Optimizing for i686 instead of i386 may speed up processing multimedia contents, but ordinary user never notices any difference.

Posted by WDef on Dec. 17 2007,21:35
That doesn't make your post any less handy a summary, btw.

Posted by stupid_idiot on Dec. 19 2007,05:26
Hi Curaga:
I think
Code Sample
-Wl,-rpath -Wl,<LIBDIR>
can be abbreviated as
Code Sample
-Wl,-rpath,<LIBDIR>
At least, that is what I use!
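
The two spellings are equivalent because gcc splits a -Wl option at its commas and hands each piece to the linker as a separate argument. Here is a small sketch that imitates that splitting in plain shell; wl_to_ld_args is a hypothetical helper, not part of gcc, and /opt/prog/lib simply stands in for <LIBDIR>.

```shell
# Rough imitation of what the gcc driver does with -Wl: drop the "-Wl,"
# prefix and split the rest at commas, one linker argument per piece.
wl_to_ld_args() {
    for arg in "$@"; do
        case "$arg" in
            -Wl,*) printf '%s\n' "${arg#-Wl,}" | tr ',' '\n' ;;
        esac
    done | tr '\n' ' ' | sed 's/ $//'
}

long_form=$(wl_to_ld_args -Wl,-rpath -Wl,/opt/prog/lib)
short_form=$(wl_to_ld_args -Wl,-rpath,/opt/prog/lib)

echo "long:  $long_form"
echo "short: $short_form"
```

Both lines should show the same linker arguments, which is why the shorter form is safe to use.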

Posted by stupid_idiot on Dec. 19 2007,05:37
Also, would like to suggest some other flags:

(1a) '-fdata-sections -ffunction-sections':
QUOTE, GCC 3.3.6 manual:
Quote
Place each function or data item into its own section in the output file if the target supports arbitrary sections. The name of the function or the name of the data item determines the section's name in the output file.
(1b) '-Wl,--gc-sections':
This linker flag will strip out sections that contain only unused code (aka "dead code"), thereby decreasing object size. For this flag to have any effect, you must compile your code with the flags '-fdata-sections' and '-ffunction-sections'.

(2) '-fomit-frame-pointer' increases speed for C code without increasing object size.
When using g++-3.3.x, '-fomit-frame-pointer' increases object size for C++ code noticeably, so I don't use it for C++.

(3) '-fno-exceptions' (C++ only):
QUOTE, GCC 3.3.6 manual:
Quote
-fexceptions
Enable exception handling. Generates extra code needed to propagate exceptions. For some targets, this implies GCC will generate frame unwind information for all functions, which can produce significant data size overhead, although it does not affect execution. If you do not specify this option, GCC will enable it by default for languages like C++ which normally require exception handling, and disable it for languages like C that do not normally require it.[...]
'-fno-exceptions' is the opposite of '-fexceptions'. It disables the generation of exception-handling code (enabled by default for C++). In short, '-fno-exceptions' decreases C++ object size. However, '-fno-exceptions' will not work for source code that makes use of exception handling. In such cases, you will probably get this error message:
Code Sample
<FOO>.cpp: In method `<BAR>': <FOO>.cpp:<LINE_NUM>: exception handling disabled, use -fexceptions to enable
This tells you that '-fno-exceptions' cannot be used.

(4) '-fno-rtti' (C++ only):
QUOTE, GCC manual:
Quote
Disable generation of information about every class with virtual functions for use by the C++ runtime type identification features (`dynamic_cast' and `typeid'). If you don't use those parts of the language, you can save some space by using this flag. Note that exception handling uses the same information, but it will generate it as needed.
In short, '-fno-rtti' decreases C++ object size, especially for larger programs/libraries. However, '-fno-rtti' will not work for source code that makes use of the run-time type information (RTTI) feature of C++. In such cases, you will probably get many errors like this:
Code Sample
undefined reference to `typeinfo for foo'
This tells you that '-fno-rtti' cannot be used.


In summary, these are the compiler/linker flags I normally use:
CFLAGS='-Os -fdata-sections -ffunction-sections -fomit-frame-pointer'
CXXFLAGS='-Os -fdata-sections -ffunction-sections -fno-exceptions -fno-rtti'
LDFLAGS='-Wl,--gc-sections'
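
The effect of the section flags from (1a) and (1b) can be tried on a toy program. This is only a sketch: demo.c and unused_helper are made-up names, and it assumes gcc, GNU ld and nm are installed. On such a setup one would expect unused_helper to survive in the plain build and to be dropped from the gc build, but the script just reports whatever it finds.

```shell
workdir=$(mktemp -d)

# Toy source with one function that nothing calls.
cat > "$workdir/demo.c" <<'EOF'
#include <stdio.h>

/* Never called: a candidate for removal by --gc-sections. */
int unused_helper(int x) { return x * 42; }

int main(void) { puts("hello"); return 0; }
EOF

if command -v gcc >/dev/null 2>&1 && command -v nm >/dev/null 2>&1; then
    # Build once without and once with the section flags.
    if gcc -Os "$workdir/demo.c" -o "$workdir/plain" &&
       gcc -Os -fdata-sections -ffunction-sections -Wl,--gc-sections \
           "$workdir/demo.c" -o "$workdir/gc"; then
        # Look for the unused function in each binary's symbol table.
        for bin in plain gc; do
            if nm "$workdir/$bin" | grep -q ' unused_helper$'; then
                echo "$bin: unused_helper kept"
            else
                echo "$bin: unused_helper removed"
            fi
        done
    else
        echo "toolchain rejected the flags; nothing to compare"
    fi
else
    echo "gcc or nm not found; skipping"
fi
# The temporary directory in $workdir can be deleted afterwards.
```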

Posted by stupid_idiot on Dec. 19 2007,05:38
(ld is the GNU linker, and we pass options to ld through gcc by using '-Wl,<blabla>')

Notes regarding garbage-collection ('-Wl,--gc-sections'):
(1) The '--gc-sections' feature of ld does not work with all software.
In such cases, you will see build errors about 'undefined reference to [...]' or 'redefinition of [...]' in certain sections (i.e. sections created using '-fdata-sections' and '-ffunction-sections').
If so, you should probably re-configure ('make distclean', re-run 'configure') without the '-fdata-sections -ffunction-sections' and '-Wl,--gc-sections' flags, then re-compile.
The reason is that '-fdata-sections' and '-ffunction-sections' add to the size of objects on their own. If we know that '--gc-sections' doesn't work, there is nothing to gain from them, so we should disable those flags as well.
(2) ld is part of the GNU binutils package.
ld versions 2.15 and older can only make use of the '--gc-sections' feature in conjunction with the '-static' flag, i.e. when linking static executables. To make use of '--gc-sections' under all circumstances, you must use ld version 2.16 or newer.
Also, it seems (I may be wrong) the '--gc-sections' algorithm is refined with each new version of binutils, so it is probably best to use the latest version of binutils (currently version 2.18).

Notes regarding CXXFLAGS:
(1) '-fno-exceptions' and '-fno-rtti' are a matter of trial and error. If compilation fails with a particular flag, we re-configure and re-compile without that flag.
(2) '-fno-exceptions' and '-fno-rtti' concern different C++ features. Often only one or the other will work. Sometimes neither will.
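
The trial and error described above could be scripted roughly like this. It is a sketch, not a tested build recipe: pick_cxxflags and fake_build are hypothetical names, and fake_build only simulates a package whose source uses RTTI so that the loop can be shown without a real compile.

```shell
# Flags that are always wanted (taken from the CXXFLAGS summary above).
BASE='-Os -fdata-sections -ffunction-sections'

# pick_cxxflags BUILD_CMD
# Calls BUILD_CMD with one argument (a candidate CXXFLAGS string), trying the
# most aggressive combination first, and prints the first one that builds.
pick_cxxflags() {
    build_cmd=$1
    for extra in '-fno-exceptions -fno-rtti' '-fno-exceptions' '-fno-rtti' ''; do
        flags="$BASE${extra:+ $extra}"
        if "$build_cmd" "$flags" >/dev/null 2>&1; then
            printf '%s\n' "$flags"
            return 0
        fi
    done
    return 1
}

# Stand-in for a real build (hypothetical): pretends the source uses RTTI,
# so any flag set containing -fno-rtti "fails".
fake_build() {
    case "$1" in
        *-fno-rtti*) return 1 ;;
    esac
    return 0
}

chosen=$(pick_cxxflags fake_build)
echo "CXXFLAGS='$chosen'"
```

For a real package, BUILD_CMD could be a function that runs 'make distclean', re-runs 'configure' with CXXFLAGS set to its argument, and then runs 'make'; whether that works depends on the package's build system.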

Posted by stupid_idiot on Dec. 19 2007,06:07
Another note:
The '-Os' optimization level is known to break certain apps. So far, the cases I've seen:
(1) Xpdf: '-Os' causes Xpdf to segfault when scrolling through a document. '-O2' works fine.
(2) x264: '-Os' causes encoding to be unable to start due to a segmentation error in a certain SSE2 function. '-O2' works fine. I am aware that SSE code is sensitive to 'alignment' -- although the actual details are all mumbo-jumbo to me. Just a guess -- I wouldn't be surprised if '-Os' "aligns" code somewhat differently from '-O2'.

Posted by curaga on Dec. 19 2007,13:22
Are those (-Os errors) from using gcc 3.3?
I haven't had any problems with 4.1.2, but I usually use -O3 anyway.

SSE/MMX/whatever code is not necessarily sensitive to alignment. It's usually assembly, and whether it can be moved or not depends on how it was written.

Posted by stupid_idiot on Dec. 19 2007,19:06
Oh -- thanks for the SSE info. Yes you are right -- the error in x264 was in an assembly function.

If I remember correctly, the '-Os' errors (with both Xpdf and x264) occur both with gcc 3.x and gcc 4.x. I seem to remember I tried both gcc 3.3 and 4.1 (both Xpdf and x264).

Also:
As I understand it, '-O3' increases object size dramatically unless we use the profiling feature ('-fprofile-generate' and '-fprofile-use'), in which case the output is both fast AND relatively compact, which is great.
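
For reference, the profiling feature is a three-step cycle: build with '-fprofile-generate', run the program on a typical workload so it writes its profile data, then rebuild with '-fprofile-use'. A rough sketch follows; pgo_build, prog.c and prog are hypothetical names, and the RUN=echo switch is only there so the commands can be printed without compiling anything.

```shell
# pgo_build SRC OUT
# Three steps: instrumented build, training run, rebuild using the profile.
# CC defaults to gcc; set RUN=echo to print the commands instead of running them.
pgo_build() {
    src=$1
    out=$2
    ${RUN:-} "${CC:-gcc}" -O3 -fprofile-generate "$src" -o "$out" &&
    ${RUN:-} "./$out" &&
    ${RUN:-} "${CC:-gcc}" -O3 -fprofile-use "$src" -o "$out"
}

# Dry run with hypothetical file names, so nothing is compiled here.
RUN=echo pgo_build prog.c prog
```

In a real run the middle step should exercise the program the way it is normally used, since the profile is only as good as the training run.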
