Forum: Programming and Scripting
Topic: Compiler magic
started by: curaga
Posted by curaga on Dec. 15 2007, 10:54
I thought I'd gather some tips here, along with explanations.
-Os Like -O2, but leaves out the optimizations that tend to make code bigger and adds some size-specific ones. This is a very good option; smaller programs take less RAM too. The Linux kernel is a good example: recent 2.6 kernels have an experimental option to build with -Os. For me it produced a kernel that used a full 1 MB less RAM! Because this was a 16 MB machine, it really showed: free RAM right after boot went from 12 MB to 13 MB (with 3 gettys running).
-O2 The usual level
-O Same as -O1: the lowest level of optimization.
-O0 That's O zero. No optimization at all.
Optimizing for a specific processor: tests have shown that this (with -O3) makes programs about 15% faster when they run on the CPU they were optimized for. The difference is largest on i586 chips: because of their unique design, they gain about 20% in speed.
-march=pentium2 Sets the minimum processor to the Pentium II and also optimizes for it; the result may use instructions that older CPUs lack.
-mcpu=pentium2 Optimizes for the Pentium II but leaves the minimum alone, i.e. the result still runs on an i386. (From GCC 3.4 on, the x86 spelling is -mtune=pentium2; -mcpu is deprecated there.)
All the above go into CFLAGS/CXXFLAGS. Everything below goes into LDFLAGS.
-Wl,-rpath -Wl,/opt/prog/lib A linker option that adds /opt/prog/lib to the program's runtime library search path. It saves you from writing a wrapper script that sets LD_LIBRARY_PATH to that directory just so the app can find its libraries.
-Wl,--as-needed This is a really interesting option. Say we have a gtk1 app that only uses libgdk, but gtk-config --libs (which the makefile uses) links it against all the gtk1 libs. That slows startup, because all those libraries get loaded into memory. With this flag, the program is linked only against the libraries it actually uses, giving faster startup and a cleaner-looking ldd output. *Warning*: this requires binutils 2.17.
-s Strips the binary during linking, saving you a separate strip run.
PS: for compile speed, if you ever build GCC yourself, be sure to do it with profile-guided optimization (make profiledbootstrap). It makes C compilation up to 9% faster.
Posted by WDef on Dec. 15 2007, 11:34
Hi Curaga, nice summary.
Do you have a ref for that? Conventional wisdom has it that performance differences using optimizations within the x86 family would usually be much less than this, more like <=4% for most progs, or so I had been led to believe. But you might have a better source.
Posted by curaga on Dec. 15 2007, 12:16
http://wnd.katei.fi/gccopt/
He got a 10-16% speed increase in Vorbis encoding/decoding on 32-bit, and going 64-bit added about 30% on top of that (32-bit unoptimized vs. 64-bit optimized).
My own experience also supports a figure of about 15%.
Posted by WDef on Dec. 15 2007, 12:24
OK, that ref is consistent with the conventional wisdom. He got that big improvement with multimedia encoding, which is CPU-intensive. CPU-intensive programs can be expected to benefit more from optimization than most others.
Posted by WDef on Dec. 17 2007, 21:35
That doesn't make your post any less handy a summary, btw.
Posted by stupid_idiot on Dec. 19 2007, 05:26
Hi Curaga:
Posted by stupid_idiot on Dec. 19 2007, 05:37
Also, I'd like to suggest some other flags:
(1a) '-fdata-sections -ffunction-sections':
QUOTE, GCC 3.3.6 manual:
(1b) '-Wl,--gc-sections':
This linker flag strips out sections that contain only unused code (aka "dead code"), reducing the size of the linked output. For it to have any effect, you must compile your code with '-fdata-sections' and '-ffunction-sections'.
(2) '-fomit-frame-pointer' increases speed for C code without increasing object size.
When using g++-3.3.x, '-fomit-frame-pointer' increases object size for C++ code noticeably, so I don't use it for C++.
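(3) '-fno-exceptions' (C++ only): turns off C++ exception handling, so GCC does not generate the extra code needed to propagate exceptions; code that uses 'throw' or 'try' no longer compiles.
(4) '-fno-rtti' (C++ only): stops GCC from generating run-time type information for classes with virtual functions; code that relies on 'typeid' or 'dynamic_cast' no longer compiles.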
In summary, these are the compiler/linker flags I normally use:
CFLAGS='-Os -fdata-sections -ffunction-sections -fomit-frame-pointer'
CXXFLAGS='-Os -fdata-sections -ffunction-sections -fno-exceptions -fno-rtti'
Posted by stupid_idiot on Dec. 19 2007, 05:38
(ld is the GNU linker, and we pass options to ld through gcc by using '-Wl,<blabla>')
Notes regarding garbage-collection ('-Wl,--gc-sections'):
(1) The '--gc-sections' feature of ld does not work with all software.
In such cases, you will see build errors about 'undefined reference to [...]' or 'redefinition of [...]' in certain sections (i.e. sections created using '-fdata-sections' and '-ffunction-sections').
If so, you should probably re-configure ('make distclean', re-run 'configure') without the '-fdata-sections -ffunction-sections' and '-Wl,--gc-sections' flags, then re-compile.
The reason is that '-fdata-sections' and '-ffunction-sections' add to the size of object files. If we know that '--gc-sections' doesn't work, those flags only make things bigger, so we should disable them.
(2) ld is part of the GNU binutils package.
Versions of ld up to and including 2.15 can only use the '--gc-sections' feature together with the '-static' flag, i.e. when linking static executables. To use '--gc-sections' in all cases, you need ld 2.16 or newer.
Also, it seems (I may be wrong) the '--gc-sections' algorithm is refined with each new version of binutils, so it is probably best to use the latest version of binutils (currently version 2.18).
Notes regarding CXXFLAGS:
(1) '-fno-exceptions' and '-fno-rtti' are a matter of trial and error. If compilation fails with a particular flag, we re-configure and re-compile without that flag.
(2) '-fno-exceptions' and '-fno-rtti' concern different C++ features. Often only one or the other will work, and sometimes neither will.
Posted by stupid_idiot on Dec. 19 2007, 06:07
Another note:
The '-Os' optimization level is known to break certain apps. So far, the cases I've seen:
(1) Xpdf: '-Os' causes Xpdf to segfault when scrolling through a document. '-O2' works fine.
(2) x264: with '-Os', encoding cannot even start because of a segmentation fault in a certain SSE2 function. '-O2' works fine. I am aware that SSE code is sensitive to 'alignment', although the actual details are all mumbo-jumbo to me. Just a guess: I wouldn't be surprised if '-Os' "aligns" code somewhat differently from '-O2'.
Posted by curaga on Dec. 19 2007, 13:22
Are those (-Os errors) from using gcc 3.3?
I haven't had any problems with 4.1.2, but I usually use -O3 anyway.
SSE/MMX/whatever code is not necessarily sensitive to alignment. It's usually assembly, and whether it can be moved or not depends on how it's written.
Posted by stupid_idiot on Dec. 19 2007, 19:06
Oh, thanks for the SSE info. Yes, you are right: the error in x264 was in an assembly function.
If I remember correctly, the '-Os' errors (with both Xpdf and x264) occur both with gcc 3.x and gcc 4.x. I seem to remember I tried both gcc 3.3 and 4.1 (both Xpdf and x264).
As I understand it, '-O3' increases object size dramatically, unless we use the profiling feature ('-fprofile-generate' and '-fprofile-use'), in which case, the output is both fast AND relatively compact, which is great.