Optimizing
From Buzztrax
This page collects ideas on how to optimize the resource usage of Linux applications.
Compiling
Auto vectorization
Build with these flags to get reports on which loops get vectorized and why others do not.
make CFLAGS="-O3 -ffast-math -msse -ftree-vectorize -ftree-vectorizer-verbose=2"
One can also use:
make CFLAGS="-O3 -ffast-math -msse -ftree-vectorize -fdump-tree-vect-details"
Then the results are written to .vect files.
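To see the reports in action, it helps to compile a loop the vectorizer can handle. The following is a minimal sketch; the function name mix_scaled and its parameters are made up for illustration and are not taken from the Buzztrax sources. The restrict qualifiers promise that the two buffers do not overlap, which is a common precondition for vectorizing a loop like this.

```c
#include <stddef.h>

/* Hypothetical example (not Buzztrax code): add a scaled copy of src to dst.
 * 'restrict' promises that dst and src do not alias, and the loop has a
 * simple counted form, so the vectorizer has a good chance of handling it. */
void mix_scaled(float *restrict dst, const float *restrict src,
                float gain, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    dst[i] += gain * src[i];
  }
}
```

Dropping the restrict qualifiers and rebuilding is a simple way to compare how the report changes.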
Linking
--as-needed flag
See the discussion about the --as-needed flag.
elf visibility
One can filter the exported symbols by using a regular expression:
libbuzztard_core_la_LDFLAGS = -export-symbols-regex ^_?\(bt_\|Bt\|BT_\).*
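To check which symbol names such a pattern would keep, one can try it out with the POSIX regex API. This is only a sketch: it assumes that the backslashes in the LDFLAGS line merely protect the parentheses and bars from the shell, so the effective pattern is the extended regular expression ^_?(bt_|Bt|BT_).*. The helper name symbol_matches_export_pattern is hypothetical, and libtool does its own matching at link time.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* The pattern from the LDFLAGS line with the escaping removed (assumption:
 * it is interpreted as a POSIX extended regular expression). */
static const char EXPORT_PATTERN[] = "^_?(bt_|Bt|BT_).*";

/* Returns true if 'symbol' matches EXPORT_PATTERN.
 * Hypothetical helper for experimenting with the pattern. */
bool symbol_matches_export_pattern(const char *symbol)
{
  regex_t re;
  if (regcomp(&re, EXPORT_PATTERN, REG_EXTENDED | REG_NOSUB) != 0) {
    return false;  /* pattern failed to compile */
  }
  bool matched = (regexec(&re, symbol, 0, NULL, 0) == 0);
  regfree(&re);
  return matched;
}
```

With this pattern, names starting with bt_, Bt or BT_ (optionally preceded by one underscore) match, while an internal helper without such a prefix does not.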
Analyzing
strace
strace is a good tool to check what's going on. Here are two examples:
strace -e trace=file 2>trace.log.0003 ./bt-cmd --command=info --input-file=../share/buzztard/songs/simple2.xml
strace -c 2>trace.sum.0003 ./bt-cmd --command=info --input-file=../share/buzztard/songs/simple2.xml
Additionally we can use strace together with plot-timeline.py:
strace -ttt -f -o /tmp/logfile.strace my-program
plot-timeline.py -o prettygraph.png /tmp/logfile.strace
We can also test whether bt-edit operates tickless (which it doesn't yet, because of the cpu-monitor):
strace -ttt -e poll -p `pidof bt-edit`
oprofile
To collect data run
opcontrol --reset
opcontrol --start
<run program>
opcontrol --stop
opcontrol --dump
opcontrol --shutdown
To analyse profiling data:
opreport -l | head -n20
opreport -l /home/ensonic/lib/libx* | head -n20
opannotate --source --output-dir=/home/ensonic/temp/libx /home/ensonic/lib/libx*
Here is a nice script to render the call graph as an image.
# set callgraph depth
opcontrol --callgraph=16
opcontrol --separate=kernel
opcontrol --reset
opcontrol --start
<run program>
opcontrol --stop
opcontrol --dump
# make report
opreport -cf | gprof2dot.py -f oprofile | dot -Tpng -o output.png
time & co
When measuring durations with timestamps taken in the code, use clock_gettime(CLOCK_MONOTONIC) instead of gettimeofday(). Also try to provide the same environment when comparing runs. Things one can do:
- stop background activities, e.g. /etc/init.d/cron stop
- flush caches (original value is 0): sync; echo 3 > /proc/sys/vm/drop_caches
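Taking the timestamps with the monotonic clock can be sketched as follows. The helper names stamp_now and stamp_diff are made up for this example; only clock_gettime() and CLOCK_MONOTONIC come from the text above.

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime() in strict modes */
#include <time.h>

/* Read the monotonic clock; unlike gettimeofday() it does not jump when
 * the wall-clock time is adjusted. */
struct timespec stamp_now(void)
{
  struct timespec ts = {0, 0};
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return ts;
}

/* Seconds elapsed between two stamps, as a double. */
double stamp_diff(struct timespec start, struct timespec end)
{
  return (double)(end.tv_sec - start.tv_sec)
       + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

A measurement then brackets the code of interest with two stamp_now() calls and logs stamp_diff(t0, t1). On older glibc versions linking with -lrt may be required.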
linux perf-tools
The Linux kernel comes with nice measurement tools these days, too.
perf record -fg -o /tmp/perf.data ./buzztard-edit
perf report -g -i /tmp/perf.data
On Ubuntu you might need to call the right perf version directly due to bugs in the wrapper script.
perf_2.6.32-22 record -fg -o /tmp/perf.data ./buzztard-edit
perf_2.6.32-22 report -g -i /tmp/perf.data
Memory usage
use g_alloca
alloca reserves memory on the stack. It is a bit more tedious to handle (maybe we can wrap it up in a macro), but has several advantages. It only works if we need the space just temporarily, that is, until the current function returns. Advantages are:
- it's fast
- we don't need to free the stuff
- it does not fragment memory space
Instead of doing:
gchar *status=g_strdup_printf(_("Loading file \"%s\""),file_name);
g_object_set(G_OBJECT(self),"status",status,NULL);
g_free(status);
do it like below:
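/* size: format length minus 2 for the "%s", plus the file name length, plus 1 for the terminating NUL */
gchar *status=g_alloca(strlen(_("Loading file \"%s\""))+strlen(file_name)-1);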
g_sprintf(status,_("Loading file \"%s\""),file_name);
g_object_set(G_OBJECT(self),"status",status,NULL);
It is not trivial to write this as a macro, as it needs to figure out the length of the resulting string.
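One possible approach, shown here only as a sketch, is to let snprintf() compute the length in a first pass and format into the stack buffer in a second pass. The macro name STACK_PRINTF is hypothetical and not part of GLib or Buzztrax. Plain C alloca() and snprintf() stand in for g_alloca() and g_snprintf(); the assumption is that the GLib variants can be substituted in the same way.

```c
#include <alloca.h>
#include <stdio.h>

/* Hypothetical macro: format into memory taken from the stack.
 * It has to be a macro because alloca() memory belongs to the frame of the
 * calling function and is released when that function returns.
 * Caveats: the arguments are evaluated twice, so they must not have side
 * effects, and the result must not be returned to the caller.
 * If the first pass fails, dst ends up as an empty string. */
#define STACK_PRINTF(dst, ...)                          \
  do {                                                  \
    int len_ = snprintf(NULL, 0, __VA_ARGS__);          \
    if (len_ < 0) len_ = 0;                             \
    (dst) = alloca((size_t)len_ + 1);                   \
    (dst)[0] = '\0';                                    \
    snprintf((dst), (size_t)len_ + 1, __VA_ARGS__);     \
  } while (0)
```

With such a macro the example above would shrink to a declaration of status followed by STACK_PRINTF(status, _("Loading file \"%s\""), file_name); at the cost of formatting the string twice.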



