Optimizing

From Buzztrax

This page collects ideas for optimizing the resource usage of Linux applications.

Compiling

Auto vectorization

Build with these flags to get a report of which loops were vectorized and why the others were not.

make CFLAGS="-O3 -ffast-math -msse -ftree-vectorize -ftree-vectorizer-verbose=2"

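Newer GCC releases deprecate -ftree-vectorizer-verbose in favor of -fopt-info-vec. One can also dump the details to files: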

make CFLAGS="-O3 -ffast-math -msse -ftree-vectorize -fdump-tree-vect-details"

Then the results are written to .vect files.
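As a rough illustration of what the vectorizer report distinguishes, here is a minimal sketch with two loops. The function names are made up for this example. The first loop has independent iterations and is a typical candidate for vectorization. The second carries a dependency from one iteration to the next and is usually reported as not vectorized. Actual results depend on the GCC version and target flags.

```c
#include <stddef.h>

/* Likely vectorizable: every iteration is independent, and 'restrict'
 * tells the compiler that dst and src do not alias. */
void scale_add(float *restrict dst, const float *restrict src,
               float gain, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    dst[i] += gain * src[i];
  }
}

/* Likely not vectorized: buf[i] depends on buf[i - 1], which was
 * written by the previous iteration (loop-carried dependency). */
void running_sum(float *buf, size_t n)
{
  for (size_t i = 1; i < n; i++) {
    buf[i] += buf[i - 1];
  }
}
```

Compiling a file like this with the flags above should list the first loop as vectorized and give a reason for skipping the second one.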

Linking

--as-needed flag

See the discussion about the --as-needed flag.

ELF visibility

See GCC and ELF visibility.

One can filter exported symbols by using a regexp:

libbuzztrax_core_la_LDFLAGS = -export-symbols-regex ^_?\(bt_\|Bt\|BT_\).*
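Besides filtering at link time, visibility can also be marked in the source. The following is only a sketch: the macro and function names are hypothetical and not taken from the Buzztrax code. It assumes GCC 4 or later and that the library is built with -fvisibility=hidden, so that only symbols explicitly marked as default end up in the dynamic symbol table.

```c
/* Hypothetical export macros; on other compilers they expand to nothing. */
#if defined(__GNUC__) && __GNUC__ >= 4
#define EXAMPLE_API   __attribute__((visibility("default")))
#define EXAMPLE_LOCAL __attribute__((visibility("hidden")))
#else
#define EXAMPLE_API
#define EXAMPLE_LOCAL
#endif

/* Internal helper: hidden, so it is not exported from the shared library. */
EXAMPLE_LOCAL int example_clamp(int value, int lo, int hi)
{
  if (value < lo) return lo;
  if (value > hi) return hi;
  return value;
}

/* Public entry point: explicitly exported. */
EXAMPLE_API int bt_example_volume_to_percent(int volume)
{
  /* map 0..128 to 0..100 and clamp out-of-range input */
  return example_clamp(volume * 100 / 128, 0, 100);
}
```

Both approaches reduce the number of exported symbols. The attribute variant additionally lets the compiler optimize calls to hidden functions inside the library.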

Analyzing

strace

strace is a good tool to check what's going on. Here are two examples:

strace -e trace=file 2>trace.log.0003 ./bt-cmd --command=info --input-file=../share/buzztrax/songs/simple2.xml
strace -c 2>trace.sum.0003 ./bt-cmd --command=info --input-file=../share/buzztrax/songs/simple2.xml

Additionally we can use strace together with plot-timeline.py:

strace -ttt -f -o /tmp/logfile.strace my-program
plot-timeline.py -o prettygraph.png /tmp/logfile.strace

We can also test whether bt-edit operates tickless (which it doesn't yet, because of the cpu-monitor):

strace -ttt -e poll -p `pidof bt-edit`

oprofile

To collect data, run:

opcontrol --reset
opcontrol --start
./buzztrax-edit
opcontrol --stop
opcontrol --dump

opcontrol --shutdown

To analyse profiling data:

opreport -l | head -n20
opreport -l /home/ensonic/lib/libx* | head -n20
opannotate --source --output-dir=/home/ensonic/temp/libx /home/ensonic/lib/libx*

Here is a nice script to render the callgraph as an image.

 # set callgraph depth
 opcontrol --callgraph=16
 opcontrol --separate=kernel
 opcontrol --reset
 opcontrol --start 
 ./buzztrax-edit
 opcontrol --stop
 opcontrol --dump
 # make report
 opreport -cf | gprof2dot.py -f oprofile | dot -Tpng -o output.png

For a while now, oprofile has also come with a perf-like wrapper:

operf -g ./buzztrax-edit
opreport -cf | gprof2dot.py -f oprofile | dot -Tpng -o output.png

linux perf-tools

The Linux kernel comes with nice measurement tools these days too.

 perf record -fg -o /tmp/perf.data ./buzztrax-edit
 perf report -g -i /tmp/perf.data
 perf script | gprof2dot.py -f perf | dot -Tpng -o perf.png

On Ubuntu you might need to call the right perf version directly due to bugs in the wrapper script.

 perf_2.6.32-22 record ...

time & co

When measuring times by taking time samples in the code, use clock_gettime(CLOCK_MONOTONIC) instead of gettimeofday(), since the monotonic clock is not affected by changes to the system time. Also try to provide the same environment when comparing runs. Things one can do:

  • stop background activities
    • /etc/init.d/cron stop
  • flush caches (original value is 0): sync; echo 3 > /proc/sys/vm/drop_caches
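The clock_gettime(CLOCK_MONOTONIC) advice above can be sketched as follows. The helper names elapsed_seconds and time_call are made up for this example, and the code being measured is passed in as a hypothetical work callback. On older glibc versions linking with -lrt may be required.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Seconds between two CLOCK_MONOTONIC samples. tv_nsec of 'end' may be
 * smaller than that of 'start'; the signed difference handles that. */
double elapsed_seconds(const struct timespec *start,
                       const struct timespec *end)
{
  return (double)(end->tv_sec - start->tv_sec)
       + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Time a single call of work() and return the duration in seconds. */
double time_call(void (*work)(void))
{
  struct timespec t0, t1;

  clock_gettime(CLOCK_MONOTONIC, &t0);
  work();
  clock_gettime(CLOCK_MONOTONIC, &t1);
  return elapsed_seconds(&t0, &t1);
}
```

For short code paths it usually helps to repeat the measurement several times and compare the minimum or median rather than a single sample.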

Memory usage

use g_alloca

Alloca reserves memory on the stack. It is a bit more tedious to handle (maybe we can wrap it up in a macro), but has several advantages. It only works if we need the space just temporarily, because the memory is released when the function returns. The advantages are:

  • it's fast
  • we don't need to free the stuff
  • it does not fragment memory space

Instead of doing:

gchar *status=g_strdup_printf(_("Loading file \"%s\""),file_name);
g_object_set(G_OBJECT(self),"status",status,NULL);
g_free(status);

do it like below:

/* the "%s" (2 chars) is replaced by file_name, plus 1 byte for the terminating NUL: -2+1 = -1 */
gchar *status=g_alloca(strlen(_("Loading file \"%s\""))+strlen(file_name)-1);
g_sprintf(status,_("Loading file \"%s\""),file_name);
g_object_set(G_OBJECT(self),"status",status,NULL);

It is not trivial to write this as a macro, as the macro needs to figure out the length of the resulting string.
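One possible way to approach such a macro is to let snprintf compute the length first by calling it with a NULL buffer, as C99 allows. The sketch below uses plain alloca and snprintf instead of the GLib wrappers so that it stays self-contained. STACK_SPRINTF and status_matches are hypothetical names. Note that the format arguments are evaluated twice, and that the buffer is only valid until the enclosing function returns.

```c
#include <alloca.h>
#include <stdio.h>
#include <string.h>

/* Format into a stack buffer of exactly the needed size.
 * The first snprintf only measures, the second one writes.
 * A formatting error yields an empty string. */
#define STACK_SPRINTF(var, ...)                               \
  do {                                                        \
    int stack_len_ = snprintf(NULL, 0, __VA_ARGS__);          \
    if (stack_len_ < 0)                                       \
      stack_len_ = 0;                                         \
    (var) = alloca((size_t)stack_len_ + 1);                   \
    (var)[0] = '\0';                                          \
    snprintf((var), (size_t)stack_len_ + 1, __VA_ARGS__);     \
  } while (0)

/* Example use: build the status text on the stack and compare it with
 * an expected string. Returns 1 on match, 0 otherwise. */
int status_matches(const char *file_name, const char *expected)
{
  char *status;

  STACK_SPRINTF(status, "Loading file \"%s\"", file_name);
  return strcmp(status, expected) == 0;
}
```

Because the length is measured rather than derived from the format string, this variant also works when a translation changes the format text. It should not be used in loops or with unbounded input, since every call grows the stack frame.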
