Optimizing

From Buzztrax

Revision as of 11:03, 21 February 2011 by Ensonic (Talk | contribs)

This page collects ideas on how to optimize the resource usage of Linux applications.

Compiling

Auto vectorization

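Build with these flags to get a report of which loops were vectorized and why the others were not (newer GCC releases replace -ftree-vectorizer-verbose with -fopt-info-vec and -fopt-info-vec-missed):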

make CFLAGS="-O3 -ffast-math -msse -ftree-vectorize -ftree-vectorizer-verbose=2"

One can also use:

make CFLAGS="-O3 -ffast-math -msse -ftree-vectorize -fdump-tree-vect-details"

Then the results are written to .vect files.
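To have something to try the flags on, here is a minimal sketch of a loop that the vectorizer can usually handle. The function name scale_buffer is made up for illustration and is not part of Buzztrax. The restrict qualifiers are an assumption about the caller: they promise that the two buffers do not overlap, which removes one common reason for a loop being rejected.

```c
#include <stddef.h>

/* Hypothetical helper (not from Buzztrax): multiply each sample by a gain.
 * 'restrict' promises the compiler that dst and src do not overlap, so the
 * loop iterations are independent and can be processed several at a time. */
void scale_buffer(float *restrict dst, const float *restrict src,
                  float gain, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    dst[i] = src[i] * gain;
  }
}
```

Compiling a file with this function using the flags above should make the loop show up in the vectorizer report. Whether it is actually vectorized depends on the compiler version and target.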

Linking

--as-needed flag

See the discussion about the --as-needed flag.

elf visibility

See GCC and ELF visibility.

One can filter the exported symbols with a regular expression:

libbuzztard_core_la_LDFLAGS = -export-symbols-regex ^_?\(bt_\|Bt\|BT_\).*
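The regexp above works at link time. An alternative is to mark the public symbols in the source and build with -fvisibility=hidden. The following is only a sketch of that approach, assuming GCC or Clang on an ELF platform. The macro BT_EXPORT and both function names are hypothetical and do not exist in the Buzztrax sources.

```c
/* Hypothetical export macro; assumes GCC or Clang on an ELF platform. */
#if defined(__GNUC__)
#  define BT_EXPORT __attribute__((visibility("default")))
#else
#  define BT_EXPORT
#endif

/* Internal helper: 'static' keeps it out of the symbol table entirely. */
static int clamp_volume(int v)
{
  return v < 0 ? 0 : (v > 100 ? 100 : v);
}

/* Public entry point: stays exported even with -fvisibility=hidden. */
BT_EXPORT int bt_example_set_volume(int v)
{
  return clamp_volume(v);
}
```

With -fvisibility=hidden on the compiler command line, only functions marked with the macro end up in the dynamic symbol table of the library.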

Analyzing

strace

strace is a good tool to check what is going on. Here are two examples:

strace -e trace=file 2>trace.log.0003 ./bt-cmd --command=info --input-file=../share/buzztard/songs/simple2.xml
strace -c 2>trace.sum.0003 ./bt-cmd --command=info --input-file=../share/buzztard/songs/simple2.xml

Additionally we can use strace together with plot-timeline.py:

strace -ttt -f -o /tmp/logfile.strace my-program
plot-timeline.py -o prettygraph.png /tmp/logfile.strace

We can also test whether bt-edit operates tickless (which it doesn't yet, because of the cpu-monitor):

strace -ttt -e poll -p `pidof bt-edit`

oprofile

To collect data, run:

opcontrol --reset
opcontrol --start
<run program>
opcontrol --stop
opcontrol --dump

opcontrol --shutdown

To analyse profiling data:

opreport -l | head -n20
opreport -l /home/ensonic/lib/libx* | head -n20
opannotate --source --output-dir=/home/ensonic/temp/libx /home/ensonic/lib/libx*

Here is a nice script to render the callgraph as an image.

 # set callgraph depth
 opcontrol --callgraph=16
 opcontrol --separate=kernel
 opcontrol --reset
 opcontrol --start 
 <run program>
 opcontrol --stop
 opcontrol --dump
 # make report
 opreport -cf | gprof2dot.py -f oprofile | dot -Tpng -o output.png

time & co

When measuring durations with timestamps taken in the code, use clock_gettime(CLOCK_MONOTONIC) instead of gettimeofday(), since the monotonic clock is not affected by adjustments of the wall-clock time. Also try to provide the same environment when comparing runs. Things one can do:

  • stop background activities
    • /etc/init.d/cron stop
  • flush caches (original value is 0): sync; echo 3 > /proc/sys/vm/drop_caches
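Taking timestamps with the monotonic clock could look roughly like this. It is a minimal sketch; the helper name monotonic_seconds is made up for illustration. On older glibc versions the program has to be linked with -lrt.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* makes clock_gettime() visible in strict modes */
#endif
#include <time.h>

/* Hypothetical helper: current value of the monotonic clock in seconds,
 * or -1.0 if the clock could not be read. */
double monotonic_seconds(void)
{
  struct timespec ts;

  if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
    return -1.0;
  }
  return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
```

Call it once before and once after the code to be measured and subtract the two values to get the elapsed time in seconds.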

linux perf-tools

The Linux kernel also comes with nice measurement tools these days.

 perf record -fg -o /tmp/perf.data ./buzztard-edit
 perf report -g -i /tmp/perf.data

On Ubuntu you might need to call the right perf version directly due to bugs in the wrapper script.

 perf_2.6.32-22 record -fg -o /tmp/perf.data ./buzztard-edit
 perf_2.6.32-22 report -g -i /tmp/perf.data

Memory usage

use g_alloca

Alloca reserves memory on the stack. It is a bit more tedious to handle (maybe we can wrap it up in a macro), but has several advantages. It only works if we need the space just temporarily, as the memory is released when the calling function returns. The advantages are:

  • it's fast
  • we don't need to free the stuff
  • it does not fragment memory space

Instead of doing:

gchar *status=g_strdup_printf(_("Loading file \"%s\""),file_name);
g_object_set(G_OBJECT(self),"status",status,NULL);
g_free(status);

do it like below:

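/* size: format length minus the 2 chars of "%s", plus the file name, plus 1 for the terminating NUL;
 * assumes the translated format contains exactly one "%s" */
gchar *status=g_alloca(strlen(_("Loading file \"%s\""))+strlen(file_name)-1);
g_sprintf(status,_("Loading file \"%s\""),file_name);
g_object_set(G_OBJECT(self),"status",status,NULL);
/* no g_free() needed: the buffer is released when the function returns */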

It is not trivial to write this as a macro, as it needs to determine the length of the resulting string.
