Optimization questions

Post by Himanshu » Sat, 13 Aug 2005 05:33:27


Hi all!

Greetings.

I am learning DSP and am using C for all my algorithm implementations. I
need to tweak my code for better performance, but I have a few questions.
Please help me get through them.

1. If I need to call a function in a tight loop, would using a function
pointer be a better way? Why? Doesn't it incur function-call overhead?

2. Is multiplication by the reciprocal better than division?

3. What performance penalty does "test and branch" put on execution? How
can I make better use of CPU pipelining? I have access to both a G4 and a P4.

4. Which is the better way of accessing arrays: linear addressing or array
indexing?

Where is the best place to learn about using optimized pipelining?


Lots of questions, but I hope I will get all the answers. My googling is
still on. :-)

Thanks and regards
--Himanshu
--
comp.lang.c.moderated - moderation address: XXXX@XXXXX.COM -- you must
have an appropriate newsgroups line in your header for your mail to be seen,
or the newsgroup name in square brackets in the subject line. Sorry.
 
 
 

Post by Barry Schwarz » Sat, 20 Aug 2005 17:24:45

On Thu, 11 Aug 2005 20:33:27 -0000, "Himanshu" < XXXX@XXXXX.COM >



The standard does not address this. It is a matter of implementation.
I would be surprised if there was a difference. On my machine,
calling a subroutine involves loading the address of the routine into
a register. It makes no difference if the address is in a pointer or
in a constant created by the linker.


Another implementation matter. The rule of thumb when I was in school
was that division was *much* slower but I have no idea if it is still
true.


What do you mean by test and branch in C? There are if, switch, for,
do...while, and while. Which one are you referring to?

Another implementation matter. You are aware that ptr[4] and array[4]
and *(ptr+4) and *(array+4) are guaranteed to be semantically
identical. I would be surprised if the generated code was different.
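To make the equivalence concrete, here is a sketch of the same summation written both ways, assuming that "linear addressing" means walking a pointer through the array (sum_indexed and sum_pointer are hypothetical names). Since b[i] is defined as *(b + i), both loops compute the same result, and an optimising compiler will often generate similar or identical code for them.

```c
#include <stddef.h>

/* Array indexing: the standard defines b[i] as *(b + i). */
long sum_indexed(const short *b, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += b[i];
    return total;
}

/* Pointer walking: step a pointer from the first element to one past the last. */
long sum_pointer(const short *b, size_t n)
{
    long total = 0;
    for (const short *p = b, *end = b + n; p != end; ++p)
        total += *p;
    return total;
}
```

Which form is faster, if either, is exactly the kind of implementation detail that only inspecting the generated code or timing it on your machine will tell you.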


In a discussion specific to your system. Not in a group dedicated to
a standard and portable language.




<<Remove the del for email>>

Post by Ulrich Eck » Sat, 20 Aug 2005 17:39:24


The overhead of calling a function directly and via a pointer is probably
the same, although in some cases the compiler can't prove that the
function pointer doesn't change, so it will reload it from memory, meaning
additional cycles.
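A minimal sketch of that reload situation, assuming a global function pointer that a loop calls through (half_gain, g_process, run_global, run_local and run_direct are hypothetical names). Because the called function could in principle reassign a global pointer, the compiler may have to reload it on every iteration; copying it into a local const first lets it be loaded once. A direct call additionally gives the compiler the chance to inline.

```c
#include <stddef.h>

/* Hypothetical per-sample operation. */
float half_gain(float x)
{
    return 0.5f * x;
}

/* Hypothetical global hook. Any function called through it could
   reassign it, so the compiler may reload it on every call. */
float (*g_process)(float) = half_gain;

/* Calls through the global pointer on every iteration. */
void run_global(float *a, const float *b, size_t n)
{
    for (size_t i = 0; i != n; ++i)
        a[i] = g_process(b[i]);
}

/* Copies the pointer into a local constant once, so the target
   cannot change inside the loop. */
void run_local(float *a, const float *b, size_t n)
{
    float (*const fn)(float) = g_process;
    for (size_t i = 0; i != n; ++i)
        a[i] = fn(b[i]);
}

/* A direct call, which the compiler is free to inline. */
void run_direct(float *a, const float *b, size_t n)
{
    for (size_t i = 0; i != n; ++i)
        a[i] = half_gain(b[i]);
}
```

Note that run_local is only equivalent to run_global if nothing reassigns g_process during the loop; whether any of the three versions is measurably faster depends on the compiler and CPU.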


Can be, if the reciprocal is computed once:
// bad
for (i = 0; i != size; ++i)
    a[i] = b[i] / f;
// good
float rf = 1.0f / f;
for (i = 0; i != size; ++i)
    a[i] = b[i] * rf;

This is dependent on the processor and compiler, though.
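A self-contained version of the two loops above, as a sketch (scale_div and scale_recip are hypothetical names). The reciprocal is computed once outside the loop, so each iteration does a multiplication instead of a division. Because 1.0f / f is itself rounded, the results can differ from true division in the last bit unless f is a power of two.

```c
#include <stddef.h>

/* One division per element. */
void scale_div(float *a, const float *b, size_t size, float f)
{
    for (size_t i = 0; i != size; ++i)
        a[i] = b[i] / f;
}

/* One division in total, then one multiplication per element.
   May differ from scale_div in the last bit, because the
   reciprocal is rounded before the multiplication. */
void scale_recip(float *a, const float *b, size_t size, float f)
{
    const float rf = 1.0f / f;
    for (size_t i = 0; i != size; ++i)
        a[i] = b[i] * rf;
}
```

Some compilers will perform this transformation themselves only when told that last-bit differences are acceptable, so check your compiler's documentation before doing it by hand.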


Off-topic.


Not sure what exactly you mean, but imagine having to compute the real
address of a certain element.


Off-topic, try one of the assembler oriented groups.


--
Questions ?
see C++-FAQ Lite: http://www.yqcomputer.com/ ++-faq-lite/ first !

Post by Willer » Sat, 20 Aug 2005 17:39:34


Function pointers are not usually more efficient than regular calls.
Often they are a little bit slower. I would recommend a regular call
because your compiler may be able to inline it.


If you only care about performance, yes. You should, however, also care
about numerical stability. Usually you will get a less precise answer
from x * (1/y) compared to x/y.
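A small illustration of the precision point, as a sketch with hypothetical function names. With IEEE 754 double arithmetic, 49.0 / 49.0 is exactly 1.0, but 49.0 multiplied by the rounded value of 1.0 / 49.0 comes out just below 1.0, because the reciprocal is rounded once and the product is rounded again.

```c
/* Divides directly: a single rounding step. */
double div_direct(double x, double y)
{
    return x / y;
}

/* Multiplies by the reciprocal: 1.0 / y is rounded first, then the
   product is rounded again. */
double div_recip(double x, double y)
{
    /* volatile forces the reciprocal to be stored as a double, even on
       compilers that keep intermediates in extended precision */
    volatile double r = 1.0 / y;
    return x * r;
}
```

For example, div_direct(49.0, 49.0) == 1.0 holds, while div_recip(49.0, 49.0) is slightly less than 1.0. For many DSP uses an error of one unit in the last place is harmless; the point is only that the two forms are not interchangeable bit for bit.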


You can't generally control this directly from C, and anyway it depends
on how well the CPU predicts the branch. You're always better off
avoiding branches entirely if you can. Most modern processors have
conditional-move instructions which can be used to good effect.
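A sketch of branch avoidance for integer audio samples, assuming samples are held as int32_t and clipped to the 16-bit range (min_branch, min_mask, max_mask and clip16 are hypothetical names). The mask versions turn the comparison result into an all-ones or all-zeros mask and select with bitwise operations instead of a jump. A good compiler will often turn the simple branchy form into a conditional move on its own, so look at the generated code before hand-writing tricks like this.

```c
#include <stdint.h>

/* Straightforward version: may compile to a compare-and-branch,
   which the CPU has to predict. */
int32_t min_branch(int32_t a, int32_t b)
{
    if (a < b)
        return a;
    return b;
}

/* Branch-free minimum: (a < b) is 0 or 1, so its negation is an
   all-zeros or all-ones mask (int32_t is two's complement). */
int32_t min_mask(int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(a < b);
    return (a & mask) | (b & ~mask);
}

/* Branch-free maximum, built the same way. */
int32_t max_mask(int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(a > b);
    return (a & mask) | (b & ~mask);
}

/* Clips a sample to the signed 16-bit range without branching. */
int32_t clip16(int32_t x)
{
    return max_mask(min_mask(x, INT16_MAX), INT16_MIN);
}
```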


Write assembler. Or, write your algorithm in C in the simplest way you
know how and use some decent optimising compilers.

On the G4, use IBM's xlc compiler with the -O5 and -qipa=level=2
options. On the P4, use Intel's icc compiler with its maximum
optimisation level.


char *a = ....;

char b = *(a + 128);
char c = a[128];

The last two lines will produce identical code. If that doesn't answer
your question, it's because I didn't understand it.


Intel have an optimisation manual for each of their processors.

The processor-manual for your G4 might help, but the G4 is nowhere near
as aggressively-pipelined as the P4.

Overall, I suspect you need to be writing assembler to exploit the SSE
unit of the P4 and the VMX unit of the G4 rather than trying to do this
from pure C.

Post by Hans-Bernhard Broeker » Sun, 21 Aug 2005 06:51:49


No. If it makes a difference at all, it'll make it slower.


Impossible to tell from the point of view of the language. The only
way to know for sure is to time actual code on realistic input, on the
actual hardware. As Prof. Knuth put it: "Premature optimization is
the root of all evil."
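In that spirit, a minimal timing sketch using the standard clock() function from <time.h> (time_kernel and copy_loop are hypothetical names). clock() measures processor time with limited resolution, so use enough repetitions that the total is well above its granularity, use realistic data sizes, and check that the optimiser has not removed the work being measured.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel under test: a plain element-by-element copy. */
void copy_loop(float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i];
}

/* Runs kernel(a, b, n) `reps` times and returns the processor time
   used, in seconds, as reported by clock(). Returns -1.0 if clock()
   reports that processor time is unavailable. */
double time_kernel(void (*kernel)(float *, const float *, size_t),
                   float *a, const float *b, size_t n, long reps)
{
    clock_t start = clock();
    if (start == (clock_t)-1)
        return -1.0;
    for (long r = 0; r < reps; ++r)
        kernel(a, b, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A typical use is to call time_kernel once per candidate implementation with the same buffers and repetition count, at the optimisation level you will actually ship with, on the target hardware.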


See 2.


Either you and I mean very different things when we say "linear
addressing", or they're the same.


In your actual CPU's data sheets.

--
Hans-Bernhard Broeker ( XXXX@XXXXX.COM )
Even if all the snow were burnt, ashes would remain.

Post by Himanshu » Sun, 21 Aug 2005 13:18:03

Hi!

Greetings!

Thanks for your responses!

I believe function calls definitely involve some overhead; otherwise, why
would anybody justify inline functions? Inline functions are meant to
remove that overhead, aren't they? Moreover, I don't believe that only
code written in assembler can use pipeline optimization. If you look at
Apple's code for clipping the output samples (sample audio driver), I
think what they are doing is simply optimizing for the processor pipelines.

for (cntr = 0; cntr < 512; cntr++)
    a[cntr] = b[cntr];

OR

for (cntr = 0; cntr < 512; cntr += 4)
{
    a[cntr] = b[cntr];
    a[cntr + 1] = b[cntr + 1];
    a[cntr + 2] = b[cntr + 2];
    a[cntr + 3] = b[cntr + 3];
}

Which one of the above is better? Isn't it the second one?

By test and branch I mean any condition that is tested and then branched
on, for example:
if (somecondition)
    statement1;
else
    statement2;
If we are sure that somecondition will be true most of the time,
wouldn't
if (likely(somecondition))
    statement1;
else
    statement2;
help? Of course, that assumes we are using gcc.
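For reference, likely() is not part of standard C, and gcc does not provide it under that name; it is conventionally defined as a macro over gcc's __builtin_expect (the Linux kernel does this, for example). A minimal sketch, assuming a compiler that defines __GNUC__ and falling back to the plain condition elsewhere; LIKELY, UNLIKELY and sum_nonnegative are hypothetical names. The hint can influence code layout and static prediction, but never the result.

```c
#include <stddef.h>

/* __builtin_expect(expr, value) tells GCC (and compilers that define
   __GNUC__, such as Clang) which value expr is expected to have.
   The !! turns any nonzero condition into exactly 1. */
#if defined(__GNUC__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Hypothetical example: sums samples, expecting negative ones to be rare. */
long sum_nonnegative(const int *s, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        if (LIKELY(s[i] >= 0))   /* expected to be true almost always */
            total += s[i];
    return total;
}
```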

I am totally new to this, and I am reading as much as I can. BTW, I got
the Intel optimization manual. Thanks for the reference. Please respond
to my suggestions above.

Thanks a lot and regards
--Himanshu

Post by Keith Thompson » Sun, 21 Aug 2005 13:18:06

Barry Schwarz < XXXX@XXXXX.COM > writes:

[...]

I would expect that either both would be equivalent in performance, or
a direct call (not using an explicit function pointer) would be
slightly faster. If calling through a pointer were faster for some
reason, the compiler could easily implement a direct call as a call
through a function pointer.

(Strictly speaking, in a "direct" function call like foo(42), the name
"foo" is implicitly converted to a pointer, but in practice the
compiler is going to generate whatever code it thinks is best for the
call.)
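A small sketch of that conversion (twice and all_forms_agree are hypothetical names). All five calls below are valid C and call the same function, because a function designator is converted to a pointer to the function, and dereferencing that pointer yields a function designator again.

```c
/* Hypothetical function used to show the equivalent call forms. */
int twice(int x)
{
    return 2 * x;
}

/* Returns 1 if every call form gives the same result. */
int all_forms_agree(int x)
{
    int (*fp)(int) = twice;     /* same as: = &twice */
    int r1 = twice(x);          /* "direct" call */
    int r2 = (&twice)(x);       /* explicit address of the function */
    int r3 = (*twice)(x);       /* *twice converts straight back to a pointer */
    int r4 = fp(x);             /* through a pointer variable */
    int r5 = (*fp)(x);          /* traditional pointer-call syntax */
    return r1 == r2 && r2 == r3 && r3 == r4 && r4 == r5;
}
```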

--
Keith Thompson (The_Other_Keith) XXXX@XXXXX.COM < http://www.yqcomputer.com/ ~kst>
San Diego Supercomputer Center <*> < http://www.yqcomputer.com/ ~kst>
We must do something. This is something. Therefore, we must do this.

Post by Hans-Bernhard Broeker » Mon, 22 Aug 2005 14:48:44


Of course it does. I don't recall anybody saying otherwise, either.


As was said before, that's strictly impossible to say from the
point-of-view of the C programming language, which is the topic of
this newsgroup. From our point-of-view, such optimizations are "other
people's problems". They're supposed to be handled by individual
compilers and their optimization switches. How they do that is none
of the language's concern.

--
Hans-Bernhard Broeker ( XXXX@XXXXX.COM )
Even if all the snow were burnt, ashes would remain.

Post by Willer » Wed, 31 Aug 2005 19:21:35


Well, yes, but whether you call a function directly or through a pointer
makes no difference to the set-up and tear-down cost (and that's what
you asked). Inline functions are not always faster, because each inlined
copy takes up space in the processor's instruction-decode cache (for
processors which need such a cache).


Most definitely not. The first one is much easier to understand and
therefore easier to maintain. I would be very surprised if the first one
didn't produce code at least as fast at high optimisation levels; if it
doesn't, your optimising compiler isn't worthy of the name. Test it.
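If you do test it, here is a sketch of the two loops from the question written as functions (copy_simple and copy_unroll4 are hypothetical names). Unlike the posted version, the unrolled copy handles lengths that are not a multiple of 4 with a short cleanup loop; the original only works because 512 happens to be divisible by 4. For a pure copy, the standard memcpy() from <string.h> is often the fastest option and is worth timing as well.

```c
#include <stddef.h>

/* The straightforward loop from the question. */
void copy_simple(float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i];
}

/* Manually unrolled by 4, with a cleanup loop for the last n % 4 items. */
void copy_unroll4(float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a[i]     = b[i];
        a[i + 1] = b[i + 1];
        a[i + 2] = b[i + 2];
        a[i + 3] = b[i + 3];
    }
    for (; i < n; ++i)          /* remaining 0 to 3 elements */
        a[i] = b[i];
}
```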

Rule 1: Don't hand-optimise when the compiler can do it.
Rule 2: If the compiler can't do it, use a better compiler.

These days, the odds are that if a decent compiler (IBM xlc, Intel icc,
HP c89, Sun Studio C Compiler, etc.) can't optimise it, then you won't be
able to either without writing assembler. GCC is a bit weaker because the
ones I just mentioned all use some pretty neat but patented techniques
that they cross-license from each other.


It may be slightly better, but nowhere near as good as eliminating the
branch logic entirely. Either way, you should use profile-guided
optimisation to produce your branch-prediction hints rather than
guessing at which branch is more likely to be taken.