Force inlining

Post by glenlo » Mon, 08 Aug 2005 17:03:43


Dear All

In some C/C++ compilers, you can force the compiler to always inline a
function. E.g. in GCC, it is __attribute__ ((always_inline)); in Visual
C++ it is __forceinline. Is there an equivalent in Metrowerks
CodeWarrior, ideally one that can appear in the same syntactic position
as the inline keyword?

E.g. I'd like to do this

#ifdef __GNUC__
#define INLINE __attribute__((always_inline)) inline
#elif defined(_MSC_VER)
#define INLINE __forceinline
#elif defined(__MWERKS__)
#define INLINE ??? // what do I put here
#endif

class T
{
static INLINE void call () { ... }
};

I'm aware of #pragma inline, but does it (1) force inlining, (2) appear
in place of the inline keyword, and (3) apply to the next function so
declared?

Cheers,
Glen Low, Pixelglow Software
www.pixelglow.com
 
 
 

Force inlining

Post by Alwy » Mon, 08 Aug 2005 23:56:33

In article < XXXX@XXXXX.COM >,



I don't know what you should have for CodeWarrior, but shouldn't you
also have:
#else
#define INLINE inline

for other possible cases?


Yes.


It has to be on a line by itself.


It is cancelled by '#pragma noinline', also on a separate line.

I'm aware that the standard C++ keyword 'inline' is only a hint to the
compiler, but I would like to know why it is insufficient for your
needs. Have you been able to show that forcing inlining in this way
actually boosts performance with optimised code generation in every case?

Please forgive me for asking such a question, but it was my impression
that most optimising compilers are fairly good at making the
appropriate decision.


Alwyn

 
 
 

Force inlining

Post by Alwy » Tue, 09 Aug 2005 00:36:44

In article
< XXXX@XXXXX.COM >,


It seems like these pragmas should be used within a function:

static int proc_30(int a)
{
#pragma noinline
int tab_30[1000];
tab_30[0] = 4*a;
return(tab_30[0]);
}

< http://www.yqcomputer.com/
rise_Compiler.pdf>


Alwyn
 
 
 

Force inlining

Post by Greg » Tue, 09 Aug 2005 15:29:09


Presumably the inline question concerned the C++ compiler for
CodeWarrior 9 on Mac OS X. Neither pragma inline nor pragma noinline
is documented for the Mac compiler.

One pragma that is documented, pragma always_inline, would appear to be
the right answer, except its use is "strongly deprecated" in favor of
pragma inline_depth().

As with most CW pragmas, the pragma's behavior is governed by an "on",
"off", or "reset" argument. So

#pragma always_inline on

would enable this somewhat questionable inlining strategy.

Greg
 
 
 

Force inlining

Post by Alwy » Tue, 09 Aug 2005 19:24:00

In article < XXXX@XXXXX.COM >,


My apologies - I haven't got CodeWarrior installed currently and can't
access the documentation. However, I did find the following thread in
this newsgroup, which the OP may find of use:
< http://www.yqcomputer.com/
browse_frm/thread/eb75a5c44a912089/98fe6333eb4f3391?tvc=1&q=pragma+inline
+group:comp.sys.mac.programmer.codewarrior&hl=en#98fe6333eb4f3391>


Alwyn
 
 
 

Force inlining

Post by glenlo » Tue, 09 Aug 2005 23:06:52

All these #pragmas seem to cause inlining of the function calls after
the #pragma, not any calls to the function declared after the #pragma.

E.g.

#pragma always_inline on
func (); // so this is inlined
#pragma always_inline off

but what I wanted was

#pragma force_inline on
void func () // any call to func will be inlined
{
...
}
#pragma force_inline off

Cheers,
Glen Low, Pixelglow Software
www.pixelglow.com
 
 
 

Force inlining

Post by Greg » Wed, 10 Aug 2005 13:47:00


In that case the inline keyword is sufficient. As long as the function
being inlined meets CodeWarrior's criteria for inlining, then the
compiler will inline the function (and inline routines that it calls up
to a specified depth). The only exceptions would be a function with a
never_inline attribute or the case where inlining had been turned off
via a pragma or target setting.

The inline pragmas and target settings really determine the extent to
which the compiler will inline functions not declared as inline.

Greg
 
 
 

Force inlining

Post by glenlo » Fri, 12 Aug 2005 11:53:50

Sorry for being pedantic, but the whole point of __forceinline (MSVC)
and __attribute__((always_inline)) (gcc) is to override the compiler's
heuristics for inlining a function. I've gone down the route of
increasing inline depth and limits and eventually decided against it
for my library macstl. macstl is meant to be heavily inlined, but client
code need not be, and heavy automatic inlining often increases compile
time. Thus it's useful for me to have macstl manually inlined and let
the clients decide whether to increase inlining for their own code or
not.

Cheers,
Glen Low, Pixelglow Software
www.pixelglow.com
 
 
 

Force inlining

Post by Alwy » Fri, 12 Aug 2005 22:02:17

In article < XXXX@XXXXX.COM >,



From what we have been able to determine so far (authoritative
information to the contrary would of course be welcome), Metrowerks
CodeWarrior for Mac does not offer such a facility. Therefore the
standard 'inline' keyword would be your best recourse.
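
For what it's worth, the macro from the original post could then be completed roughly as follows. This is only a sketch under the assumption reached so far in this thread, namely that no per-function force-inline specifier is known for CodeWarrior, so the __MWERKS__ branch falls back to plain inline. The member twice() and the helper check_inline_macro() are hypothetical names used for illustration.

```cpp
#include <cassert>

// INLINE is the macro name used in the original post.
#if defined(__GNUC__)
#define INLINE __attribute__((always_inline)) inline
#elif defined(_MSC_VER)
#define INLINE __forceinline
#elif defined(__MWERKS__)
// Assumption drawn from this thread, not from Metrowerks documentation:
// no per-function force-inline specifier was found, so use the plain keyword.
#define INLINE inline
#else
// Fallback for any other compiler.
#define INLINE inline
#endif

class T
{
public:
    // twice() is a hypothetical member used only for illustration.
    static INLINE int twice(int x) { return 2 * x; }
};

// Small self-check; call it from main() or from a test harness.
inline void check_inline_macro()
{
    assert(T::twice(21) == 42);
}
```

The #else branch keeps the header usable on compilers the macro does not know about, at the cost of only hinting there.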

Incidentally, if I were a user of macstl, I would want to have some
control over the inlining myself by means of compiler switches rather
than have the author force it on me.


Alwyn
 
 
 

Force inlining

Post by Greg » Sat, 13 Aug 2005 11:31:57


Since the inline keyword is sufficient for your purposes, it is safe to
conclude that CodeWarrior has no heuristics when it comes to inlining
functions declared as "inline". In other words, CodeWarrior will never
decide for reasons of its own not to inline a function declared with
the inline keyword.

The only reason that an inline function would not be inlined would be
for one of the reasons that I already enumerated. In short, if the
function is declared "inline" and the function can be inlined (that is,
it does not have a variable argument list or accept or return an object
that needs destruction), then the function will be inlined.

It's no more complicated than that.

Greg
 
 
 

Force inlining

Post by Eric Alber » Sat, 13 Aug 2005 14:42:15

In article < XXXX@XXXXX.COM >,



What if the function is recursive?

-Eric

--
Eric Albert XXXX@XXXXX.COM
http://www.yqcomputer.com/
 
 
 

Force inlining

Post by Alexey Pro » Sat, 13 Aug 2005 17:09:23

On 12.08.2005 06:31, in article
XXXX@XXXXX.COM , "Greg"
< XXXX@XXXXX.COM > wrote:


But of course it is - why do you think there is an "inline depth"
preference?

The original question sounds well thought-out and valid to me, but
unfortunately, I do not have an answer either.


On 12.08.2005 09:42, in article ejalbert-0433BC.22421511082005@localhost,
"Eric Albert" < XXXX@XXXXX.COM > wrote:


Interestingly, CodeWarrior does optimize tail recursion, but still seems
to not allow inlining :). I wonder why...


inline int tail(int inVal)
{
if (inVal == 0)
return 0;

return tail(inVal - 1);
}

int main()
{
return tail(100);
}

Hunk: Kind=HUNK_GLOBAL_CODE Align=4 Class=__text,__TEXT
Name="_main"(3) Size=36
00000000: 7C0802A6 mflr r0
00000004: 38600063 li r3,99
00000008: 90010008 stw r0,8(SP)
0000000C: 9421FFC0 stwu SP,-64(SP)
00000010: 48000001 bl __Z4taili
00000014: 80010048 lwz r0,72(SP)
00000018: 38210040 addi SP,SP,64
0000001C: 7C0803A6 mtlr r0
00000020: 4E800020 blr
XRef: Kind=HUNK_XREF_MACHO_24BIT Offset=$00000010
Name="__Z4taili"(4)

<...>

Hunk: Kind=HUNK_GLOBAL_CODE Align=4 Class=__coalesced_text,__TEXT
Name="__Z4taili"(4) Size=180
Flags=MULTIDEF
00000000: 7C0802A6 mflr r0
00000004: 2C030000 cmpwi r3,$0000
00000008: 90010008 stw r0,8(SP)
0000000C: 9421FFC0 stwu SP,-64(SP)
00000010: 4082000C bne *+12 ; $0000001C
00000014: 38600000 li r3,0
00000018: 4800008C b *+140 ; $000000A4
0000001C: 3403FFFF subic. r0,r3,1
00000020: 4082000C bne *+12 ; $0000002C
00000024: 38600000 li r3,0
00000028: 4800007C b *+124 ; $000000A4
0000002C: 3403FFFE subic. r0,r3,2
00000030: 4082000C bne *+12 ; $0000003C
00000034: 38600000 li r3,0
00000038: 4800006C b *+108 ; $000000A4
0000003C: 3403FFFD subic. r0,r3,3
00000040: 4082000C bne *+12 ; $0000004C
00000044: 38600000 li r3,0
00000048: 4800005C b *+92 ; $000000A4
0000004C: 3403FFFC subic. r0,r3,4
00000050: 4082000C bne *+12 ; $0000005C
00000054: 38600000 li r3,0
00000058: 4800004C b *+76 ; $000000A4
0000005C: 3403FFFB subic. r0,r3,5
00000060: 4082000C bne *+12 ; $0000006C
00000064: 38600000 li r3,0
00000068: 4800003C b *+60 ; $000000A4
0000006C: 3403FFFA subic. r0,r3,6
00000070: 4082000C bne *+12 ; $0000007C
00000074: 38600000 li r3,0
00000078: 4800002C b *+44 ; $000000A4
0000007C: 3403FFF9 subic. r0,r3,7
00000080: 4082000C bne *+12 ; $0000008C
00000084: 38600000 li r3,0
00000088: 4800001C b *+28 ; $000000A4
0000008C: 3403FFF8 subic. r0,r3,8
00000090: 4082000C bne *+12 ; $0000009C
00000094: 38600000 li r3,0
00000098: 4800000C b *+12 ; $000000A4
0000009C: 3863FFF7 subi r3,r3,9
000000A0: 48000001 bl __Z4taili
000000A4: 80010048
 
 
 

Force inlining

Post by Greg » Sun, 14 Aug 2005 14:47:59

Alexey Proskuryakov wrote:

The inline depth setting specifies the maximum limit for nested inline
functions. Reaching the inline depth limit was one of the reasons I
cited for why a function would not be inlined.




Let me answer it for a third time then. CodeWarrior, unlike most C++
compilers, gives the developer complete control to ensure that a
function will be inlined. All the developer has to do is go down the
checklist. The compiler has no discretion of its own when it comes to
functions declared with the inline keyword. That is the simple reason
why there is no explicit mechanism to disable the compiler's inline
heuristics. It has none. (The compiler does have heuristics for
inlining functions not declared inline, but that is another matter).



Well, you won't have to wonder any longer. The reason that the call to
tail was not inlined in main is that tail is a recursive inlined
function. A recursive inlined function will always reach the maximum
inline depth inside the function itself because its maximum recursion
depth has no limit. As a consequence, tail will just keep getting
longer and longer as the inline depth limit is raised. A limit of 1000
creates a very large tail routine that takes quite some time to
compile.


Let's see what happens to main when the optimizer is turned on, the
inline depth set to 200, and bottom_up inlining disabled.

Compiling this program:

#pragma optimization_level 4
#pragma inline_depth(200)
#pragma inline_max_total_size(1000)
#pragma inline_bottom_up off

inline int tail(int inVal);

int tail(int inVal)
{
if (inVal == 0)
return 0;

return tail(inVal - 1);
}

int main()
{
return tail(100);
}


produces this disassembly for main():


Hunk: Kind=HUNK_GLOBAL_CODE Align=4 Class=__text,__TEXT
Name="_main"(6) Size=8
00000000: 38600000 li r3,0
00000004: 4E800020 blr

Although the tail routine can be expanded to a very high number of
levels, the CodeWarrior optimizer has realized that the function
call tail(100) always returns 0. So the main routine simply exits with
the same 0 that calling tail(100) would have returned after executing a
lot more than just two instructions.

Now that's an impressive optimization, I will admit.

Greg

 
 
 

Force inlining

Post by Eric Alber » Sun, 14 Aug 2005 16:06:10

In article < XXXX@XXXXX.COM >,






Agreed. I'm impressed.

-Eric

--
Eric Albert XXXX@XXXXX.COM
http://www.yqcomputer.com/
 
 
 

Force inlining

Post by apr » Mon, 15 Aug 2005 00:53:20

>> The original question sounds well thought-out and valid to me, but


The question was - how to do that in a library, without imposing any
requirements on the application that uses it. I didn't see an answer to
that question.



Wrong answer, sorry :(. This is tail recursion, which can be
transformed into a loop - read
< http://www.yqcomputer.com/ > for more details.
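
To make the point concrete, here is a rough sketch of that transformation applied to the tail() function posted earlier. The loop version is written by hand for illustration; tail_loop and check_tail are hypothetical names, and nothing here claims to be what CodeWarrior actually generates. Both versions assume a non-negative argument.

```cpp
#include <cassert>

// The recursive function as posted earlier in the thread.
inline int tail(int inVal)
{
    if (inVal == 0)
        return 0;

    return tail(inVal - 1);
}

// tail_loop is a hypothetical name: the same computation written as a loop,
// which is the form a tail call can be reduced to.
inline int tail_loop(int inVal)
{
    while (inVal != 0)
        inVal = inVal - 1;

    return 0;
}

// Small self-check; call it from main() or from a test harness.
inline void check_tail()
{
    assert(tail(100) == tail_loop(100));
    assert(tail_loop(100) == 0);
}
```

Once the function is a loop there is no nested call left to expand, so the inline depth limit no longer comes into it.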

- WBR, Alexey Proskuryakov