If I use a function call that returns an integer value as the endpoint
of a for loop, is the function evaluated each time through the loop, or
does the compiler optimize this for me? For example
for (i = 0; i < strlen(str); i++)
Since strlen(str) should return the same value on every pass, it would make
sense for the compiler to optimize that, right?
Just curious.
TIA
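In C, the controlling expression of a for loop is evaluated before every execution of the loop body. The compiler may skip repeated calls only when the program cannot tell the difference (the "as-if" rule). Here is a minimal sketch that makes the repeated calls visible. It uses a hypothetical counting_strlen() wrapper (not a library function) whose call counter is a side effect, so the returned count has to match what the C semantics require:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper, for illustration only: forwards to strlen() but
   also counts how many times it has been called. Because the counter is
   a side effect, the value returned below must equal the number of calls
   the C semantics require, whatever the optimizer does. */
static unsigned long strlen_calls = 0;

static size_t counting_strlen(const char *s)
{
    strlen_calls++;
    return strlen(s);
}

/* Runs the loop from the question and reports how many times the
   function call in the loop condition was executed. */
unsigned long calls_for_loop(const char *str)
{
    size_t i;
    strlen_calls = 0;
    for (i = 0; i < counting_strlen(str); i++) {
        /* empty body: str is never modified */
    }
    return strlen_calls;
}
```

For "hello" (5 characters), the wrapper should be called 6 times: once per iteration, plus the final test that ends the loop.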
Paul Floyd - 27 Jun 2007 19:09 GMT
Hi
If 'str' is constant throughout the loop, then yes, I'd expect a
compiler to hoist the call out of the loop.
Personally I don't like second-guessing the compiler. In any case, where
I work, our nightly regression tests are done with debug builds, and in
that case you pay for 'pessimized' (unoptimized) code.
See you soon,
Paul
Tom Harrington - 27 Jun 2007 19:16 GMT
It's dangerous to second-guess compiler optimizations unless you know
the compiler internals pretty well. However... In this case I'd expect
that the compiler would check whether the loop looked like it might
modify str, and decide whether to optimize the strlen() call based on
the result.
Could be wrong, though; I certainly wouldn't rely on it.

Tom "Tom" Harrington
MondoMouse makes your mouse mightier
See http://www.atomicbird.com/mondomouse/
glenn andreas - 27 Jun 2007 19:40 GMT
The compiler has no way to know that strlen will always return the same
result if it has the same input, and that it doesn't have any side
effects, so there is no way to know that it is safe to optimize it into
a single call (even though it could know whether you are modifying str in
the body of your loop).
The use of "const" modifiers can provide some clues, but not enough to
guarantee both conditions. (It can say that a parameter is not changed,
for example, or that the state of a specific object isn't changed by a C++
member function.)
On the other hand, if you have
for (i = 0; i < a * 5 + b * 3; i++)
(where neither a nor b is changed in the loop), the compiler is free to
calculate the expression once and reuse it.
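Glenn's arithmetic example is a classic case of loop-invariant code motion. The sketch below uses hypothetical function names and shows the source-level equivalent of what the optimizer is allowed to do internally. Both functions should return the same count:

```c
/* Loop as in the post: a * 5 + b * 3 appears in the loop condition. */
int iterations_inline(int a, int b)
{
    int count = 0;
    for (int i = 0; i < a * 5 + b * 3; i++)
        count++;
    return count;
}

/* What loop-invariant code motion effectively produces: since neither
   a nor b changes inside the loop, the bound is computed only once. */
int iterations_hoisted(int a, int b)
{
    const int limit = a * 5 + b * 3;
    int count = 0;
    for (int i = 0; i < limit; i++)
        count++;
    return count;
}
```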
Jens Ayton - 27 Jun 2007 23:54 GMT
glenn andreas:
> The compiler has no way to know that strlen will always return the same
> result if it has the same input, and that it doesn't have any side
> effects, so there is no way to know that it is safe to optimize it into
> a single call (even though it could know if you are not modifying str in
> the body of your loop)
Actually, it can. strlen() is a standard library function, and it is illegal
to use the identifier for any purpose other than that standard library
function. Therefore, the compiler can make assumptions about its semantics, as
it does for many standard library functions.
In the particular case of GCC, this semantic constraint can be specified with
the compiler-specific declaration __attribute__((pure)), and in fact strlen()
is mentioned as an example of a function with pure semantics in the function
attributes documentation:
Many functions have no effects except the return value and their return
value depends only on the parameters and/or global variables. Such a
function can be subject to common subexpression elimination and loop
optimization just as an arithmetic operator would be.
...
Some of common examples of pure functions are strlen or memcmp.
http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Function-Attributes.html
However, for the optimization in question to actually be made, the compiler
would need to be able to determine that the string in question is not being
modified in the loop. This is tricky if any char pointer, or indeed any
pointer, is written to.

Jens Ayton
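Here is a minimal sketch of Jens's point. It assumes GCC or Clang, because the __attribute__((pure)) syntax is a compiler extension, not standard C. my_strlen() is a hypothetical stand-in; real code would simply call strlen(), which GCC already treats as a built-in with these semantics. Because the loop below only reads from str, an optimizing build is permitted, though not guaranteed, to evaluate my_strlen(str) just once:

```c
#include <stddef.h>

/* Hypothetical function, declared pure (GCC/Clang extension): its result
   depends only on its argument and the memory it points to, and it has
   no side effects. This permits, but does not require, the optimizer to
   treat repeated calls with unchanged inputs as a single call. */
size_t my_strlen(const char *s) __attribute__((pure));

size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Counts spaces. The loop only reads through str, so with optimization
   enabled GCC may hoist my_strlen(str) out of the loop. */
size_t count_spaces(const char *str)
{
    size_t n = 0;
    for (size_t i = 0; i < my_strlen(str); i++) {
        if (str[i] == ' ')
            n++;
    }
    return n;
}
```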
Lorenzo Thurman - 28 Jun 2007 15:24 GMT
> glenn andreas:
>> The compiler has no way to know that strlen will always return the same
> [...]
> modified in the loop. This is tricky if any char pointer, or indeed any
> pointer, is written to.
Thanks, this is interesting reading.
Lorenzo Thurman - 28 Jun 2007 14:13 GMT
> [...]
> (where neither a nor b are changed in the loop), the compiler is free to
> calculate the expression once and reuse it.
Thanks all! I was just curious, as I wrote some code and just used the
return value of the function call as a shortcut to declaring a variable.
Then I thought, "hey, will that have to be evaluated every time?" I guess
it probably makes sense to just declare a variable and use that as the loop
limit, but I'll use gcc -S and find out.
Thanks again
Michael Ash - 28 Jun 2007 16:29 GMT
As a general principle I would say that you should write your code in
whatever way is most readable, and not worry about the runtime cost until
and unless it proves to be a problem.
However, this is a somewhat special case where you're not just changing the
speed of the code, but changing its algorithmic complexity. By putting the
strlen call in the loop condition, you've changed from a linear to a quadratic runtime,
meaning that if you double your data size you'll quadruple your running
time. This can get painful if you have the potential for really gigantic
strings. If this function could possibly be receiving multimegabyte C
strings then I would say it's best to break with principle a bit and write
the "fast" way to avoid this potential problem. If you know that your
strings will always be small then, as usual, don't fix it until it
actually breaks.

Michael Ash
Rogue Amoeba Software
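Michael's linear-versus-quadratic point can be measured directly. The sketch below uses a hypothetical scanning_strlen() that tallies how many characters it examines. The tally is a side effect, so it must reflect every call the C semantics require, whatever the optimizer does:

```c
#include <stddef.h>

/* Hypothetical instrumented strlen(): same result as strlen(), but adds
   the number of characters it examines to a global counter. */
static unsigned long chars_scanned = 0;

static size_t scanning_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    chars_scanned += n;
    return n;
}

/* strlen in the loop condition: n + 1 calls, each scanning n characters,
   so n * (n + 1) characters in total (quadratic). */
unsigned long scans_naive(const char *str)
{
    chars_scanned = 0;
    for (size_t i = 0; i < scanning_strlen(str); i++) {
        /* the loop body would process str[i] here */
    }
    return chars_scanned;
}

/* Length computed once before the loop: n characters in total (linear). */
unsigned long scans_hoisted(const char *str)
{
    chars_scanned = 0;
    const size_t len = scanning_strlen(str);
    for (size_t i = 0; i < len; i++) {
        /* the loop body would process str[i] here */
    }
    return chars_scanned;
}
```

For "abcd" (4 characters), the naive loop should scan 4 * 5 = 20 characters and the hoisted one 4. For an 8-character string the figures become 72 and 8. That is roughly the fourfold growth Michael describes, while the hoisted version only doubles.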
Reinder Verlinde - 27 Jun 2007 19:46 GMT
You know that, but how can the compiler?
A compiler doing that would have to know:
a) that you are calling the C library 'strlen'
b) that that function has no side effects, ever
c) that that function, given identical data,
always returns the same value
d) that there is no way that the value of 'str' or any data
it points to could change throughout the loop
a) is possible; I am not sure that b) is part of the C spec; c) is
possible; d) is as good as impossible in C.
Consider
char a[20];
...
char * str = &a[3];
for(i = 0; i < strlen(str); i++) {
someFunction(a);
}
where 'someFunction' is a function in a library that only the linker
will see. In theory, the linker could do a final optimization pass,
inspecting the actual code, and move things out of the loop. I do not
know whether such compilers exist.
In practice, a compiler such as gcc has C-like modes (where gcc no
longer is an ISO C compiler) that will just assume a) for functions such
as strlen. I do not know whether gcc also assumes anything about b) and
c), so I also do not know whether it will do this optimization for you.
I advise you to get in the habit of doing
size_t const len = strlen(str);
for(size_t i = 0; i < len; ++i){
...
}
since that will always work, even with
int const limit = someThirdPartyLibraryFunction(someValue);
for(int i = 0; i < limit; ++i){
...
}
where the compiler is unlikely to be able to know that it can move the
function call out of the loop.
As a bonus, the latter also is way easier when single-stepping in most
debuggers.
Reinder
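Here is a runnable version of Reinder's aliasing example. It uses a hypothetical someFunction() that shortens the whole buffer by one character. The function never touches str by name, yet strlen(str) shrinks on each pass because str points into a. In this sketch the function body is visible to the compiler, unlike Reinder's library case. The point is the same, though: hoisting strlen(str) would change the result, so it is not a legal optimization here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Reinder's someFunction(): it shortens the
   whole buffer by one character. It never mentions str, but str points
   into the same array, so strlen(str) changes too. */
static void someFunction(char *buf)
{
    size_t n = strlen(buf);
    if (n > 0)
        buf[n - 1] = '\0';
}

/* The loop as written: strlen(str) is re-evaluated on every pass. */
int iterations_as_written(void)
{
    char a[20] = "0123456789";
    char *str = &a[3];
    int count = 0;
    for (size_t i = 0; i < strlen(str); i++) {
        someFunction(a);
        count++;
    }
    return count;
}

/* What an (incorrect) hoisting of strlen(str) would compute instead. */
int iterations_if_hoisted(void)
{
    char a[20] = "0123456789";
    char *str = &a[3];
    const size_t len = strlen(str);
    int count = 0;
    for (size_t i = 0; i < len; i++) {
        someFunction(a);
        count++;
    }
    return count;
}
```

iterations_as_written() should return 4, while iterations_if_hoisted() returns 7.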
vze35xda@verizon.net - 28 Jun 2007 01:24 GMT
The easiest way is to compile your loop in Xcode at various optimization
levels and then look at the generated assembly. It's not real simple, but
you should be able to see the changes that optimization makes. NOTE: What
you learn this way applies only to gcc (as used by Xcode); YMMV with
other compilers.
--jim
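Following jim's suggestion, here is a small file you could compile at several optimization levels and compare. The file name loop.c and the function count_char() are arbitrary choices. -O0, -O2 and -S (stop after generating assembly) are standard gcc options. What the resulting .s files contain will depend on the compiler version and target:

```c
#include <stddef.h>
#include <string.h>

/* Save as, e.g., loop.c and compare the assembly from:
 *   gcc -O0 -S loop.c -o loop-O0.s
 *   gcc -O2 -S loop.c -o loop-O2.s
 * Look for a call to strlen (or an inlined scanning loop) inside the
 * loop body versus before it. */
size_t count_char(const char *str, char c)
{
    size_t n = 0;
    for (size_t i = 0; i < strlen(str); i++) {
        if (str[i] == c)
            n++;
    }
    return n;
}
```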
Tony Walton - 28 Jun 2007 13:14 GMT
It's been a while since I looked at Intel assembler code, but it seems
that what the compiler does is to inline the strlen call. Rather than
making an explicit call to strlen(), the code just loads up a pointer to
the string and uses the "repeat loop" built into the CPU, namely

    (point %edi at the string)
    repnz
    scasb

That's with -O0 (no optimisation, the default) and -O1. Higher levels
of optimisation take the string length calculation out of the
finished code completely, simply doing a

    L6:
        addl $1, %eax
        cmpl $5, %eax
        jbe L6

loop, in other words a direct compare against the length of the string
(6 for the example string "string"), determined at compile time.
If you do things to the string within the loop the compiler uses the
"repnz/scasb" inlining even at higher levels of optimisation.
Hmmm... I wonder what it looks like on PPC...

Tony
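The two functions below sketch, in C, what Tony's -O2 output amounts to. The names are hypothetical. With a string literal, the length is known at compile time, so the optimizer can replace strlen(str) with the constant 6. The "cmpl $5 / jbe" pair keeps looping while the counter is at most 5, that is, while it is below 6:

```c
#include <stddef.h>
#include <string.h>

/* With a string literal the length is known at compile time, so an
   optimizing compiler can replace strlen(str) by the constant 6. */
size_t loop_over_literal(void)
{
    const char *str = "string";
    size_t count = 0;
    for (size_t i = 0; i < strlen(str); i++)
        count++;
    return count;
}

/* Roughly what the optimized assembly does: compare the counter against
   a constant. "cmpl $5 / jbe" loops while i <= 5, i.e. while i < 6. */
size_t loop_folded(void)
{
    size_t count = 0;
    for (size_t i = 0; i <= 5; i++)
        count++;
    return count;
}
```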
Jeffrey Dutky - 29 Jun 2007 00:11 GMT
1) the specific example you give is a common enough idiom that I might
expect most compilers to recognize it and hoist the strlen call
outside the loop body.
2) unfortunately, the strlen call can only be hoisted if there is no
possibility that the loop body might change the length of the string.
3) C's pointer aliasing semantics make it impossible (or nearly so) to
determine whether a given block of code containing pointer operations
touches any given range of memory addresses.
All of this leads to the conclusion that you shouldn't assume that the
compiler will clean up your messes: if you don't want strlen called on
each pass through the loop, don't write a loop that might call it that
way. Hoist the call out of the loop in your code (and don't worry
about allocating an extra variable).
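Here is a sketch of Jeffrey's advice applied to a loop that writes through str. This is exactly the case where the compiler usually cannot prove that the length stays unchanged. The function names are hypothetical. upcase_hoisted() computes the length once. upcase_scan() shows another common way to avoid the repeated call, not raised in the thread: test for the terminating NUL directly.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hoisted, as Jeffrey recommends: the length is computed once, so the
   writes through str cannot trigger a re-scan of the string. */
void upcase_hoisted(char *str)
{
    const size_t len = strlen(str);
    for (size_t i = 0; i < len; i++)
        str[i] = (char)toupper((unsigned char)str[i]);
}

/* Alternative: stop at the terminating NUL, so no length is needed. */
void upcase_scan(char *str)
{
    for (size_t i = 0; str[i] != '\0'; i++)
        str[i] = (char)toupper((unsigned char)str[i]);
}
```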