How gcc optimize sum from 1 to 100 into 5050
I compiled following c program with gcc -O3 -S a.c
:
#include <stdio.h>
int main() {
int sum = 0;
for (int i = 0; i <= 100; i++) {
sum += i;
}
printf("%d", sum);
}
The generated assembly code is as follow:
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 10, 15 sdk_version 10, 15, 4
.globl _main ## -- Begin function main
.p2align 4, 0x90
_main: ## @main
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
leaq L_.str(%rip), %rdi
movl $5050, %esi ## imm = 0x13BA
xorl %eax, %eax
callq _printf
xorl %eax, %eax
popq %rbp
retq
.cfi_endproc
## -- End function
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "%d"
.subsections_via_symbols
As if GCC ran the code and notice that the loop times are determined and GCC replaced the whole calculating with the result 5050.
movl $5050, %esi
How does gcc do this kind of optimization? What's the academic name of this kind of optimization?
2 answers
-
answered 2021-01-26 07:50
klutt
I was trying to find the specific flag, but with no luck. I was able to reproduce your result with only
-O1
so what I did after that was manually adding all the flags that are enabled by-O1
. But when I did, I could not reproduce the result.And as can be read in the documentation
Not all optimizations are controlled directly by a flag. Only optimizations that have a flag are listed in this section.
So I assume this is done by something that
-O1
adds that does not have a flag.Here is what I tried:
$ gcc -fauto-inc-dec -fbranch-count-reg -fcombine-stack-adjustments \ -fcompare-elim -fcprop-registers -fdce -fdefer-pop -fdelayed-branch -fdse \ -fforward-propagate -fguess-branch-probability -fif-conversion \ -fif-conversion2 -finline-functions-called-once -fipa-profile \ -fipa-pure-const -fipa-reference -fipa-reference-addressable \ -fmerge-constants -fmove-loop-invariants -fomit-frame-pointer \ -freorder-blocks -fshrink-wrap -fshrink-wrap-separate -fsplit-wide-types \ -fssa-backprop -fssa-phiopt -ftree-bit-ccp -ftree-ccp -ftree-ch \ -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts \ -ftree-dse -ftree-forwprop -ftree-fre -ftree-phiprop -ftree-pta \ -ftree-scev-cprop -ftree-sink -ftree-slsr -ftree-sra -ftree-ter \ -funit-at-a-time -S k.c cc1: warning: this target machine does not have delayed branches $ grep "5050" k.s $ gcc -O1 -S k.c $ grep "5050" k.s movl $5050, %esi $
-
answered 2021-01-26 08:08
yoonghm
From reference in https://gcc.gnu.org/onlinedocs/gcc-4.5.2/gcc/Optimize-Options.html, loop unrolling (
-funroll-loops
) and forward propagation pass on RTL (-fforward-propagate
) are enabled with optimization levels-O
,-O2
,-O3
,-Os
.Could you try to add
-funroll-loops
and-fforward-propagate
to confirm?