How gcc optimize sum from 1 to 100 into 5050

I compiled following c program with gcc -O3 -S a.c:

#include <stdio.h>

int main() {
    int sum = 0;
    for (int i = 0; i <= 100; i++) {
        sum += i;
    printf("%d", sum);

The generated assembly code is as follow:

    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 10, 15    sdk_version 10, 15, 4
    .globl  _main                   ## -- Begin function main
    .p2align    4, 0x90
_main:                                  ## @main
## %bb.0:
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register %rbp
    leaq    L_.str(%rip), %rdi
    movl    $5050, %esi             ## imm = 0x13BA
    xorl    %eax, %eax
    callq   _printf
    xorl    %eax, %eax
    popq    %rbp
                                        ## -- End function
    .section    __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
    .asciz  "%d"


As if GCC ran the code and notice that the loop times are determined and GCC replaced the whole calculating with the result 5050.

movl $5050, %esi

How does gcc do this kind of optimization? What's the academic name of this kind of optimization?

2 answers

  • answered 2021-01-26 07:50 klutt

    I was trying to find the specific flag, but with no luck. I was able to reproduce your result with only -O1 so what I did after that was manually adding all the flags that are enabled by -O1. But when I did, I could not reproduce the result.

    And as can be read in the documentation

    Not all optimizations are controlled directly by a flag. Only optimizations that have a flag are listed in this section.

    So I assume this is done by something that -O1 adds that does not have a flag.

    Here is what I tried:

    $ gcc -fauto-inc-dec  -fbranch-count-reg  -fcombine-stack-adjustments \
    -fcompare-elim  -fcprop-registers  -fdce  -fdefer-pop  -fdelayed-branch  -fdse  \
    -fforward-propagate  -fguess-branch-probability  -fif-conversion \
    -fif-conversion2  -finline-functions-called-once  -fipa-profile  \
    -fipa-pure-const  -fipa-reference  -fipa-reference-addressable \
    -fmerge-constants  -fmove-loop-invariants  -fomit-frame-pointer \
    -freorder-blocks  -fshrink-wrap  -fshrink-wrap-separate  -fsplit-wide-types  \
    -fssa-backprop  -fssa-phiopt  -ftree-bit-ccp  -ftree-ccp  -ftree-ch \
    -ftree-coalesce-vars  -ftree-copy-prop  -ftree-dce  -ftree-dominator-opts \
    -ftree-dse  -ftree-forwprop  -ftree-fre  -ftree-phiprop  -ftree-pta \
    -ftree-scev-cprop  -ftree-sink  -ftree-slsr  -ftree-sra  -ftree-ter \
    -funit-at-a-time -S k.c 
    cc1: warning: this target machine does not have delayed branches
    $ grep "5050" k.s
    $ gcc -O1 -S k.c
    $ grep "5050" k.s
        movl    $5050, %esi

  • answered 2021-01-26 08:08 yoonghm

    From reference in, loop unrolling (-funroll-loops) and forward propagation pass on RTL (-fforward-propagate) are enabled with optimization levels -O, -O2, -O3, -Os.

    Could you try to add -funroll-loops and -fforward-propagate to confirm?