awk numbered columns and ignore errors

The following works well and captures all 2nd column values for S_nn. The goal is to add numbers in the 2nd column.

awk -F "," '/s_/ {cons = cons + $2} END {print cons}' G.csv

How can I change this to add only when nn is between N1 and N2, e.g. from s_23 to s_24?

Also, is it possible to count a line as 1 if it has junk instead of a number in the 2nd column?

S_22, 1
S_23, 0
S_24, 1
S_25, 1
S_26, ?

Sample input: sum s_24 to s_26

Sample output: 1+1+1=3 (the last one is for error)

1 answer

  • answered 2018-02-13 10:35 kvantour

    The solution is rather simple: all you need to do is perform a numeric test.

    awk -v start=24 -v stop=26 '
         BEGIN { FS="[_,]" }
         (start <= $2 ) && ($2 <= stop) { s = s + (($3==$3+0)?$3:1) }
         END{ print s+0 }' <file>

    which outputs 3 for the sample input (1 + 1 + 1, where the last 1 stands in for the ?).

    How does it work:

    • Line 1: the -v options define the variables start and stop, the bounds of the range.
    • The BEGIN block redefines the field separator as either _ or , so each line now splits into 3 fields.
    • The next line checks whether field 2 (the number) lies between start and stop; if so, the sum is updated.
    • Field 3 is tested for being a number with the condition $3==$3+0; if the test fails, the value is taken to be 1.
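
    To see the numeric test on its own, here is a small sketch that prints, for every line of the sample data, the index and the value that would be summed. The file name sample.csv and the helper name check_values are only assumptions for illustration, and the range check is left out on purpose.

```shell
# Sample data from the question, written to a throwaway file (the name is arbitrary).
cat > sample.csv <<'EOF'
S_22, 1
S_23, 0
S_24, 1
S_25, 1
S_26, ?
EOF

# Hypothetical helper: print "index -> value used" for every line of the file.
check_values() {
    awk 'BEGIN { FS = "[_,]" }
         {
             # $3 == $3+0 only holds when field 3 looks like a number;
             # anything else (such as " ?") falls back to 1
             v = ($3 == $3 + 0) ? $3 + 0 : 1
             printf "%s -> %d\n", $2, v
         }' "$1"
}

check_values sample.csv
```

    With the sample data, the last line should read 26 -> 1, since " ?" does not look like a number, while 23 -> 0 shows that a genuine 0 is kept as 0.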

    If you want to see the numbers printed, you can do :

    awk -v start=24 -v stop=26 '
         BEGIN{ FS="[_,]" }
         (start <= $2 ) && ($2 <= stop) {
            v = ($3==$3+0)?$3:1
            s = s + v
            printf "%s%d", (c++?"+":""), v
         }
         END{ printf "=%d\n", s }' <file>

    output :

        1+1+1=3

    The printf statement prints "+" followed by the value v for every entry except the first, where it prints v alone. This is done by keeping track of a counter c, which awk initialises to zero by default. The expression (c++?"+":"") determines whether we are printing the first entry or not: c++ returns the current value of c and afterwards sets c to c+1. This is called a post-increment operator. Thus, the first time, c=0, so (c++?"+":"") returns "" and sets c to 1. The second time, (c++?"+":"") returns "+" and sets c to 2.
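
    The same separator idiom can be tried in isolation. This is a minimal sketch, with join_plus being a made-up helper name, that joins the first field of each input line with "+":

```shell
# Hypothetical helper: join the first field of every input line with "+".
join_plus() {
    awk '{
             # c is 0 (false) on the first record, so no "+" is printed before it;
             # the post-increment makes it non-zero for every later record
             printf "%s%s", (c++ ? "+" : ""), $1
         }
         END { printf "\n" }'
}

printf '1\n1\n1\n' | join_plus
```

    For the three input lines above this prints 1+1+1. A single input line is printed without any "+".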