Reading a file in a bash loop on AIX is much slower than with ksh on Linux (BLOCKSIZE?)

We have a custom ksh script that was developed on RHEL Linux. It does the following: 1) read the input ASCII file; 2) replace "\" with "\\" in place in the files using sed -i; 3) load the history file into memory; 4) compare the data with the current day's data; 5) generate the net-change records.
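Step 2 appears to exist because plain read treats a backslash as an escape character and drops it. The minimal sketch below (bash or ksh93; the sample line is hypothetical) shows that doubling the backslashes and then using plain read gives the same result as read -r on the untouched line. If that holds for the real data, read -r line could make both in-place rewrites of every file unnecessary. Both forms still trim leading and trailing whitespace with the default IFS, so line lengths should be unaffected.

```shell
#!/usr/bin/env bash
# Sketch: why the script doubles backslashes before the read loop.
# Hypothetical sample data; runs under bash or ksh93 (here-strings).
sample='AB\CD'        # original line containing one backslash
doubled='AB\\CD'      # the same line after the sed/perl "\" -> "\\" pass

read plain     <<< "$sample"    # plain read: "\C" is an escape, the backslash is lost
read undoubled <<< "$doubled"   # plain read: "\\" collapses back to a single "\"
read -r raw    <<< "$sample"    # read -r: backslash kept, no file rewrite needed

printf '%s\n' "$plain" "$undoubled" "$raw"   # prints ABCD, AB\CD, AB\CD
```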

During a platform upgrade, we had to migrate this script to AIX 7.1. We replaced ksh with bash, since typeset -A is not available in ksh on AIX, and replaced the sed -i command with perl -pi -e. The rest of the script is almost unchanged.

The script processes 691 files in about 1 hour on Linux, but on AIX the same files take more than 7 hours.

For a single input file, the snippet below shows the performance difference: on Linux it completes within 1-2 seconds, whereas on AIX it takes 13-15 seconds. Because this difference repeats for each of the 691 files, the script takes 7 hours to complete.

Could you please help me understand whether this script can be tuned for better performance on AIX? Any pointers will be very helpful. Thank you in advance for your help!

Test results are below to show the issue more precisely.

Linux test script:

#!/bin/ksh
export LANG="C"
echo `date`
typeset -A Archive_Lines
if [ -f "1577cii1.ASC" ]
then
    echo `date` Starting sed
    # Double every backslash so that plain "read" below keeps them
    sed -i 's/\\/\\\\/g' 1577cii1.ASC
    echo `date` Ending sed
    while read line; do
        if [[ "${#line}" == "401" ]]
        then
            # Key: chars 0-18 plus everything from offset 27; value: 10 chars from offset 27
            Archive_Lines["${line:0:19}""${line:27}"]="${line:27:10}"
        else
            echo ${#line}
        fi
    done < 1577cii1.ASC
    echo `date` Starting sed
    # Undo the doubling to restore the original file
    sed -i 's/\\\\/\\/g' 1577cii1.ASC
    echo `date` Ending sed
fi
echo `date`

Linux execution:

ksh read4.sh
Sun Nov 12 15:03:18 CST 2017
Sun Nov 12 15:03:18 CST 2017 Starting sed
Sun Nov 12 15:03:19 CST 2017 Ending sed
402
405
403
339
403
403
Sun Nov 12 15:03:22 CST 2017 Starting sed
Sun Nov 12 15:03:23 CST 2017 Ending sed
Sun Nov 12 15:03:23 CST 2017

AIX test script:

#!/usr/bin/bash
export LANG="C"
echo `date`
typeset -A Archive_Lines
if [ -f "1577cii1.ASC" ]
then
    echo `date` Starting perl
    # Double every backslash (replaces sed -i) so that plain "read" below keeps them
    perl -pi -e 's/\\/\\\\/g' 1577cii1.ASC
    echo `date` Ending perl
    while read line; do
        if [[ "${#line}" == "401" ]]
        then
            # Key: chars 0-18 plus everything from offset 27; value: 10 chars from offset 27
            Archive_Lines["${line:0:19}""${line:27}"]="${line:27:10}"
        else
            echo ${#line}
        fi
    done < 1577cii1.ASC
    echo `date` Starting perl
    # Undo the doubling to restore the original file
    perl -pi -e 's/\\\\/\\/g' 1577cii1.ASC
    echo `date` Ending perl
fi
echo `date`

AIX test execution:

bash read_test.sh
Sun Nov 12 15:00:17 CST 2017
Sun Nov 12 15:00:17 CST 2017 Starting perl
Sun Nov 12 15:00:18 CST 2017 Ending perl
402
405
313
403
337
403
403
Sun Nov 12 15:01:29 CST 2017 Starting perl
Sun Nov 12 15:01:29 CST 2017 Ending perl
Sun Nov 12 15:01:29 CST 2017

After replacing Archive_Lines["${line:0:19}""${line:27}"]="${line:27:10}" with echo ".", the read loop alone takes 15 seconds:

bash read_test.sh
Sun Nov 12 16:56:27 CST 2017
Sun Nov 12 16:56:27 CST 2017 Starting perl
Sun Nov 12 16:56:27 CST 2017 Ending perl
.
.
.
.
.
Sun Nov 12 16:56:42 CST 2017 Starting perl
Sun Nov 12 16:56:42 CST 2017 Ending perl
Sun Nov 12 16:56:42 CST 2017

With the Archive_Lines["${line:0:19}""${line:27}"]="${line:27:10}" assignment restored, it takes 79 seconds:

bash read_test.sh
Sun Nov 12 16:59:52 CST 2017
Sun Nov 12 16:59:52 CST 2017 Starting perl
Sun Nov 12 16:59:52 CST 2017 Ending perl
402
405
313
403
337
403
403
Sun Nov 12 17:01:11 CST 2017 Starting perl
Sun Nov 12 17:01:11 CST 2017 Ending perl
Sun Nov 12 17:01:11 CST 2017
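To separate the cost of the read loop from the cost of the associative-array assignment without touching the real input, a timing sketch on synthetic data could look like this. It is an illustration under assumptions: the temporary file name, the line count and the line contents are hypothetical, and SECONDS only has whole-second resolution in bash, so nlines may need raising toward the size of a real file before the numbers mean anything. It runs under bash 4+ or ksh93, so the same file can be timed with both shells on AIX.

```shell
#!/usr/bin/env bash
# Sketch: time the read loop with and without the associative-array assignment.
# Hypothetical test data; runs under bash 4+ or ksh93.
typeset -A Archive_Lines
testfile=${TMPDIR:-/tmp}/read_bench.$$
nlines=2000                      # raise this to match a real input file

# Build nlines lines of exactly 401 characters: a unique 7-digit counter
# followed by 394 "x" characters, so every key built below is distinct.
pad=$(printf '%394s' '' | tr ' ' 'x')
i=0
while (( i < nlines )); do
    printf '%07d%s\n' "$i" "$pad"
    (( i += 1 ))
done > "$testfile"

# Pass 1: read loop and length test only (like the echo "." experiment).
SECONDS=0
n=0
while read line; do
    if [[ ${#line} == 401 ]]; then (( n += 1 )); fi
done < "$testfile"
loop_only=$SECONDS

# Pass 2: the same loop plus the associative-array assignment.
SECONDS=0
while read line; do
    if [[ ${#line} == 401 ]]; then
        Archive_Lines["${line:0:19}${line:27}"]="${line:27:10}"
    fi
done < "$testfile"
with_assign=$SECONDS

rm -f "$testfile"
echo "loop only: ${loop_only}s, with assignment: ${with_assign}s, entries: ${#Archive_Lines[@]}"
```

Running it as bash bench.sh and as ksh93 bench.sh on the same machine should show which of the two costs differs between the shells.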

Thanks, Vamsi

2 answers

  • answered 2017-11-12 23:10 markp

    As Walter suggested, it looks like bash takes a performance hit on the substring processing (and possibly on the length test).

    It might be of interest to see what kind of timings you get with other solutions.

    Here's a simplistic awk solution that should do the same thing as the original bash substring logic (using your current sample data file, minus the output of line lengths other than 401):

    typeset -A Archive_Lines
    while IFS="|" read idx val
    do
        Archive_Lines["${idx}"]="${val}"
    done < <(awk 'length($0)==401 { print substr($0,1,19)substr($0,28)"|"substr($0,28,10) }' 1577cii1.ASC)
    
    • length($0)==401 : if the line length is 401 then ...
    • print ...."|" ... : print 2 output fields separated by a pipe (|), where the fields are ...
    • substr($0,1,19)substr($0,28) : equivalent to your ${line:0:19}${line:27} (awk's substr() counts from 1, bash offsets count from 0)
    • substr($0,28,10) : equivalent to your ${line:27:10}
    • at this point every line of length 401 produces output like string1|string2
    • while IFS="|" read idx val : split each output line back into 2 variables ...
    • Archive_Lines["${idx}"]="${val}" : use the 2 variables as the array index/value pair
    • done < <(awk ...) : feed the awk output in through process substitution rather than a pipe; in bash every part of a pipeline runs in a subshell, so an array filled inside "awk ... | while ..." would be empty once the loop ends

    NOTE: The pipe (|) is used as the field separator in case your substrings contain spaces. If your substrings could contain a pipe, replace it with some other character that never appears in them and use that as the field delimiter.
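A minimal sketch of why the separator matters, assuming a hypothetical record whose key contains spaces: with the default IFS, read splits on whitespace and cuts the key at the first space, while IFS="|" splits only on the pipe.

```shell
#!/usr/bin/env bash
# Sketch: default IFS vs IFS="|" when the key contains spaces.
# The record is hypothetical; runs under bash or ksh93.
record='KEY WITH SPACES|VALUE 1'

# Default IFS: the key is cut at the first space, the rest goes to v1.
read -r k1 v1 <<< "$record"

# IFS="|" (set only for this read): spaces inside each field survive.
IFS='|' read -r k2 v2 <<< "$record"

printf '[%s] [%s]\n' "$k1" "$v1" "$k2" "$v2"
```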

    The objective is to see if awk's built-in length/substring processing is faster than bash's length/substring processing ...

  • answered 2017-11-13 01:20 vamsi krishna

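    This solved my problem: running the same script under ksh93 (/usr/bin/ksh93) instead of bash. The read loop now takes about 4 seconds, in line with Linux: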

    #!/usr/bin/ksh93
    export LANG="C"
    echo `date`
    typeset -A Archive_Lines
    if [ -f "1577cii1.ASC" ]
    then
        echo `date` Starting perl
        perl -pi -e 's/\\/\\\\/g' 1577cii1.ASC
        echo `date` Ending perl
        while read line; do
            if [[ "${#line}" == "401" ]]
            then
                Archive_Lines[${line:0:19}${line:27}]="${line:27:10}"
            else
                echo ${#line}
            fi
        done < 1577cii1.ASC
        echo `date` Starting perl
        perl -pi -e 's/\\\\/\\/g' 1577cii1.ASC
        echo `date` Ending perl
    fi
    echo `date`
    
    
    ksh93 read_test3.sh
    Sun Nov 12 19:19:34 CST 2017
    Sun Nov 12 19:19:34 CST 2017 Starting perl
    Sun Nov 12 19:19:34 CST 2017 Ending perl
    402
    405
    403
    339
    403
    403
    Sun Nov 12 19:19:38 CST 2017 Starting perl
    Sun Nov 12 19:19:39 CST 2017 Ending perl
    Sun Nov 12 19:19:39 CST 2017
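Since the fix was simply to run the loop under ksh93, a wrapper that launches the script on several platforms could prefer ksh93 when it is installed and fall back to bash otherwise. The helper below is a hypothetical sketch, not part of the original answer; it relies only on command -v, and on bash 4+ supporting typeset -A, which makes it a working (if slower, per the timings above) fallback.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of ksh93 if it is on PATH
# (AIX ships it as /usr/bin/ksh93), otherwise the path of bash.
pick_shell() {
    command -v ksh93 2>/dev/null || command -v bash
}

chosen=$(pick_shell)
echo "would run: $chosen read_test3.sh"
```

A caller would then start the job with "$(pick_shell)" read_test3.sh instead of hard-coding the interpreter.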