Perl: find a match and read the next line

I would like to use

myscript.pl targetfolder/*

to read some numbers from ASCII files.

myscript.pl

@list = <@ARGV>;

# Is the whole file or only 1st line is loaded?

foreach $file ( @list ) {
    open (F, $file);
}

# is this correct to judge if there is still file to load?

while ( <F> ) {
    match_replace()
}

sub match_replace {

    # if I want to read the 5th line in downward, how to do that?
    # if I would like to read multi lines in multi array[row],
    # how to do that?

    if ( /^\sName\s+/ ) {
        $name = $1;
    }
}

1 answer

  • answered 2018-07-11 07:38 haukex

    I would recommend a thorough read of perlintro - it will give you a lot of the information you need. Additional comments:

    1. Always use strict and warnings. The first will enforce some good coding practices (like for example declaring variables), the second will inform you about potential mistakes. For example, running the code you showed without any file arguments produces the warning readline() on unopened filehandle F, giving you the hint that F was never opened at that point (more on that below).

    2. @list = <@ARGV>;: This is a bit tricky and I wouldn't recommend it. You're essentially using glob here, and expanding targetfolder/* is something your shell should already be doing. If you're on Windows, where cmd.exe does not expand wildcards for you, I'd recommend Win32::Autoglob instead of doing it manually.

    3. foreach ... { open ... }: You're not doing anything with the files once you've opened them - the loop to read from the files needs to be inside the foreach.

    4. "Is the whole file or only 1st line is loaded?" open doesn't read anything from the file, it just opens it and provides a filehandle (which you've named F) that you then need to read from.

    5. I'd strongly recommend you use the more modern three-argument form of open and check it for errors, as well as use lexical filehandles since their scope is not global, as in open my $fh, '<', $file or die "$file: $!";.

    6. "is this correct to judge if there is still file to load?" Yes, while (<$filehandle>) is a good way to read a file line-by-line, and the loop will end when everything has been read from the file. You may want to use the more explicit form while (my $line = <$filehandle>), so that your variable has a name, instead of the default $_ variable - it does make the code a bit more verbose, but if you're just starting out that may be a good thing.

    7. match_replace(): You're not passing any parameters to the sub. Even though this code might still "work", it's passing the current line to the sub through the global $_ variable, which is not a good practice because it will be confusing and error-prone once the script starts getting longer.

    8. if (/^\sName\s+/){$name = $1;}: Since you've named the sub match_replace, I'm guessing you want to do a search-and-replace operation. In Perl, that's called s/search/replacement/, and you can read about it in perlrequick and perlretut. As for the code you've shown, you're using $1, but you don't have any "capture groups" ((...)) in your regular expression - you can read about that in those two links as well.

    9. "if I want to read the 5th line in downward, how to do that?" As always in Perl, There Is More Than One Way To Do It (TIMTOWTDI). One way is with the range operator ..: you can skip the first through fourth lines by saying next if 1..4; at the beginning of the while loop. When its operands are constant numbers, the range operator tests them against the special $. variable, which keeps track of the most recently read line number.

    10. "and if I would like to read multi lines in multi array[row], how to do that?" One way is to use push to add the current line to the end of an array. Since keeping the lines of a file in an array can use up more memory, especially with large files, I'd strongly recommend making sure you think through the algorithm you want to use here. You haven't explained why you would want to keep things in an array, so I can't be more specific here.
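
    Points 8 to 10 can be tried together without any files on disk. The following is only a minimal sketch under some assumptions: the sample data and the variable names ($data, @names, $renamed) are made up for illustration, and the in-memory filehandle simply stands in for a real file. It skips the first four lines via the range operator, uses a capture group to fill $1, collects the captured values with push, and shows a single s/search/replacement/ substitution:

    ```perl
    #!/usr/bin/env perl
    use warnings;
    use strict;

    # Hypothetical sample data standing in for one input file:
    # four header lines, then a mix of matching and non-matching lines.
    my $data = join '', map { "header $_\n" } 1 .. 4;
    $data .= " Name  alpha\n";
    $data .= "other line\n";
    $data .= " Name  beta\n";

    # Open a read-only filehandle on the in-memory string.
    open my $fh, '<', \$data or die "in-memory open: $!";

    my @names;    # captured values, one element per matching line
    while ( my $line = <$fh> ) {
        next if 1 .. 4;    # constant operands are compared against $.
        chomp $line;
        # (.*) is a capture group; whatever it matched ends up in $1.
        if ( $line =~ /^\sName\s+(.*)$/ ) {
            push @names, $1;
        }
    }
    close $fh;

    # s/search/replacement/ modifies the string it is bound to with =~.
    ( my $renamed = " Name  alpha" ) =~ s/^\sName\s+/Label: /;

    print "names: @names\n";
    print "renamed: $renamed\n";
    ```

    This should print names: alpha beta and renamed: Label: alpha. Whether the array is needed at all depends on what happens to the values afterwards; if each line can be handled on its own, there is no reason to store them.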

    So, having said all that, here's how I might have written that code. I've added some debugging code using Data::Dumper - it's always helpful to see the data that your script is working with.

    #!/usr/bin/env perl
    use warnings;
    use strict;
    use Data::Dumper; # for debugging
    $Data::Dumper::Useqq=1;
    
    for my $file (@ARGV) {
        print Dumper($file);  # debug
        open my $fh, '<', $file or die "$file: $!";
        while (my $line = <$fh>) {
            next if 1..4;
            chomp($line);  # remove line ending
            match_replace($line);
        }
        close $fh;
    }
    
    sub match_replace {
        my ($line) = @_;  # get argument(s) to sub
        my $name;
        if ( $line =~ /^\sName\s+(.*)$/ ) {
            $name = $1;
        }
        print Data::Dumper->Dump([$line,$name],['line','name']);  # debug
        # ... do more here ...
    }
    

    The above code explicitly loops over @ARGV and opens each file, and as I said above, more verbose code can be helpful in understanding what's going on. I just wanted to point out a nice feature of Perl, the "magic" <> operator (discussed in perlop under "I/O Operators"), which will automatically open the files in @ARGV and read lines from them. (There's just one small catch: if I want to use the $. variable and have it count the lines per file, I need the continue block I've shown below; this is explained in eof.) This would be a more "idiomatic" way of writing that first loop:

    while (<>) {  # reads line into $_
        next if 1..4;
        chomp;    # automatically uses $_ variable
        match_replace($_);
    } continue { close ARGV if eof }  # needed for $. (and range operator)
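
    To see what the continue block changes, here is a small self-contained sketch. It is an illustration only: the two temporary files, their contents and the @kept array are hypothetical, and @ARGV is filled by hand to mimic file names given on the command line. File::Temp is a core module, so nothing needs to be installed.

    ```perl
    #!/usr/bin/env perl
    use warnings;
    use strict;
    use File::Temp qw(tempfile);

    # Create two throwaway files with six lines each (a1..a6 and b1..b6).
    my @files;
    for my $tag (qw(a b)) {
        my ( $out, $path ) = tempfile( UNLINK => 1 );
        print {$out} "$tag$_\n" for 1 .. 6;
        close $out or die "$path: $!";
        push @files, $path;
    }

    # Pretend the two files were passed on the command line.
    @ARGV = @files;

    my @kept;    # lines that survive the skip, tagged with their line number
    while (<>) {
        next if 1 .. 4;    # skips lines 1-4 of each file, because $. is reset
        chomp;
        push @kept, "$.:$_";
    } continue { close ARGV if eof }    # explicit close resets $. per file

    print "@kept\n";
    ```

    With the continue block in place this should print 5:a5 6:a6 5:b5 6:b6, because $. starts again at 1 for the second file. Without it, $. keeps counting across files, so next if 1..4 would only skip the first four lines of the first file.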