How to extract just part of this string in C?

I have a version file I need to parse to get certain versions in C99. For example purposes, say one of the strings looks like this:

FILE: EXAMPLE ABC123459876-001 REV 1.IMG

The digits shown as 12345 can be arbitrary, but they are always followed by 4 digits, a hyphen, a rev, and an extension. I just want to return the middle of this string, that is, the file name + main version: "EXAMPLE 9876-001 REV 1". I got it to work in the regex101 online tester with something like:

"(?<=EXAMPLE ABC.....)(....-... REV .)(?=.IMG)"

... but the POSIX regex functions (regex.h) I have available in C99 do not support lookahead / lookbehind operators, so this does not work for me. Should I be using strstr() or strtok() instead? Just looking for some ideas as to the best way to do this in C, thanks.

4 answers

  • answered 2017-11-14 23:48 deiga

    Do you really need regex for this? Could you not just split this string into substrings and work with that?

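    1. Remove the extension by finding the dot with strchr
    2. Take the file name as a substring
    3. Use a regex to get the rest with ([0-9]{4}-.*$); the hyphen makes the match start at the 4 digits in front of it rather than at the first run of 4 digits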
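
The three steps above could be sketched roughly like this. It is only an illustration, not tested against a real version file: `extract_by_steps` is a made-up helper name, the 256-byte work buffer is an arbitrary assumption, and `regcomp`/`regexec` come from POSIX `<regex.h>` rather than from C99 itself. The pattern requires the hyphen right after the four digits, so the match begins at `9876`.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes e.g. "EXAMPLE 9876-001 REV 1" into out.
   Returns 0 on success, -1 if the line does not have the expected shape. */
int extract_by_steps(const char *line, char *out, size_t out_size)
{
    assert(line != NULL && out != NULL);

    char work[256];                      /* assumed maximum line length */
    if (strlen(line) >= sizeof work)
        return -1;
    strcpy(work, line);

    /* Step 1: drop the extension by cutting at the first dot. */
    char *dot = strchr(work, '.');
    if (dot == NULL)
        return -1;
    *dot = '\0';

    /* Step 2: the file name is the word right after the "FILE: " prefix. */
    const char *prefix = "FILE: ";
    if (strncmp(work, prefix, strlen(prefix)) != 0)
        return -1;
    char *name = work + strlen(prefix);
    char *name_end = strchr(name, ' ');
    if (name_end == NULL)
        return -1;
    *name_end = '\0';
    const char *rest = name_end + 1;     /* e.g. "ABC123459876-001 REV 1" */

    /* Step 3: POSIX extended regex for "four digits, a hyphen, the rest". */
    regex_t re;
    regmatch_t m[1];
    if (regcomp(&re, "[0-9]{4}-.*$", REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, rest, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    int n = snprintf(out, out_size, "%s %s", name, rest + m[0].rm_so);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Called with the line from the question and a `char out[64]`, this should leave `EXAMPLE 9876-001 REV 1` in `out`.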

  • answered 2017-11-14 23:51 SourceOverflow

    So you want everything except the FILE: prefix and the file extension? Since FILE: looks static, this regex should work:

    FILE: ([^.]*)\..*
    

    You can then get that group from the match array that regexec fills in
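
As a rough sketch of reading groups back from regexec: the text that a lookbehind or lookahead would have skipped can simply be matched outside the parentheses, and only the groups are copied out. The helper name `extract_with_groups` and the exact pattern are my own assumptions based on the example line in the question (the pattern also skips the `ABC12345` part, which the regex above keeps), and `<regex.h>` is POSIX rather than part of C99.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 0 and fills out on success, -1 otherwise. */
int extract_with_groups(const char *line, char *out, size_t out_size)
{
    assert(line != NULL && out != NULL);

    /* Group 1: the file name.
       Group 2: four digits, a hyphen, digits, " REV ", digits.
       Everything else is matched but not captured. */
    static const char pattern[] =
        "^FILE: ([^ ]+) [^ ]*([0-9]{4}-[0-9]+ REV [0-9]+)\\.";

    regex_t re;
    regmatch_t m[3];   /* m[0] is the whole match, m[1] and m[2] the groups */

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, line, 3, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    /* rm_so and rm_eo are byte offsets into line for each group. */
    int n = snprintf(out, out_size, "%.*s %.*s",
                     (int)(m[1].rm_eo - m[1].rm_so), line + m[1].rm_so,
                     (int)(m[2].rm_eo - m[2].rm_so), line + m[2].rm_so);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

For the line in the question this should produce `EXAMPLE 9876-001 REV 1`. REG_EXTENDED is needed because `+` and `{4}` are not available in that form in basic POSIX regexes.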

  • answered 2017-11-15 00:02 user3121023

    Try sscanf, strcspn and memmove.

    #include <stdio.h>
    #include <string.h>
    
    int main( void) {
        char line[] = "FILE: EXAMPLE ABC123459876-001 REV 1.IMG";
        char subline[100] = "";
        size_t space = 0;
        size_t dash = 0;
    
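        // copy everything between "FILE: " and the first '.' into subline
        if ( 1 == sscanf ( line, "FILE: %99[^.]", subline)) {
            // index just past the first space, i.e. the end of "EXAMPLE "
            space = strcspn ( subline, " ");
            space++;
            // index of the hyphen; the 4 digits to keep sit directly before it
            dash = strcspn ( subline, "-");
            // dash >= 4 guards against size_t wrap-around on unexpected input
            if ( dash >= 4 && space < dash - 4) {
                dash -= 4;
                // close the gap, moving the terminating '\0' as well
                memmove ( &subline[space], &subline[dash], strlen ( &subline[dash]) + 1);
            }
            printf ( "%s\n", subline);
        }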
        return 0;
    }
    

    output

    EXAMPLE 9876-001 REV 1
    

  • answered 2017-11-15 00:07 Nathan Owen

    Simplest way would probably be to use sscanf, but it does risk buffer overflow unless you give each %[ conversion a maximum field width (or make sure your buffers are longer than any input line you can receive, e.g. the max file path length on the system).

    Try something like this (code not tested):

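    #include <stdio.h>
    #include <string.h>

    #define MAX_PATH_LEN 255   // assumed limit; keep in sync with the %255 widths below

    int ret;
    size_t len;
    char sequence_num_prefix[ MAX_PATH_LEN + 1 ] = {0};
    char sequence_num_postfix[ MAX_PATH_LEN + 1 ] = {0};
    char version_num[ MAX_PATH_LEN + 1 ] = {0};
    char my_name[ MAX_PATH_LEN + 1 ] = {0};
    
    // input_path_buf is assumed to hold the whole line, including "FILE: ";
    // the field widths stop sscanf from writing past the end of each buffer
    ret = sscanf( input_path_buf, "FILE: EXAMPLE ABC%255[0-9]-%255[0-9] REV %255[0-9]", 
                  sequence_num_prefix, sequence_num_postfix, version_num);
    len = strlen( sequence_num_prefix );
    
    if( ret != 3 || len < 4 )
    {
        //error
    }
    else
    {
        // the first %[0-9] also swallows the arbitrary digits, so keep only the last 4
        snprintf( my_name, sizeof( my_name ), "EXAMPLE %s-%s REV %s", 
                  sequence_num_prefix + len - 4, sequence_num_postfix, version_num );
    }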
    

    Of course a safer way would be to walk the string yourself with while loops, checking every character, or, for cleanliness, to generate a parser with Bison.
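
A minimal sketch of the hand-written loop approach, assuming the line always has the shape shown in the question ("FILE: ", a name, a space, some characters, 4 digits, a hyphen, the rest, a dot, the extension). The name `extract_by_scanning` is invented for this example, and the code has only been checked against the sample line.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 0 and fills out on success, -1 otherwise. */
int extract_by_scanning(const char *line, char *out, size_t out_size)
{
    assert(line != NULL && out != NULL);

    const char *prefix = "FILE: ";
    if (strncmp(line, prefix, strlen(prefix)) != 0)
        return -1;

    /* The file name runs from the end of the prefix up to the next space. */
    const char *name = line + strlen(prefix);
    const char *p = name;
    while (*p != '\0' && *p != ' ')
        p++;
    if (*p != ' ' || p == name)
        return -1;
    size_t name_len = (size_t)(p - name);

    /* Advance to the hyphen, then step back over the digits before it. */
    const char *hyphen = p;
    while (*hyphen != '\0' && *hyphen != '-')
        hyphen++;
    if (*hyphen != '-')
        return -1;
    const char *ver = hyphen;
    while (ver > p + 1 && hyphen - ver < 4 && isdigit((unsigned char)ver[-1]))
        ver--;
    if (hyphen - ver != 4)
        return -1;                       /* fewer than 4 digits before '-' */

    /* The version text ends at the dot that starts the extension. */
    const char *end = hyphen;
    while (*end != '\0' && *end != '.')
        end++;
    if (*end != '.')
        return -1;

    int n = snprintf(out, out_size, "%.*s %.*s",
                     (int)name_len, name, (int)(end - ver), ver);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Every pointer is checked before it is dereferenced or moved, so unexpected input ends in a -1 return instead of a read or write outside the buffers. For the line in the question the result should be `EXAMPLE 9876-001 REV 1`.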