Select only matching files using multiple patterns in list.files

I have the following CSV files, and I want to select only those whose 'pop' and 'throughput' values match within the file name:

example_pop_high_throughput_high_strategy.csv
example_pop_high_throughput_base_strategy.csv
example_pop_high_throughput_low_strategy.csv
example_pop_base_throughput_high_strategy.csv
example_pop_base_throughput_base_strategy.csv
example_pop_base_throughput_low_strategy.csv
example_pop_low_throughput_high_strategy.csv
example_pop_low_throughput_base_strategy.csv
example_pop_low_throughput_low_strategy.csv

I want only these:

example_pop_high_throughput_high_strategy.csv
example_pop_base_throughput_base_strategy.csv
example_pop_low_throughput_low_strategy.csv

I can use list.files to select all files with, for example, 'high':

file_names <- list.files("made/up/path", pattern = c("high"))

However, passing the pattern twice to match 'high' and 'high' didn't work:

file_names <- list.files("made/up/path", pattern = c("high", "high"))

Is there a way to select the files with matching 'pop' and 'throughput' values, preferably in a single expression?

2 answers

  • answered 2017-08-21 14:48 Gurman

    Try this regex:

    ^.*?pop_([^_]+)_throughput_\1.*$
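
    To use this pattern from R, the backslash in the backreference has to be doubled inside a string literal. A minimal sketch, assuming the nine file names from the question are held in a character vector (`files`, `pattern` and `matched` are hypothetical names) rather than read from disk:

    ```r
    # File names from the question, kept in a vector so the example runs without a directory
    files <- c(
      "example_pop_high_throughput_high_strategy.csv",
      "example_pop_high_throughput_base_strategy.csv",
      "example_pop_high_throughput_low_strategy.csv",
      "example_pop_base_throughput_high_strategy.csv",
      "example_pop_base_throughput_base_strategy.csv",
      "example_pop_base_throughput_low_strategy.csv",
      "example_pop_low_throughput_high_strategy.csv",
      "example_pop_low_throughput_base_strategy.csv",
      "example_pop_low_throughput_low_strategy.csv"
    )

    # \1 is written as \\1 inside an R string literal
    pattern <- "^.*?pop_([^_]+)_throughput_\\1.*$"

    # Keep only the names where the captured 'pop' value repeats after 'throughput_'
    matched <- grep(pattern, files, value = TRUE)
    print(matched)
    ```

    The same string should also work as the `pattern` argument of `list.files`, since that argument takes a regular expression.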
    


  • answered 2017-08-21 14:49 martin_joerg

    The following should work:

    file_names <- list.files("made/up/path", pattern = c("(low|base|high).+\\1"))
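
    The `pattern` argument is documented as a single regular expression, which is why the alternation and the backreference are combined into one string here. A quick way to check the pattern against the names from the question without touching the file system is sketched below; the names `lvls`, `combos`, `files` and `keep` are hypothetical, and the file names are rebuilt from the three levels as an assumption about how they were generated:

    ```r
    # Rebuild the nine file names from the question (throughput varies fastest)
    lvls <- c("high", "base", "low")
    combos <- expand.grid(throughput = lvls, pop = lvls, stringsAsFactors = FALSE)
    files <- sprintf("example_pop_%s_throughput_%s_strategy.csv",
                     combos$pop, combos$throughput)

    # Same pattern as above: capture a level, then require it again later in the name
    pattern <- "(low|base|high).+\\1"

    # Logical vector marking the names where a level appears twice
    keep <- grepl(pattern, files)
    print(files[keep])
    ```

    Note that this pattern is not tied to the positions of 'pop' and 'throughput'; it relies on the words low, base and high appearing nowhere else in the file names.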