Pandas: extract text between multiple start words and multiple stop words

Following on from "Pandas DataFrame extract between one START word and multiple STOP words", is it possible to extend the solution to multiple start words, too? The example below is only illustrative:

df 
   
0   start_word1  text1 end_word1
1   start_word2  text2 end_word2

Expected output

df 
   
0   text1 
1   text2 

1 answer

  • answered 2021-07-27 08:45 mozway

    You can use non-capturing groups to list the alternative start and stop words, with a single capturing group for the text in between:

    df['COLUMN_NAME'].str.extract(r'(?:start_word1|start_word2)\s+(.*)\s+(?:end_word1|end_word2)')
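
For longer word lists, the pattern can be built programmatically. The following is a minimal sketch, not a definitive implementation: the sample data, the `start_words` / `stop_words` lists and the variable names are assumptions made for illustration. It also swaps the greedy `.*` for a lazy `.*?`, so that the match ends at the first stop word rather than the last one in the string.

```python
import re
import pandas as pd

# Hypothetical sample data mirroring the question; the column name is assumed.
df = pd.DataFrame({
    "COLUMN_NAME": [
        "start_word1  text1 end_word1",
        "start_word2  text2 end_word2",
    ]
})

# Assumed word lists; replace with the real start and stop words.
start_words = ["start_word1", "start_word2"]
stop_words = ["end_word1", "end_word2"]

# re.escape guards against words that contain regex metacharacters.
start_alt = "|".join(map(re.escape, start_words))
stop_alt = "|".join(map(re.escape, stop_words))

# Non-capturing groups hold the alternatives; the single lazy capturing
# group holds the text in between.
pattern = rf"(?:{start_alt})\s+(.*?)\s+(?:{stop_alt})"

# expand=False returns a Series, since the pattern has exactly one group.
extracted = df["COLUMN_NAME"].str.extract(pattern, expand=False)
print(extracted.tolist())
```

Rows that contain no start/stop pair come back as `NaN`, so a `dropna()` or `fillna('')` may be needed afterwards, depending on the data.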
    
