Python Regex Extract Width x Depth x Height

I am trying to extract the physical dimensions of items from a "Description" column in a DataFrame and put them in a new column.

Dimensions usually appear in this format (120x80x100) in the middle of long descriptions like:

Lorem ipsum dolor sit amet, consectetur adipiscing elit 120x80x100 ed do eiusmod tempor...

But sometimes they have spaces in between:

120 x 80 x 100

Or they don't have a height:

120 x 80

Any help? Thanks in advance

3 answers

  • answered 2021-07-27 15:32 TYZ

    Something like this should work:


  • answered 2021-07-27 15:32 Tim Biegeleisen

    We can try using a re.findall approach with a regex pattern covering all possible dimension formats:

    import re

    inp = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit 120x80x100 ed do 120 x 80 x 100 eiusmod 120x80 tempor...'
    # A number followed by one or two "x <number>" repeats, with optional spaces.
    dims = re.findall(r'\d+(?:\s*x\s*\d+){1,2}', inp)
    print(dims)  # ['120x80x100', '120 x 80 x 100', '120x80']
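
    Since the question is about building a new column, here is a minimal sketch of how the same pattern could be applied to one description at a time. The helper name extract_dims and the sample descriptions are made up for illustration, and stripping the spaces is an assumption about the desired output, not something the question asks for.

```python
import re

# Same pattern as in the answer: a number, then one or two "x <number>" repeats.
DIM_RE = re.compile(r'\d+(?:\s*x\s*\d+){1,2}')

def extract_dims(description):
    """Return the first dimension string with spaces removed, or None."""
    match = DIM_RE.search(description)
    if match is None:
        return None
    # Drop optional whitespace so "120 x 80" and "120x80" come out the same.
    return re.sub(r'\s+', '', match.group(0))

# Hypothetical sample data standing in for the "Description" column.
descriptions = [
    'Lorem ipsum 120x80x100 dolor',
    'sit amet 120 x 80 x 100 elit',
    'tempor 120 x 80',
    'no dimensions here',
]
dims_column = [extract_dims(d) for d in descriptions]
print(dims_column)  # ['120x80x100', '120x80x100', '120x80', None]
```

    With pandas, something like df["Description"].apply(extract_dims) should give the same values as a new column, assuming the column holds plain strings.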

  • answered 2021-07-27 15:36 Arvind Kumar Avinash

    You can use the regex, \d+\s*x\s*\d+(?:\s*x\s*\d+)?


    • \d+: One or more digits
    • \s*: Zero or more whitespace characters
    • x: Literal, x
    • (?:\s*x\s*\d+)?: Optional non-capturing group
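
    The breakdown above can also be written into the pattern itself with re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is only a sketch; the sample text is made up.

```python
import re

DIM_RE = re.compile(r"""
    \d+            # one or more digits (first dimension)
    \s*x\s*        # literal x with optional whitespace on either side
    \d+            # one or more digits (second dimension)
    (?:            # optional non-capturing group for a third dimension
        \s*x\s*
        \d+
    )?
""", re.VERBOSE)

text = 'elit 120x80x100 ed do 120 x 80 eiusmod'
matches = DIM_RE.findall(text)
print(matches)  # ['120x80x100', '120 x 80']
```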

    If you want the numbers to be of one to three digits, replace \d+ with \d{1,3} as shown in the regex, \d{1,3}\s*x\s*\d{1,3}(?:\s*x\s*\d{1,3})?.
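
    One thing to watch with the \d{1,3} version: on its own it can still match the tail of a longer number. The sketch below shows that, and one possible way around it using digit lookarounds. The lookarounds are an addition of this example, not part of the regex above.

```python
import re

# The three-digit pattern exactly as given in the answer.
LOOSE = re.compile(r'\d{1,3}\s*x\s*\d{1,3}(?:\s*x\s*\d{1,3})?')

# Assumption: (?<!\d) and (?!\d) are added so a match cannot start or end
# in the middle of a longer run of digits.
STRICT = re.compile(r'(?<!\d)\d{1,3}\s*x\s*\d{1,3}(?:\s*x\s*\d{1,3})?(?!\d)')

text = 'size 1200x80 and 120x80x100'
loose_matches = LOOSE.findall(text)
strict_matches = STRICT.findall(text)
print(loose_matches)   # ['200x80', '120x80x100']
print(strict_matches)  # ['120x80x100']
```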

    If your code requires you to use a group, do it as follows:
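
    A minimal sketch of a grouped version, not necessarily the answerer's original code. The names parse_dims, width, depth and height are chosen here for illustration, following the order in the question title.

```python
import re

# Capturing groups for the first two numbers; the third is optional.
DIM_RE = re.compile(r'(\d+)\s*x\s*(\d+)(?:\s*x\s*(\d+))?')

def parse_dims(text):
    """Return (width, depth, height) as ints; height is None when absent."""
    match = DIM_RE.search(text)
    if match is None:
        return None
    width, depth, height = match.groups()
    # An unmatched optional group comes back as None.
    height_value = int(height) if height is not None else None
    return int(width), int(depth), height_value

print(parse_dims('elit 120 x 80 x 100 ed do'))  # (120, 80, 100)
print(parse_dims('tempor 120x80'))              # (120, 80, None)
```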


