How to extract string between quotes, with a delimiter in Snowflake
I've got a bunch of fields which are double quoted with delimiters but for the life of me, I'm unable to get any regex to pull out what I need.
In short - the delimiters can be in any order and I just need the value that's between the double quotes after each delimiter. Some sample data is below, can anyone help with what regex might extract each value? I've tried
'delimiter_1=\\W+\\w+'
but I only seem to get the first word after the delimiter (unfortunately - they do have spaces in the value)
some content delimiter_1="some value" delimiter_2="some other value" delimiter_4="another value" delimiter_3="the last value"
1 answer
answered 2022-05-04 15:44
Gokhan Atil
The problem is that the regex function would have to return a varying number of values. For example, if you know that there will be 4 delimiters, then you can use REGEXP_SUBSTR for each match, but if the text has a varying number of delimiters, this approach doesn't work.
I think the best solution is to write a function to parse the text:
create or replace function superparser( SRC varchar )
returns array
language javascript
as
$$
  const regexp = /([^ =]*)="([^"]*)"/gm;
  const array = [...SRC.matchAll(regexp)]
  return array;
$$;
Then you can use LATERAL FLATTEN to process the values returned by the function:
select f.VALUE[1]::STRING key, f.VALUE[2]::STRING value
from values ('some content delimiter_1="some value" delimiter_2="some other value" delimiter_4="another value" delimiter_3="the last value"') tmp(x),
lateral flatten( superparser(x) ) f;

+-------------+------------------+
| KEY         | VALUE            |
+-------------+------------------+
| delimiter_1 | some value       |
| delimiter_2 | some other value |
| delimiter_4 | another value    |
| delimiter_3 | the last value   |
+-------------+------------------+
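If you want to check the regular expression outside Snowflake first, here is a minimal sketch in plain JavaScript (run with Node.js, for example). It uses the same pattern as the function above. The helper name parsePairs and the key/value object shape are only assumptions for illustration and are not part of the Snowflake function.

```javascript
// Same pattern as in superparser: a key made of non-space, non-"=" characters,
// followed by ="...". Group 1 is the key, group 2 is the quoted value.
const PAIR_PATTERN = /([^ =]*)="([^"]*)"/gm;

// parsePairs is a hypothetical helper name used only for this sketch.
function parsePairs(src) {
  // Each match m is [fullMatch, key, value]; keep just the two captured groups.
  return [...src.matchAll(PAIR_PATTERN)].map((m) => ({ key: m[1], value: m[2] }));
}

// Sample row taken from the question.
const sample =
  'some content delimiter_1="some value" delimiter_2="some other value" ' +
  'delimiter_4="another value" delimiter_3="the last value"';

const pairs = parsePairs(sample);
console.log(pairs);
```

In each match, index 0 is the whole matched text, which is why the query above reads f.VALUE[1] for the key and f.VALUE[2] for the value. Values containing spaces are captured whole because [^"]* consumes everything up to the closing quote, unlike \W+\w+, which stops after the first word.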