Workaround for Impala Regex lookahead and lookbehind
If I use Hive, the below works fine. But if I use Impala, it throws error:
select regexp_replace("foobarbarfoo","bar(?=bar)","<NA>"); WARNINGS: Could not compile regexp pattern: bar(?=bar) Error: invalid perl operator: (?=
Basically, Impala doesn't support lookahead and lookbehind
Is there a workaround for this today? Maybe use UDF?
Since you are using
regexp_replace, match and capture the part of string you want to keep (but want to use as must-have context) and replace with a backreference. See the
These examples show how you can replace parts of a string matching a pattern with replacement text, which can include backreferences to any
()groups in the pattern string. The backreference numbers start at 1, and any
\characters must be escaped as
So, here, you may use
select regexp_replace("foobarbarfoo","bar(bar)","<NA>\\1"); ^ ^ ^^^
Note it will not work to replace consecutive matches, however, it will work in the current scenario and
foobarbarfoowill turn into
foo<NA>barfoo(note that Go regex engine is also RE2, hence this option is chosen at regex101.com).