Can I combine Unicode categories in a regex?

I want to match the following set of characters:

  1. use the \P{L} Unicode category (any non-letter) as the base
  2. add the characters хХxXтТTоОoO0 to the \P{L} category
  3. exclude the characters -_.

Following that, I came up with this regex in Java:


But this is not working. What's wrong?

1 answer

  • answered 2020-02-16 15:29 The fourth bird

    According to this page, using &&[хХxXтТTоОoO0] means an intersection, so only characters that are in both sets can match.

    Instead, you could add хХxXтТTоОoO0 to the first character class to form a union: [\\P{L}хХxXтТTоОoO0]

    Then subtract the unwanted characters from that character class using &&[^-_.]
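
    To make the difference between intersection and union-plus-subtraction concrete, here is a minimal sketch. The INTERSECTION pattern is only an assumption about what the question's regex looked like (the original is not shown), and the class name CharClassDemo and the helper collect are hypothetical names used for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // Assumed shape of the failing attempt: "&&" intersects the two sets, so only
    // characters that are BOTH non-letters AND in the list can match (just '0').
    static final String INTERSECTION = "[\\P{L}&&[хХxXтТTоОoO0]]";

    // Union: list the extra characters inside the same class as \P{L}.
    // Subtraction: intersect that union with the negated class [^-_.].
    static final String UNION_MINUS = "[[\\P{L}хХxXтТTоОoO0]&&[^-_.]]";

    // Concatenates every single-character match of regex found in input.
    static String collect(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            out.append(matcher.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "aTo0-_.#$";
        System.out.println(collect(INTERSECTION, input)); // prints "0"
        System.out.println(collect(UNION_MINUS, input));  // prints "To0#$"
    }
}
```

    With the intersection, only 0 survives because it is the one listed character that is not a letter. With the union and subtraction, the listed letters T and o match alongside the non-letters, while a (a letter not in the list) and -_. are left out.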


    Java demo


    final String regex = "[[\\P{L}хХxXтТTоОoO0]&&[^-_.]]";
    final String string = "aTo-_.#$";
    final Pattern pattern = Pattern.compile(regex);
    final Matcher matcher = pattern.matcher(string);
    while (matcher.find()) {