Can I combine Unicode categories in a regex?

I want to match the following set of characters:

  1. use the \P{L} Unicode category as the base
  2. add the characters хХxXтТTоОoO0 to the \P{L} category
  3. exclude the characters -_.

This led me to the following regex in Java:

[[\P{L}]&&[^-_.]&&[хХxXтТTоОoO0]]

But it does not work. What is wrong?

1 answer

  • answered 2020-02-16 15:29 The fourth bird

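    According to this page, &&[хХxXтТTоОoO0] is an intersection. The pattern from the question therefore matches only characters that belong to all three classes at once, and of the listed characters only 0 is not a letter.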
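
The effect of chaining && can be checked directly. Below is a minimal sketch that runs the pattern from the question over a sample string. The class name `IntersectionDemo` and the helper `matches` are made up for illustration, and the Cyrillic letters are written as Unicode escapes only so that they cannot be confused with their Latin lookalikes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IntersectionDemo {
    // Pattern from the question: three classes chained with &&,
    // so a character must belong to all of them at once.
    // Cyrillic letters: U+0445/U+0425 (kha), U+0442/U+0422 (te), U+043E/U+041E (o).
    static final String ORIGINAL =
        "[[\\P{L}]&&[^-_.]&&[\u0445\u0425xX\u0442\u0422T\u043E\u041EoO0]]";

    // Hypothetical helper: collect every single-character match.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Only "0" is both a non-letter and a member of the listed set.
        System.out.println(matches(ORIGINAL, "aTo0-_.#$")); // prints [0]
    }
}
```

The letters T and o are rejected because they fail \P{L}, and # and $ are rejected because they are not in the listed set.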

    Instead, add хХxXтТTоОoO0 to the first character class (the backslash is doubled because this is a Java string literal): [\\P{L}хХxXтТTоОoO0]

    Then subtract -_. from that character class using &&[^-_.]:

    [[\\P{L}хХxXтТTоОoO0]&&[^-_.]] 
    

    Java demo

    Example

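    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Non-letters plus the listed characters, minus - _ and .
    final String regex = "[[\\P{L}хХxXтТTоОoO0]&&[^-_.]]";
    final String string = "aTo-_.#$";

    final Pattern pattern = Pattern.compile(regex);
    final Matcher matcher = pattern.matcher(string);

    // Print each matching character on its own line
    while (matcher.find()) {
        System.out.println(matcher.group(0));
    }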
    

    Output

    T
    o
    #
    $
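
The example input above contains only Latin characters. As a rough sketch, the same class can also be checked one character at a time, including a Cyrillic letter. The class name `FixedClassCheck` and the helper `allowed` are hypothetical, and the Cyrillic letters are again written as Unicode escapes, which is an assumption about how the listed set is encoded.

```java
import java.util.regex.Pattern;

public class FixedClassCheck {
    // Character class from the answer, with the Cyrillic letters escaped:
    // U+0445/U+0425 (kha), U+0442/U+0422 (te), U+043E/U+041E (o).
    static final Pattern FIXED = Pattern.compile(
        "[[\\P{L}\u0445\u0425xX\u0442\u0422T\u043E\u041EoO0]&&[^-_.]]");

    // Hypothetical helper: true if the one-character string is in the class.
    static boolean allowed(String ch) {
        return FIXED.matcher(ch).matches();
    }

    public static void main(String[] args) {
        System.out.println(allowed("\u0445")); // Cyrillic kha, listed: true
        System.out.println(allowed("x"));      // Latin x, listed: true
        System.out.println(allowed("a"));      // unlisted letter: false
        System.out.println(allowed("#"));      // non-letter: true
        System.out.println(allowed("-"));      // subtracted: false
    }
}
```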