Limit recursion on C++ target

While fuzzing a language made with antlr, the fuzzer reported a slow testcase that was using quite a lot of parens.

One of the rules in the grammar is somewhat like:

paren_expression: '(' expression ')';

Even if it was reported as a slow unit, it underlies the bigger problem of being able to somewhat easily crash the application with enough parens used (and it does on windows which has smaller stack size by default).

From what I searched, there's no option to generate code that checks the stack depth and exits after a reasonable depth, and recovering from stack overflow in C++ is not really a good or portable thing to do.

So, what can be done in this case? Crashing from bad input is not very nice.

2 answers

  • answered 2020-07-29 17:48 Bart Kiers

    You could add a predicate that checks how deep the nested expression is, and let the predicate fail if it exceeds a certain number.

    For example, you allow a maximum of 3 nested expressions, you could do that like this:

    grammar T;
    
    @members {
      private int depth = 0;
    }
    
    parse
     : expr EOF
     ;
    
    expr
     : '(' expr ')' {++depth <= 3}?
     | INT
     ;
    
    INT
     : [0-9]+
     ;
    

    The code:

    TLexer lexer = new TLexer(CharStreams.fromString("(((42)))"));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    parser.parse();
    

    will parse fine, but the code:

    TLexer lexer = new TLexer(CharStreams.fromString("((((42))))"));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    parser.parse();
    

    will throw an exception.

    The parts inside the predicate ({...}?) and inside the @members block are target specific code (Java, in this case). You'll have to write that in C++.

  • answered 2020-07-30 07:43 Mike Lischke

    This is not a bug, but a feature. Recursive decent parsers have no built-in measure to prevent a stack overflow. It could be you really want to parse deeply nested structures and you are able to throw enough memory at that problem. Why should the parser generator set an arbitrary recursion depth for you, if it doesn't know the problem you are trying to solve?

    Use the means of your OS to set a thread's stack depth and you will be able to parse deeply nested expressions. I have a test case with > 700 nested parentheses and that parses fine:

    enter image description here

    This output is from a macOS test app. In the XCode project I set the stack limit in the "Other Linker Flags" entry to: -Wl,-stack_size,0x10000000. Similar settings are available on other platforms/build tools. However, this is a setting for the main thread of your app. For secondary threads things get tricky, because C++ threads don't allow to specify a stack size during creation.