These notes are based on Bob Nystorm’s article on Pratt parsing and an analysis on the sample parser he presents in bantam.
Traditional [top-down parsers](https://en.wikipedia.org/wiki/Top-down_parsing#:~:text=Top-down parsing in computer,a top-down parsing strategy.) rely on [recursive descent](https://en.wikipedia.org/wiki/Recursive_descent_parser#:~:text=In computer science%2C a recursive,the nonterminals of the grammar.) to generate an Abstract Syntax Tree (AST) of a given expression. However, these models tend to lack in flexibility when dealing with expressions that involve operator precedence.
To address this, Pratt parsers introduce the concept of binding power, an informal term by Aleksey Kladov, which gives the parser information about the relative precedence of an operator and the upcoming operator. This allows the parser to honor the precedence of each operator which parsing increasingly complex expressions.
These notes analyse the way precedence comparison works and explains how a general compiler can be built on top of the parser. It assumes that a lexer has already properly tokenised the expression.
1 + 2 * 3 = 1 + (2 * 3)
1 * 4 ^ 7 - 2 = (1 * (4 ^ 7)) - 2
Operator precedence is used to decide which operation to perform first in an expression. As shown above, ^
has a greater precedence than *
which has a greater precedence than +
, hence the grouping of terms.
To honor operator precedence, Pratt parsers maintain two values of precedence per operator. In our case, we can think of the two values as indicating the importance of the sub-expression to the left and to the right of the operator.
Expression: 1 + 2 * 3
Precedence: 0 1
In layman terms, when evaluating the above expression, we assign the start of the expression with a precedence of 0
.
<aside> 💡 The starting precedence must be the lowest possible value among our precedence table, otherwise we will not be able to proceed with the parsing.
</aside>
We then look towards the next operator in this expression +
and check for its precedence 1
. Since 1 > 0
, we will create a sub-expression of 1 +
, which holds the precedence of 1
.