Earley Parser
Earley Parser
May 30This is the first post in what will hopefully be a series about how to build an Earley parser. I've been struggling to hammer the grammar for a language I'm working on into a form suitable for use with a PEG (parsing expression grammar) parser. Over the past two weekends I was even trying to add support for handling left-recursive grammar rules into Citrus (a PEG parser), but as it turns out, the currently known methods for enabling left-recursion are not powerful enough to recognize all forms of left-recursive grammars. I officially gave up trying to use a PEG parser for my grammar, it simply isn't flexible enough. I've decided to go whole hog, and start trying to implement a parsing method that I know is flexible enough to recognize the entire class of context free grammars; the Earley parsing algorithm will do just that.
If you start looking for resources online that explain how the Earley's algorithm works, you quickly find that there isn't much practically useful information available for free. I found that to be the case when I was investigating whether to use Earley's algorithm or a similarly powerful parsing technique called SGLR parsing. The papers explaining these parsing techniques are all written by professors and researchers whose sole purpose in life is to get published. As a result, they submit their papers to publishers who then agree to include them into their publication but require the author to assign all distribution rights to the publisher. The publisher is then the only one who can publish the paper, and they do so, for $30 per paper! Don't even get me started… In any case, all the good papers about SGLR are behind pay-walls.
Papers about Earley's algorithm are a little more accessible, but the really nice thing about Earley's algorithm is that it is covered in detail in a book called Parsing Techniques: a practical guide. I will be building my parser based on the book's description of Earley's algorithm. Another really nice thing about Earley's algorithm is that it's supposed to be easier to implement than an SGLR parser. We'll see.
If you're interested, section 7.2 of the first edition of Parsing Techniques: a practical guide is where Earley parsers are covered.