preface

In the world of functional programming, abstraction and composition often go hand in hand: several fine-grained abstractions are combined in a particular way to form higher-grained abstractions, which can then be combined again and progressively, raising the level of code abstraction step by step. The charm of functional programming that I feel in engineering development is also reflected in its powerful abstraction ability.

Parser parses input and produces results. For example, the regular expression engine can parse strings matching input, and JSONSerialization can help iOS/Mac developers parse JSON into an Objective-C Dictionary. Parser Combinator is an implementation of Parser, which is elegant and based on the idea of functional programming. Using it, we can write parsing logic very easily, and the code is concise and easy to understand. The design of Parser Combinator fully reflects the idea of combination. By using its various combinators, we can combine several fine-grained parsers together to get coarse-grained Parser. After that, the combination can continue and the degree of abstraction is gradually improved.

In this article, I’ll write a simple Parser using Swift to build a lightweight Parser Combinator library step by step. In this article, I hope readers can deeply feel the charm of functional programming with me, and deepen the understanding of its related concepts, so that they can write more elegant and abstract functional code in the future.

This article covers some of the concepts of functional programming. If you are not familiar with this part of the article, check out my previous article on this topic:

Functional programming – Functor, Monad, Applicative in Swift

Functional programming – Integrate Monad into Swift

Functional Programming – Cool Applicative

The Parser Combinator library code implemented in this article has been open source on Github for your reference: TangentW/Parsec

Because my technical level is limited, if the article exists fallacy, but also hope you correct.

model

Before implementing Parser Combinator, let’s look at its basic abstract model.

Basic model

Parsing consumes input, in this case a string, and produces results. As shown in the figure below, each cell represents an element in the input string, and the type is a character. The Parser consumes several elements for parsing and produces results. In addition, the Parser outputs an additional state: the remaining input string that is parsed for subsequent Parser processing.

From the description above, we can use Swift to indicate the type of Parser:

typealias Parser<Value> = (String) - >Result< (Value.String), Error>
Copy the code

Parser is defined here as a function type, and since it produces variable output, Value generics are used to refer to the type of the result. The function takes an input parameter of type String and returns Result. If the parse succeeds, Result loads a binary that represents the Result of the parse and the rest of the input string. When parsing fails, the error message is also returned via the Result load.

For example, suppose we have a Parser that parses the longest number prefix in a string:

typealias Parser = (String) - >Result< (Int.String), Error>
Copy the code

When the string “123abc” is input, the Parser outputs.success((123, “ABC “)). Here the longest prefix is parsed out and converted to an Int, which is returned as a result, along with the rest of the parsed input string.

The optimization model

The model described above follows the typical data immutable + pure function properties of functional programming, that is, there are no mutable variables and functions have only unique outputs for the same inputs. But it is too hard for Swift: sometimes “variable” can bring more convenient, on the other hand, if, in accordance with requirements of the above model, after the Parser in parsing also need to return the rest of the input String, then each instance of String parsing will be constructed, so obviously is not a good practice for performance. Therefore, when implementing Parser using Swift in practice, we need to optimize the model with “Swift features “:

typealias Parser<Value> = (Context) - >Result<String.Error>
Copy the code

This model will no longer adhere to the data immutable + pure function property, but is well suited to Swift. In this case, the Context is mutable, and it records where the current input string was resolved to, instead of returning the rest of the input string directly in the old model.

The following sections will detail the concept of Context in the model and the specific implementation of Parser Combinator.

implementation

Let’s implement the lightweight Parser Combinator library using Swift.

Context

In the “basic model” mentioned at the beginning, Parser is a function type whose argument is an input String. When parsed, it returns a result value along with the rest of the input String. In the optimization model, the parsing function takes Context as its input parameter, and only the resulting value is returned after parsing. The reason for this optimization is that we don’t have to return the rest of the input string every time we parse, we just need to keep a record of where the input string is currently parsed, and here the Context takes care of that, so it’s mutable:

public final class Context {
    
    public typealias Stream = String
    public typealias Element = Stream.Element
    public typealias Index = Stream.Index
    
    public let stream: Stream
    
    public init(stream: Stream) {
        self.stream = stream
        _cursor = stream.startIndex
    }
    
    private var _cursor: Index
}
Copy the code

In the code above, Context records the stream input string and the position that the current input string is parsed to _CURSOR (private, so it’s named with an underscore). The type of the position is string Index and the initial value is startIndex at the beginning of the string.

Context also provides external consumption methods:

// MARK: - Iterator

extension Context: IteratorProtocol {
    
    public func next(a) -> Element? {
        let range = stream.startIndex..<stream.endIndex
        guard range.contains(_cursor) else {
            return nil
        }
        defer {
            stream.formIndex(after: &_cursor)
        }
        return stream[_cursor]
    }
}
Copy the code

Here we have Context implement IteratorProtocol, and every time we call next(), the Context will consume and return an element (character) in the input string by stepping _cursor, and return nil when the input string has been completely consumed before that.

Error

Error handling is essential in parsing, and when parsing fails, we can clearly understand the cause of the failure through the error message. To do this we need to define the types of errors:

public struct Error: Swift.Error {
    
    public let stream: Context.Stream
    public let position: Context.Index
    public let message: String

    public init(stream: Context.Stream.position: Context.Index.message: String) {
        self.stream = stream
        self.position = position
        self.message = message
    }
}
Copy the code

Error records the input string, the position to which the input string is currently parsed, and the failure information expressed as a string. When parsing fails, we can get an instance of Error from Result returned by Parser to know the cause of the failure and the current parsing location.

To make it easier to throw an error, we can extend the Context by adding code to throw an error:

// MARK: - Error

public extension Context {
    
    func `throw`<T> (_ errorMessage: String) -> Result<T.Error> {
        .failure(error(with: errorMessage))
    }
    
    func error(with message: Stream) -> Error{.init(stream: stream, position: _cursor, message: message)
    }
}
Copy the code

We build the Error from the data inside the Context so that the Context can easily throw errors out.

Parser

The Parser we mentioned in the previous model section is actually a function type:

typealias Parser<Value> = (Context) - >Result<String.Error>
Copy the code

However, to facilitate subsequent Combinator expansion and improve the readability of the code, we define the Parser as a struct type and wrap the function directly inside it:

public struct Parser<Value> {

    public typealias Result = Swift.Result<Value.Error>

    public let parse: (Context) - >Result
    
    public init(_ parse: @escaping (Context) - >Result) {
        self.parse = parse
    }
}
Copy the code

Next we extend a handy parsing method for Parser:

public extension Parser {
    
    func parse(_ stream: Context.Stream) -> Result {
        parse(.init(stream: stream))
    }
}
Copy the code

The method parse has the same name as the function wrapped in Parser, but its input arguments are different: the input arguments are directly input strings. Inside the method, we construct the Context and call the function wrapped in Parser. Using this method, we can enter the string directly to get the parsed result.

The first tastes Parser

OK, so far we have basically defined the core data structures of the Parser Combinator library. Let’s try it out:

public let element = Parser<Context.Element> {
    if let element = $0.next() {
        return .success(element)
    } else {
        return $0.throw("unexpected end of stream")}}public let endOfStream = Parser< > () {if $0.next() = = nil {
        return .success(())
    } else {
        return $0.throw("expecting end of stream")}}Copy the code

We define two parsers above:

  • element: Passes during parsingContextConsumes an input string element (character), returns an error if the input string has been consumed before.
  • endOfStream: withelementIt’s going to passContextConsumes the input string, but it expects the input string to have been consumed before, otherwise returns an error.

Let’s input a string and try running both parsers:

let inputOne = "hello world!"
let inputTwo = ""

let inputOneResultOfElement = element.parse(inputOne)
let inputTwoResultOfElement = element.parse(inputTwo)

let inputOneResultOfEndOfStream = endOfStream.parse(inputOne)
let inputTwoResultOfEndOfStream = endOfStream.parse(inputTwo)
Copy the code

Output:

inputOneResultOfElement: success("h") inputTwoResultOfElement: failure(Error(stream: "", position: Swift.String.Index(_rawBits: 1), message: "unexpected end of stream")) inputOneResultOfEndOfStream: failure(Error(stream: "hello world!" , position: Swift.String.Index(_rawBits: 65793), message: "expecting end of stream")) inputTwoResultOfEndOfStream: success()Copy the code

Yes, the two parsers parse the input as expected.

Combinator

It is mentioned in the preface that the design of Parser Combinator fully reflects the idea of combination. The Parser is continuously combined with Combinator, increasing the level of abstraction. To do this, we need to define various combinators.

Monad

Here we use the static method just to refer to Monad’s return function and flatMap to refer to bind (>>=) :

public extension Parser {
    
    static func just(_ value: Value) -> Parser<Value> {.init { _ in .success(value) }
    }
    
    func flatMap<O> (_ transform: @escaping (Value) - >Parser<O>) -> Parser<O> {.init {
            switch self.parse($0) {
            case .success(let value):
                return transform(value).parse($0)
            case .failure(let error):
                return .failure(error)
            }
        }
    }
}
Copy the code
  • The Parser returned by the Just method does not consume any input strings. It simply wraps the input parameters of just into Result and returns them as parsed results.

  • In flatMap, we will first run the parsing logic of the current Parser. If the parsing is successful, the parsed result will be passed into the closure parameter transform of flatMap as a parameter. The closure call returns a new Parser, and the parsing logic of the new Parser continues. If the current Parser fails, an error message is returned directly from flatMap.

Just and flatMap are mainly used to map the result after successful parsing. For the failure of parsing, we can also write the corresponding Combinator:

public extension Parser {
    
    static func error<T> (_ error: Error) -> Parser<T> {.init { _ in .failure(error) }
    }
    
    static func `throw`(_ errorMessage: String) -> Parser<Value> {.init { $0.throw(errorMessage) }
    }
    
    func flatMapError(_ transform: @escaping (Error) - >Parser<Value>) -> Parser<Value> {.init {
            switch self.parse($0) {
            case .success(let value):
                return .success(value)
            case .failure(let error):
                return transform(error).parse($0)}}}}Copy the code

This code has the same idea as above, so I won’t go into details.


Functor

With Monad done, we can easily implement Functor, the core of which is the Map method:

public extension Parser {
    
    func map<O> (_ transform: @escaping (Value) - >O) -> Parser<O> {
        flatMap {
            .just(transform($0))}}}Copy the code

With the help of Monad’s flatMap and just methods, a map can be implemented this easily.

What the map method does is map the result of the current Parser, similar to how Swift maps arrays: let arr = [1,2,3,4]. Map {$0 + 1}.

Like flatMap, map can also be written for error cases:

public extension Parser {
    
    func mapError(_ transform: @escaping (Error) - >Error) -> Parser<Value> {
        flatMapError {
            .error(transform($0))}}}Copy the code

Using the properties of Monad and Functor, we can combine the previously mentioned Element with a simple Parser that consumes two characters:

let twoElements = element.flatMap { first in
    element.map { second in
        String([first, second])
    }
}
let result = twoElements.parse("hello")
Copy the code

Result: result: success(“he”)

The two elements present here parse the corresponding two characters in the result, and the Map closure maps the parsed array of characters to strings.


filter

The Swift array has a filter method that filters out unwanted elements based on predicates. Parser does a similar thing. We can simply define a filter method with the same name for Parser:

public extension Parser {
    
    func filter(_ label: String.predicate: @escaping (Value) - >Bool) -> Parser<Value> {
        flatMap {
            predicate($0) ? .just($0) : .throw(label)
        }
    }
}
Copy the code

Filter takes two parameters, the first of which is a string, indicating the message in the error returned when the predicate fails. The second is a predicate closure that returns a Bool to indicate whether the result meets expectations.

Monad makes it easy to implement filter by calling the predicate closure in the flatMap to see if the result is as expected, and if it is, the result is returned directly, otherwise an error is thrown.

We can extend Parser based on Filter to implement two portable Combinators:

public extension Parser {

    func equal(to value: Value) -> Parser<Value> where Value: Equatable {
        filter("expecting \(value)") { $0 = = value }
    }

    func notEqual(to value: Value) -> Parser<Value> where Value: Equatable {
        filter("unexpected \(value)") { $0 ! = value }
    }
}
Copy the code

The above two combinators work on Parser values that follow the Equatable protocol, with expected values equal to/unequal expected values.

Here’s how to try it:

let str = "hello"
let resultOne = element.equal(to: "h").parse(str)
let resultTwo = element.notEqual(to: "h").parse(str)
Copy the code

The results are as follows:

resultOne: success("h")

resultTwo: failure(Error(stream: "hello", position: Swift.String.Index(_rawBits: 65793), message: "unexpected h"))
Copy the code

Because the forms element.equal(to: Character) and element.notequal (to: Character) are so common, we can simply wrap them as functions:

public func char(_ char: Character) -> Parser<Character> {
    element.equal(to: char)
}

public func notChar(_ char: Character) -> Parser<Character> {
    element.notEqual(to: char)
}
Copy the code

The previous code can be rewritten as:

let str = "hello"
let resultOne = char("h").parse(str)
let resultTwo = notChar("h").parse(str)
Copy the code

Sometimes we also want to string match the following input string, which can be easily extended based on the char above:

public func string(_ string: String) -> Parser<String> {.init {
        for element in string {
            if case .failure(let error) = char(element).parse($0) {
                return .failure(error)
            }
        }
        return .success(string)
    }
}
Copy the code

Try string:

let parser = string("Hello")
let resultOne = parser.parse("Helao")
Copy the code

Results obtained:

resultOne: failure(Error(stream: "Helao", position: Swift.String.Index(_rawBits: 262401), message: "expecting l"))

resultTwo: success("Hello")
Copy the code

Left not right, right not left

Many times we want the Parser to successfully parse the input, but we don’t care about its output and ignore it. For example, parsing the string literal “Hello World”, all we need is a pair of double quotes inside the string, but parsing pairs of double quotes is also essential.

Here we can use the properties of Monad and Functor to implement left-not-right, right-not-left Combinator:

public extension Parser {
    
    func usePrevious<O> (_ next: Parser<O>) -> Parser<Value> {
        flatMap { previous in next.map { _ in previous } }
    }
    
    func useNext<O> (_ next: Parser<O>) -> Parser<O> {
        flatMap { _ in next }
    }
    
    func between<L.R> (_ left: Parser<L>, _ right: Parser<R>) -> Parser<Value> {
        left.useNext(self).usePrevious(right)
    }
}
Copy the code

For Parser: Left and right, there are:

  • left.usePrevious(right): The input string will pass through in sequenceleftandrightBut will be ignored in the endrightOnly the parsed result is returnedleftResults.
  • left.useNext(right): The input string will pass through in sequenceleftandrightBut will be ignored in the endleftOnly the parsed result is returnedrightResults.
  • aParser.bwtween(left, right)Will pass by in turnleft,aParser,rightBut will be ignored in the endleft,rightOnly the parsed result is returnedaParserResults.

Now we can write a literal Parser that parses characters wrapped in pairs of single quotes: Parser

public let charLiteral = element.between(char("'"), char("'"))

let str = "'A'"
let result = charLiteral.parse(str)
Copy the code

Result = success(“A”).


choose

Sometimes when a Parser fails to parse, we want another Parser to take over and parse again. Like Swift’s?? As with the operator, if the left-hand value is not empty, the left-hand value is returned, otherwise the right-hand value is returned. The above scenario corresponds to the concept of functional programming: Alternative.

Before implementing Alternative for Parser, we extend Error:

public extension Error {
    
    func merge(with another: Error) -> Error {
        position > another.position ? self : another
    }
}
Copy the code

The merge method added here for Error is used to merge errors, which takes the furthest of the two errors and returns.

Why add such a method? Consider the scenario where we want bParser to try again when aParser fails, but both aParser and bParser fail. Then two errors are generated, but which Error is returned? Merge allows us to make this decision by taking the Error that is furthest resolved so that the resulting Error information is more accurate and understandable.

Next we implement Alternative for Parser, which is really just a function: or

// MARK: - Alternative

public extension Parser {
    
    func or(_ another: Parser<Value>) -> Parser<Value> {
        flatMapError { error in
            another.mapError(error.merge)
        }
    }
}
Copy the code

If the current Parser fails to be parsed, the Parser passed in with the OR argument will try again. If both parses fail, the Error generated during the parsing process is merged.


back

Wait, is there something wrong with the implementation of choice?

let str = "Hello"
let parser = char("W").or(char("H"))
let result = parser.parse(str)
Copy the code

Failure (Error(stream: “Hello World”, position: Swift.string.Index(_rawBits: 131329), message: “Expecting H”)?

As shown in the figure above, the white triangle represents the position char(“W”) is parsed to, where the element is the character H. This is not sufficient, so the Parser fails. Due to the effect of or, the char(“H”) passed in will continue to be parsed, but because one element of the input string has already been consumed, the next position to be parsed is at the red triangle, where the element is already character E, so parsing fails.

To solve this problem, we need to backtrack: if char(“W”) fails to parse, let char(“H”) still parse at the white triangle.

To implement backtracking, we need to record the current parse position of the input string before parsing. When backtracking is needed, the parse position is reset with this record. For code elegance, we can extend the following methods for the Context:

public extension Context {
    
    func `do`<T> (_ work: (Context) - >Result<T.Error>) -> Result<T.Error> {
        let rewind = { [_cursor] in self._cursor = _cursor }
        let result = work(self)
        if case .failure = result { rewind() }
        return result
    }
}
Copy the code

The do method accepts a nonescaping closure argument, which takes Context itself and returns a Result with a generic type. The closure is called inside the DO method. Before the work closure is called, DO captures the parse location of the current Context and constructs a closure for rollback. When the closure is called, its result value is returned unchanged by the DO method, but when the result indicates failure, the rollback closure is called before the method returns, and the Context’s parse position is reset.

Now we can modify the Parser initialization method a little bit so that all Parser logic runs in Context do:

public struct Parser<Value> {

    public typealias Result = Swift.Result<Value.Error>

    public let parse: (Context) - >Result
    
    public init(_ parse: @escaping (Context) - >Result) {
        self.parse = { $0.do(parse) } / / ⚠ ️ here}}Copy the code

OK, the Parser implementation with traceback is complete. Let’s try the previous code again:

let str = "Hello"
let parser = char("W").or(char("H"))
let result = parser.parse(str)
Copy the code

Test result is SUCCESS (“H”), as expected!


Many & Some

Sometimes we want to run the parsing logic of a Parser multiple times and collect the results. For example, calling element multiple times yields a string of multiple characters. To do this, we can construct the many and some combinators:

Let’s start with some: some means that the Parser must be successfully parsed at least once or more. If none of the parses succeed, an error is returned, and if the parses do not fail, they continue running, collecting the results of previous parses in an array.

public extension Parser {
    
    var some: Parser"[Value]> {
        flatMap { first in
            .init {
                var result = [first]
                while case .success(let value) = self.parse($0) {
                    result.append(value)
                }
                return .success(result)
            }
        }
    }
}
Copy the code

Many is less restrictive than some: Parser allows zero or more successful parses. If the Parser fails on the first attempt, an empty array is returned:

public extension Parser {

    var many: Parser"[Value] > {some.or(.just([]))
    }
}
Copy the code

Many uses or to connect some to a Parser that only returns an empty array, so when some parsing fails, the empty array is returned as the result.

Next we can implement a string literal Parser that parses a pair of double quotes: Parser

public let stringLiteral = notChar("\"").many
    .map { String($0) }
    .between(char("\""), char("\""))
Copy the code
  • notChar("\"").manyConstructor for parsing zero or more non-double quotes"The character of
  • The result type obtained above is a character array[Character]This is used heremapConvert the result toStringtype
  • between(char("\""), char("\""))othersParserThe result is between the double quote pairs

Test it out:

let str = #""Hello World""#
let result = stringLiteral.parse(str)
Copy the code

The result is success(“Hello World”).

The operator

To improve code brevity and readability, we can write some Combinator operators:

precedencegroup AlternativePrecedence {
  associativity: left
}

precedencegroup FunctorPrecedence {
  associativity: left
  higherThan: AlternativePrecedence
}

infix operator <|> : AlternativePrecedence
infix operator * > : FunctorPrecedence
infix operator < * : FunctorPrecedence

public func < * <L.R> (lhs: Parser<L>, rhs: Parser<R>) -> Parser<L> {
    lhs.usePrevious(rhs)
}

public func * > <L.R> (lhs: Parser<L>, rhs: Parser<R>) -> Parser<R> {
    lhs.useNext(rhs)
}

public func <|> <T> (lhs: Parser<T>, rhs: Parser<T>) -> Parser<T> {
    lhs.or(rhs)
}
Copy the code

At this point, we have implemented the entire Parser Combinator library. There are a lot of interesting and useful Combinators missing here, but I won’t go into detail here for the length of the article, so interested readers can try to find out for themselves. Swift also provides a lot of apis for characters and strings, which can be used to construct a lot of cool combinators.

Let’s try to use it to write a little parser.

A profound

Using the Parser Combinator library already implemented above, we tried to write a Parser for string key-value pairs as follows:

Keys and values are wrapped in double quotes around string literals separated by arrows. For example, “key” => “value”, where several Spaces are allowed on the left and right sides of the arrow. With line breaks, we can draw up multiple key-value pairs:

"Name" => "Tangent" "City" => "Shenzhen" "Introduction" => "iOS developer"Copy the code

From simple to complex and from fine-grained to coarse-grained, let’s implement the parser step by step:

First build Parser for “Spaces” and “line feeds” for later use:

public let space = element.filter("expecting whitespace") { $0 = = "" }
    .map { _ in }

public let newline = element.filter("expecting newline") { $0.isNewline }
    .map { _ in }
Copy the code

For parsing string literals, the previous section implemented Parser: stringLiteral, which is used to extract the contents between double quote pairs of string literals.

Next comes the parsing of arrows:

let arrow = string("= >").between(space.many, space.many)
Copy the code

Since the rule allows any number of Spaces to the left and right of the arrow, between(space.many, space.many) is used.

Build the key-value pair Parser below:

typealias Entry = (key: String, value: String)

let entry: Parser<Entry> = stringLiteral.flatMap { key in
    (arrow * > stringLiteral).map { value in
        (key, value)
    }
}
Copy the code

We first declare that the key-value pair is of type a binary (key: String, value:). StringLiteral > Arrow > stringLiteral is parsed and matched by flatMap, useNext and Map. And extract the values of the left and right stringLiteral to construct the key-value pair Entry.

Finally, we need to build a Parser that parses multiple key-value pairs:

let entries = (entry < * (newline <|> endOfStream)).some
Copy the code

Because after the rules require each key value for parsing, followed by either a newline, either at the end of the input string (newline or endOfStream), so use the entry here < * (newline < | > endOfStream) combination, Then use some to make the parse match one or more times.

Now that the key-value pair mini parser is complete, let’s test it:

let string = "" "" the name "= >" Tangent "" city," "the introduction" = > "shenzhen" = > "iOS developers" "" "
let result = entries.parse(string)
Copy the code

Result output:

Success ([(value: the key: "name", "Tangent"), (key: "city", the value: "shenzhen"), (key: "the introduction", the value: "iOS developers")])Copy the code

conclusion

This article takes you through building a lightweight Parser Combinator library to implement a simple Parser. The purpose is to make you more deeply feel the charm of functional programming, and deepen the understanding of functional programming related concepts. If I had time later, I might also consider writing a JSON Parser using Parser Combinator.

If you have any questions or corrections, welcome to comment, thank you!

reference

haskell/parsec

Build Parser Combinator with Haskell