preface
In the world of functional programming, abstraction and composition often go hand in hand: several fine-grained abstractions are combined in a particular way to form higher-grained abstractions, which can then be combined again and progressively, raising the level of code abstraction step by step. The charm of functional programming that I feel in engineering development is also reflected in its powerful abstraction ability.
Parser parses input and produces results. For example, the regular expression engine can parse strings matching input, and JSONSerialization can help iOS/Mac developers parse JSON into an Objective-C Dictionary. Parser Combinator is an implementation of Parser, which is elegant and based on the idea of functional programming. Using it, we can write parsing logic very easily, and the code is concise and easy to understand. The design of Parser Combinator fully reflects the idea of combination. By using its various combinators, we can combine several fine-grained parsers together to get coarse-grained Parser. After that, the combination can continue and the degree of abstraction is gradually improved.
In this article, I’ll write a simple Parser using Swift to build a lightweight Parser Combinator library step by step. In this article, I hope readers can deeply feel the charm of functional programming with me, and deepen the understanding of its related concepts, so that they can write more elegant and abstract functional code in the future.
This article covers some of the concepts of functional programming. If you are not familiar with this part of the article, check out my previous article on this topic:
Functional programming – Functor, Monad, Applicative in Swift
Functional programming – Integrate Monad into Swift
Functional Programming – Cool Applicative
The Parser Combinator library code implemented in this article has been open source on Github for your reference: TangentW/Parsec
Because my technical level is limited, if the article exists fallacy, but also hope you correct.
model
Before implementing Parser Combinator, let’s look at its basic abstract model.
Basic model
Parsing consumes input, in this case a string, and produces results. As shown in the figure below, each cell represents an element in the input string, and the type is a character. The Parser consumes several elements for parsing and produces results. In addition, the Parser outputs an additional state: the remaining input string that is parsed for subsequent Parser processing.
From the description above, we can use Swift to indicate the type of Parser:
typealias Parser<Value> = (String) - >Result< (Value.String), Error>
Copy the code
Parser is defined here as a function type, and since it produces variable output, Value generics are used to refer to the type of the result. The function takes an input parameter of type String and returns Result. If the parse succeeds, Result loads a binary that represents the Result of the parse and the rest of the input string. When parsing fails, the error message is also returned via the Result load.
For example, suppose we have a Parser that parses the longest number prefix in a string:
typealias Parser = (String) - >Result< (Int.String), Error>
Copy the code
When the string “123abc” is input, the Parser outputs.success((123, “ABC “)). Here the longest prefix is parsed out and converted to an Int, which is returned as a result, along with the rest of the parsed input string.
The optimization model
The model described above follows the typical data immutable + pure function properties of functional programming, that is, there are no mutable variables and functions have only unique outputs for the same inputs. But it is too hard for Swift: sometimes “variable” can bring more convenient, on the other hand, if, in accordance with requirements of the above model, after the Parser in parsing also need to return the rest of the input String, then each instance of String parsing will be constructed, so obviously is not a good practice for performance. Therefore, when implementing Parser using Swift in practice, we need to optimize the model with “Swift features “:
typealias Parser<Value> = (Context) - >Result<String.Error>
Copy the code
This model will no longer adhere to the data immutable + pure function property, but is well suited to Swift. In this case, the Context is mutable, and it records where the current input string was resolved to, instead of returning the rest of the input string directly in the old model.
The following sections will detail the concept of Context in the model and the specific implementation of Parser Combinator.
implementation
Let’s implement the lightweight Parser Combinator library using Swift.
Context
In the “basic model” mentioned at the beginning, Parser is a function type whose argument is an input String. When parsed, it returns a result value along with the rest of the input String. In the optimization model, the parsing function takes Context as its input parameter, and only the resulting value is returned after parsing. The reason for this optimization is that we don’t have to return the rest of the input string every time we parse, we just need to keep a record of where the input string is currently parsed, and here the Context takes care of that, so it’s mutable:
public final class Context {
public typealias Stream = String
public typealias Element = Stream.Element
public typealias Index = Stream.Index
public let stream: Stream
public init(stream: Stream) {
self.stream = stream
_cursor = stream.startIndex
}
private var _cursor: Index
}
Copy the code
In the code above, Context records the stream input string and the position that the current input string is parsed to _CURSOR (private, so it’s named with an underscore). The type of the position is string Index and the initial value is startIndex at the beginning of the string.
Context also provides external consumption methods:
// MARK: - Iterator
extension Context: IteratorProtocol {
public func next(a) -> Element? {
let range = stream.startIndex..<stream.endIndex
guard range.contains(_cursor) else {
return nil
}
defer {
stream.formIndex(after: &_cursor)
}
return stream[_cursor]
}
}
Copy the code
Here we have Context implement IteratorProtocol, and every time we call next(), the Context will consume and return an element (character) in the input string by stepping _cursor, and return nil when the input string has been completely consumed before that.
Error
Error handling is essential in parsing, and when parsing fails, we can clearly understand the cause of the failure through the error message. To do this we need to define the types of errors:
public struct Error: Swift.Error {
public let stream: Context.Stream
public let position: Context.Index
public let message: String
public init(stream: Context.Stream.position: Context.Index.message: String) {
self.stream = stream
self.position = position
self.message = message
}
}
Copy the code
Error records the input string, the position to which the input string is currently parsed, and the failure information expressed as a string. When parsing fails, we can get an instance of Error from Result returned by Parser to know the cause of the failure and the current parsing location.
To make it easier to throw an error, we can extend the Context by adding code to throw an error:
// MARK: - Error
public extension Context {
func `throw`<T> (_ errorMessage: String) -> Result<T.Error> {
.failure(error(with: errorMessage))
}
func error(with message: Stream) -> Error{.init(stream: stream, position: _cursor, message: message)
}
}
Copy the code
We build the Error from the data inside the Context so that the Context can easily throw errors out.
Parser
The Parser we mentioned in the previous model section is actually a function type:
typealias Parser<Value> = (Context) - >Result<String.Error>
Copy the code
However, to facilitate subsequent Combinator expansion and improve the readability of the code, we define the Parser as a struct type and wrap the function directly inside it:
public struct Parser<Value> {
public typealias Result = Swift.Result<Value.Error>
public let parse: (Context) - >Result
public init(_ parse: @escaping (Context) - >Result) {
self.parse = parse
}
}
Copy the code
Next we extend a handy parsing method for Parser:
public extension Parser {
func parse(_ stream: Context.Stream) -> Result {
parse(.init(stream: stream))
}
}
Copy the code
The method parse has the same name as the function wrapped in Parser, but its input arguments are different: the input arguments are directly input strings. Inside the method, we construct the Context and call the function wrapped in Parser. Using this method, we can enter the string directly to get the parsed result.
The first tastes Parser
OK, so far we have basically defined the core data structures of the Parser Combinator library. Let’s try it out:
public let element = Parser<Context.Element> {
if let element = $0.next() {
return .success(element)
} else {
return $0.throw("unexpected end of stream")}}public let endOfStream = Parser< > () {if $0.next() = = nil {
return .success(())
} else {
return $0.throw("expecting end of stream")}}Copy the code
We define two parsers above:
element
: Passes during parsingContext
Consumes an input string element (character), returns an error if the input string has been consumed before.endOfStream
: withelement
It’s going to passContext
Consumes the input string, but it expects the input string to have been consumed before, otherwise returns an error.
Let’s input a string and try running both parsers:
let inputOne = "hello world!"
let inputTwo = ""
let inputOneResultOfElement = element.parse(inputOne)
let inputTwoResultOfElement = element.parse(inputTwo)
let inputOneResultOfEndOfStream = endOfStream.parse(inputOne)
let inputTwoResultOfEndOfStream = endOfStream.parse(inputTwo)
Copy the code
Output:
inputOneResultOfElement: success("h") inputTwoResultOfElement: failure(Error(stream: "", position: Swift.String.Index(_rawBits: 1), message: "unexpected end of stream")) inputOneResultOfEndOfStream: failure(Error(stream: "hello world!" , position: Swift.String.Index(_rawBits: 65793), message: "expecting end of stream")) inputTwoResultOfEndOfStream: success()Copy the code
Yes, the two parsers parse the input as expected.
Combinator
It is mentioned in the preface that the design of Parser Combinator fully reflects the idea of combination. The Parser is continuously combined with Combinator, increasing the level of abstraction. To do this, we need to define various combinators.
Monad
Here we use the static method just to refer to Monad’s return function and flatMap to refer to bind (>>=) :
public extension Parser {
static func just(_ value: Value) -> Parser<Value> {.init { _ in .success(value) }
}
func flatMap<O> (_ transform: @escaping (Value) - >Parser<O>) -> Parser<O> {.init {
switch self.parse($0) {
case .success(let value):
return transform(value).parse($0)
case .failure(let error):
return .failure(error)
}
}
}
}
Copy the code
-
The Parser returned by the Just method does not consume any input strings. It simply wraps the input parameters of just into Result and returns them as parsed results.
-
In flatMap, we will first run the parsing logic of the current Parser. If the parsing is successful, the parsed result will be passed into the closure parameter transform of flatMap as a parameter. The closure call returns a new Parser, and the parsing logic of the new Parser continues. If the current Parser fails, an error message is returned directly from flatMap.
Just and flatMap are mainly used to map the result after successful parsing. For the failure of parsing, we can also write the corresponding Combinator:
public extension Parser {
static func error<T> (_ error: Error) -> Parser<T> {.init { _ in .failure(error) }
}
static func `throw`(_ errorMessage: String) -> Parser<Value> {.init { $0.throw(errorMessage) }
}
func flatMapError(_ transform: @escaping (Error) - >Parser<Value>) -> Parser<Value> {.init {
switch self.parse($0) {
case .success(let value):
return .success(value)
case .failure(let error):
return transform(error).parse($0)}}}}Copy the code
This code has the same idea as above, so I won’t go into details.
Functor
With Monad done, we can easily implement Functor, the core of which is the Map method:
public extension Parser {
func map<O> (_ transform: @escaping (Value) - >O) -> Parser<O> {
flatMap {
.just(transform($0))}}}Copy the code
With the help of Monad’s flatMap and just methods, a map can be implemented this easily.
What the map method does is map the result of the current Parser, similar to how Swift maps arrays: let arr = [1,2,3,4]. Map {$0 + 1}.
Like flatMap, map can also be written for error cases:
public extension Parser {
func mapError(_ transform: @escaping (Error) - >Error) -> Parser<Value> {
flatMapError {
.error(transform($0))}}}Copy the code
Using the properties of Monad and Functor, we can combine the previously mentioned Element with a simple Parser that consumes two characters:
let twoElements = element.flatMap { first in
element.map { second in
String([first, second])
}
}
let result = twoElements.parse("hello")
Copy the code
Result: result: success(“he”)
The two elements present here parse the corresponding two characters in the result, and the Map closure maps the parsed array of characters to strings.
filter
The Swift array has a filter method that filters out unwanted elements based on predicates. Parser does a similar thing. We can simply define a filter method with the same name for Parser:
public extension Parser {
func filter(_ label: String.predicate: @escaping (Value) - >Bool) -> Parser<Value> {
flatMap {
predicate($0) ? .just($0) : .throw(label)
}
}
}
Copy the code
Filter takes two parameters, the first of which is a string, indicating the message in the error returned when the predicate fails. The second is a predicate closure that returns a Bool to indicate whether the result meets expectations.
Monad makes it easy to implement filter by calling the predicate closure in the flatMap to see if the result is as expected, and if it is, the result is returned directly, otherwise an error is thrown.
We can extend Parser based on Filter to implement two portable Combinators:
public extension Parser {
func equal(to value: Value) -> Parser<Value> where Value: Equatable {
filter("expecting \(value)") { $0 = = value }
}
func notEqual(to value: Value) -> Parser<Value> where Value: Equatable {
filter("unexpected \(value)") { $0 ! = value }
}
}
Copy the code
The above two combinators work on Parser values that follow the Equatable protocol, with expected values equal to/unequal expected values.
Here’s how to try it:
let str = "hello"
let resultOne = element.equal(to: "h").parse(str)
let resultTwo = element.notEqual(to: "h").parse(str)
Copy the code
The results are as follows:
resultOne: success("h")
resultTwo: failure(Error(stream: "hello", position: Swift.String.Index(_rawBits: 65793), message: "unexpected h"))
Copy the code
Because the forms element.equal(to: Character) and element.notequal (to: Character) are so common, we can simply wrap them as functions:
public func char(_ char: Character) -> Parser<Character> {
element.equal(to: char)
}
public func notChar(_ char: Character) -> Parser<Character> {
element.notEqual(to: char)
}
Copy the code
The previous code can be rewritten as:
let str = "hello"
let resultOne = char("h").parse(str)
let resultTwo = notChar("h").parse(str)
Copy the code
Sometimes we also want to string match the following input string, which can be easily extended based on the char above:
public func string(_ string: String) -> Parser<String> {.init {
for element in string {
if case .failure(let error) = char(element).parse($0) {
return .failure(error)
}
}
return .success(string)
}
}
Copy the code
Try string:
let parser = string("Hello")
let resultOne = parser.parse("Helao")
Copy the code
Results obtained:
resultOne: failure(Error(stream: "Helao", position: Swift.String.Index(_rawBits: 262401), message: "expecting l"))
resultTwo: success("Hello")
Copy the code
Left not right, right not left
Many times we want the Parser to successfully parse the input, but we don’t care about its output and ignore it. For example, parsing the string literal “Hello World”, all we need is a pair of double quotes inside the string, but parsing pairs of double quotes is also essential.
Here we can use the properties of Monad and Functor to implement left-not-right, right-not-left Combinator:
public extension Parser {
func usePrevious<O> (_ next: Parser<O>) -> Parser<Value> {
flatMap { previous in next.map { _ in previous } }
}
func useNext<O> (_ next: Parser<O>) -> Parser<O> {
flatMap { _ in next }
}
func between<L.R> (_ left: Parser<L>, _ right: Parser<R>) -> Parser<Value> {
left.useNext(self).usePrevious(right)
}
}
Copy the code
For Parser: Left and right, there are:
left.usePrevious(right)
: The input string will pass through in sequenceleft
andright
But will be ignored in the endright
Only the parsed result is returnedleft
Results.left.useNext(right)
: The input string will pass through in sequenceleft
andright
But will be ignored in the endleft
Only the parsed result is returnedright
Results.aParser.bwtween(left, right)
Will pass by in turnleft
,aParser
,right
But will be ignored in the endleft
,right
Only the parsed result is returnedaParser
Results.
Now we can write a literal Parser that parses characters wrapped in pairs of single quotes: Parser
public let charLiteral = element.between(char("'"), char("'"))
let str = "'A'"
let result = charLiteral.parse(str)
Copy the code
Result = success(“A”).
choose
Sometimes when a Parser fails to parse, we want another Parser to take over and parse again. Like Swift’s?? As with the operator, if the left-hand value is not empty, the left-hand value is returned, otherwise the right-hand value is returned. The above scenario corresponds to the concept of functional programming: Alternative.
Before implementing Alternative for Parser, we extend Error:
public extension Error {
func merge(with another: Error) -> Error {
position > another.position ? self : another
}
}
Copy the code
The merge method added here for Error is used to merge errors, which takes the furthest of the two errors and returns.
Why add such a method? Consider the scenario where we want bParser to try again when aParser fails, but both aParser and bParser fail. Then two errors are generated, but which Error is returned? Merge allows us to make this decision by taking the Error that is furthest resolved so that the resulting Error information is more accurate and understandable.
Next we implement Alternative for Parser, which is really just a function: or
// MARK: - Alternative
public extension Parser {
func or(_ another: Parser<Value>) -> Parser<Value> {
flatMapError { error in
another.mapError(error.merge)
}
}
}
Copy the code
If the current Parser fails to be parsed, the Parser passed in with the OR argument will try again. If both parses fail, the Error generated during the parsing process is merged.
back
Wait, is there something wrong with the implementation of choice?
let str = "Hello"
let parser = char("W").or(char("H"))
let result = parser.parse(str)
Copy the code
Failure (Error(stream: “Hello World”, position: Swift.string.Index(_rawBits: 131329), message: “Expecting H”)?
As shown in the figure above, the white triangle represents the position char(“W”) is parsed to, where the element is the character H. This is not sufficient, so the Parser fails. Due to the effect of or, the char(“H”) passed in will continue to be parsed, but because one element of the input string has already been consumed, the next position to be parsed is at the red triangle, where the element is already character E, so parsing fails.
To solve this problem, we need to backtrack: if char(“W”) fails to parse, let char(“H”) still parse at the white triangle.
To implement backtracking, we need to record the current parse position of the input string before parsing. When backtracking is needed, the parse position is reset with this record. For code elegance, we can extend the following methods for the Context:
public extension Context {
func `do`<T> (_ work: (Context) - >Result<T.Error>) -> Result<T.Error> {
let rewind = { [_cursor] in self._cursor = _cursor }
let result = work(self)
if case .failure = result { rewind() }
return result
}
}
Copy the code
The do method accepts a nonescaping closure argument, which takes Context itself and returns a Result with a generic type. The closure is called inside the DO method. Before the work closure is called, DO captures the parse location of the current Context and constructs a closure for rollback. When the closure is called, its result value is returned unchanged by the DO method, but when the result indicates failure, the rollback closure is called before the method returns, and the Context’s parse position is reset.
Now we can modify the Parser initialization method a little bit so that all Parser logic runs in Context do:
public struct Parser<Value> {
public typealias Result = Swift.Result<Value.Error>
public let parse: (Context) - >Result
public init(_ parse: @escaping (Context) - >Result) {
self.parse = { $0.do(parse) } / / ⚠ ️ here}}Copy the code
OK, the Parser implementation with traceback is complete. Let’s try the previous code again:
let str = "Hello"
let parser = char("W").or(char("H"))
let result = parser.parse(str)
Copy the code
Test result is SUCCESS (“H”), as expected!
Many & Some
Sometimes we want to run the parsing logic of a Parser multiple times and collect the results. For example, calling element multiple times yields a string of multiple characters. To do this, we can construct the many and some combinators:
Let’s start with some: some means that the Parser must be successfully parsed at least once or more. If none of the parses succeed, an error is returned, and if the parses do not fail, they continue running, collecting the results of previous parses in an array.
public extension Parser {
var some: Parser"[Value]> {
flatMap { first in
.init {
var result = [first]
while case .success(let value) = self.parse($0) {
result.append(value)
}
return .success(result)
}
}
}
}
Copy the code
Many is less restrictive than some: Parser allows zero or more successful parses. If the Parser fails on the first attempt, an empty array is returned:
public extension Parser {
var many: Parser"[Value] > {some.or(.just([]))
}
}
Copy the code
Many uses or to connect some to a Parser that only returns an empty array, so when some parsing fails, the empty array is returned as the result.
Next we can implement a string literal Parser that parses a pair of double quotes: Parser
public let stringLiteral = notChar("\"").many
.map { String($0) }
.between(char("\""), char("\""))
Copy the code
notChar("\"").many
Constructor for parsing zero or more non-double quotes"
The character of- The result type obtained above is a character array
[Character]
This is used heremap
Convert the result toString
type between(char("\""), char("\""))
othersParser
The result is between the double quote pairs
Test it out:
let str = #""Hello World""#
let result = stringLiteral.parse(str)
Copy the code
The result is success(“Hello World”).
The operator
To improve code brevity and readability, we can write some Combinator operators:
precedencegroup AlternativePrecedence {
associativity: left
}
precedencegroup FunctorPrecedence {
associativity: left
higherThan: AlternativePrecedence
}
infix operator <|> : AlternativePrecedence
infix operator * > : FunctorPrecedence
infix operator < * : FunctorPrecedence
public func < * <L.R> (lhs: Parser<L>, rhs: Parser<R>) -> Parser<L> {
lhs.usePrevious(rhs)
}
public func * > <L.R> (lhs: Parser<L>, rhs: Parser<R>) -> Parser<R> {
lhs.useNext(rhs)
}
public func <|> <T> (lhs: Parser<T>, rhs: Parser<T>) -> Parser<T> {
lhs.or(rhs)
}
Copy the code
At this point, we have implemented the entire Parser Combinator library. There are a lot of interesting and useful Combinators missing here, but I won’t go into detail here for the length of the article, so interested readers can try to find out for themselves. Swift also provides a lot of apis for characters and strings, which can be used to construct a lot of cool combinators.
Let’s try to use it to write a little parser.
A profound
Using the Parser Combinator library already implemented above, we tried to write a Parser for string key-value pairs as follows:
Keys and values are wrapped in double quotes around string literals separated by arrows. For example, “key” => “value”, where several Spaces are allowed on the left and right sides of the arrow. With line breaks, we can draw up multiple key-value pairs:
"Name" => "Tangent" "City" => "Shenzhen" "Introduction" => "iOS developer"Copy the code
From simple to complex and from fine-grained to coarse-grained, let’s implement the parser step by step:
First build Parser for “Spaces” and “line feeds” for later use:
public let space = element.filter("expecting whitespace") { $0 = = "" }
.map { _ in }
public let newline = element.filter("expecting newline") { $0.isNewline }
.map { _ in }
Copy the code
For parsing string literals, the previous section implemented Parser: stringLiteral, which is used to extract the contents between double quote pairs of string literals.
Next comes the parsing of arrows:
let arrow = string("= >").between(space.many, space.many)
Copy the code
Since the rule allows any number of Spaces to the left and right of the arrow, between(space.many, space.many) is used.
Build the key-value pair Parser below:
typealias Entry = (key: String, value: String)
let entry: Parser<Entry> = stringLiteral.flatMap { key in
(arrow * > stringLiteral).map { value in
(key, value)
}
}
Copy the code
We first declare that the key-value pair is of type a binary (key: String, value:). StringLiteral > Arrow > stringLiteral is parsed and matched by flatMap, useNext and Map. And extract the values of the left and right stringLiteral to construct the key-value pair Entry.
Finally, we need to build a Parser that parses multiple key-value pairs:
let entries = (entry < * (newline <|> endOfStream)).some
Copy the code
Because after the rules require each key value for parsing, followed by either a newline, either at the end of the input string (newline or endOfStream), so use the entry here < * (newline < | > endOfStream) combination, Then use some to make the parse match one or more times.
Now that the key-value pair mini parser is complete, let’s test it:
let string = "" "" the name "= >" Tangent "" city," "the introduction" = > "shenzhen" = > "iOS developers" "" "
let result = entries.parse(string)
Copy the code
Result output:
Success ([(value: the key: "name", "Tangent"), (key: "city", the value: "shenzhen"), (key: "the introduction", the value: "iOS developers")])Copy the code
conclusion
This article takes you through building a lightweight Parser Combinator library to implement a simple Parser. The purpose is to make you more deeply feel the charm of functional programming, and deepen the understanding of functional programming related concepts. If I had time later, I might also consider writing a JSON Parser using Parser Combinator.
If you have any questions or corrections, welcome to comment, thank you!
reference
haskell/parsec
Build Parser Combinator with Haskell