Introduction

People who write parsers, whether for a simple custom protocol or a complex one, usually interpret the input from the top down, from the first byte all the way to the last. Meet a ';' and add an if; meet a ':' and add a match; switch to the corresponding case and keep going. It is the "kill a god if you meet a god, kill a Buddha if you meet a Buddha" approach: whatever turns up is hacked through on the spot, with no overall plan. The problem is that once error handling is added, the if/else chains become too complex and messy to maintain over time. A step up is to write a few complicated regular expressions, but you may later forget what each expression was for. Lex/Yacc and Flex/Bison are easy enough to use and maintain, apart from the extra grammar description file that has nothing to do with the development language. But why use an ox cleaver to kill a chicken? Is such a huge tool really necessary for such a small job?

nom offers a slightly different way of thinking, presumably inspired by Lego. Lego provides a very limited number of basic bricks, yet by combining them you can build all kinds of lifelike objects. The idea behind nom is not to have you write a grammar description file as lex/yacc do, nor to expect you to interpret the input from the top down. Rather, like Lego, nom provides many basic building blocks, such as tag, take_while1 and is_a, together with ways of putting them together, such as alt, tuple and preceded. Parsers of all kinds can be assembled from these pieces without sacrificing performance.

nom's functions are highly consistent in design: almost every function call returns a function with the same signature:

    impl Fn(Input) -> IResult<Input, Output, Error>

    pub type IResult<I, O, E = error::Error<I>> = Result<(I, O), Err<E>>;

    impl<I> ParseError<I> for ()
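To make the "same signature" idea concrete, here is a minimal hand-rolled sketch in plain Rust. It is not nom's implementation: the names tag, take_until1 and delimited are borrowed from nom only for illustration, and PResult is a hypothetical stand-in for IResult with a String error. It only shows why functions that all return the same closure shape snap together like bricks.

```rust
// Hand-rolled sketch: NOT nom's code, only the same overall shape.
// PResult is a hypothetical stand-in for nom's IResult.
type PResult<'a, O> = Result<(&'a str, O), String>;

/// Matches a literal prefix and returns it.
fn tag<'a>(lit: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| match input.strip_prefix(lit) {
        Some(rest) => Ok((rest, &input[..lit.len()])),
        None => Err(format!("expected {:?}", lit)),
    }
}

/// Consumes a non-empty run of input up to (not including) `pat`.
fn take_until1<'a>(pat: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| match input.find(pat) {
        Some(i) if i > 0 => Ok((&input[i..], &input[..i])),
        _ => Err(format!("no non-empty run before {:?}", pat)),
    }
}

/// Runs three parsers in sequence and keeps only the middle result.
fn delimited<'a, A, B, C>(
    open: impl Fn(&'a str) -> PResult<'a, A>,
    body: impl Fn(&'a str) -> PResult<'a, B>,
    close: impl Fn(&'a str) -> PResult<'a, C>,
) -> impl Fn(&'a str) -> PResult<'a, B> {
    move |input: &'a str| {
        let (input, _) = open(input)?;
        let (input, out) = body(input)?;
        let (input, _) = close(input)?;
        Ok((input, out))
    }
}

fn main() {
    // A Redis-style simple string: '+', some text, then CRLF.
    let simple = delimited(tag("+"), take_until1("\r\n"), tag("\r\n"));
    assert_eq!(simple("+OK\r\nrest"), Ok(("rest", "OK")));
    assert!(simple("-ERR\r\n").is_err());
}
```

Because every piece takes an input and returns the remaining input plus an output, delimited does not need to know anything about what its three arguments actually parse.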

nom's basic building blocks come in two versions: complete and streaming. The biggest difference between the two is how they report errors: when more data is needed, the streaming version reports Err::Incomplete(n), while the complete version reports Err::Error or Err::Failure. Apart from that, there is no significant difference in usage.
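The difference is easiest to see on a delimiter that has only partly arrived. The following is a hand-rolled miniature in plain Rust, not nom's code: ParseErr, tag_complete and tag_streaming are hypothetical names, and nom's real error types carry more information.

```rust
// Hypothetical miniature of the two flavours; nom's real types are richer.
#[derive(Debug, PartialEq)]
enum ParseErr {
    /// Streaming only: the buffer ended before a decision could be made.
    /// The payload is how many more bytes are needed.
    Incomplete(usize),
    /// The input definitely does not match.
    Error(String),
}

type PResult<'a> = Result<(&'a str, &'a str), ParseErr>;

/// Complete flavour: the input is all there is, so a short input is an error.
fn tag_complete<'a>(lit: &str, input: &'a str) -> PResult<'a> {
    match input.strip_prefix(lit) {
        Some(rest) => Ok((rest, &input[..lit.len()])),
        None => Err(ParseErr::Error(format!("expected {:?}", lit))),
    }
}

/// Streaming flavour: if the input is a proper prefix of `lit`, ask for more.
fn tag_streaming<'a>(lit: &str, input: &'a str) -> PResult<'a> {
    if input.len() < lit.len() && lit.starts_with(input) {
        return Err(ParseErr::Incomplete(lit.len() - input.len()));
    }
    tag_complete(lit, input)
}

fn main() {
    // Only the first byte of "\r\n" has arrived so far.
    assert!(matches!(tag_complete("\r\n", "\r"), Err(ParseErr::Error(_))));
    assert_eq!(tag_streaming("\r\n", "\r"), Err(ParseErr::Incomplete(1)));
    // With the whole delimiter present, both flavours agree.
    assert_eq!(tag_streaming("\r\n", "\r\nx"), Ok(("x", "\r\n")));
    assert_eq!(tag_complete("\r\n", "\r\nx"), Ok(("x", "\r\n")));
}
```

A network reader would typically treat the Incomplete case as "read more bytes and retry", whereas a parser over a fully loaded buffer has no use for it.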

nom's parsers and combinators cover almost everything, and whatever is missing you can build by combination. In addition to the docs.rs documentation, the author has thoughtfully compiled a list of them (the generated Rust documentation is inconvenient for viewing them side by side). I won't repeat it here; see the author's guide, choosing_a_combinator.

Example

Tokio's official tutorial is a good place to start learning Rust: it takes concepts already familiar from C/C++ and implements them in Rust. The tutorial implements a simple subset of Redis, and its code repository is Mini-Redis. Mini-redis parses the Redis protocol byte by byte. Fortunately the protocol is simple, so the original implementation is not messy. The code for this article is located at: github.com/buf1024/blo… , protocol parsing file: frame.rs

    /// A frame in the Redis protocol.
    #[derive(Clone, Debug)]
    pub enum Frame {
        // +xxx\r\n  simple string
        Simple(String),
        // -xxx\r\n  error string
        Error(String),
        // :ddd\r\n  u64 integer
        Integer(u64),
        // $dd\r\nbbbbb\r\n  bulk data with contents
        Bulk(Bytes),
        // $-1\r\n  null bulk
        Null,
        // *dd\r\nxxx\r\n  array
        Array(Vec<Frame>),
    }
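To see the wire formats in those comments at work, here is a small sketch that goes in the opposite direction and serializes a frame. It is not taken from mini-redis: encode is a hypothetical helper, and Vec<u8> stands in for the Bytes type so that the example needs only the standard library. The request built in main is the same one that shows up in the test log later in this article, where the parser reports n: 50.

```rust
// Sketch only: Vec<u8> stands in for the `Bytes` type used in the article.
#[allow(dead_code)]
#[derive(Clone, Debug)]
enum Frame {
    Simple(String),
    Error(String),
    Integer(u64),
    Bulk(Vec<u8>),
    Null,
    Array(Vec<Frame>),
}

/// Hypothetical helper: appends the wire form of `frame` to `out`.
fn encode(frame: &Frame, out: &mut Vec<u8>) {
    match frame {
        Frame::Simple(s) => out.extend_from_slice(format!("+{}\r\n", s).as_bytes()),
        Frame::Error(s) => out.extend_from_slice(format!("-{}\r\n", s).as_bytes()),
        Frame::Integer(n) => out.extend_from_slice(format!(":{}\r\n", n).as_bytes()),
        Frame::Bulk(data) => {
            // Length prefix, raw payload, then the closing CRLF.
            out.extend_from_slice(format!("${}\r\n", data.len()).as_bytes());
            out.extend_from_slice(data);
            out.extend_from_slice(b"\r\n");
        }
        Frame::Null => out.extend_from_slice(b"$-1\r\n"),
        Frame::Array(items) => {
            // Element count first, then each element encoded in order.
            out.extend_from_slice(format!("*{}\r\n", items.len()).as_bytes());
            for item in items {
                encode(item, out);
            }
        }
    }
}

fn main() {
    let request = Frame::Array(vec![
        Frame::Bulk(b"set".to_vec()),
        Frame::Bulk(b"hello".to_vec()),
        Frame::Bulk(b"world".to_vec()),
        Frame::Bulk(b"px".to_vec()),
        Frame::Integer(3600),
    ]);
    let mut wire = Vec::new();
    encode(&request, &mut wire);
    assert_eq!(
        wire,
        b"*5\r\n$3\r\nset\r\n$5\r\nhello\r\n$5\r\nworld\r\n$2\r\npx\r\n:3600\r\n".to_vec()
    );
    assert_eq!(wire.len(), 50);
}
```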

Mini-redis implements only the five basic data types of the Redis protocol. The original implementation is not complicated, but it is pieced together by hand. The version rewritten with nom is as follows:

    // Imports are assumed (the article does not show them); the complete
    // variants of tag and take_until1 are one possible choice.
    use nom::{
        branch::alt,
        bytes::complete::{tag, take_until1},
        combinator::{map, map_res, rest},
        multi::{length_count, length_value},
        sequence::{delimited, terminated},
        IResult,
    };

    // Each parser returns the frame plus the number of bytes it consumed.
    impl Frame {
        // +xxx\r\n
        fn parse_simple(src: &str) -> IResult<&str, (Frame, usize)> {
            let (input, output) = delimited(tag("+"), take_until1("\r\n"), tag("\r\n"))(src)?;
            Ok((input, (Frame::Simple(String::from(output)), src.len() - input.len())))
        }

        // -xxx\r\n
        fn parse_error(src: &str) -> IResult<&str, (Frame, usize)> {
            let (input, output) = delimited(tag("-"), take_until1("\r\n"), tag("\r\n"))(src)?;
            Ok((input, (Frame::Error(String::from(output)), src.len() - input.len())))
        }

        // :ddd\r\n
        fn parse_decimal(src: &str) -> IResult<&str, (Frame, usize)> {
            let (input, output) = map_res(
                delimited(tag(":"), take_until1("\r\n"), tag("\r\n")),
                |s: &str| u64::from_str_radix(s, 10),
            )(src)?;
            Ok((input, (Frame::Integer(output), src.len() - input.len())))
        }

        // $-1\r\n or $dd\r\nbbbbb\r\n
        fn parse_bulk(src: &str) -> IResult<&str, (Frame, usize)> {
            let (input, output) = alt((
                map(tag("$-1\r\n"), |_| Frame::Null),
                map(
                    terminated(
                        // Read the length prefix, then take exactly that many bytes.
                        length_value(
                            map_res(
                                delimited(tag("$"), take_until1("\r\n"), tag("\r\n")),
                                |s: &str| u64::from_str_radix(s, 10),
                            ),
                            rest,
                        ),
                        tag("\r\n"),
                    ),
                    |out| {
                        let data = Bytes::copy_from_slice(out.as_bytes());
                        Frame::Bulk(data)
                    },
                ),
            ))(src)?;
            Ok((input, (output, src.len() - input.len())))
        }

        // *dd\r\nxxx\r\n
        fn parse_array(src: &str) -> IResult<&str, (Frame, usize)> {
            let (input, output) = map(
                // Read the element count, then parse that many frames.
                length_count(
                    map_res(
                        delimited(tag("*"), take_until1("\r\n"), tag("\r\n")),
                        |s: &str| {
                            println!("{}", s);
                            u64::from_str_radix(s, 10)
                        },
                    ),
                    Frame::parse,
                ),
                |out| {
                    let data = out.iter().map(|item| item.0.clone()).collect();
                    Frame::Array(data)
                },
            )(src)?;
            Ok((input, (output, src.len() - input.len())))
        }

        // Try each frame type in turn.
        pub fn parse(src: &str) -> IResult<&str, (Frame, usize)> {
            alt((
                Frame::parse_simple,
                Frame::parse_error,
                Frame::parse_decimal,
                Frame::parse_bulk,
                Frame::parse_array,
            ))(src)
        }
    }

As you can see, the nom version is very concise: a frame type can be parsed in as little as one line of code, with none of the byte-by-byte checks and position bookkeeping of the original code. It is like Lego: the pieces stack on top of one another, plain and simple.

Testing:

    ^_^@~/rust-lib/mini-redis]$ RUST_LOG=debug cargo run --bin mini-redis-server
       Compiling mini-redis v0.4.0 (~/rust-lib/mini-redis)
        Finished dev [unoptimized + debuginfo] target(s) in 7.44s
         Running `target/debug/mini-redis-server`
    Sep 13 00:10:16.517  INFO mini_redis::server: accepting inbound connections
    Sep 13 00:10:40.261 DEBUG mini_redis::server: Accept address from 127.0.0.1:55931
    5
    Sep 13 00:10:40.264 DEBUG run: mini_redis::connection: nom frame: Array([Bulk(b"set"), Bulk(b"hello"), Bulk(b"world"), Bulk(b"px"), Integer(3600)]), n: 50
    Sep 13 00:10:40.264 DEBUG run: mini_redis::server: cmd=Set(Set { key: "hello", value: b"world", expire: Some(3.6s) })
    Sep 13 00:10:40.264 DEBUG run:apply: mini_redis::cmd::set: response=Simple("OK")
    ^CSep 13 00:10:52.278 DEBUG my_exit: mini_redis_server: Press once more to exit
    ^CSep 13 00:10:52.934 DEBUG my_exit: mini_redis_server: Now exit
    Sep 13 00:10:52.934  INFO mini_redis::server: shutting down

    ^_^@~/rust-lib/mini-redis]$ RUST_LOG=debug cargo run --bin mini-redis-cli set hello world 3600
       Compiling mini-redis v0.4.0 (~/rust-lib/mini-redis)
        Finished dev [unoptimized + debuginfo] target(s) in 1.61s
         Running `target/debug/mini-redis-cli set hello world 3600`
    Sep 13 00:10:40.262 DEBUG set_expires{key="hello" value=b"world" expiration=3.6s}: mini_redis::client: request=Array([Bulk(b"set"), Bulk(b"hello"), Bulk(b"world"), Bulk(b"px"), Integer(3600)])
    Sep 13 00:10:40.265 DEBUG set_expires{key="hello" value=b"world" expiration=3.6s}: mini_redis::connection: nom frame: Simple("OK"), n: 5
    Sep 13 00:10:40.265 DEBUG set_expires{key="hello" value=b"world" expiration=3.6s}: mini_redis::client: response=Some(Simple("OK"))
    OK

Summary

Rust's nom provides a Lego-like approach to building parsers: arbitrarily complex parsers can be assembled from basic ones without losing performance. Thanks to its naming and its uniform function signature, a nom-based parser also reads much more clearly than one built from helpers such as get_line. So for a parsing task where no ready-made, mature parsing library already exists, nom is well worth considering, rather than working byte by byte with get_line and get_u8 or hacking away with regular expressions.