How to Read an RFC
By Mark Nottingham, 31 July 2018
For better or worse, Requests for Comments (RFCs) are how we specify many protocols on the Internet. These documents are alternately treated as holy texts by developers who parse them for hidden meanings, then shunned as irrelevant because they can’t be understood. This often leads to frustration and, more significantly, interoperability and security issues.
However, with some insight into how they’re constructed and published, it’s a bit easier to understand what you’re looking at. Here’s my take, informed by my experience with HTTP and a few other things.
Where to start
The canonical place to find RFCs is the RFC Editor Web Site. However, as we’ll see below, some key information is missing there, so most people use tools.ietf.org.
Even finding the right RFC can be difficult since there are so many (currently, nearly 9,000!). Obviously you can find them with general Web search engines, and the RFC Editor has an excellent search facility on their site.
Another option is EveryRFC, which I put together to allow searching RFCs by their titles and keywords, and exploration by tags.
It’s no secret that plain text RFCs are difficult to read, bordering on ugly, but things are about to improve; the RFC Editor is wrapping up a new RFC format, with much more pleasing presentation and the option for customisation. In the meantime, if you want more usable RFCs, you can use third-party repositories for selected ones; for example, greenbytes keeps a list of WebDAV-related RFCs, and the HTTP Working Group maintains a selection of those related to HTTP.
What are the types of RFC?
All RFCs have a banner at the top that looks something like this:
Internet Engineering Task Force (IETF)                  R. Fielding, Ed.
Request for Comments: 7230                                         Adobe
Obsoletes: 2145, 2616                                    J. Reschke, Ed.
Updates: 2817, 2818                                           greenbytes
Category: Standards Track                                      June 2014
ISSN: 2070-1721
At the top left, this one says “Internet Engineering Task Force (IETF)”. That indicates that this is a product of the IETF; although it’s not widely known, there are other ways to publish an RFC that don’t require IETF consensus; for example, the independent stream.
In fact, there are a number of “streams” that a document can be published on. Only the IETF stream indicates that the entire IETF has reviewed and declared consensus on a protocol’s specification.
Older documents (before about RFC5705) say “Network Working Group” there, so you have to dig a bit more to find out whether they represent IETF consensus; look at the “Status of this Memo” section for a start, as well as the RFC Editor site.
Under that is the “Request for Comments” number. If it says “Internet-Draft” instead, it’s not an RFC; it’s just a proposal, and anyone can write one. Just because something is an Internet-Draft doesn’t mean it’ll ever be adopted by the IETF.
Category is one of “Standards Track”, “Informational”, “Experimental”, or “Best Current Practice”. The distinctions between these are sometimes fuzzy, but if it’s produced by the IETF (see above), it’s had a reasonable amount of review. However, note that Informational and Experimental are not standards, even if there’s IETF consensus to publish.
Finally, the authors of the document are listed on the right side of the header. Unlike in academia, this is not a comprehensive list of who contributed to the document; often, that’s done near the bottom in an “Acknowledgements” section. In RFCs, this is literally “who wrote the document.” Often, you’ll see “Ed.” appended, which indicates that they were acting as an editor, often because the text was pre-existing (like when an RFC is revised).
How do I determine if it is the latest version?
RFCs are an archival series of documents; they can’t change, even by one character (see the diff between RFC7158 and RFC7159 for an example of this taken to the extreme; they got the year wrong 😉).
As a result, it’s important to know that you’re looking at the right document. The header contains a couple of bits of metadata that help here:
- Obsoletes: lists the RFCs that this document completely replaces; i.e., you should be using this document, not that one. Note that an old version of a protocol isn’t obsoleted when a newer one comes out; for example, HTTP/2 doesn’t obsolete HTTP/1.1, because it’s still legitimate (and necessary) to implement the older protocol. However, RFC7230 did obsolete RFC2616, because it’s the reference for that protocol.
- Updates: lists the RFCs that this document makes substantive changes to; in other words, if you’re reading that other document, you should probably read this one too.
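As a rough illustration of the header metadata described above, here is a minimal Python sketch that pulls the RFC number, Obsoletes, Updates and Category out of a banner. The function name `parse_rfc_header` and the assumption that the left-hand column is separated from the author column by two or more spaces are mine, not part of any RFC tooling.

```python
import re

# Banner copied from RFC 7230; column alignment is approximate.
BANNER = """\
Internet Engineering Task Force (IETF)                  R. Fielding, Ed.
Request for Comments: 7230                                         Adobe
Obsoletes: 2145, 2616                                    J. Reschke, Ed.
Updates: 2817, 2818                                           greenbytes
Category: Standards Track                                      June 2014
ISSN: 2070-1721
"""


def parse_rfc_header(banner: str) -> dict:
    """Extract a few metadata fields from the left column of an RFC banner."""
    fields = {"number": None, "obsoletes": [], "updates": [], "category": None}
    for line in banner.splitlines():
        # Assumption: the author column starts after a run of 2+ spaces.
        left = re.split(r"\s{2,}", line.strip(), maxsplit=1)[0]
        key, sep, value = left.partition(":")
        if not sep:
            continue  # no "Key: value" pair on this line
        key = key.strip().lower()
        value = value.strip()
        if key == "request for comments":
            match = re.match(r"\d+", value)
            if match:
                fields["number"] = int(match.group())
        elif key in ("obsoletes", "updates"):
            fields[key] = [int(n) for n in re.findall(r"\d+", value)]
        elif key == "category":
            fields["category"] = value
    return fields


info = parse_rfc_header(BANNER)
print(info["number"], info["obsoletes"], info["updates"], info["category"])
```

Note that this only tells you what the document itself replaces or updates; it cannot tell you whether a later RFC has obsoleted the one you are reading.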
Unfortunately, the ASCII text RFCs (e.g., at the RFC Editor site) don’t tell you what documents update or obsolete the document you’re currently looking at. This is why most people use the RFC repository at tools.ietf.org, which puts this information in a banner like this:
[Docs] [txt|pdf] [draft-ietf-http...] [Tracker] [Diff1] [Diff2] [Errata]

Obsoleted by: 7230, 7231, 7232, 7233, 7234, 7235          DRAFT STANDARD
Updated by: 2817, 5785, 6266, 6585                          Errata Exist
Each of the numbers on the tools page is a link, so you can easily find the current document.
Even the most current RFC often has issues. In the tools banner, you’ll also see a warning on the right that “Errata Exist” along with a link to Errata above it.
Errata are corrections and clarifications to the document that aren’t worthy of publishing a new RFC. Sometimes they can have a substantial impact on how the RFC is implemented (for example, if a bug in the spec led to a significant misinterpretation), so they’re worth going through.
For example, here are the errata for RFC7230. When reading errata, keep their status in mind; many are rejected because someone just misread the spec.
Understand context
It’s more common than you might think for a developer to look at a statement in an RFC, implement what they see, and do the opposite of what the authors intended.
This is because it’s extremely difficult to write a specification in a manner that can’t be misinterpreted when reading it selectively (as is the case with any holy text).
As a result, it’s necessary to read not only the directly relevant text but also (at a minimum) anything that it references, whether that’s in the same spec or a different one. In a pinch, reading any potentially related sections will help immensely, if you can’t read the whole document.
For example, HTTP message headers are defined to be separated by CRLF, but if you skip down here, you’ll see that “a recipient MAY recognize a single LF as a line terminator and ignore any preceding CR.” Obvious, right?
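To make the CRLF example concrete, here is a small Python sketch of a recipient that splits a header block on CRLF but also tolerates a bare LF, as the quoted sentence permits. The function name `split_header_lines` is hypothetical, and the sketch assumes the input holds only the header lines (not the blank line that ends the header section or the body).

```python
import re
from typing import List


def split_header_lines(raw: bytes) -> List[bytes]:
    """Split a block of header lines, accepting CRLF or a lone LF."""
    # An optional CR followed by LF matches both CRLF and a bare LF,
    # so a CR directly before the LF is ignored rather than kept.
    lines = re.split(rb"\r?\n", raw)
    # Drop the empty piece left behind by the final line terminator.
    return [line for line in lines if line]


print(split_header_lines(b"Host: example.com\r\nAccept: */*\nX-Test: 1\r\n"))
```

A developer who read only the message format definition would likely split strictly on CRLF; the more tolerant behaviour only becomes visible by reading the related section.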
It’s also important to keep in mind that many protocols set up IANA registries to manage their extension points; these, not the specifications, are the sources of truth. For example, the canonical list of HTTP methods is in this registry, not any of the HTTP specifications.
Interpreting rule text
Almost all RFCs have boilerplate that looks something like this near the top:
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
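One way to get a feel for how a specification uses these key words is to list them mechanically. The following Python sketch counts them in a piece of text, matching only the all-capitals forms, as the boilerplate requires. The helper name `count_keywords` is my own, and this is only a reading aid, not a conformance tool.

```python
import re
from collections import Counter

# Longer phrases come first so that "MUST NOT" is not counted as "MUST".
KEYWORDS = [
    "MUST NOT", "SHALL NOT", "SHOULD NOT", "NOT RECOMMENDED",
    "MUST", "REQUIRED", "SHALL", "SHOULD", "RECOMMENDED", "MAY", "OPTIONAL",
]

# Allow any whitespace inside a phrase, since RFC text wraps across lines.
PATTERN = re.compile(
    r"\b(?:" + "|".join(k.replace(" ", r"\s+") for k in KEYWORDS) + r")\b"
)


def count_keywords(text: str) -> Counter:
    """Count BCP 14 key words that appear in all capitals."""
    # Normalise internal whitespace so "MUST\nNOT" is counted as "MUST NOT".
    return Counter(" ".join(m.split()) for m in PATTERN.findall(text))


sample = (
    "The Foo message MUST NOT contain a Bar header. "
    "A recipient MAY log it; you should not rely on that."
)
print(count_keywords(sample))
```

The lowercase "should not" in the sample is deliberately left uncounted, because the key words carry their special meaning only in capitals.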
These RFC2119 keywords help define interoperability, but they also sometimes confuse developers. It’s very common to see a specification say something like:
The Foo message MUST NOT contain a Bar header.
This requirement is placed upon a protocol artefact, the “Foo message”. It’s pretty clear it needs to not contain a Bar header; if you include one, it won’t be a conformant message.
However, the behaviour of the recipient is much less clear; if you see a Foo message with a Bar header, what do you do?
Some developers will reject a message that contains it, even though the specification says nothing about doing so. Others will still process the message, but strip the Bar header, or ignore it, even when the spec explicitly says that all headers need to be processed.
All of these things can unintentionally cause interoperability issues. The correct thing to do is to follow normal processing for the header unless there’s a specific requirement to the contrary.
That’s because in general, specifications are written so that behaviours are overtly specified; in other words, everything that is not explicitly disallowed is allowed. Therefore, reading too much into specifications can unintentionally cause harm, since you’ll be introducing new behaviours that others will have to work around.
In an ideal world, the specification would be defined in terms of the behaviours of those who handle the message, like this:
Senders of the Foo message MUST NOT include a Bar header. Recipients
of a Foo message that includes a Bar header MUST ignore the Bar header,
but MUST NOT remove it.
Absent that, it’s best to look for more general advice about error handling elsewhere in the specification (e.g., HTTP’s Conformance and Error Handling sections).
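To show what the behaviour-oriented wording for the imaginary Foo message asks of a recipient, here is a minimal Python sketch. Foo and Bar are the article’s made-up names; the function `handle_foo` and the shape of its return value are assumptions for illustration only.

```python
def handle_foo(headers: dict) -> dict:
    """Process a received Foo message: ignore Bar, but do not remove it."""
    processed = {}
    for name, value in headers.items():
        if name.lower() == "bar":
            # "MUST ignore": Bar has no effect on how the message is processed.
            continue
        processed[name.lower()] = value
    # "MUST NOT remove": the copy kept for storing or forwarding still has Bar.
    return {"processed": processed, "forwarded": dict(headers)}


result = handle_foo({"Baz": "1", "Bar": "surprise"})
print(result["processed"])
print(result["forwarded"])
```

The message is neither rejected nor stripped; the recipient simply acts as if Bar were absent while leaving the message intact.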
Also, keep in mind the target of requirements; most specifications have a highly developed set of terms that they use to distinguish between different roles in the protocol.
For example, HTTP has proxies, which are a kind of intermediary, which implement both a client and a server (but not a User-Agent or an origin server); they need to pay attention to requirements targeted at all of those roles.
Likewise, HTTP distinguishes between “generating” a message and merely “forwarding” it in some requirements, depending on the specific situation. Paying attention to this kind of specific terminology can save you a lot of guesswork.
SHOULD
Yep, SHOULD deserves its own section. This wishy-washy term plagues many RFCs, despite efforts to eradicate it. RFC2119 describes it as:
SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
In practice, authors often use SHOULD and SHOULD NOT to mean “We’d like you to do this, but we know we can’t always require it.”
For example, in the overview of HTTP methods, we see:
When a request method is received that is unrecognized or not
implemented by an origin server, the origin server SHOULD respond
with the 501 (Not Implemented) status code.  When a request method
is received that is known by an origin server but not allowed for
the target resource, the origin server SHOULD respond with the 405
(Method Not Allowed) status code.
These SHOULDs are not MUSTs because the server might reasonably decide to take another action; if the request is from a client that is believed to be an attacker, it might drop the connection, or if HTTP authentication is required for the resource, it might enforce that with a 401 (Unauthorized) before getting to the 405.
SHOULD doesn’t mean that the server is free to ignore a requirement because it doesn’t feel like honouring it.
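The reasoning above can be sketched as a small decision function in Python. Everything here is hypothetical (the name `choose_status`, the `KNOWN_METHODS` set, the two flags); it only illustrates that a server departs from the SHOULD when it has one of the specific, defensible reasons mentioned, and otherwise follows it.

```python
from typing import Optional, Set

# Hypothetical set of methods this origin server has implemented.
KNOWN_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS"}


def choose_status(method: str, allowed: Set[str], authenticated: bool = True,
                  suspected_attacker: bool = False) -> Optional[int]:
    """Pick a status code; None means drop the connection without replying."""
    if suspected_attacker:
        return None  # a valid reason to set the SHOULD aside
    if method not in KNOWN_METHODS:
        return 501   # unrecognized or not implemented
    if not authenticated:
        return 401   # authentication is enforced before the method check
    if method not in allowed:
        return 405   # known, but not allowed for this resource
    return 200


print(choose_status("BREW", {"GET", "HEAD"}))
print(choose_status("DELETE", {"GET", "HEAD"}))
```

There is no branch for “the server didn’t feel like it”; each deviation corresponds to a particular circumstance that can be explained.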
Sometimes, we see a SHOULD that follows this form:
A sender that generates a message containing a payload body SHOULD
generate a Content-Type header field in that message unless the
intended media type of the enclosed representation is unknown to the
sender.
Notice the “unless”: it’s specifying the “particular circumstances” that the SHOULD allows. Arguably, it could be specified as a MUST, since the unless clause would still apply, but this style of specification is somewhat common.
Read the sample
Another very common pitfall is to skim the specification for examples, and implement what they do.
Unfortunately, examples typically get the least amount of attention from authors, since they need to be updated with each change to the protocol.
As a result, they’re very often the least reliable parts of the spec. Yes, the authors should absolutely double-check the examples before publication, but errors do slip through.
Also, even a perfect example might not be intended to illustrate the aspect of the protocol you’re looking for; they’re often truncated for brevity, or shown after a decoding step takes place.
Even though it takes more time, it’s better to read the actual text; examples are not the specification.
On ABNF
Augmented BNF is often used to define protocol artefacts. For example:
FooHeader = 1#foo
foo = 1*9DIGIT [ ";" "bar" ]
Once you get used to it, ABNF offers an easy-to-understand sketch of what protocol elements should look like.
However, ABNF is “aspirational” – it identifies an ideal form for a message, and those messages that you generate really need to match it. It doesn’t specify what to do with received messages that fail to match it. In fact, many specifications fail to say what the relationship of ABNF is to processing requirements at all.
Most protocols will fail badly if you try to enforce their ABNF strictly, but sometimes it matters. In the example above, whitespace isn’t allowed around the semicolon, but you can bet that some people will put it there, and some implementations will accept it.
So, make sure you read the text around the ABNF for additional requirements or context, and realise that absent a direct requirement, you may have to adjust parsing to be more accepting of input than the ABNF implies.
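Here is a short Python sketch of the difference between generating to the ABNF and parsing more generously, using the `foo` rule from the example above. The regular expressions are my own translation of that rule, and tolerating spaces or tabs around the semicolon is an assumption about what a lenient recipient might choose, not something the ABNF states.

```python
import re

# Strict form of: foo = 1*9DIGIT [ ";" "bar" ]
# ABNF literal strings are case-insensitive, hence re.IGNORECASE.
STRICT_FOO = re.compile(r"[0-9]{1,9}(?:;bar)?", re.IGNORECASE)

# Lenient form: also accept optional spaces or tabs around the semicolon.
LENIENT_FOO = re.compile(r"[0-9]{1,9}(?:[ \t]*;[ \t]*bar)?", re.IGNORECASE)


def is_valid_to_send(value: str) -> bool:
    """What you generate should match the ABNF exactly."""
    return STRICT_FOO.fullmatch(value) is not None


def is_acceptable_to_receive(value: str) -> bool:
    """What you accept may need to be somewhat more forgiving."""
    return LENIENT_FOO.fullmatch(value) is not None


for candidate in ["123;bar", "123 ; bar", "1234567890"]:
    print(candidate, is_valid_to_send(candidate), is_acceptable_to_receive(candidate))
```

How far to relax parsing is a judgment call that depends on the surrounding text of the specification; the ten-digit value is rejected by both forms because nothing suggests it should be tolerated.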
Some specifications are starting to acknowledge the aspirational nature of ABNF and specify explicit parsing algorithms that incorporate error handling. When specified, these should be followed exactly, to ensure interoperability.
Security issues
Ever since RFC3552, the RFC boilerplate has included a “Security Considerations” section.
As a result, it’s rare for an RFC to be published without a substantial section on security; the review process does not allow a draft to just say “There are no security considerations for this protocol”.
So, it pays to read and make sure you understand the Security Considerations section, whether you’re implementing or deploying the protocol; if you don’t, it’s very likely that something will bite you down the road.
Following its references (if any) is also a good idea. If there aren’t any, try looking up some of the terms used to get an appreciation of the issues being discussed.
For more information
If an RFC doesn’t answer your question, or you’re not sure about the intent of its text, the best thing to do is to find the most relevant Working Group and ask a question on their mailing list. If there isn’t an active working group covering the topic in question, try the mailing list for the appropriate area.
Filing an erratum is usually not the first step you should take; talk to someone first.
Many Working Groups are now using GitHub for managing their specifications; if you have a question about an active specification, go ahead and file an issue. If it’s already an RFC, it’s usually best to use the mailing list, unless you find directions to the contrary.
I’m sure there’s more to write about how to read RFCs, and some will dispute what I’ve written here, but this is how I think about them. I hope it was useful.