The most detailed CSS character escape handling

CSS character escape sequences, by Mathias, 12th July 2010

When writing CSS for [markup with weird `class` or `id` attribute values](https://mathiasbynens.be/notes/html5-id-class), you need to [consider](https://www.w3.org/TR/CSS21/syndata.html#characters) some [rules](https://www.w3.org/International/questions/qa-escapes#cssescapes). For example, you can’t just use `## { color: #f00; }` to target the element with `id="#"`. Instead, you’ll have to escape the weird characters (in this case, the second `#`). Doing so cancels the special meaning of CSS characters in identifiers. It also lets you refer to characters you cannot easily type out, like crazy Unicode symbols.

There are some other cases where you might want or need to escape a character in CSS. For example, you could be writing a selector for a funky ID, class, attribute or attribute value. Or maybe you want to insert some weird characters using the `content` property without changing your CSS file’s character encoding.

Identifiers and strings in CSS

[The spec](http://dev.w3.org/csswg/css-syntax/#ident-token-diagram) defines *identifiers* using a token diagram. They may contain the symbols from `a` to `z`, from `A` to `Z`, from `0` to `9`, underscores (`_`), hyphens (`-`), non-ASCII symbols or escape sequences for any symbol. They cannot start with a digit, or a hyphen (`-`) followed by a digit. Identifiers require at least one symbol (i.e. the empty string is not a valid identifier).

The grammar for identifiers is used for various things throughout the specification, including element names, class names, and IDs in selectors.
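
The identifier rules above can be expressed as a quick check for whether a value can be written as-is, without any escapes. This is a minimal sketch. The function name `isPlainIdentifier` is hypothetical, and the check only encodes the rules listed in this section: the allowed characters, no leading digit or hyphen-plus-digit, and a non-empty value. It ignores escape sequences and any further edge cases the spec may define.

```javascript
// Hypothetical helper: checks only the identifier rules described above.
// Escape sequences are NOT recognized; any backslash makes this return false.
function isPlainIdentifier(value) {
  // (?!-?[0-9])                     -> no leading digit, no leading "-" + digit
  // [-A-Za-z0-9_\u{80}-\u{10FFFF}]+ -> one or more allowed characters ("" fails)
  return /^(?!-?[0-9])[-A-Za-z0-9_\u{80}-\u{10FFFF}]+$/u.test(value);
}

console.log(isPlainIdentifier('-a-b-c-'));  // true
console.log(isPlainIdentifier('1a2b3c'));   // false: starts with a digit
console.log(isPlainIdentifier('#fake-id')); // false: '#' needs escaping
```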

The spec definition for strings says that strings can either be written with double quotes or with single quotes. Double quotes cannot occur inside double quotes, unless escaped (e.g., as '\"' or as '\22'). The same goes for single quotes (e.g., "\'" or "\27"). A string cannot directly contain a newline. To include a newline in a string, use an escape sequence representing the line feed character (U+000A), such as "\A" or "\00000a". Newlines can also be represented as "\D \A " (CRLF), "\D " (i.e. \r in other languages), or "\C " (i.e. \f in other languages). It’s possible to break strings over several lines, for aesthetic or other reasons, but in such a case the newline itself has to be escaped with a backslash (\).
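
To make the string rules concrete, here is a minimal sketch that wraps arbitrary text in double quotes, for example for use in the `content` property. The function name `toCssString` is hypothetical. The sketch backslash-escapes backslashes and double quotes. It turns line feeds, carriage returns and form feeds into `\A `, `\D ` and `\C `, where the trailing space ends the hex escape. A CRLF pair therefore comes out as `\D \A `.

```javascript
// Hypothetical helper: turn arbitrary text into a double-quoted CSS string.
function toCssString(text) {
  const body = text
    .replace(/\\/g, '\\\\')   // escape literal backslashes first
    .replace(/"/g, '\\"')     // double quotes cannot appear unescaped inside "..."
    .replace(/\n/g, '\\A ')   // line feed  -> \A  (space terminates the escape)
    .replace(/\r/g, '\\D ')   // carriage return -> \D
    .replace(/\f/g, '\\C ');  // form feed -> \C
  return '"' + body + '"';
}

console.log(toCssString('say "hi"\nbye')); // "say \"hi\"\A bye"
```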

As you can see, character escapes are allowed in both identifiers and strings. So, let’s find out how these escape sequences work!

How to escape in CSS

Here’s a ~~simple~~ list of rules you should keep in mind when escaping a character in CSS. If you’re writing a selector for a given class name or ID, the strict syntax for identifiers applies. If you’re using a string in CSS, you’ll only ever need to escape quotes or newline characters.

Leading digits

If the first character of an identifier is numeric, you’ll need to escape it based on its Unicode code point. For example, the code point for the character 1 is U+0031, so you would escape it as \000031 or \31.

Basically, to escape a leading digit, just prefix it with `\3` and append a space character ( ). Yay Unicode!
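
Here is what that looks like in code. This minimal sketch handles only the leading-digit case, and the name `escapeLeadingDigit` is hypothetical. It rewrites the first character as its hex code point followed by a terminating space. For the digits `0` to `9`, that is exactly `\3` plus the digit plus a space.

```javascript
// Hypothetical helper: escape a leading ASCII digit by its code point.
function escapeLeadingDigit(ident) {
  const first = ident.charAt(0);
  if (first < '0' || first > '9') return ident; // no leading digit: unchanged
  // '1' is U+0031, so its hex code point is "31". The trailing space ends
  // the hex escape so that following hex-like letters are not absorbed.
  const hex = first.codePointAt(0).toString(16); // always "30".."39"
  return '\\' + hex + ' ' + ident.slice(1);
}

console.log(escapeLeadingDigit('1a2b3c')); // \31 a2b3c
```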

Special characters

Any character that is not a hexadecimal digit, line feed, carriage return, or form feed can be escaped with a backslash to remove its special meaning.

The following characters have a special meaning in CSS: `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, `*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`, `[`, `\`, `]`, `^`, `` ` ``, `{`, `|`, `}`, and `~`.

There are two options if you want to use them. The first is to use the Unicode code point. For example, the plus sign (`+`) is U+002B, so if you wanted to use it in a CSS selector, you would escape it as `\2b ` (note the space character at the end) or `\00002b` (using exactly six hexadecimal digits).

The second option is far more elegant though: just escape the character using a backslash (\), e.g. + would escape into \+.

Theoretically, the `:` character can be escaped as `\:`, but IE < 8 doesn’t recognize that escape sequence correctly. A workaround is to use `\3A ` instead.
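
The backslash option is easy to automate. Below is a minimal sketch; the names `escapeSpecialChars` and `legacyColon` are hypothetical. It backslash-escapes every character from the list above except `-`, which is already allowed inside identifiers. The leading-hyphen and leading-digit edge cases are deliberately not handled. The optional `legacyColon` flag applies the `\3A ` workaround for old IE.

```javascript
// Special characters from the list above, minus '-' (valid inside identifiers).
const SPECIAL = /[!"#$%&'()*+,.\/:;<=>?@[\\\]^`{|}~]/g;

// Hypothetical helper: backslash-escape each special character.
// With legacyColon, ':' becomes the hex escape "\3A " (with its space) for IE < 8.
function escapeSpecialChars(value, { legacyColon = false } = {}) {
  return value.replace(SPECIAL, (ch) =>
    ch === ':' && legacyColon ? '\\3A ' : '\\' + ch
  );
}

console.log(escapeSpecialChars('a+b'));      // a\+b
console.log(escapeSpecialChars('#fake-id')); // \#fake-id
```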

Whitespace characters

Whitespace — even some characters that are technically invalid in HTML attribute values — can be escaped as well.

Any characters matching `[\t\n\v\f\r]` need to be escaped based on their Unicode code points. The space character ( ) can simply be backslashed (`\ `). Other whitespace characters don’t need to be escaped.
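
A minimal sketch of these two whitespace rules follows; the name `escapeWhitespace` is hypothetical. A plain space gets a backslash. The control characters `\t`, `\n`, `\v`, `\f` and `\r` get their hex code point plus a terminating space.

```javascript
// Hypothetical helper: escape whitespace per the rules above.
function escapeWhitespace(value) {
  return value.replace(/[\t\n\v\f\r ]/g, (ch) =>
    ch === ' '
      ? '\\ '                                       // space: just backslash it
      : '\\' + ch.codePointAt(0).toString(16) + ' ' // others: hex code point + space
  );
}

console.log(escapeWhitespace('a b'));  // a\ b
console.log(escapeWhitespace('a\tb')); // a\9 b
```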

Underscores

CSS doesn’t require you to escape underscores (`_`), but if one appears at the start of an identifier, I’d recommend escaping it anyway to prevent IE6 from ignoring the rule altogether.

Other Unicode characters

Other than that, characters that can’t possibly convey any meaning in CSS (e.g. `♥`) can and **should** just be used unescaped.

In theory (as per the spec), any character can be escaped based on its Unicode code point as explained above. For example, 𝌆, the U+1D306 “tetragram for centre” symbol, becomes `\1d306 ` or `\01d306`. However, older WebKit browsers don’t support this syntax for characters outside the BMP (fixed in April 2012).

Because of browser bugs, there is another (non-standard) way to escape these characters, namely by breaking them up into UTF-16 code units (e.g. `\d834\df06`), but this syntax (rightfully) isn’t supported in Gecko and Opera 12.

Since there is currently no way to escape non-BMP symbols in a cross-browser fashion without breaking backwards compatibility with older browsers, it’s best to just use these characters unescaped.
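
The sketch below shows where the two forms mentioned above come from; the function names are hypothetical. It derives the standard code point escape and the non-standard UTF-16 pair for 𝌆 using `codePointAt()` and `charCodeAt()`. It is for illustration only; as recommended above, writing such characters unescaped is still the safer choice.

```javascript
// Standard (spec) form: one hex escape for the whole code point.
function codePointEscape(ch) {
  return '\\' + ch.codePointAt(0).toString(16) + ' ';
}

// Non-standard form: one hex escape per UTF-16 code unit (surrogate halves).
// Shown only to explain the syntax; Gecko and Opera 12 do not support it.
function utf16Escape(ch) {
  let out = '';
  for (let i = 0; i < ch.length; i++) {
    out += '\\' + ch.charCodeAt(i).toString(16);
  }
  return out + ' ';
}

const tetragram = '\u{1D306}'; // 𝌆
console.log(codePointEscape(tetragram)); // \1d306 (plus a trailing space)
console.log(utf16Escape(tetragram));     // \d834\df06 (plus a trailing space)
```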

Trailing whitespace after a hexadecimal escape sequence

Any U+0020 space characters immediately following a hexadecimal escape sequence are automatically [consumed by the escape sequence](http://dev.w3.org/csswg/css-syntax/#consume-escaped-code-point). For example, to escape the text `foo © bar`, you would have to use `foo \A9 bar`, with two space characters following `\A9`. The first space character gets swallowed; only the second one is preserved.

The space character following a hexadecimal escape sequence can only be omitted if the next character is neither another space character nor a hexadecimal digit. For example, `foo©bar` becomes `foo\A9 bar`, but `foo©qux` could be written as `foo\A9qux`.
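
Here is a minimal sketch of that decision; the name `escapeNonAscii` is hypothetical. It hex-escapes every non-ASCII character and adds the terminating space only when the next character is whitespace or a hex digit. The linked spec algorithm consumes a tab or newline after the escape too, so the sketch treats those like a space. At the end of the input it cannot see what follows, so it adds the space to be safe.

```javascript
// Hypothetical helper: hex-escape non-ASCII characters, adding the
// terminating space only when the next character would otherwise be misread.
function escapeNonAscii(text) {
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  let out = '';
  chars.forEach((ch, i) => {
    const cp = ch.codePointAt(0);
    if (cp < 0x80) {
      out += ch; // ASCII: copy unchanged
      return;
    }
    const next = chars[i + 1];
    // Whitespace would be swallowed; a hex digit would extend the escape.
    const needsSpace =
      next === undefined || /^[ \t\n\r\f0-9A-Fa-f]$/.test(next);
    out += '\\' + cp.toString(16).toUpperCase() + (needsSpace ? ' ' : '');
  });
  return out;
}

console.log(escapeNonAscii('foo©bar')); // foo\A9 bar
console.log(escapeNonAscii('foo©qux')); // foo\A9qux
```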

Examples

Here are some random examples:

```css
.\3A \`\( { } /* matches the element with class=":`(" */
.\31 a2b3c { } /* matches the element with class="1a2b3c" */
#\#fake-id { } /* matches the element with id="#fake-id" */
#-a-b-c- { } /* matches the element with id="-a-b-c-" */
#© { } /* matches the element with id="©" */
```

Check out the demo page for this post (@id and @class in HTML5) to see more.

… What about in JS?

In JavaScript, it depends.

document.getElementById() and similar functions like document.getElementsByClassName() can just use the unescaped attribute value, the way it’s used in the HTML. Of course, you would have to escape any quotes so that you still end up with a valid JavaScript string.

On the other hand, if you were to use these selectors with the Selectors API (i.e. `document.querySelector()` and `document.querySelectorAll()`) or libraries that rely on the same syntax (e.g. jQuery/Sizzle), you would have to take the escaped CSS selectors and escape them again. All you really have to do is double every backslash in the CSS selector (and of course escape the quotes, where necessary):

```html
<!-- HTML -->
<p class=":`("></p>
```

```css
/* CSS */
.\3A \`\( { }
```

```javascript
/* JavaScript */
document.getElementsByClassName(':`(');
document.querySelectorAll('.\\3A \\`\\(');
```
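
The "escape it again" step can be automated too. This minimal sketch uses the hypothetical name `toJsStringLiteral`. It takes an already-escaped CSS selector, doubles every backslash, escapes single quotes and wraps the result in single quotes. Backslashes are handled first so that the backslashes added for the quotes are not doubled again.

```javascript
// Hypothetical helper: wrap an already-escaped CSS selector in a
// single-quoted JavaScript string literal, e.g. for querySelectorAll().
function toJsStringLiteral(cssSelector) {
  const body = cssSelector
    .replace(/\\/g, '\\\\') // double every backslash first
    .replace(/'/g, "\\'");  // then escape the quote we wrap with
  return "'" + body + "'";
}

// The CSS selector .\3A \`\( becomes the literal '.\\3A \\`\\('
console.log(toJsStringLiteral('.\\3A \\`\\('));
```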

CSS escape tool

Remembering all these rules sure sounds like fun, but to make life a little easier I created a simple CSS escaper tool that does all the hard work for you.

Just enter a value and it will tell you how to escape it in CSS and JavaScript, based on the rules above. It uses an `id` attribute in its example, but of course you could use the same escaped string for `class` attribute values or the `content` property. Enjoy!

Need to escape text for use in CSS strings or identifiers? I’ve packaged the code that powers this tool as an open-source JavaScript library named cssesc. Check it out!

Update: The CSS Object Model specification now defines a `CSS.escape()` method that can be used to perform this kind of escaping. I made a shim for it.
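
Where `CSS.escape()` is available, use it. To give a rough idea of what it does, here is a minimal sketch of the CSSOM "serialize an identifier" steps as I read them. The name `serializeIdentifier` is hypothetical, and this is not the shim mentioned above. The sketch works on UTF-16 code units, and its output can be compared with the examples earlier in this post.

```javascript
// Hypothetical sketch of the CSSOM "serialize an identifier" steps,
// operating on UTF-16 code units. Prefer the built-in CSS.escape().
function serializeIdentifier(value) {
  const str = String(value);
  const isDigit = (c) => c >= 0x30 && c <= 0x39;
  let result = '';
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    if (c === 0x00) {
      result += '\uFFFD'; // NULL becomes the replacement character
    } else if (
      (c >= 0x01 && c <= 0x1f) || c === 0x7f ||               // control chars
      (i === 0 && isDigit(c)) ||                              // leading digit
      (i === 1 && isDigit(c) && str.charCodeAt(0) === 0x2d)   // "-" then digit
    ) {
      result += '\\' + c.toString(16) + ' '; // escape as code point
    } else if (i === 0 && c === 0x2d && str.length === 1) {
      result += '\\-'; // a lone "-" gets a backslash
    } else if (
      c >= 0x80 || c === 0x2d || c === 0x5f || isDigit(c) ||
      (c >= 0x41 && c <= 0x5a) || (c >= 0x61 && c <= 0x7a)
    ) {
      result += str.charAt(i); // safe as-is
    } else {
      result += '\\' + str.charAt(i); // backslash-escape everything else
    }
  }
  return result;
}

console.log(serializeIdentifier('1a2b3c'));   // \31 a2b3c
console.log(serializeIdentifier('#fake-id')); // \#fake-id
```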
