The translator is a front-end engineer at the Strange Dance Company
Original title: CSS for internationalization
Original author: Chen Hui Jing
Original address: chenhuijing.com/blog/css-fo…
I’ve met people who don’t think CSS has anything to do with internationalization at all. But if you think about it, internationalization is more than translating the content on your site into multiple languages and calling it a day. There are nuances in how that content is presented that affect a native speaker’s experience of using your site.
There is no single canonical definition of internationalization, but the W3C provides the following guidance:
“Internationalization is the design and development of a product, application, or document that can be easily localized for target audiences in different cultures, regions, or languages.”
From the use of Unicode and character encodings, to the technical implementation that serves translated content, to the presentation of that content, there is a lot of ground to cover. Today, I’ll discuss only the CSS-related aspects of multilingual support.
CSS describes the presentation of a web page by telling the browser how elements on the page should be styled and laid out. On multilingual pages that use CSS, there are several ways to apply different styles to different languages.
In addition, there are CSS properties that provide layout and typesetting capabilities for scripts and writing systems beyond the horizontal, top-to-bottom, Latin-based ones that predominate on the web today.
So, brace yourself, because this could be quite a long article. ¯\_(ツ)_/¯
Language-specific styles
Have you ever wondered how Chrome knows to ask whether you want to translate the content of a web page? No? Maybe I’m the only one who pays attention to these things. It is because of the lang attribute on the html element.
The lang attribute is a very important one because it identifies the language of text content on the web, and this information is used in many places: the built-in translation in Chrome mentioned above, search engines looking for language-specific content, and screen readers.
Screen readers may not come to mind unless you use one yourself or know someone who does. They use the language information to read out content with the appropriate accent and correct pronunciation.
The key to language-specific styling is the appropriate use of the lang attribute in the page markup. The lang attribute takes an ISO 639 [1] language code as its value.
Update: Tobias Bengfort [2] points out that the lang attribute uses an IETF specification called BCP 47 [3], which is largely based on the ISO 639 standard.
In most cases, you’ll use a two-letter code like zh for Chinese, but Chinese (like some other languages, such as Arabic) is considered a macrolanguage, made up of many languages that have more specific primary language subtags.
For an in-depth explanation of how to construct language tags, see Language tags in HTML and XML [4].
A general guideline is that the html element must always have a lang attribute set, which is then inherited by all other elements.
<html lang="zh">
It is not uncommon to see content in different languages on the same page. In this case, wrap the content in a span or div and apply the correct lang attribute to the wrapping element.
<p>The fourth animal in the Chinese Zodiac is Rabbit (<span lang="zh">兔</span>).</p>
Now that we have that sorted out, the following techniques assume that the lang attribute has been implemented responsibly.
:lang() pseudo-class selector
It turns out that the :lang() pseudo-class selector isn’t that well known.
But this pseudo-class selector is pretty cool because it recognizes the language of the content, even if that language is declared outside of the element.
For example, a line of markup containing two languages looks like this:
We use <em>italics</em> to emphasise words in English, <span lang="zh">但是中文则用<em>着重号</em></span>.
The following styles can be used:
em:lang(zh) {
  font-style: normal;
  text-emphasis: dot;
}
If your browser supports the text-emphasis CSS property, you should see emphasis marks (typographic symbols traditionally used to emphasize a run of East Asian text) added to each Chinese character inside the em element. Chrome needs the -webkit- prefix.
We use italics to emphasize words in English, but in Chinese we use emphasis marks.
But crucially, the lang attribute is not on the em element itself but on its parent, and the pseudo-class still matches. If we used the more common attribute selector, such as [lang="zh"], the attribute would have to be on the em element itself for the rule to take effect.
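To make the difference concrete, here is a minimal sketch that puts the two selectors side by side. It assumes markup like the line above, where lang="zh" sits on the span and the nested em carries no lang attribute of its own.

```css
/* Assumed markup: <span lang="zh">...<em>...</em>...</span> */

/* Matches the nested em: :lang() looks at the content language of the
   element, which is inherited from the surrounding span. */
em:lang(zh) {
  font-style: normal;
  -webkit-text-emphasis: dot; /* prefixed form for browsers that still need it */
  text-emphasis: dot;
}

/* Does NOT match the nested em: an attribute selector only sees
   attributes written on the element itself. */
em[lang="zh"] {
  font-style: normal;
}
```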
Using attribute selectors
This brings us to our next technique: attribute selectors. These let us select elements that have a specific attribute, or an attribute with a specific value. (Shameless plug: to learn more about attribute selectors, try my own Codrops CSS reference entry [5].)
There are seven ways to match with attribute selectors, but I’ll only discuss the ones I think are most relevant to matching the lang attribute. All of my examples use Chinese as the target language, so zh and its variants.
Update: Amelia Bellamy-Royds [6] points out that my examples make the attribute selector seem necessary for partial language tag matching, but the :lang() pseudo-class already covers this use case.
First, we can match the lang attribute value exactly with the following syntax:
[lang="zh"]
/* will match only zh */
As I mentioned earlier, Chinese is considered a macrolanguage, which means its language tags can carry additional detail, such as the script subtags Hans or Hant (the W3C advises using script subtags only when they are needed to make a distinction), the region subtags HK or TW, and so on.
The point is, language tags can be longer than two letters. But the most general category always comes first, so to match attribute values that begin with a particular string, we use the following syntax involving ^:
[lang^="zh"]
/* will match zh, zh-HK, zh-Hans, zhong, zh123...
 * basically anything with zh as the first two characters */
There is another syntax involving |, which matches when the attribute value is exactly the given value, or begins with the given value immediately followed by a hyphen (-). It seems made just for matching language subtags, doesn’t it?
[lang|="zh"]
/* will match zh, zh-HK, zh-Hans, zh-123, and so on */
Remember that for an attribute selector, the attribute must be on the element you want to style; it will not match if the attribute is on a parent or ancestor. Note that some of the language tag matching examples I presented can already be done with the :lang() pseudo-class.
In other words, in addition to lang="en", :lang(en) will also match lang="en-US", lang="en-GB", and so on. I’ll update those examples when I can think of better ones. In the meantime, do use the :lang() pseudo-class.
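As a rough sketch of that prefix matching with :lang(), the rules below go from general to specific. The font names are placeholders chosen for illustration, not a recommendation from this article.

```css
/* Matches lang="zh" as well as lang="zh-Hans", lang="zh-HK", and so on,
   including descendants that inherit the language. */
:lang(zh) {
  font-family: "Example Sans SC", sans-serif; /* hypothetical font name */
}

/* Matches lang="zh-Hant" and longer tags such as lang="zh-Hant-TW",
   but not lang="zh-TW", because the tag must start with "zh-Hant". */
:lang(zh-Hant) {
  font-family: "Example Sans TC", sans-serif; /* hypothetical font name */
}
```

Both selectors have the same specificity, so the more specific rule wins here only because it comes later in the stylesheet.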
How about a normal class or ID?
Yes, you can use a normal class or ID, although you would not be taking advantage of the attributes that are already on your elements. (Again, my assumption is that the lang attribute is applied correctly and responsibly.) But of course, go ahead and add class names to the elements that need language-specific styles. If you really want to, no one will stop you.
CSS properties
OK, selectors are covered. Let’s talk about the styles we want to apply to the elements that match these selectors.
Writing mode
The default value of writing-mode is horizontal-tb. Perfectly logical, since the web was born at CERN, where the official languages are English and French. Also, I think most of the web technology was pioneered in English-speaking countries.
But the wonder of humanity has given us more than 3,000 written languages, and not all of them run horizontally from top to bottom.
Traditional Mongolian is written vertically from left to right, while East Asian languages such as Japanese, Chinese, and Korean can be written vertically from right to left. The writing-mode values that let you do this are vertical-lr and vertical-rl, respectively.
There are also the values sideways-lr and sideways-rl, which rotate the glyphs sideways. Each Unicode character has a vertical orientation property that tells the rendering engine how the glyph should be oriented by default.
We can use the text-orientation property to change the orientation of characters. This usually comes into play when vertically set East Asian text is interspersed with Latin-based words or characters. For abbreviations, you can use text-combine-upright to squeeze the letters into the space of a single character.
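Here is a minimal sketch of how these properties might be combined for vertically set text. The class names (.vertical-text, .upright, .tcy, .mongolian) are hypothetical and only serve the illustration.

```css
/* Vertical text with lines progressing from right to left. */
.vertical-text {
  writing-mode: vertical-rl;
  text-orientation: mixed; /* the initial value: CJK stays upright, Latin is rotated */
}

/* Keep a short Latin run upright, one letter stacked above the next. */
.vertical-text .upright {
  text-orientation: upright;
}

/* Squeeze an abbreviation or a two-digit number into one character space. */
.vertical-text .tcy {
  text-combine-upright: all;
}

/* Traditional Mongolian: vertical lines progressing from left to right. */
.mongolian {
  writing-mode: vertical-lr;
}
```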
Some people might wonder about right-to-left languages such as Arabic, Hebrew, or Persian (to name a few), and whether CSS applies to these scripts as well. In short, CSS should not be used for bidirectional styling. The W3C guidance is as follows:
Because directionality is an integral part of the document structure, markup should be used to set the directionality of a document or chunk of information, or to identify places in the text where the Unicode bidirectional algorithm alone is insufficient to achieve the desired directionality.
This is because styles applied via CSS can be turned off, overridden, go unrecognized, or be changed or replaced in a different context. Instead, it is recommended to use the dir attribute to set the base direction of the displayed text.
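One way to read that guidance is sketched below: direction lives in the markup, and CSS only hooks onto it for purely visual adjustments. The markup in the comment and the .next-icon class are assumptions made for this example.

```css
/* Assumed markup (not from this article): <html lang="ar" dir="rtl"> ... */

/* Avoid setting the base direction in a stylesheet:
   body { direction: rtl; }
   The direction would be lost if the CSS is disabled or overridden. */

/* Instead, select on the dir attribute for visual tweaks only. */
[dir="rtl"] .next-icon {
  transform: scaleX(-1); /* mirror a directional arrow for right-to-left content */
}
```

Note that this descendant selector also matches inside a nested dir="ltr" element, so real pages may need a more careful selector.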
I strongly recommend reading Structural markup and right-to-left text in HTML [7], CSS vs. markup for bidi support [8], and Inline markup and bidirectional text in HTML [9] for more detailed explanations and implementation details.
Logical properties
Everything on a web page is a box, and CSS has always used the physical directions top, bottom, left, and right to indicate which side of a box we are targeting. However, these values can be confusing when the writing-mode is not the default horizontal top-to-bottom direction.
Because the specification is still in draft state, the syntax may continue to change. Even now, the current browser implementation is different from the specification, so be sure to double-check the latest syntax with MDN: CSS logical properties and values [10].
Update: David Baron points out that I was using an older syntax from a previous version of the specification, and that the syntax implemented in browsers is actually the one in the Editor’s Draft. The table has been updated accordingly.
The matrix of writing directions and the corresponding values for the physical and logical sides of a box used for positioning is as follows (the table is taken from the specification as of the time of writing):
The logical top of a container uses inset-block-start, and the logical bottom uses inset-block-end. The logical left of a container uses inset-inline-start, while the logical right uses inset-inline-end.
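As a small sketch of the positioning properties, the rule below pins a badge to the block-start, inline-end corner of its container. The class names .card and .badge are hypothetical.

```css
.card {
  position: relative;
}

.badge {
  position: absolute;
  inset-block-start: 0; /* top in horizontal-tb; right edge in vertical-rl */
  inset-inline-end: 0;  /* right in left-to-right text; left in right-to-left text */
}
```

In a horizontal left-to-right page the badge lands in the top-right corner; with dir="rtl" it moves to the top-left without any change to the CSS.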
There are also corresponding mappings for borders, margins, and padding:
- top to block-start
- right to inline-end
- bottom to block-end
- left to inline-start
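Applied to margins and padding, the mapping above might look like the following sketch. The .note class is hypothetical, and the comments describe the physical side each property resolves to in a horizontal writing mode.

```css
.note {
  margin-block-end: 1em;           /* margin-bottom in horizontal-tb */
  padding-inline-start: 1.5em;     /* padding-left in ltr, padding-right in rtl */
  border-inline-start: 4px solid;  /* border-left in ltr, border-right in rtl */
}
```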
<h1>A comparison of physical and logical directions for borders</h1>
<p>
  Given the requirement is to have a box with a run of text within it with the following characteristics:
</p>
<ol>
  <li>The border colour at the top edge <strong>of the run of text</strong> should be red.</li>
  <li>The border colour at the right edge <strong>of the run of text</strong> should be green.</li>
  <li>The border colour at the bottom edge <strong>of the run of text</strong> should be blue.</li>
  <li>The border colour at the left edge <strong>of the run of text</strong> should be yellow.</li>
</ol>
<p>
  Using physical directions requires a modification every time the writing direction changes, whereas using logical properties allows the same properties and values for all six use cases.
</p>
<hr />
<section>
  <h1>Physical directions</h1>
  <div class="phy-boxes">
    <article>
      <div class="phy-box1">
        <p>This is a sentence.</p>
      </div>
      <pre><code>border-top-color: tomato;
border-right-color: limegreen;
border-bottom-color: dodgerblue;
border-left-color: gold;
</code></pre>
    </article>
    <article>
      <div class="phy-box2" dir="rtl">
        <p>This is a sentence.</p>
      </div>
      <pre><code>border-top-color: tomato;
border-left-color: limegreen;
border-bottom-color: dodgerblue;
border-right-color: gold;
</code></pre>
    </article>
    <article>
      <div class="vlr phy-box3">
        <p>This is a sentence.</p>
      </div>
      <pre><code>border-left-color: tomato;
border-bottom-color: limegreen;
border-right-color: dodgerblue;
border-top-color: gold;
</code></pre>
    </article>
    <article>
      <div class="vlr phy-box4" dir="rtl">
        <p>This is a sentence.</p>
      </div>
      <pre><code>border-left-color: tomato;
border-top-color: limegreen;
border-right-color: dodgerblue;
border-bottom-color: gold;
</code></pre>
    </article>
    <article>
      <div class="vrl phy-box5">
        <p>This is a sentence.</p>
      </div>
      <pre><code>border-right-color: tomato;
border-bottom-color: limegreen;
border-left-color: dodgerblue;
border-top-color: gold;
</code></pre>
    </article>
    <article>
      <div class="vrl phy-box6" dir="rtl">
        <p>This is a sentence.</p>
      </div>
      <pre><code>border-right-color: tomato;
border-top-color: limegreen;
border-left-color: dodgerblue;
border-bottom-color: gold;
</code></pre>
    </article>
  </div>
</section>
<hr />
<section>
  <h1>Logical directions</h1>
  <div class="log-boxes">
    <div class="log-box">
      <p>This is a sentence.</p>
    </div>
    <div class="log-box" dir="rtl">
      <p>This is a sentence.</p>
    </div>
    <div class="vlr log-box">
      <p>This is a sentence.</p>
    </div>
    <div class="vlr log-box" dir="rtl">
      <p>This is a sentence.</p>
    </div>
    <div class="vrl log-box">
      <p>This is a sentence.</p>
    </div>
    <div class="vrl log-box" dir="rtl">
      <p>This is a sentence.</p>
    </div>
  </div>
  <pre><code>border-block-start-color: tomato;
border-inline-end-color: limegreen;
border-block-end-color: dodgerblue;
border-inline-start-color: gold;
</code></pre>
</section>
[class$="boxes"] {
  display: flex;
  flex-wrap: wrap;
  justify-content: space-around;
  gap: 1em;
}
article {
  margin-bottom: 1em;
}
article > div,
[class$="box"] {
  width: 200px;
  height: 200px;
  border: 1em solid;
  position: relative;
  margin: 1em;
}
.phy-box1 {
  border-top-color: tomato;
  border-right-color: limegreen;
  border-bottom-color: dodgerblue;
  border-left-color: gold;
}
.phy-box2 {
  border-top-color: tomato;
  border-left-color: limegreen;
  border-bottom-color: dodgerblue;
  border-right-color: gold;
}
.phy-box3 {
  border-left-color: tomato;
  border-bottom-color: limegreen;
  border-right-color: dodgerblue;
  border-top-color: gold;
}
.phy-box4 {
  border-left-color: tomato;
  border-top-color: limegreen;
  border-right-color: dodgerblue;
  border-bottom-color: gold;
}
.phy-box5 {
  border-right-color: tomato;
  border-bottom-color: limegreen;
  border-left-color: dodgerblue;
  border-top-color: gold;
}
.phy-box6 {
  border-right-color: tomato;
  border-top-color: limegreen;
  border-left-color: dodgerblue;
  border-bottom-color: gold;
}
.log-box {
  border-block-start-color: tomato;
  border-inline-end-color: limegreen;
  border-block-end-color: dodgerblue;
  border-inline-start-color: gold;
}
.vlr {
  writing-mode: vertical-lr;
}
.vrl {
  writing-mode: vertical-rl;
}
pre {
  background: #2d2d2d;
  padding: 1em;
  margin: .5em 0;
  overflow: auto;
  color: #ccc;
  border-radius: 4px;
  width: max-content;
  margin: auto;
}
Demo: a comparison of physical and logical directions for borders. Using physical directions requires a modification every time the writing direction changes, whereas using logical properties allows all six use cases to share the same properties and values.
Sizing is mapped as follows: width maps to inline-size and height maps to block-size.
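A short sketch of logical sizing, using a hypothetical .sidebar class. The same two declarations describe the box whether its text runs horizontally or vertically.

```css
.sidebar {
  inline-size: 20em;     /* width in horizontal-tb; height in vertical-rl or vertical-lr */
  max-block-size: 30em;  /* max-height in horizontal-tb; max-width in vertical modes */
}
```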
Lists and counters
A numeral system is a writing system for expressing numbers, and even though the most common one is the Hindu-Arabic numeral system (0, 1, 2, 3, and so on), CSS allows us to display ordered lists with other numeral systems.
Predefined counter styles can be used with the list-style-type property, which covers 174 numeral systems from Afar to Urdu. You can see the full list at MDN [11].
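For instance, a sketch that picks a predefined counter style based on the language of the list. The pairing of languages and styles is my own choice for illustration, and support for individual keywords may vary between browsers.

```css
ol:lang(zh) {
  list-style-type: simp-chinese-informal; /* 一, 二, 三, ... */
}

ol:lang(ja) {
  list-style-type: hiragana; /* あ, い, う, ... */
}

ol:lang(ar) {
  list-style-type: arabic-indic; /* ١, ٢, ٣, ... */
}

ol:lang(th) {
  list-style-type: thai; /* ๑, ๒, ๓, ... */
}
```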
If CSS counters are of interest to you, I wrote an article about them sometime last year [12], in which I explored the “Heavenly Stems” and “Earthly Branches” numbering systems used in traditional Chinese contexts (with an implementation in CSS, because why not?).
Text decoration
As mentioned earlier, East Asian languages have no concept of italics. Instead, we have emphasis marks. They can be placed above or below characters to emphasize text, reinforce tone, or avoid ambiguity.
In Chinese, these marks are placed below the characters in horizontal writing mode and to the right in vertical writing mode.
Japanese, on the other hand, places emphasis marks above the characters in horizontal writing mode. To make the CSS properties more general, the CSS Text Decoration Module Level 3 [13] introduces text-emphasis-style, text-emphasis-position, and text-emphasis-color.
In addition to dots, you can use other symbols, such as circles, triangles, or even a single character given as a string. Position and color can be adjusted with their respective properties.
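Putting the three longhand properties together, a sketch for the Chinese and Japanese conventions described above could look like this. The colour value and the .starred class are arbitrary choices for the example, and some browsers may still need the -webkit- prefixed forms.

```css
/* Chinese: dots below in horizontal text, to the right in vertical text. */
em:lang(zh) {
  font-style: normal;
  text-emphasis-style: dot;
  text-emphasis-position: under right;
}

/* Japanese: sesame marks above in horizontal text, to the right in vertical text. */
em:lang(ja) {
  font-style: normal;
  text-emphasis-style: sesame;
  text-emphasis-position: over right;
  text-emphasis-color: crimson; /* arbitrary colour for illustration */
}

/* A single character given as a string can be used as the mark. */
.starred {
  text-emphasis-style: "★";
}
```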
Line decoration is covered in the same specification, and Level 4 of the specification gives developers finer control over underlines and overlines. This is especially useful for scripts with ascenders or descenders that often extend well beyond the baseline.
CSS Text Decoration Level 4 [14] covers text-decoration-skip, which controls how overlines and underlines are drawn when they cross glyphs. Again, this comes up less often for languages like English, but it has a big impact on the aesthetics of scripts like Burmese.
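As a hedged sketch, the rules below use text-decoration-skip-ink, text-underline-position, and text-underline-offset, which are related properties from the Text Decoration specifications rather than text-decoration-skip itself. The offset value is a guess that would need tuning per font.

```css
a {
  text-decoration-line: underline;
  text-decoration-skip-ink: auto; /* interrupt the line where it would cross glyph ink */
}

/* For a script with deep descenders, such as Burmese, move the underline
   below the descenders instead of cutting through them. */
a:lang(my) {
  text-underline-position: under;
  text-underline-offset: 0.2em; /* assumed value; adjust for the font in use */
}
```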
Font variants
There are two kinds of CSS properties for accessing OpenType features: high-level and low-level. The specification recommends using the high-level properties whenever possible, which depends on browser support.
For example, font-variant-east-asian lets you control the glyphs of characters that have variants, such as simplified and traditional Chinese glyphs. They are the same character, but they can be written differently.
There is also font-variant-ligatures, which provides many predefined options for ligatures and contextual forms, such as discretionary, historical, or contextual ligatures.
The low-level property is font-feature-settings, with which you can use 4-letter OpenType tags to toggle the features you want (this depends on whether your font has these features, but let’s assume it does).
There are 141 feature tags, ranging from alternative fractions to contextual alternates, from ruby notation forms to slashed zero. These CSS properties are closely tied to the features of the font file itself, so they depend on the choice of font.
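A brief sketch of both kinds of property follows. It assumes the font in use actually ships the corresponding OpenType features, and the class names .fancy-heading and .invoice-number are hypothetical.

```css
/* High-level: ask for traditional or simplified glyph forms by language. */
:lang(zh-Hant) {
  font-variant-east-asian: traditional;
}

:lang(zh-Hans) {
  font-variant-east-asian: simplified;
}

/* High-level: enable discretionary and historical ligatures. */
.fancy-heading {
  font-variant-ligatures: discretionary-ligatures historical-ligatures;
}

/* Low-level: toggle features by their 4-letter OpenType tags. */
.invoice-number {
  font-feature-settings: "zero" 1, "frac" 1; /* slashed zero and fractions */
}
```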
Wrapping up
This article is already too long, so in Part 2 I’ll go into more detail about how we can use the selectors mentioned earlier to build layouts that remain robust even when the language changes. Modern layout properties like Flexbox and Grid are perfect for such use cases.
One of the things I find most interesting about CSS is how we can combine properties in different ways to achieve countless results, and with over 500 CSS properties out there, that is a lot of possibilities. I’m not saying anything goes, because there are often many ways to achieve the same result, and some are more appropriate than others.
However, we need to make decisions that are best for us by understanding the mechanics behind each technology, its pros and cons, and being aware of why we do things the way we do them.
I still believe, more than 30 years later, that the Web is still an information medium and content is key. Therefore, regardless of the language or script used, the presentation of content should be optimized. I’m glad that the evolution of CSS has given developers a way to do just that.
Anyway, stay tuned for part two.
References
[1] ISO 639-1 codes: en.wikipedia.org/wiki/List_o…
[2] Tobias Bengfort: tobib.spline.de/xi/
[3] BCP 47: www.w3.org/Internation…
[4] Language tags in HTML and XML: www.w3.org/Internation…
[5] Codrops CSS reference entry: tympanus.net/codrops/css…
[6] Amelia Bellamy-Royds: twitter.com/AmeliasBrai…
[7] Structural markup and right-to-left text in HTML: www.w3.org/Internation…
[8] CSS vs. markup for bidi support: www.w3.org/Internation…
[9] Inline markup and bidirectional text in HTML: www.w3.org/Internation…
[10] MDN: CSS Logical Properties and Values: developer.mozilla.org/en-US/docs/…
[11] list-style-type: developer.mozilla.org/en-US/docs/…
[12] The wondrous world of CSS counters: chenhuijing.com/blog/the-wo…
[13] CSS Text Decoration Module Level 3: drafts.csswg.org/css-text-de…
[14] CSS Text Decoration Module Level 4: drafts.csswg.org/css-text-de…