Author: Zhang Zhuo

This article was written by a member of YFE. Please respect the original work: to reprint it, contact the official account (ID: yuewen_YFE) for authorization and credit the author, source, and link.

Preface

Webnovel began its internationalization push this year, adding language and content support for Indonesia, Malaysia, and the Philippines in the new release. Along the way we ran into many problems, and this article shares those problems and how we solved them.

Internationalization and localization

Before we begin, let's clarify two concepts: internationalization and localization. Internationalization (I18N) is the process of designing and preparing an application for use in different languages. Localization (L10N) is the process of translating an internationalized application into a specific language for a specific locale. Since this article is about internationalization practice, it focuses on how to prepare an application for localization.

Problems that need to be solved

Making a website multilingual may seem simple, and in most cases it is just a matter of mapping one string to another. But as an ambitious product, we can't settle for such a simple approach. Doing internationalization well means solving many problems, such as singular and plural forms, rich text, and so on. The following sections describe some common multilingual problems and their usual solutions:

Singular and plural forms

Chinese has no singular/plural distinction: the word for "hour" is the same in "one hour" and "two hours". Many other languages, however, apply different rules to different quantities, and some also use different rules for cardinal and ordinal numbers, as with "1st", "2nd", "3rd", and "4th" in English.

The Unicode CLDR project classifies the plural rules of most of the world's languages into just six categories:

  • zero
  • one
  • two
  • few
  • many (also used for fractions if they have a separate category)
  • other (required; also used when the language has only one form)

English, for example, uses only two cardinal categories, one (1 hour) and other (0 hours, 2.5 hours), but four ordinal categories: one (1st, 21st), two (2nd, 22nd), few (3rd, 23rd), and other (4th, 11th).
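To make the cardinal/ordinal distinction concrete, here is a small sketch using Intl.PluralRules, which reports the CLDR category for a number where the runtime provides it (recent Node.js versions and modern browsers do). The suffix table and the formatOrdinal helper are illustrative assumptions, not part of any library.

```javascript
// Cardinal rules answer "how many?": 1 hour, 2 hours, 2.5 hours.
const cardinal = new Intl.PluralRules('en-US');
// Ordinal rules answer "which position?": 1st, 2nd, 3rd, 4th.
const ordinal = new Intl.PluralRules('en-US', { type: 'ordinal' });

// Hypothetical table: the English suffix for each CLDR ordinal category.
const suffixes = { one: 'st', two: 'nd', few: 'rd', other: 'th' };

function formatOrdinal(n) {
  return `${n}${suffixes[ordinal.select(n)]}`;
}

console.log([0, 1, 2.5].map((n) => cardinal.select(n))); // → ['other', 'one', 'other']
console.log([1, 2, 3, 4, 11, 21].map(formatOrdinal));     // → ['1st', '2nd', '3rd', '4th', '11th', '21st']
```

The code never hard-codes "11 is special"; that knowledge lives in the CLDR data behind Intl.PluralRules, which is exactly why following the standard categories pays off.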

How do we apply these rules in practice? A common approach is ICU MessageFormat, a message syntax in which a simple variable is written as {key} and a quantity-dependent phrase as {key, plural, matches}, where matches lists the text to use for each plural category (an exact value such as =0 can also be matched). For example:

You have {itemCount, plural,
    =0 {no items}
    one {1 item}
    other {{itemCount} items}
}.

Beyond plurals, ICU MessageFormat also handles dates, gender, and other complex cases, and it is widely used: most JavaScript multilingual libraries support it, and ICU implementations are also available for other languages such as Java and PHP.

Dates and numbers also vary between languages, countries, and regions. The United States writes dates as month/day/year, for example, while the United Kingdom, also English-speaking, prefers day/month/year. Currency symbols differ in placement too: the "¥" used for both the renminbi and the yen is conventionally written before the amount, whereas in many eurozone locales (German or French, for example) the "€" follows it.

No developer can be expected to know all of these differences, but the ECMAScript Internationalization API (ECMA-402) provides four constructors to handle them:

  • Intl.Collator
  • Intl.DateTimeFormat
  • Intl.NumberFormat
  • Intl.PluralRules

Intl.Collator is less commonly used; it performs language-sensitive string comparison. Intl.DateTimeFormat formats dates and times according to the conventions of a given locale, and Intl.NumberFormat formats numbers and currencies. Intl.PluralRules determines plural categories: it tells us which category (such as "one" or "other") a given number falls into in a given language. The first three APIs are relatively stable and well supported by browsers, and polyfills for DateTimeFormat and NumberFormat let us use them in even more environments, such as Node and React Native. PluralRules, however, is still a draft with poor browser support, so using it directly is not recommended.
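A short sketch of the date and currency differences described above, using only the standard Intl.DateTimeFormat and Intl.NumberFormat constructors. The date and amounts are arbitrary; the time zone is pinned to UTC so the result does not depend on the machine running it. Exact currency output depends on the CLDR data bundled with the runtime, so those two lines show only the expected shape.

```javascript
// A fixed instant, formatted in UTC so the output does not depend on the local time zone.
const date = new Date(Date.UTC(2019, 0, 31));

const us = new Intl.DateTimeFormat('en-US', { timeZone: 'UTC' }).format(date);
const gb = new Intl.DateTimeFormat('en-GB', { timeZone: 'UTC' }).format(date);
console.log(us); // → '1/31/2019'  (month/day/year)
console.log(gb); // → '31/01/2019' (day/month/year)

// Same API, different locale/currency pairs: symbol position comes from locale data.
const yen = new Intl.NumberFormat('ja-JP', { style: 'currency', currency: 'JPY' }).format(1234);
const euro = new Intl.NumberFormat('de-DE', { style: 'currency', currency: 'EUR' }).format(1234.5);
console.log(yen);  // symbol before the amount, e.g. '￥1,234'
console.log(euro); // symbol after the amount, e.g. '1.234,50 €'
```

Note that the code never decides where the symbol goes or which separator to use; passing the right locale is enough.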

Browser support for internationalization keeps improving, especially in Chrome. Beyond the four standard APIs above, Chrome 71 and 72 added Intl.RelativeTimeFormat and Intl.ListFormat respectively. RelativeTimeFormat offers functionality similar to moment.js's relative time formatting, for example:

const rtf = new Intl.RelativeTimeFormat('en');
rtf.format(3.14, 'second'); // → 'in 3.14 seconds'
rtf.format(-15, 'minute');  // → '15 minutes ago'

Intl.ListFormat is used to format lists, for example:

const lf = new Intl.ListFormat('zh');
    
lf.format(['yong feng'.'dextrys']); / / -'Wing Fung and Sun Yu'
Copy the code

These APIs are far more capable than the examples above suggest; see MDN and the Google Developers website for details. We expect internationalization on the Web to keep getting easier.

Meaning and context

In Chinese as in any other language, the same word or phrase can mean different things in different situations. In English, "About" as a page title means "information about (us)", but inside a sentence it may mean "approximately".

We can solve this by giving translators more information: text descriptions, screenshots, and similar material help them understand the context and translate accurately.
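One lightweight way to ship that context is to store a translator-facing description next to each message. The message shape below mirrors React Intl's message descriptors (id, defaultMessage, description); the message ids, the toTranslatorRows helper, and the spreadsheet column layout are hypothetical, chosen to show the "About" ambiguity from the paragraph above.

```javascript
// Each message carries a description aimed at translators, not end users.
const messages = [
  { id: 'nav.about', defaultMessage: 'About', description: 'Page title / nav link to the "About us" page' },
  { id: 'book.length', defaultMessage: 'About {hours} hours', description: 'Estimated reading time; "About" means "approximately"' },
];

// Hypothetical helper: turn the catalog into rows for a translator's spreadsheet,
// leaving the translation column empty for the translator to fill in.
function toTranslatorRows(catalog) {
  const header = ['id', 'source', 'context', 'translation'];
  const rows = catalog.map((m) => [m.id, m.defaultMessage, m.description, '']);
  return [header, ...rows];
}

const rows = toTranslatorRows(messages);
console.log(rows[2][2]); // → 'Estimated reading time; "About" means "approximately"'
```

A screenshot link could be added as one more column in the same way.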

Who is going to translate

Who will translate may not look like a problem, but the answer shapes the entire translation process, so it should be decided as early as possible when going multilingual. Generally, translation is done either by professional translators or by users and volunteers, and each approach has its pros and cons:

  • Professional translators. Professional translation generally ensures high quality and efficiency; the main drawback is high cost.
  • Users/volunteers. Many open-source projects take this approach. Twitter did too, building a translation platform that made it easy for users to contribute: in just one year, more than 400,000 volunteers helped translate the service, which is available in 21 languages. This approach is cheap, and because there are potentially many participants, quality can be ensured through review by multiple people. The main concern is that translation turnaround time is hard to control. We don't need to build our own platform either: there are established commercial platforms like Crowdin.com and open-source ones like Mozilla's Pontoon.

Beyond the language problems above, internationalization raises many other kinds of problems, such as cross-team collaboration and international/regional operations, which space does not permit us to discuss here.

The solution

A common approach in multilingual Web applications is to put each language's strings in a separate JSON (or other) file, detect the user's preferred language, load the corresponding file, and display its strings in the application.
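A minimal sketch of the "detect the preferred language, then load its file" step. It assumes the per-language files are already loaded into an in-memory object (in a real app they would be JSON files fetched or imported on demand), and the negotiateLocale function is a hypothetical helper, not part of any library. In a browser, the preference list would typically come from navigator.languages.

```javascript
// Hypothetical in-memory "files"; in a real app these would be JSON files loaded on demand.
const catalogs = {
  en: { welcome: 'Welcome' },
  id: { welcome: 'Selamat datang' },
  ms: { welcome: 'Selamat datang' },
};

// Pick the first supported language from the user's preference list,
// trying the full tag first ("id-ID") and then its primary language ("id").
function negotiateLocale(preferred, supported, fallback) {
  for (const tag of preferred) {
    const lower = tag.toLowerCase();
    if (supported.includes(lower)) return lower;
    const primary = lower.split('-')[0];
    if (supported.includes(primary)) return primary;
  }
  return fallback;
}

const locale = negotiateLocale(['id-ID', 'en-US'], Object.keys(catalogs), 'en');
console.log(locale, catalogs[locale].welcome); // → id Selamat datang
```

Real libraries use more thorough matching (script subtags, likely-subtag expansion), but the idea is the same: degrade from the most specific tag to the most general one you can serve.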

The hardest part of this approach is handling the more complex cases, such as the plurals discussed above. As mentioned, ICU MessageFormat can solve this: each message is parsed into an AST, the AST is converted into a function, and the application calls that function with the appropriate parameters.
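The parse-to-AST-then-function idea can be sketched in plain JavaScript. This is a deliberately small illustration, not React Intl's or Webnovel's implementation: it supports only {name} and {name, plural, ...} with exact =N matches, CLDR categories (via Intl.PluralRules), and # for the count; select, number/date formatting, and apostrophe escaping are left out. The parse and compile names are our own.

```javascript
// Parse a message into an AST: an array of strings and argument nodes.
function parse(src) {
  let i = 0;
  const skipSpace = () => { while (/\s/.test(src[i] ?? '')) i++; };
  const readWord = () => {
    skipSpace();
    const start = i;
    while (i < src.length && !/[\s,{}]/.test(src[i])) i++;
    return src.slice(start, i);
  };
  const expect = (ch) => {
    skipSpace();
    if (src[i] !== ch) throw new SyntaxError(`Expected "${ch}" at ${i}`);
    i++;
  };

  function parseMessage(inPlural) {
    const nodes = [];
    let text = '';
    const flush = () => { if (text) { nodes.push(text); text = ''; } };
    while (i < src.length && src[i] !== '}') {
      if (src[i] === '{') {
        flush();
        nodes.push(parseArgument());
      } else if (inPlural && src[i] === '#') {
        flush();
        nodes.push({ type: 'pound' }); // "#" stands for the plural count
        i++;
      } else {
        text += src[i++];
      }
    }
    flush();
    return nodes;
  }

  function parseArgument() {
    expect('{');
    const name = readWord();
    skipSpace();
    if (src[i] === '}') { i++; return { type: 'var', name }; }
    expect(',');
    const kind = readWord();
    if (kind !== 'plural') throw new SyntaxError(`Unsupported argument type "${kind}"`);
    expect(',');
    const options = {};
    skipSpace();
    while (src[i] !== '}') {
      const selector = readWord(); // "=0", "one", "other", ...
      expect('{');
      options[selector] = parseMessage(true);
      expect('}');
      skipSpace();
    }
    i++; // closing "}" of the plural argument
    return { type: 'plural', name, options };
  }

  const ast = parseMessage(false);
  if (i !== src.length) throw new SyntaxError(`Unexpected "}" at ${i}`);
  return ast;
}

// Compile once, call many times: the AST is turned into a closure over the locale's rules.
function compile(src, locale) {
  const rules = new Intl.PluralRules(locale);
  const ast = parse(src);
  const render = (nodes, values, count) =>
    nodes.map((node) => {
      if (typeof node === 'string') return node;
      if (node.type === 'var') return String(values[node.name]);
      if (node.type === 'pound') return String(count);
      // Plural: an exact "=N" match wins, then the CLDR category, then "other".
      const n = values[node.name];
      const branch = node.options[`=${n}`] ?? node.options[rules.select(n)] ?? node.options.other;
      return render(branch, values, n);
    }).join('');
  return (values) => render(ast, values, undefined);
}
```

Usage with the itemCount message shown earlier:

```javascript
const youHave = compile(`You have {itemCount, plural,
    =0 {no items}
    one {1 item}
    other {{itemCount} items}
}.`, 'en');
youHave({ itemCount: 0 }); // → 'You have no items.'
youHave({ itemCount: 5 }); // → 'You have 5 items.'
```

Because parsing happens inside compile, the same step can run ahead of time (precompilation) or at runtime, which is the trade-off the solutions below make in different ways.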

This approach has further details worth discussing that space does not permit here. Instead, let's look at the relatively mature i18n solutions built for the mainstream frameworks.

React Intl

React Intl is Yahoo's open-source internationalization solution for React. It follows the BCP 47 and Unicode CLDR standards, supports ICU MessageFormat, and handles the internationalization of dates, times, and numbers.

React Intl provides its multilingual support as React components, for example:

<FormattedMessage
  id="welcome"
  defaultMessage={`Hello {name}, you have {unreadCount, number} {unreadCount, plural,
    one {message}
    other {messages}
  }`}
  values={{name: <b>{name}</b>, unreadCount}}
/>

For more details, see GitHub: github.com/yahoo/react…

Angular solution

Angular is probably the only major front-end framework that ships its own i18n solution. It also follows the BCP 47 and Unicode CLDR standards and supports ICU MessageFormat.

Unlike typical i18n schemes, Angular doesn't require a JSON file or any other multilingual mapping. Instead, it uses the i18n attribute to mark text that needs translating, for example:

<h1 i18n>Hello, webnovel</h1>

Running the ng xi18n command makes Angular extract all text marked with the i18n attribute into an XLF file (XLF, or XLIFF, is an XML-based interchange format that standardizes how localizable data is passed between tools during localization). We can send this file straight to translators, who translate it with specialized software and send it back. Finally, the translations are merged into the application through either ahead-of-time (AOT) or just-in-time (JIT) compilation.

Angular's approach also covers places we might otherwise overlook. To translate the title attribute of an img element, for example, we add an i18n-title attribute alongside it, and the title text is then extracted like any other marked text.

Webnovel solution

Comparing the solutions above, we consider Angular's current approach the most ideal. It is relatively complete and minimally intrusive; it is easier to use than the others because it extracts the strings and generates their IDs automatically; and its support for both ahead-of-time and just-in-time compilation of translations provides both performance and flexibility.

Unfortunately, Webnovel does not currently use Angular: our mobile site and app are built with React and React Native respectively. Since the existing React-based library, React Intl, did not meet our needs well, we decided to build our own i18n base library, which needed to:

  • Work in common JavaScript environments, including browsers, Node (server-side rendering), and React Native
  • Follow international standards and support ICU MessageFormat
  • Support both precompilation and runtime compilation of string templates
  • Perform well and load language packs dynamically
  • Support language and string fallbacks
  • Be cheap to adopt
  • Be pluggable
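The "language/string fallback" requirement can be sketched as a lookup chain that walks from the most specific locale to a default. This is an illustrative assumption of how such a fallback might work, not react-i18n's actual code; fallbackChain, lookup, and the sample catalogs are hypothetical, and returning the message id as a last resort is one possible policy among several.

```javascript
// Build the lookup chain for a locale: "id-ID" → "id" → default language.
function fallbackChain(locale, defaultLocale = 'en') {
  const chain = [];
  const parts = locale.split('-');
  for (let n = parts.length; n > 0; n--) chain.push(parts.slice(0, n).join('-'));
  if (!chain.includes(defaultLocale)) chain.push(defaultLocale);
  return chain;
}

// Return the first translation found along the chain, or the id itself as a last resort.
function lookup(catalogs, locale, id) {
  for (const tag of fallbackChain(locale)) {
    const value = catalogs[tag]?.[id];
    if (value !== undefined) return value;
  }
  return id;
}

const catalogs = {
  en: { greeting: 'Hello', newChapter: 'New chapter' },
  id: { greeting: 'Halo' },
};
console.log(lookup(catalogs, 'id-ID', 'greeting'));   // → Halo
console.log(lookup(catalogs, 'id-ID', 'newChapter')); // → New chapter (falls back to en)
console.log(lookup(catalogs, 'id-ID', 'missing'));    // → missing
```

Falling back string by string, rather than language by language, means a partially translated Indonesian pack is still usable while translators catch up.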

With these basic goals in mind, we developed a new multilingual library called react-i18n:

@react-i18n/core

The core library. Built on the React Context API, it provides a withI18n higher-order component and a Message component. withI18n injects the i18n information into the wrapped component's props, while Message, similar to React Intl's FormattedMessage, renders a string directly from the string's id and parameters.

@react-i18n/cli

A command-line tool that mainly precompiles string templates and provides some helper features, such as:

  • Excel processing: converting Excel sheets to JSON. We exchange strings with our translators through online Excel spreadsheets.
  • Automatic ID generation: for example, "Hello World" becomes HELLO_WORLD_6f5902ac237024bdd0c176cb93063dc4. The readable prefix makes IDs easy to enter with editor autocompletion, and the hash keeps them unique.
  • Machine translation: results from the Microsoft Translator Text API and the Google Translation API are filled directly into the spreadsheet to help translators work faster.
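The ID format above (a readable uppercase slug plus a 32-character hex hash) can be sketched with Node's built-in crypto module. Which input react-i18n actually hashes, and with which algorithm, is not stated; this sketch assumes an MD5 digest of the source text, so its hash will not necessarily match the one shown above. generateId and the 'MSG' placeholder for non-Latin text are hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical ID scheme: an uppercase slug of the source text plus an MD5 digest.
// The slug keeps the ID readable and autocomplete-friendly; the digest distinguishes
// different strings that happen to share the same slug.
function generateId(source) {
  const slug = source
    .trim()
    .toUpperCase()
    .replace(/[^A-Z0-9]+/g, '_') // collapse spaces and punctuation into "_"
    .replace(/^_+|_+$/g, '');    // drop leading/trailing underscores
  const digest = crypto.createHash('md5').update(source).digest('hex');
  return `${slug || 'MSG'}_${digest}`; // non-Latin text yields an empty slug
}

console.log(generateId('Hello World')); // HELLO_WORLD_ followed by 32 hex characters
```

"Hello World" and "Hello, world!" produce the same slug but different digests, so both strings get distinct IDs while staying easy to find by typing HELLO_WORLD.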

The react-i18n library already meets our basic requirements and has been running on Webnovel's mobile site and app for some time. Because of time constraints, it is not yet robust enough to open-source. We will keep refining it and contribute it back to the open-source community as soon as possible.

Other issues that need attention

Follow existing standards

Following existing standards is important for internationalization. When building an app, working with third parties (payment providers, for example) is almost unavoidable, and if both sides follow the same standards, communication and debugging costs drop dramatically. There are many internationalization standards, some of them quite complex, and they require careful, patient reading. Note also that standards change over time: under BCP 47, mentioned above, the language code for Indonesian used to be "in" but is now "id". In general, we should follow the latest version.
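Legacy codes like "in" still turn up in data from older systems (Java's Locale, for instance, historically reported "in" for Indonesian), so it can help to normalize tags at the boundary. The sketch below is a hypothetical helper, not a library API: it replaces the three deprecated ISO 639 codes in, iw, and ji with id, he, and yi, and applies BCP 47's conventional casing.

```javascript
// Deprecated ISO 639 language codes and their current replacements.
const LEGACY_LANGUAGE_CODES = new Map([
  ['in', 'id'], // Indonesian
  ['iw', 'he'], // Hebrew
  ['ji', 'yi'], // Yiddish
]);

// Normalize separators and casing, and replace a legacy primary language subtag:
// "in_id" → "id-ID", "zh_hant_tw" → "zh-Hant-TW".
function normalizeLanguageTag(tag) {
  const [language, ...rest] = tag.split(/[-_]/);
  const lang = language.toLowerCase();
  const subtags = rest.map((s) => {
    if (s.length === 2) return s.toUpperCase(); // region, e.g. "ID"
    if (s.length === 4) return s[0].toUpperCase() + s.slice(1).toLowerCase(); // script, e.g. "Hant"
    return s;
  });
  return [LEGACY_LANGUAGE_CODES.get(lang) ?? lang, ...subtags].join('-');
}

console.log(normalizeLanguageTag('in_ID')); // → id-ID
```

A Map is used instead of a plain object so that inputs like "constructor" cannot accidentally match inherited properties.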

Learn more about mature solutions earlier

For a problem like internationalization that has existed for many years, there are bound to be mature solutions to learn from. Learning as much as possible about them before starting will save us many detours.

Conclusion

This article has covered some Web-based internationalization solutions and how we applied them at Webnovel. Internationalization has always been a hard problem that calls for long-term thinking. It is about more than multiple languages: everything from layout to cultural differences must be considered, and Webnovel's internationalization has only just begun.

Related links

  • BCP 47 – tools.ietf.org/html/bcp47
  • ICU MessageFormat – userguide.icu-project.org/formatparse…
  • Language Plural Rules – www.unicode.org/cldr/charts…
  • The Intl API – developer.mozilla.org/en-US/docs/…
  • The Intl.RelativeTimeFormat API – developers.google.com/web/updates…
  • The Intl.ListFormat API – developers.google.com/web/updates…
  • The Intl polyfill – github.com/andyearnsha…
  • Angular Internationalization – angular.io/guide/i18n

For more articles, follow the official account of China Literature Group's front-end team.