Preface

Below I list the problems I ran into in my day-to-day work and the solutions I found, to explain how the simplified-to-traditional Chinese plug-in came about.

Background

My current development work involves writing a large number of marketing campaign pages, each small but collectively numerous. In addition, every project has to ship in two versions: one for mainland China and one for Hong Kong and Taiwan.

The approach used so far:

  1. Finish the mainland version first, then copy the code and adapt it into the Hong Kong and Taiwan version.
  2. Replace the Chinese text, prices and login method in the copied project.

Existing problems:

  1. Copying a project is not a good solution in the first place, because it is easy to introduce mistakes while copying. Moreover, both versions must go live at the same time, so the timing of the copy is a problem. If the copy is made too early, any bug found in the mainland version during testing has to be fixed in both copies. If it is made too late, the Hong Kong and Taiwan version does not get enough testing time, which easily leads to problems.

  2. Simplified-to-traditional conversion means manually pasting the simplified text into the Google Translate web page and then manually replacing it in the code, which is tedious and a lot of work. The login method also has to be copied in separately.

There are three differences between the two versions:

  1. Login method: the mainland version mainly uses account and password login, while the Hong Kong and Taiwan version uses Google, Facebook and Apple login.

  2. Price and currency unit: ¥ versus NT$.

  3. Chinese script: simplified Chinese versus traditional Chinese.

The core problem is the workload and risk of copying a project, so the two projects need to be merged into one. How can that be done?

The solution

1. Combine the two projects into one

If the two projects are merged into one, something has to tell the versions apart so that the differences listed above can be handled. Environment variables fit this need well. In a Vue project, the corresponding environment files can be written as follows.

Mainland version, production environment: .env

VUE_APP_ENV=prod
VUE_APP_PUBLIC_PATH=/mainland

Mainland version, development environment: .env

VUE_APP_ENV=dev
VUE_APP_PUBLIC_PATH=/mainland

Hong Kong and Taiwan version: .env.ht

VUE_APP_ENV=ht
VUE_APP_PUBLIC_PATH=/ht
NODE_ENV=production

package.json

"serve": "vue-cli-service serve",
"build": "vue-cli-service build",
"build:ht": "vue-cli-service build --mode ht",

In code, read process.env.VUE_APP_ENV to tell which version is being built. Other variables affect the build itself: for example, when NODE_ENV is set to production, optimizations such as compression are performed during packaging.

Note: the Hong Kong and Taiwan version has no separate test environment. Once the logic of the mainland version works, it rarely breaks in the other version, so development is done against the mainland version and the Hong Kong and Taiwan version only needs to be packaged at the end. (A test environment is optional; it only takes one more configuration file.)

Another point to note: process.env is normally only accessible in a Node environment, but projects created with Vue CLI automatically inject the variables from the .env files into the runtime code as global constants. This is usually implemented with webpack's DefinePlugin.
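To make the injection step concrete, here is a minimal sketch of how the definitions handed to DefinePlugin could be prepared. The helper name resolveClientEnv and the sample values are assumptions for illustration, and Vue CLI's real internals may differ in detail. According to the Vue CLI documentation, only variables prefixed with VUE_APP_ (plus NODE_ENV and BASE_URL) are exposed to client code, which the filter below imitates.

```javascript
// Hypothetical helper: keep only the variables that may reach client code and
// turn each value into a JSON literal, because DefinePlugin substitutes source text.
function resolveClientEnv(env) {
  const allowed = /^VUE_APP_/;
  const definitions = {};
  for (const [key, value] of Object.entries(env)) {
    if (allowed.test(key) || key === 'NODE_ENV') {
      definitions[key] = JSON.stringify(value);
    }
  }
  // In a webpack config this object would be passed to DefinePlugin:
  //   new webpack.DefinePlugin(resolveClientEnv(process.env))
  return { 'process.env': definitions };
}

const defs = resolveClientEnv({
  VUE_APP_ENV: 'ht',
  VUE_APP_PUBLIC_PATH: '/ht',
  NODE_ENV: 'production',
  SECRET_TOKEN: 'not-exposed', // no VUE_APP_ prefix, so it is dropped
});

console.log(defs['process.env'].VUE_APP_ENV); // "ht" (the value is a quoted literal)
console.log('SECRET_TOKEN' in defs['process.env']); // false
```

Because the values are substituted as text at build time, an expression such as process.env.VUE_APP_ENV === 'ht' becomes a comparison between two constants in the bundle.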

With the problem of environment variables solved, the rest of the work is easier to do.

2. Resolve differences in login modes

Wrap the two login flows in two separate components. Login usually involves some global state, and projects usually manage it with a tool such as Vuex, so the login state is stored in Vuex by default. Put all the login-related code into a project template and pull that template with custom scaffolding when creating a new project. When using Vuex, keep the login-related state in its own module, so that projects created from the template can put their other state in separate modules and never need to touch the login module.

Custom scaffolding: create a project interactively by entering options such as the project name and description, pull the prepared template from a remote repository such as GitLab, use a template engine to replace specific variables in the template (for example with the project name), and finally generate the new project. (Scaffolding has other uses as well; only simple project creation is described here.)

  • Without scaffolding, you can simply use git clone. Changing things like the project name afterwards adds a little extra work, but it doesn't make a big difference.

Part of the wrapping logic:

Suppose the mainland login component is called mainlandLogin and the Hong Kong and Taiwan login component is called htLogin. Write a third login component that combines them: it picks the right one based on the environment variable and loads it dynamically with the component element, as follows:

login.vue:

<component :is="currentLogin" @sure="sure" @cancel="cancel"></component>

export default {
  data() {
    return {
      // 'ht' selects the Hong Kong and Taiwan login; anything else uses the mainland login
      currentLogin: process.env.VUE_APP_ENV === 'ht' ? 'htLogin' : 'mainlandLogin',
    };
  },
  components: {
    // Dynamic imports put each login component in its own chunk
    mainlandLogin: () => import("./components/mainlandLogin.vue"),
    htLogin: () => import("./components/htLogin.vue"),
  },
  methods: {
    sure() {
      this.$emit('sure');
    },
    cancel() {
      this.$emit('cancel');
    },
  },
};

Note: the components are imported dynamically, so they are packaged into two separate chunks. Each version only ever uses one of the two logins, so the other chunk is never loaded.

Once the login component is wrapped, using it is simple:

<login @sure="sure" @cancel="cancel"></login>

3. Solve price discrepancies

Just like login, prices can be distinguished with the environment variable. Add a field such as htPrice to the commodity JSON already used by the mainland version:

const commodityList = [
    {
        id: 1,
        name: "xxx",
        count: 1,
        price: 1,
        htPrice: 2,
    },
];

Then use process.env.VUE_APP_ENV === 'ht' to decide which price and unit to display:

{{ isHt ? `${commodity.htPrice} NT$` : `${commodity.price} RMB` }}

data() {
    return {
        isHt: process.env.VUE_APP_ENV === 'ht',
    };
}
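
The same decision can also live in one small function instead of in the template. This is only a sketch: formatPrice is a hypothetical helper, and its env argument stands in for process.env.VUE_APP_ENV so the snippet runs outside a Vue build.

```javascript
// Hypothetical helper: `env` stands in for process.env.VUE_APP_ENV.
function formatPrice(commodity, env) {
  return env === 'ht'
    ? `${commodity.htPrice} NT$`
    : `${commodity.price} RMB`;
}

const commodity = { id: 1, name: 'xxx', count: 1, price: 1, htPrice: 2 };

console.log(formatPrice(commodity, 'ht'));   // 2 NT$
console.log(formatPrice(commodity, 'prod')); // 1 RMB
```

Keeping the branch in one place means a later change to the unit or its position only has to be made once.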

4. Simplified and traditional style conversion

That takes care of merging the two projects and of the differences in login, price and unit. Only the simplified-to-traditional conversion was left, and it was also the hardest part. After a good deal of technical research I could not find a suitable solution, so in the end I had to write my own. The existing options I looked at were:

1. Maintain two sets of language files using i18n

Advantages: it is the most widely used approach to internationalization. Text in the code is replaced by variables, only two language files need to be maintained, and any change to the text is made in one file.

Disadvantages: replacing text with variables adds some complexity to the code, and it does not remove the need to copy the simplified text by hand, translate it, and write the result into the language file. It is not the best solution for this scenario.

2. Use language-tw-loader

Advantages: it appears to convert simplified to traditional automatically, which is convenient and fast.

Disadvantages: in use it turned out to have a fatal flaw: it cannot replace characters accurately. The reason is that the same simplified character may correspond to several traditional glyphs depending on the phrase. For example, the character shared by the words for "contact" and "tie shoelaces" (系) has more than one traditional form, and a one-to-one table can only pick one.

Basic principle: enumerate the commonly used simplified characters and their traditional counterparts in a one-to-one table, then replace character by character.
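
A small sketch shows why a one-to-one table cannot be accurate. The table below is a four-character illustration chosen for this example, not the loader's real data. The simplified character 系 is written 系, 係 or 繫 in traditional Chinese depending on the word, so whichever single value the table holds is wrong for some phrases.

```javascript
// Illustrative one-to-one table (an assumption, not language-tw-loader's data).
const charTable = { '关': '關', '联': '聯', '统': '統', '系': '系' };

// Replace every character independently, as a per-character converter would.
function convertByChar(text) {
  return Array.from(text, (ch) => charTable[ch] ?? ch).join('');
}

console.log(convertByChar('系统')); // 系統 (correct)
console.log(convertByChar('关系')); // 關系 (the standard form is 關係)
console.log(convertByChar('联系')); // 聯系 (the standard form is 聯繫)
```

Getting all three right requires looking at the surrounding phrase, which is what a real translation service does.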

3. Translate at runtime with Google Translate

Advantages: at runtime, Google Translate automatically translates the simplified page into traditional.

Disadvantages: because the conversion happens at runtime, the page always shows simplified characters first and traditional characters a moment later.

To sum up, the existing solutions have the following problems:

  1. Additional language files have to be maintained, and text has to be replaced with variables.
  2. Compile-time conversion is not accurate, and runtime conversion is delayed.

A solution to these problems needs two things:

1. No extra language files: development is written directly in simplified Chinese as usual

This requires a translation API, and the translation must be accurate. In my tests, Google Translate was the most accurate for simplified Chinese.

2. Conversion at compile time

This requires a plugin for the bundler. The bundler here is webpack, so a webpack plugin has to be written.

Translation API

The translation service has to be free, accurate and hard to break, but the Google Translate API is a paid service. If you can afford it, that is the easy route, but paying extra just for simplified-to-traditional conversion is not realistic.

Many open source projects offer free Google Translate APIs, but they work by simulating Google's token generation and sending requests directly, and such services break easily. Many of them have simply stopped working.

However, remember that the web version of Google Translate is free.

All you have to do is open a browser, enter the text to be translated, and read the translated text. The only requirement is that a program opens the browser for you, and Puppeteer is a mature tool for exactly that.

So the final solution is to visit translate.google.cn through Puppeteer and read the translation result, which is more stable than the other options.

Translateer is an existing translation service built on Puppeteer. If you are interested, read its source code; it is not complicated.

Note, however, that when starting the API service based on translateer, several points can be optimized.

First, we need to know the maximum number of characters the Google Translate web page supports. Testing shows that one page holds at most 5000 characters; anything beyond that is split across further pages.

Enter the source text in the input box on the left and the page sends a POST request; after a short delay the translation appears on the right. Note that the URL in the address bar changes to the following form:

https://translate.google.cn/?sl=zh-CN&tl=zh-TW&text=hahaha&op=translate

The parameters mean the following:

sl: source language; tl: target language; text: text to translate; op: the operation (translate)

If you request the link above directly and, as a test, replace the text value with '1'.repeat(16346), the Google endpoint returns a 400 error once the value reaches 16346 characters. This figure counts only the text value; including the other characters of the URL, the total length is 16411.
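
The two figures can be cross-checked by building the URL in code. This is a small sketch using the standard URL API; buildTranslateUrl is a hypothetical helper name, and the parameter order follows the link shown earlier.

```javascript
// Hypothetical helper that assembles the Google Translate page URL.
function buildTranslateUrl(text, from = 'zh-CN', to = 'zh-TW') {
  const url = new URL('https://translate.google.cn/');
  url.searchParams.set('sl', from);
  url.searchParams.set('tl', to);
  url.searchParams.set('text', text);
  url.searchParams.set('op', 'translate');
  return url.toString();
}

const overhead = buildTranslateUrl('').length;
const total = buildTranslateUrl('1'.repeat(16346)).length;

console.log(overhead); // 65
console.log(total);    // 16411
```

The fixed part of the URL is 65 characters, which accounts for the difference between 16346 and 16411.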

It is worth noting that many articles put the maximum length of a GET request in Chrome at 2048 or 8182 characters, yet this test shows that Google Translate can still be reached with a total URL length below 16411. Beyond that, the 400 error is thrown by the backend server behind Google Translate.

The article "Length limit for GET requests" makes the following points:

1. If there is a length limit, it applies to the entire URI, not just to the parameter values.

2. The HTTP protocol itself never specifies a length limit for GET or POST requests.

3. The so-called request length limit is decided and configured by browsers and web servers, and different browsers and servers use different settings.

So I have not yet found a definite answer to how many characters the browser itself allows. If anyone knows, I would welcome an explanation.

Chrome version: 98.0.4758.102 (official) (64-bit)

With these basic limits analyzed, let's look at how translateer is implemented.

When the translateer service starts, it creates a PagePool, opens five tabs, and navigates all of them to https://translate.google.cn/. Here is part of its code:

export default class PagePool {
  private _pages: Page[] = [];
  private _pagesInUse: Page[] = [];
  constructor(private browser: Browser, private pageCount: number = 5) {
    pagePool = this;
  }
  public async init() {
    // Open pageCount tabs in parallel and point each one at Google Translate
    this._pages = await Promise.all(
      [...Array(this.pageCount)].map(() =>
        this.browser.newPage().then(async (page) => {
          await page.goto("https://translate.google.cn/", {
            waitUntil: "networkidle2",
          });
          return page;
        })
      )
    );
  }
}
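
The excerpt only shows how the tabs are opened. As a rough idea of how a pool hands tabs out and takes them back, here is a minimal sketch that uses plain strings in place of Puppeteer pages. The class and method names are hypothetical, and this is not translateer's actual implementation.

```javascript
// Hypothetical pool over any kind of resource (strings stand in for browser tabs).
class SimplePool {
  constructor(items) {
    this.free = [...items];
    this.inUse = new Set();
  }

  // Hand out a free item, or undefined when everything is busy.
  acquire() {
    const item = this.free.shift();
    if (item === undefined) return undefined;
    this.inUse.add(item);
    return item;
  }

  // Return an item to the pool so that the next request can reuse it.
  release(item) {
    if (this.inUse.delete(item)) this.free.push(item);
  }
}

const pool = new SimplePool(['tab-1', 'tab-2']);
const first = pool.acquire();  // 'tab-1'
const second = pool.acquire(); // 'tab-2'
const third = pool.acquire();  // undefined, both tabs are busy
pool.release(first);
const again = pool.acquire();  // 'tab-1' is available again
console.log(first, second, third, again);
```

Reusing already opened tabs avoids paying the page load cost on every translation request.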

It then starts a Node server with Fastify that exposes a GET API. The following is an abridged version of the code:

fastify.get("/", async (request, reply) => {
  const { text, from = "auto", to = "zh-CN", lite = false } = request.query;
  const page = pagePool.getPage();
  // Navigate the tab to a URL that carries the languages and the text
  await page.evaluate(
    ([from, to, text]) => {
      location.href = `?sl=${from}&tl=${to}&text=${encodeURIComponent(text)}`;
    },
    [from, to, text]
  );

  // translating...
  await page.waitForSelector(`span[lang=${to}]`);

  // get translated text
  let result = await page.evaluate(
    (to) =>
      (document.querySelectorAll(`span[lang=${to}]`)[0] as HTMLElement)
        .innerText,
    to
  );
  // (rest of the handler omitted)
});

The handler passes sl (the source language), tl (the target language) and text (the text to translate), and uses location.href to navigate to:

?sl=${from}&tl=${to}&text=${encodeURIComponent(text)}

With the basic principle covered, let's look at the pitfalls.

Setting location.href issues a GET request. From the analysis above, we do not know the browser's own limit on GET request length, but we do know that the Google backend limits the request to 16411 characters. Setting aside roughly 411 characters for the rest of the URL leaves a maximum of 16000 characters for the text to translate.

The code runs the text through encodeURIComponent (a GET request would encode Chinese and other special characters anyway).

Note that one Chinese character encodes to 9 characters, for example %E8%81%94, so 16000 / 9 gives about 1777 Chinese characters.
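
The estimate is easy to reproduce. In this sketch the 16000-character budget is the figure from the text, and the sample character 联 is the one whose encoding is %E8%81%94.

```javascript
// A Chinese character in this range takes 3 bytes in UTF-8,
// and each byte is written as %XX, so it costs 9 characters in the URL.
const encoded = encodeURIComponent('联');
const costPerChar = encoded.length;

// Budget left for the text after reserving room for the rest of the URL.
const textBudget = 16000;
const maxChineseChars = Math.floor(textBudget / costPerChar);

console.log(encoded);         // %E8%81%94
console.log(costPerChar);     // 9
console.log(maxChineseChars); // 1777
```

ASCII letters and digits are not escaped and cost one character each, so mixed text can be somewhat longer than 1777 characters.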

Interim summary:

Because of these limits of the Google Translate web version, a direct GET request can translate at most 1777 Chinese characters. Typing into the input box has no such URL length limit, and the page accepts up to 5000 characters before it paginates.

The goal is for translateer to translate 5000 characters per request and to keep the number of requests as low as possible, which shortens translation time and speeds up the plugin's compilation. So translateer needs the following improvements:

  1. Use fastify to create a new POST request API.

export const post = ((fastify, opts, done) => {
  fastify.post('/', async (request, reply) => {
      ...more...
    }
  );
  done();
});

  2. When navigating, add only the sl (source language) and tl (target language) parameters; do not add the text parameter.

  await page.evaluate(
    ([from, to]) => {
      location.href = `?sl=${from}&tl=${to}`;
    },
    [from, to]
  );

  3. Select the text input box on the left side of the Google Translate page and assign the text to be translated to it. Then use page.type to type a single blank character, which fires the text box's input event once so that the page performs the translation.

  await page.waitForSelector(`span[lang=${from}] textarea`);
  const fromEle = await page.$(`span[lang=${from}] textarea`);
  await page.evaluate((el, text) => {
    el.value = text;
  }, fromEle, text);
  // Simulate one keystroke to trigger the input event so that Google Translate translates
  await page.type(`span[lang=${from}] textarea`, ' ');

  // translating...
  await page.waitForSelector(`span[lang=${to}]`);

  // get translated text
  const result = await page.evaluate(
    (to) =>
      (document.querySelectorAll(`span[lang=${to}]`)[0] as HTMLElement)
        .innerText,
    to
  );

At first I used page.type to fill in the text input box, but with long text the typing took a very long time and I did not know how to deal with it. I even opened an issue about it, and rewrote the code into its current form after getting advice there: issues

Conclusion:

As mentioned above, text beyond 5000 characters is paginated, and pagination is not handled. For now each translation request is limited to 5000 characters, and longer text calls the translation API again. (Pagination could be handled later so that text of any length is translated in one request, but the time taken by the two approaches would need to be compared first.)

The modified code is on GitHub: Translateer

translate-language-webpack-plugin

With the translation API solved, what remains is converting the simplified Chinese in the code into traditional Chinese. Since the bundler is webpack, I wrote a webpack plugin that reads the Chinese text and replaces it. It needs to support both webpack 5.0 and webpack 4.0; the following uses version 5.0 as the example.

Let's start with the idea behind the plugin:

  1. Write the webpack plugin
  2. Read all the Chinese text in the code
  3. Call the translation API to get the translated result
  4. Write the translated result back into the code
  5. Extra feature: write the source text and the translated text to a log on each run for comparison, which is especially useful when the length of the returned translation differs from the length of the source text.

Now let's implement these steps one by one.

1. The first step is to write a plugin. How?

webpack plugins are built on hooks, and versions 4.0 and 5.0 differ in many of the hooks they provide. The fastest way to get it right is to refer to an existing, mature plugin. I referred directly to html-webpack-plugin while writing mine, and the 4.0 and 5.0 versions of my plugin each follow the corresponding version of it.

Tip: this is the value of reading the source code of open source projects. You can learn many mature solutions and avoid some pitfalls, so it is worth learning how to find entry files and how to debug code.

Part of the code is as follows; see the comments:

const { sources, Compilation } = require('webpack');
// Log output file
const TRANSFROMSOURCETARGET = 'transform-source-target.txt';
// Maximum number of characters Google Translate accepts at one time
const googleMaxCharLimit = 5000;
// Plugin name
const pluginName = 'TransformLanguageWebpackPlugin';

class TransformLanguageWebpackPlugin {
  constructor(options = {}) {
    // Default parameters
    const defaultOptions = {
      translateApiUrl: '',
      from: 'zh-CN',
      to: 'zh-TW',
      separator: '-',
      regex: /[\u4e00-\u9fa5]/g,
      outputTxt: false,
      limit: googleMaxCharLimit,
    };
    // translateApiUrl (the translation API address) is required
    if (!options.translateApiUrl)
      throw new ReferenceError('The translateApiUrl parameter is required');
    // Merge the passed options with the defaults
    this.options = { ...defaultOptions, ...options };
  }
  // The apply method is called by webpack
  apply(compiler) {
    const { separator, translateApiUrl, from, to, regex, outputTxt, limit } = this.options;
    // Tap the compiler's thisCompilation hook
    compiler.hooks.thisCompilation.tap(pluginName, (compilation) => {
      // Tap the compilation's processAssets hook
      compilation.hooks.processAssets.tapAsync(
        {
          name: pluginName,
          // PROCESS_ASSETS_STAGE_ANALYSE: analyze the existing assets
          stage: Compilation.PROCESS_ASSETS_STAGE_ANALYSE,
        },
        // assets holds the path and content of every output file
        async (assets, callback) => {
          // TODO: fill in the implementation here
        }
      );
    });
  }
}

In webpack 5.0, files are processed in the processAssets hook. To see what assets contains at the Compilation.PROCESS_ASSETS_STAGE_ANALYSE stage, take the example provided in the GitHub repository.

You can see that assets is the final set of output files. Which stage to choose depends on what you need to do. PROCESS_ASSETS_STAGE_ANALYSE is chosen here because index.html also has to be processed, which requires a very late stage. For the other stages, see the related documentation.

2. Read all Chinese characters in the code

First, we need a function that groups adjacent Chinese characters into phrases. For example, given markup in which the Chinese phrases for "lost" and "tie shoelaces" sit in two separate elements, it returns ['lost', 'tie shoelaces']. The phrases are then joined with the separator: ['lost', 'tie shoelaces'] => 'lost-tie shoelaces'. The reason for keeping them separate is that a simplified character can map to more than one traditional character. If the two phrases were run together as "lost tie shoelaces" with nothing between them, the translator could pick the wrong character at the boundary, whereas "lost" is one phrase and "tie shoelaces" is another, and each translates correctly on its own.

/**
 * @description Returns an array of Chinese phrases. For example, for
 * <p>hello</p><div>world</div> (with the words written in Chinese) it returns ['hello', 'world'].
 * @param {string} content The content of the packaged bundle file
 * @param {RegExp} regex Global regex that matches one Chinese character
 * @returns {string[]}
 */
function getLanguageList(content, regex) {
  let index = 0,
    termList = [],
    term = '',
    list;
  // Walk through every matched Chinese character
  while ((list = regex.exec(content))) {
    // A gap since the previous match means the current phrase has ended
    if (list.index !== index + 1 && term) {
      termList.push(term);
      term = '';
    }
    term += list[0];
    index = list.index;
  }
  if (term !== '') {
    termList.push(term);
  }
  return termList;
}
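
As a quick check of what the function returns, the sketch below runs it on a small piece of markup and compares it with a shorter alternative. The function is repeated so the snippet runs on its own. getPhraseList is a hypothetical variant, not part of the plugin: it wraps the single-character pattern in a repeated group so that each regex match is already a whole run of adjacent characters.

```javascript
// Same logic as getLanguageList in the article, repeated to keep this runnable.
function getLanguageList(content, regex) {
  let index = 0;
  let term = '';
  let match;
  const termList = [];
  while ((match = regex.exec(content))) {
    if (match.index !== index + 1 && term) {
      termList.push(term);
      term = '';
    }
    term += match[0];
    index = match.index;
  }
  if (term !== '') termList.push(term);
  return termList;
}

// Hypothetical alternative: let the regex engine group adjacent characters.
function getPhraseList(content, charRegex) {
  const runRegex = new RegExp(`(?:${charRegex.source})+`, 'g');
  return content.match(runRegex) || [];
}

const html = '<p>你好</p><div>世界</div>';
const chinese = /[\u4e00-\u9fa5]/g;

console.log(getLanguageList(html, chinese)); // [ '你好', '世界' ]
console.log(getPhraseList(html, chinese));   // [ '你好', '世界' ]
```

Both versions accept the same single-character regex option, so either could sit behind the plugin's regex setting.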

Continuing at the TODO above, collect the Chinese text of every chunk and store it in the chunkAllList array:

let chunkAllList = [];
// First collect the matched phrases of every chunk
for (const [pathname, source] of Object.entries(assets)) {
    // Only read Chinese from js and html files; other files are not needed
    if (!(pathname.endsWith('js') || pathname.endsWith('.html'))) {
      continue;
    }
    // Get the source code string of the current chunk
    let chunkSourceCode = source.source();
    // Get all Chinese phrases in the chunk
    const chunkSourceLanguageList = getLanguageList(chunkSourceCode, regex);
    // An empty list means the file contains no matching text and needs no replacement
    if (chunkSourceLanguageList.length <= 0) continue;
    chunkAllList.push({
      // The original phrases as an array
      chunkSourceLanguageList,
      // The phrases joined with separator (default: -)
      chunkSourceLanguageStr: chunkSourceLanguageList.join(separator),
      // Chunk source code
      chunkSourceCode,
      // Chunk output path
      pathname,
    });
}

3. Request the translation API to obtain the translated results

Suppose one chunk contains only 2 characters and another contains 3; there is no need to call the translation API twice. To reduce the number of requests, the Chinese text of all chunks is first joined into a single string, with _ as the separator that marks which content belongs to which chunk.

const chunkAllSourceLanguageStr = chunkAllList
  .map((item) => item.chunkSourceLanguageStr)
  .join(`_`);

Once joined, the string has to be cut into pieces, because at most 5000 characters can be translated at a time.

// Split the text read from all chunks into pieces no longer than the Google Translate limit
const sourceList = this.getSourceList(chunkAllSourceLanguageStr, limit);

getSourceList(sourceStr, limit) {
    let len = sourceStr.length;
    let index = 0;
    const chunkSplitLimitList = [];

    // Take limit characters at a time until the string is used up
    while (len > 0) {
      let end = index + limit;
      const str = sourceStr.slice(index, end);
      chunkSplitLimitList.push(str);
      index = end;
      len = len - limit;
    }
    return chunkSplitLimitList;
}

After cutting, use Promise.all to send all the requests; the translation counts as successful only if every request succeeds.

// Translate
const tempTargetList = await Promise.all(
  sourceList.map(async (text) => {
    return await transform({
      translateApiUrl: translateApiUrl,
      text: text,
      from: from,
      to: to,
    });
  })
);
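
The excerpts do not show how the translated pieces get back to one string per chunk. The sketch below is one plausible way to close that gap, not the plugin's actual code: splitByLimit, regroup and fakeTranslate are hypothetical names, and fakeTranslate is a stub that stands in for the real request to the translation service. It assumes the translator leaves the _ and - separators untouched.

```javascript
const CHUNK_SEPARATOR = '_';

// Cut a string into pieces of at most `limit` characters.
function splitByLimit(str, limit) {
  const parts = [];
  for (let i = 0; i < str.length; i += limit) {
    parts.push(str.slice(i, i + limit));
  }
  return parts;
}

// Glue the translated pieces together in order, then split per chunk again.
function regroup(translatedPieces) {
  return translatedPieces.join('').split(CHUNK_SEPARATOR);
}

// Stub translator (an assumption): maps a few characters, keeps everything else.
async function fakeTranslate(text) {
  const table = { '联': '聯', '系': '繫', '丢': '丟', '带': '帶' };
  return Array.from(text, (ch) => table[ch] ?? ch).join('');
}

async function translateChunks(chunkStrings, limit) {
  const joined = chunkStrings.join(CHUNK_SEPARATOR);
  const pieces = splitByLimit(joined, limit);
  const translated = await Promise.all(pieces.map(fakeTranslate));
  return regroup(translated);
}

translateChunks(['联系-丢失', '系鞋带'], 4).then((targetList) => {
  console.log(targetList); // [ '聯繫-丟失', '繫鞋帶' ]
});
```

Promise.all keeps the results in the same order as the input pieces, which is what makes the final join safe even though the requests run in parallel.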

4. Write the translated result into the code

Once the traditional translation for all chunks is available, loop over the chunk array chunkAllList and replace the simplified text in each chunk's source code:

for (let i = 0; i < chunkAllList.length; i++) {
  const {
    chunkSourceLanguageStr,
    chunkSourceLanguageList,
    pathname,
    chunkSourceCode,
  } = chunkAllList[i];
  let sourceCode = chunkSourceCode;
  // Convert simplified to traditional, one phrase at a time
  targetList[i].split(separator).forEach((phrase, index) => {
    sourceCode = sourceCode.replace(
      chunkSourceLanguageList[index],
      phrase
    );
  });
  // Collect the source and translated text for the log file
  if (outputTxt) {
    writeContent += this.writeFormat(
      pathname,
      chunkSourceLanguageStr,
      targetList[i]
    );
  }
  // Replace the asset content with the converted source
  compilation.updateAsset(pathname, new sources.RawSource(sourceCode));
}
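
The replacement step can be tried in isolation. In this sketch applyTranslations is a hypothetical helper that mirrors the loop above, and the sample phrases are chosen for illustration. It relies on the fact that String.prototype.replace with a string pattern replaces only the first match, so phrases are consumed in the order in which they were read.

```javascript
// Hypothetical helper mirroring the plugin's replacement loop.
function applyTranslations(sourceCode, sourcePhrases, translatedPhrases) {
  let result = sourceCode;
  translatedPhrases.forEach((phrase, i) => {
    // A string pattern replaces only the first remaining occurrence.
    result = result.replace(sourcePhrases[i], phrase);
  });
  return result;
}

const html = '<p>联系</p><p>关系</p>';
const converted = applyTranslations(html, ['联系', '关系'], ['聯繫', '關係']);

console.log(converted); // <p>聯繫</p><p>關係</p>
```

The two arrays must have the same length and order, which is why a translation whose length differs from the source is worth logging.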

The above code is not complete. For the complete code and how to use the plug-in, please refer to the translate-language-webpack-plugin

5. Output the reference text

The log mainly lists the Chinese characters found in each chunk, for comparison. If the page has no other dynamic text and these characters need a special font, a font file containing only the characters that were read can also be packaged, which is much smaller than a complete font file.
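
The article does not show the log format, so the following is only a sketch of what such helpers might look like. This writeFormat is a hypothetical version (the plugin's real method may differ), and uniqueChars is an assumed helper for the font idea: it collects each matched character once, for example as input for a font subsetting tool.

```javascript
// Hypothetical log formatter: one entry per file, flagged when lengths differ.
function writeFormat(pathname, sourceStr, targetStr) {
  const flag = sourceStr.length === targetStr.length ? '' : ' [length mismatch]';
  return `${pathname}${flag}\n  source: ${sourceStr}\n  target: ${targetStr}\n`;
}

// Hypothetical helper: every distinct matched character, in first-seen order.
function uniqueChars(str, regex) {
  return [...new Set(str.match(regex) || [])].join('');
}

const entry = writeFormat('js/app.js', '联系-关系', '聯繫-關係');
console.log(entry);

console.log(uniqueChars('联系-关系', /[\u4e00-\u9fa5]/g)); // 联系关
```

A length mismatch does not always mean an error, but it is the cheapest signal that a phrase was split or merged during translation.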

Conclusion

Note: the plugin replaces all Chinese characters in the page, including those in JS, but it cannot convert text returned by API calls. For that, the backend has to return the traditional text itself.

At this point a complete business requirement has been optimized. In theory the translation plugin supports conversion between any pair of languages, but because translation changes semantics, the result may not mean what we want. It is best suited to conversion between simplified and traditional Chinese.