Author | Busi (dead-horse), head of product technology for Yuque at Ant Financial

What is Yuque?

Yuque is a professional cloud knowledge base for individuals and teams. It offers a distinctive approach to knowledge management and makes work collaboration easy and smooth. It provides online editing for documents in a variety of formats (rich text, tables, design drafts, etc.), supports real-time multi-person collaborative editing, and stores data in the cloud so nothing is lost. The biggest difference between Yuque and other document tools is that it organizes documents into knowledge bases, which helps knowledge creators manage their knowledge better.

The evolution of Yuque's technical architecture

Prototype stage

Yuque was born in 2016, when Ant Financial Cloud needed a tool to host its documentation. The engineer in charge at the time started building this document tool in his spare time. In the early days of the project we had no dedicated people or resources, so to validate the prototype quickly we chose the lowest-cost technology options.

The underlying services were built entirely on the BaaS services and container hosting platform provided internally by the Experience Technology Department:

  • Object service: a MongoDB-like data storage service;
  • File service: a file storage service built on Alibaba Cloud OSS;
  • DockerLab: a container hosting platform.

These services and platforms were built on Node.js and designed for internal innovation projects. Because they lowered the cost of trying new ideas, they gave engineers a better environment for innovation.

For the application-layer server, the natural choice was Egg, the open-source Node.js web framework from the Experience Technology Department (packaged internally at Ant as Chair), and the server was implemented as a single web application. On the client side we chose the React stack together with Ant's Ant Design (antd), and used CodeMirror to build a powerful and elegant online Markdown editor.

This was Yuque's “prototype stage”: an engineer's side project that used in-house BaaS services for innovation projects plus a set of open-source technologies to validate the product prototype of an online documentation tool.

PS: I wasn’t on the Yuque team at the time, but I was providing the Object, File, and other BaaS services, as well as support for the Egg.js web framework.

Internal service stage

As the online document tool gained recognition within the team, Yuque's goal grew beyond being Financial Cloud's documentation tool: it aimed to replace Confluence and other competing products and become the knowledge management platform for Alibaba's 100,000 employees. Yuque is aimed at knowledge creators, and a Markdown editor alone would not help non-technical people work effectively with it. Although many devoted fans started learning, and even fell in love with, Markdown because of Yuque, we still had to dive into the deep end of rich text editing. At the same time, unlike rich text editors such as Word, we chose a more Web-native approach, adding features such as formulas, text-based diagrams, and mind maps to the rich text editor. As we kept exploring knowledge management, the three-layer structure (team, knowledge base, document) began to take shape, and features such as collaboration, sharing, search, and news feeds became more and more complex. BaaS services alone could no longer meet Yuque's business needs.

To meet the challenges brought by business growth, we made the following changes:

  • Although BaaS services are simple and cheap to use, their features could not keep up with Yuque's growing business, and their stability was also insufficient. So we replaced the underlying BaaS services with internal IaaS services (MySQL, OSS, caching, search, etc.).
  • The web layer still used Node.js and the Egg framework, but the business layer grew into a large monolithic application following Rails community practices. We introduced an ORM to build the data model layer, which made the code structure clearer.
  • The front-end editor migrated from CodeMirror to Slate. To better implement Yuque's editor features, we forked Slate for in-depth development and designed a custom content storage format for more efficient data processing and better compatibility.

In the internal service stage, Yuque became a formal product, no different from other Ant projects. After being tested and refined inside Alibaba, Yuque's product form was largely settled.

Commercialization stage

As Yuque's influence grew inside the company, some Alibaba alumni who had left to start their own businesses came to us: “Yuque is really useful. Have you considered commercializing it so outside companies can use it?” After less than half a year of incubation and restructuring, Yuque began offering commercial services in early 2018.

When an application moves out of the company and into a commercial environment, the technical challenges suddenly grow much larger. The core knowledge creation and management features became more and more complex. With new formats such as spreadsheets and mind maps, the requirement for real-time multi-person collaboration raised the bar for editor technology. To better serve enterprise and individual users, Yuque also invested heavily in enterprise services, membership services, and more. As the business grew rapidly, commercialization also raised the bar for quality, security, and stability.

In response to business growth, Yuque's architecture evolved as well:

We moved all underlying dependencies to the cloud, migrating entirely to Alibaba Cloud. Alibaba Cloud provides not only basic storage and computing but also richer advanced services, along with stability guarantees.

  • A wide range of basic cloud services lets Yuque's server choose the storage, queue, search engine, and other basic services best suited to its business;
  • AI services open up more possibilities for Yuque's product: services such as OCR image recognition and machine translation have been turned directly into Yuque features.

At the application layer, Yuque's server is still a large Node.js web application based on the Egg framework. However, as more and more features were added, some relatively independent services were split out of the main service. These services fall into several categories:

  • Microservices: for example, the real-time multi-person collaboration service is relatively independent, and as a long-connection service it is not suited to frequent releases, so we split it into an independent microservice to keep it stable.
  • Task services: features such as Yuque's previews of many local file formats produce tasks that are resource-intensive and have complex dependencies. Separating them from the main service keeps those uncontrollable dependencies and that resource consumption away from it.
  • Function Compute: tasks such as PlantUML and Mermaid previews are not sensitive to response time and can be packaged as Alibaba Cloud Function Compute functions. Running them there is both economical and secure.

As the editors grew more complex, building on top of Slate became increasingly problematic. Eventually Yuque moved to in-house editors: a rich text editor based on the browser's contenteditable, a spreadsheet editor based on Canvas, and a mind map editor based on SVG.

For an introduction to rich text editors, see “The Evolution of Rich Text Editors” by Long Hao, the creator of the Lake editor.

This is Yuque's commercialization stage (where it is today), but we still keep a small team working across the full JavaScript stack. We build on cloud infrastructure, use cloud services to build Yuque's features, and provide knowledge creation and management tools for both enterprise users and individual knowledge workers.

Full-stack JavaScript

On social networks, people seem to view full-stack JavaScript negatively. “Jack of all trades, master of none” may be many people's first impression on hearing the term full-stack engineer. So why did Yuque choose full-stack JavaScript?

Full-stack JavaScript and product engineers

Here, we define a full-stack JavaScript developer not as a “full-stack engineer” but as a T-shaped product engineer, with one specialty and many skills:

  • They are the product's “technical partners” and have a sense of ownership. They join product discussions and design with the product manager, offer suggestions on product design from a technical perspective, independently complete full-stack development of product features, and track the results after release.
  • They are also experts in a particular technical domain: one person might specialize in server-side development, another in testing, front-end build tooling, or CSS. They use their domain knowledge to optimize the team's R&D toolchain and improve development efficiency.

At Yuque, product engineers develop products as follows:

  • In the product design stage, product engineers take part in the discussions that produce the final design draft. Because they are fully involved early on, technical problems rarely surface later when the design is implemented.
  • Next, the system analysis and design are documented in Yuque, and an asynchronous review is started there. Larger technical solutions are also reviewed by other domain experts to make sure all technical difficulties have been worked out;
  • Once the system design is clear, development begins;
  • All code requires automated test coverage: full unit test coverage for all new code and modified business logic, plus end-to-end tests for critical-path features. Writing automated tests is a required step before code review.
  • After each phase of feature development and test writing is complete, an asynchronous code review is started, inviting the relevant business owners and domain experts. Reviews examine the business logic for correctness, security, and maintainability.
  • Releases must follow the “three axes” principle: every change must support gray (canary) release, have an emergency plan such as rollback, and be monitored. This keeps feature changes from introducing bugs that affect a large number of users.
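The “gray” part of that principle can be illustrated with a minimal sketch of percentage-based rollout. It assumes users are bucketed by a stable hash of the feature name plus user ID, with an emergency switch that turns the feature off for everyone. `isFeatureEnabled`, the `FeatureFlag` shape, and the choice of FNV-1a as the hash are hypothetical illustrations, not a description of Yuque's release tooling.

```typescript
// FNV-1a (32-bit): a small, fast, non-cryptographic hash mapping a string to a stable integer.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Hypothetical feature-flag shape used only in this sketch.
interface FeatureFlag {
  name: string;
  grayPercent: number; // 0-100: share of users who get the new behavior
  killSwitch: boolean; // emergency switch: true disables the feature for everyone
}

function isFeatureEnabled(flag: FeatureFlag, userId: string): boolean {
  if (flag.killSwitch) return false;
  // The same user always lands in the same bucket, so their experience stays
  // consistent while the percentage is raised step by step.
  const bucket = fnv1a(`${flag.name}:${userId}`) % 100;
  return bucket < flag.grayPercent;
}

const flag: FeatureFlag = { name: "new-editor", grayPercent: 10, killSwitch: false };
console.log(isFeatureEnabled(flag, "user-42"));
```

Monitoring closes the loop: error rates inside and outside the gray bucket can be compared before `grayPercent` is raised, and the kill switch serves as the emergency lever if they diverge.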

How does Yuque do full-stack JavaScript testing? Interested readers can refer to the talk by Niu Dafeng of the big front-end automated testing team: “Thinking and Practice of Big Front-End Testing at Yuque”.

With full-stack JavaScript, the Yuque team can develop products more efficiently and with higher quality:

  • At the code level, a great deal of code can be reused. The editor, for example, runs not only on the Web but also on the desktop, and many data processing capabilities can also be used on the server side.
  • In terms of R&D efficiency, full-stack development cuts a lot of communication overhead and is highly efficient at Yuque's current stage. Moreover, full-stack JavaScript spares developers from switching between languages and from having to work out which libraries in another language replace front-end tools such as lodash and Moment, which greatly improves the efficiency of full-stack development.
  • Finally, from the engineers' point of view, full-stack development gives them the chance to take a deep part in the whole product development process. They think on their own initiative about where the product can be improved and what they can contribute technically. For example, the OCR search feature Yuque recently launched was driven by Yuque's full-stack engineers on their own initiative, from technical pre-research all the way to launch.

Full-stack JavaScript and Node.js

When it comes to full-stack JavaScript, one technology that is hard to avoid is Node.js. As a server-side runtime tightly integrated with the front end, it has essentially become the face of full-stack development. But is Node.js really suitable for large commercial projects? There has been plenty of skepticism about this.

In fact, as the JavaScript language has evolved, many of these problems have been solved. For example, async functions let developers write asynchronous code in a synchronous style, which makes it easier to understand and makes exception handling simpler. At the same time, as the community has matured, a large number of high-quality modules and frameworks have emerged. Yuque's server is based on the Egg framework, which already integrates many of the modules and services web development needs, and its programming model, built on async functions, is simpler. The arrival of TypeScript has also eased many people's concerns about using JavaScript for large projects, and there are other ways to ensure code quality and maintainability (Yuque is in fact a pure JavaScript project without a single line of TypeScript).
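A minimal sketch of the async-function style described above. `findDocMeta` and `fetchDocBody` are hypothetical stand-ins for a database query and an object-storage read. The point is that sequential `await`s read like synchronous code, and a single `try/catch` covers a failure in either step.

```typescript
interface Doc {
  id: number;
  title: string;
  body: string;
}

// Hypothetical async data sources standing in for a database and an object store.
async function findDocMeta(id: number): Promise<{ id: number; title: string }> {
  if (id <= 0) throw new Error(`doc ${id} not found`);
  return { id, title: `Doc ${id}` };
}

async function fetchDocBody(id: number): Promise<string> {
  return `body of doc ${id}`;
}

// Reads top to bottom like synchronous code; each await pauses until its promise settles.
async function loadDoc(id: number): Promise<Doc | null> {
  try {
    const meta = await findDocMeta(id);
    const body = await fetchDocBody(meta.id);
    return { ...meta, body };
  } catch (err) {
    // One try/catch handles a rejection from either await above.
    console.error((err as Error).message);
    return null;
  }
}
```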

The first thing Yuque does is define the boundary between the core system and external systems. Using the hexagonal architecture (also known as the ports-and-adapters architecture), we pin down how the core system interacts with external systems and with users, defining inputs and outputs as ports. External systems connect to the ports Yuque exposes through “adapters”; as long as an adapter follows the port's definition, the external system can be freely replaced.
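The ports-and-adapters idea can be sketched in a few lines. Here `NotificationPort` is a hypothetical outbound port defined by the core, `DocService` depends only on that interface, and two interchangeable adapters implement it; none of these names come from Yuque's codebase.

```typescript
// Port: an interface owned by the core system that describes what it needs from outside.
interface NotificationPort {
  send(userId: string, message: string): Promise<void>;
}

// Core logic: depends only on the port, never on a concrete provider.
class DocService {
  private readonly notifier: NotificationPort;

  constructor(notifier: NotificationPort) {
    this.notifier = notifier;
  }

  async share(docTitle: string, withUserId: string): Promise<void> {
    await this.notifier.send(withUserId, `"${docTitle}" was shared with you`);
  }
}

// Adapter 1: records messages in memory (handy for tests).
class InMemoryNotifier implements NotificationPort {
  readonly sent: Array<{ userId: string; message: string }> = [];

  async send(userId: string, message: string): Promise<void> {
    this.sent.push({ userId, message });
  }
}

// Adapter 2: a stand-in for a real gateway such as email or instant messaging (hypothetical).
class ConsoleNotifier implements NotificationPort {
  async send(userId: string, message: string): Promise<void> {
    console.log(`[to ${userId}] ${message}`);
  }
}

// Replacing the external system means replacing the adapter; DocService is untouched.
const service = new DocService(new ConsoleNotifier());
void service.share("Architecture notes", "alice");
```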

In this model, the Controller is the HTTP adapter that Yuque exposes to the user interface. In the Controller, we format and convert user request parameters, check user permissions, and format the output.
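The Controller's three duties can be sketched as a framework-agnostic function. The `HttpRequest`/`HttpResponse` shapes and the `DocQueries` interface are hypothetical simplifications; a real Egg controller reads the request from its context object instead. The steps mirror the text: convert parameters, check permissions, format the output.

```typescript
// Hypothetical, framework-agnostic request/response shapes.
interface HttpRequest {
  userId: string | null; // null when the caller is not logged in
  params: Record<string, string>;
}
interface HttpResponse {
  status: number;
  body: unknown;
}

interface DocRecord {
  id: number;
  title: string;
  ownerId: string;
  createdAt: Date;
}

// Core-side queries the controller calls into (hypothetical).
interface DocQueries {
  findById(id: number): DocRecord | undefined;
  canRead(userId: string, doc: DocRecord): boolean;
}

function showDoc(req: HttpRequest, docs: DocQueries): HttpResponse {
  // 1. Convert and validate input: path parameters arrive as strings.
  const id = Number(req.params.id);
  if (!Number.isInteger(id) || id <= 0) return { status: 400, body: { error: "invalid id" } };

  // 2. Check authentication and permissions before handing off to the core.
  if (req.userId === null) return { status: 401, body: { error: "login required" } };
  const doc = docs.findById(id);
  if (!doc) return { status: 404, body: { error: "not found" } };
  if (!docs.canRead(req.userId, doc)) return { status: 403, body: { error: "forbidden" } };

  // 3. Format output: expose only public fields, with dates as ISO strings.
  return {
    status: 200,
    body: { id: doc.id, title: doc.title, createdAt: doc.createdAt.toISOString() },
  };
}
```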

We define how Yuque interacts with third-party platforms and services (typically as a set of methods), use adapters to wrap the different services used in different environments behind a unified set of methods, and log each call as it is made.
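One way to log every call that crosses an adapter boundary is to wrap the adapter in a JavaScript `Proxy`. This sketch assumes every adapter method returns a promise. `FileStore`, `MemoryFileStore`, and `withCallLog` are hypothetical names, and a production version would write to a logging system rather than an array.

```typescript
// Unified method set the core uses for file storage (hypothetical port).
interface FileStore {
  put(key: string, content: string): Promise<void>;
  get(key: string): Promise<string | undefined>;
}

// One environment-specific adapter; a cloud adapter would implement the same interface.
class MemoryFileStore implements FileStore {
  private readonly files = new Map<string, string>();

  async put(key: string, content: string): Promise<void> {
    this.files.set(key, content);
  }

  async get(key: string): Promise<string | undefined> {
    return this.files.get(key);
  }
}

interface CallLogEntry {
  service: string;
  method: string;
  ok: boolean;
  ms: number;
}

// Wraps every method so each call records its name, outcome, and duration.
// Assumes all methods of the adapter are async.
function withCallLog<T extends object>(service: string, adapter: T, log: CallLogEntry[]): T {
  return new Proxy(adapter, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      if (typeof value !== "function") return value;
      return async (...args: unknown[]) => {
        const started = Date.now();
        try {
          const result = await value.apply(target, args);
          log.push({ service, method: String(prop), ok: true, ms: Date.now() - started });
          return result;
        } catch (err) {
          log.push({ service, method: String(prop), ok: false, ms: Date.now() - started });
          throw err;
        }
      };
    },
  });
}
```

Because the wrapper works on any object, the same function can be applied to every adapter, so call logging stays out of the core logic.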

The Model layer is the data layer. Take the Doc model as an example: its metadata is stored in MySQL, while the document body is encrypted and stored in OSS. Yuque's core business logic is unaware of where the underlying data is stored. Furthermore, as long as Yuque talks to the database through SQL, the underlying data can be migrated seamlessly to other databases that support full SQL syntax, such as OceanBase, and any minor differences can be encapsulated in the Model layer.
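A sketch of how a Doc model can hide where its data lives. The `MetaTable`, `BlobStore`, and `Cipher` interfaces are hypothetical stand-ins for a MySQL client, an OSS client, and an encryption routine. The in-memory implementations and the string-reversing `placeholderCipher` exist only so the example runs; the latter is not encryption.

```typescript
interface DocMeta {
  id: number;
  title: string;
  bodyKey: string; // where the body lives in object storage
}

// Hypothetical storage ports: stand-ins for a MySQL table and an OSS bucket.
interface MetaTable {
  insert(meta: DocMeta): void;
  findById(id: number): DocMeta | undefined;
}
interface BlobStore {
  put(key: string, data: string): void;
  get(key: string): string | undefined;
}
interface Cipher {
  encrypt(plain: string): string;
  decrypt(cipherText: string): string;
}

// Callers see one Doc object and never learn where each part is stored.
class DocModel {
  private readonly metaTable: MetaTable;
  private readonly blobs: BlobStore;
  private readonly cipher: Cipher;
  private nextId = 1;

  constructor(metaTable: MetaTable, blobs: BlobStore, cipher: Cipher) {
    this.metaTable = metaTable;
    this.blobs = blobs;
    this.cipher = cipher;
  }

  create(title: string, body: string): number {
    const id = this.nextId++;
    const bodyKey = `docs/${id}`;
    this.blobs.put(bodyKey, this.cipher.encrypt(body)); // body: encrypted, in object storage
    this.metaTable.insert({ id, title, bodyKey }); // metadata: in the relational database
    return id;
  }

  find(id: number): { id: number; title: string; body: string } | undefined {
    const meta = this.metaTable.findById(id);
    if (!meta) return undefined;
    const stored = this.blobs.get(meta.bodyKey);
    if (stored === undefined) return undefined;
    return { id: meta.id, title: meta.title, body: this.cipher.decrypt(stored) };
  }
}

// In-memory adapters so the sketch runs without MySQL or OSS.
class MemoryMetaTable implements MetaTable {
  private readonly rows = new Map<number, DocMeta>();
  insert(meta: DocMeta): void {
    this.rows.set(meta.id, meta);
  }
  findById(id: number): DocMeta | undefined {
    return this.rows.get(id);
  }
}

class MemoryBlobStore implements BlobStore {
  private readonly objects = new Map<string, string>();
  put(key: string, data: string): void {
    this.objects.set(key, data);
  }
  get(key: string): string | undefined {
    return this.objects.get(key);
  }
}

// PLACEHOLDER ONLY: reverses the string so the stored form differs from the input.
// This is not encryption; a real system would use an authenticated cipher such as AES-GCM.
const placeholderCipher: Cipher = {
  encrypt: (plain) => plain.split("").reverse().join(""),
  decrypt: (cipherText) => cipherText.split("").reverse().join(""),
};
```

Moving metadata to another SQL database would mean writing a new `MetaTable` adapter; `DocModel` and its callers stay the same.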

Finally, take publishing a document as an example. The user interacts with Yuque by calling an HTTP interface; through the Model layer, the data is written to storage (MySQL and OSS) and the document cache is updated. At the same time, asynchronous messages are sent to other systems, triggering DingTalk webhooks and syncing the data to the search engine. Interactions with external systems are encapsulated by adapters, which handle parameter conversion, permission checks, and logging. This keeps the core logic simple and makes tracing system call chains easier.
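The publish flow can be sketched as follows, assuming hypothetical `DocStore` and `DocCache` ports and a tiny in-process `MessageBus` standing in for a real message queue. Persisting and refreshing the cache are awaited. Notifying downstream systems (a webhook, search-index sync) is started but not awaited, and one failing subscriber does not affect the others.

```typescript
interface PublishedEvent {
  docId: number;
  title: string;
}
type Subscriber = (event: PublishedEvent) => Promise<void>;

// Tiny in-process message bus (hypothetical); production systems would use a real message queue.
class MessageBus {
  private readonly subscribers: Subscriber[] = [];

  subscribe(fn: Subscriber): void {
    this.subscribers.push(fn);
  }

  // Each subscriber runs independently: one rejection does not stop the others.
  publish(event: PublishedEvent): Promise<PromiseSettledResult<void>[]> {
    return Promise.allSettled(
      this.subscribers.map((fn) => Promise.resolve().then(() => fn(event))),
    );
  }
}

// Hypothetical ports for the Model layer (MySQL + OSS) and the document cache.
interface DocStore {
  save(id: number, title: string, body: string): Promise<void>;
}
interface DocCache {
  invalidate(key: string): Promise<void>;
}

async function publishDoc(
  doc: { id: number; title: string; body: string },
  deps: { store: DocStore; cache: DocCache; bus: MessageBus },
): Promise<{ notified: Promise<PromiseSettledResult<void>[]> }> {
  await deps.store.save(doc.id, doc.title, doc.body); // 1. persist through the Model layer
  await deps.cache.invalidate(`doc:${doc.id}`); // 2. refresh the document cache
  // 3. Notify other systems without making the caller wait. The promise is returned
  //    inside an object because an async function would otherwise flatten it.
  const notified = deps.bus.publish({ docId: doc.id, title: doc.title });
  return { notified };
}
```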

Hybrid Application Architecture

Once a system grows to a certain size, should we keep adding features to the large monolithic application, or split it into microservices? Both architectures exist for a reason, and each has its pros and cons. Which one to use should depend on the current business scale and how the team is organized. Accordingly, Yuque's technical architecture has become a hybrid one that follows the shape of its business.

The main service is a large Node.js service that centralizes all of the application's business logic. Besides the main service, there are other services in different forms:

  • Microservices: functional modules that are independent and stable, or services with special deployment requirements, are deployed independently as microservices, and for now the systems communicate with each other through HTTP interfaces. For example, the real-time collaboration service is deployed as an independent microservice because it is relatively independent and stable, and as a long-connection service it cannot be released and restarted frequently.
  • Task clusters: CPU-intensive tasks, or services with complex third-party dependencies, are placed in a separate task cluster. For example, the various file preview services may depend on other services and consume a lot of computing resources, so they are best suited to a task cluster, where queuing smooths out concurrency peaks.
  • Function Compute: services that are not sensitive to response time and can be packaged as functions are migrated to Alibaba Cloud Function Compute where possible, such as PlantUML, Avi, and other text-to-diagram services.
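The task-cluster idea of queuing work so that only a bounded number of heavy jobs run at once can be sketched in-process. This `TaskQueue` is a hypothetical illustration; a real task cluster would sit behind a distributed message queue and run on separate machines, but the concurrency cap works the same way.

```typescript
type Task<T> = () => Promise<T>;

// Runs at most `concurrency` tasks at a time; extra tasks wait in FIFO order.
class TaskQueue {
  private running = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly concurrency: number;

  constructor(concurrency: number) {
    if (concurrency < 1) throw new Error("concurrency must be >= 1");
    this.concurrency = concurrency;
  }

  async run<T>(task: Task<T>): Promise<T> {
    if (this.running >= this.concurrency) {
      // Wait until a finishing task hands its slot over to us.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.running++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      // Hand the slot directly to the next waiter (running count unchanged),
      // or release it if nobody is waiting.
      if (next) next();
      else this.running--;
    }
  }
}
```

However many preview requests arrive at once, only `concurrency` of them consume CPU at any moment; the rest are delayed rather than rejected, which is the peak-smoothing the text describes.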

Take Avi rendering as an example: the user enters a snippet of diagram code, which is sent to a function deployed on Alibaba Cloud Function Compute; the function renders it with Puppeteer and returns the result as SVG.

Why single out Serverless? Remember that Node.js is single-threaded and not suited to CPU-intensive tasks. With Serverless, we can move these CPU-intensive tasks, which also carry security risks, to Function Compute. The functions run in a sandbox, so we need not worry about the security risks of malicious user code, and taking these CPU-intensive tasks off the main service keeps them from blocking it under concurrent load. The pay-as-you-go model also saves significant cost, since low-frequency features no longer need a resident service. We therefore migrate such services to Serverless (such as Alibaba Cloud Function Compute) wherever possible.

Common concerns beyond the language

There is more to any commercial system than its programming language, and two of the most important aspects are probably security and stability.

A system faces a variety of security risks across the front end, the server, and its underlying dependencies:

  • Front-end security risks: XSS, redirect phishing, cross-site request forgery, etc.;
  • Server-side security risks: horizontal privilege escalation, unauthorized access, sensitive information leakage, SSRF, SQL injection, etc.;
  • Cloud service security risks: SMS/email bombing, data leakage, content security, etc.

There is no silver bullet for these security problems; they can only be dealt with one by one, but there are some basic principles:

  • Never trust any input from users:
    • Anywhere rich text is rendered needs XSS protection, because the content may not have been entered through the editor;
    • Any user code executed on the server must be sandboxed;
    • When the server requests user-supplied resources, the request must pass an SSRF filter.
  • Distill standard coding patterns that address security risks, and emphasize them in code review:
    • All interfaces must perform permission checks;
    • Response serialization must filter out sensitive information;
    • SQL must never be built by string concatenation.
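Several of these rules reduce to small, reviewable helpers. The sketch below is illustrative only, and every function name in it is hypothetical. The SSRF check covers IPv4 literals only (no IPv6, no DNS resolution), and rendering rich text safely needs an allowlist-based HTML sanitizer, which plain escaping does not replace.

```typescript
// 1. XSS: escape untrusted text before placing it in HTML.
//    Rendering rich text needs an allowlist-based sanitizer instead; escaping alone is not enough there.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// 2. SSRF: refuse loopback, private, link-local, and "this network" IPv4 ranges.
//    Apply it to the address a hostname actually resolves to, not only to the URL text.
function isBlockedIPv4(ip: string): boolean {
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(ip)) return true; // not a dotted-quad literal: reject
  const [a, b, c, d] = ip.split(".").map(Number) as [number, number, number, number];
  if ([a, b, c, d].some((octet) => octet > 255)) return true;
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// 3. Sensitive data: serialize responses through an explicit allowlist of fields.
interface UserRow {
  id: number;
  login: string;
  email: string;
  passwordHash: string;
}

function serializeUser(user: UserRow): Pick<UserRow, "id" | "login"> {
  return { id: user.id, login: user.login };
}

// 4. SQL: keep values out of the SQL text and pass them as parameters
//    (the `?` placeholder style used by common Node.js MySQL drivers).
function userByLoginQuery(login: string): { sql: string; values: unknown[] } {
  return { sql: "SELECT id, login FROM users WHERE login = ?", values: [login] };
}
```

Centralizing these checks in shared helpers gives reviewers one thing to look for in code review: whether the helper was used.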

Yuque has worked with the security team since the start of commercialization. That work ranges from internal security awareness training and testing by the internal security team to internal red-team/blue-team exercises and external white-hat penetration testing. Security is a protracted battle.

We’ve done a lot of work across the front end, the server, and the cloud to ensure stability. Like security, stability is a long-term effort that spans the whole stack. Yuque's stability work focuses on two dimensions:

  • Ensuring service availability: the architecture should eliminate single points of failure. Underlying data must have disaster recovery and backups, and services must be deployed across multiple units and availability zones. At the same time, avoid introducing unnecessary strong dependencies;
  • Exception monitoring and tracing: event tracking and exception log monitoring on the front end, full-link log tracing and collection on the server, and system performance monitoring and analysis. Together these let us detect and trace anomalies promptly and locate and analyze performance problems.

What does avoiding unnecessary strong dependencies mean? For example, MySQL is a strong dependency that cannot be removed, but the cache should not be one. However, Yuque originally stored sessions in the cache (Redis). If the Redis cluster failed, user data could not be retrieved and users could not log in, which turned the cache into a strong dependency. So we moved session storage to MySQL, making Redis a weak dependency: the system keeps working even if Redis goes down. Another example: Yuque recently launched real-time multi-person collaborative editing. Before this feature, document locking kept multiple people from editing the same document at the same time. Real-time collaboration, however, introduced another service: if it failed, users could not edit documents, so it became a strong dependency of Yuque. To solve this, we automatically fall back to the old locking mode when a user fails to connect to the collaboration service, which makes the collaboration service a weak dependency of Yuque as well.
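Both examples follow the same pattern: try the optional dependency, and on failure continue along a path that only uses the strong ones. The sketch below is hypothetical (`loadSession`, `chooseEditMode`, and the port interfaces are illustrative names, not Yuque's code) and assumes the database holds the authoritative session record while the cache only speeds up reads.

```typescript
interface Session {
  userId: string;
}
interface SessionCache {
  get(id: string): Promise<Session | undefined>;
  set(id: string, session: Session): Promise<void>;
}
interface SessionTable {
  find(id: string): Promise<Session | undefined>;
}

// The database is the source of truth; the cache only speeds things up.
async function loadSession(id: string, cache: SessionCache, db: SessionTable): Promise<Session | undefined> {
  try {
    const hit = await cache.get(id);
    if (hit) return hit;
  } catch {
    // Cache outage: ignore it and fall through to the database (weak dependency).
  }
  const session = await db.find(id); // strong dependency: errors here do propagate
  if (session) {
    // Best-effort refill; a cache failure must not fail the request.
    void Promise.resolve()
      .then(() => cache.set(id, session))
      .catch(() => undefined);
  }
  return session;
}

type EditMode = "realtime" | "lock";

// If the collaboration service is unreachable, fall back to the older document-lock mode.
async function chooseEditMode(connectCollab: () => Promise<void>): Promise<EditMode> {
  try {
    await connectCollab();
    return "realtime";
  } catch {
    return "lock";
  }
}
```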

How to choose a technology stack

Yuque has evolved over the years, and so has the technology behind it, but it has always followed a few principles:

  1. The technology stack should match the product's stage of development. A product has different technical requirements at different stages: the earlier the stage, the more iteration speed matters; after commercialization, stability and performance matter more. There is no need to start with the most advanced technical solutions; instead, weigh the trade-offs against the product's current stage.
  2. The technology stack should fit the team's technical background. Yuque chose full-stack JavaScript because most of its incubation team were programmers with a JavaScript background. Node.js is also a first-class citizen at Ant, with fairly mature supporting infrastructure.
  3. Most importantly, whatever technology stack you choose, always consider security, stability, and maintainability (extensibility). The languages and services you use will change, but security, stability, and writing maintainable code remain key factors in a project's long-term viability.
