Front-end early chat conference, a new starting point for front-end growth, jointly held with the Nuggets. Add wechat Codingdreamer into the conference exclusive push group, win at the new starting line.


The 14th | front-end growth promotion, 8-29 will be broadcast live, 9 lecturer (ant gold suit/tax friends, etc.), point I get on the bus 👉 (registration address) :


The text is as follows

This article is the ninth online document, the front end early chat 61, from the yuquai team senior front-end technology expert – Yu Chun sharing brief collation version of the lecture (the complete version of the demo please see the video and PPT) :


Front end to document | in spring – finches spreadsheet from the way of research


To introduce myself

My name is Zhao Yong, and my flower name is Yu Chun. I joined Alibaba Taobao UED in 2009, and joined Ant Experience Technology Department in 2015. Now I am in yuquai team.

background

Wordfinch is an innovative product of the Experience Technology Department of Ant Financial, which is now in commercial operation. Provide knowledge creation, organization and communication services for individuals and organizations. If you or your team have knowledge management needs, try using language. Wordfinch has always attached great importance to creative experience, so we have been continuously investing in research and development in the field of editor, and constantly iterating. Today, we want to introduce the online spreadsheet developed by ourselves.


When preparing for this sharing, I specially looked at the record of the first submission of the code, which was on May 29 last year, exactly one year ago. Before sharing, I would like to introduce to you the research and development schedule of the following language spreadsheet in this year.

In 2019,
  • May: Project approval.
  • August: Beta version launched, with basic row and column operations, basic styles, merge cells, filter sort, support to insert links, images, attachments, checkbox, drop-down options, etc.
  • September: Import function, multi-sheet, formula, statistics bar, add date, currency, time type.
  • October: Format brush, picture copy and paste, support attachment preview, border function online.
  • November: Export function online, real-time collaborative online, cell protection function.
  • December: Chart function online, selective paste.
In 2020,
  • 1 ~ 3 months: Stability, compatibility, performance, experience optimization.
  • April: Conditional formatting, drop-down options optimized.
  • May: Form template, screening optimization, color screening.

For more details, please visit our update log.

After a year of research and development, Language Sparrow has been able to meet the functional requirements of 80-90% of ordinary Excel users. The following is our current function set. Next, we will control the launch pace of new functions and make in-depth optimization for existing functions.


Through the above introduction, I hope you can have an overall understanding of the current state of the finch.

The following three parts will introduce the self-research process of finch: Why, How and What.


In the Language Team and experience technology department, no matter what we do, we always ask ourselves a lot of why, figuring out why is often more important than thinking about how.

  • We are thinking about the table in the early stage.
  • How: It talks about How to do it and some of the methods and technology selection in the development process.
  • What: Talk about some experience about self-research.

WHY

In this section, I will introduce two questions:

  • Why do online spreadsheets
  • Why choose self-research

Why do online spreadsheets

Spreadsheet Origins

When it comes to spreadsheets, Excel may be the first thing that comes to mind. When some of our users ask us questions, they also say, “How good is your Excel?” I think this is because Excel has formed a deep-rooted influence in people’s minds over the years, thinking that spreadsheets are Excel. But Excel wasn’t the first spreadsheet software. The first spreadsheet dates back to 1978, when Dan Bricklin, an American genius, created a program called VisiCalc in his university lab.


Is the first time to use the line layout to complete digital input and calculation software, although simple, but have now electronic form the core of the model, when people are doing business with notes on the books, one record, by the end of the month to do collect reconciliation, efficiency is very low, and error-prone, once wrong had to start to calculate, So VisiCalc was loved right away. The software was installed on the Apple 2 PC.


It was the software that helped launch the MAC 2 and became a huge seller. Jobs said in an interview that year that spreadsheets helped the industry grow, and VisiCalc contributed to Apple’s success.

Spreadsheet key history


In 1985,

Excel 1.0, Microsoft’s software for Apple, came out. Gates decided that writing software for people wasn’t enough, so he created Windows. There is also a feud between Jobs and Gates.

In 1988,

In a dark room, Qiu Bojun spent 14 months writing and developing a domestic office software called WPS, which is respected by domestic programmers.

In 1993,

Microsoft came out with The Office suite, bundled Excel into it, and then Excel went from there, iterating, almost unchallenged.

In 2006,

There was Google Docs, and all the programmers were stunned. The ability to implement client-side office software on the web was part of the front-end boom.

In 2007,

Apple has developed its own spreadsheet, Numbers, which is a great experience. It’s true to Apple’s simple style, but hidden inside is a very powerful spreadsheet that doesn’t yet support online collaboration.

In 2011,

Microsoft is starting to focus on the online Office experience with the online version of Office 365, and the online Version of Excel is really mediocre.

In 2013,

When Airtable came along, people started to realize that the old spreadsheet could be played this way, and that’s what isomorphic tables are all about.

In 2016,

Domestic only appeared online spreadsheet products, among them WPS, graphite in this relatively early, and then Tencent and fly book and other large factories also began to follow up.

From the long history of the above 40 years, the spreadsheet has always maintained a strong vitality, and its functions have been constantly improved. From the client software to the online, it is enough to prove that people’s demand for data processing is always strong.

Finch and DIKW model

Again, why a whisperer makes a spreadsheet, it starts with one word, “knowledge,” and how knowledge is defined.

Yuq’s product positioning is the cloud knowledge base, and we have been having an interesting discussion inside: “What is knowledge”, what’s The relationship between knowledge and data, until we found this model, as early as in 1888, has been thinking about this problem, Thomas stahl that Eliot, in a book “The Rock” thought about knowledge, wisdom, consultation, The relationship between The later scholars gradually improve a model like this: DIKW model.


  • Data layer: Data is the most original material, and the key task of this layer is to record. Data itself also has information properties, but it reflects more of the past, such as the daily temperature, humidity, rainfall and other data of Hangzhou city in 2019.
  • Information layer: Extract some higher-concentration information from the data layer, or conduct statistical analysis according to the relationship between data, and find some regularity, such as the annual average temperature of Hangzhou, the hottest month and the most comfortable month. The information is refined enough and has certain reference value.
  • Knowledge layer: Information layer has usually guide the life and production, but the information is fragmentary information layer, knowledge layer is combined with more information dimension, targeted and purposefully will classify the bits and pieces of information, connection, produce some of the more comprehensive information, such as the world nearly 100 years of meteorological data analysis, will find that the earth’s temperature is rising, Therefore, combined with the relationship between the data, the greenhouse effect was discovered. After verification, it was found that carbon dioxide is the cause of the greenhouse effect, and the value of this knowledge is greater. Therefore, knowledge is derived from life, recognized by the public, and can ultimately help people make decisions.
  • Wisdom: With the application of knowledge, people can predict the development of the future through the collected data, which is wisdom.

In this model, it can be clearly seen that in order to form knowledge, people first need to produce information from data processing, and tables play a very important role in this step. Documents are a further arrangement from information to knowledge. Therefore, in order to build a professional cloud knowledge base, we cannot miss the link of data processing in the whole process of knowledge generation.

Why self-study

Why do you choose self-research? I remember a marketing director of Miui Youpin said, “99% of the products in our life are worth recreating.”

How do you reinvent a decades-old product like the spreadsheet? Airtable, which provides its answers, is more like an online version of Database. In fact, We launched a version of isomorphic tables similar to Airtable within the group based on Spreadjs. However, we found that users did not like this new thing very much, and after some changes, we stopped developing it, because we always felt that our hands and feet were tied.

In 2018, we started comprehensive self-research. At first, Teacher Long Hao, the author of kindEditor, joined in, and yuq document editor began comprehensive self-research. After the launch, the performance has been significantly improved, and the late research and development and maintenance efficiency is very high. We can respond to many questions raised by users in the first time and solve them quickly. This self-research has brought great confidence to our team. Personally, I started to enter the editor field from document forms.

In May 2019, we decided to restart the form of research and development, set aside Spreadjs, since comprehensive research, the research gives us at the bottom of the controllable, expanding, and in the long term, since the research of the comprehensive cost efficiency is optimal, choose the research you carefully consider the following factors: model, experience, performance, research and development, maintenance, business and other factors.


  • Model: Data model is the basis of business development and the realization of all functions. Only by mastering the design of data model can we better meet the needs of future business and functions. Self-research can fully control the design of the model.
  • Experience: Experience in a broad sense includes many aspects, including performance and service. Here, it mainly refers to experience in a narrow sense: interactive experience. Self-research can design experience according to business needs to a greater extent, and technical implementation is more integrated, rather than patching other people’s functions.
  • Performance: Mastering the model can be addressed by optimizing the data structure where key performance is concerned, and the architecture can also be tailored.
  • Research and development: In the long run, research and development costs are actually lower.
  • Maintenance: Problems can be fixed faster due to control of the underlying layer. You won’t encounter problems that can’t be fixed because of underlying flaws.
  • Business: Diy can design models based on business needs. It’s more expansive.

HOW

  • What technology was used
  • Some cases of technical scheme selection

What technology was used


The Model layer processes data, the View layer has a rendering part and UI parts such as toolbars and panels. The rendering part is drawn using native Canvas, the UI layer uses React, and the component library is ANTD. The diagram uses AntV. Ui-level events converge to the Control layer. Handlers handle event parameters and access rules, and then Command performs data manipulation. Contains the parameters to execute this command and the snapshot data to be carried during the execution, which is convenient to undo and synchronize the command to other clients through WebSocket collaborative service. The architecture is very simple, and in the selection of technical solutions, Language has been pursuing practical and simple.

Isomorphism or isomerism

The first selection that I want to share is the question that needs to be considered when designing the model, the selection of the product, which is what you heard earlier, isomorphic table or heterogeneous table.


  • Isomorphic table: Each column of data has the same data structure, such as text, date and time. These are defined in the column header. Once defined, the data in the entire column is processed according to this data structure, so its data format attributes need only be defined in the column header.
  • Heterogeneous table: Each cell can set its own data format, the whole table is more like a canvas for column layout, each cell can set formula, more flexible.

It can be seen that they have their own advantages and disadvantages, and the implementation of data processing is particularly different.

How to choose, we finally look at the needs of users, we will find two very classic scenarios, one is information structure, the other is data processing, both of which are very important.


Data processing needless to say, must satisfy. What is information structuring? It is to make the information layout more reasonable and easier to understand by taking advantage of the table’s natural row and column layout. As mentioned above, information structure itself is an important link in the process of knowledge formation. Sometimes, structured layout is also a part of the display form of knowledge. Some information can be better displayed through tables, such as periodic table of elements. However, we know that isomorphic table cannot combine cells, which has great defects in information layout ability, so we finally choose heterogeneous table.

Multi-user collaborative editing

Real-time multi-player collaboration has always been the biggest problem in the field of editing, and table editing will also encounter the same, and it will be more complicated. Let’s start with the model of multi-player collaboration.


Two people at the same time, based on a baseline start editing, when the two instructions O1, O2 passed to each other there, must carry on the transformation (OT), here is the fine particles of OP, and often to form the operation, such as mobile, there may be thousands of OP, if just another user has made similar batch operation, Then a large number of OP generated instantly will be transmitted to the server, which will take a long time to convert OP, resulting in response delay. Then users will further generate OP, resulting in more computation and delay, resulting in congestion.

For table operations, we can look at this diagram.


A large data object executes a Command and generates several ops. if the Op synchronization is too challenging for performance, can I directly synchronize the Command? We can compare the differences between the two models.


By comparing the two schemes, we finally choose to do data synchronization based on Command. Each Command will carry the parameter value required by the Command and transfer it to the other end. The server does not need to do operation transformation, and the client will run directly after receiving the Command. Generally, an instruction can be completed within 100ms or even faster from sending to receiving the instruction. In my opinion, it is the best way to reduce conflicts to quickly synchronize the instruction to each end.

As we know, usually there is no conflict in cell editing instructions, but only when changing the position of the row or adding or subtracting the row, so we need to consider the question: what is the probability of operating the row within 100ms at the same time? From the actual scene, this probability is very small. We also combine some post-conflict do-not-undo policies to prevent data clutter. We ended up with a tradeoff, synchronization Command. From the use of the effect, has been able to meet the majority of collaborative scenarios.

Maximum number of cells

There are a lot of limits when you’re working on a spreadsheet, such as the maximum number of computable numbers, the range of dates that can be recognized, and so on. Let’s talk about the maximum number of cells a spreadsheet can support.

Let’s start by looking at a few competing products with clear limits. Google explicitly supports 500W cells and alerts you if you exceed them. Feishu is 50W, Tencent documents is 25W. Google stores the data on the server side and does the computation on the server side, so it can theoretically support a much larger number of cells, so the difference in the number of cells appears to be just the upper limit of the number of cells, but it leads to an important architectural choice: the data computation on the client side or the server side. But it’s not that hard. It’s back to the business itself. Most scenarios in an online spreadsheet won’t have that much data, based on two considerations:

  • Make full use of client computing resources
  • Keep the architecture simple

We choose client-side computing, which means loading the data to the client at one time and computing it in the user’s browser. What is the maximum number of cells we can support in this mode?

If format and style are added, a cell needs about 20 ~ 200 characters. To do local temporary storage, we use localStorage, up to 5M, (of course, IndexedDB asynchronous model will make the whole programming model complicated. We try not to use complex solutions. The maximum number of cells that can be supported is 25W (10,000 rows and 25 columns).

On the basis of this scheme, we did compression optimization again, and finally achieved 250W cells. Of course, this is to ensure the experience of lossless scene in offline editing. If the network environment is good and there is no large number of formulas and styles, the language table can support a larger number of levels.


Let’s look at the effect of data editing performance on the order of millions.

Mega table operate.gif

Copy and paste the damage repair

In form of operations, often meet the operation to copy and paste from other place handling data, such as from a web page, or Excel data, the Numbers will be moved to online, in dealing with local or copy and paste a small amount of data is the most efficient, and usually we came out of the local copy form will meet some strange questions, That’s breakage.


The HTML in the clipboard will have strange cell Td or Tr missing. For tables, if Tr or Td is missing when parsing, the cell position will be easily resolved incorrectly, resulting in the pasted data.

Table fix _yuque.gif

The verbal form does a better job of undoing all kinds of copy-and-paste damage.

More interesting stories

In addition to the above cases, we also have many small cases in r&d, which include our thinking about products, understanding of experience, attitude towards technology, and time relationship.


Welcome to try the language sparrow, and give us valuable suggestions.

WHAT

The last part says from grind medium result, compare not systematic, it is true feeling however, the classmate that hopes to have from grind plan can have some reference.

At the end



Hiring time

At present, the following four types of editor are fully developed by Wordfinch. We welcome more students who are interested in editor to join us.


If you’re interested, you can just drop in your resume.



[email protected]

We look forward to working with you to provide better creative experience and make knowledge equal to wealth.

This article is formatted using MDNICE