Some background: we are the Ant Codespaces team, and we provide cloud development capabilities to Ant engineers. Our product is a typical browser/server (B/S) application that lets users do standard development work in a browser without installing any software locally.

That convenience comes with problems.

Web applications have a significant disadvantage compared with native applications: keyboard events conflict with the browser, so some shortcut key combinations are effectively "invalid" in a Web application. Cmd + W, Cmd + N, and Cmd + T are a few examples: without additional handling, a Web application cannot respond to them properly.

The reason can be summed up as "before the Web App has time to process the keyboard event, the browser has already responded and produced side effects". For example, when the browser responds to Cmd + W, the side effect is that the current tab is closed, and the Web App is shut down with it.

Ant Codespaces is an important part of cloud development, and shortcuts are an important way to make a tool more efficient to use, so we need to give users key combinations and an experience that do not lag behind local IDEs.

Given this problem, how do peer products handle it?

GitHub Codespaces

GitHub Codespaces cleverly avoids these conflicting shortcuts and lets users set their own shortcuts in the configuration interface (though not to combinations that conflict with the browser).

Theia

Theia provides Alt + W as the default combination instead of Cmd + W.

Coding

Coding does the same as the above: it avoids the conflicting shortcuts and lets users configure their own.

After trying all of these external products, I was disheartened.

Unsurprisingly, vendors have converged on a generic solution to this problem: avoid the key combinations that conflict with the browser.

This approach solves the problem, but at the cost of changing the habits users built up in local IDEs.

The functionality is there, but the experience is different.

I then looked to Theia for guidance, but did not find a good solution there either.

The mental model behind keyboard shortcuts

When the industry already has a common “solution,” why bother?

Speaking from personal experience: I joined the Huanglong office in June this year. When I moved my R&D work from local VSCode to the Cloud IDE, what bothered me most was that some high-frequency keyboard shortcuts did not match my first instinct, such as Cmd + W for closing a file. Even after more than 100 days, I still don't want to close a file with anything other than Cmd + W.

Meanwhile, I’m not the only user struggling with this, and we’ve received some feedback from users looking to fix the consistency of high-frequency keyboard shortcuts.

This is a very reasonable request. When we download a native app on macOS, we naturally assume that Cmd + W means "close XX" and Cmd + N means "create XX"; it is an established convention. Of course, software developers can choose not to follow it, or even do the opposite, but that raises the cost of using the software and makes it harder to use.

The importance of shortcut keys to tool software

In a given situation, if both the mouse and the keyboard can achieve a certain goal, the keyboard will most likely be faster. (Of course, there are also scenarios where the mouse alone might be faster, such as locking the screen with a macOS hot corner.)

As engineers, we can picture some everyday R&D scenarios: jumping to a line, searching, closing and opening files. Sublime and VSCode support these with either the keyboard or the mouse, but many engineers choose the keyboard to invoke them.

For many efficiency tools, such as Alfred and Spectacle, replacing mouse operations with keyboard operations is even one of the core capabilities.

We can also look at how people in other professions work. For example, when photo studio owners use Photoshop, some of the keys on their keyboards are worn almost smooth. You can imagine how much they rely on shortcut keys. Without shortcuts they could still work, but far less efficiently.

So we can hastily conclude that following users' mental model of keyboard shortcuts makes software more efficient to use.

Why "hastily"? Because I made it up :)

Therefore, we should solve not only the problem of whether a shortcut works at all, but also, as far as possible, the problem of whether the shortcut experience is consistent.

Will all conflicting shortcuts fail?

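No.

Judging by observed behavior, conflicting shortcuts fall roughly into two types:

  1. Those whose default behavior can be canceled with preventDefault
  2. Those whose default behavior cannot be canceled with preventDefault

The combinations that can be canceled with preventDefault are easy to solve. Take Cmd + S as an example.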

In a Web application with no additional handling of keyboard events, Cmd + S triggers the browser's "save page" behavior.

After canceling the default behavior with the following code, we can happily run our own callback in the listener:

document.addEventListener("keydown", (e) => {
  // keyCode 83 is "S"; metaKey is the Cmd key on macOS
  if (e.keyCode === 83 && e.metaKey) {
    e.preventDefault(); // cancel the browser's "save page" behavior
    alert("I AM CMD_S");
  }
});

Look at the results:

The story behind this is out of scope for this article. I trust that the seasoned readers here understand events better than I do.

What about shortcuts that can't be canceled with preventDefault?

If the Cmd + S listener does not cancel the default behavior, the callback (cb) runs first and the default behavior runs afterwards (you can see this in the same demo):

With preventDefault, the flow should be:

So what happens if you replace Cmd + S with Cmd + W?

Rewrite the above demo and try it:

document.addEventListener("keydown", (e) => {
  // keyCode 87 is "W"
  if (e.keyCode === 87 && e.metaKey) {
    e.preventDefault(); // has no effect on the browser's "close tab" behavior
    alert("I AM CMD_W");
  }
});

Cmd + W behaves fundamentally differently from Cmd + S. We can extend the original flow chart with a guess: for this class of higher-priority shortcuts, the browser responds with an irreversible side effect, so even if the listener's callback calls preventDefault, it is powerless because the side effect has already occurred:
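The guessed flow can be written down as a toy model. This is only a sketch of the reasoning above, not how Chrome is implemented: the `RESERVED` set and the `simulateKeyPress` function are hypothetical names, and the list of reserved combinations is simply the ones this article names. It uses the standard `EventTarget` and `Event` objects, whose `dispatchEvent` returns `false` when a listener called `preventDefault()` on a cancelable event.

```javascript
// Toy model of the guessed flow; NOT Chrome's real implementation.
// RESERVED holds the combinations this article observed as uncancellable.
const RESERVED = new Set(["Cmd+W", "Cmd+N", "Cmd+T", "Cmd+Q"]);

// Simulates one key press arriving at a page (`page` is an EventTarget).
// Returns what happened, in order.
function simulateKeyPress(page, shortcut) {
  const log = [];
  if (RESERVED.has(shortcut)) {
    // Higher-priority shortcut: the browser acts first, and nothing the
    // page's listener does afterwards can undo it.
    log.push("browser side effect (irreversible)");
    return log;
  }
  const event = new Event("keydown", { cancelable: true });
  // dispatchEvent returns false if a listener called preventDefault().
  const notCancelled = page.dispatchEvent(event);
  log.push("dispatched to page");
  if (notCancelled) log.push("browser default action");
  return log;
}

// A page whose listener always cancels the default behavior.
const page = new EventTarget();
page.addEventListener("keydown", (e) => e.preventDefault());

const cmdS = simulateKeyPress(page, "Cmd+S"); // default action is skipped
const cmdW = simulateKeyPress(page, "Cmd+W"); // side effect still happens
console.log(cmdS, cmdW);
```

In this model the listener is identical in both cases; the only difference is whether the browser consults the page before acting.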

So! What! Do! We! Do!

It's not a big problem. There is always a way; it just comes down to weighing cost against benefit.

Between "keyboard event emitted" and "browser does something", we need to (1) prevent the irreversible side effect and (2) let the listener run its callback normally. As long as these two conditions are met, the problem is basically solved, and the rest depends on R&D cost and implementation:

With a general idea in place, I kept looking for ways to implement it. They fall roughly into three directions:

  1. Write a native bridge that proxies system-level events and somehow communicates with the browser (similar to how Postman captures requests and cookies).
  2. Add an Electron version of the IDE, which in theory can match VSCode's shortcut capabilities.
  3. Use a Chrome Extension to proxy the shortcuts.

The three approaches compare as follows:

Native Bridge

Complexity: ⭐️⭐️⭐️⭐️⭐️

Advantages:

  1. System-level capability: it can do everything you can think of, and things you can't

Disadvantages:

  1. Users need to install both the native bridge and a Chrome Extension
  2. Cross-platform support has to be considered
  3. High complexity

Electron App

Complexity: ⭐️⭐️⭐️

Advantages:

  1. Cross-platform
  2. Adding Electron to an existing Web application costs less than the native bridge

Disadvantages:

  1. Users need to install the Electron app locally, which is inconsistent with the current Cloud IDE positioning and scenario
  2. With Electron, if a system-level API needed later is not provided by the runtime, cross-platform R&D costs are still unavoidable

Chrome Extension

Complexity: ⭐️

Advantages:

  1. Cross-platform
  2. Small to build

Disadvantages:

  1. Users need to install the Chrome Extension
  2. Most events can be captured, but some will escape
  3. Constrained by the browser

I think the mission of engineers is to find the optimal solution within limited complexity. After sorting through the three ideas, a Chrome Extension is the preferred implementation for now.

On the one hand, a Chrome Extension can handle the combinations users ask for most (such as Cmd + W to close a file, Cmd + N to create a file, and Cmd + T to open a new tab), and solving these few keys covers what most users need. That leaves a few combinations that a Chrome Extension can't block, such as Cmd + Q. Anyone familiar with macOS shortcuts knows that it quits the application. Whether we intercept that one hardly matters, so if it escapes, it escapes.

On the other hand, our Cloud IDE differs from other back-office products in that we only need to be compatible with Chrome@latest, which gives this solution the best possible host environment.

Shortcut key conflict resolution based on Chrome Extension

To sum up, the direction of this plan is basically set:

  1. The solution is light and low in complexity, so the ROI is high enough
  2. Cross-platform support is not a concern
  3. A few rarely used combinations are allowed to escape

Specific ideas

In one sentence: when a keyboard event is triggered in the tab of the target application, block the browser's native behavior and perform the behavior the Web application expects instead.

How do we block it?

As that sentence shows, "blocking the browser's native behavior" is the main premise of this solution. So the question is: can a Chrome Extension do this?

It can.

Many extensions let users customize shortcut keys that execute a corresponding command. Testing shows that combinations such as Cmd + N and Cmd + W can be overridden by extension shortcuts. Chrome's official documentation does not describe overriding the browser's own shortcuts as a feature, but judging by current behavior, this approach works.
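As a rough illustration, extension shortcuts are declared under the `commands` key of the extension manifest. The sketch below writes that section as a JavaScript object that could be serialized into `manifest.json`. The extension name and the command names (`cmd-w`, `cmd-n`, `cmd-t`) are hypothetical, and whether Chrome accepts these combinations as suggested keys or requires the user to bind them manually on the extension shortcuts page is not something this article states, so treat the `suggested_key` values as an assumption.

```javascript
// Sketch of a manifest with a "commands" section, as a JS object.
// Extension name and command names are hypothetical.
const manifest = {
  manifest_version: 3,
  name: "Cloud IDE shortcut helper",
  version: "0.1.0",
  background: { service_worker: "background.js" },
  commands: {
    "cmd-w": {
      // Assumption: Chrome may reject or ignore some suggested keys.
      suggested_key: { mac: "Command+W" },
      description: "Close the current file in the Cloud IDE",
    },
    "cmd-n": {
      suggested_key: { mac: "Command+N" },
      description: "Create a new file in the Cloud IDE",
    },
    "cmd-t": {
      suggested_key: { mac: "Command+T" },
      description: "Open a new file tab in the Cloud IDE",
    },
  },
};

// Serialize to the JSON text that would go into manifest.json.
const manifestJson = JSON.stringify(manifest, null, 2);
console.log(manifestJson);
```

Once a combination is bound to a command, the background script receives it through `chrome.commands.onCommand` instead of the browser performing its native action.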

So the initial flowchart looks like this:

So does this introduce a new problem?

It does.

You can see that the "browser does something" step is gone.

One might wonder: we wanted to remove the browser's response in the first place, so why is that a problem?

Expand on the above image to make it clear:

The expanded flow chart shows that we still need the browser behavior we overrode, because the user's browser also has to serve everyday browsing, not just our Web application. If you lose the browser's native ability to quickly open a new tab simply because you want Cmd + T to open a file tab in the Cloud IDE, you're back where the problem started.

What are the expectations?

The expectation is that when the current tab is the one we want to override, the browser behavior does not take effect, and when the current tab is any other page, the browser behavior works as usual.

Can you do it perfectly?

You can't, but you can pretend you can.

Here's a look at the architecture of a Chrome Extension (via KMSFan):

As you can see from the figure, Chrome Extension has several core concepts:

  1. Background
  2. Popup
  3. Content Scripts
  4. Injected Scripts

I won't introduce every concept in detail; let's just cover the two we use.

Background JS gives developers access to a rich set of Chrome APIs. Common basic actions such as opening and closing pages and switching tabs can be simulated through the Chrome API, so Background JS can patch the browser's native behavior back in.
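A minimal sketch of that patching step follows, assuming the hypothetical command names `cmd-w`, `cmd-n`, and `cmd-t` for the overridden shortcuts. The `api` parameter stands for the global `chrome` object inside the extension's background script; it is passed in so the function can also be exercised with a stand-in. The calls used are `chrome.tabs.remove`, `chrome.windows.create`, and `chrome.tabs.create`.

```javascript
// Re-creates the browser's own behavior for an overridden shortcut.
// Command names are hypothetical; `api` is `chrome` in a real extension.
// Returns true when a native behavior was replayed.
function replayNative(command, tab, api) {
  switch (command) {
    case "cmd-w": // natively closes the current tab
      api.tabs.remove(tab.id);
      return true;
    case "cmd-n": // natively opens a new browser window
      api.windows.create();
      return true;
    case "cmd-t": // natively opens a new tab
      api.tabs.create({});
      return true;
    default:
      return false; // unknown command: nothing to replay
  }
}

// Stand-in for `chrome` that records calls, for trying the sketch
// outside a browser.
const calls = [];
const fakeChrome = {
  tabs: {
    remove: (id) => calls.push(["tabs.remove", id]),
    create: (props) => calls.push(["tabs.create", props]),
  },
  windows: { create: () => calls.push(["windows.create"]) },
};

replayNative("cmd-w", { id: 42 }, fakeChrome);
replayNative("cmd-n", { id: 42 }, fakeChrome);
console.log(calls);
```

Simulated behavior is only an approximation of the native one; for example, focus handling of the new tab or window may differ from what the browser does on its own.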

The flow chart changes again:

So is there still a problem?

There is.

Take Cmd + Opt + W as an example. We intercept Cmd + W through the Chrome Extension, but the Web application's listener is registered on Cmd + Opt + W. In other words, blocking the browser behavior alone is not enough; the event the Web application listens for also has to be dispatched. Content JS is tab-level and can access the context of the corresponding page, so we can use Content JS to patch in the events the application registered.
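The content-script side could look roughly like the sketch below. The `APP_SHORTCUTS` table, the `{ command }` message shape, and the function names are hypothetical. It assumes the application checks `keyCode`, `metaKey`, and `altKey` the way the demos earlier in this article do, and that the browser honors the legacy `keyCode` member of the `KeyboardEvent` init dictionary. The event constructor is passed in as a parameter so the logic can be tried outside a browser.

```javascript
// Hypothetical table: the key event the Web application listens for,
// per extension command. The article's example: Cmd + W is intercepted,
// but the application listens on Cmd + Opt + W.
const APP_SHORTCUTS = {
  "cmd-w": { key: "w", code: "KeyW", keyCode: 87, metaKey: true, altKey: true },
};

// Builds the init dictionary for the synthetic keydown, or null if the
// command has no registered application shortcut.
function buildKeydownInit(command) {
  const shortcut = APP_SHORTCUTS[command];
  if (!shortcut) return null;
  return { ...shortcut, bubbles: true, cancelable: true };
}

// Dispatches the synthetic keydown on `target` (the page's document in a
// real content script). Returns true if an event was dispatched.
function redispatch(command, target, KeyboardEventCtor) {
  const init = buildKeydownInit(command);
  if (!init) return false;
  target.dispatchEvent(new KeyboardEventCtor("keydown", init));
  return true;
}

// Wiring inside a real content script; skipped outside an extension.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener((message) => {
    redispatch(message.command, document, KeyboardEvent);
  });
}
```

Synthetic events have `isTrusted` set to `false`, so this only works if the application's listeners do not check that flag.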

The flowchart continues to evolve:

Why this approach?

On the one hand, our Web application has a very complex architecture, so we do not want this solution to require invasive changes that would spawn more logical branches and environment concepts. On the other hand, this is a generic solution: it not only solves the Cloud IDE's problem, but other Web applications can also adopt a similar solution without any changes on their side.

Finally, let's look at the code, using Cmd + N (new file) as the example:

For Background JS, we need to emulate the native "open a new window" behavior, which the Chrome API already provides:

chrome.windows.create()

For the corresponding Content JS, we need to re-dispatch the events registered by the Web application, namely:

Combine the patch in Background JS with the patch in Content JS and register them under the extension's corresponding shortcut key:
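A sketch of how the two patches might be combined in the background script. The URL prefix, the command name `cmd-n`, and the `{ command }` message shape are hypothetical. It relies on `chrome.commands.onCommand` passing the active tab as the second argument, and on `tab.url` being readable, which normally requires the `tabs` permission or a matching host permission.

```javascript
// Hypothetical: how the extension recognizes a tab running the Cloud IDE.
const TARGET_URL_PREFIX = "https://ide.example.com/";

// Builds the onCommand handler. In the target tab the Web application's
// event is patched in; in any other tab the native behavior is replayed.
function createCommandHandler({ replayNative, patchWebApp }) {
  return function onCommand(command, tab) {
    const isTarget = Boolean(
      tab && typeof tab.url === "string" && tab.url.startsWith(TARGET_URL_PREFIX)
    );
    if (isTarget) {
      patchWebApp(command, tab);
      return "patched";
    }
    replayNative(command, tab);
    return "native";
  };
}

// Wiring inside a real background script; skipped outside an extension.
if (typeof chrome !== "undefined" && chrome.commands) {
  chrome.commands.onCommand.addListener(
    createCommandHandler({
      // Background patch: Cmd + N natively opens a new window.
      replayNative: (command) => {
        if (command === "cmd-n") chrome.windows.create();
      },
      // Content patch: ask the tab's content script to re-dispatch
      // the event the Web application registered.
      patchWebApp: (command, tab) => {
        chrome.tabs.sendMessage(tab.id, { command });
      },
    })
  );
}
```

Keeping the routing decision in one small function means the Web application itself needs no changes, which matches the non-invasive goal described above.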

Let's look at the end result.

Before using the scheme

Using Cmd + N in the Web application opens a new browser window, which is not what users expect.

After using the scheme

Cmd + N opens a new file tab in the Web application, as expected.

That more or less covers this unorthodox approach. It is still not a perfect solution, but it at least offers a new direction and some ideas for solving similar problems.

Ah! Finally, you can “close” the file

One last thing!

We are the R&D Efficiency Department at Ant, committed to providing Ant and a number of financial enterprises with "nuclear-powered" R&D products covering the whole life cycle. Our products cover all of Ant's R&D activities, including code services (hosting, review, scanning, search, build, content mining), code editing (Cloud IDE), CI/CD, test integration, environment building, full link chain, configuration management, resource scheduling, and data products based on the full life cycle of R&D activities. Join us to build the next generation of Ant Group's cloud-native R&D efficiency platform.

(P.S. We are piloting WFH on a small scale: one day per week you can work from anywhere. We may be one of the few departments in China still exploring efficient remote work in the post-epidemic era. I believe many engineers will like this experience.)

One more last thing!

I am Xiao Feng, a ride-share driver who has never taken a single order. If you come from far away, I can pick you up at the airport and take you to Huanglong. All of this is free, and I can even give you an ice cream.

Xiaoshan Airport -> Huanglong hotline email: [email protected]

Airport pickup passphrase: "Are you still hiring over there?"

It doesn't matter what the passphrase is, nor the mode of transportation, nor the title, nor where the base is. What matters is to find interesting things and meet like-minded people in the process of exploration.

Really the last thing!

Open positions:

  1. Senior full stack engineer
  2. Code platform technology specialist
  3. Platform product expert
  4. Cloud native container expert
  5. Senior Development Engineer
  6. Distributed computing architecture specialist
  7. Senior IDE R&D engineer
  8. Code intelligence engineer
  9. Compiler development engineer