Background

This article uses the interaction area, the most complex and one of the most important features in Douyin, as an example to share the thinking and methods behind its refactoring, focusing mainly on architecture and structure.

Introduction to the interaction area

The interaction area is the part of the playback page that users can operate. Put simply, it is everything attached to the page apart from the video player: the author name, description text, avatar, like, comment, and share buttons, mask layers, pop-up panels, and so on (the areas marked in red in the figure). It covers nearly all of the features users see and use most, and it is the main entry point for traffic.

Finding the problem

Do not rush to change the code. First sort out the functions, the problems, and the code, build a holistic view, and find the root cause of the problem.

The status quo

The figure above shows the ranking of classes by code size. The interaction area's ViewController ranks first, far ahead of every other class. The data comes from a self-developed code quantification system, a tool that helps business teams discover architecture, design, and code problems.

You can also look at how the size changed across versions:

With one release per week, the amount of code doubled in less than a year. The code size dropped in a few individual versions thanks to local optimizations, but the overall trend is still rapid growth.

In addition:

  • Poor readability: The ViewController has more than 18,000 lines of code. It is the largest class in Douyin, more than twice the size of the second largest. The interaction area uses the VIPER structure (common iOS structures include MVC, MVVM, MVP, and VIPER); adding the other four layers, IPER, brings the total to more than 30,000 lines. At that size it is hard to remember where a function lives or what a piece of business logic does, and changing one place requires reading nearly all of the code, which is very unfriendly

  • Poor extensibility: Adding or modifying a function means changing five classes in the VIPER structure. Even when the business logic of a function is independent, the change has to couple with many existing functions and modify existing code, and it can set off chain reactions: fixing one problem creates a new one

  • Many maintainers: Commit history shows that several business lines and dozens of people commit code to it every month

Clarify the business

The author is from Douyin's foundational technology team and is responsible for business architecture. He did not know the business of the interaction area, so it had to be mapped out from scratch.

In fact, no one understands all of the business, including the product managers, and there is no complete requirements document to consult. The business logic has to be worked out from the code, the pages, and hands-on operation, and anything uncertain has to be confirmed with the relevant product and development colleagues. Skipping the intermediate process, the result covered 10+ lines of business and 100+ sub-functions. The purposes of sorting out these functions are:

  • Rank functions by importance, so that core functions come first and get more development and testing time

  • Find the rules that the layout and interaction of the sub-functions follow, which can guide the refactoring design

  • Judge how the product is likely to evolve; the design should meet current needs and also be reasonably forward-looking

  • Serve as a checklist during self-testing to avoid omissions

Sort out the code

All business functions and problems ultimately land in the code. Only by sorting out the code can the problems really be understood, and the solution will be expressed in code as well. The findings are summarized below:

  • Code size: 18,000 lines in the VC, more than 30,000 lines in total

  • Interface: more than 200 methods and 100 properties are exposed

  • Dependencies: the VIPER structure is not applied cleanly. The Presenter depends directly on the VC, and the two are coupled to each other

  • Cohesion and coupling: the code of a single sub-function is scattered across many places and overly coupled to other sub-functions

  • Dead code: there is a lot of unused code, and code whose purpose nobody knows

  • View hierarchy: every subview is a direct subview of the VC's view, so the VC has 100+ subviews. Only about 10 actually need to be shown; the rest are hidden, but they are still created and still take part in layout, which costs a lot of performance

  • ABTest (controlled group experiments): there are dozens of ABTests, the oldest dating back several years, and they are hard to cover fully in self-testing and QA testing

To summarize, the code has to be read in full, with a focus on the dependencies between classes, which are easier to understand when drawn as class diagrams.

Every line of code was written for a reason. Even if a line looks useless, deleting it could cause an online incident.

Trend

The nature of Douyin as a product means that the video playback page carries most of the traffic, and every business line wants a share of that traffic. As the business develops, the page keeps evolving toward greater diversity and complexity.

In terms of form, after many explorations and experiments the playback page is now relatively stable, and new business is mostly added as entry points that divert traffic elsewhere.

Approaches that have been tried

Splitting the ViewController into Categories

Split the ViewController into multiple Categories, dividing the code by view construction, layout, update, and business-line logic. This solves some problems, but only up to a point; it does not hold up when the functionality is very complex. The main problems are:

  • The ViewController is split, but the IPER layers are not. The split is incomplete, and responsibilities remain coupled

  • When Categories need each other's properties and internal methods, those have to be exposed in header files, even though they should stay hidden

  • Batch calls are not supported. For viewDidLoad, for example, each Category has to define a differently named method (methods with the same name override each other), and each one has to be called individually

Placing the left and bottom sub-functions in a UIStackView

The idea was broadly in the right direction, but after more than half a year of trying, it failed and the code was deleted.

What it got right was abstracting the relationship between sub-functions and using UIStackView for layout.

Where it failed:

  • Local refactoring: it was only a local refactoring, without in-depth analysis of the overall functions and logic, so it did not fully solve the problem. Both the layout code and the UIStackView usage lived in the ViewController, views of different functions were easily coupled, and deterioration continued until the code soon became hard to maintain again. It was like the broken windows effect

  • Incomplete implementation plan: the layout had to be implemented as two sets of code, which developers and testers easily overlooked, so online problems were frequent

  • UIStackView crash: an intermittent crash inside the system library whose cause was not found in more than half a year

Other

Others proposed structures such as MVP and MVVM. Some were abandoned partway, some did not pass technical review, and some were never implemented.

The key problem

Only some of the problems are listed above; counted one by one, they would be endless. But these are mostly symptoms. The problems can only be solved for good by finding their essence and causes and then solving the key problems.

The often-cited ideas of cohesion, coupling, encapsulation, and layering sound right, but applying them does not by itself solve the problem. Two additional angles help with analysis and solutions:

  • Complexity

  • “Variables” and “Constants”

Complexity

Complex features are hard to maintain because they are complex.

That is stating it bluntly. Techniques such as design and refactoring exist to make things simpler, but getting to simple is not simple. Complexity can be broken down from two angles:

  • Quantity

  • Relationships

Quantity: Quantity is explicit. As functionality grows, more people are needed for development and maintenance, more code is written, and maintenance becomes harder.

Relationships: Relationships are implicit. They are the couplings between functions. Suppose every pair of functions depends on each other: 2 functions have 1 relationship, 3 have 3, and 4 have 6. The number of relationships grows roughly with the square of the number of functions, so once there are enough functions the complexity becomes enormous. Relationships are also hard to see, which makes it very easy for a change to have unexpected effects.

The number of features can generally be assumed to grow linearly with the number of people working on the product. Complexity that grows linearly can be kept maintainable by adding developers at the same pace. Complexity driven by relationships grows much faster than linearly, and it quickly becomes impossible to maintain.
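As a rough worked example, assume every pair of functions can be coupled (an illustrative worst case, not a measurement of the real code). The number of relationships among n functions is then the number of pairs:

```latex
R(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad R(2) = 1,\quad R(3) = 3,\quad R(4) = 6,\quad R(100) = 4950
```

Under that assumption, about 100 sub-functions could have close to 5,000 pairwise relationships, while the function count itself is only 100.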

“Variables” and “Constants”

“Variables” are the parts of the code that changed compared with previous versions; “constants” are the parts that did not. The purpose is:

Find patterns in past changes in order to adapt to future changes.

Encapsulation, cohesion, and decoupling as usually discussed are static concepts: what is reasonable at one point in time is not necessarily reasonable in the future. The goal is a design that stays reasonable over a longer time span, which can be called dynamic. Finding the “variables” and “constants” in the code is an effective way to get there, and the two call for different kinds of optimization:

  • For “variables”, keep responsibilities cohesive and single, and make them easy to extend

  • For “constants”, encapsulate them to reduce interference and make them transparent to callers

Back to the interaction area refactoring: newly added functions almost always go into one of three fixed areas, and the layout runs in order from top to bottom. Here the “variables” are the newly added functions, and the “constants” are the positions and the positional and logical relationships with other sub-functions. The variable part gets a flexible mechanism that supports extension. For the constant part, the business-independent pieces are pushed down into the underlying framework and the business-related pieces are encapsulated into independent modules. This gives the overall structure.

“Variables” and “constants” can also be used to check the result of a refactoring. For example, modules often communicate through abstract protocols. If the protocol methods are business-specific, every developer may add their own, and this “variable” will get out of control and become hard to maintain.

Design scheme

While sorting out the problems, I kept thinking about how to solve them and arrived at a rough prototype. This part turns that prototype into a systematic design.

Approach

  • From the function inventory above, rules in the UI design and the product emerge:

  • The whole area can be divided into three regions: left, right, and bottom. Every sub-function belongs to one of them, is displayed on demand, and is data driven

  • The left region holds the author name, description, and music information, arranged from the bottom up

  • The right region holds mainly buttons, such as the avatar, like, and comment, arranged in the same way as the left region

  • The bottom region may show a warning or a hotspot; it shows at most one item, or nothing

  • To unify the concepts, the three regions are defined as containers, and the sub-functions placed in them are defined as elements. The container's boundaries and capabilities can be loosened to support weakly typed instantiation, which allows element code to be physically isolated and produces a pluggable mechanism.

  • An element brings together its view, layout, and business logic code. Elements do not depend directly on the interaction area or on each other, so responsibilities are cohesive and easy to maintain.

  • The many external interfaces can be abstracted. They fall roughly into UI lifecycle calls and player lifecycle calls. Business interfaces are abstracted and dispatched to specific elements, which handle the logic.

Architecture design

The following figure shows the intended final form. The implementation is divided into multiple steps; fixing the final form first avoids drifting away from the target along the way.

Overall guiding principles: simple, applicable, and evolvable.

  • SDK layer: the business-independent part is abstracted into an SDK layer, which is responsible for managing elements and the communication between them

  • Business framework layer: rarely changed code, such as common business logic and shared code, is separated out to form the framework layer. Code in this layer is maintained by designated owners and is not modified by business-line developers

  • Business extension layer: the specific sub-functions of each business line are implemented in this layer. It provides flexible registration and plug-in capabilities, decouples elements from each other, and confines the impact of a change to the element's own code

The SDK layer

Container

All elements are managed through containers. A container has two responsibilities:

  • It creates and holds the Elements

  • It holds a UIStackView, and all the views created by Elements are added to that UIStackView

UIStackView is used for a bottom-up flow layout.

Element

The UI, logic, and operation code of a sub-function is encapsulated as an Element. The concept is borrowed from elements in web pages, and its external behavior can be abstracted as:

  • View: the view that is finally displayed, constructed lazily

  • Layout: the element sizes itself, and the UIStackView in the Container positions it

  • Event: a generic tap event can be handled by a handler method, and events can also be added inside the view

  • Update: a model is passed in, and the element assigns values to its view based on the model's content

View

View is defined in BaseElement as follows:

@interface BaseElement : NSObject <BaseElementProtocol>
 
@property (nonatomic, strong, nullable) UIView *view;
@property (nonatomic, assign) BOOL appear;
 
- (void)viewDidLoad;
 
@end
  • BaseElement is an abstract base class that exposes the view property. The view property and the viewDidLoad method look very much like UIViewController; the design deliberately mirrors UIViewController so that people can understand and accept it faster

  • When appear is set to YES, the view is created automatically and viewDidLoad is called. Subclasses override viewDidLoad to add child views, layout, and other business code, just as with UIViewController

  • appear differs from hidden: hidden only makes the view invisible without releasing memory. Rarely shown views do not need to stay in memory, so when appear is NO the view is removed and its memory is freed

Layout

  • The UIStackView's axis is set to UILayoutConstraintAxisVertical, giving a bottom-up flow layout

  • Elements in a container are laid out from bottom to top: the bottom element is anchored by the container's bottom constraint, the others are stacked in order above it, and the container's height is determined by the top element

  • Each element determines its own size, either by setting a fixed height directly or through Auto Layout
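The layout rules above can be sketched as follows. This is a minimal illustration, not the article's actual container: `SketchLayoutContainer`, `installInView:`, and `addElementView:` are hypothetical names, and only the bottom and leading edges are pinned so that the stack grows upward with its content.

```objc
#import <UIKit/UIKit.h>

// Hypothetical container used only for illustration.
@interface SketchLayoutContainer : NSObject
@property (nonatomic, strong, readonly) UIStackView *stackView;
- (void)installInView:(UIView *)superview;
- (void)addElementView:(UIView *)view;
@end

@implementation SketchLayoutContainer

- (instancetype)init {
    if ((self = [super init])) {
        _stackView = [[UIStackView alloc] init];
        _stackView.axis = UILayoutConstraintAxisVertical;
        _stackView.translatesAutoresizingMaskIntoConstraints = NO;
    }
    return self;
}

- (void)installInView:(UIView *)superview {
    [superview addSubview:self.stackView];
    // No top or height constraint: the height follows the arranged
    // subviews, so the container extends upward from its bottom edge.
    [NSLayoutConstraint activateConstraints:@[
        [self.stackView.leadingAnchor constraintEqualToAnchor:superview.leadingAnchor],
        [self.stackView.bottomAnchor constraintEqualToAnchor:superview.bottomAnchor]
    ]];
}

- (void)addElementView:(UIView *)view {
    // Index 0 is the top of a vertical stack, so the first view added
    // stays at the bottom and later views are placed above it.
    [self.stackView insertArrangedSubview:view atIndex:0];
}

@end
```

An element that wants a fixed height could activate a `heightAnchor` constraint on its view before it is added; otherwise the stack view relies on the view's own Auto Layout constraints or intrinsic size.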

Event

@protocol BaseElementProtocol <NSObject>
@optional
- (void)tapHandler:(UITapGestureRecognizer *)sender;
 
@end
  • Implement the protocol method, and a tap gesture is added automatically to support tap events

  • You can also add your own events. For a button, for example, the native addTarget approach gives a better tap experience
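One way the automatic gesture could work is for the base class to check whether the subclass implements the optional method. The sketch below assumes that approach; `SketchElementProtocol`, `SketchBaseElement`, `SketchAuthorElement`, and `attachTapGestureIfNeeded` are hypothetical names, and the article does not show how the real base class does this.

```objc
#import <UIKit/UIKit.h>

NS_ASSUME_NONNULL_BEGIN

@protocol SketchElementProtocol <NSObject>
@optional
- (void)tapHandler:(UITapGestureRecognizer *)sender;
@end

@interface SketchBaseElement : NSObject <SketchElementProtocol>
@property (nonatomic, strong, nullable) UIView *view;
// Hypothetical hook, assumed to run once right after the view is created.
- (void)attachTapGestureIfNeeded;
@end

@implementation SketchBaseElement
- (void)attachTapGestureIfNeeded {
    // Only subclasses that implement the optional method get a recognizer.
    if (![self respondsToSelector:@selector(tapHandler:)]) {
        return;
    }
    UITapGestureRecognizer *tap =
        [[UITapGestureRecognizer alloc] initWithTarget:self
                                                action:@selector(tapHandler:)];
    [self.view addGestureRecognizer:tap];
}
@end

// Example subclass that opts in simply by implementing tapHandler:.
@interface SketchAuthorElement : SketchBaseElement
@end

@implementation SketchAuthorElement
- (void)tapHandler:(UITapGestureRecognizer *)sender {
    NSLog(@"author element tapped");
}
@end

NS_ASSUME_NONNULL_END
```

Note that views such as UILabel and UIImageView have `userInteractionEnabled` set to NO by default, so an element built on them would need to enable it for the recognizer to fire.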

Update

Assigning the data property triggers an update through its setter.

@property (nonatomic, strong, nullable) id data;

The setData: method is called when the value is assigned.

- (void)setData:(id)data {
    _data = data;
    [self processAppear:self.appear];
}

On assignment, processAppear: updates the view according to the appear state, deciding whether to create or destroy the view.
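The article does not show the body of processAppear:. A minimal sketch of what it might do, assuming the element holds a weak reference to its container's stack view (`containerStackView` and `SketchElement` are hypothetical names), could look like this:

```objc
#import <UIKit/UIKit.h>

NS_ASSUME_NONNULL_BEGIN

@interface SketchElement : NSObject
@property (nonatomic, strong, nullable) UIView *view;
@property (nonatomic, assign) BOOL appear;
@property (nonatomic, strong, nullable) id data;
// Hypothetical back-reference to the container's stack view.
@property (nonatomic, weak, nullable) UIStackView *containerStackView;
- (void)viewDidLoad;
@end

@implementation SketchElement

- (void)setAppear:(BOOL)appear {
    _appear = appear;
    [self processAppear:appear];
}

- (void)setData:(nullable id)data {
    _data = data;
    [self processAppear:self.appear];
}

- (void)processAppear:(BOOL)appear {
    if (appear && self.view == nil) {
        // Lazy creation: the view exists only while the element appears.
        self.view = [[UIView alloc] init];
        [self.containerStackView addArrangedSubview:self.view];
        [self viewDidLoad];
    } else if (!appear && self.view != nil) {
        // Removing an arranged subview from its superview also removes it
        // from the stack view; dropping the reference lets it be freed.
        [self.view removeFromSuperview];
        self.view = nil;
    }
}

- (void)viewDidLoad {
    // Subclasses would add child views and layout here.
}

@end

NS_ASSUME_NONNULL_END
```

In this sketch a real subclass would probably derive appear from the data (for example, hide the element when the model has no content); that decision is left out because the source does not describe it.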

Data flow diagram

The Element's lifecycle and the data flow during an update are shown in the diagram and are not covered further here.

Animation effects

The figure shows a real business scenario that needs to be supported and is currently in an ABTest. The main problems with the old implementation are:

  • Every affected view is guarded by a GET_AB_TEST_CASE(videoPlayerInteractionOptimization) check; there are 32 such checks in total

  • Each view is hidden individually with a transform animation

This implementation is so scattered that a newly added view is easily missed. Element supports a better approach:

  • All the sub-functions on the left are in one container, so hiding the container is enough; the sub-functions do not need to be handled one by one

  • On the right, the avatar is hidden separately and the music is handled separately

Extensibility

The code of each Element lives in its own business component, and the business components depend on the interaction area's business framework layer. Each independent Element is provided to the interaction area at runtime through registration: the framework instantiates the class from its name string and puts it to work.

[self.container addElementByClassName:@"PlayInteractionAuthorElement"];
[self.container addElementByClassName:@"PlayInteractionRateElement"];
[self.container addElementByClassName:@"PlayInteractionDescriptionElement"];
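Registration by class name can be built on NSClassFromString. The sketch below is an assumption about how such a method might be written, not the framework's real code: `SketchRuntimeContainer`, `SketchPluggableElement`, and the BOOL return value are all hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical element base class and one concrete element.
@interface SketchPluggableElement : NSObject
@end
@implementation SketchPluggableElement
@end

@interface SketchRateElement : SketchPluggableElement
@end
@implementation SketchRateElement
@end

@interface SketchRuntimeContainer : NSObject
@property (nonatomic, strong, readonly) NSMutableArray<SketchPluggableElement *> *elements;
- (BOOL)addElementByClassName:(NSString *)className;
@end

@implementation SketchRuntimeContainer

- (instancetype)init {
    if ((self = [super init])) {
        _elements = [NSMutableArray array];
    }
    return self;
}

- (BOOL)addElementByClassName:(NSString *)className {
    // The container never imports the element's header; it only needs
    // the class to be linked into the app at runtime.
    Class cls = NSClassFromString(className);
    if (cls == Nil || ![cls isSubclassOfClass:[SketchPluggableElement class]]) {
        NSLog(@"Skipping unknown element class: %@", className);
        return NO;
    }
    [self.elements addObject:[[cls alloc] init]];
    return YES;
}

@end
```

Because the lookup fails softly, a business component that is not linked into a given build simply contributes no element, which is what makes the mechanism pluggable. The trade-off is that a typo in the class name is only caught at runtime.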

Business framework layer

Container management

The SDK only provides the abstract definition and implementation of the container. The scope and responsibilities of each container have to be defined further for the specific business scenario.

Based on the function inventory above, the page is divided into three regions: left, right, and bottom. Each region is a container, and every sub-function belongs to one of the three containers, as shown in the figure below:

Protocol

The Feed is implemented with UITableView, and apart from the interaction area each Cell contains only the player, so all external calls can be abstracted, as shown in the figure below.

Conceptually a single interaction area protocol would be enough, but it can be broken into two parts:

  • Page lifecycle

  • Player lifecycle

Every Element has to implement the protocol. To make that easy, PlayInteractionBaseElement inherits from the SDK's Element base class and implements the protocol, so a concrete Element can simply leave out the methods it does not need.

@interface PlayInteractionBaseElement : BaseElement <PlayInteractionDispatcherProtocol>
@end

To define the protocol's responsibilities more clearly, it is split further following the interface segregation principle, with PlayInteractionDispatcherProtocol serving as the unified aggregate protocol.

@protocol PlayInteractionDispatcherProtocol <PlayInteractionCycleLifeDispatcherProtocol, PlayInteractionPlayerDispatcherProtocol>
 
@end

Page lifecycle protocol: PlayInteractionCycleLifeDispatcherProtocol

These are the lifecycle methods corresponding to the ViewController, TableView, and Cell. They are fully abstract and unrelated to the business, so they will not grow as the business grows.

@protocol PlayInteractionCycleLifeDispatcherProtocol <NSObject>
 
- (void)willDisplay;
 
- (void)setHide:(BOOL)flag;
 
- (void)reset;
 
@end

Player lifecycle protocol: PlayInteractionPlayerDispatcherProtocol

The player's states and methods are likewise abstract and unrelated to the business.

@protocol PlayInteractionPlayerDispatcherProtocol <NSObject>
 
@property (nonatomic, assign) PlayInteractionPlayerStatus playerStatus;
 
- (void)pause;
 
- (void)resume;
 
- (void)videoDidActivity;
 
@end

Manager – Popover, mask

The view rules for popovers and mask layers do not fit container management, so a separate management mechanism is needed. It is defined here as the Manager, a fairly abstract concept: it can implement popovers and mask layers as well as functions that have nothing to do with views. As with Elements, it lets the code be split apart.

@interface PlayInteractionBaseManager : NSObject <PlayInteractionDispatcherProtocol>
 
- (UIView *)view;
 
@end
  • PlayInteractionBaseManager also implements PlayInteractionDispatcherProtocol, so it receives all of the interaction area's protocol calls

  • A Manager does not create views. Its view is a reference to the UIViewController's view; for example, adding a mask to the Manager's view is the same as adding it to the UIViewController's view

  • Popovers and masks are implemented this way. The Manager is not responsible for mutual exclusion or priority among popovers and masks; that requires a separate mechanism

Method dispatch

The protocols defined in the business framework layer have to be invoked by the framework layer, and the SDK layer is unaware of them. Because there are many Elements and Managers, a mechanism is needed to encapsulate the batch calls, as shown in the figure below:
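The article does not say how the batch calls are implemented. One common Objective-C technique that fits the description is a multicast proxy built on message forwarding; the sketch below assumes that technique. `SketchDispatcher`, `SketchLifecycle`, and `SketchRecorder` are hypothetical names, and the proxy only suits methods that return void.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the interaction area protocols.
@protocol SketchLifecycle <NSObject>
@optional
- (void)willDisplay;
- (void)reset;
@end

// Example target that implements only part of the protocol.
@interface SketchRecorder : NSObject <SketchLifecycle>
@property (nonatomic, assign) NSInteger willDisplayCount;
@end

@implementation SketchRecorder
- (void)willDisplay { self.willDisplayCount += 1; }
@end

// Forwards every message it receives to all targets that respond to it.
@interface SketchDispatcher : NSProxy
- (instancetype)initWithTargets:(NSArray *)targets;
@end

@implementation SketchDispatcher {
    NSArray *_targets;
}

- (instancetype)initWithTargets:(NSArray *)targets {
    // NSProxy has no designated initializer to call.
    _targets = [targets copy];
    return self;
}

- (NSMethodSignature *)methodSignatureForSelector:(SEL)sel {
    for (id target in _targets) {
        NSMethodSignature *signature = [target methodSignatureForSelector:sel];
        if (signature != nil) {
            return signature;
        }
    }
    // No target implements the method: use a void signature so the call
    // is dropped in forwardInvocation: instead of raising an exception.
    return [NSMethodSignature signatureWithObjCTypes:"v@:"];
}

- (void)forwardInvocation:(NSInvocation *)invocation {
    for (id target in _targets) {
        if ([target respondsToSelector:invocation.selector]) {
            [invocation invokeWithTarget:target];
        }
    }
}

@end
```

A caller would hold the proxy typed as the protocol, for example `id<SketchLifecycle> dispatcher = (id)[[SketchDispatcher alloc] initWithTargets:elementsAndManagers];`, and a single `[dispatcher willDisplay]` then reaches every Element and Manager that implements it.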

Hierarchical structure

The old interaction area uses the VIPER paradigm, while Douyin as a whole uses MVVM. Multiple paradigms raise the cost of learning and maintenance, and with Elements in place VIPER has too many layers, so the decision was to unify on MVVM.

VIPER overall hierarchical structure

MVVM overall hierarchical structure

In the MVVM structure, an Element's responsibilities are close to those of a ViewController; it can be understood as a purer, more focused ViewController.

After the split into Elements, each sub-function is cohesive and its code size is bounded, which supports business development better.

Combining Element with MVVM

  • Element: a particularly simple sub-function provides only an Element, and the Element layer is responsible for the basic implementation and navigation

  • ViewModel: some elements have complex logic that needs to be extracted into a ViewModel, corresponding to the current Presenter layer

  • Tracker: the event tracking tool. Tracking code can also be written in the ViewModel; it corresponds to the current Interactor

  • Model: in most cases, the main model is used directly

The business layer

There are two main types of Element implementations in the business layer:

  • General business: common functions such as author information, description, avatar, likes, and comments

  • Business-line features: features from more than ten business lines, not listed here one by one

General-business Elements live alongside the interaction area code, and business-line Elements live with their business lines. Physically isolating the code makes responsibilities clearer, but it also brings a problem: when the framework changes, multiple repositories have to be changed, and something may be missed.

The over-design trap

Design tends to go to one of two extremes: no design or over-design.

No design means copying an existing architecture or pattern and applying it as is, without considering how the situation differs.

Over-design often comes after being burned by no design. Once bitten, twice shy: the designer wants configuration, composition, and extension points for everything, turning something simple into something complicated. Too much is as bad as too little.

Design is the art of making trade-offs among quality, cost, time, and other factors.

Implementation plan

Business development cannot stop, so development and refactoring proceed at the same time, like changing tires on a highway without stopping. Thorough planning and records are needed for the design to land smoothly.

Change assessment

First estimate the size and duration of the change:

  • Code changes: nearly 40,000 lines

  • Time: half a year

The change takes a long time and the risk is hard to control. Every release carries many business requirements that touch a lot of code, and if the refactored code conflicts with new requirements during the refactoring, the conflicts are very hard to resolve.

The importance of these functions has been mentioned several times. It is necessary to consider whether they still work after the refactoring, how to handle problems if they occur, and how to prove that the refactored functions behave the same as before so that product metrics are not affected.

Implementation strategy

The basic idea is to implement a new page and switch to it through ABTest. If the core metrics show no obvious regression, the rollout is expanded, and the old code is deleted after full rollout. The schematic diagram is as follows:

The work is divided into three phases:

  • Phase one, shown in red in the figure above: extract a protocol and program against it rather than against concrete classes; modify the old VC to implement the protocol; and pull the methods and properties exposed outside the protocol back inside

  • Phase two, shown in blue: create a new VC that is functionally identical to the old VC and implements the same protocol, and use ABTest to control whether a user gets the old VC or the new VC

  • Phase three: delete the old VC and the ABTest, keep the protocol and the new VC, and complete the replacement

Phase two is the core and takes the most time. During this phase the old and new pages have to be maintained at the same time, which doubles the development and testing workload, so it should be kept as short as possible.
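The three phases can be illustrated with a small sketch. All names here are hypothetical (`SketchInteractionProtocol`, `SketchOldInteraction`, `SketchNewInteraction`, `SketchMakeInteraction`), and the BOOL parameter stands in for whatever the real ABTest lookup returns.

```objc
#import <Foundation/Foundation.h>

// Phase one: callers depend on this protocol, never on a concrete class.
@protocol SketchInteractionProtocol <NSObject>
- (NSString *)implementationName;
@end

// The old page, adapted to implement the protocol.
@interface SketchOldInteraction : NSObject <SketchInteractionProtocol>
@end
@implementation SketchOldInteraction
- (NSString *)implementationName { return @"old"; }
@end

// Phase two: the new page implements the same protocol.
@interface SketchNewInteraction : NSObject <SketchInteractionProtocol>
@end
@implementation SketchNewInteraction
- (NSString *)implementationName { return @"new"; }
@end

// The factory is the only place that knows both concrete classes.
// Phase three reduces it to returning the new implementation.
static id<SketchInteractionProtocol> SketchMakeInteraction(BOOL inExperimentGroup) {
    if (inExperimentGroup) {
        return [[SketchNewInteraction alloc] init];
    }
    return [[SketchOldInteraction alloc] init];
}
```

Keeping the switch in one factory means that deleting the old page later touches only the factory and the old class, not the callers.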

ABTest

Two purposes:

  • As a switch: ABTest makes it possible to switch flexibly between the old and new pages

  • As proof: the old and new pages are meant to be functionally identical, but whether that holds in practice has to be shown with core metrics such as retention, playback, and penetration

Developing two sets of pages

In phase two, switching between two pages through ABTest has a cost: every requirement has to be developed and tested twice, once for each page. Some code can be shared, but the cost still rises sharply.

In addition, with two sets in development and testing, problems are harder to spot. Once a problem occurs, even though ABTest allows flexible switching, it takes a long cycle to fix it, ship again, and reach a conclusion from the ABTest data.

If every release had a problem, the result would be an endless loop of shipping, finding a problem, fixing it, shipping again, and finding a new problem, and full rollout might never happen.

As shown in the figure above, releases iterate weekly. A problem found in one week is fixed the next, and the fix then has to go through gray release, online gray release (phased release in the App Store), and ABTest verification (ABTest data takes 2 weeks to stabilize), for a total of 6 weeks.

Making sure everyone understands the overall mechanism and its cost helps align goals and shorten the cycle.

Delete old code

Because the architecture had been prepared in advance, deleting the old code was simple: delete the old files and the ABTest. In practice it took one day.

The one complication is branches that still modify the deleted code. Since the files no longer exist, any such modification will conflict. Before merging, those branches need to git merge the source branch and drop the conflicting changes to the old page.

Crash protection fallback

With two sets of pages developed against one protocol, the expectation is that if a feature is added and the new page misses a method, the app does not crash. Objective-C message forwarding can provide this: in forwardingTargetForSelector:, check whether the method exists, and if it does not, add a fallback implementation to handle the call.

#import <objc/runtime.h>

// Redirect unrecognized selectors to a protector object instead of crashing.
- (id)forwardingTargetForSelector:(SEL)aSelector {
    // "TestObject" is the fallback class; it is assumed to exist at runtime.
    Class clazz = NSClassFromString(@"TestObject");
    if (![self isExistSelector:aSelector inClass:clazz]) {
        // "v@:" encodes a method that returns void and takes only self and _cmd.
        class_addMethod(clazz, aSelector, [self safeImplementation:aSelector], "v@:");
    }
    return [[clazz alloc] init];
}

// Returns YES if clazz itself already implements aSelector.
- (BOOL)isExistSelector:(SEL)aSelector inClass:(Class)clazz {
    BOOL isExist = NO;
    unsigned int methodCount = 0;
    Method *methods = class_copyMethodList(clazz, &methodCount);
    for (unsigned int i = 0; i < methodCount; i++) {
        if (sel_isEqual(method_getName(methods[i]), aSelector)) {
            isExist = YES;
            break;
        }
    }
    free(methods); // class_copyMethodList returns a malloc'ed array
    return isExist;
}

// Builds a no-op implementation that only logs the missing selector.
- (IMP)safeImplementation:(SEL)aSelector {
    return imp_implementationWithBlock(^(id receiver) {
        NSLog(@"Unimplemented selector: %@", NSStringFromSelector(aSelector));
    });
}

During development and testing, a missed method can be surfaced with highly visible prompts such as toasts and popovers. In addition, tracking events can be reported for statistics.

Preventing deterioration

Preventing deterioration requires clear rules, mechanisms, and sustained investment in maintenance.

Not everyone understands the design intent. Code with different responsibilities should be put in the right place; business-independent code, for example, should be pushed down into the framework layer to reduce the chance of it being broken. Development is fast-paced, and even a simple if/else easily goes wrong: when one more condition is needed, people almost always just add another if, until after dozens of them the code cannot go on and a refactoring is forced. The hope is that one refactoring stays healthy for as long as possible.

What is more, the code can deteriorate during the refactoring itself. If problems appear faster than they can be fixed, you end up permanently firefighting and never finish.

In the new design, business logic lives in the Elements, and the ViewController and containers keep only general code. Business developers do not need to modify that part, and changing it without understanding the whole easily causes problems, so it is maintained by designated owners.

Elements are split into separate files by business line. The files can be given designated reviewers or file change notifications, or moved into the business line's repository for physical isolation.

Log & Troubleshooting

Problems that reproduce reliably are relatively easy to troubleshoot and fix. Intermittent problems, especially those caused by iOS system issues, are much harder: even with a guess at the cause, a fix is hard to verify in self-testing, so the only option is to ship it and observe.

Log key information in advance. For example, when a user reports a problem with a video, the logs should make it possible to find the corresponding model, Element, view, layout, and constraint information.

Information synchronization

The changes are extensive, so the developers, testers, and product managers of each business line need to be informed promptly, through:

  • Group chat announcements

  • Weekly meetings and weekly reports

What developers care about most is when the rollout will expand, when it will reach full rollout, and when the old code can be deleted so that they no longer have to maintain two sets of code.

Next come framework changes. While the framework is not yet stable it changes frequently, and each change has to be verified by the maintainers of the affected functions, who also confirm whether QA needs to be involved.

Product managers need to be kept informed as well. They do not care how the work is done, but if a problem occurs and they did not know about the change, it causes a lot of trouble.

Ensuring quality

The most important thing is to find problems promptly, which is the precondition for avoiding or reducing their impact.

The usual RD self-testing, QA functional testing, integration testing, and so on are all necessary and are not covered here. This section mainly discusses what other means can surface problems sooner.

Newly developed requirements have to be implemented in both the old and the new page, and likewise tested twice. Although this was stressed several times, the work involves multiple business lines, different teams and responsibilities, and a long timeline, so it is easy to miss something. In addition, the new page's ABTest traffic is small, so a problem there is hard to notice. We therefore treat online and test users differently:

  • Online and offline traffic strategy: for the online App Store channel, the ABTest is designed and ramped up as the data analysts advise. For offline channels such as internal testing and gray release, the new page gets 50% of the traffic, so the old and new pages each account for half. The internal-testing and gray-release populations are reasonably large, so an obvious problem is easy to find

  • ABTest product metric comparison: both gray-release and online data are useful references. Use the ABTest data collected during the ramp-up to make a rough judgment of whether something is wrong; an obvious problem can then be investigated in depth promptly

  • Slardar ABTest technical metric comparison: the most commonly used metric is the crash rate. Compare the crash rates of the control group and the experimental group and check whether there are new crashes. The crash volume of the experimental group is relatively small. Other technical metrics are also worth watching

  • Slardar technical alarm configuration: the refactoring cycle is long, and it is impractical to watch the dashboards every day. Add technical event tracking at key positions and configure alarms in the system with suitable conditions, so that you are notified promptly when something goes wrong

  • Unit testing: unit tests are a necessary safeguard for refactoring. They were added for the framework, SDK, and other core code

  • UI automated testing: with a complete set of validation cases, it can help find problems to some extent

Troubleshoot problems

Problems that reproduce reliably are easier to locate and solve, but two types of problems are more of a headache. Let’s look at them in detail:

  • Negative ABTest metrics

  • Probabilistic problems

Negative ABTest metrics

When a core ABTest metric is negative, the rollout cannot be ramped up, and the experiment may even have to be closed while the problem is investigated.

Take sharing as an example: both total shares and shares per user were clearly negative.

Tracking down a negative ABTest metric is similar to tracking down a bug: find the differences, narrow the scope, and finally locate the code.

  • Compare features: look for differences from the user's point of view. Interaction designers, testers, and developers each tested the feature themselves

  • Compare code: no differences were found between the event-tracking logic of the old and new code, in particular the conditions under which the tracking event is fired

  • Split the metric: sharing can be triggered from many places, and the event-tracking platform can break the metric down by the source of the share. The breakdown showed that shares from the long-press pop-up panel had dropped while other sources were much the same. Further investigation showed that the panel itself appeared significantly less often, which roughly located the problem. It is also worth mentioning that dislikes dropped too, but dislike is not a core metric, and fewer dislikes would normally be read as higher video quality, so this was hard to detect from the ABTest data

  • Locate the code: checking the conditions under which the panel appears revealed the difference. In the old code, the long-press gesture excluded only individual buttons such as like and comment; any other position without its own event handler could trigger it, including the blank space between the like and comment buttons. The new code excluded the right-side button area and the bottom area as a whole, so those blank spaces no longer responded. The long-press area became smaller, so the panel appeared less often

  • Fix the problem: once the cause was located, the fix was relatively simple, namely restoring the old code's behavior
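The difference between the two exclusion rules described above can be sketched as two small predicates. This is only an illustration under assumed names (`XXOldShouldShowPanel`, `XXNewShouldShowPanel`, and their parameters are hypothetical); the article does not show the actual gesture-handling code.

```objc
#import <UIKit/UIKit.h>

// Old rule (as described in the text): a long press is rejected only when it
// lands on an individual button. Blank space between buttons still triggers the panel.
static BOOL XXOldShouldShowPanel(CGPoint point,
                                 NSArray<UIView *> *buttons,
                                 UIView *container) {
    for (UIView *button in buttons) {
        // Convert each button's bounds into the container's coordinate space.
        CGRect frame = [button convertRect:button.bounds toView:container];
        if (CGRectContainsPoint(frame, point)) {
            return NO;
        }
    }
    return YES;
}

// New rule (as described in the text): the whole button region is rejected,
// so the blank space inside that region no longer triggers the panel.
static BOOL XXNewShouldShowPanel(CGPoint point, CGRect buttonRegion) {
    return !CGRectContainsPoint(buttonRegion, point);
}
```

For a point in the gap between two buttons, the first predicate returns YES and the second returns NO, which is the shrinkage in long-press area that lowered the panel's occurrence rate.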

This raises a question worth thinking about: when you see bad code during a refactor, should you change it?

Take the problem above. With the old approach, whenever a feature is added, nobody knows whether its click area should be excluded from the long press, and it is easy to overlook. The long press is underlying logic while specific buttons are business details, and it is bad for underlying logic to depend on business details; maintainability is very poor. But changing it is likely to affect the interaction experience and product metrics, and once a core metric is affected there is little room to experiment.

If a change is expected to affect functionality or interaction, avoid it as far as possible. In a major refactor, solve the core problems first; local problems can be solved separately later.

Below is a screenshot of the share data for the long-press panel, which dropped significantly. Other sources are basically unchanged and are not shown.

The occurrence rate of the long-press mask dropped by about 10%, which supports the natural guess that the mask was simply appearing less often.

The problem was identified by comparing the differences between the Views.

There were many similar problems. Ramping up the ABTest and going to full rollout needs a generous time estimate and patience; the process will take far longer than expected. Almost all of Douyin's core metrics are related to the interaction area, and many analysts and product managers watch it. So first understand how differently analysts, product managers, and developers perceive a negative ABTest metric.

Even if most metrics are positive, a few negative ones are enough for the experiment to be judged negative.

Developers may consider whether the design and the code are reasonable, or weigh the overall gains and losses, whereas analysts give priority to there being no problems and no hidden risks. The two views come from different perspectives and goals, and neither is right or wrong; in fact, the analysts helped find many problems. There are many analysts and product managers, and every metric has an analyst and a product manager responsible for it. If a core metric is significantly negative, it is very hard to reach agreement with the corresponding analyst and product manager; even a plan to ramp up first and investigate afterwards is hard to get accepted. The advice is to watch the metrics yourself, follow up as early as possible, and find someone to help push things forward at the key moments.

Probabilistic problems

The difficulty with probabilistic problems is that they are hard to reproduce: you cannot debug and locate them, and you cannot test and verify a fix. Whether the fix worked can only be determined after release. Take a crash on iOS 9 as a real example. It was discovered as follows:

  • In Slardar, go to AB experiment => select the experiment => Monitor type => crash to see the crash rates of the experimental and control groups. Other metrics such as OOM can be viewed the same way

The following is the crash stack. The crash rate was high: about 50% of iOS 9 users hit it:

The crash stack lies entirely inside system libraries whose source is not available, and no related business code appears in the stack, so the problem cannot be located directly. Solving it took a long time; the methods tried along the way are listed here for reference:

  • Reproduce manually and try fixes: the crash could be reproduced, but even a full day of scrolling reproduced it only a few times, which is far too inefficient. For some problems, though, an accurate guess can lead to a relatively fast fix

  • Swizzle the system method that crashes and log the last View involved and its View hierarchy, to narrow the scope of the investigation

  • Reproduce with automated tests: this can verify whether the problem has been fixed, but it cannot locate the problem

  • Reverse-engineer the UIKit system implementation and analyze the cause of the crash
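The swizzling approach from the list above might look roughly like the following. This is a sketch under assumptions: the article does not say which method was swizzled, so the example hooks the public `-[UIView layoutSublayersOfLayer:]` entry point, and the category name and `xx_` prefix are hypothetical. Logging on every layout pass is expensive, so something like this would presumably be limited to debug or gray-release builds.

```objc
#import <UIKit/UIKit.h>
#import <objc/runtime.h>

// Hypothetical category; the name and the xx_ prefix are illustrative only.
@interface UIView (XXLayoutCrashLogging)
@end

@implementation UIView (XXLayoutCrashLogging)

+ (void)load {
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        Method original = class_getInstanceMethod(self, @selector(layoutSublayersOfLayer:));
        Method replacement = class_getInstanceMethod(self, @selector(xx_layoutSublayersOfLayer:));
        if (original != NULL && replacement != NULL) {
            // Swap the two implementations so layout passes go through the logging wrapper.
            method_exchangeImplementations(original, replacement);
        }
    });
}

- (void)xx_layoutSublayersOfLayer:(CALayer *)layer {
    // Log before laying out: if the layout pass crashes, the last log line
    // names the View (and its superview) that was being processed.
    NSLog(@"[layout] view=%@ (%p) superview=%@ constraints=%lu",
          NSStringFromClass([self class]),
          self,
          NSStringFromClass([self.superview class]),
          (unsigned long)self.constraints.count);

    // After the exchange this selector points at the original implementation.
    [self xx_layoutSublayersOfLayer:layer];
}

@end
```

With this in place, the last `[layout]` entry before a crash report narrows the search to one View and its parent, which is the "narrow the scope" step the text describes.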

The general reverse-engineering process:

  • Download the iOS 9 Xcode and simulator files

  • Extract the UIKit dynamic library

  • Analyze the crash stack: starting from the three key methods at the point of the crash, _layoutEngine, _addOrRemoveConstraints, and _withUnsatisfiableConstraintsLoggingSuspendedIfEngineDelegateExists, find the call path, as shown in the figure below:

  • _withUnsatisfiableConstraintsLoggingSuspendedIfEngineDelegateExists calls deactivateConstraints, and deactivateConstraints in turn calls _addOrRemoveConstraints, which matches the third line of the crash stack. This is where the problem lies.
```objc
// Pseudocode reconstructed from the reverse-engineered UIKit binary.
@implementation UIView

- (void)_withUnsatisfiableConstraintsLoggingSuspendedIfEngineDelegateExists:(void (^)(void))action {
    id engine = [self _layoutEngine];
    id delegate = [engine delegate];
    BOOL suspended = [delegate _isUnsatisfiableConstraintsLoggingSuspended];
    [delegate _setUnsatisfiableConstraintsLoggingSuspended:YES];
    action();
    [delegate _setUnsatisfiableConstraintsLoggingSuspended:suspended];
    if (suspended == YES) {
        return;
    }
    NSArray *constraints = [self _constraintsBrokenWhileUnsatisfiableConstraintsLoggingSuspended];
    if (constraints.count != 0) {
        NSMutableArray *array = [[NSMutableArray alloc] init];
        for (NSLayoutConstraint *_cons in constraints) {
            if ([_cons isActive]) {
                [array addObject:_cons];
            }
        }
        if (array.count != 0) {
            [NSLayoutConstraint deactivateConstraints:array]; // entry into NSLayoutConstraint
            [NSLayoutConstraint activateConstraints:array];
        }
    }
    objc_setAssociatedObject(self,
                             @selector(_constraintsBrokenWhileUnsatisfiableConstraintsLoggingSuspended),
                             nil,
                             OBJC_ASSOCIATION_RETAIN_NONATOMIC);
}

@end

@implementation NSLayoutConstraint

+ (void)activateConstraints:(NSArray *)_array {
    [self _addOrRemoveConstraints:_array activate:YES];
}

+ (void)deactivateConstraints:(NSArray *)_array {
    [self _addOrRemoveConstraints:_array activate:NO];
}

@end
```
  • Judging from the code logic and the semantics of the method name _constraintsBrokenWhileUnsatisfiableConstraintsLoggingSuspended, this code mainly handles logging of unsatisfiable constraints, and its logic should not affect functionality

  • In addition, if the crash location cannot be determined precisely during the analysis, reverse-engineer the real-device binary. Compared with the simulator, the real-device binary matches the stack exactly, so the final call can be found from the offsets in the original crash stack

Results

  • Development efficiency: the 5 files of the former VIPER structure were split into about 50 files, and each feature's responsibilities stay within its own business line. Adding or modifying a feature no longer requires reading all the code. According to a survey questionnaire, development efficiency improved by more than 20%

  • Development quality: judging by bugs and online incidents, the new page has relatively few problems. Those that do occur are generally framework problems, and fixing one prevents a whole batch of similar problems

  • Product benefits: although the functionality is unchanged, the refactored design performs better, and the core metrics show clearly positive gains. The experiment was run several times, and the conclusions on the core metrics were consistent

Courage

This last part comes after a lot of thought. Refactoring is a normal part of development, yet it is always hard to carry out. With the company's strict hiring bar, everyone who gets in is smart, and there is no lack of wisdom to solve problems. What is lacking is courage. Looking back at this refactor and at the earlier attempts mentioned above, that was exactly the case.

It is easy to notice when code has become hard to maintain, and the idea of optimizing and refactoring comes naturally, but two things get in the way of an effective refactor:

  • When to start?

  • Trying only local refactoring

Before discussing when to start, consider one term. ROI is a popular word at work; roughly, it is the ratio of return to investment: the less you put in and the more you get back, the better. This term guides many decisions.

Refactoring is a thankless task. It takes a great deal of effort and time, its direct benefits are not obvious, and if a change goes wrong you also bear the risk. It is hard to get recognition from others, for example from product managers: the functionality has not changed at all and the code runs, so why refactor when new requirements are still waiting to be developed? So problematic code just keeps being put off and keeps getting worse.

It is true that refactoring looks most profitable once the pain is bad enough, but the real benefit is roughly constant, while waiting adds substantial maintenance cost and, after further deterioration, a higher refactoring cost. In the long run it is better to change early. The decision is hard to make and even harder to sell: everyone may understand and judge the long-term benefits differently, so agreement is hard to reach.

Many people think about it, but few act. People tend to be cautious about the unknown. What keeps you going is the courage to pursue technical excellence.

The best time to refactor is now.

Local refactoring accumulates small changes until the whole is complete, and if something goes wrong the damage is local. This is the bottom-up approach; there is nothing wrong with it, and it is used often. Its counterpart is top-down overall refactoring. The point to stress is that local and overall refactoring are both just means, and which one to choose depends on the problem. If the underlying problem lies in the overall structure or architecture, local refactoring cannot solve it.

For example, during this refactor many people asked whether the changes could be smaller and more cautious. But the design had been analyzed and worked through, and it was clear that the structural problems could not be solved by local refactoring. The earlier attempts had also proved this.

Don’t stop running for fear of pulling a muscle.
