I. Conceptual definition

1. What is a category

Simply put, a category is a classification of goods. On Taobao, the platform most of us use, categories are the area circled in the picture.



Categories exist because of their function: they are the standard way to navigate an e-commerce website, although every website organizes its categories differently.

If our website has only dozens or hundreds of products, categories may not matter much. With thousands of products or more, however, categories are crucial for finding products with particular characteristics. For example, to find women's jeans you can navigate the category Women's wear -> Jeans. Otherwise users would have to search page after page, and however good and cost-effective our products are, few users would tolerate that.

2. Foreground and background categories

Categories are divided into foreground and background categories.

Foreground categories exist mainly for users. They drive the navigation bar and change often: seasons and marketing campaigns both affect category navigation.

Background categories are directly associated with products. A category is chosen when a product is created and then almost never changes, so background categories are very stable.

Benefits:

For example, suppose we launch a new 12.12 promotion on our platform. Without separating foreground and background categories, we would have to find every product in the promotion and change its category to 12.12, which is clearly the wrong approach. We could instead maintain a separate set of associations with these products and build a new module for them, but it is better to let the category system carry the promotion directly by mapping foreground categories onto background categories.

The mapping can combine background categories in any way we like.

For example, suppose a batch of products belongs to background categories A, B, and C, and we want promotion navigation for the products in A and B. We create the mapping 12.12 -> (A, B), associating foreground category 12.12 with background categories A and B, so navigating to 12.12 finds every product under A and B.
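The mapping 12.12 -> (A, B) can be sketched in Java as follows. This is a minimal in-memory illustration, not the article's implementation: the class `FrontBackMapping` and its methods are hypothetical, and plain maps stand in for whatever storage the real system uses. The point it shows is that products only reference their background category, so a promotion is created without touching any product.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a foreground category reaches products only through
// its background categories; products never reference the foreground category.
class FrontBackMapping {
    // foreground category -> background category IDs
    private final Map<String, Set<String>> frontToBack = new HashMap<>();
    // background category ID -> product IDs (each product has one background category)
    private final Map<String, List<Long>> productsByBackCategory = new HashMap<>();

    void addProduct(long productId, String backCategoryId) {
        productsByBackCategory.computeIfAbsent(backCategoryId, k -> new ArrayList<>()).add(productId);
    }

    // map("12.12", "A", "B") creates the relation 12.12 -> (A, B)
    void map(String frontCategory, String... backCategoryIds) {
        Set<String> backs = frontToBack.computeIfAbsent(frontCategory, k -> new LinkedHashSet<>());
        Collections.addAll(backs, backCategoryIds);
    }

    // All products reachable from a foreground category, in mapping order.
    List<Long> productsFor(String frontCategory) {
        List<Long> result = new ArrayList<>();
        for (String back : frontToBack.getOrDefault(frontCategory, Collections.emptySet())) {
            result.addAll(productsByBackCategory.getOrDefault(back, Collections.emptyList()));
        }
        return result;
    }
}
```

Ending the promotion is then a matter of deleting the 12.12 entry from the mapping; no product record changes.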

Another advantage of foreground categories is that they are volatile and not directly associated with products, so they can be extended in many ways. For example, a new promotion channel can be a foreground category that jumps straight to a URL, and a foreground category can also be defined by keywords: a "T-shirt" category can simply run a product search for the keyword "T-shirt".
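Because a foreground category is only a navigation entry, it can be modeled as a small tagged type. The sketch below is an assumption for illustration: the enum `FrontCategoryKind` with the values MAPPED, URL, and KEYWORD and the class `FrontCategory` are hypothetical names, and `resolve()` just describes what a click would do instead of calling real services.

```java
import java.util.List;

// Hypothetical model: a foreground category is backed by a category mapping,
// a URL jump, or a keyword search.
enum FrontCategoryKind { MAPPED, URL, KEYWORD }

class FrontCategory {
    final String title;
    final FrontCategoryKind kind;
    final List<Long> backCategoryIds; // used when kind == MAPPED
    final String target;              // the URL or the keyword, depending on kind

    FrontCategory(String title, FrontCategoryKind kind, List<Long> backCategoryIds, String target) {
        this.title = title;
        this.kind = kind;
        this.backCategoryIds = backCategoryIds;
        this.target = target;
    }

    // Describes what clicking this navigation entry does.
    String resolve() {
        switch (kind) {
            case MAPPED:
                return "list products in background categories " + backCategoryIds;
            case URL:
                return "redirect to " + target;
            case KEYWORD:
                return "search products by keyword '" + target + "'";
            default:
                throw new IllegalStateException("unknown kind: " + kind);
        }
    }
}
```

Adding a new kind of entry only touches this type, which is the extensibility the paragraph above refers to.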

Generally speaking, both the foreground category and the background category are divided into several levels, so they eventually form a category tree.

3. Attributes and attribute values

Each category has attributes that are common to all items in the category, such as color, size, and so on.

An attribute value is a specific value of an attribute; for example, color can be red, white, or black.

Normally a foreground category is associated with background categories, so its attributes are selected from the attribute sets of those background categories.

For example, in the relation 12.12 -> (A, B), if A has attributes (a1, a2) and B has attributes (b1, b2), then the attributes of 12.12 must come from the set (a1, a2, b1, b2).
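The rule above is a set union followed by a subset check. A minimal sketch, assuming attributes are identified by name strings; `FrontAttributeResolver` and its two methods are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: candidate attributes of a foreground category are the
// union of the attribute sets of the background categories it maps to.
class FrontAttributeResolver {
    static Set<String> candidateAttributes(List<String> backCategoryIds,
                                           Map<String, Set<String>> attributesByBackCategory) {
        Set<String> union = new LinkedHashSet<>();
        for (String back : backCategoryIds) {
            union.addAll(attributesByBackCategory.getOrDefault(back, Collections.emptySet()));
        }
        return union;
    }

    // A foreground attribute selection is valid only if every chosen attribute is a candidate.
    static boolean isValidChoice(Set<String> chosen, Set<String> candidates) {
        return candidates.containsAll(chosen);
    }
}
```

With the 12.12 -> (A, B) example, any attribute outside A's and B's sets is rejected.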

Attributes can also be divided into types; we roughly use three: navigation attributes, sales attributes, and general attributes.

Navigation attributes: offered as filter options on the page reached by browsing a category.



Sales attributes: the SKU specification attributes selectable on the product detail page; different attribute values may have different prices.



General attributes: all other attributes of a product.
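One way to carry the three types is a type tag on each category attribute, which each page then filters on. This is a sketch under the simplifying assumption that every attribute has exactly one type (real systems often let one attribute, such as color, be both a navigation and a sales attribute); `AttributeType` and `CategoryAttribute` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each attribute of a background category carries a type
// that decides where it is shown. Assumes one type per attribute.
enum AttributeType { NAVIGATION, SALES, GENERAL }

class CategoryAttribute {
    final String name;
    final AttributeType type;
    final List<String> values;

    CategoryAttribute(String name, AttributeType type, List<String> values) {
        this.name = name;
        this.type = type;
        this.values = values;
    }

    // Names of the attributes of one type: NAVIGATION for the filter page,
    // SALES for the SKU selector on the product detail page.
    static List<String> namesOfType(List<CategoryAttribute> attrs, AttributeType type) {
        List<String> names = new ArrayList<>();
        for (CategoryAttribute a : attrs) {
            if (a.type == type) {
                names.add(a.name);
            }
        }
        return names;
    }
}
```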



When creating a product, you select a category and then fill in the product's attribute values according to that category's attributes. The values are stored on the product in a key-value format, which allows products to be filtered by attribute value.
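The key-value storage and filtering described above can be sketched in memory as follows. `ListedItem` and `filter` are hypothetical names, and a linear scan stands in for the search index a production system would query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a product stores its attribute values as
// attribute name -> value pairs, so filtering is a per-key comparison.
class ListedItem {
    final long id;
    final long backCategoryId;
    final Map<String, String> attributes;

    ListedItem(long id, long backCategoryId, Map<String, String> attributes) {
        this.id = id;
        this.backCategoryId = backCategoryId;
        this.attributes = attributes;
    }

    // IDs of items in the category whose attributes match every requested pair.
    static List<Long> filter(List<ListedItem> items, long categoryId, Map<String, String> wanted) {
        List<Long> ids = new ArrayList<>();
        for (ListedItem item : items) {
            if (item.backCategoryId != categoryId) {
                continue;
            }
            boolean matches = true;
            for (Map.Entry<String, String> w : wanted.entrySet()) {
                if (!w.getValue().equals(item.attributes.get(w.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                ids.add(item.id);
            }
        }
        return ids;
    }
}
```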

4. Sub-attributes and sub-attribute values

This relationship is rarely used and usually not designed in this much detail, but it is worth reserving room for it.

A sub-attribute hangs below an attribute value, and its values are sub-attribute values. For example, the mobile phone category has a Brand attribute with the value iPhone; below iPhone we can add a Model sub-attribute whose values are iPhone 8, iPhone X, and so on.

This gives the chain: Category (Phone) -> Attribute (Brand) -> Attribute value (iPhone) -> Sub-attribute (Model) -> Sub-attribute value (iPhone 8)

Sub-attributes reuse the attribute model itself; the design only needs to add an association between attribute values and sub-attributes.
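Reusing the attribute model means a sub-attribute is just another attribute object reachable from an attribute value. A minimal sketch; the classes `Attr` and `AttrValue` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: sub-attributes reuse the attribute model; the only
// addition is the link from an attribute value to its sub-attributes.
class Attr {
    final String name;
    final Map<String, AttrValue> values = new LinkedHashMap<>();

    Attr(String name) {
        this.name = name;
    }

    AttrValue addValue(String value) {
        return values.computeIfAbsent(value, AttrValue::new);
    }
}

class AttrValue {
    final String value;
    final List<Attr> subAttributes = new ArrayList<>(); // usually empty

    AttrValue(String value) {
        this.value = value;
    }

    Attr addSubAttribute(String name) {
        Attr sub = new Attr(name);
        subAttributes.add(sub);
        return sub;
    }
}
```

Usage follows the chain in the text: `new Attr("Brand").addValue("iPhone").addSubAttribute("Model")` returns the Model attribute, whose values are then iPhone 8, iPhone X, and so on.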


II. Technical design

1. Relationship diagram



The corresponding relationships are as follows:

Foreground category : background category (many-to-many);

Category : attribute (one-to-many);

Attribute : attribute value (one-to-many);

Attribute value : sub-attribute (one-to-many).

2. Tree structure diagram of category attributes



How deep the category and attribute hierarchies go depends on the business; because they are tree structures, they can be extended as needed.

Attributes hang under background leaf categories.
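The leaf rule can be enforced when an attribute is bound to a category. A minimal sketch: `CategoryNode` and `AttributeBinder` are hypothetical, while the type codes 1 (foreground) and 2 (background) follow the `cate_type` column of the category table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: attributes may only hang under background leaf categories.
class CategoryNode {
    static final int FOREGROUND = 1; // cate_type values from the category table
    static final int BACKGROUND = 2;

    final long id;
    final int cateType;
    final boolean leaf;

    CategoryNode(long id, int cateType, boolean leaf) {
        this.id = id;
        this.cateType = cateType;
        this.leaf = leaf;
    }
}

class AttributeBinder {
    private final Map<Long, List<String>> attributesByCategory = new HashMap<>();

    void bind(CategoryNode category, String attributeName) {
        if (category.cateType != CategoryNode.BACKGROUND || !category.leaf) {
            throw new IllegalArgumentException(
                "attributes must hang under a background leaf category: " + category.id);
        }
        attributesByCategory.computeIfAbsent(category.id, k -> new ArrayList<>()).add(attributeName);
    }

    List<String> attributesOf(long categoryId) {
        return attributesByCategory.getOrDefault(categoryId, List.of());
    }
}
```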

3. The category table

```sql
-- The table name `category` is assumed; the original listing shows only the
-- column definitions. MySQL requires the AUTO_INCREMENT column to be a key.
CREATE TABLE `category` (
  `cate_id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `gmt_create` datetime NOT NULL COMMENT 'creation time',
  `gmt_modified` datetime NOT NULL COMMENT 'modification time',
  `pid` bigint(20) DEFAULT NULL COMMENT 'parent category id',
  `leaf` tinyint(4) NOT NULL COMMENT 'leaf node 1: yes 0: no',
  `level` tinyint(4) NOT NULL COMMENT 'hierarchy level',
  `title` varchar(64) NOT NULL COMMENT 'title',
  `cate_type` tinyint(4) NOT NULL COMMENT 'category type 1: foreground 2: background',
  `back_categories` varchar(255) DEFAULT NULL COMMENT 'for foreground categories: comma-separated background category ids',
  `root_cate_id` bigint(20) DEFAULT NULL COMMENT 'root category id',
  `order_seq` int(11) NOT NULL COMMENT 'sort order',
  `picture_url` varchar(255) DEFAULT NULL COMMENT 'image url',
  `need_audit` tinyint(4) NOT NULL COMMENT 'needs audit 1: yes 0: no',
  `is_delete` tinyint(4) NOT NULL COMMENT 'status 0: normal 1: deleted',
  `biz_type` varchar(64) NOT NULL COMMENT 'platform type',
  `language` varchar(64) DEFAULT 'zh' COMMENT 'language',
  `country` varchar(64) DEFAULT 'CN' COMMENT 'country',
  `extension` mediumtext COMMENT 'extension field',
  `version` int(11) NOT NULL DEFAULT '0' COMMENT 'version',
  PRIMARY KEY (`cate_id`)
);
```

`pid` stores the ID of the parent category.

`level` indicates the category's level (depth) in the tree.
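With `pid` and `level`, the tree can be rebuilt from flat rows by grouping rows by parent. This sketch assumes that root categories have a NULL `pid` (the column is nullable) and that roots are at level 1; neither is stated in the article. `CategoryRow` and `CategoryTreeBuilder` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rebuild the tree from category-table rows via pid.
// Assumes roots have pid == null and level 1.
class CategoryRow {
    final long cateId;
    final Long pid;
    final int level;
    final String title;

    CategoryRow(long cateId, Long pid, int level, String title) {
        this.cateId = cateId;
        this.pid = pid;
        this.level = level;
        this.title = title;
    }
}

class CategoryTreeBuilder {
    // pid -> children; HashMap accepts the null key used for root rows.
    static Map<Long, List<CategoryRow>> childrenIndex(List<CategoryRow> rows) {
        Map<Long, List<CategoryRow>> index = new HashMap<>();
        for (CategoryRow row : rows) {
            index.computeIfAbsent(row.pid, k -> new ArrayList<>()).add(row);
        }
        return index;
    }

    // Walks down from the roots and checks that each level equals parent level + 1.
    static boolean levelsConsistent(List<CategoryRow> rows) {
        return check(childrenIndex(rows), null, 1);
    }

    private static boolean check(Map<Long, List<CategoryRow>> index, Long parentId, int expectedLevel) {
        for (CategoryRow row : index.getOrDefault(parentId, List.of())) {
            if (row.level != expectedLevel || !check(index, row.cateId, expectedLevel + 1)) {
                return false;
            }
        }
        return true;
    }
}
```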

4. The cache

Because categories are basic data in an e-commerce system, many modules depend on them. As the system grows, the volume and concurrency of category queries keep increasing, so we used caching from the start.

Reasons categories suit caching:

  • As basic data, categories receive a huge volume of queries;
  • Compared with other basic data, category data takes little memory;
  • Category data is shared;
  • Category changes have low real-time requirements and no strict consistency requirements.

There are two caching approaches: a distributed cache and a distributed local cache.

4.1 Distributed Cache:

Use Redis to store the category data structures.

Advantages: all clients share the same data, so there is no data inconsistency.

Disadvantages: as the system grows, QPS rises and so does the load on the central cache.

The overall process is shown as follows:



  1. Query category data from the DB
  2. Build the data structure you want to save and push it to the cache
  3. Receiving client query requests
  4. Query from the built cache
  5. Return the data
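The five steps can be sketched with a `ConcurrentHashMap` standing in for Redis, so the example runs without a server; in production the `put`/`get` calls would be Redis writes and reads. `DistributedCategoryCache` and the `Supplier`-based DB query are hypothetical, and only category titles are cached to keep it short. Note that only `rebuild()` touches the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of the distributed-cache flow. A ConcurrentHashMap stands
// in for Redis; the Supplier stands in for the DB query (cate_id -> title).
class DistributedCategoryCache {
    private final Map<String, String> redis = new ConcurrentHashMap<>();
    private final Supplier<Map<Long, String>> db;

    DistributedCategoryCache(Supplier<Map<Long, String>> db) {
        this.db = db;
    }

    // Steps 1-2: query the DB, build the structure, push it to the cache.
    void rebuild() {
        Map<String, String> fresh = new HashMap<>();
        db.get().forEach((id, title) -> fresh.put("Category" + id, title));
        redis.putAll(fresh);
        redis.keySet().retainAll(fresh.keySet()); // drop categories deleted since the last build
    }

    // Steps 3-5: answer client queries from the cache only; the DB is never queried here.
    String getTitle(long cateId) {
        return redis.get("Category" + cateId);
    }
}
```

Between rebuilds the cache may serve slightly stale data, which the relaxed real-time requirement listed above allows.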


Refresh policy: periodic update. Steps 1 and 2 run as a scheduled task whose frequency depends on how much staleness the business can accept.

The advantage of this approach is that you do not need to add cache invalidation or push logic to every code path that changes a category; all of that logic lives in the scheduled task.
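A scheduled task like this can be built on the JDK's `ScheduledExecutorService`. The class `PeriodicCacheRefresher` and the period value are assumptions for illustration; the rebuild job is passed in as a `Runnable`. One detail matters: with `scheduleAtFixedRate`, an exception escaping the task suppresses all later executions, so the sketch catches failures and keeps the previous cache contents.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: run the "query DB, build, push to cache" job on a fixed
// schedule, so code that edits categories needs no invalidation logic.
class PeriodicCacheRefresher implements AutoCloseable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // periodSeconds is chosen by how much staleness the business accepts.
    void start(Runnable rebuild, long periodSeconds) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                rebuild.run();
            } catch (RuntimeException e) {
                // An escaping exception would cancel all future runs, so log it
                // and keep serving the previous cache contents.
                System.err.println("category cache refresh failed: " + e);
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```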


4.2 Distributed Local Cache:

Advantages: every client keeps a copy of the data in local memory, so reads are fast.

Disadvantages: every client must synchronize data with the server, and the key problem is that consistency is hard to guarantee.

A separate client JAR has to be built and distributed. When the JAR is upgraded, all dependent applications must upgrade at the same time, so maintenance costs are high.

The overall flow chart is as follows:




  1. Read category data from the DB.
  2. Build the category data structure to be cached.
  3. Pull the data from the server using the purpose-built client JAR.
  4. Return the constructed data structure.


Here are a few things to note:

1. There are several ways to refresh client data:

  • The constructed data packets are stored in the DB, and the client refreshes them uniformly from the DB.
  • The server broadcasts messages, and the client listens to refresh the local cache.
  • The client JAR wraps a periodic pull that refreshes the local cache on a schedule.

Because our system has low real-time requirements for category updates and no strict consistency requirements, periodic refresh is sufficient, and packaging it in the JAR is the easiest option for dependent applications. These relaxed requirements are also why a local cache is acceptable at all; if inconsistent local data could cause serious consequences in your system, use a local cache with caution.

2. The category client is developed and packaged as a library for applications that need category services. It encapsulates all category operations, including scheduled pulling, parsing the data structure built by the server, data validation, and local cache refresh, and it exposes basic operations such as getting a category by ID and getting the category tree.

3. Each time the server builds the category data structure, it should maintain a version number so the client can tell whether it already has the latest data. Checking the version first and pulling only when it has changed is a good choice, because it avoids unnecessary network transfer when nothing was updated.

4. Why does the category client pull from the server instead of the server pushing? With push, the server must first maintain a list of clients and then track the push status of every client. Leaving this to each client greatly reduces complexity.
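The client-side behavior in these notes (check the version, pull only on change, validate, swap the local copy) can be sketched as below. The article does not describe the real client JAR's API, so `CategoryServer`, `Snapshot`, and `LocalCategoryClient` are hypothetical, and a map of titles stands in for the full data structure. The snapshot is immutable and published through one `volatile` write, so readers on other threads always see either the old or the new version in full.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical server contract: a cheap version check and an expensive full fetch.
interface CategoryServer {
    long currentVersion();
    Snapshot fetchSnapshot();
}

final class Snapshot {
    final long version;
    final Map<Long, String> titles; // cate_id -> title, stand-in for the full structure

    Snapshot(long version, Map<Long, String> titles) {
        this.version = version;
        this.titles = Collections.unmodifiableMap(new HashMap<>(titles));
    }
}

class LocalCategoryClient {
    private final CategoryServer server;
    private volatile Snapshot local = new Snapshot(-1, Map.of());
    int acceptedPulls = 0; // only written by the single refresh thread

    LocalCategoryClient(CategoryServer server) {
        this.server = server;
    }

    // Called periodically by the client's own timer (the client pulls; the server never pushes).
    void refresh() {
        Snapshot current = local;
        if (server.currentVersion() == current.version) {
            return; // already up to date: skip the full transfer
        }
        Snapshot fresh = server.fetchSnapshot();
        if (fresh.version <= current.version) {
            return; // basic validation: never replace data with an older version
        }
        local = fresh; // publish the new immutable snapshot atomically
        acceptedPulls++;
    }

    String getById(long cateId) {
        return local.titles.get(cateId);
    }
}
```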


Comparing the two: a distributed cache is simple to implement and makes data consistency easy to guarantee. A local cache requires building our own client package and may be temporarily inconsistent; its advantage grows under high concurrency, but it should be used with caution for services that need strong consistency. We ultimately chose a distributed cache for this requirement.


Structural design:

On the e-commerce front end, the most common category operation is fetching the category tree, so the cache can be designed as follows:

  • RootCategoryIds: the index of root category IDs
  • SubCategoryIds + category ID: the index of that category's child category IDs
  • Category + category ID: the data of that category

The root ID list and the child ID lists together index every category ID in the tree, and each category's data can then be fetched by its ID.


This structure works for both the distributed cache and the local cache. With it, any node in the category tree can be retrieved quickly.
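The key layout can be sketched with two in-memory maps standing in for the cache (in Redis all of these would be keys in one keyspace). The exact key strings, formed by plain concatenation such as `SubCategoryIds12`, and the class `CategoryTreeCache` are assumptions; only titles are stored as category data. The `render` method walks the whole tree using nothing but the two kinds of ID lists and lookups by ID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the cache key layout:
//   RootCategoryIds        -> root category IDs
//   SubCategoryIds + id    -> child category IDs of that category
//   Category + id          -> data of that category (title here)
class CategoryTreeCache {
    private final Map<String, List<Long>> idIndexes = new HashMap<>();
    private final Map<String, String> categoryData = new HashMap<>();

    // pid == null marks a root category.
    void put(long id, Long pid, String title) {
        idIndexes.computeIfAbsent(listKey(pid), k -> new ArrayList<>()).add(id);
        categoryData.put("Category" + id, title);
    }

    List<Long> childrenOf(Long id) {
        return idIndexes.getOrDefault(listKey(id), List.of());
    }

    String title(long id) {
        return categoryData.get("Category" + id);
    }

    // Depth-first walk of the tree, indenting two spaces per level.
    void render(Long parentId, int depth, StringBuilder out) {
        for (long id : childrenOf(parentId)) {
            out.append("  ".repeat(depth)).append(title(id)).append('\n');
            render(id, depth + 1, out);
        }
    }

    private static String listKey(Long id) {
        return id == null ? "RootCategoryIds" : "SubCategoryIds" + id;
    }
}
```

Fetching a single node needs only `title(id)` and `childrenOf(id)`, which is why the same layout suits both the distributed and the local cache.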


III. Summary

Although the category structure is not complicated, it is used very frequently, so category access needs careful design in an e-commerce system. This article introduced distributed and local caching without going into detail; for more on caching, consult other resources. The local cache implementation described above is based on Taobao's category system and can serve as a reference if you need a local cache.


For more articles, visit http://www.apexyun.com/

Contact email: [email protected]

(Please do not reprint without permission)