Release Notes: Gorse v0.2.5

New features

  • User data can be imported and exported

The new version supports user labels and can import and export user data.

  • The statistics are saved to the database

In previous releases, statistics (number of users, number of items, number of feedback, etc.) were stored in the cache. Starting with this update, these statistics will be stored in the database.

  • Support IDF-based similarity calculation

When labels are used to calculate the similarity between users or items, different labels carry different amounts of information. IDF-based similarity increases the weight of low-frequency labels and reduces the weight of high-frequency labels, which limits the interference of high-frequency labels in the similarity calculation.
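The weighting idea can be illustrated with a minimal sketch. It assumes a plain IDF-weighted cosine similarity over label sets; the exact formula Gorse uses is not given here, and the function names and sample labels are hypothetical.

```python
import math
from collections import Counter


def idf_weights(label_sets):
    """Compute an IDF weight per label from a list of label sets."""
    n = len(label_sets)
    # Document frequency: in how many label sets each label occurs.
    df = Counter(label for labels in label_sets for label in set(labels))
    return {label: math.log(n / count) for label, count in df.items()}


def idf_cosine(a, b, idf):
    """Cosine similarity between two label sets, weighting labels by IDF."""
    a, b = set(a), set(b)
    dot = sum(idf.get(label, 0.0) ** 2 for label in a & b)
    norm_a = math.sqrt(sum(idf.get(label, 0.0) ** 2 for label in a))
    norm_b = math.sqrt(sum(idf.get(label, 0.0) ** 2 for label in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical items: "go" appears everywhere, so its IDF weight is zero.
items = [{"go", "db"}, {"go", "web"}, {"go", "db", "cache"}]
idf = idf_weights(items)
only_common_label = idf_cosine(items[0], items[1], idf)  # share only "go"
rarer_shared_label = idf_cosine(items[0], items[2], idf)  # also share "db"
```

In this sketch a label that every item carries contributes nothing, so two items that share only that label are not treated as similar.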

  • Remove click feedback type

The configuration file in the previous version had an option called click_feedback_types to set the click feedback types. Because click_feedback_types and positive_feedback_types overlapped, click_feedback_types has been removed in the new version. The new version also updates the CTR calculation so that only positive feedback that has a corresponding read feedback is counted.
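A minimal sketch of that counting rule follows. It assumes that CTR is the share of read (user, item) pairs that also received positive feedback; the denominator, the function name, and the feedback type names "read" and "like" are assumptions for illustration, not Gorse's actual implementation.

```python
def click_through_rate(feedback, positive_types, read_type="read"):
    """Estimate CTR from (feedback_type, user_id, item_id) tuples.

    Positive feedback only counts when the same user also has read
    feedback for the same item.
    """
    read_pairs = set()
    positive_pairs = set()
    for feedback_type, user_id, item_id in feedback:
        if feedback_type == read_type:
            read_pairs.add((user_id, item_id))
        elif feedback_type in positive_types:
            positive_pairs.add((user_id, item_id))
    if not read_pairs:
        return 0.0
    return len(positive_pairs & read_pairs) / len(read_pairs)


feedback = [
    ("read", "u1", "i1"),
    ("read", "u1", "i2"),
    ("like", "u1", "i1"),  # counted: u1 has read i1
    ("like", "u2", "i3"),  # ignored: no read feedback for (u2, i3)
]
ctr = click_through_rate(feedback, positive_types={"like"})
```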

  • Recommendations based on similar users

The previous version provided two personalized recall algorithms, collaborative filtering and similar items (CTR prediction is used for secondary ranking of the recommendation results). The new algorithm based on similar users first finds users similar to the current user and then recommends the items those users like. For example, I am a Go/C++/Python developer, so recommendations based on similar users suggest projects in those languages to me.
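The two steps described above (find similar users, then recommend what they like) can be sketched as follows. The scoring rule, which sums neighbor similarities per item, is an assumption, and all names and sample data are hypothetical.

```python
from collections import defaultdict


def recommend_from_similar_users(user_id, similar_users, liked_items, n=3):
    """Recommend items liked by similar users that the user has not liked yet.

    similar_users maps a user ID to a list of (neighbor_id, similarity).
    liked_items maps a user ID to the set of item IDs that user likes.
    """
    seen = liked_items.get(user_id, set())
    scores = defaultdict(float)
    for neighbor_id, similarity in similar_users.get(user_id, []):
        for item_id in liked_items.get(neighbor_id, set()):
            if item_id not in seen:
                scores[item_id] += similarity
    # Highest score first; ties are broken by item ID for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item_id for item_id, _ in ranked[:n]]


similar_users = {"me": [("alice", 0.9), ("bob", 0.5)]}
liked_items = {
    "me": {"gorse"},
    "alice": {"gorse", "etcd"},
    "bob": {"etcd", "numpy"},
}
recommended = recommend_from_similar_users("me", similar_users, liked_items)
```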

  • Evaluation of CTR prediction models using AUC

The previous version used accuracy to evaluate the performance of CTR prediction models; the new version uses AUC. AUC measures the ranking ability of the model, which makes it better suited to the secondary ranking stage of a recommender system.
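The difference between the two metrics can be seen in a small sketch. It computes AUC by the standard pairwise definition (the fraction of positive/negative pairs that the model orders correctly); it is not Gorse's evaluation code, and the sample labels and scores are made up.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct pair
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Share of samples classified correctly at a fixed threshold."""
    hits = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return hits / len(labels)


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
shrunk = [s / 10 for s in scores]  # same ordering, all below the threshold

auc_original, auc_shrunk = auc(labels, scores), auc(labels, shrunk)
acc_original, acc_shrunk = accuracy(labels, scores), accuracy(labels, shrunk)
```

Shrinking every score leaves the ordering, and therefore the AUC, unchanged, while accuracy at a fixed threshold drops. Only the ordering matters when the model is used to rank candidates.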

  • Multi-way recall recommendation is implemented

The new version uses several recommendation algorithms to collect candidates from the full item library, then uses a CTR prediction model to rank them before recommending them to users. Similar-item recommendation depends on item labels, similar-user recommendation depends on user labels, and CTR prediction depends on user labels or item labels. If the dataset cannot provide labels, it is recommended to disable the corresponding recommendation algorithm.
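The recall-then-rank flow can be sketched as below. This is a simplified illustration, not Gorse's code: each recall source is modeled as a hypothetical function returning item IDs, and predict_ctr stands in for the CTR prediction model.

```python
def multi_way_recall(user_id, recallers, predict_ctr, n=10):
    """Merge candidates from several recall sources, then rank them by CTR."""
    candidates = []
    seen = set()
    for recall in recallers:
        for item_id in recall(user_id):
            if item_id not in seen:  # the same item may come from two sources
                seen.add(item_id)
                candidates.append(item_id)
    candidates.sort(key=lambda item_id: predict_ctr(user_id, item_id), reverse=True)
    return candidates[:n]


# Hypothetical recall sources and a lookup table standing in for the model.
recallers = [
    lambda user_id: ["a", "b"],  # e.g. collaborative filtering
    lambda user_id: ["b", "c"],  # e.g. similar items
]
ctr_table = {"a": 0.1, "b": 0.7, "c": 0.4}
ranked = multi_way_recall("u1", recallers, lambda u, i: ctr_table[i])
```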

Bug fixes

  • Fixed panic when duplicate items appeared

The previous implementation assumed that users and items read from the database would never share an ID, but with ClickHouse support that assumption no longer holds. After an item is updated in ClickHouse, and until ClickHouse has merged the data, a read can return more than one item with the same ID.
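One generic way to tolerate such duplicates is to keep a single row per ID while reading. The sketch below keeps the row with the newest timestamp; the release notes do not describe how Gorse handles this internally, so the approach and the field names item_id and timestamp are assumptions.

```python
def deduplicate_items(rows):
    """Keep one row per item ID, preferring the newest timestamp."""
    latest = {}
    for row in rows:
        current = latest.get(row["item_id"])
        if current is None or row["timestamp"] >= current["timestamp"]:
            latest[row["item_id"]] = row
    return list(latest.values())


rows = [
    {"item_id": "a", "timestamp": 1, "labels": ["x"]},
    {"item_id": "b", "timestamp": 1, "labels": []},
    {"item_id": "a", "timestamp": 2, "labels": ["y"]},  # updated, not yet merged
]
unique_items = deduplicate_items(rows)
```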

Upgrade guide

This update modifies the configuration file.

  • Fallback recommendation

fallback_recommend sets the recommendation methods used when the recommendation results in the cache are exhausted. It is now an array.

# The fallback recommendation method for cold-start users:
# item_based: Recommend similar items to cold-start users.
# popular: Recommend popular items to cold-start users.
# latest: Recommend latest items to cold-start users.
# The default value is ["latest"].
fallback_recommend = ["item_based", "latest"]
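How an array of fallback methods might be consumed can be sketched as follows. It assumes the methods are tried in array order until enough items are collected; that ordering, the function name, and the sample sources are assumptions rather than documented Gorse behavior.

```python
def fill_with_fallbacks(cached, fallback_recommend, sources, n):
    """Top up cached recommendations using fallback methods in order.

    sources maps a method name (e.g. "item_based") to a function that
    returns a list of item IDs.
    """
    results = list(cached)
    seen = set(results)
    for method in fallback_recommend:
        if len(results) >= n:
            break
        for item_id in sources[method]():
            if item_id not in seen:
                seen.add(item_id)
                results.append(item_id)
                if len(results) >= n:
                    break
    return results[:n]


sources = {
    "item_based": lambda: ["a", "b"],
    "latest": lambda: ["c", "d", "e"],
}
filled = fill_with_fallbacks(["a"], ["item_based", "latest"], sources, n=4)
```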
  • Similarity type setting

The neighbor_type option from the previous version is replaced by item_neighbor_type and user_neighbor_type, so the item similarity type and the user similarity type can be set separately.

# The type of neighbors for items. There are three types:
# similar: Neighbors are found by number of common labels.
# related: Neighbors are found by number of common users.
# auto: If an item has labels, neighbors are found by number of common labels.
# If the item has no labels, neighbors are found by number of common users.
# The default value is "auto".
item_neighbor_type = "similar"

# The type of neighbors for users. There are three types:
# similar: Neighbors are found by number of common labels.
# related: Neighbors are found by number of common liked items.
# auto: If a user has labels, neighbors are found by number of common labels.
# If the user has no labels, neighbors are found by number of common liked items.
# The default value is "auto".
user_neighbor_type = "similar"
  • Recommendation algorithm switch

Each of the five recommendation algorithms used in multi-way recall can be turned on or off in the configuration file.

# Enable latest recommendation during offline recommendation. The default value is false.
enable_latest_recommend = true
# Enable popular recommendation during offline recommendation. The default value is false.
enable_popular_recommend = true
# Enable user-based similarity recommendation during offline recommendation. The default value is false.
enable_user_based_recommend = true
# Enable item-based similarity recommendation during offline recommendation. The default value is false.
enable_item_based_recommend = false
# Enable collaborative filtering recommendation during offline recommendation. The default value is true.
enable_collaborative_recommend = true