Release Notes: Gorse v0.2.6

For a recommender system, the amount of data to process keeps growing. To better handle scenarios with somewhat larger datasets, v0.2.6 reduces memory and CPU usage.

New features

This update focuses solely on performance optimizations and does not include any new features.

Performance optimization

  • Worker nodes no longer repeatedly read item information from the database

This reduces the time GitRec needs to generate offline recommendations for all users from 6 minutes 11 seconds to 3 minutes 29 seconds.

  • Skip cold items and cold users when training the CCD algorithm

This reduces GitRec's model parameter search time from 10 minutes 23 seconds to 2 minutes 41 seconds.

  • Incremental updates are implemented for user and item neighbors

The system records the timestamp of the most recent event for each user and item. If the last neighbor update is earlier than this timestamp, a neighbor search and update is triggered; otherwise the existing neighbors are kept.
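The staleness check described above can be sketched in a few lines of Go. This is only an illustration, not Gorse's actual code: the function names `needsNeighborUpdate` and `staleIDs`, and the use of Unix timestamps in seconds, are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// needsNeighborUpdate reports whether neighbors should be recomputed.
// Both arguments are Unix timestamps in seconds (an assumption of this sketch).
func needsNeighborUpdate(lastEvent, lastNeighborUpdate int64) bool {
	return lastNeighborUpdate < lastEvent
}

// staleIDs returns, in sorted order, the IDs whose neighbors are out of date.
func staleIDs(lastEvent, lastUpdate map[string]int64) []string {
	var stale []string
	for id, ev := range lastEvent {
		// A missing entry reads as 0, so an ID that was never updated is stale.
		if needsNeighborUpdate(ev, lastUpdate[id]) {
			stale = append(stale, id)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	lastEvent := map[string]int64{"a": 300, "b": 100, "c": 200}
	lastUpdate := map[string]int64{"a": 200, "b": 200}
	fmt.Println(staleIDs(lastEvent, lastUpdate)) // [a c]
}
```

Only "a" (new event after its last update) and "c" (never updated) are recomputed, while "b" keeps its existing neighbors.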

  • Streaming reads improve data loading efficiency

In the old version, all data in the database was read page by page. This requires repeatedly establishing a connection to the database and repeatedly locating the starting position of each scan. The new version removes the I/O bottleneck on the master node by fetching the data from the database once, as a stream.
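To make the difference concrete, here is a small Go simulation of the two access patterns. It is a sketch under stated assumptions and does not talk to a real database: `table` is a hypothetical in-memory stand-in, `page` imitates an offset-based query that must skip earlier rows on every call, and `stream` imitates a single cursor read once from start to end (in Go's `database/sql`, that would be a loop over `rows.Next()`).

```go
package main

import "fmt"

// table is a hypothetical in-memory stand-in for a database table.
type table struct {
	rows    []string
	scanned int // rows touched, including rows skipped to reach an offset
	queries int // number of round trips issued
}

// page imitates an offset-based query: every call is a new round trip
// that walks past the first off rows before returning up to n rows.
func (t *table) page(off, n int) []string {
	t.queries++
	end := off + n
	if end > len(t.rows) {
		end = len(t.rows)
	}
	if off > end {
		off = end
	}
	t.scanned += end
	return t.rows[off:end]
}

// stream imitates a single cursor: one round trip, one pass over the rows.
func (t *table) stream() <-chan string {
	t.queries++
	out := make(chan string)
	go func() {
		defer close(out)
		for _, r := range t.rows {
			t.scanned++
			out <- r
		}
	}()
	return out
}

// readPaged loads everything page by page until a short page is returned.
func readPaged(t *table, size int) []string {
	var all []string
	for off := 0; ; off += size {
		p := t.page(off, size)
		all = append(all, p...)
		if len(p) < size {
			return all
		}
	}
}

// readStream loads everything from a single stream.
func readStream(t *table) []string {
	var all []string
	for r := range t.stream() {
		all = append(all, r)
	}
	return all
}

func main() {
	rows := []string{"a", "b", "c", "d", "e", "f", "g", "h", "i", "j"}
	paged, streamed := &table{rows: rows}, &table{rows: rows}
	readPaged(paged, 3)
	readStream(streamed)
	fmt.Printf("paged:  %d queries, %d rows scanned\n", paged.queries, paged.scanned)
	fmt.Printf("stream: %d queries, %d rows scanned\n", streamed.queries, streamed.scanned)
}
```

With 10 rows and a page size of 3, the paged reader issues 4 queries and touches 28 rows, while the stream issues 1 query and touches 10. The gap widens as the table grows, which is the effect the release note describes.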

  • Optimized memory usage by modifying data structures

First, a 32-bit integer type is now used to represent user and item IDs in the recommender system. A signed 32-bit integer can represent about 2 billion users or items, which is enough for most scenarios. In addition, some redundant data structures were eliminated by trading time for space.
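The saving from 32-bit IDs can be illustrated with a short Go sketch. The `index` type, which maps external string IDs to dense `int32` values, and the `pair32`/`pair64` structs are hypothetical names for this example; how Gorse actually lays out its data is not described here.

```go
package main

import (
	"fmt"
	"math"
	"unsafe"
)

// index maps external string IDs to dense int32 indices (hypothetical type).
type index struct {
	ids   map[string]int32
	names []string
}

func newIndex() *index { return &index{ids: make(map[string]int32)} }

// add returns the int32 index for name, assigning the next free one if new.
func (ix *index) add(name string) int32 {
	if i, ok := ix.ids[name]; ok {
		return i
	}
	if len(ix.names) >= math.MaxInt32 {
		panic("index full: more IDs than an int32 can represent")
	}
	i := int32(len(ix.names))
	ix.ids[name] = i
	ix.names = append(ix.names, name)
	return i
}

// pair32 and pair64 hold one (user, item) feedback record each.
type pair32 struct{ user, item int32 }
type pair64 struct{ user, item int64 }

func main() {
	users := newIndex()
	a := users.add("alice")
	b := users.add("bob")
	fmt.Println(a, b, users.add("alice"))                         // 0 1 0
	fmt.Println(unsafe.Sizeof(pair32{}), unsafe.Sizeof(pair64{})) // 8 16
	fmt.Println(math.MaxInt32)                                    // 2147483647
}
```

Each feedback record shrinks from 16 to 8 bytes, and the largest representable index, 2,147,483,647, is the "2 billion" limit mentioned above.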

Bug fixes

  • Fixed an issue where the console service started only after data had been read

In previous versions, you had to wait until data reading was complete before you could access the console. This is fixed in v0.2.6. In addition, the console now displays the real-time progress of data reading.

  • Fixed an issue where master nodes could not distribute large models

The new version increases the size limit of gRPC messages, but the encoding process still consumes too much memory. The next version will use streaming to solve this problem.

  • Fixed an issue where the console could not be accessed after setting the API key

The new console API does not require a key to access.

  • Fixed garbled Chinese characters when importing data (contributed by @amaaazing)
  • Fixed MySQL 8.0 syntax that was incompatible with earlier versions (contributed by @hetao29)
  • Fixed missing return information in the Swagger API documentation (contributed by @ccfish86)

Contributions welcome

The development of Gorse depends on your feedback and contributions; a few people cannot do it alone. In the issue list of the GitHub repository, some simple feature requests have been labeled “need help”. These are relatively simple to implement but have an important impact on the performance and usability of the system. Code contributions are welcome.