“This is day 1 of my participation in the November Gwen Challenge. Event details: The Last Gwen Challenge of 2021.”

Background

  1. Batch data is fetched from the user-likes table, assembled with array_merge(), and then bulk-inserted into the data-analysis table.
  2. The data volume was small during testing, so there was no problem; as the service grew, a single query window could return more than 30,000 rows.
  3. Processing those 30,000 rows caused a memory overflow on a single 8-core, 32 GB machine.

Approach to the problem

  1. The original design does not analyze the full data set; it only takes users with like actions from the last 7 days.
  2. Minimize DB operations and leave the calculation and concatenation of the data to the program.
  3. However, this does not take into account that the program's computation is also capped (by memory), so a balance has to be struck; the measurement sketch after this list shows why.
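To make that cap concrete, here is a minimal sketch (not from the original post) that simulates accumulating roughly 30,000 rows with array_merge() and prints how PHP memory grows; the row shape ('userid', 'otherUserid', 'createtime') simply mirrors the likes table described above.

<?php
// Minimal sketch, not the original code: watch memory grow as rows
// accumulate through array_merge(), to see where the program's cap bites.
$users = [];
$batch = array_fill(0, 1000, ['userid' => 1, 'otherUserid' => 2, 'createtime' => time()]);

for ($i = 0; $i < 30; $i++) { // roughly 30,000 rows in total
    $users = array_merge($users, $batch);
    printf(
        "rows: %d  memory: %.1f MB  peak: %.1f MB\n",
        count($users),
        memory_get_usage() / 1048576,
        memory_get_peak_usage() / 1048576
    );
}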

The optimized idea is as follows:

Read in batches and insert in batches, sleeping 10 milliseconds after each insert.

The core code is as follows:

public static function Configuration($begin, $end)
{
    $limit  = 1000;
    $offset = 0;
    $users  = [];
    do {
        // Users who initiated a like within the time window
        $sponsorUserIds = self::query()
            ->selectRaw('userid,createtime')
            ->distinct()
            ->whereBetween('createtime', [$begin, $end])
            ->orderBy('createtime')
            ->offset($offset)
            ->limit($limit)
            ->get()
            ->toArray();
        // Users who received a like within the time window
        $beLikedUserIds = self::query()
            ->selectRaw('"otherUserid",createtime')
            ->distinct()
            ->whereBetween('createtime', [$begin, $end])
            ->orderBy('createtime')
            ->offset($offset)
            ->limit($limit)
            ->get()
            ->toArray();

        $sponsorUserIds = array_column($sponsorUserIds, 'userid');
        $beLikedUserIds = array_column($beLikedUserIds, 'otherUserid');
        // At most 1,000 + 1,000 = 2,000 ids are merged per iteration
        $likesUserIds = array_unique(array_merge($sponsorUserIds, $beLikedUserIds));
        $userIds = array_map(function ($value) {
            return ['userid' => $value];
        }, $likesUserIds);

        UserActionRecord::recordBatch($userIds);

        echo "Like users needed by the recommendation algorithm\n";
        echo 'arrayCount:' . count($userIds) . "\n";
        $offset = $offset + $limit;
        echo 'offset:' . $offset . "\n";
        usleep(10000); // usleep() takes microseconds: 10,000 µs = 10 ms
    } while ($userIds);

    return $users;
}
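UserActionRecord::recordBatch() itself is not shown in the post. Assuming a Laravel Eloquent model, a plausible sketch (an assumption, not the author's code) writes the rows in fixed-size chunks so that no single INSERT statement gets too large:

<?php

use Illuminate\Database\Eloquent\Model;

class UserActionRecord extends Model
{
    /**
     * Hypothetical sketch of recordBatch() (the post does not show it):
     * insert the rows in chunks of 500 so each multi-row INSERT stays small.
     */
    public static function recordBatch(array $rows): void
    {
        foreach (array_chunk($rows, 500) as $chunk) {
            static::query()->insert($chunk);
        }
    }
}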

The idea before optimization is as follows:

Read from the DB in batches, concatenate all of the rows with array_merge(), and then bulk-insert everything into the database with a single SQL statement.

The core code is as follows:

// Defined on UserRelationSingle (the name is taken from the call below)
public static function likeBetweenDuration($begin, $end, $select = 'userid,"otherUserid"')
{
    $limit  = 50;
    $offset = 0;
    $users  = [];
    do {
        $thisUsers = self::query()
            ->selectRaw($select)
            ->whereBetween('createtime', [$begin, $end])
            ->orderBy('createtime')
            ->offset($offset)
            ->limit($limit)
            ->get()
            ->toArray();
        // Every page is accumulated in memory -- this is what eventually overflows
        $users  = array_merge($users, $thisUsers);
        $offset = $offset + $limit;
    } while ($thisUsers);

    return $users;
}

// Fetch all the data, deduplicate it, then insert it into the database
$likes = UserRelationSingle::likeBetweenDuration(
    Utility::recommendCalcTimestamp(),
    Utility::recommendCalcEndTimestamp()
);
$sponsorUserIds = array_column($likes, 'userid');
$beLikedUserIds = array_column($likes, 'otherUserid');
$likesUserIds   = array_unique(array_merge($sponsorUserIds, $beLikedUserIds));
$userIds = array_map(function ($value) {
    return ['userid' => $value];
}, $likesUserIds);
UserActionRecord::recordBatch($userIds);
echo 'UserActionRecord: ' . json_encode($userIds) . "\n";
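As a side note that is not part of the original post: on Laravel, the query builder's chunk() method pages through the result set internally, which avoids both the manual offset bookkeeping and the unbounded array_merge() accumulation. A rough sketch, assuming the same model, columns, and $begin/$end window as above:

// Sketch only, not the author's code: chunk() runs the paginated queries
// for you and hands each page to the callback, so nothing piles up in memory.
UserRelationSingle::query()
    ->selectRaw('userid,"otherUserid",createtime')
    ->whereBetween('createtime', [$begin, $end])
    ->orderBy('createtime')
    ->chunk(1000, function ($rows) {
        $ids = array_unique(array_merge(
            $rows->pluck('userid')->all(),
            $rows->pluck('otherUserid')->all()
        ));
        $userIds = array_map(function ($value) {
            return ['userid' => $value];
        }, array_values($ids));
        UserActionRecord::recordBatch($userIds);
        usleep(10000); // pause 10 ms between batches
    });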

Conclusion

  1. Before optimization, the array accumulated by array_merge() is bound to run out of memory as the data grows, whereas the optimized code is not.

  2. With the optimized approach, array_merge() handles at most 2,000 rows per iteration (up to 1,000 sponsor ids plus 1,000 liked-user ids).

Comparison of ideas:

  1. Before optimization, the idea was to issue as little SQL as possible and reduce DB operations, pushing the pressure onto the program (PHP functions) to handle, while ignoring the memory problem.

  2. After optimization, the idea strikes a better balance between DB operations and the program: the reads are still done with paginated SQL as before, while the previous single bulk write becomes multiple smaller writes. This avoids the memory problem, and sleeping 10 milliseconds after each DB insert reduces pressure on the DB (see the sketch below).
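The balance described in point 2 can be captured in a small hypothetical helper (not from the original post): write in fixed-size chunks and pause briefly between writes. Note that usleep() takes microseconds, so a 10 ms pause is usleep(10000).

// Hypothetical helper sketch: chunked writes with a short pause between them.
function throttledBatchInsert(array $rows, int $chunkSize = 1000, int $pauseMs = 10): void
{
    foreach (array_chunk($rows, $chunkSize) as $chunk) {
        UserActionRecord::recordBatch($chunk); // one small write per chunk
        usleep($pauseMs * 1000);               // usleep() expects microseconds
    }
}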

Join the discussion

If you have any good ideas, feel free to share them in the comments section.