0. Basic concepts

  • Four statistical patterns

    • Aggregate statistics
    • Sorting statistics
    • Binary state statistics
    • Cardinality statistics
  • Choose the collection type that fits each statistical pattern

1. Aggregate Statistics (SET)

  • Basic concept: aggregating the elements of multiple sets into a single result

  • Common forms:

    • Counting the elements common to multiple sets (intersection statistics)

    • Counting the elements unique to one of the sets (difference statistics)

    • Counting all elements of multiple sets (union statistics)

  • A practical example

    • Count the number of new users and retained users of a mobile app every day

    • Implementation

      • Use one set to record the IDs of all users who have ever logged in to the app (all users)

        • Using the Set type

        • Key: user:id

        • Value: a Set of the IDs of all users who have ever logged in to the app

      • Use another set per day to record the IDs of the users who logged in that day (daily users)

        • Using the Set type

        • Key: user:id followed by the date

        • Value: a Set of the IDs of the users who logged in that day

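      • New users: use SDIFFSTORE to compute the difference between the day's login set and the all-users set (do this before updating the all-users set; afterwards the difference would always be empty)

      • All users: use SUNIONSTORE to store the union of the day's login set and the all-users set back into the all-users set

      • Retained users: use SINTERSTORE to compute the intersection of the login sets of different days (for example, yesterday and today)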

  • Sets are well suited to aggregate computations across multiple collections

  • However, set difference, union, and intersection have high computational complexity; running them directly on a large data volume can block the Redis instance. Two alternatives:

    • Assign a replica (slave) instance to handle the aggregate computations
    • Read the data to the client and perform the aggregation there
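The daily workflow above can be sketched with plain Python sets, which also illustrates the client-side option: once the sets have been read to the client (for example with SMEMBERS), the difference, union, and intersection run there instead of on the Redis instance. This is an in-memory illustration under those assumptions, not a Redis client; the function names and user IDs are hypothetical.

```python
# Minimal in-memory sketch of the daily new/retained-user computation.
# Python sets stand in for Redis Sets that were read to the client;
# each step names the Redis command it mirrors. Names and IDs are hypothetical.
from __future__ import annotations


def daily_update(known: set[str], today: set[str]) -> tuple[set[str], set[str]]:
    """Return (new_users, updated_all_users) for one day."""
    # SDIFFSTORE equivalent: today's logins minus all known users.
    # Must run before the union, otherwise the result is always empty.
    new_users = today - known
    # SUNIONSTORE equivalent: merge today's logins into the all-users set.
    return new_users, known | today


def retained_users(yesterday: set[str], today: set[str]) -> set[str]:
    # SINTERSTORE equivalent: users who logged in on both days.
    return yesterday & today


all_users = {"u1", "u2", "u3"}          # contents of user:id before day 1
day1 = {"u1", "u2", "u4"}               # logins on day 1
day2 = {"u2", "u4", "u5"}               # logins on day 2

new1, all_users = daily_update(all_users, day1)
new2, all_users = daily_update(all_users, day2)

print(sorted(new1))                        # ['u4']
print(sorted(new2))                        # ['u5']
print(sorted(all_users))                   # ['u1', 'u2', 'u3', 'u4', 'u5']
print(sorted(retained_users(day1, day2)))  # ['u2', 'u4']
```

For very large sets, reading them incrementally with SSCAN rather than a single SMEMBERS avoids one long-running command on the server.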


2. Sorting statistics (ZSET)

  • Basic concept: sorting the elements of a collection

  • Example: a list of the latest reviews for an item

  • Redis collection types: List, Hash, Set, Sorted Set

    • List and Sorted Set are ordered collections

    • A List is ordered by the order in which elements are inserted

    • A Sorted Set is ordered by each element's weight (score)

  • Using a List

    • The List holds all reviews for the item, stored in order of review time

    • Each time a new review arrives, LPUSH inserts it at the head of the List

    • Potential problem

      • Paging returns wrong results: pages are read by position (LRANGE), and every new review pushed onto the head shifts the existing elements back, so a review already shown on one page can appear again on the next

  • Using a Sorted Set

    • Assign each review a weight (score) based on its review time, then store the reviews in the Sorted Set

    • ZRANGEBYSCORE then returns exactly the reviews within a given score range, in sorted order

    • Pages are selected by score rather than by position, so new reviews do not shift them and paging stays correct

    • ZRANGEBYSCORE key min max [WITHSCORES] [LIMIT offset count]

  • When displaying latest lists, leaderboards, and similar rankings, use a Sorted Set if the data is updated frequently or needs to be paginated
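The paging problem and its fix can be reproduced in a few lines of Python. This in-memory sketch (not a Redis client) mimics LPUSH/LRANGE and ZADD/ZRANGEBYSCORE with hypothetical helper functions; it assumes each review's score is its sequence number (a timestamp would behave the same) and that the client remembers the newest score it saw when loading page 1.

```python
# In-memory sketch (not a Redis client) contrasting List paging with
# Sorted Set paging. Review IDs and scores are hypothetical.
from __future__ import annotations

PAGE_SIZE = 3

# --- List: LPUSH puts each new review at the head; pages are read by position.
reviews: list[str] = []

def lpush(value: str) -> None:
    reviews.insert(0, value)             # head of the List = index 0

def lrange(start: int, stop: int) -> list[str]:
    return reviews[start:stop + 1]       # LRANGE's stop index is inclusive

for n in range(1, 7):                    # r1 .. r6, r6 is the newest
    lpush(f"r{n}")

page1 = lrange(0, PAGE_SIZE - 1)                 # ['r6', 'r5', 'r4']
lpush("r7")                                      # a new review arrives between page loads
page2 = lrange(PAGE_SIZE, 2 * PAGE_SIZE - 1)     # ['r4', 'r3', 'r2']: r4 shown twice

# --- Sorted Set: score = review sequence number; pages are read by score range.
zset: dict[str, float] = {}

def zadd(score: float, member: str) -> None:
    zset[member] = score

def zrangebyscore(min_score: float, max_score: float) -> list[str]:
    # Members with min_score <= score <= max_score, ascending by score
    # (ties broken by member name, as Redis does).
    ordered = sorted(zset.items(), key=lambda kv: (kv[1], kv[0]))
    return [member for member, score in ordered if min_score <= score <= max_score]

for n in range(1, 7):
    zadd(n, f"r{n}")

newest = 6                                                     # remembered at page 1
z_page1 = zrangebyscore(newest - PAGE_SIZE + 1, newest)        # ['r4', 'r5', 'r6']
zadd(7, "r7")                                                  # the same late arrival
z_page2 = zrangebyscore(newest - 2 * PAGE_SIZE + 1, newest - PAGE_SIZE)  # ['r1', 'r2', 'r3']
```

ZRANGEBYSCORE returns members in ascending score order; for a newest-first display, reverse each page or use ZREVRANGEBYSCORE.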

3. Binary state statistics (Bitmap)

  • Basic concept: a binary state is one that can take only two values, 0 and 1 (for example, checked in or not)

  • Example:

    • Each user's daily check-in can be represented by 1 bit, so a month of check-ins (assuming 31 days) needs 31 bits, and a full year needs only 365 bits

  • Common operations

    • A Bitmap is a data structure for binary state statistics built on the String type: Redis treats the String value as an array of bits

    • GETBIT/SETBIT read or write a single bit of the array at a given offset

    • BITCOUNT counts the number of bits set to 1 in the array

    • BITOP performs AND, OR, and XOR operations across multiple Bitmaps and saves the result to a new Bitmap

  • Advantage: a very small storage footprint, even for large data sets
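The check-in example and the commands above can be modeled with a small Python class over a `bytearray`. This is a sketch of the semantics only, not a Redis client: `Bitmap` and `bitop_and` are hypothetical names whose methods mirror SETBIT, GETBIT, BITCOUNT, and BITOP AND, and the sketch follows Redis in numbering bits from the most significant bit of the first byte. At 1 bit per day, a year of check-ins for one user fits in ceil(365 / 8) = 46 bytes.

```python
# In-memory sketch of Redis Bitmap semantics on a bytearray (not a Redis client).
# As in Redis, offset 0 is the most significant bit of the first byte.

class Bitmap:
    def __init__(self) -> None:
        self.data = bytearray()

    def setbit(self, offset: int, value: int) -> int:
        """SETBIT-like: set the bit at offset, growing as needed; return the old bit."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.data):
            self.data.extend(b"\x00" * (byte_index + 1 - len(self.data)))
        mask = 0x80 >> bit_index
        old = 1 if self.data[byte_index] & mask else 0
        if value:
            self.data[byte_index] |= mask
        else:
            self.data[byte_index] &= ~mask & 0xFF
        return old

    def getbit(self, offset: int) -> int:
        """GETBIT-like: bits beyond the end of the value read as 0."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.data):
            return 0
        return 1 if self.data[byte_index] & (0x80 >> bit_index) else 0

    def bitcount(self) -> int:
        """BITCOUNT-like: number of bits set to 1."""
        return sum(bin(b).count("1") for b in self.data)


def bitop_and(*bitmaps: Bitmap) -> Bitmap:
    """BITOP AND-like: shorter inputs are treated as zero-padded."""
    result = Bitmap()
    length = max(len(b.data) for b in bitmaps)
    result.data = bytearray(length)
    for i in range(length):
        acc = 0xFF
        for b in bitmaps:
            acc &= b.data[i] if i < len(b.data) else 0
        result.data[i] = acc
    return result


# Pattern 1: one Bitmap per user per month; offset = day of month - 1.
august = Bitmap()
for day in (1, 2, 3, 10, 31):
    august.setbit(day - 1, 1)
print(august.getbit(2), august.getbit(4))   # 1 0 (checked in on day 3, not on day 5)
print(august.bitcount())                    # 5 check-ins
print(len(august.data))                     # 4 bytes hold all 31 days

# Pattern 2: one Bitmap per day; offset = user ID.
# AND-ing the daily Bitmaps keeps only users who checked in every day.
daily_checkins = [{1, 2, 5, 9}, {1, 5, 9, 12}, {0, 1, 9}]
days = []
for users in daily_checkins:
    bm = Bitmap()
    for uid in users:
        bm.setbit(uid, 1)
    days.append(bm)
all_days = bitop_and(*days)
print(all_days.bitcount())                  # 2 (users 1 and 9)
```

The two patterns answer different questions: a per-user Bitmap records one user's history, while per-day Bitmaps combined with BITOP answer questions across all users, such as how many users checked in on every day of a period.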


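4. Cardinality statistics

  • Basic concept: cardinality statistics counts the number of distinct (non-repeating) elements in a set, for example the number of unique visitors (UV) to a page

  • Using a Set: SADD each visitor's user ID, then get the total with SCARD

  • Using a Hash: record each visitor with HSET page1:uv user1 1, then get the total with HLEN

  • Both approaches give exact counts, but their memory grows with the number of distinct visitors, which becomes expensive when many pages each have many visitors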

  • Using HyperLogLog

    • HyperLogLog is a data type for computing cardinality. Its key advantage is that the space it needs stays fixed and small even when the set contains a very large number of elements: each HyperLogLog key in Redis uses at most about 12 KB

    • Each user who visits a page can be added to the HyperLogLog with the PFADD command, which adds new elements to a HyperLogLog

    • PFCOUNT then returns the UV of page1 directly, i.e. the HyperLogLog's estimated cardinality

    • Note that HyperLogLog counting is probabilistic, so its results carry some error; the standard error is 0.81%

    • It is therefore not suitable when exact counts are required
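To show why memory stays fixed and where the 0.81% comes from, here is a simplified HyperLogLog in Python. It is the textbook algorithm, not Redis's code: it assumes SHA-1 truncated to 64 bits as the hash (Redis uses MurmurHash64A), keeps registers only in dense form, and omits Redis's bias corrections. It does use Redis's register count, 2^14 = 16384, for which the standard error is 1.04 / sqrt(16384) = 1.04 / 128 ≈ 0.81%; at 6 bits per register that is 12 KB. `HyperLogLog`, `add`, and `count` are hypothetical names.

```python
# Simplified HyperLogLog (textbook algorithm, NOT Redis's own implementation).
# Assumptions: SHA-1 truncated to 64 bits as the hash (Redis uses MurmurHash64A),
# dense registers only, no sparse encoding and no empirical bias correction.
import hashlib
import math

P = 14                                  # index bits
M = 1 << P                              # 16384 registers, the count Redis uses
ALPHA = 0.7213 / (1 + 1.079 / M)        # bias constant for m >= 128
STANDARD_ERROR = 1.04 / math.sqrt(M)    # 1.04 / 128 = 0.008125, i.e. about 0.81%


class HyperLogLog:
    def __init__(self) -> None:
        # Fixed size no matter how many items are added. Redis packs these
        # registers at 6 bits each; a Python list is much larger.
        self.registers = [0] * M

    def add(self, item: str) -> None:
        """PFADD-like: record item in the register its hash selects."""
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        index = h >> (64 - P)                  # top P bits pick the register
        rest = h & ((1 << (64 - P)) - 1)       # remaining 50 bits
        # Position of the leftmost 1-bit in `rest` (1-based); 51 if rest == 0.
        rank = (64 - P) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self) -> int:
        """PFCOUNT-like: harmonic-mean estimate with small-range correction."""
        estimate = ALPHA * M * M / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * M and zeros:
            estimate = M * math.log(M / zeros)  # linear counting for small sets
        return round(estimate)


# Usage: 200,000 distinct visitors, the first 50,000 of whom visit twice.
hll = HyperLogLog()
exact = set()                           # the exact SADD/SCARD approach, for comparison
for i in list(range(200_000)) + list(range(50_000)):
    visitor = f"user{i}"
    hll.add(visitor)
    exact.add(visitor)

estimate = hll.count()
print(len(exact), estimate)             # estimate is close to, but not exactly, 200000
```

The `exact` set plays the role of the SADD/SCARD approach: it always gives the precise count, but it grows with every distinct visitor, whereas the register array never grows.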

5. Summary

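  • Statistical patterns and the types that suit them

    • Aggregate statistics: Set

    • Sorting statistics: Sorted Set

    • Binary state statistics: Bitmap

    • Cardinality statistics: HyperLogLog (Set or Hash when exact counts are required)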
