parameter

The database

Parameters are defined

  1. Eu (I) : utility value of element I
  2. Iu (I, T) : the number of elements I in the transaction set T
  3. U (I, T) : total utility value of all elements I in transaction set T, u(I, T) = IU (I, T) × eu(I)
  4. U (X, T) : the total utility value of set X in transaction set T
  5. U (X) : the total utility value of all sets X in the database DB
  6. Tu (T) : The total utility value of transaction set T
  7. Twu (X) : The total utility value of all transaction sets containing set X in the database DB
  8. T/X: T/X represents the set of all the elements in set T that come after set X!
  9. Ru (X, T) : sum of utility values of T/X in set T
  10. I-term extension: if the subtree node of the k-term set has (k+ I) terms, it is called i-term extension of the k-term set.

Utility list parameter

The utility list of set X contains three parameters:

  • Tid: id of the transaction set containing set X
  • Iutil: u (X, T)
  • Rutil: ru (X, T)

operation

Utility list initialization operation

Each set of items has a list of utilities

  1. Find the TWU(I) of all items. If the TWU of item I falls below the threshold given, then any item set containing I is no longer considered.

Utility list generation operation

By comparing tiDs in two utility lists, the algorithm constructs multiple utility lists through the intersection of TIDs. (Below is the comparison between {e} and {c})

Construction of binary utility lists

  1. Extract the tid intersection of the two itemsets as the TIDS of generating itemsets
  2. The iUtils corresponding to the TIDS of each generated utility list are the sum of the iUtils corresponding to the TIDs of the two item sets
  3. For the rutils corresponding to TIDS of generated utility list, take the ruTils corresponding to item set with lower order (larger TWU)
  4. Example of constructing a utility list {eb} : {e} utility list is {< 2, 4, 14 >, < 4, 4, 2 >, < 5, 8, 14 >}, {b} utility list is {< 1, 2, 5 >, < 2, 2, 9 >, < 5, 4, 10 >, < 6, 8, 3 >}, The utility list of {eb} is {<2, 6, 9>, <5, 12, 10>}

Construction of multivariate utility lists

  1. When we construct a utility list of k-item set {i1 · · · I (k−1) IK}, we can think of it as a combination of two k-1 item sets {i1 · · · I (k−2) I (k−1)} and {i1 · · · I (k−2) IK}.
  2. The iutils calculation method of the generated utility list is as follows: U ({i1 · · · I (2) k – I (k – 1) ik}, T) = u ({i1… I (2) k – I (k – 1)}, T) + u ({i1… I (k – 2) ik}, T), u ({i1… I (k – 2)}, T), That is, we need to subtract u(X, T) of the intersection of the two constituent sets (let’s call it X).

Pseudocode representation

HUI-Miner

The search space

The search space of the Hui-Miner algorithm can be represented as an enumeration tree of itemsets with nodes arranged in TWU order.

  1. The root node is an empty set.
  2. The node of the first layer is unary item set, which is arranged in TWU order.
  3. Starting from the second layer, only elements whose TWU value is greater than that of the last element of the parent node are added to the end.
  4. Repeat step 3 until construction is complete.

Pruning strategy

  1. If the sum of all iUtils in the utility list corresponding to the item set is greater than the threshold, the item set is efficient.
  2. If the sum of all iUtils and Rutils in the corresponding utility list of the itemset is greater than the threshold, the itemset needs to be further determined.
  3. If the sum of all iutils and Rutils in the utility list corresponding to the itemset is less than the threshold, then the itemset and all its extensions are inefficient and pruned.

MUI – Miner algorithm

Implementation details

  1. For efficiency, avoid scanning utility lists. The sum of its iutils and rutils is calculated and saved in the utility list when the utility list is constructed.
  2. There is also no need to bind the itemset to the corresponding list of utilities, because all children of a node in the set enumeration tree represent itemsets with the same prefix itemset.
  3. The first line in the utility list stores the extensions and the sum of iutils and rutils, while the prefix set is stored separately.