This article was first published on the public account of Jizhi Club.
Introduction:
The statistical laws of “information avalanches” on social media have, at least in existing empirical studies, not proved robust across systems. As a consequence, radically different processes may all be plausible driving mechanisms for information propagation. Recently, a paper published in Nature Communications analyzed nearly one billion time-stamped events from Twitter, Telegram, Weibo and other social platforms over a time window of about 10 years, demonstrating the universality and criticality of information propagation on social media. Universality is reflected in the fact that the same macroscopic patterns are observed across different systems, regardless of the details of each specific system. Criticality is inferred from the power-law distributions of avalanche duration and size and from the corresponding hyperscaling relations. Statistical tests on the data indicate that information propagation on social media is a mixture of simple and complex contagion, and suggest that the complexity of the process is correlated with the semantic content of the information being propagated.
Research areas: social media, information propagation, avalanches, criticality, percolation model, phase transition
Paper title: Universality, criticality and complexity of information propagation in social media
Paper links: www.nature.com/articles/s4…
1. Avalanches on social platforms
Social media has dramatically changed the way people produce, receive and digest information. There is growing evidence that online communication is changing society like never before. For example, public discussion about COVID-19 has been accompanied by what has been called an “information pandemic” that has influenced attitudes towards vaccination. Similarly, conversations on a Wall Street-focused Reddit channel led many people to buy GameStop shares in protest against short selling by hedge funds and professional investors, eventually increasing the company’s market value by more than $2.2 billion in just a few days. Events like these have made scientists very interested in the mechanisms behind information propagation.
The spread of information on social media is, at least qualitatively, very similar to some natural phenomena, such as neuronal firing and earthquakes. These processes are characterized by bursty activity patterns: activity consists of point-like events in time, and a burst of activity (or avalanche) is defined as a sequence of closely spaced events, with successive bursts separated by long periods of low activity.
Avalanche activity can be described on a macroscopic scale by the distributions P(S) and P(T) of avalanche size S and duration T. For many real-world systems, P(S) and P(T) decay as power laws, which is considered evidence that the system is at or near a critical state. In addition, at the critical point, the average size of avalanches of a given duration must itself scale as a power of that duration, with an exponent tied to the other two by a hyperscaling relation. The exponents take different values for different systems.
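The scaling statements above can be written compactly. The symbols $\tau$, $\alpha$ and $\gamma$ are the conventional names for these exponents and are an assumption here, since the article itself does not name them.

```latex
P(S) \sim S^{-\tau}, \qquad
P(T) \sim T^{-\alpha}, \qquad
\langle S \rangle (T) \sim T^{\gamma}, \qquad
\gamma = \frac{\alpha - 1}{\tau - 1}.
```

The last equality follows from requiring $P(S)\,dS = P(T)\,dT$ when $S \sim T^{\gamma}$. As a sanity check with the textbook mean-field values of a critical branching process, $\tau = 3/2$ and $\alpha = 2$, it gives $\gamma = 2$. These numbers are illustrative only and are not the exponents fitted in the paper.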
In social media, an avalanche usually refers to a large amount of public discussion about a topic within a short period of time. Existing studies of avalanches in social media are limited to small-scale data sets. Although these works find that the distributions of avalanche size and duration decay as power laws, the exponents they obtain differ widely. In addition, empirical studies have not found a power-law relationship between size and duration. These discrepancies can be attributed to the different definitions of an avalanche used in different works and to the influence of the temporal resolution on the avalanche distributions.
The paper collected data covering more than 10 years from Twitter, Telegram, Weibo, Parler, StackOverflow, and Delicious, comprising more than 200 million time series, each consisting of events that share the same topic, and more than 900 million events in total. On this basis, the team established how to identify avalanches consistently across data sets and obtained cross-platform universal laws.
2. Problem definition and existing communication models
A more precise definition of an avalanche is as follows. Given a time series $\{t_1, t_2, \dots\}$, an avalanche is a sequence of events $\{t_b, t_{b+1}, \dots, t_{b+s-1}\}$ starting at $t_b$ such that $t_b - t_{b-1} > \Delta$, $t_{b+s} - t_{b+s-1} > \Delta$, and $t_{b+i} - t_{b+i-1} \le \Delta$ for all $i = 1, \dots, s-1$. Here $\Delta$ is a time-resolution hyperparameter: the same time series is broken up into different avalanches depending on the value of $\Delta$. $S = s$ is the avalanche size, that is, the number of events in the avalanche, and $T$ is the avalanche duration, $T = t_{b+s-1} - t_b$.
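The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's code: the function name `find_avalanches` is hypothetical, the event times are assumed to be sorted, and the start and end of the series are treated as if they were gaps longer than the resolution.

```python
from typing import List, Tuple


def find_avalanches(times: List[float], delta: float) -> List[Tuple[int, float]]:
    """Split sorted event times into avalanches at resolution delta.

    Returns one (size, duration) pair per avalanche. Assumption: the
    boundaries of the series count as gaps longer than delta.
    """
    if not times:
        return []
    avalanches = []
    start = 0  # index of the first event of the current avalanche
    for i in range(1, len(times)):
        # A gap longer than delta closes the current avalanche.
        if times[i] - times[i - 1] > delta:
            avalanches.append((i - start, times[i - 1] - times[start]))
            start = i
    # Close the last avalanche at the end of the series.
    avalanches.append((len(times) - start, times[-1] - times[start]))
    return avalanches


events = [0.0, 1.0, 2.0, 10.0, 11.0, 30.0]
print(find_avalanches(events, delta=2.0))   # [(3, 2.0), (2, 1.0), (1, 0.0)]
print(find_avalanches(events, delta=10.0))  # [(5, 11.0), (1, 0.0)]
```

The two calls show how the same series is cut into different avalanches as the resolution changes, which is why choosing the resolution matters for the measured distributions.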
In addition, the percolation strength $P_\infty$ and the corresponding susceptibility $\chi$ can be defined in terms of $S_M$, the maximum avalanche size in each sequence, where $\langle S_M \rangle$ and $\langle S_M^2 \rangle$ are the first and second moments of $S_M$, respectively.
By analogy with disease transmission, information spreading is widely regarded as a process of simple contagion, in which a single contact is enough to activate a node, and this mechanism is taken to be sufficient to describe the whole process. However, a considerable number of studies support the paradigm of complex contagion. As first proposed by Centola and Macy, in a complex contagion process an individual takes part in spreading information only after contact with multiple acquaintances [1]. Complex contagion is captured by models such as the linear threshold model and the random-field Ising model (RFIM) [2].
3. Research on the characteristics of information transmission
3.1 Universality
In this paper, the optimal resolution $\Delta^*$ is defined as the phase transition point of a one-dimensional percolation model describing the time series, with each time series in the data set treated as one instance of the model. The size of the largest avalanche in each time series is measured, and from it the percolation strength and the corresponding susceptibility are computed. The optimal resolution is the value of $\Delta$ that maximizes the susceptibility, so that an optimal resolution can ultimately be calculated for each time series, i.e., $\Delta^* = \arg\max_\Delta \chi(\Delta)$.
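The search for the optimal resolution can be sketched as a scan over candidate values of the resolution. The article does not reproduce the exact formulas, so the code assumes a convention common in percolation studies: the largest avalanche is measured as a fraction of the events in its series, the percolation strength is the mean of that fraction over an ensemble of series, and the susceptibility is its variance divided by its mean. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def largest_avalanche_fraction(times: Sequence[float], delta: float) -> float:
    """Fraction of a series' events that fall in its largest avalanche."""
    best = current = 1
    for prev, nxt in zip(times, times[1:]):
        # Extend the current avalanche, or start a new one after a long gap.
        current = current + 1 if nxt - prev <= delta else 1
        best = max(best, current)
    return best / len(times)


def strength_and_susceptibility(
    series: List[Sequence[float]], delta: float
) -> Tuple[float, float]:
    """Assumed definitions: strength = <S_M>, chi = (<S_M^2> - <S_M>^2) / <S_M>."""
    fractions = [largest_avalanche_fraction(ts, delta) for ts in series]
    m1 = sum(fractions) / len(fractions)
    m2 = sum(f * f for f in fractions) / len(fractions)
    return m1, (m2 - m1 * m1) / m1


def optimal_resolution(series: List[Sequence[float]], deltas: Sequence[float]) -> float:
    """Candidate resolution with the largest susceptibility."""
    return max(deltas, key=lambda d: strength_and_susceptibility(series, d)[1])


# Toy ensemble of three non-empty, sorted series (made-up numbers).
toy_series = [[0, 1, 2, 3], [0, 5, 10, 15], [0, 1, 6, 7]]
candidates = [0.5, 1.0, 5.0, 20.0]
for d in candidates:
    print(d, strength_and_susceptibility(toy_series, d))
print("optimal resolution:", optimal_resolution(toy_series, candidates))
```

In this toy ensemble the susceptibility vanishes when the resolution is so small that every event is isolated or so large that every series forms a single avalanche, and it peaks in between, which is the behavior the phase-transition picture relies on.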
Across the different data sets, almost the same quantitative phase transition behavior appears once the resolution is normalized by the corresponding optimal resolution (as shown in FIG. 1). This shows that the dissemination of information on social media can be regarded as a universal process.
FIG. 1 shows the percolation strength (a) and the corresponding susceptibility (b) as functions of the time resolution. Different colors represent different social media platforms. The abscissa is normalized by the optimal resolution.
Furthermore, the optimal resolution was used to calculate the avalanche size S and duration T and to characterize their distributions (FIG. 2a, 2b). Different data sets showed consistent behavior, and the power-law relationship between duration T and average size S was verified (FIG. 2c), again supporting the existence of such a universal process.
Figure 2. (a) Distribution of avalanche size. (b) Distribution of avalanche duration. (c) Relationship between avalanche duration and size. (d) Comparison of fitting parameters and model simulation parameters of different platforms
3.2 Criticality
The power-law distributions in FIG. 2 indicate that the information propagation process is critical, and this criticality can be characterized by fitting the critical exponents. The propagation process was simulated numerically using the mean-field RFIM and the branching process (BP) [3], respectively. The mean-field random-field Ising model is a many-to-many, complex propagation process, whereas the branching process is a one-to-many, simple propagation process. The fitting results on the full data set are shown in FIG. 2a-2c, and the fitting results for each platform in FIG. 2d. The results show that the critical exponents of different platforms are consistent, and that the RFIM reproduces the actual data more closely.
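To make the one-to-many picture concrete, here is a minimal branching-process simulation. It is a sketch and not the paper's simulation code: the offspring rule (each active event triggers up to two new events, each with probability equal to half the branching ratio), the size cap, and the function name are all assumptions chosen for simplicity.

```python
import random
from typing import Tuple


def simulate_branching_avalanche(
    branching_ratio: float, rng: random.Random, max_size: int = 10_000
) -> Tuple[int, int]:
    """Return (size, generations) of one avalanche of a simple branching process.

    Each active event independently makes two attempts to trigger a new event,
    each succeeding with probability branching_ratio / 2, so the mean number of
    offspring equals branching_ratio (critical value: 1).
    """
    p = branching_ratio / 2.0
    active = 1       # events in the current generation
    size = 1         # total number of events so far
    generations = 1  # number of generations so far
    while size < max_size:  # the cap stops runaway avalanches
        offspring = sum(1 for _ in range(2 * active) if rng.random() < p)
        if offspring == 0:
            break
        size += offspring
        generations += 1
        active = offspring
    return size, generations


rng = random.Random(42)
for ratio in (0.7, 1.0):
    sizes = [simulate_branching_avalanche(ratio, rng)[0] for _ in range(2000)]
    large = sum(1 for s in sizes if s >= 100) / len(sizes)
    print(f"branching ratio {ratio}: share of avalanches with size >= 100 is {large:.3f}")
```

Running the loop for a subcritical and a critical branching ratio shows why criticality matters: large avalanches are rare below the critical value and become much more common at it, which is the regime where the size distribution develops a power-law tail.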
3.3 Complexity
Phenomenologically, the data agree better with the RFIM fits, which to some extent indicates that the macroscopic behavior of the data as a whole leans towards a complex propagation process. To verify this conclusion further, the paper proposes a maximum likelihood method for assessing the validity of the fits (inspired by [4]). The method supports three different tests:
- The best-fit parameter of a time series (the branching ratio for BP, the disorder parameter for RFIM) can be compared with the critical value of the model;
- A p-value can be used to evaluate the goodness of each fit;
- The likelihoods of the two models can be compared to decide whether BP or RFIM better describes a given time series.
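The first and third tests can be illustrated for the branching process alone. The sketch below assumes a Poisson number of offspring per event, in which case avalanche sizes follow the Borel distribution and the maximum likelihood estimate of the branching ratio has a closed form. This is an assumption made for illustration; the paper's likelihood may differ, and the RFIM likelihood is not reproduced here. Function names are hypothetical.

```python
import math
from typing import Sequence


def borel_log_likelihood(sizes: Sequence[int], m: float) -> float:
    """Log-likelihood of avalanche sizes for a Poisson branching process.

    Uses the Borel distribution, P(S = s) = exp(-m s) (m s)^(s-1) / s!,
    with branching ratio 0 < m <= 1 (assumed offspring model).
    """
    total = 0.0
    for s in sizes:
        total += -m * s - math.lgamma(s + 1)
        if s > 1:
            total += (s - 1) * math.log(m * s)
    return total


def fit_branching_ratio(sizes: Sequence[int]) -> float:
    """Maximum likelihood estimate under the Borel model: 1 - 1 / mean(size)."""
    return 1.0 - len(sizes) / sum(sizes)


observed = [1, 1, 2, 4]  # made-up avalanche sizes
m_hat = fit_branching_ratio(observed)
print("fitted branching ratio:", m_hat, "(critical value: 1)")
print("log-likelihood at the fit:", borel_log_likelihood(observed, m_hat))
```

A fitted value close to 1 marks a near-critical series and a clearly smaller value a subcritical one. Computing the analogous log-likelihood under a second model and taking the difference gives the log-likelihood ratio used to decide which model describes a series better.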
Figure 3 shows the validity tests for the two model fits.
Figure 3. (a) Results for RFIM. (b) Results for BP. The vertical dotted line marks the critical value of the model. (c) Distribution of the log-likelihood ratio. Blue-green indicates time series for which BP is better than RFIM, and red indicates time series for which RFIM is better. (d) Using the classification in (c), the time series divide clearly into two different behaviors (only time series near the critical point are taken as representative of each class), and the avalanche size distribution of each class is computed. Solid points correspond to RFIM and hollow points to BP. The dashed lines are the best power-law fits for the two models.
The analysis shows that the best-fit parameters span a very wide range (FIG. 3a, 3b), covering the critical state of the model and a large part of the subcritical regime. However, most events belong to a small number of time series that trigger large avalanches. Therefore, for both BP and RFIM, the large-scale behavior of the system is mainly determined by a few time series whose parameters lie in a very narrow range near the critical point (insets of FIG. 3a and 3b).
Furthermore, these tests show that most time series are well described by at least one of the two models. Figure 3c shows that the time series split into two classes of almost the same size, one better described by BP and the other better described by RFIM, with a slight advantage for RFIM. Information propagation on social platforms is therefore a mixture of complex and simple contagion. Combined with Figure 2, this leads to the further conclusion that complex contagion slightly dominates, because RFIM is more compatible with the data at the aggregate level.
In fact, the avalanche size distribution of the time series classified as BP shows an obvious “crossover” (FIG. 3d): small avalanches, before the crossover, follow BP statistics, while large avalanches, after the crossover, are again close to RFIM.
3.4 Semantic information of different communication modes
The paper also qualitatively analyzes the semantic content associated with the two propagation modes.
FIG. 4 shows the top 30 hashtags by share of the Twitter data in each category. Blue-green marks hashtags whose time series are more consistent with the RFIM model, and red marks those more consistent with the BP model. Label size reflects the rank by share.
As can be seen from Figure 4, typical hashtags in the BP class are everyday topics, most of them related to music, movies and TV programs, whereas the hashtags selected by RFIM relate to controversial topics such as politics and social news. This shows that the semantic content of a hashtag is correlated with the class of its time series. The paper hypothesizes that the key difference between the two kinds of “information avalanche” lies in the dynamics behind them, and the fact that a classification based only on timing lines up with semantics gives this hypothesis surprising but solid support.
4. Conclusion: Transcending temporal features
The authors call for a reconsideration of algorithms that take into account only the temporal characteristics of the information propagation process. Such algorithms ignore the semantic content of hashtags and the network structure behind the propagation, both of which are very important for information propagation.
At the end of the paper, the authors speculate that the generality of their conclusions probably extends beyond the data sets presented in the paper. If this is true, there must be a mechanism behind this universality. Understanding that mechanism, and how to exploit it to predict the spread of information through online social media, remains a challenge.
References
[1] Centola, D. & Macy, M. Complex contagions and the weakness of long ties. Am. J. Sociol. 113, 702–734 (2007).
[2] Dodds, P. S. & Watts, D. J. A generalized model of social and biological contagion. J. Theor. Biol. 232, 587–604 (2005).
[3] Watson, H. W. & Galton, F. On the probability of the extinction of families. J. R. Anthropol. Inst. G. B. Irel. 4, 138–144 (1875).
[4] Clauset, A., Shalizi, C. R. & Newman, M. E. J. Power-law distributions in empirical data. SIAM Rev. 51, 661–703 (2009).