Abstract
Online communities are becoming increasingly important as platforms for large-scale human cooperation. In these communities, users seek and share professional skills, spreading knowledge along the chain of skill levels. To investigate how users communicate and cooperate to complete a large number of tasks, we analyze StackExchange, one of the largest question-and-answer systems in the world. We construct expertise networks that include all pairs of help-seeking interactions and measure the skill levels of users from their positions in the networks using a novel indicator, "average flow distance". We explain the discovered hierarchy in the networks, in particular the maximum length and the distribution of users across hierarchical levels.
1. Introduction
2. Method
2.1 Constructing expertise networks
As shown in Figure 1, question answering is a collective action that involves at least two types of user: the asker and the successful answerer, whose answer is accepted by the asker. For a majority of questions there is also a third type of user, the failed answerers, whose answers were not accepted. To focus on real contributions, we only draw edges from askers to successful answerers, who are sharing their professional skills to solve the problems (Jun Zhang et al., 2007).
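A minimal sketch of this construction in Python, assuming each question is available as a record with the hypothetical fields `asker` and `accepted_answerer` (illustrative names, not the schema of any StackExchange export). Counting repeated help between the same pair as an edge weight and skipping self-answers are assumptions of this sketch.

```python
from collections import Counter


def build_expertise_network(questions):
    """Return a weighted edge list {(asker, answerer): count}.

    Only accepted answers create edges; failed answers are ignored.
    """
    edges = Counter()
    for q in questions:
        asker = q["asker"]
        answerer = q.get("accepted_answerer")  # None if no answer was accepted
        if answerer is None or answerer == asker:
            continue  # no edge for unanswered questions or self-answers (assumption)
        edges[(asker, answerer)] += 1
    return edges


# Toy records, made up for illustration only.
questions = [
    {"asker": "alice", "accepted_answerer": "bob"},
    {"asker": "alice", "accepted_answerer": "bob"},
    {"asker": "carol", "accepted_answerer": "alice"},
    {"asker": "dave", "accepted_answerer": None},
]
edges = build_expertise_network(questions)
```

Each key points from the user who received help to the user who gave it, so a user who appears only as a source is a pure asker.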
2.2 Calculating average flow distances
3. Findings
3.1 The hierarchy of expertise networks
We construct an expertise network from the data of physics.stackexchange.com and investigate its topology (Figure 1). A divide between askers and answerers is observed: the population of askers is 1.5 times as large as that of answerers, but only 28% of askers also answer questions. A similar "bow-tie" structure was observed by Andrei Broder et al. in 2000 and Jun Zhang et al. in 2007.
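The two summary numbers above (the asker-to-answerer ratio and the share of askers who also answer) can be read off an edge list. The sketch below assumes edges are `(asker, answerer)` pairs; the function name `role_statistics` and the toy data are hypothetical and are not the physics.stackexchange.com data.

```python
def role_statistics(edges):
    """Summarise the asker/answerer divide of an (asker, answerer) edge list."""
    askers = {src for src, _ in edges}
    answerers = {dst for _, dst in edges}
    both = askers & answerers  # users who ask and also give accepted answers
    return {
        "askers": len(askers),
        "answerers": len(answerers),
        "asker_to_answerer_ratio": len(askers) / len(answerers),
        "share_of_askers_who_answer": len(both) / len(askers),
    }


# Toy edges, made up for illustration only.
toy_edges = [("alice", "bob"), ("carol", "bob"), ("dave", "alice"), ("erin", "bob")]
stats = role_statistics(toy_edges)
```

An empty edge list would raise a division error here; a real analysis would guard against that case.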
We calculate the flow level L_i of all users and find that the askers and answerers are separated (Figure 2). The flow level of askers equals one, and that of answerers is equal to or greater than two. Users who both ask and answer questions have a variety of flow levels, depending on the levels of the users who receive their help.
We observe that question difficulty is related to the flow levels of the asker and the answerers.
The TrueSkill score of users separates the askers (whose scores are around 10) from the answerers (whose scores are around 30). The distribution of flow level gaps shows that in a majority of cases the answerer needs a higher skill level (by 1.2) than the asker to give a satisfying (accepted) answer.
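As a rough illustration of a flow level gap, the sketch below takes flow levels as given and assumes the gap of an edge is the answerer's level minus the asker's level. That definition, the name `flow_level_gaps`, and all values are assumptions made for illustration.

```python
from statistics import mean


def flow_level_gaps(edges, levels):
    """Gap per edge: answerer's flow level minus asker's flow level (assumed definition)."""
    return [levels[answerer] - levels[asker] for asker, answerer in edges]


# Toy flow levels and edges, made up for illustration only.
levels = {"alice": 1.0, "bob": 2.5, "carol": 1.0, "dave": 2.0}
toy_edges = [("alice", "bob"), ("carol", "bob"), ("alice", "dave")]

gaps = flow_level_gaps(toy_edges, levels)
mean_gap = mean(gaps)
share_positive = sum(g > 0 for g in gaps) / len(gaps)  # fraction of edges pointing "upward"
```

A histogram of `gaps` over all edges would give the distribution discussed above.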
We compare four different measures of skill level: degree, PageRank score, TrueSkill score, and flow level. It turns out that the PageRank score is trivially correlated with node degree. The TrueSkill score, while it separates the askers from the answerers as efficiently as flow level,
3.2 Cascade Model for Attention Competition
The limitation of hierarchical levels
We find that the cascade model explains the limitation of the flow hierarchy in expertise networks. In particular, the flow distance L_i of the i-th node in the model is given by:
![][1]
[1]:http://latex.codecogs.com/svg.latex?L_i=1+\frac{1}{n-i}(L_1+L_2+...+L_{i-1})
![][2]
[2]:http://latex.codecogs.com/svg.latex?\left\{\begin{array}{lr}L_i=1+\frac{1}{n-i}(L_1+L_2+...+L_{i-1})&:i\leq\frac{n}{2}\\L_i=1+\frac{1}{i-1}(L_1+L_2+...+L_{i-1})&:\frac{n}{2}<i\leq{n}\\L_i=1+\frac{4}{n^2}\sum_{j=1}^{n/2}(n-2j+1)L_{n+1-j}&:i=n+1,\,n\,\mathrm{even}\\L_i=1+\frac{4}{n^2-1}\sum_{j=1}^{(n-1)/2}(n-2j+1)L_{n+1-j}&:i=n+1,\,n\,\mathrm{odd}\end{array}\right.
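The piecewise recurrence can be evaluated directly. The sketch below is one reading of it, assuming the denominators in the last two cases are n^2 and n^2-1 and that the sums run up to n/2 and (n-1)/2. Under that reading the weights (n-2j+1) sum to n^2/4 or (n^2-1)/4, so L_{n+1} is one plus a weighted average of the top levels. The function name `cascade_flow_levels` is hypothetical.

```python
def cascade_flow_levels(n):
    """Return [L_1, ..., L_{n+1}] for the cascade model recurrence (requires n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2 so that every denominator is nonzero")
    L = [0.0] * (n + 2)  # 1-based indexing: L[1] .. L[n+1]; L[0] is unused
    for i in range(1, n + 1):
        prefix = sum(L[1:i])  # L_1 + ... + L_{i-1}
        denom = (n - i) if 2 * i <= n else (i - 1)  # first case if i <= n/2
        L[i] = 1 + prefix / denom
    half = n // 2  # n/2 for even n, (n-1)/2 for odd n
    norm = n * n if n % 2 == 0 else n * n - 1
    weighted = sum((n - 2 * j + 1) * L[n + 1 - j] for j in range(1, half + 1))
    L[n + 1] = 1 + 4 * weighted / norm
    return L[1:]


levels_even = cascade_flow_levels(2)
levels_odd = cascade_flow_levels(3)
```

For small n the levels can be checked by hand: with n = 2 the first case gives L_1 = 1, the second gives L_2 = 1 + L_1 = 2, and the even case gives L_3 = 1 + L_2 = 3.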
See the following figure for simulation
The distribution of users across hierarchical levels
Comparing model against StackExchange data.