Monday, March 12, 2012

Social Network Analysis? What's that?


Brief Introduction

Social network analysis (SNA) refers to methods for analyzing social networks and social structures, and it has emerged as a key technique in modern sociology. It views social relationships in terms of network theory, as a set of nodes and ties.
Nodes are the individual actors within the networks, and ties are the relationships between the actors. Nodes are tied by one or more specific types of interdependency, such as friendship, kinship, common interest, financial exchange, dislike, sexual relationships, or relationships of beliefs, knowledge or prestige.

Case Study
We now work through an example that analyzes the social network among a small set of nodes.



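This undirected sociogram describes a small social network of five actors (Alice, Bob, Carol, David and Eva) joined by six ties: Alice-Bob, Alice-Carol, Alice-David, Bob-David, Carol-David and David-Eva. We consider only the one-mode network, in which all nodes are actors of the same kind.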

1. General parameters

Degree
Density
Geodesic Distances

The degree of a node ni, denoted d(ni), is the number of nodes adjacent to it. In a directed graph it splits into out-degree (the number of links pointing out of the node) and in-degree (the number of links pointing into the node); in an undirected graph such as this one the two coincide.

Density measures how closely knit a network is: it is the proportion of all possible ties that are actually present, and it indicates the general level of connectedness of the graph.

The geodesic distance d(i, j) is the length of a shortest (geodesic) path between two nodes i and j.
In this example, the degree of each node is as follows:

| Node | Degree |
| --- | --- |
| Alice | 3 |
| Bob | 2 |
| Carol | 2 |
| David | 4 |
| Eva | 1 |


The density of this undirected graph is 0.6: 6 of the 10 possible ties are present.
The geodesic distances between each pair of nodes are shown below:


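|       | Alice | Bob | Carol | David | Eva |
| --- | --- | --- | --- | --- | --- |
| Alice | - | 1 | 1 | 1 | 2 |
| Bob   | 1 | - | 2 | 1 | 2 |
| Carol | 1 | 2 | - | 1 | 2 |
| David | 1 | 1 | 1 | - | 1 |
| Eva   | 2 | 2 | 2 | 1 | - |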
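
The three general parameters can be checked with a few lines of Python. This is only a sketch and not part of the original analysis: the edge list is an assumption read off the degree and distance tables, and the function names (`build_adjacency`, `degrees`, `density`, `geodesic_distances`) are illustrative.

```python
from collections import deque

# Assumed ties of the example sociogram (undirected, one-mode).
EDGES = [
    ("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"),
    ("Bob", "David"), ("Carol", "David"), ("David", "Eva"),
]


def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degrees(adj):
    """Degree d(ni): the number of nodes adjacent to ni."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def density(adj):
    """Ties present divided by the g(g-1)/2 ties possible in an undirected graph."""
    g = len(adj)
    ties = sum(len(nbrs) for nbrs in adj.values()) // 2
    return ties / (g * (g - 1) / 2)


def geodesic_distances(adj, source):
    """Breadth-first search: shortest path length from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


adj = build_adjacency(EDGES)
print(degrees(adj))
print(density(adj))
print(geodesic_distances(adj, "Eva"))
```

Breadth-first search is enough here because every tie has the same length; a weighted network would need a different shortest-path method.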

In addition, {Alice, Bob, David} and {Alice, Carol, David} are cliques, that is, maximal groups of three or more nodes in which every pair is directly tied.
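
For a network this small, the cliques can be found by brute force. The sketch below assumes the usual SNA convention that a clique is a maximal complete subgraph with at least three members; the adjacency data is read off the tables above, and `is_complete` and `maximal_cliques` are illustrative names. It tries every subset, so it is only suitable for very small graphs.

```python
from itertools import combinations

# Assumed neighbour sets of the example sociogram.
ADJ = {
    "Alice": {"Bob", "Carol", "David"},
    "Bob": {"Alice", "David"},
    "Carol": {"Alice", "David"},
    "David": {"Alice", "Bob", "Carol", "Eva"},
    "Eva": {"David"},
}


def is_complete(adj, group):
    """True if every pair of nodes in the group is directly tied."""
    return all(v in adj[u] for u, v in combinations(group, 2))


def maximal_cliques(adj, min_size=3):
    """Enumerate all subsets of at least min_size nodes and keep the maximal complete ones."""
    nodes = sorted(adj)
    complete = [
        set(group)
        for size in range(min_size, len(nodes) + 1)
        for group in combinations(nodes, size)
        if is_complete(adj, group)
    ]
    # Drop any complete group that is strictly contained in a larger complete group.
    return [sorted(c) for c in complete if not any(c < other for other in complete)]


print(maximal_cliques(ADJ))
```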

2. Centrality

To identify which nodes are at the center of the network, we consider three standard centrality measures that capture a wide range of “importance” in the network:

          Degree Centrality
          Closeness Centrality
          Betweenness Centrality


Historically first and conceptually simplest is degree centrality, which is defined as the number of links incident upon a node (i.e., the number of ties that a node has). The degree can be interpreted in terms of the immediate risk of a node for catching whatever is flowing through the network (such as a virus, or some information).

In graphs there is a natural distance metric between all pairs of nodes, defined by the length of their shortest paths. The farness of a node s is defined as the sum of its distances to all other nodes, and its closeness is defined as the inverse of the farness. Thus, the lower a node's total distance to all other nodes, the more central it is. Closeness can be regarded as a measure of how long it will take to spread information from s to all other nodes sequentially.
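
The farness and closeness definitions above can be sketched in Python as follows. The adjacency data is an assumption read off the example's tables, the function names are illustrative, and the sketch assumes a connected graph (otherwise some distances are undefined). The normalized variant multiplies the raw closeness by g-1, where g is the number of nodes.

```python
from collections import deque

# Assumed neighbour sets of the example sociogram.
ADJ = {
    "Alice": {"Bob", "Carol", "David"},
    "Bob": {"Alice", "David"},
    "Carol": {"Alice", "David"},
    "David": {"Alice", "Bob", "Carol", "Eva"},
    "Eva": {"David"},
}


def bfs_distances(adj, source):
    """Shortest path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def farness(adj, node):
    """Sum of the distances from node to all other nodes."""
    return sum(bfs_distances(adj, node).values())


def closeness(adj, node, normalized=True):
    """Inverse of farness, optionally scaled by g-1 so that the maximum is 1."""
    far = farness(adj, node)
    return (len(adj) - 1) / far if normalized else 1 / far


for name in ADJ:
    print(name, farness(ADJ, name), round(closeness(ADJ, name), 2))
```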

Betweenness is a centrality measure of a vertex within a graph (there is also edge betweenness, which is not discussed here). It was introduced by Linton Freeman as a measure for quantifying the control an individual has over communication between other individuals in a social network. In his conception, vertices that have a high probability of occurring on a randomly chosen shortest path between two randomly chosen nodes have a high betweenness.



In this example, the centrality scores of each node are as follows:

| Node | Degree Centrality | Closeness Centrality | Betweenness Centrality |
| --- | --- | --- | --- |
| Alice | 0.75 | 0.8 | 0.08 |
| Bob | 0.5 | 0.67 | 0 |
| Carol | 0.5 | 0.67 | 0 |
| David | 1 | 1 | 0.58 |
| Eva | 0.25 | 0.57 | 0 |

(The results above have been normalized; with g = 5 actors, degree centrality is d(ni)/(g-1) = d(ni)/4.)
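Related Formulas:
(a) Degree Centrality:

$$C'_D(n_i) = \frac{d(n_i)}{g-1}$$

(b) Closeness Centrality:

$$C'_C(n_i) = \frac{g-1}{\sum_{j=1}^{g} d(n_i, n_j)}$$

(c) Betweenness Centrality:

$$C'_B(n_i) = \frac{\sum_{j<k} g_{jk}(n_i)/g_{jk}}{(g-1)(g-2)/2}$$

where g is the number of actors in the network, d(n_i, n_j) is the geodesic distance between actors i and j, g_jk is the number of geodesics connecting j and k, and g_jk(n_i) is the number of those geodesics that actor i is on (with j and k both different from i).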
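
The betweenness definition can be turned into a small program. The sketch below is one straightforward way to do it, not necessarily how the results in this post were produced: it counts geodesics with breadth-first search and uses the fact that actor i lies on a geodesic between j and k exactly when d(j, i) + d(i, k) = d(j, k). The adjacency data is an assumption read off the example's tables and the function names are illustrative.

```python
from collections import deque
from itertools import combinations

# Assumed neighbour sets of the example sociogram.
ADJ = {
    "Alice": {"Bob", "Carol", "David"},
    "Bob": {"Alice", "David"},
    "Carol": {"Alice", "David"},
    "David": {"Alice", "Bob", "Carol", "Eva"},
    "Eva": {"David"},
}


def bfs_counts(adj, source):
    """Return (dist, sigma): geodesic length and number of geodesics from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                # Every geodesic reaching u extends to a geodesic reaching v.
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj, normalized=True):
    """Sum over pairs j < k of g_jk(ni) / g_jk, optionally divided by (g-1)(g-2)/2."""
    info = {s: bfs_counts(adj, s) for s in adj}
    g = len(adj)
    scores = {}
    for i in adj:
        dist_i, sigma_i = info[i]
        total = 0.0
        others = [n for n in adj if n != i]
        for j, k in combinations(others, 2):
            dist_j, sigma_j = info[j]
            if k not in dist_j or i not in dist_j or k not in dist_i:
                continue  # no path between the nodes involved
            if dist_j[i] + dist_i[k] == dist_j[k]:
                # Geodesics j..k through i = (geodesics j..i) * (geodesics i..k).
                total += sigma_j[i] * sigma_i[k] / sigma_j[k]
        scores[i] = total / ((g - 1) * (g - 2) / 2) if normalized else total
    return scores


for name, value in betweenness(ADJ).items():
    print(f"{name}: {value:.2f}")
```

This pairwise approach is easy to follow but slow on large networks, where a dedicated algorithm or library would be the better choice.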

3. Influence Range

Another measurement is based on the influence range of a node, that is, the set of actors who are reachable from it. This refined closeness centrality can be computed as

$$C'_C(n_i) = \frac{J_i/(g-1)}{\sum_{j} d(n_i, n_j)/J_i}$$

where J_i is the number of actors in the influence range of actor i (excluding i itself), and the sum runs over the actors j in that influence range.
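Because the example graph is connected, every actor can reach all four others, so Ji = 4 for each actor and the index equals the normalized closeness centrality. The computed results are:

| Node | Closeness Centrality (refined) |
| --- | --- |
| Alice | 0.8 |
| Bob | 0.67 |
| Carol | 0.67 |
| David | 1 |
| Eva | 0.57 |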
This index is a ratio of the fraction of the actors in the group who are reachable, to the average distance that these actors are from the actor ni.
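
The ratio described above is most informative when some actors cannot be reached, as can happen in a directed network. The sketch below implements it under that reading; `influence_range_closeness` is an illustrative name, `EXAMPLE` is the example network as read off the tables, and `DIRECTED` is a purely hypothetical four-node directed network added for contrast. On a connected undirected graph every Ji equals g-1, so the value reduces to the normalized closeness.

```python
from collections import deque

# The example sociogram (assumed ties), written as neighbour sets.
EXAMPLE = {
    "Alice": {"Bob", "Carol", "David"},
    "Bob": {"Alice", "David"},
    "Carol": {"Alice", "David"},
    "David": {"Alice", "Bob", "Carol", "Eva"},
    "Eva": {"David"},
}

# Hypothetical directed network: A -> B -> C and D -> A (successor sets).
DIRECTED = {"A": {"B"}, "B": {"C"}, "C": set(), "D": {"A"}}


def bfs_distances(adj, source):
    """Shortest path length from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def influence_range_closeness(adj, node):
    """(Ji / (g-1)) divided by the average distance to the Ji reachable actors."""
    dist = bfs_distances(adj, node)
    del dist[node]              # the influence range excludes the node itself
    reachable = len(dist)       # Ji
    if reachable == 0:
        return 0.0              # nobody is reachable, so there is no influence
    fraction = reachable / (len(adj) - 1)
    average_distance = sum(dist.values()) / reachable
    return fraction / average_distance


for name in DIRECTED:
    print(name, round(influence_range_closeness(DIRECTED, name), 2))
```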


4. Matrices for SNA

Matrices are a central concept in SNA. The primary matrix is called the adjacency matrix, or sociomatrix: its entry in row i and column j is 1 if actors i and j are tied and 0 otherwise.
For this example:


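|       | Alice | Bob | Carol | David | Eva |
| --- | --- | --- | --- | --- | --- |
| Alice | - | 1 | 1 | 1 | 0 |
| Bob   | 1 | - | 0 | 1 | 0 |
| Carol | 1 | 0 | - | 1 | 0 |
| David | 1 | 1 | 1 | - | 1 |
| Eva   | 0 | 0 | 0 | 1 | - |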



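X =

|    | n1 | n2 | n3 | n4 | n5 |
| --- | --- | --- | --- | --- | --- |
| n1 | - | 1 | 1 | 1 | 0 |
| n2 | 1 | - | 0 | 1 | 0 |
| n3 | 1 | 0 | - | 1 | 0 |
| n4 | 1 | 1 | 1 | - | 1 |
| n5 | 0 | 0 | 0 | 1 | - |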
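
A sociomatrix is easy to build and inspect in code. The sketch below makes a few assumptions: the node order Alice, Bob, Carol, David, Eva, the edge list read off the table above, a diagonal filled with 0 (the table leaves it undefined), and illustrative function names. It shows that the row sums of X give the degrees and that the entries of X·X count the walks of length two between two actors, which for distinct actors is their number of common neighbours.

```python
NODES = ["Alice", "Bob", "Carol", "David", "Eva"]
EDGES = [
    ("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "David"),
    ("Bob", "David"), ("Carol", "David"), ("David", "Eva"),
]


def adjacency_matrix(nodes, edges):
    """Symmetric 0/1 matrix with X[i][j] = 1 when nodes i and j are tied."""
    index = {name: i for i, name in enumerate(nodes)}
    X = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        X[index[u]][index[v]] = 1
        X[index[v]][index[u]] = 1   # undirected: mirror every tie
    return X


def matmul(A, B):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


X = adjacency_matrix(NODES, EDGES)
row_sums = [sum(row) for row in X]   # degree of each node
X2 = matmul(X, X)                    # X2[i][j] = walks of length 2 from i to j

for name, row in zip(NODES, X):
    print(f"{name:<6}", row)
print("degrees:", row_sums)
print("common neighbours of Bob and Carol:", X2[1][2])
```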



Case conclusion:
According to the computed results, David is at the “center” of the network: he is the key player and the most influential node.


What we can learn from this example:

Social Network Analysis is not just about graphs and data. Once a graph is drawn, you can measure it. Social network metrics reveal much about the nodes and the clusters they form. Who knows what is going on? Who wields power or influence? Who is a key connector? Who is in the "thick of things" in the network? In this example, our calculations reveal that David is the most important node in the network.

The common wisdom is that only big business and government use social network analysis. Yet many individuals and groups are learning the craft and solving local problems with it. Although social network analysis cannot be learned just by reading a book, it does not require a PhD either. Any intelligent person, under the right guidance and with the proper tools, can apply the methodology to an appropriate problem and gain enormous insight into what was previously hidden.


6 comments:

  1. You and I have different views about the picture; according to your advice, the distances in the picture do make sense. Also, before you do all the calculations, I would like to see a detailed explanation of the results for someone who has never taken the class, not only for us to read.

    1. I have given a detailed explanation of the related definitions and the explicit methods of calculation. Then I present the results for this example. Thank you for your suggestion on my blog. If you still have questions about how I got these data, you can leave your email address and I will be glad to answer them. Thank you!

  2. What an excellent job you did! In your blog, we can see formulas, graphs and principles. Besides, I got the same results as you.
    However, I am afraid that in reality we have huge networks with very complicated relationships between users, and they are always dynamic systems. So, can you give me some ideas about how to calculate efficiently?

    1. I recommend a popular software package, InFlow, which provides easy access to the most popular network metrics. With visualization and metrics in one interactive interface, almost unlimited what-if scenarios are possible.

  3. I agree with you! There are indeed many individuals and groups that are using this kind of analysis to make profit.

  4. Thank you for your detailed explanation of the related definitions and the explicit methods of calculation. I'm pleased to see the result for betweenness centrality, because I didn't quite get the detailed calculation steps for it. After reading your description of betweenness, I both understand it and am able to work with it. Vertices that have a high probability of occurring on a randomly chosen shortest path between two randomly chosen nodes have a high betweenness, and this explains David's status in the network well.
