2017년 2월 2일 목요일

Web Graph

1. Graph Theory + Web Structure
 => node or vertex is page / directed edge or arc is link
 => Web Graph ~ directed graph that is formed by webpages and their hyperlinks

http://webgraph.di.unimi.it/

2. Graph Terminology
 - Network size: total number of vertices (N)
 - Distance between two vertices (v and v'): shortest path between the pair
 - Diameter of a network: longest distance between any two vertices
 - Average-case diameter
 - Degree of vertex: number of edges connected to v
 - Density of network: ratio of edges to vertices

3. Graph Importance Measures
 - Four types of centrality (Central as in important or vital)
  * Degree Centrality: how many edges does a vertex v have?
  * Betweeness Centrality: how many pairs of nodes is v between?
  * Closeness Centrality: how quickly a message can travel from one node to the whole graph?
  * Eigenvector Centrality: the connection of a vertex v to the important node in the network

https://networkx.github.io/documentation/development/reference/algorithms.centrality.html

4. Degree Distribution
 - Normal distribution: The average degree is most likely. Very high and very low degrees are highly unlikely.
 - Power law distribution: No meaningful average degree. Very low degrees are likely, but very high degrees are likely in aggregate.
  * Power law networks: Some nodes have a tremendous number of connections to other nodes (hubs)
   => popular nodes can have millions of links: the network appears to have no scale (no limit)
  * Preferential attachment: process by which items are distributed to objects according to how many items they already have

http://mathinsight.org/degree_distribution

댓글 없음:

댓글 쓰기