1. Graph Theory + Web Structure
=> node or vertex is page / directed edge or arc is link
=> Web Graph ~ directed graph that is formed by webpages and their hyperlinks
http://webgraph.di.unimi.it/
2. Graph Terminology
- Network size: total number of vertices (N)
- Distance between two vertices (v and v'): shortest path between the pair
- Diameter of a network: longest distance between any two vertices
- Average-case diameter
- Degree of vertex: number of edges connected to v
- Density of network: ratio of edges to vertices
3. Graph Importance Measures
- Four types of centrality (Central as in important or vital)
* Degree Centrality: how many edges does a vertex v have?
* Betweeness Centrality: how many pairs of nodes is v between?
* Closeness Centrality: how quickly a message can travel from one node to the whole graph?
* Eigenvector Centrality: the connection of a vertex v to the important node in the network
https://networkx.github.io/documentation/development/reference/algorithms.centrality.html
4. Degree Distribution
- Normal distribution: The average degree is most likely. Very high and very low degrees are highly unlikely.
- Power law distribution: No meaningful average degree. Very low degrees are likely, but very high degrees are likely in aggregate.
* Power law networks: Some nodes have a tremendous number of connections to other nodes (hubs)
=> popular nodes can have millions of links: the network appears to have no scale (no limit)
* Preferential attachment: process by which items are distributed to objects according to how many items they already have
http://mathinsight.org/degree_distribution
댓글 없음:
댓글 쓰기