Thursday, 2 February 2017

Search Engines

1. Finding information on the Web
 - Browse strategies: where is information stored?
 - Search strategies: what does the information contain?
  * Specific queries
   e.g. encyclopaedia, library
  * Broad queries
   e.g. web directories
  * Vague queries
   e.g. search engines

2. Web Search
 - All queries are answered from the indices alone, without accessing the page texts
 - Links: link topology, link popularity, who links
 - Page structure: words in heading > words in text
 - Spamming
  * most search engines have rules against invisible text / meta tag abuse / heavy repetition / "domain spam"
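
The two ideas above (answering queries from the index alone, and weighting heading words above body words) can be sketched as a small weighted inverted index. This is only an illustration: the weights 3.0 and 1.0, the page dictionary layout and the function names are assumptions, not how any particular search engine does it.

```python
from collections import defaultdict

# Assumed weights: one heading occurrence counts as three body occurrences.
HEADING_WEIGHT = 3.0
BODY_WEIGHT = 1.0


def build_index(pages):
    """Map each word to {page_id: weighted occurrence score}."""
    index = defaultdict(dict)
    for page_id, page in pages.items():
        for field, weight in (("heading", HEADING_WEIGHT), ("body", BODY_WEIGHT)):
            for word in page[field].lower().split():
                index[word][page_id] = index[word].get(page_id, 0.0) + weight
    return index


def search(index, query):
    """Rank pages that contain every query word, consulting only the index."""
    words = query.lower().split()
    if not words:
        return []
    postings = [index.get(word, {}) for word in words]
    # Boolean AND: keep pages that appear in every posting list.
    matches = set(postings[0])
    for posting in postings[1:]:
        matches &= set(posting)
    scored = [(sum(p[page_id] for p in postings), page_id) for page_id in matches]
    # Highest score first; ties broken by page id for a stable order.
    return [page_id for _, page_id in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Hypothetical pages, used only to exercise the sketch.
pages = {
    "a": {"heading": "Search engines", "body": "notes on crawling and ranking"},
    "b": {"heading": "Cooking notes", "body": "search for recipes with search engines"},
}
index = build_index(pages)
results = search(index, "search engines")
print(results)
```

Page "a" ranks first because both query words sit in its heading, even though page "b" repeats "search" in its body text.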

3. Centralised architecture
 e.g. Crawler-indexer
 - Crawler: a program that traverses the Web and sends new or updated pages to the main server (where they are indexed)
 - Centralised use of index to answer queries
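
A minimal sketch of the crawler-indexer idea, assuming a toy in-memory "web" instead of real HTTP requests. TOY_WEB, MainServer and crawl are hypothetical names invented for this example; a real crawler would also handle fetching, politeness and page updates.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "http://example.org/": ("welcome to the home page",
                            ["http://example.org/a", "http://example.org/b"]),
    "http://example.org/a": ("search engines crawl pages",
                             ["http://example.org/b"]),
    "http://example.org/b": ("pages link to other pages",
                             ["http://example.org/"]),
}


class MainServer:
    """Central server: indexes the pages it receives and answers queries."""

    def __init__(self):
        self.index = defaultdict(set)  # word -> set of URLs

    def add_page(self, url, text):
        for word in text.lower().split():
            self.index[word].add(url)

    def query(self, word):
        return sorted(self.index.get(word.lower(), set()))


def crawl(seed, web, server):
    """Breadth-first traversal that sends every reachable page to the server."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # dead link: nothing to index
        text, links = web[url]
        server.add_page(url, text)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


server = MainServer()
visited = crawl("http://example.org/", TOY_WEB, server)
print(server.query("pages"))
```

The crawler only moves pages; all indexing and query answering happens on the single central server, which is what makes the architecture centralised.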

4. Distributed architecture
 e.g. Harvest architecture
 - Gatherers: collect and extract indexing information from one or more web servers at periodic intervals
 - Brokers: provide the indexing mechanism and the query interface to the gathered data;
             retrieve information from gatherers or other brokers, incrementally updating their indices
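
The division of labour between gatherers and brokers can be sketched roughly as follows. The class and method names (Gatherer.gather, Broker.update, Broker.export) and the word-set summaries are assumptions made for this illustration; they are not the actual Harvest interfaces or formats.

```python
class Gatherer:
    """Extracts indexing information (here: word sets) from its web servers."""

    def __init__(self, servers):
        self.servers = servers  # server name -> {url: page text}

    def gather(self):
        summaries = {}
        for pages in self.servers.values():
            for url, text in pages.items():
                summaries[url] = set(text.lower().split())
        return summaries


class Broker:
    """Indexes data from gatherers or other brokers and answers queries."""

    def __init__(self, sources):
        self.sources = sources  # list of Gatherer or Broker objects
        self.summaries = {}     # url -> set of words

    def export(self):
        return dict(self.summaries)

    def update(self):
        """Incremental update: touch only entries that are new or changed."""
        changed = 0
        for source in self.sources:
            if isinstance(source, Gatherer):
                data = source.gather()
            else:
                data = source.export()
            for url, words in data.items():
                if self.summaries.get(url) != words:
                    self.summaries[url] = words
                    changed += 1
        return changed

    def query(self, word):
        word = word.lower()
        return sorted(url for url, words in self.summaries.items() if word in words)


# Hypothetical sites, used only to exercise the sketch.
site_a = {"http://a.example/1": "harvest gatherers collect data"}
site_b = {"http://b.example/1": "brokers answer queries"}
local_broker = Broker([Gatherer({"a": site_a})])
top_broker = Broker([local_broker, Gatherer({"b": site_b})])

local_broker.update()
first = top_broker.update()    # both pages are new
second = top_broker.update()   # nothing changed since the last run
site_b["http://b.example/2"] = "harvest brokers"
third = top_broker.update()    # only the new page is added
print(first, second, third, top_broker.query("harvest"))
```

Here top_broker draws on both another broker and a gatherer, which mirrors the note that brokers may retrieve information from either kind of source.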

http://searchsdn.techtarget.com/tip/Centralized-vs-decentralized-SDN-architecture-Which-works-for-you

5. Google Search
 - Crawling and Index Depth: aims to refresh its index on a monthly basis
 - Ranking algorithms
  * Variations of Boolean and vector space model: term frequency * inverse document frequency
  * Hyperlinks between pages: Popularity / Relatedness (e.g. PageRank, HITS)
   !! PageRank: Google finds a single type of universally important page -- intuitively, locations that are heavily visited in a random traversal of the Web's link structure
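
A rough sketch of the two ranking ideas above: a basic tf * idf weight, and PageRank computed by repeated propagation of rank along links (the "random traversal" intuition). The function names, the toy documents and the toy link graph are assumptions for illustration. The damping factor 0.85 is the commonly quoted value, the idf variant is the plain log(N / df) form, and the code assumes every link target is also a key of the links dictionary. Production ranking is far more involved.

```python
import math


def tf_idf(term, doc, docs):
    """Term frequency times inverse document frequency (docs are word lists)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf


def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank; links maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Random jump: the surfer restarts at a uniformly chosen page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Page without outgoing links: spread its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical documents and link graph.
docs = [["web", "search", "engine"], ["web", "directory"], ["library", "catalogue"]]
print(tf_idf("search", docs[0], docs), tf_idf("web", docs[0], docs))

links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = pagerank(links)
print(max(rank, key=rank.get))
```

In the toy graph, page "c" collects links from every other page and therefore ends up with the highest rank, and "a" outranks "b" and "d" because its single inbound link comes from the important page "c". Likewise "search" scores higher than "web" in the first document because it occurs in fewer documents overall.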

 - Google Relevancy: Google ranks web pages based on the number, quality and content of links pointing at them (citation)
  * Number of Links
  * Link Quality
  * Link Content
  * Ranking boosts on text style

http://www.slideshare.net/Ankit007_/ranking-algorithms

https://www.wordtracker.com/academy/learn-seo/technical-guides/google-ranking

http://searchengineland.com/what-is-google-pagerank-a-guide-for-searchers-webmasters-11068
