Sunday, April 17, 2005

Bang on IT, 7th April, Chinnaswamy Stadium, Pramod Chandra B Bhatt
1. Introduction: Google's architecture; searching smartly; newer trends
History: Unix, 70s, 80s, TCP/IP, rlogin, FTP, mail; 90s: Netscape, browser; 90s: AltaVista & others; 00s: Google, web services; now: personalisation & clustering
2. Email: date/subject/sender/priority/content; header; folders; information is voluminous, Retri
3. Organising the web: basic entity is the document; organising a large number of docs; query interface; which document is where
4. Thoughts in the background: several terabytes of information; desktop search; HCI (human-computer interface)
5. Debate: conventional wisdom. Should it be organised as a database? Classification-based search (traditional file system)? The way the query may be formulated? What kind of tools?
6. Defining moments: store the full text in the server; Louis Monier (AltaVista); Larry, Sergey (Google); index every word in the document
7. Text book: TOC (table of contents); specific keyword? (appendix)
Random access indexing the web
8. Word position and hit counts: "to be or not to be"; pattern matching
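A rough sketch of why word positions matter, not taken from the talk: if the index keeps the position of every word, a phrase such as "to be or not to be" can be matched by checking for consecutive positions, and the hit count is just the number of positions stored. The function names `build_positions` and `phrase_positions` are made up for this illustration.

```python
from collections import defaultdict

def build_positions(text):
    """Map each word to the list of positions where it occurs."""
    positions = defaultdict(list)
    for pos, word in enumerate(text.lower().split()):
        positions[word].append(pos)
    return positions

def phrase_positions(positions, phrase):
    """Return the start positions where the phrase's words occur consecutively."""
    words = phrase.lower().split()
    if not words:
        return []
    starts = []
    for start in positions.get(words[0], []):
        # Word i of the phrase must sit exactly i places after the start.
        if all(start + i in positions.get(w, []) for i, w in enumerate(words)):
            starts.append(start)
    return starts

doc = "to be or not to be that is the question"
index = build_positions(doc)

# Hit count per word is the number of recorded positions.
hit_counts = {word: len(pos_list) for word, pos_list in index.items()}

print(phrase_positions(index, "to be or not to be"))
print(hit_counts["to"], hit_counts["be"])
```

Note that every word is kept here, including very common ones; a phrase like this one could not be matched otherwise.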
9. Forward and inverted indexing: docID -> wordID (forward), wordID -> docID (inverted)
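To give a sense of the two index directions, here is a minimal sketch with toy data. The forward index lists the word IDs found in each document; inverting it gives, for each word ID, the documents that contain it. The lexicon (word -> wordID), the sample documents and the function names are assumptions for illustration only.

```python
from collections import defaultdict

def build_forward_index(docs, lexicon):
    """docID -> list of wordIDs, assigning new wordIDs as words are first seen."""
    forward = {}
    for doc_id, text in docs.items():
        word_ids = []
        for word in text.lower().split():
            # setdefault gives an unseen word the next free ID.
            word_ids.append(lexicon.setdefault(word, len(lexicon)))
        forward[doc_id] = word_ids
    return forward

def invert(forward):
    """wordID -> sorted list of docIDs containing that word."""
    inverted = defaultdict(list)
    for doc_id in sorted(forward):
        for word_id in sorted(set(forward[doc_id])):
            inverted[word_id].append(doc_id)
    return dict(inverted)

# Toy documents, invented for this example.
docs = {
    1: "google indexes the web",
    2: "altavista indexes every word",
    3: "the web is large",
}
lexicon = {}
forward = build_forward_index(docs, lexicon)
inverted = invert(forward)

print(forward[1])
print(inverted[lexicon["indexes"]])
```

The forward index is what parsing produces naturally; the inverted one is what a query needs, since a query starts from words and asks for documents.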
10. Crawlers: Scooter, one of the first crawlers; web robots (wobots); several thousand pages each day
11. 28 days "moving windows"
12. Role for machine intelligence: spell check (did you mean this?); looking for similar pages?
Web analytics purchasing pattern
PART 2------
Google Indian connection: Rajeev Motwani (IIT-K, Stanford)
1. Random walk; Anurag A ch; file system
Google: 10 to the power 100 (googol). High point: recall. Highly link based
2. Google's URL service: depth first, breadth first
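The difference between the two crawl orders can be sketched on a toy link graph. This is only an illustration of breadth-first versus depth-first traversal, not a description of how Google's URL service actually schedules pages; the page names and the `crawl_order` function are invented.

```python
from collections import deque

def crawl_order(links, seed, breadth_first=True):
    """Visit pages reachable from seed, using a queue (BFS) or a stack (DFS)."""
    frontier = deque([seed])
    seen = {seed}
    order = []
    while frontier:
        # Queue: oldest discovered link first. Stack: newest discovered link first.
        page = frontier.popleft() if breadth_first else frontier.pop()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return order

# Toy link graph: page -> pages it links to.
links = {"home": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}

print(crawl_order(links, "home", breadth_first=True))
print(crawl_order(links, "home", breadth_first=False))
```

Breadth first covers all pages one link away before going deeper; the stack-based variant follows one branch to its end before coming back.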
3. Document store server 100 pages/second
Indexing: How is it done?
1. parsing, to count hits
2. hit record
3. importance of record
Inverting indices: short barrel, full barrel; title, full text
docID
  wordID   No of hits   hit hit hit
  -------------------------------------
  wordID
  -------------------------------------
  wordID
4. Google's Architecture: 'Separation of concerns' __Server interface __index server __document server
GOOGLE WEB SERVER spell check
index server doc server Ad server
5. How a query is processed: 1. parse 2. words -> word IDs 3. seek
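The three steps can be sketched as follows, using a toy lexicon and inverted index. "Seek" is read here as looking up each word ID's document list; combining the lists by intersection (so that all words must appear) is an assumption of this sketch, as are the data and the name `process_query`.

```python
def process_query(query, lexicon, inverted):
    """Parse the query, map words to word IDs, then look up matching docIDs."""
    # 1. parse: split the query into lower-case words
    words = query.lower().split()
    # 2. words -> word IDs via the lexicon (None if the word is unknown)
    word_ids = [lexicon.get(w) for w in words]
    if not words or None in word_ids:
        return []
    # 3. seek: fetch the document list of each word ID
    doc_lists = [set(inverted.get(wid, [])) for wid in word_ids]
    # Keep only documents that contain every query word.
    return sorted(set.intersection(*doc_lists))

# Toy data, invented for this example.
lexicon = {"google": 0, "search": 1, "engine": 2}
inverted = {0: [1, 3], 1: [1, 2, 3], 2: [2, 3]}

print(process_query("Google search", lexicon, inverted))
print(process_query("search engine", lexicon, inverted))
```

Ranking is not shown; this only finds the candidate documents.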
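6. Ranking system
Ian Rogers: PageRank algorithm (skipped in detail for paucity of time). Google does not have a DB
PR(A) = (1-d) + d [PR(T1)/C(T1) + .... + PR(Tn)/C(Tn)], where T1..Tn are the pages linking to A, C(T) is the number of outgoing links on page T, and d is the damping factor. Just to give a sense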
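A small sketch of how the PageRank formula can be evaluated by repeated substitution: start every page at 1.0 and recompute PR(A) = (1-d) + d * sum(PR(T)/C(T)) until the values settle. The damping factor 0.85, the fixed iteration count and the toy link graphs are assumptions for illustration; every page is assumed to appear as a key in `links`.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T linking to A."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Each page T linking here passes on PR(T) split over its C(T) outlinks.
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Two pages linking only to each other.
pair = pagerank({"A": ["B"], "B": ["A"]})

# Three pages: A links to B and C, B links to C, C links back to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})

print(pair)
print(ranks)
```

In the three-page graph C collects rank from both A and B, so it ends up above B, which is fed only by half of A's rank.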
Prof. Mahabala: "How meshed locally the pages are"
7. Fault tolerance: Google's server down..... 10 bandwidth
Search strategy: size & efficiency; power economics; data center 150 W/sq ft; special cooling; parallel computing; branch mispredict
PART 3------
Words are implicitly ANDed (travel, travels, travelling, travelled); stop words are ignored
+LA = Los A; ~ synonyms; OR; *; ext: file extension. Refining query: date of crawl
PART 4------
HTML document writer (simple query interface); image, image annotations, audio & other media forms; Word, PDF, images (V3); HTML
explore the neighbourhood
3 Mooter.com software testing clustering personalisation
Quality Assurance
Resource Testing Tools Software Test
Which cluster definesiprofiling algorithm
Google PageRank Explained
cross language search; Desktop search; Semantic web; project iridium; MAC
