Number of In-links

We sorted the extracted child URLs by the number of times each occurred in our data set. The resulting ranking shows the most "popular" sites, as measured by the number of in-links observed. We removed several less interesting items from the list (e.g., interlinked genome databases); the remaining sites appear in the following table.
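The counting step described above can be sketched as follows. This is a minimal illustration, not the original tooling: it assumes the extracted links are available as (parent URL, child URL) pairs, and the function name `rank_by_in_links` and the sample URLs are hypothetical.

```python
from collections import Counter


def rank_by_in_links(links):
    """Return (child_url, in_link_count) pairs, most-linked-to first.

    `links` is assumed to be an iterable of (parent_url, child_url)
    pairs; this input format is an assumption, not the original one.
    """
    # Each occurrence of a child URL counts as one observed in-link.
    counts = Counter(child for _parent, child in links)
    return counts.most_common()


# Hypothetical sample data, for illustration only.
sample_links = [
    ("http://a.example/index.html", "http://b.example/"),
    ("http://a.example/page.html", "http://b.example/"),
    ("http://c.example/", "http://b.example/"),
    ("http://b.example/", "http://c.example/"),
]

ranking = rank_by_in_links(sample_links)
for url, n in ranking:
    print(f"{n:6d}  {url}")
```

Removing the less interesting items would then be a manual filter over this ranking.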

The in-link entries marked with (*) indicate sites that are highly self-referential. That is, on inspection these sites appear to contain a great number of links to their own top-level pages, so their counts are inflated by internal links. It would probably be more instructive to count only links from outside a given site.
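One way to count only links from outside a given site is sketched below. It is an illustration under stated assumptions rather than something done in this study: links are assumed to be (parent URL, child URL) pairs, two URLs are treated as belonging to the same site when their host names match, and the helper names `host_of` and `count_external_in_links` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_of(url):
    """Return the lower-cased host name of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def count_external_in_links(links):
    """Count in-links per child URL, skipping links from the same host.

    Treating "same host name" as "same site" is a simplifying
    assumption; a site spread over several hosts would still be
    counted as linking to itself from outside.
    """
    counts = Counter()
    for parent, child in links:
        if host_of(parent) != host_of(child):
            counts[child] += 1
    return counts


# Hypothetical sample: two self-referential links and one external link.
self_ref_links = [
    ("http://b.example/a.html", "http://b.example/"),
    ("http://b.example/c.html", "http://b.example/"),
    ("http://a.example/", "http://b.example/"),
]

external = count_external_in_links(self_ref_links)
print(external.most_common())
```

With this measure, a page linked mostly from its own site would drop in the ranking, while pages linked from many different hosts would keep their position.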

Most-linked-to URLs
Site                         Description                                                   In-links
www.xerox.com                Xerox PARC                                                   (*) 28188
www.yahoo.com                Yahoo                                                            19424
cool.infi.net                Cool Site of the Day                                             19028
hamsterix.funet.fi           Bible (in Finnish)                                           (*) 17243
sundarssrv2.cern.ch          CERN preprint service                                        (*) 16049
wings.buffalo.edu            Best of the Web '94                                              14685
wings.buffalo.edu            U.S. Gazetteer                                                   14369
www.ist.unige.it             Cell database                                                (*) 12750
home.netscape.com            Netscape Communications                                          12081
www.american.recordings.com  Ultimate Band List                                               11014
jasper.ora.com               Comprehensive TeX Archive Network                                10650
www.ibm.com                  IBM Corp.                                                        10617
www.informatik.uni-trier.de  Bibliography Server on Database Systems & Logic Programming  (*) 10212
siva.cshl.org                wusage 3.2 (WWW usage statistics)                                 9038
curly.cc.utexas.edu          Jane Austen's Pride & Prejudice                              (*)  8928
www.starwave.com             StarWave                                                          8721
allison.clark.net            Rob & Jen's Genealogy Page                                   (*)  8476
helios.jicst.go.jp           Japan Information Center of Science and Technology                8331
neoteny.eccosys.com          NetSurf mailing list                                         (*)  8036