Domain | # of HTML Documents | % of Total |
---|---|---|
other | 1064318 | 41% |
com | 516709 | 20% |
edu | 698616 | 27% |
gov | 117125 | 4% |
net | 113595 | 4% |
mil | 14734 | 1% |
org | 89939 | 3% |
total | 2615036 | 100% |
Here, "other" includes every domain outside the six top-level domains listed in the table. For example, "other" contains all non-US top-level domains (such as Germany's .de).
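The bucketing and the percentage column can be reproduced with a few lines of Python. This is a minimal sketch, not the procedure actually used for the study: it assumes each document is identified by a URL and that the bucket is decided purely by the last label of the host name. The names `NAMED_TLDS`, `domain_bucket`, `percent_shares`, and `TABLE_COUNTS` are hypothetical and introduced only for illustration; the counts are copied from the table.

```python
from urllib.parse import urlsplit

# The six top-level domains reported individually in the table.
NAMED_TLDS = {"com", "edu", "gov", "net", "mil", "org"}


def domain_bucket(url: str) -> str:
    """Return the table row a URL falls into (assumption: last host label decides)."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    tld = host.rsplit(".", 1)[-1].lower()
    # Country-code domains, numeric IP hosts, and anything else land in "other".
    return tld if tld in NAMED_TLDS else "other"


def percent_shares(counts: dict[str, int]) -> dict[str, int]:
    """Convert per-domain document counts to whole-number percentages of the total."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


# Document counts copied from the table.
TABLE_COUNTS = {
    "other": 1064318,
    "com": 516709,
    "edu": 698616,
    "gov": 117125,
    "net": 113595,
    "mil": 14734,
    "org": 89939,
}

if __name__ == "__main__":
    print(domain_bucket("http://www.cs.berkeley.edu/index.html"))
    print(domain_bucket("http://www.uni-stuttgart.de/"))
    print(sum(TABLE_COUNTS.values()), percent_shares(TABLE_COUNTS))
```

Under these assumptions the counts sum to the reported total of 2615036, and rounding each share to the nearest whole percent gives the figures in the last column.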
We analyzed a variety of properties of these documents. In this paper, we present results on the following: