Web server log file analysis

 

Date: 2004.09.30

After neglecting the issue for a long time, today (2004.09.30) I finally set up a web log analyser called AWStats on some of the sites I maintain. I started with my own site, szabgab.com, with obviously very low expectations regarding the number of visitors. After all, I launched the site only a few weeks ago and it does not have much content yet.

Indeed, there were only a handful of page views on the site. I was happy to see that a number of search engines have already looked at the site, so I can expect it to be indexed. Actually, I noticed a few days ago that searching on Google for Gabor Szabo already shows this site as the third result, with Perl Training Israel as the fourth.

Among the few page views, my own visits to check how the site functions obviously make up a relatively large part, but even in this small sample I can notice some interesting things. While 41.1% of the hits come from Windows machines, only 21.2% of the hits are generated by Internet Explorer. That is, roughly half of the hits from Windows are made with alternative browsers.

perl.org.il

Next I ran the analyser on the web site of the Israeli Perl Mongers.

The first thing I noticed was that the single biggest visitor generated 9132 page views, 15 times more (!) than the second one. As the hostname of that visitor is p4225-ipad05fukuhanazo.fukushima.ocn.ne.jp, somewhere in Japan, it seems obvious that it is doing something bad to the site. I quickly denied access from this address.
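A visitor like this can also be spotted directly in the raw access log, without the analyser. The following is only a rough sketch, assuming the log is in the Common or Combined Log Format, where the client host is the first field of each line. It counts raw requests rather than what the analyser calls page views, and the function name and the sample lines are made up for illustration.

```python
from collections import Counter


def top_hosts(log_lines, n=2):
    """Return the n hosts with the most requests as (host, count) pairs."""
    # Assumption: Common/Combined Log Format, so the client host is the
    # first whitespace-separated field of every non-empty line.
    counts = Counter(
        line.split(None, 1)[0] for line in log_lines if line.strip()
    )
    return counts.most_common(n)


# Made-up sample lines, only for illustration.
sample = [
    'crawler.example.net - - [30/Sep/2004:10:00:00 +0200] "GET / HTTP/1.0" 200 512',
    'crawler.example.net - - [30/Sep/2004:10:00:01 +0200] "GET /a.html HTTP/1.0" 200 512',
    'crawler.example.net - - [30/Sep/2004:10:00:02 +0200] "GET /b.html HTTP/1.0" 200 512',
    'reader.example.org - - [30/Sep/2004:10:05:00 +0200] "GET / HTTP/1.0" 200 512',
]

(first_host, first_count), (second_host, second_count) = top_hosts(sample)
ratio = first_count / second_count
print(f"{first_host}: {first_count} requests, {ratio:.1f}x the runner-up")
```

A large ratio between the first and the second host is only a hint that something is wrong. A proxy serving many real users could look the same.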

The next thing to notice was that Google hit the site 137,351 times in September 2004, using 600 MB of data transfer. In terms of bandwidth this is 3 times more than the total data transfer of all the visitors in this period. This does not seem right either.

Next I looked at the list of Operating Systems generating hits. Actually, why hits ? I think I'd be much more interested in the distribution of the visitors and not of the hits. Anyway, this is what we get:

  1. Windows 71.2 %
  2. Linux 16.8 %
  3. Unknown 7.8 % - Where does this come from ?
  4. FreeBSD 1.7 %
  5. Macintosh 1.6 %
  6. ...

This is only slightly surprising, as among Perl programmers there are a lot more people using non-MS Operating Systems than in the general public. Still, having only 71% seems very low. Looking at the Browsers gave me a bigger surprise:

  1. MS Internet Explorer 57.1 %
  2. Firefox 15 %
  3. Mozilla 11 %
  4. Unknown 4.7 % - Where does this come from ?
  5. Opera 3.4 %
  6. Konqueror 3.1 %
  7. Netscape 1.6 %
  8. ...

OK, so from the previous numbers we could already guess that IE could not generate more than 71.2 % of the hits, but how come it is only 57.1 % ? This means that a relatively large part of the visitors, at least 14.1 % of all the hits, while still using Windows have already moved away from Internet Explorer and are using one of the alternative, mostly Open Source, browsers.
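The arithmetic behind this is simple enough to write down. A minimal sketch, using only the percentages quoted above: whatever share of the IE hits comes from Windows, the Windows hits made with other browsers cannot be fewer than the Windows share minus the IE share. The function name is my own, and the "Unknown" categories are ignored here.

```python
def min_windows_non_ie_share(windows_pct, ie_pct):
    """Lower bound, in percent of all hits, on Windows hits not made by IE.

    The bound is exact if every IE hit comes from a Windows machine and
    is an underestimate if some IE hits come from other systems.
    """
    return windows_pct - ie_pct


# Percentages taken from the statistics quoted in the text.
sites = {
    "szabgab.com": (41.1, 21.2),
    "perl.org.il": (71.2, 57.1),
}

for name, (windows_pct, ie_pct) in sites.items():
    bound = min_windows_non_ie_share(windows_pct, ie_pct)
    print(
        f"{name}: at least {bound:.1f}% of all hits, "
        f"or {bound / windows_pct:.1%} of the Windows hits"
    )
```

For perl.org.il this gives roughly a fifth of the Windows hits, and for szabgab.com roughly half.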

So where are these visitors coming from ?

An enormous number, some 32 % of the visitors, came from search engines. Among them Google was the clear winner, bringing 30% (or 9407) of our total visitors. So we might as well forgive the Google robots for eating so much of our bandwidth.

Among the individual referrer sites YAPC.org leads the pack, immediately followed by a number of porn sites ! I don't understand this one. I am sure, even without looking at their content, that this is what they are and that they don't have a link to perl.org.il. So why are they listed as referrers ? Looking at the real log files, all these entries are generated by a Mozilla/5.0 (compatible; Konqueror/2.2.2; Linux 2.2.19; running on a single host. They all fetch the main page of the site. The only explanation I can come up with is that by artificially injecting their addresses into our log file in large numbers, the site owners want to make us curious enough to visit their sites. Sounds very lame.
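The pattern I found by hand, many hits with the same referrer all coming from one host, could also be searched for automatically. This is only a sketch of one possible heuristic, assuming the log uses the Combined Log Format. The regular expression, the function name, the threshold and the sample data are my own and not something the analyser provides.

```python
import re
from collections import defaultdict

# Assumed Combined Log Format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def single_source_referrers(log_lines, min_hits=3):
    """Return referrers seen at least min_hits times, always from one host."""
    hosts = defaultdict(set)
    hits = defaultdict(int)
    for line in log_lines:
        m = LINE_RE.match(line)
        if not m or m["referrer"] in ("", "-"):
            continue  # unparsable line or no referrer given
        hosts[m["referrer"]].add(m["host"])
        hits[m["referrer"]] += 1
    return sorted(
        ref for ref in hits if hits[ref] >= min_hits and len(hosts[ref]) == 1
    )


def make_line(host, referrer, agent="ExampleBrowser/1.0"):
    """Build a made-up log line for the demonstration below."""
    return (
        f'{host} - - [30/Sep/2004:12:00:00 +0200] "GET / HTTP/1.0" '
        f'200 1024 "{referrer}" "{agent}"'
    )


# Made-up data: one host repeating a referrer, three hosts sharing another.
sample = [make_line("192.0.2.7", "http://spam.example/")] * 3 + [
    make_line("198.51.100.1", "http://yapc.example/"),
    make_line("198.51.100.2", "http://yapc.example/"),
    make_line("198.51.100.3", "http://yapc.example/"),
]

print(single_source_referrers(sample))
```

The idea is that a real link on another site tends to bring visitors from several different hosts, while an injected referrer keeps arriving from the same one. A single enthusiastic reader would be flagged too, so the result is a list to look at, not a verdict.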

Perl Training Israel

The next site to look at was that of my Perl training company. There were no big surprises here, but I found out that there are a number of bad pages on the site giving 404 errors. Now, using these statistics and the error.log, I can clean up the site and decide what to do about the visitors who try to reach pages that never existed.
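A list of the missing pages can also be pulled straight from the access log. A minimal sketch, assuming the Common or Combined Log Format, in which the quoted request is followed by the status code. The regular expression, the function name and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it,
# e.g. "GET /page.html HTTP/1.0" 404
REQUEST_STATUS_RE = re.compile(
    r'"[A-Z]+ (?P<path>[^\s"]+)[^"]*" (?P<status>\d{3}) '
)


def missing_pages(log_lines):
    """Return (path, count) pairs for requests answered with 404."""
    counts = Counter()
    for line in log_lines:
        m = REQUEST_STATUS_RE.search(line)
        if m and m["status"] == "404":
            counts[m["path"]] += 1
    return counts.most_common()  # most frequently requested first


# Made-up sample lines, only for illustration.
sample = [
    '192.0.2.1 - - [30/Sep/2004:09:00:00 +0200] "GET /courses.html HTTP/1.0" 200 2048',
    '192.0.2.2 - - [30/Sep/2004:09:01:00 +0200] "GET /old-page.html HTTP/1.0" 404 209',
    '192.0.2.3 - - [30/Sep/2004:09:02:00 +0200] "GET /old-page.html HTTP/1.0" 404 209',
    '192.0.2.4 - - [30/Sep/2004:09:03:00 +0200] "GET /typo.html HTTP/1.0" 404 209',
]

for path, count in missing_pages(sample):
    print(f"{count:4d}  {path}")
```

Sorting by count puts the pages worth fixing first. Paths requested only once are more likely to be typos or probes.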

H24, a commercial site

The same statistics for H24, one of my clients, show a totally different picture. On their site Windows is 99.4 % and Internet Explorer is 99.1 %.

This isn't surprising either. First of all, they cater for the general public and mostly advertise on news/fun sites. Besides, part of the site currently works with IE only. We are working on fixing this, but I expect some difficulties convincing them. They can look at their own statistics and say: for less than 1% we should not invest anything.

MSNBot vs Google

I read about a conspiracy theory claiming that MSNBot generates so many hits in order to change the way search results about Linux show up. Judging by these statistics, MSNBot is just plain greedy. On the H24 site, in the last couple of days we got about 2500 hits from MSNBot but only 340 from Googlebot. And it is definitely not a pro-Linux site.
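A comparison like this can be repeated on any log by counting the hits per crawler. The following is a rough sketch, assuming the Combined Log Format, where the User-Agent is the last quoted field, and assuming that the two crawlers identify themselves with the substrings "msnbot" and "Googlebot". The function name and the sample lines are made up.

```python
from collections import Counter

# Assumed substrings by which the crawlers identify themselves.
BOTS = ("msnbot", "Googlebot")


def bot_hits(log_lines, bots=BOTS):
    """Count hits per crawler, matched case-insensitively in the User-Agent."""
    counts = Counter()
    for line in log_lines:
        # Assumption: Combined Log Format, User-Agent is the last quoted field.
        parts = line.rstrip().rsplit('"', 2)
        if len(parts) != 3:
            continue  # no quoted User-Agent field on this line
        agent = parts[1].lower()
        for bot in bots:
            if bot.lower() in agent:
                counts[bot] += 1
    return counts


# Made-up sample lines, only for illustration.
sample = [
    '192.0.2.10 - - [30/Sep/2004:08:00:00 +0200] "GET / HTTP/1.0" 200 900 "-" "msnbot/0.3"',
    '192.0.2.10 - - [30/Sep/2004:08:00:05 +0200] "GET /a HTTP/1.0" 200 900 "-" "msnbot/0.3"',
    '192.0.2.20 - - [30/Sep/2004:08:01:00 +0200] "GET / HTTP/1.0" 200 900 "-" "Googlebot/2.1"',
    '192.0.2.30 - - [30/Sep/2004:08:02:00 +0200] "GET / HTTP/1.0" 200 900 "-" "ExampleBrowser/1.0"',
]

for bot, count in bot_hits(sample).most_common():
    print(f"{bot}: {count} hits")
```

Since anyone can send any User-Agent string, these counts show what the clients claim to be, not necessarily what they are.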

Other web statistics

Knowing that Perl programmers use free Operating Systems and Browsers more than the average, this wasn't a surprise, though the rate was interesting. Anyway, I decided this was a good time to look at similar information from other sources. The first hit on Google led me to the browser stats of w3schools.
According to their statistics, the numbers on our Perl-related site are not that far off the general ones.

Operating Systems
  1. Windows: 90.2 %
  2. Linux: 3.1 %
  3. Macintosh: 2.6 %
Browsers
  1. Internet Explorer (5+6): 75.8%
  2. Mozilla: 16.9 %
  3. Opera: 2.3 %
  4. Netscape (3+4+7): 1.7%
(data from September 2004)
Here too we can see interesting things:
  1. At least 14% of the users (90.2 % - 75.8 %) use Windows but not Internet Explorer
  2. Windows usage is much lower than reported earlier

The only analysis the above site provides is that Windows XP and IE 6 are the leading OS and Browser. What they fail to mention, but what can be seen from their data, is the trend. A year earlier, in September 2003, IE (5+6) had a market share of 86.6% while the various versions of Windows had 92.6 %. Both are declining: IE by 10.8 percentage points and Windows by 2.4.

Last Update: Tue Sep 25 17:06:26 2007