Web server log file analysis
Date: 2004.09.30
After a long time of neglecting the issue, today (2004.09.30) I finally set up
a web log analyser called awstats on some of the sites I am maintaining.
I started with my own site, szabgab.com, with, obviously, very low expectations
regarding the number of visitors. After all, I launched the site only a few weeks
ago and it does not have a lot of content yet.
Indeed, there were only a handful of page views on the site. I was happy to notice that a
number of search engines had already visited the site, so I can expect it to be
indexed. In fact, a few days ago I noticed that searching Google for "Gabor Szabo"
already shows this site as the third result, with Perl Training Israel fourth.
Among the few page views, my own visits to check how the site works obviously make up a relatively
large part, but even in this small sample I can notice some interesting things. While 41.1% of the
hits come from Windows machines, only 21.2% of the hits are generated by Internet Explorer.
That is, roughly half of the hits from Windows come from alternative browsers.
perl.org.il
Next I ran the analyser on the web site of the Israeli Perl Mongers.
The first thing I noticed was that the single biggest visitor generated 9132 page views, 15 times more
than the second one! As the hostname of that visitor is p4225-ipad05fukuhanazo.fukushima.ocn.ne.jp,
somewhere in Japan, it seemed obvious that it was doing something bad to the site. I quickly denied
access from this address.
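awstats does this counting for you, but the same check can be run against the raw access log. Below is a minimal sketch, assuming the log is in Apache's Common Log Format. The helper names, the list of "non-page" file extensions and the sample lines are all made up for illustration, and awstats' own rules for what counts as a page view may differ.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "([^"]*)" (\d{3}) (\S+)')

# Hypothetical rule: requests for these files are hits, not page views
NON_PAGE = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")

def is_page(request):
    """True if the request line ("GET /path HTTP/1.1") asks for a page."""
    parts = request.split()
    if len(parts) < 2:
        return False
    path = parts[1].split("?", 1)[0].lower()
    return not path.endswith(NON_PAGE)

def page_views_per_host(lines):
    """Count page views per client host, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and is_page(m.group(2)):
            counts[m.group(1)] += 1
    return counts

def dominant_host(counts):
    """Return (host, views, ratio to the runner-up) for the busiest host."""
    ranked = counts.most_common(2)
    if len(ranked) < 2:
        raise ValueError("need at least two hosts to compare")
    (top, n1), (_, n2) = ranked
    return top, n1, n1 / n2

sample = [
    'busy.example - - [30/Sep/2004:10:00:00 +0200] "GET / HTTP/1.1" 200 512',
    'busy.example - - [30/Sep/2004:10:00:01 +0200] "GET /about.html HTTP/1.1" 200 300',
    'busy.example - - [30/Sep/2004:10:00:02 +0200] "GET /logo.gif HTTP/1.1" 200 900',
    'quiet.example - - [30/Sep/2004:10:05:00 +0200] "GET / HTTP/1.1" 200 512',
]
print(dominant_host(page_views_per_host(sample)))  # ('busy.example', 2, 2.0)
```

A ratio like the 15 seen on perl.org.il stands out immediately in such a listing, although deciding whether the host is actually misbehaving still takes a human look at what it requested.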
The next thing to notice was that Google hit the site 137,351 times in September 2004, using 600 MB of data
transfer. In terms of bandwidth, this is 3 times the total data transfer of all the visitors in
the same period. This does not seem right either.
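To double-check such a number outside awstats, one can sum the response sizes per User-Agent. The sketch below assumes Apache's Combined Log Format (Common Log Format plus the quoted referrer and User-Agent) and assumes Google's crawler can be recognised by the token "Googlebot" in its User-Agent. Keep in mind that any client can fake that string. The function name and sample lines are illustrative only.

```python
import re
from collections import defaultdict

# Combined Log Format: CLF fields followed by "referrer" "user-agent"
COMBINED = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) (\S+) "([^"]*)" "([^"]*)"'
)

def bytes_by_agent_class(lines, crawler_tokens=("Googlebot",)):
    """Sum response bytes for crawler requests vs. everything else.
    A request is a crawler hit if its User-Agent contains any of the tokens."""
    totals = defaultdict(int)
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue
        size = m.group(3)
        nbytes = 0 if size == "-" else int(size)  # "-" means no body was sent
        agent = m.group(5)
        key = "crawler" if any(t in agent for t in crawler_tokens) else "other"
        totals[key] += nbytes
    return dict(totals)

sample = [
    '192.0.2.1 - - [30/Sep/2004:10:00:00 +0200] "GET / HTTP/1.1" 200 3000 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '198.51.100.7 - - [30/Sep/2004:10:01:00 +0200] "GET / HTTP/1.1" 200 1000 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [30/Sep/2004:10:01:05 +0200] "GET /news.html HTTP/1.1" 304 - "-" "Mozilla/5.0"',
]
totals = bytes_by_agent_class(sample)
print(totals["crawler"] / totals["other"])  # 3.0
```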
Next, the list of operating systems generating hits. (Actually, why hits? I would be much more interested
in the distribution of the visitors, not the hits.) Anyway, this is what we get:
- Windows 71.2%
- Linux 16.8%
- Unknown 7.8% (where does this come from?)
- FreeBSD 1.7%
- Macintosh 1.6%
- ...
This is only slightly surprising, as there are many more people using non-Microsoft operating systems
among Perl programmers than among the general public. Still, only 71% Windows seems very low. Looking
at the browsers gave me a bigger surprise:
- MS Internet Explorer 57.1%
- Firefox 15%
- Mozilla 11%
- Unknown 4.7% (where does this come from?)
- Opera 3.4%
- Konqueror 3.1%
- Netscape 1.6%
- ...
OK, from the previous numbers we could already guess that IE could not generate more than
71.2% of the hits, but how come it is only 57.1%? This means that a relatively large part of
the visitors, roughly one in five Windows hits, still use Windows but have already moved away
from Internet Explorer, mostly to one of the Open Source browsers.
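The reasoning above is simple arithmetic on two percentages. Here is a small sketch of it, applied to the awstats figures quoted for szabgab.com and perl.org.il. Like the text, it assumes that every IE hit comes from a Windows machine. That is only an approximation, since IE also existed for the Mac. The function names are hypothetical.

```python
def windows_without_ie(windows_pct, ie_pct):
    """Percentage points of all hits from Windows machines not using IE.
    Assumes (as the text does) that every IE hit comes from Windows."""
    return windows_pct - ie_pct

def ie_share_of_windows(windows_pct, ie_pct):
    """Fraction of the Windows hits made with IE, under the same assumption."""
    return ie_pct / windows_pct

# Figures reported by awstats, quoted above
for site, windows, ie in [("szabgab.com", 41.1, 21.2), ("perl.org.il", 71.2, 57.1)]:
    gap = windows_without_ie(windows, ie)
    share = ie_share_of_windows(windows, ie)
    print(f"{site}: {gap:.1f} points of all hits are Windows without IE; "
          f"IE made {share:.0%} of the Windows hits")
```

For perl.org.il, IE accounts for about 80% of the Windows hits, so roughly one Windows hit in five comes from another browser. For szabgab.com it is close to half.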
So where are these visitors coming from?
An enormous number, some 32% of the visitors, came from search engines. Google was the clear
winner among them, bringing 30% (9407) of our total visitors, so we might as well forgive the
Google robots for eating so much of our bandwidth. Among the individual referrer sites,
YAPC.org leads the pack, immediately followed by a number of porn sites! I don't understand
this one. Even without looking at their content, I am sure that these are such sites and that
they don't link to perl.org.il.
So why are they listed as referrers? Looking at the actual log files, all these entries are
generated by a "Mozilla/5.0 (compatible; Konqueror/2.2.2; Linux 2.2.19;" client running on a
single host, and they all fetch the main page of the site. The only explanation I can come up
with is that by injecting their addresses into our log file in large numbers, the site owners
want to make us curious enough to visit their sites. Sounds very lame.
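The pattern described above can be turned into a rough filter. The sketch below flags referring sites whose hits all come from a single client host and all request the front page "/". This is a heuristic made up for illustration, not an awstats feature. It assumes the Combined Log Format, and the `min_hits` threshold is arbitrary. A single real visitor who reloads a page a few times could also trigger it.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Combined Log Format: host ident user [date] "METHOD path proto" status bytes "referrer" "agent"
COMBINED = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"'
)

def suspicious_referrers(lines, min_hits=3):
    """Return referring sites whose hits all come from one client host
    and all request "/". min_hits is an arbitrary, hypothetical threshold."""
    clients = defaultdict(set)  # referring site -> client hosts seen
    paths = defaultdict(set)    # referring site -> paths requested
    hits = defaultdict(int)     # referring site -> number of requests
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue
        client, path, referrer = m.groups()
        if referrer in ("", "-"):
            continue
        site = urlsplit(referrer).hostname or referrer
        clients[site].add(client)
        paths[site].add(path)
        hits[site] += 1
    return sorted(
        site for site in hits
        if hits[site] >= min_hits and len(clients[site]) == 1 and paths[site] == {"/"}
    )

# Made-up sample: one host "referred" five times, one real link used by three visitors
sample = [
    f'203.0.113.9 - - [30/Sep/2004:11:00:0{i} +0200] "GET / HTTP/1.1" 200 512 "http://spam.example/" "Mozilla/5.0"'
    for i in range(5)
] + [
    f'198.51.100.{i} - - [30/Sep/2004:12:00:00 +0200] "GET /events.html HTTP/1.1" 200 800 "http://conference.example/links" "Mozilla/5.0"'
    for i in range(1, 4)
]
print(suspicious_referrers(sample))  # ['spam.example']
```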
Perl Training Israel
The next site to look at was the site of my Perl Training Company.
There were no big surprises here, but I found out that a number of pages on the site give
404 errors. Using these statistics and the error.log, I'll clean up the site and decide what to
do with the random visitors who try to reach pages that never existed.
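A list of the missing pages, ordered by how often they are requested, is a practical starting point for such a clean-up. Here is a minimal sketch, assuming Common Log Format access logs. The function name and sample paths are hypothetical.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "METHOD path proto" status bytes
CLF = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*" (\d{3}) ')

def missing_pages(lines):
    """Return (path, count) pairs for requests answered with 404, most frequent first."""
    counts = Counter()
    for line in lines:
        m = CLF.match(line)
        if m and m.group(2) == "404":
            counts[m.group(1)] += 1
    return counts.most_common()

sample = [
    '192.0.2.1 - - [30/Sep/2004:09:00:00 +0200] "GET /courses.html HTTP/1.0" 200 4000',
    '192.0.2.2 - - [30/Sep/2004:09:01:00 +0200] "GET /old/syllabus.html HTTP/1.0" 404 210',
    '192.0.2.3 - - [30/Sep/2004:09:02:00 +0200] "GET /old/syllabus.html HTTP/1.0" 404 210',
    '192.0.2.4 - - [30/Sep/2004:09:03:00 +0200] "GET /no-such-page HTTP/1.0" 404 210',
]
print(missing_pages(sample))  # [('/old/syllabus.html', 2), ('/no-such-page', 1)]
```

Paths that are requested again and again usually point to an old link worth fixing or redirecting. One-off requests are more likely the random guesses mentioned above.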
H24, a commercial site
Similar statistics at H24, one of my clients, show a totally different picture.
On their site, Windows accounts for 99.4% and Internet Explorer for 99.1%.
This isn't surprising either. First of all, they cater to the general public and advertise mostly on
news and fun sites. Besides, part of the site currently works with IE only. We are working on fixing this, but
I expect some difficulty convincing them: they can look at their own statistics and say, "We should not
invest anything for less than 1%."
MSNBot vs Google
I read about a possible conspiracy theory claiming that MSNBot generates many hits
in order to change the way search results about Linux show up.
Judging from these statistics, MSNBot is just plain greedy. On the H24 site, in the
last couple of days, we got about 2500 hits from MSNBot but only 340 from Googlebot, and H24 is definitely
not a pro-Linux site.
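Comparing crawlers this way only requires counting hits by User-Agent. The sketch below assumes Combined Log Format lines, where the User-Agent is the last quoted field. It also assumes that the two crawlers identify themselves with the tokens "msnbot" and "Googlebot", matched case-insensitively. The User-Agent strings in the sample are made up, and real ones can be spoofed.

```python
import re

AGENT = re.compile(r'"([^"]*)"$')  # last quoted field of a combined-format line

def bot_hits(lines, bots=("msnbot", "Googlebot")):
    """Count hits per crawler by a case-insensitive substring match on the User-Agent."""
    counts = {bot: 0 for bot in bots}
    for line in lines:
        m = AGENT.search(line.rstrip("\n"))
        if not m:
            continue
        agent = m.group(1).lower()
        for bot in bots:
            if bot.lower() in agent:
                counts[bot] += 1
                break  # count each hit for at most one crawler
    return counts

sample = [
    '192.0.2.10 - - [28/Sep/2004:08:00:00 +0200] "GET / HTTP/1.0" 200 100 "-" "msnbot (sample agent)"',
    '192.0.2.10 - - [28/Sep/2004:08:00:05 +0200] "GET /a HTTP/1.0" 200 100 "-" "msnbot (sample agent)"',
    '192.0.2.20 - - [28/Sep/2004:09:00:00 +0200] "GET / HTTP/1.1" 200 100 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.30 - - [28/Sep/2004:09:30:00 +0200] "GET / HTTP/1.1" 200 100 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]
print(bot_hits(sample))  # {'msnbot': 2, 'Googlebot': 1}
```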
Other web statistics
Knowing that Perl programmers use free operating systems and browsers more than average wasn't a
surprise, though the rate was interesting. Anyway, I decided this was a good time to look at
similar information from other sources. The first hit on Google led me to the
browser stats of w3schools.
According to their statistics, the numbers on our Perl-related site are not that far off the general
numbers.
Operating Systems
- Windows: 90.2%
- Linux: 3.1%
- Macintosh: 2.6%
Browsers
- Internet Explorer (5+6): 75.8%
- Mozilla: 16.9%
- Opera: 2.3%
- Netscape (3+4+7): 1.7%
(data from September 2004)
Here too we can see interesting things:
- About 14% of the users (90.2% minus 75.8%) use Windows but not Internet Explorer
- Windows usage is much lower than reported earlier
The only analysis the above site provides is that Windows XP and IE 6 are the leading OS and
browser. What they fail to mention, but what can be seen from their data, is the trend:
a year earlier, in September 2003, IE (5+6) had a market share of 86.6%, while the various
versions of Windows had 92.6%. Both are declining, and IE is declining much faster.
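The size of the trend is easy to quantify from the two w3schools snapshots. Here is a small sketch of that arithmetic. The helper names are hypothetical. The last line again assumes that IE runs only on Windows.

```python
def change_points(old, new):
    """Absolute change, in percentage points."""
    return new - old

def relative_change(old, new):
    """Change relative to the old value (e.g. -0.1 means a 10% drop)."""
    return (new - old) / old

# w3schools figures quoted above, in percent: (September 2003, September 2004)
shares = {
    "Windows": (92.6, 90.2),
    "IE (5+6)": (86.6, 75.8),
}
for name, (old, new) in shares.items():
    print(f"{name}: {change_points(old, new):+.1f} points, "
          f"{relative_change(old, new):+.1%} relative")

# Users on Windows but not IE (assumes IE runs only on Windows)
print(f"2003: {92.6 - 86.6:.1f}%, 2004: {90.2 - 75.8:.1f}%")
```

In one year, IE lost about 10.8 points (roughly an eighth of its share) while Windows lost about 2.4, and the group using Windows without IE grew from about 6% to about 14%.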