Journal

An inexact science

Posted: 1st July 1999 | Filed under: Press Articles, Technical
Author: Paul Ockenden
First Appeared in PC Pro 1999

While we're on the subject of hits and page impressions, we've pointed out before that many of the tools used for analysing server log files are, how shall we put this politely, somewhat suspect in quality. They're all pretty good at counting how many hits your site has received, but read back through our previous columns to see just how meaningless such raw figures are. There's a trick we've seen used by certain 'non-reputable' Web agencies that involves (at vast expense to their clients, of course) redesigning a Web site so that almost exactly the same content will generate four or five times as many hits from the same number of visitors. The trick involves slicing images, splitting up pages, and other ploys that make people more likely to activate a link. If the client is naive enough, they go away happy, thinking that their revamped site has suddenly become more popular. Oh, and of course the agency people will be able to pay themselves big bonuses from the extra cash they've just extracted from the client. But watch out agencies - our email suggests that more and more clients are reading PC Pro and this column!

Almost as meaningless as hits are what most software products describe as page impressions, which are nothing more than a count of all requests for anything with an HTM, HTML, PL or ASP file extension. Again the less scrupulous agencies have realised this and carefully redesigned their sites to include the use of lots of frames, so that they can greatly increase their numbers of so-called page impressions. Add a few extra navigation-layer clicks too, and suddenly you're looking at three or four times as many impressions as before. Nice work if you can get it!
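To see why frames inflate the figures, here's a minimal sketch of the extension-counting approach most analysis tools take. The log lines are hypothetical examples in the common log format, and `count_impressions` is our own illustrative name, not any real product's code:

```python
import re

# Hypothetical log lines: one visitor loading one frameset page.
LOG_LINES = [
    '194.70.234.203 - - [01/Jul/1999:10:00:00 +0100] "GET /index.html HTTP/1.0" 200 1043',
    '194.70.234.203 - - [01/Jul/1999:10:00:01 +0100] "GET /nav_frame.html HTTP/1.0" 200 512',
    '194.70.234.203 - - [01/Jul/1999:10:00:01 +0100] "GET /main_frame.html HTTP/1.0" 200 2048',
    '194.70.234.203 - - [01/Jul/1999:10:00:01 +0100] "GET /logo.gif HTTP/1.0" 200 4096',
]

# The extensions most tools treat as 'pages'.
PAGE_EXTENSIONS = ('.htm', '.html', '.pl', '.asp')

def count_impressions(lines):
    """The naive method: any request whose path ends in a
    page-like extension counts as one page impression."""
    count = 0
    for line in lines:
        match = re.search(r'"GET (\S+) HTTP', line)
        if match and match.group(1).lower().endswith(PAGE_EXTENSIONS):
            count += 1
    return count

# The frameset plus its two frame documents each count, so one
# person viewing one page registers as three 'page impressions'.
print(count_impressions(LOG_LINES))  # 3
```

Split the same content across yet more frames and the figure climbs again, while the number of actual visitors stays exactly the same.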

Luckily, industry bodies such as ABC electronic (the Web site arm of the Audit Bureau of Circulations) are now insisting on a much tighter definition of a page impression, namely 'A file or combination of files sent to a user as a result of that user's request being received by the server'. In other words, if a multiframed page is requested, it will only count as a single page impression. We'll be taking a look at the process of preparing your site for an ABC electronic audit, and what that audit actually involves, in a few months' time.

Having just trashed hits and page impressions as measures of success, it would be only fair to do the same for typical visit and visitor calculations, which are often based on looking at the requests coming from a particular IP address. The problem is that many ISPs allocate IP addresses dynamically, so today I might be 194.70.234.203 but tomorrow I could be 194.70.234.204 and an analysis tool that uses IP addresses to detect visitors would then think I was two people. The reverse also happens - thousands of people might be hidden behind a corporate firewall with one IP address.
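Both failure modes are easy to demonstrate. In this sketch (the requests and names are invented for illustration), one dial-up user appears under two addresses while three office workers share a single firewall address:

```python
# Hypothetical requests as (ip_address, path) pairs.
requests = [
    ("194.70.234.203", "/index.html"),     # Paul, Monday's dial-up IP
    ("194.70.234.204", "/index.html"),     # the same Paul, Tuesday's IP
    ("10.0.0.1", "/index.html"),           # employee A, behind the firewall
    ("10.0.0.1", "/products.html"),        # employee B, same firewall IP
    ("10.0.0.1", "/contact.html"),         # employee C, same firewall IP
]

def naive_visitor_count(reqs):
    """'Visitors' counted as distinct IP addresses - the approach
    many log-analysis tools take."""
    return len({ip for ip, _ in reqs})

# Four real people, but the tool reports three 'visitors': Paul is
# counted twice and the three employees are merged into one.
print(naive_visitor_count(requests))  # 3
```

The two errors can even partially cancel out, which makes the final number look plausible while being wrong in both directions at once.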

We can help things along by using cookies to track users. Cookies are little bits of info that the Web server can send to the browser, and then check whenever that user returns to the site. Unfortunately, many users don't trust cookies and disable them permanently, while other people have a habit of clearing their cookie list whenever they're tidying up their hard drive. Again, this leads to miscounting repeat visits. Also, if the same person accesses a site from their work desktop and again from a home PC, they will show up as two distinct visitors, even if cookies are enabled. We then come to the problem of caching: the presence of either a Web browser's own cache, or else a proxy server somewhere between the visitor and the site, will mean that many visits don't touch the server at all, as the data will be reconstructed from the cache. We don't have a hope of recording such traffic.
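The cookie mechanism itself is simple enough to sketch. This is an illustrative outline, not production code, and the `visitor_id` cookie name is our own invention:

```python
import uuid
from http import cookies

def handle_request(request_cookie_header):
    """If the browser sent back our visitor cookie, treat this as a
    repeat visit; otherwise mint a fresh ID and ask the browser to
    store it via a Set-Cookie response header."""
    jar = cookies.SimpleCookie(request_cookie_header or "")
    if "visitor_id" in jar:
        return jar["visitor_id"].value, None    # repeat visitor

    new_id = uuid.uuid4().hex
    jar = cookies.SimpleCookie()
    jar["visitor_id"] = new_id
    jar["visitor_id"]["max-age"] = 60 * 60 * 24 * 365  # keep for a year
    return new_id, jar.output(header="Set-Cookie:")

# First visit: no cookie arrives, so the server issues one.
vid, set_cookie = handle_request(None)

# Return visit: the browser echoes the cookie back, and we recognise
# the visitor - unless they've cleared their cookies in the meantime,
# in which case they get a fresh ID and count as a brand-new visitor.
vid2, _ = handle_request("visitor_id=" + vid)
print(vid2 == vid)  # True
```

Note that everything the article warns about still applies: a cleared cookie file, a second PC, or a browser with cookies disabled all defeat the scheme.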

It's all a rather sorry state of affairs. The problem is that the Internet - and the World Wide Web in particular - was never designed as an advertising medium and as such there isn't any effective, built-in mechanism for measuring traffic.

So why do we even bother trying? Well, for two reasons. First, and most importantly, if a client has spent £200,000 or more on a Web site, they need some kind of feedback on how much traffic the site is getting, and even a rough guide is better than no data at all. Second, although the statistics won't be 100 per cent accurate in absolute numbers, they can at least be used for measuring trends over time (to discover, for example, that a particular site now appears to be twice as popular as it was a year ago) and for comparing a series of similarly hosted sites. For example, a chocolate manufacturer may have a number of brand sites hosted from a central location: although the statistics for each site may be meaningless when taken in isolation, they will be useful in making site comparisons - to discover, for example, that the Fruit and Raisin site gets 50 per cent more daytime traffic than the Chewy Crunch Bar site.