Most business owners recognise the potential value of understanding the behaviour of visitors to their website. What a visitor does, or does not do, can indicate how well each strand of the owner's marketing campaigns is working. Ideally the owner would gain a clear overall picture of all this activity and draw lessons from it.


Traditionally most ISPs offer ‘web statistics’ to give the owner a measure of what is happening on their website. This information is normally derived from the log files of the individual web server. There are a number of issues arising from this method of data collection that the user needs to be aware of.


Firstly, log files cannot capture the effects of caching – that is, temporary copies held elsewhere which are used to reduce the load on the web server. The ISP may cache pages, particularly dynamic pages such as ASP or PHP, and organisations may also use caching proxy servers to control the bandwidth they use. A request served from a cache never reaches the web server, so the number of visitors recorded may be artificially reduced.
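To make the mechanism concrete, the sketch below (a hypothetical Node.js server written in TypeScript; the paths and header values are purely illustrative) shows how Cache-Control response headers decide whether an intermediary may answer a repeat request itself, in which case that request never appears in the origin server's log.

```typescript
// Minimal sketch (Node.js + TypeScript) of how response headers influence
// caching by proxies. A response marked as cacheable may be served again by
// an intermediary, so the repeat request never reaches the origin server and
// never appears in its log file. Paths and values are illustrative only.
import * as http from "http";

const server = http.createServer((req, res) => {
  if (req.url === "/logo.png") {
    // Static asset: explicitly cacheable by shared caches for one hour.
    // Repeat requests within that hour may be answered by an ISP or
    // corporate proxy and will be invisible to this server's log.
    res.setHeader("Cache-Control", "public, max-age=3600");
    res.end("...image bytes...");
  } else {
    // Dynamic page: ask caches not to store it, so every request
    // (in principle) reaches the origin server and is logged.
    res.setHeader("Cache-Control", "no-store");
    res.end("<html><body>Dynamic page content</body></html>");
  }
});

server.listen(8080);
```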


Secondly, log files record everything that happens, including the periodic visits by the spiders and bots that collect information for search engines. Log files show that these bots often return at regular intervals – perhaps every 30 minutes or so. This interferes with the timed methods used to distinguish unique visitors, and makes reports that concentrate on entry points to the website meaningless unless the effects of spiders and bots are removed as far as possible.
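As a rough illustration of that clean-up, the sketch below (TypeScript for Node.js) filters bot traffic out of an Apache-style access log before counting requests. The user-agent patterns and the file name are only examples; real filtering would need a much more complete, and maintained, list.

```typescript
// Rough sketch: strip bot traffic from an Apache-style access log before
// counting visits. The patterns and file name below are examples only.
import { readFileSync } from "fs";

const BOT_PATTERNS = [/googlebot/i, /bingbot/i, /slurp/i, /crawler/i, /spider/i];

function isBot(logLine: string): boolean {
  // The user agent is the last quoted field in the combined log format.
  const match = logLine.match(/"([^"]*)"\s*$/);
  const userAgent = match ? match[1] : "";
  return BOT_PATTERNS.some((pattern) => pattern.test(userAgent));
}

const lines = readFileSync("access.log", "utf8").split("\n").filter(Boolean);
const humanLines = lines.filter((line) => !isBot(line));

console.log(`Total requests:   ${lines.length}`);
console.log(`After bot filter: ${humanLines.length}`);
```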


These issues are, however, less important than the competitive pressures facing those who provide web statistics based on log files. Over time this pressure has resulted in reports containing as many individual pages as possible, each giving as much detail as possible. This approach, and the belief in the value of providing every possible bell and whistle, leaves the poor business owner buried in a sea of statistics. He or she didn’t want a website for this, and many will concentrate on other things rather than get embroiled. Some might even abandon the web altogether because it’s all too complicated to be bothered with.


Unfortunately this just proves that data and information are not the same thing, and that it is important to know the difference.


Another approach is to abandon the web server log files and instead use scripts (normally JavaScript) and cookies to provide the data. This gets rid of the caching problem and the misinterpretation of spiders and bots noted above, but introduces problems of its own.
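In outline, such a tag works along the lines of the sketch below (browser-side TypeScript; the cookie name and collector URL are invented for illustration): a small script sets a first-party cookie carrying a visitor identifier and reports each page view to the statistics provider.

```typescript
// Illustrative sketch of a script-and-cookie tracking tag (browser TypeScript).
// The cookie name and collector URL are hypothetical; a real analytics
// service would supply its own snippet and endpoint.
const COOKIE_NAME = "site_visitor_id";
const COLLECTOR_URL = "https://stats.example.com/collect";

function getOrCreateVisitorId(): string {
  const existing = document.cookie
    .split("; ")
    .find((c) => c.startsWith(COOKIE_NAME + "="));
  if (existing) {
    return existing.split("=")[1];
  }
  // No cookie found (first visit, or the user has cleared cookies), so a new
  // identifier is created and this person is counted as a new visitor.
  const id = Math.random().toString(36).slice(2);
  document.cookie = `${COOKIE_NAME}=${id}; path=/; max-age=${60 * 60 * 24 * 365}`;
  return id;
}

// Report the page view; if JavaScript is disabled this code never runs
// and the visit is simply not recorded.
const payload = new URLSearchParams({
  visitor: getOrCreateVisitorId(),
  page: location.pathname,
  referrer: document.referrer,
});
navigator.sendBeacon(COLLECTOR_URL, payload);
```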


Not all users have JavaScript fully turned on, and modern browsers such as Internet Explorer 7 and Firefox offer many options for turning parts of the scripting facility off. Users may also restrict the use of cookies, and even where these are fully allowed it is easy for them to be periodically cleared out. Again, browsers help users to do this automatically through simple configuration options.


More seriously, this approach requires the service provider to have enough capacity that collecting data from all of its customers' websites simultaneously neither causes visibly slow responses for visitors to the business owner's website nor loses data during periods of high internet traffic. It also requires the business owner to make changes to individual pages of the website – if this is possible at all. This can be problematic if a content management system has been used, and if the modifications required are not the same for every page.


So neither approach is perfect, and both have limitations. The most important thing is that the user of the information understands what they can and can’t assume from the figures provided.


What the business owner really needs is just enough detailed information to work out how to change and improve the end-user experience. This information needs to be updated regularly so that trends can be detected and the overall situation progressively improved. That way the business owner can concentrate on improving customer service and turnover rather than getting lost in reams of statistical reports.