An expression of my interests in Web 2.0 and everything beyond and other than it.

Sunday, April 20, 2008

Collection Agents

Before any kind of analysis, there is invariably a large pool of data. You need to delve deeeeeep into it, throw away the junk and apply more than just your brain cells to get the nagging questions answered and bring your analysis to a conclusion.

So, first things first... collect the data we need for analysis.

The data required for web analysis can be collected in three ways: web server logs, server/network monitors, and page tagging with an invisible image.
     
  • Webserver log: Almost all web servers have data logging capabilities, governed by the settings the server administrator has chosen. The server logs each request the browser makes for a given resource, as well as the response it sends back. In a sense it's like the chat logs that get written and stored on your machine when you use an IM client like Yahoo! Messenger. The information available from a log file can include the date, time, number of hits, visitors, visit duration, visitor origin subdomain, referral link, visitor IP address, browser type and version, platform, and cookies.

  • Server/network monitor: Used as a plugin to the web server, a server monitor records events from the server itself. A network monitor, also known as a "packet sniffer", records every packet of data as it is transmitted or received: browser requests, server responses, form-posted data, cookies and individual files. This lets the tool build a more comprehensive picture of web traffic.

  • Page tagging: This approach requires each web page to be tagged with a small piece of code and, occasionally, an invisible image. The request for that image can carry the same data that the previous two methods are able to capture. Using the code present in the page, the captured information is transmitted to a (usually) remotely located application, which in turn collects the data and stores it in a data container.
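
To make the first method a little more concrete, here is a minimal Python sketch of pulling a few of those fields out of a single log entry. It assumes the server writes the Apache/NCSA "combined" log format, which may not match your server's settings. The function name `parse_log_line` and the sample line are made up for illustration.

```python
import re
from datetime import datetime

# Pattern for the Apache/NCSA "combined" log format.
# Assumption: the server is configured to write this format.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the timestamp, status code and response size to proper types.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

# A made-up sample entry, not taken from any real server.
SAMPLE = ('203.0.113.7 - - [20/Apr/2008:10:15:32 +0000] '
          '"GET /index.html HTTP/1.1" 200 5120 '
          '"http://example.com/links.html" '
          '"Mozilla/5.0 (Windows; U) Firefox/2.0"')

entry = parse_log_line(SAMPLE)
print(entry["ip"], entry["path"], entry["referrer"])
```

Counting hits or visitors is then a matter of running every line of the log file through the same function and tallying the results.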


As far as methodology is concerned, there is one very important distinction: the first two techniques must be implemented on the same side as the web server itself, so data collection happens locally. In the third method, the logging can happen remotely. Since the code in the web page is able to transmit the raw data away from the website one intends to track, a server in some other location can be configured to receive and record that data. This feature makes the third method a rather independent and reliable alternative to server log analysis.
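
The remote hand-off in the third method can be sketched roughly like this. In a real page the tag would normally be JavaScript running in the visitor's browser. The Python below only mirrors the data flow: the page side packs values into the URL of an invisible image, and the collector side unpacks them from the request it receives. The collector address `stats.example.net`, the parameter names and both function names are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical remote collector that serves the invisible image.
COLLECTOR = "http://stats.example.net/pixel.gif"

def build_beacon_url(page, referrer, screen):
    """Page side: encode the tracked values into the image URL's query string."""
    params = {"page": page, "ref": referrer, "screen": screen}
    return COLLECTOR + "?" + urlencode(params)

def read_beacon(url):
    """Collector side: recover the tracked values from the requested URL."""
    query = urlsplit(url).query
    # keep_blank_values so that an empty referrer is still recorded.
    fields = parse_qs(query, keep_blank_values=True)
    return {key: values[0] for key, values in fields.items()}

url = build_beacon_url("/index.html", "http://example.com/links.html", "1024x768")
print(url)
print(read_beacon(url))
```

The tracked website never has to store anything itself. Whichever machine answers the image request is the one that records the data.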

Wondering already which is the better or preferred method?
