Two small details about statistical analysis of Web logs

Webmaster friends usually install CNZZ, Baidu Tongji, or similar webmaster statistics tools on their sites, but these tools do not record how search-engine spiders crawl the site. Some webmasters therefore use log analysis tools to study their web logs and track spider activity. Personally, I think most webmasters overlook a few small details when doing statistical analysis of their logs. Two of them are listed below:

1. Based on traffic volume, decide whether web logs should be generated hourly.

A webmaster friend of mine generates one log file per day for each of his sites. A while ago he took part in a bidding competition and kept his site ranked on the first page; with thousands of IPs of daily traffic, each daily log file grew to about 50 MB. Unfortunately his computer is rather old: opening the log file either hung or crashed outright. He had to send the log to me over the network, which was another ordeal, since he is on a China Telecom line and I am on China Netcom, so the transfer kept stalling. Even on my machine, the log analysis program repeatedly overflowed and crashed on the 50 MB file, and I had to fall back on a text editor; facing that dense wall of text, statistical analysis was very difficult. I therefore suggest that webmasters with relatively high traffic generate web logs hourly. There will be more files, but each one is far easier to analyze.
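If your server cannot be reconfigured to rotate hourly, an oversized daily log can still be cut into hourly chunks after the fact. The sketch below is a minimal example, assuming entries in Common Log Format with timestamps like `[10/Oct/2023:13:55:36 +0800]`; the function name and bucket-key format are my own choices for illustration.

```python
import re
from collections import defaultdict

# Matches the date and hour inside a Common Log Format timestamp,
# e.g. [10/Oct/2023:13:55:36 +0800] -> date "10/Oct/2023", hour "13".
HOUR_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}):(\d{2}):")

def split_by_hour(lines):
    """Group log lines into per-hour buckets keyed like '10-Oct-2023_13'."""
    buckets = defaultdict(list)
    for line in lines:
        m = HOUR_RE.search(line)
        if m:
            key = f"{m.group(1).replace('/', '-')}_{m.group(2)}"
        else:
            key = "unknown"  # keep malformed lines rather than drop them
        buckets[key].append(line)
    return buckets
```

Each bucket can then be written to its own file, giving you hour-sized pieces that ordinary log analyzers handle without choking.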

2. The information in web logs is not complete.

I don't know whether fellow webmasters have noticed that 5xx return codes rarely appear in web logs. For example, a 500 code indicates an internal server error, and 503 indicates the service is unavailable. As every webmaster knows, a 5xx code generally means the server has a fault, and when the server is out of order it usually cannot write to the site log at all. In other words, when the web server is down or DNS fails to resolve, nobody can access the site, spiders included, and during that time the web log records nothing. To monitor the site more completely, I suggest registering for and using Google's webmaster tools, which can record server access errors that never make it into your own logs.
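You can verify this blind spot for yourself by tallying the status codes that actually appear in your log. The following is a small sketch, again assuming Common Log Format, where the status code follows the quoted request line; the function name is my own.

```python
import re
from collections import Counter

# The three-digit status code sits right after the closing quote of the
# request, e.g. ... "GET / HTTP/1.1" 200 512 -> captures "200".
STATUS_RE = re.compile(r'"\s(\d{3})\s')

def status_counts(lines):
    """Count occurrences of each HTTP status code in the given log lines."""
    counts = Counter()
    for line in lines:
        m = STATUS_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Running this over a local log typically shows plenty of 200s and 404s but almost no 5xx entries, which is exactly why an external monitor such as a webmaster tool is needed to catch outages.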

The above are two small problems I have personally noticed in the process of analyzing web logs. I hope they draw some attention, and comments from fellow webmasters are welcome.

This article originated from Beijing Home Appliance Maintenance Network (http://s.bbs.bjjdwx.com/thread-135013-1-1.html). Please indicate the source when reprinting; thank you for your cooperation.