A Guide to Log File Analysis – How to Open the Google Blackbox
Log file analysis is worth adding to your digital marketing toolkit: it can improve visibility, traffic, conversions, and sales, and it reveals new opportunities for SEO improvement. Log files are the only data source that shows with complete accuracy how bots are actually crawling your website.
A log file is a file produced by a web server containing 'hits', that is, records of every request the server has received. Each entry stores the date and time of the request, the URL requested, the user agent, the requester's IP address, and other useful details. In short, log file analysis lets you gather information about SEO visits and see what Googlebot is actually doing on your website. You can then cross-reference it with your crawl data to go further.
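A quick shell sketch of how those fields appear in a single combined-format entry (the sample line is hypothetical; the awk field positions assume the default combined format):

```shell
# A hypothetical combined-format log entry, holding the fields described above.
line='127.0.0.1 - - [28/Aug/2015:06:45:41 +0200] "GET /start.html HTTP/1.0" 200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'

# With whitespace splitting: $1 = client IP, $4 = timestamp (leading '[' stripped),
# $7 = requested URL, $9 = HTTP status code.
echo "$line" | awk '{
  print "IP:     " $1
  print "Date:   " substr($4, 2)
  print "URL:    " $7
  print "Status: " $9
}'
```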
Let’s see the advantages of what you can get with log file analysis.
The advantages of log file analysis
Why is it useful?
Log file analysis is useful for many reasons:
- For your audits:
- You can diagnose useful and useless pages
- You can detect the areas Google crawls
- You can identify pages Google does not know about
- For your monitoring:
- You can get alerts without waiting for a Google Webmaster Tools message
- You can monitor optimizations or deployments more easily
- You can anticipate attacks
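On the monitoring side, even a one-line check over your access logs can surface an error spike before any Webmaster Tools warning arrives. A minimal sketch, assuming combined-format logs (the file name access.log and the sample entries are hypothetical):

```shell
# Hypothetical sample entries; in practice, point this at your server's access log.
printf '%s\n' \
  '1.2.3.4 - - [28/Aug/2015:06:45:41 +0200] "GET /a HTTP/1.0" 404 100 "-" "-"' \
  '1.2.3.4 - - [28/Aug/2015:06:45:42 +0200] "GET /b HTTP/1.0" 404 100 "-" "-"' \
  '1.2.3.4 - - [28/Aug/2015:06:45:43 +0200] "GET /c HTTP/1.0" 500 100 "-" "-"' \
  > access.log

# Count 4xx/5xx responses per status code ($9 is the status in combined format).
# A sudden jump in these counts is the kind of signal worth alerting on.
awk '$9 ~ /^[45]/ {count[$9]++} END {for (s in count) print s, count[s]}' access.log | sort
```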
Why should you use log file analysis?
You can know exactly what Google does on your website:
- Which pages are crawled by Googlebot? Log file analysis helps you monitor crawl behavior and crawl frequency. You can see what Google is actually crawling and how page popularity, page depth, load time, or other important metrics influence Google's crawl. It can also help you determine whether specific new content has increased Googlebot's visits to your website. The more interesting your site, the more often Google will come.
- What are my active pages? Which pages receive the most SEO visits? Are they your most valuable pages? You can identify which pages actually generate SEO traffic, value and, thus, conversions. Log analysis can also help determine the most popular pages in Google's eyes and which ones are crawled less often. For instance, if you want to rank a specific post for a targeted query but it sits in a directory Google only visits once every three months, you will miss out on organic traffic from that post for at least three months. With log analysis, you might learn that you need, for example, to rework your internal linking to increase the impact of your most valuable pages.
- Does Google encounter errors? Do you have too many 4xx errors that degrade Google's experience of your website? Log data analysis also helps track status-code errors like 4xx or 5xx that compromise SEO. Analyzing a website's status codes lets you measure their impact on bot hits and their frequency. Too many 404 errors will limit crawler visits.
- In every case, you need to preserve Google's crawl budget so it is spent on the right pages. This improves your money pages' performance and ensures Google actually sees them! With log file analysis you can, for example, detect whether Google spends too much crawl budget on resources like images or .css files. This budget is linked to your domain's authority and your website's health, and is proportional to the flow of link equity through your website. You don't want that budget spent on useless pages.
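The crawl-budget check above can be prototyped with ordinary shell tools before investing in a full pipeline. A minimal sketch, assuming combined-format logs (the file name and sample entries are hypothetical; note that filtering on the user-agent string alone can be spoofed, so a production analysis should verify Googlebot hits via reverse DNS on the IP):

```shell
# Hypothetical sample entries; replace access.log with your real log file.
printf '%s\n' \
  '66.249.66.1 - - [28/Aug/2015:06:45:41 +0200] "GET /products/page1 HTTP/1.0" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"' \
  '66.249.66.1 - - [28/Aug/2015:06:45:42 +0200] "GET /style.css HTTP/1.0" 200 310 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"' \
  > access.log

# Top URLs crawled by Googlebot, by hit count:
grep 'Googlebot' access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -20

# Share of Googlebot hits spent on static resources (.css, .js, images):
grep 'Googlebot' access.log \
  | awk '$7 ~ /\.(css|js|png|jpg|gif)/ {r++} {t++} END {if (t) printf "%d%% of hits on resources\n", 100*r/t}'
```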
Imagine if you could do log analysis for free.
In practice, log file analysis can seem expensive, and many companies stick to crawl analysis alone. But this technology is becoming more accessible.
Open source solutions also exist. One of them is OnCrawl ELK, a free open source log analyzer. It is quite easy to install, even for non-technical profiles.
1- Install Docker
Install Docker Toolbox.
Launch the Docker Quickstart Terminal to start.
Note the IP address it displays (typically 192.168.99.100); you will need it later.
Then, download the oncrawl-elk release: https://github.com/cogniteev/oncrawl-elk/archive/1.1.zip
Add these lines in the terminal to create a directory and unzip the file:
- MacBook-Air:~ cogniteev$ mkdir oncrawl-elk
- MacBook-Air:~ cogniteev$ cd oncrawl-elk/
- MacBook-Air:oncrawl-elk cogniteev$ unzip ~/Downloads/oncrawl-elk-1.1.zip
And then, add:
- MacBook-Air:oncrawl-elk cogniteev$ cd oncrawl-elk-1.1/
- MacBook-Air:oncrawl-elk-1.1 cogniteev$ docker-compose -f docker-compose.yml up -d
(Note: these commands work on Mac and Linux. On Windows, the process is a bit more complicated for non-technical users.)
Docker Compose will download all the necessary images from Docker Hub, which may take a few minutes. Once the containers have started, enter the following address in your browser: http://DOCKER-IP:9000. *Make sure to replace DOCKER-IP with the IP you noted earlier.*
You should see the OnCrawl-ELK dashboard, but there is no data yet. Let’s get some data to analyze!
2-Import log files
Importing data is as easy as copying your access log files to the right folder. Logstash automatically indexes any file found at logs/apache/*.log or logs/nginx/*.log.
If your web server is Apache or Nginx, make sure your logs use the combined log format. Entries should look like this:
127.0.0.1 - - [28/Aug/2015:06:45:41 +0200] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
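If your server is not already logging in this format, the stock configuration directives enable it; both servers ship with a predefined "combined" format (the paths shown are common defaults and may differ on your system):

```
# Apache (httpd.conf or a vhost): use the predefined "combined" format
CustomLog /var/log/apache2/access.log combined

# Nginx (nginx.conf, inside http or server): "combined" is the built-in default
access_log /var/log/nginx/access.log combined;
```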
Drop your .log files into the logs/apache or logs/nginx directory accordingly.
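That copy step can be sketched as follows (the source paths are typical Debian/Ubuntu defaults and are assumptions; adjust them to where your server actually writes its logs):

```shell
# Typical default log locations; adjust to your own setup.
APACHE_LOG=/var/log/apache2/access.log
NGINX_LOG=/var/log/nginx/access.log

# Create the folders Logstash watches, then copy whichever logs exist.
mkdir -p logs/apache logs/nginx
if [ -f "$APACHE_LOG" ]; then cp "$APACHE_LOG" logs/apache/; fi
if [ -f "$NGINX_LOG" ]; then cp "$NGINX_LOG" logs/nginx/; fi
```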
Go back to http://DOCKER-IP:9000. You should have figures and graphs. Congratulations!
You can now start using this free open source log analyzer and monitor your SEO performance daily.
To sum up, log file analysis is a powerful ally for improving your SEO performance and, as a result, driving more traffic and conversions to your website!