
How to Conduct a Quickfire Technical SEO Audit

31st October 2016

In a recent survey, 39% of SEOs said that technical SEO is the first area they look to tackle when taking on a new project. This is hardly surprising, as the technicalities of how a website is built clearly underpin any efforts to improve its performance in organic search.

Here we outline five key areas you can audit in just a few hours to highlight essential fixes that will all contribute to improving your website’s performance in organic search.

1. Audit indexed pages

The first step in identifying any issues hindering organic visibility is to audit the number of pages Google is indexing from a site vs. the number of pages on the site itself.

Using the site: command

This can be achieved at a very basic level via the site: command. Typing site:yourdomain.com into Google will show you roughly how many pages from your website Google have indexed.

[Screenshot: site: command results in Google]

Checking index status

Within Google Search Console you can see how many pages Google are indexing from a site via the ‘Index Status’ report:

[Screenshot: Index Status report in Google Search Console]

You should compare this figure (total indexed pages) with the number of pages in your sitemap to assess whether there is a noticeable gap between the number of pages you have asked Google to index and the number of pages they have chosen to index.

To check the number of pages you have submitted to Google from your sitemap, visit Crawl > Sitemaps in Google Search Console:

[Screenshot: Sitemaps report in Google Search Console]

If there is a noticeable gap between the number of submitted pages and the number of indexed pages, there’s a good chance that the sitemap isn’t up to date. You will often encounter sitemaps that were manually generated as a one-off at some point in the past and whose pages have since been deleted, leaving huge gaps between submitted and indexed pages. To confirm whether this is the case, export the XML sitemap, convert it to CSV and crawl all the pages using Screaming Frog to analyse the status codes. If any of the URLs in the sitemap return a 404 (not found) status code, speak to your web developer about implementing a dynamic sitemap solution so that Google always has an up-to-date list of all the pages on your site.
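If you’d rather script this check than crawl the sitemap with Screaming Frog, the rough sketch below fetches a sitemap and flags any URLs that no longer return a 200 status code. It assumes a standard <urlset>/<loc> sitemap structure, and the sitemap URL is a placeholder:

# Rough sketch: fetch an XML sitemap and flag URLs that no longer resolve.
# Assumes a standard <urlset><url><loc> structure; the sitemap URL is a placeholder.
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # replace with your own sitemap
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
urls = [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]

for url in urls:
    # Some servers don't support HEAD requests; switch to requests.get if needed
    status = requests.head(url, allow_redirects=False, timeout=10).status_code
    if status != 200:
        print(status, url)  # any 404s here mean the sitemap is out of date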

What if the number of indexed pages is much lower than the total number of pages on the site?

There are a number of reasons why this might be the case, but the robots.txt file is the first place to start if you find that the number of pages indexed by Google is significantly lower than the number of pages you would like them to index.

A robots.txt file gives instructions to web robots about the pages you don’t wish search engines to access. For example, many sites block access to their admin folders, downloads etc. – only allowing access to pages, images and code that search engines need to be able to access.
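For illustration, a robots.txt set up along those lines might look something like this (the folder names are examples only, not recommendations for your own site):

User-agent: *
Disallow: /admin/
Disallow: /downloads/

Sitemap: https://www.example.com/sitemap.xml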

You can check the set up of your robots.txt file by adding /robots.txt after your domain and looking out for any folders which have been disallowed. If you find the following exclusion, it means that you’re blocking access to every page on your site:

User-agent: *
Disallow: /

However, if after reviewing your robots.txt file you find that Google seemingly has access to all relevant areas of your site, the logical next steps would be as follows:

  • Check noindex tags – the meta noindex tag tells search engines not to index the page it has been added to. If you find that some of your content isn’t being indexed by Google, then right click > view source > ctrl + f to search for ‘noindex’. If this tag has been added to any pages that you want Google to index, remove it (if you have many pages to check, see the sketch after this list).
  • Check for duplicate content – Google will ignore pages that are duplicates of other pages that exist on the web, or on your own site, so if you find Google aren’t indexing certain pages then check for duplicate content using tools such as Copyscape.
  • Check for penalties via the Panguin tool – it may well be that the technical set up of your site is fine, and that the reason your pages aren’t being indexed is due to an algorithmic hit at some point in the past. To check whether this is the case visit https://barracuda.digital/panguin-tool/
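If you have a large number of pages to run the noindex check against, it can be scripted. The minimal sketch below relies on the third-party requests and BeautifulSoup libraries, and the URL list is a placeholder:

# Minimal sketch: flag pages carrying a meta noindex directive.
# The URL list is a placeholder; swap in the pages you expect Google to index.
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

urls_to_check = [
    "https://www.example.com/",
    "https://www.example.com/products/cars",
]

for url in urls_to_check:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    robots_meta = soup.find("meta", attrs={"name": "robots"})
    if robots_meta and "noindex" in robots_meta.get("content", "").lower():
        print("noindex found on", url)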

If you can’t find any issues after following the steps above and are concerned that key pages on your site aren’t being indexed by Google, it’s probably time to get in touch with an SEO consultant to help dig a little deeper on your behalf.

What if the number of indexed pages is much higher than the total number of pages on the site?

On the flip side, if you notice that Google are indexing many more pages than you believe you have on the site, the chances are that Google are crawling paginated content, or URLs created as a result of faceted navigation.

Both pagination and URL parameter issues can be diagnosed by:

  • Conducting a site crawl using a tool such as Screaming Frog
  • Exporting the results and searching for any URLs containing ? (see the sketch after this list)
  • Checking whether the parameter strings identified (e.g. ?product=) are blocked via robots.txt
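If you’d prefer to script the parameter check rather than filter the export by hand, the sketch below pulls every parameterised URL out of a crawl export and lists the query string keys in use. The filename and the ‘Address’ column are assumptions based on a typical Screaming Frog export:

# Rough sketch: list the query string parameters found in a crawl export.
# Assumes a CSV with an "Address" column containing the crawled URLs.
import csv
from urllib.parse import urlparse, parse_qs

param_keys = set()
with open("internal_all.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        url = row.get("Address", "")
        if "?" in url:
            param_keys.update(parse_qs(urlparse(url).query).keys())

print("Query string parameters found:", sorted(param_keys))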

Particularly common on ecommerce sites, URL parameters can often result in multiple variations of the same URL all displaying the same content on the site – and if picked up by search engines this can result in numerous pages being indexed unnecessarily.

For example, the following URLs would, in theory, all point to the same content. The only difference in this case would be that some of these pages might be filtered to contain different types of product:

  • http://www.example.com/products/cars
  • http://www.example.com/products/cars?category=4x4&color=red
  • http://www.example.com/products/cars?category=4x4&brand=jeep&color=red

In this case it would be beneficial for Google to only access and index the base URL of those listed above, rather than all three. To prevent cases like this from causing issues on your ecommerce site, you can block access to ALL URLs containing query string parameters by adding the following rule to the relevant User-agent group in your robots.txt:

######## QUERY STRING BLOCKER #########

Disallow: /*?*

2. Analyse crawl stats

Once you have diagnosed any indexation issues, it’s time to analyse the frequency and average duration of crawls undertaken by Googlebot on your site.

Reviewing your website’s crawl stats can help identify wasted crawl budget. For example, are there pages that Google are spending time attempting to crawl that you don’t want them to index? Are Google spending too much time downloading your pages? The answers to each of these questions can be found by analysing the Crawl Stats report in Google Search Console:

[Screenshot: Crawl Stats report in Google Search Console]

When analysing this data, it’s important to note that Google assign a crawl budget to each domain they crawl. With this in mind, you need to ensure that Google is only crawling URLs on your site that you want them to index. For example, if you find Google is crawling 20,000 pages per day in the crawl stats report, and you only have 5,000 URLs on your site, then clearly Google is able to access URLs it shouldn’t – like the search parameters we outlined in the previous section.

The Crawl Stats report covers Googlebot’s activity over the last 90 days. In addition to the number of pages crawled per day, it shows the total kilobytes downloaded from your site each day and the time spent downloading your pages each day. If the figures in both of these reports are particularly high, you will need to carry out further research into compression and improving page speed.
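If you note the daily figures from the report down in a spreadsheet, a quick sanity check is to compare the average pages crawled per day against the number of URLs you actually want indexed. A rough sketch, assuming a hand-made CSV with the hypothetical columns ‘pages_crawled’ and ‘kb_downloaded’:

# Rough sketch: compare average daily crawl volume with the number of URLs you
# actually want indexed. Assumes a hand-made CSV with hypothetical columns
# "pages_crawled" and "kb_downloaded" copied from the Crawl Stats report.
import csv

URLS_YOU_WANT_INDEXED = 5000  # replace with your own figure

with open("crawl_stats.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

avg_crawled = sum(int(r["pages_crawled"]) for r in rows) / len(rows)
avg_kb = sum(int(r["kb_downloaded"]) for r in rows) / len(rows)

print(f"Average pages crawled per day: {avg_crawled:.0f}")
print(f"Average kilobytes downloaded per day: {avg_kb:.0f}")
if avg_crawled > URLS_YOU_WANT_INDEXED:
    print("Googlebot is crawling more URLs than you want indexed - check for parameter URLs.")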

3. Audit crawl errors and server response codes

The Crawl Errors report is available in Google Search Console and provides data straight from the horse’s mouth – so it’s important that you keep a keen eye on this.

The server response codes reported here are classified as server errors (5xx response codes), not found errors (4xx response codes) and incorrectly implemented redirects (3xx response codes).

The area most marketers tend to flag up is the dreaded ‘Not found’ report, which displays a list of all the URLs Google has attempted to crawl only to be returned a 404 status code by the server – essentially meaning the pages in question have been deleted or simply do not exist.

[Screenshot: 404 errors in the Crawl Errors report]

I’d highly recommend checking out Google’s advice on the topic of 404 errors and what to do about them.

The soft 404 error can also have a big impact on organic performance and is reported in the ‘Crawl Errors’ section of Google Search Console:

[Screenshot: Soft 404 errors in the Crawl Errors report]

A ‘soft 404’ error occurs when a deleted page displays a ‘page not found’ message to anyone trying to access it, but fails to return an HTTP 404 status code. This is often an issue on sites that have not properly implemented 404 status codes for deleted pages and have simply set up a 404 message on the page itself. Google and other search engines will therefore still attempt to crawl the page in question, as the server is still returning a status code of 200 (OK).
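You can spot-check suspected soft 404s yourself by requesting pages you know have been deleted (or a deliberately nonsense URL on your own domain) and looking at the status code the server actually returns. A minimal sketch with placeholder URLs:

# Minimal sketch: a page that says "not found" but returns 200 is a soft 404.
# The URLs are placeholders - use pages you know have been deleted, or a
# deliberately nonsense URL on your own domain.
import requests

suspect_urls = [
    "https://www.example.com/a-page-you-deleted",
    "https://www.example.com/this-page-should-not-exist-xyz",
]

for url in suspect_urls:
    status = requests.get(url, allow_redirects=True, timeout=10).status_code
    if status == 200:
        print("Possible soft 404 (returned 200):", url)
    else:
        print(status, url)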

4. Audit and implement necessary redirects

301 redirects are used to signal to search engines and browsers that the requested URL has moved to a new location. For example, you would use a 301 redirect if you had changed your ‘About us’ URL from www.example.com/about-us to www.example.com/about.

301 redirects help to pass page equity from legacy pages over to the page they’re redirected to, so it’s important not to simply delete key pages on your website, especially if those URLs have inbound links pointing to them.

When auditing redirects I tend to break the task down into two sections:

  • Audit any existing redirects for redirect chains, 301s that point to 404s, and so on
  • Add redirects for any high-value pages that have since been deleted, including URLs listed in broken backlink reports

Auditing existing redirects

To audit existing redirects you will need to ask your web host or developer for the file used to manage your server-side redirects. Once you have the list of existing redirect rules, run the pages being redirected to (the destination URLs) through Screaming Frog to check their status codes.

If you spot any pages returning a response code of 404 (not found), it means that you’re effectively telling Google that your old page has been replaced, yet the page you’ve replaced it with has since been deleted! In cases like this you will need to find new destination URLs for your legacy redirects.

In addition to checking for 301s that point to 404s, you can use Screaming Frog to check for redirect chains, which will also need to be resolved. To do this, import all of your redirects into Screaming Frog using ‘list mode’ and click Reports > Redirect Chains:

[Screenshot: Redirect chains report in Screaming Frog]

You will then have a list of all redirect chains active on your site, and once exported to CSV you can filter the ‘Loop’ column for TRUE to isolate redirect loops:

[Screenshot: Redirect loops filtered in the exported CSV]

Redirect loops are confusing for search engines and users alike, so it’s important to identify and avoid these in favour of one-step redirects.
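If you’d rather script these checks than rely solely on Screaming Frog, the sketch below follows each legacy URL, reports how many hops the redirect takes and flags any that end on a 404. The URL list is a placeholder:

# Rough sketch: follow each redirect and report the number of hops and the final
# status code. More than one hop indicates a chain; a final 404 means the
# redirect points at a deleted page. The URL list is a placeholder.
import requests

legacy_urls = [
    "http://www.example.com/about-us",
    "http://www.example.com/old-products-page",
]

for url in legacy_urls:
    response = requests.get(url, allow_redirects=True, timeout=10)
    hops = len(response.history)
    if hops > 1:
        print(f"Redirect chain ({hops} hops): {url} -> {response.url}")
    if response.status_code == 404:
        print(f"Redirects to a 404: {url} -> {response.url}")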

Identifying new redirects

To identify new redirects you will need to analyse any 404 errors currently on your site. You don’t need to redirect all of these pages – just those that could potentially pass value to newer pages. To identify pages of value that should be redirected to relevant replacement pages:

  • Export all 404 errors using a combination of GSC (Google Search Console) data, any 404s identified by a Screaming Frog crawl, and any URLs listed in the broken backlinks report in Ahrefs:

[Screenshot: Broken backlinks report in Ahrefs]

  • De-dupe your list of 404s (see the sketch after this list)
  • Link Screaming Frog to Google Analytics and Ahrefs to retrieve data on inbound links and previous traffic to these URLs
  • Redirect any URLs that used to receive a relatively large number of visits compared with your site average, as well as any URLs that have earned valuable inbound links
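As referenced above, here is a rough sketch for merging and de-duping the 404 lists from your various exports. The filenames and column names are assumptions, so adjust them to match your own Search Console, Screaming Frog and Ahrefs exports:

# Rough sketch: merge 404 URL lists from several exports and de-dupe them.
# The filenames and column names are assumptions - adjust them to match the
# exports you have from Search Console, Screaming Frog and Ahrefs.
import csv

sources = {
    "gsc_404s.csv": "URL",
    "screaming_frog_404s.csv": "Address",
    "ahrefs_broken_backlinks.csv": "Target URL",
}

unique_404s = set()
for filename, column in sources.items():
    with open(filename, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            url = row.get(column, "").strip()
            if url:
                unique_404s.add(url.rstrip("/").lower())

with open("deduped_404s.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["URL"])
    for url in sorted(unique_404s):
        writer.writerow([url])

print(len(unique_404s), "unique 404 URLs written to deduped_404s.csv")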

Other things to check when auditing redirects

  • Ensure that any HTTP URLs redirect to their HTTPS counterparts, if the site has implemented HTTPS (see the sketch after this list)
  • Ensure that the www and non-www versions of your URLs redirect to a single preferred version – if both versions are live you may have a problem with duplicate content
  • Check that file extension variations such as .html redirect to the base URL, so that the .html and trailing-slash versions of your URLs aren’t both live concurrently
  • Check for duplicate versions of your homepage, such as example.com/index and www.example.com/home
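As referenced above, a quick way to spot-check the protocol and host rules is to request each variation of a URL and see where it ends up. A rough sketch with a placeholder domain:

# Rough sketch: check that protocol and host variations of a URL all resolve to
# one canonical version. The domain and path are placeholders.
import requests

variations = [
    "http://example.com/about",
    "http://www.example.com/about",
    "https://example.com/about",
    "https://www.example.com/about",
]

final_urls = set()
for url in variations:
    response = requests.get(url, allow_redirects=True, timeout=10)
    final_urls.add(response.url)
    print(f"{url} -> {response.url} ({response.status_code})")

if len(final_urls) > 1:
    print("More than one final URL is live - potential duplicate content issue.")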

5. Audit site and page speed

Google have explicitly stated that they use site speed as a signal within their ranking algorithm, so auditing and improving site speed should be taken seriously by webmasters.

To conduct a quick audit of your page speed and get a set of recommended improvements, you can use Google’s PageSpeed Insights tool:

[Screenshot: Suggested improvements in Google PageSpeed Insights]

Google provide a list of suggestions for both mobile and desktop speed improvements, which they have further explained via the ‘show how to fix’ links.

However, one limitation of Google’s tool is that it isn’t especially in-depth and doesn’t break down each element on a page to identify where speed improvements can be made. To get even more granular with page speed improvements, I would strongly recommend running your pages through https://www.webpagetest.org/, which provides a waterfall overview of the elements taking the most time to download on each page:

[Screenshot: Waterfall view in WebPagetest]
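Alongside these tools, a very rough baseline you can script yourself is to time a full HTML download. This only measures server response and transfer, not rendering, so treat it as a sanity check rather than a substitute for the tools above:

# Very rough sketch: time a full HTML download as a baseline speed check.
# This measures server response and transfer only, not rendering.
import time
import requests

url = "https://www.example.com/"  # replace with the page you want to test
start = time.time()
response = requests.get(url, timeout=30)
elapsed = time.time() - start

print(f"Status: {response.status_code}")
print(f"HTML size: {len(response.content) / 1024:.1f} KB")
print(f"Download time: {elapsed:.2f} seconds")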

Tracking progress

By following the steps above, you will no doubt have identified a range of issues and potential fixes. Some of these fixes will be easier to implement than others, so it’s important to keep track of progress in a logical fashion. The team over at Distilled have created a Google Sheets checklist that you can use to keep track of progress, and I would recommend customising it to meet your own specific requirements.

Tools required

There are various tools required to gather the data you need to make informed decisions on technical SEO. Here are the tools referenced in this post:

  • Google Search Console
  • Screaming Frog
  • Copyscape
  • The Panguin Tool (Barracuda Digital)
  • Ahrefs
  • Google Analytics
  • Google PageSpeed Insights
  • WebPagetest
  • Distilled’s technical audit checklist (Google Sheets)

Written By
Ben Wood is the Marketing Services Director at UK-based digital agency Hallam and has previously gained extensive client-side experience at a well-known FTSE 100 company. Ben specialises in SEO, PPC and web analytics.