We recently published a piece about how to run your first SEO audit that covered the first essential part: Content Overview. Today we bring you the second part of the complete SEO audit that you should conduct when starting a new project. The time has come to perform an indexing overview.
What is indexing?
A word of explanation: crawling and indexing are basic SEO processes. The primary function of search engines is to crawl web pages and build an index so that they can ultimately provide users with relevant results: answers to their various queries. Websites are crawled by search engines' automated robots, also known as spiders or crawlers, which scan pages and determine their relevancy and quality. Needless to say, these spiders can process a huge quantity of data in a flash.
These automated robots move from one page to another using internal links, and they discover your site in the first place through links from other websites. So the more external blogs and websites link to your resources, the more often crawlers visit your site, re-evaluate it and update your rank in the search engine results pages. This is actually one of the primary reasons why you need an effective backlink strategy.
Let’s get back to conducting your first SEO audit. In this write-up, I will go through all the most important indexation elements, describe why they are important and explain how to set them up correctly to maximize their SEO effectiveness.
How do you start the indexing overview? The first thing you need to do is determine which pages of your domain should be indexed and which shouldn’t. So, let’s dive into the page exclusions and inclusions.
# Indexing overview
# Page exclusions
As a matter of fact, not all the pages of your website ought to be indexed. Some of them just don't contain enough content (which doesn't look good from an SEO perspective), and others should stay hidden from people who haven't gone through your sales funnel yet.
What pages should be excluded from the SERPs?
Keep your thank-you pages away from a search engine's index. Thank-you pages usually appear when a user converts on one of your landing pages, becomes a lead and gets access to your downloadable offer, such as a whitepaper, checklist, template or PDF ebook.
Obviously, you don't want these pages to be detectable in the search engine results pages, because if they can be found via Google, you might be losing leads right now.
Just don't let anyone bypass your funnel. Your visitors should perform the specified action first in order to get your assets. Fix this quickly and exclude all your thank-you pages from indexing.
However, thank-you pages aren't the only ones that should be excluded from indexing. The same applies to any private data that might be stored on your website and to less valuable pages such as your Terms of Service, cookie policy and privacy policy.
It's also better to exclude duplicate content from indexing, as well as third-party content that is already available in other places on the web.
# Page inclusions
Robots.txt file
You can quickly indicate which pages should be crawled by search engines by adding a robots.txt file to your site. Check whether your website already has one. It is usually located at www.yourdomain.com/robots.txt.
This file's main purpose is to provide instructions to search engines' robots about what to crawl on your website. So before a spider starts scanning, it checks the directives in the robots.txt file.
The User-agent: * line means that the rules apply to all robots, while every page or directory that should be excluded from indexation is listed in a Disallow: line. The robots.txt file can also point robots to your sitemap. The robots.txt standard is also called the robots exclusion protocol.
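As a quick illustration, here is what a minimal robots.txt file could look like for the example pages mentioned above (the paths are hypothetical placeholders, so adjust them to your own site structure):

```
# The rules below apply to all crawlers
User-agent: *
# Hypothetical paths that should not be crawled
Disallow: /thank-you/
Disallow: /terms-of-service/
Disallow: /privacy-policy/

# Point robots to your sitemap
Sitemap: http://www.yourdomain.com/sitemap.xml
```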
Learn how to format robots.txt correctly – Everything About a Robots.txt File
More in-depth information about the topic can also be found here: About /robots.txt
Remember that robots.txt is a public file that can be seen by anyone. So, if you really need to hide private data, it's better to take a more secure approach and keep searchers away from confidential pages using, for instance, password protection.
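For example, on an Apache server you could protect a confidential directory with HTTP basic authentication. This is only a minimal sketch; the file paths are hypothetical and it assumes the password file has already been created with the htpasswd utility:

```
# .htaccess placed inside the confidential directory (Apache)
AuthType Basic
AuthName "Restricted area"
# Hypothetical path to the password file created with htpasswd
AuthUserFile /home/yoursite/.htpasswd
Require valid-user
```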
Noindex meta tag
There is another option available for preventing a page from being indexed and showing up in search results, though; it's called the "noindex" meta tag. A noindex meta tag can be added to a page's HTML code in order to exclude a specific URL. This is a very useful method, and the implementation only takes a little technical know-how:
To prevent most search engine web crawlers from indexing a page on your site, place the following meta tag into the <head> section of your page:
<meta name="robots" content="noindex">
When a robot crawls your website, it reaches all the internally linked resources; however, when it spots the noindex meta tag on a page, it drops that page entirely from the search results.
This second method can be applied independently, so you won't need to exclude the page from indexing in the robots.txt file as well.
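For example, the head section of a hypothetical thank-you page that should stay out of the index could look like this:

```
<!-- Hypothetical thank-you page that should stay out of the index -->
<head>
  <title>Thanks for downloading our ebook</title>
  <!-- Tell robots not to index this page -->
  <meta name="robots" content="noindex">
</head>
```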
# Sitemap
What is a sitemap? A sitemap is a file in which you list all the pages of your website and present their hierarchy and content structure. Regardless of whether your site architecture is clearly organized or not, it is always beneficial to submit a sitemap because it will strongly improve the crawlability of your website.
So don’t hesitate to build it and submit it. When creating your sitemap, decide what pages you want to be crawled and visible in the search results, and organize them in the proper hierarchy.
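To give you an idea of the format, a basic XML sitemap is simply a list of URL entries like the sketch below (the URLs, dates and priorities are placeholders):

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want crawled and indexed -->
  <url>
    <loc>http://www.yourdomain.com/</loc>
    <lastmod>2016-01-15</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.yourdomain.com/blog/</loc>
    <lastmod>2016-01-10</lastmod>
    <priority>0.8</priority>
  </url>
</urlset>
```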
Read more about sitemaps: Using a Sitemap Effectively
# Broken pages – 404 errors and redirects
When performing your first SEO audit, you should definitely check whether there are any 404 error pages to fix. How do error pages affect indexing?
Well, when a spider crawls your website and comes to a broken page, it immediately leaves. What is more, many broken pages within one domain are a strong indicator of a poorly maintained website and a degraded user experience. This issue usually results in reduced search rankings.
Use Google Search Console and Bing Webmaster Tools frequently to check if any broken pages have been identified on your website.
# Redirects
In case you find any broken links, first check whether the destination resource still exists.
If it does, just fix the broken URL; if not, redirect the page to the most appropriate replacement.
Any permanent redirects that you set up on your website should be 301 redirects, because they pass almost 100% (commonly estimated at 90%-99%) of the SEO juice to the new destination.
302 redirects and other redirect types, like meta refresh or JavaScript-based redirects, aren't as effective for SEO as a 301. In fact, 302 redirects are commonly said to pass little or no SEO juice, so they are recommended only as a temporary solution.
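As an illustration, on an Apache server you could set up both kinds of redirects in your .htaccess file like this (the paths and URLs are hypothetical):

```
# Permanent move – a 301 passes almost all of the SEO juice
Redirect 301 /old-page/ http://www.yourdomain.com/new-page/
# Temporary move – a 302 is meant only for short-lived changes
Redirect 302 /summer-promo/ http://www.yourdomain.com/current-promo/
```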
# Duplicate content issues
If you have duplicate content on your website, you may end up with a Google penalty. So by all means, avoid it.
Duplicate content issues also look bad from a user’s perspective because they worsen the quality and readability of your website. How can you solve this? If you really need to keep the content, instead of getting rid of it, try to rewrite and rephrase the existing duplicate text in order to make it unique.
The other method is to use the canonical tag to identify the root source. Decide which page should serve as the original and then add this markup to the head section of the preferred URL and all its variants:
<link rel="canonical" href="http://www.yourdomain.com/your-preferred-url/" />
Read more about the canonical tag here: How to Use Canonical Tags for Better SEO and here: A Beginners Introduction To The Canonical Tag.
# Code validation
Don't forget to check your code in order to catch unintended HTML mistakes. There are so many markup languages, and so many rules for using each of them, that search engines have their preferences.
It's commonly stated that you should follow W3C standards in order to make your code understandable for search engines.
All you need to do is run your website through a validator to detect issues that may be very easy to fix but hard to spot otherwise.
The W3C Markup Validation Service checks the markup validity of web documents in HTML, XHTML, SMIL, MathML and other formats. Check it out!
# Speed your site up!
Because users often have a short attention span, you should examine the load speed of your website during your first SEO audit. Search engine crawlers can be said to have a short attention span as well: they crawl fast websites much more eagerly and far more thoroughly than slow ones.
These two reasons should be enough to convince you to speed up your site.
Make sure that all the pages of your website load smoothly and quickly.
There are plenty of free tools available online that can help you improve your load time. Try, for example, Google PageSpeed Insights. It will present you with a page speed score along with some useful suggestions on what you can improve in your site's current performance and how.
# Site architecture
We've already mentioned several best practices when it comes to site architecture in our previous article. So if you are a regular reader, you know that for a balanced site architecture each page should have at least three internal links leading to topically related pages, and that a site architecture diagram should resemble a pyramid…
There is also one super useful resource on our blog that is fully devoted to this topic: 9 Effective Site Architecture Tips for SEO
So, in a nutshell: a website's architecture is vital because it defines the website's depth and width.
During an SEO audit, you should pay attention to the number of clicks it takes to get from your homepage to any important resource on your website, and minimize this count where possible.
Another aspect to keep an eye on is how closely the pages are connected to one another and whether the most important pages are correctly prioritized in this hierarchy.
# Some advanced indexation hints
Are you curious how many pages with your domain name have already been indexed by Google?
There is a way to check this. Go to google.com and type in "site:yourwebsite.com".
Google will respond with a full report. You will see an approximate number of indexed pages along with a full listing, page by page. Specifically, this will enable you to see for yourself whether Google is actually indexing the most important pages of your website and your brand searches. Is the listing missing an important page? If it is, perform an individual search for that page; if it's still nowhere to be found, the reason might be a penalty.
# Conclusion
It looks like we've touched on all the basic indexation elements that you need to check during your first SEO audit. In case you missed the content overview, you can find it here: How to Perform Your First SEO Audit: Content Overview
We hope this piece helps you get the job done! Let me know what you think about this basic SEO audit process, and get ready for the final part, which will be published very soon.
Is my indexing overview missing any information? If you want to add something, include it in the comments below.