Indexing as many sub-domains as possible with the goal of boosting your search engine authority may seem like a great idea at face value. However, while your ranking for your choice of keywords may improve, there are scenarios in which your SEO may be harmed. In such scenarios, keeping certain sub-domains out of Google’s index is more prudent.
When excluding sub-domains from SERPs is a good idea
Prevent duplicate content – Duplicate content refers to content within or across domains and sub-domains that is appreciably similar or that matches completely. If you have similar content across your sub-domains, you are likely to cannibalize your own traffic. If you duplicate content from other, more trusted sites, you are not going to rank highly. Google may flag your sub-domains as thin content, and you could lose visibility in the SERPs due to the Panda algorithm. If you cannot have a single, unique version of the content on your sub-domains, you may want to de-index the duplicates.
Thank you page – A thank you page is the page that delivers what was promised on a landing page. You do not want your thank you pages appearing in the SERPs; you want visitors to reach them only after filling out a form on a landing page. It is important to capture customers as leads before they can access your offers.
De-indexing sub-domains from Search Engines
There are two ways to de-index your sub-domains.
- Add a “noindex” and/or “nofollow” meta tag
Using meta tags is easy and efficient. A “noindex” meta tag allows Google to crawl a page but blocks it from indexing that page. A “nofollow” meta tag, on the other hand, allows indexing but tells Google not to follow the links on the page. You can use them together or separately depending on your desired outcome.
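As an illustration, these directives go in the page’s `<head>`. The snippet below is a minimal sketch showing the three combinations described above:

```html
<!-- Block indexing, but let bots follow the links on the page: -->
<meta name="robots" content="noindex">

<!-- Allow indexing, but do not follow the links on the page: -->
<meta name="robots" content="nofollow">

<!-- Block both indexing and link-following: -->
<meta name="robots" content="noindex, nofollow">
```

A page would normally carry only one of these tags, chosen to match the outcome you want.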
- Add a robots.txt file to your site
This gives you control over what bots index, so you can proactively keep content off the bots’ radar and out of the search results. A robots.txt file lets you specify which content to block bots from, be it a single image or file, a single page, or even a whole directory. If you use Google AdSense, you can allow its crawler to keep working while still preventing the rest of your site from being crawled.
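To make this concrete, here is a minimal sketch of a robots.txt that blocks every bot from an entire sub-domain while still letting the AdSense crawler (Mediapartners-Google) through. The sub-domain name is a hypothetical placeholder:

```
# Served at the root of the sub-domain,
# e.g. https://staging.example.com/robots.txt

# Let the Google AdSense crawler in (empty Disallow = allow everything):
User-agent: Mediapartners-Google
Disallow:

# Block all other bots from the whole sub-domain:
User-agent: *
Disallow: /
```

Each sub-domain needs its own robots.txt at its own root; a file on the main domain does not cover its sub-domains.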
It is advisable to use the “Disallow” rule in the robots.txt file rather than the “noindex” meta tag when your goal is to keep bots out entirely. The “Disallow” rule stops the bot from fetching the page at all, whereas Google must still crawl a page to see its “noindex” meta tag, wasting crawl budget that could be spent more effectively on the pages you do want ranking in the SERPs.
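Before deploying a robots.txt, it is worth verifying that the rules block exactly what you intend. A quick sketch using Python’s standard-library `urllib.robotparser` (the URLs and paths here are hypothetical examples):

```python
# Sanity-check robots.txt rules locally before publishing them.
from urllib.robotparser import RobotFileParser

# The rules we plan to serve on the sub-domain (example content):
robots_txt = """\
User-agent: *
Disallow: /thank-you/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A thank-you page should be blocked for all bots ("*"):
print(parser.can_fetch("*", "https://blog.example.com/thank-you/offer"))

# Regular content should remain crawlable:
print(parser.can_fetch("*", "https://blog.example.com/articles/seo"))
```

Running this prints `False` for the blocked path and `True` for the open one, confirming the rules behave as expected before they go live.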
— Atomic Reach (@Atomic_Reach) April 28, 2017