Digital Marketing & SEO Illustration Novomotus SEO

Preventing Unwanted Pages Being Indexed by Google

Sometimes you’ll find that Google indexes pages on your website that you don’t necessarily want indexed. This is usually seen on CMS-based websites like WordPress, which automatically generate a lot of subpages and directories. These pages aren’t likely to get your site penalized, but sometimes they can rank above other content for brand-related searches, effectively clogging up your site’s presence in the SERPs. There are a few quick ways to prevent this from happening, and to clean things up if it has already happened.

Cost Effective Solutions

I find WordPress to offer one of the easiest and most cost-effective means of building websites. It offers user-friendly interfaces, powerful extensibility, and a strong base of user and developer support. One of the downsides, however, is that it requires some rather mundane customization out of the box before it is well suited to most businesses. The upside is that these are all quick customizations that don’t involve any coding and can quickly turn your website into an SEO-optimized treasure. WordPress accommodates a broad range of uses and is therefore often seen as having a bit of bloat. Examples include the automatically generated author archives, date-based archives, category archive sub-pages, and tag-based archives.


All of these will be indexed by Google unless otherwise specified. This can result in duplicate content issues as well as annoyingly cluttered results for brand-related searches. For example, you wouldn’t want the archive of last May’s posts to outrank the front page of your latest blog posts, but it happens a lot! If you’re using WordPress, plugins like Yoast SEO offer great free solutions for quickly specifying a ‘no-index’ status for certain archives and directories, or even specific posts and pages. If you’re using other solutions, some simple additions to your robots.txt file can help solve your problem as well. Below are some examples of these methods in use.

WordPress/Yoast Solution

One of the easiest ways to get around this is by using a WordPress solution for your website. Unless circumstances dictate otherwise, I frequently find that WordPress meets the needs of most small businesses, with the exception of some eCommerce websites with deep categorical structures. WordPress makes life easy and cost-effective, but you still have to be wary of certain caveats that come up during its use. WordPress has a core designed for multi-purpose use that grew out of the world of personal blogging. Features like author archives, tag-based archives, and sub-pages of archives are often unwanted additions to your website. These don’t get in the way of anything critical per se, but they add bloat and clutter from the perspective of web crawlers. Yoast SEO allows for single-click disabling of archives and no-indexing of unwanted WordPress functionality. In addition, Yoast allows a robots.txt file to be quickly created and edited without the use of FTP. Below you’ll find a quick overview of implementing this solution using the free Yoast SEO plugin for WordPress.

Creating & Editing Robots.txt

A robots.txt file is read by web crawlers, such as search engine bots, to learn which parts of your site they are asked not to crawl. It can also tell search engines where your sitemaps are located. It is not a security mechanism: it doesn’t prevent users from accessing directories with private information, and it can’t serve different content based on browser or referrer (those jobs belong to server configuration such as the .htaccess file). On most websites this file is edited over an FTP connection with a text editor like Notepad++. This works fine, but when you’re trying to hit the ground running, learning another program’s interface can be a daunting task. This is where Yoast steps in and offers a very helpful bit of extensibility. Once installed, the Yoast plugin will register an ‘SEO’ menu on the left-hand WordPress toolbar. Click the ‘Tools’ item in that menu and you’ll be shown a screen with a ‘File editor’ link; click it to reach the editor screen. See the diagram below for visual direction:

Yoast SEO Create Robots.txt

Here you’ll find one of two things: either you already have a robots.txt file and you see two text panels, or you don’t have one yet and you see a ‘Create Robots.txt File’ button with an .htaccess editor panel below it. If you see the ‘Create Robots.txt File’ button, click it and an editable text area will appear where you can quickly add any rules you need. Below you can see an example of what this screen should look like:

Yoast SEO Edit Robots.txt File & .Htaccess
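As a rough illustration of the kind of rules you might type into that editor, here is a minimal Python sketch that feeds a sample robots.txt to the standard library’s parser and asks how a crawler would read it. The rules, paths, and example.com URLs are assumptions made up for this example, not settings taken from Yoast or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only: the paths and the sitemap URL
# are assumptions, not values recommended by Yoast.
ROBOTS_TXT = """\
User-agent: *
Disallow: /author/
Disallow: /tag/
Sitemap: https://example.com/sitemap_index.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether a crawler identifying itself as Googlebot may fetch each path.
for path in ("/", "/blog/", "/author/admin/", "/tag/news/"):
    allowed = parser.can_fetch("Googlebot", "https://example.com" + path)
    print(path, "crawlable" if allowed else "blocked")

# Sitemap lines are exposed separately (Python 3.8 or newer).
print(parser.site_maps())
```

Keep in mind that Python’s parser uses simple prefix matching, so this is only an approximation of how a given search engine interprets the same file.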

With Google’s ever-deepening awareness of how websites work, I generally find that robots.txt rules are rarely necessary on WordPress-based websites anymore. By using free plugins such as Yoast SEO, you can disable entire archive sections such as date-based and tag-based archives, and even specify on individual pages whether or not you want search engines to index them. For the vast majority of websites, this solution handles 90% of the work for 5% of the effort. To learn more about how and when to use robots.txt rules, I suggest you check out this article by Yoast himself, which offers a little further insight on the matter. He makes a strong case for paying very little attention to the robots.txt file at all, citing the potential to block crucial resources used by certain plugins and other site features. I tend to agree that this is suitable for the vast majority of small and medium-sized business websites without large amounts of complexity.
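The risk of blocking crucial resources can be checked before you save a rule. The sketch below is a hedged example of that idea: it takes a deliberately over-broad, hypothetical robots.txt and a made-up list of asset paths (the plugin and theme names are invented), then reports which assets a crawler would be told not to fetch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, overly broad rules of the kind the paragraph above warns about.
RISKY_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/plugins/
Disallow: /wp-includes/
"""

# Hypothetical asset paths a theme or plugin might load on every page.
ASSET_PATHS = [
    "/wp-content/plugins/example-slider/slider.js",
    "/wp-includes/js/jquery/jquery.js",
    "/wp-content/themes/example-theme/style.css",
]


def blocked_assets(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that the given rules ask this crawler not to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        path
        for path in paths
        if not parser.can_fetch(user_agent, "https://example.com" + path)
    ]


for path in blocked_assets(RISKY_ROBOTS_TXT, ASSET_PATHS):
    print("blocked:", path)
```

If a script or stylesheet your pages depend on shows up in that list, the rule is probably broader than you intended.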

Built-In No-Indexing Function

One of the conveniences that Yoast’s SEO plugin offers is the ability to specify the index status of individual posts and pages by using its advanced meta box, as well as to set certain other useful robots meta options. This functionality is disabled by default, but can be enabled by going to the ‘SEO’ menu, clicking the ‘Dashboard’ option, clicking the ‘Security’ tab, and enabling the ‘Advanced part of the Yoast SEO meta box’ option. To control other advanced settings as well, click over to the ‘Features’ tab and enable the ‘Advanced Settings Pages’ option. This gives you a new menu item in the SEO menu bar to the left, and allows for additional formatting of permalinks, redirecting attachments to parent URLs, removing stop words from post slugs, and controlling some basic RSS features. Below is an illustration of how to enable these two features:

Yoast How to Enable Page Level Indexing

On the Yoast ‘SEO’ menu, click the ‘Titles & Metas’ option to bring up the control panel that allows one-click disabling of certain WordPress archive functions. Once there, click the ‘Archives’ tab on the top menu of this screen and you can disable author-based archives and date-based archives to keep them from being indexed. If you have a single-author site (as most small businesses do), your author archive would be exactly the same as your blog index page that lists all your latest posts. This can be regarded as duplicate content, but really it just adds clutter to your site’s presence in branded searches. Below is an illustration of how to disable these options:

Yoast Disable Author & Date Archives

You can browse further through the ‘Post Types’ and ‘Taxonomies’ tabs for other such archive-based functionality to disable. This is very useful when you add certain types of plugins, such as visual editors, newsletter managers, or portfolio plugins, that create additional post types in your WordPress database. By default, these newly created types are added to your sitemap and are included in Google’s indexing of your website. Each new post type will appear in the ‘Post Types’ tab here, with the option to disable it. Once you’ve gone through these steps, you should have a much cleaner overall site structure in the eyes of Google, and also have the ability to control how web crawlers treat individual pages and posts. For each page and post on your site, a new section of the Yoast SEO meta box can be opened by clicking the little gear icon, which can be seen below:

Yoast SEO noindex Single Page post
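To confirm that a page-level setting took effect, you can look at the page’s HTML for a robots meta tag. The sketch below assumes the no-index choice ends up as a standard `<meta name="robots" content="noindex,...">` tag in the page head, which is the usual mechanism; the class name, function name, and sample HTML are invented for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for valueless attributes.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for directive in attributes.get("content", "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def is_noindexed(html: str) -> bool:
    """Return True if the HTML carries a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page source, standing in for a post that was set to no-index.
SAMPLE_HTML = """\
<html><head>
<title>May Archive</title>
<meta name="robots" content="noindex,follow" />
</head><body></body></html>
"""

print(is_noindexed(SAMPLE_HTML))
```

A crawler can only see this tag if it is allowed to fetch the page, so a page meant to be no-indexed this way should not also be blocked in robots.txt.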

Closing Thoughts

WordPress isn’t always the best solution for a business, and isn’t the perfect SEO solution out of the box either. With the addition of some simple free plugins such as Yoast SEO, however, you can effectively control some very critical structural aspects of your website. This can help de-clutter your site, avoid duplicate content, and make branded searches more likely to return your top pages. If you’re new to SEO, or are trying to revamp your business website to compete online, a WordPress + Yoast SEO combination can offer you deadly force with minimal investment. Getting everything mentioned in this post set up on a new WordPress website takes about 5 minutes, and you keep the ability to change things on the fly moving forward.