The broad subject of SEO and its implementation has two main facets. One is the creative element, which covers aspects of SEO such as content creation, accessibility, and some parts of user experience (UX).
The second facet is technical SEO, which, as its name suggests, deals with much of the work you will not usually see, such as code, metadata, and the structure of the website. Technical SEO covers the elements that ensure a website can be crawled and indexed, both of which are essential for it to appear in the search engines and for its rankings to be determined. The file that tells search engines what they may and may not crawl is ‘robots.txt’, so let us look at it in more detail.
What Is ‘robots.txt’?
Before we go any further, we must clarify that robots.txt has nothing to do with androids or any artificial intelligence tool. It is a small plain-text file, placed at the root of a website, containing a few lines of directives that tell search engines which pages of that website they may and may not crawl. The robots.txt file acts like a signpost, pointing search engine bots towards the pages you want crawled and away from those you do not. Some crawlers also accept a ‘crawl-delay’ directive suggesting how long to wait between requests, although not every search engine honours it.
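At its simplest, the file is only a couple of lines long. The ‘/private/’ path below is purely a hypothetical placeholder:

```
# Apply these rules to every crawler; keep them out of /private/
User-agent: *
Disallow: /private/
```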
How Does ‘robots.txt’ Work?
The bots and spiders the search engines send out have two main functions. The first is to crawl the internet to discover content, and the second is to index that content so that the search engines can create search results whenever a user types in a search term.
As part of this process, search engine bots will find and follow links, and this journey from link to link can take in several websites. When a bot lands on a website, one of its first tasks is to request the robots.txt file, which sits at the root of the domain (for example, example.com/robots.txt). If the file exists, the bot reads it to work out which URLs on that website it is allowed to crawl.
The syntax within a robots.txt file is relatively simple, with the two key directives being ‘Allow’ and ‘Disallow’. As you have probably already worked out, compliant bots will crawl URLs (pages) covered by ‘Allow’ and will skip those covered by ‘Disallow’. Each group of rules sits beneath a ‘User-agent’ line, which identifies the specific bot the directives apply to (an asterisk means they apply to all bots). Note that ‘Disallow’ only stops crawling; a blocked URL can still end up indexed if other sites link to it.
You can allow search engine bots to crawl your entire website by leaving the ‘Disallow’ directive empty (or using a blanket ‘Allow’) in the robots.txt file at the root of your site. However, if you wanted to exclude your ‘About Us’ and ‘Privacy Policy’ pages from being crawled, you could add ‘Disallow’ rules for those URLs. You can even disallow a specific content item, such as a single blog post, whilst allowing all the other blog posts to be crawled, as sketched below.
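A minimal sketch of that setup might look like the following; the paths (‘/about-us/’, ‘/privacy-policy/’, ‘/blog/old-post/’) are hypothetical and would need to match your site’s actual URLs:

```
# Rules for all crawlers
User-agent: *
# Keep these pages out of the crawl
Disallow: /about-us/
Disallow: /privacy-policy/
# Block one specific post while the rest of /blog/ stays crawlable
Disallow: /blog/old-post/
```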
The Importance Of ‘robots.txt’
One of the reasons robots.txt is used is to avoid overloading your website’s resources, such as bandwidth, because bots will not request pages you do not want them to crawl. This can be especially important on large websites with hundreds of pages.
Robots.txt is also used to exclude pages that are broken, under construction, or duplicated for testing purposes, none of which you want search engines spending time on, as thin or duplicate content can harm rankings. There will also be pages on your website that do little to help your rankings and therefore do not need crawling. Examples include login pages, legal pages, pages you wish to keep private, and pages with little to no content.
Another aspect of SEO where robots.txt helps is your ‘crawl budget’. This has nothing to do with money; it is the number of pages on your website that Google will crawl within a given period. By excluding unwanted pages with the ‘Disallow’ directive, more of that budget is spent on the pages you actually want crawled and indexed.
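Pulling those points together, a file aimed at conserving crawl budget might look something like the sketch below. Every path is a hypothetical placeholder and should be replaced with the URLs that apply to your own site:

```
User-agent: *
# Low-value pages that do not need crawling
Disallow: /login/
Disallow: /terms-and-conditions/
# Work-in-progress and duplicate test pages
Disallow: /under-construction/
Disallow: /staging-copy/
```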
Creating A ‘robots.txt’ File
Creating a robots.txt file and adding it to your website is relatively easy, but as with any changes you make to your website, always create a backup in case you inadvertently make an error. There are four steps you should follow.
Step 1: Determine Which Pages You Want To Include/Exclude: Decide which pages should and should not be crawled. A tool such as Google Search Console or Screaming Frog can help you identify pages on your website that might not be relevant to your target audience.
Step 2: Create A ‘robots.txt’ File: For a WordPress website, the simplest way to create your robots.txt file is with a plugin. The alternative is to create a plain text file named ‘robots.txt’ using a text editor such as Notepad and upload it to the root directory of your website, so that it is reachable at yourdomain.com/robots.txt.
Step 3: Add Directives To The ‘robots.txt’ File: Once you have created the robots.txt file, you can add the directives that instruct search engine bots. Use the ‘User-agent’ directive to specify which bot a group of rules applies to, and the ‘Allow’ or ‘Disallow’ directives as needed (a short example appears after these steps).
Step 4: Test Your ‘robots.txt’ File: After you have created your robots.txt file, you should test it using Google Search Console. This helps confirm that the rules block and allow the URLs you intend; a small local check is also sketched after these steps.
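To make Step 3 concrete, here is a minimal sketch of a file with two directive groups. The paths and the ‘ExampleBot’ user-agent name are purely hypothetical placeholders:

```
# Rules for every crawler that reads this file
User-agent: *
Disallow: /login/
Disallow: /private/

# A named bot follows its own, more specific group instead of the one above
User-agent: ExampleBot
Disallow: /
```

For Step 4, alongside Google Search Console, one rough local sanity check is Python’s built-in robots.txt parser. This is only a sketch, assuming your site lives at example.com, and it does not reproduce every search engine’s exact parsing rules:

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt file from the site root
parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()

# Check whether particular URLs are crawlable for a given user-agent
print(parser.can_fetch("*", "https://example.com/login/"))         # expect False
print(parser.can_fetch("*", "https://example.com/blog/my-post/"))  # expect True
```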