Freshmarketers.com
Become an expert digital marketer with a detailed course, right from the basics to the advanced level.
Robots.txt file

What is a robots.txt file and why is it used?

What is a robots.txt file?

A robots.txt file is a plain-text file (.txt) placed at the root of a website. It tells search engine crawlers which URLs or sections of the site they may or may not crawl. From the viewpoint of crawling, some web pages on a website are important and some are not.

For example, your WordPress admin URL is not important to crawl because it is never shown on search engines, but your blog posts are.
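As a minimal sketch, the rule below blocks a hypothetical WordPress admin folder for every crawler. The domain example.com and the blog-post path are placeholders, and Python's standard urllib.robotparser stands in for a search engine's own parser, which may behave differently in edge cases.

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt for a hypothetical WordPress site (example.com is a placeholder).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from a string instead of downloading them

admin_ok = parser.can_fetch("*", "https://www.example.com/wp-admin/")
post_ok = parser.can_fetch("*", "https://www.example.com/blog/my-first-post/")
print(admin_ok, post_ok)  # False True
```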

However, when a crawler comes to your website, it tries to crawl every page it can find, including pages that are not usually shown to users. It can then waste time on unimportant pages and miss important ones such as blog posts.

The time a crawler spends on each website (the crawl budget) is limited. For a new website the crawl budget can be very small, and it might not be enough for the crawler to get through all the pages.

Therefore, we want to save the crawler's time by keeping it away from unimportant pages with the robots.txt file, so that it spends its budget on the most important ones.

You tell the crawler this in its own language, which is this file.
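To make the crawl-budget idea concrete, here is a toy sketch of a polite crawler with a fixed budget of pages. The budget of 3, the URL list, and the helper name plan_crawl are all hypothetical; real search engines decide crawl budget in their own, far more complex ways.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /tag/
"""

def plan_crawl(robots_txt, discovered_urls, budget, agent="*"):
    """Hypothetical helper: return the URLs a polite crawler would spend its budget on."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = [url for url in discovered_urls if parser.can_fetch(agent, url)]
    return allowed[:budget]  # the toy crawler stops once its budget is used up

# Placeholder URLs, in the order the toy crawler happened to discover them.
discovered = [
    "https://www.example.com/wp-admin/",
    "https://www.example.com/wp-admin/options.php",
    "https://www.example.com/tag/seo/",
    "https://www.example.com/blog/robots-txt-guide/",
    "https://www.example.com/blog/career-roadmap/",
]

# With an empty robots.txt, the budget of 3 goes to admin and tag pages only.
print(plan_crawl("", discovered, budget=3))
# With the rules above, both blog posts fit inside the same budget.
print(plan_crawl(ROBOTS_TXT, discovered, budget=3))
```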

First of all, don't confuse it with the noindex tag, which blocks a page from being indexed so that it does not appear on the search engine results page.
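The difference is where each instruction lives: robots.txt is a separate file that a crawler reads before requesting a page, while a noindex instruction sits inside the page itself, typically as a meta tag like `<meta name="robots" content="noindex">`. The sketch below uses Python's built-in html.parser to illustrate only that second case. NoindexDetector and has_noindex are hypothetical names, and real crawlers also honour other signals such as the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser

class NoindexDetector(HTMLParser):
    """Hypothetical helper: flags pages that carry a robots meta tag containing 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None, so normalise them to strings.
        attributes = {name.lower(): (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            directives = [d.strip().lower() for d in attributes.get("content", "").split(",")]
            if "noindex" in directives:
                self.noindex = True

def has_noindex(html):
    detector = NoindexDetector()
    detector.feed(html)
    detector.close()
    return detector.noindex

page = '<html><head><meta name="robots" content="noindex, follow"></head><body>Thank you!</body></html>'
print(has_noindex(page))  # True
```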

URLs blocked in the robots.txt file can still appear on the search results page. I have explained below how to check.

If a URL blocked in the robots.txt file is linked from other websites or other places on the web, a bot can still discover it by following those links and index it. Likewise, media files such as images, videos, and audio files that are blocked in the robots.txt file can still be linked from other web pages.

The robots.txt file is more like a guide, so it is up to each crawler whether it follows the instructions exactly.

Limitations of robots.txt files

As I said before, this file is more like a guide than a set of enforceable instructions, and the URLs can still be crawled and indexed by bots. Any information that you want to hide from users needs to be protected by other methods, for example password protection.

A User-agent value of * means that the instructions in the file apply to all search engine bots, such as Googlebot, Bingbot, and Yahoo's crawler. However, not all bots follow the instructions in the same way.
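Here is a small sketch of the catch-all group, again using Python's urllib.robotparser as a stand-in crawler. The /private/ path and the name SomeOtherBot are hypothetical; the point is that every bot name receives the same answer from the * group.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

URL = "https://www.example.com/private/report.html"  # placeholder URL
BOTS = ["Googlebot", "Bingbot", "SomeOtherBot"]  # SomeOtherBot is a made-up name

# The * group is the catch-all, so every bot is blocked from /private/.
results = {bot: parser.can_fetch(bot, URL) for bot in BOTS}
print(results)
```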

You can also create separate instructions for individual bots by giving each one its own User-agent line.
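A sketch of separate groups, with hypothetical /drafts/ and /private/ paths. One detail worth knowing: a bot that has its own group follows only that group and ignores the * group. Google documents this behaviour, and Python's urllib.robotparser, used here as a stand-in, behaves the same way for this simple file.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(bot, path):
    # example.com is a placeholder domain
    return parser.can_fetch(bot, "https://www.example.com" + path)

print(allowed("Googlebot", "/drafts/post.html"))   # False: blocked by its own group
print(allowed("Googlebot", "/private/data.html"))  # True: the * group is ignored for Googlebot
print(allowed("Bingbot", "/private/data.html"))    # False: Bingbot falls back to the * group
print(allowed("Bingbot", "/drafts/post.html"))     # True
```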

How to create a robots.txt file?

Creating and writing rules in a robots.txt file is a somewhat technical task that is best left to a web developer. However, if you use WordPress, a plugin can create the robots.txt file for you automatically, and you can edit it as you like (explained below).

You can read more about creating a robots.txt file here.
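If you do write one by hand, the format is simple enough to generate. The sketch below is a hypothetical helper (build_robots_txt is my own name, not part of any tool) that turns a mapping of user agents to blocked paths into robots.txt text, then re-parses it with Python's urllib.robotparser as a sanity check. You would still save the text as robots.txt and upload it to your site's root yourself.

```python
from urllib.robotparser import RobotFileParser

def build_robots_txt(rules):
    """Hypothetical helper: turn {user_agent: [disallowed paths]} into robots.txt text."""
    groups = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        # An empty Disallow value means nothing is blocked for this agent.
        lines += [f"Disallow: {path}" for path in paths] or ["Disallow:"]
        groups.append("\n".join(lines))
    # Groups are separated by a blank line.
    return "\n\n".join(groups) + "\n"

robots_txt = build_robots_txt({"*": ["/wp-admin/"]})
print(robots_txt)

# Sanity check: parse the generated text the way a crawler library would.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("*", "https://www.example.com/wp-admin/"))  # False
```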

How does it look?

Just type WebsiteURL/robots.txt into your browser's address bar.

For example, https://www.facebook.com/robots.txt

Remember, it's robots.txt, not robot.txt.

You can use the same pattern to check the robots.txt file of any website.
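The same pattern can be built in code, since robots.txt always sits at the root of the host no matter which page you start from. In this sketch robots_url and load_live_robots are hypothetical helper names; load_live_robots needs network access and uses urllib.robotparser's own read() to download the file.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

def robots_url(page_url):
    """Build the robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop any query or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

def load_live_robots(page_url):
    """Download and parse a site's robots.txt (requires network access)."""
    parser = RobotFileParser(robots_url(page_url))
    parser.read()
    return parser

print(robots_url("https://www.facebook.com/some/page?ref=home"))
# https://www.facebook.com/robots.txt
```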

How to update the robots.txt file?

To check whether your robots.txt file is working properly, use this link by Google.

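You will be directed to Google Search Console. Here, in the editor, you can edit the rules and test them. Changes made in the editor are not saved to your website automatically; to update the live file, you need to upload the edited robots.txt to the root of your site.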

At the bottom of the page, you will see a box for testing different URLs. Just complete the URL and click TEST. The tool shows whether that web page is ALLOWED or DISALLOWED.
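If you want a rough offline version of that check, here is a sketch that labels URLs ALLOWED or DISALLOWED with Python's urllib.robotparser. check_urls and the example URLs are hypothetical. Treat it as an approximation: Python's parser applies rules in file order and does not implement Google's * and $ wildcard matching, so for complex files its answers can differ from Google's.

```python
from urllib.robotparser import RobotFileParser

def check_urls(robots_txt, urls, agent="Googlebot"):
    """Hypothetical helper: label each URL ALLOWED or DISALLOWED for one user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: "ALLOWED" if parser.can_fetch(agent, url) else "DISALLOWED" for url in urls}

ROBOTS_TXT = "User-agent: *\nDisallow: /wp-admin/\n"
report = check_urls(ROBOTS_TXT, [
    "https://www.example.com/wp-admin/",
    "https://www.example.com/blog/",
])
for url, verdict in report.items():
    print(verdict, url)
```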

I have also made a YouTube Short for fun. Check it out here!

I hope you found this post helpful. I try to write about topics that are often disregarded but are stepping stones to early success in digital marketing.

If you would like to get in touch, feel free to comment or reach out to me on LinkedIn. I have also recently started posting marketing content on YouTube. I am very excited about it, so please check it out and let me know what you think!

If you are wondering how to get started in digital marketing, check out my post on the career roadmap.

Happy learning 🙂
