Importance of Robots.txt
Robots.txt is a plain-text file that resides in the root directory of a website. It contains a set of directives that tell search engines which parts of your site their crawlers should or should not access.
Search engines systematically visit sites with their crawler bots, process the content they find algorithmically, and add it to their indexes. Site owners sometimes do not want search engines to access certain parts of a site, and they use the robots.txt file for this purpose.
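For example, for a site served at https://www.example.com, crawlers only look for the file at https://www.example.com/robots.txt; a copy placed anywhere else on the site is ignored.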
Correct Usage of Robots.txt
Although getting your content indexed by search engines is important, it is also often desirable to close some parts of the site to these bots. If the file is written incorrectly, however, it can do more harm than good, for example by accidentally blocking pages you want indexed.
The first steps toward a correct robots.txt (a minimal example follows this list):
- The file must be placed in the root directory of the site
- The file must be saved in UTF-8 encoding
- The rules only apply to the exact protocol and host the file is served from, so URLs must match the site's own format
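A minimal sketch of a file meeting these requirements, saved as UTF-8 in the root directory (the blocked path /private/ is a placeholder for illustration):
User-agent: *
Disallow: /private/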
Robots.txt Syntax
The syntax can be thought of as the robots.txt “language”. The most commonly used directives (a combined example follows this list):
- User-agent: Names the search engine bot that the following rules apply to.
- Disallow: Closes a path to crawling. Only one URL path can be entered per Disallow line.
- Allow: Re-opens a path inside an otherwise disallowed directory. It began as a Google extension, so it is only guaranteed to be honored by Google's bots, although other major crawlers now support it as well.
- Crawl-delay: Sets how many seconds a bot should wait between requests. This directive does not work on Google; Google's crawl rate is adjusted in Google Search Console instead.
- Sitemap: Specifies the full URL of the site's sitemap file.
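A short sketch combining these directives; the paths and the sitemap URL are hypothetical placeholders:
User-agent: *
Disallow: /archive/
Allow: /archive/public-page.html
Crawl-delay: 10
Sitemap: https://www.example.com/sitemap.xml
Here everything under /archive/ is closed to crawling except the single page re-opened by Allow, and compliant bots are asked to wait 10 seconds between requests.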
Robots.txt Examples
You can direct multiple bots (spiders) as you wish with separate rule groups you add to the file; a combined multi-group example appears at the end of this section. Some common cases:
Close the entire site to all bots:
User-agent: *
Disallow: /
Open the entire site to all bots:
User-agent: *
Disallow:
Close a specific folder to Googlebot:
User-agent: Googlebot
Disallow: /ornek-sayfa/
Close a specific page to Googlebot:
User-agent: Googlebot
Disallow: /ornek-sayfa/sayfa.html
Ask a bot to wait 120 seconds between requests (as noted above, Googlebot ignores Crawl-delay, so Bingbot is used here):
User-agent: Bingbot
Crawl-delay: 120
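Rule groups can also be combined in a single file. A sketch, assuming you want Googlebot kept out of /ornek-sayfa/ while every other bot may crawl everything but is asked to slow down:
User-agent: Googlebot
Disallow: /ornek-sayfa/

User-agent: *
Crawl-delay: 120
Disallow: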