Importance of Robots.txt
Robots.txt is a plain-text file that resides in the root directory of a website. It contains a set of directives that tell search engines which parts of your site their crawlers should or should not access.
Search engines systematically visit sites with their crawler bots, process the content they find algorithmically, and add it to their indexes. Site owners sometimes do not want search engines to access certain parts of a site, and they use the robots.txt file for this purpose.
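For example, for a site served at https://www.example.com, crawlers only look for the file at https://www.example.com/robots.txt; a copy placed anywhere else on the site is ignored.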
Correct Usage of Robots.txt
Although getting your content indexed by search engines is important, it is also often desirable to close some parts of the site to these bots. If the file is written incorrectly, however, it can do more harm than good, for example by accidentally blocking pages you want indexed.
The first steps toward a correct robots.txt (a minimal example follows this list):
- The file must be placed in the root directory of the site
- The file must be saved in UTF-8 encoding
- The rules only apply to the exact protocol and host the file is served from, so URLs must match the site's own format
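A minimal sketch of a file meeting these requirements, saved as UTF-8 in the root directory (the blocked path /private/ is a placeholder for illustration):
User-agent: *
Disallow: /private/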
Robots.txt Syntax
The syntax can be thought of as the robots.txt “language”. The most commonly used directives (a combined example follows this list):
- User-agent: Names the search engine bot that the following rules apply to.
- Disallow: Closes a path to crawling. Only one URL path can be entered per Disallow line.
- Allow: Re-opens a path inside an otherwise disallowed directory. It began as a Google extension, so it is only guaranteed to be honored by Google's bots, although other major crawlers now support it as well.
- Crawl-delay: Sets how many seconds a bot should wait between requests. This directive does not work on Google; Google's crawl rate is adjusted in Google Search Console instead.
- Sitemap: Specifies the full URL of the site's sitemap file.
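A short sketch combining these directives; the paths and the sitemap URL are hypothetical placeholders:
User-agent: *
Disallow: /archive/
Allow: /archive/public-page.html
Crawl-delay: 10
Sitemap: https://www.example.com/sitemap.xml
Here everything under /archive/ is closed to crawling except the single page re-opened by Allow, and compliant bots are asked to wait 10 seconds between requests.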
Robots.txt Examples
You can direct multiple bots (spiders) as you wish with separate rule groups you add to the file; a combined multi-group example appears at the end of this section. Some common cases:
Close the entire site to all bots:
User-agent: *
Disallow: /
Open the entire site to all bots:
User-agent: *
Disallow:
Close a specific folder to Googlebot:
User-agent: Googlebot
Disallow: /ornek-sayfa/
Close a specific page to Googlebot:
User-agent: Googlebot
Disallow: /ornek-sayfa/sayfa.html
Ask a bot to wait 120 seconds between requests (as noted above, Googlebot ignores Crawl-delay, so Bingbot is used here):
User-agent: Bingbot
Crawl-delay: 120
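Rule groups can also be combined in a single file. A sketch, assuming you want Googlebot kept out of /ornek-sayfa/ while every other bot may crawl everything but is asked to slow down:
User-agent: Googlebot
Disallow: /ornek-sayfa/

User-agent: *
Crawl-delay: 120
Disallow: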