As you can see, the robots.txt file is a fundamental aspect of SEO that, unfortunately, often does not receive the attention it deserves. The problem is that, although Google’s documentation provides valuable information, it does not cover every detail of how this file is interpreted. If we limit ourselves to that information alone, we run the risk of making mistakes that could have negative consequences in the future.
So, here are 10+1 concepts about robots.txt to keep in mind and assimilate: from the most basic to tips that only apply to complex websites or to sites doing fine-grained Crawl Budget optimization.
What concepts should we know about robots.txt?
1. Where you place your file is more important than you think.
There is still confusion on this point. The robots.txt file has always been looked for at the “/robots.txt” path of the domain. The robots.txt affects the host where it is hosted, and only that exact host. This means that a subdomain will not pay attention to the robots.txt of its parent host, and that http and https use different files.
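The rule above can be sketched in a few lines of Python (a minimal illustration; the helper name and the `example.com` hosts are hypothetical):

```python
from urllib.parse import urlsplit

def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs a page.
    It is fetched only from the root of the exact scheme + host."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

# Each scheme/host combination has its own file:
print(robots_txt_url("https://example.com/blog/post"))       # https://example.com/robots.txt
print(robots_txt_url("http://example.com/blog/post"))        # http://example.com/robots.txt
print(robots_txt_url("https://shop.example.com/item?id=1"))  # https://shop.example.com/robots.txt
```

Note that the subdomain and the http version each map to a different file, even though all three pages belong to the same registered domain.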
So why do we see sites where one configuration blocks another?
We will see this later, but it is mainly due to hosts that are completely redirected with a 301. That is, when we see that the robots.txt of one host also affects mydomain.com, it is usually because there is a redirection from one host to the other.
Therefore, the same file is read for both hosts.
The same happens with http and https: if one is redirected to the other, the same file is applied to both.
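To make the redirect case concrete, here is a small sketch (the helper and the redirect map are hypothetical, not a real crawler) of why a crawler ends up reading one file for several hosts:

```python
def effective_robots_host(origin: str, redirects: dict) -> str:
    """Follow permanent (301) redirects until we reach the host
    whose robots.txt is actually served."""
    seen = set()
    while origin in redirects and origin not in seen:
        seen.add(origin)  # guard against redirect loops
        origin = redirects[origin]
    return origin

# Hypothetical site where the bare domain 301-redirects to https://www.
redirects = {
    "http://mydomain.com": "https://www.mydomain.com",
    "https://mydomain.com": "https://www.mydomain.com",
}

# All of these hosts end up serving the same robots.txt:
for host in ("http://mydomain.com", "https://mydomain.com", "https://www.mydomain.com"):
    print(effective_robots_host(host, redirects))
```

In practice this is why a blocking rule written for the destination host appears to "affect" the redirected one: the redirected host never serves a robots.txt of its own.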
Furthermore, we must eliminate the belief that robots.txt acts, as many think, on specific folders of the site. This is not true: Google only reads it if it is at the root of the host.
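A quick way to internalize this: only the exact path `/robots.txt` at the root of the host counts; a copy placed in a subfolder is treated as an ordinary file. A minimal check (hypothetical helper name):

```python
from urllib.parse import urlsplit

def is_effective_robots_location(url: str) -> bool:
    """True only when the file sits at the root of the host;
    robots.txt files in subfolders are ignored by crawlers."""
    return urlsplit(url).path == "/robots.txt"

print(is_effective_robots_location("https://example.com/robots.txt"))       # True
print(is_effective_robots_location("https://example.com/blog/robots.txt"))  # False
```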