Seeing the “Indexed, though blocked by robots.txt” status in Google Search Console can be confusing and concerning. It means Google has indexed pages on your site even though your robots.txt file blocks Googlebot from crawling them.
The good news is that this issue is usually easy to fix with a few simple steps. In this guide, we’ll cover what causes this robots.txt error, how to diagnose it, and the best ways to resolve it.
What Triggers the Indexed Though Blocked by robots.txt Error
There are two main reasons you may encounter the “Indexed Though Blocked by robots.txt” error:
Overly Restrictive robots.txt File
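If your robots.txt file contains overly broad or restrictive crawl directives, it can inadvertently block Googlebot from crawling pages that have already been indexed. This most often comes from a misconfigured “Disallow” directive. Disallow values match URL paths by prefix, so “Disallow: /blog” blocks not only /blog/ but every URL whose path starts with /blog, such as /blogging-tips.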
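To see how far a short prefix reaches, the sketch below runs a hypothetical robots.txt through Python’s standard-library urllib.robotparser. The rules and example.com URLs are made up for illustration. Note that urllib.robotparser applies the first matching rule in file order and does not understand the * or $ wildcards, so it only approximates Google’s behavior when rules overlap.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that was meant to hide only /blog/drafts/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog
"""

def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that `robots_txt` blocks for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]

urls = [
    "https://example.com/blog/drafts/post-1",
    "https://example.com/blog/published-post",
    "https://example.com/blogging-tips",
    "https://example.com/about",
]
for url in blocked_urls(ROBOTS_TXT, urls):
    print("blocked:", url)
```

Here the single short rule blocks three of the four URLs. Narrowing it to “Disallow: /blog/drafts/” would block only the first one.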
Recently Updated robots.txt File
Changing your robots.txt file can also trigger this message. If you update robots.txt to block pages that are already indexed, Google may keep them in its index but can no longer crawl them.
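In both situations, the pages are in Google’s index, but the current robots.txt file blocks Googlebot from crawling them. Keep in mind that robots.txt controls crawling, not indexing: Google can index a blocked URL without ever crawling it if other pages link to it, usually showing it in results without a description.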
Diagnosing the Root Cause of the Error
To fix this error, you first need to analyze your robots.txt file to determine the specific root cause:
Review Your robots.txt Directives
Carefully check the “User-agent”, “Disallow”, and “Allow” lines in your robots.txt file, and identify any Disallow rules that are broader than intended. Common issues include a stray “Disallow: /”, which blocks the entire site for that user agent, and rules that inadvertently cover whole directories or site sections.
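One way to spot overly broad rules is to count how many of your indexed URLs each Disallow line catches. The Python sketch below does this with plain prefix matching. It is a rough review aid that ignores user-agent groups, Allow lines, wildcards, and query strings, and the rules and URLs are hypothetical.

```python
from urllib.parse import urlsplit

def rule_impact(robots_txt, urls):
    """Count how many of `urls` each Disallow value matches as a path prefix.

    First-pass review aid only: ignores user-agent groups, Allow lines,
    wildcards (* and $ are treated literally), and query strings.
    """
    paths = [urlsplit(url).path or "/" for url in urls]
    counts = {}
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, _, value = line.partition(":")
        prefix = value.strip()
        if field.strip().lower() != "disallow" or not prefix:
            continue  # not a Disallow line, or an empty Disallow (blocks nothing)
        counts[prefix] = sum(path.startswith(prefix) for path in paths)
    return counts

# Hypothetical rules and URLs taken from a list of indexed pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/
Disallow: /shop   # meant to block /shop/cart only
"""
urls = [
    "https://example.com/shop/shoes",
    "https://example.com/shop/hats",
    "https://example.com/shopping-guide",
    "https://example.com/tmp/cache",
    "https://example.com/about",
]
impact = rule_impact(ROBOTS_TXT, urls)
for prefix, hits in sorted(impact.items(), key=lambda kv: -kv[1]):
    print(f"{hits:>3} URLs  Disallow: {prefix}")
```

A rule that catches far more URLs than you expected, like /shop here, is the first candidate to narrow.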
Compare It With the Previously Fetched robots.txt
Open the robots.txt report in Google Search Console to see the version of the file Google last fetched. Compare it with your current file, and with any earlier copies you keep (for example, in version control or backups), to spot recently added restrictions that could be triggering the error.
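If you have the old and new versions as text, a short script can list the URLs that became blocked between them. This sketch uses Python’s urllib.robotparser, which uses first-match semantics and no wildcard support, so treat results for overlapping Allow/Disallow rules as approximate. The two file versions and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def _load(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def newly_blocked(old_txt, new_txt, urls, agent="Googlebot"):
    """URLs the old robots.txt let `agent` crawl but the new one blocks."""
    old, new = _load(old_txt), _load(new_txt)
    return [url for url in urls
            if old.can_fetch(agent, url) and not new.can_fetch(agent, url)]

# Hypothetical before/after versions of the file.
OLD = "User-agent: *\nDisallow: /cart/\n"
NEW = "User-agent: *\nDisallow: /cart/\nDisallow: /products/\n"
urls = [
    "https://example.com/cart/checkout",
    "https://example.com/products/red-shoes",
    "https://example.com/about",
]
print(newly_blocked(OLD, NEW, urls))
```

The /cart/ URL is not reported because it was already blocked before the change; only the restriction added in the new version shows up.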
Use the URL Inspection Tool
Use Search Console’s URL Inspection tool, which replaced the retired Fetch as Google tool, to check the affected pages. Run a live test on each page to confirm whether your current robots.txt still blocks Googlebot from crawling it.
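For many URLs at once, the Search Console URL Inspection API exposes similar data programmatically. Unlike the live test in the tool, the API returns information about the indexed version of a URL rather than fetching it fresh. The Python sketch below assumes the google-api-python-client package and already-authorized credentials (neither is shown) and reads the robotsTxtState field from the response. The sample response is a trimmed, hypothetical example.

```python
def index_status(response):
    """Pull the index-status section out of a URL Inspection API response dict."""
    return response.get("inspectionResult", {}).get("indexStatusResult", {})

def blocked_by_robots(response):
    """True if Google's record for the URL says robots.txt disallows crawling it."""
    return index_status(response).get("robotsTxtState") == "DISALLOWED"

def inspect_url(service, site_url, page_url):
    """Query the URL Inspection API (needs google-api-python-client).

    `service` is assumed to be built with
    googleapiclient.discovery.build("searchconsole", "v1", credentials=creds)
    using credentials that have access to the `site_url` property.
    """
    body = {"inspectionUrl": page_url, "siteUrl": site_url}
    return service.urlInspection().index().inspect(body=body).execute()

# Trimmed, hypothetical response for a blocked URL.
sample = {
    "inspectionResult": {
        "indexStatusResult": {
            "coverageState": "Indexed, though blocked by robots.txt",
            "robotsTxtState": "DISALLOWED",
        }
    }
}
print(blocked_by_robots(sample))
```

Keeping the response parsing separate from the network call makes it easy to test the logic without credentials.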
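Google Search Console Page Indexing Report
Open the “Indexed, though blocked by robots.txt” issue in Search Console’s Page indexing report (formerly the Coverage report) to see the full list of affected URLs, then check each URL against your current robots.txt rules. This confirms which URLs are actually blocked and by which rule.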
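The affected-URL list can be exported from the report and checked in bulk. The sketch below reads such an export as CSV and returns the URLs that your current robots.txt still blocks. It assumes the URLs sit in a column headed “URL”, so check your actual export and adjust url_column if it differs. The rules and sample rows are hypothetical.

```python
import csv
import io
from urllib.robotparser import RobotFileParser

def still_blocked(csv_file, robots_txt, url_column="URL", agent="Googlebot"):
    """Return exported URLs that `robots_txt` still blocks for `agent`.

    `url_column` is an assumption about the export's header row.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [row[url_column] for row in csv.DictReader(csv_file)
            if not parser.can_fetch(agent, row[url_column])]

# Hypothetical export contents and current robots.txt rules.
export = io.StringIO(
    "URL\n"
    "https://example.com/private/report\n"
    "https://example.com/guides/setup\n"
)
robots = "User-agent: *\nDisallow: /private/\n"
print(still_blocked(export, robots))
```

To read a real export, open it with open("export.csv", newline="", encoding="utf-8-sig") and pass the file object. The utf-8-sig codec also strips a byte-order mark if one is present. The filename is a placeholder.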
Fixing the “Indexed Though Blocked by robots.txt” Errors
Once you’ve diagnosed the specific issues, follow the steps below to fix the problem:
Remove Overly Broad Blocks
Edit your robots.txt file to remove any unnecessary or overly broad Disallow directives, and replace them with narrower rules that target only the paths you actually want to keep crawlers away from.
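Before publishing an edited file, it can help to write down which URLs must stay crawlable and which must stay blocked, then check the draft against both lists. This Python sketch uses urllib.robotparser (first-match semantics, no wildcard support). The draft and both URL lists are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def check_robots(robots_txt, must_allow, must_block, agent="Googlebot"):
    """List every expectation the robots.txt draft violates; [] means all pass."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = [f"blocked but should be crawlable: {url}"
                for url in must_allow if not parser.can_fetch(agent, url)]
    problems += [f"crawlable but should be blocked: {url}"
                 for url in must_block if parser.can_fetch(agent, url)]
    return problems

# Hypothetical edited file: a former "Disallow: /" narrowed to two paths.
EDITED = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""
must_allow = ["https://example.com/products/red-shoes", "https://example.com/blog/launch"]
must_block = ["https://example.com/admin/login", "https://example.com/search?q=shoes"]
print(check_robots(EDITED, must_allow, must_block) or "all expectations hold")
```

Keeping the two lists alongside the file turns every future edit into a quick regression check.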
Selectively Allow Blocked URLs
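For URLs you want to remain indexed, add “Allow” directives for those paths in robots.txt. Google applies the most specific matching rule, meaning the one with the longest path, so an Allow for a specific page overrides a shorter Disallow that covers its directory. If an Allow and a Disallow rule are equally specific, Google uses the Allow.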
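Python’s urllib.robotparser resolves conflicts by the first matching rule in file order rather than by specificity, so it can disagree with Google on Allow exceptions. The sketch below implements the longest-match precedence directly, with Allow winning ties, and supports the * and $ wildcards. It works on a single user-agent group’s rules and on a URL path plus optional query string, and it skips details such as percent-encoding normalization. The rules are hypothetical.

```python
import re

def _pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex: * = any run, trailing $ = end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, path):
    """Decide whether `path` (path plus optional query) may be crawled.

    `rules` holds (directive, pattern) pairs from one user-agent group.
    The longest matching pattern wins; on a tie, Allow wins.
    No matching rule means crawling is allowed.
    """
    best_length, allowed = -1, True
    for directive, pattern in rules:
        if not pattern or not _pattern_to_regex(pattern).match(path):
            continue  # empty value or no match: this rule does not apply
        length = len(pattern)
        if length > best_length or (length == best_length and directive == "allow"):
            best_length, allowed = length, directive == "allow"
    return allowed

# Equivalent to this hypothetical group:
#   User-agent: *
#   Disallow: /private/
#   Allow: /private/press-kit.html
#   Disallow: /*.pdf$
rules = [
    ("disallow", "/private/"),
    ("allow", "/private/press-kit.html"),
    ("disallow", "/*.pdf$"),
]
for path in ["/private/press-kit.html", "/private/notes.txt",
             "/docs/guide.pdf", "/docs/guide.pdf?dl=1"]:
    print(path, "->", "allowed" if is_allowed(rules, path) else "blocked")
```

The Allow line works even though it comes after the broader Disallow, because its pattern is longer; rule order in the file does not matter under this precedence.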
Request URL Removal
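For blocked URLs you want removed from search results, the Removals tool in Google Search Console can hide them, but only temporarily (for about six months). To remove a page permanently, lift the robots.txt block so Google can crawl it, and add a noindex rule with a robots meta tag or an X-Robots-Tag HTTP header. Google cannot see a noindex rule on a page that robots.txt stops it from crawling.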
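The noindex approach only works if Google can fetch the page. The sketch below, using only Python’s standard library, checks both conditions: that robots.txt allows crawling (passed in as a boolean, for example from a robots.txt parser) and that the page carries noindex in an X-Robots-Tag header or a robots/googlebot meta tag. The sample page is hypothetical, and the header check reads only a single X-Robots-Tag value.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() in ("robots", "googlebot"):
            self.contents.append(values.get("content", "").lower())

def noindex_visible(crawl_allowed, headers, html):
    """True only if Google may crawl the page AND the page carries a noindex rule.

    `headers` is a mapping such as a dict or the .headers of a urllib response.
    """
    if not crawl_allowed:
        return False  # a robots.txt block means Google never fetches the page
    header = headers.get("X-Robots-Tag", "").lower()
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in header or any("noindex" in c for c in finder.contents)

page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(noindex_visible(False, {}, page))  # crawl blocked, so the tag is never seen
print(noindex_visible(True, {}, page))   # crawlable, so the tag can take effect
```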
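Request a Recrawl of robots.txt
Make sure your updated robots.txt file is live at the root of your site (for example, https://www.example.com/robots.txt), then use the robots.txt report in Search Console to request a recrawl of the file so Google picks up your fixes sooner.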
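Before requesting the recrawl, it can be worth confirming that the file your server actually returns matches the copy you edited. The sketch below downloads the live robots.txt with Python’s urllib.request and compares both versions after dropping comments, blank lines, and field-name capitalization. The origin URL and local filename in the usage comment are placeholders.

```python
import urllib.request

def normalize(robots_txt):
    """Keep only rule lines: comments and blank lines removed, field names lowercased."""
    rules = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        field, sep, value = line.partition(":")
        rules.append(f"{field.strip().lower()}: {value.strip()}" if sep else line)
    return rules

def fetch_live(origin, timeout=10):
    """Download robots.txt from the root of `origin`, e.g. "https://example.com".

    Raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    url = origin.rstrip("/") + "/robots.txt"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8-sig", errors="replace")

# Usage (placeholders; performs a network request):
# with open("robots.txt", encoding="utf-8-sig") as f:
#     if normalize(fetch_live("https://example.com")) != normalize(f.read()):
#         print("The deployed robots.txt differs from your edited copy.")
```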
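Then ask Google to recrawl the affected URLs now that robots.txt allows them: use “Request Indexing” in the URL Inspection tool for important pages, and click “Validate Fix” on the issue in the Page indexing report so Google rechecks the rest.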
Following these steps should resolve “Indexed Though Blocked by robots.txt” errors by aligning your robots.txt directives with the pages you want indexed and the pages you want blocked.
Best Practices to Avoid Future Errors
Here are some best practices to avoid similar robots.txt errors going forward:
- Carefully review robots.txt changes before uploading to confirm they allow access as intended.
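- Block only pages crawlers genuinely don’t need to access, and avoid broad restrictions. robots.txt is publicly readable and is not an access control, so protect truly sensitive pages with authentication instead.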
- Don’t block CSS, JavaScript, or image files. Google needs them to render your pages, and blocking them can harm how your content is indexed and ranked.
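- Always test robots.txt changes before deploying them, for example by checking key URLs against the new rules with a robots.txt parser, and confirm the results with the URL Inspection tool afterward.
- Check Search Console’s robots.txt report regularly for fetch errors and blocking issues.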
- Request indexing through Search Console’s URL Inspection tool, and use noindex rather than robots.txt blocks to keep pages out of search results.
- Review robots.txt frequently to remove outdated or unnecessary directives.
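Several of these practices can be automated as a pre-deployment check. The Python sketch below flags a site-wide “Disallow: /”, Disallow rules that match none of your current URLs (candidates for cleanup), and blocked CSS, JavaScript, and image files. The URL inventory and the list of file types are hypothetical inputs. Rule matching is by plain prefix and ignores user-agent groups and wildcards, while the blocked-resource check uses urllib.robotparser.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Illustrative file types Google needs to fetch to render pages.
RESOURCE_SUFFIXES = (".css", ".js", ".png", ".jpg", ".svg", ".woff2")

def disallow_values(robots_txt):
    """Non-empty Disallow values in file order (user-agent groups ignored)."""
    values = []
    for raw in robots_txt.splitlines():
        field, _, value = raw.split("#", 1)[0].partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            values.append(value.strip())
    return values

def lint_robots(robots_txt, site_urls, agent="Googlebot"):
    """Return warnings about a robots.txt draft, checked against `site_urls`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    paths = [urlsplit(url).path or "/" for url in site_urls]
    warnings = []
    for rule in disallow_values(robots_txt):
        if rule == "/":
            warnings.append("Disallow: / blocks the entire site for its group")
        elif not any(path.startswith(rule) for path in paths):
            warnings.append(f"Disallow: {rule} matches none of the listed URLs")
    for url, path in zip(site_urls, paths):
        if path.lower().endswith(RESOURCE_SUFFIXES) and not parser.can_fetch(agent, url):
            warnings.append(f"rendering resource blocked: {url}")
    return warnings

# Hypothetical draft file and URL inventory.
DRAFT = """\
User-agent: *
Disallow: /assets/
Disallow: /old-campaign/
"""
urls = [
    "https://example.com/",
    "https://example.com/products/",
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
]
for warning in lint_robots(DRAFT, urls):
    print(warning)
```

A rule that matches nothing is not necessarily wrong, since the URL list may be incomplete, but it is worth reviewing.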
Adopting these recommendations will minimize the chances of encountering frustrating “Indexed Though Blocked” errors due to unintended robots.txt blocks in the future.
The “Indexed Though Blocked by robots.txt” error can happen to any website but is easily fixable. By analyzing your robots.txt directives, troubleshooting specific issues, removing overbroad blocks, allowing desired pages, and requesting recrawling, you can quickly resolve it.
Avoid future indexing hiccups by using robots.txt judiciously, testing changes, and monitoring Search Console’s robots.txt and Page indexing reports. With well-scoped crawl directives, Googlebot can crawl and index the pages you want to appear in search.