How to Fix the Indexed Though Blocked by robots.txt Error

Updated: January 17, 2024
Author: Robben

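Seeing the “Indexed Though Blocked by robots.txt” error in Google Search Console can be confusing and concerning. This error indicates that pages on your site have been indexed by Google even though your robots.txt file blocks them from being crawled. This can happen because robots.txt controls crawling rather than indexing: Google can still index a blocked URL it discovers through links from other pages, without ever reading its content.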

The good news is this issue can be easily remedied with a few simple steps. In this guide, we’ll cover what causes this robots.txt error, how to diagnose it, and the best ways to resolve it.

What Triggers the Indexed Though Blocked by robots.txt Error

The “Indexed Though Blocked by robots.txt” error is most often triggered by the following:

Overly Restrictive robots.txt File

If your robots.txt file contains overly broad or restrictive crawl directives, it can inadvertently block Googlebot from crawling pages that have already been indexed. This most commonly occurs when the “Disallow” directive is misconfigured.
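
To see how a single overly broad rule causes this, here is a minimal Python sketch built on the standard library's urllib.robotparser. The example.com URLs and both rule sets are made up for illustration. Python's parser only approximates how Googlebot reads the file (it matches by simple path prefix and does not support the * and $ wildcards), so treat the result as a rough guide.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule sets: the site owner only meant to hide draft posts.
TOO_BROAD = """\
User-agent: *
Disallow: /blog
"""

INTENDED = """\
User-agent: *
Disallow: /blog/drafts/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given rules let `agent` crawl `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Print each path with its result under the broad and the intended rules.
for path in ("/blog/my-post", "/blog-news", "/blog/drafts/wip", "/about"):
    url = "https://example.com" + path
    print(path, is_allowed(TOO_BROAD, url), is_allowed(INTENDED, url))
```

Because rules match by prefix, “Disallow: /blog” blocks not only every post under /blog/ but also unrelated paths such as /blog-news.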

Diagnosing the Root Cause of the Error

To fix this error, you first need to analyze your robots.txt file to determine the specific root cause:

Review Your robots.txt Directives

Carefully check the “User-agent” and “Disallow” directives in your robots.txt file. Identify any disallow lines that may be overly broad or restrictive. Common issues include blocking entire directories or site sections inadvertently.
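
For a long file, a small script can list every Disallow rule next to the user agent it applies to. The sketch below is a simplified reader, not a full robots.txt parser, and the looks_broad heuristic (flag “/” and any top-level directory) is an assumption of this example rather than a rule Google applies. The sample file is hypothetical.

```python
def list_disallow_rules(robots_txt: str) -> list[tuple[str, str]]:
    """Return (user_agent, path) pairs for every non-empty Disallow line."""
    rules: list[tuple[str, str]] = []
    agents: list[str] = []
    in_rules = False  # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents = []
                in_rules = False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:
                rules.extend((agent, value) for agent in agents)
    return rules


def looks_broad(path: str) -> bool:
    """Hypothetical heuristic: flag '/' and top-level directories for review."""
    return path == "/" or path.strip("/").count("/") == 0


SAMPLE = """\
User-agent: *
Disallow: /
Disallow: /blog/drafts/

User-agent: Googlebot
User-agent: Bingbot
Disallow: /tmp/
Allow: /tmp/public/
"""

for agent, path in list_disallow_rules(SAMPLE):
    flag = "REVIEW" if looks_broad(path) else "ok"
    print(f"{agent:10} {path:20} {flag}")
```

A flagged rule is not necessarily wrong; it only marks the lines that deserve a second look.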

Compare It With a Previous Version of robots.txt

View a previously fetched version of your robots.txt file in the robots.txt report in Google Search Console. Compare it with the current version to spot any recently added restrictions that could be triggering the error.
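
If you have saved copies of the previous and current files, a diff makes newly added rules easy to spot. This sketch uses Python's standard difflib module; the two file contents are hypothetical stand-ins for your own versions.

```python
import difflib


def added_disallow_lines(previous: str, current: str) -> list[str]:
    """Return Disallow lines present in `current` but not in `previous`."""
    diff = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    )
    added = []
    for line in diff:
        # "+++" is the diff header; a single leading "+" marks an added line.
        if line.startswith("+") and not line.startswith("+++"):
            rule = line[1:].strip()
            if rule.lower().startswith("disallow"):
                added.append(rule)
    return added


# Hypothetical file contents for illustration.
PREVIOUS = """\
User-agent: *
Disallow: /cart/
"""

CURRENT = """\
User-agent: *
Disallow: /cart/
Disallow: /blog/
Disallow: /products/
"""

for rule in added_disallow_lines(PREVIOUS, CURRENT):
    print("Newly added:", rule)
```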

URL Inspection Tool

Use the URL Inspection tool in Search Console, which replaced the retired Fetch as Google feature, to test affected pages against your current robots.txt. This will confirm whether they are still blocked from crawling.
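
Alongside testing in Search Console, you can approximate the check locally for a whole batch of URLs. The sketch below assumes you have pasted in your robots.txt content and a list of affected URLs (both hypothetical here). It relies on urllib.robotparser, which picks the group for the named crawler before falling back to the “*” group but does not handle wildcards the way Google does, so Search Console remains the authority.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that `agent` may not crawl under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Hypothetical rules: Googlebot has its own group, everyone else is shut out.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

AFFECTED = [
    "https://example.com/private/report",
    "https://example.com/blog/my-post",
]

print("Blocked for Googlebot:", blocked_urls(ROBOTS_TXT, AFFECTED))
print("Blocked for OtherBot:", blocked_urls(ROBOTS_TXT, AFFECTED, agent="OtherBot"))
```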

Google Search Console Page Indexing Report

Cross-reference the blocked URLs listed under the error with Search Console’s Page indexing report (formerly the Coverage report). This further verifies which URLs robots.txt is blocking.
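
When the list of affected URLs is long, grouping them by site section often points straight at the rule responsible. This is a rough sketch that assumes you have copied the affected URLs into a Python list; the URLs shown are invented, and count_by_section is a helper named for this example.

```python
from collections import Counter
from urllib.parse import urlparse


def count_by_section(urls: list[str]) -> Counter:
    """Count URLs per top-level directory; root-level pages count under '/'."""
    counts: Counter = Counter()
    for url in urls:
        segments = [part for part in urlparse(url).path.split("/") if part]
        # A URL like /blog/post belongs to "/blog/"; /about sits at the root.
        section = f"/{segments[0]}/" if len(segments) > 1 else "/"
        counts[section] += 1
    return counts


# Hypothetical list of URLs reported under the error.
AFFECTED = [
    "https://example.com/blog/post-1",
    "https://example.com/blog/post-2",
    "https://example.com/blog/post-3",
    "https://example.com/tag/seo",
    "https://example.com/about",
]

for section, total in count_by_section(AFFECTED).most_common():
    print(section, total)
```

If most affected URLs fall under one section, look for a Disallow rule covering that path first.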

Fixing the “Indexed Though Blocked by robots.txt” Errors

Once you’ve diagnosed the specific issues, follow the steps below to fix the problem:

Remove Overly Broad Blocks

Edit your current robots.txt file to remove any unnecessary or overly broad blocking directives. Scope each remaining “Disallow” rule to the narrowest path that covers what you actually need to block.

Selectively Allow Blocked URLs

For URLs you want to remain indexed, selectively allow them by adding “Allow” directives for those paths in robots.txt.
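
As an illustration of a selective “Allow”, the sketch below keeps a directory blocked while opening up one subfolder. The paths are hypothetical. The Allow line is deliberately placed before the Disallow line: Google applies the most specific (longest) matching rule regardless of order, while Python's urllib.robotparser applies the first match, so this ordering gives the same answer under both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /resources/ but keep the guides crawlable.
ROBOTS_TXT = """\
User-agent: *
Allow: /resources/guides/
Disallow: /resources/
"""


def is_allowed(url: str, agent: str = "Googlebot") -> bool:
    """Check `url` against ROBOTS_TXT for the given crawler name."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


for path in ("/resources/guides/seo-basics", "/resources/internal/notes"):
    verdict = "allowed" if is_allowed("https://example.com" + path) else "blocked"
    print(path, verdict)
```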

Request URL Removal

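For any blocked URLs you wish to remove from search results, request URL removal through Google Search Console. Keep in mind that this removal is temporary; to keep a page out of the index permanently, unblock it in robots.txt and add a noindex directive so that Google can crawl the page and see the instruction.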

Submit New Robots.txt File

Upload your updated robots.txt file to your site, then use the robots.txt report in Search Console to request a recrawl of the file and confirm that the fixes resolve the original errors.

Request Re-Crawling

Request that Google re-crawl the affected URLs to restore access now that robots.txt allows them.

Following these steps will rectify any instances of “Indexed Though Blocked by robots.txt” errors by aligning your robots.txt directives with the pages you want indexed and the pages you want blocked.

Best Practices to Avoid Future Errors

Here are some best practices to avoid similar robots.txt errors going forward:

Carefully review robots.txt changes before uploading to confirm they allow access as intended.
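Keep “Disallow” rules narrow and avoid wide restrictions. For genuinely sensitive pages, rely on authentication or noindex rather than robots.txt alone, since the file is publicly readable and does not prevent indexing.
Don’t block CSS, JS, image files, etc., as that can prevent Google from rendering your pages correctly.
Always test robots.txt changes against your key URLs before implementing them, for example with the URL Inspection tool that replaced Fetch as Google.
Use Search Console’s robots.txt report to check for fetching and parsing issues.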
Request indexing and crawl rate changes through Search Console rather than robots.txt blocks.
Review robots.txt frequently to remove outdated or unnecessary directives.

Adopting these recommendations will minimize the chances of encountering frustrating “Indexed Though Blocked” errors due to unintended robots.txt blocks in the future.
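
One way to make the “review before uploading” habit routine is a small pre-upload check that compares a draft robots.txt against a list of URLs and the outcome you expect for each. The sketch below is one possible approach, not an official tool; the draft rules, the URLs, and the check_expectations helper are all hypothetical, and urllib.robotparser only approximates Googlebot's matching.

```python
from urllib.robotparser import RobotFileParser


def check_expectations(
    robots_txt: str, expected: dict[str, bool], agent: str = "Googlebot"
) -> list[str]:
    """Return URLs whose crawlability differs from the expected value.

    `expected` maps each URL to True (should be crawlable) or False
    (should be blocked).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        url for url, want in expected.items()
        if parser.can_fetch(agent, url) != want
    ]


# Hypothetical draft that accidentally blocks the stylesheet directory.
DRAFT = """\
User-agent: *
Disallow: /assets/
Disallow: /cart/
"""

EXPECTED = {
    "https://example.com/": True,
    "https://example.com/assets/site.css": True,
    "https://example.com/cart/": False,
}

problems = check_expectations(DRAFT, EXPECTED)
if problems:
    print("Do not upload yet; unexpected results for:", problems)
else:
    print("Draft matches expectations.")
```

Running a check like this each time the file changes catches accidental blocks, such as the stylesheet directory in this draft, before they reach your live site.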

Conclusion

The “Indexed Though Blocked by robots.txt” error can happen to any website but is easily fixable. By analyzing your robots.txt directives, troubleshooting specific issues, removing overbroad blocks, allowing desired pages, and requesting recrawling, you can quickly resolve it.

Avoid future indexing hiccups by using robots.txt judiciously, testing changes, and enabling Search Console tools. With a refined crawl directive strategy, you can maintain access for Googlebot to successfully index your website.

Robben

Robben is a Technical Editor and Content Writer at Host4Geeks who began his writing career as a magazine editor for a tech magazine. He has been writing about technology since the early 2000s.
