This Is the Reason Why Google Doesn’t Crawl or Index All of Your Website

Google doesn’t crawl or index all my pages

Although we index billions of webpages and are constantly working to increase the number of pages we include, we don’t guarantee that we’ll crawl all of the pages of a particular site. Google doesn’t crawl all the pages on the web, and we don’t index all the pages we crawl. It’s perfectly normal for not all the pages on a site to be indexed.

While we can’t guarantee that all pages of a site will consistently appear in our index, we do offer our guidelines for maintaining a Google-friendly site. Check the following:

    • Google may have indexed your page under a different version of your domain name. For example, if you receive a message that http://example.com is not indexed, make sure that you’ve also added http://www.example.com to your account (or vice versa), and check the data for that site.
    • If your site is new, it may not be in our index because we haven’t had a chance to crawl and index it yet. Read more about how to tell us about your site.
    • If your site used to be indexed, but no longer is, you can read more about possible reasons for this.
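The first check in the list above (looking at both the www and non-www versions of a domain) can be sketched in a few lines of Python. This is only an illustration: the function name `site_variants` is made up here, and including the https versions is an assumption, since the text above only mentions http://example.com and http://www.example.com.

```python
from urllib.parse import urlsplit, urlunsplit


def site_variants(url):
    """Return the http/https and www/non-www root URLs for a site.

    Hypothetical helper: the https variants are an assumption that goes
    beyond the two http examples given in the text.
    """
    host = urlsplit(url).netloc.lower()
    # Strip a leading "www." so both forms can be rebuilt from the bare host.
    bare = host[4:] if host.startswith("www.") else host
    variants = []
    for scheme in ("http", "https"):
        for netloc in (bare, "www." + bare):
            variants.append(urlunsplit((scheme, netloc, "/", "", "")))
    return variants


for variant in site_variants("http://example.com"):
    print(variant)
```

Each printed URL is a version of the site worth checking in your account before concluding that nothing is indexed.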

If your site isn’t appearing in Google search results, or it’s performing more poorly than it once did (and you believe that it does not violate our Webmaster Guidelines), you can ask Google to reconsider your site.

Request reconsideration of your site.

8 Reasons Why Google Has Not Indexed Your Site

The author’s views are entirely her own (excluding the unlikely event of hypnosis) and may not reflect my views.

I recently had to deal with some indexing issues. Will a website I have just built get indexed all by itself? Or should we submit the website to Search Console, and if so, just once or every time we publish an article?

After digging deeper into the matter, I decided to write up my experience so that others don’t have to spend as much time digging for answers to indexing issues.

It all comes down to this: if your site, or parts of it, are not in the Google index, no one will find your content in search results.

Identifying Crawling Problems

Start an investigation by typing site:yoursite.com into the Google search bar.

Does the number of results returned match the number of pages your site has, give or take? If there is a large gap between the number of results and the actual number of pages, there may be trouble in paradise. (Note: the number Google gives is a rough estimate, not an exact count.)

You can use the SEOquake plugin to extract a list of URLs that Google has indexed. (Kieran Daly gives a brief how-to on this in the Q & A section.)
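Once you have a list of indexed URLs, comparing it against your sitemap shows exactly which pages are missing. Here is a minimal Python sketch of that comparison. It assumes a standard XML sitemap and a plain list of indexed URLs exported from whatever tool you use; the names `sitemap_urls`, `normalize`, and `missing_from_index` are made up for this example, and treating URLs with and without a trailing slash as the same page is a simplification.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(sitemap_xml):
    """Collect every <loc> value from a sitemap XML string."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text}


def normalize(url):
    """Simplification: ignore surrounding whitespace and a trailing slash."""
    return url.strip().rstrip("/")


def missing_from_index(sitemap_xml, indexed_urls):
    """Return sitemap URLs that do not appear in the indexed list, sorted."""
    indexed = {normalize(url) for url in indexed_urls}
    return sorted(
        url for url in sitemap_urls(sitemap_xml) if normalize(url) not in indexed
    )


SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/contact</loc></url>
</urlset>"""

# Hypothetical export of indexed URLs.
indexed = ["https://example.com", "https://example.com/about/"]
print(missing_from_index(SAMPLE_SITEMAP, indexed))
# prints ['https://example.com/contact']
```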

The first thing you should look at is your Google Search Console dashboard. Forget all the other tools that are available for a while. If Google sees issues with your site, then those are the ones you want to address first.

If there is a problem, the dashboard will display an error message. See below for an example.

I have no issues with my own site at the moment, so I had to find someone else’s example screenshot. Thanks in advance, Neil 🙂

Crawl Errors

The HTTP 404 status code is the one you are most likely to see.
It means that the page a link points to cannot be found. Anything other than a 200 (and maybe a 301) status code usually means something went wrong, and your site may not work as it should for your visitors.

Some great tools for checking your server headers are URIvalet.com, Screaming Frog SEO Spider, and Moz Pro’s Site Crawl (try it free for the full experience).
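If you would rather check a handful of URLs yourself, the Python standard library is enough. The sketch below is only an illustration and not how any of the tools above work. The names `describe_status` and `fetch_status` are made up, and it assumes the server answers HEAD requests (some servers do not, in which case a GET would be needed).

```python
import urllib.error
import urllib.request


def describe_status(code):
    """Give a rough label for an HTTP status code."""
    if code == 200:
        return "ok"
    if code in (301, 308):
        return "permanent redirect"
    if code in (302, 303, 307):
        return "temporary redirect"
    if code == 404:
        return "not found"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 301/302 itself is visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url, timeout=10.0):
    """Return the status code a server sends for a HEAD request to url.

    Connection and DNS failures are not caught here; they surface as
    urllib.error.URLError.
    """
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Redirects (when not followed) and 4xx/5xx answers arrive as HTTPError.
        return err.code
```

Calling `describe_status(fetch_status("https://example.com/some-page"))` on each URL from your sitemap gives a quick list of pages that answer with anything other than 200.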

Fixing Errors in Crawling

Typically these kinds of issues are caused by one or more of the following reasons:

  1. Robots.txt – This text file, which sits in the root of your website’s folder, communicates a set of guidelines to search engine crawlers. For instance, if your robots.txt file contains the two directives User-agent: * and Disallow: / (each on its own line), it’s basically telling every crawler on the web to take a hike and not crawl ANY of your site’s content.
  2. .htaccess – This is a hidden file which also resides in your WWW or public_html folder. You can toggle its visibility in most modern text editors and FTP clients. A badly configured .htaccess can do nasty stuff like cause infinite loops, which will never let your site load.
  3. Meta tags – Make sure that the pages that aren’t getting indexed don’t have this meta tag in the source code: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
  4. Sitemaps – Your sitemap isn’t updating for some reason, and you keep feeding the old or broken one to Webmaster Tools. After you have addressed the issues pointed out in the Webmaster Tools dashboard, always check that you’ve generated a fresh sitemap and resubmitted it.
  5. URL parameters – Within the Webmaster Tools there’s a section where you can set URL parameters which tells Google what dynamic links you do not want to get indexed. However, this comes with a warning from Google: “Incorrectly configuring parameters can result in pages from your site being dropped from our index, so we don’t recommend you use this tool unless necessary.”
  6. You don’t have enough PageRank – Matt Cutts revealed in an interview with Eric Enge that the number of pages Google crawls is roughly proportional to your PageRank.
  7. Connectivity or DNS issues – It might happen that for whatever reason Google’s spiders cannot reach your server when they try to crawl. Perhaps your host is doing maintenance on their network, or you’ve just moved your site to a new home, in which case the DNS delegation can stuff up the crawlers’ access.
  8. Inherited issues – You might have registered a domain which had a life before you. I’ve had a client who got a new domain (or so they thought) and did everything by the book. They wrote good content, nailed the on-page stuff, and had a few nice incoming links, but Google refused to index them, even though it accepted their sitemap. After some investigating, it turned out that the domain had been used several years before that and was part of a big linkspam farm. We had to file a reconsideration request with Google.
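The first and third items in the list above (robots.txt rules and robots meta tags) are easy to test locally. The following is a rough Python sketch using only the standard library. The helper names `is_crawl_allowed` and `has_noindex` are made up for this example. The standard-library robots.txt parser does not necessarily match Googlebot’s own matching rules in every detail, and the sketch also checks meta tags named "googlebot", which goes slightly beyond the ROBOTS tag shown in the list.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt, url, user_agent="Googlebot"):
    """Check a robots.txt body (as text) against a URL for one user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _RobotsMetaParser(HTMLParser):
    """Collect the directives found in robots/googlebot meta tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for token in attributes.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html):
    """Return True if the page source carries a noindex directive."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


blocking_rules = "User-agent: *\nDisallow: /"
print(is_crawl_allowed(blocking_rules, "https://example.com/any-page"))
# prints False

print(has_noindex('<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'))
# prints True
```

Feeding these helpers the robots.txt body and page source you fetched from your own site tells you quickly whether you are blocking crawlers yourself.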

Some other obvious reasons that your site or pages might not get indexed are that they consist of scraped content, are involved in shady link farm tactics, or simply add zero value to the web in Google’s opinion (think thin affiliate landing pages, for example).

Does anyone have anything to add to this post? I think I’ve covered most of the indexation problems, but there’s always someone smarter in the room. (Especially here on Moz!)
