There is nothing worse than learning that Google is having a tough time accessing your content when you never intended to block it. Well, truth is, if you robots.txt or nofollow your site out of Google by accident, I don't feel bad for you. But if your server flakes out on you, then I do feel your pain.
Yesterday at SMX East, the great Gary Illyes from the Google search quality group shared two tidbits that you may not have officially heard on the record from Google about crawl efficiency with GoogleBot.
Now, you know that GoogleBot will play nice with your server. If Google feels crawling too hard will hurt the server, it backs off. But what signals does it use to determine that? Google had never really shared that information until yesterday.
They use (1) connection time and (2) server status codes.
If Google sees it takes longer and longer to connect to a web page on your domain between GoogleBot's visits, it will figure it should back off a bit or stop crawling. If GoogleBot is served HTTP status codes in the 5xx range, it will also back off a bit or stop crawling. Of course, it will try again soon, but the last thing Google wants to do is take down your site for users.
So if I were you, I'd have reporting configured on (1) connection time and (2) 5xx server status codes.
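If you want a quick start on that reporting before wiring up a full monitoring tool, a minimal sketch is to scan your access logs for exactly those two signals. This example assumes a hypothetical log format where each line ends with the HTTP status code followed by the request time in seconds (many servers can be configured to log request time, but the exact format varies, so adjust the regex to your setup):

```python
import re
from collections import Counter

# Assumed log format: each line ends with the status code and the
# request time in seconds, e.g.  ... "GET /page HTTP/1.1" 503 0.42
# Adjust this pattern to match your server's actual log format.
LOG_LINE = re.compile(r'" (\d{3}) ([\d.]+)$')

def crawl_health(lines):
    """Summarize 5xx rate and average response time from log lines."""
    status_classes = Counter()
    times = []
    for line in lines:
        m = LOG_LINE.search(line)
        if not m:
            continue  # skip lines that don't match the expected format
        status, secs = int(m.group(1)), float(m.group(2))
        status_classes[status // 100] += 1  # bucket as 2xx, 4xx, 5xx, ...
        times.append(secs)
    total = sum(status_classes.values())
    return {
        "5xx_rate": status_classes[5] / total if total else 0.0,
        "avg_response_secs": sum(times) / len(times) if times else 0.0,
    }

# Hypothetical sample lines for illustration
sample = [
    '... "GET / HTTP/1.1" 200 0.12',
    '... "GET /a HTTP/1.1" 503 1.40',
    '... "GET /b HTTP/1.1" 200 0.20',
    '... "GET /c HTTP/1.1" 500 2.10',
]
print(crawl_health(sample))
# half the sample requests are 5xx, averaging just under a second each
```

In practice you would filter for GoogleBot's user agent first and alert when the 5xx rate or response time trends upward, since those are the two signals that can slow crawling.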
Forum discussion at Google+.