There is an interesting thread at Hacker News on DuckDuckGo being toyed with by Google. That part honestly doesn't interest me as much as the core SEO topic in the thread.
Google's Matt Cutts is very active on Hacker News, and he questioned Gabriel Weinberg, the founder of DuckDuckGo, about DuckDuckGo's spiders and crawlers. Some folks are asking if DuckDuckGo's spider, aka DuckDuckBot, respects robots.txt directives. Some noticed DuckDuckGo crawling from the IP range it owns but not declaring a user agent, and thus not respecting the robots directives set by the webmaster.
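The reason this matters: robots.txt rules are keyed to the user agent a crawler declares, so a fetcher that sends no recognizable agent string can't be matched against a webmaster's directives at all. A minimal sketch of how a self-identifying crawler honors those rules, using Python's standard urllib.robotparser (the robots.txt content below is a hypothetical example, not any real site's file):

```python
# Sketch: a crawler that declares its user agent can check robots.txt
# before fetching. "DuckDuckBot" is DuckDuckGo's published crawler name;
# the robots.txt rules here are made up for illustration.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: DuckDuckBot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A fetcher that identifies itself can ask before requesting a URL:
print(parser.can_fetch("DuckDuckBot", "https://example.com/private/page"))  # False
print(parser.can_fetch("DuckDuckBot", "https://example.com/index.html"))    # True
```

A request sent with no (or an unknown) user agent falls through to the wildcard group, so per-bot rules like the `DuckDuckBot` block above never apply to it.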
Matt Cutts asked Gabriel:
Gabriel, does DuckDuckGo's crawler have a distinct user agent? Can you talk more about how DuckDuckGo observes/respects robots.txt?
I emailed Gabriel, and he explained that in this case, they are only checking for parked domains. He wrote, "what they're seeing there is not a crawler but a parked domain checker." He added, "it doesn't crawl through a site. In fact, it only checks the front page." When I questioned why they can't do this using the DuckDuckBot user agent, he said, "some parked domain networks show different things based on the user agent, and we want to find out what is really shown to the user."
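The check Gabriel describes boils down to requesting only a site's front page under a browser-like identity and comparing that with what a declared bot would be served. A rough sketch of that idea, where the URLs, agent strings, and helper functions are purely illustrative and not DuckDuckGo's actual implementation:

```python
import urllib.request

# Illustrative identities; a parked-domain network may cloak based on these.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"  # what a real user sends
BOT_UA = "DuckDuckBot/1.0"                                # a declared crawler

def front_page_request(url, user_agent):
    """Build a request for just the front page under a given identity."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def looks_cloaked(html_as_user, html_as_bot):
    """If the two fetches differ, users may be shown something bots aren't."""
    return html_as_user != html_as_bot

# Two requests for the same front page under different identities:
as_user = front_page_request("http://example.com/", BROWSER_UA)
as_bot = front_page_request("http://example.com/", BOT_UA)
# (urllib stores header keys capitalized, hence "User-agent" here)
print(as_user.get_header("User-agent"))
```

Fetching both requests and running the responses through `looks_cloaked` would reveal whether the domain serves different content to users than to bots, which is exactly why a browser-like user agent is needed for the check.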
He also added:
We don't believe it needs to be identified as anything else as it only makes one request very infrequently and doesn't index any information.