Live @ SMX Advanced: Bot Herding

by jeremiah on June 3, 2008

Search spiders and bots are pretty stupid when they come to your web site. If you don’t guide them, they’ll generate duplicate content issues, miss important pages in favor of junk, not realize where existing content has moved to and have other problems. This session looks at some advanced techniques in herding bots, when IP delivery can be white hat and how search engines view cloaking issues today.

Moderator: Rand Fishkin, Co-Founder and CEO, SEOmoz

Q&A Moderator: Matt McGee

Speakers:

Adam Audette, Founder, AudetteMedia
Hamlet Batista, President, Nemedia S.A.
Nathan Buggia, Lead PM, Live Search Webmaster Center, Microsoft
Priyank Garg, Director Product Management, Yahoo! Search, Yahoo, Inc.
Michael Gray, President, Atlas Web Service
Evan Roseman, Software Engineer, Google
Stephan Spencer, Founder and President, Netconcepts

Tamar gave me this idea: copy and paste from the SMX Advanced itinerary.

1:33 The engines have an announcement at the end.

Michael Gray: Why don’t people air condition their mailbox?

When you move into a new house you have already spent your money, and it takes a while to move up to central air. But why don’t you air condition your mailbox? Bot herding is like this. You need to send your PageRank to the places in your site where it needs to go. Don’t send it to your contact us page. Funnel it to the places that make the most for your business.

Deciding what to Sculpt out?

Who wants to rank for contact us or privacy policy? Locations are not interesting. Bios: sculpt them out, unless you are managing reputation.

How to sculpt?

Michael advocates:

  • Nofollow: quick and easy.
  • JavaScript: bots don’t crawl it, but that may change in the future.
  • Jump pages, redirect pages and form pages.

Always use robots in conjunction with other techniques.
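
Before sculpting anything, it helps to see which internal links on a page are already nofollowed. Here is a minimal sketch using only Python's standard library. The sample HTML and the names `LinkAudit` and `audit_links` are made up for illustration and are not from Michael's talk.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collect (href, is_nofollow) for every <a href=...> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def audit_links(html):
    """Return a list of (href, is_nofollow) pairs in document order."""
    parser = LinkAudit()
    parser.feed(html)
    return parser.links


# Hypothetical page fragment: overhead pages are nofollowed, money pages are not.
SAMPLE_PAGE = """
<a href="/products/">Products</a>
<a href="/contact-us/" rel="nofollow">Contact us</a>
<a href="/privacy/" rel="NoFollow noopener">Privacy policy</a>
"""

for href, nofollow in audit_links(SAMPLE_PAGE):
    print(href, "nofollow" if nofollow else "followed")
```

Running something like this over your templates gives you a list of where link equity can flow before you change anything.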

Some people say this shouldn’t be at the top of your list. Michael disagrees: “Take care of your fires, then do it.” For new sites, do it now.

1:42 Nathan Buggia asked the crowd if they have seen a measurable increase in traffic from the bots.

Crowd says yes.

Adam Audette

Counterpoints to Michael Gray’s presentation. Adam has slowed down the use of nofollow except on overhead pages. 8 arguments against:

  1. More control?: SEOs don’t know enough to actually control internal PageRank. We don’t know how much a link is worth or how much it fluctuates.
  2. It’s a distraction: Making great content is more important. It can mask other issues.
  3. Management headaches: Rules, etc. Gives you a big case of the Mondays.
  4. Band-aid: Not addressing the underlying causes.
  5. Where’s the user?: Lots of PR to float mediocre pages. More power to authority domains. The rich get richer, and the poor get poorer.
  6. Open to abuse: You can think of all sorts of ways to abuse this. Not a question of how, but when. Then how will the engines react? Automated filtering of heavily nofollowed pages? Way too focused on Google. Targets PageRank.
  7. Too focused on the engines.
  8. There is no standard: The engines each have their own view of robots.

SEO is the balance between what is right for the user and what is right for the engines.   This may tip the scale to the engine. 

Rand just gave a statistic that I totally didn’t catch.

Stephan Spencer

Duplicate content and how to herd the bots away.

Duplicate content is rampant on blogs… you need to herd the bot to the canonical URL or permalink versus an excerpt or some other syndication.  Use a headshot or signature line to prevent hijacking of the  content. 

E-commerce sites also have rampant dups. Selectively append tracking codes. Pagination creates many pages that sing the same song to the search engines. Do lots of testing, because there are lots of techniques for eliminating this problem.
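
To make the tracking-code problem concrete, here is a rough sketch of collapsing URLs that differ only by tracking parameters into one canonical form. It is not Stephan's technique, just an illustration in Python. The parameter names in `TRACKING_PARAMS` and the function name `canonical_url` are assumptions, so swap in whatever your own site appends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameter names; replace with the ones your site uses.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "ref"}


def canonical_url(url):
    """Return url with tracking parameters and the fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # Sort so that ?a=1&b=2 and ?b=2&a=1 collapse to the same URL.
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url(
    "http://www.example.com/widgets?color=blue&utm_source=feed&page=2#reviews"
))
```

If two URLs come out of this function identical, the bots are probably seeing duplicates, and those are the pages to test your fixes on.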

PageRank leakage: if you think a robots.txt disallow takes care of it, you are probably leaking rank. Stephan wants you to use URL rewriting. Stephan moves really fast and is talking about writing regular expressions.

Now we are talking about mod_rewrite rules. Sorry, too fast to capture.

Use rewrite rules versus the redirect directive.
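
I didn't capture the actual rules, so this is not what was on the slides and it is not mod_rewrite syntax. It is only a small Python sketch of the idea behind a regex-based rewrite: match the old URL with a pattern, capture the useful part and build the new URL for a permanent redirect. The patterns, the paths and the name `permanent_redirect_target` are all hypothetical, and for simplicity the path and query string are matched as one string.

```python
import re

# Hypothetical rules: (pattern matched against path plus query, replacement).
REWRITE_RULES = [
    (re.compile(r"^/product\.php\?id=(\d+)$"), r"/products/\1/"),
    (re.compile(r"^/index\.html?$"), "/"),
]


def permanent_redirect_target(path):
    """Return the new path to 301 to, or None if no rule matches."""
    for pattern, replacement in REWRITE_RULES:
        if pattern.match(path):
            # \1 in the replacement is filled from the first captured group.
            return pattern.sub(replacement, path)
    return None


for old in ["/product.php?id=42", "/index.html", "/about/"]:
    print(old, "->", permanent_redirect_target(old))
```

One pattern with a capture group covers every product page, where a plain one-to-one redirect would need a line per URL.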

Now he is talking about conditional redirects. We don’t encourage this, but I will remain silent.  Just use at your own risk.

Rand cracks a joke: “How to get on Matt Cutts’s bad side.”

Hamlet Batista - White Hat Cloaking: Six Practical Applications

  • Cloaking is about intention.  
  • Weigh the risks vs the rewards
  • Ask permission
  • Cloaking vs. IP delivery.

When is it practical?

  • Content accessibility
  • Memberships
  • Site Structure improvements
  • Geolocation/IP delivery
  • Multivariate testing

Scenarios

  1. Proprietary CMS that is not SEO friendly :  You can fix problems that are very complex
  2. Flash Websites, Silverlight, or other rich media : present the text to the user
  3. Membership sites :  Snippets of content are shown to the engine, but the user still has to sign in.
  4. Sites that require massive site structure changes to improve index penetration
  5. Geolocation: Robots say this is OK.
  6. A/B or multivariate testing.

How do you cloak?   Do you think I should share this????  I don’t.  

Hamlet is being cut off by Rand.

Priyank from Yahoo is making an announcement from the three engines.

Robots Exclusion Protocol. What is the standard? The search engines are working together on a standard and are proud to share that they are making it standard. There is uniform functionality, so you can count on the engines to respect it. You can read more at the Live Search Webmaster Blog:

http://blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx

There are optional directives each engine supports, but these are limited to specific functionality within each engine.  
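
If you want to check how a basic set of Disallow rules reads, Python's standard library ships a robots.txt parser. The sketch below is only an illustration: the robots.txt content, the bot name `ExampleBot` and the helper `is_allowed` are made up, and Python's parser is not any engine's actual implementation, so each engine's optional directives may behave differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, user_agent="ExampleBot"):
    """Return True if the parsed rules let user_agent fetch the given path."""
    return parser.can_fetch(user_agent, "http://www.example.com" + path)


for path in ["/products/", "/cart/checkout", "/search/results"]:
    print(path, "allowed" if is_allowed(path) else "blocked")
```

Treat this as a sanity check on your own file, then confirm the result in each engine's webmaster tools.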

The engines wanted to launch this at the same time to help show consistency across the board.

