Wildcards in robots.txt

Words by Daniel Aleksandersen on 2007-04-20

A list of search engine crawlers that support wildcards and the Allow: extension in robots.txt configuration files.

The example below will block /public_information/file.html?hidden_value=something, but will allow crawling of other pages in the /public_information/ directory.

# Wildcard example blocking ?hidden_value in
# the /public_information/ directory.

User-Agent: *
Disallow: /public_information/*?hidden_value

The following engines support the above:

  • Googlebot (Google)
  • msnbot (Live Search)
  • Slurp (Yahoo! Search)
  • teoma (Ask!)

All the major search engines' crawlers do indeed support wildcards! All of the above support the Allow: extension as well.
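To make the wildcard rule in the example more concrete, here is a minimal Python sketch of how such a rule might be matched against a URL path. The helper name rule_matches is hypothetical, and the sketch assumes that * stands for any run of characters and that rules are anchored at the start of the path. Each crawler's actual matching may differ in the details.

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Hypothetical helper: does a robots.txt rule path match a URL path?"""
    # Assumption: '*' matches any run of characters (including none),
    # and the rule is anchored at the start of the path.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(pattern, path) is not None

# The Disallow rule from the example above.
RULE = "/public_information/*?hidden_value"

# Path carrying the ?hidden_value query string.
print(rule_matches(RULE, "/public_information/file.html?hidden_value=something"))
# Same page without the query string.
print(rule_matches(RULE, "/public_information/file.html"))
```

Under these assumptions the first path matches the Disallow rule and the second does not, which mirrors the behaviour described for the example.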

Copyright © Daniel Aleksandersen – Licensed under GNU FDL

The blog entry Wildcards in robots.txt is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.