The robots.txt file (http://www.robotstxt.org/) is publicly available and, when used properly, is a very good way to control what search engines crawl and what they don’t.
“Who cares. I’d rather watch the grass grow…”
Well, if you are using Volusion, you might. Volusion serves .asp pages that often carry query-string parameters (everything after “?” and “&”) driven by session and query data, which can generate a ton of URLs that all share the same TITLE and META data. You end up with quasi-duplicate content, and the best policy, however you read Google’s duplicate content guidelines, is to minimize as much of it as possible.
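To make the duplication concrete, here is a minimal Python sketch that groups URLs by script path while ignoring the query string. The URLs, the product code, and the Search and Sort parameters are hypothetical placeholders, not taken from a real store.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical URLs for illustration; YOURSITE and all parameter values are placeholders.
urls = [
    "http://www.YOURSITE.com/ProductDetails.asp?ProductCode=ABC-123",
    "http://www.YOURSITE.com/ProductDetails.asp?ProductCode=ABC-123&Show=TechSpecs",
    "http://www.YOURSITE.com/ProductDetails.asp?ProductCode=ABC-123&Show=ExtInfo",
    "http://www.YOURSITE.com/SearchResults.asp?Search=table",
    "http://www.YOURSITE.com/SearchResults.asp?Search=table&Sort=Price",
]


def group_by_script(url_list):
    """Group URLs by their path, ignoring everything after the '?'."""
    groups = defaultdict(list)
    for url in url_list:
        groups[urlsplit(url).path].append(url)
    return dict(groups)


groups = group_by_script(urls)
for path, variants in groups.items():
    print(f"{path}: {len(variants)} URL variants")
```

Any path that shows up with several variants is a candidate for a Disallow rule, since each variant is likely to carry the same TITLE and META data.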
“Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.”- Google
Why leave it to guesswork when you can control it yourself by writing some exclusion rules, thereby giving Google more relevant content?
What now?
Don’t freak out. Simply edit your robots.txt file in your SEO area (/admin/SEOFriendly.asp). Your goal here is to DISALLOW all search engines (the * wildcard) from crawling these pages/patterns.
You can also add your google_sitemap.asp to your robots.txt file and tell Google to come and get it (or submit your google_sitemap.asp via Webmaster Tools).
Here’s the robots.txt I use.
Sitemap: http://www.YOURSITE.com/google_sitemap.asp
User-agent: *
Disallow: /cgi-bin/
Disallow: /AccountSettings.asp
Disallow: /Affiliate_info.asp
Disallow: /Affiliate_signup.asp
Disallow: /Affiliate_thankyou.asp
Disallow: /catalog_subscribe.asp
Disallow: /donate.asp
Disallow: /EmailaFriend.asp
Disallow: /Email_Me_When_Back_In_Stock.asp
Disallow: /FileUpload/TextObject.aspx
Disallow: /GiftOptions.asp
Disallow: /help.asp
Disallow: /Help_EmailBetterPrice.asp
Disallow: /Help_FreeShipping.asp
Disallow: /kb_results.asp
Disallow: /login_sendpass.asp
Disallow: /Login.asp
Disallow: /mailinglist_subscribe.asp
Disallow: /mailinglist_unsubscribe.asp
Disallow: /myaccount.asp
Disallow: /MyAccount.asp
Disallow: /OrderFinished.asp
Disallow: /one-page-checkout.asp
Disallow: /orders.asp
Disallow: /ProductDetails.asp
Disallow: /PhotoDetails.asp
Disallow: /PlaceOrder.asp
Disallow: /Returns.asp
Disallow: /Register.asp
Disallow: /Receipt.asp
Disallow: /SearchResults.asp
Disallow: /ShoppingCart.asp
Disallow: /shoppingcart.asp
Disallow: /Terms.asp
Disallow: /Terms_privacy.asp
Disallow: /Ticket_List.asp
Disallow: /Ticket_New.asp
Disallow: /TrackPackage.asp
Disallow: /WishList.asp
Note: As of this post, Volusion provides only one robots.txt file, shared by its SSL and regular layers, so you cannot write a special one for https. It’s not terribly common for that to matter, but searching for site:www.yoursite.com always brings up some interesting things.
30 Comments
Just declare this:
Disallow: /ProductDetails.asp
You don’t have to write anything after it. This tells bots to stop crawling this page and any parameters that are passed along with it.
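One way to sanity-check that prefix behaviour is with Python's standard urllib.robotparser module. This is only a rough sketch: Python's parser approximates how crawlers read the file and is not Googlebot itself, and the YOURSITE URLs and product code below are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A single rule with nothing after the file name.
rules = [
    "User-agent: *",
    "Disallow: /ProductDetails.asp",
]

parser = RobotFileParser()
parser.parse(rules)


def allowed(url):
    """True if the wildcard group lets a crawler fetch this URL."""
    return parser.can_fetch("*", url)


# Hypothetical URLs; the product code is a placeholder.
for url in [
    "http://www.YOURSITE.com/ProductDetails.asp",
    "http://www.YOURSITE.com/ProductDetails.asp?ProductCode=ABC-123&Show=ExtInfo",
    "http://www.YOURSITE.com/product-p/abc-123.htm",
]:
    print("allowed" if allowed(url) else "blocked", url)
```

The rule is matched as a prefix of the path, so the bare page and every parameterized variant are blocked while the friendly product-p URL stays crawlable.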
BTW – Why do you still have http://www.timelesswroughtiron.com/product-p/stratford-foyer-table-902561.htm
You should have a link title in there. Don’t use product-p. Read the Volusion manual for more info.
Thank you for this website and the great content. I recently moved my web store to Volusion for better SEO. During the first few days I was becoming skeptical, but I’m now hopeful.
I have an issue that I don’t think is mentioned on here. My products have multiple tabs: description, technical specs, and extended info. Apparently each tab is considered a new page and Google has started to index them, consequently reporting duplicate meta info on all of my products. Is there a way to stop this from happening?
Here is an example of a product where Google has indexed the product tabs:
1. http://www.timelesswroughtiron.com/ProductDetails.asp?ProductCode=STRATFORD-FOYER-TABLE-902561&Show=ExtInfo
2. http://www.timelesswroughtiron.com/ProductDetails.asp?ProductCode=STRATFORD-FOYER-TABLE-902561&Show=TechSpecs
3. http://www.timelesswroughtiron.com/ProductDetails.asp?ProductCode=KNOT-SIDE-TABLE-900323
4. http://www.timelesswroughtiron.com/product-p/stratford-foyer-table-902561.htm (this is the proper product url)
I would like to stop the indexing of all the “/productDetails.asp?ProductCode=” URLs. I’m assuming I would just enter “Disallow: /productDetails.asp?ProductCode=” in the robots.txt, but I would like some confirmation/clarification.
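One detail worth checking before relying on that rule: robots.txt paths are matched case-sensitively, and the indexed URLs use a capital P in ProductDetails.asp while the proposed rule uses a lower-case p. The sketch below tests both spellings against the four URLs listed above using Python's urllib.robotparser, which is only an approximation of how Googlebot reads the file. The helper name blocked_urls is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# The four URLs quoted in the comment above.
urls = [
    "http://www.timelesswroughtiron.com/ProductDetails.asp?ProductCode=STRATFORD-FOYER-TABLE-902561&Show=ExtInfo",
    "http://www.timelesswroughtiron.com/ProductDetails.asp?ProductCode=STRATFORD-FOYER-TABLE-902561&Show=TechSpecs",
    "http://www.timelesswroughtiron.com/ProductDetails.asp?ProductCode=KNOT-SIDE-TABLE-900323",
    "http://www.timelesswroughtiron.com/product-p/stratford-foyer-table-902561.htm",
]


def blocked_urls(disallow_path, url_list):
    """Return the URLs that one wildcard Disallow rule would block."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_path}"])
    return [u for u in url_list if not parser.can_fetch("*", u)]


as_written = blocked_urls("/productDetails.asp?ProductCode=", urls)
matching_case = blocked_urls("/ProductDetails.asp?ProductCode=", urls)

print("lower-case p blocks:", len(as_written), "URLs")
print("capital P blocks:", len(matching_case), "URLs")
```

Under this parser, only the spelling that matches the case of the real URLs catches the ProductDetails.asp variants, and the proper product-p URL is left alone either way.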
You could explicitly call out Googlebot instead of using the wildcard, use Google Webmaster Tools to remove that URL from the crawl, or do both. Let me know how that works for you.
Resource: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
I wasn’t saying that I switched to your txt file and now this is happening. I have used it with success for a while now. It seems that all of a sudden, in my abandoned carts, Googlebot IP addresses 66.249.67.154 and 66.249.67.154 are adding one item at a time for large chunks of each day.
This raises the question: how do you know it’s Google? What exactly are you looking at (e.g. Analytics, Volusion, Webmaster Tools) that gives you this information?
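One common way to check is a reverse DNS lookup on the IP followed by a forward lookup on the returned host name, which is the approach Google describes for verifying its crawler. Below is a minimal Python sketch of that idea. The function name is_googlebot and its injectable lookup parameters are assumptions made for this example, and the accepted domain suffixes (googlebot.com and google.com) should be confirmed against Google's current guidance.

```python
import socket


def is_googlebot(ip,
                 reverse_lookup=socket.gethostbyaddr,
                 forward_lookup=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for an IP that claims to be Googlebot.

    The lookup functions are parameters so the check can be exercised
    without network access; by default they use the system resolver.
    """
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no reverse DNS record for this IP

    # Assumed suffixes; verify against Google's current documentation.
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False

    try:
        forward_ips = forward_lookup(hostname)[2]
    except OSError:
        return False  # host name does not resolve

    # The host name must resolve back to the original IP.
    return ip in forward_ips
```

Calling is_googlebot("66.249.67.154") performs live DNS lookups, so the answer depends on the DNS records at the time you run it. An IP that fails the check is only claiming to be Google in its user-agent string.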