Volusion Robots.txt File

May 21st, 2009

The robots.txt file (http://www.robotstxt.org/) is publicly available and, when used properly, is a very good way to control what search engines crawl and what they don’t.

“Who cares. I’d rather watch the grass grow…”

Well, if you are using Volusion, then you may. Volusion has .asp pages that are often called with query-string parameters (the parts after “?” and “&”) driven by session and query values, which, in turn, can generate a ton of URLs all with the same TITLE and META data. You end up with lots of URLs that all look essentially the same. That is quasi-duplicate content, and the best policy, regardless of how you read Google’s duplicate content policies, is to minimize as much of it as possible.
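
To get a feel for how many near-identical URLs a single .asp page can spawn, a crawl export or log sample can be grouped by script path. The following is a minimal Python sketch, not anything Volusion provides; the function name count_variants and the sample URLs on www.YOURSITE.com are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_variants(urls):
    """Count how many distinct URLs share each script path.

    The query string (everything after "?") is ignored, so every
    parameter variant of /ProductDetails.asp is counted under that path.
    """
    return Counter(urlsplit(url).path for url in set(urls))


# Hypothetical sample; replace with URLs from your own crawl or logs.
sample_urls = [
    "http://www.YOURSITE.com/ProductDetails.asp?ProductCode=ABC-123",
    "http://www.YOURSITE.com/ProductDetails.asp?ProductCode=ABC-123&Show=ExtInfo",
    "http://www.YOURSITE.com/ProductDetails.asp?ProductCode=ABC-123&Show=TechSpecs",
    "http://www.YOURSITE.com/SearchResults.asp?mfg=Acme",
    "http://www.YOURSITE.com/product-p/abc-123.htm",
]

# Print the paths with the most URL variants first.
for path, count in count_variants(sample_urls).most_common():
    print(f"{count:3d}  {path}")
```

Paths with a high count are the likely candidates for a Disallow rule.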

“Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.”- Google

Why leave it to guesswork when you can control something yourself by writing a few exclusion rules, thereby giving Google more relevant content?

What now?

Don’t freak out. Simply edit your robots.txt file in your SEO area (/admin/SEOFriendly.asp). Your goal here is to DISALLOW all* search engines from crawling these pages/patterns.

You can also add your google_sitemap.asp to your robots.txt file and tell Google to come and get it (or submit your google_sitemap.asp via Webmaster Tools).
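
If you want to confirm that a Sitemap line is readable by a standard parser, Python’s built-in urllib.robotparser can list it. This is only a sketch: it assumes Python 3.8 or later (where site_maps() was added), and www.YOURSITE.com is a placeholder, not a real domain.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt content; YOURSITE is not a real domain.
robots_lines = [
    "Sitemap: http://www.YOURSITE.com/google_sitemap.asp",
    "",
    "User-agent: *",
    "Disallow: /cgi-bin/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# site_maps() returns a list of Sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```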

Here’s the robots.txt I use.

Sitemap:http://www.YOURSITE.com/google_sitemap.asp

User-agent:*
Disallow: /cgi-bin/
Disallow: /AccountSettings.asp
Disallow: /Affiliate_info.asp
Disallow: /Affiliate_signup.asp
Disallow: /Affiliate_thankyou.asp
Disallow: /catalog_subscribe.asp
Disallow: /donate.asp
Disallow: /EmailaFriend.asp
Disallow: /Email_Me_When_Back_In_Stock.asp
Disallow: /FileUpload/TextObject.aspx
Disallow: /GiftOptions.asp
Disallow: /help.asp
Disallow: /Help_EmailBetterPrice.asp
Disallow: /Help_FreeShipping.asp
Disallow: /kb_results.asp
Disallow: /login_sendpass.asp
Disallow: /Login.asp
Disallow: /mailinglist_subscribe.asp
Disallow: /mailinglist_unsubscribe.asp
Disallow: /myaccount.asp
Disallow: /MyAccount.asp
Disallow: /OrderFinished.asp
Disallow: /one-page-checkout.asp
Disallow: /orders.asp
Disallow: /ProductDetails.asp
Disallow: /PhotoDetails.asp
Disallow: /PlaceOrder.asp
Disallow: /Returns.asp
Disallow: /Register.asp
Disallow: /Receipt.asp
Disallow: /SearchResults.asp
Disallow: /ShoppingCart.asp
Disallow: /shoppingcart.asp
Disallow: /Terms.asp
Disallow: /Terms_privacy.asp
Disallow: /Ticket_List.asp
Disallow: /Ticket_New.asp
Disallow: /TrackPackage.asp
Disallow: /WishList.asp

Note: As of this post, Volusion does not provide separate robots.txt files for its SSL and regular layers. There is only one, so you cannot write a special one for https. It’s not terribly common for https URLs to show up in the index, but searching for site:www.yoursite.com always brings up some interesting things.
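
Before relying on rules like the ones listed above, it can help to test a few URLs against them. Here is a rough Python sketch using the standard urllib.robotparser module. The shortened rule set, the is_blocked helper, and the sample paths are all assumptions for illustration, and real crawlers may interpret rules somewhat differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules above, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /SearchResults.asp
Disallow: /ProductDetails.asp
Disallow: /MyAccount.asp
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(path, agent="Googlebot"):
    """Return True if the parsed rules forbid `agent` from fetching `path`."""
    # www.YOURSITE.com is a placeholder host; only the path is matched.
    return not parser.can_fetch(agent, "http://www.YOURSITE.com" + path)


checks = [
    "/ProductDetails.asp?ProductCode=ABC-123&Show=ExtInfo",  # parameters fall under the path prefix
    "/product-p/abc-123.htm",                                # friendly URL, no rule matches
    "/MyAccount.asp",                                        # same case as the rule
    "/myaccount.asp",                                        # different case, not matched by this rule
]
for path in checks:
    print(f"{'blocked' if is_blocked(path) else 'allowed':8}  {path}")
```

The last check shows why the full file lists both /myaccount.asp and /MyAccount.asp: robots.txt paths are matched case-sensitively, so each spelling needs its own line.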

40 Comments

  1. Symbom Plush and Cloth Toys (August 12, 2011)

    Pretty nice article. I like it.

  2. John Deszell (December 9, 2010)

    Schawel,

    I did exactly what you suggested back on the 24th of November and I’m still seeing search results being indexed.

    Any other suggestions?

  3. schawel (November 24, 2010)

    Try to explicitly call out the bot.

    User-Agent: Googlebot
    Disallow: /SearchResults.asp

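    Please note that a Disallow in your robots.txt file is not a 100% guarantee that it will be heeded. Also keep in mind that a bot with its own User-Agent group follows only that group, so repeat any other Disallow lines you need under it. For hard-core control, make a rule in the .htaccess file.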

  4. John Deszell (November 24, 2010)

    I’m back once again. After doing some scans with WebCEO software, I saw that our search results pages are still being indexed by Google. Is this a bad thing?

    http://www.google.com/#q=site:http://www.shopvsc.com/SearchResults.asp%3Fmfg%3DMonivision&hl=en&prmd=iv&ei=6A_sTIjvPJSknQeqqKz3AQ&start=10&sa=N&filter=0&fp=b69933d19f024e3b

    Here is a link to our robots file: http://www.shopvsc.com/robots.txt

    It’s the same as your file, except that we let them crawl ProductDetails.asp, because we were having issues with Google Base not seeing our products.

    Thanks

  5. schawel (October 8, 2010)

    WP is a real CMS. If you are thinking of lots of articles, think outside the box…or outside Volusion. Use Volusion as the powerful product-selling engine it is. Also, the backlink from your WP installation to the ecomm site will be worth something when you start to develop some street cred.

    One thing to remember is that (currently) Google treats a subdomain as a brand new domain, meaning it does not use the years you have been online with the ecomm www site in its math. Is there any way you can buy an old domain while still adhering to your brand/language/site name?

  6. David (October 8, 2010)

    Based on these two assumptions (below), our SEO strategy is to create and optimize articles more than product pages. Actually, we are thinking that adding a WP blog on articles.domain.com for this reason might be a better strategy than using Volusion.

    First: Historically Google prefers content sites vs eCommerce. Second: we may change products or manufacturers down the line, but the keyword that describes the product is going to stay the same.

  7. schawel (October 8, 2010)

    What sort of articles are you adding? Perhaps I can give you some guidance.

  8. David (October 8, 2010)

    I guess our best option is to manually create an index page with the list of articles and post the link to that page in the footer, b/c we plan to add a lot of articles.

    Thanks again.

  9. schawel (October 8, 2010)

    “Automated” being the keyword here – there isn’t one. Volusion has a somewhat antiquated way of dealing with content. There is no true CMS for articles; also, articles are mixed in with other content blocks in the Articles table that are not necessarily article-worthy. My advice is to just create an article index which has all the articles you want on it and then add that to the footer as an index of sorts.

    Another way, of course, is to use the native knowledge base for all your articles (which is fairly automated), put a link to it in the footer, and then rewrite your robots file to allow crawling of those pages, though they are not the most friendly of URLs.

  10. David (October 8, 2010)

    This was a very helpful post, thank you!

    Question: Volusion automatically puts products and categories into the site map, but not the articles. Do you know an automated method to include articles too?

    

  11. schawel (May 31, 2010)

    Just declare this..

    Disallow: /ProductDetails.asp

    You don’t have to write anything after it. This tells bots to stop crawling this page and any parameters that are passed along with it.

    BTW – Why do you still have http://www.timelesswroughtiron.com/product-p/stratford-foyer-table-902561.htm

    You should have a link title in there. Don’t use product-p. Read the Volusion manual for more info.

  12. Ryan Hansen (May 30, 2010)

    Thank you for this website and the great content. I recently moved my web store to Volusion for better SEO. During the first few days I was becoming skeptical, but I’m now hopeful.
    I have an issue that I don’t think is mentioned on here. My products have multiple tabs: description, technical specs, and extended info. Apparently each tab is considered a new page and Google has started to index them, consequently reporting duplicate Meta Info on all of my products. Is there a way to stop this from happening?
    Here is an example of product where Google has indexed the Product Tabs:
    1. http://www.timelesswroughtiron.com/ProductDetails.asp?ProductCode=STRATFORD-FOYER-TABLE-902561&Show=ExtInfo
    2. http://www.timelesswroughtiron.com/ProductDetails.asp?ProductCode=STRATFORD-FOYER-TABLE-902561&Show=TechSpecs
    3. http://www.timelesswroughtiron.com/ProductDetails.asp?ProductCode=KNOT-SIDE-TABLE-900323
    4. http://www.timelesswroughtiron.com/product-p/stratford-foyer-table-902561.htm (this is the proper product url)
    I would like to stop the indexing of all the “/productDetails.asp?ProductCode=” URLs. I’m assuming I would just enter “Disallow: /productDetails.asp?ProductCode=” in the robots.txt, but I would like some confirmation/clarification.

  13. schawel (May 4, 2010)

    You could explicitly call out Googlebot instead of using the wildcard, use Google Webmaster Tools to remove that URL from the crawl, or do both. Let me know how that works for you.

    Resource: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449

  14. Steams (May 4, 2010)

    I wasn’t saying that I switched to your txt file and now this is happening. I have used this with success for a while now. It seems that all of a sudden, in my abandoned carts, Googlebot IP addresses 66.249.67.154 and 66.249.67.154 are adding 1 item at a time for large chunks of each day.

  15. schawel (May 2, 2010)

    This begs the question, how do you know it’s Google? What exactly are you looking at (e.g. Analytics, Volusion, Webmaster tools) that gives you this information?
