Volusion Robots.txt File

May 21st, 2009. Written by schawel

The robots.txt file (http://www.robotstxt.org/) is publicly available and, when used properly, is a very good way to control what search engines crawl and what they don’t.

“Who cares. I’d rather watch the grass grow…”

Well, if you are using Volusion, then you may. Volusion has .asp pages that are sometimes tied to query-string parameters (the “?” and “&” in a URL) driven by session and query data, which in turn can generate a ton of URLs that all share the same TITLE and META data. You end up with quasi-duplicate content, and the best policy, regardless of how you read Google’s duplicate content policies, is to minimize as much of it as possible.
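
To make that concrete, here is a small Python sketch that groups URLs by the page they actually hit, ignoring the query string. The URLs and the parameter names (Cat, sort, show) are made up for illustration and are not taken from a real Volusion store; the point is only that several different-looking URLs can collapse to one .asp page.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical URLs: the parameter names are invented for this example.
urls = [
    "http://www.YOURSITE.com/SearchResults.asp?Cat=12",
    "http://www.YOURSITE.com/SearchResults.asp?Cat=12&sort=1",
    "http://www.YOURSITE.com/SearchResults.asp?sort=1&Cat=12&show=30",
    "http://www.YOURSITE.com/ShoppingCart.asp",
]


def group_by_page(urls):
    """Group URLs by path, dropping the query string."""
    pages = defaultdict(list)
    for url in urls:
        pages[urlsplit(url).path].append(url)
    return dict(pages)


groups = group_by_page(urls)
for path, variants in groups.items():
    # Each path with more than one variant is a duplicate-content candidate.
    print(path, len(variants))
```

Any path that shows up with more than one variant is a candidate for a Disallow rule.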

“Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.”- Google

Why leave it to guesswork when you can finally control something yourself by writing some exclusion rules, thereby giving Google more relevant content?

What now?

Don’t freak out. Simply edit your robots.txt file in your SEO area (/admin/SEOFriendly.asp). Your goal here is to DISALLOW all* search engines from crawling these pages/patterns.

You can also add your google_sitemap.asp to your robots.txt file and tell Google to come and get it (or submit your google_sitemap.asp via Webmaster Tools).

Here’s the robots.txt I use.

Sitemap: http://www.YOURSITE.com/google_sitemap.asp

User-agent: *
Disallow: /cgi-bin/
Disallow: /AccountSettings.asp
Disallow: /Affiliate_info.asp
Disallow: /Affiliate_signup.asp
Disallow: /Affiliate_thankyou.asp
Disallow: /catalog_subscribe.asp
Disallow: /donate.asp
Disallow: /EmailaFriend.asp
Disallow: /Email_Me_When_Back_In_Stock.asp
Disallow: /FileUpload/TextObject.aspx
Disallow: /GiftOptions.asp
Disallow: /help.asp
Disallow: /Help_EmailBetterPrice.asp
Disallow: /Help_FreeShipping.asp
Disallow: /kb_results.asp
Disallow: /login_sendpass.asp
Disallow: /Login.asp
Disallow: /mailinglist_subscribe.asp
Disallow: /mailinglist_unsubscribe.asp
Disallow: /myaccount.asp
Disallow: /MyAccount.asp
Disallow: /OrderFinished.asp
Disallow: /one-page-checkout.asp
Disallow: /orders.asp
Disallow: /ProductDetails.asp
Disallow: /PhotoDetails.asp
Disallow: /PlaceOrder.asp
Disallow: /Returns.asp
Disallow: /Register.asp
Disallow: /Receipt.asp
Disallow: /SearchResults.asp
Disallow: /ShoppingCart.asp
Disallow: /shoppingcart.asp
Disallow: /Terms.asp
Disallow: /Terms_privacy.asp
Disallow: /Ticket_List.asp
Disallow: /Ticket_New.asp
Disallow: /TrackPackage.asp
Disallow: /WishList.asp
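
If you want to sanity-check a rule list like this before pasting it into the admin area, Python's standard urllib.robotparser can read it. This is a minimal sketch, assuming a trimmed-down copy of the rules above, the placeholder YOURSITE domain, and made-up test URLs (the ProductCode value and the .htm product path are hypothetical). Note that this parser does plain, case-sensitive prefix matching, which is all these rules need. The site_maps() call requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules from the post, for illustration only.
ROBOTS_TXT = """\
Sitemap: http://www.YOURSITE.com/google_sitemap.asp

User-agent: *
Disallow: /ProductDetails.asp
Disallow: /SearchResults.asp
Disallow: /ShoppingCart.asp
Disallow: /shoppingcart.asp
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path):
    """Return True if a generic crawler may fetch the given path."""
    return parser.can_fetch("*", "http://www.YOURSITE.com" + path)


# Dynamic product URL: blocked, because the rule is a prefix of the URL.
print(allowed("/ProductDetails.asp?ProductCode=ABC-123"))
# Hypothetical SEO-friendly product URL: no rule matches it.
print(allowed("/Some-Widget-p/abc-123.htm"))
# Matching is case-sensitive, so an unlisted casing slips through.
print(allowed("/SHOPPINGCART.ASP"))
# Sitemap URLs declared in the file (Python 3.8+).
print(parser.site_maps())
```

The case-sensitivity check is presumably why the list carries both /ShoppingCart.asp and /shoppingcart.asp, and both /myaccount.asp and /MyAccount.asp.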

Note: Volusion does not have separate robots.txt files for its SSL and regular layers. There is only one, so you are not able to write a special one for https. It’s not terribly common for https pages to end up in the index, but searching for site:www.yoursite.com always brings up some interesting things.

19 Responses to "Volusion Robots.txt File"

  1. Chuck says:

    Don-

    Thanks for the Volusion robots.txt sample file. I’ve adapted mine to include yours, and for now, am leaving other weird stuff that has shown up in my Webmaster Tools results. Checking the site:uncommonscents.com command and other results in GWT, I’ve had a big problem with duplicate indexing in Google of my homepage, resulting in decreased home page PageRank (down from PR5 to PR3):

    http://www.uncommonscents.com
    https://www.uncommonscents.com
    http://uncommonscents.com/default.asp
    http://uncommonscents.com
    etc., etc.

    My domain is simply uncommonscents.com (without the www.). Do you know a safe way to employ the canonical tag in Volusion to send all iterations of my home page URL to uncommonscents.com? I’ve added it to my META tags and it seems to be helping, but I’m having to disallow indexing of some .asp pages that I would otherwise want indexed (cindex and pindex) to avoid Duplicate Title and Description tags… Ideally the canonical tag would only exist on all iterations of my home page, but Volusion seems to not allow that. I’ve seen the discussion in the forums about the “IF_HOMEPAGE” and “IF_NOT_HOMEPAGE” tags, but I think they can only be used in the .css template which probably doesn’t help.

  2. kms says:

    Real quick, why did you opt for domain marketing without www?

  3. greg says:

    I’m not quite following your thread, but I just got done using if_homepage and it works like this. If you place this

    in your html template, everything within the if_homepage div will only appear on your home page. But, if you’re looking to style if_homepage, it doesn’t exist in the final pages. Volusion must only use it when the page gets parsed. If you want to style it, you’ll need to wrap your homepage content in another div.

    The if_not_homepage div works the same way. Why volusion chose to do it this way, god only knows, but it’s better than using javascript to do it. Now if I could just figure out a decent menu system that doesn’t rely on javascript…

  4. yummy says:

    How do I use the “IF_HOMEPAGE” and “IF_NOT_HOMEPAGE” tags to avoid duplicate title tags on the cindex and pindex pages?

  5. kms says:

    Sorry for the late reply. Submit an email through my form. Thanks.

  6. Chris Mauzy says:

    Great posting about Volusion quirks. I have a problem with my Volusion cart and I would like to know if you experience the same thing. I am having major problems getting Google to index any of the content articles within Volusion, like http://www.zipinstallation.com/Articles.asp?ID=210. I have checked the robots file and I have zero articles excluded. In Google Webmaster Tools it says that I have 482 pages not being indexed because of robots.txt exclusion. I have verified many times that the Volusion robots file doesn’t exclude the articles. Are your articles getting indexed? If yes, what category are your articles in within the Volusion category selection? Appreciate any help.

  7. kms says:

    I usually build an index file for the articles and stick it in the footer so the bots can grab it and run through it easier. Why do you have these lines in your robots.txt?

    Disallow: /pindex.asp*
    Disallow: /cindex.asp*

  8. Pi says:

    Ran into a problem where PDF files are being scraped off my site into PDF search engines, which defeats the purpose of using those files to bring people to our site, so we ran across this command:

    Disallow: /*.pdf$
    Disallow: /*.cgi$
    Disallow: /*.asp$

    and wondered what you thought of shortcutting the robots.txt file in this manner. Does it work?

  9. kms says:

    Try adding a declaration.

    User-agent: Googlebot
    Disallow: /*.pdf$

    I could really help you if you tell me how your .pdfs are organized. Are they in different directories or all in one directory?

  10. Pi says:

    We moved the PDF files to a different directory and so far the wildcard PDF disallow is working for all engines. We’re so happy with the shortcut that, based on Volusion stats, this is what we are doing:

    User-Agent: *
    Disallow: /cgi-bin/
    Disallow: /*.pdf$
    Disallow: /*.asp$
    Disallow: /*.aspx$
    Disallow: /*.cgi$
    Disallow: /*.css$
    Disallow: /*.js$
    Disallow: /admin/
    Disallow: /fileupload/
    Disallow: /net/
    Allow: /

  11. John Deszell says:

    Hello,

    I’m using the example you have posted above and I’m having some pages being blocked by the robots.txt file that I don’t want blocked. For example:

    http://www.shopvsc.com/Toner-SMT-Series-8-Port-Multi-Taps-p/ton-smt108-32.htm

    http://www.shopvsc.com/Toner-TGT-Seriers-8-Port-Taps-p/ton-tgt8-14.htm

    I saw these and a few more items that are in that category (more Toner products) that are getting blocked in Google Webmaster Tools.

    Do you have any ideas?

    Thanks!

  12. kms says:

    Ouch, sorry for the late reply. I just came off a large project. I can’t bring up http://www.shopvsc.com/robots.txt. Is there anything I should know?

  13. John Deszell says:

    KMS,

    Hmmm I was just able to do so. But other than changing the site map URL at the top of the robots.txt file it is exactly the same as the one posted above.

    I just looked through Webmaster Tools and it isn’t showing up as being restricted by the robots.txt file. It had been there up until sometime last week I think.

    I guess it’s okay now though, if anything changes I’ll reply again.

    Thanks,

  14. John Deszell says:

    KMS,

    I’m back again. I just received an email from Google Base this afternoon about it not being able to crawl certain pages of our site. I’ve gotta fix this by February 28th or they will remove the listings.

    I generated a new Google Base file just to take a look at it and it is using the ProductDetails.asp?ProductCode= URL instead of the good Google-friendly URLs.

    I am generating the Base file using the Volusion API to generate it.

    Is there another way you suggest to generate the Base file so it uses the SEO-friendly URLs? Base draws a good amount of traffic for us and we don’t want to be dinged.

    Or do I just remove the Disallow: /ProductDetails.asp and leave it be?

    Thanks Again,

  15. kms says:

    Ah! You are the 5th person to contact me about this issue. Currently, there is no checkbox or Where clause for the URL in the Volusion API. Open a ticket with Volusion. I have told everyone to do this. I will be posting on the forum soon as well.

    Two roads:

    1. Remove the disallow and let them crawl it. There is a good chance that it will not show up in the index (regular search results) unless someone blogs your dynamic URL. Also, the duplicate content might not be that bad and may be treated as secondary content.

    2. Don’t remove it and wait for the issue to resolve within the Volusion workplace.

    I am currently working with GoDataFeed on a custom solution. I can’t say when it will be done.

  16. Zach says:

    Thank you so much for the post! I just began using the Volusion software last week, and I am in need of as much information as I can get. Your blog is extremely helpful, thank you for such good articles!!!

  17. kms says:

    Oh good. I’m glad you are reading.

  18. Daniel says:

    I am also a Volusion store owner who has had canonicalization issues with my URLs. For instance, Volusion decided to create a /default.asp for your home page. Google found both of those pages, and my rankings have plummeted.

    I have since redirected everything to

    http://www.yourecigarette.com

    and have completely removed all of the unnecessary .asp stuff. I inserted this line of code into my livefile editor between my head tags as per instruction on another blog today and am eager to see if it fixes the issue:

    http://www.convergent7.com/blog/volusion/how-to-use-the-canonical-tag-in-volusion/

  19. kms says:

    You still have your A HREF on your logo pointing to default.asp. I would recommend removing that.
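
On the wildcard rules that came up in the comments (Disallow: /*.pdf$ and friends): the original robots.txt convention only does prefix matching, but Google documents two extras, * for any run of characters and a trailing $ for the end of the URL. Here is a rough Python sketch of that matching, handy for checking what a pattern really catches before it goes live. The function names are my own, it ignores Allow/Disallow precedence, and it is an approximation for experimenting, not Google's actual matcher. The /docs/manual.pdf path is made up.

```python
import re


def rule_to_regex(rule):
    """Turn a wildcard robots.txt path rule into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Without '$' the rule behaves as a prefix match.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(rule, path_and_query):
    """Return True if the rule matches the path plus query string."""
    return rule_to_regex(rule).match(path_and_query) is not None


checks = [
    ("/*.pdf$", "/docs/manual.pdf"),
    ("/*.pdf$", "/docs/manual.pdf?download=1"),
    ("/*.asp$", "/default.asp"),
    ("/*.asp$", "/Articles.asp?ID=210"),
    ("/pindex.asp*", "/pindex.asp?page=2"),
]
for rule, url in checks:
    print(rule, url, is_blocked(rule, url))
```

The thing to watch is the $: under this reading, /*.asp$ catches a bare /default.asp but not /Articles.asp?ID=210, because that URL does not end in .asp.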
