The robots.txt file (http://www.robotstxt.org/) is publicly available and, when used properly, is a very good way to control what search engines crawl and what they don’t.
“Who cares. I’d rather watch the grass grow…”
Well, if you are using Volusion, then you may care. Volusion has .asp pages that are sometimes tied to parameters (e.g. “?” and “&”) based on session or query data, which can generate a ton of URLs all with the same TITLE and META data. You end up with lots of URLs that all look essentially the same. That is quasi-duplicate content, and the best policy, regardless of how you read Google’s duplicate content policies, is to minimize as much of it as possible.
“Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.”- Google
Why leave it to guesswork when you can finally control something yourself by writing some exclusion rules, thereby giving Google more relevant content?
What now?
Don’t freak out. Simply edit your robots.txt file in your SEO area (/admin/SEOFriendly.asp). Your goal here is to DISALLOW all* search engines from crawling these pages/patterns.
You can also add your google_sitemap.asp to your robots.txt file and tell Google to come and get it (or submit your google_sitemap.asp via the webmaster tools).
Here’s the robots.txt I use.
Sitemap: http://www.YOURSITE.com/google_sitemap.asp
User-agent: *
Disallow: /cgi-bin/
Disallow: /AccountSettings.asp
Disallow: /Affiliate_info.asp
Disallow: /Affiliate_signup.asp
Disallow: /Affiliate_thankyou.asp
Disallow: /catalog_subscribe.asp
Disallow: /donate.asp
Disallow: /EmailaFriend.asp
Disallow: /Email_Me_When_Back_In_Stock.asp
Disallow: /FileUpload/TextObject.aspx
Disallow: /GiftOptions.asp
Disallow: /help.asp
Disallow: /Help_EmailBetterPrice.asp
Disallow: /Help_FreeShipping.asp
Disallow: /kb_results.asp
Disallow: /login_sendpass.asp
Disallow: /Login.asp
Disallow: /mailinglist_subscribe.asp
Disallow: /mailinglist_unsubscribe.asp
Disallow: /myaccount.asp
Disallow: /MyAccount.asp
Disallow: /OrderFinished.asp
Disallow: /one-page-checkout.asp
Disallow: /orders.asp
Disallow: /ProductDetails.asp
Disallow: /PhotoDetails.asp
Disallow: /PlaceOrder.asp
Disallow: /Returns.asp
Disallow: /Register.asp
Disallow: /Receipt.asp
Disallow: /SearchResults.asp
Disallow: /ShoppingCart.asp
Disallow: /shoppingcart.asp
Disallow: /Terms.asp
Disallow: /Terms_privacy.asp
Disallow: /Ticket_List.asp
Disallow: /Ticket_New.asp
Disallow: /TrackPackage.asp
Disallow: /WishList.asp
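Before uploading rules like these, it can help to test a few URLs against them offline. Here is a minimal sketch using Python’s standard `urllib.robotparser`. The trimmed rule list and the sample URLs are placeholders I made up for illustration, and this parser only does simple prefix matching, so it may not mirror every detail of how Google reads the file.

```python
from urllib.robotparser import RobotFileParser

# A trimmed copy of the rules above. YOURSITE is a placeholder.
ROBOTS_TXT = """\
Sitemap: http://www.YOURSITE.com/google_sitemap.asp
User-agent: *
Disallow: /cgi-bin/
Disallow: /ProductDetails.asp
Disallow: /SearchResults.asp
Disallow: /ShoppingCart.asp
Disallow: /shoppingcart.asp
"""


def build_parser(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_crawlable(parser, url, agent="*"):
    """Return True if the given user agent may fetch the URL."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_TXT)

# Hypothetical URLs, only here to exercise the rules.
SAMPLE_URLS = [
    "http://www.YOURSITE.com/ProductDetails.asp?ProductCode=ABC",
    "http://www.YOURSITE.com/shoppingcart.asp",
    "http://www.YOURSITE.com/Some-Widget-p/abc.htm",
]

for url in SAMPLE_URLS:
    status = "crawlable" if is_crawlable(parser, url) else "blocked"
    print(status, url)
```

Matching is case-sensitive, which is why the list above carries both ShoppingCart.asp and shoppingcart.asp.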
Note: As of this post, Volusion has only one robots.txt file, shared by its SSL and regular layers, so you are not able to write a special one for https. It’s not terribly common for https pages to show up in the index, but searching for site:www.yoursite.com always brings up some interesting things.
40 Comments
I have Google flooding my checkout and it seems to be affecting my sales. During my slow periods I look in my abandoned carts and see Google adding 1 item about 10 times a minute, every minute. This just started happening a few weeks ago.
This…
Disallow: /*.asp$
…works really well. You can then allow the aboutus.asp copy through. You might as well be even safer by adding an Allow declaration just for that page and any others. Some of my clients have lots of keyword-rich aboutus and affiliate pages.
If both pages are indexed, then just DISALLOW /product-p/ in your robots.txt and remove the URL from your Google Webmaster Tools. A two-pronged approach, yes!
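To see how a wildcard Disallow and a specific Allow interact, here is a rough Python sketch of the matching as I understand Google’s documented behavior: `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching pattern wins, and Allow wins a tie. The function names are my own, and other crawlers may resolve conflicts differently.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Decide whether a path may be crawled.

    rules is a list of (kind, pattern) pairs, kind being 'allow' or
    'disallow'. The longest matching pattern wins; on a tie, allow wins.
    Paths that match no rule are allowed.
    """
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern restricts nothing
        if rule_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


RULES = [("disallow", "/*.asp$"), ("allow", "/aboutus.asp")]

for path in ["/aboutus.asp", "/help.asp", "/ProductDetails.asp?ProductCode=X"]:
    print(path, "allowed" if is_allowed(path, RULES) else "blocked")
```

Note the last case: because `$` anchors the end of the URL, `/*.asp$` does not match an .asp URL that carries a query string, so parameterized URLs would still need their own rules.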
Thank you for the knowledgeable response!
The reason for wanting a 301 for a page that already exists is that multiple URLs can take you to the exact same page.
In Volusion, this URL: http://www.labsafetydeals.com/product-p/665514595.htm
and this URL (SEO optimized): http://www.labsafetydeals.com/3M-8271-RESPIRATOR-PARTICULATEP95-10-BX-p/665514595.htm
both take you to the exact same page. Google will index both pages. In order to avoid a duplicate content penalty, I would 301 the first URL to the second URL. I would repeat this for every /product-p/ page until they are completely dropped from the index.
Once again, thanks! You have a great site loaded with useful information!
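For anyone wanting to find such pairs in bulk, here is a small Python sketch that groups product URLs by their trailing product code and reports codes reachable under more than one URL. The `/<name>-p/<code>.htm` shape is inferred only from the two URLs in this comment, so treat the pattern as an assumption, and feed it whatever URL list you have collected.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Assumed shape of a Volusion product URL path: /<name>-p/<code>.htm
PRODUCT_PATH = re.compile(r"^/(?P<name>[^/]+)-p/(?P<code>[^/]+)\.htm$", re.IGNORECASE)


def group_by_product_code(urls):
    """Map each product code to the list of URLs that point at it."""
    groups = defaultdict(list)
    for url in urls:
        match = PRODUCT_PATH.match(urlparse(url).path)
        if match:
            groups[match.group("code").lower()].append(url)
    return dict(groups)


def duplicates(urls):
    """Return only the product codes that appear under more than one URL."""
    return {
        code: found
        for code, found in group_by_product_code(urls).items()
        if len(found) > 1
    }


URLS = [
    "http://www.labsafetydeals.com/product-p/665514595.htm",
    "http://www.labsafetydeals.com/3M-8271-RESPIRATOR-PARTICULATEP95-10-BX-p/665514595.htm",
]

for code, found in duplicates(URLS).items():
    print(code, "is reachable via", len(found), "URLs")
```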
Ok, here is the short and long of it.
First, if you only want a certain “pattern” of URLs to be crawled by the famous Google bots or other search engine bots, then you need to declare that in your robots.txt file. In one of my blog posts there was one user that had great luck with writing wildcard patterns – see http://www.schawel.com/volusion-robotstxt-file/2009/05/21/. You can add Disallow: /product-p/ and Disallow: /category-s/ if you like.
Second, you have already signed up for Google Webmaster Tools. You can use them to identify areas that need to be taken out of the index, and you can ask Google to remove a URL. Cool, huh?
Lastly, that is correct. You cannot create 301s for things that already exist. But why would you?
In conclusion, it sounds like you have URLs out there that have been indexed with somethinglike.com/ProductDetails.asp?ProductCode=etc.etc. or the /product-p/ URL. My advice to you is to make sure all your Volusion ProductNameShort fields are filled out properly (which it looks like they are) and then use declarations like “Disallow: /ProductDetails.asp” in your robots.txt file. THEN I would go through Google Webmaster Tools after a week and look at your report. Remove URLs you don’t want to be there, or simply monitor it and wait for it all to go away.
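If you want to see which of these declarations catches a given URL, a quick Python sketch like the following can help. It uses plain, case-sensitive prefix matching with no wildcard handling, the function name is my own, and the YOURSITE URL is a placeholder.

```python
from urllib.parse import urlsplit

# The declarations suggested above.
DISALLOW = ["/product-p/", "/category-s/", "/ProductDetails.asp"]


def matching_rule(url, disallow=DISALLOW):
    """Return the first Disallow prefix that matches the URL, or None."""
    parts = urlsplit(url)
    # Rules are matched against the path plus any query string.
    target = parts.path + ("?" + parts.query if parts.query else "")
    for prefix in disallow:
        if target.startswith(prefix):
            return prefix
    return None


URLS = [
    "http://www.labsafetydeals.com/product-p/665514595.htm",
    "http://www.labsafetydeals.com/3M-8271-RESPIRATOR-PARTICULATEP95-10-BX-p/665514595.htm",
    "http://www.YOURSITE.com/ProductDetails.asp?ProductCode=ABC",
]

for url in URLS:
    print(matching_rule(url) or "not blocked", url)
```

The default /product-p/ URL is caught while the SEO-friendly version of the same page is not, because the SEO path does not start with /product-p/.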
Great site! There are very limited Volusion SEO development resources available, and I feel that Volusion support is really limited as far as SEO knowledge goes.
I only want URLs that are SEO optimized to appear in the Google index. I never want to see a default dynamically generated Volusion url appear. My site was indexed before I could add the SEO urls, and now I am afraid of duplicate content. Would you recommend adding
Disallow: /product-p/
Disallow: /category-s/
to the robots.txt file so that the only way a URL will be crawled or indexed is if SEO is applied to the URL? That way the default product-p or category-s pages will never be displayed. I only want a URL that has been SEO optimized to be able to appear. I never again want to see:
http://www.labsafetydeals.com/product-p/665514595.htm
I want to see the SEO URL version of the same page:
http://www.labsafetydeals.com/3M-8271-RESPIRATOR-PARTICULATEP95-10-BX-p/665514595.htm
The crazy thing is you cannot create a 301 for a page that exists. You can only create a 301 for a page that does not exist!! If I had the full ability to create a 301, I would 301 all of the dynamic URLs to the SEO-friendly ones. Volusion tech support replied telling me it is not possible! I find that really limiting for an SEO mass URL rewrite campaign.
Thanks!!
You still have your A HREF on your logo pointing to default.asp. I would recommend removing that.
I am also a Volusion store owner who has had canonicalization issues with my URLs. For instance, Volusion decided to create a /default.asp for your home page. Google found both of those pages, and my rankings have plummeted.
I have since redirected everything to
http://www.yourecigarette.com
and have completely removed all of the unnecessary .asp stuff. I inserted this line of code into my livefile editor between my head tags as per instruction on another blog today and am eager to see if it fixes the issue:
http://www.convergent7.com/blog/volusion/how-to-use-the-canonical-tag-in-volusion/
Oh good. I’m glad you are reading.
Thank you so much for the post! I just began using the Volusion software last week, and I am in need of as much information as I can get. Your blog is extremely helpful, thank you for such good articles!!!
Ah! You are the 5th person to contact me about this issue. Currently, there is no checkbox or Where clause for the URL in the Volusion API. Open a ticket with Volusion. I have told everyone to do this. I will be posting on the forum soon as well.
Two roads:
1. Remove the disallow and let them crawl it. There is a good chance that it will not show up in the index (regular search results) unless someone blogs your dynamic URL. Also, the duplicate content might not be that bad and considered secondary content.
2. Don’t remove it and wait for the issue to resolve within the Volusion workplace.
I am currently working with GoDataFeed on a custom solution. I can’t say when it will be done.
KMS,
I’m back again. I just received an email from Google Base this afternoon about it not being able to crawl certain pages of our site. I’ve gotta fix this by February 28th or they will remove them.
I generated a new Google Base file just to take a look at it, and it is using the ProductDetails.asp?ProductCode= URL instead of the good Google-friendly URLs.
I am generating the Base file using the Volusion API.
Is there another way you suggest to generate the Base file so that it uses the SEO-friendly URLs? Base draws a good amount of traffic for us and we don’t want to be dinged.
Or do I just remove the Disallow: /ProductDetails.asp and leave it be?
Thanks Again,
KMS,
Hmmm I was just able to do so. But other than changing the site map URL at the top of the robots.txt file it is exactly the same as the one posted above.
I just looked through Webmaster Tools and it isn’t showing up as being restricted by the robots.txt file. It had been there up until sometime last week I think.
I guess it’s okay now though, if anything changes I’ll reply again.
Thanks,
Ouch, sorry for the late reply. I just came off a large project. I can’t bring up http://www.shopvsc.com/robots.txt. Is there anything I should know?
Hello,
I’m using the example you have posted above and I’m having some pages being blocked by the robots.txt file that I don’t want blocked. For example:
http://www.shopvsc.com/Toner-SMT-Series-8-Port-Multi-Taps-p/ton-smt108-32.htm
http://www.shopvsc.com/Toner-TGT-Seriers-8-Port-Taps-p/ton-tgt8-14.htm
I saw these and a few more items that are in that category (more Toner products) that are getting blocked in Google Webmaster Tools.
Do you have any ideas?
Thanks!
We moved the PDF files to a different directory, and so far the wildcard PDF disallow is working for all engines. We’re so happy with the shortcut that, based on Volusion stats, this is what we are doing:
User-Agent: *
Disallow: /cgi-bin/
Disallow: /*.pdf$
Disallow: /*.asp$
Disallow: /*.aspx$
Disallow: /*.cgi$
Disallow: /*.css$
Disallow: /*.js$
Disallow: /admin/
Disallow: /fileupload/
Disallow: /net/
Allow: /
Try adding a declaration.
User-agent: Googlebot
Disallow: /*.pdf$
I could really help you if you tell me how your .pdfs are organized. Are they in different directories or all in one directory?
I ran into a problem where PDF files are being scraped off my site into PDF search engines, which defeats the purpose of using those files to bring people to our site. Then we ran across this command:
Disallow: /*.pdf$
Disallow: /*.cgi$
Disallow: /*.asp$
and wondered what you thought of shortcutting the robots.txt file in this manner. Does it work?
I usually build an index file for the articles and stick it in the footer so the bots can grab it and run through it easier. Why do you have these lines in your robots.txt?
Disallow: /pindex.asp*
Disallow: /cindex.asp*
Great posting about Volusion quirks. I have a problem with my Volusion cart and I would like to know if you experience the same thing. I am having major problems getting Google to index any of the content articles within Volusion (for example http://www.zipinstallation.com/Articles.asp?ID=210). I have checked the robots file and I have zero articles excluded. In Google Webmaster Tools it says that I have 482 pages not being indexed because of robots.txt exclusion. I have verified many times that the Volusion robots file doesn’t exclude the articles. Are your articles getting indexed? If yes, what category are your articles in within the Volusion category selection? Appreciate any help.
Sorry for the late reply. Submit an email through my form. Thanks.
how to use the
I’m not quite following your thread, but I just got done using if_homepage and it works like this. If you place this
in your html template, everything within the if_homepage div will only appear on your home page. But, if you’re looking to style if_homepage, it doesn’t exist in the final pages. Volusion must only use it when the page gets parsed. If you want to style it, you’ll need to wrap your homepage content in another div.
The if_not_homepage div works the same way. Why Volusion chose to do it this way, God only knows, but it’s better than using JavaScript to do it. Now if I could just figure out a decent menu system that doesn’t rely on JavaScript…
Real quick, why did you opt for domain marketing without www?
Don-
Thanks for the Volusion robots.txt sample file. I’ve adapted mine to include yours, and for now, am leaving other weird stuff that has shown up in my Webmaster Tools results. Checking the site:uncommonscents.com command and other results in GWT, I’ve had a big problem with duplicate indexing in Google of my homepage, resulting in decreased home page PageRank (down from PR5 to PR3):
http://www.uncommonscents.com
https://www.uncommonscents.com
http://uncommonscents.com/default.asp
http://uncommonscents.com
etc., etc.
My domain is simply uncommonscents.com (without the www.). Do you know a safe way to employ the canonical tag in Volusion to send all iterations of my home page URL to uncommonscents.com? I’ve added it to my META tags and it seems to be helping, but I’m having to disallow indexing of some .asp pages that I would otherwise want indexed (cindex and pindex) to avoid Duplicate Title and Description tags… Ideally the canonical tag would only exist on all iterations of my home page, but Volusion seems to not allow that. I’ve seen the discussion in the forums about the “IF_HOMEPAGE” and “IF_NOT_HOMEPAGE” tags, but I think they can only be used in the .css template which probably doesn’t help.
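As a way to double-check which indexed URLs are really just the home page in disguise, here is a rough Python sketch that collapses the variants listed above into one canonical address. This is an offline check, not a Volusion feature. The choice of `http://uncommonscents.com/` as the canonical form (scheme and trailing slash) is my assumption, and the function names are made up for illustration.

```python
from urllib.parse import urlsplit

# Assumed canonical home page; adjust scheme and host to taste.
CANONICAL_HOME = "http://uncommonscents.com/"
HOME_PATHS = {"", "/", "/default.asp"}


def is_home_variant(url, bare_host="uncommonscents.com"):
    """True if the URL is the home page under any scheme, with or without www."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host == bare_host and parts.path.lower() in HOME_PATHS and not parts.query


def canonical_url(url):
    """Collapse home page variants to the canonical home; leave other URLs alone."""
    return CANONICAL_HOME if is_home_variant(url) else url


VARIANTS = [
    "http://www.uncommonscents.com",
    "https://www.uncommonscents.com",
    "http://uncommonscents.com/default.asp",
    "http://uncommonscents.com",
]

for url in VARIANTS:
    print(url, "->", canonical_url(url))
```

Pages such as cindex.asp pass through unchanged, which matches the goal of having the canonical tag affect only the home page.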