Full Version: Bounce Rate /robots.txt
elgarble
Hi-
Hope someone can help.

In reviewing Urchin Bounce Rate, I see that there is a bounce rate of 50-60% for the page /robots.txt.

For instance, yesterday we had 110 hits from robots. Is the /robots.txt page the first page the spiders hit? If so, does that mean 60% of the spiders just leave after /robots.txt and don't crawl deeper?

The text on our /robots.txt is as follows:

User-agent: *
allow: /

Disallow: /admin/
Disallow: /affiliatewiz/
Disallow: /eproducts/
Disallow: /themes/

Is this correct? Or is there something we need to change so that the search robots don't leave?
Thank You!
OneRing
I just noticed that we have exactly the same situation that you described: a high bounce rate on robots.txt. I ran our URL through the robots.txt validator at Webmasterworld and it returned 500+ errors and 500+ warnings. All of the errors returned are 'INVALID LINE' and refer to the HTML code in our site. My robots.txt is the MC default, just like yours. Can anyone shed some light on this?
Josh@strapworks
Here may be a good place to do some reading.

: http://www.searchengineworld.com/robots/robots_tutorial.htm :

It describes some simple concepts and important terms that will shed some light.
Barbiro
My robots.txt file is the same as yours. When I ran it thru the validator, it returns an error and/or warning for just about every line of text at the site:

We're sorry, this robots.txt does NOT validate.
Warnings Detected: 501
Errors Detected: 626

Monster, can you shed some light on this for us please? John, thank you for that link.
Barbiro
I just noticed something different about our robots.txt files. Each of ours says:

allow: /

On that link John posted, it says that's not necessary. I also checked Monster's robots.txt file (both the main page and also the Monster marketplace.com site) and they don't include the "allow: /" line in their robots.txt file either; neither does Microsoft in theirs.

Monster, would that make a big difference?
Josh@strapworks
Well, I have some hunches. First, the robots.txt document is correct, and depending on the content inside the directories /admin/, /affiliatewiz/, /eproducts/, and /themes/, the spiders that crawl your site may be correctly skipping that content. But that would not explain such a large number of errors (500+), unless you have a ton of content in those directories.


Second, I took a peek at Barbiro's Web page to possibly explain what is occurring
: http://www.giftbargains4u.com/

and nowhere in the source code for the page is there a meta tag for robots

example : <META NAME="Robots" CONTENT="All">

The same is for OneRing's page at http://www.discounttirezone.com

No meta tag for robots either.

Perhaps adding this tag may resolve your issue. If not, can either of you please reply with the errors, as exactly as possible, and say where they came from or what was used to produce them?


Josh@strapworks
The biggest difference between our sites and, say, Microsoft's is that our web pages are hosted inside Monster's servers, for example Monster/strapworks.com or Monster/discounttirezone.com (these names are incorrect, I know).

When a spider checks the / (root directory), it may have write permission, which can lead to a security issue: for example, microsoft/robots.txt as opposed to Monster/discounttirezone.com/robots.txt

I could be wrong but, that is what I think



Barbiro
Good observation, Josh. However, adding that meta tag is not an option with the Monster cart. A large percentage of high-ranking sites don't have that in their source code, so I think maybe that doesn't affect it. I may be wrong, but let's see what Monster says.

This SEO stuff is a never-ending learning experience.
Ignignokt
QUOTE(Barbiro @ Nov 11 2004, 07:49 PM)
Second, I took a peek at Barbiro's Web page to possibly explain what is occurring
: http://www.giftbargains4u.com/

and nowhere is in the source code for the page, is the meta tag for Robots

example : <META NAME="Robots" CONTENT="All">



The only time a robots meta tag is necessary is when you want to disallow them on a page

such as
<meta name="robots" content="noindex,nofollow">

robots = all is useless and wastes space
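
If anyone wants to check what a page actually declares, here is a rough sketch using Python's standard html.parser. The function name robots_meta_directives and the two sample pages are made up for illustration. An empty result just means the page has no robots meta tag, which crawlers generally treat the same as "index, follow".

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)  # attribute names arrive lowercased
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Split "noindex,nofollow" into individual lowercase directives.
        self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def robots_meta_directives(html):
    """Hypothetical helper: list the robots directives found in an HTML string."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# Sample pages invented for this example.
blocked = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
plain = "<html><head><title>Gifts</title></head><body>Hello</body></html>"

print(robots_meta_directives(blocked))
print(robots_meta_directives(plain))
```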
Barbiro
Art,

Meaning the:

allow: /

shouldn't even be on there?
Ignignokt
QUOTE(Barbiro @ Nov 12 2004, 10:38 AM)
Art,

Meaning the:

allow: /

shouldn't even be on there?



No,

I'm talking about the robots meta tag you can put in your pages.

Let's all look at a properly formatted robots.txt file, shall we? This one is courtesy of buy dot com:

# robots.txt for http://www.buy.com

User-agent: *
Disallow: /basket/
Disallow: /retail/
Disallow: /corp/
Disallow: /clickfrom/
Disallow: /linksys-covad/
Disallow: /faqs/
Disallow: /support/
Disallow: /visa/
Disallow: /theaters/
Disallow: /buytv/
Barbiro
Exactly my point. Our robots file presently states:

User-agent: *
allow: /

Disallow: /admin/
Disallow: /affiliatewiz/
Disallow: /eproducts/
Disallow: /themes/

but should -- instead -- reflect this:

# robots.txt for [site name]

User-agent: *
Disallow: /admin/
Disallow: /affiliatewiz/
Disallow: /eproducts/
Disallow: /themes/

That's what I was trying to point out: ours includes the text "allow: /" and the examples I've seen on the various sites (including Monster's) do not.
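
One way to see whether the "allow: /" line matters is to feed both versions to a parser. This is only a sketch using Python's standard urllib.robotparser; the bot name ExampleBot and the test paths are invented. As far as I can tell, Python's parser ends a record at a blank line and applies the first rule that matches, so it may read the default file as "allow everything" and never reach the Disallow lines. Other crawlers may well read it differently, so treat the output as one parser's opinion.

```python
from urllib.robotparser import RobotFileParser

# The file as it ships by default (from the posts above).
DEFAULT_FILE = """\
User-agent: *
allow: /

Disallow: /admin/
Disallow: /affiliatewiz/
Disallow: /eproducts/
Disallow: /themes/
"""

# The edited version without the allow line.
EDITED_FILE = """\
User-agent: *
Disallow: /admin/
Disallow: /affiliatewiz/
Disallow: /eproducts/
Disallow: /themes/
"""


def may_fetch(robots_txt, path, agent="ExampleBot"):
    """Ask Python's parser whether `agent` may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Paths below are hypothetical examples, not real files on anyone's site.
for label, text in (("default", DEFAULT_FILE), ("edited", EDITED_FILE)):
    for path in ("/", "/admin/login.asp", "/themes/logo.gif"):
        print(label, path, may_fetch(text, path))
```

Note that the edited file also keeps robots away from anything under /themes/, images included.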

Question: As some of our images reside in the "themes" directory, why should that one be disallowed?
Ignignokt
I see what you're saying now.


?? I don't think it hurts, but I don't know for sure.
Barbiro
Would someone from Monster please provide us with some feedback on this?
TNTGram
Monster gurus, are you out there? I'd like to hear thoughts on this as well.
Barbiro
*bump*
TNTGram
Guess no one from Monster has read this?
Barbiro
It's Friday night... I guess the staff monitors the forum less often. I'll keep bumping this up until we get some Monster feedback. I'm sure curious about an answer.
TNTGram
Me 2!
Barbiro
Guys, look at this post regarding this same issue:

Robots.txt file

We may have found the solution, although we're still waiting for Monster's feedback.
yourfamilylegacy
Has changing your robots.txt file to remove the "allow: /" portion helped your bounce rate?
Ben N
And here's a good morning *bump*
bookmark
I don't understand this issue at all.

Are we supposed to create a new robots.txt file?

Is the one that is currently on our sites incorrect?

Or is that what the question is?

Barbiro
Bookmark,

I don't know if the one I had before was incorrect or not, but a few of us ran our robots.txt file thru the robots.txt validator and the results were an enormous amount of error messages. I guess the easier we make it for the crawlers to pick up, the better. Give yours a shot:

Robots.txt Validator

After editing mine and submitting thru the validator again, now it comes out squeaky clean:

Here are my results:

http status: 200 OK
Syntax check robots.txt on http://www.giftbargains4u.com/robots.txt (161 bytes)
Line  Severity  Code
No errors detected! This Robots.txt validates to the robots exclusion standard!
bookmark
Mine had over 1000 errors.

I'm just a bit skeptical because I often run things through validators that turn up errors that aren't really errors.
newageweb
Hi, I tried to respond last night, but my internet connection decided to take a coffee break. Prior to that, I've just been going nuts over here. Moving about 100 web sites over to my new and improved web server.

At any rate, you pretty much just want to list the directories that you don't want spidered. You could also list individual bots and tell them not to access certain directories or your entire web site.

At one point (and I'm sure they are still very much out there) I had a site that was getting slammed by psbot. (Google it, they run some sort of image site.) So I told their bot to stay off my entire site with:

User-agent: psbot
Disallow: /
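
A quick way to sanity-check a per-bot record like that is Python's standard urllib.robotparser. This is just a sketch: the second record for everyone else and the name OtherBot are my own additions for the example.

```python
from urllib.robotparser import RobotFileParser

# psbot gets its own record; the "*" record is an assumed example for all other bots.
ROBOTS_TXT = """\
User-agent: psbot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """True if Python's parser thinks `agent` may fetch `path`."""
    return parser.can_fetch(agent, path)


print(allowed("psbot", "/index.html"))     # psbot matches its own record
print(allowed("OtherBot", "/index.html"))  # other bots fall back to the * record
print(allowed("OtherBot", "/admin/"))      # and the * record blocks /admin/
```

Keep in mind this only tells you what a well-behaved bot should do; a bot that ignores robots.txt won't be stopped by it.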

What you have here is good, Barbiro. My only concern was in seeing this:
Disallow: /eproducts/

Normally that is a bad idea. But Monster apparently blocks viewing of the files. Then again, just putting a page in the directory (even a blank .html page) will stop the listing.

"Directory Listing Denied
This Virtual Directory does not allow contents to be listed."

I have no idea how good the security is, but I can tell you that the first thing hackers will do is request the robots.txt file to see where the good, and often paid, files are.

Hey, if you don't want them showing up in Google etc., they must be good, right? Not always, but they could be. When they see things like /products, /download, /ebooks etc., they may have hit gold. I got a $100 product once for free, with the author's blessing, after I first heard about this sort of SE exploit for a product I was interested in... sure enough, his stuff was right there for many to grab for free. Suffice to say he fixed that hole.

But you need to remember that Google etc. will not even list your directories unless you have a link pointing into them in the first place. Of course you need to consider that someone else might link into your download directories, but a noindex tag on that page should be enough. I would suggest using scripts to block access to download pages anyway though.

Here is a simple checker:
http://www.searchengineworld.com/cgi-bin/robotcheck.cgi

Yea, this makes no sense.
User-agent: *
allow: /

That site says "there is no allow" and I have to agree.
It should just say

User-agent: *
Disallow: /some-dir

Which says every bot can spider the entire site, oh, except for some-dir.
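
One thing worth knowing: the Disallow value is matched as a path prefix, so leaving off the trailing slash catches more than the one directory. Here is a small sketch with Python's standard urllib.robotparser; AnyBot and the paths are invented for the example, and this shows how Python's parser reads the rule.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /some-dir"])

# Hypothetical paths: the third one is a different directory that merely
# starts with the same characters as the rule.
for path in ("/index.html", "/some-dir/page.html", "/some-dir-old/page.html"):
    print(path, parser.can_fetch("AnyBot", path))
```

If you only want the one directory blocked, write it with the trailing slash: Disallow: /some-dir/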
Captain
I forwarded this thread to Dan, our SEO guru. He will comment shortly.
Monsterwebpromotion.com
Hi Guys,

Great posts. You are correct that the allow line is no longer needed in the robots files. Different standards came out (go figure) when the use of robots.txt became popular. From what we have seen, the allow line does not have any negative impact.

We have not replaced existing robots.txt files on sites because, as you know, you are free to edit the file, and we don't want to overwrite any changes you may have made and open up something you had disallowed to the engines.

If you would like to edit this text file, you can simply open it in the file manager and change it to:

User-agent: *
Disallow: /admin/
Disallow: /affiliatewiz/
Disallow: /eproducts/
Disallow: /themes/

For your statistics I wouldn't be surprised to see that bounce rate as many crawlers will hit this file to see if there is anything new they should exclude or any major changes to the site.
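
To put rough numbers on that, here is a small sketch. It assumes the bounce rate for a page is the share of visits entering on that page that viewed nothing else; I have not checked that this is exactly how Urchin computes it. The 110 hits come from the first post, and the 66 bounces is a made-up figure chosen to land on the reported 60%.

```python
def bounce_rate(single_page_visits, entrances):
    """Percentage of visits entering on a page that viewed nothing else.

    Assumed definition; the stats package may calculate it differently.
    """
    if entrances == 0:
        return 0.0
    return 100.0 * single_page_visits / entrances


# 110 robot hits is from the first post; 66 is a hypothetical bounce count.
print(bounce_rate(66, 110))
```

A crawler that fetches /robots.txt and comes back later for the pages would count as a bounce under this definition, even though it did not really "leave".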


Good Luck!
bookmark
Dan -

Can you possibly look at my site and tell me what I'm doing wrong?

I uploaded what you wrote, and I'm still getting over 1000 errors.

www.bookmarki.com

Thank you!

Patti
newageweb
QUOTE(bookmark @ Nov 24 2004, 02:42 PM)

I uploaded what you wrote, and I'm still getting over 1000 errors.


Patti, what you have is fine. What checker shows there are errors? There are not any... and how could there be 1,000? There are not even that many letters in the file.
bookmark
So I must just be misinterpreting the results.

Thank you, John!

Monsterwebpromotion.com
Hi Guys,

Yes I would recommend this checker:

http://www.searchengineworld.com/cgi-bin/robotcheck.cgi

It shows no errors. Good job Patti!
Ben N
Holy mackerel!
It shows 200 errors and a TON of warnings for our page.

Dan,
Do you have any idea as to what is wrong?
Thx,
Ben N
newageweb
QUOTE(Ben N @ Nov 24 2004, 05:36 PM)
Holy mackerel!
It shows 200 errors and a TON of warnings for our page.

Dan,
Do you have any idea as to what is wrong?
Thx,
Ben N

Guys, check your robots.txt file... NOT the front page of your web site!

http://www.xyz.com/robots.txt

That must be what you are doing. That will produce a TON of errors.
Ben, just get rid of this line and you will be all set.

allow: /
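
To show why pointing a checker at the front page produces an error on nearly every line, here is a rough sketch of a line checker in Python. It is not how the Searchengineworld validator actually works; check_robots_lines is a made-up function, and it assumes only the two fields from the original standard (User-agent and Disallow) are valid.

```python
KNOWN_FIELDS = {"user-agent", "disallow"}  # assumption: original standard only


def check_robots_lines(text):
    """Return (line_number, line) for every line that is not 'field: value'."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue  # blank lines and pure comments are fine
        field, separator, _value = line.partition(":")
        if not separator or field.strip().lower() not in KNOWN_FIELDS:
            problems.append((number, raw))
    return problems


# Both samples are invented for the example.
robots_txt = "User-agent: *\nDisallow: /admin/\nDisallow: /themes/\n"
home_page = "<html>\n<head><title>Shop</title></head>\n<body>\n<p>Welcome</p>\n</body>\n</html>\n"

print(len(check_robots_lines(robots_txt)))  # lines flagged in a real robots.txt
print(len(check_robots_lines(home_page)))   # lines flagged in an HTML page
```

Under that assumption the "allow: /" line is the only thing such a checker would flag in the default file, while every line of an HTML page gets flagged.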
Ben N
Where do I find that line?
I'm always afraid of screwing things up.
Thx,
Ben
bookmark
QUOTE(newageweb @ Nov 24 2004, 03:41 PM)
QUOTE(Ben N @ Nov 24 2004, 05:36 PM)
Holy mackerel!
It shows 200 errors and a TON of warnings for our page.

Dan,
Do you have any idea as to what is wrong?
Thx,
Ben N

Guys, check your robots.txt file... NOT the front page of your web site!

http://www.xyz.com/robots.txt

That must be what you are doing. That will produce a TON of errors.
Ben, just get rid of this line and you will be all set.

allow: /


Duh!

That's what I did wrong.

bookmark
QUOTE(Ben N @ Nov 24 2004, 03:55 PM)
Where do I find that line?
I'm always afraid of screwing things up.
Thx,
Ben


http://www.aawsales.com/robots.txt
Ben N
QUOTE(bookmark @ Nov 24 2004, 05:04 PM)
QUOTE(Ben N @ Nov 24 2004, 03:55 PM)
Where do I find that line?
I'm always afraid of screwing things up.
Thx,
Ben


http://www.aawsales.com/robots.txt



Thank you!
I THINK I just straightened it out.
I downloaded the page, took out the line, then reloaded it to our site...
Ben N
cyork
Even though I read this thread back in Nov., I'm just now starting to catch on a bit...
Here's what our robots.txt file looks like:

User-agent: *
allow: /

Disallow: /admin/
Disallow: /affiliatewiz/
Disallow: /eproducts/
Disallow: /themes/

I see that we need to take out the 'allow: /' but here's a question... somebody suggested that our first disallow should be:
Disallow: /images/

Could I trouble a few of you pros to comment on this? Also, I guess I have to call tech support, because when I'm in File Manager and try to make the change and save it, no luck: it keeps reverting back to the above. How dense am I?
Thanks for the input!
Cookie