FutureQuest, Inc.
Old 09-12-2008, 11:02 AM   Postid: 170103
Andilinks
Registered User

Join Date: Apr 2002
Location: San Antonio, Texas
Posts: 7,204
phantom duplicate urls found by Google

This question is important only because Google traffic is important, and is only important to people who derive income or traffic from Google referrals.

Is there any way to get a legitimate url followed by a forward slash "/" to return a 404 instead of a version of the same page without css?

In a rational world I would pose this question to Google, but they have turned rationality on its head by anthropomorphizing their dumb search engine. The modifier "dumb" is not a cute put-down; it is a serious assessment of its current behavior. Yes, I could somehow pose this question to Matt Cutts in a blog comment, and I probably will try soon if no one here has an answer to the question above.

Google has notified me that it has found duplicate pages on my site and then lists the same url twice, once with a fwd slash appended and once without. I have not yet been penalized for this nonsense but that will be coming if I don't figure out what to do. As an experiment I did change the name of one page and used a 301 redirect but the redirect also handles the slash appended url so gbot will still see it as two pages.

Last edited by Andilinks : 09-12-2008 at 11:13 AM.
Old 09-12-2008, 01:20 PM   Postid: 170107
Juan G
Site Owner
 
Join Date: Apr 2000
Location: European Union
Posts: 280
Re: phantom duplicate urls found by Google

Quote:
Originally Posted by Andilinks View Post
Is there any way to get a legitimate url followed by a forward slash "/" to return a 404 instead of a version of the same page without css?
Instead of example.css:

<link rel="stylesheet" type="text/css" href="example.css" />

something to try would be /example.css:

<link rel="stylesheet" type="text/css" href="/example.css" />
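
A possible explanation for the missing CSS, assuming the stylesheet is linked with a relative href: the browser resolves example.css against the page URL, so with a trailing slash it looks inside a "directory" that does not exist. A minimal Python sketch of that resolution (the page name page.shtm and the domain are assumptions for illustration, not taken from the original post):

```python
from urllib.parse import urljoin

# Hypothetical URLs: the same page requested without and with a trailing slash.
page = "http://example.com/page.shtm"
page_with_slash = "http://example.com/page.shtm/"

# Relative href: resolved against the "directory" part of the page URL.
relative_ok = urljoin(page, "example.css")
relative_broken = urljoin(page_with_slash, "example.css")

# Root-relative href: resolved against the site root, whatever the page URL is.
rooted_ok = urljoin(page, "/example.css")
rooted_slash = urljoin(page_with_slash, "/example.css")

print(relative_ok)      # http://example.com/example.css
print(relative_broken)  # http://example.com/page.shtm/example.css
print(rooted_ok)        # http://example.com/example.css
print(rooted_slash)     # http://example.com/example.css
```

Note that a root-relative href only keeps the slash version styled; it does not make the duplicate URL go away.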

Quote:
Originally Posted by Andilinks View Post
Google has notified me that it has found duplicate pages on my site and then lists the same url twice, once with a fwd slash appended and once without. I have not yet been penalized for this nonsense but that will be coming if I don't figure out what to do. As an experiment I did change the name of one page and used a 301 redirect but the redirect also handles the slash appended url so gbot will still see it as two pages.
Apache itself adds the trailing slash to directory URLs (see the thread "Mod Rewrite to remove end slash"). You are probably referring to files requested with and without a trailing slash; those probably need a redirect (excluding directories) so that only one of the two versions is served. That should work.
Old 09-12-2008, 01:54 PM   Postid: 170109
Andilinks
Re: phantom duplicate urls found by Google

Thanks for the reply, Juan. I think this may be the answer. There is just one thing about the code in that expired thread:


Code:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)(\.php|\.html)/$  /$1$2 [R=301,L]
Running experiments with mod_rewrite code on a live website has always ended badly for me, so I'd like to confirm that the following changes are in order before I send them to the server:

Code:
RewriteCond %{REQUEST_myfile.shtm} !-d
RewriteRule ^(.+)(\.php|\.shtm)/$  /$1$2 [R=301,L]
Old 09-12-2008, 07:43 PM   Postid: 170121
Juan G
Re: phantom duplicate urls found by Google

Well, the following are just possible, untested ideas.

If you only want to remove a possible trailing slash (except for directories), I think you can use:

Code:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [R=301,L]
Or, if you also wish to remove "www." from all URLs (another possibility for duplicates):

Code:
RewriteEngine on

RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)(/?)$ http://example.com/$1 [R=301,L]

RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.*)(/?)$ http://example.com/$1/ [R=301,L]
If you have IRMs, you would probably need to add this line (or some other solution) before each of those two RewriteRule lines:

Code:
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com$ [NC]
Of course, use your own domain in place of example.com.

Since these are just untested suggestions, others may point out mistakes or better ways. Also, different situations (for example IROs, etc.) require different solutions. You may want to read the Apache 1.3 URL Rewriting Guide (FutureQuest is currently using that Apache version, I think).
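
Since the rules are untested, one cheap check is to see what each pattern would capture as $1. The sketch below uses Python's re module only as a stand-in for Apache's regex engine, so the results are indicative rather than definitive; the helper first_group and the sample path page.shtm are hypothetical. It suggests that `^(.*)/$` strips the slash as intended, while in `^(.*)(/?)$` the greedy `(.*)` appears to swallow the trailing slash, so $1 would still end in "/". Also, as written, neither of the two www-removal rules tests the host, so they would seem to match the redirected request again unless a condition that matches only the www form is added.

```python
import re

# Patterns copied from the rules in this post.
STRIP_SLASH = re.compile(r"^(.*)/$")
OPTIONAL_SLASH = re.compile(r"^(.*)(/?)$")

def first_group(pattern, path):
    """Return what $1 would hold for this path, or None if the rule does not match."""
    match = pattern.match(path)
    return match.group(1) if match else None

# In .htaccess context the leading slash is not part of the matched path.
with_slash = first_group(STRIP_SLASH, "page.shtm/")
without_slash = first_group(STRIP_SLASH, "page.shtm")
greedy = first_group(OPTIONAL_SLASH, "page.shtm/")

print(with_slash)     # page.shtm   -> redirect target /page.shtm
print(without_slash)  # None        -> rule does not fire, no redirect
print(greedy)         # page.shtm/  -> the slash is still inside $1
```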
Old 09-12-2008, 08:05 PM   Postid: 170122
Andilinks
Re: phantom duplicate urls found by Google

Quote:
If you only want to remove a possible trailing slash (excepting for directories), I think you can use:


Code:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [R=301,L]
OK, this is what I want to do. My question is: do I replace "{REQUEST_FILENAME}" with {REQUEST_myfile.shtm} or leave it as is? I know it must be obvious, it's just not obvious to me. :)

Since it's untested I'll wait until 3 am to try it.

Thanks.
Old 09-12-2008, 08:19 PM   Postid: 170125
Juan G
Re: phantom duplicate urls found by Google

About REQUEST_FILENAME, leave it as is:

Quote:
REQUEST_FILENAME
The full local filesystem path to the file or script matching the request.

(Apache module mod_rewrite)
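
In other words, %{REQUEST_FILENAME} is a variable the server fills in for every request, and "!-d" tests that this path is not a directory. A rough Python model of the condition plus the rule, under the assumption that the server maps a request for myfile.shtm/ to the file myfile.shtm (the function should_redirect, the file name, and the photos directory are hypothetical):

```python
import os
import tempfile

def should_redirect(request_filename, url_path):
    """Rough model of: RewriteCond %{REQUEST_FILENAME} !-d  +  RewriteRule ^(.*)/$"""
    is_directory = os.path.isdir(request_filename)  # the "-d" test
    return (not is_directory) and url_path.endswith("/")

# Build a throwaway document root with one file and one directory.
with tempfile.TemporaryDirectory() as docroot:
    page = os.path.join(docroot, "myfile.shtm")
    open(page, "w").close()
    subdir = os.path.join(docroot, "photos")
    os.mkdir(subdir)

    file_with_slash = should_redirect(page, "myfile.shtm/")  # file + slash
    directory = should_redirect(subdir, "photos/")           # real directory
    plain_file = should_redirect(page, "myfile.shtm")        # no slash at all

print(file_with_slash, directory, plain_file)  # True False False
```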
Good luck! :)
Old 09-12-2008, 08:20 PM   Postid: 170126
Andilinks
Re: phantom duplicate urls found by Google

OK, I'll try it tonight and report back here in the morning. Thanks again!
Old 09-13-2008, 09:39 AM   Postid: 170138
Andilinks
Re: phantom duplicate urls found by Google

It works perfectly, thanks. Now we'll just have to wait to see how long it takes Google to notice. Again, I'll report that event here when it happens.

The notice of duplicate title tags on pages identical except for the trailing slash has been on my Webmaster Tools Content analysis page for over a week. At first I just ignored it, thinking it was a ridiculous error. But as I've learned the hard way, ridiculous errors by Google can be very costly: the dollar amounts are trivial for Google but devastating for me.