Generated Site Map - not catching internal broken links to PHP pag

cmcmorrow

Hello,

I am using Visio's site mapping tool to generate a site map automatically of
my company's web site. For the most part, it is working extremely well, and
is very useful. However, I'm having one major problem, and that is that the
reports are not picking up some internal links that are broken. The site is
built in PHP, and the broken links are to other php pages within the site. I
do have PHP as one of the extensions to include in my drawing settings.

Visio has no problem catching broken paths to images, pdfs and other
documents on the site, as well as external links that are broken.

At first I thought it was because our .htaccess file on the server had a
redirect set up for 404 errors, to go to the home page. We removed that, and
re-ran the site maps, but Visio is still not catching the broken link. (could
there be a caching issue going on here, that's not letting us see the effects
of the change?)

To illustrate the issue . . . I ran the report for this directory:
http://greencampus.harvard.edu/rep/index.php.

I know that there is a broken link on this page:
http://greencampus.harvard.edu/rep/faq.php
If you do a find on the word "recognition", you'll find it's hyperlinked to
a nonexistent script: http://greencampus.harvard.edu/rep/limelight.php

However, Visio shows it as a normal functioning page on the site map.

Any ideas on what could be going on here?

Thanks!

-cmcmorrow
 
Paul Herber

Your server is returning a text/html document containing the text "404
Page Not Found on http://greencampus.harvard.edu/" rather than an
error 404 status.
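
For later readers, the distinction at issue is between a "hard" 404 (the status line itself says 404) and a "soft" 404 (the status is 200 but the body carries an error message). A checker that looks only at the status line will count the second kind as a valid link. Below is a minimal Python sketch of that distinction. The names `classify_link` and `SOFT_404_MARKERS` are hypothetical, and the marker list is an assumption for illustration, not a description of what Visio does.

```python
# Hypothetical heuristic: phrases that suggest an error page served with status 200.
SOFT_404_MARKERS = ("page not found",)


def classify_link(status, body):
    """Classify a response as 'broken', 'soft-404', or 'ok'.

    status: integer HTTP status code from the status line.
    body:   response body as text.
    """
    if status >= 400:
        # The server reported the error in the status line.
        return "broken"
    text = body.lower()
    if any(marker in text for marker in SOFT_404_MARKERS):
        # Status says success, but the body reads like an error page.
        return "soft-404"
    return "ok"
```

A tool that trusts the status line alone would report the "soft-404" case as a working page.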
 
cmcmorrow

Paul Herber said:
Your server is returning a text/html document containing the text "404
Page Not Found on http://greencampus.harvard.edu/" rather than an
error 404 status.

Hi, thanks for your response. I wondered if something like this might be the
case, but I don't think so . . . The server is returning a 404 response. I
checked it out by displaying the HTTP request/response info using an
extension on my browser. Also the W3C link checker and other link checkers
I've tried return 404 status on this link.

But not Visio . . . Any thoughts?

Thanks!
 
Paul Herber

Right, I've checked in 4 different browsers (Mozilla, Opera, Netscape
and IE), and a proper status 404 is only returned for IE; the others get
a text/html response.
So, I'll take a guess that the server is doing a browser check:

if IE then return 404
else wibble and break other browsers

So, as Visio is not IE, it gets back a text/html file and hence treats
it as a valid link.
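
One way to test the browser-check guess is to request the same URL with different User-Agent headers and compare the status codes. The following is a rough Python 3 sketch using only the standard library. The function names, the `open_fn` parameter, and the User-Agent strings you pass in are all assumptions for illustration; nothing here reflects how Visio itself makes requests.

```python
import urllib.error
import urllib.request


def status_for(url, user_agent, open_fn=urllib.request.urlopen):
    """Return (status_code, content_type) for url fetched with the given User-Agent.

    open_fn is injectable so the function can be exercised without network access.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with open_fn(req, timeout=10) as resp:
            return resp.status, resp.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is on the exception.
        ctype = err.headers.get("Content-Type", "") if err.headers else ""
        return err.code, ctype


def compare_agents(url, agents, open_fn=urllib.request.urlopen):
    """Map each label in agents (label -> User-Agent string) to its (status, type)."""
    return {label: status_for(url, ua, open_fn) for label, ua in agents.items()}
```

Calling `compare_agents` with the nonexistent page's URL and a dictionary of two or more User-Agent strings would show whether the status differs per client. If every entry comes back 404, the browser-check theory is unlikely.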
 
cmcmorrow

Paul Herber said:
Right, I've checked in 4 different browsers, Mozilla, Opera, Netscape
and IE and a proper status 404 is only returned for IE, the others get
a text/html response.
So, I'll take a guess that the server is doing a browser check:

if IE then return 404
else wibble and break other browsers

so, as Visio is not IE then it gets back a text/html file, hence it's
a valid link.

That's odd - I was using the firebug extension in Firefox when I got the
404. So this is what we have in our htaccess file on the server . . .

ErrorDocument 404 "404 Page Not Found on http://greencampus.harvard.edu/

I would think that this is not overriding the 404 error response, but what
do you think?
 
Paul Herber

well, it's now returning "404 Page Not Found" as a text/html file so
that is probably the default for the whole server rather than your
virtual server for your subdomain.
It's still a text response.
It all boils down to what response Visio's site mapping tool gets and
what it does with it.
 
cmcmorrow

Paul,

Thanks for your time. I don't have Opera installed, but using both Firefox
and IE, when I go to this site:
http://www.webrankinfo.com/english/tools/server-header.php

and put in my nonexistent page's URL, I get a 404 error ... not just a
text/html response:

HTTP/1.1 404 Not Found
Date: Thu, 29 Mar 2007 18:33:21 GMT
Server: Apache/2.0.54
Vary: Accept-Encoding
Content-Length: 18
Connection: close
Content-Type: text/html; charset=iso-8859-1

Are you saying that in your browser you don't get the same thing?

I can talk to the person that administers our server, but I need to have
some evidence that the URI is returning something other than a 404, and I
haven't been able to find that.
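
For anyone collecting this kind of evidence, the header dump above can be checked mechanically. This is a small Python sketch that pulls the status code and headers out of a pasted dump. `parse_header_dump` is a hypothetical helper name, and the dump is copied from this post.

```python
# Header dump as pasted in this thread.
DUMP = """\
HTTP/1.1 404 Not Found
Date: Thu, 29 Mar 2007 18:33:21 GMT
Server: Apache/2.0.54
Vary: Accept-Encoding
Content-Length: 18
Connection: close
Content-Type: text/html; charset=iso-8859-1
"""


def parse_header_dump(dump):
    """Return (status_code, reason, headers) from a raw response-header dump."""
    lines = [ln.strip() for ln in dump.strip().splitlines() if ln.strip()]
    # Status line looks like: HTTP/1.1 404 Not Found
    _version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for ln in lines[1:]:
        # Split on the first colon only, so values containing colons survive.
        name, _, value = ln.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers
```

For this dump the status is 404, and the Content-Length of 18 equals the length of the string "404 Page Not Found". A response can carry a 404 status and a text/html body at the same time, so seeing a text/html body does not by itself rule out a 404.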

Thanks,

cmcmorrow
 
Paul Herber

I think you need some input from someone who knows more details about
the site map generator .....
 
cmcmorrow

Paul,

Thanks again for taking the time to try to figure this one out.

If anyone else has any experience or ideas about this issue, I would be most
appreciative.

Right now, I'm having to use a second tool to check the links on our site,
since I can't figure out what's causing Visio to not catch the broken
internal ones.

Thanks,

cmcmorrow
 
