[Techtalk] website link checking that does orphans?

Akkana Peck akkana at shallowsky.com
Sun Apr 12 21:09:39 UTC 2015

Hi, all --

I've inherited a website that's a complete mess -- I'm sure
there are tons of orphans there, files that aren't linked to by
anything, and I'd like to clean them up.

But I'm totally striking out trying to find a link checker that will
tell me about not just broken links, but also orphans.

I found linklint, which has a -orphan flag; but it can only check
for orphans in a local directory, not with the -http flag. When
I try using it, I get huge numbers of orphans because one of the
directories it checks is .. (the parent directory) -- in other words, it checks all files
in my entire filesystem to see whether they're referenced by files
in my web directory.

Also, since it doesn't go through the web server, it will totally
miss anything referenced from PHP.

I need something where I can give it a URL, say,
http://localhost/index.html, plus a directory, say, /var/www/htdocs
or ~/public_html, and have it start at the URL, go through and
spider everything accessible from there, and give me a report on
broken links on the website, plus orphaned files in the directory
that aren't accessed from the website. Bonus points if I can control
whether it reports on broken links from external websites, or only
broken links within localhost.
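A minimal sketch of the tool described above, using only the Python standard library. It is not an existing program, and the function names (crawl, fetch, url_to_path, find_orphans) are made up for illustration. It assumes three things: URL paths map one-to-one onto files under the directory (no rewrite rules or aliases), the directory is served at url_prefix (for example "/~user/" for ~/public_html), and a directory URL is answered by index.html or index.php. Because every page is fetched through the web server, it sees links that PHP writes into its output. Files that PHP only pulls in server-side with include are never requested, so they will still be reported as orphans.

```python
import http.client
import os
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import unquote, urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect every href= and src= value on an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def fetch(url, timeout, parse):
    """GET url through the server; return (error or None, absolute links)."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            if not parse or resp.headers.get_content_type() != "text/html":
                return None, []  # it exists, but there is nothing to spider
            base = resp.geturl()  # final URL after any redirects
            charset = resp.headers.get_content_charset() or "utf-8"
            html = resp.read().decode(charset, errors="replace")
    except HTTPError as err:
        return f"HTTP {err.code}", []
    # URLError, timeouts and socket errors are all OSError subclasses.
    except (OSError, ValueError, LookupError, http.client.HTTPException) as err:
        return str(err), []
    parser = LinkExtractor()
    parser.feed(html)
    return None, [urldefrag(urljoin(base, link))[0] for link in parser.links]


def url_to_path(url, docroot, url_prefix="/"):
    """Map a local URL to a file under docroot, or None if it lands outside."""
    # Assumption: URL paths map 1:1 onto files, with docroot served at url_prefix.
    path = unquote(urlparse(url).path)
    if not path.startswith(url_prefix):
        return None
    root = os.path.realpath(docroot)
    full = os.path.normpath(os.path.join(root, path[len(url_prefix):].lstrip("/")))
    if os.path.isdir(full):  # assumed default index names
        full = next((os.path.join(full, i) for i in ("index.html", "index.php")
                     if os.path.isfile(os.path.join(full, i))), full)
    # Never look outside the document root (the ".." problem).
    if full != root and not full.startswith(root + os.sep):
        return None
    return full


def crawl(start_url, docroot, url_prefix="/", check_external=False, timeout=10):
    """Spider from start_url; return (files_reached, broken_links)."""
    host = urlparse(start_url).netloc
    status = {}      # url -> None if it loaded, else an error string
    reached = set()  # files under docroot that the server actually delivered
    broken = []      # (referring page, link, reason)
    queue = deque([(start_url, None)])
    while queue:
        url, referrer = queue.popleft()
        is_local = urlparse(url).netloc == host
        first_visit = url not in status
        if first_visit:
            # Local pages are spidered; external URLs are only checked.
            status[url], links = fetch(url, timeout, parse=is_local)
        if status[url] is not None:
            broken.append((referrer, url, status[url]))  # once per link to it
            continue
        if not first_visit:
            continue
        path = url_to_path(url, docroot, url_prefix) if is_local else None
        if path:
            reached.add(path)
        for link in links:
            parts = urlparse(link)
            if parts.scheme not in ("http", "https"):
                continue  # skip mailto:, javascript:, ftp: and the like
            if parts.netloc == host or check_external:
                queue.append((link, url))
    return reached, broken


def find_orphans(docroot, reached):
    """Files under docroot that the crawl never reached (os.walk stays inside)."""
    root = os.path.realpath(docroot)
    everything = {os.path.join(d, f) for d, _, files in os.walk(root) for f in files}
    return sorted(everything - reached)
```

For example, crawl("http://localhost/index.html", "/var/www/htdocs") returns the set of files the server actually delivered plus (page, link, reason) tuples for broken links, and find_orphans("/var/www/htdocs", reached) lists everything else in that directory. Passing check_external=True also checks, without spidering, links that point at other sites. There is no depth or page limit, so a site that generates endless URLs (a calendar, say) would need one.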

This seems like such a basic need, and something that would be so
simple to write, that I'm flabbergasted that I can't find anything
to do it. But I've been googling for an hour (and this isn't the
first time I've tried looking for something like this) and I haven't
found anything that works.

