[Techtalk] website link checking that does orphans?

Akkana Peck akkana at shallowsky.com
Tue Apr 14 01:56:43 UTC 2015


Miriam English writes:
> Interesting problem. I checked back on my notes about linklint. I remember
> it worked quite differently to how I expected and I really struggled with
> it. In my notes I wrote to myself that it was "ridiculously difficult to
> program".

Thanks for the suggestions about using linklint.

But looking through the suggestions, and thinking about how I'd have
to wget the whole site with special arguments to save to a temporary
directory every time I wanted to do a link check, I decided that if
for some bizarre reason this tool didn't already exist, it should.
So I wrote it.

https://github.com/akkana/scripts/blob/master/weborphans

The hard part turned out to be converting every link into an absolute
URL, then mapping that URL to the equivalent path on the local
filesystem. That sounded easy but has a lot of tricky aspects (I'm
still working on some edge cases). Still, it's good enough that I was
able to find the 10 bad links and 606 orphaned files on this website
I inherited.

        ...Akkana

