[Techtalk] website link checking that does orphans?

jim jim at well.com
Tue Apr 14 16:00:50 UTC 2015


I don't know what you mean by "absolute links".
Can you define the term, please?
With thanks



On 04/14/2015 01:56 AM, Akkana Peck wrote:
> Miriam English writes:
>> Interesting problem. I checked back on my notes about linklint. I remember
>> it worked quite differently to how I expected and I really struggled with
>> it. In my notes I wrote to myself that it was "ridiculously difficult to
>> program".
> Thanks for the suggestions about using linklint.
>
> But looking through the suggestions, and thinking about how I'd have
> to wget the whole site with special arguments to save to a temporary
> directory every time I wanted to do a link check, I decided that if
> for some bizarre reason this tool didn't already exist, it should.
> So I wrote it.
>
> https://github.com/akkana/scripts/blob/master/weborphans
>
> The hard part turned out to be turning all links into absolute
> links, then turning those into equivalent paths on the local
> filesystem. That sounded easy but turned out to have a lot of tricky
> aspects (I'm still working on some edge cases).  But it's good
> enough that I was able to find the 10 bad links and 606 orphaned
> files on this website I inherited.
>
>          ...Akkana
> _______________________________________________
> Techtalk mailing list
> Techtalk at linuxchix.org
> http://mailman.linuxchix.org/mailman/listinfo/techtalk
>
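
To illustrate the two steps described in the quoted message (making every link absolute, then mapping it to a path on the local filesystem), here is a minimal Python sketch. It is not the code from the weborphans script. The function names, the site URL, the document root, and the "index.html" default are all assumptions chosen for the example. An "absolute link" here means a full URL with scheme and host, as opposed to a relative one like "../img/x.png" that only makes sense relative to the page it appears on.

```python
import posixpath
from urllib.parse import urljoin, urlsplit, unquote


def to_absolute(page_url, href):
    """Resolve href, as found on page_url, into an absolute URL.

    Any #fragment is dropped, since it points inside a file
    rather than at a different file.
    """
    parts = urlsplit(urljoin(page_url, href))
    return parts._replace(fragment="").geturl()


def to_local_path(absolute_url, site_url, doc_root, index="index.html"):
    """Map an absolute URL on our own site to a file path under doc_root.

    Returns None for off-site links. The index file name is an
    assumption; real servers may be configured differently.
    """
    url = urlsplit(absolute_url)
    site = urlsplit(site_url)
    if url.netloc.lower() != site.netloc.lower():
        return None
    path = unquote(url.path) or "/"
    if path.endswith("/"):
        # A directory URL is assumed to be served from its index file.
        path += index
    path = posixpath.normpath(path)
    return posixpath.join(doc_root, path.lstrip("/"))


if __name__ == "__main__":
    # Hypothetical site and document root, for illustration only.
    site = "http://example.com/"
    root = "/var/www"
    page = "http://example.com/a/b.html"
    for href in ["../img/x.png", "c.html#top", "/docs/", "http://other.org/x"]:
        absolute = to_absolute(page, href)
        print(href, "->", absolute, "->", to_local_path(absolute, site, root))
```

With a mapping like this, a bad link would be one whose local path does not exist, and an orphan would be a file under the document root that no page's links map to. The sketch ignores edge cases such as query strings, redirects, symlinks, and server rewrite rules, which are presumably among the tricky aspects mentioned above.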


