[Techtalk] website link checking that does orphans?

jim jim at well.com
Tue Apr 14 16:24:06 UTC 2015


had coffee, looked it up, got it.
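
For anyone else wondering about the term: an absolute link spells out the whole URL, scheme and host included (https://example.com/docs/faq.html). A relative link such as ../faq.html only has a meaning relative to the page it appears on. Turning relative links into absolute ones is the first step Akkana describes in the quoted message. Here is a minimal Python sketch using the standard library's urllib.parse.urljoin. The page URL and links are made-up examples, and this is not code taken from weborphans:

```python
from urllib.parse import urldefrag, urljoin


def to_absolute(page_url: str, href: str) -> str:
    """Resolve href, as written in the HTML, against the URL of the page it appears on."""
    absolute = urljoin(page_url, href)
    # Drop any "#fragment": page.html#top and page.html are the same file.
    url, _fragment = urldefrag(absolute)
    return url


if __name__ == "__main__":
    page = "https://example.com/docs/guide/index.html"  # hypothetical page
    for href in ["intro.html", "../faq.html", "/images/logo.png", "#top",
                 "https://other.example.org/x.html"]:
        print(f"{href:35} -> {to_absolute(page, href)}")
```

Stripping the fragment is a choice made in this sketch so that links to different anchors in one file count as links to that file.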


On 04/14/2015 04:00 PM, jim wrote:
>
> I don't know what you mean by "absolute
> links". Can you define it, please?
> with thanks
>
>
>
> On 04/14/2015 01:56 AM, Akkana Peck wrote:
>> Miriam English writes:
>>> Interesting problem. I checked back on my notes about linklint. I
>>> remember it worked quite differently to how I expected and I really
>>> struggled with it. In my notes I wrote to myself that it was
>>> "ridiculously difficult to program".
>> Thanks for the suggestions about using linklint.
>>
>> But looking through the suggestions, and thinking about how I'd have
>> to wget the whole site with special arguments to save to a temporary
>> directory every time I wanted to do a link check, I decided that if
>> for some bizarre reason this tool didn't already exist, it should.
>> So I wrote it.
>>
>> https://github.com/akkana/scripts/blob/master/weborphans
>>
>> The hard part turned out to be turning all links into absolute
>> links, then turning those into equivalent paths on the local
>> filesystem. That sounded easy but turned out to have a lot of tricky
>> aspects (I'm still working on some edge cases).  But it's good
>> enough that I was able to find the 10 bad links and 606 orphaned
>> files on this website I inherited.
>>
>>          ...Akkana
>> _______________________________________________
>> Techtalk mailing list
>> Techtalk at linuxchix.org
>> http://mailman.linuxchix.org/mailman/listinfo/techtalk
>>
>
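
The second step Akkana mentions is mapping those absolute URLs back onto files on the local disk. The orphan check follows from it, and one way to sketch both is shown here. This assumes, hypothetically, that the site's root URL corresponds to a single local directory and that a URL ending in "/" is served from that directory's index.html. The function names and example paths are invented for illustration and are not taken from weborphans:

```python
import os
from urllib.parse import unquote, urlparse


def url_to_local_path(url, site_root_url, local_root):
    """Map an absolute URL on our own site to a file under local_root.

    Returns None for links to another host, non-web schemes, or paths
    outside the part of the site rooted at site_root_url.
    """
    u = urlparse(url)
    root = urlparse(site_root_url)
    if u.scheme not in ("http", "https") or u.netloc != root.netloc:
        return None  # external link (or mailto: etc.): not one of our files
    path = unquote(u.path) or "/"  # undo %20 and similar escapes
    root_path = root.path if root.path.endswith("/") else root.path + "/"
    if not path.startswith(root_path):
        return None  # same host, but outside the tree we manage
    rel = path[len(root_path):].lstrip("/")
    if rel == "" or rel.endswith("/"):
        rel += "index.html"  # assumption: directory URLs serve index.html
    return os.path.normpath(os.path.join(local_root, rel))


def list_site_files(local_root):
    """Every file under local_root, as normalized paths."""
    files = set()
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            files.add(os.path.normpath(os.path.join(dirpath, name)))
    return files


def orphans_and_bad_links(all_files, linked):
    """Orphans: files nothing links to. Bad links: linked files that don't exist."""
    all_files, linked = set(all_files), set(linked)
    return sorted(all_files - linked), sorted(linked - all_files)


if __name__ == "__main__":
    root_url, local_root = "https://example.com/", "/srv/www/example"  # hypothetical
    # A real checker would collect these by parsing every HTML file and
    # resolving each href against its page with urljoin.
    urls = ["https://example.com/", "https://example.com/docs/faq.html",
            "https://example.com/gone.html", "https://other.example.org/"]
    linked = {p for p in (url_to_local_path(u, root_url, local_root) for u in urls) if p}
    orphans, bad = orphans_and_bad_links(list_site_files(local_root), linked)
    print("orphans:", orphans)
    print("bad links:", bad)
```

Once everything is a local path, orphans and bad links reduce to two set differences. The site's entry page (the top-level index.html, say) is usually not linked from anywhere. It would need to be added to the linked set by hand so it isn't reported as an orphan.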


