[Techtalk] Scraping a webpage from a website.

J Neefer! neefer at speakeasy.org
Fri May 2 09:25:33 EST 2003


On May 02, 2003 at 05:01PM (+0100), Telsa Gwynne said:
> 
> Failing that, I don't think you need to write a script. wget already
> exists. It has a pile of command-line options, some of which are
> important for your sanity (don't recurse endlessly until you have 
> covered the entire world-wide web for example) and others of which
> will keep the webserver admin happy (maximum number of tries per 
> page; time to wait between retrievals).

wget can't submit forms, click buttons, or run scripts on a
dynamic website, but it is fantastic for flat sites full of hrefs.
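
For the flat-site case, here is a rough sketch of the kind of
bounded, polite wget call Telsa describes, wrapped in a bit of
Python so the limits are easy to tweak. The flags are standard
wget options; the function name, the default values, and the
example.org URL are my own placeholders, not anything the OP
gave us.

```python
import shlex


def build_wget_command(start_url, depth=2, tries=3, wait_seconds=2):
    """Return a wget argument list for a bounded, polite recursive fetch.

    The defaults are illustrative guesses, not recommendations.
    """
    return [
        "wget",
        "--recursive",             # follow the hrefs found in each page
        f"--level={depth}",        # stop recursing after this many hops
        "--no-parent",             # never climb above the starting directory
        f"--tries={tries}",        # cap the number of attempts per page
        f"--wait={wait_seconds}",  # seconds to pause between retrievals
        start_url,
    ]


# Placeholder URL; substitute the site you actually want to mirror.
command = build_wget_command("http://www.example.org/docs/")

# Print the command instead of running it, so nothing is fetched by accident.
print(shlex.join(command))
```

To actually fetch the pages, paste the printed line into a shell
or hand the list to subprocess.run(command, check=True).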

A coworker of mine keeps talking about a Perl module called
"mechanize" (WWW::Mechanize on CPAN), which is designed to do
this, but I haven't had time to look into it.  It might be worth
a try if the OP is good at Perl and wget doesn't have what it
takes.

--J



