[Techtalk] Scraping a webpage from a website.
neefer at speakeasy.org
Fri May 2 09:25:33 EST 2003
On May 02, 2003 at 05:01PM (+0100), Telsa Gwynne said:
> Failing that, I don't think you need to write a script. wget already
> exists. It has a pile of command-line options, some of which are
> important for your sanity (don't recurse endlessly until you have
> covered the entire world-wide web for example) and others of which
> will keep the webserver admin happy (maximum number of tries per
> page; time to wait between retrievals).
wget can't follow buttons or other script-driven navigation on a
dynamic website, but it is fantastic for flat sites full of plain hrefs.
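For the flat-site case, here is a minimal sketch of the kind of
invocation the quoted advice points at, assuming GNU wget. The function
name fetch_flat_site and the depth, wait, and retry values are made up
for illustration, not recommendations.

```shell
# Hypothetical helper (the name is an assumption, not part of wget):
# fetch a flat, link-based site without hammering the server.
fetch_flat_site() {
    # --recursive   follow href links found in the pages it downloads
    # --level=2     go at most two links deep (wget's default depth is 5)
    # --no-parent   never climb above the starting directory
    # --wait=2      pause two seconds between retrievals
    # --tries=3     give up on a page after three attempts
    wget --recursive --level=2 --no-parent --wait=2 --tries=3 "$1"
}
```

It would be called as `fetch_flat_site http://www.example.com/docs/`,
where the URL is only a placeholder. Nothing here presses buttons or
runs scripts, so it only helps when the pages are reachable by ordinary
links.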
A coworker of mine keeps talking about a Perl module called
"mechanize" (presumably WWW::Mechanize on CPAN) which is designed to
do this, but I haven't had time to look into it. It might be worth a
try if the OP is good at Perl and wget doesn't have what it takes.