[prog] Writing a program to automatically fetch webpages and fill in forms.

jennyw jennyw at dangerousideas.com
Mon Mar 21 18:43:22 EST 2005


Sue Stones wrote:

> I don't know much, so it's almost as easy to use one tool as another.  
> I have used perl occasionally in the past and it would be good to 
> consolidate that a bit.
>
> Finding out how to integrate it into the expert system may affect my 
> choice.

What expert system is it you're using? Is it something open source or is 
it proprietary?

You might also look at some testing projects, since automation and 
testing are often linked. Some automation libraries take control of a 
browser, such as IE, if that's important (it can be for testing; 
probably not important in your case, though). O'Reilly has a book called 
Spidering Hacks that might be of interest to you. I believe it uses 
Perl's LWP, which Jacinta suggested you check out.

Also, Python has a mechanize library (which is ported from the Perl 
library that Jacinta suggested). Resources on Web scraping in Python here:

http://sig.levillage.org/index.php?p=563
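At bottom, what mechanize-style libraries do is parse a page's form tags 
and then build the HTTP POST for you. Just as a sketch of that 
request-building step, using only Python's standard library (the URL and 
field names below are invented for illustration):

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields -- a real script would read these
# out of the <form> on the fetched page.
form_data = {"username": "jen", "query": "expert systems"}
body = urlencode(form_data).encode("ascii")

# A POST request carrying the encoded form data; mechanize (or LWP's
# WWW::Mechanize) constructs something equivalent behind the scenes,
# and urlopen(req) would then submit it.
req = Request(
    "http://example.com/search",
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(req.get_method())     # POST, because data= was supplied
print(body.decode("ascii"))
```

The convenience of mechanize is that it also keeps cookies and follows 
links between requests, which you'd otherwise have to wire up yourself.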

In addition to sending requests and getting pages, you might be 
interested in techniques to extract info (the page above talks about 
some of this). The last time I had to extract data from Web pages, I 
had success running pages through HTML Tidy and then getting data using 
Python's minidom.
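A minimal sketch of that Tidy-then-minidom approach -- the snippet below 
stands in for a page that HTML Tidy has already made well-formed, and 
its contents are invented for illustration:

```python
from xml.dom import minidom

# As it might look after HTML Tidy: well-formed markup that
# minidom can parse as XML.
page = """<html><body>
<table>
  <tr><td>Perl</td><td>LWP</td></tr>
  <tr><td>Python</td><td>mechanize</td></tr>
</table>
</body></html>"""

doc = minidom.parseString(page)

# Pull the text out of each table cell, row by row.
rows = [
    [cell.firstChild.data for cell in row.getElementsByTagName("td")]
    for row in doc.getElementsByTagName("tr")
]
print(rows)   # [['Perl', 'LWP'], ['Python', 'mechanize']]
```

The catch is that minidom insists on well-formed input, which is why the 
Tidy pass comes first; raw real-world HTML will usually make parseString 
throw.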

Ruby has some tools, too, although I haven't explored these yet. Watir 
automates Internet Explorer (which is great for testing; might not be 
what you're looking for). There's also webunit, which is a testing 
suite. Someone recently wrote a package called mechanize, but it 
might be kind of raw:

http://www.ntecs.de/blog/Blog/WWW-Mechanize.rdoc/style/print

Jen



More information about the Programming mailing list