[prog] Writing a program to automatically fetch webpages and fill in forms.

Sue Stones suzo at spin.net.au
Thu Mar 24 14:09:41 EST 2005


jenny wrote:
> 
> What expert system is it you're using? Is it something open source or is 
> it proprietary?

I am using CLIPS, which was developed by NASA in 1984 and is now 
maintained independently from NASA as public domain software.   What 
license, if any, it is released under is a mystery to me.

Using CLIPS is non-negotiable.

The system is being developed first for a uni subject, so it needs to meet 
those requirements, hence the necessity for CLIPS.  But rather than just 
develop some useless example, I decided to build something with a real-world 
use.  All my ideas for things I would use in later research were far too 
big, so I spoke to a friend who is a real-world medical educator.  This 
means that I also get experience of dealing with a real-world "expert", 
putting his ideas into a system, and putting the system into use.

The system will primarily be teaching GPs (doctors) how to search through 
online medical databases to find recent research relevant to the medical 
problem they are trying to treat.  During a consultation they need to be 
able to get an answer to these questions within a few seconds, so there is 
no time to search through pages of results or to try lots of searches.

> ....

> 
> Also, Python has a mechanize library (which is ported from the Perl 
> library that Jacinta suggested). Resources on Web scraping in Python here:

Thanks for all your suggestions; I will investigate them all.  At this 
stage I am wondering about the relative ease of installing whatever 
system I use.

That is, if I use Perl, does the user need to install Perl on their 
computer?  If I use Python or Ruby, does the user need to install that on 
their system?  If they do, how big is the install, and how easy is it for 
people who find using a search engine a very technical exercise?


> In addition to sending requests and getting pages, you might be 
> interested in techniques to extract info (the page above talks about 
> some of this). The last time I had to do extract data from Web pages, I 
> had success running pages through HTML Tidy and then getting data using 
> Python's minidom.

I am not sure I have been clear enough about what I need to do. 
(Probably because I posted my original question after several late 
nights of searching for a way into this problem.)

Basically I need to submit search terms to a search engine and retrieve 
the results of the search.  My preference is to retrieve the count and 
the links independently: i.e. if a search has matched 1024 articles, I 
want to refine the search down to about 7 articles before I retrieve any 
links to those articles.
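In Python, that count-first workflow could be sketched roughly like this. The library calls are standard (urllib.parse, re), but the search endpoint and the "Results 1-10 of N" markup are hypothetical placeholders; a real medical database would need its own URL format and its own parsing.

```python
import re
import urllib.parse
# urllib.request would do the actual fetch; here a sample result page
# stands in so the count-extraction logic can be shown offline.

def build_query_url(base, terms):
    """Encode search terms into a GET query string (hypothetical endpoint)."""
    return base + "?" + urllib.parse.urlencode({"q": " ".join(terms)})

def result_count(html):
    """Pull the hit count out of a result page.
    The 'of N articles' pattern is an assumption about the markup."""
    m = re.search(r"of\s+([\d,]+)\s+articles", html)
    return int(m.group(1).replace(",", "")) if m else 0

sample = "<p>Results 1-10 of 1,024 articles</p>"
count = result_count(sample)
if count > 7:
    # Too many hits: refine the search terms before fetching any links.
    print("refine search:", count, "matches")
```

The point of splitting `result_count` out from the link retrieval is exactly the workflow described above: check the match count first, and only fetch links once the query has been narrowed to a handful of articles.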


> Ruby has some tools, too, although I haven't explored these yet. Watir 
> automates Internet Explorer 

Cool, I am currently enamored of the idea of using Ruby.

If I can do what I want without going anywhere near a web-browser that 
is my preference.

sue




