[prog] Writing a program to automatically fetch webpages and
fill in forms.
Sue Stones
suzo at spin.net.au
Thu Mar 24 14:09:41 EST 2005
jenny wrote:
>
> What expert system is it you're using? Is it something open source or is
> it proprietary?
I am using CLIPS, which was developed by NASA in 1984 and is now
maintained independently from NASA as public domain software. What
license, if any, it is released under is a mystery to me.
Using CLIPS is non-negotiable.
The system is being developed first and foremost for a uni subject, so
it needs to meet those requirements, hence the necessity for CLIPS. But
rather than just develop some useless example, I decided to build
something with a real-world use. All my ideas for things that I would
use in later research were far too big, so I spoke to a friend who is a
real-world medical educator. This means I also get experience of dealing
with a real-world "expert", putting his ideas into a system, and putting
the system into use.
The system will primarily be teaching GPs (doctors) how to search
through online medical databases to find recent research that is
relevant to the medical problem they are trying to treat. During a
consultation they need to be able to get an answer to such questions
within a few seconds, so there is no time to search through pages of
results, or to try lots of searches.
> ....
>
> Also, Python has a mechanize library (which is ported from the Perl
> library that Jacinta suggested). Resources on Web scraping in Python here:
Thanks for all your suggestions; I will investigate them all. At this
stage I am wondering about the relative ease of installing whichever
system I use.
That is, if I use Perl, does the user need to install Perl on their
computer? Likewise, if I use Python or some other tool, does the user
need to install that on their system? If they do, how big is the
install, and how easy is it for people who find using a search engine a
very technical exercise?
> In addition to sending requests and getting pages, you might be
> interested in techniques to extract info (the page above talks about
> some of this). The last time I had to do extract data from Web pages, I
> had success running pages through HTML Tidy and then getting data using
> Python's minidom.
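The Tidy-then-minidom route suggested above can be sketched as follows. The markup here is a stand-in for what HTML Tidy would produce (minidom needs well-formed XHTML, which is exactly what Tidy is for):

```python
from xml.dom import minidom

# Stand-in for a page already cleaned up by HTML Tidy into
# well-formed XHTML; a real page would be run through Tidy first.
tidied = ('<html><body>'
          '<a href="/a">first</a><a href="/b">second</a>'
          '</body></html>')

# Parse the cleaned page and pull the href out of every <a> element.
doc = minidom.parseString(tidied)
hrefs = [a.getAttribute("href") for a in doc.getElementsByTagName("a")]
print(hrefs)  # ['/a', '/b']
```

The same pattern extends to any element the results page uses, provided Tidy can make the page well-formed first.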
I am not sure that I have been clear enough about what I need to do
(probably because I posted my original question after several late
nights of searching for a way into this problem).
Basically, I need to submit search terms to a search engine and retrieve
the results of the search. My preference is to retrieve the count and
the links independently; i.e. if a search has matched 1024 articles, I
want to refine the search down to about 7 articles before I retrieve any
links to those articles.
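The count-first approach above can be sketched in Python's standard library alone (so nothing extra to install beyond Python itself). The search URL, the query parameter, and the "N results" wording are all hypothetical and would need adapting to whatever the real database actually emits:

```python
import re
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a results page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def hit_count(html):
    # Assumes the page says something like "1024 results" somewhere;
    # the pattern must be adjusted to the engine's actual wording.
    m = re.search(r"(\d+)\s+results", html)
    return int(m.group(1)) if m else 0


def search(base_url, terms):
    # Hypothetical endpoint: build the query string and fetch the page.
    query = urllib.parse.urlencode({"q": " ".join(terms)})
    with urllib.request.urlopen(base_url + "?" + query) as page:
        return page.read().decode("utf-8", errors="replace")


# With the count checked first, the search can be refined before any
# links are followed (sample stands in for a fetched results page):
sample = '<p>1024 results</p><a href="/article/1">one</a>'
print(hit_count(sample))   # 1024 -- too many, so refine the terms
collector = LinkCollector()
collector.feed(sample)
print(collector.links)     # ['/article/1']
```

This keeps the two steps separate: `hit_count` decides whether the search is narrow enough, and only then is `LinkCollector` used to pull the article links.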
> Ruby has some tools, too, although I haven't explored these yet. Watir
> automates Internet Explorer
Cool, I am currently enamoured of the idea of using Ruby.
If I can do what I want without going anywhere near a web browser, that
is my preference.
sue
More information about the Programming
mailing list