[Techtalk] Scraping a webpage from a website.

cee cee at cybercom.net
Fri May 2 12:02:34 EST 2003


you can download an entire site, or just pages from a site
with 'wget'.

'wget --mirror http://www.yoursite.com'  will take your entire
site, including all sub-pages and images, and re-create the
directory structure on your local machine. it will follow all
visible links. it will not get javascript pop-ups or pages which
you only see after submitting forms.
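for moving a site somewhere else (as jennifer wants to), a couple of
extra flags help: --convert-links rewrites links so the copy browses
cleanly from the new location, and --page-requisites pulls in the
images/css each page needs. a sketch -- the url is a placeholder,
substitute the real site:

```shell
# mirror a site for relocation:
#   --mirror          recursive download, preserving timestamps
#   --convert-links   rewrite links so the local copy is browsable
#   --page-requisites also fetch images/css/js each page needs
#   --no-parent       don't climb above the starting directory
wget --mirror --convert-links --page-requisites --no-parent \
     http://www.yoursite.com
```

the result lands in a www.yoursite.com/ directory that you can tar up
and drop onto the new host.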

'wget http://www.yoursite.com/somepage.html' will take that
specific page alone -- without images or following links.

you can also have wget follow external links to outside pages.
the man page has more details.
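by default wget stays on the starting host. if the site keeps its
images on a separate host, --span-hosts plus a --domains whitelist
lets wget cross over without wandering off onto unrelated sites.
the hostnames here are made-up examples:

```shell
# follow links onto other hosts, but only the ones listed,
# so the crawl can't escape onto the rest of the web.
wget --mirror --span-hosts \
     --domains=www.yoursite.com,static.yoursite.com \
     http://www.yoursite.com
```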

i use this tool on an almost daily basis; i don't know what
i'd do without it. :)

/cee


On Friday 02 May 2003 11:49 am, Jennifer Davis wrote:
> Hi.  I was wondering if there was a simple way to script and download a
> website so it can be placed somewhere else.  Essentially I need to
> retrieve a site for a non-profit provider that is disbanding.  I would
> assume a shell or perl script could do it, but my scripting skills are not
> quite there yet.  Thanks...
>
> Jennifer S. Davis
> Computer Programming, 1st year
> Algonquin College
> davi0302 at algonquinc.on.ca
> _______________________________________________
> Techtalk mailing list
> Techtalk at linuxchix.org
> http://mailman.linuxchix.org/mailman/listinfo/techtalk

-- 
                       ...
                       | |
                       |'|            ._____
               ___    |  |            |.   |' .---"|
       _    .-'   '-. |  |     .--'|  ||   | _|    |
    .-'|  _.|  |    ||   '-__  |   |  |    ||      |
    |' | |.    |    ||       | |   |  |    ||      |
 ___|  '-'     '    ""       '-'   '-.'    '`      |____

           cee at cybercom.net | www.00ff00.com
