[Techtalk] Scraping a webpage from a website.
cee
cee at cybercom.net
Fri May 2 12:02:34 EST 2003
you can download an entire site, or just pages from a site
with 'wget'.
'wget --mirror http://www.yoursite.com' will take your entire
site, including all sub pages and images, and re-create the
directory structure on your local machine. it will follow all
visible links. it will not get javascript pop-ups or pages which
you only see after submitting forms.
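for archiving a site that's about to go away (as in the question below), a few extra flags make the mirror browsable offline. these are all standard wget options; the url is just a placeholder for the real site:

```shell
# mirror the site, rewrite links so they work when browsing the
# local copy, and also fetch the css/images each page needs:
wget --mirror \
     --convert-links \
     --page-requisites \
     --no-parent \
     http://www.yoursite.com/
```

--no-parent keeps wget from wandering above the directory you started in, which matters if the site you want is only one section of a larger server.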
'wget http://www.yoursite.com/somepage.html' will take that
specific page alone -- without images or following links.
you can also have wget follow external links to outside pages.
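by default wget stays on the host it started on. to follow external links you have to tell it to span hosts, and it's wise to limit which domains it may visit and how deep it goes, or you can end up mirroring half the web (the domain names here are placeholders):

```shell
# follow links onto other hosts, but only within the listed
# domains, and only two levels deep:
wget --recursive --level=2 \
     --span-hosts \
     --domains=yoursite.com,partner-site.org \
     http://www.yoursite.com/
```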
the man page has more details.
i use this tool on an almost daily basis; i don't know what
i'd do without it. :)
/cee
On Friday 02 May 2003 11:49 am, Jennifer Davis wrote:
> Hi. I was wondering if there was a simple way to script and download a
> website so it can be placed somewhere else. Essentially I need to
> retrieve a site for a non-profit provider that is disbanding. I would
> assume a shell or perl script could do it, but my scripting skills are not
> quite there yet. Thanks...
>
> Jennifer S. Davis
> Computer Programming, 1st year
> Algonquin College
> davi0302 at algonquinc.on.ca
> _______________________________________________
> Techtalk mailing list
> Techtalk at linuxchix.org
> http://mailman.linuxchix.org/mailman/listinfo/techtalk
--
...
| |
|'| ._____
___ | | |. |' .---"|
_ .-' '-. | | .--'| || | _| |
.-'| _.| | || '-__ | | | || |
|' | |. | || | | | | || |
___| '-' ' "" '-' '-.' '` |____
cee at cybercom.net | www.00ff00.com