NOTE: This is the website for the 2nd (2012) edition of this book. If you have the 1st (2007), Chinese or Italian editions, go here.
   
  Libraries
The code libraries used by this book are governed by the W3C Software Notice and License.
Library | Purpose
LIB_download_images.php | Binary-safe downloads, directory preparation, downloading all images for a specific web page
LIB_exclusion_list.php | An example of an exclusion list, used by a spider
LIB_http.php | PHP/CURL routines for downloading web pages and automating form submission
LIB_http_codes.php | An array that defines HTTP status codes
LIB_mail.php | Various routines for reading email from POP3 mail servers
LIB_mysql.php | A general-purpose MySQL interface
LIB_nntp.php (obsolete) | A general-purpose NNTP (newsgroup) interface
LIB_parse.php | A collection of parsing routines
LIB_pop3.php | A collection of routines that connect to a POP3 Mail Server to send email
LIB_resolve_addresses.php | A variety of routines that resolve addresses and determine domains and "page base" addresses
LIB_rss.php | A library used with the webbot aggregation project; contains RSS parsing routines
LIB_simple_spider.php | Various routines that harvest, exclude and archive links, and resolve domains
LIB_thumbnail.php | Defines a function that creates thumbnail images
Download Libraries
All of the book's libraries are contained in this zip file: WebbotsSpidersScreenScraper_Libraries_REV2_0.zip
You can get your copy of the library files by clicking here
   

  Example Scripts
The example scripts (used in the book) are covered by the W3C Software Notice and License.
NOTE: THESE SCRIPTS ARE FOR DEMONSTRATION PURPOSES ONLY! They are not suitable for any use other than demonstrating the concepts presented in Webbots, Spiders and Screen Scrapers. Do not use these scripts in a production environment where reliability is a priority.
Download Example Scripts
These scripts are individually downloadable by clicking on the script names.
Please note that small, easy-to-enter scripts are not available for download. This may change, depending on demand.
Script | Version | Chapter | Comment | Libraries Required
LISTING_3_1.php | 1.0 | 3 | (Very) simple 'Hello World' web page download using PHP's fopen() and fgets() functions | n/a
LISTING_3_2.php | 1.0 | 3 | (Very) simple web page download using PHP's file() function | n/a
LISTING_3_6.php | 1.0 | 3 | Web page download script using LIB_http | LIB_http
LISTING_4_2.php | 1.0 | 3 | LIB_parse demo: Using split_string() | LIB_parse, LIB_http
LISTING_4_4.php | 1.0 | 4 | LIB_parse demo: Using return_between() to parse the page title from a web page | LIB_parse
LISTING_4_6.php | 1.0 | 4 | LIB_parse demo: Using parse_array() to parse meta tags from www.fbi.gov | LIB_parse, LIB_http
LISTING_6_12.php | 1.0 | 6 | Form Analysis Script (to be run on a web server) | n/a
LISTING_7_12.php | 1.0 | 7 | Demo of image thumbnail creation script | LIB_http, LIB_thumbnail
price_monitoring_bot.php | 1.0 | 8 | Download and parse prices from sample store website | LIB_http, LIB_parse
image_capture_bot.php | 2.0 | 9 | Download images in a duplicate directory structure | LIB_download_images (which includes LIB_http, LIB_parse, LIB_resolve_addresses)
link_verification_bot.php | 2.0 | 10 | Validate links on a web page | LIB_http, LIB_parse.php, LIB_resolve_addresses, LIB_http_codes.php
search_ranking_bot.php | 2.0 | 11 | An example webbot that calculates search engine rankings | LIB_http, LIB_parse.php
aggregation_bot.php | 1.0 | 12 | An example of a simple aggregation webbot using RSS feeds | LIB_http, LIB_parse.php, LIB_rss.php
ftp_bot.php | 1.0 | 13 | A simple demonstration of using FTP. NOTICE: REQUIRES ACCESS TO FTP SERVERS & ADDITIONAL CONFIGURATION. Read Chapter 13 for additional information. | n/a
email_reading_bot.php | 1.0 | 14 | Download email from a POP3 mail server. NOTICE: REQUIRES ACCESS TO A POP3 MAIL SERVER & ADDITIONAL CONFIGURATION. Read Chapter 15 for additional information. | LIB_POP3, LIB_parse
website_to_function_bot.php | 1.0 | 16 | Example of how to turn a website into a PHP function | n/a
simple_spider.php | 1.0 | 17 | Simple spider project that downloads images from a website | LIB_http.php, LIB_parse.php, LIB_resolve_addresses.php, LIB_exclusion_list.php, LIB_simple_spider.php, LIB_download_images.php
LISTING_20_1.php | 1.0 | 21 | Webbot that auto-authenticates a website that uses BASIC authentication | n/a
LISTING_20_3.php | 1.0 | 21 | Webbot that auto-authenticates a website that uses cookie authentication | LIB_http, LIB_parse
LISTING_20_4.php | 1.0 | 21 | Webbot that auto-authenticates a website that uses query authentication | LIB_http, LIB_parse
LISTING_28_5.php | 1.0 | 28 | Example of detecting "meta tag" redirection | LIB_http, LIB_parse, LIB_resolve_addresses
LISTING_28_9.php | 1.0 | 28 | Fault tolerant parsing of form values | LIB_http, LIB_parse, LIB_resolve_addresses
LISTING_29_8.php | 1.0 | 29 | Example of downloading and parsing an XML file | LIB_http, LIB_parse
LISTING_29_12.php | 1.0 | 29 | Example of webbot using a light-weight data exchange interface | LIB_http
Please contact me if there's an error or another script from the book that you'd like to have.
Copyright 2024, Michael Schrenk