I use the Perl module WWW::Mechanize as my go-to solution for anything I need to automate over the web. My dad, a web-scraping pro, recommended it as a good starting point when I was just getting started, and I’ve used it for pretty much everything since. There are a few specialized situations where it doesn’t work well, but it does great on most of the sites I need to work with. I used it this past week to help my sister automate the process of signing up campers for her library system’s summer reading program, and thought it would make a good intro for anyone interested in getting started with Mechanize.
The goal of the project was to take a spreadsheet with camper information and fill out a registration form for each one. The form looks something like this:
<form method="post" action="splash.asp?id=1" onSubmit="return validateSignUp(this);">
<input type="hidden" name="form_sr" value="1">
<input type="hidden" name="form_postback" value="yes">
<table width="100%" cellpadding="5" cellspacing="2">
<tr class="bggrey"><td colspan="2" class="font12bold centered">Create a Teen Summer Reading account and get started! <span class="font12boldred">Fields in red are required.</span></td></tr>
<tr class="bggrey01">
<td width="30%" class="font12boldred" align="right">First Name:</td>
<td width="70%"><input type="text" name="form_fname" size="40" value="" onBlur="populateTheOtherInputField(this);" /></td>
</tr>
<tr class="bggrey02">
<td class="font12boldred" align="right">Last Name:</td>
<td><input type="text" name="form_lname" size="40" value="" onBlur="populateTheOtherInputField(this);" /></td>
</tr>
<tr class="bggrey01">
<td class="font12boldred" align="right">Phone Number:</td>
<td><input type="text" name="form_phone" size="40" value="" /></td>
</tr>
<tr class="bggrey02">
<td class="font12bold" align="right">Email:<br /><span class="font10">For easy password recovery.</span></td>
<td><input type="text" name="form_email" size="40" value="" /></td>
</tr>
<!-- A bunch of code related to various drop-down boxes-->
<!-- A captcha which doesn't appear if you are on the library network -->
<input type="submit" value="Create Account" />
Setting up a Mechanize browser object is easy, as shown below. I also use Spreadsheet::ParseExcel to read the input spreadsheet.
#!/usr/bin/perl -w
use WWW::Mechanize;
use Spreadsheet::ParseExcel;
use strict;
use warnings;
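my ($filename) = @ARGV;
die "Usage: $0 spreadsheet.xls\n" unless defined $filename;
# autocheck => 0 means failed requests don't die, so we check success() ourselves
my $mech = WWW::Mechanize->new(autocheck => 0, timeout => 5);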
# Currently this hard codes in the age group
my $base = "http://www.cmlibrary.org/programs/summer_reading/2014/splash.asp?id=1";
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse($filename ) or die $parser->error;
my $worksheet = $workbook->worksheet('Sheet1') or die "No worksheet found";
# Non-empty placeholder so the row loop runs at least once
my $username = " ";
To do the signup for each camper, we loop over the rows of the spreadsheet and pull out the values we need with “get_cell”. Then we load the signup page with Mechanize and select the form (the one I needed happened to be the third one on the page). We use “field” to fill in text fields and “select” for drop-down menus (there were a bunch; I’ve only shown one here). Lastly, we click the “Create Account” button and we’re all set:
# Start at row index 3 and stop at the first row without a username
for (my $row = 3; $username ne ""; $row++) {
    # get_cell returns undef for an empty cell, so treat that as the end of the data
    my $cell = $worksheet->get_cell($row, 6);
    $username = defined $cell ? $cell->value() : "";
    last if $username eq "";

    # Read the remaining columns, defaulting empty cells to ""
    my ($firstname, $lastname, $phonenumber, $password) =
        map { my $c = $worksheet->get_cell($row, $_); defined $c ? $c->value() : "" } (0, 1, 2, 7);

    print "Registering: $username\n";
    $mech->get($base);
    if ($mech->success()) {
        $mech->form_number(3);    # the signup form is the third form on the page
        # Fill fields in order of appearance
        $mech->field("form_fname", $firstname, 1);
        $mech->field("form_lname", $lastname, 1);
        $mech->field("form_phone", $phonenumber, 1);
        $mech->select("form_library", 99);    # Default library for now
        $mech->field("form_username", $username, 1);
        $mech->field("form_password", $password, 1);
        $mech->field("form_password2", $password, 1);
        $mech->field("form_memo", "This account was successfully automatically generated", 1);
        $mech->click_button(value => "Create Account");
    }
    else {
        warn "Could not load signup page for $username: ", $mech->status(), "\n";
    }
}
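Before pointing a script like this at a live site, it can help to do a dry run and look at exactly what would be posted. Mechanize hands its form handling off to HTML::Form, so a minimal sketch can use that module directly on a saved copy of the page. The trimmed-down HTML, the example.com base URL, and the sample name below are placeholders I made up for illustration, not the library's real page:

```perl
use strict;
use warnings;
use HTML::Form;

# A cut-down, hypothetical copy of the signup form
my $html = <<'HTML';
<form method="post" action="splash.asp?id=1">
<input type="hidden" name="form_sr" value="1">
<input type="text" name="form_fname" value="">
<input type="text" name="form_lname" value="">
<input type="submit" value="Create Account">
</form>
HTML

# parse() returns one HTML::Form object per <form>; we only need the first.
# The base URL (a placeholder here) is used to resolve the relative action.
my ($form) = HTML::Form->parse($html, "http://www.example.com/programs/");

# Fill in fields by name, as $mech->field() does
$form->value(form_fname => "Ada");
$form->value(form_lname => "Lovelace");

# click() builds the HTTP::Request a browser would send, without sending it
my $request = $form->click;
print $request->method, " ", $request->uri, "\n";
print $request->content, "\n";
```

Nothing goes over the network here, so it is safe to rerun while checking field names. If a name is misspelled, `value` dies with an error instead of silently posting an incomplete form.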
While I was filling out forms this time, I’m usually pulling data off a web site. This is actually easier than filling out forms: just load the page with “$mech->get($url)” and then access the page source with “my $page = $mech->content” or similar. Normally I go through the page line by line and use regexes to pull out whatever I’m looking for. (Just remember: you can easily match known HTML with a regex, but don’t try to parse it. =)
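As a rough sketch of that line-by-line approach, the snippet below uses a hard-coded string in place of what “$mech->content” would return, so it runs without a network connection. The table markup, class names, and sample rows are invented for illustration; a real page will need its own pattern:

```perl
use strict;
use warnings;

# Stand-in for: $mech->get($url); my $page = $mech->content;
my $page = <<'HTML';
<table>
<tr><td class="name">Ada Lovelace</td><td class="books">12</td></tr>
<tr><td class="name">Alan Turing</td><td class="books">7</td></tr>
</table>
HTML

my %books_read;
for my $line (split /\n/, $page) {
    # Match one known row layout; lines that don't fit are simply skipped
    if ($line =~ m{<td class="name">([^<]+)</td><td class="books">(\d+)</td>}) {
        $books_read{$1} = $2;
    }
}

for my $name (sort keys %books_read) {
    print "$name: $books_read{$name}\n";
}
```

This only works because the row layout is known in advance and each row sits on one line. If the markup changes or spans several lines, the pattern will quietly stop matching, so it is worth checking that you got the number of rows you expected.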