Simple web form automation with Mechanize

I use the Perl module WWW::Mechanize as my go-to solution for anything I need to automate over the web. My dad, a web-scraping pro, recommended it as a good starting point when I was just getting started, and I’ve used it for pretty much everything since. There are a few specialized situations where it doesn’t work well (pages that depend heavily on JavaScript, for example, since Mechanize doesn’t run it), but for most of the sites I need to work with it does great. I used it this past week to help my sister automate the process of signing up campers for her library system’s summer reading program, and thought it would make a good intro for anyone interested in getting started with Mechanize.

The goal of the project was to take a spreadsheet with camper information and fill out a registration form for each one. The form looks something like this:

<form method="post" action="splash.asp?id=1" onSubmit="return validateSignUp(this);">
	<input type="hidden" name="form_sr" value="1">
	<input type="hidden" name="form_postback" value="yes">
	<table width="100%" cellpadding="5" cellspacing="2">
		<tr class="bggrey"><td colspan="2" class="font12bold centered">Create a Teen Summer Reading account and get started! <span class="font12boldred">Fields in red are required.</span></td></tr>
	
		<tr class="bggrey01">
		<td width="30%" class="font12boldred" align="right">First Name:</td>
		<td width="70%"><input type="text" name="form_fname" size="40" value="" onBlur="populateTheOtherInputField(this);" /></td>
		</tr>

		<tr class="bggrey02">
		<td class="font12boldred" align="right">Last Name:</td>
		<td><input type="text" name="form_lname" size="40" value="" onBlur="populateTheOtherInputField(this);" /></td>
		</tr>
		
		<tr class="bggrey01">
		<td class="font12boldred" align="right">Phone Number:</td>
		<td><input type="text" name="form_phone" size="40" value="" /></td>
		</tr>
		
		<tr class="bggrey02">
		<td class="font12bold" align="right">Email:<br /><span class="font10">For easy password recovery.</span></td>
		<td><input type="text" name="form_email" size="40" value="" /></td>
		</tr>
		
		<!-- A bunch of code related to various drop-down boxes-->

		<!-- A captcha which doesn't appear if you are on the library network -->
		<input type="submit" value="Create Account" />

Setting up a Mechanize browser object is easy, as shown below. I also use Spreadsheet::ParseExcel to read the input spreadsheet.

#!/usr/bin/perl
use strict;
use warnings;

use WWW::Mechanize;
use Spreadsheet::ParseExcel;

# The spreadsheet to read is the only command-line argument
my ($filename) = @ARGV;
die "Usage: $0 spreadsheet.xls\n" unless defined $filename;

# autocheck => 0: failed requests don't die, so we check success() ourselves
my $mech = WWW::Mechanize->new(autocheck => 0, timeout => 5);

# Currently this hard codes in the age group
my $base = "http://www.cmlibrary.org/programs/summer_reading/2014/splash.asp?id=1";
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse($filename) or die $parser->error;
my $worksheet = $workbook->worksheet('Sheet1') or die "No worksheet found";
my $username = " ";   # non-empty so the loop below runs at least once

To sign up each camper, we loop over the rows of the spreadsheet and pull out the values we need with “get_cell”. Then we load the signup page with Mechanize and select the form (the one I needed happened to be the 3rd one on the page). We use “field” to fill in text inputs and “select” for drop-down menus (there were a bunch; I’ve only shown one here). Lastly, we click the “Create Account” button and we’re all set:

# Start at row index 3 (rows are 0-based); the first row without a username ends the list
for (my $row = 3; $username ne ""; $row++) {
	# Return a cell's value, or "" if the cell is empty or missing
	my $value = sub {
		my $cell = $worksheet->get_cell($row, $_[0]);
		return defined $cell ? $cell->value() : "";
	};
	$username = $value->(6);
	my $password    = $value->(7);
	my $firstname   = $value->(0);
	my $lastname    = $value->(1);
	my $phonenumber = $value->(2);
	last if $username eq "";

	print "Registering: $username\n";
	$mech->get($base);
	unless ($mech->success()) {
		warn "Couldn't load signup page for $username: ", $mech->status(), "\n";
		next;
	}

	# The registration form is the 3rd form on the page
	$mech->form_number(3) or die "Registration form not found";

	# Fill fields in order of appearance
	$mech->field("form_fname", $firstname);
	$mech->field("form_lname", $lastname);
	$mech->field("form_phone", $phonenumber);
	$mech->select("form_library", 99); # Default library for now
	$mech->field("form_username", $username);
	$mech->field("form_password", $password);
	$mech->field("form_password2", $password);
	$mech->field("form_memo", "This account was successfully automatically generated");
	$mech->click_button(value => "Create Account");
	warn "Signup request failed for $username\n" unless $mech->success();
}
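
Picking the form by its position breaks as soon as the page gains or loses a form. As a minimal sketch of an alternative, Mechanize’s “form_with_fields” selects the form by the field names it contains. The page below is a made-up, cut-down stand-in for the real signup page (the search form and the names Ada and Lovelace are invented for illustration). It is loaded from a temporary file so the example runs without touching the library’s site, and the form is inspected rather than submitted:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Hypothetical stand-in for the signup page: a search form, then the registration form
my $html = <<'HTML';
<html><body>
<form method="get" action="search.asp">
  <input type="text" name="q" value="">
</form>
<form method="post" action="splash.asp?id=1">
  <input type="text" name="form_fname" value="">
  <input type="text" name="form_lname" value="">
  <input type="submit" value="Create Account">
</form>
</body></html>
HTML

# Save the page to a temporary .html file so Mechanize can load it via a file:// URL
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} $html;
close $fh or die "Can't write $path: $!";

my $mech = WWW::Mechanize->new(autocheck => 1);
$mech->get(URI::file->new_abs($path));

# Select the form by the fields it must contain instead of by its position
my $form = $mech->form_with_fields('form_fname', 'form_lname')
	or die "Registration form not found";

# Fill in several fields of the selected form in one call
$mech->set_fields(form_fname => 'Ada', form_lname => 'Lovelace');

# Inspect the filled-in form instead of submitting it
print "First name: ", $form->value('form_fname'), "\n";
print "Last name: ", $form->value('form_lname'), "\n";
```

In the real script this would replace the “form_number(3)” call; the “field”, “select” and “click_button” calls stay the same.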

While I was filling out forms this time, I’m usually pulling data off a web site. This is actually easier than filling out forms: just load the page with “$mech->get($url)” and then access the page source with “my $page = $mech->content” or similar. Normally I read the page in line by line and use regex to pull out whatever I’m looking for. Just remember, you can match known HTML easily with regex, but don’t try to parse it =)
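
Here is a minimal sketch of that scraping pattern. The HTML table, its class names, and the book titles are invented sample data, saved to a temporary file and loaded through a file:// URL so the script runs offline. Against a real site you would pass its address to “get” instead:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Invented sample page; a real page would come straight from the web site
my $html = <<'HTML';
<html><body>
<table>
<tr><td class="title">Hatchet</td><td class="due">2014-07-01</td></tr>
<tr><td class="title">Holes</td><td class="due">2014-07-15</td></tr>
</table>
</body></html>
HTML

# Write the sample to a temporary .html file and build a file:// URL for it
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} $html;
close $fh or die "Can't write $path: $!";
my $url = URI::file->new_abs($path);

my $mech = WWW::Mechanize->new(autocheck => 0, timeout => 5);
$mech->get($url);
die "Couldn't load $url\n" unless $mech->success();

# Grab the page source and go through it line by line
my $page = $mech->content;
my @titles;
for my $line (split /\n/, $page) {
	# Match markup we already know the shape of; this is not a general HTML parser
	if ($line =~ m{<td class="title">([^<]+)</td>}) {
		push @titles, $1;
	}
}

print "$_\n" for @titles;
```

The regex only works because each row sits on one line in a known layout, which is the “match, don’t parse” caveat in practice.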