Our school moved to PeopleSoft for... I’m not going there... but that’s where everyone’s timetables are now. I thought maybe this big fancy company would have an API to let me access the data, but no: it’s basically impossible to access the API directly.
So I was left with screen scraping, which I always wanted to try, why not. Go to the page I want to examine, open up Firebug, and drill down to the table elements I’m interested in: body>div>iframe>html>body>form>div>table>tbody>tr>td>div>table>tbody>tr>td>div>table>tbody>tr>td>div>table>tbody>tr>td>div>table>tbody>tr>td>div…
Er, wtf? It looked like Firebug had a bug and was stuck in an infinite loop. Surely they don’t have that many tables inside each other? Then I discovered the “Click an element” button and found that there really are lots and lots of tables inside tables on this simple page:
This is with the text at its minimum size, you can see by the scrollbars what I’m talking about:
But after a while I managed to figure it out. I had to learn some XPath to find the cells I was interested in based on their IDs, but I couldn’t use XPath for everything – I tried but it ate all my RAM and was still working through the swap partition when I killed it in the morning.
Here’s the script in case you’re in the same boat. It prints the timetable data to the console. For myself, I intend to make some JSON out of it for import into Everyone’s Timetable.
// Firebug script to scrape timetable data from a PeopleSoft-backed website.
// Run it when you're on the page that shows the timetable. You get to that page
// like so:
// Faculty Center
// Click the Search tab
// Expand Additional Search Criteria
// Set "Instructor Last Name" to the one you're looking for
// Start Firebug, go to Console, paste in this script and run it
//
// Author: Andrew Smith http://littlesvr.ca
var frameDocument = document.getElementById('ptifrmtgtframe').contentWindow.document;

// DERIVED_CLSRCH_DESCR200$0, $1, etc. have the course title
var courseTitles = frameDocument.evaluate(
    "//div[contains(@id,'DERIVED_CLSRCH_DESCR200')]",
    frameDocument.documentElement, null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);

// For each course
for (var i = 0; i < courseTitles.snapshotLength; i++) {
    var courseTitle = courseTitles.snapshotItem(i);
    console.log(courseTitle.textContent);

    // Find the next tr which has the timetable data for this course
    var timetableTableParentRow = courseTitle
        .parentNode
        .parentNode
        .parentNode
        .parentNode
        .parentNode
        .parentNode
        .parentNode
        .parentNode
        .parentNode
        .parentNode
        .parentNode
        .nextSibling
        .nextSibling;

    // There's some fucked up empty row after the first course title only
    if (i == 0) {
        timetableTableParentRow = timetableTableParentRow
            .nextSibling
            .nextSibling;
    }

    // Now go down to the table in this tr, it's the only thing that has
    // an id so I can use xpath to find its children (timetable rows).
    var timetableTableId = timetableTableParentRow
        .firstChild
        .nextSibling
        .nextSibling
        .nextSibling
        .firstChild
        .nextSibling
        .id;

    // MTG_DAYTIME$0, $1, etc. have the day and time range in this format:
    // Mo 1:30PM - 3:15PM
    var times = frameDocument.evaluate(
        "//div[@id='" + timetableTableId + "']//div[contains(@id,'MTG_DAYTIME')]",
        frameDocument.documentElement, null,
        XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
    var timesArray = [];
    for (var j = 0; j < times.snapshotLength; j++) {
        timesArray[j] = times.snapshotItem(j).textContent;
    }

    // MTG_ROOM$0, $1, etc. have the room number in this format:
    // S@Y SEQ Bldg S3028
    var rooms = frameDocument.evaluate(
        "//div[@id='" + timetableTableId + "']//div[contains(@id,'MTG_ROOM')]",
        frameDocument.documentElement, null,
        XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
    var roomsArray = [];
    for (var j = 0; j < rooms.snapshotLength; j++) {
        roomsArray[j] = rooms.snapshotItem(j).textContent;
    }

    // MTG_INSTR$0, $1, etc. have the instructor names but I think I'll
    // ignore them. For shared courses it won't hurt too much I hope.

    // Dump all the timetable data into the console, will do something with it later.
    // The space keeps the time and the room from running together.
    for (var j = 0; j < times.snapshotLength; j++) {
        console.log(timesArray[j] + ' ' + roomsArray[j]);
    }
}
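
As for the JSON part, here is a rough sketch of how the strings the script prints might be turned into something importable. It only assumes the two formats shown in the comments above ("Mo 1:30PM - 3:15PM" and "S@Y SEQ Bldg S3028"). The function names parseDayTime and buildCourseEntry, the field names, and the sample course title are all made up for illustration; whatever Everyone's Timetable actually expects may well look different.

```javascript
// Hypothetical helper: split "Mo 1:30PM - 3:15PM" into day, start and end.
// Returns null for any string that is not in that format.
function parseDayTime(text) {
  var pattern = /^\s*([A-Za-z]+)\s+(\d{1,2}:\d{2}[AP]M)\s*-\s*(\d{1,2}:\d{2}[AP]M)\s*$/;
  var match = pattern.exec(text);
  if (!match) {
    return null;
  }
  return { days: match[1], start: match[2], end: match[3] };
}

// Hypothetical helper: pair up the times and rooms collected for one course.
function buildCourseEntry(title, timesArray, roomsArray) {
  var meetings = [];
  for (var j = 0; j < timesArray.length; j++) {
    meetings.push({
      raw: timesArray[j],                   // keep the original string around
      parsed: parseDayTime(timesArray[j]),  // null if the format is unexpected
      room: roomsArray[j] !== undefined ? roomsArray[j].trim() : null
    });
  }
  return { title: title.trim(), meetings: meetings };
}

// Made-up sample data in the formats the scraper prints.
var entry = buildCourseEntry('Some course title',
                             ['Mo 1:30PM - 3:15PM'],
                             ['S@Y SEQ Bldg S3028']);
console.log(JSON.stringify([entry], null, 2));
```

In the scraper itself, the call to buildCourseEntry would go where the final console.log loop is, with courseTitle.textContent, timesArray and roomsArray as the arguments.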

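The long runs of .parentNode and .nextSibling in the script are easy to miscount, so one option is a tiny helper that hops along the same property a given number of times. This is only a sketch: follow is a made-up name, and the plain objects below stand in for DOM nodes so it can be tried outside the browser.

```javascript
// Hypothetical helper: follow the same property (for example 'parentNode'
// or 'nextSibling') a fixed number of times. Returns null if the chain
// runs out before all the steps are taken.
function follow(node, property, steps) {
  for (var k = 0; k < steps && node; k++) {
    node = node[property];
  }
  return node || null;
}

// Plain objects standing in for a three-level DOM tree.
var root = { name: 'root', parentNode: null };
var middle = { name: 'middle', parentNode: root };
var leaf = { name: 'leaf', parentNode: middle };

console.log(follow(leaf, 'parentNode', 2).name);
console.log(follow(leaf, 'parentNode', 5));
```

With that in place, the eleven parentNode hops followed by two nextSibling hops in the script could be written as follow(follow(courseTitle, 'parentNode', 11), 'nextSibling', 2). The hop counts still depend entirely on the page layout, so they will break if the markup changes.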
