By Mike Driedger
Fun fact: Before about 1800, “s” and “f” looked VERY similar in printed texts but they were clearly distinct letters (sort of like “1”, “l” and “I”, or “O” and “0” today). Here’s an example: In an 18th-century essay Joseph Warton wrote that “the favorite and peculiar pastime” of Ariel in Shakespeare’s The Tempest is expressed in the following song:
This passage is an example of the kind of text that has given us course-planning fits in the past week. Imagine what a blind student using assistive reading technology would hear when trying to listen to this passage! We don’t have to imagine what the OCR (optical character recognition) technology used by archive.org does with the passage, because this is what you will actually (no joke) find online at archive.org (WARNING: explicit language):
Where the bee fucks, there fuck I,
In a cowflip’s bell I lie;
There I couch when owls do cry.
On the bat’s back I do fly,
After fun-fet, merrily;
Merrily, merrily, (hall I live now.
Under the bloflTom that hangs on the bousih.
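To make the problem concrete, here is a minimal Python sketch. It rests on a few assumptions of ours: that a careful human transcription records the old-style "s" as the Unicode long s character (ſ, U+017F), and that the two sample lines are our own transcriptions of the song, not anything produced by archive.org. The function names are made up for illustration. The sketch shows that a faithful transcription is trivial to modernize, while OCR output that has already turned the long s into "f" cannot be repaired by blind search-and-replace.

```python
# Assumption: a careful transcription keeps the long s as U+017F.
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S


def modernize_long_s(text: str) -> str:
    """Replace every long s with a modern round s."""
    return text.replace(LONG_S, "s")


def naive_f_to_s(text: str) -> str:
    """Blindly turn every 'f' into 's' (a deliberately bad repair)."""
    return text.replace("f", "s")


# A faithful transcription modernizes cleanly.
faithful = "In a cow" + LONG_S + "lip's bell I lie;"
print(modernize_long_s(faithful))

# OCR output has lost the distinction, so a blanket repair
# also damages words that contain a real 'f'.
ocr_line = "On the bat's back I do fly,"
print(naive_f_to_s(ocr_line))
```

The second call turns "fly" into "sly", which is why we think a human transcriber is still needed for these texts.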
Many of the sources we plan to have our students read are page scans of books from the 17th and 18th centuries. We used to think: What would be better in a course on the early modern world than to get students to work with reasonably good reproductions of actual sources (like the example above) – not merely modern editions or transcriptions of them?
We had assumed that the online format for our course would open up new and wonderful opportunities. We were blissfully ignorant of the possible pitfalls. We’re learning just how badly OCR technology sucks when applied to early modern texts. As bad as the current situation is, there is reason to hope for real progress in the near future. But this hope will probably not solve our current problem.
Here’s our current problem. Danny and I want our course to be as accessible as possible for all students. We also have an obligation to make it so. As we’re learning, the Accessibility for Ontarians with Disabilities Act (AODA) sets out the legal framework for our obligations. The trouble is that we have been planning most of our course content on the assumption that it would be fairly unproblematic for us to use the fabulous collections of early modern textual sources that are available online. Of course, we know about the limitations on the use of online materials: Like all other university educators we are aware of copyright issues associated with the use of digital sources, especially for texts made available in proprietary databases; and because of help from our University Office for Students with disA!bilities we have been sensitive to student needs on campus for many years. What we hadn’t initially anticipated is that the massive amounts of good texts available free-from-copyright through archive.org and other sites will be of little to no use to blind students, just to name the most obviously excluded group! For a short while we were panicking because we feared we might have to redesign large portions of our reading list and assignments.
We have not given up on our original attitude that getting students to work with online sources that are as true to the real thing as possible should be our ideal. And we think we have a solution that will allow us to improve the course for all students. We now intend to add a new source-transcription assignment to the course. Here is a draft of the assignment’s learning outcomes. At the end of the unit (workshop and assignment), students will be able to:
- read early modern texts in their original typography at a comfortable speed;
- explain the limitations of OCR technology for digitizing online sources;
- transcribe passages from early modern texts into machine-readable digital formats using Scripto (scripto.org).
Admittedly, this assignment does not help fully blind students in the short term. We need to spend a little more time trying to anticipate how we will address this potential problem. Despite its limitations we think the benefits of our assignment plan are significant:
- together with the students we will build up a large repertoire of properly digitized early modern sources;
- we can share these newly digitized sources online through archive.org and other source-sharing sites for scholars;
- and in future iterations of the course we will have an expanded range of accessible sources.
In later assignments (i.e., assignments that follow the transcription project) students will use their properly transcribed and digitized sources in a text-analysis program (Voyant Tools). We’ll write more about this assignment in future posts.