Session Proposal: “Such a Character” | THATCamp Digital Frontiers 2016

This goes in the Talk about Teaching category. I’m interested in discussing and seeking suggestions on a behind-the-scenes lesson from my corpus linguistics class, in which students work on an interactive OCR task. The assignment has evolved as I’ve worked on ways for them to understand why OCR works and doesn’t work, and to better know the tools they use for digitizing texts. In our departmental computer lab we use ReadIRIS in the “learning” mode, which is meant to help the OCR learn the characteristics of your particular text, but it can also be deployed to help humans see how the computer makes its recognition choices. I like to provide a particularly messy scanned text for them to work on, in conjunction with an easier text page of their choice. The intended outcome is for students to have hands-on time OCRing, and then to produce a reflective write-up comparing their OCR experiences and giving their hypotheses about what helps digitization work best. The goal is for them not just to learn to use a piece of software, but to speculate on how different tools give the output that they do. How do your colleagues or students play with OCR? What’s worked well or poorly as you’ve all learned to capture texts? Bring along your tales and teaching tips!