Data Mining – THATCamp Digital Frontiers 2016
http://digitalfrontiers2016.thatcamp.org
Houston, TX THATCamp Co-sponsored by TXDHC & Digital Frontiers

Towards a Humanities and Data Science Syllabus
http://digitalfrontiers2016.thatcamp.org/2016/09/23/towards-a-humanities-and-data-science-syllabus/
Fri, 23 Sep 2016 14:19:59 +0000

Over the past five years, “data science” has become a major force, as companies strive to gain insight into customer behavior, researchers look for patterns in large collections of data, and educational institutions aim to train the next generation of data scientists. (Rice has recently launched its own data science initiative.) Humanists have rich data to analyze (such as collections of texts, images, media objects, and cultural information) and are developing significant data-intensive research projects, but they are also raising important questions about ethics, the handling of absence and ambiguity, and the risk of reductionism.

So what are we to make of data science in the (digital) humanities, and what can humanists and cultural heritage professionals contribute? What do humanities scholars and cultural heritage professionals need to know about data science? What would a humanities data science course look like? In this Talk/Make session, I’d like to collectively sketch out a humanities data science syllabus in order to articulate key questions and themes and begin imagining a potential course. Potential models include Lauren Klein’s “LMC 3206: Studies in Communication and Culture: Data” and Miriam Posner’s “DH 101.”

Session Proposal: “Such a Character”
http://digitalfrontiers2016.thatcamp.org/2016/09/23/session-proposal-such-a-character/
Fri, 23 Sep 2016 00:12:06 +0000

This goes in the Talk about Teaching category. I’m interested in discussing and seeking suggestions on a behind-the-scenes lesson from my corpus linguistics class, which involves students working on an interactive OCR task. The assignment has evolved as I’ve worked on ways for them to understand why OCR does and doesn’t work, and how to better know the tools they use for digitizing texts. In our departmental computer lab we use ReadIRIS in the “learning” mode, which is meant to help the OCR learn the characteristics of your particular text, but it can also be deployed to help humans see how the computer makes its recognition choices. I like to provide a particularly messy scanned text for them to work on, in conjunction with an easier text page of their choice. The intended outcome is for students to have hands-on time with OCR, and then produce a reflective write-up comparing their OCR experiences and giving their hypotheses about what helps digitization work best. The goal is for them not just to learn to use a piece of software, but to speculate on how different tools give the output that they do. How do your colleagues or students play with OCR? What’s worked well or poorly as you’ve all learned to capture texts? Bring along your tales and teaching tips!
