Yale DHLab - Page-Level Metadata from Digital Libraries

EVENTS

Mar

Workshop

Page-Level Metadata from Digital Libraries

Tuesday, March 27, 2018

4:00 pm

Bass Library, L01

This workshop will show participants how to use the HathiTrust Research Center (HTRC) Feature Reader to conduct semantic analysis.

Tags:

Text Analysis, Visual Analysis

The new frontier of metadata is at the level of the page, not the volume. Digital libraries like HathiTrust now provide counts of word tokens, sentences, and lines for each page. These are just some of the available “features” that can be used for text mining tasks. Importantly, language statistics like these are non-consumptive, which means they can be provided for in-copyright works.
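To make the idea of page-level features concrete, here is a minimal sketch in plain Python that computes token, line, and sentence counts for each page of a toy volume. It is an illustration only: it does not use the HTRC Feature Reader, and the names `PageFeatures` and `extract_page_features`, the regex-based tokenizer, and the sample pages are all assumptions introduced for this example. HathiTrust's own tokenization and sentence detection are likely more sophisticated.

```python
import re
from dataclasses import dataclass


@dataclass
class PageFeatures:
    """Counts for a single page (hypothetical structure for illustration)."""
    seq: int             # 1-based page sequence number
    token_count: int     # number of word tokens
    line_count: int      # number of non-empty lines
    sentence_count: int  # number of sentence-like spans


def extract_page_features(pages):
    """Return per-page counts for a list of page texts.

    Tokens are runs of word characters; sentences are split naively on
    '.', '!' and '?'. Both rules are simplifying assumptions.
    """
    features = []
    for seq, text in enumerate(pages, start=1):
        tokens = re.findall(r"\w+", text)
        lines = [ln for ln in text.splitlines() if ln.strip()]
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        features.append(
            PageFeatures(seq, len(tokens), len(lines), len(sentences))
        )
    return features


# Toy "volume": three pages, the second one blank.
pages = [
    "Call me Ishmael. Some years ago, never mind how long.\nI went to sea.",
    "",
    "It was a dark night.\nThe wind rose!",
]

features = extract_page_features(pages)
for page in features:
    print(page)
```

Only the counts leave the function, never the running text, which is the sense in which such statistics are non-consumptive: the original page cannot be read back from them.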

This demo will show how to use the HathiTrust Research Center (HTRC) Feature Reader for basic semantic analysis. Stephen Krewson will discuss ways to access non-linguistic information such as “image on page,” one of a few experimental features that Google has been offering to HathiTrust. As time and interest dictate, Stephen can give a high-level overview of computer vision and the page segmentation techniques that make this feature possible.

By the end of the demo, we will have produced a Jupyter Notebook, to be made available on GitHub. Participants will be able to use it with Python and a few APIs to richly characterize the distribution of linguistic and visual content in a volume of their choosing.

If you would like to follow along during the hands-on portion of the workshop, please bring a laptop over which you have administrative control.

Bio

Stephen Krewson is a Yale graduate student in English and Computer Science.

RELATED EVENTS

Miriam Posner on Digital Humanities in American Studies

May 04 2015

The Digital Humanities Lab and the Digital Humanities Working Group hosted a talk by Miriam Posner, Program Coordinator of Digital Humanities at UCLA, on May 4 in the Hall of...

Statistical Analysis at the Birth of Close Reading

Apr 10 2015

The Digital Humanities Lab partnered with the Digital Humanities Working Group to bring Yohei Igarashi to campus for a talk titled “Statistical Analysis at the Birth of Close Reading.” The...

Text Analysis Workshop with Matthew Jockers

Dec 05 2014

The Digital Humanities Lab sponsored a text analysis workshop with Matthew Jockers, Associate Professor of English at the University of Nebraska-Lincoln. Jockers based the workshop on the first few chapters...
