Find data for text mining
Monday, January 20, 2020
Are you tracking a word’s semantic change across multiple periodicals over many decades? Or maybe you’re looking to perform sentiment analysis or measure changes in word frequency. Do you know how to find and access the data you need?
Yale University Library has added new tags to the Quicksearch catalog that make it easier to identify datasets for text and data mining projects. The format (XML, TIFF, etc.) and the quality of the optical character recognition (OCR) vary widely, so we recommend starting with a sample issue once you identify a dataset that might work.
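One rough way to judge a sample issue is to measure how much of its OCR text looks like ordinary words. The sketch below is only an illustration, not a Library tool. It assumes you have already extracted the sample issue's text into a plain string, and the function name `wordlike_share` is hypothetical.

```python
import re

# Assumption: a token is "word-like" if it consists only of letters,
# optionally joined by internal apostrophes or hyphens (e.g. "don't", "well-known").
WORDLIKE = re.compile(r"^[^\W\d_]+(?:['’\-][^\W\d_]+)*$")

# Punctuation stripped from both ends of each token before testing it.
EDGE_PUNCTUATION = ".,;:!?\"'()[]“”‘’"


def wordlike_share(text: str) -> float:
    """Return the share of whitespace-separated tokens that look like words.

    This is a crude heuristic: dates, numbers, and abbreviations also count
    as non-word-like, so treat a low score as a prompt to inspect the sample
    by eye, not as a measurement of OCR accuracy.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    wordlike = sum(
        1 for token in tokens if WORDLIKE.match(token.strip(EDGE_PUNCTUATION))
    )
    return wordlike / len(tokens)


if __name__ == "__main__":
    clean = "The committee met on Monday."
    noisy = "Th3 c0mmittee m~t on M0nday."
    print(wordlike_share(clean), wordlike_share(noisy))
```

Comparing the score for a sample issue against a passage you know is clean can help you decide whether a dataset is worth requesting in full.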
To locate newspapers and magazines that the Library licenses for current Yale students, faculty, and staff with an active NetID, add ‘yuldsetmediated’ to your search box in Quicksearch. You can then filter by fields such as language, subject region, or subject era to refine your results. To ask a question or arrange access to the data, email Research Data, and a librarian will follow up.
To identify transcripts, recordings, and other linguistic data, try searching with the more general ‘yuldsettxt’.
To find all datasets, including text, geospatial, numeric, and image data, use ‘yuldset’.
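The three tags above can be summarized in a small lookup. The following is a minimal sketch that only builds the text you would type into the Quicksearch search box. It does not call any Quicksearch API, and the scope names (`mediated`, `text`, `all`) and the function `build_query` are hypothetical labels chosen for illustration.

```python
# Tags as given in this announcement; the dictionary keys are hypothetical labels.
TAGS = {
    "mediated": "yuldsetmediated",  # licensed newspapers and magazines
    "text": "yuldsettxt",           # transcripts, recordings, other linguistic data
    "all": "yuldset",               # all datasets: text, geospatial, numeric, image
}


def build_query(keywords: str, scope: str = "all") -> str:
    """Return search-box text: the user's keywords followed by a dataset tag."""
    if scope not in TAGS:
        raise ValueError(f"unknown scope: {scope!r}")
    return f"{keywords.strip()} {TAGS[scope]}".strip()


if __name__ == "__main__":
    print(build_query("newspapers", "mediated"))
```

The narrower tags are useful when you already know the kind of material you need; ‘yuldset’ is the broadest starting point.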
For more information, visit the Text and Data Mining research guide.