Find data for text mining
Monday, January 20, 2020
Are you tracking a word’s semantic change across multiple periodicals over many decades? Or maybe you’re looking to perform sentiment analysis or measure changes in word frequency. Do you know how to find and access the data you need?
Yale University Library has added new tags to the Quicksearch catalog that make it easier to identify datasets for text and data mining projects. The format (XML, TIFF, etc.) and the quality of the optical character recognition (OCR) vary widely, so we recommend starting with a sample issue once you identify a dataset that might work.
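One rough way to gauge the OCR quality of a sample issue is to measure how many of its tokens look like ordinary words. The sketch below is only an illustrative heuristic, not a Library tool: it assumes the sample has already been converted to plain text, and the function name `ocr_quality_estimate` is hypothetical.

```python
import re

# Hypothetical helper: a crude OCR-quality heuristic, not an official measure.
# A token counts as "word-like" if, once surrounding punctuation is stripped,
# it contains only letters, optionally joined by apostrophes or hyphens.
WORD_LIKE = re.compile(r"^[^\W\d_]+(?:['’-][^\W\d_]+)*$")
PUNCTUATION = ".,;:!?\"'()[]“”‘’"


def ocr_quality_estimate(text: str) -> float:
    """Return the share of whitespace-separated tokens that look like words."""
    tokens = text.split()
    if not tokens:
        return 0.0
    good = sum(1 for t in tokens if WORD_LIKE.match(t.strip(PUNCTUATION)))
    return good / len(tokens)


if __name__ == "__main__":
    sample = "The quick brown fox. Th3 qu1ck br0wn f0x."
    print(ocr_quality_estimate(sample))
```

A low score suggests many garbled tokens, but the threshold that matters depends on the project; dates, numbers, and abbreviations also lower the score, so treat it as a first look rather than a verdict.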
To locate newspapers and magazines that the Library licenses for current Yale students, faculty, and staff with an active NetID, add ‘yuldsetmediated’ to your search box in Quicksearch. You can then filter by fields such as language, subject region, or subject era to refine your results. To ask a question or arrange access to the data, email Research Data, and a librarian will follow up.
To identify transcripts, recordings, and other linguistic data, try searching with the more general ‘yuldsettxt’.
To find all datasets, including text, geospatial, numeric, and image data, use ‘yuldset’.
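If you keep a list of searches for a project, the three tags can be recorded alongside your keywords. The sketch below simply composes the text you would type into the Quicksearch search box; it does not call any Quicksearch API, and the names `DATASET_TAGS` and `build_query` are hypothetical.

```python
# Tags quoted from this announcement; the descriptions paraphrase it.
DATASET_TAGS = {
    "yuldsetmediated": "licensed newspapers and magazines (access arranged with a librarian)",
    "yuldsettxt": "transcripts, recordings, and other linguistic data",
    "yuldset": "all datasets: text, geospatial, numeric, and image data",
}


def build_query(keywords: str, tag: str) -> str:
    """Return the search-box text: the keywords followed by a dataset tag."""
    if tag not in DATASET_TAGS:
        raise ValueError(f"unknown dataset tag: {tag!r}")
    # An empty keyword string yields a search on the tag alone.
    return f"{keywords.strip()} {tag}".strip()


if __name__ == "__main__":
    print(build_query("suffrage", "yuldsetmediated"))
```

Filters such as language, subject region, or subject era are then applied in the Quicksearch interface itself.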
For more information, visit the Text and Data Mining research guide.