Whenever possible, the Yale University Library works with database vendors to include text and data mining (TDM) rights in license agreements. This means that for some databases (generally those with out-of-copyright materials), Yale researchers can access the raw text for data analysis. Vendors that currently permit text and data mining on specific collections include Adam Matthew, Gale, and ProQuest, among others.
To locate newspapers and magazines that the Library licenses for current Yale students, faculty, and staff with an active NetID, go to Quicksearch and enter ‘yuldsetmediated’ in the search box. You can then filter by fields such as language, subject region, or subject era to refine your results. To ask a question or arrange access to the data, email Research Data, and a librarian will follow up.
To identify transcripts, recordings, and other linguistic data, try searching in Quicksearch with the more general search term ‘yuldsettxt’.
To find all datasets—including text, geospatial, numeric, and image data—use ‘yuldset’.
For additional information on licensed data, please visit the Text and Data Mining LibGuide.
What about material that hasn’t been digitized yet? The Digital Humanities Lab can provide researchers who hold current DHLab awards with tools for creating digital corpora for text and data mining purposes (rather than for preservation or personal archives). For longer-duration or general-purpose scanning, researchers should use the machines on the lower level of Bass Library.