Corpus creation is the process of building a dataset. For a digital humanities project, this often entails either finding a collection of texts or images online or digitizing physical holdings.
You have a digitized corpus. Now what? The answer depends on what your data looks like and what your research questions are. Here are a few possibilities:
Digital Humanities Lab staff can advise on strategies for building and cleaning your corpus during our weekly Office Hours. We also regularly offer workshops that are relevant to corpus creation. Visit our Workshops page to learn about what’s coming up and our GitHub page for tutorials from past sessions. For information on the use of our scanners for corpus creation, as well as information on databases that are already available for text and data mining, please visit our Data Resources page.
If you're new to digital humanities and are interested in starting a project, stop by the Digital Humanities Lab in Sterling Memorial Library, room 316, during our Tuesday or Wednesday Office Hours.
We also highly recommend looking at existing digital humanities projects to get a sense of what's possible. In addition to projects at Yale, check out projects at other digital humanities centers, including:
In addition to on-campus support, there are off-campus and online resources that you might try. The following programs all offer opportunities for researchers to learn different digital humanities methods and theoretical approaches: