Text mining Turkish newspapers
Primary investigator
Mustafa Yavas, Graduate student, Sociology
Job description
This position involves web scraping op-eds published between April 2012 and September 2017 from 15-20 Turkish newspapers. The aim is to create a database of op-eds with five fields: author name, date, article URL, op-ed title, and op-ed text. The person hired will be expected to provide documentation to accompany the web-scraping code.
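As a rough illustration of the five-field database described above, the sketch below defines one record and writes it out as CSV. The class name `OpEd`, the field names, the helper `write_opeds`, and the sample values are all assumptions made for this example; they are not taken from the project's code.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class OpEd:
    """One scraped op-ed. Field names are illustrative, not the project's."""
    author: str
    date: str   # assumed to be stored as an ISO 8601 string, e.g. "2015-03-09"
    url: str
    title: str
    text: str


def write_opeds(opeds, handle):
    """Write op-eds as CSV rows, preceded by a header row, to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(OpEd)])
    writer.writeheader()
    for oped in opeds:
        writer.writerow(asdict(oped))


# Placeholder record; none of these values come from a real newspaper.
sample = OpEd(
    author="Ayşe Yılmaz",
    date="2015-03-09",
    url="https://example.com/yazarlar/ornek-yazi",
    title="Örnek başlık",
    text="Birinci paragraf.\nİkinci paragraf.",
)

buffer = io.StringIO()
write_opeds([sample], buffer)
csv_text = buffer.getvalue()
```

To write to disk instead of an in-memory buffer, the handle could be opened with `open(path, "w", encoding="utf-8", newline="")`, so that Turkish characters and multi-line op-ed text are stored intact.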
Progress-to-date
A Python script has already been used to scrape one newspaper successfully. It will serve as a benchmark script that must be adjusted for each newspaper’s website, particularly the HTML parsing portion of the task. Each newspaper’s website will be scraped separately.
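One possible way to keep the per-newspaper adjustments confined to the HTML parsing step is to store each site's parsing rules as data. The sketch below is not the existing benchmark script; it is a minimal example using only Python's standard library. The site key `example-gazete`, the tag and class names in `SITE_RULES`, and the names `FieldExtractor` and `parse_article` are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical per-newspaper rules: field -> (tag name, class attribute value).
SITE_RULES = {
    "example-gazete": {
        "title": ("h1", "article-title"),
        "author": ("span", "author-name"),
        "text": ("div", "article-body"),
    },
}


class FieldExtractor(HTMLParser):
    """Collect the text inside the first element matching each (tag, class) rule."""

    def __init__(self, rules):
        super().__init__(convert_charrefs=True)
        self.rules = rules
        self.found = {}      # field -> list of text chunks
        self.active = None   # field currently being captured, if any
        self.depth = 0       # open tags with the matched tag name, including the match

    def handle_starttag(self, tag, attrs):
        if self.active is not None:
            # Track nested tags of the same name so the right end tag closes the capture.
            if tag == self.rules[self.active][0]:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for field, (want_tag, want_class) in self.rules.items():
            if field not in self.found and tag == want_tag and want_class in classes:
                self.active, self.depth = field, 1
                self.found[field] = []
                return

    def handle_endtag(self, tag):
        if self.active is not None and tag == self.rules[self.active][0]:
            self.depth -= 1
            if self.depth == 0:
                self.active = None

    def handle_data(self, data):
        if self.active is not None:
            self.found[self.active].append(data)


def parse_article(html, site):
    """Return a dict of extracted fields for one article page of the given site."""
    extractor = FieldExtractor(SITE_RULES[site])
    extractor.feed(html)
    extractor.close()
    # Join chunks with spaces and collapse runs of whitespace; this can split a
    # word that is interrupted by inline markup, which a real script may need to handle.
    return {f: " ".join(" ".join(chunks).split()) for f, chunks in extractor.found.items()}


# Made-up page used only to exercise the sketch.
sample_html = """
<html><body>
  <h1 class="article-title">Örnek başlık</h1>
  <span class="author-name">Ayşe Yılmaz</span>
  <div class="article-body"><p>Birinci paragraf.</p>
    <div class="quote">Alıntı.</div>
    <p>İkinci paragraf.</p></div>
</body></html>
"""
record = parse_article(sample_html, "example-gazete")
```

Under this design, supporting another newspaper would mean adding one entry to `SITE_RULES` rather than editing the parsing logic. Sites with less regular markup would likely need custom handling or a fuller HTML parsing library.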
Here is an example of the URLs to be scraped:
Qualifications
Applicants should ideally be familiar with:
- Python (R is also fine, as long as the applicant has web scraping experience in R)
- HTML (HTML parsing experience is a plus)
- Data formats like .csv
- Text encodings (ASCII, UTF-8, etc.)
- JavaScript is a plus, but not mandatory
Job details
Closing date: June 15, 2018
Pay rate: $20/hour
Contact: Mustafa Yavas
This project is funded with a Digital Humanities Lab Seed Grant.