In this small project, I provide R code to scrape newspaper articles from the web archive. As an example, the script runs on faz.net news articles from 2018. It is written in easy-to-read R code and outputs a tibble containing all news articles and the accompanying metadata. The script does not transfer easily to other newspapers, since the HTML tags used to contain the articles differ heavily between newspapers.
You can find my code here
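As a rough illustration of the general approach (not the linked script itself), the following minimal sketch parses a small inline HTML snippet with rvest and collects one row per article in a tibble. The markup and the CSS selectors (`article`, `h2.headline`, `time`, `p.teaser`) are hypothetical placeholders and not the actual faz.net structure; these selectors are exactly what would have to be adapted for each newspaper.

```r
library(rvest)   # read_html(), html_elements(), html_element(), html_text2(), html_attr()
library(tibble)  # tibble()

# Hypothetical archived page with made-up content; real newspaper markup differs.
page <- read_html('
<html><body>
  <article>
    <h2 class="headline">First headline</h2>
    <time datetime="2018-01-05">5 January 2018</time>
    <p class="teaser">Teaser text of the first article.</p>
  </article>
  <article>
    <h2 class="headline">Second headline</h2>
    <time datetime="2018-01-06">6 January 2018</time>
    <p class="teaser">Teaser text of the second article.</p>
  </article>
</body></html>')

# Extract one row per <article> node. The selectors are assumptions and
# need to be replaced by the tags the target newspaper actually uses.
extract_articles <- function(page) {
  nodes <- html_elements(page, "article")
  tibble(
    headline = html_text2(html_element(nodes, "h2.headline")),
    date     = as.Date(html_attr(html_element(nodes, "time"), "datetime")),
    teaser   = html_text2(html_element(nodes, "p.teaser"))
  )
}

articles <- extract_articles(page)
print(articles)
```

Because `html_element()` returns exactly one (possibly missing) match per article node, the three columns stay aligned even if an article lacks one of the fields.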
The independent and non-commercial website Wahlrecht.de provides researchers, journalists and the interested public with information about all election results and the accompanying polling data from the biggest polling institutes in Germany. However, the data is not provided in an easy-to-use format, but is published on a classic HTML website. It can nevertheless be retrieved using web scraping and data cleaning methods. In this project, I show how to use R and the package rvest to scrape the polling data and create a nicely formatted data frame using only base R.
You can find my code here
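To give an idea of the two steps (scraping with rvest, cleaning with base R), here is a minimal sketch that is independent of the linked script. It assumes a table layout with a date column in `dd.mm.yyyy` format and percentage strings with decimal commas; the table id `polls`, the column names and all values are made up for illustration and are not real polling data.

```r
library(rvest)  # read_html(), html_element(), html_table()

# Hypothetical poll table with invented values; the real page layout may differ.
page <- read_html('
<table id="polls">
  <tr><th>Date</th><th>Party A</th><th>Party B</th></tr>
  <tr><td>05.01.2018</td><td>32 %</td><td>7,5 %</td></tr>
  <tr><td>12.01.2018</td><td>31,5 %</td><td>8 %</td></tr>
</table>')

# Step 1: scrape the table and turn it into a plain data frame.
raw   <- html_table(html_element(page, "table#polls"))
polls <- as.data.frame(raw, stringsAsFactors = FALSE)

# Step 2: clean with base R. Strip the percent sign, replace the decimal
# comma with a point, and convert to numeric.
to_number <- function(x) {
  x <- gsub("%", "", x, fixed = TRUE)
  x <- gsub(",", ".", x, fixed = TRUE)
  as.numeric(trimws(x))
}

polls$Date <- as.Date(polls$Date, format = "%d.%m.%Y")
party_cols <- setdiff(names(polls), "Date")
polls[party_cols] <- lapply(polls[party_cols], to_number)

str(polls)
```

Keeping the cleaning in a small helper such as `to_number()` makes it easy to apply the same conversion to however many party columns the scraped table contains.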
Privacy disclaimer: Do not scrape websites if you are unsure whether their Terms of Service permit the automated extraction of information. To see the relevance of this problem, please read the following blog entry. The author does not take any responsibility for the application of this code to any non-personal websites.
The Left: Code
In 2017, I was pleased to participate in the CorrelAid Meet-Up in Hamburg. During this workshop, we welcomed several new members to our network and were able to develop new project ideas with four incredible NGOs from different fields of civil society. On the second day of our Meet-Up, we launched several workshops in an open space session. I was invited to give a short introduction to Python and felt honored to give a two-hour introduction to over 30 participants.
Since I believe that my code should be publicly available, please find my commented script (in German) here.
From March to September 2016, I was the team leader of our CorrelAid project with the Association of Debating Unions at Universities (VDCH) and the German Debating Society (DDG). VDCH and DDG are committed to promoting an active debating culture at German universities and assist students in developing their talents and skills. They asked us to develop and implement a member survey to get to know their members, their demands and wishes. Building on these results, they aimed to sharpen their organisational and content profile. Together with four project members, I developed a survey from scratch, implemented it online, pre-tested it and launched it to the members of the VDCH and DDG.
After data acquisition, we analyzed the results and wrote a 90-page full report on the results and implications of the survey, drafted a short report for the members and presented our results at the yearly full assembly of the VDCH. It was the first time that the debating unions received a full overview of their members and realized that their concept had spread all over Germany, Austria and Switzerland. For me, it was a great experience to coordinate a research team and generate scientific results which have a direct impact on civil society. I am also very proud of my project team, Mirka, Lisa, Thomas, and Fabienne, who did a great job during the project.