Nesto Software GmbH | Data Scientist | R & Python User
e-Mail: konstantin@gavras.de
In this small project, I provide R code to scrape newspaper articles from the web archive. As an example, the script runs on faz.net news articles from 2018. It is written in easy-to-read R code and outputs a tibble containing all news articles and the accompanying metadata. The script does not transfer easily to other newspapers, since the HTML tags that contain the articles differ heavily between newspapers.
You can find my code here
The independent and non-commercial homepage Wahlrecht.de provides researchers, journalists and the interested public with information about all election results and the accompanying polling data from the biggest polling institutes in Germany. However, the data is not provided in an easy-to-use format, but is only available as tables on a classical HTML homepage. The data can still be retrieved using web scraping and data cleaning methods. In this project, I show how to use R and the package rvest to scrape the polling data and create a nicely formatted data frame using simple base R.
You can find my code here
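To illustrate the general idea behind the polling project (not my R script itself), here is a minimal Python sketch of the two steps involved: reading the cells of an HTML table and cleaning the percentage strings into numbers. The sample table, the column names and the helper names (TableParser, to_share, parse_polls) are assumptions made up for this illustration; the real Wahlrecht.de markup may look different.

```python
from html.parser import HTMLParser

# Hypothetical table snippet; the real Wahlrecht.de markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>CDU/CSU</th><th>SPD</th></tr>
  <tr><td>01.02.2018</td><td>33 %</td><td>20,5 %</td></tr>
  <tr><td>08.02.2018</td><td>32,5 %</td><td>19 %</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def to_share(text):
    """Turn a string such as '20,5 %' into the float 20.5."""
    return float(text.replace("%", "").replace(",", ".").strip())


def parse_polls(html):
    """Return one dict per poll, keyed by the table's header row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    records = []
    for row in body:
        record = {header[0]: row[0]}  # first column is kept as text
        for name, value in zip(header[1:], row[1:]):
            record[name] = to_share(value)
        records.append(record)
    return records


polls = parse_polls(SAMPLE_HTML)
for poll in polls:
    print(poll)
```

The sketch only uses the Python standard library so that it runs without a network connection; on a live page, the HTML string would first have to be downloaded, subject to the site's terms of use.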
Press releases have only recently been discovered as an important source of information for determining the ideological positions of political parties and candidates. However, this data is often not adequately preprocessed and has to be extracted by researchers from the respective homepages of the political parties. In order to train my Python skills, I decided to write a Python script that automatically scrapes all press releases from the six big political parties in the German Bundestag. However, due to the particularities of the respective homepages (different relative paths, static HTML versus JavaScript-rendered pages), it is not possible to create a single script that runs over all homepages at once. Since the press releases are still quite messy after scraping, I supplement my code with some pre-processing steps in Python and create easy-to-use tsv files from the scraped and pre-processed press releases.
Privacy disclaimer: Do not scrape homepages if you are unsure whether their Terms of Service permit the automated extraction of information. To see the relevance of this problem, please read the following blog entry. The author does not take any responsibility for the application of the code to any non-personal homepages.
CDU/CSU: Code
SPD: Code
FDP: Code
Greens: Code
The Left: Code
AfD: Code
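As a rough illustration of two of the issues mentioned above for the press releases (different relative paths and messy text), the following Python sketch resolves links against a base URL, normalises whitespace in the titles and writes the result as tab-separated text. It is not taken from the linked scripts: the base URL, the sample HTML and the helper names (LinkParser, clean, collect_releases, to_tsv) are hypothetical, and JavaScript-rendered pages are not covered here.

```python
import csv
import io
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical listing page; real party homepages use different markup.
BASE_URL = "https://www.example.org/presse/"
LISTING_HTML = """
<ul>
  <li><a href="/presse/2018/rente.html">  Pensions:
      new proposals </a></li>
  <li><a href="2018/klima.html">Climate   protection now</a></li>
  <li><a href="https://www.example.org/presse/2018/europa.html">Europe</a></li>
</ul>
"""


class LinkParser(HTMLParser):
    """Collect (href, link text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def clean(text):
    """Collapse runs of whitespace (including line breaks) into one space."""
    return re.sub(r"\s+", " ", text).strip()


def collect_releases(html, base_url):
    """Return (absolute URL, cleaned title) pairs for all links in html."""
    parser = LinkParser()
    parser.feed(html)
    # urljoin handles root-relative, page-relative and absolute links alike.
    return [(urljoin(base_url, href), clean(title))
            for href, title in parser.links]


def to_tsv(rows):
    """Serialise the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["url", "title"])
    writer.writerows(rows)
    return buffer.getvalue()


releases = collect_releases(LISTING_HTML, BASE_URL)
tsv_text = to_tsv(releases)
print(tsv_text)
```

Using urljoin instead of string concatenation is one way to cope with homepages that mix root-relative, page-relative and absolute links; the csv module takes care of quoting if a title ever contains a tab.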
In 2017, I was pleased to participate in the CorrelAid Meet-Up in Hamburg. During this workshop, we welcomed several new members to our network and were able to develop new project ideas with four incredible NGOs from different fields of civil society. On the second day of our Meet-Up, we launched several workshops in an open space session. I was invited to give a short introduction to Python and felt honored to give a two-hour introduction to over 30 participants.
Since I believe that my code should be publicly available, please find my commented script (in German) here.
From March to September 2016, I was the team leader of our CorrelAid project with the Association of Debating Unions at Universities (VDCH) and the German Debating Society (DDG). VDCH and DDG are committed to promoting an active debating culture at German universities and assist students in developing their talents and skills. They asked us to develop and implement a member survey to get to know their members and their demands and wishes. Building on these results, they aimed to sharpen their organisational and content profile. Together with four project members, I developed a survey from scratch, implemented it online, pre-tested it and launched it to the members of the VDCH and DDG.
After data acquisition, we analyzed the results and wrote a 90-page full report on the results and implications of the survey, drafted a short report for the members and presented our results at the yearly full assembly of the VDCH. It was the first time that the debating unions received a full overview of their members and realized that their concept had spread all over Germany, Austria and Switzerland. For me, it was a great experience to coordinate a research team and generate scientific results which have a direct impact on civil society. I am also very proud of my project team, Mirka, Lisa, Thomas, and Fabienne, who did a great job during the project.
If you want to learn more about our project, please read the interview I gave the VDCH, or read our short report online.