Text resources for integration and migration research (MigTex)

DeZIM Research Community

Duration: November 2017 to February 2020
Status: Completed project

Project team

Principal investigator: Prof. Dr. Andreas Blätte

Project team members: Christoph Leonhardt (InZentIM), Merve Schmitz-Vardar (InZentIM), Florian Gilberg (InZentIM), Kevin Glock (student assistant, InZentIM)


Project description:

The aim of the project was to identify text resources (corpora, dictionaries, annotations, training data) relevant for migration and integration research, to transfer them into sustainable formats and to make them available for research. Methodologically, the project was situated in the field of eHumanities or Computational Social Sciences. It was carried out as a cooperation between the University of Duisburg-Essen (Interdisciplinary Centre for Integration and Migration Research / InZentIM) and the WZB Berlin Social Science Center. "MigTex" was based on the infrastructure, the code base and the objectives of the PolMine project. As a structural project, "MigTex" contributes to the research data infrastructure of the German Centre for Migration and Integration Research (DeZIM).

A central interest of the project was the sustainable provision of text resources for migration and integration research. Converting these text resources into sustainable formats (Extensible Markup Language / XML) serves reproducibility and thus makes research more transparent. It also avoids time-consuming duplication of work. The development of relevant corpora, dictionaries and annotations opens up potential for new research approaches and research projects in migration and integration research. To this end, it is necessary both to develop new procedures for the sustainable processing of text resources and to expand the "classic" qualitative and quantitative approaches to content analysis with procedures from the eHumanities or Computational Social Sciences.
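To illustrate what transferring a text resource into XML can look like, the following minimal sketch wraps a single speech with its metadata in an XML element. It is not the project's actual conversion pipeline or schema: the function name `speech_to_xml`, the element names `speech` and `p`, the attribute names and the sample data are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def speech_to_xml(speaker, party, date, paragraphs):
    """Wrap one speech in a small XML document (hypothetical schema)."""
    # Metadata is stored as attributes on the root element.
    root = ET.Element(
        "speech",
        attrib={"speaker": speaker, "party": party, "date": date},
    )
    # Each paragraph of the speech becomes one <p> child element.
    for text in paragraphs:
        p = ET.SubElement(root, "p")
        p.text = text
    # Special characters such as "&" and "<" are escaped on serialisation.
    return ET.tostring(root, encoding="unicode")


# Invented sample data, not taken from any project corpus.
xml_string = speech_to_xml(
    speaker="Jane Doe",
    party="Example Party",
    date="2018-11-01",
    paragraphs=[
        "First paragraph.",
        "Second paragraph with <brackets> & ampersand.",
    ],
)

# Parsing the string again recovers both metadata and text unchanged.
parsed = ET.fromstring(xml_string)
print(parsed.get("speaker"), len(parsed.findall("p")))
```

Because metadata and text are stored together in a documented, plain-text structure, another researcher can parse the same file and arrive at the same data, which is the reproducibility argument made above.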

The project's work plan was to identify relevant resources and develop standardisations. Initially, public domain working materials (parliamentary and government communications) were processed. However, the project also had the opportunity to access licence-protected materials. A further focus was on improving the accessibility and dissemination of the developed methods and the processed data. To this end, the provision of an analysis environment based on the existing R package 'polmineR' was just as central as the creation of tutorials and workshops (November 2018 and September 2019), which served to communicate the available data and methods.

Since the end of the project, the data has been available to scientific users under clarified licensing terms. An open-source analysis environment is available that, through adequate documentation, tutorials and training materials, allows these data to be used with a low technical barrier to entry. This transparency, together with user feedback mechanisms, serves to ensure the quality of the project.

Participating partners: Interdisciplinary Centre for Integration and Migration Research (InZentIM), WZB Berlin Social Science Center

Funding: Federal Ministry of Family Affairs, Senior Citizens, Women and Youth (Third-party funding)