About the Project

The “text as data” methodology has gained traction in international political science, legal studies, and public administration over the past decade. Nevertheless, it has not yet gained a foothold in Hungarian political methodology. The project “Text Mining and the Quantitative Analysis of Political and Legal Texts (POLTEXT)” aims to fill this gap by providing an interdisciplinary platform for researchers in the political, legal, and administrative fields who use quantitative text analysis as their core method.

Because it brings together several disciplines, the project relies on shared methods rather than a single overarching theoretical framework. The project leader, Miklós Sebők, works with quantitative text analysis and text mining; his methods include dictionary-based algorithms as well as supervised and unsupervised learning. In addition, POLTEXT encompasses projects related to multimedia mining, data storage and retrieval, big data clustering techniques, and tools for data visualization.

The approach matters for two reasons. First, it empowers academia. The project fosters multiple collaborative research projects through workshops, ready-to-use databases, and best practices for overcoming methodological issues related to text mining, such as language-specific problems and unstructured data sources. It also aims to make text mining methods part of research designs in political science, legal studies, and public administration. At MTA TK, the research team will allow junior and senior researchers to cooperate in long-term projects that tackle social issues with innovative methods. Second, the project is relevant to society: by making text databases available, it helps reveal how the political process works and contributes to increased transparency.

Learn more!

CAP Babel Machine

Data mining textbook (in Hungarian)