September 15, 2019 - Zoltán Kacsuk gives a tutorial at the “Tutorial Day” of the POLTEXT conference to be held in Japan

Zoltán Kacsuk earned his academic degree at Kyoto Seika University in Kyoto. He is currently a postdoctoral researcher on the Japanese Visual Media Graph project at the Department of Applied Artificial Intelligence of Stuttgart Media University, and a research fellow at the Department of Governance and Public Policy of the Institute for Political Science, Centre for Social Sciences.


The description of the talk is as follows:

Learning with Apache Spark
Zoltán Kacsuk

Apache Spark is currently one of the most popular open-source cluster-computing frameworks. With its Machine Learning Library (MLlib) it supports the easy scaling of a range of feature extraction and machine learning tasks commonly employed in text mining. Furthermore, it works with both Python and R.
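To make the kind of MLlib workflow mentioned above more concrete, here is a minimal PySpark sketch of a text classification pipeline (tokenizing, hashed term frequencies, IDF weighting, logistic regression). It is not the tutorial's own code: the toy rows, the column names, the function names and the `local[4]` master are all assumptions chosen for illustration, and it presumes that `pyspark` and a Java runtime are installed.

```python
# Toy (text, label) rows, invented purely for illustration.
TRAINING_ROWS = [
    ("parliament passes the annual budget bill", 1.0),
    ("minister announces new tax reform", 1.0),
    ("local team wins the football cup", 0.0),
    ("heavy rain expected over the weekend", 0.0),
]


def build_text_pipeline(num_features=1024):
    """Return an unfitted MLlib pipeline; requires an active SparkSession."""
    # pyspark is imported lazily so this module loads even without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import HashingTF, IDF, Tokenizer

    tokenizer = Tokenizer(inputCol="text", outputCol="words")
    term_freq = HashingTF(inputCol="words", outputCol="tf", numFeatures=num_features)
    idf = IDF(inputCol="tf", outputCol="features")
    classifier = LogisticRegression(featuresCol="features", labelCol="label", maxIter=20)
    return Pipeline(stages=[tokenizer, term_freq, idf, classifier])


def train_and_predict(rows, new_texts):
    """Fit the pipeline on rows and return (text, predicted label) pairs."""
    from pyspark.sql import SparkSession

    # "local[4]" runs Spark on four local threads (an assumption for this sketch);
    # on a real cluster the master URL would point at the cluster instead.
    spark = (
        SparkSession.builder.master("local[4]")
        .appName("text-classification-sketch")
        .getOrCreate()
    )
    try:
        train_df = spark.createDataFrame(rows, ["text", "label"])
        model = build_text_pipeline().fit(train_df)
        new_df = spark.createDataFrame([(text,) for text in new_texts], ["text"])
        scored = model.transform(new_df).select("text", "prediction").collect()
        return [(row["text"], row["prediction"]) for row in scored]
    finally:
        spark.stop()
```

Calling `train_and_predict(TRAINING_ROWS, ["government debates the budget"])` would fit the pipeline and score one new text; with so few rows the prediction itself is not meaningful, the point is only the shape of the workflow.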

The tutorial will first cover the basics of using an Apache Spark cluster for text mining and machine learning. It will then walk through the text classification solution developed within the Hungarian leg of the Comparative Agendas Project, with the support of the MTA SZTAKI Cloud team, as a use case illustrating the possibilities opened up by the speed gains of parallel computing.

The tutorial will address among other things: a) configuring the Apache Spark cluster, b) using a Hadoop Distributed File System with the cluster, c) operating the cluster via an RStudio Server and sparklyr (the Spark interface for R developed by RStudio), and d) the differences in available functionality of the Machine Learning Library for sparklyr, SparkR (the R API developed by Apache Spark) and PySpark (the Python API for Spark).
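As a rough illustration of points a) and b), the following Python sketch shows how settings for a standalone Spark cluster and an HDFS location might be assembled before opening a PySpark session. The helper names, the default executor sizes and the NameNode port 9000 are assumptions made for this sketch (7077 is the default port of a standalone Spark master); the tutorial's actual configuration may differ.

```python
def standalone_cluster_conf(master_host, executor_cores=2, executor_memory_gb=4,
                            master_port=7077):
    """Build a dict of standard Spark properties for a standalone cluster."""
    return {
        "spark.master": f"spark://{master_host}:{master_port}",
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": f"{executor_memory_gb}g",
    }


def hdfs_uri(namenode_host, path, port=9000):
    """Build an hdfs:// URI; the NameNode port is an assumption and varies by setup."""
    return f"hdfs://{namenode_host}:{port}/{path.lstrip('/')}"


def open_session(conf, app_name="spark-cluster-sketch"):
    """Open a SparkSession with the given properties (needs pyspark installed)."""
    # Imported lazily so the pure-Python helpers above work without Spark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With hypothetical host names, `open_session(standalone_cluster_conf("spark-master"))` would connect to the cluster, after which `spark.read.text(hdfs_uri("namenode", "corpus/articles.txt"))` would load a text file from HDFS as a DataFrame.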

For the participants

The easiest way to follow along with the tutorial will be to have an activated Google Cloud account (the free trial version will be enough). Another option is to have an Ubuntu 18.04 system on a multi-core or multi-CPU machine (4+ cores/CPUs), available either locally or remotely via SSH, with root privileges and internet connectivity.
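For participants choosing the second option, a quick way to check the core requirement is a few lines of Python using only the standard library. This is just a convenience sketch; the function name and the threshold constant are illustrative, and `os.cpu_count()` reports logical CPUs, which may be higher than the number of physical cores.

```python
import os

MIN_CORES = 4  # taken from the "4+ cores/CPUs" requirement above


def has_enough_cores(available, required=MIN_CORES):
    """Return True if the reported CPU count meets the requirement."""
    # os.cpu_count() can return None when the count cannot be determined.
    return available is not None and available >= required


if __name__ == "__main__":
    count = os.cpu_count()
    status = "OK" if has_enough_cores(count) else "below the suggested minimum"
    print(f"Logical CPUs detected: {count} ({status})")
```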