Enhancing Entity Resolution and Retrieval through Dataset Decomposition - Yannis Velegrakis (Utrecht University)

Title: Enhancing Entity Resolution and Retrieval through Dataset Decomposition
Speaker: Prof. Yannis Velegrakis (Utrecht University, The Netherlands)
Abstract: A traditional data management process is typically applied to a dataset as a whole. We advocate that slicing the dataset and treating each slice differently can lead to better performance in a number of scenarios. We present two works in which this principle has been applied and demonstrate its effectiveness. First, we show that even for short documents (forum posts), slicing allows for better discovery of documents related to a post at hand. Second, we illustrate how entity resolution can benefit from slicing. It is known that different entity resolution algorithms perform better on different datasets. We apply this idea within a single dataset: we first slice the dataset and then select, for each slice, the method that performs best on it. Doing so improves the overall performance of the entity resolution process. The two main challenges in this approach are deciding how to slice the dataset and how to select the resolution method that is best for each slice.
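To make the per-slice selection idea concrete, here is a minimal sketch in Python. It is not the speaker's method: the abstract does not say how slices are formed or how methods are chosen, so the slicing key (a hypothetical `source` field), the two toy matchers, and the use of F1 on labeled pairs as the selection criterion are all assumptions made for illustration.

```python
from collections import defaultdict

# Two toy matchers (assumed for illustration, not from the talk).
def exact(a, b):
    """Match when the names are equal, ignoring case."""
    return a["name"].lower() == b["name"].lower()

def token_overlap(a, b, threshold=0.5):
    """Match when the Jaccard overlap of name tokens reaches the threshold."""
    ta = set(a["name"].lower().split())
    tb = set(b["name"].lower().split())
    return len(ta & tb) / len(ta | tb) >= threshold

def f1(matcher, pairs):
    """F1 score of a matcher on labeled pairs (a, b, is_match)."""
    tp = sum(1 for a, b, y in pairs if matcher(a, b) and y)
    fp = sum(1 for a, b, y in pairs if matcher(a, b) and not y)
    fn = sum(1 for a, b, y in pairs if not matcher(a, b) and y)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_per_slice(labeled_pairs, slice_key, matchers):
    """Group labeled pairs by slice, then pick the best-scoring matcher per slice."""
    by_slice = defaultdict(list)
    for a, b, y in labeled_pairs:
        by_slice[slice_key(a)].append((a, b, y))
    return {
        s: max(matchers, key=lambda name: f1(matchers[name], pairs))
        for s, pairs in by_slice.items()
    }

# Hypothetical labeled pairs; the slice is the (assumed) "source" field.
labeled = [
    ({"name": "john smith", "source": "forum"}, {"name": "Smith John"}, True),
    ({"name": "john smith", "source": "forum"}, {"name": "jane doe"}, False),
    ({"name": "Acme Corp", "source": "registry"}, {"name": "acme corp"}, True),
    ({"name": "Acme Corp", "source": "registry"}, {"name": "Acme Holdings Corp"}, False),
]
matchers = {"exact": exact, "token_overlap": token_overlap}
best = select_per_slice(labeled, lambda r: r["source"], matchers)
print(best)  # {'forum': 'token_overlap', 'registry': 'exact'}
```

In this toy data the noisy forum slice favors the token-based matcher, while the clean registry slice favors exact matching, which is the kind of within-dataset variation the abstract describes. In practice both the slicing criterion and the selection signal would need to be chosen far more carefully; the abstract names these as the two main challenges.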
Short Biography: Yannis Velegrakis is a Computer Science professor at Utrecht University (Netherlands), where he holds the chair on Very Large Data Management, heads the Data Intensive Systems Group, and leads the Master’s programme in Data Science. His areas of research expertise include Data Preparation and Curation, Data Quality, Big Data Management, Knowledge Engineering, Graph Management, and Highly Heterogeneous Information Integration. He holds a PhD degree in Computer Science from the University of Toronto. He has been a professor at the University of Trento and a researcher at the AT&T Research Labs. He is also a PI at the Archimedes Unit of the Athena Research Center. He has spent time on research at the IBM Almaden Research Center, the Huawei European Research Center in Munich, the Center of Advanced Studies of the IBM Toronto Lab, the University of California, Santa Cruz, and the University of Paris-Saclay. He has served as general chair of VLDB 2013 and ICDE 2024, PC chair of EDBT 2021, and area chair at multiple VLDB, SIGMOD, ICDE, and EDBT conferences. He currently serves on the board of the EDBT Association (as president), on the VLDB Board of Trustees, on the SIKS Research School board, and as associate editor for Systems on the SIGMOD Record editorial team.