Harnessing the Untapped Benefits of Near-Sortedness for Data Systems - Manos Athanassoulis (Boston University)

Dates
2025-10-21 13:00 - 14:00
Venue
Archimedes Amphitheatre (1 Artemidos Street, 15125, Marousi, Archimedes, Athena Research Center, Greece)


Title: Harnessing the Untapped Benefits of Near-Sortedness for Data Systems

Speaker: Manos Athanassoulis (Associate Professor of Computer Science at Boston University (BU), USA, Director and Founder of the BU Data-intensive Systems and Computing Laboratory, BU, USA, and Co-Director of the BU Massive Data Algorithms and Systems Group, BU, USA)

Abstract: Two of the most common concepts in data processing are sorting and indexing. In fact, one can consider indexing as the process of adding structure (e.g., by sorting) to an otherwise unstructured data collection. This comes at a cost (the ingestion cost of creating the sorted instance of the data) that is worth paying because of the future benefits. What happens when the incoming data have some pre-existing structure (a degree of “near-sortedness”)? This can happen by virtue of the dataset (e.g., indexing timestamps of aggregated sensors with a small lag, or storing mostly increasing data like stock market values), the operation (operating on previously fully sorted data that has received a number of updates, or on the intermediate result of another operator that created near-sorted data), or data correlation (operating on a column correlated with the sort attribute of a table). Traditional indexes are not designed to exploit near-sortedness and in most cases pay the same cost as classical ingestion (as if the data had no structure). In this work we argue that a “sortedness-aware” index should offer increasingly cheaper ingestion cost for “more sorted” data without hurting read performance or any other aspect of data processing.

We present the first sortedness-aware tree index designs. The first uses smart buffering, partial bulk loading, query-driven sorting, and a variable split ratio to achieve a remarkable speedup for near-sorted data (10x). We next show that we can obtain even better results by radically simplifying the design, adding only minimal state to classical tree indexes and a lightweight predictor of which node to insert to next. If time permits, I will discuss some more open questions on near-sortedness in conjunction with learned indexes, LSM trees, and join processing.

Short Bio: Manos Athanassoulis is an Associate Professor of Computer Science at Boston University, Director and Founder of the BU Data-intensive Systems and Computing Laboratory, and Co-Director of the BU Massive Data Algorithms and Systems Group. He also spent a summer as Visiting Faculty at Meta. His research is in the area of data management, focusing on building data systems that efficiently exploit modern hardware (computing units, storage, and memories), are deployed in the cloud, and can adapt to the workload both at setup time and dynamically, at runtime. Before joining Boston University, Manos was a postdoc at Harvard University, USA. Earlier, he obtained his PhD from EPFL, Switzerland, and spent one summer at IBM Research, Watson. Manos’ work has been recognized with awards such as “Best of SIGMOD” in 2016, “Best of VLDB” in 2010 and 2017, “Most Reproducible Paper” at SIGMOD in 2017, “Best Demo” for VLDB 2023, and “Distinguished PC Member” for SIGMOD 2018, 2023, 2024, 2025 and EDBT 2025. It has been supported by multiple NSF grants, including an NSF CRII and an NSF CAREER award, and by industry funds, including a Facebook Faculty Research Award, multiple Red Hat Research Incubation Awards, and gifts from Cisco, Red Hat, and Meta.

He is currently serving as ACM SIGMOD Secretary/Treasurer (2025-2029) and has served or is serving as Associate Editor for ACM SIGMOD Record, ACM SIGMOD Availability and Reproducibility Co-Chair (2021, 2022, 2023, 2024, 2025), VLDB Ambassador for Industry Relations (2022, 2023, 2024), Industrial Track Co-Chair for ICWE 2024, Proceedings Chair for VLDB 2023, Area Chair for ACM SIGMOD 2026, IEEE ICDE 2026, VLDB 2025, and IEEE ICDE 2022, Publicity Chair for VLDB 2022 and IEEE ICDE 2021, and as a PC member for multiple top data management venues.

________________________________________________________________________________

Microsoft Teams 
Meeting ID: 361 396 317 392 1
Passcode: ww9k7JE3

 
 
 
 

The project “ARCHIMEDES Unit: Research in Artificial Intelligence, Data Science and Algorithms” with code OPS 5154714 is implemented by the National Recovery and Resilience Plan “Greece 2.0” and is funded by the European Union – NextGenerationEU.


 

Stay connected! Subscribe to our mailing list by emailing sympa@lists.athenarc.gr
with the subject "subscribe archimedes-news Firstname LastName"
(replace with your details)