[Archimedes Talks Series] Scaling Data Science, For All
Dates
2024-10-11 12:00 - 14:00
Venue
Artemidos 1 - Amphitheater
Title: Scaling Data Science, For All
Abstract: Scaling out data science workloads is critical for getting
results on ancient texts, climate change, or vaccine development in
minutes or hours, not weeks or years. This kind of scale-out, however,
requires not only expertise in scalable computing paradigms but also
burdensome effort to manually scale out each computation. In this talk,
I will present a family of software systems that automatically scale out
data science workloads. Central among these systems are a just-in-time
compiler that blends static pre-processing with dynamic interposition,
a high-level component specification framework, and a collection of
high-performance runtime primitives that support parallel and
distributed program execution. Combined, they achieve
order-of-magnitude speedups on unmodified data-science workloads,
offer strong correctness and compatibility guarantees, and remain
virtually indistinguishable from (and require no modifications to) the
underlying language runtime. These systems have received multiple
awards, are developed by several institutions, and are open-source
software available through the Linux Foundation.
Bio: Nikos Vasilakis is an Assistant Professor of Computer Science at
Brown University and an Affiliate with Brown's Data Science Institute.
His research spans software systems, programming languages, and
security, and has been recognized by several distinguished paper
awards. His current focus is on automatically transforming systems to
add new capabilities such as parallelism, distribution, and security
against a variety of threat models. Prof. Vasilakis is also the chair of the
Technical Steering Committee behind PaSh, a shell-script optimization
system hosted by the Linux Foundation. More: https://nikos.vasilak.is
and https://atlas.cs.brown.edu
________________________________________________________________________________
Meeting ID: 382 245 166 066
Passcode: MtsSgy
________________________________________________________________________________