2025 Archimedes Computer Vision Day

Agenda
10:00 - 10:20:
Welcome
10:20 - 10:30:
Opening Remarks
Kostas Daniilidis, University of Pennsylvania (UPenn), USA, and Archimedes, Athena Research Center, Greece
10:30 - 11:30:
Talk 1 - "Towards In-the-Wild Understanding of 3D Human-Object Interactions"
Dimitris Tzionas, Assistant Professor for 3D Computer Vision at the University of Amsterdam (UvA), Netherlands
Abstract: People constantly interact with objects to perform tasks. To help people accomplish these, computers need to perceive Human-Object Interactions (HOI), and for this, they need to reconstruct HOI from whole-body color images of people interacting with objects or scenes. This is challenging, due to the occlusions between bodies and objects, motion blur, depth ambiguities, and the low image resolution of hands and graspable object parts. There has been significant prior work on estimating 3D humans without considering objects, and estimating 3D objects without considering humans. Little prior work estimates these jointly, but, for tractability, focuses either on interacting hands, ignoring the body, or on interacting bodies, ignoring hands. Only recent work addresses dexterous interaction of whole bodies, but instruments bodies with intrusive markers or sensors, and uses non-standard cameras to capture video of interactions. Moreover, reconstruction lacks hand detail that is crucial for grasping, and videos are captured in constrained settings; consequently, methods trained on these struggle to generalize. Instead, we need to infer HOI from natural whole-body images/videos. In this talk we will discuss several methods to this end. Specifically, we will discuss methods to estimate 3D contact from monocular color images, as well as methods to estimate 3D HOI while exploiting contact. Moreover, we will discuss methods for recovering 3D object pose and shape under strong occlusions. Last, time permitting, we will also discuss generating 3D HOI through a controllable and efficient method.
📝 Microsoft Teams link:
11:30 - 11:40:
Q&A Session on Talk 1
11:40 - 12:00:
Short Break
12:00 - 13:00:
Talk 2 - "From Saliency to Scanpaths: 20 years of Wandering Eyes"
Dimitris Samaras, SUNY Empire Innovation Professor of Computer Science with Stony Brook AI Institute, NY, USA
Abstract: This talk will start with an overview of the connections between Human Vision and Computer Vision. The connections will be discussed through a number of questions about how knowledge advances in each of those fields can help the other. The main part of the talk will discuss how current deep learning architectures, from Reinforcement Learning to Transformers, can be leveraged to predict human gaze scanpaths when subjects search for objects of known categories in an image. All such architectures require significant amounts of data, which in this case are difficult to obtain. Thus, the talk will explore how to scale gaze prediction both in the number of subjects and in the number of categories. The advent of large Vision Language Models (VLMs) has opened a new way to study gaze-related questions, and the talk will present some initial findings. The talk will conclude with applications of gaze prediction in graphic designs and medical images.
📝 Microsoft Teams link:
13:00 - 13:10:
Q&A Session on Talk 2
13:10 - 14:00:
Afternoon Break
14:00 - 15:00:
Talk 3 - "Efficient Brains that Imagine"
Vicky Kalogeiton, Professor in AI at the Computer Science Laboratory (LIX) of École Polytechnique, Paris, France
Abstract: Intelligent robots do not just respond to commands—they imagine what you meant; what you wanted; what you believed. And they do this while learning from very little, and running on a chip in your living room. In this talk, I will present recent advances in generative modeling that aim to equip embodied agents with efficient “brains” that can imagine possible futures, infer intent, and generate actions under uncertainty. First, I will show how generative models can be trained to understand the world with minimal supervision, using examples such as text-to-image generation from ImageNet and geolocation. These works demonstrate how far we can go with small datasets and structured training objectives—an essential requirement for real-world robotics. I will then turn to the challenge of controllability and intent understanding. Through trajectory generation conditioned on character-centric text and optical control over camera rays, I will illustrate how generative models can map inferred intentions to expressive, goal-directed actions. Underlying this is the need for temporal and semantic coherence, addressed by coherence-aware training methods that reduce model size while improving consistency.
📝 Microsoft Teams link:
15:00 - 15:10:
Q&A Session on Talk 3
15:10 - 15:20:
Closing Remarks
Kostas Daniilidis, University of Pennsylvania (UPenn), USA, and Archimedes, Athena Research Center, Greece
Organizing Committee
Kostas Daniilidis, Professor of Computer Science and holder of the Ruth Yalom Stone Chair at the University of Pennsylvania (UPenn), USA, and Lead Researcher at Archimedes, Athena Research Center, Greece
Vasiliki Vasileiou, PhD student at the School of Electrical and Computer Engineering, National Technical University of Athens (NTUA), Greece, and Academic Scholar at Archimedes, Athena Research Center, Greece