2025 Archimedes Computer Vision Day

Dates
2025-07-22 10:00 - 15:40
Venue
Archimedes Amphitheatre, Archimedes, Athena Research Center, 1 Artemidos Street, 15125 Marousi, Athens, Greece


Keynote Speakers:

  • Dimitris Tzionas, UvA, NL
  • Dimitris Samaras, Stony Brook AI Institute, NY, USA
  • Vicky Kalogeiton, École Polytechnique, FR


Agenda


10:00 - 10:20:
Welcome

10:20 - 10:30:
Opening Remarks
Kostas Daniilidis, University of Pennsylvania (UPenn), USA, and Archimedes, Athena Research Center, Greece

10:30 - 11:30:
Talk 1 - "Towards In-the-Wild Understanding of 3D Human-Object Interactions"
Dimitris Tzionas, Assistant Professor for 3D Computer Vision at the University of Amsterdam (UvA), Netherlands

Abstract: People constantly interact with objects to perform tasks. To help people accomplish these tasks, computers need to perceive Human-Object Interactions (HOI), and for this they need to reconstruct HOI from whole-body color images of people interacting with objects or scenes. This is challenging due to the occlusions between bodies and objects, motion blur, depth ambiguities, and the low image resolution of hands and graspable object parts. There has been significant prior work on estimating 3D humans without considering objects, and on estimating 3D objects without considering humans. Little prior work estimates these jointly and, for tractability, it focuses either on interacting hands, ignoring the body, or on interacting bodies, ignoring hands. Only recent work addresses dexterous interaction of whole bodies, but it instruments bodies with intrusive markers or sensors and uses non-standard cameras to capture video of interactions. Moreover, the reconstructions lack the hand detail that is crucial for grasping, and the videos are captured in constrained settings; consequently, methods trained on these struggle to generalize. Instead, we need to infer HOI from natural whole-body images and videos. In this talk we will discuss several methods to this end. Specifically, we will discuss methods to estimate 3D contact from monocular color images, as well as methods to estimate 3D HOI while exploiting contact. Moreover, we will discuss methods for recovering 3D object pose and shape under strong occlusions. Last, time permitting, we will also discuss generating 3D HOI through a controllable and efficient method.

11:30 - 12:00:
Break

12:00 - 13:00:
Talk 2 - "From Saliency to Scanpaths: 20 years of Wandering Eyes"
Dimitris Samaras, SUNY Empire Innovation Professor of Computer Science with the Stony Brook AI Institute, NY, USA

Abstract: This talk will start with an overview of the connections between Human Vision and Computer Vision, discussed through a number of questions about how advances in each field can help the other. The main part of the talk will discuss how current deep learning architectures, from Reinforcement Learning to Transformers, can be leveraged to predict human gaze scanpaths when subjects search an image for objects of known categories. All such architectures require significant amounts of data, which in this case are difficult to obtain. The talk will therefore explore how to scale gaze prediction both in the number of subjects and in the number of categories. The advent of large Vision Language Models (VLMs) has opened a new way to study gaze-related questions, and the talk will present some initial findings. The talk will conclude with applications of gaze prediction to graphic designs and medical images.

13:00 - 14:00:
Break

14:00 - 15:00:
Talk 3 - "Efficient Brains that Imagine" 
Vicky Kalogeiton, Professor in AI at the Computer Science Laboratory (LIX) of École Polytechnique, Paris, France 

Abstract: Intelligent robots do not just respond to commands: they imagine what you meant, what you wanted, what you believed. And they do this while learning from very little and running on a chip in your living room. In this talk, I will present recent advances in generative modeling that aim to equip embodied agents with efficient “brains” that can imagine possible futures, infer intent, and generate actions under uncertainty. First, I will show how generative models can be trained to understand the world with minimal supervision, using examples such as text-to-image generation from ImageNet and geolocation. These works demonstrate how far we can go with small datasets and structured training objectives, an essential requirement for real-world robotics. I will then turn to the challenge of controllability and intent understanding. Through trajectory generation conditioned on character-centric text and optical control over camera rays, I will illustrate how generative models can map inferred intentions to expressive, goal-directed actions. Underlying this is the need for temporal and semantic coherence, addressed by coherence-aware training methods that reduce model size while improving consistency.

15:00 - 15:10:
Closing Remarks


Organizing Committee

Kostas Daniilidis, Professor of Computer Science and holder of the Ruth Yalom Stone Chair at the University of Pennsylvania (UPenn), USA, and Lead Researcher at Archimedes, Athena Research Center, Greece
Vasiliki Vasileiou, PhD student at the School of Electrical and Computer Engineering, National Technical University of Athens (NTUA), Greece, and Academic scholar at Archimedes, Athena Research Center, Greece
TBC

The project “ARCHIMEDES Unit: Research in Artificial Intelligence, Data Science and Algorithms” with code OPS 5154714 is implemented under the National Recovery and Resilience Plan “Greece 2.0” and is funded by the European Union – NextGenerationEU.


Stay connected! Subscribe to our mailing list by emailing sympa@lists.athenarc.gr with the subject "subscribe archimedes-news Firstname LastName" (replace with your details).