Publications

2025

  1. GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers
    Under review, 2025
    A framework for generating global, interpretable textual explanations of vision classifiers, combining counterfactual visual explanations with VLMs and LLMs.
  2. PPT: Pre-Training with Pseudo-Labeled Trajectories for Motion Forecasting
    Under review, 2025
    Pre-training with pseudo-labeled trajectories obtained from offline 3D trackers boosts trajectory prediction models, improving performance, efficiency, and generalization.
  3. Annealed Winner-Takes-All for Motion Forecasting
    Under review, 2025
    Using an annealing loss enhances training stability and performance of state-of-the-art trajectory prediction models.

2024

  1. LLM-wrapper: black-box semantic-aware adaptation of Vision-Language foundation models
    In ECCV Workshop Eval-FoMo, 2024
    LLMs can learn to adapt black-box VLMs to new tasks and domains by wrapping and reasoning over the vision models’ outputs.
  2. ReGentS: Real-World Safety-Critical Driving Scenario Generation Made Stable
    In ECCV Workshop W-CODA, 2024
    ReGentS generates safety-critical driving scenarios with adversarial optimization of real-world trajectories.
  3. Valeo4Cast: A Modular Approach to End-to-End Forecasting
    In ECCV Workshop ROAD++, 2024
    By training and fine-tuning the detection, tracking, and forecasting modules separately, Valeo4Cast achieves first place in the Argoverse 2 Challenge, outperforming the previous year’s winner by +17.1 points.
  4. UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction
    In ECCV, 2024
    Unifying the major vehicle trajectory prediction datasets enables studying the impact of data scale and diversity on performance and model generalization.
  5. PointBeV: A Sparse Approach to BeV Predictions
    In CVPR, 2024
    A sparse approach to bird’s-eye-view perception improves performance and computational efficiency by avoiding the uniform allocation of resources across all cells, adapting to the task, the situation, and the compute budget at inference time.
  6. Unsupervised Object Localization in the Era of Self-Supervised ViTs: A Survey
    In IJCV, 2024
    A survey on unsupervised object localization methods leveraging self-supervised pre-trained features, e.g., DINO.
  7. Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive?
    In ICRA, 2024
    This work presents a unified evaluation pipeline for motion forecasting with real-world perception inputs, revealing a performance gap between curated and perception-based data.

2023

  1. OCTET: Object-aware Counterfactual Explanations
    In CVPR, 2023
    Using a spatial- and object-aware generative model enables the generation of counterfactual explanations for deep vision models handling complex scenes with many objects.
  2. Unsupervised Object Localization: Observing the Background to Discover Objects
    In CVPR, 2023
    FOUND trains a single 1x1 convolution on DINO features for unsupervised object segmentation; it runs at 80 FPS on a V100 after only 2 hours of self-training on a single GPU.
  3. LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR
    In CVIU, 2023
    Adding a low-cost LiDAR to a monocular camera setup yields improved metric depth maps in a self-supervised manner.

2022

  1. LaRa: Latents and Rays for Multi-Camera Bird’s-Eye-View Semantic Segmentation
    In CoRL, 2022
    The Perceiver architecture, combined with careful ray encoding, excels at multi-camera fusion and at transforming perspective views into bird’s-eye-view semantic segmentation.
  2. STEEX: Steering Counterfactual Explanations with Semantics
    In ECCV, 2022
    Using a well-structured image generative model unlocks the generation of counterfactual explanations for deep vision models handling high-quality images and complex scenes.
  3. Raising context awareness in motion forecasting
    In CVPR Workshop on Autonomous Driving (WAD), 2022
    Since trajectory prediction models tend to merely extrapolate past motion, CAB promotes the use of HD-map information to address long-tail corner cases.
  4. Explainability of deep vision-based autonomous driving systems: Review and challenges
    Éloi Zablocki*, Hédi Ben-Younes*, Patrick Pérez, and Matthieu Cord
    In IJCV, 2022
    A survey on explainability methods for vision-based autonomous-driving models.
  5. Driving Behavior Explanation with Multi-level Fusion
    Hédi Ben-Younes*, Éloi Zablocki*, Matthieu Cord, and Patrick Pérez
    In Pattern Recognition (PR), 2022
    BEEF is a self-driving model that both drives and explains its decisions with natural language.

2020

  1. Transductive Zero-Shot Learning using Cross-modal CycleGAN
    arXiv, 2020
    Using a cycle-consistency loss reduces the domain shift between visual and textual representations, enhancing performance in zero-shot object recognition.

2019

  1. Context-Aware Zero-Shot Learning for Object Recognition
    In ICML, 2019
    Using visual context boosts zero-shot object recognition.
  2. Incorporating Visual Semantics into Sentence Representations within a Grounded Space
    In EMNLP, 2019
    A careful transfer of visual features to sentence representations enriches the semantics of general-purpose textual representations.

2018

  1. Learning Multi-Modal Word Representation Grounded in Visual Context
    In AAAI, 2018
    Visual context can be used, along with textual context, to learn improved word representations with the skip-gram algorithm.

2017

  1. LIP6@CLEF2017: Multi-Modal Spatial Role Labeling using Word Embeddings
    In CLEF, 2017
    A linear SVM on pooled word representations classifies spatial relations from text and images.