Titus Leistner Founder and Computer Vision Researcher


Towards Multimodal Depth Estimation from Light Fields

  • Heidelberg University

This paper was published at CVPR 2022. Download slides and poster.


Figure: example of estimated posteriors

Light field applications, especially light field rendering and depth estimation, have developed rapidly in recent years. While state-of-the-art light field rendering methods handle semi-transparent and reflective objects well, depth estimation methods either ignore these cases altogether or deliver only weak performance. We argue that this is because current methods consider only a single "true" depth, even when multiple objects at different depths contribute to the color of a single pixel. Based on the simple idea of outputting a posterior depth distribution instead of a single estimate, we develop and explore several deep-learning-based approaches to the problem. Additionally, we contribute the first "multimodal light field depth dataset", which contains the depths of all objects that contribute to the color of a pixel. This allows us to supervise the multimodal depth prediction and to validate all methods by measuring the KL divergence of the predicted posteriors. With our thorough analysis and novel dataset, we aim to start a new line of depth estimation research that overcomes some of the long-standing limitations of this field.
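To make the evaluation idea concrete, here is a minimal NumPy sketch of comparing a predicted per-pixel posterior with a multimodal target via the KL divergence. It is an illustration only, not the paper's implementation: the discretization into disparity bins, the Gaussian mixture target, and the names `gaussian_mixture_on_bins` and `kl_divergence` are assumptions made for this example.

```python
import numpy as np

def gaussian_mixture_on_bins(bin_centers, modes, weights, sigma):
    """Discretize a Gaussian mixture onto disparity bins; result sums to 1."""
    bin_centers = np.asarray(bin_centers, dtype=float)
    modes = np.asarray(modes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # One Gaussian bump per contributing depth, shape (num_modes, num_bins).
    bumps = np.exp(-0.5 * ((bin_centers[None, :] - modes[:, None]) / sigma) ** 2)
    density = (weights[:, None] * bumps).sum(axis=0)
    return density / density.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Hypothetical pixel: a semi-transparent surface at disparity -1.0
# in front of a background at disparity 2.0.
bins = np.linspace(-4.0, 4.0, 81)
target = gaussian_mixture_on_bins(bins, modes=[-1.0, 2.0], weights=[0.4, 0.6], sigma=0.2)

# Two candidate predictions: one ignores the front surface, one models both.
unimodal = gaussian_mixture_on_bins(bins, modes=[2.0], weights=[1.0], sigma=0.2)
bimodal = gaussian_mixture_on_bins(bins, modes=[-1.0, 2.0], weights=[0.5, 0.5], sigma=0.2)

kl_unimodal = kl_divergence(target, unimodal)
kl_bimodal = kl_divergence(target, bimodal)
print(f"KL(target || unimodal) = {kl_unimodal:.3f}")
print(f"KL(target || bimodal)  = {kl_bimodal:.3f}")
```

In this toy setting the unimodal prediction is penalized heavily because it assigns almost no probability to the front surface, whereas the bimodal prediction is penalized only for its slightly wrong mode weights.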


Our implementation and dataset are hosted on GitHub.


    title={Towards Multimodal Depth Estimation from Light Fields},
    author={Leistner, Titus and Mackowiak, Radek and Ardizzone, Lynton and K{\"o}the, Ullrich and Rother, Carsten},
    journal={arXiv preprint arXiv},

Neural Head Avatars from Monocular RGB Videos

  • Heidelberg University¹
  • TU Munich²
  • MPI for Intelligent Systems³
  • *equal contribution

This paper was published at CVPR 2022.


Figure: example of generated avatar

We present Neural Head Avatars, a novel neural representation that explicitly models the surface geometry and appearance of an animatable human avatar. Such avatars can be used for teleconferencing in AR/VR or for other applications in the movie or games industry that rely on a digital human. Our representation can be learned from a monocular RGB portrait video that features a range of different expressions and views. Specifically, we propose a hybrid representation consisting of a morphable model for the coarse shape and expressions of the face, and two feed-forward networks that predict vertex offsets of the underlying mesh as well as a view- and expression-dependent texture. We demonstrate that this representation accurately extrapolates to unseen poses and viewpoints, and generates natural expressions while providing sharp texture details. Compared to previous works on head avatars, our method provides a disentangled shape and appearance model of the complete human head (including hair) that is compatible with the standard graphics pipeline. Moreover, it quantitatively and qualitatively outperforms the current state of the art in terms of reconstruction quality and novel-view synthesis.
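The structure of such a hybrid representation can be sketched in a few lines of NumPy. This is a toy illustration under strong assumptions, not the authors' implementation: the linear morphable model, the untrained random-weight networks, the per-vertex colors standing in for a texture, the choice of network inputs, and all names (`morphable_model`, `make_mlp`, `avatar`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_VERTS, NUM_SHAPE, NUM_EXPR, HIDDEN = 100, 10, 5, 32

# Toy linear morphable model: mean mesh plus shape and expression bases.
mean_verts = rng.normal(size=(NUM_VERTS, 3))
shape_basis = rng.normal(scale=0.1, size=(NUM_VERTS, 3, NUM_SHAPE))
expr_basis = rng.normal(scale=0.1, size=(NUM_VERTS, 3, NUM_EXPR))

def morphable_model(shape, expr):
    """Coarse geometry: mean mesh plus linear shape and expression terms."""
    return mean_verts + shape_basis @ shape + expr_basis @ expr

def make_mlp(in_dim, out_dim):
    """Random weights for a two-layer network (untrained stand-in)."""
    return (rng.normal(scale=0.3, size=(in_dim, HIDDEN)),
            rng.normal(scale=0.3, size=(HIDDEN, out_dim)))

def mlp(params, x):
    w1, w2 = params
    return np.maximum(x @ w1, 0.0) @ w2  # ReLU hidden layer, linear output

# Assumed inputs: the geometry network sees the coarse vertex position,
# the texture network sees position, view direction, and expression code.
geometry_net = make_mlp(3, 3)
texture_net = make_mlp(3 + 3 + NUM_EXPR, 3)

def avatar(shape, expr, view_dir):
    """Return refined vertices and per-vertex RGB colors in (0, 1)."""
    coarse = morphable_model(shape, expr)
    verts = coarse + mlp(geometry_net, coarse)  # coarse mesh + offsets
    view_tiled = np.broadcast_to(view_dir, (NUM_VERTS, 3))
    expr_tiled = np.broadcast_to(expr, (NUM_VERTS, NUM_EXPR))
    features = np.concatenate([verts, view_tiled, expr_tiled], axis=1)
    colors = 1.0 / (1.0 + np.exp(-mlp(texture_net, features)))  # sigmoid
    return verts, colors

shape_code = rng.normal(size=NUM_SHAPE)
expr_code = rng.normal(size=NUM_EXPR)
verts_front, colors_front = avatar(shape_code, expr_code, np.array([0.0, 0.0, 1.0]))
verts_side, colors_side = avatar(shape_code, expr_code, np.array([1.0, 0.0, 0.0]))
```

The point of the sketch is the disentanglement: changing the view direction alters only the predicted colors, while the geometry stays fixed for a given shape and expression code.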


Project Page and Code

Visit our official project page. Download or fork our implementation on GitHub.


    title={Neural Head Avatars from Monocular RGB Videos},
    author={Grassal, Philip-William and Prinzler, Malte and Leistner, Titus and Rother, Carsten and Nie{\ss}ner, Matthias and Thies, Justus},
    journal={arXiv preprint arXiv:2112.01554},

Learning to Think Outside the Box: Wide-Baseline Light Field Depth Estimation with EPI-Shift

  • Heidelberg University¹
  • Robert Bosch GmbH²
  • TU Dresden³

This paper was published at 3DV 2019 with an oral presentation. Download slides and poster.


Figure: basic idea of EPI-Shift

We propose a method for depth estimation from light field data, based on a fully convolutional neural network architecture. Our goal is to design a pipeline that achieves highly accurate results for both small- and wide-baseline light fields. Since light field training data is scarce, all learning-based approaches use a small receptive field and operate on small disparity ranges. To work with wide-baseline light fields, we introduce the idea of EPI-Shift: virtually shifting the light field stack, which allows us to retain a small receptive field independent of the disparity range. In this way, our approach "learns to think outside the box of the receptive field". Our network performs joint classification of integer disparities and regression of disparity offsets. A U-Net component provides excellent long-range smoothing. EPI-Shift considerably outperforms the state-of-the-art learning-based approaches and is on par with hand-crafted methods. We demonstrate this on a publicly available, synthetic, small-baseline benchmark and on large-baseline real-world recordings.
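The shifting idea can be illustrated with a small NumPy sketch. It is not the published implementation: the network is replaced by a simple photo-consistency score, the regressed offsets are set to zero, `np.roll` wraps around at the image border, and the names `shift_stack` and `combine` are assumptions made for this example.

```python
import numpy as np

def shift_stack(stack, shift):
    """Shift view u of a horizontal view stack (U, H, W) by -shift * (u - center) pixels.

    Afterwards, a scene point with disparity d appears with disparity d - shift,
    so a model with a small disparity range can still handle it.
    """
    center = stack.shape[0] // 2
    return np.stack([np.roll(view, -shift * (u - center), axis=-1)
                     for u, view in enumerate(stack)])

def combine(shifts, scores, offsets):
    """Per pixel, pick the best-scoring integer shift and add its offset.

    scores and offsets have shape (num_shifts, H, W).
    """
    best = np.argmax(scores, axis=0)                              # (H, W)
    offset = np.take_along_axis(offsets, best[None], axis=0)[0]   # (H, W)
    return np.asarray(shifts)[best] + offset

rng = np.random.default_rng(0)
center_view = rng.random((8, 32))
true_disparity = 3
# Synthetic 5-view stack in which every pixel has disparity 3.
stack = np.stack([np.roll(center_view, true_disparity * (u - 2), axis=-1)
                  for u in range(5)])

shifts = np.arange(-4, 5)
# Stand-in for the per-shift classification: negative variance across views
# (high when all views agree after shifting).
scores = np.stack([-shift_stack(stack, s).var(axis=0) for s in shifts])
# A trained network would regress sub-pixel offsets; zeros in this sketch.
offsets = np.zeros_like(scores)

disparity = combine(shifts, scores, offsets)
```

Because the integer shift absorbs the bulk of the disparity, whatever evaluates each shifted stack only needs to cover a residual of about one pixel, regardless of the baseline.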


Figure: experimental results


Download or fork my implementation of EPI-Shift on GitHub.


    title={Learning to think outside the box: Wide-baseline light field depth estimation with EPI-shift},
    author={Leistner, Titus and Schilling, Hendrik and Mackowiak, Radek and Gumhold, Stefan and Rother, Carsten},
    booktitle={2019 International Conference on 3D Vision (3DV)},