Semester and Master Projects at BIWI

We constantly offer interesting and challenging semester and master projects for motivated students at our lab. Below you can find a list of topics that are currently on offer. Not all projects may be listed; if you are generally interested, do not hesitate to contact one of the supervisors, who can also give you an overview of other available projects. You are also welcome to propose your own project ideas.

In this presentation you can find an overview of our research topics and available projects.

Common Topics

Where should the tourist go?

Landmarks are photographed from many viewpoints by international tourists. Flickr provides a rich set of data from all these pictures, yet some views are far less common than others. The aim of this work is to create a tool that visualizes which views are common and suggests new ones.

GOAL: Create an Android app to display a 2D/3D map of the landmark and possible next viewing locations.

The student will be provided with the current 3D viewer. The tasks are:
- create a framework in Android for showing images, point clouds and meshes
- calculate the next best view based on the current state of the art
- create an interface to show the next view to the user
- take a new picture and upload it to a server
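As a rough illustration of the next-best-view step, a minimal coverage heuristic (a sketch only, not the state-of-the-art planner the project would build on) could suggest the viewing angle in the middle of the largest gap between existing tourist views:

```python
def next_best_view(view_angles_deg):
    """Suggest a new viewing angle (in degrees) around a landmark: the
    midpoint of the largest angular gap between existing views. A crude
    coverage heuristic, not a full next-best-view planner."""
    angles = sorted(a % 360.0 for a in view_angles_deg)
    best_gap, best_mid = -1.0, 0.0
    for i, a in enumerate(angles):
        nxt = angles[(i + 1) % len(angles)]
        gap = (nxt - a) % 360.0 or 360.0        # wrap-around gap
        if gap > best_gap:
            best_gap, best_mid = gap, (a + gap / 2.0) % 360.0
    return best_mid

# Tourist views cluster near the front of the landmark -> suggest the back.
print(next_best_view([350, 0, 10, 20]))   # 185.0
```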


Interested in cool 3D topics? Contact me! hayko[AT]vision.ee.ethz.ch

Supervisor(s):

Dr. Hayko Riemenschneider, ETF E111, Tel.: +41 44 63 20258

Professor:

Luc Van Gool (vangool), ETF C117, Tel.: +41 44 63 26578

Introduction. Due to recent advances in 3D range imaging, highly accurate and large 3D models of real-world environments can easily be obtained and are already available for many city areas. Given structural information about the world, many new opportunities for computer vision (CV) applications in scene understanding arise. Videos are a rich source for capturing and analyzing social activities, human/vehicular traffic and events. This allows for CV applications such as multi-view object tracking, vehicle and pedestrian trajectory analysis, video cutting, and multi-video event and scene summarization. Registration of video data to a 3D world model using visual information is an essential requirement for many of these applications.

Goal. The goal of the project (semester or master project) is to develop a pipeline that enables registration (i.e. finding position and orientation) of all frames of a video with respect to a given 3D (Structure-from-Motion) point cloud.

The starting point consists of the ideas, code and datasets of this paper. The project scope can be adapted for either a semester project or a master thesis.
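As a sketch of the core registration step, assuming 2D-3D correspondences between a frame and the point cloud are already available (a real pipeline would obtain them by feature matching and add RANSAC plus non-linear refinement), a linear DLT pose estimate looks like this:

```python
import numpy as np

def estimate_pose_dlt(pts3d, pts2d):
    """Linear camera pose via DLT: recover the 3x4 projection matrix P from
    >= 6 2D-3D correspondences by solving the homogeneous system with SVD.
    A real registration pipeline adds feature matching, RANSAC and
    non-linear refinement on top of this."""
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        Xh = [X, Y, Z, 1.0]
        A.append([0.0] * 4 + [-w for w in Xh] + [v * w for w in Xh])
        A.append(Xh + [0.0] * 4 + [-u * w for w in Xh])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)

def reproject(P, pts3d):
    Xh = np.hstack([pts3d, np.ones((len(pts3d), 1))])
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]

# Synthetic sanity check: project known 3D points, then recover the camera.
rng = np.random.default_rng(0)
P_true = np.hstack([np.eye(3), np.array([[0.1], [0.2], [2.0]])])
pts3d = rng.uniform(-1.0, 1.0, (8, 3))
pts2d = reproject(P_true, pts3d)
err = np.abs(reproject(estimate_pose_dlt(pts3d, pts2d), pts3d) - pts2d).max()
print(f"max reprojection error: {err:.1e}")
```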

Requirements. Basic prior knowledge in 3D Multi-View Geometry / Stereo Vision.

If you can program in Matlab and C++ and would like to learn some computer vision, feel free to send a mail.

Type of Work. 40% theory, 60% programming

Supervisor(s):

Till Kroeger, ETF C113.1, Tel.:

Professor:

Luc Van Gool (vangool), ETF C117, Tel.: +41 44 63 26578

Introduction. With the rise of mobile phones, GoPro and Google Glass, vast amounts of video are captured every day. As in digital photography, many users follow a "capture first, filter later" mentality, where little thought is spent on timing, cutting, content and view selection. As a result, such casual videos are too long, shaky, redundant and slow-paced to watch in their entirety. It has therefore become increasingly important to automatically edit and summarize videos. This makes it possible to get an overview of a video and also improves the viewing experience.

Goal of this project. This project aims at improving video summarization by leveraging web information. Given the title of a video, related images are automatically collected using Google Image search, etc. Then, a relevance model is built, in which video frames are compared to the web images and assigned a higher "highlightness" score if they are more similar. This model makes it possible to find highlights even when the video content is difficult to analyze with state-of-the-art event detection. Finally, the highlights are used to create an aesthetic and interesting summary, based on existing temporal segmentation and optimization algorithms.

For more information on video summarization see our previous work here:
http://www.vision.ee.ethz.ch/~gyglim/vsum
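A minimal sketch of the relevance model described above, assuming frames and web images are already represented by feature vectors from some image descriptor (the names and shapes here are illustrative):

```python
import numpy as np

def highlightness(frame_feats, web_feats):
    """Score each video frame by its maximum cosine similarity to the set of
    web images retrieved for the video's title. Inputs are (n, d) arrays of
    feature vectors from some image descriptor."""
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    w = web_feats / np.linalg.norm(web_feats, axis=1, keepdims=True)
    return (f @ w.T).max(axis=1)          # one relevance score per frame

frames = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
web = np.array([[1.0, 0.1]])              # imagery matching the first frame
scores = highlightness(frames, web)
print(scores.argmax())                    # frame 0 scores highest
```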

Supervisor(s):

Michael Gygli, ETF C113.2, Tel.: +41 44 63 26639

Professor:

Luc Van Gool (vangool), ETF C117, Tel.: +41 44 63 26578

Introduction: Image resizing is one of the most common image operations; almost all display and editing software employs it. It is necessary, for example, when we adapt images to display devices of different dimensions, or when we want to explore some region of an image in more detail (e.g. in visual surveillance). Downsampling images usually does not pose a challenge; upsampling, however, is still an open problem. Super-resolution is a technique used to sharpen edges and enrich missing textures in images that have been enlarged by a generic up-scaling process (such as bilinear or bicubic interpolation), thereby delivering a high-quality, high-resolution image. Image super-resolution is normally solved by learning from examples: pairs of high-resolution image patches and the corresponding low-resolution (down-sampled) patches. The performance of the system largely depends on the 'quality' of the image patches collected. However, how to find the most informative and useful image patches is still an open question and has not drawn much attention.
Goal: The goal of the project is to develop an approach that actively selects the most useful images/patches to serve as examples for image super-resolution. Students are provided with code for image super-resolution and active learning.
Requirements: Basic knowledge of vision and machine learning; programming ability.
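As a toy illustration of patch selection, one naive baseline scores patches by gradient energy; the project would replace this hand-crafted score with an active-learning criterion:

```python
import numpy as np

def select_informative_patches(image, patch=8, k=2):
    """Rank non-overlapping patches by gradient energy, a crude proxy for how
    much edge/texture information a training patch carries. The project would
    replace this hand-crafted score with an active-learning criterion."""
    gy, gx = np.gradient(image.astype(float))
    energy = gx ** 2 + gy ** 2
    scores, coords = [], []
    for y in range(0, image.shape[0] - patch + 1, patch):
        for x in range(0, image.shape[1] - patch + 1, patch):
            scores.append(energy[y:y + patch, x:x + patch].sum())
            coords.append((y, x))
    order = np.argsort(scores)[::-1]
    return [coords[i] for i in order[:k]]

img = np.zeros((8, 24))
img[:, 8:] = 1.0   # vertical edge at column 8; the rightmost patch is flat
print(select_informative_patches(img))   # the two patches touching the edge
```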

Supervisor(s):

Dr. Dengxin Dai, ETF C113.1, Tel.: +41 44 63 25426

Professor:

Luc Van Gool (vangool), ETF C117, Tel.: +41 44 63 26578

Description: Microtubules (MTs) are filamentous polymers of tubulin and are essential for many cellular structures and functions: the cytoskeleton, intracellular transport, cell division, proliferation, motility, etc. They are characterized by a very dynamic behavior, switching stochastically between phases of growth and shrinkage at the MT ends: a mechanism called dynamic instability. In vivo, this feature of MTs is highly regulated in space and time. Perturbations of dynamic instability or its regulation lead to serious diseases such as cancer or neurodegenerative disorders. As macromolecular cellular entities, MTs are well studied. However, modeling MT dynamic instability at the atomic scale is as yet unattainable due to the large size of MTs and the large, physiologically relevant time scales involved. In addition to this limitation of modeling and simulation of large biomolecular assemblies, our understanding of dynamic instability faces a conceptual challenge: deciphering how the different spatial and temporal scales determine dynamic instability.

The research project involves development and implementation of mechano-chemical models for MT dynamic instability. The structural and mechanical properties will be based on elastic networks. The stochastic biochemical modifications of the MTs will be explicitly coupled with the structural and mechanical description. The originality of our approach lies in a genuine coupling between these two aspects, involving a careful interaction between deterministic and stochastic simulation algorithms. At a later stage of the project, connections between the atomic nanoscale level and the emerging macroscale mechanical properties of MTs will be explored by coarse-graining techniques. The project is planned to last for 6-8 months and will involve strong collaboration with internationally renowned experimental research groups from ETH Zurich and the Paul Scherrer Institute (PSI).
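For intuition, dynamic instability is often introduced with a minimal two-state model: the tip grows or shrinks at constant speed and switches state stochastically. A sketch with illustrative (not physiological) parameter values:

```python
import random

def simulate_dynamic_instability(t_end, v_grow=1.0, v_shrink=2.0,
                                 k_cat=0.05, k_res=0.1, dt=0.01, seed=0):
    """Minimal two-state model of dynamic instability: the MT tip grows at
    speed v_grow or shrinks at v_shrink and switches state stochastically
    (catastrophe rate k_cat, rescue rate k_res). Parameter values are
    illustrative only; the project couples such a stochastic layer to an
    elastic-network mechanical description."""
    rng = random.Random(seed)
    length, growing = 0.0, True
    trace = []
    for _ in range(round(t_end / dt)):
        length = max(0.0, length + (v_grow if growing else -v_shrink) * dt)
        if rng.random() < (k_cat if growing else k_res) * dt:
            growing = not growing          # stochastic switch of the tip state
        trace.append(length)
    return trace

trace = simulate_dynamic_instability(t_end=100.0)
print(f"final length: {trace[-1]:.2f}, max length: {max(trace):.2f}")
```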

Requirements: solid skills in C++ programming; experience with a general-purpose prototyping language (Python preferred) is an advantage; experience in two or more of the following areas is an advantage: biophysical modeling, stochastic process simulation, computational mechanics; excellent communication skills and fluent English; self-motivation and a team-working spirit, with genuine curiosity for science.

Goals: The goals of the project are development and testing of mechano-chemical MT models, as well as implementation of the models into an existing software package.

Application: Applicants should send a transcript and a CV by email to the address below, with the subject Application for Internship in Modeling and Simulation of Microtubules. The CV should clearly state which curriculum the candidate follows, as well as the information about the start and end dates of education and/or previous employment positions. We also need contact information of at least two potential references.

Supervisor(s):

Dr. Oliwia Szklarczyk, ETF D117, Tel.: +41 44 63 27690

Dr. Grégory Paul, ETF D117, Tel.: +41 44 63 26670

Professor:

Gábor Székely (szekely), ETF C117, Tel.: +41 44 63 25288

Introduction

Reconstructing the 3D geometry of a city is often achieved through multi-view photographic acquisition. This high-quality imagery is, however, rarely fully exploited to generate the color of the virtual city, i.e. its facades' textures. One of the main obstacles is the size of the data, which can easily grow to gigapixel textures for 3D cities. It is therefore important to strive for maximal storage compactness while keeping quality high. Traditional image compression schemes, like the ubiquitous JPEG-2000 format, are not practical for this purpose: they lack the random-access property that makes graphics cards efficient. On the other hand, there exist very efficient procedural texture synthesis methods that are both compact and fast to evaluate, making them very popular in game engines. Some procedural textures can also be learned so as to reproduce input exemplars.

Goal of this project

We aim to study facade texture summarization by taking advantage of modern texture synthesis methods. Gaussian textures are a limited category of textures characterized solely by their power spectrum (and thus compact), and they are very quick to render as procedural textures. We want to investigate to what extent a facade texture can be summarized for realistic synthesis using a mix of procedural Gaussian texture synthesis and classical texture mapping. To answer this question, we will first learn to predict which parts of a facade are faithfully representable by a Gaussian texture. Later on, we will investigate how to summarize a facade given the gathered knowledge.
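For intuition on why Gaussian textures are compact and fast: a sketch of the classic random-phase construction, which stores only the exemplar's power spectrum (a grayscale numpy image is assumed):

```python
import numpy as np

def gaussian_texture(exemplar, seed=0):
    """Random-phase ('Gaussian') texture synthesis: keep the exemplar's
    power spectrum, randomize its Fourier phase. The phase is taken from the
    FFT of white noise so that Hermitian symmetry (hence a real image) is
    preserved. Only the spectrum needs to be stored -- hence the compactness."""
    rng = np.random.default_rng(seed)
    mean = exemplar.mean()
    spectrum = np.abs(np.fft.fft2(exemplar - mean))
    noise_fft = np.fft.fft2(rng.normal(size=exemplar.shape))
    phase = noise_fft / np.abs(noise_fft)
    return np.fft.ifft2(spectrum * phase).real + mean

ex = np.random.default_rng(1).normal(size=(64, 64))
out = gaussian_texture(ex)
ps_in = np.abs(np.fft.fft2(ex - ex.mean()))
ps_out = np.abs(np.fft.fft2(out - out.mean()))
print(np.allclose(ps_in, ps_out, atol=1e-8))   # same spectrum, new image
```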

Interested? Want more information? Looking for another nice topic related to (real-time) texture synthesis? Please contact vanhoey [AT] vision.ee.ethz.ch

Supervisor(s):

Dr. Kenneth Vanhoey, ETF C112, Tel.: +41 44 63 38394

Dr. Hayko Riemenschneider, ETF E111, Tel.: +41 44 63 20258

Professor:

Luc Van Gool (vangool), ETF C117, Tel.: +41 44 63 26578

Introduction

Browsing webcams online can be useful, and sometimes fun too. Several websites therefore offer lists of public webcams or links to them. Determining a webcam's location, unfortunately, requires manual labeling: someone needs to find the webcam provider (e.g. a city's tourist office) and manually report that location. This hinders the rise of automatic large-scale webcam crawlers.

Goal of this project

We have recorded over a year of hourly, timestamped images from thousands of webcams whose locations are unknown. We propose to determine their locations from the image content itself, by detecting day/night cycles in the images and combining them with a model of luminance as a function of time and position on the earth's surface.

Student's task

By programming this model (luminance on the earth, plus a daytime detector for the images), we will investigate how precise such a locator can get, gradually improving it from simple concepts towards more involved ones.
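As a first-order illustration of the idea, longitude alone can be read off the timing of solar noon, which the day/night detector would provide (this sketch ignores the equation of time, which costs a few degrees of accuracy):

```python
def longitude_from_daylight(sunrise_utc, sunset_utc):
    """First-order geolocation cue: the midpoint of the bright period
    approximates local solar noon, which drifts from 12:00 UTC by one hour
    per 15 degrees of longitude. This ignores the equation of time
    (+/- ~15 minutes, i.e. a few degrees of error); day length would
    analogously constrain latitude."""
    solar_noon_utc = (sunrise_utc + sunset_utc) / 2.0
    lon = (12.0 - solar_noon_utc) * 15.0
    return (lon + 180.0) % 360.0 - 180.0      # wrap into [-180, 180)

# A webcam whose images turn bright at 05:15 UTC and dark at 19:45 UTC:
print(longitude_from_daylight(5.25, 19.75))   # -7.5, slightly west of Greenwich
```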

Required skills

Programming: C/C++, Matlab and/or Python
Computer vision & math: basic 3D geometry and image processing.

Interested? Want more information? Please contact vanhoey [AT] vision.ee.ethz.ch

Supervisor(s):

Dr. Kenneth Vanhoey, ETF C112, Tel.: +41 44 63 38394

Dr. Danda Pani Paudel, ETF C112, Tel.: +41 44 63 22774

Professor:

Luc Van Gool (vangool), ETF C117, Tel.: +41 44 63 26578

Introduction

Splash is a Silicon Valley-funded, Berlin-based startup and creator of the world's first mobile app for single-lens 360-degree panoramic video with real-time stitching. 360° cameras are too unfamiliar and costly for most people to jump to the new medium, while the world's two billion smartphone camera users are trained to expect that simple actions lead to instant, high-quality results. Capturing 360° video panoramas therefore needs to be accessible on the smartphone, with the familiarity and effortlessness of taking an ordinary video. Splash aims to achieve this by using a large dataset of panoramic images to auto-complete spaces intelligently.

Goal of this project

Splash's users have recorded more than 50,000 panoramic videos so far. We would like to explore the possibilities of auto-completing such a video with stills from other panoramas. To this end, we will use neural networks to train a visual descriptor for the individual frames that compose a panorama. Using this descriptor, we want to experiment with different ways of selecting frames from other panoramas to complete the current one. Further on, we will investigate different filters and artistic styles that help create a coherent, immersive world.

Context of internship

This joint project is co-supervised by the Computer Vision Laboratory of ETH Zürich and Splash, located in Berlin, Germany. Splash will support the student through access to their datasets and codebase, as well as through supervision by their engineers and technical artists. The preferred location for the internship is Berlin. A small stipend to cover living costs will be provided.

Supervisor(s):

Dr. Kenneth Vanhoey, ETF C112, Tel.: +41 44 63 38394

Maximilian Schneider, Splash HQ, Berlin, Tel.:

Professor:

Luc Van Gool (vangool), ETF C117, Tel.: +41 44 63 26578

Ultrasound elastography is a method that allows the noninvasive measurement of mechanical properties (relative stiffness, Young's modulus, bulk modulus, viscosity, nonlinearity) of in vivo soft tissue, with emerging clinical applications ranging from breast tissue characterization to cardiology. Harmonic elastography involves applying steady-state vibrations at multiple frequencies to tissue in order to measure the aforementioned properties. To evaluate our methods, tissue-mimicking phantoms (gelatin, silicone) with different properties are built and tested on. The aim of this project is to create a simple rheometry setup for measuring the mechanical properties of different tissue-mimicking phantoms. The setup can be as simple as placing the phantom on a scale, applying a weight on top of it and measuring the axial deformation to determine its Young's modulus, or slightly more complex. Phantoms with different materials (gelatin, agar and silicone), sizes and stiffnesses, as well as different inclusion sizes, stiffnesses and positions within the phantom, will be created and tested in the rheometry setup. Furthermore, the effect of different scatterers (flour, cellulose) will also be tested (e.g. signal damping, signal traceability). In the case of a master thesis, experimental results will be compared with finite-element simulation results from ANSYS.

Type of work: Semester Project or Master Thesis; 40% theory, 25% software implementation, 35% experiments.

Requirements: Mechanical design principles; basic Matlab/C++ knowledge; ANSYS knowledge is a plus, but not necessary.
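The simple compression test described above reduces to Hooke's law for uniaxial stress, E = (F/A) / (ΔL/L). A sketch with illustrative numbers for a gelatin phantom:

```python
import math

def youngs_modulus(force_n, area_m2, length_m, delta_l_m):
    """Young's modulus from a uniaxial compression test:
    E = stress / strain = (F / A) / (dL / L)."""
    stress = force_n / area_m2          # Pa
    strain = delta_l_m / length_m       # dimensionless
    return stress / strain

# Illustrative numbers: a 100 g weight (0.981 N) on a 5 cm diameter,
# 4 cm tall gelatin phantom that compresses by 1 mm.
area = math.pi * 0.025 ** 2
E = youngs_modulus(0.981, area, 0.04, 0.001)
print(f"E = {E / 1e3:.1f} kPa")   # ~20 kPa, a plausible gelatin stiffness
```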

Supervisor(s):

Corin Felix Otesteanu, ETF D112, Tel.: +41 44 63 38815

Professor:

Orçun Göksel (ogoksel), ETF C107, Tel.: +41 44 63 22529

Introduction:

Facial expression recognition is useful for analyzing human emotion, and is an interesting blend of psychology and technology. In general, a facial expression recognition algorithm detects faces within a photo or video and analyzes the relationships between points on the face. In the end, a sequence of face motion is typically classified into one of seven main categories: Happiness, Sadness, Anger, Fear, Surprise, Disgust and Neutral.

Goal of this project:

This project aims to analyze dynamic temporal expressions in a facial data corpus recorded in real-world environments. To capture how face shape and appearance change over a video sequence, we propose to develop a dynamic version of the Active Appearance Model. This model is then fed into the recently popular deep neural networks to achieve good classification of the different facial expressions.

Student's task:

To achieve this goal, the student is expected to study the Active Appearance Model in order to extract the dynamic information in a face sequence. Later, these features are used (along with deep features) within a new version of deep networks that operates on the specified data model in the non-Euclidean domain.
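As a toy illustration of turning landmark trajectories into a fixed-length dynamic descriptor (a stand-in for the dynamic AAM features, with made-up shapes):

```python
import numpy as np

def trajectory_features(landmarks):
    """Turn a (T, n_points, 2) sequence of landmark positions into a
    fixed-length dynamic descriptor: per-point mean and peak motion energy.
    A toy stand-in for the dynamic AAM features the project would feed
    into a deep network."""
    disp = np.diff(landmarks, axis=0)        # frame-to-frame displacement
    energy = np.linalg.norm(disp, axis=2)    # (T-1, n_points)
    return np.concatenate([energy.mean(axis=0), energy.max(axis=0)])

T, n = 10, 5
rng = np.random.default_rng(0)
static = np.tile(rng.normal(size=(1, n, 2)), (T, 1, 1))    # frozen face
moving = static + rng.normal(scale=0.5, size=(T, n, 2))    # expressive face
print(trajectory_features(moving).mean() > trajectory_features(static).mean())
```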

Required skills:

Programming:

C/C++, Matlab and/or Python

Computer vision & math:

image processing, statistical modeling and deep learning.

Supervisor(s):

Dr. Zhiwu Huang, ETF C112, Tel.: +41 44 63 39287

Dr. Danda Pani Paudel, ETF C112, Tel.: +41 44 63 22774

Professor:

Luc Van Gool (vangool), ETF C117, Tel.: +41 44 63 26578

Intro:

Deep architectures have proven successful for various image recognition tasks, such as image classification, object recognition and segmentation.

However, for commercial applications, efficiency plays a key role.

Goal:

The goal of this project is to explore new architectures that can be more efficient, either in terms of runtime or memory complexity. The objective is to automatically learn an efficient model, or to devise methods to speed up existing architectures.
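One concrete efficiency lever such a project could start from is factorizing convolutions; counting parameters shows the saving of the depthwise-separable factorization popularized by MobileNet-style architectures (the numbers below are illustrative):

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1x1 pointwise convolution,
    the factorization used by efficiency-oriented architectures."""
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 256, 256, 3
std = conv_params(c_in, c_out, k)
sep = separable_conv_params(c_in, c_out, k)
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```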

Requirements:

Programming knowledge of Python and/or C/C++; familiarity with computer vision, machine learning, and deep learning; self-motivation and team-working skills.

We also have more available projects which are not listed yet. For more information, please contact Eirikur Agustsson ( aeirikur@vision.ee.ethz.ch ) or Radu Timofte ( timofter@vision.ee.ethz.ch )

Supervisor(s):

Eirikur Agustsson, ETF D115, Tel.: +41 44 63 29420

Dr. Radu Timofte, ETF C107, Tel.: +41 44 63 25279

Professor:

Luc Van Gool (vangool), ETF C117, Tel.: +41 44 63 26578

Intro:

As the saying goes: a picture is worth a thousand words.

Given a picture of a person, there are hundreds of attributes that can be predicted and analyzed: age, gender, body shape, clothes, skin color, hairstyle, attractiveness, expressions, gestures, impressions, personality traits, etc.

Demo:

www.howhot.io

Goal:

The goal of this project is to tackle new problems in this domain, using deep neural networks on recently collected datasets.

Requirements:

Programming knowledge of Python and/or C/C++; familiarity with computer vision, machine learning, and deep learning; self-motivation and team-working skills

We also have more available projects which are not listed yet. For more information, please contact us!

Supervisor(s):

Eirikur Agustsson, ETF D115, Tel.: +41 44 63 29420

Dr. Radu Timofte, ETF C107, Tel.: +41 44 63 25279

Professor:

Luc Van Gool (vangool), ETF C117, Tel.: +41 44 63 26578

Intro:

Recent years have shown large improvements (due to new schemes based on dense patch matching, total-variation formulations and end-to-end deep learning) in optical flow estimation, that is, the estimation of per-pixel motion from one frame to the next in a video sequence. The robustness and accuracy of the estimates have improved, as has runtime efficiency. For applications, there is large interest in methods with low time and memory complexity. Unfortunately, such methods still fall far short of the accuracies achieved by the state of the art.

Goal:

The goal of this project is to assess the current status of efficient optical flow methods and to analyze ways to further improve their performance, so as to bridge the gap between low complexity and high accuracy.

An optional branch of this work is the development of techniques that allow an optical flow algorithm to assess its own success, i.e. failure detection and/or probabilistic output.
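A minimal sketch of such self-assessment is the forward-backward consistency check: warp the backward flow along the forward flow and flag pixels with a large round-trip displacement (nearest-neighbour lookup here for brevity; real code would interpolate):

```python
import numpy as np

def forward_backward_error(flow_fwd, flow_bwd):
    """Self-assessment for optical flow: follow the forward flow, look up the
    backward flow there, and measure the round-trip displacement. Large
    values flag pixels where the estimate is likely unreliable (occlusions,
    mismatches). Nearest-neighbour lookup keeps the sketch short."""
    h, w = flow_fwd.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    x2 = np.clip(np.round(xs + flow_fwd[..., 0]).astype(int), 0, w - 1)
    y2 = np.clip(np.round(ys + flow_fwd[..., 1]).astype(int), 0, h - 1)
    round_trip = flow_fwd + flow_bwd[y2, x2]   # ~0 where flows are consistent
    return np.linalg.norm(round_trip, axis=2)

# A consistent constant motion of one pixel to the right:
fwd = np.zeros((4, 4, 2))
fwd[..., 0] = 1.0
err = forward_backward_error(fwd, -fwd)
print(err.max())   # 0.0: every pixel passes the consistency check
```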

Requirements:

Programming knowledge of Python and/or C/C++; familiarity with computer vision, machine learning, and deep learning; self-motivation and team-working skills

We also have more available projects which are not listed yet. For more information, please contact us!

Supervisor(s):

Till Kroeger, ETF C113.1, Tel.:

Dr. Radu Timofte, ETF C107, Tel.: +41 44 63 25279

Professor:

Luc Van Gool (vangool), ETF C117, Tel.: +41 44 63 26578

Intro:

Recent years have shown large improvements in the upscaling of low-resolution images to higher, more detailed resolutions, a task known as single-image super-resolution (SISR). Videos are generally tackled by naively applying SISR methods frame by frame, by often high-complexity methods involving pixel-level motion and blur estimation and optimization, or by assuming the presence of high-resolution keyframes in the video that are used to adapt the SISR model to the local video content.

With the advances in displays, camera sensors and storage capacity, there is a continuous need for computationally efficient and accurate upscaling, both of online streams and of videos recorded when low-resolution displays were the norm.

Goal:

The goal of this project is to develop an efficient video super-resolution method with low complexity and high accuracy starting from the recent advances in SISR, optical flow, and deep learning.
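As the baseline to improve upon, the naive frame-wise approach applies a single-image upscaler to each frame independently (nearest-neighbour stands in for a real SISR model in this sketch):

```python
import numpy as np

def upscale_nearest(frame, s):
    """Trivial single-image upscaler (nearest neighbour), standing in for a
    real SISR model."""
    return np.repeat(np.repeat(frame, s, axis=0), s, axis=1)

def video_sr_framewise(frames, s):
    """Frame-wise baseline: apply the single-image upscaler to every frame
    independently. The project aims to beat this by using optical flow to
    aggregate sub-pixel information from neighbouring frames."""
    return [upscale_nearest(f, s) for f in frames]

lr = [np.arange(4.0).reshape(2, 2) for _ in range(3)]
hr = video_sr_framewise(lr, 2)
print(hr[0].shape)   # (4, 4)
```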

Requirements:

Programming knowledge of Python and/or C/C++; familiarity with computer vision, machine learning, and deep learning; self-motivation and team-working skills

We also have more available projects which are not listed yet. For more information, please contact us!

Supervisor(s):

Till Kroeger, ETF C113.1, Tel.:

Dr. Radu Timofte, ETF C107, Tel.: +41 44 63 25279

Professor:

Luc Van Gool (vangool), ETF C117, Tel.: +41 44 63 26578

Intro:

Switzerland is home to thousands of plant species... and to millions of people and a broad network of highways and roads.

While each individual has a residence address, plants do not. Accurate plant species identification is usually the preserve of a few specialized botanists.

Some plants are of particular interest: endangered species, species of economic value, invasive species, newly introduced species to the ecosystem.

Goal:

The goal of this project is to automate the mapping of plant species from images and videos recorded along highways (such as Google Street View imagery).

We will collaborate with Dr. Michael Nobis, macroecologist at WSL - Swiss Federal Institute for Forest, Snow and Landscape Research.

Requirements:

Programming knowledge of Python and/or C/C++; familiarity with computer vision, machine learning, and deep learning; self-motivation and team-working skills

We also have more available projects which are not listed yet. For more information, please contact us!

Supervisor(s):

Dr. Radu Timofte, ETF C107, Tel.: +41 44 63 25279

Professor:

Luc Van Gool (vangool), ETF C117, Tel.: +41 44 63 26578

Introduction:

Autocalibration, i.e. retrieving a camera's intrinsic parameters from image correspondences, is a non-linear and challenging problem for a moving camera. Most approaches rely on the ubiquitous nature of the so-called Absolute Conic: a special conic lying on the plane at infinity. This project aims at robustly estimating the Absolute Conic within a global optimization framework.

Goal of this project:

Camera intrinsics are extracted from the Fundamental matrix via Absolute Conic estimation. The first part of the project focuses on deriving various mathematical conditions suitable for convex optimization. Later, these conditions are used to develop a robust, optimal and practical autocalibration method.
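For intuition, here is a simplified relative of the Absolute Conic approach: when only the focal length is unknown (square pixels, principal point at the image center assumed), a focal hypothesis f can be scored by how close E = KᵀFK comes to a valid essential matrix, whose two non-zero singular values must be equal. A synthetic sketch:

```python
import numpy as np

def focal_residual(F, f):
    """Score a focal-length hypothesis f for a fundamental matrix F: with
    K = diag(f, f, 1) (square pixels, centered principal point), E = K^T F K
    is a valid essential matrix only if its two non-zero singular values are
    equal. Smaller residual = better hypothesis."""
    K = np.diag([f, f, 1.0])
    s = np.linalg.svd(K.T @ F @ K, compute_uv=False)
    return (s[0] - s[1]) / s[1]

def skew(t):
    return np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])

# Synthetic two-view geometry with a known focal length of 500 pixels.
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])   # small rotation about y
E = skew([1.0, 0.2, 0.1]) @ R                      # E = [t]_x R
K_inv = np.linalg.inv(np.diag([500.0, 500.0, 1.0]))
F = K_inv.T @ E @ K_inv
cands = np.arange(100.0, 1000.0, 10.0)
best = cands[np.argmin([focal_residual(F, f) for f in cands])]
print(best)   # 500.0
```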

Required skills:

  • Interest in dynamic programming and convex optimization
  • Strong mathematical background
  • Good at Matlab/Python/C++

Supervisor(s):

Dr. Danda Pani Paudel, ETF C112, Tel.: +41 44 63 22774

Professor:

Luc Van Gool (vangool), ETF C117, Tel.: +41 44 63 26578

Supervisor(s):

Firat Özdemir, ETF C111, Tel.: +41 44 63 27685

Fabien Péan, ETF C111, Tel.: +41 44 63 27632

Christine Tanner (tannerch), ETF C108, Tel.: +41 44 63 26246

Professor:

Luc Van Gool (vangool), ETF C117, Tel.: +41 44 63 26578

Intro:

The recent Deep Learning revolution has been fueled by large datasets, such as the ImageNet database. However, such datasets are extremely expensive to manually annotate. There is an increasing interest in developing novel deep learning techniques based on low-cost data.

Goal:

The goal of this project is to explore methods to automatically construct large datasets for image classification via web crawling, with minimal manual annotation and supervision. Furthermore, you will explore techniques to learn competitive deep neural networks on such data, dealing with label noise and exploiting the accompanying crawled meta-information.
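As a toy illustration of dealing with label noise, one simple baseline keeps only samples whose crawled label agrees with their nearest neighbours in feature space (illustrative data; the project would explore learned, noise-robust alternatives):

```python
import numpy as np

def filter_noisy_labels(feats, labels, k=3):
    """Consensus cleaning for web-crawled data: keep a sample only if the
    majority of its k nearest neighbours in feature space share its crawled
    label. A simple stand-in for learned noise-robust training losses."""
    d = np.linalg.norm(feats[:, None] - feats[None, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a sample is not its own neighbour
    keep = []
    for i in range(len(feats)):
        nn = np.argsort(d[i])[:k]
        keep.append((labels[nn] == labels[i]).sum() * 2 > k)
    return np.array(keep)

# Two clusters of crawled images; sample 3 carries the wrong label.
feats = np.array([[0., 0], [0, 1], [1, 0], [1, 1],
                  [10, 10], [10, 11], [11, 10], [11, 11]])
labels = np.array([0, 0, 0, 1, 1, 1, 1, 1])
keep = filter_noisy_labels(feats, labels)
print(keep)   # sample 3 (mislabeled) is the only one rejected
```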

Requirements:

Programming knowledge of Python, familiarity with computer vision, machine learning and deep learning; motivation and collaboration skills. Experience with Tensorflow is a plus.

Supervisor(s):

Eirikur Agustsson, ETF D115, Tel.: +41 44 63 29420

Dr. Wen Li, ETF C112, Tel.: +41 44 63 25281

Limin Wang (limin.wang), ETF D113.1, Tel.: +41 44 63 20566

Professor:

Luc Van Gool (vangool), ETF C117, Tel.: +41 44 63 26578