A stereo vision system for traffic lane detection and driver assistance

Project overview

The objective of this work was to create a prototype automatic lane detection system that estimates the vehicle's position within the current lane using a stereo vision system fixed to the vehicle chassis, for lane departure warning and trajectory adjustment (lane keeping assistance). I was the main developer of the lane detection algorithm, and I participated in and coordinated the development of a stereo vision hardware-software testbed at the Systems and Control Lab, MTA-SZTAKI, Budapest, Hungary.

Part of this work was carried out within the project "Autonomous vehicle control" of the Advanced Vehicles and Vehicle Control Knowledge Center (EJJT) (OMFB-01418/2004), funded by the Hungarian National Office for Research and Technology.

Lane detection algorithm

The developed stereo lane detection algorithm produces a 3D reconstruction of the road plane and the lane shape in real time, without a dense pixel-by-pixel reconstruction of the scene. Using this reconstruction, the algorithm determines the pose of the vehicle within the lane. The method complements image processing and stereo vision techniques with application-specific methods, such as lane profile fitting and the consideration of vehicle dynamics. Vehicle pitching, height variations and roll motion of the chassis are implicit in the transformation between the stereo system and the road computed by the algorithm.
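To illustrate the kind of computation involved: once lane-marking features have been stereo-matched and triangulated, the road plane can be fitted to the resulting sparse 3D point set, and the camera height, pitch and roll then follow from the fitted plane. Below is a minimal sketch of such a least-squares plane fit in C++ with OpenCV; the function and variable names are illustrative and do not reproduce the project's actual code.

```cpp
// Minimal sketch: fit the road plane n^T X + d = 0 to sparse 3D points
// triangulated from stereo-matched lane markings. Illustrative only;
// names and structure are not taken from the project code.
#include <opencv2/core.hpp>
#include <vector>

static void fitRoadPlane(const std::vector<cv::Point3f>& pts,
                         cv::Vec3f& n, float& d)
{
    // Centroid of the point set; the fitted plane passes through it.
    cv::Point3f c(0.f, 0.f, 0.f);
    for (const cv::Point3f& p : pts) c += p;
    c *= 1.0f / static_cast<float>(pts.size());

    // Stack the centered points; the plane normal is the right
    // singular vector belonging to the smallest singular value.
    cv::Mat A(static_cast<int>(pts.size()), 3, CV_32F);
    for (int i = 0; i < A.rows; ++i) {
        A.at<float>(i, 0) = pts[i].x - c.x;
        A.at<float>(i, 1) = pts[i].y - c.y;
        A.at<float>(i, 2) = pts[i].z - c.z;
    }
    cv::SVD svd(A, cv::SVD::MODIFY_A);
    n = cv::Vec3f(svd.vt.at<float>(2, 0),
                  svd.vt.at<float>(2, 1),
                  svd.vt.at<float>(2, 2));
    d = -n.dot(cv::Vec3f(c.x, c.y, c.z));
}
```

With the plane expressed in a camera-centered frame, |d| gives the camera height above the road (the rows of svd.vt are unit vectors), and the tilt of n away from the vertical axis gives pitch and roll. In a running system these raw per-frame estimates would be filtered over time using the vehicle dynamics mentioned above.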

The algorithm was tested on stereo videos recorded in traffic with cameras fixed to the side mirrors of a test car. The test car and the cameras were provided by ThyssenKrupp Presta Hungary. The lane detection algorithm relies on a preliminary far-range camera calibration, described below.

Figure. (1) Test vehicle for video recording, (2) intermediate stereo matching of detected stripe models, and lane detection results during (3) a lane change, (4) night driving and (5) a highway curve.

Far-range stereo camera calibration for vehicular applications

Calibration of stereo vision systems used in far-range applications requires special care. Close-range calibration objects are unsuitable for far-range applications, as the estimated camera pose is optimized for nearby features, and small errors in feature locations can lead to large errors in far-range 3D reconstruction. Planar arrangements with markers on the ground are typical for camera pose estimation, provided that the internal parameters of the cameras are known in advance. However, far-range arrangements
  • are difficult to engineer and localize precisely, even with laser-based distance meters, and the arrangement may easily deviate from planar. Thus, we take inaccuracies in the 3D marker arrangement into consideration, and formulate a Maximum Likelihood Estimate (MLE) for the camera poses. The proposed optimization method adjusts the 3D structure (marker arrangement) together with the camera parameters, similarly to Bundle Adjustment in Structure-from-Motion (SfM), but it co-minimizes errors in the images and errors in the 3D measurements, weighted by uncertainties estimated from our measurement procedure (a schematic form of this cost is given after the list).

  • typically consist of only a few (tens of) markers, whereas the number of keypoints can easily exceed 1000 in intrinsic calibration. Thus, camera poses estimated from the few markers may be far from reality without any indication in terms of residual errors (overfitting, poor generalization from the observed data). We address this by decoupling the stereo pose problem into an inter-camera and a rig-to-world pose problem. We extract a high number of stereo feature matches from on-line videos recorded with the same cameras, and use these in the off-line pose estimation procedure to estimate the inter-camera pose via the essential matrix (see the code sketch after the list). The far-range arrangement is then used only for rig-to-world pose estimation. We also show how the far-range arrangement can be used to fine-tune intrinsic parameters, even after an elaborate intrinsic calibration.
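A schematic form of the joint cost from the first point above, with notation introduced here only for illustration: camera poses $P_c$ and marker positions $X_i$ are adjusted together, weighting image residuals and 3D residuals by their respective covariances.

$$
\min_{\{P_c\},\,\{X_i\}} \; \sum_i \left[ \sum_{c \in \{l,r\}} \big(x_{ic} - \pi(P_c, X_i)\big)^{\top} \Sigma_{ic}^{-1} \big(x_{ic} - \pi(P_c, X_i)\big) \;+\; \big(X_i - \bar{X}_i\big)^{\top} \Lambda_i^{-1} \big(X_i - \bar{X}_i\big) \right]
$$

where $\pi(P_c, X_i)$ projects marker $i$ into camera $c$, $x_{ic}$ is its detected image location, $\bar{X}_i$ its measured 3D location, and $\Sigma_{ic}$, $\Lambda_i$ are the image-space and 3D measurement covariances.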
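For the second point, a sketch of the inter-camera pose step is given below using modern OpenCV calls (cv::findEssentialMat and cv::recoverPose, OpenCV 3.0+). The original off-line procedure predates these functions; they merely stand in for the same idea, and a single shared intrinsic matrix K is assumed here for brevity.

```cpp
// Sketch: inter-camera pose from many stereo feature matches gathered
// from on-line videos. Assumes both cameras share intrinsics K; the
// actual system would use each camera's own calibrated intrinsics.
#include <opencv2/calib3d.hpp>
#include <vector>

void estimateInterCameraPose(const std::vector<cv::Point2f>& ptsLeft,
                             const std::vector<cv::Point2f>& ptsRight,
                             const cv::Mat& K,        // 3x3 intrinsics
                             cv::Mat& R, cv::Mat& t)  // relative pose
{
    cv::Mat inlierMask;
    // RANSAC discards mismatched feature pairs from the videos.
    cv::Mat E = cv::findEssentialMat(ptsLeft, ptsRight, K,
                                     cv::RANSAC, 0.999, 1.0, inlierMask);
    // Choose the (R, t) decomposition of E that places the triangulated
    // points in front of both cameras. Note that t is recovered only up
    // to scale; the far-range marker arrangement (or the known stereo
    // baseline) must fix the metric scale afterwards.
    cv::recoverPose(E, ptsLeft, ptsRight, K, R, t, inlierMask);
}
```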

Figure. (1) Markers detected in images taken of a planar calibration arrangement of 24 markers up to 40 meters from the car (click for 3D uncertainties). The 3D marker locations are measured with a sophisticated procedure using laser-based distance meters; even so, localization covariances remain high at far range, whereas localization in the images is precise. (2) Measured and reconstructed marker locations (click to enlarge). (3) A high number of stereo matches extracted from on-line videos are incorporated into the off-line inter-camera pose estimation procedure via fundamental/essential matrix computation.

Hardware platform development

The "eye" of the system is an Elphel333 network camera with an embedded FPGA and Axis processor running Linux. The camera, in its original configuration, could be accessed via Ethernet, through a browser interface from a PC to grab still images, access camera settings or start a streamer. An RTSP-capable media player could be then used to see a continous video stream from the camera remotely. The full camera configuration was relatively complex, including an MJPEG encoder in the FPGA, several Linux kernel-mode drivers, user-mode daemons and utility programs running in the camera.

As a result of around a year of development, we transformed two such network cameras into a synchronized stereo vision system capable of real-time stereo image processing. This transformation included a complete redesign of the FPGA configuration in Verilog, new Linux drivers (I/O, DMA and interrupt handling), and a user-level camera software API to access the FPGA and the CMOS vision chip. The new system can be configured, and can stream frames, using our proprietary protocols over TCP/IP.

Figure. The Elphel333 camera hardware platform. It has been completely reconfigured in several stages, including a new Linux software distribution, a new FPGA configuration (in Verilog), Linux drivers, embedded software (in C), TCP/IP video streaming and host-side acquisition software (in C++). See also the hardware block diagram.

Figure. Stereo vision system for demonstrating real-time stereo image processing algorithms. The system is based on the completely reconfigured and reprogrammed Elphel333 cameras.

Figure. Software (left) and FPGA hardware components (right) developed by our group for real-time stereo image processing based on the Elphel333 camera. Click to enlarge.
Figure. Demo outputs of our C++ frame client program, which synchronizes and processes corresponding stereo frames served independently in a shared memory area (stereo frame buffer) by two frame server processes. The frame client is a multi-threaded application using Simple DirectMedia Layer (SDL) and FreeType 2 for rendering and OpenCV for processing. A frame server can grab consecutive frames from a video file (using the libavcodec, libavformat and libavutil libraries) or from a reconfigured Elphel333 camera. Left: two input source frames. Right: the outputs of OpenCV's Canny edge detection in the frame client. In this example, the same video is fed to both frame servers to check synchronization. Click to enlarge.
Figure. Inter-Process Communication (IPC) mechanism between the frame server and frame client processes in the single input stream (left) and stereo stream (right) cases. Click to enlarge.
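To make the pairing mechanism concrete, here is a minimal sketch of a shared-memory stereo frame buffer of the kind shown in the figure: each frame server stamps its slot with a sequence counter, and the frame client accepts a stereo pair only when the two counters agree. The layout, names and seqlock-style check are illustrative assumptions, not the project's actual protocol.

```cpp
// Sketch of the shared stereo frame buffer (illustrative layout, not
// the project's actual protocol). Two frame-server processes each
// fill one slot; the frame client pairs frames by sequence number.
#include <atomic>
#include <cstdint>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

struct FrameSlot {
    std::atomic<uint64_t> seq;   // bumped by the server after each write
    uint64_t timestamp_us;       // capture time of the frame
    uint32_t width, height;
    uint8_t  pixels[640 * 480];  // fixed-size grayscale payload here
};

struct StereoFrameBuffer {
    FrameSlot left, right;
};

// Map the POSIX shared-memory object that the servers also map.
// The name "/stereo_fb" below is a placeholder, not the project's.
StereoFrameBuffer* openStereoBuffer(const char* name /* e.g. "/stereo_fb" */)
{
    int fd = shm_open(name, O_RDWR, 0666);
    if (fd < 0) return nullptr;
    void* p = mmap(nullptr, sizeof(StereoFrameBuffer),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? nullptr : static_cast<StereoFrameBuffer*>(p);
}

// Client side: accept a pair only if both slots carry the same
// sequence number, and re-check after copying (seqlock style) to
// discard frames that were overwritten mid-read.
bool tryReadPair(StereoFrameBuffer* fb,
                 std::vector<uint8_t>& left, std::vector<uint8_t>& right)
{
    const uint64_t sl = fb->left.seq.load(std::memory_order_acquire);
    const uint64_t sr = fb->right.seq.load(std::memory_order_acquire);
    if (sl == 0 || sl != sr) return false;    // streams not aligned yet
    left.assign(fb->left.pixels,  fb->left.pixels  + sizeof fb->left.pixels);
    right.assign(fb->right.pixels, fb->right.pixels + sizeof fb->right.pixels);
    return fb->left.seq.load(std::memory_order_acquire)  == sl &&
           fb->right.seq.load(std::memory_order_acquire) == sr;
}
```

A consumer loop would poll tryReadPair (or, in a real implementation, wait on a semaphore) and hand matched pairs to the processing threads, such as the Canny demo described above.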

References