With SIA VISION 2018 (Salon International de l’Automobile) in mind, Lynred wanted to demonstrate the relevance of integrating infrared technologies into autonomous vehicles. To do so, Lynred targeted a very concrete use case: pedestrian detection, a challenge that looms large when imagining the era of autonomous vehicles.
Neovision built a dataset of previously unpublished data (infrared, RGB, and composite images), then exploited it: Neovision designed and developed Deep Learning algorithms capable of detecting pedestrians in all lighting conditions.
As a result of Neovision’s work, Lynred was able to present an industrial applied-research paper at SIA VISION 2018. The paper highlights the value of combining infrared imaging with Deep Learning algorithms to detect pedestrians, thus opening doors for Lynred to the automotive market.
Although autonomous vehicles embed many perception technologies (optical sensors, radars, lidars, etc.), one problem remains unsolved: what happens when visibility drops sharply? The above-mentioned technologies become ineffective.
From this observation an idea was born: why not integrate infrared sensors? These make pedestrians stand out thanks to their thermal signature. In addition, unlike conventional cameras, infrared remains insensitive to variations in brightness.
However, infrared is not a miracle solution. When temperatures rise, it becomes difficult to distinguish a person standing in front of a hot surface. For this reason, Neovision decided to combine the infrared sensor with a conventional RGB camera, so that pedestrians can be recognized in all lighting conditions.
The data acquisition setup was thus defined; all that remained was to acquire the data. To do so, Neovision installed the device on a vehicle that drove through the streets of Grenoble, day and night. The two sensors simultaneously recorded aligned visible and infrared images. In the end, Neovision had about 6 hours of footage, from which 5,508 images were selected and annotated by hand (a task as crucial as it was tedious). This annotation was performed with the utmost care on multispectral images, obtained by superimposing the visible and infrared images.
With the data structured and correctly annotated, all that remained was to exploit it. To do this, Neovision turned to CNNs (Convolutional Neural Networks), and more specifically to the RetinaNet architecture, part of the SSD (Single Shot Detector) family. This solution was chosen for its simplicity and state-of-the-art results. More precisely, the selected architecture is RetinaNet with a ResNet-50 backbone pre-trained on the COCO dataset.
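RetinaNet’s single-shot design owes much of its accuracy to the focal loss, which down-weights easy examples so that training focuses on hard ones. A minimal numpy sketch of the binary case (the function name and default parameters are illustrative, not Lynred’s exact training code):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for binary classification.

    p: predicted probability of the positive class
    y: ground-truth label (1 = pedestrian, 0 = background)
    """
    # Probability assigned to the true class
    p_t = np.where(y == 1, p, 1.0 - p)
    # Class-balancing weight
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

With gamma = 0 and alpha = 0.5 this reduces to (half) the ordinary cross-entropy; increasing gamma makes confident, easy detections contribute almost nothing to the gradient.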
Since this architecture only takes visible images as input, the infrared images were converted to RGB (Red, Green, Blue) images by mapping them through the inferno colormap. Neovision then resized these infrared images to match the size of the visible images and merged the two to obtain multispectral images. Neovision therefore had three datasets (visible, infrared, and multispectral) on which to train the Deep Learning algorithms.
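The conversion and fusion steps can be sketched in a few lines of numpy and matplotlib. The array sizes, the 2× upsampling factor, and the simple averaging fusion are illustrative assumptions, not Lynred’s exact pipeline:

```python
import numpy as np
import matplotlib.cm as cm

# Hypothetical aligned frames (sizes are illustrative)
ir = np.random.rand(60, 80)             # single-channel infrared frame
visible = np.random.rand(120, 160, 3)   # RGB frame at twice the resolution

# 1. Normalize the infrared frame to [0, 1] and map it through
#    the inferno colormap to obtain a 3-channel RGB image.
ir_norm = (ir - ir.min()) / (ir.max() - ir.min())
ir_rgb = cm.inferno(ir_norm)[..., :3]   # drop the alpha channel -> HxWx3

# 2. Upsample the infrared image to the visible resolution
#    (nearest-neighbor, integer factor here).
ir_up = ir_rgb.repeat(2, axis=0).repeat(2, axis=1)

# 3. Fuse both modalities into a single multispectral image
#    (plain averaging, as an assumed fusion rule).
multispectral = 0.5 * visible + 0.5 * ir_up
```

The resulting multispectral array has the same shape as a standard RGB image, so it can be fed to a detector that expects visible input.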
Following the training, Neovision validated the results. As expected, while visible imaging excels during the day and infrared at night, the multispectral approach takes the best of both technologies: the algorithms obtained their best results on these images, improving average precision by 11%, day and night!
Despite a limited dataset, this work shows that adding an infrared sensor to a visible camera significantly improves pedestrian detection. A way to innovate without reinventing the wheel!
COMPUTER VISION, DEEP LEARNING, R&D
16 October 2020