With SIA VISION 2018 (Salon International de l’Automobile) in mind, Lynred wanted to demonstrate that integrating infrared technologies into autonomous vehicles could be relevant. To do so, Lynred targeted a very concrete use case: pedestrian detection, a question that naturally arises when projecting ourselves into the era of autonomous vehicles.
Neovision built a database of previously unpublished data (infrared, RGB and composite images) and then exploited this dataset. Subsequently, Neovision designed and developed Deep Learning algorithms capable of detecting pedestrians in all lighting conditions.
As a result of Neovision’s work, Lynred was able to present an industrial applied research paper at SIA Vision 2018. This paper highlights the value of using infrared and applying Deep Learning algorithms to detect pedestrians. Lynred thus opened the door to the automotive market.
While autonomous vehicles embed many perception technologies (optical sensors, radars, lidars, etc.), one problem remains unanswered: what happens when visibility drops sharply? The above-mentioned technologies become ineffective.
From this observation an idea was born: why not integrate infrared sensors? These can make pedestrians stand out thanks to their thermal signature. In addition, and unlike conventional cameras, infrared remains insensitive to variations in brightness.
However, infrared is not a miracle solution. When temperatures rise, it becomes difficult to distinguish a human standing in front of a hot surface. For this reason, Neovision decided to combine the infrared sensor with a conventional RGB camera. This way, pedestrians can be recognized in all light conditions.
The data acquisition device was therefore well defined. All that was left to do was to acquire the data. To do so, Neovision installed the device on a vehicle which drove in the streets of Grenoble day and night. The two sensors simultaneously recorded aligned visible and infrared images. In the end, Neovision had about 6 hours of image capture time from which 5508 images were selected and annotated by hand (a task as crucial as it was tedious). This annotation was performed with the utmost care on multispectral images, obtained by superimposing visible and infrared images.
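The superimposition of aligned visible and infrared frames described above can be sketched as a simple weighted blend. This is a minimal illustration in numpy: the 50/50 weighting and the `superimpose` helper are assumptions for the sketch, as the article does not specify how the overlay used for annotation was actually composed.

```python
import numpy as np

def superimpose(visible, infrared, alpha=0.5):
    """Blend an aligned visible frame (H, W, 3) with a single-channel
    infrared frame (H, W) into one multispectral overlay for annotation.

    The 0.5 weighting is illustrative; the actual blend used for the
    annotation campaign is not specified in the article."""
    ir_rgb = np.repeat(infrared[..., np.newaxis], 3, axis=-1)  # grey IR -> 3 channels
    return alpha * visible + (1.0 - alpha) * ir_rgb

# Toy frames: a white visible image and a black IR image blend to mid-grey.
vis = np.full((4, 4, 3), 1.0)
ir = np.zeros((4, 4))
overlay = superimpose(vis, ir)
```

Annotators then drew pedestrian boxes on such overlays, so that one annotation pass covered both modalities at once.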
With the data structured and correctly annotated, all that remained was to exploit it. To do this, Neovision turned to Convolutional Neural Networks (CNNs), and more specifically the RetinaNet architecture, which belongs to the Single Shot Detector (SSD) family. This solution was chosen for its simplicity and state-of-the-art results. To be even more precise, the architecture selected was RetinaNet with a ResNet-50 backbone pre-trained on the COCO dataset.
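RetinaNet's defining ingredient, and a key reason for its state-of-the-art single-shot results, is the focal loss, which down-weights easy background examples so training concentrates on hard detections. The following numpy sketch shows the binary focal loss from the original RetinaNet paper; the default `alpha=0.25` and `gamma=2.0` are the paper's values, not something stated in this article.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss as introduced with RetinaNet: the (1 - p_t)^gamma
    factor shrinks the loss of well-classified examples.

    p: predicted probability of the positive (e.g. pedestrian) class
    y: ground-truth label, 1 or 0
    """
    p_t = np.where(y == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# An easy positive (p=0.9) contributes far less than a hard one (p=0.1).
easy = focal_loss(np.array([0.9]), np.array([1]))
hard = focal_loss(np.array([0.1]), np.array([1]))
```

With `gamma=0` this reduces to ordinary weighted cross-entropy, which makes the down-weighting effect easy to verify.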
Since this architecture only takes visible images as input, the infrared images were converted to RGB (Red, Green, Blue) images by applying the inferno colormap. Neovision then resized these infrared images to match the size of the visible images and, by merging the two, obtained multispectral images. Neovision therefore had three sets of data on which to train the Deep Learning algorithms.
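The IR-to-RGB conversion and resizing steps can be sketched as follows. This is an approximation: the control colors below are a coarse, hand-picked stand-in for the full inferno colormap, and nearest-neighbour resizing stands in for whatever interpolation the project actually used.

```python
import numpy as np

# A coarse approximation of the "inferno" colormap using a few RGB
# control points (hypothetical anchors; the project used the full map).
_ANCHORS = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
_COLORS = np.array([
    [0.00, 0.00, 0.01],   # near-black
    [0.34, 0.06, 0.43],   # purple
    [0.73, 0.22, 0.33],   # red
    [0.97, 0.55, 0.04],   # orange
    [0.99, 1.00, 0.64],   # pale yellow
])

def ir_to_rgb(ir):
    """Map a normalised single-channel IR frame (values in [0, 1]) to a
    3-channel false-colour image, so it matches the RGB input the
    detector expects."""
    channels = [np.interp(ir, _ANCHORS, _COLORS[:, c]) for c in range(3)]
    return np.stack(channels, axis=-1)

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize, standing in for the (unspecified)
    interpolation used to match IR frames to the visible resolution."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

ir = np.linspace(0.0, 1.0, 16).reshape(4, 4)   # toy 4x4 IR frame
rgb = resize_nearest(ir_to_rgb(ir), 8, 8)      # false-colour, upscaled 2x
```

After this step the false-colour IR frames have the same shape and channel count as the visible frames, so the same network can train on either set or on their fusion.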
Following training, Neovision validated the results. As expected, while the visible channel excels during the day and the infrared at night, the multispectral method takes the best of both technologies: the algorithms obtained their best results on these images, improving average precision by 11%, day and night!
Despite a limited dataset, this work shows that adding an infrared sensor to a visible camera significantly improves the detection of people. A way to innovate without necessarily reinventing the wheel!
In 2021, after developing a network capable of optimizing the fusion of two video channels (one visible and one thermal infrared), Neovision worked on making it embeddable so that it could run in real time on an NVIDIA AGX board.
The developed architecture considerably improves performance compared to “early fusion” or “late fusion” architectures. This architecture, called Gated Multimodal Fusion Network, gains 4 to 9 points in night conditions over classical fusion schemes (and 18 points over visible alone), 8 points on heavily occluded pedestrians, and 12 points on distant targets (pedestrians at more than 50 m). See the SIA 2021 publication for more details.
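The core idea behind gated fusion, as opposed to early or late fusion, is that a learned gate decides per location how much to trust each modality. The following numpy sketch illustrates that mechanism only: the random weights `w` stand in for learned parameters, and the real Gated Multimodal Fusion Network is a trained CNN, not this toy.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_vis, f_ir, w):
    """Illustrative gated fusion: a sigmoid gate computed from both
    feature maps weighs, per spatial location and channel, how much of
    each modality flows into the fused representation."""
    stacked = np.concatenate([f_vis, f_ir], axis=-1)  # (H, W, 2C)
    gate = sigmoid(stacked @ w)                        # (H, W, C), values in (0, 1)
    return gate * f_vis + (1.0 - gate) * f_ir

C = 4
f_vis = rng.normal(size=(8, 8, C))   # toy visible-branch features
f_ir = rng.normal(size=(8, 8, C))    # toy infrared-branch features
w = rng.normal(size=(2 * C, C))      # hypothetical gating weights
fused = gated_fusion(f_vis, f_ir, w)
```

Because the gate lies strictly between 0 and 1, the fused features are always a convex combination of the two branches, which is what lets the network lean on infrared at night and on visible in daylight.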
The demonstrator has three branches: one branch for each modality (infrared and visible), and a central branch for multimodal fusion. For the demonstration, the three branches had to run simultaneously in order to compare the performance of each modality, all on an NVIDIA AGX board. In addition, the board had to embed the real-time “ShutterLess” correction processing for Lynred’s IR images. This processing removes the need for the mechanical shutter used to correct the IR cameras currently on the market.
COMPUTER VISION, DEEP LEARNING, R&D
« One of the major problems for autonomous vehicles is the ability to detect VRUs (Vulnerable Road Users: pedestrians, cyclists, scooter riders) in all visibility conditions (including night, headlight glare, tunnel entrances/exits, smoke, fog, etc.). Current systems mainly use visible cameras, which struggle or even become inoperative in these situations. Thermal infrared cameras can address these difficult situations with great efficiency. There remains the problem of data fusion: how do you optimize the pedestrian detection function by making the most of each sensor (visible + infrared)? We called upon Neovision, which was able to take charge of all phases of the project: a state of the art of the possible fusion modes, prototyping of the most promising architectures, building a substantial database (driving with recording of approximately 1M images on 2 visible cameras and 2 IR cameras), training, performance optimization, testing, and finally integration into a live demonstrator that had to run in real time.
In the end, we obtained a neural network that outperforms classical architectures by improving performance in all situations, a co-authored publication, and a working real-time demonstrator! Tackling a complex problem like this can only work if the teams are competent and if they cooperate. This is the last point I would like to emphasize: it is also thanks to the good cooperation between the Neovision teams and our own that we were able to achieve such results. »
Xavier Brenière, Application Labs Manager at Lynred
5 January 2022