
OAK-Drone

Finalists in the OpenCV AI Competition 2021

Drone to the rescue!

Problem Statement

“Search and rescue (SAR) is the search for, and provision of aid to, people who are in distress or imminent danger.” [wiki]

SAR missions are usually time-critical tasks where every second counts. The exploration and inspection of cluttered environments to look for survivors, while paying attention to hazards, is anything but an easy task.

The ideal behaviour for an exploration mission is the capability to explore the unknown environment as fast as possible while focusing attention on specific artefacts that are valuable for accomplishing the mission.

In terms of requirements, this behaviour translates to an object-oriented exploration where the system must perform a fast but selective inspection of the environment.

On the other hand, several works in the literature study eye gaze and eye tracking as indicators of a person’s state of consciousness, attention and mind wandering, providing important information about the current state of a survivor.

Objective

Build a search and rescue drone that explores an unknown, cluttered, indoor (GPS-denied) environment, looking for persons (survivors) and recognizing their current condition (conscious or unconscious) based on eye-blink detection.

Solution

The solution we propose combines two research areas that are currently being studied in our laboratory:

From these areas, we define two topics that have to be addressed to solve the SAR problem and to obtain more information about the survivor:

NOTE: By autonomous aerial vehicle we mean a system that can perform tasks without external assistance. All the computation related to environment perception, motion planning, and control runs onboard the system.

Moreover, our prototype is a fully vision-based system, meaning its sensors are restricted to cameras, i.e. no LiDAR, no motion capture, no UWB, no localization markers, etc.

Object-oriented Exploration

To accomplish a fast but selective exploration of the environment, we designed a system that localizes specific items in the scene and either maps them in a 3D occupancy map or steers the mapping process towards the detected object if it is outside the currently mapped space.

This algorithm utilizes an RGB-D camera to obtain information from the objects of interest, which is later included in the exploration map to obtain a semantically enhanced knowledge of the environment.
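As a rough, simplified illustration of this decision logic (the actual implementation is described in the paper linked below), the following Python sketch reduces the "currently mapped space" to a fixed bounding box and uses hypothetical names such as SemanticOccupancyMap and update_exploration:

# Toy sketch of the object-oriented exploration decision (hypothetical names;
# the real algorithm is described in the linked paper).
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    position: tuple  # (x, y, z) of the detected object in the map frame

class SemanticOccupancyMap:
    """Toy stand-in for the 3D occupancy map; 'mapped space' is a fixed box here."""
    def __init__(self, bounds):
        self.bounds = bounds           # ((xmin, xmax), (ymin, ymax), (zmin, zmax))
        self.semantic_voxels = {}      # position -> label

    def contains(self, p):
        return all(lo <= c <= hi for c, (lo, hi) in zip(p, self.bounds))

    def insert_semantic(self, p, label):
        self.semantic_voxels[p] = label

def update_exploration(occ_map, detections, goal):
    """Map detected objects that fall inside the known space; otherwise
    steer the exploration goal towards the object."""
    for det in detections:
        if occ_map.contains(det.position):
            occ_map.insert_semantic(det.position, det.label)
        else:
            goal = det.position        # bias the next exploration step
    return goal

# A person detected outside the currently mapped space redirects the goal.
occ = SemanticOccupancyMap(bounds=((0, 6), (0, 4), (0, 2)))
print(update_exploration(occ, [Detection("person", (7.0, 1.0, 0.5))], (3.0, 2.0, 1.0)))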

For more information about this algorithm, see our paper: [IEEE, video]

Eye Blinking

We decided to fully develop this part on the OpenCV AI Kit with Depth (OAK-D), considering the limited resources of the autonomous platform. The OAK-D comes with an Intel Movidius Myriad X VPU, offloading this computation from the NVIDIA Jetson Xavier NX (the drone’s companion computer).

The proposed pipeline starts from the camera’s RGB image, identifies a person’s face, and analyses the eye movements to detect blinks. A summary scheme is shown in the figure below.

The Face Detector is based on a SqueezeNet-light backbone with a single SSD head. The backbone consists of fire modules to reduce the number of computations. The single SSD head, operating on the 1/16-scale feature map, has nine clustered prior boxes. [openvino docs]

The Landmarks model is a lightweight regressor with a classic convolutional design: stacked 3x3 convolutions, batch normalizations, PReLU activations, and pooling. The final regression is done by a global depthwise pooling head and fully connected layers. The model predicts five facial landmarks: two eyes, the nose, and two lip corners. [openvino docs]
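To give an idea of how such models are deployed on the OAK-D, here is a minimal sketch using the DepthAI Gen2 Python API that runs the SSD face detector on the colour stream. The blob path, preview size and stream name are assumptions; the pipeline actually used by the project lives in main.py of the repository and also chains the landmarks and segmentation models.

# Minimal DepthAI sketch: run the SSD face detector on the OAK-D colour stream.
# The blob path is an assumption (e.g. compiled with blobconverter); see the
# repository's main.py for the full pipeline with landmarks and segmentation.
import depthai as dai

pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(300, 300)      # input resolution of face-detection-retail-0004
cam.setInterleaved(False)

face_nn = pipeline.create(dai.node.MobileNetDetectionNetwork)
face_nn.setConfidenceThreshold(0.5)
face_nn.setBlobPath("models/face-detection-retail-0004.blob")   # assumed location
cam.preview.link(face_nn.input)

xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("faces")
face_nn.out.link(xout.input)

with dai.Device(pipeline) as device:
    q_faces = device.getOutputQueue("faces", maxSize=4, blocking=False)
    while True:
        for det in q_faces.get().detections:
            # Normalised face bounding box; a second NeuralNetwork node
            # (landmarks-regression-retail-0009) would consume this crop.
            print(det.xmin, det.ymin, det.xmax, det.ymax, det.confidence)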

Once the person’s face and eye positions are identified, a region of interest (ROI) must be defined for each eye, on which segmentation is subsequently performed.

First, the Manhattan distance d_man between the two eye positions predicted by the Landmarks NN is calculated. The ROI of each eye is then defined as the rectangle centred on the eye position, with a height of 0.4 * d_man and a width of 0.8 * d_man.
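A minimal sketch of this ROI computation (hypothetical helper, pixel coordinates assumed):

def eye_rois(left_eye, right_eye):
    """Return one (x, y, w, h) ROI per eye, sized from the Manhattan distance
    between the two eye landmarks (hypothetical helper, pixel coordinates)."""
    d_man = abs(left_eye[0] - right_eye[0]) + abs(left_eye[1] - right_eye[1])
    w, h = 0.8 * d_man, 0.4 * d_man
    return [(int(cx - w / 2), int(cy - h / 2), int(w), int(h))
            for (cx, cy) in (left_eye, right_eye)]

# Example: eye landmarks at (120, 90) and (180, 92)
print(eye_rois((120, 90), (180, 92)))   # -> [(95, 77, 49, 24), (155, 79, 49, 24)]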

Then, the ROIs are scaled to 80x40, normalized and passed as input to a neural network that computes the segmentation. The chosen model is a CNN consisting of two main parts: an encoder that extracts the main features from the input image and a decoder that reconstructs the semantic segmentation.
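As a rough illustration of this encoder-decoder structure (not the trained model; the channel counts, number of classes and layer sizes below are assumptions), a PyTorch sketch could look like this:

# Illustrative encoder-decoder CNN for 80x40 eye-patch segmentation.
# Channel counts, class count and layer sizes are assumptions, not the trained model.
import torch
import torch.nn as nn

class EyeSegNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        # Encoder: extract features while halving the resolution twice (80x40 -> 20x10).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Decoder: upsample back to the input resolution and predict per-pixel classes.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, num_classes, 2, stride=2),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# A normalised 80x40 (WxH) eye patch, batch of 1: output is a per-pixel class map.
logits = EyeSegNet()(torch.rand(1, 3, 40, 80))
print(logits.shape)   # torch.Size([1, 2, 40, 80])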

The model was trained using the dataset presented in this paper, which contains 8882 eye patches from 4461 facial images of different resolutions, illumination conditions and head poses.

In summary, we use:

NOTE: In this video we can see the overall pipeline working; all NNs run on the OAK-D camera. Only the eye cropper runs outside the camera.
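For illustration, one simple way to turn the per-frame segmentation masks into a blink signal is to watch the segmented eye area drop and recover. This is a hedged sketch of the idea only, not necessarily the exact logic in main.py; the threshold and class index are assumptions.

# Hedged sketch: counting blinks from the per-frame segmented eye area.
# Threshold and class index are assumptions; see main.py for the actual logic.
import numpy as np

def eye_openness(mask, eye_class=1):
    """Fraction of the eye-patch segmentation labelled as 'eye'."""
    return float(np.mean(mask == eye_class))

def detect_blinks(openness_series, closed_thresh=0.05):
    """Count closed -> open transitions in a sequence of per-frame openness values."""
    blinks, was_closed = 0, False
    for o in openness_series:
        closed = o < closed_thresh
        if was_closed and not closed:
            blinks += 1                 # eye re-opened: one blink completed
        was_closed = closed
    return blinks

# Toy masks: open, open, closed, closed, open, open  ->  1 blink
masks = [np.full((40, 80), c) for c in (1, 1, 0, 0, 1, 1)]
print(detect_blinks([eye_openness(m) for m in masks]))   # -> 1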

Blink

If you want to test the blink detector on your own OpenCV AI Kit, clone or download the DepthAI repository and follow the installation tutorial on the Luxonis website. Then:

git clone https://github.com/edwinpha/OAK-Drone.git
cd OAK-Drone
python main.py

Experiment Platform

The hardware platform consists of a custom-built quad-rotor that relies on a Durandal board (PX4-Autopilot) for low-level attitude rate control and on an NVIDIA Jetson Xavier NX for the high-level software modules, i.e. navigation, exploration, mapping and object detection algorithms.

The drone’s perception of the environment is obtained only through cameras. The system integrates two cameras (besides the OAK-D): a downward-facing Intel RealSense T265 for the drone’s local position and motion estimation, and a front-facing Intel RealSense D435 for the depth and RGB information.

We use ROS as the communication platform between the sensors and actuators of the system, and mavros, as a MAVLink wrapper, for communication with the flight control unit (FCU).
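As a generic illustration of this setup (standard mavros topics, not the project’s actual navigation code), a high-level module running on the Xavier NX could stream position setpoints to the FCU like this:

# Generic example: streaming position setpoints to the FCU through mavros.
# Standard mavros topic; this is not the project's actual navigation stack.
import rospy
from geometry_msgs.msg import PoseStamped

rospy.init_node("setpoint_example")
pub = rospy.Publisher("/mavros/setpoint_position/local", PoseStamped, queue_size=10)
rate = rospy.Rate(20)          # OFFBOARD mode requires a steady setpoint stream

goal = PoseStamped()
goal.header.frame_id = "map"
goal.pose.position.x, goal.pose.position.y, goal.pose.position.z = 1.0, 0.0, 1.2

while not rospy.is_shutdown():
    goal.header.stamp = rospy.Time.now()
    pub.publish(goal)
    rate.sleep()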

Drone

Results

The experiment consisted of exploring a small room containing obstacles, with no prior information about its map and/or geometry. We only set the exploration boundary area to match the actual room size, i.e. 6.0 m × 4.0 m × 2.0 m. The mission goal was to look for a person in the room and check whether they were blinking.

Since the space to be explored is small, the camera’s depth range is limited to 1.5 m. In this way, it is possible to appreciate the exploration algorithm’s features rather than the camera’s range capacity.

In the following video, the whole experiment can be appreciated from a perspective view. Besides this view, we added the continuously expanding map of the environment and, once the person is found, the video stream of the OAK-D with its inferences.

The color code used by the volumetric representation of the environment is:

OpenCV 2021 Final Submission Video

Other SAR Experiment

A small issue

While performing the experiments on the real platform, we noticed a delay in the blink detector. This delay was due to the computational load on the NVIDIA Xavier NX while the drone executes all of its modules, and also to resending the cropped eye image to the OAK-D for eye segmentation and blink detection.

All computation in OAK-D
Part of computation in OAK-D

Considering that the mission is expected to run in real time, we decided to split the pipeline between the OAK-D and the Xavier NX so that this delay is avoided.

Future Work and Considerations

Publications

For a deeper understanding of the whole project, please read our publications.

Exploration Algorithm

Blink Detector

References

Our whole work is based on the following links, papers and projects: