
OAK-Drone

Finalists in the OpenCV AI Competition 2021

Drone to the rescue!

Problem Statement

“Search and rescue (SAR) is the search for, and provision of aid to, people who are in distress or imminent danger.” [wiki]

SAR missions are usually time-critical tasks where every second counts. The exploration and inspection of cluttered environments to look for survivors, while paying attention to hazards, is anything but an easy task.

The ideal behaviour for an exploration mission is the capability to explore the unknown environment as fast as possible while focusing attention on specific artefacts that are valuable for accomplishing the mission.

In terms of requirements, this behaviour translates to an object-oriented exploration where the system must perform a fast but selective inspection of the environment.

On the other hand, several works in the literature study eye gaze and eye tracking as indicators of a person’s state of consciousness, attention and mind wandering, providing important information about the current state of a survivor.

Objective

Build a search and rescue drone that explores an unknown, cluttered, indoor (GPS-denied) environment, looking for persons (survivors) and recognizing their current condition (conscious or unconscious) based on eye-blink detection.

Solution

The solution we propose combines two research areas that are currently being studied in our laboratory:

From these areas, we define two topics that have to be addressed to solve the SAR problem and to obtain more information about the survivor:

NOTE: By autonomous aerial vehicle we mean a system that can perform tasks without external assistance. All the computation related to environment perception, motion planning, and control runs onboard the system.

Moreover, our prototype is a fully vision-based system, meaning its sensors are restricted to cameras, i.e. no LiDAR, no motion capture, no UWB, no localization markers, etc.

Object-oriented Exploration

To accomplish a fast but selective exploration of the environment, we designed a system that localizes specific items in the scene and either maps them in a 3D occupancy map or steers the mapping process towards the detected object if it is outside the currently mapped space.

This algorithm utilizes an RGB-D camera to obtain information from the objects of interest, which is later included in the exploration map to obtain a semantically enhanced knowledge of the environment.
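As a rough, simplified illustration of this decision logic (the actual implementation is described in the paper linked below), the following Python sketch reduces the "currently mapped space" to a fixed bounding box and uses hypothetical names such as SemanticOccupancyMap and update_exploration:

# Toy sketch of the object-oriented exploration decision (hypothetical names;
# the real algorithm is described in the linked paper).
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    position: tuple  # (x, y, z) of the detected object in the map frame

class SemanticOccupancyMap:
    """Toy stand-in for the 3D occupancy map; 'mapped space' is a fixed box here."""
    def __init__(self, bounds):
        self.bounds = bounds           # ((xmin, xmax), (ymin, ymax), (zmin, zmax))
        self.semantic_voxels = {}      # position -> label

    def contains(self, p):
        return all(lo <= c <= hi for c, (lo, hi) in zip(p, self.bounds))

    def insert_semantic(self, p, label):
        self.semantic_voxels[p] = label

def update_exploration(occ_map, detections, goal):
    """Map detected objects that fall inside the known space; otherwise
    steer the exploration goal towards the object."""
    for det in detections:
        if occ_map.contains(det.position):
            occ_map.insert_semantic(det.position, det.label)
        else:
            goal = det.position        # bias the next exploration step
    return goal

# A person detected outside the currently mapped space redirects the goal.
occ = SemanticOccupancyMap(bounds=((0, 6), (0, 4), (0, 2)))
print(update_exploration(occ, [Detection("person", (7.0, 1.0, 0.5))], (3.0, 2.0, 1.0)))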

For more information about this algorithm, see our paper: [IEEE, video]

Eye Blinking

We decided to fully develop this part on the OpenCV AI Kit with Depth (OAK-D), considering the limited resources of the autonomous platform. The OAK-D comes with an Intel Movidius Myriad X VPU, offloading this computation from the NVIDIA Jetson Xavier NX (the drone’s companion computer).

The proposed pipeline starts from the camera’s RGB image, identifies a person’s face, and analyses the eye movements to detect blinks. A summary scheme is shown in the figure below.

The Face Detector is based on a SqueezeNet-light backbone with a single SSD head. The backbone consists of fire modules to reduce the number of computations. The single SSD head, operating on the 1/16-scale feature map, has nine clustered prior boxes. [openvino docs]

The Landmarks model is a lightweight regressor with a classic convolutional design: stacked 3x3 convolutions, batch normalizations, PReLU activations, and pooling. The final regression is done by a global depthwise pooling head and fully connected layers. The model predicts five facial landmarks: two eyes, the nose, and two lip corners. [openvino docs]
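To give an idea of how such models are deployed on the OAK-D, here is a minimal sketch using the DepthAI Gen2 Python API that runs the SSD face detector on the colour stream. The blob path, preview size and stream name are assumptions; the pipeline actually used by the project lives in main.py of the repository and also chains the landmarks and segmentation models.

# Minimal DepthAI sketch: run the SSD face detector on the OAK-D colour stream.
# The blob path is an assumption (e.g. compiled with blobconverter); see the
# repository's main.py for the full pipeline with landmarks and segmentation.
import depthai as dai

pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(300, 300)      # input resolution of face-detection-retail-0004
cam.setInterleaved(False)

face_nn = pipeline.create(dai.node.MobileNetDetectionNetwork)
face_nn.setConfidenceThreshold(0.5)
face_nn.setBlobPath("models/face-detection-retail-0004.blob")   # assumed location
cam.preview.link(face_nn.input)

xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("faces")
face_nn.out.link(xout.input)

with dai.Device(pipeline) as device:
    q_faces = device.getOutputQueue("faces", maxSize=4, blocking=False)
    while True:
        for det in q_faces.get().detections:
            # Normalised face bounding box; a second NeuralNetwork node
            # (landmarks-regression-retail-0009) would consume this crop.
            print(det.xmin, det.ymin, det.xmax, det.ymax, det.confidence)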

Once the person’s face and eye positions are identified, a region of interest (ROI) must be defined for each eye, on which segmentation is subsequently performed.

First, the Manhattan distance d_man between the two eye positions predicted by the Landmarks NN is calculated. The ROI of each eye is then defined as the rectangle centred on the eye position, with a height of 0.4 * d_man and a width of 0.8 * d_man.
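A minimal sketch of this ROI computation (hypothetical helper, pixel coordinates assumed):

def eye_rois(left_eye, right_eye):
    """Return one (x, y, w, h) ROI per eye, sized from the Manhattan distance
    between the two eye landmarks (hypothetical helper, pixel coordinates)."""
    d_man = abs(left_eye[0] - right_eye[0]) + abs(left_eye[1] - right_eye[1])
    w, h = 0.8 * d_man, 0.4 * d_man
    return [(int(cx - w / 2), int(cy - h / 2), int(w), int(h))
            for (cx, cy) in (left_eye, right_eye)]

# Example: eye landmarks at (120, 90) and (180, 92)
print(eye_rois((120, 90), (180, 92)))   # -> [(95, 77, 49, 24), (155, 79, 49, 24)]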

Then, the ROIs are scaled to 80x40, normalized and passed as input to a neural network that computes the segmentation. The chosen model is a CNN consisting of two main parts: an encoder that extracts the main features from the input image and a decoder that reconstructs the semantic segmentation.
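As a rough illustration of this encoder-decoder structure (not the trained model; the channel counts, number of classes and layer sizes below are assumptions), a PyTorch sketch could look like this:

# Illustrative encoder-decoder CNN for 80x40 eye-patch segmentation.
# Channel counts, class count and layer sizes are assumptions, not the trained model.
import torch
import torch.nn as nn

class EyeSegNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        # Encoder: extract features while halving the resolution twice (80x40 -> 20x10).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Decoder: upsample back to the input resolution and predict per-pixel classes.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, num_classes, 2, stride=2),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# A normalised 80x40 (WxH) eye patch, batch of 1: output is a per-pixel class map.
logits = EyeSegNet()(torch.rand(1, 3, 40, 80))
print(logits.shape)   # torch.Size([1, 2, 40, 80])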

The model was trained using the dataset presented in this paper, which contains 8882 eye patches from 4461 facial images of different resolutions, illumination conditions and head poses.

In summary, we use:

NOTE: In this video we can see the overall pipeline working; all NNs run on the OAK-D camera. Only the eye cropper runs outside the camera.
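For illustration, one simple way to turn the per-frame segmentation masks into a blink signal is to watch the segmented eye area drop and recover. This is a hedged sketch of the idea only, not necessarily the exact logic in main.py; the threshold and class index are assumptions.

# Hedged sketch: counting blinks from the per-frame segmented eye area.
# Threshold and class index are assumptions; see main.py for the actual logic.
import numpy as np

def eye_openness(mask, eye_class=1):
    """Fraction of the eye-patch segmentation labelled as 'eye'."""
    return float(np.mean(mask == eye_class))

def detect_blinks(openness_series, closed_thresh=0.05):
    """Count closed -> open transitions in a sequence of per-frame openness values."""
    blinks, was_closed = 0, False
    for o in openness_series:
        closed = o < closed_thresh
        if was_closed and not closed:
            blinks += 1                 # eye re-opened: one blink completed
        was_closed = closed
    return blinks

# Toy masks: open, open, closed, closed, open, open  ->  1 blink
masks = [np.full((40, 80), c) for c in (1, 1, 0, 0, 1, 1)]
print(detect_blinks([eye_openness(m) for m in masks]))   # -> 1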

Blink

If you want to test the blink detector on your own OpenCV AI Kit, clone or download the DepthAI repository and follow the installation tutorial on the Luxonis website. Then:

git clone https://github.com/edwinpha/OAK-Drone.git
cd OAK-Drone
python main.py

Experiment Platform

The hardware platform consists of a custom-built quad-rotor that relies on a Durandal board (PX4-Autopilot) for low-level attitude rate control and on an NVIDIA Jetson Xavier NX for the high-level software modules, i.e. navigation, exploration, mapping and object detection algorithms.

The drone’s perception of the environment is obtained only through cameras. The system integrates two cameras (besides the OAK-D): a downward-facing Intel RealSense T265 for the drone’s local position and motion estimation, and a front-facing Intel RealSense D435 for the depth and RGB information.

We use ROS as the communication platform between the sensors and actuators of the system, and mavros, as a MAVLink wrapper, for communication with the flight control unit (FCU).
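As a generic illustration of this setup (standard mavros topics, not the project’s actual navigation code), a high-level module running on the Xavier NX could stream position setpoints to the FCU like this:

# Generic example: streaming position setpoints to the FCU through mavros.
# Standard mavros topic; this is not the project's actual navigation stack.
import rospy
from geometry_msgs.msg import PoseStamped

rospy.init_node("setpoint_example")
pub = rospy.Publisher("/mavros/setpoint_position/local", PoseStamped, queue_size=10)
rate = rospy.Rate(20)          # OFFBOARD mode requires a steady setpoint stream

goal = PoseStamped()
goal.header.frame_id = "map"
goal.pose.position.x, goal.pose.position.y, goal.pose.position.z = 1.0, 0.0, 1.2

while not rospy.is_shutdown():
    goal.header.stamp = rospy.Time.now()
    pub.publish(goal)
    rate.sleep()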

Drone

Results

The experiment consisted of exploring a small room containing obstacles, with no prior information about its map and/or geometry. We only set the exploration boundary area to match the actual room size, i.e. 6.0 m × 4.0 m × 2.0 m. The mission goal was to look for a person in the room and check whether they were blinking.

Since the space to be explored is small, the camera’s depth range is limited to 1.5 m. In this way, it is possible to appreciate the exploration algorithm’s features rather than the camera’s range capacity.

In the following video, the whole experiment can be appreciated from a perspective view. Besides this view, we added the continuously expanding map of the environment and, once the person is found, the video stream of the OAK-D with its inferences.

The color code used by the volumetric representation of the environment is:

OpenCV 2021 Final Submission Video

Other SAR Experiment

A small issue

While performing the experiments on the real platform, we noticed a delay in the blink detector. This delay was due to the computational load on the NVIDIA Xavier NX while the drone executes all of its modules, and also to resending the cropped eye image to the OAK-D for eye segmentation and blink detection.

All computation in OAK-D
Part of computation in OAK-D

Considering that the mission is expected to run in real time, we decided to split the pipeline between the OAK-D and the Xavier NX so that this delay is avoided.

Future Work and Considerations

Publications

For a deeper understanding of the whole project, please read our publications.

Exploration Algorithm

Blink Detector

References

Our whole work is based on the following links, papers and projects: