Human detection and action recognition using depth information by Kinect

Repository

Human detection and action recognition using depth information by Kinect

Show full record

Title: Human detection and action recognition using depth information by Kinect
Author: Xia, Lu, active 21st century
Abstract: Traditional computer vision algorithms depend on information taken by visible-light cameras. But there are inherent limitations of this data source, e.g. they are sensitive to illumination changes, occlusions and background clutter. Range sensors give us 3D structural information of the scene and it’s robust to the change of color and illumination. In this thesis, we present a series of approaches which are developed using the depth information by Kinect to address the issues regarding human detection and action recognition. Taking the depth information, the basic problem we consider is to detect humans in the scene. We propose a model based approach, which is comprised of a 2D head contour detector and a 3D head surface detector. We propose a segmentation scheme to segment the human from the surroundings based on the detection point and extract the whole body of the subject. We also explore the tracking algorithm based on our detection result. The methods are tested on a dataset we collected and present superior results over the existing algorithms. With the detection result, we further studied on recognizing their actions. We present a novel approach for human action recognition with histograms of 3D joint locations (HOJ3D) as a compact representation of postures. We extract the 3D skeletal joint locations from Kinect depth maps using Shotton et al.’s method. The HOJ3D computed from the action depth sequences are reprojected using LDA and then clustered into k posture visual words, which represent the prototypical poses of actions. The temporal evolutions of those visual words are modeled by discrete hidden Markov models (HMMs). In addition, due to the design of our spherical coordinate system and the robust 3D skeleton estimation from Kinect, our method demonstrates significant view invariance on our 3D action dataset. Our dataset is composed of 200 3D sequences of 10 indoor activities performed by 10 individuals in varied views. Our method is real-time and achieves superior results on the challenging 3D action dataset. We also tested our algorithm on the MSR Action3D dataset and our algorithm outperforms existing algorithm on most of the cases.
Department: Electrical and Computer Engineering
Subject: Human detection Action recognition Kinect Depth image 3D View-invariant
URI: http://hdl.handle.net/2152/ETD-UT-2012-05-5509
Date: 2012-05

Files in this work

Download File: XIA-THESIS.pdf
Size: 1.979Mb
Format: application/pdf

This work appears in the following Collection(s)

Show full record


Advanced Search

Browse

My Account

Statistics

Information