Semantic representation and recognition of human activities
This dissertation describes a methodology for automated recognition of complex human activities. It presents a general framework that reliably recognizes various types of high-level human activities, including human actions, human-human interactions, human-object interactions, and group activities. Our approach is description-based: it enables a user to encode the structure of a high-level human activity as a formal representation, and recognition is performed by semantically matching the constructed representations against actual observations. The methodology uses a context-free grammar (CFG) based representation scheme as a formal syntax for representing composite activities. This CFG-based representation enables us to define complex human activities in terms of simpler activities or movements. We have constructed a hierarchical framework that automatically matches activity representations with input observations. At the low level of the system, image sequences are processed to extract poses and gestures. Building on the recognized gestures, the high level of the system hierarchically recognizes occurring complex human activities by searching for gestures that satisfy the temporal, spatial, and logical structure described in the representation. The concept of hallucinations and a probabilistic semantic-level recognition algorithm are introduced to cope with imperfect lower layers. As a result, the system recognizes human activities including 'fighting', 'assault', 'a person leaving a suitcase', and 'a group of thieves stealing an object from owners' -- high-level activities that previous systems had difficulty recognizing. The experimental results show that our system reliably recognizes sequences of various types of complex human activities with a high recognition rate.
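To illustrate the flavor of a description-based representation, the following is a minimal sketch of how a high-level activity could be defined as a rule over simpler recognized gestures with a temporal constraint. The event names, the `before` relation, and the example 'push' rule are purely illustrative assumptions, not the dissertation's actual CFG syntax.

```python
# Hypothetical sketch: a high-level activity defined over sub-events
# (recognized gestures) using a temporal predicate. All names here
# are illustrative; the dissertation's actual representation is a CFG.
from dataclasses import dataclass

@dataclass
class Event:
    name: str     # label of a recognized gesture or sub-activity
    start: float  # start time of the detected event
    end: float    # end time of the detected event

def before(a: Event, b: Event) -> bool:
    """Allen-style 'before' relation: a ends before b starts."""
    return a.end < b.start

def is_push(stretch: Event, depart: Event) -> bool:
    """Hypothetical rule: one person stretches an arm toward the
    other, and the other person departs afterward."""
    return (stretch.name == "stretch_arm"
            and depart.name == "depart"
            and before(stretch, depart))

stretch = Event("stretch_arm", 1.0, 2.0)
depart = Event("depart", 2.5, 4.0)
print(is_push(stretch, depart))  # prints: True
```

In the full framework, such rules would be composed hierarchically, so an interaction like 'fighting' could be defined in terms of simpler interactions that are themselves defined over gestures.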