Browsing by Subject "Human activity recognition"
Item: Recognition of human interactions with vehicles using 3-D models and dynamic context (2012-05)
Lee, Jong Taek, 1983-; Aggarwal, J. K. (Jagdishkumar Keshoram), 1936-; Bovik, Alan C.; Geisler, Wilson S.; Grauman, Kristen; de Veciana, Gustavo

This dissertation describes two distinctive methods for human-vehicle interaction recognition: one for ground-level videos and the other for aerial videos. For ground-level videos, the dissertation presents a novel methodology that estimates a detailed status of a scene involving multiple humans and vehicles. The system tracks their configuration even while they perform complex interactions with severe occlusion, such as four persons exiting a car together. The motivation is to identify the 3-D states of vehicles (e.g., the status of doors) and their relations with persons, which is necessary to analyze complex human-vehicle interactions (e.g., breaking into or stealing a vehicle), and to analyze the motion of humans and car doors to detect atomic human-vehicle interactions. A probabilistic algorithm has been designed to track humans and analyze their dynamic relationships with vehicles using dynamic context. We focus on two ideas. One is that many simple events can be detected by low-level analysis, and these detected events must be contextually consistent with the human/vehicle status tracking results. The other is that the motion cue informs the states in the current and future frames, so analyzing motion is critical for detecting such simple events. Our approach updates the probability of a person (or a vehicle) having a particular state based on these basic observed events. The probabilistic inference lets the tracking process match event-based evidence with motion-based evidence. For aerial videos, the object resolution is low, the visual cues are vague, and the detection and tracking of objects are consequently less reliable.
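The probabilistic state update described above, in which event-based and motion-based evidence both revise the belief over an object's state, can be illustrated with a minimal Bayesian-update sketch. The state names, likelihood values, and function are illustrative assumptions, not the dissertation's actual model:

```python
# Hypothetical sketch: Bayesian update of a vehicle-door state from
# event-based and motion-based evidence. State names and likelihood
# values are made up for illustration.

def update_state(prior, likelihoods):
    """Fuse a prior over states with an observation likelihood,
    then renormalize to obtain the posterior."""
    posterior = {s: prior[s] * likelihoods.get(s, 1e-6) for s in prior}
    z = sum(posterior.values())
    return {s: p / z for s, p in posterior.items()}

# Prior: door equally likely open or closed.
prior = {"open": 0.5, "closed": 0.5}

# A detected "person exiting" event makes "open" far more likely.
event_likelihood = {"open": 0.9, "closed": 0.1}
posterior = update_state(prior, event_likelihood)

# Motion evidence (e.g., door-edge motion) further supports "open".
motion_likelihood = {"open": 0.8, "closed": 0.3}
posterior = update_state(posterior, motion_likelihood)
```

Applying the two pieces of evidence in sequence drives the belief that the door is open well above the uniform prior, which is the mechanism the abstract describes at a high level.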
Any method that requires accurate tracking of objects or exact matching of an event definition is therefore better avoided. To address these issues, we present a temporal-logic-based approach that does not require training from event examples. At the low level, we employ dynamic programming to perform fast model fitting between the tracked vehicle and rendered 3-D vehicle models. At the semantic level, given the localized event region of interest (ROI), we verify the time series of human-vehicle relationships against the pre-specified event definitions in a piecewise fashion. With special interest in recognizing a person getting into and out of a vehicle, we tested our method on a subset of the VIRAT Aerial Video dataset and achieved superior results.

Item: Recognizing human activities from low-resolution videos (2011-12)
Chen, Chia-Chih, 1979-; Aggarwal, J. K. (Jagdishkumar Keshoram), 1936-; Baldick, Ross; Bovik, Alan C.; Geisler, Wilson S.; Grauman, Kristen

Human activity recognition is one of the intensively studied areas in computer vision. Most existing work does not treat video resolution as a problem because of the general applications of interest. However, with continuing concerns about global security and emerging needs for intelligent video analysis tools, activity recognition from low-resolution, low-quality videos has become a crucial topic for further research. In this dissertation, we present a series of approaches developed specifically to address issues in low-level image preprocessing, single-person activity recognition, and human-vehicle interaction reasoning from low-resolution surveillance videos. Human cast shadows are one of the major issues that adversely affect the performance of an activity recognition system, because shadow direction varies with the time of day and the date of the year.
To resolve this problem, we propose a shadow removal technique that effectively eliminates a human shadow cast by a light source of unknown direction. A multi-cue shadow descriptor characterizes the distinctive properties of shadows, and our approach detects, segments, and then removes them. We propose two different methods to recognize single-person actions and activities from low-resolution surveillance videos. The first adopts a representation based on joint feature histograms, the concatenation of subspace-projected gradient and optical-flow features over time. However, the use of low-resolution, coarse, pixel-level features alone limits recognition accuracy. Therefore, in the second work, we contribute a novel mid-level descriptor that converts an activity sequence into simultaneous temporal signals at body parts. With this representation, activities are recognized through both the local video content and the short-time spectral properties of body-part movements. We draw analogies between activity and speech recognition and show that our speech-like representation and recognition scheme improves performance on several low-resolution datasets. To complete the research on this subject, we also tackle the challenging problem of recognizing human-vehicle interactions from low-resolution aerial videos. We present a temporal-logic-based approach that does not require training from event examples. At the low level, we employ dynamic programming to perform fast model fitting between the tracked vehicle and rendered 3-D vehicle models. At the semantic level, given the localized event region of interest (ROI), we verify the time series of human-vehicle spatial relationships against the pre-specified event definitions in a piecewise fashion.
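The piecewise verification of a symbolic relationship sequence against a pre-specified event definition can be sketched as an ordered subsequence match. The phase names and the "getting in" definition below are illustrative assumptions, not the dissertation's actual predicates:

```python
# Hedged sketch: verify that the phases of an event definition occur
# in order within an observed sequence of human-vehicle spatial
# relationships. Phase labels are illustrative, not from the source.

def matches_event(observed, definition):
    """Return True if the phases in `definition` appear in order
    (allowing gaps and repeats) within `observed`."""
    it = iter(observed)
    # `phase in it` consumes the iterator up to the first match,
    # so later phases must occur after earlier ones.
    return all(phase in it for phase in definition)

GETTING_IN = ["near", "adjacent", "inside"]
obs = ["far", "near", "near", "adjacent", "adjacent", "inside"]

matches_event(obs, GETTING_IN)            # ordered phases -> match
matches_event(["far", "inside", "near"],  # phases out of order
              GETTING_IN)                 # -> no match
```

This tolerance for gaps and repeated observations is one reason a definition-matching approach can cope with unreliable per-frame detections better than exact sequence matching would.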
Our framework can be generalized to recognize any type of human-vehicle interaction from aerial videos.

Item: Small-scale wireless sensors for automated dietary monitoring (2023-05-01)
Chun, Keum San; Thomaz, Edison; Kleinberg, Samantha; Valvano, Jonathan W.; Barbaro, Kaya de; Julien, Christine

With advances in wearable and mobile computing technology, a significant body of work has been dedicated to the field of automated dietary monitoring (ADM). The ability to detect eating activities in naturalistic environments has steadily improved over the past decade. Nonetheless, mainstream adoption of these technologies has been hampered by obtrusive form factors (e.g., neck band, headphones, chest band) and high false-positive rates in naturalistic settings. This dissertation addresses such problems through small-scale (<15 cm²) sensing systems. Specifically, I demonstrate that small-scale sensors can realize simple and effective ADM systems through on-body localized and targeted remote sensing in minimally obtrusive form factors. The work encompasses three form factors (necklace, adhesive patch, and mouthpiece) and five sensing modalities (proximity, temperature, acceleration, humidity, and gas sensing) across five studies I conducted with a total of 90 human subjects. I show that small-scale wireless sensors can monitor dietary activities in a practical manner through localized and targeted sensing. The sensors investigated in this dissertation include FlashBite, IntoXense, Sticki v1, and Sticki v2. In the discussion of FlashBite and IntoXense, which take a necklace form factor, I demonstrate that mastication and alcohol-intake activities can be inferred using targeted remote sensing.
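To make the idea of inferring mastication from a remote sensing signal concrete, here is a toy sketch that counts quasi-periodic peaks in a 1-D proximity signal. The signal values, threshold, and peak heuristic are all illustrative assumptions, not the dissertation's actual pipeline:

```python
# Toy sketch, assuming a 1-D proximity signal from a necklace-worn
# sensor: chewing produces quasi-periodic deflections, counted here
# as local maxima above a threshold. Values are illustrative.

def count_chew_peaks(signal, threshold=0.5):
    """Count samples that exceed `threshold` and are strictly
    greater than both neighbors (a simple local-maximum test)."""
    peaks = 0
    for prev, cur, nxt in zip(signal, signal[1:], signal[2:]):
        if cur > threshold and cur > prev and cur > nxt:
            peaks += 1
    return peaks

sig = [0.1, 0.8, 0.2, 0.9, 0.1, 0.7, 0.2]
count_chew_peaks(sig)  # three above-threshold local maxima
```

A real system would of course operate on noisy sampled data with filtering and learned classifiers; the sketch only shows why a periodic deflection pattern is a usable cue.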
In the subsequent discussion of ADM approaches with Sticki v1 and Sticki v2, I demonstrate that localized sensing in an adhesive-patch form factor can effectively balance practicality and eating-detection performance. Furthermore, I show that localized sensing in the intraoral space, with Sticki v2 embedded in a mouthpiece, can provide useful information for inferring food-related information.

Item: Towards lifelong and long-term sensor-based human activity recognition (2023-08)
Adaimi, Rebecca; Thomaz, Edison; Vikalo, Haris; Wang, Atlas; Ploetz, Thomas; Julien, Christine

The field of Human Activity Recognition (HAR) has witnessed remarkable advancements in the past decade, fueled by progress in mobile devices, sensors, and computational methods. HAR has found widespread application in areas such as mobile health monitoring, disorder diagnosis, personal assistance, and smart environments. However, despite significant progress, HAR faces critical learning challenges that hinder its deployment in real-world scenarios. The prevailing approach relies on offline data collected under controlled laboratory settings, which fails to capture the dynamic nature of real-world environments and individuals' behaviors. Moreover, the heterogeneity of sensor devices, specifications, and placements introduces further challenges for long-term deployment. To address these shortcomings, lifelong adaptive learning, or continual learning, in HAR becomes crucial for enabling large-scale, long-term, and sustainable activity recognition systems. This dissertation tackles the challenges of lifelong adaptive learning in HAR, focusing on three key components: ground-truth annotation with minimal user burden, distribution shifts in HAR, and continual model adaptation. The first component tackles the time-consuming task of acquiring ground-truth information by exploring active learning to minimize data annotation effort and, in turn, user burden.
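Active learning reduces annotation burden by asking the user to label only the samples the model is least sure about. A minimal uncertainty-sampling (least-confidence) sketch, with made-up model probabilities, illustrates the selection step:

```python
# Minimal uncertainty-sampling sketch (least-confidence criterion).
# The class probabilities below are illustrative model outputs,
# not data from the dissertation.

def select_for_labeling(probs, budget):
    """Return indices of the `budget` samples whose top-class
    probability is lowest, i.e., where the model is least confident."""
    conf = [(max(p), i) for i, p in enumerate(probs)]
    conf.sort()  # least confident first
    return [i for _, i in conf[:budget]]

probs = [
    [0.90, 0.10],  # confident -> skip
    [0.55, 0.45],  # uncertain -> query
    [0.50, 0.50],  # most uncertain -> query
    [0.80, 0.20],  # confident -> skip
]
select_for_labeling(probs, budget=2)  # -> [2, 1]
```

Only the queried samples need ground-truth labels from the user, which is exactly the annotation-effort reduction the first component targets.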
The second component delves into the distribution shifts introduced by device heterogeneity, sensor placements, and contextual environments, and their impact on model generalizability. The third component explores incremental model adaptation to accommodate changes in behavior and environment, as well as the introduction of new activity classes, while mitigating catastrophic forgetting of previously learned information. A proposed continual learning framework, LAPNet, addresses these challenges. To drive new research in this area, we also introduce MotionPrint, a large-scale, strongly-labelled, multi-device, multi-location human activity dataset collected from inertial sensors. The dataset was collected from 50 people, time-aligned across six locations, with sampling rates up to 800 Hz, yielding 200 cumulative hours and 408 million samples of data. Our comprehensive analysis reveals a significant challenge in constructing lifelong deployable motion models that generalize effectively to diverse sensor locations. This observation presents a valuable opportunity for the machine learning community to engage in new research on adaptable and generalizable learning algorithms; addressing sensor heterogeneity is critical to realizing our vision of lifelong learning for HAR. Furthermore, we introduce a pioneering line of research, cross-location data synthesis, which explores generating synthetic motion data for one location using data available from another. The work presented in this dissertation contributes to pushing the field of HAR towards long-term, sustainable deployment in real-world settings.
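The incremental-adaptation idea, mixing new-class samples with a replayed memory of old ones to mitigate catastrophic forgetting, can be illustrated with a generic rehearsal buffer. This sketches the general rehearsal technique under stated assumptions; it is not LAPNet itself, and the class and method names are invented for illustration:

```python
# Hedged sketch of a rehearsal buffer for continual adaptation:
# training batches mix incoming samples with a small replayed memory
# of earlier ones. This illustrates generic rehearsal, not LAPNet.
import random

class RehearsalBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = []
        self.seen = 0

    def add(self, sample):
        # Reservoir sampling keeps a uniform subset of the stream,
        # so old classes stay represented without storing everything.
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(sample)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.memory[j] = sample

    def replay_batch(self, new_samples, k):
        """Combine new samples with up to k replayed old ones."""
        return new_samples + random.sample(self.memory,
                                           min(k, len(self.memory)))

buf = RehearsalBuffer(capacity=10)
for sample in range(100):   # stream of 100 old samples
    buf.add(sample)
batch = buf.replay_batch(["new_a", "new_b"], k=4)
```

Each gradient step on such a mixed batch rehearses earlier knowledge alongside the new class, which is the standard way rehearsal methods counter catastrophic forgetting.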
By addressing the challenges of building practical, deployable recognition systems, the research provides insights and methodologies for developing personalized and adaptable activity recognition systems that continuously learn and adapt to evolving circumstances, enhancing their usability and effectiveness in practical applications. In addition, the release of MotionPrint lowers the entry barrier and facilitates progress in lifelong adaptive learning for HAR, fostering the development of innovative techniques and impactful applications in ubiquitous computing and interactive systems.