Browsing by Subject "Human computer interaction"
Now showing 1 - 2 of 2
Item: Acoustic sensing on smart devices (2019-09-18)
Mao, Wenguang; Qiu, Lili, Ph.D.; Mok, Aloysius; Gouda, Mohamed; Yun, Sangki

Smart devices, such as smartphones, smartwatches, and smart speakers, have become increasingly popular and have changed people's daily lives in profound ways. However, the usability and functionality of these devices are still limited because effective sensing techniques for understanding users and environments are lacking. In particular, the motion and shape of a target, such as a user's hand or a nearby object, are among the most important cues for understanding user behavior and gaining knowledge about the environment, yet the existing sensors on smart devices cannot capture the fine-grained motion of a target or the shape of an object under darkness or occlusion. To address this problem, we study novel sensing techniques based on acoustic signals. Specifically, we use speakers to play inaudible signals. During propagation, the signals are affected by the motion and shape of a target object in a deterministic way; by collecting and analyzing these signals, we can infer such information about the target. We explore acoustic signals because their low propagation speed makes high sensing accuracy achievable, and because they are widely available: most smart devices are equipped with speakers and microphones. In this dissertation, we develop four acoustic sensing systems on smart devices. We first propose two motion tracking systems, called device-based tracking and device-free tracking according to whether the user must carry a device (e.g., a smartphone or smartwatch) for tracking purposes. Building on these tracking techniques, we design a novel application that allows a drone to automatically follow a user for videotaping. Besides motion tracking, we also design an acoustic imaging system to sense object shape under darkness or occlusion. We elaborate on these systems below.
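The benefit of sound's low propagation speed can be made concrete with a back-of-the-envelope calculation (a sketch with illustrative numbers, not figures from the dissertation): for chirp-based ranging with bandwidth B, the range resolution is about c/(2B), so a slower propagation speed c buys finer resolution for the same bandwidth.

```python
# Back-of-the-envelope sketch (illustrative numbers): for chirp-based ranging
# with bandwidth B, range resolution is about c / (2 * B), so a lower
# propagation speed c yields finer resolution for the same bandwidth.
SPEED_OF_SOUND = 343.0   # m/s in air
SPEED_OF_LIGHT = 3.0e8   # m/s

def range_resolution(c, bandwidth_hz):
    return c / (2 * bandwidth_hz)

# An inaudible 18-22 kHz acoustic chirp has 4 kHz of bandwidth:
acoustic = range_resolution(SPEED_OF_SOUND, 4.0e3)   # about 4.3 cm
# The same 4 kHz of bandwidth over radio waves:
radio = range_resolution(SPEED_OF_LIGHT, 4.0e3)      # 37.5 km
```

Centimeter-scale resolution from a few kilohertz of inaudible bandwidth is what makes commodity speakers and microphones viable ranging hardware.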
First, we develop a device-based motion tracking system that turns a smartphone into a motion controller. It provides a new way to interact with and control video games, VR/AR headsets, and smart appliances. In this system, we use multiple speakers to play inaudible sounds. Based on the received signals, the smartphone estimates its distance to the various speakers and then derives its position. The key technique in our system is a distributed Frequency Modulated Continuous Waveform (FMCW) that can accurately estimate the distance between an unsynchronized speaker and microphone. Moreover, we incorporate MUltiple SIgnal Classification (MUSIC) into FMCW to minimize the interference caused by multipath. We further develop an optimization framework that combines the FMCW estimates with Doppler shifts and Inertial Measurement Unit (IMU) measurements to enhance accuracy. We implement the system on a mobile phone and demonstrate experimentally that it achieves millimeter-level tracking accuracy. Second, we develop a device-free motion tracking system that allows users to control smart speakers with hand gestures. To this end, we develop a novel recurrent neural network (RNN) based system that uses speakers and microphones to realize accurate room-scale tracking. Our system jointly estimates the propagation distance and angle-of-arrival (AoA) of signals reflected by the hand, based on AoA-distance profiles generated by 2D MUSIC. We design a series of techniques to significantly enhance the profile quality under low SNR. We feed the profiles from a recent history window to our RNN to estimate the distance and AoA; in this way, we exploit the temporal structure among consecutive profiles to remove the impact of noise, interference, and mobility. We implement the system on a smart speaker development platform. To the best of our knowledge, this is the first acoustic device-free tracking system that works at room scale.
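The ranging step both systems build on can be illustrated with a toy FMCW simulation (a synchronized, noise-free sketch with illustrative parameters; the dissertation's distributed FMCW for unsynchronized devices and its MUSIC refinements go well beyond this): mixing the received chirp with the transmitted one produces a beat tone whose frequency encodes the propagation delay, and hence the distance.

```python
import numpy as np

# Toy, synchronized, noise-free FMCW sketch (illustrative parameters only).
C = 343.0                            # speed of sound, m/s
FS = 48_000                          # audio sample rate, Hz
F0, B, T = 18_000.0, 4_000.0, 0.04   # chirp start freq, bandwidth, duration

t = np.arange(int(FS * T)) / FS

def chirp_phase(t):
    # Linear chirp sweeping F0 .. F0 + B over duration T.
    return 2 * np.pi * (F0 * t + 0.5 * (B / T) * t ** 2)

tx = np.cos(chirp_phase(t))          # transmitted chirp
true_dist = 0.50                     # one-way distance to a reflector, m
tau = 2 * true_dist / C              # round-trip delay
rx = np.cos(chirp_phase(t - tau))    # idealized echo (no noise, no attenuation)

# Mixing tx and rx yields a beat tone at f_b = B * tau / T; an FFT finds it.
beat = tx * rx
spec = np.abs(np.fft.rfft(beat * np.hanning(len(beat))))
freqs = np.fft.rfftfreq(len(beat), 1 / FS)
mask = freqs < 2_000.0               # beat band for ranges under ~3.4 m
f_b = freqs[mask][np.argmax(spec[mask])]
est_dist = f_b * T * C / (2 * B)     # invert f_b = B * (2 d / C) / T
```

With these parameters the FFT bin spacing (1/T = 25 Hz) caps the raw resolution at about 2 cm, which is why the system layers MUSIC, Doppler shifts, and IMU fusion on top of plain FMCW to reach millimeter level.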
Third, based on the tracking techniques, we develop a novel system that allows a drone to automatically follow a user for videotaping indoors. This not only reduces the effort of human photographers but also supports videotaping in situations where it would otherwise be impossible. To this end, we apply the tracking techniques to determine the relative position between the drone and the user, and we develop a controller based on the model predictive control (MPC) framework to steer the drone. We solve the practical challenges of applying MPC in our system, including drone modeling, user movement prediction, and integrating the tracking information into the control feedback. We implement the system on a commercial drone and show that it can follow a user effectively while maintaining a specified following distance and orientation. Fourth, we develop a novel acoustic imaging system using only a smartphone. It is an attractive alternative to camera-based imaging under darkness and obstruction. Our system is based on Synthetic Aperture Radar (SAR): to image an object, a user moves a smartphone along a predefined trajectory to mimic a virtual sensor array. SAR-based imaging poses several new challenges in our context, including strong interference, trajectory errors due to hand jitter, and severe speaker and microphone distortion. We address these challenges by developing a two-stage interference cancellation scheme, a new algorithm to compensate for trajectory errors, and an effective method to minimize the impact of signal distortion. Based on these techniques, we implement the first acoustic imaging system on a commercial smartphone.

Item: Appropriate, accessible and appealing probabilistic graphical models (2017-05-04)
Inouye, David Iseri; Dhillon, Inderjit S.; Ravikumar, Pradeep; Mooney, Raymond J.; Huang, Qixing; Wallace, Byron C.

Appropriate - Many multivariate probabilistic models use either independent distributions or dependent Gaussian distributions.
Yet, many real-world datasets contain count-valued or non-negative skewed data, e.g., bag-of-words text data and biological sequencing data. Thus, we develop novel probabilistic graphical models for count-valued and non-negative data, including Poisson graphical models and multinomial graphical models. We develop a generalization that allows for triple-wise or k-wise graphical models, going beyond the usual pairwise formulation. Furthermore, we explore Gaussian-copula graphical models and derive closed-form solutions for the conditional and marginal distributions (both before and after conditioning). Finally, we derive mixture and admixture (i.e., topic model) generalizations of these graphical models to add power and interpretability.

Accessible - Previous multivariate models, especially for text data, often have complex dependencies without a closed form and require complex inference algorithms with limited theoretical justification. For example, hierarchical Bayesian models often require marginalizing over many latent variables. We show that our novel graphical models (even the k-wise interaction models) have simple and intuitive estimation procedures based on node-wise regressions, which likely enjoy theoretical guarantees similar to those of previous work on graphical models. For the copula-based graphical models, we show that simple approximations can still provide useful models; these copula models also come with closed-form conditional and marginal distributions, which makes them amenable to exploratory inspection and manipulation. The parameters of these models are easy to interpret and thus may be accessible to a wide audience.

Appealing - High-level visualization and interpretation of graphical models with even 100 variables has often been difficult even for graphical model experts, despite visualization being one of the original motivators for graphical models.
This difficulty is likely due to the lack of collaboration between graphical model experts and visualization experts. To begin bridging this gap, we develop a novel "what if?" interaction that manipulates and leverages the probabilistic power of graphical models. Our approach defines: the probabilistic mechanism via conditional probability; the query language to map text input to a conditional probability query; and the formal underlying probabilistic model. We then propose to visualize these query-specific probabilistic graphical models by combining the intuitiveness of force-directed layouts with the beauty and readability of word clouds, which pack many words into valuable screen space while ensuring words do not overlap via pixel-level collision detection. Although both the force-directed layout and the pixel-level packing problems are challenging in their own right, we approximate both simultaneously via adaptive simulated annealing starting from careful initialization. For visualizing mixture distributions, we also design a meaningful mapping from the properties of the mixture distribution to a color in the perceptually uniform CIELUV color space. Finally, we demonstrate our approach via illustrative visualizations of several real-world datasets.
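The conditional-probability mechanism behind such "what if?" queries can be sketched with a small Gaussian example, for which conditioning has a familiar closed form (the dissertation's models are count-valued and copula-based; the model and numbers here are purely illustrative): setting one variable to a hypothetical value shifts the conditional means of the others according to their covariances with it.

```python
import numpy as np

# Toy 3-variable Gaussian model (illustrative parameters).
mu = np.array([1.0, 2.0, 0.0])
Sigma = np.array([[1.0, 0.6, 0.0],
                  [0.6, 1.0, 0.3],
                  [0.0, 0.3, 1.0]])

def condition(mu, Sigma, idx, value):
    """Mean and covariance of the remaining variables given X[idx] = value."""
    rest = [i for i in range(len(mu)) if i != idx]
    S12 = Sigma[np.ix_(rest, [idx])]       # cross-covariances with X[idx]
    S22 = Sigma[idx, idx]                  # variance of X[idx]
    mu_c = mu[rest] + (S12 / S22).ravel() * (value - mu[idx])
    Sig_c = Sigma[np.ix_(rest, rest)] - S12 @ S12.T / S22
    return mu_c, Sig_c

# "What if X0 were 2.0?" -> X1's conditional mean rises from 2.0 to 2.6,
# while X2's stays at 0.0 (X0 and X2 are uncorrelated in this toy model).
mu_c, Sig_c = condition(mu, Sigma, idx=0, value=2.0)
```

A query-specific visualization would then re-render the graph using `mu_c` and `Sig_c` in place of the unconditional parameters, which is exactly the kind of manipulation the closed-form conditionals of the copula models make cheap.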