A new spatial model for predicting multivariate counts: anticipating pedestrian crashes across neighborhoods and firm births across counties
Transportation research regularly relies on data with both spatial and temporal dimensions. Thanks to the rise of smartphones, Bluetooth, and other devices, geo-referenced data collection enables the application of more behaviorally realistic, but more complex, models that account for spatial autocorrelation, temporal correlation, and possible time-space interactions (e.g., time-lagged effects from a neighboring unit's response). One promising area is crash count prediction, where crash frequencies (and severities) at zones, at intersections, and along roadways generally exhibit spatial relationships due to missing variables, causal mechanisms, and other ties. This dissertation proposes and estimates a spatial multivariate count model and applies it in two case studies. The first examines pedestrian-vehicle crash counts across zones in Austin, Texas, while accounting for network features (e.g., lane-miles and intersection density), land use factors (such as land use entropy and residential accessibility to commercial activities), population and job densities, and school access. The second examines new firm births by industry across U.S. counties while controlling for population density, agglomeration economies (e.g., percentage of firms with more than 100 people), wealth, and median age. The new model specification captures region-wide heterogeneity (through the extra variation introduced by the lognormal component in the mean crash-rate specification), correlations across two (or more) count types in the same zone, and spatial autocorrelation among unobserved components. This approach and its applications allow analysts to distinguish covariates' effects on multivariate crash and other counts from spatial spillover effects and cross-response correlations.
This work adds to the literature by providing guidance on which types of specifications best reflect spatial count data while facilitating estimation with large data sets. It also illuminates the level and nature of the spatial autocorrelation, multivariate correlation, and region-wide (latent) heterogeneity that exist in crash data after controlling for a host of observable factors.
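To make the three ingredients named above concrete (a lognormal component in the mean rate, correlation across count types within a zone, and spatial autocorrelation among unobserved components), the following is a minimal simulation sketch. It is not the dissertation's estimator, and the functional form is an assumption made for illustration: the abstract does not give the exact equations. The sketch assumes a Poisson-lognormal structure in which cross-type shocks are drawn from a multivariate normal with covariance Sigma and then spatially filtered through (I - rho W)^{-1}. The names simulate_spatial_mv_counts, ring_weights, rho, Sigma, and betas are hypothetical.

```python
import numpy as np


def ring_weights(n):
    """Hypothetical row-standardized weights: each zone neighbors the
    two adjacent zones on a ring (illustration only)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W


def simulate_spatial_mv_counts(X, W, betas, rho, Sigma, exposure=None, seed=0):
    """Simulate K count types over n zones under an ASSUMED
    spatial multivariate Poisson-lognormal form.

    X        : (n, p) covariates
    W        : (n, n) row-standardized spatial weights
    betas    : (p, K) coefficients, one column per count type
    rho      : scalar spatial autocorrelation, |rho| < 1
    Sigma    : (K, K) covariance of the cross-type shocks
    exposure : (n,) optional exposure term (defaults to 1)
    """
    rng = np.random.default_rng(seed)
    n, K = X.shape[0], betas.shape[1]
    if exposure is None:
        exposure = np.ones(n)

    # Shocks: independent across zones, correlated across count types.
    eps = rng.multivariate_normal(np.zeros(K), Sigma, size=n)  # (n, K)

    # Spatial filter: phi solves (I - rho * W) phi = eps, so each zone's
    # unobserved component depends on its neighbors' components.
    phi = np.linalg.solve(np.eye(n) - rho * W, eps)  # (n, K)

    # Lognormal heterogeneity enters the log of the Poisson mean.
    log_rate = np.log(exposure)[:, None] + X @ betas + phi
    counts = rng.poisson(np.exp(log_rate))
    return counts, phi


if __name__ == "__main__":
    n = 8
    W = ring_weights(n)
    X = np.column_stack([np.ones(n), np.linspace(-1.0, 1.0, n)])
    betas = np.array([[0.5, 0.2], [0.3, -0.1]])   # two count types
    Sigma = np.array([[0.2, 0.1], [0.1, 0.3]])    # cross-type covariance
    counts, phi = simulate_spatial_mv_counts(X, W, betas, rho=0.4, Sigma=Sigma)
    print(counts)
```

Setting rho to 0 removes the spatial dependence, and making Sigma diagonal removes the cross-response correlation, so the sketch can be used to see how each component changes the simulated counts.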