Pedestrian crash prediction and analyzing contributing factors across Texas: an interpretable machine learning approach
This study applied tree-based machine learning methods to investigate the factors contributing to both crash frequency and injury severity in vehicle-pedestrian crashes. Vehicle and roadway characteristics, driver and pedestrian attributes, traffic controls, land use conditions, transit provision, and weather conditions were used as covariates to predict pedestrian crash frequencies (by roadway segment) and injury severity levels (for pedestrians struck by vehicles on public roadways). In both cases, tree-based models offered significantly higher prediction accuracy than traditional statistical models (negative binomial and ordered probit specifications with the same covariates). The tree-based models also offer valuable interpretability through the regression tree graph itself (with clear branching based on variable cut-points), variable importance plots (for each covariate), and partial dependence plots, which help analysts understand the relationship between contributing factors and the target variable (count or severity). Average daily vehicle-miles travelled (DVMT) on each road segment, population density, segment length, census tract-level job density, distance from the nearest K-12 school, transit stop density, and segment speed limit were estimated to be the top factors contributing to higher pedestrian crash counts. DVMT was found to be the single most influential factor for vehicle-pedestrian crashes and, in a way, represents pedestrian exposure to such crashes. In terms of injury outcomes, intoxication of the pedestrian and/or driver, higher speed limits at the site, crash locations outside the trafficway, older pedestrians, interstate highway locations, and dark, unlit conditions were predictors of more severe outcomes.
Importantly, if the surrounding urban area’s population is reasonably high (over 25,000 persons), the probability of the pedestrian dying falls significantly. This supports the ‘safety in numbers’ idea: more people may be available to help crash victims, drivers may travel more slowly in crowded conditions, hospitals may be closer, and so on. Few crash studies have included land use variables and local demographics, such as proximity to schools, hospitals, and transit stops. Here, population and job density variables appeared to add to pedestrian crash counts and severity, though this effect is moderated by the 25,000-population threshold and the distance variables.
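
As an illustration of how a partial dependence value is obtained from a tree with explicit cut-points, the following is a minimal sketch in plain Python, not the study's implementation. The function names, the 45 mph speed cut-point, the four example records, and all leaf probabilities are invented placeholders for illustration only; the 25,000-person population cut-point is the only value taken from the text above.

```python
from statistics import mean


def toy_severity_tree(row):
    """Hypothetical hand-written tree; leaf values are made-up placeholders,
    not estimates from the study."""
    if row["urban_population"] > 25_000:      # cut-point mentioned in the abstract
        if row["speed_limit_mph"] > 45:       # hypothetical cut-point
            return 0.10
        return 0.04
    if row["speed_limit_mph"] > 45:           # hypothetical cut-point
        return 0.25
    return 0.12


def partial_dependence(predict, rows, feature, grid):
    """For each grid value, force `feature` to that value in every row
    and average the model's predictions over all rows."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append((value, mean(preds)))
    return curve


# Invented example records (not study data).
rows = [
    {"urban_population": 5_000, "speed_limit_mph": 55},
    {"urban_population": 5_000, "speed_limit_mph": 30},
    {"urban_population": 80_000, "speed_limit_mph": 35},
    {"urban_population": 300_000, "speed_limit_mph": 60},
]

pd_curve = partial_dependence(
    toy_severity_tree, rows, "urban_population", [10_000, 50_000]
)
for value, avg_pred in pd_curve:
    print(f"urban_population={value}: average prediction {avg_pred:.3f}")
```

Because every other covariate keeps its observed value while one feature is varied, the resulting curve shows that feature's average effect on the prediction. With a fitted ensemble in place of the toy tree, the same averaging step underlies a partial dependence plot.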