A Machine Learning Approach: Socio-economic Analysis to Support and Identify Resilient Analog Communities in Texas
Identifying analogs is important when planning and developing new communities, because the available information is usually limited or absent. Conventionally, analogs are selected by domain experts; however, such expertise is not always readily available. Compounding this challenge, most available socioeconomic datasets are high-dimensional, which makes them difficult to interpret and visualize. It is therefore crucial to adopt a workflow that can identify analogs despite this high dimensionality. To this end, we present two systematic and unbiased measures, the group similarity score (GCS) and the similarity scoring metric (SSM), to support the prediction of missing properties of target communities and the identification of analogous cities from available socioeconomic data and modeling. Because each Texan community can be characterized by its associated properties, the workflow combines spatial and multivariate statistics in a novel manner to determine the GCS and SSM while visualizing the associated uncertainty space. The workflow consists of three major steps: 1) key parameter selection via feature engineering; 2) multivariate and spatial analysis, using multidimensional scaling (MDS) for dimensionality reduction and density-based spatial clustering of applications with noise (DBSCAN) for clustering; and 3) similarity ranking on the preprocessed data, using a modified Mahalanobis distance function as the clustering basis. To assess the quality of the predicted features and the analog communities obtained, the k-nearest neighbor (KNN) algorithm is then applied to identify the analog cities. The workflow is demonstrated on high-dimensional socioeconomic data. We identify analogs for each community cluster, together with their GCS and SSM, relative to four randomly selected test communities.
We therefore recommend integrating this workflow into uncertainty exploration, trend mapping, community analog assignment, and benchmarking to support decision making.
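The similarity-ranking and KNN steps summarized in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses the standard (unmodified) Mahalanobis distance, because the abstract does not describe the modification. It maps distance to a score with a hypothetical 1/(1 + d) function standing in for the SSM. It estimates a target community's missing property as the mean over its k nearest candidate communities. The feature values, the `resilience` property, and all function names are made up for illustration.

```python
import numpy as np

def standardize(X, ref):
    """Z-score X using the column means and standard deviations of `ref`."""
    mu = ref.mean(axis=0)
    sd = ref.std(axis=0, ddof=1)
    sd = np.where(sd == 0, 1.0, sd)  # avoid dividing by zero for constant columns
    return (X - mu) / sd

def mahalanobis_to_target(X, target):
    """Standard Mahalanobis distance from every row of X to `target`, using the
    sample covariance of X (pseudo-inverse in case it is near-singular)."""
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
    diff = X - target
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)  # row-wise diff^T S^-1 diff
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negative round-off

def similarity(d):
    """Hypothetical score in (0, 1]: equals 1 when the distance is zero."""
    return 1.0 / (1.0 + d)

# Toy data (made up): rows are communities; columns are hypothetical features,
# e.g. median income ($k), population density, unemployment rate (%).
features = np.array([
    [52.0, 1200.0, 4.1],
    [61.0,  300.0, 3.2],
    [45.0, 2500.0, 6.0],
    [70.0,  800.0, 2.9],
    [48.0, 1900.0, 5.4],
    [58.0,  450.0, 3.8],
])
# Hypothetical property that is known for each candidate community.
resilience = np.array([0.62, 0.71, 0.40, 0.80, 0.47, 0.66])

# Target community whose property is "missing"; it matches row 2 exactly.
target_raw = np.array([45.0, 2500.0, 6.0])

# Preprocess candidates and target with the same (candidate) statistics.
Z = standardize(features, features)
z_target = standardize(target_raw, features)

distances = mahalanobis_to_target(Z, z_target)
scores = similarity(distances)
ranking = np.argsort(distances)  # most similar community first

k = 3
nearest = ranking[:k]
predicted = resilience[nearest].mean()  # KNN estimate of the missing property

print("ranking:", ranking.tolist())
print("scores:", np.round(scores[ranking], 3).tolist())
print(f"predicted property for target: {predicted:.3f}")
```

Because a full-covariance Mahalanobis distance is already invariant to rescaling individual features, the z-scoring step does not change the ranking here. It would matter if the covariance were replaced by a diagonal or shrunk estimate. The MDS and DBSCAN step is omitted. If clustering is needed before ranking, scikit-learn provides `sklearn.manifold.MDS` and `sklearn.cluster.DBSCAN`.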