A Machine Learning Approach: Socio-economic Analysis to Support and Identify Resilient Analog Communities in Texas

dc.contributor: Pyrcz, Michael
dc.creator: Mabadeje, Ademide O.
dc.date.accessioned: 2023-11-20T22:33:28Z
dc.date.available: 2023-11-20T22:33:28Z
dc.date.issued: 2022-08-26
dc.description.abstract: Identifying analog resources or items is important during the planning and development of new communities because available information is usually limited or absent. Conventionally, analogs are identified by domain experts; however, expert input is not always readily obtainable. Compounding this challenge, most available data in socioeconomic systems are high dimensional, which makes these datasets difficult to interpret and visualize. Hence, it is crucial to adopt a workflow that can identify analogs regardless of the data's high dimensionality. To this end, we present systematic and unbiased measures, the group similarity score (GCS) and the similarity scoring metric (SSM), to support the predictive search for missing properties of target communities and the identification of analogous cities based on available socioeconomic data and modeling. Because each Texan community can be characterized by its associated properties, the workflow combines spatial and multivariate statistics in a novel manner to determine the GCS and SSM while visualizing the associated uncertainty space. The workflow consists of three major steps: 1) key parameter selection via feature engineering, 2) multivariate and spatial analysis using multidimensional scaling (MDS) and density-based spatial clustering of applications with noise (DBSCAN) for clustering analysis, and 3) similarity ranking using a modified Mahalanobis distance function as a clustering basis on preprocessed data. Afterwards, the K-nearest neighbor algorithm is applied to assess the quality of the predicted features and analog communities obtained, and the analog cities are then found. The workflow is demonstrated on high-dimensional socioeconomic data. We find analogs for each identified community cluster, with their GCS and SSM, in relation to 4 randomly selected communities used for testing. We therefore recommend integrating this workflow into uncertainty exploration, trend mapping, community analog assignment, and benchmarking to support decision making.
dc.description.department: Petroleum and Geosystems Engineering
dc.description.sponsorship: IC2 Institute
dc.identifier.uri: https://hdl.handle.net/2152/122683
dc.identifier.uri: https://doi.org/10.26153/tsw/49486
dc.language.iso: en_US
dc.relation.ispartof: UT Faculty/Researcher Works
dc.rights.restriction: Open
dc.subject: petroleum engineering
dc.subject: analog
dc.title: A Machine Learning Approach: Socio-economic Analysis to Support and Identify Resilient Analog Communities in Texas
dc.type: Technical report
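
The similarity-ranking step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the standard Mahalanobis distance rather than the modified function the report describes, it omits the feature engineering, MDS, DBSCAN, and K-nearest neighbor steps, and the function name `mahalanobis_rank`, the synthetic data, and the choice of 50 communities with 4 features are all assumptions made only for illustration.

```python
import numpy as np

def mahalanobis_rank(features, target_index):
    """Rank rows of `features` by standard Mahalanobis distance to one target row.

    Hypothetical helper for illustration; not the modified distance from the report.
    """
    X = np.asarray(features, dtype=float)
    # Covariance across communities (rows are observations, columns are features).
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Pseudo-inverse guards against a singular covariance matrix.
    cov_inv = np.linalg.pinv(cov)
    diff = X - X[target_index]
    # d_i = sqrt((x_i - x_t)^T S^-1 (x_i - x_t)) for every row i.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    dist = np.sqrt(np.maximum(d2, 0.0))
    # Sort ascending (most similar first) and drop the target itself.
    order = np.argsort(dist, kind="stable")
    order = order[order != target_index]
    return order, dist

# Synthetic placeholder data: 50 hypothetical communities, 4 hypothetical features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
order, dist = mahalanobis_rank(X, target_index=0)
print("Closest analog candidates (row indices):", order[:5])
```

Unlike Euclidean distance, the Mahalanobis distance rescales by the feature covariance, so correlated or differently scaled socioeconomic variables do not dominate the ranking.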

Access full-text files

Original bundle

Name: IC2 Report 2022 Mabadeje_Pyrcz.pdf
Size: 664.83 KB
Format: Adobe Portable Document Format
