TACCSTER 2021 Proceedings

Permanent URI for this collection: https://hdl.handle.net/2152/89497


Recent Submissions

  • Item
    Quantum Mechanical Studies of Topiroxostat Analogs for XO inhibition
    (2021) James, Samantha; Dong, Chao
    Gout is a chronic inflammatory condition due to slow urate metabolism. Xanthine oxidase (XO) inhibitors, such as allopurinol and febuxostat, are recommended to reduce uric acid levels and prevent gout attacks in adult patients. Topiroxostat (FYX051), an emerging new-generation xanthine oxidase inhibitor, is highly cost-effective compared with febuxostat therapy in patients with chronic gout. Here, we designed several modified inhibitors in which the 1,2,4-triazole ring is replaced by furan, thiophene, pyrrole, or pyrazole. Quantum mechanical calculations were performed to obtain the binding energies of these topiroxostat analogs with the xanthine oxidase active site. Preliminary data indicate an interesting linear relationship between binding energy and the polarity of the topiroxostat analogs.
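The binding-energy/polarity relationship described above can be sketched in Python. This is a minimal sketch under stated assumptions: it uses the common supermolecular definition of binding energy, E(complex) minus E(active site) minus E(ligand), and an ordinary least-squares line against a polarity descriptor such as a computed dipole moment. The abstract does not say which definition, descriptor, or level of theory was used, and the function names are hypothetical.

```python
# Conversion factor: 1 hartree is about 627.509 kcal/mol.
HARTREE_TO_KCAL = 627.509


def binding_energy(e_complex, e_site, e_ligand):
    """Supermolecular binding energy in kcal/mol from total energies in hartree.

    Assumed definition: E(complex) - E(active site) - E(ligand);
    no counterpoise or solvation correction is applied here.
    """
    return (e_complex - e_site - e_ligand) * HARTREE_TO_KCAL


def linear_fit(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b, sxy ** 2 / (sxx * syy)
```

In use, one would pass one polarity value and one binding energy per analog to `linear_fit`; an `r_squared` close to 1 would reflect the linear trend the abstract reports.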
  • Item
    Exploring the Role of Interfacial Cation in F- Ion Channel using MD Simulation: Application of Computational Chemistry
    (2021) Chezhian, Aru; Momin, Zabin; Torabifard, Hedieh
    For many microbes, fluoride ion (F-) is toxic at high concentrations. To resist F- toxicity, microbes have evolved a resistance mechanism in which the Fluc channel exports F- ions with high selectivity. Fluc has several unique features, including a dual-topology dimeric architecture. It has been shown that a Na+ ion is located at the interface of the dimer; however, the proposed Na+ is tetrahedrally coordinated, whereas Na+ usually coordinates with 5 or 6 ligands. This study details the role of a tetrahedrally coordinated sodium ion in structural stability and helps identify the residues that contribute to high F- selectivity. We are modeling Fluc with various cations, including Mg2+ and Mn4+, to provide a comprehensive comparison of Fluc structural stability and conformational changes. This research proposes an alternative interfacial ion for Fluc and could have broader implications for future studies of this channel and of other cation-coupled transporters for antimicrobial drug design.
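Whether an interfacial ion is tetrahedrally coordinated (4 ligands) or has the more usual 5 or 6 ligands is typically checked in MD trajectories by counting ligand atoms within a distance cutoff of the ion. The sketch below shows that idea; the 3.0 Å cutoff, the tuple-based frame format, and the function names are assumptions for illustration, not the authors' analysis scripts. Real trajectories would be read with an MD analysis library.

```python
import math


def coordination_shell(ion_xyz, ligand_atoms, cutoff=3.0):
    """Names of ligand atoms within `cutoff` angstrom of the ion.

    ligand_atoms: iterable of (name, (x, y, z)) tuples, e.g. oxygen atoms.
    The default cutoff is an assumed first-shell radius, not a fitted value.
    """
    return [name for name, xyz in ligand_atoms if math.dist(ion_xyz, xyz) <= cutoff]


def mean_coordination_number(frames, cutoff=3.0):
    """Average coordination number over MD frames.

    frames: list of (ion_xyz, ligand_atoms) pairs, one per frame.
    """
    counts = [len(coordination_shell(ion, atoms, cutoff)) for ion, atoms in frames]
    return sum(counts) / len(counts)
```

A mean close to 4 across frames would be consistent with tetrahedral coordination; comparing it for Na+, Mg2+, and other cations is one way to quantify the structural comparison described above.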
  • Item
    Numerical Simulations against Experimental Data: A Comparison of Atmospheric Flow over Real Terrain Topography
    (2021) Bernardoni, Federico; Ciri, Umberto; Leonardi, Stefano
    The application of reduced-order models to estimate local atmospheric flows over complex terrain is challenged by the variety of terrain geometrical features and atmospheric flow parameters. For example, when assessing a new wind farm site, one must accurately estimate how the terrain topography affects local flow features and, consequently, the Annual Energy Production. In this study, we compare numerical and experimental data for the flow over real terrain topography. We consider the complex terrain of the Perdigão site in Portugal, which consists of two parallel ridges with a wind turbine located on top of one of them. In 2017, a field campaign collected data at several locations. We carried out Large Eddy Simulations of the flow over the site topography and compared them with the experimental data. In particular, wind velocity profiles and turbulence intensities from the simulations are compared at the locations of the field campaign's met towers inside the valley and on top of the ridges. Vegetation is also reproduced in the simulations, and the comparison with experimental data shows its significant effect on the wind velocity profile close to the ground.
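The profile comparison above relies on two statistics per height: mean wind speed and turbulence intensity. A minimal sketch, assuming the usual definition TI = σ_u / U over an averaging window (the window length and the dictionary-by-height layout are assumptions, and the function names are hypothetical):

```python
import math


def mean_and_ti(u_samples):
    """Mean wind speed and turbulence intensity TI = sigma_u / U_mean."""
    n = len(u_samples)
    mean_u = sum(u_samples) / n
    variance = sum((u - mean_u) ** 2 for u in u_samples) / n
    return mean_u, math.sqrt(variance) / mean_u


def profile_errors(simulated, measured):
    """Differences (simulated - measured) in mean speed and TI at matching heights.

    Both arguments map height in metres to (mean_u, ti) at one met tower.
    """
    errors = {}
    for height, (u_sim, ti_sim) in simulated.items():
        u_obs, ti_obs = measured[height]
        errors[height] = (u_sim - u_obs, ti_sim - ti_obs)
    return errors
```

Computing the same statistics from LES probe signals and from met-tower records keeps the comparison consistent, provided both use the same averaging window.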
  • Item
    Flow Stabilization in a 3D Channel flow
    (2021) Balogh, Andras; Vasquez, Camille
    We present the results of numerical simulations for the boundary feedback stabilization of the parabolic steady-state profile of the incompressible Navier-Stokes equations in a 3D channel flow. The computation is based on an MPI code written in FORTRAN that uses a hybrid pseudospectral-finite difference discretization and a fractional step technique. The decentralized, static boundary feedback control laws are derived using a Lyapunov technique. While the theoretical results are limited to stability enhancement at small Reynolds numbers, the numerical results demonstrate the effectiveness of the proposed feedback law even when the uncontrolled flow is turbulent.
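In hybrid pseudospectral-finite difference channel codes, the periodic streamwise and spanwise directions are commonly differentiated spectrally while finite differences handle the wall-normal direction. The abstract does not say which directions use which scheme, so treat that split as an assumption. The sketch below shows Fourier pseudospectral differentiation of periodic samples with a naive O(n²) DFT for clarity; a production Fortran/MPI code would use an FFT library instead.

```python
import cmath
import math


def spectral_derivative(f, length=2 * math.pi):
    """d/dx of periodic samples f on x_j = j * length / n via a naive DFT."""
    n = len(f)
    # Forward DFT: F_k = sum_j f_j * exp(-2*pi*i*k*j/n)
    coeffs = [sum(f[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
              for k in range(n)]
    deriv_coeffs = []
    for k, c in enumerate(coeffs):
        if k < n / 2:
            wavenumber = k
        elif k == n / 2:
            wavenumber = 0  # zero the Nyquist mode for an odd-order derivative
        else:
            wavenumber = k - n
        # Differentiation multiplies each mode by i * (2*pi/length) * wavenumber
        deriv_coeffs.append(1j * (2 * math.pi / length) * wavenumber * c)
    # Inverse DFT; imaginary parts are round-off for real input
    return [(sum(c * cmath.exp(2j * math.pi * k * j / n)
                 for k, c in enumerate(deriv_coeffs)) / n).real
            for j in range(n)]
```

For resolved smooth periodic fields the result is accurate to round-off, which is why spectral methods are attractive in the homogeneous directions of channel flow.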
  • Item
    Using Supercomputing Resources in Genomic Research
    (2021) Bacolla, Albino; Tsai, Chi-Lin; Ye, Zu; De-Paula, Ruth B; Moiani, Davide; Ahmed, Zamal; Tainer, John A
    TACC resources have proven critical for mining cancer genomic data, genomic variants associated with human disease, and polymorphic human traits, addressing biological questions otherwise inaccessible to conventional experiments. We have developed computational scripts that run in a parallel environment to harness the capabilities of TACC HPC systems, and we have made them publicly available on GitHub. In selected peer-reviewed publications acknowledging TACC support, we have reported the association of DNA sequences able to form alternative DNA structures (non-B DNA) with sites of chromosomal breaks leading to gross chromosomal translocations in cancer genomes, with sites of gene duplication predisposing to Parkinson’s disease, and, most recently, with regions of increased polymorphism in the human population. We found an exquisite correlation between the expression of selected genes and the mutational burden in cancer patients. While we were solving the crystal structure of a poorly characterized exonuclease, EXO5, TACC resources enabled us to assign EXO5 a role in the cellular response to DNA damage, a vital pathway used by tumors to survive and grow, and to identify key genes whose high expression is linked to poor survival in cancer patients. Most recently, during the discovery of a nuclear role for GRB2, an adaptor protein previously thought to act only in the cytoplasm, TACC resources enabled us to test hypotheses derived from laboratory data. We were gratified to confirm the laboratory prediction that high expression of GRB2, together with its binding partner the MRE11 nuclease, carries accurate prognostic power for poor survival in breast cancer patients proficient in DNA homology-directed repair.
These composite findings, significantly facilitated by TACC resources, have been critical to furthering our understanding of biological processes relevant to human disease and to providing knowledge for the development of more precise therapeutic tools aimed at improving human health.
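Non-B DNA includes several structure types, such as G-quadruplexes, hairpins formed by inverted repeats, and Z-DNA. As one illustrative example, putative G-quadruplex motifs are often located with a regular expression requiring four runs of at least three guanines separated by loops of 1 to 7 nucleotides. This sketch is not the authors' published GitHub scripts; the pattern parameters and function names are assumptions, and genome-scale scans would split chromosomes across parallel tasks.

```python
import re

# Putative G-quadruplex: four G-runs (>= 3 G) separated by 1-7 nt loops (assumed rule).
G4_PATTERN = re.compile(r"G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}")


def find_g4_motifs(sequence):
    """(start, end) spans of non-overlapping G4 motifs on the given strand."""
    return [m.span() for m in G4_PATTERN.finditer(sequence.upper())]


def reverse_complement(sequence):
    """Reverse complement, used to scan the opposite (C-rich) strand."""
    table = str.maketrans("ACGTN", "TGCAN")
    return sequence.upper().translate(table)[::-1]
```

Motif coordinates found this way could then be intersected with breakpoint, duplication, or polymorphism coordinates to test for the kind of associations described above.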
  • Item
    Unified Robust Estimation via the COCO
    (2021) Wang, Zhu
    Robust estimation is primarily concerned with providing reliable parameter estimates in the presence of outliers. Numerous robust loss functions have been proposed for regression and classification, along with various computing algorithms. This article proposes a unified framework for loss function construction and parameter estimation. The CC-family consists of compositions of concave and convex functions. The properties of the CC-family are investigated, and CC-estimation is conducted via composite optimization by conjugation operator (COCO). The weighted estimators are simple to implement, show robust performance in penalized generalized linear models and support vector machines, and can be conveniently extended to broader applications with existing software. The data analysis in this work was conducted on TACC systems using parallel computing.
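The abstract does not give specific CC-family members or the COCO update. As a hedged illustration only: composing a concave function g with a convex loss s leads to an iteratively reweighted scheme in which each observation's weight is g'(s(r)), evaluated at its current residual r. This is our reading of the weighted-estimator idea and should be checked against the article. Here s(u) = u²/2 and g(z) = σ²(1 − exp(−z/σ²)), a correntropy-style choice, giving weights exp(−r²/(2σ²)). The choice of g, σ, and the function names are assumptions, not the author's software.

```python
import math


def weighted_line_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x (closed-form normal equations)."""
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * swxx - swx * swx
    if det == 0:
        raise ValueError("degenerate weights: cannot fit a line")
    b = (sw * swxy - swx * swy) / det
    a = (swy - b * swx) / sw
    return a, b


def robust_line_fit(xs, ys, sigma=5.0, iterations=50):
    """Reweighted fit for the composite loss g(s(u)) with
    convex s(u) = u**2 / 2 and concave g(z) = sigma**2 * (1 - exp(-z / sigma**2)).
    Each pass uses weights g'(s(r)) = exp(-r**2 / (2 * sigma**2))."""
    a, b = weighted_line_fit(xs, ys, [1.0] * len(xs))  # start from ordinary least squares
    for _ in range(iterations):
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        ws = [math.exp(-r * r / (2 * sigma ** 2)) for r in residuals]
        a, b = weighted_line_fit(xs, ys, ws)
    return a, b
```

With points on y = 2x + 1 for x = 0..9 and one gross outlier at x = 9, ordinary least squares gives a slope above 6, while the reweighted fit recovers slope 2 and intercept 1, because the outlier's weight decays to essentially zero.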
  • Item
    Posterior-vertical White Matter Tracts Cluster with Ventral Stream Tracts in Development and Predict Behavioral Variability
    (2021) Vinci-Booher, S; Caron, B; Bullock, D; James, K H; Pestilli, F
    A relatively unexplored group of white matter tracts has recently been described and made available for scientific investigation. These posterior vertical white matter tracts connect cortical regions associated with the ventral and dorsal visual streams, making them distinct from the well-studied canonical tracts that connect anterior and posterior cortical regions. We used diffusion MRI and open-source cloud computing to characterize the development of vertical white matter tracts in a cross-sectional sample of 24 children (5-8 years old) and 12 adults (18-22 years old). Horizontal tracts within the ventral visual stream (ILF, IFOF) had fractional anisotropy (FA) that was more adult-like than that of horizontal tracts in the dorsal cortex (SLF1/2 and SLF3), consistent with prior work. A clustering analysis demonstrated that the mean FA of the posterior vertical tracts (pArc, TPC, MdLF-spl, MdLF-ang) was more similar to that of the ventral tracts than to that of the dorsal tracts. Performance on a perceptual matching task predicted FA in the pArc in the child sample. Our results suggest that posterior vertical tracts develop later than ventral stream tracts but earlier than tracts in the dorsal cortex, and that the development of posterior vertical tracts may be related to the influence of perceptual processing on dorsally mediated action in childhood.
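FA, the measure compared across tracts above, is computed from the three eigenvalues of the diffusion tensor at each voxel. The sketch uses the standard FA formula; averaging FA over the nodes of a tract to get a tract-level value is an assumption about the tractometry step, which the abstract does not describe, and the function names are hypothetical.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """FA from diffusion-tensor eigenvalues: 0 = isotropic, 1 = fully linear."""
    mean = (l1 + l2 + l3) / 3.0
    num = (l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    if den == 0:
        raise ValueError("all eigenvalues are zero")
    return math.sqrt(1.5 * num / den)


def tract_mean_fa(node_eigenvalues):
    """Mean FA over the nodes of one tract (assumed tract-level summary)."""
    values = [fractional_anisotropy(*ev) for ev in node_eigenvalues]
    return sum(values) / len(values)
```

Tract-level mean FA values for each participant are the kind of features a clustering analysis like the one above would group.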
  • Item
    An End-to-End Framework for Informing Hurricane Resilience Investments in Critical Infrastructure
    (2021) Shukla, Ashutosh; Austgen, Brent; Kutanoglu, Erhan; Hasenbein, John
    It is estimated that hurricanes making landfall in the United States over the last decade have collectively cost more than $500B, and recent studies suggest that we are likely to experience more frequent and intense hurricanes in the future. Prudent resilience investments are paramount to decreasing the cost and negative effects of critical infrastructure failures that result from such extreme weather events. We have developed an end-to-end framework for forecasting hurricane effects on short- and mid-term time horizons and subsequently leveraging those simulations in stochastic optimization models designed to inform resilience investment decisions. We present this framework with emphasis on the aspects that currently can, or soon may, leverage HPC systems such as those hosted by TACC. These include Monte Carlo simulation of hurricane effects using computationally expensive physical models such as WRF-Hydro, NWM, and ADCIRC, and the parallelization features supported by state-of-the-art optimization solvers. In the near future, we hope to also leverage parallelizable decomposition schemes for solving large-scale optimization problems (e.g., those from mpi-sppy). HPC has enabled us to solve problems at scales seldom seen in the academic literature, and our results are helping shape investment policies for critical infrastructure resilience.
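The link between Monte Carlo hurricane simulations and investment decisions can be illustrated with a toy sample-average approximation: each simulated scenario lists which sites fail, and the decision is which sites to harden within a budget to minimize expected loss. This is a deliberately tiny, brute-force sketch; the site names, costs, losses, and function names are hypothetical, and real instances are solved with mixed-integer solvers and decomposition methods rather than enumeration.

```python
from itertools import combinations


def expected_loss(hardened, scenarios, loss):
    """Average loss over equally likely scenarios; hardened sites lose nothing."""
    total = 0.0
    for flooded in scenarios:
        total += sum(loss[site] for site in flooded if site not in hardened)
    return total / len(scenarios)


def best_investment(sites, cost, budget, scenarios, loss):
    """Enumerate affordable hardening sets and return (best_set, expected_loss)."""
    best_set, best_value = (), expected_loss(set(), scenarios, loss)
    for k in range(1, len(sites) + 1):
        for subset in combinations(sites, k):
            if sum(cost[s] for s in subset) > budget:
                continue  # over budget
            value = expected_loss(set(subset), scenarios, loss)
            if value < best_value:
                best_set, best_value = subset, value
    return best_set, best_value
```

With three sites A, B, C (unit cost, budget 1), losses 10, 5, and 1, and scenarios {A}, {A, B}, {B, C}, hardening A minimizes the expected loss at 11/3. Adding more Monte Carlo scenarios sharpens this estimate, which is where HPC resources come in.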
  • Item
    Prediction of Ligand Activity at Subcellular Location
    (2021) Varshney, Manikya; Verma, Srijan; KC, Govinda; Bocci, Giovanni; Oprea, Tudor I; Sirimulla, Suman
    Understanding the subcellular distribution and mechanism of xenobiotics can help in modulating diseases mediated by subcellular dysfunction. Therefore, improved knowledge of how xenobiotics are distributed across subcellular locations, and of the mechanism of a specific molecule, can play a crucial role in assessing drug efficacy and toxicity. Such knowledge would widen therapeutic windows by allowing specific receptors to be targeted efficiently. Using datasets that provide information on the subcellular locations of proteins and their ligands, we developed machine learning models for 42 subcellular locations. The models were trained and validated using grid search, and the best models were selected based on Cohen’s kappa scores. With the help of the state-of-the-art supercomputing facilities provided by the Texas Advanced Computing Center (TACC), we developed a suite of more than 22,300 machine learning models. These models were built using 19 different fingerprint-based feature sets for 42 different subcellular locations with 28 different ML classifiers (19 × 42 × 28 = 22,344 combinations). The web application is openly available at https://drugdiscovery.utep.edu/subcell/ for anyone to perform high-throughput cheminformatics simulations. All data and models generated in the project are available as open source.
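Model selection above uses Cohen's kappa, which measures agreement between predicted and true labels beyond what chance alone would produce: κ = (p_o − p_e) / (1 − p_e). A minimal sketch with a hypothetical helper that picks the candidate with the highest validation kappa (the dictionary layout is an assumption):

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both label sets are a single class")
    return (observed - expected) / (1.0 - expected)


def select_best(models):
    """models: dict of name -> (y_true, y_pred) on a validation split."""
    return max(models, key=lambda name: cohens_kappa(*models[name]))
```

Kappa is preferred over raw accuracy for imbalanced active/inactive labels, because a model that always predicts the majority class scores a kappa of 0 even when its accuracy is high.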
  • Item
    Targeting the Undruggable: A Structure Guided Approach to Targeting KRAS
    (2021) Touhami, Rim; Ranganathan, Srivathsan
    Cancer Early Detection Advanced Research (CEDAR), Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon
    KRAS gene mutations have been found in up to one-fourth of lung cancer cases, as well as in colon and pancreatic cancers [1]. The KRAS gene belongs to the Ras family of oncogenes, which under certain circumstances can be involved in the transformation of normal cells into cancerous ones. The KRAS gene encodes the GTPase protein K-Ras, which relays signals from outside the cell to the cell’s nucleus, instructing cell growth and proliferation, or differentiation [2]. Mutations have been shown to lead to uncontrolled cell growth and cancer progression [3]. Although decades of treatment research have been performed, developing a drug that directly targets K-Ras remains a challenge [5]. This is due to the subtle structural changes in the mutated protein, as well as its intracellular location [1,5]. The only success in targeting KRAS has been achieved with small molecules; however, this approach still faces challenges such as low binding affinity. Protein scaffolds offer another promising avenue for developing allele-specific inhibitors against KRAS. Protein binders can be engineered to bind targets with high affinity and specificity, and because of their small size, they are well suited for targeting proteins within cells [5,6]. This was illustrated by the inhibitor R11.1.6, engineered in a study at MIT [5]. This project focuses on developing novel protein binders to KRAS using a structure-guided approach based on known 3D structures. There are hundreds of known KRAS structures with interacting partners, including GAP and GEF proteins, with KRAS in various conformational states. This information is a very good starting point for identifying peptide binders that could be evolved into high-affinity reagents for targeting KRAS mutants.
This research will also shed light on K-Ras interactions while providing insight into targeted drug treatment for K-Ras.
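A common first step in a structure-guided approach like the one above is to list which KRAS residues contact a partner protein (for example, a GAP or GEF) in a known complex structure. The sketch below does this with a simple distance cutoff. The 5 Å cutoff, the (chain, residue, coordinates) tuple format, and the function name are assumptions; parsing actual PDB/mmCIF files would be done with a structure library and is omitted here.

```python
import math


def interface_residues(atoms, target_chain, partner_chain, cutoff=5.0):
    """Residues of `target_chain` with any atom within `cutoff` angstrom of `partner_chain`.

    atoms: iterable of (chain_id, residue_number, (x, y, z)) tuples.
    """
    atoms = list(atoms)  # allow generators; the list is scanned twice
    target = [(res, xyz) for chain, res, xyz in atoms if chain == target_chain]
    partner = [xyz for chain, _, xyz in atoms if chain == partner_chain]
    contacts = set()
    for res, xyz in target:
        if any(math.dist(xyz, other) <= cutoff for other in partner):
            contacts.add(res)
    return sorted(contacts)
```

Repeating this over many KRAS complexes and conformational states would show which surface patches are recurrently engaged by natural partners, which are candidate sites for designed binders.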
  • Item
    Detection of Ransomware using Immutable Anomalous Performance Data
    (2021) Thummapudi, Kumar; Lama, Palden; Boppana, Rajendra V
    Ransomware (RW) is a real threat affecting almost every sector, including government agencies, businesses, educational institutions, and healthcare. Techniques to detect RW work at the pre-penetration stage (before RW is installed on a victim machine), before encryption, and during encryption. Pre-penetration detection uses IDS/IPS and network traffic analysis, which Trojans delivering RW can bypass. Detection before encryption relies on identifying and blocking the dropper or the C&C key exchange; however, this activity is short and stealthy, which makes detection harder. Detection during encryption, the last line of defense against RW, relies on identifying activities such as high-frequency file access, file entropy changes, and unusual patterns of processor or I/O events. This project aims to detect RW quickly and accurately using hardware performance counters (HPCs). TACC’s Chameleon Cloud is used to set up a Windows machine running on top of a Linux host and a KVM hypervisor. Benign applications and RW samples are run on the target, with or without a background load of standard Windows applications and browser activity, and counts of specific hardware events are captured using the HPCs. Experiments were run with three benign apps that use encryption/compression operations and 22 RW samples, including Ryuk and Locky from VirusTotal, and HPC data were collected. The captured data are split into 100 ms chunks, processed to extract time-domain and frequency-domain features, and analyzed using four machine learning (ML) models: support vector machine, decision tree, K-nearest neighbors, and random forest (RF). The RF model performs best, with an accuracy of 96.6%. Data collection, processing, and analysis are being implemented on the host Linux machine for real-time detection.
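The chunking and feature-extraction step above can be sketched for a single counter. The abstract gives the 100 ms window but not the counter sampling rate or the exact feature set, so the features here (mean, standard deviation, non-DC spectral energy, dominant frequency) and the function name are assumptions. A naive DFT is used for clarity; an FFT would be used in practice.

```python
import cmath
import math


def window_features(samples, sample_rate_hz, window_ms=100):
    """Split one counter's time series into fixed windows and extract features.

    Returns one dict per full window with time-domain (mean, std) and
    frequency-domain (non-DC spectral energy, dominant frequency) features.
    """
    n = int(sample_rate_hz * window_ms / 1000)
    if n < 2:
        raise ValueError("window must contain at least 2 samples")
    features = []
    for start in range(0, len(samples) - n + 1, n):
        w = samples[start:start + n]
        mean = sum(w) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in w) / n)
        # DFT magnitudes for bins 1..n//2 (DC bin excluded)
        mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * j / n) for j, x in enumerate(w)))
                for k in range(1, n // 2 + 1)]
        k_peak = 1 + max(range(len(mags)), key=mags.__getitem__)
        features.append({"mean": mean, "std": std,
                         "energy": sum(m * m for m in mags),
                         "dominant_hz": k_peak * sample_rate_hz / n})
    return features
```

Feature dicts from several counters over the same window would be concatenated into one vector per window before training the classifiers.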
  • Item
    ParaMonte: An Efficient Serial/Parallel MCMC Library
    (2021) Sapkota, Parvat; Osborne, Joshua A; Kumbhare, Shashank; Bagheri, Fatemeh; Shahmoradi, Amir
    Scientific inference is a multistep process requiring observational data from which a model or hypothesis is derived. The parameters of this physical model then have to be tuned to represent the data more accurately, a process known as model calibration. The calibrated model is then validated and finally used to predict different quantities of interest. The most fundamental tool for model calibration and uncertainty quantification is Markov Chain Monte Carlo (MCMC). While existing packages achieve many of the goals of MCMC simulations, none currently addresses all critical aspects of an MCMC simulation. For instance, packages are frequently limited to a single programming language environment, support only serial or only parallel simulations, or lack restart functionality. We present ParaMonte, a generic, user-friendly, high-performance toolbox for serial and parallel Monte Carlo simulations that is accessible from multiple programming languages. ParaMonte features automatically enabled restart functionality for all simulations, serial or parallel, and comprehensive post-processing and visualization of the simulation results. The package is available to the public under the MIT license from its permanent repository: https://github.com/cdslaborg/paramonte
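The core idea behind MCMC model calibration can be sketched with a random-walk Metropolis sampler for a one-dimensional log-density. This is a generic textbook sketch, not ParaMonte's API or algorithm; ParaMonte's samplers are more sophisticated and run in parallel. Returning the final state hints at restart functionality, but a true restart would also save the random-number-generator state. All names and defaults are assumptions.

```python
import math
import random


def metropolis(log_target, x0, n_samples, step=1.0, burn_in=1000, seed=0):
    """Random-walk Metropolis sampler for a 1-D log-density.

    Returns (samples after burn-in, acceptance rate, final state).
    """
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    samples, accepted = [], 0
    for i in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        logp_prop = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x))
        if rng.random() < math.exp(min(0.0, logp_prop - logp)):
            x, logp = proposal, logp_prop
            accepted += 1
        if i >= burn_in:
            samples.append(x)
    return samples, accepted / (burn_in + n_samples), x
```

Sampling the standard normal, `log_target = lambda x: -0.5 * x * x`, yields draws whose mean and variance approach 0 and 1 as the chain grows.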
  • Item
    From Cancer Sequencing Data to Neoantigen Prediction: A Reusable Pipeline using Snakemake
    (2021) Richardson, Jensen; Pritha, Jafrin; Jiang, Wenxuan; Prasad, Rohit; Arasappan, Dhivya; Kowalski-Muegge, Jeanne
    Neoantigens are novel peptides arising from somatic mutations that are capable of inducing tumor-specific T-cell recognition. Because neoantigens are expressed specifically in tumor cells, predicting them can lead to personalized immunotherapies for the treatment of cancers. This process involves many steps, the most crucial of which is identifying expressed somatic mutations (or variants) from next-generation sequencing data. After evaluating multiple bioinformatics tools for somatic mutation calling, we selected GATK (Genome Analysis ToolKit) for its ability to accurately call expected mutations. Other steps must be performed before and after somatic mutation identification, including mapping, duplicate marking, annotation of mutation calls, and filtering of mutation calls. We developed a pipeline using the workflow management system Snakemake to perform these steps and identify somatic mutations from whole-exome and RNA-Seq data. Because the pipeline is a Snakemake workflow, we can easily extend it with additional steps, as was done for neoantigen prediction. Furthermore, Snakemake submits Slurm jobs for each individual step and can intelligently adjust the runtime and processing load of those jobs, which makes it simple to run even very large samples through the pipeline. We evaluated this pipeline using RNA sequencing and whole-exome sequencing data from 46 multiple myeloma cell lines and identified hundreds of expressed mutations per cell line. This reusable and expandable pipeline can serve as a useful resource for other researchers looking to identify expressed mutations and make neoantigen predictions from cancer sequencing data.
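Snakemake builds a job graph by matching each rule's declared inputs to other rules' outputs, then runs jobs so that every step follows its dependencies. The ordering idea can be sketched with the Python standard library's `graphlib`. The step names and dependencies below are a hypothetical simplification of the steps listed above, not the authors' Snakefile, and Snakemake itself also handles file tracking and Slurm submission.

```python
from graphlib import TopologicalSorter

# Hypothetical steps; each maps to the set of steps it depends on.
STEPS = {
    "map_exome": set(),
    "map_rna": set(),
    "mark_duplicates": {"map_exome"},
    "call_variants": {"mark_duplicates"},
    "annotate_variants": {"call_variants"},
    "filter_variants": {"annotate_variants"},
    "check_expression": {"filter_variants", "map_rna"},
    "predict_neoantigens": {"check_expression"},
}


def execution_order(steps):
    """One valid run order in which every step comes after its dependencies."""
    return list(TopologicalSorter(steps).static_order())
```

Extending the workflow, as was done for neoantigen prediction, amounts to adding a node whose dependencies point at existing outputs; independent branches (here, exome and RNA mapping) can run concurrently.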
  • Item
    DFT Investigation of para-Substituent Effects on the C—C Bond Activation of Benzonitrile by a Zerovalent Nickel Complex
    (2021) Rodriguez, Juliana; Rodriguez, Julissa; Atesin, Abdurrahman C; Jones, William D; Ateşin, Tülay A
    C—C bond activation has been an active area of research due to its extensive range of applications in industry and synthesis. Despite its significance, cleaving a C—C bond has been challenging due to the thermodynamic stability and steric hindrance of C—C σ-bonds. In this study, density functional theory (DFT) calculations on the C—CN bond activation of para-substituted benzonitriles, p-XC6H4CN, where X = NH2, OCH3, CH3, H, F, CO2CH3, CF3, and CN, with the [Ni(dmpe)] fragment as a model for the [Ni(dippe)] fragment, will be reported. A comparison of the computational results with previously reported experimental results, together with natural bond orbital (NBO) analysis of the η2-nitrile complexes and the Ni(II) oxidative addition products, will also be presented.
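If the experimental data to be compared include equilibrium constants or product ratios (an assumption here; the abstract does not say), computed free energies are usually converted with ΔG = ΣG(products) − ΣG(reactants) and K = exp(−ΔG/RT). A minimal sketch; the functional, basis set, thermal corrections, and function names are not from the abstract.

```python
import math

HARTREE_TO_KCAL = 627.509   # kcal/mol per hartree
R_KCAL = 1.987204e-3        # gas constant in kcal/(mol*K)


def reaction_free_energy(g_products, g_reactants):
    """Delta G in kcal/mol from lists of total free energies in hartree."""
    return (sum(g_products) - sum(g_reactants)) * HARTREE_TO_KCAL


def equilibrium_constant(delta_g_kcal, temperature_k=298.15):
    """K = exp(-Delta G / (R T))."""
    return math.exp(-delta_g_kcal / (R_KCAL * temperature_k))
```

Because K depends exponentially on ΔG, an error of about 1.4 kcal/mol at room temperature changes K by roughly a factor of 10, which sets the scale for judging agreement across the para-substituent series.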
  • Item
    Contrasting Static and Contextualised Embeddings in the use of Semantic Feature Vectors in Neurophysiological Prediction
    (2021) Dial, Heather; Pugalenthi, Lokesh; Gnanateja, Nike; Tessmer, Rachel; Henry, Maya; Li, Jessy
    Primary progressive aphasia (PPA) is a neurodegenerative syndrome leading to the progressive loss of speech and language. There are three PPA subtypes, each with unique deficits and underlying brain atrophy. We used temporal response function (TRF) modelling to examine semantic processing across PPA subtypes and age-matched controls (n = 10 per group). TRF modelling is a sophisticated regression approach that maps acoustic and/or linguistic features of a continuous stimulus to continuously collected neurophysiological data (e.g., EEG). This approach allows acoustic/linguistic processing to be examined without the need for overt responses and has shown promise for use in PPA [1]. EEG responses were collected while participants listened to an audiobook. For each word in the audiobook, feature vectors were derived using word2vec and GPT2; word2vec uses static embeddings, whereas GPT2 uses contextualized embeddings, which account for polysemy and potentially yield a better approximation of a word’s semantic features [2]. It is currently unclear whether GPT2 better approximates human semantic processing. Thus, we sought to contrast the TRFs produced by word2vec and GPT2. For each model, we derived semantic dissimilarity values for the current word given its context by calculating one minus the Pearson correlation coefficient between the current word’s feature vector and the mean of the previous words’ feature vectors [3]. To estimate the extent to which EEG responses could be modelled as a function of semantic dissimilarity, we used the MATLAB Multivariate Temporal Response Function (mTRF) Toolbox [4]. Subsequently, TACC was used to estimate the null distribution of the TRFs between EEG signals and semantic dissimilarity values. Despite their contrasting embeddings, no significant differences were observed in the TRFs’ predictive accuracy between word2vec and GPT2. Ongoing work seeks to disambiguate these models’ similar TRFs.
Future research will investigate the utility of TRF modelling in the differential diagnosis of PPA subtypes.
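The semantic dissimilarity measure described above (one minus the Pearson correlation between the current word's vector and the mean of the previous words' vectors) can be sketched directly. Which previous words form the context (for example, the current sentence or a fixed window) is not specified in the abstract, so the caller supplies them here; the function names are hypothetical, and the same code applies to word2vec or GPT2 vectors.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def semantic_dissimilarity(current, previous):
    """1 - r(current word vector, mean of the previous words' vectors)."""
    dims = len(current)
    context_mean = [sum(vec[d] for vec in previous) / len(previous) for d in range(dims)]
    return 1.0 - pearson(current, context_mean)
```

Values range from 0 (vector perfectly correlated with its context) to 2 (perfectly anti-correlated); placing each word's value at its onset time gives the stimulus regressor that a TRF maps to the EEG.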
  • Item
    CFD Multiphase Flow Modeling Inside Structured Packings
    (2021) Phan, Mikey T; Macfarlan, Luke H; Eldridge, R Bruce
    Semi-empirical models such as the Rocha-Bravo-Fair, Delft, and Billet-Schultes models represent the current state of the art in predicting the hydraulic and mass transfer performance of structured packings inside separation devices such as distillation columns. However, these models are derived from experimental data that cover a limited range of operating conditions and chemical systems, which prevents them from being truly predictive. To develop a more robust and predictive hydraulic model for structured packings, computational fluid dynamics (CFD) modeling is used to provide greater insight into the multiphase flow physics inside these packings. Specifically, this study focuses on transitional and laminar flow structures inside simplified Representative Elementary Units (REUs) of structured packing, as predicted by CFD modeling of industrially relevant chemical systems. The impact of these transitional and laminar flow observations on hydraulic model development is also discussed. Next, we present CFD results illustrating that the de facto standard turbulence models used previously in the open literature to model flow inside structured packings may fail to capture transitional and laminar flow effects properly and thus may produce significant errors in predicted pressure drop values. Finally, a methodical study of momentum transfer at the gas-liquid interface will be presented, showing that careful numerical treatment of the interface improved pressure drop predictions when compared with experimental values.
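Whether flow in a packing channel is laminar, transitional, or turbulent is commonly judged from a Reynolds number based on a hydraulic diameter, d_h = 4A/P and Re = ρ u d_h / μ. A minimal sketch; which phase, velocity, and channel geometry define Re for structured packings, and what regime thresholds apply, vary between studies and are not given in the abstract, so the function names and usage are illustrative only.

```python
def hydraulic_diameter(area, wetted_perimeter):
    """d_h = 4 * A / P for a channel cross-section (SI units)."""
    return 4.0 * area / wetted_perimeter


def reynolds_number(density, velocity, length, viscosity):
    """Re = rho * u * L / mu with density in kg/m^3, velocity in m/s,
    length in m, and dynamic viscosity in Pa*s."""
    return density * velocity * length / viscosity
```

For a square channel of side a, d_h reduces to a. Values well below a few thousand would point toward the laminar or transitional behaviour that the abstract argues standard turbulence models may mishandle.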