TACCSTER 2022 Proceedings


Recent Submissions

  • Item
    Flow and Scalar Transfer Characteristics for a Circular Colony of Vegetation
    (2022-09-29) Kingora, Kamau; Raza, Mishal; Sadat, Hamid
    Local and global flow structures, as well as transfer and transport of a passive scalar from a circular colony of uniformly distributed vegetation stems, are investigated at Re = 2100, Re = 4200, and Re = 8400. The number of stems in the colony is varied from 1 to 284, yielding a solid fraction of 0.0 < 𝜙 < 0.65. The following three flow regimes are identified: a co-shedding flow regime prevails at low solid fraction where wakes of individual cylinders have minimal interaction; a bleeding-wake flow regime is identified at intermediate solid fraction in which stream-wise bleeding flow delays the formation of colony-scale vortices, yielding a steady wake between two separated shear layers; and a single-body flow regime is observed at high solid fraction and is accompanied by the commencement of colony-scale vortex shedding. As Reynolds number increases, the separated shear layers observed at intermediate solid fraction break up to form stem-scale vortices that organize themselves into colony-scale coherent structures. As the solid fraction increases, the drag and Sherwood number experienced by colonies increase linearly at low solid fractions and at a diminishing rate at intermediate solid fractions, while the net lift remains negligible. At high solid fraction, the commencement of colony-scale vortex shedding is accompanied by a jump in lift and base suction. Pressure and friction lift/drag increase and decrease with an increase in solid fraction, respectively, toward the value experienced by a solid cylinder. Sherwood number, on the other hand, decays exponentially toward the value experienced by a solid cylinder at high solid fraction. Colonies at intermediate solid fraction exhibit the highest scalar transfer but weakest transport in their near-field wake. Scalar transfer in colonies with high solid fraction deteriorates with an increase in solid fraction, yielding less scalar concentration in their downstream wake.
Each case consists of about 14M computational points, and the computations were performed on TACC LS6 clusters. A typical case converges in 128,000 processor hours.
  • Item
    Using Computational Approaches to Reveal Mechanisms of Kinesin-5 Binding with Microtubule
    (2022-09-29) Guo, Wenhan; Sanchez, Jason E.; Li, Lin
    Kinesins are microtubule-based motor proteins that play important roles ranging from intracellular transport to cell division. Human Kinesin-5 (Eg5) is essential for mitotic spindle assembly during cell division. By combining molecular dynamics (MD) simulations, performed on Stampede2 at the Texas Advanced Computing Center (http://www.tacc.utexas.edu), with other multi-scale computational approaches, we systematically studied the interaction between Eg5 and the microtubule. We find that the motor domain of Eg5 shows predominantly positive potential at the binding interface, which attracts the tubulin heterodimer, whose binding interface has negative potential. Electric field lines and electrostatic binding forces are provided, which demonstrate attractive forces between Eg5 and the tubulin heterodimer. Additionally, the folding and binding energy analysis reveals that the Eg5 motor domain performs its functions best in a weakly acidic environment. Molecular dynamics analyses of hydrogen bonds and salt bridges demonstrate that, on the binding interfaces of Eg5 and the tubulin heterodimer, salt bridges play the most significant role in holding the complex together. The salt bridge residues on the binding interface of Eg5 are mostly positive, while salt bridge residues on the binding interface of the tubulin heterodimer are mostly negative. In contrast, the interface between α- and β-tubulins is dominated by hydrogen bonds rather than salt bridges. Compared to the Eg5/α-tubulin interface, the Eg5/β-tubulin interface has a greater number of salt bridges and higher occupancy for salt bridges. This asymmetric salt bridge distribution may play a significant role in Eg5’s directionality. The residues involved in hydrogen bonds and salt bridges are identified in this work and may help guide Eg5-focused anticancer drug design.
  • Item
    Rare Copy Number Variants Implicated in Bicuspid Aortic Valve
    (2022-09-30) Carlisle, Steven G.; Albasha, Hasan; Prakash, Siddharth K.
    Background: Bicuspid aortic valve (BAV), the most common congenital heart defect in adults, can lead to many long-term complications. More aggressive complications can manifest in children and adolescents as early onset complications of BAV (EBAV). Rare copy number variants (CNVs) have been implicated in cases of EBAV and related cardiovascular lesions. We hypothesized that rare and highly penetrant CNVs are enriched in EBAV cases. Materials: We developed a computational pipeline to identify rare CNVs in familial EBAV cases (n=394) obtained from the UTHealth BAV Research Registry and BAV probands (n=4216) obtained from the International BAV Consortium. Raw intensity data from Illumina SNP array genotypes was analyzed with three separate computer algorithms (PennCNV, QuantiSNP, and cnvPartition) to generate the initial CNV calls and sample-level quality statistics. CNV calls were merged and refined for case-control analysis with control cohorts (n=16,576) processed using identical methods. A cohort of individuals with left ventricular outflow tract obstruction lesions (LVOTO, n=1561) was used for comparison. Annotation and cataloguing of rare CNVs was performed with PLINK. Results: We identified 308 large (>250 kb) recurrent CNVs in EBAV and BAV cases that are absent or rare (<1:100) in controls. Twenty-five recurrent CNVs intersect with known BAV genes. There were 158 overlapping CNVs between LVOTO and EBAV cases and 90 overlapping CNVs between LVOTO and BAV cases. There were 54 very large (>5000 kb) overlapping CNVs between EBAV and BAV cases and 9 very large overlapping CNVs between LVOTO and BAV cases, 2 of which intersect with known BAV genes. CNVs intersecting with known BAV genes were significantly enriched in cases (P < 1×10⁻⁵). Conclusions: We identified rare recurrent CNVs in over 10% of cases, some of which intersect with genes known to be causative of BAV disease.
Identification of new candidate genes provides important information for risk stratification.
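The size and rarity thresholds quoted in the Results (>250 kb, rarer than 1:100 in controls) can be read as a simple filter over merged calls. The sketch below is illustrative only: the record layout and the names `CnvCall` and `is_large_and_rare` are hypothetical, and it does not reproduce the calling, merging, or annotation steps performed with PennCNV, QuantiSNP, cnvPartition, and PLINK.

```python
from dataclasses import dataclass


@dataclass
class CnvCall:
    """Hypothetical record for one merged CNV call."""
    chrom: str
    start: int              # start position in base pairs
    end: int                # end position in base pairs
    control_carriers: int   # number of controls carrying an overlapping CNV


def is_large_and_rare(call, n_controls, min_kb=250, max_control_freq=0.01):
    """True if the call is longer than min_kb and rarer than max_control_freq in controls."""
    length_kb = (call.end - call.start) / 1000
    control_freq = call.control_carriers / n_controls
    return length_kb > min_kb and control_freq < max_control_freq


# Example with the control cohort size reported in the abstract (n = 16,576).
example = CnvCall(chrom="chr1", start=1_000_000, end=1_300_000, control_carriers=10)
print(is_large_and_rare(example, n_controls=16_576))
```

The coordinates and carrier count in the example are made up; only the thresholds and the control cohort size come from the abstract.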
  • Item
    AlphaFold 2 Monomer: Deployment in an HPC Environment
    (2022-09-29) Yang, Yuntao; Li, Zhao; Shih, David J. H.; Zheng, W. Jim
    AlphaFold2, developed by Google DeepMind, is a breakthrough in the grand challenge of protein structure prediction. While the breakthrough will have a profound impact on biomedical research, its application faces significant hurdles due to its computing-intensive nature. We overcome this challenge by deploying the AlphaFold 2 pipeline in an HPC environment that fully utilizes the computing resources and accelerates the workflow. Specifically, the CPU component of AlphaFold 2, which includes multiple sequence alignment and template search, was deployed on a computer cluster at the Texas Advanced Computing Center (TACC). The high performance of CPU cores and I/O requests on the cluster allowed us to complete over 200 jobs within 10 hours. The GPU component, which includes model prediction and refinement, was deployed on the latest Nvidia GPU server, and 200 jobs could be completed within 24 hours when 2 jobs ran in parallel. The deployed workflow can efficiently use different computing environments to process many protein structure predictions to advance biomedical research.
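The GPU throughput figure implies an upper bound on the average wall time per prediction. The short calculation below is a back-of-the-envelope sketch; the function name is hypothetical, and it assumes both parallel slots were kept busy for the full 24 hours.

```python
def max_minutes_per_job(total_jobs, wall_hours, parallel_slots):
    """Average wall-clock minutes available per job if all slots stay busy."""
    return wall_hours * 60 * parallel_slots / total_jobs


# GPU stage from the abstract: 200 jobs, 24 hours, 2 jobs in parallel.
print(max_minutes_per_job(total_jobs=200, wall_hours=24, parallel_slots=2))  # 14.4
```

The same estimate cannot be made for the CPU stage, because the abstract does not state how many CPU jobs ran concurrently.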
  • Item
    AlignEM-SWiFT: Graphical Interface for Aligning Electron Micrographs Using Signal Whitening Fourier Transforms
    (2022-09-29) Yancey, J. G.; Bartol, T. M.; Wetzel, A.; Carson, J.; Mendenhall, J. M.; Thiyagarajan, V.; Kuwajima, M.; Harris, K. M.; Sejnowski, T. J.
    We have built an intuitive graphical user interface for aligning serial section electron micrographs (ssEM) using Signal Whitening Fourier Transforms (SWiFT). AlignEM-SWiFT is a graphical extension of SWiFT-IR, a proven suite of image registration programs developed by computer scientist Arthur Wetzel at the Pittsburgh Supercomputing Center. The SWiFT-IR approach achieves high precision image matching but requires specific mathematical understanding that limits its accessibility. AlignEM-SWiFT circumvents shell scripting by generalizing low-level computer instructions and forging useful high-level abstractions. It is able to generate scale image hierarchies, compute affine transforms, generate aligned images using multi-image rendering, generate model images using remodeling, and create alignments with a global 3D coordinate system. Exported alignments comply with the Next-generation File Formats (NGFF) specification. An embedded Neuroglancer viewer enables instantaneous volumetric rendering of exported alignments inside the application window. The control panel adjusts to user needs based on a checkpoint mechanism for tracking project completeness. It also has a terminal-like output display for monitoring running processes. Users new to EM alignment will benefit from on-board documentation, descriptive warning dialogs, and instructive tooltips. Advanced controls and debugging features are available in the menubar. We are developing support for user-defined alignment scenarios or "recipes" composed of interchangeable alignment modules or "ingredients". While AlignEM-SWiFT is currently deployed and available to the community via the 3DEM.org Workbench at the Texas Advanced Computing Center (TACC), the work presented here is germane to the latest version, which is just days away from being deployed to TACC. This software is part of an effort to integrate with other open-source EM technologies being developed at TACC.
Our integrated 3DEM analysis platform will include tools for segmentation, annotation, reconstruction, and tomography.
  • Item
    Applications of Machine Learning Algorithms for Coral Disease Fate in Caribbean Corals
    (2022-09-29) Van Buren, Emily W.; Beavers, Kelsey; MacKnight, Nicholas; Wang, Li; Mydlarz, Laura D.
    The Caribbean is known as a coral disease “hot spot” due to the high prevalence of acute and chronic diseases that have plagued corals in the area. Two diseases, Stony Coral Tissue Loss Disease (SCTLD) and White Plague (WP), are common and infect many coral species. These two diseases have been studied in a genotype-matching study that examined baseline and post-exposure transcriptomics in four species of corals. While transcriptomic studies have improved our knowledge of host response, a knowledge gap still exists regarding the disease risk corals have prior to disease exposure. Understanding disease risk before an outbreak is an important step in modeling disease dynamics of corals, as it will help conservation efforts and disease response protocols. One way to identify disease risk is to apply machine learning to identify patterns of expression based on disease outcome. By applying novel but proven layers of machine learning programs from medical research and using healthy corals whose disease fates are known, we can identify which biological processes are relevant to disease susceptibility. We examined six different types of machine learning algorithms for detecting the presence/absence of genes and expression patterns correlated with whether or not the coral became diseased when exposed. We will report what types of data these algorithms provide and how they can be applied for disease monitoring and modeling.
  • Item
    Discovering Pathophysiologic Networks of Temporal Lobe Epilepsy Using the BrainMap Community Portal through the Texas Advanced Computing Center
    (2022-09-29) Towne, Jonathan M.; Eslami, Vahid; Fox, P. Mickle; Cavazos, José E.; Fox, Peter T.
    Temporal lobe epilepsy (TLE) seizures cause regional damage, detectable on imaging of brain structure (VBM) and function (VBP). Damage is mediated by aberrant neuronal activity propagating along existing network architecture. TLE networks remain ill-defined and are of great interest to diagnostic and therapeutic development. Independent component analysis (ICA) can detect neural networks by computing multi-variate co-occurrence patterns across a volume and is validated for coordinate-based meta-analysis (CBMA/Meta-ICA). Meta-ICA is methodologically distinct from mass-univariate meta-analytics (ALE) that simply detect robust regions/hubs of pathology. Although meta-ICA is typically used to extract canonical/healthy networks, we applied meta-ICA to VBM/VBP reports of TLE pathology to infer TLE-specific network anomalies. To identify TLE networks, BrainMap Community Portal applications were used to access coordinate results (Sleuth) of 74 experiments (n=1599), model coordinates as spatial probability distributions (weighted by sample size), and apply ICA (Meta-ICA) at dimensions d=1 and d=2 (per sample-size restrictions), computing coordinate co-occurrence within and across experiments. TLE pathology hubs were computed (GingerALE) separately as spatial convergence across studies (ALE), agnostic to within-study co-occurrence. Two anatomically distinct TLE networks were identified (i.e., no overlap, excepting ALE hubs). Network-IC1 included brain regions involved in language processing (speech-execution: Z=6.95; speech-cognition: Z=4.10) at d=2, with similar results (spatial-correlation: R=0.89) at d=1. Network-IC2 included brain regions involved in emotion (reward: Z=4.76) and cognition (attention: Z=4.02; memory: Z=3.80). Meta-ICA implicated a superset of hub regions from GingerALE (IC1: tonsil; IC2: pulvinar, caudate, superior-temporal; Both: hippocampus, MDN-thalamus), uniquely identifying anterior nucleus, insula, supramarginal, and pre/paracentral gyri.
Neither network matched canonical networks. VBM/VBP distribution to ICs was homogeneous (χ²: p=0.07). Meta-ICA networks align with TLE symptom profiles. IC1 (verbal/visual) disruption can impair communication and cause visual hallucinosis. IC2 (limbic) aligns with social-emotional deficits; dyscognitive seizures disrupt cognition (attention/memory), impair awareness, and induce postictal amnesia. These findings reveal two novel TLE networks, debut the first low-d meta-ICA detection of disease networks, and highlight a BrainMap Community Portal use case with implications for biomarker development within and beyond epilepsy pathologies.
  • Item
    Phosphorylation of Tyrosine 841 Strongly Affects JAK3 Kinase’s Activation
    (2022-09-29) Sun, Shengjie; Li, Lin
    Janus Kinase 3 (JAK3) plays a key role in the proliferation, development, and differentiation of various cells. It regulates gene expression by phosphorylation of Signal Transducers and Activators of Transcription (STATs). A new JAK3 kinase domain phosphorylation site, tyrosine-841 (Y841), was found. The effects of phosphorylated tyrosine-841 (pY841) on ATP/ADP binding affinities of the JAK3 kinase domain were systematically studied and are reported here. With the support of TACC, we applied long all-atom molecular dynamics simulations to study the effects of phosphorylation on Y841. The results show that pY841 reduces the size of the cleft between the N-lobe and C-lobe of the JAK3 kinase domain. However, when ATP or ADP is bound to the kinase, pY841 was found to enlarge the cleft. Additionally, for unphosphorylated JAK3 (JAK3-Y841), the binding forces between the kinase domain and ATP or ADP are similar. After phosphorylation of Y841, JAK3-pY841 exhibits more salt bridges and hydrogen bonds between ATP and the kinase than between ADP and the kinase. Consequently, the electrostatic binding force between ATP and the kinase is higher than that between ADP and the kinase. As a result, ATP is more attractive to JAK3 than ADP when Y841 is phosphorylated. Therefore, JAK3-pY841 tends to bind ATP rather than ADP.
  • Item
    Molding Aortic Valve Hemodynamics Using a Novel Immersed Boundary Method
    (2022-09-29) Raza, Mishal; Kingora, Kamau; Sadat, Hamid
    This research entails the study of the transfer and transport of a passive scalar around the aortic valve to aid in understanding Calcific Aortic Valve Disease (CAVD). Simulations were conducted using a novel interpolation-free sharp-interface immersed boundary method. The method is generic in nature, enabling the imposition of boundary conditions for scalar concentration to investigate CAVD. In this study, the 3D geometry of the native tricuspid AV, including the cusps, commissures, and sinuses, will be reconstructed from the parametric model developed by (Rami Haj-Ali 2012), which is based on the AV anatomy and measurements reported in the literature. We will solve advection-diffusion transport equations to find the scalar transport in a Fluid-Structure Interaction (FSI) setting. The FSI framework will be based on the developed immersed boundary method coupled with a solid solver (Calculix (Guido Dhondt 2020)) using preCICE (preCICE 2021). The results will be employed to evaluate the distribution of scalar concentration on leaflets as well as to understand the correlation between the level of concentration and valve movements. The correlation between the predicted scalar concentration and several WSS-based parameters (WSS, WSSG, OSI, GON, RRT) will also be investigated. Parallel simulations are to be conducted on a high-performance computing system at the Texas Advanced Computing Center (TACC). Approximately 15 million computational points will be decomposed across 560 processors.
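For orientation, the advection-diffusion equation for a passive scalar is commonly written in the form below. This is the textbook form under the assumptions of incompressible flow and constant diffusivity; the abstract does not state which form, source terms, or boundary conditions the authors use, so the symbols here are assumptions.

```latex
% c: scalar concentration, \mathbf{u}: fluid velocity, D: scalar diffusivity (assumed constant)
\frac{\partial c}{\partial t} + \mathbf{u} \cdot \nabla c = D \, \nabla^{2} c
```

At the stated resolution, decomposing roughly 15 million points across 560 processors gives on the order of 27,000 points per processor.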
  • Item
    MuBuCo: Mutation Burden Composition
    (2022-09-29) Pugalenthi, Lokesh; Richardson, Jensen; Jiang, Wenxuan; Srinivasan, Harish; Reddy, Himanshu; Pritha, Jafrin; Nanduri, Rahul; Hong, Raymond; Kuhlman, Christopher; Prasad, Rohit K.; Arasappan, Dhivya; Kowalski-Muegge, Jeanne
    Gene mutations can vary by type, in terms of affecting a single site, spanning hundreds of base pairs, or their over/under-representation in cancer cells. Collectively, these mutation types include single nucleotide variants (SNVs), structural variants (SVs), and copy number variants (CNVs). Tumor mutation burden is one measure widely used throughout cancer research, but it is often limited to a single dimension, using SNVs only. We derive a sample mutation burden for each mutation type and combine them to define their relative contribution, forming a mutation burden composition (MuBuCo). We applied MuBuCo to multiple myeloma (MM), a well-recognized, genomically heterogeneous blood cancer. Using 70 multiple myeloma cell lines, we first computationally assessed more than 15 bioinformatics tools to detect each type of variant and selected the best performing ones (using known features) to calibrate and characterize these cell lines. This required more than 10,500 node hours to run on Texas Advanced Computing Center clusters. We also developed a snakemake pipeline incorporating preprocessing and SNV calling. Each cell line’s variant calls were used to calculate each mutation type burden. We further defined expressed mutations by variants found in expressed genes to predict neoantigens. We implemented the results in a query-able application that enables cancer researchers to select MM cell lines of interest and visualize their MuBuCo relative to other cell lines. With this information, we hope to improve our understanding of the molecular background against which these cell lines are used for testing new treatments. We further provide an in silico look at changes in cell lines’ MuBuCo from user-specified removal of a single gene or multiple genes, mimicking a ‘knock out’ experiment. Our application offers novel mutation analyses whose results were not readily attainable until now.
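The idea of combining per-type burdens into relative contributions can be illustrated with a small normalization. This is a minimal sketch that assumes each burden is a non-negative number per sample; the abstract does not give the exact burden definition or any scaling, so the function name and the example values are hypothetical.

```python
def mutation_burden_composition(burdens):
    """Return each mutation type's share of the sample's total burden."""
    total = sum(burdens.values())
    if total == 0:
        # No mutations of any type: report zero contribution for every type.
        return {kind: 0.0 for kind in burdens}
    return {kind: value / total for kind, value in burdens.items()}


# Hypothetical burdens for one cell line.
sample = {"SNV": 60, "SV": 30, "CNV": 10}
print(mutation_burden_composition(sample))  # {'SNV': 0.6, 'SV': 0.3, 'CNV': 0.1}
```

Expressing the result as fractions that sum to one makes cell lines with very different total burdens directly comparable.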
  • Item
    Automated Identification of Cotton Diseases and Pests
    (2022) Noore, Adly; Beksi, William J.
    There is a constant need for optimizing food production and farming across the world, especially in impoverished countries where the economy heavily depends on agriculture. The major contributing factors to crop yield loss are pest infestations and plant diseases. The U.N. Food and Agriculture Organization reports that 40% of crops are lost due to pests and diseases. This research project aims to contribute to the field of AI-based assistive technologies in precision agriculture by helping develop models to aid farmers in accurately identifying cotton pests and diseases from uploaded cotton leaf pictures. Farmers will be able to mitigate crop loss by (i) taking remedial action before additional damage occurs, (ii) minimizing pesticide waste by only spraying unhealthy crops, (iii) mapping areas of the field impacted by infestations, and (iv) reducing the health risks associated with the presence of excessive pesticides in the consumer’s food. However, training a high-quality model is challenging since real-world data contains more healthy leaves than diseased leaves and some plant diseases are more prevalent than others. This results in undesirable class imbalances and biases which lower the model’s accuracy. To address these imbalances, advanced data augmentation with generative adversarial networks (GANs) was used to create realistic-looking diseased plant images. These images were added to the underrepresented classes to balance the dataset and improve the model’s accuracy. Training high-quality GANs is computationally expensive. Each GAN took approximately 48 hours to train on a node with two V100 GPUs using TACC resources. Traditional affine data augmentation was also performed separately to compare with the GAN results.
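Balancing a dataset by adding generated images to underrepresented classes requires knowing how many images each class is short. The sketch below shows one simple way to compute that, assuming the goal is to bring every class up to the size of the largest one; the class names and counts are hypothetical, and the abstract does not state the balancing target the authors used.

```python
def synthetic_images_needed(class_counts):
    """Number of generated images per class needed to match the largest class."""
    target = max(class_counts.values())
    return {label: target - count for label, count in class_counts.items()}


# Hypothetical class counts for a cotton leaf dataset.
counts = {"healthy": 1200, "aphid_damage": 300, "leaf_spot": 450}
print(synthetic_images_needed(counts))
# {'healthy': 0, 'aphid_damage': 900, 'leaf_spot': 750}
```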
  • Item
    Investigating the Permissive Environment of Perisynaptic Astroglia for Information Storage in the Dentate Gyrus
    (2022-09-29) Nam, A.J.; Kuwajima, M.; Mendenhall, J.M.; Hubbard, D.D.; Hanka, D.C.; Parker, P.H.; Wetzel, A.; Bartol, T.M.; Sejnowski, T.J.; Abraham, W.C.; Harris, K.M.
    Perisynaptic astroglial processes (PAPs) are active modulators of neuronal activity and directly contribute to information processing in the brain. Both in vivo and in vitro experiments have demonstrated that PAPs undergo activity-dependent structural changes. Thus, here we employ cutting-edge resources at the Texas Advanced Computing Center (TACC) to explore PAP structural remodeling associated with long-term potentiation (LTP) and long-term depression (LTD) that may help support local changes in information processing. LTP and LTD, widely accepted cellular mechanisms of learning and memory, were induced in vivo in the awake adult rat hippocampal dentate gyrus. LTP induction in the middle molecular layer (MML) was achieved by delta-burst stimulation in the medial perforant pathway, a procedure that produced concurrent long-term depression (cLTD) in the outer molecular layer (OML). The contralateral control hemisphere received only baseline stimulation to the medial perforant path. Three-dimensional electron microscopy (3DEM) offers significant advantages over two-dimensional approaches, including a more complete view of ultrastructure in all X-Y-Z planes. AlignEM-SWiFT, the state-of-the-art interactive application available at TACC, is integral for achieving the standard of perfect serial section image alignment needed for 3DEM analysis. Furthermore, Blender at TACC, equipped with the computing power of TACC’s supercomputers, similarly facilitates large-scale and realistic PAP reconstructions for visualization and quantitative mesh analysis. Changes to PAP ultrastructure have important implications for the spatiotemporal dynamics of astrocyte calcium signaling. Thus, TACC resources will further enable computational modeling to investigate the functional consequences of PAP morphological changes.
Preliminary analysis suggests that more than 80% of all dentate gyrus synapses exhibit some degree of PAP apposition at the axon-spine interface (ASI). Results from this study, made possible using TACC systems, will contribute to our overall understanding of the cellular mechanisms of information processing and, specifically, the role of astrocytes in this process.
  • Item
    Automated Detection and Tracking of Infield Cotton Bolls
    (2022-09-29) Muzaddid, Md Ahmed Al; Beksi, William J.
    Cotton fiber accounts for nearly 25% of world-wide textile fiber use. Texas produces more cotton than any other state. It contributes approximately 45% of the U.S. cotton production, with about 25% of the entire U.S. crop, and plants more than 6 million acres. Cotton is the state’s leading cash crop and it ranks third behind the beef and nursery industries. The number of cotton bolls on a given farm is arguably the most important phenotyping trait. It provides a better understanding of the physiological and genetic mechanisms of crop growth and development that supports breeding research. Currently, the standard approach to obtain cotton boll counts is by manual sampling via human visual inspection, which is tedious, labor intensive, and error prone. To address the inefficiencies in this approach, we developed an automated vision-based system for cotton boll counting from infield videos, in which each track uniquely identifies a cotton boll and the total number of tracks equals the estimated cotton boll count. The system works as follows. First, we identify the relationship among the locations of neighboring cotton bolls and model them in a probabilistic framework to handle occlusions. In addition, we exploit dense optical flow and utilize particle filtering to guide each tracker. Then, correspondences between detections and tracks are found through data association via direct observations and indirect cues, which are then combined to obtain an updated observation. We highlight the efficacy of our approach in detecting and tracking cotton bolls against other cotton boll counting methods, along with three state-of-the-art tracking methods. TACC’s computational and storage resources were essential for obtaining the results reported in this project.
  • Item
    Resolving Multiple Gravitational Wave Sources in Pulsar Timing Array Data
    (2022-09-29) Mohanty, Soumya D.; Qian, Yi-Qian; Wang, Yan
    Timing the arrival of radio pulses from an array of rapidly spinning neutron stars (pulsars) is a promising method for detecting gravitational waves (GWs) in the ultra-low frequency (nanohertz) regime, primarily from supermassive (billion solar mass and above) black hole binaries. It is expected that next-generation radio telescopes, namely, the Five-Hundred-Meter Aperture Spherical Radio Telescope (FAST) and the Square Kilometer Array (SKA), will grow the number of well-timed pulsars to O(10³). This will result in greater distance reach for GW sources, uncovering multiple resolvable GW sources in addition to an unresolved population. The multisource resolution problem for pulsar timing arrays (PTAs) poses a unique set of data analysis challenges, such as nonuniformly sampled data, a large number of so-called pulsar phase parameters that arise from the inaccurately measured distances to the pulsars, and poor separation of signals in the Fourier domain due to a small number of cycles in the observed waveforms. We are developing an end-to-end software pipeline for addressing these challenges. The core idea is the iterative subtraction of individually estimated GW sources from the data. However, multiple stages of refinement are needed to improve the sample of identified sources, including a novel approach that mitigates spurious sources by cross-checking the outputs from two semi-independent refinement steps. The performance of the current version of the pipeline was quantified on simulated data from PTAs containing 10² and 10³ pulsars, leading to state-of-the-art results in all cases. For example, the fraction of sources found by the method that correspond to true sources in the simulated data exceeds 78% and 93% for a large-scale (with 10³ pulsars and 200 sources) and a midscale (with 10² pulsars and 100 sources) PTA, respectively. The pipeline is currently implemented as a mix of Matlab and parallelized C-code running on TACC resources.
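The core idea of iterative subtraction can be illustrated on a toy problem: estimate the single strongest source, subtract it from the data, and repeat on the residual. The sketch below is not the authors' pipeline (which is Matlab and C and uses a full PTA signal model with pulsar phase parameters). It assumes plain sinusoids, a fixed grid of trial frequencies, and a least-squares fit, and all function names are hypothetical. It does handle nonuniformly sampled times, one of the challenges the abstract names.

```python
import numpy as np


def fit_single_sinusoid(t, y, freqs):
    """Least-squares fit of a*cos(2*pi*f*t) + b*sin(2*pi*f*t) over a frequency grid."""
    best = None
    for f in freqs:
        basis = np.column_stack([np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t)])
        coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
        resid = y - basis @ coef
        sse = float(resid @ resid)
        if best is None or sse < best[0]:
            best = (sse, float(f), float(coef[0]), float(coef[1]))
    return best[1], best[2], best[3]


def iterative_subtraction(t, y, freqs, n_sources):
    """Repeatedly fit the strongest remaining sinusoid and subtract it from the data."""
    residual = np.array(y, dtype=float)
    found = []
    for _ in range(n_sources):
        f, a, b = fit_single_sinusoid(t, residual, freqs)
        residual = residual - (a * np.cos(2 * np.pi * f * t) + b * np.sin(2 * np.pi * f * t))
        found.append((f, a, b))
    return found, residual


# Toy data: two sinusoids observed at nonuniform sample times.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 20.0, 200))
y = 2.0 * np.cos(2 * np.pi * 1.0 * t) + 1.0 * np.sin(2 * np.pi * 2.5 * t)
found, residual = iterative_subtraction(t, y, np.arange(0.5, 3.01, 0.25), n_sources=2)
print([f for f, _, _ in found])
```

Because the sources are not exactly orthogonal on irregular sample times, each single-source estimate is slightly biased by the others, which is one reason a real pipeline needs the refinement stages the abstract describes.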
  • Item
    Pure Seminoma Subtyping Using Computational Approaches
    (2022-09-29) Medvedev, Kirill E.; Savelyeva, Anna V.; Bagrodia, Aditya; Jia, Liwei; Grishin, Nick V.
    Testicular germ cell tumors (TGCTs), the most common solid malignancy in adolescent and young men, rank second in terms of the average life years lost per person dying of cancer. The two major types of TGCTs are seminoma and non-seminoma (NSE). Management of patients with seminoma includes orchiectomy, platinum-based chemotherapy, or radiation therapy. Despite a high patient survival rate, current treatments significantly decrease patients’ quality of life and lead to around 40 severe side effects. We conducted a computational study of 64 pure seminomas (the most common subtype of TGCTs) available at TCGA. Consensus clustering of seminoma samples based on transcriptomic data identified two distinct subtypes that showed differences in pluripotency stage, activity of double-stranded DNA break repair mechanisms, rates of loss of heterozygosity, DNA methylation, expression of lncRNA associated with cisplatin resistance, and level of lymphocyte infiltration. Seminoma subtype2 shows signs of differentiation into NSE and therefore may have higher resistance to platinum-based chemotherapy. Despite the high level of lymphocyte infiltration, TGCT immunotherapy clinical trials were shut down due to a lack of clinical efficacy. We identified 20 significantly overexpressed genes in subtype2 that are related to the senescence-associated secretory phenotype. This fact and data on altered pathways in subtype2 allow us to hypothesize that senescence of seminoma-infiltrating lymphocytes can be one of the reasons for immunotherapy failure. Using all available histopathological slides of pure seminoma at TCGA, we developed a test version of a deep learning (DL) decision-making tool for identification of seminoma subtypes using only slide images (accuracy 0.864). As a future direction, we plan to develop a DL tool for identification of seminoma subtypes using whole slide images (WSI).
This approach will simplify use of the tool by pathologists but also requires significantly more powerful computational resources, and we anticipate using TACC resources for this task.
  • Item
    Pore-scale Simulations of Multiphase Flow for CO2 Migration Through Saline Aquifers in the Capillary-Dominated Regime
    (2022-09-29) Larson, Richard; Bakhshian, Sahar; Hosseini, Seyyed A.
    Carbon capture and storage intends to inject anthropogenic carbon dioxide from large point sources into geologic formations for emissions mitigation. In geological carbon sequestration, it is critical to understand the behavior of carbon dioxide as it displaces resident subsurface fluids during storage to assure its safety and permanence. The multiphase nature of carbon dioxide displacing saline water over long post-injection periods relies heavily on the buoyancy forces arising from the density contrast between CO2 and saline water and the capillary forces controlled by the pore geometry at the pore scale. The competition of those governing forces controls the field-scale migration and confinement of CO2 in reservoir formations. In this study, we use high-fidelity pore-scale simulations of multiphase flow to investigate this phenomenon in porous geometries representative of sedimentary rock formations. A computational fluid dynamics technique known as the volume of fluid method is used to model the buoyancy-driven flow of CO2 in subsurface reservoirs. There are many computational burdens in these simulations. High-resolution meshes with a high number of grid cells are required to capture the complexity of the pore morphology, leading to a high computational burden when considering the number of necessary simulated timesteps. Even when the domain is reduced to an effectively 2-dimensional (2-d) structure with an area of less than 1 cm², the required computational time is nearly prohibitive. Furthermore, to replicate the capillary-dominated flow, low velocity values are needed. However, the slower the flow rate, the more an interfacial issue known as spurious currents occurs, leading to numerical instabilities. To handle this numerical issue, simulations with smaller timesteps are required, but in tandem with the slow velocities, computational expense is further compounded.
The use of TACC’s parallel processing resources, along with optimization techniques, makes it possible to obtain solutions and results that inform us about the fundamental mechanics of this flow.
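The balance between buoyancy and capillary forces described in this abstract is commonly summarized with two dimensionless groups: the Bond number (gravity relative to capillarity) and the capillary number (viscous forces relative to capillarity). The abstract does not say which groups or fluid properties the authors use, so the following is only a minimal sketch; every numeric value is an illustrative placeholder, not data from the study.

```python
# Illustrative sketch: dimensionless groups often used to characterize
# capillary-dominated two-phase flow. All numbers below are placeholder
# assumptions, not values reported in the abstract.

def capillary_number(viscosity, velocity, interfacial_tension):
    """Ca = mu * U / sigma (viscous forces relative to capillary forces)."""
    return viscosity * velocity / interfacial_tension


def bond_number(density_difference, length, interfacial_tension, gravity=9.81):
    """Bo = delta_rho * g * L**2 / sigma (buoyancy relative to capillary forces)."""
    return density_difference * gravity * length ** 2 / interfacial_tension


# Placeholder SI values chosen only to make the example concrete.
mu = 1.0e-3         # Pa*s, viscosity of the displaced fluid (assumed)
u = 1.0e-5          # m/s, characteristic velocity (assumed)
sigma = 0.03        # N/m, interfacial tension (assumed)
delta_rho = 300.0   # kg/m^3, density contrast (assumed)
pore_size = 1.0e-4  # m, characteristic pore length (assumed)

ca = capillary_number(mu, u, sigma)
bo = bond_number(delta_rho, pore_size, sigma)

# Both groups far below one indicates that capillary forces dominate
# at the pore scale for these placeholder inputs.
capillary_dominated = ca < 1.0e-3 and bo < 1.0e-1
print(f"Ca = {ca:.3e}, Bo = {bo:.3e}, capillary dominated: {capillary_dominated}")
```

The thresholds in `capillary_dominated` are arbitrary cutoffs for the sketch; in practice the regime boundary depends on the pore geometry and wettability.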
  • Item
    Discovering Spatially Coherent Gene Modules from Spatial Transcriptomics Data
    (2022-09-29) Larina, Maria; Singh, Salvi; Samee, Md. Abul Hassan
    Spatial transcriptomics (ST) is an emerging technology that quantifies gene expression at spatial resolution in intact tissue sections. Although ST is enabling unprecedented studies of spatial gene expression, it has posed new challenges to biological data science. A typical ST dataset contains information on ~20K genes from 50K-100K cells. It is challenging to design efficient and scalable algorithms that generate new biological insights from these datasets. Here we feature an efficient and scalable non-negative matrix factorization (NMF) algorithm for identifying “spatial gene modules” (spatial-gems), i.e., groups of genes that are expressed at spatially adjacent locations, in ST data. Spatial-gems are fundamental aspects of multi-cellular organisms. NMF is suitable for this problem since, in theory, it can identify the “informative parts” constituting a dataset, e.g., lips and eyes in human facial images and spatial-gems in ST data. The basic NMF formulation, however, can give sub-optimal results for spatial datasets: it ignores the spatial locations of data points and thus does not guarantee that the informative parts are spatially coherent. Graph-regularized NMF (GNMF) overcomes this issue by constraining the informative parts to comprise spatially adjacent data points. We introduce three changes to tailor the state-of-the-art GNMF algorithm to ST data. First, we statistically determine the optimal number of spatial-gems in an ST dataset. Second, we introduce regularizations that minimize the number of genes shared between spatial-gems. Finally, we leverage numerical libraries and efficient data structures to obtain a scalable implementation. We benchmarked our GNMF against alternative algorithms on a brain ST dataset. Our algorithm comprehensively charted the spatial-gems in this dataset with a 20x speedup in execution time, making it an attractive tool for large-scale ST consortia like HuBMAP (Human BioMolecular Atlas Program).
This tool and our multifaceted approach to enhance efficiency and scalability will be of major interest to the broad userbase of TACC.
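To make the idea of graph-regularized NMF concrete, here is a minimal NumPy sketch of the standard multiplicative-update GNMF, which minimizes ||X - U Vᵀ||² + λ·Tr(Vᵀ L V) with L the graph Laplacian of a spot-adjacency matrix. It is not the authors' implementation: it omits their three modifications (module-count selection, overlap-minimizing regularizers, scalable data structures), and the toy data, the chain-shaped adjacency graph, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def gnmf(X, W, k, lam=0.5, n_iter=100, eps=1e-10, seed=0):
    """Baseline GNMF via multiplicative updates (illustrative sketch).

    X: (genes x spots) non-negative expression matrix.
    W: (spots x spots) symmetric non-negative adjacency matrix.
    Returns U (genes x k), V (spots x k) and the objective history.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_spots = X.shape
    U = rng.random((n_genes, k))
    V = rng.random((n_spots, k))
    D = np.diag(W.sum(axis=1))   # degree matrix
    L = D - W                    # graph Laplacian

    def objective():
        fit = np.linalg.norm(X - U @ V.T) ** 2
        smooth = np.trace(V.T @ L @ V)
        return fit + lam * smooth

    history = [objective()]
    for _ in range(n_iter):
        # eps guards against division by zero; updates keep factors non-negative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
        history.append(objective())
    return U, V, history


# Toy data (assumed): 30 genes measured at 12 spots arranged on a line,
# where each spot is adjacent to its immediate neighbours.
rng = np.random.default_rng(1)
X = rng.random((30, 12))
W = np.zeros((12, 12))
for i in range(11):
    W[i, i + 1] = W[i + 1, i] = 1.0

U, V, history = gnmf(X, W, k=3)
print(f"objective: {history[0]:.2f} -> {history[-1]:.2f}")
```

In this sketch the columns of `U` play the role of gene modules and the rows of `V` give each spot's loading on those modules; the Laplacian term pulls the loadings of adjacent spots toward each other.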
  • Item
    Job Losses, Marriage Troubles and Rich Uncles: Foreclosure Prevention Policy when Borrowers Hold Private Information about their Financial Health
    (2022-09-29) Kytömaa, Lauri
    My dissertation studies foreclosure prevention in environments where borrowers have an incentive to appear distressed in order to receive mortgage reductions. Such behavior is possible when borrowers have knowledge about their ability to repay debt that cannot be observed by their lenders. Using a sample of Fannie Mae loans originated in California between 2004 and 2007, I show that mortgage providers offer debt relief only when they are highly informed about borrower default probabilities. I then use the estimated model to explore the effects of the federal Home Affordable Modification Program, which was launched in 2009 in response to the Great Recession. I find that subsidies offered to banks under the program were more effective at preventing foreclosures in loans originated earlier in the 2000s, even though banks tended to be equally well informed about borrower financial health in all sample cohorts. The results suggest that government subsidies decreased foreclosures by 7.2% for loans originated in 2004, but that this rate steadily declines to 1.0% for loans originated in 2007. I also find that the average subsidy expenditure per prevented foreclosure increased from $17,000 to $150,000 between my sample origination cohorts. Jointly, the results offer a comprehensive look at how borrower financial well-being and the behavior of financial institutions influence debt relief policy. This project benefits greatly from access to TACC resources. Estimation uses a maximum likelihood routine in which I solve for a high-dimensional grid that rationalizes bank behavior, and then match model-predicted loan outcome probabilities to data. Solving for bank behavior is conducive to parallel computing since the optimum can be computed independently for every set of inputs. Leveraging many nodes allows me to solve for optimal policy at 60.2 million input combinations in under an hour. All numerical maximization takes place using Python on the Stampede2 cluster.
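Because the optimum at each grid point depends only on that point's inputs, the grid solve described above is embarrassingly parallel: any parallel map can be substituted for a serial loop. The sketch below illustrates only that pattern. The payoff function, the action set, and the two-dimensional state grid are hypothetical stand-ins, not the dissertation's bank model, and a thread pool is used purely to keep the example self-contained (CPU-bound Python work on a cluster would more typically use processes or MPI).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical discrete set of policy choices (an assumption for this sketch).
CANDIDATE_ACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]


def toy_payoff(action, state):
    """Hypothetical payoff: highest when the action matches x * y."""
    x, y = state
    return -(action - x * y) ** 2


def solve_one(state):
    """Find the best action for one grid point, independently of all others."""
    return max(CANDIDATE_ACTIONS, key=lambda a: toy_payoff(a, state))


def solve_grid(states, map_fn=map):
    """Solve every grid point; map_fn can be any serial or parallel map."""
    return list(map_fn(solve_one, states))


# Small hypothetical input grid: every combination of two state variables.
states = list(product([0.0, 0.5, 1.0], [0.5, 1.0]))

serial = solve_grid(states)
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = solve_grid(states, map_fn=pool.map)

print(serial == parallel)
```

Keeping `solve_one` free of shared state is what makes the swap safe: the parallel map returns results in input order, so the serial and parallel outputs can be compared directly.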
  • Item
    Skyrmion Stochastic Dynamics for Novel Computing Architectures
    (2022-09-29) Khodzhaev, Zulfidin; Turgut, Emrah; Incorvia, Jean Anne
    The dynamics of magnetic skyrmions can be controlled using spin torques, so that a skyrmion can be moved to a desired location [1]. Skyrmion racetrack memory and logic gate applications have been proposed [2]. However, the instabilities and stochastic behavior of skyrmions due to pinning [3], temperature and applied current, and the skyrmion Hall angle (SkHA) [4] make these applications challenging to realize. These same instabilities and stochastic behaviors can instead be exploited to build new computing architectures, e.g., neuromorphic, probabilistic and reservoir computing [5, 6]. The major aspects of these computations are understanding the synaptic response and having a constant uncorrelated signal [2]. In this study, the stochastic property in a shuffling chamber is investigated using micromagnetic simulation. Moreover, temperature response to the number of skyrmions in a chamber is analyzed. First, skyrmion dynamics are analyzed under different geometries, constant temperatures, grains and currents to obtain stochastic skyrmion motion. Then, two locations along the path of the skyrmions were chosen for local laser spot heating, while grain, current density and geometry were kept the same. The correlation between input and output paths was analyzed using the Pearson correlation coefficient (PCC). Finally, for constant temperatures, the synaptic response is shown. The study will guide the creation of better probabilistic computing devices and artificial synapses. [1] Nakatani Y., Yamada K. and Hirohata A., Scientific Reports, 11(1), pp.1-6 (2021) [2] Luo S. and You L., APL Materials, 9(5), p.050901 (2021) [3] Stosic D., Ludermir T.B. and Milošević M.V., Physical Review B, 96(21), p.214403 (2017) [4] Litzius K., Leliaert J., Bassirian P. et al., Nature Electronics, 3(1), pp.30-36 (2020) [5] Song K.M., Jeong J.S., Pan B. et al., Nature Electronics, 3, p.148 (2020) [6] Zázvorka J., Jakobs F., Heinze D. et al., Nature Nanotechnology, 14, p.658 (2019)
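The Pearson correlation coefficient used in this abstract to compare input and output paths is the covariance of two signals divided by the product of their standard deviations. The following is a minimal sketch of that calculation; the signals are synthetic random series generated for illustration, not skyrmion trajectories from the study, and the variable names are assumptions.

```python
import numpy as np


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equal-length 1-D signals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of the same length")
    dx = x - x.mean()
    dy = y - y.mean()
    denom = np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant signal")
    return float((dx * dy).sum() / denom)


# Synthetic stand-ins for an input signal and two candidate output signals.
rng = np.random.default_rng(0)
input_signal = rng.normal(size=500)
tracking_output = input_signal + 0.1 * rng.normal(size=500)  # follows the input
shuffled_output = rng.normal(size=500)                       # independent of it

pcc_tracking = pearson_correlation(input_signal, tracking_output)
pcc_shuffled = pearson_correlation(input_signal, shuffled_output)
print(f"tracking: {pcc_tracking:.3f}, shuffled: {pcc_shuffled:.3f}")
```

A coefficient near zero is the kind of outcome an uncorrelated output signal would produce, while a value near one indicates that the output still follows the input.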