Past work


Title: Sensor Selection Algorithm for Real-Time Fault Detection and Prevention


Maintenance is a critical aspect of manufacturing, encompassing activities to preserve a system's functional state. While some maintenance types can be conducted without disrupting the manufacturing process, often, maintenance requires a production shutdown. Anticipating component failures before they occur yields numerous benefits, including reduced maintenance costs, improved production quality, and increased productivity. Given the global challenges in manufacturing processes, this research project addresses the need for a prognostic methodology. Employing an artificial intelligence (AI) approach, we integrate Lasso and Group Lasso regression as variable selectors within deep neural networks. This method is applied to real-time additive manufacturing data, enabling the detection and prevention of faults during production.

The project also involves analyzing melt pool data using deep learning techniques for early fault detection, enhancing manufacturing process reliability. Initial stages focus on determining optimal methods for estimating model parameters, with an emphasis on precision. Simultaneously, we address the optimization problem associated with integrating Lasso and Group Lasso regression into the deep neural network framework. The model is tested on the C-MAPSS dataset, which monitors engine failure through multiple sensors. Once validated, we plan to extend the model's application to CAMICS data, broadening its relevance. This comprehensive approach aims to significantly contribute to sensor selection and prognostic modeling within AI-driven frameworks, with practical implications for fault detection in complex systems.
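As a rough illustration of how Lasso and Group Lasso penalties can act as sensor selectors on a network's first layer, the sketch below computes both penalties in plain Python. It is not the project's implementation: the function names, the choice of one group per sensor, the square-root group-size weighting, and the tolerance are all assumptions made for this example.

```python
import math

# Hypothetical layout: weights[i] lists the first-layer weights leaving sensor i.

def lasso_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute values of all weights."""
    return lam * sum(abs(w) for row in weights for w in row)

def group_lasso_penalty(weights, lam):
    """Group Lasso penalty with one group per sensor (an assumption).

    Each group contributes sqrt(group size) times its Euclidean norm, so a
    sensor's weights are pushed toward zero together rather than one by one.
    """
    total = 0.0
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        total += math.sqrt(len(row)) * norm
    return lam * total

def selected_sensors(weights, tol=1e-8):
    """Indices of sensors whose weight group has not been shrunk to zero."""
    return [
        i for i, row in enumerate(weights)
        if math.sqrt(sum(w * w for w in row)) > tol
    ]

if __name__ == "__main__":
    # Toy weights for three sensors feeding two hidden units.
    toy = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
    print(lasso_penalty(toy, lam=0.5))
    print(group_lasso_penalty(toy, lam=0.5))
    print(selected_sensors(toy))
```

In training, a penalty of this kind would be added to the network's loss. A sensor whose whole group norm reaches zero no longer influences the prediction and can be dropped.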

Author & Co-Authors:

Myrine Barreiro-Arevalo, Prince Asamani, Benjamin Peters, Jianzhi Li

School of Mathematical and Statistical Sciences, The University of Texas Rio Grande Valley, 1201 W. University Dr., Edinburg, Texas, USA 78539
Center for Manufacturing Innovation and Cyber Systems (CAMICS), The University of Texas Rio Grande Valley, 1201 W. University Dr., Edinburg, Texas, USA 78539  

M.S. Thesis

Title: Significant Gene Array Analysis and Cluster-Based Machine Learning for Disease Class Prediction


Gene expression analysis has been of major interest to biostatisticians for many decades. Such studies are necessary for understanding disease risk assessment and prediction, for creating better treatment plans, for lessening symptoms, and perhaps for finding cures. In this study, we investigated how to incorporate clusters of genes based on prior biological knowledge into machine learning models for effective gene expression data analysis and to uncover differentially expressed (DE) genes for different disease pathologies. Gene expression datasets for multiple pathologies, obtained using the Affymetrix U133A platform (GPL96), were used to test model evaluation metrics.

Significance Analysis of Microarrays (SAM) was used to identify potential disease biomarkers, followed by the predictive models: (a) random forest, (b) random forest with Gene eXpression Network Analysis (GXNA), (c) RF++, (d) LASSO, and (e) Bayesian Neural Networks. Differentially expressed genes within the clusters of co-expressed networks were successfully identified and may be used as potential biomarkers within their particular disease pathology. Moreover, we were able to use the Automatic Relevance Determination (ARD) prior with Bayesian neural networks to effectively identify the relatively important genes.

You can read my Master's Thesis online via OpenAthens here


National Center for Toxicological Research

Title: IG-CNN: Imaging Genomic Data to Enable Convolutional Neural Network for the Enhanced Survival Rate Prediction


Neuroblastoma (NB) is the leading type of pediatric cancer diagnosed in children under one. Despite significant advances in treatment courses, high-risk NB still has an overall survival rate of less than 50%. Accurate survival rate prediction is critical to improving NB diagnosis, facilitating precision medicine-based treatment development, and supporting clinical decision-making. However, the performance of survival prediction models and their clinical implications are still suboptimal. In this study, we proposed a novel deep learning framework named IG-CNN, which images transcriptomic profiles to enable a convolutional neural network (CNN) model that enhances survival rate prediction with improved model explainability. Specifically, three digitalized image transformation strategies, including independent component analysis (ICA), Latent Dirichlet Allocation (LDA), and Autoencoder, were employed to cluster the transcriptomic profiles and then visualize them with a TreeMap tool. Then, a five-layer CNN model was developed based on the image-like transcriptomic profiles to predict the clinical endpoints, i.e., overall survival (OS) and event-free survival (EFS). Finally, Score-Weighted Visual Explanations for Convolutional Neural Networks (Score-CAM) was used to extract key genes and their associated pathways contributing to the survival rate for improved model explainability. To investigate the performance of the proposed IG-CNN, we employed the RNA-seq data of 498 NB patients with both endpoints (OS and EFS). As a result, the proposed IG-CNN achieved a high Matthews correlation coefficient (MCC) of 0.608 and 0.558 for OS and EFS, respectively, on the test set, outperforming the state-of-the-art machine learning models generated by the SEQC-I participants. Moreover, the key genes and pathways extracted by Score-CAM were highly correlated with NB biology.
Altogether, the proposed IG-CNN is a promising framework for the transcriptomic profile-based prediction model development with an enhanced prediction power and model explainability to facilitate precision medicine applications.
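For readers unfamiliar with the Matthews correlation coefficient reported above, the following is a minimal sketch of how it is computed for a binary endpoint. The function name `mcc`, the 0/1 label encoding, and the toy labels are assumptions for illustration only; they are not the study's data or code.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels encoded as 0/1.

    Returns a value in [-1, 1]; by convention 0.0 is returned when any
    row or column of the confusion matrix is empty (zero denominator).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

if __name__ == "__main__":
    # Toy labels only: 1 = event occurred, 0 = no event.
    truth = [1, 1, 0, 0]
    predicted = [1, 0, 0, 0]
    print(mcc(truth, predicted))
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which makes it a common choice when the two outcome classes are imbalanced.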

Author & Co-authors:

Myrine Barreiro-Arevalo, Ruth Roberts, Weida Tong, Zhichao Liu
Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States
School of Mathematical and Statistical Sciences, The University of Texas Rio Grande Valley, 1201 W. University Dr., Edinburg, Texas, USA 78539
ApconiX Ltd, Alderley Park, Alderley Edge, SK10 4TG, UK
University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK


B.S. Project

Title: Ticks in Texas


Ticks are known vectors of several deadly diseases, including Lyme disease, which has often gone misdiagnosed. Yet little is known about whether the abundance of ticks is related to the presence of their hosts and which climatic variables might limit their distribution. The main goal of this research project is to examine the ecological and climatic factors related to ticks found in several counties of Texas. Passive surveillance, in which samples are not actively sought, was conducted between 2011 and 2016; during this time, ticks that were found on humans, on animals, or questing in the environment were voluntarily submitted by individuals, clinicians, or organizations for pathogen detection. Each specimen was identified using dichotomous keys based on body structures (e.g., mouthparts, legs, palps) in conjunction with dissection microscopy for morphological identification of the tick's species, life stage, and sex. A total of 1081 tick samples were used in this study. The samples were assigned to five regions of Texas, and each tick was recorded together with the county in which it was found. Monthly precipitation and temperature were recorded for each location where a tick was found for the 12 months prior to discovery. In preliminary results from the 1081 tick samples, four main genera were identified: Rhipicephalus spp., Dermacentor spp., Amblyomma spp., and Ixodes spp. The most common genus was Rhipicephalus spp., accounting for 469 of the ticks collected. The most collected tick species in our state was Rhipicephalus sanguineus, also known as the Brown Dog Tick. The majority of the ticks reported were from the Texas Gulf Coast region, and Brazos County reported the most ticks, with 87 samples identified. Further observation of tick variables is needed, and these will be examined using cluster analysis and principal component analysis.

Author & Co-Authors:
Myrine Barreiro-Arevalo, Zeinab Mohamed, Loles Esteve-Gasent, Tamer Oraby, Teresa Feria-Arroyo

University of Texas Rio Grande Valley College of Sciences, Edinburg, TX, USA
Texas A&M University College of Veterinary Medicine, College Station, TX, USA
Talent in Agriculture for Climate Change and Food Security Adaptation