Open to new opportunities

Hi, I'm Jen (Ha).
pronounced hah (/hɑː/)
I turn data into impact.

Data Scientist & ML Engineer with dual Master's degrees (GPA 4.0/4.0) and hands-on experience in machine learning, NLP, medical imaging AI, and business intelligence across healthcare, finance, and research.

Jen (Ha) Nguyen

Data Scientist & ML Engineer · USA

Python · SQL · Power BI · PyTorch · ML/DL · PySpark
Python: 95%
SQL: 92%
ML / DL: 88%
Power BI: 90%

About Me

Passionate about data science at the intersection of health, AI & business

I build end-to-end ML pipelines, co-author research targeting top-tier journals and conferences, and translate data into meaningful decisions, from ETL and BI dashboards to deep learning models for medical imaging.

🤖
Machine Learning & AI

End-to-end ML pipelines: tree-based models, deep learning (ResNet, BERT, LSTM), and radiomics for real-world impact.

Business Intelligence

Power BI dashboards, DAX modeling, PySpark ETL pipelines, and Microsoft Fabric — turning raw data into executive insights.

🔬
Research & Publications

Co-authored papers targeting Q1 journals and top conferences — RSNA, Springer Nature, Diagnostics, ICME.


Experience & Skills

My professional journey

Work Experience

Jan 2026 – Present
Machine Learning Engineer Intern
Coulomb Technology · Remote, USA

Led an end-to-end tree-based regression pipeline in Python to identify how material characteristics affect battery performance, increasing energy capacity from 100–120 to 135 mAh/g and saving over $1M in manufacturing costs.

Nov 2024 – Present
AI Researcher
AIMA Research Institution · Remote, USA

Designed deep learning pipelines (ResNet34) for mediastinal abnormality detection on chest X-rays. Led feature selection and benchmarking of 16+ ML/DL models, achieving AUC = 0.903 and reducing diagnostic decision time by 50%.

Aug 2024 – Present
Graduate Research Assistant
Missouri State University · Missouri, USA

Led economic research using logistic regression in R, developed Power BI dashboards covering 2,000+ students across 5 years, deployed a web app on AWS (EC2, S3, RDS), and mentored 30+ students in ML competitions.

Jun 2025 – Aug 2025
Data Analyst Intern
CoxHealth · Missouri, USA

Processed 11M+ physician text messages with BERT & PySpark, built Power BI dashboards unlocking $1M+ in revenue, automated 300+ ServiceNow reports, and developed a 10TB+ ETL pipeline on Microsoft Fabric reducing cloud costs by 40%.

Mar 2022 – Jul 2023
Senior People Analyst
Home Credit B.C · Ho Chi Minh, Vietnam

Automated 100+ HR reports, built gradient-boosting models improving headcount accuracy by 80%, led A/B tests increasing applicant volume by 20%, and received the Global DNA STARs Award and the "Best DEI Projects" award.

Technical Skills

Python: Expert
SQL / PySpark: Expert
Power BI / DAX: Advanced
PyTorch / scikit-learn: Advanced
Deep Learning (CNN/NLP): Advanced
R / Statistics: Advanced
AWS (EC2, S3, RDS): Proficient
Microsoft Fabric: Proficient

Education

Missouri State University
M.S. Data Science — GPA 4.0/4.0
Valedictorian Scholarship
Missouri State University
MBA (Finance & Data Analysis) — GPA 4.0/4.0
Valedictorian Scholarship · Beta Sigma Honor Society
Hoa Sen University
B.Ec. Economics — GPA 3.6/4.0
Outstanding Student Scholarship · Dean's List
AI Vietnam Institution
AI Engineer & Data Science Associate Degree
Vietnam

Selected Work

Featured Projects

🔬

MediRad-MRI: AI Tumor Classification

Radiomics-based ML/DL pipeline for benign–malignant anterior mediastinal tumor classification on MRI. Evaluated 16+ models; AUC = 0.903, reducing radiologist decision time by 50%.

Python · ResNet34 · 3D Slicer · Radiomics
🏥

CoxHealth Radiology Analytics Platform

Processed 11M+ physician text messages with BERT & PySpark. Modality demand–capacity forecasting via Time Series Analysis. ETL pipeline on Microsoft Fabric handling 10TB+ of distributed HDFS data.

Power BI · PySpark · BERT · MS Fabric
🔋

Battery Performance ML Pipeline

Tree-based regression pipeline to identify how material characteristics affect battery performance at Coulomb Technology. Increased energy capacity from 100–120 to 135 mAh/g, saving $1M+ in manufacturing costs.

Python · XGBoost · Regression

Recognition

Awards & Honors

🥉
3rd Prize – National Big Data Health Science Conference 2026
7th National Big Data Health Science Conference

Data Hackathon competition.

2026
🥇
1st Prize – US IT National Collegiate Conference 2025
US IT National Collegiate Conference

Database Design & Machine Learning Contest.

2025
🏆
Visionary Award 2025
Association for Business Information Technology Students

Recognized for visionary leadership and technical excellence.

2025

Research

Publications

Radiological Society of North America (RSNA) Conference · Poster
OASIS-Net for Obstetric Adversarial Semi-Supervised Segmentation of Cervical and Fetal Head Ultrasound Imaging
Springer Nature Journal · Under Review
MediRad-MRI: AI-Driven Radiomics Classification of Anterior Mediastinal Tumors on MRI
Diagnostics Journal · Under Review
Leveraging Large Language Models for Automated Extraction and Phenotyping of Abdominal Aortic Aneurysm from Radiology Reports
ICME Conference · Under Review
SO-LoRA: Sparse Orthogonal LoRA for Parameter-Efficient Continual Learning

Let's collaborate on something impactful

Whether it's ML research, data analytics, or AI engineering — I'd love to connect and hear about your challenge.