About

Hi, I'm Saurabh Loya, a Data Scientist and AI Enthusiast passionate about transforming data into intelligent insights and building the future of AI. I hold a Master's in Computer Science from the University of Utah and have worked with global teams at BMW Group and Volkswagen Group, developing data-driven AI solutions and intelligent automation systems.

My core expertise lies in data science, machine learning, and artificial intelligence. I specialize in extracting meaningful insights from complex datasets, building predictive models, and developing AI systems that drive business value. My work spans data analysis, statistical modeling, deep learning, and Large Language Models, using technologies like Python, TensorFlow, PyTorch, and LangChain. I've developed data pipelines, built ML models, created AI-powered applications, and designed intelligent systems that solve real-world problems.

I also have strong software engineering skills that enable me to build robust, scalable systems to support my data science and AI work. This combination allows me to deliver end-to-end, production-ready solutions, from data collection and analysis to model deployment and AI system implementation.

Feel free to check out my CV and drop me an email if you want to chat with me!

Education

Master of Science in Computer Science

University of Utah

📍 Salt Lake City, UT, USA 📅 May 2025 🎯 GPA: 3.6/4.0

Relevant Coursework

Advanced Algorithms, Distributed Systems, Data Visualization, Natural Language Processing, Machine Learning, Deep Learning, Computer Architecture, System Security

Bachelor of Technology in Computer Science & Engineering

MIT World Peace University

📍 Pune, Maharashtra, India 📅 May 2021 🎯 GPA: 3.8/4.0

Relevant Coursework

Data Structures & Algorithms, Database Management, Machine Learning, Artificial Intelligence, Big Data Analytics, Cloud Computing, Operating Systems, Computer Networks, Web Development

Work Experience

Data Scientist

August 2025 - Present

University of Utah Health

📍 Salt Lake City, UT, USA Full-time

Data Scientist Intern

January 2025 - May 2025

BMW Group

📍 Salt Lake City, UT, USA Internship

Software Development Intern

June 2024 - September 2024

H7 BioCapital

📍 San Francisco, CA, USA Internship

Software Engineer

August 2021 - July 2023

Volkswagen Group

📍 Pune, Maharashtra, India Full-time

Salesforce Developer Intern

June 2021 - August 2021

ForceArk

📍 Pune, Maharashtra, India Internship

Other Experience

Graduate Teaching Assistant

January 2024 - May 2024

University of Utah

📍 Salt Lake City, UT, USA Academic

Python Tutor

November 2020 - May 2021

Clone Futura

📍 Remote, India Academic

Technical Skills

Programming Languages

Python, SQL, R, Java, JavaScript

Databases & Data Storage

PostgreSQL, MongoDB, MySQL, Pinecone, Redis

AI & Large Language Models

LLMs, AI Agents, LangChain, Vector Databases, RAG Systems, Prompt Engineering, Fine-tuning

Data Analytics & Visualization

Tableau, Power BI, D3.js, Matplotlib, Seaborn, Plotly, Apache Spark, Time Series Analysis

AI Frameworks & Libraries

Pandas, NumPy, TensorFlow, PyTorch, Scikit-learn, LangChain, Streamlit, Hugging Face Transformers

Software Engineering

Django, Flask, Spring Boot, Angular, React, FastAPI, REST APIs

Cloud & AI Infrastructure

AWS, Azure, Docker, Git, Apache Airflow, MLOps

Projects

Citi Bike Rental Analytics & Forecasting

Developed analytics and time-series forecasting models for Citi Bike rental data using Apache Spark and Facebook Prophet. Applied statistical methods to predict demand patterns, supporting data-driven decision-making and better resource allocation in urban transportation systems.

Python, Apache Spark, Prophet, Time Series Analysis, Statistical Modeling

Source Code: Time-Series-Analytics-and-Forecasting-with-Apache-Spark

AI-Powered Medical Chatbot

Developed an advanced Medical Chatbot leveraging LLaMA2, LangChain, and Pinecone VectorDB to provide instant, accurate medical information. Applied natural language processing and vector similarity search techniques to enhance patient engagement and deliver personalized healthcare insights.

Python, LangChain, LLaMA2, Pinecone, NLP, Generative AI, Vector Search

Source Code: Medical Chatbot
🏆 Hackathon Winner: Taskformer's AI Chatbot Hackathon

Interactive Pokémon Data Visualization

Developed an award-winning interactive data visualization tool to explore Pokémon stats, type matchups, and battle outcomes using statistical analysis and machine learning techniques. Applied data science methods to uncover hidden patterns in Pokémon data; the project was selected as the winner in a class of 120 students.

D3.js, Python, Data Visualization, Statistical Analysis, Machine Learning

Source Code: visual-journey-in-the-world-of-pokemon
🌐 Live Demo: Explore the Pokémon World

ML-Based Android Malware Detection

Developed and compared multiple machine learning models to detect malicious Android apps from system call frequency data. Implemented feature engineering and model evaluation techniques, with most algorithms built from scratch, to achieve high accuracy in malware classification.

Python, Machine Learning, Scikit-learn, Feature Engineering, Cybersecurity, Model Evaluation

Source Code: Android Malware detection

AI-Powered MCQ Generator

Developed an intelligent web application using OpenAI's language model, LangChain, and Streamlit to automate the creation of multiple-choice questions. Applied natural language processing and prompt engineering techniques to provide educators and content creators with customizable options for generating high-quality MCQs based on any input content.

Python, LangChain, Streamlit, OpenAI API, NLP, Prompt Engineering

Source Code: MCQ Generator Web Application

MeetingMate - Automated reminders for Google Calendar events.

Automation tool integrated with the Google Calendar and Gmail APIs to send timely reminders to attendees of upcoming events.

Source Code: MeetingMate
Technology: Python, Streamlit, Google APIs

Deep Learning Traffic Sign Classification

Built a convolutional neural network (CNN) model for Traffic Sign Classification, achieving 96% accuracy across 42 classes of traffic signs. Applied computer vision techniques, data augmentation, and model optimization to create a robust classification system with real-world applications in autonomous driving.

Python, CNN, Deep Learning, TensorFlow, Computer Vision, Data Augmentation

Source Code: Traffic Sign Classification
📄 Research Paper: Optimized Detection and Classification on GTRSB: Advancing Traffic Sign Recognition with Convolutional Neural Networks

AutoBotTrain - Automated Chatbot Training Pipeline

Built an automated pipeline that generates utterances, responses, and intents from user-entered text or business documents, streamlining the chatbot training process.

Source Code: AutoBotTrain
Technology: Python, spaCy, NLP, Machine Learning

LLM-Based Medicine Recommendation System

Implemented an AI-driven medicine recommendation system that uses a Large Language Model (LLM) to suggest medications based on patient symptoms and to provide insights into potential side effects.

Source Code: Medicine Recommendation System
Technology: Python, spaCy, Machine Learning, Large Language Model

Smart Contract Fuzzing - Enhancing Security in Blockchain Applications

Uses fuzzing and static analysis to identify vulnerabilities in Ethereum smart contracts, strengthening the security of decentralized applications.

Source Code: Smart-Contract-Fuzzing
Technology: Solidity, Echidna

Get In Touch

I'm always interested in new opportunities and collaborations. Feel free to reach out!