Description: Developed and deployed a full OCR pipeline using Google Cloud Vision and OpenCV to extract key fields (e.g., airport codes, tracking numbers) from scanned shipping labels, applied Gemini to classify the extracted fields, and automatically assigned routing decisions across UPS freight operations.
Tools/Technologies Used:
- Python, OpenCV, Pandas
- Google Cloud Vision API
- Vertex AI, Flask (for API deployment)
Methodology:
1. Preprocessed scanned label images using image cleaning and contour detection.
2. Extracted text using Google OCR and parsed it into structured data.
3. Applied business logic and Gemini-based interpretation to determine routing codes.
4. Packaged into a deployable API for automated sort decisions.
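A minimal sketch of steps 1–2, assuming a local label scan and Application Default Credentials for the Vision client; the field patterns at the end are illustrative placeholders, not the production parsing rules:

```python
import re
import cv2
from google.cloud import vision

def preprocess(path: str) -> bytes:
    """Clean a scanned label: grayscale, denoise, and binarize before OCR."""
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    denoised = cv2.fastNlMeansDenoising(gray, h=10)
    _, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return cv2.imencode(".png", binary)[1].tobytes()

def extract_fields(path: str) -> dict:
    """Run Cloud Vision text detection and pull candidate fields from the result."""
    client = vision.ImageAnnotatorClient()
    response = client.text_detection(image=vision.Image(content=preprocess(path)))
    text = response.text_annotations[0].description if response.text_annotations else ""
    # Placeholder patterns -- the real business rules are more involved.
    tracking = re.search(r"\b1Z[0-9A-Z]{16}\b", text)   # common UPS tracking format
    airport = re.search(r"\b[A-Z]{3}\b", text)          # candidate IATA airport code
    return {"tracking": tracking and tracking.group(), "airport": airport and airport.group()}
```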
Results: Automated 90%+ of routing tasks for freight labels, improving accuracy and operational speed at UPS sort hubs.
Description: Built a voice-enabled AI agent that allows UPS Freight Flow Controllers to ask natural-language questions like “Where is trailer 123456?” and receive spoken answers with real-time trailer/bay assignments based on data scraped from the TMS.
Tools/Technologies Used:
- Google Speech-to-Text & Text-to-Speech APIs
- Gemini 1.5 Flash (LLM) for intent parsing
- Vertex AI, Python, FastAPI
Methodology:
1. Converted audio input to text using Google STT with custom context boosting.
2. Parsed intent and entities (e.g., trailer numbers) with Gemini.
3. Queried the UPS TMS system and responded via Google TTS.
4. Deployed as a cloud API and tested across real hub workflows.
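A condensed sketch of the speech legs (steps 1 and 3), assuming 16 kHz LINEAR16 audio and illustrative boost phrases; the Gemini intent parsing and TMS lookup sit between these two calls:

```python
from google.cloud import speech, texttospeech

def transcribe(audio_bytes: bytes) -> str:
    """Google STT with context boosting so yard vocabulary outranks similar words."""
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
        # Boost values and phrases here are assumptions, not the tuned settings.
        speech_contexts=[speech.SpeechContext(phrases=["trailer", "bay", "dock"], boost=15.0)],
    )
    response = client.recognize(config=config, audio=speech.RecognitionAudio(content=audio_bytes))
    return " ".join(r.alternatives[0].transcript for r in response.results)

def speak(answer: str) -> bytes:
    """Render the looked-up assignment as spoken audio via Google TTS."""
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=answer),
        voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
        audio_config=texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3),
    )
    return response.audio_content
```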
Results: Reduced manual trailer lookup time by over 70%, enabled hands-free interaction for control room staff, and served as a proof-of-concept for conversational AI in logistics.
Description: This project explores student performance data to identify relationships between parental education, economic indicators, and prior academic results, aiming to understand factors influencing graduation success.
Tools/Technologies Used:
- Python, Pandas, NumPy
- Matplotlib, Seaborn, Statsmodels
Methodology:
1. Cleaned and prepared the data by handling missing values and outliers.
2. Conducted exploratory data analysis (EDA) to identify patterns and distributions.
3. Performed regression analysis to evaluate relationships between features and graduation rates.
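A minimal sketch of the step 3 regression with statsmodels; the file and column names are assumptions, not the dataset's actual schema:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and column names standing in for the real dataset.
df = pd.read_csv("student_performance.csv").dropna()
model = smf.ols("graduation_rate ~ parental_education + family_income + prior_gpa", data=df).fit()
print(model.summary())  # coefficients, p-values, and R-squared per factor
```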
Results: Parental education and economic indicators significantly impact student graduation rates.
Description: This project uses machine learning models to detect fraudulent financial transactions by analyzing patterns and identifying anomalies.
Tools/Technologies Used:
- Python, Scikit-learn, Pandas, NumPy
- Matplotlib
Methodology:
1. Cleaned and combined credit card, online retail, and mobile transaction datasets.
2. Implemented classification models such as Logistic Regression and Random Forest.
3. Evaluated model performance using precision, recall, and AUC scores.
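A sketch of steps 2–3 for the Random Forest branch, assuming a combined file and an is_fraud label column (both names are hypothetical):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score

df = pd.read_csv("transactions.csv")                     # hypothetical combined dataset
X, y = df.drop(columns="is_fraud"), df["is_fraud"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=42)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))        # precision and recall
print(roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))    # AUC
```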
Results: The Random Forest model achieved an AUC score of 0.98, effectively detecting fraudulent activities.
Description: Analyzes video game sales data to identify top-performing games, platforms, and regional trends.
Tools/Technologies Used:
- Python, Pandas, Matplotlib
Methodology:
1. Cleaned the dataset and removed irrelevant entries.
2. Performed EDA to analyze sales trends across platforms, regions, and publishers.
3. Visualized insights using bar charts and line graphs.
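A sketch of the step 2–3 aggregation, assuming the commonly used vgsales.csv column names (Publisher, Global_Sales):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("vgsales.csv")  # assumed columns: Publisher, Platform, Global_Sales
df.groupby("Publisher")["Global_Sales"].sum().nlargest(10).plot(kind="bar")
plt.ylabel("Global sales (millions of units)")
plt.title("Top 10 publishers by global sales")
plt.tight_layout()
plt.show()
```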
Results: The analysis showed Nintendo as a dominant publisher, with the Wii leading in platform-specific global sales.
Description: Explores relationships between vehicle specifications (horsepower, weight, fuel efficiency) and builds a predictive model for fuel efficiency.
Tools/Technologies Used:
- Python, Pandas, Matplotlib, Seaborn
Methodology:
1. Cleaned and preprocessed automotive data.
2. Explored relationships between features using heatmaps and scatter plots.
3. Built a regression model to predict fuel efficiency (mpg).
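A sketch of steps 2–3, using statsmodels as a stand-in for the regression step and assumed column names:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

df = pd.read_csv("auto_mpg.csv").dropna()  # assumed columns: mpg, weight, horsepower
sns.heatmap(df[["mpg", "weight", "horsepower"]].corr(), annot=True)
plt.show()
model = smf.ols("mpg ~ weight + horsepower", data=df).fit()
print(model.params)  # negative coefficients: heavier, more powerful cars get fewer mpg
```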
Results: The regression model highlighted weight and horsepower as key factors negatively impacting fuel efficiency.
Description: Explores the Boston Housing dataset to identify socioeconomic factors influencing housing prices.
Tools/Technologies Used:
- Python, Pandas, Matplotlib, Seaborn
Methodology:
1. Performed data cleaning and handled missing values.
2. Conducted EDA to find correlations between crime rate, rooms per dwelling, and housing prices.
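A minimal sketch of step 2, assuming the dataset's conventional column names (CRIM for crime rate, RM for rooms per dwelling, MEDV for median price):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("boston_housing.csv")  # file name is an assumption
sns.heatmap(df[["CRIM", "RM", "MEDV"]].corr(), annot=True, cmap="coolwarm")
plt.title("Crime rate, rooms per dwelling, and median home price")
plt.show()
```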
Results: The number of rooms per dwelling had a strong positive correlation with housing prices, while the crime rate negatively impacted prices.
Description: Uses web scraping techniques to extract the top 100 eBooks from Project Gutenberg.
Tools/Technologies Used:
- Python, BeautifulSoup, Requests
Methodology:
1. Scraped data (titles, authors, and book IDs) from Project Gutenberg.
2. Cleaned and formatted the data using regular expressions.
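A minimal sketch of both steps; the CSS selector reflects the top-100 page's layout at the time of writing and may need adjusting if the markup changes:

```python
import re
import requests
from bs4 import BeautifulSoup

URL = "https://www.gutenberg.org/browse/scores/top"
soup = BeautifulSoup(requests.get(URL, timeout=30).text, "html.parser")

books = []
for link in soup.select("li a[href^='/ebooks/']")[:100]:
    # Entries read like "Title by Author (12345)"; strip the trailing download count.
    title = re.sub(r"\s*\(\d+\)\s*$", "", link.get_text(strip=True))
    books.append({"id": link["href"].split("/")[-1], "title": title})
print(books[:5])
```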
Results: Successfully extracted and cleaned a list of the top 100 eBooks, providing a ready-to-use dataset.
Description: Analyzes TSA complaint data to uncover trends and patterns across categories and airports.
Tools/Technologies Used:
- Python, Pandas, Seaborn, Matplotlib
Methodology:
1. Cleaned and combined TSA complaint datasets.
2. Visualized complaints by airport, time period, and complaint category.
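A sketch of the step 2 category breakdown, with hypothetical file and column names:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("tsa_complaints.csv")  # hypothetical merged dataset
top = df["category"].value_counts().nlargest(10)
sns.barplot(x=top.values, y=top.index)
plt.xlabel("Number of complaints")
plt.title("Most frequent TSA complaint categories")
plt.tight_layout()
plt.show()
```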
Results: The analysis revealed mishandling of passenger property as the most frequent complaint.
Description: Uses time series analysis to model and predict monthly retail sales trends in the United States, identifying key patterns, trends, and disruptions.
Tools/Technologies Used:
- Python, Pandas, Statsmodels
- ARIMA (AutoRegressive Integrated Moving Average), Matplotlib
Methodology:
1. Cleaned monthly retail sales data and handled inconsistencies.
2. Visualized retail sales trends over time.
3. Applied ARIMA models for time series forecasting.
4. Evaluated model accuracy.
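A minimal sketch of steps 3–4, assuming a monthly series in a hypothetical retail_sales.csv; the (1, 1, 1) order is illustrative, not the tuned one:

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

sales = pd.read_csv("retail_sales.csv", parse_dates=["date"], index_col="date")["sales"]
sales = sales.asfreq("MS")                    # month-start frequency
model = ARIMA(sales, order=(1, 1, 1)).fit()   # illustrative order
print(f"AIC: {model.aic:.1f}")                # rough in-sample fit check (step 4)
print(model.forecast(steps=12))               # 12-month-ahead forecast
```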
Results: Successfully predicted retail sales trends, highlighting major economic disruptions like the 2008 financial crisis and COVID-19.
Description: Builds a predictive model to identify customers likely to churn from a telecom service provider.
Tools/Technologies Used:
- Python, Pandas, NumPy, Scikit-learn
- Logistic Regression, Gradient Boosting, Matplotlib, Seaborn
Methodology:
1. Cleaned data, handled missing values, and prepared features for modeling.
2. Visualized usage patterns and billing trends across churned and retained customers.
3. Implemented Logistic Regression and Gradient Boosting models.
4. Evaluated performance using accuracy, precision, recall, and AUC metrics.
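A sketch of steps 3–4 for the Gradient Boosting branch, with a hypothetical prepared dataset and churn label:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

df = pd.read_csv("telecom_churn.csv")               # hypothetical prepared dataset
X, y = df.drop(columns="churn"), df["churn"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
gb = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)
print(accuracy_score(y_test, gb.predict(X_test)))             # overall accuracy
print(roc_auc_score(y_test, gb.predict_proba(X_test)[:, 1]))  # ROC-AUC
```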
Results: The Gradient Boosting model achieved 85% accuracy with a strong ROC-AUC score, successfully identifying high-risk churn customers.
Description: Analyzes global mental health data to identify prevalence rates, risk factors, and trends in disorders like depression and anxiety.
Tools/Technologies Used:
- R, ggplot2, dplyr, caret
Methodology:
1. Cleaned and organized global mental health datasets.
2. Visualized prevalence rates across demographics, regions, and time periods.
3. Applied regression models to identify risk factors influencing mental health disorders.
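The project itself was done in R (ggplot2, caret); purely to illustrate the step 3 regression idea, here is a Python sketch with assumed file and column names:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Column names are assumptions standing in for the dataset's actual fields.
df = pd.read_csv("mental_health.csv").dropna()
model = smf.ols("depression_prevalence ~ gdp_per_capita + unemployment_rate", data=df).fit()
print(model.summary())  # which economic factors relate to prevalence, and how strongly
```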
Results: Significant relationships were found between economic conditions, unemployment, and mental health disorder prevalence.
Description: Explores U.S. childcare costs from 2008 to 2018, highlighting trends, regional disparities, and their socioeconomic impacts.
Tools/Technologies Used:
- Python, Tableau, Pandas, Excel
Methodology:
1. Cleaned missing values and preprocessed variables in the childcare costs dataset.
2. Conducted EDA to identify key trends and regional disparities.
3. Designed a Tableau dashboard for interactive exploration.
4. Documented findings in a final report.
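A sketch of the step 2 trend check behind the headline figure, with hypothetical file and column names:

```python
import pandas as pd

df = pd.read_csv("childcare_costs.csv")  # assumed columns: year, region, annual_cost
trend = df.groupby("year")["annual_cost"].median()
print(f"Median cost change, first to last year: {(trend.iloc[-1] / trend.iloc[0] - 1) * 100:.0f}%")
print(df.groupby("region")["annual_cost"].median().sort_values(ascending=False))
```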
Results: Childcare costs increased by 21% over the decade, with the Northeast and West Coast having the highest costs.