8 Major
Projects
Python BigQuery BERT / NLP Power BI scikit-learn Random Forest Predictive Modelling Web Scraping
available for data roles · UK

Tanushree
R. Dalvi

MSc Data Science Graduate · Turning data into decisions

I transform messy data into clear, actionable insights. From building explainable AI systems to designing cloud data pipelines and NLP models, I bring academic rigour and hands-on project depth to every problem.

Key Highlights
  • Python, SQL and R for end-to-end ML workflows
  • NLP with BERT and RoBERTa models
  • GCP pipelines and BigQuery ML for scalable analytics
  • Power BI and executive dashboards for business insights
  • Flask and explainable AI solutions
8+ academic projects
6 certifications
10+ tools & technologies
01 / about

Who I am

I'm Tanushree, a data scientist and analyst with an MSc in Data Science (Merit) from the University of Surrey, Guildford, and a BSc in Computer Science (CGPA 8.58/10) from the University of Mumbai. My academic background gave me a strong foundation in both the theory and the practice of working with data at scale.

My work spans the full data lifecycle: from designing cloud pipelines on Google Cloud Platform, to building NLP classifiers using BERT and RoBERTa, running end-to-end machine learning workflows, and translating findings into Power BI dashboards that stakeholders can actually use. I love the moment data stops being numbers and starts telling a story.

I am now looking for graduate data analyst, junior data scientist, or BI analyst roles in the UK where I can contribute meaningfully from day one.

02 / skills

Technical toolkit

A broad technical skill set developed through intensive project work, with each skill gained through hands-on implementation rather than coursework alone.

Programming & Querying
Python 90%
SQL 85%
R 70%
ML & Data Science
scikit-learn 88%
NLP (BERT / RoBERTa / BiLSTM / CRF) 82%
Random Forest / Ensemble ML 85%
pandas / NumPy 90%
Predictive Modelling Feature Engineering Hyperparameter Tuning Class Imbalance Handling
BI & Visualisation
Power BI 85%
Looker Studio 80%
Excel (Pivot, VLOOKUP) 88%
Cloud & Databases
Google Cloud / BigQuery 82%
MySQL / PostgreSQL 80%
MongoDB / AWS 70%
Soft Skills
Stakeholder Communication 90%
Analytical Problem-Solving 92%
Report Writing 85%
Tools & Environments
Jupyter Google Colab VS Code Flask Cloud Run Azure VM SSMS / SSAS BigQuery ML GCP Storage PowerPoint
03 / projects

What I've built

Eight end-to-end projects demonstrating practical data science expertise across explainable AI, cloud-based pipelines, NLP systems, and executive BI dashboards.

01
Python Explainable AI Flask NLP Dissertation

MicrobiomeBot — Explainable AI System for Microbial Interaction Inference

An interactive chatbot and AI pipeline that infers microbial interactions using logic-based machine learning, NLP querying, and data visualisation. Deployed end-to-end.

  • Built explainable AI pipeline using Python and logic-based ML
  • Created interactive Flask-based chatbot with NLP query interface
  • Integrated data visualisation layer for interpretable outputs
  • Deployed complete application — live, working system
view on github
02
Random Forest scikit-learn Python Lloyds Simulation

Customer Churn Prediction — Lloyds Banking Group Data Science Simulation

Developed a predictive churn model for Lloyds Banking Group's Data Science & Analytics team. Completed advanced preprocessing, model training and hyperparameter tuning, achieving strong performance metrics.

  • Built customer churn model using Random Forest and ensemble ML algorithms
  • Conducted advanced preprocessing: missing values, encoding, feature scaling
  • Optimised hyperparameters with GridSearchCV for peak performance
  • Applied feature importance analysis to generate actionable business insights
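The tuning and feature-importance steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the simulation's actual code: the dataset, the parameter grid, the `class_weight="balanced"` setting and the recall scoring are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a churn dataset (assumption): roughly 20% positives.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search grid; a real project would tune more parameters.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid,
    scoring="recall",  # assumption: optimise for catching churners
    cv=3,
)
search.fit(X_train, y_train)

# Rank features by the fitted forest's impurity-based importances.
best_model = search.best_estimator_
importances = best_model.feature_importances_
ranked = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)

print("best params:", search.best_params_)
for index, score in ranked[:3]:
    print(f"feature {index}: importance {score:.3f}")
```

Scoring on recall rather than accuracy is one possible design choice when the positive class is rare; other metrics may suit a given business goal better.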
view on github
03
Random Forest scikit-learn Python BCG Simulation

Customer Churn Analysis — BCG Data Science Job Simulation

Built a customer churn prediction pipeline for a BCG Data Science simulation, covering exploratory data analysis, feature engineering, and Random Forest modelling, culminating in an executive summary with actionable retention recommendations.

  • Completed a customer churn analysis simulation for XYZ Analytics, demonstrating advanced data analytics skills, identifying essential client data and outlining a strategic investigation approach.
  • Conducted data analysis in Python using pandas and NumPy, and applied data visualisation techniques to interpret trends.
  • Engineered and optimised a Random Forest model for customer churn prediction. Model evaluation showed strong overall accuracy (90%) and a ROC-AUC of 0.66; however, performance on the minority churn class was limited (recall: 3%) due to class imbalance.
  • Developed a concise executive summary for the team, delivering actionable insights for informed decision-making based on the analysis.
ROC-AUC Score: 0.66
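How 90% accuracy and 3% recall can coexist is easiest to see from a confusion matrix. The sketch below uses hypothetical counts picked to reproduce those two figures. They are not the project's actual confusion matrix, and the 10% churn rate is an assumption.

```python
# Hypothetical counts for 1,000 customers, 100 of whom churn (assumption).
# Illustrative only; not the project's real results.
tp, fn = 3, 97     # churners caught / churners missed
tn, fp = 897, 3    # non-churners correctly kept / false alarms


def accuracy(tp, fp, tn, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Share of actual churners the model identifies."""
    return tp / (tp + fn) if (tp + fn) else 0.0


acc = accuracy(tp, fp, tn, fn)
rec = recall(tp, fn)

# Baseline that always predicts "no churn": every churner becomes a miss.
baseline_acc = accuracy(tp=0, fp=0, tn=tn + fp, fn=tp + fn)

print(f"accuracy: {acc:.2f}, recall: {rec:.2f}, baseline accuracy: {baseline_acc:.2f}")
```

With these counts the model is no more accurate than always predicting "no churn", which is why recall and ROC-AUC are the more informative metrics on imbalanced data.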
view on github
04
Web Scraping Python Predictive Modelling BA Simulation

Customer Review Analysis & Buying Behaviour Prediction — British Airways Simulation

Explored how data science drives strategic decisions at British Airways. Scraped and analysed customer review data to uncover key sentiment trends, then built a predictive model to identify factors influencing booking behaviour.

  • Scraped and cleaned customer review data at scale using Python
  • Conducted sentiment and topic analysis to surface actionable findings
  • Built a predictive model for customer buying behaviour
  • Presented data-driven recommendations in a structured insight report
view on github
05
BERT RoBERTa BiLSTM CRF

Biomedical NLP — Named Entity Recognition & Text Classification

Implemented and compared four NLP architectures (CRF, BiLSTM, BERT and RoBERTa) for biomedical named entity recognition and text classification, with detailed error analysis across all models.

  • Compared CRF, BiLSTM, BERT and RoBERTa on biomedical corpora
  • Conducted structured error analysis per model architecture
  • Ran experiments end-to-end in Google Colab at scale
  • Produced performance benchmarks using ROC curves, F1 scores and confusion matrices
view on github
06
BigQuery Google Cloud Looker Studio Cloud Run

Cloud & Big Data Analytics Pipeline — GCP End-to-End

Designed and deployed a complete cloud data pipeline on Google Cloud Platform — from raw storage ingestion through BigQuery ML to Looker Studio dashboards.

  • Architected pipeline: GCS → BigQuery → BigQuery ML → Looker Studio
  • Deployed serving layer via Cloud Run for scalable access
  • Applied BigQuery ML for in-warehouse predictive modelling
  • Created executive-facing Looker Studio dashboard for insights
07
scikit-learn Python ML

Machine Learning & Data Mining — Predictive Modelling Suite

Full ML workflow covering regression, classification, feature engineering, train/test splits and comprehensive model evaluation across multiple algorithms.

  • Built regression and classification models with feature engineering
  • Evaluated models using MSE, R², ROC curves, confusion matrices
  • Applied cross-validation and hyperparameter tuning methodologies
  • Documented complete experimental workflow with reproducible code
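For the regression side of the evaluation, MSE and R² can be computed directly from predictions. The sketch below is a plain-Python illustration with made-up values, not the project's code or data.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus residual sum of squares over total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up targets and predictions, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]

mse = mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(f"MSE: {mse:.3f}, R^2: {r2:.3f}")
```

MSE is expressed in squared target units, while R² is unitless, so reporting both gives an absolute and a relative view of fit.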
view on github
08
Power BI Excel HR Analytics PwC Simulation

Customer KPIs & HR Dashboard — Executive Gender Balance Insights

Built Power BI dashboards to analyse HR KPIs and deliver executive-level gender balance insights, presented with structural solutions and professional stakeholder correspondence.

  • Designed multi-page Power BI dashboard for HR KPI tracking
  • Identified gender balance insights at executive leadership level
  • Presented data-driven structural recommendations to stakeholders
  • Produced professional correspondence accompanying analysis
04 / education

Academic foundation

A rigorous computer science foundation from Mumbai, further strengthened by a Master’s degree in Data Science from Surrey.

2025 — 2026

MSc Data Science

University of Surrey · Guildford, United Kingdom
Merit ML · NLP · Cloud Analytics · Big Data · Statistical Modelling
2021 — 2024

BSc Computer Science

University of Mumbai · Mumbai, India
CGPA 8.58 / 10 Algorithms · Databases · Software Engineering · Statistics

Let's work
together

I’m currently seeking graduate Data Analyst, Junior Data Scientist, and BI Analyst roles across the UK. If you’re hiring or would like to connect, I’d be glad to hear from you.
Feel free to reach out.

send me an email →