Sundus Afreen — Business Analyst

01 / 08

Big Data, Azure Databricks, Apache Kafka, NLP / ClinicalBERT, MLflow

Trinity College Dublin

Tempus AI — Clinical Data Architecture & Agentic NLP Strategy

How do you unlock $28M in value from 40M patient records when 78% of the data is trapped in unstructured text across three incompatible post-acquisition architectures?

Approach

Designed a 6-layer Medallion Lakehouse (Bronze-Silver-Gold) on Azure Databricks to consolidate three incompatible post-acquisition architectures (Paige AI, Ambry Genetics, Deep6 AI). Built an agentic NLP pipeline using ClinicalBERT and BioBERT to extract structured clinical entities at scale, with MLflow governance and RAG infrastructure.

Key Insights

Only 22% of 40M patient records were AI-ready. The bottleneck was not data volume but schema incompatibility and unstructured text. A confidence-gated multi-agent NLP pipeline (≥0.90 auto-accept; 0.70–0.89 HITL review) enabled scalable extraction while preserving clinical accuracy and audit provenance.
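
The confidence gate described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Extraction` fields, queue names, and the handling of scores below 0.70 (which the write-up does not specify) are all assumptions.

```python
from dataclasses import dataclass

AUTO_ACCEPT = 0.90   # from the text: >= 0.90 is auto-accepted
REVIEW_FLOOR = 0.70  # from the text: 0.70-0.89 goes to human-in-the-loop review


@dataclass
class Extraction:
    """One extracted clinical entity (hypothetical structure)."""
    record_id: str
    entity: str
    label: str
    confidence: float


def route(extraction: Extraction) -> str:
    """Return the queue an extracted entity should be sent to."""
    if extraction.confidence >= AUTO_ACCEPT:
        return "auto_accept"
    # Scores in [0.70, 0.90) are treated as the review band.
    if extraction.confidence >= REVIEW_FLOOR:
        return "hitl_review"
    # Assumption: the source does not say what happens below 0.70.
    return "rejected"


def route_batch(extractions):
    """Group extractions by queue, keeping every item so provenance is not lost."""
    queues = {"auto_accept": [], "hitl_review": [], "rejected": []}
    for item in extractions:
        queues[route(item)].append(item)
    return queues
```

Keeping the thresholds as named constants makes the gate easy to audit and to tune if reviewer capacity changes.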

Business Value

Roadmap to increase structured data coverage from 22% to 50% within 18 months. Projected 3-year NPV of $28M, reducing manual abstraction cost from $38 to under $12 per chart. Enabled pharmaceutical-grade audit trails for Pfizer, AstraZeneca, and GSK partners.

$28M projected 3-year NPV: structured data coverage roadmap from 22% to 50% across 40M patient records

02 / 08

Python, BERT, TF-IDF / LDA, NLP, Logistic Regression

Sephora — Do Negative Brand Reviews Impact All Customers or Specific Segments?

With 519,409 reviews and severe class imbalance, can we identify exactly which customer segments drive negative sentiment — and what Sephora should do about it?

Approach

Compared three NLP models — Naïve Bayes baseline, Logistic Regression + TF-IDF, and fine-tuned BERT (bert-base-uncased, 110M params, 3 epochs). Applied LDA topic modelling to surface 7 complaint themes, and segmented results by skin type, skin tone, and price tier.

Key Insights

BERT outperformed LR by +26pp on negative F1 (0.76 vs 0.51). Darker skin tones showed a 67% higher negative review rate vs fair/porcelain. Dry skin + budget products had the worst dissatisfaction (12.3%). La Mer hit 19.8% negative despite premium pricing — a value perception problem, not a quality problem.
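
Negative-class F1 is the headline metric here because accuracy hides minority-class failures under class imbalance. The sketch below shows why, using a made-up ten-review sample; the labels and predictions are illustrative only and are not taken from the Sephora dataset.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives means precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced sample: 2 negative reviews out of 10.
y_true = ["neg", "neg"] + ["pos"] * 8
always_positive = ["pos"] * 10                      # ignores the minority class
some_model = ["neg", "pos", "neg"] + ["pos"] * 7    # finds one of the two negatives

print(accuracy(y_true, always_positive), f1_for_class(y_true, always_positive, "neg"))
print(accuracy(y_true, some_model), f1_for_class(y_true, some_model, "neg"))
```

Both predictors reach the same accuracy, but only the second one recovers any negative reviews, which is what the negative-class F1 captures.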

Business Value

4 actionable recommendations: reformulate for dry/sensitive skin, expand shade range for deeper tones, address value-for-money perception with sample sizes and outcome messaging, and deploy BERT for real-time review monitoring at scale.

67% higher negative rate for darker skin tones: an actionable inclusivity gap in Sephora's product strategy

03 / 08

AI-Powered App, Python, Groq LLM, Streamlit, Data Visualisation

InsightIQ — Your AI Business Analyst

How can a non-technical business user get trend analysis, risk flags, KPIs, and strategic recommendations from their own data in seconds — without writing a single line of code?

Approach

Built a full-stack AI analytics application in Python + Streamlit. Users drag-and-drop a CSV, XLSX, or XLS file (up to 200MB), select their industry and report type (Executive Summary, Risk Analysis, Trend Report), and InsightIQ uses a Groq-powered LLM (Llama 3.2 11B) to generate structured analysis and interactive visualisations. Supports Markdown, Word, and PDF export.

Key Insights

Designed a 4-step UX: Upload → Analyse → Ask → Decide. The "Ask AI about your data" chat layer allows free-text business questions against the uploaded dataset — functioning like a junior analyst on demand. Industry-aware prompting adapts recommendations to context (retail, finance, healthcare, etc.).
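
Industry-aware prompting could look something like the sketch below. It only assembles the prompt text and does not call any LLM API. The function name, the `INDUSTRY_FOCUS` mapping, and the prompt wording are assumptions for illustration, not InsightIQ's actual implementation; only the three report types come from the description above.

```python
# Report types listed in the project description.
REPORT_TYPES = {"Executive Summary", "Risk Analysis", "Trend Report"}

# Hypothetical mapping from industry to the angle the analysis should take.
INDUSTRY_FOCUS = {
    "retail": "margin, basket size, and seasonality",
    "finance": "credit risk, liquidity, and regulatory exposure",
    "healthcare": "patient outcomes, capacity, and cost per case",
}


def build_prompt(industry, report_type, column_names, question=None):
    """Assemble an LLM prompt tailored to the user's industry and report type."""
    if report_type not in REPORT_TYPES:
        raise ValueError(f"Unknown report type: {report_type}")
    # Fall back to a generic focus for industries not in the mapping.
    focus = INDUSTRY_FOCUS.get(industry.lower(), "general business performance")
    lines = [
        f"You are a business analyst for the {industry.lower()} industry.",
        f"Write a {report_type} that pays particular attention to {focus}.",
        "The uploaded dataset has these columns: " + ", ".join(column_names) + ".",
    ]
    if question:
        lines.append(f"Also answer this question from the user: {question}")
    return "\n".join(lines)


print(build_prompt("Retail", "Risk Analysis", ["date", "store", "revenue"]))
```

Separating prompt construction from the model call keeps the industry logic testable without network access.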

Business Value

Reduces time-to-insight from hours to under 60 seconds. Designed for non-technical decision-makers — no SQL, no Python required. Demonstrates end-to-end product thinking: UX design, LLM integration, export functionality, and real business framing — not just a notebook.

InsightIQ: full-stack AI analytics app. Upload. Ask. Decide. Powered by Groq LLM, built for non-technical business users

04 / 08

Python, Chi-Square Testing, SQL, Pareto Analysis

Order Cancellation Root Cause Analysis

Why are 20,000+ transactions failing — and which failures actually cost the business?

Approach

End-to-end process mapping on 20,000+ electronics transactions. Chi-square testing validated statistical significance of failure patterns; Pareto modelling prioritised fixes by ROI impact. Customer risk segmentation identified high-value loss cohorts.
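
As a rough illustration of the chi-square step, the sketch below computes the test statistic for a contingency table by hand. The table (payment method against cancelled or completed) is hypothetical and not taken from the project data; the project itself may well have used a library routine instead.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    No continuity correction is applied, so for 2x2 tables the value can differ
    from library routines that apply Yates' correction by default.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows = payment method, columns = [cancelled, completed].
observed = [
    [30, 70],  # cash on delivery
    [10, 90],  # card
]
statistic, dof = chi_square(observed)

# 3.841 is the chi-square critical value for 1 degree of freedom at alpha = 0.05.
print(statistic, dof, statistic > 3.841)
```

A statistic above the critical value suggests the cancellation rate genuinely differs between the two groups rather than by chance.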

Key Insights

The top 3 cancellation drivers were responsible for 80% of revenue leakage. Failure points were mapped to specific process stages, enabling surgical corrective action rather than blanket fixes.
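
A Pareto cut of this kind can be sketched as below: rank causes by lost revenue and keep adding them until the cumulative share reaches the threshold. The cause names and amounts are invented for illustration and do not come from the project dataset.

```python
def pareto_causes(loss_by_cause, threshold=0.80):
    """Return the smallest set of top causes whose cumulative share meets the threshold.

    Each item in the result is (cause, cumulative_share_of_total_loss).
    """
    total = sum(loss_by_cause.values())
    ranked = sorted(loss_by_cause.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0.0
    for cause, loss in ranked:
        cumulative += loss
        selected.append((cause, cumulative / total))
        if cumulative / total >= threshold:
            break
    return selected


# Hypothetical revenue lost per cancellation cause.
losses = {
    "payment_failure": 400,
    "stock_out": 250,
    "address_error": 150,
    "delivery_delay": 120,
    "other": 80,
}

for cause, share in pareto_causes(losses):
    print(f"{cause}: {share:.0%} cumulative")
```

In this toy input, three of the five causes are enough to reach the 80% mark.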

Business Value

Delivered BA-style recommendations with Pareto-driven ROI modelling — directly replicating root cause and process improvement methodology used in enterprise analytics environments.

80% of revenue leakage traced to just 3 root causes, enabling targeted, measurable intervention

05 / 08

Python, XGBoost, Random Forest, SHAP, Tableau

E-Commerce Customer Churn Prediction

Which customers are about to leave — and what would actually change their mind?

Approach

Multi-model churn prediction pipeline using XGBoost and Random Forest with SHAP explainability to surface the 'why' behind predictions, not just the score. Mirrors customer-facing analytics used in energy retail contexts like Electric Ireland.

Key Insights

SHAP revealed which customer behaviours were the most powerful churn predictors — translating black-box outputs into actionable, prioritised retention signals for non-technical stakeholders.
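
SHAP attributions are based on Shapley values. To show the idea without depending on the SHAP library, the sketch below computes exact Shapley values by brute force for a toy scoring function. The scoring rule, feature names, and baseline are hypothetical; a real pipeline would use the SHAP library's tree-specific algorithms on the trained XGBoost or Random Forest model rather than enumerating orderings.

```python
from itertools import permutations


def toy_churn_score(customer):
    """Hypothetical churn score with an interaction between two behaviours."""
    score = 0.05 * customer["complaints"] + 0.01 * customer["days_inactive"]
    if customer["complaints"] >= 2 and customer["days_inactive"] >= 30:
        score += 0.2
    return score


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings.

    Only feasible for a handful of features, since the number of orderings grows
    factorially.
    """
    features = list(instance)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            # Switch one feature from its baseline value to the instance value.
            current[name] = instance[name]
            score = predict(current)
            totals[name] += score - previous
            previous = score
    return {name: total / len(orderings) for name, total in totals.items()}


customer = {"complaints": 3, "days_inactive": 40}
baseline = {"complaints": 0, "days_inactive": 0}
contributions = shapley_values(toy_churn_score, customer, baseline)

# Rank behaviours by how much they push the score up for this customer.
for name, value in sorted(contributions.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:+.2f}")
```

The contributions add up to the gap between the customer's score and the baseline score, which is what lets a stakeholder read them as a breakdown of one prediction.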

Business Value

Delivered business-ready retention recommendations structured for executive consumption, with intervention priority ranked by predicted revenue risk.
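
One simple way to rank interventions by predicted revenue risk is to multiply each customer's churn probability by the revenue they represent. This is an assumed formulation, since the write-up does not define the metric, and the customer records below are made up.

```python
def rank_by_revenue_risk(customers):
    """Sort customers by expected revenue at risk (churn probability x annual revenue)."""
    scored = [
        {**c, "revenue_at_risk": c["churn_probability"] * c["annual_revenue"]}
        for c in customers
    ]
    return sorted(scored, key=lambda c: c["revenue_at_risk"], reverse=True)


# Hypothetical model outputs joined with revenue figures.
customers = [
    {"customer_id": "A", "churn_probability": 0.9, "annual_revenue": 100},
    {"customer_id": "B", "churn_probability": 0.3, "annual_revenue": 1000},
    {"customer_id": "C", "churn_probability": 0.5, "annual_revenue": 400},
]

for row in rank_by_revenue_risk(customers):
    print(row["customer_id"], round(row["revenue_at_risk"], 2))
```

Note that the customer most likely to churn (A) ranks last here, because the revenue at stake is small.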

SHAP explainability turned model output into executive-ready, actionable retention strategy

06 / 08

R, Logistic Regression, Decision Trees

Trinity College Dublin

Student Success Analytics & Performance Prediction

Can we identify at-risk students early enough to intervene — before it's too late?

Approach

Logistic Regression and Decision Tree models applied to 6,378+ student records at Trinity College Dublin. Built in R with full model validation pipeline to predict academic risk before formal assessment checkpoints.

Key Insights

Model achieved 90.49% accuracy. Decision tree visualisation allowed non-technical academic staff to understand risk logic directly — making findings actionable beyond the data team.

Business Value

Enabled an early-identification framework for targeted academic support — a meaningful institutional shift from reactive intervention to proactive student success management.

90.49% accuracy across 6,378 records, enabling proactive early intervention at institutional scale

07 / 08

Python, Tableau, Excel

Film Production Strategy Dashboard

With 278,000+ IMDb records — what actually makes a film commercially viable?

Approach

Data-backed creative strategy analysis using 278,000+ IMDb records (2005–2025). Python for data wrangling and EDA; Tableau for an interactive dashboard targeted at a new film production venture's leadership team.

Key Insights

Identified optimal genre mix, ideal runtime window of 95–110 minutes, and talent positioning patterns correlated with maximum commercial ROI — patterns studios overlook when relying on gut instinct alone.

Business Value

Delivered an executive-ready strategy deck moving creative decision-making from instinct to data-validated recommendations a new production company could act on immediately.

278K+ IMDb records analysed to identify the precise formula for commercially viable film production

08 / 08

Excel, Tableau, Power BI

Retail Sales & Customer KPI Dashboard

How do you give a national sales team real-time visibility — without drowning them in data?

Approach

Comprehensive KPI dashboard for Schneider Electric India's P&G segment using Excel and Tableau. Built backwards from what sales managers actually need to make weekly decisions — not exhaustive reporting for its own sake.

Key Insights

Pipeline velocity and account engagement rate were identified as leading KPIs that predict quarterly performance up to 6 weeks ahead, shifting the team from reactive to anticipatory management.
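
One way to check whether a KPI leads an outcome is to correlate it with the outcome at different time lags. The sketch below does this for a synthetic weekly series built so that the target follows the KPI two weeks later. The data and the two-week lag are invented for illustration, and the original analysis was done in Excel and Tableau, not necessarily with this method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(leading, target, lag):
    """Correlate the KPI at week t with the target at week t + lag."""
    if lag == 0:
        return pearson(leading, target)
    return pearson(leading[:-lag], target[lag:])


def best_lag(leading, target, max_lag):
    """Lag (in weeks) at which the KPI correlates most strongly with the target."""
    return max(range(max_lag + 1), key=lambda lag: lagged_correlation(leading, target, lag))


# Synthetic weekly KPI, and a target that echoes it two weeks later.
kpi = [3, 5, 2, 8, 6, 9, 4, 7, 10, 1, 5, 8]
target = [20, 20] + [2 * value + 10 for value in kpi[:-2]]

print(best_lag(kpi, target, max_lag=6))
```

A strong correlation at a positive lag is only a hint that the KPI leads the target; it does not by itself show that the relationship is causal.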

Business Value

Drove a 20% improvement in target achievement efficiency. Dashboard became the standard reporting tool for the segment, replacing fragmented spreadsheets with a single source of truth.

20% improvement in target achievement: end-of-month surprises replaced by real-time foresight

I turn messy business problems into clear, data-backed decisions.

From the sales floor to the data lab — and why it matters.

Problems I've solved — and how I solved them.

Skills that bridge business and data.

Business Analysis

Data, AI & Analytics

Tools & Platforms

How I approach a problem.

Ready to bring clarity to your data challenges?