Stroke Risk Analysis in Nigeria: A Beginner's Data Journey

Introduction
Ever looked at a dataset and wondered, "What story is hiding in these numbers?" That's exactly what happened when I found stroke data from Nigeria on Hugging Face. As a data analytics beginner, I wanted to tackle something real – understanding what puts people at risk for strokes.
This article walks through how I went from a messy CSV to an interactive Tableau dashboard, the challenges I faced, and what I learned along the way.
The Project Goal
Build an interactive dashboard analyzing stroke risk factors in Nigerian patients, focusing on:
Medical conditions (hypertension, heart disease)
Demographics (age, gender, location)
Lifestyle factors (work type, smoking, marital status)
The Result: Six visualizations that tell a compelling story about stroke risk.
The Dataset
Source: Hugging Face
Records: Several thousand Nigerian patients
Columns:
Demographics: Age, Gender, Residence (Urban/Rural), Work Type
Medical: Hypertension, Heart Disease, Stroke (all binary 0/1)
Health Metrics: BMI, Average Glucose Level
Lifestyle: Smoking Status, Ever Married
The Challenge: Binary columns stored as 0/1 needed special handling. Tableau treats numeric fields as continuous measures and sums them by default, which says little about yes/no flags like Stroke.
The Workflow
Step 1: Data Cleaning with Pandas
Started in a Jupyter Notebook because it's perfect for experimenting.
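```python
import pandas as pd

# Load the dataset
df = pd.read_csv('stroke_data.csv')

# Quick exploration
print(df.head())
df.info()  # info() prints its own summary (and returns None)
print(df.isnull().sum())

# Handle missing values (drops every row with at least one NaN)
df = df.dropna()

# Assumption: ever_married may be stored as "Yes"/"No" text; map it to 1/0 first
if df['ever_married'].dtype == object:
    df['ever_married'] = df['ever_married'].map({'Yes': 1, 'No': 0})

# Ensure binary columns are integers
binary_cols = ['stroke', 'hypertension', 'heart_disease', 'ever_married']
for col in binary_cols:
    df[col] = df[col].astype(int)

# Create age groups for better visualization
# include_lowest=True keeps an age of exactly 0 in the first bin
df['age_group'] = pd.cut(df['age'],
                         bins=[0, 18, 35, 50, 65, 100],
                         labels=['Child', 'Young Adult', 'Middle Age', 'Senior', 'Elderly'],
                         include_lowest=True)

# Export cleaned data
df.to_csv('stroke_data_cleaned.csv', index=False)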
```
Key Cleaning Steps:
- Removed rows with missing values
- Converted the binary columns to integers
- Created age categories for analysis
- Checked column types and missing-value counts with info() and isnull()
Step 2: Into Tableau
Loading Data:
1. Open Tableau Public
2. Connect to the cleaned CSV
3. Critical: Verify data types in the Data Source tab
- Binary columns (0/1) should show as numbers (#)
- If they show as text (Abc), click the type icon and change them to Number (whole)
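It can also help to catch type problems in pandas before exporting, so the Data Source tab holds no surprises. Here is a minimal sketch. `check_binary_columns` is a hypothetical helper, not a pandas or Tableau function, and the toy DataFrame is made up for illustration:

```python
import pandas as pd


def check_binary_columns(df: pd.DataFrame, cols: list) -> list:
    """Return the columns that are not integer columns holding only 0 and 1."""
    problems = []
    for col in cols:
        is_int = pd.api.types.is_integer_dtype(df[col])
        only_0_1 = bool(df[col].isin([0, 1]).all())
        if not (is_int and only_0_1):
            problems.append(col)
    return problems


# Toy data: hypertension is stored as text, heart_disease has a stray value
df = pd.DataFrame({
    'stroke': [0, 1, 0],
    'hypertension': ['0', '1', '0'],
    'heart_disease': [0, 2, 1],
})
print(check_binary_columns(df, ['stroke', 'hypertension', 'heart_disease']))
# prints ['hypertension', 'heart_disease']
```

An empty list means every listed column is a clean 0/1 integer column and should load in Tableau as a number.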
Creating Calculated Fields:
This is where the magic happens. Dropped onto a shelf as-is, a 0/1 column is simply summed (Tableau's default aggregation), so you need calculated fields that turn it into meaningful metrics like rates.
```
// Stroke Rate (%)
AVG([Stroke]) * 100
```
Why this works: Averaging 0s and 1s gives you the proportion of 1s (stroke cases), and multiplying by 100 turns that proportion into a percentage. For example, 3 strokes among 60 patients averages to 0.05, or 5%.
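The same trick works in pandas, which makes it a handy way to sanity-check a Tableau number. This is a minimal sketch that assumes the cleaned data has a 0/1 `stroke` column and a `gender` column. The toy rows are illustrative and do not come from the dataset:

```python
import pandas as pd

# Toy rows for illustration only
df = pd.DataFrame({
    'gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'stroke': [1, 0, 0, 0, 1],
})

# Mean of a 0/1 column = share of 1s; x100 gives a percentage,
# the pandas counterpart of AVG([Stroke]) * 100
overall_rate = df['stroke'].mean() * 100

# The same calculation per group, like putting Gender on Columns in Tableau
rate_by_gender = df.groupby('gender')['stroke'].mean() * 100

print(f"Overall stroke rate: {overall_rate:.1f}%")  # Overall stroke rate: 40.0%
print(rate_by_gender)
```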
```
// Risk Score (0 to 3)
[Hypertension] + [Heart Disease] + (IF [BMI] > 30 THEN 1 ELSE 0 END)
```
Important: Wrapping the comparison in `IF [BMI] > 30 THEN 1 ELSE 0 END` is crucial because `[BMI] > 30` on its own returns TRUE/FALSE, not a number, and Tableau won't add a boolean to an integer.
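For comparison, here is the same risk score as a pandas sketch. pandas is more forgiving than Tableau and would treat True as 1 in arithmetic, but converting the comparison explicitly with `astype(int)` keeps the intent obvious. The column names `hypertension`, `heart_disease` and `bmi`, and the toy values, are assumptions for illustration:

```python
import pandas as pd

# Toy rows for illustration only
df = pd.DataFrame({
    'hypertension':  [0, 1, 1, 0],
    'heart_disease': [0, 0, 1, 1],
    'bmi':           [24.5, 31.2, 33.0, 29.9],
})

# df['bmi'] > 30 is a True/False Series; astype(int) maps it to 1/0,
# mirroring IF [BMI] > 30 THEN 1 ELSE 0 END
df['bmi_over_30'] = (df['bmi'] > 30).astype(int)

# Risk score: 0 (no factors) up to 3 (all three factors)
df['risk_score'] = df['hypertension'] + df['heart_disease'] + df['bmi_over_30']
print(df[['hypertension', 'heart_disease', 'bmi', 'risk_score']])
```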
```
// Risk Category
IF [Risk Score] >= 3 THEN "Very High Risk"
ELSEIF [Risk Score] = 2 THEN "High Risk"
ELSEIF [Risk Score] = 1 THEN "Medium Risk"
ELSE "Low Risk"
END
```
Step 3: Building Visualizations
Visualization 1: Heatmap (Hypertension × Heart Disease)
Goal: Show how these conditions interact
Setup:
Columns: Hypertension (blue pill/dimension)
Rows: Heart Disease (blue pill/dimension)
Color: AVG(Stroke) as percentage
Mark Type: Square
Size: COUNT([Stroke]) to show sample size
Rookie Mistake: Kept getting a continuous axis instead of a 2×2 grid. Solution: Right-click the pill → Convert to Discrete.
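Before you build the heatmap, you can preview the same 2×2 grid in pandas to confirm the numbers Tableau should show. This is a minimal sketch with made-up toy rows. `pivot_table` puts Heart Disease on the rows and Hypertension on the columns, matching the setup above:

```python
import pandas as pd

# Toy rows for illustration only
df = pd.DataFrame({
    'hypertension':  [0, 0, 1, 1, 0, 1, 1, 0],
    'heart_disease': [0, 1, 0, 1, 0, 0, 1, 0],
    'stroke':        [0, 0, 1, 1, 0, 0, 1, 0],
})

# Stroke rate (%) for each Hypertension x Heart Disease cell (the heatmap colour)
rate_grid = df.pivot_table(index='heart_disease', columns='hypertension',
                           values='stroke', aggfunc='mean') * 100

# Patients per cell (the heatmap size), so thin cells are easy to spot
count_grid = df.pivot_table(index='heart_disease', columns='hypertension',
                            values='stroke', aggfunc='count')

print(rate_grid)
print(count_grid)
```

The count grid matters because a striking rate in a cell with only a handful of patients may not hold up.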
Visualization 2: Risk Score Bar Chart
Setup:
Columns: Risk Category
Rows: AVG([Stroke]) * 100 (the Stroke Rate field)
Color: Risk Category
Sort: Descending by stroke rate
Label: Stroke rate percentage and patient count (COUNT([Stroke]))
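You can reproduce the data behind this chart in pandas as a cross-check. This minimal sketch assumes a `risk_score` column like the one the Risk Score field produces. `risk_category` is a hypothetical helper that copies the Risk Category logic, and the toy rows are illustrative only:

```python
import pandas as pd


def risk_category(score: int) -> str:
    """Hypothetical helper mirroring the Risk Category calculated field."""
    if score >= 3:
        return 'Very High Risk'
    if score == 2:
        return 'High Risk'
    if score == 1:
        return 'Medium Risk'
    return 'Low Risk'


# Toy rows for illustration only
df = pd.DataFrame({
    'risk_score': [0, 0, 1, 1, 2, 2, 3, 3],
    'stroke':     [0, 0, 0, 1, 1, 0, 1, 1],
})
df['risk_category'] = df['risk_score'].apply(risk_category)

# Stroke rate (%) and patient count per category, sorted like the bar chart
summary = (df.groupby('risk_category')['stroke']
             .agg(stroke_rate='mean', patients='count')
             .assign(stroke_rate=lambda t: t['stroke_rate'] * 100)
             .sort_values('stroke_rate', ascending=False))
print(summary)
```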
Other Visualizations
Following the same pattern, I created:
Age group analysis (bar chart)
Gender & marriage patterns (grouped bars)
Work type comparison (bar chart)
Urban vs rural (simple comparison)
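All four breakdowns follow one pattern: group, then average the 0/1 stroke column. Here is a minimal pandas sketch of that pattern. `stroke_rate_by` is a hypothetical helper, and the column names (`gender`, `ever_married`, `work_type`, `residence_type`) are assumptions to adjust to your CSV. An `age_group` column from `pd.cut` works the same way:

```python
import pandas as pd


def stroke_rate_by(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Stroke rate (%) and patient count for every combination of `cols`."""
    # observed=True skips empty combinations when a column is categorical (e.g. age_group)
    out = (df.groupby(cols, observed=True)['stroke']
             .agg(stroke_rate='mean', patients='count')
             .reset_index())
    out['stroke_rate'] = out['stroke_rate'] * 100
    return out


# Toy rows for illustration only; column names are assumptions
df = pd.DataFrame({
    'gender':         ['Male', 'Female', 'Female', 'Male'],
    'ever_married':   [1, 1, 0, 0],
    'work_type':      ['Private', 'Govt_job', 'Private', 'Self-employed'],
    'residence_type': ['Urban', 'Rural', 'Urban', 'Rural'],
    'stroke':         [1, 0, 0, 0],
})

print(stroke_rate_by(df, ['residence_type']))          # urban vs rural
print(stroke_rate_by(df, ['gender', 'ever_married']))  # grouped bars
print(stroke_rate_by(df, ['work_type']))               # work type comparison
```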
Step 4: Dashboard Assembly
Layout Strategy:
Top row: Medical factors (heatmap, risk score)
Middle row: Demographics (age, gender/marriage)
Bottom row: Environmental (work, location)
Design Principles:
Leave white space
Consistent color scheme
Clear titles and labels
Add context in tooltips
Key Challenges (And Solutions)
Challenge 1: "Cannot mix aggregate and non-aggregate arguments"
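Problem: I tried doing math that mixed row-level fields with aggregated ones in a single calculation, e.g. [Hypertension] + AVG([Stroke]).
Solution: Keep each calculation at one level. Either do the row-level math in its own calculated field first (as with Risk Score) and aggregate the result, or aggregate every field in the expression (e.g., SUM([Stroke]) / COUNT([Stroke])).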
Challenge 2: Green Pills vs Blue Pills
Problem: Hypertension and Heart Disease showed as green (continuous) instead of blue (discrete).
Solution: Right-click → Convert to Discrete. Or drag them from Measures to Dimensions in the data pane.
Challenge 3: Tiny Unreadable Visualizations
Problem: Charts looked fine in edit mode but were tiny on the dashboard.
Solution:
Use containers to control sizing
Set minimum dimensions
Test on different screen sizes
Challenge 4: Making It Actually Useful
The Test: For each visualization, I asked:
What question does this answer?
Can someone understand it in 5 seconds?
Does it add new information?
If I couldn't answer all three, I deleted it (even if it looked pretty).
Key Findings
1. The Double Trouble Effect
Patients with BOTH hypertension and heart disease had dramatically higher stroke rates than those with just one. It's multiplicative, not additive.
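To test a "multiplicative, not additive" reading, compare the stroke rate for patients with both conditions against what each model predicts from the single-condition rates. This is a minimal sketch in plain Python. `interaction_check` is a hypothetical helper, and the rates in the usage line are placeholders, not results from this dataset:

```python
def interaction_check(p_neither: float, p_hyp_only: float,
                      p_heart_only: float, p_both: float) -> dict:
    """Compare an observed both-conditions rate with additive and
    multiplicative expectations built from the single-condition rates.
    All rates must use the same units (e.g. percent)."""
    if p_neither <= 0:
        raise ValueError("baseline rate must be positive for the multiplicative model")
    # Additive model: each condition adds its own excess risk over the baseline
    additive = p_neither + (p_hyp_only - p_neither) + (p_heart_only - p_neither)
    # Multiplicative model: each condition multiplies the baseline risk
    multiplicative = p_neither * (p_hyp_only / p_neither) * (p_heart_only / p_neither)
    return {
        'observed': p_both,
        'additive_expected': additive,
        'multiplicative_expected': multiplicative,
        'exceeds_additive': p_both > additive,
    }


# Placeholder rates (%), not from the dataset
print(interaction_check(2.0, 8.0, 6.0, 20.0))
```

An observed rate well above the additive expectation and near the multiplicative one supports the multiplicative reading. If the both-conditions cell is small, a formal interaction test would be more reliable.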
2. Age Progression
Stroke risk increases steadily with age, but younger patients (under 50) still had strokes – it's not just an "old person problem."
3. Geographic Disparities
Urban and rural areas showed different stroke rates, likely due to healthcare access, lifestyle, or detection differences.
4. Work Type Matters
Different occupations showed varying stroke rates, possibly related to stress, activity levels, and healthcare access.
5. Risk Stratification Works
Combining multiple factors into a risk score effectively identified the most vulnerable populations.
What I Learned
Technical Skills
Pandas: Data cleaning, type conversions, creating categories
Tableau: Calculated fields, different chart types, dashboard design
Problem-solving: Reading error messages, debugging visualizations
Data Analysis Skills
Asking the right questions
Choosing appropriate visualizations
Balancing detail with clarity
Understanding aggregations and what they mean
Tools That Saved Me
Learning:
Tableau Public tutorials
Pandas documentation
YouTube for specific techniques
Development:
Jupyter Notebook for exploration
Tableau Public for visualization
ColorBrewer for color schemes
Markdown for documentation
The Nigerian Context
This isn't just practice – it matters. Nigeria's healthcare system faces:
Limited funding (well below the 2001 Abuja Declaration target of allocating 15% of government budgets to health)
High out-of-pocket payments (roughly 70-75% of health spending)
Geographic disparities in access
Underfunded primary care
Data-driven insights can help:
Target resources to high-risk groups
Guide public health campaigns
Inform policy decisions
Identify infrastructure needs
What's Next
Potential Improvements:
Add predictive modeling (machine learning)
Include temporal trends if data available
Build interactive web app for risk assessment
Expand to other cardiovascular conditions
Final Thoughts
The Secret? There isn't one. Just:
Pick a project you care about
Break it into tiny steps
Google everything
Make mistakes and learn
Repeat
Your project won't be perfect. Mine isn't either. But done beats perfect, and started beats planning forever.
Useful Code Snippets
Data Cleaning Template
```python
import pandas as pd
# Load and explore
df = pd.read_csv('data.csv')
df.info()  # prints its own summary
print(df.isnull().sum())
# Clean
df = df.dropna()
df['column'] = df['column'].astype(int)  # replace 'column' with your column name
# Export
df.to_csv('cleaned_data.csv', index=False)
```
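Tableau Calculated Fields
```
// Each formula below is a separate calculated field
// Stroke Rate
AVG([Stroke]) * 100
// Risk Score
[Hypertension] + [Heart Disease] + (IF [BMI] > 30 THEN 1 ELSE 0 END)
// Boolean Conversion (replace [Condition] with any TRUE/FALSE expression)
IF [Condition] THEN 1 ELSE 0 END
```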
Remember: Every expert was once a beginner. The only difference? They didn't give up. You've got this! 🚀
GitHub: Tracy Ouma
Tableau Dashboard: https://public.tableau.com/views/Book3_17624273398250/StrokeAnalysisinNigeria2?:language=en-US&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link
Website link: https://nigerian-stroke-insights.lovable.app
