Stroke Risk Analysis in Nigeria: A Beginner's Data Journey

Introduction

Ever looked at a dataset and wondered, "What story is hiding in these numbers?" That's exactly what happened when I found stroke data from Nigeria on Hugging Face. As a data analytics beginner, I wanted to tackle something real – understanding what puts people at risk for strokes.

This article walks through how I went from a messy CSV to an interactive Tableau dashboard, the challenges I faced, and what I learned along the way.

The Project Goal

Build an interactive dashboard analyzing stroke risk factors in Nigerian patients, focusing on:

Medical conditions (hypertension, heart disease)
Demographics (age, gender, location)
Lifestyle factors (work type, smoking, marital status)

The Result: Six visualizations that tell a compelling story about stroke risk.

The Dataset

Source: Hugging Face
Records: Several thousand Nigerian patients
Columns:

Demographics: Age, Gender, Residence (Urban/Rural), Work Type
Medical: Hypertension, Heart Disease, Stroke (all binary 0/1)
Health Metrics: BMI, Average Glucose Level
Lifestyle: Smoking Status, Ever Married

The Challenge: Binary columns stored as 0/1 needed special handling for meaningful visualizations.

The Workflow

Step 1: Data Cleaning with Pandas

I started in a Jupyter Notebook because it's well suited to experimenting.

```python
import pandas as pd

# Load the dataset
df = pd.read_csv('stroke_data.csv')

# Quick exploration
print(df.head())
print(df.info())
print(df.isnull().sum())

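# Handle missing values (dropping rows is the simplest option; check how many you lose)
df = df.dropna()

# Ensure binary columns are integers
binary_cols = ['stroke', 'hypertension', 'heart_disease', 'ever_married']
# If ever_married arrives as text ("Yes"/"No"), map it to 1/0 first;
# astype(int) alone would raise a ValueError on strings
if not pd.api.types.is_numeric_dtype(df['ever_married']):
    df['ever_married'] = df['ever_married'].map({'Yes': 1, 'No': 0})
for col in binary_cols:
    df[col] = df[col].astype(int)

# Create age groups for better visualization
# include_lowest=True keeps an age of exactly 0 in the first bin
df['age_group'] = pd.cut(df['age'],
                         bins=[0, 18, 35, 50, 65, 100],
                         labels=['Child', 'Young Adult', 'Middle Age', 'Senior', 'Elderly'],
                         include_lowest=True)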

# Export cleaned data
df.to_csv('stroke_data_cleaned.csv', index=False)
```

**Key Cleaning Steps:**
- Removed rows with missing values
- Converted text columns to appropriate types
- Created age categories for analysis
- Verified data integrity
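
"Verified data integrity" can mean many things. As a rough sketch of what such a check might look like (the `check_integrity` helper and the tiny sample frame are hypothetical, not taken from the original notebook), the function below reports leftover missing values, binary columns holding anything other than 0/1, and ages that fell outside the age bins:

```python
import pandas as pd

def check_integrity(df: pd.DataFrame, binary_cols: list[str]) -> list[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # No missing values should remain after dropna()
    n_missing = int(df.isnull().sum().sum())
    if n_missing:
        problems.append(f"{n_missing} missing values remain")
    # Binary columns should contain only 0 and 1
    for col in binary_cols:
        bad = set(df[col].unique().tolist()) - {0, 1}
        if bad:
            problems.append(f"{col} has unexpected values: {sorted(bad, key=str)}")
    # Every row should have landed in an age group (if that column exists)
    if "age_group" in df.columns and df["age_group"].isnull().any():
        problems.append("some ages fell outside the age_group bins")
    return problems

# Made-up rows for illustration only (not real patient data)
sample = pd.DataFrame({
    "stroke":        [0, 1, 0],
    "hypertension":  [0, 1, 2],   # 2 is deliberately invalid
    "heart_disease": [0, 0, 1],
})
print(check_integrity(sample, ["stroke", "hypertension", "heart_disease"]))
```

Running something like this just before the export step makes it harder for a stray value to reach Tableau unnoticed.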

### Step 2: Into Tableau

**Loading Data:**
1. Open Tableau Public
2. Connect to the cleaned CSV
3. **Critical:** Verify data types in Data Source tab
   - Binary columns (0/1) should show as numbers (#)
   - If they're text (Abc), convert them

**Creating Calculated Fields:**

This is where the magic happens. You can't just drag 0/1 columns around – you need to calculate meaningful metrics.
```
Stroke Rate (%):
AVG([Stroke]) * 100
```

Why this works: Averaging 0s and 1s gives you the proportion of 1s (stroke cases), then multiply by 100 for percentage.
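
The same arithmetic can be checked in pandas. This is a small sketch with made-up values (2 strokes among 8 patients), showing that the mean of a 0/1 column equals cases divided by patients:

```python
import pandas as pd

# Made-up outcomes for illustration: 2 strokes among 8 patients
stroke = pd.Series([0, 0, 1, 0, 0, 1, 0, 0])

proportion = stroke.mean()                  # average of the 0s and 1s
same_thing = stroke.sum() / stroke.count()  # stroke cases divided by patients
stroke_rate_pct = proportion * 100

print(stroke_rate_pct)  # 25.0
```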
```
Risk Score:
[Hypertension] + [Heart Disease] + IF [BMI] > 30 THEN 1 ELSE 0 END
```

**Important:** The `IF [BMI] > 30 THEN 1 ELSE 0 END` is crucial because `[BMI] > 30` returns TRUE/FALSE, not a number. You can't add TRUE to integers.
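
For comparison, here is a sketch of the same Risk Score in pandas, using made-up rows and assuming lowercase column names like those in the cleaning step. The explicit `astype(int)` plays the role of `IF ... THEN 1 ELSE 0 END`:

```python
import pandas as pd

# Made-up rows for illustration only
df = pd.DataFrame({
    "hypertension":  [0, 1, 1, 0],
    "heart_disease": [0, 0, 1, 0],
    "bmi":           [22.5, 31.0, 35.2, 30.0],
})

# (df["bmi"] > 30) is a boolean Series; astype(int) turns True/False into 1/0
obese_flag = (df["bmi"] > 30).astype(int)
df["risk_score"] = df["hypertension"] + df["heart_disease"] + obese_flag

print(df["risk_score"].tolist())  # [0, 2, 3, 0]
```

pandas would tolerate adding the boolean directly, since it treats True as 1. The cast is kept here only to mirror the Tableau formula. Note that a BMI of exactly 30 does not count, because the comparison is strict.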
```
Risk Category:
IF [Risk Score] >= 3 THEN "Very High Risk"
ELSEIF [Risk Score] = 2 THEN "High Risk"
ELSEIF [Risk Score] = 1 THEN "Medium Risk"
ELSE "Low Risk"
END
```

Step 3: Building Visualizations

Visualization 1: Heatmap (Hypertension × Heart Disease)

Goal: Show how these conditions interact

Setup:

Columns: Hypertension (blue pill/dimension)
Rows: Heart Disease (blue pill/dimension)
Color: AVG([Stroke]) as percentage
Mark Type: Square
Size: COUNT([Stroke]) for sample size

Rookie Mistake: Kept getting a continuous axis instead of a 2×2 grid. Solution: Right-click the pill → Convert to Discrete.
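
A quick way to sanity-check the heatmap numbers outside Tableau is to build the same 2×2 grid in pandas. The sketch below uses made-up rows and assumed lowercase column names. It produces one table of stroke rates and one of patient counts, matching the Color and Size shelves:

```python
import pandas as pd

# Made-up rows for illustration only (not the real dataset)
df = pd.DataFrame({
    "hypertension":  [0, 0, 0, 0, 1, 1, 0, 0, 1, 1],
    "heart_disease": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "stroke":        [0, 0, 0, 1, 0, 1, 0, 1, 1, 1],
})

# One row per (heart_disease, hypertension) cell, then pivot to a grid
cells = df.groupby(["heart_disease", "hypertension"])["stroke"].agg(["mean", "count"])
rate_pct = (cells["mean"] * 100).unstack("hypertension")  # rows: heart_disease
counts = cells["count"].unstack("hypertension")           # sample size per cell

print(rate_pct)
print(counts)
```

Looking at the counts next to the rates matters: a cell with only a handful of patients can show a dramatic rate that means very little.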

Visualization 2: Risk Score Bar Chart

Setup:

Columns: Risk Category
Rows: Stroke Rate (%), i.e. AVG([Stroke]) * 100
Color: Risk Category
Sort: Descending by stroke rate
Label: Show percentage and COUNT([Stroke])
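
The numbers behind this chart can be reproduced in pandas as a cross-check. This is a sketch with made-up rows. It assumes a `risk_score` column already exists and applies the same thresholds as the Risk Category field:

```python
import pandas as pd

# Made-up rows for illustration only
df = pd.DataFrame({
    "risk_score": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3],
    "stroke":     [0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1],
})

# Same thresholds as the Risk Category calculated field (3 or more is the top tier)
labels = {0: "Low Risk", 1: "Medium Risk", 2: "High Risk", 3: "Very High Risk"}
df["risk_category"] = df["risk_score"].clip(upper=3).map(labels)

summary = df.groupby("risk_category").agg(
    stroke_rate_pct=("stroke", "mean"),
    patients=("stroke", "count"),
)
summary["stroke_rate_pct"] = summary["stroke_rate_pct"] * 100
summary = summary.sort_values("stroke_rate_pct", ascending=False)

print(summary)
```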

Other Visualizations

Following the same pattern, I created:

Age group analysis (bar chart)
Gender & marriage patterns (grouped bars)
Work type comparison (bar chart)
Urban vs rural (simple comparison)
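
Since these charts all follow one pattern (group by a column, then take the stroke rate and the patient count), the pattern fits in one small helper. This is only a sketch: `stroke_rate_by` is a hypothetical function, and the column names `residence` and `work_type` and their values are assumptions, not the dataset's confirmed headers:

```python
import pandas as pd

def stroke_rate_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Stroke rate (%) and patient count for each value of `column`."""
    out = df.groupby(column, observed=True)["stroke"].agg(["mean", "count"])
    out["mean"] = out["mean"] * 100
    return out.rename(columns={"mean": "stroke_rate_pct", "count": "patients"})

# Made-up rows for illustration only; column names and values are assumed
df = pd.DataFrame({
    "residence": ["Urban", "Urban", "Rural", "Rural"],
    "work_type": ["Private", "Self-employed", "Private", "Private"],
    "stroke":    [1, 0, 0, 0],
})

for col in ["residence", "work_type"]:
    print(stroke_rate_by(df, col), end="\n\n")
```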

Step 4: Dashboard Assembly

Layout Strategy:

Top row: Medical factors (heatmap, risk score)
Middle row: Demographics (age, gender/marriage)
Bottom row: Environmental (work, location)

Design Principles:

Leave white space
Consistent color scheme
Clear titles and labels
Add context in tooltips

Key Challenges (And Solutions)

Challenge 1: "Cannot mix aggregate and non-aggregate arguments"

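Problem: I mixed aggregated values (for example AVG([Stroke])) with row-level fields (for example [BMI]) in the same calculation. Tableau requires every term in an expression to be at the same level.

Solution: Keep each calculated field at one level. Either stay row-level throughout (like Risk Score) or aggregate throughout (like Stroke Rate), wrapping any row-level field in AVG(), SUM() or similar when it sits next to an aggregate.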
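
pandas does not raise this error, because it quietly broadcasts single values across rows. The distinction between row-level and aggregated values still exists there, though. The sketch below uses made-up numbers to show why the level matters: aggregating a row-level flag is not the same as flagging an aggregate.

```python
import pandas as pd

# Made-up rows for illustration only
df = pd.DataFrame({
    "stroke": [0, 1, 1, 0],
    "bmi":    [22.0, 31.0, 35.0, 40.0],
})

# Row-level: one value per patient
obese = (df["bmi"] > 30).astype(int)          # [0, 1, 1, 1]

# Aggregated: one value for the whole group
stroke_rate = df["stroke"].mean()             # 0.5
obese_share = obese.mean()                    # 0.75, share of patients with BMI > 30

# Flagging the aggregate answers a different question: is the average BMI > 30?
flag_of_average = int(df["bmi"].mean() > 30)  # average BMI is 32.0, so 1

print(obese_share, flag_of_average)
```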

Challenge 2: Green Pills vs Blue Pills

Problem: Hypertension and Heart Disease showed as green (continuous) instead of blue (discrete).

Solution: Right-click → Convert to Discrete. Or drag them from Measures to Dimensions in the data pane.

Challenge 3: Tiny Unreadable Visualizations

Problem: Charts looked fine in edit mode but tiny in dashboard.

Solution:

Use containers to control sizing
Set minimum dimensions
Test on different screen sizes

Challenge 4: Making It Actually Useful

The Test: For each visualization, I asked:

What question does this answer?
Can someone understand it in 5 seconds?
Does it add new information?

If I couldn't answer all three, I deleted it (even if it looked pretty).

Key Findings

1. The Double Trouble Effect

Patients with both hypertension and heart disease had dramatically higher stroke rates than those with just one. The pattern looks multiplicative rather than additive.
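
One way to put numbers on "multiplicative versus additive" is to compare the observed rate for patients with both conditions against what each model would predict from the other three cells of the heatmap. The sketch below uses hypothetical rates, not figures from the dataset, and `interaction_check` is an illustrative helper:

```python
def interaction_check(r00: float, r10: float, r01: float, r11: float) -> dict:
    """Compare the observed both-conditions rate with two simple expectations.

    r00: neither condition, r10: hypertension only,
    r01: heart disease only, r11: both conditions.
    """
    if r00 <= 0:
        raise ValueError("baseline rate r00 must be positive")
    return {
        "observed": r11,
        # Additive model: the two excess risks simply add up
        "additive_expected": r10 + r01 - r00,
        # Multiplicative model: the two relative risks multiply
        "multiplicative_expected": r10 * r01 / r00,
    }

# Hypothetical stroke rates, for illustration only
result = interaction_check(r00=0.02, r10=0.04, r01=0.06, r11=0.12)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

If the observed rate sits near the multiplicative expectation and well above the additive one, the description fits. With small cells the comparison is noisy, so it is a rough guide rather than a test.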

2. Age Progression

Stroke risk increases steadily with age, but younger patients (under 50) still had strokes – it's not just an "old person problem."

3. Geographic Disparities

Urban and rural areas showed different stroke rates, likely due to healthcare access, lifestyle, or detection differences.

4. Work Type Matters

Different occupations showed varying stroke rates, possibly related to stress, activity levels, and healthcare access.

5. Risk Stratification Works

Combining multiple factors into a risk score effectively identified the most vulnerable populations.

What I Learned

Technical Skills

Pandas: Data cleaning, type conversions, creating categories
Tableau: Calculated fields, different chart types, dashboard design
Problem-solving: Reading error messages, debugging visualizations

Data Analysis Skills

Asking the right questions
Choosing appropriate visualizations
Balancing detail with clarity
Understanding aggregations and what they mean

Tools That Saved Me

Learning:

Tableau Public tutorials
Pandas documentation
YouTube for specific techniques

Development:

Jupyter Notebook for exploration
Tableau Public for visualization
ColorBrewer for color schemes
Markdown for documentation

The Nigerian Context

This isn't just practice – it matters. Nigeria's healthcare system faces:

Limited funding (below international benchmarks)
High out-of-pocket payments (roughly 70-75% of health spending)
Geographic disparities in access
Underfunded primary care

Data-driven insights can help:

Target resources to high-risk groups
Guide public health campaigns
Inform policy decisions
Identify infrastructure needs

What's Next

Potential Improvements:

Add predictive modeling (machine learning)
Include temporal trends if data available
Build interactive web app for risk assessment
Expand to other cardiovascular conditions

Final Thoughts

The Secret? There isn't one. Just:

Pick a project you care about
Break it into tiny steps
Google everything
Make mistakes and learn
Repeat

Your project won't be perfect. Mine isn't either. But done beats perfect, and started beats planning forever.

Useful Code Snippets

Data Cleaning Template

```python
import pandas as pd

# Load and explore
df = pd.read_csv('data.csv')
print(df.info())
print(df.isnull().sum())

# Clean
df = df.dropna()
df['column'] = df['column'].astype(int)

# Export
df.to_csv('cleaned_data.csv', index=False)
```

### Tableau Calculated Fields
```
// Stroke Rate
AVG([Stroke]) * 100

// Risk Score
[Hypertension] + [Heart Disease] + IF [BMI] > 30 THEN 1 ELSE 0 END

// Boolean Conversion
IF [Condition] THEN 1 ELSE 0 END
```

Remember: Every expert was once a beginner. The only difference? They didn't give up. You've got this! 🚀

GitHub: Tracy Ouma

Tableau Dashboard: https://public.tableau.com/views/Book3_17624273398250/StrokeAnalysisinNigeria2?:language=en-US&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link

Website link: https://nigerian-stroke-insights.lovable.app
