End-to-End Emotion Detection: Data Processing, Modeling & Real-Time Deployment
🎭 Building a Robust Real-Time Emotion Detection System Using Ensemble Learning
Human emotion recognition has emerged as a powerful tool in modern AI applications—ranging from digital well-being solutions to marketing analytics and interactive systems. In this project, I built a Real-Time Emotion Detection System that uses a camera feed or an image URL to classify a person’s facial expression into one of several emotion categories.
The complete project — including code, models, pipelines, and demo — is available on GitHub:
👉 https://github.com/KToppo/Emotion-Detection-ML
📂 Project Structure
Here is the complete directory structure:
├── models/
│ ├── labels_1.pkl
│ ├── labels_2.pkl
│ ├── labels_3.pkl
│ ├── M1SMOTE_boost.png
│ ├── M1SMOTE_Clf.png
│ ├── M2ENN_boost.png
│ ├── M2ENN_Clf.png
│ ├── M3SMOTE_clf-NE.png
│ ├── model_1.pkl
│ ├── model_2.pkl
│ ├── model_3.pkl
│ ├── model-boost_1.pkl
│ ├── model-boost_2.pkl
│ ├── pipline_1.pkl
│ ├── pipline_2.pkl
│ ├── pipline_3.pkl
├── Model_Building.ipynb
├── Model_Testing.ipynb
├── image-to-vector.py
├── kaggle_handler.py
├── web-app.py
├── haarcascade_frontalface_default.xml
Purpose of Each File
| File / Folder | Description |
|---|---|
| models/ | Stores all trained models, label encoders, pipelines, and model performance images. |
| image-to-vector.py | Converts raw images into grayscale 48×48 face vectors and builds data.csv. |
| kaggle_handler.py | Downloads datasets using kagglehub and organizes them into folders. |
| Model_Building.ipynb | Main notebook containing preprocessing, sampling, model training, and saving models. |
| Model_Testing.ipynb | Used to test predictions from all models and evaluate ensemble performance. |
| web-app.py | Streamlit application for webcam-based and URL-based emotion prediction. |
| haarcascade_frontalface_default.xml | Pretrained OpenCV cascade model for face detection. |
🧱 Step 1: Dataset Preparation
Face Extraction & Vectorization
The file image-to-vector.py handles:
✔ Image loading
✔ Face detection using Haar Cascade
✔ Cropping & resizing to 48×48
✔ Flattening into 2304-pixel vectors
✔ Writing batches into data.csv
Even large datasets are handled efficiently using batch processing.
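For a sense of the core routine, here is a hedged sketch (the function name and detector parameters below are illustrative, not the repo's exact code):

```python
import cv2

# Pretrained Haar cascade shipped with the repo
cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

def image_to_vector(path):
    """Return a flattened 48x48 grayscale face vector, or None if no face is found."""
    img = cv2.imread(path)
    if img is None:
        return None
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]                           # take the first detected face
    face = cv2.resize(gray[y:y+h, x:x+w], (48, 48))
    return face.flatten()                           # 48 * 48 = 2304 pixel values
```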
⚙️ Step 2: Model Building & Experiments
All experiments were executed inside Model_Building.ipynb.
📌 Preprocessing Flow
Remove duplicates
Split features (X) and labels (y)
Scale images using MinMaxScaler
Reduce dimensionality with PCA (400 components)
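A minimal sketch of this flow with scikit-learn (the label column name `emotion` is an assumption; the repo's data.csv may name it differently):

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA

df = pd.read_csv("data.csv").drop_duplicates()      # remove duplicate rows
X, y = df.drop(columns=["emotion"]), df["emotion"]  # label column name assumed

preprocess = Pipeline([
    ("scale", MinMaxScaler()),       # map raw pixel values into [0, 1]
    ("pca", PCA(n_components=400)),  # 2304 -> 400 dimensions
])
X_reduced = preprocess.fit_transform(X)
```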
🧪 Step 3: Sampling Techniques & Their Impact
A major challenge was imbalanced emotion data. To handle this, several experiments were conducted.
🔹 Experiment 1: SMOTE + class_weight='balanced'
Models trained:
XGBoost
Stacking Classifier
Output classification reports saved as:
M1SMOTE_boost.png
M1SMOTE_Clf.png
Observation:
SMOTE increased minority-class recall but also introduced synthetic noise, which hurt precision.
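For reference, a hedged sketch of this setup with imbalanced-learn (the base estimators and hyperparameters are placeholders, not the repo's exact choices):

```python
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

# Oversample minority emotions with synthetic, neighbor-interpolated samples
X_res, y_res = SMOTE(random_state=42).fit_resample(X_reduced, y)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(class_weight="balanced"))],
    final_estimator=LogisticRegression(class_weight="balanced", max_iter=1000),
)
stack.fit(X_res, y_res)
```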
🔹 Experiment 2: SMOTEENN + class_weight='balanced'
SMOTEENN combines oversampling + cleaning using ENN.
Reports stored as:
M2ENN_boost.png
M2ENN_Clf.png
Observation:
Better than pure SMOTE, but class weights occasionally over-penalized majority classes.
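The resampling swap itself is a one-liner with imbalanced-learn:

```python
from imblearn.combine import SMOTEENN

# SMOTE oversampling followed by Edited Nearest Neighbours cleaning,
# which drops samples that disagree with their local neighborhood
X_res, y_res = SMOTEENN(random_state=42).fit_resample(X_reduced, y)
```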
🔹 Experiment 3: SMOTEENN + No Class Weight
This configuration was tested only with the StackingClassifier.
Report saved as:
M3SMOTE_clf-NE.png
Observation:
This model showed the best balance between precision and recall across all emotions.
🏆 Step 4: Final Decision — Ensemble Voting
After analyzing all classification results, instead of selecting a single “best” model, I combined five models:
✔ model_1 (Stacking)
✔ model_2 (Stacking)
✔ model_3 (Stacking - Best version)
✔ model_boost_1 (XGBoost)
✔ model_boost_2 (XGBoost)
Each model includes its own pipeline (scaling + PCA) and label encoder.
I implemented a majority voting mechanism:
```python
import pandas as pd

def combinepredic(img, models=models):
    """Majority vote across the five models; img is a (1, 2304) pixel row."""
    pred = []
    for model, pipline in models.items():
        X = pipline[0].transform(img)              # apply scaling + PCA pipeline
        emotion = model.predict(X)                 # encoded class index
        pred.append(pipline[1].inverse_transform(emotion)[0])  # decode label
    return pd.Series(pred).mode()[0]               # most frequent prediction wins
```
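For context, the `models` dict this function expects could be assembled from the files in models/ along these lines (which pipeline and encoder each boosted model shares is my assumption, not taken from the repo):

```python
import pickle

def load(name):
    with open(f"models/{name}.pkl", "rb") as f:
        return pickle.load(f)

def load_models():
    # Each entry maps a model to its (pipeline, label_encoder) pair
    return {
        load("model_1"): (load("pipline_1"), load("labels_1")),
        load("model_2"): (load("pipline_2"), load("labels_2")),
        load("model_3"): (load("pipline_3"), load("labels_3")),
        # pipeline/encoder pairing for the boosted models assumed here
        load("model-boost_1"): (load("pipline_1"), load("labels_1")),
        load("model-boost_2"): (load("pipline_2"), load("labels_2")),
    }

models = load_models()
```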
📌 Result: Improved F1-score and recall across nearly all emotion classes.
This final classification summary is stored in:
final_model.png
📸 Step 5: Real-Time Emotion Detection Web App
The full working application, including Streamlit code, can be explored here:
🔗 GitHub: https://github.com/KToppo/Emotion-Detection-ML
The web-app.py uses Streamlit + WebRTC for:
1. Webcam Emotion Detection
Captures frames
Detects face
Runs prediction using ensemble voting
Updates every 2 seconds
Displays emotion overlay on video
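A minimal sketch of the webcam path, assuming a recent streamlit-webrtc and PyAV; `cascade` and `combinepredic` come from the snippets above, and the repo's 2-second update throttling is omitted here:

```python
import av
import cv2
from streamlit_webrtc import webrtc_streamer

def video_frame_callback(frame: av.VideoFrame) -> av.VideoFrame:
    img = frame.to_ndarray(format="bgr24")          # current webcam frame
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
        face = cv2.resize(gray[y:y+h, x:x+w], (48, 48)).flatten().reshape(1, -1)
        emotion = combinepredic(face)               # ensemble majority vote
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(img, str(emotion), (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
    return av.VideoFrame.from_ndarray(img, format="bgr24")

webrtc_streamer(key="emotion", video_frame_callback=video_frame_callback)
```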
2. Image URL Emotion Detection
Load image from URL
Detect face → preprocess → predict
Output final predicted emotion
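And a sketch of the URL path (`predict_from_url` is illustrative; requests plus the earlier `cascade` and `combinepredic` are assumed):

```python
import cv2
import numpy as np
import requests

def predict_from_url(url):
    data = np.frombuffer(requests.get(url, timeout=10).content, dtype=np.uint8)
    img = cv2.imdecode(data, cv2.IMREAD_GRAYSCALE)  # decode bytes to grayscale
    faces = cascade.detectMultiScale(img, 1.1, 5)
    if len(faces) == 0:
        return "No face detected"
    x, y, w, h = faces[0]
    face = cv2.resize(img[y:y+h, x:x+w], (48, 48)).flatten().reshape(1, -1)
    return combinepredic(face)
```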
🧠 Key Learnings & Improvements
1️⃣ Importance of Data Balancing Techniques
SMOTE introduced synthetic noise, while SMOTEENN cleaned incorrectly generated samples.
Final learning: SMOTEENN (no class weights) gave the most stable performance.
2️⃣ PCA dimensionality reduction is essential
Without PCA, the models suffered from:
High training time
Overfitting
Poor generalization
Reducing to 400 PCA components preserved >95% variance.
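This is straightforward to check on a fitted pipeline like the `preprocess` object sketched in Step 2:

```python
# Sum of per-component explained variance ratios for the 400 kept components
kept = preprocess.named_steps["pca"].explained_variance_ratio_.sum()
print(f"Variance retained: {kept:.2%}")
```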
3️⃣ Ensemble > Individual Models
No single model performed best on all metrics.
Using a voting ensemble of 5 independent models gave the most reliable outcome.
4️⃣ Designing for Deployment
To ensure smooth deployment:
Pipelines were saved with models
Label encoders saved separately
Streamlit caching improved performance
WebRTC allowed real-time video inference
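On the caching point: wrapping the pickle loading in `st.cache_resource` makes Streamlit load the models once per process instead of on every rerun (reusing the hypothetical `load_models` from Step 4):

```python
import streamlit as st

@st.cache_resource  # executed once per process; reruns reuse the cached result
def load_ensemble():
    return load_models()  # the pickle-loading helper sketched in Step 4

models = load_ensemble()
```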
🚀 How to Run This Project
1. Install dependencies
pip install -r requirements.txt
2. Ensure the trained models are in the models/ folder
3. Run the Streamlit app
streamlit run web-app.py
4. Use either:
Webcam Mode
Image URL Mode
📌 Final Thoughts
This project helped me understand:
✔ How sampling techniques influence model fairness
✔ How PCA & pipelines help maintain reproducibility
✔ How ensemble learning significantly boosts robustness
✔ How to integrate ML models into a real-time application
✔ Complete ML lifecycle — from dataset creation to deployment
If you're looking to build accurate emotion recognition systems, combining clean preprocessing, class-imbalance handling, and multiple models is more effective than choosing a single classifier.
