
YOLO11 GhostConv + Knowledge Distillation + Quantization

This notebook implements a complete model optimization pipeline for YOLO11 targeting edge devices: a custom GhostConv architecture, knowledge distillation, and quantization.

🎯 Overview

This notebook implements a 3-stage YOLO11 optimization pipeline:

1. Custom Architecture (YOLO11n-GhostConv)

  • Replace Conv layers with GhostConv to reduce parameters
  • Retain C3k2 and C2PSA blocks for feature extraction
  • Architecture optimized for a traffic dataset (5 classes)

2. Knowledge Distillation (KD)

  • Teacher model: YOLO11l (large model)
  • Student model: YOLO11n-GhostConv (custom lightweight)
  • Techniques:
    • Feature-based distillation (MSE loss)
    • Logit-based distillation (KL divergence)
    • Temperature scaling (T=4.0)
    • Progressive KD with warmup epochs

3. Quantization

  • FP32 → INT8 quantization with TFLite
  • FP32 → FP16 quantization
  • Calibration dataset for INT8
  • Performance comparison: FP32 vs INT8 vs FP16

πŸ“ Notebook Structure

Section 1: Initialization

  • Mount Google Drive
  • Setup project directories
  • Import Ultralytics modules (GhostConv, C3k2, C2PSA)
  • Clone and install Ultralytics from source

Section 2: Custom Architecture

  • Define YOLO11_TinyGhost architecture in YAML
  • Backbone with GhostConv layers
  • Head with Detect layer for 5 classes
  • Train baseline model (50 epochs)

Section 3: Knowledge Distillation

Class implementations:

  • KDConfig: Configuration for KD training
  • KnowledgeDistillationTrainer: Custom trainer inheriting from DetectionTrainer
    • Forward hooks to capture intermediate features (see the sketch at the end of this section)
    • Feature distillation loss (normalized MSE)
    • Logit distillation loss (KL divergence with temperature)
    • Combined loss: (1-α-β)*L_hard + α*L_feature + β*L_logit

Training strategy:

  • Warmup phase (8 epochs): hard loss only
  • After warmup: combine hard + KD losses
  • KD layers: ["model.4", "model.6", "model.10"] (P3, P4, PSA)
  • Hyperparameters: α=0.3, β=0.2, T=4.0
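A minimal sketch of the hook mechanism, assuming a standard torch.nn.Module whose submodules are addressable by the dotted names above (the helper name and storage dict are illustrative, not the notebook's exact code):

import torch.nn as nn

def register_feature_hooks(model: nn.Module, layer_names, store: dict):
    """Attach forward hooks that record the outputs of the named submodules."""
    modules = dict(model.named_modules())
    handles = []
    for name in layer_names:
        # Bind the current name as a default argument so each hook keeps its own key
        def hook(module, inputs, output, key=name):
            store[key] = output  # activation consumed by the feature KD loss
        handles.append(modules[name].register_forward_hook(hook))
    return handles  # call handle.remove() on each when training ends

# e.g. feats = {}; register_feature_hooks(student_net, ["model.4", "model.6", "model.10"], feats)

Registering the same layer names on both teacher and student fills both stores on every forward pass, so the feature loss can be computed per batch.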

Section 4: Visualization

  • Training metrics plotting (mAP, loss curves)
  • F1 score tracking
  • Precision/Recall curves
  • Box/Class/DFL loss comparison
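These curves can be reproduced from the run's results.csv; a minimal sketch, assuming the standard Ultralytics column names (they can vary slightly between versions):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("runs/detect/train/results.csv")
df.columns = df.columns.str.strip()  # older Ultralytics versions pad column names

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(df["epoch"], df["metrics/mAP50(B)"], label="mAP50")
ax1.plot(df["epoch"], df["metrics/mAP50-95(B)"], label="mAP50-95")
ax1.set_xlabel("epoch")
ax1.legend()
ax2.plot(df["epoch"], df["train/box_loss"], label="box")
ax2.plot(df["epoch"], df["train/cls_loss"], label="cls")
ax2.plot(df["epoch"], df["train/dfl_loss"], label="dfl")
ax2.set_xlabel("epoch")
ax2.legend()
plt.tight_layout()
plt.show()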

Section 5: Fine-tuning

  • Load best KD checkpoint
  • Fine-tune on multi-view intersection dataset
  • Freeze 3 backbone layers
  • Low learning rate (1e-5) with cosine scheduler
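A sketch of this step using standard Ultralytics train arguments (checkpoint path, dataset path, and epoch count are illustrative):

from ultralytics import YOLO

model = YOLO("runs/detect/kd/weights/best.pt")  # best KD checkpoint (illustrative path)
model.train(
    data="path/to/intersection_data.yaml",  # multi-view intersection dataset
    epochs=20,          # short fine-tuning run (assumed)
    freeze=3,           # freeze the first 3 backbone layers
    lr0=1e-5,           # low initial learning rate
    cos_lr=True,        # cosine LR scheduler
    optimizer="AdamW",
    imgsz=640,
    device=0,
)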

Section 6: Quantization

Export formats:

  • INT8 TFLite (with calibration dataset)
  • FP16 TFLite

Evaluation:

  • Compare mAP50 and mAP50-95
  • FP32 vs INT8 vs FP16
  • Image size: 416x416

🔧 System Requirements

Hardware

  • GPU: CUDA-compatible (T4 or better recommended)
  • RAM: 16GB+
  • Storage: 10GB+ for datasets and models

Software

Python >= 3.8
PyTorch >= 1.13
CUDA >= 11.3
Google Colab (recommended)

📦 Installation

1. Clone Ultralytics from source

!git clone https://github.com/ultralytics/ultralytics
%cd ultralytics
!pip install -e .

2. Dependencies

pip install torch torchvision
pip install matplotlib pandas
pip install opencv-python pillow

3. Dataset structure

dataset/
├── images/
│   ├── train/
│   └── val/
├── labels/
│   ├── train/
│   └── val/
└── data.yaml
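A minimal data.yaml can be generated like this (the five class names are hypothetical placeholders; substitute your actual traffic classes):

from pathlib import Path

Path("dataset/data.yaml").write_text(
    "path: dataset\n"
    "train: images/train\n"
    "val: images/val\n"
    "nc: 5\n"
    "names: [car, truck, bus, motorcycle, bicycle]\n"  # hypothetical classes
)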

🚀 Usage Guide

Step 1: Prepare Data

PROJECT_DIR = "/content/drive/MyDrive/yolo_ghostblock"
DATASET_DIR = "/content/drive/MyDrive/dataset/yolo_mtid_motor/dataset"

Step 2: Train Baseline GhostConv Model

from ultralytics import YOLO

model = YOLO("yolo11_tinyghost.yaml")
model.train(
    data=f"{DATASET_DIR}/data.yaml",
    epochs=50,
    imgsz=640,
    device=0
)

Step 3: Knowledge Distillation

# Load teacher and student
teacher = YOLO("path/to/teacher.pt")
student = YOLO("path/to/student.pt")

# Create KD trainer
TrainerClass = create_kd_trainer_class(
    teacher_model=teacher,
    kd_alpha=0.3,
    kd_beta=0.2,
    kd_temperature=4.0,
    kd_layers=["model.4", "model.6", "model.10"]
)

# Train with KD
trainer = TrainerClass(overrides={...})
trainer.train()
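The overrides dict accepts standard Ultralytics train arguments; a plausible configuration (values taken from the hyperparameters listed further below, otherwise assumed):

trainer = TrainerClass(
    overrides={
        "model": "yolo11_tinyghost.yaml",   # or the baseline checkpoint
        "data": f"{DATASET_DIR}/data.yaml",
        "epochs": 50,
        "batch": 16,
        "imgsz": 640,
        "optimizer": "AdamW",
        "lr0": 5e-5,
        "device": 0,
    }
)
trainer.train()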

Step 4: Quantization

# Export INT8
model.export(
    format="tflite",
    int8=True,
    data=CALIB_YAML,
    imgsz=416
)

# Evaluate quantized model
model_int8 = YOLO("best_int8.tflite")
metrics = model_int8.val(data=DATA_YAML, imgsz=416)
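FP16 export follows the same pattern with half=True (the exported filename below is illustrative and may differ by Ultralytics version):

# Export FP16 TFLite
model.export(format="tflite", half=True, imgsz=416)

model_fp16 = YOLO("best_float16.tflite")  # adjust to the actual exported path
metrics_fp16 = model_fp16.val(data=DATA_YAML, imgsz=416)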

📊 Results

Model Comparison

| Model | Parameters | Size | mAP50 | mAP50-95 |
|-------|------------|------|-------|----------|
| YOLO11l (Teacher) | ~20M | ~40MB | 0.95+ | 0.80+ |
| YOLO11n-Ghost | ~2M | ~4MB | 0.92+ | 0.75+ |
| + KD | ~2M | ~4MB | 0.94+ | 0.78+ |
| + INT8 | ~2M | ~1MB | 0.93+ | 0.76+ |

Quantization Impact

  • FP32 → INT8: ~75% size reduction, ~1-2% mAP drop
  • FP32 → FP16: ~50% size reduction, ~0.5% mAP drop

Training Curves

  • Box Loss: converges after ~30 epochs
  • mAP50: reaches plateau ~35-40 epochs
  • F1 Score: 0.85-0.90 range

📖 Technical Details

GhostConv Architecture

backbone:
  - [-1, 1, GhostConv, [64, 3, 2]]
  - [-1, 1, GhostConv, [128, 3, 2]]
  - [-1, 1, C3k2, [256, False, 0.25]]
  ...

KD Loss Formula

L_total = (1 - α - β) * L_hard + α * L_feature + β * L_logit

L_feature = MSE(normalize(F_s), normalize(F_t))
L_logit = KL(softmax(z_s / T), softmax(z_t / T)) * T²

where F_s/F_t are student/teacher intermediate features, z_s/z_t are the corresponding logits, and T is the temperature.
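In PyTorch, the two distillation terms look roughly like this (a sketch assuming student/teacher features and logits of matching shape):

import torch
import torch.nn.functional as F

def feature_kd_loss(s_feat: torch.Tensor, t_feat: torch.Tensor) -> torch.Tensor:
    # Normalized MSE: L2-normalize per sample so activation scale doesn't dominate
    s = F.normalize(s_feat.flatten(1), dim=1)
    t = F.normalize(t_feat.flatten(1), dim=1)
    return F.mse_loss(s, t)

def logit_kd_loss(z_s: torch.Tensor, z_t: torch.Tensor, T: float = 4.0) -> torch.Tensor:
    # KL divergence between temperature-softened distributions, scaled by T^2
    # (F.kl_div expects log-probabilities as input, probabilities as target)
    log_p_s = F.log_softmax(z_s / T, dim=-1)
    p_t = F.softmax(z_t / T, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T**2

# L_total = (1 - alpha - beta) * hard_loss + alpha * feature_kd_loss(...) + beta * logit_kd_loss(...)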

Quantization Config

  • INT8: Post-training quantization with calibration
  • Calibration: 100-200 images from training set
  • Input: uint8 [0, 255] or float32 normalized

βš™οΈ Hyperparameters

Training

  • Epochs: 40-50
  • Batch size: 16
  • Image size: 640x640
  • Learning rate: 5e-5 (baseline), 1e-5 (fine-tune)
  • Optimizer: AdamW with cosine scheduler

Knowledge Distillation

  • α (feature): 0.3
  • β (logit): 0.2
  • Temperature: 4.0
  • Warmup epochs: 8
  • KD layers: P3, P4, PSA output

Quantization

  • Format: TFLite
  • Input size: 416x416 (edge deployment)
  • Calibration samples: 100

πŸ› Troubleshooting

Issue 1: CUDA Out of Memory

# Reduce batch size and enable mixed precision in the train call
model.train(
    data=f"{DATASET_DIR}/data.yaml",
    batch=8,     # down from 16
    amp=True,    # automatic mixed precision
)

Issue 2: Feature Shape Mismatch in KD

  • Check teacher and student architecture compatibility
  • Verify KD layer names match between models
  • Ensure input sizes are consistent

Issue 3: INT8 Quantization Accuracy Drop

  • Increase number of calibration samples
  • Use representative dataset (diverse conditions)
  • Consider QAT (Quantization-Aware Training)

🎯 Key Features

Architecture Optimization

  • GhostConv: Reduces FLOPs by ~50% compared to standard convolutions
  • Lightweight backbone: Maintains accuracy while reducing parameters
  • Flexible head: Supports multiple detection tasks

Knowledge Distillation

  • Multi-level distillation: Combines feature and logit knowledge transfer
  • Temperature-scaled softmax: Smooths probability distributions
  • Progressive training: Warmup phase for stable convergence

Model Compression

  • INT8 quantization: 4x memory reduction
  • FP16 quantization: 2x memory reduction
  • Edge-ready: Optimized for mobile/embedded deployment

💡 Best Practices

Training

  1. Start with pre-trained weights when possible
  2. Use data augmentation (mosaic, mixup, etc.)
  3. Monitor validation metrics closely
  4. Apply early stopping (patience=10-15)

Knowledge Distillation

  1. Ensure the teacher model is well-trained (e.g., mAP50 > 0.90)
  2. Match batch normalization statistics
  3. Use appropriate temperature (T=3-5 for object detection)
  4. Gradually increase KD loss weight

Quantization

  1. Use diverse calibration dataset
  2. Test on representative test set
  3. Profile inference speed on target device
  4. Consider hybrid quantization (some layers FP32)

📈 Performance Metrics

Speed Benchmarks

| Model | FP32 (ms) | FP16 (ms) | INT8 (ms) | Device |
|-------|-----------|-----------|-----------|--------|
| YOLO11l | 45 | 28 | N/A | T4 GPU |
| YOLO11n-Ghost | 12 | 8 | N/A | T4 GPU |
| INT8 TFLite | N/A | N/A | 25 | Edge TPU |

Accuracy vs Efficiency

  • YOLO11l: Highest accuracy, largest model
  • YOLO11n-Ghost: Best accuracy/size trade-off
  • + KD: Closes gap with teacher
  • + INT8: Minimal accuracy loss, deployable

🔄 Workflow Summary

graph LR
A[YOLO11l Teacher] --> B[Design GhostConv Student]
B --> C[Train Baseline]
C --> D[Knowledge Distillation]
D --> E[Fine-tune]
E --> F[Quantize INT8/FP16]
F --> G[Deploy to Edge]

🚀 Deployment

TFLite Conversion

# Export to TFLite INT8
model.export(
    format="tflite",
    int8=True,
    data="calibration.yaml",
    imgsz=416
)

Inference Example

import numpy as np
import tensorflow as tf
from PIL import Image

# Load TFLite model
interpreter = tf.lite.Interpreter(model_path="best_int8.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Preprocess image (uint8 input for the INT8 model)
img = Image.open("test.jpg").resize((416, 416))
input_data = np.expand_dims(np.array(img, dtype=np.uint8), axis=0)  # (1, 416, 416, 3)

# Run inference
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]['index'])
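Note that the raw tensor returned by the interpreter still needs YOLO post-processing (box decoding, confidence filtering, NMS). For quick checks, loading the .tflite file directly with YOLO("best_int8.tflite") as in Step 4 handles pre- and post-processing automatically.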

👥 Contributing

Contributions are welcome! Areas for improvement:

  • Additional distillation techniques (attention transfer, etc.)
  • QAT implementation
  • More lightweight architectures
  • Deployment examples for different platforms

📄 License

This notebook follows the Ultralytics AGPL-3.0 License.

πŸ™ Acknowledgments

  • Ultralytics for YOLO11 framework
  • GhostNet for efficient convolution design
  • Google Colab for compute resources

Note: This notebook is designed to run on Google Colab with GPU runtime. Adjust paths and configurations for local environments as needed.

Last Updated: January 2026
Version: v11
Compatibility: Ultralytics 8.0+
