A fine-tuned DistilBERT model for multi-class topic classification. This model predicts the most relevant topic label from a predefined set based on input text. It was trained using 🤗 Transformers and PyTorch on a custom dataset derived from academic and news-style corpora.
This model was developed by Daniel (@AfroLogicInsect) to classify text into one of several predefined topics. It builds on the distilbert-base-uncased architecture and was fine-tuned for multi-class classification using a softmax output layer.
Use return_all_scores=True to retrieve scores for every topic and keep the top 5 predictions (newer versions of 🤗 Transformers replace this flag with top_k):

from transformers import pipeline

# Load the fine-tuned classifier and its tokenizer from the Hub
classifier = pipeline(
    "text-classification",
    model="AfroLogicInsect/topic-model-analysis-model",
    tokenizer="AfroLogicInsect/topic-model-analysis-model",
    return_all_scores=True,
)

text = "New AI breakthrough in natural language processing"
results = classifier(text)

# results[0] holds one {label, score} dict per topic; sort and keep the five best
top_5 = sorted(results[0], key=lambda x: x["score"], reverse=True)[:5]
for i, res in enumerate(top_5):
    print(f"Top {i+1}: {res['label']} ({res['score']:.3f})")
Training used the Hugging Face Trainer API, with TrainingArguments configured for early stopping and best-model selection, as sketched below.
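A minimal sketch of that setup. The hyperparameter values (epochs, batch size, patience), the number of labels, and the dataset objects are illustrative placeholders, since the card does not list the exact configuration:

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

num_labels = 8  # placeholder: the actual number of topic labels is not stated above
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=num_labels
)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

args = TrainingArguments(
    output_dir="topic-model-analysis-model",
    evaluation_strategy="epoch",          # evaluate once per epoch
    save_strategy="epoch",                # must match for load_best_model_at_end
    load_best_model_at_end=True,          # best-model selection
    metric_for_best_model="f1",           # assumed selection metric
    num_train_epochs=10,                  # illustrative values
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,          # placeholders for the tokenized custom splits
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,      # sketched in the evaluation section below
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # early stopping
)
trainer.train()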
The model achieved strong performance across multiple topic categories, as measured by standard multi-class evaluation metrics.
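Assuming accuracy and weighted F1 as those metrics (common choices for multi-class classification; the exact set is not specified above), a compute_metrics helper for the Trainer might look like:

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    # The Trainer passes (logits, labels); take the argmax over classes
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, average="weighted"),  # weighted to reflect class imbalance
    }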
To cite this model:

@misc{afrologicinsect2025topicmodel,
  title        = {AfroLogicInsect Topic Classification Model},
  author       = {Akan Daniel},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/AfroLogicInsect/topic-model-analysis-model}},
}
Base model: distilbert/distilbert-base-uncased