bert-turkish-organization-ner

This model is a fine-tuned version of dbmdz/bert-base-turkish-128k-uncased on the yeniguno/turkish-organization-ner-dataset dataset.

Unlike general NER models, it is trained only for organization detection (ORG).

The labels are:

  • O (outside),
  • B-ORG (beginning of organization),
  • I-ORG (inside organization).
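For illustration, this is how the BIO scheme tags a sample sentence (labels hand-assigned here, not model output):

```python
# BIO tagging of a sample Turkish sentence (hand-labeled for illustration):
# "Koç Holding" is one organization spanning two tokens.
tokens = ["Koç", "Holding", "yeni", "bir", "yatırım", "duyurdu", "."]
labels = ["B-ORG", "I-ORG", "O", "O", "O", "O", "O"]

for token, label in zip(tokens, labels):
    print(f"{token:10}{label}")
```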

Model description

A lightweight Turkish token-classification model that performs NER for a single entity type: organizations (ORG). Restricting the label set keeps the model focused and simple to post-process compared with general-purpose NER models.

How to use it

You can load the model directly with the 🤗 pipeline API for NER:

from transformers import pipeline

model_id = "yeniguno/bert-turkish-organization-ner"
ner = pipeline("ner", model=model_id, aggregation_strategy="simple")

text = "Microsoft ve Koç Holding birlikte bir proje başlattı."
print(ner(text))
"""
[{'entity_group': 'ORG', 'score': np.float32(0.99849355), 'word': 'microsoft', 'start': 0, 'end': 9},
 {'entity_group': 'ORG', 'score': np.float32(0.9970416), 'word': 'koc holding', 'start': 13, 'end': 24}]
"""

Intended uses & limitations

  • Guardrails in LLM applications: detect and flag organization names in user prompts or model outputs.
  • Content filtering & compliance: e.g. anonymization, redaction, or entity-specific monitoring.
  • Analytics: extracting organization mentions from Turkish text for search, clustering, or knowledge graphs.
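As a sketch of the redaction use case, the character offsets returned by the pipeline can be used to mask detected organizations. The redact_orgs helper below is hypothetical, and the entity offsets are hard-coded from the example output above rather than produced by a live model call:

```python
# Sketch: redact organization spans using the character offsets that the
# pipeline returns with aggregation_strategy="simple".
def redact_orgs(text, entities, mask="[ORG]"):
    """Replace each detected organization span with a mask token."""
    # Process spans right-to-left so earlier offsets stay valid after replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + mask + text[ent["end"]:]
    return text

text = "Microsoft ve Koç Holding birlikte bir proje başlattı."
entities = [
    {"entity_group": "ORG", "start": 0, "end": 9},
    {"entity_group": "ORG", "start": 13, "end": 24},
]
print(redact_orgs(text, entities))
# → [ORG] ve [ORG] birlikte bir proje başlattı.
```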

Training and evaluation data

The model was fine-tuned on the yeniguno/turkish-organization-ner-dataset and achieves the following results on the evaluation set:

  • Loss: 0.1152
  • F1: 0.9159
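The F1 score for NER is typically computed at the entity level (seqeval-style): a prediction counts as correct only if both the span and the type match exactly. The toy example below illustrates this scoring with a simplified ORG-only span extractor; it is an assumption about the evaluation protocol, not code from the training run:

```python
# Toy illustration of entity-level F1: an entity is correct only if its
# full span matches a gold entity exactly (seqeval-style exact match).
gold = [["B-ORG", "I-ORG", "O", "O"], ["O", "B-ORG", "O"]]
pred = [["B-ORG", "I-ORG", "O", "O"], ["O", "O", "O"]]

def extract_entities(sentences):
    """Collect (sentence_idx, start, end) spans for each ORG entity."""
    spans = set()
    for sent_idx, sent in enumerate(sentences):
        start = None
        for i, tag in enumerate(sent + ["O"]):  # sentinel closes open spans
            if tag == "B-ORG":
                if start is not None:
                    spans.add((sent_idx, start, i))
                start = i
            elif tag != "I-ORG" and start is not None:
                spans.add((sent_idx, start, i))
                start = None
    return spans

g, p = extract_entities(gold), extract_entities(pred)
tp = len(g & p)
precision = tp / len(p) if p else 0.0
recall = tp / len(g) if g else 0.0
f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
print(f1)  # 1 of 2 gold entities found, 1 correct prediction → F1 = 2/3
```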

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 32
  • seed: 42
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5

Training results

Training Loss | Epoch | Step  | Validation Loss | F1
0.0617        | 1.0   | 8080  | 0.0679          | 0.8990
0.0471        | 2.0   | 16160 | 0.0640          | 0.9105
0.0295        | 3.0   | 24240 | 0.0846          | 0.9110
0.0277        | 4.0   | 32320 | 0.0959          | 0.9153
0.0116        | 5.0   | 40400 | 0.1152          | 0.9159

Framework versions

  • Transformers 4.55.2
  • Pytorch 2.9.0.dev20250816+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4