bert-turkish-organization-ner

This model is a fine-tuned version of dbmdz/bert-base-turkish-128k-uncased on the yeniguno/turkish-organization-ner-dataset dataset.

Unlike general NER models, it is trained only for organization detection (ORG).

The labels are:

  • O (outside),
  • B-ORG (beginning of organization),
  • I-ORG (inside organization).
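For illustration, this is how the BIO scheme tags a sample sentence (labels hand-assigned here, not model output):

```python
# BIO tagging of a sample Turkish sentence (hand-labeled for illustration):
# "Koç Holding" is one organization spanning two tokens.
tokens = ["Koç", "Holding", "yeni", "bir", "yatırım", "duyurdu", "."]
labels = ["B-ORG", "I-ORG", "O", "O", "O", "O", "O"]

for token, label in zip(tokens, labels):
    print(f"{token:10}{label}")
```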

Model description

A lightweight Turkish token-classification model that performs NER for a single entity type: organizations (ORG). Restricting the label set keeps the model focused and simple to post-process compared with general-purpose NER models.

How to use it

You can load the model directly with the 🤗 pipeline API for NER:

from transformers import pipeline

model_id = "yeniguno/bert-turkish-organization-ner"
ner = pipeline("ner", model=model_id, aggregation_strategy="simple")

text = "Microsoft ve Koç Holding birlikte bir proje başlattı."
print(ner(text))
"""
[{'entity_group': 'ORG', 'score': np.float32(0.99849355), 'word': 'microsoft', 'start': 0, 'end': 9},
 {'entity_group': 'ORG', 'score': np.float32(0.9970416), 'word': 'koc holding', 'start': 13, 'end': 24}]
"""

Intended uses & limitations

  • Guardrails in LLM applications: detect and flag organization names in user prompts or model outputs.
  • Content filtering & compliance: e.g. anonymization, redaction, or entity-specific monitoring.
  • Analytics: extracting organization mentions from Turkish text for search, clustering, or knowledge graphs.
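As a sketch of the redaction use case, the character offsets returned by the pipeline can be used to mask detected organizations. The redact_orgs helper below is hypothetical, and the entity offsets are hard-coded from the example output above rather than produced by a live model call:

```python
# Sketch: redact organization spans using the character offsets that the
# pipeline returns with aggregation_strategy="simple".
def redact_orgs(text, entities, mask="[ORG]"):
    """Replace each detected organization span with a mask token."""
    # Process spans right-to-left so earlier offsets stay valid after replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + mask + text[ent["end"]:]
    return text

text = "Microsoft ve Koç Holding birlikte bir proje başlattı."
entities = [
    {"entity_group": "ORG", "start": 0, "end": 9},
    {"entity_group": "ORG", "start": 13, "end": 24},
]
print(redact_orgs(text, entities))
# → [ORG] ve [ORG] birlikte bir proje başlattı.
```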

Training and evaluation data

The model was fine-tuned on the yeniguno/turkish-organization-ner-dataset and achieves the following results on the evaluation set:

  • Loss: 0.1152
  • F1: 0.9159
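The F1 score for NER is typically computed at the entity level (seqeval-style): a prediction counts as correct only if both the span and the type match exactly. The toy example below illustrates this scoring with a simplified ORG-only span extractor; it is an assumption about the evaluation protocol, not code from the training run:

```python
# Toy illustration of entity-level F1: an entity is correct only if its
# full span matches a gold entity exactly (seqeval-style exact match).
gold = [["B-ORG", "I-ORG", "O", "O"], ["O", "B-ORG", "O"]]
pred = [["B-ORG", "I-ORG", "O", "O"], ["O", "O", "O"]]

def extract_entities(sentences):
    """Collect (sentence_idx, start, end) spans for each ORG entity."""
    spans = set()
    for sent_idx, sent in enumerate(sentences):
        start = None
        for i, tag in enumerate(sent + ["O"]):  # sentinel closes open spans
            if tag == "B-ORG":
                if start is not None:
                    spans.add((sent_idx, start, i))
                start = i
            elif tag != "I-ORG" and start is not None:
                spans.add((sent_idx, start, i))
                start = None
    return spans

g, p = extract_entities(gold), extract_entities(pred)
tp = len(g & p)
precision = tp / len(p) if p else 0.0
recall = tp / len(g) if g else 0.0
f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
print(f1)  # 1 of 2 gold entities found, 1 correct prediction → F1 = 2/3
```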

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 32
  • seed: 42
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5

Training results

Training Loss | Epoch | Step  | Validation Loss | F1
0.0617        | 1.0   | 8080  | 0.0679          | 0.8990
0.0471        | 2.0   | 16160 | 0.0640          | 0.9105
0.0295        | 3.0   | 24240 | 0.0846          | 0.9110
0.0277        | 4.0   | 32320 | 0.0959          | 0.9153
0.0116        | 5.0   | 40400 | 0.1152          | 0.9159

Framework versions

  • Transformers 4.55.2
  • Pytorch 2.9.0.dev20250816+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4