---
license: apache-2.0
language:
- de
- en
base_model: Qwen/Qwen2-VL-2B-Instruct
tags:
- ocr
- german
- vision
- document-understanding
- invoice
- qwen2-vl
pipeline_tag: image-text-to-text
library_name: transformers
datasets:
- neuralabs/german-synth-ocr
---

<p align="center">
  <img src="https://raw.githubusercontent.com/Keyvanhardani/german-ocr/main/docs/logo.png" alt="German-OCR Logo" width="600"/>
</p>

<h1 align="center">German-OCR</h1>

<p align="center">
  <strong>High-performance German document OCR using fine-tuned Qwen2-VL-2B and Qwen2.5-VL-3B vision-language models</strong>
</p>

## Model Description

German-OCR is fine-tuned to extract text from German documents, including invoices, receipts, forms, and other business documents. It outputs structured text in Markdown format.

- **Base Model**: Qwen/Qwen2-VL-2B-Instruct
- **Fine-tuning**: QLoRA (4-bit quantization)
- **Training Data**: German invoices and business documents
- **Output Format**: Markdown structured text

## Model Variants

| Model | Size | Base | HuggingFace |
|-------|------|------|-------------|
| german-ocr | 4.4 GB | Qwen2-VL-2B | [Keyven/german-ocr](https://huggingface.co/Keyven/german-ocr) |
| german-ocr-3b | 7.5 GB | Qwen2.5-VL-3B | [Keyven/german-ocr-3b](https://huggingface.co/Keyven/german-ocr-3b) |

## Usage

### Option 1: Python Package (Recommended)

```bash
pip install german-ocr
```

```python
from german_ocr import GermanOCR

# Using Ollama (fast, local)
ocr = GermanOCR(backend="ollama")
result = ocr.extract("document.png")
print(result)

# Using Transformers (more accurate)
ocr = GermanOCR(backend="transformers")
result = ocr.extract("document.png")
print(result)
```

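To process a whole folder of scans, a small loop around the `extract` call is usually enough. The sketch below relies only on the `extract(path)` call shown above and assumes it returns the recognized text as a string. The helper name `extract_folder`, the suffix list, and the `.md` output naming are illustrative choices and not part of the `german-ocr` package.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


# Hypothetical helper, not part of the german-ocr package.
def extract_folder(
    extract: Callable[[str], str],
    folder: str,
    suffixes: Iterable[str] = (".png", ".jpg", ".jpeg"),
) -> Dict[str, str]:
    """Run `extract` on every matching image in `folder`, saving one .md file per image."""
    wanted = {s.lower() for s in suffixes}
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subfolders and files that are not images we asked for.
        if not path.is_file() or path.suffix.lower() not in wanted:
            continue
        text = extract(str(path))
        # Write the Markdown next to the image, e.g. invoice.png -> invoice.md
        path.with_suffix(".md").write_text(text, encoding="utf-8")
        results[path.name] = text
    return results
```

With the package installed, this could be called as `extract_folder(GermanOCR(backend="ollama").extract, "scans/")`, where `scans/` stands in for your own directory.
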
### Option 2: Ollama

> [!WARNING]
> **In development**: vision adapter compatibility is still being worked on. For stable use, the [HuggingFace version](https://huggingface.co/Keyven/german-ocr) is recommended.

```bash
ollama run Keyvan/german-ocr "Extrahiere den Text: image.png"
```

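The model can also be reached from Python through Ollama's local HTTP API, which accepts images as base64 strings. The following is a minimal sketch, assuming an Ollama server on the default `http://localhost:11434` and the model name from the command above. `build_payload` and `ocr_via_ollama` are illustrative names, and since the Ollama variant is still in development, results may differ from the Transformers path.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes: bytes, prompt: str, model: str = "Keyvan/german-ocr") -> dict:
    """Build the JSON body for a non-streaming /api/generate request."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ocr_via_ollama(image_path: str, prompt: str = "Extrahiere den Text aus diesem Dokument.") -> str:
    """Send one image to the local Ollama server and return the generated text."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Keeping the payload construction separate from the network call makes it easy to inspect or log the request without a running server.
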
### Option 3: Transformers

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
from PIL import Image

# Load the fine-tuned model and its processor from the Hub
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Keyven/german-ocr",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("Keyven/german-ocr")

image = Image.open("document.png")

# One user turn containing the image and the (German) extraction prompt
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Extrahiere den Text aus diesem Dokument."}
    ]
}]

# Render the chat template and prepare the vision inputs
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt"
).to(model.device)

# Generate, then decode only the newly generated tokens (prompt tokens are sliced off)
output_ids = model.generate(**inputs, max_new_tokens=512)
result = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:],
    skip_special_tokens=True
)[0]
print(result)
```

## Performance

| Metric | Value |
|--------|-------|
| Base Model | Qwen2-VL-2B-Instruct |
| Model Size | 4.4 GB |
| VRAM (4-bit) | 1.5 GB |
| Inference Time | ~15 s (GPU) |

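The size and VRAM figures are roughly consistent with simple arithmetic on the weights alone. The sketch below assumes the 4.4 GB checkpoint is stored at 16 bits per parameter, which is not stated above. It ignores activations, the KV cache, and quantization metadata, which plausibly account for the gap to the reported 1.5 GB.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: 4.4 GB on disk at 2 bytes per parameter -> about 2.2e9 parameters.
n_params = 4.4e9 / 2

print(f"16-bit weights: {weight_memory_gb(n_params, 16):.1f} GB")
print(f" 4-bit weights: {weight_memory_gb(n_params, 4):.1f} GB")
```
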
## Training

- **Method**: QLoRA (4-bit quantization)
- **Epochs**: 3
- **Learning Rate**: 2e-4
- **LoRA Rank**: 64
- **Target Modules**: All linear layers

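To give a sense of what rank 64 means in practice: LoRA adds two low-rank matrices to each adapted linear layer, so a layer with input size `d_in` and output size `d_out` gains `rank * (d_in + d_out)` trainable parameters. The layer shape in the sketch below is purely illustrative and is not a figure taken from this model's training run.

```python
def lora_params(d_in: int, d_out: int, rank: int = 64) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    A has shape (rank x d_in) and B has shape (d_out x rank).
    """
    return rank * (d_in + d_out)


# Illustrative square projection layer (hypothetical dimensions).
d_in, d_out = 1536, 1536
full = d_in * d_out
added = lora_params(d_in, d_out)

print(f"full layer: {full} weights, LoRA adapter: {added} weights ({added / full:.1%})")
```

Because the adapter grows linearly in the layer dimensions while the frozen weight matrix grows quadratically, the trainable fraction shrinks as layers get wider.
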
## Limitations

- Optimized for German documents
- Best results with clear, high-resolution images
- May struggle with handwritten text

## License

Apache 2.0

## Author

**Keyvan Hardani**

- Website: [keyvan.ai](https://keyvan.ai)
- LinkedIn: [linkedin.com/in/keyvanhardani](https://www.linkedin.com/in/keyvanhardani/)
- GitHub: [@Keyvanhardani](https://github.com/Keyvanhardani)

## Links

- [GitHub](https://github.com/Keyvanhardani/german-ocr)
- [Ollama](https://ollama.com/Keyvan/german-ocr)
- [HuggingFace](https://huggingface.co/Keyven/german-ocr)