	---
	license: apache-2.0
	language:
	- de
	- en
	base_model: Qwen/Qwen2-VL-2B-Instruct
	tags:
	- ocr
	- german
	- vision
	- document-understanding
	- invoice
	- qwen2-vl
	pipeline_tag: image-text-to-text
	library_name: transformers
	datasets:
	- neuralabs/german-synth-ocr
	---
	<p align="center">
	<img src="https://raw.githubusercontent.com/Keyvanhardani/german-ocr/main/docs/logo.png" alt="German-OCR Logo" width="600"/>
	</p>

	<h1 align="center">German-OCR</h1>

	<p align="center">
	<strong>High-performance German document OCR using fine-tuned Qwen2-VL-2B and Qwen2.5-VL-3B vision-language models</strong>
	</p>


	## Model Description

	German-OCR is fine-tuned to extract text from German documents such as invoices, receipts, forms, and other business documents. It returns the extracted text as structured Markdown.

	- Base Model: Qwen/Qwen2-VL-2B-Instruct
	- Fine-tuning: QLoRA (4-bit quantization)
	- Training Data: German invoices and business documents
	- Output Format: Markdown structured text

	## Model Variants

	| Model | Size | Base | HuggingFace |
	|-------|------|------|-------------|
	| german-ocr | 4.4 GB | Qwen2-VL-2B | [Keyven/german-ocr](https://huggingface.co/Keyven/german-ocr) |
	| german-ocr-3b | 7.5 GB | Qwen2.5-VL-3B | [Keyven/german-ocr-3b](https://huggingface.co/Keyven/german-ocr-3b) |

	## Usage

	### Option 1: Python Package (Recommended)

	```bash
	pip install german-ocr
	```

	```python
	from german_ocr import GermanOCR

	# Using Ollama (fast, local)
	ocr = GermanOCR(backend="ollama")
	result = ocr.extract("document.png")
	print(result)

	# Using Transformers (more accurate)
	ocr = GermanOCR(backend="transformers")
	result = ocr.extract("document.png")
	print(result)
	```
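
For a folder of scans, the call above can be wrapped in a small loop. The sketch below relies only on what the snippet shows (`GermanOCR(backend=...)` and `ocr.extract(path)` returning text). The helper name `extract_folder`, the suffix filter, and the `.md` output naming are hypothetical choices for illustration, not part of the package. The extractor is passed in as a plain callable so the helper can be tried without a model.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Image suffixes to process; an assumption, adjust to your inputs.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg")


def extract_folder(
    folder: Path,
    extract: Callable[[str], str],
    suffixes: Iterable[str] = IMAGE_SUFFIXES,
) -> Dict[str, str]:
    """Run `extract` on every image in `folder` and save one .md file per image.

    `extract` is any callable mapping an image path to text, for example
    `GermanOCR(backend="ollama").extract`.
    Returns a dict of image file name -> extracted text.
    """
    allowed = {s.lower() for s in suffixes}
    results: Dict[str, str] = {}
    for image_path in sorted(folder.iterdir()):
        if not image_path.is_file() or image_path.suffix.lower() not in allowed:
            continue
        text = extract(str(image_path))
        # Write the Markdown next to the image, e.g. invoice.png -> invoice.md
        image_path.with_suffix(".md").write_text(text, encoding="utf-8")
        results[image_path.name] = text
    return results
```

With the package installed, a call might look like `extract_folder(Path("scans"), GermanOCR(backend="ollama").extract)`.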

	### Option 2: Ollama

	> [!WARNING]
	> In development: vision adapter compatibility is still being worked on. For stable use, the
	> [HuggingFace version](https://huggingface.co/Keyven/german-ocr) is recommended.

	```bash
	ollama run Keyvan/german-ocr "Extrahiere den Text: image.png"
	```

	### Option 3: Transformers

	```python
	from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
	from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils
	from PIL import Image

	# Load the fine-tuned model and its processor from the Hub
	model = Qwen2VLForConditionalGeneration.from_pretrained(
	    "Keyven/german-ocr",
	    device_map="auto",
	)
	processor = AutoProcessor.from_pretrained("Keyven/german-ocr")

	image = Image.open("document.png")

	# One user turn: the document image plus a German extraction prompt
	messages = [{
	    "role": "user",
	    "content": [
	        {"type": "image", "image": image},
	        {"type": "text", "text": "Extrahiere den Text aus diesem Dokument."},
	    ],
	}]

	# Build the chat prompt and collect the vision inputs
	text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	image_inputs, video_inputs = process_vision_info(messages)
	inputs = processor(
	    text=[text],
	    images=image_inputs,
	    videos=video_inputs,
	    padding=True,
	    return_tensors="pt",
	).to(model.device)

	# Generate, then decode only the newly generated tokens (skip the prompt)
	output_ids = model.generate(**inputs, max_new_tokens=512)
	result = processor.batch_decode(
	    output_ids[:, inputs.input_ids.shape[1]:],
	    skip_special_tokens=True,
	)[0]
	print(result)
	```
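
Because the model returns Markdown, downstream code often needs to turn table rows (for example invoice line items) back into records. The following is a minimal sketch, assuming the output contains GitHub-style pipe tables. The function name `parse_markdown_tables` is hypothetical and the sample text is invented for illustration; it is not real model output. Escaped pipes inside cells are not handled.

```python
from typing import Dict, List


def parse_markdown_tables(markdown: str) -> List[Dict[str, str]]:
    """Collect the rows of all pipe tables in `markdown` as dicts keyed by header."""
    rows: List[Dict[str, str]] = []
    header: List[str] = []
    for line in markdown.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            header = []  # a non-table line ends the current table
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if not header:
            header = cells  # first table line is the header row
        elif all(c and set(c) <= set("-: ") for c in cells):
            continue  # separator row such as |---|---|
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Invented sample, not real model output.
sample = """# Rechnung
| Position | Betrag |
|----------|--------|
| Beratung | 100,00 EUR |
| Versand | 5,90 EUR |
"""
print(parse_markdown_tables(sample))
```

In practice the `sample` string would be replaced by the `result` produced by the Transformers snippet.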

	## Performance

	| Metric | Value |
	|--------|-------|
	| Base Model | Qwen2-VL-2B-Instruct |
	| Model Size | 4.4 GB |
	| VRAM (4-bit) | 1.5 GB |
	| Inference Time | ~15s (GPU) |
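
The size and VRAM figures are roughly what weight storage alone would predict. The back-of-the-envelope check below assumes the 4.4 GB checkpoint stores 16-bit weights (the model card does not state the dtype), which implies about 2.2 billion parameters. It counts weights only and ignores quantization metadata, activations, and the KV cache, which plausibly accounts for the gap between the estimate and the 1.5 GB listed above.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: the 4.4 GB checkpoint stores 16-bit (2-byte) weights.
params = 4.4e9 / 2  # about 2.2e9 parameters

print(f"16-bit: {weight_memory_gb(params, 16):.1f} GB")  # 16-bit: 4.4 GB
print(f" 4-bit: {weight_memory_gb(params, 4):.1f} GB")   #  4-bit: 1.1 GB
```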

	## Training

	- Method: QLoRA (4-bit quantization)
	- Epochs: 3
	- Learning Rate: 2e-4
	- LoRA Rank: 64
	- Target Modules: All linear layers
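
To get a feel for what rank 64 means: LoRA adds two low-rank matrices to each adapted linear layer, `A` with shape `rank x d_in` and `B` with shape `d_out x rank`, so the number of trainable parameters per layer is `rank * (d_in + d_out)`. The sketch below computes this for a single layer. The layer shape is hypothetical and chosen only for illustration; it is not read from the model configuration.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer (A: rank x d_in, B: d_out x rank)."""
    return rank * (d_in + d_out)


RANK = 64  # from the training settings above
D_IN, D_OUT = 2048, 2048  # hypothetical layer shape, for illustration only

added = lora_params(D_IN, D_OUT, RANK)
full = D_IN * D_OUT
print(f"LoRA adds {added} parameters, {added / full:.2%} of the layer's {full} frozen weights")
```

For this example shape the adapter is a small fraction of the frozen layer; the actual ratio depends on the real layer dimensions.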

	## Limitations

	- Optimized for German documents
	- Best results with clear, high-resolution images
	- May struggle with handwritten text

	## License

	Apache 2.0

	## Author

	Keyvan Hardani
	- Website: [keyvan.ai](https://keyvan.ai)
	- LinkedIn: [linkedin.com/in/keyvanhardani](https://www.linkedin.com/in/keyvanhardani/)
	- GitHub: [@Keyvanhardani](https://github.com/Keyvanhardani)

	## Links

	- [GitHub](https://github.com/Keyvanhardani/german-ocr)
	- [Ollama](https://ollama.com/Keyvan/german-ocr)
	- [HuggingFace](https://huggingface.co/Keyven/german-ocr)