
	---
	base_model: unsloth/llama-3.2-1b-instruct-unsloth-bnb-4bit
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- llama
	- trl
	license: apache-2.0
	language:
	- en
	---

	# Query Generation with LoRA Finetuning

	This project fine-tunes a language model using supervised fine-tuning (SFT) and LoRA adapters to generate queries from documents. The model was trained on the [`prdev/qtack-gq-embeddings-unsupervised`](https://huggingface.co/datasets/prdev/qtack-gq-embeddings-unsupervised) dataset using an A100 GPU.

	## Overview

	- Objective:
	  Given a document, the model generates a relevant query. Each training example is formatted with custom markers:
	  - `<|document|>\n` precedes the document text.
	  - `<|query|>\n` precedes the query text.
	  - An EOS token is appended at the end to signal termination.

	- Text Chunking:
	  For best results, split your text into smaller, coherent chunks before passing it to the model. Given a long document, the model tends to focus on specific details rather than the overall context.

	- Training Setup:
	  The model is fine-tuned using the Unsloth framework with LoRA adapters on an A100 GPU. The W&B loss curve is available at https://wandb.ai/prdev/lora_model_training/panel/jp2r24xk7?nw=nwuserprdev
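The marker layout described under Objective can be sketched as a small formatting helper. This is an illustration only, not the author's training code: the function names `format_prompt` and `format_training_example` are hypothetical, the newline placed between the document and the query marker is assumed to match the inference prompt used in Quick Usage, and `"<EOS>"` is a placeholder for whatever `tokenizer.eos_token` returns.

```python
DOCUMENT_MARKER = "<|document|>\n"
QUERY_MARKER = "<|query|>\n"


def format_prompt(document: str) -> str:
    """Inference-time prompt: the document followed by an open query marker."""
    return DOCUMENT_MARKER + document + "\n" + QUERY_MARKER


def format_training_example(document: str, query: str, eos_token: str) -> str:
    """Training-time text: the prompt, the target query, then the EOS token."""
    return format_prompt(document) + query + eos_token


# "<EOS>" is a stand-in; in practice pass tokenizer.eos_token.
example = format_training_example(
    "Paris is the capital of France.",
    "what is the capital of france",
    eos_token="<EOS>",
)
print(example)
```

Keeping the prompt builder separate from the training formatter means the same function produces both the training prefix and the inference prompt, so the two cannot drift apart.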



	## Quick Usage

	The snippet below loads the fine-tuned model and generates a query for a single, already-chunked document:

	```python
	from unsloth import FastLanguageModel
	from transformers import TextStreamer

	# Load the finetuned model and tokenizer from Hugging Face Hub.
	model, tokenizer = FastLanguageModel.from_pretrained("prdev/query-gen", load_in_4bit=True)

	# Enable faster inference if supported.
	FastLanguageModel.for_inference(model)

	# Example document chunk (ensure text is appropriately chunked).
	document_chunk = (
	    "liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge "
	    "and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects."
	)

	# Create the prompt using the custom markers the model was trained on.
	prompt = "<|document|>\n" + document_chunk + "\n<|query|>\n"

	# Tokenize the prompt and move the tensors to the GPU.
	inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

	# Set up a TextStreamer to view token-by-token generation.
	streamer = TextStreamer(tokenizer, skip_prompt=True)

	# Generate a query from the document.
	_ = model.generate(
	    **inputs,  # Passes input_ids and attention_mask.
	    streamer=streamer,
	    max_new_tokens=100,
	    do_sample=True,  # Sampling must be on for temperature and min_p to apply.
	    temperature=0.7,
	    min_p=0.1,
	    eos_token_id=tokenizer.eos_token_id,  # Ensures proper termination.
	)
	```
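The usage example assumes the document has already been chunked. One possible way to do that is sketched below: a greedy splitter that packs whole sentences into chunks under a word budget. The helper name `chunk_text` and the default `max_words=80` are hypothetical choices for illustration; the model card does not prescribe a chunk size or a chunking method.

```python
import re


def chunk_text(text: str, max_words: int = 80) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept intact as its own chunk.
    """
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        if n_words == 0:
            continue
        # Close the current chunk if adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


long_text = "First sentence here. Second sentence follows. Third one ends it."
# Build one prompt per chunk, using the same markers as the usage example.
prompts = [
    "<|document|>\n" + chunk + "\n<|query|>\n"
    for chunk in chunk_text(long_text, max_words=6)
]
print(len(prompts))
```

Each resulting prompt can then be tokenized and passed to `model.generate` exactly as in the snippet above. The regex-based sentence split is deliberately simple and will mis-split abbreviations or numbered lists, so a proper sentence tokenizer may be preferable for real data.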

	## Uploaded model

	- Developed by: prdev
	- License: apache-2.0
	- Fine-tuned from model: unsloth/llama-3.2-1b-instruct-unsloth-bnb-4bit

	This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

	[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)