We-Math/V-Thinker

	---
	license: mit
	---
	## 💡 Overview

	> "The soul never thinks without an image." — Aristotle

	V-Thinker is a general-purpose multimodal reasoning assistant that enables Interactive Thinking with Images through end-to-end reinforcement learning. Unlike traditional vision-language models, V-Thinker actively interacts with visual content—editing, annotating, and transforming images to simplify complex problems.
	```python
	from transformers import AutoConfig, AutoProcessor, Qwen2_5_VLForConditionalGeneration

	from utils import run_evaluation  # Assuming you have this utility function

	# Local directory or Hub ID of the V-Thinker checkpoint (fill in your own)
	MODEL_PATH = ""

	config = AutoConfig.from_pretrained(MODEL_PATH)
	model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
	    MODEL_PATH,
	    device_map="auto",  # "auto" respects CUDA_VISIBLE_DEVICES
	    config=config,
	)
	processor = AutoProcessor.from_pretrained(MODEL_PATH)

	question_text = "Question: Hint: Please answer the question and provide the final answer at the end.\nQuestion: How many lines of symmetry does this figure have?\n\n\nPlease provide the final answer in the format <answer>X</answer>"
	image_path = "./224.png"

	# Run the model on the question and image; "./" is passed as the third argument to run_evaluation
	final_assistant_response, final_answer, aux_path = run_evaluation(question_text, image_path, "./", model, processor)
	print("Model response:")
	print(final_assistant_response)
	print("Final answer:")
	print(final_answer)
	print("Auxiliary path:")
	print(aux_path)

	```
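
The prompt above asks the model to wrap its result in `<answer>X</answer>` tags. If you need to pull that value out of a raw response yourself, a minimal sketch might look like the following. The helper name `extract_answer` is hypothetical and not part of this repository, and `run_evaluation` may already perform an equivalent step when it returns `final_answer`.

```python
import re
from typing import Optional

# Matches the content of an <answer>...</answer> pair, across newlines.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE)


def extract_answer(response: str) -> Optional[str]:
    """Return the text inside the last <answer>...</answer> pair, or None.

    Hypothetical helper: the last match is used on the assumption that the
    final answer appears at the end of the response, as the prompt requests.
    """
    matches = ANSWER_PATTERN.findall(response)
    if not matches:
        return None
    return matches[-1].strip()


example = "The figure has two axes of symmetry.\n<answer>2</answer>"
print(extract_answer(example))
```

Returning `None` when no tags are found lets the caller distinguish a missing answer from an empty one.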