Uncensor eval

by kth8 - opened 7 days ago

kth8

7 days ago

I wrote my own test that replicates the official Heretic benchmark: it uses the mlabonne/harmful_behaviors dataset and checks for the same refusal_markers in the first 100 tokens of each response. This model refused 1/100 prompts, which is similar to 3.1 and much better than the 10/100 stated in the readme.

Testing log: https://gist.github.com/kth8/221516178d54fd47f693a902688e2892
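
For readers who want to see the shape of the check described above, here is a minimal sketch of marker-based refusal counting. It is not the code from the gist or from Heretic: the marker list is a hypothetical placeholder, the function names `is_refusal` and `count_refusals` are made up for illustration, and "first 100 tokens" is approximated with whitespace-separated words instead of a real tokenizer.

```python
from typing import Iterable, Sequence

# Hypothetical markers for illustration only; NOT the list Heretic ships.
REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't", "as an ai")


def is_refusal(
    response: str,
    markers: Sequence[str] = REFUSAL_MARKERS,
    max_tokens: int = 100,
) -> bool:
    """Return True if any marker appears in the first `max_tokens` words."""
    # Assumption: whitespace-split words stand in for model tokens.
    head = " ".join(response.split()[:max_tokens]).lower()
    return any(marker in head for marker in markers)


def count_refusals(
    responses: Iterable[str],
    markers: Sequence[str] = REFUSAL_MARKERS,
    max_tokens: int = 100,
) -> int:
    """Count how many responses match at least one refusal marker."""
    return sum(is_refusal(r, markers, max_tokens) for r in responses)
```

Because this is plain substring matching, a hit only means a marker phrase appeared early in the response, so each match is worth reading by hand before counting it as a refusal.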

kth8

7 days ago

The 1 response it matched seems to be a false positive, so this model actually answered every prompt.

kth8 changed discussion status to closed about 12 hours ago
