Uncensor eval

#1
by kth8 - opened

I made my own test that replicate the official Heretic benchmark using the mlabonne/harmful_behaviors dataset and check for same refusal_markers in the first 100 tokens. This model refused 1/100 prompts, similar to 3.1 and actually much better than the 10/100 stated in the readme.

Testing log: https://gist.github.com/kth8/221516178d54fd47f693a902688e2892

the 1 response it matched seems like a false positive so this model actually answered every prompt

kth8 changed discussion status to closed

Sign up or log in to comment