Uncensor eval
#1
by
kth8
- opened
I made my own test that replicate the official Heretic benchmark using the mlabonne/harmful_behaviors dataset and check for same refusal_markers in the first 100 tokens. This model refused 1/100 prompts, similar to 3.1 and actually much better than the 10/100 stated in the readme.
Testing log: https://gist.github.com/kth8/221516178d54fd47f693a902688e2892
the 1 response it matched seems like a false positive so this model actually answered every prompt
kth8
changed discussion status to
closed