Vulnerable Code SAEs
This repository contains 8 Sparse Autoencoders (SAEs) trained using SAELens.
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-7B-Instruct |
| Architecture | standard |
| Input Dimension | 3584 |
| SAE Dimension | 16384 |
| Training Dataset | TQRG/DeltaSecommits_qwen-2.5-7b-instruct_tokenized_v2_vulnerable |
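
To make the dimensions in the table concrete, here is a minimal NumPy sketch of what a "standard" (ReLU) sparse autoencoder computes. It assumes the common formulation in which the decoder bias is subtracted from the input before encoding; whether these checkpoints do so depends on their `cfg.json`. The function names, toy sizes, and random weights are illustrative only and are not the released weights.

```python
import numpy as np

D_IN, D_SAE = 3584, 16384   # input and SAE dimensions from the table above
EXPANSION = D_SAE / D_IN    # roughly 4.57 latents per input dimension


def sae_encode(x, W_enc, b_enc, b_dec):
    """ReLU encoder: maps [..., d_in] activations to [..., d_sae] features.

    Assumption: the decoder bias is subtracted from the input first.
    """
    return np.maximum((x - b_dec) @ W_enc + b_enc, 0.0)


def sae_decode(f, W_dec, b_dec):
    """Linear decoder: maps [..., d_sae] features back to [..., d_in]."""
    return f @ W_dec + b_dec


# Toy sizes and random weights keep the sketch cheap to run.
rng = np.random.default_rng(0)
d_in, d_sae = 8, 32
W_enc = rng.normal(size=(d_in, d_sae))
W_dec = rng.normal(size=(d_sae, d_in))
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_in)

x = rng.normal(size=(2, d_in))  # two stand-in residual-stream vectors
features = sae_encode(x, W_enc, b_enc, b_dec)
reconstruction = sae_decode(features, W_dec, b_dec)
```

With the real checkpoints, `x` would have a last dimension of 3584 and `features` a last dimension of 16384.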
| Hook Point |
|---|
| blocks.11.hook_resid_post |
| blocks.0.hook_resid_post |
| blocks.3.hook_resid_post |
| blocks.7.hook_resid_post |
| blocks.15.hook_resid_post |
| blocks.19.hook_resid_post |
| blocks.23.hook_resid_post |
| blocks.27.hook_resid_post |
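
Since only eight layers have a trained SAE, it can help to build the hook point name from a layer index and fail early on a layer that is not listed. The helper below is a hypothetical convenience sketch, not part of SAELens; `AVAILABLE_LAYERS` and `hook_point_for_layer` are names introduced here for illustration.

```python
# Layers with a trained SAE, taken from the hook point table above.
AVAILABLE_LAYERS = (0, 3, 7, 11, 15, 19, 23, 27)


def hook_point_for_layer(layer: int) -> str:
    """Return the hook point name for a layer, or raise if no SAE exists for it."""
    if layer not in AVAILABLE_LAYERS:
        raise ValueError(
            f"No SAE for layer {layer}; choose one of {AVAILABLE_LAYERS}"
        )
    return f"blocks.{layer}.hook_resid_post"


all_hook_points = [hook_point_for_layer(layer) for layer in AVAILABLE_LAYERS]
```

The returned string can be passed as the `sae_id` argument when loading an SAE.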
```python
from sae_lens import SAE

# Load an SAE for a specific hook point
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="rufimelo/vulnerable_code_qwen_coder_standard_16384",
    sae_id="blocks.11.hook_resid_post",  # Choose from available hook points above
)

# Use with TransformerLens
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Get activations and encode
_, cache = model.run_with_cache("your text here")
activations = cache["blocks.11.hook_resid_post"]
features = sae.encode(activations)
```
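
Once `features` is available, a common next step is to see which SAE features fire most strongly on the final token. The helper below is a small sketch using plain PyTorch; it assumes `features` has shape `[batch, position, d_sae]`, and `top_active_features` is a hypothetical name introduced here, not a SAELens function.

```python
import torch


def top_active_features(features: torch.Tensor, k: int = 10) -> list[tuple[int, float]]:
    """Return (feature_index, activation) pairs for the strongest features
    at the last token of the first sequence, skipping inactive features."""
    last_token = features[0, -1]  # shape: [d_sae]
    k = min(k, last_token.numel())
    values, indices = torch.topk(last_token, k)
    return [(int(i), float(v)) for i, v in zip(indices, values) if v > 0]


# Stand-in tensor so the sketch runs without loading the 7B model.
demo = torch.zeros(1, 3, 8)
demo[0, -1, 2] = 5.0
demo[0, -1, 6] = 1.5
demo_top = top_active_features(demo, k=3)
```

In practice, call `top_active_features(features)` on the tensor returned by `sae.encode`.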
- `blocks.11.hook_resid_post/cfg.json`: SAE configuration
- `blocks.11.hook_resid_post/sae_weights.safetensors`: Model weights
- `blocks.11.hook_resid_post/sparsity.safetensors`: Feature sparsity statistics
- `blocks.0.hook_resid_post/cfg.json`: SAE configuration
- `blocks.0.hook_resid_post/sae_weights.safetensors`: Model weights
- `blocks.0.hook_resid_post/sparsity.safetensors`: Feature sparsity statistics
- `blocks.3.hook_resid_post/cfg.json`: SAE configuration
- `blocks.3.hook_resid_post/sae_weights.safetensors`: Model weights
- `blocks.3.hook_resid_post/sparsity.safetensors`: Feature sparsity statistics
- `blocks.7.hook_resid_post/cfg.json`: SAE configuration
- `blocks.7.hook_resid_post/sae_weights.safetensors`: Model weights
- `blocks.7.hook_resid_post/sparsity.safetensors`: Feature sparsity statistics
- `blocks.15.hook_resid_post/cfg.json`: SAE configuration
- `blocks.15.hook_resid_post/sae_weights.safetensors`: Model weights
- `blocks.15.hook_resid_post/sparsity.safetensors`: Feature sparsity statistics
- `blocks.19.hook_resid_post/cfg.json`: SAE configuration
- `blocks.19.hook_resid_post/sae_weights.safetensors`: Model weights
- `blocks.19.hook_resid_post/sparsity.safetensors`: Feature sparsity statistics
- `blocks.23.hook_resid_post/cfg.json`: SAE configuration
- `blocks.23.hook_resid_post/sae_weights.safetensors`: Model weights
- `blocks.23.hook_resid_post/sparsity.safetensors`: Feature sparsity statistics
- `blocks.27.hook_resid_post/cfg.json`: SAE configuration
- `blocks.27.hook_resid_post/sae_weights.safetensors`: Model weights
- `blocks.27.hook_resid_post/sparsity.safetensors`: Feature sparsity statistics

These SAEs were trained with SAELens version 6.26.2.
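
Every hook point directory follows the same three-file layout, so the expected repository paths can be generated rather than typed out. This is a minimal sketch using only the standard library; `SAE_FILES` and `expected_files` are hypothetical names introduced for illustration.

```python
from pathlib import PurePosixPath

# File names and descriptions shared by every hook point directory.
SAE_FILES = {
    "cfg.json": "SAE configuration",
    "sae_weights.safetensors": "Model weights",
    "sparsity.safetensors": "Feature sparsity statistics",
}


def expected_files(hook_point: str) -> list[str]:
    """Return the repository-relative paths expected for one hook point."""
    return [str(PurePosixPath(hook_point) / name) for name in SAE_FILES]


layer_11_files = expected_files("blocks.11.hook_resid_post")
```

These paths are relative to the repository root and could be used, for example, to check that a local download is complete.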