SAM-Audio Judge Model

SAM-Audio Judge is a model for evaluating the quality of audio separation results produced by SAM-Audio. It assesses how well separated audio matches a given text description, reporting four quality metrics: overall quality, recall, precision, and faithfulness.

Authentication

Before using SAM-Audio Judge, you need to:

Request access to the checkpoints on the SAM-Audio Judge Hugging Face repo
Authenticate with Hugging Face: huggingface-cli login

Usage

Basic Usage

The Judge model evaluates the quality of audio separation by comparing the input audio, separated audio, and text description.

import torch
import torchaudio
from sam_audio import SAMAudioJudgeModel, SAMAudioJudgeProcessor

# Load model and processor
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SAMAudioJudgeModel.from_pretrained("facebook/sam-audio-judge").to(device).eval()
processor = SAMAudioJudgeProcessor.from_pretrained("facebook/sam-audio-judge")
target_sr = processor.audio_sampling_rate

def load_mono(path):
    # Load a file, downmix it to a single channel of shape (1, num_samples),
    # and resample it to the sampling rate the Judge processor expects
    waveform, sr = torchaudio.load(path)
    waveform = waveform.mean(dim=0, keepdim=True)
    return torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=target_sr)

# Load audio files
input_audio = load_mono("path/to/input_audio.wav")
separated_audio = load_mono("path/to/separated_audio.wav")

# Text description that was used for separation
description = "A man speaking"

# Process inputs
inputs = processor(
    text=[description],
    input_audio=[input_audio],  # Can also pass a list of file paths
    separated_audio=[separated_audio],  # Can also pass a list of file paths
    sampling_rate=target_sr,
).to(device)

# Get quality scores
with torch.inference_mode():
    result = model(**inputs)

# Access individual scores
print(f"Overall Quality: {result.overall.item():.3f}")
print(f"Recall: {result.recall.item():.3f}")
print(f"Precision: {result.precision.item():.3f}")
print(f"Faithfulness: {result.faithfulness.item():.3f}")
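The example above expects each waveform to have shape (1, num_samples), which means a single mono channel. `torchaudio.load` returns every channel in the file, so stereo recordings need to be downmixed first. The sketch below shows channel averaging on plain Python lists. This is one common choice and an assumption here, not something SAM-Audio Judge prescribes. With tensors, the equivalent is `waveform.mean(dim=0, keepdim=True)`. The helper name `to_mono` is hypothetical.

```python
from typing import List


def to_mono(waveform: List[List[float]]) -> List[List[float]]:
    """Average all channels of a (num_channels, num_samples) waveform.

    Hypothetical helper: returns a single-channel waveform of shape
    (1, num_samples), the layout the Judge processor comments describe.
    """
    if not waveform:
        raise ValueError("waveform must contain at least one channel")
    num_samples = len(waveform[0])
    if any(len(channel) != num_samples for channel in waveform):
        raise ValueError("all channels must have the same number of samples")
    num_channels = len(waveform)
    # Sample-by-sample mean across channels
    mono = [sum(channel[i] for channel in waveform) / num_channels for i in range(num_samples)]
    return [mono]


stereo = [[0.5, 1.0, -1.0], [0.5, 0.0, 1.0]]  # 2 channels x 3 samples
print(to_mono(stereo))  # [[0.5, 0.5, 0.0]]
```

Averaging keeps the output in the same amplitude range as the input. Summing the channels instead could clip.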

Batch Processing

You can evaluate multiple separation results in a single batch:

import torch
import torchaudio
from sam_audio import SAMAudioJudgeModel, SAMAudioJudgeProcessor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SAMAudioJudgeModel.from_pretrained("facebook/sam-audio-judge").to(device).eval()
processor = SAMAudioJudgeProcessor.from_pretrained("facebook/sam-audio-judge")

# Multiple examples
descriptions = ["A man speaking", "Piano playing a melody", "A dog barking"]
input_audios = ["input1.wav", "input2.wav", "input3.wav"]
separated_audios = ["separated1.wav", "separated2.wav", "separated3.wav"]

# Process batch
inputs = processor(
    text=descriptions,
    input_audio=input_audios,
    separated_audio=separated_audios,
).to(device)

with torch.inference_mode():
    result = model(**inputs)

# Results shape: (batch_size, 1)
for i, desc in enumerate(descriptions):
    print(f"\nExample {i+1}: {desc}")
    print(f"  Overall: {result.overall[i].item():.3f}")
    print(f"  Recall: {result.recall[i].item():.3f}")
    print(f"  Precision: {result.precision[i].item():.3f}")
    print(f"  Faithfulness: {result.faithfulness[i].item():.3f}")
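Every score tensor has shape (batch_size, 1), so you can turn a batch into plain floats with `result.overall.squeeze(-1).tolist()` and then compare examples. The hypothetical helper below is a minimal sketch that ranks examples by one metric, highest first, following the convention that higher scores mean better quality. The scores in the usage lines are made-up placeholders, not real Judge outputs.

```python
from typing import Dict, List, Tuple


def rank_by_metric(
    descriptions: List[str],
    scores: Dict[str, List[float]],
    metric: str = "overall",
) -> List[Tuple[str, float]]:
    """Return (description, score) pairs sorted from best to worst by `metric`.

    Hypothetical helper; `scores` maps metric names to one float per example,
    e.g. {"overall": result.overall.squeeze(-1).tolist(), ...}.
    """
    if metric not in scores:
        raise KeyError(f"unknown metric: {metric!r}")
    values = scores[metric]
    if len(values) != len(descriptions):
        raise ValueError("need exactly one score per description")
    # Higher scores indicate better quality, so sort in descending order
    return sorted(zip(descriptions, values), key=lambda pair: pair[1], reverse=True)


descriptions = ["A man speaking", "Piano playing a melody", "A dog barking"]
placeholder_scores = {"overall": [3.1, 4.2, 2.7]}  # made-up values for illustration
for desc, score in rank_by_metric(descriptions, placeholder_scores):
    print(f"{score:.1f}  {desc}")
```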

Evaluating SAM-Audio Separation

Here's a complete example that performs separation with SAM-Audio and then evaluates it with the Judge model:

import torch
import torchaudio
from sam_audio import SAMAudio, SAMAudioProcessor
from sam_audio import SAMAudioJudgeModel, SAMAudioJudgeProcessor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Step 1: Perform separation with SAM-Audio
sam_model = SAMAudio.from_pretrained("facebook/sam-audio-large").to(device).eval()
sam_processor = SAMAudioProcessor.from_pretrained("facebook/sam-audio-large")

audio_file = "path/to/audio.wav"
description = "A person coughing"

# Separate
inputs = sam_processor(audios=[audio_file], descriptions=[description]).to(device)
with torch.inference_mode():
    separation_result = sam_model.separate(inputs)

# Step 2: Evaluate the separation
judge_model = SAMAudioJudgeModel.from_pretrained("facebook/sam-audio-judge").to(device).eval()
judge_processor = SAMAudioJudgeProcessor.from_pretrained("facebook/sam-audio-judge")

# Prepare for judge
judge_inputs = judge_processor(
    text=[description],
    input_audio=[audio_file],
    separated_audio=[separation_result.target[0].unsqueeze(0)],
    sampling_rate=judge_processor.audio_sampling_rate,
).to(device)

with torch.inference_mode():
    judge_result = judge_model(**judge_inputs)

print("\nSeparation Quality Metrics:")
print(f"Overall Quality: {judge_result.overall.item():.3f}")
print(f"Recall: {judge_result.recall.item():.3f}")
print(f"Precision: {judge_result.precision.item():.3f}")
print(f"Faithfulness: {judge_result.faithfulness.item():.3f}")
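One way to use these metrics in a pipeline is as a quality gate. You compare each score against a threshold and flag separations that fall short, for example for manual review. The helper below is a hypothetical sketch. The threshold values are arbitrary placeholders, because the model card only states that higher scores are better and does not define cutoffs. You could collect the scores from the judge output with `{m: getattr(judge_result, m).item() for m in METRICS}`.

```python
from typing import Dict, List

# The four metrics reported by SAM-Audio Judge
METRICS = ("overall", "recall", "precision", "faithfulness")


def failing_metrics(scores: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the metrics whose score falls below its threshold.

    Hypothetical helper: metrics without an entry in `thresholds` are not checked.
    """
    return [m for m in METRICS if m in thresholds and scores[m] < thresholds[m]]


# Placeholder scores and thresholds, not real Judge outputs or recommended cutoffs
scores = {"overall": 3.0, "recall": 2.0, "precision": 4.0, "faithfulness": 3.5}
thresholds = {"overall": 2.5, "recall": 2.5}
failed = failing_metrics(scores, thresholds)
print("needs review:" if failed else "passed", failed)
```

Reporting which metric failed, rather than a single pass/fail flag, keeps the distinction between missing target sound (recall) and leftover unwanted sound (precision).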

Output Format

The SAMAudioJudgeModel returns a SAMAudioJudgeOutput object with the following attributes:

overall (torch.Tensor): Overall quality of shape (batch_size, 1). This is a combined metric that represents the overall separation quality.
recall (torch.Tensor): Recall of shape (batch_size, 1). Measures how much of the target sound was successfully captured in the separation.
precision (torch.Tensor): Precision of shape (batch_size, 1). Measures how pure the separated sound is (i.e., how little unwanted sound is included).
faithfulness (torch.Tensor): Faithfulness of shape (batch_size, 1). For target sounds present in the extracted audio, measures how similar they sound to their counterparts in the input audio.

All scores are continuous values where higher values indicate better quality.
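The sketch below makes the (batch_size, 1) layout concrete. It defines a hypothetical stand-in that mirrors the four attributes of SAMAudioJudgeOutput using nested Python lists, the same shape `tensor.tolist()` would produce. It then splits a batch into one dictionary per example. It is not the real class; it only illustrates how the fields line up by batch index.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass
class JudgeOutputStandIn:
    """Hypothetical stand-in for SAMAudioJudgeOutput.

    Each field is a nested list of shape (batch_size, 1).
    """

    overall: List[List[float]]
    recall: List[List[float]]
    precision: List[List[float]]
    faithfulness: List[List[float]]

    def per_example(self) -> List[Dict[str, float]]:
        """Split the batch into one {metric: score} dict per example."""
        names = [f.name for f in fields(self)]
        batch_size = len(self.overall)
        for name in names:
            if len(getattr(self, name)) != batch_size:
                raise ValueError(f"{name} does not have batch_size={batch_size} rows")
        # Row i of every field belongs to example i; [0] drops the trailing size-1 dim
        return [{name: getattr(self, name)[i][0] for name in names} for i in range(batch_size)]


# Placeholder values for a batch of two examples
batch = JudgeOutputStandIn(
    overall=[[1.0], [2.0]],
    recall=[[0.5], [1.5]],
    precision=[[3.0], [4.0]],
    faithfulness=[[2.5], [0.5]],
)
for i, row in enumerate(batch.per_example()):
    print(i, row)
```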

Citation

If you use SAM-Audio Judge in your research, please cite:

@article{sam-audio,
  title={SAM-Audio: Segment Anything in Audio},
  author={Bowen Shi and Andros Tjandra and John Hoffman and Helin Wang and Yi-Chiao Wu and Luya Gao and Julius Richter and Matt Le and Apoorv Vyas and Sanyuan Chen and Christoph Feichtenhofer and Piotr Dollár and Wei-Ning Hsu and Ann Lee},
  year={2025},
  url={arxiv link coming soon}
}

License

This project is licensed under the SAM License. See the LICENSE file for details.