TCGA & IMPACT Genomic Biomarker WSI Training Checkpoints
This repository hosts the full set of 200th-epoch classification checkpoints used for genomic biomarker prediction across TCGA and IMPACT cohorts.
Checkpoints are organized strictly by:
- Dataset source (TCGA or IMPACT)
- Tumor type (e.g., HNSC, UCS, BRCA)
- Gene (e.g., PIK3CA, FBXW7, BRAF)
- Encoder (e.g., virchow, gigapath_ft)
- Data split index (split_1, split_2, ...)
Repository Structure
The exact directory layout in this Hugging Face repo is:
TCGA_Genomic_Biomarker_WSI_Training/
├── TCGA/
│   └── checkpoints/
│       └── <TUMOR>/
│           └── <GENE>/
│               └── TCGA_trained_<TUMOR>_<GENE>_<ENCODER>_gma_<SPLIT>_200.pth
│
└── IMPACT/
    └── checkpoints/
        └── <TUMOR>/
            └── <GENE>/
                └── IMPACT_trained_<TUMOR>_<GENE>_<ENCODER>_gma_<SPLIT>_200.pth
Examples
TCGA/checkpoints/HNSC/PIK3CA/
    TCGA_trained_HNSC_PIK3CA_virchow_gma_1_200.pth
    TCGA_trained_HNSC_PIK3CA_virchow_gma_2_200.pth
    TCGA_trained_HNSC_PIK3CA_gigapath_ft_gma_1_200.pth
IMPACT/checkpoints/UCS/FBXW7/
    IMPACT_trained_UCS_FBXW7_virchow_gma_1_200.pth
    IMPACT_trained_UCS_FBXW7_gigapath_ft_gma_2_200.pth
Each checkpoint filename is self-descriptive:
<SOURCE>_trained_<TUMOR>_<GENE>_<ENCODER>_gma_<SPLIT>_200.pth
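Because the naming scheme is regular, the components can be recovered from a filename programmatically. The following is a minimal sketch and not part of the repository: the names `CheckpointName` and `parse_checkpoint_name` are hypothetical, and the pattern assumes that tumor and gene codes contain no underscores (encoder names such as gigapath_ft may).

```python
import re
from typing import NamedTuple


class CheckpointName(NamedTuple):
    source: str
    tumor: str
    gene: str
    encoder: str
    split: int


# Assumption: tumor and gene codes never contain "_"; the encoder may.
_PATTERN = re.compile(
    r"^(?P<source>TCGA|IMPACT)_trained_"
    r"(?P<tumor>[^_]+)_(?P<gene>[^_]+)_"
    r"(?P<encoder>.+)_gma_(?P<split>\d+)_200\.pth$"
)


def parse_checkpoint_name(filename: str) -> CheckpointName:
    """Split a checkpoint filename into its documented components."""
    m = _PATTERN.match(filename)
    if m is None:
        raise ValueError(f"not a recognized checkpoint name: {filename}")
    return CheckpointName(
        m["source"], m["tumor"], m["gene"], m["encoder"], int(m["split"])
    )


info = parse_checkpoint_name("TCGA_trained_HNSC_PIK3CA_gigapath_ft_gma_1_200.pth")
print(info.encoder, info.split)
```

Anchoring the encoder between the gene and the literal `_gma_` is what lets multi-part encoder names parse correctly.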
Downloading
1. Clone with Git LFS (recommended)
git lfs install
git clone https://huggingface.co/chadvanderbilt/TCGA_Genomic_Biomarker_WSI_Training
cd TCGA_Genomic_Biomarker_WSI_Training
2. Download an individual checkpoint
from huggingface_hub import hf_hub_download
ckpt_path = hf_hub_download(
    repo_id="chadvanderbilt/TCGA_Genomic_Biomarker_WSI_Training",
    filename="TCGA/checkpoints/HNSC/PIK3CA/TCGA_trained_HNSC_PIK3CA_virchow_gma_1_200.pth",
)
print(ckpt_path)
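When fetching several checkpoints, the in-repo path can be assembled from the naming convention instead of typed by hand. This is a sketch under stated assumptions: `checkpoint_repo_path` and `download_checkpoint` are hypothetical helpers, not functions shipped with this repository, and nothing here checks that a given source/tumor/gene/encoder/split combination actually exists in the repo.

```python
REPO_ID = "chadvanderbilt/TCGA_Genomic_Biomarker_WSI_Training"


def checkpoint_repo_path(source: str, tumor: str, gene: str,
                         encoder: str, split: int) -> str:
    """Build the path of a checkpoint inside the repo from its components."""
    if source not in ("TCGA", "IMPACT"):
        raise ValueError(f"unknown source: {source}")
    name = f"{source}_trained_{tumor}_{gene}_{encoder}_gma_{split}_200.pth"
    return f"{source}/checkpoints/{tumor}/{gene}/{name}"


def download_checkpoint(source: str, tumor: str, gene: str,
                        encoder: str, split: int) -> str:
    """Download one checkpoint and return its local path (needs network access)."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=checkpoint_repo_path(source, tumor, gene, encoder, split),
    )


print(checkpoint_repo_path("TCGA", "HNSC", "PIK3CA", "virchow", 1))
```

If a combination is not present in the repo, `hf_hub_download` raises an error rather than returning a path.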
Checksum Logs (SHA256)
Each upload run writes a checksum log under:
logs/checkpoint_checksums_YYYYMMDD_HHMMSS.json
Each entry in this JSON file includes:
- source (TCGA or IMPACT)
- tumor
- gene
- encoder
- split
- remote_path (path inside this repo)
- size_bytes
- sha256
- timestamp
These logs allow you to verify that your local copies of the checkpoints match the originals used at upload time.
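A single file can also be checked against one log entry, which is convenient for files fetched with hf_hub_download, since those are stored in the Hugging Face cache rather than in a clone. The sketch below is illustrative only: `file_matches_record` is a hypothetical helper, and it assumes each record is a dict with the size_bytes and sha256 fields listed above, with sha256 stored as a hex digest.

```python
import hashlib
from pathlib import Path


def file_matches_record(path, record, buf=1024 * 1024):
    """Return True if the file's size and SHA256 match one checksum-log record."""
    path = Path(path)
    # Cheap check first: a size mismatch avoids hashing a large file.
    if path.stat().st_size != record["size_bytes"]:
        return False
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in fixed-size chunks so large checkpoints never sit fully in memory.
        for chunk in iter(lambda: f.read(buf), b""):
            h.update(chunk)
    # Assumption: the log stores a hex digest; compare case-insensitively.
    return h.hexdigest() == record["sha256"].lower()
```

To use it, load the JSON log, select the record whose remote_path matches the file you downloaded, and pass that record together with the local path.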
Verifying Checkpoints After Download
This repo includes a helper script verify_checkpoints.py for checksum verification.
Usage
From the root of the cloned repo:
python verify_checkpoints.py logs/checkpoint_checksums_YYYYMMDD_HHMMSS.json
The script will:
- Read the JSON log.
- For each record, look up the file at remote_path under the repo root.
- Recompute SHA256 and size.
- Compare with the logged sha256 and size_bytes.
Example output:
OK : 128
MISMATCH : 0
MISSING : 0
- OK: file exists and matches checksum and size.
- MISMATCH: file exists but checksum or size does not match the log.
- MISSING: file listed in the log is not present on disk.
The script exits with a non-zero status code if there are any mismatches or missing files.
verify_checkpoints.py
For convenience, the expected content of verify_checkpoints.py is:
import json, hashlib, sys
from pathlib import Path

def sha256_file(path, buf=1024*1024):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(buf)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

def main(log_json: str):
    log_file = Path(log_json)
    if not log_file.is_file():
        print(f"ERROR: log not found: {log_json}")
        sys.exit(1)
    with log_file.open() as f:
        records = json.load(f)
    repo_root = Path(__file__).resolve().parent
    ok = mismatch = missing = 0
    for rec in records:
        remote_path = rec["remote_path"]
        expected_sha = rec["sha256"]
        expected_size = rec["size_bytes"]
        local_path = repo_root / remote_path
        if not local_path.exists():
            print(f"[MISSING] {remote_path}")
            missing += 1
            continue
        actual_size = local_path.stat().st_size
        actual_sha = sha256_file(local_path)
        if actual_sha == expected_sha and actual_size == expected_size:
            ok += 1
        else:
            mismatch += 1
            print(f"[MISMATCH] {remote_path}")
            print(f" expected sha : {expected_sha}")
            print(f" actual sha : {actual_sha}")
            print(f" expected size: {expected_size}")
            print(f" actual size : {actual_size}")
    print()
    print(f"OK : {ok}")
    print(f"MISMATCH : {mismatch}")
    print(f"MISSING : {missing}")
    if mismatch or missing:
        sys.exit(1)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python verify_checkpoints.py logs/checkpoint_checksums_YYYYMMDD_HHMMSS.json")
        sys.exit(1)
    main(sys.argv[1])
You can either copy this script into your local clone, or use the version shipped directly in the repository (if present).