TCGA & IMPACT Genomic Biomarker WSI Training Checkpoints

This repository hosts the complete set of epoch-200 classification checkpoints used for genomic biomarker prediction across the TCGA and IMPACT cohorts.

Checkpoints are organized strictly by:

  • Dataset source (TCGA or IMPACT)
  • Tumor type (e.g., HNSC, UCS, BRCA)
  • Gene (e.g., PIK3CA, FBXW7, BRAF)
  • Encoder (e.g., virchow, gigapath_ft)
  • Data split index (split_1, split_2, ...)

Repository Structure

The exact directory layout in this Hugging Face repo is:

TCGA_Genomic_Biomarker_WSI_Training/
├── TCGA/
│   └── checkpoints/
│       └── <TUMOR>/
│           └── <GENE>/
│               └── TCGA_trained_<TUMOR>_<GENE>_<ENCODER>_gma_<SPLIT>_200.pth
│
└── IMPACT/
    └── checkpoints/
        └── <TUMOR>/
            └── <GENE>/
                └── IMPACT_trained_<TUMOR>_<GENE>_<ENCODER>_gma_<SPLIT>_200.pth
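
Because the layout is strict, a checkpoint's repo-relative path is fully determined by these five fields. A minimal sketch of assembling one (build_remote_path is an illustrative helper, not part of this repo):

def build_remote_path(source, tumor, gene, encoder, split, epoch=200):
    """Assemble the repo-relative checkpoint path per the layout above."""
    fname = f"{source}_trained_{tumor}_{gene}_{encoder}_gma_{split}_{epoch}.pth"
    return f"{source}/checkpoints/{tumor}/{gene}/{fname}"

print(build_remote_path("TCGA", "HNSC", "PIK3CA", "virchow", 1))
# TCGA/checkpoints/HNSC/PIK3CA/TCGA_trained_HNSC_PIK3CA_virchow_gma_1_200.pth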

Examples

TCGA/checkpoints/HNSC/PIK3CA/
    TCGA_trained_HNSC_PIK3CA_virchow_gma_1_200.pth
    TCGA_trained_HNSC_PIK3CA_virchow_gma_2_200.pth
    TCGA_trained_HNSC_PIK3CA_gigapath_ft_gma_1_200.pth

IMPACT/checkpoints/UCS/FBXW7/
    IMPACT_trained_UCS_FBXW7_virchow_gma_1_200.pth
    IMPACT_trained_UCS_FBXW7_gigapath_ft_gma_2_200.pth

Each checkpoint filename is self-descriptive:

<SOURCE>_trained_<TUMOR>_<GENE>_<ENCODER>_gma_<SPLIT>_200.pth
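
To recover the fields from a filename, note that encoder names can contain underscores (e.g., gigapath_ft) while the tumor and gene tokens in the examples above do not, so a naive split("_") is unreliable. A regex sketch under that assumption:

import re

PATTERN = re.compile(
    r"^(?P<source>TCGA|IMPACT)_trained_"
    r"(?P<tumor>[^_]+)_(?P<gene>[^_]+)_"
    r"(?P<encoder>.+)_gma_(?P<split>\d+)_200\.pth$"
)

m = PATTERN.match("TCGA_trained_HNSC_PIK3CA_gigapath_ft_gma_1_200.pth")
print(m.groupdict())
# {'source': 'TCGA', 'tumor': 'HNSC', 'gene': 'PIK3CA',
#  'encoder': 'gigapath_ft', 'split': '1'}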

Downloading

1. Clone with Git LFS (recommended)

git lfs install
git clone https://huggingface.co/chadvanderbilt/TCGA_Genomic_Biomarker_WSI_Training
cd TCGA_Genomic_Biomarker_WSI_Training
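
A full clone downloads every checkpoint through LFS. If you only need a subset, snapshot_download from huggingface_hub can restrict the download with allow_patterns; a sketch fetching a single tumor/gene directory:

from huggingface_hub import snapshot_download

# Fetch only one tumor/gene directory instead of the whole repo.
local_dir = snapshot_download(
    repo_id="chadvanderbilt/TCGA_Genomic_Biomarker_WSI_Training",
    allow_patterns=["TCGA/checkpoints/HNSC/PIK3CA/*"],
)
print(local_dir)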

2. Download an individual checkpoint

from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="chadvanderbilt/TCGA_Genomic_Biomarker_WSI_Training",
    filename="TCGA/checkpoints/HNSC/PIK3CA/TCGA_trained_HNSC_PIK3CA_virchow_gma_1_200.pth"
)
print(ckpt_path)
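
The .pth files can then be loaded with PyTorch. The exact payload (a bare state_dict or a wrapper dict) depends on the training code, so inspect it before wiring it into a model; a minimal sketch:

import torch

# Load onto CPU; recent PyTorch versions default to weights_only=True,
# which may need to be relaxed if the checkpoint stores non-tensor objects.
ckpt = torch.load(ckpt_path, map_location="cpu")

if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])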

Checksum Logs (SHA256)

Each upload run writes a checksum log under:

logs/checkpoint_checksums_YYYYMMDD_HHMMSS.json

Each entry in this JSON file includes:

  • source (TCGA or IMPACT)
  • tumor
  • gene
  • encoder
  • split
  • remote_path (path inside this repo)
  • size_bytes
  • sha256
  • timestamp

These logs allow you to verify that your local copies of the checkpoints match the originals used at upload time.
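
For example, to summarize a log by source and tumor type (using the field names listed above):

import json
from collections import Counter

with open("logs/checkpoint_checksums_YYYYMMDD_HHMMSS.json") as f:
    records = json.load(f)

# Count checkpoints per (source, tumor) pair.
counts = Counter((r["source"], r["tumor"]) for r in records)
for (source, tumor), n in sorted(counts.items()):
    print(f"{source}/{tumor}: {n} checkpoint(s)")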


Verifying Checkpoints After Download

This repo includes a helper script verify_checkpoints.py for checksum verification.

Usage

From the root of the cloned repo:

python verify_checkpoints.py logs/checkpoint_checksums_YYYYMMDD_HHMMSS.json

The script will:

  1. Read the JSON log.
  2. For each record, look up the file at remote_path under the repo root.
  3. Recompute SHA256 and size.
  4. Compare with the logged sha256 and size_bytes.

Example output:

OK       : 128
MISMATCH : 0
MISSING  : 0

Status meanings:

  • OK – file exists and matches checksum and size.
  • MISMATCH – file exists but checksum or size does not match the log.
  • MISSING – file listed in the log is not present on disk.

The script exits with a non-zero status code if there are any mismatches or missing files.
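
To spot-check a single file without the script, any standard SHA256 tool works; compare the output against the sha256 field in the log, e.g. on Linux:

sha256sum TCGA/checkpoints/HNSC/PIK3CA/TCGA_trained_HNSC_PIK3CA_virchow_gma_1_200.pth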


verify_checkpoints.py

For convenience, the script is reproduced in full below:

import json, hashlib, sys
from pathlib import Path

def sha256_file(path, buf=1024*1024):
    """Stream the file in 1 MiB chunks and return its SHA256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(buf)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

def main(log_json: str):
    """Check every record in the checksum log against the local files."""
    log_file = Path(log_json)
    if not log_file.is_file():
        print(f"ERROR: log not found: {log_json}")
        sys.exit(1)

    with log_file.open() as f:
        records = json.load(f)

    # Paths in the log are relative to the repo root, where this script lives.
    repo_root = Path(__file__).resolve().parent

    ok = mismatch = missing = 0

    for rec in records:
        remote_path = rec["remote_path"]
        expected_sha = rec["sha256"]
        expected_size = rec["size_bytes"]

        local_path = repo_root / remote_path

        if not local_path.exists():
            print(f"[MISSING] {remote_path}")
            missing += 1
            continue

        actual_size = local_path.stat().st_size
        actual_sha = sha256_file(local_path)

        if actual_sha == expected_sha and actual_size == expected_size:
            ok += 1
        else:
            mismatch += 1
            print(f"[MISMATCH] {remote_path}")
            print(f"  expected sha : {expected_sha}")
            print(f"  actual sha   : {actual_sha}")
            print(f"  expected size: {expected_size}")
            print(f"  actual size  : {actual_size}")

    print()
    print(f"OK       : {ok}")
    print(f"MISMATCH : {mismatch}")
    print(f"MISSING  : {missing}")

    if mismatch or missing:
        sys.exit(1)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python verify_checkpoints.py logs/checkpoint_checksums_YYYYMMDD_HHMMSS.json")
        sys.exit(1)
    main(sys.argv[1])

You can either copy this script into your local clone or use the version shipped directly in the repository (if present).


License

MIT
