Fix loading instructions: use AutoModelForMaskedLM and trust_remote_code=True
README.md CHANGED
@@ -16,15 +16,17 @@ This repository contains jointly trained Nucleotide Transformer (NT) and ESM2 mo
 
 ## Usage
 
+**IMPORTANT**: Both models are masked language models. The DNA model uses the Nucleotide Transformer architecture which requires `trust_remote_code=True`.
+
 ```python
-from transformers import
+from transformers import AutoModelForMaskedLM, AutoTokenizer
 
-# Load DNA model
-dna_model =
-dna_tokenizer = AutoTokenizer.from_pretrained("vsubasri/joint-nt-esm2-transcript-coding", subfolder="dna")
+# Load DNA model - requires trust_remote_code for custom NT architecture
+dna_model = AutoModelForMaskedLM.from_pretrained("vsubasri/joint-nt-esm2-transcript-coding", subfolder="dna", trust_remote_code=True)
+dna_tokenizer = AutoTokenizer.from_pretrained("vsubasri/joint-nt-esm2-transcript-coding", subfolder="dna", trust_remote_code=True)
 
 # Load protein model
-protein_model =
+protein_model = AutoModelForMaskedLM.from_pretrained("vsubasri/joint-nt-esm2-transcript-coding", subfolder="protein")
 protein_tokenizer = AutoTokenizer.from_pretrained("vsubasri/joint-nt-esm2-transcript-coding", subfolder="protein")
 
 # Example joint usage
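The updated loading rule is that both checkpoints load through `AutoModelForMaskedLM`, and only the `dna` subfolder needs `trust_remote_code=True`. That rule can be kept in one place. The sketch below is illustrative only. `REPO_ID`, `loading_kwargs`, and `load_joint_models` are hypothetical names that are not part of the repository. It also assumes `transformers` is installed and the Hugging Face Hub is reachable when the models are actually loaded.

```python
from typing import Any, Dict, Tuple

# Hypothetical constant: the Hub repository referenced in the README diff.
REPO_ID = "vsubasri/joint-nt-esm2-transcript-coding"


def loading_kwargs(subfolder: str) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for from_pretrained.

    Only the "dna" subfolder (custom Nucleotide Transformer code) gets
    trust_remote_code=True; the "protein" subfolder loads without it,
    matching the calls in the updated README.
    """
    if subfolder not in ("dna", "protein"):
        raise ValueError(f"unknown subfolder: {subfolder!r}")
    kwargs: Dict[str, Any] = {"subfolder": subfolder}
    if subfolder == "dna":
        kwargs["trust_remote_code"] = True
    return kwargs


def load_joint_models() -> Dict[str, Tuple[Any, Any]]:
    """Hypothetical helper: load a (model, tokenizer) pair for each subfolder."""
    # Imported lazily so loading_kwargs() can be used without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    loaded: Dict[str, Tuple[Any, Any]] = {}
    for subfolder in ("dna", "protein"):
        kwargs = loading_kwargs(subfolder)
        model = AutoModelForMaskedLM.from_pretrained(REPO_ID, **kwargs)
        tokenizer = AutoTokenizer.from_pretrained(REPO_ID, **kwargs)
        loaded[subfolder] = (model, tokenizer)
    return loaded
```

Calling `load_joint_models()` downloads both checkpoints, and `dna_model, dna_tokenizer = load_joint_models()["dna"]` then mirrors the README's explicit calls. Centralizing the `trust_remote_code` decision makes it harder to forget the flag for the DNA model. It also avoids enabling remote code for a subfolder that does not need it.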