judge_answer___29_deberta_v3_base_msmarco_answerability
This model is a fine-tuned version of microsoft/deberta-v3-base on tom-010/msmarcov2.1-binary-answerability. The dataset is heavily imbalanced (only 6% positives). The training notebook addressed this by downsampling the negative examples so that the positive-to-negative ratio is 1-to-1.
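The 1-to-1 balancing described above can be sketched as follows (a toy example in plain Python; the actual notebook may have used the datasets library instead):

```python
import random

random.seed(42)

# Toy stand-in for the answerability dataset: roughly 6% positives.
examples = [{"label": 1} for _ in range(6)] + [{"label": 0} for _ in range(94)]

positives = [e for e in examples if e["label"] == 1]
negatives = [e for e in examples if e["label"] == 0]

# Downsample negatives so positives and negatives are 1-to-1.
sampled_negatives = random.sample(negatives, len(positives))
balanced = positives + sampled_negatives
random.shuffle(balanced)

print(len(balanced))  # 12: six positives plus six sampled negatives
```

Downsampling discards most negatives, but with ~12M passage/query pairs in MS MARCO v2 there is still ample training data after balancing.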
It achieves the following results on the evaluation set:
- Loss: 0.4194
- Accuracy: 0.8164
- Precision: 0.7814
- Recall: 0.8815
- F1: 0.8284
See the run here: https://wandb.ai/stadeltom-com/huggingface/runs/l5mt601p?nw=nwuserstadeltom
Model description
The model is a fine-tuned DeBERTa v3 that classifies whether a question/query is answered by a given text (passage).
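A minimal inference sketch, assuming the standard transformers text-classification API (the question and passage are passed as a text pair; label names depend on the saved config and may be e.g. LABEL_0/LABEL_1):

```python
from transformers import pipeline

# Load the fine-tuned answerability judge as a text-classification pipeline.
classifier = pipeline(
    "text-classification",
    model="tom-010/judge_answer___29_deberta_v3_base_msmarco_answerability",
)

question = "What is the capital of France?"
passage = "Paris is the capital and most populous city of France."

# Cross-encoder input: the query and the passage as a text pair.
result = classifier({"text": question, "text_pair": passage})
print(result)  # a dict with 'label' and 'score'
```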
Intended uses & limitations
The task is to judge whether a text answers a question. The dataset is built from MS MARCO v2, where each query comes with 10 search results from the Bing search engine. An annotator answered the question and marked the passages (search results) used for the answer. The dataset iterates over each passage of each query and records the query, the passage, and whether the passage was used for the answer. The downside: false negatives are entirely possible (a passage may answer the question without having been marked). The upside: this is a realistic setting, since in practice we also get 10 search results and need to filter them. But note: the baseline performance on this data is unknown.
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
- mixed_precision_training: Native AMP
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| 0.5008 | 0.0272 | 2000 | 0.4931 | 0.7864 | 0.7498 | 0.8632 | 0.8025 |
| 0.4832 | 0.0544 | 4000 | 0.4565 | 0.7858 | 0.7422 | 0.8795 | 0.8050 |
| 0.4716 | 0.0816 | 6000 | 0.4758 | 0.7926 | 0.7527 | 0.8751 | 0.8093 |
| 0.4645 | 0.1088 | 8000 | 0.4740 | 0.7878 | 0.7633 | 0.8377 | 0.7988 |
| 0.4697 | 0.1360 | 10000 | 0.4519 | 0.7982 | 0.7720 | 0.8496 | 0.8089 |
| 0.4729 | 0.1632 | 12000 | 0.4471 | 0.7946 | 0.7664 | 0.8508 | 0.8064 |
| 0.4589 | 0.1904 | 14000 | 0.4455 | 0.8002 | 0.7661 | 0.8675 | 0.8137 |
| 0.4513 | 0.2176 | 16000 | 0.4726 | 0.7934 | 0.7472 | 0.8902 | 0.8125 |
| 0.4573 | 0.2448 | 18000 | 0.4357 | 0.8016 | 0.7775 | 0.8481 | 0.8113 |
| 0.4474 | 0.2720 | 20000 | 0.4738 | 0.7932 | 0.7503 | 0.8823 | 0.8110 |
| 0.4480 | 0.2992 | 22000 | 0.4360 | 0.7934 | 0.7940 | 0.7955 | 0.7948 |
| 0.4490 | 0.3264 | 24000 | 0.4464 | 0.7996 | 0.7708 | 0.8560 | 0.8112 |
| 0.4490 | 0.3536 | 26000 | 0.4467 | 0.8048 | 0.7655 | 0.8819 | 0.8196 |
| 0.4483 | 0.3808 | 28000 | 0.4459 | 0.8042 | 0.7603 | 0.8918 | 0.8208 |
| 0.4468 | 0.4080 | 30000 | 0.4400 | 0.8054 | 0.7898 | 0.8353 | 0.8119 |
| 0.4413 | 0.4352 | 32000 | 0.4321 | 0.8048 | 0.7917 | 0.8302 | 0.8105 |
| 0.4444 | 0.4624 | 34000 | 0.4309 | 0.8086 | 0.7691 | 0.8850 | 0.8230 |
| 0.4507 | 0.4896 | 36000 | 0.4301 | 0.8124 | 0.7945 | 0.8457 | 0.8193 |
| 0.4426 | 0.5168 | 38000 | 0.4243 | 0.8052 | 0.7698 | 0.8739 | 0.8186 |
| 0.4321 | 0.5440 | 40000 | 0.4243 | 0.8074 | 0.7681 | 0.8839 | 0.8219 |
| 0.4301 | 0.5712 | 42000 | 0.4380 | 0.8060 | 0.7640 | 0.8886 | 0.8216 |
| 0.4418 | 0.5984 | 44000 | 0.4280 | 0.8096 | 0.7857 | 0.8544 | 0.8186 |
| 0.4334 | 0.6256 | 46000 | 0.4326 | 0.8090 | 0.7765 | 0.8707 | 0.8209 |
| 0.4385 | 0.6528 | 48000 | 0.4273 | 0.8116 | 0.7844 | 0.8624 | 0.8215 |
| 0.4337 | 0.6800 | 50000 | 0.4306 | 0.8086 | 0.7795 | 0.8636 | 0.8194 |
| 0.4294 | 0.7072 | 52000 | 0.4397 | 0.8110 | 0.7706 | 0.8886 | 0.8254 |
| 0.4276 | 0.7344 | 54000 | 0.4344 | 0.8138 | 0.7770 | 0.8831 | 0.8267 |
| 0.4183 | 0.7616 | 56000 | 0.4291 | 0.8120 | 0.7650 | 0.9037 | 0.8286 |
| 0.4226 | 0.7888 | 58000 | 0.4342 | 0.8134 | 0.7767 | 0.8827 | 0.8263 |
| 0.4266 | 0.8160 | 60000 | 0.4234 | 0.8132 | 0.7840 | 0.8675 | 0.8236 |
| 0.4285 | 0.8432 | 62000 | 0.4167 | 0.8156 | 0.7882 | 0.8660 | 0.8252 |
| 0.4265 | 0.8704 | 64000 | 0.4206 | 0.8142 | 0.7734 | 0.8918 | 0.8284 |
| 0.4290 | 0.8976 | 66000 | 0.4165 | 0.8174 | 0.7910 | 0.8656 | 0.8266 |
| 0.4308 | 0.9248 | 68000 | 0.4192 | 0.8140 | 0.7775 | 0.8827 | 0.8268 |
| 0.4248 | 0.9520 | 70000 | 0.4205 | 0.8152 | 0.7807 | 0.8795 | 0.8272 |
| 0.4250 | 0.9792 | 72000 | 0.4194 | 0.8164 | 0.7814 | 0.8815 | 0.8284 |
Framework versions
- Transformers 4.45.2
- Pytorch 2.4.1+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1