Final changes
- docs/dataset_card.md +4 -2
- docs/model_card.md +0 -20
docs/dataset_card.md
CHANGED
@@ -84,12 +84,14 @@ The NLBSE24 dataset was created to foster research and development in natural la
 
 #### Data Collection and Processing
 
-The original data consists of GitHub issue reports collected from various open-source repositories.
+The original data consists of GitHub issue reports collected from various open-source repositories.
+<!--
+The preprocessing for our "soft-cleaned" version includes:
 1. **Column Identification:** Identifying 'label' and 'issue' columns for target and text data respectively.
 2. **Basic Structural Cleaning:**
 * Removal of rows with inconsistent data types in columns.
 * Removal of rows containing missing values (NaN) or empty strings across relevant fields.
-* Removal of exact duplicate rows.
+* Removal of exact duplicate rows. -->
 
 #### Who are the source data producers?
 
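The "soft-cleaned" preprocessing steps listed in the docs/dataset_card.md hunk above (and commented out by this commit) could be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the project's actual pipeline: the function name `soft_clean`, the representation of rows as dictionaries, the assumption that both columns should hold strings, and the choice to compare duplicates on the two relevant columns only are all hypothetical.

```python
import math


def soft_clean(rows, text_col="issue", label_col="label"):
    """Return the rows that survive the basic structural cleaning.

    Hypothetical helper: `rows` is assumed to be a list of dicts, and
    both the text and label columns are assumed to hold strings.
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Column identification: 'issue' is the text, 'label' is the target.
        text, label = row.get(text_col), row.get(label_col)

        # Drop rows with missing values (None or NaN).
        if any(
            v is None or (isinstance(v, float) and math.isnan(v))
            for v in (text, label)
        ):
            continue

        # Drop rows whose values do not have the expected (string) type.
        if not isinstance(text, str) or not isinstance(label, str):
            continue

        # Drop rows with empty strings in the relevant fields.
        if text == "" or label == "":
            continue

        # Drop exact duplicates (assumption: compared on these two columns).
        key = (text, label)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({text_col: text, label_col: label})
    return cleaned


rows = [
    {"issue": "App crashes on start", "label": "bug"},
    {"issue": "App crashes on start", "label": "bug"},  # exact duplicate
    {"issue": "", "label": "feature"},                  # empty string
    {"issue": None, "label": "bug"},                    # missing value
    {"issue": 42, "label": "question"},                 # inconsistent type
    {"issue": "Add dark mode", "label": "feature"},
]
cleaned = soft_clean(rows)
```

With the sample rows above, only the first "App crashes on start" row and the "Add dark mode" row remain. A DataFrame-based pipeline would express the same filters differently; the order of the checks here is one reasonable choice, not something the dataset card specifies.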
docs/model_card.md
CHANGED
@@ -221,26 +221,6 @@ If you use this model in your research, consider citing the relevant SetFit and
 
 **BibTeX:**
 
-```bibtex
-@article{setfit2022,
-title={{SetFit: Efficient Few-Shot Learning with Sentence Transformers}},
-author={Hofst{\"a}tter, Philipp and Reimers, Nils and {de Jong}, Henri and van der Vegt, Wouter and van der Velde, Maarten and Rausch, Andreas and Aken, Bob van and Pietsch, Stefan and Godey, Julien and van der Goot, Rob and de Jong, Iryna and Gurevych, Iryna and de Rijke, Maarten},
-journal={arXiv preprint arXiv:2209.11055},
-year={2022}
-}
-@inproceedings{wang-etal-2020-mini,
-title = "{MiniLM}: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers",
-author = "Wang, Wenhui and
-Gao, Furu and
-Chen, Bin and
-Chen, Yaojie and
-Li, Shijian and
-Han, Shuzhou",
-booktitle = "Advances in Neural Information Processing Systems",
-volume = "33",
-year = "2020"
-}
-```
 
 **APA:**
 