Commit 792ff9a (verified), committed by pedrodev2026 · 1 Parent(s): 356fff2

Create DATASET_CREDITS.md

Files changed (1)
  1. DATASET_CREDITS.md +37 -0
DATASET_CREDITS.md ADDED
@@ -0,0 +1,37 @@
+ # Credits
+
+ This dataset is a combination of three existing datasets, preprocessed with **deduplication** and a **limit of 1024 tokens per example**.
+
+ ## Included Datasets
+
+ 1. **[CyberNative/Code_Vulnerability_Security_DPO](https://huggingface.co/datasets/CyberNative/Code_Vulnerability_Security_DPO)**
+    - Creator: CyberNative
+    - License: Apache 2.0
+    - Description: Code dataset focused on security vulnerabilities.
+
+ 2. **[Madras1/minimax-m2.5-code-distilled-14k](https://huggingface.co/datasets/Madras1/minimax-m2.5-code-distilled-14k)**
+    - Creator: Madras1
+    - License: Apache 2.0
+    - Description: Distilled code dataset emphasizing coding patterns and representations.
+
+ 3. **[pedrodev2026/pedro-open-distil-dataset](https://huggingface.co/datasets/pedrodev2026/pedro-open-distil-dataset)**
+    - Creator: pedrodev2026
+    - License: BSD 3-Clause
+    - Description: Custom distilled code dataset created and maintained by pedrodev2026.
+
+ ## Preprocessing
+
+ The combined dataset was prepared by:
+
+ - **Deduplicating** all examples to remove redundancy.
+ - Limiting examples to **1024 tokens each**.
+
+ ## License
+
+ The final combined dataset is licensed under **BSD 3-Clause**.
+ Users must still comply with the original licenses of the included datasets when using or redistributing those datasets in their original, unmodified form.
+
+ - Original licenses:
+   - **[CyberNative/Code_Vulnerability_Security_DPO](https://huggingface.co/datasets/CyberNative/Code_Vulnerability_Security_DPO)**: Apache 2.0
+   - **[Madras1/minimax-m2.5-code-distilled-14k](https://huggingface.co/datasets/Madras1/minimax-m2.5-code-distilled-14k)**: Apache 2.0
+   - **[pedrodev2026/pedro-open-distil-dataset](https://huggingface.co/datasets/pedrodev2026/pedro-open-distil-dataset)**: BSD 3-Clause
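
The credits file names two preprocessing steps, deduplication and a limit of 1024 tokens per example, but does not say how they were implemented. The following is a minimal Python sketch of one way such steps could look, not the author's pipeline. It rests on several assumptions: examples are plain strings, duplicates are exact matches after stripping surrounding whitespace, over-length examples are dropped rather than truncated, and a whitespace split stands in for the unspecified tokenizer. All function names are hypothetical.

```python
import hashlib
from typing import Callable, Iterable, List

MAX_TOKENS = 1024  # limit stated in DATASET_CREDITS.md


def whitespace_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer; the tokenizer actually used is not specified."""
    return text.split()


def deduplicate(examples: Iterable[str]) -> List[str]:
    """Drop exact duplicates (ignoring surrounding whitespace), keeping first occurrences."""
    seen = set()
    unique = []
    for text in examples:
        digest = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def limit_tokens(
    examples: Iterable[str],
    max_tokens: int = MAX_TOKENS,
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> List[str]:
    """Keep examples with at most max_tokens tokens (assumption: filter, not truncate)."""
    return [text for text in examples if len(tokenize(text)) <= max_tokens]


def preprocess(examples: Iterable[str], max_tokens: int = MAX_TOKENS) -> List[str]:
    """Apply deduplication first, then the token limit."""
    return limit_tokens(deduplicate(examples), max_tokens)


if __name__ == "__main__":
    sample = ["print('a')", "print('a')  ", "x = 1", "word " * 2000]
    print(preprocess(sample))
```

If the limit was instead applied by truncation, `limit_tokens` would cut each example's token list to `max_tokens` rather than discard the example; the credits file does not say which was done.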