A high-quality Vietnamese pretraining dataset for LLMs
AI & ML interests: none defined yet.
Models (0): none public yet.
Datasets (17):
group2sealion/vnu-hard-clean: 29.8k rows, 8 downloads
group2sealion/web_science_extract: 11.6k rows, 8 downloads
group2sealion/qwen-gen-vnu: 856 rows, 7 downloads
group2sealion/vnu_crawl: 42.2k rows, 15 downloads
group2sealion/15mil_milestone: 2.43M rows, 9 downloads
group2sealion/sft_eval: 223 rows, 7 downloads
group2sealion/4mil_milestone: 2.53M rows, 8 downloads
group2sealion/11mil_last: 1.85M rows, 62 downloads
group2sealion/8mil_last: 1.85M rows, 18 downloads
group2sealion/last_result: 1.82M rows, 9 downloads