Add clustering-sampling development and evaluation notebook. Add libraries via uv and downgrade numpy to 1.26.4 (required by sklearn-extra). Add Gemma tokenizer and multidimensional KS test. Finally, dvc add the dataset embeddings and the clustering-sampling sample and plot.
0964d15