Update README.md

README.md (CHANGED)

---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- telepix
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# PIXIE-Rune

PIXIE-Rune is an encoder-based embedding model trained on Korean and English triplets, developed by [TelePIX Co., Ltd](https://telepix.net/). It is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
- **Language:** Multilingual, optimized for high performance in Korean and English
- **License:** apache-2.0
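
To sanity-check these two properties from Python, the snippet below is a minimal sketch (the model path is an assumption; substitute the actual Hub id or local directory of this checkpoint):

```python
from sentence_transformers import SentenceTransformer

# Assumed path for illustration; replace with the real checkpoint location.
model = SentenceTransformer("PIXIE-Rune-M-v1.0")

print(model.get_sentence_embedding_dimension())  # expected: 1024
print(model.similarity_fn_name)                  # expected: "cosine"
```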

### Full Model Architecture

## Usage

First, install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Load the model
model_name = 'PIXIE-Rune-M-v1.0'
model = SentenceTransformer(model_name)

# Define the queries and documents (English translations of the Korean originals)
queries = [
    "In which industries does TelePIX use satellite data?",
    "What satellite services are offered for the defense sector?",
    "How advanced is TelePIX's technology?",
]
documents = [
    "TelePIX analyzes satellite data and provides services in a wide range of fields, including defense, agriculture, resources, and maritime.",
    "It provides precision analysis services for defense through satellite imagery collected for reconnaissance and surveillance purposes.",
    "TelePIX's optical payloads and AI analysis technology are evaluated as exceeding the global standard.",
    "TelePIX creates new value, the 'Space Economy', by analyzing information collected in space.",
    "TelePIX provides solutions covering the full cycle from satellite image acquisition to analysis and service delivery.",
]

# Compute embeddings: use `prompt_name="query"` to encode queries!
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute cosine similarity scores
scores = model.similarity(query_embeddings, document_embeddings)

# Output the results, ranking documents per query
for query, query_scores in zip(queries, scores):
    doc_score_pairs = list(zip(documents, query_scores))
    doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
    print("Query:", query)
    for document, score in doc_score_pairs:
        print(score, document)
```
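
Two details of this snippet are worth calling out: `prompt_name="query"` applies the query prompt stored with the model, so queries and documents are encoded asymmetrically, and `model.similarity` scores every query against every document, so with 3 queries and 5 documents `scores` has shape `[3, 5]`.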

### Framework Versions
- Python: 3.10.16
- Datasets: 2.21.0
- Tokenizers: 0.21.1
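
You can also fine-tune this model on your own dataset. The sketch below is a minimal, hypothetical example (not the authors' training recipe) that continues training on (anchor, positive, negative) triplets with the sentence-transformers v3+ trainer; the model path and toy data are assumptions:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import TripletLoss

# Assumed path for illustration; replace with the real checkpoint location.
model = SentenceTransformer("PIXIE-Rune-M-v1.0")

# Toy triplets; the column order (anchor, positive, negative) matches TripletLoss inputs.
train_dataset = Dataset.from_dict({
    "anchor": ["What satellite services are offered for the defense sector?"],
    "positive": ["Precision defense analysis is provided through reconnaissance and surveillance imagery."],
    "negative": ["Satellite data also supports agriculture and maritime monitoring."],
})

loss = TripletLoss(model)
trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()

model.save_pretrained("PIXIE-Rune-finetuned")
```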

## Contact

If you have any suggestions or questions about this model, please reach out to the authors at bmkim@telepix.net.