BM-K committed
Commit 0cb7797 · verified · 1 Parent(s): f027ec8

Update README.md

Files changed (1)
  1. README.md +37 -81
README.md CHANGED
@@ -3,13 +3,13 @@ tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  pipeline_tag: sentence-similarity
  library_name: sentence-transformers
  ---

- # SentenceTransformer
-
- This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

  ## Model Details

@@ -20,14 +20,8 @@ This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps
  - **Output Dimensionality:** 1024 dimensions
  - **Similarity Function:** Cosine Similarity
  <!-- - **Training Dataset:** Unknown -->
- <!-- - **Language:** Unknown -->
- <!-- - **License:** Unknown -->
-
- ### Model Sources
-
- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

  ### Full Model Architecture

@@ -52,62 +46,41 @@ pip install -U sentence-transformers
  Then you can load this model and run inference.
  ```python
  from sentence_transformers import SentenceTransformer
-
- # Download from the 🤗 Hub
- model = SentenceTransformer("sentence_transformers_model_id")
- # Run inference
- sentences = [
-     'The weather is lovely today.',
-     "It's so sunny outside!",
-     'He drove to the stadium.',
  ]
- embeddings = model.encode(sentences)
- print(embeddings.shape)
- # [3, 1024]
-
- # Get the similarity scores for the embeddings
- similarities = model.similarity(embeddings, embeddings)
- print(similarities.shape)
- # [3, 3]
- ```
-
- <!--
- ### Direct Usage (Transformers)
-
- <details><summary>Click to see the direct usage in Transformers</summary>
-
- </details>
- -->
-
- <!--
- ### Downstream Usage (Sentence Transformers)
-
- You can finetune this model on your own dataset.
-
- <details><summary>Click to expand</summary>
-
- </details>
- -->
-
- <!--
- ### Out-of-Scope Use
-
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
- -->
-
- <!--
- ## Bias, Risks and Limitations
-
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
- -->
-
- <!--
- ### Recommendations
-
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
- -->

- ## Training Details

  ### Framework Versions
  - Python: 3.10.16
@@ -118,24 +91,7 @@ You can finetune this model on your own dataset.
  - Datasets: 2.21.0
  - Tokenizers: 0.21.1

- ## Citation
-
- ### BibTeX
-
- <!--
- ## Glossary
-
- *Clearly define terms in order to be accessible across audiences.*
- -->
-
- <!--
- ## Model Card Authors
-
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
- -->

- <!--
- ## Model Card Contact
-
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
- -->
 
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
+ - telepix
  pipeline_tag: sentence-similarity
  library_name: sentence-transformers
  ---

+ # PIXIE-Rune
+ An encoder-based embedding model trained on Korean and English triplets, developed by [TelePIX Co., Ltd](https://telepix.net/).

  ## Model Details

  - **Output Dimensionality:** 1024 dimensions
  - **Similarity Function:** Cosine Similarity
  <!-- - **Training Dataset:** Unknown -->
+ - **Language:** Multilingual — optimized for high performance in Korean and English
+ - **License:** apache-2.0

  ### Full Model Architecture
 
 
  Then you can load this model and run inference.
  ```python
  from sentence_transformers import SentenceTransformer
+
+ # Load the model
+ model_name = 'PIXIE-Rune-M-v1.0'
+ model = SentenceTransformer(model_name)
+
+ # Define the queries and documents
+ queries = [
+     "텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",  # In which industries does TelePIX use satellite data?
+     "국방 분야에 어떤 위성 서비스가 제공되나요?",  # What satellite services are offered in the defense sector?
+     "텔레픽스의 기술 수준은 어느 정도인가요?",  # How advanced is TelePIX's technology?
+ ]
+ documents = [
+     "텔레픽스는 국방, 농업, 자원, 해양 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다.",  # TelePIX analyzes satellite data to provide services across defense, agriculture, resources, maritime, and other fields.
+     "정찰 및 감시 목적의 위성 영상을 통해 국방 관련 정밀 분석 서비스를 제공합니다.",  # Provides precision defense analysis services based on reconnaissance and surveillance satellite imagery.
+     "TelePIX의 광학 탑재체 및 AI 분석 기술은 Global standard를 상회하는 수준으로 평가받고 있습니다.",  # TelePIX's optical payload and AI analysis technology are rated above the global standard.
+     "텔레픽스는 우주에서 수집한 정보를 분석하여 '우주 경제(Space Economy)'라는 새로운 가치를 창출하고 있습니다.",  # TelePIX analyzes information collected in space to create new value known as the "Space Economy".
+     "텔레픽스는 위성 영상 획득부터 분석, 서비스 제공까지 전 주기를 아우르는 솔루션을 제공합니다.",  # TelePIX provides end-to-end solutions covering satellite image acquisition, analysis, and service delivery.
  ]
+
+ # Compute embeddings: use `prompt_name="query"` to encode queries!
+ query_embeddings = model.encode(queries, prompt_name="query")
+ document_embeddings = model.encode(documents)
+
+ # Compute cosine similarity scores
+ scores = model.similarity(query_embeddings, document_embeddings)
+
+ # Output the results
+ for query, query_scores in zip(queries, scores):
+     doc_score_pairs = list(zip(documents, query_scores))
+     doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
+     print("Query:", query)
+     for document, score in doc_score_pairs:
+         print(score, document)
+ ```
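The ranking loop above relies on `model.similarity`, which for this model is plain cosine similarity over the embedding vectors (see "Similarity Function" in the Model Details). As a minimal self-contained sketch of that ranking logic, using made-up 3-dimensional vectors in place of the model's 1024-dimensional embeddings:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy stand-ins for query/document embeddings (hypothetical values,
# not produced by the model).
query_emb = [0.9, 0.1, 0.0]
doc_embs = {
    "relevant doc": [0.8, 0.2, 0.1],
    "unrelated doc": [0.0, 0.1, 0.9],
}

# Sort documents by similarity to the query, best first, mirroring
# the sorted(doc_score_pairs, ...) loop in the README example.
ranked = sorted(doc_embs, key=lambda d: cosine(query_emb, doc_embs[d]), reverse=True)
print(ranked)  # ['relevant doc', 'unrelated doc']
```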

  ### Framework Versions
  - Python: 3.10.16

  - Datasets: 2.21.0
  - Tokenizers: 0.21.1

+ ## Contact

+ If you have any suggestions or questions about this model, please reach out to the authors at bmkim@telepix.net.