---
license: bigscience-openrail-m
language:
- en
---
|
|
GPT-R [Ronin] |
|
|
|
|
|
GPT-R is an experimental model: a parameter-wise 60/40 blend (weighted average) of the weights of ppo_hh_gpt-j and GPT-JT-6B-v1.
|
|
|
|
|
-Intended Merge Value- |
|
|
|
|
|
As with fine-tuning, merging weights does not add information but transforms it, so it is important to consider trade-offs. GPT-Ronin combines ppo_hh_gpt-j and GPT-JT; both technical achievements are blended with the intent of elevating the strengths of each. The datasets of both are linked below to assist in exploring which datasets, in what quantity and configuration, have the largest impact on a model's usefulness without the expense of fine-tuning. The blend was done in FP32 and the output saved in FP16; a sketch of the procedure follows.
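Below is a minimal sketch of such a parameter-wise blend, assuming both parents share the GPT-J-6B architecture and identical state-dict keys. This is an illustration, not the script actually used (credited to Concedo below), which may differ.

```python
# Minimal sketch of the 60/40 parameter-wise blend. Assumes both parents
# share the GPT-J-6B architecture with identical state-dict keys; the
# actual merge script may differ. Needs roughly 50 GB of RAM to hold two
# 6B models in FP32.
import torch
from transformers import AutoModelForCausalLM

# Load both parents in FP32 so the averaging runs at full precision.
model_a = AutoModelForCausalLM.from_pretrained(
    "reciprocate/ppo_hh_gpt-j", torch_dtype=torch.float32)
model_b = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/GPT-JT-6B-v1", torch_dtype=torch.float32)

state_a = model_a.state_dict()
state_b = model_b.state_dict()

# Weighted average of every tensor: 60% Model 1, 40% Model 2.
merged = {key: 0.6 * state_a[key] + 0.4 * state_b[key] for key in state_a}

model_a.load_state_dict(merged)
# Cast to FP16 for output, matching the description above.
model_a.half().save_pretrained("GPT-R")
```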
|
|
|
|
|
-Intended Use- |
|
|
|
|
|
Research purposes only, intended for responsible use. |
|
|
Express a task in natural language, and GPT-R will do its best to carry it out. Try telling it "Write an article about X but put Y spin on it." or "Write a five step numbered guide on how to do X.", or give it any other basic instructions.
|
|
|
|
|
It can also be used as a base to merge with conversational, story writing, or adventure themed models of the same class (GPT-J & 6B NeoX) and parameter size (6B) to experiment with the morphology of model weights based on the value added by instruction tuning.
|
|
|
|
|
The merge was tested using KoboldAI with Nucleus Sampling Top-P set to 0.7, Temperature at 0.5, and Repetition Penalty at 1.14, with extra samplers disabled; an equivalent transformers setup is sketched below.
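For those using the transformers library instead of KoboldAI, the same sampling settings map directly onto generate(). The snippet below is a rough equivalent (the repo id is taken from the leaderboard entry below; the prompt and token cap are arbitrary examples):

```python
# Rough transformers equivalent of the KoboldAI settings above
# (Top-P 0.7, Temperature 0.5, Repetition Penalty 1.14).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("digitous/GPT-R")
model = AutoModelForCausalLM.from_pretrained("digitous/GPT-R")

# One of the example instruction prompts from the Intended Use section.
prompt = "Write a five step numbered guide on how to brew coffee."
inputs = tokenizer(prompt, return_tensors="pt")

output = model.generate(
    **inputs,
    do_sample=True,           # sample rather than decode greedily
    top_p=0.7,                # Nucleus Sampling Top-P
    temperature=0.5,          # Temperature
    repetition_penalty=1.14,  # Repetition Penalty
    max_new_tokens=200,       # arbitrary length cap for this example
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```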
|
|
|
|
|
-Credits To- |
|
|
|
|
|
Core Model: |
|
|
https://huggingface.co/EleutherAI/gpt-j-6B |
|
|
Author: |
|
|
https://www.eleuther.ai/ |
|
|
|
|
|
Model 1; 60% ppo_hh_gpt-j:
|
|
https://huggingface.co/reciprocate/ppo_hh_gpt-j |
|
|
|
|
|
Author Repo: |
|
|
https://huggingface.co/reciprocate |
|
|
|
|
|
Related; CarperAI: |
|
|
https://huggingface.co/CarperAI |
|
|
|
|
|
The dataset is a variant of the Helpful and Harmless (HH) assistant-themed dataset, trained with Proximal Policy Optimization; the specific datasets used are unknown. The repo lists the following datasets:
|
|
https://huggingface.co/datasets/reciprocate/summarize_eval_ilql |
|
|
https://huggingface.co/datasets/reciprocate/hh_eval_ilql |
|
|
|
|
|
PPO explained: |
|
|
https://paperswithcode.com/method/ppo |
|
|
Potential HH-type datasets utilized: |
|
|
https://huggingface.co/HuggingFaceH4 |
|
|
https://huggingface.co/datasets/Anthropic/hh-rlhf |
|
|
|
|
|
Model 2; 40% GPT-JT-6B-v1:
|
|
https://huggingface.co/togethercomputer/GPT-JT-6B-v1 |
|
|
|
|
|
Author Repo: |
|
|
https://huggingface.co/togethercomputer |
|
|
|
|
|
Related; BigScience: |
|
|
https://huggingface.co/bigscience |
|
|
|
|
|
Datasets: |
|
|
https://huggingface.co/datasets/the_pile |
|
|
https://huggingface.co/datasets/bigscience/P3 |
|
|
https://github.com/allenai/natural-instructions |
|
|
https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html |
|
|
|
|
|
Weight merge script credit to Concedo:
|
|
https://huggingface.co/concedo |
|
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) |
|
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_digitous__GPT-R) |
|
|
|
|
|
| Metric | Value | |
|
|
|-----------------------|---------------------------| |
|
|
| Avg. | 35.71 | |
|
|
| ARC (25-shot) | 41.21 | |
|
|
| HellaSwag (10-shot) | 66.89 | |
|
|
| MMLU (5-shot) | 36.5 | |
|
|
| TruthfulQA (0-shot) | 34.22 | |
|
|
| Winogrande (5-shot) | 64.4 | |
|
|
| GSM8K (5-shot) | 1.59 | |
|
|
| DROP (3-shot) | 5.14 | |
|
|
|