Introduction

The model is trained with Masked Thought Fine-Tuning (MFT), a simple variant of standard Supervised Fine-Tuning (SFT). You can refer to our code and paper below.

Results

We test it with the scripts provided in MetaMath.

Model	GSM8K	MATH
adalaw/MetaMath-Mistral-7B-MFT	79.90	29.0
meta-math/MetaMath-Mistral-7B-SFT	77.70	28.2

Downloads last month: 3

Safetensors

Model size

7B params

Tensor type

F32

Dataset used to train adalaw/MetaMath-Mistral-7B-MFT

Paper for adalaw/MetaMath-Mistral-7B-MFT

Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models

Paper • 2403.02178 • Published Mar 4, 2024 • 1

adalaw
/

MetaMath-Mistral-7B-MFT

Introduction

Links

Results

Dataset used to train adalaw/MetaMath-Mistral-7B-MFT

Paper for adalaw/MetaMath-Mistral-7B-MFT

Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models