Update README.md
README.md CHANGED

This repository contains the PyTorch version of the InternVL model weights.

InternVL scales up the ViT to _**6B parameters**_ and aligns it with LLM.

It is trained using web-scale, noisy image-text pairs. The data are all publicly available and comprise multilingual content, including LAION-en, LAION-multi, LAION-COCO, COYO, Wukong, CC12M, CC3M, and SBU.

It is _**the largest open-source vision/vision-language foundation model (14B)**_ to date, achieving _**32 state-of-the-art**_ results on a wide range of tasks, including visual perception, cross-modal retrieval, and multimodal dialogue.

# Pretrained Weights

| model name           | type    | download                                                                                    | size    |
| -------------------- | ------- | :-----------------------------------------------------------------------------------------: | :-----: |
| InternViT-6B-224px   | pytorch | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL/blob/main/intern_vit_6b_224px.pth)   | 12 GB   |
| InternVL-C-13B-224px | pytorch | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL/blob/main/internvl_c_13b_224px.pth)  | 25.4 GB |

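The checkpoints can be pulled directly from the Hugging Face Hub. The snippet below is a minimal sketch, not the repository's official loading code: it assumes the `.pth` files are ordinary `torch.save` artifacts and that `huggingface_hub` is installed; the actual model-building code lives in the OpenGVLab/InternVL repository.

```python
# Minimal sketch: fetch a checkpoint from the Hub and peek inside it.
# Assumes the .pth file is a standard torch.save artifact (an assumption,
# not documented here).
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="OpenGVLab/InternVL",
    filename="intern_vit_6b_224px.pth",
)

state = torch.load(ckpt_path, map_location="cpu")
if isinstance(state, dict):
    # The exact key layout is not documented here, so just report what loaded.
    print(f"Loaded {len(state)} top-level entries, e.g. {list(state)[:3]}")
```
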
# Linear-Probe Image Classification (ImageNet Series)

| model name         | IN-1K | IN-ReaL | IN-V2 | IN-A | IN-R | IN-Sketch | download |
| ------------------ | :---: | :-----: | :---: | :--: | :--: | :-------: | :------: |
| InternViT-6B-224px | 88.2  | 90.4    | 79.9  | 77.5 | 89.8 | 69.1      | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/intern_vit_6b_224px_head.pth) \| [log](https://github.com/OpenGVLab/InternVL/blob/main/classification/work_dirs/intern_vit_6b_1k_224/log_rank0.txt) |

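These scores come from linear probing: the InternViT-6B backbone stays frozen and only a linear classifier trained on its features is evaluated. The sketch below merely illustrates that setup with a stand-in encoder and placeholder dimensions; it is not the repository's training recipe.

```python
import torch
import torch.nn as nn

# Toy illustration of linear probing. The "backbone" here is a stand-in for
# the frozen InternViT-6B encoder; the feature dimension is a placeholder.
feat_dim, num_classes = 1024, 1000
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, feat_dim))
for p in backbone.parameters():
    p.requires_grad = False                      # backbone frozen

head = nn.Linear(feat_dim, num_classes)          # the only trainable module
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

images = torch.randn(4, 3, 224, 224)             # dummy batch
labels = torch.randint(0, num_classes, (4,))

with torch.no_grad():                            # features computed without grad
    feats = backbone(images)
loss = nn.functional.cross_entropy(head(feats), labels)
loss.backward()
optimizer.step()
```
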
# Semantic Segmentation (ADE20K)

| type            | backbone              | head    | mIoU | config | download |
| --------------- | --------------------- | :-----: | :--: | :----: | :------: |
| head tuning     | InternViT-6B (frozen) | UperNet | 54.9 | [config](https://github.com/OpenGVLab/InternVL/blob/main/segmentation//configs/intern_vit_6b/head_tuning/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.py) | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.pth) \| [log](https://huggingface.co/OpenGVLab/InternVL/raw/main/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.log) |
| full tuning     | InternViT-6B          | UperNet | 58.9 | [config](https://github.com/OpenGVLab/InternVL/blob/main/segmentation//configs/intern_vit_6b/full_tuning/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5.py) | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5.pth) \| [log](https://huggingface.co/OpenGVLab/InternVL/raw/main/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5.log) |

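The config and checkpoint names follow the MMSegmentation 0.x convention, so inference can presumably be run through its high-level API once the repository's custom InternViT-6B backbone is importable. The entry points and file paths below are assumptions for illustration; defer to the scripts under `segmentation/` in the repository if they differ.

```python
# Hedged sketch: ADE20K-style inference via the MMSegmentation 0.x API.
# Assumes (a) mmsegmentation 0.x is installed, (b) the repo's custom
# InternViT-6B backbone is registered (e.g. run from the repo's
# segmentation/ directory so its modules are importable), and (c) the
# config and checkpoint were downloaded via the links in the table above.
from mmseg.apis import inference_segmentor, init_segmentor

config = "configs/intern_vit_6b/head_tuning/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.py"
checkpoint = "upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.pth"

model = init_segmentor(config, checkpoint, device="cuda:0")
result = inference_segmentor(model, "demo.jpg")  # any RGB image path
print(result[0].shape)                           # H x W map of class indices
```
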
# License

This project is released under the MIT license. Parts of this project contain code and models from other sources, which are subject to their respective licenses.