| Rank | Model | Organization | LLM Params | Frames | Date | Overall (%) | Real-Time Visual Understanding (%) | Omni-Source Understanding (%) | Contextual Understanding (%) |
|---|---|---|---|---|---|---|---|---|---|
| | Gemini 1.5 Pro | | - | Video | 2024-06-15 | 66.91 | 75.69 | 60.22 | 47.86 |
| | Qwen2-VL | Alibaba | 7B | 768 | 2024-08-19 | 54.14 | 69.04 | 34.90 | 31.66 |
| | GPT-4o | OpenAI | - | 32 | 2024-06-15 | 60.15 | 73.28 | 44.50 | 38.70 |
| | LLaVA-NeXT-Video | Bytedance & NTU S-Lab | 32B | 64 | 2024-05-10 | 52.77 | 66.96 | 34.90 | 30.79 |
| | LLaVA-OneVision | Bytedance & NTU S-Lab | 7B | 32 | 2024-08-08 | 56.36 | 71.12 | 38.40 | 32.74 |
| | VideoLLaMA 2 | Alibaba | 7B | 32 | 2024-08-29 | 40.40 | 49.52 | 32.40 | 21.93 |
| | VILA-1.5 | NVIDIA & MIT | 8B | 14 | 2024-07-21 | 43.20 | 52.32 | 33.10 | 27.35 |
| | MiniCPM-V 2.6 | OpenBMB | 8B | 64 | 2024-08-12 | 53.85 | 67.44 | 35.00 | 34.97 |
| | Claude 3.5 Sonnet | Anthropic | - | 20 | 2024-07-30 | 57.68 | 72.44 | 36.80 | 37.70 |
| | InternVL2 | Shanghai AI Lab | 8B | 16 | 2024-07-18 | 51.40 | 63.72 | 35.80 | 32.42 |
| | Kangaroo | Meituan & UCAS | 8B | 64 | 2024-07-23 | 51.10 | 64.60 | 34.20 | 30.06 |
| | Video-CCAM | QQMM | 14B | 96 | 2024-07-16 | 42.53 | 53.96 | 29.70 | 22.88 |
| | LongVA | NTU S-Lab | 7B | 128 | 2024-06-25 | 48.66 | 59.96 | 35.40 | 29.95 |