ben burtenshaw

burtenshaw

Recent Activity

updated a dataset 4 minutes ago

agents-course/certificates

updated a dataset 31 minutes ago

reasoning-course/certificates

updated a dataset 37 minutes ago

reasoning-course/certificates

Posts 24

The Open LLM Leaderboard is completed, retired, dead, ‘ascended to a higher plane’. And in its shadow we have an amazing range of leaderboards built and maintained by the community.

In this post, I just want to list some of those great leaderboards that you should bookmark for staying up to date:

- Chatbot Arena LLM Leaderboard is the first port of call for checking out the best model. It’s not the fastest to update, because scores come from humans actually using the models, but it’s worth the wait. lmarena-ai/chatbot-arena-leaderboard

- OpenVLM Leaderboard is great for getting scores on vision-language models. opencompass/open_vlm_leaderboard

- Ai2 are doing a great job on RewardBench, and I hope they keep it up, because reward models are the unsexy workhorse of the field. allenai/reward-bench

- The GAIA leaderboard is great for evaluating agent applications. gaia-benchmark/leaderboard

🤩 This seems like such a sustainable way of building for the long term: rather than leaning on a single company to evaluate all LLMs, we share the load.

Articles 13

Making Gemma 3 think

Collections 5

Papers 2

arxiv:2502.02737

arxiv:2408.16961

Spaces 26

Agent Builder

Create an AI agent using Hugging Face Spaces

Talk to Smolagents

FastRTC Voice Agent with smolagents

Deepseek Ai DeepSeek R1

Turn text into detailed images

Unit 1 Quiz - AI Agent Fundamentals

Test your knowledge of the Agent fundamentals.

Code Quiz

A quiz app for rows of a dataset

Hub Recap

Generate Hugging Face stats image

Models 10

burtenshaw/gemma-3-4b-thinking

Updated 1 day ago • 2

burtenshaw/code-smol2-text-to-sql

Updated Nov 18, 2024 • 3

burtenshaw/Qwen2.5-3B-Instruct-GGUF

Updated Oct 30, 2024 • 37

burtenshaw/gemma-help-tiny-sft

Text Generation • Updated Aug 9, 2024 • 125 • 1

burtenshaw/Qwen1.5-0.5B-dpo-mix-7k

Text Generation • Updated Apr 3, 2024 • 15

burtenshaw/notus-merged-with-code-mistral-so-its-better-at-coding

Updated Apr 2, 2024 • 3

burtenshaw/Qwen1.5-0.5B-dpo-mix-7k-GGUF

Updated Apr 2, 2024

burtenshaw/Qwen1.5-0.5B-dpo-mix-7k-5000

Text Generation • Updated Mar 29, 2024 • 15

burtenshaw/Qwen1.5-0.5B-dpo-mix-7k-3000

Text Generation • Updated Mar 29, 2024 • 7

burtenshaw/setfit_food_annotated

Text Classification • Updated Mar 2, 2023 • 16

Datasets 29

burtenshaw/dummy-code-quiz

Viewer • Updated 29 days ago • 40 • 144

burtenshaw/frog_sft

Viewer • Updated about 1 month ago • 5 • 61

burtenshaw/exam_questions

Viewer • Updated Jan 24 • 10 • 100

burtenshaw/quiz-responses

Viewer • Updated Jan 24 • 1 • 76

burtenshaw/ohp-test-conversation

Preview • Updated Jan 8 • 37

burtenshaw/fineweb-c-prelim

Viewer • Updated Dec 17, 2024 • 157k • 68

burtenshaw/synthetic-generator-sft

Viewer • Updated Dec 12, 2024 • 10 • 62

burtenshaw/farming-dataset-synthetic-generator-classification

Viewer • Updated Dec 12, 2024 • 10 • 85

burtenshaw/cusomer-assitant

Viewer • Updated Dec 11, 2024 • 10 • 55

burtenshaw/testing-optout-app

Viewer • Updated Nov 28, 2024 • 2 • 168