Open-weight model

II-Medical 8B

An open AI model for medical questions and clinical text. It runs fully offline on a small desktop, so patient details never leave the clinic.

Field: Medicine
Runs: Fully offline
Built on: Intelligent-Internet/II-Medical-8B
License: Apache-2.0, free

What it can do

It answers health and medical questions, explains conditions and terms in plain words, and works through clinical text step by step. It is built on Intelligent-Internet’s II-Medical-8B, a reasoning model trained to think a problem through before it answers instead of jumping straight to a guess. This release packages it as ready-to-run files for a single desktop.

How well it works

We scored five builds on a 50-question sample of MedMCQA, a set of multiple-choice questions from medical entrance exams, on a small Spark desktop. The Q5_K_M build scored best at 52 percent, ahead of the full-size F16 build at 48 percent, while generating about 36 tokens a second. The Benchmarks table below lists every build.

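This is a short test. With 50 questions, each question is worth 2 percentage points, so the lead of Q5_K_M (52 percent) over the full-size build (48 percent) comes down to two questions and may not hold on a larger run. The model is a study and drafting helper, not a doctor. It can be wrong, so never use it to make a real medical decision. Always check with a qualified clinician.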
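With only 50 questions, every score carries a wide margin of error. The sketch below puts a number on it by computing a 95 percent Wilson score interval for each build. It assumes every build answered the same 50 questions, so a score of 52 percent means 26 correct. The helper name wilson_ci is hypothetical, and the Wilson interval is a standard statistical choice of ours, not something the benchmark run itself reports.

```shell
#!/bin/sh
# Sketch: 95% Wilson score interval for a benchmark accuracy.
# Assumption: every build answered the same N=50 MedMCQA questions.

# wilson_ci BUILD PERCENT N   (hypothetical helper name)
wilson_ci() {
  awk -v build="$1" -v pct="$2" -v n="$3" 'BEGIN {
    z = 1.96                          # two-sided 95% normal quantile
    k = pct * n / 100                 # questions answered correctly
    p = k / n                         # observed accuracy as a fraction
    denom  = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half   = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    printf "%-7s %d/%d correct, 95%% interval %.0f%% to %.0f%%\n",
           build, k, n, 100 * (center - half), 100 * (center + half)
  }'
}

# Scores copied from the Benchmarks table on this page.
for row in "Q4_K_M 42" "Q5_K_M 52" "Q6_K 46" "Q8_0 48" "F16 48"; do
  set -- $row                         # split "BUILD PERCENT" into $1 and $2
  wilson_ci "$1" "$2" 50
done
```

For Q5_K_M (26 of 50) this gives roughly 39 to 65 percent, and the F16 interval (24 of 50) overlaps it heavily, so this run alone cannot firmly separate the builds.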

How to run it

Download the GGUF files (the ready-to-run format) and run them with llama.cpp on a Spark-class desktop, a small AI machine with 128 GB of memory. The Q5_K_M build is the sweet spot here: the best score and a healthy 36 tokens a second.

Install

huggingface-cli download Orionfold/II-Medical-8B-GGUF

Use it

llama-cli -hf Orionfold/II-Medical-8B-GGUF:Q5_K_M -p "Explain the difference between Type 1 and Type 2 diabetes."
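The -hf form fetches the model from Hugging Face, so it needs a network connection at least the first time. For a clinic machine that must stay offline, one approach is to download a single build once and then point llama-cli at the local file with -m. The sketch below assumes the files were saved to a folder named II-Medical-8B-GGUF and that the Q5_K_M file name contains "Q5_K_M"; the folder name and the find_gguf helper are hypothetical.

```shell
#!/bin/sh
# One-time step on a machine with internet access (fetches only the Q5_K_M build):
#   huggingface-cli download Orionfold/II-Medical-8B-GGUF --include "*Q5_K_M*" --local-dir ./II-Medical-8B-GGUF

MODEL_DIR=${MODEL_DIR:-./II-Medical-8B-GGUF}   # assumed download folder

# find_gguf DIR TAG: print the first .gguf file under DIR whose name contains TAG.
find_gguf() {
  find "$1" -type f -name "*$2*.gguf" 2>/dev/null | sort | head -n 1
}

model=$(find_gguf "$MODEL_DIR" Q5_K_M)
if [ -z "$model" ]; then
  echo "No Q5_K_M .gguf file found under $MODEL_DIR" >&2
elif ! command -v llama-cli >/dev/null 2>&1; then
  echo "llama-cli is not on PATH" >&2
else
  # -m loads the local file, so no network access is needed from here on.
  llama-cli -m "$model" -p "Explain the difference between Type 1 and Type 2 diabetes."
fi
```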

Specs

Base model: Intelligent-Internet/II-Medical-8B
Format: GGUF (ready to run)
Builds: Q4_K_M · Q5_K_M · Q6_K · Q8_0 · F16
Best build: Q5_K_M (about 36 tokens a second on a Spark desktop)
License: Apache-2.0 (free to use)

Benchmarks

| Build | MedMCQA score (50 questions) | Speed on a Spark (tokens a second) |
|---|---|---|
| Q4_K_M | 42% | 44 |
| Q5_K_M (best pick) | 52% | 36 |
| Q6_K | 46% | 33 |
| Q8_0 | 48% | 28 |
| F16 (full size) | 48% | 16 |
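The speed column depends on the machine, so numbers on other hardware will differ. llama.cpp ships a llama-bench tool that reports tokens per second, and the sketch below runs it over every downloaded build. The page does not say how its own speeds were measured, so the settings here (-p 0 to skip the prompt-processing test, -n 128 generated tokens) are assumptions, and the folder name and list_builds helper are hypothetical.

```shell
#!/bin/sh
# Sketch: measure generation speed for each local build with llama.cpp's llama-bench.
MODEL_DIR=${MODEL_DIR:-./II-Medical-8B-GGUF}   # assumed download folder

# list_builds DIR: print every .gguf file under DIR, one per line, sorted.
list_builds() {
  find "$1" -type f -name '*.gguf' 2>/dev/null | sort
}

if ! command -v llama-bench >/dev/null 2>&1; then
  echo "llama-bench is not on PATH" >&2
else
  list_builds "$MODEL_DIR" | while IFS= read -r f; do
    echo "== $f"
    # -p 0 skips the prompt test; -n 128 times 128 generated tokens (shown as tg128).
    llama-bench -m "$f" -p 0 -n 128
  done
fi
```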

Used in the open

Live counts from Hugging Face, refreshed when the site builds. Built and maintained in the open by Orionfold.

Downloads in the last 30 days: 455