tiny_vllm --watch

A minimal continuous-batching LLM engine — paged KV-cache, automatic prefix caching, chunked prefill, SSE streaming. Built to be read end-to-end. Below is a live visualization of the scheduler and memory pool.


Sampling controls: max_tokens · temperature · top_p

This is a recorded session — to send live prompts, run the server locally.

git clone https://github.com/surajsharan/tiny_vLLM
cd tiny_vLLM && pip install -r requirements.txt
python -m tiny_vllm.server --model Qwen/Qwen2.5-0.5B-Instruct
# then open http://localhost:8000

Block pool

Legend: free · cached (evictable) · in use · shared (refcount>1) · hashed (border)
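The block states in the legend can be illustrated with a small refcounted pool. This is a minimal sketch, not the project's actual code: the names `Block`, `BlockPool`, `acquire`, `release`, and `state` are hypothetical, and it assumes that a hashed block with refcount 0 stays resident as "cached" until an LRU eviction reclaims it.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Block:
    block_id: int
    refcount: int = 0
    prefix_hash: Optional[int] = None  # set when the block's contents are hashed


class BlockPool:
    """Toy KV-cache block pool (hypothetical names, illustrative only)."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks: List[Block] = [Block(i) for i in range(num_blocks)]
        self.free: List[int] = list(range(num_blocks))
        # hash -> block_id for hashed blocks with refcount 0, oldest first
        self.cached: "OrderedDict[int, int]" = OrderedDict()
        # hash -> block_id for every hashed block, whether in use or cached
        self.by_hash: Dict[int, int] = {}

    def state(self, block_id: int) -> str:
        b = self.blocks[block_id]
        if b.refcount > 1:
            return "shared"
        if b.refcount == 1:
            return "in use"
        return "cached" if b.prefix_hash is not None else "free"

    def acquire(self, prefix_hash: Optional[int] = None) -> int:
        # Prefix hit: reuse the existing block and bump its refcount.
        if prefix_hash is not None and prefix_hash in self.by_hash:
            bid = self.by_hash[prefix_hash]
            self.cached.pop(prefix_hash, None)  # no longer evictable
            self.blocks[bid].refcount += 1
            return bid
        if self.free:
            bid = self.free.pop()
        elif self.cached:
            # Evict the least recently released cached block.
            old_hash, bid = self.cached.popitem(last=False)
            del self.by_hash[old_hash]
        else:
            raise MemoryError("no free or evictable blocks")
        block = self.blocks[bid]
        block.refcount = 1
        block.prefix_hash = prefix_hash
        if prefix_hash is not None:
            self.by_hash[prefix_hash] = bid
        return bid

    def release(self, block_id: int) -> None:
        block = self.blocks[block_id]
        block.refcount -= 1
        if block.refcount == 0:
            if block.prefix_hash is not None:
                self.cached[block.prefix_hash] = block_id  # keep for reuse
            else:
                self.free.append(block_id)
```

In this sketch a second `acquire` with the same hash returns the same block and moves it to "shared". Releasing every reference moves it to "cached" rather than "free", so a later request with the same prefix can still hit it.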

Scheduler

Live counters: tokens / step (prefill / decode) · step (ms) · prefix cache · free blocks · preemptions · step log
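The prefill / decode split per step can be sketched as a token-budget scheduler with chunked prefill. This is a simplified illustration under stated assumptions, not the engine's actual policy: `Seq`, `schedule_step`, and `token_budget` are hypothetical names, decodes are assumed to be scheduled before prefill chunks, and preemption is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Seq:
    seq_id: str
    prompt_len: int
    num_prefilled: int = 0  # prompt tokens already written to the KV-cache

    @property
    def in_prefill(self) -> bool:
        return self.num_prefilled < self.prompt_len


def schedule_step(seqs: List[Seq], token_budget: int) -> Tuple[Dict[str, int], int, int]:
    """Plan one engine step.

    Returns (tokens per sequence, total prefill tokens, total decode tokens).
    """
    plan: Dict[str, int] = {}
    prefill_tokens = 0
    decode_tokens = 0

    # Assumption: decoding sequences go first, one token each.
    for s in seqs:
        if not s.in_prefill and token_budget > 0:
            plan[s.seq_id] = 1
            decode_tokens += 1
            token_budget -= 1

    # The remaining budget is spent on prompt chunks, so a long prompt
    # is spread over several steps instead of stalling the decodes.
    for s in seqs:
        if s.in_prefill and token_budget > 0:
            chunk = min(s.prompt_len - s.num_prefilled, token_budget)
            plan[s.seq_id] = chunk
            s.num_prefilled += chunk
            prefill_tokens += chunk
            token_budget -= chunk

    return plan, prefill_tokens, decode_tokens
```

With a budget of 8 tokens, one sequence already decoding, and one 20-token prompt, the first step schedules 7 prefill tokens and 1 decode token. The prompt finishes prefilling on the third step.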

Sequences

Subscribed to /engine/events · Source on github.com/surajsharan/tiny_vLLM
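A client for the event stream needs little more than a parser for the server-sent events wire format. The sketch below handles only the `data:` field and comment lines. The function name `iter_sse_data` is hypothetical. The payload schema of /engine/events is not documented here, so treating each payload as JSON with a `step` field is purely an assumption for illustration.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event.

    Only `data:` fields are collected. Comment lines (starting with ':')
    and other fields such as `event:` or `id:` are ignored in this sketch.
    """
    buf: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buf:
                yield "\n".join(buf)
                buf = []
        elif line.startswith(":"):
            continue  # comment, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # the format strips one leading space
            buf.append(value)
    # An event without a terminating blank line is discarded.


# Example with canned lines; the JSON shape is an assumed, hypothetical payload.
sample = [
    ": keep-alive\n",
    'data: {"step": 1}\n',
    "\n",
    "data: a\n",
    "data: b\n",
    "\n",
    "data: dropped\n",
]
events = list(iter_sse_data(sample))
first = json.loads(events[0])
```

Against a locally running server, the same function could be fed decoded lines from an HTTP response for http://localhost:8000/engine/events, assuming the event endpoint is served from the same host and port as the page.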