tiny_vllm --watch

A minimal continuous-batching LLM engine with a paged KV cache, automatic prefix caching, chunked prefill, and SSE streaming. Built to be read end to end. Below is a live visualization of the scheduler and memory pool.

Block pool

Legend: free | cached (evictable) | in use | shared (refcount>1) | hashed (border)
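The block states in the legend can be illustrated with a small reference-counted pool. This is a minimal sketch under stated assumptions, not tiny_vllm's actual code: the names `Block`, `BlockPool`, `allocate`, `lookup`, `mark_full`, and `release` are hypothetical, and the eviction order (oldest freed block first) is one plausible policy among several.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    block_id: int
    refcount: int = 0                 # number of sequences referencing this block
    block_hash: Optional[int] = None  # set once the block is full and hashed


class BlockPool:
    """Hypothetical pool; names are illustrative, not tiny_vllm's API."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [Block(i) for i in range(num_blocks)]
        # Blocks with refcount == 0, oldest first. Hashed ones count as "cached".
        self.free = OrderedDict((b.block_id, b) for b in self.blocks)
        self.by_hash = {}  # block_hash -> Block

    def state(self, block: Block) -> str:
        """Map a block to one of the legend states."""
        if block.refcount > 1:
            return "shared"
        if block.refcount == 1:
            return "in use"
        return "cached" if block.block_hash is not None else "free"

    def allocate(self) -> Block:
        if not self.free:
            raise MemoryError("no free blocks; a scheduler would preempt here")
        _, block = self.free.popitem(last=False)  # take the oldest entry
        if block.block_hash is not None:
            # Evicting a cached block: forget its hash before reuse.
            if self.by_hash.get(block.block_hash) is block:
                del self.by_hash[block.block_hash]
            block.block_hash = None
        block.refcount = 1
        return block

    def lookup(self, block_hash: int) -> Optional[Block]:
        """Prefix-cache hit: reuse a hashed block instead of recomputing it."""
        block = self.by_hash.get(block_hash)
        if block is None:
            return None
        if block.refcount == 0:
            del self.free[block.block_id]  # revive a cached block
        block.refcount += 1
        return block

    def mark_full(self, block: Block, block_hash: int) -> None:
        if block_hash in self.by_hash:
            return  # an identical block is already registered; keep this one unhashed
        block.block_hash = block_hash
        self.by_hash[block_hash] = block

    def release(self, block: Block) -> None:
        block.refcount -= 1
        if block.refcount == 0:
            self.free[block.block_id] = block  # stays evictable, hash kept


pool = BlockPool(num_blocks=2)
a = pool.allocate()
pool.mark_full(a, block_hash=123)
same = pool.lookup(123)   # a second sequence shares the block
shared_state = pool.state(a)
pool.release(a)
pool.release(same)
cached_state = pool.state(a)
```

In this sketch a released block keeps its hash, so a later request with the same prefix can revive it until `allocate` evicts it.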

Scheduler

tokens / step: 0
prefill / decode: 0 / 0
step (ms): 0
prefix cache:
free blocks: 0
preemptions: 0
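The "tokens / step" and "prefill / decode" counters can be read against a toy scheduling step with chunked prefill. The sketch below is illustrative only: `Seq`, `schedule_step`, and `token_budget` are hypothetical names, the decode-first ordering is an assumption, and preemption and block allocation are left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Seq:
    seq_id: str
    prompt_len: int
    computed: int = 0  # prompt tokens already prefilled

    @property
    def in_prefill(self) -> bool:
        return self.computed < self.prompt_len


def schedule_step(running: List[Seq], token_budget: int) -> Dict[str, int]:
    """Return {seq_id: tokens to process this step} within the token budget."""
    plan: Dict[str, int] = {}
    # Assumption: decoding sequences go first, one token each, so long
    # prompts cannot starve them.
    for seq in running:
        if not seq.in_prefill and token_budget > 0:
            plan[seq.seq_id] = 1
            token_budget -= 1
    # Remaining budget goes to prefill, split into chunks that fit.
    for seq in running:
        if seq.in_prefill and token_budget > 0:
            chunk = min(seq.prompt_len - seq.computed, token_budget)
            plan[seq.seq_id] = chunk
            token_budget -= chunk
    return plan


seqs = [Seq("a", prompt_len=10, computed=10), Seq("b", prompt_len=100)]
plan = schedule_step(seqs, token_budget=32)
decode_tokens = sum(n for s in seqs if not s.in_prefill for n in [plan.get(s.seq_id, 0)])
prefill_tokens = sum(plan.values()) - decode_tokens
```

With a budget of 32, sequence "a" decodes one token and sequence "b" prefills a 31-token chunk of its 100-token prompt, so this step would show 32 tokens and a prefill / decode split of 31 / 1.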

step log


  

Sequences