CLI Reference¶

Usage:

$ scratchpad [OPTIONS] COMMAND [ARGS]...

Options:

--install-completion: Install completion for the current shell.
--show-completion: Show completion for the current shell, to copy it or customize the installation.
--help: Show this message and exit.

Commands:

benchmark
chat
serve: Spin up the server
version: Print the version

`scratchpad benchmark`¶

Usage:

$ scratchpad benchmark [OPTIONS] MODEL

Arguments:

MODEL: [required]

Options:

--tasks TEXT: [default: mmlu]
--url TEXT: [default: http://localhost:8080/v1]
--num-fewshot INTEGER: [default: 0]
--instruct-model / --no-instruct-model: [default: no-instruct-model]
--help: Show this message and exit.

`scratchpad chat`¶

Usage:

$ scratchpad chat [OPTIONS] MODEL

Arguments:

MODEL: [required]

Options:

--backend TEXT: [default: http://localhost:8080]
--help: Show this message and exit.

`scratchpad serve`¶

Spin up the server

Usage:

$ scratchpad serve [OPTIONS] MODEL

Arguments:

MODEL: [required]

Options:

--host TEXT: [default: 0.0.0.0]
--port INTEGER: [default: 3000]
--debug / --no-debug: [default: no-debug]
--model-path TEXT
--served-model-name TEXT: [default: auto]
--trust-remote-code / --no-trust-remote-code: [default: trust-remote-code]
--json-model-override-args TEXT: [default: {}]
--is-embedding / --no-is-embedding: [default: no-is-embedding]
--context-length INTEGER: [default: 4096]
--skip-tokenizer-init / --no-skip-tokenizer-init: [default: no-skip-tokenizer-init]
--tokenizer-path TEXT: [default: auto]
--tokenizer-mode TEXT: [default: auto]
--schedule-policy TEXT: [default: lpm]
--random-seed INTEGER
--stream-interval INTEGER: [default: 1]
--chunked-prefill-size INTEGER: [default: 8192]
--max-prefill-tokens INTEGER: [default: 16384]
--max-running-requests INTEGER
--max-total-tokens INTEGER
--kv-cache-dtype TEXT: [default: auto]
--schedule-conservativeness FLOAT: [default: 1.0]
--load-format TEXT: [default: auto]
--quantization TEXT
--dtype TEXT: [default: auto]
--dist-init-addr TEXT
--dp-size INTEGER: [default: 1]
--tp-size INTEGER: [default: 1]
--nnodes INTEGER: [default: 1]
--node-rank INTEGER: [default: 0]
--load-balance-method TEXT: [default: round_robin]
--tokenizer-port INTEGER: [default: 30001]
--scheduler-port INTEGER: [default: 30002]
--detokenizer-port INTEGER: [default: 30003]
--nccl-ports TEXT: [default: 30004]
--init-new-token-ratio FLOAT: [default: 0.7]
--base-min-new-token-ratio FLOAT: [default: 0.1]
--new-token-ratio-decay FLOAT: [default: 0.001]
--num-continue-decode-steps INTEGER: [default: 10]
--retract-decode-steps INTEGER: [default: 20]
--mem-fraction-static FLOAT: [default: 0.8]
--constrained-json-whitespace-pattern TEXT
--lora-paths TEXT
--max-loras-per-batch INTEGER: [default: 1]
--attention-backend TEXT
--sampling-backend TEXT
--disable-flashinfer / --no-disable-flashinfer: [default: no-disable-flashinfer]
--disable-flashinfer-sampling / --no-disable-flashinfer-sampling: [default: no-disable-flashinfer-sampling]
--disable-radix-cache / --no-disable-radix-cache: [default: no-disable-radix-cache]
--disable-regex-jump-forward / --no-disable-regex-jump-forward: [default: no-disable-regex-jump-forward]
--disable-cuda-graph / --no-disable-cuda-graph: [default: no-disable-cuda-graph]
--disable-cuda-graph-padding / --no-disable-cuda-graph-padding: [default: no-disable-cuda-graph-padding]
--disable-disk-cache / --no-disable-disk-cache: [default: no-disable-disk-cache]
--disable-custom-all-reduce / --no-disable-custom-all-reduce: [default: no-disable-custom-all-reduce]
--disable-mla / --no-disable-mla: [default: no-disable-mla]
--enable-mixed-chunk / --no-enable-mixed-chunk: [default: no-enable-mixed-chunk]
--enable-torch-compile / --no-enable-torch-compile: [default: no-enable-torch-compile]
--max-torch-compile-bs INTEGER: [default: 32]
--torchao-config TEXT
--enable-p2p-check / --no-enable-p2p-check: [default: no-enable-p2p-check]
--flashinfer-workspace-size INTEGER: [default: 402653184]
--triton-attention-reduce-in-fp32 / --no-triton-attention-reduce-in-fp32: [default: no-triton-attention-reduce-in-fp32]
--log-requests / --no-log-requests: [default: no-log-requests]
--show-time-cost / --no-show-time-cost: [default: no-show-time-cost]
--help: Show this message and exit.

`scratchpad version`¶

Print the version

Usage:

$ scratchpad version [OPTIONS]

Options:

--help: Show this message and exit.

CLI Reference¶

scratchpad benchmark¶

scratchpad chat¶

scratchpad serve¶

scratchpad version¶

`scratchpad benchmark`¶

`scratchpad chat`¶

`scratchpad serve`¶

`scratchpad version`¶