CLI Reference

Usage:

$ scratchpad [OPTIONS] COMMAND [ARGS]...

Options:

  • --install-completion: Install completion for the current shell.

  • --show-completion: Show completion for the current shell, to copy it or customize the installation.

  • --help: Show this message and exit.

Commands:

  • benchmark

  • chat

  • serve: Spin up the server

  • version: Print the version

scratchpad benchmark

Usage:

$ scratchpad benchmark [OPTIONS] MODEL

Arguments:

  • MODEL: [required]

Options:

  • --tasks TEXT: [default: mmlu]

  • --url TEXT: [default: http://localhost:8080/v1]

  • --num-fewshot INTEGER: [default: 0]

  • --instruct-model / --no-instruct-model: [default: no-instruct-model]

  • --help: Show this message and exit.

scratchpad chat

Usage:

$ scratchpad chat [OPTIONS] MODEL

Arguments:

  • MODEL: [required]

Options:

  • --backend TEXT: [default: http://localhost:8080]

  • --help: Show this message and exit.

scratchpad serve

Spin up the server

Usage:

$ scratchpad serve [OPTIONS] MODEL

Arguments:

  • MODEL: [required]

Options:

  • --host TEXT: [default: 0.0.0.0]

  • --port INTEGER: [default: 3000]

  • --debug / --no-debug: [default: no-debug]

  • --model-path TEXT

  • --served-model-name TEXT: [default: auto]

  • --trust-remote-code / --no-trust-remote-code: [default: trust-remote-code]

  • --json-model-override-args TEXT: [default: {}]

  • --is-embedding / --no-is-embedding: [default: no-is-embedding]

  • --context-length INTEGER: [default: 4096]

  • --skip-tokenizer-init / --no-skip-tokenizer-init: [default: no-skip-tokenizer-init]

  • --tokenizer-path TEXT: [default: auto]

  • --tokenizer-mode TEXT: [default: auto]

  • --schedule-policy TEXT: [default: lpm]

  • --random-seed INTEGER

  • --stream-interval INTEGER: [default: 1]

  • --chunked-prefill-size INTEGER: [default: 8192]

  • --max-prefill-tokens INTEGER: [default: 16384]

  • --max-running-requests INTEGER

  • --max-total-tokens INTEGER

  • --kv-cache-dtype TEXT: [default: auto]

  • --schedule-conservativeness FLOAT: [default: 1.0]

  • --load-format TEXT: [default: auto]

  • --quantization TEXT

  • --dtype TEXT: [default: auto]

  • --dist-init-addr TEXT

  • --dp-size INTEGER: [default: 1]

  • --tp-size INTEGER: [default: 1]

  • --nnodes INTEGER: [default: 1]

  • --node-rank INTEGER: [default: 0]

  • --load-balance-method TEXT: [default: round_robin]

  • --tokenizer-port INTEGER: [default: 30001]

  • --scheduler-port INTEGER: [default: 30002]

  • --detokenizer-port INTEGER: [default: 30003]

  • --nccl-ports TEXT: [default: 30004]

  • --init-new-token-ratio FLOAT: [default: 0.7]

  • --base-min-new-token-ratio FLOAT: [default: 0.1]

  • --new-token-ratio-decay FLOAT: [default: 0.001]

  • --num-continue-decode-steps INTEGER: [default: 10]

  • --retract-decode-steps INTEGER: [default: 20]

  • --mem-fraction-static FLOAT: [default: 0.8]

  • --constrained-json-whitespace-pattern TEXT

  • --lora-paths TEXT

  • --max-loras-per-batch INTEGER: [default: 1]

  • --attention-backend TEXT

  • --sampling-backend TEXT

  • --disable-flashinfer / --no-disable-flashinfer: [default: no-disable-flashinfer]

  • --disable-flashinfer-sampling / --no-disable-flashinfer-sampling: [default: no-disable-flashinfer-sampling]

  • --disable-radix-cache / --no-disable-radix-cache: [default: no-disable-radix-cache]

  • --disable-regex-jump-forward / --no-disable-regex-jump-forward: [default: no-disable-regex-jump-forward]

  • --disable-cuda-graph / --no-disable-cuda-graph: [default: no-disable-cuda-graph]

  • --disable-cuda-graph-padding / --no-disable-cuda-graph-padding: [default: no-disable-cuda-graph-padding]

  • --disable-disk-cache / --no-disable-disk-cache: [default: no-disable-disk-cache]

  • --disable-custom-all-reduce / --no-disable-custom-all-reduce: [default: no-disable-custom-all-reduce]

  • --disable-mla / --no-disable-mla: [default: no-disable-mla]

  • --enable-mixed-chunk / --no-enable-mixed-chunk: [default: no-enable-mixed-chunk]

  • --enable-torch-compile / --no-enable-torch-compile: [default: no-enable-torch-compile]

  • --max-torch-compile-bs INTEGER: [default: 32]

  • --torchao-config TEXT

  • --enable-p2p-check / --no-enable-p2p-check: [default: no-enable-p2p-check]

  • --flashinfer-workspace-size INTEGER: [default: 402653184]

  • --triton-attention-reduce-in-fp32 / --no-triton-attention-reduce-in-fp32: [default: no-triton-attention-reduce-in-fp32]

  • --log-requests / --no-log-requests: [default: no-log-requests]

  • --show-time-cost / --no-show-time-cost: [default: no-show-time-cost]

  • --help: Show this message and exit.

scratchpad version

Print the version

Usage:

$ scratchpad version [OPTIONS]

Options:

  • --help: Show this message and exit.