CLI Reference¶
Usage:
$ scratchpad [OPTIONS] COMMAND [ARGS]...
Options:
--install-completion: Install completion for the current shell.--show-completion: Show completion for the current shell, to copy it or customize the installation.--help: Show this message and exit.
Commands:
benchmarkchatserve: Spin up the serverversion: Print the version
scratchpad benchmark¶
Usage:
$ scratchpad benchmark [OPTIONS] MODEL
Arguments:
MODEL: [required]
Options:
--tasks TEXT: [default: mmlu]--url TEXT: [default: http://localhost:8080/v1]--num-fewshot INTEGER: [default: 0]--instruct-model / --no-instruct-model: [default: no-instruct-model]--help: Show this message and exit.
scratchpad chat¶
Usage:
$ scratchpad chat [OPTIONS] MODEL
Arguments:
MODEL: [required]
Options:
--backend TEXT: [default: http://localhost:8080]--help: Show this message and exit.
scratchpad serve¶
Spin up the server
Usage:
$ scratchpad serve [OPTIONS] MODEL
Arguments:
MODEL: [required]
Options:
--host TEXT: [default: 0.0.0.0]--port INTEGER: [default: 3000]--debug / --no-debug: [default: no-debug]--model-path TEXT--served-model-name TEXT: [default: auto]--trust-remote-code / --no-trust-remote-code: [default: trust-remote-code]--json-model-override-args TEXT: [default: {}]--is-embedding / --no-is-embedding: [default: no-is-embedding]--context-length INTEGER: [default: 4096]--skip-tokenizer-init / --no-skip-tokenizer-init: [default: no-skip-tokenizer-init]--tokenizer-path TEXT: [default: auto]--tokenizer-mode TEXT: [default: auto]--schedule-policy TEXT: [default: lpm]--random-seed INTEGER--stream-interval INTEGER: [default: 1]--chunked-prefill-size INTEGER: [default: 8192]--max-prefill-tokens INTEGER: [default: 16384]--max-running-requests INTEGER--max-total-tokens INTEGER--kv-cache-dtype TEXT: [default: auto]--schedule-conservativeness FLOAT: [default: 1.0]--load-format TEXT: [default: auto]--quantization TEXT--dtype TEXT: [default: auto]--dist-init-addr TEXT--dp-size INTEGER: [default: 1]--tp-size INTEGER: [default: 1]--nnodes INTEGER: [default: 1]--node-rank INTEGER: [default: 0]--load-balance-method TEXT: [default: round_robin]--tokenizer-port INTEGER: [default: 30001]--scheduler-port INTEGER: [default: 30002]--detokenizer-port INTEGER: [default: 30003]--nccl-ports TEXT: [default: 30004]--init-new-token-ratio FLOAT: [default: 0.7]--base-min-new-token-ratio FLOAT: [default: 0.1]--new-token-ratio-decay FLOAT: [default: 0.001]--num-continue-decode-steps INTEGER: [default: 10]--retract-decode-steps INTEGER: [default: 20]--mem-fraction-static FLOAT: [default: 0.8]--constrained-json-whitespace-pattern TEXT--lora-paths TEXT--max-loras-per-batch INTEGER: [default: 1]--attention-backend TEXT--sampling-backend TEXT--disable-flashinfer / --no-disable-flashinfer: [default: no-disable-flashinfer]--disable-flashinfer-sampling / --no-disable-flashinfer-sampling: [default: no-disable-flashinfer-sampling]--disable-radix-cache / --no-disable-radix-cache: [default: no-disable-radix-cache]--disable-regex-jump-forward / --no-disable-regex-jump-forward: [default: no-disable-regex-jump-forward]--disable-cuda-graph / --no-disable-cuda-graph: [default: no-disable-cuda-graph]--disable-cuda-graph-padding / --no-disable-cuda-graph-padding: [default: no-disable-cuda-graph-padding]--disable-disk-cache / --no-disable-disk-cache: [default: no-disable-disk-cache]--disable-custom-all-reduce / --no-disable-custom-all-reduce: [default: no-disable-custom-all-reduce]--disable-mla / --no-disable-mla: [default: no-disable-mla]--enable-mixed-chunk / --no-enable-mixed-chunk: [default: no-enable-mixed-chunk]--enable-torch-compile / --no-enable-torch-compile: [default: no-enable-torch-compile]--max-torch-compile-bs INTEGER: [default: 32]--torchao-config TEXT--enable-p2p-check / --no-enable-p2p-check: [default: no-enable-p2p-check]--flashinfer-workspace-size INTEGER: [default: 402653184]--triton-attention-reduce-in-fp32 / --no-triton-attention-reduce-in-fp32: [default: no-triton-attention-reduce-in-fp32]--log-requests / --no-log-requests: [default: no-log-requests]--show-time-cost / --no-show-time-cost: [default: no-show-time-cost]--help: Show this message and exit.
scratchpad version¶
Print the version
Usage:
$ scratchpad version [OPTIONS]
Options:
--help: Show this message and exit.