Local model services
models.providers.<id>.localService lets OmeniaClaw start a provider-owned local
model server on demand. It is provider-level config: when the selected model
belongs to that provider, OmeniaClaw probes the service, starts the process if the
endpoint is down, waits for readiness, then sends the model request.
Use it for local servers that are expensive to keep running all day, or for manual setups where model selection should be enough to bring the backend up.
How it works
- A model request resolves to a configured provider.
- If that provider has localService, OmeniaClaw probes healthUrl.
- If the probe succeeds, OmeniaClaw uses the existing server.
- If the probe fails, OmeniaClaw starts command with args.
- OmeniaClaw polls readiness until readyTimeoutMs expires.
- The model request is sent through the normal provider transport.
- If OmeniaClaw started the process and idleStopMs is positive, the process is stopped after the last in-flight request has been idle for that long.
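The probe, start, and poll steps above can be sketched roughly as follows. This is an illustration only, not OmeniaClaw's implementation: the LocalService class, the ensure_ready function, and the poll interval are hypothetical names and values chosen for the example, and the probe and start callables stand in for the real HTTP check and process spawn.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class LocalService:
    """Hypothetical in-memory view of one provider's localService state."""
    probe: Callable[[], bool]        # True when healthUrl answers
    start: Callable[[], None]        # spawns command with args
    ready_timeout_ms: int = 120_000  # documented default for readyTimeoutMs
    poll_interval_s: float = 0.25    # assumed; the docs do not state an interval
    started_by_us: bool = False


def ensure_ready(svc: LocalService,
                 now: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
    """Probe, start the server if it is down, then poll until ready or timed out."""
    if svc.probe():
        return  # an existing server is reused as-is
    svc.start()
    svc.started_by_us = True  # only processes we started are eligible for idle stop
    deadline = now() + svc.ready_timeout_ms / 1000
    while now() < deadline:
        if svc.probe():
            return
        sleep(svc.poll_interval_s)
    raise TimeoutError("local service did not become ready in time")
```

The clock and sleep functions are passed in so the flow can be exercised without a real server or real delays.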
OmeniaClaw does not install launchd, systemd, Docker, or a daemon for this. The server is a child process of the OmeniaClaw process that first needed it.
Config shape
{
  models: {
    providers: {
      local: {
        baseUrl: "http://127.0.0.1:8000/v1",
        apiKey: "local-model",
        api: "openai-completions",
        timeoutSeconds: 300,
        localService: {
          command: "/absolute/path/to/server",
          args: ["--host", "127.0.0.1", "--port", "8000"],
          cwd: "/absolute/path/to/working-dir",
          env: { LOCAL_MODEL_CACHE: "/absolute/path/to/cache" },
          healthUrl: "http://127.0.0.1:8000/v1/models",
          readyTimeoutMs: 180000,
          idleStopMs: 0,
        },
        models: [
          {
            id: "my-local-model",
            name: "My Local Model",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 131072,
            maxTokens: 8192,
          },
        ],
      },
    },
  },
}

Fields
- command: absolute executable path. Shell lookup is not used.
- args: process arguments. No shell expansion, pipes, globbing, or quoting rules are applied.
- cwd: optional working directory for the process.
- env: optional environment variables merged over the OmeniaClaw process environment.
- healthUrl: readiness URL. If omitted, OmeniaClaw appends /models to baseUrl, so http://127.0.0.1:8000/v1 becomes http://127.0.0.1:8000/v1/models.
- readyTimeoutMs: startup readiness deadline. Default: 120000.
- idleStopMs: idle shutdown delay for OmeniaClaw-started processes. 0 or omitted keeps the process alive until OmeniaClaw exits.
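To make the defaults concrete, here is a small sketch of how the optional fields might resolve. The function names are hypothetical, and stripping a trailing slash from baseUrl is an assumption made for the example; only the /models suffix, the 120000 default, the treatment of an omitted idleStopMs as 0, and the merge direction for env come from the field descriptions.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_health_url(base_url: str, health_url: Optional[str] = None) -> str:
    """Use the explicit healthUrl, otherwise append /models to baseUrl."""
    if health_url:
        return health_url
    # Assumption: a trailing slash on baseUrl is dropped before appending.
    return base_url.rstrip("/") + "/models"


def resolve_env(extra: Optional[Mapping[str, str]] = None,
                parent: Optional[Mapping[str, str]] = None) -> dict:
    """Merge localService.env over the parent process environment."""
    merged = dict(os.environ if parent is None else parent)
    merged.update(extra or {})  # localService.env wins on conflicts
    return merged


def resolve_timeouts(service: Mapping[str, int]) -> Tuple[int, int]:
    """Apply the documented defaults: 120000 ms readiness, no idle stop."""
    ready = service.get("readyTimeoutMs", 120_000)
    idle = service.get("idleStopMs", 0)
    return ready, idle
```

For example, resolve_health_url("http://127.0.0.1:8000/v1") returns "http://127.0.0.1:8000/v1/models", which matches the explicit healthUrl in the config shown earlier.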
Inferrs example
Inferrs is a custom OpenAI-compatible /v1 backend, so the same local service
API works with the inferrs provider entry.
{
  agents: {
    defaults: {
      model: { primary: "inferrs/google/gemma-4-E2B-it" },
    },
  },
  models: {
    mode: "merge",
    providers: {
      inferrs: {
        baseUrl: "http://127.0.0.1:8080/v1",
        apiKey: "inferrs-local",
        api: "openai-completions",
        timeoutSeconds: 300,
        localService: {
          command: "/opt/homebrew/bin/inferrs",
          args: [
            "serve",
            "google/gemma-4-E2B-it",
            "--host",
            "127.0.0.1",
            "--port",
            "8080",
            "--device",
            "metal",
          ],
          healthUrl: "http://127.0.0.1:8080/v1/models",
          readyTimeoutMs: 180000,
          idleStopMs: 0,
        },
        models: [
          {
            id: "google/gemma-4-E2B-it",
            name: "Gemma 4 E2B (inferrs)",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 131072,
            maxTokens: 4096,
            compat: {
              requiresStringContent: true,
            },
          },
        ],
      },
    },
  },
}

Replace command with the result of which inferrs on the machine running OmeniaClaw.
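Because command must be an absolute path and no shell lookup is performed, it can help to resolve the path once and paste the result into the config. A minimal sketch using Python's standard library follows; absolute_command is a hypothetical helper name, not part of OmeniaClaw.

```python
import os
import shutil


def absolute_command(name: str) -> str:
    """Return the absolute path that a PATH lookup for `name` resolves to."""
    found = shutil.which(name)
    if found is None:
        raise FileNotFoundError(f"{name!r} is not on PATH")
    return os.path.abspath(found)
```

Calling absolute_command("inferrs") on the machine that runs OmeniaClaw gives the value to use for command, equivalent to running which inferrs in a shell there.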
ds4 example
For the full setup, context sizing guidance, and verification commands, see ds4.
{
  models: {
    providers: {
      ds4: {
        baseUrl: "http://127.0.0.1:18000/v1",
        apiKey: "ds4-local",
        api: "openai-completions",
        timeoutSeconds: 300,
        localService: {
          command: "<DS4_DIR>/ds4-server",
          args: [
            "--model",
            "<DS4_DIR>/ds4flash.gguf",
            "--host",
            "127.0.0.1",
            "--port",
            "18000",
            "--ctx",
            "32768",
            "--tokens",
            "128",
          ],
          cwd: "<DS4_DIR>",
          healthUrl: "http://127.0.0.1:18000/v1/models",
          readyTimeoutMs: 300000,
          idleStopMs: 0,
        },
        models: [],
      },
    },
  },
}

Operational notes
- One OmeniaClaw process manages the child it started. Another OmeniaClaw process that sees the same health URL already live will reuse it without adopting it.
- Startup is serialized per provider command and argument set, so concurrent requests do not spawn duplicate servers for the same config.
- Active streaming responses hold a lease; idle shutdown waits until response body handling is complete.
- Use timeoutSeconds on slow local providers so cold starts and long generations do not hit the default model request timeout.
- Use an explicit healthUrl if your server exposes readiness somewhere other than /v1/models.
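The serialization and lease behavior described in these notes can be pictured with the sketch below. It is an assumption-laden illustration, not OmeniaClaw's code: StartupGate and Leases are hypothetical names, and a real implementation would also re-probe the health URL and run an idle timer, which this example leaves out.

```python
import threading
from typing import Callable, Dict, List, Set, Tuple

Key = Tuple[str, Tuple[str, ...]]


class StartupGate:
    """Serialize startup per (command, args) so one config starts one server."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: Dict[Key, threading.Lock] = {}
        self._started: Set[Key] = set()

    def start_once(self, command: str, args: List[str],
                   start: Callable[[], None]) -> bool:
        key: Key = (command, tuple(args))
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            if key in self._started:
                return False  # a concurrent request already started it
            start()
            self._started.add(key)
            return True


class Leases:
    """Count in-flight requests; idle shutdown is allowed only at zero."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active = 0

    def __enter__(self) -> "Leases":
        with self._lock:
            self._active += 1
        return self

    def __exit__(self, *exc) -> None:
        with self._lock:
            self._active -= 1

    def may_stop(self) -> bool:
        with self._lock:
            return self._active == 0
```

A streaming response would hold a lease for as long as its body is being handled, for example by wrapping the read loop in a with block, so may_stop() stays false until the stream is fully consumed.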