Quick Start

Anyway provides an OpenAI-compatible API server through the anyd command. To deploy a model across multiple machines, run anyd on each machine and make sure all machines can reach each other on the network; Anyway handles orchestration automatically. The command follows the structure below.

Basic Command Structure

anyd --model=<path/to/model.gguf> --model-ctx=<N> --oapi=<OAPI_port> [<node_addr2> <node_addr3> ...]

Example with a Three-Node Cluster

Node #0: 192.168.0.0
anyd --model=gpt-oss-120b-Q4_K_M-00001-of-00002.gguf --model-ctx=4096 --oapi=8080 192.168.0.1 192.168.0.2
Node #1: 192.168.0.1
anyd --model=gpt-oss-120b-Q4_K_M-00001-of-00002.gguf --model-ctx=4096 --oapi=8080 192.168.0.0 192.168.0.2
Node #2: 192.168.0.2
anyd --model=gpt-oss-120b-Q4_K_M-00001-of-00002.gguf --model-ctx=4096 --oapi=8080 192.168.0.0 192.168.0.1
⚠️ Important: Each node must list all other nodes in the cluster, but not its own IP address. Ensure all nodes can communicate with each other on the network and have the same model files available locally.

Command Parameters

| Parameter | Description | Example |
|---|---|---|
| `--model=FILE` | Path to your GGUF model file. Anyway supports Hugging Face models in GGUF format at every quantization level (many are available on Hugging Face). If the model is split across multiple GGUF files, provide only the first one. | `gpt-oss-120b-Q4_K_M-00001-of-00002.gguf` or `gpt-oss-20b-Q4_K_M.gguf` |
| `--model-ctx=LENGTH` | Context window size (number of tokens). Larger values allow longer conversations but require more memory. | `4096` |
| `--oapi=[IP4:]PORT` | OpenAI-compatible API endpoint port. Your applications connect to this port for AI inference. Default: `0.0.0.0:PORT`. | `8080` or `127.0.0.1:8080` |
| `--log[=FILE]` (optional) | Print logs to stderr, or to FILE if specified. | `--log` or `--log=logs.txt` |
| `--peer=[IP4:]PORT` (optional) | Listen for connections from other peers on IP4:PORT. Default: `0.0.0.0:13060`. | `--peer=13060` or `--peer=localhost:13060` |
| `--mem=SIZE` (optional) | Use at most SIZE bytes of VRAM. Default: full VRAM capacity. | `--mem=3G` or `--mem=400M` |
| `[node addresses]` | IP addresses of the other nodes in your cluster. List every node except the current node's own address. | `192.168.0.1 192.168.0.2` |
💡 Pro Tip: Start with a smaller context window (2048-4096) and increase it based on your actual usage patterns and available memory.
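The peer-list rule above (each node lists every other node but omits its own address) can be sketched with a small helper that builds the launch command for each node. The `anyd_command` function and its defaults are illustrative, not part of Anyway itself:

```python
def anyd_command(nodes, me, model, ctx=4096, oapi=8080):
    """Build the anyd launch command for one node in the cluster.

    Every node lists all peer addresses *except* its own.
    """
    peers = [n for n in nodes if n != me]
    return (f"anyd --model={model} --model-ctx={ctx} "
            f"--oapi={oapi} " + " ".join(peers))

nodes = ["192.168.0.0", "192.168.0.1", "192.168.0.2"]
# Command for node #1: its own address (192.168.0.1) is excluded.
print(anyd_command(nodes, "192.168.0.1", "gpt-oss-20b-Q4_K_M.gguf"))
```

Running this helper once per node gives you the exact commands shown in the three-node example above.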

OpenAI-Compatible API Endpoints

Anyway provides an OpenAI-compatible API that allows you to integrate with existing applications and tools designed for OpenAI's API. The following endpoints are supported:

| Endpoint | Description | Supported Parameters |
|---|---|---|
| `v1/models` | Lists available models deployed on your Anyway cluster. Returns information about the model currently running on your nodes. | All mandatory parameters |
| `v1/chat/completions` | Generates chat completions for conversational AI applications. Accepts a series of messages and returns the model's response in a chat format. | All mandatory parameters, plus `max_completion_tokens` and `stream` |
| `v1/embeddings` | Generates vector embeddings for text input. Useful for semantic search, similarity comparison, and other NLP tasks that require dense vector representations. | All mandatory parameters |

For endpoint specifications, refer to the official OpenAI API documentation.
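As an illustration of what v1/embeddings output is typically used for, here is a minimal cosine-similarity sketch. The vectors below are made-up placeholders standing in for real embedding vectors, not actual model output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors standing in for v1/embeddings responses.
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 0.0, 0.0]
print(round(cosine_similarity(v1, v2), 4))  # → 0.7071
```

In a real application you would compare the `embedding` fields returned by the endpoint for two different input texts.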

Python Example using OpenAI Python SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://192.168.0.0:8080/v1",
    api_key="not_needed"
)

response = client.chat.completions.create(
    model="gpt-oss-120b-Q4_K_M",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)

cURL Request Example

curl -L http://192.168.0.0:8080/v1/chat/completions \
          -H "Content-Type: application/json" \
          -d '{
            "model": "gpt-oss-120b-Q4_K_M",
            "messages": [{"role": "user", "content": "Hello!"}],
            "max_tokens": 100, 
            "stream": true
          }'
⚠️ Important: Each node exposes its own OpenAI API endpoint; however, your client must be configured to follow HTTP redirects.
For curl, use the -L flag. The OpenAI Python SDK follows redirects automatically.
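With "stream": true, the response arrives as server-sent events in the standard OpenAI streaming format. A minimal sketch of extracting the text deltas from those event lines (the lines below are canned examples, not a live response):

```python
import json

def parse_sse_chunks(lines):
    """Yield the text deltas from OpenAI-style streaming event lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Canned lines in the OpenAI streaming payload shape:
canned = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_chunks(canned)))  # → Hello!
```

With the OpenAI Python SDK, passing `stream=True` to `client.chat.completions.create` handles this parsing for you.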

Error Handling

Anyway follows the standard OpenAI API error response protocol, with the following additional information.

Additional finish_reason types for v1/chat/completions:

system_again

Indicates that the request should be retried. This typically occurs when a node crashes and Anyway transparently performs failover and load balancing to ensure that the next request succeeds.
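A client-side retry loop for system_again might look like the sketch below. The chat_with_retry helper and its backoff policy are illustrative assumptions, not part of Anyway:

```python
import time

def chat_with_retry(client, model, messages, max_retries=3):
    """Retry a chat completion while Anyway reports the transient
    system_again finish_reason (e.g. during node failover)."""
    for attempt in range(max_retries):
        response = client.chat.completions.create(
            model=model, messages=messages
        )
        if response.choices[0].finish_reason != "system_again":
            return response
        time.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError("gave up after repeated system_again responses")
```

Here `client` is an OpenAI SDK client configured as in the Python example above.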

system_mem

Indicates insufficient memory. This occurs when Anyway attempts to load the requested model, but the combined available memory across all nodes is insufficient to load the model and its required context simultaneously.

system_err

A general system error occurred.

Need Help?

Email: contact@anyway.dev
Documentation: Additional resources and advanced configuration options available upon request