Anyway provides an OpenAI-compatible API server endpoint via the `anyd` command. To deploy a model across multiple machines, run `anyd` on each machine and make sure all machines can reach one another on the network; Anyway handles the orchestration automatically. The command follows the structure below.
```
anyd --model=<path/to/model.gguf> --model-ctx=<N> --oapi=<OAPI_port> [<node_addr2> <node_addr3> ...]
```

For example, on a three-node cluster (192.168.0.0, 192.168.0.1, 192.168.0.2), each node lists the addresses of the other two:

```shell
# On 192.168.0.0:
anyd --model=gpt-oss-120b-Q4_K_M-00001-of-00002.gguf --model-ctx=4096 --oapi=8080 192.168.0.1 192.168.0.2
# On 192.168.0.1:
anyd --model=gpt-oss-120b-Q4_K_M-00001-of-00002.gguf --model-ctx=4096 --oapi=8080 192.168.0.0 192.168.0.2
# On 192.168.0.2:
anyd --model=gpt-oss-120b-Q4_K_M-00001-of-00002.gguf --model-ctx=4096 --oapi=8080 192.168.0.0 192.168.0.1
```
| Parameter | Description | Example |
|---|---|---|
| `--model=FILE` | Path to your GGUF model file. Anyway supports Hugging Face models in GGUF format at every quantization level (you can find a lot of them here). If the model is stored in multiple GGUF files, only provide the first one. | `gpt-oss-120b-Q4_K_M-00001-of-00002.gguf` or `gpt-oss-20b-Q4_K_M.gguf` |
| `--model-ctx=LENGTH` | Context window size, in tokens. Larger values allow longer conversations but require more memory. | `4096` |
| `--oapi=[IP4:]PORT` | Port of the OpenAI-compatible API endpoint. Your applications connect to this port for AI inference (default: `0.0.0.0:PORT`). | `8080` or `127.0.0.1:8080` |
| `--log[=FILE]` (optional) | Print the logs on stderr, or in FILE if specified. | `--log` or `--log=logs.txt` |
| `--peer=[IP4:]PORT` (optional) | Listen for other peers' connections on IP4:PORT (default: `0.0.0.0:13060`). | `--peer=13060` or `--peer=localhost:13060` |
| `--mem=SIZE` (optional) | Use at most SIZE bytes of VRAM (default: the full VRAM capacity). | `--mem=3G` or `--mem=400M` |
| `[node addresses]` | IP addresses of the other nodes in your cluster. List all other nodes, but exclude the current node's own address. | `192.168.0.1 192.168.0.2` |
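To give a feel for the `--mem=SIZE` values above, the helper below shows how suffixes like `3G` or `400M` are commonly interpreted. This is an illustration only: it assumes binary units, and Anyway's exact parsing rule is not specified here.

```python
# Illustrative only: common interpretation of SIZE suffixes such as "3G" or
# "400M" (binary units assumed; not Anyway's documented parsing rule).
_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

def parse_size(size):
    """Convert a SIZE string such as '3G', '400M', or '1024' to bytes."""
    size = size.strip().upper()
    if size and size[-1] in _UNITS:
        return int(size[:-1]) * _UNITS[size[-1]]
    return int(size)  # no suffix: plain byte count
```

So `--mem=3G` would cap VRAM usage at 3 GiB and `--mem=400M` at 400 MiB.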
Anyway provides an OpenAI-compatible API that allows you to integrate with existing applications and tools designed for OpenAI's API. The following endpoints are supported:
| Endpoint | Description | Supported Parameters |
|---|---|---|
| `v1/models` | Lists available models deployed on your Anyway cluster. Returns information about the model currently running on your nodes. | All mandatory parameters |
| `v1/chat/completions` | Generates chat completions for conversational AI applications. Accepts a series of messages and returns the model's response in chat format. | All mandatory parameters + `max_completion_tokens` + `stream` |
| `v1/embeddings` | Generates vector embeddings for text input. Useful for semantic search, similarity comparison, and other NLP tasks that require dense vector representations. | All mandatory parameters |
For endpoint specifications, refer to the official OpenAI API documentation.
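As an illustration of `v1/embeddings`, the sketch below requests embeddings using the standard OpenAI request shape and compares two texts with cosine similarity. The base URL and model name are taken from the deployment example in this document; `embed` and `cosine` are illustrative helpers, not part of Anyway.

```python
import json
import math
import urllib.request

ANYWAY_URL = "http://192.168.0.0:8080/v1"  # node address from the example above

def embed(texts, model="gpt-oss-120b-Q4_K_M"):
    """POST to v1/embeddings and return one vector per input text."""
    req = urllib.request.Request(
        f"{ANYWAY_URL}/embeddings",
        data=json.dumps({"model": model, "input": texts}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [item["embedding"] for item in body["data"]]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))
```

With a cluster running, `cosine(*embed(["a query", "a document"]))` would give a similarity score between the two texts.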
Example using the official OpenAI Python client:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://192.168.0.0:8080/v1",
    api_key="not_needed"
)

response = client.chat.completions.create(
    model="gpt-oss-120b-Q4_K_M",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)
```
The equivalent request with curl:

```shell
curl -L http://192.168.0.0:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-120b-Q4_K_M",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100,
    "stream": true
  }'
```
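With `"stream": true`, the completion arrives as server-sent events: one `data: {...}` line per chunk, terminated by `data: [DONE]` (the standard OpenAI framing). A minimal stdlib-only sketch of consuming such a stream, assuming that framing and the node address from the example above:

```python
import json
import urllib.request

def iter_stream_text(lines):
    """Yield the text deltas from an OpenAI-style SSE response body."""
    for raw in lines:
        line = raw.decode() if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0].get("delta", {})
        text = delta.get("content")
        if text:
            yield text

def stream_chat(prompt, base_url="http://192.168.0.0:8080/v1"):
    """POST a streaming chat completion and print the reply as it arrives."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps({
            "model": "gpt-oss-120b-Q4_K_M",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 100,
            "stream": True,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for text in iter_stream_text(resp):
            print(text, end="", flush=True)
    print()
```

The OpenAI Python client does this parsing for you when you pass `stream=True` to `chat.completions.create`.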
Requests may be redirected to another node of the cluster. With curl, use the `-L` flag so redirects are followed; the OpenAI Python client handles redirects automatically.
Anyway follows the standard OpenAI API error response protocol, with the following additional information.
`finish_reason` types for `v1/chat/completions`:

- Indicates that the request should be retried. This typically occurs when a node crashes; Anyway transparently performs failover and load balancing to ensure that the next request succeeds.
- Indicates insufficient memory. This occurs when Anyway attempts to load the requested model, but the combined available memory across all nodes is insufficient to load the model and its required context simultaneously.
- A general system error occurred.
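Because a crashed node surfaces as a retryable finish reason rather than a hard failure, a client can simply reissue the request. A generic retry sketch; the `is_retryable` predicate is an assumption to be filled in with a check against the actual `finish_reason` value your deployment returns:

```python
import time

def with_retry(call, is_retryable, attempts=3, backoff=0.5):
    """Invoke `call`, reissuing it while `is_retryable(result)` is true.

    Waits with exponential backoff between attempts; returns the last result.
    """
    result = call()
    for attempt in range(1, attempts):
        if not is_retryable(result):
            break
        time.sleep(backoff * (2 ** (attempt - 1)))  # 0.5s, 1s, 2s, ...
        result = call()
    return result
```

For example, `call` could wrap `client.chat.completions.create(...)` and `is_retryable` could inspect `response.choices[0].finish_reason`.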