Coaxiom API — Real-Time Inference Pricing for Developers

Quick Start

60 seconds to your first response

No signup required for the free tier. Just call the endpoint.

curl

# Get live prices for Llama 3.1 70B across all providers
curl "https://coaxiom.io/api/v1/prices?model=llama-3.1-70b"

JSON response

{
  "model": "llama-3.1-70b",
  "providers": [
    {
      "provider":         "Groq",
      "provider_slug":   "groq",
      "input_per_1m":    0.59,
      "output_per_1m":   0.79,
      "blended_per_1m":  0.71,
      "buy_url":         "https://console.groq.com/?utm_source=coaxiom"
    },
    {
      "provider":         "DeepInfra",
      "provider_slug":   "deepinfra",
      "input_per_1m":    0.30,
      "output_per_1m":   0.40,
      "blended_per_1m":  0.36,
      "buy_url":         "https://deepinfra.com/?utm_source=coaxiom"
    }
    // ... 13 more providers
  ],
  "cheapest":  "DeepInfra",
  "last_updated": "2026-05-12T14:22:10Z"
}
    

python

import requests

# Free tier: no auth required
r = requests.get(
    "https://coaxiom.io/api/v1/prices",
    params={"model": "llama-3.1-70b"},
    timeout=10,  # avoid hanging forever on a stalled connection
)
r.raise_for_status()  # raise on HTTP errors instead of parsing an error body
data = r.json()

# Find the cheapest provider by blended rate
cheapest = min(data["providers"], key=lambda p: p["blended_per_1m"])
print(f"Cheapest: {cheapest['provider']} at ${cheapest['blended_per_1m']:.4f}/1M")

# With an API key (Developer+ tier): historical data
r2 = requests.get(
    "https://coaxiom.io/api/v1/history",
    params={"model": "llama-3.1-70b", "provider": "groq", "days": 30},
    headers={"Authorization": "Bearer cxm_your_key_here"},
    timeout=10,
)
r2.raise_for_status()
history = r2.json()["snapshots"]  # list of OHLC candles

javascript / node

// Uses top-level await: run as an ES module (.mjs or "type": "module")
// Free tier: no auth
const r = await fetch("https://coaxiom.io/api/v1/prices?model=llama-3.1-70b");
if (!r.ok) throw new Error(`Request failed: ${r.status}`);
const { providers } = await r.json();

// Sort a copy by blended cost so the original array is left untouched
const sorted = [...providers].sort((a, b) => a.blended_per_1m - b.blended_per_1m);
console.log(`Cheapest: ${sorted[0].provider} @ $${sorted[0].blended_per_1m}/1M`);

// Developer+ tier: historical prices (requires an API key)
const hist = await fetch("https://coaxiom.io/api/v1/history?model=llama-3.1-70b&provider=groq&days=30", {
  headers: { Authorization: "Bearer cxm_your_key_here" }
});
if (!hist.ok) throw new Error(`Request failed: ${hist.status}`);
const { snapshots } = await hist.json(); // OHLC candles

API Reference

Endpoints

Base URL: https://coaxiom.io/api/v1/
Auth header: Authorization: Bearer cxm_...

    
      
Method	Endpoint	Description	Auth
GET	/prices	Live prices across providers. `?model=llama-3.1-70b`	Free
GET	/providers	All tracked providers + metadata	Free
GET	/models	All tracked models	Free
GET	/history	OHLC price history. `?model=&provider=&days=30`	API Key
GET	/compare	Cross-model comparison. `?models=llama-3.1-70b,gpt-4o`	API Key
GET	/usage	Your API key quota + usage stats	API Key
GET	/news	AI infrastructure news + market signals	Free
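
As a rough illustration of how the endpoint table maps onto requests, the sketch below builds the URL and headers for any listed endpoint and refuses to build a keyed request without a key. The helper name build_request and the KEYED_ENDPOINTS set are hypothetical names introduced here; only the base URL, the endpoint paths, the query parameters, and the Bearer header format come from the table above.

```python
from urllib.parse import urlencode

BASE_URL = "https://coaxiom.io/api/v1/"

# Endpoints the table marks as "API Key"; every other endpoint is free.
KEYED_ENDPOINTS = {"history", "compare", "usage"}


def build_request(endpoint, api_key=None, **params):
    """Return (url, headers) for a GET against one of the listed endpoints.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    endpoint = endpoint.strip("/")
    if endpoint in KEYED_ENDPOINTS and not api_key:
        raise ValueError(f"/{endpoint} requires an API key")
    # Keep commas literal so ?models=a,b matches the form shown in the table.
    query = urlencode(params, safe=",")
    url = BASE_URL + endpoint + (f"?{query}" if query else "")
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    return url, headers


url, headers = build_request(
    "compare", api_key="cxm_your_key_here", models="llama-3.1-70b,gpt-4o"
)
print(url)
```

The returned pair can be passed straight to an HTTP client, for example requests.get(url, headers=headers, timeout=10).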

Pricing

Simple, usage-based tiers

Start free. Upgrade when you need history, webhooks, or higher volume.

Free

$0

forever

100 requests / day

Live prices endpoint
All providers + models
JSON responses
No API key required
No history endpoint
No compare endpoint

Start for free →

⚡ Developer

$19

/ month

10,000 requests / month

Everything in Free
Historical OHLC data (30 days)
Cross-model compare endpoint
Usage stats endpoint
1 price alert webhook
Email support

Get API key →

Team

$79

/ month

100,000 requests / month

Everything in Developer
Full price history (90 days)
5 webhook endpoints
Team API key management
Provider click tracking
Priority support

Get API key →

Enterprise

Custom

contact us

Unlimited requests

Everything in Team
Full history (unlimited)
Kafka streaming feed
Snowflake / BigQuery connector
Unlimited webhooks
SLA + dedicated support

Contact us →

Rate Limits

Limits by tier

Paid-tier limits are rolling monthly windows; the free tier is capped per day. Rate-limit headers are included in every response.

    
      
Feature	Free	Developer	Team	Enterprise
Request limit	100/day	10K/mo	100K/mo	Unlimited
Live prices `/prices`	✓	✓	✓	✓
Historical data `/history`	—	30 days	90 days	Unlimited
Cross-model compare	—	✓	✓	✓
Webhooks (price alerts)	—	1	5	Unlimited
Kafka streaming feed	—	—	—	✓
Snowflake / BigQuery	—	—	—	✓
Response headers	X-RateLimit-Limit · X-RateLimit-Remaining · X-RateLimit-Reset
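
A minimal sketch of reading the three rate-limit headers from a response. It assumes each header carries a plain integer; the page does not say whether X-RateLimit-Reset is a Unix timestamp or a number of seconds, so the value is returned as-is. The function names and the floor threshold are hypothetical.

```python
def parse_rate_limit(headers):
    """Extract the X-RateLimit-* values as ints (None when a header is absent).

    Assumption: every header value is an integer string.
    """
    def as_int(name):
        value = headers.get(name)
        return int(value) if value is not None else None

    return {
        "limit": as_int("X-RateLimit-Limit"),
        "remaining": as_int("X-RateLimit-Remaining"),
        "reset": as_int("X-RateLimit-Reset"),  # units not documented
    }


def should_pause(headers, floor=5):
    """True when the remaining quota is known and has dropped to `floor` or below."""
    remaining = parse_rate_limit(headers)["remaining"]
    return remaining is not None and remaining <= floor


sample = {
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "3",
    "X-RateLimit-Reset": "1800",
}
print(parse_rate_limit(sample), should_pause(sample))
```

With the requests library, r.headers can be passed in directly because it supports the same .get() lookup.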

FAQ

Common questions

Where does the pricing data come from?

Price data is coming soon. Endpoints currently return a coming_soon status.

How fresh is the data?

The live prices endpoint is refreshed from upstream every 5 minutes, and historical OHLC snapshots are taken every hour. If a provider changes prices, you'll see it within 5 minutes on the live endpoint and within 1 hour in the time series.

What's the "blended" rate?

Blended rate is a weighted average of input and output pricing: (input_per_1m × 0.4) + (output_per_1m × 0.6). Most real-world workloads generate more output tokens than input tokens, so this weighting reflects typical usage. You can always use raw input/output rates for exact cost modeling.
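
The formula above, written out as a short sketch alongside an exact cost calculation from the raw rates. The rates and token counts in the usage lines are made-up numbers for illustration, not prices returned by the API, and the function names are hypothetical.

```python
INPUT_WEIGHT = 0.4   # weights taken from the formula above
OUTPUT_WEIGHT = 0.6


def blended_per_1m(input_per_1m, output_per_1m):
    """Weighted average of input and output price, in dollars per 1M tokens."""
    return input_per_1m * INPUT_WEIGHT + output_per_1m * OUTPUT_WEIGHT


def exact_cost(input_tokens, output_tokens, input_per_1m, output_per_1m):
    """Dollar cost of a workload using the raw per-1M rates, with no blending."""
    return (input_tokens * input_per_1m + output_tokens * output_per_1m) / 1_000_000


# Hypothetical provider: $1.00 per 1M input tokens, $2.00 per 1M output tokens
print(f"blended: ${blended_per_1m(1.00, 2.00):.2f}/1M")
print(f"exact:   ${exact_cost(800_000, 200_000, 1.00, 2.00):.2f}")
```

When a workload's input/output mix differs a lot from 40/60, the exact calculation is the better basis for comparing providers.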

Do you support webhooks for price alerts?

Webhooks are planned for the Developer, Team, and Enterprise plans (Sprint 8, coming shortly). You'll be able to register an endpoint and receive a POST when a specific model/provider price changes by more than a threshold you set. Every delivery is signed with HMAC-SHA256.
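
Since webhooks have not shipped yet, the delivery format is not documented. The sketch below shows one common way to verify an HMAC-SHA256 signature. It assumes the signature is a hex digest computed over the raw request body with a shared secret; both that encoding and the function names are assumptions, not the confirmed Coaxiom scheme.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body (assumed signature format)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Compare in constant time so the check does not leak timing information."""
    return hmac.compare_digest(sign(secret, body), received_signature)


secret = b"whsec_example"  # hypothetical shared secret
body = b'{"model": "llama-3.1-70b", "provider": "groq"}'  # hypothetical payload
signature = sign(secret, body)

print(verify_signature(secret, body, signature))         # untampered body
print(verify_signature(secret, body + b" ", signature))  # modified body
```

Verify against the raw bytes as received; re-serializing parsed JSON can change whitespace or key order and break the comparison.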

Can I use this in production?

Yes. The free tier is intentionally useful: 100 requests/day is enough for a cost monitor that checks hourly. For production systems doing continuous monitoring or powering dashboards, Developer or Team is the right fit. We don't throttle burst requests; paid-tier limits are rolling monthly totals, while the free tier is counted per day.
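
To make the quota arithmetic concrete, here is a small sketch that converts a daily request budget into the shortest even polling interval. It assumes one request per tracked model per poll; the function name is hypothetical.

```python
import math

SECONDS_PER_DAY = 86_400


def min_poll_interval_seconds(requests_per_day, models_tracked=1):
    """Shortest even polling interval that stays within a daily request quota.

    Assumes each poll spends one request per tracked model.
    """
    if requests_per_day <= 0 or models_tracked <= 0:
        raise ValueError("quota and model count must be positive")
    polls_per_day = requests_per_day // models_tracked
    if polls_per_day == 0:
        raise ValueError("quota is too small to poll every model once a day")
    return math.ceil(SECONDS_PER_DAY / polls_per_day)


print(min_poll_interval_seconds(100))     # 864 seconds, about 14 minutes
print(min_poll_interval_seconds(100, 4))  # 3456 seconds, just under an hour
```

Hourly polling of a single model uses 24 of the 100 daily requests, so the free tier can cover up to four models at that cadence.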

Is there a Python or Node SDK?

Coming in Sprint 8: pip install coaxiom and npm install @coaxiom/sdk. The API is simple enough that a direct fetch/requests call works today — the SDKs add retry logic, rate limit handling, and typed responses.
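
Until the SDKs ship, retry handling can be approximated with a small wrapper like the one below. This is a sketch and not the SDK's behavior: it assumes the API signals rate limiting with HTTP 429 and transient failures with 5xx codes, neither of which the page states. The function name and parameters are hypothetical.

```python
import time


def get_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable response or attempts run out.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute. Waits base_delay, then 2x, then 4x, and so on
    between attempts (exponential backoff).
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        retryable = response.status_code == 429 or response.status_code >= 500
        if not retryable:
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response  # last response, still an error after all attempts
```

With the requests library, a call might look like get_with_retry(lambda: requests.get(url, timeout=10)). Passing the sender in as a callable keeps the wrapper independent of any one HTTP client and easy to exercise with stub responses.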

What is Llama 3.1 70B and why is it the benchmark?

Llama 3.1 70B is Coaxiom's benchmark model — every major provider hosts it, making it the only model with truly comparable pricing across all providers. It's the "benchmark barrel" of the inference market, like WTI crude for oil pricing. When we say a provider is "cheapest," we mean cheapest for this model specifically.

Inference pricing.
In your code.
