Quickstart

A zero-provisioning way to call LLMs: fast, flexible, and fully managed.



1. Register & Fund

Register at cortecs.ai and follow these steps to set up your account:

  • Fill in your billing address on the profile page and press Save.

  • Enter your credit card details.

  • Top up your account to increase your balance.

If your balance reaches zero, your requests will fail. To avoid this, use Auto top-up to set an amount that is automatically transferred when your balance falls below a specified threshold.

2. Choose a Model

Browse the Model Catalog and select the model that fits your use case.

If you’d like to test model responses interactively, see the Playground chapter.

💡 Tip: The Playground is a great way to experiment with prompts and understand model behavior before making real API requests.

Once you’re confident with the model's performance, you can proceed to obtain your access token and start sending requests via the API.

3. Obtain Your Access Token

To make API calls, you need a valid access token. 🔐 Refer to the Authentication Guide for full instructions on how to:

  • Locate your Client ID and Client Secret.

  • Request an access token using the /oauth2/token endpoint.

  • Understand token expiration and error responses.
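The token exchange above can be sketched in Python using only the standard library. This is a hedged example: the full token URL and the exact field names are assumptions based on the standard OAuth2 client-credentials grant and the /oauth2/token path mentioned above; verify both against the Authentication Guide.

```python
import json
import urllib.parse
import urllib.request

# Assumed full path built from the /oauth2/token endpoint above; check the Authentication Guide.
TOKEN_URL = "https://cortecs.ai/api/v1/oauth2/token"

def build_token_request(client_id: str, client_secret: str) -> dict:
    """Form fields for a standard OAuth2 client-credentials grant."""
    return {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }

def fetch_access_token(client_id: str, client_secret: str) -> str:
    """POST the credentials and return the access token from the JSON response."""
    body = urllib.parse.urlencode(build_token_request(client_id, client_secret)).encode()
    req = urllib.request.Request(TOKEN_URL, data=body, method="POST")
    with urllib.request.urlopen(req) as resp:  # raises on HTTP errors, e.g. bad credentials
        return json.load(resp)["access_token"]
```

Remember that tokens expire; re-run the exchange when requests start returning authorization errors.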

4. Send Your First Request

Sky Inference supports OpenAI-compatible calls. Here's an example using curl:

curl 'https://cortecs.ai/api/v1/models/serverless/chat/completions' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer <ACCESS_TOKEN>' \
    -d '{
      "model": "<MODEL_NAME>",
      "messages": [
        { "role": "system", "content": "You are a funny assistant" },
        { "role": "user", "content": "Tell me a joke" }
      ],
      "max_tokens": 100,
      "preference": "speed"
    }'

Tip: Set "preference" to "speed", "cost" or "balanced" to control routing behavior.
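The same call can be made from Python with the standard library alone; this mirrors the curl example above, with `<ACCESS_TOKEN>` and `<MODEL_NAME>` as placeholders you must fill in yourself:

```python
import json
import urllib.request

API_URL = "https://cortecs.ai/api/v1/models/serverless/chat/completions"

def build_chat_payload(model: str, user_message: str,
                       preference: str = "speed", max_tokens: int = 100) -> dict:
    """OpenAI-compatible chat payload plus the routing 'preference' field."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a funny assistant"},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,
        "preference": preference,
    }

def send_chat(access_token: str, payload: dict) -> dict:
    """POST the payload with a Bearer token and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# payload = build_chat_payload("<MODEL_NAME>", "Tell me a joke")
# reply = send_chat("<ACCESS_TOKEN>", payload)
```

Because the API is OpenAI-compatible, any OpenAI-style client pointed at this endpoint should also work.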

➡️ For more details on how routing and preferences work, check the next chapter.
