API Overview

Cortecs provides an OpenAI-compatible API that makes it simple to run serverless inference across multiple providers with no infrastructure setup.

The API supports three main capabilities:

🔁 Chat Completions: `POST /v1/chat/completions`

Send chat requests using any available model. Supports standard OpenAI parameters such as `messages`, `temperature`, and `max_tokens`.

Use the `preference` parameter to optimize routing for `speed`, `cost`, or `balanced`.
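
Because the API is OpenAI-compatible, the official `openai` Python client can be pointed at it directly. A minimal sketch follows; the base URL, API key, and model name are placeholders, not confirmed values (see the Quickstart for the real ones), and since `preference` is a Cortecs-specific extension rather than a standard OpenAI field, it is passed via `extra_body`:

```python
from openai import OpenAI

# Base URL, API key, and model name below are placeholders --
# consult the Quickstart for the actual values for your account.
client = OpenAI(
    base_url="https://api.cortecs.ai/v1",
    api_key="YOUR_CORTECS_API_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # any model from GET /v1/models
    messages=[
        {"role": "user", "content": "Explain serverless inference in one sentence."}
    ],
    temperature=0.7,
    max_tokens=256,
    # preference is a Cortecs extension, so it goes through extra_body.
    extra_body={"preference": "speed"},  # "speed", "cost", or "balanced"
)
print(response.choices[0].message.content)
```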

🧩 Embeddings: `POST /v1/embeddings`

Generate text embeddings using supported models. The API is compatible with OpenAI’s embedding request format and can be routed across multiple providers using the same preference-based selection.
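
A sketch under the same placeholder assumptions as above; the embedding model name is illustrative, so pick an actual embedding model from the model list:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cortecs.ai/v1",  # placeholder base URL
    api_key="YOUR_CORTECS_API_KEY",
)

# Embed several inputs in a single request; the model name is
# illustrative -- choose an embedding model from GET /v1/models.
result = client.embeddings.create(
    model="intfloat/e5-mistral-7b-instruct",
    input=["serverless inference", "dedicated inference"],
)
vectors = [item.embedding for item in result.data]
print(len(vectors), len(vectors[0]))  # number of inputs, embedding dimension
```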

📦 Model Listing: `GET /v1/models`

Retrieve the full list of available models, including their supported features, costs, context sizes, and providers.
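
A short sketch that lists model IDs, again assuming the placeholder base URL and key from the examples above:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cortecs.ai/v1",  # placeholder base URL
    api_key="YOUR_CORTECS_API_KEY",
)

# The standard OpenAI fields (id, owned_by, ...) are always present;
# Cortecs-specific metadata such as cost, context size, and providers
# may appear as additional fields on each entry.
for model in client.models.list().data:
    print(model.id)
```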
