Provisioning API

Cortecs gives you flexible options to provision dedicated LLM instances based on your needs:

  • Web App: An intuitive UI for quick setup

  • Provisioning REST API: Full automation for seamless workflow integration

  • Python Client: A lightweight wrapper around the Provisioning API for easier scripting

📦 The Provisioning API allows you to programmatically start and stop dedicated models, giving you control over your resource usage directly from your applications.
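
Below is a minimal sketch of this pattern using Python's `requests` library. The base URL, routes, payload fields, and response fields shown here are illustrative assumptions, not the documented schema; see the Authentication, Instances, and Models sections for the exact endpoints.

```python
import os

import requests

BASE_URL = "https://cortecs.ai/api/v1"                 # placeholder; see the API reference
TOKEN = os.environ["CORTECS_ACCESS_TOKEN"]             # obtained as described in Authentication
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# Start a dedicated instance for a model (route and payload fields are assumptions).
resp = requests.post(
    f"{BASE_URL}/instances",
    headers=HEADERS,
    json={"model_id": "meta-llama/Llama-3.1-8B-Instruct"},  # example model name
    timeout=30,
)
resp.raise_for_status()
instance_id = resp.json()["instance_id"]               # field name is an assumption

# ... send prompts to the running instance (see the inference example below) ...

# Stop the instance again when done (route is an assumption).
requests.delete(f"{BASE_URL}/instances/{instance_id}", headers=HEADERS, timeout=30).raise_for_status()
```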

For inference (sending prompts to your model), Cortecs uses vLLM’s OpenAI-compatible interface. Learn more in the vLLM guide and see practical examples in the examples section.
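
Because the interface is OpenAI-compatible, any OpenAI client can talk to a running instance. A minimal sketch, assuming a placeholder endpoint URL, API key, and model name (substitute the values for your own instance):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-instance-url>/v1",  # placeholder: your instance's endpoint
    api_key="<your-api-key>",                   # placeholder: your credentials
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",   # example: the model your instance serves
    messages=[{"role": "user", "content": "Summarize vLLM in one sentence."}],
)
print(response.choices[0].message.content)
```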

✅ Why use this API?

  • Automate resource management: Start and stop models as part of your workflows (see the sketch after this list).

  • Optimize costs: Shut down unused instances to avoid unnecessary charges.

  • Seamless integration: Works easily with your backend systems and pipelines.
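
The sketch below ties these points together for a batch job: provision an instance, wait until it is ready, run the prompts, and always tear the instance down so you only pay while it works. Routes, payload fields, and status values are illustrative assumptions; consult the Instances reference for the real schema.

```python
import os
import time

import requests
from openai import OpenAI

BASE_URL = "https://cortecs.ai/api/v1"                 # placeholder; see the API reference
HEADERS = {"Authorization": f"Bearer {os.environ['CORTECS_ACCESS_TOKEN']}"}


def run_batch_job(model_id: str, prompts: list[str]) -> list[str]:
    # 1. Provision an instance (route and payload are assumptions).
    inst = requests.post(f"{BASE_URL}/instances", headers=HEADERS,
                         json={"model_id": model_id}, timeout=30).json()
    instance_id = inst["instance_id"]                  # field name is an assumption
    try:
        # 2. Poll until the instance reports it is ready (status value assumed).
        while True:
            info = requests.get(f"{BASE_URL}/instances/{instance_id}",
                                headers=HEADERS, timeout=30).json()
            if info["status"] == "running":
                break
            time.sleep(15)

        # 3. Send prompts to the instance's OpenAI-compatible endpoint.
        client = OpenAI(base_url=f"{info['base_url']}/v1",  # field name assumed
                        api_key=os.environ["CORTECS_ACCESS_TOKEN"])
        return [
            client.chat.completions.create(
                model=model_id,
                messages=[{"role": "user", "content": p}],
            ).choices[0].message.content
            for p in prompts
        ]
    finally:
        # 4. Always shut the instance down, even if the job fails.
        requests.delete(f"{BASE_URL}/instances/{instance_id}",
                        headers=HEADERS, timeout=30)
```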

👉 Ready to get started?

The following sections provide everything you need to authenticate, connect to the API, and send requests.
