Provisioning API


Cortecs lets you provision dedicated LLM instances in three ways:

  • Cortecs Web App – an intuitive UI for quick setup

  • Provisioning REST API – for automation, scripting, and integration into your workflows

  • Python Client – a thin wrapper around the REST API

This page covers the Provisioning API, which allows you to start and stop dedicated models programmatically.

The Provisioning API manages model lifecycles (start/stop). For inference, that is, sending prompts to your model, Cortecs uses vLLM's OpenAI-compatible interface. Learn more in the vLLM guide and see practical examples in the examples section.
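Because each running instance serves vLLM's OpenAI-compatible interface, any standard OpenAI client can talk to it. Below is a minimal sketch using the `openai` Python package; the base URL, API key, and model identifier are placeholders, your actual values come from the Instances and Models endpoints described later.

```python
# Minimal inference sketch against a dedicated instance's
# OpenAI-compatible endpoint (served by vLLM).
# NOTE: base_url, api_key, and model are placeholders; substitute
# the values returned for your own instance.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-instance>.cortecs.ai/v1",  # placeholder URL
    api_key="<YOUR_API_KEY>",
)

response = client.chat.completions.create(
    model="<your-model-id>",  # placeholder model identifier
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```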

Why use this API?

  • Automate resource allocation

  • Reduce cost by shutting down unused instances (see the sketch below)
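A rough illustration of that start/work/stop pattern with Python's `requests`. The base URL, endpoint paths, and payload here are hypothetical placeholders, not the actual API; the real authentication scheme and URLs are covered in the sections that follow.

```python
# Illustrative lifecycle sketch: start a dedicated instance, run a
# workload, then stop the instance so no idle cost accrues.
# NOTE: the base URL, paths, and payload below are hypothetical
# placeholders -- see the Authentication and Instances sections
# for the real endpoints.
import requests

BASE_URL = "https://api.cortecs.ai"            # placeholder
HEADERS = {"Authorization": "Bearer <TOKEN>"}  # see Authentication

# Start a dedicated instance (hypothetical endpoint).
instance = requests.post(
    f"{BASE_URL}/instances",
    headers=HEADERS,
    json={"model": "<your-model-id>"},  # placeholder model id
).json()

try:
    ...  # run your workload here, e.g. a batch job
finally:
    # Stop the instance when done (hypothetical endpoint).
    requests.delete(f"{BASE_URL}/instances/{instance['id']}", headers=HEADERS)
```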

Refer to the following sections for authentication, endpoint URLs, and request examples.
