
Introduction

Run language models on Europe's cloud.


Sky Inference is an AI inference platform built on the principles of Sky computing: a unified, resilient, and scalable approach to compute that spans multiple cloud providers. Instead of being tied to a single vendor, Sky Inference dynamically routes AI workloads across a network of GDPR-compliant European providers, automatically optimizing for performance, cost, and availability.

Why Sky Inference?

Traditional cloud AI services often suffer from vendor lock-in, regional outages, and unpredictable pricing. Sky computing changes this by treating the cloud as a global utility. With Sky Inference, you benefit from:

  • High availability: seamless failover between providers keeps your workloads running through outages

  • Sovereign infrastructure: Data protection compliant with GDPR and ISO standards

  • Optimal performance: Dynamic routing to the fastest or most cost-efficient endpoint

  • Simplified development: One API, no infrastructure management

Two Modes of Deployment

Sky Inference supports two complementary modes, depending on your needs:

Serverless Inference

Ideal for most applications, our serverless mode lets you send requests without provisioning anything. Cortecs automatically filters and ranks available providers based on your preference (speed, cost, or a balanced strategy) and routes your request accordingly. If a provider goes down, the system reroutes your call in real time to the next best option.

Dedicated Inference

For high-throughput workloads or strict latency requirements, dedicated mode provisions a private model instance just for you. You can start and stop your deployment via API and get unlimited calls for a flat fee. This makes it a good fit for batch jobs or for serving custom models from Hugging Face.

Next steps

  • Register at cortecs.ai

  • Follow the quick start for Serverless Inference

Resources

  • Discord