🔵Instant provisioning

Dedicated inference as you go

Recommended models, which are suitable for most use cases, support 🔵Instant provisioning. For these models we provide an optimized provisioning workflow, so they can be provisioned instantly, without any startup time.

```python
from cortecs_py.client import Cortecs
from cortecs_py.integrations import DedicatedLLM

cortecs = Cortecs()

with DedicatedLLM(client=cortecs, model_id='neuralmagic--Meta-Llama-3.1-8B-Instruct-FP8') as llm:
    essay = llm.invoke('Write an essay about dynamic provisioning')
    print(essay.content)
```
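The `with` statement above implies a provision-on-entry, shut-down-on-exit lifecycle. The following is a minimal sketch of that context-manager pattern; the class and method names are hypothetical illustrations, not the actual `cortecs_py` implementation:

```python
class DedicatedModel:
    """Hypothetical sketch of the lifecycle that DedicatedLLM presumably
    follows: provision a dedicated instance on entry, release it on exit.
    Purely illustrative; not the cortecs_py implementation."""

    def __init__(self, model_id: str):
        self.model_id = model_id
        self.events = []  # records lifecycle steps for illustration

    def __enter__(self):
        # With instant provisioning, startup completes immediately.
        self.events.append("provisioned")
        return self

    def invoke(self, prompt: str) -> str:
        self.events.append("invoked")
        return f"[{self.model_id}] response to: {prompt}"

    def __exit__(self, exc_type, exc, tb):
        # The instance is always released, even if invoke() raised.
        self.events.append("stopped")
        return False


with DedicatedModel("neuralmagic--Meta-Llama-3.1-8B-Instruct-FP8") as llm:
    answer = llm.invoke("Write an essay about dynamic provisioning")
```

Using a context manager this way guarantees the dedicated instance is released when the block exits, so you only pay for the time the model is actually provisioned.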
