# Python client
## Start, manage and stop models
In some use cases, such as batched or scheduled jobs, it is useful to start compute resources and shut them down automatically once the job is finished. cortecs dynamic provisioning allows you to do exactly that: start an instance of the desired model, execute the job that requires LLM resources, and shut it down when the resources are no longer needed. That way, you pay for exactly the resources you used, and not a minute more!
Cortecs-py is a lightweight Python wrapper around our REST API. It provides the tools you need to dynamically manage your instances directly from your workflow.
To use the API, you first need to create your access credentials on your profile page. Before accessing the API, make sure the following environment variables are set:

- `OPENAI_API_KEY` (set this to your cortecs API key)
- `CORTECS_CLIENT_ID`
- `CORTECS_CLIENT_SECRET`
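If you prefer to set these from Python, for instance in a notebook, a minimal sketch could look like this (the values are placeholders):

```python
import os

# Placeholder values: replace them with the credentials from your profile page.
os.environ["OPENAI_API_KEY"] = "<your-cortecs-api-key>"
os.environ["CORTECS_CLIENT_ID"] = "<your-client-id>"
os.environ["CORTECS_CLIENT_SECRET"] = "<your-client-secret>"
```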
The client helps you start, manage and stop your models.
Method | Description | Return |
---|---|---|
`start` | Starts the model. | Model status `dict` |
`start_and_poll` | Starts the model and waits until it is provisioned and ready to use. | Tuple of `instance_id` and `llm_info` `dict` |
`get_instances_status` | Retrieves a list of all existing instances (both running and stopped). | A list of all existing instances and their details |
`stop` | Stops the model. | Response in `dict` format |
The client has two methods to start an instance:

- `start`: starts an instance and returns a status `dict`.
- `start_and_poll`: starts an instance and waits until provisioning is finished. It returns the `instance_id` and a `dict` containing `base_url` and `model_name` as a tuple, e.g.

```python
'123456', {
    'base_url': 'https://your_url.cortecs.ai',
    'model_name': 'neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8'
}
```
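As a rough sketch, starting an instance and waiting for it to become ready might look like this (the `Cortecs` class name and its import path are assumptions based on the package name; check the package documentation for the exact API):

```python
from cortecs_py import Cortecs  # assumed import path

# Assumes CORTECS_CLIENT_ID and CORTECS_CLIENT_SECRET are set in the environment.
client = Cortecs()

# Blocks until provisioning is finished, then returns the instance id and llm info.
instance_id, llm_info = client.start_and_poll(
    model_name="neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8"
)

print(instance_id)             # e.g. '123456'
print(llm_info["base_url"])    # e.g. 'https://your_url.cortecs.ai'
print(llm_info["model_name"])  # the model served at that endpoint
```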
Both methods accept the parameters listed in the table below.
You can use the `get_instances_status` method to retrieve a list of all your instances and their details (a detailed list of the returned information is available here).
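Continuing with the assumed `client` object from the sketch above:

```python
# Returns all existing instances (running and stopped) with their details.
for instance in client.get_instances_status():
    print(instance)
```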
To stop an instance, call the `stop` method and pass the `instance_id`.
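For example, reusing the `instance_id` returned by `start_and_poll` in the sketch above:

```python
# Stops the instance; the response is returned as a dict.
response = client.stop(instance_id)
print(response)
```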
Parameter | Description | Default |
---|---|---|
`model_name` | `str` - The model name in HuggingFace format (e.g. `neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8`). | Mandatory |
`instance_type` | `str` - The hardware configuration string, e.g. `NVIDIA_L40S_1`. | The recommended hardware configuration |
`context_length` | `int` - The maximum context length the model should be initialized with. A larger context length slows down inference, so it is good practice to limit it according to your use case. | 32k tokens, or the maximum context length of the corresponding hardware configuration if that is smaller than 32k |
`force` | `bool` - If `True`, always creates a new instance. If `False`, checks whether an instance with the same `model_name` and `instance_type` is already running; if so, nothing happens. If such an instance exists but was stopped, it is restarted. | `False` |
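Putting everything together, here is a hedged end-to-end sketch of a batch job: start an instance with explicit parameters, query it through an OpenAI-compatible endpoint (implied by the `OPENAI_API_KEY` convention above), and stop it afterwards. The `Cortecs` import path and whether `base_url` needs a path suffix such as `/v1` are assumptions:

```python
from openai import OpenAI
from cortecs_py import Cortecs  # assumed import path

client = Cortecs()

# Reuse a matching running or stopped instance if one exists (force=False);
# cap the context length to keep inference fast for this job.
instance_id, llm_info = client.start_and_poll(
    model_name="neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8",
    instance_type="NVIDIA_L40S_1",
    context_length=8000,
    force=False,
)

# OPENAI_API_KEY (your cortecs API key) is read from the environment.
llm = OpenAI(base_url=llm_info["base_url"])
completion = llm.chat.completions.create(
    model=llm_info["model_name"],
    messages=[{"role": "user", "content": "Summarize the batch results."}],
)
print(completion.choices[0].message.content)

# Shut the instance down so you only pay for the time the job actually used.
client.stop(instance_id)
```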