# Python client

**Cortecs-py** is a lightweight Python wrapper around our [REST API](https://docs.cortecs.ai/dedicated-inference). It provides the tools to dynamically manage your instances directly from your workflow.

#### Setup

To use the API, first create your access credentials on your profile page. Before accessing the API, make sure the following environment variables are set:

* `OPENAI_API_KEY` -> use your **cortecs** API key
* `CORTECS_CLIENT_ID`
* `CORTECS_CLIENT_SECRET`
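
For example, the variables can be set from Python before instantiating the client (the values below are placeholders, not real credentials):

```python
import os

# Placeholder values; substitute the credentials from your profile page.
os.environ["OPENAI_API_KEY"] = "<your-cortecs-api-key>"
os.environ["CORTECS_CLIENT_ID"] = "<your-client-id>"
os.environ["CORTECS_CLIENT_SECRET"] = "<your-client-secret>"
```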

## Methods Overview

The client helps you start, manage and stop your models.

| Method                                            | Description                                                                                                                                      | Return                                                 |
| ------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------ |
| [start](#start)                                   | Starts an instance.                                                                                                                              | An `Instance` object\*.                                |
| [restart](#restart)                               | Restarts a stopped instance by the given `instance_id`.                                                                                          | An `Instance` object\*.                                |
| [ensure\_instance](#ensure_instance)              | If an instance with the same `InstanceArgs` is running, it returns that one; if it's stopped, it restarts it; otherwise it starts a new instance. | An `Instance` object\*.                                |
| [poll\_instance](#poll_instance)                  | Polls an instance until it is running.                                                                                                           | An `Instance` object.                                  |
| [get\_instance](#get_instance)                    | Retrieves an `Instance` by the `instance_id`.                                                                                                    | An `Instance` object.                                  |
| [get\_instance\_status](#get_instance_status)     | Retrieves only the `InstanceStatus` by the `instance_id`.                                                                                        | An `InstanceStatus` object.                            |
| [get\_all\_instances](#get_all_instances)         | Retrieves a list of all instances (both running and stopped).                                                                                    | A list of `Instance` objects.                          |
| [get\_running\_instances](#get_running_instances) | Retrieves a list of all running instances.                                                                                                       | A list of `Instance` objects.                          |
| [stop](#stop)                                     | Stops an instance by its `instance_id`.                                                                                                          | `Instance` object of the stopped instance.             |
| [stop\_all](#stop_all)                            | Stops all running instances.                                                                                                                     | A list of `Instance` objects of the stopped instances. |
| [delete](#delete)                                 | Deletes an instance by its `instance_id`. The instance must first be stopped to be deleted.                                                      | The `instance_id` of the deleted instance.             |
| [delete\_all](#delete_all)                        | Deletes all instances. They must first be stopped to be deleted.                                                                                 | A list of `instance_id`s of the deleted instances.     |

\*If `poll=False`, the `Instance` object won't be complete. For more information visit the [Objects](https://docs.cortecs.ai/dedicated-inference/python-client/objects) page.

Additionally, the client can be used to retrieve information about models and hardware types.

| Method                          | Description                                                            | Return                            |
| ------------------------------- | ---------------------------------------------------------------------- | --------------------------------- |
| get\_all\_models                | Retrieve a list of all supported `Model`s.                             | A list of `Model` objects.        |
| get\_all\_hardware\_types       | Retrieve a list of all supported `HardwareType`s.                      | A list of `HardwareType` objects. |
| get\_available\_hardware\_types | Retrieve a list of the `HardwareType`s which are currently available. | A list of `HardwareType` objects. |

## Starting instances

The client offers several methods for starting an instance: `start`, `ensure_instance`, and `restart`. Given that model startup times can take up to a few minutes (unless using [Instant Provisioning](https://docs.cortecs.ai/dedicated-inference/broken-reference)), users have the option to wait for the instance to become ready by setting the `poll` argument to `True`. Alternatively, users can set the `poll` argument to `False` and use the `poll_instance` method separately for more control.

### `start`

Start an instance with the given instance arguments. It accepts the same arguments as [`ensure_instance`](#ensure_instance).

### `ensure_instance`

Checks if an instance with the same arguments is already running, in which case that one is returned. If there is an equivalent pending instance, that one is returned. If there is an equivalent stopped instance, it's restarted and returned. Otherwise, a new instance with the given arguments is started.
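
The decision chain above can be sketched as follows (illustrative only; the dict shapes and status strings are assumptions for this sketch, not the library's internals):

```python
def ensure_instance_sketch(instances, args):
    # 1-2: reuse an equivalent running or pending instance
    for status in ("running", "pending"):
        for inst in instances:
            if inst["args"] == args and inst["status"] == status:
                return ("reuse", inst)
    # 3: restart an equivalent stopped instance
    for inst in instances:
        if inst["args"] == args and inst["status"] == "stopped":
            return ("restart", inst)
    # 4: otherwise start a new instance
    return ("start", {"args": args, "status": "pending"})
```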

Both `start` and `ensure_instance` accept the following arguments:

| Parameters              | Description                                                                                                                                                                  | Default                                                                                                          |
| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| `model_name`: str       | The model name (equivalent to the Hugging Face name, e.g. `neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8`).                                                                     | Required                                                                                                         |
| `hardware_type_id`: str | The id of the `HardwareType` to use, e.g. `NVIDIA_L40S_1`.                                                                                                                   | The recommended hardware configuration                                                                           |
| `context_length`: int   | The maximum context length the model should be initialized with. A larger context length slows down inference, so it's good practice to limit it according to your use case. | 32k tokens or the maximum context length of the corresponding hardware configuration (if it is smaller than 32k) |
| `billing_interval`: str | The interval in which the instance should be billed. Can be `per_minute` or `per_hour`.                                                                                      | `per_minute`                                                                                                     |
| `num_workers`: int      | The number of workers to start.\*                                                                                                                                            | `1`                                                                                                              |
| `poll`: bool            | If `True`, the method waits until the `Instance` object is fully available. Otherwise it returns a partial `Instance` object with some fields set to `None`.                 | `True`                                                                                                           |

\*A worker is a processing unit within an instance. Each worker with `hardware_type_id NVIDIA_L40S_2` has 2 GPUs, so an instance with `num_workers=2` contains 4 GPUs in total and is billed accordingly.
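
A quick sanity check of that arithmetic (the convention that the trailing number in the `hardware_type_id` is the per-worker GPU count is assumed from the example above):

```python
# Assumption: the trailing number in the hardware_type_id is the GPU
# count per worker, e.g. NVIDIA_L40S_2 -> 2 GPUs per worker.
hardware_type_id = "NVIDIA_L40S_2"
gpus_per_worker = int(hardware_type_id.rsplit("_", 1)[1])
num_workers = 2
total_gpus = gpus_per_worker * num_workers
print(total_gpus)  # 4 GPUs billed for this instance
```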

### `restart`

Restart an instance that has already been started and stopped by its `instance_id`.

| Parameters         | Description                                                                                                                                                      | Default  |
| ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- |
| `instance_id`: str | The id of the instance.                                                                                                                                          | Required |
| `poll`: bool       | If `True`, the method waits until the `Instance` object is fully available. Otherwise it returns a partial `Instance` object with some fields set to `None`. | `True`   |

### `poll_instance`

Poll an instance until it is running.

| Parameters           | Description                                            | Default  |
| -------------------- | ------------------------------------------------------ | -------- |
| `instance_id`: str   | The id of the instance.                                | Required |
| `poll_interval`: int | The interval in seconds between each status check.     | `5`      |
| `max_retries`: int   | The maximum number of retries before raising an error. | `150`    |
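
With those defaults, polling presumably amounts to a simple retry loop; the sketch below illustrates the pattern (an illustration of the behavior, not the library's actual implementation):

```python
import time

def poll_until_running(get_status, poll_interval=5, max_retries=150):
    """Sketch of a polling loop: check the status every poll_interval
    seconds and raise after max_retries unsuccessful attempts."""
    for _ in range(max_retries):
        if get_status() == "running":
            return "running"
        time.sleep(poll_interval)
    raise TimeoutError("instance did not become ready in time")

# Demo with a stubbed status source (poll_interval=0 to avoid waiting)
statuses = iter(["pending", "pending", "running"])
print(poll_until_running(lambda: next(statuses), poll_interval=0))
```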

#### Example

{% code overflow="wrap" %}

```python
from langchain_openai import ChatOpenAI
from cortecs_py import Cortecs

model = 'cortecs/phi-4-FP8-Dynamic'
cortecs = Cortecs()

def do_some_work(instance):
    llm = ChatOpenAI(model_name=model, base_url=instance.base_url)
    joke = llm.invoke('Write a joke about LLMs.')
    print(joke.content)

# Start a new instance
my_instance = cortecs.start(model)
do_some_work(my_instance)

# If there is an existing instance with my arguments, use that one
my_instance = cortecs.ensure_instance(model)
do_some_work(my_instance)
cortecs.stop(my_instance.instance_id)

# More work requiring the same instance came up
my_instance = cortecs.restart(my_instance.instance_id)
```

{% endcode %}

## Managing instances

### `get_instance`

Get an instance by its id.

| Parameters         | Description             | Default  |
| ------------------ | ----------------------- | -------- |
| `instance_id`: str | The id of the instance. | Required |

### `get_instance_status`

Get the instance status by its id.

| Parameters         | Description             | Default  |
| ------------------ | ----------------------- | -------- |
| `instance_id`: str | The id of the instance. | Required |

### `get_all_instances`

Get all instances.

### `get_running_instances`

Get running instances.

#### Example

```python
from cortecs_py import Cortecs

cortecs = Cortecs()

# All instances
all_instances = cortecs.get_all_instances()
for instance in all_instances:
    print(instance.instance_id, instance.instance_status.status)

# Running instances
running_instances = cortecs.get_running_instances()
for instance in running_instances:
    print(instance.instance_id, instance.instance_status.status)

# Info about a specific instance
my_instance = all_instances[0]
instance = cortecs.get_instance(my_instance.instance_id)
instance_status = cortecs.get_instance_status(my_instance.instance_id)
```

## Stopping instances

Stopping an instance halts it as soon as a job is complete, avoiding additional costs while preserving the instance setup for future reuse.

### `stop`

Stop a specific instance by its id.

| Parameters         | Description             | Default  |
| ------------------ | ----------------------- | -------- |
| `instance_id`: str | The id of the instance. | Required |

### `stop_all`

Stop all running instances.

#### Example

```python
from langchain_openai import ChatOpenAI
from cortecs_py import Cortecs

model = 'cortecs/phi-4-FP8-Dynamic'
cortecs = Cortecs()

def do_some_work(instance):
    llm = ChatOpenAI(model_name=model, base_url=instance.base_url)
    joke = llm.invoke('Write a joke about LLMs.')
    print(joke.content)

# Start an instance and do some work
instance = cortecs.start(model)
do_some_work(instance)

# When the work is done, stop the instance.
cortecs.stop(instance.instance_id)

# Alternatively stop all instances
cortecs.stop_all()
```

## Deleting instances

When a setup is no longer needed, the instance can be deleted from the user's console. Note that an instance must be stopped before it can be deleted.

### `delete`

Delete a stopped instance. If the instance is still running, the method raises an error.

| Parameters         | Description             | Default  |
| ------------------ | ----------------------- | -------- |
| `instance_id`: str | The id of the instance. | Required |

### `delete_all`

Delete all instances.

| Parameters    | Description                                                                                                                    | Default |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------ | ------- |
| `force`: bool | If set to `True`, all instances will be deleted, regardless of their status. Otherwise, only stopped instances will be deleted. | `False` |

#### Example

```python
from langchain_openai import ChatOpenAI
from cortecs_py import Cortecs

model = 'cortecs/phi-4-FP8-Dynamic'
cortecs = Cortecs()

def do_some_work(instance):
    llm = ChatOpenAI(model_name=model, base_url=instance.base_url)
    joke = llm.invoke('Write a joke about LLMs.')
    print(joke.content)
    
instance = cortecs.start(model)
do_some_work(instance)

cortecs.stop(instance.instance_id)  # Stop the instance
cortecs.delete(instance.instance_id)  # Delete the instance

# Alternatively stop and delete all instances
cortecs.delete_all(force=True)
```
