# Python client
## Start, manage and stop models
In some use cases, such as batched or scheduled jobs, it is useful to start compute resources and shut them down automatically once the job is finished. cortecs dynamic provisioning allows you to do exactly that: start an instance of the desired model, execute the job that requires LLM resources, and shut it down when the resources are no longer needed. That way, you pay for exactly the resources you used, and not a minute more!
Cortecs-py is a lightweight Python wrapper around our REST API. It provides the tools you need to dynamically manage your instances directly from your workflow.
To use the API, you first need to create your access credentials on your profile page. Before accessing the API, make sure the following environment variables are set:

- `OPENAI_API_KEY` (set this to your cortecs API key)
- `CORTECS_CLIENT_ID`
- `CORTECS_CLIENT_SECRET`
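If you prefer to set these from Python, for instance in a notebook, a minimal sketch could look like this (the values are placeholders):

```python
import os

# Placeholder values: replace them with the credentials from your profile page.
os.environ["OPENAI_API_KEY"] = "<your-cortecs-api-key>"
os.environ["CORTECS_CLIENT_ID"] = "<your-client-id>"
os.environ["CORTECS_CLIENT_SECRET"] = "<your-client-secret>"
```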
The client helps you start, manage and stop your models.
Method | Description | Return |
---|---|---|
`start` | Starts the model. | Model status `dict` |
`start_and_poll` | Starts the model and waits until it is provisioned and ready to use. | Tuple of `instance_id` and `llm_info` `dict` |
`get_instances_status` | Retrieves a list of all existing instances (both running and stopped). | A list of all existing instances and their details |
`stop` | Stops the model. | Response in `dict` format |
The client has two methods to start an instance:

- `start`: starts an instance and returns a status `dict`.
- `start_and_poll`: starts an instance and waits until provisioning is finished. It returns the `instance_id` and a `dict` containing `base_url` and `model_name` as a tuple, e.g.

```python
'123456', {
    'base_url': 'https://your_url.cortecs.ai',
    'model_name': 'neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8'
}
```
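As a rough sketch, starting an instance and waiting for it to become ready might look like this (the `Cortecs` class name and its import path are assumptions based on the package name; check the package documentation for the exact API):

```python
from cortecs_py import Cortecs  # assumed import path

# Assumes CORTECS_CLIENT_ID and CORTECS_CLIENT_SECRET are set in the environment.
client = Cortecs()

# Blocks until provisioning is finished, then returns the instance id and llm info.
instance_id, llm_info = client.start_and_poll(
    model_name="neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8"
)

print(instance_id)             # e.g. '123456'
print(llm_info["base_url"])    # e.g. 'https://your_url.cortecs.ai'
print(llm_info["model_name"])  # the model served at that endpoint
```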
Both methods accept the parameters listed in the table below.
You can use the `get_instances_status` method to retrieve a list of all your instances and their details (a detailed list of the returned information is available here).
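Continuing with the assumed `client` object from the sketch above:

```python
# Returns all existing instances (running and stopped) with their details.
for instance in client.get_instances_status():
    print(instance)
```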
To stop an instance, call the `stop` method and pass the `instance_id`.
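For example, reusing the `instance_id` returned by `start_and_poll` in the sketch above:

```python
# Stops the instance; the response is returned as a dict.
response = client.stop(instance_id)
print(response)
```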
Parameter | Description | Default |
---|---|---|
`model_name` | `str` - The model name in HuggingFace format (e.g. `neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8`). | Mandatory |
`instance_type` | `str` - The hardware configuration string, e.g. `NVIDIA_L40S_1`. | The recommended hardware configuration |
`context_length` | `int` - The maximum context length the model should be initialized with. A larger context length slows down inference, so it is good practice to limit it according to your use case. | 32k tokens, or the maximum context length of the corresponding hardware configuration if that is smaller than 32k |
`force` | `bool` - If `True`, always creates a new instance. If `False`, checks whether an instance with the same `model_name` and `instance_type` is already running; if so, nothing happens. If such an instance exists but was stopped, it is restarted. | `False` |
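Putting everything together, here is a hedged end-to-end sketch of a batch job: start an instance with explicit parameters, query it through an OpenAI-compatible endpoint (implied by the `OPENAI_API_KEY` convention above), and stop it afterwards. The `Cortecs` import path and whether `base_url` needs a path suffix such as `/v1` are assumptions:

```python
from openai import OpenAI
from cortecs_py import Cortecs  # assumed import path

client = Cortecs()

# Reuse a matching running or stopped instance if one exists (force=False);
# cap the context length to keep inference fast for this job.
instance_id, llm_info = client.start_and_poll(
    model_name="neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8",
    instance_type="NVIDIA_L40S_1",
    context_length=8000,
    force=False,
)

# OPENAI_API_KEY (your cortecs API key) is read from the environment.
llm = OpenAI(base_url=llm_info["base_url"])
completion = llm.chat.completions.create(
    model=llm_info["model_name"],
    messages=[{"role": "user", "content": "Summarize the batch results."}],
)
print(completion.choices[0].message.content)

# Shut the instance down so you only pay for the time the job actually used.
client.stop(instance_id)
```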