Streaming inference

Streaming inference is supported out-of-the-box, allowing you to receive responses from the model in real-time as they are generated. This feature is particularly useful for applications that require immediate feedback or need to process large amounts of data incrementally.

Using OpenAI

To enable streaming with the OpenAI client library, set the stream parameter to True. The response then becomes an iterator that yields chunks as tokens are generated.

from openai import OpenAI

client = OpenAI(api_key='<API_KEY>',
                base_url='<MODEL_URL>')

completion = client.chat.completions.create(
  model="meta-llama/Meta-Llama-3.1-8B-Instruct",
  messages=[
    {"role": "user", "content": "Tell me a joke."}
  ],
  stream=True
)

for chunk in completion:
    # delta.content may be None on the first and last chunks
    print(chunk.choices[0].delta.content or "", end="", flush=True)
The same pattern applies in JavaScript:

import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: '<API_KEY>',
    baseURL: '<MODEL_URL>'
});

async function main() {
  const completion = await openai.chat.completions.create({
    messages: [{ role: "user", content: "Tell me a joke." }],
    model: "meta-llama/Meta-Llama-3.1-8B-Instruct",
    stream: true
  });

  for await (const part of completion) {
    process.stdout.write(part.choices[0]?.delta?.content || '');
  }
}

main();
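Each streamed chunk carries only the newly generated tokens in delta.content, so assembling the complete reply means concatenating the deltas. A minimal sketch below uses mock chunk objects in place of a live API call; they mimic the .choices[0].delta.content shape of the chunks yielded above, and the joke text is invented for illustration:

```python
from types import SimpleNamespace

def mock_stream():
    # Stand-in for the iterator returned with stream=True;
    # None deltas mimic the role-only first chunk and the final chunk.
    for text in [None, "Why did the ", "chicken cross ", "the road?", None]:
        yield SimpleNamespace(
            choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
        )

# Concatenate deltas, treating None as empty.
reply = "".join(chunk.choices[0].delta.content or "" for chunk in mock_stream())
print(reply)  # Why did the chicken cross the road?
```

The `or ""` guard matters: without it, None deltas would raise a TypeError during the join.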

Using LangChain

from langchain_openai import OpenAI

llm = OpenAI(openai_api_key='<API_KEY>',
             openai_api_base='<MODEL_URL>',
             model_name='meta-llama/Meta-Llama-3.1-8B-Instruct')

for chunk in llm.stream('Tell me a joke.'):
    print(chunk, end='', flush=True)


Streaming is also supported in LangChain, offering fine-grained control over streaming responses. For detailed usage, refer to the LangChain docs.