Streaming inference
Streaming inference is supported out-of-the-box, allowing you to receive responses from the model in real-time as they are generated. This feature is particularly useful for applications that require immediate feedback or need to process large amounts of data incrementally.
Using OpenAI
To enable streaming with the OpenAI library, set the stream
parameter to true
.
Using Langchain
Streaming is also supported in Langchain, offering fine-grained control over streaming responses. For detailed usage, refer to the langchain docs.
Last updated