Batch jobs
Everything All at Once
Dedicated inference is the best way to process massive workloads. It enables parallel processing without subjecting you to rate limits.
Example
`DedicatedLLM` returns a `ChatOpenAI` object from LangChain. Because LangChain supports batched inference through the LangChain Expression Language (LCEL), batch jobs are easy to execute with the `batch()` method.
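To illustrate, here is a minimal sketch of a batch summarization job. The `dedicated_inference` import path, the `DedicatedLLM` constructor and its arguments, and the model name are assumptions for illustration; `batch()` itself is the standard method on LangChain's Runnable interface, which `ChatOpenAI` implements.

```python
# Minimal sketch. The `dedicated_inference` module, the `DedicatedLLM`
# constructor, and the model name are hypothetical placeholders;
# batch() is the standard LangChain Runnable method.
from dedicated_inference import DedicatedLLM  # hypothetical import path

articles = [
    "First long article to summarize ...",
    "Second long article to summarize ...",
]

# Hypothetical constructor; per the docs it returns a LangChain ChatOpenAI object.
llm = DedicatedLLM(model="my-dedicated-model")

prompts = [f"Summarize the following article:\n\n{a}" for a in articles]

# batch() fans the prompts out in parallel instead of looping over invoke(),
# saturating the dedicated endpoint without hitting shared rate limits.
summaries = llm.batch(prompts)

for message in summaries:
    print(message.content)
```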
This simple example showcases the power of dynamic provisioning: we summarized 224.2k input tokens into 12.9k output tokens in 55 seconds. The LLM is fully utilized for those 55 seconds, enabling superior cost efficiency without rate limits.