When it comes to realtime data processing, dedicated inference gives you:
No request limits. You can run hundreds of requests per second.
Stable latency. Because the compute is dedicated to you, performance stays stable.
This simple example of a Reddit bot demonstrates both points. All comments from Reddit are classified in realtime. Dozens of requests are sent each second to the classification chain, which classifies each comment into one of the categories Art, Finance, Science, Taylor Swift or Other. When a comment about Taylor Swift is detected, the bot, a huge Taylor Swift fan, generates a comment in response.
import praw
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from cortecs_py import Cortecs
from cortecs_py.integrations import DedicatedLLM

# this example demonstrates dedicated inference in realtime settings
if __name__ == '__main__':
    model_id = 'neuralmagic--Meta-Llama-3.1-8B-Instruct-FP8'
    cortecs = Cortecs()
    reddit = praw.Reddit(user_agent='Read-only example bot')

    with DedicatedLLM(cortecs, model_id, context_length=1500, temperature=0.) as llm:  # todo decrease context_length
        prompt = ChatPromptTemplate.from_template("""
        Given the reddit post below, classify it as either `Art`, `Finance`, `Science`, `Taylor Swift` or `Other`.
        Do not provide an explanation.

        {channel}: {title}\n
        Classification:""")
        classification_chain = prompt | llm | StrOutputParser()

        prompt = ChatPromptTemplate.from_messages([
            ("system", "You are the biggest Taylor Swift fan."),
            ("user", "Respond to this post:\n {comment}")
        ])
        response_chain = prompt | llm

        # scan reddit in realtime and shill about tay tay
        for post in reddit.subreddit("all").stream.comments():
            topic = classification_chain.invoke({'channel': post.subreddit_name_prefixed, 'title': post.link_title})
            print(f'{post.subreddit_name_prefixed}{post.link_title}')
            if topic == 'Taylor Swift':
                response = response_chain.invoke({'comment': post.body})
                print(post.body + '\n---> ' + response.content)
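The comparison `topic == 'Taylor Swift'` only matches if the model returns the label exactly, with no surrounding whitespace, backticks, or trailing period. A minimal sketch of one way to make the check more forgiving is shown here. `normalize_label` and `CATEGORIES` are hypothetical names introduced for illustration; they are not part of cortecs_py or LangChain.

```python
# Hypothetical helper, not part of cortecs_py or LangChain.
CATEGORIES = ("Art", "Finance", "Science", "Taylor Swift", "Other")


def normalize_label(raw: str) -> str:
    """Map raw model output to one of CATEGORIES, falling back to 'Other'."""
    # Drop surrounding whitespace, quotes, backticks and periods,
    # then compare case-insensitively.
    cleaned = raw.strip(" \t\r\n`'\".").casefold()
    for category in CATEGORIES:
        if cleaned == category.casefold():
            return category
    # Anything that is not an exact label (e.g. a full sentence) counts as Other.
    return "Other"
```

In the loop above, the output of `classification_chain.invoke(...)` could be passed through `normalize_label` before it is compared with `'Taylor Swift'`.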
The example is built on praw, so if you want to run it on your own machine, you first have to set up a Reddit account with API access.
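Once API access is set up, the credentials have to reach `praw.Reddit`. The sketch below shows one possible way, assuming the client ID and secret are stored in environment variables. The variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET` and `REDDIT_USER_AGENT`, and the helpers `reddit_credentials` and `make_reddit`, are assumptions made for this illustration and do not come from the original example.

```python
import os


def reddit_credentials(env=os.environ):
    """Collect the keyword arguments for a read-only praw.Reddit instance.

    The environment variable names are hypothetical; choose your own.
    """
    required = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env.get("REDDIT_USER_AGENT", "Read-only example bot"),
    }


def make_reddit(env=os.environ):
    """Build the Reddit client; praw is imported lazily, only when needed."""
    import praw

    return praw.Reddit(**reddit_credentials(env))
```

With these variables exported, `reddit = make_reddit()` could stand in for the `praw.Reddit(user_agent=...)` call in the example.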