
Realtime streams

Zero Limits, Instant Response

When it comes to realtime data processing, dedicated inference allows you to have:

  • No request limits. You can run hundreds of requests per second.

  • Stable latency. Because the compute is dedicated to you, performance stays consistent.
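
To make the first point concrete: sustaining many requests per second usually means keeping several requests in flight at once. The following is a minimal sketch using only the Python standard library. The `classify` function is a hypothetical stand-in that sleeps to mimic network latency; it is not part of cortecs_py, and in a real setup it would be replaced by a call to your model.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def classify(comment: str) -> str:
    """Hypothetical stand-in for a model call; sleeps to mimic network latency."""
    time.sleep(0.05)
    return "Taylor Swift" if "taylor swift" in comment.lower() else "Other"


def classify_concurrently(comments, max_workers=32):
    """Classify comments with up to max_workers requests in flight.

    The results are returned in the same order as the input.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(classify, comments))


if __name__ == "__main__":
    comments = ["I love Taylor Swift", "Stocks are up today"] * 50
    start = time.perf_counter()
    labels = classify_concurrently(comments)
    elapsed = time.perf_counter() - start
    print(f"{len(labels)} requests in {elapsed:.2f}s")
```

Threads are a reasonable fit here because the work is I/O-bound: each worker spends most of its time waiting for a response.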

The following simple example of a Reddit bot demonstrates both properties. The whole Reddit comment stream is classified in realtime. Dozens of requests are sent each second to a chain, which uses the subreddit and post title of each comment to assign it to one of the categories Art, Finance, Science, Taylor Swift or Other. When a comment about Taylor Swift is detected, the bot, a huge Taylor Swift fan, generates a comment in response (the example prints it rather than posting it).

import praw
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

from cortecs_py import Cortecs
from cortecs_py.integrations.langchain import DedicatedLLM

# this example demonstrates dedicated inference in realtime settings
if __name__ == '__main__':
    model_name = 'cortecs/phi-4-FP8-Dynamic'
    cortecs = Cortecs()
    reddit = praw.Reddit(user_agent='Read-only example bot')

    with DedicatedLLM(cortecs, model_name, context_length=1500, temperature=0.) as llm:  # todo decrease context_length

        prompt = ChatPromptTemplate.from_template("""
        Given the reddit post below, classify it as either `Art`, `Finance`, `Science`, `Taylor Swift` or `Other`.
        Do not provide an explanation.
        
        {channel}: {title}\n Classification:""")
        classification_chain = prompt | llm | StrOutputParser()

        prompt = ChatPromptTemplate.from_messages([
            ("system", "You are the biggest Taylor Swift fan."),
            ("user", "Respond to this post:\n {comment}")
        ])
        response_chain = prompt | llm

        # scan reddit in realtime and shill about tay tay
        for post in reddit.subreddit("all").stream.comments():
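            # classify the comment using its subreddit and the title of the post it belongs to
            topic = classification_chain.invoke({'channel': post.subreddit_name_prefixed, 'title': post.link_title})
            print(f'{post.subreddit_name_prefixed} {post.link_title}')
            # strip whitespace so a stray newline in the model output does not break the comparison
            if topic.strip() == 'Taylor Swift':
                response = response_chain.invoke({'comment': post.body})
                print(post.body + '\n---> ' + response.content)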
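
The comparison against 'Taylor Swift' only matches if the model returns exactly that string. Models sometimes wrap the label in backticks or quotes, or add a trailing period, so it can help to normalize the output first. A minimal sketch, assuming a hypothetical helper `normalize_label` that is not part of cortecs_py or LangChain:

```python
CATEGORIES = ("Art", "Finance", "Science", "Taylor Swift", "Other")


def normalize_label(raw: str) -> str:
    """Map raw model output to one of CATEGORIES, falling back to 'Other'."""
    # remove surrounding whitespace, then backticks, quotes and periods
    cleaned = raw.strip().strip("`'\".").strip().casefold()
    for category in CATEGORIES:
        if cleaned == category.casefold():
            return category
    # anything that is not an exact category name is treated as 'Other'
    return "Other"
```

With such a helper, the check in the loop could read `if normalize_label(topic) == 'Taylor Swift':`.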

The Reddit bot example is built on PRAW, so if you want to run it on your own machine, you first have to set up a Reddit account with API access.
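
Once you have API access, PRAW needs a client ID and client secret in addition to the user agent. One way to supply them is sketched below, assuming the credentials are stored in environment variables. The variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET` and `REDDIT_USER_AGENT` and both helper functions are hypothetical choices for this illustration; only the `client_id`, `client_secret` and `user_agent` keyword arguments of `praw.Reddit` come from PRAW itself.

```python
import os
from typing import Mapping


def reddit_credentials(env: Mapping[str, str] = os.environ) -> dict:
    """Collect read-only PRAW credentials from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    required = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env.get("REDDIT_USER_AGENT", "Read-only example bot"),
    }


def make_reddit(env: Mapping[str, str] = os.environ):
    """Create a read-only Reddit instance from the collected credentials."""
    import praw  # imported lazily so the helper above works without praw installed

    return praw.Reddit(**reddit_credentials(env))
```

In the example, `reddit = make_reddit()` would then take the place of the `praw.Reddit(user_agent=...)` line.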


Last updated 4 months ago