# Batch jobs

Dedicated inference is the best way to process massive workloads. It enables parallel processing without being constrained by rate limits.

## Example

`DedicatedLLM` returns a [ChatOpenAI](https://python.langchain.com/docs/integrations/chat/openai/) object from LangChain. Since LangChain supports batched inference through its Expression Language (LCEL), batch jobs can be executed simply by calling the `batch()` method.

```python
with DedicatedLLM(client=cortecs, model_name='<MODEL_NAME>') as llm:
    chain = ... | llm
    summaries = chain.batch([{...} for doc in docs])
```

The following example showcases the power of dynamic provisioning. We summarized **224.2k input tokens** into **12.9k output tokens** in **55 seconds**.

```python
from langchain_community.document_loaders import ArxivLoader
from langchain_core.prompts import ChatPromptTemplate

from cortecs_py.client import Cortecs
from cortecs_py.integrations.langchain import DedicatedLLM

cortecs = Cortecs()
loader = ArxivLoader(
    query="reasoning",
    load_max_docs=40,
    get_full_documents=True,
    doc_content_chars_max=25000,  # ~6.25k tokens, make sure the model supports this context length
    load_all_available_meta=False
)

prompt = ChatPromptTemplate.from_template("{text}\n\n Explain to me like I'm five:")
docs = loader.load()

with DedicatedLLM(client=cortecs, model_name='cortecs/phi-4-FP8-Dynamic') as llm:
    chain = prompt | llm

    print("Processing data batch-wise ...")
    summaries = chain.batch([{"text": doc.page_content} for doc in docs])
    for summary in summaries:
        print(summary.content + '-------\n\n\n')
```

The LLM is **fully utilized** during those 55 seconds, enabling superior cost efficiency without rate limits.
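
To check utilization and throughput for your own workload, you can time the `batch()` call and sum the per-response token counts. This is a minimal sketch building on the variables (`cortecs`, `prompt`, `docs`) from the example above; it assumes your `langchain-core` version populates `usage_metadata` on the returned messages, which recent versions do for OpenAI-compatible chat models.

```python
import time

with DedicatedLLM(client=cortecs, model_name='cortecs/phi-4-FP8-Dynamic') as llm:
    chain = prompt | llm

    start = time.perf_counter()
    summaries = chain.batch([{"text": doc.page_content} for doc in docs])
    elapsed = time.perf_counter() - start

    # Sum the token counts reported per response (assumption: usage_metadata is set)
    input_tokens = sum(s.usage_metadata["input_tokens"] for s in summaries if s.usage_metadata)
    output_tokens = sum(s.usage_metadata["output_tokens"] for s in summaries if s.usage_metadata)

    print(f"{input_tokens} input tokens -> {output_tokens} output tokens "
          f"in {elapsed:.1f}s ({output_tokens / elapsed:.0f} output tok/s)")
```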

<figure><img src="https://github.com/user-attachments/assets/3d50d642-9f78-4336-a1a5-235b109d5f68" alt="" width="375"><figcaption><p>Price Comparison (USD)</p></figcaption></figure>
