Introduction

Welcome to the cortecs docs! cortecs makes it easy to run dedicated language models 🔐 at maximum performance 🚀.

Why Dedicated Inference?

Dedicated inference offers exclusive access to a specific model, ensuring that you are the sole user of the underlying compute resources. This makes it particularly suitable for applications that:

  • Require high data security

  • Need guaranteed latency

  • Have a heavy workload

  • Involve batch processing tasks

Which model should I use?

Cortecs offers a variety of popular models. Visit our models page to explore the available options. Each model comes with detailed information and quality assessments to help you determine whether it meets your requirements. In general, more complex tasks call for larger models, while smaller models respond faster. For most use cases, we recommend FP8 ⚡ quantized models: FP8 quantization reduces memory footprint and increases throughput with negligible quality loss.

Don't see a model you want to use? Join our Discord to add or upvote the model you'd love to use.

Next steps

  • Go to cortecs.ai and start your first model.

  • Follow our Quickstart.

  • Build more complex workflows using langchain.
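Dedicated endpoints are typically consumed like any OpenAI-compatible API. As a minimal sketch, assuming your endpoint speaks the OpenAI chat completions protocol (the base URL, API key, and model name below are hypothetical placeholders; take the real values from your cortecs dashboard), a request can be assembled with nothing but the Python standard library:

```python
import json
import urllib.request

# Hypothetical values -- replace with the endpoint URL, API key,
# and model name shown for your dedicated deployment.
BASE_URL = "https://example-endpoint.cortecs.ai/v1"
API_KEY = "YOUR_API_KEY"
MODEL = "your-model-name"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for a dedicated endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_chat_request("Hello!")
# urllib.request.urlopen(req)  # sends the request once the endpoint is live
```

Because the protocol is OpenAI-compatible, higher-level tooling such as langchain's OpenAI chat wrapper can point at the same base URL and key instead of hand-rolling requests.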
