LBL On-Premises Models

Note

Questions? Email scienceit@lbl.gov or join the CBorg Users Chat group on Google Workspace.

The CBorg API provides access to a set of AI models hosted on-premises at Lawrence Berkeley National Laboratory. These models run entirely within LBL infrastructure and no data leaves the lab, making them suitable for sensitive research workflows.


We recommend using the lbl/cborg-* model aliases rather than referencing underlying model names directly. Each alias maps to a specific model configuration that may be updated over time (for example, when a newer or better model becomes available). Using the aliases keeps your application working across these updates without requiring code changes on your end.
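If you want to check which aliases are currently available, you can query the models endpoint through the OpenAI SDK and keep only the IDs with the lbl/cborg- prefix. This is a minimal sketch that assumes the CBorg API serves the standard OpenAI model-listing route used by client.models.list(); the helper names cborg_aliases and list_available_aliases are illustrative.

```python
import os

ALIAS_PREFIX = "lbl/cborg-"  # prefix shared by the on-premises aliases


def cborg_aliases(model_ids):
    """Return the sorted, de-duplicated model IDs that are lbl/cborg-* aliases."""
    return sorted({m for m in model_ids if m.startswith(ALIAS_PREFIX)})


def list_available_aliases():
    # Imported here so cborg_aliases() can be used without the SDK installed.
    import openai

    client = openai.OpenAI(
        base_url="https://api.cborg.lbl.gov",
        api_key=os.environ["CBORG_API_KEY"],
    )
    # Assumption: the endpoint implements the standard model-listing route.
    return cborg_aliases(model.id for model in client.models.list())


if __name__ == "__main__":
    if "CBORG_API_KEY" in os.environ:
        for alias in list_available_aliases():
            print(alias)
    else:
        print("Set CBORG_API_KEY to query the live model list.")
```

Your requests should still name the alias itself (for example model="lbl/cborg-chat") rather than resolving it to an underlying model name, so that alias updates take effect automatically.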

Chat & Reasoning Models

Model alias | Description
--- | ---
lbl/cborg-chat | Optimized for low latency and streaming; best for interactive chat applications
lbl/cborg-coder | Highest quality reasoning with low latency and streaming; best for coding tasks
lbl/cborg-vision | Optimized for visual question answering (vision + reasoning)
lbl/cborg-deepthought | Highest quality reasoning with high throughput; best for complex analytical tasks
lbl/cborg-mini | Optimized for lightweight tasks and small context windows
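One way to keep the choice of alias in a single place is a small lookup from your own task labels to the aliases in the table above. The task labels and the choose_alias helper below are hypothetical; only the alias strings come from this page.

```python
# Hypothetical task labels mapped to the documented chat and reasoning aliases.
TASK_TO_ALIAS = {
    "chat": "lbl/cborg-chat",
    "coding": "lbl/cborg-coder",
    "vision": "lbl/cborg-vision",
    "analysis": "lbl/cborg-deepthought",
    "lightweight": "lbl/cborg-mini",
}


def choose_alias(task: str) -> str:
    """Look up the alias for a task label; fail loudly on unknown labels."""
    try:
        return TASK_TO_ALIAS[task.strip().lower()]
    except KeyError:
        known = ", ".join(sorted(TASK_TO_ALIAS))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None


if __name__ == "__main__":
    print(choose_alias("coding"))
```

Raising on an unknown label, rather than silently falling back to a default alias, makes a typo in a task name visible immediately.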

Specialized Models

Model alias | Description
--- | ---
lbl/cborg-ocr | Optimized for high-throughput image-to-text conversion (OCR), without reasoning
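A sketch of sending an image to lbl/cborg-ocr, assuming the on-premises image-capable aliases accept images in the OpenAI chat format: a content list that mixes text parts and image_url parts, with the image inlined as a base64 data URL. The prompt wording, the image_message and transcribe helpers, and the page.png path are placeholders.

```python
import base64
import mimetypes
import os


def image_message(image_bytes: bytes, mime_type: str, prompt: str) -> dict:
    """Build an OpenAI-style user message holding a text prompt and an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


def transcribe(path: str, model: str = "lbl/cborg-ocr") -> str:
    # Imported here so image_message() works without the SDK installed.
    import openai

    mime_type = mimetypes.guess_type(path)[0] or "image/png"
    with open(path, "rb") as f:
        message = image_message(f.read(), mime_type, "Transcribe all text in this image.")

    client = openai.OpenAI(
        base_url="https://api.cborg.lbl.gov",
        api_key=os.environ["CBORG_API_KEY"],
    )
    response = client.chat.completions.create(model=model, messages=[message])
    return response.choices[0].message.content


if __name__ == "__main__":
    # "page.png" is a placeholder; point this at a real scanned page.
    if "CBORG_API_KEY" in os.environ and os.path.exists("page.png"):
        print(transcribe("page.png"))
```

Passing model="lbl/cborg-vision" together with a question instead of the transcription prompt should give you visual question answering with the same message format, again assuming image inputs are accepted in this form.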

Embedding Models

Warning

Embeddings are not portable across models. Different embedding models produce incompatible vector spaces, so we do not provide lbl/cborg-* aliases for embeddings. Pin your application to a specific embedding model name, and if you ever switch models, re-embed all of your data rather than mixing old and new vectors.
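One way to enforce this is to store the embedding model name alongside your vectors and refuse to mix vectors from different models or with different dimensions. The VectorIndex class below is a hypothetical in-memory illustration; a real vector database would need the same check at ingestion time.

```python
from dataclasses import dataclass, field
from typing import Optional

# Pinned by name on purpose: there is no lbl/cborg-* alias for embeddings.
EMBEDDING_MODEL = "nomic-embed-text"


@dataclass
class VectorIndex:
    """Tiny in-memory index that remembers which model produced its vectors."""

    model: str
    dim: Optional[int] = None
    vectors: dict = field(default_factory=dict)

    def add(self, key: str, vector, model: str) -> None:
        # Vectors from another model live in a different space: never mix them.
        if model != self.model:
            raise ValueError(
                f"index was built with {self.model!r}, got a vector from {model!r}; "
                "re-embed the data instead of mixing models"
            )
        # The first vector fixes the dimension; later ones must match it.
        if self.dim is None:
            self.dim = len(vector)
        elif len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.vectors[key] = list(vector)


if __name__ == "__main__":
    index = VectorIndex(model=EMBEDDING_MODEL)
    index.add("doc-1", [0.1, 0.2, 0.3], model=EMBEDDING_MODEL)
    print(index.model, index.dim, len(index.vectors))
```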

The following embedding models are available on-premises:

Model | Dimensions | Description
--- | --- | ---
nomic-embed-text | 768 | Good general-purpose text embedding for small-to-medium context
nomic-embed-vision | 768 | Image embedding model; shares the same embedding space as nomic-embed-text, enabling cross-modal retrieval
nomic-embed-code | ~3100 | Large embedding model optimized for source code
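A minimal sketch of requesting text embeddings through the OpenAI SDK's client.embeddings.create() and comparing them with cosine similarity. It assumes the model names are passed exactly as listed in the table. Nomic's model card for nomic-embed-text recommends task prefixes such as search_query: and search_document:; this page does not say whether the CBorg deployment adds them for you, so the example includes them explicitly as an assumption. The embed_texts helper is illustrative.

```python
import math
import os


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embed_texts(texts, model="nomic-embed-text"):
    # Imported here so cosine_similarity() works without the SDK installed.
    import openai

    client = openai.OpenAI(
        base_url="https://api.cborg.lbl.gov",
        api_key=os.environ["CBORG_API_KEY"],
    )
    response = client.embeddings.create(model=model, input=texts)
    # Sort by index so the output order matches the input order.
    return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]


if __name__ == "__main__":
    if "CBORG_API_KEY" in os.environ:
        query, doc = embed_texts([
            "search_query: How many parallel requests may I send?",
            "search_document: Limit your application to 5 parallel requests.",
        ])
        print(len(query), round(cosine_similarity(query, doc), 3))
```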

Because nomic-embed-text and nomic-embed-vision share the same embedding space, you can embed both text and images and compare them directly — useful for multimodal search and retrieval applications.
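Because the two models share a space, a single ranking can mix text and image entries. The sketch below assumes you have already obtained image vectors from nomic-embed-vision and text vectors from nomic-embed-text (how to request image embeddings is not covered on this page); the top_k helper, item IDs, and two-dimensional toy vectors are made up for illustration.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def top_k(query_vec, items, k=3):
    """Rank (item_id, vector) pairs by cosine similarity to query_vec, highest first."""
    q = normalize(query_vec)
    scored = [
        (item_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for item_id, vec in items
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-D vectors; real nomic-embed-text/-vision vectors have 768 dimensions.
    items = [
        ("image:cat.png", [1.0, 0.0]),
        ("text:dog care guide", [0.0, 1.0]),
        ("text:kitten adoption notes", [0.9, 0.1]),
    ]
    for item_id, score in top_k([1.0, 0.0], items, k=2):
        print(item_id, round(score, 3))
```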


General Usage Tips

Parallelism

Limit your application to at most 5 concurrent requests to the on-premises models. Exceeding this limit can degrade performance or cause requests to be rejected, both for you and for other users.
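A simple way to respect this limit from synchronous code is a thread pool whose worker count equals the limit. This sketch uses concurrent.futures.ThreadPoolExecutor with one shared client; the run_bounded helper, the example prompts, and the choice of lbl/cborg-mini are illustrative.

```python
import os
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 5  # documented limit for on-premises models


def run_bounded(func, items, max_parallel=MAX_PARALLEL):
    """Apply func to every item with at most max_parallel calls in flight.

    Results are returned in input order; the first exception raised by func
    propagates to the caller.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(func, items))


if __name__ == "__main__":
    if "CBORG_API_KEY" in os.environ:
        import openai

        # One client shared by all worker threads.
        client = openai.OpenAI(
            base_url="https://api.cborg.lbl.gov",
            api_key=os.environ["CBORG_API_KEY"],
        )

        def ask(prompt):
            response = client.chat.completions.create(
                model="lbl/cborg-mini",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content

        prompts = [f"Give one fact about the number {n}." for n in range(20)]
        for answer in run_bounded(ask, prompts):
            print(answer)
```

With openai.AsyncOpenAI, wrapping each request in an asyncio.Semaphore(5) gives the same cap. Since the limit applies to your application as a whole, several worker processes should share the budget of 5 rather than each using 5.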

Long-Running & Agentic Workloads

It is perfectly fine to run agents and automated pipelines around the clock against the on-premises models. There is no time-of-day restriction.

Handling Rate Limit Errors

If you receive a 429 Too Many Requests error, use exponential backoff or a rate-limiting library to retry your requests gracefully. Do not simply retry in a tight loop.

Example using the Python tenacity library:

import openai
import os
from tenacity import retry, wait_exponential, stop_after_attempt, retry_if_exception_type

client = openai.OpenAI(
    base_url="https://api.cborg.lbl.gov",
    api_key=os.environ["CBORG_API_KEY"],
)

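# Retry only on 429 errors, with exponentially growing waits clamped between
# 2 and 60 seconds, giving up after 6 attempts. reraise=True surfaces the final
# openai.RateLimitError instead of wrapping it in tenacity's RetryError.
@retry(
    retry=retry_if_exception_type(openai.RateLimitError),
    wait=wait_exponential(multiplier=1, min=2, max=60),
    stop=stop_after_attempt(6),
    reraise=True,
)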
def chat(prompt: str) -> str:
    response = client.chat.completions.create(
        model="lbl/cborg-chat",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(chat("Summarize the key findings of my experiment."))
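The OpenAI Python SDK also retries 429 responses on its own (twice by default, configurable with the client's max_retries option) before raising openai.RateLimitError. If you prefer not to add the tenacity dependency, the following is a stdlib-only sketch of exponential backoff with "full jitter" (a random wait between 0 and the current exponential ceiling). The retry_with_backoff function and its defaults are illustrative, not part of any library.

```python
import os
import random
import time


def retry_with_backoff(fn, is_retryable, max_attempts=6, base=2.0, cap=60.0, sleep=time.sleep):
    """Call fn(), retrying retryable exceptions with full-jitter exponential backoff.

    Before retry number n (starting at 0), wait a random time in
    [0, min(cap, base * 2**n)] seconds. Non-retryable errors, and the error from
    the last attempt, are re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))


if __name__ == "__main__":
    if "CBORG_API_KEY" in os.environ:
        import openai

        client = openai.OpenAI(
            base_url="https://api.cborg.lbl.gov",
            api_key=os.environ["CBORG_API_KEY"],
        )
        answer = retry_with_backoff(
            lambda: client.chat.completions.create(
                model="lbl/cborg-chat",
                messages=[{"role": "user", "content": "Summarize the key findings of my experiment."}],
            ).choices[0].message.content,
            is_retryable=lambda exc: isinstance(exc, openai.RateLimitError),
        )
        print(answer)
```

Randomizing the wait spreads retries from many clients over time, so they do not all hit the server again at the same moment after a burst of 429s.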

Quick Start Example

import openai
import os

client = openai.OpenAI(
    base_url="https://api.cborg.lbl.gov",
    api_key=os.environ["CBORG_API_KEY"],
)

# Use a cborg alias: robust against future model updates
response = client.chat.completions.create(
    model="lbl/cborg-chat",
    messages=[{"role": "user", "content": "Hello! What can you help me with?"}],
    stream=True,
)

# Print each streamed text fragment as it arrives
for chunk in response:
    if chunk.choices:  # skip chunks without choices (e.g. a trailing usage-only chunk)
        print(chunk.choices[0].delta.content or "", end="", flush=True)
print()
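For an interactive chat application, the use case lbl/cborg-chat is tuned for, each request must resend the conversation so far, because the chat completions API does not remember earlier turns. A minimal sketch that streams each reply and appends it to the history; the collect_stream and chat_turn helpers are illustrative.

```python
import os


def collect_stream(chunks, echo=True):
    """Join the text deltas of a streamed chat completion, optionally printing them live."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # e.g. a trailing usage-only chunk
            continue
        delta = chunk.choices[0].delta.content or ""
        if echo:
            print(delta, end="", flush=True)
        parts.append(delta)
    if echo:
        print()
    return "".join(parts)


def chat_turn(client, history, user_text, model="lbl/cborg-chat"):
    """Send one user turn along with the full history, then record the assistant reply."""
    history.append({"role": "user", "content": user_text})
    stream = client.chat.completions.create(model=model, messages=history, stream=True)
    reply = collect_stream(stream)
    history.append({"role": "assistant", "content": reply})
    return reply


if __name__ == "__main__":
    if "CBORG_API_KEY" in os.environ:
        import openai

        client = openai.OpenAI(
            base_url="https://api.cborg.lbl.gov",
            api_key=os.environ["CBORG_API_KEY"],
        )
        history = []
        while True:
            user_text = input("> ").strip()
            if not user_text:  # an empty line ends the session
                break
            chat_turn(client, history, user_text)
```

Because the whole history is resent every turn, long sessions grow the request size; trimming or summarizing older turns keeps them within the model's context window.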

Support

For questions or assistance, contact the Science IT team at scienceit@lbl.gov or in the CBorg Users Chat group on Google Workspace.