Note
Questions? Email scienceit@lbl.gov or join the CBorg Users Chat group on Google Workspace.
The CBORG API provides access to a set of AI models hosted on-premises at Lawrence Berkeley National Laboratory. The models run entirely within LBL infrastructure, so no data leaves the lab network, and they are free to use with no API cost.
Recommended Model Aliases
We recommend using the lbl/cborg-* model aliases rather than referencing underlying model names directly. The aliases are mapped to specific model configurations that may be updated over time (e.g. when a newer or better model becomes available). Using the aliases ensures your application remains robust against future changes and version updates without requiring code changes on your end.
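One way to benefit from the aliases is to keep the alias in a single place in your application, so that switching to a different alias is a configuration change rather than a code change. The sketch below is a minimal illustration; the environment variable name `CBORG_MODEL` and the helper `resolve_model` are hypothetical and are not defined by the CBORG service.

```python
import os

# Hypothetical environment variable name; chosen for this example only.
MODEL_ENV_VAR = "CBORG_MODEL"
# Alias taken from the tables in this document.
DEFAULT_MODEL = "lbl/cborg-chat"


def resolve_model(default: str = DEFAULT_MODEL) -> str:
    """Return the model alias to use, preferring an environment override."""
    value = os.environ.get(MODEL_ENV_VAR, "").strip()
    # Fall back to the default alias when the variable is unset or blank.
    return value or default
```

The returned string can then be passed as the `model` argument wherever the application issues a request.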
Chat & Reasoning Models
| Model Alias | Description |
|---|---|
| lbl/cborg-chat | General-purpose chat; optimized for low latency and streaming (currently GPT-OSS 20B, reasoning: high) |
| lbl/cborg-coder | High-quality coding and reasoning with large context window (currently Gemma 4 31B with Thinking) |
| lbl/cborg-coder-fast | Fast coding assistance with lower latency (currently GPT-OSS 120B, reasoning: high) |
| lbl/cborg-deepthought | Highest quality reasoning for complex analytical tasks using the full-size dense model with thinking (currently Gemma 4 31B with Thinking) |
| lbl/cborg-mini | Lightweight reasoning with large output context; best for long-form generation (currently Gemma 4 E2B with Thinking) |
| lbl/cborg-mini-fast | Fastest lightweight option with minimal reasoning overhead (currently Gemma 4 E2B non-Thinking) |
| lbl/cborg-vision | Optimized for visual question answering using the full-size dense model with thinking (currently Gemma 4 31B with Thinking) |
| lbl/cborg-vision-fast | Fast vision model with lower latency; good for simpler visual tasks (currently Gemma 4 E2B with Thinking) |
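For the vision aliases, a request needs to carry an image alongside the question. The sketch below builds such a message, assuming the endpoint accepts the OpenAI-style multimodal message format (a `content` list with `text` and `image_url` parts, the image passed as a base64 data URL). That assumption is not confirmed by this page, and the helper name `build_image_message` is hypothetical.

```python
import base64


def build_image_message(question: str, image_bytes: bytes,
                        mime_type: str = "image/png") -> dict:
    """Build one user message holding a text question and an inline image.

    Assumes the OpenAI-style multimodal content format is accepted.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

If the format is supported, the message would be sent with `client.chat.completions.create(model="lbl/cborg-vision", messages=[message])`.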
Specialized Models
| Model Alias | Description |
|---|---|
| lbl/cborg-ocr | High-quality image-to-text extraction using the full-size dense model (currently Gemma 4 31B) |
| lbl/cborg-ocr-fast | Faster OCR with lower latency for simpler documents (currently Gemma 4 E2B with Thinking) |
| lbl/cborg-instant | Bounded-latency text generation via diffusion-based LLM; for short outputs and bidirectional reasoning (currently DiffusionGemma) |
| lbl/cborg-instant-short | Same as lbl/cborg-instant with a shorter maximum output length for minimum bounded latency (256 tokens) |
| lbl/cborg-safeguard | Content safety classifier; returns a safety assessment of the input (currently GPT-OSS Safeguard 120B) |
| lbl/cborg-safeguard-high | Same as lbl/cborg-safeguard with higher reasoning effort for improved accuracy |
Embedding Models
Warning
Embeddings are not portable across models. Because different embedding models produce incompatible vector spaces, we do not provide cborg-branded embedding aliases. You should pin your application to a specific embedding model name and avoid switching models without re-embedding your data.
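One way to enforce this pinning is to record the model name next to the stored vectors and reject anything produced by a different model. The sketch below is a minimal in-memory illustration; the `VectorStore` class is hypothetical and stands in for whatever vector database the application uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Pinned embedding model; changing it means re-embedding all stored data.
EMBEDDING_MODEL = "nomic-embed-text"


@dataclass
class VectorStore:
    """Hypothetical in-memory store that remembers which model made its vectors."""

    model: str
    dimensions: int
    vectors: Dict[str, List[float]] = field(default_factory=dict)

    def add(self, key: str, vector: List[float], model: str) -> None:
        # Refuse vectors from another model: their spaces are incompatible.
        if model != self.model:
            raise ValueError(f"store holds {self.model!r} vectors, got {model!r}")
        # A dimension mismatch is a second sign that the wrong model was used.
        if len(vector) != self.dimensions:
            raise ValueError(
                f"expected {self.dimensions} dimensions, got {len(vector)}"
            )
        self.vectors[key] = vector
```

Checking the model name rather than only the dimension matters because two different models can share a dimension while still producing incompatible spaces.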
The following embedding models are available on-premises:
| Model | Dimensions | Description |
|---|---|---|
| nomic-embed-text | 768 | Good general-purpose text embedding for small-to-medium context |
| nomic-embed-vision | 768 | Image embedding model; shares the same embedding space as nomic-embed-text, enabling cross-modal retrieval |
| nomic-embed-code | 3584 | Large embedding model optimized for source code |
Because nomic-embed-text and nomic-embed-vision share the same embedding space, you can embed both text and images and compare them directly — useful for multimodal search and retrieval applications.
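Comparing vectors from a shared space typically comes down to a similarity score such as cosine similarity. The sketch below ranks image vectors against a text query vector using only the standard library. The function names are illustrative, and the short vectors in the usage note are toy values, not real model output.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_images(query_vec: Sequence[float],
                image_vecs: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in image_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, `rank_images([1.0, 0.0, 0.0], {"cat.png": [0.9, 0.1, 0.0], "plot.png": [0.0, 1.0, 0.0]})` places `cat.png` first. In practice the query vector would come from nomic-embed-text and the image vectors from nomic-embed-vision; how image inputs are submitted to the embedding endpoint is not described on this page.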
General Usage Tips
Parallelism
Limit your application to 5 parallel requests to on-premises models. Exceeding this limit results in a 429 Too Many Requests (rate limit exceeded) error.
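A simple way to stay within this limit is to route all requests through a worker pool of bounded size. The sketch below uses a standard-library thread pool. The helper `run_bounded` is hypothetical, and `task` stands for any function that sends one request and returns its result.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# Limit stated above for on-premises models.
MAX_PARALLEL_REQUESTS = 5


def run_bounded(task: Callable[[str], str], prompts: Iterable[str],
                max_workers: int = MAX_PARALLEL_REQUESTS) -> List[str]:
    """Apply task to every prompt with at most max_workers calls in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(task, prompts))
```

Note that the bound applies per pool: if several processes or pools share one API key, their combined concurrency is what counts toward the limit.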
Long-Running & Agentic Workloads
It is perfectly fine to run agents and automated pipelines around the clock against the on-premises models. There is no time-of-day restriction.
Handling Rate Limit and Network Errors
Warning
On-premises models may occasionally be taken offline for maintenance or updates. Your application should handle network errors gracefully and wait for service to be restored rather than failing immediately.
If you receive a 429 Too Many Requests error, use exponential backoff to retry your requests. For network-level errors (connection refused, timeouts, service unavailable), your application should retry indefinitely with a capped backoff — the service will come back online after maintenance completes.
Example using the Python tenacity library:
```python
import os

import httpx
import openai
from tenacity import (
    retry,
    retry_any,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)

client = openai.OpenAI(
    base_url="https://api.cborg.lbl.gov",
    api_key=os.environ["CBORG_API_KEY"],
)


# Outer decorator: retry rate-limit errors (HTTP 429), up to 6 attempts.
@retry(
    retry=retry_if_exception_type(openai.RateLimitError),
    wait=wait_exponential(multiplier=1, min=2, max=60),
    stop=stop_after_attempt(6),
)
# Inner decorator: retry connection errors, timeouts, and 5xx responses.
# openai.InternalServerError is used instead of openai.APIStatusError,
# because APIStatusError also covers 429 and other 4xx responses (such as
# 401), which would otherwise be retried forever by this decorator.
@retry(
    retry=retry_any(
        retry_if_exception_type(openai.APIConnectionError),
        retry_if_exception_type(openai.InternalServerError),
        retry_if_exception_type(httpx.ConnectError),
        retry_if_exception_type(httpx.RemoteProtocolError),
    ),
    wait=wait_exponential(multiplier=2, min=5, max=120),
    # No stop= here: keep retrying until the service is restored
)
def chat(prompt: str) -> str:
    response = client.chat.completions.create(
        model="lbl/cborg-chat",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


print(chat("Summarize the key findings of my experiment."))
```

The inner decorator retries indefinitely on connection errors, protocol errors, and 5xx server errors (backing off up to 2 minutes between attempts), so a long-running agent will automatically resume once the model service is restored after maintenance. Rate-limit errors pass through to the outer decorator, which gives up after 6 attempts. Other client errors, such as an invalid API key, are not retried.
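For applications that prefer not to add a dependency, the same capped-backoff idea can be sketched with the standard library alone. The helper names below are hypothetical. The defaults (5 second base, 120 second cap) are chosen to mirror the network-error settings described above and are not service requirements.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 5.0, cap: float = 120.0) -> float:
    """Delay before retry number `attempt` (1-based): base doubled each time, capped."""
    # Clamp the exponent so very long outages cannot overflow the float.
    exponent = min(attempt - 1, 32)
    return min(cap, base * (2 ** exponent))


def call_with_backoff(func: Callable[[], T],
                      retry_on: Tuple[Type[BaseException], ...],
                      max_attempts: Optional[int] = None,
                      base: float = 5.0,
                      cap: float = 120.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func until it succeeds; max_attempts=None retries indefinitely."""
    failures = 0
    while True:
        try:
            return func()
        except retry_on:
            failures += 1
            # Give up once the total number of calls reaches max_attempts.
            if max_attempts is not None and failures >= max_attempts:
                raise
            sleep(backoff_delay(failures, base, cap))
```

With the defaults, successive waits are 5, 10, 20, 40, and 80 seconds, then 120 seconds for every later retry. Pass `max_attempts=None` with the connection error types to wait out maintenance, and a finite `max_attempts` for rate-limit errors.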
Quick Start Example
```python
import os

import openai

client = openai.OpenAI(
    base_url="https://api.cborg.lbl.gov",
    api_key=os.environ["CBORG_API_KEY"],
)

# Use a cborg alias; it stays valid across future model updates
response = client.chat.completions.create(
    model="lbl/cborg-chat",
    messages=[{"role": "user", "content": "Hello! What can you help me with?"}],
    stream=True,
)

# Print each streamed fragment as it arrives
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```

Support
For questions or assistance, contact the Science IT team:
- Email: scienceit@lbl.gov
- Google Chat: CBorg Users Chat Group