AI Models

Available Models Summary

LBL-Hosted Customized Models

LBL-Hosted Customized Models use a customized system prompt on top of a base model to provide improved behavior for LBL users in chat modes.

Note: API users can bypass the system prompt by accessing underlying models directly, if desired.
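For illustration, a request that names an underlying model directly skips the customized system prompt. The sketch below only assembles the request payload in the OpenAI-compatible chat-completions shape; the endpoint URL, key handling, and network call are assumptions to verify against the CBorg API documentation.

```python
# Sketch: target a base model directly instead of a customized lbl/cborg-* model.
# Only payload assembly is shown; sending it requires an API key and the real
# endpoint URL (both omitted here -- check the CBorg API docs).
from typing import Optional

def build_chat_request(model: str, user_prompt: str,
                       system_prompt: Optional[str] = None) -> dict:
    """Assemble a chat-completions payload; omit system_prompt to use the bare base model."""
    messages = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return {"model": model, "messages": messages}

# No system message: the base model's default behavior, not the CBorg prompt.
payload = build_chat_request("lbl/mistral-large", "Summarize this abstract ...")

# Sending it would look roughly like:
# requests.post(API_BASE + "/chat/completions",
#               headers={"Authorization": f"Bearer {API_KEY}"}, json=payload)
```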

| Model Endpoint Location | Base Model | Model Name | Context Length* | Vision | Cost** |
|---|---|---|---|---|---|
| LBL IT Division | Mistral Large 2 | CBorg Chat | 128K | N | Free |
| LBL IT Division | Mistral Large 2 | CBorg Coder | 128K | N | Free |
| LBL IT Division | Phi 3.5 Vision | CBorg Nano | 128K | Y | Free |

Chat and Vision Models

Note: This list is subject to change.

| Model Endpoint Location | Model Creator | Model Name | Context Length* | Vision | Cost** |
|---|---|---|---|---|---|
| LBL IT Division | Mistral | Mistral Large 2 | 128K | N | Free |
| LBL IT Division | Microsoft | Phi 3.5 Vision | 128K | Y | Free |
| Microsoft Azure Cloud | OpenAI | ChatGPT 3.5* | 16K | N | $$ |
| Microsoft Azure Cloud | OpenAI | ChatGPT 4o-Mini | 128K | Y | $ |
| Microsoft Azure Cloud | OpenAI | ChatGPT 4-Omni | 128K | Y | $$$ |
| Google Cloud | Google | Gemini 1.5 Flash | 1.0M | Y | $ |
| Google Cloud | Google | Gemini 1.5 Pro | 1.0M | Y | $$ |
| AWS Cloud | Anthropic | Claude 3.0 Haiku | 200K | Y | $ |
| AWS Cloud | Anthropic | Claude 3.5 Sonnet | 200K | Y | $$ |
| AWS Cloud | Anthropic | Claude 3.0 Opus | 200K | Y | $$$ |
| AWS Cloud | Meta | Llama 3.1 405b | 128K | N | $$ |
| AWS Cloud | Cohere | Command R+ | 128K | N | $$ |
| AWS Cloud | Cohere | Command R | 128K | N | $ |

  • ChatGPT 3.5 is deprecated; please switch to ChatGPT 4o-Mini

Vector Embedding Models

| Model Endpoint Location | Model Creator | Model Name | Max Tokens | Embedding Dimensions | Cost** |
|---|---|---|---|---|---|
| LBNL IT Division | Nomic.AI | nomic-embed-text | 8192 | 768 | Free |

Note

** Costs for using commercial models are paid by the IT Division. There is no cost to individual users at this time and no PID is required.

Note

  • Context window sizes for commercially-hosted Generative AI models are reduced in CBORG Chat to limit excessive usage. To make use of the full-length context window, please request an API key or engage with Science IT Consulting to discuss using cloud services with a PID recharge.

LBL-Hosted Models

The IT Division’s Science IT group provides access to open-weight models running on Berkeley Lab-owned networks and hardware, located in the Building 50 data center. LBL-Hosted models are free-to-use.

These models are licensed for non-commercial research use.

  • Endpoint Location: LBNL IT Division Data Center
  • Model Name: lbl/cborg-chat:latest
  • Underlying Model: Mistral Large 2407 with Custom System Prompt

  • Endpoint Location: LBNL IT Division Data Center
  • Model Name: lbl/cborg-coder:latest
  • Underlying Model: Mistral Large 2407 with Custom System Prompt and Temperature = 0.0

  • Endpoint Location: LBNL IT Division Data Center
  • Model Name: lbl/cborg-nano:latest
  • Underlying Model: Phi 3.5 Vision with Custom System Prompt

  • Endpoint Location: LBNL IT Division Data Center
  • Use Cases: Chat, Summarization, Coding Assistant
  • Vision Support: No
  • Tool Support: Yes
  • Context Window: 128K Tokens
  • Cost: Free to use
  • Model Name: lbl/mistral-large
  • Model Information: Mistral Large 2
  • Terms of Service: Mistral Research License

  • Endpoint Location: LBNL IT Division Data Center
  • Use Cases: Summarization, Vision
  • Vision Support: Yes
  • Tool Support: Yes
  • Context Window: 128K Tokens
  • Cost: Free to use
  • Model Name: lbl/phi
  • Model Information: Phi Open Models
  • Terms of Service: MIT License

A high-performing open embedding model with a large token context window. nomic-embed-text is popular for use with self-hosted Ollama installations; this service provides a hosted endpoint for the same model.

  • Endpoint Location: LBNL IT Division Data Center
  • Use Cases: Query and Passage Encoding
  • Max Tokens: 8192
  • Embedding Dimensions: 768
  • Cost: Free to use
  • API Model Name: lbl/nomic-embed-text
  • Model Information: Nomic.AI
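The vectors returned by this endpoint are plain numeric arrays (768 dimensions for nomic-embed-text), so downstream search is ordinary vector math. A minimal, dependency-free sketch of cosine-similarity ranking over query and passage embeddings (the API call that produces the vectors is omitted):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (e.g. 768-dim from nomic-embed-text)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_passages(query_vec, passage_vecs):
    """Return passage indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, v) for v in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```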

CURRENTLY OFFLINE - WILL BE RESTORED SOON

e5-large-v2 is based on research originating from Microsoft Research, as described in Text Embeddings by Weakly-Supervised Contrastive Pre-training. Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022.

e5-large-v2 is a popular embedding model for vector search and retrieval-augmented generation, but is a small embedding model by current standards.

  • Endpoint Location: LBNL IT Division Data Center
  • Use Cases: Query and Passage Encoding
  • Max Tokens: 512
  • Embedding Dimensions: 1024
  • Cost: Free to use
  • API Model Name: lbl/e5-large-v2
  • Model Card: HuggingFace intfloat/e5-large-v2

CURRENTLY OFFLINE - WILL BE RESTORED SOON

NV-Embed-v1

NV-Embed-v1 is a leading embedding model created by Nvidia, ranked No. 1 on the Massive Text Embedding Benchmark (MTEB benchmark) as of May 24, 2024. NV-Embed-v1 is licensed for non-commercial use only.

  • Endpoint Location: LBNL IT Division Data Center
  • Use Cases: Instructed Query and Passage Encoding
  • Max Tokens: 8192
  • Embedding Dimensions: 4096
  • Cost: Free to use
  • API Model Name: lbl/nv-embed-v1
  • Notes: For non-commercial use only.
  • Model Card: HuggingFace nvidia/NV-Embed-v1

Cloud-Hosted Models

Cloud-hosted models are provided using on-demand services from commercial cloud providers. Costs for using these models are paid for by the IT Division. Please select the appropriate model for your application, keeping in mind the cost burdens associated with each. Using these models will cause your data to be shared with cloud providers in accordance with their terms of service. For detailed terms of service of each provider, see the model details below.

Model Aliases

To simplify application development, alias model names are provided that point to the recommended version of each model provider, as follows:

| Alias Name | Base Model |
|---|---|
| openai/chatgpt:latest | openai/gpt-4o |
| anthropic/claude:latest | anthropic/claude-sonnet |
| google/gemini:latest | google/gemini-1.5-pro |
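A client can mirror the alias mapping so that the recommended model is resolved in one place. The dictionary below copies the table above; confirm the exact strings accepted by the API before relying on them.

```python
# Alias table mirrored in code (names taken from the documentation table;
# verify the exact strings against the CBorg API before use).
MODEL_ALIASES = {
    "openai/chatgpt:latest": "openai/gpt-4o",
    "anthropic/claude:latest": "anthropic/claude-sonnet",
    "google/gemini:latest": "google/gemini-1.5-pro",
}

def resolve_model(name: str) -> str:
    """Map an alias to its recommended base model; pass non-alias names through."""
    return MODEL_ALIASES.get(name, name)
```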

Note: We use ChatGPT through Microsoft Azure Cloud AI Services, subject to the OpenAI/Azure commercial terms of service. This model is deprecated; please switch to ChatGPT 4o-Mini.

  • Endpoint Location: Microsoft Azure Cloud (East US)
  • Use Cases: Chat, Text Summarization
  • Vision Support: No
  • Tool Support: No
  • Context Window: 16K Tokens
  • Cost per 1M Tokens (Input): $0.50
  • Cost per 1M Tokens (Output): $1.50
  • API Model Name: openai/gpt-3.5-turbo
  • Pricing Details: Azure OpenAI Service Pricing
  • Terms of Service: Code of conduct for Azure OpenAI Service

ChatGPT-4o-Mini is the latest cost-efficient version of ChatGPT from OpenAI. It is faster and lower cost than the GPT-4o model, and less than half the cost of ChatGPT 3.5. In addition, 4o-Mini supports a long context window and is multi-modal with vision support. Note: We use ChatGPT through Microsoft Azure Cloud AI Services, subject to the Azure + OpenAI commercial terms of service. GPT 4o-Mini is accessed through the regional deployment in the East US Azure region.

  • Endpoint Location: Microsoft Azure Cloud (East US)
  • Use Cases: Chat, Text Summarization, Image Description, Tool Use
  • Vision Support: Yes
  • Tool Support: Yes
  • Context Window: 128K Tokens (Note: Limited to 32K in CBORG Chat)
  • Cost per 1M Tokens (Input): $0.165
  • Cost per 1M Tokens (Output): $0.66
  • API Model Name: openai/gpt-4o-mini
  • Pricing Details: Azure OpenAI Service Pricing
  • Terms of Service: Code of conduct for Azure OpenAI Service
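For vision support, the usual OpenAI-style multimodal message shape applies: a content list mixing text parts with an image supplied as a base64 data URL. A sketch of building such a message (the schema is assumed from OpenAI's chat API; verify it against the Azure deployment):

```python
import base64

def build_vision_message(text: str, image_bytes: bytes,
                         mime: str = "image/png") -> dict:
    """Build an OpenAI-style user message combining text and one inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Used in a payload such as:
# {"model": "openai/gpt-4o-mini",
#  "messages": [build_vision_message("Describe this image.", png_bytes)]}
```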

ChatGPT-4o is the latest version of ChatGPT from OpenAI. It is faster and lower cost compared to the legacy GPT-4 model. Note: We use ChatGPT through Microsoft Azure Cloud AI Services, subject to the Azure + OpenAI commercial terms of service.

  • Endpoint Location: Microsoft Azure Cloud (East US)
  • Use Cases: Chat, Text Summarization, Image Description, Tool Use
  • Vision Support: Yes
  • Tool Support: Yes
  • Context Window: 128K Tokens (Note: Limited to 8K in CBORG Chat)
  • Cost per 1M Tokens (Input): $5.00
  • Cost per 1M Tokens (Output): $15.00
  • API Model Name: openai/gpt-4o
  • Pricing Details: Azure OpenAI Service Pricing
  • Terms of Service: Code of conduct for Azure OpenAI Service

Our service connects to the enterprise version of Google Gemini. Inputs are not used by Google for training of future AI models.

  • Endpoint Location: Google Cloud
  • Use Cases: Chat, Text Summarization, Image Description
  • Vision Support: Yes
  • Tool Support: No
  • Context Window: 1.0M Tokens (Note: Limited to 32K in CBORG Chat)
  • Cost per 1M Tokens (Input): $0.35
  • Cost per 1M Tokens (Output): $0.70
  • API Model Name: google/gemini-1.5-flash
  • Pricing Details: Gemini API Pricing
  • Terms of Service: Gemini API Additional Terms of Use

Our service connects to the enterprise version of Google Gemini. Inputs are not used by Google for training of future AI models.

  • Endpoint Location: Google Cloud
  • Use Cases: Chat, Text Summarization, Image Description
  • Vision Support: Yes
  • Tool Support: No
  • Context Window: 1.0M Tokens (Note: Limited to 16K in CBORG Chat)
  • Cost per 1M Tokens (Input): $3.50
  • Cost per 1M Tokens (Output): $7.00
  • API Model Name: google/gemini-1.5-pro
  • Pricing Details: Gemini API Pricing
  • Terms of Service: Gemini API Additional Terms of Use

Claude has excellent reasoning and code analysis capabilities compared to other leading models, but can be expensive in the largest variants. The 200K token context window is large compared to competitors. The Haiku version is suitable for short text summarization.

  • Endpoint Location: Amazon Web Services (US West)
  • Use Cases: Chat, Text Summarization, Image Description
  • Vision Support: Yes
  • Tool Support: Yes
  • Context Window: 200k Tokens (Note: Limited to 64K in CBORG Chat)
  • Cost per 1M Tokens (Input): $0.25
  • Cost per 1M Tokens (Output): $1.25
  • API Model Name: anthropic/claude-haiku
  • Pricing Details: Anthropic API Pricing
  • Terms of Service: Anthropic Commercial Terms of Service

Claude has superior reasoning and code analysis capabilities compared to other leading models, but can be expensive in the largest variants. The 200K token context window is large compared to competitors. The 3.5 Sonnet is the latest version of Claude, outperforming 3.0 Opus with lower cost and faster inference speed.

  • Endpoint Location: Amazon Web Services (US West)
  • Use Cases: Chat, Text Summarization, Image Description
  • Vision Support: Yes
  • Tool Support: Yes
  • Context Window: 200k Tokens (Note: Limited to 16K in CBORG Chat)
  • Cost per 1M Tokens (Input): $3.00
  • Cost per 1M Tokens (Output): $15.00
  • API Model Name: anthropic/claude-sonnet
  • Pricing Details: Anthropic API Pricing
  • Terms of Service: Anthropic Commercial Terms of Service

Claude has excellent reasoning and code analysis capabilities compared to other leading models, but can be expensive in the largest variants.

  • Endpoint Location: Amazon Web Services (US West)
  • Use Cases: Chat, Text Summarization, Image Description
  • Vision Support: Yes
  • Tool Support: Yes
  • Context Window: 200k Tokens (Note: Limited to 4096 in CBORG Chat)
  • Cost per 1M Tokens (Input): $15.00
  • Cost per 1M Tokens (Output): $75.00
  • API Model Name: anthropic/claude-opus
  • Pricing Details: Anthropic API Pricing
  • Terms of Service: Anthropic Commercial Terms of Service

Llama 3.1 is the latest version of the open source LLM from Meta. Llama is friendly and conversational, with capabilities approximately equivalent to ChatGPT 4 in the 405B-parameter version.

  • Endpoint Location: Amazon Web Services (US West)
  • Use Cases: Chat, Text Summarization, Coding Assistant
  • Vision Support: No
  • Tool Support: No
  • Context Window: 128K Tokens
  • Cost per 1M Tokens (Input): $5.32
  • Cost per 1M Tokens (Output): $16.00
  • API Model Name: lbl/llama-3
  • Terms of Service: Meta Llama Model Card

Cohere Command R and R+ are advanced models well suited to technical applications. They work well for text summarization and RAG applications with long documents; R+ also supports tool use / function calling.

  • Endpoint Location: Amazon Web Services (US West)
  • Use Cases: Chat, Text Summarization, RAG, Tool Use
  • Vision Support: No
  • Tool Support: Yes
  • Context Window: 128K Tokens
  • Cost per 1M Tokens (Input): $1.50
  • Cost per 1M Tokens (Output): $3.00
  • API Model Name: lbl/command-r-plus
  • Terms of Service: Cohere For AI Acceptable Use Policy
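For the models above that list tool support, a request typically advertises callable functions using the OpenAI-style tools schema. A sketch with one hypothetical tool (`get_weather` is invented for illustration, and the schema should be checked against the service's API documentation):

```python
def build_tool_request(model: str, prompt: str) -> dict:
    """Chat payload advertising one callable tool (OpenAI-style function schema)."""
    get_weather = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [get_weather],
    }
```

If the model decides to call the tool, the response contains the function name and JSON arguments for the client to execute and feed back in a follow-up message.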

Frequently Asked Questions

1. What is the role of context length?

The context length is a measure of the approximate number of words that a model can process as inputs. Some models support extremely long context lengths, such as Command R+ and ChatGPT 4-Omni (128K tokens), the Anthropic Claude models (200K) and Google Gemini 1.5 (1.0M Tokens). For a typical book with 300 words per page, the correspondence between pages and tokens is approximately as follows:

| Context Length | Pages of Text | Model Support* |
|---|---|---|
| 1.0M | 2000 | Google Gemini 1.5 |
| 200K | 400 | Anthropic Claude |
| 128K | 250 | ChatGPT 4, Mistral Large, Phi 3.5 |
| 64K | 128 | |
| 32K | 64 | |
| 16K | 32 | ChatGPT 3.5 |
| 8K | 16 | Llama 3 70B |
| 4K | 8 | |
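The correspondence above works out to roughly 500 tokens per 300-word page (1.0M tokens ≈ 2000 pages). A small sketch of that arithmetic, which reproduces the table rows up to rounding:

```python
# ~500 tokens per 300-word page, derived from 1.0M tokens ~ 2000 pages above.
TOKENS_PER_PAGE = 500

def pages_of_text(context_tokens: int) -> int:
    """Approximate number of 300-word book pages that fit in a context window."""
    return context_tokens // TOKENS_PER_PAGE

for window in (1_000_000, 200_000, 128_000, 16_000):
    print(f"{window} tokens -> ~{pages_of_text(window)} pages")
```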

When chatting with a model, your entire chat history of the session is fed into the context window with every message sent. Therefore, as you send more messages the context length will increase. Over time this can cause the cost of each message exchange to increase until the model’s maximum token limit is reached.

Note

  • In CBORG Chat, we have set the maximum context length of commercial models significantly lower than their design maximum in order to control costs for the IT Division. If you need to use a model's full-length context window, our API key service provides access to commercial models with the full context window.