
Contents
  • Coding Tools
  • Technical Writing
  • Data Extraction & Summarization

Getting Started with Coding Tools

  • VS Code with RooCode: Works well with CBorg API - powerful agentic features and easy to switch between models to control costs.
  • Aider CLI: An open-source CLI “agentic” tool that works well with CBorg Coder (lbl/cborg-coder) for unlimited free usage.
  • Codex CLI: Recommended to use with o4-mini (excellent cost-performance efficiency) or gpt-oss-120b (free to use).

Also Good with Caveats

  • VS Code with Continue: Works well for more targeted code edits (insertion, line editing). However, setup is more complicated.
  • Claude Code: Excellent overall, but very expensive. The “agentic” nature of this tool can lead to very high costs for simple operations. Lacks support for codebase indexing.
  • Gemini CLI: Excellent overall, but also very expensive to use when Gemini Pro is enabled. Reserve this model for high-complexity tasks.

Unsupported Tools

  • Cursor: Requires a subscription from Cursor for best results, which is not available through CBorg.
  • GitHub Copilot: Not supported by CBorg - you need a subscription from GitHub to use it.

Limitations of AI Coders

  • Knowledge Cutoff Date: Models are trained on a corpus of data that is typically 6-12 months old and cannot generate code for features or APIs newer than the model’s cutoff date.
  • Poor performance in scientific coding: Coding models are currently weak in most scientific and numerical programming tasks (see SciCode benchmark results).
  • Poor performance on new, obscure or unstable APIs: Models hallucinate frequently when generating code for obscure APIs, due to a lack of sufficient examples in their training data. Because of knowledge cut-off dates, models will also be unaware of API changes newer than 6-12 months. Unstable APIs with multiple incompatible versions may result in broken code generation. Consider augmenting your coder’s knowledge with access to the API documentation, e.g. using the Context7 MCP tool (a sketch of this approach follows this list).
  • Context Length Limit: Models can only ingest a finite amount of code at inference time, limiting their total field of view in large codebases. Indexing (vector search) can help, but does not guarantee that the coding agent sees all code relevant to a prompt. Models can also only generate a finite and usually relatively small amount of output code in each turn, limiting their ability to rewrite large sections of code. Using ‘diff’ edits enables working with large files, but imperfect diff matching can cause errors.
  • Context Rot: Models are easily overwhelmed when presented with multiple instructions in a prompt, and randomly “forget” to follow instructions, particularly when the context length becomes long. Keep prompts as simple as possible and do not attempt to perform multiple tasks in a single prompt.
  • Poor adherence to DRY principles: LLM-generated code tends to have a lower “information density” due to excessive repetition and underutilization of abstractions, leading to code that is less efficient to execute and more difficult to maintain.
  • Bias toward Legacy Solutions: LLMs tend to overuse legacy code libraries because these are better documented in the training data, causing underutilization of newer and better alternatives.
  • Low Quality of Tab-Completion: “Fill-in-the-middle” models used for tab-style autocomplete are extremely small and perform poorly on complex coding tasks; large models cannot be used for tab completion because of their high latency.
  • Potential Data Loss: AI coding tools can perform unexpected actions such as deleting code, corrupting files, breaking code or even deleting files. Always use a version control system and maintain backups of your code base to guard against potential data loss.
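
As a rough illustration of the documentation-augmentation mitigation mentioned above, the sketch below simply prepends current API documentation to a coding prompt; the docs file path and task are hypothetical placeholders, and an MCP tool such as Context7 automates the same idea:

# Sketch: prepend current API documentation to a coding prompt, so the model
# is not limited to its training-cutoff knowledge of the API.
# The docs file path and the task description are hypothetical placeholders.
from pathlib import Path

api_docs = Path("docs/my_api_reference.md").read_text()

prompt = (
    "Current documentation for the API you must use:\n\n"
    + api_docs
    + "\n\nUsing only the functions documented above, write a client that "
    + "uploads a file and polls for completion."
)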

Model Selection for Coding

Last updated: 2025-08-25

Recommended Choices

Model | Cost | Speed | Comments
lbl/cborg-coder | Free | Fast | Good performance and fast. Unlimited usage
google/gemini-pro-high | $$$ | Slow | Best for complex coding, scientific coding. Expensive - watch usage
anthropic/claude-sonnet-high | $$$ | Slow | Good for complex coding. Expensive. Recommended for Claude Code CLI
xai/grok-3-mini | $ | Medium | Solid performer and very low cost
gcp/qwen-3-coder* | $ | Medium | Excellent performance and low cost. See comments
openai/o3-high | $$$ | Slow | Good performance. Recommended model for OpenAI Codex CLI
openai/o4-mini-high | $$ | Slow | Good performance. Recommended model for OpenAI Codex CLI

Currently Not Recommended for Coding

Model | Cost | Speed | Comments
google/gemini-pro | $$$ | Medium | Recommended to use the -high variant instead
anthropic/claude-opus | $$$$ | Slow | Very expensive, with minimal improvements over Sonnet
anthropic/claude-opus-high | $$$$ | Slow | Very expensive, with minimal improvements over Sonnet
xai/grok-3 | $$ | Medium | Recommended alternative: xai/grok-3-mini, which has reasoning
openai/gpt-5 | $$ | Medium | Recommended alternatives: openai/o3-high or openai/o4-mini-high
lbl/cborg-chat | Free | Slow | Underperforms on coding tasks
azure/deepseek-r1* | $$$ | Slow | Good performance but excessive reasoning. See comments

Comments:

  • Qwen 3 Coder: The model is censored and may have weak guardrails. Use with caution.
  • Azure DeepSeek R1: The MSAI-DS-R1 model is fine-tuned by Microsoft AI to remove censorship and improve performance, but it generates excessive reasoning tokens on hard tasks, which hurts cost efficiency. It may have weak guardrails compared to alternatives. Use with caution.

Models for Advanced Technical Writing

Recommended Best Practices:

  • Style Emulation: Using a custom system prompt, provide a large corpus of sample writing and ask the model to emulate its style and language. This is more effective for maintaining style than general descriptions of the desired writing style.
  • Complex Document Construction: Use multiple prompts to construct the document one paragraph or one section at a time. This scales better and maintains accuracy and tone throughout. (A sketch of both practices follows this list.)
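
A minimal sketch of both practices, assuming an OpenAI-compatible client. The base URL, environment variable name, corpus file and section titles are illustrative assumptions - check the CBorg documentation for the actual endpoint:

# Sketch: style emulation via a system prompt holding a sample corpus,
# building the document one section at a time.
# Endpoint, env var, file name and section titles are illustrative assumptions.
import os
from pathlib import Path
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["CBORG_API_KEY"],   # assumed env var for your key
    base_url="https://api.cborg.lbl.gov",  # assumed CBorg endpoint
)

corpus = Path("writing_samples.txt").read_text()  # your own prior writing

system_prompt = (
    "You are a technical writer. Emulate the style, tone and vocabulary "
    "of the following writing samples as closely as possible:\n\n" + corpus
)

sections = ["Introduction", "Methods", "Results"]  # illustrative outline
draft = []
for section in sections:
    # One prompt per section keeps each request small and focused
    response = client.chat.completions.create(
        model="google/gemini-pro-high",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Write the '{section}' section of the report."},
        ],
    )
    draft.append(response.choices[0].message.content)

print("\n\n".join(draft))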

Best models: google/gemini-pro-high, anthropic/claude-sonnet-high, openai/o3-high


Models for Data Extraction / Summarization

Recommended Best Practices:

  • Use a small / lightweight model and extract data one piece at a time.
  • Do not overload model instructions by asking too many questions at once - context rot will cause poor performance.
  • Use chain of thought prompting to enhance accuracy, or use a reasoning model.
  • Save the CoT result and/or reasoning content to improve transparency when debugging data extraction runs.
  • Add line numbers to your data, then ask the AI to include line-number references in its extraction, so you can debug/trace where it found information (see the sketch after this list).
  • Structure prompts with all bulk data and static content first and dynamic data/questions at the end. This leverages implicit prompt caching, resulting in faster inference and lower cost on endpoints that support it (all openai/ models).
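
The sketch below illustrates two of these practices together: numbering the source lines so answers can be traced, and placing the bulk document before the question so implicit prompt caching can reuse the shared prefix. The file name and question are illustrative:

# Sketch: line-number the source text, and put all static bulk content first
# with the per-query question last, so the shared prefix can be cached.
from pathlib import Path

document = Path("report.txt").read_text()  # illustrative source file

numbered = "\n".join(
    f"{n:>5}: {line}" for n, line in enumerate(document.splitlines(), start=1)
)

static_preamble = (  # identical across all queries, so it is cacheable
    "[[ * Start of Document * ]]\n" + numbered + "\n[[ * End of Document * ]]\n"
)

question = (  # varies per query, so it goes last
    "What is the name of the project? "
    "Cite the line numbers where you found the answer."
)

prompt = static_preamble + question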

Models for Data Extraction

Last updated: 2025-08-25

Recommended Selections

Model | Cost | Speed | Comments
lbl/cborg-deepthought | Free | Medium | Unlimited usage
lbl/cborg-mini | Free | Fast | Unlimited usage
lbl/cborg-vision | Free | Medium | For vision extraction - unlimited usage
openai/gpt-5-mini | $$ | Medium | Low cost
openai/gpt-5-nano | $ | Fast | Very low cost with good performance
aws/llama-4-maverick | $ | Medium | Very low cost with good performance and long context
google/gemini-flash-high | $$ | Medium | More expensive than alternatives - use if you need huge context

Not Currently Recommended

Model | Cost | Speed | Comments
google/gemini-flash | $ | Fast | Much better performance with reasoning (-high variant)
anthropic/claude-haiku | $$ | Medium | High cost and underperforming relative to other small models
openai/gpt-5 | $$$ | Medium | Cost-prohibitive for scaling up to large data extraction tasks

Example Code (Not a working example - just a starting point):

import json

def ask_cborg(query):
    # Helper function that sends a query to a model and returns the response text.
    # See other example code, or the sketch at the end of this section.
    ...

document = ...  # the source text to extract from

fields = {
    'project': 'The name of the project',
    'instrument': 'The name of the scientific instrument',
    'status': 'The status of the scientific instrument'
}

data = {}

for key, query in fields.items():
    # Ask one question per request; the preamble is identical each time,
    # so endpoints with implicit prompt caching can reuse it.
    data[key] = json.loads(ask_cborg(
        "Read the following document and then respond to the question below.\n" +
        "[[ * Start of Document * ]]\n" +
        document +
        "\n[[ * End of Document * ]]\n" +
        f"What is {query}? First answer the question in natural language, " +
        "describing how it relates to the document, then provide a concise final answer.\n" +
        'Use the following JSON template: ' +
        '{ "answer": "Your answer in natural language", "final_answer": "The final answer" }\n' +
        "Only output the JSON object without any additional commentary."
    ))['final_answer']

Looping over each field as shown above works better than attempting to extract a large number of items in a single query.

This approach can also be scaled to a very large number of items without any loss in accuracy.

Use of a small model is recommended to reduce cost. Prompt caching can also reduce cost as the preamble for each query is repeated.
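
For reference, here is a minimal sketch of the ask_cborg helper used above, assuming the openai Python package and an OpenAI-compatible endpoint. The base URL and environment variable name are assumptions - check the CBorg documentation for the actual values:

# Minimal sketch of ask_cborg, assuming an OpenAI-compatible endpoint.
# The base URL and environment variable name are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["CBORG_API_KEY"],   # assumed env var for your key
    base_url="https://api.cborg.lbl.gov",  # assumed CBorg endpoint
)

def ask_cborg(query):
    # Send a single-turn query to a small model and return the response text.
    response = client.chat.completions.create(
        model="lbl/cborg-mini",  # a recommended small model from the table above
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content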