AI Toolbox
The tools we actually build with.
Our curated stack. The platforms, models, engines, frameworks, and methodologies we draw from when we design and deploy AI systems. Not exhaustive. Opinionated. If it is here, we have used it or read it, and we stand behind it.
AI Stack
AI Models
The underlying neural networks. Models are trained artifacts. Engines are the tools you use to interact with them.
- Claude↗ Frontier
Anthropic's flagship LLM family. Strong reasoning, long context, low hallucination rate.
- GPT (OpenAI)↗ Frontier
The GPT family. Widest ecosystem, mature tooling, highest cost at production scale.
- Gemini↗ Frontier
Google DeepMind's multimodal frontier family. Deep Google Cloud integration.
- Grok (xAI)↗ Frontier
xAI's frontier model. Fast iteration, tight X-platform integration.
- Cohere Command↗ Enterprise
Enterprise-focused frontier models with strong retrieval and multilingual capability. Self-hostable at scale.
- AI21 Jamba↗ Hybrid arch
Hybrid Mamba-Transformer architecture. Extremely long context with efficient throughput.
- Reka↗ Multimodal
Native multimodal models with image, video, and audio understanding built in.
- Llama↗ Open weight
Meta's open-weight family. Self-hostable, commercially usable, community-backed.
- Mistral↗ Open weight
European open-weight models. Strong performance-to-size ratios, commercial and Apache options.
- DeepSeek↗ Open weight
Chinese frontier-class open-weight models. Aggressive price-performance, MoE architectures.
- Qwen↗ Open weight
Alibaba's multilingual model family. Strong non-English performance, permissive licensing.
- Gemma↗ Google open
Google's open-weight model family. Built from the same research as Gemini, runnable on consumer hardware.
- Phi↗ Microsoft open
Microsoft's small, capable open models. Tuned for on-device and edge workloads.
- NVIDIA Nemotron↗ NVIDIA
NVIDIA's open model family. Strong for building agentic systems on NVIDIA infrastructure.
AI Engines: Web
Browser-based chat and search interfaces. The public on-ramp for most AI use.
- ChatGPT↗ Web
OpenAI's flagship web interface. Still the default on-ramp for most people using AI.
- Claude.ai↗ Web
Anthropic's web client for Claude. Projects, Artifacts, and file-first workflows.
- Google Gemini↗ Web
Google's consumer chat surface for Gemini. Deep integration with Workspace and Search.
- Grok↗ Web
xAI's chat interface. Real-time access to the X firehose for current-events queries.
- Perplexity↗ Search
AI-powered search with inline citations. A Google replacement for many technical queries.
- Poe↗ Multi-model
Quora's multi-model chat client. One interface, many models, useful for comparison.
AI Engines: Desktop
Native applications. Better OS integration, local-model support, MCP tool access.
- Claude Desktop↗ Mac/Win
Native Claude app. First-class MCP server support. The reference client for local tool access.
- ChatGPT Desktop↗ Mac/Win
Native ChatGPT app. Tighter OS integration for voice and screen capture workflows.
- LM Studio↗ Local models
Desktop engine for running open-weight models locally. Chat UI + server in one app.
- Msty↗ Local + cloud
Desktop chat that handles both local and cloud models in one unified interface.
AI Engines: Command Line
Terminal-native agents. The engine of choice for developers and automation workflows.
- Claude Code↗ Anthropic
Anthropic's agentic CLI. Autonomously edits code, runs commands, completes multi-step tasks.
- OpenAI Codex CLI↗ OpenAI
OpenAI's open source coding agent. Runs locally with GPT models for code generation and edits.
- Gemini CLI↗ Google
Google's open source terminal agent for Gemini. Code, reasoning, and tool use from the CLI.
- Aider↗ Open source
Git-aware pair-programming CLI. Model-agnostic, strong for diff-based edits to large codebases.
- GitHub Copilot CLI↗ GitHub
Copilot in the terminal. Suggests shell commands and explains existing ones.
- Goose↗ Block
Block's open source CLI agent. MCP-native, extensible, strong for developer automation.
AI Engines: IDE Integration
AI built into the editor. Inline completion, chat panels, and multi-file agentic edits.
- GitHub Copilot↗ VS Code + JetBrains
The original inline code assistant. Chat, edits, agents, and the widest IDE coverage.
- Cursor↗ Fork of VS Code
AI-native editor with multi-file edits, agents, and tight inline context. Default for many teams.
- Windsurf↗ Codeium
Codeium's agentic IDE. Strong Cascade agent for multi-file and cross-repo work.
- Cline↗ VS Code ext
Open source agentic coding extension. Executes commands, edits files, runs checks inline.
- Continue↗ Open source
Open source Copilot alternative. Bring your own model, full control over context and rules.
- Zed↗ Native editor
Rust-built editor with first-class AI assistant, inline prediction, and agent panel.
- Cody↗ Sourcegraph
Sourcegraph-powered AI with deep codebase context through their indexing infrastructure.
- JetBrains AI Assistant↗ JetBrains
Native AI across IntelliJ, PyCharm, WebStorm, and the rest of the JetBrains family.
AI Service Providers
Hosted inference APIs. Where you actually call the models programmatically.
- Anthropic API↗ Direct
Direct Claude access. Best-in-class reasoning for governed production workloads.
- OpenAI API↗ Direct
GPT + Whisper + embeddings + fine-tuning. The most mature LLM API on the market.
- AWS Bedrock↗ Hyperscale
Unified API for Claude, Llama, Mistral, Titan. Sits cleanly inside existing AWS governance.
- Azure OpenAI↗ Hyperscale
OpenAI models on Microsoft's enterprise compliance and identity stack.
- Google Vertex AI↗ Hyperscale
Gemini + third-party models with full GCP integration. Strong MLOps tooling.
- OpenRouter↗ Gateway
Unified API across every major provider. Swap models without changing integration code.
- Groq↗ Fast inference
Custom LPU hardware for extremely low-latency open-model inference.
- Fireworks AI↗ Fast inference
Optimized hosting for open-weight models with aggressive throughput and fine-tuning support.
- Together AI↗ Open models
Hosted open-weight models with fine-tuning. Cheap scale for non-frontier workloads.
- Replicate↗ Open models
One-click hosted inference for open models. Strong for image and audio workloads.
- DeepInfra↗ Open models
Serverless inference for open-weight models. Pay-per-token with no capacity commitment.
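Most of the providers above expose an OpenAI-style chat-completions shape, which is what makes gateways like OpenRouter possible: swap the base URL and model string, leave the integration code alone. A minimal sketch of that idea, building the request without sending it; the model names and the Together entry here are illustrative assumptions, so check each provider's docs before relying on them:

```python
# Sketch: provider-agnostic request construction. Base URLs and model names
# below are illustrative assumptions, not a verified catalog.
PROVIDERS = {
    "openrouter": {"base_url": "https://openrouter.ai/api/v1",
                   "model": "anthropic/claude-sonnet-4"},
    "together":   {"base_url": "https://api.together.xyz/v1",
                   "model": "meta-llama/Llama-3-70b-chat-hf"},
}

def build_chat_request(provider: str, user_message: str) -> dict:
    """Build an OpenAI-style chat-completions request for the given provider."""
    cfg = PROVIDERS[provider]
    return {
        "url": f"{cfg['base_url']}/chat/completions",
        "json": {
            "model": cfg["model"],
            "messages": [{"role": "user", "content": user_message}],
        },
    }

req = build_chat_request("openrouter", "Summarize this release note.")
```

Switching to a different provider is one string change in the call, which is exactly the portability argument for standardizing on this request shape.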
Local Runtimes
Run models on your hardware. For privacy, cost, or compliance.
- Ollama↗ Local
Run open-weight LLMs locally. Fastest path from zero to a working private-model environment.
- LM Studio↗ Local
Desktop GUI for local model testing. Useful for evaluation and private workflows.
- llama.cpp↗ Engine
C++ inference engine underneath most local LLM tooling. Runs on anything that boots.
- vLLM↗ Server
High-throughput production inference server with PagedAttention. Industry default for self-hosting.
- Text Generation Inference↗ Server
Hugging Face's production serving stack. First-class support for any HF model.
Orchestration
Multi-step reasoning, chains, and stateful LLM applications.
- LangGraph↗ Graph
Stateful, graph-based orchestration. Fine-grained control over how agents reason and act.
- LangChain↗ Framework
The original LLM app framework. Chains, retrievers, tools, memory. The batteries-included path.
- LlamaIndex↗ RAG-first
Data framework for RAG and agents. Strong for document-heavy and knowledge-graph workflows.
- Vercel AI SDK↗ TypeScript
TypeScript-first SDK for building AI UX. Streaming, tools, agents, and provider abstraction.
- Semantic Kernel↗ .NET / Py
Microsoft's orchestration SDK. First-class across C#, Python, and Java.
- DSPy↗ Compiler
Declarative prompting. Compile programs, not prompts. Optimize end-to-end against evals.
- Haystack↗ RAG
Open source framework for search, RAG, and agent pipelines with strong pre-built components.
- Flowise↗ Low-code
Open source drag-and-drop builder for LLM flows. Fast prototyping, deployable API endpoints.
- Langflow↗ Visual
Visual LangChain builder. Good fit when non-engineers need to collaborate on agent flows.
- n8n↗ Workflow
General-purpose workflow automation with strong AI node support. Self-hostable.
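Underneath every framework in this section is the same pattern: small steps composed into a pipeline, with state passed between them. A toy sketch of that chain idea, with a stubbed function standing in for the actual model call (everything here is illustrative, not any framework's real API):

```python
# Minimal "chain": prompt construction -> model call -> output parsing.
def make_prompt(question: str) -> str:
    return f"Answer concisely: {question}"

def fake_llm(prompt: str) -> str:
    # Stand-in for a provider call; a real chain invokes an API here.
    return f"[model output for: {prompt}]"

def parse(raw: str) -> dict:
    return {"answer": raw.strip()}

def run_chain(question: str, steps=(make_prompt, fake_llm, parse)):
    value = question
    for step in steps:
        value = step(value)   # each step consumes the previous step's output
    return value

result = run_chain("What is RAG?")
```

The frameworks above add what this sketch lacks: retries, streaming, memory, tool routing, and observability hooks around each step.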
Agent Frameworks
Autonomous or semi-autonomous agents that plan and act.
- Claude Agent SDK↗ Anthropic
Anthropic's agent SDK. Production primitives for building and deploying Claude-powered agents.
- OpenAI Agents SDK↗ OpenAI
OpenAI's official agent framework. Handoffs, tool use, guardrails, tracing. Successor to Swarm.
- Microsoft 365 Agents SDK↗ Microsoft
Microsoft's cross-platform SDK for building agents. Successor to Bot Framework, works across Teams, Copilot, web.
- Copilot Studio Agent SDK↗ Microsoft
SDK for building agents that plug into Copilot Studio and the broader Microsoft 365 Copilot ecosystem.
- Google Agent Development Kit (ADK)↗ Google
Google's open source multi-agent framework. First-class support for the A2A protocol and Gemini.
- CrewAI↗ Multi-agent
Role-based agent teams that collaborate on complex tasks. Fast path to useful multi-agent setups.
- AutoGen↗ Microsoft Research
Microsoft Research's multi-agent framework. Strong academic and production lineage.
- Pydantic AI↗ Typed Python
Type-safe agent framework from the Pydantic team. Production-ready, FastAPI-like ergonomics.
- Mastra↗ TypeScript
TypeScript-native agent framework. Strong for building agents alongside web apps in the JS ecosystem.
- Letta↗ Memory
Formerly MemGPT. Stateful agents with explicit long-term memory and self-editing context.
- smolagents↗ Hugging Face
Minimalist agent framework from Hugging Face. Code-first agents with small dependency footprint.
- OpenHands↗ Coding agent
Formerly OpenDevin. Fully autonomous software engineering agent running in a sandboxed environment.
- Griptape↗ Python
Python framework for building agents with structured memory, tool use, and workflows.
- Agno↗ Lightweight
Formerly Phidata. Lightweight agent framework with Python-native ergonomics.
- Goose↗ CLI
Block's open source CLI agent. Extensible through MCP, strong for developer workflows.
- Strands Agents↗ AWS
AWS-backed open source agent framework. Model-driven loop, first-class Bedrock and AWS integration.
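Every framework in this list implements some version of the same loop: the model plans a tool call, the runtime executes it, and the observation feeds the next decision. A deliberately tiny sketch of that loop with the planning step stubbed out (real frameworks get the tool choice from the LLM, not a keyword match):

```python
# Toy agent step: plan -> act -> observe. Tool selection is a stub here.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def fake_plan(task: str):
    # Stand-in for the model choosing a tool and arguments.
    if "sum" in task:
        return ("add", (2, 3))
    return ("upper", (task,))

def run_agent_step(task: str) -> dict:
    tool, args = fake_plan(task)          # 1. plan
    observation = TOOLS[tool](*args)      # 2. act
    return {"tool": tool, "result": observation}  # 3. observe / respond

trace = run_agent_step("sum two numbers")
```

What distinguishes the frameworks above is everything around this loop: memory, guardrails, handoffs between agents, and tracing of each step.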
Decision Platforms
Enterprise AI platforms with ontology and action layers baked in.
- Palantir AIP↗ Decision
Ontology-anchored AI decision platform. LLMs with governed access to enterprise data, logic, and actions.
- Palantir Foundry↗ Data OS
The operational system beneath AIP. Integration, ontology, and workflow layer for the enterprise.
- Databricks Mosaic AI↗ Lakehouse AI
AI built on top of the lakehouse. Strong fit when data already lives in Databricks.
Protocols and Standards
The wire formats and contracts AI systems communicate over.
- Model Context Protocol (MCP)↗ Anthropic
Open standard for giving AI governed, contextual access to tools, data, and systems.
- OpenAI Function Calling↗ OpenAI
The original tool-use API. Still the widest-deployed spec for LLM function invocation.
- A2A Protocol↗ Google
Agent-to-Agent communication protocol. Emerging standard for agent-ecosystem interop.
- OpenAPI↗ Spec
Not AI-specific, but the lingua franca for describing APIs that agents need to call.
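MCP and function calling share the same underlying contract: the runtime declares a tool with a JSON schema, the model emits a structured call, and a dispatcher validates and executes it. A stripped-down sketch of that contract; the tool name, schema, and stub implementation are all illustrative:

```python
import json

# The tool-use contract in miniature: a declared schema the model sees,
# and a dispatcher that executes the model's structured call.
TOOL_SCHEMA = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> str:
    # Stubbed implementation; a real tool would call a weather API.
    return f"Weather for {city}: sunny"

def dispatch(tool_call_json: str) -> str:
    call = json.loads(tool_call_json)            # what the model emits
    assert call["name"] == TOOL_SCHEMA["name"]   # route to the right tool
    return get_weather(**call["arguments"])

out = dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}')
```

The protocols differ in transport and governance, not in this core shape, which is why OpenAPI-described services map onto agent tools so cleanly.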
Generative Media
Image Generation
Text and image conditioning to produce still imagery.
- Midjourney↗ Image
Discord and web-native image generation. Still the aesthetic benchmark for stylized output.
- DALL-E↗ OpenAI
OpenAI's image model. Tight integration with ChatGPT, strong at prompt comprehension.
- Stable Diffusion↗ Open weight
The open-weight foundation for most self-hosted image workflows. SDXL, SD3, etc.
- Flux↗ Frontier
Black Forest Labs' model family. Currently the strongest open-weight image model, built by researchers behind the original Stable Diffusion.
- Imagen↗ Google
Google DeepMind's photorealism-focused image family, available through Vertex AI.
- Ideogram↗ Typography
Best-in-class for in-image typography and coherent text rendering.
- Adobe Firefly↗ Commercial-safe
Adobe's commercially safe generative stack. Trained on licensed content, deep Creative Cloud integration.
- Leonardo.ai↗ Workflow
Image-focused workflow platform with fine-tuning, canvas editing, and production controls.
Video Generation
Text, image, and video-to-video models for motion output.
- Runway↗ Pro video
Gen-3 and Gen-4 video models with a professional editing workflow. Used in actual commercial production.
- Google Veo↗ Google
DeepMind's text-to-video model. Veo 3 delivers high-fidelity output with native audio generation.
- Sora↗ OpenAI
OpenAI's text and image-to-video model. Strong for longer, coherent scenes with complex motion.
- Pika↗ Consumer
Fast, playful text-to-video with strong effects library. Popular for short-form and social content.
- Kling↗ Kuaishou
Chinese text-to-video model with strong realism and long clip support.
- Luma Dream Machine↗ Luma
Text and image-to-video with fast iteration. Good at camera motion and scene dynamics.
- Hailuo↗ MiniMax
MiniMax's video model. Strong character consistency, competitive on cost per clip.
Audio and Music
Voice synthesis, song generation, sound design.
- Suno↗ Music
End-to-end song generation with vocals and instruments. The most capable music model available.
- Udio↗ Music
Competitor to Suno with strong audio fidelity and genre flexibility.
- ElevenLabs↗ Voice + TTS
Industry-standard voice synthesis and cloning. Production-ready voice agents and dubbing.
- Stable Audio↗ Stability
Stability AI's audio generation. Open weights available, strong for sound design and loops.
- Google Lyria↗ Google
DeepMind's music model. Available through MusicFX and Vertex AI.
- MusicGen↗ Meta
Meta's open-weight music generation model. Self-hostable for private workflows.
- AudioGen↗ Meta
Meta's open-weight ambient and effects audio model. Environmental sound from prompts.
3D Generation
Meshes, environments, and spatial assets from prompts.
- Meshy↗ 3D models
Text and image to 3D mesh generation. Game-dev and visualization-ready outputs.
- Luma Genie↗ Luma
Luma's 3D generator. Strong for 3D assets from photos or text prompts.
- Rodin↗ Deemos
High-fidelity 3D generation focused on characters and organic models.
- Blockade Labs↗ Skyboxes
AI-generated 360° skyboxes and environments. Used widely in game and VR workflows.
- Tripo3D↗ 3D models
Fast text and image-to-3D generation with PBR materials for real-time rendering.
Creative Platforms
Workflow tools and UIs that wrap generative models.
- ComfyUI↗ Node-based
Open source node-based workflow builder for Stable Diffusion and beyond. Industry standard for serious image pipelines.
- Automatic1111↗ Open source
The original Stable Diffusion web UI. Massive extension ecosystem, still widely deployed.
- Krea↗ Creative
Real-time image generation with a canvas-native creative workflow. Strong for iteration.
- Freepik AI Suite↗ All-in-one
Unified interface across multiple gen models. Good for comparing outputs quickly.
- Higgsfield↗ Motion
Cinematic video generation with camera-motion presets. Tuned for motion-design work.
Data and Retrieval
Vector Databases
Similarity search engines for embeddings.
- Pinecone↗ Managed
Managed vector database for RAG. Low-latency similarity search at production scale.
- Weaviate↗ Open source
Vector database with hybrid search, schema-awareness, and graph capabilities.
- Qdrant↗ Open source
Rust-based vector engine. Fast, lightweight, strong for self-hosted deployments.
- Chroma↗ Open source
Dev-friendly open source vector database. Zero-config start, embeddable.
- Milvus↗ Scale
Cloud-native vector database built for billion-scale workloads.
- LanceDB↗ Embedded
Embedded serverless vector database. Columnar, versioned, strong for multimodal workloads.
- pgvector↗ Postgres
Vector search extension for Postgres. Often the right answer when Postgres is already in play.
- Redis Vector Search↗ Redis
Vector search in Redis. Useful when Redis is already the cache or session store in your stack.
- MongoDB Atlas Vector Search↗ MongoDB
Native vector search in MongoDB Atlas. Best fit when document data is already in Mongo.
- Elasticsearch↗ Elastic
Dense vector + hybrid search in Elasticsearch. Strong when combined with full-text at scale.
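Every engine above optimizes the same core operation: similarity between a query embedding and a stored corpus. A brute-force sketch with hand-made three-dimensional vectors (real embeddings have hundreds of dimensions, and real engines use ANN indexes like HNSW or IVF instead of scanning):

```python
import math

# Cosine similarity and brute-force nearest-neighbor lookup.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "index": document id -> embedding vector.
corpus = {
    "doc_pricing":  [0.9, 0.1, 0.0],
    "doc_security": [0.1, 0.9, 0.2],
}

def top_match(query_vec):
    return max(corpus, key=lambda doc: cosine(query_vec, corpus[doc]))

best = top_match([0.85, 0.15, 0.05])  # a query "near" the pricing doc
```

The product decision in this section is rarely about the math; it is about where the index lives, how it scales, and whether it sits next to data you already operate.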
Knowledge Graphs
Structured representation of entities and relationships.
- Neo4j↗ Property graph
The industry standard graph database. Strong tooling, mature ecosystem, Cypher query language.
- TigerGraph↗ Analytics
High-performance analytics graph. Strong for deep-link analysis and fraud detection.
- ArangoDB↗ Multi-model
Graph, document, and key-value in one engine. Useful when graph is part of a broader pattern.
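At their smallest, knowledge graphs are subject-predicate-object triples plus traversal. A toy sketch of that representation and a one-hop neighbor query; property-graph engines like Neo4j add types, indexes, and a query language (Cypher) on top of this idea. The entities here are invented examples:

```python
# A graph as a list of (subject, predicate, object) triples.
TRIPLES = [
    ("acme_corp", "owns", "plant_7"),
    ("plant_7", "located_in", "ohio"),
    ("acme_corp", "supplies", "beta_inc"),
]

def neighbors(entity: str):
    """All (predicate, object) pairs where entity is the subject."""
    return [(p, o) for s, p, o in TRIPLES if s == entity]

links = neighbors("acme_corp")
```

This structure is what gives an LLM relational context: "what does acme_corp own, and where is it?" becomes two cheap traversals instead of a fuzzy retrieval.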
Data Infrastructure
Storage, transformation, and movement of data at scale.
- Snowflake↗ Warehouse
Cloud data warehouse with strong data sharing, governance, and now native AI functions.
- Databricks↗ Lakehouse
Unified analytics and AI platform on top of open storage formats (Delta, Iceberg).
- dbt↗ Transform
SQL-based data transformation with version control and tests. The default for modern ELT.
- Airbyte↗ Ingestion
Open source data ingestion. Large connector catalog, self-hostable.
- Apache Kafka↗ Streaming
The streaming event backbone most production architectures eventually need.
- Apache Iceberg↗ Table format
Open table format for large analytic datasets. The neutral ground between warehouses.
Data Governance and Catalogs
Lineage, classification, access, audit.
- Collibra↗ Catalog
Enterprise data catalog and governance platform. Default in regulated industries.
- Atlan↗ Modern catalog
Modern data catalog with strong lineage, collaboration, and API-first design.
- OpenLineage↗ Open standard
Open standard for collecting data lineage from pipelines. Vendor-neutral.
- Unity Catalog↗ Databricks
Databricks-native governance layer. Fine-grained access control across data and AI assets.
Infrastructure
Cloud Providers
Where the workloads run.
- AWS↗ Hyperscale
The broadest cloud surface area. Strong for heterogeneous enterprise architectures.
- Azure↗ Hyperscale
Default cloud for Microsoft-aligned enterprises. Tight identity and M365 integration.
- Google Cloud↗ Hyperscale
Strongest data and AI primitives. Gemini, BigQuery, Vertex AI all first-class.
- Oracle Cloud Infrastructure↗ Enterprise
Fits Oracle-heavy shops. Strong database integration and competitive egress pricing.
- IBM Cloud↗ Enterprise
WatsonX AI and regulated-industry positioning. Hybrid and mainframe-adjacent workloads.
GPU Compute and Model Hosting
Raw GPU compute and specialized inference hosting. Where generative and custom model workloads actually run.
- RunPod↗ GPU cloud
On-demand and serverless GPU compute. One of the cheapest paths to run custom model workloads.
- Fal.ai↗ Fast inference
Optimized inference for generative models. Sub-second image and video generation at API scale.
- Modal↗ Serverless
Serverless Python functions with GPU support. Clean path from notebook to production.
- Baseten↗ Model deploy
Model deployment platform. Strong for serving custom and fine-tuned open-weight models.
- CoreWeave↗ GPU infra
GPU-specialized cloud infrastructure. Favored for large-scale training and inference.
- Lambda↗ GPU cloud
GPU cloud for training and inference. Competitive pricing, direct NVIDIA partnerships.
Containers, Orchestration, IaC
Packaging, deploying, and describing systems declaratively.
- Kubernetes↗ Orchestration
Container orchestration. The substrate most production AI workloads end up running on.
- OpenShift↗ Enterprise K8s
Red Hat's enterprise Kubernetes distribution. Default in regulated and hybrid-cloud shops.
- Docker↗ Containers
Container runtime and image format. Still the baseline for packaging workloads.
- Terraform↗ IaC
Infrastructure as code. The baseline for reproducible, version-controlled cloud environments.
- Ansible↗ Config
Agentless configuration management. Still the fastest way to automate existing systems.
- ArgoCD↗ GitOps
Declarative, Git-driven continuous deployment for Kubernetes.
- Helm↗ K8s
Package manager for Kubernetes. Charts are how most production apps get templated.
Application Performance Monitoring
General-purpose APM and observability. Traces, metrics, logs across applications and infrastructure.
- SigNoz↗ Open source
OpenTelemetry-native open source APM. Traces, metrics, and logs in one UI. Self-hostable alternative to Datadog.
- Datadog↗ Enterprise
Industry-standard SaaS observability. Broad coverage across infrastructure, APM, logs, and security.
- New Relic↗ Enterprise
Long-established APM and full-stack observability. Consumption-based pricing across telemetry types.
- Grafana + Prometheus↗ Open source
The de facto open source stack for metrics, dashboards, and alerting. Runs anywhere.
- Honeycomb↗ High-cardinality
Event-oriented observability for complex distributed systems. Strong for debugging production unknowns.
- Elastic APM↗ ELK
APM inside the Elastic stack. Fits naturally when logs and search already live in Elasticsearch.
- Dynatrace↗ Enterprise
AI-powered enterprise observability. Strong auto-instrumentation and root-cause analysis.
- Sentry↗ Errors + Perf
Error tracking and performance monitoring focused on application code. The default for front-end and app errors.
- OpenTelemetry↗ Open standard
The vendor-neutral telemetry standard. Instrument once, export anywhere. Backbone of modern observability.
LLM Observability and Evals
Measurement and monitoring specific to LLM workloads. Prompt tracing, evaluations, and cost tracking.
- LangSmith↗ Tracing + Evals
LangChain's observability and evaluation platform. Tight integration with their stack.
- Langfuse↗ Open source
Open source LLM engineering platform. Tracing, evals, prompt management.
- Braintrust↗ Evals
Evaluation-first platform for AI products. Strong for iteration speed on production prompts.
- Arize Phoenix↗ Open source
Open source observability for LLMs. Visual traces, evaluations, datasets.
- Helicone↗ Gateway
LLM gateway with logging, caching, and cost tracking built in. Drop-in proxy.
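The evaluation loop these platforms productionize is simple at its core: run a fixed dataset through the system under test and score outputs against expectations. A minimal sketch with a stubbed model and exact-match scoring (real evals add LLM-as-judge scoring, versioned datasets, and trend tracking):

```python
# Minimal eval harness: dataset in, pass/fail report out.
DATASET = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def system_under_test(prompt: str) -> str:
    # Stand-in for the real model or pipeline being evaluated.
    return {"2+2": "4", "capital of France": "Paris"}[prompt]

def run_evals(dataset):
    passed = sum(
        1 for case in dataset
        if system_under_test(case["input"]) == case["expected"]
    )
    return {"passed": passed, "total": len(dataset), "score": passed / len(dataset)}

report = run_evals(DATASET)
```

Wiring this loop into CI is what turns "the model seems fine" into a regression gate.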
Practices
Methodologies
The patterns we apply. Vendor-neutral thinking that outlives any specific tool.
- Ontology Design
Formal modeling of business objects, relationships, and logic. The anchor for every downstream AI system.
- Retrieval-Augmented Generation (RAG)
LLM responses grounded in retrieved context from your own data. The baseline pattern for enterprise AI.
- Agentic Orchestration
Multi-step reasoning where AI plans, calls tools, evaluates results, and iterates. Beyond single-shot prompts.
- Human-in-the-Loop (HITL)
AI proposes, humans approve, systems execute. Accountability without slowing the flywheel.
- Purpose-Based Access Control
Access scoped by role, data classification, and intent. Not just who. Also why.
- Model Evaluations
Systematic measurement of model behavior over time. The difference between production AI and a hopeful pilot.
- Semantic Data Classification
Automatic tagging of data by type, sensitivity, and meaning. Makes governance scale past manual review.
- GitOps and Infrastructure as Code
Git as the source of truth for infrastructure and deployment. Auditable, reproducible, rollback-able.
- Knowledge Graphs
Relational representation of entities and their links. Gives AI structural context beyond bag-of-words retrieval.
- MCP Tool Design
Wrapping existing business logic, APIs, and data as callable tools for AI agents through MCP servers.
- Fine-tuning vs RAG
A diagnostic framing: fine-tune for style and format, retrieve for facts. Conflating them burns budget.
- Guardrails and Policy Enforcement
Deterministic checks around non-deterministic models. Input filtering, output validation, scope enforcement.
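The guardrails pattern in the last item is worth making concrete: deterministic validation wrapped around whatever the model emits, before anything downstream executes. A sketch under invented rules (the action names and checks are illustrative, not a product's API):

```python
import json
import re

# Deterministic checks on non-deterministic output: parse, scope-check,
# and screen before the result reaches downstream systems.
ALLOWED_ACTIONS = {"create_ticket", "send_summary"}

def validate_output(raw: str) -> dict:
    """Reject anything that is not well-formed JSON with an in-scope action."""
    data = json.loads(raw)                       # must parse at all
    if data.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"action out of scope: {data.get('action')}")
    if re.search(r"<script", raw, re.IGNORECASE):  # crude content screen
        raise ValueError("unsafe content")
    return data

ok = validate_output('{"action": "create_ticket", "title": "GPU quota"}')
```

The same shape carries the HITL and purpose-based access patterns above: the model proposes, deterministic code decides whether the proposal is even eligible for approval.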
Reading
Specifications, Essays, and References
Outside thinking worth your time.
- Palantir AIP platform overview↗ Platform
How Palantir frames the decision platform: ontology, tools, actions, scenarios, guardrails.
- MCP specification↗ Spec
The Anthropic-authored open standard for AI-to-system integration. Read it before you build.
- Anthropic Engineering Blog↗ Research
Practical guidance from the team building Claude. Strong on agents and evaluation patterns.
- OpenAI Cookbook↗ Recipes
Practical, runnable examples from OpenAI. Still the best source for day-one integration patterns.
Have a platform or pattern we should know?
We update this as the landscape moves. If there is something production-proven we missed, tell us.