Build with Private AI.

Integrate RedPill's Private AI into your app with a simple API. Access dozens of AI models through one secure endpoint. No more juggling multiple AI APIs or worrying about data compliance.

Key Features for Developers

Unified API for 60+ Models

One API key unlocks GPT-4, Claude, Llama, Mistral and more. No vendor lock-in - switch models or use Smart Router to auto-select the best model per request.
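As a rough sketch of what switching models behind one endpoint could look like, the helper below builds an OpenAI-style chat completion request in which only the `model` string changes. The base URL, the `buildChatRequest` name, the demo key, and the model identifiers are placeholders assumed for illustration; take the real values from the RedPill docs.

```javascript
// Hypothetical base URL: replace with the endpoint given in the RedPill docs.
const BASE_URL = 'https://api.example.invalid/v1';

// Build fetch() arguments for an OpenAI-style chat completion.
// Only the `model` string differs when you switch models.
function buildChatRequest(apiKey, model, prompt) {
  return {
    url: `${BASE_URL}/chat/completions`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`
      },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: prompt }]
      })
    }
  };
}

// Same key, same request shape, different model.
const first = buildChatRequest('sk-demo', 'gpt-4', 'Summarize this contract');
const second = buildChatRequest('sk-demo', 'claude-3-opus', 'Summarize this contract');
console.log(JSON.parse(first.options.body).model);  // gpt-4
console.log(JSON.parse(second.options.body).model); // claude-3-opus
```

Either result can be passed to `fetch(req.url, req.options)` once the placeholder URL and key are replaced.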

Privacy & Security Built-In

All API calls are processed in confidential enclaves. Feed sensitive data to the API and even we can't read it. Ideal for healthcare, legal, or enterprise apps.

Simple SDKs & Docs

SDKs available in Python, JavaScript, and more. Robust REST API with clear documentation. Get started in minutes with our quickstart guides.

Example Use Cases

Add a confidential AI assistant to your app. Process user data with AI without storing it. Use RedPill as a secure backend for chatbots or automation.

Flexible Deployment

Enterprise options for dedicated private instances. Deploy on-prem or in your VPC for maximum control and compliance with your organization's policies.

Performance & Cost Controls

Smart Router ensures efficient model usage. Save costs by routing to appropriate models per request. Rate limits and flexible pricing tiers available.
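Rate limits typically reach a client as rejected requests that are worth retrying after a pause. How RedPill signals them is not stated here, so treat the following as a generic client-side sketch: a capped exponential backoff schedule with a hypothetical function name and default values.

```javascript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at capMs.
function backoffDelayMs(attempt, baseMs = 500, capMs = 8000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Delays for the first six retries.
const schedule = [0, 1, 2, 3, 4, 5].map((n) => backoffDelayMs(n));
console.log(schedule); // [ 500, 1000, 2000, 4000, 8000, 8000 ]
```

Adding random jitter to each delay is a common refinement when many clients retry at once.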

Just a few lines of code

YOUR CODE.
OUR PRIVACY.

Integrate private AI into your app with simple SDKs. OpenAI-compatible API means minimal code changes to switch from other providers.

View Full Docs
redpill-chat.js
// Node.js / JavaScript SDK
import RedPill from 'redpill-sdk';

const client = new RedPill({
  apiKey: process.env.REDPILL_API_KEY
});

// Simple chat completion
const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'user', content: 'Summarize this contract' }
  ]
});

console.log(response.choices[0].message.content);

// With streaming
const stream = await client.chat.completions.create({
  model: 'claude-3-opus',
  messages: [{ role: 'user', content: 'Write a haiku' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
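The streaming loop in the example above reads `chunk.choices[0]?.delta?.content` and falls back to an empty string. A small sketch of why that fallback matters, using hand-written sample chunks in the OpenAI-style shape (the `collectDeltas` helper and the sample data are illustrative, not real API output):

```javascript
// Concatenate the text deltas from a list of OpenAI-style stream chunks.
// Chunks without content (for example a role-only first chunk) contribute ''.
function collectDeltas(chunks) {
  let text = '';
  for (const chunk of chunks) {
    text += chunk.choices[0]?.delta?.content || '';
  }
  return text;
}

// Hand-written sample chunks, not captured from the API.
const sampleChunks = [
  { choices: [{ delta: { role: 'assistant' } }] },
  { choices: [{ delta: { content: 'Hello' } }] },
  { choices: [{ delta: { content: ', world' } }] },
  { choices: [{ delta: {} }] }
];

console.log(collectDeltas(sampleChunks)); // Hello, world
```

Without the `|| ''` fallback, the role-only and empty chunks would append the string "undefined" to the output.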

Explore AI Models

From private models in GPU TEE to all your favorites.

Phala: Gemma-4 26B-A4B Uncensored (Heretic)
New|GPU TEE
Uncensored "Heretic" variant of google/gemma-4-26B-A4B-it created using Heretic v1.2.0 with the Arbitrary-Rank Ablation (ARA) method and row-norm preservation. Refusals drop from 100/100 to 11/100 with KL divergence 0.0499 vs the base model. The base Gemma 4 26B A4B is a Mixture-of-Experts model with 25.2B total / 3.8B active parameters (8 active / 128 total experts), a 30-layer transformer with hybrid local sliding (1024) + global attention, supporting a 256K context window. Natively multimodal (text + images, variable aspect ratios). Strong on coding, reasoning, and function calling, with native system prompt support across 35+ languages. Served on Phala in a TDX-attested H200 enclave with end-to-end ECDSA response signing; vLLM-compatible FP8-Static quantization by cloud19 (router excluded from quantization).
by phala|66K context|$0.15/M input|$0.70/M output
Intel TDX|NVIDIA CC
Phala: Qwen3.6 35B-A3B Uncensored (Aggressive)
New|GPU TEE
Uncensored "Aggressive" variant of Qwen3.6-35B-A3B from Alibaba's Qwen team. The fine-tune by HauhauCS removes refusal behaviors (0/465 refusals) without modifying datasets or core capabilities. The base architecture is a 35B-parameter Mixture-of-Experts model with 256 experts routing 8 per token (~3B active params), 40 layers, and a hybrid linear+full-softmax attention mechanism (3:1 ratio). Supports a native 262K context and is natively multimodal across text, images, and video. Served on Phala in a TDX-attested H200 enclave with end-to-end ECDSA response signing; FP8 quantization by lamianlbe.
by phala|131K context|$0.30/M input|$1.50/M output
Intel TDX|NVIDIA CC
Qwen: Qwen3.5-27B
GPU TEE
The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.
by phala|262K context|$0.30/M input|$2.40/M output
Intel TDX|NVIDIA CC
Z.AI: GLM 4.7 Flash
GPU TEE
As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capabilities, long-horizon task planning, and tool collaboration, and has achieved leading performance among open-source models of the same size on several current public benchmark leaderboards.
by phala|203K context|$0.10/M input|$0.43/M output
Intel TDX|NVIDIA CC|BETA
Qwen: Qwen3 Embedding 8B
GPU TEE
The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model. The Qwen3 Embedding series represents significant advancements in multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.
by phala|33K context|$0.01/M input|$0.00/M output
Intel TDX|NVIDIA CC
Phala: Venice Uncensored 24B
GPU TEE
Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an “uncensored” instruct-tuned LLM, preserving user control over alignment, system prompts, and behavior. Intended for advanced and unrestricted use cases, Venice Uncensored emphasizes steerability and transparent behavior, removing default safety and alignment layers typically found in mainstream assistant models.
by phala|33K context|$0.20/M input|$0.90/M output
Intel TDX|NVIDIA CC
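The per-million-token prices in the listing above translate into a per-request cost with simple arithmetic: tokens times price, divided by one million. A minimal sketch follows; the object keys are illustrative labels rather than real API model IDs, and the token counts are made up.

```javascript
// Prices in USD per million tokens, copied from the listing above.
// The keys are illustrative labels, not official model identifiers.
const PRICES = {
  'gemma-4-26b-heretic': { input: 0.15, output: 0.70 },
  'glm-4.7-flash': { input: 0.10, output: 0.43 }
};

// Estimate the cost of one request from its token counts.
function estimateCostUsd(modelKey, inputTokens, outputTokens) {
  const price = PRICES[modelKey];
  if (!price) throw new Error(`Unknown model key: ${modelKey}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1e6;
}

// 200,000 input tokens and 50,000 output tokens on GLM 4.7 Flash:
// (200000 * 0.10 + 50000 * 0.43) / 1e6 = 0.0415
const cost = estimateCostUsd('glm-4.7-flash', 200000, 50000);
console.log(cost.toFixed(4)); // 0.0415
```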

Start Building.

API Documentation

Comprehensive guides, API references, and tutorials to help you integrate RedPill into your applications. Try the interactive playground or get a free API key.

Developer Community

Join our Discord community to connect with other developers, get help with integration questions, and share what you're building with RedPill.

Ready to experience private AI?

Try RedPill in our Private AI Playground - no signup needed. Your conversations stay encrypted and completely private.

Try RedPill Free
Zero data retention|TEE secured