Phala: Gemma-4 26B-A4B Uncensored (Heretic)

GPU TEE

phala/gemma-4-26b-a4b-uncensored

Created May 23, 2026 | 66K context | $0.15/M input tokens | $0.70/M output tokens

Intel TDX | NVIDIA CC

Uncensored "Heretic" variant of google/gemma-4-26B-A4B-it, created with Heretic v1.2.0 using the Arbitrary-Rank Ablation (ARA) method with row-norm preservation. Refusals drop from 100/100 to 11/100, with a KL divergence of 0.0499 relative to the base model.

The base Gemma 4 26B A4B is a Mixture-of-Experts model with 25.2B total / 3.8B active parameters (8 active / 128 total experts). It is a 30-layer transformer with hybrid local sliding (1024) + global attention and supports a 256K context window; this listing serves a 66K context. It is natively multimodal (text + images, variable aspect ratios) and strong on coding, reasoning, and function calling, with native system prompt support across 35+ languages.

Served on Phala in a TDX-attested H200 enclave with end-to-end ECDSA response signing. The weights use a vLLM-compatible FP8-Static quantization by cloud19, with the router excluded from quantization.

Providers for Phala: Gemma-4 26B-A4B Uncensored (Heretic)

RedPill routes requests across these providers with automatic fallbacks to maximize uptime. Pricing is unified: you pay the same price no matter which provider serves your request.

Total Context: 66K
Input: $0.15/M
Output: $0.70/M
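
As a rough illustration of the per-token pricing, the sketch below estimates the cost of a single request from the listed rates. The helper name `estimateCostUsd` is hypothetical, and the calculation assumes billing is strictly linear in token count with no minimums or rounding, which the listing does not state.

```javascript
// Listed rates for this model, in USD per million tokens.
const INPUT_USD_PER_M = 0.15;
const OUTPUT_USD_PER_M = 0.70;

// Hypothetical helper (not part of any RedPill SDK).
// Assumption: cost scales linearly with token counts.
function estimateCostUsd(inputTokens, outputTokens) {
  const micros = inputTokens * INPUT_USD_PER_M + outputTokens * OUTPUT_USD_PER_M;
  return micros / 1_000_000;
}

// Example: a 10,000-token prompt with a 1,000-token reply.
console.log(estimateCostUsd(10_000, 1_000).toFixed(4)); // prints "0.0022"
```

Under these assumptions, the example request costs about a fifth of a cent, with roughly two thirds of that coming from the prompt.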

Provider	TTFT	Throughput	Uptime
phala

API

RedPill provides a unified completion API for all models and providers. You can call it directly or through the OpenAI SDK, and some third-party SDKs are also available.

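// Run as an ES module (or inside an async function) so that `await` is allowed.
const response = await fetch("https://api.redpill.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer <YOUR-REDPILL-API-KEY>",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "phala/gemma-4-26b-a4b-uncensored",
    "messages": [
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ]
  })
});

// Parse and print the JSON response body.
console.log(await response.json());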
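
The request above can be wrapped in a small helper that also checks the HTTP status and pulls out the reply text. This is a minimal sketch, not an official client: the function names are hypothetical, and `extractReply` assumes the response body follows the OpenAI chat-completions shape (`choices[0].message.content`), which is plausible given the OpenAI SDK compatibility but is not spelled out on this page.

```javascript
const API_URL = "https://api.redpill.ai/v1/chat/completions";
const MODEL = "phala/gemma-4-26b-a4b-uncensored";

// Hypothetical helper: builds the same request options as the snippet above.
function buildChatRequest(apiKey, prompt) {
  return {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: MODEL,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}

// Assumption: OpenAI-style response body, i.e. choices[0].message.content.
function extractReply(data) {
  const content = data?.choices?.[0]?.message?.content;
  if (typeof content !== "string") {
    throw new Error("Unexpected response shape: " + JSON.stringify(data));
  }
  return content;
}

// fetchImpl is injectable so the helper can be exercised without network access.
async function chat(apiKey, prompt, fetchImpl = fetch) {
  const response = await fetchImpl(API_URL, buildChatRequest(apiKey, prompt));
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${await response.text()}`);
  }
  return extractReply(await response.json());
}
```

A call might look like `chat(process.env.REDPILL_API_KEY, "Hello").then(console.log)`. Failing loudly on a non-2xx status or an unexpected body keeps an error payload from being mistaken for a model reply.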

Verify Evidence

Confidential GPU-TEE responses carry two proof layers you can check yourself: a nonce-bound attestation report for the gateway, and a signed receipt that binds your request and response to an attested upstream session.

# 1. Attest the gateway (nonce-bound, proves which TEE workload serves you)
NONCE="$(openssl rand -hex 16)"
curl -s "https://api.redpill.ai/v1/aci/attestation?nonce=$NONCE" \
  -H "Authorization: Bearer $REDPILL_API_KEY" -o report.json

# 2. Call the model and capture the x-receipt-id response header
curl -s "https://api.redpill.ai/v1/chat/completions" -D headers.txt \
  -H "Authorization: Bearer $REDPILL_API_KEY" -H "Content-Type: application/json" \
  -d '{"model":"phala/gemma-4-26b-a4b-uncensored","messages":[{"role":"user","content":"Hello"}]}' -o response.json
RECEIPT_ID="$(grep -i '^x-receipt-id' headers.txt | tr -d '\r' | awk '{print $2}')"

# 3. Fetch the signed receipt, then follow it to the attested session
curl -s "https://api.redpill.ai/v1/aci/receipts/$RECEIPT_ID" \
  -H "Authorization: Bearer $REDPILL_API_KEY" -o receipt.json
SESSION_ID="$(jq -r '.event_log[]|select(.type=="upstream.verified").session_id' receipt.json)"
curl -s "https://api.redpill.ai/v1/aci/sessions/$SESSION_ID" \
  -H "Authorization: Bearer $REDPILL_API_KEY"
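
For callers working in JavaScript rather than the shell, the two parsing steps above (reading the `x-receipt-id` header and selecting the `upstream.verified` event) can be sketched as plain functions. Both helper names are hypothetical. The only receipt fields assumed are the ones the jq filter itself relies on, namely an `event_log` array whose entries carry `type` and `session_id`. Nothing here verifies signatures or attestation quotes.

```javascript
// Hypothetical helper: pull the x-receipt-id value out of a raw header dump
// such as the headers.txt file written by `curl -D`.
function extractReceiptId(rawHeaders) {
  for (const line of rawHeaders.split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // status line or blank line
    const name = line.slice(0, idx).trim().toLowerCase();
    if (name === "x-receipt-id") {
      return line.slice(idx + 1).trim();
    }
  }
  return null;
}

// Hypothetical helper: find the session id on the first "upstream.verified"
// event. Assumption: receipt.event_log is an array of objects with `type`
// and `session_id` fields, as implied by the jq filter above.
function extractSessionId(receipt) {
  const events = Array.isArray(receipt?.event_log) ? receipt.event_log : [];
  const verified = events.find((e) => e?.type === "upstream.verified");
  return verified?.session_id ?? null;
}
```

Both functions return `null` instead of throwing when the expected field is missing, so a caller can treat a missing receipt or an unverified upstream as a verification failure rather than a crash.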

