Token costs, margin math, and why most AI startups underprice their products — and how to fix it.

How to Price an AI SaaS Product

Pricing AI products is harder than pricing traditional SaaS because your cost of goods is variable. A traditional SaaS product has near-zero marginal cost per user. An AI product pays per token, per embedding call, per vector search. Get pricing wrong and more customers means less profit — you scale your costs faster than your revenue.

Start with Your Unit Economics

Before setting a price, know your cost per conversation. For aiassist.chat, a typical support conversation involves:

| Cost component | Unit cost | Per conversation |
|---|---|---|
| Embedding (user message, ~50 tokens) | $0.02/1M tokens | ~$0.000001 |
| Vector search (Qdrant managed) | $0.08/1M vectors/month | ~$0.0001 |
| LLM input (~800 tokens with context) | $0.25/1M tokens (Haiku) | ~$0.0002 |
| LLM output (~200 tokens) | $1.25/1M tokens (Haiku) | ~$0.00025 |
| Total | | ~$0.0006 |
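The per-conversation total can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the struct and function names are hypothetical, the prices and token counts are copied from the table, and vector search is passed in as an already-amortized per-conversation estimate because it is billed per stored vector rather than per query.

```rust
/// Unit prices in USD, copied from the cost table (names are illustrative).
pub struct UnitPrices {
    pub embedding_per_m_tokens: f64,
    pub llm_input_per_m_tokens: f64,
    pub llm_output_per_m_tokens: f64,
    /// Vector search is billed per stored vector per month, so it enters
    /// as a pre-amortized per-conversation estimate.
    pub vector_search_per_conversation: f64,
}

/// Token counts for one typical conversation.
pub struct ConversationShape {
    pub embedding_tokens: f64,
    pub input_tokens: f64,
    pub output_tokens: f64,
}

/// Variable cost of a single conversation in USD.
pub fn cost_per_conversation(p: &UnitPrices, c: &ConversationShape) -> f64 {
    const M: f64 = 1_000_000.0;
    c.embedding_tokens * p.embedding_per_m_tokens / M
        + p.vector_search_per_conversation
        + c.input_tokens * p.llm_input_per_m_tokens / M
        + c.output_tokens * p.llm_output_per_m_tokens / M
}

/// The table's numbers plugged in.
pub fn example_from_table() -> f64 {
    let prices = UnitPrices {
        embedding_per_m_tokens: 0.02,
        llm_input_per_m_tokens: 0.25,
        llm_output_per_m_tokens: 1.25,
        vector_search_per_conversation: 0.0001,
    };
    let shape = ConversationShape {
        embedding_tokens: 50.0,
        input_tokens: 800.0,
        output_tokens: 200.0,
    };
    cost_per_conversation(&prices, &shape)
}
```

With the table's inputs this comes to about $0.00055, which the table rounds up to ~$0.0006. Rounding up is the conservative direction for margin planning.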

At 500 conversations/month on a $29/month starter plan, your infrastructure cost is $0.30, a 99% gross margin. At 5,000 conversations on a $99/month plan, you're at $3 in costs, which is still excellent. Even at 50,000 conversations on a $299 plan, you're at $30 in variable costs, roughly a 90% gross margin.

The math works, but only if your pricing tiers are calibrated against these numbers. Pure flat-rate pricing is a trap because a small number of heavy users can account for a disproportionate share of your costs.
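To make the flat-rate risk concrete, here is a minimal sketch that measures how much of the variable cost comes from the heaviest users, and where a flat plan stops covering its own variable cost. The function names and the sample usage distribution are hypothetical. The $0.0006 per-conversation cost is the rounded figure from the table above.

```rust
/// Share (0.0 to 1.0) of total variable cost attributable to the `top_n`
/// heaviest users. With a constant per-conversation cost, this equals
/// their share of total conversations.
pub fn top_user_cost_share(conversations_per_user: &[u64], top_n: usize) -> f64 {
    let total: u64 = conversations_per_user.iter().sum();
    if total == 0 {
        return 0.0;
    }
    let mut sorted = conversations_per_user.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // heaviest first
    let top: u64 = sorted.iter().take(top_n).sum();
    top as f64 / total as f64
}

/// Monthly conversation count at which a flat-rate plan's variable cost
/// equals its price (ignoring all fixed overhead).
pub fn breakeven_conversations(price: f64, cost_per_conversation: f64) -> f64 {
    price / cost_per_conversation
}
```

For a hypothetical cohort of 95 tenants at 500 conversations each and 5 tenants at 60,000 each, the 5 heavy tenants generate about 86% of the variable cost. On a flat $29 plan, a single tenant crosses break-even at roughly 48,000 conversations per month.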

Detailed Cost Breakdown at Scale

Here's how the full unit economics look across tiers for an AI chat product with these costs:

| Plan | Price/mo | Conversations | Variable cost | Gross margin |
|---|---|---|---|---|
| Starter | $29 | 500 | $0.30 | 99% |
| Growth | $99 | 5,000 | $3.00 | 97% |
| Pro | $299 | 25,000 | $15.00 | 95% |
| Scale | $799 | 100,000 | $60.00 | 92% |
| Enterprise | Custom | Custom | Variable | 85–90% |
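The margin column follows directly from price, included volume, and the ~$0.0006 per-conversation cost. A small sketch of that calculation (type and function names are illustrative, and it assumes every tenant uses its full allowance):

```rust
/// Rounded per-conversation variable cost from the unit-economics table.
pub const COST_PER_CONVERSATION: f64 = 0.0006;

/// A published plan tier.
pub struct Tier {
    pub name: &'static str,
    pub price: f64,
    pub conversations: u64,
}

/// Variable cost in USD if the full allowance is consumed.
pub fn variable_cost(conversations: u64) -> f64 {
    conversations as f64 * COST_PER_CONVERSATION
}

/// Gross margin as a percentage of price, counting variable cost only.
pub fn gross_margin_pct(price: f64, conversations: u64) -> f64 {
    (price - variable_cost(conversations)) / price * 100.0
}

/// The four fixed-price tiers from the table.
pub fn published_tiers() -> Vec<Tier> {
    vec![
        Tier { name: "Starter", price: 29.0, conversations: 500 },
        Tier { name: "Growth", price: 99.0, conversations: 5_000 },
        Tier { name: "Pro", price: 299.0, conversations: 25_000 },
        Tier { name: "Scale", price: 799.0, conversations: 100_000 },
    ]
}
```

Rounding `gross_margin_pct` for each tier reproduces the 99%, 97%, 95%, and 92% figures. These margins exclude support and other fixed overhead, so real margins will be lower.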

The margins compress as volume grows. This is expected and healthy — the product delivers more value at higher volumes, and you're passing some of that efficiency back to the customer. The danger is underpricing the middle tiers where fixed overhead (support, infrastructure) per customer is highest relative to revenue.

The Psychology of Pricing Pages

The most common pricing page mistake is feature differentiation that customers don't care about. Listing "up to 3 knowledge base integrations" on the starter plan and "unlimited integrations" on the pro plan doesn't convert if customers don't know what an integration is yet.

What actually drives plan selection:

- Conversation volume: customers understand this immediately. They can estimate how many support chats their site gets per month.
- Number of sites/widgets: relevant for agencies and businesses with multiple properties.
- Team seats: relevant for collaborative management.
- Analytics depth: advanced reporting as a Pro feature.

Keep the pricing page to three plans, maximum four. Highlight the middle plan. The top plan should exist to make the middle plan look like the sensible choice, not because most customers buy it.

Price anchoring works: a $799/month "Scale" plan makes a $299/month "Pro" plan feel like a deal, even if the $299 plan is where your median enterprise customer will land.

Annual vs. Monthly Billing Incentives

Annual billing has two benefits: improved cash flow for you and lower churn risk. The standard incentive is 2 months free (a ~17% discount). What matters more than the size of the incentive is how you frame it:

- "Get 2 months free": perceived as a savings offer
- "$249/month, billed annually" (shown next to the $299 month-to-month price): perceived as a lower rate

Both describe the same discount. The second framing converts better with B2B buyers who are comparing monthly line items across vendors.
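The equivalence of the two framings is simple arithmetic. A short sketch, with hypothetical function names:

```rust
/// Total charged for a year when `free_months` of twelve are waived.
pub fn annual_total(monthly_price: f64, free_months: u32) -> f64 {
    let paid_months = 12u32.saturating_sub(free_months);
    monthly_price * paid_months as f64
}

/// The annual total expressed as an effective per-month rate.
pub fn effective_monthly(monthly_price: f64, free_months: u32) -> f64 {
    annual_total(monthly_price, free_months) / 12.0
}

/// Discount relative to paying month-to-month for twelve months, in percent.
pub fn discount_pct(free_months: u32) -> f64 {
    free_months.min(12) as f64 / 12.0 * 100.0
}
```

For a $299 plan with 2 months free, the annual total is $2,990, the effective rate is about $249.17 per month, and the discount is about 16.7%.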

For SaaS products with high setup cost per customer, annual billing should be your default checkout suggestion, not an opt-in. In aiassist.chat's checkout flow, annual billing is pre-selected with a visible "save $XX" label. Monthly billing is available but requires a click to select it.

Handling Overage Notifications Without Damaging Trust

Customers hate billing surprises more than they hate overages. The solution is not to prevent overages — it's to make them completely transparent and controllable.

Our overage system works in three stages:

1. 80% threshold email: "You've used 80% of your conversation limit this month. At your current pace, you'll hit your limit in ~6 days."
2. 95% threshold email + in-app banner: "You're almost at your conversation limit. Upgrade now to avoid interruption, or set an overage budget to continue automatically."
3. At 100%: the widget shows a fallback message ("Chat is temporarily unavailable — please email us") unless the tenant has enabled overage billing.
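The three stages above can be sketched as a pure function over the usage counter, plus a pace projection for the "~6 days" estimate in the first email. This is a minimal illustration, not the production logic: the enum and function names are hypothetical, and the projection assumes usage continues at the period's average daily rate.

```rust
/// Which notification stage a tenant is in for the current billing period.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum UsageStage {
    Normal,
    Warn80,
    Warn95,
    /// At or over the limit; the widget falls back unless overage is enabled.
    LimitReached { overage_enabled: bool },
}

/// Classify usage against the plan limit.
/// Integer math avoids float rounding exactly at the 80% and 95% boundaries.
pub fn usage_stage(used: u64, limit: u64, overage_enabled: bool) -> UsageStage {
    if used >= limit {
        UsageStage::LimitReached { overage_enabled }
    } else if used * 100 >= limit * 95 {
        UsageStage::Warn95
    } else if used * 100 >= limit * 80 {
        UsageStage::Warn80
    } else {
        UsageStage::Normal
    }
}

/// Projected days until the limit is hit at the average pace so far.
/// Returns None when there is no pace to extrapolate or the limit is already reached.
pub fn days_until_limit(used: u64, limit: u64, days_elapsed: u32) -> Option<f64> {
    if used == 0 || days_elapsed == 0 || used >= limit {
        return None;
    }
    let per_day = used as f64 / days_elapsed as f64;
    Some((limit - used) as f64 / per_day)
}
```

For example, a Starter tenant at 400 of 500 conversations on day 24 is in the `Warn80` stage with about 6 days of headroom.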

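Tenants who opt into automatic overages are charged $0.01 per additional conversation above their plan limit, roughly 17x the ~$0.0006 variable cost. For overages to push customers toward upgrading rather than serve as a cheaper pricing strategy, the overage rate has to exceed the effective per-conversation price of the plans themselves. Against the tiers above, a flat $0.01 only clears that bar at Scale ($799 / 100,000 ≈ $0.008). Starter, Growth, and Pro work out to about $0.058, $0.020, and $0.012 per conversation, so the lower tiers need a higher overage rate to preserve the upgrade incentive.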
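When tuning an overage rate, it is worth checking it against each tier's effective per-conversation price and against the cost of simply upgrading. A small sketch under the tier prices listed earlier (function names are hypothetical):

```rust
/// Overage price per conversation, from the text.
pub const OVERAGE_RATE: f64 = 0.01;

/// Effective per-conversation price of a plan when fully used.
pub fn effective_rate(price: f64, included: u64) -> f64 {
    price / included as f64
}

/// Monthly bill for a tenant on a plan with automatic overages enabled.
pub fn bill_with_overage(price: f64, included: u64, used: u64, overage_rate: f64) -> f64 {
    let extra = used.saturating_sub(included);
    price + extra as f64 * overage_rate
}

/// True when staying on the current plan and paying overages costs more
/// than moving to the next plan, i.e. the upgrade incentive holds.
pub fn upgrade_is_cheaper(
    price: f64,
    included: u64,
    used: u64,
    overage_rate: f64,
    next_plan_price: f64,
) -> bool {
    bill_with_overage(price, included, used, overage_rate) > next_plan_price
}
```

With these numbers, a Starter tenant using 5,000 conversations pays $29 + 4,500 × $0.01 = $74, which is less than the $99 Growth plan, so the incentive to upgrade does not hold at that rate. A Pro tenant at 100,000 conversations would pay $1,049 against $799 for Scale, where it does.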

Hard cutoffs without warning produce support escalations and churn. Transparent caps with self-service upgrade paths produce upgrades.

Enterprise Custom Pricing: When to Do It

"Contact us for enterprise pricing" should not be a euphemism for "we don't know what to charge." Enterprise custom pricing is appropriate when:

- The customer requires a custom SLA (response time guarantees, uptime commitments beyond the standard 99.9%)
- Security compliance review is required (SOC 2 report sharing, questionnaire completion, procurement review)
- Custom integration work is needed
- Volume is high enough that the standard tiers are clearly wrong (150k+ conversations/month)

When you agree to enterprise pricing, price for the relationship cost, not just the infrastructure. A $50k/year enterprise contract that requires 20 hours/month of support, a dedicated Slack channel, and quarterly business reviews has very different economics than the same $50k from a customer who self-serves.

The floor for enterprise custom pricing should be 3–5x your highest published tier. Below that, the customer should use a standard plan.
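Both rules of thumb reduce to quick arithmetic. The sketch below is illustrative: the function names are hypothetical, and the loaded hourly support cost is an assumed input that you would replace with your own figure.

```rust
/// Monthly price floor range for custom deals: 3x to 5x the highest published tier.
pub fn enterprise_floor_monthly(highest_tier_monthly: f64) -> (f64, f64) {
    (highest_tier_monthly * 3.0, highest_tier_monthly * 5.0)
}

/// Annual contract value left after paying for hands-on support time.
/// `loaded_hourly_cost` is an assumption supplied by the caller.
pub fn contract_net_of_support(
    annual_contract: f64,
    support_hours_per_month: f64,
    loaded_hourly_cost: f64,
) -> f64 {
    annual_contract - support_hours_per_month * 12.0 * loaded_hourly_cost
}
```

With a $799 top tier, the floor lands between $2,397 and $3,995 per month. At an assumed $100 per support hour, the $50k contract with 20 hours/month of support nets $26,000 before infrastructure costs, while the self-serve customer nets the full $50,000.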

Metered Billing Infrastructure

Usage-based billing requires infrastructure that most teams underestimate:

- Usage counters: track consumption in real time, not just at billing cycle end. We use Redis counters keyed by tenant and billing period.
- Idempotent billing events: conversation events need to be deduplicated. A retry of a failed API call should not double-bill. We use conversation_id as an idempotency key in the billing ledger.
- Stripe Billing integration: for metered billing, Stripe's Usage Records API lets you report consumption and bill at month-end. Set up a cron job that reports usage and verify it reconciles against your internal counters.
- Self-service usage visibility: tenants should be able to see their current period consumption at any time from the dashboard. Opaque billing is a trust killer.
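The reconciliation step in the list above can be sketched as a comparison between per-tenant counts from the billing ledger and the real-time counters. This is a minimal in-memory illustration: the `Drift` type and `reconcile` function are hypothetical, and loading the two maps from Postgres and Redis is left out.

```rust
use std::collections::HashMap;

/// A tenant whose ledger count and real-time counter disagree.
#[derive(Debug, PartialEq)]
pub struct Drift {
    pub tenant: String,
    pub ledger: u64,
    pub counter: u64,
}

/// Compare per-tenant conversation counts for one billing period.
/// A tenant missing from either side is treated as a count of zero.
/// Results are sorted by tenant so output is stable across runs.
pub fn reconcile(
    ledger: &HashMap<String, u64>,
    counters: &HashMap<String, u64>,
) -> Vec<Drift> {
    let mut tenants: Vec<&String> = ledger.keys().chain(counters.keys()).collect();
    tenants.sort();
    tenants.dedup();

    tenants
        .into_iter()
        .filter_map(|t| {
            let l = ledger.get(t).copied().unwrap_or(0);
            let c = counters.get(t).copied().unwrap_or(0);
            (l != c).then(|| Drift { tenant: t.clone(), ledger: l, counter: c })
        })
        .collect()
}
```

An empty result means the two sources agree. Any `Drift` entries are worth alerting on before usage is reported for invoicing.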

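// Record a conversation usage event (Rust example).
// Assumes sqlx with the Postgres and uuid features, and an anyhow-style Result alias.
// RedisPool and current_billing_period() are application-defined and not shown here.
use anyhow::Result;
use sqlx::{Pool, Postgres};
use uuid::Uuid;

pub async fn record_conversation(
    db: &Pool<Postgres>,
    redis: &RedisPool,
    tenant_id: Uuid,
    conversation_id: Uuid,
) -> Result<()> {
    // Idempotent insert: relies on a UNIQUE constraint on billing_events.conversation_id.
    // On a retry the row already exists, so RETURNING yields no row.
    let inserted = sqlx::query!(
        "INSERT INTO billing_events (tenant_id, conversation_id, occurred_at)
         VALUES ($1, $2, NOW())
         ON CONFLICT (conversation_id) DO NOTHING
         RETURNING id",
        tenant_id,
        conversation_id,
    )
    .fetch_optional(db)
    .await?;

    // Only count the conversation the first time it is recorded.
    if inserted.is_some() {
        // Increment the real-time counter for this tenant and billing period.
        // The insert and the increment are not atomic: if the increment fails,
        // the counter undercounts until it is reconciled against the ledger.
        let key = format!("usage:{}:{}", tenant_id, current_billing_period());
        redis.incr(&key).await?;
    }

    Ok(())
}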
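The example calls `current_billing_period()` without showing it. One possible sketch, assuming billing periods are UTC calendar months formatted as `YYYY-MM` (per-tenant anniversary billing would need a different scheme), using only the standard library. `billing_period_for` and `usage_key` are hypothetical helpers added here so the logic can be tested with a fixed timestamp.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Billing period label ("YYYY-MM", UTC) for a Unix timestamp in seconds.
/// Uses the standard days-to-civil-date conversion for the Gregorian calendar.
pub fn billing_period_for(unix_secs: u64) -> String {
    let days = (unix_secs / 86_400) as i64; // whole days since 1970-01-01
    let z = days + 719_468; // shift epoch to 0000-03-01
    let era = z / 146_097; // 400-year cycles
    let doe = z - era * 146_097; // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    // January and February belong to the next civil year in this scheme.
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    format!("{:04}-{:02}", year, month)
}

/// Redis key for a tenant's usage counter, matching the "usage:{tenant}:{period}" layout.
pub fn usage_key(tenant_id: &str, unix_secs: u64) -> String {
    format!("usage:{}:{}", tenant_id, billing_period_for(unix_secs))
}

/// Billing period for the current moment.
pub fn current_billing_period() -> String {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    billing_period_for(now)
}
```

Because the period is part of the key, counters roll over on their own at the month boundary. Old keys still need an expiry or a cleanup job so they do not accumulate.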

Building this infrastructure correctly upfront costs one sprint. Building it correctly after you've shipped and have real customers costs three sprints and two billing incidents.