Project Fit & Success Probabilities

We only take on use‑cases with a high probability of success. This page shows our high‑level model for estimating when a chat‑agent solution is likely to succeed, and when to decline or re‑scope.

1) High‑level probability model

Definitions

  • F: Task is feasible with current tech + data (binary)
  • Q: Input quality (docs, data, APIs) on 0–1 scale
  • C: Clarity/constraint of the task spec on 0–1 scale
  • H: Human‑in‑the‑loop availability (0 none → 1 always)
  • S: Success (task completed to spec within guardrails)

Top‑line equation

P(S) = P(S\,|\,F)\,P(F) + P(S\,|\,\neg F)\,[1 - P(F)]

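We aim to maximize P(F) at intake and keep P(S|¬F) ≈ 0 via guardrails (decline or re‑scope). With P(S|¬F) ≈ 0, the top‑line equation reduces to P(S) ≈ P(S|F)·P(F), which is the form used in the scenarios in section 3.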

Success conditional on feasibility

P(S\,|\,F) \approx w_Q Q + w_C C + w_H H - w_R R

Where R is risk/ambiguity (0–1). We tune weights by domain; defaults: w_Q=0.35, w_C=0.35, w_H=0.20, w_R=0.10. With these defaults the score tops out at 0.90 (Q = C = H = 1, R = 0).
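The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `Weights`, `p_success_given_feasible`, and `p_success` are made up for this example, and clamping the score to [0, 1] is an assumption added here so the result always reads as a probability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Default weights quoted in the text; tune per domain."""
    q: float = 0.35
    c: float = 0.35
    h: float = 0.20
    r: float = 0.10


def p_success_given_feasible(q, c, h, r, w=Weights()):
    """Linear score w_Q*Q + w_C*C + w_H*H - w_R*R.

    Clamping to [0, 1] is an assumption of this sketch; the text
    does not say how out-of-range scores are handled.
    """
    score = w.q * q + w.c * c + w.h * h - w.r * r
    return min(1.0, max(0.0, score))


def p_success(p_f, p_s_given_f, p_s_given_not_f=0.0):
    """Top-line equation: P(S) = P(S|F) P(F) + P(S|not F) (1 - P(F))."""
    return p_s_given_f * p_f + p_s_given_not_f * (1.0 - p_f)


if __name__ == "__main__":
    cond = p_success_given_feasible(q=0.85, c=0.8, h=0.6, r=0.2)
    print(f"P(S|F) = {cond:.4f}")
    print(f"P(S)   = {p_success(0.9, cond):.4f}")
```

The default `p_s_given_not_f=0.0` encodes the guardrail goal of keeping P(S|¬F) near zero; pass a non-zero value to see how much an infeasible project could still contribute.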

Updating the feasibility prior (Bayes)

P(F\,|\,E) = \dfrac{P(E\,|\,F)\,P(F)}{P(E\,|\,F)\,P(F) + P(E\,|\,\neg F)\,[1 - P(F)]}

Evidence E includes: historical wins in domain, API coverage, benchmark tasks, and prototype spikes.
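As a rough sketch of the Bayes update above, the function below turns a prior P(F) and two likelihoods into the posterior P(F|E). The likelihood values in the usage note are hypothetical, and chaining several pieces of evidence assumes they are conditionally independent given F, which the text does not claim.

```python
def feasibility_posterior(prior, p_e_given_f, p_e_given_not_f):
    """Bayes update: returns P(F | E) from P(F), P(E|F), and P(E|not F)."""
    numerator = p_e_given_f * prior
    denominator = numerator + p_e_given_not_f * (1.0 - prior)
    if denominator == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / denominator


def update_sequentially(prior, likelihood_pairs):
    """Apply several pieces of evidence in turn.

    Assumes the pieces are conditionally independent given F
    (an assumption of this sketch, not a statement from the text).
    """
    p = prior
    for p_e_given_f, p_e_given_not_f in likelihood_pairs:
        p = feasibility_posterior(p, p_e_given_f, p_e_given_not_f)
    return p
```

For example, with a hypothetical prior of 0.6 and a prototype spike that succeeds 80% of the time on feasible projects but 20% of the time on infeasible ones, `feasibility_posterior(0.6, 0.8, 0.2)` gives 6/7, or about 0.857.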

2) Decision policy (accept vs. re‑scope vs. decline)

Accept

Rule: \( \hat P(S) \ge T_S \) and positive expected value.

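EV = p_S \cdot V - (1 - p_S) \cdot C_{fail} - C_{build}, \qquad EV > 0

Here p_S is the estimate \(\hat P(S)\), V is the value of a success, C_{fail} is the cost of a failure, and C_{build} is the build cost.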

Typical threshold T_S ∈ [0.7, 0.85] depending on criticality.
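The EV check can be sketched as below. Setting EV = 0 and solving for p_S gives a break‑even probability of (C_build + C_fail) / (V + C_fail), which shows how high \(\hat P(S)\) must be before the economics work at all. The function names are illustrative, and the unit values in the usage note are the defaults shown in the estimator in section 5.

```python
def expected_value(p_s, value, c_fail, c_build):
    """EV = p_S * V - (1 - p_S) * C_fail - C_build."""
    return p_s * value - (1.0 - p_s) * c_fail - c_build


def break_even_probability(value, c_fail, c_build):
    """The p_S at which EV is exactly zero.

    A result above 1.0 means no success probability makes EV positive.
    """
    if value + c_fail <= 0:
        raise ValueError("value + c_fail must be positive")
    return (c_build + c_fail) / (value + c_fail)
```

With V = 50, C_fail = 15, and C_build = 20, `break_even_probability(50, 15, 20)` is 35/65, about 0.54, so a project at T_S = 0.7 clears the EV test with room to spare while one at 0.5 does not.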

Re‑scope

Increase C, improve Q, add human checks (H) to raise \(\hat P(S)\).

Decline

Low P(F) or unacceptable R; we avoid moonshots without a research track.
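One way to read the three outcomes as a single rule is sketched below. The cut‑offs `min_p_f` and `max_r` are hypothetical: the text says only "low P(F)" and "unacceptable R" without giving numbers, so treat them as placeholders to be set per domain.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    RESCOPE = "re-scope"
    DECLINE = "decline"


def decide(p_s, p_f, r, ev, t_s=0.7, min_p_f=0.5, max_r=0.5):
    """Map the estimates to accept / re-scope / decline.

    min_p_f and max_r are hypothetical cut-offs chosen for this sketch;
    t_s defaults to the low end of the typical threshold range.
    """
    if p_f < min_p_f or r > max_r:
        return Decision.DECLINE
    if p_s >= t_s and ev > 0:
        return Decision.ACCEPT
    # Feasible and tolerable risk, but below threshold or EV <= 0:
    # raise C, Q, or H and re-estimate.
    return Decision.RESCOPE
```

Checking the decline conditions first reflects the idea that no amount of re‑scoping helps when feasibility itself is in doubt; a different ordering would be a different policy.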

3) Scenarios

A) Lead Intake (Pro Services)

APIs exist, data is clean, task is constrained. Handoff to calendar + CRM.

Assume: P(F)=0.9, Q=0.85, C=0.8, H=0.6, R=0.2.

P(S|F) ≈ 0.35(0.85) + 0.35(0.8) + 0.20(0.6) − 0.10(0.2) = 0.2975 + 0.28 + 0.12 − 0.02 = 0.6775

P(S) ≈ 0.6775·0.9 + 0·0.1 = 0.6098

Re‑scope to add human review on qualification (raise H→0.8) and tighten prompts (C→0.9):

P(S|F) ≈ 0.35(0.85) + 0.35(0.9) + 0.20(0.8) − 0.10(0.2) = 0.2975 + 0.315 + 0.16 − 0.02 = 0.7525

P(S) ≈ 0.7525·0.9 = 0.677

B) Policy Retrieval (Healthcare)

Retrieval from payer policies with strict accuracy; HITL required.

Assume: P(F)=0.8, Q=0.7, C=0.9, H=0.9, R=0.3.

P(S|F) ≈ 0.35(0.7) + 0.35(0.9) + 0.20(0.9) − 0.10(0.3) = 0.245 + 0.315 + 0.18 − 0.03 = 0.71

P(S) ≈ 0.71·0.8 = 0.568

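Tighten sources and templates (Q→0.85), keep HITL:

P(S|F) ≈ 0.35(0.85) + 0.35(0.9) + 0.20(0.9) − 0.10(0.3) = 0.2975 + 0.315 + 0.18 − 0.03 = 0.7625

P(S) ≈ 0.7625·0.8 = 0.61

Accept if EV is positive and the agreed threshold T_S is at most 0.6 (below the typical range of 0.7 to 0.85); otherwise split into narrower intents.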

C) Generative Media Summaries

Loose inputs, subjective outputs; success relies on C & H.

Assume: P(F)=0.7, Q=0.6, C=0.75, H=0.7, R=0.35.

P(S|F) ≈ 0.35(0.6) + 0.35(0.75) + 0.20(0.7) − 0.10(0.35) = 0.21 + 0.2625 + 0.14 − 0.035 = 0.5775

P(S) ≈ 0.5775·0.7 = 0.404

Recommendation: re‑scope to niche formats (tight C), add checklists (raise H), or decline.
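The scenario arithmetic can be reproduced with a short script, which is handy for checking the numbers or trying other inputs. This is a sketch under the same assumptions as the scenarios (default weights, P(S|¬F) = 0); the dictionary keys and the `evaluate` helper are names chosen for this example.

```python
# Default weights; R carries a negative sign because it lowers the score.
WEIGHTS = {"Q": 0.35, "C": 0.35, "H": 0.20, "R": -0.10}

SCENARIOS = {
    "A) Lead Intake":      {"P_F": 0.9, "Q": 0.85, "C": 0.80, "H": 0.6, "R": 0.20},
    "A) re-scoped":        {"P_F": 0.9, "Q": 0.85, "C": 0.90, "H": 0.8, "R": 0.20},
    "B) Policy Retrieval": {"P_F": 0.8, "Q": 0.70, "C": 0.90, "H": 0.9, "R": 0.30},
    "B) re-scoped":        {"P_F": 0.8, "Q": 0.85, "C": 0.90, "H": 0.9, "R": 0.30},
    "C) Media Summaries":  {"P_F": 0.7, "Q": 0.60, "C": 0.75, "H": 0.7, "R": 0.35},
}


def evaluate(scenario):
    """Return (P(S|F), P(S)), assuming P(S|not F) = 0."""
    cond = sum(weight * scenario[key] for key, weight in WEIGHTS.items())
    return cond, cond * scenario["P_F"]


for name, scenario in SCENARIOS.items():
    cond, p_s = evaluate(scenario)
    print(f"{name:<22} P(S|F)={cond:.4f}  P(S)={p_s:.3f}")
```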

4) Evaluation metrics

Task‑level outcomes

                  Feasible (F)                 Not Feasible (¬F)
Agent succeeds    True Positive                False Positive (guardrails aim ≈ 0)
Agent fails       False Negative (re‑scope)    True Negative (decline)
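A tally of task outcomes into these four cells might look like the following sketch. The record format, a pair of booleans `(feasible, succeeded)`, is an assumption for illustration; the text does not describe how outcomes are logged.

```python
from collections import Counter

# (feasible, succeeded) -> cell of the outcome table
LABELS = {
    (True, True): "true positive",
    (False, True): "false positive",
    (True, False): "false negative",
    (False, False): "true negative",
}


def tally(records):
    """Count outcomes per cell; records is an iterable of (feasible, succeeded)."""
    return Counter(LABELS[(bool(f), bool(s))] for f, s in records)


def success_rate_when_feasible(counts):
    """Observed estimate of P(S|F): TP / (TP + FN), or None with no feasible tasks."""
    feasible = counts["true positive"] + counts["false negative"]
    return counts["true positive"] / feasible if feasible else None
```

Comparing `success_rate_when_feasible` against the modeled P(S|F) for a domain is one simple way to check whether the weights need retuning.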

Program KPIs

  • Acceptance rate (projects meeting threshold)
  • Observed success rate (by domain & spec tightness)
  • Median time‑to‑first‑value
  • Human review effort / task
  • Cost per successful task vs. baseline

5) Estimate your project’s success (interactive)

Adjust the sliders to explore how input quality (Q), clarity (C), human review (H), risk (R), and feasibility prior P(F) affect the success probability \(\hat P(S)\).

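Weights

Defaults: w_Q=0.35, w_C=0.35, w_H=0.20, w_R=0.10

Inputs (default slider values)

  • P(F), feasibility prior: 0.80
  • Q, input quality: 0.80
  • C, spec clarity: 0.85
  • H, human‑in‑the‑loop: 0.70
  • R, risk/ambiguity: 0.20
  • V, value per success: 50 units
  • C_fail, cost of failure: 15 penalty units
  • C_build, build cost: 20 one‑time units

Results (at the default values)

  • P(S|F) ≈ 0.35(0.80) + 0.35(0.85) + 0.20(0.70) − 0.10(0.20) = 0.6975
  • \(\hat P(S)\) ≈ 0.6975·0.80 = 0.558
  • Expected value: EV = 0.558·50 − 0.442·15 − 20 = 1.27
  • Decision: uses threshold T_S=0.7 by default. At the default values \(\hat P(S)\) = 0.558 is below the threshold, so the Accept rule is not met even though EV is positive. Tweak your sliders to see how scoping changes the outcome.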
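For readers without the interactive widget, the estimator can be approximated offline. The sketch below uses the default slider values from this section and adds a small `what_if` helper that reports how much \(\hat P(S)\) moves when a single slider changes. It is an illustration of the formulas on this page, not the widget's actual code, and the function and key names are invented here.

```python
DEFAULTS = {
    "P_F": 0.80, "Q": 0.80, "C": 0.85, "H": 0.70, "R": 0.20,
    "V": 50.0, "C_fail": 15.0, "C_build": 20.0, "T_S": 0.70,
}
# R carries a negative sign because it lowers the score.
WEIGHTS = {"Q": 0.35, "C": 0.35, "H": 0.20, "R": -0.10}


def estimate(inputs):
    """Compute P(S|F), P(S), EV, and the threshold check for one set of inputs."""
    cond = sum(weight * inputs[key] for key, weight in WEIGHTS.items())
    p_s = cond * inputs["P_F"]  # assumes P(S|not F) = 0
    ev = p_s * inputs["V"] - (1.0 - p_s) * inputs["C_fail"] - inputs["C_build"]
    return {"P(S|F)": cond, "P(S)": p_s, "EV": ev,
            "meets_threshold": p_s >= inputs["T_S"]}


def what_if(inputs, lever, delta=0.1):
    """Change in P(S) when one slider moves by delta (capped at 1.0)."""
    moved = dict(inputs)
    moved[lever] = min(1.0, inputs[lever] + delta)
    return estimate(moved)["P(S)"] - estimate(inputs)["P(S)"]


if __name__ == "__main__":
    print(estimate(DEFAULTS))
    for lever in ("P_F", "Q", "C", "H", "R"):
        print(f"{lever:>3} +0.1 -> change in P(S) = {what_if(DEFAULTS, lever):+.4f}")
```

Because the model is linear in Q, C, H, and R, each lever's effect on \(\hat P(S)\) is its weight times P(F) times the change, so at the defaults raising P(F) by 0.1 moves the estimate more than raising any single input by the same amount.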

6) Our promise

We only take probable use‑cases

We partner where feasibility is high and the value is clear. Otherwise, we propose a research spike or politely decline.

Transparent math

Every proposal includes our current \(\hat P(S)\), the levers to raise it (Q, C, H), risk factors R, and an EV check. If the numbers don’t add up, we won’t recommend it.