complext — Project Fit & Success Probabilities

1) High‑level probability model

Definitions

F: Task is feasible with current tech + data (binary)
Q: Input quality (docs, data, APIs) on 0–1 scale
C: Clarity/constraint of the task spec on 0–1 scale
H: Human‑in‑the‑loop availability (0 none → 1 always)
S: Success (task completed to spec within guardrails)

Top‑line equation

\[ P(S) = P(S \mid F)\,P(F) + P(S \mid \neg F)\,[1 - P(F)] \]

We aim to maximize P(F) at intake and keep P(S|¬F) ≈ 0 via guardrails: infeasible tasks are declined or re‑scoped rather than attempted.
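The top‑line equation is the law of total probability over feasibility. A minimal Python sketch follows; the function name `p_success` and the default `p_s_given_not_f=0.0` (standing in for the guardrail goal of P(S|¬F) ≈ 0) are illustrative assumptions, not part of the original model.

```python
def p_success(p_f: float, p_s_given_f: float, p_s_given_not_f: float = 0.0) -> float:
    """P(S) = P(S|F) * P(F) + P(S|not F) * (1 - P(F))."""
    for name, p in (("p_f", p_f),
                    ("p_s_given_f", p_s_given_f),
                    ("p_s_given_not_f", p_s_given_not_f)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {p}")
    return p_s_given_f * p_f + p_s_given_not_f * (1.0 - p_f)


# Hypothetical inputs: a feasibility prior of 0.9 and P(S|F) of 0.6775.
print(f"{p_success(p_f=0.9, p_s_given_f=0.6775):.4f}")
```

When the second term is held at zero, P(S) can never exceed P(F), which is one way to read the emphasis on feasibility at intake.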

Success conditional on feasibility

\[ P(S \mid F) \approx w_Q Q + w_C C + w_H H - w_R R \]

Where R is risk/ambiguity (0–1). We tune weights by domain; defaults: w_Q=0.35, w_C=0.35, w_H=0.20, w_R=0.10.
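The weighted score can be sketched in a few lines of Python. The default weights are the ones stated above; the function name and the clamp to [0, 1] are assumptions added here so the result can always be read as a probability.

```python
DEFAULT_WEIGHTS = {"Q": 0.35, "C": 0.35, "H": 0.20, "R": 0.10}


def p_success_given_feasible(q: float, c: float, h: float, r: float,
                             weights: dict = DEFAULT_WEIGHTS) -> float:
    """Linear score w_Q*Q + w_C*C + w_H*H - w_R*R, clamped to [0, 1]."""
    raw = (weights["Q"] * q + weights["C"] * c
           + weights["H"] * h - weights["R"] * r)
    # Clamping is an assumption of this sketch, not part of the stated formula.
    return min(1.0, max(0.0, raw))


# Best case under the default weights: Q = C = H = 1 and R = 0.
print(f"{p_success_given_feasible(1.0, 1.0, 1.0, 0.0):.2f}")
```

Because the positive default weights sum to 0.90, the score tops out at 0.90 even with perfect inputs and zero risk.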

Feasibility prior (Bayes)

\[ P(F \mid E) = \dfrac{P(E \mid F)\,P(F)}{P(E \mid F)\,P(F) + P(E \mid \neg F)\,[1 - P(F)]} \]

Evidence E includes historical wins in the domain, API coverage, benchmark tasks, and prototype spikes. The posterior P(F|E) then replaces P(F) in the top‑line equation.
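A minimal sketch of the Bayes update, assuming each piece of evidence can be summarized by a pair of likelihoods P(E|F) and P(E|¬F). The function names, the likelihood values, and the treatment of evidence items as conditionally independent given F are all illustrative assumptions.

```python
def feasibility_posterior(prior: float, p_e_given_f: float,
                          p_e_given_not_f: float) -> float:
    """P(F|E) by Bayes' rule for a single piece of evidence E."""
    numerator = p_e_given_f * prior
    denominator = numerator + p_e_given_not_f * (1.0 - prior)
    if denominator == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / denominator


def update_sequentially(prior: float, likelihood_pairs) -> float:
    """Apply the update once per evidence item (assumes independence given F)."""
    p = prior
    for p_e_given_f, p_e_given_not_f in likelihood_pairs:
        p = feasibility_posterior(p, p_e_given_f, p_e_given_not_f)
    return p


# Hypothetical numbers: a prototype spike that works 80% of the time when the
# task is feasible and 30% of the time when it is not.
print(f"{feasibility_posterior(0.6, 0.8, 0.3):.3f}")
```

Evidence that is equally likely under F and ¬F leaves the prior unchanged, so only discriminating signals move P(F).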

2) Decision policy (accept vs. re‑scope vs. decline)

Accept

Rule: \( \hat P(S) \ge T_S \) and positive expected value.

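\[ EV = p_S \cdot V - (1 - p_S) \cdot C_{\text{fail}} - C_{\text{build}} > 0 \]

Here \(p_S = \hat P(S)\), V is the value of a success, \(C_{\text{fail}}\) is the cost of a failure, and \(C_{\text{build}}\) is the build cost. These cost terms are distinct from the clarity score C.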

Typical threshold T_S ∈ [0.7, 0.85] depending on criticality.
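The accept rule combines the threshold test with the EV check. The sketch below assumes the function names shown and uses the calculator's default units (V = 50, C_fail = 15, C_build = 20) as example inputs. The break‑even probability is derived by setting EV = 0 and solving for p_S.

```python
def expected_value(p_s: float, value: float, cost_fail: float,
                   cost_build: float) -> float:
    """EV = p_S * V - (1 - p_S) * C_fail - C_build."""
    return p_s * value - (1.0 - p_s) * cost_fail - cost_build


def break_even_probability(value: float, cost_fail: float,
                           cost_build: float) -> float:
    """Smallest p_S with EV >= 0: (C_build + C_fail) / (V + C_fail)."""
    return (cost_build + cost_fail) / (value + cost_fail)


def meets_accept_rule(p_s: float, value: float, cost_fail: float,
                      cost_build: float, threshold: float = 0.7) -> bool:
    """Accept only if p_S clears the threshold and EV is strictly positive."""
    return p_s >= threshold and expected_value(p_s, value, cost_fail, cost_build) > 0


print(f"break-even p_S: {break_even_probability(50, 15, 20):.3f}")
print(meets_accept_rule(0.75, 50, 15, 20))
```

With these units the break‑even point sits below the typical threshold range, so the threshold test, not the EV check, is the binding constraint.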

Re‑scope

Tighten the spec (raise C), improve inputs (raise Q), or add human checks (raise H) to lift \(\hat P(S)\) above \(T_S\).

Decline

Low P(F) or unacceptable R; we avoid moonshots without a research track.
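The three‑way policy can be sketched as a single function. The text does not quantify "low P(F)" or "unacceptable R", so the cut‑offs `min_p_f=0.5` and `max_risk=0.5` below are hypothetical placeholders, as is the function name.

```python
def decide(p_s_hat: float, p_f: float, risk: float, ev: float,
           threshold: float = 0.7,
           min_p_f: float = 0.5,    # hypothetical cut-off for "low P(F)"
           max_risk: float = 0.5    # hypothetical cut-off for "unacceptable R"
           ) -> str:
    """Return 'accept', 're-scope', or 'decline' for one candidate project."""
    # Decline first: low feasibility or unacceptable risk overrides the score.
    if p_f < min_p_f or risk > max_risk:
        return "decline"
    # Accept needs both the probability threshold and positive expected value.
    if p_s_hat >= threshold and ev > 0:
        return "accept"
    # Otherwise the task is feasible but under-scoped: raise C, Q, or H.
    return "re-scope"


print(decide(p_s_hat=0.75, p_f=0.9, risk=0.2, ev=5.0))
print(decide(p_s_hat=0.61, p_f=0.8, risk=0.3, ev=5.0))
print(decide(p_s_hat=0.90, p_f=0.3, risk=0.2, ev=5.0))
```

Checking the decline conditions first means a high score cannot rescue a project whose feasibility or risk is out of bounds.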

3) Scenarios

A) Lead Intake (Pro Services)

APIs exist, data is clean, task is constrained. Handoff to calendar + CRM.

Assume: P(F)=0.9, Q=0.85, C=0.8, H=0.6, R=0.2.

P(S|F) ≈ 0.35(0.85) + 0.35(0.8) + 0.20(0.6) − 0.10(0.2) = 0.2975 + 0.28 + 0.12 − 0.02 = 0.6775

P(S) ≈ 0.6775·0.9 + 0·0.1 = 0.6098

Re‑scope to add human review on qualification (H→0.8) and tighten prompts (C→0.9):

P(S|F) ≈ 0.35(0.85) + 0.35(0.9) + 0.20(0.8) − 0.10(0.2) = 0.2975 + 0.315 + 0.16 − 0.02 = 0.7525

P(S) ≈ 0.7525·0.9 = 0.677

B) Policy Retrieval (Healthcare)

Retrieval from payer policies with strict accuracy; HITL required.

Assume: P(F)=0.8, Q=0.7, C=0.9, H=0.9, R=0.3.

P(S|F) ≈ 0.35(0.7) + 0.35(0.9) + 0.20(0.9) − 0.10(0.3) = 0.245 + 0.315 + 0.18 − 0.03 = 0.71

P(S) ≈ 0.71·0.8 = 0.568

Tighten sources and templates (Q→0.85) and keep HITL:

P(S|F) ≈ 0.35(0.85) + 0.35(0.9) + 0.20(0.9) − 0.10(0.3) = 0.2975 + 0.315 + 0.18 − 0.03 = 0.7625

P(S) ≈ 0.7625·0.8 = 0.61

Accept if EV is positive and the project's threshold T_S is at most 0.6; otherwise split the task into narrower intents.

C) Generative Media Summaries

Loose inputs, subjective outputs; success relies on C & H.

Assume: P(F)=0.7, Q=0.6, C=0.75, H=0.7, R=0.35.

P(S|F) ≈ 0.35(0.6) + 0.35(0.75) + 0.20(0.7) − 0.10(0.35) = 0.21 + 0.2625 + 0.14 − 0.035 = 0.5775

P(S) ≈ 0.5775·0.7 = 0.404

Recommendation: re‑scope to niche formats (tight C), add checklists (raise H), or decline.
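The scenario arithmetic above can be re‑run with a short script. This is a sketch that takes the stated assumptions at face value, uses the default weights, and sets P(S|¬F) = 0; the dictionary keys and the `score` helper are names chosen here for illustration.

```python
W_Q, W_C, W_H, W_R = 0.35, 0.35, 0.20, 0.10

# (P(F), Q, C, H, R) as assumed in each scenario above.
SCENARIOS = {
    "A) Lead Intake":          (0.9, 0.85, 0.80, 0.6, 0.20),
    "A) re-scoped":            (0.9, 0.85, 0.90, 0.8, 0.20),
    "B) Policy Retrieval":     (0.8, 0.70, 0.90, 0.9, 0.30),
    "B) tightened sources":    (0.8, 0.85, 0.90, 0.9, 0.30),
    "C) Generative Summaries": (0.7, 0.60, 0.75, 0.7, 0.35),
}


def score(p_f, q, c, h, r):
    """Return (P(S|F), P(S)) with P(S|not F) taken as 0."""
    p_s_given_f = W_Q * q + W_C * c + W_H * h - W_R * r
    return p_s_given_f, p_s_given_f * p_f


results = {name: score(*params) for name, params in SCENARIOS.items()}
for name, (p_sf, p_s) in results.items():
    print(f"{name:<26} P(S|F) = {p_sf:.4f}   P(S) = {p_s:.3f}")
```

Under these assumptions no row reaches a threshold of 0.7. The highest is the re‑scoped Lead Intake case at about 0.677, which suggests that raising P(F) itself matters as much as the Q, C, and H levers.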

4) Evaluation metrics

Task‑level outcomes

Outcome	Feasible (F)	Not Feasible (¬F)
Agent succeeds	True Positive	False Positive (guardrails aim ≈ 0)
Agent fails	False Negative (re‑scope)	True Negative (decline)
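The four cells of the table can be tallied from a task log. The log below is made‑up placeholder data, and the function name is an assumption; the cell labels follow the table above.

```python
from collections import Counter


def outcome_cell(feasible: bool, succeeded: bool) -> str:
    """Map one task to its cell in the outcome table."""
    if succeeded:
        return "True Positive" if feasible else "False Positive"
    return "False Negative" if feasible else "True Negative"


# Hypothetical task log of (feasible, succeeded) pairs, for illustration only.
task_log = [(True, True), (True, True), (True, False),
            (False, False), (False, True)]

counts = Counter(outcome_cell(f, s) for f, s in task_log)

# Empirical estimate of P(S|F): successes among feasible tasks.
feasible_total = counts["True Positive"] + counts["False Negative"]
observed_p_s_given_f = counts["True Positive"] / feasible_total

print(dict(counts))
print(f"observed P(S|F) = {observed_p_s_given_f:.3f}")
```

The same counts give an empirical check on the guardrail goal: the share of infeasible tasks that land in the False Positive cell estimates P(S|¬F).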

Program KPIs

Acceptance rate (projects meeting threshold)
Observed success rate (by domain & spec tightness)
Median time‑to‑first‑value
Human review effort / task
Cost per successful task vs. baseline

5) Estimate your project’s success (interactive)

Adjust the sliders to explore how input quality (Q), clarity (C), human review (H), risk (R), and feasibility prior P(F) affect the success probability \(\hat P(S)\).

Weights

Defaults: w_Q=0.35, w_C=0.35, w_H=0.20, w_R=0.10

P(F): Feasibility prior, default 0.80
Q: Input quality, default 0.80
C: Spec clarity, default 0.85
H: Human‑in‑the‑loop, default 0.70
R: Risk/ambiguity, default 0.20

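Results (at the default settings)

P(S|F): 0.35(0.80) + 0.35(0.85) + 0.20(0.70) − 0.10(0.20) = 0.6975

\(\hat P(S)\): 0.6975·0.80 = 0.558 (taking P(S|¬F) = 0)

Decision: re‑scope, since 0.558 < T_S = 0.7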

Decision uses threshold T_S=0.7 by default. Tweak your sliders to see how scoping changes the outcome.

V: Value per success, default 50 units
C_fail: Cost of failure (penalty), default 15 units
C_build: One‑time build cost, default 20 units

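Expected Value

At the default settings, with p_S = \(\hat P(S)\) ≈ 0.558: EV = 0.558·50 − (1 − 0.558)·15 − 20 = 27.9 − 6.63 − 20 = 1.27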
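The calculator's arithmetic can be reproduced offline. The sketch below is not the page's actual script: it assumes the default weights, P(S|¬F) = 0, a clamp to [0, 1], and that the decision applies the accept rule from section 2. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SliderState:
    # Defaults copied from the slider labels in this section.
    p_f: float = 0.80
    q: float = 0.80
    c: float = 0.85
    h: float = 0.70
    r: float = 0.20
    value: float = 50.0
    cost_fail: float = 15.0
    cost_build: float = 20.0
    threshold: float = 0.70


def evaluate(s: SliderState) -> dict:
    """Compute P(S|F), P(S), EV, and whether the accept rule is met."""
    raw = 0.35 * s.q + 0.35 * s.c + 0.20 * s.h - 0.10 * s.r
    p_s_given_f = min(1.0, max(0.0, raw))   # clamp is an assumption
    p_s = p_s_given_f * s.p_f               # assumes P(S|not F) = 0
    ev = p_s * s.value - (1.0 - p_s) * s.cost_fail - s.cost_build
    return {
        "p_s_given_f": p_s_given_f,
        "p_s": p_s,
        "ev": ev,
        "meets_accept_rule": p_s >= s.threshold and ev > 0,
    }


defaults = evaluate(SliderState())
print({k: round(v, 3) if isinstance(v, float) else v for k, v in defaults.items()})

# Hypothetical re-scope: stronger feasibility evidence, tighter spec, more review.
what_if = evaluate(SliderState(p_f=0.95, c=0.95, h=0.9))
print(what_if["meets_accept_rule"])
```

At the defaults the accept rule is not met even though EV is slightly positive; the second call shows one hypothetical combination of P(F), C, and H that clears the threshold.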

6) Our promise

We only take probable use‑cases

We partner where feasibility is high and the value is clear. Otherwise, we propose a research spike or politely decline.

Transparent math

Every proposal includes our current \(\hat P(S)\), the levers to raise it (Q, C, H), risk factors R, and an EV check. If the numbers don’t add up, we won’t recommend it.