Probability & Expected Value

Quick Revisit
USE WHEN: problem asks "expected value" or "probability of X" — especially with linearity of expectation
INVARIANT: E[X₁+X₂+...] = E[X₁]+E[X₂]+... regardless of dependence (linearity of expectation)
TIME: O(n) for linearity-of-expectation reframe; O(n²) for expected-value DP | SPACE: O(n)
CLASSIC BUG: forgetting modular inverse for probability under mod — P = a/b becomes a·b⁻¹ mod p
PRACTICE: 08-ladder-mixed

Linearity of expectation, geometric distribution, expected value DP, and probability on graphs. The single most underrated toolkit at CF 1600+ -- once you see it, half the "hard math" problems become straightforward.

Difficulty: [Intermediate]Prerequisites: Modular Arithmetic, 1-D DPCF Rating Range: 1600-2200

Three modes — pick one before you write code

Contest probability problems land in one of three modes. Recognising which mode applies determines the entire solution shape:

Mode	Trigger	Tool
1. Direct probability counting	Finite uniform sample space, "what is P(event)?"	$\frac{
2. Expected value via indicators	"Expected number of [X]" with linearity of expectation	$E [\sum X_{i}] = \sum P (X_{i} = 1)$ . Pick the right indicator events; each individual probability is usually trivial.
3. Expected value DP / equations	Random walks, absorbing states, "expected steps until..."	Set up $E [s] = cost (s) + \sum p (s \to t) \cdot E [t]$ . Linear-system on cycles, simple recurrence on DAGs.

Reach for indicators before DP — many "expected value DP" problems collapse to a one-line indicator sum once you spot the right $X_{i}$ .

Modular probability — concrete example. Under modulus $p = 10^{9} + 7$ , the probability $\frac{1}{6}$ is not 1.0/6.0 and certainly not the integer 0. It is mod_inv(6) = pow(6, p-2, p) = 166666668. Verify: $6 \cdot 166666668 mod (10^{9} + 7) = 1$ . Every fractional probability in a modular problem becomes a multiplication by the modular inverse of the denominator. If you ever see (int)(prob * something) mixing floats and mod, the answer is wrong even if the test passes.

Intuition
Visual Intuition (ASCII Art)
Foundational Probability
C++ STL Reference
Implementation (Contest-Ready)
Operations Reference
Problem Patterns & Tricks
Contest Cheat Sheet
Gotchas & Debugging
Self-Test
Practice Problems
Further Reading

Intuition

Roll a fair 6-sided die until you get a 6. How many rolls does it take on average?

Naive approach: simulate $10^{6}$ trials in code and average the counts.

Trial 1:  3 5 2 6           -> 4 rolls
Trial 2:  1 6               -> 2 rolls
Trial 3:  4 4 3 2 5 1 6     -> 7 rolls
Trial 4:  6                  -> 1 roll
Trial 5:  2 3 6             -> 3 rolls
...
Average over 10^6 trials    ~= 6.0

You get approximately 6.0, but the contest asks for the exact answer as a fraction or modulo $10^{9} + 7$ . Simulation cannot prove $E = 6$ exactly, and for harder variants (e.g., "expected rolls to see all 6 faces") simulation is too slow and too imprecise.

We need a way to compute expected values symbolically, without enumerating an exponential or infinite sample space.

Linearity of expectation lets you split complex expected values into simple independent parts -- $E [X + Y] = E [X] + E [Y]$ even when $X$ and $Y$ are dependent!

Analogy: to find the expected number of heads in 100 coin flips, just add up $100 \times 0.5 = 50$ . You never need to consider all $2^{100}$ outcomes. Each flip contributes $0.5$ to the total expectation independently of every other flip. Linearity is the algebraic fact that makes this shortcut legal.

This analogy is exact -- it works for any random variables, dependent or not. The only place it breaks is when you need $E [X \cdot Y]$ (product, not sum); then independence does matter.

Visual Walkthrough

Coupon collector: there are 4 coupon types. Each purchase gives a uniformly random coupon. Expected purchases to collect all 4?

Decompose into phases using linearity of expectation:

Phase 0: You have 0 types. Next purchase gives a NEW type with prob 4/4.
Phase 1: You have 1 type. Next NEW type with prob 3/4.
Phase 2: You have 2 types. Next NEW type with prob 2/4.
Phase 3: You have 3 types. Next NEW type with prob 1/4.

Each phase is a geometric random variable -- expected $1 / p$ trials until success.

+-----------------------------------------------------------+
| Phase | Types owned | P(new) | E[purchases in phase]      |
|-------|-------------|--------|----------------------------|
|   0   |     0       |  4/4   |  4/4 = 1.00                |
|   1   |     1       |  3/4   |  4/3 = 1.33                |
|   2   |     2       |  2/4   |  4/2 = 2.00                |
|   3   |     3       |  1/4   |  4/1 = 4.00                |
+-----------------------------------------------------------+
| TOTAL (by linearity):  1 + 4/3 + 2 + 4 = 25/3 ~= 8.33   |
+-----------------------------------------------------------+

Step-by-step trace (one possible run):

Step 1: Buy coupon -> type A  (new!)    collected = {A}
Step 2: Buy coupon -> type C  (new!)    collected = {A, C}
Step 3: Buy coupon -> type A  (dup)     collected = {A, C}
Step 4: Buy coupon -> type B  (new!)    collected = {A, B, C}
Step 5: Buy coupon -> type B  (dup)     collected = {A, B, C}
Step 6: Buy coupon -> type A  (dup)     collected = {A, B, C}
Step 7: Buy coupon -> type D  (new!)    collected = {A, B, C, D}  DONE!

Brute force would enumerate all possible sequences (infinite!). Linearity of expectation gives the exact answer $25 / 3$ with zero enumeration.

COUPON COLLECTOR -- PHASE DECOMPOSITION (n=4 types):

  Phase:   ---- 0 ----   ---- 1 ----   ----- 2 -----   --------- 3 ---------
  Owned:       0 types        1 type         2 types           3 types
  P(new):       4/4            3/4            2/4               1/4
  E[tries]:     1.0            1.33           2.0               4.0

  Timeline of one possible run:
  +---+---+---+---+---+---+---+
  | A | C | A | B | B | A | D |
  +---+---+---+---+---+---+---+
   new  new dup  new dup  dup  new
   <0>  <-1->     <--2-->      <------3------>

  Total E = 1 + 4/3 + 2 + 4 = 25/3 ~ 8.33

For $n$ types: $E = n \cdot H_{n} = n (1 + 1 / 2 + 1 / 3 + \dots + 1 / n)$ , where $H_{n}$ is the $n$ -th harmonic number.

The Invariant

+------------------------------------------------------------------+
| CORE FORMULAS                                                    |
|                                                                  |
| 1. Linearity:  E[X] = sum of E[X_i] for indicator vars X_i.    |
|    (Even if X_i are dependent!)                                  |
|                                                                  |
| 2. Geometric distribution: repeated independent trials with      |
|    success probability p.  E[trials until first success] = 1/p. |
|                                                                  |
| 3. Expected value DP:                                            |
|    E[state] = sum over transitions of                            |
|               prob(t) * (cost(t) + E[next_state(t)])             |
|                                                                  |
| 4. Indicator variable trick:                                     |
|    E[count of events] = sum of P(event_i)                        |
+------------------------------------------------------------------+

Look at the coupon collector walkthrough: each phase is an independent geometric variable with success probability $(n - k) / n$ . The total expectation is the sum of individual phase expectations (linearity). The invariant connects all three formulas.

When to Reach for This

Trigger phrases:

"expected value" / "expected number of steps"
"probability that..." / "what is the probability"
"random process" / "each step independently"
"answer modulo $10^{9} + 7$ " combined with any of the above (signals modular inverse for probability)

Counter-examples:

"Count the number of ways" without "expected" -> pure combinatorics, not expected value. See Combinatorics Basics.
"Minimize the expected cost where you choose actions" -> this is optimization/game theory, not pure probability. You may still use E-value DP, but the recurrence has a $min$ or $max$ over choices.
Deterministic simulation of a process -> no randomness, no probability.

What the Code Won't Teach You

The DP fills a table and produces a number. What it hides:

Linearity of expectation does not require independence. This is the single most powerful -- and most counterintuitive -- fact in competitive-programming probability. $E [X + Y] = E [X] + E [Y]$ holds for any random variables, even when $X$ and $Y$ are correlated, dependent, or defined on completely different events. The proof is one line of algebra on the definition of expectation. Once you trust it, "expected number of inversions in a random permutation" becomes a two-line solution instead of an intractable sum over $n!$ outcomes.

Indicator decomposition is the meta-skill. Any "expected count" problem decomposes: let $X_{i} = 1$ if event $i$ occurs, then $E [\sum X_{i}] = \sum P (X_{i} = 1)$ . The hard part is choosing the right indicator events so that each $P (X_{i})$ is easy to compute. The code computes a number; the human chooses the decomposition.

INDICATOR DECOMPOSITION -- THE PATTERN:

  Problem: "Expected # of elements in correct position in random permutation"

  Step 1: Define X_i = 1 if sigma(i) = i   (indicator for fixed point)
  Step 2: E[X_i] = P(sigma(i) = i) = 1/n   (by symmetry)
  Step 3: E[total] = Sum E[X_i] = n * (1/n) = 1

  +-----------------------------------------------------+
  |  No need to enumerate permutations.                  |
  |  No need to track dependencies between X_i and X_j.  |
  |  Linearity handles everything.                       |
  +-----------------------------------------------------+

Visual Intuition (ASCII Art)

Diagram 1 -- Indicator variable decomposition for "expected number of matching pairs" in a random permutation of [1..4]:

Permutation: [2, 1, 4, 3]

Pair positions:    (1,2)  (1,3)  (1,4)  (2,3)  (2,4)  (3,4)
Match? (a_i=i):     N      N      N      N      N      N

X_i = 1 if position i has a_i = i (a "fixed point")
E[# fixed points] = sum E[X_i] = 4 * (1/4) = 1

+-----+-----+-----+-----+
| pos |  1  |  2  |  3  |  4  |
|-----|-----|-----|-----|-----|
| val |  2  |  1  |  4  |  3  |
| X_i |  0  |  0  |  0  |  0  |
+-----+-----+-----+-----+-----+

P(X_i = 1) = 1/4  for each i   (by symmetry)
E[total] = 4 * 1/4 = 1         (by linearity)

Diagram 2 -- Expected value DP: random walk on a line, 0 to $n$ .

Start at 0. Each step: move right with prob $p$ , left with prob $1 - p$ . Expected steps to reach $n$ ? (Absorbing barrier at $n$ , reflecting at 0.)

State diagram (n=3):

     1-p        1-p        1-p
  +------+   +------+   +------+
  |      v   |      v   |      v
  |  +---+---+---+---+---+---+---+
  +--| 0 |-->| 1 |-->| 2 |-->| 3 |  (absorbing)
     +---+ p +---+ p +---+ p +---+
       ^       |       |
       +-------+       |
        reflect        |
         at 0          |
                       |
    E[0] = 1 + p*E[1] + (1-p)*E[0]    (self-loop at 0)
    E[1] = 1 + p*E[2] + (1-p)*E[0]
    E[2] = 1 + p*E[3] + (1-p)*E[1]
    E[3] = 0                            (target)

Solve the linear system for $E [0]$ . For $p = 1 / 2, n = 3$ : $E [0] = 9$ .

Foundational Probability

The sections above establish linearity of expectation and the geometric distribution -- the two workhorses of CP probability. But many contest problems (and virtually all quant/HFT interviews) require sharper tools: Bayes' theorem, named distributions, variance, and simulation. This section fills those gaps.

Basic Axioms & Rules

A sample space $Ω$ is the set of all possible outcomes of an experiment. An event $A$ is a subset of $Ω$ . A probability function maps events to $[0, 1]$ .

Three axioms (everything else derives from these):

$P (A) \geq 0$ for every event $A$ .
$P (Ω) = 1$ .
For mutually exclusive events $A, B$ : $P (A \cup B) = P (A) + P (B)$ .

Derived rules you'll use constantly:

P (A \cup B) = P (A) + P (B) - P (A \cap B)

P (\bar{A}) = 1 - P (A)

Complementary counting is the single most useful probability trick at CF 1100-1600: when computing $P (A)$ directly is hard, compute $1 - P (\bar{A})$ instead.

Worked example -- two dice:

Roll two fair 6-sided dice. What is $P (sum \geq 10)$ ?

All 36 equally likely outcomes (die1, die2):

  Sum >= 10 outcomes:
    (4,6) (5,5) (5,6) (6,4) (6,5) (6,6)  -->  6 outcomes

  Direct:      P = 6/36 = 1/6
  Complement:  P(sum < 10) = 30/36,  so P(sum >= 10) = 1 - 30/36 = 1/6

Inclusion-exclusion generalizes to more events:

P (A \cup B \cup C) = P (A) + P (B) + P (C) - P (A \cap B) - P (A \cap C) - P (B \cap C) + P (A \cap B \cap C)

+-------------------------------------------------------+
| INCLUSION-EXCLUSION (two events, Venn diagram)        |
|                                                       |
|      +----------+                                     |
|      |    A     |                                     |
|      |     +----+------+                              |
|      |     |  A n B    |                              |
|      +-----+----+   B  |                              |
|            |    +------+                              |
|                                                       |
|  P(A u B) = P(A) + P(B) - P(A n B)                   |
+-------------------------------------------------------+

Conditional Probability & Bayes' Theorem

Conditional probability -- the probability of $A$ given that $B$ occurred:

P (A ∣ B) = \frac{P (A \cap B)}{P (B)}

This is the foundation of all "given that..." reasoning.

Multiplication rule (rearranging):

P (A \cap B) = P (A ∣ B) \cdot P (B) = P (B ∣ A) \cdot P (A)

Bayes' theorem reverses the conditioning direction:

P (A ∣ B) = \frac{P (B ∣ A) \cdot P (A)}{P (B)}

The denominator expands via the law of total probability:

P (B) = P (B ∣ A) \cdot P (A) + P (B ∣ \bar{A}) \cdot P (\bar{A})

Example 1 -- Disease testing (classic quant interview):

A disease affects 1 in 1000 people. A test is 99% accurate (99% sensitivity, 99% specificity). You test positive. What is $P (disease ∣ +)$ ?

Let D = "has disease",  T+ = "tests positive"

Given:
  P(D)      = 0.001          (base rate / prior)
  P(T+|D)   = 0.99           (sensitivity)
  P(T+|~D)  = 0.01           (false positive rate)

Compute P(T+):
  P(T+) = P(T+|D)*P(D) + P(T+|~D)*P(~D)
        = 0.99 * 0.001 + 0.01 * 0.999
        = 0.00099 + 0.00999
        = 0.01098

Apply Bayes:
  P(D|T+) = P(T+|D)*P(D) / P(T+)
          = 0.00099 / 0.01098
          = 0.0902  (~9%)

  SURPRISE: 91% chance you DON'T have it despite testing positive.
  The low base rate dominates.

Visualize with natural frequencies (often clearer than formulas):

+------------------------------------------------------------+
|  Population: 100,000 people                                |
|                                                            |
|  +----------+    +--------------------------------------+  |
|  | Diseased |    | Healthy                              |  |
|  |   100    |    | 99,900                               |  |
|  |          |    |                                      |  |
|  | Test+:99 |    | Test+: 999      Test-: 98,901       |  |
|  | Test-: 1 |    |                                      |  |
|  +----------+    +--------------------------------------+  |
|                                                            |
|  P(D | T+) = 99 / (99 + 999) = 99/1098 ~ 9.0%            |
+------------------------------------------------------------+

Example 2 -- Monty Hall problem:

Three doors: one car, two goats. You pick door 1. The host (who knows what's behind each door) opens door 3, revealing a goat. Should you switch?

Initial setup:
  Door 1     Door 2     Door 3
  (You)      (???)      (Host opens: goat)

Case analysis (car equally likely behind any door):

  Car behind 1: Host opens 2 or 3 (say 3). Switch -> LOSE.  P = 1/3
  Car behind 2: Host MUST open 3.          Switch -> WIN.   P = 1/3
  Car behind 3: Host MUST open 2 (not 3).  (Contradicts host opening 3.)

  Given host opened 3:
    P(car = 1 | host opens 3) = 1/3
    P(car = 2 | host opens 3) = 2/3

  SWITCH. Switching wins 2/3 of the time.

The Bayes calculation:

P (car = 2 ∣ host opens 3) = \frac{P (host opens 3 ∣ car = 2) \cdot P (car = 2)}{P (host opens 3)} = \frac{1 \cdot 1 / 3}{1 / 2} = \frac{2}{3}

Why Bayes matters for CP: any problem where you observe partial information and must update beliefs. "Given that event $B$ happened, what is $P (A)$ ?" -- that's conditional probability. CF problems with partially revealed randomness often need this reasoning.

cpp

#include <bits/stdc++.h>
using namespace std;

// Bayes update: given prior[i] and likelihood[i] = P(evidence | hypothesis i),
// return posterior[i] = P(hypothesis i | evidence).
vector<double> bayes_update(vector<double>& prior, vector<double>& likelihood) {
    int n = prior.size();
    double evidence = 0;
    for (int i = 0; i < n; i++) evidence += prior[i] * likelihood[i];
    vector<double> posterior(n);
    for (int i = 0; i < n; i++)
        posterior[i] = prior[i] * likelihood[i] / evidence;
    return posterior;
}

Independence: events $A$ and $B$ are independent iff $P (A \cap B) = P (A) \cdot P (B)$ , equivalently $P (A ∣ B) = P (A)$ . Independent events give no information about each other. Don't confuse with mutually exclusive (disjoint), which is the opposite -- if $A$ happened, $B$ definitely didn't.

Discrete Distributions

A probability distribution describes the behavior of a random variable. These named distributions appear repeatedly in CP and quant interviews -- recognize them instantly.

Bernoulli( $p$ )

A single trial: success with probability $p$ , failure with probability $1 - p$ .

P (X = 1) = p, P (X = 0) = 1 - p

E [X] = p, Var (X) = p (1 - p)

The building block for everything else. An indicator random variable is Bernoulli.

cpp

#include <bits/stdc++.h>
using namespace std;
// Bernoulli trial: returns 1 with probability p
int bernoulli(mt19937& rng, double p) {
    return uniform_real_distribution<double>(0, 1)(rng) < p ? 1 : 0;
}

Binomial( $n, p$ )

Number of successes in $n$ independent Bernoulli( $p$ ) trials.

P (X = k) = (\binom{n}{k}) p^{k} (1 - p)^{n - k}

E [X] = n p, Var (X) = n p (1 - p)

When it appears: "flip $n$ coins with bias $p$ , probability of exactly $k$ heads?" Voting models, reliability problems, any "count successes in fixed trials" setup.

Binomial(10, 0.5) PMF:

  P(k)
  0.25 +          *
       |        *   *
  0.20 +       *     *
       |      *       *
  0.15 +
       |    *           *
  0.10 +
       |   *             *
  0.05 +  *               *
       | *                 *
  0.00 +*                   *
       +--+--+--+--+--+--+--+--+--+--+--
       0  1  2  3  4  5  6  7  8  9  10  k

cpp

#include <bits/stdc++.h>
using namespace std;
// P(X = k) for Binomial(n, p) -- use log to avoid overflow
double binom_pmf(int n, int k, double p) {
    double log_prob = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + k * log(p) + (n - k) * log(1 - p);
    return exp(log_prob);
}

For CP mod-prime problems, compute $(\binom{n}{k})$ with precomputed factorials and modular inverse, then multiply by $p^{k} \cdot (1 - p)^{n - k}$ in modular arithmetic.

Geometric( $p$ )

Number of independent trials until the first success (inclusive).

P (X = k) = (1 - p)^{k - 1} \cdot p for k = 1, 2, 3, \dots

E [X] = \frac{1}{p}, Var (X) = \frac{1 - p}{p^{2}}

This is the distribution behind the coupon collector phases in Section 2 above -- each phase is geometric with success probability $(n - k) / n$ .

Memoryless property: $P (X > s + t ∣ X > s) = P (X > t)$ . If you've already failed $s$ times, the expected remaining trials is still $1 / p$ . Past failures don't help. This is the only discrete memoryless distribution.

Geometric(p=0.3) -- probability of first success on trial k:

  k:     1     2     3     4     5     6     7    ...
  P:  0.300 0.210 0.147 0.103 0.072 0.050 0.035  ...
       ***
       ***  ***
       ***  ***  **
       ***  ***  **   *
       ***  ***  **   *    *    *

Poisson( $λ$ )

Number of events in a fixed interval when events occur independently at rate $λ$ . Useful when $n$ is large, $p$ is small, and $n p \approx λ$ .

P (X = k) = \frac{λ^{k} e^{- λ}}{k!}

E [X] = λ, Var (X) = λ

When it appears: modeling arrivals (network packets, server requests -- quant interview staple), approximating Binomial( $n, p$ ) when $n$ is huge and $p$ is tiny.

Key property: sum of independent Poisson variables is Poisson: $X \sim Poisson (λ_{1}), Y \sim Poisson (λ_{2}) \Rightarrow X + Y \sim Poisson (λ_{1} + λ_{2})$ .

cpp

#include <bits/stdc++.h>
using namespace std;
// P(X = k) for Poisson(lambda)
double poisson_pmf(double lambda, int k) {
    return exp(k * log(lambda) - lambda - lgamma(k + 1));
}

Hypergeometric

Drawing without replacement. An urn has $N$ balls total: $K$ red, $N - K$ blue. Draw $n$ balls without replacement. $X$ = number of red balls drawn.

P (X = k) = \frac{(\binom{K}{k}) (\binom{N - K}{n - k})}{(\binom{N}{n})}

E [X] = \frac{n K}{N}, Var (X) = n \cdot \frac{K}{N} \cdot \frac{N - K}{N} \cdot \frac{N - n}{N - 1}

When it appears: card-drawing problems, lottery calculations, sampling without replacement. Key difference from binomial: draws are dependent because there's no replacement. As $N \to \infty$ with $K / N = p$ fixed, hypergeometric $\to$ binomial.

+-----------------------------------------------------------+
| DISCRETE DISTRIBUTIONS -- QUICK REFERENCE                  |
|-----------------------------------------------------------|
| Name            | E[X]   | Var(X)       | Key property    |
|-----------------|--------|--------------|-----------------|
| Bernoulli(p)    | p      | p(1-p)       | single trial    |
| Binomial(n,p)   | np     | np(1-p)      | n ind. trials   |
| Geometric(p)    | 1/p    | (1-p)/p^2    | memoryless      |
| Poisson(lam)    | lam    | lam          | rare events     |
| Hypergeometric  | nK/N   | (see above)  | no replacement  |
+-----------------------------------------------------------+

Continuous Distributions (Brief, for Interviews)

These rarely appear in CP (which deals with discrete structures) but are quant/HFT interview staples. Know the formulas and key properties.

Uniform( $a, b$ )

All values in $[a, b]$ equally likely.

f (x) = \frac{1}{b - a}, E [X] = \frac{a + b}{2}, Var (X) = \frac{(b - a)^{2}}{12}

Classic interview problem: "Two people agree to meet between 12:00 and 1:00. Each arrives at a uniformly random time and waits 15 minutes. What is the probability they meet?"

Answer: $P = 1 - (3 / 4)^{2} = 7 / 16 = 0.4375$ . (Area of the meeting region in the unit square minus the two corner triangles.)

Exponential( $λ$ )

Time until the next event in a Poisson process.

f (x) = λ e^{- λ x} for x \geq 0, E [X] = \frac{1}{λ}, Var (X) = \frac{1}{λ^{2}}

Memoryless property (continuous analog of geometric): $P (X > s + t ∣ X > s) = P (X > t)$ . "Given you've waited $s$ minutes, the expected additional wait is still $1 / λ$ ."

CDF: $P (X \leq t) = 1 - e^{- λ t}$ . Use this to compute "probability of event within time $t$ ."

Normal / Gaussian( $μ, σ^{2}$ )

The most important continuous distribution, thanks to the Central Limit Theorem.

f (x) = \frac{1}{σ \sqrt{2 π}} \exp (- \frac{(x - μ)^{2}}{2 σ^{2}})

E [X] = μ, Var (X) = σ^{2}

68-95-99.7 rule:

                      99.7%  (within 3 sigma)
                |<------------------------------>|
                |    95%  (within 2 sigma)        |
                |  |<---------------------->|     |
                |  |   68%  (within 1 sigma)|     |
                |  |  |<-------------->|    |     |
                |  |  |                |    |     |
           -----+--+--+-------*-------+--+-+-----+-----
              mu-3s  mu-s     mu    mu+s  mu+2s  mu+3s

Central Limit Theorem (CLT): the average of $n$ independent random variables (with finite mean $μ$ and variance $σ^{2}$ ) converges to a normal distribution as $n \to \infty$ :

\frac{\bar{X} - μ}{σ / \sqrt{n}} \overset{d}{\to} N (0, 1)

Interview application: "Flip 10000 fair coins. Approximate $P (heads > 5100)$ ." Use CLT: $μ = 5000$ , $σ = \sqrt{2500} = 50$ , $z = (5100 - 5000) / 50 = 2$ . $P (Z > 2) \approx 0.0228$ (from standard normal table / 68-95-99.7 rule: about $2.5 %$ in the upper tail beyond $2 σ$ ).

+-----------------------------------------------------------+
| CONTINUOUS DISTRIBUTIONS -- QUICK REFERENCE                |
|-----------------------------------------------------------|
| Name             | E[X]     | Var(X)       | Key property |
|------------------|----------|--------------|--------------|
| Uniform(a,b)     | (a+b)/2  | (b-a)^2/12   | flat PDF     |
| Exponential(lam) | 1/lam    | 1/lam^2      | memoryless   |
| Normal(mu,s^2)   | mu       | s^2          | CLT limit    |
+-----------------------------------------------------------+

Variance & Covariance

Variance measures how spread out a distribution is. You need it when problems ask for "the variance of ..." or when analyzing the reliability of Monte Carlo estimates.

Computational formula (almost always easier than the definition):

Var (X) = E [X^{2}] - (E [X])^{2}

The definition $Var (X) = E [(X - E [X])^{2}]$ is equivalent but harder to work with in practice. The computational formula lets you track $E [X]$ and $E [X^{2}]$ separately.

Scaling:

Var (a X + b) = a^{2} \cdot Var (X)

Adding a constant shifts the distribution but doesn't change its spread.

Variance of a sum:

Var (X + Y) = Var (X) + Var (Y) + 2 \cdot Cov (X, Y)

Covariance:

Cov (X, Y) = E [X Y] - E [X] \cdot E [Y]

If $X, Y$ are independent, then $Cov (X, Y) = 0$ , giving the simpler:

Var (X + Y) = Var (X) + Var (Y) (independent only!)

For $n$ independent variables: $Var (\sum_{i = 1}^{n} X_{i}) = \sum_{i = 1}^{n} Var (X_{i})$ .

Worked example -- sum of two dice:

Let S = D_1 + D_2 (two independent fair dice).

E[D_i]   = (1+2+3+4+5+6)/6 = 3.5
E[D_i^2] = (1+4+9+16+25+36)/6 = 91/6
Var(D_i) = 91/6 - (3.5)^2 = 91/6 - 49/4 = 35/12 ~ 2.917

D_1, D_2 independent:
  Var(S) = Var(D_1) + Var(D_2) = 35/12 + 35/12 = 35/6 ~ 5.833
  Std dev = sqrt(35/6) ~ 2.415

When variance matters in CP: some problems ask for the variance or standard deviation of a random quantity (not just its expectation). Approach: compute $E [X]$ and $E [X^{2}]$ separately using linearity, then subtract. Both $E [X]$ and $E [X^{2}]$ can use indicator decomposition.

cpp

#include <bits/stdc++.h>
using namespace std;

// Compute E[X] and Var(X) from samples or a PMF
// pmf[i] = {value, probability}
pair<double, double> mean_variance(vector<pair<double, double>>& pmf) {
    double ex = 0, ex2 = 0;
    for (auto& [val, prob] : pmf) {
        ex += val * prob;
        ex2 += val * val * prob;
    }
    return {ex, ex2 - ex * ex};
}

Covariance in practice: when random variables are not independent (e.g., two indicator variables for overlapping events), you must compute covariance explicitly. For indicators $X_{i}, X_{j} \in {0, 1}$ :

Cov (X_{i}, X_{j}) = P (X_{i} = 1 and X_{j} = 1) - P (X_{i} = 1) \cdot P (X_{j} = 1)

Classic Probability Problems (Worked Solutions)

These appear in CP contests, quant interviews, and mathematical olympiads. Recognize the pattern, apply the formula.

Coupon Collector

(Detailed walkthrough in Section 2 above.) For $n$ types:

E = n \cdot H_{n} = n (1 + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n}) \approx n \ln n + 0.577 n

The variance is:

Var = \sum_{k = 1}^{n} \frac{n^{2}}{k^{2}} - n \cdot H_{n} \leq \frac{π^{2}}{6} \cdot n^{2}

For $n = 6$ (dice faces): $E = 14.7$ , $Var \approx 38.4$ .

Birthday Paradox

In a group of $k$ people with birthdays uniform over 365 days, the probability of at least one shared birthday:

P (collision) = 1 - \prod_{i = 0}^{k - 1} (1 - \frac{i}{365})

For $k = 23$ : $P \approx 0.507$ -- just over 50%.

Application in CS: hash collision analysis. With a hash space of size $N$ , expect a collision after $\sim \sqrt{π N / 2} \approx 1.18 \sqrt{N}$ random insertions. For 64-bit hashes ( $N = 2^{64}$ ): collision expected after $\sim 5 \times 10^{9}$ elements.

+----------------------------------------------+
|  Birthday paradox: P(collision) vs group size |
|                                               |
|  P  1.0 +                         ****       |
|     0.8 +                    *****           |
|     0.6 +               ****                |
|     0.5 +- - - - - -* - - - - - - -          |
|     0.4 +          *                          |
|     0.2 +       **                            |
|     0.0 +****                                 |
|         +--+--+--+--+--+--+--+--+--> k       |
|         0  5  10 15 20 25 30 40 50            |
+----------------------------------------------+

cpp

#include <bits/stdc++.h>
using namespace std;
// Minimum group size k for P(collision) >= threshold in space of size n
int birthday_bound(long long n, double threshold = 0.5) {
    double p_no_collision = 1.0;
    for (int k = 1; ; k++) {
        p_no_collision *= (double)(n - k + 1) / n;
        if (1.0 - p_no_collision >= threshold) return k;
    }
}

Gambler's Ruin

A gambler starts with $$ a$ . Each round: wins $$ 1$ with probability $p$ or loses $$ 1$ with probability $q = 1 - p$ . Game ends at $$ 0$ (ruin) or $$ N$ (goal).

Probability of ruin starting from $a$ :

P (ruin from a) = {\begin{cases} \frac{(q / p)^{a} - (q / p)^{N}}{1 - (q / p)^{N}} & if p \neq q \\ 1 - \frac{a}{N} & if p = q = 0.5 \end{cases}

Expected duration (for $p = q = 0.5$ ): $E [steps] = a \cdot (N - a)$ .

Fortune over time (random walk, absorbing barriers at 0 and N):

  N +                                   * (WIN)
    |                     *   *       *
    |                 *  * * * *  *  *
    |            *   * *       * * *
  a + - - - * * * *                      <-- start
    |      *
    |   * *
    | **
  0 +*                                   (RUIN)
    +--+--+--+--+--+--+--+--+--+--+--> rounds

When it appears: random walk problems, stopping time analysis, competitive betting models in quant interviews. Also connects to the expected value DP on random walks (Section 3 diagram above).

Ballot Problem

Candidate A gets $a$ votes, candidate B gets $b$ votes, with $a > b$ . If votes are counted in a uniformly random order, the probability that A is strictly ahead throughout the entire counting is:

P (A always ahead) = \frac{a - b}{a + b}

This connects to Catalan numbers and lattice path counting. For $a = 3, b = 1$ : $P = 2 / 4 = 1 / 2$ .

Generalization (Cycle Lemma): the number of cyclic permutations of a sequence of $+ 1$ 's and $- 1$ 's (with sum $> 0$ ) that have all positive partial sums is exactly $(a - b) / (a + b)$ of the total.

Dice Puzzles

Expected rolls to see all 6 faces -- coupon collector with $n = 6$ :

E = 6 (1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6}) = \frac{147}{10} = 14.7

Expected value of the max of $k$ independent Uniform $(0, 1)$ variables:

E [max (X_{1}, \dots, X_{k})] = \frac{k}{k + 1}

Proof: CDF of the max is $F (x) = x^{k}$ , so $E [max] = \int_{0}^{1} (1 - x^{k}) d x = 1 - \frac{1}{k + 1}$ .

Common quant interview question: "I draw $n$ numbers uniformly from $[0, 1]$ . What is the expected value of the largest?" Answer: $n / (n + 1)$ .

Expected number of rolls of a die until sum exceeds $N$ : use DP exactly as in Section 5.3 (implementation section below). $E [s] = 1 + \frac{1}{6} \sum_{f = 1}^{6} E [min (s + f, N)]$ , computed backwards.

Monte Carlo Simulation

When exact computation is intractable, simulate. The Law of Large Numbers guarantees that the sample average converges to the true expectation as the number of trials $\to \infty$ .

When to use Monte Carlo:

Verifying your exact formula on small inputs (stress testing).
Getting approximate answers when no closed form exists.
Computing integrals numerically (e.g., estimating $π$ ).

Basic template:

cpp

#include <bits/stdc++.h>
using namespace std;

int main() {
    mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
    uniform_real_distribution<double> real01(0.0, 1.0);
    uniform_int_distribution<int> die(1, 6);

    const int TRIALS = 1'000'000;
    int hits = 0;
    for (int i = 0; i < TRIALS; i++) {
        // simulate one trial
        int s = die(rng) + die(rng) + die(rng);
        if (s > 15) hits++;
    }
    double prob = (double)hits / TRIALS;
    printf("Estimated P(3 dice sum > 15): %.6f\n", prob);
    // Exact: 10/216 = 0.046296...
}

Estimating $π$ : drop random points in the unit square $[0, 1]^{2}$ . Fraction that land inside the quarter-circle $x^{2} + y^{2} \leq 1$ is $\approx π / 4$ .

+---+---+---+---+---+
|   .   . * .       |  1.0
|     *   *   . *   |
|   *   * * .   *   |
| * . *   *   *     |
|   *   *   .       |  0.5
| . * *       .     |
|   *   .           |
| *       .         |
+---+---+---+---+---+
0.0               1.0

  *  = inside quarter-circle (x^2 + y^2 <= 1)
  .  = outside quarter-circle
  pi ~ 4 * (count of *) / (total points)

cpp

#include <bits/stdc++.h>
using namespace std;
int main() {
    mt19937 rng(42);
    uniform_real_distribution<double> dist(0.0, 1.0);
    int inside = 0, total = 1'000'000;
    for (int i = 0; i < total; i++) {
        double x = dist(rng), y = dist(rng);
        if (x * x + y * y <= 1.0) inside++;
    }
    printf("pi ~ %.6f\n", 4.0 * inside / total);  // ~3.1416
}

Stress testing: when you derive an exact formula, always verify with Monte Carlo on small inputs before submitting.

cpp

#include <bits/stdc++.h>
using namespace std;

// Verify: expected fixed points in a random permutation = 1.0
int main() {
    mt19937 rng(42);
    int n = 10;
    const int TRIALS = 1'000'000;
    long long total_fixed = 0;
    for (int t = 0; t < TRIALS; t++) {
        vector<int> perm(n);
        iota(perm.begin(), perm.end(), 0);
        shuffle(perm.begin(), perm.end(), rng);
        for (int i = 0; i < n; i++)
            if (perm[i] == i) total_fixed++;
    }
    printf("E[fixed points] ~ %.4f (exact: 1.0000)\n",
           (double)total_fixed / TRIALS);
}

+-------------------------------------------------------------------+
| MONTE CARLO PRECISION GUIDE                                       |
|-------------------------------------------------------------------|
| Trials      | Typical error (p ~ 0.5) | Good for...               |
|-------------|-------------------------|---------------------------|
| 10^4        | ~0.005                  | Quick sanity check         |
| 10^5        | ~0.0016                 | 3-digit accuracy           |
| 10^6        | ~0.0005                 | Most stress tests          |
| 10^7        | ~0.00016                | High-precision verify      |
|-------------------------------------------------------------------|
| Error ~ 1/sqrt(TRIALS).  To halve the error, quadruple trials.   |
+-------------------------------------------------------------------+

C++ STL Reference

Function / Class	Header	What it does	Time
`mt19937 rng(seed)`	`<random>`	Mersenne Twister PRNG, 32-bit	$O (1)$ per call
`uniform_int_distribution<>(a,b)`	`<random>`	Uniform random int in $[a, b]$	$O (1)$
`uniform_real_distribution<>(a,b)`	`<random>`	Uniform random double in $[a, b)$	$O (1)$
`__gcd(a, b)` / `gcd(a,b)`	`<algorithm>` / `<numeric>`	GCD (needed for fraction simplification)	$O (\log min (a, b))$
`__int128`	built-in	128-bit int for modular arithmetic overflow avoidance	--

For contest probability problems, you rarely use <random> directly -- the key operations are modular inverse and DP. The STL's main role is providing gcd and fast I/O.

Implementation (Contest-Ready)

5.1 Minimal: Modular inverse for probability mod prime

cpp

#include <bits/stdc++.h>
using namespace std;

const long long MOD = 1e9 + 7;

long long power(long long base, long long exp, long long mod) {
    long long result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

long long inv(long long a, long long mod = MOD) {
    return power(a, mod - 2, mod);
}

// Probability p/q mod MOD = p * inv(q) mod MOD
long long prob_mod(long long p, long long q) {
    return p % MOD * inv(q) % MOD;
}

int main() {
    // Coupon collector with n types: E = n * H_n
    int n;
    cin >> n;
    long long ans = 0;
    for (int k = 1; k <= n; k++)
        ans = (ans + prob_mod(n, k)) % MOD;
    cout << ans << "\n";
}

5.2 Minimal: Expected value DP

cpp

#include <bits/stdc++.h>
using namespace std;

const long long MOD = 1e9 + 7;

long long power(long long b, long long e, long long m) {
    long long r = 1; b %= m;
    while (e > 0) { if (e & 1) r = r * b % m; b = b * b % m; e >>= 1; }
    return r;
}
long long inv(long long a) { return power(a, MOD - 2, MOD); }

int main() {
    int n;
    cin >> n;
    vector<vector<pair<int,long long>>> adj(n); // adj[u] = {(v, prob_numerator)}
    // Read graph / transitions ...
    // E[target] = 0, solve backwards (topological or Gaussian)
    vector<long long> E(n, 0);
    // Example: DAG in reverse topological order
    // E[u] = sum over (v, p): p/total * (1 + E[v])
    cout << E[0] << "\n";
}

5.3 Explained: Expected value DP for die rolls

cpp

#include <bits/stdc++.h>
using namespace std;

const long long MOD = 1e9 + 7;

long long power(long long b, long long e, long long m) {
    long long r = 1; b %= m;
    while (e > 0) { if (e & 1) r = r * b % m; b = b * b % m; e >>= 1; }
    return r;
}
long long inv(long long a) { return power(a, MOD - 2, MOD); }

int main() {
    // Problem: roll a 6-sided die. You start with score 0.
    // Each roll adds the face value to your score.
    // Expected number of rolls to reach score >= n?
    //
    // E[s] = expected rolls to reach >= n starting from score s.
    // E[s] = 0 if s >= n.
    // E[s] = 1 + (1/6) * sum_{f=1}^{6} E[s + f]
    //
    // Compute backwards from s = n-1 down to 0.

    int n;
    cin >> n;

    vector<long long> E(n + 7, 0); // E[s] = 0 for s >= n

    long long inv6 = inv(6);

    for (int s = n - 1; s >= 0; s--) {
        // E[s] = 1 + (1/6) * sum_{f=1..6} E[s+f]
        long long total = 0;
        for (int f = 1; f <= 6; f++) {
            int ns = s + f;
            if (ns >= n)
                total = (total + 0) % MOD; // E[ns] = 0
            else
                total = (total + E[ns]) % MOD;
        }
        // E[s] = 1 + total * inv(6)
        E[s] = (1 + total % MOD * inv6 % MOD) % MOD;
    }

    cout << E[0] << "\n";
}

5.4 Gaussian elimination for E-value on graphs with cycles

cpp

#include <bits/stdc++.h>
using namespace std;

// Solve system of linear equations over doubles.
// a[i] is row i with n+1 columns (augmented matrix).
// Returns solution vector x, or empty if no unique solution.
vector<double> gauss(vector<vector<double>>& a) {
    int n = a.size();
    for (int col = 0; col < n; col++) {
        int pivot = -1;
        for (int row = col; row < n; row++)
            if (abs(a[row][col]) > 1e-9) { pivot = row; break; }
        if (pivot == -1) return {}; // singular
        swap(a[col], a[pivot]);
        double div = a[col][col];
        for (int j = col; j <= n; j++) a[col][j] /= div;
        for (int row = 0; row < n; row++) {
            if (row == col) continue;
            double factor = a[row][col];
            for (int j = col; j <= n; j++)
                a[row][j] -= factor * a[col][j];
        }
    }
    vector<double> x(n);
    for (int i = 0; i < n; i++) x[i] = a[i][n];
    return x;
}

int main() {
    // E[i] = 1 + (1/deg(i)) * sum_{j in adj(i)} E[j]
    // E[target] = 0
    // Rearrange: E[i] - (1/deg(i)) * sum E[j] = 1
    // Build augmented matrix and solve.

    int n, m, target;
    cin >> n >> m >> target;
    vector<vector<int>> adj(n);
    for (int i = 0; i < m; i++) {
        int u, v; cin >> u >> v;
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    // n equations, n unknowns
    vector<vector<double>> eq(n, vector<double>(n + 1, 0.0));
    for (int i = 0; i < n; i++) {
        if (i == target) {
            eq[i][i] = 1.0; // E[target] = 0
            eq[i][n] = 0.0;
        } else {
            int deg = adj[i].size();
            eq[i][i] = 1.0;
            for (int j : adj[i])
                eq[i][j] -= 1.0 / deg;
            eq[i][n] = 1.0;
        }
    }

    auto ans = gauss(eq);
    if (ans.empty()) cout << "-1\n";
    else cout << fixed << setprecision(9) << ans[0] << "\n";
}

Operations Reference

Operation	Time	Space
Modular inverse (Fermat)	$O (\log p)$	$O (1)$
Precompute inverse table $1. . n$	$O (n)$	$O (n)$
Expected value DP (DAG, $n$ states, $m$ transitions)	$O (n + m)$	$O (n)$
Gaussian elimination ( $n$ unknowns)	$O (n^{3})$	$O (n^{2})$
Coupon collector ( $n$ types)	$O (n)$	$O (1)$
Linearity of expectation (sum of $n$ indicators)	$O (n)$	$O (1)$
Matrix exponentiation for Markov chain ( $k$ steps, $n$ states)	$O (n^{3} \log k)$	$O (n^{2})$

Problem Patterns & Tricks

Pattern 1: Indicator Variable Decomposition

Instead of computing $E [complex quantity]$ directly, write it as a sum of indicator random variables: $X = \sum X_{i}$ where $X_{i} \in {0, 1}$ . Then $E [X] = \sum P (X_{i} = 1)$ .

Example: Expected number of inversions in a random permutation of $[1. . n]$ . There are $(\binom{n}{2})$ pairs. Each pair $(i, j)$ is inverted with probability $1 / 2$ by symmetry. So $E = (\binom{n}{2}) / 2 = n (n - 1) / 4$ .

CF 1Expected -- CF 1042B, CF 1392D

Pattern 2: Coupon Collector / Geometric Phases

Decompose a "collect all" process into phases. Phase $k$ : you have $k$ items, need a new one, probability of new item per trial is $(n - k) / n$ . Expected trials in phase $k$ : $n / (n - k)$ . Total: $n \cdot H_{n}$ .

E = sum_{k=0}^{n-1} n / (n - k) = n * (1/1 + 1/2 + ... + 1/n)

CF 1Expected -- CF 1Expected280C, AtCoder ABC 242E

Pattern 3: Expected Value DP on DAGs

Define $E [s]$ = expected cost from state $s$ to target. Base case: $E [target] = 0$ . Transition: $E [s] = cost (s) + \sum_{t} p (s \to t) \cdot E [t]$ .

If the state graph is a DAG, compute in reverse topological order. No linear system needed.

cpp

// Reverse topo order
for (int u : rev_topo) {
    E[u] = 1; // cost of being in state u
    for (auto [v, p] : transitions[u])
        E[u] = (E[u] + p * E[v]) % MOD;
}

CF 518D, CF 839C

Pattern 4: Self-loop elimination in E-value DP

When $E [s]$ depends on itself (probability of staying in state $s$ ):

E [s] = 1 + p_{stay} \cdot E [s] + \sum_{t \neq s} p_{t} \cdot E [t]

Rearrange: $(1 - p_{stay}) \cdot E [s] = 1 + \sum_{t \neq s} p_{t} \cdot E [t]$

E [s] = \frac{1 + \sum_{t \neq s} p_{t} \cdot E [t]}{1 - p_{stay}}

This avoids Gaussian elimination when the only cycle is a self-loop.

SELF-LOOP ELIMINATION IN E-VALUE DP:

  State s can stay with prob p, or move to t with prob 1-p.

  Naive recurrence (INFINITE LOOP):
    E[s] = 1 + p*E[s] + (1-p)*E[t]     <-- E[s] on both sides!

  Algebraic fix:
    E[s] - p*E[s] = 1 + (1-p)*E[t]
    (1-p)*E[s]    = 1 + (1-p)*E[t]
    E[s]          = 1/(1-p) + E[t]

  +--------+  p (self-loop)   +--------+
  | State s| <---------------| State s|
  |        | --------------->| State t|
  +--------+     1-p          +--------+

  Example: p = 1/2, E[t] = 0  ->  E[s] = 1/(1/2) + 0 = 2

CF 1Expected -- CF 1073E, CF 148D

Pattern 5: Probability modulo prime

When the answer is $p / q$ and the problem says "output modulo $10^{9} + 7$ ", compute $p \cdot q^{- 1} mod (10^{9} + 7)$ . Use Fermat's little theorem: $q^{- 1} \equiv q^{M O D - 2} (\mod M O D)$ .

Never mix floating-point and modular arithmetic. Do everything in modular world from the start.

CF 1Expected -- CF 1151F, CF 963A

Pattern 6: Markov Chain + Matrix Exponentiation

For a random walk on a small state space ( $n \leq 100$ ) with $k$ steps ( $k \leq 10^{18}$ ), build the $n \times n$ transition matrix $T$ and compute $T^{k}$ with matrix exponentiation.

The probability of being in state $j$ after $k$ steps starting from state $i$ is $(T^{k})_{i, j}$ .

CF 1Expected -- CF 691E, AtCoder ABC 256E

Contest Cheat Sheet

+----------------------------------------------------------------+
| PROBABILITY & EXPECTED VALUE  --  Quick Reference              |
|----------------------------------------------------------------|
| WHEN TO USE:                                                   |
|  - "expected number of ..." / "probability that ..."           |
|  - Random process with well-defined transitions                |
|  - Answer mod 10^9+7 with fractions                            |
|                                                                |
| KEY FORMULAS:                                                  |
|  E[X+Y] = E[X] + E[Y]            (ALWAYS, even if dependent)  |
|  E[geometric(p)] = 1/p                                         |
|  E[coupon collector, n] = n * H_n                              |
|  Var(X) = E[X^2] - (E[X])^2                                   |
|  Bayes: P(A|B) = P(B|A)*P(A) / P(B)                           |
|  Birthday: collision after ~1.18*sqrt(N) inserts               |
|                                                                |
| KEY CODE:                                                      |
|  ll inv(ll a) { return power(a, MOD-2, MOD); }                |
|  ll prob(ll p, ll q) { return p % MOD * inv(q) % MOD; }       |
|  // DP: E[s] = cost + sum(p_t * E[next]) / (1 - p_stay)       |
|                                                                |
| COMPLEXITY:                                                    |
|  Modular inverse: O(log MOD)  |  E-val DP (DAG): O(n+m)       |
|  Gauss elimination: O(n^3)    |  Matrix exp: O(n^3 log k)     |
|                                                                |
| GOTCHA: Never mix doubles and mod arithmetic.                  |
| GOTCHA: Self-loops need algebraic rearrangement, not recursion.|
+----------------------------------------------------------------+

Gotchas & Debugging

Mistake 1: Forgetting self-loop rearrangement

If state $s$ can transition back to itself with probability $p$ , writing E[s] = 1 + p * E[s] + ... and computing naively gives infinite recursion. Always rearrange to isolate $E [s]$ algebraically.

Mistake 2: Wrong modular inverse

$p / q mod M$ is $p \cdot q^{- 1} mod M$ , not $(p mod M) / (q mod M)$ . Integer division and modular inverse are completely different operations.

Mistake 3: Probability > 1 in modular world

When debugging, you cannot check if a modular probability "makes sense" by looking at its value. Instead, verify with small cases using floating-point separately, then trust your modular code.

Mistake 4: Off-by-one in geometric distribution

$E [geometric (p)]$ counts the successful trial too. The expected number of trials until success (inclusive) is $1 / p$ . The expected number of failures before success is $(1 - p) / p$ . Know which one your problem asks for.

Mistake 5: Applying linearity to products

$E [X \cdot Y] = E [X] \cdot E [Y]$ is only true when $X$ and $Y$ are independent. Linearity of expectation is about sums, not products. This is the #1 conceptual error.

Edge cases

Probability 0: if a state is unreachable, its expected value is undefined. Guard against division by zero.
$n = 1$ in coupon collector: $E = 1$ (you already get the only type on the first purchase).
Absorbing states: make sure $E [absorbing] = 0$ is set before the DP loop.
Graph not connected: if target is unreachable from source, expected value is infinite -- the problem is ill-defined or you need to output $- 1$ .

Debugging tips

Test with small cases ( $n \leq 5$ ) using brute-force simulation (doubles).
Compare your modular answer by converting the brute-force double to a fraction $p / q$ and checking $p \cdot q^{- 1} mod M$ .
Print intermediate $E [s]$ values -- they should form a monotonically decreasing sequence from source to target in many problems.

Mental Traps

"I need the full probability distribution to compute the expected value." For most problems you need only $E [X]$ , and linearity of expectation or a recurrence gives it directly. Computing the full distribution ( $P (X = 0), P (X = 1), \dots$ ) and then summing $\sum k \cdot P (X = k)$ is almost always more work than necessary. The distribution is a detour; the expected value is the destination.

DISTRIBUTION vs DIRECT EXPECTATION:

  "Expected # of heads in 100 fair flips"

  Distribution approach:                 Linearity approach:
  +----------------------------+         +----------------------------+
  | Compute P(X=k) for each k  |         | E[X] = 100 * P(one head)  |
  | = 0..100 using C(100,k)    |         |      = 100 * 0.5          |
  | Sum k * P(X=k)             |         |      = 50                 |
  | -> dozens of binomial      |         |                            |
  |    coefficient evaluations |         | One line. Done.            |
  +----------------------------+         +----------------------------+

"Expected value problems are different from DP problems." In competitive programming, expected value computations are almost always implemented as DP. The state is "where am I in the process," the transition weights are probabilities, and the value is the expected cost-to-go. Recognizing that $E [state] = cost + \sum p_{t} \cdot E [next]$ is a DP recurrence -- not a probability formula -- is what makes these problems accessible.

Common Mistakes

Treating every random walk as a sum of geometric trials. The frog that jumps $+ 1$ with probability $p$ and stays put with probability $1 - p$ reaches position $n$ in expected time $n / p$ -- each step is a fresh geometric variable with parameter $p$ . Tempting to reuse the formula when the frog jumps $+ 1$ with probability $p$ and $- 1$ with probability $1 - p$ , but now the walk can revisit positions and the steps are no longer independent; the correct setup is the recurrence $E_{i} = 1 + p \cdot E_{i + 1} + (1 - p) \cdot E_{i - 1}$ solved as a system. Linearity of expectation always works for decomposing sums, but $E = n / p$ only applies when each "step" is a fresh independent trial. When there is backtracking or state dependence, use DP or equation systems.

Debug Drills

Drill 1. Expected value DP with backward recurrence.

cpp

// Expected tosses to get n consecutive heads
double E[MAXN];
E[0] = 0;  // 0 consecutive heads: we're done... wait, we WANT n heads
for (int i = 1; i <= n; i++) {
    E[i] = 1 + 0.5 * E[i-1] + 0.5 * E[0];
    // heads: progress to i consecutive, tails: reset to 0
}
cout << E[n] << endl;

What's wrong?

The recurrence is backwards. E[i] should represent "expected tosses to reach $n$ consecutive heads starting from $i$ consecutive heads already." So E[n] = 0 (done), and we want E[0]. The recurrence should be: $E [i] = 1 + 0.5 \cdot E [i + 1] + 0.5 \cdot E [0]$ for $i < n$ .

Fix: Set E[n] = 0, then solve from $i = n - 1$ down to 0. This creates a system where $E [0]$ appears in every equation -- substitute and solve algebraically, or iterate until convergence.

Drill 2. Random walk with bidirectional dependency.

cpp

// Expected steps from node 0 to node n in a path graph
// At each node i (0 < i < n), move to i-1 or i+1 with prob 1/2
double E[MAXN];
E[n] = 0;
for (int i = n - 1; i >= 0; i--) {
    E[i] = 1 + 0.5 * E[i + 1] + 0.5 * E[i - 1];
}

What's wrong?

When computing E[i], the formula references E[i-1] -- but E[i-1] hasn't been computed yet (we're iterating downward)! This isn't a simple DP; it's a system of linear equations because of the bidirectional dependency. You can't solve it with a single-pass loop.

Fix: Use Gaussian elimination, or exploit the tridiagonal structure analytically. For the symmetric random walk on a path, the closed-form is $E [i] = (n - i) \cdot (n + i)$ (from node $i$ to node $n$ , so $E [0] = n^{2}$ ).

Drill 3. Modular probability with integer division.

cpp

const int MOD = 1e9 + 7;
long long prob[MAXN];
// prob[i] = probability of event i, stored as p/q mod MOD
prob[0] = 1;  // probability 1 = 1/1

for (int i = 1; i <= n; i++) {
    prob[i] = prob[i-1] * 1 / 2 % MOD;  // multiply by 1/2
}

What's wrong?

1 / 2 in C++ is integer division, which gives 0. So prob[i] = 0 for all $i \geq 1$ . To divide by 2 in modular arithmetic, you need to multiply by the modular inverse of 2.

Fix:

cpp

long long inv2 = (MOD + 1) / 2;  // modular inverse of 2 mod 1e9+7
// (works because MOD is odd, so (MOD+1)/2 * 2 ≡ 1 mod MOD)
prob[i] = prob[i-1] % MOD * inv2 % MOD;

Self-Test

[ ] State linearity of expectation and explain why it holds even when $X$ and $Y$ are dependent
[ ] Solve "expected # of coin flips until first head" using the geometric distribution formula -- and verify with the recurrence $E = 1 + (1 / 2) E$
[ ] Decompose "expected # of fixed points in a random permutation of $[1. . n]$ " into indicator variables and compute the answer
[ ] Set up the expected-value DP for a random walk on ${0, 1, 2, 3}$ with absorbing barrier at 3 -- write the system of equations
[ ] Explain when $E [X Y] = E [X] \cdot E [Y]$ holds and give a counterexample where it fails

Practice Problems

CF Rating	What You Should Be Able To Do
1200	Compute basic probabilities; know that $E [X + Y] = E [X] + E [Y]$ always holds
1500	Apply linearity of expectation to avoid complex combinatorics; use $E = 1 / p$ for geometric distributions
1800	Write expected value DP (on arrays, trees, DAGs); set up and solve systems of linear equations for random walks
2100	Handle probability with bitmask DP; combine EV with segment trees or other data structures; solve Markov chain problems with Gaussian elimination

#	Problem	Source	Difficulty	Key Idea
1	Dice Probability	CSES	Easy	Basic probability DP
2	Ball in Berland	CF 1475D	Easy	Indicator variables + counting
3	Random Mood	CF 839C	Medium	Expected value DP on tree
4	Ants on a Circle	AtCoder ABC 250E	Medium	Linearity of expectation
5	Dice Tower	CF 1823D	Medium	Geometric distribution + DP
6	Journey	CF 839C	Medium	E-value DP on DAG
7	Playlist	CF 1151F	Medium-Hard	Linearity + modular inverse
8	Berland Miners	CF 148D	Hard	E-value with self-loops
9	Walking on a Tree	CF 1073E	Hard	Gaussian elimination on graph
10	Ant Colony	CF 1525E	Hard	Markov chain + matrix exp
11	Dishonest Sellers	CF 779C	Easy	Basic probability + complementary counting
12	Magic Gems	CF 1117D	Medium	Expected value with matrix exponentiation
13	Nezzar and Board	CF 1478C	Medium	Conditional probability reasoning
14	Random Events	CF 1461D	Medium	Independent events, complementary probability
15	Game on Leaves	CF 1363C	Medium	Probability on trees, Bayes reasoning
16	Birthday Paradox	CSES	Hard	Birthday paradox / collision analysis

Probability & Expected Value ​

Three modes — pick one before you write code ​

Contents ​

Intuition ​

Visual Walkthrough ​

The Invariant ​

When to Reach for This ​

What the Code Won't Teach You ​

Visual Intuition (ASCII Art) ​

Foundational Probability ​

Basic Axioms & Rules ​

Conditional Probability & Bayes' Theorem ​

Discrete Distributions ​

Bernoulli(p) ​

Binomial(n,p) ​

Geometric(p) ​

Poisson(λ) ​

Hypergeometric ​

Continuous Distributions (Brief, for Interviews) ​

Uniform(a,b) ​

Exponential(λ) ​

Normal / Gaussian(μ,σ2) ​

Variance & Covariance ​

Classic Probability Problems (Worked Solutions) ​

Coupon Collector ​

Birthday Paradox ​

Gambler's Ruin ​

Ballot Problem ​

Dice Puzzles ​

Monte Carlo Simulation ​

C++ STL Reference ​

Implementation (Contest-Ready) ​

5.1 Minimal: Modular inverse for probability mod prime ​

5.2 Minimal: Expected value DP ​

5.3 Explained: Expected value DP for die rolls ​

5.4 Gaussian elimination for E-value on graphs with cycles ​

Operations Reference ​

Problem Patterns & Tricks ​

Pattern 1: Indicator Variable Decomposition ​

Pattern 2: Coupon Collector / Geometric Phases ​

Pattern 3: Expected Value DP on DAGs ​

Pattern 4: Self-loop elimination in E-value DP ​

Pattern 5: Probability modulo prime ​

Pattern 6: Markov Chain + Matrix Exponentiation ​

Contest Cheat Sheet ​

Gotchas & Debugging ​

Mistake 1: Forgetting self-loop rearrangement ​

Mistake 2: Wrong modular inverse ​

Mistake 3: Probability > 1 in modular world ​

Mistake 4: Off-by-one in geometric distribution ​

Mistake 5: Applying linearity to products ​

Edge cases ​

Debugging tips ​

Mental Traps ​

Common Mistakes ​

Debug Drills ​

Self-Test ​

Practice Problems ​

Further Reading ​

Probability & Expected Value

Three modes — pick one before you write code

Contents

Intuition

Visual Walkthrough

The Invariant

When to Reach for This

What the Code Won't Teach You

Visual Intuition (ASCII Art)

Foundational Probability

Basic Axioms & Rules

Conditional Probability & Bayes' Theorem

Discrete Distributions

Bernoulli( $p$ )

Binomial( $n, p$ )

Geometric( $p$ )

Poisson( $λ$ )

Hypergeometric

Continuous Distributions (Brief, for Interviews)

Uniform( $a, b$ )

Exponential( $λ$ )

Normal / Gaussian( $μ, σ^{2}$ )

Variance & Covariance

Classic Probability Problems (Worked Solutions)

Coupon Collector

Birthday Paradox

Gambler's Ruin

Ballot Problem

Dice Puzzles

Monte Carlo Simulation

C++ STL Reference

Implementation (Contest-Ready)

5.1 Minimal: Modular inverse for probability mod prime

5.2 Minimal: Expected value DP

5.3 Explained: Expected value DP for die rolls

5.4 Gaussian elimination for E-value on graphs with cycles

Operations Reference

Problem Patterns & Tricks

Pattern 1: Indicator Variable Decomposition

Pattern 2: Coupon Collector / Geometric Phases

Pattern 3: Expected Value DP on DAGs

Pattern 4: Self-loop elimination in E-value DP

Pattern 5: Probability modulo prime

Pattern 6: Markov Chain + Matrix Exponentiation

Contest Cheat Sheet

Gotchas & Debugging

Mistake 1: Forgetting self-loop rearrangement

Mistake 2: Wrong modular inverse

Mistake 3: Probability > 1 in modular world

Mistake 4: Off-by-one in geometric distribution

Mistake 5: Applying linearity to products

Edge cases

Debugging tips

Mental Traps

Common Mistakes

Debug Drills

Self-Test

Practice Problems

Further Reading