Skip to content

Probability & Expected Value

Quick Revisit

  • USE WHEN: problem asks "expected value" or "probability of X" — especially with linearity of expectation
  • INVARIANT: E[X₁+X₂+...] = E[X₁]+E[X₂]+... regardless of dependence (linearity of expectation)
  • TIME: O(n) for linearity-of-expectation reframe; O(n²) for expected-value DP | SPACE: O(n)
  • CLASSIC BUG: forgetting modular inverse for probability under mod — P = a/b becomes a·b⁻¹ mod p
  • PRACTICE: 08-ladder-mixed

Linearity of expectation, geometric distribution, expected value DP, and probability on graphs. The single most underrated toolkit at CF 1600+ -- once you see it, half the "hard math" problems become straightforward.

Difficulty: [Intermediate]Prerequisites: Modular Arithmetic, 1-D DPCF Rating Range: 1600-2200

Three modes — pick one before you write code

Contest probability problems land in one of three modes. Recognising which mode applies determines the entire solution shape:

ModeTriggerTool
1. Direct probability countingFinite uniform sample space, "what is P(event)?"$\frac{
2. Expected value via indicators"Expected number of [X]" with linearity of expectationE[Xi]=P(Xi=1). Pick the right indicator events; each individual probability is usually trivial.
3. Expected value DP / equationsRandom walks, absorbing states, "expected steps until..."Set up E[s]=cost(s)+p(st)E[t]. Linear-system on cycles, simple recurrence on DAGs.

Reach for indicators before DP — many "expected value DP" problems collapse to a one-line indicator sum once you spot the right Xi.

Modular probability — concrete example. Under modulus p=109+7, the probability 16 is not 1.0/6.0 and certainly not the integer 0. It is mod_inv(6) = pow(6, p-2, p) = 166666668. Verify: 6166666668mod(109+7)=1. Every fractional probability in a modular problem becomes a multiplication by the modular inverse of the denominator. If you ever see (int)(prob * something) mixing floats and mod, the answer is wrong even if the test passes.

Contents


Intuition

Roll a fair 6-sided die until you get a 6. How many rolls does it take on average?

Naive approach: simulate 106 trials in code and average the counts.

Trial 1:  3 5 2 6           -> 4 rolls
Trial 2:  1 6               -> 2 rolls
Trial 3:  4 4 3 2 5 1 6     -> 7 rolls
Trial 4:  6                  -> 1 roll
Trial 5:  2 3 6             -> 3 rolls
...
Average over 10^6 trials    ~= 6.0

You get approximately 6.0, but the contest asks for the exact answer as a fraction or modulo 109+7. Simulation cannot prove E=6 exactly, and for harder variants (e.g., "expected rolls to see all 6 faces") simulation is too slow and too imprecise.

We need a way to compute expected values symbolically, without enumerating an exponential or infinite sample space.

Linearity of expectation lets you split complex expected values into simple independent parts -- E[X+Y]=E[X]+E[Y] even when X and Y are dependent!

Analogy: to find the expected number of heads in 100 coin flips, just add up 100×0.5=50. You never need to consider all 2100 outcomes. Each flip contributes 0.5 to the total expectation independently of every other flip. Linearity is the algebraic fact that makes this shortcut legal.

This analogy is exact -- it works for any random variables, dependent or not. The only place it breaks is when you need E[XY] (product, not sum); then independence does matter.

Visual Walkthrough

Coupon collector: there are 4 coupon types. Each purchase gives a uniformly random coupon. Expected purchases to collect all 4?

Decompose into phases using linearity of expectation:

Phase 0: You have 0 types. Next purchase gives a NEW type with prob 4/4.
Phase 1: You have 1 type. Next NEW type with prob 3/4.
Phase 2: You have 2 types. Next NEW type with prob 2/4.
Phase 3: You have 3 types. Next NEW type with prob 1/4.

Each phase is a geometric random variable -- expected 1/p trials until success.

+-----------------------------------------------------------+
| Phase | Types owned | P(new) | E[purchases in phase]      |
|-------|-------------|--------|----------------------------|
|   0   |     0       |  4/4   |  4/4 = 1.00                |
|   1   |     1       |  3/4   |  4/3 = 1.33                |
|   2   |     2       |  2/4   |  4/2 = 2.00                |
|   3   |     3       |  1/4   |  4/1 = 4.00                |
+-----------------------------------------------------------+
| TOTAL (by linearity):  1 + 4/3 + 2 + 4 = 25/3 ~= 8.33   |
+-----------------------------------------------------------+

Step-by-step trace (one possible run):

Step 1: Buy coupon -> type A  (new!)    collected = {A}
Step 2: Buy coupon -> type C  (new!)    collected = {A, C}
Step 3: Buy coupon -> type A  (dup)     collected = {A, C}
Step 4: Buy coupon -> type B  (new!)    collected = {A, B, C}
Step 5: Buy coupon -> type B  (dup)     collected = {A, B, C}
Step 6: Buy coupon -> type A  (dup)     collected = {A, B, C}
Step 7: Buy coupon -> type D  (new!)    collected = {A, B, C, D}  DONE!

Brute force would enumerate all possible sequences (infinite!). Linearity of expectation gives the exact answer 25/3 with zero enumeration.

COUPON COLLECTOR -- PHASE DECOMPOSITION (n=4 types):

  Phase:   ---- 0 ----   ---- 1 ----   ----- 2 -----   --------- 3 ---------
  Owned:       0 types        1 type         2 types           3 types
  P(new):       4/4            3/4            2/4               1/4
  E[tries]:     1.0            1.33           2.0               4.0

  Timeline of one possible run:
  +---+---+---+---+---+---+---+
  | A | C | A | B | B | A | D |
  +---+---+---+---+---+---+---+
   new  new dup  new dup  dup  new
   <0>  <-1->     <--2-->      <------3------>

  Total E = 1 + 4/3 + 2 + 4 = 25/3 ~ 8.33

For n types: E=nHn=n(1+1/2+1/3++1/n), where Hn is the n-th harmonic number.

The Invariant

+------------------------------------------------------------------+
| CORE FORMULAS                                                    |
|                                                                  |
| 1. Linearity:  E[X] = sum of E[X_i] for indicator vars X_i.    |
|    (Even if X_i are dependent!)                                  |
|                                                                  |
| 2. Geometric distribution: repeated independent trials with      |
|    success probability p.  E[trials until first success] = 1/p. |
|                                                                  |
| 3. Expected value DP:                                            |
|    E[state] = sum over transitions of                            |
|               prob(t) * (cost(t) + E[next_state(t)])             |
|                                                                  |
| 4. Indicator variable trick:                                     |
|    E[count of events] = sum of P(event_i)                        |
+------------------------------------------------------------------+

Look at the coupon collector walkthrough: each phase is an independent geometric variable with success probability (nk)/n. The total expectation is the sum of individual phase expectations (linearity). The invariant connects all three formulas.

When to Reach for This

Trigger phrases:

  • "expected value" / "expected number of steps"
  • "probability that..." / "what is the probability"
  • "random process" / "each step independently"
  • "answer modulo 109+7" combined with any of the above (signals modular inverse for probability)

Counter-examples:

  • "Count the number of ways" without "expected" -> pure combinatorics, not expected value. See Combinatorics Basics.
  • "Minimize the expected cost where you choose actions" -> this is optimization/game theory, not pure probability. You may still use E-value DP, but the recurrence has a min or max over choices.
  • Deterministic simulation of a process -> no randomness, no probability.

What the Code Won't Teach You

The DP fills a table and produces a number. What it hides:

Linearity of expectation does not require independence. This is the single most powerful -- and most counterintuitive -- fact in competitive-programming probability. E[X+Y]=E[X]+E[Y] holds for any random variables, even when X and Y are correlated, dependent, or defined on completely different events. The proof is one line of algebra on the definition of expectation. Once you trust it, "expected number of inversions in a random permutation" becomes a two-line solution instead of an intractable sum over n! outcomes.

Indicator decomposition is the meta-skill. Any "expected count" problem decomposes: let Xi=1 if event i occurs, then E[Xi]=P(Xi=1). The hard part is choosing the right indicator events so that each P(Xi) is easy to compute. The code computes a number; the human chooses the decomposition.

INDICATOR DECOMPOSITION -- THE PATTERN:

  Problem: "Expected # of elements in correct position in random permutation"

  Step 1: Define X_i = 1 if sigma(i) = i   (indicator for fixed point)
  Step 2: E[X_i] = P(sigma(i) = i) = 1/n   (by symmetry)
  Step 3: E[total] = Sum E[X_i] = n * (1/n) = 1

  +-----------------------------------------------------+
  |  No need to enumerate permutations.                  |
  |  No need to track dependencies between X_i and X_j.  |
  |  Linearity handles everything.                       |
  +-----------------------------------------------------+

Visual Intuition (ASCII Art)

Diagram 1 -- Indicator variable decomposition for "expected number of matching pairs" in a random permutation of [1..4]:

Permutation: [2, 1, 4, 3]

Pair positions:    (1,2)  (1,3)  (1,4)  (2,3)  (2,4)  (3,4)
Match? (a_i=i):     N      N      N      N      N      N

X_i = 1 if position i has a_i = i (a "fixed point")
E[# fixed points] = sum E[X_i] = 4 * (1/4) = 1

+-----+-----+-----+-----+
| pos |  1  |  2  |  3  |  4  |
|-----|-----|-----|-----|-----|
| val |  2  |  1  |  4  |  3  |
| X_i |  0  |  0  |  0  |  0  |
+-----+-----+-----+-----+-----+

P(X_i = 1) = 1/4  for each i   (by symmetry)
E[total] = 4 * 1/4 = 1         (by linearity)

Diagram 2 -- Expected value DP: random walk on a line, 0 to n.

Start at 0. Each step: move right with prob p, left with prob 1p. Expected steps to reach n? (Absorbing barrier at n, reflecting at 0.)

State diagram (n=3):

     1-p        1-p        1-p
  +------+   +------+   +------+
  |      v   |      v   |      v
  |  +---+---+---+---+---+---+---+
  +--| 0 |-->| 1 |-->| 2 |-->| 3 |  (absorbing)
     +---+ p +---+ p +---+ p +---+
       ^       |       |
       +-------+       |
        reflect        |
         at 0          |
                       |
    E[0] = 1 + p*E[1] + (1-p)*E[0]    (self-loop at 0)
    E[1] = 1 + p*E[2] + (1-p)*E[0]
    E[2] = 1 + p*E[3] + (1-p)*E[1]
    E[3] = 0                            (target)

Solve the linear system for E[0]. For p=1/2,n=3: E[0]=9.


Foundational Probability

The sections above establish linearity of expectation and the geometric distribution -- the two workhorses of CP probability. But many contest problems (and virtually all quant/HFT interviews) require sharper tools: Bayes' theorem, named distributions, variance, and simulation. This section fills those gaps.


Basic Axioms & Rules

A sample space Ω is the set of all possible outcomes of an experiment. An event A is a subset of Ω. A probability function maps events to [0,1].

Three axioms (everything else derives from these):

  1. P(A)0 for every event A.
  2. P(Ω)=1.
  3. For mutually exclusive events A,B: P(AB)=P(A)+P(B).

Derived rules you'll use constantly:

P(AB)=P(A)+P(B)P(AB)P(A¯)=1P(A)

Complementary counting is the single most useful probability trick at CF 1100-1600: when computing P(A) directly is hard, compute 1P(A¯) instead.

Worked example -- two dice:

Roll two fair 6-sided dice. What is P(sum10)?

All 36 equally likely outcomes (die1, die2):

  Sum >= 10 outcomes:
    (4,6) (5,5) (5,6) (6,4) (6,5) (6,6)  -->  6 outcomes

  Direct:      P = 6/36 = 1/6
  Complement:  P(sum < 10) = 30/36,  so P(sum >= 10) = 1 - 30/36 = 1/6

Inclusion-exclusion generalizes to more events:

P(ABC)=P(A)+P(B)+P(C)P(AB)P(AC)P(BC)+P(ABC)
+-------------------------------------------------------+
| INCLUSION-EXCLUSION (two events, Venn diagram)        |
|                                                       |
|      +----------+                                     |
|      |    A     |                                     |
|      |     +----+------+                              |
|      |     |  A n B    |                              |
|      +-----+----+   B  |                              |
|            |    +------+                              |
|                                                       |
|  P(A u B) = P(A) + P(B) - P(A n B)                   |
+-------------------------------------------------------+

Conditional Probability & Bayes' Theorem

Conditional probability -- the probability of A given that B occurred:

P(AB)=P(AB)P(B)

This is the foundation of all "given that..." reasoning.

Multiplication rule (rearranging):

P(AB)=P(AB)P(B)=P(BA)P(A)

Bayes' theorem reverses the conditioning direction:

P(AB)=P(BA)P(A)P(B)

The denominator expands via the law of total probability:

P(B)=P(BA)P(A)+P(BA¯)P(A¯)

Example 1 -- Disease testing (classic quant interview):

A disease affects 1 in 1000 people. A test is 99% accurate (99% sensitivity, 99% specificity). You test positive. What is P(disease+)?

Let D = "has disease",  T+ = "tests positive"

Given:
  P(D)      = 0.001          (base rate / prior)
  P(T+|D)   = 0.99           (sensitivity)
  P(T+|~D)  = 0.01           (false positive rate)

Compute P(T+):
  P(T+) = P(T+|D)*P(D) + P(T+|~D)*P(~D)
        = 0.99 * 0.001 + 0.01 * 0.999
        = 0.00099 + 0.00999
        = 0.01098

Apply Bayes:
  P(D|T+) = P(T+|D)*P(D) / P(T+)
          = 0.00099 / 0.01098
          = 0.0902  (~9%)

  SURPRISE: 91% chance you DON'T have it despite testing positive.
  The low base rate dominates.

Visualize with natural frequencies (often clearer than formulas):

+------------------------------------------------------------+
|  Population: 100,000 people                                |
|                                                            |
|  +----------+    +--------------------------------------+  |
|  | Diseased |    | Healthy                              |  |
|  |   100    |    | 99,900                               |  |
|  |          |    |                                      |  |
|  | Test+:99 |    | Test+: 999      Test-: 98,901       |  |
|  | Test-: 1 |    |                                      |  |
|  +----------+    +--------------------------------------+  |
|                                                            |
|  P(D | T+) = 99 / (99 + 999) = 99/1098 ~ 9.0%            |
+------------------------------------------------------------+

Example 2 -- Monty Hall problem:

Three doors: one car, two goats. You pick door 1. The host (who knows what's behind each door) opens door 3, revealing a goat. Should you switch?

Initial setup:
  Door 1     Door 2     Door 3
  (You)      (???)      (Host opens: goat)

Case analysis (car equally likely behind any door):

  Car behind 1: Host opens 2 or 3 (say 3). Switch -> LOSE.  P = 1/3
  Car behind 2: Host MUST open 3.          Switch -> WIN.   P = 1/3
  Car behind 3: Host MUST open 2 (not 3).  (Contradicts host opening 3.)

  Given host opened 3:
    P(car = 1 | host opens 3) = 1/3
    P(car = 2 | host opens 3) = 2/3

  SWITCH. Switching wins 2/3 of the time.

The Bayes calculation:

P(car=2host opens 3)=P(host opens 3car=2)P(car=2)P(host opens 3)=11/31/2=23

Why Bayes matters for CP: any problem where you observe partial information and must update beliefs. "Given that event B happened, what is P(A)?" -- that's conditional probability. CF problems with partially revealed randomness often need this reasoning.

cpp
#include <bits/stdc++.h>
using namespace std;

// Bayes update: given prior[i] and likelihood[i] = P(evidence | hypothesis i),
// return posterior[i] = P(hypothesis i | evidence).
vector<double> bayes_update(vector<double>& prior, vector<double>& likelihood) {
    int n = prior.size();
    double evidence = 0;
    for (int i = 0; i < n; i++) evidence += prior[i] * likelihood[i];
    vector<double> posterior(n);
    for (int i = 0; i < n; i++)
        posterior[i] = prior[i] * likelihood[i] / evidence;
    return posterior;
}

Independence: events A and B are independent iff P(AB)=P(A)P(B), equivalently P(AB)=P(A). Independent events give no information about each other. Don't confuse with mutually exclusive (disjoint), which is the opposite -- if A happened, B definitely didn't.


Discrete Distributions

A probability distribution describes the behavior of a random variable. These named distributions appear repeatedly in CP and quant interviews -- recognize them instantly.

Bernoulli(p)

A single trial: success with probability p, failure with probability 1p.

P(X=1)=p,P(X=0)=1pE[X]=p,Var(X)=p(1p)

The building block for everything else. An indicator random variable is Bernoulli.

cpp
#include <bits/stdc++.h>
using namespace std;
// Bernoulli trial: returns 1 with probability p
int bernoulli(mt19937& rng, double p) {
    return uniform_real_distribution<double>(0, 1)(rng) < p ? 1 : 0;
}

Binomial(n,p)

Number of successes in n independent Bernoulli(p) trials.

P(X=k)=(nk)pk(1p)nkE[X]=np,Var(X)=np(1p)

When it appears: "flip n coins with bias p, probability of exactly k heads?" Voting models, reliability problems, any "count successes in fixed trials" setup.

Binomial(10, 0.5) PMF:

  P(k)
  0.25 +          *
       |        *   *
  0.20 +       *     *
       |      *       *
  0.15 +
       |    *           *
  0.10 +
       |   *             *
  0.05 +  *               *
       | *                 *
  0.00 +*                   *
       +--+--+--+--+--+--+--+--+--+--+--
       0  1  2  3  4  5  6  7  8  9  10  k
cpp
#include <bits/stdc++.h>
using namespace std;
// P(X = k) for Binomial(n, p) -- use log to avoid overflow
double binom_pmf(int n, int k, double p) {
    double log_prob = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + k * log(p) + (n - k) * log(1 - p);
    return exp(log_prob);
}

For CP mod-prime problems, compute (nk) with precomputed factorials and modular inverse, then multiply by pk(1p)nk in modular arithmetic.

Geometric(p)

Number of independent trials until the first success (inclusive).

P(X=k)=(1p)k1pfor k=1,2,3,E[X]=1p,Var(X)=1pp2

This is the distribution behind the coupon collector phases in Section 2 above -- each phase is geometric with success probability (nk)/n.

Memoryless property: P(X>s+tX>s)=P(X>t). If you've already failed s times, the expected remaining trials is still 1/p. Past failures don't help. This is the only discrete memoryless distribution.

Geometric(p=0.3) -- probability of first success on trial k:

  k:     1     2     3     4     5     6     7    ...
  P:  0.300 0.210 0.147 0.103 0.072 0.050 0.035  ...
       ***
       ***  ***
       ***  ***  **
       ***  ***  **   *
       ***  ***  **   *    *    *

Poisson(λ)

Number of events in a fixed interval when events occur independently at rate λ. Useful when n is large, p is small, and npλ.

P(X=k)=λkeλk!E[X]=λ,Var(X)=λ

When it appears: modeling arrivals (network packets, server requests -- quant interview staple), approximating Binomial(n,p) when n is huge and p is tiny.

Key property: sum of independent Poisson variables is Poisson: XPoisson(λ1),YPoisson(λ2)X+YPoisson(λ1+λ2).

cpp
#include <bits/stdc++.h>
using namespace std;
// P(X = k) for Poisson(lambda)
double poisson_pmf(double lambda, int k) {
    return exp(k * log(lambda) - lambda - lgamma(k + 1));
}

Hypergeometric

Drawing without replacement. An urn has N balls total: K red, NK blue. Draw n balls without replacement. X = number of red balls drawn.

P(X=k)=(Kk)(NKnk)(Nn)E[X]=nKN,Var(X)=nKNNKNNnN1

When it appears: card-drawing problems, lottery calculations, sampling without replacement. Key difference from binomial: draws are dependent because there's no replacement. As N with K/N=p fixed, hypergeometric binomial.

+-----------------------------------------------------------+
| DISCRETE DISTRIBUTIONS -- QUICK REFERENCE                  |
|-----------------------------------------------------------|
| Name            | E[X]   | Var(X)       | Key property    |
|-----------------|--------|--------------|-----------------|
| Bernoulli(p)    | p      | p(1-p)       | single trial    |
| Binomial(n,p)   | np     | np(1-p)      | n ind. trials   |
| Geometric(p)    | 1/p    | (1-p)/p^2    | memoryless      |
| Poisson(lam)    | lam    | lam          | rare events     |
| Hypergeometric  | nK/N   | (see above)  | no replacement  |
+-----------------------------------------------------------+

Continuous Distributions (Brief, for Interviews)

These rarely appear in CP (which deals with discrete structures) but are quant/HFT interview staples. Know the formulas and key properties.

Uniform(a,b)

All values in [a,b] equally likely.

f(x)=1ba,E[X]=a+b2,Var(X)=(ba)212

Classic interview problem: "Two people agree to meet between 12:00 and 1:00. Each arrives at a uniformly random time and waits 15 minutes. What is the probability they meet?"

Answer: P=1(3/4)2=7/16=0.4375. (Area of the meeting region in the unit square minus the two corner triangles.)

Exponential(λ)

Time until the next event in a Poisson process.

f(x)=λeλx for x0,E[X]=1λ,Var(X)=1λ2

Memoryless property (continuous analog of geometric): P(X>s+tX>s)=P(X>t). "Given you've waited s minutes, the expected additional wait is still 1/λ."

CDF: P(Xt)=1eλt. Use this to compute "probability of event within time t."

Normal / Gaussian(μ,σ2)

The most important continuous distribution, thanks to the Central Limit Theorem.

f(x)=1σ2πexp((xμ)22σ2)E[X]=μ,Var(X)=σ2

68-95-99.7 rule:

                      99.7%  (within 3 sigma)
                |<------------------------------>|
                |    95%  (within 2 sigma)        |
                |  |<---------------------->|     |
                |  |   68%  (within 1 sigma)|     |
                |  |  |<-------------->|    |     |
                |  |  |                |    |     |
           -----+--+--+-------*-------+--+-+-----+-----
              mu-3s  mu-s     mu    mu+s  mu+2s  mu+3s

Central Limit Theorem (CLT): the average of n independent random variables (with finite mean μ and variance σ2) converges to a normal distribution as n:

X¯μσ/ndN(0,1)

Interview application: "Flip 10000 fair coins. Approximate P(heads>5100)." Use CLT: μ=5000, σ=2500=50, z=(51005000)/50=2. P(Z>2)0.0228 (from standard normal table / 68-95-99.7 rule: about 2.5% in the upper tail beyond 2σ).

+-----------------------------------------------------------+
| CONTINUOUS DISTRIBUTIONS -- QUICK REFERENCE                |
|-----------------------------------------------------------|
| Name             | E[X]     | Var(X)       | Key property |
|------------------|----------|--------------|--------------|
| Uniform(a,b)     | (a+b)/2  | (b-a)^2/12   | flat PDF     |
| Exponential(lam) | 1/lam    | 1/lam^2      | memoryless   |
| Normal(mu,s^2)   | mu       | s^2          | CLT limit    |
+-----------------------------------------------------------+

Variance & Covariance

Variance measures how spread out a distribution is. You need it when problems ask for "the variance of ..." or when analyzing the reliability of Monte Carlo estimates.

Computational formula (almost always easier than the definition):

Var(X)=E[X2](E[X])2

The definition Var(X)=E[(XE[X])2] is equivalent but harder to work with in practice. The computational formula lets you track E[X] and E[X2] separately.

Scaling:

Var(aX+b)=a2Var(X)

Adding a constant shifts the distribution but doesn't change its spread.

Variance of a sum:

Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y)

Covariance:

Cov(X,Y)=E[XY]E[X]E[Y]

If X,Y are independent, then Cov(X,Y)=0, giving the simpler:

Var(X+Y)=Var(X)+Var(Y)(independent only!)

For n independent variables: Var(i=1nXi)=i=1nVar(Xi).

Worked example -- sum of two dice:

Let S = D_1 + D_2 (two independent fair dice).

E[D_i]   = (1+2+3+4+5+6)/6 = 3.5
E[D_i^2] = (1+4+9+16+25+36)/6 = 91/6
Var(D_i) = 91/6 - (3.5)^2 = 91/6 - 49/4 = 35/12 ~ 2.917

D_1, D_2 independent:
  Var(S) = Var(D_1) + Var(D_2) = 35/12 + 35/12 = 35/6 ~ 5.833
  Std dev = sqrt(35/6) ~ 2.415

When variance matters in CP: some problems ask for the variance or standard deviation of a random quantity (not just its expectation). Approach: compute E[X] and E[X2] separately using linearity, then subtract. Both E[X] and E[X2] can use indicator decomposition.

cpp
#include <bits/stdc++.h>
using namespace std;

// Compute E[X] and Var(X) from samples or a PMF
// pmf[i] = {value, probability}
pair<double, double> mean_variance(vector<pair<double, double>>& pmf) {
    double ex = 0, ex2 = 0;
    for (auto& [val, prob] : pmf) {
        ex += val * prob;
        ex2 += val * val * prob;
    }
    return {ex, ex2 - ex * ex};
}

Covariance in practice: when random variables are not independent (e.g., two indicator variables for overlapping events), you must compute covariance explicitly. For indicators Xi,Xj{0,1}:

Cov(Xi,Xj)=P(Xi=1 and Xj=1)P(Xi=1)P(Xj=1)

Classic Probability Problems (Worked Solutions)

These appear in CP contests, quant interviews, and mathematical olympiads. Recognize the pattern, apply the formula.

Coupon Collector

(Detailed walkthrough in Section 2 above.) For n types:

E=nHn=n(1+12+13++1n)nlnn+0.577n

The variance is:

Var=k=1nn2k2nHnπ26n2

For n=6 (dice faces): E=14.7, Var38.4.

Birthday Paradox

In a group of k people with birthdays uniform over 365 days, the probability of at least one shared birthday:

P(collision)=1i=0k1(1i365)

For k=23: P0.507 -- just over 50%.

Application in CS: hash collision analysis. With a hash space of size N, expect a collision after πN/21.18N random insertions. For 64-bit hashes (N=264): collision expected after 5×109 elements.

+----------------------------------------------+
|  Birthday paradox: P(collision) vs group size |
|                                               |
|  P  1.0 +                         ****       |
|     0.8 +                    *****           |
|     0.6 +               ****                |
|     0.5 +- - - - - -* - - - - - - -          |
|     0.4 +          *                          |
|     0.2 +       **                            |
|     0.0 +****                                 |
|         +--+--+--+--+--+--+--+--+--> k       |
|         0  5  10 15 20 25 30 40 50            |
+----------------------------------------------+
cpp
#include <bits/stdc++.h>
using namespace std;
// Minimum group size k for P(collision) >= threshold in space of size n
int birthday_bound(long long n, double threshold = 0.5) {
    double p_no_collision = 1.0;
    for (int k = 1; ; k++) {
        p_no_collision *= (double)(n - k + 1) / n;
        if (1.0 - p_no_collision >= threshold) return k;
    }
}

Gambler's Ruin

A gambler starts with $a. Each round: wins $1 with probability p or loses $1 with probability q=1p. Game ends at $0 (ruin) or $N (goal).

Probability of ruin starting from a:

P(ruin from a)={(q/p)a(q/p)N1(q/p)Nif pq1aNif p=q=0.5

Expected duration (for p=q=0.5): E[steps]=a(Na).

Fortune over time (random walk, absorbing barriers at 0 and N):

  N +                                   * (WIN)
    |                     *   *       *
    |                 *  * * * *  *  *
    |            *   * *       * * *
  a + - - - * * * *                      <-- start
    |      *
    |   * *
    | **
  0 +*                                   (RUIN)
    +--+--+--+--+--+--+--+--+--+--+--> rounds

When it appears: random walk problems, stopping time analysis, competitive betting models in quant interviews. Also connects to the expected value DP on random walks (Section 3 diagram above).

Ballot Problem

Candidate A gets a votes, candidate B gets b votes, with a>b. If votes are counted in a uniformly random order, the probability that A is strictly ahead throughout the entire counting is:

P(A always ahead)=aba+b

This connects to Catalan numbers and lattice path counting. For a=3,b=1: P=2/4=1/2.

Generalization (Cycle Lemma): the number of cyclic permutations of a sequence of +1's and 1's (with sum >0) that have all positive partial sums is exactly (ab)/(a+b) of the total.

Dice Puzzles

Expected rolls to see all 6 faces -- coupon collector with n=6:

E=6(1+12+13+14+15+16)=14710=14.7

Expected value of the max of k independent Uniform(0,1) variables:

E[max(X1,,Xk)]=kk+1

Proof: CDF of the max is F(x)=xk, so E[max]=01(1xk)dx=11k+1.

Common quant interview question: "I draw n numbers uniformly from [0,1]. What is the expected value of the largest?" Answer: n/(n+1).

Expected number of rolls of a die until sum exceeds N: use DP exactly as in Section 5.3 (implementation section below). E[s]=1+16f=16E[min(s+f,N)], computed backwards.


Monte Carlo Simulation

When exact computation is intractable, simulate. The Law of Large Numbers guarantees that the sample average converges to the true expectation as the number of trials .

When to use Monte Carlo:

  1. Verifying your exact formula on small inputs (stress testing).
  2. Getting approximate answers when no closed form exists.
  3. Computing integrals numerically (e.g., estimating π).

Basic template:

cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
    uniform_real_distribution<double> real01(0.0, 1.0);
    uniform_int_distribution<int> die(1, 6);

    const int TRIALS = 1'000'000;
    int hits = 0;
    for (int i = 0; i < TRIALS; i++) {
        // simulate one trial
        int s = die(rng) + die(rng) + die(rng);
        if (s > 15) hits++;
    }
    double prob = (double)hits / TRIALS;
    printf("Estimated P(3 dice sum > 15): %.6f\n", prob);
    // Exact: 10/216 = 0.046296...
}

Estimating π: drop random points in the unit square [0,1]2. Fraction that land inside the quarter-circle x2+y21 is π/4.

+---+---+---+---+---+
|   .   . * .       |  1.0
|     *   *   . *   |
|   *   * * .   *   |
| * . *   *   *     |
|   *   *   .       |  0.5
| . * *       .     |
|   *   .           |
| *       .         |
+---+---+---+---+---+
0.0               1.0

  *  = inside quarter-circle (x^2 + y^2 <= 1)
  .  = outside quarter-circle
  pi ~ 4 * (count of *) / (total points)
cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
    mt19937 rng(42);
    uniform_real_distribution<double> dist(0.0, 1.0);
    int inside = 0, total = 1'000'000;
    for (int i = 0; i < total; i++) {
        double x = dist(rng), y = dist(rng);
        if (x * x + y * y <= 1.0) inside++;
    }
    printf("pi ~ %.6f\n", 4.0 * inside / total);  // ~3.1416
}

Stress testing: when you derive an exact formula, always verify with Monte Carlo on small inputs before submitting.

cpp
#include <bits/stdc++.h>
using namespace std;

// Verify: expected fixed points in a random permutation = 1.0
int main() {
    mt19937 rng(42);
    int n = 10;
    const int TRIALS = 1'000'000;
    long long total_fixed = 0;
    for (int t = 0; t < TRIALS; t++) {
        vector<int> perm(n);
        iota(perm.begin(), perm.end(), 0);
        shuffle(perm.begin(), perm.end(), rng);
        for (int i = 0; i < n; i++)
            if (perm[i] == i) total_fixed++;
    }
    printf("E[fixed points] ~ %.4f (exact: 1.0000)\n",
           (double)total_fixed / TRIALS);
}
+-------------------------------------------------------------------+
| MONTE CARLO PRECISION GUIDE                                       |
|-------------------------------------------------------------------|
| Trials      | Typical error (p ~ 0.5) | Good for...               |
|-------------|-------------------------|---------------------------|
| 10^4        | ~0.005                  | Quick sanity check         |
| 10^5        | ~0.0016                 | 3-digit accuracy           |
| 10^6        | ~0.0005                 | Most stress tests          |
| 10^7        | ~0.00016                | High-precision verify      |
|-------------------------------------------------------------------|
| Error ~ 1/sqrt(TRIALS).  To halve the error, quadruple trials.   |
+-------------------------------------------------------------------+

C++ STL Reference

Function / ClassHeaderWhat it doesTime
mt19937 rng(seed)<random>Mersenne Twister PRNG, 32-bitO(1) per call
uniform_int_distribution<>(a,b)<random>Uniform random int in [a,b]O(1)
uniform_real_distribution<>(a,b)<random>Uniform random double in [a,b)O(1)
__gcd(a, b) / gcd(a,b)<algorithm> / <numeric>GCD (needed for fraction simplification)O(logmin(a,b))
__int128built-in128-bit int for modular arithmetic overflow avoidance--

For contest probability problems, you rarely use <random> directly -- the key operations are modular inverse and DP. The STL's main role is providing gcd and fast I/O.


Implementation (Contest-Ready)

5.1 Minimal: Modular inverse for probability mod prime

cpp
#include <bits/stdc++.h>
using namespace std;

const long long MOD = 1e9 + 7;

long long power(long long base, long long exp, long long mod) {
    long long result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

long long inv(long long a, long long mod = MOD) {
    return power(a, mod - 2, mod);
}

// Probability p/q mod MOD = p * inv(q) mod MOD
long long prob_mod(long long p, long long q) {
    return p % MOD * inv(q) % MOD;
}

int main() {
    // Coupon collector with n types: E = n * H_n
    int n;
    cin >> n;
    long long ans = 0;
    for (int k = 1; k <= n; k++)
        ans = (ans + prob_mod(n, k)) % MOD;
    cout << ans << "\n";
}

5.2 Minimal: Expected value DP

cpp
#include <bits/stdc++.h>
using namespace std;

const long long MOD = 1e9 + 7;

long long power(long long b, long long e, long long m) {
    long long r = 1; b %= m;
    while (e > 0) { if (e & 1) r = r * b % m; b = b * b % m; e >>= 1; }
    return r;
}
long long inv(long long a) { return power(a, MOD - 2, MOD); }

int main() {
    int n;
    cin >> n;
    vector<vector<pair<int,long long>>> adj(n); // adj[u] = {(v, prob_numerator)}
    // Read graph / transitions ...
    // E[target] = 0, solve backwards (topological or Gaussian)
    vector<long long> E(n, 0);
    // Example: DAG in reverse topological order
    // E[u] = sum over (v, p): p/total * (1 + E[v])
    cout << E[0] << "\n";
}

5.3 Explained: Expected value DP for die rolls

cpp
#include <bits/stdc++.h>
using namespace std;

const long long MOD = 1e9 + 7;

long long power(long long b, long long e, long long m) {
    long long r = 1; b %= m;
    while (e > 0) { if (e & 1) r = r * b % m; b = b * b % m; e >>= 1; }
    return r;
}
long long inv(long long a) { return power(a, MOD - 2, MOD); }

int main() {
    // Problem: roll a 6-sided die. You start with score 0.
    // Each roll adds the face value to your score.
    // Expected number of rolls to reach score >= n?
    //
    // E[s] = expected rolls to reach >= n starting from score s.
    // E[s] = 0 if s >= n.
    // E[s] = 1 + (1/6) * sum_{f=1}^{6} E[s + f]
    //
    // Compute backwards from s = n-1 down to 0.

    int n;
    cin >> n;

    vector<long long> E(n + 7, 0); // E[s] = 0 for s >= n

    long long inv6 = inv(6);

    for (int s = n - 1; s >= 0; s--) {
        // E[s] = 1 + (1/6) * sum_{f=1..6} E[s+f]
        long long total = 0;
        for (int f = 1; f <= 6; f++) {
            int ns = s + f;
            if (ns >= n)
                total = (total + 0) % MOD; // E[ns] = 0
            else
                total = (total + E[ns]) % MOD;
        }
        // E[s] = 1 + total * inv(6)
        E[s] = (1 + total % MOD * inv6 % MOD) % MOD;
    }

    cout << E[0] << "\n";
}

5.4 Gaussian elimination for E-value on graphs with cycles

cpp
#include <bits/stdc++.h>
using namespace std;

// Solve system of linear equations over doubles.
// a[i] is row i with n+1 columns (augmented matrix).
// Returns solution vector x, or empty if no unique solution.
vector<double> gauss(vector<vector<double>>& a) {
    int n = a.size();
    for (int col = 0; col < n; col++) {
        int pivot = -1;
        for (int row = col; row < n; row++)
            if (abs(a[row][col]) > 1e-9) { pivot = row; break; }
        if (pivot == -1) return {}; // singular
        swap(a[col], a[pivot]);
        double div = a[col][col];
        for (int j = col; j <= n; j++) a[col][j] /= div;
        for (int row = 0; row < n; row++) {
            if (row == col) continue;
            double factor = a[row][col];
            for (int j = col; j <= n; j++)
                a[row][j] -= factor * a[col][j];
        }
    }
    vector<double> x(n);
    for (int i = 0; i < n; i++) x[i] = a[i][n];
    return x;
}

int main() {
    // E[i] = 1 + (1/deg(i)) * sum_{j in adj(i)} E[j]
    // E[target] = 0
    // Rearrange: E[i] - (1/deg(i)) * sum E[j] = 1
    // Build augmented matrix and solve.

    int n, m, target;
    cin >> n >> m >> target;
    vector<vector<int>> adj(n);
    for (int i = 0; i < m; i++) {
        int u, v; cin >> u >> v;
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    // n equations, n unknowns
    vector<vector<double>> eq(n, vector<double>(n + 1, 0.0));
    for (int i = 0; i < n; i++) {
        if (i == target) {
            eq[i][i] = 1.0; // E[target] = 0
            eq[i][n] = 0.0;
        } else {
            int deg = adj[i].size();
            eq[i][i] = 1.0;
            for (int j : adj[i])
                eq[i][j] -= 1.0 / deg;
            eq[i][n] = 1.0;
        }
    }

    auto ans = gauss(eq);
    if (ans.empty()) cout << "-1\n";
    else cout << fixed << setprecision(9) << ans[0] << "\n";
}

Operations Reference

OperationTimeSpace
Modular inverse (Fermat)O(logp)O(1)
Precompute inverse table 1..nO(n)O(n)
Expected value DP (DAG, n states, m transitions)O(n+m)O(n)
Gaussian elimination (n unknowns)O(n3)O(n2)
Coupon collector (n types)O(n)O(1)
Linearity of expectation (sum of n indicators)O(n)O(1)
Matrix exponentiation for Markov chain (k steps, n states)O(n3logk)O(n2)

Problem Patterns & Tricks

Pattern 1: Indicator Variable Decomposition

Instead of computing E[complex quantity] directly, write it as a sum of indicator random variables: X=Xi where Xi{0,1}. Then E[X]=P(Xi=1).

Example: Expected number of inversions in a random permutation of [1..n]. There are (n2) pairs. Each pair (i,j) is inverted with probability 1/2 by symmetry. So E=(n2)/2=n(n1)/4.

CF 1Expected -- CF 1042B, CF 1392D

Pattern 2: Coupon Collector / Geometric Phases

Decompose a "collect all" process into phases. Phase k: you have k items, need a new one, probability of new item per trial is (nk)/n. Expected trials in phase k: n/(nk). Total: nHn.

E = sum_{k=0}^{n-1} n / (n - k) = n * (1/1 + 1/2 + ... + 1/n)

CF 1Expected -- CF 1Expected280C, AtCoder ABC 242E

Pattern 3: Expected Value DP on DAGs

Define E[s] = expected cost from state s to target. Base case: E[target]=0. Transition: E[s]=cost(s)+tp(st)E[t].

If the state graph is a DAG, compute in reverse topological order. No linear system needed.

cpp
// Reverse topo order
for (int u : rev_topo) {
    E[u] = 1; // cost of being in state u
    for (auto [v, p] : transitions[u])
        E[u] = (E[u] + p * E[v]) % MOD;
}

CF 518D, CF 839C

Pattern 4: Self-loop elimination in E-value DP

When E[s] depends on itself (probability of staying in state s):

E[s]=1+pstayE[s]+tsptE[t]

Rearrange: (1pstay)E[s]=1+tsptE[t]

E[s]=1+tsptE[t]1pstay

This avoids Gaussian elimination when the only cycle is a self-loop.

SELF-LOOP ELIMINATION IN E-VALUE DP:

  State s can stay with prob p, or move to t with prob 1-p.

  Naive recurrence (INFINITE LOOP):
    E[s] = 1 + p*E[s] + (1-p)*E[t]     <-- E[s] on both sides!

  Algebraic fix:
    E[s] - p*E[s] = 1 + (1-p)*E[t]
    (1-p)*E[s]    = 1 + (1-p)*E[t]
    E[s]          = 1/(1-p) + E[t]

  +--------+  p (self-loop)   +--------+
  | State s| <---------------| State s|
  |        | --------------->| State t|
  +--------+     1-p          +--------+

  Example: p = 1/2, E[t] = 0  ->  E[s] = 1/(1/2) + 0 = 2

CF 1Expected -- CF 1073E, CF 148D

Pattern 5: Probability modulo prime

When the answer is p/q and the problem says "output modulo 109+7", compute pq1mod(109+7). Use Fermat's little theorem: q1qMOD2(modMOD).

Never mix floating-point and modular arithmetic. Do everything in modular world from the start.

CF 1Expected -- CF 1151F, CF 963A

Pattern 6: Markov Chain + Matrix Exponentiation

For a random walk on a small state space (n100) with k steps (k1018), build the n×n transition matrix T and compute Tk with matrix exponentiation.

The probability of being in state j after k steps starting from state i is (Tk)i,j.

CF 1Expected -- CF 691E, AtCoder ABC 256E


Contest Cheat Sheet

+----------------------------------------------------------------+
| PROBABILITY & EXPECTED VALUE  --  Quick Reference              |
|----------------------------------------------------------------|
| WHEN TO USE:                                                   |
|  - "expected number of ..." / "probability that ..."           |
|  - Random process with well-defined transitions                |
|  - Answer mod 10^9+7 with fractions                            |
|                                                                |
| KEY FORMULAS:                                                  |
|  E[X+Y] = E[X] + E[Y]            (ALWAYS, even if dependent)  |
|  E[geometric(p)] = 1/p                                         |
|  E[coupon collector, n] = n * H_n                              |
|  Var(X) = E[X^2] - (E[X])^2                                   |
|  Bayes: P(A|B) = P(B|A)*P(A) / P(B)                           |
|  Birthday: collision after ~1.18*sqrt(N) inserts               |
|                                                                |
| KEY CODE:                                                      |
|  ll inv(ll a) { return power(a, MOD-2, MOD); }                |
|  ll prob(ll p, ll q) { return p % MOD * inv(q) % MOD; }       |
|  // DP: E[s] = cost + sum(p_t * E[next]) / (1 - p_stay)       |
|                                                                |
| COMPLEXITY:                                                    |
|  Modular inverse: O(log MOD)  |  E-val DP (DAG): O(n+m)       |
|  Gauss elimination: O(n^3)    |  Matrix exp: O(n^3 log k)     |
|                                                                |
| GOTCHA: Never mix doubles and mod arithmetic.                  |
| GOTCHA: Self-loops need algebraic rearrangement, not recursion.|
+----------------------------------------------------------------+

Gotchas & Debugging

Mistake 1: Forgetting self-loop rearrangement

If state s can transition back to itself with probability p, writing E[s] = 1 + p * E[s] + ... and computing naively gives infinite recursion. Always rearrange to isolate E[s] algebraically.

Mistake 2: Wrong modular inverse

p/qmodM is pq1modM, not (pmodM)/(qmodM). Integer division and modular inverse are completely different operations.

Mistake 3: Probability > 1 in modular world

When debugging, you cannot check if a modular probability "makes sense" by looking at its value. Instead, verify with small cases using floating-point separately, then trust your modular code.

Mistake 4: Off-by-one in geometric distribution

E[geometric(p)] counts the successful trial too. The expected number of trials until success (inclusive) is 1/p. The expected number of failures before success is (1p)/p. Know which one your problem asks for.

Mistake 5: Applying linearity to products

E[XY]=E[X]E[Y] is only true when X and Y are independent. Linearity of expectation is about sums, not products. This is the #1 conceptual error.

Edge cases

  • Probability 0: if a state is unreachable, its expected value is undefined. Guard against division by zero.
  • n=1 in coupon collector: E=1 (you already get the only type on the first purchase).
  • Absorbing states: make sure E[absorbing]=0 is set before the DP loop.
  • Graph not connected: if target is unreachable from source, expected value is infinite -- the problem is ill-defined or you need to output 1.

Debugging tips

  1. Test with small cases (n5) using brute-force simulation (doubles).
  2. Compare your modular answer by converting the brute-force double to a fraction p/q and checking pq1modM.
  3. Print intermediate E[s] values -- they should form a monotonically decreasing sequence from source to target in many problems.

Mental Traps

"I need the full probability distribution to compute the expected value." For most problems you need only E[X], and linearity of expectation or a recurrence gives it directly. Computing the full distribution (P(X=0),P(X=1),) and then summing kP(X=k) is almost always more work than necessary. The distribution is a detour; the expected value is the destination.

DISTRIBUTION vs DIRECT EXPECTATION:

  "Expected # of heads in 100 fair flips"

  Distribution approach:                 Linearity approach:
  +----------------------------+         +----------------------------+
  | Compute P(X=k) for each k  |         | E[X] = 100 * P(one head)  |
  | = 0..100 using C(100,k)    |         |      = 100 * 0.5          |
  | Sum k * P(X=k)             |         |      = 50                 |
  | -> dozens of binomial      |         |                            |
  |    coefficient evaluations |         | One line. Done.            |
  +----------------------------+         +----------------------------+

"Expected value problems are different from DP problems." In competitive programming, expected value computations are almost always implemented as DP. The state is "where am I in the process," the transition weights are probabilities, and the value is the expected cost-to-go. Recognizing that E[state]=cost+ptE[next] is a DP recurrence -- not a probability formula -- is what makes these problems accessible.


Common Mistakes

  1. Treating every random walk as a sum of geometric trials. The frog that jumps +1 with probability p and stays put with probability 1p reaches position n in expected time n/p -- each step is a fresh geometric variable with parameter p. Tempting to reuse the formula when the frog jumps +1 with probability p and 1 with probability 1p, but now the walk can revisit positions and the steps are no longer independent; the correct setup is the recurrence Ei=1+pEi+1+(1p)Ei1 solved as a system. Linearity of expectation always works for decomposing sums, but E=n/p only applies when each "step" is a fresh independent trial. When there is backtracking or state dependence, use DP or equation systems.

Debug Drills

Drill 1. Expected value DP with backward recurrence.

cpp
// Expected tosses to get n consecutive heads
double E[MAXN];
E[0] = 0;  // 0 consecutive heads: we're done... wait, we WANT n heads
for (int i = 1; i <= n; i++) {
    E[i] = 1 + 0.5 * E[i-1] + 0.5 * E[0];
    // heads: progress to i consecutive, tails: reset to 0
}
cout << E[n] << endl;
What's wrong?

The recurrence is backwards. E[i] should represent "expected tosses to reach n consecutive heads starting from i consecutive heads already." So E[n] = 0 (done), and we want E[0]. The recurrence should be: E[i]=1+0.5E[i+1]+0.5E[0] for i<n.

Fix: Set E[n] = 0, then solve from i=n1 down to 0. This creates a system where E[0] appears in every equation -- substitute and solve algebraically, or iterate until convergence.

Drill 2. Random walk with bidirectional dependency.

cpp
// Expected steps from node 0 to node n in a path graph
// At each node i (0 < i < n), move to i-1 or i+1 with prob 1/2
double E[MAXN];
E[n] = 0;
for (int i = n - 1; i >= 0; i--) {
    E[i] = 1 + 0.5 * E[i + 1] + 0.5 * E[i - 1];
}
What's wrong?

When computing E[i], the formula references E[i-1] -- but E[i-1] hasn't been computed yet (we're iterating downward)! This isn't a simple DP; it's a system of linear equations because of the bidirectional dependency. You can't solve it with a single-pass loop.

Fix: Use Gaussian elimination, or exploit the tridiagonal structure analytically. For the symmetric random walk on a path, the closed-form is E[i]=(ni)(n+i) (from node i to node n, so E[0]=n2).

Drill 3. Modular probability with integer division.

cpp
const int MOD = 1e9 + 7;
long long prob[MAXN];
// prob[i] = probability of event i, stored as p/q mod MOD
prob[0] = 1;  // probability 1 = 1/1

for (int i = 1; i <= n; i++) {
    prob[i] = prob[i-1] * 1 / 2 % MOD;  // multiply by 1/2
}
What's wrong?

1 / 2 in C++ is integer division, which gives 0. So prob[i] = 0 for all i1. To divide by 2 in modular arithmetic, you need to multiply by the modular inverse of 2.

Fix:

cpp
long long inv2 = (MOD + 1) / 2;  // modular inverse of 2 mod 1e9+7
// (works because MOD is odd, so (MOD+1)/2 * 2 ≡ 1 mod MOD)
prob[i] = prob[i-1] % MOD * inv2 % MOD;

Self-Test

  • [ ] State linearity of expectation and explain why it holds even when X and Y are dependent
  • [ ] Solve "expected # of coin flips until first head" using the geometric distribution formula -- and verify with the recurrence E=1+(1/2)E
  • [ ] Decompose "expected # of fixed points in a random permutation of [1..n]" into indicator variables and compute the answer
  • [ ] Set up the expected-value DP for a random walk on {0,1,2,3} with absorbing barrier at 3 -- write the system of equations
  • [ ] Explain when E[XY]=E[X]E[Y] holds and give a counterexample where it fails

Practice Problems

CF RatingWhat You Should Be Able To Do
1200Compute basic probabilities; know that E[X+Y]=E[X]+E[Y] always holds
1500Apply linearity of expectation to avoid complex combinatorics; use E=1/p for geometric distributions
1800Write expected value DP (on arrays, trees, DAGs); set up and solve systems of linear equations for random walks
2100Handle probability with bitmask DP; combine EV with segment trees or other data structures; solve Markov chain problems with Gaussian elimination
#ProblemSourceDifficultyKey Idea
1Dice ProbabilityCSESEasyBasic probability DP
2Ball in BerlandCF 1475DEasyIndicator variables + counting
3Random MoodCF 839CMediumExpected value DP on tree
4Ants on a CircleAtCoder ABC 250EMediumLinearity of expectation
5Dice TowerCF 1823DMediumGeometric distribution + DP
6JourneyCF 839CMediumE-value DP on DAG
7PlaylistCF 1151FMedium-HardLinearity + modular inverse
8Berland MinersCF 148DHardE-value with self-loops
9Walking on a TreeCF 1073EHardGaussian elimination on graph
10Ant ColonyCF 1525EHardMarkov chain + matrix exp
11Dishonest SellersCF 779CEasyBasic probability + complementary counting
12Magic GemsCF 1117DMediumExpected value with matrix exponentiation
13Nezzar and BoardCF 1478CMediumConditional probability reasoning
14Random EventsCF 1461DMediumIndependent events, complementary probability
15Game on LeavesCF 1363CMediumProbability on trees, Bayes reasoning
16Birthday ParadoxCSESHardBirthday paradox / collision analysis

Further Reading

Built for the climb from Codeforces 1100 to 2100.