Appearance
Probability & Expected Value
Quick Revisit
- USE WHEN: problem asks "expected value" or "probability of X" — especially with linearity of expectation
- INVARIANT: E[X₁+X₂+...] = E[X₁]+E[X₂]+... regardless of dependence (linearity of expectation)
- TIME: O(n) for linearity-of-expectation reframe; O(n²) for expected-value DP | SPACE: O(n)
- CLASSIC BUG: forgetting modular inverse for probability under mod — P = a/b becomes a·b⁻¹ mod p
- PRACTICE: 08-ladder-mixed
Linearity of expectation, geometric distribution, expected value DP, and probability on graphs. The single most underrated toolkit at CF 1600+ -- once you see it, half the "hard math" problems become straightforward.
Difficulty: [Intermediate]Prerequisites: Modular Arithmetic, 1-D DPCF Rating Range: 1600-2200
Three modes — pick one before you write code
Contest probability problems land in one of three modes. Recognising which mode applies determines the entire solution shape:
| Mode | Trigger | Tool |
|---|---|---|
| 1. Direct probability counting | Finite uniform sample space, "what is P(event)?" | $\frac{ |
| 2. Expected value via indicators | "Expected number of [X]" with linearity of expectation | |
| 3. Expected value DP / equations | Random walks, absorbing states, "expected steps until..." | Set up |
Reach for indicators before DP — many "expected value DP" problems collapse to a one-line indicator sum once you spot the right
Modular probability — concrete example. Under modulus
, the probability is not 1.0/6.0and certainly not the integer0. It ismod_inv(6) = pow(6, p-2, p) = 166666668. Verify:. Every fractional probability in a modular problem becomes a multiplication by the modular inverse of the denominator. If you ever see (int)(prob * something)mixing floats and mod, the answer is wrong even if the test passes.
Contents
- Intuition
- Visual Intuition (ASCII Art)
- Foundational Probability
- C++ STL Reference
- Implementation (Contest-Ready)
- Operations Reference
- Problem Patterns & Tricks
- Contest Cheat Sheet
- Gotchas & Debugging
- Self-Test
- Practice Problems
- Further Reading
Intuition
Roll a fair 6-sided die until you get a 6. How many rolls does it take on average?
Naive approach: simulate
Trial 1: 3 5 2 6 -> 4 rolls
Trial 2: 1 6 -> 2 rolls
Trial 3: 4 4 3 2 5 1 6 -> 7 rolls
Trial 4: 6 -> 1 roll
Trial 5: 2 3 6 -> 3 rolls
...
Average over 10^6 trials ~= 6.0You get approximately 6.0, but the contest asks for the exact answer as a fraction or modulo
We need a way to compute expected values symbolically, without enumerating an exponential or infinite sample space.
Linearity of expectation lets you split complex expected values into simple independent parts --
Analogy: to find the expected number of heads in 100 coin flips, just add up
This analogy is exact -- it works for any random variables, dependent or not. The only place it breaks is when you need
Visual Walkthrough
Coupon collector: there are 4 coupon types. Each purchase gives a uniformly random coupon. Expected purchases to collect all 4?
Decompose into phases using linearity of expectation:
Phase 0: You have 0 types. Next purchase gives a NEW type with prob 4/4.
Phase 1: You have 1 type. Next NEW type with prob 3/4.
Phase 2: You have 2 types. Next NEW type with prob 2/4.
Phase 3: You have 3 types. Next NEW type with prob 1/4.Each phase is a geometric random variable -- expected
+-----------------------------------------------------------+
| Phase | Types owned | P(new) | E[purchases in phase] |
|-------|-------------|--------|----------------------------|
| 0 | 0 | 4/4 | 4/4 = 1.00 |
| 1 | 1 | 3/4 | 4/3 = 1.33 |
| 2 | 2 | 2/4 | 4/2 = 2.00 |
| 3 | 3 | 1/4 | 4/1 = 4.00 |
+-----------------------------------------------------------+
| TOTAL (by linearity): 1 + 4/3 + 2 + 4 = 25/3 ~= 8.33 |
+-----------------------------------------------------------+Step-by-step trace (one possible run):
Step 1: Buy coupon -> type A (new!) collected = {A}
Step 2: Buy coupon -> type C (new!) collected = {A, C}
Step 3: Buy coupon -> type A (dup) collected = {A, C}
Step 4: Buy coupon -> type B (new!) collected = {A, B, C}
Step 5: Buy coupon -> type B (dup) collected = {A, B, C}
Step 6: Buy coupon -> type A (dup) collected = {A, B, C}
Step 7: Buy coupon -> type D (new!) collected = {A, B, C, D} DONE!Brute force would enumerate all possible sequences (infinite!). Linearity of expectation gives the exact answer
COUPON COLLECTOR -- PHASE DECOMPOSITION (n=4 types):
Phase: ---- 0 ---- ---- 1 ---- ----- 2 ----- --------- 3 ---------
Owned: 0 types 1 type 2 types 3 types
P(new): 4/4 3/4 2/4 1/4
E[tries]: 1.0 1.33 2.0 4.0
Timeline of one possible run:
+---+---+---+---+---+---+---+
| A | C | A | B | B | A | D |
+---+---+---+---+---+---+---+
new new dup new dup dup new
<0> <-1-> <--2--> <------3------>
Total E = 1 + 4/3 + 2 + 4 = 25/3 ~ 8.33For
The Invariant
+------------------------------------------------------------------+
| CORE FORMULAS |
| |
| 1. Linearity: E[X] = sum of E[X_i] for indicator vars X_i. |
| (Even if X_i are dependent!) |
| |
| 2. Geometric distribution: repeated independent trials with |
| success probability p. E[trials until first success] = 1/p. |
| |
| 3. Expected value DP: |
| E[state] = sum over transitions of |
| prob(t) * (cost(t) + E[next_state(t)]) |
| |
| 4. Indicator variable trick: |
| E[count of events] = sum of P(event_i) |
+------------------------------------------------------------------+Look at the coupon collector walkthrough: each phase is an independent geometric variable with success probability
When to Reach for This
Trigger phrases:
- "expected value" / "expected number of steps"
- "probability that..." / "what is the probability"
- "random process" / "each step independently"
- "answer modulo
" combined with any of the above (signals modular inverse for probability)
Counter-examples:
- "Count the number of ways" without "expected" -> pure combinatorics, not expected value. See Combinatorics Basics.
- "Minimize the expected cost where you choose actions" -> this is optimization/game theory, not pure probability. You may still use E-value DP, but the recurrence has a
or over choices. - Deterministic simulation of a process -> no randomness, no probability.
What the Code Won't Teach You
The DP fills a table and produces a number. What it hides:
Linearity of expectation does not require independence. This is the single most powerful -- and most counterintuitive -- fact in competitive-programming probability.
Indicator decomposition is the meta-skill. Any "expected count" problem decomposes: let
INDICATOR DECOMPOSITION -- THE PATTERN:
Problem: "Expected # of elements in correct position in random permutation"
Step 1: Define X_i = 1 if sigma(i) = i (indicator for fixed point)
Step 2: E[X_i] = P(sigma(i) = i) = 1/n (by symmetry)
Step 3: E[total] = Sum E[X_i] = n * (1/n) = 1
+-----------------------------------------------------+
| No need to enumerate permutations. |
| No need to track dependencies between X_i and X_j. |
| Linearity handles everything. |
+-----------------------------------------------------+Visual Intuition (ASCII Art)
Diagram 1 -- Indicator variable decomposition for "expected number of matching pairs" in a random permutation of [1..4]:
Permutation: [2, 1, 4, 3]
Pair positions: (1,2) (1,3) (1,4) (2,3) (2,4) (3,4)
Match? (a_i=i): N N N N N N
X_i = 1 if position i has a_i = i (a "fixed point")
E[# fixed points] = sum E[X_i] = 4 * (1/4) = 1
+-----+-----+-----+-----+
| pos | 1 | 2 | 3 | 4 |
|-----|-----|-----|-----|-----|
| val | 2 | 1 | 4 | 3 |
| X_i | 0 | 0 | 0 | 0 |
+-----+-----+-----+-----+-----+
P(X_i = 1) = 1/4 for each i (by symmetry)
E[total] = 4 * 1/4 = 1 (by linearity)Diagram 2 -- Expected value DP: random walk on a line, 0 to
Start at 0. Each step: move right with prob
State diagram (n=3):
1-p 1-p 1-p
+------+ +------+ +------+
| v | v | v
| +---+---+---+---+---+---+---+
+--| 0 |-->| 1 |-->| 2 |-->| 3 | (absorbing)
+---+ p +---+ p +---+ p +---+
^ | |
+-------+ |
reflect |
at 0 |
|
E[0] = 1 + p*E[1] + (1-p)*E[0] (self-loop at 0)
E[1] = 1 + p*E[2] + (1-p)*E[0]
E[2] = 1 + p*E[3] + (1-p)*E[1]
E[3] = 0 (target)Solve the linear system for
Foundational Probability
The sections above establish linearity of expectation and the geometric distribution -- the two workhorses of CP probability. But many contest problems (and virtually all quant/HFT interviews) require sharper tools: Bayes' theorem, named distributions, variance, and simulation. This section fills those gaps.
Basic Axioms & Rules
A sample space
Three axioms (everything else derives from these):
for every event . . - For mutually exclusive events
: .
Derived rules you'll use constantly:
Complementary counting is the single most useful probability trick at CF 1100-1600: when computing
Worked example -- two dice:
Roll two fair 6-sided dice. What is
All 36 equally likely outcomes (die1, die2):
Sum >= 10 outcomes:
(4,6) (5,5) (5,6) (6,4) (6,5) (6,6) --> 6 outcomes
Direct: P = 6/36 = 1/6
Complement: P(sum < 10) = 30/36, so P(sum >= 10) = 1 - 30/36 = 1/6Inclusion-exclusion generalizes to more events:
+-------------------------------------------------------+
| INCLUSION-EXCLUSION (two events, Venn diagram) |
| |
| +----------+ |
| | A | |
| | +----+------+ |
| | | A n B | |
| +-----+----+ B | |
| | +------+ |
| |
| P(A u B) = P(A) + P(B) - P(A n B) |
+-------------------------------------------------------+Conditional Probability & Bayes' Theorem
Conditional probability -- the probability of
This is the foundation of all "given that..." reasoning.
Multiplication rule (rearranging):
Bayes' theorem reverses the conditioning direction:
The denominator expands via the law of total probability:
Example 1 -- Disease testing (classic quant interview):
A disease affects 1 in 1000 people. A test is 99% accurate (99% sensitivity, 99% specificity). You test positive. What is
Let D = "has disease", T+ = "tests positive"
Given:
P(D) = 0.001 (base rate / prior)
P(T+|D) = 0.99 (sensitivity)
P(T+|~D) = 0.01 (false positive rate)
Compute P(T+):
P(T+) = P(T+|D)*P(D) + P(T+|~D)*P(~D)
= 0.99 * 0.001 + 0.01 * 0.999
= 0.00099 + 0.00999
= 0.01098
Apply Bayes:
P(D|T+) = P(T+|D)*P(D) / P(T+)
= 0.00099 / 0.01098
= 0.0902 (~9%)
SURPRISE: 91% chance you DON'T have it despite testing positive.
The low base rate dominates.Visualize with natural frequencies (often clearer than formulas):
+------------------------------------------------------------+
| Population: 100,000 people |
| |
| +----------+ +--------------------------------------+ |
| | Diseased | | Healthy | |
| | 100 | | 99,900 | |
| | | | | |
| | Test+:99 | | Test+: 999 Test-: 98,901 | |
| | Test-: 1 | | | |
| +----------+ +--------------------------------------+ |
| |
| P(D | T+) = 99 / (99 + 999) = 99/1098 ~ 9.0% |
+------------------------------------------------------------+Example 2 -- Monty Hall problem:
Three doors: one car, two goats. You pick door 1. The host (who knows what's behind each door) opens door 3, revealing a goat. Should you switch?
Initial setup:
Door 1 Door 2 Door 3
(You) (???) (Host opens: goat)
Case analysis (car equally likely behind any door):
Car behind 1: Host opens 2 or 3 (say 3). Switch -> LOSE. P = 1/3
Car behind 2: Host MUST open 3. Switch -> WIN. P = 1/3
Car behind 3: Host MUST open 2 (not 3). (Contradicts host opening 3.)
Given host opened 3:
P(car = 1 | host opens 3) = 1/3
P(car = 2 | host opens 3) = 2/3
SWITCH. Switching wins 2/3 of the time.The Bayes calculation:
Why Bayes matters for CP: any problem where you observe partial information and must update beliefs. "Given that event
cpp
#include <bits/stdc++.h>
using namespace std;
// Bayes update: given prior[i] and likelihood[i] = P(evidence | hypothesis i),
// return posterior[i] = P(hypothesis i | evidence).
vector<double> bayes_update(vector<double>& prior, vector<double>& likelihood) {
int n = prior.size();
double evidence = 0;
for (int i = 0; i < n; i++) evidence += prior[i] * likelihood[i];
vector<double> posterior(n);
for (int i = 0; i < n; i++)
posterior[i] = prior[i] * likelihood[i] / evidence;
return posterior;
}Independence: events
Discrete Distributions
A probability distribution describes the behavior of a random variable. These named distributions appear repeatedly in CP and quant interviews -- recognize them instantly.
Bernoulli( )
A single trial: success with probability
The building block for everything else. An indicator random variable is Bernoulli.
cpp
#include <bits/stdc++.h>
using namespace std;
// Bernoulli trial: returns 1 with probability p
int bernoulli(mt19937& rng, double p) {
return uniform_real_distribution<double>(0, 1)(rng) < p ? 1 : 0;
}Binomial( )
Number of successes in
When it appears: "flip
Binomial(10, 0.5) PMF:
P(k)
0.25 + *
| * *
0.20 + * *
| * *
0.15 +
| * *
0.10 +
| * *
0.05 + * *
| * *
0.00 +* *
+--+--+--+--+--+--+--+--+--+--+--
0 1 2 3 4 5 6 7 8 9 10 kcpp
#include <bits/stdc++.h>
using namespace std;
// P(X = k) for Binomial(n, p) -- use log to avoid overflow
double binom_pmf(int n, int k, double p) {
double log_prob = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
+ k * log(p) + (n - k) * log(1 - p);
return exp(log_prob);
}For CP mod-prime problems, compute
Geometric( )
Number of independent trials until the first success (inclusive).
This is the distribution behind the coupon collector phases in Section 2 above -- each phase is geometric with success probability
Memoryless property:
Geometric(p=0.3) -- probability of first success on trial k:
k: 1 2 3 4 5 6 7 ...
P: 0.300 0.210 0.147 0.103 0.072 0.050 0.035 ...
***
*** ***
*** *** **
*** *** ** *
*** *** ** * * *Poisson( )
Number of events in a fixed interval when events occur independently at rate
When it appears: modeling arrivals (network packets, server requests -- quant interview staple), approximating Binomial(
Key property: sum of independent Poisson variables is Poisson:
cpp
#include <bits/stdc++.h>
using namespace std;
// P(X = k) for Poisson(lambda)
double poisson_pmf(double lambda, int k) {
return exp(k * log(lambda) - lambda - lgamma(k + 1));
}Hypergeometric
Drawing without replacement. An urn has
When it appears: card-drawing problems, lottery calculations, sampling without replacement. Key difference from binomial: draws are dependent because there's no replacement. As
+-----------------------------------------------------------+
| DISCRETE DISTRIBUTIONS -- QUICK REFERENCE |
|-----------------------------------------------------------|
| Name | E[X] | Var(X) | Key property |
|-----------------|--------|--------------|-----------------|
| Bernoulli(p) | p | p(1-p) | single trial |
| Binomial(n,p) | np | np(1-p) | n ind. trials |
| Geometric(p) | 1/p | (1-p)/p^2 | memoryless |
| Poisson(lam) | lam | lam | rare events |
| Hypergeometric | nK/N | (see above) | no replacement |
+-----------------------------------------------------------+Continuous Distributions (Brief, for Interviews)
These rarely appear in CP (which deals with discrete structures) but are quant/HFT interview staples. Know the formulas and key properties.
Uniform( )
All values in
Classic interview problem: "Two people agree to meet between 12:00 and 1:00. Each arrives at a uniformly random time and waits 15 minutes. What is the probability they meet?"
Answer:
Exponential( )
Time until the next event in a Poisson process.
Memoryless property (continuous analog of geometric):
CDF:
Normal / Gaussian( )
The most important continuous distribution, thanks to the Central Limit Theorem.
68-95-99.7 rule:
99.7% (within 3 sigma)
|<------------------------------>|
| 95% (within 2 sigma) |
| |<---------------------->| |
| | 68% (within 1 sigma)| |
| | |<-------------->| | |
| | | | | |
-----+--+--+-------*-------+--+-+-----+-----
mu-3s mu-s mu mu+s mu+2s mu+3sCentral Limit Theorem (CLT): the average of
Interview application: "Flip 10000 fair coins. Approximate
+-----------------------------------------------------------+
| CONTINUOUS DISTRIBUTIONS -- QUICK REFERENCE |
|-----------------------------------------------------------|
| Name | E[X] | Var(X) | Key property |
|------------------|----------|--------------|--------------|
| Uniform(a,b) | (a+b)/2 | (b-a)^2/12 | flat PDF |
| Exponential(lam) | 1/lam | 1/lam^2 | memoryless |
| Normal(mu,s^2) | mu | s^2 | CLT limit |
+-----------------------------------------------------------+Variance & Covariance
Variance measures how spread out a distribution is. You need it when problems ask for "the variance of ..." or when analyzing the reliability of Monte Carlo estimates.
Computational formula (almost always easier than the definition):
The definition
Scaling:
Adding a constant shifts the distribution but doesn't change its spread.
Variance of a sum:
Covariance:
If
For
Worked example -- sum of two dice:
Let S = D_1 + D_2 (two independent fair dice).
E[D_i] = (1+2+3+4+5+6)/6 = 3.5
E[D_i^2] = (1+4+9+16+25+36)/6 = 91/6
Var(D_i) = 91/6 - (3.5)^2 = 91/6 - 49/4 = 35/12 ~ 2.917
D_1, D_2 independent:
Var(S) = Var(D_1) + Var(D_2) = 35/12 + 35/12 = 35/6 ~ 5.833
Std dev = sqrt(35/6) ~ 2.415When variance matters in CP: some problems ask for the variance or standard deviation of a random quantity (not just its expectation). Approach: compute
cpp
#include <bits/stdc++.h>
using namespace std;
// Compute E[X] and Var(X) from samples or a PMF
// pmf[i] = {value, probability}
pair<double, double> mean_variance(vector<pair<double, double>>& pmf) {
double ex = 0, ex2 = 0;
for (auto& [val, prob] : pmf) {
ex += val * prob;
ex2 += val * val * prob;
}
return {ex, ex2 - ex * ex};
}Covariance in practice: when random variables are not independent (e.g., two indicator variables for overlapping events), you must compute covariance explicitly. For indicators
Classic Probability Problems (Worked Solutions)
These appear in CP contests, quant interviews, and mathematical olympiads. Recognize the pattern, apply the formula.
Coupon Collector
(Detailed walkthrough in Section 2 above.) For
The variance is:
For
Birthday Paradox
In a group of
For
Application in CS: hash collision analysis. With a hash space of size
+----------------------------------------------+
| Birthday paradox: P(collision) vs group size |
| |
| P 1.0 + **** |
| 0.8 + ***** |
| 0.6 + **** |
| 0.5 +- - - - - -* - - - - - - - |
| 0.4 + * |
| 0.2 + ** |
| 0.0 +**** |
| +--+--+--+--+--+--+--+--+--> k |
| 0 5 10 15 20 25 30 40 50 |
+----------------------------------------------+cpp
#include <bits/stdc++.h>
using namespace std;
// Minimum group size k for P(collision) >= threshold in space of size n
int birthday_bound(long long n, double threshold = 0.5) {
double p_no_collision = 1.0;
for (int k = 1; ; k++) {
p_no_collision *= (double)(n - k + 1) / n;
if (1.0 - p_no_collision >= threshold) return k;
}
}Gambler's Ruin
A gambler starts with
Probability of ruin starting from
Expected duration (for
Fortune over time (random walk, absorbing barriers at 0 and N):
N + * (WIN)
| * * *
| * * * * * * *
| * * * * * *
a + - - - * * * * <-- start
| *
| * *
| **
0 +* (RUIN)
+--+--+--+--+--+--+--+--+--+--+--> roundsWhen it appears: random walk problems, stopping time analysis, competitive betting models in quant interviews. Also connects to the expected value DP on random walks (Section 3 diagram above).
Ballot Problem
Candidate A gets
This connects to Catalan numbers and lattice path counting. For
Generalization (Cycle Lemma): the number of cyclic permutations of a sequence of
Dice Puzzles
Expected rolls to see all 6 faces -- coupon collector with
Expected value of the max of
Proof: CDF of the max is
Common quant interview question: "I draw
Expected number of rolls of a die until sum exceeds
Monte Carlo Simulation
When exact computation is intractable, simulate. The Law of Large Numbers guarantees that the sample average converges to the true expectation as the number of trials
When to use Monte Carlo:
- Verifying your exact formula on small inputs (stress testing).
- Getting approximate answers when no closed form exists.
- Computing integrals numerically (e.g., estimating
).
Basic template:
cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
uniform_real_distribution<double> real01(0.0, 1.0);
uniform_int_distribution<int> die(1, 6);
const int TRIALS = 1'000'000;
int hits = 0;
for (int i = 0; i < TRIALS; i++) {
// simulate one trial
int s = die(rng) + die(rng) + die(rng);
if (s > 15) hits++;
}
double prob = (double)hits / TRIALS;
printf("Estimated P(3 dice sum > 15): %.6f\n", prob);
// Exact: 10/216 = 0.046296...
}Estimating
+---+---+---+---+---+
| . . * . | 1.0
| * * . * |
| * * * . * |
| * . * * * |
| * * . | 0.5
| . * * . |
| * . |
| * . |
+---+---+---+---+---+
0.0 1.0
* = inside quarter-circle (x^2 + y^2 <= 1)
. = outside quarter-circle
pi ~ 4 * (count of *) / (total points)cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
mt19937 rng(42);
uniform_real_distribution<double> dist(0.0, 1.0);
int inside = 0, total = 1'000'000;
for (int i = 0; i < total; i++) {
double x = dist(rng), y = dist(rng);
if (x * x + y * y <= 1.0) inside++;
}
printf("pi ~ %.6f\n", 4.0 * inside / total); // ~3.1416
}Stress testing: when you derive an exact formula, always verify with Monte Carlo on small inputs before submitting.
cpp
#include <bits/stdc++.h>
using namespace std;
// Verify: expected fixed points in a random permutation = 1.0
int main() {
mt19937 rng(42);
int n = 10;
const int TRIALS = 1'000'000;
long long total_fixed = 0;
for (int t = 0; t < TRIALS; t++) {
vector<int> perm(n);
iota(perm.begin(), perm.end(), 0);
shuffle(perm.begin(), perm.end(), rng);
for (int i = 0; i < n; i++)
if (perm[i] == i) total_fixed++;
}
printf("E[fixed points] ~ %.4f (exact: 1.0000)\n",
(double)total_fixed / TRIALS);
}+-------------------------------------------------------------------+
| MONTE CARLO PRECISION GUIDE |
|-------------------------------------------------------------------|
| Trials | Typical error (p ~ 0.5) | Good for... |
|-------------|-------------------------|---------------------------|
| 10^4 | ~0.005 | Quick sanity check |
| 10^5 | ~0.0016 | 3-digit accuracy |
| 10^6 | ~0.0005 | Most stress tests |
| 10^7 | ~0.00016 | High-precision verify |
|-------------------------------------------------------------------|
| Error ~ 1/sqrt(TRIALS). To halve the error, quadruple trials. |
+-------------------------------------------------------------------+C++ STL Reference
| Function / Class | Header | What it does | Time |
|---|---|---|---|
mt19937 rng(seed) | <random> | Mersenne Twister PRNG, 32-bit | |
uniform_int_distribution<>(a,b) | <random> | Uniform random int in | |
uniform_real_distribution<>(a,b) | <random> | Uniform random double in | |
__gcd(a, b) / gcd(a,b) | <algorithm> / <numeric> | GCD (needed for fraction simplification) | |
__int128 | built-in | 128-bit int for modular arithmetic overflow avoidance | -- |
For contest probability problems, you rarely use <random> directly -- the key operations are modular inverse and DP. The STL's main role is providing gcd and fast I/O.
Implementation (Contest-Ready)
5.1 Minimal: Modular inverse for probability mod prime
cpp
#include <bits/stdc++.h>
using namespace std;
const long long MOD = 1e9 + 7;
long long power(long long base, long long exp, long long mod) {
long long result = 1;
base %= mod;
while (exp > 0) {
if (exp & 1) result = result * base % mod;
base = base * base % mod;
exp >>= 1;
}
return result;
}
long long inv(long long a, long long mod = MOD) {
return power(a, mod - 2, mod);
}
// Probability p/q mod MOD = p * inv(q) mod MOD
long long prob_mod(long long p, long long q) {
return p % MOD * inv(q) % MOD;
}
int main() {
// Coupon collector with n types: E = n * H_n
int n;
cin >> n;
long long ans = 0;
for (int k = 1; k <= n; k++)
ans = (ans + prob_mod(n, k)) % MOD;
cout << ans << "\n";
}5.2 Minimal: Expected value DP
cpp
#include <bits/stdc++.h>
using namespace std;
const long long MOD = 1e9 + 7;
long long power(long long b, long long e, long long m) {
long long r = 1; b %= m;
while (e > 0) { if (e & 1) r = r * b % m; b = b * b % m; e >>= 1; }
return r;
}
long long inv(long long a) { return power(a, MOD - 2, MOD); }
int main() {
int n;
cin >> n;
vector<vector<pair<int,long long>>> adj(n); // adj[u] = {(v, prob_numerator)}
// Read graph / transitions ...
// E[target] = 0, solve backwards (topological or Gaussian)
vector<long long> E(n, 0);
// Example: DAG in reverse topological order
// E[u] = sum over (v, p): p/total * (1 + E[v])
cout << E[0] << "\n";
}5.3 Explained: Expected value DP for die rolls
cpp
#include <bits/stdc++.h>
using namespace std;
const long long MOD = 1e9 + 7;
long long power(long long b, long long e, long long m) {
long long r = 1; b %= m;
while (e > 0) { if (e & 1) r = r * b % m; b = b * b % m; e >>= 1; }
return r;
}
long long inv(long long a) { return power(a, MOD - 2, MOD); }
int main() {
// Problem: roll a 6-sided die. You start with score 0.
// Each roll adds the face value to your score.
// Expected number of rolls to reach score >= n?
//
// E[s] = expected rolls to reach >= n starting from score s.
// E[s] = 0 if s >= n.
// E[s] = 1 + (1/6) * sum_{f=1}^{6} E[s + f]
//
// Compute backwards from s = n-1 down to 0.
int n;
cin >> n;
vector<long long> E(n + 7, 0); // E[s] = 0 for s >= n
long long inv6 = inv(6);
for (int s = n - 1; s >= 0; s--) {
// E[s] = 1 + (1/6) * sum_{f=1..6} E[s+f]
long long total = 0;
for (int f = 1; f <= 6; f++) {
int ns = s + f;
if (ns >= n)
total = (total + 0) % MOD; // E[ns] = 0
else
total = (total + E[ns]) % MOD;
}
// E[s] = 1 + total * inv(6)
E[s] = (1 + total % MOD * inv6 % MOD) % MOD;
}
cout << E[0] << "\n";
}5.4 Gaussian elimination for E-value on graphs with cycles
cpp
#include <bits/stdc++.h>
using namespace std;
// Solve system of linear equations over doubles.
// a[i] is row i with n+1 columns (augmented matrix).
// Returns solution vector x, or empty if no unique solution.
vector<double> gauss(vector<vector<double>>& a) {
int n = a.size();
for (int col = 0; col < n; col++) {
int pivot = -1;
for (int row = col; row < n; row++)
if (abs(a[row][col]) > 1e-9) { pivot = row; break; }
if (pivot == -1) return {}; // singular
swap(a[col], a[pivot]);
double div = a[col][col];
for (int j = col; j <= n; j++) a[col][j] /= div;
for (int row = 0; row < n; row++) {
if (row == col) continue;
double factor = a[row][col];
for (int j = col; j <= n; j++)
a[row][j] -= factor * a[col][j];
}
}
vector<double> x(n);
for (int i = 0; i < n; i++) x[i] = a[i][n];
return x;
}
int main() {
// E[i] = 1 + (1/deg(i)) * sum_{j in adj(i)} E[j]
// E[target] = 0
// Rearrange: E[i] - (1/deg(i)) * sum E[j] = 1
// Build augmented matrix and solve.
int n, m, target;
cin >> n >> m >> target;
vector<vector<int>> adj(n);
for (int i = 0; i < m; i++) {
int u, v; cin >> u >> v;
adj[u].push_back(v);
adj[v].push_back(u);
}
// n equations, n unknowns
vector<vector<double>> eq(n, vector<double>(n + 1, 0.0));
for (int i = 0; i < n; i++) {
if (i == target) {
eq[i][i] = 1.0; // E[target] = 0
eq[i][n] = 0.0;
} else {
int deg = adj[i].size();
eq[i][i] = 1.0;
for (int j : adj[i])
eq[i][j] -= 1.0 / deg;
eq[i][n] = 1.0;
}
}
auto ans = gauss(eq);
if (ans.empty()) cout << "-1\n";
else cout << fixed << setprecision(9) << ans[0] << "\n";
}Operations Reference
| Operation | Time | Space |
|---|---|---|
| Modular inverse (Fermat) | ||
| Precompute inverse table | ||
| Expected value DP (DAG, | ||
| Gaussian elimination ( | ||
| Coupon collector ( | ||
| Linearity of expectation (sum of | ||
| Matrix exponentiation for Markov chain ( |
Problem Patterns & Tricks
Pattern 1: Indicator Variable Decomposition
Instead of computing
Example: Expected number of inversions in a random permutation of
CF 1Expected -- CF 1042B, CF 1392D
Pattern 2: Coupon Collector / Geometric Phases
Decompose a "collect all" process into phases. Phase
E = sum_{k=0}^{n-1} n / (n - k) = n * (1/1 + 1/2 + ... + 1/n)CF 1Expected -- CF 1Expected280C, AtCoder ABC 242E
Pattern 3: Expected Value DP on DAGs
Define
If the state graph is a DAG, compute in reverse topological order. No linear system needed.
cpp
// Reverse topo order
for (int u : rev_topo) {
E[u] = 1; // cost of being in state u
for (auto [v, p] : transitions[u])
E[u] = (E[u] + p * E[v]) % MOD;
}CF 518D, CF 839C
Pattern 4: Self-loop elimination in E-value DP
When
Rearrange:
This avoids Gaussian elimination when the only cycle is a self-loop.
SELF-LOOP ELIMINATION IN E-VALUE DP:
State s can stay with prob p, or move to t with prob 1-p.
Naive recurrence (INFINITE LOOP):
E[s] = 1 + p*E[s] + (1-p)*E[t] <-- E[s] on both sides!
Algebraic fix:
E[s] - p*E[s] = 1 + (1-p)*E[t]
(1-p)*E[s] = 1 + (1-p)*E[t]
E[s] = 1/(1-p) + E[t]
+--------+ p (self-loop) +--------+
| State s| <---------------| State s|
| | --------------->| State t|
+--------+ 1-p +--------+
Example: p = 1/2, E[t] = 0 -> E[s] = 1/(1/2) + 0 = 2CF 1Expected -- CF 1073E, CF 148D
Pattern 5: Probability modulo prime
When the answer is
Never mix floating-point and modular arithmetic. Do everything in modular world from the start.
CF 1Expected -- CF 1151F, CF 963A
Pattern 6: Markov Chain + Matrix Exponentiation
For a random walk on a small state space (
The probability of being in state
CF 1Expected -- CF 691E, AtCoder ABC 256E
Contest Cheat Sheet
+----------------------------------------------------------------+
| PROBABILITY & EXPECTED VALUE -- Quick Reference |
|----------------------------------------------------------------|
| WHEN TO USE: |
| - "expected number of ..." / "probability that ..." |
| - Random process with well-defined transitions |
| - Answer mod 10^9+7 with fractions |
| |
| KEY FORMULAS: |
| E[X+Y] = E[X] + E[Y] (ALWAYS, even if dependent) |
| E[geometric(p)] = 1/p |
| E[coupon collector, n] = n * H_n |
| Var(X) = E[X^2] - (E[X])^2 |
| Bayes: P(A|B) = P(B|A)*P(A) / P(B) |
| Birthday: collision after ~1.18*sqrt(N) inserts |
| |
| KEY CODE: |
| ll inv(ll a) { return power(a, MOD-2, MOD); } |
| ll prob(ll p, ll q) { return p % MOD * inv(q) % MOD; } |
| // DP: E[s] = cost + sum(p_t * E[next]) / (1 - p_stay) |
| |
| COMPLEXITY: |
| Modular inverse: O(log MOD) | E-val DP (DAG): O(n+m) |
| Gauss elimination: O(n^3) | Matrix exp: O(n^3 log k) |
| |
| GOTCHA: Never mix doubles and mod arithmetic. |
| GOTCHA: Self-loops need algebraic rearrangement, not recursion.|
+----------------------------------------------------------------+Gotchas & Debugging
Mistake 1: Forgetting self-loop rearrangement
If state E[s] = 1 + p * E[s] + ... and computing naively gives infinite recursion. Always rearrange to isolate
Mistake 2: Wrong modular inverse
Mistake 3: Probability > 1 in modular world
When debugging, you cannot check if a modular probability "makes sense" by looking at its value. Instead, verify with small cases using floating-point separately, then trust your modular code.
Mistake 4: Off-by-one in geometric distribution
Mistake 5: Applying linearity to products
Edge cases
- Probability 0: if a state is unreachable, its expected value is undefined. Guard against division by zero.
in coupon collector: (you already get the only type on the first purchase). - Absorbing states: make sure
is set before the DP loop. - Graph not connected: if target is unreachable from source, expected value is infinite -- the problem is ill-defined or you need to output
.
Debugging tips
- Test with small cases (
) using brute-force simulation (doubles). - Compare your modular answer by converting the brute-force double to a fraction
and checking . - Print intermediate
values -- they should form a monotonically decreasing sequence from source to target in many problems.
Mental Traps
"I need the full probability distribution to compute the expected value." For most problems you need only
DISTRIBUTION vs DIRECT EXPECTATION:
"Expected # of heads in 100 fair flips"
Distribution approach: Linearity approach:
+----------------------------+ +----------------------------+
| Compute P(X=k) for each k | | E[X] = 100 * P(one head) |
| = 0..100 using C(100,k) | | = 100 * 0.5 |
| Sum k * P(X=k) | | = 50 |
| -> dozens of binomial | | |
| coefficient evaluations | | One line. Done. |
+----------------------------+ +----------------------------+"Expected value problems are different from DP problems." In competitive programming, expected value computations are almost always implemented as DP. The state is "where am I in the process," the transition weights are probabilities, and the value is the expected cost-to-go. Recognizing that
Common Mistakes
- Treating every random walk as a sum of geometric trials. The frog that jumps
with probability and stays put with probability reaches position in expected time -- each step is a fresh geometric variable with parameter . Tempting to reuse the formula when the frog jumps with probability and with probability , but now the walk can revisit positions and the steps are no longer independent; the correct setup is the recurrence solved as a system. Linearity of expectation always works for decomposing sums, but only applies when each "step" is a fresh independent trial. When there is backtracking or state dependence, use DP or equation systems.
Debug Drills
Drill 1. Expected value DP with backward recurrence.
cpp
// Expected tosses to get n consecutive heads
double E[MAXN];
E[0] = 0; // 0 consecutive heads: we're done... wait, we WANT n heads
for (int i = 1; i <= n; i++) {
E[i] = 1 + 0.5 * E[i-1] + 0.5 * E[0];
// heads: progress to i consecutive, tails: reset to 0
}
cout << E[n] << endl;What's wrong?
The recurrence is backwards. E[i] should represent "expected tosses to reach E[n] = 0 (done), and we want E[0]. The recurrence should be:
Fix: Set E[n] = 0, then solve from
Drill 2. Random walk with bidirectional dependency.
cpp
// Expected steps from node 0 to node n in a path graph
// At each node i (0 < i < n), move to i-1 or i+1 with prob 1/2
double E[MAXN];
E[n] = 0;
for (int i = n - 1; i >= 0; i--) {
E[i] = 1 + 0.5 * E[i + 1] + 0.5 * E[i - 1];
}What's wrong?
When computing E[i], the formula references E[i-1] -- but E[i-1] hasn't been computed yet (we're iterating downward)! This isn't a simple DP; it's a system of linear equations because of the bidirectional dependency. You can't solve it with a single-pass loop.
Fix: Use Gaussian elimination, or exploit the tridiagonal structure analytically. For the symmetric random walk on a path, the closed-form is
Drill 3. Modular probability with integer division.
cpp
const int MOD = 1e9 + 7;
long long prob[MAXN];
// prob[i] = probability of event i, stored as p/q mod MOD
prob[0] = 1; // probability 1 = 1/1
for (int i = 1; i <= n; i++) {
prob[i] = prob[i-1] * 1 / 2 % MOD; // multiply by 1/2
}What's wrong?
1 / 2 in C++ is integer division, which gives 0. So prob[i] = 0 for all
Fix:
cpp
long long inv2 = (MOD + 1) / 2; // modular inverse of 2 mod 1e9+7
// (works because MOD is odd, so (MOD+1)/2 * 2 ≡ 1 mod MOD)
prob[i] = prob[i-1] % MOD * inv2 % MOD;Self-Test
- [ ] State linearity of expectation and explain why it holds even when
and are dependent - [ ] Solve "expected # of coin flips until first head" using the geometric distribution formula -- and verify with the recurrence
- [ ] Decompose "expected # of fixed points in a random permutation of
" into indicator variables and compute the answer - [ ] Set up the expected-value DP for a random walk on
with absorbing barrier at 3 -- write the system of equations - [ ] Explain when
holds and give a counterexample where it fails
Practice Problems
| CF Rating | What You Should Be Able To Do |
|---|---|
| 1200 | Compute basic probabilities; know that |
| 1500 | Apply linearity of expectation to avoid complex combinatorics; use |
| 1800 | Write expected value DP (on arrays, trees, DAGs); set up and solve systems of linear equations for random walks |
| 2100 | Handle probability with bitmask DP; combine EV with segment trees or other data structures; solve Markov chain problems with Gaussian elimination |
| # | Problem | Source | Difficulty | Key Idea |
|---|---|---|---|---|
| 1 | Dice Probability | CSES | Easy | Basic probability DP |
| 2 | Ball in Berland | CF 1475D | Easy | Indicator variables + counting |
| 3 | Random Mood | CF 839C | Medium | Expected value DP on tree |
| 4 | Ants on a Circle | AtCoder ABC 250E | Medium | Linearity of expectation |
| 5 | Dice Tower | CF 1823D | Medium | Geometric distribution + DP |
| 6 | Journey | CF 839C | Medium | E-value DP on DAG |
| 7 | Playlist | CF 1151F | Medium-Hard | Linearity + modular inverse |
| 8 | Berland Miners | CF 148D | Hard | E-value with self-loops |
| 9 | Walking on a Tree | CF 1073E | Hard | Gaussian elimination on graph |
| 10 | Ant Colony | CF 1525E | Hard | Markov chain + matrix exp |
| 11 | Dishonest Sellers | CF 779C | Easy | Basic probability + complementary counting |
| 12 | Magic Gems | CF 1117D | Medium | Expected value with matrix exponentiation |
| 13 | Nezzar and Board | CF 1478C | Medium | Conditional probability reasoning |
| 14 | Random Events | CF 1461D | Medium | Independent events, complementary probability |
| 15 | Game on Leaves | CF 1363C | Medium | Probability on trees, Bayes reasoning |
| 16 | Birthday Paradox | CSES | Hard | Birthday paradox / collision analysis |
Further Reading
- cp-algorithms: Probability -- expected value and linearity basics.
- CF Blog: Expected Value and Linearity -- excellent tutorial with practice problems.
- CF Blog: Probability in CP -- covers modular probability and common patterns.
- Competitive Programmer's Handbook, Ch. 24 -- probability and expected value section.
- For Gaussian elimination details, see:
10-gaussian-elimination.md(future topic). - For advanced Markov chain analysis, see:
markov-chains.md(future topic). - MIT OCW 6.041 -- Probabilistic Systems Analysis -- rigorous probability course, excellent for quant interview prep.
- Brilliant.org -- Probability -- interactive problems for building intuition.
- Jane Street Probability Puzzles -- real quant interview-style problems.
- For Monte Carlo methods in depth, see:
monte-carlo.md(future topic).