Fast I/O and Pragmas
Quick Revisit
- USE WHEN: TLE on a correct solution with large input (n > 10⁵); suspect slow I/O
- INVARIANT: ios::sync_with_stdio(false); cin.tie(nullptr); makes cin/cout as fast as scanf/printf
- TIME: I/O-bound; '\n' vs endl alone can halve runtime | SPACE: O(1)
- CLASSIC BUG: Using endl (flushes buffer every line) instead of '\n'; mixing scanf/printf with cin/cout after disabling sync
- PRACTICE: 01-ladder-1100-to-1400
The difference between AC and TLE is often not your algorithm—it's your I/O. Before you optimize a single line of logic, make sure you're not losing most of your runtime reading input.
Difficulty: [Beginner]
Prerequisites: Basic C++ syntax, compiling and running a program (see C++ Essentials)
Topics: I/O optimization, compiler pragmas, GCC builtins, contest templates
Contents
- Why Your First Submission Gets TLE
- The Line That Makes cin As Fast As scanf
- The endl Trap
- scanf and printf—The C Veterans
- Compiler Pragmas—Free Speed
- GCC Builtins—Bit Tricks in Hardware
- The #define int long long Debate
- Common Macros That Save Keystrokes
- File I/O for USACO
- C++ STL Reference
- Implementation—Contest-Ready Templates
- Operations Reference
- Problem Patterns Where I/O Speed Matters
- Contest Cheat Sheet
- Gotchas and Debugging
- IO Debugging
- Interactive Problem IO
- scanf vs cin Benchmark Comparison
- What Would You Do If...?
- When NOT to Use Fast I/O Tricks
- Debug This Exercises
- Self-Test
- Practice Problems
- Further Reading
Why Your First Submission Gets TLE
You write a perfectly correct solution. You submit. TLE.
The problem has a large input, and the culprit is how you read it.
cin and cout, by default, are synchronized with C's stdio (scanf/printf). Every single cin >> call checks whether scanf has also been reading from the same stream. That synchronization overhead is enormous—it can make I/O several times slower than it needs to be.
Here's the gap in concrete numbers:
| Method | Time to read |
|---|---|
| cin (default) | ~750 ms |
| cin (with sync off) | ~100 ms |
| scanf | ~100 ms |
| Hand-written getchar parser | ~30 ms |
That first row is the margin between AC and TLE on a surprising number of problems.
The Line That Makes cin As Fast As scanf
cpp
ios::sync_with_stdio(false);
cin.tie(nullptr);
Put these two lines at the very top of main(). That's it. Here's what each does:
ios::sync_with_stdio(false)—By default, C++ keeps its I/O streams (cin, cout) synchronized with C's streams (stdin, stdout). That synchronization lets you freely mix cin with scanf in the same program, but it costs dearly: every read goes through C's FILE* layer. Turning sync off breaks that guarantee and lets C++ streams use their own internal buffers—dramatically faster, but now you must commit to one I/O family and never mix them.
cin.tie(nullptr)—By default, cin is "tied" to cout, meaning cout flushes automatically before every cin >>. This exists so that interactive prompts like cout << "Enter n: "; cin >> n; display before the user types. You never need that behavior in competitive programming, and untying prevents thousands of unnecessary flushes.
Real-Life Analogy:
cin.tie is like a waiter who clears your plate before bringing each new course: polite at a restaurant, but agonizingly slow if you're eating course after course at a competitive eating contest.
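The tie can be observed directly, because calling cin.tie() with no arguments returns the stream currently tied to cin. The following is a minimal sketch; cin_is_tied_to_cout and enable_fast_io are hypothetical helper names introduced here for illustration, not part of any library.

```cpp
#include <iostream>

// Hypothetical helper: true while cin is still tied to cout (the default).
bool cin_is_tied_to_cout() {
    return std::cin.tie() == &std::cout;
}

// Hypothetical helper wrapping the two fast I/O lines.
void enable_fast_io() {
    std::ios::sync_with_stdio(false);  // stop syncing with C stdio
    std::cin.tie(nullptr);             // no automatic cout flush before reads
}
```

Before enable_fast_io() runs, the first helper returns true; afterwards it returns false, which is a quick way to confirm that a template really unties the streams.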
How cin.tie works
text
cin TIED to cout cin UNTIED
+------+ +------+ +------+ +------+
| cin |---->| cout | | cin | | cout |
+------+ +------+ +------+ +------+
| | | |
v v v v
read x flush first read x no flush
then read
n reads --> n flushes n reads --> 0 flushes
How the sync mechanism works
text
DEFAULT (sync ON):
+---------+ +---------+
| cin |<--->| stdin | Both share the same underlying
+---------+ +---------+ buffer. Every read/write goes
| cout |<--->| stdout | through C's FILE* layer.
+---------+ +---------+ Slow but safe to mix.
AFTER sync_with_stdio(false):
+---------+ +---------+
| cin | | stdin | Separate buffers. C++ streams
+---------+ +---------+ manage their own memory. Much
| cout | | stdout | faster, but DO NOT mix cin
+---------+ +---------+ with scanf anymore.
The endl Trap
This is the second most common performance killer:
cpp
// SLOW -- flushes the buffer after EVERY line
cout << x << endl;
// FAST -- just writes a newline character
cout << x << '\n';
endl does two things: writes '\n' and flushes the output buffer. Flushing means "send everything in the buffer to the OS right now." If you're printing many lines, that is one flush, and typically one system call, per line.
What flushing actually looks like
text
Using '\n' (buffered):
+-------------------------------------------+
| Output Buffer (e.g. 8 KB) |
| "1\n2\n3\n4\n5\n6\n7\n..." |
+-------------------------------------------+
|
v (flushes once when full or at program exit)
[ Operating System ]
Using endl (flush every line):
+--------+
| "1\n" |---> flush! ---> OS
+--------+
+--------+
| "2\n" |---> flush! ---> OS <-- thousands of syscalls
+--------+
+--------+
| "3\n" |---> flush! ---> OS
+--------+
Rule: Never use endl in competitive programming. Always use '\n'.
The only exception is interactive problems—covered later—where you need the judge to see your output before it responds.
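The flush count can be measured rather than taken on faith. The sketch below is one way to do it, under the assumption that counting calls to the stream buffer's sync() is a fair proxy for flushes; CountingBuf and count_flushes are hypothetical names introduced for this illustration.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical helper: a string buffer that counts how often it is flushed.
struct CountingBuf : std::stringbuf {
    int flushes = 0;
    int sync() override {          // reached via ostream::flush(), hence via endl
        ++flushes;
        return std::stringbuf::sync();
    }
};

// Writes `lines` numbers and reports how many flushes the stream performed.
int count_flushes(int lines, bool use_endl) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) {
        if (use_endl) out << i << std::endl;
        else          out << i << '\n';
    }
    return buf.flushes;
}
```

With endl the count equals the number of lines; with '\n' it stays at zero. On a real cout each of those flushes can turn into a write to the operating system, which is where the time goes.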
scanf and printf—The C Veterans
scanf and printf date back to the 1970s C language. C++ introduced cin/cout with type-safe stream operators but kept backward compatibility with C's I/O—which is exactly why the synchronization overhead exists today.
Before the sync trick was widely known, competitive programmers defaulted to C-style I/O:
cpp
int n;
scanf("%d", &n);
printf("%d\n", n);
scanf/printf are fast by default, with no sync overhead. They're still perfectly valid and common in top-rated submissions. The trade-offs:
| Feature | cin/cout (sync off) | scanf/printf |
|---|---|---|
| Speed | Fast | Fast |
| Type safety | Yes (compile-time) | No (runtime crashes) |
| String handling | Easy (getline) | Painful (%s, %c) |
| Format strings | Not needed | Required |
| Code verbosity | Less | More |
Verdict: Use cin/cout with sync off. It's faster to write, type-safe, and equally fast. Fall back to scanf/printf only when debugging a subtle I/O issue or working with legacy code.
With I/O speed sorted out, the next source of free performance is telling the compiler to work harder on your behalf.
Compiler Pragmas—Free Speed
Pragmas are directives to the compiler, written near the top of your file. They don't touch your logic—they change how the compiler translates that logic into machine instructions.
cpp
#pragma GCC optimize("O2")
#pragma GCC optimize("unroll-loops")
#pragma GCC target("avx2,bmi,bmi2,popcnt")
What each pragma does
#pragma GCC optimize("O2")—Enables optimization level 2. The compiler spends more time on your behalf: inlining small functions, eliminating dead code, reordering instructions. Most online judges already compile with -O2, but the pragma guarantees it even when they don't. "O3" is more aggressive but can occasionally change floating-point behavior.
text
Optimization Level Spectrum
O0 O1 O2 O3
|-----------|-----------|-------------|
none basic standard aggressive
O0 -- no inlining no reorder no vectorize
O1 -- basic inline some reorder
O2 -- full inline reorder dead code cut
O3 -- auto vectorize speculative fp may shift
^ ^
| |
most judges use with care
default here
#pragma GCC optimize("unroll-loops") -- Tells the compiler to unroll small loops. Instead of:
text
loop: cmp i, n (check condition)
... (body)
inc i (increment)
jmp loop (jump back)
It generates:
text
... (body with i=0)
... (body with i=1)
... (body with i=2)
... (body with i=3)
cmp i, n
jmp loop
Fewer jumps = fewer branch mispredictions = faster.
#pragma GCC target("avx2,bmi,bmi2,popcnt")—Enables modern CPU instruction sets. AVX2 lets the CPU process 256 bits at once—eight 32-bit integers in a single operation. The popcnt flag enables hardware population count. Most Codeforces judges run on CPUs that support these.
Warning: Pragmas are non-portable. They work on GCC (which CF, AtCoder, and most judges use) but not MSVC. Don't rely on them for correctness—they should only make correct code faster.
Pragmas control how the compiler translates your code, but GCC also exposes individual CPU instructions you can call directly.
GCC Builtins—Bit Tricks in Hardware
GCC provides built-in functions that compile down to single CPU instructions. For a deeper dive into bitwise techniques, see Bit Manipulation and Bitset Optimization.
__builtin_popcount(x)—Count the number of 1-bits in x.
cpp
int x = 0b10110101; // binary
int bits = __builtin_popcount(x); // 5
Use __builtin_popcountll(x) for long long.
__builtin_clz(x)—Count Leading Zeros. Number of zero bits before the first 1-bit from the left. Useful for finding the highest set bit.
cpp
int x = 16; // binary: 10000
int lz = __builtin_clz(x); // 27 (for 32-bit int: 32 - 5 = 27)
int highest_bit = 31 - __builtin_clz(x); // 4 (the bit position)
__builtin_ctz(x) -- Count Trailing Zeros. Number of zero bits after the last 1-bit from the right. Useful for finding the lowest set bit.
cpp
int x = 40; // binary: 101000
int tz = __builtin_ctz(x); // 3
Warning: __builtin_clz(0) and __builtin_ctz(0) are undefined behavior. Always check x != 0 first, or use the C++20 <bit> header (std::popcount, std::countl_zero, std::countr_zero) which are well-defined for zero.
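One way to make the zero check impossible to forget is to wrap the builtins once and call the wrappers everywhere. This is a sketch under the assumption of a GCC-compatible compiler; the function names are hypothetical, and returning -1 for zero is a choice made here, not a convention of the builtins.

```cpp
#include <cstdint>

// Hypothetical wrappers (GCC/Clang only). Each returns -1 for x == 0
// instead of reaching the undefined case of the builtin.
int highest_set_bit(std::uint32_t x) {
    return x != 0 ? 31 - __builtin_clz(x) : -1;
}

int lowest_set_bit(std::uint32_t x) {
    return x != 0 ? __builtin_ctz(x) : -1;
}

// 64-bit popcount; well-defined for every input, including 0.
int count_ones(std::uint64_t x) {
    return __builtin_popcountll(x);
}
```

For the values used above, highest_set_bit(16) gives 4 and lowest_set_bit(40) gives 3, matching the hand-computed results.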
text
Bit layout of x = 40 (0b00101000):
+---+---+---+---+---+---+---+---+
| 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | (showing lowest 8 bits)
+---+---+---+---+---+---+---+---+
7 6 5 4 3 2 1 0 <-- bit positions
__builtin_clz: counts from the LEFT (high bits)
__builtin_ctz: counts from the RIGHT (low bits) --> 3 trailing zeros
__builtin_popcount: total 1-bits = 2
The #define int long long Debate
You'll see this in many competitive programmers' templates:
cpp
#define int long long
This replaces every int in your code with long long, giving you 64-bit integers everywhere. The motivation is sound: forgetting long long when a computation can exceed the 32-bit range is a classic source of overflow bugs.
Pros:
- Eliminates overflow bugs from forgetting long long
- You never have to think about which variables need 64-bit
- A product like a * b, where each factor fits in 32 bits but the result does not, just works
Cons:
- Doubles memory for integer arrays (4 bytes → 8 bytes per element)
- Slightly slower on 32-bit operations (usually negligible)
- main() must be declared int32_t main() since int main expands to long long main, which is invalid
- Can confuse debuggers and static analyzers
Verdict: A useful training wheel. Many 2400+ rated coders keep it permanently. If you adopt it, always use int32_t main(), and watch the memory cost on problems with very large arrays.
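If you decide against the macro, the overflow it guards against can still be avoided by widening only the expressions that need it. A minimal sketch, where wide_product is a hypothetical helper name chosen for this example:

```cpp
// Hypothetical helper: multiply two 32-bit values in 64-bit arithmetic.
// The leading 1LL converts the whole expression to long long before
// any multiplication happens, so the product cannot overflow an int.
long long wide_product(int a, int b) {
    return 1LL * a * b;
}
```

The trade-off is the one described above: this keeps arrays at 4 bytes per element, but you have to remember the widening at every risky multiplication.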
Common Macros That Save Keystrokes
These appear in nearly every competitive programmer's template:
cpp
#define all(x) (x).begin(), (x).end()
#define sz(x) (int)(x).size()
#define pb push_back
#define fi first
#define se second
Usage:
cpp
vector<int> v = {3, 1, 4, 1, 5};
sort(all(v)); // instead of sort(v.begin(), v.end())
cout << sz(v) << '\n'; // instead of (int)v.size()
The (int) cast in sz prevents signed/unsigned comparison warnings when you write for (int i = 0; i < sz(v); i++).
Macros are fine until they make your code unreadable to anyone else. The ones above are universally recognized in competitive programming—beyond that, you're paying readability for keystrokes.
File I/O for USACO
USACO problems read from and write to files. If the problem name is paint, you read from paint.in and write to paint.out:
cpp
int main() {
freopen("paint.in", "r", stdin);
freopen("paint.out", "w", stdout);
// now cin/cout read/write to files automatically
int n;
cin >> n;
cout << n << '\n';
}
freopen redirects stdin/stdout to files. After those calls, cin reads from the file and cout writes to it, with no other changes needed in your code.
text
Before freopen After freopen
+--------+ +--------+
| stdin |<--- keyboard | stdin |<--- file in
+--------+ +--------+
+--------+ +--------+
| stdout |---> terminal | stdout |---> file out
+--------+ +--------+
cin reads from keyboard cin reads from file
cout writes to terminal cout writes to file
Tip: When testing locally, comment out the freopen lines so you can type input directly. Or wrap them:
cpp
#ifdef LOCAL
// reads from terminal when testing
#else
freopen("paint.in", "r", stdin);
freopen("paint.out", "w", stdout);
#endif
Compile with -DLOCAL during testing to activate the ifdef.
What the Code Won't Teach You
Fast I/O belongs in your template, not in your memory. Every competitive programmer pastes a boilerplate at the start of every problem. The two fast I/O lines should live there permanently—as automatic as #include <bits/stdc++.h>. If you rely on remembering to add them under contest pressure, you will forget and lose time debugging a correct solution that TLEs.
There is also a strict order to optimization: algorithm first, then I/O, then constant factors, then pragmas. Reaching for pragmas before exhausting the earlier options is treating symptoms instead of causes.
text
Optimization priority
1 Algorithm |############################| high impact
2 Fast IO |####################| medium
3 Constants |###########| low
4 Pragmas |#####| lowest
Try in this order -->
Fix algorithm first, then add pragmas last
For truly extreme input sizes, even cin with sync off may not suffice. A custom getchar_unlocked() reader can be several times faster. This is rare: you'll know when you need it because the constraints all but say "I/O is the bottleneck." Standard fast I/O handles the overwhelming majority of problems.
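For reference, a custom reader of this kind is usually just a character loop. The sketch below is one possible shape, not a tuned implementation: read_int is a hypothetical name, it uses the portable std::getc instead of getchar_unlocked, it takes the FILE* as a parameter so it can be tried on any stream, and it assumes well-formed input whose values fit in long long.

```cpp
#include <cstdio>

// Hypothetical helper: reads one signed integer from `in` into `out`.
// Returns false when the input ends before a number starts.
bool read_int(std::FILE* in, long long& out) {
    int c = std::getc(in);
    // Skip everything that cannot start a number (spaces, newlines, ...).
    while (c != EOF && c != '-' && (c < '0' || c > '9')) c = std::getc(in);
    if (c == EOF) return false;

    bool negative = false;
    if (c == '-') {
        negative = true;
        c = std::getc(in);
    }

    long long value = 0;
    while (c >= '0' && c <= '9') {   // accumulate digits
        value = value * 10 + (c - '0');
        c = std::getc(in);
    }
    out = negative ? -value : value;
    return true;
}
```

In a contest you would call it as read_int(stdin, x). Swapping std::getc for getchar_unlocked on Linux judges is the usual next step, at the cost of portability.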
text
IO speed ladder
cin default |#########################| slow
cin sync off |#####| fast
scanf |#####| fast
getchar reader |##| faster
mmap |#| extreme
+--+--+--+--+--+--+--+--+
C++ STL Reference
All I/O-related functions, pragmas, and builtins in one place:
| Name | Header / Source | What It Does | Notes |
|---|---|---|---|
| ios::sync_with_stdio(false) | <iostream> | Disables C/C++ I/O sync | Call once at top of main() |
| cin.tie(nullptr) | <iostream> | Unties cin from cout | Prevents auto-flush before reads |
| cout << '\n' | <iostream> | Outputs newline without flush | Always prefer over endl |
| endl | <iostream> | Outputs newline AND flushes | Only for interactive problems |
| scanf / printf | <cstdio> | C-style formatted I/O | Fast by default, no sync needed |
| freopen(f, m, stream) | <cstdio> | Redirects stream to file | USACO-style file I/O |
| getline(cin, s) | <string> | Reads entire line into string | Watch out after cin >> x |
| __builtin_popcount(x) | GCC builtin | Count of 1-bits in x | Use __builtin_popcountll for 64-bit |
| __builtin_clz(x) | GCC builtin | Leading zero count | UB if x == 0 |
| __builtin_ctz(x) | GCC builtin | Trailing zero count | UB if x == 0 |
| std::popcount(x) | <bit> (C++20) | Portable popcount | Requires unsigned type |
| std::countl_zero(x) | <bit> (C++20) | Portable CLZ | Well-defined for 0 |
| std::countr_zero(x) | <bit> (C++20) | Portable CTZ | Well-defined for 0 |
| #pragma GCC optimize(...) | Pragma | Sets optimization flags | "O2", "O3", "unroll-loops" |
| #pragma GCC target(...) | Pragma | Enables CPU features | "avx2", "popcnt", "bmi2" |
Implementation—Contest-Ready Templates
Minimal Template (Copy-Paste This)
This is the actual boilerplate you paste into every new file:
cpp
#pragma GCC optimize("O2,unroll-loops")
#pragma GCC target("avx2,bmi,bmi2,popcnt")
#include <bits/stdc++.h>
using namespace std;
#define int long long
#define all(x) (x).begin(), (x).end()
#define sz(x) (int)(x).size()
int32_t main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
// your solution here
return 0;
}
Explained Template (Learn From This)
cpp
// Tell GCC to optimize aggressively and use modern CPU instructions.
// These are safe on Codeforces, AtCoder, and most online judges.
#pragma GCC optimize("O2,unroll-loops")
#pragma GCC target("avx2,bmi,bmi2,popcnt")
// The competitive programmer's universal header -- includes everything.
// Never use in production code, but perfect for contests.
#include <bits/stdc++.h>
using namespace std;
// Every 'int' becomes 'long long'. Prevents overflow bugs when you
// forget that a * b can exceed 2^31. The cost: 2x memory per int.
#define int long long
// Shorthand so sort(v.begin(), v.end()) becomes sort(all(v)).
#define all(x) (x).begin(), (x).end()
// Returns size as int, avoiding signed/unsigned comparison warnings.
#define sz(x) (int)(x).size()
// Since we #defined int as long long, main() would become
// "long long main()" which is invalid. int32_t is always 32-bit.
int32_t main() {
// Disable synchronization between C and C++ I/O streams.
// Makes cin/cout several times faster. After this, NEVER mix cin with scanf.
ios::sync_with_stdio(false);
// Untie cin from cout. Without this, cout flushes before every cin
// read, which is pointless in competitive programming.
cin.tie(nullptr);
// --- USACO only: uncomment for file I/O ---
// freopen("problem.in", "r", stdin);
// freopen("problem.out", "w", stdout);
// Your solution goes here.
int n;
cin >> n;
cout << n << '\n'; // '\n' not endl!
return 0;
}
USACO-Specific Template
cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
freopen("problem.in", "r", stdin);
freopen("problem.out", "w", stdout);
int n;
cin >> n;
// solve...
int answer = n; // placeholder: replace with the computed result
cout << answer << '\n';
return 0;
}
Replace "problem" with the actual problem name (e.g., "paint", "hps").
Operations Reference
Relative speed of I/O methods when reading a large input (approximate):
| Method | Approx Time | Relative Speed | Notes |
|---|---|---|---|
| cin >> x (default sync) | ~750 ms | 1x (baseline) | Don't do this |
| scanf("%d", &x) | ~100 ms | ~7x faster | Always fast |
| cin >> x (sync off) | ~100 ms | ~7x faster | Best balance of speed + safety |
| Custom getchar parser | ~30 ms | ~25x faster | Overkill for most problems |
| mmap + manual parse | ~10 ms | ~75x faster | Never needed in CP |
Output comparison when printing a large number of lines (approximate):
| Method | Approx Time | Notes |
|---|---|---|
| cout << x << endl | ~1500 ms | Flushes every line; disastrous |
| cout << x << '\n' | ~80 ms | Buffered output; fast |
| printf("%d\n", x) | ~80 ms | Equally fast |
| Build string, print once | ~50 ms | Marginal gain, rarely needed |
Problem Patterns Where I/O Speed Matters
Pattern Fingerprint:
Large n with a linear or near-linear algorithm → fast I/O is mandatory. Many test cases → use '\n', not endl. "Interactive" in the problem title → flush after every query.
Pattern 1: Large Input, Trivial Computation
Problems where the input is large but the work per element is trivial, so reading the input dominates the runtime.
Example: CF 1914A -- Problemsolving Log
Pattern 2: Many Test Cases
Problems with many test cases, where a flush after every answer adds up quickly.
Example: CF 1873A -- Short Sort
Pattern 3: Interactive Problems
Interactive problems send queries to the judge and read responses. You must flush after every output line, or the judge never sees your query.
cpp
// Interactive problem template
cout << "? " << query << endl; // endl flushes -- needed here!
cin >> response;
// ... after finding the answer:
cout << "! " << answer << endl;
This is the ONE place where endl (or cout.flush()) is correct.
Example: CF 1167B1 -- Wrong Answer
Pattern 4: Output-Heavy Problems
Printing a large matrix, grid, or sequence. Use '\n' and let the buffer handle it.
Example: CF 1923A -- Moving Chips
Pattern 5: String Processing With getline
Reading strings with spaces requires getline. Watch the newline trap: after cin >> n, a stray '\n' remains in the buffer. You must consume it:
cpp
int n;
cin >> n;
cin.ignore(); // consume the leftover '\n'
string line;
getline(cin, line);
Pattern 6: Bitwise Problems Using Builtins
Counting set bits, finding MSB/LSB—use __builtin_popcount and friends instead of writing manual loops.
Example: CF 1957A -- Stickogon—problems that benefit from fast bit operations.
Contest Cheat Sheet
Mnemonic Anchor: S-U-N—Sync off, Untie cin, Newline not endl. Three letters, three lines, zero TLEs from I/O.
If I Had 5 Minutes
If you're about to start a contest and can only remember five things:
1. ios::sync_with_stdio(false); cin.tie(nullptr); -- paste these in every solution, always
2. '\n' not endl -- endl flushes; flushing in loops causes TLE
3. endl for interactive only -- the one exception to rule #2
4. Never mix cin with scanf after disabling sync -- pick one I/O family
5. cin.ignore() before getline -- eat the leftover newline after cin >>
"Before You Code" I/O Checklist
Run through this before writing any solution:
- [ ] Template has ios::sync_with_stdio(false) and cin.tie(nullptr)
- [ ] No endl in output loops (using '\n' instead)
- [ ] If interactive: using endl or cout.flush() after every query (do not rely on the cin/cout tie if the template calls cin.tie(nullptr))
- [ ] If using getline after cin >>: added cin.ignore() between them
- [ ] If USACO: freopen lines are correct and uncommented
text
+------------------------------------------------------------------+
| FAST I/O CHEAT SHEET |
+------------------------------------------------------------------+
| TOP OF FILE: |
| #pragma GCC optimize("O2,unroll-loops") |
| #pragma GCC target("avx2,bmi,bmi2,popcnt") |
| #include <bits/stdc++.h> |
| using namespace std; |
| |
| TOP OF main(): |
| ios::sync_with_stdio(false); |
| cin.tie(nullptr); |
| |
| RULES: |
| * Use '\n' instead of endl (ALWAYS, except interactive) |
| * Never mix cin with scanf after disabling sync |
| * For interactive problems: use endl or cout.flush() |
| * USACO: add freopen("X.in","r",stdin) |
| |
| BUILTINS: |
| __builtin_popcount(x) -> # of 1-bits |
| __builtin_clz(x) -> leading zeros (UB if x==0) |
| __builtin_ctz(x) -> trailing zeros (UB if x==0) |
| Add 'll' suffix for long long versions |
| |
| MACROS: |
| #define int long long (use int32_t main) |
| #define all(x) (x).begin(),(x).end() |
| #define sz(x) (int)(x).size() |
+------------------------------------------------------------------+
Gotchas and Debugging
Mixing C and C++ I/O After Disabling Sync
This is the #1 bug from fast I/O:
cpp
ios::sync_with_stdio(false);
scanf("%d", &n); // BUG: using C I/O after disabling sync
cin >> m; // reads from a different buffer -- data lost/corrupted
After sync_with_stdio(false), pick one style and stick with it. Use cin/cout for everything, or scanf/printf for everything. Never both.
The getline Trap
cpp
int n;
cin >> n;
string s;
getline(cin, s); // BUG: reads the leftover '\n', s is empty!
Fix: add cin.ignore() or cin >> ws before getline.
cpp
int n;
cin >> n;
cin.ignore(); // consume the '\n' left by cin >> n
string s;
getline(cin, s); // now reads the actual next line
endl in Tight Output Loops
cpp
for (int i = 0; i < 1000000; i++) { // invariant: a[0..i-1] already printed
cout << a[i] << endl; // 10^6 flushes = TLE
}
Replace with '\n'. This alone can take a solution from TLE to AC.
Forgetting int32_t main() With #define int long long
cpp
#define int long long
int main() { // expands to: long long main() -- INVALID!
The fix: always use int32_t main() when you have this define.
Pragma Doesn't Fix Wrong Algorithms
Pragmas give a constant-factor speedup. They won't save an O(n²) solution when the constraints call for O(n log n).
__builtin_clz(0) Is Undefined Behavior
cpp
int x = 0;
int lz = __builtin_clz(x); // UB! Could return anything or crash
Always guard with if (x != 0) or use C++20's std::countl_zero which returns the bit width for zero input.
freopen Path Issues
On your local machine, freopen("paint.in", "r", stdin) looks for the file relative to where you run the program, not where the source file is. If you get "no such file," check your working directory.
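A missing input file is easier to diagnose if the program says so instead of silently reading nothing. freopen returns a null pointer on failure, so a small wrapper can report it. This is a sketch; redirect_stdin is a hypothetical helper name, and the exact message printed by perror depends on the system.

```cpp
#include <cstdio>

// Hypothetical helper: redirect stdin to `path`.
// On failure, prints the system's reason to stderr and returns false.
bool redirect_stdin(const char* path) {
    if (std::freopen(path, "r", stdin) == nullptr) {
        std::perror(path);
        return false;
    }
    return true;
}
```

Calling redirect_stdin("paint.in") from the wrong working directory then fails loudly on stderr, which the judge ignores, rather than producing an empty answer.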
Large Arrays With #define int long long
cpp
#define int long long
int a[10000000]; // 80 MB! Might exceed memory limit (usually 256 MB)
Without the define, int a[10000000] is only 40 MB. For very large arrays, consider removing the #define or using int32_t explicitly for the array:
cpp
#define int long long
int32_t a[10000000]; // still 40 MB
Mental Traps
The Mistake That Teaches: During a Codeforces round, I solved problem C in 12 minutes with an efficient solution. I submitted confidently. TLE on test 3. I spent 40 minutes rewriting the algorithm three different ways, each one TLE. With 5 minutes left, I added ios::sync_with_stdio(false); cin.tie(nullptr); and changed endl to '\n'. Accepted in 62 ms. The bottleneck was never the algorithm; it was the I/O. Now those two lines live in my template, and I never think about them again.
Trap 1: "My algorithm is fast, so I/O time is negligible"
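For small n, the algorithm dominates the runtime and I/O is a sliver. Once n passes about a million, I/O can dominate instead.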
text
WRONG -- small n RIGHT -- n past 1M
Time ---> Time --->
+----------+--+ +----+----------+
| Algo |IO| |Algo| IO |
|##########| | |####|##########|
+----------+--+ +----+----------+
^ dominant ^ dominant
Always add the two fast I/O lines. Two lines of code, zero risk.
Trap 2: "Pragmas will rescue my O(n²) solution"
Pragmas give a constant-factor speedup. They cannot change algorithmic complexity.
text
WRONG -- pragma saves n*n RIGHT -- fix algorithm
n 100k 10^10 ops n 100k n log n
+--------------------+ +--+
|####################| TLE |##| AC
+--------------------+ +--+
|
| pragma /4x speedup
v
+----------+
|##########| still TLE
+----------+
Trap 3: "sync_with_stdio(false) speeds up all I/O"
It only decouples C++ streams from C streams. C library functions like scanf and printf are unaffected.
text
WRONG RIGHT
All IO gets faster Only cin and cout faster
+--------+------+ +--------+------+
| cin | FAST | | cin | FAST |
| cout | FAST | | cout | FAST |
| scanf | FAST | <-- no | scanf | same |
| printf | FAST | <-- no | printf | same |
+--------+------+
IO Debugging
When your output looks correct but the judge says Wrong Answer, the problem is often invisible whitespace or flushing—not your algorithm. This section covers the tools for hunting down I/O bugs.
See also: Debugging and Stress Testing for general debugging strategies beyond I/O.
Using cerr for Debug Output
cerr writes to standard error, which online judges ignore. You can leave debug prints in your submission and they won't affect your answer:
cpp
int n;
cin >> n;
cerr << "DEBUG: n = " << n << '\n'; // judge ignores this
// ... solve ...
cout << answer << '\n'; // judge reads this
Tip: Wrap debug output in a macro so you can disable it with one toggle:
cpp
#ifdef LOCAL
#define dbg(x) cerr << #x << " = " << (x) << '\n'
#else
#define dbg(x) // expands to nothing on the judge
#endif
Compile with g++ -DLOCAL solution.cpp locally to see debug output.
Testing I/O With File Redirection
Instead of typing input every time, save it to a file and redirect:
bash
# Create input file
echo "3
1 2 3" > input.txt
# Run with redirected input
./solution < input.txt
# Compare output against expected
./solution < input.txt > output.txt
diff output.txt expected.txt
text
Terminal workflow
+----------+ +-----------+ +----------+
| input.txt|----->| ./solution|----->|output.txt|
+----------+ < +-----------+ > +----------+
|
diff v
+----------+
|expected |
+----------+
Common I/O Debugging Mistakes
Reading past end of input: If your code tries to read more values than the input provides, cin enters a fail state. The failed read and every read after it leave their variables unchanged, so you silently compute with stale or uninitialized values.
cpp
// Input has 3 numbers: "1 2 3"
int a, b, c, d;
cin >> a >> b >> c >> d; // d is never written -- cin is now in fail state
// All further cin >> operations silently fail
Mixing cin and scanf: After sync_with_stdio(false), the C and C++ streams use separate buffers. Data read by one is invisible to the other.
cpp
ios::sync_with_stdio(false);
int x, y;
scanf("%d", &x); // reads from C buffer
cin >> y; // reads from C++ buffer -- different data!
Extra spaces or newlines in output: Some judges require exact output format. A trailing space or missing newline can cause Wrong Answer.
cpp
// WRONG -- trailing space after last element
for (int i = 0; i < n; i++) { // invariant: i elements printed so far
cout << a[i] << ' ';
}
cout << '\n';
// RIGHT -- no trailing space
for (int i = 0; i < n; i++) { // invariant: i elements printed so far
if (i > 0) cout << ' ';
cout << a[i];
}
cout << '\n';
Interactive Problem IO
Interactive problems are a special category where your program communicates with the judge in real time: you send a query, the judge responds, and you use that response to decide your next query.
How Interactive Problems Work on Codeforces
text
Your program Judge
+----------+ +----------+
| cout << |---"? 5"------>| reads |
| query | | query |
+----------+ +----------+
|
+----------+ +----------+
| cin >> |<--"1"---------| sends |
| response | | response |
+----------+ +----------+
| | | |
| cout << |---"! 7"------>| checks |
| answer | | answer |
+----------+ +----------+
- You print a query (e.g., "? 5") and flush the output
- The judge reads your query and prints a response
- You read the response and decide the next query
- When you know the answer, print "! answer" and flush
The Flush Requirement
This is the single most common mistake in interactive problems. If you don't flush after printing, your output sits in a buffer and the judge never sees it—your program hangs waiting for a response that never comes, and you get TLE or "Idleness limit exceeded."
cpp
// Method 1: Use endl (it writes '\n' AND flushes)
cout << "? " << mid << endl;
// Method 2: Use cout.flush() explicitly
cout << "? " << mid << '\n';
cout.flush();
// Method 3: Tie cout to cin (auto-flush before each read)
// Note: cin.tie(&cout) is the DEFAULT behavior, so if you
// called cin.tie(nullptr), you need to re-tie for interactive:
cin.tie(&cout);
Example: Binary Search Interactive Pattern
Many interactive problems follow a binary search pattern—you have a limited number of queries to find a hidden value.
cpp
int32_t main() {
ios::sync_with_stdio(false);
// Do NOT untie cin from cout in interactive problems!
// cin.tie(nullptr); <-- OMIT this line
int lo = 1, hi = 1000000;
while (lo < hi) { // invariant: answer is in [lo, hi]
int mid = lo + (hi - lo) / 2;
cout << "? " << mid << endl; // query + flush
int response;
cin >> response;
if (response == 1) {
hi = mid; // answer <= mid
} else {
lo = mid + 1; // answer > mid
}
}
// invariant: lo == hi, we found the answer
cout << "! " << lo << endl; // final answer + flush
return 0;
}
Interactive Problem Pitfalls
| Pitfall | Symptom | Fix |
|---|---|---|
| Forgetting to flush | TLE or "Idleness limit exceeded" | Use endl or cout.flush() after every query |
| Using cin.tie(nullptr) without an explicit flush | Queries don't reach the judge | Flush explicitly, or omit cin.tie(nullptr) / re-tie with cin.tie(&cout) |
| Reading too much | Program hangs | Only read exactly what the judge sends |
| Printing debug to cout | Judge interprets it as a query | Use cerr for all debug output |
| Using '\n' instead of endl | Output stays buffered | Switch to endl for interactive |
scanf vs cin Benchmark Comparison
Benchmarks for reading a large integer input, compiled with g++ -O2 and averaged over 10 runs. Times vary by machine; the ratios are what matter:
| Method | Time (ms) | Relative | Code |
|---|---|---|---|
| cin >> x (default sync) | ~750 | 1.0x | cin >> x; |
| cin >> x (sync off, still tied) | ~130 | 5.8x faster | sync_with_stdio(false) only |
| cin >> x (sync off + untied) | ~100 | 7.5x faster | Both lines in template |
| scanf("%d", &x) | ~105 | 7.1x faster | No setup needed |
| Custom getchar reader | ~30 | 25x faster | Manual digit parsing |
| getchar_unlocked reader | ~18 | 42x faster | Non-portable, Linux only |
When Each Method Matters
- Small inputs: Any method works. Don't worry about I/O speed.
- Large inputs: Default cin will TLE on 1–2 s time limits. Use the two-line template (sync off + untied) or scanf. This covers nearly all problems.
- Very large inputs, tight TL: Consider a custom reader. This is rare, perhaps a handful of problems per year on CF.
- Extreme inputs: Custom reader is almost mandatory. See cp-algorithms I/O guide for implementations.
The untying (cin.tie(nullptr)) matters independently of sync. Even with sync off, a tied cin still forces a flush before every read, an overhead you don't notice until the input gets large.
What Would You Do If...?
Scenario 1: You submit a correct solution with an acceptable time complexity, and it still gets TLE.
Check in order: (1) Are the two fast I/O lines present? (2) Are you using endl anywhere in a loop? (3) Are you mixing cin with scanf? (4) Is #define int long long causing cache issues on huge arrays? If all else fails, try scanf/printf or a custom reader.
Scenario 2: Your interactive submission gets "Idleness limit exceeded" on test 1 but works perfectly locally.
You're almost certainly not flushing. Check: (1) Are you using endl or cout.flush() after every query? (2) Does your template call cin.tie(nullptr)? If so, nothing flushes cout for you before a read: flush explicitly after every query, or re-tie with cin.tie(&cout).
Scenario 3: Your solution reads a string with spaces but getline returns an empty string on the first call.
The classic getline trap. You read an integer with cin >> n before getline, and the leftover '\n' is consumed by getline instead of the actual line. Add cin.ignore() or cin >> ws between the cin >> and getline calls. See C++ Essentials for more on string I/O patterns.
When NOT to Use Fast I/O Tricks
Not every technique in this file is appropriate for every situation:
| Technique | When to Skip It | Why |
|---|---|---|
| #pragma GCC optimize | Production code, team projects | Non-portable, obscures real performance issues |
| #pragma GCC target("avx2") | Unknown judge hardware | Old CPUs may not support AVX2, which causes RE |
| #define int long long | Very large arrays | Doubles memory; can cause MLE |
| cin.tie(nullptr) | Interactive problems | Judge never sees your queries without flush |
| sync_with_stdio(false) | Code that must mix cin with scanf | Separate buffers cause data corruption |
| Custom getchar reader | Inputs the two-line template already handles | Unnecessary complexity; standard I/O is fast enough |
| bits/stdc++.h | Production code, portability | GCC-only header, massive compile time |
Pattern Fingerprint: Constraints give a small input with a 2 s TL → I/O speed is irrelevant; focus on algorithm. Constraints give a large input with a 1 s TL → use template fast I/O. Constraints give an extremely large input with a 3 s TL → consider a custom reader.
Debug This Exercises
Each snippet has a subtle I/O bug. Try to spot it before revealing the answer.
Exercise 1: The Silent Fail
```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    int n;
    scanf("%d", &n);
    vector<int> a(n);
    for (int i = 0; i < n; i++) cin >> a[i];  // invariant: a[0..i-1] read
    cout << *max_element(a.begin(), a.end()) << '\n';
}
```
Bug and Fix
Bug: scanf is used after sync_with_stdio(false). The C and C++ streams now use separate buffers, so scanf reads n from the C buffer, but cin reads array values from the C++ buffer—which may be empty or contain different data.
Fix: Replace scanf("%d", &n) with cin >> n, or remove the sync_with_stdio(false) line.
Exercise 2: The Hanging Interactive
```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    int lo = 1, hi = 1000;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        cout << "? " << mid << '\n';
        int resp;
        cin >> resp;
        if (resp <= mid) hi = mid;
        else lo = mid + 1;
    }
    cout << "! " << lo << '\n';
}
```
Bug and Fix
Bug: Two problems. (1) cin.tie(nullptr) means cout is NOT flushed before cin >> resp, so the judge never sees the query—the program hangs. (2) Using '\n' instead of endl means the output stays buffered.
Fix: Remove cin.tie(nullptr) and change '\n' to endl (or add cout.flush() after each query). The final answer line also needs flushing.
Exercise 3: The Phantom Newline
```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    int t;
    cin >> t;
    while (t--) {
        string s;
        getline(cin, s);
        cout << s.length() << '\n';
    }
}
```
Bug and Fix
Bug: After cin >> t, a '\n' remains in the buffer. The first getline reads this empty line, so s is empty and its length is 0. Every subsequent test case reads the previous test case's input line.
Fix: Add cin.ignore() or cin >> ws after cin >> t (before the loop). If each test case is a single line, cin.ignore() once before the loop is sufficient. If there might be blank lines in input, use cin >> ws.
Exercise 4: The Slow Surprise
```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<int> a(n);
    for (int i = 0; i < n; i++) cin >> a[i];  // invariant: a[0..i-1] read
    sort(a.begin(), a.end());
    for (int i = 0; i < n; i++) {             // invariant: a[0..i-1] already printed
        cout << a[i] << endl;
    }
}
```
Bug and Fix
Bug: Two issues. (1) Missing ios::sync_with_stdio(false) and cin.tie(nullptr), so every read goes through the slow synchronized cin. (2) endl in the output loop causes a buffer flush on every line printed.
Fix: Add the two fast I/O lines at the top of main(), and replace endl with '\n'. Together, these can speed up the program dramatically.
Self-Test
- [ ] Write the two-line fast I/O setup from memory and explain what each line does
- [ ] Explain why endl is slower than '\n' and estimate the magnitude of difference
- [ ] State what happens if you use scanf after calling ios::sync_with_stdio(false)
- [ ] Name the conditions under which a custom getchar_unlocked() reader is worth writing
- [ ] Describe what #pragma GCC optimize("O3") actually does and why it cannot fix an algorithmically slow solution
- [ ] Explain why int32_t main() is needed when using #define int long long
- [ ] Describe the one situation where endl IS the correct choice over '\n'
Practice Problems
Work through these in order. Each one will TLE with naive I/O, or has constraints that make fast I/O important.
| # | Problem | Source | Difficulty | Focus |
|---|---|---|---|---|
| 1 | Sum of Round Numbers | CF 1352A | 800 | Multi-test-case I/O |
| 2 | Short Sort | CF 1873A | 800 | Many test cases |
| 3 | Odd One Out | CF 1915A | 800 | XOR / builtins practice |
| 4 | CSES -- Weird Algorithm | CSES | Easy | Basic I/O |
| 5 | CSES -- Missing Number | CSES | Easy | Reading a long list of integers |
| 6 | CSES -- Bit Strings | CSES | Easy | Modular arithmetic + I/O |
| 7 | Problemsolving Log | CF 1914A | 800 | Large total input |
| 8 | USACO 2020 Dec Bronze -- Do You Know Your ABCs? | USACO | Bronze | File I/O with freopen |
| 9 | CSES -- Two Sets | CSES | Easy | Output-heavy |
| 10 | Guess the Number | CF 1167B1 | 1200 | Interactive (must flush!) |
Rating Progression
How I/O optimization knowledge maps to your competitive programming journey:
| CF Rating | What You Should Know |
|---|---|
| 1200 | The two-line fast I/O setup is automatic in your template. You never use endl except interactively. You've been bitten by the getline trap at least once. |
| 1500 | You recognize when a TLE is caused by I/O vs. algorithm. You know freopen for USACO. You use cerr for debug output and strip it before submitting. |
| 1800 | You can write interactive problem I/O confidently with proper flushing. You understand the cost of #define int long long on memory and know when to avoid it. You've used pragmas to squeeze past tight time limits on correct solutions. |
| 2100 | You can write a custom getchar_unlocked reader for extreme constraints. You understand buffer mechanics and can debug I/O issues by reasoning about stream state. You know exactly when pragmas help and when they're cargo-culting. |
Further Reading
- C++ I/O Optimization -- cp-algorithms.com. Thorough treatment of all I/O methods including custom parsers.
- GCC Built-in Functions -- GCC docs. Official reference for __builtin_popcount, __builtin_clz, etc.
- Codeforces Blog: Fast I/O. Classic CF blog post benchmarking I/O methods.
- C++ Tricks for Competitive Programming -- CF Blog. Collection of pragmas, macros, and other tricks.
- Competitive Programmer's Handbook -- Antti Laaksonen. Chapter 1 covers I/O and language features. Free PDF.
- USACO Guide -- Introduction. Beginner-friendly I/O guide with USACO-specific file I/O examples.
- C++20 <bit> Header -- cppreference. Portable alternatives to GCC builtins.
See also:
- C++ Language Essentials—the language foundations that I/O tricks build upon
- Complexity Analysis—know whether your bottleneck is I/O or algorithm before optimizing
- Common Templates—ready-to-paste contest templates with fast I/O already wired in
Next: Complexity Analysis—learn how to analyze why your algorithm is slow so you know whether the fix is better I/O or a better algorithm.