
Fast I/O and Pragmas

Quick Revisit

  • USE WHEN: TLE on a correct solution with large input (n > 10⁵); suspect slow I/O
  • INVARIANT: ios::sync_with_stdio(false); cin.tie(nullptr); makes cin/cout as fast as scanf/printf
  • TIME: I/O-bound; '\n' vs endl alone can halve runtime | SPACE: O(1)
  • CLASSIC BUG: Using endl (flushes buffer every line) instead of '\n'; mixing scanf/printf with cin/cout after disabling sync
  • PRACTICE: 01-ladder-1100-to-1400

The difference between AC and TLE is often not your algorithm—it's your I/O. Before you optimize a single line of logic, make sure you're not losing most of your runtime reading input.

Difficulty: [Beginner]
Prerequisites: Basic C++ syntax, compiling and running a program (see C++ Essentials)
Topics: I/O optimization, compiler pragmas, GCC builtins, contest templates


Why Your First Submission Gets TLE

You write a perfectly correct solution. You submit. TLE.

The problem has n = 10⁶ integers to read. Your algorithm is O(n), which should be fine; 10⁶ operations finish in well under one second. But the bottleneck isn't your algorithm at all.

cin and cout, by default, are synchronized with C's stdio (scanf/printf). To keep mixed cin and scanf calls in the right order, common implementations make the C++ streams give up their own buffering and hand every operation to the C stdio layer. That overhead is enormous: it can make I/O several times slower than it needs to be.

Here's the gap in concrete numbers:

| Method | Time to read 10⁶ ints |
|---|---|
| cin (default) | ~750 ms |
| cin (with sync off) | ~100 ms |
| scanf | ~100 ms |
| Hand-written getchar parser | ~30 ms |

That first row is the margin between AC and TLE on a surprising number of problems.

The Line That Makes cin As Fast As scanf

cpp
ios::sync_with_stdio(false);
cin.tie(nullptr);

Put these two lines at the very top of main(). That's it. Here's what each does:

ios::sync_with_stdio(false)—By default, C++ keeps its I/O streams (cin, cout) synchronized with C's streams (stdin, stdout). That synchronization lets you freely mix cin with scanf in the same program, but it costs dearly: every read goes through C's FILE* layer. Turning sync off breaks that guarantee and lets C++ streams use their own internal buffers—dramatically faster, but now you must commit to one I/O family and never mix them.

cin.tie(nullptr)—By default, cin is "tied" to cout, meaning cout flushes automatically before every cin >>. This exists so that interactive prompts like cout << "Enter n: "; cin >> n; display before the user types. You never need that behavior in competitive programming, and untying prevents thousands of unnecessary flushes.

Real-Life Analogy: cin.tie is like a waiter who clears your plate before bringing each new course: polite at a restaurant, but agonizingly slow if you're eating 10⁶ courses at a competitive eating contest.

How cin.tie works

text
  cin TIED to cout              cin UNTIED
  +------+     +------+        +------+    +------+
  | cin  |---->| cout |        | cin  |    | cout |
  +------+     +------+        +------+    +------+
     |            |                |           |
     v            v                v           v
   read x    flush first         read x    no flush
             then read

  n reads --> n flushes          n reads --> 0 flushes

How the sync mechanism works

text
  DEFAULT (sync ON):
  +---------+     +---------+
  |  cin    |<--->| stdin   |   Both share the same underlying
  +---------+     +---------+   buffer. Every read/write goes
  | cout    |<--->| stdout  |   through C's FILE* layer.
  +---------+     +---------+   Slow but safe to mix.

  AFTER sync_with_stdio(false):
  +---------+     +---------+
  |  cin    |     | stdin   |   Separate buffers. C++ streams
  +---------+     +---------+   manage their own memory. Much
  | cout    |     | stdout  |   faster, but DO NOT mix cin
  +---------+     +---------+   with scanf anymore.

The endl Trap

This is the second most common performance killer:

cpp
// SLOW -- flushes the buffer after EVERY line
cout << x << endl;

// FAST -- just writes a newline character
cout << x << '\n';

endl does two things: writes '\n' and flushes the output buffer. Flushing means "send everything in the buffer to the OS right now." If you're printing 10⁶ lines, that's 10⁶ flush syscalls instead of letting the buffer fill up and go out in a few large chunks.

What flushing actually looks like

text
  Using '\n' (buffered):
  +-------------------------------------------+
  | Output Buffer (e.g. 8 KB)                 |
  |  "1\n2\n3\n4\n5\n6\n7\n..."              |
  +-------------------------------------------+
              |
              v  (flushes once when full or at program exit)
         [ Operating System ]

  Using endl (flush every line):
  +--------+
  | "1\n"  |---> flush! ---> OS
  +--------+
  +--------+
  | "2\n"  |---> flush! ---> OS    <-- thousands of syscalls
  +--------+
  +--------+
  | "3\n"  |---> flush! ---> OS
  +--------+

Rule: Never use endl in competitive programming. Always use '\n'.

The only exception is interactive problems—covered later—where you need the judge to see your output before it responds.

scanf and printf—The C Veterans

scanf and printf date back to the 1970s C language. C++ introduced cin/cout with type-safe stream operators but kept backward compatibility with C's I/O—which is exactly why the synchronization overhead exists today.

Before the sync trick was widely known, competitive programmers defaulted to C-style I/O:

cpp
int n;
scanf("%d", &n);
printf("%d\n", n);

scanf/printf are fast by default—no sync overhead. They're still perfectly valid and common in top-rated submissions. The trade-offs:

| Feature | cin/cout (sync off) | scanf/printf |
|---|---|---|
| Speed | Fast | Fast |
| Type safety | Yes (compile-time) | No (runtime crashes) |
| String handling | Easy (getline) | Painful (%s, %c) |
| Format strings | Not needed | Required |
| Code verbosity | Less | More |

Verdict: Use cin/cout with sync off. It's faster to write, type-safe, and equally fast. Fall back to scanf/printf only when debugging a subtle I/O issue or working with legacy code.
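
To make the type-safety row concrete, here is a sketch using sscanf, the string-reading sibling of scanf, so it can run without real input. The helper name `parse_pair` is invented for illustration.

```cpp
#include <cstdio>

// Hypothetical helper: parses "<int> <long long>" from a C string.
// Returns true only if both fields were converted.
bool parse_pair(const char* text, int& small, long long& big) {
    // %d expects int*, %lld expects long long*. Swapping them still
    // compiles (often with only a warning) and is undefined behavior
    // at run time, which is the risk the table refers to.
    return std::sscanf(text, "%d %lld", &small, &big) == 2;
}
```

With cin the compiler picks the right overload from the variable's type, so there is no format string to get wrong.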

With I/O speed sorted out, the next source of free performance is telling the compiler to work harder on your behalf.

Compiler Pragmas—Free Speed

Pragmas are directives to the compiler, written near the top of your file. They don't touch your logic—they change how the compiler translates that logic into machine instructions.

cpp
#pragma GCC optimize("O2")
#pragma GCC optimize("unroll-loops")
#pragma GCC target("avx2,bmi,bmi2,popcnt")

What each pragma does

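#pragma GCC optimize("O2"): Requests optimization level 2 for the functions that follow. The compiler spends more time on your behalf: inlining small functions, eliminating dead code, reordering instructions. Most online judges already compile with -O2, so the pragma mainly matters when they don't. "O3" is more aggressive (more inlining, vectorization, and unrolling), which usually helps but can occasionally produce larger, slower code. It does not change floating-point results by itself; that is what "Ofast" and fast-math do.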

text
  Optimization Level Spectrum

  O0          O1          O2            O3
  |-----------|-----------|-------------|
  none        basic       standard      aggressive

  O0 -- no optimization (fastest compile, easiest to debug)
  O1 -- basic optimizations, dead code removal
  O2 -- inlining, instruction scheduling, most safe optimizations
  O3 -- O2 plus more aggressive inlining, vectorization, unrolling
                          ^             ^
                          |             |
                     most judges   larger code,
                     default here  not always faster

#pragma GCC optimize("unroll-loops")—Tells the compiler to unroll small loops. Instead of:

text
loop: cmp i, n    (check condition)
      ...         (body)
      inc i       (increment)
      jmp loop    (jump back)

It generates:

text
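loop: ... (body for i)
      ... (body for i+1)
      ... (body for i+2)
      ... (body for i+3)
      add i, 4    (one increment per four bodies)
      cmp i, n    (one check per four bodies)
      jmp loop    (one jump per four bodies)

Fewer compare-and-jump instructions per element means less loop overhead, and the CPU gets more independent work to overlap.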

#pragma GCC target("avx2,bmi,bmi2,popcnt")—Enables modern CPU instruction sets. AVX2 lets the CPU process 256 bits at once—eight 32-bit integers in a single operation. The popcnt flag enables hardware population count. Most Codeforces judges run on CPUs that support these.

Warning: Pragmas are non-portable. They work on GCC (which CF, AtCoder, and most judges use) but not MSVC. Don't rely on them for correctness—they should only make correct code faster.
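
As a rough sketch of the kind of loop these pragmas are aimed at, the function below counts elements above a threshold. The function name is made up, and only the optimize pragma is used here so the snippet does not depend on the CPU it runs on. The result is identical with or without the pragma, in line with the warning above.

```cpp
#pragma GCC optimize("O3,unroll-loops")

#include <cstddef>
#include <vector>

// Hypothetical example: counts elements greater than a threshold.
// Each iteration is independent of the previous one, which is the loop
// shape that unrolling and auto-vectorization handle well.
int count_greater(const std::vector<int>& a, int threshold) {
    int count = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        count += (a[i] > threshold) ? 1 : 0;
    }
    return count;
}
```

Whether the pragma makes a measurable difference depends on the compiler version and the judge's flags, so treat any speedup as something to measure, not assume.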

Pragmas control how the compiler translates your code, but GCC also exposes individual CPU instructions you can call directly.

GCC Builtins—Bit Tricks in Hardware

GCC provides built-in functions that compile down to single CPU instructions. For a deeper dive into bitwise techniques, see Bit Manipulation and Bitset Optimization.

__builtin_popcount(x)—Count the number of 1-bits in x.

cpp
int x = 0b10110101;   // binary
int bits = __builtin_popcount(x);  // 5

Use __builtin_popcountll(x) for long long.

__builtin_clz(x)—Count Leading Zeros. Number of zero bits before the first 1-bit from the left. Useful for finding the highest set bit.

cpp
int x = 16;  // binary: 10000
int lz = __builtin_clz(x);  // 27 (for 32-bit int: 32 - 5 = 27)
int highest_bit = 31 - __builtin_clz(x);  // 4 (the bit position)

__builtin_ctz(x)—Count Trailing Zeros. Number of zero bits after the last 1-bit from the right. Useful for finding the lowest set bit.

cpp
int x = 40;  // binary: 101000
int tz = __builtin_ctz(x);  // 3

Warning: __builtin_clz(0) and __builtin_ctz(0) are undefined behavior. Always check x != 0 first, or use the C++20 <bit> header (std::popcount, std::countl_zero, std::countr_zero) which are well-defined for zero.

text
  Bit layout of x = 40 (0b00101000):
  +---+---+---+---+---+---+---+---+
  | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |   (showing lowest 8 bits)
  +---+---+---+---+---+---+---+---+
    7   6   5   4   3   2   1   0     <-- bit positions

  __builtin_clz:      counts from the LEFT  (high bits)
  __builtin_ctz:      counts from the RIGHT (low bits) --> 3 trailing zeros
  __builtin_popcount: total 1-bits = 2
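
One way to keep the zero case from biting is to wrap the builtins once and call the wrappers everywhere. This is a sketch with invented names (`highest_set_bit`, `lowest_set_bit`, `bit_count`); it assumes a 32-bit unsigned int, which holds on the common judges, and returns -1 for zero as an arbitrary convention.

```cpp
// Hypothetical wrappers that make the zero case explicit.

// Index of the highest set bit, or -1 when x == 0.
int highest_set_bit(unsigned int x) {
    return x == 0 ? -1 : 31 - __builtin_clz(x);
}

// Index of the lowest set bit, or -1 when x == 0.
int lowest_set_bit(unsigned int x) {
    return x == 0 ? -1 : __builtin_ctz(x);
}

// Number of set bits in a 64-bit value (note the ll suffix).
int bit_count(unsigned long long x) {
    return __builtin_popcountll(x);
}
```

With these, `highest_set_bit(16)` gives 4 and `lowest_set_bit(40)` gives 3, matching the examples above.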

The #define int long long Debate

You'll see this in many competitive programmers' templates:

cpp
#define int long long

This replaces every int in your code with long long, giving you 64-bit integers everywhere. The motivation is sound: forgetting long long when a computation can exceed 2³¹ is the single most common source of wrong answers for beginners.

Pros:

  • Eliminates overflow bugs from forgetting long long
  • You never have to think about which variables need 64-bit
  • A product like a × b where a, b ≤ 10⁹ just works

Cons:

  • Doubles memory for integer arrays (4×10⁶ bytes → 8×10⁶ bytes for 10⁶ elements)
  • Slightly slower on 32-bit operations (usually negligible)
  • main() must be declared int32_t main() since int main expands to long long main, which is invalid
  • Can confuse debuggers and static analyzers

Verdict: A useful training wheel. Many 2400+ rated coders keep it permanently. If you adopt it, always use int32_t main(), and watch the memory cost on problems with very large arrays (n ≥ 5×10⁶).
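
If you would rather not use the macro, the overflow it guards against can also be handled at the point of multiplication. This is a sketch of that alternative, with an invented helper name; it is a different technique from the #define, not a replacement the source prescribes.

```cpp
// Hypothetical helper: multiplies two 32-bit values without overflow
// by widening one operand before the multiplication happens.
long long wide_product(int a, int b) {
    return 1LL * a * b;  // 1LL * a is long long, so the product is 64-bit
}
```

The trade-off is the one described above: this costs no extra memory, but you have to remember it at every risky multiplication.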

Common Macros That Save Keystrokes

These appear in nearly every competitive programmer's template:

cpp
#define all(x) (x).begin(), (x).end()
#define sz(x) (int)(x).size()
#define pb push_back
#define fi first
#define se second

Usage:

cpp
vector<int> v = {3, 1, 4, 1, 5};
sort(all(v));           // instead of sort(v.begin(), v.end())
cout << sz(v) << '\n';  // instead of (int)v.size()

The (int) cast in sz prevents signed/unsigned comparison warnings when you write for (int i = 0; i < sz(v); i++).

Macros are fine until they make your code unreadable to anyone else. The ones above are universally recognized in competitive programming—beyond that, you're paying readability for keystrokes.
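
A short sketch of the two macros working together in a routine you will write often: sort, drop duplicates, report the new size. The function name `sort_unique` is made up for illustration.

```cpp
#include <algorithm>
#include <vector>

#define all(x) (x).begin(), (x).end()
#define sz(x) (int)(x).size()

// Hypothetical helper: sorts v, removes duplicates, returns the new size.
int sort_unique(std::vector<int>& v) {
    std::sort(all(v));                      // sort(v.begin(), v.end())
    v.erase(std::unique(all(v)), v.end());  // drop adjacent duplicates
    return sz(v);                           // (int)v.size()
}
```

For v = {3, 1, 4, 1, 5} this leaves {1, 3, 4, 5} and returns 4.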

File I/O for USACO

USACO problems read from and write to files. If the problem name is paint, you read from paint.in and write to paint.out:

cpp
int main() {
    freopen("paint.in", "r", stdin);
    freopen("paint.out", "w", stdout);

    // now cin/cout read/write to files automatically
    int n;
    cin >> n;
    cout << n << '\n';
}

freopen redirects stdin/stdout to files. After those calls, cin reads from the file and cout writes to it—no other changes needed in your code.

text
  Before freopen               After freopen
  +--------+                   +--------+
  | stdin  |<--- keyboard      | stdin  |<--- file in
  +--------+                   +--------+
  +--------+                   +--------+
  | stdout |--->  terminal     | stdout |--->  file out
  +--------+                   +--------+

  cin reads from keyboard      cin reads from file
  cout writes to terminal      cout writes to file

Tip: When testing locally, comment out the freopen lines so you can type input directly. Or wrap them:

cpp
#ifdef LOCAL
    // reads from terminal when testing
#else
    freopen("paint.in", "r", stdin);
    freopen("paint.out", "w", stdout);
#endif

Compile with -DLOCAL during testing to activate the ifdef.

What the Code Won't Teach You

Fast I/O belongs in your template, not in your memory. Every competitive programmer pastes a boilerplate at the start of every problem. The two fast I/O lines should live there permanently—as automatic as #include <bits/stdc++.h>. If you rely on remembering to add them under contest pressure, you will forget and lose time debugging a correct solution that TLEs.

There is also a strict order to optimization: algorithm first, then I/O, then constant factors, then pragmas. Reaching for pragmas before exhausting the earlier options is treating symptoms instead of causes.

text
  Optimization priority

  1 Algorithm   |############################| high impact
  2 Fast IO     |####################|        medium
  3 Constants   |###########|                 low
  4 Pragmas     |#####|                       lowest

  Try in this order -->
  Fix algorithm first, then add pragmas last

For truly extreme constraints (n above 10⁷ with a tight time limit), even fast cin may not suffice. A custom reader built on getchar_unlocked() (a POSIX function, so not available on every judge) can be several times faster. This is rare: you'll know when you need it because the constraints all but say "I/O is the bottleneck." Standard fast I/O handles the overwhelming majority of problems.

text
  IO speed ladder

  cin default       |#########################| slow
  cin sync off      |#####|                     fast
  scanf             |#####|                     fast
  getchar reader    |##|                        faster
  mmap              |#|                         extreme
                    +--+--+--+--+--+--+--+--+
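
For the "getchar reader" rung of the ladder, here is a sketch of a hand-written parser. It swaps getchar_unlocked() for a portable variant of the same idea: pull all input into memory with fread, then parse digits by pointer. The names `read_all` and `IntParser` are invented, and the parser does no overflow checking.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: reads an entire FILE* into memory with fread.
std::string read_all(std::FILE* f) {
    std::string data;
    char chunk[1 << 16];
    std::size_t got;
    while ((got = std::fread(chunk, 1, sizeof chunk, f)) > 0) {
        data.append(chunk, got);
    }
    return data;
}

// Hypothetical parser over an in-memory buffer [pos, end).
struct IntParser {
    const char* pos;
    const char* end;

    // Parses the next optionally signed decimal integer. Returns 0 if
    // the buffer has no digits left. No overflow checking.
    long long next() {
        while (pos < end && *pos != '-' && (*pos < '0' || *pos > '9')) ++pos;
        bool negative = false;
        if (pos < end && *pos == '-') { negative = true; ++pos; }
        long long value = 0;
        while (pos < end && *pos >= '0' && *pos <= '9') {
            value = value * 10 + (*pos - '0');
            ++pos;
        }
        return negative ? -value : value;
    }
};
```

In a solution you would call `read_all(stdin)` once, build an `IntParser` over the resulting string's data, and call `next()` wherever you would have written `cin >> x`.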

C++ STL Reference

All I/O-related functions, pragmas, and builtins in one place:

| Name | Header / Source | What It Does | Notes |
|---|---|---|---|
| ios::sync_with_stdio(false) | <iostream> | Disables C/C++ I/O sync | Call once at top of main() |
| cin.tie(nullptr) | <iostream> | Unties cin from cout | Prevents auto-flush before reads |
| cout << '\n' | <iostream> | Outputs newline without flush | Always prefer over endl |
| endl | <iostream> | Outputs newline AND flushes | Only for interactive problems |
| scanf / printf | <cstdio> | C-style formatted I/O | Fast by default, no sync needed |
| freopen(f, m, stream) | <cstdio> | Redirects stream to file | USACO-style file I/O |
| getline(cin, s) | <string> | Reads entire line into string | Watch out after cin >> x |
| __builtin_popcount(x) | GCC builtin | Count of 1-bits in x | Use popcountll for 64-bit |
| __builtin_clz(x) | GCC builtin | Leading zero count | UB if x == 0 |
| __builtin_ctz(x) | GCC builtin | Trailing zero count | UB if x == 0 |
| std::popcount(x) | <bit> (C++20) | Portable popcount | Requires unsigned type |
| std::countl_zero(x) | <bit> (C++20) | Portable CLZ | Well-defined for 0 |
| std::countr_zero(x) | <bit> (C++20) | Portable CTZ | Well-defined for 0 |
| #pragma GCC optimize(...) | Pragma | Sets optimization flags | "O2", "O3", "unroll-loops" |
| #pragma GCC target(...) | Pragma | Enables CPU features | "avx2", "popcnt", "bmi2" |

Implementation—Contest-Ready Templates

Minimal Template (Copy-Paste This)

This is the actual boilerplate you paste into every new file:

cpp
#pragma GCC optimize("O2,unroll-loops")
#pragma GCC target("avx2,bmi,bmi2,popcnt")
#include <bits/stdc++.h>
using namespace std;

#define int long long
#define all(x) (x).begin(), (x).end()
#define sz(x) (int)(x).size()

int32_t main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // your solution here

    return 0;
}

Explained Template (Learn From This)

cpp
// Tell GCC to optimize aggressively and use modern CPU instructions.
// These are safe on Codeforces, AtCoder, and most online judges.
#pragma GCC optimize("O2,unroll-loops")
#pragma GCC target("avx2,bmi,bmi2,popcnt")

// The competitive programmer's universal header -- includes everything.
// Never use in production code, but perfect for contests.
#include <bits/stdc++.h>
using namespace std;

// Every 'int' becomes 'long long'. Prevents overflow bugs when you
// forget that a * b can exceed 2^31. The cost: 2x memory per int.
#define int long long

// Shorthand so sort(v.begin(), v.end()) becomes sort(all(v)).
#define all(x) (x).begin(), (x).end()

// Returns size as int, avoiding signed/unsigned comparison warnings.
#define sz(x) (int)(x).size()

// Since we #defined int as long long, main() would become
// "long long main()" which is invalid. int32_t is always 32-bit.
int32_t main() {
    // Disable synchronization between C and C++ I/O streams.
    // Makes cin/cout several times faster. After this, NEVER mix cin with scanf.
    ios::sync_with_stdio(false);

    // Untie cin from cout. Without this, cout flushes before every cin
    // read, which is pointless in competitive programming.
    cin.tie(nullptr);

    // --- USACO only: uncomment for file I/O ---
    // freopen("problem.in", "r", stdin);
    // freopen("problem.out", "w", stdout);

    // Your solution goes here.
    int n;
    cin >> n;
    cout << n << '\n';  // '\n' not endl!

    return 0;
}

USACO-Specific Template

cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    freopen("problem.in", "r", stdin);
    freopen("problem.out", "w", stdout);

    int n;
    cin >> n;
    // solve...
    int answer = n;  // placeholder: replace with the real computation
    cout << answer << '\n';

    return 0;
}

Replace "problem" with the actual problem name (e.g., "paint", "hps").


Operations Reference

Relative speed of I/O methods reading 10⁶ integers:

| Method | Approx Time | Relative Speed | Notes |
|---|---|---|---|
| cin >> x (default sync) | ~750 ms | 1x (baseline) | Don't do this |
| scanf("%d", &x) | ~100 ms | ~7x faster | Always fast |
| cin >> x (sync off) | ~100 ms | ~7x faster | Best balance of speed + safety |
| Custom getchar parser | ~30 ms | ~25x faster | Overkill for most problems |
| mmap + manual parse | ~10 ms | ~75x faster | Never needed in CP |

Output comparison printing 10⁶ lines:

| Method | Approx Time | Notes |
|---|---|---|
| cout << x << endl | ~1500 ms | Flushes every line; disastrous |
| cout << x << '\n' | ~80 ms | Buffered output; fast |
| printf("%d\n", x) | ~80 ms | Equally fast |
| Build string, print once | ~50 ms | Marginal gain, rarely needed |

Problem Patterns Where I/O Speed Matters

Pattern Fingerprint: n ≥ 5×10⁵ with O(n) algorithm → fast I/O is mandatory. t ≥ 10⁴ test cases → use '\n' not endl. "Interactive" in problem title → flush after every query.

Pattern 1: Large Input, Trivial Computation

Problems where n ≥ 5×10⁵ and the algorithm is O(n) or O(n log n). The I/O time dominates if you don't optimize. Almost every CF Div 2A-C with large constraints falls here.

Example: CF 1914A -- Problemsolving Log: read t test cases with strings up to 10⁶ total characters.

Pattern 2: Many Test Cases

Problems with t ≥ 10⁵ test cases. Even if each case is tiny, the overhead of flushing between cases adds up.

Example: CF 1873A -- Short Sort

Pattern 3: Interactive Problems

Interactive problems send queries to the judge and read responses. You must flush after every output line, or the judge never sees your query.

cpp
// Interactive problem template
cout << "? " << query << endl;   // endl flushes -- needed here!
cin >> response;
// ... after finding the answer:
cout << "! " << answer << endl;

This is the ONE place where endl (or cout.flush()) is correct.

Example: CF 1167B1 -- Wrong Answer

Pattern 4: Output-Heavy Problems

Printing a large matrix, grid, or sequence. Use '\n' and let the buffer handle it.

Example: CF 1923A -- Moving Chips

Pattern 5: String Processing With getline

Reading strings with spaces requires getline. Watch the newline trap: after cin >> n, a stray '\n' remains in the buffer. You must consume it:

cpp
int n;
cin >> n;
cin.ignore();  // consume the leftover '\n'
string line;
getline(cin, line);

Pattern 6: Bitwise Problems Using Builtins

Counting set bits, finding MSB/LSB—use __builtin_popcount and friends instead of writing manual loops.

Example: CF 1957A -- Stickogon—problems that benefit from fast bit operations.


Contest Cheat Sheet

Mnemonic Anchor: S-U-N: Sync off, Untie cin, Newline not endl. Three letters, three lines, zero TLEs from I/O.

If I Had 5 Minutes

If you're about to start a contest and can only remember five things:

  1. ios::sync_with_stdio(false); cin.tie(nullptr);—paste these in every solution, always
  2. '\n' not endl: endl flushes; flushing in loops causes TLE
  3. endl for interactive only—the one exception to rule #2
  4. Never mix cin with scanf after disabling sync—pick one I/O family
  5. cin.ignore() before getline—eat the leftover newline after cin >>

"Before You Code" I/O Checklist

Run through this before writing any solution:

  • [ ] Template has ios::sync_with_stdio(false) and cin.tie(nullptr)
  • [ ] No endl in output loops (using '\n' instead)
  • [ ] If interactive: using endl or cout.flush() after every query (with explicit flushes cin.tie(nullptr) is harmless; leaving cin tied is the safer default)
  • [ ] If using getline after cin >>: added cin.ignore() between them
  • [ ] If USACO: freopen lines are correct and uncommented
text
+------------------------------------------------------------------+
|                    FAST I/O CHEAT SHEET                          |
+------------------------------------------------------------------+
|  TOP OF FILE:                                                    |
|    #pragma GCC optimize("O2,unroll-loops")                       |
|    #pragma GCC target("avx2,bmi,bmi2,popcnt")                    |
|    #include <bits/stdc++.h>                                      |
|    using namespace std;                                          |
|                                                                  |
|  TOP OF main():                                                  |
|    ios::sync_with_stdio(false);                                  |
|    cin.tie(nullptr);                                             |
|                                                                  |
|  RULES:                                                          |
|    * Use '\n' instead of endl (ALWAYS, except interactive)       |
|    * Never mix cin with scanf after disabling sync               |
|    * For interactive problems: use endl or cout.flush()          |
|    * USACO: add freopen("X.in","r",stdin)                        |
|                                                                  |
|  BUILTINS:                                                       |
|    __builtin_popcount(x)  -> # of 1-bits                         |
|    __builtin_clz(x)       -> leading zeros  (UB if x==0)         |
|    __builtin_ctz(x)       -> trailing zeros (UB if x==0)         |
|    Add 'll' suffix for long long versions                        |
|                                                                  |
|  MACROS:                                                         |
|    #define int long long    (use int32_t main)                   |
|    #define all(x) (x).begin(),(x).end()                          |
|    #define sz(x)  (int)(x).size()                                |
+------------------------------------------------------------------+

Gotchas and Debugging

Mixing C and C++ I/O After Disabling Sync

This is the #1 bug from fast I/O:

cpp
ios::sync_with_stdio(false);
scanf("%d", &n);  // BUG: using C I/O after disabling sync
cin >> m;          // reads from a different buffer -- data lost/corrupted

After sync_with_stdio(false), pick one style and stick with it. Use cin/cout for everything, or scanf/printf for everything. Never both.

The getline Trap

cpp
int n;
cin >> n;
string s;
getline(cin, s);  // BUG: reads the leftover '\n', s is empty!

Fix: add cin.ignore() or cin >> ws before getline.

cpp
int n;
cin >> n;
cin.ignore();      // consume the '\n' left by cin >> n
string s;
getline(cin, s);   // now reads the actual next line

endl in Tight Output Loops

cpp
for (int i = 0; i < 1000000; i++) {  // invariant: a[0..i-1] already printed
    cout << a[i] << endl;  // 10^6 flushes = TLE
}

Replace with '\n'. This alone can take a solution from TLE to AC.

Forgetting int32_t main() With #define int long long

cpp
#define int long long
int main() {   // expands to: long long main() -- INVALID!

The fix: always use int32_t main() when you have this define.

Pragma Doesn't Fix Wrong Algorithms

Pragmas give a constant-factor speedup. They won't save an O(n²) solution when you need O(n log n). Fix your algorithm first, then add pragmas as insurance. See Complexity Analysis to understand why constant-factor gains can never overcome a worse complexity class.

__builtin_clz(0) Is Undefined Behavior

cpp
int x = 0;
int lz = __builtin_clz(x);  // UB! Could return anything or crash

Always guard with if (x != 0) or use C++20's std::countl_zero which returns the bit width for zero input.

freopen Path Issues

On your local machine, freopen("paint.in", "r", stdin) looks for the file relative to where you run the program, not where the source file is. If you get "no such file," check your working directory.

Large Arrays With #define int long long

cpp
#define int long long
int a[10000000];  // 80 MB! Might exceed memory limit (usually 256 MB)

Without the define, int a[10000000] is only 40 MB. For very large arrays, consider removing the #define or using int32_t explicitly for the array:

cpp
#define int long long
int32_t a[10000000];  // still 40 MB

Mental Traps

The Mistake That Teaches: During a Codeforces round, I solved problem C in 12 minutes with an O(n) solution for n ≤ 10⁶. I submitted confidently. TLE on test 3. I spent 40 minutes rewriting the algorithm three different ways, each one TLE. With 5 minutes left, I added ios::sync_with_stdio(false); cin.tie(nullptr); and changed endl to '\n'. Accepted in 62 ms. The bottleneck was never the algorithm; it was the I/O. Now those two lines live in my template, and I never think about them again.

Trap 1: "My algorithm is fast, so I/O time is negligible"

For small n this holds. For n above 10⁶, I/O can dominate the total runtime.

text
  WRONG -- small n             RIGHT -- n past 1M
  Time --->                    Time --->
  +----------+--+              +----+----------+
  |  Algo    |IO|              |Algo|    IO    |
  |##########|  |              |####|##########|
  +----------+--+              +----+----------+
   ^ dominant                        ^ dominant

Always add the two fast I/O lines. Two lines of code, zero risk.

Trap 2: "Pragmas will rescue my O(n²)"

Pragmas give a constant-factor speedup. They cannot change algorithmic complexity.

text
  WRONG -- pragma saves n*n    RIGHT -- fix algorithm
  n 100k  10^10 ops            n 100k  n log n
  +--------------------+       +--+
  |####################| TLE   |##| AC
  +--------------------+       +--+
       |
       | pragma /4x speedup
       v
  +----------+
  |##########| still TLE
  +----------+

Trap 3: "sync_with_stdio(false) speeds up all I/O"

It only decouples C++ streams from C streams. C library functions like scanf and printf are unaffected.

text
  WRONG                         RIGHT
  All IO gets faster            Only cin and cout faster
  +--------+------+             +--------+------+
  | cin    | FAST |             | cin    | FAST |
  | cout   | FAST |             | cout   | FAST |
  | scanf  | FAST | <-- no      | scanf  | same |
  | printf | FAST | <-- no      | printf | same |
  +--------+------+             +--------+------+

IO Debugging

When your output looks correct but the judge says Wrong Answer, the problem is often invisible whitespace or flushing—not your algorithm. This section covers the tools for hunting down I/O bugs.

See also: Debugging and Stress Testing for general debugging strategies beyond I/O.

Using cerr for Debug Output

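cerr writes to standard error, which online judges ignore when checking your answer. You can leave debug prints in your submission and they won't change the verdict on correctness, though they still cost time, so heavy debug output can push a solution toward TLE: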

cpp
int n;
cin >> n;
cerr << "DEBUG: n = " << n << '\n';  // judge ignores this
// ... solve ...
cout << answer << '\n';              // judge reads this

Tip: Wrap debug output in a macro so you can disable it with one toggle:

cpp
#ifdef LOCAL
    #define dbg(x) cerr << #x << " = " << (x) << '\n'
#else
    #define dbg(x)  // expands to nothing on the judge
#endif

Compile with g++ -DLOCAL solution.cpp locally to see debug output.

Testing I/O With File Redirection

Instead of typing input every time, save it to a file and redirect:

bash
# Create input file
echo "3
1 2 3" > input.txt

# Run with redirected input
./solution < input.txt

# Compare output against expected
./solution < input.txt > output.txt
diff output.txt expected.txt
text
  Terminal workflow
  +----------+      +-----------+      +----------+
  | input.txt|----->| ./solution|----->|output.txt|
  +----------+  <   +-----------+  >   +----------+
                                            |
                                       diff v
                                       +----------+
                                       |expected  |
                                       +----------+

Common I/O Debugging Mistakes

Reading past end of input: If your code tries to read more values than the input provides, cin enters a fail state. For integers, the read that fails stores 0 (since C++11), and every later read is skipped, leaving its variable unchanged.

cpp
// Input has 3 numbers: "1 2 3"
int a, b, c, d;
cin >> a >> b >> c >> d;  // d reads 0 -- cin is in fail state
// All further cin >> operations silently fail

Mixing cin and scanf: After sync_with_stdio(false), the C and C++ streams use separate buffers. Data read by one is invisible to the other.

cpp
ios::sync_with_stdio(false);
int x, y;
scanf("%d", &x);  // reads from C buffer
cin >> y;          // reads from C++ buffer -- different data!

Extra spaces or newlines in output: Some judges require exact output format. A trailing space or missing newline can cause Wrong Answer.

cpp
// WRONG -- trailing space after last element
for (int i = 0; i < n; i++) {  // invariant: i elements printed so far
    cout << a[i] << ' ';
}
cout << '\n';

// RIGHT -- no trailing space
for (int i = 0; i < n; i++) {  // invariant: i elements printed so far
    if (i > 0) cout << ' ';
    cout << a[i];
}
cout << '\n';

Interactive Problem IO

Interactive problems are a special category where your program communicates with the judge in real time: you send a query, the judge responds, and you use that response to decide your next query.

How Interactive Problems Work on Codeforces

text
  Your program                Judge
  +----------+               +----------+
  | cout <<  |---"? 5"------>| reads    |
  | query    |               | query    |
  +----------+               +----------+
                                  |
  +----------+               +----------+
  | cin >>   |<--"1"---------|  sends   |
  | response |               | response |
  +----------+               +----------+
  |          |               |          |
  | cout <<  |---"! 7"------>| checks   |
  | answer   |               | answer   |
  +----------+               +----------+

  1. You print a query (e.g., "? 5") and flush the output
  2. The judge reads your query and prints a response
  3. You read the response and decide the next query
  4. When you know the answer, print "! answer" and flush

The Flush Requirement

This is the single most common mistake in interactive problems. If you don't flush after printing, your output sits in a buffer and the judge never sees it—your program hangs waiting for a response that never comes, and you get TLE or "Idleness limit exceeded."

cpp
// Method 1: Use endl (it writes '\n' AND flushes)
cout << "? " << mid << endl;

// Method 2: Use cout.flush() explicitly
cout << "? " << mid << '\n';
cout.flush();

// Method 3: Keep cin tied to cout (cout is flushed automatically before each read)
// Note: cin.tie(&cout) is the DEFAULT behavior, so if you
// called cin.tie(nullptr), you need to re-tie for interactive:
cin.tie(&cout);

Example: Binary Search Interactive Pattern

Many interactive problems follow a binary search pattern—you have a limited number of queries to find a hidden value.

cpp
int32_t main() {
    ios::sync_with_stdio(false);
    // Do NOT untie cin from cout in interactive problems!
    // cin.tie(nullptr);  <-- OMIT this line

    int lo = 1, hi = 1000000;
    while (lo < hi) {          // invariant: answer is in [lo, hi]
        int mid = lo + (hi - lo) / 2;
        cout << "? " << mid << endl;  // query + flush

        int response;
        cin >> response;

        if (response == 1) {
            hi = mid;          // answer <= mid
        } else {
            lo = mid + 1;     // answer > mid
        }
    }
    // invariant: lo == hi, we found the answer
    cout << "! " << lo << endl;  // final answer + flush
    return 0;
}
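
One way to test an interactive solution without a judge is to separate the search logic from the I/O. The sketch below is an illustration rather than part of the template above: `findHidden` and `askJudge` are hypothetical names, and it assumes the same protocol as the example (a response of 1 means the hidden value is at most the queried value).

```cpp
#include <functional>
#include <iostream>

// Hypothetical helper: binary search driven by any yes/no oracle.
// ask(mid) must return true when the hidden value is <= mid.
int findHidden(int lo, int hi, const std::function<bool(int)>& ask) {
    while (lo < hi) {                 // invariant: answer is in [lo, hi]
        int mid = lo + (hi - lo) / 2;
        if (ask(mid)) hi = mid;       // answer <= mid
        else lo = mid + 1;            // answer > mid
    }
    return lo;
}

// Hypothetical contest-side oracle: print the query, flush, read the reply.
// Assumes the judge answers 1 when the hidden value is <= x.
bool askJudge(int x) {
    std::cout << "? " << x << std::endl;  // endl flushes, so the judge sees the query
    int response = 0;
    std::cin >> response;
    return response == 1;
}
```

In a submission, `main` would call `findHidden(1, 1000000, askJudge)` and print the result with `endl`. Locally, passing a lambda such as `[&](int m) { return hidden <= m; }` checks the search against a known value with no judge involved.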

Interactive Problem Pitfalls

| Pitfall | Symptom | Fix |
|---|---|---|
| Forgetting to flush | TLE or "Idleness limit exceeded" | Use endl or cout.flush() after every query |
| Using cin.tie(nullptr) | Queries don't reach the judge | Omit cin.tie(nullptr) or re-tie with cin.tie(&cout) |
| Reading too much | Program hangs | Only read exactly what the judge sends |
| Printing debug to cout | Judge interprets it as a query | Use cerr for all debug output |
| Using '\n' instead of endl | Output stays buffered | Switch to endl for interactive |

scanf vs cin Benchmark Comparison

Benchmarks reading $10^6$ random integers, compiled with g++ -O2, averaged over 10 runs. Times vary by machine; the ratios are what matter:

| Method | Time (ms) | Relative | Code |
|---|---|---|---|
| cin >> x (default sync) | ~750 | 1.0x | cin >> x; |
| cin >> x (sync off, still tied) | ~130 | 5.8x faster | sync_with_stdio(false) only |
| cin >> x (sync off + untied) | ~100 | 7.5x faster | Both lines in template |
| scanf("%d", &x) | ~105 | 7.1x faster | No setup needed |
| Custom getchar reader | ~30 | 25x faster | Manual digit parsing |
| getchar_unlocked reader | ~18 | 42x faster | Non-portable (POSIX only) |

When Each Method Matters

  • $n \le 10^5$: Any method works. Don't worry about I/O speed.
  • $n \approx 10^6$: Default cin will TLE on 1–2 s time limits. Use the two-line template (sync off + untied) or scanf. This covers nearly all problems.
  • $n \ge 5 \times 10^6$, tight TL: Consider a custom reader. This is rare, perhaps a handful of problems per year on CF.
  • $n \ge 10^7$: Custom reader is almost mandatory. See cp-algorithms I/O guide for implementations.

The untying (cin.tie(nullptr)) matters independently of sync. Even with sync off, a tied cin still forces a flush of cout before every read, an overhead you don't notice until n hits $10^6$.
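
For reference, the "manual digit parsing" row in the benchmark table can be sketched roughly as follows. This is a minimal illustration, not the benchmarked code: `readInt` is a hypothetical name, it assumes well-formed input whose values fit in `int`, and it calls `getc(in)` on a stream parameter so it can be tried on any `FILE*` (`getchar()` is equivalent to `getc(stdin)`).

```cpp
#include <cstdio>

// Hypothetical helper: parse one signed decimal integer from a C stream.
// Assumes well-formed input and a value that fits in int; returns 0 at EOF.
int readInt(std::FILE* in = stdin) {
    int c = std::getc(in);
    // Skip anything that cannot start a number (spaces, newlines, ...).
    while (c != EOF && c != '-' && (c < '0' || c > '9')) c = std::getc(in);
    bool negative = false;
    if (c == '-') { negative = true; c = std::getc(in); }
    int value = 0;
    while (c >= '0' && c <= '9') {    // accumulate digits left to right
        value = value * 10 + (c - '0');
        c = std::getc(in);            // the first non-digit is consumed and dropped
    }
    return negative ? -value : value;
}
```

A reader like this uses the C stream directly, so the same rule applies as for scanf: do not combine it with cin once sync is off.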


What Would You Do If...?

Scenario 1: You submit a correct O(n) solution with $n \le 10^6$ and get TLE. Your algorithm is provably optimal. What do you check?

Check in order: (1) Are the two fast I/O lines present? (2) Are you using endl anywhere in a loop? (3) Are you mixing cin with scanf? (4) Is #define int long long causing cache issues on huge arrays? If all else fails, try scanf/printf or a custom reader.

Scenario 2: Your interactive submission gets "Idleness limit exceeded" on test 1 but works perfectly locally.

You're almost certainly not flushing. Check: (1) Are you using endl or cout.flush() after every query? (2) Did you accidentally include cin.tie(nullptr) in your template? For interactive problems, remove that line or re-tie with cin.tie(&cout).

Scenario 3: Your solution reads a string with spaces but getline returns an empty string on the first call.

The classic getline trap. You read an integer with cin >> n before getline, and the leftover '\n' is consumed by getline instead of the actual line. Add cin.ignore() or cin >> ws between the cin >> and getline calls. See C++ Essentials for more on string I/O patterns.


When NOT to Use Fast I/O Tricks

Not every technique in this file is appropriate for every situation:

| Technique | When to Skip It | Why |
|---|---|---|
| #pragma GCC optimize | Production code, team projects | Non-portable, obscures real performance issues |
| #pragma GCC target("avx2") | Unknown judge hardware | Old CPUs may not support AVX2; causes RE |
| #define int long long | Arrays with $n > 5 \times 10^6$ | Doubles memory; can cause MLE |
| cin.tie(nullptr) | Interactive problems | Judge never sees your queries without flush |
| sync_with_stdio(false) | Code that must mix cin with scanf | Separate buffers cause data corruption |
| Custom getchar reader | $n \le 10^6$ | Unnecessary complexity; standard I/O is fast enough |
| bits/stdc++.h | Production code, portability | GCC-only header, massive compile time |
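
The "doubles memory" entry for `#define int long long` can be checked with a little arithmetic. The sketch below is illustrative only: `arrayBytes`, `kN`, `kIntBytes`, and `kLongBytes` are hypothetical names, and it assumes the common contest platform where `int` is 4 bytes and `long long` is 8.

```cpp
#include <cstddef>

// Bytes needed for n elements of type T (ignores any container overhead).
template <typename T>
constexpr std::size_t arrayBytes(std::size_t n) {
    return n * sizeof(T);
}

// Assumption: 4-byte int and 8-byte long long, as on typical GCC judges.
static_assert(sizeof(int) == 4 && sizeof(long long) == 8, "unexpected type sizes");

constexpr std::size_t kN = 5000000;                            // n = 5 * 10^6
constexpr std::size_t kIntBytes = arrayBytes<int>(kN);         // 20,000,000 bytes
constexpr std::size_t kLongBytes = arrayBytes<long long>(kN);  // 40,000,000 bytes
```

One such array is harmless on its own; the doubling starts to matter when a solution holds several arrays of this size at once.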

Pattern Fingerprint: Constraint says $n \le 10^5$ with 2 s TL → I/O speed is irrelevant; focus on algorithm. Constraint says $n \le 10^6$ with 1 s TL → use template fast I/O. Constraint says $n \le 10^7$ with 3 s TL → consider a custom reader.


Debug This Exercises

Each snippet has a subtle I/O bug. Try to spot it before revealing the answer.

Exercise 1: The Silent Fail

cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    int n;
    scanf("%d", &n);
    vector<int> a(n);
    for (int i = 0; i < n; i++) cin >> a[i];  // invariant: a[0..i-1] read
    cout << *max_element(a.begin(), a.end()) << '\n';
}
Bug and Fix

Bug: scanf is used after sync_with_stdio(false). The C and C++ streams now use separate buffers, so scanf reads n from the C buffer, but cin reads array values from the C++ buffer—which may be empty or contain different data.

Fix: Replace scanf("%d", &n) with cin >> n, or remove the sync_with_stdio(false) line.

Exercise 2: The Hanging Interactive

cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    int lo = 1, hi = 1000;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        cout << "? " << mid << '\n';
        int resp;
        cin >> resp;
        if (resp <= mid) hi = mid;
        else lo = mid + 1;
    }
    cout << "! " << lo << '\n';
}
Bug and Fix

Bug: Two problems. (1) cin.tie(nullptr) means cout is NOT flushed before cin >> resp, so the judge never sees the query—the program hangs. (2) Using '\n' instead of endl means the output stays buffered.

Fix: Remove cin.tie(nullptr) and change '\n' to endl (or add cout.flush() after each query). The final answer line also needs flushing.

Exercise 3: The Phantom Newline

cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    int t;
    cin >> t;
    while (t--) {
        string s;
        getline(cin, s);
        cout << s.length() << '\n';
    }
}
Bug and Fix

Bug: After cin >> t, a '\n' remains in the buffer. The first getline reads this empty line, so s is empty and its length is 0. Every subsequent test case reads the previous test case's input line.

Fix: Add cin.ignore() or cin >> ws after cin >> t (before the loop). If each test case is a single line, cin.ignore() once before the loop is sufficient. If there might be blank lines in input, use cin >> ws.

Exercise 4: The Slow Surprise

cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
    int n;
    cin >> n;
    vector<int> a(n);
    for (int i = 0; i < n; i++) cin >> a[i];  // invariant: a[0..i-1] read
    sort(a.begin(), a.end());
    for (int i = 0; i < n; i++) {  // invariant: a[0..i-1] already printed
        cout << a[i] << endl;
    }
}
Bug and Fix

Bug: Two issues. (1) Missing ios::sync_with_stdio(false) and cin.tie(nullptr): reading $10^6$ ints is several times slower than necessary. (2) Using endl in the output loop causes $10^6$ flush syscalls.

Fix: Add the two fast I/O lines at the top of main(), and replace endl with '\n'. Together, these can speed up the program dramatically.
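
To see the per-line flush directly, one option is to count flush requests on an in-memory stream. The sketch below is a demonstration under assumptions, not a benchmark: `CountingBuf` and `countFlushes` are hypothetical names, and the buffer only counts calls to `sync()`, whereas on a real stdout each flush request would typically turn into a write to the operating system.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical in-memory buffer that counts how often it is asked to flush.
struct CountingBuf : std::stringbuf {
    int flushes = 0;
    int sync() override {   // invoked by ostream::flush(), which endl calls
        ++flushes;
        return 0;           // report success
    }
};

// Writes `lines` numbers, one per line, and returns the number of flush requests.
int countFlushes(int lines, bool useEndl) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; i++) {
        if (useEndl) out << i << std::endl;  // newline + flush
        else         out << i << '\n';       // newline only
    }
    return buf.flushes;
}
```

With `useEndl` set, the count equals the number of lines; with '\n' it stays at zero, because nothing requests a flush until the stream is explicitly flushed or the program ends.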


Self-Test

  • [ ] Write the two-line fast I/O setup from memory and explain what each line does
  • [ ] Explain why endl is slower than '\n' and estimate the magnitude of difference
  • [ ] State what happens if you use scanf after calling ios::sync_with_stdio(false)
  • [ ] Name the conditions under which a custom getchar_unlocked() reader is worth writing
  • [ ] Describe what #pragma GCC optimize("O3") actually does and why it cannot fix an algorithmically slow solution
  • [ ] Explain why int32_t main() is needed when using #define int long long
  • [ ] Describe the one situation where endl IS the correct choice over '\n'

Practice Problems

Work through these in order. Each one will TLE with naive I/O, or has constraints that make fast I/O important.

| # | Problem | Source | Difficulty | Focus |
|---|---|---|---|---|
| 1 | Sum of Round Numbers | CF 1352A | 800 | Multi-test-case I/O |
| 2 | Short Sort | CF 1873A | 800 | Many test cases ($t \le 10^4$) |
| 3 | Odd One Out | CF 1915A | 800 | XOR / builtins practice |
| 4 | CSES -- Weird Algorithm | CSES | Easy | Basic I/O |
| 5 | CSES -- Missing Number | CSES | Easy | Read $2 \times 10^5$ ints |
| 6 | CSES -- Bit Strings | CSES | Easy | Modular arithmetic + I/O |
| 7 | Problemsolving Log | CF 1914A | 800 | Large total input |
| 8 | USACO 2020 Dec Bronze -- Do You Know Your ABCs? | USACO | Bronze | File I/O with freopen |
| 9 | CSES -- Two Sets | CSES | Easy | Output-heavy |
| 10 | Guess the Number | CF 1167B1 | 1200 | Interactive (must flush!) |

Rating Progression

How I/O optimization knowledge maps to your competitive programming journey:

| CF Rating | What You Should Know |
|---|---|
| 1200 | The two-line fast I/O setup is automatic in your template. You never use endl except interactively. You've been bitten by the getline trap at least once. |
| 1500 | You recognize when a TLE is caused by I/O vs. algorithm. You know freopen for USACO. You use cerr for debug output and strip it before submitting. |
| 1800 | You can write interactive problem I/O confidently with proper flushing. You understand the cost of #define int long long on memory and know when to avoid it. You've used pragmas to squeeze past tight time limits on correct solutions. |
| 2100 | You can write a custom getchar_unlocked reader for extreme constraints. You understand buffer mechanics and can debug I/O issues by reasoning about stream state. You know exactly when pragmas help and when they're cargo-culting. |

Further Reading


See also:

Next: Complexity Analysis—learn how to analyze why your algorithm is slow so you know whether the fix is better I/O or a better algorithm.

Built for the climb from Codeforces 1100 to 2100.