Big O Notation

Big O notation is used to describe growth rates. It formalizes statements such as "these two functions grow at the same rate" or "this function grows faster than that one."

It is very commonly used in computer science when analyzing algorithms. An algorithm has a running time, usually expressed as a function of its input size. However, implementations of the same algorithm in different languages may yield different functions. For example, an algorithm with input size \(n\) bytes might take \(n^2\) microseconds when implemented in C++, but \(1000n^2 + 1000n\) microseconds when implemented in Python, since Python is a slower language. Thus, we often talk about an algorithm's growth rate instead: whether the algorithm takes time proportional to its input size (linear), to the square of the size (quadratic), or perhaps independent of its input size (constant). Big O notation formalizes this; we say the algorithm above runs in "quadratic time," or "\(O(n^2)\)."

Another common use is in approximations with power series. A power series typically has infinitely many terms, but for the sake of an approximation we often don't need many of them. For example, the sine function has power series \(\sin x = x - \frac{x^3}{6} + \frac{x^5}{120} - \frac{x^7}{5040} + \cdots\); to approximate \(\sin 0.1\) to four decimal places, we don't need the terms with exponent greater than 4, because \(0.1^5 = 0.00001\) does not affect the fourth decimal place. We formalize this with big O notation as well, writing "\(\sin x = x - \frac{x^3}{6} + O(x^5)\)" to indicate that the omitted terms are too small to matter.
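
As a quick numerical illustration (a minimal Python sketch added here, not part of the original text), the truncated series \(x - \frac{x^3}{6}\) already matches \(\sin 0.1\) far beyond the fourth decimal place:

```python
import math

x = 0.1
truncated = x - x**3 / 6          # sin x = x - x^3/6 + O(x^5)
exact = math.sin(x)

print(f"truncated: {truncated:.10f}")              # 0.0998333333
print(f"math.sin : {exact:.10f}")                  # 0.0998334166
print(f"error    : {abs(exact - truncated):.2e}")  # ~8.3e-08, far below 1e-4
```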

Contents

  • Intuitive Meaning
  • Formal Definition
  • Landau Notations
  • Properties
  • Use in Computer Science
  • See Also

Intuitive Meaning

Informally, if we have functions \(f, g\) such that \(f\) eventually grows slower than some multiple of \(g\), we say "\(f = O(g)\)."

For example, consider the functions \(f(n) = 1000n^2\) and \(g(n) = n^3\) as \(n \to \infty\). "Eventually," namely when \(n > 1000\), we see that \(f(n) < g(n)\), and thus \(f(n)\) grows slower than \(g(n)\). This means \(f = O(g)\), or \(1000n^2 = O(n^3)\).

In fact, since the definition says "some multiple of \(g\)," we also have \(1000n^2 = O(n^2)\): for a suitable multiple of \(n^2\) \(\big(\)namely \(1001n^2\big)\), we eventually \((\)in this case for \(n > 1)\) have \(f(n) < 1001n^2\).

The examples above describe behavior as the input approaches \(\infty\), which is what we usually want in the analysis of algorithms. (Algorithms can be run on arbitrarily large inputs, hence the interest in the limit at infinity.) In approximations of power series, on the other hand, we often care about the behavior as the input approaches a specific finite value, for example 0. Big O notation can be used in this case as well.

For example, consider the functions \(f(n) = 1000n^2\) and \(g(n) = n\) as \(n \to 0\). "Eventually," which in this case means \(0 < n < 0.001\), we have \(f(n) < g(n)\), and thus \(f = O(g)\) as \(n \to 0\).

Formal Definition

We now formalize the informal definition above, mostly by clarifying what "eventually" and "slower" mean.

If \(f(x), g(x)\) are functions on the real numbers, then we say \(f = O(g)\), read "\(f\) is big-oh of \(g\)," as \(x \to \infty\) if there exist constants \(M\) and \(c > 0\) such that \(|f(x)| \le c\,|g(x)|\) for all \(x > M\).

If \(f(x), g(x)\) are functions on the real numbers, then we say \(f = O(g)\), read "\(f\) is big-oh of \(g\)," as \(x \to a\) if there exist constants \(\delta > 0\) and \(c > 0\) such that \(|f(x)| \le c\,|g(x)|\) for all \(0 < |x-a| < \delta\).

We can verify that

  • for \(f(n) = 1000n^2\) and \(g(n) = n^3\) as \(n \to \infty\), we pick \(M = 1000, c = 1;\)
  • for \(f(n) = 1000n^2\) and \(g(n) = n^2\) as \(n \to \infty\), we pick \(M = 1, c = 1001;\)
  • for \(f(n) = 1000n^2\) and \(g(n) = n\) as \(n \to 0\), we pick \(\delta = 0.001, c = 1.\)
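
To make "eventually" and "some multiple" concrete, here is a small Python sketch (added for illustration; the helper function holds is ours, not from the article) that spot-checks \(|f(x)| \le c\,|g(x)|\) at a few sample points past each threshold. A handful of points is not a proof, but it shows the definition at work.

```python
# Spot-check |f(x)| <= c*|g(x)| for the three (M or delta, c) choices above.
def holds(f, g, c, xs):
    return all(abs(f(x)) <= c * abs(g(x)) for x in xs)

# f(n) = 1000 n^2 = O(n^3) as n -> infinity, with M = 1000, c = 1
print(holds(lambda n: 1000 * n**2, lambda n: n**3, 1, [1001, 5000, 10**6]))  # True

# f(n) = 1000 n^2 = O(n^2) as n -> infinity, with M = 1, c = 1001
print(holds(lambda n: 1000 * n**2, lambda n: n**2, 1001, [2, 10, 10**6]))    # True

# f(n) = 1000 n^2 = O(n) as n -> 0, with delta = 0.001, c = 1
print(holds(lambda n: 1000 * n**2, lambda n: n, 1, [0.0005, 1e-4, -1e-4]))   # True
```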

Technically, the more appropriate notation is \(f \in O(g)\), because the equal sign does not imply symmetry. (It is to be read as "is" in English, rather than "is equal to.") Thus it is more correct to write \(1000n^2 \in O(n^3)\) than \(1000n^2 = O(n^3)\); in this view, \(O(g)\) is the set of all functions \(f\) satisfying the definition above. However, the equal sign is more common and is an accepted abuse of notation.

Landau Notations

Big O notation has a few related notions. If \(f, g\) are functions, then informally

  • \(f = O(g)\) (big-oh) if eventually \(f\) grows slower than some multiple of \(g;\)
  • \(f = o(g)\) (little-oh) if eventually \(f\) grows slower than any multiple of \(g;\)
  • \(f = \Omega(g)\) (big-omega) if eventually \(f\) grows faster than some multiple of \(g;\)
  • \(f = \omega(g)\) (little-omega) if eventually \(f\) grows faster than any multiple of \(g;\)
  • \(f = \Theta(g)\) (theta) if eventually \(f\) grows at the same rate as \(g.\)

Formally,

If \(f(x), g(x)\) are functions on the real numbers, then as \(x \to \infty\), we say

  • \(f = O(g)\) if there exist \(M, c\) such that \(|f(x)| \le c |g(x)|\) for all \(x > M;\)
  • \(f = o(g)\) if for any positive \(c\), there exists \(M\) such that \(|f(x)| \le c |g(x)|\) for all \(x > M;\)
  • \(f = \Omega(g)\) if there exist \(M, c\) such that \(|f(x)| \ge c |g(x)|\) for all \(x > M;\)
  • \(f = \omega(g)\) if for any positive \(c\), there exists \(M\) such that \(|f(x)| \ge c |g(x)|\) for all \(x > M;\)
  • \(f = \Theta(g)\) if there exist \(M, c_1, c_2\) such that \(c_1 |g(x)| \le |f(x)| \le c_2 |g(x)|\) for all \(x > M.\)

Similar definitions can be made for the \(x \to a\) case, by replacing \(x > M\) with \(0 < |x-a| < \delta\) \((\)and asserting existence of \(\delta\) instead of \(M).\)
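
One informal way to get a feel for these definitions (when the relevant limits exist) is to look at the ratio \(f(x)/g(x)\) for large \(x\): it tends to 0 when \(f = o(g)\), stays bounded between positive constants when \(f = \Theta(g)\), and grows without bound when \(f = \omega(g)\). The short Python sketch below, added here for illustration, prints these ratios for three sample pairs of functions.

```python
# Compare f(x)/g(x) for growing x to get a feel for the Landau notations.
# n^2 = o(n^3): ratio -> 0;  2n^2 + n = Theta(n^2): ratio -> 2;  n^3 = omega(n^2): ratio -> infinity.
for x in [10, 100, 1000, 10_000]:
    little_o = x**2 / x**3            # tends to 0
    theta    = (2 * x**2 + x) / x**2  # tends to 2, bounded between positive constants
    omega    = x**3 / x**2            # grows without bound
    print(f"x={x:>6}: o-ratio={little_o:.4f}  Theta-ratio={theta:.4f}  omega-ratio={omega:.1f}")
```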

Properties

There are a few useful properties involving the Landau notations, all of which follow readily from the definitions (a sample proof is given after the list):

  • \(f = \Theta(g)\) if and only if \(f = O(g)\) and \(f = \Omega(g)\).
  • \(f = O(g)\) if and only if \(g = \Omega(f)\).
  • \(f = o(g)\) if and only if \(g = \omega(f)\).
  • If \(f = \Theta(g)\), then \(g = \Theta(f)\).
  • If \(f = O(g)\) and \(g = O(h)\), then \(f = O(h)\). If either (or both) of the big-ohs is actually a little-oh \((o)\), then \(f = o(h)\).
  • If \(f = \Omega(g)\) and \(g = \Omega(h)\), then \(f = \Omega(h)\). If either (or both) of the big-omegas is actually a little-omega \((\omega)\), then \(f = \omega(h)\).
  • If \(f = \Theta(g)\) and \(g = \Theta(h)\), then \(f = \Theta(h)\). If exactly one of the thetas is replaced by another notation, the conclusion uses that notation; for example, \(f = O(g)\) and \(g = \Theta(h)\) imply \(f = O(h)\).
  • If \(f = o(g)\), then \(f = O(g)\).
  • If \(f = \omega(g)\), then \(f = \Omega(g)\).
  • We have \(f = O(f), f = \Theta(f), f = \Omega(f)\), but \(f \neq o(f), f \neq \omega(f)\).
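
As a sample of how these follow from the definitions, here is the transitivity property for big-oh, worked out in the notation of the previous section: suppose \(f = O(g)\) and \(g = O(h)\) as \(x \to \infty\). Then there are constants \(M_1, c_1\) with \(|f(x)| \le c_1 |g(x)|\) for all \(x > M_1\), and constants \(M_2, c_2\) with \(|g(x)| \le c_2 |h(x)|\) for all \(x > M_2\). For every \(x > \max(M_1, M_2)\), both inequalities hold, so

\[|f(x)| \le c_1 |g(x)| \le c_1 c_2 |h(x)|,\]

and the constants \(M = \max(M_1, M_2)\) and \(c = c_1 c_2\) witness \(f = O(h)\). The other properties are proved similarly.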

Use in Computer Science

In computer science, we often need to analyze algorithms. The analysis most often concerns the running time of the algorithm, but it can also concern memory consumption, stack depth, or other resources, each usually expressed as a function of the input size.

Unfortunately, different implementations of the same algorithm will result in different resource consumption. As in the introduction, an algorithm given an input of \(n\) bytes might take \(n^2\) microseconds when implemented in C++, but \(1000n^2 + 1000n\) microseconds when implemented in Python, because Python is a slower language. Moreover, if we only state the algorithm in pseudocode, it is unclear how much time it takes at all: it might execute "\(10n^2 + 100n\) lines," but we don't know how many microseconds it takes to execute a line of pseudocode.

However, for the analysis of algorithms, the most important aspect is how the resource consumption behaves as the input size changes. An algorithm might take time proportional to its input size, so doubling the input size makes it run twice as long; or its time might be proportional to the square of the input size, so doubling the input size makes it run four times as long.

To ignore the differences in implementation, we often resort to the asymptotic growth rate. Given a resource consumption \(f\), we find a simple function \(g\) such that \(f = \Theta(g)\). The differences between implementations are hidden inside the \(\Theta\); the focus is now on \(g\): whether the algorithm takes linear time \(\big(g(n) = n\big)\), quadratic time \(\big(g(n) = n^2\big)\), or something else. The behavior across different input sizes is still retained: doubling the input size produces the same effect on \(g\) \(\big(\)doubled for \(g(n) = n\), quadrupled for \(g(n) = n^2\), and so on\(\big)\). Thus, from the viewpoint of the analysis of algorithms, we lose essentially nothing by simplifying \(f\) to \(g\), since both describe the same behavior. This also lets us compare different algorithms by their growth rates and claim that one algorithm is "faster" than another: if two algorithms have resource consumptions \(f_1, f_2\) with \(f_1 = \Theta(g_1)\) and \(f_2 = \Theta(g_2)\), then we say the first algorithm is faster than the second if \(g_1 = o(g_2)\).
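
As a rough, hedged illustration of this doubling behavior (the functions and timings below are ours, and the absolute numbers depend entirely on the machine and language), timing a linear-time and a quadratic-time routine at input sizes \(n\) and \(2n\) shows the pattern: the linear one roughly doubles, the quadratic one roughly quadruples, while the constant factors stay hidden in the \(\Theta\).

```python
import time

def linear_sum(values):
    """Theta(n): touches each element once."""
    total = 0
    for v in values:
        total += v
    return total

def pair_sum(values):
    """Theta(n^2): touches each ordered pair of elements."""
    total = 0
    for a in values:
        for b in values:
            total += a * b
    return total

def time_it(func, n):
    data = list(range(n))
    start = time.perf_counter()
    func(data)
    return time.perf_counter() - start

for n in (2000, 4000):  # doubling n
    print(f"n={n}: linear={time_it(linear_sum, n):.4f}s  quadratic={time_it(pair_sum, n):.4f}s")
# Doubling n roughly doubles the linear time and quadruples the quadratic time
# (up to timing noise and the constant factors hidden in the Theta).
```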

Although it is more precise to say \(f = \Theta(g)\), in practice computer scientists often write \(f = O(g)\) even when they mean the former. This is another abuse of notation; the other Landau notations are used less often, and sometimes analyzing the exact running time is difficult enough that simplifying assumptions are made, even if the resulting statement is slightly weaker.

Also, in computer science, some algorithms behave differently on different inputs of the same size. Analysis of algorithms therefore often asks for the worst case and the average case of the running time: over all inputs of a given size, the worst case is the longest time taken on any single input, while the average case is the average time taken over all inputs. Sometimes the best-case running time is analyzed as well, but this is much rarer than the other two.
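
As a concrete illustration (a small sketch, not from the original text), consider linear search in an unsorted list of \(n\) elements: the best case finds the target at the first position after one comparison, the worst case scans the whole list with \(n\) comparisons, and the average case over all positions of the target makes about \(n/2\) comparisons, which is still \(\Theta(n)\).

```python
def linear_search(items, target):
    """Return (index of target or -1, number of comparisons made)."""
    comparisons = 0
    for i, item in enumerate(items):
        comparisons += 1
        if item == target:
            return i, comparisons
    return -1, comparisons

items = list(range(100))
print(linear_search(items, 0))    # best case:  (0, 1)     -- one comparison
print(linear_search(items, 99))   # worst case: (99, 100)  -- n comparisons
print(linear_search(items, -1))   # not found:  (-1, 100)  -- also n comparisons
# Averaged over all targets present, roughly n/2 comparisons are made: Theta(n) either way.
```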

When analyzing the running time of an algorithm, several growth rates come up so often that they have specific names. (Memory consumption and the like are usually less varied, mostly constant, logarithmic, or linear.) Here are some of them, from fastest to slowest. In all of these, \(n\) increases without bound. It is usually the input size (in bytes, bits, or digits), but sometimes, when the input is a natural number, we take \(n\) to be the value of that number, or, when the input is a list of elements, \(n\) is the number of elements, without regard to how large the individual elements are.

  • \(O(1)\): A constant-time algorithm runs in the same amount of time no matter how large the input is. For example, checking whether the first byte of a file is null (0x00) is constant time; no matter how large the file is, we only need to inspect the first byte. As another example, a program that ignores its input and computes the answer to one fixed problem is also constant-time, even if that fixed computation takes a very long time.
  • \(O(\log n)\): A logarithmic-time algorithm runs in time proportional to the logarithm of the input size. The most common example is perhaps binary search: given a sorted array (with random access) of \(n\) elements, we can determine whether a specific item is in the array by repeatedly checking the middle element and halving the search space (a sketch of binary search appears after this list).
  • \(O(n)\): A linear-time algorithm runs in time proportional to the input size. This can be good or bad: when \(n\) is the number of elements in an array of integers, radix sort lets us sort it in linear time, which is very good performance, but when \(n\) is a positive integer and we want to check whether it is prime, trial division by all of \(2, 3, \ldots, n-1\) is poor performance.
  • \(O(n \log n)\): Often encountered in sorting algorithms, a linearithmic-time algorithm runs in time barely distinguishable from linear time for "reasonable" input sizes. Many comparison-based sorting algorithms (merge sort, heap sort, etc.) take linearithmic time, which has been proven to be the best possible running time for comparison-based sorting.
  • \(O(n^2)\): A quadratic-time algorithm takes time proportional to the square of the input size. Examples include schoolbook multiplication of two \(n\)-digit numbers or two polynomials of degree \(n\) (faster algorithms exist, such as Karatsuba's algorithm), some slow sorting algorithms (selection sort and insertion sort), and solving the longest common subsequence problem with dynamic programming.
  • \(O(n^3)\): The most common cubic-time algorithms are perhaps schoolbook matrix multiplication (again, faster algorithms exist, such as Strassen's algorithm), computing a determinant using Gaussian elimination (which can be reduced to matrix multiplication), and finding a triangle in a graph of \(n\) vertices (which can also be done with matrix multiplication). A cubic-time algorithm usually has the structure of a loop over \(n\) values nested inside a second loop over \(n\) values, nested inside a third; the three examples above naturally give rise to this structure, and cubic-time algorithms outside it are uncommon.
  • \(O(a^n)\): For an exponential-time algorithm, increasing the input size by one is enough to multiply the running time by \(a\). Note that if \(a < b\), then \(a^n = o(b^n)\): the two are not asymptotically the same. An example is deciding whether a Boolean formula on \(n\) variables can be satisfied; trying each assignment requires \(2^n\) cases to be checked (multiplied by the time it takes to evaluate the formula on each one).
  • \(O(n!)\): A factorial-time algorithm is even slower. Such algorithms often check every permutation of an array, of which there are \(n!\).
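
The logarithmic-time bullet above mentions binary search; here is a minimal Python sketch of it (an illustrative implementation added here, not code from the original article). Each iteration halves the remaining range, so a sorted array of \(n\) elements needs about \(\log_2 n\) iterations.

```python
def binary_search(sorted_items, target):
    """Return an index of target in sorted_items, or -1 if absent. O(log n) time."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # inspect the middle element
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1              # discard the lower half
        else:
            hi = mid - 1              # discard the upper half
    return -1

data = list(range(0, 1_000_000, 2))   # the even numbers below 1,000,000 (500,000 sorted elements)
print(binary_search(data, 123456))    # found: index 61728
print(binary_search(data, 123457))    # odd number, not present: -1
```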

You lost your wedding ring on the beach, and have no memory of when it came off. You decide to do a brute-force grid search with your metal detector: divide the beach into strips and walk down every strip, scanning as you go, until you find the ring. For simplicity, assume the beach is a square of side length \(l\) meters, each strip has a constant width of 1 meter, and it takes 10 seconds to walk 1 meter (it's hard to walk while searching). Find the big-oh performance of your ring-finding algorithm.


The beach has size \(l \times l\). Since you divide it into strips of width 1 meter, you have \(l\) strips, each of length \(l\) meters. Thus it takes \(10l\) seconds to check each strip, and \(10l^2\) seconds to check the entire beach (plus some time taken to move to the next strip, but it shouldn't be very relevant). Thus our algorithm takes about \(10l^2\) seconds, which we can simplify with big-oh notation to \(O(l^2)\). \(_\square\)

Consider the following two algorithms to print all permutations of \(1, 2, 3, \ldots, n\) that are strictly increasing:

Algorithm 1: Check each permutation to see whether it is sorted, and print each one that is.
Algorithm 2: The only such permutation is \(1, 2, 3, \ldots, n\), so just print it.

Determine the running time (in big-oh notation) of each of the two algorithms.

Algorithm 2 is very simple. It takes \(O(n)\) time, because we're printing \(n\) numbers.

Algorithm 1, however, is not so simple. It inspects each permutation, of which there are \(n!\). Each permutation requires \(O(n)\) time to check, since we need to make \(n-1\) comparisons. Thus, the running time is \(n! \cdot O(n) = O(n \cdot n!) = O\big((n+1)!\big)\).

Algorithm 1 can be improved: if we modify the checking step so that it stops at the first failed comparison (instead of continuing uselessly), the performance improves, though the analysis becomes more complicated. Half of the permutations stop at comparison 1; among the half that pass, \(\frac23\) stop at comparison 2; among the \(\frac16\) that pass, \(\frac34\) stop at comparison 3; and so on. Thus, while in the worst case we still need \(n-1\) comparisons, over all permutations we need a total of

\[n! \cdot \left( \frac{1}{2!} \cdot 1^2 + \frac{1}{3!} \cdot 2^2 + \frac{1}{4!} \cdot 3^2 + \cdots + \frac{1}{n!} \cdot (n-1)^2 \right) + (n-1)\]

comparisons. This is somewhat difficult to evaluate, but one can show that the parenthesized expression tends to \(e-1 = O(1)\), so the total number of comparisons is in fact \(O(n!)\). \(_\square\)
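
The claim that the total settles at about \((e-1)\,n!\) comparisons can be checked empirically for small \(n\); the following brute-force Python sketch (added for illustration) enumerates all permutations and counts the comparisons made by the early-exit check.

```python
import itertools
import math

def comparisons_with_early_exit(perm):
    """Number of adjacent comparisons made before the check stops (or finishes)."""
    count = 0
    for i in range(len(perm) - 1):
        count += 1
        if perm[i] > perm[i + 1]:   # first failure: stop checking this permutation
            break
    return count

for n in range(2, 9):
    total = sum(comparisons_with_early_exit(p)
                for p in itertools.permutations(range(n)))
    print(f"n={n}: total={total}, total/n! = {total / math.factorial(n):.4f}")
# The ratio total/n! approaches e - 1 ≈ 1.7183, so the total is Theta(n!).
```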

See Also

  • Master Theorem