34% Faster Integer to String Conversion Algorithm
Are we printing integers fast enough?
1. Introduction
In computer programming, converting a given integer to a string is a common operation. It has to be done, for example, before printing the integer to the screen, or writing it to any kind of textual file, such as *.xml, *.json, *.csv, *.txt, etc.
It is well known that integers (as well as everything else) are stored in computer memory in binary format — as sequences of 0s and 1s. For example:
- number 12 is represented in memory as “1100”,
- and number 29 is represented as “11101”.
This is the reason why such a conversion is needed every time we want to present a number in human-readable, decimal format.
In this story I am going to:
- give an overview of the standard algorithm used for such conversion,
- review its existing optimizations,
- propose my algorithm, and
- present their experimental comparison.
We will see that on average, my algorithm runs 25–38% faster for 32-bit integers, and 40–58% faster for 64-bit integers, than the optimized standard algorithm. Its C++ implementation can be found on GitHub, as referenced at the end.
Of course, if the application prints only a few integers during its lifetime, the algorithm responsible for converting them to strings will never be the bottleneck. But for cases when the application prints tons of data into text files, the efficiency of the conversion algorithm starts to play a role. When working in fields such as Data Science or Machine Learning, the need to convert lots of integers into strings arises e.g. when exporting a dataset into a text file, such as *.csv or *.json.
2. The standard conversion algorithm
As converting integers to strings is a common operation, an algorithm for it is implemented in any modern programming language, either as part of the language itself or as part of its standard library. And the algorithm is almost everywhere the same: it is based on repeatedly obtaining and picking out the last digit of the integer, and continuing with its remaining part.
In order to obtain the last digit of a given integer N, it just calculates the remainder of its division by 10:
“digit := N mod 10”,
and in order to pick it out, it performs the integer division itself:
“N := N / 10”.
Note, in this story, when dividing 2 integers, we will assume that only the whole part of the result is taken.
As an example of the complete algorithm, when printing the number "N = 2'167", the following operations will be made:
- 2'167 mod 10 = 7, then N := 2'167 / 10 = 216,
- 216 mod 10 = 6, then N := 216 / 10 = 21,
- 21 mod 10 = 1, then N := 21 / 10 = 2,
- 2 is already a 1-digit integer, so its character is printed directly.
Note, when we are dealing with a 1-digit integer (i.e. from the range [0..9]), we can directly send it for print, as the corresponding character is already fixed for each of those 10 digits. And the remainder of a division by 10 is always a 1-digit integer.
Also, we can note that this algorithm reports the digits of N in reverse order (here we got the sequence of digits '7', '6', '1', '2', instead of '2', '1', '6', '7'), so the produced sequence needs to be reversed at the end.
Summarizing that, its pseudo-code will be like this:
var result[0 .. 25] : Array of Characters // Assume at most 25 characters
// The procedure takes integer 'N' to be printed, and fills its
// decimal characters into 'result' array.
procedure print( N: Integer )
i := 0 // Index over 'result' array
while N > 0
result[ i ] := '0' + (N mod 10) // Take the last digit
N := ⌊ N / 10 ⌋ // Pick out the last digit
i := i+1
result[ i ] := '\0' // Append the terminating 'null' character
reverse array result[0 .. i-1]
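For concreteness, a simplified C++ sketch of this basic algorithm might look as follows (the function name and the 16-character buffer size are illustrative; the N = 0 case is handled in a separate branch, as discussed later):

#include <algorithm>
#include <cstdint>

// Converts 'n' to its decimal representation in 'out' (null-terminated),
// using the basic "remainder, then divide by 10" loop with a final reverse.
// 'out' must have room for at least 16 characters.
void print_basic(std::uint32_t n, char out[16]) {
    int i = 0;
    if (n == 0)
        out[i++] = '0';                              // Separate branch for zero
    while (n > 0) {
        out[i++] = static_cast<char>('0' + n % 10);  // Take the last digit
        n /= 10;                                     // Cut it off
    }
    out[i] = '\0';                                   // Terminating null character
    std::reverse(out, out + i);                      // Digits were produced in reverse order
}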
The described algorithm is simple, and we can easily implement it in 3–4 lines of code. But its bottleneck is that it uses 2 relatively expensive operations, integer division and integer remainder calculation, for every digit of N's decimal notation. It is well known that integer division and remainder calculation on average take 4–5 times longer than addition, subtraction or even multiplication of 2 integers. Here is a time benchmark of the mentioned arithmetical operations:
The experiments were made with Google Benchmark, under the following system:
CPU: Intel Core i7–11800H @ 2.30GHz
RAM: 16.0 GB
OS: Windows 11 Home, 64-bit
Compiler: MSVC 2022 ( /O2 /Ob2 /MD /GR /Gd )
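For reference, a minimal sketch of how such per-operation timings can be gathered with Google Benchmark might look like this (a simplified illustration; the actual benchmark harness is not listed in this story):

#include <benchmark/benchmark.h>
#include <cstdint>

// Times a 64-bit integer division; the divisor is kept opaque so that the
// compiler emits a real division instead of folding it into a multiplication.
static void BM_Division(benchmark::State& state) {
    std::uint64_t n = 123456789u, d = 10u;
    for (auto _ : state) {
        benchmark::DoNotOptimize(n);
        benchmark::DoNotOptimize(d);
        std::uint64_t q = n / d;
        benchmark::DoNotOptimize(q);
    }
}
BENCHMARK(BM_Division);

// Times a 64-bit integer multiplication, for comparison.
static void BM_Multiplication(benchmark::State& state) {
    std::uint64_t n = 123456789u, d = 10u;
    for (auto _ : state) {
        benchmark::DoNotOptimize(n);
        benchmark::DoNotOptimize(d);
        std::uint64_t p = n * d;
        benchmark::DoNotOptimize(p);
    }
}
BENCHMARK(BM_Multiplication);

BENCHMARK_MAIN();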
Let’s see if faster methods for integer printing exist…
3. Existing optimizations
Optimization 1
One common optimization of the described algorithm is eliminating the last step of reversing the produced sequence of digits. The trick is well presented, for example, in [1]. With this optimization we write the digits into the buffer straightaway in their proper order. And since the algorithm itself reports the digits of the given integer N from right to left, we also write them into the buffer from right to left.
Pseudo-code with this optimization will look as follows:
var result[0 .. 25] : Array of Characters // Assume at most 25 characters
// The function takes integer 'N' to be printed, and returns position
// of its converted first character in the 'result' array.
function print( N: Integer ) : Integer
result[ 25 ] := '\0' // Place the terminating 'null' character at the end
i := 25 // Index over 'result' array
while N > 0
i := i-1 // Here we go to left, for placing the next digit
result[ i ] := '0' + (N mod 10) // Take the last digit
N := ⌊ N / 10 ⌋ // Pick out the last digit
return i // Position from where the converted integer starts
Note, in this and all other pseudo-codes within this story we are not handling the case of printing the number "0". According to all the written algorithms, "0" would result in a sequence with no digits at all, and that is why in almost all printing algorithms, printing "0" is handled in a separate branch. We just skip that branch here for compactness.
Another small advantage of this optimization is that we are not required to write the terminating null character after every conversion. Instead, we can write it only once, into the last position of the buffer, as the position of N's last digit is fixed in advance: it will always be the one-before-last position in the buffer.
The drawback of this optimization is that the position of the first character becomes variable, as it depends on the number of digits that the integer N has.
However, in practice this does not become a problem, because the converted integers are often promptly sent to a text file or to the screen, and thus do not remain in memory for long. For such purposes we do not need the converted digits to be written starting from some exact, in-advance-specified position in memory.
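A simplified C++ sketch of this variant might look as follows (again with illustrative names, assuming an unsigned 32-bit input and a caller-provided buffer of at least 16 characters):

#include <cstdint>

// Writes the decimal digits of 'n' right-to-left into 'buf' and returns
// a pointer to the first character of the converted number.
char* print_backwards(std::uint32_t n, char buf[16]) {
    char* p = buf + 15;
    *p = '\0';                                      // Null terminator sits at a fixed position
    if (n == 0)
        *--p = '0';                                 // Separate branch for zero
    while (n > 0) {
        *--p = static_cast<char>('0' + n % 10);     // Last digit, moving leftwards
        n /= 10;                                    // Cut it off
    }
    return p;                                       // Start of the converted number
}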
Optimization 2
The next optimization is about using the integer division and remainder calculation operations to obtain 2 digits of N in a single step. This trick is also well documented in [1] and [2]. For this purpose, instead of repeatedly calculating
“digit := N mod 10”, followed by
“N := N / 10”,
we will calculate:
“digits := N mod 100”, followed by
“N := N / 100”,
which will give us the last 2 digits of N, and then will cut them both off.
Note, in order to eventually and efficiently print those obtained 2 digits, we should have a prepared array of length 100 (with indices from 0 to 99, thus corresponding to all possible remainders "N mod 100"), whose values are pairs of characters, going from "00", "01", "02", … up to "98", "99".
Within this optimization, the count of integer division and remainder operations is reduced by almost 2 times.
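A simplified C++ sketch of this 2-digits-per-step variant, built on top of the previous right-to-left one (the DIGIT_PAIRS table below is the 100-entry array of character pairs described above):

#include <cstdint>

// All 100 two-character digit pairs: "00", "01", ..., "99".
static const char DIGIT_PAIRS[201] =
    "0001020304050607080910111213141516171819"
    "2021222324252627282930313233343536373839"
    "4041424344454647484950515253545556575859"
    "6061626364656667686970717273747576777879"
    "8081828384858687888990919293949596979899";

// Writes the decimal digits of 'n' right-to-left into 'buf', two at a time,
// and returns a pointer to the first character of the converted number.
char* print_backwards_2dig(std::uint32_t n, char buf[16]) {
    char* p = buf + 15;
    *p = '\0';
    while (n >= 100) {
        const std::uint32_t r = n % 100;     // Last two digits of 'n'
        n /= 100;                            // Cut them off
        *--p = DIGIT_PAIRS[2 * r + 1];
        *--p = DIGIT_PAIRS[2 * r];
    }
    if (n >= 10) {                           // 2 digits remain
        *--p = DIGIT_PAIRS[2 * n + 1];
        *--p = DIGIT_PAIRS[2 * n];
    } else {                                 // 1 digit remains
        *--p = static_cast<char>('0' + n);
    }
    return p;
}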
Finalizing this part, I want to draw your attention to the fact that even with both described optimizations enabled, we still perform a number of integer division and remainder calculation operations proportional to the count of digits in the given integer N.
4. My algorithm
I am going to propose another algorithm, which accelerates integer printing by around 25–38% for 32-bit integers, and around 40–58% for 64-bit integers. The idea is this: what if we pick the digits out of the given integer N not from right to left, but from left to right? So at first we will obtain its most significant digit, then the next most significant digit, and so on, until only the least significant digit remains. Doing this becomes a bit difficult if we don't know the count of digits of N in advance, but let us put that question aside for now and assume that we already know that there are L digits in N.
How are we going to obtain the most significant digit then? Again using integer division, but this time as:
“digit := N / 10^(L-1)”
And how are we going to pick it out of N, in order to be able to continue with the remaining part? Knowing that the value of the most significant digit is 'd', we can do the following subtraction:
"N := N - d*10^(L-1)"
Later we will repeat the division and subtraction operations, until N becomes a 1-digit integer (i.e. in range [0..9]), and finally we will print that digit too. Let us view how the algorithm works for the case "N = 6'129". It has 4 digits, so here we start with "L = 4":
- 6'129 / 10^3 = 6, then N := 6'129 - 6*10^3 = 129, L := 3,
- 129 / 10^2 = 1, then N := 129 - 1*10^2 = 29, L := 2,
- 29 / 10^1 = 2, then N := 29 - 2*10^1 = 9, L := 1,
- 9 is a 1-digit integer, so it is printed directly.
You might argue that calculating different powers of 10 is more time consuming than doing integer division or remainder calculation. And that would be absolutely correct, except for one detail: we can precalculate all the necessary powers of 10 and use them during the program's entire execution. For 32-bit integers there are only 10 different powers of 10, and for 64-bit integers there are 20, so keeping them all precalculated in memory is not an issue.
So what do we have overall? In order to print one digit of N with my algorithm, we do:
1 integer division,
1 multiplication, and
1 subtraction,
compared to standard algorithm’s:
1 remainder calculation and
1 integer division.
In the next section we will see that my approach is actually faster, because a multiplication and a subtraction together take less CPU time than a remainder calculation. An experimental comparison of the time consumption of those arithmetical operations was presented in chapter 2.
Pseudo-code of the main part of my algorithm might look like this:
var powers_of_10[0 .. 10] : Array of Integers
= { 1, 10, 100, 1'000, ..., 100'000'000, 1'000'000'000 }
// Precalculated powers of 10, which will be used during print
var result[0 .. 25] : Array of Characters // Assume at most 25 characters
// The procedure takes integer 'N' to be printed, and fills its
// decimal characters into the 'result' array.
procedure print( N: Integer )
L := calculate_digits_count( N )
i := 0 // Index over 'result' array
while L > 0
digit := ⌊ N / powers_of_10[ L-1 ] ⌋ // Obtain left-most digit
result[ i ] := '0' + digit // Write it to the 'result' array
N := N - digit * powers_of_10[ L-1 ] // Calculate remaining part
L := L-1 // Adjust its count of digits accordingly
i := i+1
result[ i ] := '\0' // Append the terminating 'null' character
As my algorithm prints digits of N from left to right, I want to call it “Left-to-right printer” or shortly “LR printer”.
The one thing that remains is to efficiently find L, the count of decimal digits of N. And luckily for us, the precalculated array of powers of 10 will help here too. We can just iterate over that array from the small powers to the larger ones, until finding a power 10^L which is greater than N. Then the exponent L will itself represent the count of digits in N.
For example, obtaining the count of digits for "N = 23'504" goes as follows:
- 1, 10, 100, 1'000 and 10'000 are all not greater than 23'504, so L is incremented up to 5,
- 100'000 is already greater than 23'504, so the loop stops,
- and N indeed has L = 5 digits.
Pseudo-code of that function might look like:
// The function takes integer 'N' and returns count of its digits.
function calculate_digits_count( N: Integer ) : Integer
// Check case of numbers with maximal count of digits
if N >= powers_of_10[ 9 ] // Compare with maximal power of 10
return 10 // Count of digits for such numbers
// Regular case
L := 0
while N >= powers_of_10[ L ]
L := L+1
return L
With these 2 parts, we have a complete algorithm for converting integers to strings.
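Putting these 2 parts together, a simplified C++ sketch of the "LR printer" for 32-bit unsigned integers might look like this (the complete and more general implementation is the one referenced at [3]):

#include <cstdint>

// Precalculated powers of 10 that fit into a 32-bit unsigned integer.
static const std::uint32_t POWERS_OF_10[10] = {
    1u, 10u, 100u, 1'000u, 10'000u, 100'000u,
    1'000'000u, 10'000'000u, 100'000'000u, 1'000'000'000u };

// Returns the count of decimal digits of 'n' (for n > 0).
int digits_count(std::uint32_t n) {
    if (n >= POWERS_OF_10[9])        // Numbers with the maximal count of digits
        return 10;
    int l = 0;
    while (n >= POWERS_OF_10[l])
        ++l;
    return l;
}

// Writes the decimal digits of 'n' left-to-right into 'out' (null-terminated).
void lr_print(std::uint32_t n, char out[16]) {
    int i = 0;
    if (n == 0) {
        out[i++] = '0';                          // Separate branch for zero
    } else {
        for (int l = digits_count(n); l > 0; --l) {
            const std::uint32_t p = POWERS_OF_10[l - 1];
            const std::uint32_t digit = n / p;   // Obtain the left-most digit
            out[i++] = static_cast<char>('0' + digit);
            n -= digit * p;                      // Cut the left-most digit off
        }
    }
    out[i] = '\0';
}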
Note, as the "LR printer" reports the digits of N from left to right, there is no need to do any reversing at the end. Also, in contrast to existing optimization 1, here we keep the ability to specify where in memory the first digit of the converted N should be placed.
“LR printer” can be used for printing numbers in any base (not just base 10). For doing so, we will only need to replace the precalculated powers of 10 with precalculated powers of the new base.
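For example, such a table of powers for an arbitrary base can be built once at startup (an illustrative helper, not part of the implementation at [3]):

#include <cstdint>
#include <vector>

// Returns all powers of 'base' that fit into a 32-bit unsigned integer,
// e.g. for base = 3: 1, 3, 9, 27, ..., up to 3^20 = 3'486'784'401.
std::vector<std::uint32_t> make_power_table(std::uint32_t base) {
    std::vector<std::uint32_t> powers{ 1u };
    while (powers.back() <= UINT32_MAX / base)   // The next power still fits
        powers.push_back(powers.back() * base);
    return powers;
}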
Implementation of “LR printer” in C++ language can be found on GitHub at [3].
Optimization 2 for “LR printer”
My algorithm can be enhanced with the second optimization described in the "Existing optimizations" section, and documented in [1] and [2]. If done, then instead of printing the given number by 1 digit at a step, we will print it by 2 digits at a single step.
Let's see how it runs, for example, on the number "N = 4'610'937". Here L=7, and this time we start by dividing N by 10^(L-2) = 100'000:
- 4'610'937 / 100'000 = 46, then N := 4'610'937 - 46*100'000 = 10'937,
- 10'937 / 1'000 = 10, then N := 10'937 - 10*1'000 = 937,
- 937 / 10 = 93, then N := 937 - 93*10 = 7,
- 7 is the single remaining digit, so it is printed directly.
By enabling this, we will spend:
1 integer division,
1 multiplication, and
1 subtraction,
per 2 digits of the input number.
Here again, the digits will be obtained in their natural order — from left to right, so there is no need to reverse them at the end.
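A simplified C++ sketch of this variant, reusing the POWERS_OF_10 and DIGIT_PAIRS tables and the digits_count() helper from the sketches above:

// Writes the decimal digits of 'n' left-to-right into 'out', two at a time.
void lr_print_2dig(std::uint32_t n, char out[16]) {
    int i = 0;
    if (n == 0) {
        out[i++] = '0';                          // Separate branch for zero
        out[i] = '\0';
        return;
    }
    int l = digits_count(n);
    while (l >= 2) {
        const std::uint32_t p = POWERS_OF_10[l - 2];
        const std::uint32_t pair = n / p;        // The 2 left-most digits (0..99)
        out[i++] = DIGIT_PAIRS[2 * pair];
        out[i++] = DIGIT_PAIRS[2 * pair + 1];
        n -= pair * p;                           // Cut those 2 digits off
        l -= 2;
    }
    if (l == 1)                                  // A single digit remains
        out[i++] = static_cast<char>('0' + n);
    out[i] = '\0';
}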
Implementation of “LR printer” with second optimization enabled can also be found on GitHub at [3].
5. Experimental comparison with existing algorithms
Doing an experimental comparison is essential for this type of work, so in this chapter I will present the results of comparing the following integer-printing algorithms:
- the standard algorithm with first optimization (labeled as “Std”),
- my algorithm “LR printer” (labeled as “LR”),
- standard algorithm with second optimization too (labeled as “Std [2-dig]”), and
- “LR printer” with second optimization (labeled as “LR [2-dig]”).
Each of those algorithms is tested both on 32-bit and 64-bit integers, with different counts of digits in the input numbers.
Printing numbers in base=10:
Results when printing in number base=10 (the ordinary case) are:
For 32-bit integers, we can see that the gain of "LR printer" compared to the standard printer is around 30–38%. The gain when printing with the second optimization (printing 2 digits at a single step) is lower: 13–28%. This is totally expected, as overall we do only 2 or 4 steps in that case.
When it comes to printing 64-bit integers, performance of my algorithm is even better. “LR printer” runs around 40–50% faster than the standard algorithm. And with second optimization enabled for both, “LR printer” performs 47–58% faster.
The percentage in the title of this story was chosen by addressing the most regular case: base=10, 32-bit integers, and assuming they have many digits. For that case the performance gain of "LR printer" over the standard algorithm was 30–38%, so taking the average gives around 34%.
Printing numbers in base=3:
Let’s also see if the results will differ significantly when printing integers in another base. We will observe printing in number base=3:
As we can see here, for 32-bit integers performance gain of “LR-printer” over the standard algorithm is around 25–33%, which generally corresponds to the difference in performance of used arithmetical operations.
And for 64-bit integers performance gain of “LR-printer” is around 50–55% for short numbers (8 digits), and 27–30% for long numbers (36 digits).
Overall remarks
Generally, the base in which integers are printed doesn’t affect relative performance gain much, as the count of operations to be performed during print is proportional to the count of digits that the input numbers have, and not to the number of possible values that those digits might have.
Almost always it is the case that the greater the count of digits, the more "LR-printer" (or its "LR-printer [2-dig]" variation) outperforms the standard printing algorithm (or its "2-dig" variation). This is also clear, because the more digits we have, the less impact the out-of-loop instructions (like calling one function from another, or placing the null-terminating character) will have.
And overall, when printing 64-bit integers, the results are more impressive for both the "LR-printer" and the "LR-printer [2-dig]" variation.
Personally, I find those results quite notable.
6. Conclusion
We have presented a new algorithm for converting integers to strings, and called it the "LR printer". It runs 25–38% faster for 32-bit integers, and 40–58% faster for 64-bit integers, compared to the optimized standard conversion algorithm. Our algorithm can work in any number base (not only in the ordinary base 10).
The algorithm which converts integers into strings is never a bottleneck for applications that print only a few numbers during their lifetime. But for other types of applications, which automatically generate text files such as *.csv, *.xml or *.json, the efficiency of the conversion algorithm matters. This is especially the case if those text files are going to contain lots of numbers, as happens when exporting large datasets.
Huge thanks for reading till the end! Will be glad to read any comments below!
I express my gratitude to David Ayrapetyan (https://www.linkedin.com/in/davidayrapetyan/) for carefully reviewing the draft of this story, and proposing multiple contextual enhancements and grammatical corrections.
Gratitude to Hayk Aslanyan (https://www.linkedin.com/in/haykaslanyan/) for making a technical review of the draft, and proposing other enhancements.
Illustrations design by Asya Papyan: https://www.behance.net/asyapapyan
If you enjoyed reading this story, you can find me on LinkedIn at: https://www.linkedin.com/in/tigran-hayrapetyan-88989b12/
References
[1] : “Integer to string conversion” — https://tia.mat.br/posts/2014/06/23/integer_to_string_conversion.html
[2] : “Three optimization tips for C++” — https://www.facebook.com/notes/10158791579037200/
[3] : “LR printer implementation in C++ language” — https://github.com/tigranh/lr_printer