During the 2019 Google Summer of Code, I worked on a library proposed to be incorporated into Boost, Boost.Real. This library provides an interval representation of Real numbers. It attempts to solve the problem of unknown compounding errors, and representing arbitrary precision numbers. For more information on the motivation behind the library, I recommend looking at this paper, by my mentor, Damian Vicino. For details on the Real library, please see the documentation.
Include Google Benchmark submodule, and add a benchmark
Memory fixes, and large refactor. Replaces many raw ptrs with shared_ptr. Uses std::variant.
remove autogenerated documentation, add to .gitignore
adds benchmarks and fixes link issues
adds simplification of binary tree via distributive properties
The commit history was messed up at about this point by the base change merge, so, some of these diffs may not be entirely accurate. This is the last commit I made before the change: finished exponential, made changes to division. Note that the “same” commit on the current master branch is broken. There were a few minor commits in this time range, but they may be inaccurate:
precision changes, some tests failing
Then, after the base change merge:
Limits recursion for distribution, fixes benchmarks, and adds division to benchmarks
Add and edit documentation This is currently in a PR.
This varied a bit from my proposal, since I found out I was to be collaborating with another GSoC student, Sagnik Dey. We communicated through group emails including us and our mentors, Damian Vicino and Laouen Belloli. We split up the tasks. Here are the ones that I was assigned:
Sometimes, things do not go as planned. Fortunately, I was able to adapt and complete most of the tasks that I was assigned. I was additionally able to complete two other tasks (memory fixes and using std::variant).
boost::real::real a("2");
boost::real::real b("2");
for (int i = 0; i < 1000; i++) {
a += b;
}
In order to tell you more precise information on this problem, I pulled up the library as it was before I started working on it. When I did the case above, Valgrind showed the following:
==27358== HEAP SUMMARY:
==27358== in use at exit: 107,892,000 bytes in 1,998,000 blocks
==27358== total heap usage: 2,002,003 allocs, 4,003 frees, 108,180,712 bytes allocated
==27358==
==27358== LEAK SUMMARY:
==27358== definitely lost: 207,688 bytes in 1,997 blocks
==27358== indirectly lost: 107,402,216 bytes in 1,990,779 blocks
==27358== possibly lost: 207,036 bytes in 3,834 blocks
==27358== still reachable: 75,060 bytes in 1,390 blocks
==27358== suppressed: 0 bytes in 0 blocks
When I replaced a with b so that we have a += a;
, 100 iterations ate up all of my 16GB memory and my swap, freezing up my computer. I did a hard shutdown. I reduced the number of iterations so that I could get something from Valgrind. For i < 10
, Valgrind shows this:
==27226== HEAP SUMMARY:
==27226== in use at exit: 1,910,520 bytes in 35,380 blocks
==27226== total heap usage: 92,693 allocs, 57,313 frees, 5,077,972 bytes allocated
==27226==
==27226== LEAK SUMMARY:
==27226== definitely lost: 1,872 bytes in 18 blocks
==27226== indirectly lost: 1,908,648 bytes in 35,362 blocks
==27226== possibly lost: 0 bytes in 0 blocks
==27226== still reachable: 0 bytes in 0 blocks
==27226== suppressed: 0 bytes in 0 blocks
I felt this was unacceptable. The library leaked all memory, and used a copious amount of it. A user ought to be able to do many such trivial operations without such problems. So, I brought this up in the chain email, and I got the OK to work on this problem. After all, what’s the use of benchmarking when there is clearly such vast room for improvement? Currently (for this commit), Valgrind shows the following for the a+=b (top) and a+=a (bottom) problem for 1000 iterations:
==28857== HEAP SUMMARY:
==28857== in use at exit: 0 bytes in 0 blocks
==28857== total heap usage: 19,029 allocs, 19,029 frees, 297,392 bytes allocated
==28857==
==28857== All heap blocks were freed -- no leaks are possible
==28866== HEAP SUMMARY:
==28866== in use at exit: 0 bytes in 0 blocks
==28866== total heap usage: 289,492 allocs, 289,492 frees, 3,160,000 bytes allocated
==28866==
==28866== All heap blocks were freed -- no leaks are possible
Notice how little memory the a+=b operations take up! Half a MB. The a+=a operations take up more memory due to the fact each iteration essentially generates another real number. This is because we do not wish to alter the value of ‘a’ that is referenced. Also note that the bottom here is with 1000 iterations, taking up less memory than it used to in 10 iterations – a delightful improvement. Next, I’ll tell you how I did this.
A clear problem was the use of the new
keyword without any delete
. Further, the memory consumption problem was exacerbated by copies being made for every operation, and pointers being simply assigned to the newly created objects, leaking old ones. To avoid unnecessary copies and having to manually manage the memory, I used shared pointers. It is shared because of the binary tree structure of real (real_operation), where multiple nodes may point to (and own) one real_data object. I refactored things so that Real essentially acts as a wrapper for real_data. A Real does not own the real_data, but instead, points to it. When no Reals point to a real_data, the real_data gets deleted automatically by the magic of shared pointers. C++ has a very useful, extensive standard library. Features like this make it a lot easier to program things bug-free.
Sagnik was originally assigned the task of using std::variant, but I chose to do it because it would be significantly more efficient to do it along with my other changes that affect memory usage. I used visitors quite a bit. The errors with visitors and variant, I found, were somewhat obscure, and the syntax seemed very unintuitive. Luckily, cppreference.com had some good examples on implementing visitors.
After this refactor, a Real contains a shared pointer to real_data, which has a holds a variant of real_explicit, real_algorithm, and real_operation.
Originally, there was an object called boundary, and a header file called boundary_helpers.hpp, and to operate on boundaries, you had to use functions in boundary_helpers. It was rather tedious to have to do add_boundary(a,b,c)
instead of being able to simply do c = a + b
. I did this after the division task, due to the urgency of getting the division done so that Sagnik could work on his base change task. Having done this refactor beforehand would have saved me some time and frustration during that implementation, but alas, it will now help future developers save time.
During my refactor and memory changes above, Sagnik said he needed the division done before he could continue working. Thus, I quickly implemented something so that he could get to it. It was not the most efficient. I simply did a binary-search type method. I ended up doing small changes to it as I found problems. Currently, it makes the exponent 0 (which is the same as making the number .{digits}). The exponent is just added back later. Thus, the numbers we divide internally are always between 0 and 1. Then, it’s a matter of iterating. I essentially just keep a left bound, and a distance from it. The distance halves in each iteration, and we move that distance towards the correct answer, from the current guess.
There were some bugs with the initial implementation. I fixed them. They got broken in the base change merge, but Sagnik appears to have found the issue, just in time for the final evaluation.
I used Google Benchmark to implement benchmarks for the construction and evaluation of boost::real::reals. In particular, I let ‘n’ be a number between some min and max value, and bench a certain operation while varying the n, and it estimates the run time in O(n) notation. I examine the construction of varying sizes of compositions of reals (i.e., n iterations of a += b), as well as the construction reals with varying numbers of digits. I also bench the evaluation time of these reals (i.e., how long it takes to get the cend(), most precise interval).
Before the base change, I had noticed that construction took just about as long as evaluation. Calling cbegin() after instantiation (as in comparison operators) might be something we wish to change, because it throws away possibly useful data. However, things are different now with the base change, which has reduced the computational efficiency drastically.
For this, I made it so anything of the form (a*x)+(b*x) become (a+b)*x. Essentially, on addition or subtraction, I use a check_and_distribute function. Please see the documentation for the implementation details.
For e, I simply used the Taylor series expansion.
Unfortunately, my implementation for Pi was rather poor. I wanted to use the Chudnovsky algorithm, but it requires an algorithm for powers. I attempted to implement one, but was unable in the short time frame, and hence settled on simply hardcoding 2000 digits.
Recursion can get very expensive, very fast. For operations dealing with the binary tree, instead of recursively going through each node, we may find it useful to do some traversal while saving parent addresses in some vector.
Currently, a real_algorithm will do redundant calculations. If we simply want to iterate the precision by one, we end up doing a calculation from the 1st digit to the current digit, instead of a single iteration to get from the previous precision to the next. If we could allow users to define objects that save certain data in order to do this, then we would save a substantial amount of runtime.
Things take a lot longer to compute after the base change. Is there a way to speed this up? Or some alternative? The data type that exact_number is attempting to be will not be very fast, due to these sorts of things being faster doing them more on the CPU level, i.e., the optimizations that popular multiprecision libraries use. I brought up using boost::mpl and other such libraries earlier, but according to the response, these libraries do not throw on certain events (like overflow).
The irrationals found in irrational_helpers are now broken – they need to be converted to work with the base change.
When I added Pi, I used hardcoded digits. It would be preferable to be able to generate the digits. However, this requires a power function (see Chudnovsky algorithm).
I enjoyed the opportunity to work on this project with the guidance of such great mentors. I learned quite a bit. Damian shared some good learning material, and both mentors showed me new things. Damian’s code reviews and suggestions have helped me improve a lot. It was a fun summer! I appreciate everyone who was involved in making this possible.