Tuesday, May 15, 2012

RTEMS Build System Ruminations

This post is a collection of ruminations after a recent post on the RTEMS Users mailing list. The post asked about a few issues the user was having (italics for quotes):
  • my changes not "taking"
  • is there any way to limit what bootstrap operates on?  Since it takes quite a while to complete and most of the BSPs are of no interest to me, I would like to avoid bootstrapping them.
Gedare Bloom and Ralf Corsepius replied to the post, and I thought it would be a good idea to take the time to write up some issues, workarounds, and benchmarks for various operations using the current build system on a git checkout with no modifications. Any build times quoted in this article were measured on a quad-core computer in the RTEMS Build Farm or on my personal laptop, which have the following specifications:
  • RTEMS Build Farm Computer
    • Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
    • 4 GB RAM
    • Seagate 320 GB 7200RPM HDD (ST3320613AS)
    • Western Digital Caviar 1 TB 7200RPM HDD (WD1001FALS)
  • My Laptop
    • Intel(R) Core(TM)2 Duo CPU T7500  @ 2.20GHz
    • 4 GB RAM
    • Hitachi 160GB 7200RPM HDD (HTS722016K9A300)
The CPUs and disks in these computers are reasonably comparable in performance, except that the build farm machine is quad-core. I would expect the quad-core machine to be somewhat faster in single-core, straight-line computation than the clock speed indicates, but I would not expect a 2-3x difference in single-core performance.

I will be the first to admit that neither of the above computers is the fastest one available today. The fact that one could spend money and get a faster computer is important. These are reasonable computers and not obsolete. In fact, when I look at potential laptop upgrades, I am still surprised that my old laptop's CPU is rated faster than the CPUs found in many laptops available today; you have to move to a higher-end laptop to beat it. When teaching RTEMS classes, I see attendees with computers that are both faster and much slower than mine. And performance is likely to be much worse under Cygwin or in a virtual machine than on either of the two computers above.

The first issue was my changes not "taking". I personally always configure RTEMS with the --enable-maintainer-mode option which is documented as follows:
--enable-maintainer-mode  enable make rules and dependencies not useful
                          (and sometimes confusing) to the casual installer
This tends to ensure that any changes to configure.ac and Makefile.am files are taken into account in the build tree and the appropriate files are regenerated. There are limits to this: changing a compiler flag, for example, will not cause everything to be recompiled. However, if you change the way a configure option is interpreted and that change is propagated into cpuopts.h or bspopts.h, the impacted files should be recompiled.
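For reference, a minimal configure invocation with maintainer mode turned on might look like this sketch; the target, BSP, and prefix are just examples and should be whatever you normally build:
$ ../rtems/configure --target=sparc-rtems4.11 --enable-rtemsbsp=sis \
    --enable-maintainer-mode --prefix=$HOME/rtems-install >c.log 2>&1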

But what if a .h file changes? Based upon my experiment adding a one line comment to confdefs.h, every test "init file" was recompiled and every test executable was relinked. This is as expected.

Now let's consider changing a C file in cpukit. As an experiment, I added a one-line comment to cpukit/sapi/src/exinit.c. This file contains rtems_initialize_data_structures() and thus every RTEMS application depends on this object file. The library librtemscpu.a was properly updated but no test was relinked. This untracked dependency is one possible explanation for my changes not "taking".

What if a C file in the BSP changes? As another experiment, I added a one-line comment to c/src/lib/libbsp/shared/bootcard.c. Just as in the cpukit experiment, this file is required by every RTEMS application. The library librtemsbsp.a was properly updated but no test was relinked. This untracked dependency is another possible explanation for my changes not "taking".
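If you want to check for this behavior in your own build tree, a quick sketch (not exactly what I did, and the paths assume a separate build directory beside the source tree) is to snapshot the test executables' timestamps, touch the file in question, rebuild, and compare:
$ ls -l `find . -name "*.exe"` > before.txt      # snapshot the test executables
$ touch ../rtems/cpukit/sapi/src/exinit.c        # or add a harmless one-line comment
$ make >b.log 2>&1
$ ls -l `find . -name "*.exe"` > after.txt
$ diff before.txt after.txt                      # no output means nothing was relinked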

Gedare makes a point of stating that long-time RTEMS developers know these deficiencies and work around them. Personally, I often find something like this in my command history:
rm -f `find . -name "*.exe"` ; make >b.log 2>&1 ; echo $?
That command deletes every test executable before rebuilding, which forces the tests to be relinked against the freshly built libraries. It covers up the fact that the library dependency is not properly tracked.

The first part of the second issue was: is there any way to limit what bootstrap operates on? The solution is to run bootstrap only from the lowest level directory with a configure.ac that covers the files you modified or added. Often this is just a single BSP directory or, when initially adding a BSP, the directory just above your new BSP under c/src/lib/libbsp/. Gedare Bloom answered this quite thoroughly and I am just going to quote his answer:
Except when you add new files / modify configure.ac/Makefile.am files you need to re-run bootstrap at the closest level to your modified Makefile.am/configure.ac that contains a configure.ac file. 
So for example if you add a .c file into say cpukit/score/src then the file needs to be added to cpukit/score/Makefile.am and then you need to re-run bootstrap from cpukit because that is the closest parent directory with a configure.ac in it. For BSPs usually you just have to deal with the libbsp/CPU/BSP directory.
In order to run bootstrap from there I use a shell variable that points to my RTEMS root directory ($r) so that I can just ... "cd cpukit ; $r/bootstrap"
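As a concrete sketch of the workflow Gedare describes, adding a hypothetical new file to the score might look like this, with $r pointing at the top of the RTEMS source tree:
$ cd $r/cpukit
$ $EDITOR score/src/newfile.c     # hypothetical new source file
$ $EDITOR score/Makefile.am       # add newfile.c to the appropriate _SOURCES list
$ $r/bootstrap                    # re-run from the closest directory with a configure.ac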
The second part of the second issue -- Since it takes quite a while to complete and most of the BSPs are of no interest to me, I would like to avoid bootstrapping them. -- is more complicated. When you initially clone the RTEMS git repository, you have to bootstrap the entire tree. A full bootstrap takes a long time and appears to be very single threaded. On the build farm machine described above, it takes 5m18.331s of user time and 0m48.900s of system time, for a total of about 6 minutes. On my laptop, it took 7m56.394s of user time and 1m6.840s of system time, for a total of about 9 minutes. Having a quad-core CPU does not help. The bootstrap time has not improved significantly in years; I recall various computers used over the years for RTEMS development taking from 5 to 12 minutes to execute a complete bootstrap. And this time is much longer on Cygwin due to the inefficient way it must implement POSIX process forking on MS-Windows.

Another thing to note is that bootstrap -p ONLY has to be run when you have modified a Makefile.am and changed the set of header files it installs. It generates the preinstall.am files. It does not need to be run after cloning the RTEMS repository because the preinstall.am files are checked into git. Many people run it more often than necessary.
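So if, for example, you change the set of headers installed by a BSP's Makefile.am, regenerating just that BSP's preinstall.am might look like the following (the BSP directory is only an example):
$ cd $r/c/src/lib/libbsp/sparc/erc32
$ $r/bootstrap -p                 # regenerate preinstall.am only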

Bootstrapping and git branches do not get along as well as one would hope. As Ralf Corsepius explains in the thread:
One final piece of advice: do not switch git branches within a checkout. Git does not preserve timestamps, while make and the autotools rely on timestamps, so switching branches will break the timestamps on generated files and eventually result in havoc - you need a toplevel bootstrap after each "git branch checkout".
To avoid this, my advice is to use multiple checkouts instead.
I note that even with --enable-maintainer-mode enabled, my experience is that you do often get stuck bootstrapping from the top of the tree when switching branches. The builds end with a cryptic message in the output. This is a serious hindrance to using git. The typical git usage pattern does not include having multiple clones for different purposes; that is what branches are designed for.
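If you do switch branches inside a single checkout, the recovery is a full top-level bootstrap; the alternative Ralf recommends is one clone per line of development. A sketch of both, with a placeholder branch name (adjust the clone URL to whatever remote you use):
$ cd $r && git checkout some-branch && ./bootstrap            # recover after a branch switch
$ git clone git://git.rtems.org/rtems.git rtems-some-branch   # or keep a dedicated clone per branch
$ cd rtems-some-branch && git checkout some-branch && ./bootstrap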

How long does it take to build RTEMS? The answer depends on a lot of factors, including the obvious, like the computer you are using, and the not so obvious, such as how you configured RTEMS and whether you used the -j option to make to enable parallel jobs. If you configure RTEMS to include all of the tests, then the build time is significantly longer, since there are 399 total tests to compile and link. If you enable only the sample tests, that number drops to 13. The execution of the configure command itself is not a huge factor in build times, taking only about 4 seconds of CPU time (roughly 12 seconds of real time) on my laptop. It is the actual make that takes so long, because the make itself results in a lot of configuration being performed. On my laptop, I got the following times for configure and make with POSIX, TCP/IP, and all tests enabled for sparc/sis (forgive the bad line wrapping):
$ time ../rtems/configure --target=sparc-rtems4.11 --prefix=/home/joel/rtems-4.10-work/bsp-install/ --disable-multiprocessing --enable-cxx --disable-rdbg --enable-maintainer-mode --enable-tests --enable-networking --enable-posix --disable-deprecated --disable-ada --enable-expada --enable-rtemsbsp=sis >c.log 2>&1
real 0m12.511s
user 0m1.970s
sys 0m2.138s
$ time make -j3 >b.log 2>&1
real 10m1.806s
user 8m9.319s
sys 2m8.838s
Building all tests on the quad-core computer at -j7 resulted in a build time of approximately 5 minutes. Given the large number of tests, this indicates that there is opportunity to take advantage of multiple cores during a full build of RTEMS.

Building only the sample tests (e.g. --enable-tests=samples) on my laptop resulted in a build time of 3m4.420s of real time, with user and system time coming close to adding up to the real time. That means roughly 2/3 of the full build time is spent compiling and linking the tests.
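For completeness, the samples-only build differs from the full configure line above only in the --enable-tests argument; a trimmed-down sketch:
$ ../rtems/configure --target=sparc-rtems4.11 --enable-rtemsbsp=sis \
    --enable-maintainer-mode --enable-tests=samples \
    --prefix=/home/joel/rtems-4.10-work/bsp-install/ >c.log 2>&1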

How much of the make time is actually configuration? After posting this, I was asked privately how much of the make is spent configuring versus compiling. To answer this question, I found the first file in RTEMS compiled for the target (e.g. cpukit/score/cpu/sparc/cpu.c in this case) and introduced a compilation error. Since make stops at that error before compiling anything else, the first make invocation measures essentially just the configuration work. Then I manually fixed the compilation error and invoked make again to time the actual compiling and linking. The second make invocation likely still verifies that the configuration didn't change, so the configuration overhead didn't drop to zero, but it is close enough.
$ time make -j3 >b1.log 2>&1
real 1m38.264s
user 1m8.027s
sys 0m13.357s
$ time make -j3 >b2.log 2>&1
real 1m29.903s
user 1m56.809s
sys 0m24.112s
I repeated this experiment on the quad-core build farm machine and got the following results:

$ time make -j7 >b1.log 2>&1
real 0m36.652s
user 0m10.918s
sys 0m10.383s
$ time make -j7 >b2.log 2>&1
real 0m50.618s
user 1m37.673s
sys 0m27.296s
Looking at the above, it is pretty clear that the configuration part of make is a significant portion of the entire build time. On my laptop it was slightly over half, while on the quad-core computer it was about 40%. It also appears that the configuration stage is unable to take advantage of multiple cores, as user plus system time is less than the real time in both cases. On the quad-core machine, the real time was roughly three times the user time, which likely indicates that this stage is I/O bound.

In contrast, the build portion of the make command's actions is clearly parallelizable. On the dual-core laptop, the approximately 90 seconds spent in the second step used about 120 seconds of user CPU time. This indicates that both cores were utilized for about 2/3 of the build time. On the quad-core machine, we see about 51 seconds of real time consuming about 125 seconds of CPU time, for about 2/3 utilization again. The build time was reduced by about 45% by moving from the dual-core to the quad-core computer.


I am personally a proponent of continuous integration and testing. It would be a boon to the RTEMS Project if there were a buildbot or similar system to get build and test execution feedback on every commit. Even better would be to get this feedback before a patch is officially committed. Considering that building all of the source for one BSP with all tests takes 5 minutes on a reasonable quad-core computer and NO TESTS WERE RUN, one can see the challenge. There are approximately 145 BSPs in the tree currently when one considers variants. On this computer, it would take over 12 hours to build all BSPs and their tests. This assumes a fresh checkout and a single bootstrap; if you did a fresh checkout and bootstrap for each BSP, the build would take over 24 hours without executing any tests. Add in a test build of each target in a multilib configuration plus the documentation, and that time goes up even further. And that is a SINGLE CONFIGURATION -- it does not include verifying that BSPs build with and without networking or POSIX enabled.
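To make the arithmetic concrete, a naive sweep is just a loop of separate configure and make invocations, one build tree per BSP. The sketch below uses a few SPARC BSP names as placeholders and assumes $r points at a bootstrapped source tree; a real sweep would also have to map each BSP to the correct --target, which is part of what makes automating this non-trivial:
#!/bin/sh
# naive sketch: one clean build per BSP, all tests enabled (BSP names are examples)
for bsp in erc32 sis leon2 leon3 ; do
  mkdir -p build-$bsp
  (cd build-$bsp && \
    $r/configure --target=sparc-rtems4.11 --enable-rtemsbsp=$bsp \
      --enable-maintainer-mode --enable-tests >c.log 2>&1 && \
    make -j7 >b.log 2>&1)
done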

This is completely unacceptable for a continuous integration and test effort. According to via.ca, we have had an average of 2.34 hours between commits since moving to git. No single solution will give us a fast enough turnaround on building and testing. In order to achieve a turnaround under 2.34 hours, we will have to address the speed of the bootstrap, the speed of the build process, and the distribution of building and testing, be smart about building and testing only the areas impacted, and ultimately throw more hardware at the problem.

As final food for thought, all of this is just for RTEMS itself. It does not account for the testing that should be done on the GNU tools we rely upon (e.g. binutils, gcc, gdb, and newlib). A full build and test cycle for all targets can take up to 4 days on the same quad-core computer. That time can vary based upon the languages being built and tested, but GCC simply has a lot of tests.

4 comments:

  1. Just to follow up a bit. On the quad-core computer, a "bit_ALL" run took almost 26 hours.

    All started at: Mon May 14 13:43:01 CDT 2012
    All stopped at: Tue May 15 15:25:44 CDT 2012
    72841.34user 27681.81system 25:42:45elapsed 108%CPU (0avgtext+0avgdata 69224maxresident)k
    30781440inputs+967626752outputs (17860major+9178472898minor)pagefaults 0swaps

  2. Any chance to see the GNU Autotools go away?

  3. Hi everyone, I am trying to port RTEMS to a BeagleBone Black for a school project. The toolchain built correctly and my environment is sane, but when I use the make command I get this error: 'cannot open dl.tar for reading'. I tried to chmod the directory but it didn't change anything. It would be very nice if someone could help me on that point.

    Thanks!
