Sunday, November 20, 2011

MINIX versus Linux versus BSD

This morning an article was posted to Slashdot in which Andrew Tanenbaum was interviewed. One question and answer from the interview seemed to draw the most reaction on Slashdot. LinuxFr.org asked: "If you could return in the past to change the MINIX original proprietary licence to the GPL licence, do you think your system might have become the dominant free OS today?" Andrew Tanenbaum answered:

Never. The reason MINIX 3 didn't dominate the world has to do with one mistake I made about 1992. At that time I thought BSD was going to take over the world. It was a mature and stable system. I didn't see any point in competing with it, so I focused MINIX on education. Four of the BSD guys had just formed a company to sell BSD commercially. They even had a nice phone number: 1-800-ITS-UNIX. That phone number did them and me in. AT&T sued them over the phone number and the lawsuit took 3 years to settle. That was precisely the period Linux was launched and BSD was frozen due to the lawsuit. By the time it was settled, Linux had taken off. My mistake was not to realize the lawsuit would take so long and cripple BSD. If AT&T had not brought suit (or better yet, bought BSDI), Linux would never have become popular at all and BSD would dominate the world. 
Now as we are starting to go commercial, we are realizing the value of the BSD license. Many companies refuse to make major investments in modifying Linux to suit their needs if they have to give the code to their competitors. We think that the BSD license alone will be a great help to us, as well as the small size, reliability, and modularity.

My first UNIX experience was in the 1984-85 timeframe and my first job out of college was developing software for intelligent I/O controllers for a UNIX System V mini-computer. I remember what commercial UNIX was like in those days. You may have wanted it but the options were limited and expensive. On the 80286, you had XENIX. When the i386 came out, there were a number of nice options including Interactive 386/ix and even SCO UNIX. Yes, SCO had a decent product back in the day.

When I could finally afford a computer capable of running some UNIX-ish system, I found myself on the consumer side of what the question is about.

From my perspective, I didn't use MINIX because I viewed it as an educational and teaching OS. Its desired user base was not "real" users doing non-academic work.  We had experimented with it in our labs at work and found it quite primitive in comparison to the "real UNIX" we were used to.  Personally, I found the acquisition process painful as well.  But all UNIXy systems were painful to get back then.  The focus on educational users was it for me.  I don't ever want to be the "odd user" of anything who is not in the desired target audience for a product.

Why did I choose a Linux distribution over a BSD? I honestly don't remember. My vaguest recollection is that I preferred System V to BSD systems and Linux leaned toward System V. I doubt this was a factor for most others though. If I had to guess, I would go back and look at how you had to obtain it, community responses to newbies, etc. Was the AT&T lawsuit a factor? Maybe. Linux was certainly perceived to be immune from that lawsuit among those I knew. It did not suffer from that heritage.

Finally, I want to look back with the free software community building experience I have. When viewed from this perspective and the prism of time, I think the answer has a lot to do with what we should have learned from Google Summer of Code. A project has to be easy to obtain, get started with, and contribute to, and it has to have a vibrant and friendly community. The license is important, but as long as it does not impose onerous legal impediments or obligations, it won't stop most people. In the old days, MINIX was not really easy to obtain and was not focused on general use. It was not available as an impulse download. That was enough of a hurdle to stop a lot of folks.

When one examines the choices faced by someone who wanted a UNIX-like system on their personal computer in the early 1990s, it is easy to see how Linux was the default choice. It simply did not have the "targeted to teaching operating systems" stigma, was easy to obtain, and didn't have a lawsuit looming over its head.

But one of the nice things about free software is that if there is interest, a project will continue on. MINIX 3 is a great OS that has a BSD-style license and is easy to obtain, and its developers are clearly interested in MINIX 3 being used for more than teaching operating systems design. Variety is the spice of life. I would recommend that you give it a try and tell them I sent you.

Thursday, November 10, 2011

Open Source and Generational Differences


It is time again for another entry from guest blogger Chris Johns. Chris and I have chatted and emailed a lot over the past few months about the issues in this post. They are tough because it is always hard to question your decisions and embrace change. But it is critical to do so on anything that is long-term in your life. RTEMS is a long-term software project and we need to embrace self-examination and change.

Developers start projects to scratch itches or to bring about change. They join projects as users because they need to use a piece of software. They get involved because they need to fix bugs or develop new features. The reasons are many, well documented, and understood by those who work in or around open source software. What is not well understood is what happens when a project becomes old enough that generational change is needed and those who started it reach an age where they no longer have the energy, mental capacity, or desire to continue. As a project and its leadership age, do they move from being intensively productive developers to mentors and governors of the project? Understanding this change is difficult because the interests and focuses of the newer generations are different and sometimes clash with those of the original developers, yet both are right and neither is wrong. The primary function of the project may be the same, but the way it is developed and maintained can be different. Open source is starting to reach this point, and some projects have such long life cycles in user projects that it is starting to become an issue. RTEMS is such a project. It is used in space flight and some new projects do not take flight until 2018. Being open source, each user has the code and can make changes long past the life of the project, but it is the project and community this discussion is about.

RTEMS is now 22 years old. It is able to drink, vote and hold a driver's license in most countries. It has experimented with a few things it should not have and so far has not been in trouble with the law. You could say it has had a stable and happy upbringing. RTEMS is now looking to the future and life without the current custodians.

RTEMS at its core is a collection of C source files that are built into a C library and linked with user application code to provide a single executable image, often embedded into a custom piece of hardware. The key factors for the user of this device are performance, resources and stability. The key factors for the developer of this device are availability of source code, easy to use software interfaces, ease of integration into a team environment, and stability of the project. The key factors for the maintainers of RTEMS are the ability to effectively integrate changes, the ability to respond to hardware changes, stable infrastructure, and the ability to attract new developers. Developers are the food source that feeds, refreshes and sustains a project.

RTEMS in its post-toddler years moved to a new version control tool called CVS that allowed concurrent development of the code. It was liberating because a single set of code did not have to be maintained. Before CVS, patches were emailed to the maintainer, merged and then released back to developers as tar files. With CVS this task could be spread among a number of trusted developers. RTEMS also moved from custom makefiles to autoconf and automake. This improved the productivity of the developers, allowing the code to be configured and built on a range of host operating systems. RTEMS still uses these same tools 10 to 15 years later and they still work. The developers are comfortable with their workflow and know the problems or issues they have. Why the need to change? There are problems and over time these have grown in size as the project has grown. What were problems are now distant memories and all we have left are the new problems that came with the tools.

We have files in places that have long since lost their meaning. The board support packages are an example. They are located under 'c/src/lib/libbsp' when they could be located in 'libbsp' or even 'bsps'. This path does not affect the build time or the disk space used and the developers know this path very well, so why is this a problem? It makes no sense. Any new users of RTEMS, and by new I mean anyone who has joined in the last 10 years, would have no idea why this structure exists. RTEMS used to have an Ada version and all code was under 'c' or 'ada' and the source was under 'c/src'. Why not move the files? We cannot because CVS does not have a rename command and repository hacks are something we discourage.

Would we move them if CVS allowed it? Maybe, however this affects the build system. Why is that a problem? Is there something wrong with it? Building RTEMS is complex. For a user of RTEMS, a release comes with all the autotools-generated files in place and ready to work. You can configure RTEMS with a few options passed to configure, pass a range of BSP-specific options on the command line to the build, and then at runtime you have a large array of configuration and runtime options. Are these documented? Only a small number are. The user needs to look into the source to find the full set, and even for a seasoned developer this can be complex and the result may not be accurate or complete. As a user you just build RTEMS, and it does that well. By well I mean you get a library of code that is stable and will perform the task asked. As a developer you need to work with the build system and this is where problems start to appear. Performance is an issue. A clean checkout from CVS requires a bootstrap to generate all the autoconf and automake files, as they are not held in the repository, and this can take a long time even on large hosts with fast disks. Fortunately this is not often needed because maintainer mode helps; however, it makes build-bot type support on check-in difficult if not impossible. Also contributing to this is the repeated installing of header files. If you build all 120+ BSPs you will install over 50,000 header files. This is just building RTEMS and does not include installation of the build output. When installing, the 50,000+ files are copied to the install paths. Does this seem normal or OK? Maybe there really needs to be this many headers, or maybe header files have been added to RTEMS following a common template with little regard to the consequences and over the years the count has grown to this figure. Most users are only interested in one or two BSPs so this is not a major issue for them. For a maintainer it is a problem because they need to make sure everything builds and works.

I suppose the important questions regarding the build system are "Is it efficient given the new generation of build tools?" and "Does it aid or inhibit the development process?" These are debatable questions which span technical merits, broad range support, supported hosts, and personal preferences, the last being the most contentious.

The question the current developers and maintainers of RTEMS need to ask is not "Are these tools working and doing the job they are supposed to?" but rather, if we handed the project to a new group of developers and maintainers, "What would the new maintainers think of the state of the project?" While we may be comfortable and able to release and maintain RTEMS, it may look to a new generation like something from a time past.

Change is never easy. There needs to be leadership, desire and willingness to refresh to bring about change. It is easy to be negative and to find fault in any new change, then offer no path forward. Leading is not always about "What I think is right"; it is about being honest and openly critical of how we work and approach problem solving, and it is about providing paths to new ways of solving problems we face in the project. Not all paths will succeed; however, being open to change means a new path can be taken until a solution is found. Inviting new and young talent to follow these paths and find solutions involves them in the project. They become responsible for various parts and that builds pride and commitment. The hope is that someday they will be managing and leading the project.

Friday, October 28, 2011

Flight Software Workshop and Goddard Space Center Visit

OAR was a sponsor of the 2011 Workshop on Spacecraft Flight Software at the Johns Hopkins University Applied Physics Laboratory, held October 20-22, 2011. This was the first time OAR had been an event sponsor and the first time RTEMS stickers have ever been given away. We had spent weeks preparing a new table top display for RTEMS, updating flyers for the project and our services, and having stickers made. The preparation was a lot of work that was outside our primary skill sets. Jennifer Averett is a core RTEMS developer who did the artwork. We are technical folks and not marketing or sales types. If you have ever worked with us to get a quote, you would know. Hence this was a challenge. If you have any suggestions on our display, flyers, etc., please share them.

Mark Johannes, Chris Johns, and I were fortunate enough to have Alan Cudmore take us on a tour of Goddard Space Center. We first visited their SpaceCube lab and saw the smaller current generation and a larger newer one which had been flown on a sounding rocket.

MMS "Flat Sat" Work Area
Alan showed us the MMS laboratory where we chatted and learned about their workflow. The lab included a “flat sat” version with the electronics of a single node from the constellation. Their laboratory also included hardware required to test the system driving all inputs. Very impressive.


The laboratory setup was very nice but we were all shocked when we saw the MMS assembly area and how large each of the four satellites in the constellation was. We expected birthday cake size and they are the diameter of a large rocket body. Each satellite is running a hardened Coldfire CPU with RTEMS and application software built on top of the Core Flight Executive. 

Although not running RTEMS, seeing the James Webb Telescope and Hubble Service Mission assembly areas was a real treat. It is ultimately about the science and I am proud RTEMS is a part of it.

The Workshop on Spacecraft Flight Software was a real treat. Gedare Bloom and his wife joined Chris, Mark, and me there. I have been to the last three FSWs and have been blown away repeatedly at the incredible work being done by this community. As might be expected, we were collectively delighted to see so many people using or considering RTEMS. It is quite humbling. Many attendees shared their RTEMS experiences and plans with us. The presentations were video recorded and should be available with slides in the near future. Presentations and video from previous years are also available online.

All in all, it was a fabulously rewarding experience getting to meet so many people, tour Goddard Space Center, and hear about so many projects.

Thursday, October 20, 2011

Johnson Space Center Visit



I was recently invited to teach a week-long RTEMS class to 14 people at Johnson Space Center (JSC). Multiple projects are considering using RTEMS and two are the trailblazers bringing RTEMS to JSC. Before saying anything else, I want to thank the folks who invited me and were such good hosts. In addition to a self-guided tour of the Rocket Park, I got a field trip where I got to see a few of the interesting things at JSC.

A complete set of photos from my field trips is in a Facebook album. My field trips included seeing the Saturn V rocket, space suits that had been to the moon, the Morpheus lunar lander, and the infamous Building 9, which houses the ISS and space shuttle training modules, Canadarm, and zero-gravity practice facilities. It also includes a small area of cool projects that didn't make the cut, including Robonaut and the movie-scary Spidernaut. Everything I saw was impressive but much of it leaves you with an unsettling feeling of sadness when you realize it has been almost forty years since man went to the moon and, with the end of the shuttle program, we have no ability to put a person in orbit. Big science is not cheap and takes years of effort, but without it, we quit learning more about our universe, getting insight into basic physics, and solving the hard problems.

The Morpheus lunar lander is one of the two projects taking the leap and switching to RTEMS. It has already had multiple successful test flights and at least one “interesting” one (videos). Morpheus uses a PowerPC based computer system and is built using the Core Flight Executive from Goddard Space Center. CFE has long supported RTEMS and as more applications are based upon it, I am sure we will see at least a few of those applications use RTEMS.

DownMASS is a small automated capsule that can be filled with contents that need to be shipped from the International Space Station (ISS). It is being designed to hold approximately 100 pounds (about 45 kg) of cargo. When filled and released from the ISS, it will reenter the Earth's atmosphere and eventually deploy its parachutes. The hardware configuration for prototyping is using a ruggedized embedded PC-104 system.

I am thrilled to see more space applications using RTEMS, especially since having applications at Johnson opens a potential path to having man-rated applications using RTEMS. Thanks to Morpheus and DownMASS for giving RTEMS a chance.

Sunday, October 16, 2011

Google Code In 2011 Announced

The 2011 edition of Google Code-In has been announced.  Google Code-In is a unique opportunity for up and coming hackers.  It is a competition, an open source development contest for 13-17 year old students around the world. The purpose of the Google Code-in competition is to give students everywhere an opportunity to explore the world of open source development. We not only run open source software throughout our business, we also value the way the open source model encourages people to work together on shared goals over the Internet.

Each participating project identifies a variety of tasks which students can choose to perform.  Tasks are not just coding - they can involve documentation, testing, or outreach.  These are the categories:
  1. Code: Writing or refactoring code
  2. Documentation: Creating and editing documents
  3. Outreach: Community management and outreach, as well as marketing
  4. Quality Assurance: Testing and ensuring code is of high quality
  5. Research: Studying a problem and recommending solutions
  6. Training: Helping others learn more
  7. Translation: Localization (adapting code to your region and language)
  8. User interface: User experience research or user interface design and interaction

Black stickers are from a Google Code-In task
The RTEMS Project was fortunate to be one of the twenty organizations that participated in the 2010 edition of Google Code-In. We identified about 150 potential tasks and had almost 100 tasks done by a variety of students. One of the highlights was a new RTEMS logo which we are using on stickers and project calling cards. Another student modified the shell scripts which generate our test coverage reports to add a timeline capability. Some students created Wiki pages for the Board Support Packages that did not have one. And still other students created question sets for the RTEMS Moodle.

The RTEMS Project is planning on applying again this year. We have some new tasks in mind for those students who volunteer. It was an interesting and challenging holiday season for the RTEMS mentors. I know that I personally did a lot of Google Code-In mentoring and task approval using my phone while visiting family. We are looking forward to the opportunity to be a part of Google Code-In 2011 and to be challenged by the students.

Friday, October 14, 2011

RTEMS Configuration and Resource Limits

This post started as an answer to a question from an ESA Summer of Code In Space student. She had hit one of the things that every person new to RTEMS hits at one point. She attempted to create a pthread mutex via pthread_mutex_init() during the package's initialization. It failed and returned EAGAIN. This was an especially surprising error since it happened in a C++ global constructor which was executed before the first task was entered. It appeared to her that RTEMS was not initialized or something even weirder was wrong. RTEMS was, in fact, initialized. Since C++ global constructors are supposed to run before main(), on RTEMS we run them in the context of the first user task which executes. Let me start with an obvious and decidedly unhelpful assertion:

RTEMS != GNU/Linux

I am sure that didn't help at all to understand why she got an out of resources error. But it is a lead-in to try to explain the philosophical difference between RTEMS and GNU/Linux that leads us to this. The error she encountered was in fact an out of resources error. Most people start programming on operating systems that do not discourage you from using dynamic allocation and do not put restrictive limits on the number of instances of an object class you can create. For example, on a GNU/Linux system, you don't worry about how many instances of an OS object you create. There are often limits but these are so high as to not present problems. The Linux kernel does, however, have some behavioural and limit configuration parameters available to the end user. If you are curious about these, see /etc/sysctl.conf and sysctl(8).

In contrast, RTEMS is a member of a class of real-time operating systems that was designed to target systems with safety and hard real-time requirements. There are often limited computing resources. In this design view, it is better to pre-allocate as much as possible so you don't have to deal with running out of resources at run-time. This makes the resulting system safer and less likely to have a weird failure mode in this situation. It is not uncommon for the entire set of tasks, semaphores, etc. to be well known and listed in the application design documentation. Configuring the resources required is common in this environment. Moreover, it is not uncommon for malloc() to be forbidden after system initialization. There is nothing inherently right or wrong with either of these contrasting philosophies. They are just different approaches given different system requirements.
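To make the pre-allocation idea concrete, here is a minimal sketch of a fixed-size object table that reports EAGAIN once every slot is taken. It is only an illustration of the philosophy, not how RTEMS is implemented; the demo_ names and the DEMO_MAXIMUM_MUTEXES limit are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit fixed at build time. It plays the role of a
 * CONFIGURE_MAXIMUM_* setting but is not an RTEMS macro. */
#define DEMO_MAXIMUM_MUTEXES 3

typedef struct {
  bool in_use;
} demo_mutex;

/* All storage is reserved up front; nothing is malloc()ed at run-time. */
static demo_mutex demo_mutex_table[DEMO_MAXIMUM_MUTEXES];

/* Hand out a free slot, or report EAGAIN when the table is exhausted. */
int demo_mutex_create(demo_mutex **out)
{
  assert(out != NULL);
  for (size_t i = 0; i < DEMO_MAXIMUM_MUTEXES; ++i) {
    if (!demo_mutex_table[i].in_use) {
      demo_mutex_table[i].in_use = true;
      *out = &demo_mutex_table[i];
      return 0;
    }
  }
  return EAGAIN;
}

/* Return a slot to the table so it can be reused. */
void demo_mutex_delete(demo_mutex *m)
{
  assert(m != NULL);
  m->in_use = false;
}
```

With the limit at three, the fourth create fails with EAGAIN until one of the earlier objects is deleted. If the limit were zero, the very first create would fail, which is essentially what the student saw.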

In RTEMS you configure the maximum number of each type of object you want. The defaults tend to be 0. Memory is reserved for RTEMS separate from the C Program Heap based upon your configuration requests. The sample in testsuites/samples/ticker has the following configuration in the file system.h:

#include <bsp.h> /* for device driver prototypes */

#define CONFIGURE_APPLICATION_NEEDS_CLOCK_DRIVER
#define CONFIGURE_APPLICATION_NEEDS_CONSOLE_DRIVER

#define CONFIGURE_MAXIMUM_TASKS 4
#define CONFIGURE_RTEMS_INIT_TASKS_TABLE
#define CONFIGURE_EXTRA_TASK_STACKS (3 * RTEMS_MINIMUM_STACK_SIZE)

#include <rtems/confdefs.h>

The ticker application says it needs a console (used for stdio) and clock tick (time passage) device drivers. It may have a maximum of four concurrently instantiated Classic API tasks. It is using a Classic API style initialization task -- the alternative is a POSIX Threads initialization thread. And each of the tasks it is creating has a stack larger than the minimum required. It is assumed that each task requires only the minimum amount of stack space so we have to tell the RTEMS configuration to reserve some extra memory for those that are larger than minimum. You can look at the calls to rtems_task_create() in init.c for the calling parameters that indicate the desired stack size.

The hello world sample application is simpler. It doesn't need a Clock device driver and would only have one task. Your application will likely have a more complicated configuration than hello world but it doesn't need to specify any configuration parameters unless it requires that class of object to be supported.

Our young programmer simply missed defining CONFIGURE_MAXIMUM_POSIX_MUTEXES to however many are required. This is an RTEMS-specific step that is not needed on non-embedded operating systems.
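As a rough sketch, a POSIX-flavored configuration that reserves some mutexes might look like the following. The counts are purely illustrative assumptions, not values from her application, and a real port would pick them from the package's needs.

```c
/* Illustrative configuration sketch; the counts are assumptions. */
#include <bsp.h> /* for device driver prototypes */

#define CONFIGURE_APPLICATION_NEEDS_CONSOLE_DRIVER
#define CONFIGURE_APPLICATION_NEEDS_CLOCK_DRIVER

/* Reserve POSIX threads and mutexes; both default to 0 if left undefined. */
#define CONFIGURE_MAXIMUM_POSIX_THREADS 4
#define CONFIGURE_MAXIMUM_POSIX_MUTEXES 8

/* Start in a POSIX initialization thread; the application is then
 * expected to provide void *POSIX_Init(void *argument). */
#define CONFIGURE_POSIX_INIT_THREAD_TABLE

/* Define CONFIGURE_INIT in exactly one source file so the
 * configuration tables are instantiated there. */
#define CONFIGURE_INIT
#include <rtems/confdefs.h>
```

Any mutex created in a C++ global constructor counts against the same limit, so it has to be included in the total.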

With all that background on the hard limit focus of RTEMS configuration, it is easy to forget that RTEMS also has "unlimited object creation mode" and "unified workspace" options. These are probably more useful for our intrepid programmer at this stage. These configuration options let you specify that you want a potentially unlimited number of a class of objects and that you want the RTEMS Workspace and the C Program Heap to be the same pool of memory.

#define CONFIGURE_MAXIMUM_POSIX_MUTEXES \
    rtems_resource_unlimited( 5 )
#define CONFIGURE_MAXIMUM_POSIX_CONDITION_VARIABLES \
    rtems_resource_unlimited( 10 )
#define CONFIGURE_UNIFIED_WORK_AREAS


The above specifies that POSIX mutexes and condition variables are "unlimited" but the set is extended by five (5) instances of mutexes and ten (10) instances of condition variables at a time. When you create the sixth mutex instance, instances 6-10 will be added to the inactive pool for that object class.
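The extend-by-a-block behavior can be pictured with a small bookkeeping model. This is a simplified sketch, not the RTEMS internals: the demo_ names are hypothetical, and it ignores the fact that a real extension has to obtain memory and can therefore still fail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one object class; not RTEMS internals. */
typedef struct {
  size_t increment; /* objects added per extension, e.g. 5 */
  size_t capacity;  /* objects allocated so far (active + inactive) */
  size_t active;    /* objects currently handed out */
} demo_class;

/* Create one object and return its 1-based instance number.
 * When no inactive object is left, extend the class by one block. */
size_t demo_create(demo_class *c)
{
  assert(c != NULL && c->increment > 0);
  if (c->active == c->capacity) {
    c->capacity += c->increment;
  }
  return ++c->active;
}

/* Number of allocated objects waiting in the inactive pool. */
size_t demo_inactive(const demo_class *c)
{
  assert(c != NULL);
  return c->capacity - c->active;
}
```

Starting from an increment of five, the first five creates use up the first block. The sixth create extends the class to ten objects, leaving instances 7-10 in the inactive pool.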

The full set of configuration macros is (hopefully) well documented in the Configuring a System chapter of the RTEMS User's Manual. This is a link to the appropriate section for RTEMS 4.10.1:

http://www.rtems.org/onlinedocs/releases/rtemsdocs-4.10.1/share/rtems/html/c_user/c_user00414.html

For normal application development, that's really about all there is to this issue. If you get an out of resources error, you will need to raise the limit. When looking inside the code, any time you see NULL returned from _Objects_Allocate(), it is a maximum objects issue. If you see a task, thread, or message queue create fail, then you are not accounting for variable memory like stack space or message buffers which must be allocated when the object instance is created.

Since our intrepid young lady is actually porting a software package, let me throw out another thought  which impacts this situation. There is likely a fixed base set of objects the package creates such as for global mutexes or message queues. A user of package X will create instances of objects that it defines. So if the package X has a macaroni object that requires a mutex, condition variable, and a message queue, then you can let the end user of package X know that for each macaroni instance, they need to reserve A, B, and C. For certain cases, like the tasking in Ada and Go, RTEMS provides higher level configuration helpers like CONFIGURE_MAXIMUM_ADA_TASKS to encapsulate this configuration information.
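One way a package could publish this information is a small header of helper macros. The sketch below uses invented PACKAGE_X_ names and made-up base counts; only the one-of-each cost per macaroni comes from the example above.

```c
#include <assert.h>

/* Hypothetical fixed base set created once by package X. */
#define PACKAGE_X_BASE_MUTEXES              2
#define PACKAGE_X_BASE_CONDITION_VARIABLES  0
#define PACKAGE_X_BASE_MESSAGE_QUEUES       1

/* Each macaroni needs one mutex, one condition variable,
 * and one message queue. */
#define PACKAGE_X_MUTEXES(macaronis) \
  (PACKAGE_X_BASE_MUTEXES + (macaronis) * 1)
#define PACKAGE_X_CONDITION_VARIABLES(macaronis) \
  (PACKAGE_X_BASE_CONDITION_VARIABLES + (macaronis) * 1)
#define PACKAGE_X_MESSAGE_QUEUES(macaronis) \
  (PACKAGE_X_BASE_MESSAGE_QUEUES + (macaronis) * 1)

/* Compile-time sanity check: zero macaronis costs only the base set. */
static_assert(PACKAGE_X_MUTEXES(0) == PACKAGE_X_BASE_MUTEXES,
              "base mutex count mismatch");
```

An application that wants three macaronis plus four mutexes of its own could then define CONFIGURE_MAXIMUM_POSIX_MUTEXES as (4 + PACKAGE_X_MUTEXES(3)) and size CONFIGURE_MAXIMUM_POSIX_CONDITION_VARIABLES the same way.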

I know my answer to her was a bit over the top but I realized that she had run into something that many others hit when using RTEMS for the first time. I really wanted her, and now you, to understand this part of RTEMS. Figuring out how many of each kind of object you need when developing an application can be tough, but figuring out that same information when porting a software package can really be a challenge. Embedded developers focused on safe, reliable systems don't like surprises and using various techniques to avoid running out of resources at run-time is a big part of it.

Friday, September 16, 2011

Elvis Costello and RTEMS.info History

Over the years, I have told many RTEMS users that I provide hosting and system administration for an Elvis Costello fan site (PHPBB Forum and Wiki).  I have had the fan site for about 9 years now.  What many of you probably don't realize is that I have also hosted the RTEMS.info Mirror Site since 2006.

This is a personal effort and I receive no subsidy from OAR for doing this. In order to have a static IP address and host services, I have to have a business class account which costs more than a residential class account. All of the sites I host plus the family Internet activities share a 7Mbps/1Mbps connection.

I had been helping on the technical side of administering the Elvis Costello Fan Forum for a while when the hosting service got hacked and our site trashed. We were unable to get anyone to contact us via email or phone for over a week. I realized that I had a computer running GNU/Linux Fedora that was largely unused since I had upgraded. Even though it was only a 350 MHz Pentium II with 384MB RAM, it was perfectly suitable to host a small (~100K hits a day) website. I made a phone call to get a static IP address, moved the domains and within about a week we were back up.

A couple of years after that, the person who ran the Elvis Costello Wiki asked if I could host it. I had already planned to upgrade to a 2.4 GHz Pentium 4 with 2GB RAM. We decided to wait to move the Wiki until after the server upgrade. The hardware upgrade went easily and we moved the Wiki. What surprised us both was that the performance on the site went to hell. The server logs showed nothing, load looked low and no amount of tuning or probing helped. I begged my ISP for help and got an unlocked, uncapped cable modem to test with. After more research and fighting, I learned that my router could not handle the number of simultaneous connections and was dropping them randomly. I upgraded routers and the performance issues were settled.

The site has always used external hard disks for backup. There is a script which runs every night and dumps all user directories, databases, etc. to a special directory on an internal disk. Another script runs later and rsyncs the internal disk copy to an external one. Backups are placed in dated directories and a few a month are saved.

When I set up the RTEMS.info Mirror, I was more concerned with disk space than bandwidth consumption.  I don't have the fastest connection and I am sure users would appreciate a faster uplink. But I foot the bill and until there is funding, this is what there is.  The RTEMS.info site mirrors at least 4 times a day.  Ralf Corsepius has set up automated checks which let us know when a mirror site is down or out of sync.

In late 2010, I became worried that the 2.4 GHz Pentium 4 was getting very old. It was not new when it became the server and I was sure it had seen at least 6 years as a server. The Elvis Costello Fan community rallied around my request for a new server and within a month or so, the fund raising goal was met. The new server is from AS Labs who specialize in building custom GNU/Linux systems. It has a quad-core 3.0 GHz CPU and 8 GB RAM. It runs very cool and is far from overloaded.

Over the years we have had periodic power outages with the worst being the tornadoes of April 2011. But overall, I believe our uptime is very good. I don't track it but thanks to the Elvis Costello fan community, it can't be down over 20 minutes without me getting an email. Thanks folks!

My wife and I have learned a lot about system administration over the years of maintaining these sites.  She personally reviews and approves every account request for the PHPBB Fan Forum.  The number of spam account requests is boggling and periodically she begs me to try to find another way to stem an increase.  The Wiki account requests and spam were solved when we instituted a very strict policy on getting an account.  A small group of people review and approve these accounts.

The server also hosts a couple of very low volume sites for friends.  They are more interesting from a content viewpoint and I want to share more about them in a future post.

Thursday, September 8, 2011

RTEMS Pair Programming

One of the most interesting and under-utilized RTEMS services that OAR Corporation offers is RTEMS Pair Programming.  This service is a great solution when dealing with a customer who wants a big head start on some type of development effort. In Agile terms, this is a development sprint with a team consisting of RTEMS and customer supplied experts.  Most of the time, we do this for BSPs and device drivers.

Left to Right: Walter Nakano, Wendell Pereira da Silva,
Joel Sherrill and Jennifer Averett
The key to pair programming success is that we know RTEMS and the customer knows their hardware and test equipment.  OAR folks can concentrate on quickly providing the framework for the BSP and needed device drivers. Then we work together to author the device drivers. This provides them with specialized training on the details of the BSPs and device drivers that are critical to the success of their application.  Usually the initial testing is performed as a joint effort with subsequent detailed testing performed by the customer engineers.

Recently, OAR got to host Wendell Pereira da Silva and Walter Nakano from COMPSIS for two weeks of intense development activity.  Their system consisted of an embedded PC plus some add-on boards which added up to a lot of individual pins and ports to test.  They brought a LabVIEW test rig which allowed us to test every input and output on the Multi-I/O board.  The hardware list was:
  • RTD CME137686LX Embedded PC
    • 4 COM ports and i82551 NIC of particular interest to their project
  • RTD 17320HR Octal UART PC-104 board (PCI interface)
    • Exar PCI Vendor Id with 8 NS16550 compatible serial ports
    • NOTE: We only had one of these boards but they will have 4 in the real configuration!!
  • RTD 316HR Dual Synchronous Serial Port PC-104 board
    • single Zilog Z85230
  • RTD 6425HR Multi-I/O PC-104 board
    • 16 differential or 32 single-ended analog input channels
    • 4 analog output channels
    • 32-bit discrete I/O with 16 bit programmable for interrupt on input change
One thing should quickly stand out when viewing that hardware list.  They have a LOT of serial ports.  Four asynchronous on the embedded PC, 32 on 4 PCI-104 boards, and 2 synchronous on the Z85230 board for a total of 38 serial ports.  This is actually a classic example of a case where using the libchip serial driver framework would be very useful.  But the PC386 console driver was not designed this way.  Before the guys arrived, Jennifer reworked this driver to be libchip style.  At the same time, I factored out the mouse input stream parsing code.  It really wasn't BSP dependent and by moving it to cpukit/libmisc/mouse, I made it potentially available to every BSP.  Jennifer and I had tested COM1 and COM2 on qemu before they arrived but waited for their hardware to test COM3 and COM4.

COM1 and COM2 worked as soon as the cabling was correct. COM3 and COM4 proved more difficult.  After struggling to find a software problem, it occurred to me that it could be as simple as RTS/CTS not being wired together in the shell since we were using a 3-wire connection.  That was indeed the problem.  Next came the octal serial port board.

After realizing that we would end up with a libchip configuration table with 38 entries and most of them would be disabled for "normal" configurations, I had the idea to allow for dynamic registration of new "ports" in the libchip configuration table.   The idea is that if you probe for a bank of 8 serial ports and find them, then you can dynamically add 8 more entries to the libchip configuration table.  Currently this allows probes to insert entries prior to console_initialize() being called.  It is possible to allow them to be registered after this point but they would not be available to be /dev/console or used for printk().

While I was implementing dynamic registration, Jennifer and our guests worked to get the PCI probe to find the card and the first serial port working.  We were surprised to learn that it didn't have a vendor Id of RTD but Exar.  This explained the sparse programming documentation from RTD.  As soon as the probe and one serial port worked, we switched to my dynamic registration code.  Soon all eight ports on the board we had were working.  Plus I added code to detect the 2, 4, and 8 port variants of the Exar chip.

Next was the dual port synchronous board.  Unfortunately, RTEMS does not have a Z8530 synchronous driver but does have a standard libchip asynchronous driver.  After fiddling to figure out the baud rate clock divisor math, we ended up with both ports working.  There was one issue in the driver we did not resolve in the two weeks they were here.  The two ports on a Z8530 share a single interrupt status register which when read, clears the source.  You have to be extremely careful to touch it one time and process all interrupt sources on both ports.  The ports worked individually but not when both were installed.  Jennifer and I had a solution but not enough time to implement it. Hopefully Wendell and Walter can implement it and we can get this resolved in the main tree.
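
For readers curious about the divisor math: the Zilog data sheet gives the baud rate generator time constant as clock / (2 x baud x clock mode) - 2.  The sketch below just evaluates that formula in shell arithmetic.  The function name and the 3.6864 MHz clock are assumptions for illustration; they are not values taken from the RTD board or from the RTEMS driver.

```shell
# Hypothetical helper: compute the Z8530 baud rate generator time constant.
#   $1 = BRG clock in Hz, $2 = clock mode (1, 16, 32 or 64), $3 = baud rate
z8530_time_constant()
{
  clock=$1
  mode=$2
  baud=$3
  divisor=$((2 * mode * baud))
  # Round to the nearest integer before subtracting the fixed offset of 2.
  echo $(( (clock + divisor / 2) / divisor - 2 ))
}

# Example with an assumed 3.6864 MHz clock, x16 mode, 9600 baud.
z8530_time_constant 3686400 16 9600
```

With the wrong clock value plugged in, the result is still a plausible looking number, which is why this kind of math usually takes some fiddling on real hardware.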

Next was the Multi-IO board.  If you have been following my blog a while, then you might remember the entry RTEMS Shell as Debug Aid where I discussed adding commands to the RTEMS Shell to aid in debugging a Winsystems Multi-IO board similar in capability to this board.   One of the last things Jennifer and I had done to the existing multiio code was to define a board independent interface between the shell commands and the actual driver.  My plan was to let this interface evolve and grow as we learned more about user application requirements.  This was the first opportunity we had to write a driver to this interface and reuse the commands.  As might be expected, there were places where 0/1 based numbering of inputs still reflected the Winsystems board.  And there were places in the RTD documentation that were unclear.  But after a while of fighting these and the normal cabling issues, we were able to use the existing commands to debug the driver and verify that all discrete I/Os work both polled and interrupt driven and that all analog inputs and outputs work polled.  We ran out of time before we were able to attempt analog input interrupts.

The final thing we attempted was getting the RTEMS TCP/IP stack to run on this board.  It had an Intel i82551ER NIC which required using the drivers in the libbsdport kit of late model FreeBSD drivers.  This driver works on qemu when you configure qemu for the i82559 simulation.  We verified the basics were OK on qemu.  Then we moved on to the real hardware.  After the normal hunt for an extra cable and battle of the network settings, we were able to run the telnetd application from the network-demos module.

Walter and Wendell drove their hardware.  Jennifer and I were the main forces driving the code but they reviewed every line of code and we all verified that each line of code programmed the hardware as we all agreed it should be.  Along the way, if something was unclear, we took a break from coding and testing to focus on a portion of the RTEMS Open Class that was very specific to what we were working on. The goal was not only to have as much functional code as possible; it was also to ensure that the code was high quality and they left understanding it and capable of modifying it should the need arise.

At the end of two weeks, we all were thrilled.  Walter and Wendell had been sending home progress reports and every day I continued to be amazed at the progress we -- as a team -- had made.  This amount of progress was possible because each of us brought unique skills and knowledge to the table.  Jennifer and I knew where to reuse code from in RTEMS and how to create an elegant solution the RTEMS way.  Walter and Wendell were intimately familiar with their hardware and test equipment and ensured we tested well.  Together we reviewed all code to ensure they left understanding it.

I really enjoy teaching the RTEMS Classes but RTEMS Pair Programming is one of the most fun and personally rewarding services we offer.  I always come away amazed at how much is working at the end of an intense 2-3 week development sprint.  By bringing together engineers with complementary skills and knowledge, solutions are found quicker.  And solutions are ultimately what we all want.

Thanks to Walter, Wendell, and Daniel (who couldn't make the trip) for two weeks of fun and productive work.

Sunday, August 28, 2011

Spell Checking Not Working in Firefox

This is not RTEMS related at all but everyone in the field has done free technical support before.   I got asked

Why isn't spell checking working on my new Firefox install?

It turns out there can be a variety of reasons and from my searches, the possible answers are not all in one place.  This article is an attempt to put all the things to check in one place.

WARNING: I checked this on Fedora (GNU/Linux) and menu options may vary based upon your host operating system.  Hopefully they are close enough to keep you on track.

The first thing to check is whether spell check is enabled at all.  Select "Edit > Preferences" from the menu.  Navigate to the "Advanced" tab and select the "General" subtab.  Make sure the "Check my spelling as I type" option is enabled as shown in this figure:


If this option is selected, then we will have to check two more things which might not be correct.  In order to do these you will have to navigate to a website which has a text entry form.  Composing mail on a web mail client like Google or Yahoo mail will work.  When you are there, right click in the text entry area and verify that "Check Spelling" is checked.

If it is, there is still one more thing to check.  Apparently some Firefox distributions were shipped without dictionaries.  If you installed one of these as a clean install, rather than as an upgrade, then you didn't get a dictionary.  I don't know why and it doesn't really matter.  If you have installed one of these versions, then you need to install a dictionary.  Again, you need to right click but this time, select "Languages > Add Dictionaries".  This will guide you through installing the dictionary you need.

As I was investigating this to help the person who asked, I learned I was using the "English / Zimbabwe" dictionary which I found surprising.  It definitely explains why I use UK spellings frequently.  It was necessary to change that language variant.  Colour me surprised.  Or color me surprised, now that I am using "English / United States".

If there are any other things which might explain spell check not working in Firefox, please leave a comment.  

Wednesday, May 18, 2011

Passing of Dr James Johannes

My normal blog entries cover technical issues.  This post is going to be very different.  It is an homage to the man who referred to himself as my academic grandfather because my Ph.D. advisor had been his first Ph.D. student.  He was a man I have had the pleasure of knowing, and working for, for almost twenty-five years.

Dr. James D. Johannes
Dr James Johannes (Dr J) passed away Tuesday May 17 2011 at the age of 76.  This came as a shock to those who knew him because he was a vibrant person.  He was the type of person that one just expected to always be there.  Until a few months ago, he came to the office nearly every day. Within the past few years, he earned his pilot's license for the first time.

Dr Johannes earned his Ph.D. in his early 40's from Vanderbilt University.  At the time, he lived in Huntsville and had two children (Mark and Michele) and commuted about 100 miles each way to take classes.  His dear wife Aurelia -- who was the epitome of a classy and tough southern lady -- supported him and the children through this.  This was clearly a factor in them being understanding and supportive of my finishing my Ph.D. with four small children in the house.  Aurelia appreciated what my wife Michele (not his daughter) was doing.

Dr Johannes founded On-Line Applications Research (OAR) Corporation in 1978.  He was an Emeritus Professor of the Computer Science department faculty at University of Alabama in Huntsville and was the first head of the department.  Based upon the number of dissertations on the shelves at OAR, he advised over twenty-five successful Ph.D. students. When he retired, he was serving as the Graduate Dean of the UAH College of Science.  He wrote the Thesis and Dissertation Style Guide required by the university that I followed on my own dissertation.  It certainly made it easier to get clarifications.

I first encountered Dr J as a student in Spring 1988.  I was taking the Ph.D. level Operating Systems class he was teaching. I must have made a good impression because after the class was over, I received a job offer.  I asked if they could hold the offer for a few months.  My daughter Jessica (now 22) had just been born and I wanted to make sure she passed her six-week check up before switching jobs.  In July 1988, I started work at OAR and my first project was RTEMS.  You should know the RTEMS story.

Dr J also could show pride in those around him. I was the lucky recipient of his special events twice.  The first was a company wide lunch at the Huntsville Country Club when I passed my Ph.D. defense!  He knew the system and didn't wait for graduation.

Twentieth Anniversary
The other special event was in 2009 when we celebrated my twentieth anniversary at OAR.

There are many Dr J stories but I will only share a few.  Long ago, he called me at home on a Saturday completely unexpectedly.  I assumed I was about to be fired and went to a quiet place in the house to take the call.  Dr J had that professorial demeanour that makes professional students always a bit leery. It turned out he needed some sysadmin help with a Solaris computer he had at home.  Why he had a Solaris computer at home I don't know.

He didn't like to dispose of old computers. OAR still has a CP/M computer with 8" floppy drives in storage.  The HP1000 was taken to his house after the OAR folks refused to move it to our third office location in the mid-1990s.  I think Aurelia finally made him dispose of it.

Dr J was loved and respected by a wide circle of people.  He will be missed.

Tuesday, May 3, 2011

Power Restored But Issues Remain

Power was restored to OAR at ~5pm CST Tuesday May 3 after being off since about the same time April 27.  The area is still under a dusk to dawn curfew so no hands on to fix things until tomorrow morning.  If a machine came up, we are running checks remotely.  The known status is:
  • rtems.org rebooted cleanly and appears to have all services running correctly.
  • OAR VOIP phone service rebooted cleanly.
  • mail.oarcorp.com did not automatically come up and needs hands on help.  I was not the one checking this machine and that's all I know.
  • None of the RTEMS lab machines are up.  I smelled something acrid in the lab after the outage so I don't know.  They are on different UPS's so unless it is the circuit the machines are on, it has to be a single piece of equipment. That has to be investigated tomorrow.
Tomorrow's work is focused on testing batteries in UPS's, running checks on machines, and cleaning the refrigerator.  Everyone is really chomping at the bit to get back to work.

rtems.info and Power Update

rtems.info is my personal server.  It is a purely volunteer effort and unfunded.  It is in my home.  We lost power Wednesday April 27 about 6pm CST.  Power was restored overnight Sunday.  Monday we returned home and I started to get the server back online.  There was some damage to various MySQL databases so I did a check like this after stopping mysqld:

service mysqld stop
cd /var/lib/mysql
for i in */*.MYI; do myisamchk --max-record-length=1048576 -r -f "$i"; done
That seemed to resolve all of those issues.

The outstanding issue now is that it appears my ISP had some damage to their network operations center.  All appears OK but in the process of recovering, they have deleted the DNS entry for rtems.info.  This was filed with them last night.

Huntsville is still under a dusk to dawn curfew and power is NOT restored to Research Park.  City schools are scheduled to start again on Thursday May 5.  So we are hoping for power tomorrow or Thursday.

More as it becomes available.

Terrible Storm and RTEMS Outage

Surely by now, you have noticed that the RTEMS Project appeared to drop completely off the face of the planet about 6pm CST April 27.  It was at this time that the third storm system moved through north Alabama and knocked out all major power transmission lines.  This post is the first in a series to let you all know what happened and what is happening now.

The primary servers and lab machines for the RTEMS Project (.org and .com) are in Huntsville Alabama which was in the path of the storm April 27th 2011. This storm  killed 340+ across multiple states and left a huge trail of destruction.  I have heard reports of business signs being found 100 miles (160km) away.  The following is a nice weather summary without the heart wrenching photos of the death and devastation.

http://www.washingtonpost.com/blogs/capital-weather-gang/post/alabama-tornado-outbreak-visuals-jaw-dropping-radar-and-satellite-imagery/2011/04/29/AFg1C5YF_blog.html

Huntsville had three systems pass over it that day. The first was nasty but no issues impacting the server or our home. The second system resulted in water getting into the rtems.info server area but we still had power. This allowed us to clean up and get the server back online until the third system hit. The third system was the killer. It was the devastating one that wiped out communities from Mississippi through Alabama and Georgia to points further north. It destroyed the major power transmission lines into north Alabama and Mississippi. Local utilities get power from the Tennessee Valley Authority (TVA) and TVA could not supply power to them. Their blog is here with details:

http://www.tva.com/news/releases/aprjun11/storm.htm


I was teaching an RTEMS class during this with the sole attendee being a wonderful fellow from the UK.  We spent much of Wednesday in a safe area inside OAR.  And once the storm had passed and we realized we were the lucky ones, we finished the class without power.  His hotel room was wet and without power but his bed was dry.  He could charge his laptop from the car.  We took a table and a couple of chairs from OAR and sat outside the door.  When the sun moved and we got hot, we moved the furniture.  At one point, we were on the other side of the parking lot.  We had nothing else to do and a dusk to dawn curfew, so we followed the class material, chatted, drank soda, etc. Just chilled and did RTEMS stuff.  Phillip deserves a big thank you for helping me make sure all was turned off Thursday and cleaning the fridge and freezer Friday. I sent him to Chattanooga for the weekend and I hope it was some nice relaxing sightseeing.

Friday, my family went to a hotel in a neighbouring city and waited for power to be restored. We came home Monday afternoon since our home had no apparent damage and power was restored.  We are near the main hospital so we usually get power early.  I started ensuring the rtems.info and elviscostellofans.com server came back up OK. Michele is cleaning the fridge and freezer out. If we didn't lose any electronics due to power spikes, then that's all I think we have. That makes us very lucky. Michele and I know people with deaths in their families or homes destroyed. Cleaning the fridge looks pretty tame in comparison.

We tried to stay in touch using our cell phones for email but the towers died about 12 hours in.  Plus if we didn't know you on Facebook or via our private emails, it looked like we disappeared.  I apologize for not remembering LinkedIn and the RTEMS Facebook group.  I didn't even remember IRC until Friday when we got to the hotel.

Huntsville Utilities announced yesterday (Monday night) that they have done all they can.  Everywhere TVA has given them power has been passed on to residents.  Only 30% have power.  I believe that Redstone Arsenal, Marshall Space Flight Center, and Research Park will be the last in the area to be restored.  They consume a LOT of power between them and it is more important to get power back to houses.

We really appreciate the good karma that was sent our way. We are both tired and frazzled but that's no biggie.

Friday, April 22, 2011

More RTEMS SMP Patches Coming

Some of you may be aware that SMP for RTEMS has been underway for about a year now.  The goal of the SMP effort is to have a simple, working, and correct implementation.  The first incarnation will have the following characteristics.
  • BSP SMP Interface definition with implementations for
    • PC386
    • LEON3
  • Simple SMP Aware Priority Based Scheduler
  • Faithful SMP safe version of RTEMS OS Critical Sections
    • Dispatch Disable
    • Interrupt Disable
  • Scheduler Simulator 
    • Test scenarios for new Simple SMP Scheduler
  • Features Not Present
    • Processor affinity
    • Deferred Floating Point context switch
    • Taking a core offline
The SMP implementation plan broke the effort into as many small steps as possible so it could be incrementally reviewed and merged.  This plan also allows for intermittent work.  This was critical due to the fact that all initial work was completely volunteer.  In addition to being planned as a series of small steps, the initial SMP implementation is focused on simplicity and correctness.  We can improve a simple working implementation.

Gedare Bloom and I made the first steps last summer when we implemented the Pluggable Scheduler Framework for RTEMS and I added a "per cpu" data structure.  Together these allow us to provide an alternative scheduler that is SMP aware and to have the data required by RTEMS SuperCore to manage each core encapsulated and allocated properly.

Jennifer Averett and I have been working the past couple of months on completing the SMP support to RTEMS.   Jennifer and I are approaching a milestone of having a basic SMP system functional with the only major missing item being SMP safe interrupt disable sections.  We are about to file a set of PRs and merge our current work and it made sense to post a blog entry with status that the PRs could reference.
  • Test code - our test code is hacky since it has to force interprocessor interrupts. We need to integrate where these are generated inside RTEMS.  Tests will be submitted once the code is in shape to work without "user-level" intervention.
  • Simple SMP Scheduler - implemented, tested with schedsim and our hacky test
  • Scheduler Simulator - multiple changes to improve its use during Scheduler development and to track changes in code base
  • PC386 BSP - SMP BSP support seems complete.
  • LEON3 BSP - SMP BSP support seems complete.
  • Context Switch Disable Critical Section - Working
Overall, as of today, SMP RTEMS can bring an SMP system out of reset, schedule across multiple cores, and command the first dispatch on the secondary cores.  It can do this on pc386 and leon3.  The "disable dispatch" critical section should be SMP safe now.

The next major tasks are to integrate the generation of interprocessor interrupts for subsequent dispatch requests and system shutdown.  We also have to address interrupt disable SMP safety.

We are doing our best to break this into as many small incremental pieces as possible so the review and integration into the main tree is easier.

Wednesday, April 20, 2011

Behind the Scenes of the RTEMS Tool Binaries

I recently posted a long email to the RTEMS Users mailing list about what went into the building and distribution of the pre-built RTEMS Cross Development Tools.  I thought it would be interesting to clean that post up and turn it into a blog entry for posterity. 

I don't think most people in the community realize what goes on quietly behind the scenes for the tools. When someone installs a pre-built toolset, it is the result of Ralf Corsepius' ongoing effort. 

OAR hosts the RTEMS Build Farm and Ralf uses these machines to build the tools. When there is a change in a patch or tool revision, he very quickly responds and kicks off tool builds. I have no idea how long it takes for them to finish building but the number of individual toolset combinations is staggering when you consider the multipliers:
  • number of target architectures (~10-12 depending on RTEMS version)
  • number of host OS distributions and versions
    • SUSE
    • CentOS/RHEL
    • Fedora
    • mingw32 
    • Cygwin
  • 32 and 64 bit hosts
Today there are 15 unique host variations for 4.11. This results in approximately 25GB of tool content for 4.11 on ftp.rtems.org. In addition, there are binary toolsets on the ftp site for release branches back to 4.6.  So the main ftp site has a LOT of stuff on it.

Ralf is very quick about getting new tool binaries out. Because of this, RTEMS is typically the first project to release binary tools after a binutils, gcc, or gdb release.  For gcc 4.6.0, he tracked the final release candidates so we were using the release image before the announcement. :-D 

After the tools land on the RTEMS FTP site, there are two yum mirrors of the rtems.org site:

rtems.eu [1]
rtems.info [2]

It can take hours for the mirror process to complete. Ralf has a script that checks the mirrors each hour.  This script emails those interested when things get out of sync.  The Yum repository for each RTEMS branch, distribution, OS version, and 32/64-bit variation is checked individually.  When a mirror is out of sync for a variation, that single mirror is taken out of the yum mirror list for that variation until it has time to resynchronize. When a tool build is under way, I might get email for 8+ hours showing the progress of the synchronization.
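
I have not seen the inside of Ralf's script, so the following is only a sketch of how such a check could work, not his implementation.  It assumes the repodata/repomd.xml metadata file from the master and from each mirror has already been downloaded to local files, and it treats a mirror as in sync when its copy is byte-for-byte identical to the master's.  The function names are made up for illustration.

```shell
# Hypothetical sketch of a per-variation mirror check.

# Succeeds when the mirror's metadata copy is identical to the master's.
mirror_in_sync()
{
  cmp -s "$1" "$2"
}

# Print the names of the mirrors that match the master.
# Arguments: master file, then pairs of "mirror-name mirror-file".
list_synced_mirrors()
{
  master=$1
  shift
  while [ $# -ge 2 ]; do
    name=$1
    copy=$2
    shift 2
    if mirror_in_sync "${master}" "${copy}"; then
      echo "${name}"
    fi
  done
}
```

The output of list_synced_mirrors would become the mirror list for that one variation, which is how a single stale mirror drops out without affecting the others.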

Check out the Munin performance graphs for rtems.info to see how long a recent tool mirroring took.

This is what goes on behind the scenes to make the tool binaries available.  There is a different process for building the various tool chains, running the tests on them and reporting them to both the RTEMS Tool Test Results and GCC Test Results mailing lists.

If you would be interested in DVD distributions of the pre-built tools, let me know.

--joel

[1] rtems.eu is sponsored by Embedded Brains. I don't know its speed.

[2] rtems.info is my personal server and is sponsored by love and donations. It is an 8/1 Mbps connection which could be upgraded.

Tuesday, April 19, 2011

Merging Multiple PDF Files

If you have taken one of the RTEMS Classes from me, you will remember that the material for the Open Class comprises over 1000 PowerPoint slides. [1]  These slides are broken down into sections and within each section, there is a unit of 20-100 slides.  Each unit is an individual file.  Getting from 50+ PowerPoint files to printed material is a tedious and error prone process by hand.  The class and this process have evolved over the past ten years.  In this post, I will provide some insight into how this is done.

The first piece of magic is an MS-Office macro written by someone here at OAR.  It reads in a list of files from a text file.  The files are in the order they are to be printed.  This macro automates either generating PDFs or directly printing the files in the various handout formats (1 per page, 3 per page, 6 per page, etc.).  The PDFs are generated using PDFCreator which makes it possible to specify a unique file name for each PDF file.  The PDF files are prepended with a number so they sort and print in the correct order when wild-carded.  This produces files like this:

001-OpenClass.pdf
002-IntroToRTEMS.pdf
003-ProfilesAndRTEMS.pdf

...
Once the PDF files are generated, they can be printed easily.  However, I sometimes teach the class in Munich and have to send the PDFs to the nice folks at embedded brains GmbH to print there.  For the first few classes there, I sent them a large number of PDFs.  When someone dropped the master copy, we learned it didn't have page numbers.  This taught us to add page numbers. :-D

But this still leaves us with a large number of PDFs.  The solution to this was a custom  shell script that merges them into proper double-sided "units".  Each unit is then a single PDF file which goes between divider tabs in a binder.  Now there are seven PDF files for the Open Class and each page is numbered.  Much safer and easier.

The script to merge the PDF files was developed and executes on GNU/Linux (no surprise, right?).  The key to this program is this shell function:

merge_them()
{
  # First argument is the output file; the rest are the PDFs to merge, in order.
  outf=$1
  shift
  gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOUTPUTFILE="${outf}" "$@"
}
This function takes the name of the output file as the first argument and the set of PDF files to merge as the rest of the arguments.  When invoked, the command looks something like this in my shell script:

merge_them ${mergedir}/01-Intro.pdf 00[1-5]*.pdf
That takes the first five "section" PDF files and merges them to produce the PDF file named 01-Intro.pdf for the Introduction to RTEMS "unit".  This file is placed in the output directory ${mergedir}.  This is repeated for each of the units in the class.

But remember -- I want to produce a double-sided master copy.  Sometimes, the merged PDF files for a unit will have an odd number of pages.  The script has another section to detect merged PDFs with an odd number of pages and add a page that says "Intentionally Blank". [2]  The following fragment of the shell script determines how many pages are in the PDF file. If the number of pages is odd, it then adds the Intentionally Blank PDF file.

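# Pull the page count from the "Pages:" line of the pdfinfo output.
pages=$(pdfinfo "$1" | awk '/^Pages:/ {print $2}')
remainder=$((pages % 2))

# An odd page count means the unit needs a trailing blank page.
if [ "${remainder}" -eq 1 ] ; then
   mv "$1" XXX.pdf
   merge_them "$1" XXX.pdf "${BLANKPDF}"
   rm -f XXX.pdf
fi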
And that's it.  It only takes about a minute to run and produces double-sided files that are very easy to send to a printer.  We have a nice duplex printer and by using paper that is already 3-hole punched, constructing the material for the RTEMS classes is much simpler than it was 10 years ago.

--joel

[1] OpenOffice did not exist when the slides were created.  I have tried to use OpenOffice with them, but it butchers the slides and destroys them.  If this is ever resolved, I will happily use OpenOffice for the class.

[2] The "Intentionally Blank" page was generated in OpenOffice. :D

Thursday, April 7, 2011

A Close Look at a Funny Spam

I get a lot of spam.  And when I say a lot, I really mean it.  Most of it is caught by the OAR spam filter but I have to have them keep my settings down a bit to ensure I always get random sales inquiries.  I have had the same email address for over 15 years and am very open about it.  I post to free software development mailing lists and sometimes those get archived with email addresses.  When people get viruses, they have my email from those lists or personal correspondence.  So I have received dating and penis enlargement spam that is supposedly from people who would die if they knew.  I generally just delete it quickly but sometimes read it.

This morning, I received this gem.  I changed the return address and the phone numbers.
Received: from [32.11.46.109] (helo=tdyoireznpl.tmxmykph.ua)
Subject: Free heroin shipping!

FREE HEROIN SHIPPING!

1. Heroin, in liquid and crystal form.
2. Rocket fuel and Tomohawk rockets (serious enquiries only).
4. New shipment of cocaine has arrived, buy 9 grams and get 10th for free.

Everebody welcome, but not US citizens, sorry.

ATTENTION. Clearance offer. Buy 30 grams of heroin, get 5 free.

Please contact: SPAMMER_EMAIL@gmail.com

PHONE 0093(0)1234567
FAX 0093(0)1234567

Afghanistan
There are so many things to notice.  The incorrect spellings and numbering (1, 2, 4) are how they arrived.  Then there is the interesting mix of drug spam and weapon spam.

On the technical side, they claim to be from Afghanistan, and the 0093 country code is indeed Afghanistan's.  But the email address listed is gmail (frowny face to them), and the "helo" exchange with the mail service indicates it came from .ua, which is Ukraine.  The IP address listed is neither of those.  The IP address is allocated to .... are you ready...

OrgName:        AT&T Global Network Services, LLC
OrgId:          ATGS
Address:        3200 Lake Emma Road
City:           Lake Mary
StateProv:      FL
PostalCode:     32746
Country:        US


So we have spam from Florida, claiming to be from Ukraine, and wanting us to contact some drug/weapon runner in Afghanistan.  My guess is that this is likely spam from one of those infamous Russian botnets driven by the Russian mob.

The marketing and salesmanship are brilliant in a warped way. You have to smile at the offer of free shipping and of buying 9 grams and getting the 10th free.  And who could resist the Clearance offer?  All we are missing is an order by midnight tonight and the more you buy, the more you save.

The pinnacle of the marketing here is that this offer is not available for American citizens.  Wow!  What a great use of reverse psychology.  Now all of the Americans reading this spam want the drugs with free shipping and a six-pack of "Tomohawk" missiles.  Hold me back. 

This is nothing I will reply to and it is already deleted, but it was very entertaining to read. This is even funnier than the offer I got recently which was supposed to be from an RTEMS Steering Committee member and slipped through the RTEMS Users mailing list this week.  It wanted us all to look at their photo set at a Russian "sex flirt girls" site.  I really hope they were not pictures of him. LOL

Tuesday, March 8, 2011

Coverage Testing Finds Unexpected Bug

Recent posts have been about debugging experiences. Jennifer and I have been working on a simple implementation of the RTEMS priority based scheduler to illustrate how to write an alternative scheduler. The primary RTEMS Scheduler is deterministic (i.e. constant, predictable) in performance because it uses a FIFO per priority and a two level bit map to assist in look ups. The simple scheduler uses a single list for all tasks and searches down the list to perform inserts. Its worst case is thus O(n) where n is the number of ready tasks. But this post is about debugging, not about the new scheduler. We will discuss that another time.
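To make the contrast concrete, here is a minimal sketch of the two ideas in C. It is not the RTEMS code: the type and function names are hypothetical, and I am assuming 256 priorities arranged as 16 groups of 16, with a lower number meaning a more important task. The list insert has to walk the ready list, while the bit map look up does a bounded amount of work no matter how many tasks are ready.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task node; not the RTEMS thread control block. */
typedef struct demo_task {
  struct demo_task *next;
  uint8_t           priority; /* assumption: lower value = more important */
} demo_task;

/* Simple scheduler idea: one list kept in priority order.
 * The insert walks past every task of equal or greater importance,
 * so its worst case is O(n) in the number of ready tasks. Tasks of
 * equal priority stay in FIFO order. The head is the next to run. */
static void demo_simple_enqueue(demo_task **ready, demo_task *t)
{
  demo_task **link = ready;
  while (*link != NULL && (*link)->priority <= t->priority)
    link = &(*link)->next;
  t->next = *link;
  *link = t;
}

/* Two level bit map idea: "major" has one bit per group that holds a
 * ready task, "minor[g]" has one bit per priority inside group g. */
typedef struct {
  uint16_t major;
  uint16_t minor[16];
} demo_bitmap;

static void demo_bitmap_set(demo_bitmap *bm, uint8_t priority)
{
  bm->major |= (uint16_t)(1u << (priority >> 4));
  bm->minor[priority >> 4] |= (uint16_t)(1u << (priority & 15));
}

static void demo_bitmap_clear(demo_bitmap *bm, uint8_t priority)
{
  bm->minor[priority >> 4] &= (uint16_t)~(1u << (priority & 15));
  if (bm->minor[priority >> 4] == 0)
    bm->major &= (uint16_t)~(1u << (priority >> 4));
}

/* Index of the lowest set bit; at most 16 iterations, so bounded.
 * Must only be called with a non-zero word. */
static int demo_lowest_bit(uint16_t word)
{
  int bit = 0;
  while ((word & 1u) == 0) {
    word >>= 1;
    bit++;
  }
  return bit;
}

/* Most important ready priority, or -1 when nothing is ready. */
static int demo_bitmap_highest(const demo_bitmap *bm)
{
  if (bm->major == 0)
    return -1;
  int group = demo_lowest_bit(bm->major);
  return group * 16 + demo_lowest_bit(bm->minor[group]);
}
```

In the bit map scheme each priority would also own a FIFO of tasks, so once the highest priority is known the next task is simply the head of that FIFO. The sketch leaves the FIFOs out to keep the look up itself visible.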

Jennifer and I want the simple scheduler to have 100% test coverage when it is merged. This has required us to run coverage analysis on RTEMS and check the results. We haven't yet merged this code so it isn't showing up in published runs. She noticed that the method _Scheduler_priority_Enqueue_first was reported as never executed. We both found this incredible since it is called as part of inheriting a priority. We knew this was well tested. She was perplexed and asked me what I thought. It turned out to be a recently introduced typo where _Scheduler_priority_Enqueue was configured as the handler for both the enqueue and enqueue_first entry points in the Scheduler Priority table. This also explained a couple of test failures Jennifer had noticed but hadn't investigated yet. This is the pertinent part of the patch:

- _Scheduler_priority_Enqueue, /* enqueue_first entry point */ \
+ _Scheduler_priority_Enqueue_first, /* enqueue_first entry point */ \

Again, this problem was highlighted by an unexpected drop in the coverage results. RTEMS has very good test coverage and an unexpected change can indicate a problem. The fact that this method was never called was a huge hint to the cause. The answer popped into my head almost instantly. Debugging the test failures would almost certainly have taken much longer.
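For readers who have not seen this kind of bug, here is a small sketch in C of how a copy and paste slip in a table of handlers leaves one function unreachable. Everything in it is made up for illustration (the struct, the handler names, and the call counters standing in for a coverage tool); it is not the RTEMS scheduler table.

```c
#include <stddef.h>

/* Hypothetical operations table with two entry points. */
typedef struct {
  void (*enqueue)(int task_id);
  void (*enqueue_first)(int task_id);
} demo_scheduler_ops;

/* Call counters play the role of the coverage report. */
static unsigned demo_enqueue_calls;
static unsigned demo_enqueue_first_calls;

static void demo_enqueue(int task_id)
{
  (void)task_id;
  demo_enqueue_calls++;
}

static void demo_enqueue_first(int task_id)
{
  (void)task_id;
  demo_enqueue_first_calls++;
}

/* The typo: the same handler is installed in both slots. */
static const demo_scheduler_ops demo_buggy_ops = {
  .enqueue       = demo_enqueue,
  .enqueue_first = demo_enqueue
};

/* The fix: each slot gets its own handler. */
static const demo_scheduler_ops demo_fixed_ops = {
  .enqueue       = demo_enqueue,
  .enqueue_first = demo_enqueue_first
};

/* Drive both entry points once, then report whether every handler
 * was executed at least once (1 = fully covered, 0 = not). */
static int demo_both_handlers_covered(const demo_scheduler_ops *ops)
{
  demo_enqueue_calls = 0;
  demo_enqueue_first_calls = 0;
  ops->enqueue(1);
  ops->enqueue_first(2);
  return demo_enqueue_calls > 0 && demo_enqueue_first_calls > 0;
}
```

With the buggy table, demo_enqueue_first is never executed even though the tests call through the enqueue_first slot, which is the same kind of "never executed" signal the coverage report gave us.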

Debugging an Invalid Memory Access

This time we have a guest at RTEMS Ramblings -- Chris Johns. Chris is a long time RTEMS contributor and member of the Steering Committee. He was the second person outside the core team to submit code to RTEMS. He has fielded some pretty impressive RTEMS applications. This is his account of the two of us trying to track down an invalid memory access on the SPARC/SIS BSP. From here on.. "I" == Chris. :D

I have a SIS BSP failure with RTEMS head (+rbtree patch) and the latest tools on a MacOS host running the "Hello" sample application in the erc32 simulator. Joel is not seeing this so I have to dig into what has been a stable platform for me to develop the RTL code on. This is an account of the debugging session as I work through the problem.

I run the hello test and all I get is:

$ sparc-rtems4.11-run sparc-rtems4.11/c/sis/testsuites/samples/hello/hello.exe

Memory exception at ffffffe0 (illegal address)


Not much help from the tool. The same happens if I run the same application in gdb.

Setting a break point on boot_card and entering 'r' to run the application shows me the target is loading, booting then reaching the boot_card function. Nice because this means the tools are not broken and RTEMS and the BSP are sort of sane. I build the MacOS tools from source using the SpecBuilder tool and track Ralf's changes closely. This means there is always a chance something went wrong in the build. I have never seen this but there can always be a first time. I can step the code up to the _API_Mutex_Initialization call then something goes wrong but I am not sure where. Maybe I should look from the bottom up.

I decide to track down who is printing the error message "Memory exception ...". Running strings and grep on the executable shows it is not in RTEMS. I see it is generated by the simulator. I also see the simulator generates a trap 9 exception. This exception is not handled in the target so the simulator terminates. Fair enough but the stopped simulator destroys the state information of where the fault is which means I cannot see what happens.

A 'sparc-rtems4.11-objdump -D --source hello.exe | less' shows me trap 9 is address 0x2000090 so I set a break point with b *0x2000090 and run again. I break at the trap 9 entry point and stop the simulator exiting:

(gdb) c
Memory exception at ffffffe0 (illegal address)

Breakpoint 4, trap_table () at c/src/lib/libbsp/sparc/erc32/../../sparc/shared/start.S:68
68 BAD_TRAP; ! 09 data access exception


I can now inspect the state of the processor and try and find the source of the problem. First stop is a back trace:

(gdb) bt
#0 trap_table () at c/src/lib/libbsp/sparc/erc32/../../sparc/shared/start.S:68

Nothing helpful here. We know we are at this location and what we want is what happened before this.

I do not know SPARC processors very well and a dump of the registers gives me little information. I do know the stack works down and most processors save the return address on the stack. GDB is nice by providing me with the stack pointer as $sp. I dump the stack:

(gdb) x /32xw $sp
0x23ffcd0: 0x00000034 0x02007378 0x0200737c 0x00000008
0x23ffce0: 0x00002000 0x02014928 0x00000001 0x00000000
0x23ffcf0: 0x02012a30 0x02014924 0x02012a30 0x00006054
0x23ffd00: 0xffffffff 0x00000004 0x023ffd38 0x0200792c
0x23ffd10: 0x00000000 0x00000000 0x00000000 0x00000000
0x23ffd20: 0x00000000 0x00000000 0x00000000 0x00000000
0x23ffd30: 0x00000000 0x00000000 0x02012a30 0x00006089
0x23ffd40: 0x0201a9ac 0x00000008 0x020148b0 0x02014928


I have a separate window open with the sparc-rtems4.11-objdump output in less so I can search around with ease. With a few simple 'less' commands I can find the code at a specific address. The first address is '0x02007378' which must be the last one pushed. In less a '1G' takes me to the start of the dump then entering '/2007378' and enter brings up some code to do with the heap:

02007374 <_Heap_Protection_block_check_default>:
static void _Heap_Protection_block_check_default(
Heap_Control *heap,
Heap_Block *block
)
{
2007374: 9d e3 bf a0 save %sp, -96, %sp
if (
2007378: c2 06 60 04 ld [ %i1 + 4 ], %g1

Dumping '$i1' gives:

(gdb) p /x $i1
$14 = 0xffffffdc


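This is very close to the address in question: the load adds an offset of 4, and 0xffffffdc + 4 is exactly the 0xffffffe0 the simulator reported. Time to run again, this time with a break point on this address and a couple of displays to help me see what is happening: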

(gdb) b *0x2007378
(gdb) display /i $pc
(gdb) display /x $i1
(gdb) r

The break point gets hit a number of times and the arguments all look ok so just continue. On the 6th hit of the breakpoint we get something that does not look ok:

(gdb) c
Breakpoint 8, _Heap_Protection_block_check_default (heap=0x2012a30, block=0xffffffdc) at c/src/../../cpukit/score/src/heap.c:149
149 if (
2: /x $i1 = 0xffffffdc
1: x/i $pc
=> 0x2007378 <_Heap_Protection_block_check_default+4>: ld [ %i1 + 4 ], %g1

A back trace this time is much better:

(gdb) bt
#0 _Heap_Protection_block_check_default (heap=0x2012a30, block=0xffffffdc) at c/src/../../cpukit/score/src/heap.c:149
#1 0x0200c82c in _Heap_Protection_block_check (heap=0x2012a30, alloc_begin_ptr=) at ../../cpukit/../../../sis/lib/include/rtems/score/heap.h:625
#2 _Heap_Free (heap=0x2012a30, alloc_begin_ptr=) at c/src/../../cpukit/score/src/heapfree.c:119
#3 0x02007d08 in _Objects_Extend_information (information=0x2012b18) at c/src/../../cpukit/score/src/objectextendinformation.c:224
#4 0x02006cc8 in _API_Mutex_Initialization (maximum_mutexes=1) at c/src/../../cpukit/score/src/apimutex.c:23
#5 0x0200672c in rtems_initialize_data_structures () at c/src/../../cpukit/sapi/src/exinit.c:125
#6 0x0200137c in boot_card (cmdline=) at c/src/lib/libbsp/sparc/erc32/../../shared/bootcard.c:163
#7 0x02001158 in zerobss () at c/src/lib/libbsp/sparc/erc32/../../sparc/shared/start.S:334
#8 0x02001158 in zerobss () at c/src/lib/libbsp/sparc/erc32/../../sparc/shared/start.S:334
Backtrace stopped: previous frame identical to this frame (corrupt stack?)


I walk up the stack with the 'up' command until I end up in the _Objects_Extend_information call:

(gdb) up
#3 0x02007d08 in _Objects_Extend_information (information=0x2012b18) at c/src/../../cpukit/score/src/objectextendinformation.c:224
224 _Workspace_Free( old_tables );
(gdb) p old_tables
$16 = (void *) 0x0

It would seem _Objects_Extend_information is calling the workspace with a NULL which should be ok, or it used to be ok. I chat with Joel and he informs me the code in the interface to the workspace heap has changed and this has exposed some bugs. A check of the code in the heap free call shows the heap protection check is being called before the block pointer has been validated. This also explains why Joel does not see the problem: I built RTEMS with the debug configure option. There are other cases so I will need to perform a careful check of all the heap code to make sure we are correct.
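The ordering problem can be sketched in a few lines of C. This is only an illustration under assumptions of my own: the heap structure, the 8 byte header size, and every demo_ name are hypothetical and are not the RTEMS heap code. The point is that a block address derived from a bad pointer (here NULL minus a header offset, which wraps to an address near the top of memory) has to be range checked before any check that reads through it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HEADER_SIZE 8u /* hypothetical block header size */

/* Hypothetical heap: one address range [begin, end). */
typedef struct {
  uintptr_t begin;
  uintptr_t end;
} demo_heap;

/* Counts how often the protection check ran, for demonstration. */
static unsigned demo_protection_checks;

static bool demo_block_in_heap(const demo_heap *heap, uintptr_t block)
{
  return block >= heap->begin && block < heap->end;
}

/* Stand-in for a check that would read the block header. Calling it
 * with an address outside the heap is what would fault on a target. */
static void demo_protection_check(const demo_heap *heap, uintptr_t block)
{
  (void)heap;
  (void)block;
  demo_protection_checks++;
}

/* Returns false when the pointer does not belong to the heap. */
static bool demo_heap_free(demo_heap *heap, void *alloc_begin)
{
  /* Unsigned arithmetic, so a NULL input wraps instead of trapping. */
  uintptr_t block = (uintptr_t)alloc_begin - DEMO_HEADER_SIZE;

  /* Validate first ... */
  if (!demo_block_in_heap(heap, block))
    return false;

  /* ... and only then touch the block. */
  demo_protection_check(heap, block);
  return true;
}
```

Swapping the two steps in demo_heap_free reproduces the shape of the failure: the check would run on the wrapped address before anything rejected it.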

It looks like I am not the only one who has the problem. Peter Dufault has just posted to the RTEMS user list:

http://www.rtems.org/ml/rtems-users/2011/february/msg00142.html

The PR Peter has kindly raised is:

https://www.rtems.org/bugzilla/show_bug.cgi?id=1746

I was talking with Joel about the changes when I noticed the heap extend now allows non-contiguous memory regions. I did not know this was allowed and I had been assuming the memory had to be contiguous because of the code in _Heap_Is_block_in_heap. A check of rtems_region_extend shows it uses heap extend and its documentation states memory must be contiguous. I have raised a PR to handle this:

https://www.rtems.org/bugzilla/show_bug.cgi?id=1747
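To show why the contiguous assumption matters, here is a hedged sketch in C. It does not reproduce _Heap_Is_block_in_heap; the demo_area type and both functions are hypothetical. The first check only compares against the lowest and highest address of the heap, which is fine for one contiguous area but accepts addresses in the hole between two separate areas. The second check walks the areas.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory area covering [begin, end). */
typedef struct {
  uintptr_t begin;
  uintptr_t end;
} demo_area;

/* Bounds-style check: correct only if the areas form one contiguous
 * range, because it ignores any gap between them. */
static bool demo_in_bounds(const demo_area *areas, size_t count,
                           uintptr_t addr)
{
  if (count == 0)
    return false;

  uintptr_t lowest = areas[0].begin;
  uintptr_t highest = areas[0].end;
  for (size_t i = 1; i < count; i++) {
    if (areas[i].begin < lowest)
      lowest = areas[i].begin;
    if (areas[i].end > highest)
      highest = areas[i].end;
  }
  return addr >= lowest && addr < highest;
}

/* Per-area check: stays correct when the areas are not contiguous. */
static bool demo_in_any_area(const demo_area *areas, size_t count,
                             uintptr_t addr)
{
  for (size_t i = 0; i < count; i++) {
    if (addr >= areas[i].begin && addr < areas[i].end)
      return true;
  }
  return false;
}
```

With areas at 0x1000..0x2000 and 0x8000..0x9000, the address 0x5000 passes the bounds-style check but is rejected by the per-area check.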

Back to Joel.. Chris' discussion should provide some insight into how a free software project and community work. I made a modification to move some scattered NULL checks before calls to _Workspace_Free into that routine. Sebastian Huber noted that since NULL pointers shouldn't be processed by _Heap_Free, the check was technically redundant, so he removed it. This exposed a latent bug in _Heap_Free when passed an invalid address and debug checks were enabled. At the same time Chris and I were tracking this down, a user tripped the same bug and filed a PR. In reviewing the code, Chris and I found other cases which could cause the same fault and a disconnect between extending the heap and checking whether a block was in the heap. This was a side-effect of recent enhancements and had never been caught by a user. So we see a community coding together, reviewing each other's code, and working together to resolve an issue.

It is important to note that this bug was only present in the RTEMS Development Head. It cannot occur in released versions.