Sunday, September 06, 2009

Hard To Find Bugs - Part 1

The interesting bugs tend to stay in your memory for a very long time. My first juicy one dates back to 1993, at my first "real" job after college, working for a telecom company specializing in military communications. Since I was the new grunt on the team, my first assignment was the bug that no one wanted to deal with. The defect was low priority from the customer's viewpoint: occasionally annoying, never a show-stopper, but they voiced mild irritation every few months.

The system in question was ancient, even by 1993 standards. It was a mil-spec voice and data communications console. The guts of the beast consisted of off-the-shelf PDP-11 hardware and various custom communication boards, complete with the requisite gobs of firmware. Yes, the core really was a PDP-11 removed from its traditional chassis. The console had many features, one of which was a basic secure phone. The bug I was investigating was not in the high-end features, but in the simple phone keypad. For one out of every 100 or so key presses, the DTMF tone for a key would stick until the user hung up and dialed again.

After some code spelunking, I discovered that the keypad firmware sent commands via a DUART (dual UART) to a tone generator. The generator accepted simple commands to start and stop tones (big surprise). Solving this was going to be easy, I thought. The problem was likely a simple logic issue where the stop tone command was not always sent when it should be. After reviewing the circuit schematics for the board, I was pleased to discover that the designers had configured one channel of the DUART as a debug port, with a small header for the crusty yet trusty 2-3-7 (TX, RX, and GND of RS-232). This configuration allowed even the village idiot to make a cable that could easily be connected to the serial port of any PC, workstation, or VTXXX terminal.

Believe it or not, there was a culture back then in which software folks had to battle hardware designers for small concessions such as an RS-232 port for firmware debugging. The argument was always something like "That will add 50 cents to the cost of every board! Why can't you software people get it together before we go into production?" When confronted with this in meetings, I always wanted to pipe up with "Well, let's talk about all the white wires that needed to be applied to boards X, Y, and Z after they were put into production." I never actually stated my true thoughts on the matter, since I accepted (capitulated) that it would be in vain. I was a subordinate software grunt in a world of hardware hubris.

Before I forget, I should mention one particularly horrible hardware blunder. Several hundred boards made it to production with some data lines reversed. I don't recall exactly what communications circuitry was involved (maybe an RS-422 link), but the end result was that we had to work around the problem in software by mirroring every byte before writing to the transmitting device, and mirroring again when reading. A simple lookup table did the trick efficiently, since we were only dealing with 8 bits at a time, but from an engineering and common-sense perspective it sure did not feel right. Economically, however, it was the right thing to do. There were many other, less severe hardware blunders my team had to work around. There was certainly an attitude back then that hardware was never at fault. Perhaps this culture still exists; I have not worked in the embedded world since 1995, so I am not a contemporary authority on the subject. Further discussion of this topic is warranted.
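Before returning to the bug, here is a minimal sketch of that byte-mirroring workaround. This is my reconstruction, not the original code, and the names are mine:

/* Build a 256-entry table once, then translate every byte that
   crosses the miswired link. */
static unsigned char mirror_table[256];

/* Reverse the bit order of one byte, e.g. 0x01 becomes 0x80. */
static unsigned char reverse_bits(unsigned char b)
{
    unsigned char r = 0;
    int i;
    for (i = 0; i < 8; i++) {
        r = (unsigned char)((r << 1) | (b & 1));
        b >>= 1;
    }
    return r;
}

void init_mirror_table(void)
{
    int i;
    for (i = 0; i < 256; i++)
        mirror_table[i] = reverse_bits((unsigned char)i);
}

/* Applied to every byte written to, and read from, the reversed link. */
unsigned char mirror(unsigned char b)
{
    return mirror_table[b];
}

Now, back to the bug.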

My first course of action was to create a debug build of the firmware that logged critical events to an RS-232 port. At the time, doing this seemed perfectly natural; in hindsight, it seems primitive. The board I was debugging had no file system, and there was no infrastructure to log via a network or hardware bus to some central data store. The RS-232 port was my only logging mechanism. I needed to verify whether the firmware was sending the stop tone command when the annoying problem manifested itself. When I went to insert the necessary logging, I quickly realized that I was the first to need the debug port: the released firmware did not configure it at all. I smiled when this became apparent. Back then I loved creating C structs and unions to program hardware registers; the inane intricacy had some primitive appeal. (This probably explains why I suffered from an infatuation with C++ many years later.) After writing a bit of C code, a fresh compile, and an EPROM burn, I had the ability to log debug messages via the serial port.
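That register-poking style looked roughly like the following sketch. The register layout, base address, and initialization values here are hypothetical stand-ins; the real map came from the DUART data sheet and the board schematics.

/* Hypothetical DUART channel register map (real layouts vary). */
typedef struct {
    volatile unsigned char mode;    /* character format and parity   */
    volatile unsigned char status;  /* read-only status bits         */
    volatile unsigned char command; /* enable/disable TX and RX      */
    volatile unsigned char data;    /* transmit/receive holding regs */
} duart_channel;

#define DEBUG_PORT ((duart_channel *)0x00FF0000) /* made-up address */

void debug_port_init(void)
{
    /* Many DUARTs auto-advance an internal pointer from MR1 to MR2,
       hence two consecutive writes to the same mode address. */
    DEBUG_PORT->mode    = 0x13; /* e.g. 8 data bits, no parity */
    DEBUG_PORT->mode    = 0x07; /* e.g. 1 stop bit             */
    DEBUG_PORT->command = 0x05; /* enable transmitter/receiver */
}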

I did not have any reliable mechanism to reproduce the error; I had to bang on the keypad many times until a tone would stick, usually after 80 or so key presses. In retrospect, I probably could have saved some time by adding debug code to the firmware to simulate keypad events. Hindsight is always 20/20. What I did do, however, was log every command written to the tone generator. I expected the logging to show that whenever a tone failed to terminate, the stop tone command had not been sent, due to a race condition or logic bug, which I would then track down. To my surprise, the VT220 connected to my freshly minted debug port showed that the stop tone command was indeed being written to the DUART transmit register when the bug manifested itself. Since I was not well versed in all the nuances of the DUART, my next assumption was that this was likely some type of hardware problem. I reserved an expensive logic probe (I think it was a Tektronix) from the equipment room and set it up to analyze the data bus of the DUART, expecting to observe that the data lines did not carry the stop tone command bits when the problem occurred. The trace showed that the stop command was indeed being written to the data bus correctly.

It had become obvious that this might take a little longer to figure out. By this time, I had three or four days invested in the problem. I now knew that the software was sending the stop tone command correctly and the data lines were propagating it to the DUART, and yet the tone was not ceasing. My next course of action was to use the probe to look at the bytes being sent to the tone generator. Perhaps the generator was the problem? Monitoring the transmitted bytes via the TXD pin of the DUART, I finally made progress: the stop command was never making it to the tone generator. I was happy, but I still had no solution. What would cause the DUART not to transmit bytes written to its transmit register? I had demonstrated, via the software logging and the logic probe, that the appropriate command was being written to the DUART. Could the DUART be defective? Highly unlikely; millions of them were in production. There must be some internal state of the DUART in which a byte written to the transmit register was not guaranteed to actually be transmitted. I began to pore over the DUART data sheet, and it did not take long to find a likely candidate:

Bit 2 of the Status Register
TxRDY - When set, it indicates that the transmit-holding register (the one waiting to be transmitted) is ready to be loaded with a new character.


Perhaps the firmware was not checking this bit before writing to the transmit register? I took another look at the transmitting code, and sure enough, it did not check TxRDY at all! A quick mod, and the problem was finally solved!
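The fix amounted to polling TxRDY before loading the transmit-holding register. A minimal sketch, reusing the hypothetical duart_channel layout from earlier (the bit value comes from the data sheet excerpt above):

#define TXRDY 0x04  /* bit 2 of the DUART status register */

void duart_send(duart_channel *ch, unsigned char byte)
{
    /* The old code wrote unconditionally; if the holding register
       was still occupied, the new byte was silently lost. */
    while ((ch->status & TXRDY) == 0)
        ;  /* spin until the holding register can accept a character */
    ch->data = byte;
}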

Someone more experienced with a DUART would probably have thought immediately of checking for proper usage of the TxRDY bit. For me, however, this was a great learning experience. I was new at the company, and the bug accelerated my knowledge of my employer's processes, such as source control usage and ROM image archiving, and taught me how to use a logic probe to debug software. It was also very educational to read circuit schematics alongside code. I did not work in the embedded world for long, but the 'cool' factor of using a hardware probe to debug software will always be a fond memory.

Thursday, September 03, 2009

ORM Infatuation

Why is it that so many developers worship the ORM paradigm? Here are some educated guesses:

SQL...really? You want me to write SQL code to manipulate data? How can anything from the 1970s be relevant today? Soon you'll be saying that COBOL is making a comeback.

SQL...bah! It has no class keyword or curly braces. It's crap I tell you!

There's no fluent API provided by the major database vendors.

Who needs a DSL that is great at managing sets of data in an elegant manner when I can do it with so much more verbosity using languages intended for other, more generic purposes?

I want my object graph and I want it now!

Don't make me think about ad-hoc queries. In five years, no one will care about this data.

We are an agile shop; we want the ORM implementation to handle all the database design issues. Manually designing the database decreases the team's velocity. SQL is from the 1970s, after all.

Five years from now, the data collected by this website will not have business value. Why should I care if the schema created by the ORM cannot be easily understood by a developer or business analyst?

The database is a vendor-specific entity, therefore evil, and must be abstracted away.

Everything decent and good in software must be object oriented, or it's crap.

Why can't we simply realize that OO normalizes behavior, while the relational DB normalizes data? Both are very important and deserve the full attention of the developer.

Sunday, February 18, 2007

Using Velocity in .Net via IKVM

The Velocity template engine has been a core library in many Java frameworks over the years. There are a few free template engines written in C#, but they do not seem to have the level of support and development activity that Velocity has. CodeSmith seems to be the leader in the .Net world, but it is not free.

Since Velocity is written in Java, it cannot be used directly by .Net code without some conversion, so I used IKVM to translate the velocity.jar file into a .Net dll. At this point I was not sure the converted Velocity library would work. My concern was that Velocity operates by processing a context containing objects (standard and application-specific) that are used to replace references in a template, and I thought there might be a problem in passing .Net objects to the Java library. It seems that on most occasions when I think IKVM cannot do something, it simply just works. This is one of those situations: I coded up a test based on the Velocity user guide sample (written in VB.Net instead of Java) and it worked.

Great, now for something actually useful. What I wanted to do was generate a data transfer object by obtaining column names and types from an instance of SQL Server (I used SQL Express). I already had a database for a little home banking application I have been working on. The DTO I wanted to generate is a VB class with public properties representing the table columns. The VB.Net code can be downloaded here.
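For reference, the jar-to-dll translation step is a one-liner with IKVM's static compiler (assuming ikvmc is on your path; missing dependency jars produce warnings unless they are converted too):

ikvmc -target:library velocity.jar

This emits a velocity.dll that VB.Net code can reference directly, alongside IKVM's runtime assemblies.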


The Velocity template file (dto.vm) contains:

'
' Generated on $date
'

Class ${tablename}DTO

#foreach( $val in $columns )
    Private _$val.GetColName() As $val.GetColType()
#end

#foreach( $val in $columns )
    Public Property ${val.GetColName()}() As $val.GetColType()
        Get
            Return _$val.GetColName()
        End Get
        Set
            _$val.GetColName() = value
        End Set
    End Property

#end
End Class


The $references are replaced by the corresponding data that was placed into the Velocity context in the VB.Net code. The Velocity Template Language supports a nice set of features. You can read about all the details in the user guide. When the code is executed, the following is written to the console:

'
' Generated on 2/18/2007 9:18:41 AM
'

Class TransactionsDTO

    Private _TransactionID As Int32
    Private _CategoryID As Int32
    Private _Amount As Double
    Private _Description As String
    Private _InsertDate As DateTime
    Private _TransactionDate As String

    Public Property TransactionID() As Int32
        Get
            Return _TransactionID
        End Get
        Set
            _TransactionID = value
        End Set
    End Property

    Public Property CategoryID() As Int32
        Get
            Return _CategoryID
        End Get
        Set
            _CategoryID = value
        End Set
    End Property

    Public Property Amount() As Double
        Get
            Return _Amount
        End Get
        Set
            _Amount = value
        End Set
    End Property

    Public Property Description() As String
        Get
            Return _Description
        End Get
        Set
            _Description = value
        End Set
    End Property

    Public Property InsertDate() As DateTime
        Get
            Return _InsertDate
        End Get
        Set
            _InsertDate = value
        End Set
    End Property

    Public Property TransactionDate() As String
        Get
            Return _TransactionDate
        End Get
        Set
            _TransactionDate = value
        End Set
    End Property

End Class



Velocity's template language allows you to quickly create useful templates. This example generates a simple DTO, but much more complex code can be generated. I should point out that there is a port of Velocity from Java to C# (NVelocity); however, there has not been a release of that library since 2003, and activity on the project seems to have been non-existent for the last few years. The original Java version, in contrast, is constantly being updated, improved, and documented.

Thursday, November 17, 2005

Why does a profiler almost always point out that your intuition is wrong?

Nine times out of ten, profiling reveals your code is not spending its time where you think it is. Why? Usually you don't start profiling until testing or a production environment reveals a performance problem, and most of the time after running a profiler in this situation, I'm left thinking "I never would have guessed that was going on!" Probably because if I had thought of that particular set of circumstances earlier, I would have designed and coded for it.

This phenomenon is part of a larger problem facing the software developer: how do you know your software will function properly when your customer begins to use it? The truthful answer is that you don't, no exceptions. You can derive a level of confidence based on how closely the customer's environment matches your test environment, but reality dictates that when deploying a complex software system to a new customer environment, something is going to break. Maybe the failure will be due to some low-level network parameter, or perhaps the database needs slightly different configuration because the RAID device in use is faster or slower than the reference test system. The list of possibilities is virtually endless. It all comes back to the fact that your testing only accounts for the situations you thought of, and is valid only within the environment you controlled.

Does this mean testing is useless? Of course not. It simply means that you need to be prepared to deal with unexpected failures. This is where debugging skills become critical, and knowing which tools to use can make all the difference. In these circumstances, you sometimes need to let your intuition take a back seat to the hard data that debugging tools can provide.

Saturday, October 08, 2005

JINI...Is it relevant anymore...was it ever?

For most of its life, JINI cost too much and was somehow coupled (marketing-wise) with J2ME, which was also priced into oblivion. Many years ago (1998 or so), I was involved in a project to create networking infrastructure to ease the creation of distributed applications (a lot of wheel reinvention, but at the time we could not afford the accepted enterprise solutions). We also had to deploy on multiple operating systems. Most of my employer's software was written in C[++] back then. CORBA was well known to do what we needed (and more), but ORBs were still too costly, and like many companies then, my employer was terrified of deploying open source. The company was dabbling with Java, and this is when I first came across JINI. It seemed to me then that JINI was little more than a Java-centric version of CORBA's naming and trading services. I realize it probably does more than this, but at the 10,000 ft level, I'd say this is an accurate description. I looked into licensing JINI and was blown away by the cost; there were ongoing royalties that I could never get my employer to agree to. For a long time I dismissed JINI simply on cost. Recently, however (earlier this year), Sun has made JINI available under an Apache license. Perhaps it's time to give JINI a second look? It appears, however, that all that is free is a starter kit, which implies to me that it is missing something you will probably need in a production environment.
Will Sun ever learn? Sun has been doing a similar thing with JMX. Sun offers a product named Java Dynamic Management Kit (JDMK), which is also priced outside the reach of most software shops. I think it is a shame that the JMX specification did not mandate SNMP support. Yes, SNMP is an old, crusty protocol, but if you are doing any sort of real-world enterprise OA&M, your code will have to acquire a large amount of its data via SNMP. I should mention that TCP and UDP are also old, crusty protocols.

Friday, September 02, 2005

The Problems with C++

Don't get me wrong, I think C++ was great back in the day. I simply don't think it's the most productive language to program in these days. The following summarizes my thoughts on what is wrong with C++.

Threads and Network I/O
For many years now, whether you are doing server-side, client-side, or embedded development, you have needed two fundamental support elements from the programming environment: threads and network I/O. C++ has had neither. For reasons unknown to me, the C++ standards committee thought these features should be left to non-standard libraries. Perhaps this was because historically some operating systems did not support these critical features, but that has not been true for a very long time. C++ not directly supporting threading or networking has been a huge /dev/null for productivity, since you must either roll your own library or find some non-standard library that may not be available for all the platforms you need to support.
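As a concrete illustration, here is a minimal sketch (the names and structure are mine, not from any real library) of the per-platform shim every C++98 shop ended up writing just to start a thread:

#if defined(_WIN32)
#include <windows.h>
#else
#include <pthread.h>
#endif

typedef void *(*thread_fn)(void *);

#if defined(_WIN32)
namespace {
struct Call { thread_fn fn; void *arg; };

DWORD WINAPI adapter(LPVOID p)  // bridge to Win32's thread signature
{
    Call *c = static_cast<Call *>(p);
    thread_fn fn = c->fn;
    void *arg = c->arg;
    delete c;
    fn(arg);
    return 0;
}
}
#endif

// Fire-and-forget thread start; a real library also needed join,
// priorities, thread-local storage, cancellation, and so on.
void start_thread(thread_fn fn, void *arg)
{
#if defined(_WIN32)
    Call *c = new Call;
    c->fn = fn;
    c->arg = arg;
    DWORD id;
    CloseHandle(CreateThread(0, 0, adapter, c, 0, &id));
#else
    pthread_t t;
    pthread_create(&t, 0, fn, arg);
    pthread_detach(t);
#endif
}

Multiply this by mutexes, condition variables, and sockets, and the productivity cost becomes clear.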

Portability
Historically speaking, C is portable (if you know what you are doing); C++ is not. Keep in mind I said historically. C++ portability was horrible until a few years ago. Yes, I said a few, as in three (it's 2005 at the time of this writing). I feel that this is not the fault of C++ as a language: compiler support of the draft standard varied wildly, and things only got better when compiler vendors finally caught up with the standard ratified in 1998. Even the beloved gcc did not fully support C++ until relatively recently; compared to other compilers, gcc was way behind in C++ support for quite some time. The current incarnations of the GNU C/C++ runtime libraries still treat multithreaded C++ as a second-class citizen. I have been bald for many years; perhaps some of this is due to spending long hours trying to work around thread termination bugs in C++ runtime libraries. I don't mean to pick on GNU code: Sun's Forte/Workshop compiler was complete garbage for anything other than trivial C++ until around 2002. Microsoft's C++ compiler did better in the area of standards compliance, but it still had some severe bugs, in particular some show-stoppers in its STL implementation that you would typically not see unless you were running under load on a multiprocessor system. These severe problems were corrected only two years ago! Microsoft never seemed to care; their internal C++ code was using ATL. STL implied portability...perish the thought.

STL is Great, but...
It's based on C++ templates, which is both its greatest strength and its greatest weakness. STL was a huge productivity boost compared with writing code in straight C. However, you could never safely expose STL types in public interfaces, simply because they are templates, instantiated at compile time: any change to a template implementation requires a complete rebuild of all clients using the interface, and all clients must be built with the same version of STL. In distributed systems and integrations with third parties, this is highly undesirable and sometimes not feasible. You can work around the problem by keeping templates out of your interfaces and employing the bridge pattern, but this is a lot of tedious work. Templates are very powerful and useful, but they must be used with care when designing library interfaces. Had the export keyword ever been implemented by compiler vendors, much of this situation might have been resolved; it was actually part of the 1998 standard, but almost no vendor implemented it.
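A minimal sketch of that workaround, using made-up names: the public interface stays template-free, and the std::vector lives behind an opaque implementation pointer.

// ---- public header: no STL, no templates ----
class EmployeeList {
public:
    EmployeeList();
    ~EmployeeList();
    void add(const char *name);
    const char *at(int i) const;
    int size() const;
private:
    struct Impl;     // defined in the .cpp file, free to use STL
    Impl *pimpl_;
    EmployeeList(const EmployeeList &);             // non-copyable
    EmployeeList &operator=(const EmployeeList &);  // non-assignable
};

// ---- implementation file: the STL stays private ----
#include <string>
#include <vector>

struct EmployeeList::Impl {
    std::vector<std::string> names;
};

EmployeeList::EmployeeList() : pimpl_(new Impl) {}
EmployeeList::~EmployeeList() { delete pimpl_; }
void EmployeeList::add(const char *name) { pimpl_->names.push_back(name); }
const char *EmployeeList::at(int i) const { return pimpl_->names[i].c_str(); }
int EmployeeList::size() const { return (int)pimpl_->names.size(); }

Clients can be built against any STL version, or none at all, because the header mentions nothing templated; only the implementation file cares which STL it was compiled with.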

Name Mangling was Never Standardized
Every compiler vendor creates a proprietary encoding of method names to support overloading and namespaces. Because of this, a shared .so or .dll built with one C++ compiler cannot be consumed by code built with another. Couple this with the fact that STL cannot be safely exposed in your library interfaces, and you have serious problems whenever you need to interact with third-party code, unless, of course, everyone is using the same compiler and STL implementation.
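The usual escape hatch, sketched below with made-up names, is to expose a flat C API at library boundaries; C names are not mangled, so the resulting .so or .dll can be consumed by any compiler (and by plain C):

/* Public header: a flat, unmangled C interface over C++ internals. */
#ifdef __cplusplus
extern "C" {
#endif

typedef struct engine engine;              /* opaque handle */

engine *engine_create(void);
int     engine_run(engine *e, const char *job);
void    engine_destroy(engine *e);

#ifdef __cplusplus
}
#endif

Inside the library, these functions simply forward to real C++ objects; only the flat C layer is visible across the compiler boundary.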

Standard Runtime Never Provided a General Purpose Smart Pointer
Yes, there is std::auto_ptr, but it is generally useful only for exception safety; because its copy semantics transfer ownership, you can't use it in STL containers. I assume C++ never defined a general-purpose smart pointer in the standard library because the committee did not want to deal with threading issues. Writing a good, robust, inheritance-friendly, reentrant smart pointer implementation is not trivial, and it should have been standardized years ago. Disciplined use of smart pointers in C++ code can eliminate virtually all stability problems due to memory leaks and heap corruption, and smart pointers are also critical for effective use of STL containers in multithreaded apps. In the absence of garbage collection, the smart pointer is essential.
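To give a flavor of what was missing, here is a toy reference-counted pointer of the kind every C++98 codebase grew by hand (boost::shared_ptr being the best-known public version). This sketch is deliberately not thread-safe; making the count atomic and the semantics inheritance-friendly is exactly the nontrivial part:

// A toy reference-counted pointer; NOT thread-safe. A production
// version needs atomic count updates.
template <typename T>
class counted_ptr {
public:
    explicit counted_ptr(T *p = 0) : p_(p), count_(new long(1)) {}

    counted_ptr(const counted_ptr &o) : p_(o.p_), count_(o.count_)
    {
        ++*count_;
    }

    counted_ptr &operator=(const counted_ptr &o)
    {
        if (this != &o) {
            release();
            p_ = o.p_;
            count_ = o.count_;
            ++*count_;
        }
        return *this;
    }

    ~counted_ptr() { release(); }

    T &operator*() const { return *p_; }
    T *operator->() const { return p_; }

private:
    void release()
    {
        if (--*count_ == 0) {
            delete p_;
            delete count_;
        }
    }

    T *p_;
    long *count_;
};

Unlike std::auto_ptr, copies share ownership rather than stealing it, so counted_ptr instances can be stored safely in STL containers.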

Future Directions
I read recently on Slashdot that a new C++ spec is in the works. Although I have not seen the draft, the areas Stroustrup commented on seemed to address some of the items I have mentioned here. It's probably too little, too late, however. Many areas of the software industry have long since migrated to platforms such as Java and .Net, and to various other programming languages. Perhaps platform is the key term here: I don't think the designers and maintainers of the spec ever wanted C++ to become a platform in the sense that Java and .Net are. I should probably read Stroustrup's book on the design of C++; it would give me a historical perspective on why C++ is what it is.

Monday, August 22, 2005

ANT and XML

I have always thought that XML did not fit very well as a vehicle for creating build scripts; it's like trying to fit a square peg into a round hole. This is a prime example of XML as a golden hammer. The originator of ANT, James Duncan Davidson, has admitted XML was a poor choice. You can read about it here.