Leaving untested code in a system is stupid, shortsighted, and irresponsible. (B. Beizer)


Testing is one of those activities in the software industry that everyone involved recognizes as mandatory in theory and yet so often neglects in practice. Many organizations continue to develop software without even a minimal testing plan, sometimes without a clear, detailed project specification. The classical excuse for this “lack of materials” is that there are not enough resources (money, time, people, skills) to address them in the first place. As far as testing is concerned, most development teams simply have no idea of how to test a system in a professional way. A minimum requirement for testing is the ability to step through the code base in a debugger at least once for each instruction, or to write test code according to a Test-Driven Development methodology.

The reality is that without a significant background in software testing techniques, even using an automation tool results in a sterile activity. No tool is able to tell you what to test (first) or how to design test cases. Unfortunately, as happens with other stages of the project lifecycle that are driven mainly (and, in my opinion, erroneously) only by CASE tools, the idea of performing “appropriate” testing coincides with the ability to run “some” tests in NUnit until a green light reassures people that all is fine. Of course, this is not enough. Too many fundamental questions about testing remain unanswered with this approach. One of these questions is: “how many test cases do I need to properly test a particular functionality?” Proper white-box testing requires at least some minimum coverage criterion. Any attempt to introduce a serious testing activity in a software project without a coverage criterion is a clear sign of a hobbyist approach. Of course, a bunch of tests can provide value by itself, but this level of software quality assurance is simply insufficient and inadequate for industrial systems.

What I mean by the term coverage is a set of tests (with respect to a module/subsystem/system) that has the potential of executing every instruction and taking all branches in all directions (paths) in the code. According to Boris Beizer, complete coverage is the first, minimum, mandatory requirement for serious testing [1]. Moreover, he also stresses the need to perform this level of testing without taking refuge in pretexts such as the supposedly uneconomical cost of testing:

Any testing based on less than complete coverage requires decisions regarding what code should be left untested (which is by itself a cost). Such decisions are inevitably biased, rarely rational, and always grievous. Realistically, the practice of putting untested code into systems is common, and so are system failures. The excuse I’ve most often heard for putting in untested code is that there wasn’t enough time left or enough money left to do the testing. If there wasn’t enough time and money to test the routine, then there wasn’t enough time and money to create it in the first place. For what you think is code, before it has been properly tested, is not code, but the mere promise of code. […] It is better to leave out untested code altogether than to put it in. Code that does not exist cannot corrupt good code. An untested function may or may not work itself (probably not), but it can make other functions fail that would otherwise work. […] Leaving untested code in a system is stupid, shortsighted, and irresponsible. (The italics are mine.)

Many years have passed since Beizer wrote his book, but the situation has not changed much. I believe his message is still very relevant today. The problem is that many software developers continue to think that developing software systems is a mere act of writing code, compiling it, debugging it, and deploying it. This development cycle hides testing. This was probably of little concern when software systems were written by a single person and consisted of no more than a few hundred lines of code. It is a huge problem now, because more sophisticated techniques are required for testing a complex system developed by many people, often distributed across different organizations or teams. The programming culture alone is not sufficient. And this is where the role of models comes to light.

Given the definition of coverage provided by Beizer, what is a simple, sound metric for a (model-driven) testing activity? I believe that McCabe’s complexity metric can be useful [2]. Whereas the CCD family of metrics [3] helps to capture an overall indicator of the testing/maintenance burden, McCabe’s metric provides a first hypothesis on the number of test cases needed to test a module/subsystem/system while fulfilling a coverage criterion. Indeed, McCabe’s metric is closely related to the number of circuits required to cover a graph. Since any flowchart-like representation of the control-flow structure is essentially a graph, it is straightforward to conclude that this metric can provide an indicator of the number of test cases needed to test, for example, a piece of model dynamics expressed by a UML activity diagram. There are three ways to calculate the metric. The first one is a simple formula:

M = L - N + 2P,

where L = the number of links in the graph, N = the number of nodes in the graph, and P = the number of disconnected parts of the graph (e.g., a calling program and a subroutine).
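The formula can be computed mechanically once the graph is written down as a list of links. Below is a minimal sketch in Python (the helper and its names are my own, not part of any testing tool) that counts links and nodes and returns M for a given number of disconnected parts.

```python
# Minimal sketch: cyclomatic complexity M = L - N + 2P for a control-flow graph
# given as a list of (source, target) links. Helper names are illustrative only.

def cyclomatic_complexity(links, parts=1):
    """links: (source, target) pairs; parts: number of disconnected subgraphs (P)."""
    links = list(links)
    nodes = {n for link in links for n in link}
    return len(links) - len(nodes) + 2 * parts

# A single if/else "diamond": one decision node, two branches, one junction.
diamond = [(1, 2), (1, 3), (2, 4), (3, 4)]
print(cyclomatic_complexity(diamond))  # 4 - 4 + 2 = 2
```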

The second method is to count the number of binary decisions in a program and add one. A three-way decision must be converted into two binary decisions, and an N-way case statement into N-1 binary decisions. The guard condition of a loop likewise counts as a binary decision.
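As a quick illustration of this counting rule, consider a routine with one loop, one simple if, and one three-way split. The routine below is invented for this example (its logic has nothing to do with the data recorder discussed later); the comments annotate the decision count.

```python
# Invented routine, annotated with the "binary decisions + 1" counting rule.
def classify(readings):
    total = 0
    for r in readings:            # loop guard: 1 binary decision
        if r is None:             # 1 binary decision
            continue
        if r < 0:                 # three-way split (negative / zero / positive):
            total -= 1            # counts as 2 binary decisions
        elif r == 0:
            pass
        else:
            total += 1
    return total

# Binary decisions: 1 (loop) + 1 (None check) + 2 (three-way split) = 4, so M = 4 + 1 = 5.
```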

Finally, the third way, which is associated with the notion of a circuit in a graph, is to count the number of connected regions in the graph’s topology. An interesting property of McCabe’s metric is that the complexity of several graphs (modules/subsystems/systems) considered as a group is equal to the sum of the individual graphs’ complexities. (Of course, we focus here on the structural complexity of the algorithmic/dynamic representation of such modules/subsystems/systems.) Hence, as a simple test plan action, we start by building a representation of the control-flow structure of a module’s code (using either flowcharts or UML activity diagrams). Then we transform this representation into a graph, and finally we calculate McCabe’s metric on it. If the number of test cases does not equal the final M value of the metric, we have not yet satisfied the coverage criterion. In that case there isn’t necessarily an error, but there is a reason for caution, as Beizer points out:

  1. You haven’t calculated the complexity correctly. Did you miss a decision?
  2. The cover is not really complete. There is a link that has not been covered.
  3. The cover is complete, but it can be done with a few more but simpler paths.
  4. It might be possible to simplify the routine.

I like McCabe’s metric because it is almost as simple as counting LOCs (Lines Of Code) but provides a better measure of the structural complexity of a routine (method). It is intuitively better because it takes into account the increase in complexity due to subdividing a routine (method), something that a LOC-based metric does not do. To be precise, as Beizer points out, McCabe’s metric takes subdivision into account but underestimates its real impact, because in such cases the structural complexity grows nonlinearly with the number of parts into which the routine (method) is split. However, it is better than nothing.
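To make the point about subdividing concrete, here is a tiny worked example of my own (the numbers are not from Beizer). A routine with four binary decisions has M = 5; split it into a caller and an extracted subroutine with two decisions each, and the group metric, being the sum of the parts, becomes 3 + 3 = 6. The metric registers the split, but only by one unit per extra part; whether that extra unit adequately reflects the added burden of the new call interface is exactly Beizer’s reservation.

```python
# Worked example (my own numbers): the metric is additive over parts,
# so splitting a routine adds roughly one unit per additional part.
def M_from_decisions(binary_decisions):
    # second calculation method: number of binary decisions plus one
    return binary_decisions + 1

whole_routine = M_from_decisions(4)                       # 5
after_split = M_from_decisions(2) + M_from_decisions(2)   # 3 + 3 = 6
print(whole_routine, after_split)
```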

It is time to show how things work in practice. Consider the situation in which we want to test the main method of a simple data recorder for a hot-water heating system. We first create a flowchart (e.g., an activity diagram in UML) illustrating the control-flow logic of our recorder system. The result should be similar to the one illustrated in Figure 1.

Figure 1. Flowchart for a simple data recorder system

(It is not the best design for such an application, but that is not the point.) Then we translate this flowchart into a graph representing all possible paths. We label each node with a number and each link with a letter. (To make it easier to map between the two diagrams, I have annotated the activity diagram accordingly.) A link in the graph corresponds to either an activity (process) or a connection between nodes (e.g., a loop) in the flowchart, whereas a node represents a decision point, a junction (a point in the flowchart where two or more paths join), or a combination of both. For example, the f link describes the jump backward to process b (Read data from environmental sensors) that can arise at decision point 5 (Any error during validation?) immediately after the execution of process c (Validate data).

Figure 2. Paths for the workflow model of the data recorder system

Applying McCabe’s formula with L = 7, N = 6, and P = 1 (the graph is connected), we have M = L - N + 2P = 7 - 6 + 2 = 3.
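For completeness, the same computation can be done on an explicit encoding of the graph. The link endpoints below are my reading of Figure 2, with the nodes simply numbered 1-6 in path order (the labels in the figure itself may differ), so treat this as a hedged sketch rather than a transcription of the diagram.

```python
# Hedged sketch: the data recorder path graph as I read it from Figure 2.
# Nodes are numbered 1-6 in path order; the figure's own labels may differ.
LINKS = {
    "a": (1, 2),
    "b": (2, 3),  # Read data from environmental sensors
    "c": (3, 4),  # Validate data
    "d": (4, 5),
    "e": (5, 6),
    "f": (4, 2),  # back to "Read data" when validation reports an error
    "g": (5, 3),  # back to "Validate data"
}

L = len(LINKS)
N = len({node for pair in LINKS.values() for node in pair})
P = 1  # the graph is connected
print(L, N, L - N + 2 * P)  # 7 6 3
```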

The minimum number of test cases needed to test the flowchart of our data recorder system is therefore 3. I have made an important hypothesis here: all the paths through the routine are achievable because all the predicates (statements) in the routine are uncorrelated (independent). If we have correlated data or statements, the actual achievable paths are fewer than the number suggested by McCabe’s metric.

Our next problem is to find these test cases. In our fairly simple example, it is quite straightforward to find them by manually inspecting the graph. The simplest path is probably abcde, which covers all the links except f and g. We then need two other paths to complete the coverage criterion: one traversing f (the simplest is abcfbcde), and the other traversing g (again, the simplest is abcdgcde). In conclusion, the set of test cases needed is the following (a small sanity-check sketch follows the list):

  1. abcde
  2. abcfbcde
  3. abcdgcde
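As a quick sanity check, we can verify mechanically that these three paths are valid walks in the graph and that together they cover every link. The sketch below reuses the LINKS encoding from the previous snippet, which is again my own reading of the figure.

```python
# Hedged sketch: verify that the three candidate paths are walks in the graph
# and that together they cover all seven links (the basic coverage criterion).
LINKS = {
    "a": (1, 2), "b": (2, 3), "c": (3, 4), "d": (4, 5),
    "e": (5, 6), "f": (4, 2), "g": (5, 3),
}

def is_walk(path):
    """True if each link in the path starts where the previous one ends."""
    return all(LINKS[x][1] == LINKS[y][0] for x, y in zip(path, path[1:]))

paths = ["abcde", "abcfbcde", "abcdgcde"]
assert all(is_walk(p) for p in paths)
covered = set("".join(paths))
assert covered == set(LINKS)  # every link appears in at least one path
print("all", len(LINKS), "links covered by", len(paths), "paths")
```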

Finding a set of input values that will cause the execution of each desired path (path sensitizing) gives us a reasonable level of confidence that our testing effort was sufficient (according to our basic coverage criterion) to check the recorder functionality illustrated here.
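As a closing illustration, one way to organize path sensitizing is to pair each covering path with a scripted sequence of sensor readings intended to drive the recorder along that path. Everything below is hypothetical: the table, the reading format, and the idea that a None reading fails validation are my assumptions, not part of the recorder described here, and the stimulus for the third path is left open because the decision behind link g is not detailed in the text.

```python
# Hypothetical path-sensitizing table: each covering path is paired with scripted
# sensor readings meant to force that path. Data shapes are assumptions, not the
# recorder's real API.
PATH_STIMULI = {
    "abcde":    [{"temp_c": 42.0}],                    # valid reading, straight path
    "abcfbcde": [{"temp_c": None}, {"temp_c": 42.0}],  # one validation error (link f), then ok
    "abcdgcde": None,  # depends on the decision behind link g, not spelled out in the text
}

for path, readings in PATH_STIMULI.items():
    if readings is None:
        print(f"{path}: stimulus still to be designed")
    else:
        print(f"{path}: feed readings {readings} and check the executed path matches")
```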

Bibliography

[1] B. Beizer, “Software Testing Techniques”, Van Nostrand Reinhold, 1983.
[2] T. J. McCabe, “A Complexity Measure”, IEEE Transactions on Software Engineering, SE-2(4): 308-320, 1976.
[3] J. Lakos, “Large-Scale C++ Software Design”, Addison Wesley, 1993.
