
Systematically testing a real-time operating system

1995


Manthos A. Tsoukarellas, Vasilis C. Gerogiannis, and Kostis D. Economides
Advanced Informatics Ltd.

Testing large and complex software is an inherently difficult process that must be as systematic as possible to provide adequate reliability and quality assurance. This is particularly true for a complex real-time operating system, in which an ad hoc testing approach would certainly fail to affirm the quality and correctness of the requirements specification, design, and implementation. We discuss applying systematic strategies to the testing of the real-time operating system (RTOS) under development in the Esprit III project 8906 OMI/CLEAR.

Software testing is one of the most significant activities in the software life cycle and must affirm the quality of requirements specification, design, and implementation. A company developing software spends about 40 percent of a project's cost on testing activities.¹ In exceptional cases, such as safety-critical real-time software, the testing phase may cost from three to five times as much as any other phase of the software life cycle. Usually, testing begins early in the development process, as test planning and specification overlap, to a certain degree, with requirements specification.

Testing activities demonstrate that software is consistent with its specifications. Therefore, as test results accumulate, evidence indicating the level of software quality and reliability emerges. In particular, if testing often detects important errors, the software's quality and reliability are probably inadequate and further testing is required. On the other hand, if the discovered errors are minor and easily correctable, then either the level of software quality and reliability is acceptable, or the executed tests are inadequate to reveal software errors. Testing can never assure that a program is correct, because undetected errors may exist even after the most extensive testing. Therefore, the common view that a successful test is one that reveals no errors is incorrect, as the following general testing goals indicate:

- Testing is the process of executing a program with the intent of finding errors.
- A good test case has a high probability of finding an undiscovered error.
- A successful test uncovers an undiscovered error.

Aspects of general testing strategies and techniques also apply to real-time software. However, additional requirements and difficulties characterize an effective testing process for real-time software, especially real-time operating systems (RTOSs), for several reasons:

- The software contains several modules and decision statements.
- Many modules use the same resources simultaneously.
- The same sequence of test cases may produce different outputs, depending on when the test is performed.
- System errors may be time dependent, only arising when the system and controlled environment are in a particular state that may be impossible to reproduce.
- Finally, reliability, schedule, and performance requirements are usually more critical than those for non-real-time software.

This article focuses on a well-established methodology to test the functional behavior of an actual RTOS. The system is being developed as part of the Esprit III project 8906 OMI/CLEAR (Open Microprocessor Systems Initiative/Components and Libraries for Embedded Applications in Real-time).⁶ CLEAR RTOS is a scalable executive for embedded applications.
It supports multiple configuration levels (from hardware and basic I/O services to higher level ones) to establish the correct balance between efficiency (performance/size) and required services.

The strategy followed for testing CLEAR RTOS is systematic and includes individual function, module, and bottom-up integration testing, using both black-box and white-box disciplines. We chose this approach because not all modules are available from the beginning of testing activities. A systematic way to determine test cases satisfies the requirements of low redundancy (in test case selection) and high reliability (in test results). In addition, this method incorporates often-used coverage criteria and software complexity metrics. Issues related to RTOS timing behavior (schedule analysis and performance testing) are beyond the scope of this article; Gerogiannis and Tsoukarellas discuss such issues elsewhere.⁷

Software testing techniques

We base the testing specification on methods that guide testers to follow a systematic approach. These methods provide criteria for determining appropriate test cases, to ensure test completeness and guarantee a high probability of discovering software errors. Using these methods, we derive test cases either from the specifications or by code examination. The test methods corresponding to these two approaches are known as black-box and white-box testing (see the Black-box and white-box testing box).

Software testing strategies

A strategy for software testing incorporates a set of activities organized into a well-planned sequence of steps that finally affirms software quality. The initial important decision in the process is determining who will perform testing. Pressman¹ discusses the various inherent problems associated with allowing software developers to test the product. From a psychological point of view, software analysis and design (along with coding) are constructive tasks. When testing commences, however, there is a subtle yet definite attempt to break what the software engineer has built. From the point of view of the builder, testing can be considered destructive, so the builder designs tests that will demonstrate that the program works. The role of an independent test group (ITG) is to remove the inherent problems associated with letting the builder test his own product. However, the developer and the ITG have to work closely throughout a software project to ensure that thorough tests will be conducted.

These problems motivated the strategy adopted in the OMI/CLEAR project, in which two independent groups perform development and testing. The groups cooperate closely to achieve a high-quality final product. In general, the OMI/CLEAR project's testing strategy consists of three phases: individual function, module, and integration testing.

Individual function testing. This strategy focuses on each individual function. Testing a function in an isolated manner means that we do not attribute operations performed by calling other functions to the function under test. In this case, the description of the calling function should identify the called functions. In the beginning, we apply the black-box technique to test each function's interface according to the corresponding input/output specifications. Afterwards, we apply the white-box method to test the paths in the function's source code. During this phase, drivers and/or stubs may have to be constructed for each individual function test. A driver accepts the test case data as input and passes it to the function. A stub, which we place in the code because some functions are not yet available, simulates functions that are immediately subsequent (lower level) in the control flow to the tested function. (A minimal sketch of both appears after the Module testing discussion below.)

Module testing. Next, we combine the already tested functions to compose modules (tightly coupled functions), using either the top-down or the bottom-up method. These two approaches (described in the Integration testing discussion that follows) apply to module testing if we exchange the term function for module, and module synthesis for integration process. That is, the integration of functions into modules is similar in concept to the integration of modules into larger ones to construct the entire program. In module and, subsequently, integration testing, we use the black-box technique, because the main purpose of module and integration testing is to uncover interfacing errors. In addition, the code to be tested tends to be very large as we form and integrate modules, so white-box testing becomes extremely complicated.
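To make the driver and stub roles concrete, here is a minimal sketch in C. Every name in it is invented for illustration (none of these functions belong to CLEAR RTOS): get_free_slot() is a stub standing in for a lower-level function that is not yet available, and main() acts as the driver feeding test case data to the hypothetical function under test, allocate_entry().

    #include <stdio.h>

    /* Stub: stands in for an unavailable lower-level function.
       It returns a fixed, predictable value so the function under
       test can run in isolation. */
    int get_free_slot(void)
    {
        return 5;                    /* canned answer for this test */
    }

    /* Hypothetical function under test: depends on get_free_slot(). */
    int allocate_entry(int size)
    {
        if (size <= 0)
            return -1;               /* reject illegal argument */
        return get_free_slot();      /* delegate to lower level */
    }

    /* Driver: feeds test case data to the function under test and
       compares actual output with expected output. */
    int main(void)
    {
        struct { int input; int expected; } cases[] = {
            { -1, -1 },              /* negative test: illegal size   */
            {  0, -1 },              /* boundary value                */
            {  1,  5 },              /* positive test: stub answers 5 */
        };
        int i, failures = 0;

        for (i = 0; i < 3; i++) {
            int actual = allocate_entry(cases[i].input);
            printf("input %2d: expected %2d, actual %2d -> %s\n",
                   cases[i].input, cases[i].expected, actual,
                   actual == cases[i].expected ? "OK" : "not OK");
            failures += (actual != cases[i].expected);
        }
        return failures;
    }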
The essential difference between testing the individual function and testing the module of a function that calls other functions is that in the first strategy, testing extends only to the calls to other functions; it does not account for the operation of those functions. The second strategy, on the other hand, checks the module as a whole, all the way down to the operations performed by the called functions.

Integration testing. Third, we integrate the already tested modules into larger ones. There are two approaches to merging modules during this phase: nonincremental and incremental. The nonincremental or "big bang" approach tests each module independently and then combines all the modules to test the program as a whole. This approach does not facilitate revisions, since we cannot easily isolate errors. The incremental approach instead tests each module in combination with the set of previously tested ones. This constructs and tests the final program incrementally and systematically, and it emphasizes testing the interfaces among the combined modules. The main incremental-integration approaches are top down and bottom up.
Top down. We integrate modules by moving downward through the control hierarchy, beginning with the main control module (main program). The integration process takes five steps:

1. We use the main control module as a test driver and substitute stubs for all modules directly subordinate to it.
2. Depending on the integration approach selected (that is, depth or breadth first), we replace subordinate stubs one at a time with actual modules.
3. We conduct testing.
4. After completing each set of tests, a real module replaces another stub.
5. Testing continues with regression testing (that is, conducting all or some of the previous tests) to ensure that the testing process has not introduced new errors.

The process continues from step 2 until we have built the entire program structure. This strategy should be used with top-down design. Strict top-down testing can be difficult because it may be virtually impossible to produce a program stub that accurately simulates a complex function.

Bottom up. This method involves testing modules starting at the lower levels of the hierarchy and moving upward. As we integrate modules from the bottom up, the processing required for modules subordinate to a given level is always available, eliminating the need for stubs. On the other hand, we must construct drivers to present lower level modules with appropriate input. Bottom-up integration calls for

1. combining low-level modules into clusters that perform a specific software subfunction;
2. writing a driver to coordinate test case input and output;
3. conducting cluster testing; and
4. removing the drivers and then moving upward in the program structure (by combining clusters).

The advantages of top-down testing constitute the disadvantages of bottom-up testing and vice versa. In general, the lack of stubs in bottom-up testing makes test case design easier. We discuss other aspects of real-time software testing in the Host, target, and behavioral testing box.
Black-box and white-box testing

Black-box (functional) testing does not consider the internal structure and behavior of the program but examines only its input-output behavior. Black-box testing methods are based on the functional requirements specification of the software and determine whether the software behaves as specified. We construct test data from the specifications, and, in general, there are three methods for deriving the appropriate test cases.

The equivalence partitioning method divides the input space of a program into equivalence classes to minimize the number of possible test cases. Test cases are considered adequate even when choosing only one value from each class, since all the values of a class exercise the same functionalities and are considered equivalent. A good test case reveals a class of errors that might otherwise require execution of many cases before observing the general error. Therefore, this method achieves a low level of redundancy in test case selection.

Boundary value analysis complements equivalence partitioning and leads to the selection of test cases that exercise boundary values; that is, values on and near the boundaries of input equivalence classes. In many cases, these values are responsible for erroneous program behavior.

Random testing selects test cases either randomly or by sampling the distribution of all possible input values. It is usually used during the final testing stages.

White-box (structural) testing examines the program structure and derives test cases from the program logic. We observe the procedural detail closely and test logical paths (decisions and loops, for example) throughout the software. Unlike black-box testing, which identifies mostly interface errors, white-box testing produces test cases that guarantee

- traversal of all independent paths within a module at least once;
- exercise of all logical decisions on their true and false sides;
- execution of all loops at their boundaries and within their operational bounds; and
- exercise of internal data structures.

The flow graph is a common graphical representation of the control flow. A flow graph consists of nodes, representing one or more program statements that execute in sequence, and arcs (called edges) that represent the flow of control. These edges are analogous to flowchart arrows. A flow graph depicts all theoretically possible paths in the program, even if some of them cannot be traversed because of the specific combinations of conditions in the program's decision statements. A flow graph consists of a single start node and one or more end nodes; any other node lies on at least one path between the start node and at least one end node. A flow graph differs from a standard low-level flowchart in that it emphasizes decision and branch statements as the graph nodes: the critical points in a program's control logic.

White-box testing determines a finite number of paths (the white-box test cases). Afterwards, the successful execution of these paths according to the appropriate test inputs should cover the flow graph. It is important to determine the degree of graph coverage, which affects the global testing coverage.

The white-box testing methodology consists of four phases. First, we construct the flow graph directly from the source code. Second, we select a finite set of paths of the flow graph according to one or more coverage criteria. The most common ones (from the weakest to the strongest) are:

- Statement coverage requires all statements in the graph to be executed at least once.
- Node coverage requires the test to encounter all decision node entry points.
- Branch coverage requires the test to encounter all exit branches of each decision node. This criterion is considered to provide the lowest acceptable confidence level for the white-box approach. It includes the previous two criteria, since it requires executing every statement and encountering every node by exercising each branch in a program.
- Path coverage requires all possible paths to be executed. This is the strongest but least practical criterion, since the number of combinations of individual paths increases exponentially with the number of decision statements. In addition, loops make the number of possible paths unbounded, so the method considers equivalent all paths that differ only in the number of loop iterations.

Third, we generate test cases. This is the most complicated phase and concerns the determination of the test inputs that cause execution of the previously selected paths. Last, we execute the program using the test cases and compare the actual output to the expected (specified) output.
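As a concrete illustration of equivalence partitioning and boundary value analysis, consider a 16-bit signed argument whose specification declares negative values illegal (this mirrors the count argument tested later in the article). The sketch below is ours; it simply enumerates the four boundary inputs that the two classes yield.

    #include <stdio.h>

    /* Two equivalence classes for a 16-bit signed argument:
       illegal [-32768, -1] and legal [0, 32767]. Boundary value
       analysis selects the values at the edges of each class,
       giving four test inputs instead of 65,536 possible ones. */
    static const int boundary_inputs[4] = { -32768, -1, 0, 32767 };

    int main(void)
    {
        int i;
        for (i = 0; i < 4; i++)
            printf("boundary test input: %d\n", boundary_inputs[i]);
        return 0;
    }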
Host, target, and behavioral testing

Usually, real-time software testing involves host and target computers.¹¹ The latter is the real-time system controlling the activities of an ongoing process, while the former constructs programs for the target and is usually a commercially available computer. Such a computer usually contains a cross compiler and/or cross assembler, a linking loader for the target, and an instruction-level simulator. The characteristic phases of typical real-time software testing are host and target testing.

Host testing's goal is to reveal errors in software modules. Most of the techniques we use for testing on a host computer are the same as for non-real-time applications. The full system is rarely tested on the host, as we can discover only logical, and not timing, errors. An instruction-level simulator may detect some target-dependent errors and errors in support software (for example, in the compiler's target-code generator or assembler) on the host.

In target testing, we conduct individual function testing first, followed by module and integration testing, which is sometimes performed using an environment simulator to drive the target computer. An environment simulator is an automated replication of the external system world and is of most use in testing real-time applications in which a specific environment exists and has been specified.

A commonly used practice in testing a real-time system is to examine the system's reaction to a single event (behavioral testing). After testing a single execution path, we can then introduce multiple events of the same class without introducing events of any other class. The process continues to test a single class at a time, and then progresses to more than one class of events occurring simultaneously, in random order and with random frequency. At this stage, we should introduce new event tests gradually so that we may localize system errors.

CLEAR RTOS description

To describe the basic characteristics of CLEAR RTOS,⁶ we first logically divide the system into two parts:

- The high-level part is independent of the underlying hardware and consists of two layers (system calls, which are accessible by the user, and internal functions). Applications may interact with RTOS through different profiles, according to the specific required services and size limitations. These profiles are the basic system and Posix-compliant profiles.¹² (Posix refers to IEEE Std 1003.1, Portable Operating System Interface for Computer Environments.)
- The common, low-level part consists of hardware-dependent functions on which we can implement the high-level, machine-independent ones as a library. This scheme allows RTOS to run over a variety of processors, as only the common low-level functions have to be ported to each target. Example processors are the ARM6, ARM (Advanced RISC Machines) Limited's general-purpose, 32-bit RISC microprocessor, and SGS-Thomson's ST9, an 8/16-bit microcontroller.

The architecture of the OMI/CLEAR real-time libraries supports two execution modes: user mode, which does not allow access to the hardware, protected memory areas, or registers; and system or supervisor mode, which the system uses to perform all crucial operations. In the double-execution mode, tasks are only allowed to access the system in a controlled way through a system call, which involves a trap, a change in the processor's priority, and a stack switch. This method increases system security but affects execution speed. On the other hand, application programmers may select a single-mode scheme in which all parts of the application (even system-independent ones) execute in system mode.
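The cost difference between the two modes can be pictured with a schematic sketch. Everything below is our invention (the article does not show CLEAR's system-call interface, and a real trap cannot be expressed in portable C); it only illustrates the shape of the trade-off: in double mode every call pays for a trap into supervisor state, while in single mode the wrapper collapses into a direct function call.

    #include <stdio.h>

    /* Schematic only: every name here is invented for illustration. */

    static int kernel_create_semaphore(int count)  /* hypothetical kernel routine */
    {
        printf("kernel: create semaphore, count = %d\n", count);
        return 0;                                  /* pretend id 0 was allocated */
    }

    #ifdef DOUBLE_MODE
    /* Double-execution mode: on real hardware this wrapper would trap
       into supervisor state, raising the processor's priority and
       switching to the system stack before the kernel routine runs.
       Here the trap is only simulated. */
    static int trap_to_supervisor(int count)
    {
        printf("trap: enter supervisor mode, switch stack\n");
        return kernel_create_semaphore(count);
    }
    #define SYS_CREATE_SEMAPHORE(c) trap_to_supervisor(c)
    #else
    /* Single-mode scheme: the application already runs in system mode,
       so the call collapses into a direct function call; no trap, no
       priority change, no stack switch. Faster, but less protected. */
    #define SYS_CREATE_SEMAPHORE(c) kernel_create_semaphore(c)
    #endif

    int main(void)
    {
        return SYS_CREATE_SEMAPHORE(3) < 0;
    }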
Finally, RTOS offers a variety of services for scheduling (including conventional, real-time, priority-based, and preemptive scheduling algorithms such as FIFO, round robin, and rate monotonic), task management, synchronization (using semaphores or events), memory management, intertask communication (via signals or ports), input/output, and interrupt handling, to name a few. Each group of services supports different configuration levels, which allows the user to customize services and avoid wasting space and time, as the system does not allocate memory or link code for unwanted resources. For example, if an application does not require buffer pools, selecting the corresponding configuration level and rebuilding the system will exclude all code related to buffer pools. Users can follow the same procedure to select scheduling services; five configuration levels are available:

- Explicit scheduling: The running task explicitly reactivates another task. This option does not support the system clock.
- Priority scheduling: The scheduler works on a priority basis, handling tasks residing in multiple priority queues, and is activated by explicit calls to resume and suspend. This option also does not support the system clock.
- Complete scheduler: This option supports a complete time-slicing and round-robin scheme.
- Special behaviors: Users can dynamically define scheduling policies, even on a per-task basis (preemption can be disabled, for example).
- Task accounting: An accounting mechanism traces the number of times a task has been scheduled and the number of clock ticks it has been running.
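The article does not show how configuration levels are realized in the sources, but the standard C technique for this kind of build-time scalability is conditional compilation. The sketch below is our illustration, with invented option names: rebuilding with different settings excludes unwanted code and data from the system entirely.

    /* Illustrative configuration settings; all option names are
       invented. A user would edit these and rebuild the system. */
    #define CFG_SCHEDULER_LEVEL 3      /* 1 = explicit ... 5 = task accounting */
    /* #define CFG_BUFFER_POOLS */     /* left undefined: no buffer pools */

    #ifdef CFG_BUFFER_POOLS
    /* Buffer-pool code and data exist only when the option is set, so
       an application that never uses buffer pools pays nothing for
       them in memory or code size. */
    static char pool_memory[4096];
    int pool_alloc(int size);
    int pool_free(int handle);
    #endif

    #if CFG_SCHEDULER_LEVEL >= 3
    /* The time-slicing/round-robin machinery is linked in only from
       the "complete scheduler" level upward. */
    void scheduler_tick(void);
    #endif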
RTOS testing methodology

Our methodology uses the single-mode scheme to allow direct access to the whole system. Tests cover the basic system and common low-level functions on both the ARM6 and a DOS-based host computer (using the emulator from the JumpStart development environment). We have also tested the Posix functions on a DOS-based host under simulation, as the ST9 was unavailable at the time of testing. As a result, we have not tested certain Posix and common low-level functions specific to the ST9. The checklists we produced contain explicit references to different configuration levels and options with different expected program behaviors only when the specifications contain such references. In all other cases, we selected the most complete system configuration to test all of the source code applicable to the ARM6 target processor. In the source-code listings for the white-box tests, boxes enclose sections of code corresponding to a particular configuration option.

The strategy for testing CLEAR RTOS includes individual (unit) function testing, as well as incremental bottom-up module and integration testing. We first identify all functionalities from the specifications and form checklists,⁸ then classify each function, and consequently all its functionalities, into one of the following functional areas: system management, scheduler, task management, interrupts, memory management, semaphores (Posix mutual-exclusion mechanisms), message queues (Posix mailboxes), input/output, time management, signals, or events.

Module and integration testing follows test case design and the individual testing of each function. Every module is integrated into the system when ready. Meanwhile, a module's functionalities (if needed) are temporarily replaced by an appropriate stub or driver to ensure correct coordination among modules. Our method adopts an incremental bottom-up approach because the development is bottom-up, from low- to high-level parts, and testing and development activities must proceed, to a certain degree, in parallel. This strategy makes test case design easier and finds errors in critical modules earlier. We have not considered the pure top-down approach suitable, since our project's timing constraints require proving the feasibility and practicality of the most critical modules (the scheduler, for example) early. Sometimes, a lower level module is not available when we are testing higher level ones that need it. This is why we use a modified bottom-up technique, when deemed necessary, to minimize delays.

Our method treats every function of RTOS as a black box. Thus, its aims are to construct tests for each independent entity according to its specifications and to determine whether the input/output behavior of each function meets those specifications. Black-box testing alludes to tests we conduct at each function interface, and it demonstrates that all functions are operational (meaning that input is properly accepted, and output, both in the form of actual returned values and side effects, is produced correctly). A point of interest in black-box tests of individual functions is that many RTOS functions perform certain operations not by themselves, but by calling other functions. This stems from the highly modular nature of RTOS, which is necessary to provide added flexibility and support for multiple targets. As a result, we have to enhance the aforementioned generic definition of a function's interface to address this type of function. So, the presence of function calls in black-box individual function tests is justified.

For white-box testing, we selected the branch coverage criterion to handle the size and complexity of RTOS. Ideally, we would use the white-box approach to test both individual functions and modules. However, as modules get more complex, white-box testing becomes impractical because it requires defining all logical paths inside each function, exercising them, and evaluating the results. So the white-box technique is used only for the individual function tests of all basic system, Posix, and common low-level functions (except those coded in assembly language).

We perform the test of each individual function or module both positively (testing on normal inputs) and negatively (testing the system's reaction to abnormal inputs). Furthermore, if we can identify input cases not reported in the specifications, we design unspecified test cases. For unspecified test cases, we should make a distinction between system calls and internal (kernel) functions, as the design and testing points of view differ. Although it may be a design decision that RTOS performs no parameter checking for internal functions, it is important for testing to remain consistent and exercise every function using incorrect input, including functions that the user may not directly call. It is safer to check each function independently, without making any assumptions as to whether it is called by another function with correct or incorrect parameters. Bearing this in mind, the unspecified test cases for internal functions do not infer an omission by the design group, but only reflect a design decision. Therefore, we should distinguish these two categories of unspecified test cases. Thus, a carefully selected and systematic combination of black- and white-box testing maximizes test coverage while keeping test complexity at an acceptable level.
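The distinction between specified, negative, and unspecified cases is easiest to see on a tiny example. The function below is hypothetical (nothing like it is shown in the article): an internal routine that, by design decision, masks its argument instead of validating it. The unspecified test still exercises the out-of-range input and records the actual behavior.

    #include <stdio.h>

    /* Hypothetical internal (kernel) function: by design decision it
       performs no parameter checking, because the kernel is expected
       to call it with valid arguments only. */
    static int task_priority[8] = { 3, 3, 5, 5, 7, 7, 9, 9 };

    static int internal_get_priority(int task_id)
    {
        return task_priority[task_id & 7];   /* masks instead of validating */
    }

    int main(void)
    {
        /* Specified (positive) case. */
        printf("task  2 -> priority %d\n", internal_get_priority(2));

        /* Unspecified case: the specification says nothing about task
           ids outside 0..7. The test still exercises it and records
           the actual behavior; an odd-but-defined result here reflects
           a design decision, not a defect report. */
        printf("task 13 -> priority %d\n", internal_get_priority(13));
        return 0;
    }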
Examples of individual function testing

We present examples of both black- and white-box individual function tests of the same function (sys_create_semaphore). These examples show that the two methods are not alternatives, but investigate different aspects of the tested function. Function sys_create_semaphore creates a semaphore. The argument count represents the semaphore counter, and the function returns either the semaphore identification number or SYSTEM_ERROR if the creation fails.

Black-box testing. To test each function, we construct a checklist and a test specification table (see the examples in Tables 1 and 2). The checklist contains the function operation, identification, arguments, return value, and a table of functionalities (functionality numbers and descriptions). The test specification table lists the tested functionalities, given input, type (positive, negative, or unspecified), expected output, actual output, and OK or not OK.

Table 1. Checklist for sys_create_semaphore function.

Functionality | Description
1 | If count is illegal, set error_number to BAD_ARGUMENT_1 and return SYSTEM_ERROR.
2 | Get the index of the first free entry. If no semaphore is currently available, set error_number to NO_MORE_SEMAPHORES and return SYSTEM_ERROR.
3 | Save the index of the next free semaphore.
4 | Initialize the semaphore counter with count.
5 | Mark the semaphore table entry as used by a user semaphore and return its identification number.

Table 2. Black-box test specification table for sys_create_semaphore function.

Tested functionalities | Given input | Type | Expected output | Actual output | OK/not OK
1 | count = -32768/-1 | N | SYSTEM_ERROR (error_number should change to BAD_ARGUMENT_1) | SYSTEM_ERROR (error_number has changed to BAD_ARGUMENT_1) | OK
2 | count = 0/32767 (no free semaphore slots) | N | SYSTEM_ERROR (error_number should change to NO_MORE_SEMAPHORES) | SYSTEM_ERROR (error_number has changed to NO_MORE_SEMAPHORES) | OK
4, 5 | count = 0/32767 (free semaphore slots, first free semaphore id = 5) | P | Semaphore_id (the created semaphore's counter should be set to count; the semaphore should be marked as a user semaphore) | 5 (the created semaphore's counter has been set to 0/32767; the semaphore has been marked as a user semaphore) | OK

An actual test, represented by a row in the test specification table, proceeds as follows: we examine the function's operation with the given input. Whatever is found in parentheses in the given-input column corresponds not to an actual argument of the function, but to either a variable or a condition that affects the functionalities under test; thus, its definition is necessary for test execution. The equivalence partitioning approach has identified two equivalence classes for the input argument count: one in which count is greater than or equal to zero, and one in which it is negative. Since count is an integer (represented by 2 bytes), the first equivalence class ranges from 0 to 32,767 and the second from -32,768 to -1. In these ranges, we selected the two boundary values (instead of using random ones) to account for the boundary value analysis complement to the equivalence partitioning approach.

The expected-output column presents the expected results of the function's execution with the given input. These results consist of the function's return values (if any) plus certain actions (enclosed in parentheses) that have taken effect. The table does not present variables appearing inside the function's code, because the black-box approach does not consider them. However, a variable altered by side effects, according to the specifications, may appear in parentheses (as it is not an actual return value). The examples in Tables 1 and 2 show that black-box testing cannot check the third functionality, "save the index of the next free semaphore," since variables inside the function code are not monitored.
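The rows of Table 2 translate almost mechanically into a driver. In the sketch below, the prototype and the error codes follow the article's prose descriptions, but their numeric values, the declarations, and the harness itself are our assumptions; the driver would link against the function under test.

    #include <stdio.h>

    /* Assumed declarations: the article describes this interface in
       prose but shows no header, so names and values are ours. */
    #define SYSTEM_ERROR        (-1)
    #define BAD_ARGUMENT_1        1
    #define NO_MORE_SEMAPHORES    2
    extern int error_number;                    /* side-effect variable */
    extern int sys_create_semaphore(int count); /* function under test  */

    /* Each entry gives the input, whether an error is expected, and
       the expected value of error_number for the error cases. */
    struct test_case {
        int count;
        int expect_error;
        int expected_err;
    };

    int main(void)
    {
        /* Rows 1 and 3 of Table 2; row 2 additionally needs the "no
           free semaphore slots" precondition, whose setup is
           target-specific and omitted here. */
        static const struct test_case cases[] = {
            { -32768, 1, BAD_ARGUMENT_1 },  /* negative, boundary value */
            {     -1, 1, BAD_ARGUMENT_1 },  /* negative, boundary value */
            {      0, 0, 0 },               /* positive, boundary value */
            {  32767, 0, 0 },               /* positive, boundary value */
        };
        int i, failures = 0;

        for (i = 0; i < 4; i++) {
            int ret = sys_create_semaphore(cases[i].count);
            int ok = cases[i].expect_error
                   ? (ret == SYSTEM_ERROR &&
                      error_number == cases[i].expected_err)
                   : (ret != SYSTEM_ERROR);   /* any valid semaphore id */
            printf("count = %6d -> %s\n", cases[i].count,
                   ok ? "OK" : "not OK");
            failures += !ok;
        }
        return failures;
    }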
White-box testing. In white-box testing, the only difference in the structure of the test specification table is in the first column, now called tested paths, because the focus is on exercising control-flow paths instead of specific functionalities. In addition, the contents of such a table differ because we must observe all variables inside the function's code, and the table must present their values. From the function's source code (Figure 1) we construct the corresponding flow graph (Figure 2). Next, we select the test paths according to the branch coverage criterion. In this simple example, the selected paths are actually all possible ones. We monitored the execution of the function on the test cases using the ARM6 debugger to determine the exact path traversed.

Figure 1. Source code for sys_create_semaphore. [The guard expressions, which did not survive reproduction, are reconstructed here from the Table 1 functionality descriptions.]

    int sys_create_semaphore(int count)
    {
        int index;

        if (count < 0) {                                /* functionality 1 */
            set_error(BAD_ARGUMENT_1);
            return SYSTEM_ERROR;
        }
        index = first_semaphore;                        /* functionality 2 */
        if (index == NO_SEMAPHORE) {
            set_error(NO_MORE_SEMAPHORES);
            return SYSTEM_ERROR;
        }
        first_semaphore = semaphores[index].counter;    /* functionality 3 */
        semaphores[index].counter = count;              /* functionality 4 */
        semaphores[index].state = USER_SEMAPHORE;       /* functionality 5 */
        return index;
    }

Figure 2. Flow graph for sys_create_semaphore. [Node 1 is the count check, node 2 the BAD_ARGUMENT_1 error exit, node 3 the free-entry check, node 4 the NO_MORE_SEMAPHORES error exit, and node 5 the successful initialization and return.]

We used the McCabe metric v (the cyclomatic measure) to determine RTOS complexity and, consequently, the complexity of testing.⁹ The metric's form is v(F) = d + 1, where F is the program's flow graph and d is the number of binary decision nodes in F. This measure is an upper bound on the number of paths (that is, white-box test cases) needed to satisfy the branch coverage criterion. The cyclomatic complexity of this example, according to the above definition, is v(F) = 3, so the maximum number of paths to examine is 3. Table 3 demonstrates that white-box testing provides different results for variables inside each function, such as the variable first_semaphore, which corresponds to the third functionality of the Table 1 checklist.

Table 3. White-box test specification table for sys_create_semaphore function.

Tested path | Given input | Type | Expected output | Actual output | OK/not OK
1-2 | count = -32768/-1 | N | SYSTEM_ERROR (*error_pointer <- BAD_ARGUMENT_1) | SYSTEM_ERROR (*error_pointer = BAD_ARGUMENT_1) | OK
1-3-4 | count = 0/32767 (no free semaphore slots) | N | SYSTEM_ERROR (*error_pointer <- NO_MORE_SEMAPHORES) | SYSTEM_ERROR (*error_pointer = NO_MORE_SEMAPHORES) | OK
1-3-5 | count = 0/32767 (free semaphore slots; first free semaphore id = 5) | P | Semaphore_id (semaphores[5].counter <- 0/32767; first_semaphore <- 6; semaphores[5].state <- USER_SEMAPHORE) | 5 (semaphores[5].counter = 0/32767; first_semaphore = 6; semaphores[5].state = USER_SEMAPHORE) | OK

We perform module and integration testing in an analogous way using the black-box technique. Although the tables are larger, the test table structure remains the same, since each module consists of several functions. We consider functions to form a level when called by other functions. An RTOS characteristic that affects module and integration testing is that even the largest modules are not more than four levels deep, while most of them are only two to three levels deep. In fact, functions that call other functions form all of the modules. As a result, the module test of the function at the top of a module may appear, at first, quite similar to the function's individual test. As we discussed earlier, this is not true.
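The authors determine the traversed path with the ARM6 debugger. Where no debugger is available, a lightweight alternative (our suggestion, not the article's method) is to instrument each flow-graph node so that every test records the path it actually exercised. The sketch below instruments a simplified copy of the Figure 1 function with the node numbers of Figure 2; the constants and data structures are stand-ins.

    #include <stdio.h>
    #include <string.h>

    /* Simplified stand-ins for the real constants and data (ours). */
    #define SYSTEM_ERROR    (-1)
    #define NO_SEMAPHORE    (-1)
    #define USER_SEMAPHORE    1
    #define MAX_SEM           8

    static struct { int counter; int state; } semaphores[MAX_SEM];
    static int first_semaphore = 5;
    static int error_number;
    static void set_error(int e) { error_number = e; }

    /* Path trace: each visited flow-graph node appends its number,
       so the driver can verify which path a test case exercised. */
    static char trace[32];
    static void node(const char *n) { strcat(trace, n); }

    static int sys_create_semaphore(int count)
    {
        int index;

        node("1");
        if (count < 0) {
            node("-2");
            set_error(1);                 /* BAD_ARGUMENT_1 */
            return SYSTEM_ERROR;
        }
        node("-3");
        index = first_semaphore;
        if (index == NO_SEMAPHORE) {
            node("-4");
            set_error(2);                 /* NO_MORE_SEMAPHORES */
            return SYSTEM_ERROR;
        }
        node("-5");
        first_semaphore = semaphores[index].counter;
        semaphores[index].counter = count;
        semaphores[index].state = USER_SEMAPHORE;
        return index;
    }

    int main(void)
    {
        trace[0] = '\0';
        sys_create_semaphore(-1);
        printf("count = -1 traversed path %s (expected 1-2)\n", trace);

        trace[0] = '\0';
        sys_create_semaphore(0);
        printf("count =  0 traversed path %s (expected 1-3-5)\n", trace);
        return 0;
    }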
Behavioral testing example

In a typical class of events, a new task with a higher priority (task 2) may preempt an earlier one (task 1). It does not matter what the specific tasks are or what their priorities are, as long as task 2's priority is greater than task 1's, because all these events belong to the same class. Figure 3 shows the possible states of a task in RTOS, as well as the system calls used for the transitions between states. We describe the expected behavior of RTOS as a consequence of this event in Figure 4. We must introduce the event to test whether RTOS reacts as expected and then use the ARM6 debugger to monitor the actual behavior of RTOS, as in white-box testing.

Figure 3. Task state diagram. [The diagram's transitions are driven by calls such as sys_suspend_task(), sys_resume_task(), sys_signal_semaphore(), and sys_sleep_task(); among the states is a suspended-wait state.]
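A behavioral test for this event class could be set up as sketched below. sys_resume_task appears in Figure 3, but its exact signature, the task-creation call, the priority values, and the logging helper are all our assumptions; the sketch only fixes the observable expectation that task 2's trace line must appear before task 1's next line.

    /* Sketch of a behavioral test for the event "a higher-priority
       task becomes ready while a lower-priority task is running".
       All declarations below are assumptions, to be replaced by the
       real system-call interface when linking against the target. */

    extern int  sys_create_task(void (*entry)(void), int priority); /* hypothetical */
    extern void sys_resume_task(int task_id);                       /* from Figure 3 */
    extern void test_log(const char *msg);                          /* hypothetical */

    static void task2_body(void)            /* higher priority */
    {
        test_log("task 2 running");         /* must appear immediately */
    }

    void behavioral_test(void)              /* runs as task 1, lower priority */
    {
        int t2 = sys_create_task(task2_body, /* priority */ 10);

        test_log("task 1: before resume");
        sys_resume_task(t2);                /* introduce the event */
        /* Expected behavior: task 2 preempts task 1 here, so its log
           line is observed before the next line of task 1. In the
           article, monitoring is done with the ARM6 debugger. */
        test_log("task 1: after resume");
    }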
Results

The test result assessment provides evidence of the software's quality and reliability. Of course, we should bear in mind that not all errors are of the same importance. The following tables and graphs present the cumulative results of testing for the total number of functions, functionalities, and paths tested. Of the functions we tested, 18 percent were low-level functions, 30 percent Posix support, and 52 percent basic system. Table 4 presents the results of RTOS testing.

Table 4. Results of CLEAR RTOS software tests. (P = positive and N = negative test cases.)

Category | Basic system | Posix support | Low-level | Total
All functions, OK | 60 | 24 | 17 | 101
All functions, not OK | 10 | 21 | 2 | 33
All functions, total | 70 | 45 | 19 | 134
Black-box functionalities, OK | 209 | 85 | 18 | 312
Black-box functionalities, not OK | 6 (1P, 5N) | 48 (9P, 39N) | 2 (2P) | 56 (12P, 44N)
Black-box functionalities, total | 215 | 133 | 20 | 368
Black-box test cases, OK | 227 (94P, 133N) | 86 (58P, 28N) | 18 (18P) | 331 (170P, 161N)
Black-box test cases, not OK | 6 (1P, 5N) | 52 (12P, 40N) | 2 (2P) | 60 (15P, 45N)
Black-box test cases, total | 233 | 138 | 20 | 391
White-box functionalities, OK | 382 | 158 | 17 | 557
White-box functionalities, not OK | 16 (11P, 5N) | 10 (9P, 1N) | 1 (1P) | 27 (21P, 6N)
White-box functionalities, total | 398 | 168 | 18 | 584
White-box test cases, OK | 243 (126P, 117N) | 115 (76P, 39N) | 12 (12P) | 370 (214P, 156N)
White-box test cases, not OK | 16 (11P, 5N) | 12 (11P, 1N) | 1 (1P) | 29 (23P, 6N)
White-box test cases, total | 259 | 127 | 13 | 399

Figure 5a shows the testing results for the number of tested functions. We considered a function not OK even when only one of its functionalities or paths is not performed correctly. Figure 5b presents the results of testing black-box functionalities; the functionality types (positive or negative) are in parentheses. Figure 5c presents the outcome of black-box test cases. These results are not the same as in Figure 5b, as we test certain black-box functionalities in more than one test case. Figure 5d presents the results of testing white-box functionalities. These functionalities differ from black-box ones in that they consider the internal operation of each function and are thus more detailed and greater in number. Finally, Figure 5e shows the results of executing white-box test cases, in which each test case corresponds to a different path. The test cases must exercise at least all the paths according to the selected coverage criterion (branch coverage), and in simple cases we selected all possible paths, according to the most complete path coverage criterion.

Figure 5. Graphical representation of the results in Table 4: all functions (a), black-box functionalities (b), black-box test cases (c), white-box functionalities (d), and white-box test cases (e). [Bar charts of percentages per profile; the graphs did not reproduce well in the original due to lack of contrast.]

In individual function testing, the average value of the McCabe metric we have encountered is about 5, while its maximum value is 10. These values reveal only part of the effort devoted to testing activities; we must also consider the total size of the tested code (in this case, 720 Kbytes of C source code).

In general, discovering few errors does not imply inadequate testing, but just that the CLEAR RTOS runtime is very stable. This is especially true of its basic system profile, because we can identify most of the discovered errors as specification rather than implementation errors. Nor does this fact imply that the Posix code is of lower quality. We tested early versions of the Posix functions, and certain argument checks, although specified, had not yet been implemented.¹⁰ The low percentage of erroneous white-box test cases (Figure 5e) supports the quality of the Posix code, since white-box test cases are based on coded operations rather than on those that are specified but unimplemented. In addition, the code structure was not final, because there is the option of omitting certain functions to save space.

TESTING RTOS is an ongoing activity, and we will present a more complete evaluation soon. Until now, experience from the application of our testing methodology confirms some general assertions:

- An independent group should perform testing, since it is not easy for the development group to perform objective and highly error-revealing tests.
- A method should use both black- and white-box techniques, at least in the unit testing phase, since they reveal different kinds of errors.
- We should give special emphasis to selecting the appropriate coverage criterion, which assures an adequate confidence level while limiting the testing effort to an acceptable level.
- Following an incremental bottom-up integration strategy is best, since the development phase is also bottom-up and proceeds in parallel with testing.
- A test strategy also requires behavioral testing to examine a real-time operating system's reaction to events from as many event classes as possible.

Finally, following a systematic testing approach not only ensures a predefined reliability level, but also permits the best possible organization of work from the project management point of view. This is particularly important for large projects in which several partners rely on each other's results and time constraints are strict.

Acknowledgments

This project received 50 percent of its funds from the Commission of the European Union.

References

1. R.S. Pressman, Software Engineering: A Practitioner's Approach, McGraw-Hill, New York, 1992.
2. R.S. Freedman, "Testability of Software Components," IEEE Trans. Software Eng., Vol. 17, No. 6, June 1991, pp. 553-564.
3. G.J. Myers, The Art of Software Testing, John Wiley & Sons, New York, 1979.
4. I. Sommerville, Software Engineering, Addison-Wesley, Reading, Mass., 1992.
5. J.A. Wise, V.D. Hopkin, and P. Stager, eds., "Verification and Validation of Complex Systems: Human Factor Issues," NATO ASI Series F, Vol. 110, Springer-Verlag, Berlin, 1993.
6. C. Farris and P. Petit, "Final Specification for the Real Time Runtime Support for Deeply Embedded Applications," Esprit Project 8906-OMI/CLEAR, Tech. Report 4.1.2-01, Etnoteam S.p.A., Milan, Italy, Dec. 30, 1994.
7. V.C. Gerogiannis and M.A. Tsoukarellas, "SAT: A Schedulability Analysis Tool for Real-Time Applications," Proc. Seventh Euromicro Workshop on Real-Time Systems, IEEE Computer Society Press, Los Alamitos, Calif., 1995, pp. 155-159.
8. R.A. DeMillo et al., Software Testing and Evaluation, Benjamin/Cummings, Reading, Mass., 1987.
9. J.P. Myers, Jr., "The Complexity of Software Testing," Software Engineering J., Jan. 1992, pp. 13-24.
10. V.C. Gerogiannis, K.D. Economides, and M.A. Tsoukarellas, "Runtime Validation Report," Esprit Project 8906-OMI/CLEAR, Tech. Report 6.1.1-01, Advanced Informatics Ltd., Patras, Greece, June 29, 1995.
11. R.L. Glass, "Real-Time: The Lost World of Software Debugging and Testing," Comm. ACM, Vol. 23, No. 5, May 1980, pp. 264-271.
12. IEEE Std 1003.1b-1993, Standards for Information Technology, Portable Operating System Interface (Posix), Part 1: Application Program Interface (API) [C Language], Amendment 1: Realtime Extensions, IEEE, Piscataway, N.J., 1993.

Manthos A. Tsoukarellas is a professor of informatics at the Patras and Mesologi Technological Education Institute in Greece. He is also founder, president, and executive director of Advanced Informatics Ltd. His research interests include the monitoring and performance evaluation of real-time systems. Tsoukarellas served as an evaluator of Esprit proposals for the European Commission and has been involved in several Esprit projects.

Vasilis C. Gerogiannis is a PhD student at the University of Patras and also works for Advanced Informatics Ltd. Specification mechanisms for real-time systems, real-time scheduling, and software testing are his special interests. Gerogiannis holds an MS in computer engineering and informatics from the University of Patras.

Kostis D. Economides is currently with Advanced Informatics Ltd., where he is responsible for the Esprit III OMI/CLEAR project's testing activities. His current research interests include software testing, broadband networks, and networking interconnectivity. Economides graduated from the University of Patras with a BS in computer engineering and informatics.

Direct questions concerning this article to Manthos A. Tsoukarellas, Advanced Informatics Ltd., 35 Gounari Ave., 26221 Patras, Greece; [email protected].