Object Oriented COBOL
When I first heard about Object Oriented COBOL, it sounded like an oxymoron. Since then I have spent some time poring over manuals and trying some simple experiments, and I'd like to share what I have learned. However, I am not an expert. No doubt I've gotten some things wrong. The only reason I am posting this material in the first place is that so little information on OO COBOL is available on the Web. If you want a quick overview without having to spend $50 on a book, or 50 hours deciphering a manual, then take a look -- but don't regard these pages as authoritative. The version described here is IBM's implementation, mostly for mainframes, though I gather the OS/2 and AIX versions are similar. The draft standard evidently provides additional features which IBM has not implemented. These pages will be best understood by those who are already familiar with the OO paradigm in general, and C++ in particular, as well as COBOL.
Objects
An object has behaviors and an internal state.
In OO COBOL the behaviors are called methods. They correspond to member functions in C++. Whatever jargon you use, a behavior amounts to a subroutine call, where an implicit parameter identifies the object whose method is being invoked. The internal state consists of the data stored inside the object. In OO COBOL this data is accessible only to the methods of the object. From the standpoint of any other code, the object consists entirely of behaviors, except that the result of the behaviors may depend on the internal state. Some methods may change the internal state.
Classes
A class is a category of similar objects. All objects of the same class have the same range of potential states and behaviors. A particular object is sometimes called an object instance, or simply an instance, of the class. There may be multiple instances of a given class in existence at the same time.
Inheritance
A class may be a subclass (or "child" class) of a broader, more generic superclass (or "parent" class). Instances of the subclass have all the states and behaviors associated with the superclass, but may have other more specialized traits as well.
Every Dog is a Carnivore. Carnivore, in turn, is a subclass of Mammal, which is a subclass of Vertebrate (actually Chordate, if you want to be picky). Lassie inherits Hair and MammaryGlands from Mammal, and a Hunt method from Carnivore. (I have described what is sometimes called "public" inheritance. C++ also provides "private" and "protected" inheritance, which -- oh, never mind. COBOL doesn't have them.)
Polymorphism
A subclass inherits all the behaviors of the superclass, but it may override a behavior with its own variation of that behavior.
If you have an object reference to a Carnivore, you can invoke its Hunt method without knowing what kind of Carnivore it is. At runtime, if it happens to be a Dog, it will chase. If it happens to be a Cat, it will stalk.
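For readers who think in C++, the Carnivore example might be sketched as follows. This is C++, not OO COBOL syntax, and the names describeHunt and checkHunt, along with the strings returned, are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes modeled on the Carnivore example in the text.
class Carnivore {
public:
    virtual ~Carnivore() = default;
    // Generic behavior; a subclass may override it with its own variation.
    virtual std::string hunt() const { return "hunt"; }
};

class Dog : public Carnivore {
public:
    std::string hunt() const override { return "chase"; }
};

class Cat : public Carnivore {
public:
    std::string hunt() const override { return "stalk"; }
};

// The caller knows only that it has a Carnivore.
// The run-time type of the object selects which hunt() runs.
std::string describeHunt(const Carnivore& animal) {
    return animal.hunt();
}

// Small self-check of the dispatch described above.
void checkHunt() {
    Dog lassie;
    Cat tom;
    assert(describeHunt(lassie) == "chase");
    assert(describeHunt(tom) == "stalk");
}
```

Note that describeHunt never tests what kind of Carnivore it was given. Adding another subclass would not require changing it.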
Multiple Inheritance
A class may have two or more parent classes, inheriting traits from each.
This scenario invites confusion, because two or more parents may have methods of the same name. There needs to be some rule to resolve the conflict. As a result, some OO languages (e.g. Java) don't allow multiple inheritance. (The two parents may also have data elements with the same name. In COBOL, however, this possibility is not a problem. The data elements of a class are not visible to any other classes, even child classes.) Lassie is not just a Dog, she is a DomesticDog, which inherits from Dog and Property. Like other instances of Property, she has an Owner, a PurchasePrice, and a Depreciate method.
OO COBOL: Overview
In OO COBOL there are three kinds of programs. I have summarized IBM's syntax for each (consult the manuals for further details):
A class definition is similar to an ordinary program. It has the usual four divisions, but with various special features. In particular, the PROCEDURE DIVISION doesn't contain procedural code in the usual way. Rather, it contains all of the code for all of the methods of the class. Each method definition has four divisions of its own, and its PROCEDURE DIVISION contains the procedural code. Because of this arrangement, it isn't possible to define some methods in one source file and others in another. All method definitions for a class must reside in the same source file. A class with many complex methods may require an unusually large source file.
A client program may be an ordinary program or a method definition. It uses the INVOKE verb to execute a method, rather than CALL. Defining a subclass is no different from defining a base class. In fact every class is a subclass, except for the built-in class SOMObject. A class may itself be an instance of a metaclass -- a class of classes. You can define your own metaclasses, derived from SOMClass. Since a metaclass is just another kind of subclass, the syntax is the same as for any other subclass.
Here's what I have found so far, organized by C++ feature. I won't pretend that this comparison is either complete or completely accurate. As noted elsewhere, this discussion is based on IBM's implementation, mostly for mainframes. IBM has apparently implemented only a subset of the draft COBOL standard.
Object Model
IBM's implementation is based on SOM (System Object Model), a complex set of tools and built-in classes. It supports communication among objects written in different languages, following CORBA standards.
All classes inherit, directly or indirectly, from SOMObject. A class is itself an object instance of SOMClass. For special purposes you can define a metaclass derived from SOMClass.
Header Files
COBOL still has copybooks, of course. Unlike headers in C++, however, copybooks are not useful for representing class interfaces.
Since data members are not visible to the client code of a class (see below), the compiler doesn't need to know the size of the objects it uses. It merely needs to mangle the names of the methods so that the linker can look for the right routines. However, the classes used by a program must be declared in the CONFIGURATION SECTION, in a special REPOSITORY paragraph. IBM stores interface information in a special database called an IR (Interface Repository). The compiler may optionally consult the IR to enforce proper use of the classes described there. The IR plays roughly the same role as C++ header files, providing the equivalent of function prototypes for methods.
Inheritance
All inheritance is public, and should therefore be confined to "is-a" relationships among classes.
In C++ you would typically use private inheritance to represent a "has-a" or "is-implemented-as" relationship. Class Derived would inherit privately from Base. In OO COBOL, your best bet is to embed an object reference to a Base as a data member of Derived.
OO COBOL supports multiple inheritance. As with C++, there are rules for resolving conflicts among methods inherited from more than one parent class.
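The advice to embed an object reference instead of inheriting privately can be illustrated in C++ itself. What follows is only a sketch under my own assumptions: IntList and Stack are hypothetical classes, and std::unique_ptr stands in for the OO COBOL object reference.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical helper class playing the role of Base in the text.
class IntList {
public:
    void append(int value) { items_.push_back(value); }
    int last() const { return items_.back(); }
    void dropLast() { items_.pop_back(); }
    bool empty() const { return items_.empty(); }
private:
    std::vector<int> items_;
};

// "Has-a" by composition: Stack holds a reference to an IntList and
// forwards to it, instead of inheriting privately from IntList.
class Stack {
public:
    Stack() : list_(std::make_unique<IntList>()) {}
    void push(int value) { list_->append(value); }
    int pop() {
        assert(!list_->empty());  // popping an empty stack is a caller error
        int top = list_->last();
        list_->dropLast();
        return top;
    }
    bool empty() const { return list_->empty(); }
private:
    std::unique_ptr<IntList> list_;  // stands in for an object reference
};
```

Clients of Stack see only push, pop, and empty. The methods of IntList stay hidden, which is the effect private inheritance would have had.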
Data Members
A class definition declares data members in the WORKING-STORAGE section. Each object instance gets its own WORKING-STORAGE. There are no static data members.
All data members are private; i.e. they are accessible only to the methods of that class.
Member Functions
OO COBOL calls them "methods." Each method is defined as a mini-program within the PROCEDURE DIVISION of a class definition. It may have its own WORKING-STORAGE section, for data with persistent state. It may also have a LINKAGE section for passing parameters, and a LOCAL-STORAGE section for the equivalent of auto variables (their values do not persist from one call to the next).
All methods are virtual and public. You don't CALL a method -- you INVOKE it. As with a CALL, you may specify a method with either a literal (for static linkage) or a data name (for dynamic linkage). As with C++, the method invoked at run time depends on the type of the object referenced. If you invoke a method through a reference to a base class, but the reference refers to an object of a derived class, you'll invoke the method associated with the derived class, not the one associated with the base class.
If you really need something like an abstract base class, you can build some clumsy checks to prevent the class from being instantiated. The constructor (see below) can interrogate the class for run-time type information, and abend if the object does not belong to a derived class. However there is no way to prevent instantiation at compile time.
COBOL object references occupy a middle ground between pointers and references as we know them in C++.
Unlike a reference in C++, an object reference in OO COBOL can be NULL, or it may be reseated from one object to another, or to NULL. Unlike a pointer, you cannot dereference an object reference. You can use an object reference only to specify an object instance whose method you wish to invoke. As with pointers, multiple object references may refer to the same instance. Object references may be either typed or untyped. An untyped reference may refer to any object. A typed reference may refer only to an object of the designated class, or of a class derived from it.
The somNew method corresponds to operator new in C++. Likewise the somFree method corresponds to operator delete. You cannot allocate arrays of objects, so you don't need equivalents to operators new[] or delete[]. You can declare an array of object references and allocate each instance separately. somNew and somFree are methods of SOMClass (a generic class of classes). Theoretically, if you define your own metaclass, you could override them with your own versions of somNew and somFree, just as you might override operators new and delete in C++. However, SOM is presumably not bound by COBOL's restrictions. It may not be possible to override these methods. I haven't tried it yet.
C++ calls a constructor whenever an object is created, and a destructor whenever the object is destroyed. Likewise, somNew invokes somInit whenever it allocates an object, and somFree invokes somUninit. As with C++, OO COBOL invokes the constructor of a base class before invoking the constructor of the derived class. Likewise it invokes the destructor of the base class after invoking the destructor of the derived class. (I couldn't find anything in the manual that said so, but I did some experimenting.) Like a default constructor in C++, somInit takes no parameters. There's no way to overload somInit with a parameterized constructor. The manual recommends that you define a metaclass with a parameterized method for creating an object. However, this approach offers no evident advantages over an ordinary subprogram.
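The C++ half of this comparison is easy to check for yourself. Here is a minimal sketch that records the order of constructor and destructor calls. The names lifecycleLog and runLifecycle are my own inventions, and the sketch says nothing about how somInit and somUninit are implemented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log that records constructor and destructor calls in order.
std::vector<std::string>& lifecycleLog() {
    static std::vector<std::string> log;
    return log;
}

class Base {
public:
    Base() { lifecycleLog().push_back("Base ctor"); }
    virtual ~Base() { lifecycleLog().push_back("Base dtor"); }
};

class Derived : public Base {
public:
    Derived() { lifecycleLog().push_back("Derived ctor"); }
    ~Derived() override { lifecycleLog().push_back("Derived dtor"); }
};

// Create and destroy one Derived, then return a copy of what was logged.
std::vector<std::string> runLifecycle() {
    lifecycleLog().clear();
    {
        Derived d;  // constructed here, destroyed at the closing brace
    }
    return lifecycleLog();
}
```

The log comes back as Base ctor, Derived ctor, Derived dtor, Base dtor: base first on the way in, base last on the way out.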
Static Members
There are no static members. Data members occur separately for each object instance. Every method call pertains to a particular instance, although the WORKING-STORAGE of a method occurs once per class, not once per instance.
However, there is no way to encapsulate such pseudo-static members properly. Even a metaclass has no special access to the internals of its instance classes. It can only use the same public methods that any other code can use.
Suppose, for example, you want to maintain a counter of all the instances of a class. You want to increment it whenever you allocate an instance, and decrement it whenever you deallocate. How do you enforce the proper maintenance of such a counter? You could increment the counter in somInit, and decrement it in somUninit. However, there's no way to make the counter accessible to both methods without making it accessible to every routine in the load module. You could bury the counter in a metaclass or a subprogram, and provide special methods or entry points for allocating and deallocating objects. However, you can't prevent the client code from calling somNew or somFree directly, bypassing the counter maintenance. This inability to declare static members is a crippling weakness in IBM's implementation.
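For comparison, this is roughly what the instance counter looks like in C++, where a private static data member does the job. Widget is a hypothetical class. The point is only to show what a static member buys you, not to suggest any OO COBOL equivalent.

```cpp
#include <cassert>

// Hypothetical class that counts its own live instances.
class Widget {
public:
    Widget() { ++liveCount_; }
    Widget(const Widget&) { ++liveCount_; }  // copies count too
    ~Widget() {
        assert(liveCount_ > 0);  // sanity check before decrementing
        --liveCount_;
    }
    // Read-only access: client code can see the count but not change it.
    static int liveCount() { return liveCount_; }
private:
    static int liveCount_;  // one copy shared by every instance
};

int Widget::liveCount_ = 0;
```

Because the counter is private and the constructors and destructor maintain it, client code has no way to bypass the bookkeeping.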
For a class definition, the COBOL compiler writes a file in Interface Definition Language (IDL), normally a member in an IDL library. IDL is a specialized language, similar to C, for specifying class interfaces. A DD statement in the compile step tells the compiler where to put the IDL file. In a subsequent step, the SOM compiler compiles the IDL into machine-readable gibberish, which it stores in an Interface Repository (IR). There are also options whereby the SOM compiler can create header files and program skeletons for C or C++ programs. If you're ambitious, you can even coax the SOM compiler into writing something completely different, such as documentation or Ada. There's a generic IR for built-in SOM classes. You're expected to create one or more levels of separate IRs by installation, application, and the like.
Once the IR contains a representation of the class interface, the COBOL compiler can consult it to verify that it is invoking the methods with the right parameters. The compile step requires a DD statement for a profile which defines a series of environmental variables. Among other things, these variables tell the compiler which repositories to consult. The compiler puts the object code into an object library in the usual way. However, in order to support names longer than eight characters, you need to keep a special member in the library to map long names to mangled eight-character names. You have to know how to invoke the tools which maintain this mapping.
Different kinds of compiles need different kinds of JCL. Compiling a class definition is different from compiling a client program, and each is different from compiling a program which doesn't use objects. You need to follow yet another procedure if two or more classes refer to each other. If you are adopting OO COBOL for the first time without an experienced hand to guide you, the complexity of these procedures will be a substantial obstacle.
You can't define static members.
You can't define private or protected member functions.
You can't declare one class as a friend of another.
You also can't declare protected data, but you wouldn't need to if you could define protected methods. It is often useful to circumvent the encapsulation of a class. C++ lets you do so in a carefully controlled manner, through the use of friend classes, friend functions, and protected members.
By contrast, OO COBOL's encapsulation is so rigid as to be self-defeating. The only way to provide any access to private data is to define public methods, which render the data effectively public. Still, it would be churlish to write off IBM's OO COBOL as a failed experiment. The draft standard promises a fuller set of features, and IBM may catch up eventually. Meanwhile, even the limited features now available are worth exploring, and will no doubt prove useful.
After the initial swell of hype and enthusiasm, the Wave of the Future crashes upon the Rocks of Reality. The new techniques prove difficult to apply in the real world. Several large, visible, and expensive projects fail. Disillusionment sets in. Eventually, we assimilate the new techniques into a larger body of practice and lore. They become old techniques, part of our usual bag of tricks, to be used where they are useful and ignored where they are not.
If you've been around for very long in this business, you've seen this cycle several times. Name your fad: structured programming, decision tables, relational databases, CASE tools, client-server, and all the rest. We may be entering the later stages of this cycle for Object Orientation (OO). As with previous fads, OO has been less useful, and more difficult to apply, than we had hoped. When OO fails, the zealots insist that it wasn't properly applied; skeptics suspect that it was a bad idea in the first place. The truth, as usual, is somewhere in between.
Designing and building complex systems is still hard and probably always will be. While OO can be useful, it will be most useful if we can recognize the kinds of situations where it is least useful. Let's make the working assumption that OO is good for some things and not so good for other things. The question is: which is which? The following pages attempt some answers. They are not the proclamations of an expert, only the guesses of an amateur. They are an attempt to spark debate among those who are experts. Let the debate begin.
THE OO PARADIGM
I won't even try to give an authoritative definition of Object Orientation. If you don't already know what it is, or think you do, then you probably wouldn't be reading this.
It is fair to say, however, that most people's definitions of Object Orientation include three key principles:
1. Encapsulation
2. Inheritance
3. Polymorphism
Encapsulation is nothing new. A number of older terms refer to roughly the same idea: information hiding, modularity, cohesion.
In OO, encapsulation extends to procedures as well as data. In other words, a class encapsulates methods as well as attributes. Even this idea is not specific to OO. Abstract Data Types apply the same notion.
Inheritance is what makes a Class different from an Abstract Data Type. A derived class inherits the methods and attributes of its parent class.
Polymorphism allows different types of objects to behave differently. A piece of code can invoke an object's method without knowing what kind of object it is. The runtime system automagically selects the method appropriate for the class to which the specific object belongs. Without polymorphism, inheritance would be of limited usefulness.
Certain other ideas are often associated with OO, but are not really part of it:
The use of a Graphical User Interface (GUI). OO is good for implementing a GUI, but not essential. Likewise, you can use OO for an application which doesn't have a GUI.
Implicit code. I made up this term myself because I don't know what else to call it. It refers to certain features of C++ whereby the compiler generates code which is not explicitly visible in the source code. Constructors, destructors, overloaded operators, templates, and exceptions all invoke various kinds of implicit code.
Since I've never coded for a GUI myself, I have nothing to say on the subject. Implicit coding is a language issue rather than an OO issue, and a topic for a different diatribe.
AUSPICIOUS OMENS
Each of the following conditions makes it more likely that OO will be a good fit for a particular application:
You can expect to get the design right, or nearly right, from the beginning. This is the most important condition, and I will expand on it below.
There is no need for persistent objects, i.e. objects stored in files. You can still use files, but they store just ordinary data, not full-blown objects with inheritance and polymorphism.
There is no need, or temptation, to use multiple inheritance.
The problem readily lends itself to the use of polymorphism.
None of these conditions is an absolute requirement. You may be able to use OO successfully even if none of them applies. However, these principles may help you decide when OO is likely to be a good fit and when it isn't.
FUNDAMENTAL AXIOM OF OO
One aspect of OO stands out in the literature, in my own experience, and, so far as I know, in the experience of pretty much anyone who has seriously used OO: You have to get the class design right from the beginning. If you don't, there will be hell to pay later.
One reason is the emphasis on encapsulation. When you let one piece of code hide information from another, you'd better make sure that the other piece will never need that information. Hence the Fundamental Axiom applies with similar force to any discipline which emphasizes encapsulation.
The other reason is the emphasis on inheritance and polymorphism. Your entire system may depend on the design of your classes. Any change in the class design will ripple throughout the system. Even without OO, it's a good idea to get the design right before coding. However, OO raises the stakes. An OO language like C++ provides elaborate machinery for defining class relationships, and for specifying the scope and degree of encapsulation. A seemingly minor design change can have wide-ranging and non-obvious consequences. A larger change may require extensive rewriting. As a result, OO imposes a sizable penalty for guessing wrong about the design. Since it is so painful to change your design after you've started coding, you'll be tempted to muddle through with a flawed design rather than fix it.
FUNDAMENTAL COROLLARY OF OO
If you believe the Fundamental Axiom, then the Fundamental Corollary follows as night the day: OO is most likely to work when you can get the design right from the beginning. You're most likely to get the design right when:
The requirements are stable and well understood
The requirements are largely under the developer's control
The class design is natural and intuitive
STABLE REQUIREMENTS
An OO system typically uses classes to model the behavior of real-world entities. You're most likely to model these entities successfully if you understand them, and if they stay pretty much the same over the course of development.
For example, the following kinds of entities might be good candidates for representation by a class:
Mathematical objects such as complex numbers, equations, matrices, and geometrical shapes (because mathematics is stable)
A general ledger system (because the principles of accounting are stable)
Physical objects such as molecules, machine parts, or furniture (because the laws of physics are stable)
INTUITIVE CLASSES
Ideally, objects should correspond to recognizable entities, whether real or abstract. The relationships among classes should reflect the relationships among the corresponding entities. The better the match, the easier it is to identify the appropriate attributes and methods, and the easier it is for others to understand the design.
Sometimes it is a struggle to come up with a satisfying set of classes. Things don't quite seem to fit. There may be several plausible designs, with no obvious reason to pick one instead of the others. This difficulty may reflect our own inexperience with OO design, or it may suggest that the OO paradigm doesn't work very well for this problem. Either way the result is much the same: an obscure and tortured design which will be awkward to implement and confusing to maintain. Other times, we may find ourselves inventing classes as code gimmicks. They don't correspond very well to recognizable entities, but they help us make the code behave the way we want. Some of the advanced C++ idioms fall into this category. Such classes are not necessarily a bad idea, but they are often obscure. These considerations suggest some guidelines:
Don't use classes, inheritance, or polymorphism just because you can. Use them when they come naturally, and when they help you solve problems.
If you have to strain to squeeze your problem into the OO paradigm, then reconsider. Some other approach may work better, even if it is unfashionable.
If a design is useful despite being non-intuitive, then try to hide it behind a layer of encapsulation. The most visible portions of the design should be as intuitive as possible.
PERSISTENT OBJECTS
OO techniques work best when every object resides in memory. They are more complicated when you must store objects in a file system.
One reason is that objects have different sizes and different internal structures, even if they are derived from the same base class. As long as they reside in memory, you can simply allocate space for them dynamically as needed. However, it is awkward to store different objects in the same file unless you are content with simple sequential access. Another reason is that objects often incorporate data structures stitched together with pointers: linked lists, trees, and the like. You can't usefully store pointers in a file. At best you can only simulate them with file offsets, database keys, or various other gimmicks. Pointers are likely to be involved in another form as well. Typically a polymorphic object carries a pointer to a list of its methods -- represented by entry point addresses, similar to function pointers in C and C++, or to procedure pointers in COBOL. It is by following these pointers that the runtime system supports polymorphism. When you store objects in a file, there's no simple way to store these pointers with it. There are ways to solve these problems, but there is no simple standard way. You either code your own clever tricks or buy somebody else's proprietary clever tricks, such as an OO database.
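To make the last point concrete, here is one of the simpler home-made tricks, sketched in C++: write a type tag ahead of each object's data, and rebuild the object from the tag when reading it back. Shape, Square, Rect, loadShape, and the text format are all hypothetical. Note that loadShape has to branch on the tag, so the polymorphism does not survive the trip to the file for free.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>

class Shape {
public:
    virtual ~Shape() = default;
    virtual double area() const = 0;
    // Write a type tag followed by this object's data.
    virtual void save(std::ostream& out) const = 0;
};

class Square : public Shape {
public:
    explicit Square(double side) : side_(side) { assert(side >= 0); }
    double area() const override { return side_ * side_; }
    void save(std::ostream& out) const override {
        out << "SQUARE " << side_ << '\n';
    }
private:
    double side_;
};

class Rect : public Shape {
public:
    Rect(double width, double height) : width_(width), height_(height) {}
    double area() const override { return width_ * height_; }
    void save(std::ostream& out) const override {
        out << "RECT " << width_ << ' ' << height_ << '\n';
    }
private:
    double width_;
    double height_;
};

// Rebuild one object from its tag.
// Returns nullptr at end of input, on an unknown tag, or on bad data.
std::unique_ptr<Shape> loadShape(std::istream& in) {
    std::string tag;
    if (!(in >> tag)) return nullptr;
    if (tag == "SQUARE") {
        double side;
        if (in >> side) return std::make_unique<Square>(side);
    } else if (tag == "RECT") {
        double width, height;
        if (in >> width >> height) return std::make_unique<Rect>(width, height);
    }
    return nullptr;
}
```

The stored records have different lengths, so this scheme is comfortable only with sequential access, which is the limitation noted above.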
MULTIPLE INHERITANCE
At first, multiple inheritance is an appealing idea, because things in the real world belong to multiple categories. I belong not only to the class Human, but also to various overlapping subclasses such as Employee, Husband, Voter, CarOwner, and CobolProgrammer. Each of these classes has its own distinct set of attributes and methods.
In practice, multiple inheritance exacts a price. Membership in multiple classes adds complexity by multiplying the number of attributes and methods which apply to an object. In addition, it introduces conflicts among competing base classes if they have attributes or methods with the same name. Languages which support multiple inheritance have ways of resolving these conflicts, but those ways tend to be either subtle or inflexible. At best they provide opportunities for errors and confusion. There is no end to the categories to which something could belong. Your job as class designer is to find the minimal set of abstractions which are adequate for the job at hand. Multiple inheritance may be appropriate in a given case. However, the temptation to use multiple inheritance may be a clue that the class design is muddled, or that OO is not a good fit.
POLYMORPHISM
If you see a natural and satisfying use for polymorphism, then OO is likely to be a good choice, for two reasons:
The usefulness of polymorphism suggests that your classes are a good fit for the entities they represent.
If the problem lends itself to a polymorphic solution, then other approaches are likely to work badly. You can simulate polymorphism by branching on type codes, but this kind of coding is ugly, awkward, and difficult to extend to new classes.
For example: I once used polymorphism in a menu system. My base class MenuItem had a polymorphic method doAction and three subclasses: Program (which invoked some other program), Submenu, and Exit. The menu code didn't know or care which was which; it just invoked the doAction method for whichever item the user picked.
Later I added a fourth subclass PrevMenu, which returned to the previous menu without exiting. This addition was trivial because the menu code didn't have to change. If I hadn't used polymorphism, I would have had to find all the right IF statements and add another branch.
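A rough C++ reconstruction of that menu design might look like the following. It is a sketch, not the original code: the class names come from the description above, but the Menu alias, the pick function, and the strings returned by doAction are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

class MenuItem {
public:
    virtual ~MenuItem() = default;
    // Each kind of item decides what picking it means.
    virtual std::string doAction() = 0;
};

class Program : public MenuItem {
public:
    std::string doAction() override { return "run program"; }
};

class Submenu : public MenuItem {
public:
    std::string doAction() override { return "open submenu"; }
};

class Exit : public MenuItem {
public:
    std::string doAction() override { return "exit"; }
};

// The later addition: no change to pick() was needed to support it.
class PrevMenu : public MenuItem {
public:
    std::string doAction() override { return "return to previous menu"; }
};

using Menu = std::vector<std::unique_ptr<MenuItem>>;

// The menu code: no IF statements on the kind of item.
std::string pick(const Menu& menu, std::size_t choice) {
    assert(choice < menu.size());  // the caller must pass a valid index
    return menu[choice]->doAction();
}
```

All the knowledge about item types lives in the subclasses, which is why adding PrevMenu left pick untouched.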
These factors are bad news for any methodology, but OO can make it especially difficult to adapt to changing requirements. The people who change the rules don't care about your design. New rules may cross the boundaries which the initial design so carefully erected.
For example, consider a Human Resources system for managing payroll and benefits. (I've never worked on an HR system, so I'm just making things up. If I have committed any howlers they merely prove my ignorance; they don't disprove my point.)
Our employees are not all the same. Union members are governed by a collective bargaining agreement, and we withhold union dues from their paychecks. The others receive a different benefit package and pay no union dues.
In the bad old days before OO, you would have designed an Employee record with a flag to indicate union or non-union. Whenever necessary, your programs would test that flag with an IF statement.
Armed with OO methodology, however, you start off with a base class, Employee, for the things which are common to all employees: name, Social Security Number, home address, and so forth. Then you design two subclasses of Employee: UnionEmployee and Manager, with a CalculatePay method for each. Your program simply calls Employee.CalculatePay, and through the magic of polymorphism, selects the appropriate version of this method. You're using a relational database, not a fancy OO database. Somehow you figure out a way your database can support two different kinds of employees, and you write a layer of interface code to translate between relational concepts and OO concepts.
However, it turns out that Managers aren't all alike, either. Some are paid by the hour, and others are salaried. No problem: you design two more subclasses, HourlyManager and SalariedManager, and tinker with your database accordingly.
In talking with the HR manager, you discover that upper-level managers, known as Officers, are entitled to additional benefits such as stock options. You add an Officer class. A few of the Officers sit on the Board of Directors, along with some outside Directors. The company pays Directors for attending board meetings, and also pays insurance premiums to indemnify them against shareholder lawsuits. Two more classes enter the design, one of them using multiple inheritance to combine Officer and Director.
After coding for a while, you discover that union stewards (but not ordinary union members) can attend union committee meetings on company time. You add a UnionSteward class so that CalculatePay can reflect this rule.
After the system goes into production, your design must accommodate a series of further developments:
1. The company creates a new category of part-time workers. They are not union members, and not entitled to most of the usual benefits.
2. The company opens a plant in a right-to-work state. Some of the workers are covered by the collective bargaining agreement, but don't belong to the union or pay union dues.
3. The company hires college students as management interns over the summer. They are salaried but receive few benefits.
4. As a result of an SEC ruling, Officers who are also Directors are subject to additional reporting requirements when they receive or exercise stock options.
5. The company hires temporary employees for seasonal work. They aren't covered by the collective bargaining agreement, and receive less pay and fewer benefits than union members.
6. If a union member changes to part-time status, he or she can remain in the union for up to a year, with partial dues and partial benefits.
7. The union negotiates a new contract, and wins a seat on the Board of Directors for one of the unionized employees. The union representative does not receive stock options, and attends on company time without additional compensation, except for reimbursement for travel and expenses. If the board meeting is held on a day which is a legal holiday in the state in which the representative is normally employed, he or she is paid at overtime rates.
By this point you may wish you had stuck with flags and IF statements.
I apologize for making this example so long, but I don't know how else to suggest the kinds of arbitrary complications that the real world can impose. If anything, the real world is even worse -- sometimes much worse. An OO zealot might insist that a correct class design would have been able to accommodate these complications with a minimum of fuss. Perhaps so. My point is that in some problem domains it is unlikely that anyone but a virtuoso will come up with the correct design at the outset. By definition, most people are mediocre. It's not smart to rely on a methodology which requires a virtuoso.