Mediators in the Architecture of Future Information Systems

Gio Wiederhold, Stanford University

Computer, March 1992

Computer-based information systems, connected to worldwide high-speed networks, provide increasingly rapid access to a wide variety of data resources. This technology expands access to data, requiring capabilities for assimilation and analysis that greatly exceed what we now have in hand. Without intelligent processing, these advances will provide only a minor benefit to the user at a decision-making level. That brave user will be swamped with ill-defined data of unknown origin.

The problems. This article will expand on the two types of problems that exist:

• For single databases, a primary hindrance for end-user access is the volume of data that is becoming available, the lack of abstraction, and the need to understand the representation of the data.
• When information is combined from multiple databases, the major concern is the mismatch encountered in information representation and structure.

Mediators embody the administrative and technical knowledge to create information needed for user decision-making modules. The goal is to exploit the data technology puts within our reach.

Volume and abstraction. The volume of data can be reduced by selection. It is not coincidental that Select is the principal operation of relational database management systems, but selected data is still at too fine a level of detail to be useful for decision making. Further reduction is achieved by abstracting data to higher levels. Aggregation operations such as Count, Average, Standard-Deviation, Maximum, and Minimum provide some computational facilities for abstraction. Today, such abstractions are formulated within the end user's application, using a variety of domain knowledge. For most base data, more than one abstraction must be supported: For the sales manager, the aggregation is by sales region, while aggregation by customer income is appropriate for marketing. Figure 1 presents examples of required abstraction types.

Computations for abstraction may be extensive and complex. Collecting all instances that lead to an abstraction can involve recursion, say, locating all potentially useful flight segments for a trip. Such a computation cannot be specified with current database query languages. Hence, application programs are written by specialists to reduce the data. Using specific data-processing programs as intermediaries reduces flexibility and responsiveness for the end user. The knowledge that creates the abstractions is hidden and hard to share and reuse.
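To make the abstraction step concrete, here is a minimal sketch of such a computation, in the spirit of the granularity and temporal rows of Figure 1. It is only an illustration: the data layout and the seasonal indexes are invented, standing in for the domain knowledge a real mediator would encode.

```python
from collections import defaultdict
from datetime import date

# Invented seasonal indexes (relative sales expected per month); in a real
# mediator this table would capture a domain expert's knowledge.
SEASONAL_INDEX = {1: 0.8, 2: 0.8, 3: 0.9, 4: 1.0, 5: 1.0, 6: 1.1,
                  7: 1.1, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.2, 12: 1.1}

def monthly_summary(daily_sales):
    """Granularity abstraction: daily sales detail -> monthly totals."""
    totals = defaultdict(float)
    for day, amount in daily_sales:
        totals[(day.year, day.month)] += amount
    return dict(totals)

def seasonally_adjusted(monthly):
    """Temporal abstraction: divide out the expected seasonal pattern."""
    return {(y, m): round(total / SEASONAL_INDEX[m], 2)
            for (y, m), total in monthly.items()}

daily = [(date(1991, 11, 3), 120.0), (date(1991, 11, 21), 80.0),
         (date(1991, 12, 14), 95.0), (date(1992, 1, 9), 60.0)]
print(seasonally_adjusted(monthly_summary(daily)))
# {(1991, 11): 166.67, (1991, 12): 86.36, (1992, 1): 75.0}
```

The point is not the arithmetic but its ownership: once such knowledge lives in a shared module rather than in each end user's application, it can be inspected, maintained, and reused.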
Mismatch. As Figure 2 shows, data obtained from remote and autonomous sources often will not match in terms of name, scope, granularity of abstractions, temporal units, and domain definitions. The differences shown in the examples must be resolved before automatic processing can join these values. Without an extended processing paradigm, as proposed here, the information needed to initiate actions will be hidden in ever larger volumes of detail, scrollable on ever larger screens, in ever smaller fonts. In essence, the gap between information and data will be even wider than it is now. Knowing that information exists and is accessible gives end users expectations. Finding that it is not available in a useful form or that it cannot be combined with other data creates confusion and frustration. I believe the reason some users object to computer-based systems, saying they create information overload, is that they get too much of the wrong kind of data. (See the sidebar, "Knowledge versus data.")

Figure 1. Abstraction functions.
Granularity: Sales detail → Product summaries
Generalization: Product data → Product type
Temporal: Daily sales → Seasonally adjusted monthly sales
Relative: Product cost → Inflation-adjusted trends
Exception recognition: Accounting detail → Evidence of fraud
Path computation: Airline schedules → Trip duration and cost

Figure 2. Mismatches in data resources.
Key difference: "Alan Turing: The Enigma" (reference for reader) versus QA29.T8H63 (reference for librarian)
Scope difference: Employees paid (includes retirees) versus Employees available (includes consultants)
Abstraction grain: Personal income (from employment) versus Family income (for taxation, housing)
Temporal basis: Monthly budget (central office) versus Weekly production (factory records)
Domain semantics: Postal codes (one can cover multiple places) versus Town names (one can have multiple codes)
Value semantics: Excessive pay (per Internal Revenue Service) versus Excessive pay (per board of directors)

Knowledge versus data

I favor a pragmatic distinction between data and knowledge in this model. Data describes specific instances and events. It may be gathered automatically or clerically. The correctness of data can be checked versus the real world. Knowledge describes abstract classes. Each class can cover many instances. Experts are needed to gather and formalize knowledge. One item of knowledge can affect the use of many items of data; also, one item of new data can disprove or weaken existing knowledge.
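To illustrate how one of the Figure 2 mismatches might be resolved in software, here is a sketch of reconciling a temporal-basis mismatch: a factory's weekly production records are apportioned to calendar months so they can be joined with a central office's monthly budget. The data and the day-count apportioning rule are invented for illustration.

```python
from datetime import date, timedelta

def weekly_to_monthly(weekly):
    """Apportion each (week_start, value) record to months by day count."""
    monthly = {}
    for start, value in weekly:
        per_day = value / 7.0
        for i in range(7):
            d = start + timedelta(days=i)
            key = (d.year, d.month)
            monthly[key] = monthly.get(key, 0.0) + per_day
    return {k: round(v, 2) for k, v in monthly.items()}

# A week straddling January and February is split between both months:
production = [(date(1992, 1, 27), 700.0)]  # 5 days in Jan, 2 days in Feb
print(weekly_to_monthly(production))       # {(1992, 1): 500.0, (1992, 2): 200.0}
```

Every such reconciliation rule embodies domain knowledge, which is exactly what a mediator is meant to encapsulate and maintain.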
Use of a model. To visualize the requirements we will place on future information systems, let's consider the activities carried out today when decisions are being made. Making informed decisions requires applying a variety of knowledge to information about the state of the world. To manage this variety, we employ specialists. To manage the volume of data, we segment our databases. In these partitions, partial results are produced, abstracted, and filtered. The problem of making decisions is now reduced to the issue of choosing and evaluating the significance of the pieces of information derived in those partitions and fusing the important portions. For example, an investment decision for a manufacturer will depend on the fusion of information on its own production capability versus that of others, its sales experience in related products, the market for the conceived product at a certain price, the cost-to-price ratio appropriate for the size of the market, and the cost of the funds to be invested. Specialists would consider these diverse topics and consult multiple data resources to support their claims. The decision-maker will integrate and fuse that information to arrive at a single set of ranked alternatives, considering risk and long-range objectives in combining the results.

An effective architecture for future information systems must support automated information-acquisition processes for decision-making activities. By default, I model the solution following the partitioning seen in human-based support systems. However, most aspects of human behavior cannot be captured and formalized adequately. We merely define a modular architecture wholly composed of pieces of software that are available or appear to be attainable in a modest time frame, say 10 years. Modern hardware should be capable of dealing with the processing demands imposed by such software.

Current state. We are not starting from a zero base. Systems are now becoming available that are capable of achieving what Vannevar Bush envisaged nearly a half-century ago for his information management desk (Memex). We can select and scroll information on our workstation displays. We have access to remote data and can present the values in one of multiple windows. We can insert documents into files in our workstation, and we can annotate text and graphics. We can reach conclusions based on this evidence and advise others of decisions made. The vision in this article is intended to provide a basis for automated integration of such information. Neither Memex nor our current systems address that level of processing. At best, they provide graphic visualizations so that voluminous information is assimilated more easily by the end user.

A model of information processing

Information lets us choose among several otherwise indistinguishable actions. Let's again consider a simple business environment. Say a factory manager needs sales data to set production levels, a sales manager needs demographic information to project future sales, and a customer wants price and quality information to make purchase choices. Most of the information these people need can be represented by factual data and is available on some computer. Communication networks can make the data available wherever needed. However, before decisions are made, a considerable amount of knowledge also has to be applied. Today, most knowledge is available through various administrative and technical staff at institutions. Some knowledge is encoded in data-processing programs and expert systems for automated processing.

Use of supporting information for decision making is similar in partially automated and manual systems. In manual systems, the decision-maker obtains assistance from staff and colleagues who peruse files and prepare summaries and other documentation. With partial automation, the staff uses computers to prepare these documents. The decision-maker rarely uses the computer, because the information from multiple sources is too diverse for automatic integration.
Processing and applying knowledge. A technician will know how to select and transfer data from a remote computer to one used for analysis. A data analyst will understand the attributes of the data and define the functions to combine and integrate the data. A statistician might provide procedures to aggregate data on customers into groups that present distinctive behavior patterns. A psychologist might provide classification parameters that characterize the groups. Ultimately, it's up to a manager to assess the validity of the classifications that are made, use the information to make a decision, and assume the risk of making the decision. A public relations person might take the information and present it to stockholders, the people who eventually assume the risk.

Since these tasks are characterized by data and knowledge gathered in the past and projected into the future, we term these tasks planning. (This definition of planning is more extensive than that used in artificial intelligence research, although the objectives are the same.) To be able to deal with support for planning in a focused way, we model the information-processing aspects. Figure 3 illustrates the two distinct feedback loops and their interaction. The data loop closes when the effects of actions taken are recorded in the database. The knowledge loop closes when recently gained knowledge is made available so it can be used for further selection and data-reduction decisions. At their interaction points, information is created. Those points are of prime concern for future systems.

Creation of information. The model in Figure 3 identifies the interaction points where, during processing, information is created. Since getting information means that something novel has been learned, one or more of the following conditions have to hold:

• The information is obtained from a remote source that was previously not known locally (Step 3.ii.b of Figure 3). Here, the information system must provide communication and remote access support. A special case of this condition occurs when a database is used for recall, to provide data we knew once but cannot remember with certainty. The database component is used here to communicate over time, from the past to the present.
• Two previously distinct facts are merged or unified (Step 3.ii.c). A classic, although trivial, example is finding one's grandfather via transitivity of parents. In realistic systems, unification of data also involves computation of functions, say, the average income and its variation among groups of consumers (see the sketch after this list).
• Multiple results are fused using pragmatic assessments of the quality and risks associated within Step 4. Here, derived abstractions, rather than facts, are combined; the processing techniques are those associated with symbolic processing in rule-based expert systems, although they are also found coded within application programs. In our example, the market specialist might want to unify incomes of current consumers with their reading habits to devise an advertising strategy.
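As a toy rendering of the two unification cases, here is a sketch with invented names and figures: deriving grandparents by transitivity over stored parent facts, and unifying instance data by computing a function, the average income per consumer group.

```python
parents = {"ann": ["carl", "dana"], "carl": ["eve", "frank"]}

def grandparents(person):
    """Unify two 'parent' facts via transitivity."""
    return [gp for p in parents.get(person, [])
               for gp in parents.get(p, [])]

incomes = [("urban", 42000), ("urban", 38000), ("rural", 29000)]

def average_income_by_group(records):
    """Unification by computation: aggregate instances into group statistics."""
    sums, counts = {}, {}
    for group, income in records:
        sums[group] = sums.get(group, 0) + income
        counts[group] = counts.get(group, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}

print(grandparents("ann"))               # ['eve', 'frank']
print(average_income_by_group(incomes))  # {'urban': 40000.0, 'rural': 29000.0}
```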
Databases record detailed data for each of many instances or events. Reducing this detail to a few abstract cases increases the information content per element. Of these abstractions, only a small, manageable number of justified results is brought to the decision-maker. For instance, the market analyst has made it possible to base decisions on consumer groups rather than individual consumers. Certain groups may be unlikely purchasers and are therefore not targeted for promotions. While the behavior of any individual may not adhere to the rules hypothesized in the prior steps, the expected behavior of the aggregate population should be close to the prediction. Failures only occur if the underlying data contains many errors or if we have serious errors in our knowledge. Uncertainty, however, is common.

Uncertainty. We cannot predict the future with certainty. For automation of full-scale information systems, the processing of uncertainty measures must be supported, although subtasks do exist whereby results can be precisely defined. Uncertainties within a domain may be captured by a formal model. The uncertainty of the results of an application is based on the uncertain precision of source information and the uncertainty created at each step where information is merged. For example, we collect domain-specific observations based on some criterion, say, people living in a certain postal-code area. We also have data to associate an income level with that postal-code area. At the same time, we might know the income distribution of people buying some product. The desired result requires unification of the postal code with income to estimate potential sales. Unfortunately, there is no logical reason why such a unification should be correct. We have some formal classes, namely, people with a certain postal code, and some other formalizable classes based on income. In addition, some natural classes exist that are not formalized but are intuitively known. In our example, these comprise the potential purchasers, of which there are several subgroups, including those found in the database who bought in the past and those who may buy the planned products in the future. For the future class, only informal criteria can be formulated. The marketing manager will use definable classes, by postal code and by observed and recorded purchasing patterns, to establish candidate members for the natural class of potential consumers. These classes overlap; the more they overlap, the more correct the decision-maker's predictions will be. If we infer from classes that do not match well, the uncertainty attached to the generated plans will be greater. But uncertainty is the essence of decision making and is reflected in the risk that the manager takes on. If we only have to report the postal codes for our consumers, we do not need a manager with decision-making skills or intelligent support software. Hence, uncertainty is created when formal and natural classes are matched. Communication of knowledge and data is necessary to achieve this confluence. The communication may occur over space or over time. The information systems we consider must support both communication and fusion of data and knowledge.

Decision support

Major steps in the information-processing loops include:

1. Data are made available. These are either factual observations or results from prior processing, or combinations thereof.
2. Knowledge is made available. It derives from formal training and experience.
3. Knowledge about the data and its content is applied:
   i. Validation: Errors in data or knowledge are identified.
   ii. Selection: Subsets of available data are (a) defined, (b) obtained, and (c) merged.
   iii. Reduction: The data found are summarized to an appropriate level of abstraction.
4. Results are made available and multiple results are fused.
5. The combined information is utilized in two ways:
   i. Actions are taken that will affect the state of the world.
   ii. Unexpected results will augment the experience base of the participants and of others who receive this information, increasing their knowledge.
6. The process loops are closed with two independent steps:
   i. The actions and their effect are observed and recorded in some database.
   ii. The knowledge is recorded to effect subsequent validation, data definition, selection, reduction, or fusion.

Figure 3. Knowledge and data feedback loops and their interaction.
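Read as pseudocode, these six steps suggest a processing skeleton. The sketch below is my own schematic rendering, under the assumption that each step can be supplied as a callable by the knowledge layer; none of the names are prescribed by the model.

```python
def run_loop(database, knowledge, act):
    # Steps 1-2: data and knowledge are made available.
    data = database["facts"]

    # Step 3: validate, select, and reduce, guided by knowledge.
    valid = [d for d in data if knowledge["validate"](d)]
    selected = [d for d in valid if knowledge["select"](d)]
    abstraction = knowledge["reduce"](selected)

    # Steps 4-5: results are fused and acted upon.
    decision = knowledge["fuse"](abstraction)
    outcome = act(decision)

    # Step 6.i: the data loop closes; effects are recorded in the database.
    database["facts"].append(outcome)
    # Step 6.ii: the knowledge loop closes; surprises refine future processing.
    if knowledge["is_unexpected"](outcome):
        knowledge["revise"](outcome)
    return decision
```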
A manual approach to mediation

The concept of mediation is related to the notion of having corporate information centers, as promoted by IBM and others. These corporate resources are staffed and equipped with tools to aid any user needing information. The needs and capabilities of an information center, however, differ from those of automated mediators:

1. A single center, or mediator for that matter, cannot deal with the variety of information that is useful for corporate decision making.
2. Automation of the function will be necessary to achieve acceptable response time and growth of knowledge and its quality over time.
3. The user should not be burdened with the task of seeking out information sources. This task, especially if it is repetitive, is best left to an interaction of application programs in a workstation and automated mediation programs.

The information center notion initiates yet another self-serving bureaucracy within a corporation. Effective staff is not likely to be content in the internal service roles that an information center provides, so turnover of staff and its knowledge will be high. The only role foreseen in such a center is mediation: bringing together potential users with candidate information. To manage mediation, modularization instead of centralization seems to be essential. Modularity is naturally supported by a distributed environment, the dominant computing environment in the near future.

Further reading

Atre, S., Information Center: Strategies and Case Studies, Vol. 1, Atre Int'l Consultants, Rye, N.Y., 1986.
Wetherbe, J.C., and R.L. Leitheiser, "Information Centers: A Survey of Services, Decisions, Problems, and Successes," Information Systems Management, Vol. 2, No. 3, 1985, pp. 3-10.

Change. Our systems must also be able to deal with continuing change. Both data and knowledge change over time because the world changes and because we learn things about our world. Rules that were once valid eventually become riddled with exceptions, and the specialist who does not adapt finds that his or her work loses value. An information system architecture must deal explicitly with knowledge maintenance. (See the sidebar entitled "A manual approach to mediation.")

Workstation applications. The new generations of capable workstations provide the systems environment for planning activities. For planning, users need to interact with their own hypotheses, save intermediate results for comparison of effects, and display alternate projections over time. This interaction with information establishes the insights you need to gain confidence in your decisions. Our architecture must not constrain the users as they exercise creativity at the workstation. A variety of data resources will be needed for planning processes. But help is needed to deal with the aggregation and mismatch problems being encountered. By providing comprehensive support for access to information, the complexity of the end user's applications can be reduced to such an extent that quite involved analyses remain manageable.

Information system components

The components available today for building information systems are positioned along a data highway provided by modern communication technology. The interaction of the components will be primarily constrained by logical and informational limitations, not by physical linkages. When we place the components into the conceptual architecture, we will recognize lacunae, that is, places where there are no existing components, only inadequate or uncooperative ones. We will see where the components must work together. Effective systems can be achieved only if the data and knowledge interfaces of the components are such that cooperation is feasible.
Network interfaces. Modern operating and network systems simplify the users' administrative tasks by handling all hardware resources and interfaces. Interfaces for remote access from the users' workstation nodes to the nodes containing database servers are based on communication protocols and formats. They ignore the contents of the messages. Mediators, on their network nodes, will provide services that deal with the content of the data being transmitted. Mediators can be interposed using existing communication network protocols.

Data and knowledge resources. There is a wide variety of data resources. We might classify them by how close they are to the source. Raw data obtained from sensors, such as purchase records from point-of-sale scanners, or, on a different scale, images recorded by earth satellites, are at the factual extreme. Census and stock reports contain data that have gone through some processing, but they will be seen as facts by most users. At the other extreme are textual representations of knowledge. In general, books, research reports, and library material contain accumulated knowledge, contributed by writers and their colleagues. Unfortunately, from our processing-oriented viewpoint, that knowledge is difficult to integrate and fuse. For instance, combining a travel guide description with airline and bus schedules to plan a trip is a research challenge. The tables, maps, and figures in such documents are data as well, but rarely in a form that can be processed without human input.

The mediator architecture

Intelligent and active use of information requires a class of software modules that mediate between the workstation applications and the databases. Mediation simplifies, abstracts, reduces, merges, and explains data. The examples of mediation shown in the "Mediation" sidebar are specialized and are tied to a specific database or to a particular application.

Mediation

In this article, the term mediation covers a wide variety of functions that enhance stored data prior to their use in an application. Mediation makes an interface intelligent by dealing with representation and abstraction problems that you must face when trying to use today's data and knowledge resources. Mediators have an active role. They contain knowledge structures to drive transformations. Mediators may store intermediate results. Some examples of mediation found in current information systems are shown here. Most readers will be able to add entries from their experience.

Transformation and subsetting of databases using view definitions and object templates. These techniques reorganize base data into new configurations appropriate to specific users and applications.

Barsalou, T., R.M. Chavez, and G. Wiederhold, "Hypertext Interfaces for Decision-Support Systems: A Case Study," Proc. Medinfo 89, IFIP, 1989, pp. 126-130.
Basu, A., "Knowledge Views in Multiuser Knowledge-Based Systems," Proc. Fourth IEEE Int'l Data Eng. Conf., IEEE CS Press, Los Alamitos, Calif., Order No. 827 (microfiche only), 1988, pp. 346-353.
Chamberlin, D.D., J.N. Gray, and I.L. Traiger, "Views, Authorization, and Locking in a Relational Data Base System," Proc. 1975 Nat'l Computer Conf., Vol. 44, AFIPS Press, pp. 425-430.
Lai, K-Y., T.W. Malone, and K-C. Yu, "Object Lens: A Spreadsheet for Cooperative Work," ACM Trans. Office Information Systems, Vol. 6, No. 4, Oct. 1988, pp. 332-353.
Wiederhold, G., "Views, Objects, and Databases," Computer, Vol. 19, No. 12, Dec. 1986, pp. 37-44.
Computations that support abstraction and generalization over underlying data. These are needed to bring data at a low level of detail to a higher level. Typical operations involve statistical summarization and searching for exceptions. Abstractions are also needed to resolve mismatches.

Adiba, M.E., "Derived Relations: A Unified Mechanism for Views, Snapshots, and Distributed Data," Proc. Seventh Conf. Very Large Data Bases, C. Zaniolo and C. Delobel, eds., IEEE Computer Soc. Press, Los Alamitos, Calif., Order No. 371 (microfiche only), 1981, pp. 293-305.
Chen, M.C., and L. McNamee, "A Data Model and Access Method for Summary Data Management," Proc. Fifth IEEE Int'l Data Eng. Conf., IEEE CS Press, Los Alamitos, Calif., Order No. 1915, 1989, pp. 242-249.
DeZegher-Geets, I., et al., "Summarization and Display of Online Medical Records," M.D. Computing, Vol. 5, No. 3, Mar. 1988, pp. 38-46.
Ozsoyoglu, Z.M., and G. Ozsoyoglu, "Summary-Table-By-Example: A Database Query Language for Manipulating Summary Data," Proc. First IEEE Int'l Data Eng. Conf., IEEE CS Press, Los Alamitos, Calif., Order No. 533 (microfiche only), 1984, pp. 193-202.

Methods to gather an appropriate amount of data. Conventional database management systems do not deal well with data that are recursively linked. To gather all instances, computations to achieve closure may process data from a relational database, as in some logic and database projects. To select sufficient instances for a narrowly phrased query, it is possible to broaden a search by generalization. A frequent abstraction is to derive temporal interval representations from detailed event data.

Chaudhuri, S., "Generalization and a Framework for Query Modification," Proc. Sixth IEEE Int'l Data Eng. Conf., IEEE CS Press, Los Alamitos, Calif., Order No. 2025, 1990, pp. 139-145.
Ullman, J.D., Principles of Database and Knowledge-Base Systems, Vol. II, Computer Science Press, 1989.
Wiederhold, G., S. Jajodia, and W. Litwin, "Dealing with Granularity of Time in Temporal Databases," Lecture Notes in Computer Science, Vol. 498, R. Anderson et al., eds., Springer-Verlag, N.Y., 1991, pp. 124-140.

Much information is available in the form of text. Most text processing is limited today to selection and presentation. More can be done. Most routine reports tend to have a degree of structure that makes some analysis feasible. Developing standards for text representation will help, regularizing textual information for further processing and mediation.

Callahan, M.V., and P.F. Rusch, "Online Implementation of the Chemical Abstracts Search File and the Chemical Abstracts Service Registry Nomenclature File," Online Rev., Vol. 5, No. 5, Oct. 1981, pp. 377-393.
McCune, B.P., et al., "Rubric: A System for Rule-Based Information Retrieval," IEEE Trans. Software Eng., Vol. SE-11, No. 9, Sept. 1985, pp. 939-945.
Sager, N., et al., Medical Language Processing: Computer Management of Narrative Data, Addison-Wesley, Reading, Mass., 1987.

Methods to access and merge data from multiple databases. These have to compensate for mismatch at the level of database systems, database structure, and the representation and meaning of the actual data values. Mismatch is often due to differing temporal representations, say, monthly budgets and weekly production figures. These methods may induce uncertainty in the results because of mismatched sources.

Chiang, T.C., and G.R. Rose, "Design and Implementation of a Production Database Management System (DBM-P)," Bell System Technical J., Vol. 61, No. 9, Nov. 1982, pp. 2,511-2,528.
Dayal, U., and H.Y. Hwang, "View Definition and Generalization for Database Integration in Multibase: A System for Heterogeneous Databases," IEEE Trans. Software Eng., Vol. SE-10, No. 6, Nov. 1984, pp. 628-645.
DeMichiel, L., "Performing Operations Over Mismatched Domains," IEEE Trans. Knowledge and Data Eng., Vol. 1, No. 4, Dec. 1989, pp. 485-493.
Litwin, W., and A. Abdellatif, "Multidatabase Interoperability," Computer, Vol. 19, No. 12, Dec. 1986, pp. 10-18.
Sacca, D., et al., "Description of the Overall Architecture of the KIWI System," ESPRIT 85, EEC, Elsevier, 1986, pp. 685-700.
Mediators may maintain derived data for the sake of efficiency. Having derived data reduces the need to access databases, but the intermediate knowledge has to be maintained. Research into truth maintenance is relevant here. There is also a problem of maintaining integrity under concurrent use.

Filman, R.E., "Reasoning with Worlds and Truth Maintenance in a Knowledge-Based Programming Environment," Comm. ACM, Vol. 31, No. 4, Apr. 1988, pp. 382-401.
Hanson, E., "A Performance Analysis of View Materialization Strategies," Proc. ACM SIGMOD, 1987, pp. 440-453.
Kanth, M.R., and P.K. Bose, "Extending an Assumption-Based Truth Maintenance System to Databases," Proc. Fourth IEEE Int'l Data Eng. Conf., IEEE CS Press, Los Alamitos, Calif., Order No. 827 (microfiche only), 1988, pp. 354-361.
Roussopoulos, N., and H. Kang, "Principles and Techniques in the Design of ADMS," Computer, Vol. 19, No. 12, Dec. 1986, pp. 19-25.

Mediators are modules occupying an explicit, active layer between the user applications and the data resources. They will be accessed by application programs residing in the user workstations. (Recall that our goal is a sharable architecture.) Mediators form a distinct middle layer, making the user applications independent of the data resources. What are the transforms needed in such a layer, and what form will the modules supporting this layer have? The responses to these questions are interrelated.

Mediators. Having listed some examples of mediators in use or planned for specific tasks, I offer this general definition: A mediator is a software module that exploits encoded knowledge about certain sets or subsets of data to create information for a higher layer of applications. We place the same requirements on a mediation module that we place on any software module: It should be small and simple so that it can be maintained by one expert or, at most, by a small and coherent group of experts.

An important, although perhaps not essential, requirement I'd like to place on mediators is that they be inspectable by potential users. For instance, the rules used by a mediator using expert system technology can be obtained by the user, as in any good cooperative expert system. In this sense, the mediators provide data about themselves in response to inspection, and such data could be analyzed by yet another mediator module, an inspector mediator. Since there will eventually be a great number and variety of mediators, users have to be able to choose among them. Inspectability enables that task. For instance, we might have distinct mediators that can provide the names of the best consultants for database design. Alternate metamediators are likely to use different evaluation criteria: one may use the number of publications and another the number of clients. Some metamediators will have to exist that merely provide access to catalogs listing available mediators and data resources. The search may go either way: For a given data source, it may be necessary to locate a knowledgeable mediator; for a desirable mediator, we need to locate an adequate data resource. It will be essential that the facilities provided by these metalevel mediators can be integrated into the general processing model, since search for information is always an important aspect of information processing. Where search and analyses are separated, as is still common today in, for instance, statistical data processing, trying to find and understand the data is often the most costly phase of information processing.
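The definition above translates naturally into a module interface. The following sketch is one hypothetical rendering, not code from this project: a mediator that exposes its encoded knowledge in response to inspection, and an inspector metamediator that chooses among mediators by analyzing their self-descriptions. All names and structures are invented.

```python
class Mediator:
    """A software module that exploits encoded knowledge about certain
    data sets to create information for a higher layer of applications."""

    def __init__(self, name, rules):
        self.name = name
        self.rules = rules  # the encoded domain knowledge

    def query(self, request, data):
        """Apply every rule whose condition matches the request."""
        return [rule["derive"](data) for rule in self.rules
                if rule["applies"](request)]

    def inspect(self):
        """Mediators provide data about themselves in response to inspection."""
        return {"name": self.name,
                "rule_count": len(self.rules),
                "topics": sorted({r["topic"] for r in self.rules})}

class InspectorMediator:
    """A metamediator: analyzes other mediators' self-descriptions."""

    def choose(self, mediators, topic):
        return [m.name for m in mediators if topic in m.inspect()["topics"]]

payroll = Mediator("payroll-summary", rules=[
    {"topic": "salaries",
     "applies": lambda req: req == "salary-stats",
     "derive": lambda data: sum(data) / len(data)}])
print(payroll.inspect())
print(InspectorMediator().choose([payroll], "salaries"))  # ['payroll-summary']
```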
Since many databases are autonomous, it is desirable that only a limited and recognizable set of mediators depend on any one of them. Focusing data access through a limited number of views maintained by these mediators provides the data independence necessary for databases that are evolving autonomously. Currently, compatibility constraints are hindering growth of databases in terms of structure and scope, since many users are affected. As the number of users and the automation of access increase, the importance of indirect access via mediators will increase.

Three architectural layers. We create a central layer by distinguishing the function of mediation from the user-oriented processing and from database access. Most user tasks will need multiple, distinct mediators for their subtasks. A mediator uses one or a few databases. As Figure 4 shows, the interfaces to be supported provide the cuts where communication network services are needed. Unfortunately, the commonality of functions described in the examples (see the "Mediation" sidebar) does not extend to an architectural commonality: All the examples cited are bound to the data resources and the end users' applications in their idiosyncratic ways. This is where new technology must be established if fusion at the application level is to be supported. Accessing one mediator at a time does not allow for fusion, and seeing multiple results on distinct windows of one screen does not support automation of fusion.

Mediator interfaces. The two interfaces to the mediator layer are the most critical aspect of this three-layer architecture. Today's mediating programs use a wide variety of interface methods and approaches. The user learns one or a few of them and remains committed to that choice until its performance becomes wholly unacceptable. Unless the mediators are easily and flexibly accessible, the model of common information access I envisage is bound to remain fictitious. The research challenge then lies in the interface and its support. Our hardware environment implies that mediators can live on any node, not just on workstation and database hosts.

User's workstation interface to the mediators. The range of mediator capabilities is such that a high-level language should evolve to drive them. Here, I am thinking of language concepts, rather than interface standards, to indicate the degree of extensibility that must be provided if the mediating concepts are to be generalized. Determining an effective interface between the workstation application and the mediators will be a major research effort in establishing systems sharing this architecture. It appears that a language is needed to provide flexibility, composability, iteration, and evaluation in this interface. Descriptive, but static, interface specifications seem unable to deal with the variety of control and the information flow that must be supported. The basic language structure should permit incremental growth so that new functions can be supported as mediators join the network to provide new functionality. It is important to observe that I do not see a need for a user-friendly interface. What is needed here is a machine- and communication-friendly interface. Application programs executing on the users' workstations can provide the type of display and manipulation functions appropriate to their users.
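To suggest what machine- and communication-friendly might mean in practice, here is a hypothetical sketch in which a request is composed as a small expression tree and serialized for shipment to a mediator node. The operators are invented; the point is composability and incremental extensibility, not this particular syntax.

```python
import json

def select(source, where):
    return {"op": "select", "source": source, "where": where}

def reduce_to(expr, level):
    return {"op": "reduce", "arg": expr, "level": level}

def rank(expr, criterion, top):
    return {"op": "rank", "arg": expr, "by": criterion, "top": top}

# A workstation application composes a request programmatically; new
# operators can be added as mediators with new functions join the network.
request = rank(reduce_to(select("sales_db", {"region": "west"}), "monthly"),
               criterion="growth", top=10)
print(json.dumps(request))  # a machine-friendly wire format, not an end-user syntax
```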
Omitting the criterion of user friendliness avoids the dichotomy that has led to inadequacies in the Structured Query Language (SQL), which tries to be user friendly, while its predominant use is for programmed access. Standards needed here can only be defined after experience has been obtained in sharing these resources to support the high-level functions needed for decision making.

The mediator to the database management system interface. Existing database standards, such as SQL and the Remote Data Access (RDA) protocol, provide a basis for database access by mediators. Relational concepts, such as selection and views, are a good starting point, but much flexibility is possible. A mediator dealing with a specific database need not be constrained to a particular protocol, while a more general mediator will gain applicability through a standard interface. A mediator that combines information from multiple databases can use its knowledge to control the merging process, specifying relational operations directly. Joins may, for instance, be replaced by explicit semijoins, so that intelligent filtering can occur during processing. Still, dealing with multiple sources is likely to lead to incompleteness. Outer joins are often required for data access to avoid losing objects with incomplete information. New access languages are needed to manage sensor-based and simulation processes. Such systems also provide data to be abstracted and fused.
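A small sketch of such knowledge-controlled merging, with invented data: the mediator ships only keys to the second database, a semijoin-style filter that reduces transmission, and then performs an outer join so that objects with incomplete information are not lost.

```python
customers = {"c1": {"name": "Ada"}, "c2": {"name": "Boris"}, "c3": {"name": "Chen"}}
orders_db = {"c1": [250.0, 75.0], "c3": [40.0]}  # a second, autonomous database

def semijoin_keys(custs):
    """Ship only keys to the remote site so it can filter before transmitting."""
    return set(custs)

def fetch_orders(keys):
    return {k: v for k, v in orders_db.items() if k in keys}

def outer_merge(custs, orders):
    """Outer join: retain customers with no orders rather than losing them."""
    return {k: {**rec, "orders": orders.get(k, [])} for k, rec in custs.items()}

merged = outer_merge(customers, fetch_orders(semijoin_keys(customers)))
print(merged["c2"])  # {'name': 'Boris', 'orders': []} -- incomplete but retained
```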
The separation of user applications and data sources provided by mediating modules allows reorganization of data structures and redistribution of data over the processing nodes of communication networks without affecting the functionality of the modules. The three-layer architecture then makes an explicit trade-off favoring flexibility over integration. The argument is the distinction of data and the results of mediator modules.

Figure 4. Three layers of a mediator architecture.
User layer: independent applications on workstations, managed by decision-makers; results flow up for decision making. Network services connect this layer to the information servers.
Mediator layer: multiple mediators, managed by domain specialists. Network services connect this layer to the data servers.
Base layer: multiple databases, managed by database administrators; input arrives as real-world changes.

Sharing of mediator modules. Since we are getting access to so much more data from a variety of sources, arriving at ever higher rates, automated processing will be essential. The processing tasks needed within the mediators, selection, fusion, reduction, abstraction, and generalization, are sketched in the interaction model of Figure 3. Diverse mediator modules will combine these functions in various ways to serve user applications at the decision-making layer above. The mediator modules will be most effective if they can serve a variety of applications. The applications will compose their tasks as much as possible by acquiring information from the set of available mediators. Unavailable information may motivate the creation of new mediators.

Sharing reinforces the need for two types of partitioning. First, there are the three horizontal layers supporting end users, mediators, and databases. Second, each of those layers will be vertically partitioned. There will be multiple user applications, each using various configurations of mediators. Each of the mediators in turn will use distinct views over one or several databases. Vertical partitioning does not create a hierarchy. Just as databases are justified by a diversity of shared usage, mediators should be sharable. Today's expert systems are rarely modular and sharable, so their development and maintenance cost is harder to amortize. For instance, the mediation module that can deal with inflation adjustment can be used by many applications. The mediator that understands postal codes and town names can be used by the post office, express delivery services, and corporate mail rooms. Partitioning leads to modules, as sketched in Figure 5.

Figure 5. Interfaces for information flow: queries and relevant responses pass between workstation applications and mediators; formatted queries, bulky responses, inspection, and triggered events pass between mediators, databases, and the experts who maintain them. All are distributed over nationwide networks.

Sharability of information requires that database results can be configured according to one of several views. Mediators, being active, can create objects for a wide variety of orthogonal views. Making complex objects themselves persistent, on the other hand, binds knowledge to the data, hampering sharability. The loss of performance due to the interposition of a mediator can be overcome, for instance, via techniques listed in the section entitled "SODs." These arguments do not yet address the distribution of the mediators I envisage.

Available interfaces. We need interface protocols for data and knowledge. Mediation defines a new layer within the application layer defined by the open-system architecture. Open-systems layers will soon provide good communication support. However, as mentioned earlier, communication of data alone does not guarantee that the data will be correctly understood for processing by the receiver. Mediation considers the meaning assigned to the bits stored; Figure 2 contains some examples.

Distribution of mediators. I have implied throughout this article that mediators are distinct modules, distributed over the network. Distribution can be motivated by greater economy for access, by better locality for maintenance, and by autonomy. For mediators the two latter arguments are the driving force for distribution. Why shouldn't mediators be attached to databases? In many cases, that may be feasible; in general it is not appropriate, for these reasons:
• A mediator contains knowledge that is beyond the scope of the database proper. A database programmer dealing with, say, a factory production-control system cannot be expected to foresee all the strategic uses of the collected information.
• Concepts of abstraction are not part of database technology today. The focus has been on reliable and consistent management of large volumes of detailed facts.
• Intelligent processing of data will often involve dealing with uncertainty, adding excessive and undesirable complexity to database technology.
• Many mediators will access multiple databases to combine disjointed sets of data prior to analysis and reduction.

Similarly, we can argue that the mediators should not be bound to the users' workstation applications. Again, the functions that mediators provide are different in scope from the tasks performed on the workstations. Workstation applications might use a variety of mediators to explore the data resources.

Maintenance issues, a major motivation for keeping mediators distinct, have received insufficient attention in the past. During the initial stage of most projects that developed expert systems, knowledge bases simply grew, and the cost of knowledge acquisition dominated the cost of knowledge maintenance. Many projects, in fact, assumed implicitly that knowledge, once obtained, would remain valid for all times. Although some fundamental rules may indeed not change for a long time, new concepts arise, older ones are partitioned, and definitions are refined as demands change. The importance of knowledge maintenance leads to systems composed of small knowledge units focused on specific domains. Maintenance of knowledge stored within an application by an outside expert system is intrusive and risky. The end user should make the decision when to incorporate new knowledge. It is best to keep the mediator modules distinct from the applications.

Mediators are associated with the domain expert, but may be replicated and shipped to other network nodes to increase their effectiveness. Specialization increases the power and maintainability of the mediators and provides choices for users. The efficiency concerns of separating knowledge in mediators and data in databases can be mitigated by replication. Since mediators (incorporating only knowledge and no factual data) are relatively stable, they can be replicated as needed and copied onto nodes along the data highway where they are maximally effective. Their content certainly should not change during a transaction. As long as the mediators remain small, they can also be easily shipped to a site where substantial data volumes have to be processed.
Triggers for knowledge maintenance. I have incorporated one aspect of mediation into Figure 5 that has not yet been discussed. Since the knowledge in the mediator must be kept up to date, placing triggers or active demons in databases is useful. Now the mediators can be informed when the database and, by extension, the real world changes. The owner of the mediator should ensure that such changes are, in due time, reflected in the mediator's knowledge base. Again justified by the analogy to human specialists, I consider that a mediator is fundamentally trustworthy but is inspectable when suspicions of obsoleteness arise. For instance, an assumption, say, that well-to-do people buy big cars may be used in the marketing mediator, but it is possible that over time this rule will become invalid. I expect that the base data will be monitored for changes and that exceptions to database constraints will trigger information flow to the mediator. In a rule-based mediator the certainty factor of rules can be adjusted. If the uncertainty exceeds a threshold, the mediator can advise its creator, the domain expert, to abandon this rule. The end user need not get involved.
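Here is a sketch of that maintenance cycle, with an invented threshold and decay rate: a database trigger reports counterexamples, the rule's certainty factor decays, and crossing the threshold advises the domain expert, not the end user.

```python
class Rule:
    def __init__(self, text, certainty=0.9):
        self.text, self.certainty = text, certainty

class MarketingMediator:
    THRESHOLD = 0.5  # assumed cutoff below which the owning expert is alerted

    def __init__(self, expert_alert):
        self.rule = Rule("well-to-do people buy big cars")
        self.expert_alert = expert_alert  # reaches the domain expert, not the end user

    def on_trigger(self, observation):
        """Invoked by a database trigger when newly recorded data arrives."""
        if observation["income"] == "high" and observation["car"] != "big":
            self.rule.certainty *= 0.8  # decay the certainty factor per counterexample
            if self.rule.certainty < self.THRESHOLD:
                self.expert_alert("consider abandoning: " + self.rule.text)

mediator = MarketingMediator(expert_alert=print)
for _ in range(3):  # three counterexamples reported by the database monitor
    mediator.on_trigger({"income": "high", "car": "compact"})
# 0.9 * 0.8**3 = 0.46 < 0.5, so the third trigger advises the expert
```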
Related research

We have seen that many of the individual concepts underlying this architecture were found in earlier work. This is hardly surprising, since the problems that future information systems must address exist now and are being dealt with in many specific situations. Rather than extrapolating into the unknown, I define an architecture that is based on a number of known concepts.

Object-oriented concepts. We have great expectations from object-oriented concepts, since these provide a semantic, meaningful clustering of related data and methods. A mediator can be viewed as an autonomous superobject. It also hides the representation of the underlying data. However, the mediator should be accessed by a higher level language. Internally, the choice of language is not restricted. An important difference between mediators and objects is in their scale. This difference is reflected in their scope and ownership. Mediators are independent units and have to deal with multiple applications. Furthermore, they do not need to encapsulate data. The network of connections within the global architecture means that distinct tasks can intersect at nodes within this information-processing structure. The same mediator type may access distinct sets of data, and information from one data source can be used by distinct mediators. Sources can include object databases.

Independent actors and agents. Independence is even more prominent in the concept of Actors, as proposed by Hewitt. In an Actor architecture, modules operate independently but are assumed to cooperate towards a common goal. Mediators do not act independently. They respond to queries from applications or to triggers placed in databases. They do not interact intelligently with each other; a hierarchy is imposed for every specific task. This limitation provides computational simplicity and manageability. The extent to which networks of autonomous Actors can be motivated is unclear. The concepts underlying Agents, as developed to control robot activities, are based on control mechanisms similar to those in mediators. Such Agents do not yet deal with abstractions, mismatch, and autonomous data resources.

Maintenance and learning. The knowledge embodied in mediators cannot be permitted to be static. Knowledge maintenance is based on feedback. In effective organizations, lower levels of management involved in information processing provide feedback to superior layers. Knowledge in mediators will initially be updated by human experts. For active knowledge domains, some automation will be important. Checking for inconsistencies between acquired data and assumed knowledge is the next step. Eventually, some mediators will be endowed with learning mechanisms. Feedback for learning might either come from performance measures or from explicit induction over the databases they manage. Learning is triggered by monitors placed in the database. Ideally, every rule in the mediator is related to triggers. For instance, the rule "Wealthy people buy big cars" requires triggers on income and car ownership. Now, changes in the database can continuously update hypotheses of interest within the mediator. Learning by modifying certainty parameters in the knowledge base is relatively simple. Tabular knowledge, say, a list of good customer types, can be augmented. Learning new concepts is much more difficult, since we have no mechanisms that relate observations automatically to unspecified symbolic concepts. By initially depending fully on the human expert to maintain the mediator and later updating parameters of rules, we can gradually move to automated learning.

Implementation techniques. Mediators will embody a variety of techniques now found in freestanding applications and programs that perform mediation functions. These programs are now often classified by the underlying scientific area rather than by their place in information systems. The nature of mediators is such that many techniques developed in artificial intelligence will be employed. We expect that mediators will often use declarative approaches, capability for explanation, heuristic control of inference, pruning of candidate solutions, evaluation of the certainty of results, and estimation of processing costs for high-level optimization. The literature on these topics is broad. Heuristic approaches are likely to be important because of the large solution spaces. Uncertainty computations are needed to deal with missing data and mismatched natural classes.
By placing the constraints listed above on SODs as mediators, proofs of their behavior and interaction become feasible. Provable behavior is not only of interest to developers; it also provides a basis for predicting computational efficiency. However, the modularity of SODs causes two types of losses:

(1) loss in power, due to limitations in interconnections, and
(2) loss in performance, due to reliance on symbolic binding rather than on direct linkages.

We hope to offset these losses through gains obtained from having structures that enable effective computational algorithms. A demonstration implementation using frame technology is being expanded. Of course, the long-range benefit is that small, well-constructed mediators will enable knowledge maintenance and growth.

A futuristic world with mediators

A mediator contains an expert's knowledge and makes that expertise available to application customers. One can envisage a world where mediators can be purchased or leased for use. A market for mediators enables experts to function as knowledge generators and sell their knowledge in a form that is immediately usable. Traditional papers and books are a knowledge representation that requires extensive human interpretation. To apply the knowledge from a book to actual data, a program has to be written and tested. The knowledge in a mediator is ready for consumption. There will be a trade-off in such a market between mediators that are powerful and highly specialized and mediators that are more basic and more general. The latter will sell more copies, but at a lower price. A mediator that contains very valuable knowledge may only be made available for use on its home computer, since limited access can greatly reduce the risk of unauthorized copying. A market economy of mediators will greatly change the interaction between knowledge workers. Publication volume will decrease. Mediators that are incomplete, incorporate foolish assumptions, or contain errors will soon lose their credibility. A review mediator can make user comments available to a wide community. To introduce new and improved mediators, advertisements can be posted via a news mediator. Metamediators can process advertisements and reviews to help applications make sensible decisions.

Limits and extensions

The separation into layers envisaged here reduces the flexibility of information transfer. Structuring the mediators into a single layer between application and data is overly simplistic. Precursors to general mediators already recognize hierarchies and general interaction, as Actors do.10 The simple architecture described here is intended to serve large-scale applications. To assure effective exploitation of the mediator concepts, I propose to introduce complexity within layers slowly, only as the foundations are established to permit efficient use.

Structuring mediators into hierarchies should not lead to problems. We already required that directory mediators be inspectable. Directory mediators can help select other mediators by inspecting them and analyzing their capabilities. High-level mediators can obtain help in locating and formatting data from lower-level mediators; low-level mediators might only have database-access knowledge and understand little about application-domain semantics. Optimizers may restructure the information flow, taking into account success or failure with certain objects in one of the involved SODs.
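As an illustration of the hierarchy just described, here is a minimal sketch, with entirely hypothetical names, of a directory mediator that inspects the advertised capabilities of registered mediators to select helpers for a task. The article requires only that directory mediators be inspectable; the matching scheme below is an invented stand-in.

```python
# Hypothetical sketch: a directory mediator selecting helpers by capability.

class MediatorStub:
    def __init__(self, name: str, capabilities: set[str]):
        self.name = name
        self.capabilities = capabilities   # inspectable feature descriptions

class DirectoryMediator:
    def __init__(self):
        self.registry: list[MediatorStub] = []

    def register(self, m: MediatorStub) -> None:
        self.registry.append(m)

    def select(self, required: set[str]) -> list[MediatorStub]:
        """Return mediators whose inspected capabilities cover the task's needs."""
        return [m for m in self.registry if required <= m.capabilities]

directory = DirectoryMediator()
directory.register(MediatorStub("sales-abstraction", {"aggregation", "sales-domain"}))
directory.register(MediatorStub("db-access", {"relational-views", "query-planning"}))

# A high-level mediator asks the directory for low-level data-access help:
helpers = directory.select({"relational-views"})
print([m.name for m in helpers])   # ['db-access']
```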
More complex is lateral information sharing among mediators. Some such sharing will be needed to maintain the lexicons that preserve object identity when distinct mediators group and classify data. Fully general interaction between mediators is not likely to be supported at this level of abstraction. Just as human organizations are willing to structure and constrain interactions, even at some lost-opportunity cost, we impose similar constraints on the broad information systems we envisage. Requirements of data security will impose further constraints. Dealing with trusted mediators, however, may encourage database owners to participate in information sharing to a greater extent than they would if all participants needed to be granted file-level access privileges.

A language will be needed to provide flexibility in the interaction between the end users' workstation and the mediators. The partitioning of artificial intelligence paradigms into pragmatics (at the user-workstation layer) and the formal infrastructure (in the mediation layer) is discussed elsewhere. For query operations, the control flow goes from the application to the mediator. There, the query is first interpreted to plan optimal database access. The data obtained from the database would flow to the mediator; be aggregated, reduced, pruned, and so on; and the results reported to the query originator. Multiple mediators serve an application with pieces of information from their subdomains. Good scheduling is critical.
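This control flow can be made concrete with a small sketch. The row data, the planning step, and the aggregation below are invented stand-ins, not the article's design; the point is only the four-step sequence: interpret and plan, fetch, reduce, and report.

```python
# Hypothetical sketch of query control flow through a mediator.

from statistics import mean

def database_access(plan: dict) -> list[dict]:
    # Stand-in for a real database call; returns rows selected by the plan.
    rows = [
        {"region": "west", "sale": 120.0},
        {"region": "west", "sale": 80.0},
        {"region": "east", "sale": 200.0},
    ]
    return [r for r in rows if r["region"] == plan["region"]]

def mediator(query: dict) -> dict:
    plan = {"region": query["region"]}        # 1. interpret query, plan access
    rows = database_access(plan)              # 2. data flows to the mediator
    sales = [r["sale"] for r in rows]         # 3. aggregate, reduce, prune
    return {                                  # 4. report to the query originator
        "region": query["region"],
        "count": len(sales),
        "average_sale": mean(sales) if sales else None,
    }

# The application (query originator):
print(mediator({"region": "west"}))   # {'region': 'west', 'count': 2, 'average_sale': 100.0}
```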
The knowledge-based paradigms inherent in intelligent mediators indicate the critical role foreseen for artificial intelligence technology in implementing mediators. Mediators may be strengthened by having a learning capability. Derived information may simply be stored in a mediator. Learning can also lead to new tactics of data acquisition and control of processing.

It is not the intent of the mediator-based model to be exclusive and rigid. The model is intended to provide a common framework under which many new technologies can be accommodated. An important objective is to utilize a variety of information sources without demanding that they be brought into a common internal format.

In a 1990 report,12 the three primary issues to be addressed in knowledge-based systems were listed as maintenance, problem modeling, and learning and knowledge acquisition. The architecture presented here contributes to all three issues, largely by providing a partitioning that permits large systems to be composed from modules that are maintainable.

The proposed generalization of practices seen today takes advantage of modern hardware. The architecture can focus a variety of research tasks needed to support such systems. Some extensions, such as general uncertainty algebras, are beyond today's conceptual foundations. As stated earlier, we can take a cue from Vannevar Bush,3 who could identify all the units needed for the Memex, although its components were based on technology that did not exist in 1945.

Acknowledgments

This article integrates and extends concepts developed during research into the management of large knowledge bases, primarily supported by DARPA under contract N39-84-C-211. Useful insights were gathered via interaction with researchers at Digital Equipment Corp. and with the national HIV-modeling community. The 1988 DARPA Principal Investigators meeting helped by providing a modern view of the communication and processing capabilities that lie in our future. Robert Kahn of the Corporation for National Research Initiatives encouraged development of these ideas. Further inputs were provided by panel participants at a number of conferences and at the National Science Foundation Workshop on the Future of Databases. Tore Risch at the Hewlett-Packard Stanford Science Center is conducting research on triggers, and his input helped clarify salient points. Andreas Paepke of HP commented as well. Witold Litwin of the Institut National de Recherche en Informatique et en Automatique and the students on the Stanford KBMS project, especially Surajit Chaudhuri, provided a critical review and helpful comments. I thank the reviewers from Computer for their careful and constructive comments.

References

1. J.S. Mayo and W.B. Marx Jr., "Introduction: Technology of Future Networks," AT&T Technical J., Vol. 68, No. 2, Mar. 1989.
2. On Knowledge Base Management Systems: Integrating Artificial Intelligence and Database Technologies, M. Brodie, J. Mylopoulos, and J. Schmidt, eds., Springer-Verlag, New York, June 1986.
3. V. Bush, "As We May Think," Atlantic Monthly, Vol. 176, No. 1, 1945, pp. 101-108.
4. M.M. Waldrop, "The Intelligence of Organizations," Science, Vol. 225, No. 4667, Sept. 1984, pp. 1136-1137.
5. Handbook of Artificial Intelligence, P.R. Cohen and E. Feigenbaum, eds., Morgan Kaufmann, San Mateo, Calif., 1982.
6. M. Bull et al., "Applying Software Engineering Principles to Knowledge-Base Development," Proc. Expert Systems and Business 87, Learned Information, Medford, N.J., 1987, pp. 27-37.
7. M. Stonebraker, "Future Trends in Database Systems," Proc. Fourth IEEE Int'l Data Eng. Conf., IEEE CS Press, Los Alamitos, Calif., Order No. 827 (microfiche only), 1988, pp. 222-231.
8. D. Tsichritzis et al., "KNOs: Knowledge Acquisition, Dissemination, and Manipulation Objects," ACM Trans. Office Information Systems, Vol. 5, No. 1, Jan. 1987, pp. 96-112.
9. T. Risch, "Monitoring Database Objects," Proc. 15th Conf. Very Large Data Bases, Morgan Kaufmann, San Mateo, Calif., 1989, pp. 445-454.
10. C. Hewitt, P. Bishop, and R. Steiger, "A Universal Modular Actor Formalism for Artificial Intelligence," Proc. Third Int'l Joint Conf. Artificial Intelligence, SRI, 1973, pp. 235-245.
11. G. Wiederhold et al., "Partitioning and Combining Knowledge," Information Systems, Vol. 15, No. 1, 1990, pp. 61-72.
12. B.G. Buchanan et al., "Knowledge-Based Systems," Ann. Rev. Computer Science, J. Traub, ed., Vol. 4, 1990, pp. 395-416.

Gio Wiederhold, a professor of computer science and medicine at Stanford University, is on leave performing program management at DARPA, the Defense Advanced Research Projects Agency. His research interests are the application and development of knowledge-based techniques for database management. Wiederhold authored a widely used textbook, Database Design (McGraw-Hill, 1977, 1983); a textbook, Medical Informatics (Addison-Wesley, 1991), written with Ted Shortliffe; and a book, File Organization for Database Design (McGraw-Hill, 1987). He has written more than 200 published papers on computing and medicine, is editor-in-chief of ACM's Transactions on Database Systems, and is associate editor of Springer-Verlag's M.D. Computing. Wiederhold received a degree in aeronautical engineering in Holland in 1957 and a PhD from the University of California at San Francisco in 1976. He is an IEEE fellow and a member of the IEEE Computer Society.

Readers can contact the author, as well as obtain a report with more references, at the Department of Computer Science, Stanford University, Stanford, CA 94305-2140. His e-mail address is [email protected].