
Restrictiveness and guidance in support systems

2011, Omega

Omega 39 (2011) 242–253. Journal homepage: www.elsevier.com/locate/omega

Paul Goodwin (a,*), Robert Fildes (b), Michael Lawrence (c), Greg Stephens (c)

(a) The Management School, University of Bath, Bath BA2 7AY, United Kingdom
(b) Lancaster Centre for Forecasting, Lancaster University Management School, Lancaster University, Lancaster LA1 4YX, United Kingdom
(c) School of Information Systems, University of New South Wales, Sydney 2052, Australia

(*) Corresponding author. Tel.: +44 1225 383594; fax: +44 1225 386473. E-mail address: [email protected] (P. Goodwin).

Article history: Received 30 September 2009; accepted 2 July 2010; available online 24 July 2010.

Keywords: Restrictiveness; Guidance; Judgmental forecasting; Sales promotions; System design

0305-0483/$ - see front matter. doi:10.1016/j.omega.2010.07.001

Abstract

Restrictiveness and guidance have been proposed as methods for improving the performance of users of support systems. In many companies computerized support systems are used in demand forecasting, enabling interventions based on management judgment to be applied to statistical forecasts. However, the resulting forecasts are often ‘sub-optimal’ because many judgmental adjustments are made when they are not required. An experiment was used to investigate whether restrictiveness or guidance in a support system leads to more effective use of judgment. Users received statistical forecasts of the demand for products that were subject to promotions. In the restrictiveness mode small judgmental adjustments to these forecasts were prohibited (research indicates that these waste effort and may damage accuracy). In the guidance mode users were advised to make adjustments in promotion periods, but not to adjust in non-promotion periods. A control group of users were not subject to restrictions and received no guidance. The results showed that neither restrictiveness nor guidance led to improvements in accuracy. While restrictiveness reduced unnecessary adjustments, it deterred desirable adjustments and also encouraged over-large adjustments so that accuracy was damaged. Guidance encouraged more desirable system use, but was often ignored. Surprisingly, users indicated it was less acceptable than restrictiveness.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Managers in companies often use computerized support systems to produce forecasts of demand for their products. These systems use statistical algorithms to extrapolate past patterns, but also provide facilities for managers to apply judgmental adjustments to these forecasts, where they consider these to be appropriate. However, recent research has shown that smaller adjustments tend to waste considerable management time and can also lead to reductions in forecast accuracy [1]. Often small adjustments are made because the forecaster has falsely seen patterns in the noise associated with the demand time series. In addition, such adjustments may be prompted because the forecaster has an illusion of control over the variable that is being forecast [2]. In contrast, larger adjustments can be effective in improving accuracy [1]. This is because they are usually made to take into account the effects of an important special event (e.g. a sales promotion campaign), which the statistical forecast has ignored. This raises the question: can facilities be incorporated into forecasting support systems to reduce the number of gratuitous (and damaging) interventions that managers make to their forecasts while, at the same time, not discouraging (or even encouraging) them to make adjustments when these are necessary?
The information systems literature has identified two approaches that are designed to improve the way in which people use decision support systems: restrictiveness and guidance [3]. Both approaches conceptualize a gap between a descriptive model of how people use the system and a normative model [4], and they involve the incorporation of features designed to move the user towards the normative model. System restrictiveness is defined as ‘the degree to which and the manner in which a Decision Support System limits its users’ decision making processes to a subset of all possible procedures’ [3]. Restrictiveness can be applied to the structure of decision making and forecasting processes by restricting the set of available activities (e.g. a forecasting system could prevent the use of more complex models where their use might be damaging, simply by excluding the facility) and to the sequencing of activities (e.g. a system could require users to record a reason for adjusting a statistical forecast before being allowed to produce forecasts for other products). It can also be applied to the execution of a decision making or forecasting process. For example, it could restrict the size of smoothing parameters that can be used in exponential smoothing to a particular range to prevent forecasts overreacting to noise in a time series. Alternatively, it could proscribe behaviors that are associated with cognitive biases.

The second approach to improving system use is decisional guidance. Silver [3] defines this as ‘the degree to which and the manner in which a Decision Support System guides its users in constructing and executing decision making processes, by assisting them in choosing and using its operators’. As Silver notes, support systems are intended to combine the strengths of human judgment with those of machines and hence a system can provide guidance on when judgmental inputs are most appropriate.
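The smoothing-parameter restriction mentioned above can be made concrete with a short sketch. It clamps a user-supplied smoothing parameter to a permitted range before running simple exponential smoothing. This is only an illustration of the idea, not a description of any particular system: the function names and the range 0.05 to 0.3 are hypothetical choices.

```python
def restrict_alpha(alpha, lower=0.05, upper=0.3):
    """Clamp a requested smoothing parameter to the permitted range.

    The bounds are illustrative assumptions, not values from the paper.
    """
    return max(lower, min(upper, alpha))


def ses_forecasts(series, alpha):
    """One-step-ahead simple exponential smoothing forecasts.

    forecasts[t] is the forecast of series[t] made at time t - 1;
    the first forecast is initialised to the first observation.
    The returned list has one extra element: the forecast for the
    period after the last observation.
    """
    forecasts = [series[0]]
    for y in series:
        forecasts.append(forecasts[-1] + alpha * (y - forecasts[-1]))
    return forecasts


demand = [100, 104, 98, 150, 101, 99]   # made-up data with one spike
requested = 0.9                          # user asks for a very reactive model
alpha = restrict_alpha(requested)        # the system limits it to the upper bound
print(alpha, round(ses_forecasts(demand, alpha)[-1], 1))
```

With a high smoothing parameter the forecast would chase the single spike in the data; the restricted value damps that reaction, which is the behavior the restriction is meant to enforce.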
Although restrictiveness and guidance are not mutually exclusive features of support systems, the less restrictiveness there is in a system, the greater the scope, and possible need, for guidance [3]. This suggests that it is useful to examine the relative effectiveness of the two approaches. While both have the objective of improving the quality of decision making or forecasting, their effectiveness also needs to be assessed on other important dimensions such as user acceptance, effect on user learning and the efficiency with which the task is carried out [5]. However, despite the attention that they have received in the literature, we know of only one empirical study that has directly compared the effectiveness of restrictiveness and guidance. In this case the two approaches were tested on undergraduate students using a knowledge-based system [6]. The authors of this paper also highlighted the absence of direct comparative studies in this area.

The current paper examines the relative effectiveness, on a number of key dimensions, of restrictiveness and decisional guidance in a support system, which was designed to improve the quality of judgmental adjustments to statistical forecasts generated by the system. This type of system is very widely used in practice [1]. In an experiment, participants in one set of treatments encountered restrictions, which prevented them from making adjustments below a certain size. Participants in a second set of treatments received guidance on whether or not they should make adjustments when the system identified that their behavior departed from that prescribed by the system. The next section reviews the relevant literature and sets out the specific research questions that were investigated. Section 3 then describes the experiment which was designed to address these questions and the results are discussed in Section 4. Finally, the paper considers the practical implications of the results and makes suggestions for further research.

2. Literature review and research questions

While some companies rely solely on managers’ judgments to make forecasts (e.g. [7]), forecasting in many supply chain businesses is a semi-structured task where both statistical algorithms and human judgment can play complementary roles [8–10]. Statistical methods are efficient at detecting systematic patterns from noisy time series while humans can take into account the effects of important one-off events that are known to be occurring in the future. Performance in the forecasting task is therefore likely to be improved through the use of support systems and researchers have explored the effectiveness of a number of design features of such systems in the context of forecasting [11] and in other domains (e.g. [12,13]).

When the task involves applying managerial judgmental adjustments to statistical forecasts, two stages of the task need to be supported: (1) the decision on whether or not to adjust the statistical forecast and (2) (if a decision is made to adjust) the decision on how large the adjustment should be. Some researchers have investigated the effectiveness of facilities designed to support the second stage of the task. For example, [14] explored the use of a database of analogous special events to allow the user to estimate the effects of forthcoming events, while [15] investigated the role of decomposition in simplifying the task of adjusting forecasts for multiple special events. However, there have been relatively few studies which have looked at ways of supporting the decision on whether or not to adjust in the first place. One exception was a study by Willemain [16], which found that adjustment is likely to improve accuracy when a naïve forecast outperforms a more sophisticated statistical forecast.
Goodwin [17] found that requiring forecasters to give reasons for their adjustments significantly reduced the frequency of unnecessary adjustments without reducing the propensity to make adjustments that were beneficial. Indeed, even requiring a forecaster to answer the question: ‘‘Do you wish to make an adjustment?’’ reduced the propensity to make unnecessary adjustments, indicating the significant impact that small changes can make to the operation of a support system. A major motivation for focusing on the decision to adjust is the finding that companies devote huge amounts of time and effort to the forecast adjustment process. For example [1] found that a major food processing company judgmentally adjusted 91% of its statistical forecasts, while [18] estimated that a pharmaceutical company devoted 80 person hours of management time each month to forecast review meetings where the sole objective was to agree and adjust product demand forecasts. These studies have found that much of this effort does not contribute to accuracy. For example, in the pharmaceutical company studied in [18] half the improvements obtained through adjustment improved accuracy by less than 0.37% and only 51.3% of statistical forecasts were improved through adjustment. (It should be noted that in some cases adjustments are made by managers to avoid extreme errors, rather than to improve average accuracy. For example, adjustments may be motivated by the desire to avoid individual absolute percentage errors of over a certain percentage, which may attract the attention of senior managers.) A recent field study [1] found that the effectiveness of adjustments is partly related to their size. Larger adjustments tend to be more likely to improve accuracy because they are usually made with good reason. For example, an important event is known to be occurring in the future and the effects of this have not been incorporated into the statistical forecast. 
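Statistics of the kind quoted above (the share of forecasts that were adjusted, and the share that adjustment improved) can be computed from paired records of statistical forecast, final forecast and outcome. The sketch below is purely illustrative: the function name and the data are invented, and it is not the procedure used in [1] or [18].

```python
def adjustment_summary(stat_forecasts, final_forecasts, actuals):
    """Return (% of forecasts adjusted, % of adjusted forecasts improved).

    A forecast counts as adjusted when the final forecast differs from
    the statistical one, and as improved when its absolute error is
    strictly smaller than that of the statistical forecast.
    """
    adjusted = improved = 0
    for stat, final, actual in zip(stat_forecasts, final_forecasts, actuals):
        if final != stat:
            adjusted += 1
            if abs(final - actual) < abs(stat - actual):
                improved += 1
    pct_adjusted = 100.0 * adjusted / len(actuals)
    pct_improved = 100.0 * improved / adjusted if adjusted else 0.0
    return pct_adjusted, pct_improved


# Invented example: four products, three of them adjusted.
stat = [100, 200, 150, 80]
final = [110, 200, 140, 120]
actual = [105, 190, 155, 125]
print(adjustment_summary(stat, final, actual))
```

Counting only strict improvements is a design choice made here for illustration; an adjustment that leaves the absolute error unchanged is treated as wasted effort rather than as a gain.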
There are several possible reasons why smaller adjustments tend to be ineffective, or even damaging to forecast accuracy. Several studies [19–21] have shown that forecasters often see systematic patterns in the noise associated with time series and they make unnecessary adjustments to try to include these false patterns into the forecast. In addition, in some companies there is a tendency for forecasters to tweak forecasts merely to justify their role [1]. Smaller adjustments may also be made when forecasters are doubtful about the reliability of information about future events. As a result they may be unwilling to commit themselves to large adjustments, so instead, they ‘hedge their bet’ by compromising with a small adjustment.

These findings are consistent with the normative view of the forecasting adjustment decision, as exemplified by the Forecasting Principles project [22], that adjustments should only be made on the basis of important domain knowledge [23]. If important domain knowledge implies relatively large adjustments then this suggests that a restrictive forecasting support system that prohibits smaller adjustments will move forecasters’ use of the system closer to that which is deemed to be normative. When initially deployed, such a system might not reduce wasted management effort because forecasters would still spend time estimating adjustments only to find that they are prohibited. However, as experience with using the system increases, learning should reduce these wasted interventions and eventually lead to more efficient use of management time. Nevertheless, prohibiting a user from carrying out certain tasks may have a detrimental effect on the acceptability of the system. As Silver [3] points out, restrictiveness must not be such as to put people off using the system; it should promote its use.
The acceptance of systems has been the subject of extensive research, which Venkatesh and others have attempted to bring together in their Unified Theory of Acceptance and Use of Technology (UTAUT) model [24]. One of the determinants of acceptance in this model is ‘performance expectancy’, defined as ‘the degree to which an individual believes that using the system will help him or her to attain gains in job performance’. Forecasters who believe that their small adjustments are likely to improve accuracy would have low performance expectancy of such a restrictive system and, consequently, be unlikely to accept it. Guidance is an alternative approach to supporting adjustment decisions that might be more acceptable. It can take a number of different forms [25]. Suggestive guidance recommends particular courses of actions to the user, while informative guidance provides users with relevant information, but does not suggest a course of action. Both forms of guidance can be provided either in a pre-defined form, where all users receive the same guidance under a given set of conditions, or in a dynamic form where the advice is customised to meet the apparent needs of individual users under particular conditions [5]. Furthermore, guidance, irrespective of its type, can either be delivered automatically, or only when requested. One group of researchers [25] investigated the effectiveness of providing different forms of guidance in a task that involved the selection of a forecasting model and found that, while all forms of guidance improved decision quality, suggestive guidance was most effective. It also led to the greatest user satisfaction and reduced decision time. However, it was less effective than informative guidance in fostering learning about the problem domain. Different researchers sometimes use alternative terms when they refer to guidance. 
For example, it can be regarded as a form of feedback designed to enable the user to learn about how to carry out the task more effectively [26]. When invoked in response to a user’s intended action, suggestive guidance explicitly contrasts this intended action with the action that the system designer has deemed to be normative.

The psychological ‘advice literature’ offers another perspective on guidance. An extensive review of this literature can be found in [27]. Although the literature is primarily oriented to advice provided by human experts, some researchers have explored the effectiveness of advice provided by machines [28]. This research has shown that while advice generally improves the quality of judgments, it is also often discounted in that people ‘[do] not follow their advisors’ recommendations nearly as much as they should have’ [27]. A number of explanations have been put forward for this. For example, it has been argued [29–31] that advice discounting partly takes place because decision makers have access to their own rationale for choosing a particular course of action while the rationale of their advisors may be less accessible. Also, imposed advice is less likely to be followed than advice which is actively solicited [27].

This discussion raises a number of questions:

1) Which feature of a support system, restrictiveness or guidance, is more effective in moving forecasters’ behavior closest to that of a normative approach?
2) Which feature leads to the most efficient decision making? Efficiency in this case is a low average time to produce forecasts.
3) Which feature is more effective in stimulating learning so that a movement towards normative behavior is most rapid?
4) Which feature is more acceptable to forecasters?

The next section describes an experiment, which was designed to address these questions.

3. Experimental design

In the instructions for the experiment participants were told that they were forecasters working for a manufacturing company, which supplies supermarkets with its products, and that each month their task was to produce a forecast of total demand for a single product for the following month. Participants were told that the supermarkets sometimes run promotion campaigns and notify the manufacturer of the details of these campaigns one month ahead. Also available to the forecaster were occasional items of ‘soft’ information such as rumours or the ‘gut feel’ of some of the managers associated with the company’s operations. The nature of this soft information, and the overall design of the experiment, was based on the authors’ detailed observations of forecasting meetings and processes in several manufacturing companies [18].

The participants used an experimental computerized support system (ESS) to produce the forecasts. The ESS initially provided the following information graphically:

i) data on a product’s demand for the last 30 months;
ii) statistical baseline forecasts for the last 30 months. These forecasts were derived from data that had been cleansed of estimated promotion effects and were based on simple exponential smoothing or the Holt-Winters method. They were obtained automatically by using the expert system that is incorporated in the Forecast Pro forecasting system [32];
iii) a statistical baseline forecast for the next month (provided both graphically and numerically);
iv) estimated past promotion effects;
v) a message board giving details of any sales promotion that was due to take place next month, together with the estimated effect of recent similar promotions.

To simulate a real forecasting environment, this board also occasionally displayed rumours or other managers’ speculations relating to next month’s demand. As is typical of forecasting review meetings [18], many of the managers’ views were contradictory (e.g.
Sales Manager: “I’ve a gut feeling that we’ll see slightly better than expected sales next month if the mood of my sales staff is anything to go on.” Accountant: “I recall you telling me something similar a year ago and sales went down!”), and sometimes the factual information on forthcoming promotions was qualified by managerial opinion (e.g. “Price Star supermarkets are running a money-off token promotion next month. Their last 2 promotions of this type generated estimated extra sales of 45 and 57 units, respectively.” Sales Manager: “But can we trust them to display our product prominently in their stores?”). Fig. 1 displays a typical screen from the system.

[Fig. 1. A typical screen display.]

After each forecast had been made the screen was updated to include the information for the next month. Forecasts were required for months 31–71. In each case, the participant had to decide whether to adjust the statistical forecast and, if a decision was made to adjust, what the new forecast should be. The new forecast could be indicated by either clicking on the graph in the appropriate place or entering the forecast into a text box.

The simulated demand time series were either ARIMA (0,1,1) or they followed a linear trend with multiplicative seasonality. In both cases the series were subject to either low noise ($\sim N(0, 18.8)$) or high noise ($\sim N(0, 56.4)$). The formulae used to generate these series were

ARIMA(0,1,1) series: $Y_t = Y_{t-1} - 0.7e_{t-1} + e_t$

Trended seasonal series: $Y_t = (a + 1.5t)S_i + e_t$

where $Y_t$ is the observation at time $t$, $a$ the trend at $t = 0$, $e_t$ the noise at $t$ and $S_i$ the seasonal index for month $i$.

In 9 of the 41 forecast periods (promotion periods) the series was disturbed by the effects of a sales promotion. The size of the effects was roughly in line with those reported in a study, which looked into ketchup and yogurt promotions [33]. These effects had a correlation of +0.74 with the means of the estimated effects of past promotions that were displayed on the message board. The series contained no pre- or post-promotion effects. Such effects are not observable in many product sales series, particularly those where consumers are unable to stock up on the promoted product [34,35]. In the 7 periods when soft information was provided (rumour periods) this information was contradictory on three occasions. On the remaining occasions, the managers’ speculations suggested a particular direction of adjustment to the statistical forecast and this conformed with the required direction of adjustment on 62.5% of occasions. Thus, adjusting on the basis of these speculations was a risky strategy. Not only was there a 37.5% probability of adjusting in the wrong direction, which would increase the forecast error, but also even if the correct direction was chosen, an adjustment that was more than twice the required adjustment would also increase the error. Fig. 2 displays the series for months 1–71.

[Fig. 2. Time series. Four panels (ARIMA (0,1,1) low noise; ARIMA (0,1,1) high noise; trend seasonal low noise; trend seasonal high noise), each plotting sales (units) against months.]

Participants were randomly assigned to one of three treatments: control, restrictiveness or guidance. The control group performed the forecasting task without any restrictions on the size of their adjustments or any guidance on when they should make adjustments. When participants in the restrictiveness treatment tried to make an adjustment below a pre-specified size they received a message: “The system will not allow adjustments of [x] units or less. It only permits adjustments for major factors that the statistical forecast has not allowed for.” They then had the option of trying to make a revised adjustment or accepting the statistical forecast after all.

To determine x the program needed to assess whether a participant’s judgmental adjustment was likely to be an adjustment for noise (which it aims to prohibit) or one made to take into account a promotion effect (which it aims to allow). To try to discriminate between these two circumstances it compared the size of past promotion effects with an estimate of the standard deviation of the noise associated with a series. The latter was estimated using the RMSE of the first 30 baseline forecasts on the cleansed data. This yielded a promotion effect ratio, where

$$\text{Promotion effect ratio} = \frac{\text{Mean promotion effect}}{\text{Forecast RMSE}}$$

Investigation of this ratio for the low noise series suggested that a restriction prohibiting absolute adjustments of below 2 estimated noise standard deviations (31 sales units) would still allow for sufficient adjustment to take into account all the promotion effects. There is roughly only a 5% chance that a normal period observation would have required an adjustment of this size. For the high noise series, a restriction prohibiting absolute adjustments of below one estimated noise standard deviation (i.e. 51 sales units) would still allow sufficient adjustment to take into account all the promotion effects (only one promotion effect in the past data was below the permitted adjustment level).
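The restriction rule described above can be sketched in a few steps: simulate an ARIMA(0,1,1) series with the generating formula given earlier in this section, estimate the noise standard deviation as the RMSE of one-step-ahead baseline forecasts over the first 30 months, and block any adjustment whose absolute size does not exceed k of those standard deviations. This is a minimal sketch under stated assumptions, not the authors' program. The function names are hypothetical, the second argument of N(0, 18.8) is assumed to be a standard deviation, the starting level of 300 units is arbitrary, and simple exponential smoothing with a fixed smoothing constant of 0.3 (one minus the moving-average coefficient 0.7, the value for which simple exponential smoothing is optimal for this process) stands in for the Forecast Pro baseline.

```python
import math
import random


def simulate_arima011(n, sigma, start=300.0, theta=0.7, seed=1):
    """Simulate Y_t = Y_{t-1} - theta * e_{t-1} + e_t with Gaussian noise."""
    rng = random.Random(seed)
    series, level, prev_e = [], start, 0.0
    for _ in range(n):
        e = rng.gauss(0.0, sigma)
        level = level - theta * prev_e + e
        series.append(level)
        prev_e = e
    return series


def baseline_rmse(series, alpha=0.3):
    """RMSE of one-step-ahead simple exponential smoothing forecasts."""
    forecast, squared_errors = series[0], []
    for y in series[1:]:
        squared_errors.append((y - forecast) ** 2)
        forecast += alpha * (y - forecast)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def restriction_threshold(noise_sd, k):
    """The size x in the system message: k estimated noise standard deviations."""
    return k * noise_sd


def is_blocked(adjustment, threshold):
    """Non-zero adjustments of `threshold` units or less are rejected."""
    return 0 < abs(adjustment) <= threshold


def chance_noise_exceeds(k):
    """P(|Z| > k) for a standard normal Z: how often noise alone would
    call for an adjustment larger than k standard deviations."""
    return math.erfc(k / math.sqrt(2.0))


history = simulate_arima011(30, sigma=18.8)        # assumed low-noise series
noise_sd = baseline_rmse(history)                  # estimate from 30 months
threshold = restriction_threshold(noise_sd, k=2)   # k = 2 for low noise
for adjustment in (5.0, -12.0, 60.0):
    verdict = "blocked" if is_blocked(adjustment, threshold) else "allowed"
    print(f"adjustment {adjustment:+.0f}: {verdict} (x = {threshold:.1f})")
print(f"chance that noise alone exceeds 2 SD: {chance_noise_exceeds(2):.3f}")
```

The tail probability for k = 2 is a little under 5%, in line with the figure quoted above. Tying the threshold to the estimated noise level, rather than to a fixed percentage of the forecast, is what lets the same rule be applied to series of different volatility.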
There is roughly a 32% chance that a normal period observation would have required an adjustment of this size to yield a forecast error of zero, but obviously the discrimination between judgmental adjustments for promotion and noise effects is bound to be more problematical for a high noise series. Whatever its potential limitations, this approach seemed preferable to a system that simply prohibited absolute adjustments of below (say) 10% of the statistical forecast, since such a strategy would take no account of the noise associated with a series.

If a participant in the guidance treatment indicated that they intended to adjust the statistical forecast for a non-promotion (i.e. a normal or rumour) period the following message appeared: “Are you sure that you want to change the statistical forecast? There is no promotion campaign next month and any change you make to the forecast is likely to reduce accuracy.” Conversely, if the forecaster initially chose not to adjust the statistical forecast for a promotion period the following message appeared: “You are advised to consider adjusting the statistical forecast. It cannot take into account the likely extra sales resulting from the promotion.” In both cases facilities were provided for the forecaster either to change their mind or to stay with their original decision. Note that the guidance was not solicited by participants. Also, participants had to explicitly indicate that they were going against its suggestion by pressing a button labelled either “Yes, I still want to change the forecast” or “I’ll still leave the stats forecast unchanged”.

After completing the forecasting task, participants used the computer to complete a questionnaire in which they were asked to indicate their level of agreement with the following statements using a 1 (strongly disagree) to 5 (strongly agree) scale.

• I found the forecasting system easy to use.
• I thought that the forecasting system was useful.
• I think that the forecasting system would be acceptable to forecasters in companies.
• Using this forecasting system is likely to lead to more accurate forecasts.
• I am confident that my forecasts were as accurate as they could be, given the information that was provided.

The 130 participants who took part in the experiment were graduate or final year undergraduate students in the Australian School of Business (formerly the Faculty of Commerce and Economics) of the University of New South Wales in Sydney, Australia. The final year undergraduate students were mostly from the sponsored program and had undertaken either 12 or 18 months of industrial training. Whilst the use of students in experimental research is sometimes criticized, a study by Remus [36] found that students can act as reliable proxies for managers in decision making tasks. This is particularly the case in experiments like this, where the task mirrors the types of job undertaken in industrial placements or at an early stage in a graduate’s career.

A second issue is the participants’ motivation to complete the tasks successfully. In the current experiment all participants were presented with a screen at the end of the session, which informed them of the accuracy they had achieved. The intention of this was to motivate them by providing an assessment of their judgmental skill and to foster a spirit of challenge and competition between them. No monetary incentives were provided. Remus et al. [37] found that providing monetary incentives had no effect on the accuracy of forecasts produced by participants in judgmental forecasting experiments. Other studies have shown that financial rewards can lead to weaker cognitive performance (e.g. see [38]).

4. Results

4.1. Which feature of the support system, restrictiveness or guidance, was most effective in moving forecasters’ behavior closest to that of a normative model?
To answer this research question data was collected on the proportion of adjustments made to the statistical forecasts in the three period types: (a) normal, (b) rumour and (c) promotion. Movement to the normative model would be signified by a smaller proportion of adjustments in non-promotion periods and the same, or a higher, proportion of adjustments in promotion periods when compared to the control group. The data was analysed using a 3 (type of support) × 4 (series type) × 3 (type of period) ANOVA model with type of period treated as a repeated measure. Fig. 3 shows the proportion of statistical forecasts adjusted for each mode in each type of period: (1) normal, (2) rumour and (3) promotion. The interaction depicted between support and period type is statistically significant (F(4,236) = 4.73; p < 0.001). There was no significant interaction between type of support and series type.

Fig. 3. Proportion of adjustments to statistical forecasts by type of period and type of support. (Vertical axis: proportion of forecasts adjusted; horizontal axis: Normal, Rumour, Promo; series: Control group, Restrictiveness, Guidance.)

It can be seen that, whatever the nature of the support they received, people were tempted to make an adjustment when they received a rumour, despite the high risk associated with this strategy. Nevertheless, they (correctly) made the greatest proportion of adjustments when there was a sales promotion campaign forthcoming. Both restrictiveness and guidance were successful in reducing the proportion of unnecessary adjustments in normal periods. However, restrictiveness reduced people’s propensity to make an adjustment when it was necessary in promotion periods, while guidance was successful in encouraging a greater proportion of adjustments. Thus guidance appeared to be most successful in moving decision making closer to the normative model.

It is also interesting to investigate the size of adjustments made by participants. To measure these, the absolute adjustments were taken as a percentage of the statistical forecasts. The mean of these percentages yielded the mean absolute percentage adjustment (MAPA) and Table 1 shows the MAPA for the three types of support. Note that the MAPA shown is averaged only over periods when an adjustment was made, but the relative size of the MAPAs is similar when the average is taken over all periods. There was a significant main effect for type of support (F(2,118) = 12.94; p < 0.0001). Restrictiveness led to significantly larger adjustments than the other types of support. This is probably because people were trying to get round the ban on small adjustments.

Table 1. Mean absolute percentage adjustment for each type of support.

Support type       MAPA (%)
Control            23.2
Restrictiveness    31.2
Guidance           16.9

What was the effect of the forecasters’ behavior on accuracy? Because participants were not intended to forecast the noise in the series, accuracy was measured by comparing the forecasts with the underlying signals of the simulated time series (i.e. the demand figures before noise was added). This approach has been used in several studies including [14,17,19,39]. However, to allow comparisons with other studies, we also display the accuracy of forecasts compared to the actual demand figures in Table 2a. Average accuracy was measured using the mean absolute percentage error (MAPE), though similar results were obtained when the median absolute percentage error (MdAPE) was used. When ANOVA was applied to the MAPEs there was no significant interaction between type of support and period type or between type of support and series type, but there were significant main effects for type of support and for period type (for type of support: F(2,118) = 5.39; p < 0.006; for period type: F(2,236) = 11.27; p < 0.0001).

Table 2a shows the MAPEs by type of support and Table 2b shows the MAPEs by period type. Most interestingly, restrictiveness actually led to a statistically significant reduction in accuracy when compared to the control. Moreover, guidance failed to yield significant improvements in accuracy over the control group. Not surprisingly, rumour periods were the least accurately forecast, as the rumours had only limited information value and were contradictory in 37.5% of cases.

Table 2a. Mean absolute percentage errors for each type of support.

Type of support                    MAPE against signal    MAPE against actual
Control                            23.4                   25.1
Restrictiveness                    27.1                   25.4
Guidance                           23.4                   22.4
Unadjusted statistical forecasts   22.4                   22.3

Table 2b. Mean absolute percentage error against signal for each period type.

Period type    Control    Restrictiveness    Guidance    Unadjusted statistical forecast
Normal         22.9       24.7               22.8        19.6
Rumour         26.3       33.5               25.4        22.4
Promotion      22.7       28.9               23.5        30.2

A series of decision trees was constructed to obtain a more detailed assessment of how the participants reacted to the different types of support and the consequences of their decisions. The trees in Fig. 4 show, for the control group, the effect of the participants’ adjustment decisions on forecast accuracy. As expected, when people made adjustments in normal or rumour periods it was much more likely that these adjustments would reduce accuracy than improve it (e.g. in the control group over 70% of these forecasts were worsened through adjustment). In contrast, when people made adjustments in promotion periods, as expected, these tended to improve accuracy (over 83% of these forecasts were improved through adjustment). Thus the advice that was provided in the ‘guidance’ treatment, which told participants not to adjust in non-promotion periods but to make adjustments in promotion periods, was sound.
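The two measures used above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the numbers are hypothetical, and the MAPE denominator is assumed to be the reference series (the conventional definition), since the text does not spell it out. The MAPA is averaged only over periods in which an adjustment was made, as described for Table 1.

```python
from statistics import mean

def mapa(statistical, final):
    """Mean absolute percentage adjustment, averaged over adjusted periods only."""
    sizes = [abs(f - s) / abs(s) * 100
             for s, f in zip(statistical, final) if f != s]
    return mean(sizes) if sizes else 0.0

def mape(forecasts, reference):
    """Mean absolute percentage error of forecasts against a reference series
    (assumed denominator: the reference value)."""
    return mean(abs(r - f) / abs(r) * 100 for f, r in zip(forecasts, reference))

# Hypothetical numbers for illustration only (not data from the experiment).
statistical = [100.0, 200.0, 150.0]  # system forecasts
final = [100.0, 220.0, 120.0]        # forecasts after judgmental adjustment
signal = [110.0, 210.0, 140.0]       # underlying demand before noise was added
actual = [120.0, 190.0, 150.0]       # signal plus noise

mapa_value = mapa(statistical, final)   # first period is unadjusted, so it is skipped
mape_signal = mape(final, signal)       # accuracy against the underlying signal
mape_actual = mape(final, actual)       # accuracy against actual demand
```

Measuring against the signal rather than the actual demand avoids penalising a forecaster for failing to predict noise, which is the rationale given in the text.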
Fig. 4. Decision making in the control group ("fcs" = forecasts; "Acc" = forecast accuracy).
Normal periods (1100 fcs): attempted adjustment 21.18% (Acc improved 21.43%, Acc not improved 78.57%); did not attempt 78.82%.
Rumour periods (308 fcs): attempted adjustment 50.97% (Acc improved 29.94%, Acc not improved 70.06%); did not attempt 49.03%.
Promotion periods (396 fcs): attempted adjustment 76.01% (Acc improved 83.39%, Acc not improved 16.61%); did not attempt 23.99%.

For participants in the restrictiveness treatment, data was collected on the percentage of times that they: (a) decided not to change the statistical forecast after being prohibited from making a small change or (b) simply increased the size of the adjustment after being informed that a small change was not allowed. In each case the effect of this behavior on accuracy was also measured. The decision trees in Fig. 5 show the results for the three types of periods. It can be seen that when people were banned from making a small adjustment, they often responded by making a larger adjustment, rather than simply deciding not to adjust after all. This was particularly the case in ‘rumour’ periods, where 67% of ‘too small’ adjustments were revised upwards and the effect was that 78.6% of these revised adjustments damaged accuracy. In normal periods people tended to be less insistent on making a larger adjustment after a small adjustment had been rejected, but when they did, 92.9% of their adjustments reduced accuracy. In promotion periods, as already reported, restrictiveness led to fewer adjustments being attempted in the first place (64.9% vs. 76.0% in the control). However, this negative effect was partly mitigated by its implicit encouragement to make larger adjustments, which led to improvements on 73.5% of occasions since larger adjustments were in fact needed here.

How did people react to the guidance they were given? As the decision trees in Fig. 6 show, in normal and rumour periods they tended to go against the good advice not to make an adjustment; they did this over 77% of the time in both cases. Recall that the advice did not simply appear in a corner of the screen where it could easily have been ignored: participants attempting an adjustment had to explicitly indicate that they wanted to go against the advice before they could proceed. People were more responsive to guidance that told them to make an adjustment (when they had initially decided not to) in promotion periods. However, the acceptance of this advice was still surprisingly low; it was followed in only 48.2% of cases.

4.2. Which feature led to the most efficient decision making (i.e. the lowest average time to produce forecasts)?

There were no significant differences between the control, restrictiveness and guidance groups in the mean time taken to produce the forecasts. Unsurprisingly, forecasts for the series with linear trend and multiplicative seasonality took significantly longer to make on average (mean time: 22 s) than those for the ARIMA series (mean time: 14 s) (p < 0.05).

4.3. Which feature was most effective in stimulating learning so that a movement towards normative behavior is most rapid?

Fig. 7 shows the total number of times that restrictiveness was activated as the experiment progressed for the entire group in the restrictiveness treatment. This clearly shows that people generally learned not to make small adjustments as the experiment progressed. Fig. 7 also shows the number of occasions that guidance was activated. This curve falls less steeply than that for restrictiveness, suggesting a slower rate of learning. However, there are two differences between the features.
First, restrictiveness only reminds users not to make small adjustments while guidance reminds them both not to adjust in non-promotion periods and to make adjustments in promotion periods so there are two things to learn. Second, the user may be prepared to activate guidance deliberately in order to prevail with an adjustment that they are determined to make. There is little point in deliberately activating restrictiveness since it will not allow the user to continue with a desired adjustment. Of course, learning to manipulate the support system and to avoid its constraining interventions does not necessarily imply that the user is learning to carry out the task in a manner which is closer to the normative behavior intended by the system designer. To investigate this we recorded the percentage of times that participants made decisions which conflicted with the normative approach (i.e. the percentage of times that they made an adjustment in a non-promotion period or failed to adjust in a promotion period). Fig. 8 compares the results for the first 15 periods that required forecasting with the last 15 for the different types of period. The evidence that restrictiveness and guidance fostered learning towards the normative approach as the experiment progressed is weak. Even though the percentage of conflicting decisions declined as time went on for normal periods, this same phenomenon was observed in the control group. Thus the improvements might have occurred simply because the patterns in the later part of the time series offered less enticement to participants to make adjustments or because the participants were becoming fatigued. Alternatively, there may have been a natural tendency to learn, irrespective of the type of support. Thus we can only conclude that the relative benefits of restrictiveness and guidance were obtained early on and that there was no improvement from this initial advantage. 
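The conflict measure described above can be sketched as follows. This is a hypothetical illustration rather than the authors' procedure: the function name, the record layout and the sample log are assumptions, and a short log split into two halves stands in for the first and last 15 periods compared in Fig. 8. The normative rule is taken directly from the text: adjust in promotion periods and leave the statistical forecast alone otherwise.

```python
def non_normative_rate(decisions):
    """Percentage of decisions that conflict with the normative rule.

    Each decision is a (period_type, adjusted) pair. A decision conflicts when
    an adjustment is made in a non-promotion period or when no adjustment is
    made in a promotion period.
    """
    if not decisions:
        return 0.0
    conflicts = sum(
        1 for period_type, adjusted in decisions
        if adjusted != (period_type == "promotion")
    )
    return 100.0 * conflicts / len(decisions)

# Hypothetical decision log for one participant (illustrative only).
log = [
    ("normal", True), ("normal", False), ("rumour", True),
    ("promotion", True), ("promotion", False), ("normal", False),
]

early_rate = non_normative_rate(log[:3])  # stands in for the first 15 periods
late_rate = non_normative_rate(log[3:])   # stands in for the last 15 periods
```

A fall from the early rate to the late rate is only evidence of learning from the support feature if it exceeds the fall seen in the control group, which is the comparison the text goes on to make.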
The graph for the rumour periods also suggests no tendency to learn; it merely implies again that the advantages of restrictiveness and guidance were acquired early on. For promotion periods, the tendency of restrictiveness to discourage necessary adjustments is clearly seen. Again, the slight improvement over time is no better than that of the control group. This behavior was also reflected in the MAPEs for the different types of periods for the early and later parts of the experiment; these yielded no significant differences in any treatment. Thus, while people in the restrictiveness and guidance groups seem to have learned over time to reduce the number of system interventions, there is little evidence that they also learned to make more accurate forecasts.

Fig. 5. The effect of restrictiveness on decision making and accuracy ("fcs" = forecasts; "Acc" = forecast accuracy).
Normal periods (1100 fcs): attempted adjustment 19.18%; did not attempt 80.80%. Of attempts: adjustment banned 37.40%, not banned 62.56%. After a ban: made larger adjustment 35.44% (Acc improved 7.10%, Acc not improved 92.90%); did not adjust 64.56%. Not banned: Acc improved 14.40%, Acc not improved 85.60%.
Rumour periods (308 fcs): attempted adjustment 43.18%; did not attempt 56.82%. Of attempts: adjustment banned 31.57%, not banned 68.42%. After a ban: made larger adjustment 66.67% (Acc improved 21.40%, Acc not improved 78.60%); did not adjust 33.33%. Not banned: Acc improved 12.10%, Acc not improved 87.90%.
Promotion periods (396 fcs): attempted adjustment 64.90%; did not attempt 35.10%. Of attempts: adjustment banned 18.68%, not banned 81.32%. After a ban: made larger adjustment 70.83% (Acc improved 73.50%, Acc not improved 26.50%); did not adjust 29.17%. Not banned: Acc improved 75.10%, Acc not improved 24.90%.

4.4. Which feature was most acceptable to forecasters?
Surprisingly, in response to the statement “I think the forecasting system would be acceptable to forecasters in companies”, participants in the restrictiveness treatment indicated a significantly higher level of agreement than those in the guidance treatment (mean = 3.35 vs. 2.85, p = 0.0138, two tail, where the scale ranges from 1 = strongly disagree to 5 = strongly agree). It might have been expected that restricting participants’ actions would be considered to be less acceptable. Silver [3] has suggested that people may prefer highly restrictive systems because these are easier to use and remove the burden of having to decide which features to employ. However, in this experiment, restrictiveness thwarted actions that the forecasters wished to carry out. Apart from a weak, but significant, negative partial correlation between the total number of adjustments made by participants and their agreement with the statement “I thought the forecasting system was useful” (after controlling for mode and series) (r = −0.177, p = 0.046), none of the responses to the questionnaire were significantly correlated with the proportion of adjustments made by participants (even after controlling for mode and series). Nor were any of the dimensions measuring attitude to the system correlated with the amount of advice received or the number of restrictions experienced.

Fig. 6. The effect of guidance on decision making and accuracy ("fcs" = forecasts; "Acc" = forecast accuracy).
Normal periods (1050 fcs): attempted adjustment 21.33%; did not attempt 80.80%. Of attempts: advice not to adjust 100.00%, no advice 0.00%. After advice: adjusted 77.23% (Acc improved 17.90%, Acc not improved 82.10%); did not adjust 22.77%.
Rumour periods (294 fcs): attempted adjustment 50.68%; did not attempt 49.32%. Of attempts: advice not to adjust 100.00%, no advice 0.00%. After advice: adjusted 77.85% (Acc improved 28.45%, Acc not improved 71.55%); did not adjust 22.15%.
Promotion periods (378 fcs): attempted adjustment 77.51% (Acc improved 79.86%, Acc not improved 20.14%); did not attempt 22.48%. Of non-attempts: advice to adjust 100%, no advice 0%. After advice: adjusted 48.24% (Acc improved 60.98%, Acc not improved 39.02%); did not adjust 51.76%.

5. Discussion

The results of this study suggest that well-intentioned design features of support systems can have damaging effects on the performance of users. People often attempted to circumvent the constraints imposed by the restrictiveness feature and the accuracy of their forecasts was reduced as a consequence. They also frequently ignored the guidance they were given, a result which was consistent with the findings of Antony et al. [6] in their study of knowledge-based systems. Surprisingly, guidance was less acceptable than restrictiveness.

It is, of course, possible that at least some of the participants had misunderstood the nature of the task or that they were treating the task either carelessly or in a deliberately negative fashion because of lack of motivation. To establish whether there was any evidence for this we first identified participants who made an excessive number of downward adjustments to the statistical forecasts in promotion periods, when, of course, an upward adjustment would normally be required. Ten participants who downwardly adjusted 4 or more of their 9 forecasts for promotion periods were judged to be dubious. Two additional participants never made any adjustments at all, despite the evidence that promotion periods required an upwards adjustment. However, when the analysis was repeated with these 12 participants removed (so that 118 participants remained), similar results to those reported earlier were obtained.

Fig. 7. Number of occasions when restrictiveness and guidance were evoked (with fitted trend lines). (Two panels, both covering all periods: number of restrictiveness activations by period number, and number of times advice was triggered by period number.)

Fig. 8. Percentage of time participants’ decisions conflicted with those of the normative approach in the first and last fifteen periods. (Three panels: normal, rumour and promotion periods; vertical axis: percentage of non-normative actions; series: Control, Restrict, Guidance.)

Although guidance was more successful in moving people’s use of the system closer to that deemed to be normative, the extent to which it was ignored needs to be explained. One possibility is that guidance fared badly because it was imposed on users [27]. The relatively low level of acceptance for guidance provided some support for this. Alternatively, people may have resented having their initial decisions challenged. If this is the case, providing guidance before the adjustment decision is made would be likely to be beneficial. A third possibility is that the concise form in which the guidance was conveyed did not allow it to compete with the user’s internal rationale for making a given decision. The fact that advice was most often ignored in rumour periods adds weight to this last explanation. It seems that colourful speculations or rumours that have low reliability (e.g. Period 47: “Haven’t they predicted that we’ll have heavy rain next month? These medium term weather forecasts might not be that reliable, but rain will hit our sales.”) are likely to have far greater salience than a piece of terse advice cautioning against adjustment in a non-promotion period. This mirrors Tversky and Kahneman’s [40] finding that judges will ignore statistical base rates in favour of anecdotal case-based information that has little reliability. Of course, in real organizational environments forecasters may also feel obliged to act on rumours, especially if they emanate from more senior managers.

The beneficial effects of guidance, such as they were, were also achieved early on and there was only weak evidence of an improvement from this position as time went on. Parikh et al. [5] found that suggestive guidance was less effective than informative guidance in developing user learning about the problem domain. To foster learning it might therefore be worth coupling the suggestive guidance with some form of feedback that provided direct information about the task to support the advice and/or showed the past consequences of ignoring or adhering to the advice.

The main intention of restrictiveness was to dissuade participants from adjusting forecasts in non-promotion periods. While it achieved this objective to an extent, the resulting benefits were outweighed by two associated disadvantages. First, once people decided to make an adjustment in non-promotion periods, they often seemed determined to persevere and simply increased the size of the adjustment to get round the restriction, thereby tending to exacerbate the inaccuracy of the forecast. Second, ironically, restrictiveness appeared to reduce people’s propensity to make the adjustments that were required in promotion periods. The result was that, overall, restrictiveness significantly reduced the accuracy of forecasts when compared with the control group.
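To put the circumvention effect in proportion, the branch percentages in Fig. 5 can be chained along one path of the decision tree. The sketch below is a back-of-envelope illustration, not a result reported by the authors: the helper name is hypothetical, and the branch structure (attempt, then ban, then larger adjustment, then accuracy outcome) is an assumed reading of the figure.

```python
from math import prod

def path_share(*branch_percentages):
    """Share (in %) of all forecasts that follow one path through a decision tree,
    obtained by multiplying the conditional branch percentages along the path."""
    return 100.0 * prod(p / 100.0 for p in branch_percentages)

# Path: attempt adjustment -> adjustment banned -> make larger adjustment
#       -> accuracy not improved (percentages as read from Fig. 5).
normal_share = path_share(19.18, 37.40, 35.44, 92.90)
rumour_share = path_share(43.18, 31.57, 66.67, 78.60)
```

On this reading, a little over 2% of normal-period forecasts and about 7% of rumour-period forecasts in the restrictiveness treatment ended as an enlarged adjustment that failed to improve accuracy. The path is therefore a small share of all forecasts, but a damaging one whenever it is taken.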
Again it would be interesting to see if these negative effects could be mitigated if restrictiveness was coupled with some form of feedback. It was also expected that restrictiveness would eventually improve the efficiency of decisions by discouraging participants from wasting time pondering over small adjustments that would serve no purpose. This expectation was also not borne out by the results. Although those in the restrictiveness group attempted fewer adjustments, this did not significantly reduce the mean time they spent on the forecasting task.

6. Conclusions

The main conclusion of this paper is that neither restrictiveness nor guidance was wholly successful in fostering improved use of a support system. Indeed, restrictiveness, though more acceptable to forecasters than guidance, actually encouraged counter-productive behavior and significantly reduced forecast accuracy. Although guidance was more effective in encouraging decisions that were closer to the normative approach, its impact was limited because it was frequently ignored. The experiment illustrates the danger that well-intentioned design features in support systems can be resisted by users, with potentially detrimental effects on the quality of their decision making. Despite this, the experiment provided enough evidence to suggest that support systems are worth pursuing and that the ideas of guidance and restrictiveness could be worthwhile features of such systems if their disadvantages can be avoided. They both have the potential to deliver improved forecasts. Providing feedback in association with these facilities might be one way of achieving this and would be a worthwhile avenue for future research.
Future experiments could also usefully investigate the role of restrictiveness and guidance when forecasts of demand are provided for multiple time horizons and where minimum allowable adjustments are specified in advance, with adjustments possibly being expressed as percentages rather than as absolute values. The effect of requiring managers to make adjustments prior to seeing the statistical forecasts (as recommended by [22]), with further adjustments being prohibited thereafter, is also worth investigating.

Acknowledgements

This research was supported by Engineering and Physical Sciences Research Council (EPSRC) Grants GR/60198/01 and GR/60181/01.

References

[1] Fildes R, Goodwin P, Lawrence M, Nikolopoulos K. Effective forecasting and judgmental adjustments: an empirical evaluation and strategies for improvement in supply-chain planning. International Journal of Forecasting 2009;25:3–23.
[2] Kottemann JE, Davis FD, Remus WE. Computer-assisted decision making: performance, beliefs, and the illusion of control. Organizational Behavior and Human Decision Processes 1994;57:26–37.
[3] Silver M. Decision support systems: directed and nondirected change. Information Systems Research 1990;1:47–70.
[4] Stabell CB. A decision-orientated approach to building DSS. In: Bennett JL, editor. Building decision support systems. Reading, MA: Addison-Wesley; 1983.
[5] Parikh M, Fazlollahi MB, Verma S. The effectiveness of decisional guidance: an empirical investigation. Decision Sciences 2001;32:303–31.
[6] Antony S, Batra D, Santhanam R. The use of a knowledge-based system in conceptual data modelling. Decision Support Systems 2005;41:176–88.
[7] Sanders NR, Graman GA. Quantifying costs of forecast errors: a case study of the warehouse environment. Omega 2009;37:116–25.
[8] Blattberg RC, Hoch SJ. Database models and managerial intuition: 50% Model + 50% manager. Management Science 1990;36:887–99.
[9] Goodwin P. Integrating management judgment and statistical methods to improve short term forecasts. Omega 2002;30:127–35.
[10] Keen PGW, Scott Morton MS. Decision support systems: an organizational perspective. Reading, MA: Addison-Wesley; 1978.
[11] Fildes R, Lawrence M, Goodwin P. The design features of forecasting support systems and their effectiveness. Decision Support Systems 2006;42:351–61.
[12] Fagerholt K, Christiansen M, Hvattum LM, Johnsen TAV, Vabø TJ. A decision support methodology for strategic planning in maritime transportation. Omega 2010;38:465–74.
[13] Tütüncü GY, Carreto CAC, Baker BM. A visual interactive approach to classical and mixed vehicle routing problems with backhauls. Omega 2009;37:138–54.
[14] Lee WY, Goodwin P, Fildes R, Nikolopoulos K, Lawrence M. Providing support for the use of analogies in demand forecasting tasks. International Journal of Forecasting 2007;23:377–90.
[15] Webby W, O’Connor M, Edmundson B. Forecasting support systems for the incorporation of event information: an empirical investigation. International Journal of Forecasting 2005;21:411–23.
[16] Willemain TR. The effect of graphical adjustment on forecast accuracy. International Journal of Forecasting 1991;7:151–4.
[17] Goodwin P. Improving the voluntary integration of statistical forecasts and judgment. International Journal of Forecasting 2000;16:85–9.
[18] Goodwin P, Fildes R, Lee WY, Nikolopoulos K, Lawrence M. Understanding the use of forecasting systems: an interpretive study in a supply-chain company. University of Bath Management School Working Paper, 2007.
[19] Goodwin P, Fildes R. Judgmental forecasts of time series affected by special events: does providing a statistical forecast improve accuracy? Journal of Behavioral Decision Making 1999;12:37–53.
[20] Harvey N. Why are judgments less consistent in less predictable task situations? Organizational Behavior and Human Decision Processes 1995;63:247–63.
[21] Armstrong JS. Principles of forecasting: a handbook for researchers and practitioners. Norwell, MA: Kluwer Academic Publishers; 2001.
[22] O’Connor M, Remus W, Griggs K. Judgemental forecasting in times of change. International Journal of Forecasting 1993;9:163–72.
[23] Sanders N, Ritzman L. Judgmental adjustments of statistical forecasts. In: Armstrong JS, editor. Principles of forecasting: a handbook for researchers and practitioners. Norwell, MA: Kluwer Academic Publishers; 2001. [Chapter 13].
[24] Venkatesh V, Morris MG, Davis GB, Davis FD. User acceptance of information technology: towards a unified view. MIS Quarterly 2003;27:425–78.
[25] Montazemi AR, Wang F, Nainar SMK, Bart CK. On the effectiveness of decisional guidance. Decision Support Systems 1996;18:181–98.
[26] O’Connor M, Remus W, Lim K. Improving judgmental forecasts with judgmental bootstrapping and task feedback support. Journal of Behavioral Decision Making 2005;18:247–60.
[27] Bonaccio S, Dalal RS. Advice taking and decision making: an integrative literature review, and implications for organizational sciences. Organizational Behavior and Human Decision Processes 2006;101:127–51.
[28] Wærn Y, Ramberg R. People’s perception of human and computer advice. Computers in Human Behavior 1996;12:17–27.
[29] Yaniv I. The benefit of additional opinions. Current Directions in Psychological Science 2004;13:75–8.
[30] Yaniv I. Receiving other people’s advice: influence and benefit. Organizational Behavior and Human Decision Processes 2004;93:1–13.
[31] Yaniv I, Kleinberger E. Advice taking in decision making: egocentric discounting and reputation formation. Organizational Behavior and Human Decision Processes 2000;83:260–81.
[32] Stellwagen EA, Goodrich RL. Forecast pro for Windows. Belmont, MA: Business Forecast Systems Inc.; 1994.
[33] Ailawadi KL, Neslin SA. The effect of promotion on consumption: buying more and consuming it faster. Journal of Marketing Research 1998;35:390–8.
[34] Hendel I, Nevo A. The post promotion dip puzzle. What do the data have to say? Quantitative Marketing and Economics 2003;1:409–24.
[35] Neslin SA, Stone LGS. Consumer inventory sensitivity and the post-promotion dip. Marketing Letters 1996;7:77–94.
[36] Remus W. Graduate students as surrogates for managers in experiments on business decision making. Journal of Business Research 1986;14:19–25.
[37] Remus W, O’Connor M, Griggs K. The impact of incentives on the accuracy of subjects in judgmental forecasting experiments. International Journal of Forecasting 1998;14:515–22.
[38] Eroglu C, Croxton KL. Biases in judgmental adjustments of statistical forecasts: the role of individual differences. International Journal of Forecasting 2010;26:116–33.
[39] Harvey N, Bolger F. Graphs versus tables: effects of data presentation format on judgemental forecasting. International Journal of Forecasting 1996;12:119–37.
[40] Tversky A, Kahneman D. Judgment under uncertainty: heuristics and biases. Science 1974;185:1124–31.