Cube Analyst


Cube ME

Table Of Contents

Introduction
    Introduction
    Scope of this Manual
    Background
    Common Elements and Variations
    Reading this Manual
    Conventions Used in this Manual
    Computing Resources
    Cost Information
The Nature Of The Estimation System
    A Framework for Handling Different Data Consistently
    Objectives
    Other Features
    Options for the User
    Considerations for the User
    Estimating Highway and Public Transport Matrices
    Overview of Cube ME
Possible Data Inputs
    Different Types of Data
    Sets of Data
Mathematical Background
    Mathematical Notation
    Introduction to the Mathematics in Cube ME
    Mathematical Summary
    Extensions to the Calculations
Data Preparation and Analysis of Results
    Overview
    Matrices
    Trip Ends
    Networks and Traffic and Passenger Counts
    Screenlines
    Routings
    Setting Confidence Levels
    Tuning Estimation Performance
    Control of Routing Information
    Analyzing the Results
The Estimation Process In Application
    Study Area
    Data
    Estimating the Matrix
    Evaluation: Sensitivity Analysis
    Including Part Trip Data
Hierarchic Estimation
    Introduction to Hierarchic Estimation
    Alternative Approaches to Hierarchic Estimation
    Defining Districts
    Running Cube ME for Hierarchic Estimation
Using Cube ME
    Input Data - Overview
    Outputs - Overview
    Estimating Large Matrices (Hierarchic Estimation)
    The Estimation Process
Reports
    Summary of Reports
    Example of Average Confidence Level Report
    Example of Final Five Iterations Report
    Example of Matrix Totals and Zone Generation Report
    Example of Zone Attractions Report
    Example of Average Confidence Level Report (Part Trip Data)
    Example of Part Trip Totals Report
    Example of District Matrix Reports
    Example of Local Matrix Reports
Files
    Permanent Files
Control Data
    &PARAM Keywords
    &OPTION Keywords
Program Specific Data
    Screenline File
    Trip End File
    Coordinate File
    Model Parameter File
    Local Matrix Control File
    District Definition File
    Intercept File
    Gradient Search File
Notes on Program Use
    Approaches to Running Cube ME
    Selection of Model Form
    Information in the Optimization Log File
    Computation Times
Examples
    Estimation with Prior Trip and Count Data Only
    Estimation with Prior Trip, Count, and Trip End Data
    Estimation with 'Warm Start' and Cost Data
    Estimation with Highways Part Trip Data
    Estimation with Public Transport Part Trip Data
    Hierarchic Estimation
    Example of Screenline Volumes Report
Index

Introduction
Introduction
Cube ME is a program that estimates an Origin-Destination (O-D) trip matrix. Cube ME estimates one matrix at a time, and the input data should form a set related to that matrix: it should correspond to the same time period (hour(s) of day, day of week, time of year) and to the same units of flow (vehicles, PCUs, passengers, etc.) as the matrix. The characteristic common to all estimation options offered by Cube ME is that they make the best use, in a flexible way, of commonly available data sources. The user assigns each item of data a 'level of confidence' or 'reliability', which conditions the influence of the different data sources in the estimation. The estimation process is based on the Maximum Likelihood technique, coupled with an optimization procedure.

Scope of this Manual


This manual applies to all levels of functionality and modes of operation of Cube ME. Features specific to a variant are noted. The manual concentrates on Cube ME itself; wider matters of matrix estimation, and the context within which Cube ME may be used, are described in the 'Introduction to the Matrix Estimation Programs' manual. That manual also explains the terms used here that have a specific meaning for Cube ME.

Background
Cube ME enables transport planners to estimate Origin-Destination (O-D) trip matrices and to maintain the currency of existing O-D matrices, while minimizing survey costs. As is described later in Mathematical Estimation Model, Cube ME is suitable for estimating present day matrices, but not for forecasting future year trip matrices. The software contains a number of novel and distinctive features. It was first developed as a collaborative venture with the Dutch Ministry of Transport, the Rijkswaterstaat. Subsequently, studies and developments undertaken for Centro (the Passenger Transport Executive for the West Midlands area of England) led to a broadening of the software's capabilities to consider public transport passenger matrices, as well as highway (vehicle) matrices, and to estimate detailed matrices for very large study areas.

Common Elements and Variations


The characteristic common to all variants of Cube ME is that they make the best use, in a flexible way, of most available data sources in the estimation process. This includes not only vehicle traffic or passenger flow counts and prior (old) matrices, but also partially observed matrices, zonal trip end (generation and attraction) data, vehicle routing, travel cost matrices, and even previously calibrated trip cost distribution functions. An extension is the use of a further form of data called Part Trip data, described in Possible Data Inputs. The user ascribes confidence, or reliability, levels to the data. These condition the influence of the data when different data items (inevitably) imply different trip matrix cell values.

The estimation process is based on a statistically rigorous procedure which takes direct account of inherent traffic data variability. It uses the Maximum Likelihood technique, coupled with a powerful optimization procedure, to derive simultaneously an unusually large set of Model Parameters. These then determine the estimated trip cell values with correspondingly enhanced precision.

Nevertheless, the estimation process remains mathematically underspecified, and a feature of Cube ME is the information available to assess the quality of the estimated matrix. This includes comparative and sensitivity analyses, and reports which draw on a range of graphical and tabular presentations. Statistical reports are available which provide information on the standard errors of Model Parameter values, and indicators of the stability of estimated trip matrix cells (via a Sensitivity Matrix).

Cube ME provides a hierarchic approach to estimation, suited for use with very large matrices, typically between 2,500 and 5,000 zones in size. Its basic approach is to estimate a general matrix, in which zones are automatically grouped into Districts. This area-wide estimation is then used to control a set of detailed estimations, which build up to provide a fully detailed estimate for the entire study area.
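To make the idea of grouping zones into Districts concrete, the following Python sketch aggregates a small zone-level matrix to district level. It is an illustration only: the function name, the example matrix and the zone-to-district mapping are hypothetical, and the sketch does not reproduce how Cube ME itself forms Districts or controls the detailed estimations.

```python
def aggregate_to_districts(matrix, zone_to_district, n_districts):
    """Sum zone-to-zone trips into district-to-district trips.

    zone_to_district[i] gives the (hypothetical) district index of zone i.
    """
    district = [[0.0] * n_districts for _ in range(n_districts)]
    for i, row in enumerate(matrix):
        for j, trips in enumerate(row):
            district[zone_to_district[i]][zone_to_district[j]] += trips
    return district


# Hypothetical 4-zone matrix; zones 0-1 form district 0, zones 2-3 district 1.
zone_matrix = [
    [0.0, 5.0, 2.0, 1.0],
    [4.0, 0.0, 3.0, 2.0],
    [1.0, 2.0, 0.0, 6.0],
    [2.0, 1.0, 5.0, 0.0],
]
zone_to_district = [0, 0, 1, 1]

district_matrix = aggregate_to_districts(zone_matrix, zone_to_district, 2)
print(district_matrix)
```

The district matrix preserves the total number of trips in the zone matrix, which is what allows an area-wide estimate to act as a control on more detailed ones.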

Reading this Manual


The introductory chapters provide:
- an overview of Cube ME;
- a set of Standardized Procedures, suitable for different types of estimations.

The manual considers estimation of highway and public transport matrices and all of the Cube ME features. Highway and public transport estimation are very similar, apart from obvious differences such as the use of line (service) data for public transport. There are also differences in emphasis; for example, count data is often more plentiful and reliable for highways than for public transport. Where such differences arise, they are noted.

When reading this manual, note that:
- the following four chapters provide an essential overview of Cube ME;
- Chapter 6 documents an example of applying Cube ME;
- Chapter 7 is concerned with the specialist topic of Hierarchic Estimation.

Conventions Used in this Manual


The following conventions are used in this manual:

COSTM                   Parameters, Options and Selections in upper case.
Hessian                 Technical term introduced for the first time, in upper and lower case italics.
'Sensitivity Matrix'    Terms and phrases with particular meaning in the context of Cube ME in single quotes. These phrases may also appear in italics.

Computing Resources
Cube ME is a major system. The programs ensure that the mechanics of operation for the user are straightforward, but the system requires familiarity with a number of programs, especially for data preparation and analysis of results, and this should be taken into account when planning to use it for the first time. Cube ME is designed around a number of rigorous principles, including the calibration of the mathematical estimation model which the program undertakes. One consequence is that it is computationally intensive; the differing sets of data are considered simultaneously, and this requires relatively large amounts of Random Access Memory (RAM) and of disk space.

Cost Information
For highways, cost data is produced by Citilabs products. For Public Transport in TRIPS, cost data is produced by MVPUBM.

The Nature Of The Estimation System


A Framework for Handling Different Data Consistently
Cube ME provides a framework that is used to input a variety of information to estimate an O-D matrix. The characteristics of the system are that:
- some or all of the types of information introduced in Common Elements and Variations may be used;
- the system can work with little data, but the accuracy of the estimated matrix is improved as more data is input;
- different information is handled on a consistent basis;
- the variability of data is explicitly accounted for.

Objectives
The aim of Cube ME is to maximize the value of existing data and to limit the need for costly surveys. As such, it is mainly concerned with processing information in the best (statistical) manner, though the accuracy of the estimated matrix remains strongly affected by the amount and the quality of the information input by the user. Besides estimating matrices for individual studies, Cube ME is suited for use with regular surveys designed to keep matrix information up to date.

Other Features

Handling Data Variability


Cube ME explicitly considers the variability of data. Inevitably, there are inconsistencies in what the different data suggest that the estimated matrix should be. The inherent variability means that collected data items are merely a sample, and hence the values (even of simple traffic counts) may only be considered to fall within a range (a distribution). The width of this range is a reflection of the confidence that may be placed in particular items. Cube ME therefore requires the user to input information about how confident they are that each data item is representative of the situation for which the matrix is to be estimated. The information is input as a nominal percentage sample value. In restricted circumstances, this may be an actual sample obtained in a survey. This information about the variability is used to determine what relative influence each item of data has in the estimation process: it acts algebraically as a weighting value, and is referred to as a Confidence Level.
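As a purely illustrative sketch of how a Confidence Level can act as a weighting value, the following Python fragment combines two inconsistent observations of the same quantity using a weighted mean. The function name, the example counts and the use of a simple weighted mean are assumptions made for illustration; the actual weighting within Cube ME arises from its Maximum Likelihood formulation and is not reproduced here.

```python
def weighted_estimate(values, confidences):
    """Combine inconsistent observations, weighting each by its confidence.

    `confidences` are nominal percentage sample values (0-100). The simple
    weighted mean used here is an illustrative assumption, not the Cube ME
    calculation.
    """
    total_weight = sum(confidences)
    if total_weight == 0:
        raise ValueError("at least one observation needs a non-zero confidence")
    return sum(v * c for v, c in zip(values, confidences)) / total_weight


# Two hypothetical counts for the same link: a recent automatic count held
# with high confidence and an older manual count held with low confidence.
counts = [1200.0, 900.0]
confidences = [80.0, 20.0]

combined = weighted_estimate(counts, confidences)
print(combined)
```

The combined value lies much closer to the high-confidence count, which is the qualitative behaviour the text describes.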

Options for the User


The user does not have to use Cube ME in one manner, but rather according to the information that is available and the context within which the matrix is required. Typically, the user will start with what information is to hand or may easily be collected. This provides a fast means of obtaining an initial matrix that can enable a study to proceed, at least for general investigations. Analysis of the resulting matrix and estimation statistics will show where there is greatest requirement for further quality data. Cube ME is then used to integrate this new (and possibly different type of) data to produce an improved estimated matrix.

Considerations for the User


Cube ME involves the user in a number of stages:

Deciding what Information to Input


This will usually be all information already available, but new data will normally be appropriate for those parts of the study area where most change has taken place since previous surveys, or where traffic schemes or policy proposals require detailed analysis. See Table 2.1 and Figure 2.1.

Feature (changes in)             Example                                                  Data
Car Ownership                    Traffic growth                                           Counts
Land-Use                         New industry, shops; new car parking                     Trip ends (Generations and Attractions)
Road/Public Transport Network    New bypass; traffic management; new bus/rail services    Travel times, routing
Travel Habits                    Out-of-town shopping                                     Observed O-D patterns; PT operators' boarding & alighting surveys; vehicle licence plate surveys

Table 2.1: Identify Notable Features and Data Sources

Figure 2.1: Appreciating Key Land Uses

Inputting Data
Information may be input in the form of matrices, as Trip Ends, or as network-related information. This data is prepared by the user within Cube, which offers a variety of modes of data entry. Extra information is required on data variability. This is input in the same form as the information to which it corresponds. Each data item (e.g. each count, trip end, etc.) may have an individual Confidence Level attached to it, but in many cases global values will be used.

Estimating the Matrix


The matrix estimation stage simply requires the user to input the prepared files into Cube ME. As is described in Overview of Cube ME, and in more detail in Mathematical Background, Cube ME performs a set of iterative calculations which automatically determine the statistically most likely matrix for the set of input data values provided. The first time Cube ME is run, it creates a set of files which can be used to reduce the run times of subsequent runs. This is either because the need to restructure data is avoided (the Intercept file) or because an estimation can take advantage of previously calculated results (the Gradient Search file and the Model Parameter file).

This ability to benefit from a previous run of Cube ME (for the same basic study) is usually used to assist in analyzing the consequences of changes in data values, but for lengthy runs for large matrices it can provide a means of breaking an estimation into more than one run, for convenience. With an improved optimizer in Cube ME and more powerful computers, such staging of estimations is now rarer, but it remains a typical feature for Hierarchic estimations of extremely large matrices. This is assisted by the Local Matrix Control file, which is open to editing so that estimations are staged in a manner convenient to the user.

Analyzing the Estimated Matrix


It is natural and desirable to want to check the quality of the estimated matrix. A typical approach to checking quality might be to compare the estimated matrix with some observed data which has not been used in the estimation process. However, this approach is not usually appropriate for Cube ME, which is designed to take advantage of all reasonably observed data. For example, if the estimated matrix implies that the link flows across a screenline differ from those observed (this is easily checked by assigning the estimated matrix to the network), then the solution is to re-run the estimation, now incorporating the extra observed data. The approach to analyzing the quality of the estimated matrix is, therefore, based on:
- comparing the estimated results with input data values;
- checking the sensitivity of the results if data values are altered;
- analyzing the estimation calculations.

Besides information output by Cube ME itself, extensive use is made of other Citilabs programs for creating tabulations and graphic displays which highlight different characteristics of the estimated matrix.

Improving the Estimated Matrix


Deficiencies in the quality of the estimated matrix, when they are signalled by the results of the analysis phase, are remedied by improving the quality or quantity, or both, of the input data. The analysis phase can provide strong pointers as to which data is contributing to quality problems and hence where the user can focus attention.

Estimating Highway and Public Transport Matrices


For much of the time, it is not necessary to distinguish between the cases of estimating matrices for use with highways and public transport analysis; the same principles apply to each. However, there are a number of points to note. The first one is that the units of the matrices are usually in terms of vehicles for highways, and in terms of passengers for public transport. Much of the data and methods of processing are identical for both highways and public transport, but the routing information is derived in quite different ways. There is also the concept of Line Groups, which only applies to public transport and not to highways. Assumptions about the quality and quantity of data vary between the modes. Link count data is more readily, and accurately, available for highways than for public transport. Public transport is often more reliant on Part Trip data, as obtained from boarding and alighting surveys. This form of data may be obtained from licence plate matching surveys for highways.

Overview of Cube ME
Cube ME's operations can be considered as a series of activities:

i) Data Input and Restructuring

For the most part, Cube ME simply reads the set of user's input data at this initial stage. However, Cube ME also analyzes and restructures routing information (from the TRIPS Route Choice Probability (RCP) file or VOYAGER path file) and count data (from the Screenline file) into a more concise and efficient file, called the Intercept file. This restructuring can be relatively lengthy so, as noted in Considerations for the User, it is possible to re-use an Intercept file once it has been created. For VOYAGER users, the creation of the Intercept file is handled by the HIGHWAY program.

ii) Calculation Initiation

The main Cube ME calculations may be viewed as a search for the statistically most likely matrix, given the set of input data values. As this search relates, typically, to many thousands of matrix cell values, the manner of searching is a critical aspect of Cube ME. A calculation called 'The Method of Scoring' directs the start of the searching process. This calculation is always done as the first stage of the estimation calculation, and it may be repeated later, according to the setting of Cube ME's ITERH parameter. (This determines the number of iterations between Gradient Search Matrix calculations.)

There is a 'strategy' consideration here. The default method for running Cube ME spends time with 'The Method of Scoring' calculation in order to limit subsequent calculations. Cube ME also calculates a suitable value for ITERH. However, it is open to the user to over-ride this strategy by:
changing the setting of the IHTYPE parameter (used to determine the optimization process) of Cube ME from its default in order to avoid the Method of Scoring. This reduces the associated calculation time, but means that the searching process is initially less well directed and so the net calculation time may still be longer;

setting ITERH to a lower value than the default, which means that the searching process is re-appraised by further application of the Method of Scoring. This may be suitable when there are signs that the optimizer is not able to determine a convergent solution in a reasonable number of iterations.

The user should note that these options for tuning the performance of Cube ME exist, but should not necessarily be concerned to apply them, as the default operation is usually entirely satisfactory. It requires some experience with a particular estimation problem to determine its best strategy.

iii) Function Evaluation

Function Evaluation is the term used to describe the calculation of a series of estimation results. These are calculated by way of an Estimation Equation (function). The Estimation Equation calculates the values of the estimated cells according to the current values of a series of Model Parameters. There are a large number of Model Parameters; in fact, the number is usually two times the number of zones, plus the number of screenlines. These Model Parameters have an initial value of 1.0, which has the consequence that the initial Function Evaluation (usually) results in an Estimated Matrix which is identical to the old ('Prior') matrix (see Different Types of Data).

iv) Optimization

The Optimizer is a central feature of Cube ME; there are two critical elements to it:

a) an Objective Function - this provides a criterion by which the optimizer can determine whether one value of a particular cell is better than another value. Mathematical Background explains how this criterion is derived from the statistical Maximum Likelihood theory and rigorous mathematical calculation. Hence, Cube ME defines 'better' as 'statistically more likely';

b) a set of Search Directions and a Step Length - the optimizer alters the Model Parameter values, from their starting point of 1.0, to seek an Estimated Matrix that is an improvement on its current estimates. The Search Direction determines, for any cell in the matrix, whether Model Parameters should be increased or decreased, and the Step Length defines by how much. The final values of the Model Parameters are available to view as the Model Parameter file, so it is possible to see how they have been changed from 1.0.

v) Iterations and Convergence

After the optimizer has calculated new Model Parameter values, the function evaluation process is repeated to obtain the latest estimated matrix (and its derivative values). This overall process is repeated in a series of iterations; at each iteration the optimizer ensures that the new estimated matrix is an improvement on ('more likely' than) the previous one. Because there are so many cells to estimate, and Cube ME does not confine them to integer values, it is almost always possible to make some improvement, however small. Therefore, it is necessary to define a criterion to determine when the iterations have reached an acceptable solution. In Cube ME, this criterion is set by the UTOL ('user tolerance') control parameter. UTOL sets a minimum value on the step length which the optimizer is allowed to use, as very small step lengths indicate that the optimizer is making correspondingly small changes to the estimated matrix. It is usual to leave UTOL at its default value, and allow Cube ME to run until it terminates with a 'converged' message.
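The Function Evaluation step can be illustrated with a small Python sketch. It counts the Model Parameters using the rule quoted above (two per zone plus one per screenline) and shows that, with every parameter at its initial value of 1.0, a multiplicative evaluation returns the Prior matrix unchanged. The multiplicative form, the function names and the `crossings` input are assumptions made for illustration only; the actual Estimation Equation is the one set out in Mathematical Background.

```python
def count_model_parameters(n_zones, n_screenlines):
    """Usual parameter count quoted in the text: 2 x zones + screenlines."""
    return 2 * n_zones + n_screenlines


def evaluate(prior, origin_par, dest_par, screen_par, crossings):
    """Toy function evaluation using an ASSUMED multiplicative form.

    crossings maps an (origin, destination) pair to the list of screenline
    indices that the pair's route is taken to cross (hypothetical input).
    """
    n = len(prior)
    estimate = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            factor = origin_par[i] * dest_par[j]
            for k in crossings.get((i, j), []):
                factor *= screen_par[k]
            estimate[i][j] = prior[i][j] * factor
    return estimate


# Hypothetical 3-zone Prior matrix and a single screenline isolating zone 2.
prior = [
    [0.0, 40.0, 10.0],
    [30.0, 0.0, 25.0],
    [15.0, 20.0, 0.0],
]
crossings = {(0, 2): [0], (2, 0): [0], (1, 2): [0], (2, 1): [0]}
n_zones, n_screenlines = 3, 1

# All Model Parameters start at 1.0.
origin_par = [1.0] * n_zones
dest_par = [1.0] * n_zones
screen_par = [1.0] * n_screenlines

n_params = count_model_parameters(n_zones, n_screenlines)
initial = evaluate(prior, origin_par, dest_par, screen_par, crossings)
print(n_params)
print(initial == prior)
```

Under these assumptions, raising a single screenline parameter above 1.0 scales only the cells whose routes cross that screenline, which gives a feel for how the optimizer's changes to the parameters move the estimate away from the Prior.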

Possible Data Inputs


Different Types of Data
Cube ME can operate using some or all of the following data items; for several of them, information is also required on Confidence Levels:

Link Counts
For highways, this information may be surveyed with considerable accuracy and exploit automatic counters, but it may not show the current demand for travel (which the O-D matrix should represent) if congestion has constricted flows. For public transport, this data is often obtained from estimates of passenger numbers in buses and rail carriages, and is of inherently limited accuracy (but may still be usefully exploited by Cube ME). For both modes, it should be observed that matrices normally apply to average situations for which individual counts will match to only some extent. Link counts which are spread randomly across the network contribute relatively little information to the estimation of matrix cells. This may be less of a problem for public transport networks offering limited alternative routes, than for highway networks with inherently greater route choice options.

Turning Counts
The same comments as for link counts apply. Note that turning counts may only be applied when inputting a VOYAGER path file. They are not supported for an estimation using a TRIPS RCP file.

Prior Trip Matrix


This matrix might be an out-of-date matrix for the study area, or possibly a previous study forecast for the present day. It is not essential to input a Prior Trip Matrix, but in practice a matrix is very desirable for information about the pattern of trip movements.

Trip Cost Matrix


This matrix summarizes the cost of travel between zones, where cost is normally defined as a user-specified combination of time and distance, and any tolls or fares, etc. The Trip Cost Matrix may be used as a substitute when some or all of a Prior Matrix is not available. The costs may be based on either modelled or surveyed speed data.
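As a minimal sketch of such a user-specified combination, the following Python fragment builds a cost matrix from time, distance and toll skims using a weighted sum. The linear form, the weights and the example skim values are all hypothetical; in practice the cost definition is whatever the user has specified in the program that produces the cost data.

```python
def generalized_cost(time_min, distance_km, toll,
                     time_weight=1.0, distance_weight=0.5):
    """Weighted sum of time, distance and toll (assumed linear form)."""
    return time_weight * time_min + distance_weight * distance_km + toll


def build_cost_matrix(times, distances, tolls):
    """Apply generalized_cost cell by cell to three equally sized skims."""
    n = len(times)
    return [
        [generalized_cost(times[i][j], distances[i][j], tolls[i][j])
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 2-zone skims: minutes, kilometres and toll in cost units.
times = [[0.0, 12.0], [14.0, 0.0]]
distances = [[0.0, 8.0], [8.0, 0.0]]
tolls = [[0.0, 1.5], [0.0, 0.0]]

cost_matrix = build_cost_matrix(times, distances, tolls)
print(cost_matrix)
```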

Partial O-D Matrix

This is simply another approach to providing the Prior Matrix that makes it possible to use information that specifies some cells of the matrix but not all. The user merely identifies a (relatively) high confidence in those cells which have been observed and allows other information to determine values in the remaining cells. This may be data from the Cost Matrix, in which case the corresponding Prior Matrix cells must be zero. Alternatively, non-observed cells are given non-zero values with zero or low Confidence Levels. Zero values in input matrices are taken to indicate that trips in corresponding cells are impossible. Cost data are not used to estimate trips for cells which have non-zero Prior Trip values.

This approach makes Cube ME useful when surveys have been conducted around critical parts of a study area (e.g. town centers, travel corridors, etc.), but there remains a need to estimate the matrix for the rest of the area.
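The practice of attaching a high confidence to observed cells and a low confidence elsewhere can be sketched in Python as follows. The function name, the set of observed cells and the two confidence values are hypothetical; the sketch shows only how a cell-by-cell confidence matrix might be prepared, not the file format in which Cube ME expects it.

```python
def confidence_for_partial_matrix(n_zones, observed_cells, high=90.0, low=5.0):
    """Build a confidence matrix: `high` for observed cells, `low` otherwise.

    observed_cells is a set of (origin, destination) index pairs that were
    covered by a survey. The values of `high` and `low` are illustrative.
    """
    return [
        [high if (i, j) in observed_cells else low for j in range(n_zones)]
        for i in range(n_zones)
    ]


# Hypothetical survey covering movements between zones 0 and 1 only.
observed_cells = {(0, 1), (1, 0)}
confidence = confidence_for_partial_matrix(3, observed_cells)
print(confidence)
```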

Trip Ends
The total number of trips generated from and attracted to zones (G&A) may be obtained either from surveys or from mathematical land-use type models. Surveys are appropriate when zone boundaries are such that traffic may be counted entering and leaving zones on distinct trips, rather than merely passing through the zone. This tends to occur only for some zones, for example a car park or an industrial estate, but these are often important zones for a study. It is possible to use data derived from both methods, for example, a few zones surveyed and the remainder derived from a model, with the resulting Trip Ends distinguished through differing Confidence Levels.

Routing Information
It is possible to survey routing data, though this is rarely done. The modelling of routing is often not a very good replication of actual (erratic) driver or passenger routing, and it is often not possible to place much reliability on this otherwise important data. Cube ME is therefore designed to use routing information, as far as possible, only where the precise routing does not matter. Thus, for skim cost information small variations in routes may be ignored, while count information is used in 'bottleneck' situations where the number of routes is limited to a few alternative links (ideally one).

Cost Distribution Function


Many areas which have been the subject of previous studies will have a previously calibrated mathematical Trip Cost Distribution Function, as used in the Gravity model. Because Cube ME contains its own calibration procedures, the information implied by the Distribution Function is not normally used directly, although the $\alpha$ and $\beta$ parameters, discussed later, may be fixed with reference to a previously calibrated Gravity model.

Part Trip Data


This data is surveyed in the form of matrices where the recorded origin and destination are not necessarily the ultimate origin and destination of the trip. This is illustrated in Figure 3.1, which shows the recorded part of trip (S - E) relative to the total trip (O - D). It is possible for one or both of points S and E to coincide with the corresponding points O and D. For highways, this data is typically obtained from licence plate matching surveys; for public transport, it is obtained from on-board surveys recording passenger boarding and alighting points.

Figure 3.1: Definition of Part Trip Data


Sets of Data
Cube ME estimates one matrix at a time, and the data should form a set related to this particular matrix, that is, the data should correspond to the same time period (hour(s) of day, day of week, time of year) as the matrix. It should also correspond to the same units of flow (vehicles, pcus, passengers, etc.). Sometimes the user will have to transform data (e.g. by factoring) to achieve this, and this will usually imply a reduction (small or large) in Confidence Levels for the transformed data. Also, only one set of information may be input into Cube ME for an estimation. Hence, if multiple sets exist, say, several traffic counts for the same link, then the user must derive a single set. This may simply be to choose the most recently surveyed set, or it might be a weighted average of all available sets. Multiple sets of data usually allow Confidence Levels to be increased relative to single sets of data.
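The reduction of several counts for one link to a single input value can be sketched as follows. This is a minimal illustration only: the function name `combine_counts`, the sample counts and the weights are hypothetical, and the choice of weights (here favouring the most recent survey) is a user judgment rather than anything prescribed by Cube ME.

```python
def combine_counts(counts, weights):
    """Return the weighted average of several counts for the same link."""
    if not counts or len(counts) != len(weights):
        raise ValueError("counts and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(c * w for c, w in zip(counts, weights)) / total_weight


# Hypothetical example: three surveys of one link (vph), newest weighted highest.
link_count = combine_counts([1684, 1739, 1702], [1.0, 2.0, 3.0])
print(round(link_count, 1))
```

Choosing only the most recent survey corresponds to giving it a weight of 1 and all others a weight of 0.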


Mathematical Background
Mathematical Notation

Explaining the Letters and Symbols

'It's All Greek to Me!'

This Section uses mathematical notation, which can look daunting for those who are not accustomed to it. So, firstly, a word of background explanation. The notation can be made to appear worse because of the use of Greek letters and some specialist mathematical symbols. The problem is that the normal 26 letter Roman alphabet is not sufficient, even considering upper and lower case letters, and remembering that some letters have traditional mathematical meanings and associations. The mathematics which is presented here is only an extract of the full Cube ME mathematics, which uses an even wider range of letters. Also, some of the traditional mathematical notations are cumbersome when used with vectors and matrices and their elements, as Cube ME requires, hence it is better to use alternative forms.

This is mainly a pronunciation guide, but some of the symbols and letters are explained further:

$\alpha$    alpha
$\beta$    beta
$\eta$    eta
$\theta$    theta
$\lambda$    lambda
$\Xi$    xi (upper case)
$\xi$    xi (lower case)
$\Pi$    pi (upper case); symbol for multiplication (product)
$\Sigma$    sigma (upper case); symbol for summation
$\Phi$    phi (upper case)
$\Psi$    psi (upper case)
$\partial$    partial differential operator
$\nabla$    nabla; symbol for (partial) differentiation of matrix elements
$e$    exponent
$!$    factorial operator (e.g., 4! = 4x3x2x1)

The notation $P(x|X)$ implies the probability of $x$, given the value $X$. Similarly, $L(x|X)$ is the likelihood of $x$, given $X$; $M(\mathbf{x}|\mathbf{X})$ refers to the log-likelihood of $\mathbf{x}$, given $\mathbf{X}$. Note that the use of bold in the last example implies that $\mathbf{x}$ and $\mathbf{X}$ are multi-valued vectors (or matrices).

Notation Used in the Estimation Equation


$i$ = origin zone
$j$ = destination zone
$k$ = link count
$K$ = screenline count (from count sites $k$ .....)
$a_i$, $b_j$, $X_K$, $\alpha$, $\beta$ = Model Parameters
$\bar{c}$ = mean travel cost
$\theta$ = any one of the Model Parameters
$H$ = observed data item
$h$ = estimated data item

These may take values as shown below:

H (Observed)    h (Estimated)                    Description
$N_{ij}$        $T_{ij}$                         number of trips from i to j
$O_i$           $\sum_j T_{ij}$                  number of trips from origin i
$D_j$           $\sum_i T_{ij}$                  number of trips to destination j
$Q_k$           $\sum_{ij} T_{ij} R_{ijk}$       number of trips through link k

This notation is used in the sections 'Introduction to the Mathematics in Cube ME' and 'Mathematical Summary'.


Introduction to the Mathematics in Cube ME

The Need to Know


The design of Cube ME means that a user can estimate matrices simply by supplying the program with the appropriate input data and accepting the resulting matrix. However, it is valuable to have some understanding of how Cube ME calculates the value of the estimated matrix cells; this insight both helps in providing confidence in the results and in guiding the approach to input data, such as setting confidence levels and considering the potential effects of extra data or improved data quality.

Presentation of the Main Mathematical Features


This section is intended to cater to those Cube ME users who are interested in the detailed mathematical and statistical underpinnings of the estimation process. Users who are more interested in other aspects of the model should proceed to the section titled Data Preparation and Analysis of Results.

The basis of Cube ME's calculations is an application of the standard statistical approach known as the Maximum Likelihood method. This method allows estimates of a set of inputs to guide the estimates of a corresponding set of outputs; the estimates of the set of inputs are obtained from Likelihood Functions, which are expressions of probability distribution functions (pdfs) associated with the user's input data. The outputs are calculated from an estimation equation, which must be provided. These points are further explained below.

Given the range of possible input data, the full mathematical expression of Cube ME is complex, but it involves some principal components which we use to describe the essential features of Cube ME. The section Mathematical Summary explains the standard Cube ME calculations by summarizing the main mathematical steps. The section Extensions to the Calculations shows how additional features are accommodated in the calculations.

This section continues by explaining Cube ME's mathematics in largely descriptive terms, while introducing the main equations. Throughout this Section, the mathematical notation is as defined in Mathematical Notation, where it is not otherwise clear from the text.

The Estimation Equation


The heart of the estimation is an equation ('estimation model') whose output, $T_{ij}$, corresponds to the values of the cells of Cube ME's output matrix for trips between zones $i$ and $j$. The form of this mathematical estimation model in Cube ME is:

$$T_{ij} = t_{ij}\, a_i\, b_j \prod_K X_K^{R_{ijK}} \qquad .....(1)$$

This equation contains the following elements:

its output, $T_{ij}$

some data items:
$t_{ij}$ - the prior observation of trips between $i$ and $j$
$R_{ijK}$ - the probability of trips between zones $i$ and $j$ using screenline site $K$ (it is possible for a 'screenline' to correspond to a single count site, in suitable circumstances)

some Model Parameters: $a_i$, $b_j$, $X_K$.

$\prod_K$ implies the product of $X_K^{R_{ijK}}$ over all the screenline count sites $K$.

If there is no prior observation for movements between some or all possible origin-destination zone pairs, then $t_{ij}$ may be calculated by Cube ME from:

.....(2)

Equation (2) introduces further elements:

a data item, $c_{ij}$ - the generalized cost of travel between zones $i$ and $j$

two Model Parameters, $\alpha$ and $\beta$.

It may be noted that screenlines are usually organized so that $R_{ijK} = 0$ or $R_{ijK} = 1$. Also, because $t_{ij}$ provides an estimator of the output, as well as possibly being an input data item, it may also be considered as a Model Parameter. Hence, the data item $N_{ij}$ is also referred to as $t_{ij}$. (That is, $N_{ij}$ and $t_{ij}$ are numerically identical, but are logically distinct.)

The form of equation (1) has been chosen primarily for reasons of convenience, and for the appropriateness of its form according to the data used in the estimation (as we discuss below). It is designed to be efficient in assisting information to be processed, but is not behavioral in nature. This implies that Cube ME is suitable for estimating present day matrices, but not for forecasting, which would require some behavioral assumptions.
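The multiplicative structure of the estimation model can be illustrated numerically. The sketch below is not Cube ME code: it assumes that equation (1) takes the form T_ij = t_ij a_i b_j prod_K X_K^R_ijK suggested by the list of elements above, and the function name, array shapes and sample values are hypothetical.

```python
import numpy as np


def estimate_matrix(t, a, b, X, R):
    """Apply the assumed form T_ij = t_ij * a_i * b_j * prod_K X_K ** R_ijK.

    t: (n, n) prior matrix; a, b: (n,) zone parameters;
    X: (m,) screenline parameters; R: (n, n, m) routing proportions.
    """
    screenline_factor = np.prod(X[None, None, :] ** R, axis=2)
    return t * a[:, None] * b[None, :] * screenline_factor


# Hypothetical 2-zone example with one screenline crossed only by movement 1 -> 2.
t = np.array([[10.0, 20.0], [30.0, 40.0]])
R = np.zeros((2, 2, 1))
R[0, 1, 0] = 1.0
a = np.array([1.0, 2.0])
b = np.ones(2)
X = np.array([1.5])

T = estimate_matrix(t, a, b, X, R)
print(T)
```

Because every term is a multiplier, a cell with a zero prior value stays at zero whatever the parameter values, and a screenline parameter affects only those cells whose routing proportion for that screenline is non-zero.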

Equation (2) is borrowed from the well-known Gravity model that makes the behavioral assumption that people prefer lower cost journeys to higher cost ones, but are influenced by the level of trips generated by and attracted to different zones. This is a broad assumption; it means that cost data may be used where no other source of prior matrix data is available, but it is not a precise approach to estimating individual matrix cells.

Model Parameters
For Cube ME, therefore, the estimated matrix is entirely dependent on the values given to the Model Parameters. Cube ME is thus, in effect, solely concerned to establish the most appropriate values for these Model Parameters. (Cube ME's calculations are in 'parameter space', which accounts for some of the behavior that may be observed in Cube ME's output to the screen and log file while it is computing, where the values of the matrix may change in an apparently erratic manner.) Cube ME's calculations are mainly in the nature of a search for the 'best' Model Parameter values. Apart from the estimation equation itself, the main features of the Cube ME calculations are:
- directing the search for Model Parameter values: 'optimization'
- deciding whether the new Model Parameter values are the 'best': 'function evaluation'.

We now describe the general issues for Cube ME when setting Model Parameter values. Unless the user supplies an input Model Parameter file (created by an earlier run of Cube ME), the Model Parameters are automatically initialized to 1.0. From equation (1), it may be seen that the initial estimate is identical to the Prior Matrix (or based on the Cost Matrix, equation (2), if no Prior Matrix value exists).

It is possible to compare the Estimated Matrix with all of the items of the user's input data. For example, the sum of rows and columns of the Estimated Matrix may be compared with input Trip Ends (the Mathematical Summary shows this in mathematical terms for all data items). If the result of this comparison indicates that the current estimate is too low, then an improved Estimated Matrix may be achieved by increasing the value of at least some Model Parameters.

The 'problem' for Cube ME is that there are many items of user data, implying many comparisons of the type just described; some of these comparisons may require the current estimate to be improved in one way (increased, say), while other comparisons need the estimate to be altered another way (decreased, say). The large number of Model Parameters provides the basis for reconciling these apparent conflicts; by definition there are (2 x the number of zones) Model Parameters provided by the $a_i$'s and the $b_j$'s alone. It may be demonstrated that these are sufficient for equation (1) to define any possible combination of positive, non-zero matrix cell values. Hence, if, by some means, suitable values of the Model Parameters may be found, equation (1) can produce a matrix which is consistent with all of the user's input data. That is, at least, if the input data is self-consistent in the first place.
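The comparison of an estimate with input Trip Ends can be sketched with a toy example. This is an illustration under stated assumptions, not Cube ME's procedure: screenline terms are omitted, the matrix and trip-end values are hypothetical, and the ratio shown is only an indicator of the direction in which a zone's parameter would need to move.

```python
import numpy as np

# Hypothetical 3-zone prior matrix (intrazonal cells set to zero).
prior = np.array([[0.0, 50.0, 30.0],
                  [40.0, 0.0, 20.0],
                  [25.0, 35.0, 0.0]])

# Model Parameters initialized to 1.0, so the first estimate equals the prior.
a = np.ones(3)
b = np.ones(3)
T = prior * a[:, None] * b[None, :]

# Hypothetical observed origin trip ends, compared with the estimated row sums.
observed_origins = np.array([100.0, 60.0, 45.0])
estimated_origins = T.sum(axis=1)
ratio = observed_origins / estimated_origins

print(ratio)  # above 1: estimate too low for that origin; below 1: too high
```

In a real estimation the same cell is also constrained by destination totals, counts and the prior value, which is why the parameters cannot simply be set from one ratio.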

Of course, this consistency is never the case in real applications of Cube ME, and the best that may be hoped for is to estimate the matrix which is most likely, given the user's input data. Achieving this 'most likely' result is the next main topic to discuss, but we will stay with Model Parameters to make a few more points.

In principle, there is nothing particular to distinguish the set of Model Parameters; mathematically, they are equal and each may be affected by any item of data. However, the form of the estimation equation allows Parameters to be associated naturally with different types of data such as:

$a_i$, $b_j$ - Trip Ends, for trips generated at zone i and attracted to zone j
$X_K$ - Counts on screenline site K
$\alpha$, $\beta$ - Trip Cost information
($t_{ij}$ - Prior Trip matrix).

This association is useful to the optimizer in reflecting the different (quality) characteristics of the data sets. The nominally redundant parameters provide extra 'degrees of freedom' to handle data inconsistencies. This is useful, as the matrix cells affected by a set of screenline data are precisely defined by the routing information.

The Maximum Likelihood Objective Function


When Cube ME establishes values for the Model Parameters, it requires a criterion to determine if the corresponding $T_{ij}$ estimates either are 'correct' or are 'better' than another set of Model Parameter values. This criterion is provided by a mathematical equation called an Objective Function. The Objective Function, $M$, for Cube ME has the following form:

$$M = -\sum \lambda \left( H \ln h - h \right) \qquad .....(3)$$

where:
$h$ - is an estimated data item
$H$ - is an observed data item
$\lambda$ - is the confidence level associated with $H$.

The Mathematical Notation shows which items $H$ and $h$ can represent but, in general terms, $H$ is the input data which the user supplies and $h$ is the corresponding value implied by the estimated matrix.

We have already discussed how the form of the estimation equation (1) has been determined for reasons of effectiveness, but which remain essentially arbitrary; also, how equation (2) derives a weak behavioral basis from the Gravity Model. It is therefore important to appreciate that, in contrast, the Objective Function, equation (3), is the result of a statistically rigorous procedure, namely the Maximum Likelihood method. The consequence of this is a guarantee, subject to some qualifications which we consider below, that the estimated matrix is the statistically most likely, given the data supplied by the user. The 'correctness' of the estimate remains, of course, dependent on the quality of the input data.

Maximum Likelihood theory shows that the most likely values are indicated when $M$ in equation (3), which is negative, reaches its minimum possible value. (For reasons of computational convenience, Cube ME minimizes the negative of the 'Log-Likelihood' Objective Function, rather than maximizing the positive version, as the name 'Maximum Likelihood' might suggest.)

The qualifications mentioned before respectively concern the input data sets representing 'independent observations', which is not normally a problem for Cube ME users, and the input data being described by a probability distribution function, which we now discuss. The derivation of equation (3) for the Objective Function is outlined in the Mathematical Summary.
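The behaviour of the Objective Function can be illustrated with a few numbers. The sketch below assumes the Poisson-based form M = -sum(lambda * (H ln h - h)), which is consistent with the Mathematical Summary; the function name and the data values are hypothetical, and the constant terms of the log-likelihood are omitted.

```python
import math


def objective(observed, estimated, confidence):
    """Assumed form of the objective: M = -sum(lam * (H * ln(h) - h))."""
    return -sum(lam * (H * math.log(h) - h)
                for H, h, lam in zip(observed, estimated, confidence))


# Hypothetical data: one screenline count and one trip end, equal confidence.
observed = [1684.0, 200.0]
confidence = [1.0, 1.0]

m_match = objective(observed, [1684.0, 200.0], confidence)  # estimates equal the data
m_off = objective(observed, [1700.0, 180.0], confidence)    # estimates differ from the data

print(m_match < m_off)
```

Under this form each term is smallest when the estimated value equals the observed value, so any mismatch raises M, and a larger confidence level makes the same mismatch cost more.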

Describing the Variation in Data


The Maximum Likelihood method assumes that each item of input data represents an observation from a random distribution of possible values, where the variation of values may be described by a probability distribution function. That is, when the user supplies Cube ME with, say, a screenline traffic count value of 1684 vph, this is not considered to be the count for that screenline but, rather, a sample from a distribution. It is common experience that counting the same screenline on another, but equivalent, occasion (for example, the same time the following week) will provide another count value, say 1739 vph, simply on account of the random variation which is inherent in all traffic (and passenger) data. The assumption is made, therefore, that all input data for Cube ME is subject to variation which may be described by the Poisson probability distribution function (pdf). A graphical example of the Poisson pdf is shown in Figure 4.1.
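The two example counts can be checked against the Poisson assumption with a short calculation. This is an illustrative sketch only: it uses the standard property that a Poisson variable's variance equals its mean, treats the two counts as independent samples with unit confidence, and the variable names are hypothetical.

```python
import math

count_week_1 = 1684  # vph, from the example in the text
count_week_2 = 1739  # vph, from the example in the text

# For a Poisson variable the variance equals the mean.
mean_flow = (count_week_1 + count_week_2) / 2
std_dev = math.sqrt(mean_flow)

# The difference of two independent Poisson counts has variance 2 * mean.
difference = count_week_2 - count_week_1
z = difference / math.sqrt(2 * mean_flow)

print(round(std_dev, 1), round(z, 2))
```

A difference of about one standard deviation is unremarkable, so both counts are plausible samples from the same underlying flow.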


Figure 4.1: Illustration of a Poisson Probability Distribution Function

The Poisson is a well-known pdf, often associated with data which can involve many 'events' (for example, 1684 vehicles passing an observer in an hour). It has the statistical property that its mean equals its variance. This is valuable for data such as count information where the variation of 100 vph is significant when the mean figure is 200 vph, but not when it is 1000 vph; alternatively, a 10% variation implies many vehicles on a mean of 5000 vph, but not on 50 vph. The Poisson distribution reflects these changes in significance in an appropriate way. During the original development of Cube ME, alternative assumptions about the pdf used to describe data variation were reviewed (the Log-Normal distribution, for example), but these were considered only to add complexity, rather than accuracy. It is usually the case that the Poisson is a good way of describing traffic and passenger data. The Poisson distribution also has the considerable merit that it leads to some mathematical relationships where the role of confidence levels is clearly apparent. In particular, the Mathematical Summary shows an element of the calculation concerned with calculating the optimum value of the Objective Function which has the following general form (see equation (18) later for details):

$$\lambda_1\left(1 - \frac{H_1}{h_1}\right) + \lambda_2\left(1 - \frac{H_2}{h_2}\right) + \lambda_3\left(1 - \frac{H_3}{h_3}\right) + \ldots$$

The $\lambda_1$, $H_1$ and $h_1$ represent, respectively, the confidence level ($\lambda$), observed ($H$), and estimated ($h$) values for the first data item; similarly for the second, third, etc., data items. The form of this equation is directly attributable to the use of the Poisson pdf; another pdf, the Normal pdf for example, would give a different and more complex form.

The significance of equation (18) is twofold. Firstly, each and every data item is represented in this equation, that is, each cell of the Prior Matrix, each Trip End, each Screenline count, and so on. Thus, all items of data are considered together, not in separate categories. (It is not only equation (18) which shows this; most significantly, so does equation (3), the Objective Function, amongst others.) The second point is that the data contributes as:

1. a ratio of observed to estimated values;
2. a linear combination (i.e. simple addition (+)) of data items, each multiplied (weighted) by its own confidence level.

This enables the Cube ME user to view confidence levels as simple weighting factors, even though the derivation of $\lambda$ is originally from considerations of data sampling, as discussed in the following section. This would not be the case if a non-Poisson pdf had been used.
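The weighting role of the confidence level can be checked numerically. The sketch below assumes the per-item objective term -lambda * (H ln h - h) and its derivative lambda * (1 - H/h); the function names and sample values are hypothetical, and the finite-difference check is only there to confirm that the two expressions agree.

```python
import math


def objective_term(H, h, lam):
    """One data item's contribution to the assumed objective."""
    return -lam * (H * math.log(h) - h)


def gradient_term(H, h, lam):
    """Derivative of that contribution with respect to the estimate h."""
    return lam * (1.0 - H / h)


H, h, lam, eps = 1684.0, 1739.0, 0.5, 1e-4

# Central finite difference as an independent check on the analytic form.
numeric = (objective_term(H, h + eps, lam) - objective_term(H, h - eps, lam)) / (2 * eps)
analytic = gradient_term(H, h, lam)

print(analytic, numeric)
```

Doubling the confidence level doubles the item's pull on the estimate, while the ratio H/h decides the direction: the term is zero when the estimate matches the observation.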

The Optimizer: Finding the Minimum Value


We have already discussed how Cube ME is designed to adjust the Model Parameters, from their initial value of 1.0, so that equation (1) leads to a new value of $T_{ij}$, which provides a new set of estimated data values, $h$.

Equation (3) can then be used to determine if the new estimates are more likely given ('more consistent with') the input data, $H$. Cube ME therefore incorporates a powerful optimizer to amend the Model Parameters so that the value of $M$ is minimized as much as possible. This minimum is defined mathematically by locating the point at which the gradient of the objective function, with respect to the set of Model Parameters, $\theta$, is zero, that is $\partial M / \partial \theta = 0$. This well-known approach to determining minimum or maximum points is shown in Figure 4.2, which shows in a schematic fashion how the value of the Objective Function, $M$, varies according to the value of a parameter, $\theta$.


Figure 4.2: Two Dimensional Schematic View of Variations in Objective Function according to Model Parameter Values

It is at this stage, in particular, that Cube ME is operating in 'parameter space'. The principle is, simply, to adjust each parameter by an amount (the 'step length') and by a search direction (up or down). The optimizer ensures that Cube ME only makes adjustments which improve the situation; i.e. to further minimize the Objective Function, $M$. Once a set of (improving) adjustments has been made, the Cube ME optimizer performs another iteration of adjustments to determine whether more improvements are possible, and so on, until no further decrease in the (negative) value of the Objective Function is possible. This approach places several requirements on the optimizer:

- efficiency in determining optimum step lengths and directions
- avoidance of 'local minima' and location of the 'global minimum' (this means being sure that no values of step length and direction could lead to a better result)
- identification of the minimum point when in the neighborhood of one (this means achieving a stable convergence point).

There are several possible approaches to calculating optimum step lengths and directions. These may be considered to represent a spectrum characterized, at one end, by methods which use a simple strategy to define a step length and direction, but spend more time adjusting these elements through more iterations; at the other end, the methods spend more effort calculating the optimum step length and direction, but require fewer iterations.

The direction information is held by Cube ME in the Gradient Search Matrix file; this is also known as the Hessian matrix, as the Gradient Search matrix is an approximation for the Hessian. The degree of approximation depends on the method and certain aspects of the calculation, notably the proximity to convergence and the number of iterations since the Gradient Search was last recomputed (controlled in part by Cube ME control parameter ITERH). The significance of the Hessian matrix for Cube ME is that it provides a mathematical description of the relationships between Model Parameters; indeed the inverse of the Hessian approximates to the variance-covariance matrix of the Model Parameters. This can be exploited by the optimizer to update the direction information in an optimum manner.

Through the Cube ME control parameter IHTYPE, the user can select alternative methods. These are listed below in order of increasing calculation effort given to the step length and direction:

1. the method of Steepest Descent
2. Newton's method
3. the quasi-Newton method
4. the method of Scoring.

The default procedure in Cube ME uses a combination of methods (3) and (4). It starts by using the method of Scoring to calculate an approximation to the Hessian, which requires considerable computational effort. Further improvements to the solution are obtained by the quasi-Newton method, which needs less computation. This method works well and requires very few iterations if the solution is in the region of the optimum value. Otherwise the Gradient Search matrix is recalculated using a method to determine the exact Hessian matrix, a new step length is adopted, and the process repeats itself. (If the exact Hessian cannot be computed, maybe because the results are still far from a converged solution, the method of Scoring is automatically re-applied.) As the solution approaches the optimum, the step length is reduced, allowing the optimum to be located more precisely.
A very small step length indicates a close proximity to the optimum value and so the search is terminated when the step length is beneath the threshold defined by Cube ME control parameter UTOL. This is a more practical method of determining when the calculation should finish than monitoring the gradients approaching zero.
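The idea of accepting only improving adjustments and stopping once the step length falls below a tolerance can be sketched on a toy problem. This is not Cube ME's optimizer, which uses the Scoring and quasi-Newton methods described above: the sketch is a simple descent with step halving on a single scaling parameter theta, it assumes the objective form -sum(lambda * (H ln h - h)) with h = theta x prior, and the names `minimise` and `utol` (chosen to echo the UTOL control parameter) and all data values are hypothetical.

```python
import math


def objective(theta, observed, prior, conf):
    """Assumed objective for estimates h = theta * prior."""
    return -sum(lam * (H * math.log(theta * p) - theta * p)
                for H, p, lam in zip(observed, prior, conf))


def gradient(theta, observed, prior, conf):
    """Derivative of the objective with respect to theta."""
    return sum(lam * (p - H / theta) for H, p, lam in zip(observed, prior, conf))


def minimise(observed, prior, conf, theta=1.0, step=1.0, utol=1e-6, max_iter=10_000):
    """Move downhill by `step`; halve the step whenever a move fails to improve."""
    best = objective(theta, observed, prior, conf)
    for _ in range(max_iter):
        if step < utol:  # a very small step length signals proximity to the optimum
            break
        direction = -1.0 if gradient(theta, observed, prior, conf) > 0 else 1.0
        trial = theta + direction * step
        if trial > 0:
            value = objective(trial, observed, prior, conf)
            if value < best:  # accept only adjustments that improve the objective
                theta, best = trial, value
                continue
        step *= 0.5
    return theta


# Hypothetical data: two counts whose prior estimates are both 100.
theta_hat = minimise([150.0, 90.0], [100.0, 100.0], [1.0, 1.0])
print(round(theta_hat, 4))
```

For this toy case the optimum can be found by hand as sum(lambda x H) / sum(lambda x prior) = 240 / 200 = 1.2, which gives a check on the search.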


Mathematical Summary
This section presents a further explanation of Cube ME's calculations, as given in the Introduction to the Mathematics in Cube ME, using the mathematical notation.

a) The Maximum Likelihood Method: Background Theory

Maximum Likelihood is a standard method of estimating parameters of mathematical modeling equations, based on sets of relevant data observations. Given values of the model parameters, the pdf defines the probability associated with the observed data. When viewed as a function of the model parameters, the pdf is called a Likelihood function. The values of the parameters which maximize this function are called maximum likelihood estimates. They correspond to a model in which the probability of the observed data is maximized. The estimation process has two elements: establishing the likelihood function, and determining the optimum parameter values to maximize it. Mathematically, the theory may be expressed as:

$$P(X = x \mid \theta) \qquad .....(4)$$

where:
$X$ = random variable
$x$ = observation
$\theta$ = parameter (or function of a parameter)

The Likelihood Function is then defined to be:

$$L(\theta \mid \mathbf{x}) = \prod_{i=1}^{n} P(X = x_i \mid \theta) \qquad .....(5)$$

where $\mathbf{x} = (x_1, \ldots, x_n)$, i.e. $\mathbf{x}$ is a set of $n$ observations.

The optimization process is to find the value of $\theta$ that maximizes $L(\theta \mid \mathbf{x})$.

Application of Maximum Likelihood to Cube ME


In accordance with the above theory, but with a slightly altered notation, the following are defined:

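$H$ = a data item (= $x$ above)
$h$ = an estimated item (= $\theta$ above)

It is assumed that the appropriate pdf is

$$P(H \mid h) = \frac{e^{-\lambda h} (\lambda h)^{\lambda H}}{(\lambda H)!} \qquad .....(6)$$

where $\lambda$ is called the 'weighting factor'. It can be seen that $\lambda H$ is a Poisson random variable with mean $\lambda h$. Thus $\lambda$ can be considered a scaling parameter which defines the time units in the underlying Poisson process. A likelihood function may thus be defined as:

$$L(h \mid H) = \frac{e^{-\lambda h} (\lambda h)^{\lambda H}}{(\lambda H)!} \qquad .....(7)$$

Taking logarithms of Equation (7) leads to:

$$\ln L(h \mid H) = \lambda H \ln h - \lambda h + \lambda H \ln \lambda - \ln\big((\lambda H)!\big) \qquad .....(8)$$

It may be noted that $\lambda H \ln \lambda - \ln\big((\lambda H)!\big)$ = constant.

Referring to Equation (5), and considering all data items, $H$, a Likelihood Function may be defined as:

$$L = \prod_H L(h \mid H) \qquad .....(9)$$

For computational ease, the task of maximizing $L$ may be converted to the minimization of:

$$M = \sum_H M(h \mid H) \qquad .....(10)$$

where

$$M(h \mid H) = -\lambda \left( H \ln h - h \right) \qquad .....(11)$$

Equation (10) therefore represents the general form of the Objective Function which is minimized by Cube ME.

b) The Cube ME Objective Function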

Cube ME allows varied data items to be used in the estimation, that is, $H$ and $h$ may represent different data items, as shown in the following table:

Observed data value, H    Estimated data value, h          Description
$N_{ij}$                  $T_{ij}$                         number of trips with origin at zone i and destination at zone j
$O_i$                     $\sum_j T_{ij}$                  number of trips with origin at zone i
$D_j$                     $\sum_i T_{ij}$                  number of trips with destination at zone j
$Q_K$                     $\sum_{ij} T_{ij} R_{ijK}$       number of trips through screenline K

where: $R_{ijK}$ is the proportion of trips in matrix cell (i, j) using screenline K

Table 4.2: Observed Data, H, and Estimated Equivalents, h.

Substituting these observed and estimated data items into Equation (10) gives an objective function shown below, with the source of the data indicated. For reasons to do with function evaluation, the estimated tij is treated as a least squares minimization in the objective function. The objective function then becomes: Objective Function, M = Comment Screenline counts Trip origins Trip destinations Prior matrix Cost matrix derived

.... (12)
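The summation in the cost matrix derived term is over cells which are zero in the prior matrix, but not in the cost matrix.

c) The Cube ME Trip Estimation Model

The objective function, Equation (12) above, is used to calibrate the trip estimation model of the form:

$$T_{ij} = t_{ij}\, a_i\, b_j \prod_K X_K^{R_{ijK}} \qquad .....(13)$$

where $t_{ij} = N_{ij}$ or, where no prior observation exists, the cost-based value given by equation (2).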

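d) Estimating Model Parameters

It follows, by differentiation of Equation (11):

$$\frac{\partial M}{\partial \theta} = \sum_H \frac{\partial M}{\partial h} \frac{\partial h}{\partial \theta} \qquad .....(14)$$

$$\frac{\partial M}{\partial h} = \lambda \left( 1 - \frac{H}{h} \right) \qquad .....(15)$$

(note: undefined for h=0)

The minimum value of the objective function, $M$, for a parameter $\theta$, is found when $\partial M / \partial \theta = 0$. The remaining steps are to:

i) calculate $T_{ij}$ using equation (13) and current values of Model Parameters;

ii) use Table 4.2 to calculate $h$ for each set of input and estimated data;

iii) calculate $\partial h / \partial \theta$, as we show below, for each set of estimated data.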

Table 4.3: Substitutions for Equation (15)

leads to

.....(16)

where

.....(17) and

.....(18) Note: are constants or . . We

is undefined if

In equation (16) we need to substitute each set of model parameters for start by determining

for each parameter reproducing Model Equation (13),

.....(13) where or = constant

let Then differentiating (13) gives,

(for each )

.....(19)

(for each )

.....(20)

(for

) .....(21)


.....(22)

.....(23)

Finally, we substitute (19) to (23) into (16) for each value of $\theta$, and use an optimization procedure to choose parameter values that give values of $h$ that minimize the Objective Function (9).

e) Optimization Procedure

Given an initial guess

Cube ME computes the maximum likelihood estimates from

by generating a sequence of estimates

where by

is a suitable steplength, and

denotes a search vector given

is equal to the expected value For the Method of Scoring used by Cube ME, of the Hessian matrix. It may be shown that this can be represented as

indicates the expected value, and where gradient vector of the Objective Function, Parameters, The .

, which denotes the , with respect to the Model

entry of the matrix

is given by

.....(24) From equation (16) we can write

.....(25)

This leads to

.....(26)

The formulae for

and

are given in equations (19) to (23).

When the search matrix is calculated by the quasi-Newton method (as previously described in the Introduction to the Mathematics in Cube ME), the Hessian matrix updates the expected value using the BFGS update formula.

f) Parameter Errors

The optimization produces an estimate of the parameter value itself, $\theta$, and an estimate of the variance of parameter $\theta$. Therefore,

Standard Error = $\sqrt{\operatorname{Var}(\theta)}$ .....(27)

and the range within one Standard Error is $\theta \pm$ Standard Error.

g) Cell Reliability

The 'sensitivity' of the estimate of $T_{ij}$ is defined to be

.....(28)

where $M$ is the objective function, and the matrix term represents a matrix of second differentials.

Extensions to the Calculations

Hierarchic Estimation


Hierarchic estimation is described in the section Hierarchic Estimation. It calculates two forms of matrix, the District Matrix and a set of Local Matrices. Apart from the aggregation of information which is implied by converting a Zonal Matrix to a District Matrix, the estimation of a District Matrix is entirely similar to a standard estimation. The estimation of the Local Matrices is, equally, similar, but it introduces a new set of data, derived from the District Matrix, which are referred to as 'Side Constraints'. To understand this 'Side Constraint' information, we show a Local Matrix in a schematic form in Figure 4.3.

Figure 4.3: Relationship of Side Constraints with Local Matrices

The set of side constraint variables, in terms of prior 'observed' (H) and estimated (h) data, and associated Confidence Levels, $\lambda$, are:

H       h                               $\lambda$
PZTZ    FZTZ = $\sum_{ij} T_{ij}$       $\lambda_{PZTZ}$
PZTR    FZTR = $\sum_{i} T_{i1}$        $\lambda_{PZTR}$
PRTZ    FRTZ = $\sum_{j} T_{1j}$        $\lambda_{PRTZ}$

Note: The specifications of PZTZ (observed), FZTZ (estimated), etc., are indicated in Figure 4.3. Note that the corresponding Confidence Levels, $\lambda_{PZTZ}$, $\lambda_{PZTR}$ and $\lambda_{PRTZ}$, are all set by the user with Cube ME's ZCONF control parameter. (The Confidence Levels for the Trip Ends applied to the District matrix are set according to the minimum values of the generation and attraction Trip Ends Confidence Levels found in the Trip End file.) These values of H and h are then substituted in the same manner which applies to other sets of data represented by H and h.


Data Preparation and Analysis of Results


Overview
This chapter focuses on the tasks which the user undertakes as part of the estimation process. There is a series of data preparation tasks, which are discussed in the following sections. Most of the tasks only require data files to be created in a relatively mechanistic manner, but two of the tasks require the user to make considered choices. These are discussed in Screenlines, and in Setting Confidence Levels. The final sections in this chapter explain the estimation stage in terms of tasks facing the user. As Cube ME usually requires minimal input from the user, apart from the supply of prepared data files, the estimation stage is very straightforward. However, advice is given on possible ways of improving the speed of estimation. This may be achieved through:

- influencing the strategy by which the Hessian matrix, which is used in the optimization stages of Cube ME, is calculated - see Tuning Estimation Performance;
- avoiding unnecessary detail in the routing files, which can be burdensome for the data processing elements of Cube ME - see Control of Routing Information.

The final set of activities for the user are to analyze the results to assess the quality of the estimation, partly to determine if and how they might need to be improved. This topic is discussed in Analyzing the Results. The ideas introduced in this chapter are subsequently illustrated in later chapters with an example application of Cube ME, based on an actual study. Further details on points covered in this Section are provided in the Standardized Estimation Procedures.


Matrices
Cube is used to set or modify individual cells or ranges of cells. This also permits Confidence Levels to be easily set to global or individual values. Figure 5.1 illustrates the concept of a Prior Matrix (Table 101) giving information about basic trip patterns, together with an associated Confidence Matrix (Table 102) that discriminates between data reliability for different groups of movements.

Intrazonals can be included in the matrix. Because routings only cover inter-zonal trips, the intrazonals are not affected by the screenline counts; they only affect the trip ends. As their role is limited, there is a case for omitting intrazonals from the estimation. If intrazonals are included in the trip ends, then they should also be included in the matrix. If the trip ends do not include intrazonals, the intrazonal cells of the input matrices should be zero.
+----------------------------------------------------------------+
|                                                                |
|  TABLE = 102 (Confidences )                                    |
|        1     2     3     4     5     6     7     8     9    10 |
+----------------------------------------------------------------+
| 1:    20    20    20    20    40    40    20    20    20    20 |
| 2:    20    20    20    20    40    40    20    20    20    20 |
| 3:    20    20    20    20    40    40    20    20    20    20 |
| 4:    20    20    20    20    40    40    20    20    20    20 |
| 5:    40    40    40    40    40    40    40    40    40    40 |
+-+--------------------------------------------------------+  40 |
  |                                                        |  20 |
  | TABLE = 101 (Prior )                                   |  20 |
  |        1    2    3    4    5    6    7    8    9   10  |  20 |
  | ------------------------------------------------------ |  20 |
  |  1:    1    1    0    5   45  126   50   21   30   55  |  20 |
  |  2:    1    5    0   70  125   36   38   50   58   14  |  20 |
  |  3:    1    1    0    2  108  119   90   69  148   44  |  20 |
  |  4:   69    3    0    1    6    7    6    3   25    3  +-----+
  |  5:  100    1    0  192   71   20   12   11   14    7  |
  |  6:   36    2    0   88   52    6    3    7   16   13  |
  |  7:   62    3    0   32   36   58    9   63    9   61  |
  |  8:    0    1    0   64   65   30  119   19  121   64  |
  |  9:    0    7    0   57  123   70  178  279    7   38  |
  | 10:    0   10    0    7   31    3    1   10   21    3  |
  | 11:    0   13    0   19   35    4   96  170   28   29  |
  | 12:    0    5    0   41  286   52  103  117   29   56  |
  | 13:    0    9    0   24   99   50   90   91   23   12  |
  | 14:    4    3   14   20   56   19   67   58   21    7  |
  | 15:   28    2   36    1  185    1    1    2   15    1  |
  +--------------------------------------------------------+

Figure 5.1: Prior Matrix (Table 101) and Confidence Levels (Table 102)
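As an illustration only, the following Python sketch builds a Confidence Matrix with the pattern of Table 102 and zeroes the intrazonal cells of a matrix. It is not a Cube facility: the function names, the rule "origin or destination in the selected zones" and the values 20 and 40 are assumptions chosen to reproduce the visible rows of the figure.

```python
def build_confidence_matrix(n_zones, default, overrides):
    """Return an n_zones x n_zones list of Confidence Levels.

    overrides is a list of (zones, level) pairs; `level` is applied to
    every cell whose origin OR destination zone (1-based) is in `zones`.
    This grouping rule is an assumption made for the example.
    """
    conf = [[default] * n_zones for _ in range(n_zones)]
    for zones, level in overrides:
        for i in range(1, n_zones + 1):
            for j in range(1, n_zones + 1):
                if i in zones or j in zones:
                    conf[i - 1][j - 1] = level
    return conf


def zero_intrazonals(matrix):
    """Return a copy of a square matrix with its diagonal set to zero."""
    return [[0 if i == j else value for j, value in enumerate(row)]
            for i, row in enumerate(matrix)]


# Hypothetical 10-zone case: movements to or from zones 5 and 6 get 40.
confidence = build_confidence_matrix(10, 20, [({5, 6}, 40)])
for row in confidence[:5]:
    print(row)

# Intrazonal cells set to zero for the case where trip ends exclude them.
print(zero_intrazonals([[1, 2], [3, 4]]))
```

The first five printed rows can be compared with rows 1 to 5 of Table 102 in the figure.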


Trip Ends
Trip Ends may be determined by reference to an existing matrix, from surveys (e.g. of parking), or by calculation from equations.


Networks and Traffic and Passenger Counts


Cube is used for preparing networks. Traffic and passenger counts, together with Confidence Level information, are input into the Volume Field storage areas associated with each link.


Screenlines
Screenlines are used to minimize the effects of assignment errors. A screenline is defined as the set of count sites which intercept traffic/passenger flows between sets of zones which share the same general corridors of movement (across which the screenlines are suitably located).

The extent of a screenline is determined by the number of alternative (reasonable) paths which are available. In many public transport networks where services are sparse, or in rural highway networks, there may only be a single reasonable route between one general area and another. In this case, screenlines may correspond to single links (although they are still treated as 'screenlines' in the context of Cube ME). In general, however, a screenline will represent a set of links. In the case of highways, a useful type of screenline is provided by a river or a railway line that has only a few crossing points. In this case all traffic must be routed through known points, and so assignment error associated with the screenline will be minimized. For Cube ME, there is no difference between a group of traffic counts on separate links (that form a logical screenline) and a single link count amalgamating the flows on separate traffic lanes.

There will normally be few, if any, screenlines that entirely bisect a study area and so intercept all trips between the two sides. Cube ME therefore employs the concept of partial screenlines. They are partial in the sense that they do not extend between the boundaries of a study area, but they intercept all trips between, at least, certain defined pairs of zones. The method for defining such partial screenlines is manual, and based partly on judgement and partly on the availability of count sites.

The routing information, together with user-defined screenlines, is used to define the set of O-D pairs whose routes they intercept. The aim is to group count sites into screenlines that balance the objectives to:

1. maximize the number of O-D pairs that have all routes passing through a screenline, and
2. minimize the number of O-D pairs per screenline, as this maximizes the information value of the counts for the corresponding matrix cells.

Figure 5.2 shows an example of screenlines for an example urban area. Features that these screenline locations demonstrate are shown in Table 5.1.
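The first objective can be checked mechanically once routings and candidate screenlines are available. The Python sketch below is a minimal illustration under assumed data structures (routes as lists of node-pair links, a screenline as a set of links); it is not the procedure Cube ME uses internally, and the small network is hypothetical.

```python
def fully_intercepted_pairs(routes, screenline_links):
    """Return the O-D pairs whose every route crosses the screenline.

    routes: dict mapping (origin, destination) -> list of routes,
            each route being a list of (a_node, b_node) links.
    screenline_links: set of (a_node, b_node) links forming the screenline.
    """
    intercepted = []
    for od, od_routes in routes.items():
        if od_routes and all(
            any(link in screenline_links for link in route)
            for route in od_routes
        ):
            intercepted.append(od)
    return sorted(intercepted)


# Hypothetical routing: links 10->20 and 11->21 are two river bridges.
routes = {
    (1, 2): [[(1, 10), (10, 20), (20, 2)], [(1, 11), (11, 21), (21, 2)]],
    (1, 3): [[(1, 10), (10, 3)]],                                  # no crossing
    (4, 2): [[(4, 10), (10, 20), (20, 2)], [(4, 30), (30, 2)]],    # one route avoids both
}
river = {(10, 20), (11, 21)}    # screenline covering both bridges
single_bridge = {(10, 20)}      # screenline covering one bridge only

print(fully_intercepted_pairs(routes, river))
print(fully_intercepted_pairs(routes, single_bridge))
```

In this example, grouping both bridges into one screenline captures all routes of pair (1, 2), whereas a screenline on a single bridge captures no pair completely.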


Figure 5.2: Typical Screenline Configuration for an Urban Area

Screenline Location    Function
Northern               Screenline over a single link (e.g. a bridge) intercepts
                       all traffic to and from the North.
Western                Parallel, alternative routes from the West require a
                       single screenline intercepting both routes for this
                       corridor.
Southern Ring Road     Non-radial traffic is intercepted by (two) screenlines
                       on orbital road.
Eastern                Similar parallel routes for long distance traffic to
                       Western side, but parallel routes for local traffic
                       require additional, shorter screenline. Note use of
                       count location in more than one screenline.
Central Area           Detailed movements in centre intercepted with several
                       short screenlines.

Table 5.1: Features of Screenline Locations Shown in Figure 5.2


Routings
Matrix estimation requires information about which routes are used to connect each pair of origin and destination zones, and the probability that each route is used. Ideally this would come from survey information, but this is onerous and not very practical, so the method uses modeling instead. This routing information is one of the outputs from the assignment process. For TRIPS users it is stored in the route choice probability (RCP) file. For VOYAGER Highways users, it is stored in the Voyager path file.

Highways
The main requirement for Cube ME is for the routings to reflect all reasonable alternative paths without spreading so widely that they become unrealistic. For VOYAGER users, the paths reflected in the Intercept file derive from combining the all-or-nothing paths from each assignment iteration into one set. This can be done directly in the HIGHWAY program. Alternatively, HIGHWAY can be used to generate a path file, and the appropriate path sets and volumes selected from it for use in Cube ME. TRIPS users could use a similar approach, or apply one of the stochastic methods.

When considering networks where congestion is a factor, the assignment itself relies on the trip matrix that the estimation is trying to provide. Hence it may be preferable to apply routes derived using methods that can calculate multiple routes between zones based on stochastic (statistical) methods, rather than to rely on the paths from a capacity-restrained assignment. TRIPS supports two such methods, known after their originators as Burrell and Dial.

Both methods can be used successfully with Cube ME, but Burrell can have limitations in large networks when routes traverse large numbers of links. In this case, the 'central limit theorem' of statistics means that the more links there are on an average route, the higher the chance that a route has much the same total cost under a different set of randomized link costs (which is the approach used in Burrell). The consequence of this is that it is more difficult to generate varied routes. (It can be noted, in passing, that the length of routes in terms of distance is not a problem for the implementation of Burrell used in TRIPS.) The Dial method is not subject to this effect concerning routes with many links, so it is the approach that is advised.

Note that where estimation is being used to update a matrix that is not anticipated to have changed very much (for instance, because it was obtained from a relatively recent survey), the RCP file from an existing converged capacity restraint assignment may be used in preference to Dial. The choice here is a matter of judgement on the relative accuracies of the RCP information.
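The central limit effect noted for Burrell on routes with many links can be illustrated numerically. The Python sketch below is a simplified stand-in, not the TRIPS implementation: it assumes every link has a base cost of 1.0 that is perturbed independently and uniformly within a hypothetical +/- 25%, and it reports the relative variation (coefficient of variation) of the total route cost.

```python
import random
import statistics


def route_cost_cv(n_links, spread=0.25, draws=2000, seed=1):
    """Coefficient of variation of a route's total randomized cost.

    Each of the n_links has a base cost of 1.0, perturbed independently
    and uniformly within +/- spread on every draw (an assumed scheme).
    """
    rng = random.Random(seed)
    totals = [
        sum(1.0 + rng.uniform(-spread, spread) for _ in range(n_links))
        for _ in range(draws)
    ]
    return statistics.pstdev(totals) / statistics.mean(totals)


# Relative variation of total route cost for short and long routes.
for n in (4, 25, 100):
    print(n, round(route_cost_cv(n), 4))
```

Under these assumptions the relative variation falls roughly in proportion to one over the square root of the number of links, so the randomized costs of competing long routes change their ranking less often, which is consistent with the difficulty of generating varied routes described above.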

Public Transport
Cube automatically produces multi-route paths and can also store them in an RCP file. The determination of which links are used to connect pairs of origin-destination zones is a function of a path building algorithm which generates a set of 'reasonable' paths. These are based on considerations of generalized cost, which reflect users' data about transit times, fares, boarding and transfer penalties, and so on. A sub-mode split model can be used to reflect passenger biases when deciding if different modes (bus, metro, rail, etc.) are candidates for inclusion into the set of reasonable paths.


Setting Confidence Levels


Mathematically, Confidence Levels have the dual facets of being sampling rates and weighting factors. Confidence Levels are entered as percentages but, from both points of view, values of greater than 100 are legitimate.

Characteristics of the Data


The ability of a Confidence Level to help match an estimated data item (trip end, screenline flow, matrix cell) to its corresponding observed value is influenced by:

1. Data Consistency
If data is consistent and free of errors, then the Confidence Levels will have no influence as they, essentially, help to mediate between different estimates implied by different data items. Conversely, more discrepancies within the data increase the importance of Confidence Levels;

2. Data Quantity
As all data is present in the Objective Function (see Mathematical Background), the quantity of data is influential, besides the Confidence Levels. This means that, for example, relatively large confidence levels applied to the prior matrix, which has many data elements, will tend to restrict the scope of a few count sites to influence the estimated matrix to a significant degree. Of course, this may be the desired effect in some circumstances.

An improved match with any data item can always be achieved with an arbitrarily large Confidence Level, but it will normally be necessary for users to check the appropriateness of Confidence Levels that are input.
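Both effects can be seen in a deliberately simplified setting. The Python sketch below treats Confidence Levels purely as weights in a weighted least-squares fit of a single quantity; this is an assumed stand-in for illustration and is not the Objective Function defined in Mathematical Background. All values are hypothetical.

```python
def weighted_estimate(observations):
    """Minimise sum(c * (x - h)**2) over x for (value h, confidence c) pairs.

    The minimiser is the confidence-weighted mean of the values.
    """
    total_weight = sum(c for _, c in observations)
    return sum(h * c for h, c in observations) / total_weight


# Data consistency: identical values, so the weights make no difference.
consistent = weighted_estimate([(100.0, 5), (100.0, 80)])

# Inconsistent data: the item with the higher Confidence Level dominates.
inconsistent = weighted_estimate([(100.0, 5), (140.0, 80)])

# Data quantity: twenty low-confidence items pull against a single count.
many_items = weighted_estimate([(100.0, 5)] * 20 + [(140.0, 80)])

print(consistent, round(inconsistent, 1), round(many_items, 1))
```

In the third case the single high-confidence value no longer dominates, even though its own Confidence Level is unchanged, because the total weight of the many low-confidence items is larger.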

Deciding on Confidence Values


A practical approach to setting Confidence Levels is often to establish a dataset as a reference benchmark, and then set the Confidence Levels of other data relative to this. For example, if a program of automatic counting means that traffic counts are well and recently observed, then these may be given a high Confidence Level, say 100, and Confidences for other data set relative to that value. Note that an implied range of 1 - 100 (or of that order of magnitude) has been found to be suitable for many studies. Large applications (say, of 500 zones or more) will tend to encounter a greater range of absolute data values, which can imply the need for a wider range of Confidence Levels (see the discussion above). The need for this is suitably assessed by means of sensitivity analysis on the Confidence Levels. Some general observations applying to Confidence Levels for different categories of data are given below, in descending order of magnitude of Confidence Levels for most applications:

- At least some count sites should have observations made over several days (weeks, etc.) to determine basic levels of variability associated with single observations;
- Count Confidences should be set with respect to the time period applying to the estimated matrix (e.g. a series of counts made on Tuesdays is only a partial observation if the matrix is to correspond to an average working day);
- In the case of highways, Trip End Confidences are unlikely to exceed Count Confidences, and will usually be less due to observational difficulties; in the case of public transport, the two sets of Confidences are more likely to be similar;
- Even when Trip Ends have been determined simply from the row and column totals of the Prior Matrix, the aggregation of the data means that the Trip End Confidences will be higher than the corresponding individual cell Confidences. For this reason, Trip End data should always be used when a Prior Matrix is input;
- Prior Matrix cells are, individually, unlikely to have high Confidences even when collected by recent, good surveys because there are so many elements of the matrix. This becomes truer as the number of study area zones increases (due to the difficulty of observing all possible movements adequately);
- Cost Matrix data may be obtained reasonably reliably, but the relevant Confidence concerns the use of this data for trip estimation and this normally only offers an approximation.


Tuning Estimation Performance


In general Cube ME should be run with default parameter settings. In the majority of cases this will lead to a converged solution within a reasonable number of iterations. In some cases an excessive number of iterations may be required, or Cube ME may be unable to find a converged solution. In the latter case Cube ME will report that it has halted optimization for a reason such as 'No further progress possible - linear search failed', rather than the successful message 'Convergence detected'. Such a message is usually caused by excessively inconsistent input data, which pull the optimizer in opposite directions to the extent that no solution can be found. To correct this, the user is normally required to check the input data.

However, Cube ME does provide an extra control in the form of the parameter ITERH. This determines the frequency, in iterations, with which the Hessian matrix is calculated (see Introduction to the Mathematics in Cube ME). The Hessian directs the optimizer towards the solution. Although this calculation is time consuming, it results in the optimizer converging in significantly fewer iterations. For unconverged problems, recalculation of the Hessian may provide the direction which the optimizer needs to find a solution. For example, if a problem was halted after 58 iterations, try setting ITERH=50 to see if a new Hessian will allow the optimizer to converge.

In most cases, recalculation of the Hessian matrix will result in longer run times. In particular, time will be wasted if ITERH is set to low values such as 40 or less. Cube ME will determine a suitable value for ITERH. Setting ITERH is only recommended as an attempt to solve convergence problems (which are encountered only exceptionally).


Control of Routing Information


For many estimation runs, the production of the O-D intercepts for Screenlines and/or Part Trip data takes as much time as the actual estimation itself, or even more. Cube ME only needs the reasonable paths, so controlling the routing to avoid the production of routes used only by a small proportion of trips is an important aspect of achieving practical run times for the estimation. This is particularly the case for public transport, which can often supply a huge variety of routes. For large models this could result in the production of the intercepts requiring an excessive time to complete; this can be an order of magnitude greater than if parameters are given appropriate settings. Too many routes can also result in file sizes becoming too large for practical use.

Routing information can be supplied to Cube ME in the form of a TRIPS RCP file, or VOYAGER path file. Before starting the estimation proper, Cube ME analyzes the routes through screenlines and/or part trip links to produce the Intercepts, which it saves in an ICP file. It is important to note that this intercept file can be input back into subsequent estimations as long as the links of the screenlines and/or part trip data are not modified. This is achieved by setting option INTCPT=T or WARMST=T as appropriate and will result in a considerable time saving.
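The idea of discarding routes used by only a small proportion of trips can be sketched as follows. This Python fragment is purely illustrative: the function, the 5% threshold and the route names are hypothetical, and in practice the control is exercised through the parameters of the assignment programs that produce the routing files, which are not reproduced here.

```python
def prune_routes(route_probs, min_share=0.05):
    """Drop routes carrying less than min_share of an O-D pair's trips
    and rescale the remaining probabilities so that they sum to 1.

    If every route falls below the threshold, the most used route is kept.
    """
    kept = {r: p for r, p in route_probs.items() if p >= min_share}
    if not kept:
        best = max(route_probs, key=route_probs.get)
        kept = {best: route_probs[best]}
    total = sum(kept.values())
    return {r: p / total for r, p in kept.items()}


# Hypothetical route choice probabilities for one O-D pair.
routes = {"via bypass": 0.62, "via centre": 0.35, "via back streets": 0.03}
pruned = prune_routes(routes)
print(pruned)
```

Fewer routes per O-D pair mean fewer intercept records to generate and store, which is the source of the run time saving discussed above.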


Analyzing the Results


Cube ME produces its results as a set of tabulations for printing or viewing, and as a set of files which may be subject to further analysis - one of these files is the estimated matrix itself.

Tabulations

Cube ME's printout is ordered as follows, after the standard header information:

i) A summary of input data characteristics, showing:
- which data types were used in the estimation;
- average Confidence Levels, and their ranges;
- the number of data elements for each type of data.

This information indicates the relative 'weighting' of data in the estimation process, which is important to know when assessing the results;

ii) A summary of the values of key indicators from the last five iterations before the optimization halted. The indicators, and their values, are the same as Cube ME outputs to the screen during the course of its calculations. They are:
- the iteration number
- the Stepsize
- the value of the Objective Function
- the Estimated Matrix total number of trips.

The reason for halting is also shown, which will normally be 'Convergence Detected'. This information is mainly provided for confirmation that the estimation calculations operated in an appropriate manner (e.g. that the Objective Function value never increased). These two elements of Cube ME's printout are shown in Table 6.4a (and in an abbreviated form in Table 6.1);

iii) The remainder of Cube ME's tabulations are concerned with comparisons between the user's input data and the corresponding values derived from the estimated matrix. Comparative information is output, when applicable, for:
- trip matrix totals
- part trip data
- total trip generations from zones
- total trip attractions to zones
- screenline flow counts.

The general pattern of this comparative information from Cube ME is shown in Table 6.4b (Tables 6.2 and 6.3 contain this information in a slightly altered format). Tables 6.4a and 6.4b illustrate the case for Cube ME including Part Trip data. Hierarchic Estimation output conforms to this same basic pattern, but extra information is provided, as explained in Hierarchic Estimation, and illustrated in Figures 8.12a - 8.12d.

As a rule, the user will be looking for good correspondences between input data and estimated results. However, it is important to note that a poor comparison between input and estimated information is not, by itself, a sign of a poor quality estimation. The reason is (or should be) that a data item with a higher Confidence Level is dominating the estimation with respect to data which is also relevant, but which has a lower Confidence Level.

The approach to analyzing Cube ME's comparative results is, therefore, to identify data which has not been matched well in the estimation and to determine which other data might be causing the discrepancy. Often this is straightforward, for example, a screenline flow count with a markedly different value from trip end values for adjacent zones. If the discrepancy seems unwarranted then this may be a cause to review either the data values themselves, or their Confidence Levels. (One cause of discrepancies which may not be immediately apparent is poor routing information, for example, on account of inappropriate generalized cost parameters.)


The Estimation Process In Application


Study Area
This section discusses a highways based application of Cube ME for an 82 zone study area for the town of Guildford in Surrey, UK, (pop. 100,000). The network shown in Figure 6.1 has a major bypass for the town, which is shown as a thicker line. Zone centroid connectors are shown as pale blue lines. Eleven zones were designated cordon-crossing zones at the study area boundary.

Figure 6.1: Guildford Highway Network


Data
The network was well provided with current traffic Counts and these were all given a Confidence Level of 80, which served as a benchmark for other data confidences. Most of the Trip End data was synthesized, by disaggregation of UK Department of Transport data with reference to zonal population and employment figures, and was given Confidence Levels of 40. Higher Confidence values of 80 were set for external Trip Ends, determined from a cordon crossing survey, and for a set of five zones in the town center area that were the subject of a car park survey. An out-of-date Trip Matrix existed, which served as the Prior Matrix, and which was given a uniformly low confidence of 5 for each cell. Sixteen Screenlines were defined, which are shown in Figure 6.2.

Figure 6.2: Screenlines for Guildford

MVHWAY in TRIPS was used to calculate three sets of Burrell paths. The degree of randomization was controlled by setting the SPREAD parameter to 25, a relatively low value selected after viewing paths for different values using MVGRAF and using local knowledge of the network. MVHWAY was also used to prepare a Cost Matrix based on minimum cost routes.


Estimating the Matrix


Cube ME offers a number of controls on the calculation process and convergence criteria, but these were left to take default values and the process of running Cube ME itself was entirely straightforward. However, a series of estimation runs were undertaken, as described below.

The results provided by Cube ME for the first estimation are shown in Tables 6.1 to 6.3. These show extracts of the Cube ME printed reports, from which a number of observations can be made.

i) Because each data item enters the objective function, the number of elements associated with each different type of data is significant, as well as their Confidence Levels (Table 6.1).

AVERAGE CONFIDENCE LEVELS (EXCLUDING ZERO VALUES)
                                     Average  Maximum  Minimum  Number of Elements
Trip matrix confidence levels            5.0      5.0      5.0                6724
Screen line confidence levels           80.0     80.0     80.0                  16
Trip end (dest) confidence levels       47.8     80.0     40.0                  82
Trip end (orig) confidence levels       47.8     80.0     40.0                  82

Optimisation halted because: Convergence detected

Table 6.1: Confidence and Convergence Summary
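The averages in Table 6.1 can be cross-checked against the data description. The short Python check below assumes that the Trip Ends given a Confidence Level of 80 are those of the eleven cordon-crossing zones plus the five surveyed town center zones, with all remaining zones at 40; this split is an inference from the text rather than something the report states.

```python
n_zones = 82
high = 11 + 5          # cordon-crossing zones + surveyed town center zones (assumed at 80)
low = n_zones - high   # remaining zones (assumed at 40)

average_trip_end_confidence = (high * 80 + low * 40) / n_zones
matrix_cells = n_zones ** 2

print(round(average_trip_end_confidence, 1))   # 47.8
print(matrix_cells)                            # 6724
```

Both values agree with the 'Average' and 'Number of Elements' columns of the table.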

ii) The optimizer adjusts Model Parameter values and evaluates the resulting cell estimations in a series of iterations. The mathematics of the optimizer implies that it will converge to a solution in a number of iterations which is less than the number of model parameters.
MVESTM with Counts, Input Prior Matrix and Trip Ends Only
REPORTING OBSERVED/ESTIMATED GENERATIONS AND ATTRACTIONS

          GENERATIONS                          ATTRACTIONS
ZONE NO     OBS.     EST.  OBS-EST        %       OBS.     EST.  OBS-EST      %
      1   4869.0   4324.3    544.7    11.2%     3657.0   3591.6     65.4    1.8
      2   3825.0   3745.0     80.0     2.1%     2984.0   3571.1   -587.1  -19.7
      3   1798.0   2559.5   -761.5   -42.4%     5715.0   5710.1      4.9    0.1
      4    419.0    383.2     35.8     8.5%      558.0    528.3     29.7    5.3
      5   1256.0   1572.5   -316.5   -25.2%     2018.0   2156.1   -138.1   -6.8
      6   2045.0   1731.1    313.9    15.4%     2084.0   1998.6     85.4    4.1
      7   1935.0   1815.4    119.6     6.2%     2112.0   2194.3    -82.3   -3.9
      8   1794.0   1894.8   -100.8    -5.6%     2673.0   2815.2   -142.2   -5.3
      9   3662.0   3364.9    297.1     8.1%     4763.0   4247.7    515.3   10.8
     10    430.0    388.9     41.1     9.5%      273.0    307.1    -34.1  -12.5
Some missing....
     30   3870.0   3176.5    693.5    17.9%     2370.0   2375.0     -5.0   -0.2
     31   2778.0   2618.2    159.8     5.8%     1304.0   1616.1   -312.1  -23.9
     32   5450.0   4633.8    816.2    15.0%     3257.0   3175.7     81.3    2.5
     33   2943.0   2741.1    201.9     6.9%     3006.0   2807.4    198.6    6.6
     34    736.0    806.5    -70.5    -9.6%     1151.0   1107.2     43.8    3.8
     35    368.0    785.9   -417.9  -113.5%      930.0    909.7     20.3    2.2
     36   4042.0   4062.2    -20.2    -0.5%     1523.0   1570.5    -47.5   -3.1
     37   1821.0   1964.4   -143.4    -7.9%     2026.0   2083.9    -57.9   -2.9
     38   4719.0   4763.3    -44.3    -0.9%     2683.0   2326.7    356.3   13.3
     39   3116.0   3440.8   -324.8   -10.4%     6410.0   6234.9    175.1    2.7
     40   3030.0   3369.2   -339.2   -11.2%     5227.0   6016.3   -789.3  -15.1
Some missing....
     70   1829.0   1639.9    189.1    10.3%     1251.0   1214.6     36.4    2.9
     71   1089.0   1160.4    -71.4    -6.6%     1298.0   1364.2    -66.2   -5.1
     72   4396.0   4122.0    274.0     6.2%     4226.0   3952.8    273.2    6.5
     73  10600.0  11231.3   -631.3    -6.0%    11100.0  11146.4    -46.4   -0.4
     74   6950.0   6931.0     19.0     0.3%     5806.0   5720.2     85.8    1.5
     75   9200.0   9605.6   -405.6    -4.4%     9200.0   9384.8   -184.8   -2.0
     76  14423.0  15045.9   -622.9    -4.3%    14313.0  14109.6    203.4    1.4
     77   1008.0    824.4    183.6    18.2%      722.0    655.4     66.6    9.2
     78   2270.0   2217.8     52.2     2.3%     2270.0   2236.2     33.8    1.5
     79   5665.0   5396.6    268.4     4.7%     5665.0   5465.7    199.3    3.5
     80  26660.0  26727.9    -67.9    -0.3%    28912.0  27872.0   1040.0    3.6
     81   5310.0   5258.3     51.7     1.0%     5990.0   5940.6     49.4    0.8
     82   6033.0   6390.8   -357.8    -5.9%     6085.0   6601.1   -516.1   -8.5

Table 6.2: Trip End comparison of prior (observed) and estimated values

iii) Cube ME prints basic comparisons of input and estimated data for (a) Trip Ends (generations and attractions - Table 6.2) and (b) screenline inputs (Table 6.3). This information must be interpreted with care, as a difference may be a good feature, indicating that some other, more reliable information has determined the estimated result.
MVESTM with Counts, Input Prior Matrix and Trip Ends Only
REPORTING OBSERVED/ESTIMATED SCREEN LINE COUNTS

SCRLINE NO   OBSERVED  ESTIMATED   OBS-ESTM        %
         1    11677.0    11301.1      375.9     3.2%
         2    11677.0    11925.8     -248.8    -2.1%
         3    27947.0    26234.4     1712.6     6.1%
         4    25504.0    25213.3      290.7     1.1%
         5    28539.0    31075.9    -2536.9    -8.9%
         6    28431.0    30261.4    -1830.4    -6.4%
         7    18981.0    15441.2     3539.8    18.6%
         8    18809.0    18445.5      363.5     1.9%
         9    24000.0    23770.1      229.9     1.0%
        10    24435.0    23585.0      850.0     3.5%
        11     7225.0     7635.8     -410.8    -5.7%
        12     7225.0     8479.7    -1254.7   -17.4%
        13    16285.0    16367.7      -82.7    -0.5%
        14    22670.0    23883.7    -1213.7    -5.4%
        15     6261.0     6511.4     -250.4    -4.0%
        16     6022.0     6886.0     -864.0   -14.3%

Table 6.3: Screenline comparison of prior (observed) and estimated values
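The difference and percentage columns of Table 6.3 can be reproduced, and poorly matched items picked out for review, with a few lines of Python. The sketch below uses three rows taken from the table; the function name and the 10% review threshold are hypothetical choices for illustration and are not part of Cube ME.

```python
def compare(observed, estimated):
    """Return (OBS-ESTM, percentage of observed), both rounded to 1 decimal."""
    diff = observed - estimated
    return round(diff, 1), round(100.0 * diff / observed, 1)


# (observed, estimated) values for screenlines 7, 12 and 13 of Table 6.3.
screenlines = {7: (18981.0, 15441.2), 12: (7225.0, 8479.7), 13: (16285.0, 16367.7)}

REVIEW_THRESHOLD = 10.0   # hypothetical: flag absolute differences above 10%

for number, (obs, est) in screenlines.items():
    diff, pct = compare(obs, est)
    flag = "  <-- review" if abs(pct) > REVIEW_THRESHOLD else ""
    print(f"{number:3d} {obs:10.1f} {est:10.1f} {diff:10.1f} {pct:7.1f}%{flag}")
```

A flagged item is a prompt to look for the competing data, not evidence of a poor estimation in itself, as noted in the text above.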


Evaluation: Sensitivity Analysis


The Estimated Matrix was also evaluated by examining how sensitive the results were to changes in the input data:

Alterations in Confidence Levels: The effect of assumptions in setting Confidence Levels was tested by increasing the Confidence Levels from 80 to 200 on two Screenlines for the major traffic carrying road (the town by-pass). Using the previously calculated Model Parameter and Gradient Search (Hessian) Matrix, the re-estimation, in this case, required only six iterations. The differences between observed and estimated Screenline counts were correspondingly improved:

Screenline              Before (80)   After (200)
Flow differences
  (i)                          1713          1094
  (ii)                          291           219
Flow % differences
  (i)                           6.1           3.9
  (ii)                          1.1           0.9

Elsewhere, other screenlines were marginally affected, both better and worse, apart from one screenline where the improvement was much more noticeable. In general, the results suggested that small changes in Confidence Levels were not significant, but that improvements were obtainable where it was possible to refine values of Confidence Levels rationally. Matches of estimated and input data can always be improved for individual data items by increasing the corresponding Confidence, but this will only have a net improvement on the Estimated Matrix when it does not exacerbate data inconsistencies.


Including Part Trip Data


The original estimation of the Guildford matrix was later updated using a set of data which corresponded to a license plate match survey taken around the center of the town. The data was pre-processed and converted into a set of link flows, as illustrated in Figure 6.3 in terms of bandwidths; this also serves to indicate the extent of the survey.

Figure 6.3: Part Trip Data Shown as Link Flows, using Bandwidths

The estimation was re-run, now incorporating the following sets of information:
- Prior Matrix
- Trip Ends
- Link Counts
- Part Trip Data

Figure 6.4 shows the two sets of link flow information which were used: Link Counts, shown as 'open' bandwidths, and Part Trip data, shown as previously in Figure 6.3. It may be noted that some links had both Link Count data (as shown in Figure 5.4) and Part Trip data. (This is visible on the color screen display for Figure 6.6.) In this application, the Confidence Levels for Link Counts were set higher, at 80 or more, than those for Part Trip data, which were set at 60 in recognition of the sampling process inherent in license plate surveys.


Figure 6.4: Part Trip Data and Link Counts

Results of New Estimation
Extracts of the Cube ME results of the new estimation are shown in Table 6.4 (a and b). These are similar to those presented in Tables 6.1 to 6.3, but with additional information concerning Part Trip data, and with some differences of presentation format. From Table 6.4b, it may be noted that the estimated Part Trip Flows match the overall number of observed Part Trips, in this case, to within 1.9%. This, of course, partly reflects their relatively high Confidence Levels and 'Number of Elements', which are reported near the top of Table 6.4a. 'Number of Elements' for Part Trip data is the number of (one-way) links with Part Trip data.

Figures 6.3 and 6.4, in fact, show respectively estimated and observed Part Trip data, but the difference is too small to make clear graphically in this particular application. It is therefore useful to view the correspondence as a tabulation. This report is shown in Table 6.5, which is headed by a key explaining the storage of information in Volume Fields.

AVERAGE CONFIDENCE LEVELS (EXCLUDING ZERO VALUES)
-------------------------------------------------
                                     Average  Maximum  Minimum  Number of Elements
Trip matrix confidence levels            5.0      5.0      5.0                6642
Screen line confidence levels           95.0    200.0     80.0                  16
Trip end (dest) confidence levels       47.8     80.0     40.0                  82
Trip end (orig) confidence levels       47.8     80.0     40.0                  82
Part Trip confidence levels             60.0     60.0     60.0                 226

SUMMARY OF FINAL FIVE ITERATIONS
--------------------------------
Iteration   Stepsize (Tolerance=0.00010)   Objective Value   Matrix Total
       34                      0.0003559       -8859208.83       229655.8
       35                      0.0001208       -8859208.83       229656.0
       36                      0.0001890       -8859208.83       229655.9
       37                      0.0001123       -8859208.83       229655.9
       38                      0.0000580       -8859208.83       229655.9

Optimisation halted after 38 iterations because: Convergence detected

Table 6.4a: Results of Estimation - including Part Trip Data

REPORTING PRIOR/ESTIMATED MATRIX TOTALS
CONFIDENCE      PRIOR  ESTIMATED  ESTM-PRIOR  (ESTM-PRIOR)/PRIOR(%)
       5.0   238498.0   229655.9     -8842.1                  -3.7%

REPORTING OBSERVED/ESTIMATED PART TRIP FLOW TOTALS
CONFIDENCE   OBSERVED  ESTIMATED   ESTM-OBSV  (ESTM-OBSV)/OBSV(%)
      60.0   972944.0   991158.2     18214.2                 1.9%

REPORTING OBSERVED/ESTIMATED GENERATIONS AND ATTRACTIONS

GENERATIONS
ZONE NO  CONFIDENCE  OBSERVED  ESTIMATED  ESTM-OBSV  (ESTM-OBSV)/OBSV(%)
      1        40.0    4869.0     4714.7     -154.3                -3.2%
      2        40.0    3825.0     3756.0      -69.0                -1.8%
      3        40.0    1798.0     2015.2      217.2                12.1%
      4        40.0     419.0      398.8      -20.2                -4.8%
      5        40.0    1256.0     1381.2      125.2                10.0%
      6        40.0    2045.0     1879.7     -165.3                -8.1%
      7        40.0    1935.0     1866.6      -68.4                -3.5%
      8        40.0    1794.0     1786.8       -7.2                -0.4%
      9        40.0    3662.0     3490.7     -171.3                -4.7%
     10        40.0     430.0      411.9      -18.1                -4.2%
Some missing....

ATTRACTIONS
ZONE NO  CONFIDENCE  OBSERVED  ESTIMATED  ESTM-OBSV  (ESTM-OBSV)/OBSV(%)
      1        40.0    3657.0     3661.8        4.8                 0.1%
      2        40.0    2984.0     3142.7      158.7                 5.3%
      3        40.0    5715.0     5668.6      -46.4                -0.8%
      4        40.0     558.0      535.7      -22.3                -4.0%
      5        40.0    2018.0     2067.6       49.6                 2.5%
      6        40.0    2084.0     2000.6      -83.4                -4.0%
      7        40.0    2112.0     2092.6      -19.4                -0.9%
      8        40.0    2673.0     2629.0      -44.0                -1.6%
      9        40.0    4763.0     4437.3     -325.7                -6.8%
     10        40.0     273.0      279.5        6.5                 2.4%
Some missing....

REPORTING OBSERVED/ESTIMATED SCREEN LINE COUNTS
SCREENLINE NO & NAME    CONFIDENCE  OBSERVED  ESTIMATED  ESTM-OBSV  (ESTM-OBSV)/OBSV(%)  NO OF ODs
 1 A'shot Rd W-E              80.0   11677.0    11370.7     -306.3                -2.6%        219
 2 A'shot Rd E-W              80.0   11677.0    11651.1      -25.9                -0.2%        221
 3 A3-Hogs Back S-N          200.0   27947.0    26670.6    -1276.4                -4.6%        153
 4 A3-Hogs Back N-S          200.0   25504.0    24896.4     -607.6                -2.4%        154
 5 A3-Parkway W-E             80.0   28539.0    29956.5     1417.5                 5.0%        538
Some missing....

Table 6.4b: Results of Estimation - including Part Trip Data

NETWORK IDENTIFIER <Network with Estimated Part Trip Flows>
VOLUME FIELD 1 NAME <Obsv>   Observed Link Counts
VOLUME FIELD 2 NAME <Conf>   Confidences Levels for Link Counts
VOLUME FIELD 3 NAME <PrtT>   Observed Part Trip Data
VOLUME FIELD 4 NAME <PrtC>   Confidence Levels for Part Trip Data
VOLUME FIELD 5 NAME <EPtr>   Estimated Part Trip Data

Print Comparisons of Part Trip Data and Estimates

REPORT 4: LINK VOLUME FIELDS
ANODE/BNODE   2120 2119 2105 2112 2644
              ---------- ---------- ----------
1 <Obsv>           8552.     18809.         0.
2 <Conf>             80.        80.         0.
3 <PrtT>           6871.     17775.      9073.
4 <PrtC>             60.        60.        60.
5 <EPtr>           7139.     18420.     10808.

REPORT 4: LINK VOLUME FIELDS
ANODE/BNODE   2127 2207 2212
              ---------- ---------- ----------
1 <Obsv>              0.      5387.         0.
2 <Conf>              0.        80.         0.
3 <PrtT>           4906.      3497.      2821.
4 <PrtC>             60.        60.        60.
5 <EPtr>           4565.      3356.      2822.

ANODE/BNODE   2843 2113 2194 2841
              ---------- ---------- ----------
1 <Obsv>           1226.         0.         0.
2 <Conf>             80.         0.         0.
3 <PrtT>           3635.       213.         0.
4 <PrtC>             60.        60.        60.
5 <EPtr>           3325.       231.         0.

Table 6.5: Report on Observed and Estimated Part Trip Data


Hierarchic Estimation
Introduction to Hierarchic Estimation
Approaches to Estimating Very Large Matrices
There are formidable data processing and computational issues to be faced when estimating very large matrices, whose size may lie in the range of 2,500 to 10,000 zones for major transport studies. Theoretically, the matrices can have between 2,500² and 10,000² (6,250,000 to 100,000,000) cells to estimate, although the practical number of cells with non-zero trips will only be a fraction of this. Nevertheless, the number of cells to be estimated in typical applications will be of the order of 250,000 to 750,000. The natural approach, which is used in hierarchic matrix estimation, is to reduce the estimation problem to a more manageable size by grouping information. However, it is necessary to recognize that the pattern of trips across many large study areas, such as conurbations, is not readily partitioned. For example, a data item such as a flow count or a trip end may relate to trips with dispersed origins and destinations which may not easily be grouped. It is therefore a feature of Cube ME hierarchic estimation that each of the different approaches to estimation offered, which are described below, always considers all of the trips in the entire study area.

Different Levels of Detail: Districts and Zones


The approaches offered by Cube ME hierarchic estimation consider the OD matrix at two levels of detail:
- a fine level, which is the original zoning system and results in a Zonal matrix;
- a coarser level, which aggregates (groups) sets of zones into a limited number of districts, from which a corresponding District matrix may be produced.

The total number of trips in the Zonal and District matrices is the same.
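The relationship between the two levels can be illustrated with a short Python sketch. It is not part of Cube ME: the function name, the dictionary layout and the four-zone example are assumptions made purely for illustration. It sums a Zonal matrix into a District matrix and confirms that the matrix total is unchanged.

```python
def aggregate_to_districts(zonal, orig_district, dest_district):
    """Sum a Zonal OD matrix into a District matrix.

    zonal          -- dict {(origin_zone, dest_zone): trips}
    orig_district  -- dict {zone: origin District}
    dest_district  -- dict {zone: destination District}
    """
    district = {}
    for (i, j), trips in zonal.items():
        key = (orig_district[i], dest_district[j])
        district[key] = district.get(key, 0.0) + trips
    return district


# Hypothetical 4-zone study area grouped into 2 Districts, "A" and "B".
zonal = {(1, 2): 10.0, (1, 3): 5.0, (2, 4): 7.5, (3, 1): 2.5, (4, 4): 1.0}
groups = {1: "A", 2: "A", 3: "B", 4: "B"}

district = aggregate_to_districts(zonal, groups, groups)
print(district)
print(sum(zonal.values()), sum(district.values()))
```

Separate origin and destination lookups are passed in so that the same sketch also covers the case where a Zone belongs to different origin and destination Districts.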

Different Approaches to Hierarchic Estimation


The main method is called 'Hierarchic Estimation' as the estimated District matrix is used to control a series of estimations primarily conducted at the Zonal level. This process leads to a fully updated Zonal matrix.

Hierarchic estimation also allows a variant method in which the 'District' matrix is defined as a mixture of District and Zonal detail. The resulting estimated 'District' matrix includes some cells estimated at the Zonal level. The output estimated matrix has fewer rows and columns than the input matrix, but there will be a direct correspondence between certain of the cells, as selected by the user. This variant is valuable when the application only needs to update cells relating to parts of the large study area, for example, to update cells for an administrative borough within a large city region. The method only requires a single estimation, rather than the series of estimations used in the main Hierarchic Estimation process. This variant is referred to as 'Combined District and Zonal' estimation.

The underlying estimation process is common to all Cube ME runs but there are differences in how information is grouped in Hierarchic Estimation. Apart from differences in information grouping, the Combined District and Zonal estimation is very similar to a standard estimation. The Hierarchic Estimation method introduces a new concept, which is called a Local Matrix. This is explained in the next section.


Alternative Approaches to Hierarchic Estimation
Estimation with Mixed District and Zonal Detail
The majority of this section is concerned with Hierarchic Estimation, but it begins with a view of the approach for Combined District and Zonal Estimation, shown in Figure 7.1. This shows the estimated matrix where the sides of the cells have been scaled according to the geographical size of the areas to which they relate. That is, the large sides correspond to Districts and the small sides to Zones. This has resulted in three types of cells:
a) large squares - all information is estimated at District level;
b) small squares - all information is estimated at Zonal level;
c) rectangles - information is estimated at a mixture of District and Zonal detail.

Figure 7.1: Combined Estimation of Selected Zones and Districts

The user may choose whether to retain information at mixed levels of detail, as shown, or (manually) to extract the cells fully estimated at Zonal detail (the small squares in Figure 7.1) to update a portion of the zonal Prior matrix. As shown in Figure 7.1, the detailed estimation has been for trips traveling from one part of the study area to another. If the small squares were instead located on the diagonal of the main square shown in Figure 7.1, then the detailed estimation would be for all trips within, and traveling to and from, a particular part of the study area, such as a town center area.

Some points to note about this approach are:
1. although the terms 'Zonal' and 'District' have been used to indicate different levels of detail, Cube ME considers this form of estimation as a special form of 'District' estimation, without recognizing that a selected number of 'Districts' are simply individual Zones;
2. there must be the same number of origin and destination Districts, which is not the case for Hierarchic Estimation;
3. this approach requires a single estimation.

Local Matrices
When using Hierarchic Estimation, Cube ME first estimates a District matrix, which is used to influence the calculation of a set of Local matrices. These Local matrices contain a mixture of Zonal detail and District-based information. The estimated Zonal detail is captured automatically by Cube ME and, as each Local matrix is estimated, is used to develop progressively an update of the entire matrix at the Zonal level of detail. The District matrix simply represents the Zonal matrix aggregated into a District matrix, although the District matrix may be non-square, that is, there may be a different number of origin and destination Districts. Further information about Districts is given later in this Section. Figure 7.2 shows an illustration of a Local matrix as an extension of the combined District and Zonal matrix shown in Figure 7.1, and discussed above.

Figure 7.2: Zonal Estimation Controlled by District Matrix

In Figure 7.2 all of the large squares, where information is only estimated at District Level, have been shaded. This is because this portion of the matrix is treated in a Local matrix as a single unit, termed 'Rest-of-(the)-World' (RoW). A Local matrix therefore has the following elements, with reference to Figure 7.2:
- a detailed Zonal level set of cells (the small squares);
- trips in the Rest-of-World (shaded area);
- trips from RoW to Zonal level area (rectangular cells);
- trips to RoW from Zonal level area (rectangular cells).

A Local matrix is defined for each origin and destination District pair (the unshaded part of Figure 7.2 represents one such pair), and the fully Estimated (Zonal) matrix is produced when all local matrices have been estimated. Information involving trips from the RoW is obtained from the District Matrix. This element, and the fact that the total number of trips is the same (in principle) for each local matrix, ensures that consistency is maintained across the entire study area, even though detail is calculated separately in estimations for different parts.
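The structure of a Local matrix can be sketched in Python as follows. This is an illustration only, not Cube ME's internal representation: the function name, the "RoW" label and the example data are assumptions. The sketch keeps Zonal detail for one origin/destination District pair and collapses everything else into a single Rest-of-World row and column, so the matrix total is preserved.

```python
ROW = "RoW"  # assumed label for the Rest-of-World pseudo-zone


def build_local_matrix(zonal, orig_zones, dest_zones):
    """Collapse a Zonal OD matrix around one origin/destination District pair.

    zonal       -- dict {(origin_zone, dest_zone): trips}
    orig_zones  -- set of Zones in the origin District (kept at Zonal detail)
    dest_zones  -- set of Zones in the destination District (kept at Zonal detail)
    """
    local = {}
    for (i, j), trips in zonal.items():
        row = i if i in orig_zones else ROW
        col = j if j in dest_zones else ROW
        local[(row, col)] = local.get((row, col), 0.0) + trips
    return local


# Hypothetical data: origin District holds Zones 1-2, destination District Zones 3-4.
zonal = {(1, 2): 10.0, (1, 3): 5.0, (2, 4): 7.5, (3, 1): 2.5, (4, 4): 1.0}
local = build_local_matrix(zonal, orig_zones={1, 2}, dest_zones={3, 4})
for cell, trips in sorted(local.items(), key=str):
    print(cell, trips)
```

Cells keyed by two Zone numbers correspond to the small squares, the (RoW, RoW) cell to the shaded area, and the mixed cells to the rectangles of Figure 7.2.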

Summary of the Hierarchic Estimation Process


The Hierarchic Estimation process may be summarized as the four stages illustrated in Figures 7.3 to 7.6.

i) Creation of Districts from Zones - Figure 7.3
Figure 7.3 shows a study area divided into many (small) Zones (denoted by ij). These are grouped into a number of fewer (and larger) Districts (denoted by IJ). Later topics in this chapter give more information about creating districts.


Figure 7.3: Districts (I,J) and Zones (i,j)

ii) Estimate District Matrix - Figure 7.4
This is the first operation by Cube ME, which estimates a small matrix for the 5 to 15 origin and destination Districts which are typically defined. One of the cells, corresponding to a pair of origin and destination Districts which contribute to a Local matrix, is referenced in Figure 7.4 as Mij. Figure 7.4 indicates the information in the District matrix estimation: the Prior matrix and Trip Ends are automatically aggregated from the user's input Zonal-level information. Internally, Cube ME creates a 'condensed network' but does not aggregate the screenline count data. This treatment of data is reflected in Cube ME's reports on the District matrix (see Figure 7.12b).


Figure 7.4: Estimate District Matrix

iii) Estimate Local Matrices - Figure 7.5
Figure 7.5 relates to a single Local matrix, but this stage is repeated for all Local matrices. Cube ME can estimate all Local matrices in one run, but the user may exercise considerable control over this process. Figure 7.5 shows the same structural elements introduced in the discussion on Figure 7.2. The information used to estimate Zonal cells, referenced in Figure 7.5 as Mij, is shown in Figure 7.5: the Prior matrix and Trip Ends are used at Zonal level in the estimation, count data is used as input where relevant to the local matrix, and other items are obtained from the corresponding District matrix estimation. This use of information is reflected in Cube ME's reports on Local matrices (see Figures 7.12c and 7.12d).


Figure 7.5: Estimate Local Matrix

iv) Build-up Full Estimated Matrix - Figure 7.6
Figure 7.6 indicates the construction of the Fully Estimated matrix from detailed information (Mij) calculated from a set of Local Matrices. While the matrix is in the form shown in Figure 7.6, that is, with only some of the cells estimated, it is referred to as the Partially Estimated Matrix. Those cells in the Partially Estimated Matrix which have not yet been estimated contain copies of the corresponding Prior matrix cells. (This can provide another means of estimating just part of a study area, namely, by restricting the estimation to selected Districts/Zones of interest.) When all cells of the Partially Estimated Matrix have been estimated, it, of course, becomes the Final Fully Estimated Zonal matrix.
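The build-up of the Partially Estimated Matrix can be pictured with the following Python sketch. The function name, the "RoW" label and the numbers are hypothetical, and the sketch makes no claim about how Cube ME stores matrices. It starts from a copy of the Prior matrix and overwrites only the Zonal-level cells delivered by one estimated Local matrix.

```python
def update_partial_matrix(partial, local_estimate, row_label="RoW"):
    """Copy Zonal-level cells from one estimated Local matrix into the
    Partially Estimated Matrix; Rest-of-World cells are skipped."""
    for (i, j), trips in local_estimate.items():
        if i != row_label and j != row_label:
            partial[(i, j)] = trips
    return partial


# Hypothetical Prior matrix and one hypothetical Local matrix estimate.
prior = {(1, 3): 5.0, (2, 4): 7.5, (3, 1): 2.5}
local_estimate = {(1, 3): 5.6, (2, 4): 7.1, ("RoW", "RoW"): 2.4}

partial = dict(prior)  # cells not yet estimated keep their Prior values
update_partial_matrix(partial, local_estimate)
print(partial)
```

Repeating the update for every Local matrix would fill in all cells, at which point the Partially Estimated Matrix becomes the fully estimated Zonal matrix.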


Figure 7.6: Combine Local Matrices in Partially Estimated Matrix


Defining Districts
Hierarchic Estimation is a heuristic method which approximates the formal mathematical methodology provided by a standard run of Cube ME. It is most appropriate when the study area is large enough to encompass sub-areas which can become Districts where the travel patterns are reasonably independent of one another. The purpose of the estimated District matrix is, largely, to consider the inter-District movements, while the focus of Local matrices is the intra-District movements. Because precision (greater detail) is associated with the latter, it is desirable to minimize the amount of inter-District movement.

The number of Local Matrices is approximately the square of the number of Districts. It can therefore make a considerable difference to computational times whether, say, 10 Districts are chosen (about 100 Local Matrix estimations) or 8 Districts (about 64 Local Matrix estimations).

Not all study area Zones can be allocated to Districts on the basis of routings through screenlines, either because some or all trips from or to a Zone do not pass through a screenline, or because allocation of the Zone to a District would violate the maximum number of Zones per District. Such Zones are then allocated to the adjacent District, based on the coordinates associated with Zone Centroids. Allocating Zones to Districts in a way that is not based on routing behavior potentially worsens the effects of the approximation implicit in Hierarchic Estimation. In many cases, this worsening may be negligible in practice, but it will be more significant if those Zones involve relatively large numbers of trips, or if a significant proportion of Zones are involved. It is this latter consideration which makes it inadvisable to use Hierarchic Estimation on Study Areas with fewer than 500 zones.

The considerations involved in defining Districts may be summarized as:
- the fewer Districts the better;
- the maximum Local Matrix size is determined by the maximum size of standard estimation which may be conveniently run on the available computer (say 1000 to 2500 zones);
- the more allocation of Zones to Districts on the basis of routings through screenlines the better.
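The effect of the number of Districts on the workload can be checked with a few lines of Python. This is only a back-of-envelope sketch: the function name is hypothetical, and it assumes one Local Matrix per origin/destination District pair, which matches the 'approximately the square' rule quoted above.

```python
def local_matrix_count(n_origin_districts, n_destination_districts=None):
    """Approximate number of Local Matrix estimations: one per
    origin/destination District pair."""
    if n_destination_districts is None:
        n_destination_districts = n_origin_districts
    return n_origin_districts * n_destination_districts


for n in (8, 10):
    print(n, "Districts: about", local_matrix_count(n), "Local Matrix estimations")
```

The optional second argument allows for a non-square District matrix, where the numbers of origin and destination Districts differ.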

Note that it is a feature of Hierarchic Estimation Districts that there may be a different number of origin and destination Districts (that is, the District matrix may be 'non-square'), and the allocation of origin Zones to origin Districts is independent of the allocation of that same Zone to a destination District. This enables the asymmetries of trip patterns to be reflected, as, for example, in a morning peak matrix when trips originate from many Zones in the suburbs and head for only a few destination Zones in the city center. This is of value to the estimation process, but means that the District Matrix and the Local Matrices cannot be reported directly.


Running Cube ME for Hierarchic Estimation


Cube ME is run in a similar manner to non-Hierarchic Estimation except that:
- Option DSTRCT=T is set, to indicate calculation/use of a District matrix;
- LMC and DDF files are input additionally;
- Parameter ZCONF is set.

If Cube ME is run with an incomplete LMC file, then the estimated matrix is a Partially Estimated Matrix. This matrix provides an additional input file when further Local Matrices are to be estimated. The Model Parameter file only ever contains information relating to the District Matrix (and not any Local Matrices), and the Execution Log file contains brief summary information for both District and Local Matrix Estimations.

The printout file for Hierarchic Estimation contains the same type of information as for non-Hierarchic Estimation, as illustrated in The Estimation Process in Application. However, there may be many sets of this information: the first set always refers to the District Matrix estimation. This is followed by a set of information for each Local Matrix being estimated; there may be none in the case of a combined District and Zone estimation. (Because estimations involving many Local Matrices can generate very large print files, it can be convenient to edit the Local Matrix Control file to create a series of runs of Cube ME in which the size of individual print files is reduced.)

An additional item of information is provided for Hierarchic Estimation concerning the influence of the District Matrix on each Local Matrix estimation. The table with this information, shown in Figure 7.12c, is labeled 'Side constraints on Matrix Totals'. This term refers to the constraints of the District Matrix on various sides (and elements) of the Local Matrix, as illustrated previously in Figure 7.5. The topic Reporting Hierarchic Estimation Results discusses the printout for Hierarchic Estimation.

Parameter ZCONF
The extent of the constraining effect of the District Matrix on the Local Matrices is determined by Cube ME Parameter ZCONF, which acts as a Confidence Level, treating the District Matrix as 'observed' data and the Local Matrix as 'estimated'. For the Local Matrix Estimation, therefore, the District Matrix is just another item of observed data and ZCONF should be set in relation to Confidence Levels for other items of observed data.

From the user's point of view, the setting of ZCONF should be a reflection of the degree and importance of the interaction between Districts, in terms of trips which cross more than one origin or destination District boundary. (An effect of the automatic generation of Districts is to minimize such boundary crossings.) The District matrix contains information about these interactions; if they are important then the District Matrix should be made correspondingly significant with a relatively high setting of ZCONF. A low value of ZCONF allows Local Matrices to reflect local data more precisely, at the expense of the 'larger picture' across the entire study area. A possible symptom of an inappropriate setting of ZCONF might be an unwarranted distortion of the distribution of trip costs/lengths in the Estimated Matrix.


Using Cube ME
Input Data - Overview
The data that can be used in estimating the new O-D matrix may include some or all of the following types of data:
- A prior (existing) trip matrix
- Traffic generations and attractions of zones
- Traffic counts on links and/or turns
- Modeled (multiple) paths between zones
- Cost of travel between zones
- Parameters of a calibrated trip distribution function
- Part Trip data, where trips are observed traveling between points which are not necessarily their ultimate origins and destinations


Outputs - Overview
The outputs from Cube ME are:
- The estimated O-D matrix.
- Summary reports on differences between input data and corresponding values implied by the estimated matrix.
- A set of files with information on:
  i) Model Parameter values
  ii) a log of the optimization steps
  iii) internal Gradient Search and Intercept data


Estimating Large Matrices (Hierarchic Estimation)


Cube ME provides a hierarchic approach to estimation for use with very large matrices, typically those with more than 2,500 zones. This is required to make the process more manageable and less time consuming. The basic approach is to estimate a general matrix, in which zones are automatically grouped into districts. This area-wide estimation is then used to control a set of detailed estimations, which build up to provide a fully detailed estimate for the entire study area. This is discussed in detail in Hierarchic Estimation.


The Estimation Process


The only program directly involved in the estimation process itself is Cube ME, although other Cube programs play an important part in the pre- and post-processing of the data. The data used may be some or all of the data described earlier in Input Data - Overview. Cube ME may also use Model Parameters, Gradient Search, and Intercept files from a previous run of Cube ME for the current estimation to 'warm start' the calculations.

Internally Cube ME can be considered to be made up of two main parts, which are executed alternately:

The Estimation Model
The function of this is, given some particular values of the Model Parameters, to calculate the Estimated Matrix, Trip Ends, Screenline Volumes etc., and also to perform the likelihood calculation.

The Optimization Step
This procedure attempts to change the values of the Model Parameters to improve the likelihood value (the objective function).

These two stages are carried out alternately in a series of iterations until no further improvement can be made.
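The alternation between the two parts can be pictured with the toy Python loop below. It is a sketch under strong assumptions and does not reproduce Cube ME's model or optimizer: the two-row 'matrix', the squared-error objective, the plain gradient step and all names are invented for illustration. Only the overall shape (evaluate the model, take an optimization step, stop when the step size falls below a tolerance) follows the description above.

```python
import math

PRIOR_GEN = [100.0, 200.0]     # hypothetical prior row totals
OBSERVED_GEN = [120.0, 180.0]  # hypothetical observed generations


def estimation_model(params):
    """Given parameter values, return estimated generations and the objective."""
    estimated = [p * math.exp(x) for p, x in zip(PRIOR_GEN, params)]
    objective = -sum(((e - o) / o) ** 2 for e, o in zip(estimated, OBSERVED_GEN))
    return estimated, objective


def optimisation_step(params, estimated, rate=0.25):
    """Move each parameter uphill along the gradient of the toy objective."""
    steps = [rate * (-2.0 * (e - o) / o * e / o)
             for e, o in zip(estimated, OBSERVED_GEN)]
    new_params = [x + s for x, s in zip(params, steps)]
    return new_params, max(abs(s) for s in steps)


def run(tolerance=0.0001, max_iterations=200):
    """Alternate the two parts until the step size drops below the tolerance."""
    params = [0.0, 0.0]
    history = []
    for iteration in range(1, max_iterations + 1):
        estimated, objective = estimation_model(params)
        params, stepsize = optimisation_step(params, estimated)
        history.append((iteration, stepsize, objective, sum(estimated)))
        if stepsize < tolerance:
            break
    return estimation_model(params)[0], history


estimated, history = run()
print("Iteration  Stepsize   Objective Value  Matrix Total")
for iteration, stepsize, objective, total in history[-5:]:
    print(f"{iteration:9d}  {stepsize:.7f}  {objective:15.8f}  {total:12.1f}")
```

The final lines print the last five iterations in the same spirit as Cube ME's 'Summary of Final Five Iterations' report.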


Reports
Summary of Reports
- Listing of input parameters and options, and input binary header information.
- Mean, minimum and maximum Confidence Levels set by the user for each type of input data are given.
- Memory requirements.
- During execution in interactive mode a report of each iteration of the optimization process is given showing the current value of the objective function, the gradient tolerance, and the sum of all the estimated matrix elements. These values for the last five iterations are always reported.
- On completion, Cube ME provides summary reports on the comparison between sets of input data and the corresponding estimated values, with the Confidence Levels that apply.

Where relevant data is input to Cube ME, these reports are produced giving comparisons for:
- Prior and Estimated Matrices - matrix totals
- Trip Ends - zone generations and attractions, with input zone generations and attractions
- Link Flows - screenline volumes and input screenline volumes
- Part Trips - part trip 'matrix' totals, distinguished by Line Groups, where appropriate.

Further information may be obtained by using Cube programs to report on the estimated matrix file. For an estimation using Part Trip data, the output network file contains detailed information on estimated Part Trip link flows (equivalent to an assignment of the estimated Part Trip matrix).

Cube ME Reporting for Hierarchic Estimations
Cube ME reporting for a Hierarchic Estimation varies according to whether the estimation is for a District or a Local Matrix. The reports for District estimation are the same as for other Levels, except, of course, the results apply to Districts rather than Zones. For Local Matrices, Cube ME additionally provides summaries of the row and column side constraints from the District Matrix, and equivalent values from the Prior Matrix. The first reported 'zone' corresponds to the Rest-of-the-World (RoW), while the other reported zones are the set of Zones relevant to that Local Matrix. No Screenline reports are produced for Local Matrices.

The execution log file is output by the optimization step of Cube ME, and three levels of report may be produced. These are controlled by the IREP parameter. The contents of the log file will not normally be of interest to general users, but are of assistance in summarizing the progress of the calculation should investigation be required.
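The comparison columns in these summary reports (ESTM-OBSV and (ESTM-OBSV)/OBSV(%)) follow directly from the observed and estimated values. The Python sketch below shows the arithmetic; the function name is hypothetical, and the 'n/a%' text for a zero observed value is modeled on the example attraction report rather than on any documented rule.

```python
def comparison_row(observed, estimated):
    """Return (ESTM-OBSV, percentage text) for one report row."""
    difference = estimated - observed
    if observed == 0:
        # No percentage can be formed against a zero observed value.
        return difference, "n/a%"
    return difference, f"{100.0 * difference / observed:.1f}%"


# Values taken from the example zone generation report (zones 1 and 3).
for observed, estimated in [(4869.0, 4387.9), (1798.0, 2562.3), (0.0, 0.0)]:
    difference, percent = comparison_row(observed, estimated)
    print(f"{observed:10.1f} {estimated:10.1f} {difference:10.1f} {percent:>8}")
```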


Example of Average Confidence Level Report


AVERAGE CONFIDENCE LEVELS (EXCLUDING ZERO VALUES)
-------------------------------------------------
                                     Average   Maximum   Minimum   Number of Elements
Trip matrix confidence levels           20.0      20.0      20.0                 6724
Screen line confidence levels           95.0     200.0      80.0                   16
Trip end (dest) confidence levels       47.8      80.0      40.0                   82
Trip end (orig) confidence levels       47.8      80.0      40.0                   82


Example of Final Five Iterations Report


SUMMARY OF FINAL FIVE ITERATIONS
--------------------------------
Iteration   Stepsize (Tolerance= 0.0001)   Objective Value   Matrix Total
  149       0.0004152                      -4735264.48       239547.4
  150       0.0005055                      -4735264.48       239548.6
  151       0.0003342                      -4735264.49       239550.3
  152       0.0002368                      -4735264.49       239551.0
  153       0.0000781                      -4735264.49       239551.0

Optimization halted after 153 iterations because:
Convergence detected
Final Value of Maximum Search Step, UMAX = 0.01


Example of Matrix Totals and Zone Generation Report


REPORTING PRIOR/ESTIMATED MATRIX TOTALS
CONFIDENCE      PRIOR   ESTIMATED   ESTM-PRIOR   (ESTM-PRIOR)/PRIOR(%)
      20.0   238498.0    239551.2       1053.2                    0.4%

REPORTING OBSERVED/ESTIMATED GENERATIONS AND ATTRACTIONS
GENERATIONS
ZONE NO   CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
      1         40.0     4869.0      4387.9      -481.1                 -9.9%
      2         40.0     3825.0      3763.3       -61.7                 -1.6%
      3         40.0     1798.0      2562.3       764.3                 42.5%
      4         40.0      419.0       386.7       -32.3                 -7.7%
      5         40.0     1256.0      1574.9       318.9                 25.4%
      6         40.0     2045.0      1743.0      -302.0                -14.8%
      7         40.0     1935.0      1827.1      -107.9                 -5.6%
      8         40.0     1794.0      1904.4       110.4                  6.2%
      9         40.0     3662.0      3288.3      -373.7                -10.2%
     10         40.0      430.0       381.3       -48.7                -11.3%
     11         80.0     9200.0      9347.4       147.4                  1.6%
<continued>


Example of Zone Attractions Report


ATTRACTIONS
ZONE NO   CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
      1         40.0     3657.0      3586.4       -70.6                 -1.9%
      2         40.0     2984.0      3500.3       516.3                 17.3%
      3         40.0     5715.0      5556.7      -158.3                 -2.8%
      4         40.0      558.0       518.4       -39.6                 -7.1%
      5         40.0     2018.0      2162.9       144.9                  7.2%
      6         40.0     2084.0      1948.0      -136.0                 -6.5%
      7         40.0     2112.0      2129.7        17.7                  0.8%
      8         80.0      976.0      1030.0        54.0                  5.5%
      9         40.0     2673.0      2804.5       131.5                  4.9%
     10         40.0        0.0         0.0         0.0                  n/a%
     11         80.0     5665.0      5549.3      -115.7                 -2.0%
<continued>

The trip end summaries can also be produced with the zone labels. Short zone labels are printed if NODLAB=T, LNGLAB=F:

ATTRACTIONS
ZONE NO,NAME     CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
1 <Beaumont>           40.0     3777.0      3382.2      -394.8                -10.5%
2 <Cross_Ro>           40.0     3482.0      3441.1      -400.9                -10.4%
3 <Binley_S>           40.0     5815.0      5220.2      -594.8                -10.2%
<continued>

Long zone labels are printed if NODLAB=T and LNGLAB=T. The example below shows hierarchic zone numbers and long zone labels in the report:

ZONE NUMBER & NAME                  CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
28480 <Beaumont Avenue>                   40.0     5069.0      5544.5       475.5                  9.4%
28172 <Cross Roads, town centre>          40.0     4025.0      4392.2       367.2                  9.1%
27848 <Binley Street>                     40.0     1898.0      2076.7       178.7                  9.4%
<continued>


Example of Average Confidence Level Report (Part Trip Data)


AVERAGE CONFIDENCE LEVELS (EXCLUDING ZERO VALUES)
-------------------------------------------------
                                     Average   Maximum   Minimum   Number of Elements
Trip matrix confidence levels           10.0      10.0      10.0                 1083
Screen line confidence levels           80.0      80.0      80.0                    2
Trip end (dest) confidence levels       46.7      80.0      40.0                   95
Trip end (orig) confidence levels       46.7      80.0      40.0                   95
Part Trip confidence levels              7.0       7.0       7.0                  594


Example of Part Trip Totals Report


This report is produced if option PRTTRP=T. For Public Transport data, the report is as follows:

REPORTING OBSERVED/ESTIMATED PART TRIP FLOW TOTALS
GROUP       CONFIDENCE    OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
ALL                7.0   1386723.0   1232440.0   -154283.0                -11.1%
1 Local            7.0    624295.0    597702.1    -26592.9                 -4.3%
2 Express          7.0    521925.0    532005.7     10080.7                  1.9%

For Highways data, the report is as follows:

REPORTING OBSERVED/ESTIMATED PART TRIP FLOW TOTALS
CONFIDENCE    OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
      20.0   1590478.0   1606103.8     15625.8                  1.0%


Example of District Matrix Reports


REPORTING PRIOR/ESTIMATED MATRIX TOTALS
CONFIDENCE      PRIOR   ESTIMATED   ESTM-PRIOR   (ESTM-PRIOR)/PRIOR(%)
      20.0   238498.0    240291.2       1793.2                    0.8%

REPORTING OBSERVED/ESTIMATED GENERATIONS AND ATTRACTIONS
GENERATIONS
DISTRICT   CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
       1         40.0    14616.0     13368.5     -1247.5                 -8.5%
       2         40.0    48050.0     47995.1       -54.9                 -0.1%
       3         40.0     7855.0      7711.1      -143.9                 -1.8%
       4         40.0    40478.0     42008.8      1530.8                  3.8%
       5         40.0    62530.0     59877.4     -2652.6                 -4.2%
       6         40.0    15462.0     16832.4      1370.4                  8.9%
       7         40.0    18734.0     19158.2       424.2                  2.3%
       8         40.0     6744.0      6707.7       -36.3                 -0.5%
       9         40.0    26890.0     26631.9      -258.1                 -1.0%

ATTRACTIONS
DISTRICT   CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
       1         40.0    21562.0     22434.0       872.0                  4.0%
       2         40.0    43850.0     43476.7      -373.3                 -0.9%
       3         40.0    43963.0     44217.7       254.7                  0.6%
       4         40.0    21627.0     20809.8      -817.2                 -3.8%
       5         40.0    30926.0     30638.1      -242.9                 -0.8%
       6         40.0    37198.0     39445.7      2247.7                  6.0%
       7         40.0     8070.0      7973.4       -96.6                 -1.2%
       8         40.0    15332.0     16906.5      1574.5                 10.3%
       9         40.0    14423.0     14344.4       -78.6                 -0.5%

REPORTING OBSERVED/ESTIMATED SCREEN LINE COUNTS
SCREENLINE NO & NAME   CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)   NO OF ODs
1 A'shot Rd W-E              80.0    11677.0     11303.8      -373.2                 -3.2%         244
2 A'shot Rd E-W              80.0    11677.0     11947.6       270.6                  2.3%         244
<continued>


Example of Local Matrix Reports


REPORTING SIDE CONSTRAINTS ON MATRIX TOTALS
                    DISTRICT     IN-PRIOR   ESTIMATED   ESTM-DISTRICT   (ESTM-DISTRICT)/ZONAL(%)
                  CONSTRAINT
WITHIN DISTRICT       1506.2        958.0      1456.7           -49.5                       3.3%
FROM DISTRICT         6204.9       6276.0      5966.2          -238.7                       3.8%
TO DISTRICT          19303.5      16610.0     19181.9          -121.6                       0.6%
NOT IN DISTRICT     213276.5     214654.0    214338.2          1061.7                       0.5%
MATRIX TOTAL        240291.2     238498.0    240943.0

REPORTING OBSERVED/ESTIMATED GENERATIONS AND ATTRACTIONS
GENERATIONS
ZONE NO   CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
  R-o-W         40.0   233504.0    233520.1        16.1                 -0.0%
     25         40.0     1557.0      1625.1        68.1                 -4.4%
     26         40.0     1753.0      1654.5       -98.5                 -5.6%
     27         40.0      378.0       338.8       -44.2                -11.7%
     28         40.0     1535.0      1339.8      -195.2                -12.7%
     55         40.0      211.0       232.5        21.5                 10.2%
     56         40.0      875.0       878.7         3.7                  0.4%
     60         40.0      268.0       296.6        28.6                 10.7%
     61         40.0     1278.0      1061.8      -216.2                -16.9%

ATTRACTIONS
ZONE NO   CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)
  R-o-W         40.0   215324.0    220304.4      4980.4                  2.3%
     30         40.0     2370.0      2431.7        91.7                  3.9%
     44         40.0    12392.0     11794.8      -597.2                 -4.8%
     48         80.0     1708.0      1757.0        49.0                  2.9%
     53         40.0      209.0       273.1        64.1                 30.7%
     72         80.0     4226.0      3691.0      -535.0                -12.7%
     77         80.0      722.0       661.0       -61.0                 -8.5%

REPORTING OBSERVED/ESTIMATED SCREEN LINE COUNTS
SCREENLINE NO & NAME   CONFIDENCE   OBSERVED   ESTIMATED   ESTM-OBSV   (ESTM-OBSV)/OBSV(%)   NO OF ODs
1 A'shot Rd W-E              80.0    11677.0     11459.1      -217.9                 -1.9%         244
2 A'shot Rd E-W              80.0    11677.0     12580.3       903.3                  7.7%         244

Note that, as for standard estimations, short and long zone labels can be shown in the trip end reports. The label for the 'R-o-W' (Rest-of-the-World) will be left blank.


Files
Permanent Files
SYMBOLIC*   FILE** EXT   I/O   FILE DESCRIPTION                REQUIRED (R)/OPTIONAL (O)***
PARAMETER   .CTL         I     Control Data File               (R)
IMAT1       .MAT         I     Prior Trip/Cost Matrix File     (R)
IDAT1       .DAT         I     Trip End Records                (R) If TRPEND=T
IDAT2       .DAT         I     Model Parameters File           (R) If MODPAR=T or if WARMST=T
IDAT3       .GDS         I     Gradient Search File            (R) If WARMST=T
IDAT4       .DAT         I     Screenline File                 (R) If SCRFIL=T
IDAT5       .ICP         I     Intercept File                  (R) If WARMST=T or if INTCPT=T
INET1       .NET         I     Network File                    (R) If PRTTRP=T
PATH1       .PTH         I     VOYAGER Path File               (R) If WARMST=F and INTCPT=F and no IDAF1 input
IDAF1       .RCP         I     Route Choice Probability File   (R) If WARMST=F and INTCPT=F and no PATH input
IDAF2       .PTL         I     Lines File                      (R) If PRTTRP=T and if network is Public Transport
IDAF3       .DDF         I     District Definition File        (R) If DSTRCT=T
IDAT6       .DAT         I     Local Matrix Control File       (R) If DSTRCT=T
IMAT2       .MAT         I     Partial Estimated Matrix        (R) If DSTRCT=T and if WARMST=T
IDAT7       .DAT         I     Coordinate File                 (R) If NODLAB=T or using hierarchic numbering
OMAT1       .MAT         O     Estimated Trip Matrix           (R)
ODAT1       .DAT         O     Model Parameter File            (R)
ODAT2       .GDS         O     Gradient Search File            (R)
ODAT3       .DAT         O     Optimization Log File           (R)
ODAT4       .ICP         O     Intercept File                  (R) If WARMST=F and INTCPT=T and either SCRFIL=T or PRTTRP=T
ODAT5       .DAT         O     Text Intercept File             (O)
ONET1       .NET         O     Network File                    (R) If PRTTRP=T and DSTRCT=F
OPRN        .NET         O     Print File                      (R)

Notes:
* The SYMBOLIC PARAMETERS are those that would appear in an &FILES record to control file opening.
** The File Extension shown is that used conventionally when running in Application Manager.
*** File requirements can vary according to the combination of program PARAMETERS and OPTIONS chosen.
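The conditional requirements in the table can be expressed as a small checklist routine. The Python sketch below is not a Cube utility: the function name and its arguments are hypothetical, options that are not supplied are assumed to be off (the real defaults may differ), and the PATH1/IDAF1 alternatives are left out because they depend on which of the two files the user supplies.

```python
def required_input_files(options, public_transport=False,
                         hierarchic_numbering=False):
    """Return symbolic names of the input files required by the given options.

    options -- dict of option name -> bool, e.g. {"DSTRCT": True}
               (missing options are treated as False, which is an assumption)
    """
    def on(name):
        return bool(options.get(name, False))

    required = ["PARAMETER", "IMAT1"]  # always required
    if on("TRPEND"):
        required.append("IDAT1")       # Trip End Records
    if on("MODPAR") or on("WARMST"):
        required.append("IDAT2")       # Model Parameters File
    if on("WARMST"):
        required.append("IDAT3")       # Gradient Search File
    if on("SCRFIL"):
        required.append("IDAT4")       # Screenline File
    if on("WARMST") or on("INTCPT"):
        required.append("IDAT5")       # Intercept File
    if on("PRTTRP"):
        required.append("INET1")       # Network File
    if on("PRTTRP") and public_transport:
        required.append("IDAF2")       # Lines File
    if on("DSTRCT"):
        required.extend(["IDAF3", "IDAT6"])  # District Definition, Local Matrix Control
    if on("DSTRCT") and on("WARMST"):
        required.append("IMAT2")       # Partial Estimated Matrix
    if on("NODLAB") or hierarchic_numbering:
        required.append("IDAT7")       # Coordinate File
    return required


print(required_input_files({"DSTRCT": True, "SCRFIL": True}))
```

For a hierarchic run with screenline data, the sketch lists the control file, the prior matrix, the Screenline File, the District Definition File and the Local Matrix Control File.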


Control Data
&PARAM Keywords
It is usual to leave Cube ME control parameters to their default values, with the user only setting the parameters associated with data input and output file definition. These are described below under the sub-heading 'Standard User Control Parameters'. Of the remaining parameters, there is a set which is sometimes changed ('Secondary User Control Parameters') and another which is rarely changed ('Tuning Control'). Most of the parameters in this last set are connected to the operation of Cube ME's optimization process, and hence are only of interest when there is evidence of poor performance in achieving convergence.

Standard User Control Parameters


TABLES

Type = Integer(4)
Default = 101, 102, 0, 0
Example: TABLES=101,102,103,104

The input matrix numbers to be used. They are, respectively, the Prior Trip Matrix and its Confidence Levels, and the Cost Matrix and its Confidence Levels.

MATID

Type = Character(60)
Default = Blank
Example: MATID='Estimated Matrix for Study Area'

Matrix identifier. Up to 60 alphanumeric characters can be used to describe the contents of the output Matrix. The identifier should be enclosed in single quotes (').

WIDEND

Type = Integer
Default = 2 if hierarchic numbering, 0 otherwise
Range = 0-3
Example: WIDEND = 0

Indicates the format of the Screenline File:

0 = Cube ME establishes the format automatically. For this to work, all numeric entries in the file must be right justified so that Cube ME can determine the file format unambiguously.
1 = Version Six format. This supports the record format where both link flow and link toll data are stored in the same record type.
2 = Version Seven format. Two types of Screenline record format can be defined at Version Seven: link flow records ('S' in column one) or link toll records ('T' in column one). The former type must be used for Cube ME runs.
3 = Version Seven format with a CNode column inserted to support the input of turning counts in addition to link counts.

If hierarchic numbering is in use (HIERND = T), WIDEND must be set to 2 to use the Version Seven format.

MFORM

Type = Integer
Default = 0
Range = 0-4
Example: MFORM = 0

Indicates the format of the output matrix:

0 = Save the matrix in the same format as the input matrix. This is the default action.
1 = TRIPS
2 = TP+/VOYAGER
3 = TRANPLAN
4 = MINUTP

DEC

Type = Character(1)
Default = Blank
Range = '0' to '9', 'S', 'D', or blank
Example: DEC='4'

Defines the precision with which to store values in the output matrix:

Blank = Uses the same precision as in the input trip matrix. If just a cost matrix is input, a value of '2' is used.
'0' to '9' = Store numbers in the matrix as integers representing values to the specified number of decimal places.
'S' or 'D' = Store numbers as floating point numbers in either Single or Double precision. Double precision gives more accuracy to a greater number of decimal places than Single. These values give the best representation in the output matrix, but will generally produce a bigger output file. This option is only available if the output matrix format is TP+/VOYAGER.

PSETS

Type = Integer(50)
Default = 1
Range = 1 to number of path sets in the input VOYAGER path file
Example: PSETS = 1,3

Applies only when a VOYAGER path file is input. It defines the path sets to apply when building the intercepts for the screenlines. At least one set must be specified, and sets are referenced by their number rather than by their name.

PVOLS

Type = Integer(50)
Default = 1
Range = 0 to number of volumes in the input VOYAGER path file
Example: PVOLS = 1,2

Applies only when a VOYAGER path file is input. It defines the volumes to apply when building the intercepts for the screenlines. If a value of 0 is specified, the volumes will be ignored, and the weighting of alternative routes will be defined solely by the iteration factors. Otherwise, PVOLS is a list of numbers, each 1 or more, representing the selected volumes.

NETID

Type = Character(40)
Default = Blank
Example: NETID='Network with estimated link flows'

Network identifier. Up to 40 alphanumeric characters can be used to describe the contents of the output Network. The identifier should be enclosed in single quotes ('). NETID is only used if reading part trip data (PRTTRP=T).

EFLOW

Type = Integer
Default = 2
Range = 1-20
Example: EFLOW = 4

The number of the Volume Field in the output network into which the total link flows estimated by Cube ME will be written. EFLOW is only used if PRTTRP=T.

ELINEn

Type = Integer
Default = ELINE(n) = 2+n*2 (PT only)
Range = 1-20
Example: ELINE1 = 6

ELINE(n) is the number of the Volume Field in the output network into which the link flows estimated by Cube ME will be written for Line Group n. ELINEn is only used if PRTTRP=T and doing a Public Transport matrix estimation.

NFLOW

Type = Character(4)
Default = 'EFLW'
Example: NFLOW = 'EFLW'

Volume Field identifier. Up to four alphanumeric characters can be used to indicate the contents of the Volume Field specified by EFLOW. The identifier should be enclosed in single quotes ('). NFLOW is only used if reading part trip data (PRTTRP=T).

NLINEn

Type = Character(4)*8
Default = NLINEn = 'ELGn' (PT only)
Example: NLINE1 = 'ELG1'

Volume Field identifier. Up to four alphanumeric characters can be used to indicate the contents of the Volume Field specified by ELINEn. The identifier should be enclosed in single quotes ('). NLINEn is only used if PRTTRP=T and doing a Public Transport matrix estimation.

ZCONF

Type = Integer
Default = 100
Range = 1-10000
Example: ZCONF = 200

Confidence Level for Side Constraints applied to Local Matrices and derived from the estimated District matrix.

Secondary User Control Parameters


The parameters described in this section would only be used to try to reduce the processing time required to achieve convergence. Refer to Computation Times.

MXITER

Type = Integer
Default = 3000
Range = 1-999999
Example: MXITER = 1500

The maximum number of iterations. Cube ME will stop if this number of iterations has been reached without convergence. The Model Parameter and Gradient Search files are written out and can be used to restart Cube ME from the position it was in when it stopped, so that the optimization can be continued. The currently estimated matrix is also output.

ITERH

Type = Integer
Default = Generated by Cube ME
Range = 1-9999
Example: ITERH = 4000

The number of iterations between recalculations of the estimated Hessian Matrix.

UTOL

Type = Real
Default = 0.0001
Range = 0.0-99.0
Example: UTOL = 0.05

The accuracy tolerance in detecting convergence or failure. When the maximum absolute size of the search vector is less than this value, the procedure is deemed to have converged.

IREP

Type = Integer
Default = 3
Range = 1-3
Example: IREP = 2

Reporting level for the Optimization Log file. See Information in the Optimization Log File.

IHTYPE

Type = Integer
Default = 4
Range = 0-4
Example: IHTYPE = 2

This controls the type of optimization process used by Cube ME, as shown in Table 4.1. The differences between values 1 to 4 correspond to differences in the way the initial Hessian matrix, H0, is calculated.

Optimization Process    Value of IHTYPE   Comments
Steepest Descent        0                 'Simple searching'
Quasi-Newton            1, 2, 3           1 = H0 set to unit matrix
                                          2 = H0 read from file (warm start)
                                          3 = H0 computed every iteration
Newton/Hybrid Newton    4                 Hessian calculated regularly, according to setting of ITERH

Table 4.1: Methods of Optimization

Tuning Control
The parameters documented in this section would normally be changed only in response to an error message generated by Cube ME. In the event of this occurring, please contact [email protected] for advice.

MXCALL

Type = Integer
Default = 5000
Range = 1-999999
Example: MXCALL = 10000

The maximum number of function evaluations. (This should be greater than MXITER. At least one function evaluation is required at each iteration, possibly more.)

MXFREE

Type = Integer
Default = 4
Range = 1-10
Example: MXFREE = 7

The number of times a parameter may be freed from its bounds.

UMAX

Type = Real
Default = 1.0
Range = 0.0-1000.0
Example: UMAX = 0.5

The maximum allowed search step. If the maximum absolute value of the search vector (called UNORMX in the Log Report) is greater than this, then the entire search vector is multiplied by a term UMAX/UNORMX so that the new maximum entry is equal to UMAX.
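The UMAX rescaling and the UTOL convergence test described in this chapter can be illustrated with a short Python sketch. This is not Cube ME's code: the function names are hypothetical, the search vector is assumed to be a plain list of numbers, and only the two rules stated in the text are implemented.

```python
def clamp_search_vector(search, umax=1.0):
    """Scale the whole vector by UMAX/UNORMX when its largest entry exceeds UMAX."""
    # UNORMX is the maximum absolute value of the search vector.
    unormx = max((abs(u) for u in search), default=0.0)
    if unormx > umax:
        factor = umax / unormx
        return [u * factor for u in search]
    return list(search)


def has_converged(search, utol=0.0001):
    """Convergence test: the largest absolute entry is below UTOL."""
    return max((abs(u) for u in search), default=0.0) < utol


if __name__ == "__main__":
    step = clamp_search_vector([2.0, -4.0, 1.0], umax=1.0)
    print(step)                               # largest entry now has size UMAX
    print(has_converged([0.00005, -0.00002]))
```

Because every entry is multiplied by the same factor, the direction of the search is preserved and only its length is reduced.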


&OPTION Keywords
Note: Options TRIPM and COSTM work in conjunction with one another.

TRIPM

Type = Logical
Default = True

If TRIPM = T then the input matrix file will contain at least two tables. The first will be the Prior Trip matrix; the second will be the associated Confidence Levels. If COSTM = F then these will be the only two matrices present in the file. TRIPM = F is only allowed if COSTM = T.

COSTM

Type = Logical
Default = False

If COSTM = T and TRIPM = T, then the input matrix file will contain four tables. The first two are as described above; the third will be the Cost Matrix and the fourth will be the associated Confidence Levels. If COSTM = T and TRIPM = F, then the Cost and Confidence Level matrices will be the first and second supplied in the file.

SCRFIL

Type = Logical
Default = True

If SCRFIL = T then an input Screenline file is supplied. See Screenline File.

TRPEND

Type = Logical
Default = True

If TRPEND = T then an input Trip End data file is supplied. See Trip End File.

INTCPT

Type = Logical
Default = False

If INTCPT = T then an input Intercept file is supplied. See Intercept File.

MODPAR

Type = Logical
Default = False

If MODPAR = T then an input Model Parameter file is supplied. See Model Parameter File.

WARMST

Type = Logical
Default = False

If WARMST = T then Gradient Search, Model Parameter, and Intercept files are supplied to warm start the estimation calculation. The input of these files from a previous run of the same model should assist the speed of optimization, but see Approaches to Running Cube ME.

HIERND

Type = Logical
Default = Set automatically from RCP input file; False if no RCP file input.

HIERND = T indicates that a hierarchic node numbering system is in use. This option only needs to be set if no RCP file is input. If there is an RCP input file, the setting of HIERND in the control file will be ignored.

NODLAB

Type = Logical
Default = False

If NODLAB = T, zone labels will be included in the Cube ME reports. A coordinate file containing node labels must be supplied. See Coordinate File.

LNGLAB

Type = Logical
Default = False

Set LNGLAB = T to include long zone labels in the Cube ME reports. Note that NODLAB = T must also be set to use long labels. If NODLAB = T and LNGLAB = F, the short zone labels will be used. A coordinate file containing node labels must be supplied. See Coordinate File.

PRTTRP

Type = Logical
Default = False

If PRTTRP = T then an input network file is supplied which contains Part Trip link flows.

DSTRCT

Type = Logical
Default = False

If DSTRCT = T then a Local Matrix Control file and a District Definition file must be supplied. See Local Matrix Control File and District Definition File.
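As a recap of the option descriptions above, the following Python sketch lists which option-driven input files a given set of &OPTION values implies. It is an illustration only: the function name and the file labels are hypothetical, routing and matrix inputs are deliberately left out, and treating an inconsistent combination as an error is an assumption about how one might check a control file before a run.

```python
# Defaults as documented for the &OPTION keywords (HIERND is set automatically).
DEFAULTS = {
    "TRIPM": True, "COSTM": False, "SCRFIL": True, "TRPEND": True,
    "INTCPT": False, "MODPAR": False, "WARMST": False, "NODLAB": False,
    "LNGLAB": False, "PRTTRP": False, "DSTRCT": False,
}


def required_inputs(options=None):
    """Return the option-driven input files implied by the given settings."""
    opts = dict(DEFAULTS)
    opts.update(options or {})
    if not opts["TRIPM"] and not opts["COSTM"]:
        raise ValueError("TRIPM = F is only allowed if COSTM = T")
    if opts["LNGLAB"] and not opts["NODLAB"]:
        raise ValueError("LNGLAB = T needs NODLAB = T as well")
    rules = [
        ("Screenline", opts["SCRFIL"]),
        ("Trip End", opts["TRPEND"]),
        ("Intercept", opts["INTCPT"] or opts["WARMST"]),
        ("Model Parameter", opts["MODPAR"] or opts["WARMST"]),
        ("Gradient Search", opts["WARMST"]),
        ("Coordinate", opts["NODLAB"]),
        ("Network with Part Trip link flows", opts["PRTTRP"]),
        ("Local Matrix Control", opts["DSTRCT"]),
        ("District Definition", opts["DSTRCT"]),
    ]
    return [name for name, needed in rules if needed]


if __name__ == "__main__":
    print(required_inputs({"TRPEND": False}))
    print(required_inputs({"WARMST": True}))
```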


Program Specific Data


Screenline File
This file is required if SCRFIL = T. The Screenline File supplies link/turn count and confidence level data to Cube ME. Two formats of the file are supported. The original format (indicated by parameter WIDEND=2) supports link counts only. An alternative format (WIDEND=3) has an extra column that allows turning counts to be specified.

Link Count Format


The format of the file containing just link counts is as follows:

Columns   Type        Contents
1         Character   'S' Screenline record identifier
2 - 5     Integer     Screenline number
6 - 14    Integer     Anode of link
15 - 23   Integer     Bnode of link
24 - 33   Real        Link traffic volume count
34 - 40   Integer     Confidence Level. A number between 1 and 10000, but usually in the range 1-100, that expresses the user's confidence in the link traffic volume count. This is used only by Cube ME.
41 - 58   Character   Screenline name, up to 18 characters (optional)
60        Integer     Direction code. For purposes of Matrix Estimation this must be set to 1.
61 - 70   Integer     X coordinate (optional) at which to display the screenline name on the screen
71 - 80   Integer     Y coordinate at which to display the screenline name on the screen

Turning Count Format


The format of the file that supports turning counts is as follows:

Columns   Type        Contents
1         Character   'S' Screenline record identifier
2 - 5     Integer     Screenline number
6 - 14    Integer     Anode of link/turn
15 - 23   Integer     Bnode of link/turn
24 - 32   Integer     Cnode of turn (leave blank for link counts)
33 - 42   Real        Link/Turn traffic volume count
43 - 49   Integer     Confidence Level. A number between 1 and 10000, but usually in the range 1-100, that expresses the user's confidence in the traffic volume count. This is used only by Cube ME.
50 - 67   Character   Screenline name, up to 18 characters (optional)
70        Integer     Direction code. For purposes of Matrix Estimation this must be set to 1.

Notes:

If a screenline contains more than one link/turn, then Cube ME calculates the screenline count as the sum of the counts for each link/turn in the screenline. Also, the screenline confidence level is set as the weighted average of the individual link/turn count confidence levels.

The file can contain a mixture of link and turning counts. For link counts, the Cnode should be left blank.

Comment records, which have an asterisk (*) in column one, may appear anywhere in the file.
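The link count layout above maps directly onto fixed-column slicing. The Python sketch below reads records in the WIDEND=2 format and sums the counts per screenline, as the first note describes. The function names are hypothetical, and the weighted-average confidence level is not computed because the manual does not state the weights used.

```python
from collections import defaultdict


def parse_link_count(line):
    """Parse one link count record; return None for blank or comment records."""
    if not line.strip() or line.startswith("*"):
        return None
    if line[0] != "S":
        raise ValueError("expected an 'S' record in column one")
    line = line.rstrip("\n").ljust(80)
    direction = line[59:60].strip()
    return {
        "screenline": int(line[1:5]),      # columns 2-5
        "anode": int(line[5:14]),          # columns 6-14
        "bnode": int(line[14:23]),         # columns 15-23
        "count": float(line[23:33]),       # columns 24-33
        "confidence": int(line[33:40]),    # columns 34-40
        "name": line[40:58].strip(),       # columns 41-58
        "direction": int(direction) if direction else None,  # column 60
    }


def screenline_totals(lines):
    """Sum the link counts of all records that share a screenline number."""
    totals = defaultdict(float)
    for line in lines:
        record = parse_link_count(line)
        if record is not None:
            totals[record["screenline"]] += record["count"]
    return dict(totals)


if __name__ == "__main__":
    sample = [
        "* illustrative data only",
        "S   1      101      102    1500.0     80Main Rd W-E        1",
        "S   1      103      104     800.0     60Main Rd W-E        1",
    ]
    print(screenline_totals(sample))
```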


Trip End File


This file is required if TRPEND = T. The Trip End file format for Cube ME is as follows:

Columns   Type      Content
1 - 10    Integer   Zone Number
11 - 20   Real      Generations
21 - 40             unused
41 - 50   Real      Attractions
51 - 60   Integer   Confidence Level for Generations
61 - 70   Integer   Confidence Level for Attractions

Comment records, which have an asterisk (*) in column one, may appear anywhere within the file.
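A Trip End record can be written and read back with the column positions in the table above. The following Python sketch is illustrative: the function names are hypothetical, and writing the real fields with one decimal place is an assumption, since the manual specifies only the column ranges.

```python
def format_trip_end(zone, generations, attractions, conf_gen, conf_attr):
    """Build one 70-column Trip End record (columns 21-40 are left blank)."""
    return (f"{zone:>10d}{generations:>10.1f}{'':20}"
            f"{attractions:>10.1f}{conf_gen:>10d}{conf_attr:>10d}")


def parse_trip_end(line):
    """Read one Trip End record; return None for blank or comment records."""
    if not line.strip() or line.startswith("*"):
        return None
    line = line.rstrip("\n").ljust(70)
    return (
        int(line[0:10]),      # columns 1-10: zone number
        float(line[10:20]),   # columns 11-20: generations
        float(line[40:50]),   # columns 41-50: attractions
        int(line[50:60]),     # columns 51-60: confidence for generations
        int(line[60:70]),     # columns 61-70: confidence for attractions
    )


if __name__ == "__main__":
    record = format_trip_end(12, 3450.0, 2980.5, 100, 50)
    print(repr(record))
    print(parse_trip_end(record))
```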


Coordinate File
The input coordinate file must be supplied if option NODLAB has been set to TRUE. The file supplies the correspondence between node numbers and their hierarchic equivalents. The format of the file is summarized below. Items marked '*' must be coded for hierarchic processing.

Columns    Type        Content
*1 - 10    Integer     Node number (sequential)
11 - 20    Integer     X coordinate
21 - 30    Integer     Y coordinate
*31 - 40   Integer     Hierarchic node number
41 - 48    Character   Text for node; short label (optional)
49 - 80    Character   Text for node; long label (optional)

Notes:

Sequential node numbers must be unique.
If hierarchic node numbers are being used then:
    Hierarchic node numbers must be unique;
    Columns 31-40 must be coded on each record.
Text labels should normally be left justified in their respective fields.
Blank records will be ignored.
Records with an asterisk (*) in column 1 will be treated as comment records.
Node coordinates are used by the graphics programs and are therefore optional.


Model Parameter File


This file is required if MODPAR = T. It contains data describing the Model Parameters and their attributes. It would not normally be constructed by the user because, on initiating a run of Cube ME, the Model Parameters take default values, as shown in the table below. However, at the end of the Cube ME run a file is generated containing the newly calculated Model Parameters. This file, edited or unedited, can be re-input to Cube ME to invoke a 'Warm Start'; that is, to continue the estimation process from where the last run finished. The general format of the file is as follows:

Record                         Description
Record One                     Header record defining the number/type of parameters in the file.
The next ZONES records         Values for the A(i) parameters
The next ZONES records         Values for the B(j) parameters
The next SCREENLINE records    Values for the X(k) parameters
The next two records           The two parameters of the Distribution Model
(if COSTM = T)

Where:

ZONES is defined as the number of zones in the matrix.
SCREENLINE is defined as the number of screenlines specified.

Note that comment records, which have an asterisk (*) in column one, can appear anywhere in the file. The format of the individual record types is as follows:

First Record:

Columns     Type
1 - 23      Character
24 - 31     Integer
32 - 39     Integer
40 - 47     Integer
48 - 55     Integer
56 - 63     Integer
64 - 77     Real
78 - 91     Real
92 - 99     Integer
100 - 107   Integer

Remaining Records:

Columns   Type      Content                      Default if Parameter not Defined
1 - 8     Integer   Parameter number
10 - 22   Real      Parameter value              1.0
24 - 36   Real      Lower bound for parameter    0.1 E-6
38 - 50   Real      Upper bound for parameter    1.0 E10
52 - 64   Real      Scale factor for parameter   1.0
65 - 89             Reserved for Cube ME

If the file is not supplied by the user then it is created by Cube ME and the default values shown above are used for each of the Model Parameters. If the second, third and fourth fields all have the same value then the parameter is deemed to be 'fixed' at this value. It is a requirement of Cube ME that:

At least one parameter must be free; otherwise a fatal error is reported and the program will stop.
At least one parameter must be fixed. If this is not done by the user then Cube ME will fix A(1).
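The 'fixed' rule and the first requirement above can be checked mechanically. The Python sketch below reads a parameter record using the column positions given for the remaining records and reports which parameters are fixed. The class and function names are hypothetical, and the default lower bound "0.1 E-6" is read here as 0.1e-6, which is an interpretation of the table rather than something the manual spells out.

```python
from dataclasses import dataclass


@dataclass
class ModelParameter:
    number: int
    value: float = 1.0
    lower: float = 0.1e-6   # assumed reading of the documented default "0.1 E-6"
    upper: float = 1.0e10
    scale: float = 1.0

    @property
    def fixed(self):
        # Fixed when value, lower bound and upper bound all coincide.
        return self.value == self.lower == self.upper


def parse_parameter(line):
    """Read one parameter record (columns 1-64; later columns are ignored)."""
    line = line.rstrip("\n").ljust(64)
    return ModelParameter(
        number=int(line[0:8]),      # columns 1-8
        value=float(line[9:22]),    # columns 10-22
        lower=float(line[23:36]),   # columns 24-36
        upper=float(line[37:50]),   # columns 38-50
        scale=float(line[51:64]),   # columns 52-64
    )


def fixed_parameter_numbers(params):
    """Return the numbers of fixed parameters; reject a set with none free."""
    if all(p.fixed for p in params):
        raise ValueError("at least one parameter must be free")
    return [p.number for p in params if p.fixed]


if __name__ == "__main__":
    line = f"{3:>8d} {1.0:>13.4f} {1.0:>13.4f} {1.0:>13.4f} {1.0:>13.4f}"
    print(parse_parameter(line).fixed)
    print(fixed_parameter_numbers([ModelParameter(1), parse_parameter(line)]))
```

The second requirement (at least one fixed parameter) is not enforced here because, according to the text, Cube ME fixes A(1) itself when the user has fixed nothing.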

A file of identical format is created at the end of a Cube ME run, containing the revised parameter values. This allows Cube ME to be restarted from where the last run finished if required, or the file to be used as a basis for fixing parameter values. Note that Cube ME adds up to three extra columns at the end of each record for its own internal use. The information put there should not be edited by the user.


Local Matrix Control File


This file is required if DSTRCT = T. The format of the file is as follows:

Columns   Type      Content
1 - 10    Integer   Origin District
11 - 20   Integer   Destination District

Comment records, which have an asterisk (*) in column one, may appear anywhere in the file.


District Definition File


This file is required if DSTRCT = T or WARMST = T. The user may affect the operation of the estimation according to the grouping of Zones into Origin and Destination Districts. The District Definition file which is input to Cube ME is a direct access file, and so it is not amenable to direct alteration by the user.


Intercept File
This file is required if INTCPT = T or WARMST = T. This is a binary file, output by Cube ME and VOYAGER HIGHWAY, which stores information on routings and screenlines in a concise format. Once established, it may be re-input to Cube ME to save (substantial) processing time when Cube ME is (re-)estimating for data where neither the routings nor the screenline location definitions have been altered. This file cannot be edited by the user.

Note that there is also a text file version of the Intercept file that can be output. Its purpose is for information only; it is not intended for subsequent input to Cube ME or any other program. The file is written if it is named, and is generated from either the input or output binary Intercept file, depending upon which is used. For each screenline it shows:

- The number of intercepting I-J pairs.
- A sub-header under the screenline for each Origin I that has routes that intercept the screenline.
- Under each origin, a list of pairs of numbers. The first number of the pair represents the Destination zone J. The second number of the pair represents the percentage of traffic travelling from the Origin to the Destination that routes through the screenline.


Gradient Search File


This file is required if WARMST = T. This is a binary file output by Cube ME which is re-read by Cube ME when warm starting a run (WARMST = T). It contains information used by Cube ME's optimizer and cannot be edited by the user.


Notes on Program Use


Approaches to Running Cube ME
There are several approaches that the user may adopt towards running Cube ME, which vary according to the information available to the user about the estimation of any particular matrix. The approaches may be categorized as:

Initial Estimation
Only basic input data is required, as contained in the routes (RCP for TRIPS, PATH or ICP for VOYAGER users), Matrix, Trip End, and Screenline files (as well as, optionally, a network file with Part Trip data). Program control parameters are allowed to take default values. Occasionally, an input Model Parameter file may be required to influence the model form by fixing some parameter values.

Re-Estimation with Altered Data: 'Warm Starting'
In this case the Model Parameter, Gradient Search, and Intercept files from the last (or initial) estimation run are input, in addition to the user input data. Warm starting is only valid when the 'structure' of the estimation is unaltered; that is, the number of data items, the screenline locations, and the routings should not be altered. However, data values and Confidence Levels may be changed. Warm starting is useful either to split an estimation into more than one run of Cube ME, for the sake of convenience, or to undertake sensitivity analysis on the effects of altered data or Confidence Level values. When a run of Cube ME is split for convenience, and no input data is changed, it is efficient to set the parameter UMAX to the value reported by Cube ME at the final iteration of the previous run.

Constrained Model Parameters


Model Parameters may be constrained to:
reflect a user-specified value, e.g. of the two Distribution Model parameters;
partition an estimation into sub-problems to be accommodated within computing resources;
alter the nature of the trip estimation equation.

If Model Parameters are fixed or freed from run to run then the Gradient Search file from one run should not be used in the next. Note that Cube ME may itself constrain Model Parameters. This occurs when the estimated Hessian Matrix is found to be, mathematically speaking, 'non-positive definite', which arises when one or more Model Parameters are 'degenerate'. That is, a Model Parameter is not contributing independently of another Model Parameter. In these circumstances, Cube ME gives a message reporting how many such Model Parameters it is constraining, which is of the form:

ME (I): XXX MODEL PARAMETERS ARE NOT CONTRIBUTING TO THE ESTIMATION

The constrained Model Parameters are listed in Cube ME's log file. It is not necessarily a cause for concern when Cube ME constrains Model Parameters in this way, although it is a signal that not all data is of value to the estimation because it is strongly correlated with other data. For instance, link flow counts on adjacent links of a main road may refer to substantially the same trips, and hence one count is (mathematically) redundant. It is thus most frequently the X(k) Model Parameters that are constrained by Cube ME, although other Model Parameters may be constrained too.

Controlling the Optimization Process


Normally, program control parameters should be allowed to take their default values. However, computation times may be improved by judicious setting of the control parameters. This is discussed further in Computation Times in this manual, and in Tuning Estimation Performance in the 'Introduction to the Matrix Estimation Programs' manual.


Selection of Model Form


Cube ME provides the capability for the user to control the structure of the solution and how it is achieved. This section describes these possibilities. However, Cube ME is usually run with the full, default Model Form. It may be shown that in this form the X(k) parameter is, strictly, redundant, although it is of value in providing extra degrees of freedom by which Cube ME can handle the effects of errors and inconsistencies in the input data. The number of possible parameters in a model is:

two for each zone (that is, the a(i) and b(j));
one for each screenline (the X(k)); and
two more if any cost data is to be used (the two Distribution Model parameters).
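The parameter count above is simple arithmetic, and it also gives the number of records in a Model Parameter file (one header record plus one record per parameter, following the record layout in the Model Parameter File section). A minimal Python sketch, with hypothetical function names:

```python
def parameter_count(zones, screenlines, use_cost):
    """Two per zone, one per screenline, plus two if cost data is used."""
    return 2 * zones + screenlines + (2 if use_cost else 0)


def parameter_file_records(zones, screenlines, use_cost):
    """Header record plus one record for each Model Parameter."""
    return 1 + parameter_count(zones, screenlines, use_cost)


if __name__ == "__main__":
    # Illustrative sizes only: 100 zones, 8 screenlines, cost data supplied.
    print(parameter_count(100, 8, True))
    print(parameter_file_records(100, 8, True))
```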

The Model Parameter file contains a value for each parameter, its upper and lower bounds, and a scaling factor. The scaling factor is used only to assist the optimization process in ensuring that maximum accuracy is obtained. It should be set equal to the expected value of the parameter in the final solution, although matching the order of magnitude is only necessary if there are difficulties in ensuring convergence. If no such difficulties are apparent then the scaling factor can just be set to 1.0. The lower and upper bounds for the parameters allow the user to specify the degrees of freedom which are permitted. In particular, if a Model Parameter is set to 1.0 and its lower and upper bounds are also set to this value, then a number of standard forms for the Matrix Estimation process can be achieved. For example:

i. Setting all a(i), b(j) and X(k) equal to a fixed value (e.g. 1.0), together with their bounds, reduces the problem to a Gravity model driven only by the cost data (i.e. only the two Distribution Model parameters are allowed to vary). (Note that varying values of a(i) and b(j) can be used to scale the numbers of trips per zone.)
ii. Setting the two Distribution Model parameters and their bounds to a particular value allows the estimation process to use cost data which has been previously calibrated.
iii. Setting a(i), b(j) and the two Distribution Model parameters to a fixed value and allowing X(k) to be free provides a 'link constraint model'.
iv. Setting X(k) and the two Distribution Model parameters to a fixed value gives a growth factor model.
v. Setting the two Distribution Model parameters to a fixed value allows a growth factor link model (note that these two parameters are only defined if cost data is supplied).
vi. Setting a(i) and b(j) to a fixed value defines a link gravity model.

In some of these cases input data although defined and requested by the program is not used by the estimation process. The data that is used in these special cases is summarized in Table 6.1.


Data/Model Type

Growth Factor X X

Gravity

Link Constraint X

Growth Factor Link X X

Link Gravity

Trip Matrix & Confidences Trip End & Confidences routing information (RCP) Trip Costs

X X

X X

Table 6.1: Data Used in Different Reduced Model Forms

It should be observed that there are no specific Model Parameters associated with Part Trip data, so this form of data is not relevant to discussions of model forms.


Information in the Optimization Log File


The levels of report, determined by the setting of the IREP parameter, are as follows:

IREP=1 A report is only produced at the end of the run. This shows:

i. The reason that the optimization has been halted.
ii. The value of the objective function (the maximum likelihood).
iii. The current step size.
iv. The minimum tolerance step size.
v. In addition, a number of important variables defining the size of the problem and the parameters input to Cube ME are also displayed.

IREP=2 A report is produced as for IREP=1, and also a report at each iteration. This shows:

i. The iteration number and whether or not progress was made at this iteration.
ii. The number of evaluations made so far in this run. This is the number of times that a matrix and associated trip and screenline data have been calculated together with the likelihood function.
iii. The current step size.
iv. The step size tolerance.

IREP=3 A report is produced as for IREP=2, and also a report each time the Model is evaluated to calculate the effects of a particular choice of Model Parameter. This shows:

i. The evaluation number.
ii. The step size multiplier (used if no progress is made in the first evaluation to reduce the step size, ALPHA).
iii. The step size (STEP).
iv. The objective function at the last iteration and at the point at which the function evaluation is made (FTRIAL and FBEST).
v. A measure of the gradient at the last iteration and at the point at which the function evaluation is made.
vi. Other internal variables used only for intermediate calculations (FGOLD, FJJS).

In addition, reports may be output to the Execution Log file if the Gradient Search matrix is found to be unstable and has to be re-initialized.


Computation Times
Program control parameters should usually take their default values. However, computation times may be improved by setting the parameters shown in Table 6.2.

Control Parameter   Comment
MXITER              MXITER should only be used to terminate an estimation prematurely when there is some evidence that it is safe to do so, i.e., after Cube ME has been initially allowed to reach convergence.
ITERH               There is a trade-off between the reduction in the number of iterations and the number of times the Hessian matrix must be re-calculated. The default value of ITERH represents an average best value, but it is worth experimenting with the value of ITERH for different types of estimation problem. In some cases, lowering the value of ITERH may guide the optimiser to a solution which it otherwise could not find. In other cases, estimating the Hessian too frequently will add to the run time, sometimes significantly.
UTOL                Examination of the Log file, and the screen display, will show how the convergence indicator, UNORM, is approaching the target value set by UTOL. Larger values of UTOL increase the risk of Cube ME terminating significantly away from the most likely value of the estimated matrix for the set of input data, while lower values of UTOL imply lower Standard Errors for the Model Parameters.

Table 6.2 Parameters for Influencing Computation Times


Examples
Estimation with Prior Trip and Count Data Only
The following control data would be appropriate for an initial estimation when the only data available for updating a matrix is from count sites.

Column 1...5...10...15...20...25...30...35..40..45..50..55..60
Estimate with Old Matrix and Count Data
&FILES IMAT1='PRIOR.MAT', IDAT4='SCRL.DAT',
IDAF1='ROUTES.RCP', OPRN='MVESTM.PRN',
OMAT1='ESTM.MAT', ODAT1='PARM.DAT', ODAT2='GRAD.GDS',
ODAT3='LOG.DAT', ODAT4='INTCPT.ICP'
&END
&PARAM MATID='Estimated Matrix'
&END
&OPTION TRPEND=F
&END


Estimation with Prior Trip, Count, and Trip End Data


This is a similar run to 7.1, but additionally Trip End data is available. Also, short zone labels are included in the reports.

Column 1...5...10...15...20...25...30...35..40..45..50..55..60
Estimate with an Old Matrix, Counts, and Trip End Data
&FILES IMAT1='PRIOR.MAT', IDAT1='TEND.DAT',
IDAT4='SCRL.DAT', IDAT7='COORD.DAT'
IDAF1='ROUTES.RCP', OPRN='MVESTM.PRN',
OMAT1='ESTM.MAT', ODAT1='PARM.DAT', ODAT2='GRAD.GDS',
ODAT3='LOG.DAT', ODAT4='INTCPT.ICP'
&END
&PARAM MATID='Estimated Matrix - including Trip End Data',
&END
&OPTION NODLAB=T, LNGLAB=F
&END


Estimation with 'Warm Start' and Cost Data


The following control data would be suitable if, say, some confidence levels had been altered in the data input files, and where the data included both trip and cost matrices, as well as Trip End and count data. Long zone labels are to be included in the reports.

Column 1...5...10...15...20...25...30...35..40..45..50..55..60
Re-Estimation with altered Confidence Levels
&FILES IMAT1='PRIOR.MAT', IDAT1='TEND.DAT',
IDAT2='PARM.DAT', IDAT3='GRAD.GDS',
IDAT4='SCRL.DAT', IDAT7='COORD.DAT'
IDAT5='INTCPT.ICP', IDAF1='ROUTES.RCP',
OPRN='MVESTM.PRN', OMAT1='ESTM2.MAT',
ODAT1='PARM2.DAT', ODAT2='GRAD2.GDS', ODAT3='LOG.DAT'
&END
&PARAM TABLES=101, 102, 103, 104
MATID='Re-Estimated Matrix'
&END
&OPTION TRIPM=T, COSTM=T, NODLAB=T, LNGLAB=T
WARMST=T
&END


Estimation with Highways Part Trip Data


The following control data would be suitable where the data included Part Trip data as well as a trip matrix, Trip Ends, and count data.

Column 1...5...10...15...20...25...30...35..40..45..50..55..60
Estimation including Part Trip data
&FILES IMAT1='PRIOR.MAT', IDAT1='TEND.DAT',
IDAT4='SCRL.DAT', IDAF1='ROUTES.RCP',
INET1='PTRIPS.NET', OPRN='MVESTM.PRN',
OMAT1='ESTM.MAT', ODAT1='PARM2.DAT', ODAT2='GRAD2.GDS',
ODAT3='LOG.DAT', ODAT4='INTCPT.ICP', ONET1='ESTMPTRP.NET'
&END
&PARAM MATID='Estimated Matrix using Part Trip data',
NETID='Estimated Flows (with Part Trip Flows)',
EFLOW=7, NFLOW='ESTM'
&END
&OPTION PRTTRP=T
&END


Estimation with Public Transport Part Trip Data


The following control data would be suitable for estimating a public transport matrix where the data included Part Trip data, organized into three Line Groups, as well as a trip matrix, Trip Ends, and count data.

Column 1...5...10...15...20...25...30...35..40..45..50..55..60
Estimation including Part Trip data by line group
&FILES IMAT1='PRIOR.MAT', IDAT1='TEND.DAT',
IDAT4='SCRL.DAT', IDAF1='ROUTES.RCP', IDAF2='LINES.PTL'
INET1='PTRIPS.NET', OPRN='MVESTM.PRN'
OMAT1='ESTM.MAT', ODAT1='PARM.DAT', ODAT2='GRAD.GDS',
ODAT3='LOG.DAT', ODAT4='INTCPT.ICP'
ONET1='ESTMPTRP.NET'
&END
&PARAM MATID='Estimated Matrix using Part Trip data',
NETID='Estimated Flows (Prt Trp Flows by Line Group)'
ELINE1=4, ELINE4=6, ELINE5=8,
NLINE1='EXPR', NLINE4='LOCL', NLINE5='AIRP'
&END
&OPTION PRTTRP=T
&END


Hierarchic Estimation
The following control data would be suitable for estimating a very large public transport matrix where the data included Part Trip data as well as a trip matrix, Trip Ends, and count data.

Column 1...5...10...15...20...25...30...35..40..45..50..55..60
Title: Large PT Estimation using Part Trip Total Link Flow
&FILES
IMAT1='PRIOR.MAT', IDAT1='TEND.DAT', IDAT4='SCRL.DAT',
IDAF1='ROUTES.RCP', IDAF2='LINES.PTL', IDAF3='DISTRICT.DDF',
IDAT6='LMCTL.DAT', OPRN='MVESTM.PRN', OMAT1='ESTM.MAT',
ODAT1='PARM.DAT', ODAT2='GRAD.GDS', ODAT3='LOG.DAT',
ODAT4='INTCPT.ICP'
ONET1='ESTMPTRP.NET'
&END
&PARAM
MATID='Estimated Matrix using Part Trip data',
NETID='Estimated Flows (total Part Trip flow)',
EFLOW=2, NFLOW='ETOT'
&END
&OPTION
PRTTRP=T, DSTRCT=T
&END


Example of Screenline Volumes Report


REPORTING OBSERVED/ESTIMATED SCREEN LINE COUNTS

SCREENLINE          CONFIDENCE  OBSERVED  ESTIMATED  ESTM-OBSV  OBSV(%)      NO
NO & NAME                                                                OF ODs
1 A'shot Rd W-E           80.0   11677.0    11335.4     -341.6    -2.9%     244
2 A'shot Rd E-W           80.0   11677.0    11734.9       57.9     0.5%     244
3 A3-Hogs Back S-N       200.0   27947.0    26672.2    -1274.8    -4.6%     160
4 A3-Hogs Back N-S       200.0   25504.0    25160.5     -343.5    -1.3%     160
5 Onslow St S-N           80.0   18981.0    17479.2    -1501.8    -7.9%     383
6 Onslow St N-S           80.0   18809.0    18285.2     -523.8    -2.8%     687
7 Town Centre E-W         80.0   16285.0    16556.5      271.5     1.7%     904
8 Town Centre W-E         80.0   22670.0    22494.5     -175.5    -0.8%     870
<continued>
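The ESTM-OBSV and OBSV(%) columns of the report can be reproduced from the OBSERVED and ESTIMATED columns. The Python sketch below is a minimal illustration, assuming the percentage is the difference divided by the observed count. That formula is an inference which agrees with the eight rows shown; it is not a documented definition. The name screenline_diff is hypothetical.

```python
def screenline_diff(observed, estimated):
    """Return (ESTM-OBSV, percentage of OBSV), each rounded to one decimal.

    Assumed formula: percentage = 100 * (estimated - observed) / observed.
    """
    diff = estimated - observed
    pct = 100.0 * diff / observed
    return round(diff, 1), round(pct, 1)


# (name, observed, estimated) copied from the report above.
ROWS = [
    ("A'shot Rd W-E", 11677.0, 11335.4),
    ("A3-Hogs Back S-N", 27947.0, 26672.2),
    ("Town Centre W-E", 22670.0, 22494.5),
]

if __name__ == "__main__":
    for name, observed, estimated in ROWS:
        diff, pct = screenline_diff(observed, estimated)
        print(f"{name:<18} {diff:>9.1f} {pct:>6.1f}%")
```

A check of this kind can help to spot screenlines where the estimated flow departs from the count by more than expected for the confidence level given.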


