Xlstat® Tip Sheet For Business Statistics - Cengage Learning

Download as pdf or txt
Download as pdf or txt
You are on page 1of 30

XLSTAT® TIP SHEET FOR BUSINESS STATISTICS – CENGAGE LEARNING

INTRODUCTION

XLSTAT makes accessible to anyone a powerful, complete and user-friendly data analysis and statistical solution. Accessibility to the
XLSTAT interface totally relies on Microsoft Excel, whether for inputting the data or for displaying the results. The power of XLSTAT is
a result of all computations having been developed with the C++ programming language. Users’ suggestions have led to software
improvements that offer a complete solution to a well-adapted variety of requirements. The user-friendly interface facilitates the
use of some statistical methods that might require hours of training with other software.

More detail about the statistical processes described in this card can be found in the Help option available on the XLSTAT ribbon.

COMMON DIALOG BOX TOOLS

Click to reset the dialog box to default options.

Click to delete existing data selections of the dialog


box.

The blue arrow indicates that data are displayed in


columns; change the arrow to red to indicate that data
are displayed by rows.

You may choose to locate the generated subsample


beginning in a cell in the existing worksheet (Range), in
a new worksheet within the current workbook (Sheet),
or in a new workbook (Workbook).

DATA SAMPLING

Use the Data Sampling tool found in the Preparing data dropdown object to generate a subsample of observations from a dataset.

1
Select the data in the Excel worksheet, choose the sampling method, and enter the sample size to be generated. You may choose to
locate the generated subsample beginning in a cell in the existing worksheet, in a new worksheet within the current workbook, or in
a new workbook. The variable labels checkbox is selected if the first row of the data contains a data label (header). Default settings
indicate that a report header will be displayed and that the sample output data will automatically be permuted. Uncheck these
options if no report header is required and if you want to display the data in the order that it was generated.

Sample report header associated with using the Random without replacement sampling method to generate a sample of 500 from

the given dataset.

DESCRIPTIVE STATISTI CS

The Descriptive Statistics tool (found in the Describing data tab) is used to calculate descriptive statistics and display univariate plots
for quantitative and/or qualitative variables.

GENERAL TAB
In the General tab, sample data is selected and the report output location is specified.

2
OPTIONS TAB
The Options tab provides options for descriptive statistics, charts, normalization and scaling of data, and specification of a
confidence interval.

OUTPUTS TAB
The Outputs tab allows you to choose the specific options you want to calculate for quantitative and/or qualitative data.

3
CHARTS(1) TAB
The Charts(1) tab specifies chart types for quantitative data including box plots, scatter grams, stem-and-leaf plots, P-P plots, and Q-
Q charts.

CHARTS(2) TAB
The Charts(2) tab provides charting options for qualitative data including bar charts, pie charts, doughnuts, stacked bars, multiple
bars. In this tab you may choose to plot frequencies or relative frequencies of the categories. This tab becomes available when the
Qualitative data option is selected in the General tab.

4
HISTOGRAMS

Histograms – Use this tool (found in the Visualizing and Describing data tabs) to create a histogram from a sample of continuous or
discrete quantitative data. Data and data type are selected in the General tab.

OPTIONS TAB
Use the Options tab to define the intervals for the histogram. These intervals may be defined by Number:, Range:, or User defined:.
Activate the Minimum option to specify the lower value of the first interval.

MISSING DATA TAB


The Missing data tab allows you to remove missing data or estimate missing data using the mean of the sample.

OUTPUTS TAB
The Outputs tab allows you to display the descriptive statistics of the samples (default).

5
CHARTS TAB
The Charts tab allows you to activate the display of the histogram. The histogram may be displayed as a bar or continuous line. The
cumulative histograms option provides a display of cumulated histograms of the sample(s). The Ordinate of the histograms:
provides the quantity to be used in the histogram as density, frequency, or relative frequency.

6
SCATTERPLOTS

Use this tool (found in the Visualizing Data tab) to create 2- or 3–dimensional plots (represented by the size of the point).

GENERAL TAB
In the General tab, specify the data along the X-axis in the X: field and the data along the Y-axis in the Y: field. The Variable labels
checkbox frequencies for each point on the chart, and a display of confidence ellipses.

OPTIONS TAB
The Options tab displays by default a legend for the histogram. Additional options provide for a Matrix of plots, display of
Frequencies for each point on the chart, and a display of Confidence ellipses.

7
LINEAR REGRESSION

Linear regression is the most frequently used statistical method. The principle of linear regression is to model a quantitative
dependent variable Y through a linear combination of p quantitative explanatory variables, X 1, X2, …, Xp. While a distinction is usually
made between simple regression (with only one explanatory variable) and multiple regression (several explanatory variables), the
overall concept and calculation methods are identical.

Use this tool (found in the Modeling Data tab) to create a simple or multiple linear regression model for explanation or prediction.

GENERAL TAB
In the Y / Dependent variables: box select the response variable(s) you want to model. If several variables have been selected,
XLSTAT carries out calculations for each of the variables separately. In the X / Explanatory variables: box select the quantitative
explanatory variables in the Excel worksheet. The data selected must be of type numeric. Activate the Qualitative: data option and
select qualitative variables to perform an ANCOVA analysis. Activate the Variable labels option if the first row of the data selections
(dependent and explanatory variables, weights, observations labels) includes a header.

8
OPTIONS TAB
Activate the Fixed Intercept: option to set the regression constant to a fixed value (default 0).The Tolerance: option (default 0.0001)
prevents the OLS regression algorithm from considering variables which are constant or too correlated with other variables in the
model. The Interactions/Level: option allows you to include interactions in the model. Select a value between 1 and 4. Use the
Confidence interval (%): option to enter the percentage range of the confidence interval (default .95). Activate the Model selection:
option if you want to indicate one of four available selection methods: best model regression, stepwise regression, forward, and
backward regression.

VALIDATION TAB
In the Validation tab, activate the Validation option if you want to use a sub-sample of data to validate the model. Define the
method for obtaining the validation set as random, N last rows, N first rows, or Group variable.

9
PREDICTION TAB
Activate the Prediction option in the Prediction tab to select data to use in prediction model. The data for Quantitative or Qualitative
variables must not include variable labels.

MISSING DATA TAB


Activate the Remove the observations: option to remove the observations with missing data. The observations may be removed for
each Y separately or across all Ys. Activate the Estimate missing data: option to estimate missing data before starting computations.
The missing data may be estimated as the Mean or mode or Nearest neighbor.

OUTPUTS TAB
In the General sub-tab of the Outputs tab, defaults output is selected for Descriptive statistics, Correlations, Analysis of variance,
Standardized coefficients, Predictions and residuals. In the Contrasts sub-tab of the Outputs tab, activate the Compute contrasts
10
option to compute contrasts, then select the contrasts table. There must be one column per contrast and one row for each
coefficient of the model.

CHARTS TAB
In the Charts tab, Regression charts, Standardized coefficients, Predictions and residuals, and Confidence intervals are displayed by
default. De-select any option to remove its display.

ANALYSIS OF VARIANCE (ANOVA)

Use this tool to carry out ANOVA (ANalysis Of VAriance) of one or more balanced or unbalanced factors.

ANOVA uses the same conceptual framework as linear regression. The presented dialog box options will be much the same. The
main difference is that the explanatory variables are qualitative. Thus, required data organization for ANOVA in XLSTAT differs from
data normally available for datasets provided with most texts. Consider the following data from National Computer Products, Inc. A
11
quality awareness examination was given to employees from three plants. Managers want to test the hypothesis that the mean
examination score is the same for all three plants. You might receive (or expect to receive) the data in the following format:

XLSTAT expects that the data is organized as found in the Regression format where the quantitative data are in a single column. This
data is delineated by city by qualitative data in an adjacent column as shown here:

GENERAL TAB
In the General tab, select the Quantitative: data you want to model as the Y/Dependent variables (data from column F above). In
the X/Explanatory variables section choose Qualitative: explanatory variables (the factors from column E in the above example).
Note that the data as presented does not include Variable labels and the output will be produced in a new worksheet.

12
OPTIONS TAB
Activate the Fixed Intercept: option to set the regression constant to a fixed value (default 0). The Tolerance: option (default 0.0001)
prevents the OLS regression algorithm from considering variables which are constant or too correlated with other variables in the
model. The Interactions/Level: option allows you to include interactions in the model. Select a value between 1 and 4. Use the
Confidence interval (%): option to enter the percentage range of the confidence interval (default .95).

A number of Constraints: are available as a1 = 0, an = 0 (default), Sum (ai) = 0, and Sum (ni.ai) = 0. These options are fully described
in the Description section of the Help system in XLSTAT.

Activate the Random effects option to include random factors in the model displayed as an expected mean squares table.

VALIDATION TAB
Activate this option to use a sub-sample of the data to validate the model. Define how to obtain the observations as random, N last
rows, N first rows, or a group variable which is a binary with the 1s option identifying the observations to use for the validation.

13
PREDICTION TAB
Activate the Prediction option in the Prediction tab to select data to use in prediction model. The data for Quantitative: or
Qualitative: variables must not include variable labels.

CORRELATION TESTS

Use this tool (in the Correlation/Association tests tab) to compute the correlation coefficients of Pearson, Spearman or Kendall,
between two or more variables, and to determine if the correlations are significant or not. Several visualizations of the correlation
matrices are proposed. The Pearson correlation coefficient corresponds to the classical linear correlation coefficient for continuous
data. Its value ranges from -1 to 1, and it measures the degree of linear correlation between two variables.

GENERAL TAB
Select a table in Excel of N observations (rows) described by P variables (columns)in the Observations/variables table: option.
Activate the Weights: option as a number greater than or equal to 0 if the observations are weighted. Check the Subsamples: option
14
to select a column showing the names or indexes of the subsamples for each of the observations. All computations are then
performed.

MISSING DATA TAB


Use the Missing data tab to determine how missing data should be treated in calculations: Do not accept missing data, Remove the
observations, Pairwise deletion or Estimate missing data using the mean or mode or nearest neighbor.

OUTPUTS TAB
In the Output tab, default settings include descriptive statistics, correlations, p-values, and coefficients of determination. Choose the
Sort the variables option to provide the data sorted alphabetically by variable name.

15
CHARTS TAB
Use the charts tab to visualize the correlation matrix as a set of Correlation maps or Scatter plots. Correlation maps may be provided
in blue-red, black and white, or patterns scales to represent positive and negative correlations. Present scatter plots as histogram or
Q-Q matrix plots or confidence ellipses, for a x% confidence interval for a bivariate normal distribution. Additionally colors by
correlation may be added.

PARAMETRIC TESTS

Use this tool (in the Parametric tests tab) to conduct one- and two-sample t and z tests, one-and two-sample variance tests, z-tests
for one and two proportions, and the multinomial goodness of fit test. Tests are said to be parametric since their use requires the
assumption that the samples are distributed normally. Additionally, it is also assumed that observations are independent and
identically distributed.

ONE-SAMPLE T-TEST AND Z-TEST

16
Use this tool to compare the mean of a normally-distributed sample with a given value. Two parametric tests are possible. Student's
t test if the true variance of the population from which the sample has been extracted is not known; the variance of sample s 2 is
used as the variance estimator. Use the z test if the true variance s² of the population is known. Three types of test are possible
depending on the alternative hypothesis chosen:

Two-tailed test Ha: u ≠ u0

Left one-tailed test Ha: u < u0

Right one-tailed test Ha: u > u0

GENERAL TAB
Select the data in the Excel worksheet. Choose the Data format: as One column/row per sample which considers each column or row
as a sample and allows for testing hypotheses on several samples at the same time OR One sample: to consider all values as a single
sample. Activate the Column/row labels option if the selected data contain labels. Activate the z Test or Student’s t Test option
depending on the type of test to be conducted.

OPTIONS TAB
Choose the Alternative hypothesis: to be used for the test. Enter the value of the Theoretical mean: with which the mean of the
sample is to be compared, and the Significance level (%): for the tests (default value: 5%). If the z test option has been selected,
indicate whether the Variance for the z-test: should be Estimated using samples or User defined: which requires a value to be
entered for the variance.

17
MISSING DATA TAB
Activate the desired option: Do not accept missing data OR Remove the observations. If the missing data option is activated, XLSTAT
will discontinue calculations if missing values are detected. The remove observations option will remove observations with missing
data and continue processing.

OUTPUTS TAB
The Outputs tab allows you to display the descriptive statistics of the samples (default).

18
TWO-SAMPLE T-TEST AND Z-TEST

Use this tool to compare the means of two normally distributed independent or paired samples . A distinction is made between
independent samples (for example a comparison of annual sales by shop between two regions for a chain of supermarkets), or
paired samples (for example if comparing the annual sales within the same region over two years). When comparing the means of 2
independent samples, use Student's t test if the true variance of the populations from which the samples are extracted is not known;
the z test if the true variance s² of the population is known. When the Student’s t Test is used a decision must be made about the
equality/inequality of variances. When comparing the means of two paired samples, sample sizes of the paired samples must be the
same size. Where values are missing from certain observations, either the observation is removed from both samples or the missing
values are estimated.

Three types of test are possible depending on the alternative hypothesis chosen

Two-tailed test Ha: u1 – u2 ≠ D

Left one-tailed test Ha: u1 – u2 < D

Right one-tailed test Ha: u1 – u2 > D

GENERAL TAB
Sample 1: If the format of the selected data is "one column per variable", select the data for the various samples in the Excel
worksheet. If the format of the selected data is "one column per sample" or "paired samples", select a column of data corresponding
to the first sample.

Sample 2: If the format of the selected data is "one column per variable", select the data identifying the two samples to which the
selected data values correspond. If the format of the selected data is "one column per sample" or "paired samples", select a column
of data corresponding to the second sample.

Choose the Data format: as One column/row per sample, One column/row per variable, or Paired samples. Activate the Column/row
labels option if the first row/column of the selected data contain labels. Activate the z-test or Student’s t test to indicate the type of
test to execute.

19
OPTIONS TAB
Choose the Alternative hypothesis: to be used for the test. Enter the value of the supposed difference between the samples as the
Hypothesized difference (D): Enter the significance level for the tests (default value: 5%) as the Significance level (%):. A Weights:
option is available if the data format is “One column/row per variable” or if the data are paired. Check this option if the observations
are weighted.

Where a z test has been requested, the value of the known variance of the populations, or, for a test on paired samples, the variance
of the difference must be entered. Choose the Estimated using samples option to estimate the variance of the population from the
sample data (leads to a t test). Choose the User defined: option to enter the values of the known variances of the populations. For t-
test samples variances, activate Assume equality to consider sample variances equal, Cochran-Cox to calculate the p-value using the
Cochran and Cox method where the variances are assumed to be unequal, and Use an F test to use Fisher’s F test to determine
whether the variances of both samples can be considered to be equal or not.

MISSING DATA
In the Missing data tab, activate the appropriate option to not accept missing data, remove the observations, or ignore missing data
(default).

20
OUTPUTS TAB
The Outputs tab allows you to display the descriptive statistics of the samples (default).

CHARTS TAB
Activate the Dominance diagram: option to display a dominance diagram in order to make a visual comparison of the samples.

21
ONE-SAMPLE VARIANCE TEST
Use this tool to compare the variance of a normally-distributed sample with a given value. The sample variance, s2, follows a scale
chi-squared distribution with n-1 degrees of freedom. Three types of test are possible depending on the alternative hypothesis
chosen:

For the two-tailed test, the alternative hypothesis is Ha: σ2 ≠ σ02

In the left-one-tailed test, the alternative hypothesis is Ha: σ2 < σ02

In the right one-tailed test, the alternative hypothesis is Ha: σ2 > σ02

GENERAL TAB
Select the Data: in the Excel worksheet. Activate the Data format: option to select one column per sample when multiple samples
are available in the dataset. Activate the Column/row labels option if the selected data contain labels.

OPTIONS TAB

22
Choose the Alternative hypothesis: to be used for the test.

Theoretical variance: Enter the value of the theoretical mean with which the mean of the sample is to be compared.

Significance level (%): Enter the significance level for the tests (default value: 5%).

MISSING DATA
Activate the appropriate option to not accept missing data or remove the observations with missing data.

OUTPUTS TAB
The Outputs tab allows you to display the descriptive statistics of the samples (default).

23
TWO-SAMPLE COMPARISON OF VARIANCE
Use this tool to compare the variances of two samples Take a sample S 1 comprising n1 observations with variance s1². Take a second
sample S2 comprising n2 observations with variance s2². XLSTAT offers three tests for comparing the variances of the two samples.

Fisher’s F test

The test statistic F is given by: F = s12 / s22 This statistic follows a Fisher distribution with (n1-1) and (n2-1) degrees of freedom if both
samples follow a normal distribution. Three types of test are possible depending on the alternative hypothesis chosen:

For the two-tailed test, the alternative hypothesis is Ha: σ12 ≠ σ22

In the left-one-tailed test, the alternative hypothesis is Ha: σ12 < σ22

In the right one-tailed test, the alternative hypothesis is Ha: σ12 > σ22

Levene’s test

Levene’s test is a two-tailed test for which the null and alternative hypotheses are given by the following for the case where two
variances are being compared: H0: σ12 = σ22

Ha: σ12 ≠ σ22

The statistic from the Levene test is more complex than that from the Fisher test. It follows a Fisher’s F distribution with 1 and n 1 +
n2-2 degrees of freedom.

Bartlett’s homogeneity of variances test

Bartlett’s test is sensitive to the normality of the data. In other words, if the hypothesis of normality of the data seems fragile, it is
better to use Levene's or Fisher's test. On the other hand, Bartlett's test is more powerful if the samples follow a normal distribution.
Bartlett's statistic follows a Chi-square distribution with one degree of freedom.

GENERAL TAB
Sample 1: If the format of the selected data is "one column per variable", select the data for the various samples in the Excel
worksheet. If the format of the selected data is "one column per sample" or "paired samples", select a column of data corresponding
to the first sample.

24
Sample 2: If the format of the selected data is "one column per variable", select the data identifying the two samples to which the
selected data values correspond. If the format of the selected data is "one column per sample" or "paired samples", select a column
of data corresponding to the second sample.

Choose the Data format: option as One column/row per sample or One column per variable. Activate the Column/row labels option
if the first column (or row) of the selected data contains labels.

Activate the Fisher’s F-test, Levene’s test, or Bartlett’s test to indicate the test statistic to compute.

OPTIONS TAB
Choose the Alternative hypothesis: to be used for the test. Enter the value of the supposed ratio between the variances of the
samples (typically 1) in the Hypothesized ratio (R ): option. Enter the significance level for the tests (default value: 5%) as the
Significance level (%):.

MISSING DATA TAB


Activate the appropriate option to not accept missing data or remove the observations with missing data.

25
OUTPUTS TAB
The Outputs tab allows you to display the descriptive statistics of the samples (default).

Z-TEST FOR ONE PROPO RTION


Use this test to compare a proportion calculated from a sample with a given proportion. Let n be the number of observations
verifying a certain property among a sample of size N. The proportion of the sample verifying the property is defined by p = n / N. Let
p0 be a known proportion with which we wish to compare p and D be the assumed difference (p – p0). Three types of test are
possible depending on the alternative hypothesis chosen:

For the two-tailed test, the alternative hypothesis is Ha: p – p0 ≠D

In the left-one-tailed test, the alternative hypothesis is Ha: p – p0 <D

26
In the right one-tailed test, the alternative hypothesis is p – p0 >D

GENERAL TAB
In the Data format: area activate Frequency if you would prefer to enter the value of the number of observations for which the
property is observed, or activate Proportion to enter the proportion observed. Based on your option in the Data format: section,
indicate the Frequency: as the number of observations or corresponding proportion. Enter the number of observations in the
Sample size: option. Enter the value of the test proportion with which the proportion observed is to be compared in the Test
Proportion: option. The z test and Continuity correction options are automatically selected. To deselect an option, click on the check
box.

OPTIONS TAB
Choose the Alternative hypothesis: to be used for the test. Enter the value of the supposed difference between the proportions of
the sample in the Hypothesized ratio (D ): option. Enter the significance level for the tests (default value: 5%) as the Significance
level (%):. Choose the Variance (confidence interval): method to estimate the variance of the proportion. Activate Sample to
compute the variance using the proportion obtained for the sample. Activate Test proportion to compute the variance using the test
proportion and the size of the sample. Select the method used to compute the Confidence interval: as Wald, Wilson score, Clopper-
Pearson, or Agresti-Goull.

27
Z-TEST FOR TWO PROPORTIONS
Use this tool to compare two proportions calculated for two samples. Let n1 be the number of observations verifying a certain
property for sample S1 of size N1, and n2 the number of observations verifying the same property for sample S 2 of size N2. The
proportion of sample S1 verifying the property is defined by p1 = n1 / N1, and the proportion for S2 is defined by p2 = n2 / N2. Let D be
the assumed difference (exact, minimum or maximum) between the two proportions p 1 and p2. D is usually 0. Three types of test are
possible depending on the alternative hypothesis chosen:

For the two-tailed test, the alternative hypothesis is Ha: p1 – p2 ≠D

In the left-one-tailed test, the alternative hypothesis is Ha: p1 – p2 <D

In the right one-tailed test, the alternative hypothesis is Ha: p1 – p2 >D

GENERAL TAB
In the Data format: area activate Frequencies if you would prefer to enter the value of the number of observations for which the
property is observed, or activate Proportions to enter the proportion observed. Based on your option in the Data format: section,
indicate the Frequency 1: as the number of observations or corresponding proportion for Sample 1. Enter the number of
observations for Sample 1 in the Sample size 1: option. Indicate the Frequency 2: as the number of observations or corresponding
proportion for Sample 2. Enter the number of observations for Sample 2 in the Sample size 2: option. Enter the value of the test
proportion with which the proportion observed is to be compared in the Test Proportion: option. The z test option is automatically
selected. Activate the Monte Carlo method: option to compute the p-value using Monte Carlo simulations. Enter the number of
simulations to perform.

28
OPTIONS TAB
Choose the Alternative hypothesis: to be used for the test. Enter the value of the supposed difference between the proportions of
the sample in the Hypothesized ratio (D ): option. Enter the significance level for the tests (default value: 5%) as the Significance
level (%):. Choose the Variance: method used to estimate the variance of the difference between the proportions. Select the radio
button that corresponds to the correct formula.

MULTINOMIAL GOODNESS OF FIT TEST


Use this tool to test whether the observed frequencies of the values (categories) of a qualitative variable correspond to the expected
frequencies or proportions. The test is based on the multinomial distribution which is the extension of the binomial distribution
when there are more than two possible outcomes. Let k be the number of possible values (categories) for variable X. We write p1,
p2, …, pk the probabilities (or densities) corresponding to each value. Let n1, n2, n3, …, nk be the frequencies of each value for a
sample.

The null hypothesis of the test writes:

H0: The distribution of the values in the sample is consistent with what is expected, meaning the distribution of the sample is not
different from the distribution of X.

The alternative hypothesis of the test writes:


29
Ha: The distribution of the values in the sample is not consistent with what is expected, meaning the distribution of the sample is
different from the distribution of X.

XLSTAT offers the Chi-square test and the Monte Carlo test as alternatives for generating the test statistic

MULTINOMIAL GOODNESS OF FIT TEST TAB


In the Data Format: options activate the option indicating whether you will choose expected Frequencies or expected Proportions.In
the Frequencies: option select the data corresponding to the observed frequencies in the Excel worksheet. In the Expected
proportions: option select the data corresponding to the expected frequencies or to the expected proportions. If you select
expected frequencies, they must sum to the same value as the sum of the observed frequencies. Activate the Column labels option if
the first line of the data selected contain a label. Activate the Chi-square test: option to use the Chi-Square test. Activate the Monte
Carlo method: to use the simulation method and enter the number of simulations. Enter the significance level for the tests (default
value: 5%) as the Significance level (%):.

30

You might also like