Statistica
1. Data import
2. Basic data processing
3. Data recalculation
4. Descriptive statistics
5. Histogram
6. Plot creating
7. Plot formatting
8. Plot points deleting
9. Plot data reading
10. Multiple plots of data contained in different worksheets
11. Fitting data with standard models
12. Fitting data with non-standard models
Statistica instruction
1. Data import
Fig. 1. Statistica program main window. Two example worksheets are presented.
All data managed in the Statistica program are presented in the form of data
worksheets (Fig. 1). Data can be entered manually, pasted from the clipboard or
imported from a disk file.
In the labs, data are most often imported from disk files created by the software
used in the particular exercise, in the form of DAT or TXT text files. Use the File >
Open option to import the data. A dialog window appears in which the location and
name of the imported file are defined. The default file type filter applied in the
open dialog does not show text files, so the All files (*.*) filter has to be
chosen in the Pliki typu (File types) box. Once the file location and name have been
defined, use the Otwórz (Open) button, which opens the next dialog window, entitled
Importing file. In all cases relevant to the labs, both the Import as Spreadsheet and
Delimited options should be checked.
The next window, called Import Delimited Text Files (Fig. 2), contains many
options, but only a few are used in the labs. If any particular option is needed,
detailed guidance is given in the exercise instruction.
Only one problem needs to be pointed out here. Most problems with data import
occur when the decimal separator used in the data (comma or dot) differs from
the default decimal separator defined in the operating system.
The decimal separator used in the data is easy to identify, because a file preview
is visible in the lower part of the dialog. Check the Decimal separator character
option and enter the separator seen in the preview into the edit box next to the
option. Then use the Refresh View button. If the data columns are well separated and
Format String contains only R letters, without any T letters or numbers, the import
will be performed properly. In the example shown in Fig. 2 the decimal separator was
not defined properly. The T10 T5 string visible in the Format String field means that
the program recognized the data as a two-column table whose columns are text
variables consisting of 10 and 5 characters.
If the chosen options do not work well, use the Reset button and start again
with another combination of options.
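The effect of a mismatched decimal separator can be reproduced outside Statistica. The Python sketch below is only an illustration: the function name parse_number and the sample line are hypothetical and are not part of Statistica. A value is read as a number only when the file uses the separator the parser expects; otherwise it stays text, which corresponds to the T entries in Format String.

```python
def parse_number(text, decimal_sep="."):
    """Return text as a float, or None if it is not numeric for this separator."""
    other_sep = "," if decimal_sep == "." else "."
    if other_sep in text:
        # The unexpected separator is present, so the value is treated as text.
        return None
    try:
        return float(text.replace(decimal_sep, "."))
    except ValueError:
        return None


row = "0,85\t1,10".split("\t")   # one tab-delimited line of a hypothetical data file
as_dot = [parse_number(v, ".") for v in row]
as_comma = [parse_number(v, ",") for v in row]
print(as_dot)    # [None, None]
print(as_comma)  # [0.85, 1.1]
```

With the wrong separator both columns stay text; with the separator seen in the preview they become numbers.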
denoted as 3 Var3. Var3 is the name of variable represented by the column, and 3
is the column number. Column names and numbers are important because they can be
used in formulas for data recalculation.
How to create an empty worksheet?
When a new worksheet is created (File > New option, Spreadsheet tab),
the number of columns (Number of variables:) and the number of rows (Number of
cases:) have to be defined. The other options usually stay unchanged. In the created
worksheet the defined columns and rows are shown in white, while
the rest of the worksheet is shown in gray. Until the numbers of columns/rows are
redefined, the gray part of the worksheet is not accessible.
How to change the data in the worksheet?
Click the appropriate worksheet cell and type the new value. The change is
confirmed by pressing the ENTER key or by moving the focus to another cell.
How to delete the cell contents?
Click the cell and press the DELETE key.
How to add new columns/rows at the end of the worksheet?
Double-click the gray worksheet area. The Add cases and/or variables
dialog appears, in which the number of new columns/rows is defined. The
default values in the dialog edit boxes depend on the position that was double-clicked.
For instance, to add two new columns and three new rows to
the worksheet shown in Fig. 1, double-click the gray cell two positions to the right of and
three positions below the rightmost bottom active (white) cell.
How to add a new column/row in any worksheet position?
The Insert option allows new columns/rows to be defined. Choose Add variables
or Add cases, depending on what is needed. When inserting columns,
the How many:, After: and Name: fields should be filled in, while when inserting
rows only two edit boxes are used: How many: and Insert after case:.
How to delete a piece of worksheet?
Whole rows or columns can be removed from the worksheet. To do so,
select the area to be removed and use the Edit > Delete >
Variables or Edit > Delete > Cases option.
Fig. 3. The dialog window used for the variable (column) properties definition.
3. Data recalculation
There are a few different ways of performing data recalculations. Most
often the result of a recalculation is stored in a new column and is computed from
variables already stored in the worksheet.
The simplest way is to use the dialog window for column
properties definition (Fig. 3). It is opened by double-clicking the column header.
The formula used for the recalculation should be entered in the Long name
(label or formula with Functions): edit box. For instance, when
column Var3 is to contain the sum of columns Var1 and Var2, enter the
following formula: =Var1+Var2 into the edit box. Note the Functions button above
the edit box. It can be used to choose among the functions implemented in Statistica.
Another possibility is the Data > Batch Transformation Formulas option. It
opens the dialog window shown in Fig. 4. The edit box in the dialog
allows very complicated formulas to be defined, using numerous functions,
operators and variables. Many variables can be recalculated at the same time
with separate formulas.
In the example shown in Fig. 4, the sum of variables Var1 and Var2 will
be stored in variable Var3. At the same time the values of column Var4 will be
calculated as Var1 multiplied by the square root of variable Var2.
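For readers who want to check such a recalculation independently, the same two formulas can be sketched in Python. This is only an illustration of the arithmetic, not Statistica formula syntax, and the column values are hypothetical.

```python
import math

# Hypothetical example values standing in for worksheet columns Var1 and Var2.
var1 = [1.0, 2.0, 3.0]
var2 = [4.0, 9.0, 16.0]

# Var3 = Var1 + Var2, computed case by case (row by row).
var3 = [a + b for a, b in zip(var1, var2)]

# Var4 = Var1 multiplied by the square root of Var2.
var4 = [a * math.sqrt(b) for a, b in zip(var1, var2)]

print(var3)  # [5.0, 11.0, 19.0]
print(var4)  # [2.0, 6.0, 12.0]
```

Each formula is applied to every row separately, which is how a column formula acts on worksheet cases.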
4. Descriptive statistics
There are three methods of calculating statistical parameters for variables.
The three methods provide different amounts of information and are
not equivalent.
Method 1: Double-click the column header of the desired variable and choose the
Values/Stats button in the dialog window that appears (Fig. 3). The average, the
standard deviation and the number of valid cases are displayed (Fig. 5). The
statistics can be copied to the clipboard with the copy button in that window.
Method 2: Select the part of the worksheet for which the statistics are to be
calculated. Click once with the right mouse button to display the context menu
(Fig. 6) and choose Statistics of Block Data > Block Columns. Then pick the
statistics to be produced, e.g. All. The calculated results are placed in a new
worksheet.
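The three quantities reported by Method 1 (average, standard deviation, number of valid cases) can be cross-checked with a short Python sketch. The column values are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption about what the program reports, not something stated in this instruction.

```python
import statistics

# Hypothetical column; None marks an empty (missing) cell.
column = [1.05, 0.95, None, 1.10, 0.90]

# Only cells that contain a value count as valid cases.
valid = [v for v in column if v is not None]

n_valid = len(valid)
mean = statistics.mean(valid)
std_dev = statistics.stdev(valid)  # sample standard deviation (assumed, n - 1)

print(n_valid, round(mean, 3), round(std_dev, 4))
```

If the program's value differs slightly, compare it with statistics.pstdev(valid), which divides by n instead.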
5. Histogram
Use the Graphs > Histograms option to create a histogram of the data
stored in the worksheet. The option opens the dialog window shown in Fig. 7. The Variables
button displays a form for choosing the variables to be analyzed (Fig. 8).
Fig. 8. Select Variables for Histogram window allowing the choice of columns for
histogram analysis.
Statistics calculated in the histogram analysis procedure.
Fig. 10. The dialog box used for the histogram range redefinition.
Descriptive statistics.
6. Plot creating
A plot is a set of points marked in a coordinate system and usually illustrates
a dependence between two quantities, e.g. pressure as a function of time. Every point
is characterized by two coordinates: Y, called the dependent variable, and X, called
the independent variable. In this nomenclature Y is a function of X.
The first thing to do before creating a plot is to choose the independent and
dependent variables. Variables are represented in Statistica by data columns.
The program allows many dependences to be drawn on a single plot. To create a
plot, one independent (X) variable (column) has to be chosen, while many dependent
variables (Y) can be used.
Most plots in the labs, apart from the histograms described in the previous section, can
be created with the Graphs > Scatterplots option (Fig. 12), which displays the
2D Scatterplots dialog window (Fig. 13).
Fig. 14. The dialog box allowing the choice of variables used for plot preparation.
[Plot: Scatterplot of Var2 against Var1 (RR-Proba 2v*218c). Horizontal axis: Var1; vertical axis: Var2; both axes run from 0,80 to 1,25.]
7. Plot formatting
To modify any plot element, double-click it. The dialog that appears
depends on the element to be altered. E.g. when the Var1 axis
title is double-clicked in the example shown in Fig. 15, a dialog window for editing the axis
title appears (Fig. 16a).
Whenever a graph element is chosen by double-clicking, the Graph
Options dialog window is displayed, but depending on the situation it opens at a
different position in the options tree. The options structure is shown on the left side of the
window (Fig. 16a,b). The Graph Options window can always be displayed from the
right mouse button context menu.
marked points can be removed by clicking on them with the right mouse button and
choosing the Brushing Off option in the context menu that appears (Fig. 17b).
To switch off the brushing mode, press the ESC key on the keyboard.
Fig. 22. Example of a plot with linear model fitted to the experimental data.
Fig. 23. Graph Options window opened in Fitting section. The range list box allows
for the redefinition of the data range taken into consideration when the mathematical
model is fitted.
Fig. 24. Nonlinear Estimation option allows fitting of any mathematical model to the
experimental data.
Fig. 25. A dialog window used for the estimation method choice.
The dialog window shown in Fig. 26 is used to define the mathematical
model used for data fitting. Some examples showing the required syntax are given
at the bottom of the window.
Variable names, operators and Statistica functions can be used in the model
function definition. All characters and character strings not recognized as belonging to one of the
categories mentioned above are treated as model parameters.
The following string was defined in the example shown in Fig. 26:
PrzY=A+B*Exp(-C*X). It means that the function $y = A + B e^{-Cx}$ will be used. The PrzY
variable (column 4 in the worksheet) plays the role of the dependent variable (y), while X
(column 1) is the independent variable (x). A, B and C are the model parameters, which
will be calculated in the estimation (fitting) procedure.
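To make the roles of the variable and the parameters concrete, the same model can be written as a small Python function. This is a sketch for illustration only: the function name model and the parameter values are hypothetical and do not come from the exercise.

```python
import math


def model(x, A, B, C):
    """Evaluate y = A + B*exp(-C*x) for the given parameter values."""
    return A + B * math.exp(-C * x)


# Hypothetical parameter values chosen only for illustration.
print(model(0.0, 1.0, 2.0, 0.5))  # 3.0
```

At x = 0 the function equals A + B, and for large x (with C > 0) it approaches A, which helps when interpreting fitted parameter values.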
Fig. 26. Dialog window used for definition of the mathematical model.
After the model function is set, use the OK button to return to the User-Specified
Regression, Least Squares dialog. The next stage of the procedure starts after using the OK
button. The dialog window shown in Fig. 27 allows the fitting procedure to be started (OK
button). The fitting procedure is based on a minimization method. The computer
searches for the best parameter values starting from some initial values. The default
initial values of the parameters are always the same and are equal to 0.1. This default
choice is usually not appropriate and sometimes causes errors when fitting the
model (e.g. Predictors are probably very redundant; estimates suspect). In other
cases the model is fitted without errors, but the fitted curve does not match the data well.
In both cases the solution is to choose new initial parameter values.
Use the Advanced tab to do that (Fig. 27). It contains a button titled Start values:,
which allows the initial parameter values to be defined (Fig. 28). Choosing the initial
values is not easy and requires some experience. The simplest approach is to set all
parameters to 0. If this does not work, it is necessary to consider the
physical interpretation of the experimental data and the mathematical model in order to guess the
initial values. Another approach is to fit a simpler, standard model
to the experimental data and use its fitted parameters as the initial values for the non-standard model.
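As one possible way of guessing start values from the shape of the data, the Python sketch below treats the exponential model y = A + B*exp(-C*x) used in the example above. It is a heuristic under stated assumptions (C > 0, x values sorted in ascending order and starting at 0, data reaching close to the plateau), not the method used by Statistica. The function name guess_start_values and the synthetic data are hypothetical.

```python
import math


def guess_start_values(xs, ys):
    """Rough start values for y = A + B*exp(-C*x); assumes C > 0, xs ascending, xs[0] == 0."""
    A = ys[-1]       # for large x the exponential term vanishes, so y approaches A
    B = ys[0] - A    # at x = 0 the model gives y = A + B
    mid = len(xs) // 2
    if B == 0 or xs[mid] <= 0:
        raise ValueError("data do not allow a guess for C")
    ratio = (ys[mid] - A) / B      # equals exp(-C*x) at the middle point
    if ratio <= 0:
        raise ValueError("data do not look like a decaying exponential")
    C = -math.log(ratio) / xs[mid]
    return A, B, C


# Synthetic, noise-free data generated with A = 2, B = 3, C = 0.5 (illustration only).
xs = [float(i) for i in range(21)]
ys = [2.0 + 3.0 * math.exp(-0.5 * x) for x in xs]

start = guess_start_values(xs, ys)
print(start)
```

The returned values are close to the parameters used to generate the data and could be typed into the Start values: dialog; with noisy measurements the guess is cruder but usually still better than the default.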
After the starting parameter values have been chosen (if needed), use the OK button in the window
shown in Fig. 27. If the procedure succeeds, a results window appears (Fig.
29). Two buttons need attention here: Summary and Fitted 2D function &
observed vals.
The Summary button displays a worksheet with the results of the fitting procedure (Fig. 30).
It contains some statistical data, the most important of which are the parameter
values (first column) and their errors (second column).
The Fitted 2D function & observed vals button creates a plot with the fitted curve.
The equation of the estimated function is also shown in the plot header.
Fig. 27. The window used for data fitting with non-standard models. Advanced tab
with Start values: button allows for the initial parameters values definition.
Fig. 28. The dialog window used for initial values definition.