Statistica


Students Laboratory, Department of Biophysics

1. Data import
2. Basic data processing
3. Data recalculation
4. Descriptive statistics
5. Histogram
6. Plot creating
7. Plot formatting
8. Plot points deleting
9. Plot data reading
10. Multiple plots of data contained in different worksheets
11. Fitting data with standard models
12. Fitting data with non-standard models

Statistica instruction

1. Data import

Fig. 1. Statistica program main window. Two example worksheets are presented.

All data managed in Statistica are presented in the form of data worksheets
(Fig. 1). Data can be entered manually, pasted from the clipboard or imported
from a disk file.
In the labs, data are most often imported from disk files created by the software
used in a particular exercise, in the form of DAT or TXT text files. Use the
File → Open option to import the data. A dialog window appears in which the
location and name of the imported file are defined. The default file type filter
in the Open dialog does not show text files, so the All files (*.*) filter has to be
chosen in the Pliki typu (File types) box. Once the file location and name are
defined, use the Otwórz (Open) button, which opens the next dialog window,
entitled Importing file. In all cases relevant to the labs, both the Import as
Spreadsheet and Delimited options should be checked.
The next window, called Import Delimited Text Files (Fig. 2), contains many
options, but only a few are used in the labs. If any particular option is needed,
detailed suggestions will be given in the exercise instruction.

Only one problem needs to be pointed out here. Most problems with data import
occur when the decimal separator used in the data (comma or dot) differs from
the decimal separator set as the default in the operating system.
The decimal separator used in the data is easy to identify because a file preview
is visible in the lower part of the dialog. Check the Decimal separator character
option and enter the separator seen in the preview into the edit box next to the
option. Then use the Refresh View button. If the data columns are well separated
and the Format String contains only R letters, without any T letters and numbers,
the import will be performed properly. In the example shown in Fig. 2 the
decimal separator was not defined properly. The T10 T5 string visible in the
Format String field means that the program recognized the data as a two-column
table whose columns are text variables of 10 and 5 characters.
If the chosen options do not work well, use the Reset button and start again
with another combination of options.
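
The separator problem can also be illustrated outside Statistica. The following is only a minimal Python sketch, not part of the Statistica workflow; the function name read_columns and the sample data are hypothetical. It shows that the same text converts to numbers only when the declared decimal separator matches the one used in the file.

```python
def read_columns(text, decimal_sep=",", delimiter=None):
    """Parse delimited text into rows of floats.

    decimal_sep is the decimal separator used in the file;
    delimiter=None splits on any whitespace (tabs or spaces).
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        fields = line.split(delimiter)
        # Normalize the decimal separator to a dot before conversion.
        rows.append([float(f.replace(decimal_sep, ".")) for f in fields])
    return rows


sample = "0,85\t1,10\n0,90\t1,15\n"   # hypothetical tab-delimited data
print(read_columns(sample, decimal_sep=","))
```

Calling read_columns(sample, decimal_sep=".") on the same text raises ValueError, which loosely corresponds to Statistica treating such columns as text (T) instead of numbers (R).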

Fig. 2. Data import dialog window.

2. Basic data processing


Operations on data contained in Statistica worksheets resemble data processing
in MS Excel or Open Office Calc.
In order to mark a whole data column, click the gray button in the column
header, which shows the column number and name. For instance, in order to mark
the third column from the left in the example shown in Fig. 1, one has to click the button
labeled 3 Var3. Var3 is the name of the variable represented by the column, and 3
is the column number. Column names and numbers are important because they can
be used in formulas for data recalculation.
How to create an empty worksheet?
When a new worksheet is created (File → New option, Spreadsheet tab), the
number of columns (Number of variables:) and the number of rows (Number of
cases:) have to be defined. Other options usually stay unchanged. In the created
worksheet the defined columns and rows are shown in white, while the rest of the
worksheet is shown in gray. Until the numbers of columns/rows are redefined, the
gray part of the worksheet is not accessible.
How to change the data in the worksheet?
Click the appropriate worksheet cell and enter the new value. The change is
confirmed by pressing the ENTER key or by moving the focus to another cell.
How to delete the cell contents?
Click the cell and press the DELETE key.
How to add new columns/rows at the end of the worksheet?
Double-click the gray worksheet area. The Add cases and/or variables dialog
appears, in which the number of new columns/rows is defined. The default values
in the dialog edit boxes depend on the position that was double-clicked. For
instance, to add two new columns and three new rows to the worksheet shown in
Fig. 1, double-click the position two cells to the right of and three cells below
the bottom-right active (white) cell.
How to add a new column/row in any worksheet position?
The Insert option allows new columns/rows to be defined. Choose Add variables
or Add cases depending on needs. When inserting columns, the How many:, After:
and Name: fields should be filled in; when inserting rows, only two edit boxes
are used: How many: and Insert after case:.
How to delete a piece of worksheet?
It is possible to remove whole rows or columns from the worksheet. To do so,
mark the area to be removed and use the Edit → Delete → Variables or
Edit → Delete → Cases option.

How to change variable (column) properties?
Some column properties can be defined by double-clicking the column header.
The most important possibility offered by the dialog that appears (Fig. 3) is
changing the variable name. The column name is very important in Statistica: it
can be used in data recalculations and plotting.
One of the most useful options of this dialog is the calculation of the mean
value, standard deviation and number of valid cases in the chosen column. Use
the Values/Stats button to do that.

Fig. 3. The dialog window used for the variable (column) properties definition.

3. Data recalculation
There are a few different ways of performing data recalculations. Most often the
result of a recalculation is stored in a new column and is computed from
variables already stored in the worksheet.
The simplest way is to use the dialog window for column properties definition
(Fig. 3). It is opened by double-clicking the column header. The formula used
for data recalculation should be entered in the Long name (label or formula with
Functions): edit box. For instance, if column Var3 has to contain the sum of
columns Var1 and Var2, put the
following formula: =Var1+Var2 into the edit box. Notice the Functions button above
the edit box. It can be used to choose among the functions implemented in Statistica.
Another possibility is the Data → Batch Transformation Formulas option. It
opens the dialog window shown in Fig. 4. The edit box in this dialog allows very
complicated formulas to be defined, using numerous functions, operators and
variables. Many variables can be recalculated at the same time with separate
formulas.
In the example shown in Fig. 4, the sum of variables Var1 and Var2 will be stored
in variable Var3. At the same time the values of column Var4 will be calculated
as Var1 multiplied by the square root of variable Var2.
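
For readers who want to check such recalculations independently, the two formulas described above can be reproduced with a minimal Python sketch. This is an illustration only, not Statistica syntax; the dictionary sheet and its values are hypothetical.

```python
import math

# Hypothetical worksheet: each variable (column) is a list of cases (rows).
sheet = {
    "Var1": [1.0, 2.0, 3.0],
    "Var2": [4.0, 9.0, 16.0],
}

# Var3 is the sum of Var1 and Var2; Var4 is Var1 multiplied by the
# square root of Var2. Both are evaluated case by case.
sheet["Var3"] = [a + b for a, b in zip(sheet["Var1"], sheet["Var2"])]
sheet["Var4"] = [a * math.sqrt(b) for a, b in zip(sheet["Var1"], sheet["Var2"])]

print(sheet["Var3"])
print(sheet["Var4"])
```

As in the worksheet, each new column has one value per case, computed only from values in the same row.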

Fig. 4. Batch Transformation Formulas dialog window.


By default, calculations performed in Statistica on the basis of user-defined
formulas are not refreshed automatically. This means that changes to data used
earlier in calculations do not automatically affect the calculated results. To
obtain automatic updates, use Data → Recalculate Spreadsheet formulas and then
the Auto-recalculate when the data change option in the dialog window that
appears. From then on all formulas will be recalculated automatically. This rule
is valid only for formulas defined in the dialog window used for column
properties definition (Fig. 3), not for batch transformations.
Batch transformations can be updated only through the Batch Transformation
Formulas dialog.

4. Descriptive statistics
There are three methods of calculating statistical parameters for variables.
The methods are not equivalent: each provides a different amount of information.
Method 1: Double-click the column header of the desired variable and choose the
Values/Stats button in the dialog window that appears (Fig. 3). The mean, the
standard deviation and the number of valid cases are displayed (Fig. 5). The
statistics can be copied to the clipboard with the button provided in this window.
Method 2: Mark the part of the worksheet for which the statistics have to be
calculated. Right-click once to display the context menu (Fig. 6) and choose
Statistics of Block Data → Block Columns. Then pick the statistics to be
produced, e.g. All. The calculated results are placed in a new worksheet.

Fig. 5. The dialog window displaying statistics of variable.

Fig. 6. Descriptive statistics calculations for a chosen part of the worksheet.

Method 3: Use the Statistics → Basic Statistics/Tables option. Then choose the
Descriptive statistics option in the window that appears and confirm the choice
with the OK button. The next dialog contains many options; use the Summary:
Statistics or Summary button to calculate the statistics. Calculations can be
done for chosen variables or for all of them. Mark the desired variables before
using this option, or use the Variables button in the last dialog window
described. The results are displayed as a new worksheet.
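
The three quantities reported by Method 1 (mean, standard deviation, number of valid cases) can be cross-checked with a short Python sketch. It is an illustration under stated assumptions, not a description of Statistica internals: the function name describe is hypothetical, missing data are assumed to be stored as None or NaN, and the standard deviation is assumed to be the sample one (n - 1 in the denominator).

```python
import math
import statistics


def describe(values):
    """Return the number of valid cases, the mean and the sample standard deviation.

    Missing data (None or NaN) are skipped.
    """
    valid = [v for v in values if v is not None and not math.isnan(v)]
    return {
        "valid_n": len(valid),
        "mean": statistics.mean(valid),
        "std_dev": statistics.stdev(valid),  # n - 1 in the denominator
    }


column = [2.0, 4.0, 4.0, None, 5.0, 5.0]  # hypothetical Var1 with one empty cell
result = describe(column)
print(result)
```

The empty cell is not counted as a valid case, so the statistics are based on five values.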

5. Histogram
Use the Graphs → Histograms option to create a histogram of data stored in the
worksheet. The option opens the dialog window shown in Fig. 7. The Variables
button displays a form in which the variables for analysis are chosen (Fig. 8).

Fig. 7. 2D Histograms dialog window.

Fig. 8. Select Variables for Histogram window allowing the choice of columns for
histogram analysis.

The 2D Histograms window contains an Advanced tab (Fig. 9), which offers some
useful features. The created histogram can be fitted with one of many
statistical distributions. The choice is made in the Fit type list.
When the analysis is performed, some statistical data are calculated. Four check
boxes in the Statistics panel decide which statistics are calculated.
Another useful possibility is the Boundaries option. If the radio button titled
Boundaries is checked, the Specify Boundaries button becomes active. This button
allows the data range taken into account in the histogram to be redefined
(Fig. 10). The width of the bins can also be set, in the Interval Step edit box.
All elements of the histogram plot, as for any other plot type, can be edited by
double-clicking the element that needs editing. An example of a histogram with
descriptive statistics and a Gaussian fit is presented in Fig. 11.
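
What the range and bin width settings mean can be shown with a small Python sketch. It is only an illustration with hypothetical names and data, and it assumes left-closed intervals (a value equal to a bin boundary goes to the higher bin); the convention used by Statistica may differ.

```python
def histogram(values, minimum, step, n_bins):
    """Count values falling into n_bins intervals of width `step`,
    starting at `minimum`. Values outside the range are ignored."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - minimum) // step)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


data = [0.82, 0.91, 0.95, 1.02, 1.04, 1.08, 1.17, 1.31]  # hypothetical values
print(histogram(data, minimum=0.8, step=0.1, n_bins=5))
```

Here the analyzed range is 0.8 to 1.3, so the value 1.31 is left out of the histogram, just as data outside the specified boundaries are.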

[Callouts in Fig. 9: the type of fitted statistical distribution; the button allowing
redefinition of the analysis range and bin width; the statistics calculated in the
histogram analysis procedure.]

Fig. 9. Advanced tab in 2D Histograms dialog window.

Fig. 10. The dialog box used for the histogram range redefinition.

Fig. 11. An example of a histogram created in Statistica. The histogram is
fitted with a normal (Gaussian) distribution. The normal distribution parameters
are displayed in the plot header. Descriptive statistics are displayed in the
bottom part of the plot because the Descriptive Statistics check box was used in
the Advanced tab of the 2D Histograms dialog (Fig. 9).

6. Plot creating
A plot is a set of points marked in a coordinate system and usually illustrates
a dependence between two quantities, e.g. pressure as a function of time. Every
point is characterized by two coordinates: Y, called the dependent variable, and
X, called the independent variable. In this nomenclature Y is a function of X.
The first thing to do before creating a plot is to choose the independent and
dependent variables. Variables are represented in Statistica by data columns.
The program allows many dependences to be drawn on a single plot. To create a
plot, one independent (X) variable (column) has to be chosen, while many
dependent (Y) variables can be used.
Most plots in the labs, apart from the histograms described in the previous
chapter, can be created with the Graphs → Scatterplots option (Fig. 12), which
displays the 2D Scatterplots dialog window (Fig. 13).

Fig. 12. The main program menu used for plot creation.

Fig. 13. 2D Scatterplots dialog window.


The variables used for the plot can be chosen in the dialog box (Fig. 14) opened
with the Variables button located in the Quick tab of the 2D Scatterplots dialog
window. The Select Variables for Scatterplot dialog box that appears contains
two lists of the variables defined in the active worksheet. The example shown in
Fig. 14 contains four variables: Var1, Var2, Var3 and Var4. Var2 and Var3 were
chosen as the dependent variables, while Var1 was defined as the independent one.
Choosing many dependent variables at the same time makes it possible to draw
multiple dependences on a single plot, but the Graph type option in the 2D
Scatterplots window has to be set to Multiple. If the Regular option is set, only
one dependence will be plotted. An example of a scatter plot created in
Statistica is shown in Fig. 15.

Fig. 14. The dialog box allowing the choice of variables used for plot preparation.
[Plot: Scatterplot of Var2 against Var1, data set RR-Proba 2v*218c; horizontal axis
Var1 and vertical axis Var2, both ranging from 0,80 to 1,25.]

Fig. 15. An example of a regular scatter plot created in Statistica program.

7. Plot formatting
In order to modify any plot element, double-click it. The dialog that appears
depends on the element one would like to alter. E.g. when the Var1 axis title is
double-clicked in the example shown in Fig. 15, a dialog window for editing the
axis title appears (Fig. 16a).
Whenever a graph element is chosen by double-clicking, the Graph Options dialog
window is displayed, but depending on the situation it opens at a different
position of the options tree. The options structure is shown on the left side of
the window (Fig. 16a,b). The Graph Options window can also always be displayed
from the right mouse button context menu.

Fig. 16. Graph Options dialog window (panels a and b).


One of the most frequently used plot formatting options is axis scaling. The
default procedure usually does not work well and plots require manual rescaling.
Choose the Scaling option in the options tree visible on the left side of the
Graph Options window (Fig. 16b) and then change the Mode option from Auto to
Manual. Afterwards redefine the axis range in the Minimum and Maximum edit boxes.
The options shown in Fig. 16b also allow different scale types to be chosen
(Scale type panel). Of the five available possibilities, the Linear and
Logarithmic scale types are used most often.
All choices made in the dialog shown in Fig. 16b concern the axis picked in the
Axis box in the upper part of the dialog.

8. Plot points deleting


Sometimes, because of mistakes and errors, wrong data occur. If necessary, they
can be excluded from the analysis with the brushing function. To exclude some
data, right-click the plot and pick the Show Brushing option (Fig. 17a). The
cursor changes to a magnifying glass and a tool box titled 2D Brushing appears.
From then on the unnecessary data points can be marked and eliminated. Mark the
data first with the left mouse button. Marked data are displayed on the plot in
black. To mark more than one point, mark subsequent points while holding the
CTRL key. The marked points can be removed by clicking them with the right mouse
button and choosing the Brushing Off option in the context menu that appears
(Fig. 17b).
To switch off the brushing mode, use the ESC key on the keyboard.
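
The effect of excluding points, as opposed to the interactive brushing procedure itself, can be mimicked with a minimal Python sketch. The function name exclude_cases and the data are hypothetical; cases are numbered from 1, like worksheet rows.

```python
def exclude_cases(x, y, excluded):
    """Return copies of x and y without the cases listed in `excluded`.

    Cases are numbered from 1, like worksheet rows.
    """
    kept = [i for i in range(len(x)) if (i + 1) not in excluded]
    return [x[i] for i in kept], [y[i] for i in kept]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 40.0, 8.1, 9.9]   # hypothetical data; case 3 is an obvious outlier
x_clean, y_clean = exclude_cases(x, y, excluded={3})
print(x_clean, y_clean)
```

The original lists stay unchanged, so the excluded points can always be restored.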
Fig. 17. The use of brushing function (panels a and b).

9. Plot data reading


It is very easy to read data values from the plot. To display the coordinates
and the case (row) number of a particular data point, it is enough to point at
it with the mouse cursor without clicking. The description of the point is
displayed in a small box next to the point, as shown in Fig. 18.

Fig. 18. Data point values reading.

10. Multiple plots of data contained in different worksheets


Creating multiple plots is easy when the data are contained in one worksheet.
Use the same method as presented in Chapter 6. Set the Graph type option to
Multiple in the 2D Scatterplots dialog (Fig. 13) and then use the Variables
button to define the dependent and independent variables.
When the data to be presented on a single plot belong to different worksheets,
the worksheets have to be merged before plotting.
It is useful first to give specific names to the variables (columns) in both
worksheets. This helps to identify the variables properly and to avoid mistakes.
Double-click the column header to open the dialog in which the column name can
be changed (Fig. 3).
Use the Data → Merge option (Fig. 19) to merge two separate worksheets. A Merge
Options dialog appears (Fig. 20).

Fig. 19. Data → Merge option.

Fig. 20. Merge Options dialog window.


The most important step is to choose the worksheets to merge using the File 1
and File 2 buttons. The chosen worksheet names are shown next to these buttons,
and the OK button creates a new merged worksheet. Then use the Graphs →
Scatterplots option and follow the standard procedure for creating plots
described in Chapter 6.
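
The idea of merging can be sketched in Python as placing the columns of two worksheets side by side. This is a simplified illustration, not the behavior of the Merge Options dialog, whose settings are not described here; the function name merge_worksheets, the padding of shorter columns with None and the sample data are all assumptions. It also shows why distinct column names are worth setting first.

```python
def merge_worksheets(sheet1, sheet2):
    """Merge two worksheets (dicts of column name -> list of cases) side by side.

    Shorter columns are padded with None (missing data). Column names must be
    unique across both worksheets.
    """
    duplicates = set(sheet1) & set(sheet2)
    if duplicates:
        raise ValueError(f"Duplicate variable names: {sorted(duplicates)}")
    merged = {**sheet1, **sheet2}
    n_cases = max(len(col) for col in merged.values())
    return {name: col + [None] * (n_cases - len(col)) for name, col in merged.items()}


run1 = {"Time1": [0.0, 1.0, 2.0], "Pressure1": [10.0, 8.0, 6.5]}   # hypothetical
run2 = {"Time2": [0.0, 1.0], "Pressure2": [9.0, 7.0]}              # hypothetical
merged = merge_worksheets(run1, run2)
print(merged)
```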

11. Fitting data with standard models


Statistica makes it possible to fit mathematical models to data. The most
popular models (linear, exponential, logarithmic etc.) can be fitted to the data
when the data are plotted. This method is described in this chapter. It has to
be pointed out that with this method the errors of the fitted model parameters
cannot be calculated. Only the model parameters and some simple statistics are
calculated. If parameter errors are needed, refer to the next chapter of the
instruction.
To create the plot, one of the following two options can be used:
Graphs → Scatterplots or Graphs → 2D Graphs → Scatterplots. The first tab of the
window that appears, called Quick, was described earlier. The second tab, called
Advanced, allows mathematical models to be fitted to experimental data
(Fig. 21).

Fig. 21. 2D Scatterplots window, Advanced tab.


After choosing the dependent and independent variables for plotting and fitting
with the Variables button, use the Fit panel to choose the fitting model. Of the
8 models implemented here, only three will be used in the labs: Linear
(Y = A + B*X), Logarithmic (Y = A + B*log(X)) and Exponential (Y = A*exp(B*X)).
Finally, decide in the Statistics panel which statistical data have to be
included in the plot.
An example of a plot of data fitted with a linear function is shown in Fig. 22.
Variable Var2 was plotted as a function of variable Var1. The plot was fitted
with a linear function whose equation is Var2 = 26.4067 - 2.2257*x (see the plot
header). Statistical data, which also include the function equation, are shown
in the lower left corner.
In many cases a mathematical model needs to be fitted to only a part of the
data. In such a case, prepare the plot with fitting in the normal way, i.e. fit
the whole data range, and then limit the fitting range in the Graph Options
window. The window opens after double-clicking the fitted curve. Ensure that the
Fitting branch is active in the options tree visible on the left side of the
options window. If not, click it. The Graph Options window should look like the
one shown in Fig. 23. The default value in the Range list box is Full range.
Change it to Axis range in order to limit the fitting range to the data range
visible on the plot. If Custom range is set, the beginning and the end of the
range of data to be fitted have to be defined manually in the Min and Max edit
boxes.
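
The effect of limiting the fitting range can be illustrated with a minimal Python sketch of a straight-line fit Y = A + B*X. It assumes ordinary least squares, which is the usual method for such fits; whether Statistica uses exactly this procedure is not stated in this instruction. The function name linear_fit and the data are hypothetical.

```python
def linear_fit(x, y, x_min=None, x_max=None):
    """Least-squares fit of Y = A + B*X, optionally limited to x_min <= X <= x_max.

    Returns the pair (A, B).
    """
    points = [
        (xi, yi)
        for xi, yi in zip(x, y)
        if (x_min is None or xi >= x_min) and (x_max is None or xi <= x_max)
    ]
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 5.0, 7.0, 20.0, 30.0]   # hypothetical: linear only up to X = 3
print(linear_fit(x, y))                 # full range
print(linear_fit(x, y, x_max=3.0))      # limited range: only points with X <= 3
```

With the full range the two last points distort the line; with the range limited to X <= 3 the fit follows the linear part of the data only.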

Fig. 22. Example of a plot with linear model fitted to the experimental data.

Fig. 23. Graph Options window opened in Fitting section. The range list box allows
for the redefinition of the data range taken into consideration when the mathematical
model is fitted.

12. Fitting data with non-standard models


The procedure described here can be used in two cases: (1) if the data have to
be fitted with a non-standard model, i.e. a model not implemented in the 2D
Scatterplots window, or (2) if a standard model is used but the model parameters
have to be calculated together with their errors.
Use the Statistics → Advanced Linear/Nonlinear Models → Nonlinear Estimation
option (Fig. 24), which opens the Nonlinear Estimation dialog (Fig. 25). Pick
User-specified regression, least squares and confirm your choice with the OK
button. In the next dialog window click the Function to be estimated button,
which allows any mathematical model to be defined for data fitting.

Fig. 24. Nonlinear Estimation option allows fitting of any mathematical model to the
experimental data.

Fig. 25. A dialog window used for the estimation method choice.

The dialog window shown in Fig. 26 is used to define the mathematical model for
data fitting. Some examples of the syntax are shown at the bottom of the window.
Variable names, operators and Statistica functions can be used in the model
function definition. All characters and character strings not recognized as
belonging to one of these categories are treated as model parameters.
The following string was defined in the example shown in Fig. 26:
PrzY=A+B*Exp(-C*X). It means that the function y = A + B*exp(-C*x) will be used.
The PrzY variable (column 4 in the worksheet) plays the role of the dependent
variable (y), while X (column 1) is the independent variable (x). A, B and C are
the model parameters, which will be calculated in the estimation (fitting)
procedure.
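
What a least-squares fit of this model minimizes can be written down in a few lines of Python. The sketch below is an illustration only; the function names model and sum_of_squares and the noise-free data are hypothetical and are not taken from Statistica.

```python
import math


def model(x, a, b, c):
    """The example model y = A + B*exp(-C*x)."""
    return a + b * math.exp(-c * x)


def sum_of_squares(params, xs, ys):
    """Least-squares loss: sum of squared residuals for given (A, B, C)."""
    a, b, c = params
    return sum((y - model(x, a, b, c)) ** 2 for x, y in zip(xs, ys))


# Hypothetical noise-free data generated with A = 2, B = 5, C = 0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [model(x, 2.0, 5.0, 0.5) for x in xs]

print(sum_of_squares((2.0, 5.0, 0.5), xs, ys))   # parameters used to generate the data
print(sum_of_squares((0.1, 0.1, 0.1), xs, ys))   # all parameters equal to 0.1
```

The fitting procedure searches for the values of A, B and C that make this sum as small as possible.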

Fig. 26. Dialog window used for definition of the mathematical model.
After the model function is set, use the OK button to return to the
User-Specified Regression, Least Squares dialog. The next stage of the procedure
starts after using the OK button. The dialog window shown in Fig. 27 allows the
fitting procedure to be started (OK button). The fitting procedure is based on a
minimization method: the computer looks for the best parameter values starting
from some initial values. The default initial values of the parameters are
always the same and equal to 0.1. This default choice is usually not appropriate
and sometimes causes errors when trying to fit the model (e.g. Predictors are
probably very redundant; estimates suspect). In other cases the model may be
fitted without errors, but the fitted curve will not match the data well. In
both cases the solution is to choose new initial parameter values.
Use the Advanced tab to do that (Fig. 27). It contains a button titled Start
values:, which allows the initial parameter values to be defined (Fig. 28).
Choosing the initial values is not easy and needs some experience. The simplest
approach is to set all parameters to 0. If this does not work, the physical
interpretation of the experimental data and of the mathematical model has to be
taken into account to guess the initial values. Another approach is to fit a
simpler, standard model to the experimental data first and to use its fitted
parameters as the initial values for the non-standard model.
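
One way of turning these hints into numbers for the model y = A + B*exp(-C*x) is sketched below in Python. It is a rough heuristic under stated assumptions, not a Statistica procedure: the curve is assumed to decay towards a plateau, A is taken slightly below the smallest measured y, and B and C come from a straight-line fit of ln(y - A) against x, i.e. from the simpler exponential model. The function name guess_start_values and the data are hypothetical.

```python
import math


def guess_start_values(xs, ys):
    """Rough start values (A, B, C) for y = A + B*exp(-C*x).

    Assumes a decaying, non-constant curve that approaches its plateau.
    """
    span = max(ys) - min(ys)
    a = min(ys) - 0.01 * span          # keeps y - A positive for every point
    logs = [math.log(y - a) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    slope = sxl / sxx
    intercept = mean_l - slope * mean_x
    # ln(y - A) = ln(B) - C*x, so B = exp(intercept) and C = -slope.
    return a, math.exp(intercept), -slope


# Hypothetical noise-free data generated with A = 2, B = 5, C = 0.5.
xs = [float(i) for i in range(11)]
ys = [2.0 + 5.0 * math.exp(-0.5 * x) for x in xs]
a0, b0, c0 = guess_start_values(xs, ys)
print(a0, b0, c0)
```

The returned values are only approximate, but they are much closer to the true parameters than identical defaults and can be typed into the start values dialog.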
After the starting parameter values have been chosen, use the OK button in the
window shown in Fig. 27. If the procedure succeeds, a results window appears
(Fig. 29). Two buttons need attention here: Summary and Fitted 2D function &
observed vals.
The Summary button displays a worksheet with the results of the fitting
procedure (Fig. 30). It contains some statistical data, the most important of
which are the parameter values (first column) and their errors (second column).
The Fitted 2D function & observed vals button creates a plot with the fitted
curve. The equation of the estimated function is also visible in the plot header.

Fig. 27. The window used for data fitting with non-standard models. Advanced tab
with Start values: button allows for the initial parameters values definition.

Fig. 28. The dialog window used for initial values definition.

Fig. 29. The dialog that appears if the fitting procedure succeeds.

Fig. 30. Fitting procedure Summary.
