Data Processing
Data processing is "the collection and manipulation of items of data to produce meaningful information."
Data processing is distinct from word processing, which manipulates text rather than data. It is a subset of information
processing, "the change (processing) of information in any manner detectable by an observer."
Data processing may involve various processes, including:
Validation – In computer science, data validation is the process of ensuring that a program operates on clean, correct
and useful data. It uses routines, often called "validation rules" or "check routines", that check for correctness,
meaningfulness, and security of data that are input to the system. The rules may be implemented through the automated
facilities of a data dictionary, or by the inclusion of explicit application program validation logic.
For business applications, data validation can be defined through declarative data integrity rules, or procedure-based business
rules. Data that does not conform to these rules will negatively affect business process execution. Therefore, data validation
should begin with the definition of the business process and of the business rules within that process. Rules can be collected
through a requirements-capture exercise.
The simplest data validation verifies that the characters provided come from a valid set. For example, telephone numbers
should include only the digits and possibly the characters +, -, (, and ) (plus, minus, and parentheses). A more sophisticated
data validation routine would check that the user has entered a valid country code, i.e., that the number of digits entered
matches the convention for the country or area specified.
Incorrect data validation can lead to data corruption or a security vulnerability. Data validation checks that data are valid,
sensible, reasonable, and secure before they are processed.
A validation process involves two distinct steps: (a) a validation check and (b) a post-check action. The check step uses one
or more computational rules to determine whether the data is valid. The post-check action sends feedback to help enforce
validation.
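The two-step process can be sketched with the telephone-number example above. This is a minimal illustration, not any particular library's API; the character set, function names, and feedback messages are all assumptions.

```python
# Validation check + post-check action for telephone numbers.
# The valid character set follows the example above: digits plus
# the characters + - ( ) and spaces.
VALID_PHONE_CHARS = set("0123456789+-() ")

def check_phone(number: str) -> bool:
    """Validation check: every character must come from the valid set,
    and at least one digit must be present."""
    return (
        len(number) > 0
        and all(ch in VALID_PHONE_CHARS for ch in number)
        and any(ch.isdigit() for ch in number)
    )

def validate_phone(number: str) -> tuple[bool, str]:
    """Run the check, then perform the post-check action: return
    feedback that helps the caller enforce validation."""
    if check_phone(number):
        return True, "OK"
    return False, f"Rejected {number!r}: characters must be digits or + - ( )"
```

A more sophisticated routine, as noted above, would also verify the digit count against the convention for the country code entered.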
Sorting – Sorting is any process of arranging items in some sequence and/or in different sets, and accordingly, it has two
common, yet distinct meanings:
ordering: arranging items of the same kind, class, nature, etc. in some ordered sequence,
categorizing: grouping and labeling items with similar properties together (by sorts).
Summarization – In descriptive statistics, summary statistics are used to summarize a set of
observations, in order to communicate the largest amount of information as simply as possible. Statisticians commonly try
to describe the observations in:
a measure of location, or central tendency, such as the arithmetic mean
a measure of statistical dispersion like the standard deviation
a measure of the shape of the distribution like skewness or kurtosis
if more than one variable is measured, a measure of statistical dependence such as a correlation coefficient
A common collection of order statistics used as summary statistics is the five-number summary, sometimes extended to a
seven-number summary, and the associated box plot.
Entries in an analysis of variance table can also be regarded as summary statistics.
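The measures listed above can be sketched using only the Python standard library. The sample data is made up for illustration; note that the five-number summary consists of order statistics, so computing it relies on sorting the observations first.

```python
import statistics

observations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Measure of location (central tendency): the arithmetic mean.
mean = statistics.fmean(observations)  # 5.0

# Measure of statistical dispersion: the sample standard deviation.
stdev = statistics.stdev(observations)

# Measure of shape: sample skewness (one common moment-based formula;
# several variants exist).
n = len(observations)
m2 = sum((x - mean) ** 2 for x in observations) / n
m3 = sum((x - mean) ** 3 for x in observations) / n
skewness = m3 / m2 ** 1.5

# Five-number summary: minimum, quartiles, maximum. These are order
# statistics, so the data must be sorted.
data = sorted(observations)
q1, q2, q3 = statistics.quantiles(data, n=4)
five_number = (data[0], q1, q2, q3, data[-1])
```

With more than one variable, a measure of dependence such as `statistics.correlation` (available in Python 3.10+) would complete the picture.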
Aggregation – combining multiple pieces of data. In statistics, aggregate data describes data combined from several
measurements. When data are aggregated, groups of observations are replaced with summary statistics based on those
observations.
In economics, aggregate data or data aggregates are high-level data composed from a multitude of individual data, for
example:
In macroeconomics, data such as the overall price level or the overall inflation rate.
In microeconomics, data of an entire sector of an economy composed of many firms, or of all households in a city or region.
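The core idea, replacing groups of observations with summary statistics based on those observations, can be sketched as follows. The sectors and prices are hypothetical, and the choice of count and mean as the summary statistics is just one option.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical firm-level observations: (sector, price) pairs.
observations = [
    ("retail", 10.0), ("retail", 12.0),
    ("energy", 30.0), ("energy", 34.0), ("energy", 32.0),
]

# Group the individual observations by sector.
groups = defaultdict(list)
for sector, price in observations:
    groups[sector].append(price)

# Replace each group with summary statistics: the aggregate data.
aggregates = {
    sector: {"count": len(prices), "mean_price": fmean(prices)}
    for sector, prices in groups.items()
}
# aggregates["retail"] == {"count": 2, "mean_price": 11.0}
```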
Analysis – the "collection, organization, analysis, interpretation and presentation of data."
Reporting – listing detail or summary data, or computed information.
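The distinction between a detail report (every record listed) and a summary report (computed information) can be sketched briefly; the records and formatting here are illustrative.

```python
# Hypothetical sales records.
records = [
    {"item": "widget", "qty": 3, "price": 2.50},
    {"item": "gadget", "qty": 1, "price": 10.00},
]

# Detail report: one line per record.
detail_lines = [
    f"{r['item']:<8} qty={r['qty']} @ {r['price']:.2f}" for r in records
]

# Summary report: computed information derived from the records.
total = sum(r["qty"] * r["price"] for r in records)
summary_line = f"TOTAL: {total:.2f}"
```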
Computer data storage, often called storage or memory, is a technology consisting of computer components and
recording media used to retain digital data. It is a core function and fundamental component of computers. The central
processing unit (CPU) of a computer is what manipulates data by performing computations. In practice, almost all
computers use a storage hierarchy, which puts fast but expensive and small storage options close to the CPU and slower but
larger and cheaper options farther away. Often the fast, volatile technologies (which lose data when powered off) are referred
to as "memory", while slower permanent technologies are referred to as "storage", but these terms can also be used
interchangeably. In the von Neumann architecture, the CPU consists of two main parts: the control unit and the arithmetic
logic unit (ALU). The former controls the flow of data between the CPU and memory; the latter performs arithmetic and
logical operations on data.