ABSTRACT
A P2P-based framework supporting the extraction of aggregates from historical multidimensional data is proposed, which provides efficient and robust query evaluation. When a data population is published, the data are summarized into a synopsis, consisting of an index built on top of a set of sub-synopses (storing compressed representations of distinct data portions). The index and the sub-synopses are distributed across the network, and suitable replication mechanisms, taking into account the query workload and network conditions, are employed to provide the appropriate coverage for both the index and the sub-synopses.
Introduction:
Peer-to-peer networking, a disruptive technology for large-scale distributed applications, has gained widespread attention due to the success of peer-to-peer (P2P) content sharing and media streaming. In our project we concentrate on the same kind of data sharing in a client-to-client system: the provider splits the data into portions by type and compresses them, so that sub-synopses are prepared and then scattered across the network. These sub-synopses are later retrieved through the index structures that accompany them. The huge amount of resources provided by P2P networks (in terms of storage capacity, computing power, and data transmission capability) could effectively support data management. From this standpoint, one of the application contexts likely to benefit from the support of a P2P network is the analysis of multidimensional data. In this scenario, information is represented as points in a multidimensional space whose dimensions correspond to different perspectives over the data: users explore data and retrieve aggregates by issuing range queries, i.e., queries specifying an aggregate operator and the range of the data domain from which the aggregate information should be retrieved.
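As a simple illustration of the kind of range query meant here (the data set, dimension names, and values below are made up purely for the example), a SUM aggregate over a rectangular region of a two-dimensional domain can be sketched in Java as follows:

import java.util.List;

public class RangeQueryExample {
    // A point in a 2-D domain (e.g., month and region id) carrying a measure value.
    record Point(int month, int region, double sales) {}

    // SUM of the measure over the rectangle [m1..m2] x [r1..r2].
    static double sumRange(List<Point> data, int m1, int m2, int r1, int r2) {
        return data.stream()
                   .filter(p -> p.month() >= m1 && p.month() <= m2
                             && p.region() >= r1 && p.region() <= r2)
                   .mapToDouble(Point::sales)
                   .sum();
    }

    public static void main(String[] args) {
        List<Point> data = List.of(
            new Point(1, 10, 120.0), new Point(2, 10, 80.0),
            new Point(2, 20, 200.0), new Point(3, 20, 50.0));
        // "Total sales in months 1..2 for regions 10..20"
        System.out.println(sumRange(data, 1, 2, 10, 20)); // prints 400.0
    }
}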
We will consider the case of analytical applications dealing with historical data, which typically require huge computation and storage capabilities due to the large amount of data that must be accessed to evaluate queries. Although the multidimensional data model is substantially more complex than the representation paradigm adopted in the file-sharing context, analytical applications dealing with historical multidimensional data and file-sharing applications share a fundamental aspect: they can rely on lossy data compression. In fact, analogously to tools for reproducing audio and/or video files, many applications dealing with multidimensional data can effectively accomplish their tasks even when only an approximate representation of the data is available. For instance, in Decision Support Systems (DSSs) or statistical databases, users are often concerned with performing data exploration with the aim of discovering interesting trends rather than extracting fine-grained information. In this scenario, high accuracy in the less relevant digits of query answers is not needed, as providing their order of magnitude suffices to locate the regions of the database containing relevant information. At the same time, fast answers to these preliminary queries allow users to focus their explorations quickly and effectively, thus saving large amounts of system resources.
Our aim is to devise a P2P-based framework supporting the analysis of multidimensional historical data. Specifically, our efforts will be devoted to combining the amenities of P2P networks and data compression to provide support for the evaluation of range queries, possibly trading off efficiency against accuracy of answers. The framework should enable members of an organization to cooperate by sharing their resources (both storage and computational) to host (compressed) data and perform aggregate queries on them, while preserving their autonomy. A framework with these characteristics can be useful in different application contexts. For instance, consider the case of a worldwide virtual organization with users interested in geographical data, as well as the case of a real organization on an enterprise network. In both cases, even users who are not continuously interested in performing data analysis can make a part of their resources available for supporting analysis tasks needed by others, provided their own capability of performing local tasks is preserved.
This is analogous to the idea on which several popular applications for public-resource computing are based: members of a worldwide community offer their CPU, when it is idle, to analyze radio telescope readings in search of non-random patterns, such as spikes in power spectra. In order to make participants truly autonomous, no constraints should be imposed on the storage and computational resources they share, or on the reliability of their network connection. These requirements make traditional distributed frameworks unsuitable and suggest the adoption of a solution based on an unstructured P2P network, where peers are neither responsible for coordination tasks nor required to be always connected.
EXISTING SYSTEM
The P2P-based solution has not yet imposed itself as an effective evolution of traditional distributed databases. This is quite surprising, as the huge amount of resources provided by P2P networks (in terms of storage capacity, computing power, and data transmission capability) could effectively support data management. From this standpoint, one of the application contexts likely to benefit from the support of a P2P network is the analysis of multidimensional data. In this scenario, information is represented as points in a multidimensional space whose dimensions correspond to different perspectives over the data: users explore data and retrieve aggregates by issuing range queries.
Although the cost of disk storage is continuously and rapidly decreasing, it may still be difficult to find peers for which hosting replicas of synopses has a negligible cost, while autonomy is a requirement in our setting.
Although compressing the data certainly makes replication less resource consuming, replicating the entire synopsis each time would require storage and network resources that could be saved if only some specific portion of the synopsis could be replicated.
PROPOSED SYSTEM
Our proposal is a framework supporting the sharing and the analysis of compressed historical multidimensional data over an unstructured P2P network. From the user standpoint, two tasks are supported: data publication and data querying.
Data publication: Let p be a peer which is willing to share a historical multidimensional data set D so that the other peers can pose aggregate range queries against it. In order to make its data suitable for being distributed across the network, p builds a synopsis of D by first appropriately partitioning D and then compressing each portion of data in the partition. Peer p also builds an index over these sub-synopses, which is properly fragmented in order to make it suitable for distribution. Finally, the sub-synopses and the index portions are disseminated across the network, along with metadata about D. The assignment of data and index portions to peers takes into account the willingness of peers to share their resources.
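A minimal sketch of this publication flow is given below; the partitioning rule, the compression routine, and the peer-assignment policy shown here are placeholders for illustration, not the actual ones used by the framework:

import java.util.*;

public class PublicationSketch {
    // A sub-synopsis: a compressed representation of one data portion.
    record SubSynopsis(String id, byte[] compressed) {}

    // Placeholder partitioning: split the data set into roughly equal chunks.
    static List<List<double[]>> partition(List<double[]> dataset, int parts) {
        List<List<double[]>> out = new ArrayList<>();
        int size = (dataset.size() + parts - 1) / parts;
        for (int i = 0; i < dataset.size(); i += size)
            out.add(dataset.subList(i, Math.min(i + size, dataset.size())));
        return out;
    }

    // Placeholder compression: a real synopsis would store a lossy summary
    // (e.g., histogram buckets), not just the portion size.
    static SubSynopsis compress(String id, List<double[]> portion) {
        return new SubSynopsis(id, Integer.toString(portion.size()).getBytes());
    }

    public static void main(String[] args) {
        List<double[]> d = List.of(new double[]{1, 2}, new double[]{3, 4},
                                   new double[]{5, 6}, new double[]{7, 8});
        Map<String, String> index = new HashMap<>(); // sub-synopsis id -> hosting peer
        int i = 0;
        for (List<double[]> portion : partition(d, 2)) {
            SubSynopsis s = compress("s" + i, portion);
            String peer = "peer-" + (i % 2);   // placeholder assignment policy
            index.put(s.id(), peer);           // the index fragments are scattered too
            i++;
        }
        System.out.println(index);
    }
}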
Data querying: Exploration queries can be issued by peers to discover the shared data sets in which they may be interested. These queries specify criteria that are matched against the metadata associated with each available data set. The result of the exploration process is a set of matching data sets and, for each of them, a set of peers that should be contacted to start the evaluation of range queries.
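The exploration step can be pictured with the following simplified sketch, in which the metadata fields, data set names, and peer names are invented for illustration:

import java.util.*;

public class ExplorationSketch {
    // Metadata published with each data set: name, dimensions, and entry peers.
    record DataSetMeta(String name, Set<String> dimensions, List<String> entryPeers) {}

    // Return the published data sets whose dimensions cover the requested ones.
    static List<DataSetMeta> explore(List<DataSetMeta> published, Set<String> wanted) {
        List<DataSetMeta> hits = new ArrayList<>();
        for (DataSetMeta m : published)
            if (m.dimensions().containsAll(wanted)) hits.add(m);
        return hits;
    }

    public static void main(String[] args) {
        List<DataSetMeta> published = List.of(
            new DataSetMeta("sales2007", Set.of("month", "region"), List.of("peerA", "peerC")),
            new DataSetMeta("weather", Set.of("day", "station"), List.of("peerB")));
        // "data sets that expose both a month and a region dimension"
        for (DataSetMeta m : explore(published, Set.of("month", "region")))
            System.out.println(m.name() + " -> contact " + m.entryPeers());
    }
}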
ADVANTAGES OF THE PROPOSED SYSTEM
In order to guarantee peer autonomy, peers are not constrained to host a certain portion of the index or to be always connected to the network.
Peers are volatile, so the framework must be capable of promptly reacting to peer disconnections, preventing dangling references in the index.
Literature Survey:
Peer to peer system:
Peer-to-peer technology has become very popular for various applications such as file sharing or live streaming, and today a large fraction of Internet traffic is due to peer-to-peer applications. However, there are still many theoretical and practical challenges. In contrast to client-server architectures, the p2p computing paradigm seeks to harness the computing power of all machines in the network: a peer-to-peer system leverages resources of all clients, such as bandwidth, storage space, or CPU cycles. A crucial advantage of such distributed systems is their scalability: as more and more peers join the network and the demand on the system increases, the total capacity of the system increases as well. Moreover, p2p systems are often fault tolerant, as data is replicated and there is no single point of failure.

However, there are also challenges. While a server is available most of the time, p2p systems often experience frequent membership changes: a user typically joins the network only for a short period of time, e.g., in order to download a file, and then leaves the network again. Hence, a p2p system in practice must guarantee seamless operation despite the ongoing joins and leaves.

The power of peer-to-peer computing arises from the collaboration of its numerous constituent parts, the peers. If all the participating peers contribute some of their resources, for instance bandwidth, memory, or CPU cycles, highly scalable decentralized systems can be built which significantly outperform existing server-based solutions. Unfortunately, in reality, many peers are selfish and strive to maximize their own utility by benefiting from the system without contributing much themselves. Hence the performance of a p2p system crucially depends on its capability of dealing with selfishness. A well-known mechanism designed to cope with this free-riding problem is the tit-for-tat policy, which is for instance employed by the file-distribution tool BitTorrent.
However, selfish behavior in peer-to-peer networks has numerous important implications even beyond a peer's unwillingness to contribute bandwidth or memory. For example, in unstructured p2p systems (the predominant p2p architectures in today's Internet), a peer can select to which and to how many other peers in the network it wants to connect. With a clever choice of neighbors, a peer can attempt to optimize its lookup performance by minimizing the latencies (or, more precisely, the stretch) to the other peers in the network. Achieving good stretches by itself is of course simple: a peer can establish links to a large number of other peers in the system. Because the memory and maintenance overhead of such a neighbor set is large, however, egoistic peers try to exploit locality as much as possible while avoiding storing too many neighbors. It is this fundamental trade-off between the need for small latencies and the desire to reduce maintenance overhead that governs the decisions of selfish peers.
In handling product lines, you must be careful to forecast each item with an eye to the forecasts of all other items in the group. For instance, increased sales of one item in the product line may imply lower or higher sales of other items. Another possible complication is that you may need to keep switching focus between several dimensions, such as sales regions and market segments.
Data compression:
Data compression is used just about everywhere. Compression is useful because it helps reduce the consumption of expensive resources, such as hard disk space or transmission bandwidth. On the downside, compressed data must be decompressed to be used, and this extra processing may be detrimental to some applications. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed (the option of decompressing the video in full before watching it may be inconvenient, and requires storage space for the decompressed video). The design of data compression schemes therefore involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (if using a lossy compression scheme), and the computational resources required to compress and uncompress the data.
An unstructured P2P network is formed when the overlay links are established arbitrarily. Such networks can be easily constructed as a new peer that wants to join the network can copy existing links of another node and then form its own links over time. In an unstructured P2P network, if a peer wants to find a desired piece of data in the network, the query has to be flooded through the network to find as many peers as possible that share the data. Aggregate data describes data combined from several measurements
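The flooding-based lookup described above can be sketched as follows; the overlay, the TTL value, and the peer names are made up for illustration, and a real system would also cache results and bound the message volume:

import java.util.*;

public class FloodingSketch {
    // Toy overlay: each peer knows an arbitrary set of neighbours.
    static Map<String, List<String>> neighbours = Map.of(
        "A", List.of("B", "C"),
        "B", List.of("A", "D"),
        "C", List.of("A"),
        "D", List.of("B"));
    static Set<String> holders = Set.of("D");   // peers that hold the wanted data

    // Forward the query to neighbours until the TTL runs out or all peers are visited.
    static void flood(String peer, String query, int ttl, Set<String> visited, List<String> hits) {
        if (ttl == 0 || !visited.add(peer)) return;
        if (holders.contains(peer)) hits.add(peer);
        for (String n : neighbours.get(peer))
            flood(n, query, ttl - 1, visited, hits);
    }

    public static void main(String[] args) {
        List<String> hits = new ArrayList<>();
        flood("A", "find sub-synopsis s3", 3, new HashSet<>(), hits);
        System.out.println("found at: " + hits);  // [D], reached via A -> B -> D
    }
}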
SQL
SQL (Structured Query Language) is a database computer language designed for managing data in relational database management systems (RDBMS). Its scope includes data query and update, schema creation and modification, and data access control. SQL was one of the first languages for Edgar F. Codd's relational model in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks" and became the most widely used language for relational databases.
History
SQL was developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. This version, initially called SEQUEL (Structured English Query Language, also abbreviated SEQL), was designed to manipulate and retrieve data stored in IBM's original relational database product, System R, developed during the 1970s by a group at the IBM San Jose Research Laboratory. IBM patented this version of SQL in 1985. The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley aircraft company. Among the first relational database management systems (RDBMSs) were RDMS, developed at MIT in the early 1970s, and Ingres, developed in 1974 at U.C. Berkeley. Ingres implemented a query language known as QUEL, which was later supplanted in the marketplace by SQL. In the late 1970s, Relational Software, Inc. (now Oracle Corporation) saw the potential of the concepts described by Codd, Chamberlin, and Boyce and developed their own SQL-based RDBMS with aspirations of selling it to the U.S. Navy, the Central Intelligence Agency, and other U.S. government agencies. In the summer of 1979, Relational Software, Inc. introduced the first commercially available implementation of SQL, Oracle V2 (Version 2) for VAX computers. Oracle V2 beat IBM's release of the System/38 RDBMS to market by a few weeks.
Language elements
A single SQL statement is composed of several kinds of language elements:
Clauses, which are constituent components of statements and queries (in some cases optional).
Expressions, which can produce either scalar values or tables consisting of columns and rows of data.
Predicates, which specify conditions that can be evaluated to SQL three-valued logic (3VL) truth values and which are used to limit the effects of statements and queries, or to change program flow.
Queries, which retrieve data based on specific criteria.
Statements, which may have a persistent effect on schemas and data, or which may control transactions, program flow, connections, sessions, or diagnostics.
SQL statements also include the semicolon (";") statement terminator. Though not required on every platform, it is defined as a standard part of the SQL grammar.
Insignificant whitespace is generally ignored in SQL statements and queries, making it easier to format SQL code for readability.
Queries
The most common operation in SQL is the query, which is performed with the declarative SELECT statement. SELECT retrieves data from one or more tables, or expressions. Standard SQL statements have no persistent effects on the database. Some non-standard implementations of SELECT can have persistent effects, such as the SELECT INTO syntax that exists in some databases.[10]
Queries allow the user to describe desired data, leaving the database management system (DBMS) responsible for planning, optimizing, and performing the physical operations necessary to produce that result as it chooses.
A query includes a list of columns to be included in the final result immediately following the SELECT keyword. An asterisk ("*") can also be used to specify that the query should return all columns of the queried tables. SELECT is the most complex statement in SQL, with optional keywords and clauses that include:
The FROM clause which indicates the table(s) from which data is to be retrieved. The FROM clause can include optional JOIN subclauses to specify the rules for joining tables.
The WHERE clause includes a comparison predicate, which restricts the rows returned by the query. The WHERE clause eliminates all rows from the result set for which the comparison predicate does not evaluate to True.
The GROUP BY clause is used to project rows having common values into a smaller set of rows. GROUP BY is often used in conjunction with SQL aggregation functions or to eliminate duplicate rows from a result set. The WHERE clause is applied before the GROUP BY clause.
The HAVING clause includes a predicate used to filter rows resulting from the GROUP BY clause. Because it acts on the results of the GROUP BY clause, aggregation functions can be used in the HAVING clause predicate.
The ORDER BY clause identifies which columns are used to sort the resulting data, and in which direction they should be sorted (options are ascending or descending). Without an ORDER BY clause, the order of rows returned by an SQL query is undefined.
The following is an example of a SELECT query that returns a list of expensive books. The query retrieves all rows from the Book table in which the price column contains a value greater than 100.00. The result is sorted in ascending order by title. The asterisk (*) in the select list indicates that all columns of the Book table should be included in the result set.
SELECT *
 FROM Book
 WHERE price > 100.00
 ORDER BY title;
The example below demonstrates a query of multiple tables, grouping, and aggregation, by returning a list of books and the number of authors associated with each book.
SELECT Book.title,
       count(*) AS Authors
 FROM Book
 JOIN Book_author ON Book.isbn = Book_author.isbn
 GROUP BY Book.title;
Example output:

Title                  Authors
---------------------- -------
SQL Examples and Guide 1
The Joy of SQL         1
An Introduction to SQL 2
Pitfalls of SQL        4
Under the precondition that isbn is the only common column name of the two tables and that a column named title only exists in the Books table, the above query could be rewritten in the following form:
SELECT title,
       count(*) AS Authors
 FROM Book NATURAL JOIN Book_author
 GROUP BY title;
However, many vendors either do not support this approach, or require column naming conventions. SQL includes operators and functions for calculating values on stored values. SQL allows the use of expressions in the select list to project data, as in the following example which returns a list of books that cost more than 100.00 with an additional sales_tax column containing a sales tax figure calculated at 6% of the price.
SELECT isbn,
       title,
       price,
       price * 0.06 AS sales_tax
 FROM Book
 WHERE price > 100.00
 ORDER BY title;
Data manipulation
The Data Manipulation Language (DML) is the subset of SQL used to add, update and delete data:
INSERT INTO My_table (field1, field2, field3) VALUES ('test', 'N', NULL);
UPDATE My_table SET field1 = 'updated value' WHERE field2 = 'N';
DELETE FROM My_table WHERE field2 = 'N';
TRUNCATE deletes all data from a table in a very fast way. It usually implies a subsequent COMMIT operation.
MERGE is used to combine the data of multiple tables. It combines the INSERT and UPDATE elements. It is defined in the SQL:2003 standard; prior to that, some databases provided similar functionality via different syntax, sometimes called "upsert".
Transaction controls
Transactions, if available, wrap DML operations:
START TRANSACTION (or BEGIN WORK, or BEGIN TRANSACTION, depending on SQL dialect) marks the start of a database transaction, which either completes entirely or not at all.
COMMIT causes all data changes in a transaction to be made permanent. ROLLBACK causes all data changes since the last COMMIT or ROLLBACK to be discarded, leaving the state of the data as it was prior to those changes.
Once the COMMIT statement completes, the transaction's changes cannot be rolled back. COMMIT and ROLLBACK terminate the current transaction and release data locks. In the absence of a START TRANSACTION or similar statement, the semantics of SQL are implementation-dependent. Example: a classic bank transfer of funds transaction.
START TRANSACTION;
UPDATE Account SET amount=amount-200 WHERE account_number=1234;
UPDATE Account SET amount=amount+200 WHERE account_number=2345;
IF ERRORS=0 COMMIT;
IF ERRORS<>0 ROLLBACK;
Data definition
The Data Definition Language (DDL) manages table and index structure. The most basic items of DDL are the CREATE, ALTER, RENAME, DROP and TRUNCATE statements:
CREATE creates an object (a table, for example) in the database.
DROP deletes an object in the database, usually irretrievably.
ALTER modifies the structure of an existing object in various ways, for example, adding a column to an existing table.
Data types
Each column in an SQL table declares the type(s) that column may contain. ANSI SQL includes the following data types.
Character strings
CHARACTER(n) or CHAR(n): fixed-width n-character string, padded with spaces as needed.
CHARACTER VARYING(n) or VARCHAR(n): variable-width string with a maximum size of n characters.
NATIONAL CHARACTER(n) or NCHAR(n): fixed-width string supporting an international character set.
NATIONAL CHARACTER VARYING(n) or NVARCHAR(n): variable-width NCHAR string.
Bit strings
BIT(n): an array of n bits.
BIT VARYING(n): an array of up to n bits.
Numbers
INTEGER and SMALLINT.
FLOAT, REAL and DOUBLE PRECISION.
NUMERIC(precision, scale) or DECIMAL(precision, scale).
Data control
The Data Control Language (DCL) authorizes users and groups of users to access and manipulate data. Its two main statements are:
GRANT authorizes one or more users to perform an operation or a set of operations on an object.
REVOKE eliminates a grant, which may be the default grant.
Example:
GRANT SELECT, UPDATE ON My_table TO some_user, another_user;
REVOKE SELECT, UPDATE ON My_table FROM some_user, another_user;
Procedural extensions
SQL is designed for a specific purpose: to query data contained in a relational database. SQL is a set-based, declarative query language, not an imperative language such as C or BASIC. However, there are extensions to Standard SQL which add procedural programming language functionality, such as control-of-flow constructs. These are:
Source               Common Name   Full Name
ANSI/ISO Standard    SQL/PSM       SQL/Persistent Stored Modules
Interbase/Firebird   PSQL          Procedural SQL
IBM                  SQL PL        SQL Procedural Language
Microsoft/Sybase     T-SQL         Transact-SQL
MySQL                SQL/PSM       SQL/Persistent Stored Module
Oracle               PL/SQL        Procedural Language/SQL
PostgreSQL           PL/pgSQL      Procedural Language/PostgreSQL
PostgreSQL           PL/PSM        Procedural Language/Persistent Stored Modules
Criticisms of SQL
SQL is a declarative computer language for use with relational databases. Interestingly, many of the original SQL features were inspired by, but violated, the semantics of the relational model and its tuple calculus realization. Recent extensions to SQL achieved relational completeness, but have worsened the violations, as documented in The Third Manifesto. Practical criticisms of SQL include:
Implementations are inconsistent and, usually, incompatible between vendors. In particular date and time syntax, string concatenation, nulls, and comparison case sensitivity vary from vendor to vendor.
The language makes it too easy to do a Cartesian join (joining all possible combinations), which results in "run-away" result sets when WHERE clauses are mistyped. Cartesian joins are so rarely used in practice that requiring an explicit CARTESIAN keyword may be warranted. (SQL 1992 introduced the CROSS JOIN keyword that allows the user to make clear that a Cartesian join is intended, but the shorthand "comma-join" with no predicate is still acceptable syntax, which still invites the same mistake.)
It is also possible to misconstruct a WHERE on an update or delete, thereby affecting more rows in a table than desired. (A work-around is to use transactions or habitually type in the WHERE clause first, then fill in the rest later.)
The grammar of SQL is perhaps unnecessarily complex, borrowing a COBOL-like keyword approach, when a function-influenced syntax could result in more re-use of fewer grammar and syntax rules.
Cross-vendor portability
Popular implementations of SQL commonly omit support for basic features of Standard SQL, such as the DATE or TIME data types. As a result, SQL code can rarely be ported between database systems without modifications. There are several reasons for this lack of portability between database systems:
The complexity and size of the SQL standard means that most implementors do not support the entire standard.
The standard does not specify database behavior in several important areas (e.g., indexes, file storage...), leaving implementations to decide how to behave.
The SQL standard precisely specifies the syntax that a conforming database system must implement. However, the standard's specification of the semantics of language constructs is less well-defined, leading to ambiguity.
Many database vendors have large existing customer bases; where the SQL standard conflicts with the prior behavior of the vendor's database, the vendor may be unwilling to break backward compatibility.
Software vendors often desire to create incompatibilities with other products, as it provides a strong incentive for their existing users to remain loyal
The SQL/OLB, or Object Language Bindings, part is defined by ISO/IEC 9075, Part 10. SQL/OLB defines the syntax and semantics of SQLJ, which is SQL embedded in Java. The standard also describes mechanisms to ensure binary portability of SQLJ applications, and specifies various Java packages and their contained classes. This part of the standard consists solely of optional features.
The SQL/Schemata, or Information and Definition Schemas, part is defined by ISO/IEC 9075, Part 11. SQL/Schemata defines the Information Schema and Definition Schema, providing a common set of tools to make SQL databases and objects self-describing. These tools include the SQL object identifier, structure and integrity constraints, security and authorization specifications, features and packages of ISO/IEC 9075, support of features provided by SQL-based DBMS implementations, SQL-based DBMS implementation information and sizing items, and the values supported by the DBMS implementations. This part of the standard contains both mandatory and optional features.

The SQL/JRT, or SQL Routines and Types for the Java Programming Language, part is defined by ISO/IEC 9075, Part 13. SQL/JRT specifies the ability to invoke static Java methods as routines from within SQL applications. It also calls for the ability to use Java classes as SQL structured user-defined types. This part of the standard consists solely of optional features.

The SQL/XML, or XML-Related Specifications, part is defined by ISO/IEC 9075, Part 14. SQL/XML specifies SQL-based extensions for using XML in conjunction with SQL. The XML data type is introduced, as well as several routines, functions, and XML-to-SQL data type mappings to support manipulation and storage of XML in an SQL database. This part of the standard consists solely of optional features.
What Is Java?
Java is two things: a programming language and a platform.
Java is also unusual in that each Java program is both compiled and interpreted. With a compiler, you translate a Java program into an intermediate language called Java byte codes: the platform-independent codes interpreted by the Java interpreter. With an interpreter, each Java byte code instruction is parsed and run on the computer. Compilation happens just once; interpretation occurs each time the program is executed.
Java byte codes can be considered as the machine code instructions for the Java Virtual Machine (Java VM). Every Java interpreter, whether it's a Java development tool or a Web browser that can run Java applets, is an implementation of the Java VM. The Java VM can also be implemented in hardware.
Java byte codes help make "write once, run anywhere" possible. The Java program can be compiled into byte codes on any platform that has a Java compiler. The byte codes can then be run on any implementation of the Java VM. For example, the same Java program can run on Windows NT, Solaris, and Macintosh.
The Java platform has two components: the Java Virtual Machine (Java VM) and the Java Application Programming Interface (Java API).
The Java API is a large collection of ready-made software components that provide many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries (packages) of related components. A Java program, such as an application or applet, runs on top of the Java platform; the Java API and the Virtual Machine insulate the program from hardware dependencies.
As a platform-independent environment, Java can be a bit slower than native code. However, smart compilers, well-tuned interpreters, and just-in-time byte code compilers can bring Java's performance close to that of native code without threatening portability.
However, Java is not just for writing cute, entertaining applets for the World Wide Web ("Web"). Java is a general-purpose, high-level programming language and a powerful software platform. Using the generous Java API, we can write many types of programs. The most common types of programs are probably applets and applications, where a Java application is a standalone program that runs directly on the Java platform. How does the Java API support all of these kinds of programs? With packages of software components that provide a wide range of functionality. The core API is the API included in every full implementation of the Java platform. The core API gives you the following features:
The Essentials: objects, strings, threads, numbers, input and output, data structures, system properties, date and time, and so on.
Applets: the set of conventions used by Java applets.
Networking: URLs, TCP and UDP sockets, and IP addresses.
Internationalization: help for writing programs that can be localized for users worldwide. Programs can automatically adapt to specific locales and be displayed in the appropriate language.
Security: both low level and high level, including electronic signatures, public and private key management, access control, and certificates.
Software components: known as JavaBeans, these can plug into existing component architectures.
Object serialization: allows lightweight persistence and communication via Remote Method Invocation (RMI).
Java Database Connectivity (JDBC): provides uniform access to a wide range of relational databases.
Java not only has a core API, but also standard extensions. The standard extensions define APIs for 3D, servers, collaboration, telephony, speech, animation, and more.
Get started quickly: although Java is a powerful object-oriented language, it is easy to learn, especially for programmers already familiar with C or C++.
Write less code: comparisons of program metrics (class counts, method counts, and so on) suggest that a program written in Java can be four times smaller than the same program in C++.
Write better code: the Java language encourages good coding practices, and its garbage collection helps you avoid memory leaks. Java's object orientation, its JavaBeans component architecture, and its wide-ranging, easily extendible API let you reuse other people's tested code and introduce fewer bugs.
Develop programs faster: your development time may be as much as twice as fast as writing the same program in C++. Why? You write fewer lines of code, and Java is a simpler programming language than C++.
Avoid platform dependencies with 100% Pure Java: you can keep your program portable by avoiding the use of libraries written in other languages.
Write once, run anywhere: because 100% Pure Java programs are compiled into machine-independent byte codes, they run consistently on any Java platform.
Distribute software more easily: you can upgrade applets easily from a central server. Applets take advantage of the Java feature of allowing new classes to be loaded "on the fly," without recompiling the entire program.
We explore the java.net package, which provides support for networking. Its creators have called Java "programming for the Internet." These networking classes encapsulate the socket paradigm pioneered in the Berkeley Software Distribution (BSD) from the University of California at Berkeley.
The most popular and widely accepted database connectivity standard, Open Database Connectivity (ODBC), is used to access relational databases. It offers the ability to connect to almost all databases on almost all platforms, and Java applications can also use ODBC to communicate with a database. Why, then, do we need JDBC? There are several reasons:
ODBC API was completely written in C language and it makes an extensive use of pointers. Calls from Java to native C code have a number of drawbacks in the security, implementation, robustness and automatic portability of applications.
ODBC is hard to learn. It mixes simple and advanced features together, and it has complex options even for simple queries. Moreover, ODBC drivers must be installed on the client's machine.
Architecture of JDBC:
The JDBC architecture contains three layers:
Application layer
Driver manager
JDBC drivers
Application layer: a Java program wants to get a connection to a database; it needs information from the database to display on the screen, to modify the existing data, or to insert data into a table.
Driver manager: this layer is the backbone of the JDBC architecture. When it receives a connection request from the application layer, it tries to find the appropriate driver by iterating through all the available drivers currently registered with the driver manager; after finding the right driver it connects the application to the appropriate database.
JDBC driver layer: this layer accepts the SQL calls from the application and converts them into native calls to the database and vice versa. A JDBC driver is responsible for ensuring that an application has consistent and uniform access to any database.
When a request is received from the application, the JDBC driver passes it to the ODBC driver; the ODBC driver communicates with the database, sends the request, and gets the results. The results are passed back to the JDBC driver and, in turn, to the application. Thus the JDBC driver has no knowledge of the actual database: it only knows how to pass the application's request to ODBC and get the results back from ODBC.
How do JDBC and ODBC interact with each other? Both the JDBC API and ODBC are built on an interface called the Call Level Interface (CLI). For this reason the JDBC driver can translate the request into an ODBC call; ODBC then converts the request again and presents it to the database. The results of the request are then fed back through the same channel in reverse.
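A minimal JDBC usage sketch is shown below; the connection URL, credentials, and table name are placeholders, and the actual driver and data source depend on the deployment:

import java.sql.*;

public class JdbcSketch {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:odbc:SalesDSN";   // placeholder data source name
        // Obtain a connection, run a query, and iterate over the result set.
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT title, price FROM Book WHERE price > 100")) {
            while (rs.next()) {
                System.out.println(rs.getString("title") + " : " + rs.getDouble("price"));
            }
        }
    }
}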
Modules:
1. Compressing data on the provider side
2. Indexing the values
3. Designing the receiver client
4. Justifying the algorithm
4. Justifying Algorithm:
CHIST is the algorithm used for compressing the data. The contents are compressed individually and scattered across the network. Replication is the major concern, so it has to be governed: duplication must be done on demand only. Parallel execution must also be taken into account, and the system has to maintain its efficiency.
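CHIST itself is not reproduced here; the following sketch only illustrates the general idea of a histogram-style synopsis (equi-width buckets that keep a count and a sum, from which range aggregates are answered approximately), which is an assumption made purely for illustration:

import java.util.Arrays;

public class HistogramSynopsisSketch {
    final double min, width;
    final long[] counts;
    final double[] sums;

    // Build an equi-width histogram over [min, max] with the given number of buckets.
    HistogramSynopsisSketch(double[] values, int buckets, double min, double max) {
        this.min = min;
        this.width = (max - min) / buckets;
        this.counts = new long[buckets];
        this.sums = new double[buckets];
        for (double v : values) {
            int b = Math.min(buckets - 1, (int) ((v - min) / width));
            counts[b]++;
            sums[b] += v;
        }
    }

    // Approximate SUM over [lo, hi]: include every bucket that overlaps the range.
    double approxSum(double lo, double hi) {
        double total = 0;
        for (int b = 0; b < counts.length; b++) {
            double bLo = min + b * width, bHi = bLo + width;
            if (bHi > lo && bLo < hi) total += sums[b];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 10, 11, 20};
        HistogramSynopsisSketch h = new HistogramSynopsisSketch(values, 4, 0, 20);
        System.out.println(Arrays.toString(h.counts));   // bucket occupancies
        System.out.println(h.approxSum(0, 5));           // approximate answer: 6.0
    }
}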
SYSTEM DIAGRAM
(Figure: overall architecture, showing the SERVER, provider clients (P.CLIENT) publishing multidimensional data, normal clients (N.CLIENT), and a requesting client (R.CLIENT) that is authenticated before requesting.)
(Figure: query flow, showing the clients holding sub-synopses, a requesting R.client, the checking server, and the resource providing step.)
Class diagram:
State diagram:
Client
import java.io.*;
import java.net.*;
import java.util.*;
Socket s;
DatagramPacket dp;
int count=0;
public NewClient() {
initComponents();
setSize(529, 407);
setVisible(true);
e.printStackTrace(); } }
jPanel1.setBackground(new java.awt.Color(51, 51, 51)); jPanel1.setBorder(javax.swing.BorderFactory.createLineBorder(new java.awt.Color(0, 0, 0))); jPanel1.setForeground(new java.awt.Color(51, 51, 0)); jPanel1.setFont(new java.awt.Font("Tahoma", 0, 18)); jPanel1.setLayout(null);
jPanel1.add(jLabel1);
list1.addActionListener(new java.awt.event.ActionListener() { public void actionPerformed(java.awt.event.ActionEvent evt) { list1ActionPerformed(evt); } }); jPanel1.add(list1); list1.setBounds(150, 110, 210, 200);
jLabel2.setForeground(new java.awt.Color(255, 255, 255)); jLabel2.setText("Files given by provider"); jPanel1.add(jLabel2); jLabel2.setBounds(150, 90, 210, 14);
private void list1ActionPerformed(java.awt.event.ActionEvent evt) { // TODO add your handling code here: }
InetAddress in = InetAddress.getLocalHost();
s = new Socket("spiro10",1111);
OutputStream op = s.getOutputStream();
ObjectOutputStream oo = new ObjectOutputStream(op); // assumed: object stream over the socket output, since oo is used below
oo.writeObject("1");
oo.writeObject("/");
oo.writeObject(name);
oo.writeObject(port);
torecieve();
while(true) {
s = ds.accept();
InputStream in = s.getInputStream();
ObjectInputStream ob = new ObjectInputStream(in); // assumed: object stream over the socket input, since ob is used below
list = (List) ob.readObject();
(new File("E:\\providerfiles1\\")).mkdir();
while(iter.hasNext()) {
b = content.getBytes();
fs.write(b);
} } }
if(iter.hasNext()) {
fis.read(b1);
// Variables declaration - do not modify private javax.swing.JLabel jLabel1; private javax.swing.JLabel jLabel2; private javax.swing.JPanel jPanel1; private java.awt.List list1; // End of variables declaration
Provider
import java.awt.FileDialog;
String s1, s, g;
Provider p;
Partition pt;
Compression pt1;
int i;
Scatter sc;
Thread t;
public Provider()
{ // System.out.println("sedond line");
t = new Thread(this);
t.start();
//
initComponents();
jPanel1 = new javax.swing.JPanel(); jLabel1 = new javax.swing.JLabel(); jPanel2 = new javax.swing.JPanel(); jButton1 = new javax.swing.JButton(); jButton6 = new javax.swing.JButton(); jButton5 = new javax.swing.JButton(); jButton3 = new javax.swing.JButton(); jTextField1 = new javax.swing.JTextField(); jComboBox1 = new javax.swing.JComboBox(); jButton4 = new javax.swing.JButton(); jButton7 = new javax.swing.JButton(); jButton2 = new javax.swing.JButton();
jLabel1.setIcon(new javax.swing.ImageIcon( "E:\\Datamining project\\images\\provider.jpg")); // NOI18N jLabel1.setText("jLabel1"); jPanel1.add(jLabel1); jLabel1.setBounds(30, 20, 500, 140);
jButton6.setText("SCATTER"); jButton6.addActionListener(new java.awt.event.ActionListener() { public void actionPerformed(java.awt.event.ActionEvent evt) { jButton6ActionPerformed(evt); } }); jPanel2.add(jButton6); jButton6.setBounds(390, 120, 120, 40);
public void actionPerformed(java.awt.event.ActionEvent evt) { jButton5ActionPerformed(evt); } }); jPanel2.add(jButton5); jButton5.setBounds(220, 90, 120, 40);
jButton3.setText("PARTITION"); jButton3.addActionListener(new java.awt.event.ActionListener() { public void actionPerformed(java.awt.event.ActionEvent evt) { jButton3ActionPerformed(evt); } }); jPanel2.add(jButton3); jButton3.setBounds(40, 90, 120, 40);
jTextField1.addActionListener(new java.awt.event.ActionListener() {
public void actionPerformed(java.awt.event.ActionEvent evt) { jTextField1ActionPerformed(evt); } }); jPanel2.add(jTextField1); jTextField1.setBounds(40, 20, 310, 30);
jComboBox1.setModel(new javax.swing.DefaultComboBoxModel( new String[] { "connected nodes" })); jComboBox1.addActionListener(new java.awt.event.ActionListener() { public void actionPerformed(java.awt.event.ActionEvent evt) { jComboBox1ActionPerformed(evt); } }); jPanel2.add(jComboBox1); jComboBox1.setBounds(220, 160, 120, 40);
jButton4.setText("SEARCH"); jButton4.addActionListener(new java.awt.event.ActionListener() { public void actionPerformed(java.awt.event.ActionEvent evt) { jButton4ActionPerformed(evt); } }); jPanel2.add(jButton4); jButton4.setBounds(40, 160, 120, 40);
jButton2.setText("SCATTER "); jButton2.addActionListener(new java.awt.event.ActionListener() { public void actionPerformed(java.awt.event.ActionEvent evt) { jButton2ActionPerformed(evt); } }); jPanel1.add(jButton2); jButton2.setBounds(390, 420, 120, 40);
synop.setVisible(true);
synop.textarea(); }
lsis.addAll(list);
try { sc.scatterbutton(); }
catch (Exception e) {
e.printStackTrace(); } }
sc = new Scatter(2);
list2 = filemanipulation();
pt1.setVisible(true);
try { pt1.displayvalues(list2);
} catch (Exception e) {
e.printStackTrace(); }
fs.setVisible(true);
s = fs.getDirectory();
stsic = s;
i = s.length();
//
System.out.println("2222222222222222222222222" + i);
s1 = s.substring(0, (i - 1));
jTextField1.setText(s1);
pt = new Partition(s);
list2 = filemanipulation();
pt.setVisible(true);
try { pt.displayvalues(list2);
} catch (Exception e) {
e.printStackTrace(); }
System.out.println("33333333333333333333333333");
System.out.println("33333333333333333333333333");
System.out.println("33333333333333333333333333"); }
private void jTextField1ActionPerformed(java.awt.event.ActionEvent evt) { // TODO add your handling code here: }
p.setSize(570, 500);
p.setVisible(true);
try {
list.add(name);
private javax.swing.JButton jButton1; private javax.swing.JButton jButton2; private javax.swing.JButton jButton3; private javax.swing.JButton jButton4; private javax.swing.JButton jButton5; private javax.swing.JButton jButton6;
private javax.swing.JButton jButton7; private javax.swing.JComboBox jComboBox1; private javax.swing.JLabel jLabel1; private javax.swing.JPanel jPanel1; private javax.swing.JPanel jPanel2; private javax.swing.JTextField jTextField1;
Scatter
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.net.*;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.Vector;
ServerSocket ds;
Socket s ;
Set set;
List<String> cont,l2;
Map m;
String name;
int port;
String[] client;
Scatter() {
Scatter(int r) { toscatter(); }
int count = 0;
try { sserver();
e1.printStackTrace(); }
while(true) {
//
try { System.out.println("next");
s = ds.accept();
InputStream is = s.getInputStream();
name = (String)oi.readObject();
port = (Integer)oi.readObject();
m = new TreeMap();
m.put(port, name);
list.add(port);
} }
l3 = ds.getCont(cont.get(i));
System.out.println(l3);
clients.add(cli);
System.out.println(address+"+"+clientport);
readfiles(address,clientport,l3);
} //} }
public void readfiles(String s1,int port,List l3) throws Exception { String files = null;
Iterator l1 = l3.iterator();
System.out.println("th elmnzdniasndfajdfaefabfujbaefdhafjadiofhadefklnbiabf"+l3);
s= new Socket("192.168.0.10",port);
while(l1.hasNext())
{ files = (String)l1.next();
System.out.println(filecon+"novem");
fis.read(b);
// System.out.println("suresh "+fis+"/n"+fullfile);
list.add(fullfile);
} OutputStream os = s.getOutputStream();
od.writeObject(list);
od.writeObject(files);
cont.add("TEXTFILES");
cont.add("DOCUMENTFILES");
cont.add("JAVAFILES");
cont.add("JPEGFILES");
cont.add("CLASSFILES");
return list1; } public static void main(String args[]) { Scatter sc = new Scatter(3);
Screen shots:
SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, subassemblies, assemblies, and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of tests, each of which addresses a specific testing requirement.
TYPES OF TESTS
Unit testing
Unit testing involves the design of test cases that validate that the internal program logic is functioning properly, and that program input produces valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application .it is done after the completion of an individual unit before integration. This is a structural testing, that relies on knowledge of its construction and is invasive. Unit tests perform basic tests at component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.
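As a hypothetical example, a JUnit 5 unit test for the provider's partitioning step might check that every record ends up in exactly one portion; the split routine below is a stand-in for illustration, not the project's actual code:

import static org.junit.jupiter.api.Assertions.assertEquals;
import java.util.ArrayList;
import java.util.List;
import org.junit.jupiter.api.Test;

class PartitionTest {
    // Stand-in for the provider's partitioning routine (illustrative only).
    static List<List<String>> split(List<String> records, int parts) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < parts; i++) out.add(new ArrayList<>());
        for (int i = 0; i < records.size(); i++) out.get(i % parts).add(records.get(i));
        return out;
    }

    @Test
    void everyRecordAppearsInExactlyOnePortion() {
        List<String> records = List.of("r1", "r2", "r3", "r4", "r5");
        int total = split(records, 2).stream().mapToInt(List::size).sum();
        assertEquals(records.size(), total);
    }
}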
Integration testing
Integration tests are designed to test integrated software components to determine if they actually run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.
Functional test
Functional tests provide a systematic demonstration that functions tested are available as specified by the business and technical requirements, system documentation, and user manuals. Functional testing is centered on the following items:
Valid input: identified classes of valid input must be accepted.
Invalid input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be exercised.
Organization and preparation of functional tests is focused on requirements, key functions, or special test cases. In addition, systematic coverage pertaining to identified business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.
System Test
System testing ensures that the entire integrated software system meets requirements. It tests a configuration to ensure known and predictable results. An example of system testing is the configuration oriented system integration test. System testing is based on process descriptions and flows, emphasizing pre-driven process links and integration points.
Unit Testing:
Unit testing is usually conducted as part of a combined code and unit test phase of the software lifecycle, although it is not uncommon for coding and unit testing to be conducted as two distinct phases.
Integration Testing
Software integration testing is the incremental integration testing of two or more integrated software components on a single platform to produce failures caused by interface defects. The task of the integration test is to check that components or software applications, e.g., components in a software system or, one step up, software applications at the company level, interact without error.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant participation by the end user. It also ensures that the system meets the functional requirements.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
Conclusion:
Thus the multidimensional data in unstructured peer-to-peer networks are retrieved efficiently using indexing and are managed effectively.