2019 DatabaseII MiniProject 1 28700
2019 DatabaseII MiniProject 1 28700
2019 DatabaseII MiniProject 1 28700
Background
The course projects are worth 20%. Mini-Project 1 (this one) is worth 12%. No best policy.
Team Formation
This assignment should be done in teams of FOUR or FIVE (any other number not ok!). You can
make a team cross-tutorial and cross-major. If you cannot find a team, contact your TA ASAP,
and you will be synched with other students. The link to the teams form is:
https://goo.gl/forms/ZX63a1wpGS37pTao1
The deadline for team assignment will be 22nd of Feb at 11:50 PM.
Environment
You must use Java to develop this project. You can use any IDE of your choice such as Eclipse.
Submissions
Submissions to be made electronically via MET website.
Description
In this project, you are going to build a small database engine with support for a Bitmap index.
The required functionalities are 1) creating tables, 2) inserting tuples, 3) deleting tuples, 4)
searching in tuples, 5) creating a bitmap index, 6) searching using bitmap index.
The description below is numbered for ease of communication between you and course staff.
Tables
1) Each table/relation will be stored as pages on disk.
2) Supported type for a table’s column is one of: java.lang.Integer, java.lang.String,
java.lang.Double, java.lang.Boolean and java.util.Date
3) Each table should have an additional column beside those specified by the user. The
additional column must be called TouchDate and is initialized with the date/time of row
insertion and updated with current date/time every time a row is updated.
Pages
4) A page has a predetermined fixed maximum number of rows (N). For example, if a
table has 40000 tuples, and N=200, the table will be stored in 200 binary files.
5) You are required to use Java’s binary object file (.class) for emulating a page (to avoid
having you work with file system pages, which is not the scope of this course). A single
page must be stored as a serialized Vector (java.util.Vector). Note that you can save/load
any Java object to/from disk by implementing the java.io.Serializable interface. You
don’t actually add any code to your class. For more info, check these:
https://www.tutorialspoint.com/java/java_serialization.htm
6) A single tuple should be stored in a separate object inside the binary file.
7) You need to postpone the loading of a page until the tuples in that page are actually
needed. Note that the purpose of using pages is to avoid loading the entire table’s content
into memory. Hence, it defeats the purpose to load all pages upon program startup.
8) If all the rows in a page are deleted, then you are required to delete that page. Do not
keep around completely empty pages.
9) You might find it useful to create a Table java class to store relevant information about
the pages and serialize it just like you serialize a page. Note that to prevent serializing an
attribute, you will need to use the transient keyword in your attribute declaration. Read
more about that here:
https://en.wikibooks.org/wiki/Java_Programming/Keywords/transient
Meta-Data File
10) Each user table has meta data associated with it; number of columns, data type of
columns, which columns have indices built on them.
11) You will need to store the meta-data in a text file. This structure should have the
following layout: Table Name, Column Name, Column Type, Key, Indexed
For example, if a user creates a table/relation CityShop, specifying several attributes with
their types, etc… the file will be:
The above meta data teaches that there are 1 table of 7 tuples (ID, Name, X,Y,Z,
Specialization, Address). There are 4 bitmap indices created on this table
CityShop.
12) You must store the above metadata in a single file called metadata.csv. Do not worry
about its size in your solution.
13) You must use the metadata.csv file to learn about the types of the data being passed
and verify it is of the correct type. So, do not treat metadata.csv as decoration!
14) You can (but not required to) use reflection to load the data type and also value of a
column, for example:
strColType = "java.lang.Integer";
strColValue = "100";
Class class = Class.forName( strColType );
Constructor constructor = class.getConstructor( ….);
… = constructor.newInstance( );
For more info on reflection, check this article:
http://download.oracle.com/javase/tutorial/reflect/
Bitmap Index
15) You are required to support creating bitmap index on a specified column.
16) You are going to implement your own Bitmap Index data structure.
17) The keys in the index must be sorted.
18) Note that building a bitmap index on a column that is not used to cluster the table
(aka sort the table pages) will force you to create a dense index first and then use this
dense level in building the bitmap index.
19) Each bitmap page is an array of entry objects. Each entry consists of a column value,
and the bitmap.
20) The Bitmap index should be paged just like the table is paged!. For example if an
index page has a predetermined maximum number (M), and an index has 40000 tuples,
and M=2000, the index will be stored in 20 binary files.
21) When a table is created, you do not need to create an index. An index will be created
later on when the user requests that through a method call to createBitmapIndex.
22) You should update existing relevant indices when a tuple is inserted/deleted/updated.
23) Upon application startup; to avoid having to scan all tables to build existing indices,
you should save the index itself to disk and load it when the application starts next time.
24) The Bitmap (sequence of 1’s and 0’s) should be compressed to avoid wasting space.
You can use run length encoding or any other more sophisticated compression algorithm.
Java’s GZIPOutputStream or ZIPOutputStream are also acceptable.
25) Note that once an index exists, it should be used in executing queries. Hence, if an
index did not exist on a column that is being queried (select * from x = 20;), then it will
be answered using linear scan on x, but if an index is created on x, then x’s bitmap index
should be used to find if there are tuples with x=20;
26) Note that Bitmap indices should be used in answering multi-dimension queries
whether the bitmap indices have been created on all or some of columns used in the query
(i.e. if the query is on Table1.column1 and Table1.column2, and an index has been
created on Table.column1, then it should be used in answering the queries.
Required Methods/Class
27) Your main class should be called DBApp.java and should have the following seven
methods with the signature as specified. The parameters names are written using
Hungarian notation -- which you are not required to use but it does enhances the
readability of your code, so it is a good idea to try it out! You can add any other helper
methods and classes. Note that these 7 methods are the only way to run your Database
engine. You are not required to develop code that process SQL statements directly
because that requires knowledge beyond this course and it is not a learning outcome from
this course!
public void init( ); // this does whatever initialization you would like
// or leave it empty if there is no code you want to
// execute at application startup
Here is an example code that creates a table, creates an index, does few inserts, and a select;
String strTableName = "Student";
Hashtable htblColNameType = new Hashtable( );
htblColNameType.put("id", "java.lang.Integer");
htblColNameType.put("name", "java.lang.String");
htblColNameType.put("gpa", "java.lang.double");
createTable( strTableName, "id", htblColNameType );
createBitmapIndex( strTableName, "gpa" );
htblColNameValue.clear( );
htblColNameValue.put("id", new Integer( 453455 ));
htblColNameValue.put("name", new String("Ahmed Noor" ) );
htblColNameValue.put("gpa", new Double( 0.95 ) );
insertIntoTable( strTableName , htblColNameValue );
htblColNameValue.clear( );
htblColNameValue.put("id", new Integer( 5674567 ));
htblColNameValue.put("name", new String("Dalia Noor" ) );
htblColNameValue.put("gpa", new Double( 1.25 ) );
insertIntoTable( strTableName , htblColNameValue );
htblColNameValue.clear( );
htblColNameValue.put("id", new Integer( 23498 ));
htblColNameValue.put("name", new String("John Noor" ) );
htblColNameValue.put("gpa", new Double( 1.5 ) );
insertIntoTable( strTableName , htblColNameValue );
htblColNameValue.clear( );
htblColNameValue.put("id", new Integer( 78452 ));
htblColNameValue.put("name", new String("Zaky Noor" ) );
htblColNameValue.put("gpa", new Double( 0.88 ) );
insertIntoTable( strTableName , htblColNameValue );
SQLTerm[] arrSQLTerms;
arrSQLTerms = new SQLTerm[2];
arrSQLTerms[0]._strTableName = "Student";
arrSQLTerms[0]._strColumnName= "name";
arrSQLTerms[0]._strOperator = "=";
arrSQLTerms[0]._objValue = "John Noor";
arrSQLTerms[1]._strTableName = "Student";
arrSQLTerms[1]._strColumnName= "gpa";
arrSQLTerms[1]._strOperator = "=";
arrSQLTerms[1]._objValue = new Double( 1.5 );
config
DBApp.config
libs
src
teamname
DBApp.java
DBAppTest.java
Makefile
37) teamname is your team name! (choose any name you like!)
38) data directory contains the important metadata.csv which holds meta information
about the user created tables. Also, it will store binary files storing user table pages,
indices, and any other data related files you need to store.
39) docs directory contains html files generated by running javadoc on your source code
40) src directory is a the parent directory of all your java source files. You should use
teamname as the parent package of your files. You can add other Java source files you
write here.
41) classes directory is the parent directory of all your java class files. When you run
make all, the java executables should be generated in classes.
42) libs directory contains any third-party libraries/jar-files/java-classes. Those are the
ones you did not write and using in the assignment.
43) config directory contains the important configuration DBApp.properties which holds
a two parameters as key=value pairs
MaximumRowsCountinPage = 200
BitmapSize = 15
Where
MaximumRowsCountinPage as the name
indicates specifies the maximum number
of rows in a page.
BitmapSize specifies the count of entries
that could be stored in a single Bitmap
file.
44) DBApp.properties file could be read using java.util.Properties class
45) You can add other parameters to this file as per need.
46) Makefile is a text file for the make building utility on linux ! If you are going to
provide make, have make all and make clean targets. Else, you can use maven or ant
and submit maven/ant file instead.
Submission Milestones
There are two mini-submissions in this assignment as listed below. You are required to make
each submission on time/date. Below is a detailed description of each submission.
Submission # Date Submission Focus Methods to be
implemented (or
changed) in
DBApp.java