I am using
drop table <table_name>
If I recreate the table with the same schema and name, I get the old data back. Should I remove the table directory from HDFS to get rid of the data completely?
You have to convert the table from external to internal (managed) before dropping it.
Example:
beeline> ALTER TABLE $tablename SET TBLPROPERTIES('EXTERNAL'='FALSE'); -- convert the table to an internal (managed) table
and then:
beeline> DROP TABLE $tablename; -- dropping a managed table also deletes its data
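To check that the conversion actually took effect before dropping, something like the following could be used. This is only a sketch: `mydb.events` is a made-up table name, and on some newer Hive setups (for example with strict managed tables enabled) the conversion itself may be rejected.

```sql
-- Hypothetical table name, used only for illustration.
-- Before the change, the output should contain "Table Type: EXTERNAL_TABLE".
DESCRIBE FORMATTED mydb.events;

-- Turn the external table into a managed (internal) one.
ALTER TABLE mydb.events SET TBLPROPERTIES('EXTERNAL'='FALSE');

-- After the change, the table type should read MANAGED_TABLE.
DESCRIBE FORMATTED mydb.events;

-- Dropping a managed table removes the metadata and the data files.
DROP TABLE mydb.events;
```

If the second DESCRIBE FORMATTED still shows EXTERNAL_TABLE, the drop will leave the files in place.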
First, get the path of the table using the following command:
hive> describe formatted database_name.table_name;
Then copy the full location that appears in the description, for example: /user/hive/warehouse/database_name.db/table_name
After this, use one of the following commands to delete all the data of the given table:
hive> dfs -rmr /user/hive/warehouse/database_name.db/table_name;
OR (on newer Hadoop versions, where -rmr is deprecated)
hive> dfs -rm -r /user/hive/warehouse/database_name.db/table_name;
Then you can remove the table itself using the DROP TABLE command.
Although I agree with pensz, one slight alteration: you need not drop the table. Just replace the external HDFS file with whichever new file you want (the structure of the new file should be the same), and when you run select * on the table again, you will see that it returns the new data and not the old.
External tables basically only describe the schema of the data and the location of the files. You can add many files to the same location, and your table will automatically contain all the data in these files. Similarly, you can replace any of the files, and your table will automatically reflect the change.
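A small sketch of why the old data comes back after a drop and recreate. The table name `demo_logs`, its columns, and the path `/data/demo_logs` are made up for illustration; the point is only that the table definition is a pointer to a directory.

```sql
-- Hypothetical table, columns, and location.
CREATE EXTERNAL TABLE demo_logs (id INT, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/demo_logs';

-- Reads whatever files currently sit in /data/demo_logs.
SELECT * FROM demo_logs;

-- Removes only the metadata; the files in /data/demo_logs stay where they are.
DROP TABLE demo_logs;

-- Recreating the table over the same directory picks the old files up again.
CREATE EXTERNAL TABLE demo_logs (id INT, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/demo_logs';

-- Returns the same rows as before the drop.
SELECT * FROM demo_logs;
```

Pointing the second CREATE at a different, empty directory would give an empty table instead.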
There is no need to remove the directory in HDFS unless you need more HDFS space.
If you want to load new data, you just need to replace the file in HDFS.
If you want to reuse the table name for something else, then drop the table and remove the directory in HDFS.
In fact, I think this is a very handy feature: you can change your table's schema (for instance, rename a field or concatenate two fields into one) without losing any data.
If it is an external table, dropping the table only deletes the schema,
so you have to delete the files from HDFS manually,
or create the new table with a different file location (the LOCATION clause of CREATE TABLE).
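Indeed, dropping an EXTERNAL TABLE won't delete its data.
You can use TRUNCATE TABLE to get rid of the data, but note that in most Hive versions TRUNCATE TABLE only works on managed (internal) tables, so an external table has to be converted to managed first.
Doc here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-TruncateTable
Then use DROP TABLE to delete the schema if needed.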
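If the goal is to empty the table but keep its definition, one possible sequence is sketched below. It assumes a hypothetical table `demo_logs` and a Hive version that allows switching the EXTERNAL property back and forth; this may not hold on every installation.

```sql
-- Hypothetical table name.
-- TRUNCATE generally requires a managed table, so convert first.
ALTER TABLE demo_logs SET TBLPROPERTIES('EXTERNAL'='FALSE');

-- Removes all rows (the underlying files) but keeps the table definition.
TRUNCATE TABLE demo_logs;

-- Optional: make the table external again so later drops leave new files alone.
-- The value is written in upper case because older Hive versions compare it case-sensitively.
ALTER TABLE demo_logs SET TBLPROPERTIES('EXTERNAL'='TRUE');
```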