Saturday, 26 January 2019

make hive transactional

Here we are going to see how to make Hive transactional.


Hive supports ACID transactions from version 0.14 onwards, but they are not enabled out of the box; the following properties have to be set to make transactional tables work:



 hive.support.concurrency – true
 hive.enforce.bucketing – true
 hive.exec.dynamic.partition.mode – nonstrict
 hive.txn.manager –org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
 hive.compactor.initiator.on – true
 hive.compactor.worker.threads – 1
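If you just want to experiment before editing hive-site.xml, most of these can also be set for the current session from the Hive CLI (a sketch using the same values as above):

```sql
-- Session-level equivalents of the hive-site.xml properties
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
-- The compactor runs inside the metastore service, so these two really
-- belong in hive-site.xml for compaction to actually happen:
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;
```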


To enable transactions permanently, add the following properties to hive-site.xml:

<property>
    <name>hive.support.concurrency</name>
    <value>true</value>
</property>
<property>
    <name>hive.enforce.bucketing</name>
    <value>true</value>
</property>
<property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
</property>
<property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
    <name>hive.compactor.initiator.on</name>
    <value>true</value>
</property>
<property>
    <name>hive.compactor.worker.threads</name>
    <value>1</value>
</property>



Only the root user has read-write access to this file. I was logged in as the cloudera user, not root, so I had to switch to root:


[cloudera@quickstart Desktop]$ who
cloudera tty1         2019-01-14 00:39 (:0)
cloudera pts/0        2019-01-24 21:51 (:0.0)
cloudera pts/1        2019-01-24 21:53 (:0.0)
[cloudera@quickstart Desktop]$ su root
Password:
[root@quickstart Desktop]# vi /etc/hive/conf/hive-site.xml

Once I executed the command above, I had edit access and pasted the properties shown earlier into the file, just before the </configuration> tag.


This table is used for staging the raw data:

create table stage_HiveTrans
  (EmployeeID Int, FirstName String, Designation String,
   Salary Int, Department String)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
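I populated the staging table from a comma-separated file; the path and file name below are just placeholders, substitute wherever your data actually lives:

```sql
-- Load a local CSV into the staging table
-- (path is an assumption; replace with your own file)
LOAD DATA LOCAL INPATH '/home/cloudera/employees.csv'
INTO TABLE stage_HiveTrans;
```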
    


HiveTrans is the transactional version of the same table; a transactional table must be bucketed and stored as ORC:

create table HiveTrans
  (EmployeeID Int, FirstName String, Designation String,
   Salary Int, Department String)
clustered by (Department) into 3 buckets
stored as orc TBLPROPERTIES ('transactional'='true');


Copy the data over from the staging table:

from stage_HiveTrans
insert into table HiveTrans
select employeeid, firstname, designation, salary, department;
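To confirm the table really is transactional, check its table parameters:

```sql
-- Verify the transactional flag on the table;
-- look for 'transactional'='true' under Table Parameters in the output
DESCRIBE FORMATTED HiveTrans;
```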


My table before the update:

hive> select * from HiveTrans;
OK
104 sunil web 60000 MGMT  
103 sam web 30000 COR
102 mani web 35000 COR
101 satheesh bigdata 40000 COR

Now update one of the rows:

hive> update HiveTrans 
    >     set salary = 50000
    >     where employeeid = 103;
Query ID = cloudera_20190124224949_cb23a524-bcba-4952-b832-de0d642bbe53
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1547455371417_0143, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1547455371417_0143/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1547455371417_0143
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 3
2019-01-24 22:49:27,712 Stage-1 map = 0%,  reduce = 0%
2019-01-24 22:50:27,858 Stage-1 map = 0%,  reduce = 0%
2019-01-24 22:50:56,777 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 11.63 sec
2019-01-24 22:51:43,758 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 13.86 sec
2019-01-24 22:51:57,149 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 19.48 sec
2019-01-24 22:52:05,677 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 21.53 sec
MapReduce Total cumulative CPU time: 21 seconds 530 msec
Ended Job = job_1547455371417_0143
Loading data to table satheesh.hivetrans
Table satheesh.hivetest stats: [numFiles=5, numRows=4, totalSize=3150, rawDataSize=0]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 3  Reduce: 3   Cumulative CPU: 21.53 sec   HDFS Read: 34868 HDFS Write: 1017 SUCCESS
Total MapReduce CPU Time Spent: 21 seconds 530 msec
OK
Time taken: 170.966 seconds

Check whether the row was updated:

hive> select * from HiveTrans;
OK
104 sunil web 60000 MGMT
103 sam web 50000 COR
102 mani web 35000 COR
101 satheesh bigdata 40000 COR


We can also delete rows in the same way; please try it yourself. Follow me for further updates related to Hive.
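A delete looks just like the update; for example, to remove one employee:

```sql
-- Delete a single row from the transactional table
DELETE FROM HiveTrans
WHERE employeeid = 104;
```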


Thanks,
saisma












