Here we are going to see how to make Hive transactional.
Hive ships with transactional (ACID) support from version 0.14 onward, but it is not enabled by default; we have to set the following properties to turn it on:
hive.support.concurrency – true
hive.enforce.bucketing – true
hive.exec.dynamic.partition.mode – nonstrict
hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on – true
hive.compactor.worker.threads – 1
To enable transactions, we have to add the properties below to hive-site.xml:
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.enforce.bucketing</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
<name>hive.compactor.initiator.on</name>
<value>true</value>
</property>
<property>
<name>hive.compactor.worker.threads</name>
<value>1</value>
</property>
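Alternatively, for a quick test the same settings can be applied for the current session from the Hive shell instead of editing hive-site.xml. A minimal sketch (note that the compactor properties are really server-side settings, so hive-site.xml remains the reliable place for those two):

```sql
-- Session-level equivalents of the hive-site.xml entries above;
-- these last only for the current Hive session.
set hive.support.concurrency=true;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;
```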
Only the root user has read/write access to this file. I was logged in as the cloudera user, not root, so we have to switch to root:
[cloudera@quickstart Desktop]$ who
cloudera tty1 2019-01-14 00:39 (:0)
cloudera pts/0 2019-01-24 21:51 (:0.0)
cloudera pts/1 2019-01-24 21:53 (:0.0)
[cloudera@quickstart Desktop]$ su root
Password:
[root@quickstart Desktop]# vi /etc/hive/conf/hive-site.xml
Once I executed the above command I had edit access, so I pasted the same <property> block shown earlier into the file, just before the closing </configuration> tag, and saved it.
This table serves as a staging table for the delimited input data:
create table stage_HiveTrans
(EmployeeID Int,FirstName String,Designation String,
Salary Int,Department String)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
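Before inserting into the transactional table, the staging table needs data. A minimal sketch, assuming a comma-separated file at a hypothetical path /home/cloudera/Desktop/employees.csv:

```sql
-- Load a local CSV file into the staging table
-- (the file path here is a hypothetical example).
LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/employees.csv'
INTO TABLE stage_HiveTrans;
```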
HiveTrans is the same table made ORC and transactional (ACID tables must be bucketed and stored as ORC):
create table HiveTrans
(EmployeeID Int,FirstName String,Designation String,
Salary Int,Department String)
clustered by (Department) into 3 buckets
stored as orc TBLPROPERTIES ('transactional'='true');
Populate it from the staging table:
from stage_HiveTrans
insert into table HiveTrans
select employeeid,firstname,designation,salary,department;
My table before the update:
hive> select * from HiveTrans;
OK
104 sunil web 60000 MGMT
103 sam web 30000 COR
102 mani web 35000 COR
101 satheesh bigdata 40000 COR
Update one of the rows:
hive> update HiveTrans
> set salary = 50000
> where employeeid = 103;
Query ID = cloudera_20190124224949_cb23a524-bcba-4952-b832-de0d642bbe53
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1547455371417_0143, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1547455371417_0143/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1547455371417_0143
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 3
2019-01-24 22:49:27,712 Stage-1 map = 0%, reduce = 0%
2019-01-24 22:50:27,858 Stage-1 map = 0%, reduce = 0%
2019-01-24 22:50:56,777 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 11.63 sec
2019-01-24 22:51:43,758 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 13.86 sec
2019-01-24 22:51:57,149 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 19.48 sec
2019-01-24 22:52:05,677 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 21.53 sec
MapReduce Total cumulative CPU time: 21 seconds 530 msec
Ended Job = job_1547455371417_0143
Loading data to table satheesh.hivetrans
Table satheesh.hivetrans stats: [numFiles=5, numRows=4, totalSize=3150, rawDataSize=0]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3 Reduce: 3 Cumulative CPU: 21.53 sec HDFS Read: 34868 HDFS Write: 1017 SUCCESS
Total MapReduce CPU Time Spent: 21 seconds 530 msec
OK
Time taken: 170.966 seconds
Checking whether the row was updated:
hive> select * from HiveTrans;
OK
104 sunil web 60000 MGMT
103 sam web 50000 COR
102 mani web 35000 COR
101 satheesh bigdata 40000 COR
We can also delete rows the same way; please try it yourself. Follow me for further updates related to Hive.
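For example, a delete on the same table would look like this (like updates, deletes only work on transactional ORC tables and also run as a MapReduce job):

```sql
-- Remove one employee from the transactional table
delete from HiveTrans
where employeeid = 104;
```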
Thanks,
saisma