Sunday, 14 April 2019

sqoop

Limitation:
1.It should have one unique column or if its not it should have date column


How to modify/reset incremental.last.value in a sqoop job
===========================================

At the initial stage I thought it was impossible to reset the incremental.last.value in a sqoop job. After creating a sqoop job for incremental import of data from sqlserver to hadoop, there comes a request to re-dump the data.

Although, flushing the previously ingested data in hadoop seems to be effortless using the command hdfs dfs -rm /location/to/dir/part*  the tough one comes after trying to execute the sqoop job again. The job executes without pulling any record because it sees no increment in the number of records. So, instead of removing the sqoop job and creating another one afresh, the way to go is to reset the incremental.last.value to 0.  And this can be done by changing the value of the last record in the sqoop metastore. The steps involves:


  1. Navigate to the home directory using cd ~
  2. Locate the sqoop metastore using ls -a
  3. You'll see a dir named .sqoop, cd into it with cd .sqoop
  4. vi metastore.db.script or nano metastore.db.script

Scroll all the way down to see the details of the most recent job executed. Then locate the line with 'incremental.last.value','xxxxxx','SqoopOptions') where xxxxxx represent the last record pulled. Then change the value to 0 or whatever number you want you next job execution to start with.
Save the file and execute the sqoop job again, I'm sure you'll be fine just like me.



Alternatively if the logs are missed. Use the below command

sqoop job --show <jobname> - It will list all the properties of the job

Refer to

incremental.last.value - This will contain the latest value of incremental job performed.

It will be updated each time when we run the job via sqoop job --exec <jobname>

No comments:

Post a Comment