What are good use cases for Impala as opposed to Hive or MapReduce?
Impala is well-suited to executing SQL queries for interactive exploratory analytics on large data sets. Hive and MapReduce are appropriate for very long running, batch-oriented tasks such as ETL.
Is MapReduce required for Impala? Will Impala continue to work as expected if MapReduce is stopped?
Impala does not use MapReduce at all.
Why Impala instead of hive ?
Hive is a platform used to develop SQL type scripts to do MapReduce operations.
Hive is useful for batch processing.
Impala can be awesome for small ad-hoc queries.
Impala does not use map/reduce which are very expensive to fork in separate jvms.
It runs separate Impala Daemon which splits the query and runs them in parallel and merge result set at the end.
What is Impala?
Impala server is a distributed, massively parallel processing (MPP) database engine. Developed by Cloudera.
Based on Google 2010 dremel paper.
Direct access to data via impala engine.
It consists of different daemon processes that run on specific hosts within your CDH cluster.
No comments:
Post a Comment