Apache Hive - Amazon EMR

Apache Hive

Hive is an open-source data warehouse and analytics package that runs on top of a Hadoop cluster. Hive scripts use an SQL-like language called HiveQL (Hive Query Language) that abstracts programming models and supports typical data warehouse interactions. With Hive, you avoid the complexity of writing Tez jobs based on directed acyclic graphs (DAGs) or MapReduce programs in a lower-level language such as Java.
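As a rough illustration of how a short SQL-like statement stands in for a hand-written MapReduce or Tez job, the following Python sketch holds an aggregation query in a string and builds the argument list for the Hive command-line option `-e`, which runs a query passed on the command line. The table `page_views`, its columns, and the helper `build_hive_command` are hypothetical names chosen for this example.

```python
import shlex

# Hypothetical table and column names, used only for illustration.
TOP_COUNTRIES_QUERY = """
SELECT country, COUNT(*) AS views
FROM page_views
WHERE view_date >= '2024-01-01'
GROUP BY country
ORDER BY views DESC
LIMIT 10
""".strip()


def build_hive_command(query: str) -> list[str]:
    """Return an argument list that runs one query string via `hive -e`."""
    return ["hive", "-e", query]


command = build_hive_command(TOP_COUNTRIES_QUERY)
# Print the shell-quoted form; on a cluster node the list could instead
# be handed to subprocess.run(command, check=True).
print(shlex.join(command))
```

Hive compiles the grouping, sorting, and limit into the underlying execution jobs, so none of that logic has to be written by hand in Java.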

Hive extends the SQL paradigm by including serialization formats. You can also customize query processing by creating table schemas that match your data, without touching the data itself. While SQL only supports primitive value types (such as dates, numbers, and strings), Hive table values are structured elements, such as JSON objects, any user-defined data type, or any function written in Java.
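To make the idea of a schema laid over existing data more concrete, the following Python sketch renders a `CREATE EXTERNAL TABLE` statement that reads JSON files in place through a serialization library (SerDe). It is a minimal sketch under several assumptions: the table name, columns, and Amazon S3 location are hypothetical, the helper `external_json_table_ddl` is invented for this example, and the SerDe class `org.apache.hive.hcatalog.data.JsonSerDe` is assumed to be on the cluster's classpath.

```python
def external_json_table_ddl(table: str, columns: dict[str, str], location: str) -> str:
    """Render a CREATE EXTERNAL TABLE statement for JSON files stored at `location`.

    Inputs are trusted and inserted as-is; this sketch does no escaping.
    """
    column_lines = ",\n".join(
        f"  {name} {hive_type}" for name, hive_type in columns.items()
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{column_lines}\n"
        ")\n"
        # The SerDe tells Hive how to deserialize each JSON record into columns.
        "ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'\n"
        # EXTERNAL plus LOCATION leaves the underlying files untouched.
        f"LOCATION '{location}'"
    )


# Hypothetical schema mixing primitive and structured column types.
ddl = external_json_table_ddl(
    "events",
    {
        "user_id": "BIGINT",
        "event": "STRING",
        "tags": "ARRAY<STRING>",
        "device": "STRUCT<os:STRING, version:STRING>",
    },
    "s3://example-bucket/events/",
)
print(ddl)
```

Because the table is external, dropping it removes only the metadata, and the `ARRAY` and `STRUCT` columns show how structured values can sit alongside primitive ones.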

For more information about Hive, see http://hive.apache.org/.

The following table lists the version of Hive included in the latest release of the Amazon EMR 7.x series, along with the components that Amazon EMR installs with Hive.

For the version of components installed with Hive in this release, see Release 7.0.0 Component Versions.

Hive version information for emr-7.0.0

Amazon EMR release label: emr-7.0.0

Hive version: Hive 3.1.3

Components installed with Hive: emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server

The following table lists the version of Hive included in the latest release of the Amazon EMR 6.x series, along with the components that Amazon EMR installs with Hive.

For the version of components installed with Hive in this release, see Release 6.15.0 Component Versions.

Hive version information for emr-6.15.0

Amazon EMR release label: emr-6.15.0

Hive version: Hive 3.1.3

Components installed with Hive: emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server

The following table lists the version of Hive included in the latest release of the Amazon EMR 5.x series, along with the components that Amazon EMR installs with Hive.

For the version of components installed with Hive in this release, see Release 5.36.1 Component Versions.

Hive version information for emr-5.36.1

Amazon EMR release label: emr-5.36.1

Hive version: Hive 2.3.9

Components installed with Hive: emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn

Beginning with Amazon EMR 5.18.0, you can use the Amazon EMR artifact repository to build your job code against the exact versions of libraries and dependencies that are available with specific Amazon EMR releases. For more information, see Checking dependencies using the Amazon EMR artifact repository.
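As a hedged sketch of how a build script might locate the repository for a given release, the following Python helper assembles a Maven repository URL from a Region and a release label, and rejects labels earlier than 5.18.0. The URL pattern is an assumption recalled from the artifact repository topic referenced above and should be verified there before use; the helper name `emr_artifact_repo_url` is hypothetical.

```python
import re

# The artifact repository is available beginning with Amazon EMR 5.18.0.
MIN_RELEASE = (5, 18, 0)


def emr_artifact_repo_url(region: str, release_label: str) -> str:
    """Build the assumed Maven repository URL for an EMR release label."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", release_label)
    if match is None:
        raise ValueError(f"unexpected release label: {release_label!r}")
    version = tuple(int(part) for part in match.groups())
    if version < MIN_RELEASE:
        raise ValueError(f"{release_label} predates the artifact repository")
    # Assumed URL pattern; confirm it against the artifact repository topic.
    return (
        f"https://s3.{region}.amazonaws.com/"
        f"{region}-emr-artifacts/{release_label}/repos/maven/"
    )


repo_url = emr_artifact_repo_url("us-west-1", "emr-7.0.0")
print(repo_url)
```

The resulting URL would go into the `repositories` section of a Maven or Gradle build so that dependency versions match those installed on the cluster.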