Configuring an external metastore for Hive - Amazon EMR

By default, Hive records metastore information in a MySQL database on the primary node's file system. The metastore contains a description of each table and the underlying data on which it is built, including partition names, data types, and so on. When a cluster terminates, all cluster nodes shut down, including the primary node. When this happens, local data is lost because node file systems use ephemeral storage. If you need the metastore to persist, you must create an external metastore that exists outside the cluster.

You have two options for an external metastore:

Note

If you're using Hive 3 and encounter too many connections to the Hive metastore, set the parameter datanucleus.connectionPool.maxPoolSize to a smaller value or increase the number of connections that the database server can handle. The increased number of connections is due to the way Hive computes the maximum number of JDBC connections. To calculate the optimal value for performance, see Hive Configuration Properties.
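
As an illustration of the note above, the following minimal sketch builds an Amazon EMR configuration object that sets datanucleus.connectionPool.maxPoolSize through the hive-site classification. The helper name hive_pool_size_config and the value 5 are assumptions chosen for the example, not recommended settings; choose a value based on Hive Configuration Properties and the connection limit of your database server.

```python
import json


def hive_pool_size_config(max_pool_size: int) -> list:
    """Build an EMR configuration list that caps the DataNucleus pool size.

    hive_pool_size_config is a hypothetical helper for this example.
    """
    if max_pool_size < 1:
        raise ValueError("max_pool_size must be a positive integer")
    return [
        {
            # "hive-site" is the EMR classification for hive-site.xml values.
            "Classification": "hive-site",
            "Properties": {
                # Property values in EMR configurations are strings.
                "datanucleus.connectionPool.maxPoolSize": str(max_pool_size)
            },
        }
    ]


if __name__ == "__main__":
    # 5 is an arbitrary illustrative value, not a tuned recommendation.
    print(json.dumps(hive_pool_size_config(5), indent=2))
```

You can save the printed JSON to a file and supply it as the cluster's software configuration when you create the cluster.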