Uso do HCatalog - Amazon EMR

Uso do HCatalog

Você pode usar o HCatalog em vários aplicativos que usam a metastore Hive. Os exemplos nesta seção mostram como criar uma tabela e usá-la no contexto do Pig e do Spark SQL.

Desabilitar a gravação direta ao usar o HStorer do HCatalog

Sempre que uma aplicação usar o HCatStorer para gravar em uma tabela do HCatalog armazenada no Amazon S3, desabilite o atributo de gravação direta do Amazon EMR. Por exemplo, desativar a gravação direta ao usar o comando STORE do Pig ou ao executar trabalhos do Sqoop que gravam tabelas do HCatalog no Amazon S3. Você pode desativar o recurso de gravação direta definindo as configurações mapred.output.direct.NativeS3FileSystem e mapred.output.direct.EmrFileSystem como false. O exemplo a seguir demonstra como definir essas configurações usando o Java.

Configuration conf = new Configuration(); conf.set("mapred.output.direct.NativeS3FileSystem", "false"); conf.set("mapred.output.direct.EmrFileSystem", "false");

Criar uma tabela usando a CLI do HCat e usar esses dados no Pig

Crie o seguinte script, impressions.q, no seu cluster:

CREATE EXTERNAL TABLE impressions ( requestBeginTime string, adId string, impressionId string, referrer string, userAgent string, userCookie string, ip string ) PARTITIONED BY (dt string) ROW FORMAT serde 'org.apache.hive.hcatalog.data.JsonSerDe' with serdeproperties ( 'paths'='requestBeginTime, adId, impressionId, referrer, userAgent, userCookie, ip' ) LOCATION 's3://[your region].elasticmapreduce/samples/hive-ads/tables/impressions/'; ALTER TABLE impressions ADD PARTITION (dt='2009-04-13-08-05');

Execute o script usando a CLI do HCat:

% hcat -f impressions.q Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties OK Time taken: 4.001 seconds OK Time taken: 0.519 seconds

Abra o shell do Grunt e acesse os dados em impressions:

% pig -useHCatalog -e "A = LOAD 'impressions' USING org.apache.hive.hcatalog.pig.HCatLoader(); B = LIMIT A 5; dump B;" <snip> (1239610346000,m9nwdo67Nx6q2kI25qt5On7peICfUM,omkxkaRpNhGPDucAiBErSh1cs0MThC,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; FunWebProducts; GTB6; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET,wcVWWTascoPbGt6bdqDbuWTPPHgOPs,69.191.224.234,2009-04-13-08-05) (1239611000000,NjriQjdODgWBKnkGJUP6GNTbDeK4An,AWtXPkfaWGOaNeL9OOsFU8Hcj6eLHt,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6; .NET CLR 1.1.4322),OaMU1F2gE4CtADVHAbKjjRRks5kIgg,57.34.133.110,2009-04-13-08-05) (1239610462000,Irpv3oiu0I5QNQiwSSTIshrLdo9cM1,i1LDq44LRSJF0hbmhB8Gk7k9gMWtBq,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; InfoPath.1),QSb3wkLR4JAIut4Uq6FNFQIR1rCVwU,42.174.193.253,2009-04-13-08-05) (1239611007000,q2Awfnpe0JAvhInaIp0VGx9KTs0oPO,s3HvTflPB8JIE0IuM6hOEebWWpOtJV,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; InfoPath.1),QSb3wkLR4JAIut4Uq6FNFQIR1rCVwU,42.174.193.253,2009-04-13-08-05) (1239610398000,c362vpAB0soPKGHRS43cj6TRwNeOGn,jeas5nXbQInGAgFB8jlkhnprN6cMw7,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6; .NET CLR 1.1.4322),k96n5PnUmwHKfiUI0TFP0TNMfADgh9,51.131.29.87,2009-04-13-08-05) 7120 [main] INFO org.apache.pig.Main - Pig script completed in 7 seconds and 199 milliseconds (7199 ms) 16/03/08 23:17:10 INFO pig.Main: Pig script completed in 7 seconds and 199 milliseconds (7199 ms)

Acessar a tabela usando o Spark SQL

Este exemplo cria um Spark DataFrame a partir da tabela criada no primeiro exemplo e mostra as primeiras 20 linhas:

% spark-shell --jars /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-1.0.0-amzn-3.jar <snip> scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc); scala> val df = hiveContext.sql("SELECT * FROM impressions") scala> df.show() <snip> 16/03/09 17:18:46 INFO DAGScheduler: ResultStage 0 (show at <console>:32) finished in 10.702 s 16/03/09 17:18:46 INFO DAGScheduler: Job 0 finished: show at <console>:32, took 10.839905 s +----------------+--------------------+--------------------+------------------+--------------------+--------------------+--------------+----------------+ |requestbegintime| adid| impressionid| referrer| useragent| usercookie| ip| dt| +----------------+--------------------+--------------------+------------------+--------------------+--------------------+--------------+----------------+ | 1239610346000|m9nwdo67Nx6q2kI25...|omkxkaRpNhGPDucAi...|cartoonnetwork.com|Mozilla/4.0 (comp...|wcVWWTascoPbGt6bd...|69.191.224.234|2009-04-13-08-05| | 1239611000000|NjriQjdODgWBKnkGJ...|AWtXPkfaWGOaNeL9O...|cartoonnetwork.com|Mozilla/4.0 (comp...|OaMU1F2gE4CtADVHA...| 57.34.133.110|2009-04-13-08-05| | 1239610462000|Irpv3oiu0I5QNQiwS...|i1LDq44LRSJF0hbmh...|cartoonnetwork.com|Mozilla/4.0 (comp...|QSb3wkLR4JAIut4Uq...|42.174.193.253|2009-04-13-08-05| | 1239611007000|q2Awfnpe0JAvhInaI...|s3HvTflPB8JIE0IuM...|cartoonnetwork.com|Mozilla/4.0 (comp...|QSb3wkLR4JAIut4Uq...|42.174.193.253|2009-04-13-08-05| | 1239610398000|c362vpAB0soPKGHRS...|jeas5nXbQInGAgFB8...|cartoonnetwork.com|Mozilla/4.0 (comp...|k96n5PnUmwHKfiUI0...| 51.131.29.87|2009-04-13-08-05| | 1239610600000|cjBTpruoaiEtqLuMX...|XwlohBSs8Ipxs1bRa...|cartoonnetwork.com|Mozilla/4.0 (comp...|k96n5PnUmwHKfiUI0...| 51.131.29.87|2009-04-13-08-05| | 1239610804000|Ms3eJHNAEItpxvimd...|4SIj4pGmgVLl625BD...|cartoonnetwork.com|Mozilla/4.0 (comp...|k96n5PnUmwHKfiUI0...| 51.131.29.87|2009-04-13-08-05| | 1239610872000|h5bccHX6wJReDi1jL...|EFAWIiBdVfnxwAMWP...|cartoonnetwork.com|Mozilla/4.0 (comp...|k96n5PnUmwHKfiUI0...| 51.131.29.87|2009-04-13-08-05| | 1239610365000|874NBpGmxNFfxEPKM...|xSvE4XtGbdtXPF2Lb...|cartoonnetwork.com|Mozilla/5.0 (Maci...|eWDEVVUphlnRa273j...| 22.91.173.232|2009-04-13-08-05| | 1239610348000|X8gISpUTSqh1A5reS...|TrFblGT99AgE75vuj...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610743000|kbKreLWB6QVueFrDm...|kVnxx9Ie2i3OLTxFj...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610812000|9lxOSRpEi3bmEeTCu...|1B2sff99AEIwSuLVV...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610876000|lijjmCf2kuxfBTnjL...|AjvufgUtakUFcsIM9...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610941000|t8t8trgjNRPIlmxuD...|agu2u2TCdqWP08rAA...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610490000|OGRLPVNGxiGgrCmWL...|mJg2raBUpPrC8OlUm...| corriere.it|Mozilla/4.0 (comp...|r2k96t1CNjSU9fJKN...| 71.124.66.3|2009-04-13-08-05| | 1239610556000|OnJID12x0RXKPUgrD...|P7Pm2mPdW6wO8KA3R...| corriere.it|Mozilla/4.0 (comp...|r2k96t1CNjSU9fJKN...| 71.124.66.3|2009-04-13-08-05| | 1239610373000|WflsvKIgOqfIE5KwR...|TJHd1VBspNcua0XPn...| corriere.it|Mozilla/5.0 (Maci...|fj2L1ILTFGMfhdrt3...| 75.117.56.155|2009-04-13-08-05| | 1239610768000|4MJR0XxiVCU1ueXKV...|1OhGWmbvKf8ajoU8a...| corriere.it|Mozilla/5.0 (Maci...|fj2L1ILTFGMfhdrt3...| 75.117.56.155|2009-04-13-08-05| | 1239610832000|gWIrpDiN57i3sHatv...|RNL4C7xPi3tdar2Uc...| corriere.it|Mozilla/5.0 (Maci...|fj2L1ILTFGMfhdrt3...| 75.117.56.155|2009-04-13-08-05| | 1239610789000|pTne9k62kJ14QViXI...|RVxJVIQousjxUVI3r...| pixnet.net|Mozilla/5.0 (Maci...|1bGOKiBD2xmui9OkF...| 33.176.101.80|2009-04-13-08-05| +----------------+--------------------+--------------------+------------------+--------------------+--------------------+--------------+----------------+ only showing top 20 rows scala>