Usando HCatalog - Amazon EMR

Le traduzioni sono generate tramite traduzione automatica. In caso di conflitto tra il contenuto di una traduzione e la versione originale in Inglese, quest'ultima prevarrà.

Usando HCatalog

È possibile utilizzarlo HCatalog all'interno di varie applicazioni che utilizzano il metastore Hive. Gli esempi in questa sezione mostrano come creare una tabella e usarla nel contesto di Pig e Spark. SQL

Disabilita la scrittura diretta quando usi HCatalog HStorer

Ogni volta che un'applicazione utilizza HCatStorerla scrittura su una HCatalog tabella archiviata in Amazon S3, disabilita la funzionalità di scrittura diretta di Amazon. EMR Ad esempio, disabilita la scrittura diretta quando usi il STORE comando Pig o quando esegui processi Sqoop che scrivono HCatalog tabelle su Amazon S3. Puoi disabilitare la caratteristica di scrittura diretta impostando le configurazioni mapred.output.direct.NativeS3FileSystem e mapred.output.direct.EmrFileSystem su false. Nell'esempio seguente viene illustrato come impostare queste configurazioni tramite Java.

Configuration conf = new Configuration(); conf.set("mapred.output.direct.NativeS3FileSystem", "false"); conf.set("mapred.output.direct.EmrFileSystem", "false");

Crea una tabella usando HCat CLI e usa quei dati in Pig

Crea lo script seguente, impressions.q, sul cluster:

CREATE EXTERNAL TABLE impressions ( requestBeginTime string, adId string, impressionId string, referrer string, userAgent string, userCookie string, ip string ) PARTITIONED BY (dt string) ROW FORMAT serde 'org.apache.hive.hcatalog.data.JsonSerDe' with serdeproperties ( 'paths'='requestBeginTime, adId, impressionId, referrer, userAgent, userCookie, ip' ) LOCATION 's3://[your region].elasticmapreduce/samples/hive-ads/tables/impressions/'; ALTER TABLE impressions ADD PARTITION (dt='2009-04-13-08-05');

Esegui lo script usando: HCat CLI

% hcat -f impressions.q Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties OK Time taken: 4.001 seconds OK Time taken: 0.519 seconds

Apri la shell Grunt e accedi ai dati in impressions:

% pig -useHCatalog -e "A = LOAD 'impressions' USING org.apache.hive.hcatalog.pig.HCatLoader(); B = LIMIT A 5; dump B;" <snip> (1239610346000,m9nwdo67Nx6q2kI25qt5On7peICfUM,omkxkaRpNhGPDucAiBErSh1cs0MThC,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; FunWebProducts; GTB6; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET,wcVWWTascoPbGt6bdqDbuWTPPHgOPs,69.191.224.234,2009-04-13-08-05) (1239611000000,NjriQjdODgWBKnkGJUP6GNTbDeK4An,AWtXPkfaWGOaNeL9OOsFU8Hcj6eLHt,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6; .NET CLR 1.1.4322),OaMU1F2gE4CtADVHAbKjjRRks5kIgg,57.34.133.110,2009-04-13-08-05) (1239610462000,Irpv3oiu0I5QNQiwSSTIshrLdo9cM1,i1LDq44LRSJF0hbmhB8Gk7k9gMWtBq,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; InfoPath.1),QSb3wkLR4JAIut4Uq6FNFQIR1rCVwU,42.174.193.253,2009-04-13-08-05) (1239611007000,q2Awfnpe0JAvhInaIp0VGx9KTs0oPO,s3HvTflPB8JIE0IuM6hOEebWWpOtJV,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; InfoPath.1),QSb3wkLR4JAIut4Uq6FNFQIR1rCVwU,42.174.193.253,2009-04-13-08-05) (1239610398000,c362vpAB0soPKGHRS43cj6TRwNeOGn,jeas5nXbQInGAgFB8jlkhnprN6cMw7,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6; .NET CLR 1.1.4322),k96n5PnUmwHKfiUI0TFP0TNMfADgh9,51.131.29.87,2009-04-13-08-05) 7120 [main] INFO org.apache.pig.Main - Pig script completed in 7 seconds and 199 milliseconds (7199 ms) 16/03/08 23:17:10 INFO pig.Main: Pig script completed in 7 seconds and 199 milliseconds (7199 ms)

Accedere alla tabella usando Spark SQL

Questo esempio crea uno Spark DataFrame dalla tabella creata nel primo esempio e mostra le prime 20 righe:

% spark-shell --jars /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-1.0.0-amzn-3.jar <snip> scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc); scala> val df = hiveContext.sql("SELECT * FROM impressions") scala> df.show() <snip> 16/03/09 17:18:46 INFO DAGScheduler: ResultStage 0 (show at <console>:32) finished in 10.702 s 16/03/09 17:18:46 INFO DAGScheduler: Job 0 finished: show at <console>:32, took 10.839905 s +----------------+--------------------+--------------------+------------------+--------------------+--------------------+--------------+----------------+ |requestbegintime| adid| impressionid| referrer| useragent| usercookie| ip| dt| +----------------+--------------------+--------------------+------------------+--------------------+--------------------+--------------+----------------+ | 1239610346000|m9nwdo67Nx6q2kI25...|omkxkaRpNhGPDucAi...|cartoonnetwork.com|Mozilla/4.0 (comp...|wcVWWTascoPbGt6bd...|69.191.224.234|2009-04-13-08-05| | 1239611000000|NjriQjdODgWBKnkGJ...|AWtXPkfaWGOaNeL9O...|cartoonnetwork.com|Mozilla/4.0 (comp...|OaMU1F2gE4CtADVHA...| 57.34.133.110|2009-04-13-08-05| | 1239610462000|Irpv3oiu0I5QNQiwS...|i1LDq44LRSJF0hbmh...|cartoonnetwork.com|Mozilla/4.0 (comp...|QSb3wkLR4JAIut4Uq...|42.174.193.253|2009-04-13-08-05| | 1239611007000|q2Awfnpe0JAvhInaI...|s3HvTflPB8JIE0IuM...|cartoonnetwork.com|Mozilla/4.0 (comp...|QSb3wkLR4JAIut4Uq...|42.174.193.253|2009-04-13-08-05| | 1239610398000|c362vpAB0soPKGHRS...|jeas5nXbQInGAgFB8...|cartoonnetwork.com|Mozilla/4.0 (comp...|k96n5PnUmwHKfiUI0...| 51.131.29.87|2009-04-13-08-05| | 1239610600000|cjBTpruoaiEtqLuMX...|XwlohBSs8Ipxs1bRa...|cartoonnetwork.com|Mozilla/4.0 (comp...|k96n5PnUmwHKfiUI0...| 51.131.29.87|2009-04-13-08-05| | 1239610804000|Ms3eJHNAEItpxvimd...|4SIj4pGmgVLl625BD...|cartoonnetwork.com|Mozilla/4.0 (comp...|k96n5PnUmwHKfiUI0...| 51.131.29.87|2009-04-13-08-05| | 1239610872000|h5bccHX6wJReDi1jL...|EFAWIiBdVfnxwAMWP...|cartoonnetwork.com|Mozilla/4.0 (comp...|k96n5PnUmwHKfiUI0...| 51.131.29.87|2009-04-13-08-05| | 1239610365000|874NBpGmxNFfxEPKM...|xSvE4XtGbdtXPF2Lb...|cartoonnetwork.com|Mozilla/5.0 (Maci...|eWDEVVUphlnRa273j...| 22.91.173.232|2009-04-13-08-05| | 1239610348000|X8gISpUTSqh1A5reS...|TrFblGT99AgE75vuj...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610743000|kbKreLWB6QVueFrDm...|kVnxx9Ie2i3OLTxFj...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610812000|9lxOSRpEi3bmEeTCu...|1B2sff99AEIwSuLVV...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610876000|lijjmCf2kuxfBTnjL...|AjvufgUtakUFcsIM9...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610941000|t8t8trgjNRPIlmxuD...|agu2u2TCdqWP08rAA...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610490000|OGRLPVNGxiGgrCmWL...|mJg2raBUpPrC8OlUm...| corriere.it|Mozilla/4.0 (comp...|r2k96t1CNjSU9fJKN...| 71.124.66.3|2009-04-13-08-05| | 1239610556000|OnJID12x0RXKPUgrD...|P7Pm2mPdW6wO8KA3R...| corriere.it|Mozilla/4.0 (comp...|r2k96t1CNjSU9fJKN...| 71.124.66.3|2009-04-13-08-05| | 1239610373000|WflsvKIgOqfIE5KwR...|TJHd1VBspNcua0XPn...| corriere.it|Mozilla/5.0 (Maci...|fj2L1ILTFGMfhdrt3...| 75.117.56.155|2009-04-13-08-05| | 1239610768000|4MJR0XxiVCU1ueXKV...|1OhGWmbvKf8ajoU8a...| corriere.it|Mozilla/5.0 (Maci...|fj2L1ILTFGMfhdrt3...| 75.117.56.155|2009-04-13-08-05| | 1239610832000|gWIrpDiN57i3sHatv...|RNL4C7xPi3tdar2Uc...| corriere.it|Mozilla/5.0 (Maci...|fj2L1ILTFGMfhdrt3...| 75.117.56.155|2009-04-13-08-05| | 1239610789000|pTne9k62kJ14QViXI...|RVxJVIQousjxUVI3r...| pixnet.net|Mozilla/5.0 (Maci...|1bGOKiBD2xmui9OkF...| 33.176.101.80|2009-04-13-08-05| +----------------+--------------------+--------------------+------------------+--------------------+--------------------+--------------+----------------+ only showing top 20 rows scala>