Detecting and processing compressed files with Amazon EMR
Hadoop detects compressed files by checking the file extension. The compression types supported by Hadoop are gzip, bzip2, and LZO. You do not need to take any additional action to extract files that use these types of compression; Hadoop handles it for you.
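To illustrate what extension-based detection means, the following Python sketch picks a decompressor from a file's suffix. It is not Hadoop's implementation. The names `CODECS_BY_EXTENSION`, `detect_codec`, and `open_text` are hypothetical, and the `.gz`, `.bz2`, and `.lzo` suffixes are assumed to be the conventional extensions for the three compression types.

```python
import bz2
import gzip

# Hypothetical extension-to-codec table for this sketch. The suffixes are
# assumed conventional extensions, not values read from Hadoop itself.
CODECS_BY_EXTENSION = {
    ".gz": "gzip",
    ".bz2": "bzip2",
    ".lzo": "lzo",
}


def detect_codec(path):
    """Return the codec name implied by the file extension, or None."""
    for extension, codec in CODECS_BY_EXTENSION.items():
        if path.endswith(extension):
            return codec
    return None


def open_text(path):
    """Open a file for reading as text, decompressing based on its extension."""
    codec = detect_codec(path)
    if codec == "gzip":
        return gzip.open(path, "rt", encoding="utf-8")
    if codec == "bzip2":
        return bz2.open(path, "rt", encoding="utf-8")
    if codec == "lzo":
        # The Python standard library has no LZO module, so this sketch
        # only recognizes the extension and stops here.
        raise NotImplementedError("LZO is not in the Python standard library")
    # No recognized extension: treat the file as uncompressed text.
    return open(path, "r", encoding="utf-8")
```

Because the decision depends only on the name, a compressed file with a missing or wrong extension would be read as plain text in this sketch, so input files should keep their original extensions.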
To index LZO files, you can use the hadoop-lzo library, which can be downloaded from https://github.com/kevinweil/hadoop-lzo.