Learn to Install Cloudera Hive in AWS

Apache Hive is a data warehouse tool built on top of Apache Hadoop that lets users query and analyze data. Hive provides a SQL-like interface (HiveQL) for querying data stored in the various databases and file systems that integrate with Hadoop. It supports reading, writing, and managing large datasets in distributed storage. Hive queries are translated into a series of jobs that run on a Hadoop cluster using MapReduce, Apache Tez, or Apache Spark. Because it uses familiar SQL syntax, Hive also makes batch processing on Hadoop simple and approachable.

Let us learn how to set up Cloudera Hive in AWS!

Install Cloudera Hive in AWS

To install the Cloudera ODBC Driver for Apache Hive on an Amazon EC2 instance, follow the steps below.

Step 1: Download the Cloudera ODBC Driver for Apache Hive from https://www.cloudera.com/downloads/connectors/hive/odbc/2-5-25.html.

This downloads the file ClouderaHiveODBC-2.5.25.1020-1.el7.x86_64.rpm.

Step 2: Log in to the EC2 instance as root and create the following directory:

mkdir -p /tmp/cloudera-hive && cd /tmp/cloudera-hive

Step 3: Copy the RPM package into that directory. In this example, the package was first uploaded to an S3 bucket named test-bucket:

aws s3 cp s3://test-bucket/Hive/ClouderaHiveODBC-2.5.25.1020-1.el7.x86_64.rpm .

Step 4: Install the package:

yum --nogpgcheck localinstall ClouderaHiveODBC-2.5.25.1020-1.el7.x86_64.rpm

Step 5: Verify that the package is installed:

yum list installed | grep ClouderaHiveODBC

Step 6: Create the following directory if it does not already exist:

mkdir -p /opt/odbc_path/client/ODBC_64

Step 7: Change to that directory and create two ODBC configuration files, odbc.ini and odbcinst.ini, within it:

cd /opt/odbc_path/client/ODBC_64

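Create an odbc.ini file with the following contents. The file defines Teradata entries alongside the Cloudera Hive DSNs; only the Cloudera sections are needed to connect to Hive, and the Python example later in this article uses the [Hive] DSN.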

[ODBC]
InstallDir=/opt/teradata/client/ODBC_64
Trace=no
Pooling=yes
[ODBC Data Sources]
testdsn=Teradata Database ODBC Driver 16.20
Cloudera Hive 32-bit=Cloudera ODBC Driver for Apache Hive 32-bit
Hive=Cloudera ODBC Driver for Apache Hive 64-bit
[testdsn]
Description=Teradata Database ODBC Driver 16.20
Driver=/opt/teradata/client/ODBC_64/lib/tdataodbc_sb64.so
DBCName=<Your DB End Point>
MechanismName=LDAP
Username=<User Name>
Password=<Password>
Database=<DB Name>
AccountString=
CharacterSet=ASCII
DatasourceDNSEntries=
DateTimeFormat=AAA
DefaultDatabase=
DontUseHelpDatabase=0
DontUseTitles=1
EnableExtendedStmtInfo=1
EnableReadAhead=1
IgnoreODBCSearchPattern=0
LogErrorEvents=0
LoginTimeout=20
MaxRespSize=65536
MaxSingleLOBBytes=0
MaxTotalLOBBytesPerRow=0
NoScan=0
PrintOption=N
retryOnEINTR=1
ReturnGeneratedKeys=N
SessionMode=System Default
SplOption=Y
TABLEQUALIFIER=0
TCPNoDelay=1
TdmstPortNumber=1025
UPTMode=Not set
USE2XAPPCUSTOMCATALOGMODE=0
UseDataEncryption=0
UseDateDataForTimeStampParams=0
USEINTEGRATEDSECURITY=0
UseSequentialRetrievalOnly=0
UseXViews=0
[Cloudera Hive 32-bit]
Description=Cloudera ODBC Driver for Apache Hive (32-bit) DSN
Driver=/opt/cloudera/hiveodbc/lib/32/libclouderahiveodbc32.so
HOST=[HOST]
PORT=[PORT]
Schema=default
ServiceDiscoveryMode=0
ZKNamespace=
HiveServerType=2
AuthMech=2
ThriftTransport=1
UseNativeQuery=0
UID=
KrbHostFQDN=_HOST
KrbServiceName=hive
KrbRealm=
SSL=0
TwoWaySSL=0
ClientCert=
ClientPrivateKey=
ClientPrivateKeyPassword=
[Hive]
Description=Cloudera ODBC Driver for Apache Hive (64-bit) DSN
Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
HOST=
PORT=
Schema=
ServiceDiscoveryMode=0
ZKNamespace=
HiveServerType=2
AuthMech=3
ThriftTransport=1
UseNativeQuery=0
UID=
PWD=
KrbHostFQDN=_HOST
KrbServiceName=hive
KrbRealm=
SSL=0
TwoWaySSL=0
ClientCert=
ClientPrivateKey=
ClientPrivateKeyPassword=
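
A common mistake in a hand-written odbc.ini is listing a DSN under [ODBC Data Sources] that has no section of the same name, or a DSN section with no Driver= line. The sketch below checks for both using Python's standard configparser module. It is an illustrative helper, not part of the Cloudera driver; the function name check_dsns is hypothetical. Note that configparser reads an indented line as a continuation of the previous value, so keep each key flush-left.

```python
import configparser
import sys


def check_dsns(odbc_ini_text: str) -> list:
    """Return a list of problems found in the DSN definitions of an odbc.ini file.

    Every name listed under [ODBC Data Sources] should have a section of the
    same name, and that section should set a Driver= path.
    """
    # strict=False merges duplicate sections/keys instead of raising an error;
    # interpolation=None keeps characters such as '%' literal.
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, delimiters=("=",)
    )
    parser.optionxform = str  # keep DSN names case-sensitive, as written
    parser.read_string(odbc_ini_text)

    if not parser.has_section("ODBC Data Sources"):
        return ["missing [ODBC Data Sources] section"]

    problems = []
    for dsn in parser.options("ODBC Data Sources"):
        if not parser.has_section(dsn):
            problems.append(f"DSN '{dsn}' is listed but has no [{dsn}] section")
        elif not parser.get(dsn, "Driver", fallback="").strip():
            problems.append(f"DSN '{dsn}' has no Driver= entry")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_odbc_ini.py /opt/odbc_path/client/ODBC_64/odbc.ini
    with open(sys.argv[1]) as f:
        for problem in check_dsns(f.read()) or ["no DSN problems found"]:
            print(problem)
```

An empty result means every listed DSN has a matching section with a driver path.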

Create an odbcinst.ini file with the following contents:

[ODBC Drivers]
Teradata Database ODBC Driver 16.20=Installed
Cloudera ODBC Driver for Apache Hive 32-bit=Installed
Cloudera ODBC Driver for Apache Hive 64-bit=Installed
[Teradata Database ODBC Driver 16.20]
Description=Teradata Database ODBC Driver 16.20
Driver=/opt/teradata/client/ODBC_64/lib/tdataodbc_sb64.so
[Cloudera ODBC Driver for Apache Hive 32-bit]
Description=Cloudera ODBC Driver for Apache Hive (32-bit)
Driver=/opt/cloudera/hiveodbc/lib/32/libclouderahiveodbc32.so
[Cloudera ODBC Driver for Apache Hive 64-bit]
Description=Cloudera ODBC Driver for Apache Hive (64-bit)
Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
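
Each driver section in odbcinst.ini points at a shared library, and a wrong path only shows up later as a connection error. The following sketch, an assumed helper named missing_driver_libraries rather than anything shipped with the driver, lists every Driver= path that does not exist on the host. It may flag, for example, the 32-bit entry if only the 64-bit library is installed on your instance.

```python
import configparser
import sys
from pathlib import Path


def missing_driver_libraries(odbcinst_text: str) -> dict:
    """Return {driver section: Driver= path} for every library file that does not exist."""
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, delimiters=("=",)
    )
    parser.optionxform = str  # keep key case as written
    parser.read_string(odbcinst_text)

    missing = {}
    for section in parser.sections():
        if section == "ODBC Drivers":
            continue  # this section only lists driver names, not library paths
        driver = parser.get(section, "Driver", fallback="").strip()
        if driver and not Path(driver).is_file():
            missing[section] = driver
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_odbcinst.py /opt/odbc_path/client/ODBC_64/odbcinst.ini
    with open(sys.argv[1]) as f:
        for name, path in missing_driver_libraries(f.read()).items():
            print(f"[{name}] Driver= points to a missing file: {path}")
```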

Step 8: Configure the environment variables

export ODBCINI=/opt/odbc_path/client/ODBC_64/odbc.ini 
export ODBCINSTINI=/opt/odbc_path/client/ODBC_64/odbcinst.ini

To make these settings permanent, add the same export lines to your /etc/profile file so that you do not have to set them again in every new login shell.
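
Programs such as a Python script inherit these variables from the shell that starts them, so it can help to confirm they are set before testing a connection. This minimal sketch (the function check_odbc_env is hypothetical) reports whether ODBCINI and ODBCINSTINI are set and point to existing files.

```python
import os
from pathlib import Path


def check_odbc_env(env=None) -> list:
    """Return problems with the ODBCINI and ODBCINSTINI settings (empty list if OK)."""
    env = os.environ if env is None else env
    problems = []
    for var in ("ODBCINI", "ODBCINSTINI"):
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif not Path(value).is_file():
            problems.append(f"{var} points to a missing file: {value}")
    return problems


if __name__ == "__main__":
    for line in check_odbc_env() or ["ODBCINI and ODBCINSTINI point to existing files"]:
        print(line)
```

Passing a dictionary instead of relying on os.environ makes the check easy to run against a proposed configuration before you edit /etc/profile.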

You are now ready to test connectivity to the Hive EDL using the pyodbc Python package. Here is an example of how to connect to Hive:

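import pyodbc
# Disable ODBC connection pooling; this must be set before the first connection is made
pyodbc.pooling = False
# "Hive" is the DSN defined in odbc.ini; replace the placeholder values with your own
conn_str = "DSN=Hive;HOST=Hostname;PORT=Port_No;UID=User_ID;PWD=Password"
# autocommit is passed to connect(); pyodbc has no module-level autocommit setting
con = pyodbc.connect(conn_str, autocommit=True)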
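
Building the connection string by hand breaks if a value, such as a password, contains a semicolon. The sketch below assembles the string from keyword arguments and wraps such values in braces, following the usual ODBC connection-string convention (doubling any closing brace inside the value); then it runs SHOW DATABASES as a quick connectivity check. The helper names are hypothetical, the host name hive.example.internal is a placeholder, and port 10000 is only the common HiveServer2 default, so substitute your own values. pyodbc is imported inside list_databases so the string-building helpers work on machines without it.

```python
def odbc_value(value: str) -> str:
    """Brace-quote a value containing ';', '{' or '}' (ODBC connection-string convention)."""
    if any(ch in value for ch in ";{}"):
        return "{" + value.replace("}", "}}") + "}"
    return value


def build_conn_str(**attrs) -> str:
    """Join keyword arguments into an ODBC 'KEY=value;KEY=value' connection string."""
    return ";".join(f"{key}={odbc_value(str(val))}" for key, val in attrs.items())


def list_databases(conn_str: str) -> list:
    """Connect through the Hive DSN and return the names reported by SHOW DATABASES."""
    import pyodbc  # imported lazily so the helpers above work without pyodbc installed

    pyodbc.pooling = False  # only takes effect before the first connection is made
    # Hive ODBC drivers generally do not support transactions, hence autocommit=True
    con = pyodbc.connect(conn_str, autocommit=True)
    try:
        rows = con.cursor().execute("SHOW DATABASES").fetchall()
        return [row[0] for row in rows]
    finally:
        con.close()  # pyodbc's context manager does not close, so close explicitly
```

For example, list_databases(build_conn_str(DSN="Hive", HOST="hive.example.internal", PORT=10000, UID="User_ID", PWD="Password")) should return a list such as the default database if the driver, DSN, and credentials are all correct.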


Anandita Doda
