Apache Hive is a data warehouse tool built on top of Apache Hadoop that lets users query and analyze data. Hive provides a SQL-like interface for querying data stored in various Hadoop-integrated databases and filesystems, and it supports reading, writing, and managing large datasets in distributed storage. Hive queries are translated into a series of jobs that run on a Hadoop cluster using MapReduce or Apache Spark, which makes batch processing on Hadoop simple and familiar.
Let us learn how to set up Cloudera Hive in AWS!
Install Cloudera Hive in AWS
To install the Cloudera Hive ODBC driver on an AWS EC2 instance, follow the steps below.
Step 1: Go to https://www.cloudera.com/downloads/connectors/hive/odbc/2-5-25.html and get the software.
The file ClouderaHiveODBC-2.5.25.1020-1.el7.x86_64.rpm will be downloaded.
Step 2: Log in as root to EC2 and create the directory listed below.
mkdir -p /tmp/cloudera-hive && cd /tmp/cloudera-hive
Step 3: Copy the RPM package into the directory created above.
aws s3 cp s3://test-bucket/Hive/ClouderaHiveODBC-2.5.25.1020-1.el7.x86_64.rpm .
Step 4: Install the package.
yum --nogpgcheck localinstall ClouderaHiveODBC-2.5.25.1020-1.el7.x86_64.rpm
Step 5: Verify that the package is installed.
yum list | grep ClouderaHiveODBC
Step 6: Create the ODBC configuration directory if it does not already exist.
mkdir -p /opt/odbc_path/client/ODBC_64
Step 7: Navigate to the directory and create two ODBC configuration files (odbc.ini and odbcinst.ini) within it.
cd /opt/odbc_path/client/ODBC_64
Make an odbc.ini file with the following contents.
[ODBC]
#QEWSD=2458358
InstallDir=/opt/teradata/client/ODBC_64
Trace=no
Pooling=yes

[ODBC Data Sources]
Teradata ODBC DSN=Teradata Database ODBC Driver 16.20

[Teradata ODBC DSN]
Description=Teradata Database ODBC Driver 16.20

[testdsn]
Driver=/opt/teradata/client/ODBC_64/lib/tdataodbc_sb64.so
DBCName=<Your DB End Point>
MechanismName=LDAP
Username=<User Name>
Password=<Password>
Database=<DB Name>
AccountString=
CharacterSet=ASCII
DatasourceDNSEntries=
DateTimeFormat=AAA
DefaultDatabase=
DontUseHelpDatabase=0
DontUseTitles=1
EnableExtendedStmtInfo=1
EnableReadAhead=1
IgnoreODBCSearchPattern=0
LogErrorEvents=0
LoginTimeout=20
MaxRespSize=65536
MaxSingleLOBBytes=0
MaxTotalLOBBytesPerRow=0
MechanismName=
NoScan=0
PrintOption=N
retryOnEINTR=1
ReturnGeneratedKeys=N
SessionMode=System Default
SplOption=Y
TABLEQUALIFIER=0
TCPNoDelay=1
TdmstPortNumber=1025
UPTMode=Not set
USE2XAPPCUSTOMCATALOGMODE=0
UseDataEncryption=0
UseDateDataForTimeStampParams=0
USEINTEGRATEDSECURITY=0
UseSequentialRetrievalOnly=0
UseXViews=0

[ODBC]
Trace = 1
TraceFile =

[ODBC Data Sources]
Cloudera Hive 32-bit=Cloudera ODBC Driver for Apache Hive 32-bit
Cloudera Hive 64-bit=Cloudera ODBC Driver for Apache Hive 64-bit

[Cloudera Hive 32-bit]
Description=Cloudera ODBC Driver for Apache Hive (32-bit) DSN
Driver=/opt/cloudera/hiveodbc/lib/32/libclouderahiveodbc32.so
HOST=[HOST]
PORT=[PORT]
Schema=default
ServiceDiscoveryMode=0
ZKNamespace=
HiveServerType=2
AuthMech=2
ThriftTransport=1
UseNativeQuery=0
UID=
KrbHostFQDN=_HOST
KrbServiceName=hive
KrbRealm=
SSL=0
TwoWaySSL=0
ClientCert=
ClientPrivateKey=
ClientPrivateKeyPassword=

[Hive]
Description=Cloudera ODBC Driver for Apache Hive (64-bit) DSN
Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
HOST=
PORT=
Schema=
ServiceDiscoveryMode=0
ZKNamespace=
HiveServerType=2
AuthMech=3
ThriftTransport=1
UseNativeQuery=0
UID=
PWD=
KrbHostFQDN=_HOST
KrbServiceName=hive
KrbRealm=
SSL=0
TwoWaySSL=0
ClientCert=
ClientPrivateKey=
ClientPrivateKeyPassword=
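The [Hive] section above leaves HOST and PORT empty for you to fill in. As a rough sketch, that section could also be generated with Python's standard configparser module rather than typed by hand. The function name render_hive_dsn and the default schema value are illustrative assumptions; the driver path and the other keys are copied from the file above.

```python
import configparser
import io


def render_hive_dsn(host, port, schema="default", auth_mech=3):
    """Return the text of a [Hive] DSN section for odbc.ini."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case (HOST, AuthMech, ...) as written
    parser["Hive"] = {
        "Driver": "/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so",
        "HOST": host,
        "PORT": str(port),
        "Schema": schema,  # assumed default; the file above leaves it empty
        "HiveServerType": "2",
        "AuthMech": str(auth_mech),
        "ThriftTransport": "1",
        "SSL": "0",
    }
    buffer = io.StringIO()
    # Write "key=value" without spaces, matching the style of the file above.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()
```

Calling render_hive_dsn("hive.example.internal", 10000) returns text that can be appended to odbc.ini; the host name and port here are placeholders, not values from your cluster.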
Create an odbcinst.ini file with the following contents:
[ODBC Drivers]
Teradata Database ODBC Driver 16.20=Installed

[Teradata Database ODBC Driver 16.20]
Description=Teradata Database ODBC Driver 16.20
Driver=/opt/teradata/client/ODBC_64/lib/tdataodbc_sb64.so

[ODBC Drivers]
Cloudera ODBC Driver for Apache Hive 32-bit=Installed
Cloudera ODBC Driver for Apache Hive 64-bit=Installed

[Cloudera ODBC Driver for Apache Hive 32-bit]
Description=Cloudera ODBC Driver for Apache Hive (32-bit)
Driver=/opt/cloudera/hiveodbc/lib/32/libclouderahiveodbc32.so

[Cloudera ODBC Driver for Apache Hive 64-bit]
Description=Cloudera ODBC Driver for Apache Hive (64-bit)
Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
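A common cause of connection failures is a Driver entry that points to a library that is not actually on disk. The following is a minimal sketch, assuming the file uses the INI layout shown above, that lists every section whose Driver path is missing. The helper name missing_drivers and the SAMPLE text are hypothetical and only serve as illustration.

```python
import configparser
import os


def missing_drivers(ini_text, exists=os.path.isfile):
    """Return {section: driver_path} for each Driver entry whose file is absent."""
    # strict=False tolerates repeated section names such as [ODBC Drivers];
    # interpolation=None keeps any '%' characters literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # preserve key case, e.g. "Driver"
    parser.read_string(ini_text)
    missing = {}
    for section in parser.sections():
        path = parser.get(section, "Driver", fallback=None)
        if path and not exists(path):
            missing[section] = path
    return missing


# Illustrative excerpt of odbcinst.ini, not the full file.
SAMPLE = """\
[ODBC Drivers]
Cloudera ODBC Driver for Apache Hive 64-bit=Installed

[Cloudera ODBC Driver for Apache Hive 64-bit]
Description=Cloudera ODBC Driver for Apache Hive (64-bit)
Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
"""
```

On the EC2 instance you might read the real file with open("/opt/odbc_path/client/ODBC_64/odbcinst.ini").read() and pass the text to missing_drivers; an empty result means every listed driver library was found.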
Step 8: Configure the environment variables
export ODBCINI=/opt/odbc_path/client/ODBC_64/odbc.ini
export ODBCINSTINI=/opt/odbc_path/client/ODBC_64/odbcinst.ini
You can also add these export lines to your /etc/profile file so that the variables are set automatically at every login instead of being exported by hand in each new session.
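Before attempting a connection, it can help to confirm that both variables are visible to the Python process and point to real files. This is a minimal sketch under that assumption; the function name check_odbc_env is hypothetical, and the check only covers the two variables set in Step 8.

```python
import os


def check_odbc_env(environ=os.environ, exists=os.path.isfile):
    """Return a list of problems; an empty list means both variables look fine."""
    problems = []
    for name in ("ODBCINI", "ODBCINSTINI"):
        path = environ.get(name)
        if not path:
            problems.append(f"{name} is not set")
        elif not exists(path):
            problems.append(f"{name} points to a missing file: {path}")
    return problems
```

Running print(check_odbc_env()) in the same shell session where the exports were made should print an empty list if Step 7 and Step 8 were completed.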
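You are now ready to test connectivity to Hive EDL using the pyodbc Python package. Here is an example of how to connect to Hive:
import pyodbc

# Disable ODBC connection pooling before opening any connection.
pyodbc.pooling = False

# Replace Hostname, Port_No, User_ID, and Password with your own values.
conn_str = "DSN=Hive;HOST=Hostname;PORT=Port_No;UID=User_ID;PWD=Password"

# autocommit is set per connection; pyodbc has no module-level autocommit setting.
con = pyodbc.connect(conn_str, autocommit=True)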
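To go one step beyond opening a connection, the sketch below assembles the connection string from separate values and runs a simple query to confirm that Hive responds. The helper names build_conn_str and list_databases are hypothetical, and SHOW DATABASES is used only as a lightweight test statement; this assumes the Hive DSN defined in odbc.ini and an account allowed to list databases.

```python
def build_conn_str(dsn, host, port, uid, pwd):
    """Assemble an ODBC connection string from keyword/value pairs."""
    parts = {"DSN": dsn, "HOST": host, "PORT": port, "UID": uid, "PWD": pwd}
    return ";".join(f"{key}={value}" for key, value in parts.items())


def list_databases(conn_str):
    """Open a connection and return the database names reported by Hive."""
    import pyodbc  # imported here so build_conn_str works without the driver

    con = pyodbc.connect(conn_str, autocommit=True)
    try:
        cursor = con.cursor()
        cursor.execute("SHOW DATABASES")
        return [row[0] for row in cursor.fetchall()]
    finally:
        con.close()
```

For example, list_databases(build_conn_str("Hive", "Hostname", "Port_No", "User_ID", "Password")) would return the database names once the placeholders are replaced with real values. Values that contain a semicolon would need extra quoting, which this sketch does not handle.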