Running HQL from Python without using the Hive Standalone Server

To use a language other than Java (say, Python) with Hive, you must use the Hive Standalone Server. The main disadvantage of the Hive Standalone Server is that it is currently single-threaded [HIVE-80]. There is also the inconvenience of running an additional server.

We can solve this problem by using Jython (and possibly JRuby). Jython lets us use Hive's Java client library to execute the HQL query and retrieve the results, which we can then process in pure Python.

Let us try it out:

STEP 1:

Download and install Jython.

STEP 2:

Make sure you have the following jars and directories in your CLASSPATH.

  • hive-service-0.6.0.jar
  • libfb303.jar
  • log4j-1.2.15.jar
  • antlr-runtime-3.0.1.jar
  • derby.jar
  • jdo2-api-2.3-SNAPSHOT.jar
  • commons-logging-1.0.4.jar
  • datanucleus-core-1.1.2.jar
  • datanucleus-enhancer-1.1.2.jar
  • datanucleus-rdbms-1.1.2.jar
  • hive-exec-0.6.0.jar
  • hive-jdbc-0.6.0.jar
  • hive-metastore-0.6.0.jar
  • commons-lang-2.4.jar
  • hadoopcore/hadoop-0.20.0/hadoop-0.20.0-core.jar
  • /usr/lib/hadoop-0.20/lib/mysql-connector-java-5.0.8-bin.jar
  • conf (this is your Hive installation's build/dist/conf directory)

Jar locations and versions may differ in your Hive installation.
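
Typing out that many jars by hand is error-prone, so here is a minimal sketch of one way to assemble the CLASSPATH string. It assumes the jars sit in a few known directories; the helper name build_classpath and the example paths are hypothetical and are not part of Hive or Jython.

```python
import glob
import os


def build_classpath(jar_dirs, extra_entries=()):
    """Return a CLASSPATH string made of every *.jar found in jar_dirs,
    followed by extra_entries (for example the Hive conf directory)."""
    entries = []
    for jar_dir in jar_dirs:
        # Sort so the resulting order is stable between runs.
        entries.extend(sorted(glob.glob(os.path.join(jar_dir, "*.jar"))))
    entries.extend(extra_entries)
    # os.pathsep is ':' on Unix-like systems and ';' on Windows.
    return os.pathsep.join(entries)


if __name__ == "__main__":
    # Hypothetical locations; substitute the ones from your own installation.
    print(build_classpath(
        ["/path/to/hive/build/dist/lib", "/path/to/hadoop"],
        ["/path/to/hive/build/dist/conf"],
    ))
```

The printed value can be exported as CLASSPATH in the shell before launching Jython.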

STEP 3:

Create a test data file /tmp/test.dat with the following lines:

1:one
2:two
3:three
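
If you would rather generate the file from a script, the following sketch writes the same three lines. The function name write_test_data is hypothetical; the ':' separator matches the field delimiter that the table in this walkthrough is declared with. It uses try/finally rather than a with block so that it also runs on older Jython releases.

```python
def write_test_data(path="/tmp/test.dat"):
    """Write three key:value rows, one per line, and return the path."""
    rows = [(1, "one"), (2, "two"), (3, "three")]
    f = open(path, "w")
    try:
        for key, value in rows:
            # ':' is the field separator, '\n' ends each row.
            f.write("%d:%s\n" % (key, value))
    finally:
        f.close()
    return path


if __name__ == "__main__":
    write_test_data()
```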

STEP 4:

Run the following Jython script:

from java.lang import *
from java.sql import *

driverName = "org.apache.hadoop.hive.jdbc.HiveDriver"

try:
    Class.forName(driverName)
except Exception, e:
    print "Unable to load %s" % driverName
    System.exit(1)

conn = DriverManager.getConnection("jdbc:hive://")
stmt = conn.createStatement()

# Drop table
#stmt.executeQuery("DROP TABLE testjython")

# Create a table
res = stmt.executeQuery("CREATE TABLE testjython (key int, value string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ':'")

# Show tables
res = stmt.executeQuery("SHOW TABLES")
print "List of tables:"
while res.next():
    print res.getString(1)

# Load some data
res = stmt.executeQuery("LOAD DATA LOCAL INPATH '/tmp/test.dat' INTO TABLE testjython")

# SELECT the data
res = stmt.executeQuery("SELECT * FROM testjython")
print "Listing contents of table:"
while res.next():
    print res.getInt(1), res.getString(2)

You should see the following output, amidst a whole lot of debug statements:

1 one
2 two
3 three
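
Once the rows come back, the rest of the processing can stay in plain Python. As a rough sketch, the helper below drains a JDBC-style result set into a list of dictionaries. The names rows_to_dicts and FakeResultSet are hypothetical: FakeResultSet is only a stand-in so the example runs without Hive, and under Jython you would pass the object returned by stmt.executeQuery(...) instead.

```python
def rows_to_dicts(result_set, columns):
    """Read every remaining row from a JDBC-style result set.

    columns is a list of (name, getter) pairs, for example
    [("key", "getInt"), ("value", "getString")], given in column order.
    """
    rows = []
    while result_set.next():
        row = {}
        for index, (name, getter) in enumerate(columns):
            # JDBC column indexes start at 1, not 0.
            row[name] = getattr(result_set, getter)(index + 1)
        rows.append(row)
    return rows


class FakeResultSet(object):
    """Stand-in that mimics the next/getInt/getString calls used above."""

    def __init__(self, rows):
        self._rows = rows
        self._pos = -1

    def next(self):
        self._pos += 1
        return self._pos < len(self._rows)

    def getInt(self, column):
        return int(self._rows[self._pos][column - 1])

    def getString(self, column):
        return str(self._rows[self._pos][column - 1])


if __name__ == "__main__":
    fake = FakeResultSet([(1, "one"), (2, "two"), (3, "three")])
    for row in rows_to_dicts(fake, [("key", "getInt"), ("value", "getString")]):
        print("%d %s" % (row["key"], row["value"]))
```

Returning ordinary dictionaries means the code that consumes the rows does not need to know anything about JDBC.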
