Kettle 7.1: connecting to Hive with Kerberos authentication


Our project uses Kettle as its ETL tool, but once Kerberos authentication was enabled on the HDP cluster, Kettle could no longer connect to Hive.

Connecting to Hive2 from Java

First, let's look at how Java connects to HiveServer2. Below is a standard piece of JDBC code:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class HiveTest {

    private static final String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) {
        try {
            Class.forName(driverName);
            // try-with-resources closes the connection, statement, and result set.
            try (Connection con = DriverManager.getConnection("jdbc:hive2://ip:port/default", "hadoop", "hadoop");
                 PreparedStatement pstmt = con.prepareStatement("select * from settings");
                 ResultSet rs = pstmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("key") + "**" + rs.getString("value"));
                }
            }
        } catch (ClassNotFoundException | SQLException e) {
            e.printStackTrace();
        }
    }
}

Kerberos authentication

With Kerberos enabled, we need to authenticate and obtain credentials first:

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class HiveTest {

    private static final String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) {
        // Point the JVM at the Kerberos configuration and log in from the
        // keytab before touching JDBC.
        System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        try {
            UserGroupInformation.setConfiguration(conf);
            UserGroupInformation.loginUserFromKeytab("omm@EXAMPLE.COM", "/cmcc/kettle/source/rpone/user.keytab");
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(1);
        }

        try {
            Class.forName(driverName);
            // The URL must carry the HiveServer2 principal.
            try (Connection con = DriverManager.getConnection(
                         "jdbc:hive2://erpprdap33:15024/test;principal=hive/erpprdap33@EXAMPLE.COM", "hadoop", "hadoop");
                 PreparedStatement pstmt = con.prepareStatement("select * from settings");
                 ResultSet rs = pstmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("key") + "**" + rs.getString("value"));
                }
            }
        } catch (ClassNotFoundException | SQLException e) {
            e.printStackTrace();
        }
    }
}

Note that the JDBC URL, jdbc:hive2://erpprdap33:15024/test;principal=hive/erpprdap33@EXAMPLE.COM, must carry the parameter principal=hive/erpprdap33@EXAMPLE.COM.

In Ambari, the HiveServer2 principal is shown as hive/_HOST@EXAMPLE.COM:

[screenshot: principal]

Here we cannot write principal=hive/_HOST@EXAMPLE.COM directly in the JDBC URL; that was our first attempt, and it kept failing. A colleague then replaced _HOST with the HiveServer2 hostname, erpprdap33, and the query finally ran.
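The _HOST placeholder is normally expanded to the concrete server hostname at runtime (in Hadoop, SecurityUtil.getServerPrincipal performs this substitution). As a minimal sketch of doing the same substitution by hand when building the JDBC URL, where the class and method names below are our own and not from Kettle or Hive:

```java
public class PrincipalUtil {

    // Replace the _HOST placeholder in a server principal with the
    // concrete HiveServer2 hostname.
    public static String resolvePrincipal(String principal, String host) {
        return principal.replace("_HOST", host);
    }

    // Assemble a HiveServer2 JDBC URL carrying the resolved principal.
    public static String hiveJdbcUrl(String host, int port, String db, String principalTemplate) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db
                + ";principal=" + resolvePrincipal(principalTemplate, host);
    }

    public static void main(String[] args) {
        // prints: jdbc:hive2://erpprdap33:15024/test;principal=hive/erpprdap33@EXAMPLE.COM
        System.out.println(hiveJdbcUrl("erpprdap33", 15024, "test", "hive/_HOST@EXAMPLE.COM"));
    }
}
```

Note this only works when the client connects to the same hostname the service principal was issued for, which is exactly the situation described above.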

Kerberos support in Kettle

Modifying the kettle-engine source

Since we always run jobs from the command line with ./kitchen.sh /file xxx.kjb, it is enough to modify the launcher class org.pentaho.di.kitchen.Kitchen.

Tip: the jar lives at data-integration/lib/kettle-engine-7.1.0.0-12.jar

First we create a Maven project and pull in some dependencies.

pom.xml

The pom below gives us the basic dependencies. They may well be incomplete, but that doesn't matter here: we only need the launcher class org.pentaho.di.kitchen.Kitchen to compile cleanly, since we will later manually replace Kitchen.class inside data-integration/lib/kettle-engine-7.1.0.0-12.jar.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>kettle</artifactId>
    <version>1.0-SNAPSHOT</version>

    <repositories>
        <repository>
            <id>pentaho-releases</id>
            <url>https://nexus.pentaho.org/content/groups/omni/</url>
        </repository>
    </repositories>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <maven.compiler.encoding>UTF-8</maven.compiler.encoding>
        <java.version>1.8</java.version>
        <kettle.version>7.1.0.0-12</kettle.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>pentaho-kettle</groupId>
            <artifactId>kettle-core</artifactId>
            <version>${kettle.version}</version>
        </dependency>
        <dependency>
            <groupId>pentaho-kettle</groupId>
            <artifactId>kettle-engine</artifactId>
            <version>${kettle.version}</version>
        </dependency>
        <dependency>
            <groupId>pentaho-kettle</groupId>
            <artifactId>kettle-dbdialog</artifactId>
            <version>${kettle.version}</version>
        </dependency>
        <dependency>
            <groupId>pentaho-kettle</groupId>
            <artifactId>kettle-ui-swt</artifactId>
            <version>${kettle.version}</version>
        </dependency>
        <dependency>
            <groupId>pentaho</groupId>
            <artifactId>pentaho-report-wizard-core</artifactId>
            <version>${kettle.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.xmlgraphics</groupId>
            <artifactId>xmlgraphics-commons</artifactId>
            <version>2.2</version>
        </dependency>
        <dependency>
            <groupId>org.jfree</groupId>
            <artifactId>jcommon</artifactId>
            <version>1.0.24</version>
        </dependency>

        <!-- Plugins used by the .kjb (kept commented out) -->
        <!--
        <dependency>
            <groupId>pentaho</groupId>
            <artifactId>pentaho-big-data-kettle-plugins-hive</artifactId>
            <version>${kettle.version}</version>
        </dependency>
        <dependency>
            <groupId>pentaho</groupId>
            <artifactId>pentaho-big-data-plugin</artifactId>
            <version>${kettle.version}</version>
        </dependency>
        -->

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.5</version>
        </dependency>

        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.17</version>
        </dependency>

        <dependency>
            <groupId>javax.servlet</groupId>
            <artifactId>javax.servlet-api</artifactId>
            <version>3.1.0</version>
        </dependency>
        
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-exec</artifactId>
            <version>1.3</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>

Rewriting Kitchen

Copy the official source out and modify it yourself. Find the spot where the job is run (around line 512):

[screenshot: location]

Before the job runs, add the core code below (you also need to declare the corresponding option parameters yourself, so that values can be passed in on the kitchen command line):

// kerberos auth (needs org.apache.hadoop.conf.Configuration and
// org.apache.hadoop.security.UserGroupInformation imported in Kitchen.java)
if (!Utils.isEmpty(optionKerberos) && Boolean.parseBoolean(optionKerberos.toString())) {
    log.logBasic("=================kerberos auth=====================");
    System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    try {
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(optionKerberosUser.toString(), optionKerberosPath.toString());
    } catch (IOException e) {
        log.logError("hadoop kerberos error: " + e);
        returnCode = 1;
    }
}
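Kitchen parses its /name value arguments through Kettle's CommandLineOption class, so the actual patch declares three extra options there (the names kerberos, kerberosUser, and kerberosPath are this article's own choice, not Kettle conventions). As a self-contained stand-in that mirrors the same /key value convention without pulling in the Kettle jars:

```java
import java.util.HashMap;
import java.util.Map;

public class KitchenArgs {

    // Minimal stand-in for Kettle's CommandLineOption parsing: arguments
    // arrive as "/name value" pairs, as in "./kitchen.sh /kerberos true".
    public static Map<String, String> parse(String[] args) {
        Map<String, String> opts = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (args[i].startsWith("/")) {
                opts.put(args[i].substring(1), args[i + 1]);
            }
        }
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> opts = parse(new String[] {
            "/file", "hive_test.kjb",
            "/kerberos", "true",
            "/kerberosUser", "cmcc@EXAMPLE.COM",
            "/kerberosPath", "/cmcc/kettle/source/rpone/cmcc.keytab"
        });
        // Mirror the guard used in the patched Kitchen code above.
        boolean doKerberos = Boolean.parseBoolean(opts.getOrDefault("kerberos", "false"));
        System.out.println("kerberos auth enabled: " + doKerberos);  // prints: kerberos auth enabled: true
    }
}
```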

Since I also added generation of a PNG preview image of the .kjb flow, my code ended up looking like this:

[screenshot: preview]

Package and replace

Passing the principal parameter from the Kettle client

In the options of the Hive database connection, pass the principal parameter:

[screenshot: passing the parameter]

Running

Since my values come in through option parameters, the parameters must be passed when running Kettle:

./kitchen.sh /file hive_test.kjb /kerberos true /kerberosUser cmcc@EXAMPLE.COM /kerberosPath /cmcc/kettle/source/rpone/cmcc.keytab

Result

[screenshot: success]

If we do not enable authentication:

./kitchen.sh /file hive_test.kjb /kerberos false /kerberosUser cmcc@EXAMPLE.COM /kerberosPath /cmcc/kettle/source/rpone/cmcc.keytab

The result is as follows:

[screenshot: error]

  1. hadoop-client.jar needs to be placed under data-integration/lib, and other jars are probably needed as well.
