Accessing HBase on IDH 2.3 from Kettle
Published: 2019-06-21


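Summary

Kettle is an open-source ETL tool written in pure Java. It runs on Windows, Linux, and Unix, needs no installation, and extracts data efficiently and reliably. The big-data-plugin is a Kettle plugin for accessing big data stores, including Hadoop, Cassandra, MongoDB, and other NoSQL databases.

At the time of writing, Kettle is at version 4.4.1, and the big-data-plugin supports Cloudera CDH3u4 and CDH4.1 but not yet IDH, Intel's Hadoop distribution.

This article describes how to make Kettle work with the IDH Hadoop version.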

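Method

This assumes you have already installed an IDH-2.3 cluster and copied the hadoop, hbase, and zookeeper directories out of /usr/lib/.

First, download a Kettle release, such as the community edition data-integration. Go to the data-integration/plugins/pentaho-big-data-plugin directory and, in plugin.properties, set the active.hadoop.configuration property to cdh4: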

active.hadoop.configuration=cdh4
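The property edit above can also be scripted. The following is a minimal sketch, assuming plugin.properties uses plain key=value lines; the helper name set_property is hypothetical and not part of Kettle.

```python
from pathlib import Path


def set_property(path, key, value):
    """Set key=value in a .properties file, replacing the existing entry if present.

    Hypothetical helper: assumes simple key=value lines with no line continuations.
    """
    path = Path(path)
    lines = path.read_text(encoding="utf-8").splitlines()
    updated, found = [], False
    for line in lines:
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        # Rewrite only the line whose key matches; keep comments and other keys as-is.
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            updated.append(f"{key}={value}")
            found = True
        else:
            updated.append(line)
    if not found:
        # The key was absent, so append it at the end of the file.
        updated.append(f"{key}={value}")
    path.write_text("\n".join(updated) + "\n", encoding="utf-8")
```

For example, set_property("data-integration/plugins/pentaho-big-data-plugin/plugin.properties", "active.hadoop.configuration", "cdh4") would apply the change described here.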

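Adjust Kettle's log4j log level, then start Kettle and check whether any errors are reported during startup. If there are, fix them before continuing.

Go to the hadoop-configurations directory, copy the cdh3u4 directory, and name the copy idh2.3.

Because IDH and CDH ship different Hadoop versions, the Hadoop, HBase, and ZooKeeper jars must be replaced with the IDH versions. The jars to replace or add are listed below; copy them from the directories of the IDH installation: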
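The copy step can be sketched in a few lines of Python. This is only an illustration: clone_configuration is a hypothetical helper, and the directory names are the ones used in this article.

```python
import shutil
from pathlib import Path


def clone_configuration(configs_dir, source="cdh3u4", target="idh2.3"):
    """Copy hadoop-configurations/<source> to hadoop-configurations/<target>."""
    src = Path(configs_dir) / source
    dst = Path(configs_dir) / target
    if not src.is_dir():
        raise FileNotFoundError(f"missing source configuration: {src}")
    # copytree raises FileExistsError if dst already exists,
    # so an earlier idh2.3 folder is never silently overwritten.
    shutil.copytree(src, dst)
    return dst
```

It would be called with the path to the hadoop-configurations directory, for example clone_configuration("data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations").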

data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/idh2.3/lib/pmr/hbase-0.94.1-Intel.jar
data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/idh2.3/lib/pmr/protobuf-java-2.4.0a.jar
data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/idh2.3/lib/pmr/zookeeper-3.4.5-Intel.jar
data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/idh2.3/lib/client/hadoop-ant-1.0.3-Intel.jar
data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/idh2.3/lib/client/hadoop-core-1.0.3-Intel.jar
data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/idh2.3/lib/client/hadoop-examples-1.0.3-Intel.jar
data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/idh2.3/lib/client/hadoop-test-1.0.3-Intel.jar
data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/idh2.3/lib/client/hadoop-tools-1.0.3-Intel.jar
data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/idh2.3/lib/libthrift-0.8.0.jar
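Copying these jars by hand is error-prone, so here is a rough sketch of automating it. It assumes the jars can be found by file name somewhere under the hadoop, hbase, and zookeeper directories copied from the cluster; where exactly they sit inside those directories is not stated in this article, so the script searches recursively. The name copy_idh_jars is hypothetical.

```python
import shutil
from pathlib import Path

# Jar file name -> target folder inside the idh2.3 configuration (from the list above).
IDH_JARS = {
    "hbase-0.94.1-Intel.jar": "lib/pmr",
    "protobuf-java-2.4.0a.jar": "lib/pmr",
    "zookeeper-3.4.5-Intel.jar": "lib/pmr",
    "hadoop-ant-1.0.3-Intel.jar": "lib/client",
    "hadoop-core-1.0.3-Intel.jar": "lib/client",
    "hadoop-examples-1.0.3-Intel.jar": "lib/client",
    "hadoop-test-1.0.3-Intel.jar": "lib/client",
    "hadoop-tools-1.0.3-Intel.jar": "lib/client",
    "libthrift-0.8.0.jar": "lib",
}


def copy_idh_jars(idh_roots, config_dir, jars=IDH_JARS):
    """Copy each listed jar from the IDH directories into config_dir.

    Returns the names of jars that were not found under any root.
    """
    not_found = []
    for name, subdir in jars.items():
        # Take the first file with this exact name under any of the given roots.
        source = next(
            (hit for root in idh_roots for hit in Path(root).rglob(name)), None
        )
        if source is None:
            not_found.append(name)
            continue
        target_dir = Path(config_dir) / subdir
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target_dir / name)
    return not_found
```

A non-empty return value means some jars still have to be located manually.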

Other dependencies can be added as needed. Remove any jar that is present in more than one version.

The CDH jars to delete are:
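To spot jars that exist in several versions, a quick scan of the lib folder can help. This is a sketch that relies on the common name-version.jar file naming convention, which is an assumption and will not hold for every jar; the function names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: <artifact>-<version>.jar, where the version starts with a digit.
JAR_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def split_jar_name(filename):
    """Split 'hbase-0.94.1-Intel.jar' into ('hbase', '0.94.1-Intel')."""
    match = JAR_PATTERN.match(filename)
    if match is None:
        # No recognizable version: treat the whole stem as the artifact name.
        stem = filename[:-4] if filename.endswith(".jar") else filename
        return stem, ""
    return match.group("name"), match.group("version")


def find_multi_version_jars(lib_dir):
    """Return {artifact: [versions]} for artifacts found in more than one version."""
    versions = defaultdict(set)
    for jar in Path(lib_dir).rglob("*.jar"):
        name, version = split_jar_name(jar.name)
        versions[name].add(version)
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}
```

Running find_multi_version_jars on the idh2.3/lib folder after copying the IDH jars should list, for instance, hbase with both its CDH and its Intel version until the CDH jar is deleted.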

data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/idh2.3/lib/pmr/hbase-0.90.6-cdh3u4.jar
data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/idh2.3/lib/pmr/zookeeper-3.3.5-cdh3u4.jar
data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/idh2.3/lib/client/hadoop-client-0.20.2-cdh3u4.jar
data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/idh2.3/lib/client/hadoop-core-0.20.2-cdh3u4.jar
data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/idh2.3/lib/libfb303-0.5.0-cdh.jar
data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/idh2.3/lib/libthrift-0.5.0-cdh.jar
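The deletion can be scripted with an explicit list rather than a wildcard, so that other files in the folder whose names happen to contain "cdh" are left alone. A minimal sketch follows; remove_jars is a hypothetical helper, and it defaults to a dry run.

```python
from pathlib import Path

# Paths relative to hadoop-configurations/idh2.3, taken from the list above.
CDH_JARS = [
    "lib/pmr/hbase-0.90.6-cdh3u4.jar",
    "lib/pmr/zookeeper-3.3.5-cdh3u4.jar",
    "lib/client/hadoop-client-0.20.2-cdh3u4.jar",
    "lib/client/hadoop-core-0.20.2-cdh3u4.jar",
    "lib/libfb303-0.5.0-cdh.jar",
    "lib/libthrift-0.5.0-cdh.jar",
]


def remove_jars(config_dir, relative_paths=CDH_JARS, dry_run=True):
    """Delete the listed jars under config_dir.

    Returns (present, missing). With dry_run=True nothing is deleted and
    'present' lists what would be removed.
    """
    present, missing = [], []
    for rel in relative_paths:
        jar = Path(config_dir) / rel
        if jar.is_file():
            if not dry_run:
                jar.unlink()
            present.append(rel)
        else:
            missing.append(rel)
    return present, missing
```

Calling it once with the default dry_run=True shows what would go, and a second call with dry_run=False performs the deletion.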

In plugin.properties, change the active.hadoop.configuration property to idh2.3. Restart Kettle and watch for errors during startup.

Verification

  1. Open the HBase Output step and configure the ZooKeeper host and port.
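Before restarting, it can be worth confirming that the configured value really names a folder under hadoop-configurations, which is how the steps in this article use it. A small sketch, with a hypothetical function name and assuming simple key=value lines in plugin.properties:

```python
from pathlib import Path


def active_configuration_dir(plugin_dir):
    """Return the folder named by active.hadoop.configuration, or None.

    None means either the property is absent or no folder with that name
    exists under <plugin_dir>/hadoop-configurations.
    """
    plugin_dir = Path(plugin_dir)
    text = (plugin_dir / "plugin.properties").read_text(encoding="utf-8")
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "active.hadoop.configuration":
            config_dir = plugin_dir / "hadoop-configurations" / value.strip()
            return config_dir if config_dir.is_dir() else None
    return None
```

Here plugin_dir is the data-integration/plugins/pentaho-big-data-plugin directory.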

[Screenshot: HBase Output setup for IDH 2.3]

  2. On the Create/Edit mappings tab, click Get table names. If the step hangs and the Kettle console shows the exception below, check whether the client jar versions match those on the server:

INFO client.HConnectionManager$HConnectionImplementation: getMaster attempt 0 of 10 failed; retrying after sleep of 1000
java.io.IOException: Call to OS-GZP2308-04/192.168.40.84:60000 failed on local exception: java.io.EOFException
    at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1110)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1079)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
    at $Proxy5.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:335)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:312)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:364)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:710)
    at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:141)
    at com.intel.hbase.test.createtable.TableBuilder.main(TableBuilder.java:48)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:605)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:538)
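One way to run the version check suggested above is to compare jar file names on the Kettle side with those in the directories copied from the cluster. The sketch below is a rough comparison based on file names only, assuming the name-version.jar convention; extra jars such as test jars on the server side will show up as differences and need to be judged by hand. The function names and the default artifact list are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming scheme: <artifact>-<version>.jar, where the version starts with a digit.
JAR_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")
CHECKED = ("hadoop-core", "hbase", "zookeeper")


def jar_versions(root, artifacts=CHECKED):
    """Collect {artifact: set of versions} for the given artifacts under root."""
    found = {}
    for jar in Path(root).rglob("*.jar"):
        match = JAR_PATTERN.match(jar.name)
        if match and match.group("name") in artifacts:
            found.setdefault(match.group("name"), set()).add(match.group("version"))
    return found


def version_mismatches(client_dir, server_dir, artifacts=CHECKED):
    """Return {artifact: (client_versions, server_versions)} where the two sides differ."""
    client = jar_versions(client_dir, artifacts)
    server = jar_versions(server_dir, artifacts)
    return {
        name: (sorted(client.get(name, ())), sorted(server.get(name, ())))
        for name in artifacts
        if client.get(name, set()) != server.get(name, set())
    }
```

An empty result suggests the three client jars line up with the server; anything else points at the artifact to replace in the idh2.3 folder.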

Reposted from: http://tbcda.baihongyu.com/
