2016-05-10

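Hadoop cluster management

Configuring the YARN log and intermediate-result directories

I recently ran into jobs filling up the system disk. The cause turned out to be yarn.nodemanager.log-dirs, which was left at its default value, so the logs from running tasks piled up on the system disk.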

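yarn.nodemanager.local-dirs
Description: where intermediate results are stored, similar to mapred.local.dir in Hadoop 1.0. This parameter is usually set to several directories to spread the disk I/O load.
Default: ${hadoop.tmp.dir}/nm-local-dir
yarn.nodemanager.log-dirs
Description: where container logs are stored (several directories may be configured).
Default: ${yarn.log.dir}/userlogs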

Both parameters live in yarn-site.xml. Set them explicitly rather than relying on the defaults, and point them at the data disks.
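As a rough sketch of what such a configuration might look like, the fragment below places both directory lists on two data disks. The mount points /data1 and /data2 and the subdirectory names are hypothetical; substitute the data-disk paths that exist on your NodeManager hosts.

```xml
<!-- yarn-site.xml: the paths below are illustrative assumptions, not defaults -->
<property>
    <name>yarn.nodemanager.local-dirs</name>
    <!-- comma-separated list, one directory per data disk -->
    <value>/data1/yarn/local,/data2/yarn/local</value>
</property>
<property>
    <name>yarn.nodemanager.log-dirs</name>
    <!-- keep container logs off the system disk as well -->
    <value>/data1/yarn/logs,/data2/yarn/logs</value>
</property>
```

Listing one directory per physical disk spreads the I/O across disks, and the NodeManager needs to be restarted for the change to take effect.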

For an explanation of other ResourceManager and NodeManager parameters, see http://dongxicheng.org/mapreduce-nextgen/hadoop-yarn-configurations-resourcemanager-nodemanager/

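Enabling ACLs on HDFS

HDFS itself does not provide a way to create users or groups. When a client runs a hadoop file command, Hadoop identifies the user name and group of the process that issued the command, then checks file permissions against that user and group.

To turn on permission checking and ACL support, set the following in hdfs-site.xml: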

<property>
    <name>dfs.permissions.enabled</name>
    <value>true</value>
</property>
<property>
    <name>dfs.namenode.acls.enabled</name>
    <value>true</value>
</property>

For HDFS, the mapping of users to groups is performed on the NameNode. Thus, the host system configuration of the NameNode determines the group mappings for the users.
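How the NameNode resolves a user's groups is controlled by hadoop.security.group.mapping in core-site.xml. The fragment below is only a sketch showing where that setting lives; the value shown is, as far as I know, the stock default in Hadoop 2.x, which looks groups up from the NameNode host's operating system, so check it against your own distribution before relying on it.

```xml
<!-- core-site.xml on the NameNode: assumed Hadoop 2.x default, verify for your version -->
<property>
    <name>hadoop.security.group.mapping</name>
    <!-- resolves groups via the NameNode host's OS (JNI, with a shell-based fallback) -->
    <value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value>
</property>
```

With an OS-based mapping like this, a user referenced in an ACL entry must have the expected groups on the NameNode host, regardless of how the client machine is configured.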

References:

http://debugo.com/hdfs-acl/?utm_source=tuicool&utm_medium=referral
Implementing access control with Hadoop's SLA authorization mechanism: http://shiyanjun.cn/archives/994.html

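Hive query failures or YARN execution errors

Hive queries run from Hue sometimes fail with: Return Code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask. That message alone does not reveal the cause, so you have to look at the application in YARN, where the actual error is: Diagnostics: Rename cannot overwrite non empty destination directory /data/yarn/nm/usercache/xxxx/filecache/100.

Cause: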

This is a bug in Hadoop 2.6.0. It’s been marked as fixed but it still happens occasionally (see: https://issues.apache.org/jira/browse/YARN-2624).

Fix:

On every NodeManager, manually delete the contents of the /data/yarn/nm/usercache/ directory.

References:

http://stackoverflow.com/questions/30857413/hadoop-complains-about-attempting-to-overwrite-nonempty-destination-directory

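YARN unhealthy nodes

The YARN ResourceManager web UI has an Unhealthy Nodes entry. Clicking it shows the details for each node, and the health-report column gives the specific reason.

On the affected NodeManager, disk utilization had reached 90%. Since the remaining free space was still ample, I decided to adjust the YARN parameters instead.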

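yarn.nodemanager.disk-health-checker.min-healthy-disks: the minimum fraction of disks (across local-dirs and log-dirs) that must be healthy for the NodeManager to launch new containers; if fewer are healthy, YARN stops creating containers on that node. The default is 0.25.
yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage: when a disk's utilization rises above this threshold, the disk is marked as bad, which can make YARN treat the node as unhealthy. The default is 90.0.

Set the two parameters in yarn-site.xml as follows: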

<property>
  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
  <value>0.05</value>
</property>
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>95.0</value>
</property>

After making the change, restart the NodeManager first and then restart the ResourceManager; otherwise the state of the modified node can end up inconsistent.
