目录

Yarn

1 OverView

2 Yarn site

2.1 yarn 内存调优

  • Hortonworks配置推荐 https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.1/bk_installing_manually_book/content/rpm-chap1-9.html
  • RAM-per-container = max(MIN_CONTAINER_SIZE, (Total Available RAM) / containers))
  • containers = min (2CORES, 1.8DISKS, (total available RAM) / MIN_CONTAINER_SIZE)
  • 也就是Cpu cores 2倍,配载磁盘数的1.8倍,还有根据内存公式(total available RAM) / MIN_CONTAINER_SIZE)算出可用开启的containers数的综合考量。

Configuration File Configuration Setting Value Calculation
yarn-site.xml yarn.nodemanager.resource.memory-mb containers * RAM-per-container
yarn-site.xml yarn.scheduler.minimum-allocation-mb RAM-per-container
yarn-site.xml yarn.scheduler.maximum-allocation-mb containers * RAM-per-container
yarn-site.xml yarn.app.mapreduce.am.resource.mb 2 * RAM-per-container
yarn-site.xml yarn.app.mapreduce.am.command-opts 0.8 * 2 * RAM-per-container
mapred-site.xml mapreduce.map.memory.mb RAM-per-container
mapred-site.xml mapreduce.reduce.memory.mb 2 * RAM-per-container
mapred-site.xml mapreduce.map.java.opts 0.8 * RAM-per-container
mapred-site.xml mapreduce.reduce.java.opts 0.8 * 2 * RAM-per-container
wget http://public-repo-1.hortonworks.com/HDP/tools/2.1.1.0/hdp_manual_install_rpm_helper_files-2.1.1.385.tar.gz
tar -zxvf hdp_manual_install_rpm_helper_files-2.1.1.385.tar.gz
cd hdp_manual_install_rpm_helper_files-2.1.1.385

➜  scripts python yarn-utils.py -h
Usage: yarn-utils.py [options]

Options:
  -h, --help            show this help message and exit
  -c CORES, --cores=CORES
                        Number of cores on each host
  -m MEMORY, --memory=MEMORY
                        Amount of Memory on each host in GB
  -d DISKS, --disks=DISKS
                        Number of disks on each host
  -k HBASE, --hbase=HBASE
                        True if HBase is installed, False is not
  • cloudera 配置推荐

    tools: http://tiny.cloudera.com/yarn-tuning-guide

下载以后链接excel

-w1107 -w1102

2.2 yarn.nodemanager.local-dirs

#日志配置你多目录,将该值设置为一系列多磁盘目录有助于提高I/O效率
yarn.nodemanager.local-dirs
yarn.nodemanager.log-dirs

3 mapred site

3.1 mapred.split.size

#待补充
<property><name>mapred.max.split.size</name><value>134217728</value></property>
<property><name>mapred.min.split.size</name><value>134217728</value></property>

3.2 mapreduce.tasktracker.outofband.heartbeat

<property><name>mapreduce.tasktracker.outofband.heartbeat</name><value>true</value></property>
<property>

3.3 mapred.compress.map.output

#Map任务的中间结果默认不会进行压缩,设置为true,会对中间结果进行压缩,减少数据传输时的带宽需求
<property><name>mapred.compress.map.output</name><value>true</value></property>

3.4 mapred.job.reuse.jvm.num.tasks

#JVM重用机制,默认为1,表示一个JVM只能启动一个任务,现在设置为-1.表示1个JVM可以启动的任务数不受限制
<property><name>mapred.job.reuse.jvm.num.tasks</name><value>-1</value></property

3.5 mapreduce.jobtracker.heartbeat.interval.min

3.6 mapred.heartbeats.in.second

3.7 mapreduce.jobtracker.heartbeat.scaling.factor

mapred.max.map.failures.percent 作业允许失败的map最大比例 默认值0,即0% mapred.max.reduce.failures.percent 作业允许失败的reduce最大比例 默认值0,即0% mapred.map.max.attemps 失败后最多重新尝试的次数 默认是4 mapred.reduce.max.attemps 失败后最多重新尝试的次数 默认是4