CDH

1 Overview

Reference documentation: https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_60_download.html

RPM download locations:
parcels: https://archive.cloudera.com/cdh6/6.0.0/parcels/
cm: https://archive.cloudera.com/cdh6/6.0.0/redhat7/yum/

2 Installation

2.1 Offline Installation

2.1.1 CM Parcels Offline Installation

cm server & cm agent installation is already handled by ansible-playbooks [skipped here, covered later].

  • Parcel configuration


  • Parcel download and activation

Using SPARK2 as an example; other parcels are activated the same way.


2.1.2 SPARK2 Install

# CSD file download: https://www.cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html

Copy the downloaded CSD jar into this directory:
/opt/cloudera/csd/SPARK2_ON_YARN-2.3.0.cloudera3.jar
Restart the cm service, then install it through parcels.
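These two steps can be sketched as a small script (paths from above; the restart command assumes a systemd-managed cm server, and the script is guarded so it no-ops on hosts without CDH):

```shell
# Sketch: install the SPARK2 CSD jar and restart Cloudera Manager.
CSD_JAR=SPARK2_ON_YARN-2.3.0.cloudera3.jar
CSD_DIR=/opt/cloudera/csd
if [ -f "$CSD_JAR" ] && [ -d "$CSD_DIR" ]; then
  cp "$CSD_JAR" "$CSD_DIR/"
  systemctl restart cloudera-scm-server  # assumes systemd
  RESULT="installed and restarted"
else
  RESULT="skipped: CSD jar or $CSD_DIR not present on this host"
fi
echo "$RESULT"
```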
  • Add the service


  • SPARK2 now shows up; the installation is a simple click-through, so detailed screenshots are omitted.


  • After installation

  • Update environment variables and the configuration directory
# configure_spark2_as_default.sh, run on all nodes
#! /bin/bash
for binary in pyspark spark-shell spark-submit; do
  # Generate the name of the new binary e.g. pyspark2, spark2-shell, etc.
  new_binary=$(echo $binary | sed -e 's/spark/spark2/')
  # Update the old alternative to the client binary to the new client binary
  # Use priority 11 because the default priority with which these alternatives are created is 10
  update-alternatives --install /usr/bin/${binary} ${binary} /usr/bin/${new_binary} 11
done
# For configuration, we need to have a separate command
# because the destination is under /etc/ instead of /usr/bin like for binaries.
# The priority is different - 52 because Cloudera Manager sets up configuration symlinks
# with priority 51.
update-alternatives --install /etc/spark/conf spark-conf /etc/spark2/conf 52
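A quick way to confirm the alternatives took effect (a sketch; it simply reports "not registered" on hosts where the script above has not run):

```shell
# Show where the spark-submit binary and spark-conf directory currently point.
for alt in spark-submit spark-conf; do
  STATUS=$(update-alternatives --display "$alt" 2>/dev/null || echo "$alt: not registered")
  echo "$STATUS"
done
```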

2.1.3 Kerberos

Set up the Kerberos service first; a one-click ansible-playbook is already written. Kerberos installation and master/standby data synchronization will be covered in a separate article. This chapter only describes how CDH integrates with Kerberos. Kerberos-Install-URL=http://wangbokun.com/xxxx

2.1.4 S3 connector

2.1.5 Sentry

2.2 CM UI Operations

2.2.1 The cm 6.0 UI

A brand-new UI.

2.2.2 NN HA

Click Enable HA, select the roles (NN && JN), configure the metadata directories, and continue; the next wizard screen appears (screenshot omitted).

Local parcel repo: /data/cloudera/parcel-repo; parcel URL: http://xxx/cdh/cdh6/parcels/

3 Configuration

3.1 CM S3 connection configuration

  • Install the S3 connector service; the installation is a click-through in the UI and is not detailed here.

  • Configure the S3 endpoint, e.g. the Beijing region (s3.cn-north-1.amazonaws.com.cn).

  • Configure the S3 keys for HDFS.
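In core-site.xml terms, the endpoint and keys above correspond to a fragment like this (a sketch; `fs.s3a.endpoint` is the standard s3a property, and the key values are left blank to fill in):

```xml
<property>
    <name>fs.s3a.endpoint</name>
    <value>s3.cn-north-1.amazonaws.com.cn</value>
</property>
<property>
    <name>fs.s3a.access.key</name>
    <value></value>
</property>
<property>
    <name>fs.s3a.secret.key</name>
    <value></value>
</property>
```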

3.2 Yarn

3.2.1 Yarn scheduler queue


  • Set root access control


  • Configure root.default


  • Remove the per-user limit


  • Set the whitelist of users allowed to submit jobs


  • Set the whitelist of users allowed to administer the queue [separate multiple users with commas]


  • Remove the option that allows user queues to be created automatically

  • Open the Yarn scheduler page to review the resulting queue configuration.

3.3 HDFS

3.3.1 hdfs-web-http

Setting                 Value   Description
hdfs.httpfs.http.port   14000   HttpFS port
dfs.webhdfs.enabled     true    enable webhdfs/httpfs

cloudera manager -> hdfs -> Instances -> Add Role -> HttpFS, then select the host.


Quick test:
curl -i  "http://127.0.0.1:14000/webhdfs/v1/user/root/?op=LISTSTATUS&user.name=root"
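Other HttpFS operations follow the same URL pattern; a guarded sketch (assumes the same local endpoint as the test above and simply reports unreachable elsewhere):

```shell
HTTPFS=http://127.0.0.1:14000
# MKDIRS creates a directory through the webhdfs REST API.
RESP=$(curl -s --max-time 2 -X PUT \
  "$HTTPFS/webhdfs/v1/tmp/demo?op=MKDIRS&user.name=root" 2>&1 \
  || echo "HttpFS unreachable")
echo "$RESP"
```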

3.3.2 hdfs mount

Reference documentation: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/cdh_ig_hdfs_mountable.html

# Clusters installed through cloudera manager do not need hadoop-hdfs-fuse installed by hand;
# just mount directly.
DIR=/cdh && mkdir -p  $DIR
hadoop-fuse-dfs hdfs://cdh:8020 $DIR
# If the cluster uses kerberos, the local hdfs mount is only visible after the user runs kinit; without a ticket it shows nothing.
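Per the referenced Cloudera doc, the mount can be made persistent with an /etc/fstab entry of this form (hostname and mount point taken from the commands above):

```
hadoop-fuse-dfs#dfs://cdh:8020 /cdh fuse allow_other,usetrash,rw 2 0
```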

4 Kerberos and User Management

The playbooks already automate installation and master/standby synchronization.

4.1 Enabling kerberos authentication in cm


4.2 The Sentry service

Command reference: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_show.html https://www.cnblogs.com/bugsbunny/p/7097958.html


4.3 Permission configuration

4.3.1 hive sentry


  • Sentry groups and users


4.3.2 HDFS sentry

The core-site.xml safety-valve settings used alongside this setup (s3a keys and proxy users):

<property>
    <name>fs.s3a.access.key</name>
    <value></value>
</property>

<property>
    <name>fs.s3a.secret.key</name>
    <value></value>
</property>

<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>

<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>

<property>
    <name>hadoop.proxyuser.livy.groups</name>
    <value>*</value>
</property>

<property>
    <name>hadoop.proxyuser.livy.hosts</name>
    <value>*</value>
</property>

<property>
    <name>fs.s3a.fast.upload</name>
    <value>true</value>
</property>

<property>
    <name>fs.s3a.fast.upload.buffer</name>
    <value>disk</value>
</property>

<property>
    <name>hadoop.proxyuser.zeppelin.groups</name>
    <value>*</value>
</property>

<property>
    <name>hadoop.proxyuser.zeppelin.hosts</name>
    <value>*</value>
</property>

<property>
    <name>hadoop.proxyuser.presto.groups</name>
    <value>*</value>
</property>
    
<property>
    <name>hadoop.proxyuser.presto.hosts</name>
    <value>*</value>
</property>


4.3.3 Hue sentry

Pre-grant the hive_admin group's privileges from the command line.
On the hive host, locate the keytab: first find the process directory with the highest pid:
$ ll /var/run/cloudera-scm-agent/process |grep hive-HIVESERVER2
$ cd /var/run/cloudera-scm-agent/process/482-hive-HIVESERVER2
$ ls -al hive.keytab
-rw-------. 1 hive hive  1570 Jan  8 17:01 hive.keytab
$ kinit -kt hive.keytab hive/xxx@xxx.COM

# Connect to hive with beeline
beeline
!connect jdbc:hive2://localhost:10000/;principal=hive/xxx@xxx.COM;
create role hive_admin_role;
grant all on server server1 to role hive_admin_role;
grant role hive_admin_role to group hive_admin;
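The same pattern extends beyond the admin role; a sketch of a read-only grant, with hypothetical role, group, and database names (the beeline invocation is commented out since it needs a live HiveServer2):

```shell
SQL='create role analyst_role;
grant select on database demo_db to role analyst_role;
grant role analyst_role to group analyst;'
# beeline -u "jdbc:hive2://localhost:10000/;principal=hive/xxx@xxx.COM" -e "$SQL"
echo "$SQL"
```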


  • After restarting the service, a Security menu appears in HUE.


5 Cloudera Manager S3


region endpoint:
s3.cn-north-1.amazonaws.com.cn
CM then prompts you to deploy the configuration and restart the affected services.


6 Hue

6.1 Hue LDAP


6.2 Manually importing users


  • admin group


7 HDFS

7.1 hadoop fs

7.1.1 appendToFile

[~]$ echo a > a
[~]$ echo b > b
[~]$ hadoop fs -appendToFile b appendToFile/a 
[~]$ hadoop fs -cat  appendToFile/a
a
b

7.1.2 checksum

[~]$ hadoop fs -checksum  appendToFile/a
appendToFile/a	MD5-of-0MD5-of-512CRC32C	000002000000000000000000b0a41fe649f3230d9e76f743d1345b2a

7.1.3 chgrp chmod chown

[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] [-v] [-x] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] [-x] <path> ...]

7.1.4 snapshot

[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]

```
[~]$ hadoop fs -createSnapshot  appendToFile/ bk_test
createSnapshot: Directory is not a snapshottable directory: /user/bokun.wang/appendToFile
```
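That error means the target directory was never marked snapshottable; an HDFS superuser has to enable it first. A guarded sketch:

```shell
DIR=/user/bokun.wang/appendToFile
if command -v hdfs >/dev/null 2>&1; then
  # Requires HDFS superuser privileges.
  hdfs dfsadmin -allowSnapshot "$DIR"
  hadoop fs -createSnapshot appendToFile/ bk_test
  MSG="snapshot requested"
else
  MSG="hdfs client not available on this host"
fi
echo "$MSG"
```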

7.1.5 expunge (empty the trash)

[-expunge]
```
hadoop fs -expunge
```

[-find <path> ... <expression> ...]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-usage [cmd ...]]

7.2 hdfs File ACL


#Grant read access on existing (historical) data
hdfs dfs -setfacl -m  -R  group:$user:r-x  /xxx 


#Set the default ACL so newly written data is readable too
hdfs dfs -setfacl -m -R  default:group:$user:r-x /xx
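To verify what was set, `getfacl` prints the effective and default ACLs (guarded sketch, same placeholder path as above):

```shell
if command -v hdfs >/dev/null 2>&1; then
  ACLS=$(hdfs dfs -getfacl -R /xxx 2>&1)
else
  ACLS="hdfs client not available on this host"
fi
echo "$ACLS"
```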

Q&A

1. 'REVISION_ID' doesn't exist in table

Database issue: https://community.cloudera.com/t5/Cloudera-Manager-Installation/MySQLSyntaxErrorException-Key-column-REVISION-ID-doesn-t/td-p/69621

2018-09-11 19:06:30,890 FATAL main:org.hsqldb.cmdline.SqlFile: SQL Error at 'UTF-8' line 57:
"alter table ROLE_CONFIG_GROUPS
    drop column REVISION_ID"
Key column 'REVISION_ID' doesn't exist in table
2018-09-11 19:06:30,890 FATAL main:org.hsqldb.cmdline.SqlFile: Rolling back SQL transaction.
2018-09-11 19:06:30,892 ERROR main:com.cloudera.enterprise.dbutil.SqlFileRunner: Exception while executing ddl scripts.
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Key column 'REVISION_ID' doesn't exist in table
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

Fix for MariaDB 10.2.12:

# Add the two lines numbered 83 and 84 below
vim  /opt/cloudera/cm/schema/mysql/05003_cmf_schema.mysql.ddl

 80 alter table CONFIGS
 81     drop column REVISION_ID;
 82
 83 ALTER TABLE ROLE_CONFIG_GROUPS DROP INDEX IDX_UNIQUE_ROLE_CONFIG_GROUP;
 84 ALTER TABLE ROLE_CONFIG_GROUPS DROP INDEX IDX_ROLE_CONFIG_GROUP_CONFIG_REVISION;
 85
 86 alter table ROLE_CONFIG_GROUPS
 87     drop column REVISION_ID;

Then these two steps go through:
CREATE DATABASE scm DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
sh -x /opt/cloudera/cm/schema/scm_prepare_database.sh mysql -h $IP   --scm-host $IP scm scm scm


The two steps above are equivalent to:
/opt/cloudera/cm/schema/scm_prepare_database.sh mysql -h $IP -u$USER -p$PW --scm-host $IP scm scm scm

2. FK_SERVICE_CONFIG_REVISION; check that it exists

The cm service fails to start.
# Error log
2018-12-26 18:09:31,359 FATAL main:org.hsqldb.cmdline.SqlFile: SQL Error at 'UTF-8' line 2:
"alter table SERVICES
    drop foreign key FK_SERVICE_CONFIG_REVISION"
Can't DROP FOREIGN KEY `FK_SERVICE_CONFIG_REVISION`; check that it exists
2018-12-26 18:09:31,360 ERROR main:com.cloudera.enterprise.dbutil.SqlFileRunner: Exception while executing ddl scripts.
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Can't DROP FOREIGN KEY `FK_SERVICE_CONFIG_REVISION`; check that it exists
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	
# [Fix]
#http://community.cloudera.com/t5/Cloudera-Manager-Installation/Foreign-Key-issue-creating-SCM-repository-in-MySQL-database/td-p/2523

Alter the DDL scripts in the /usr/share/cmf/schema/mysql directory.

You will need to add two lines:

After the first update add 

SET FOREIGN_KEY_CHECKS=0;

Then at the bottom of the file add 

SET FOREIGN_KEY_CHECKS=1;

 

You will need to do that to the following files: 

00035_cmf_schema.mysql.ddl
00043_cmf_schema.mysql.ddl
04509_cmf_schema.mysql.ddl
04511_cmf_schema.mysql.ddl
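Patching all four files can be scripted; a sketch that prepends the disable line and appends the re-enable line (prepending at the top is functionally equivalent to placing it after the first update, and the loop no-ops where the files are absent):

```shell
SCHEMA_DIR=/usr/share/cmf/schema/mysql
for f in 00035 00043 04509 04511; do
  DDL="$SCHEMA_DIR/${f}_cmf_schema.mysql.ddl"
  if [ -f "$DDL" ]; then
    sed -i '1i SET FOREIGN_KEY_CHECKS=0;' "$DDL"   # disable FK checks for the whole script
    echo 'SET FOREIGN_KEY_CHECKS=1;' >> "$DDL"     # re-enable at the end
  fi
done
MSG="patched files under $SCHEMA_DIR (where present)"
echo "$MSG"
```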

3. ExitCodeException exitCode=24: Invalid conf file provided : /etc/hadoop/conf.cloudera.yarn/container-executor.cfg

INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:269)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:562)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:609)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:199)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:267)
        ... 3 more
Caused by: ExitCodeException exitCode=24: Invalid conf file provided : /etc/hadoop/conf.cloudera.yarn/container-executor.cfg

[Fix]


$ chmod 6050 ./cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/lib/hadoop-yarn/bin/container-executor
---Sr-s---. 1 root yarn 53728 Aug 10 02:14 ./cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/lib/hadoop-yarn/bin/container-executor
----r-x---. 1 root yarn 53728 Jan 23 16:31 ./cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/lib/hadoop-yarn/bin/container-executor.20190123