Environment Preparation
Three virtual machines (CentOS 7.6) with fixed IP addresses:

hadoop101 192.168.10.101
hadoop102 192.168.10.102
hadoop103 192.168.10.103
Environment Setup

Stop the firewall and disable its autostart:
```bash
systemctl stop firewalld
systemctl disable firewalld.service
```
Create the hadoop user:
```bash
useradd hadoop
passwd hadoop
```
Grant the hadoop user root privileges by editing /etc/sudoers. Note that the hadoop line must come after the %wheel line: sudoers applies the last matching rule, so placing it earlier would let the wheel rule override NOPASSWD.
```
## Allow root to run any commands anywhere
root      ALL=(ALL)     ALL

## Allows people in group wheel to run all commands
%wheel    ALL=(ALL)     ALL
hadoop    ALL=(ALL)     NOPASSWD:ALL
```
Create module and software directories under /opt:
```bash
mkdir /opt/module
mkdir /opt/software
```
Change the owner and group of the module and software folders to hadoop:
```bash
chown hadoop:hadoop /opt/module
chown hadoop:hadoop /opt/software
```
Uninstall the JDK that ships with the VM:
```bash
rpm -qa | grep -i java | xargs -n1 rpm -e --nodeps
```
Reboot the VM.
Set a static IP address (take the gateway and related values from the VM's network settings):
```bash
vim /etc/sysconfig/network-scripts/ifcfg-ens33
```
```
DEVICE=ens33
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
NAME="ens33"
IPADDR=192.168.10.102
PREFIX=24
GATEWAY=192.168.10.101
DNS1=192.168.10.2
```
Modify the hostname.
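A minimal example using hostnamectl (editing /etc/hostname directly also works); hadoop102 below stands for whichever name the current VM should get:

```bash
hostnamectl set-hostname hadoop102
```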
Edit the hosts file and add the corresponding hostname mappings.
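Open the file:

```bash
vim /etc/hosts
```

Add: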
```
192.168.10.101 hadoop101
192.168.10.102 hadoop102
192.168.10.103 hadoop103
```
Do the same on the other machines, so all three VMs have their IP addresses and hostnames set.
On Windows, edit the hosts file under C:\Windows\System32\drivers\etc and add the same hostname mappings:
```
192.168.10.101 hadoop101
192.168.10.102 hadoop102
192.168.10.103 hadoop103
```
Upload the JDK to /opt/software, then extract it:
```bash
tar -zxvf jdk-8u212-linux-x64.tar.gz -C /opt/module/
```
Configure the JDK environment variables.
Create a new file, /etc/profile.d/my_env.sh:
```bash
sudo vim /etc/profile.d/my_env.sh
```
Add the following:
```bash
# JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
```
Apply the configuration with source:
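```bash
source /etc/profile
```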
Verify the installation:
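```bash
java -version
```

If it prints the JDK version (1.8.0_212 here), the installation succeeded.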
Hadoop download link: https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/
Upload it to /opt/software, then extract it into the module directory:
```bash
tar -zxvf hadoop-3.1.3.tar.gz -C /opt/module/
```
Add the environment variables:
```bash
sudo vim /etc/profile.d/my_env.sh
```
```bash
# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
```
Apply the configuration:
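```bash
source /etc/profile
```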
Verify the installation:
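```bash
hadoop version
```

It should report Hadoop 3.1.3.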
To manage the three VMs more easily, write a distribution script, xsync.
Create an xsync file in the /home/hadoop/bin directory:
```bash
cd /home/hadoop
mkdir bin
cd bin
vim xsync
```
The script:
```bash
#!/bin/bash

# 1. Check the number of arguments
if [ $# -lt 1 ]
then
    echo Not Enough Arguments!
    exit;
fi

# 2. Loop over every machine in the cluster
for host in hadoop101 hadoop102 hadoop103
do
    echo ==================== $host ====================
    # 3. Loop over every file/directory given and send each in turn
    for file in $@
    do
        # 4. Check that the file exists
        if [ -e $file ]
        then
            # 5. Get the parent directory (resolving symlinks)
            pdir=$(cd -P $(dirname $file); pwd)
            # 6. Get the file name
            fname=$(basename $file)
            ssh $host "mkdir -p $pdir"
            rsync -av $pdir/$fname $host:$pdir
        else
            echo $file does not exist!
        fi
    done
done
```
Make the script executable:
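```bash
chmod +x xsync
```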
Configure passwordless SSH login.
Generate a key pair, pressing Enter at each prompt:
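```bash
ssh-keygen -t rsa
```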
Copy the public key to each host (repeat the key generation and copying on hadoop102 and hadoop103 as well, since the start scripts are run from more than one node):
```bash
ssh-copy-id hadoop101
ssh-copy-id hadoop102
ssh-copy-id hadoop103
```
Configure the Hadoop files.
Configure core-site.xml:
```bash
cd $HADOOP_HOME/etc/hadoop
vim core-site.xml
```
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Address of the NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop101:8020</value>
    </property>
    <!-- Base directory for Hadoop data -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-3.1.3/data</value>
    </property>
    <!-- Static user for the HDFS web UI -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hadoop</value>
    </property>
</configuration>
```
Configure hdfs-site.xml:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop101:9870</value>
    </property>
    <!-- SecondaryNameNode web UI address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop103:9868</value>
    </property>
</configuration>
```
Configure yarn-site.xml:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Use the MapReduce shuffle as the auxiliary service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Host running the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop102</value>
    </property>
    <!-- Environment variables inherited by containers -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
```
Configure mapred-site.xml:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
```
Then distribute the configuration files:
```bash
xsync /opt/module/hadoop-3.1.3/etc/hadoop/
```
Check on all three hosts that the configuration files were modified correctly.
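One way to spot-check remotely, using core-site.xml as a representative file (the others can be checked the same way):

```bash
ssh hadoop102 cat /opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml
ssh hadoop103 cat /opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml
```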
Configure workers:
```bash
vim /opt/module/hadoop-3.1.3/etc/hadoop/workers
```
Add the following (the file must not contain trailing spaces or blank lines, since the start scripts treat every line as a hostname):
```
hadoop101
hadoop102
hadoop103
```
Then sync it to all hosts:
```bash
xsync /opt/module/hadoop-3.1.3/etc
```
On the first startup, HDFS must be formatted first. On hadoop101:
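```bash
hdfs namenode -format
```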
Then start HDFS (still on hadoop101):
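```bash
start-dfs.sh
```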
Start YARN on hadoop102:
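```bash
start-yarn.sh
```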
Open http://hadoop101:9870 in a browser to view the HDFS NameNode.
View YARN at http://hadoop102:8088.
Starting/stopping the cluster:
```bash
start-dfs.sh / stop-dfs.sh
start-yarn.sh / stop-yarn.sh
```
To make starting and stopping the cluster more convenient, write a script:
```bash
cd /home/hadoop/bin
vim myhadoop.sh
```
```bash
#!/bin/bash

if [ $# -lt 1 ]
then
    echo "No Args Input..."
    exit;
fi

case $1 in
"start")
    echo " =================== Starting the Hadoop cluster ==================="
    echo " --------------- starting hdfs ---------------"
    ssh hadoop101 "/opt/module/hadoop-3.1.3/sbin/start-dfs.sh"
    echo " --------------- starting yarn ---------------"
    ssh hadoop102 "/opt/module/hadoop-3.1.3/sbin/start-yarn.sh"
    echo " --------------- starting historyserver ---------------"
    ssh hadoop101 "/opt/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
;;
"stop")
    echo " =================== Stopping the Hadoop cluster ==================="
    echo " --------------- stopping historyserver ---------------"
    ssh hadoop101 "/opt/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
    echo " --------------- stopping yarn ---------------"
    ssh hadoop102 "/opt/module/hadoop-3.1.3/sbin/stop-yarn.sh"
    echo " --------------- stopping hdfs ---------------"
    ssh hadoop101 "/opt/module/hadoop-3.1.3/sbin/stop-dfs.sh"
;;
*)
    echo "Input Args Error..."
;;
esac
```
Make the script executable:
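```bash
chmod +x myhadoop.sh
```

Usage (on CentOS, /home/hadoop/bin is on the hadoop user's PATH by default, so the script can be called by name):

```bash
myhadoop.sh start
myhadoop.sh stop
```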
Similarly, write a script that shows the Java processes on every node of the cluster:
```bash
cd /home/hadoop/bin
vim jpsall
```
```bash
#!/bin/bash

# Run jps on every node and print the results
for host in hadoop101 hadoop102 hadoop103
do
    echo =============== $host ===============
    ssh $host jps
done
```
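As with the earlier scripts, make jpsall executable; optionally, distribute the whole /home/hadoop/bin directory with xsync so both scripts are available on every node:

```bash
chmod +x jpsall
xsync /home/hadoop/bin/
```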