Install
Find the installed Hadoop:
env | grep -i hadoop
echo $HADOOP_HOME
echo $HADOOP_CLASSPATH
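The checks above can be folded into one helper. This is a minimal sketch, not an official tool: `find_hadoop_home` is a hypothetical name, and it assumes the `hadoop` launcher found on `PATH` lives at `<home>/bin/hadoop` and is not a symlink to somewhere else.

```shell
# Print the Hadoop installation directory, or return 1 if none is found.
# find_hadoop_home is a hypothetical helper name used only for this example.
find_hadoop_home() {
    # Prefer HADOOP_HOME when it points at an existing directory.
    if [ -n "${HADOOP_HOME:-}" ] && [ -d "$HADOOP_HOME" ]; then
        printf '%s\n' "$HADOOP_HOME"
        return 0
    fi
    # Otherwise derive it from the launcher on PATH (assumed layout: <home>/bin/hadoop).
    hadoop_bin=$(command -v hadoop 2>/dev/null) || return 1
    dirname "$(dirname "$hadoop_bin")"
}

find_hadoop_home || echo "Hadoop not found: set HADOOP_HOME or add its bin/ to PATH" >&2
```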
Hadoop directory structure:
hadoop-3.2.1-1.3.2
├── bin                          # Hadoop executables, including hadoop, hdfs, yarn, etc.
│   ├── container-executor
│   ├── hadoop
│   ├── hdfs
│   ├── mapred
│   ├── oom-listener
│   └── yarn
├── etc                          # Hadoop configuration files
│   └── hadoop
│       ├── core-site.xml
│       ├── hdfs-site.xml
│       ├── httpfs-site.xml
│       ├── kms-site.xml
│       ├── log4j.properties
│       ├── mapred-site.xml
│       ├── workers
│       └── yarn-site.xml
├── sbin                         # Hadoop daemon start/stop scripts
│   ├── hadoop-daemon.sh
│   ├── hadoop-daemons.sh
│   ├── httpfs.sh
│   ├── kms.sh
│   ├── mr-jobhistory-daemon.sh
│   ├── refresh-namenodes.sh
│   ├── start-all.sh
│   ├── start-balancer.sh
│   ├── start-dfs.sh
│   ├── start-secure-dns.sh
│   ├── start-yarn.sh
│   ├── stop-all.sh
│   ├── stop-balancer.sh
│   ├── stop-dfs.sh
│   ├── stop-secure-dns.sh
│   ├── stop-yarn.sh
│   ├── workers.sh
│   ├── yarn-daemon.sh
│   └── yarn-daemons.sh
└── share                        # Hadoop shared JAR files
    └── hadoop
        ├── client
        ├── common
        ├── hdfs
        ├── mapreduce
        ├── tools
        └── yarn
Commands
# check version
hadoop version
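# run a jar file (the main class can be omitted if the jar's manifest names one)
hadoop jar <jar> [mainClass] [args...]
# copy files in parallel within or between clusters (runs as a MapReduce job)
hadoop distcp <src> <dest>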
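As a concrete use of `hadoop jar`, the sketch below runs the `wordcount` program from the MapReduce examples jar that Apache Hadoop distributions ship under `share/hadoop/mapreduce`. The helper name `find_examples_jar` and the HDFS paths `/tmp/wc-in` and `/tmp/wc-out` are assumptions made for illustration, and a vendor build may name or place the jar differently.

```shell
# Print the path of the bundled MapReduce examples jar under a Hadoop home.
# find_examples_jar is a hypothetical helper name used only for this example.
find_examples_jar() {
    for jar in "$1"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
        # An unmatched glob stays literal, so check that the file really exists.
        [ -f "$jar" ] && { printf '%s\n' "$jar"; return 0; }
    done
    return 1
}

# /tmp/wc-in and /tmp/wc-out are placeholder HDFS paths.
# The input directory must already contain text files; the output directory must not exist yet.
if command -v hadoop >/dev/null 2>&1 && examples_jar=$(find_examples_jar "${HADOOP_HOME:-}"); then
    hadoop jar "$examples_jar" wordcount /tmp/wc-in /tmp/wc-out
fi
```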
HDFS
Architecture
- NameNode: manages the file system namespace and provides the interface for accessing files
- DataNode: stores the actual data blocks
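The split between the two roles can be observed on a running cluster. The following is a rough sketch: the function names are hypothetical wrappers, while `hdfs fsck` and `hdfs dfsadmin -report` are standard HDFS commands. Both query the NameNode, which answers with metadata about blocks stored on the DataNodes.

```shell
# Show how a file is split into blocks and which DataNodes hold each replica.
# show_block_locations is a hypothetical wrapper name used only for this example.
show_block_locations() {
    hdfs fsck "$1" -files -blocks -locations
}

# Summarize cluster capacity and list the DataNodes known to the NameNode.
# Depending on the cluster's configuration this may need HDFS superuser privileges.
show_datanodes() {
    hdfs dfsadmin -report
}
```

For example, `show_block_locations /path/to/file` prints one line per block together with the addresses of the DataNodes that store it.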
Commands
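hadoop fs works with any file system Hadoop supports; on HDFS it is equivalent to hdfs dfs
# list files in HDFS
hadoop fs -ls <dir>
# view a text file
hadoop fs -cat <filename>
hadoop fs -cat <filename> | head
# view a compressed file (decoded to text)
hadoop fs -text <filename>
# show file sizes
hadoop fs -du <dir>
For other commands, see the Hadoop FileSystem Shell commands reference.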
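The commands above combine into a simple round trip: upload a local file, inspect it, and clean up. This is only a sketch; `hdfs_round_trip` is a hypothetical helper, and the default HDFS directory `/tmp/hdfs-demo` is a placeholder chosen for the example.

```shell
# Upload a local file to HDFS, read it back, then remove the test directory.
# hdfs_round_trip and the default directory /tmp/hdfs-demo are assumptions for this example.
hdfs_round_trip() {
    local_file=$1
    hdfs_dir=${2:-/tmp/hdfs-demo}
    hadoop fs -mkdir -p "$hdfs_dir" &&                               # create the target directory
    hadoop fs -put -f "$local_file" "$hdfs_dir/" &&                  # upload, overwriting if present
    hadoop fs -ls "$hdfs_dir" &&                                     # confirm the file is listed
    hadoop fs -cat "$hdfs_dir/$(basename "$local_file")" | head &&   # show the first lines
    hadoop fs -du -h "$hdfs_dir" &&                                  # human-readable sizes
    hadoop fs -rm -r -skipTrash "$hdfs_dir"                          # delete without using the trash
}
```

Call it as `hdfs_round_trip ./data.txt`. Because the last step deletes the whole directory, point the second argument only at a scratch location.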
YARN
Commands
# list nodes
yarn node -list
# check resource usage
yarn top
# show queue status
yarn queue -status default
# run a jar file
yarn jar myapp.jar com.example.Main
# list applications
yarn application -list
# show application status
yarn application -status <application_id>
# kill an application
yarn application -kill <application_id>
# list containers of an application attempt
# (find attempt IDs with: yarn applicationattempt -list <application_id>)
yarn container -list <application_attempt_id>
# show container logs
yarn logs -containerId <container_id>
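The listing and status commands can be chained in a script. The sketch below assumes that data rows of `yarn application -list` begin with the application ID (`application_<clusterTimestamp>_<sequence>`) while header lines do not; `app_ids` and `running_app_status` are hypothetical helper names.

```shell
# Extract application IDs from `yarn application -list` output read on stdin.
# Assumption: data rows start with the ID; header lines do not match the pattern.
app_ids() {
    awk '/^application_[0-9]+_[0-9]+/ { print $1 }'
}

# Print the status report of every application currently in the RUNNING state.
running_app_status() {
    yarn application -list -appStates RUNNING 2>/dev/null | app_ids |
    while read -r app_id; do
        yarn application -status "$app_id"
    done
}
```

The same pipeline works with `yarn application -kill "$app_id"` in the loop body, which is worth testing first with the status variant since killing is not reversible.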