Install

Find an existing Hadoop installation:

env | grep -i hadoop
echo $HADOOP_HOME
echo $HADOOP_CLASSPATH
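
The checks above can be wrapped in a small helper. This is only a sketch: find_hadoop_home is a hypothetical name, and the fallback assumes the hadoop launcher sits at <home>/bin/hadoop and that readlink -f is available (GNU coreutils or a recent macOS).

```shell
# Hypothetical helper: print the Hadoop installation directory.
find_hadoop_home() {
  # Prefer the explicitly configured location.
  if [ -n "${HADOOP_HOME:-}" ]; then
    printf '%s\n' "$HADOOP_HOME"
    return 0
  fi
  # Otherwise derive it from the launcher on PATH, assuming <home>/bin/hadoop.
  hadoop_bin=$(command -v hadoop) || return 1
  hadoop_bin=$(readlink -f "$hadoop_bin") || return 1
  dirname "$(dirname "$hadoop_bin")"
}
```

Calling find_hadoop_home prints the directory, or returns a non-zero status when neither HADOOP_HOME nor a hadoop executable is found.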

Hadoop directory structure:

hadoop-3.2.1-1.3.2
├── bin                     # Hadoop executables, including hadoop, hdfs, and yarn
│   ├── container-executor
│   ├── hadoop
│   ├── hdfs
│   ├── mapred
│   ├── oom-listener
│   └── yarn
├── etc                     # Hadoop configuration files
│   └── hadoop
│       ├── core-site.xml
│       ├── hdfs-site.xml
│       ├── httpfs-site.xml
│       ├── kms-site.xml
│       ├── log4j.properties
│       ├── mapred-site.xml
│       ├── workers
│       └── yarn-site.xml
├── sbin                    # Hadoop daemon start and stop scripts
│   ├── hadoop-daemon.sh
│   ├── hadoop-daemons.sh
│   ├── httpfs.sh
│   ├── kms.sh
│   ├── mr-jobhistory-daemon.sh
│   ├── refresh-namenodes.sh
│   ├── start-all.sh
│   ├── start-balancer.sh
│   ├── start-dfs.sh
│   ├── start-secure-dns.sh
│   ├── start-yarn.sh
│   ├── stop-all.sh
│   ├── stop-balancer.sh
│   ├── stop-dfs.sh
│   ├── stop-secure-dns.sh
│   ├── stop-yarn.sh
│   ├── workers.sh
│   ├── yarn-daemon.sh
│   └── yarn-daemons.sh
└── share                   # Hadoop shared JAR files
    └── hadoop
        ├── client
        ├── common
        ├── hdfs
        ├── mapreduce
        ├── tools
        └── yarn
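
As a quick sanity check, the layout above can be verified with a short function. This is a sketch: check_hadoop_layout is a hypothetical name, and it only tests for the four top-level directories shown in the tree, not for individual files.

```shell
# Hypothetical helper: report which expected sub-directories are missing
# from a Hadoop installation directory. Returns 0 when all are present.
check_hadoop_layout() {
  home=$1
  missing=0
  for sub in bin etc/hadoop sbin share/hadoop; do
    if [ ! -d "$home/$sub" ]; then
      echo "missing: $sub"
      missing=1
    fi
  done
  return "$missing"
}
```

Typical use would be check_hadoop_layout "$HADOOP_HOME", which prints nothing and returns 0 for a complete installation.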

Commands

Hadoop Commands Guide

# check version
hadoop version

# run a jar file
hadoop jar <jar> [mainClass] [args...]

# copy files within or between clusters (distributed copy)
hadoop distcp <src> <dest>

HDFS

Architecture

  • NameNode: manages the file system namespace and provides the interface for accessing files
  • DataNode: stores the actual data blocks
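
DataNodes store each file as a sequence of fixed-size blocks. As a rough illustration, the number of blocks a file occupies can be estimated with the sketch below. It assumes the Hadoop 3.x default dfs.blocksize of 128 MiB (134217728 bytes), which a cluster may override; hdfs_block_count is a hypothetical helper name.

```shell
# Hypothetical helper: estimate how many HDFS blocks a file of the given
# size (in bytes) occupies. The second argument is the block size in bytes
# and defaults to 128 MiB (assumed default; check dfs.blocksize).
hdfs_block_count() {
  file_size=$1
  block_size=${2:-134217728}
  # Ceiling division: a partially filled last block still counts as one.
  echo $(( (file_size + block_size - 1) / block_size ))
}
```

For example, hdfs_block_count 300000000 prints 3. To see the real block layout of a file, hdfs fsck <path> -files -blocks -locations reports its blocks and the DataNodes holding them.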

Commands

For HDFS paths, hadoop fs is equivalent to hdfs dfs

# list an HDFS directory
hadoop fs -ls <dir>

# view a text file
hadoop fs -cat <filename>
hadoop fs -cat <filename> | head

# view a compressed file (decoded to text)
hadoop fs -text <filename>

# show file and directory sizes
hadoop fs -du <dir>

For other commands, see: hadoop shell commands
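
The -cat and -text commands above can be combined into one preview helper. This is a sketch: hdfs_peek is a hypothetical name, and the extension list is an assumption about which files are compressed on a given cluster, so adjust it to the codecs actually configured.

```shell
# Hypothetical helper: print the first lines of an HDFS file, decoding it
# with -text when the extension suggests a compressed file.
# Usage: hdfs_peek <path> [line_count]
hdfs_peek() {
  path=$1
  lines=${2:-10}
  case "$path" in
    # Assumed set of compressed extensions; extend as needed.
    *.gz|*.bz2|*.deflate) hadoop fs -text "$path" ;;
    *) hadoop fs -cat "$path" ;;
  esac | head -n "$lines"
}
```

For example, hdfs_peek <filename> 20 shows the first 20 lines of the file.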

YARN

Commands

# list nodes
yarn node -list

# check resource usage
yarn top

# show queue status
yarn queue -status default

# run a jar file
yarn jar myapp.jar com.example.Main

# list applications
yarn application -list

# show application status
yarn application -status <application_id>

# kill an application
yarn application -kill <application_id>

# list containers (takes an application attempt ID; see: yarn applicationattempt -list <application_id>)
yarn container -list <application_attempt_id>

# show container logs
yarn logs -containerId <container_id>
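
Several of the commands above take an application ID, which usually has to be copied from the output of yarn application -list. The sketch below extracts the IDs from that output. It assumes each application row starts with an ID of the form application_<timestamp>_<sequence>; yarn_app_ids is a hypothetical helper name.

```shell
# Hypothetical helper: read `yarn application -list` output on stdin and
# print only the application IDs, skipping the summary and header lines.
yarn_app_ids() {
  awk '$1 ~ /^application_[0-9]+_[0-9]+$/ { print $1 }'
}
```

For example, yarn application -list -appStates RUNNING | yarn_app_ids prints one ID per line, ready to pass to yarn application -status or yarn logs -applicationId.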