Linux 系统监控与性能分析

2017-06-23 | 阅读：次

查看进程信息

查看进程号的方式大概有三种，静态的ps、动态的top、查阅程序树之间的关系pstree。

截取某个时间点的进程运行情况(静态)

# 查看所有进程
$ ps

# 使用 -ef
# -e 所有进程均显示出来，同 -A 
# -f 做一个更为完整的输出
$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Jun30 ?        00:00:07 /sbin/init
...

# 使用 aux ，注意没有 - 
# a 列出所有进程。注意 a 和 -a 是不同的两个参数。-a 列出不与terminal有关的所有进程
# u 以面向用户的格式显示进程信息。注意 u 和 -u 也是不同的两个参数。
# x 通常与参数 a 一起使用，显示更多的信息
$ ps aux 
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  19236  1464 ?        Ss   Jun30   0:07 /sbin/init
...

# 查看neu用户下正在进行的进程
$ ps -u neu
PID TTY          TIME CMD
16223 pts/3    00:08:44 java
25583 ?        00:11:10 redis-server


# 仅查看当前bash相关的进程
$ ps -l
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
0 R   503   630   682  0  80   0 - 27024 -      pts/3    00:00:00 ps
4 S   503   682   681  0  80   0 - 27113 wait   pts/3    00:00:00 bash

# 无条件终止进程
$ kill -9 24270（pid,进程号）

# ww 参数的含义是：避免详细参数被截断
# 管道符'|'用来分隔两个命令，左边命令的输出会作为右面命令的输入
# 输出所有进程号|输出含有关键字'a'的进程|去除含有关键'b的进程'|截取输入行第9~15个字符（pid）|将前面命令的输出结果（pid）作为kill -9的参数，并执行
$ ps -efww | grep a | grep -v b | cut -c 9-15 | xargs kill -9
# grep命令用来筛选信息

# 在已知进程名(name)的前提下，交互式 Shell 获取进程 pid 有很多种方法，典型的通过 grep 获取 pid 的方法为（这里添加 -v grep是为了避免匹配到 grep 进程）：
$ ps -ef | grep "name" | grep -v grep | awk '{print $2}'
# 或者不使用 grep（这里名称首字母加[]的目的是为了避免匹配到 awk 自身的进程）：
$ ps -ef | awk '/[n]ame/{print $2}'
# 如果只使用 x 参数的话则 pid 应该位于第一位：
$ ps x | awk '/[n]ame/{print $1}'
# 注意 grep 是模糊匹配，如果想使用精确匹配，使用参数 -w
$ ps -ef | grep -w "test"

# 最简单的方法是使用 pgrep：
$ pgrep -f name
# 如果需要查找到 pid 之后 kill 掉该进程，还可以使用 pkill：
$ pkill -f name
# 如果是可执行程序的话，可以直接使用 pidof
$ pidof name

查看所有java进程的pid的命令

jdk1.5起提供jps来查看java进程的pid

查看流量带宽

查看 Linux 每个进程的流量和带宽

$ apt-get install -y iptraf
$ iptraf


# 或者使用 iftop
$ apt-get install -y iptraf
# -P 在 iftop 的输出结果中开启端口显示，这样可以用 netstat 或者 lsof 找到占用端口的进程
$ iftop -P

Linux 系统监控

pidstat

从进程角度获取每个进程使用cpu、内存和磁盘等系统资源的统计信息

$ yum install -y sysstat
# 查看某进程的 cpu(-u)、memory(r)、disk(-d) 信息
# -p 指定 pid
# 以 2 秒为采样周期，输出 10 次统计信息
$ pidstat -urd -p 18936 2 10
09:32:27 AM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
09:32:27 AM     0     18936    1.66    0.31    0.00    1.97    21  java

09:32:27 AM   UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
09:32:27 AM     0     18936     48.32      0.00 41139636 35064140  26.62  java

09:32:27 AM   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
09:32:27 AM     0     18936     43.40   1058.57      0.00  java

$ pidstat

netstat

netstat 是通过解析/proc/net/下的文件返回网络相关的信息

$ yum install -y net-tools
$ info netstat
       netstat  - Print network connections, routing tables, interface statis‐
       tics, masquerade connections, and multicast memberships
      
# 使用root用户才能看到全部信息
$ netstat [options]
#  -a Show both listening and non-listening sockets，LISTEN、ESTABLISHED、TIME_WAIT
#  -l --listening, Show only listening sockets.  (These are omitted by default.)
#  -t 列出 tcp 网络数据包的数据
#  -u 列出 udp 网络数据包的数据
#  -n Show numerical addresses instead of trying to determine symbolic host, port or user names.
#  -p Show the PID and name of the program to which each socket belongs.
#  -r 路由信息
 
# 每列的详细含义可以阅读`man netstat`的 OUTPUT 部分
#
# Proto:
#   The protocol (tcp, udp, udpl, raw) used by the socket.
# Recv-Q:
#   Established: The count of bytes not copied by the user program connected to this socket. 
#   Listening: Since Kernel 2.6.18 this column contains the current syn backlog.
# Send-Q:
#   Established: The count of bytes not acknowledged by the remote host. 
#   Listening: Since Kernel 2.6.18 this column contains the maximum size of the syn backlog.
# Local Address:
#   Address  and  port  number  of  the local end of the socket.  Unless the --numeric (-n) option is 
#   specified, the socket address is resolved to its canonical host name(FQDN), and the port number 
#   is translated into the corresponding service name.
# Foreign Address:
#   Address and port number of the remote end of the socket.  Analogous to "Local Address."
# State：
#   The state of the socket. Since there are no states in raw mode and usually no states used in UDP, 
#   this column may be left blank. Normally this can be one of several values:
#     - ESTABLISHED：The socket has an established connection.
#     - SYN_SENT：The socket is actively attempting to establish a connection.
#     - SYN_RECV：A connection request has been received from the network.
#     - FIN_WAIT1：The socket is closed, and the connection is shutting down.
#     - FIN_WAIT2：Connection is closed, and the socket is waiting for a shutdown from the remote end.
#     - TIME_WAIT：The socket is waiting after close to handle packets still in the network.
#     - CLOSE：The socket is not being used.
#     - CLOSE_WAIT：The remote end has shut down, waiting for the socket to close.
#     - LAST_ACK：The remote end has shut down, and the socket is closed. Waiting for acknowledgement.
#     - LISTEN：The socket is listening for incoming connections.  Such sockets are not included in the
#               output unless you specify the --listening (-l) or --all (-a) option.
#     - CLOSING：Both sockets are shut down but we still don't have all our data sent.
#     - UNKNOWN：The state of the socket is unknown.
$ sudo netstat -anp
Active Internet connections (servers and established)
Proto   Recv-Q   Send-Q   Local Address       Foreign Address     State       PID/Program name
tcp     0        0        0.0.0.0:8181        0.0.0.0:*           LISTEN      1780/nginx: worker 
tcp     0        0        0.0.0.0:8181        0.0.0.0:*           LISTEN      1767/nginx: worker 
tcp     0        0        0.0.0.0:8181        0.0.0.0:*           LISTEN      1754/nginx: worker
...

# 例如：根据端口信息，产看进程号
$ netstat -anp | grep `port`

# 根据进程号，查看占用端口信息
$ netstat -anp | grep `pid`

# 查看路由表， 每列的详细含义可以阅读`man route`的 OUTPUT 部分
# See the description in route(8) for details. `netstat -r` and `route -e` produce the same output.
# 
# Destination: 
#   The destination network or destination host.
# Gateway: 
#   The gateway address or '*' if none set. 如果需要前往 Destination，应该从哪个网关过去。
# Genmask: 
#   The netmask for the destination net; '255.255.255.255' for a host destination and '0.0.0.0' for the default route.
# Flags: 
#   Possible flags include
#   U (route is up)
#   H (target is a host)
#   G (use gateway)
#   R (reinstate route for dynamic routing)
#   D (dynamically installed by daemon or redirect)
#   M (modified from routing daemon or redirect)
#   A (installed by addrconf)
#   C (cache entry)
#   !  (reject route)
# MSS:
#   Default maximum segment size for TCP connections over this route.
# Window:
#   Default window size for TCP connections over this route.
# irtt:
#   Initial RTT (Round Trip Time). The kernel uses this to guess about the best TCP protocol parameters without waiting on (possibly slow) answers.
# Iface:
#   Interface to which packets for this route will be sent.
$ netstat -r
Kernel IP routing table
Destination       Gateway         Genmask          Flags   MSS Window  irtt  Iface
default           gateway         0.0.0.0          UG      0   0       0     eth0
10.10.40.0        0.0.0.0         255.255.252.0    U       0   0       0     eth0
link-local        0.0.0.0         255.255.0.0      U       0   0       0     eth0
169.254.169.254   10.10.40.10     255.255.255.255  UGH     0   0       0     eth0

# 按 State 分组计算 tcp 连接数
$ netstat -a | awk '/^tcp/ {++SCount[$NF]};END {for(state in SCount) print state, SCount[state]}' 
LISTEN 208
ESTABLISHED 20
TIME_WAIT 50

ss

ss is used to dump socket statistics. It allows showing information similar to netstat. It can display more TCP and state informations than other tools.

# Print summary statistics. This option does not parse socket lists obtaining summary from various sources.
# It is useful when amount of sockets is so huge that parsing `/proc/net/tcp` is painful.
$ ss -s
Total: 1364 (kernel 0)
TCP:   862 (estab 20, closed 634, orphaned 0, synrecv 0, timewait 512/0), ports 0

Transport Total     IP        IPv6
*	        0         -         -        
RAW	      0         0         0        
UDP	      3         3         0        
TCP	      228       70        158      
INET	    231       73        158      
FRAG	    0         0         0

lsof

在 Linux 环境下，任何事物都以文件的形式存在，通过文件不仅仅可以访问常规数据，还可以访问网络连接和硬件。所以如传输控制协议 (TCP) 和用户数据报协议 (UDP) 套接字等，系统在后台都为该应用程序分配了一个文件描述符，无论这个文件的本质如何，该文件描述符为应用程序与基础操作系统之间的交互提供了通用接口。因为应用程序打开文件的描述符列表提供了大量关于这个应用程序本身的信息，因此通过lsof工具能够查看这个列表对系统监测以及排错将是很有帮助的。

# lsof（list open files）是一个列出当前系统打开文件的工具。
$ lsof
# 在终端下输入lsof即可显示系统打开的文件，因为 lsof 需要访问核心内存和各种文件，所以必须以 root 用户的身份运行它才能够充分地发挥其功能。 
# - COMMAND：进程的名称
# - PID：进程标识符
# - USER：进程所有者
# - FD：文件描述符，应用程序通过文件描述符识别该文件。如cwd、txt等
# - TYPE：文件类型，如DIR、REG等
# - DEVICE：指定磁盘的名称
# - SIZE：文件的大小
# - NODE：索引节点（文件在磁盘上的标识）
# - NAME：打开文件的确切名称
COMMAND    PID      USER   FD      TYPE     DEVICE     SIZE       NODE      NAME
init       1        root   cwd     DIR      3,3        1024       2         /
init       1        root   rtd     DIR      3,3        1024       2         /
init       1        root   txt     REG      3,3        38432      1763452   /sbin/init
init       1        root   mem     REG      3,3        106114     1091620   /lib/libdl-2.6.so
...

# 命令格式
$ lsof ［options］ filename1 filename2 ...
# options
# -a          表示其他参数都必须满足时才显示结果，相当于 and 连接符。
# -c string   显示COMMAND列中包含指定字符的进程所有打开的文件
# -u user     显示所属user进程打开的文件
# -g gid      显示归属gid的进程情况
# +d /DIR/    显示目录下被进程打开的文件
# +D /DIR/    同上，递归搜索目录下的所有目录，时间相对较长
# -d FD       显示指定文件描述符的进程
# -n          不将 IP 转换为 hostname，即已 IP 形式展现
# -r [time]   控制lsof不断重复执行，直到收到中断信号，time 单位秒，缺省是15s刷新。
# +r [time]   lsof会一直执行，直到没有档案被显示。
# -p pid      显示指定进程打开的文件
# -i          用以显示符合条件的进程情况
lsof -i[4|6][protocol][@[hostname|hostaddr]][:[service|port]]
    # 46 --> 显示类型是 IPv4 or IPv6的，默认都显示。
    # protocol --> TCP or UDP，默认都显示。
    # hostname --> Internet host name
    # hostaddr --> IPv4地址
    # service --> /etc/service 中的 service name (可以不只一个)
    # port --> 端口号 (可以不只一个)
# 例如显示IPV4、TCP、s12179的22端口的信息。
$ lsof -i4tcp@s12179:22
# 显示结果如下
COMMAND   PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
sshd     3504 root    3r  IPv4 1507618      0t0  TCP s12179:ssh->10.4.124.98:54278 (ESTABLISHED)
sshd    30532 root    3u  IPv4 1024660      0t0  TCP s12179:ssh->10.4.126.51:56584 (ESTABLISHED)

# 根据端口号查看进程号
$ lsof -i:port 
# 或者
$ lsof -i | grep port

# 根据进程号查看端口号
# -P 端口以数字形式展示，而不是端口名称
$ lsof -n -P -i TCP | grep pid

# -s 配合 -i 使用，用于过滤输出
$ lsof -n -P -i TCP -s TCP:LISTEN | grep pid

# 查看某进程打开 TCP 连接情况
$ lsof -p 23809 | grep TCP

通过 /proc 查看打开文件数最多的前20个进程

what is proc

for pid in `ps -eF | awk '{ print $2 }'`
do 
echo `ls /proc/$pid/fd 2> /dev/null | wc -l`  $pid  `cat /proc/$pid/cmdline 2> /dev/null`
done | sort -n -r | head -n 20

nmon

奈吉尔性能监视器，用来检测系统的所有资源:CPU、内存、磁盘使用率、网络上的进程、NFS、内核等等。这个工具有两个模式: 在线模式和捕捉模式。在线模式适用于实时监控，捕捉模式用于存储输出为 CSV 文件，然后通过nmon_analyzer工具产生数据文件与图形化结果。

# 这个得安装
$ yum install -y nmon

内存监控

查看内存使用情况

$ free -m # 以MB为单位显示
$ free -g # 以GB为单位显示
              total        used        free      shared  buff/cache   available
Mem:            125          66           0           0          57          57
Swap:             3           0           3

# 内存并不是只有使用(used)和空闲(free)两个状态，还有缓冲(buffer)和缓存(cache)，从字面义上这两个都是缓存，
# 但是缓存的数据不同。

# buffer 一般都不太大，在通用的 Linux 系统中一般为几十到几百MB，它是用来存储磁盘块设备的元数据的，比如哪些
# 块属于哪些文件，文件的权限目录信息等。

# cache 一般很大(GB)，因为它用来存储读写文件的页，当对一个文件进行读时，会取磁盘文件页，放到内存，然后从内存
# 中进行读取；而写文件的时候，会先将数据写到缓存中，并将相关的页面标记为`dirty`。cache 会随着读写磁盘的多少
# 自动的增加或减少，这也取决于物理内存是否够用，如果物理内存不够用了，操作系统会适当的减少 buffer/cache 以
# 保证应用有足够的内存使用。

查看进程中各个模块占用内存情况

显示比较底层的进程模块所占用内存的信息，并且可以打印内存的起止地址等，可以用于定位深层次的JVM或者操作系统的内存问题

$ pmap -d 15450
15450:   /usr/java/jdk1.8.0_191-amd64/jre/bin/java -cp /root/spark-2.4.2-bin-hadoop2.7/conf/:/root/spark-2.4.2-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://rockfs:7077
Address           Kbytes Mode  Offset           Device    Mapping
0000000000400000       4 r-x-- 0000000000000000 0fd:00000 java
0000000000600000       4 r---- 0000000000000000 0fd:00000 java
0000000000601000       4 rw--- 0000000000001000 0fd:00000 java
0000000002446000     132 rw--- 0000000000000000 000:00000   [ anon ]
00000000c0000000  469504 rw--- 0000000000000000 000:00000   [ anon ]
00000000dca80000  229888 ----- 0000000000000000 000:00000   [ anon ]
00000000eab00000  327168 rw--- 0000000000000000 000:00000   [ anon ]
00000000fea80000   22016 ----- 0000000000000000 000:00000   [ anon ]
0000000100000000    4480 rw--- 0000000000000000 000:00000   [ anon ]
0000000100460000 1044096 ----- 0000000000000000 000:00000   [ anon ]
...
00007fd4fc000000       4 r--s- 0000000000000000 0fd:00000 kubernetes-model-common-4.1.2.jar
00007fd4fc001000       4 r--s- 000000000000b000 0fd:00000 snappy-0.2.jar
00007fd4fc002000      44 r--s- 00000000000b0000 0fd:00000 jtransforms-2.4.0.jar
00007fd4fc00d000       4 r--s- 000000000000a000 0fd:00000 apacheds-i18n-2.0.0-M15.jar
00007fd4fc00e000       4 r--s- 0000000000009000 0fd:00000 hadoop-annotations-2.7.3.jar
00007fd4fc00f000       8 r--s- 000000000000e000 0fd:00000 hadoop-mapreduce-client-jobclient-2.7.3.jar
00007fd4fc011000      20 r--s- 0000000000027000 0fd:00000 hk2-api-2.4.0-b34.jar
00007fd4fc016000       8 r--s- 0000000000004000 0fd:00000 stax-api-1.0-2.jar
00007fd4fc018000      12 r--s- 000000000000f000 0fd:00000 spark-network-shuffle_2.12-2.4.2.jar
...
mapped: 11446060K    writeable/private: 1390808K    shared: 13876K

磁盘

查看磁盘使用情况

$ df -lh
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root  2.7T  473G  2.2T  18% /
devtmpfs                  63G     0   63G   0% /dev
tmpfs                     63G     0   63G   0% /dev/shm
tmpfs                     63G   19M   63G   1% /run
tmpfs                     63G     0   63G   0% /sys/fs/cgroup
/dev/sda2               1014M  137M  878M  14% /boot
/dev/sda1                200M   12M  189M   6% /boot/efi
tmpfs                     13G     0   13G   0% /run/user/0

# 查看Inode使用情况
$ df -i
Filesystem       Inodes IUsed    IFree IUse% Mounted on
/dev/vda1      26213824 81171 26132653    1% /
devtmpfs        4095237   361  4094876    1% /dev
tmpfs           4097693     1  4097692    1% /dev/shm
tmpfs           4097693   453  4097240    1% /run
tmpfs           4097693    16  4097677    1% /sys/fs/cgroup
tmpfs           4097693     1  4097692    1% /run/user/0
tmpfs           4097693     1  4097692    1% /run/user/109910
/dev/vdb1      65536000  1664 65534336    1% /data
tmpfs           4097693     1  4097692    1% /run/user/109908

# 删除文件后，空间不释放
# 查看占用文件的 PID
$ lsof | grep deleted
zk-sessio  6350 12754       root    2w      REG              253,1 40460951552   67720033 /home/eagle/kafka-eagle-web-2.0.1/kms/logs/catalina.out (deleted)
zk-sessio  6350 12754       root    8w      REG              253,1        5595   68975530 /home/eagle/kafka-eagle-web-2.0.1/kms/logs/catalina.2020-11-21.log (deleted)
zk-sessio  6350 12754       root    9w      REG              253,1         416   67720044 /home/eagle/kafka-eagle-web-2.0.1/kms/logs/localhost.2020-09-29.log (deleted)
zk-sessio  6350 12754       root   10w      REG              253,1           0   67720322 /home/eagle/kafka-eagle-web-2.0.1/kms/logs/manager.2020-09-29.log (deleted)
zk-sessio  6350 12754       root   11w      REG              253,1           0   67726356 /home/eagle/
...
# 终止进程
$ kill -9 6350

索引节点(INode)爆满问题处理

磁盘 Direct IO 顺序写读写性能测试

# 测试写入20G数据，数据量越大，测试值应该更精确
$ sync;/usr/bin/time -p bash -c "(dd if=/dev/mapper/centos-root of=test.dd bs=1M count=20000;sync)"
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 13.852 s, 1.5 GB/s
real 25.89
user 0.01
sys 17.03

# real write speed: 21GB / 25.89s = 0.811 GB/s

# sync 刷新文件系统的缓冲区，把内存中的数据缓冲写入到磁盘中。先执行下sync命令，是为了减少对后面测试的影响。也可以使用 echo 3 > /proc/sys/vm/drop_caches 来清除缓存。
# time 命令用来测试命令的执行时间，shell内建还有一个time命令，我们这里使用全路径来指定使用的是非内建命令。-p 选项设置时间的输出格式为POSIX缺省时间格式，单位是秒
# bash 命令 -c 选项的作用是将后面的字符串参数当作bash脚本来执行，看起来有些画蛇添足，好像直接执行也是可行的，其实不然，因为后面字符串中包含了两条命令行，而time命令需要统计这两条命令行的执行时间。
# 小括号的意思是另起一个子进程来执行括号中的脚本
# if=FILE read from FILE instead of stdin
# of=FILE write to FILE instead of stdout
# bs=BYTES read and write up to BYTES bytes at a time，默认 512KB
# count=N copy only N input blocks

# 读取20GB数据
$ echo 3 >/proc/sys/vm/drop_caches;time -p dd if=test.dd of=/dev/null bs=1M
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 47.3728 s, 443 MB/s
real 47.38
user 0.01
sys 8.89

# real read speed: 21GB / 47.38s = 0.443 GB/s，读取不需要 sync，因此基本没有误差

# 或者使用hdparm，需要安装，且用 root 用户执行，测试 3 秒能读取多少数据
$ hdparm -t --direct /dev/mapper/centos-root

/dev/mapper/centos-root:
 Timing O_DIRECT disk reads: 2994 MB in  3.00 seconds = 997.47 MB/sec

查看磁盘实时 IO 信息

$ yum install -y sysstat
# 从磁盘设备 sda 角度统计 io
# -x 输出扩展信息，interval 5 
$ iostat -x 5
Linux 3.10.0-957.el7.x86_64 (rockfs)    05/31/2019      _x86_64_        (40 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.11    0.00    0.21    0.05    0.00   97.64

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.04     0.22   18.94   30.77  3169.87  7600.27   433.25     0.65   13.13    0.95   20.63   0.35   1.73
dm-0              0.00     0.00   18.94   30.90  3169.64  7599.91   432.21     0.66   13.17    0.95   20.66   0.35   1.73
dm-1              0.00     0.00    0.05    0.09     0.20     0.36     8.02     0.00   31.43    1.20   48.33   0.09   0.00
...
# rrqm/s、wrqm/s: The number of read/write requests merged per second that were queued to the device
# r/s、w/s: 指的 IOPS（Input/Output Operations Per Second) 每秒读写次数，after merge
# rkB/s、wkB/s: 指的是数据存取数据
# await 指的是 IO 平均等待时间，一般都在 10ms 左右，越大说明磁盘负载越大越繁忙
# $util 设备带宽利用率，越接近 100% 越饱和

# 从进程角度获取每个进程使用cpu、内存和磁盘等系统资源的统计信息
$ pidstat

$ yum install -y iotop
$ iotop
Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                         
20698 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/16:2]
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % systemd --switched-root --system --deserialize 22
    2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
    3 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
    5 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H]
    6 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/u80:0]

# iotop 命令的快捷键
1、左右箭头改变排序方式，默认是按IO排序
2、r 键是反向排序
3、o 键是只显示有IO输出的进程
4、q 是退出

iozone

# 官网 http://www.iozone.org/
# 下载地址 http://www.iozone.org/src/current/
$ wget http://www.iozone.org/src/current/iozone-3-487.src.rpm
$ rpm -ivh iozone-3-487.src.rpm
$ cd ~/rpmbuild/SOURCES
$ ll 
-rw-r--r--. 1 root root 1873920 Jan 18 15:26 iozone3_487.tar
# 或者直接下载
$ wget http://www.iozone.org/src/current/iozone3_487.tar
$ tar -xvf iozone3_487.tar
# 查看帮助文档
$ cd iozone3_487/docs && man ./iozone.1
# 编译
$ cd iozone3_487/src/current
# 查看编译参数
$ make
# 根据操作系统架构选择不同的参数
$ uname -m
x86_64
$ make linux-AMD64
$ ll -t 
-rwxr-xr-x. 1 root root  18616 Jun  2 23:45 pit_server
-rwxr-xr-x. 1 root root  44304 Jun  2 23:45 fileop
-rwxr-xr-x. 1 root root 359368 Jun  2 23:45 iozone
...
# 查看使用帮助
$ ./pit_server -h
$ ./fileop -h
$ ./iozone -h

$ ./iozone -Raz -b lab-2G.xls -g 2G |tee 2G.log

# -i 比如测试写: -i 0，测试读和写: -i 0 -i 1。 
  -R 产生excel格式的输出（仅显示在屏幕上,不会产生excel文件）
  -b 产生excel格式的文件
  -g 最大测试文件大小 for auto mode
  -t 并发数
  -s 测试文件的大小,支持-k -m -g
  -q 块大小 for auto mode
  -r 文件块大小。 
  -a 在希望的文件系统上测试，不过只有-a的话会进行全面测试，要花费很长时间，最好用-i指定测试范围。 
  -n 指定最小测试文件大小。 
  -f 指定测试文件。 
  -C 显示每个节点的吞吐量。 
  -c 测试包括文件的关闭时间 
  
# 附: -i 参数
  0=write/rewrite
  1=read/re-read
  2=random-read/write
  3=Read-backwards
  4=Re-write-record
  5=stride-read
  6=fwrite/re-fwrite
  7=fread/Re-fread,
  8=random mix
  9=pwrite/Re-pwrite
  10=pread/Re-pread
  11=pwritev/Re-pwritev,
  12=preadv/Re-preadv

python tqdm

$ pip install tqdm
$ vim iospeed.py
from tqdm import tqdm
import os
import time

# 写性能测试
file_size = 2*1000**3
trunk_size = 1000**2

data = os.urandom(trunk_size)
with tqdm(unit='B', unit_scale=True, total=file_size, smoothing=0) as pbar:
    with open('tmp', mode='wb', buffering=trunk_size) as f:
        for i in range(file_size//trunk_size):
            f.write(data)
            pbar.update(len(data))
        os.system('sync; echo 1 > /proc/sys/vm/drop_caches')

# 读性能测试
trunk_size = 1000**2
with open('tmp', mode='rb', buffering=trunk_size) as f:
    with tqdm(unit='B', unit_scale=True, total=os.fstat(f.fileno()).st_size,
              smoothing=0) as pbar:
        while True:
            d = f.read(trunk_size)
            pbar.update(len(d))
            if len(d) == 0:
                break

os.remove('tmp')

$ python iospeed.py

CPU 监控

查看 cpu 信息

# 服务器包含若干个 cpu，一个 cpu 包含若干物理核，一个物理核又包含若干逻辑核。
# 总核数 = 物理 cpu 个数 * 每个物理 cpu 的核数
# 超线程数 = 每个 cpu 线程数(siblings)  / 每个 cpu 物理核数(cpu cores)，大于 1 表示启用了超线程技术，可以在逻辑上分几倍数量的 cpu core 出来。
# 总逻辑 cpu 数 = 物理 cpu 个数(physical cpus) * 每个物理 cpu 核数(cpu cores) * 超线程数 = 物理 cpu 个数(physical cpus) * 每个 cpu 线程数(siblings)

$ cat /proc/cpuinfo
# 物理 cpu 个数    
$ cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l
# 每个 cpu 物理核数
$ cat /proc/cpuinfo | grep "cpu cores" | uniq
# 每个 cpu 线程数  
$ cat /proc/cpuinfo | grep "siblings" | uniq
# 总 cpu 逻辑核数
$ cat /proc/cpuinfo | grep -c "processor" 
# 或者
$ grep -c 'model name' /proc/cpuinfo
# 超线程指物理内核+逻辑内核，芯片上只存在一个物理内核，但是这个物理内核可以模拟出一个逻辑内核，于是系统信息就显示了两个内核，一真一假(一个核可以当两个用)。

# 查看 cpu 型号
$ cat /proc/cpuinfo|grep name|cut -f2 -d:|uniq -c


$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             2
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 13
Model name:            QEMU Virtual CPU version 2.5+
Stepping:              3
CPU MHz:               2099.998
BogoMIPS:              4199.99
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
L3 cache:              16384K
NUMA node0 CPU(s):     0-7
Flags:                 fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 ht syscall nx lm rep_good nopl xtopology pni cx16 x2apic hypervisor lahf_lm

top

# 查看cpu负载情况，一秒一刷新
$ top -d 1
top - 13:42:52 up 39 days, 10 min,  3 users,  load average: 0.19, 0.16, 0.14
Tasks: 224 total,   1 running, 223 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.3 us,  0.3 sy,  0.0 ni, 98.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32781820 total,   290688 free, 15894116 used, 16597016 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 16378768 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                         
 7755 hbase     20   0 3029804 423140   7464 S   1.0  1.3  76:21.68 java
.
.
.                                                                                          

下面说一下每一行的参数都是啥意思

第一行 top -

字符	含义
13:42:52	当前系统时间
up 39 days, 10 min	系统已运行时间
3 users	当前有3个用户在线(包含系统用户)
load average: 0.19, 0.16, 0.14	分别为1分钟、5分钟、15分钟前到现在系统负载的平均值(即任务队列的平均长度)

判断 load average 是否正常是取决于你的系统拥有多少核，如果你有2个CPU，每个CPU有4个核，那么这个数小于8.0就都是正常的

第二行 Tasks:

字符	含义
224 total	total 总进程数
1 running	正在运行的进程数
223 sleeping	正在睡眠的进程数
0 stopped	停止的进程数
0 zombie	僵尸进程数

第三行 %Cpu(s):

字符	含义
1.3 us	用户进程占用 CPU 百分率
0.3 sy	系统进程占用 CPU 百分率
0.0 ni	用户进程空间内改变过优先级的进程占用 CPU 百分比
98.3 id	CPU 空闲率
0.0 wa	等待 IO 的 CPU 时间百分比
0.0 hi	硬中断（Hardware IRQ）占用 CPU 的百分比
0.0 si	软中断（Software Interrupts）占用 CPU 的百分比
0.0 st	虚拟机管理程序从此虚拟机窃取的时间

第四行 KiB Mem:

字符	含义
32781820 total	总内存
290688 free	内存空闲量
15894116 used	实际使用内存
16597016 buff/cache	缓存

总内存 - 实际使用内存 = 内存空闲量 + 缓存

第五行 KiB Swap:

字符	含义
0 total	交换区总量
0 free	交换区空闲量
0 used	交换区使用量
16378768 avail Mem	还可以使用的交换区容量

第六行

字符	含义
PID	进程号
USER	进程创建用户
PR	进程优先级
NI	nice 值。越小优先级越高，最小 -20，最大 20（用户设置最大19）
VIRT	进程使用的虚拟内存总量，单位 kb。VIRT=SWAP+RES
RES	进程使用的、未被换出的物理内存大小，单位 kb。RES=CODE+DATA
SHR	共享内存大小，单位kb
S	进程状态。D=不可中断的睡眠状态 R=运行 S=睡眠 T=跟踪/停止 Z=僵尸进程
%CPU	进程占用 CPU 百分比
%MEM	进程占用内存百分比
TIME+	进程运行时间
COMMAND	进程名称

进入 top 后还可以使用的命令

1: 在多核 cpu 和总 cpu(s) 之间切换，如查看多核负载情况

进入top后，按1可以

top - 13:44:25 up 39 days, 12 min,  3 users,  load average: 0.08, 0.14, 0.14
Tasks: 223 total,   1 running, 222 sleeping,   0 stopped,   0 zombie
%Cpu0  :  2.0 us,  0.0 sy,  0.0 ni, 98.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st
%Cpu2  : 52.5 us,  3.0 sy,  0.0 ni, 44.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  7.0 us,  0.0 sy,  0.0 ni, 93.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  9.0 us,  1.0 sy,  0.0 ni, 90.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  3.0 us,  1.0 sy,  0.0 ni, 96.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
.
.
.

P: 以占据CPU百分比排序
M: 以占据内存百分比排序
T: 以累积占用CPU时间排序
q: 退出 top 查看页面
s: 按下 s 键，然后按下数字，即可修改刷新时间间隔为你输入的数字，单位为秒
k: 终止指定的进程。按下k键，再输入要杀死的进程的pid，按enter键(选择信号类型，以数字标示，默认15为杀死)本步可省略按enter键（常用为-9）

其他常用的参数

# -b: batch mode，在把输出写到文件的时候很有用，在 batch mode 下 top 不会接收 input 并且会
#     一直运行直到 -n 指定的次数或者被kill掉
# -p: 指定 pid
$ top -d 0.5 -b -p 1254 > top.log &

# 查询某个程序中 CPU 使用较高的线程
$ top -Hp <pid>
$ strace -p <threadId>

vmstat

显示关于内核线程、虚拟内存、磁盘 IO、线程和 CPU 占用率的统计信息

$ vmstat
procs -----------memory----------   ---swap--  -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache    si   so    bi    bo    in  cs   us sy id wa st
 0  0  40448 96350840  0   21916900   0    0    80   189    0   0    2  0  98 0  0

FIELD DESCRIPTION FOR VM MODE
   Procs
       r: The number of runnable processes (running or waiting for run time).
       b: The number of processes in uninterruptible sleep.

   Memory
       swpd: the amount of virtual memory used.
       free: the amount of idle memory.
       buff: the amount of memory used as buffers.
       # buff 是 I/O 系统存储的磁盘块文件的元数据的统计信息
       cache: the amount of memory used as cache.
       # cache 是操作系统用来缓存磁盘数据的缓冲区，操作系统会自动调解这个参数，在内存紧张的时候会减少 cache 占用的内存空间以保证其他进程可用
       inact: the amount of inactive memory.  (-a option)
       active: the amount of active memory.  (-a option)

   Swap
       si: Amount of memory swapped in from disk (/s).
       so: Amount of memory swapped to disk (/s).
       # si、so 较大说明系统频繁使用交换区(swap)，可能是因为内存不够用造成的

   IO
       bi: Blocks received from a block device (blocks/s).
       bo: Blocks sent to a block device (blocks/s).
       # bi、bo 代表I/O 活动，根据其大小可以知道磁盘的负载情况
       
   System
       in: The number of interrupts per second, including the clock.
       cs: The number of context switches per second.
       # cs 表示线程环境的切换次数，该数值太大表明线程的同步机制有问题
   
   CPU
       These are percentages of total CPU time.
       us: Time spent running non-kernel code.  (user time, including nice time)
       sy: Time spent running kernel code.  (system time)
       id: Time spent idle.  Prior to Linux 2.5.41, this includes IO-wait time.
       wa: Time spent waiting for IO.  Prior to Linux 2.5.41, included in idle.
       st: Time stolen from a virtual machine.  Prior to Linux 2.6.11, unknown.

mpstat

用于实时监控 CPU 的一些统计信息，这些统计信息实际存在/proc/stat文件中，可以查看多核心 CPU 的平均使用信息，也可以查看某个特定的 CPU 信息

$ mpstat -P ALL
Linux 3.10.0-957.el7.x86_64 (rockfs)    05/31/2019      _x86_64_        (40 CPU)

08:30:10 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
08:30:10 AM  all    2.09    0.00    0.20    0.05    0.00    0.00    0.00    0.00    0.00   97.66
08:30:10 AM    0    2.40    0.00    0.38    0.05    0.00    0.00    0.00    0.00    0.00   97.17
...
08:30:10 AM   39    1.86    0.00    0.06    0.02    0.00    0.00    0.00    0.00    0.00   98.07

CPU
     Processor number. The keyword all indicates that statistics are calculated as averages among all processors.

%usr
     Show the percentage of CPU utilization that occurred while executing at the user level (application).

%nice
     Show the percentage of CPU utilization that occurred while executing at the user level with nice priority.

%sys
     Show the percentage of CPU utilization that occurred while executing at the system level (kernel). Note that this does not include time spent servicing hardware and software interrupts.

%iowait
     Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.

%irq
     Show the percentage of time spent by the CPU or CPUs to service hardware interrupts.

%soft
     Show the percentage of time spent by the CPU or CPUs to service software interrupts.

%steal
     Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.

%guest
     Show the percentage of time spent by the CPU or CPUs to run a virtual processor.

%gnice
     Show the percentage of time spent by the CPU or CPUs to run a niced guest.

%idle
     Show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.

pprof 火焰图

通过火焰图查看程序CPU内存占用

Go pprof 性能调优