Profiling the I/O Pattern of a Production System

2019/02/13 vmunix

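Understanding the characteristics of your I/O is essential for tuning system performance. Whether the I/O is sequential or random, whether it is reads or writes, the read/write ratio, and the I/O block size are all key factors. Many storage devices are tuned for specific I/O patterns and score impressively on generic benchmark tools, but the differences show up in real environments: under the same application workload, different devices can behave worlds apart. I have seen devices from different vendors, in roughly the same class, where the one with the higher benchmark score had I/O response times ten times slower in production. A device that benchmarks well is not necessarily the right one for your application.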

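If you can simulate the application's I/O pattern, it helps a great deal with reproducing problems and even with device selection. The first step is to understand that pattern, which is not easy. Tools such as iostat only show averages, yet an application's I/O requests may come in waves, rising and falling even within a single second. The average I/O latency may be low while the variation is large. The I/O block size can vary too: today's big data applications may use block sizes across a wide range, unlike the transactional databases common in the past, whose block sizes were essentially fixed.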
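
To make the point about averages concrete, here is a minimal sketch with made-up numbers (not measurements): 99 I/Os that take 0.5 ms each plus a single one that takes 60 ms. The function name latency_summary is an illustrative helper, not part of iostat or any other tool.

```shell
#!/bin/bash
# Summarize a list of latencies (milliseconds, one per line): mean and maximum.
# latency_summary is a hypothetical helper written for this illustration.
latency_summary() {
        awk '{ sum += $1; if ($1 > max) max = $1 }
             END { if (NR > 0) printf("mean=%.3f ms max=%.3f ms\n", sum / NR, max) }'
}

# Made-up workload: 99 I/Os at 0.5 ms plus one 60 ms outlier.
{ for i in $(seq 1 99); do echo 0.5; done; echo 60; } | latency_summary
```

The mean comes out just above 1 ms, which looks healthy, while one request actually waited 60 ms. An averaged view alone would not reveal that outlier.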

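There does not seem to be a ready-made tool for profiling the I/O pattern of a production system, but we can build one on top of blktrace. blktrace records every I/O in the kernel's block layer, which gives us the raw material for analysis. In blkparse's default text output, each record is one line with these fields, in order: device (major,minor), CPU, sequence number, timestamp in seconds, PID, event, RWBS flags, start sector, a "+" sign, the number of sectors, and the process name in brackets.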
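
As an illustration of how these records can be picked apart, the sketch below pulls out the fields that matter for latency analysis, assuming blkparse's default text layout. The two input lines are made up for the example (they were not captured from a real device), and describe_events is a hypothetical helper name.

```shell
#!/bin/bash
# Print the latency-relevant fields of Q (queued) and C (completed) events.
# Assumed layout (blkparse default): $4 timestamp in seconds, $6 event,
# $7 RWBS flags, $8 start sector, $10 length in sectors.
describe_events() {
        awk '$6 == "Q" || $6 == "C" {
                printf("%s ts=%s rwbs=%s sector=%s nsectors=%s\n", $6, $4, $7, $8, $10)
        }'
}

# Two illustrative lines in that layout: one read queued, then completed.
describe_events <<'EOF'
  8,0    1        1     0.000000000  4162  Q   R 1000 + 8 [dd]
  8,0    1        2     0.000351000     0  C   R 1000 + 8 [0]
EOF
```

Subtracting the Q timestamp from the C timestamp of the same request gives that request's latency, here 0.351 ms.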

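Below is a simplified example. It mainly uses the "Q" and "C" events, which mark the start and the completion of an I/O. The time between the two corresponds to the await reported by iostat, except that blktrace gives it for each individual I/O: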

#!/bin/bash
# Trace a block device for a fixed time, then summarize per-I/O latency and size.

if [ $# -ne 1 ]; then
        echo "Usage: $0 <block_device>"
        exit 1
fi
if [ ! -b "$1" ]; then
        echo "could not find block device $1"
        exit 1
fi

duration=10
echo "running blktrace for $duration seconds to collect data..."
# blktrace writes its per-CPU trace files into the current directory
timeout $duration blktrace -d "$1" >/dev/null 2>&1

DEVNAME=$(basename "$1")

echo "parsing blktrace data..."
# Sort by start sector, size, then timestamp so that each Q line is followed
# by the C line of the same request.
# Fields used: $4 timestamp (s), $6 event, $7 RWBS, $8 start sector, $10 size in sectors
blkparse -i "$DEVNAME" | sort -g -k8 -k10 -k4 | awk '
BEGIN   {
        total_read=0;
        total_write=0;
        maxwait_read=0;
        maxwait_write=0;
}
{
        # Remember the most recent queued request
        if ($6=="Q") {
                queue_ts=$4;
                block=$8;
                nblock=$10;
                rw=$7;
        }
        # A completion with the same sector, size and RWBS closes that request
        if ($6=="C" && $8==block && $10==nblock && $7==rw) {
                await=$4-queue_ts;
                # Only plain "R" and "W" are counted; other RWBS values (e.g. RA, WS) are skipped
                if (rw=="R") {
                        if (await>maxwait_read) maxwait_read=await;
                        total_read++;
                        read_count_block[nblock]++;
                        if (await>0.001) read_count1++;
                        if (await>0.01) read_count10++;
                        if (await>0.02) read_count20++;
                        if (await>0.03) read_count30++;
                }
                if (rw=="W") {
                        if (await>maxwait_write) maxwait_write=await;
                        total_write++;
                        write_count_block[nblock]++;
                        if (await>0.001) write_count1++;
                        if (await>0.01) write_count10++;
                        if (await>0.02) write_count20++;
                        if (await>0.03) write_count30++;
                }
        }
} END   {
        printf("========\nsummary:\n========\n");
        printf("total number of reads: %d\n", total_read);
        printf("total number of writes: %d\n", total_write);
        printf("slowest read : %.6f second\n", maxwait_read);
        printf("slowest write: %.6f second\n", maxwait_write);
        printf("reads\n> 1ms: %d\n>10ms: %d\n>20ms: %d\n>30ms: %d\n", read_count1, read_count10, read_count20, read_count30);
        printf("writes\n> 1ms: %d\n>10ms: %d\n>20ms: %d\n>30ms: %d\n", write_count1, write_count10, write_count20, write_count30);
        # Sizes are in sectors; awk does not guarantee the iteration order
        printf("\nblock size:%16s\n","Read Count");
        for (i in read_count_block)
                printf("%10d:%16d\n", i, read_count_block[i]);
        printf("\nblock size:%16s\n","Write Count");
        for (i in write_count_block)
                printf("%10d:%16d\n", i, write_count_block[i]);
}'

Sample output:

========
summary:
========
total number of reads: 1081513
total number of writes: 0
slowest read : 0.032560 second
slowest write: 0.000000 second
reads
> 1ms: 18253
>10ms: 17058
>20ms: 17045
>30ms: 780
writes
> 1ms: 0
>10ms: 0
>20ms: 0
>30ms: 0

block size:      Read Count
       256:           93756
       248:            1538
        64:           98084
        56:            7475
         8:          101218
        48:           15889
       240:            1637
       232:            1651
       224:            1942
        40:           21693
       216:            1811
        32:          197893
       208:            1907
        24:           37787
       128:           97382
        16:          399850

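This example counts the reads and writes, the maximum latency, the latency distribution, and the I/O sizes with their counts (the sizes are in 512-byte sectors, so 16 means 8 KiB). This is far more specific than what iostat provides and helps in understanding the system's I/O pattern further. There is more to mine from blktrace data. For example, you can count the I/Os in each millisecond based on the timestamps, which gives a finer-grained view of how the request rate fluctuates.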
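
The per-millisecond count mentioned above could be sketched roughly as follows. This is a minimal sketch rather than a finished tool: it assumes blkparse's default text layout (timestamp in field 4 with nine fractional digits, event in field 6), and per_ms_counts is a hypothetical helper name. The input lines in the demonstration are made up.

```shell
#!/bin/bash
# Count queued I/O requests (Q events) per millisecond from blkparse text on stdin.
# Prints "<millisecond index> <count>" for every non-empty bucket, in time order.
per_ms_counts() {
        awk '$6 == "Q" {
                # Split "seconds.nanoseconds" as text to avoid floating-point rounding
                split($4, t, ".")
                ms = t[1] * 1000 + substr(t[2], 1, 3)
                count[ms]++
        }
        END {
                for (ms in count) printf("%d %d\n", ms, count[ms])
        }' | sort -n
}

# Illustrative input: two requests in millisecond 0, one in millisecond 2.
per_ms_counts <<'EOF'
  8,0    1        1     0.000100000  4162  Q   R 1000 + 8 [dd]
  8,0    1        2     0.000900000  4162  Q   R 1008 + 8 [dd]
  8,0    1        3     0.002500000  4162  Q   R 1016 + 8 [dd]
  8,0    1        4     0.002600000     0  C   R 1000 + 8 [0]
EOF
```

On real data the function would be fed from blkparse, for example `blkparse -i "$DEVNAME" | per_ms_counts` using the variable from the script above. Milliseconds without any request are simply absent from the output.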

Reference:
"利用BLKTRACE分析IO性能" (Analyzing I/O performance with blktrace)

Reposted from:

http://linuxperf.com/?cat=11