Experimental System

Virtual machine environment: VMware® Workstation 17 Player
Operating system: Ubuntu 22.04 Server
Virtual hardware: 4 cores, 4 GB RAM
Software versions:
- gcc: 11.4.0
- clang: 14.0.0
- python: 3.10.12
- java: openjdk 21.0.3
- Linux perf: 5.15.152

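System Installation and Preparation

Problems and Solutions

Error encountered: hub_ext_port_status
- Solution: in Menu > Manage > Virtual Machine Settings > USB Controller, uncheck "Share Bluetooth devices with the virtual machine".
The VM console has noticeable input lag and the font is too small
- Solution: connect over SSH with Xshell and work from there.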

Practice with Common Tool Commands

uname -a # Linux ubuntu 5.15.0-112-generic #122-Ubuntu SMP Thu May 23 07:48:21 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kernel name	Hostname	Kernel release	Kernel version	Machine hardware architecture	Processor type	Hardware platform	Operating system
Linux	ubuntu	5.15.0-112-generic	#122-Ubuntu SMP Thu May 23 07:48:21 UTC 2024	x86_64	x86_64	x86_64	GNU/Linux

sysctl -a

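Purpose: lists all kernel parameters.
Relationship to /proc/sys: that directory contains a number of subdirectories, each of which stores parameters as files. The command walks /proc/sys and prints each entry on its own line under a name of the form "subdirectory.parameter".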

top

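This command opens an interactive screen; press q to quit.
It shows a live, continuously updated view of running processes.

PID	USER	PR	NI	VIRT	RES	SHR	S	%CPU	%MEM	TIME+	COMMAND
Process ID	User name of the process owner	Scheduling priority	Nice value (negative means higher priority, positive means lower priority)	Total virtual memory used by the process (KiB)	Resident (not swapped out) physical memory used by the process (KiB)	Shared memory size (KiB)	Process state	CPU usage since the last screen update (%)	Share of physical memory in use (%)	Total CPU time used by the process, shown to hundredths of a second	Command name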

dmidecode

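dmidecode retrieves hardware information on Linux. It follows the SMBIOS/DMI standard and can report details about the BIOS, system, baseboard, processor, memory, cache, and other hardware.

Taking memory as an example, it reports information such as the following: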

Handle 0x0222, DMI type 17, 34 bytes
Memory Device
	Array Handle: 0x0025
	Error Information Handle: No Error
	Total Width: 32 bits
	Data Width: 32 bits
	Size: No Module Installed
	Form Factor: DIMM
	Set: None
	Locator: NVD #63
	Bank Locator: NVD #63
	Type: Other
	Type Detail: Unknown
	Speed: Unknown
	Manufacturer: Not Specified
	Serial Number: Not Specified
	Asset Tag: Not Specified
	Part Number: Not Specified
	Rank: Unknown
	Configured Memory Speed: Unknown

numactl -H
# available: 1 nodes (0)
# node 0 cpus: 0 1 2 3
# node 0 size: 3875 MB
# node 0 free: 1045 MB
# node distances:
# node   0 
#   0:  10

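NUMA (Non-Uniform Memory Access) is a memory architecture for multiprocessor computer systems. Unlike traditional Uniform Memory Access (UMA), in a NUMA architecture the speed at which a processor accesses memory depends on where that memory is physically located.
In a NUMA system, memory is divided into regions. Each region (usually called a "node") consists of one or more processors and their local memory. A processor accesses its local memory quickly and the memory of other nodes more slowly, so memory access time is not uniform across nodes.
The main characteristics and advantages of NUMA include:
- Better scalability: distributing memory and processors across multiple nodes makes it easier to grow the system's processing power and memory capacity.
- Lower memory access latency: a processor can reach its local memory faster, which improves system performance.
- Resource sharing: all processors can share the memory of the whole system, which improves resource utilization.
However, NUMA also makes programming harder. Developers need to optimize programs so that processors access local memory as much as possible, reducing the cost of cross-node accesses.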

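numactl -H shows the NUMA topology of the current system. The -H option (long form --hardware) prints the system's NUMA hardware configuration, including the number of nodes, the memory size of each node, the distance matrix between nodes, and the list of CPUs on each node.
- available: the number of NUMA nodes.
- node 0 cpus: the CPUs that belong to node 0.
- node 0 size: the memory size of node 0.
- node 0 free: the free memory on node 0.
- node distances: the distance matrix between nodes.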

lscpu

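lscpu is a concise command-line tool that displays CPU architecture information. It summarizes information about the system's CPUs in a readable, structured form. Its output includes, but is not limited to:
- Architecture: the CPU architecture (for example x86_64).
- CPU op-mode(s): the operating modes the CPU can run in (32-bit, 64-bit).
- Byte Order: the CPU's byte order (Little Endian or Big Endian).
- CPU(s): the number of logical CPUs visible to the system (sockets × cores per socket × threads per core), not the number of physical cores.
- Core(s) per socket: the number of cores in each physical socket.
- Thread(s) per core: the number of hardware threads on each core.
- Socket(s): the number of physical CPU sockets in the system.
- NUMA node(s): the number of NUMA nodes in the system.
- CPU max MHz: the maximum CPU frequency.
- L1d/L1i/L2/L3 cache: the sizes of the caches at each level.
By comparison, cat /proc/cpuinfo reads directly from the kernel's /proc filesystem, and its output is more detailed and verbose.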

free

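free shows the system's memory usage, including physical memory, swap space, and buffers/cache.
total: total amount of memory, reported on separate rows for physical memory (Mem) and swap space (Swap).
used: memory currently in use.
free: memory that is completely unused.
shared: shared memory, used mostly by tmpfs (the temporary filesystem).
buff/cache: memory used for buffers and the page cache.
available: an estimate, based on current usage, of how much memory is available for starting new applications.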

vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1

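These commands monitor and analyze system performance, each focusing on a different resource. The argument 1 means the output is refreshed every second.

vmstat reports virtual memory statistics, including memory, paging, and CPU activity.
- procs: process information
  - r: number of processes in the run queue
  - b: number of processes in uninterruptible sleep
- memory: memory information
  - swpd: swap space in use (KB)
  - free: free memory (KB)
  - buff: memory used as buffers (KB)
  - cache: memory used as cache (KB)
- swap: swap information
  - si: amount swapped in from disk to memory (KB/s)
  - so: amount swapped out from memory to disk (KB/s)
- io: I/O information
  - bi: blocks read from block devices (per second)
  - bo: blocks written to block devices (per second)
- system: system information
  - in: interrupts per second
  - cs: context switches per second
- cpu: CPU information
  - us: percentage of CPU time spent in user mode
  - sy: percentage of CPU time spent in kernel mode
  - id: percentage of time idle
  - wa: percentage of time waiting for I/O
  - st: percentage of time stolen (in virtualized environments)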
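mpstat reports statistics for each CPU.
- CPU: CPU number; all is the average over all CPUs
- %usr: percentage of CPU time spent in user mode
- %nice: percentage of CPU time spent on niced processes
- %sys: percentage of CPU time spent in kernel mode
- %iowait: percentage of time waiting for I/O to complete
- %irq: percentage of time handling hardware interrupts
- %soft: percentage of time handling software interrupts
- %steal: percentage of time stolen by other virtual machines
- %gnice: percentage of time spent running a niced guest (virtual CPU)
- %guest: percentage of time spent running a virtual CPU
- %idle: percentage of time idle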
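pidstat reports statistics for each process.
- UID: user ID of the process owner
- PID: process ID
- %usr: percentage of CPU time spent in user mode
- %system: percentage of CPU time spent in kernel mode
- %guest: percentage of time spent running a virtual CPU
- %CPU: total percentage of CPU time (user + system)
- CPU: number of the CPU the process is running on
- Command: command name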
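iostat reports statistics for I/O devices. -x enables extended device statistics, and -z omits devices with no activity during the sample period.
- avg-cpu: average CPU usage
  - %user: percentage of CPU time spent in user mode
  - %nice: percentage of CPU time spent on niced processes
  - %system: percentage of CPU time spent in kernel mode
  - %iowait: percentage of time waiting for I/O to complete
  - %steal: percentage of time stolen by other virtual machines
  - %idle: percentage of time idle
- Device: device name
  - r/s: read requests per second
  - w/s: write requests per second
  - rkB/s: kilobytes read per second
  - wkB/s: kilobytes written per second
  - rrqm/s: read requests merged per second
  - wrqm/s: write requests merged per second
  - r_await: average wait time for read requests (ms)
  - w_await: average wait time for write requests (ms)
  - aqu-sz: average request queue length
  - rareq-sz: average read request size (KB)
  - wareq-sz: average write request size (KB)
  - svctm: average service time (ms); sysstat treats this field as deprecated, and recent versions may no longer print it
  - %util: device utilization (percentage of elapsed time during which I/O requests were being issued to the device)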

sar -n DEV 1

sar collects, reports, and saves system activity information. sar -n DEV 1 monitors the traffic and performance metrics of the network interfaces.

-n DEV: collect and report statistics for network devices (DEV).
1: refresh the output every second.
Output fields:
XX:XX:XX AM: timestamp, the time at which the sample was taken.
IFACE: network interface name (for example eth0 or lo).
rxpck/s: packets received per second.
txpck/s: packets transmitted per second.
rxkB/s: kilobytes received per second.
txkB/s: kilobytes transmitted per second.
rxcmp/s: compressed packets received per second.
txcmp/s: compressed packets transmitted per second.
rxmcst/s: multicast packets received per second.
%ifutil: interface utilization percentage, that is, how much of the interface's bandwidth is in use.

Lab 1

Results

Write-up 2

int main(int argc, char* argv[]) { // What is the type of argv?
  int i = 5;
  // The & operator here gets the address of i and stores it into pi
  int* pi = &i;
  // The * operator here dereferences pi and stores the value -- 5 --
  // into j.
  int j = *pi;

  char c[] = "6.172";
  char* pc = c; // Valid assignment: c acts like a pointer to c[0] here.
  char d = *pc;
  printf("char d = %c\n", d); // What does this print?

  // compound types are read right to left in C.
  // pcp is a pointer to a pointer to a char, meaning that
  // pcp stores the address of a char pointer.
  char** pcp;
  pcp = argv; // Why is this assignment valid?
  const char* pcc = c; // pcc is a pointer to char constant
  char const* pcc2 = c; // What is the type of pcc2?

  // For each of the following, why is the assignment:
  *pcc = '7'; // invalid?
  pcc = *pcp; // valid?
  pcc = argv[0]; // valid?

  char* const cp = c; // cp is a const pointer to char
  // For each of the following, why is the assignment:
  cp = *pcp; // invalid?
  cp = *argv; // invalid?
  *cp = '!'; // valid?

  const char* const cpc = c; // cpc is a const pointer to char const
  // For each of the following, why is the assignment:
  cpc = *pcp; // invalid?
  cpc = argv[0]; // invalid?
  *cpc = '@'; // invalid?

  return 0;
}

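What is the type of argv?
- char* argv[] declares argv as an array whose elements are char pointers; put simply, it can be viewed as an array of strings. As a function parameter, the declaration is adjusted to char** argv.
What does printf("char d = %c\n", d) print?
- char d = 6
pcp = argv is valid
- pcp is a pointer to a pointer (char**). argv is declared as an array of char*, and an array name decays to a pointer to its first element (as a function parameter, argv already has type char**), so the types match.
What is the type of pcc2?
- pcc2 has type const char*. const char* and char const* are equivalent: both denote a pointer to a constant character (the character data cannot be modified through the pointer).
*pcc = '7' is invalid
- pcc is a pointer to const char, so the character it points to cannot be modified through it.
pcc = *pcp is valid
- *pcp has type char*, and pcc has type const char* (pointer to a constant character). Converting char* to const char* only adds a const qualifier to the pointee, so the assignment is allowed.
pcc = argv[0] is valid
- argv[0] also has type char*, so the same reasoning applies.
cp = *pcp is invalid
- cp is a const pointer, meaning the pointer itself cannot be modified after initialization.
cp = *argv is invalid
- Same reason as above.
*cp = '!' is valid
- cp points to char; although the pointer itself is const, the data it points to can be modified.
cpc = *pcp is invalid
- cpc is a const pointer, meaning the pointer itself cannot be modified after initialization.
cpc = argv[0] is invalid
- Same reason as above: cpc is a const pointer and cannot be reassigned.
*cpc = '@' is invalid
- cpc points to const char, so the character data it points to cannot be modified.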

Write-up 3

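Print the size of a pointer to each of the types used in the sizes exercise.

The PRINT_SIZE macro is defined as follows:

#define PRINT_SIZE(TYPE_NAME, TYPE) printf("size of %s : %zu bytes \n", TYPE_NAME, sizeof(TYPE));

The modified code that prints the pointer sizes is as follows: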

PRINT_SIZE("int pointer", int *);
PRINT_SIZE("short pointer", short *);
PRINT_SIZE("long pointer", long *);
PRINT_SIZE("char pointer", char *);
PRINT_SIZE("float pointer", float *);
PRINT_SIZE("double pointer", double *);
PRINT_SIZE("unsigned int pointer", unsigned int *);
PRINT_SIZE("long long pointer", long long *);
PRINT_SIZE("uint8_t pointer", uint8_t *);
PRINT_SIZE("uint16_t pointer", uint16_t *);
PRINT_SIZE("uint32_t pointer", uint32_t *);
PRINT_SIZE("uint64_t pointer", uint64_t *);
PRINT_SIZE("uint_fast8_t pointer", uint_fast8_t *);
PRINT_SIZE("uint_fast16_t pointer", uint_fast16_t *);
PRINT_SIZE("uintmax_t pointer", uintmax_t *);
PRINT_SIZE("intmax_t pointer", intmax_t *);
PRINT_SIZE("__int128 pointer", __int128 *);
PRINT_SIZE("int[5] pointer", &x);
PRINT_SIZE("student pointer", &you);

The output is as follows:

size of int pointer : 8 bytes 
size of short pointer : 8 bytes 
size of long pointer : 8 bytes 
size of char pointer : 8 bytes 
size of float pointer : 8 bytes 
size of double pointer : 8 bytes 
size of unsigned int pointer : 8 bytes 
size of long long pointer : 8 bytes 
size of uint8_t pointer : 8 bytes 
size of uint16_t pointer : 8 bytes 
size of uint32_t pointer : 8 bytes 
size of uint64_t pointer : 8 bytes 
size of uint_fast8_t pointer : 8 bytes 
size of uint_fast16_t pointer : 8 bytes 
size of uintmax_t pointer : 8 bytes 
size of intmax_t pointer : 8 bytes 
size of __int128 pointer : 8 bytes 
size of int[5] pointer : 8 bytes 
size of student pointer : 8 bytes

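As the output shows, the size of a pointer generally does not depend on the type it points to; it is determined by the address width of the target platform, which is 8 bytes (64 bits) on this x86_64 system.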

Write-up 4

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

void swap(int *i, int *j)
{
  int temp = *i;
  *i = *j;
  *j = temp;
}

int main()
{
  int k = 1;
  int m = 2;
  swap(&k, &m);
  // What does this print?
  printf("k = %d, m = %d\n", k, m);

  return 0;
}

Write-up 5

Output of make clean; make:

rm -f testbed.o matrix_multiply.o matrix_multiply .buildmode \
        testbed.gcda matrix_multiply.gcda \
        testbed.gcno matrix_multiply.gcno \
        testbed.c.gcov matrix_multiply.c.gcov fasttime.h.gcov
clang -O1 -DNDEBUG -Wall -std=c99 -D_POSIX_C_SOURCE=200809L -c testbed.c -o testbed.o
clang -O1 -DNDEBUG -Wall -std=c99 -D_POSIX_C_SOURCE=200809L -c matrix_multiply.c -o matrix_multiply.o
clang -o matrix_multiply testbed.o matrix_multiply.o -lrt -flto -fuse-ld=gold

This removes the previously generated files and rebuilds the executable.

Write-up 6

The ASan error output is as follows:

=================================================================
==5675==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 32 byte(s) in 2 object(s) allocated from:
    #0 0x55f87f2df32e in __interceptor_malloc (/home/xby/hw1/matrix-multiply/matrix_multiply+0xa232e) (BuildId: e55aea457cfc44e1756fef8c1effbed264376bf8)
    #1 0x55f87f31ad5a in make_matrix /home/xby/hw1/matrix-multiply/matrix_multiply.c:40:24
    #2 0x7fe96a486d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

Indirect leak of 128 byte(s) in 8 object(s) allocated from:
    #0 0x55f87f2df32e in __interceptor_malloc (/home/xby/hw1/matrix-multiply/matrix_multiply+0xa232e) (BuildId: e55aea457cfc44e1756fef8c1effbed264376bf8)
    #1 0x55f87f31adf7 in make_matrix /home/xby/hw1/matrix-multiply/matrix_multiply.c:50:36

Indirect leak of 64 byte(s) in 2 object(s) allocated from:
    #0 0x55f87f2df32e in __interceptor_malloc (/home/xby/hw1/matrix-multiply/matrix_multiply+0xa232e) (BuildId: e55aea457cfc44e1756fef8c1effbed264376bf8)
    #1 0x55f87f31ada2 in make_matrix /home/xby/hw1/matrix-multiply/matrix_multiply.c:47:32
    #2 0x7fe96a486d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

SUMMARY: AddressSanitizer: 224 byte(s) leaked in 12 allocation(s).

This shows that the matrices created by make_matrix are leaked: they are allocated but never freed.

Write-up 7

Code change: initialize each newly created matrix inside make_matrix by zeroing every row.

// Allocates a row-by-cols matrix and returns it
matrix *make_matrix(int rows, int cols)
{
  matrix *new_matrix = malloc(sizeof(matrix));

  // Set the number of rows and columns
  new_matrix->rows = rows;
  new_matrix->cols = cols;

  // Allocate a buffer big enough to hold the matrix.
  new_matrix->values = (int **)malloc(sizeof(int *) * rows);
  for (int i = 0; i < rows; i++)
  {
    new_matrix->values[i] = (int *)malloc(sizeof(int) * cols);
    memset(new_matrix->values[i], 0, sizeof(int) * cols);
  }

  return new_matrix;
}

Output of ./matrix_multiply -p:

Setup
Matrix A: 
------------
    3      7      8      1  
    7      9      8      3  
    1      2      6      7  
    9      8      1      9  
------------
Matrix B: 
------------
    1      3      0      1  
    5      5      7      8  
    0      1      9      8  
    9      3      1      7  
------------
Running matrix_multiply_run()...
---- RESULTS ----
Result: 
------------
   47     55    122    130  
   79     83    138    164  
   74     40     75    114  
  130     95     74    144  
------------
---- END RESULTS ----
Elapsed execution time: 0.000000 sec

Checked by hand, the result is correct.

Write-up 8

Code change: destroy the three matrices before the process exits:

free_matrix(A);
free_matrix(B);
free_matrix(C);

Output of running under valgrind again:

==6130== 
==6130== HEAP SUMMARY:
==6130==     in use at exit: 0 bytes in 0 blocks
==6130==   total heap usage: 39 allocs, 39 frees, 1,680 bytes allocated
==6130== 
==6130== All heap blocks were freed -- no leaks are possible
==6130== 
==6130== For lists of detected and suppressed errors, rerun with: -s
==6130== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Software System Optimization Lab A1 Notes