The process address space consists of the virtual memory addressable by a process; the kernel allows the process to use addresses within this virtual memory.
Each process has a flat 32-bit or 64-bit address space, the exact size depending on the architecture. "Flat" means that the address space is a single contiguous range of addresses.
Modern operating systems that use virtual memory generally adopt a flat address space rather than a segmented memory model.
Two processes may have the same memory addresses in their address spaces, yet the address spaces are in fact completely independent of each other. Processes can, however, elect to share a single address space; such processes are called threads.
A process can address up to 4GB of virtual memory (in a 32-bit address space), but that does not mean it is allowed to access every virtual address. The intervals of addresses that it may legally access are called memory areas.
A process may only access memory addresses that lie within a valid memory area, and each memory area carries permissions for the associated process, such as readable, writable, and executable.
If a process accesses a memory address outside of a valid memory area, or accesses a valid address in an invalid manner, the kernel terminates the process with the dreaded "segmentation fault" message.
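As a minimal user-space sketch of that last point (the constant 0x10 is simply assumed not to lie inside any valid memory area of a typical Linux process), the following program touches an unmapped address and is killed with a segmentation fault:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	/* An address that is almost certainly not inside any valid memory
	 * area of this process; dereferencing it makes the kernel deliver
	 * SIGSEGV and the shell prints "Segmentation fault". */
	volatile int *bogus = (int *)0x10;

	printf("about to touch an unmapped address...\n");
	*bogus = 42;		/* invalid access: not in any VMA */

	printf("never reached\n");
	return 0;
}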
2. The Memory Descriptor
The memory descriptor represents a process's address space and contains all of the information related to that address space.
The memory descriptor is represented by struct mm_struct, defined in <linux/mm_types.h>:
struct mm_struct {
struct vm_area_struct *mmap; /* list of VMAs */
struct rb_root mm_rb; /* red-black tree of VMAs */
u32 vmacache_seqnum; /* per-thread vmacache */
#ifdef CONFIG_MMU
unsigned long (*get_unmapped_area) (struct file *filp,
unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags);
#endif
unsigned long mmap_base; /* base of mmap area */
unsigned long mmap_legacy_base; /* base of mmap area in bottom-up allocations */
unsigned long task_size; /* size of task vm space */
unsigned long highest_vm_end; /* highest vma end address */
pgd_t * pgd; /* page global directory */
atomic_t mm_users; /* How many users with user space? */
atomic_t mm_count; /* How many references to "struct mm_struct" (users count as 1) */
atomic_long_t nr_ptes; /* PTE page table pages */
#if CONFIG_PGTABLE_LEVELS > 2
atomic_long_t nr_pmds; /* PMD page table pages */
#endif
int map_count; /* number of VMAs */
spinlock_t page_table_lock; /* Protects page tables and some counters */
struct rw_semaphore mmap_sem; /* memory area semaphore */
struct list_head mmlist; /* List of maybe swapped mm's. These are globally strung
* together off init_mm.mmlist, and are protected
* by mmlist_lock
*/
unsigned long hiwater_rss; /* High-watermark of RSS usage */
unsigned long hiwater_vm; /* High-water virtual memory usage */
unsigned long total_vm; /* Total pages mapped */
unsigned long locked_vm; /* Pages that have PG_mlocked set */
unsigned long pinned_vm; /* Refcount permanently increased */
unsigned long shared_vm; /* Shared pages (files) */
unsigned long exec_vm; /* VM_EXEC & ~VM_WRITE */
unsigned long stack_vm; /* VM_GROWSUP/DOWN */
unsigned long def_flags;
unsigned long start_code, end_code, start_data, end_data; /* start/end of code section, start/end of data section */
unsigned long start_brk, brk, start_stack; /* heap start, current heap end, stack start */
unsigned long arg_start, arg_end, env_start, env_end;
unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */
/*
* Special counters, in some configurations protected by the
* page_table_lock, in other configurations by being atomic.
*/
struct mm_rss_stat rss_stat;
struct linux_binfmt *binfmt;
cpumask_var_t cpu_vm_mask_var; /* Lazy TLB switch mask */
/* Architecture-specific MM context */
mm_context_t context;
unsigned long flags; /* Must use atomic bitops to access the bits */
struct core_state *core_state; /* coredumping support */
#ifdef CONFIG_AIO
spinlock_t ioctx_lock;
struct kioctx_table __rcu *ioctx_table;
#endif
#ifdef CONFIG_MEMCG
/*
* "owner" points to a task that is regarded as the canonical
* user/owner of this mm. All of the following must be true in
* order for it to be changed:
*
* current == mm->owner
* current->mm != mm
* new_owner->mm == mm
* new_owner->alloc_lock is held
*/
struct task_struct __rcu *owner;
#endif
struct user_namespace *user_ns;
/* store ref to file /proc/<pid>/exe symlink points to */
struct file __rcu *exe_file;
#ifdef CONFIG_MMU_NOTIFIER
struct mmu_notifier_mm *mmu_notifier_mm;
#endif
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
pgtable_t pmd_huge_pte; /* protected by page_table_lock */
#endif
#ifdef CONFIG_CPUMASK_OFFSTACK
struct cpumask cpumask_allocation;
#endif
#ifdef CONFIG_NUMA_BALANCING
/*
* numa_next_scan is the next time that the PTEs will be marked
* pte_numa. NUMA hinting faults will gather statistics and migrate
* pages to new nodes if necessary.
*/
unsigned long numa_next_scan;
/* Restart point for scanning and setting pte_numa */
unsigned long numa_scan_offset;
/* numa_scan_seq prevents two threads setting pte_numa */
int numa_scan_seq;
#endif
#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
/*
* An operation with batched TLB flushing is going on. Anything that
* can move process memory needs to flush the TLB when moving a
* PROT_NONE or PROT_NUMA mapped page.
*/
bool tlb_flush_pending;
#endif
#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
/* See flush_tlb_batched_pending() */
bool tlb_flush_batched;
#endif
struct uprobes_state uprobes_state;
#ifdef CONFIG_X86_INTEL_MPX
/* address of the bounds directory */
void __user *bd_addr;
#endif
#ifdef CONFIG_HUGETLB_PAGE
atomic_long_t hugetlb_usage;
#endif
struct work_struct async_put_work;
};
mm_struct
The mm_users field records the number of processes (threads) currently using this address space, while mm_count is the primary reference count of the mm_struct. All the users counted by mm_users collectively contribute a single reference to mm_count, and the structure is freed only when mm_count drops to zero.
The mmap and mm_rb fields describe the same thing through two different data structures: all of the memory areas in this address space. The former stores them in a linked list, the latter in a red-black tree.
As a linked list, mmap makes it simple and efficient to walk through all of the elements, whereas mm_rb, as a red-black tree, is better suited to searching for a specific element.
All mm_struct structures are strung together in a doubly linked list through their mmlist fields. The first element of the list is the init_mm memory descriptor, which describes the address space of the init process.
The mmlist_lock, defined in kernel/fork.c, must be used to protect the list against concurrent access.
In the process descriptor (struct task_struct), the mm field holds a pointer to the memory descriptor used by that process, so current->mm refers to the memory descriptor of the current process.
When a process exits, the kernel calls exit_mm(), defined in kernel/exit.c. This function performs some routine teardown work and updates some statistics.
It then calls mmput(), which decrements the mm_users count of the memory descriptor; when mm_users reaches zero, mmdrop() is called to decrement the mm_count usage count, and once that also reaches zero the mm_struct is freed.
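The usual way kernel code pins and releases these counts is via get_task_mm() and mmput(). The sketch below is illustrative only and not from the original article: the function name inspect_task_vm is invented, the error value and the statistic read are arbitrary, and header locations vary slightly across kernel versions.
#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/sched.h>

/* Sketch: pin another task's address space, use it briefly, drop it.
 * get_task_mm() bumps mm_users (returning NULL for kernel threads);
 * mmput() drops it and, when mm_users hits zero, ends up in mmdrop()
 * on mm_count. */
static int inspect_task_vm(struct task_struct *task)
{
	struct mm_struct *mm;
	int nr_vmas;

	mm = get_task_mm(task);		/* increments mm->mm_users, or NULL */
	if (!mm)
		return -EINVAL;		/* kernel thread: no address space */

	down_read(&mm->mmap_sem);
	nr_vmas = mm->map_count;	/* e.g. read a statistic */
	up_read(&mm->mmap_sem);

	mmput(mm);			/* decrement mm_users again */
	return nr_vmas;
}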
Kernel threads do not have a process address space and therefore have no associated memory descriptor.
Memory areas are represented by struct vm_area_struct (VMA), defined in <linux/mm_types.h>.
The kernel manages each memory area as a separate memory object, and each memory area has a consistent set of attributes.
/*
* This struct defines a memory VMM memory area. There is one of these
* per VM-area/task. A VM area is any part of the process virtual memory
* space that has a special rule for the page-fault handlers (ie a shared
* library, the executable area etc).
*/
struct vm_area_struct {
/* The first cache line has the info for VMA tree walking. */
unsigned long vm_start; /* Our start address within vm_mm. */
unsigned long vm_end; /* The first byte after our end address
within vm_mm. */
/* linked list of VM areas per task, sorted by address */
struct vm_area_struct *vm_next, *vm_prev;
struct rb_node vm_rb; /* this VMA's node in the red-black tree */
/*
* Largest free memory gap in bytes to the left of this VMA.
* Either between this VMA and vma->vm_prev, or between one of the
* VMAs below us in the VMA rbtree and its ->vm_prev. This helps
* get_unmapped_area find a free area of the right size.
*/
unsigned long rb_subtree_gap;
/* Second cache line starts here. */
struct mm_struct *vm_mm; /* The address space we belong to. */
pgprot_t vm_page_prot; /* Access permissions of this VMA. */
unsigned long vm_flags; /* Flags, see mm.h. */
/*
* For areas with an address space and backing store,
* linkage into the address_space->i_mmap interval tree.
*
* For private anonymous mappings, a pointer to a null terminated string
* in the user process containing the name given to the vma, or NULL
* if unnamed.
*/
union {
struct {
struct rb_node rb;
unsigned long rb_subtree_last;
} shared;
const char __user *anon_name;
};
/*
* A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
* list, after a COW of one of the file pages. A MAP_SHARED vma
* can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
* or brk vma (with NULL file) can only be in an anon_vma list.
*/
struct list_head anon_vma_chain; /* Serialized by mmap_sem &
* page_table_lock */
struct anon_vma *anon_vma; /* Serialized by page_table_lock */
/* Function pointers to deal with this struct. */
const struct vm_operations_struct *vm_ops;
/* Information about our backing store: */
unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE
units, *not* PAGE_CACHE_SIZE */
struct file * vm_file; /* File we map to (can be NULL). */
void * vm_private_data; /* was vm_pte (shared mem) */
#ifndef CONFIG_MMU
struct vm_region *vm_region; /* NOMMU mapping region */
#endif
#ifdef CONFIG_NUMA
struct mempolicy *vm_policy; /* NUMA policy for the VMA */
#endif
struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
};
vm_area_struct
Each vm_area_struct describes a single memory area over a unique interval in the process address space. The vm_start field is the first address of the interval, and vm_end points to the first byte after the last address of the interval, so vm_end - vm_start is the length of the interval in bytes.
The vm_mm field points to the mm_struct associated with the VMA. Each VMA is unique to its mm_struct, so even if two separate processes map the same file into their address spaces, each has its own vm_area_struct describing its own memory area.
The VMA flags are bit flags, defined in <linux/mm.h> and stored in the vm_flags field, that indicate the behavior of the memory area and provide information about its pages:
/*
* vm_flags in vm_area_struct, see mm_types.h.
*/
#define VM_NONE 0x00000000
#define VM_READ 0x00000001 /* currently active flags */
#define VM_WRITE 0x00000002 /* pages can be written */
#define VM_EXEC 0x00000004 /* pages can be executed */
#define VM_SHARED 0x00000008 /* pages can be shared */
/* mprotect() hardcodes VM_MAYREAD >> 4 == VM_READ, and so for r/w/x bits. */
#define VM_MAYREAD 0x00000010 /* limits for mprotect() etc */
#define VM_MAYWRITE 0x00000020
#define VM_MAYEXEC 0x00000040
#define VM_MAYSHARE 0x00000080
#define VM_GROWSDOWN 0x00000100 /* general info on the segment; area can grow downward */
#define VM_UFFD_MISSING 0x00000200 /* missing pages tracking */
#define VM_PFNMAP 0x00000400 /* Page-ranges managed without "struct page", just pure PFN */
#define VM_DENYWRITE 0x00000800 /* ETXTBSY on write attempts.. */
#define VM_UFFD_WP 0x00001000 /* wrprotect pages tracking */
#define VM_LOCKED 0x00002000 /* pages in this area are locked */
#define VM_IO 0x00004000 /* Memory mapped I/O or similar */
/* Used by sys_madvise() */
#define VM_SEQ_READ 0x00008000 /* App will access data sequentially */
#define VM_RAND_READ 0x00010000 /* App will not benefit from clustered reads */
#define VM_DONTCOPY 0x00020000 /* Do not copy this vma on fork */
#define VM_DONTEXPAND 0x00040000 /* Cannot expand with mremap() */
#define VM_LOCKONFAULT 0x00080000 /* Lock the pages covered when they are faulted in */
#define VM_ACCOUNT 0x00100000 /* Is a VM accounted object */
#define VM_NORESERVE 0x00200000 /* should the VM suppress accounting */
#define VM_HUGETLB 0x00400000 /* Huge TLB Page VM */
#define VM_ARCH_1 0x01000000 /* Architecture-specific flag */
#define VM_ARCH_2 0x02000000
#define VM_DONTDUMP 0x04000000 /* Do not include in the core dump */
#ifdef CONFIG_MEM_SOFT_DIRTY
# define VM_SOFTDIRTY 0x08000000 /* Not soft dirty clean area */
#else
# define VM_SOFTDIRTY 0
#endif
#define VM_MIXEDMAP 0x10000000 /* Can contain "struct page" and pure PFN pages */
#define VM_HUGEPAGE 0x20000000 /* MADV_HUGEPAGE marked this vma */
#define VM_NOHUGEPAGE 0x40000000 /* MADV_NOHUGEPAGE marked this vma */
#define VM_MERGEABLE 0x80000000 /* KSM may merge identical pages */
#if defined(CONFIG_X86)
# define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */
#elif defined(CONFIG_PPC)
# define VM_SAO VM_ARCH_1 /* Strong Access Ordering (powerpc) */
#elif defined(CONFIG_PARISC)
# define VM_GROWSUP VM_ARCH_1
#elif defined(CONFIG_METAG)
# define VM_GROWSUP VM_ARCH_1
#elif defined(CONFIG_IA64)
# define VM_GROWSUP VM_ARCH_1
#elif !defined(CONFIG_MMU)
# define VM_MAPPED_COPY VM_ARCH_1 /* T if mapped copy of data (nommu mmap) */
#endif
VMA flags
The vm_ops field of vm_area_struct points to the table of operations associated with the given memory area; the kernel uses the methods in this table to operate on the VMA.
The operations table is represented by struct vm_operations_struct, defined in <linux/mm.h>:
/*
* These are the virtual MM functions - opening of an area, closing and
* unmapping it (needed to keep files on disk up-to-date etc), pointer
* to the functions called when a no-page or a wp-page exception occurs.
*/
struct vm_operations_struct {
void (*open)(struct vm_area_struct * area);
void (*close)(struct vm_area_struct * area);
int (*mremap)(struct vm_area_struct * area);
int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);
int (*pmd_fault)(struct vm_area_struct *, unsigned long address,
pmd_t *, unsigned int flags);
void (*map_pages)(struct vm_area_struct *vma, struct vm_fault *vmf);
/* notification that a previously read-only page is about to become
* writable, if an error is returned it will cause a SIGBUS */
int (*page_mkwrite)(struct vm_area_struct *vma, struct vm_fault *vmf);
/* same as page_mkwrite when using VM_PFNMAP|VM_MIXEDMAP */
int (*pfn_mkwrite)(struct vm_area_struct *vma, struct vm_fault *vmf);
/* called by access_process_vm when get_user_pages() fails, typically
* for use by special VMAs that can switch between memory and hardware
*/
int (*access)(struct vm_area_struct *vma, unsigned long addr,
void *buf, int len, int write);
/* Called by the /proc/PID/maps code to ask the vma whether it
* has a special name. Returning non-NULL will also cause this
* vma to be dumped unconditionally. */
const char *(*name)(struct vm_area_struct *vma);
#ifdef CONFIG_NUMA
/*
* set_policy() op must add a reference to any non-NULL @new mempolicy
* to hold the policy upon return. Caller should pass NULL @new to
* remove a policy and fall back to surrounding context--i.e. do not
* install a MPOL_DEFAULT policy, nor the task or system default
* mempolicy.
*/
int (*set_policy)(struct vm_area_struct *vma, struct mempolicy *new);
/*
* get_policy() op must add reference [mpol_get()] to any policy at
* (vma,addr) marked as MPOL_SHARED. The shared policy infrastructure
* in mm/mempolicy.c will do this automatically.
* get_policy() must NOT add a ref if the policy at (vma,addr) is not
* marked as MPOL_SHARED. vma policies are protected by the mmap_sem.
* If no [shared/vma] mempolicy exists at the addr, get_policy() op
* must return NULL--i.e., do not "fallback" to task or system default
* policy.
*/
struct mempolicy *(*get_policy)(struct vm_area_struct *vma,
unsigned long addr);
#endif
/*
* Called by vm_normal_page() for special PTEs to find the
* page for @addr. This is useful if the default behavior
* (using pte_page()) would not find the correct page.
*/
struct page *(*find_special_page)(struct vm_area_struct *vma,
unsigned long addr);
};
void (*open)(struct vm_area_struct * area);
This function is called when the specified memory area is added to an address space.
void (*close)(struct vm_area_struct * area);
This function is called when the specified memory area is removed from an address space.
int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);
This function is invoked by the page fault handler when a page that is not present in physical memory is accessed.
int (*page_mkwrite)(struct vm_area_struct *vma, struct vm_fault *vmf);
This function is invoked by the page fault handler when a page that was read-only is about to become writable.
int (*access)(struct vm_area_struct *vma, unsigned long addr,
void *buf, int len, int write);
This function is invoked by access_process_vm() when a call to get_user_pages() fails.
VMA operations
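As a rough illustration of how a driver might fill in such a table, consider the sketch below. The names my_vma_open, my_vma_close, my_vma_fault and my_vm_ops are invented; the .fault signature matches the vm_operations_struct listed above (it changed in later kernels), and a real fault handler would supply a backing page instead of refusing the access.
#include <linux/mm.h>
#include <linux/printk.h>

/* Hypothetical driver-defined operations table (names made up). */
static void my_vma_open(struct vm_area_struct *area)
{
	pr_info("my_vma_open: VMA %lx-%lx added to an address space\n",
		area->vm_start, area->vm_end);
}

static void my_vma_close(struct vm_area_struct *area)
{
	pr_info("my_vma_close: VMA removed from its address space\n");
}

static int my_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	/* A real handler would look up or allocate the backing page,
	 * set vmf->page, and return 0; VM_FAULT_SIGBUS simply refuses
	 * the access. */
	return VM_FAULT_SIGBUS;
}

static const struct vm_operations_struct my_vm_ops = {
	.open	= my_vma_open,
	.close	= my_vma_close,
	.fault	= my_vma_fault,
};
A driver's mmap file operation would typically set vma->vm_ops = &my_vm_ops before returning.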
Each vm_area_struct is linked into the list through its vm_next field, and all areas are sorted by ascending address. The mmap field of the memory descriptor points to the first memory area in the list, and the last structure in the chain has a NULL vm_next pointer; a walk over this list is sketched below.
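The following sketch (the helper name dump_vmas is invented) walks the sorted VMA list of an address space; the caller is assumed to hold mmap_sem at least for reading.
#include <linux/mm.h>
#include <linux/printk.h>

/* Sketch: walk every VMA of an address space via the sorted linked list. */
static void dump_vmas(struct mm_struct *mm)
{
	struct vm_area_struct *vma;

	for (vma = mm->mmap; vma; vma = vma->vm_next)
		pr_info("VMA %lx-%lx flags %lx\n",
			vma->vm_start, vma->vm_end, vma->vm_flags);
}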
The /proc filesystem and the pmap utility can be used to inspect a given process's address space and the memory areas it contains.
The kernel frequently needs to perform operations on a memory area; the functions for doing so are declared in <linux/mm.h>.
To find the memory area that a given memory address belongs to, the kernel provides the find_vma() function, defined in mm/mmap.c:
struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr);
This function searches the given address space for the first memory area whose vm_end is greater than addr.
If no such area exists, the function returns NULL; otherwise it returns a pointer to the matching vm_area_struct. Note that because the condition is only vm_end > addr, the returned VMA does not necessarily contain addr; the caller must also check vm_start when the address needs to actually lie inside the area.
find_vma
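A typical use pattern, sketched below with an invented helper name addr_is_mapped, double-checks vm_start for exactly that reason; mmap_sem is taken for reading around the lookup.
#include <linux/mm.h>

/* Sketch: test whether addr lies inside a valid memory area of mm. */
static bool addr_is_mapped(struct mm_struct *mm, unsigned long addr)
{
	struct vm_area_struct *vma;
	bool mapped;

	down_read(&mm->mmap_sem);
	vma = find_vma(mm, addr);
	mapped = vma && vma->vm_start <= addr;	/* vm_end > addr is implied */
	up_read(&mm->mmap_sem);

	return mapped;
}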
find_vma_prev() works the same way as find_vma(), but it additionally returns the last VMA that lies before addr. It is defined in mm/mmap.c and declared in <linux/mm.h>:
struct vm_area_struct *find_vma_prev(struct mm_struct *mm, unsigned long addr, struct vm_area_struct **pprev)
The pprev argument stores a pointer to the VMA preceding addr.
find_vma_prev
find_vma_intersection() returns the first VMA that overlaps a given address interval. It is an inline function defined in <linux/mm.h>:
static inline struct vm_area_struct *find_vma_intersection(struct mm_struct *mm, unsigned long start_addr, unsigned long end_addr) {
struct vm_area_struct *vma;
vma = find_vma(mm, start_addr);
if(vma && end_addr <= vma->vm_start)
vma = NULL;
return vma;
}
The first argument, mm, is the address space to search; start_addr is the first address of the interval and end_addr is the end of the interval.
If find_vma() returns NULL, so does find_vma_intersection(). If a valid VMA is found, it is returned only when it starts before the end of the given address interval; if the VMA's start address is greater than or equal to the end of the given range, the function returns NULL.
find_vma_intersection
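One natural use, sketched below with an invented helper name range_is_free, is to check that an interval does not collide with any existing memory area (for example before attempting a fixed mapping); the caller is assumed to hold mmap_sem.
#include <linux/mm.h>

/* Sketch: does [start, end) avoid every existing VMA of mm? */
static bool range_is_free(struct mm_struct *mm,
			  unsigned long start, unsigned long end)
{
	return find_vma_intersection(mm, start, end) == NULL;
}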
The kernel uses the do_mmap() function to create a new linear address interval; it is declared in <linux/mm.h>:
unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flag, unsigned long offset)
file: the file to map; offset: the offset within that file at which the mapping begins; len: the length of the mapping.
do_mmap
If the file argument is NULL and offset is 0, the mapping is not associated with any file; this is called an anonymous mapping. If a file and offset are provided, the mapping is file-backed.
addr is optional; it specifies the initial address from which to search for a free interval.
The prot argument specifies the access permissions of the pages in the memory area. The permission flags are defined in <asm/mman.h>:
PROT_READ: corresponds to VM_READ
PROT_WRITE: corresponds to VM_WRITE
PROT_EXEC: corresponds to VM_EXEC
PROT_NONE: the pages may not be accessed
Page protection flags
The flags argument specifies the VMA flags; these flags specify the type of mapping and change its behavior. They are also defined in <asm/mman.h>:
MAP_SHARED: the mapping can be shared
MAP_PRIVATE: the mapping cannot be shared
MAP_FIXED: the new interval must start at the specified address addr
MAP_ANONYMOUS: the mapping is anonymous rather than file-backed
MAP_GROWSDOWN: corresponds to VM_GROWSDOWN
MAP_DENYWRITE: corresponds to VM_DENYWRITE
MAP_EXECUTABLE: corresponds to VM_EXECUTABLE
MAP_LOCKED: corresponds to VM_LOCKED
MAP_NORESERVE: no space needs to be reserved for the mapping
MAP_POPULATE: populate (prefault) the page tables
MAP_NONBLOCK: do not block on I/O
Map type flags
User space obtains the functionality of the kernel's do_mmap() through the mmap() system call; the mmap2() variant, which takes its final argument as an offset in pages rather than bytes, is declared as follows:
void *mmap2(void *start, size_t length, int prot, int flags, int fd, off_t pgoff)
mmap2
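For comparison, here is a minimal user-space sketch using the more common mmap() wrapper (which the C library implements on top of mmap2() on 32-bit Linux). It creates an anonymous, readable and writable mapping and then removes it with munmap(); the mapping size of four pages is arbitrary.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 4 * sysconf(_SC_PAGESIZE);
	char *p;

	/* Anonymous mapping: no file, offset 0, readable and writable. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}

	strcpy(p, "hello from an anonymous mapping");
	printf("%s at %p\n", p, (void *)p);

	if (munmap(p, len) < 0) {	/* remove the interval again */
		perror("munmap");
		return EXIT_FAILURE;
	}
	return EXIT_SUCCESS;
}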
The do_munmap() function removes a specified address interval from a given process address space; it is declared in <linux/mm.h>:
int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
mm: the address space from which to remove the interval
start: the starting address of the interval to remove
len: the length of the interval
It returns zero on success; otherwise it returns a negative error code.
do_munmap
The munmap() system call gives user space a way to remove the mapping at a specified address; it is the complement of mmap():
int munmap(void *start, size_t length)
munmap
Address translation requires splitting the virtual address into fields; each field is used as an index into a page table, and each page table entry points either to the next level of page table or to the final physical page.
Linux performs address translation with a three-level page table. The three-level structure follows a "greatest common divisor" idea: an architecture that needs fewer levels can fold the unused levels away and still fit the generic scheme, while multi-level tables keep the space consumed by page tables small for sparse address spaces.
The top-level page table is the page global directory (PGD).
The second-level page table is the page middle directory (PMD).
The last level is simply called the page table; its entries point to physical page frames.
The structures corresponding to the page tables are architecture-dependent and are therefore defined in <asm/page.h>. The walk itself is sketched below.
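The following sketch is illustrative only: the function name walk_to_page is invented, the macros come from the generic page table API (their exact signatures vary between kernel versions), and the PUD level shown is folded away on three-level configurations. Locking, huge pages, and error handling are omitted; real code uses helpers such as follow_page().
#include <linux/mm.h>
#include <asm/pgtable.h>

/* Rough sketch of a software page-table walk from a virtual address
 * to its struct page. */
static struct page *walk_to_page(struct mm_struct *mm, unsigned long addr)
{
	pgd_t *pgd;
	pud_t *pud;
	pmd_t *pmd;
	pte_t *pte;
	struct page *page = NULL;

	pgd = pgd_offset(mm, addr);		/* index the page global directory */
	if (pgd_none(*pgd) || pgd_bad(*pgd))
		return NULL;

	pud = pud_offset(pgd, addr);		/* folded into the PGD on 3-level setups */
	if (pud_none(*pud) || pud_bad(*pud))
		return NULL;

	pmd = pmd_offset(pud, addr);		/* index the page middle directory */
	if (pmd_none(*pmd) || pmd_bad(*pmd))
		return NULL;

	pte = pte_offset_map(pmd, addr);	/* index the final page table */
	if (pte_present(*pte))
		page = pte_page(*pte);		/* the physical page frame */
	pte_unmap(pte);

	return page;
}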