在各种QQ群里有大量同学会问,为什么LINUX/UNIX上free内存太少会不会有问题。
为了避免这种不必要的误会,不同的OS版本,采用了不同的策略。
例如最狠的是Solaris SUNOS ,在Solaris 8中将文件系统缓存直接算作free内存:
而Linux中从大约2014年 , 内核版本kernels 2.6.27+开始引入了MemAvailable ,其解释为:
[oracle@master ~]$ free -h total used free shared buff/cache available Mem: 31G 9.1G 422M 5.8G 21G 15G Swap: 15G 11G 3.8G
available
Estimation of how much memory is available for starting new applications, without swapping. Unlike the data provided by the cache or free fields, this field takes into account page cache and also that not all re-claimable memory slabs will be reclaimed due to items being in use (MemAvailable in /proc/meminfo, available on kernels 3.14, emulated on kernels 2.6.27+, otherwise the same as free)
Linux Kernel 的GIT中,有比较明确的概述:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34e431b0ae398fc54ea69ff85ec700722c9da773 /proc/meminfo: provide estimated available memory Many load balancing and workload placing programs check /proc/meminfo to estimate how much free memory is available. They generally do this by adding up "free" and "cached", which was fine ten years ago, but is pretty much guaranteed to be wrong today. It is wrong because Cached includes memory that is not freeable as page cache, for example shared memory segments, tmpfs, and ramfs, and it does not include reclaimable slab memory, which can take up a large fraction of system memory on mostly idle systems with lots of files. Currently, the amount of memory that is available for a new workload, without pushing the system into swap, can be estimated from MemFree, Active(file), Inactive(file), and SReclaimable, as well as the "low" watermarks from /proc/zoneinfo. However, this may change in the future, and user space really should not be expected to know kernel internals to come up with an estimate for the amount of free memory. It is more convenient to provide such an estimate in /proc/meminfo. If things change in the future, we only have to change it in one place. Signed-off-by: Rik van Riel <riel@redhat.com> Reported-by: Erik Mouw <erik.mouw_2@nxp.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Diffstat -rw-r--r-- Documentation/filesystems/proc.txt 9 -rw-r--r-- fs/proc/meminfo.c 37 2 files changed, 46 insertions, 0 deletions diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 22d89aa3..8533f5f 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -767,6 +767,7 @@ The "Locked" indicates whether the mapping is locked in memory or not. MemTotal: 16344972 kB MemFree: 13634064 kB +MemAvailable: 14836172 kB Buffers: 3656 kB Cached: 1195708 kB SwapCached: 0 kB @@ -799,6 +800,14 @@ AnonHugePages: 49152 kB MemTotal: Total usable ram (i.e. physical ram minus a few reserved bits and the kernel binary code) MemFree: The sum of LowFree+HighFree +MemAvailable: An estimate of how much memory is available for starting new + applications, without swapping. Calculated from MemFree, + SReclaimable, the size of the file LRU lists, and the low + watermarks in each zone. + The estimate takes into account that the system needs some + page cache to function well, and that not all reclaimable + slab will be reclaimable, due to items being in use. The + impact of those factors will vary from system to system. Buffers: Relatively temporary storage for raw disk blocks shouldn't get tremendously large (20MB or so) Cached: in-memory cache for files read from the disk (the diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c index a77d2b2..24270ec 100644 --- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -26,7 +26,11 @@ static int meminfo_proc_show(struct seq_file *m, void *v) unsigned long committed; struct vmalloc_info vmi; long cached; + long available; + unsigned long pagecache; + unsigned long wmark_low = 0; unsigned long pages[NR_LRU_LISTS]; + struct zone *zone; int lru; /* @@ -47,12 +51,44 @@ static int meminfo_proc_show(struct seq_file *m, void *v) for (lru = LRU_BASE; lru < NR_LRU_LISTS; lru++) pages[lru] = global_page_state(NR_LRU_BASE + lru); + for_each_zone(zone) + wmark_low += zone->watermark[WMARK_LOW]; + + /* + * Estimate the amount of memory available for userspace allocations, + * without causing swapping. + * + * Free memory cannot be taken below the low watermark, before the + * system starts swapping. + */ + available = i.freeram - wmark_low; + + /* + * Not all the page cache can be freed, otherwise the system will + * start swapping. Assume at least half of the page cache, or the + * low watermark worth of cache, needs to stay. + */ + pagecache = pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE]; + pagecache -= min(pagecache / 2, wmark_low); + available += pagecache; + + /* + * Part of the reclaimable swap consists of items that are in use, + * and cannot be freed. Cap this estimate at the low watermark. + */ + available += global_page_state(NR_SLAB_RECLAIMABLE) - + min(global_page_state(NR_SLAB_RECLAIMABLE) / 2, wmark_low); + + if (available < 0) + available = 0; + /* * Tagged format, for easy grepping and expansion. */ seq_printf(m, "MemTotal: %8lu kB\n" "MemFree: %8lu kB\n" + "MemAvailable: %8lu kB\n" "Buffers: %8lu kB\n" "Cached: %8lu kB\n" "SwapCached: %8lu kB\n" @@ -105,6 +141,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v) , K(i.totalram), K(i.freeram), + K(available), K(i.bufferram), K(cached), K(total_swapcache_pages()),
其中提到的是大量 对内存敏感的程序 是简单的将 /proc/meminfo 接口中的FREE+CACHED 的2者相加,算作可用的内存。而我们国内的同学的大多内存焦虑症,是只将FREE算作可用内存。
按照Linux Kernel Rik van Riel <riel@redhat.com>的口径是,这2种算法都是不合理的。后者是焦虑症,前者没考虑部分cache的内存未必可以被释放重用。 any load balancing and workload placing programs check /proc/meminfo to estimate how much free memory is available. They generally do this by adding up “free” and “cached”, which was fine ten years ago, but is pretty much guaranteed to be wrong today. It is wrong because Cached includes memory that is not freeable as page cache, for example shared memory segments, tmpfs, and ramfs, and it does not include reclaimable slab memory, which can take up a large fraction of system memory on mostly idle systems with lots of files.
因为在2014年,物理内存已经比较大了,更少地使用SWAP了,所以其展望未来,希望有一个接口能提供有效的评估。However, this may change in the future, and user space really should not be expected to know kernel internals to come up with an estimate for the amount of free memory. It is more convenient to provide such an estimate in /proc/meminfo. If things change in the future, we only have to change it in one place.
于是添加了 接口MemAvailable, 明确说明其是 评估的可以给新程序用的内存空间大小,考虑了 MemFree、SReclaimable、 the size of the file LRU lists、low watermarks in each zone。
其详细算法如上文引用。
因为都9102年了,可以让小伙伴不要再看 free memory了, 基本上关注 available memory 就可以了!对领导也耐心解释下,看available更有意义!
另有同学问,为什么有free内存的情况下,会用到SWAP空间。 stackexchange上有大量相关的解释,这里引用高赞的回答:
It is normal for Linux systems to use some swap even if there is still RAM free. The Linux kernel will move to swap memory pages that are very seldom used (e.g., the getty
instances when you only use X11, and some other inactive daemon).
Swap space usage becomes an issue only when there is not enough RAM available, and the kernel is forced to continuously move memory pages to swap and back to RAM, just to keep applications running. In this case, system monitor applications would show a lot of disk I/O activity.
用一句话说 就是有free memory情况下,用SWAP很正常;很少用到的内存页就会被换出去,交换导致性能不好的情况是 内存真不够了,SWAP一会换出一会换入,造成I/O开销;否则问题不大。