Memory Management
- Tuning the memory sub-system can be a complex process.
- Memory usage and I/O throughput are intrinsically related because, in most cases, most memory is used to cache the contents of files on disk. Changing memory parameters can therefore have a large effect on I/O performance, and changing I/O parameters can have an equally large effect on the virtual memory sub-system.
    $ free -m
                   total        used        free      shared  buff/cache   available
    Mem:            7763        3178         646        1022        3938        3262
    Swap:           7762        1034        6728

    $ cat /proc/meminfo
    MemTotal:        7949804 kB
    MemFree:          669748 kB
    MemAvailable:    3355456 kB
    Buffers:              28 kB
    Cached:          3777140 kB
    SwapCached:        13160 kB
    Active:          2357428 kB
    Inactive:        3249488 kB
    Active(anon):    1659132 kB
    Inactive(anon):  1201760 kB
    Active(file):     698296 kB
    Inactive(file):  2047728 kB
    Unevictable:      583624 kB
    Mlocked:             220 kB
    SwapTotal:       7949308 kB
    ...

    UTILITY   PURPOSE                                                                 PACKAGE
    free      Brief summary of memory usage                                           procps
    vmstat    Detailed virtual memory statistics and block I/O, dynamically updated   procps
    pmap      Process memory map                                                      procps

- The pseudofile /proc/meminfo contains a wealth of information about how memory is being used.
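
As a small illustration of the link between memory and file caching described above, the following sketch pulls individual fields out of /proc/meminfo and reports what share of RAM is currently counted as Cached. The helper names meminfo_kb and cache_percent, and the optional file argument, are assumptions made for this example; they are not part of procps.

```shell
#!/bin/sh
# meminfo_kb FIELD [FILE]: print the value (in kB) of one field from a
# meminfo-style file. FILE defaults to /proc/meminfo.
# (Hypothetical helper, written for this example.)
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# cache_percent [FILE]: percentage of MemTotal currently reported as
# Cached, rounded down to a whole number.
cache_percent() {
    total=$(meminfo_kb MemTotal "$1")
    cached=$(meminfo_kb Cached "$1")
    echo $(( cached * 100 / total ))
}

# Usage:
#   meminfo_kb MemAvailable
#   cache_percent
```

With the sample values shown earlier (MemTotal 7949804 kB, Cached 3777140 kB), cache_percent prints 47.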
/proc/sys/vm
- The /proc/sys/vm directory contains many tunable knobs to control the Virtual Memory system.
- Values can be changed either by writing directly to the entry, or by using the sysctl utility.
- When tweaking parameters in /proc/sys/vm, the usual best practice is to adjust one thing at a time and look for effects. The primary (inter-related) tasks are:
  - Controlling flushing parameters; i.e., how many pages are allowed to be dirty and how often they are flushed out to disk
  - Controlling swap behavior; i.e., how many pages that reflect file contents are allowed to remain in memory, as opposed to pages that need to be swapped out because they have no other backing store
  - Controlling how much memory overcommission is allowed, since many programs never need the full amount of memory they request, particularly because of copy on write (COW) techniques
- Memory tuning can be subtle: what works in one system situation or load may be far from optimal in other circumstances.
- Exactly what appears in this directory will depend somewhat on the kernel version. Almost all of the entries are writable (by root).
    $ ls /proc/sys/vm/
    admin_reserve_kbytes         lowmem_reserve_ratio       oom_kill_allocating_task
    compaction_proactiveness     max_map_count              overcommit_kbytes
    compact_memory               memfd_noexec               overcommit_memory
    compact_unevictable_allowed  memory_failure_early_kill  overcommit_ratio
    dirty_background_bytes       memory_failure_recovery    page-cluster
    dirty_background_ratio       min_free_kbytes            page_lock_unfairness
    dirty_bytes                  min_slab_ratio             panic_on_oom
    dirty_expire_centisecs       min_unmapped_ratio         percpu_pagelist_high_fraction
    dirty_ratio                  mmap_min_addr              stat_interval
    dirtytime_expire_seconds     mmap_rnd_bits              stat_refresh
    dirty_writeback_centisecs    mmap_rnd_compat_bits       swappiness
    drop_caches                  nr_hugepages               unprivileged_userfaultfd
    extfrag_threshold            nr_hugepages_mempolicy     user_reserve_kbytes
    hugetlb_optimize_vmemmap     nr_overcommit_hugepages    vfs_cache_pressure
    hugetlb_shm_group            numa_stat                  watermark_boost_factor
    laptop_mode                  numa_zonelist_order        watermark_scale_factor
    legacy_va_layout             oom_dump_tasks             zone_reclaim_mode
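
The two ways of changing a value mentioned above (writing to the entry, or using sysctl) might be sketched as follows. The helper names vm_get and vm_set and the VM_DIR variable are assumptions introduced for this example, and the value 30 is only an illustration, not a recommendation.

```shell
#!/bin/sh
# VM_DIR is an assumption of this sketch, so the helpers can be pointed at
# a scratch directory; on a live system the entries are in /proc/sys/vm.
VM_DIR="${VM_DIR:-/proc/sys/vm}"

# vm_get NAME: print the current value of one tunable.
vm_get() {
    cat "$VM_DIR/$1"
}

# vm_set NAME VALUE: write a new value, then print what is now stored.
# Writing under /proc/sys/vm requires root.
vm_set() {
    echo "$2" > "$VM_DIR/$1" && vm_get "$1"
}

# Usage (as root), changing one parameter at a time:
#   vm_get swappiness
#   vm_set swappiness 30
#
# The same with sysctl; the name after "vm." is the file name:
#   sysctl vm.swappiness
#   sysctl -w vm.swappiness=30
```

Values written this way take effect immediately but do not survive a reboot.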
vmstat
vmstat is a multi-purpose tool that displays information about memory, paging, I/O, processor activity and processes.

    vmstat [options] [delay] [count]

If delay is given in seconds, the report is repeated at that interval count times; if count is not given, vmstat keeps reporting statistics until it is killed by a signal, such as the one sent by Ctrl-C.
    $ vmstat 2 4
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b    swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
     4  0 1048576 910912     28 4061280    6   16    62    42   52  151  4  2 94  0  0
     0  0 1048576 940816     28 4040172    0    0     0   266 2874 5571  3  2 94  0  0
     0  0 1048576 939220     28 4042236    0    0     0    44 2850 5257  3  2 95  0  0
     0  0 1048576 938500     28 4042236    0    0     0     0 2695 5135  3  2 95  0  0

    $ vmstat -SM -a 2 4
    procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
     r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
     2  0   1024    825   3128   2305    0    0    62    42   55  157  4  2 94  0  0
     0  0   1024    824   3122   2305    0    0     0    38 2829 5162  3  3 94  0  0
     0  0   1024    836   3122   2305    0    0     0  3438 2983 5966  3  2 94  0  0
     1  0   1024    841   3122   2305    0    0     0    44 2672 5199  3  2 95  0  0

    $ vmstat -p /dev/sda3 2 4
    sda3           reads   read sectors   writes   requested writes
                  258262       26944496   303063           13001080
                  258262       26944496   303083           13001448
                  258262       26944496   303108           13001744
                  258263       26944504   303137           13004376

- If the option -S m is given, memory statistics are reported in megabytes instead of kilobytes (the example above uses -S M, which selects units of 1048576 bytes).
- With the -a option, vmstat displays information about active and inactive memory.
- Active memory pages are those which have been recently used; they may be clean (disk contents are up to date) or dirty (need to be flushed to disk eventually).
- By contrast, inactive memory pages have not been recently used; they are more likely to be clean and are released sooner under memory pressure.
- If you just want some quick statistics on only one partition, use the -p option.
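
Because vmstat prints plain columns, its output is easy to post-process. The sketch below adds up the si (swap-in) and so (swap-out) columns over all samples, which gives a rough indication of whether the system was swapping during the measurement. The function name swap_activity is an assumption made for this example; column positions are taken from the header line, since they differ between the default and -a layouts.

```shell
#!/bin/sh
# swap_activity: read vmstat output on stdin and print the totals of the
# si and so columns over all sample lines.
# (Hypothetical helper, written for this example.)
swap_activity() {
    awk '
        $1 == "r" && $2 == "b" {              # header line with column names
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("si" in col) && $1 ~ /^[0-9]+$/ {    # numeric sample line
            si += $(col["si"])
            so += $(col["so"])
        }
        END { printf "si=%d so=%d\n", si, so }
    '
}

# Usage:
#   vmstat 2 4 | swap_activity
```

Note that the first line vmstat reports contains averages since boot, so it is the later samples that reflect current activity. For the first sample run shown above, the function prints si=6 so=16, and all of that comes from the first line.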
Using SWAP
Linux employs a virtual memory system, in which the operating system can function as if it had more memory than it really does. This kind of memory overcommission functions in two ways:
- Many programs do not actually use all the memory they are given permission to use. Sometimes, this is because child processes inherit a copy of the parent’s memory regions utilizing a COW (Copy On Write) technique, in which the child only obtains a unique copy (on a page-by-page basis) when there is a change.
- When memory pressure becomes significant, less active memory regions may be swapped out to disk, to be recalled only when needed again.
Such swapping is usually done to one or more dedicated partitions or files; Linux permits multiple swap areas, so the needs can be adjusted dynamically. Each area has a priority, and lower priority areas are not used until higher priority areas are filled.
In most situations, the recommended swap size is equal to the total RAM on the system. You can see which swap areas your system is currently using by looking at the /proc/swaps file, and report on current usage with free.
The commands involving swap are:
- mkswap: format swap partitions or files
- swapon: activate swap partitions or files
- swapoff: deactivate swap partitions or files

    $ cat /proc/swaps
    Filename      Type        Size      Used      Priority
    /dev/zram0    partition   7949308   1111296   100

    $ free -m
                   total        used        free      shared  buff/cache   available
    Mem:            7763        3104        1528         707        3129        3532
    Swap:           7762        1085        6677
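
As a sketch of how the swap commands and /proc/swaps fit together, the function below summarizes each active swap area, and the comments outline one typical way to add a swap file. The function name swap_usage, the path /swapfile and the 1 GB size are assumptions chosen for illustration only.

```shell
#!/bin/sh
# swap_usage [FILE]: for each swap area listed in a /proc/swaps-style file,
# print its name, priority, and the percentage in use (rounded down).
# FILE defaults to /proc/swaps. (Hypothetical helper for this example.)
swap_usage() {
    awk 'NR > 1 && $3 > 0 {
        printf "%s prio=%s used=%d%%\n", $1, $5, int($4 * 100 / $3)
    }' "${1:-/proc/swaps}"
}

# Adding and removing a swap file typically looks like this (run as root;
# the path and size are made-up examples):
#   dd if=/dev/zero of=/swapfile bs=1M count=1024   # create a 1 GB file
#   chmod 600 /swapfile                             # restrict access to root
#   mkswap /swapfile                                # format it as swap
#   swapon /swapfile                                # activate it
#   swap_usage                                      # confirm it is listed
#   swapoff /swapfile                               # deactivate it again
```

For the /proc/swaps contents shown above, swap_usage prints /dev/zram0 prio=100 used=13%.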
At any given time, most memory is in use for caching file contents, so that the disk is not accessed more often than necessary, or in a sub-optimal order or at a bad time. Such pages of memory are never swapped out: their backing store is the files themselves, so writing them out to swap would be pointless. Instead, dirty pages (memory containing updated file contents that no longer match the stored data) are flushed out to disk.
OOM (Out of Memory) Killer
- Simplest way to handle memory pressure: Permit memory allocations until all memory is exhausted, then fail.
- Second simplest way: Use swap space on disk to free up some resident memory. Total available memory is RAM + swap space.
- Linux allows memory overcommitment, granting memory requests beyond RAM + swap, as many processes don’t use all requested memory.
- Examples:
  - A program allocates a 1 MB buffer and then uses only a few pages of it.
  - Every time a child process is forked, it receives a copy of the entire memory space of the parent. Because Linux uses the COW (copy on write) technique, no actual copy needs to be made unless one of the processes modifies memory. However, the kernel has to assume that the copy might need to be done.
- Kernel allows overcommitment only for user process pages; kernel pages are not swappable and are allocated at request time.
- OOM (Out of Memory) killer selects which processes to terminate during severe memory pressure.
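
To see how far a system is currently overcommitted, one can compare the Committed_AS field of /proc/meminfo (memory the kernel has promised to processes) with RAM + swap. The following is a minimal sketch; the function name commit_percent and the optional file argument are assumptions made for this example.

```shell
#!/bin/sh
# commit_percent [FILE]: print Committed_AS as a percentage of
# MemTotal + SwapTotal (rounded down), read from a meminfo-style file.
# FILE defaults to /proc/meminfo. A result above 100 means that more
# memory has been promised than RAM and swap together can hold.
# (Hypothetical helper, written for this example.)
commit_percent() {
    awk '
        $1 == "MemTotal:"     { ram = $2 }
        $1 == "SwapTotal:"    { swap = $2 }
        $1 == "Committed_AS:" { committed = $2 }
        END {
            if (ram + swap > 0)
                printf "%d\n", int(committed * 100 / (ram + swap))
        }
    ' "${1:-/proc/meminfo}"
}

# Usage:
#   commit_percent
```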
OOM Killer Algorithms:
- Overcommission can be modified, or even turned off, by setting the value of /proc/sys/vm/overcommit_memory:
  - 0 (default): Permit overcommission, but refuse obvious overcommits. Root users get more memory allocation than normal users.
  - 1: Allow all memory requests to overcommit.
  - 2: Turn off overcommission. Memory requests fail when the total memory commit reaches the size of the swap space plus a configurable percentage of RAM (/proc/sys/vm/overcommit_ratio).
- The heuristic algorithm is not meant for normal operation; its purpose is to allow a graceful shutdown or retrenchment.
- Process selection is based on a badness value (/proc/[pid]/oom_score) computed for each process.
- Write to oom_score_adj in the same directory to adjust the score of an individual task.
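
To get a feel for which processes the OOM killer would consider first, the badness values can be read directly from /proc. The sketch below lists the processes with the highest oom_score; the function name top_oom_scores and its optional proc-root argument are assumptions made for this example.

```shell
#!/bin/sh
# top_oom_scores [N] [PROC_ROOT]: list the N processes with the highest
# oom_score as "score pid name", highest first. N defaults to 5.
# PROC_ROOT defaults to /proc; it is a parameter only so that the function
# can be tried against a scratch directory.
# (Hypothetical helper, written for this example.)
top_oom_scores() {
    n="${1:-5}"
    root="${2:-/proc}"
    for dir in "$root"/[0-9]*; do
        # A process may exit while we iterate, so skip unreadable entries.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        echo "$score ${dir##*/} $name"
    done | sort -k1,1nr -k2,2n | head -n "$n"
}

# Usage:
#   top_oom_scores 10
#
# Adjusting one task (as root). PID stands for a real process ID; the
# accepted range is -1000 to 1000, and -1000 exempts the task entirely:
#   echo -500 > /proc/PID/oom_score_adj
```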