Bugzilla – Bug 913695
possible memory leak in kernel 3.16.7
Last modified: 2015-05-20 08:03:40 UTC
openSUSE 13.2 with the latest kernel (3.16.7-7) reached a state where there was not enough free memory available. Swap was full, and at first glance no process was holding the memory. The system had been running for several weeks. There was no page allocation failure or out-of-memory message in the log. /proc/meminfo, slabinfo and zoneinfo are attached, along with the list of processes. According to Michal Hocko, it could be fixed by upstream commit 5ddacbe92b806cd5b4f8f154e8e46ac267fff55c, which is not present in 3.16.7. Unfortunately I had to turn off the computer because of a power shutdown in the building and cannot reproduce it now.
Created attachment 620006: meminfo
Created attachment 620007: slabinfo
Created attachment 620008: zoneinfo
Created attachment 620009: processes
MemTotal:     8131708 kB
MemFree:      1889300 kB
[...]
Buffers:        23424 kB
[...]
SwapCached:     31980 kB
Active:       3174752 kB
Inactive:     1886984 kB
[...]
Unevictable:       80 kB
Mlocked:           80 kB
SwapTotal:    2103292 kB
SwapFree:           8 kB
[...]
Slab:          782384 kB
[...]
KernelStack:     8144 kB
PageTables:     41880 kB

echo $((8131708-(1889300+23424+31980+3174752+1886984+80+782384+8144+41880)))
292780

So almost 300M is missing somewhere. This might be a sign of a memory leak, or some kernel component allocates directly from the page allocator and uses that memory. 5ddacbe92b80 (mm: free compound page with correct order) sounds like a good fit as well, because you've told me (off-bugzilla) that the machine had been running for a long time and that you ran kvm with 4GB of memory many times. I can imagine that the THP zero page would be mapped many times under that load, so the leak could build up continually until it becomes noticeable.

I will push the patch to git even though we are not 100% sure this is the real fix for this issue. Having /proc/vmstat would be ideal because we could check thp_zero_page_alloc, which is present in neither meminfo nor zoneinfo.
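For anyone who wants to repeat the accounting above on another machine, here is a minimal Python sketch of the same subtraction. It is only an illustration: the function names and the MEMINFO_SAMPLE constant are made up for this example, the field list simply mirrors the terms used in the echo command, and the sample contains only the values quoted in this comment (the elided fields are left out).

```python
import re

# Fields subtracted from MemTotal, mirroring the manual calculation in the
# comment above (an assumption of this sketch, not an exhaustive accounting).
ACCOUNTED_FIELDS = (
    "MemFree", "Buffers", "SwapCached", "Active", "Inactive",
    "Unevictable", "Slab", "KernelStack", "PageTables",
)


def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to integer values."""
    fields = {}
    for line in text.splitlines():
        # Matches e.g. "MemTotal:  8131708 kB" and "Active(anon):  12 kB".
        match = re.match(r"^(\w+(?:\([^)]*\))?):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def unaccounted_kb(text):
    """MemTotal minus the sum of the accounted fields, in kB."""
    fields = parse_meminfo(text)
    return fields["MemTotal"] - sum(fields[name] for name in ACCOUNTED_FIELDS)


# Values quoted in this bug; elided lines from the attachment are omitted.
MEMINFO_SAMPLE = """\
MemTotal:        8131708 kB
MemFree:         1889300 kB
Buffers:           23424 kB
SwapCached:        31980 kB
Active:          3174752 kB
Inactive:        1886984 kB
Unevictable:          80 kB
Mlocked:              80 kB
SwapTotal:       2103292 kB
SwapFree:              8 kB
Slab:             782384 kB
KernelStack:        8144 kB
PageTables:        41880 kB
"""

print(unaccounted_kb(MEMINFO_SAMPLE))  # 292780
```

On a live system the same function could be fed the contents of /proc/meminfo. A non-zero remainder is expected to some degree, so the number is a hint rather than proof of a leak.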
Pushed to the openSUSE-13.2 branch with a note that the culprit might be different. The fix addresses a real leak in any case, and since the stable kernel for 3.16 is already dead, we need it anyway.
Pushed the patch to openSUSE-13.1 as well. SLE branches are fine (SLE12 has the fix from stable; SLE11-SP3 and older do not have 97ae17497e99, which introduced the issue).
So I am in the same situation after two weeks. This is the relevant part of /proc/vmstat:

thp_fault_alloc 9500
thp_fault_fallback 107474
thp_collapse_alloc 3284
thp_collapse_alloc_failed 17798
thp_split 2167
thp_zero_page_alloc 21
thp_zero_page_alloc_failed 1751

I am going to install a kernel with the fix and see what happens.
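In case it helps with collecting these numbers again, below is a small Python sketch that pulls the thp_* counters out of /proc/vmstat-style text and diffs two snapshots. The helper names and VMSTAT_SAMPLE are hypothetical, the sample holds only the values quoted above, and the sketch does not say which counter (if any) points at the leak.

```python
def thp_counters(vmstat_text):
    """Return the thp_* counters from /proc/vmstat-style text as a dict."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        # Each vmstat line is "name value"; keep only the THP counters.
        if len(parts) == 2 and parts[0].startswith("thp_") and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def counter_growth(before, after):
    """Per-counter difference between two snapshots (hypothetical helper)."""
    return {name: after[name] - before.get(name, 0) for name in after}


# Values quoted in this comment.
VMSTAT_SAMPLE = """\
thp_fault_alloc 9500
thp_fault_fallback 107474
thp_collapse_alloc 3284
thp_collapse_alloc_failed 17798
thp_split 2167
thp_zero_page_alloc 21
thp_zero_page_alloc_failed 1751
"""

snapshot = thp_counters(VMSTAT_SAMPLE)
print(snapshot["thp_zero_page_alloc"])  # 21
```

Saving one snapshot shortly after boot and another after a couple of weeks, then passing both to counter_growth, would show how each counter moved over that period.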
openSUSE-SU-2015:0713-1: An update that solves 13 vulnerabilities and has 52 fixes is now available.

Category: security (important)
Bug References: 867199,893428,895797,900811,901925,903589,903640,904899,905681,907039,907818,907988,908582,908588,908589,908592,908593,908594,908596,908598,908603,908604,908605,908606,908608,908610,908612,909077,909078,909477,909634,910150,910322,910440,911311,911325,911326,911356,911438,911578,911835,912061,912202,912429,912705,913059,913466,913695,914175,915425,915454,915456,915577,915858,916608,917830,917839,918954,918970,919463,920581,920604,921313,922542,922944
CVE References: CVE-2014-8134,CVE-2014-8160,CVE-2014-8559,CVE-2014-9419,CVE-2014-9420,CVE-2014-9428,CVE-2014-9529,CVE-2014-9584,CVE-2014-9585,CVE-2015-0777,CVE-2015-1421,CVE-2015-1593,CVE-2015-2150
Sources used:
openSUSE 13.2 (src): bbswitch-0.8-3.6.6, cloop-2.639-14.6.6, crash-7.0.8-6.6, hdjmod-1.28-18.7.6, ipset-6.23-6.6, kernel-docs-3.16.7-13.2, kernel-obs-build-3.16.7-13.7, kernel-obs-qa-3.16.7-13.1, kernel-obs-qa-xen-3.16.7-13.1, kernel-source-3.16.7-13.1, kernel-syms-3.16.7-13.1, pcfclock-0.44-260.6.2, vhba-kmp-20140629-2.6.2, virtualbox-4.3.20-10.2, xen-4.4.1_08-12.2, xtables-addons-2.6-6.2
openSUSE-SU-2015:0714-1: An update that solves 11 vulnerabilities and has 5 fixes is now available.

Category: security (important)
Bug References: 903640,904899,907988,909078,910150,911325,911326,912202,912654,912705,913059,913695,914175,915322,917839,920901
CVE References: CVE-2014-7822,CVE-2014-8134,CVE-2014-8160,CVE-2014-8173,CVE-2014-8559,CVE-2014-9419,CVE-2014-9420,CVE-2014-9529,CVE-2014-9584,CVE-2014-9585,CVE-2015-1593
Sources used:
openSUSE 13.1 (src): cloop-2.639-11.19.1, crash-7.0.2-2.19.1, hdjmod-1.28-16.19.1, ipset-6.21.1-2.23.1, iscsitarget-1.4.20.3-13.19.1, kernel-docs-3.11.10-29.2, kernel-source-3.11.10-29.1, kernel-syms-3.11.10-29.1, ndiswrapper-1.58-19.1, pcfclock-0.44-258.19.1, vhba-kmp-20130607-2.20.1, virtualbox-4.2.28-2.28.1, xen-4.3.3_04-37.1, xtables-addons-2.3-2.19.1
Any updates with the updated kernel?
It seems to be ok now. My current uptime is 36 days and I do not see the mentioned problems anymore. I think this can be closed as resolved/fixed. Thanks a lot.
Thanks for the feedback. Closing...