Bug 1129258 - BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 on __irq_domain_deactivate_irq+0x26/0x50 at psmouse_smbus_remove_i2c_device
Status: RESOLVED UPSTREAM
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Version: Current
Hardware: x86-64 Other
Priority: P5 - None
Severity: Critical
Assigned To: E-mail List
Reported: 2019-03-14 14:37 UTC by Niklas Juslin
Modified: 2019-04-17 11:07 UTC (History)


Attachments
dmesg dump after clean boot and suspend (268.91 KB, text/plain)
2019-03-14 14:40 UTC, Niklas Juslin

Description Niklas Juslin 2019-03-14 14:37:53 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0
Build Identifier: 

This bug occurs on resume from suspend or hibernate 2/3 of the time. It has been an issue in every kernel version since last summer, when I got my Lenovo X1 Carbon 6th gen laptop. Once it hits, the syslog is slowly flooded with crashes, and sudo, restarting (login timeout) and networking stop working. If I wait long enough when trying to write a dump to disk, it sometimes gets written before the whole machine freezes, but most of the time the buffers are not flushed to the filesystem.

There have been numerous problems with the synaptics driver on the 6th gen models, but I have not seen this kernel bug reported anywhere. The trackpad and trackpoint die when this bug hits. The known problems occur on all or most of these laptops and the usual fixes normally work, but once the kernel is in this state, running the commands listed here does nothing: 
https://wiki.archlinux.org/index.php/Lenovo_ThinkPad_X1_Carbon_(Gen_6)#TrackPoint_and_Touchpad_issues 

Reproducible: Sometimes

Steps to Reproduce:
1. suspend / hibernate
2. resume
Actual Results:  
[  113.898654] PM: suspend exit
[  113.906681] wlp2s0: associated
[  113.966559] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[  113.966568] PGD 0 P4D 0 
[  113.966574] Oops: 0000 [#1] SMP PTI
[  113.966580] CPU: 7 PID: 501 Comm: kworker/7:5 Tainted: G           O      4.20.13-1-default #1 openSUSE Tumbleweed (unreleased)
[  113.966587] Hardware name: LENOVO 20KH006MMX/20KH006MMX, BIOS N23ET59W (1.34 ) 11/08/2018
[  113.966596] Workqueue: events psmouse_smbus_remove_i2c_device
[  113.966604] RIP: 0010:__irq_domain_deactivate_irq+0x26/0x50
[  113.966609] Code: 0f 1f 40 00 0f 1f 44 00 00 48 85 ff 74 38 53 48 89 fb 48 8b 7f 20 48 85 ff 75 0b eb 27 48 8b 7b 20 48 85 ff 74 1e 48 8b 47 18 <48> 8b 40 40 48 85 c0 74 08 48 89 de e8 69 97 b0 00 48 8b 5b 28 48
[  113.966618] RSP: 0018:ffffb25c420dfb68 EFLAGS: 00010082
[  113.966623] RAX: 0000000000000000 RBX: ffff9bd24107ec98 RCX: ffff9bd24107ece0
[  113.966628] RDX: ffff9bd24107ec00 RSI: 0000000000000001 RDI: ffff9bd23cd74d80
[  113.966632] RBP: ffff9bd24107ec00 R08: ffff9bd23fcda240 R09: ffff9bd23fcda2e0
[  113.966637] R10: 0000000000000000 R11: ffffffffb825e388 R12: ffff9bd24107ee08
[  113.966642] R13: ffff9bd24107ed14 R14: 000000000000008f R15: ffff9bd22e9eb000
[  113.966647] FS:  0000000000000000(0000) GS:ffff9bd2425c0000(0000) knlGS:0000000000000000
[  113.966653] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  113.966657] CR2: 0000000000000040 CR3: 000000008c20a006 CR4: 00000000003606e0
[  113.966662] Call Trace:
[  113.966670]  irq_domain_deactivate_irq+0x1a/0x30
[  113.966677]  __free_irq+0x25e/0x2b0
[  113.966683]  free_irq+0x31/0x60
[  113.966690]  release_nodes+0x18c/0x1c0
[  113.966699]  device_release_driver_internal+0x193/0x240
[  113.966705]  bus_remove_device+0xe5/0x150
[  113.966712]  device_del+0x136/0x350
[  113.966720]  ? rmi_unregister_function+0x36/0x60 [rmi_core]
[  113.966728]  rmi_unregister_function+0x2e/0x60 [rmi_core]
[  113.966736]  rmi_free_function_list+0x7b/0xf0 [rmi_core]
[  113.966745]  rmi_driver_remove+0x3f/0x50 [rmi_core]
[  113.966751]  device_release_driver_internal+0x183/0x240
[  113.966757]  bus_remove_device+0xe5/0x150
[  113.966763]  device_del+0x136/0x350
[  113.966771]  rmi_unregister_transport_device+0x12/0x20 [rmi_core]
[  113.966778]  rmi_smb_remove+0x11/0x20 [rmi_smbus]
[  113.966784]  i2c_device_remove+0x46/0xa0
[  113.966790]  device_release_driver_internal+0x183/0x240
[  113.966796]  bus_remove_device+0xe5/0x150
[  113.966808]  device_del+0x136/0x350
[  113.966816]  ? finish_task_switch+0x78/0x270
[  113.966824]  device_unregister+0x16/0x60
[  113.966831]  psmouse_smbus_remove_i2c_device+0x17/0x40
[  113.966840]  process_one_work+0x20b/0x400
[  113.966849]  worker_thread+0x2d/0x3f0
[  113.966857]  ? pwq_unbound_release_workfn+0xc0/0xc0
[  113.966864]  kthread+0x116/0x130
[  113.966871]  ? kthread_bind+0x30/0x30
[  113.966879]  ret_from_fork+0x24/0x50
[  113.966887] Modules linked in: thunderbolt cmac rfcomm ccm fuse af_packet ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter ip_tables bpfilter xt_conntrack x_tables nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep rmi_smbus rmi_core arc4 msr snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_hdmi snd_hda_codec_realtek snd_soc_core btusb snd_hda_codec_generic btrtl btbcm nls_iso8859_1 xfs uvcvideo btintel nls_cp437 videobuf2_vmalloc snd_compress videobuf2_memops snd_pcm_dmaengine vfat videobuf2_v4l2 bluetooth fat snd_hda_intel iwlmvm snd_hda_codec videodev iTCO_wdt iTCO_vendor_support videobuf2_common ecdh_generic snd_hda_core intel_rapl mac80211 snd_hwdep snd_pcm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iwlwifi snd_timer
[  113.966941]  thinkpad_acpi e1000e kvm cfg80211 irqbypass snd mei_me joydev ptp pcspkr ucsi_acpi typec_ucsi processor_thermal_device intel_wmi_thunderbolt pps_core wmi_bmof i2c_i801 soundcore mei intel_pch_thermal intel_soc_dts_iosf typec thermal rfkill battery int3403_thermal ac int340x_thermal_zone int3400_thermal acpi_thermal_rel acpi_pad button pcc_cpufreq btrfs libcrc32c xor raid6_pq dm_crypt algif_skcipher af_alg crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel i915 aesni_intel aes_x86_64 crypto_simd cryptd i2c_algo_bit glue_helper drm_kms_helper syscopyarea sysfillrect xhci_pci sysimgblt fb_sys_fops xhci_hcd drm usbcore serio_raw wmi video i2c_hid sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
[  113.967051] CR2: 0000000000000040
[  113.967058] ---[ end trace 89ee2ee0c208ee06 ]---
[  113.967066] RIP: 0010:__irq_domain_deactivate_irq+0x26/0x50
[  113.967073] Code: 0f 1f 40 00 0f 1f 44 00 00 48 85 ff 74 38 53 48 89 fb 48 8b 7f 20 48 85 ff 75 0b eb 27 48 8b 7b 20 48 85 ff 74 1e 48 8b 47 18 <48> 8b 40 40 48 85 c0 74 08 48 89 de e8 69 97 b0 00 48 8b 5b 28 48
[  113.967084] RSP: 0018:ffffb25c420dfb68 EFLAGS: 00010082
[  113.967091] RAX: 0000000000000000 RBX: ffff9bd24107ec98 RCX: ffff9bd24107ece0
[  113.967097] RDX: ffff9bd24107ec00 RSI: 0000000000000001 RDI: ffff9bd23cd74d80
[  113.967104] RBP: ffff9bd24107ec00 R08: ffff9bd23fcda240 R09: ffff9bd23fcda2e0
[  113.967111] R10: 0000000000000000 R11: ffffffffb825e388 R12: ffff9bd24107ee08
[  113.967117] R13: ffff9bd24107ed14 R14: 000000000000008f R15: ffff9bd22e9eb000
[  113.967124] FS:  0000000000000000(0000) GS:ffff9bd2425c0000(0000) knlGS:0000000000000000
[  113.967131] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  113.967138] CR2: 0000000000000040 CR3: 000000008c20a006 CR4: 00000000003606e0

...

[  115.886016] psmouse serio1: synaptics: Trying to set up SMBus access
[  115.889796] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1f.4/i2c-6/6-002c'
[  115.889835] CPU: 1 PID: 4669 Comm: trackpad Tainted: G      D    O      4.20.13-1-default #1 openSUSE Tumbleweed (unreleased)

...

[  115.890677] kobject_add_internal failed for 6-002c with -EEXIST, don't try to register things with the same name in the same directory.
[  115.890702] i2c i2c-6: Failed to register i2c client rmi4_smbus at 0x2c (-17)
[  115.890724] psmouse serio1: synaptics: SMbus companion is not ready yet
[  115.908965] psmouse serio1: synaptics: Unable to initialize device.
[  116.812557] input: PS/2 Synaptics TouchPad as /devices/platform/i8042/serio1/input/input24

Expected Results:  
When this doesn't happen all is fine and dandy.

Kernel 4.20.13-1-default
GRUB_CMDLINE_LINUX_DEFAULT="splash=silent resume=/dev/system/swap quiet initcall_debug"

I have a couple more dmesg dumps from older kernel versions if needed. They do look the same.
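
Since the older dumps reportedly look the same, one way to compare them is to reduce each oops to its faulting symbol, fault address and call chain. The following is a minimal Python sketch, not part of the original report: the function name summarize_oops and the SAMPLE excerpt (abbreviated from the log above) are illustrative assumptions, and the parsing only handles the x86-64 oops layout shown here.

```python
import re

# Matches the dmesg timestamp prefix, e.g. "[  113.966604] ".
TS = re.compile(r"^\[\s*\d+\.\d+\]\s?")
# Matches a stack frame such as " free_irq+0x31/0x60" or
# " ? kthread_bind+0x30/0x30"; group 1 is set for "?" frames.
FRAME = re.compile(r"^\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Abbreviated excerpt of the oops from this report (illustration only).
SAMPLE = """\
[  113.966604] RIP: 0010:__irq_domain_deactivate_irq+0x26/0x50
[  113.966657] CR2: 0000000000000040 CR3: 000000008c20a006 CR4: 00000000003606e0
[  113.966662] Call Trace:
[  113.966670]  irq_domain_deactivate_irq+0x1a/0x30
[  113.966677]  __free_irq+0x25e/0x2b0
[  113.966683]  free_irq+0x31/0x60
[  113.966712]  device_del+0x136/0x350
[  113.966720]  ? rmi_unregister_function+0x36/0x60 [rmi_core]
[  113.966728]  rmi_unregister_function+0x2e/0x60 [rmi_core]
[  113.966887] Modules linked in: thunderbolt cmac
"""


def summarize_oops(log_text):
    """Return the first RIP symbol, first CR2 value and reliable frames."""
    rip = cr2 = None
    frames = []
    in_trace = False
    for raw in log_text.splitlines():
        line = TS.sub("", raw)
        if line.startswith("RIP:") and rip is None:
            # "RIP: 0010:symbol+off/len" -> keep the part after the segment.
            rip = line.split(":", 2)[2].strip()
        elif line.startswith("CR2:") and cr2 is None:
            cr2 = line.split()[1]
        elif line.startswith("Call Trace:"):
            in_trace = True
        elif in_trace:
            match = FRAME.match(line)
            if match is None:
                in_trace = False  # first non-frame line ends the trace
            elif match.group(1) is None:
                frames.append(match.group(2))  # skip "?" (unreliable) frames
    return {"rip": rip, "cr2": cr2, "frames": frames}


if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

Running this over each saved dmesg file and comparing the resulting dictionaries would show whether the older crashes take the same path through free_irq and rmi_core.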
Comment 1 Niklas Juslin 2019-03-14 14:40:47 UTC
Created attachment 800093
dmesg dump after clean boot and suspend
Comment 2 Nicolas Patricio Saenz Julienne 2019-03-14 17:38:20 UTC
I'm curious to know why is the i2c-device being removed. Would you be able to install the "bcc-tools" package and run the following while you suspend/resume:

sudo /usr/share/bcc/tools/trace -K psmouse_smbus_disconnect

Some kernel stack traces should show up.
Comment 3 Takashi Iwai 2019-03-15 14:58:14 UTC
(In reply to Nicolas Patricio Saenz Julienne from comment #2)
> I'm curious to know why is the i2c-device being removed.

It's from psmouse_smbus_disconnect().

And this sounds like a known problem, according to the comments in psmouse_smbus_schedule_remove():

/*
 * This schedules removal of SMBus companion device. We have to do
 * it in a separate tread to avoid deadlocking on psmouse_mutex in
 * case the device has a trackstick (which is also driven by psmouse).
 *
 * Note that this may be racing with i2c adapter removal, but we
 * can't do anything about that: i2c automatically destroys clients
 * attached to an adapter that is being removed. This has to be
 * fixed in i2c core.
 */
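
The deadlock that the quoted comment describes can be illustrated with a user-space analogy. This is only a sketch in Python, not kernel code: device_mutex stands in for psmouse_mutex, the queue and worker thread stand in for the kernel workqueue, and all function names are made up for illustration.

```python
import queue
import threading


def demo():
    device_mutex = threading.Lock()  # stands in for psmouse_mutex
    work_queue = queue.Queue()       # stands in for the kernel workqueue
    events = []

    def remove_companion():
        # The removal path takes the mutex itself, so it cannot run
        # while the caller of disconnect() still holds it.
        with device_mutex:
            events.append("companion removed")

    def worker():
        # Run queued jobs until the None sentinel arrives.
        while True:
            job = work_queue.get()
            if job is None:
                break
            job()

    def disconnect():
        with device_mutex:
            events.append("disconnect start")
            # Calling remove_companion() directly here would block forever,
            # because threading.Lock is not reentrant. Queue it instead.
            work_queue.put(remove_companion)
            events.append("disconnect end")

    thread = threading.Thread(target=worker)
    thread.start()
    disconnect()
    work_queue.put(None)
    thread.join()
    return events


if __name__ == "__main__":
    print(demo())
```

The worker can only take the mutex after disconnect() has released it, so the removal always completes after the disconnect path. The sketch does not model the race with i2c adapter removal that the comment also mentions.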
Comment 4 Nicolas Patricio Saenz Julienne 2019-03-15 15:20:20 UTC
(In reply to Takashi Iwai from comment #3)
> (In reply to Nicolas Patricio Saenz Julienne from comment #2)
> > I'm curious to know why is the i2c-device being removed.
> 
> It's from psmouse_smbus_disconnect().
> 
> And this sounds like a known problem, according to the comments in
> psmouse_smbus_schedule_remove():
> 
> /*
>  * This schedules removal of SMBus companion device. We have to do
>  * it in a separate tread to avoid deadlocking on psmouse_mutex in
>  * case the device has a trackstick (which is also driven by psmouse).
>  *
>  * Note that this may be racing with i2c adapter removal, but we
>  * can't do anything about that: i2c automatically destroys clients
>  * attached to an adapter that is being removed. This has to be
>  * fixed in i2c core.
>  */

Fair enough, but the removal happens at resume time, which I find kind of strange.
Comment 5 Takashi Iwai 2019-03-15 15:22:07 UTC
IIRC, the driver tries to reconnect the device at resume (not sure whether that always happens or only occasionally, on error, though).
Comment 6 Takashi Iwai 2019-03-15 15:30:49 UTC
... or this might be some hacks for suspend/resume, e.g. writing to sysfs or procfs to forcibly reconnect, as mentioned in the Arch wiki?
Comment 7 Jiri Slaby 2019-04-17 11:07:13 UTC
In any case, this is most likely an upstream bug and should be reported upstream (linux-i2c@vger.kernel.org).