Bug 1072492 - LVM2 metadata and Device-mapper event daemons do not stop properly on shutdown; systemd killing them causes data-integrity issues
Status: VERIFIED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Basesystem
Version: Current
Hardware: x86-64 Other
Priority: P5 - None
Severity: Major
Target Milestone: ---
Assigned To: zhen ren
Depends on:
Blocks:
Reported: 2017-12-12 21:15 UTC by Benjamin Block
Modified: 2022-06-10 23:25 UTC
CC: 2 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Log during shutdown (1013.49 KB, text/plain), 2017-12-12 21:17 UTC, Benjamin Block
Journal since boot (2.58 MB, text/plain), 2017-12-12 21:17 UTC, Benjamin Block
Supportconfig Output (2.27 MB, application/octet-stream), 2017-12-12 21:36 UTC, Benjamin Block
Log during shutdown after lvm2-monitor was enabled (1012.70 KB, text/plain), 2017-12-14 21:15 UTC, Benjamin Block
Journal during shutdown after lvm2-monitor was enabled (2.58 MB, text/plain), 2017-12-14 21:16 UTC, Benjamin Block
Last log messages before system halt (393.93 KB, image/jpeg), 2017-12-28 16:30 UTC, Benjamin Block

Description Benjamin Block 2017-12-12 21:15:48 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0
Build Identifier: 

I am running openSUSE Tumbleweed (VERSION="20171203"). Every time I shut down or reboot my system, it hangs on the LVM2 metadata and Device-mapper event daemons. They sit through systemd's default 1 min 30 s timeout, and then systemd kills them with signal 9. I can reproduce this on two independent systems that use the same hard-disk layout (my guess is that the partition/hard-disk layout is the cause).

This is a schematic of the layout I use on both systems (I created it during installation purely with YaST!):
    /dev/sdd          │596.17 GiB│ WDC WD64-00AAKS-00A7B2│       │        │               │
    /dev/sdd1         │164.00 MiB│ EFI boot              │FAT    │EFI     │/boot/efi      │
    /dev/sdd2         │  2.00 GiB│ Linux native          │Ext2   │boot    │/boot          │
    /dev/sdd3         │594.01 GiB│ Linux LVM             │       │        │               │
    /dev/openSuse     │594.01 GiB│ LVM2 openSuse         │       │        │               │ <== using /dev/sdd3 as sole PV
    /dev/openSuse/pool│593.00 GiB│ LV                    │       │        │               │
    /dev/openSuse/root│128.00 GiB│ LV                    │XFS    │root    │/              │ <== using the pool /dev/openSuse/pool
    /dev/openSuse/swap│ 24.00 GiB│ LV                    │Swap   │swap    │swap           │ <== using the pool /dev/openSuse/pool
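
For reference, a rough CLI equivalent of this layout would be something like the following (an untested sketch; I actually created everything in YaST, so the exact options may differ):

```
# Hypothetical recreation of the layout above on /dev/sdd3
pvcreate /dev/sdd3                                   # sdd3 as the sole PV
vgcreate openSuse /dev/sdd3                          # VG "openSuse"
lvcreate --type thin-pool -L 593G -n pool openSuse   # thin pool "pool"
lvcreate -V 128G --thin -n root openSuse/pool        # thin LV for /
lvcreate -V 24G --thin -n swap openSuse/pool         # thin LV for swap
mkfs.xfs /dev/openSuse/root                          # XFS root
mkswap /dev/openSuse/swap                            # swap
```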

On the system I am writing this on, it has not yet caused big trouble beyond the long shutdown; on the other system I experienced data loss: both the swap volume and the XFS root got corrupted, while the EFI and boot partitions stayed in good shape.

So this appears to be a data-integrity issue, hence the report and the severity.

I captured systemd's debug output during shutdown and will attach it. Beyond that, I am not sure what you need.

Reproducible: Always
Comment 1 Benjamin Block 2017-12-12 21:17:17 UTC
Created attachment 752659 [details]
Log during shutdown
Comment 2 Benjamin Block 2017-12-12 21:17:50 UTC
Created attachment 752660 [details]
Journal since boot
Comment 3 Benjamin Block 2017-12-12 21:36:03 UTC
Created attachment 752663 [details]
Supportconfig Output
Comment 4 Benjamin Block 2017-12-12 23:48:18 UTC
OK, since my second system was broken anyway, I did some small trial-and-error tests to try to narrow this down.

First I changed the basic partition/hard-drive setup so that root and swap are no longer allocated from a thin pool.
    So basically the same setup as above, but the pool was smaller than the whole VG "openSuse", and root plus swap were allocated directly as LVs from the VG, not from the thin pool.
    I still added a thin pool taking up whatever space was left in the VG, and created some additional thin volumes on that now-smaller pool; but I didn't format, mount, or use them anywhere - they just existed.

That didn't change anything. Same effect as before.

Then I deleted the thin pool and all the thin volumes. And then the shutdown started to work as expected.

Everything was done in yast.
Comment 5 Chenzi Cao 2017-12-13 10:19:19 UTC
Hi Eric, would you please take a look at this issue? I'm not sure whether it is right to assign it to you; if you are not the right assignee, please feel free to reassign. Thanks.
Comment 7 zhen ren 2017-12-14 13:54:43 UTC
Comment on attachment 752659 [details]
Log during shutdown

Thanks for reporting. I'm a little busy with other issues and haven't inspected the logs carefully yet. I will try to reproduce this locally soon.

Asking myself: why are these DM devices busy?

```
[  250.780540] systemd-shutdown[1]: Could not detach DM /dev/dm-2: Device or resource busy
[  250.780971] systemd-shutdown[1]: Detaching DM 254:1.
[  250.781394] systemd-shutdown[1]: Could not detach DM /dev/dm-1: Device or resource busy
[  250.781822] systemd-shutdown[1]: Detaching DM 254:0.
[  250.782249] systemd-shutdown[1]: Could not detach DM /dev/dm-0: Device or resource busy
[  250.782679] systemd-shutdown[1]: Not all DM devices detached, 4 left.
[  250.783094] systemd-shutdown[1]: Cannot finalize remaining DM devices, continuing.
```
Comment 8 zhen ren 2017-12-14 14:08:15 UTC
If it's not too much trouble, can you try enabling/starting lvm2-monitor.service to see if it helps? I notice we disable the service by default; it could be irrelevant:

```
# systemctl status lvm2-monitor
● lvm2-monitor.service - Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
   Loaded: loaded (/usr/lib/systemd/system/lvm2-monitor.service; disabled; vendor preset: disabled)
```
Comment 9 zhen ren 2017-12-14 14:53:44 UTC
(In reply to Benjamin Block from comment #1)
> Created attachment 752659 [details]
> Log during shutdown

Hmm, systemd outputs a lot of errors like the following; I don't know what they mean.

@Franck, could you help interpret these error messages?

```
grep -n "Failed to send unit remove signal" shutdown-journal.txt | egrep "dev-mapper-openSuse|dm-event"
5507:Dec 10 17:05:19 austri systemd[1]: dev-mapper-openSuse\x2dpool_tdata.device: Failed to send unit remove signal for dev-mapper-openSuse\x2dpool_tdata.device: Transport endpoint is not connected
5542:Dec 10 17:05:19 austri systemd[1]: dev-mapper-openSuse\x2droot.device: Failed to send unit remove signal for dev-mapper-openSuse\x2droot.device: Transport endpoint is not connected
5552:Dec 10 17:05:19 austri systemd[1]: dev-mapper-openSuse\x2dswap.device: Failed to send unit remove signal for dev-mapper-openSuse\x2dswap.device: Transport endpoint is not connected
5675:Dec 10 17:05:19 austri systemd[1]: dev-mapper-openSuse\x2dpool_tmeta.device: Failed to send unit remove signal for dev-mapper-openSuse\x2dpool_tmeta.device: Transport endpoint is not connected
5678:Dec 10 17:05:19 austri systemd[1]: dev-mapper-openSuse\x2dpool\x2dtpool.device: Failed to send unit remove signal for dev-mapper-openSuse\x2dpool\x2dtpool.device: Transport endpoint is not connected
20107:Dec 10 17:09:20 austri systemd[1]: dev-mapper-openSuse\x2dpool_tmeta.device: Failed to send unit remove signal for dev-mapper-openSuse\x2dpool_tmeta.device: Transport endpoint is not connected
20115:Dec 10 17:09:20 austri systemd[1]: dm-event.service: Failed to send unit remove signal for dm-event.service: Transport endpoint is not connected
20211:Dec 10 17:09:20 austri systemd[1]: dev-mapper-openSuse\x2dpool.device: Failed to send unit remove signal for dev-mapper-openSuse\x2dpool.device: Transport endpoint is not connected
20268:Dec 10 17:09:20 austri systemd[1]: dev-mapper-openSuse\x2dpool_tdata.device: Failed to send unit remove signal for dev-mapper-openSuse\x2dpool_tdata.device: Transport endpoint is not connected
20381:Dec 10 17:09:20 austri systemd[1]: dev-mapper-openSuse\x2dpool\x2dtpool.device: Failed to send unit remove signal for dev-mapper-openSuse\x2dpool\x2dtpool.device: Transport endpoint is not connected
20472:Dec 10 17:09:20 austri systemd[1]: dev-mapper-openSuse\x2dswap.device: Failed to send unit remove signal for dev-mapper-openSuse\x2dswap.device: Transport endpoint is not connected
20479:Dec 10 17:09:20 austri systemd[1]: dm-event.socket: Failed to send unit remove signal for dm-event.socket: Transport endpoint is not connected
20486:Dec 10 17:09:20 austri systemd[1]: dev-mapper-openSuse\x2droot.device: Failed to send unit remove signal for dev-mapper-openSuse\x2droot.device: Transport endpoint is not connected
```
Comment 10 Benjamin Block 2017-12-14 21:14:28 UTC
(In reply to zhen ren from comment #8)
> If it's not hard for you, can you try to enable/start lvm2-monitor.service
> to see if it helps? I notice we disabled the service by default, could be
> irrelevant:
> 
> ```
> # systemctl status lvm2-monitor
> ● lvm2-monitor.service - Monitoring of LVM2 mirrors, snapshots etc. using
> dmeventd or progress polling
>    Loaded: loaded (/usr/lib/systemd/system/lvm2-monitor.service; disabled;
> vendor preset: disabled)
> ```

That did in fact fix the shutdown hang.

(In reply to zhen ren from comment #7)
> Comment on attachment 752659 [details]
> Log during shutdown
> 
> Thanks for reporting. Little busy with other issues, not inspect the logs
> carefully yet. I will try to reproduce locally soon.
> 
> Asking myself: why these DM device are busy? 
> 
> ```
> [  250.780540] systemd-shutdown[1]: Could not detach DM /dev/dm-2: Device or
> resource busy
> [  250.780971] systemd-shutdown[1]: Detaching DM 254:1.
> [  250.781394] systemd-shutdown[1]: Could not detach DM /dev/dm-1: Device or
> resource busy
> [  250.781822] systemd-shutdown[1]: Detaching DM 254:0.
> [  250.782249] systemd-shutdown[1]: Could not detach DM /dev/dm-0: Device or
> resource busy
> [  250.782679] systemd-shutdown[1]: Not all DM devices detached, 4 left.
> [  250.783094] systemd-shutdown[1]: Cannot finalize remaining DM devices,
> continuing.
> ```

That, however, did not go away. I am not sure whether it is relevant though. Before the last reboot I took a snapshot of which devices were what:

```
bblock@austri:~> lsblk
NAME                      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                         8:0    0 232.9G  0 disk 
├─sda1                      8:1    0   450M  0 part 
├─sda2                      8:2    0   100M  0 part 
├─sda3                      8:3    0    16M  0 part 
├─sda4                      8:4    0 212.1G  0 part 
└─sda5                      8:5    0   855M  0 part 
sdb                         8:16   0   2.7T  0 disk 
├─sdb1                      8:17   0   128M  0 part 
└─sdb2                      8:18   0   2.7T  0 part 
sdc                         8:32   0   2.7T  0 disk 
├─sdc1                      8:33   0   128M  0 part 
└─sdc2                      8:34   0   2.7T  0 part 
sdd                         8:48   0 596.2G  0 disk 
├─sdd1                      8:49   0   164M  0 part /boot/efi
├─sdd2                      8:50   0     2G  0 part /boot                                     
└─sdd3                      8:51   0   594G  0 part                                            
  ├─openSuse-pool_tmeta   254:0    0    76M  0 lvm                                               
  │ └─openSuse-pool-tpool 254:2    0   593G  0 lvm                                                
  │   ├─openSuse-swap     254:3    0    24G  0 lvm  [SWAP]                                           
  │   ├─openSuse-root     254:4    0   128G  0 lvm  /                                                       
  │   └─openSuse-pool     254:5    0   593G  0 lvm                                                          
  └─openSuse-pool_tdata   254:1    0   593G  0 lvm                                                                                                                                                                                                                                
    └─openSuse-pool-tpool 254:2    0   593G  0 lvm                                                                                                                                                                                                                                
      ├─openSuse-swap     254:3    0    24G  0 lvm  [SWAP]                                                                                                                                                                                                                        
      ├─openSuse-root     254:4    0   128G  0 lvm  /                                                                                                                                                                                                                             
      └─openSuse-pool     254:5    0   593G  0 lvm                                                                                                                                                                                                                                
sr0                        11:0    1  1024M  0 rom                                                                                                                                                                                                                                
bblock@austri:~> ll /dev/mapper/                                                                                                                                                                                                                                                  
total 0                                                                                                                                                                                                                                                                           
crw------- 1 root root 10, 236 Dec 14 22:00 control                                                                                                                                                                                                                               
lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-pool -> ../dm-5                                                                                                                                                                                                              
lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-pool_tdata -> ../dm-1                                                                                                                                                                                                        
lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-pool_tmeta -> ../dm-0                                                                                                                                                                                                        
lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-pool-tpool -> ../dm-2                                                                                                                                                                                                        
lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-root -> ../dm-4                                                                                                                                                                                                              
lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-swap -> ../dm-3                                                                                                                                                                                                              
```

I will attach the shutdown log and journal again for the reboot after I took this data, in case you want to look into it.
Comment 11 Benjamin Block 2017-12-14 21:15:34 UTC
Created attachment 753146 [details]
Log during shutdown after lvm2-monitor was enabled
Comment 12 Benjamin Block 2017-12-14 21:16:03 UTC
Created attachment 753147 [details]
Journal during shutdown after lvm2-monitor was enabled
Comment 13 Franck Bui 2017-12-15 10:56:34 UTC
(In reply to Benjamin Block from comment #10)     
> 
> lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-pool_tdata -> ../dm-1  
> lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-pool_tmeta -> ../dm-0  
> lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-pool-tpool -> ../dm-2  
> 

I don't know what they are for, but apparently they're still in use, and at this point that is probably not expected.
Comment 14 zhen ren 2017-12-15 12:36:35 UTC
(In reply to Franck Bui from comment #13)
> (In reply to Benjamin Block from comment #10)     
> > 
> > lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-pool_tdata -> ../dm-1  
> > lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-pool_tmeta -> ../dm-0  
> > lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-pool-tpool -> ../dm-2  
> > 
> 
> Don't know what they are for but apparently they're still in used and at
> this point this is probably not expected.

Hi Bui, yes, but I think the error messages in comment#9 are more suspicious; could you take a look? Thanks.
Comment 15 Franck Bui 2017-12-15 14:36:59 UTC
(In reply to zhen ren from comment #9)
> 
> @Franck, could you help interpret this error message?
> 
> ```
> grep -n "Failed to send unit remove signal" shutdown-journal.txt | egrep
> "dev-mapper-openSuse|dm-event"
> 5507:Dec 10 17:05:19 austri systemd[1]:
> dev-mapper-openSuse\x2dpool_tdata.device: Failed to send unit remove signal
> for dev-mapper-openSuse\x2dpool_tdata.device: Transport endpoint is not
> connected
> 5542:Dec 10 17:05:19 austri systemd[1]: dev-mapper-openSuse\x2droot.device:
> Failed to send unit remove signal for dev-mapper-openSuse\x2droot.device:

Those *debug* messages only indicate that systemd failed to emit signals on the D-Bus bus, probably because dbus has already been stopped.

They happen every time the system is shut down, so there is nothing suspicious here.
Comment 16 Benjamin Block 2017-12-16 17:34:49 UTC
(In reply to zhen ren from comment #14)
> (In reply to Franck Bui from comment #13)
> > (In reply to Benjamin Block from comment #10)     
> > > 
> > > lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-pool_tdata -> ../dm-1  
> > > lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-pool_tmeta -> ../dm-0  
> > > lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-pool-tpool -> ../dm-2  
> > > 
> > 
> > Don't know what they are for but apparently they're still in used and at
> > this point this is probably not expected.
> 
> Hi Bui, yes, but I see the error message in comment#9 is more suspicious,
> could you take a look? thanks.

Any idea how I could find out what is holding open references to them? I have a script that collects the logs I attach here shortly before shutdown; I could run something there and write out information about what is using the devices.
Comment 17 zhen ren 2017-12-18 02:32:35 UTC
> Those *debug* messages only indicates that systemd failed to emit signals on
> D-Bus bus probably because dbus has been stopped.
> 
> These happen every times the system is shutdown so nothing suspicious here.


Hi Franck, thanks for the confirmation. So the reason there are so many such messages in the log is probably that systemd's verbose logging was enabled.
Comment 18 zhen ren 2017-12-18 03:08:26 UTC
(In reply to Benjamin Block from comment #16)
> (In reply to zhen ren from comment #14)
> > (In reply to Franck Bui from comment #13)
> > > (In reply to Benjamin Block from comment #10)     
> > > > 
> > > > lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-pool_tdata -> ../dm-1  
> > > > lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-pool_tmeta -> ../dm-0  
> > > > lrwxrwxrwx 1 root root       7 Dec 14 22:00 openSuse-pool-tpool -> ../dm-2  
> > > > 
> > > 
> > > Don't know what they are for but apparently they're still in used and at
> > > this point this is probably not expected.
> > 
> > Hi Bui, yes, but I see the error message in comment#9 is more suspicious,
> > could you take a look? thanks.
> 
> Any idea how I could find out what is having references on them open? I have
> a script running collecting the logs I attach here short before shutdown. I
> could run something and write our information about the use.


Usually, you can use `lsof /dev/dm-1` to list who is using the device, but it fails in my testing:

ws:~ # lsof /dev/dm-0
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
      Output information may be incomplete.

I think you can try using the mountpoint of the dm-* device instead, like:

#lsof /
...
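
Alternatively, dmsetup and fuser can show who holds a DM device open; a quick sketch (device names as in your listing):

```
# dmsetup reports the kernel's open count for every DM device
dmsetup info -c

# fuser lists the processes using the filesystem on a given device
fuser -vm /dev/mapper/openSuse-root
```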
Comment 19 zhen ren 2017-12-18 03:28:04 UTC
There was an upstream issue report, with fixes, that is very similar to your issue:

- issue:

https://www.redhat.com/archives/lvm-devel/2017-October/msg00000.html

- fixes:

https://www.redhat.com/archives/lvm-devel/2017-October/msg00004.html
https://www.redhat.com/archives/lvm-devel/2017-October/msg00006.html

These fixes are already in lvm2 2.02.175, and I see you're using a more recent version:

$ grep lvm2 rpm.txt
lvm2                                openSUSE Tumbleweed                 2.02.176-1.1

Hmm, could you confirm this by catting the content of dm-event.service?

"""
tw1:~ # cat /usr/lib/systemd/system/dm-event.service
[Unit]
Description=Device-mapper event daemon
Documentation=man:dmeventd(8)
Requires=dm-event.socket
After=dm-event.socket
Before=local-fs-pre.target shutdown.target
Conflicts=shutdown.target
DefaultDependencies=no
"""

BTW, when did you notice this issue? Since the installation of your system, or after a certain upgrade?
Comment 20 zhen ren 2017-12-18 03:36:04 UTC
Anyway, is it possible to upgrade to the latest release?

tw1:~ # cat /etc/issue
Welcome to openSUSE Tumbleweed 20171213 - Kernel
Comment 21 zhen ren 2017-12-18 07:47:42 UTC
I can reproduce this issue locally now:

eric-pc:~ # cat /etc/issue
Welcome to openSUSE Tumbleweed 20171215  - Kernel 

eric-pc:~ # journalctl --since "2017-12-18 15:19:10"
...
Dec 18 15:19:10 eric-pc systemd[1]: Reached target Unmount All Filesystems.
Dec 18 15:19:10 eric-pc systemd[1]: Stopped target Local File Systems (Pre).
Dec 18 15:19:10 eric-pc systemd[1]: Stopped Create Static Device Nodes in /dev.
Dec 18 15:19:10 eric-pc systemd[1]: Stopped Create System Users.
Dec 18 15:19:10 eric-pc systemd[1]: Stopped Remount Root and Kernel File Systems.
Dec 18 15:19:10 eric-pc systemd[1]: Stopping Device-mapper event daemon...
Dec 18 15:19:10 eric-pc lvm[15030]: dmeventd received break, scheduling exit.
Dec 18 15:19:10 eric-pc systemd[1]: Received SIGRTMIN+20 from PID 931 (plymouthd).
Dec 18 15:20:29 eric-pc systemd[1]: lvm2-lvmetad.service: State 'stop-sigterm' timed out. Killing.
Dec 18 15:20:29 eric-pc systemd[1]: lvm2-lvmetad.service: Killing process 600 (lvmetad) with signal SIGKILL.
Dec 18 15:20:29 eric-pc systemd[1]: lvm2-lvmetad.service: Main process exited, code=killed, status=9/KILL
Dec 18 15:20:29 eric-pc systemd[1]: Stopped LVM2 metadata daemon.
Dec 18 15:20:29 eric-pc systemd[1]: lvm2-lvmetad.service: Unit entered failed state.
Dec 18 15:20:29 eric-pc systemd[1]: lvm2-lvmetad.service: Failed with result 'timeout'.
Dec 18 15:20:40 eric-pc systemd[1]: dm-event.service: State 'stop-sigterm' timed out. Killing.
Dec 18 15:20:40 eric-pc systemd[1]: dm-event.service: Killing process 15030 (dmeventd) with signal SIGKILL.
Dec 18 15:20:40 eric-pc systemd[1]: dm-event.service: Main process exited, code=killed, status=9/KILL
Dec 18 15:20:40 eric-pc systemd[1]: Stopped Device-mapper event daemon.
Dec 18 15:20:40 eric-pc systemd[1]: dm-event.service: Unit entered failed state.
Dec 18 15:20:40 eric-pc systemd[1]: dm-event.service: Failed with result 'timeout'.
Dec 18 15:20:40 eric-pc systemd[1]: Reached target Shutdown.
Dec 18 15:20:40 eric-pc systemd[1]: Starting /etc/init.d/halt.local Compatibility...
Dec 18 15:20:40 eric-pc systemd[1]: Started /etc/init.d/halt.local Compatibility.
Dec 18 15:20:40 eric-pc systemd[1]: Reached target Final Step.
Dec 18 15:20:40 eric-pc systemd[1]: Starting Power-Off...
Comment 22 zhen ren 2017-12-18 11:35:19 UTC
Hi Benjamin,

According to my testing, the following commands solve my problem:

systemctl enable lvm2-monitor.service
systemctl start lvm2-monitor.service

Could you please update to the latest release and give it a try?
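
To verify that the change took effect, the standard systemctl queries should suffice; a sketch:

```
# Both should report positively after the enable/start above
systemctl is-enabled lvm2-monitor.service   # expected: enabled
systemctl is-active lvm2-monitor.service    # expected: active
```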
Comment 23 zhen ren 2017-12-19 08:56:51 UTC
Hi Franck,

According to comment#22, could you help me add lvm2-monitor.service to the systemd-preset enablement list? Somehow, I can find the project on OBS this time:

"""
$ obs se systemd-preset-branding-openSUSE
No matches found for 'systemd-preset-branding-openSUSE' in projects
No matches found for 'systemd-preset-branding-openSUSE' in packages
"""
Comment 24 zhen ren 2017-12-19 08:57:37 UTC
(In reply to zhen ren from comment #23)
> Hi Franck,
> 
> According to comment#22, could you help me to add the lvm2-monitor.service
> in the systemd-preset enablement list? Somehow, I can find the project on
> OBS this time:

Oops, s/can/can't/

> 
> """
> $ obs se systemd-preset-branding-openSUSE
> No matches found for 'systemd-preset-branding-openSUSE' in projects
> No matches found for 'systemd-preset-branding-openSUSE' in packages
> """
Comment 25 Franck Bui 2017-12-19 09:06:30 UTC
(In reply to zhen ren from comment #23)
> 
> According to comment#22, could you help me to add the lvm2-monitor.service
> in the systemd-preset enablement list? Somehow, I can find the project on
> OBS this time:
> 
> """
> $ obs se systemd-preset-branding-openSUSE
> No matches found for 'systemd-preset-branding-openSUSE' in projects
> No matches found for 'systemd-preset-branding-openSUSE' in packages
> """

It seems to work here:

$ osc se systemd-presets-branding-openSUSE
No matches found for 'systemd-presets-branding-openSUSE' in projects
####################################################################
matches for 'systemd-presets-branding-openSUSE' in packages:

# Project                                         # Package
Base:System                                       systemd-presets-branding-openSUSE
...


As to why lvm2-monitor.service is not enabled, it's something for Martin I would say.

Martin, any idea why lvm2-monitor.service is not enabled?

It seems to cause issues during shutdown if this service is not running.
Comment 26 zhen ren 2017-12-19 09:59:14 UTC
Thanks Franck,

> # Project                                         # Package
> Base:System                                      
> systemd-presets-branding-openSUSE

Aha, I missed an 's' in 'presets'.

> ...
> 
> 
> As to why lvm2-monitor.service is not enabled, it's something for Martin I
> would say.
> 
> Martin any idea why lvm2-monitor.service is not enabled ?
> 
> It seems to cause issue during shutdown if this service is not running.

Haha, Martin is innocent. I think it's my fault. Last year I received a similar bug where shutdown misbehaved; it turned out that dm-event.socket was disabled by default. The problem was fixed by adding dm-event.socket to the preset enablement list, but I didn't check any further to see whether more services needed to be enabled.

I will submit a request for the systemd-presets package later on.
Comment 27 zhen ren 2017-12-19 10:06:19 UTC
Hmm, I'm confused: default-openSUSE.preset only enables a few services:

"""
$ cat default-openSUSE.preset 
enable cups.path
enable cups.service
enable cups.socket
enable epmd.socket
enable restorecond.service
"""

while default-SUSE.preset enables many services. So what should I do?
Comment 28 zhen ren 2017-12-19 10:21:05 UTC
 
> while default-SUSE.preset has many enabled services. So, how should I do?

I checked the history of my last change and did the same thing again:

https://build.opensuse.org/request/show/558339

I checked that all services enabled in default-SUSE.preset will also be enabled on openSUSE.
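
For context, the submitted change amounts to adding a line like the following to the openSUSE preset file (illustrative excerpt; the exact contents are in the request linked above):

```
# /usr/lib/systemd/system-preset/default-openSUSE.preset (illustrative)
enable dm-event.socket
enable lvm2-monitor.service
```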
Comment 29 Martin Wilck 2017-12-22 17:06:58 UTC
(In reply to zhen ren from comment #22)
> According to my testing, the following commands solve my problem:
> 
> systemctl enable lvm2-monitor.service
> systemctl start lvm2-monitor.service
> 
> could you please update to the latest release and have a try?

Any idea *why* this is so? Enabling lvm2-monitor.service by default is nothing more than a workaround until we understand why it changes the shutdown behavior.
Comment 30 Benjamin Block 2017-12-22 22:39:36 UTC
(In reply to zhen ren from comment #22)
> Hi Benjamin,
> 
> According to my testing, the following commands solve my problem:
> 
> systemctl enable lvm2-monitor.service
> systemctl start lvm2-monitor.service
> 
> could you please update to the latest release and have a try?

Sorry, I don't have access to the system right now. I can retest next week.
Comment 31 zhen ren 2017-12-25 06:46:16 UTC
(In reply to Martin Wilck from comment #29)
> (In reply to zhen ren from comment #22)
> > According to my testing, the following commands solve my problem:
> > 
> > systemctl enable lvm2-monitor.service
> > systemctl start lvm2-monitor.service
> > 
> > could you please update to the latest release and have a try?
> 
> Any idea *why* this is so? enabling lvm2-monitor.service by default is
> nothing more than a workaround until we understand why it changes the
> shutdown behavior.

Hi Martin,

By default, when a thin/snapshot/raid LV is created, dm-event.service starts the dmeventd daemon for monitoring, which creates a monitoring thread for the LV:

"""
lvm2/daemons/dmeventd/dmeventd.c:

1570 static int _handle_request(struct dm_event_daemon_message *msg,
1571                           struct message_data *message_data)
1572 {
1573         switch (msg->cmd) {
1574         case DM_EVENT_CMD_REGISTER_FOR_EVENT:
1575                 if (!message_data->events_field)
1576                         return -EINVAL;
1577                 return _register_for_event(message_data);
"""

When dm-event.service is about to be stopped, dmeventd cleans up the monitoring threads before it can exit:

"""
lvm2/daemons/dmeventd/dmeventd.c:
2147 int main(int argc, char *argv[])
2148 {
...
2251         for (;;) {
...
2272                 } else if (_exit_now == DM_SIGNALED_EXIT) {
2273                         _exit_now = DM_SCHEDULED_EXIT;
2274                         /*
2275                          * When '_exit_now' is set, signal has been received,
2276                          * but can not simply exit unless all
2277                          * threads are done processing.
2278                          */
2279                         log_info("dmeventd received break, scheduling exit.");
2280                 }
2281                 _process_request(&fifos);
2282                 _cleanup_unused_threads();
2283         }
"""

And lvm2-monitor.service is what stops the monitoring:

"""
In /usr/lib/systemd/system/lvm2-monitor.service:
...
 13 ExecStart=/sbin/lvm vgchange --monitor y --ignoreskippedcluster
 14 ExecStop=/sbin/lvm vgchange --monitor n --ignoreskippedcluster
"""

So lvm2-monitor.service must be stopped before dm-event.service; otherwise stopping dm-event.service hangs, because dmeventd cannot exit while monitoring threads are still registered.

When lvm2-monitor.service is disabled, nothing ever stops the monitoring of those snapshot/thin/raid/mirror LVs.
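
A minimal way to inspect and stop the monitoring by hand (standard lvm2 commands; the seg_monitor report field shows the per-segment monitoring state):

```
# List LVs with their segment type and monitoring state
lvs -o lv_name,segtype,seg_monitor

# Stop monitoring for all VGs; this is essentially what the
# ExecStop of lvm2-monitor.service runs, and it lets dmeventd
# drop its per-LV threads and exit cleanly when stopped
lvm vgchange --monitor n
```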
Comment 32 Benjamin Block 2017-12-28 16:27:00 UTC
(In reply to zhen ren from comment #22)
> Hi Benjamin,
> 
> According to my testing, the following commands solve my problem:
> 
> systemctl enable lvm2-monitor.service
> systemctl start lvm2-monitor.service
> 
> could you please update to the latest release and have a try?

I am back home and retested after installing updates. It seems to work fine now.
Comment 33 Benjamin Block 2017-12-28 16:30:30 UTC
Created attachment 754355 [details]
Last log messages before system halt

As for the open device-mapper devices that I saw in the logs: I let the system halt before power-off and took a picture of the state (no better way on x86 without a serial console, sorry!). It shows that after systemd hands off to the initrd, dracut disassembles the device-mapper devices, and it doesn't print any more errors. That looks fine to me.

Opinions?
Comment 34 zhen ren 2017-12-29 08:13:56 UTC
Comment on attachment 754355 [details]
Last log messages before system halt

The following message during shutdown:

"
Could not detach DM /dev/dm-2: device or resource busy
"

looks normal to me; it just reflects the state at that point. Note that the original issue in this bug is about dmeventd monitoring of special DM devices, which applies only to special devices like snapshot/thin/raid (refer to man dmeventd), while dm-[0-2] are normal DM devices ("linear" target).
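
To check which dm-* devices carry such special targets, dmsetup can print each device's table; the target type appears after the sector range:

```
# Target type per DM device: "linear" vs "thin-pool"/"thin"/"snapshot"/...
dmsetup table

# Compact per-device status, including the kernel's open count
dmsetup info -c
```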
Comment 35 zhen ren 2017-12-29 08:14:49 UTC
Thanks all, closing as per comment#32.
Comment 36 Benjamin Block 2018-01-02 17:52:40 UTC
OK, so I'll mark it resolved then. If I find problems with the in-use device-mapper devices, I'll file a new issue.
Comment 37 Martin Wilck 2018-01-10 17:22:48 UTC
(In reply to zhen ren from comment #31)
> When lvm2-monitor.service is disabled, nobody to stop the monitoring of
> those snapshot/thin/raid/mirror LV.

Alright. But:

 # systemctl status lvm2-monitor.service
 lvm2-monitor.service - Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
   Loaded: loaded (/usr/lib/systemd/system/lvm2-monitor.service; disabled; vendor preset: disabled)

"vendor preset" should be "enabled" if this service is so vital. It can't hurt much AFAICS. This would be the bug, then.
Comment 38 zhen ren 2018-01-11 05:50:26 UTC
Hi Martin,

> "vendor preset" should be "enabled" if this service is so vital. It can't
> hurt much AFAICS. This would be the bug, then.

Yes, it's already done in this PR:

https://build.opensuse.org/request/show/560488

But I'm not sure why the OBS notification hasn't been posted to this bug.