Bug 906689 - xendomains fails to auto start domUs
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Distribution
Component: Xen
Version: 13.2
Hardware: x86-64 Other
Priority: P5 - None
Severity: Normal
Target Milestone: ---
Assigned To: Olaf Hering
QA Contact: E-mail List
Reported: 2014-11-22 19:34 UTC by Romain Pelissier
Modified: 2015-09-09 07:42 UTC
CC List: 5 users

Attachments
save and restore state operation on a vm (45.78 KB, text/plain)
2014-11-24 16:38 UTC, Romain Pelissier

Description Romain Pelissier 2014-11-22 19:34:49 UTC
I have a Xen server and several domUs (all HVM). Starting the VMs with the xl command works fine, and every config has a symbolic link in /etc/xen/auto.
But xendomains fails to restore the saved state of any of the domUs and doesn't give much information about the reason:

Nov 22 13:54:23 pdrvsrv001 xendomains[1445]: napier
Nov 22 13:54:23 pdrvsrv001 xendomains[1445]: An error occurred while restoring domain napier:
Nov 22 13:54:23 pdrvsrv001 xendomains[1445]: Loading new save file /var/lib/xen/save/napier (new xl fmt info 0x0/0x0/660)
Nov 22 13:54:23 pdrvsrv001 xendomains[1445]: Savefile contains xl domain config
Nov 22 13:54:23 pdrvsrv001 xendomains[1445]: WARNING: you seem to be using "kernel" directive to override HVM guest firmware. Ignore that. Use "firmware_override" instead if you really want a
Nov 22 13:54:23 pdrvsrv001 xendomains[1445]: WARNING: ignoring device_model directive.
Nov 22 13:54:23 pdrvsrv001 xendomains[1445]: WARNING: Use "device_model_override" instead if you really want a non-default device_model
Nov 22 13:54:23 pdrvsrv001 xendomains[1445]: xc: error: 0-length read: Internal error
Nov 22 13:54:23 pdrvsrv001 xendomains[1445]: xc: error: rdexact failed (read rc: 0, errno: 0): Internal error
Nov 22 13:54:23 pdrvsrv001 xendomains[1445]: xc: error: read: p2m_size (0 = Success): Internal error
Nov 22 13:54:23 pdrvsrv001 xendomains[1445]: libxl: error: libxl_create.c:959:libxl__xc_domain_restore_done: restoring domain: Resource temporarily unavailable
Nov 22 13:54:23 pdrvsrv001 xendomains[1445]: libxl: error: libxl_create.c:1041:domcreate_rebuild_done: cannot (re-)build domain: -3
Nov 22 13:54:23 pdrvsrv001 xendomains[1445]: libxl: error: libxl.c:1405:libxl__destroy_domid: non-existant domain 4
Nov 22 13:54:23 pdrvsrv001 xendomains[1445]: libxl: error: libxl.c:1369:domain_destroy_callback: unable to destroy guest with domid 4


Note that in 13.1, autostart works fine for all the domUs.
Comment 1 Romain Pelissier 2014-11-24 16:36:48 UTC
I have tested a save and restore operation on a running VM:

[root|pdrvsrv001:/var/tmp]xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  8882     4     r-----    2984.5
descartes                                    8  1000     1     -b----    1854.4
napier                                       9  1000     1     -b----     233.6
lagrange                                    10  1000     1     -b----     481.9
pdrapp001                                   11  1000     1     -b----    1179.8
pdrdb001                                    12   768     1     -b----     129.8
pdrfs001                                    13   768     1     -b----     113.6
galilee                                     15  1000     1     -b----       8.0

[root|pdrvsrv001:/var/tmp]xl save 15 /var/tmp/galilee
Saving to /var/tmp/galilee new xl format (info 0x0/0x0/735)
xc: Saving memory: iter 0 (last sent 0 skipped 0): 1044481/1044481  100%
[root|pdrvsrv001:/var/tmp]xl restore /var/tmp/galilee
Loading new save file /var/tmp/galilee (new xl fmt info 0x0/0x0/735)
 Savefile contains xl domain config
Parsing config from <saved>
WARNING: you seem to be using "kernel" directive to override HVM guest firmware. Ignore that. Use "firmware_override" instead if you really want a non-default firmware
WARNING: ignoring device_model directive.
WARNING: Use "device_model_override" instead if you really want a non-default device_model

and it works perfectly fine. I have attached the log of these commands, run with the -v option.
Comment 2 Romain Pelissier 2014-11-24 16:38:12 UTC
Created attachment 614759
save and restore state operation on a vm
Comment 3 Olaf Hering 2014-11-26 16:47:22 UTC
Is the domain properly saved? To me it looks like only the config header is stored and the memory dump is missing. It looks like systemd just kills the xl process.
Comment 4 Olaf Hering 2014-11-26 17:01:39 UTC
While I have seen the "xc: error: 0-length read: Internal error" with SLE12, the xendomains script appears to work there.

Please check what is in /var/lib/xen/save: is it just a small text file?
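A minimal sketch of the check requested above, assuming a POSIX shell with standard `wc` and `tr`. The function name `check_save_dir` and its default 1 MiB threshold are hypothetical choices: the threshold is only meant to separate a config-only stub (a few hundred bytes of text) from a real memory image, which for the 768 to 1000 MB guests listed in Comment 1 should be far larger.

```shell
# Minimal sketch: report save images in a directory that are too small
# to contain a guest memory dump.
# check_save_dir and the default 1 MiB threshold are hypothetical choices.
check_save_dir() {
    dir=${1:-/var/lib/xen/save}
    min_bytes=${2:-1048576}
    for f in "$dir"/*; do
        # Skip the unexpanded glob (empty directory) and non-regular files.
        [ -f "$f" ] || continue
        # tr strips the padding some wc implementations emit.
        size=$(wc -c < "$f" | tr -d ' ')
        if [ "$size" -lt "$min_bytes" ]; then
            echo "SUSPECT $f ($size bytes): likely config only, no memory image"
        else
            echo "OK      $f ($size bytes)"
        fi
    done
}
```

For example, `check_save_dir /var/lib/xen/save` prints one line per file; a `SUSPECT` line matches the symptom described in Comment 3, where only the config header was written.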
Comment 5 James Fehlig 2014-11-27 10:00:57 UTC
This sounds similar to a bug we had with libvirt-guests and qemu/kvm, where the qemu process running the VM was killed off before libvirt-guests could save the VM memory.

https://bugzilla.redhat.com/show_bug.cgi?id=1031696

Xen is quite different in this regard, but perhaps something similar is happening.
Comment 6 Romain Pelissier 2014-12-09 17:54:54 UTC
(In reply to Olaf Hering from comment #4)
> While I have seen the "xc: error: 0-length read: Internal error" with SLE12
> the xendomains script appears to work there.
> 
> Please check what is in /var/lib/xen/save, is it just a small text file?

Hi,
I only see text files; in fact they are the same as the domU definitions in /etc/xen/vm.

Thanks
Romain
Comment 7 Olaf Hering 2015-01-13 16:43:58 UTC
If this is still reproducible, please adjust
/usr/lib/systemd/system/xendomains.service. It has an After= line, but no Requires=. I think it should look like this:

After=xencommons.service network-online.target
Requires=xencommons.service network-online.target
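As an alternative to editing the packaged unit in place, which a later xen package update would overwrite, the same two lines can be applied as a systemd drop-in. This is a minimal sketch assuming a POSIX shell; the function name `write_xendomains_dropin` and the drop-in file name `xencommons-deps.conf` are hypothetical, while the `xendomains.service.d/` directory convention and the `[Unit]` keys are standard systemd.

```shell
# Minimal sketch: add the dependencies from Comment 7 as a drop-in
# rather than editing /usr/lib/systemd/system/xendomains.service.
# write_xendomains_dropin and xencommons-deps.conf are hypothetical names.
write_xendomains_dropin() {
    unit_dir=${1:-/etc/systemd/system}
    dropin_dir="$unit_dir/xendomains.service.d"
    mkdir -p "$dropin_dir" || return 1
    # Dependency entries in a drop-in's [Unit] section extend the packaged unit.
    cat > "$dropin_dir/xencommons-deps.conf" <<'EOF'
[Unit]
After=xencommons.service network-online.target
Requires=xencommons.service network-online.target
EOF
}
```

Run it as root with no argument to write under /etc/systemd/system, then reload and confirm with `systemctl daemon-reload` followed by `systemctl show -p Requires -p After xendomains.service`. Dependency settings such as After= and Requires= in a drop-in are added to those of the packaged unit rather than replacing them.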
Comment 8 Swamp Workflow Management 2015-06-11 15:05:38 UTC
SUSE-SU-2015:1042-1: An update that solves 7 vulnerabilities and has one errata is now available.

Category: security (important)
Bug References: 906689,931625,931626,931627,931628,932770,932790,932996
CVE References: CVE-2015-3209,CVE-2015-4103,CVE-2015-4104,CVE-2015-4105,CVE-2015-4106,CVE-2015-4163,CVE-2015-4164
Sources used:
SUSE Linux Enterprise Software Development Kit 12 (src):    xen-4.4.2_06-21.1
SUSE Linux Enterprise Server 12 (src):    xen-4.4.2_06-21.1
SUSE Linux Enterprise Desktop 12 (src):    xen-4.4.2_06-21.1
Comment 9 Romain Pelissier 2015-06-11 15:12:10 UTC
(In reply to Olaf Hering from comment #7)
> If this is still reproducible, please adjust
> /usr/lib/systemd/system/xendomains.service. It has an After= line, but no
> Requires=. I think it should look like this:
> 
> After=xencommons.service network-online.target
> Requires=xencommons.service network-online.target

Hi,
I have made the suggested change and will test it soon.
Romain
Comment 10 Olaf Hering 2015-06-19 14:43:12 UTC
I assume this is fixed.
Comment 11 Swamp Workflow Management 2015-06-22 10:06:27 UTC
openSUSE-SU-2015:1092-1: An update that solves 17 vulnerabilities and has 10 fixes is now available.

Category: security (important)
Bug References: 861318,882089,895528,901488,903680,906689,910254,912011,918995,918998,919098,919464,919663,921842,922705,922706,922709,923758,927967,929339,931625,931626,931627,931628,932770,932790,932996
CVE References: CVE-2014-3615,CVE-2015-2044,CVE-2015-2045,CVE-2015-2151,CVE-2015-2152,CVE-2015-2751,CVE-2015-2752,CVE-2015-2756,CVE-2015-3209,CVE-2015-3340,CVE-2015-3456,CVE-2015-4103,CVE-2015-4104,CVE-2015-4105,CVE-2015-4106,CVE-2015-4163,CVE-2015-4164
Sources used:
openSUSE 13.2 (src):    xen-4.4.2_06-23.1
Comment 12 Olaf Hering 2015-09-09 07:42:59 UTC
Assuming this is fixed.