Bug 1072000 - qemu does not honor migrate_set_downtime over qmp
Status: RESOLVED WORKSFORME
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: KVM
Version: Current
Hardware: Other
OS: Other
Priority: P5 - None
Severity: Normal
Target Milestone: ---
Assigned To: E-mail List
QA Contact: E-mail List
URL: https://travis-ci.org/foursixnine/os-...
Depends on:
Blocks: 1072008
Reported: 2017-12-08 19:01 UTC by Santiago Zarate
Modified: 2020-11-30 14:32 UTC
CC List: 4 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Run with unit test: prove --verbose t/99-full-stack.t of os-autoinst (14.25 KB, text/plain)
2017-12-08 19:16 UTC, Santiago Zarate

Description Santiago Zarate 2017-12-08 19:01:58 UTC
While trying to execute a migration of a machine via QMP, I'm sending the following commands via virsh:


> qemu-monitor-command --pretty openQA-staging '{ "execute": "migrate_set_downtime", "arguments": { "value": 0.1 } }'
> qemu-monitor-command --pretty openQA-staging '{ "execute": "migrate", "arguments": { "uri": "exec:gzip -c > /tmp/vm.dump.gz" } }'  
> qemu-monitor-command --pretty openQA-staging '{ "execute": "query-migrate" }'

However, querying the migration shows the following:

{
  "return": {
    "expected-downtime": 100,
    "status": "active",
    "setup-time": 22,
    "total-time": 2030,
    "ram": {
      "total": 2282569728,
      "postcopy-requests": 0,
      "dirty-sync-count": 2,
      "page-size": 4096,
      "remaining": 2282131456,
      "mbps": 0.32976,
      "transferred": 2769990,
      "duplicate": 272,
      "dirty-pages-rate": 0,
      "skipped": 0,
      "normal-bytes": 2756608,
      "normal": 673
    }
  },
  "id": "libvirt-49"
Comment 1 Santiago Zarate 2017-12-08 19:16:07 UTC
Created attachment 752176 [details]
Run with unit test: prove --verbose t/99-full-stack.t of os-autoinst

A reproducer can be found at: https://git.io/vb4en
Comment 2 Santiago Zarate 2017-12-08 19:17:53 UTC
The actual problem is that qemu is not honoring the configured downtime; therefore, if the migration takes e.g. 1 hour, the user would need to wait that long to know the result of the migration / memory migration / dump.
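
For completeness, two standard QMP commands can bound how long one waits, as a minimal sketch (the 1 GiB/s bandwidth value is only an example, and whether bandwidth is the bottleneck here is an assumption):

# raise the migration bandwidth cap (value is in bytes per second)
> qemu-monitor-command --pretty openQA-staging '{ "execute": "migrate_set_speed", "arguments": { "value": 1073741824 } }'
# abort a migration that is taking too long
> qemu-monitor-command --pretty openQA-staging '{ "execute": "migrate_cancel" }'
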
Comment 3 Santiago Zarate 2017-12-08 19:51:30 UTC
On Travis the test ran for 10 minutes and Travis killed the job (for bsc#1072008, which is a very similar case; without the migration speed setting it only takes seconds to execute).
Comment 4 Bruce Rogers 2017-12-11 18:35:01 UTC
(In reply to Santiago Zarate from comment #2)
> The actual problem is that qemu is not honoring the set downtime, therefore
> if the migration takes e.g 1H, the user would need to wait for such a long
> time to know the result of the migration/memory migration/dump

Sorry, but I'm not following what the problem is here. You talk about a migration taking 1H and the user needing to wait that long. Are you confusing the purpose of set downtime? The downtime specifies the maximum time the VM is allowed to be stopped during the critical phase of migration, when it is not running. This is not the same as the total migration time by any means. The VM keeps running while memory pages are sent to the destination. If the guest isn't dirtying pages faster than they are copied to the destination, we eventually reach a point where a final copy of the remaining memory pages is done with the VM stopped. If that final copy can take place within the downtime constraint, all is well and the migration succeeds. If not, the migration is aborted and reported as failed.

In the example you provide I see the dirty-pages-rate is zero, so there should be no issue with the migration completing in a reasonable time period.

If this didn't help, please let me know what I'm not getting about your bug report. Thanks.
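
To observe what the downtime limit does and does not control, a small polling loop over query-migrate can help; a minimal sketch, assuming virsh and the openQA-staging domain from the report:

# poll query-migrate every 5 seconds and print the status and expected downtime
# (stop with Ctrl-C once the status reaches "completed" or "failed")
while true; do
  virsh qemu-monitor-command --pretty openQA-staging '{ "execute": "query-migrate" }' \
    | grep -E '"(status|expected-downtime)"'
  sleep 5
done
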
Comment 5 Antoine Ginies 2020-11-30 14:32:54 UTC
This is an old one, and it is normal behaviour.