Bugzilla – Bug 1064854
cloud-init ignores network settings from config-drive in external network
Last modified: 2017-11-29 23:09:56 UTC
Leap 42.3, OpenStack Ocata

Description:
When trying to build a new base image for our private cloud based on 42.3 we couldn't use cloud-init-0.7.8 (the first version which worked for us in provider/external networks) from Tumbleweed, as the page says "Resource is no longer available". So we tried cloud-init version 17.1, but this only works for internal/self-service networks, where the network data is provided via the metadata service and not via config-drive, which we need in external networks.

Expected result:
Successfully launch an instance in an external network via config-drive (which worked in 0.7.8).

Actual result:
The instance is launched without network configuration and is not reachable. After configuring the network manually to get access to the VM I can see that the network data would be available:

---cut here---
test-ebl:~ # cat /mnt/openstack/latest/network_data.json
{"services": [], "networks": [{"network_id": "be1a07e0-d998-4a74-847a-d44754055229", "type": "ipv4", "netmask": "255.255.255.0", "link": "tap699730da-62", "routes": [{"netmask": "0.0.0.0", "network": "0.0.0.0", "gateway": "192.168.168.1"}], "ip_address": "192.168.168.29", "id": "network0"}], "links": [{"ethernet_mac_address": "fa:16:3e:04:b6:0c", "mtu": 1500, "type": "bridge", "id": "tap699730da-62", "vif_id": "699730da-62c6-4233-b1ff-2a76ec44c8cc"}]}
---cut here---

Steps to reproduce:
Launch an instance with cloud-init-17.1 in an external (provider) network via config-drive.

I'll attach cloud-init log files from the to-be-base-image VM and the newly launched VM.
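For readers following along, here is a minimal Python sketch of where the default gateway sits in the network_data.json shown above. The sample is a trimmed copy of the reported file, and the helper name default_gateways is hypothetical; it is not part of cloud-init.

```python
import json

# Trimmed copy of the network_data.json from the report (only the fields used here).
NETWORK_DATA = """
{"services": [],
 "networks": [{"type": "ipv4",
               "netmask": "255.255.255.0",
               "link": "tap699730da-62",
               "routes": [{"netmask": "0.0.0.0",
                           "network": "0.0.0.0",
                           "gateway": "192.168.168.1"}],
               "ip_address": "192.168.168.29",
               "id": "network0"}]}
"""


def default_gateways(network_data):
    """Return {network id: gateway} for every IPv4 default route found.

    Hypothetical helper for illustration only.
    """
    gateways = {}
    for net in network_data.get("networks", []):
        for route in net.get("routes", []):
            # A default route is the one covering 0.0.0.0/0.0.0.0.
            if route.get("network") == "0.0.0.0" and route.get("netmask") == "0.0.0.0":
                gateways[net["id"]] = route["gateway"]
    return gateways


if __name__ == "__main__":
    print(default_gateways(json.loads(NETWORK_DATA)))
```

Note that the gateway is only present as a route entry, not as a property of the network itself, which becomes relevant later in this report.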
Created attachment 745645 [details] clout-init-output.log from base image
Created attachment 745646 [details] clout-init.log from base image
Created attachment 745647 [details] clout-init-output.log from new vm
Created attachment 745648 [details] clout-init.log from new vm
I ran some more tests with cloud-init. This time I was able to get the cloud-init package from the Tumbleweed repository and installed cloud-init version 0.7.8 on a freshly upgraded Leap-42.3 instance. When I restarted the cloud-init services, this error occurred:

---cut here---
Cloud-init v. 0.7.8 running 'init-local' at Thu, 02 Nov 2017 11:31:19 +0000. Up 95.77 seconds.
2017-11-02 12:31:19,934 - util.py[WARNING]: Failed loading pickled blob from /var/lib/cloud/instance/obj.pkl
2017-11-02 12:31:19,948 - util.py[WARNING]: failed stage init-local
failed run of stage init-local
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cloudinit/cmd/main.py", line 521, in status_wrapper
    ret = functor(name, args)
  File "/usr/lib/python3.6/site-packages/cloudinit/cmd/main.py", line 247, in main_init
    init.fetch(existing=existing)
  File "/usr/lib/python3.6/site-packages/cloudinit/stages.py", line 358, in fetch
    return self._get_data_source(existing=existing)
  File "/usr/lib/python3.6/site-packages/cloudinit/stages.py", line 268, in _get_data_source
    pkg_list, self.reporter)
  File "/usr/lib/python3.6/site-packages/cloudinit/sources/__init__.py", line 295, in find_source
    ds_list = list_sources(cfg_list, ds_deps, pkg_list)
  File "/usr/lib/python3.6/site-packages/cloudinit/sources/__init__.py", line 335, in list_sources
    ['get_datasource_list'])
  File "/usr/lib/python3.6/site-packages/cloudinit/importer.py", line 47, in find_module
    mod = import_module(full_path)
  File "/usr/lib/python3.6/site-packages/cloudinit/importer.py", line 27, in import_module
    __import__(module_name)
  File "/usr/lib/python3.6/site-packages/cloudinit/sources/DataSourceOpenNebula.py", line 133
    list = self.REG_DEV_MAC.findall(self.ip)
    ^
TabError: inconsistent use of tabs and spaces in indentation
---cut here---

I fixed the indentation and successfully restarted the services. (A diff of the file will be attached to this bug.)

Snapshotting this instance and launching a new VM in an external network was successful; the instance's network was configured correctly via config-drive. I figured the indentation errors don't require a separate bug report. If you disagree I'll file a new report, of course.

test:~ # rpm -qa | grep cloud-init
cloud-init-0.7.8-12.1.x86_64
cloud-init-config-suse-0.7.8-33.1.x86_64

test:~ # cat /etc/os-release
NAME="openSUSE Leap"
VERSION="42.3"

This topic is also an issue on the opensuse-cloud mailing list [1].

[1] https://lists.opensuse.org/opensuse-cloud/2017-11/msg00000.html
Created attachment 746805 [details] diff of DataSourceOpenNebula.py (indentation incorrect)
Can you please try again with cloud-init-17.1? In version 0.7.8 we carried an out of tree implementation for DataSourceOpenNebula, with 17.1 we are using the upstream code. cloud-init-17.1 can be found in the Cloud:Tools project in OBS [1] [1] https://build.opensuse.org/project/show/Cloud:Tools
Comment#1 says you tried 17.1 but then in comment#5 it is about 0.7.8, confused, sorry. Also the attached patch is for the OpenNebula data source, we dropped the out of tree implementation with the switch to 17.1 as stated in comment#7
W.r.t. attachment from comment#1, it contains "ImportError: No distribution found for distro suse (searched ['suse', 'cloudinit.distros.suse'])" This was a packaging bug also reported as boo#1063716, this has been fixed in the 17.1 build in Cloud:Tools
Attachment from comment#2 has the same problem as described in comment#9.

Attachment from comment#3 shows a configuration issue: in 17.1 the option for the config file was renamed to cc_zypper_add_repo; it is NOT cc_zypp_add_repo.

I think the latest build in Cloud:Tools may fix the reported problems.
Sorry for the mixup, I thought it would be easier to track all current tests in one bug instead of multiple reports. I just wanted to show the differences between the working version 0.7.8 and the latest 17.1. From here on I'll focus on version 17.1. I installed the latest version cloud-init-17.1-6.1 and some of the reported issues are resolved (cc_zypper_add_repo, "No distribution found for distro suse"), even the ip address was applied successfully, even though network has to be restarted before it's visible. But the instance is still not reachable because there's no default gateway applied. As soon as I set it manually, the instance responds to ping. There's one more thing I'm not sure about: do I have to remove all existing data from previous cloud-init runs? Because when I first launched a new instance from a snapshot, the logs reported "not a new instance. network config is not applied" When I deleted everything under /var/lib/cloud/ and created a new snapshot, a freshly launched instance applied at least some of the network config. What should I be aware of when creating a new snapshot regarding cached data?
Please let me know if you need new logs, I'll clear the needinfo flag for now.
You need to clear everything in /var/lib/cloud There is data in /var/lib/cloud that carries across reboots and thus when you create a new snapshot for testing you need to clear /var/lib/cloud before creating the snapshot. Yes, I will need new logs.
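A minimal Python sketch of the cleanup step described above, for use on the base VM right before taking the snapshot. The function name clear_cloud_state is hypothetical; the sketch simply empties the directory (it must be run as root to touch /var/lib/cloud) and is not a cloud-init tool.

```python
import shutil
from pathlib import Path


def clear_cloud_state(cloud_dir="/var/lib/cloud"):
    """Remove everything below cloud_dir but keep the directory itself.

    Hypothetical helper: returns the sorted names of the removed entries.
    """
    root = Path(cloud_dir)
    removed = []
    if not root.is_dir():
        return removed
    for entry in root.iterdir():
        # Check symlinks first so that a link such as 'instance' is unlinked
        # instead of having its target tree deleted through the link.
        if entry.is_symlink() or entry.is_file():
            entry.unlink()
        else:
            shutil.rmtree(entry)
        removed.append(entry.name)
    return sorted(removed)
```

Calling clear_cloud_state() with no argument targets /var/lib/cloud; passing another path makes it easy to try the function on a scratch directory first.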
Created attachment 748415 [details] 17.1 cloud-init.log
Created attachment 748416 [details] 17.1 cloud-init-output.log
The latest attempt failed to apply any configuration at all, one error is this line: "Creating symbolic link from '/var/lib/cloud/instance' => '/var/lib/cloud/instances/iid-datasource-none'" This makes no sense at all. Before creating the snapshot I removed all files under /var/lib/cloud as suggested. Also the config-drive seems to be ignored, instead a timeout for metadata server is reported, which is not reachable, of course, since it's an external network.
both logs from today identify cloud-init as 0.7.8
Created attachment 748485 [details] 17.1-cloud-init.log Truncated cloud-init.log file after latest cloud-init run
Created attachment 748486 [details] 17.1-cloud-init-output.log Truncated cloud-init-output.log file after latest cloud-init run
Truncated both log files and re-uploaded to get rid of irrelevant history
With the latest build of version 17.1 (13-Nov-2017 23:01) I'm not even able to get the instance to read from config-drive anymore. I reinstalled from scratch, removed all traces of previous installations and created a new snapshot, but network is not configured even though mounting the config-drive is possible after the instance has booted. Have there been further changes since the build from 11-Nov-2017? Could you point me to what I'm missing or doing wrong? While cloud-init seems to work perfectly fine for internal networks it gets really frustrating when trying to use it in external networks, which we need all the time.
Sorry for the trouble, looks like we are caught between a number of changes w.r.t. networking. Given there are no unit tests at the moment and I have no setup to reproduce the issue we'll have to continue this back and forth for a bit, sorry.

So what does /etc/sysconfig/network/ifcfg-eth0 actually look like?

Based on the logs I am not certain we actually reach the point where the configuration would get written. It looks like the config file may be completely screwed up, as the log contains:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cloudinit/distros/__init__.py", line 346, in _bring_up_interface
    (_out, err) = util.subp(cmd)
  File "/usr/lib/python2.7/site-packages/cloudinit/util.py", line 1849, in subp
    cmd=args)
ProcessExecutionError: Unexpected error while running command.
Command: ['ifup', 'eth0']
Exit code: 162

so ifup is seeing something it doesn't like at all.
Seems like it keeps the dhcp setting from the base image:

ebl1:~ # cat /etc/sysconfig/network/ifcfg-eth0
BOOTPROTO=dhcp
BROADCAST=
ETHTOOL_OPTIONS=
IPADDR=
MTU=
NAME=
NETMASK=
NETWORK=
REMOTE_IPADDR=
STARTMODE=auto
DHCLIENT_SET_DEFAULT_ROUTE=yes
USERCONTROL=no

Even if it would write the network config it probably would be wrong because it always tries to contact the metadata server, that fails and it applies DataSourceNone:

---cut here---
2017-11-14 14:59:18,352 - __init__.py[DEBUG]: Seeing if we can get any data from <class 'cloudinit.sources.DataSourceOpenStack.DataSourceOpenStack'>
2017-11-14 14:59:18,353 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'url': 'http://169.254.169.254/openstack', 'headers': {'User-Agent': 'Cloud-Init/17.1'}, 'allow_redirects': True, 'method': 'GET', 'timeout': 10.0} configuration
2017-11-14 14:59:18,363 - url_helper.py[DEBUG]: Calling 'http://169.254.169.254/openstack' failed [0/-1s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /openstack (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fef60760d90>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2017-11-14 14:59:18,363 - DataSourceOpenStack.py[DEBUG]: Giving up on OpenStack md from ['http://169.254.169.254/openstack'] after 0 seconds
2017-11-14 14:59:18,363 - handlers.py[DEBUG]: finish: init-network/search-OpenStack: SUCCESS: no network data found from DataSourceOpenStack
2017-11-14 14:59:18,363 - handlers.py[DEBUG]: start: init-network/search-None: searching for network data from DataSourceNone
2017-11-14 14:59:18,364 - __init__.py[DEBUG]: Seeing if we can get any data from <class 'cloudinit.sources.DataSourceNone.DataSourceNone'>
2017-11-14 14:59:18,364 - handlers.py[DEBUG]: finish: init-network/search-None: SUCCESS: found network data from DataSourceNone
2017-11-14 14:59:18,364 - stages.py[INFO]: Loaded datasource DataSourceNone - DataSourceNone
2017-11-14 14:59:18,364 - util.py[DEBUG]: Reading from /etc/cloud/cloud.cfg (quiet=False)
2017-11-14 14:59:18,364 - util.py[DEBUG]: Read 2112 bytes from /etc/cloud/cloud.cfg
2017-11-14 14:59:18,364 - util.py[DEBUG]: Attempting to load yaml from string of length 2112 with allowed root types (<type 'dict'>,)
2017-11-14 14:59:18,373 - util.py[DEBUG]: Reading from /etc/cloud/cloud.cfg.d/05_logging.cfg (quiet=False)
2017-11-14 14:59:18,373 - util.py[DEBUG]: Read 2057 bytes from /etc/cloud/cloud.cfg.d/05_logging.cfg
2017-11-14 14:59:18,373 - util.py[DEBUG]: Attempting to load yaml from string of length 2057 with allowed root types (<type 'dict'>,)
2017-11-14 14:59:18,378 - util.py[DEBUG]: Reading from /run/cloud-init/cloud.cfg (quiet=False)
2017-11-14 14:59:18,378 - util.py[DEBUG]: Read 37 bytes from /run/cloud-init/cloud.cfg
2017-11-14 14:59:18,378 - util.py[DEBUG]: Attempting to load yaml from string of length 37 with allowed root types (<type 'dict'>,)
2017-11-14 14:59:18,379 - util.py[DEBUG]: Attempting to load yaml from string of length 0 with allowed root types (<type 'dict'>,)
2017-11-14 14:59:18,379 - util.py[DEBUG]: load_yaml given empty string, returning default
2017-11-14 14:59:18,380 - util.py[DEBUG]: Attempting to remove /var/lib/cloud/instance
2017-11-14 14:59:18,380 - util.py[DEBUG]: Creating symbolic link from '/var/lib/cloud/instance' => '/var/lib/cloud/instances/iid-datasource-none'
---cut here---

Since it can't find its new instance_id this is the result:

ebl1:~ # ll /var/lib/cloud/instance
lrwxrwxrwx 1 root root 44 14. Nov 15:59 /var/lib/cloud/instance -> /var/lib/cloud/instances/iid-datasource-none

It looks like it doesn't even consider reading the contents of config-drive (anymore). Please let me know if you need more tests or information, I'm happy to help!
(In reply to Eugen Block from comment #23)
> Seems like it keeps the dhcp setting from the base image:
>
> ebl1:~ # cat /etc/sysconfig/network/ifcfg-eth0
> BOOTPROTO=dhcp
> BROADCAST=
> ETHTOOL_OPTIONS=
> IPADDR=
> MTU=
> NAME=
> NETMASK=
> NETWORK=
> REMOTE_IPADDR=
> STARTMODE=auto
> DHCLIENT_SET_DEFAULT_ROUTE=yes
> USERCONTROL=no

This is definitely not written by cloud-init when the distro is set to opensuse or sles. In the code path for opensuse/sles we only write the following values if present in the config:

BOOTPROTO
BROADCAST
GATEWAY
IPADDR
LLADDR
NETMASK
STARTMODE
USERCONTROL (which is always set to 'no')
ETHTOOL_OPTIONS (which is always the empty string, i.e. '')

Your file contains MTU and REMOTE_IPADDR, thus it originates from somewhere else.

I just added a unit test and pushed it upstream to ensure at least the rudimentary configuration works and writes a proper file, and it does. So the failure is unlikely to be in the code that writes the network configuration.
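To illustrate the "only write values present in the config" rule from the comment above, here is a small Python sketch. It is not the cloud-init code path; the function name render_ifcfg and the key order are assumptions made for the example, and only the key names come from the list above.

```python
# Keys named in the comment above; the order used here is an assumption.
OPTIONAL_KEYS = ("BOOTPROTO", "BROADCAST", "GATEWAY", "IPADDR",
                 "LLADDR", "NETMASK", "STARTMODE")


def render_ifcfg(settings):
    """Render ifcfg-style text from a dict of settings.

    Hypothetical helper: unknown keys (e.g. MTU) and empty values are
    skipped, USERCONTROL and ETHTOOL_OPTIONS are always emitted.
    """
    lines = []
    for key in OPTIONAL_KEYS:
        value = settings.get(key)
        if value:
            lines.append("%s=%s" % (key, value))
    lines.append("USERCONTROL=no")
    lines.append("ETHTOOL_OPTIONS=")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ifcfg({"BOOTPROTO": "static",
                        "IPADDR": "192.168.168.30/24",
                        "STARTMODE": "auto",
                        "MTU": "1500"}))
```

A file produced this way can never contain MTU or REMOTE_IPADDR, which is the reasoning behind the conclusion that the reported ifcfg-eth0 came from somewhere else.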
(In reply to Eugen Block from comment #23)
>
> Even if it would write the network config it probably would be wrong because
> it always tries to contact the metadata server, that fails and it applies

So what's in your /etc/cloud/cloud.cfg file? Are you setting the data source?

datasource_list

> Since it can't find its new instance_id this is the result:
>
> ebl1:~ # ll /var/lib/cloud/instance
> lrwxrwxrwx 1 root root 44 14. Nov 15:59 /var/lib/cloud/instance ->
> /var/lib/cloud/instances/iid-datasource-none
>
> It looks like it doesn't even consider reading the contents of config-drive
> (anymore).

For config drive you'd set this to

datasource_list: [ ConfigDrive ]

in /etc/cloud/cloud.cfg
> This is definitely not written by cloud init when the distro is set to opensuse or sles.
> In the code path for opensuse/sles we only write the following values if present in the config

I never touched the config, usually this seemed to be correct. Now I deleted the file in my base vm and reran cloud-init, and now I see this:

ebl1:~ # cat /etc/sysconfig/network/ifcfg-eth0
# Created by cloud-init v. 17.1 on Thu, 16 Nov 2017 09:31:31 +0000
BOOTPROTO=dhcp
USERCONTROL=no
STARTMODE=auto
ETHTOOL_OPTIONS=

Seems to be the desired output, right? But what do you mean by "if present in the config"? What else do I have to be aware of regarding network config? I thought providing it via config-drive would be enough (as it used to be before).

> So what's in your /etc/cloud/cloud.cfg file?
> Are you setting the data source?
> For config drive you'd set this to
> datasource_list: [ ConfigDrive ]

I didn't have to set anything before, I just used to set config-drive to "true" and cloud-init read the provided information. I just retried with "datasource_list: [ ConfigDrive ]" and cleaned the /var/lib/cloud/ directory, but still nothing happens. I'll attach the latest logs.
Created attachment 748911 [details] clout-init.log from base image
Created attachment 748912 [details] clout-init-output.log from base image
Okay, now I somehow managed to apply (some of) the network data from config drive:

---cut here---
2017-11-16 15:26:39,842 - stages.py[INFO]: Applying network configuration from ds bringup=False: {'version': 1, 'config': [{'subnets': [{u'routes': [{u'netmask': u'0.0.0.0', u'network': u'0.0.0.0', u'gateway': u'192.168.168.1'}], u'netmask': u'255.255.255.0', u'type': 'static', 'ipv4': True, 'address': u'192.168.168.30'}], 'mac_address': u'fa:16:3e:ee:2b:97', u'type': 'physical', 'name': 'eth0', u'mtu': 1500}]}
2017-11-16 15:26:39,842 - __init__.py[WARNING]: apply_network_config is not currently implemented for distribution '<class 'cloudinit.distros.opensuse.Distro'>'. Attempting to use apply_network
2017-11-16 15:26:39,858 - util.py[DEBUG]: Cloud-init v. 17.1 running 'modules:final' at Thu, 16 Nov 2017 15:26:39 +0000. Up 21102.58 seconds.
2017-11-16 15:26:39,868 - opensuse.py[DEBUG]: Translated ubuntu style network settings # Converted from network_config for distro <class 'cloudinit.distros.opensuse.Distro'>
# Implmentation of _write_network_config is needed.
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    hwaddress fa:16:3e:ee:2b:97
    address 192.168.168.30/24
    mtu 1500
    post-up route add default gw 192.168.168.1 || true
    pre-down route del default gw 192.168.168.1 || true
 into {u'lo': {'auto': True, 'ipv6': {}}, u'eth0': {'auto': True, 'bootproto': u'static', 'address': u'192.168.168.30/24', 'ipv6': {}}}
---cut here---

Please note that this is the same instance I launched today. What I did in this instance (NOT the base vm):

- deleting /etc/sysconfig/network/ifcfg-eth0
- removing everything under /var/lib/cloud/
- systemctl restart cloud-config.service cloud-final.service cloud-init-local.service cloud-init.service

This run at least created a new /etc/sysconfig/network/ifcfg-eth0:

clean-cloudinit17:~ # cat /etc/sysconfig/network/ifcfg-eth0
# Created by cloud-init v. 17.1 on Thu, 16 Nov 2017 15:15:57 +0000
BOOTPROTO=static
USERCONTROL=no
STARTMODE=auto
ETHTOOL_OPTIONS=
IPADDR=192.168.168.30/24

but it didn't bring the interface up, so I had to run 'ifdown eth0' and then 'ifup eth0'. But the instance still has no gateway set. I'll attach new logs for this run.

Additionally, I tried a little "patch" from a year ago, when we already had the gateway issue:

---cut here---
--- cloudinit/net/eni.py.dist 2016-08-10 07:05:38.000000000 +0200
+++ cloudinit/net/eni.py 2016-08-18 08:31:51.520458400 +0200
@@ -338,6 +338,7 @@
         up = indent + "post-up route add"
         down = indent + "pre-down route del"
         or_true = " || true"
+        gateway = indent + "gateway "
         mapping = {
             'network': '-net',
             'netmask': 'netmask',
@@ -346,6 +347,7 @@
         }
         if route['network'] == '0.0.0.0' and route['netmask'] == '0.0.0.0':
             default_gw = " default gw %s" % route['gateway']
+            content.append(gateway + route['gateway'])
             content.append(up + default_gw + or_true)
             content.append(down + default_gw + or_true)
         elif route['network'] == '::' and route['netmask'] == 0:
---cut here---

This configured the instance's network correctly, including the gateway:

---cut here---
clean-cloudinit17:~ # cat /etc/sysconfig/network/ifcfg-eth0
# Created by cloud-init v. 17.1 on Thu, 16 Nov 2017 15:15:57 +0000
BOOTPROTO=static
USERCONTROL=no
STARTMODE=auto
ETHTOOL_OPTIONS=
IPADDR=192.168.168.30/24
GATEWAY=192.168.168.1

clean-cloudinit17:~ # cat /etc/sysconfig/network/ifroute-eth0
default 192.168.168.1
---cut here---

Although I'm not sure why the gateway now shows up in both places, as GATEWAY in ifcfg-eth0 and as the default route in ifroute-eth0. I'll attach the new log files, they will contain two runs, 15:26 and 15:47.
Created attachment 749026 [details] clout-init.log new instance
Created attachment 749028 [details] clout-init-output.log from new instance
OK, thanks, this is helpful in at least figuring out the flow of where we are. From the log, this is the information about your network configuration that is being read from the config drive:

{'version': 1, 'config': [{'subnets': [{u'routes': [{u'netmask': u'0.0.0.0', u'network': u'0.0.0.0', u'gateway': u'192.168.168.1'}], u'netmask': u'255.255.255.0', u'type': 'static', 'ipv4': True, 'address': u'192.168.168.30'}], 'mac_address': u'fa:16:3e:ee:2b:97', u'type': 'physical', 'name': 'eth0', u'mtu': 1500}]}

This explains why the gateway shows up in ifroute-eth0, it's in the configuration

{u'routes': [{u'netmask': u'0.0.0.0', u'network': u'0.0.0.0', u'gateway': u'192.168.168.1'}

but certainly the keyword GATEWAY should not be in that file.

Anyway, the information read is then turned into a Debian style network configuration, from the log:

"""
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    hwaddress fa:16:3e:ee:2b:97
    address 192.168.168.30/24
    mtu 1500
    post-up route add default gw 192.168.168.1 || true
    pre-down route del default gw 192.168.168.1 || true
"""

As you can see the gateway information has already disappeared. And this is then turned into yet another format that is getting processed in the openSUSE specific code and that looks like this:

{u'lo': {'auto': True, 'ipv6': {}}, u'eth0': {'auto': True, 'bootproto': u'static', 'address': u'192.168.168.30/24', 'ipv6': {}}}

and as you can see, more information has disappeared. Which is why this is not working. I'll see if I can figure out where the information is getting dropped.
What does the configuration look like that you pass to cloud-init via config drive? After a second look I noticed that the "original" information contains no gateway data for the interface itself, it only contains the gateway information for the routes.
And one more thing, a network configuration that looks like the following will work, that's from the unit test I added yesterday. auto lo iface lo inet loopback auto eth0 iface eth0 inet static address 192.168.1.5 broadcast 192.168.1.0 gateway 192.168.1.254 netmask 255.255.255.0 network 192.168.0.0
Please try it now, the change log of the new package references this bug.
(In reply to Robert Schweikert from comment #33)
> What does the configuration look like that you pass to cloud-init via config
> drive?
>
> After a second look I noticed that the "original" information contains no
> gateway data for the interface itself, it only contains the gateway
> information for the routes.

This is from the mounted config drive:

clean-cloudinit17:~ # mount /dev/sr0 /mnt/
mount: /dev/sr0 is write-protected, mounting read-only
clean-cloudinit17:~ # cat /mnt/openstack/latest/network_data.json
{"services": [], "networks": [{"network_id": "be1a07e0-d998-4a74-847a-d44754055229", "type": "ipv4", "netmask": "255.255.255.0", "link": "tapf6d6f106-c6", "routes": [{"netmask": "0.0.0.0", "network": "0.0.0.0", "gateway": "192.168.168.1"}], "ip_address": "192.168.168.30", "id": "network0"}], "links": [{"ethernet_mac_address": "fa:16:3e:ee:2b:97", "mtu": 1500, "type": "bridge", "id": "tapf6d6f106-c6", "vif_id": "f6d6f106-c6c3-4966-95c9-7faace27fa00"}]}

> Please try it now, the change log of the new package references this bug.

I guess I'll have to wait until the repository contains the latest build. Right now the repo contains cloud-init-17.1-6.4.x86_64.rpm while the package information already shows cloud-init-17.1-7.1.x86_64.rpm with a build date from yesterday. I'll report back as soon as I have tested it.
(In reply to Eugen Block from comment #36)
> I guess I'll have to wait until the repository contains the latest build.

I just downloaded the rpm files and reinstalled cloud-init instead of waiting for it to appear in the repo. I still don't get the instance configured correctly at first boot. But first, another error in the latest changes: there's wrong indentation in the file /usr/lib/python2.7/site-packages/cloudinit/distros/net_util.py, line 169 (copied from OBS):

++    if 'post-up' in info:
++        routes = info['post-up']
++        if isinstance(routes, list):
++            for route_info in routes:
++                if 'default gw' in route_info:
++                    iface_info['gateway'] = ipv4.search(
++                        route_info).group(0)
++        elif 'default gw' in route_info:
++            iface_info['gateway'] = ipv4.search(route_info).group(0)

which leads to an error:

  File "/usr/lib/python2.7/site-packages/cloudinit/distros/net_util.py", line 169, in translate_network
    elif 'default gw' in route_info:
UnboundLocalError: local variable 'route_info' referenced before assignment

But even after fixing this I don't understand the workflow and why it is failing. After a restart of the cloud-init services it says it can't find a valid datasource and falls back to DataSourceNone. Could it be a problem if my base image was launched in an internal network with its metadata provided by datasource OpenStack but the snapshot is supposed to boot in an external network? I can't seem to figure out exactly which steps have to be performed to get a valid base image.

Currently, I have this cloud.cfg:

---cut here---
ebl1:~ # egrep -ve "^#|^$" /etc/cloud/cloud.cfg
users:
 - default
disable_root: false
preserve_hostname: false
datasource_list: [ 'ConfigDrive' ]
cloud_init_modules:
 - migrator
 - seed_random
 - bootcmd
 - write-files
 - growpart
 - resizefs
 - disk_setup
 - mounts
 - set_hostname
 - update_hostname
 - update_etc_hosts
 - ca-certs
 - rsyslog
 - ssh
cloud_config_modules:
 - ssh-import-id
 - locale
 - set-passwords
 - ntp
 - timezone
 - disable-ec2-metadata
 - runcmd
cloud_final_modules:
 - package-update-upgrade-install
 - puppet
 - chef
 - salt-minion
 - mcollective
 - rightscale_userdata
 - scripts-vendor
 - scripts-per-once
 - scripts-per-boot
 - scripts-per-instance
 - scripts-user
 - ssh-authkey-fingerprints
 - keys-to-console
 - phone-home
 - final-message
 - power-state-change
system_info:
   # This will affect which distro class gets used
   distro: opensuse
   # Default user name + that default users groups (if added/used)
   # Other config here will be given to the distro class and/or path classes
   paths:
      cloud_dir: /var/lib/cloud/
      templates_dir: /etc/cloud/templates/
   ssh_svcname: sshd
---cut here---

and I remove all data under /var/lib/cloud. I also tried removing /run/cloud-init as well, but it fails every time. Is there anything else I'm not aware of?
(In reply to Eugen Block from comment #37)
> But even after fixing this I don't understand the workflow and why it is
> failing. After restart of the cloud-init services it says it can't find a
> valid datasource and falls back to the DataSourceNone.

When I took a closer look at the latest changes, it turned out that the indentation was correct but the variable was wrong. When I replaced 'route_info' with 'routes' the code works:

---cut here---
test-cloudinit3:~ # diff -u /usr/lib/python2.7/site-packages/cloudinit/distros/net_util.py /usr/lib/python2.7/site-packages/cloudinit/distros/net_util.py.mod
--- /usr/lib/python2.7/site-packages/cloudinit/distros/net_util.py 2017-11-17 15:53:05.924000000 +0100
+++ /usr/lib/python2.7/site-packages/cloudinit/distros/net_util.py.mod 2017-11-17 15:09:50.464000000 +0100
@@ -166,8 +166,8 @@
                     if 'default gw' in route_info:
                         iface_info['gateway'] = ipv4.search(
                             route_info).group(0)
-            elif 'default gw' in route_info:
-                iface_info['gateway'] = ipv4.search(route_info).group(0)
+            elif 'default gw' in routes:
+                iface_info['gateway'] = ipv4.search(routes).group(0)

             # If ipv6 is enabled, device will have multiple IPs, so we need to
             # update the dictionary instead of overwriting it...
---cut here---

But that's not all. I believe the systemd unit 'cloud-init-local' has a wrong dependency, because during reboot of my test instance I found these syslog messages:

---cut here---
Nov 17 15:27:08 test-cloudinit2 systemd[1]: cloud-init-local.service: Job cloud-init-local.service/stop deleted to break ordering cycle starting with basic.target/stop
Nov 17 15:27:22 test-cloudinit2 systemd[1]: basic.target: Found dependency on cloud-init-local.service/start
Nov 17 15:27:22 test-cloudinit2 systemd[1]: basic.target: Breaking ordering cycle by deleting job cloud-init-local.service/start
---cut here---

This prevented the instance from performing the initial steps, so the rest of the configuration also failed. So I changed 'Before' to 'After':

---cut here---
test-cloudinit3:~ # diff -u /usr/lib/systemd/system/cloud-init-local.service /usr/lib/systemd/system/cloud-init-local.service.mod
--- /usr/lib/systemd/system/cloud-init-local.service 2017-11-17 15:53:30.756000000 +0100
+++ /usr/lib/systemd/system/cloud-init-local.service.mod 2017-11-17 15:41:42.600000000 +0100
@@ -7,7 +7,9 @@
 Before=shutdown.target
 # Other distros use Before=sysinit.target. There is not a clearly identified
 # reason for usage of basic.target instead.
+# Before=basic.target
 After=basic.target
+# After=sysinit.target
 Conflicts=shutdown.target
 RequiresMountsFor=/var/lib/cloud
---cut here---

and this worked! I removed all data from previous runs and snapshotted that instance; launching a new one in an external network gave me a correctly configured instance.

---cut here---
test-cloudinit3:~ # cat /etc/sysconfig/network/ifcfg-eth0
# Created by cloud-init v. 17.1 on Fri, 17 Nov 2017 14:42:20 +0000
BOOTPROTO=static
USERCONTROL=no
STARTMODE=auto
ETHTOOL_OPTIONS=
IPADDR=192.168.168.30/24
GATEWAY=192.168.168.1

test-cloudinit3:~ # cat /etc/sysconfig/network/ifroute-eth0
default 192.168.168.1
---cut here---

This looks promising to me! :-) Although I'm not sure about the "After=basic.target" ;-)
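The net_util.py correction discussed in this comment boils down to handling a 'post-up' value that is either a list of route commands or a single string. The following standalone Python sketch shows that idea under stated assumptions: the function name gateway_from_post_up and the IPV4 pattern are made up for the example and are not taken from the cloud-init source.

```python
import re

# Simple dotted-quad matcher; an assumption for this sketch, not the
# pattern used in cloud-init.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def gateway_from_post_up(info):
    """Return the default gateway found in info['post-up'], or None.

    Hypothetical helper: 'post-up' may be one command string or a list
    of command strings, so both shapes are normalised to a list first.
    """
    routes = info.get("post-up")
    if routes is None:
        return None
    if isinstance(routes, str):
        routes = [routes]
    for route in routes:
        if "default gw" in route:
            match = IPV4.search(route)
            if match:
                return match.group(0)
    return None


if __name__ == "__main__":
    print(gateway_from_post_up(
        {"post-up": "route add default gw 192.168.168.1 || true"}))
```

Normalising to a list first avoids the situation from the traceback above, where the string branch referred to a loop variable that only exists in the list branch.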
There's one more thing I observed. I wanted to make my base image compatible with both datasources, ConfigDrive (for external networks) and OpenStack (for internal networks, metadata provided by neutron via magic ip). So now I have two entries in cloud.cfg:

test-cloudinit2:~ # grep datasource_list /etc/cloud/cloud.cfg
datasource_list: [ 'ConfigDrive', 'OpenStack' ]

Launching this image in an internal network seems to work fine, but the same image fails again in an external network with config-drive. I found some information on that, e.g.:

"datasource_list: [ NoCloud, ConfigDrive, OpenNebula, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, Ec2, CloudStack, None ] (/etc/cloud/cloud.cfg.d/90_dpkg.cfg) By default, first datasource listed is the priority"

So I assumed this would apply, but it doesn't. For some reason the instance first tries to read the OpenStack datasource and then falls back to DataSourceNone again, and the vm is not reachable. Removing the OpenStack datasource from cloud.cfg within the new instance and restarting the services (after cleaning up) works.

So my question is: which list entry is read first, and why doesn't cloud-init try all datasources before using the fallback DataSourceNone?
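To make the expectation in this question concrete, here is a short Python sketch of an ordered search that tries every listed datasource before falling back. It is explicitly not how cloud-init 17.1 selects its datasource (that behaviour is what is being questioned here); pick_datasource and the probe callables are hypothetical.

```python
def pick_datasource(datasource_list, probes, fallback="None"):
    """Try each datasource in list order and return (chosen, tried).

    Hypothetical sketch: 'probes' maps a datasource name to a callable
    returning True when that source can provide data. The fallback is
    used only after every listed source has been tried.
    """
    tried = []
    for name in datasource_list:
        tried.append(name)
        probe = probes.get(name)
        if probe is not None and probe():
            return name, tried
    return fallback, tried


if __name__ == "__main__":
    # External network: config drive attached, metadata server unreachable.
    probes = {"ConfigDrive": lambda: True, "OpenStack": lambda: False}
    print(pick_datasource(["ConfigDrive", "OpenStack"], probes))
```

With this ordering, an image listing both sources would pick ConfigDrive in an external network and only reach the fallback when neither source answers.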
Created attachment 749197 [details] cloud-init.log vm with two datasources
Created attachment 749199 [details] cloud-init-output.log vm with two datasources
First off, thanks for testing and sorry for fat-fingering the patch. I was in a hurry last night, had a dinner date with my wife and wanted to get you something for testing. Fixed patch has been pushed to Cloud:Tools :) I will write some tests and then will push this upstream.

As far as your questions are concerned, one thing I noticed is that you had the datasources in '' when you list them for both OpenStack and ConfigDrive, i.e.

datasource_list: [ 'ConfigDrive', 'OpenStack' ]

but I think there should be no quotes

datasource_list: [ ConfigDrive, OpenStack ]

I am not certain whether or not that makes a difference, but it may be worth a try. I agree that the code should test all data sources before falling back to NoCloud. I have not looked at the code yet that handles this part, thus I cannot really answer your question.

Since I fixed the patch and you already confirmed that network configuration with ConfigDrive works again :) I will close the bug. Thanks again for the testing!
FYI, applied another patch to deal with the cycle.
(In reply to Robert Schweikert from comment #42)
> First off, thanks for testing and sorry for fat-fingering the patch. I was
> in a hurry last night, had a dinner date with my wife and wanted to get you
> something for testing.

I don't want to be responsible for any marriage troubles, this is not that important ;-)

> As far as your questions are concerned, one thing I noticed is that you had the
> datasources in '' when you list them for both OpenStack and ConfigDrive, i.e.
>
> datasource_list: [ 'ConfigDrive', 'OpenStack' ]
>
> but I think there should be no quotes
>
> datasource_list: [ ConfigDrive, OpenStack ]
>
> I am not certain whether or not that makes a difference, but it may be worth
> a try. I agree that the code should test all data sources before falling
> back to NoCloud. I have not looked at the code yet that handles this part,
> thus I cannot really answer your question.

I gave it another shot without the single quotes, but the result is the same. Although ConfigDrive is the first datasource in this list, cloud-init first tries to get data from OpenStack and then falls back to None. Should I file another bug report for this?
(In reply to Robert Schweikert from comment #42)
>
> datasource_list: [ ConfigDrive, OpenStack ]
>
> I am not certain whether or not that makes a difference, but it may be worth
> a try. I agree that the code should test all data sources before falling
> back to NoCloud. I have not looked at the code yet that handles this part,
> thus I cannot really answer your question.

Another note: If I keep only one datasource (ConfigDrive) in /etc/cloud/cloud.cfg in my base image, it works for both our scenarios:

- If the new instance is launched in an external network and has a config drive attached, it reads the information successfully.
- If I use the same image to launch an instance in an internal network, the ConfigDrive datasource fails, of course, and the fallback datasource tries to configure eth0 by dhcp, which is successful. Now the instance can reach the metadata server and is configured correctly.

This would work for us, I guess.
(In reply to Eugen Block from comment #45) <snip> > > I gave it another shot without the single quotes, but the result is the > same. Although ConfigDrive is the first datasource in this list, cloud-init > first tries to get data from OpenStack and then falls back to None. Should I > file another bug report for this? Yes please
This is an autogenerated message for OBS integration: This bug (1064854) was mentioned in https://build.opensuse.org/request/show/543937 Factory / cloud-init