How to discover unhealthy non-global zones and fix them in Solaris

Discovering unhealthy non-global zone states

A Solaris zone can be unhealthy in several ways. This post covers the following states:

Zone state is unavailable
Zone may be stuck in shutting_down or down state
Zone is not running when the autoboot configuration property is set

To display the current state of a single zone by name, use the following zoneadm command:

# zoneadm -z [ZONE] list -v

To display a list of all zones with states and names use:

# zoneadm list -cv

For example:

# /usr/sbin/zoneadm list -cv
ID   NAME      STATUS       PATH              BRAND       IP
 0   global    running      /                 native      shared
[ID] [ZONE]    running      /[ZONE_PATH]      solaris     excl
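
On a host with many zones, the parsable output of zoneadm list -p is easier to filter than the table above. The following is a minimal sketch, assuming the colon-separated field order zoneid:zonename:state:zonepath:... of the parsable format. The function name unhealthy_zones is hypothetical and not part of Solaris.

```shell
# Hypothetical helper: reads "zoneadm list -cp" output on stdin and prints
# the name and state of every zone in one of the states covered in this post.
unhealthy_zones() {
    while IFS=: read -r id name state rest; do
        case "$state" in
            unavailable|shutting_down|down)
                printf '%s %s\n' "$name" "$state"
                ;;
        esac
    done
}

# Usage on a Solaris host (not executed here):
#   zoneadm list -cp | unhealthy_zones
```

Zones that are installed but not running are not reported by this filter, because that is only a problem when autoboot is set.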

If the zone is not running, check whether the autoboot property is set to true using the following command:

# zonecfg -z zonename info autoboot
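
The state check and the autoboot check can be combined. This is a rough sketch, assuming that zonecfg prints the property in the form "autoboot: true". The function name report_autoboot_mismatch is hypothetical.

```shell
# Hypothetical helper: prints the zone name when autoboot is true but the
# zone is not running.
#   $1 = zone name
#   $2 = zone state as reported by zoneadm
#   $3 = output of "zonecfg -z NAME info autoboot" (assumed "autoboot: true")
report_autoboot_mismatch() {
    case "$3" in
        *autoboot:*true*)
            if [ "$2" != running ]; then
                printf '%s\n' "$1"
            fi
            ;;
    esac
}

# Usage on a Solaris host (not executed here):
#   zoneadm list -cp | while IFS=: read -r id name state rest; do
#       [ "$name" = global ] && continue
#       report_autoboot_mismatch "$name" "$state" \
#           "$(zonecfg -z "$name" info autoboot)"
#   done
```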

Fixing unavailable state Zone

An unavailable state indicates that the zone has been installed but cannot be verified, made ready, booted, attached or moved and it will not self-correct. A zone enters an unavailable state at the following times:

When the zone’s storage is unavailable and svc:/system/zones:default begins, such as during system boot.
When the zone’s storage is unavailable.
When archive-based installations fail after successful archive extraction.
When the zone’s software is incompatible with the global zone’s software, such as after an improper -F (force) attach.

For kernel zones:

As a kernel zone is readied or booted, the host data is read to determine if the kernel zone’s boot storage is in use on another system. If it is in use on another system, the kernel zone will enter the unavailable state and an error message will indicate which system is using it. If it is certain that the storage is not in use on another system, the kernel zone can be repaired by using the -x force-takeover extended option to zoneadm attach. See the warning below before executing this command.

If the encryption key is inaccessible, the host data and any suspend image will not be readable. In such a circumstance, any attempt to ready or boot the zone will cause the zone to enter the unavailable state. If recovery of the encryption key is not possible, the -x initialize-hostdata extended option to the zoneadm attach sub-command can be used to generate a new encryption key and host data. See the warning below before executing this command.

WARNING: Forcing a take over or re-initialization of host data will make it impossible to detect if the zone is in use on any other system. Running multiple instances of a zone that references the same storage will lead to irreparable corruption of the zone’s file systems.
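
Given the risk described in the warning, it can help to review the exact attach command before running it. The sketch below is a hypothetical wrapper (kz_attach_x is not a Solaris command) that only prints the command unless it is explicitly told to run it. It assumes the extended options are passed as zoneadm -z zonename attach -x option.

```shell
# Hypothetical helper: prints the attach command for review and runs it only
# when the third argument is "run".
#   $1 = kernel zone name
#   $2 = force-takeover or initialize-hostdata
#   $3 = "run" to execute; anything else (or nothing) only prints
kz_attach_x() {
    zone=$1
    option=$2
    mode=${3:-print}
    case "$option" in
        force-takeover|initialize-hostdata) ;;
        *) echo "unsupported option: $option" >&2; return 2 ;;
    esac
    if [ "$mode" = run ]; then
        zoneadm -z "$zone" attach -x "$option"
    else
        echo "zoneadm -z $zone attach -x $option"
    fi
}

# Usage (kz1 is a placeholder zone name):
#   kz_attach_x kz1 force-takeover        # prints the command only
#   kz_attach_x kz1 force-takeover run    # executes it
```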

To prevent loss of the encryption key during a warm or cold migration, use zonecfg export on the source system to generate a command file to be used on the destination system. For example:

root@host1# zonecfg -z myzone export -f /net/.../myzone.cfg
root@host2# zonecfg -z myzone -f /net/.../myzone.cfg

Because myzone.cfg in this example contains the encryption key, it is important to protect its contents from disclosure.

To move the zone out of an unavailable state, you must first identify and possibly fix any problems related to shared storage resource connectivity or zone misconfiguration. The zone log files and the service log may provide information to solve the problem. The zone log files are located at: /var/log/zones/[zonename].{messages,console}

The service log can also be located using the following command:

# svcs -l svc:/system/zones:default | egrep ^logfile
logfile      /var/svc/log/system-zones:default.log
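
To go straight from the svcs -l output to the log contents, the logfile field can be extracted and passed to tail. This is a small sketch. The function name smf_logfile is hypothetical.

```shell
# Hypothetical helper: reads "svcs -l" output on stdin and prints the value
# of the logfile field.
smf_logfile() {
    while read -r key value; do
        if [ "$key" = logfile ]; then
            printf '%s\n' "$value"
        fi
    done
    return 0
}

# Usage on a Solaris host (not executed here):
#   tail -20 "$(svcs -l svc:/system/zones:default | smf_logfile)"
```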

More information about the SMF service failure can be found by using the following command:

# svcs -xv svc:/system/zones:default

Use the following commands to identify the configuration issue(s) detected:

# zoneadm -z zonename verify
# zonecfg -z zonename verify

After fixing the zone, run the following command to reattach it and move it into the installed state. If any issues remain, the output of the attach attempt should help identify what still needs to be fixed before the attach can succeed.

# zoneadm -z zonename attach
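
The verify and attach steps above can be strung together as one function. This is only a sketch with a hypothetical name (reattach_zone). It reports verify problems but still attempts the attach, on the assumption that the attach output is itself useful for diagnosis.

```shell
# Hypothetical helper: runs both verify commands, then tries to reattach.
#   $1 = zone name
reattach_zone() {
    zone=$1
    # Report configuration problems without stopping.
    zonecfg -z "$zone" verify || echo "zonecfg verify reported problems for $zone" >&2
    zoneadm -z "$zone" verify || echo "zoneadm verify reported problems for $zone" >&2
    if zoneadm -z "$zone" attach; then
        echo "$zone attached"
    else
        echo "attach failed for $zone" >&2
        return 1
    fi
}

# Usage (zonename is a placeholder):
#   reattach_zone zonename && zoneadm -z zonename list -v
```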

It is also possible to uninstall a zone with the zoneadm uninstall command to move the zone back into the configured zone state.

Fixing shutting_down or down state Zone

The shutting_down or down state indicates that the zone is being halted. The zone can become stuck in one of these states if it is unable to tear down the application environment state (such as mounted file systems) or if some portion of the virtual platform cannot be destroyed. In most cases, the non-global zone is waiting on a process that has not exited or a file system that has not been unmounted. Such cases require operator intervention.

If the zone is stuck in shutting_down or down state it may be cleared by issuing the following command:

# zoneadm -z zonename halt

If the halt does not clear the state, a reboot of the global zone is usually required, which means an outage for all zones on the system. The other zones can be shut down gracefully before the global zone is rebooted. To investigate which processes have not stopped or are still holding the zone's file systems, you can use the following command:

# mount -v | nawk '$3 ~ zonepath { print "fuser -c", $3 }' zonepath=ZonePath | sh -x
 + fuser -c /zones/t1
/zones/t1:
 + fuser -c /zones/t1/root
/zones/t1/root:
 + fuser -c /zones/t1/root/var
/zones/t1/root/var:    28626c   28605o
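
In the output above, each number is a process ID followed by a usage code letter. To feed those PIDs to ps, they first have to be stripped of the letters. The sketch below assumes the combined output format shown above (fuser writes the path and the letters to standard error, so 2>&1 is needed when piping). The function name fuser_pids is hypothetical.

```shell
# Hypothetical helper: reads combined fuser output on stdin and prints one
# PID per line, without the trailing usage letters.
fuser_pids() {
    while read -r path rest; do
        case "$path" in
            *:) ;;            # only lines of the form "/mount/point: pid..."
            *) continue ;;    # skip "sh -x" trace lines and anything else
        esac
        for entry in $rest; do
            pid=${entry%%[!0-9]*}    # keep the leading digits only
            if [ -n "$pid" ]; then
                printf '%s\n' "$pid"
            fi
        done
    done
    return 0
}

# Usage on a Solaris host (not executed here), appended to the pipeline above:
#   ... | sh -x 2>&1 | fuser_pids | sort -u |
#       while read -r pid; do ps -o pid,zone,args -p "$pid"; done
```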

Fixing unhealthy Zone with autoboot property set

There may be a number of reasons why the zone is not running. The following steps help you identify them. If the autoboot property is set to true and the zone is not running, check the zone log files and the service log, which may provide information to solve the problem. The zone log files are located at:

/var/log/zones/[zonename].{messages,console}

The service log can also be located using the following command:

# svcs -l svc:/system/zones:default | egrep ^logfile

More information about the SMF service failure can be found by using the following command:

# svcs -xv svc:/system/zones:default

If the service was disabled, or the issues described in the service log have been fixed, enable the zones SMF service using the following command:

# svcadm enable svc:/system/zones:default
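
Before enabling the service, it is worth checking which state it is actually in, since svcadm enable only helps when the service is disabled. The following sketch uses svcs -H -o state to read the state. The function name ensure_zones_service is hypothetical.

```shell
# Hypothetical helper: enables svc:/system/zones:default only when it is
# disabled, and points to "svcs -xv" for any other non-online state.
ensure_zones_service() {
    fmri=svc:/system/zones:default
    state=$(svcs -H -o state "$fmri")
    case "$state" in
        online)
            echo "$fmri is already online"
            ;;
        disabled)
            svcadm enable "$fmri" && echo "enabled $fmri"
            ;;
        *)
            echo "$fmri is $state; inspect with: svcs -xv $fmri" >&2
            return 1
            ;;
    esac
}

# Usage on a Solaris host (not executed here):
#   ensure_zones_service
```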

Another potential issue is with the zone configuration. Use the following commands to identify the configuration issue(s) detected:

# zoneadm -z zonename verify
# zonecfg -z zonename verify

After fixing the zone configuration issues, boot the zone using the following command:

# zoneadm -z zonename boot