Host system deadlock/hang when taking Linux VM snapshot

Discussion in 'General Questions' started by Forrest Gump, Jan 31, 2009.

  1. Forrest Gump

    Forrest Gump Bit poster

    Messages:
    36

    When I take a snapshot of a VM, my entire host system deadlocks nearly 100% of the time. The dock icon for Parallels stops animating and other running apps continue to run for a little while, but eventually the entire system is deadlocked: the cursor can still move, but it is a spinning beach ball.

    I'm running Build 4.0.3810 (January 23, 2009) and my host OS is OS X Leopard.

    Attempting to create a snapshot of a VM has caused a host OS deadlock in all of the following circumstances:
    - Lots of disk changes (500MB or more), or almost zero disk change
    - Guest VM still running from a previous snapshot from a previous Parallels build, or cleanly rebooted with the current build

    When the system deadlock happens, there is no disk activity. I've let it go for over 2 hours just to be certain that it wasn't just being slow. When coming back to the computer, it's just got a black screen, and a beachball cursor (which still moves around with the trackpad, but is no longer spinning). When I power down the machine and restart it, the VM now indicates that it has no snapshots available. Looking in the pvm directory for the VM, I do see a Snapshots.xml file which doesn't look obviously corrupt, and the Snapshots directory does contain snapshot files. However, there will be a .mem file with a size of zero bytes. For example:

    $ ls -alrt
    total 1738664
    -rwxr--r-- 1 <user> <user> 9710 Nov 12 15:25 {49f85a5c-35ec-4685-b485-f35f86f98ef0}.pvc.backup
    -rwxr--r-- 1 <user> <user> 59205 Nov 12 17:47 {e81b1cc0-9d76-403a-9e8d-2ec3f276cc46}.png
    -rwxr--r-- 1 <user> <user> 285212672 Nov 12 17:47 {e81b1cc0-9d76-403a-9e8d-2ec3f276cc46}.mem
    -rwxr--r-- 1 <user> <user> 9784 Nov 12 17:47 {e81b1cc0-9d76-403a-9e8d-2ec3f276cc46}.pvc
    -rwxr--r-- 1 <user> <user> 1024 Nov 12 17:47 {e81b1cc0-9d76-403a-9e8d-2ec3f276cc46}.mem.trc
    -rwxr--r-- 1 <user> <user> 11355236 Nov 12 17:48 {e81b1cc0-9d76-403a-9e8d-2ec3f276cc46}.sav
    -rwxr--r-- 1 <user> <user> 178621 Jan 30 16:03 {604fcd24-76ae-4542-ae36-c4fe24306110}.png
    -rwxr--r-- 1 <user> <user> 285212672 Jan 30 16:03 {604fcd24-76ae-4542-ae36-c4fe24306110}.mem
    -rwxr--r-- 1 <user> <user> 9860 Jan 30 16:03 {604fcd24-76ae-4542-ae36-c4fe24306110}.pvc
    -rwxr--r-- 1 <user> <user> 1024 Jan 30 16:04 {604fcd24-76ae-4542-ae36-c4fe24306110}.mem.trc
    -rwxr--r-- 1 <user> <user> 11355572 Jan 30 16:04 {604fcd24-76ae-4542-ae36-c4fe24306110}.sav
    drwxr-xr-x@ 10 <user> <user> 340 Jan 31 12:52 ..
    -rw-r--r-- 1 <user> <user> 196498 Jan 31 14:07 {43f5ee67-1da9-4f4f-bb0a-767f7aaa1b66}.png
    -rw-rw-rw- 1 <user> <user> 0 Jan 31 14:07 {43f5ee67-1da9-4f4f-bb0a-767f7aaa1b66}.mem
    drwxr-xr-x 16 <user> <user> 544 Jan 31 14:07 .
    -rw-rw-rw- 1 <user> <user> 11355572 Jan 31 14:07 {43f5ee67-1da9-4f4f-bb0a-767f7aaa1b66}.sav

    The last snapshot attempt at the bottom failed, and consists of a 0-byte .mem file. The PNG file looks fine, and I can't tell what the .sav file is, or if it is valid.
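    For anyone who wants to check their own Snapshots directory for the same symptom, here is a minimal Python sketch. It assumes only what the listing above shows, namely that a failed snapshot leaves behind a .mem file of zero bytes. The function name is made up for illustration, and the directory path is whatever your own .pvm bundle uses.

```python
from pathlib import Path


def find_empty_mem_files(snapshots_dir):
    """Return the names of zero-byte .mem files directly inside snapshots_dir, sorted."""
    root = Path(snapshots_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        # ".mem.trc" files have the suffix ".trc", so they are skipped here.
        if entry.is_file() and entry.suffix == ".mem" and entry.stat().st_size == 0
    )
```

    Calling find_empty_mem_files() on the Snapshots directory from the listing above should report only the {43f5ee67-...}.mem file, since the other .mem files are non-empty. It does not say anything about whether the matching .sav file is valid.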

    I have 18GB of disk space available, which is significantly more than any individual VM takes up in its entirety. My hard drive is fine: I verified the drive's SMART data with smartmontools, and the filesystem is OK per OS X Disk Utility.

    Not only am I losing data in my guest VMs, but I am also losing data on my host OS as I am forced to power it down uncleanly. This is a huge problem, and I have never had this problem with Parallels 3.0.
     
    Last edited: Feb 2, 2009
  2. John@Parallels

    John@Parallels Forum Maven

    Messages:
    6,333
    Try stopping the VM and deleting the .mem and .sav files, if there are any.
    Also try creating a snapshot while the VM is stopped. Is there any difference?
    - Lots of disk changes (500MB or more), or almost zero disk change
    It may take time to calculate the changes. What is the CPU activity at that time?

    - Guest VM still running from a previous snapshot from a previous Parallels build, or cleanly rebooted with the current build
    Is there a Snapshots folder, and what is inside?
     
  3. Forrest Gump

    Forrest Gump Bit poster

    Messages:
    36
    I tried deleting the .sav files, and it didn't make any of the existing snapshots appear in the GUI again. And it didn't prevent a new snapshot from deadlocking the host OS either. So no, that doesn't seem to help at all. However, creating a snapshot of a powered-off Linux VM does work. So it appears that the problem is with the snapshotting of a powered-on Linux VM, or the associated "Data Synchronization" process. Is there a way to disable the background "Data Synchronization" part of snapshotting? If so, maybe that could help in narrowing down the problem.

    CPU is normal at this time, but within about 20 seconds or so of system deadlock, any software that is used to monitor CPU usage (e.g. top from a terminal or Activity Monitor) stops responding. So it's not possible to tell CPU or disk usage at this point. As I mentioned in my original post, I can let the system go for hours and it will not proceed any further, so it's not a timing thing. Also, snapshotting a live VM where the disk has changed by nearly zero has the same results, so it's not that I'm being impatient.
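    One possible way to capture CPU activity right up to the hang is to log it to a file instead of watching it live. The following is only a sketch under assumptions: it records the system load average (not per-process CPU usage), the function name and log path are made up for illustration, and whether the last lines actually reach the disk depends on what exactly is deadlocked.

```python
import os
import time


def log_load_average(log_path, samples, interval_seconds=1.0):
    """Append timestamped 1/5/15-minute load averages to log_path."""
    with open(log_path, "a") as log:
        for i in range(samples):
            one, five, fifteen = os.getloadavg()
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} {one:.2f} {five:.2f} {fifteen:.2f}\n")
            log.flush()
            # Ask the OS to push the line to disk so it can survive a hard power-off.
            os.fsync(log.fileno())
            if i < samples - 1:
                time.sleep(interval_seconds)
```

    The idea would be to start this in a terminal shortly before taking the snapshot, then read the log after the forced restart to see the last samples written before the system stopped responding.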

    Yeah, the snapshots folder is there. It looks like what I posted in my original message above.
     
  4. John@Parallels

    John@Parallels Forum Maven

    Messages:
    6,333
  5. Forrest Gump

    Forrest Gump Bit poster

    Messages:
    36
    Done. The ticket ID is: 671216
     
  6. Forrest Gump

    Forrest Gump Bit poster

    Messages:
    36
    OK, I've done some further testing and this also happens with Windows VMs. It may have been just a coincidence that I was using Linux VMs at the time. I've also tried removing all non-Apple kext modules (aside from the 5 Parallels ones), and the system hangs still occur. Even while the VM is first booting up (e.g. at the GRUB menu), taking a snapshot results in a system hang as described above.
     
  7. Forrest Gump

    Forrest Gump Bit poster

    Messages:
    36
    Ok, IMPORTANT UPDATE HERE!
    I tried reverting to a previous build of Parallels 4.0 (4.0.3522 - November 8, 2008) and creating snapshots works just fine! I haven't yet narrowed down whether the culprit is the 4.0.3810 update or the incremental update that came out on January 23, 2009 (which still indicates that it's build 3810?!).
     
  8. John@Parallels

    John@Parallels Forum Maven

    Messages:
    6,333
    Please check ticket
     
  9. Forrest Gump

    Forrest Gump Bit poster

    Messages:
    36
    Ok, I think I've got this one nailed down. The snapshot problem appears to be tied to a faulty "PDMU4021" update, which was released in January. Part of the issue here is that there are two different 4.0.3810 builds:
    ParallelsDesktop-parallels-en_US-4.0.3810.237520.dmg
    ParallelsDesktop-parallels-en_US-4.0.3810.351321.dmg

    In my case, installing the former and then installing the PDMU4021 after that caused my snapshot symptoms above. However, installing the latter (which apparently includes PDMU4021) by itself works fine. John, you could perhaps try those steps to reproduce the problem. If the incremental PDMU4021 is at fault, then this could be affecting a good number of users.
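    To make the difference between the two installers easier to see, here is a small Python sketch that splits the DMG file names above into the advertised build and the trailing number. Calling that trailing number a "revision" is my own assumption, as is the function name; the sketch only shows that the two files share the same 4.0.3810 build string while differing in the last component.

```python
import re

# Matches the four dot-separated numbers right before ".dmg".
_DMG_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)\.dmg$")


def parse_dmg_version(filename):
    """Split a DMG file name into (build string, trailing number)."""
    match = _DMG_VERSION.search(filename)
    if match is None:
        raise ValueError(f"no version found in {filename!r}")
    major, minor, build, revision = match.groups()
    return f"{major}.{minor}.{build}", int(revision)


first = parse_dmg_version("ParallelsDesktop-parallels-en_US-4.0.3810.237520.dmg")
second = parse_dmg_version("ParallelsDesktop-parallels-en_US-4.0.3810.351321.dmg")
print(first[0] == second[0])  # True: both report themselves as 4.0.3810
print(first[1] == second[1])  # False: the trailing numbers differ
```

    This is why the build number shown in the GUI alone is not enough to tell which installer (or update) is actually on the machine.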

    Update
    Ok, further testing has shown that 4.0.3810.351321 will also deadlock the host OS. An easy test is to take a snapshot while a Linux VM is at the GRUB screen. I have yet to see a deadlock with 4.0.3522, so I'm going to stick with that for now. I will post updates if I have any.
     
    Last edited: Feb 3, 2009
  10. John@Parallels

    John@Parallels Forum Maven

    Messages:
    6,333
    I will check
     
  11. Forrest Gump

    Forrest Gump Bit poster

    Messages:
    36
    Just a quick note:
    Reverting to a previous version has the somewhat-expected side effect of interfering with reverting to snapshots created with a newer version of Parallels. Reverting to a newer snapshot requires:
    1) Killing the parallels VM task (It's running, but not connected to the GUI)
    2) Booting the VM up from a powered-off state

    Also, I'd be very interested if anybody (John or forum readers) can confirm or deny that they are seeing the symptoms that I am.
     
  12. William McCloskey

    William McCloskey Bit poster

    Messages:
    2
    I am experiencing and trying to troubleshoot this problem

    I've read quickly through this thread, and believe I am seeing the same thing. It just hung during a Windows snapshot, with the "Data synchronization..." message in the lower left corner spinning forever. I don't see any zero-size .mem files. This is a problem, and it has happened several times now. I have no idea how to "revert to other builds."

    Help.

    Bill McCloskey
     
  13. Forrest Gump

    Forrest Gump Bit poster

    Messages:
    36
    I've done some more investigation into this issue today, and my current theory is that the system deadlock is related to snapshotting a VM that lives in a user profile where FileVault is enabled.

    Bill: Do you have FileVault enabled?

    Also, I've noticed that Parallels occasionally leaves the "Data Synchronization" label in the corner even though it's done with it. If you did not experience a system-wide deadlock (hang), then maybe this is what's happening to you.
     
    Last edited: Mar 25, 2009
  14. William McCloskey

    William McCloskey Bit poster

    Messages:
    2
    No - File Vault is not enabled. I've experienced this not only on my Mac System disk, but a USB drive, and a network drive. Ehh... Writing/Reading from a 1 GBit interface to a 1.5 GBite file-server really cuts down on local disk thrashing, but I digress.

    And the spinning gear keeps going.

    FYI - Don't try putting the virtual machine on an Airport Extreme attached USB drive. Especially if you haven't upgraded the AE firmware. This can and did hang the entire AE, USB drive, and Parallels host computer. When I upgraded the firmware on the AE, things quieted down a little, but I experienced one or two more hangs - suffice it to say, my Parallels VM is now on a local attached firewire drive. Again, I digress.

    I've turned off my auto-snapshot until this problem gets fixed. Recovering a crashed VM is sometimes difficult if not impossible without a 10-second hold on the ol' power supply. FYI - I experienced a similar problem with an Open Solaris VM, but its symptom was that when I paused the VM, it never really quit. The spinning gear kept going and I ended up doing the ol' 10 second power hold, etc.

    Cheers.
     
  15. Forrest Gump

    Forrest Gump Bit poster

    Messages:
    36
    Hi Bill,

    What I meant by reverting to a previous build is this:
    1) Uninstall Parallels by using the app in the original Parallels DMG. (This may be optional; I'm not certain.)
    2) Install an older version of Parallels. e.g., the one I was using is this:
    http://download.parallels.com/desktop/v4/en_us/parallels/update1/Parallels-Desktop-4.0.3522.205912.dmg

    When I was using 4.0.3522, I could snapshot to my heart's content without fear of system deadlock. Newer builds exhibited the deadlock problem. Note that after reverting, you may not be able to jump to snapshots that were created with a newer version of Parallels. I forget exactly the situation, but you may be able to cold boot those snapshots. (i.e. rather than jump to the powered-on state, you may be able to boot it up from a powered off state)

    Anyway, if you test this out, please post your results here. If reverting to 4.0.3522 fixes your deadlock problem, then perhaps this could mean that the problem with the newer Parallels builds is not specific to FileVault, but rather that there is a more general issue such as a race condition that is triggered by FileVault or other things that can affect timing.
     
