Multiple reboots on a Mac Xserve Intel running five Parallels VMs

Discussion in 'General Questions' started by amichon, Aug 7, 2007.

  1. amichon

    amichon Bit poster

    Messages:
    4
    Hello,

    Situation: I have two Intel Mac Xserves with 8 GB of memory and 2 x 2.6 GHz processors, connected to an Xserve RAID through Xsan. Both run Parallels, one with five virtual machines and the other with two. All files needed by the virtual machines are stored on the Xserve RAID.

    For the past month, the first Xserve (with 5 VMs) has rebooted every 9-12 days, and the second one has also rebooted during that month. We have investigated all possible hardware problems without any result. Parallels is the only application running on the two Xserves, so we suspect Parallels.


    An extract from the logs just before the reboot (/var/log/system) :
    Aug 2 08:39:26 gridmac01 kernel[0]: [err in mmIoWirePages ] :
    Failed to lock memory region
    -> ../host.Darwin/mm/io_memory.cpp:531
    Aug 2 08:39:26 gridmac01 kernel[0]: [err in hostLockPages ] :
    Failed to wire guest memory pages down
    -> ../host.Darwin/mm/generic.c:279
    Aug 2 08:39:26 gridmac01 kernel[0]: [err in pmmLockRegion ] :
    Can't Lock memory in PMM region
    -> pmm.c:206
    Aug 2 08:39:26 gridmac01 kernel[0]: [err in mmIoWirePages ] :
    Failed to lock memory region
    -> ../host.Darwin/mm/io_memory.cpp:531
    Aug 2 08:39:26 gridmac01 kernel[0]: [err in hostLockPages ] :
    Failed to wire guest memory pages down
    -> ../host.Darwin/mm/generic.c:279
    Aug 2 08:39:26 gridmac01 kernel[0]: [err in pmmLockRegion ] :
    Can't Lock memory in PMM region
    -> pmm.c:206
    Aug 2 08:40:19 gridmac01 kernel[0]: [err in mmIoWirePages ] :
    Failed to lock memory region
    -> ../host.Darwin/mm/io_memory.cpp:531
    Aug 2 08:40:19 gridmac01 kernel[0]: [err in hostLockPages ] :
    Failed to wire guest memory pages down
    -> ../host.Darwin/mm/generic.c:279
    Aug 2 08:40:19 gridmac01 kernel[0]: [err in pmmLockRegion ] :
    Can't Lock memory in PMM region
    -> pmm.c:206
    Aug 2 08:45:49 localhost kernel[0]: FC: Loop Initialization Packet for SCSI Domain = 1.
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 7 (Link Status Change) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionFC: Link is active for SCSI Domain = 1.
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 9 (Logout) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 9 (Logout) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 6 (Rescan) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 9 (Logout) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 9 (Logout) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 9 (Logout) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 9 (Logout) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 9 (Logout) for SCSI Domain = 1
    Aug 2 08:45:49 localhost kernel[0]: HFS: Removed 11 orphaned unlinked files
    Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 22 (SAS Discovery) for SCSI Domain = 0
    Aug 2 08:45:49 localhost kernel[0]: Discovery condition = 0x00000000
    Aug 2 08:45:49 localhost kernel[0]: Jettisoning kernel linker.
    Aug 2 08:45:49 localhost kernel[0]: Resetting IOCatalogue.
    Aug 2 08:45:49 localhost kernel[0]: Matching service count = 0
    Aug 2 08:45:49 localhost kernel[0]: Matching service count = 9
    Aug 2 08:45:49 localhost kernel[0]: Matching service count = 9
    Aug 2 08:45:49 localhost kernel[0]: Matching service count = 9
    Aug 2 08:45:49 localhost kernel[0]: Matching service count = 9
    Aug 2 08:45:49 localhost kernel[0]: Matching service count = 9
    Aug 2 08:45:49 localhost kernel[0]: Previous Shutdown Cause: -62
    Aug 2 08:45:49 localhost kernel[0]: Apple16X50ACPI1: IdentifiedSerial Port on ACPI Device=UAR1
    Aug 2 08:45:49 localhost kernel[0]: Apple16X50ACPI::start FOUND DB9 Property for AAPL,connector
    Aug 2 08:45:49 localhost kernel[0]: Apple16X50UARTSync: Detected 16550AF/C/CF IFO=16 MaxBaud=115200
    Aug 2 08:45:49 localhost mDNSResponder-108.5 (May 9 2007 17:14:16)[69]: starting
    Aug 2 08:45:49 localhost memberd[81]: memberd starting up
    Aug 2 08:45:49 localhost DirectoryService[87]: Launched version 2.1 (v353.6)
    Aug 2 08:45:50 localhost lookupd[88]: lookupd (version 369.6) starting - Thu Aug 2 08:45:50 2007
    Aug 2 08:45:50 localhost watchdogtimerd: Automatic reboot timer enabled.\n
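
    To get an overview of an excerpt like the one above, the "[err in ...]" kernel messages can be tallied per timestamp and function. This is only a minimal sketch in Python, assuming the syslog line layout seen in the excerpt; the names ERR_LINE, count_kernel_errors and sample are illustrative and not part of any Parallels or Apple tool.

```python
import re
from collections import Counter

# Matches kernel lines of the form seen in the excerpt, e.g.
#   Aug 2 08:39:26 gridmac01 kernel[0]: [err in mmIoWirePages ] :
# The layout is assumed from the excerpt only.
ERR_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel\[0\]:\s+"
    r"\[err in (?P<func>\w+)\s*\]"
)


def count_kernel_errors(lines):
    """Return a Counter keyed by (timestamp, function) for '[err in ...]' lines."""
    counts = Counter()
    for line in lines:
        match = ERR_LINE.match(line.strip())
        if match:
            counts[(match.group("stamp"), match.group("func"))] += 1
    return counts


# Hypothetical input: a few lines in the same shape as the excerpt.
sample = [
    "Aug 2 08:39:26 gridmac01 kernel[0]: [err in mmIoWirePages ] :",
    "Failed to lock memory region",
    "Aug 2 08:39:26 gridmac01 kernel[0]: [err in hostLockPages ] :",
    "Aug 2 08:39:26 gridmac01 kernel[0]: [err in mmIoWirePages ] :",
    "Aug 2 08:40:19 gridmac01 kernel[0]: [err in pmmLockRegion ] :",
    "Aug 2 08:45:49 localhost kernel[0]: Previous Shutdown Cause: -62",
]

for (stamp, func), n in sorted(count_kernel_errors(sample).items()):
    print(stamp, func, n)
```

    Run against the full log, a tally like this would show whether the memory-locking errors cluster in the minutes before each reboot or appear throughout the uptime.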

    Any ideas on how to solve this problem? Has anybody already had the same problem?


    I look forward to hearing from you and thank you for your time and consideration.
    Alexis
     
  2. amichon

    More information:

    This is the last entry in /Library/logs/panic.log:

    *********

    Wed Aug 22 01:18:54 2007
    panic(cpu 1 caller 0x00141505): zalloc: "kalloc.256" (1560288 elements) retry fail 3
    Backtrace, Format - Frame : Return Address (4 potential args on stack)
    0x4bc3bcf8 : 0x128d08 (0x3cc0a4 0x4bc3bd1c 0x131de5 0x0)
    0x4bc3bd38 : 0x141505 (0x3ccdb0 0x3cc34c 0x17cee0 0x3)
    0x4bc3bd98 : 0x12d97d (0x186eb70 0x1 0x4bc3bdc8 0x12db75)
    0x4bc3bdc8 : 0x12e192 (0xa8 0x1 0x4bc3be18 0x140259)
    0x4bc3bdf8 : 0x646eef36 (0xa8 0xa89b400 0x4bc3be58 0x1a2e7e)
    0x4bc3be28 : 0x646ea441 (0x64744a48 0x0 0x603d98e0 0x20)
    0x4bc3be98 : 0x647080d8 (0x18 0x0 0x2 0x0)
    0x4bc3bee8 : 0x64708bab (0xa435000 0xaf22e00 0x18 0x646eefd1)
    0x4bc3bf78 : 0x646ed882 (0xa435000 0xa435014 0xae918c4 0x0)
    0x4bc3bfc8 : 0x19ad2c (0xae918b0 0x0 0x19e0b5 0xa2f9e68)
    Backtrace terminated-invalid frame pointer 0x0
    Kernel loadable modules in backtrace (with dependencies):
    com.apple.filesystems.acfs(2.7.2)@0x6468e000
    dependency: com.apple.iokit.IOStorageFamily(1.5.1)@0x6375a000

    Kernel version:
    Darwin Kernel Version 8.10.1: Wed May 23 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386


    *********
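
    For a sense of scale, the element count in the panic line above can be converted into a size. A minimal sketch in Python, assuming the number in the zone name "kalloc.256" is the element size in bytes (that is the usual kalloc zone naming, but it is an assumption here); the variable names are illustrative.

```python
import re

# Panic line copied from the log above.
panic_line = (
    'panic(cpu 1 caller 0x00141505): zalloc: "kalloc.256" '
    "(1560288 elements) retry fail 3"
)

# Extract the zone name suffix and the element count from the panic line.
match = re.search(r'"kalloc\.(\d+)" \((\d+) elements\)', panic_line)
element_size = int(match.group(1))   # assumed to be bytes per element
element_count = int(match.group(2))

zone_bytes = element_size * element_count
zone_mib = zone_bytes / (1024 * 1024)
print(f"{zone_bytes} bytes, about {zone_mib:.1f} MiB")
```

    Under that assumption, the zone held roughly 381 MiB of 256-byte allocations when the allocation failed.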

    Acfs is the file system used by Xsan, but I have four other Xserves that read from and write to it without any problems.
    Any ideas?

    Alexis Michon
     
  3. Eru Ithildur

    Eru Ithildur Forum Maven

    Messages:
    1,954
    I had a similar issue before; it turned out that a memory module had become unseated during transportation. Shut everything down and run a memtest.
     
  4. amichon

    Thank you for your answer.

    I run several memtests after each reboot and they all pass. No problems.
    All memory modules (8 x 1 GB) were replaced in March 2007.

    Like you, I think that my problem is linked to memory, but all the memtests pass, so ...
    IMO, there are two possibilities:
    - a very, very small problem with a memory module.
    - Parallels doesn't like acfs (Xserve RAID).

    I hope that this isn't a bug in Mac OS X. Somebody told me that the upgrade to version 10.4.10 breaks Xsan, but gave no additional information.


    Alexis
     
  5. Eru Ithildur

    There are some issues with Xsan and 10.4.10. Have you tried rolling back? It doesn't look like an issue with the storage though. Also, the issues are not always this bad.

    I think I just might have found the issue though. I should have zeroed in on it the first time. What is Fusion doing in your logs (did you ever install Fusion)? It looks like the two may be conflicting, which is a known problem. Have you tried removing both completely from your system and reinstalling Parallels?

    I don't run Parallels on an xRAID, although I believe in the past there was some discussion about Parallels not always behaving well on a RAID array.

    Let me know about the Fusion.
     
  6. amichon

    Fusion? VMware Fusion? It was never installed on this machine. These messages contain the string "SCSI", so I suppose they are related to the xRAID.

    Rolling back from 10.4.10 isn't possible.
     
  7. Eru Ithildur

    Well, it might be. I don't administer RAIDs (though if you caught me a month or two from now, I would have been studying them), and I haven't seen their logs. I do know someone who does; I'll bug him today.
     
