Hello,

Situation: I have two Intel Mac Xserves, each with 8 GB of memory and 2 x 2.6 GHz, connected to an Xserve RAID through Xsan. Both run Parallels, one with five virtual machines and the other with two. All files needed by the virtual machines are stored on the Xserve RAID.

For the past month, the first Xserve (with 5 VMs) has rebooted every 9-12 days, and the second has also rebooted during that month. We have investigated all possible hardware problems without any result. Parallels is the only application running on the two Xserves, so we suspect Parallels.

An extract from the logs just before the reboot (/var/log/system):

Aug 2 08:39:26 gridmac01 kernel[0]: [err in mmIoWirePages ] : Failed to lock memory region -> ../host.Darwin/mm/io_memory.cpp:531
Aug 2 08:39:26 gridmac01 kernel[0]: [err in hostLockPages ] : Failed to wire guest memory pages down -> ../host.Darwin/mm/generic.c:279
Aug 2 08:39:26 gridmac01 kernel[0]: [err in pmmLockRegion ] : Can't Lock memory in PMM region -> pmm.c:206
Aug 2 08:39:26 gridmac01 kernel[0]: [err in mmIoWirePages ] : Failed to lock memory region -> ../host.Darwin/mm/io_memory.cpp:531
Aug 2 08:39:26 gridmac01 kernel[0]: [err in hostLockPages ] : Failed to wire guest memory pages down -> ../host.Darwin/mm/generic.c:279
Aug 2 08:39:26 gridmac01 kernel[0]: [err in pmmLockRegion ] : Can't Lock memory in PMM region -> pmm.c:206
Aug 2 08:40:19 gridmac01 kernel[0]: [err in mmIoWirePages ] : Failed to lock memory region -> ../host.Darwin/mm/io_memory.cpp:531
Aug 2 08:40:19 gridmac01 kernel[0]: [err in hostLockPages ] : Failed to wire guest memory pages down -> ../host.Darwin/mm/generic.c:279
Aug 2 08:40:19 gridmac01 kernel[0]: [err in pmmLockRegion ] : Can't Lock memory in PMM region -> pmm.c:206
Aug 2 08:45:49 localhost kernel[0]: FC: Loop Initialization Packet for SCSI Domain = 1.
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 7 (Link Status Change) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionFC: Link is active for SCSI Domain = 1.
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 9 (Logout) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 9 (Logout) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 6 (Rescan) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 9 (Logout) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 9 (Logout) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 9 (Logout) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 9 (Logout) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 9 (Logout) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: HFS: Removed 11 orphaned unlinked files
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 22 (SAS Discovery) for SCSI Domain = 0
Aug 2 08:45:49 localhost kernel[0]: Discovery condition = 0x00000000
Aug 2 08:45:49 localhost kernel[0]: Jettisoning kernel linker.
Aug 2 08:45:49 localhost kernel[0]: Resetting IOCatalogue.
Aug 2 08:45:49 localhost kernel[0]: Matching service count = 0
Aug 2 08:45:49 localhost kernel[0]: Matching service count = 9
Aug 2 08:45:49 localhost kernel[0]: Matching service count = 9
Aug 2 08:45:49 localhost kernel[0]: Matching service count = 9
Aug 2 08:45:49 localhost kernel[0]: Matching service count = 9
Aug 2 08:45:49 localhost kernel[0]: Matching service count = 9
Aug 2 08:45:49 localhost kernel[0]: Previous Shutdown Cause: -62
Aug 2 08:45:49 localhost kernel[0]: Apple16X50ACPI1: Identified Serial Port on ACPI Device=UAR1
Aug 2 08:45:49 localhost kernel[0]: Apple16X50ACPI::start FOUND DB9 Property for AAPL,connector
Aug 2 08:45:49 localhost kernel[0]: Apple16X50UARTSync: Detected 16550AF/C/CF FIFO=16 MaxBaud=115200
Aug 2 08:45:49 localhost mDNSResponder-108.5 (May 9 2007 17:14:16)[69]: starting
Aug 2 08:45:49 localhost memberd[81]: memberd starting up
Aug 2 08:45:49 localhost DirectoryService[87]: Launched version 2.1 (v353.6)
Aug 2 08:45:50 localhost lookupd[88]: lookupd (version 369.6) starting - Thu Aug 2 08:45:50 2007
Aug 2 08:45:50 localhost watchdogtimerd: Automatic reboot timer enabled.\n

Any ideas on how to solve this big problem? Has anybody already had the same problem? I look forward to hearing from you, and thank you for your time and consideration.

Alexis
More information:

This is the last entry in /Library/logs/panic.log:

*********
Wed Aug 22 01:18:54 2007
panic(cpu 1 caller 0x00141505): zalloc: "kalloc.256" (1560288 elements) retry fail 3
Backtrace, Format - Frame : Return Address (4 potential args on stack)
0x4bc3bcf8 : 0x128d08 (0x3cc0a4 0x4bc3bd1c 0x131de5 0x0)
0x4bc3bd38 : 0x141505 (0x3ccdb0 0x3cc34c 0x17cee0 0x3)
0x4bc3bd98 : 0x12d97d (0x186eb70 0x1 0x4bc3bdc8 0x12db75)
0x4bc3bdc8 : 0x12e192 (0xa8 0x1 0x4bc3be18 0x140259)
0x4bc3bdf8 : 0x646eef36 (0xa8 0xa89b400 0x4bc3be58 0x1a2e7e)
0x4bc3be28 : 0x646ea441 (0x64744a48 0x0 0x603d98e0 0x20)
0x4bc3be98 : 0x647080d8 (0x18 0x0 0x2 0x0)
0x4bc3bee8 : 0x64708bab (0xa435000 0xaf22e00 0x18 0x646eefd1)
0x4bc3bf78 : 0x646ed882 (0xa435000 0xa435014 0xae918c4 0x0)
0x4bc3bfc8 : 0x19ad2c (0xae918b0 0x0 0x19e0b5 0xa2f9e68)
Backtrace terminated-invalid frame pointer 0x0
Kernel loadable modules in backtrace (with dependencies):
com.apple.filesystems.acfs(2.7.2)@0x6468e000
dependency: com.apple.iokit.IOStorageFamily(1.5.1)@0x6375a000
Kernel version:
Darwin Kernel Version 8.10.1: Wed May 23 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386
*********

Acfs is the file system on the Xsan, but I have four other Xserves that read and write on it without any problems.

Any ideas?

Alexis Michon
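For anyone reading the panic line in that entry, a rough way to put a size on it is to multiply the element count by the element size. The sketch below is only an illustration: it assumes the numeric suffix of a kalloc zone name (256 here) is the element size in bytes, and the names PANIC_LINE and zone_usage are made up for this example.

```python
import re

# Panic line copied from the panic.log entry quoted in the post.
PANIC_LINE = 'panic(cpu 1 caller 0x00141505): zalloc: "kalloc.256" (1560288 elements) retry fail 3'


def zone_usage(line):
    """Return (zone name, element count, estimated bytes) for a zalloc panic line."""
    m = re.search(r'zalloc: "([\w.]+)" \((\d+) elements\)', line)
    if m is None:
        raise ValueError("not a zalloc panic line")
    zone, count = m.group(1), int(m.group(2))
    # Assumption: the suffix after the dot is the element size in bytes.
    elem_size = int(zone.rsplit(".", 1)[1])
    return zone, count, count * elem_size


zone, count, total = zone_usage(PANIC_LINE)
print(f"{zone}: {count} elements, about {total / 2**20:.0f} MiB")
```

Under that assumption the zone was holding roughly 381 MiB of small allocations when the allocator gave up, which would point toward kernel memory exhaustion rather than a bad memory module, although it does not say which component was doing the allocating.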
I had a similar issue before; it turned out that a memory module had become unseated during transportation. Shut everything down and run a memtest.
Thank you for your answer. I run Memtest several times at each reboot and it is always OK. No problems. All memory modules (8 x 1 GB) were replaced in March 2007. Like you, I think my problem is linked to memory, but all the memtests are OK, so ...

IMO, there are two possibilities:
- a very, very small problem with a memory module.
- Parallels doesn't like acfs (Xserve RAID).

I hope this isn't a bug in Mac OS X; somebody told me that the upgrade to version 10.4.10 breaks Xsan, but without giving any additional information.

Alexis
There are some issues with Xsan and 10.4.10. Have you tried rolling back? It doesn't look like an issue with the storage, though. Also, the issues are not always this bad.

I think I might just have found the issue, though; I should have zeroed in on it the first time. What is Fusion doing in your logs (did you ever install Fusion)? It looks like the two may be conflicting (a known problem). Have you tried removing both completely from your system and reinstalling Parallels?

I don't run Parallels on an xRAID, although I believe there was some discussion in the past about Parallels not always behaving well on a RAID array. Let me know about the Fusion.
Fusion? VMware Fusion? It was never installed on this machine. These messages contain the string "SCSI", so I suppose they are linked to the xRAID.

Rolling back from 10.4.10 isn't possible.
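One way to check where the "Fusion" lines come from is to tally the kernel messages by their leading tag. This is a minimal sketch, not a finished tool: SAMPLE holds a few lines copied from the log excerpt in the first post, and KERNEL_RE and tally_tags are hypothetical names chosen for this example.

```python
import re
from collections import Counter

# A few lines copied from the log excerpt in the first post.
SAMPLE = """\
Aug 2 08:39:26 gridmac01 kernel[0]: [err in mmIoWirePages ] : Failed to lock memory region -> ../host.Darwin/mm/io_memory.cpp:531
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 8 (Loop State Change) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: FusionFC: Loop Initialization Packet for SCSI Domain = 1.
Aug 2 08:45:49 localhost kernel[0]: FusionMPT: Notification = 9 (Logout) for SCSI Domain = 1
Aug 2 08:45:49 localhost kernel[0]: Previous Shutdown Cause: -62
"""

# Matches either "kernel[0]: [err in func ]" or "kernel[0]: Tag:".
KERNEL_RE = re.compile(r'kernel\[0\]: (?:\[err in (\w+)\s*\]|(\w[\w ]*?):)')


def tally_tags(text):
    """Count kernel messages per leading tag (or per function for '[err in ...]' lines)."""
    counts = Counter()
    for line in text.splitlines():
        m = KERNEL_RE.search(line)
        if m:
            counts[m.group(1) or m.group(2)] += 1
    return counts


for tag, n in tally_tags(SAMPLE).most_common():
    print(f"{n:3d}  {tag}")
```

On the sample, the only "Fusion" tags are FusionMPT and FusionFC, and every one of those lines mentions a SCSI domain, which fits the reading that they come from the storage side rather than from a virtualization product.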
Well, it might be. I don't administer RAIDs (if you had caught me a month or two from now, I would have been studying), and I don't have their logs. I do know someone who does; I'll bug him today.