I've been trying to learn more about Parallels, how it works, how it does things, why it does certain things, etc. This is a summary of what I've discovered so far. I've broken up my observations into 2 groups, observations while Parallels is running in normal operation, and when Parallels is suspending/resuming. My observations began while investigating some threads mentioned here, so some of my observations have been pulled from my posts to other threads. I have added more and condensed my observations into a single post.
I hope others find these observations useful, and I'd be happy if anyone would like to correct or clarify my observations. I've included information about my observation environment at the end of this post, and I'd be happy to help anyone reproduce my observations.
Parallels normal operation:
During normal guest OS operation, Parallels does 4k+ reads and writes, asynchronously. The async writes are potentially done one per thread, since a pthread_kill is issued and a SIGUSR2 is caught for every write. This async write method would explain why every write() is preceded by an lseek(), even for linearly sequential writes.
By far, the most common syscall performed by Parallels is an ioctl on /dev/vm-main. At least with a Fedora Core 5 guest running gnome-terminal, there appear to be many ioctls per cursor flash. I'm guessing at least some of these ioctls are for video operations.
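To illustrate the lseek()-then-write() pattern, here is a minimal Python sketch of my own (not Parallels code; the function names and the temp file are made up for the example). It compares the two-call approach with a single positioned write via pwrite(), which takes the offset as an argument and leaves the shared file offset alone.

```python
import os
import tempfile

BLOCK = 4096  # 4k block size, matching the smallest write size observed above

def write_block_seek(fd, block_no, data):
    """Two syscalls: move the descriptor's shared offset, then write."""
    os.lseek(fd, block_no * BLOCK, os.SEEK_SET)
    return os.write(fd, data)

def write_block_pwrite(fd, block_no, data):
    """One syscall: write at an explicit offset, file offset untouched."""
    return os.pwrite(fd, data, block_no * BLOCK)

fd, path = tempfile.mkstemp()
try:
    write_block_seek(fd, 0, b"a" * BLOCK)
    write_block_pwrite(fd, 1, b"b" * BLOCK)
    size = os.fstat(fd).st_size
    first = os.pread(fd, 1, 0)        # first byte of block 0
    second = os.pread(fd, 1, BLOCK)   # first byte of block 1
finally:
    os.close(fd)
    os.unlink(path)
```

If several threads really do share one descriptor, the positioned form also avoids a race between one thread's lseek() and another thread's write().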
The second most common syscall performed by Parallels is gettimeofday. These appear to happen almost once a second. Presumably for guest/host time synchronization.
Another odd behavior is that Parallels appears to statfs the user's home directory every few seconds (between 4 and 6 seconds). Presumably this is to see if the user's homedir is network mounted or some such. It's probably safe to say the user's home directory won't be changing filesystems once the app starts. All sorts of things will break if the user swaps homedirs out from underneath apps. Related to this, Parallels appears to use ~/.vmm-XXXXXX file(s) as temporary storage, which are immediately unlinked. Perhaps the constant statfs'ing of ~/ is part of an abstracted layer built around these temporary files. The size of this .vmm-XXXXXX file appears to correspond to the size of the virtual machine's memory. Perhaps it is the file that backs a vm's "physical" memory. The file gets mmapped into the Parallels process that runs the virtual machine. MAP_ANON would probably be a much better approach, if this is indeed the case.
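For anyone unfamiliar with the two approaches, here is a rough Python sketch of an unlinked, file-backed mapping next to an anonymous one. This is only my guess at the general technique, not Parallels' implementation; SIZE, the function names, and the use of the system temp directory instead of ~/ are all made up for the example.

```python
import mmap
import os
import tempfile

SIZE = 1024 * 1024  # hypothetical stand-in for the guest's memory size

def file_backed_region(directory, size):
    """Create a temp file, unlink it immediately, and map it."""
    fd, path = tempfile.mkstemp(prefix=".vmm-", dir=directory)
    os.unlink(path)         # the name is gone; storage lives until unmapped
    os.ftruncate(fd, size)  # grow the file to the size being mapped
    region = mmap.mmap(fd, size)
    os.close(fd)            # the mapping stays valid without our descriptor
    return region

def anonymous_region(size):
    """Map memory with no file behind it (the MAP_ANON approach)."""
    return mmap.mmap(-1, size)

backed = file_backed_region(tempfile.gettempdir(), SIZE)
anon = anonymous_region(SIZE)
backed[:5] = b"guest"
anon[:5] = b"guest"
backed_len, anon_len = len(backed), len(anon)
backed_head, anon_head = backed[:5], anon[:5]
backed.close()
anon.close()
```

Both regions behave the same from the process's point of view; the difference is that dirty pages of the file-backed one can be written out to the filesystem that holds the home directory.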
Whenever a parallels window (not a guest OS' window) is selected or moved, parallels synchronizes its preferences, presumably to preserve window location for next launch, in the event of a crash/panic/etc. The synchronization process appears to be rather expensive, touches many files, and does many system calls. The expense is incurred by CFPreferencesSynchronize, not by parallels, but minimizing calls to sync prefs would probably help.
Another fun little tidbit is Parallels seems to like to check for the file /var/db/.AccessibilityAPIEnabled. Presumably, this is a side effect of some high level API Parallels is using, but it seems to happen fairly frequently, particularly when suspending/resuming.
Parallels appears to use Carbon, or a UI toolkit that uses Carbon, for at least some of its file operations. This can be seen by its use of volfs (/.vol) to access files. Additionally, they appear to use C++ somewhat extensively. You can see that they link against libstdc++, and nm also shows several C++ symbols. My guess is they are using Qt to do the UI, probably on Linux and Windows as well, which contributed to the relatively fast Mac OS X port. The early betas for Mac OS X still had Qt symbols in them.
File persistence: To successfully sync data out of kernel and drive caches on Mac OS X, one needs to do an fcntl(F_FULLFSYNC). This is what we need to look for to see if Parallels is telling the host to sync data to disk when a guest tells Parallels to do the same. When using a Linux guest and doing an fsync(), an fdatasync(), or an hdparm cache flush, no fsync or fcntl call is generated in the host environment. It appears "safe" writes in the vm do not correspond to "safe" writes in the host.
Fun tidbit: it appears Parallels has used some of their Linux port to get to market quickly on Mac OS X. Some things are still left over from the Linux port, such as trying to open /proc/parallels/mem.
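For reference, this is roughly what a host-side "safe" flush looks like. It is a small Python sketch of my own (flush_to_disk is a made-up name) that uses F_FULLFSYNC where the platform defines it and falls back to a plain fsync() elsewhere, so it also runs on non-Mac hosts.

```python
import fcntl
import os
import tempfile

def flush_to_disk(fd):
    """Flush fd as far down as the platform allows; return the method used."""
    full = getattr(fcntl, "F_FULLFSYNC", None)  # defined on Mac OS X only
    if full is not None:
        try:
            fcntl.fcntl(fd, full)  # ask the drive to flush its own cache too
            return "F_FULLFSYNC"
        except OSError:
            pass  # assumption: some filesystems reject it, so fall back
    os.fsync(fd)  # flushes kernel buffers; drive cache may still hold data
    return "fsync"

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"important data")
    method = flush_to_disk(fd)
finally:
    os.close(fd)
    os.unlink(path)
```

Either of these calls would show up in fs_usage or ktrace on the host, which is why their absence is telling.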
When Parallels launches a virtual machine, it runs 2 processes. The first is where most of the VM activity takes place. The second doesn't appear to actually do anything. While the first process is running, the second appears to just sit in select() waiting on a pipe for, presumably, a notification from the first process saying it has finished.
Parallels has a ~/.parallels_settings file, which lists your recent virtual machines, for use in the dialog box for creating a new vm, opening an existing one, or opening a recent vm. It also stores the location of the main Parallels window on the screen. There is also a /Library/Parallels/.parallels_common_options file, which appears to just specify a maximum memory for all currently running VMs.
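The select()-on-a-pipe arrangement is a common way for one process to wait for another. Here is a minimal Python sketch of the pattern as I understand it; run_and_wait and the one-byte notification are my own invention, not anything observed in Parallels.

```python
import os
import select

def run_and_wait(work):
    """Fork a worker, then block in select() until it signals on a pipe."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: do the real work, then notify the waiting parent.
        os.close(read_end)
        try:
            work()
            os.write(write_end, b"x")
        finally:
            os._exit(0)
    # Parent: nothing to do except sleep in select() on the pipe.
    os.close(write_end)
    select.select([read_end], [], [])
    note = os.read(read_end, 1)
    os.close(read_end)
    os.waitpid(pid, 0)  # reap the child
    return note

result = run_and_wait(lambda: None)
```

A process waiting like this uses essentially no CPU, which would match a process that doesn't appear to do anything.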
The license file appears to be stored in /Library/Parallels/.parallels_license_2.1 and just stores the license key, and who parallels is registered to.
The Parallels kexts are located at /System/Library/Extensions/hypervisor.kext and /System/Library/Extensions/vmmain.kext.
When a virtual machine is in use, a .<vmname>.pvs.lock file is created in the same directory as the .pvs file, which describes the virtual machine. This lock file can become stale if Parallels crashes.
Parallels also listens on an IPv4 socket on localhost port 5679 although I have not been able to capture any traffic over this port using tcpdump on lo0. Piping random data to it seemed to have no effect.
For memory, Parallels appears to use a combination of malloc and vm_allocate. Presumably, the vm_allocated regions are for shuttling data between the kexts and the Parallels app itself.
CD/DVD access appears to be exclusive at the IOKit level. Data is read/written to and from the dvd without read/write system calls, but rather through ioctls to /dev/vm-main. This also appears to be asynchronous and done in a separate thread. The rest of the host OS does not have "normal" access to the drive. There is no device node for the device when Parallels has captured exclusive access to it.
When you click on a parallels window to select it or drag it, the guest vm's display is not updated, although the VM appears to continue executing just fine. It appears host window updates are just disabled while the mouse is down on the window.
Parallels suspending/resuming:
Parallels suspends and resumes by doing 16k synchronous reads/writes from/to its .sav file. On suspend, each write appears to be preceded by ~3 lseeks to the exact same offset. Presumably this is because they are using a high level API and each layer of abstraction needs to ensure it is writing to the right location. 128k reads/writes could provide a
Between reads of the .sav file when resuming, Parallels appears to do an fstat, 2 ioctls, and an lseek. I'm not sure what the fstat is for. The ioctls are presumably passing the data it just read, or information about that data, to the kext via the device node. The lseek just baffles me: resumes appear to be sequential linear reads of the .sav file, so why lseek to the position you're already at? Presumably, this is part of some API abstraction. Sometimes there can be up to 4 lseeks in a row, seeking to the exact same location in the file. Sometimes it seeks backwards, and then back to where it was, with no file access in between.
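To show how the chunk size affects the number of system calls for a sequential read, here is a small Python sketch of my own. read_sequential and the 1 MB test file are made up for the example, and it only counts read() calls; it says nothing about actual timings.

```python
import os
import tempfile

def read_sequential(path, chunk_size):
    """Read a file front to back with no seeks; return (bytes, read calls)."""
    total = calls = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)  # offset advances on its own
            calls += 1
            if not chunk:  # empty result means end of file
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total, calls

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * (1024 * 1024))  # 1 MB stand-in for a .sav file
    total_16k, calls_16k = read_sequential(path, 16 * 1024)
    total_128k, calls_128k = read_sequential(path, 128 * 1024)
finally:
    os.unlink(path)
```

Note that no lseek is needed anywhere: each read() leaves the file offset exactly where the next read should start.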
All observations were made with beta5 on a MacBook Pro 2.0GHz w/2GB memory.
The primary guest OS was Fedora Core 5 with VT-x disabled. Observations were made using fs_usage, ktrace, gdb, and nm. My home directory was on the local disk.