Please don't do 3D like VMware!

Discussion in 'Parallels Desktop for Mac' started by hoju, Mar 2, 2007.

  1. hoju


    Finally got a look at Fusion Beta 2, and I can say without a doubt Parallels has nothing to worry about.

    The long-awaited virtual 3D acceleration is really just a hack. They are intercepting DirectX 8.1 calls and translating them to OpenGL, yet ironically they do not support OpenGL in the guest - and have no plans to. Because of this approach, only Windows benefits from virtual 3D acceleration, and just barely.

    I think this is completely the wrong approach. Why not pick a GPU instruction set and emulate it in a fashion like any other device... then existing drivers could work, and the grunt work can be left to the driver. Not only that, but any platform that supported that GPU would benefit.

    OpenGL is important for a lot of reasons, not least of which is the entire Linux platform. Actually, any platform that isn't M$... and even then, not everyone drinks the DirectX kool-aid.

    I hope Parallels takes their time and does a proper 3D virtualization that is not targeted at Windoze only... that would severely limit its usefulness.

    Fusion Beta 2, overall, is the best thing to happen to Parallels' market share.

    (They benchmarked at somewhere around 65% of Parallels RC3 in Beta 1 and blamed it on the debug code - an old and lame excuse - and Beta 2 is no leaner.) Even in its free Beta 2 state, VMware is grossly overpriced and not worth the disk space it takes to evaluate.

    And what is the deal with the 3 kexts VMware installs?

    I was glad for one well written piece of VMware software - the uninstaller tool works very well!
  2. unused_user_name


    Parallels also installs some kernel extensions... They need a driver to use their hypervisor.

    Also, emulating a video card would be a BAD idea. The whole point of hardware acceleration is that it is running in hardware, which is thousands of times faster than software emulation.

    They could pass commands directly to the video card through a software-based filter that marks up the video card instructions so that they all work in a window (or whatever changes are needed). This would have a downside of making all video card operations go through the CPU, so some measurements would have to be taken to figure out how much slower that would make graphics operations.

    This filter would have to be a kernel extension, for obvious reasons...

    I find it funny that VMware did DirectX support first. Supporting OpenGL on a Mac would be fairly trivial: just pass the calls to the underlying OS. DirectX is harder because a Mac does not support it.
  3. dkp


    I got the feeling they were leveraging existing code from the VMware libraries for other platforms.
  4. xboxfan


    Why VMs don't do video in hardware

    This is both right and wrong. Yes, you want the hardware to do the work. Yes, the fact that Parallels needs to play nice with the host OS complicates virtualizing the video card enormously. No, software filtering doesn't mean all video ops would be CPU-bound.

    Video cards work just like every other device. The CPU communicates with them by poking values into memory (MMIO) and receiving interrupts. Typically an entire command buffer is written, then the video card is told where to find it, and then the video card tells the CPU when it's done with the command buffer so that the CPU can reclaim it. This is how video cards avoid being CPU-bound.

    Video cards are hard to virtualize for several reasons. Unlike the CPU, they don't have any virtualization built in. They expect to have direct access to memory, and when they access memory they don't cause faults, which the VM uses to manage memory. The video card doesn't manage memory like the CPU does (with PTEs and TLBs and whatnot). Consequently, it's very difficult or impossible for the VM and the host OS to both talk to the video hardware at the same time.

    USB, networking, and disk controllers would also have this problem, except that VMs can essentially insert hubs in the first two cases (USB and NICs already support that concept), and they already need to do some interception in the last case (redirecting your file I/O into the VM's file system instead of directly accessing the host disk), where the additional work is completely dwarfed by the time the I/O takes anyway.

    In the case of video, you've really got only two options. Either the VM has to do the emulation in software, intercepting either those low-level MMIO pokes, or higher-level commands to the guest video driver, or even higher-level API routines; or else the VM has to take over the host OS's video processing and enable itself to render the guest OS as an overlay on the host OS's render. But doing this would require detailed knowledge of how the video card works, and unfortunately nVidia and ATI don't publish those details (like Intel does for its CPUs).

    That's why essentially every VM out there, except for Xbox backwards compatibility, has to do software emulation of the video card. (And Xbox backwards compatibility still has to translate nVidia to ATI, which is quite a feat.)
  5. hoju


    If Xbox can do it, then I am sure a VM company can...

    And I think a logical approach would be to partner with a company like ATI, so those "hidden details" are not so hidden... and with the talk of ATI looking at a VT-like approach in future GPUs (and what with ATI being primary on the mactels), that would be a very logical partnering.

    From what I could tell Parallels installs one kext, not three.

    Perhaps that is why VMware is 3x slower (kidding).

    I didn't really follow the logic of unused_user_name, but xboxfan has some good points. Frankly, if you are intercepting DirectX calls and translating them to OpenGL instructions, I can't really see how the CPU would not be involved - so it was sort of a lame-duck argument, no?

    I think the BAD idea is the one VMware adopted. The smart folks at Parallels surely have something better up their sleeve.
  6. ehurtley


    Xbox can do it because the emulator can have full control over the video chip. When you stick in your Xbox (original) game, the emulator runs *AS* the OS, then runs the game inside that. (Imagine it as booting from a CD that runs Parallels directly, with a minimal underlying OS.)

    VM software on a PC can't. Parallels (under Windows and Mac anyway; not positive about Linux) CAN'T take over the video chip because the OS won't let it. And if it took exclusive control WITHIN the OS, it could crash the OS's video drivers, effectively taking the whole system down with it.
  7. Hugh Watkins

    Hugh Watkins

    For a stationary Mac with two video cards and two monitors,
    might it be possible to dedicate one card to WinXP alone?

    Then put in a "gamers" card -
    I think the Mac hardware slots support up to 8 monitors / cards?

    Hugh W
  8. alvins


    All you people saying they should've done it this way or should be doing it that way: please, quit your bitching. I am sure they are doing their best, and we do not know what issues they face doing it a particular way.

    The fact that it is taking quite a while for both Parallels and VMware to bring out full-speed 3D acceleration is testament to that. It is complex. If either company had found a way, it would be there, because whichever company gets it out first will be heralded with much fanfare.

    If you think you can do it better.. please go ahead.
  9. hoju


    alvins, are you championing their honor? Rescuing a damsel in distress?

    It's just software, dude. The whole point is to discuss, debate, and on occasion disparage... but not in a personal way. Critique is the name of the game, and if you don't have opinions, well, you are probably in sales.

    I don't think the actual devs working on these problems are as fragile as you think. I sure hope not; in my experience, if you can't debate a design, it is truly worthless.
  10. MarkHolbrook


    I personally have never expected Parallels to achieve direct 3D video. I use SolidWorks, and while it would greatly benefit from this, if I really need fast 3D for a big model I load it up on a Windows PC with a big video card.

    I was thinking exactly like Hugh... While this would not work for us MBP owners, someday I hope to have a MacPro at home. It would be cool to have a big HIGH-POWERED video card that Parallels took control of and said, "this is for XP/Vista or whatever," and if they could somehow still make the mouse slide between them perfectly like it does now.

  11. Supadude


    I don't know about you guys, but I don't care that much about 3D acceleration. Looking at everything else Fusion Beta 2 offers, I like it a lot. Fusion Beta 2 uses fewer resources than Parallels, especially the awful 3186 release. And best of all, Fusion Beta 2 actually mounts my freaking USB hard drive!

    I paid my 80 bucks to Parallels in the hopes of improvements in the future. Unfortunately, Parallels, Fusion Beta 2 already offers me more than your current pay software.
  12. xboxfan


    Some quibbles over terminology, but you're on the right track.

    Parallels is an emulator. What matters is how it does the emulation.

    "All video operations go through the CPU" always -- it's how the computer works. The CPU asks the video card (or any other device) to perform all its operations. The CPU sets up the GPU command buffer and rendering properties, tells the GPU where to find textures (and loads them into memory for the GPU), and then essentially tells the GPU "here's a long list of stuff you should do to draw this frame. Go."

    Emulating the GPU is analogous to emulating the CPU. You've got the same set of choices for where and how you work (and these choices are mostly independent):

    (1) Hardware level (MMIO; vendor specific). You're already emulating the guest hardware, but you have to intercept the GPU memory ranges and generate the GPU interrupts. Impossible without vendor documentation on how exactly the GPU works.
    (2) Driver level (guest OS specific). You have to write a driver and tunnel its operations through to the host.
    (3) API level (DirectX, OpenGL, etc.; guest OS and API specific). You have to patch executables as they're loaded by the guest OS.

    (a) Interpret (translate operations one-at-a-time into host equivalents)
    (b) Translate (translate operations many-at-a-time to host equivalents)
    (c) Virtualize (swap ownership of the device between guest and host).

    The ideal performance-wise is #1c. But for GPUs, #c doesn't exist yet (although the GPU vendors are working on it). The only emulator I know of that does #b is the Xbox emulator on the Xbox 360; everyone else does #a. VMware is apparently experimenting with #3a. Parallels does #2a. The Xbox 360 does #1b.

    As long as you're stuck doing #a (or sometimes #b), the advantages of #3 are that there are fewer instructions to interpret/translate (because you're working at a higher level) and the guest OS does less of the work than in #2. The disadvantages are that it's very fragile and these APIs have a huge surface area (much larger than the underlying driver or hardware model). In a nutshell, VMware is trading some (theoretical) stability for some (practical) performance.
  13. xboxfan


    I left out

    (d) Don't (render in the guest OS into a memory buffer)

    which is also a popular choice, but obviously slow.
  14. soopahfly


    Or limit it to fullscreen

    Thereby creating exclusive access to the GFX hardware. And force-pause the VM on exit (of fullscreen) while 3D support is selected. No windowed mode, but perfect performance for the gamers amongst us, who seem to desire 3D support most.
  15. Mathew Burrack

    Mathew Burrack

    I've been thinking about how I would go about writing 3D support for Parallels myself if I had the opportunity. xboxfan has a good rundown of the options, but frankly I think doing option 2a for Parallels would not be that hard, at least for OpenGL support for Windows.

    For starters, operating on a fullscreen versus windowed context in OpenGL makes almost no difference in commands. So apart from the actual initialization, API calls can be mapped 1:1.

    The API calls that really matter are all WGL calls (or GLX, or the equivalent for whatever OS it happens to be running, so Linux probably wouldn't be too much harder). Those calls would have to be rerouted to basically create a windowed context under OS X (easy), at which point all future API calls would just operate on that context. The context would have to be composited into the final Parallels display, which can be easy or hard depending on how it's implemented.

    Actually capturing the API calls under Win32 should be trivial--all they do is implement an OpenGL driver just like they have a GDI driver for a "parallels video card". Expose the OpenGL API under that driver, and boom, all OGL API calls will be rerouted to them.

    Fullscreen would be trickier to manage, but not terribly difficult.

    Now, supporting extensions would certainly be a bit more difficult, so the best we could probably hope for in the initial revision would be basic OGL 1.1 or 1.2 support, but that's at least a step in the right direction, and would give us basic acceleration for lots of older games.

    DirectX is a whole 'nother ballgame, as the Wine people can attest. It can be done, just not nearly as easily. But hey, I'd be happy with OGL-only support for a while.

    (Actually, I was contemplating how to write a Glide driver, which would be *much* simpler since a) the API is so small, b) it only ever ran in fullscreen, which makes all the surface configuration options much simpler, c) it doesn't support any of the newer fancy features like render-to-texture that might complicate things, and d) it was a very popular choice for earlier 3D games, some of which ONLY ran under Glide or software, not OGL or DX, so it would provide a cool avenue for retro-gaming. Only problem is, I have no clue what method(s) Parallels uses to communicate between the guest and host OS, so I could basically write both sides of the driver, but not the bridge between them. Oops.)

  16. Resuna


    I think you have this backwards.
    The ideal performance-wise is #3, not #1. Hardware emulation of a different device is almost always slower than a custom driver for the new device, and implementing the API directly allows much more room for optimizations. Why? Because the lower level you go, the less opportunity you have to implement optimal code for the hardware you're actually running on.

    OpenGL is a relatively high level API, and implementing an OpenGL library that simply sanitized and transformed the OpenGL calls and tunneled them through to Apple's OpenGL would give you the best performance.
  17. Mathew Burrack

    Mathew Burrack

    Y'know, apparently I'm dyslexic, b/c I didn't notice that one. Yes, #3 would be the fastest, followed closely by #2 (#2 only adds the cost of the tunneling, which ideally would be small compared to the rest of the app's work, since GPU programming encourages minimizing your calls to the gfx card).

  18. websyndicate


    I don't really care, as long as it runs well, doesn't take up too many resources, and is still STABLE. I think what most users want is stability.
  19. hoju


    xboxfan really outdid himself with a quality post; I had no idea this would start such an interesting thread.

    I still say the major barrier to #1 is vendor documentation and collaboration. Since Parallels is not open source, I see no reason why they could not strike an agreement with ATI to emulate, say, an X1600 - it would have plenty of shelf life... it is emulation, after all; they don't need a new emulated card for every actual physical card out there. The argument that cards change is moot. You pick one to emulate, and that is your emulated 3D solution for many years to come - something for which there is a stable driver on every platform (for your guests).

    I would think the time saved not writing a driver for each OS (leveraging the existing drivers) would pay dividends over the other options. And 1a or 1b would be great until c becomes an option, which is already in the works.
  20. Mathew Burrack

    Mathew Burrack

    I think the work that would go into writing an emulator for #1, even IF you had complete documentation, would simply be too great to be worthwhile. You're talking about a considerable number of registers, each of which has to be emulated and translated back into calls to the host OS OpenGL API. Plus, it would be slow as all get-out: you have one driver layer in the guest OS translating API calls down into register writes (which would be much greater in number than the API calls), all of which have to be shunted to Parallels and translated *back* into API calls.

    Plus, with how fast graphics hardware advances, you'd be putting a lot of effort into something that would be very quickly far out-of-date, if it wasn't already by the time you finished!
