Parallels Desktop for Mac computers with Apple silicon M4 chips

Discussion in 'Parallels Desktop on a Mac with Apple silicon' started by Mikhail Ushakov, Oct 30, 2024.

  1. cmarinas

    cmarinas Junior Member

    TL;DR warning: some thinking out loud, don't read unless you're interested in the Linux kernel fork() mechanism and the Arm architecture.

    fork() in Linux duplicates the parent page tables into the child process while marking the PTEs read-only. In the parent process, there's a single TLBI ASIDE1IS at the end of the page table copy (and before the child is started). There is no need for a TLBI in the child process since it starts with its own ASID and presumably no stale TLB entries for the new ASID (when the ASIDs run out, there's a full local TLBI on each CPU - TLBI VMALLE1; we call this a roll-over event).

    The stack smashing check failure suggests that copy-on-write (CoW) does not always happen for the stack page when both the parent and the child process access it shortly after fork(). The stack is likely the first page accessed after the fork() and, when the bug triggers, either the parent or the child succeeds in writing it without triggering a permission fault into the kernel (for CoW). This typically happens if there are stale TLB entries.

    I think we have two main scenarios after fork():
    1. The parent writes the stack without CoW. Since we had a TLBI ASIDE1IS already, that's very unlikely, especially if the parent is not migrated to another vCPU (which may run on another CPU). There's a small chance that the parent migrated to another vCPU (on a different physical CPU) and the TLBI ASIDE1IS did not get propagated there for some hardware reason, but I find this unlikely.
    2. The child writes the stack without CoW. This would not be possible if the TLB is empty for the new ASID. However, we can have an ASID roll-over given that Apple Silicon only exposes 256 ASIDs, at least to the VM (a shell script with lots of forking would quickly run through them). Sub-scenario (a) is that M4 does some TLB sharing between CPUs: if the local (non-inner-shareable) TLBI that Linux does on ASID roll-over doesn't invalidate all such shared TLBs, things can go wrong with stale TLB entries. A more likely possibility is (b): the hypervisor framework does not properly invalidate the TLB when multiplexing multiple vCPUs on the same CPU.
    2.b is important as Linux assumes that the TLBs are private to a (v)CPU and can do a local TLBI on ASID roll-over, deferred until the next context switch on a (v)CPU. If M4 has much larger TLBs and the hypervisor framework is missing proper maintenance when multiplexing vCPUs on a CPU, things can go wrong, especially when fork() and CoW are heavily involved. Actually, this could even trigger scenario 1 if the parent is scheduled out and back in again, requiring an ASID roll-over.

    FWIW, Linux/KVM had a bug in this area, fixed about 8 years ago - https://lore.kernel.org/all/20161103222706.24129-1-marc.zyngier@arm.com/.
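
To make scenario 2 concrete, here is a minimal toy model in Python. It is only a sketch, not kernel code: the ToyMMU class, its methods and the ASID numbers are made up for illustration, and it assumes a single TLB whose hits are trusted without a page table walk. It shows how one stale writable entry tagged with the child's ASID lets the child write the shared stack page without a CoW fault.

```python
class ToyMMU:
    """Toy MMU: per-ASID page tables plus one TLB that is trusted on a hit."""

    def __init__(self):
        self.memory = {}        # physical page -> contents
        self.page_tables = {}   # asid -> {virtual page: (physical page, writable)}
        self.tlb = {}           # (asid, virtual page) -> (physical page, writable)

    def alloc(self, contents):
        ppage = len(self.memory)
        self.memory[ppage] = contents
        return ppage

    def fork(self, parent, child):
        # Share every page read-only, then drop the parent's TLB entries
        # (the role TLBI ASIDE1IS plays in the description above).
        shared = {v: (p, False) for v, (p, _) in self.page_tables[parent].items()}
        self.page_tables[parent] = dict(shared)
        self.page_tables[child] = dict(shared)
        self.tlb = {k: e for k, e in self.tlb.items() if k[0] != parent}

    def write(self, asid, vpage, contents):
        key = (asid, vpage)
        if key not in self.tlb:                  # TLB miss: walk the page table
            self.tlb[key] = self.page_tables[asid][vpage]
        ppage, writable = self.tlb[key]
        if not writable:                         # permission fault: do the CoW copy
            ppage = self.alloc(self.memory[ppage])
            self.page_tables[asid][vpage] = (ppage, True)
            self.tlb[key] = (ppage, True)
        self.memory[ppage] = contents

    def read(self, asid, vpage):
        # Reads use the page table directly to keep the model small.
        ppage, _ = self.page_tables[asid][vpage]
        return self.memory[ppage]


def parent_view_after_child_write(stale_child_entry):
    PARENT, CHILD, STACK = 1, 2, 0
    mmu = ToyMMU()
    stack_page = mmu.alloc("parent data")
    mmu.page_tables[PARENT] = {STACK: (stack_page, True)}
    if stale_child_entry:
        # Assumption: a writable entry tagged with the child's ASID survived
        # an ASID roll-over (for example, the parent used that ASID earlier).
        mmu.tlb[(CHILD, STACK)] = (stack_page, True)
    mmu.fork(PARENT, CHILD)
    mmu.write(CHILD, STACK, "child data")
    return mmu.read(PARENT, STACK)
```

With a clean TLB the child's write faults and gets its own copy, so the parent still reads "parent data". With the stale entry in place the write lands in the shared page and the parent sees "child data", which is the kind of corruption a stack protector would flag.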
     
  2. LiquidV

    LiquidV Junior Member

    I can confirm 1 core works with Ubuntu Desktop ARM64 24.04.1 on MBP M4 Pro.
     
  3. Freddy2

    Freddy2 Junior Member

    @Mikhail Ushakov,
    That is nice to know, but are you going to patch version 19 or not?
    Again, I'm not asking to get version 20 for free, but for minimum user support for a product that's not even a year old.
    If you cannot fix the issue on previous versions, could you please offer a special discount for people like us (many in this chat)? It would be highly appreciated.

    Thanks,
    Fred
     
  4. Avinash Bundhoo

    Avinash Bundhoo Staff Member

    Hello,
    Thank you for reaching out.
    We have released a new update, Parallels Desktop 20.1.2, which fixes this issue.
    Please install the latest update as soon as possible.
    Thanks
     
  5. charlesr11

    charlesr11 Bit poster

    I agree. I bought v19 outright in Nov 2023 and now I find that the solution is to "future proof" by paying more money! Australian consumer laws ask a simple question: "Would you have bought the product if you had known of this issue?" (that it wouldn't work a year later).
     
  6. AlexeyS8

    AlexeyS8 Bit poster

    Lovely write-up!
    I hope Parallels can fix Parallels Desktop, or Apple can fix the hypervisor.
    In any case, let's hope it is not an M4 hardware issue. Otherwise it would be quite silly to upgrade and run on 1 CPU.
     
  7. Peter V.


    Just stop being cheap. People like you are annoying.

    macOS Sequoia was released on September 16th, 2024.
    Parallels 20 was released in September 2024.
    The Apple M4 MacBook Pro line was released on October 30th, 2024.
    The Apple M4 MacBook Pro line was released with macOS Sequoia.
    So it's easy... if you have an M4 MacBook Pro, then get Parallels 20.
    Parallels did not force you to buy an M4 MacBook Pro/M4 Mac Mini, did they?

    Expecting legacy support for free (quote from you: "patch version 19") or expecting a "special discount" (another quote from you) is just cheap.

    Parallels customers already get a discount if they choose to upgrade.
    Quote from a Parallels website: "For all one-time purchases and customers with a previous version of Parallels Desktop, upgrade to the latest version at a discounted rate."
    Link: https://www.parallels.com/products/desktop/buy/ -> "Upgrade"

    Otherwise...
    Again the quote from you: "fix the issue on previous versions"
    That would be called legacy support.
    Would you be willing to pay extra for legacy support? If you appreciated the time of the Parallels software developers, you would pay extra. Why should the developers work for free?
    Have you ever thought about the possibility that the current Parallels pricing model might be built on the idea of keeping the software price as low as possible and therefore not offering legacy support?
     
  8. cmarinas

    cmarinas Junior Member

    A quick update on the Linux front - I changed the kernel TLB flushing code to ignore the application ASID and use the all-ASID variants instead:
    TLBI VAE1IS -> VAAE1IS
    TLBI VALE1IS -> VAALE1IS
    TLBI ASIDE1IS -> VMALLE1IS
    (and similarly for the range operations)
    It seems to be working fine with 10 vCPUs, no failures for my simple tests (typically /sbin/mkinitramfs -o /dev/null). I have not tried a full distro install as that comes with its own kernel.

    What does this mean? Probably stale TLB entries from incorrect handling by the Hypervisor Framework. The all-ASID TLB invalidation ensures that the forked process starts with clean TLBs, at least for those ranges invalidated in the parent space. Another possibility is that the hardware does not propagate the ASID+VMID invalidation correctly to other CPUs, though I'm sure this would have been seen by Apple engineers already. Not sure Parallels engineers can fix either of these but one thing to try would be to pin each vCPU to a physical CPU (no idea how to do this on macOS and the Hypervisor Framework). This would avoid problems with multiplexing two or more vCPUs on a single CPU and, if it works, we can rule out hardware bugs.
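
As a rough illustration of why the all-ASID variants can hide the problem, the sketch below models a TLB as a set of (ASID tag, virtual page) pairs. The function names, the tag values and the page addresses are hypothetical, and the premise (an entry cached under a tag that differs from the ASID the kernel names) is an assumption rather than something observed on M4.

```python
def tlbi_by_asid(tlb, asid, vpage):
    """Model of a per-ASID invalidate such as TLBI VAE1IS."""
    return {entry for entry in tlb if entry != (asid, vpage)}


def tlbi_all_asids(tlb, vpage):
    """Model of an all-ASID invalidate such as TLBI VAAE1IS."""
    return {entry for entry in tlb if entry[1] != vpage}


# Hypothetical state: page 0x7000 is cached under tag 0x2a, while the kernel
# believes the process runs with ASID 0x05. Page 0x9000 is unrelated.
tlb = {(0x2a, 0x7000), (0x05, 0x9000)}

after_per_asid = tlbi_by_asid(tlb, asid=0x05, vpage=0x7000)
after_all_asid = tlbi_all_asids(tlb, vpage=0x7000)
```

In this model the per-ASID invalidate leaves the mismatched entry in place, while the all-ASID invalidate removes it and leaves unrelated pages alone.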
     
  9. Nul_l

    Nul_l Bit poster

    Can confirm - setting any distro to 1 core removes all issues and allows all installs. Hopefully this can be addressed soon.
     
  10. Krystic

    Krystic Bit poster

    Yes, setting the CPU to single-core can resolve the crashing issue. I have successfully installed and used Ubuntu Desktop. Thank you!
    Hopefully, the official team will fix the multi-core bug soon.
     
  11. AlexeyS8

    AlexeyS8 Bit poster

    It is not being cheap, it is expecting reasonable service and quality of support for something that is marketed as a premium product.
    By the way, Parallels 20 has the same issues and there is still no fix.
     
  12. AlexeyS8

    AlexeyS8 Bit poster

    I do not think it is a hardware issue, but specifically a Parallels software issue.
    I can run the standard Docker.app on a Mac M4 and it builds my images with 8 vCPUs. Underneath, it is a Linux VM.
    I can also run VMware Fusion 13 and it runs Ubuntu 22/arm64 perfectly fine on 8 vCPUs.
    What does NOT work is any flavour of Ubuntu/arm64 on Parallels 20.
    These are all tests on the same MBP M4 Max.
     
  13. cmarinas

    cmarinas Junior Member

    I'm coming to the same conclusion. I tried UTM (which just uses QEMU on top of the Hypervisor Framework) and I didn't see any problems with a full Debian install or my simple mkinitramfs litmus test. Not sure how much control these tools have over the Hypervisor Framework to get it wrong. They might set the vCPU affinity to each physical CPU, though I tried to oversubscribe UTM with 16 vCPUs and nothing went wrong (with very little testing, though).

    Another thing I may try later today or tomorrow - disable (PSCI-based) cpuidle in Linux. I've seen problems before on actual hardware with firmware not correctly handling the TLB maintenance during CPU sleep/wake-up. While it's difficult to get this wrong with virtualisation (the physical CPUs remain in the coherency domain), if Parallels in combination with the Hypervisor Framework tries to do something smarter but restores the guest system registers incorrectly, strange behaviour can happen.

    I guess we just wait and see what the Parallels engineers come up with. But given that there are tools out there that work, I'm more optimistic about a quick fix.
     
  14. cmarinas

    cmarinas Junior Member

    Yet another possibility is that M4 has 16-bit ASIDs but for some reason the guest VM is only told about 8-bit ASIDs. In an 8-bit configuration, Linux uses bits 63..8 of the context ID as a generation number to track ASID roll-overs. When the ASID is written to TTBR1_EL1, Linux writes the full 16 bits under the assumption that bits 15..8 are ignored. However, we can have two threads of the same process running on different CPUs, one of them active during an ASID roll-over. The other thread, when scheduled in, will be assigned a new ASID generation while keeping the same ASID (bottom 8 bits). As a result, the 16-bit values written to TTBR1_EL1 for the two threads on different CPUs can differ, with only the bottom 8 bits being the actual ASID Linux is aware of. Any TLB maintenance will then use a 16-bit ASID + (part of) generation value that differs between threads, leading to missed invalidation.

    I'll try this later by changing the Linux kernel to mask out the generation bits written in TTBR1_EL1 and report back.
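
The bit arithmetic behind this hypothesis can be sketched in a few lines of Python. This is an illustration only: the helper names are invented here, the generation numbers and ASID 5 are arbitrary, and the model simply assumes the low 16 bits of the context ID reach the hardware unchanged.

```python
ASID_BITS = 8                          # what the guest is told it has
ASID_FIRST_VERSION = 1 << ASID_BITS    # generation counter starts at bit 8


def context_id(generation, asid):
    """Context ID layout: generation in the high bits, ASID in the low bits."""
    return (generation * ASID_FIRST_VERSION) | asid


def kernel_asid(ctxid):
    """The ASID Linux believes it is using (bottom ASID_BITS bits)."""
    return ctxid & (ASID_FIRST_VERSION - 1)


def ttbr_asid_field(ctxid):
    """The 16 bits that end up in the TTBR1_EL1 ASID field."""
    return ctxid & 0xFFFF


before_rollover = context_id(generation=1, asid=5)   # thread that kept running
after_rollover = context_id(generation=2, asid=5)    # thread scheduled in later

print(kernel_asid(before_rollover), kernel_asid(after_rollover))  # 5 5
print(hex(ttbr_asid_field(before_rollover)))                      # 0x105
print(hex(ttbr_asid_field(after_rollover)))                       # 0x205
```

Both threads share kernel ASID 5, yet the 16-bit values differ. If the hardware really treats all 16 bits as the ASID, an invalidation issued with 0x205 would not match entries tagged 0x105.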
     
  15. taxicamkhe

    taxicamkhe Taxi Cẩm Khê

    Your post is very good and useful.
     
  16. cmarinas

    cmarinas Junior Member

    That's the problem. UTM reports 16-bit ASIDs to Linux while Parallels only reports 8-bit ASIDs. Arguably a Linux bug as well, as it's writing non-zero RES0 bits in TTBR1_EL1. I'll fix that but it will take time for the fix to trickle down into distro kernels. In the meantime, Parallels can just enable 16-bit ASIDs. For whoever's curious, the Linux patch is something like below:

    Code:
    diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
    index e1e0dca01839..d0d9c99c8a0b 100644
    --- a/arch/arm64/mm/context.c
    +++ b/arch/arm64/mm/context.c
    @@ -32,9 +32,9 @@ static unsigned long nr_pinned_asids;
     static unsigned long *pinned_asid_map;
     
     #define ASID_MASK        (~GENMASK(asid_bits - 1, 0))
    -#define ASID_FIRST_VERSION    (1UL << asid_bits)
    +#define ASID_FIRST_VERSION    (1UL << 16)
     
    -#define NUM_USER_ASIDS        ASID_FIRST_VERSION
    +#define NUM_USER_ASIDS        (1UL << asid_bits)
     #define ctxid2asid(asid)    ((asid) & ~ASID_MASK)
     #define asid2ctxid(asid, genid)    ((asid) | (genid))
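
A small Python sketch of what the change to ASID_FIRST_VERSION does to the value written to the ASID field. It mirrors the macros in the diff but is only a model: asid_field_values and its parameters are names invented for this example, and ASID 5 with generations 1 to 3 is an arbitrary choice.

```python
def asid_field_values(asid_bits, first_version_shift, asid, generations):
    """Distinct 16-bit values written for one ASID across several generations."""
    first_version = 1 << first_version_shift      # ASID_FIRST_VERSION
    asid_mask = ~((1 << asid_bits) - 1)           # ASID_MASK
    values = set()
    for gen in generations:
        ctxid = asid | (gen * first_version)      # asid2ctxid(asid, genid)
        assert (ctxid & ~asid_mask) == asid       # ctxid2asid() round-trips
        values.add(ctxid & 0xFFFF)                # low 16 bits go to the register
    return values


# Before the patch: the generation starts at bit asid_bits (8 here).
unpatched = asid_field_values(asid_bits=8, first_version_shift=8,
                              asid=5, generations=range(1, 4))
# After the patch: the generation always starts at bit 16.
patched = asid_field_values(asid_bits=8, first_version_shift=16,
                            asid=5, generations=range(1, 4))
```

In this model the unpatched layout yields 0x105, 0x205 and 0x305 for the same ASID, while the patched layout always yields 0x5, so bits 15..8 stay zero regardless of the generation.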
     
    
     
  17. cmarinas

    cmarinas Junior Member

    An alternative Linux fix is to properly configure TCR_EL1.AS to match the ASID bits. Currently it just goes for 16-bit, irrespective of what the hardware supports, under the assumption that the setting is either ignored or the ID registers report 16-bit ASIDs. I'll figure out the best fix in the next day or so. But a Parallels update to the guest-visible ID_AA64MMFR0_EL1 register so that it reports 16-bit ASIDs would be appreciated. Thank you.
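
For readers unfamiliar with the two registers, here is a hedged Python sketch of the relationship. The field positions follow the Arm architecture (ASIDBits is bits [7:4] of ID_AA64MMFR0_EL1, AS is bit 36 of TCR_EL1); the function names and the sample register values are made up, and this is not how the kernel code is structured.

```python
ASIDBITS_SHIFT = 4          # ID_AA64MMFR0_EL1.ASIDBits occupies bits [7:4]
TCR_EL1_AS = 1 << 36        # TCR_EL1.AS: 0 selects 8-bit ASIDs, 1 selects 16-bit


def asid_bits_from_mmfr0(mmfr0):
    """Decode the ASID size a (guest-visible) ID_AA64MMFR0_EL1 value reports."""
    field = (mmfr0 >> ASIDBITS_SHIFT) & 0xF
    if field == 0b0000:
        return 8
    if field == 0b0010:
        return 16
    raise ValueError(f"reserved ASIDBits encoding: {field:#06b}")


def tcr_with_matching_as(tcr, mmfr0):
    """Return tcr with the AS bit set only when 16-bit ASIDs are reported."""
    if asid_bits_from_mmfr0(mmfr0) == 16:
        return tcr | TCR_EL1_AS
    return tcr & ~TCR_EL1_AS


# Hypothetical register values, for illustration only.
reports_16_bit = 0x20       # ASIDBits == 0b0010
reports_8_bit = 0x00        # ASIDBits == 0b0000
```

Here tcr_with_matching_as(0, reports_16_bit) sets bit 36, and tcr_with_matching_as(TCR_EL1_AS, reports_8_bit) clears it, which is the "match the ASID bits" behaviour described above.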
     
  18. Freddy2

    Freddy2 Junior Member

    Could you please confirm whether a patch is coming for version 19 to solve this issue?

    Thanks,
    Fred
     
  19. cmarinas

    cmarinas Junior Member

    For some reason M4 does not seem to like TCR_EL1.AS == 0. The failures are a lot more frequent and I can't even boot the OS (similar stack smashing errors). Maybe it's not just a Linux bug, but it's too late in the day to figure it out.
     
  20. MrParallels

    MrParallels Bit poster

    I am also unable to start my Windows 11 VM after moving it from an M1 Mac to an M4 Mac. Since the versions of Parallels Desktop (18) and macOS (15.1) are exactly the same, it must be a compatibility issue with the new processor, not the operating system. I get that the company cannot support old versions indefinitely, but Parallels' own compatibility list specifies versions of macOS and Parallels, not CPUs. To me this looks like a bug that ought to be patched in versions 18 and 19, since so many users have exactly the same problem. To me, «advice» like this:

    obfuscates the fact that Parallels 18/19 were working fine on macOS 15.1 - and still do with M1, M2 and M3! Parallels should cover themselves by explicitly mentioning which processors they intend to support, not hide behind whatever OS was preinstalled.

    I guess it depends on what kind of reputation Parallels wants. Seeing how you treat bugs like this does not exactly tempt me into upgrading to the latest version. I'm going with the free alternative, Whisky, going forward. Lastly, how can you say you don't have an incentive to push customers to buy new versions every year - selling your software is how you make money:

     
