A quick update on the Linux front: I changed the kernel's TLB flushing code to ignore the application ASID and use the all-ASID variants instead:
TLBI VAE1IS -> VAAE1IS
TLBI VALE1IS -> VAALE1IS
TLBI ASIDE1IS -> VMALLE1IS
(and similarly for the range operations)
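For anyone who wants to see what the substitution amounts to, here is a minimal user-space sketch in C. It is not the kernel patch. The table repeats the three mappings listed above, and the two operand builders show my understanding of the register layout for the by-address operations (VA[55:12] in bits [43:0], the ASID in bits [63:48], with those ASID bits left at zero for the all-ASID forms). The helper names are made up for illustration, and the TTL hint bits are left at zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One row per substitution listed in the post. */
struct tlbi_subst {
    const char *per_asid;  /* original operation, matches one ASID */
    const char *all_asid;  /* replacement, matches every ASID */
};

static const struct tlbi_subst tlbi_substs[] = {
    { "VAE1IS",   "VAAE1IS"   },
    { "VALE1IS",  "VAALE1IS"  },
    { "ASIDE1IS", "VMALLE1IS" },
};

/* Hypothetical helper: look up the all-ASID replacement for an operation.
 * Returns NULL if the operation is not in the table. */
static const char *tlbi_all_asid_variant(const char *op)
{
    assert(op != NULL);
    for (size_t i = 0; i < sizeof(tlbi_substs) / sizeof(tlbi_substs[0]); i++) {
        if (strcmp(op, tlbi_substs[i].per_asid) == 0)
            return tlbi_substs[i].all_asid;
    }
    return NULL;
}

/* Assumed layout: VA[55:12] occupies bits [43:0] of the operand. */
#define TLBI_VA_FIELD_MASK ((UINT64_C(1) << 44) - 1)

/* Operand for a per-ASID by-address operation: page number plus ASID. */
static uint64_t tlbi_operand_per_asid(uint64_t va, uint16_t asid)
{
    return ((va >> 12) & TLBI_VA_FIELD_MASK) | ((uint64_t)asid << 48);
}

/* Operand for the all-ASID form: same page number, ASID bits left zero. */
static uint64_t tlbi_operand_all_asid(uint64_t va)
{
    return (va >> 12) & TLBI_VA_FIELD_MASK;
}
```

The only difference between the two operand builders is the ASID field, which is the point of the experiment: the invalidation no longer depends on the ASID being matched correctly on the remote CPUs.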
It seems to be working fine with 10 vCPUs: no failures in my simple tests (typically /sbin/mkinitramfs -o /dev/null). I have not tried a full distro install, as that comes with its own kernel.
What does this mean? Probably that stale TLB entries are left behind by incorrect handling in the Hypervisor Framework. The all-ASID TLB invalidation ensures that the forked process starts with clean TLBs, at least for those ranges invalidated in the parent's address space. Another possibility is that the hardware does not propagate the ASID+VMID invalidation correctly to other CPUs, though I'm sure Apple engineers would have seen this already.
I'm not sure Parallels engineers can fix either of these, but one thing to try would be to pin each vCPU to a physical CPU (I have no idea how to do this on macOS with the Hypervisor Framework). This would avoid problems with multiplexing two or more vCPUs on a single CPU and, if it makes the failures go away, we can rule out hardware bugs.