Where the latency actually comes from - just some thoughts:
Network - same metropolitan area as your jamming buddy: about 10 ms; much larger if they are geographically remote.
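To get a feel for how distance alone scales, here is a rough sketch of one-way propagation delay. It assumes signals travel through fibre at roughly 200,000 km/s (about two thirds of the speed of light in vacuum); the distances are made-up examples, and routing, queuing and last-mile overhead are ignored, which is why real numbers such as the 10 ms above come out higher.

```python
# Rough one-way propagation delay over fibre.
# Assumption: ~200,000 km/s signal speed; ignores routing, queuing and last-mile overhead.
SPEED_IN_FIBRE_KM_PER_S = 200_000


def propagation_delay_ms(distance_km: float) -> float:
    """Return the one-way propagation delay in milliseconds."""
    return distance_km / SPEED_IN_FIBRE_KM_PER_S * 1000.0


# Hypothetical example distances, only for illustration.
for label, km in [("same metro area", 50),
                  ("across a continent", 4000),
                  ("intercontinental", 10000)]:
    print(f"{label} ({km} km): {propagation_delay_ms(km):.2f} ms one way")
```

So distance itself only starts to matter once your jamming buddy is thousands of km away.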
Modern operating systems run their scheduler tick at up to 1000 Hz -> this would give a minimum latency of about 1 ms when the computer is under light load and a context switch is necessary. The DAW might use multiple threads etc. I’m not up to date on the details anymore, but I think the OS scheduler is not an issue, especially since home computers have multiple cores nowadays.
Real-time computing has plenty of other challenges: I think there could be lag from the sound being processed, from another process needing attention on the system, and so on. For reasons like this we have realtime kernels, and you might need one for this project. A realtime kernel could push the system lag to < 50 ms, I think.
Sound processing? I believe there are other experts here who know this better, but the processing itself might take some time.
Buffering - any transmission of sound needs buffering, but assuming you have a good network connection, the buffer can be reduced. In any case I think this is going to be a major source of latency, maybe on the order of 200 ms or so?
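For what it's worth, the delay a buffer adds is just its length in frames divided by the sample rate. A minimal sketch, assuming a 48 kHz sample rate and some example buffer sizes (neither comes from a real setup, and the function name is just for illustration):

```python
def buffer_latency_ms(frames: int, sample_rate_hz: int = 48_000, buffers: int = 1) -> float:
    """Delay added by `buffers` buffers of `frames` audio frames each, in milliseconds."""
    return frames * buffers / sample_rate_hz * 1000.0


# Example buffer sizes (assumptions, not measurements).
for frames in (128, 256, 1024, 9600):
    print(f"{frames} frames @ 48 kHz: {buffer_latency_ms(frames):.1f} ms")
```

At 48 kHz, a 200 ms buffer corresponds to 9600 frames, so shrinking the buffer is where most of the savings would have to come from.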
So in practice, I think this gives us a few hundred ms of latency.
And actually I’m not sure how much of an issue that is. Players anticipate the next note coming from the jamming buddy anyway, but … maybe this kind of jamming would require its own type of skill.
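Adding up the rough guesses from the points above gives a back-of-the-envelope budget. This is only a sketch: the numbers are the estimates from this post, not measurements, the realtime-kernel figure is used as an upper bound, and sound processing is left out because no number was given for it.

```python
# Rough latency budget in milliseconds, using the guesses from this post.
budget_ms = {
    "network (same metro area)": 10,
    "OS scheduler": 1,
    "system lag (realtime kernel, upper bound)": 50,
    "buffering": 200,
}

total_ms = sum(budget_ms.values())

for source, ms in budget_ms.items():
    print(f"{source:45s} {ms:4d} ms")
print(f"{'total':45s} {total_ms:4d} ms")
```

Buffering clearly dominates this estimate, so that is the first thing to attack.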