Hello, I've started porting this project to AMD GPUs, using HIP specifically. I'm currently stuck working out exactly what the inline asm instructions in the CUDA code (PTX) are doing. I have been reading up on the AMD GPU instruction sets, but they are fairly complex and differ a lot from NVIDIA's: https://llvm.org/docs/AMDGPUUsage.html and https://gpuopen.com/amd-isa-documentation/. I'm currently using an RX 6800, which is the RDNA2 architecture.
I have been considering dropping inline asm entirely for this project so that we can support a wider range of GPU models.
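One possible way to drop the inline asm is sketched below, under an assumption: that the PTX in question is the usual multi-precision carry-chain arithmetic, i.e. instructions like `add.cc.u64` (add and set the carry flag), `addc.cc.u64` (add with carry-in and set the carry flag) and `mul.hi.u64` (upper 64 bits of a 64x64-bit product). PTX keeps the carry in an implicit flag; portable C++ has no such flag, so this sketch passes it explicitly as a variable. The helper names `add_cc`, `addc_cc`, `mul_hi` and `add256` are hypothetical and do not come from the repo.

```cpp
#include <cstdint>

// Hypothetical portable stand-ins for PTX carry-chain instructions.
// Assumption: the project's asm uses these ops for 256-bit arithmetic.

// add.cc.u64: returns a + b, writes carry-out (0 or 1) to `carry`.
static inline uint64_t add_cc(uint64_t a, uint64_t b, uint64_t &carry) {
    uint64_t s = a + b;
    carry = (s < a);            // wrapped around => carry out
    return s;
}

// addc.cc.u64: returns a + b + carry-in, writes carry-out to `carry`.
static inline uint64_t addc_cc(uint64_t a, uint64_t b, uint64_t &carry) {
    uint64_t s = a + b;
    uint64_t c1 = (s < a);      // carry from a + b
    uint64_t r = s + carry;
    uint64_t c2 = (r < s);      // carry from adding carry-in
    carry = c1 | c2;            // at most one of c1, c2 can be set
    return r;
}

// mul.hi.u64: upper 64 bits of the 128-bit product a * b,
// computed from four 32x32-bit partial products.
static inline uint64_t mul_hi(uint64_t a, uint64_t b) {
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;
    // Middle 32-bit column; sum of three values < 2^32 cannot overflow.
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;
    return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

// Example: 256-bit add on little-endian 64-bit limbs, returns carry-out.
static inline uint64_t add256(uint64_t r[4], const uint64_t a[4], const uint64_t b[4]) {
    uint64_t carry = 0;
    r[0] = add_cc(a[0], b[0], carry);
    r[1] = addc_cc(a[1], b[1], carry);
    r[2] = addc_cc(a[2], b[2], carry);
    r[3] = addc_cc(a[3], b[3], carry);
    return carry;
}
```

In a HIP build these helpers would need `__device__` (or `__host__ __device__`) to be callable from kernels, and HIP's `__umul64hi` intrinsic could stand in for `mul_hi` on the device. Whether the compiler lowers the comparison-based carries to native add-with-carry instructions on RDNA2 is not guaranteed, so inspecting the generated ISA is worthwhile before assuming performance matches the asm version.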
Any help at all with this project would be greatly appreciated.
I'm targeting RX 6000 series GPUs for now, since those are what I can test on.
This is the project repo, forked from JLP's repo:
https://github.com/TooPlain/Kangaroo-HIP