Sergio Demian-Lerner discussed this in February 2015 on his blog:
https://bitslog.wordpress.com/2015/02/17/faster-sha-256-asics-using-carry-reduced-adders/
Basically it is an interesting idea, but neither Sergio nor those three guys discussed how it could be affected by the overall pipeline design. It seems like those guys from UIUC considered only one (or maybe two) pipeline layouts (the alternative drawn in dashed lines).
Much better science would be to consider way more pipeline layouts, including something extreme like a 32-way pipelined ripple-carry adder that adds two 32-bit integers in 32 clocks. It seems slow, but the area is unbeatable. At least those guys explicitly discussed area*delay products. But it doesn't seem like they carried this through to the ultimate conclusion of power per hash rate and area per hash rate (or, better yet, price per hash rate).
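To make the "32 clocks" layout concrete, here is a rough cycle-level sketch in Python of what I mean by a 32-way pipelined ripple-carry adder. The names (PipelinedRCA, clock, run) are made up for this post, and the model only tracks timing: one full adder per stage, one bit resolved per clock, a new operand pair accepted every clock. It says nothing about area or power, so treat it as an illustration of the latency/throughput trade-off and not as a hardware design.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


class PipelinedRCA:
    """Hypothetical model: stage i resolves bit i, one stage per clock."""

    def __init__(self, width=32):
        self.width = width
        self.mask = (1 << width) - 1
        # stages[i] is the register after stage i:
        # (a, b, partial_sum, carry) or None when the slot is empty.
        self.stages = [None] * width

    def _step(self, state, bit):
        # Add bit position `bit` of the operands, propagating the carry.
        a, b, acc, carry = state
        s, carry = full_adder((a >> bit) & 1, (b >> bit) & 1, carry)
        return (a, b, acc | (s << bit), carry)

    def clock(self, operands=None):
        """Advance one clock; returns a finished sum or None."""
        done = self.stages[-1]
        result = done[2] if done is not None else None
        # Shift every in-flight addition forward by one stage.
        for i in range(self.width - 1, 0, -1):
            prev = self.stages[i - 1]
            self.stages[i] = self._step(prev, i) if prev is not None else None
        if operands is None:
            self.stages[0] = None
        else:
            a, b = operands
            self.stages[0] = self._step((a & self.mask, b & self.mask, 0, 0), 0)
        return result


def run(pairs, width=32):
    """Feed one pair per clock, then flush the pipeline."""
    adder = PipelinedRCA(width)
    results = []
    for item in list(pairs) + [None] * width:
        r = adder.clock(item)
        if r is not None:
            results.append(r)
    return results
```

Each individual sum comes out 32 clocks after its operands go in, but once the pipeline is full one sum comes out per clock, which is the point for a hashing pipeline where throughput matters and latency mostly doesn't.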
Still, it is the only paper I've seen that was actually brave enough to include the plain ripple-carry adder (RCA) in the final comparison tables and graphs.