For Tornado, Raptor, and Online codes, the linear running time comes with no hidden big constant, hence they are theoretically the most efficient algorithms available today.
And the existing implementations that I've seen are not just several times slower than the code we use, but also several times slower than a well-implemented 16-bit RS code, at least for the message sizes of interest to us.
It sounds like you are parroting stuff from the internet written by solution partisans and by people without much practical experience-- without any experience with it yourself. We tried these things. They were slow. Feel free to knock yourself out looking into them further; I agree they are very interesting.
On a modern computer a "fast algorithm" that cannot make effective use of SIMD will lose access to (say) 93% of the CPU's peak throughput. It turns out that can hide a pretty big log n factor!
One thing to keep in mind when looking at algorithm asymptotics is that it is often reasonable to read "log n" as "a small constant", because no problem that fits in memory on your target computer will make log n larger than (say) 30... and often (but not always) any problem that doesn't fit in memory will have its time utterly dominated by IO costs, in which case cache behaviour is critical-- something asymptotics say nothing about. (Peeling decoders, in particular, have poor cache locality and are unfriendly to branch prediction.)
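To make the "log n is a small constant" reading concrete, here is a quick illustrative calculation (the memory sizes and 16-byte record size are my own example, not from the thread):

```python
import math

# Even for a problem filling 1 TiB of RAM with 16-byte records,
# log2(n) only reaches ~36 -- "a small constant" in practice.
for mem_bytes in (2**30, 2**34, 2**40):        # 1 GiB, 16 GiB, 1 TiB
    n = mem_bytes // 16                        # assumed 16-byte records
    print(f"{mem_bytes:>14} bytes -> log2(n) = {math.log2(n):.0f}")
```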
It's very easy to have a factor-of-thirty difference between implementations. So it's not especially surprising to see an O(N log N) algorithm matching an O(N) one... even on large inputs. I've also found that it's not that uncommon for O(N^2) algorithms to beat O(N log N) ones on realistic problems (often due to cache locality), and it's fairly common for O(N) algorithms to beat O(log N) ones on problems small enough to fit in L3 cache (usually due to branch prediction and parallelizability).
In any case, for the workload presented by fibre, the fastest code implemented for it that we've been able to find is a mixed-density quasi-random graph code... and the fairly close runner-up is an FFT-based 16-bit RS code (not coincidentally written by the same author).
For instance, suppose you want to tune RS parameters to support a 10% loss channel, i.e. a channel with 90% capacity. You need to choose n, k such that R = k/n <= 0.9. But even if you choose them to be, say, (8,10), you run a high risk of being unable to recover the original message from 8 out of 10 encoded blocks on a 30%-loss day.
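A quick sketch of that risk, assuming i.i.d. packet loss (the function name is mine; with an (n, k) MDS code you decode iff at least k of the n blocks arrive):

```python
from math import comb

def decode_failure_prob(n, k, loss):
    """P(fewer than k of n blocks arrive) under i.i.d. per-block loss."""
    p = 1.0 - loss  # per-block delivery probability
    ok = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
    return 1.0 - ok

# A (k=8, n=10) code (rate 0.8) on a 30%-loss day fails most of the time.
print(decode_failure_prob(10, 8, 0.30))  # roughly 0.62
```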
In the bitcoin block relay problem, making no assumption about δ looks more reasonable, and that is achievable through the ratelessness of Online Codes.
You can just as well use one "ratelessly" by sending as much as is needed: you just end up with a minimum bound on the rate, arising from the RS symbol size, since you can only generate so many non-duplicate symbols. There is no overhead created by doing this (and in fact none of the other options are MDS codes, so they all have overhead; e.g. for ours the probability of a failed decode when you have enough data is 0.03^(extrapackets+1)-- negligible, but technically overhead).
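To see how quickly that residual failure probability vanishes, here is the figure from the text worked out for a few values of extra packets (the 0.03 base is the number quoted above; the function name is mine):

```python
def residual_failure_prob(extra_packets):
    """Decode-failure probability with `extra_packets` symbols beyond the
    minimum, using the 0.03^(extra+1) figure quoted for this code."""
    return 0.03 ** (extra_packets + 1)

for extra in range(4):
    print(extra, residual_failure_prob(extra))
# 0 extra packets -> 3%, 3 extra packets -> less than one in a million
```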
Say you have 512 bits you want to send. You can code them as 32 symbols in a 16-bit RS code, then send up to 65535 symbols (128 KB) (or 65503 non-systematic symbols), and the receiver will recover once it has received any 32 of them. So this code works at rates from 1.0 down to 0.00048. A true rateless code could drive the rate arbitrarily close to zero, but for our application, mempool losses alone mean we never need a code with a rate of less than 0.5.
A code with a minimum rate of 0.00048 is sufficient, even with 100% of the mempool missed, for up to _99.95%_ packet loss.
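Working through that arithmetic (all figures follow from the 512-bit message and 16-bit symbol size given above):

```python
SYMBOL_BITS = 16
message_bits = 512

k = message_bits // SYMBOL_BITS      # 32 source symbols
n_max = 2**SYMBOL_BITS - 1           # 65535 distinct symbols in a 16-bit RS code
min_rate = k / n_max                 # ~0.00048: the lowest reachable rate
max_loss = 1 - min_rate              # ~99.95% packet loss still decodable

print(k, n_max, f"{min_rate:.5f}", f"{max_loss:.2%}")
```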
While "we will not simultaneously lose more than 80% of all packets and have no mempool hits" is not technically "no assumption", it's close enough for engineering purposes: at 80% loss the FEC is not really the limiting consideration for system performance in any case (the limiting one is that any link currently losing 80% of packets is very likely about to start losing 100%). If you have more than 80% packet loss, you have worse things to worry about. Moreover, at 80% packet loss the efficiency lost by repeating packets is not particularly substantial (since you're losing most packets, it's unlikely that you received any particular duplicate), so if you really did want a lower-rate code you could just start duplicating packets. As an aside, a couple of messages up you were also trying to argue that packet loss wasn't relevant.
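The duplication argument can be checked with a little arithmetic, again assuming i.i.d. loss (the function names and the two-copies example are mine):

```python
def delivery_prob(loss, copies):
    """P(at least one of `copies` copies of a packet arrives)."""
    return 1 - loss**copies

def duplicate_waste(loss, copies):
    """Expected fraction of *received* packets that are redundant copies."""
    expected_received = copies * (1 - loss)
    useful = delivery_prob(loss, copies)   # at most one copy per packet helps
    return 1 - useful / expected_received

loss = 0.8
print(delivery_prob(loss, 2))    # 0.36, up from 0.2 for a single send
print(duplicate_waste(loss, 2))  # ~10% of received packets are duplicates
```

At 80% loss, doubling each packet nearly doubles the delivery probability while only ~10% of what actually arrives is a wasted duplicate, which is the sense in which the efficiency loss is "not particularly substantial".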
In practice, because we don't want to waste unbounded amounts of bandwidth on a down peer, fibre currently sends only up to 8x the size of the block, so we never use a rate less than 0.125. Even under extremely high packet loss, a rate of 0.333 is more than low enough to accommodate the loss with no mempool hits at all. The mean rate we need for a decode is about 0.94, since missing tx data and packet loss together run around 6%.
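A back-of-the-envelope version of that rate budget (the 8x cap and ~6% combined-loss figure are the ones quoted above; the normalization is mine):

```python
block_size = 1.0                      # normalized block size
max_sent = 8 * block_size             # fibre's cap on total transmission
min_rate = block_size / max_sent      # 0.125: lowest rate ever used

combined_loss = 0.06                  # ~6% packet loss + missing tx data
mean_rate_needed = 1 - combined_loss  # ~0.94: typical rate at decode

print(min_rate, mean_rate_needed)
```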
As for Raptor not being used in industry: you are correct, I mixed it up with something else.
I expect that we won't see those codes widely deployed until after the patents expire. There are few enough applications where true ratelessness is much more useful than the wide rate flexibility you can achieve with many fast, well-understood codes-- especially by using a product of two codes (like two layers of RS codes, or RS plus the duplication code I mentioned above).