TPTB_need_war (OP)
|
|
April 12, 2016, 10:45:05 AM |
|
what is about to happen to Javascript
What is about to happen to JavaScript is very likely that it will continue to become more and more ubiquitous. That will happen first because it is headed towards a peak (critical mass about now, with everyone racing to switch from, for example, Silverlight to JavaScript, Java to Node.js, C to ASM.js, etc.). But that peak will be quickly followed by a disruption, because JavaScript is being deployed in use cases where it is not the best fit (merely because it is expedient and the only choice), and this will result in disappointment, which is for example what happened with C and C++. A better-fit programming language and app platform will arise to challenge the Internet browser. I know, because I am the one who is making it. No, JavaScript can't disrupt the non-browser apps entirely, because it is not the best fit for applications.
I agree it won't entirely, because I don't see that it brings sufficient advantages, and it has obvious disadvantages, so it will end up being better in some cases, worse in others. The other languages will continue to be used alongside it in a fragmented environment. You don't seem to grasp yet how unifying the mobile device is, coupled with how unfit the current options of HTML5 apps or app-store apps are. Appearances can be deceiving. Dig down into what users are actually doing and not doing. But I am not yet ready to reveal my entire plan. It will be in the JAMBOX crowdfund document whenever we decide to crowdfund. I am lately thinking of keeping it hidden a bit longer, because of potential copycats such as Synereo. I have enough funding to continue developing for now. No need to rush the crowdfund, especially if my health is improving. I see it. Stay tuned.
If it is what we have discussed before, then I see very, very high costs along with the benefits. I don't believe I have ever discussed anything about JAMBOX with you. If it is something new, then I look forward to seeing what you have invented. Sounds interesting.
Just remember that the mobile device is eating (almost) everything.
|
|
|
|
smooth
Legendary
Offline
Activity: 2968
Merit: 1198
|
|
April 12, 2016, 10:51:38 AM |
|
what is about to happen to Javascript
What is about to happen to JavaScript is very likely that it will continue to become even more and more ubiquitous. That will happen first because it is headed towards a peak (critical mass about now, with everyone racing to switch from, for example, Silverlight to JavaScript). But that peak will be quickly followed by a disruption, because JavaScript is being deployed in use cases where it is not a good fit, and this will result in disappointment, which is for example what happened with C and C++. C++ has declined more rapidly than C. It seems there is perhaps no use for which it is fit! But I am not yet ready to reveal my entire plan. It will be in the JAMBOX crowdfund document whenever we decide to crowdfund. I am lately thinking of keeping it hidden a bit longer, because of potential copycats such as Synereo. I have enough funding to continue developing for now. No need to rush the crowdfund, especially if my health is improving. You can also consider not revealing your entire plan and still crowdfunding. I don't believe I have ever discussed anything about JAMBOX with you. Okay, I thought you meant programming language properties specifically. In terms of the overall platform, no.
|
|
|
|
TPTB_need_war (OP)
|
|
April 12, 2016, 11:11:48 AM Last edit: April 12, 2016, 11:26:25 AM by TPTB_need_war |
|
what is about to happen to Javascript
What is about to happen to JavaScript is very likely that it will continue to become even more and more ubiquitous. That will happen first because it is headed towards a peak (critical mass about now, with everyone racing to switch from, for example, Silverlight to JavaScript). But that peak will be quickly followed by a disruption, because JavaScript is being deployed in use cases where it is not a good fit, and this will result in disappointment, which is for example what happened with C and C++. C++ has declined more rapidly than C. It seems there is perhaps no use for which it is fit! Good point and application of my theory about designed fitness. Thanks. Yeah, every time I imagine going back to C++ to get some low-level control married with generics and some fugly excuse for a first-class function, I decide I'd rather not code, lol. I think this extreme resistance to entering clusterfucks comes with past experience of them. Some crypto-currencies have decided they are willing to be sacrificed at the altar for that marriage of features. Perhaps it is a pragmatic decision, since crypto in a high-level language kind of sucks. I don't know if anyone could pay me enough to work on C++ code again, especially with the complexity that ostensibly has been layered on top since I last used it in the late 1990s (which I haven't bothered to learn). (Btw, another reason I wasn't excited to work on Monero.) Whereas, for the moment there is no alternative to C when you want portable lowest-level control. I don't believe I have ever discussed anything about JAMBOX with you.
Okay, I thought you meant programming language properties specifically. In terms of the overall platform, no. I tried to think about using JavaScript for my current plan, but there are just so many cases of tsuris and non-unification lurking. Why start another multi-year journey with a monkey-patching expedient that is a fundamental handicap? However, there are many details remaining for me to analyze, so it is possible I might conclude that not all objectives (e.g. JIT fast-start compilation) can be achieved with a language alternative. Stay tuned.
|
|
|
|
TPTB_need_war (OP)
|
|
April 12, 2016, 11:39:12 AM Last edit: April 12, 2016, 12:00:06 PM by TPTB_need_war |
|
I tried to think about using JavaScript for my current plan, but there are just so many cases of tsuris and non-unification lurking. Why start another multi-year journey with a monkey-patching expedient that is a fundamental handicap?
The answer would be time to market for one, and accessibility by the target market for another (though I don't know your target market, so perhaps disregard). That's not saying you should do so, just giving reasons why one might consider it. Absolutely. That is the tragedy of inertia. It perhaps requires extreme cleverness, or the insightful identification of a wormhole, to overcome. Edit: but note I am creating a platform, not just an app, so expediency is less important than staying power, which I think has been your point about Monero's strategy.
|
|
|
|
AlexGR
Legendary
Offline
Activity: 1708
Merit: 1049
|
|
April 12, 2016, 05:16:03 PM |
|
@AlexGR I'm still puzzled at what the hell you are going off about. It seems almost entirely strawman to me.
Very few new applications are being written in C afaik (and I do have somewhat of an idea). It is only being done by a few fetishists such as jl777.
Technically you may be correct about new applications. I don't know. But what I'm seeing in Linux repositories is a flood of C/C++ programs. Python is probably close behind. I am not seeing much JavaScript, really.
|
|
|
|
|
TPTB_need_war (OP)
|
|
April 13, 2016, 07:05:31 AM Last edit: April 13, 2016, 10:48:51 AM by TPTB_need_war |
|
rangedriver wins the crowdsourcing of the cure to my illness for suggesting Oregano oil. Well, I don't want to draw a firm conclusion, because I've had relapses in the past every time I thought I was improving. But the changes I am observing in myself over the past several days are extremely encouraging.
The breakthrough seemed to come a few days ago when I decided to start taking the Oregano oil sublingually instead of just swallowing it. I had also increased to taking it 5 - 6 times per day. I also increased the Bentonite clay doses to 3 - 4 times per day between meals. The other significant change I had made in the past few days was starting to eat 5 eggs per day instead of only small tuna (no mercury risk). Also, I was craving dark greens as I started to heal, so I started to eat a huge plate of cooked-down collard greens per day. And thus I was able to reduce my oatmeal intake slightly, since I had increased my carbs and fiber from vegetables. When I first started this change, I immediately got sinus and throat effects as if I was getting more sick, and my eyes got so irritated I could hardly see. That lasted for about 2 - 3 days, but now suddenly I am starting to feel really, really good.
The reason I decided to radically increase the dosage and go sublingual to increase absorption into the blood stream, was because I was pissed off that nothing was working. I threw all caution to the wind and decided I had nothing to lose.
1. The "75 year old skin" effect I had obtain on my shoulders and upper back during my illness (especially past year) is clearing. The pigment had turned to brown splotches with pink areas in a very rapid change which didn't seem to make sense to me as a sudden shift around age 50 and I had thought it might be caused by my illness. This has cleared up significantly and my skin is miraculously looking more normal again.
2. My voice has returned to being deeper and more masculine. My gf said to me today, "are you sick, what happened to your voice?". She had never heard me with my normal, male testosterone voice.
3. I am no longer getting pimples on my face and tongue nor red irritated splotches on my face that would ooze clear liquid when I would scratch them (at least not past days since I made the above changes).
4. My thought process has been less arduous and less of a fight against fatigue, back to the way it should be, where I am exploding with thoughts with ease.
5. My gf noticed I am talking non-stop, a change from when I refused to talk most of the time (to conserve energy).
6. My sleep is easy and I can sleep anytime I want to. The insomnia had become so bad with my illness that I could barely sleep 5 hours and only after I had stayed awake for so long that I would collapse. Now I can just decide to go lay down and get sleepy like a normal human being can.
7. The pain in my stomach/abdomen is, most of the time, non-existent!
8. I was able to get and hold an erection.
I don't know if this is permanent. But it is uber encouraging! After suffering for years and trying every sort of experimental cure, I am so excited maybe I can work and live normally again!
Note I have not really pushed myself athletically yet, so that will be the real test of whether this is a cure. And I do still have lingering symptoms, including my eyes still getting slightly irritated, some slight peripheral neuropathy effects, and slight MS-like symptoms at the back of the skull, but all very slight and improving.
I think it was important to share this given I may try to lead/join some open source and commercial projects now.
And especially to give rangedriver the credit he is due (which isn't really enough to convey what this means to me, if it is really the cure).
I should mention that Jason Hommel suggested Oregano oil to me years ago, but I thought he meant topically only. And this process of learning which homeopathic treatments are bullshit and which are really powerful has been a very steep learning curve for me. I went in with an open mind, didn't favor one or the other, and tried to climb down the rabbit hole.
Also I can't be sure it is only the Oregano oil and not also the Bentonite clay and dietary changes as well.
Edit: I had mentioned upthread that I had cut out all vegetable oils from my diet within the past month, as these are poisons. And also my only grain has been oatmeal lately.
|
|
|
|
rangedriver
|
|
April 13, 2016, 08:45:53 AM |
|
rangedriver wins the crowdsourcing of the cure to my illness for suggesting Oregano oil. Well, I don't want to draw a firm conclusion, because I've had relapses in the past every time I thought I was improving. But the changes I am observing in myself over the past several days are extremely encouraging.
And especially to give rangedriver the credit he is due (which isn't really enough to convey what this means to me, if it is really the cure).
Hey, thanks. It's great that you're making good progress and I really hope it sustains. As I mentioned before, the use of bentonite clay (along with the oregano oil) is a really smart move as it would function similar to bile acid sequestrants such as cholestyramine, the current gold standard for fungal detoxification. It's always kinda fascinating when someone has a mysterious disease, as from my experience a lot of things can eventually be cured given a little bit of time, research and careful introspection. And I know first-hand how miserable it can be to be blighted by a seemingly incurable condition that perpetually plagues the mind and body, often leaving little hope of salvation. Hopefully the convalescence will continue. If not, at least you're a little bit closer to the cure.
|
|
|
|
TPTB_need_war (OP)
|
|
April 13, 2016, 10:47:38 AM |
|
As I mentioned before, the use of bentonite clay (along with the oregano oil) is a really smart move as it would function similar to bile acid sequestrants such as cholestyramine, the current gold standard for fungal detoxification
Feeling great after dinner (3 microwaved eggs with no vegetable oil, steamed broccoli, and cooked collard greens), whereas before I would normally feel ill with various bad symptoms after eating.
Here is the brand I am taking: Yerba Prima Bentonite, Detox Pint, 16 Ounce. The total surface area available to grab toxins in one tablespoon serving is over 8,100 square feet, more than the floor surface area of four average 2000-square-foot homes. Read the reviews.
See also the other reason this stuff works:
https://researchmatters.asu.edu/stories/attacking-mrsa-metals-antibacterial-clays-2649
https://nsf.gov/discoveries/disc_summ.jsp?cntn_id=132052&org=NSF
|
|
|
|
AlexGR
Legendary
Offline
Activity: 1708
Merit: 1049
|
|
April 13, 2016, 10:50:15 PM Last edit: April 14, 2016, 12:45:14 PM by AlexGR |
|
After our recent discussion, I made a small program that calculates square roots, for like 100mn loops (x4 = finding 400mn square roots). When a value tends back towards 1, it starts adding to the variables so that it can keep going with the square roots. I started this to see what the performance difference is between Pascal and C (which I avoid like the plague, but anyway) in terms of binaries (= compiler performance), but then I expanded the experiment to see what is wrong with their SSE use. The code for Pascal, C and ASM (inside the Pascal window) is here => http://s23.postimg.org/j74spnqc9/wastingtimewithbenchmarks.jpg

So, Pascal, after fiddling around with all available optimizations, gave me ~4.5 sec. Interestingly, the debugger (objdump) shows that it uses SSE commands like SQRTPD, but it's doing so in a weird way.

C, with GCC 5.3.x, gave me 3.5 - 3.9 secs. Paradoxically, it liked lower -O settings, like -O0... -O1 lost it speed (3.8 secs) and -O2 / -O3 tried to regain it. I also got more performance with -mtune=nocona than with -mtune=core2, which is closer (architecturally) to my Q8200 and what it picks automatically when -march=native is used. I also tried -msse2/-msse3/-mssse3/-msse4.1 etc., -mfpmath with all combos, etc., etc.; at best it got down to 3.55 secs.

The object dumps of the gcc binary didn't enlighten me very much, but I could see that it's using the sqrtsd instruction 4 times. The source is:

  for (i = 0; i < 100000000; i++) {
    b = sqrt(b);
    bb = sqrt(bb);
    bbb = sqrt(bbb);
    bbbb = sqrt(bbbb);

and the dump is:

  40072e: 0f 84 9b 00 00 00    je      4007cf <main+0x12f>
  400734: f2 0f 51 d6          sqrtsd  %xmm6,%xmm2
  400738: 66 0f 2e d2          ucomisd %xmm2,%xmm2
  40073c: 0f 8a 63 02 00 00    jp      4009a5 <main+0x305>
  400742: 66 0f 28 f2          movapd  %xmm2,%xmm6
  400746: f2 0f 51 cd          sqrtsd  %xmm5,%xmm1
  40074a: 66 0f 2e c9          ucomisd %xmm1,%xmm1
  40074e: 0f 8a d9 01 00 00    jp      40092d <main+0x28d>
  400754: 66 0f 28 e9          movapd  %xmm1,%xmm5
  400758: f2 0f 51 c7          sqrtsd  %xmm7,%xmm0
  40075c: 66 0f 2e c0          ucomisd %xmm0,%xmm0
  400760: 0f 8a 47 01 00 00    jp      4008ad <main+0x20d>
  400766: 66 0f 28 f8          movapd  %xmm0,%xmm7
  40076a: f2 0f 51 c3          sqrtsd  %xmm3,%xmm0
  40076e: 66 0f 2e c0          ucomisd %xmm0,%xmm0
  400772: 0f 8a b5 00 00 00    jp      40082d <main+0x18d>

...when proper SSE use means it would load two values into the same register and do batch processing (= 2 commands x 2 data processed on the same registers).

So, I went back to Pascal, which I like better for the Turbo Pascal-like IDE in the console, and changed the code over there from:

  for i := 1 to 100000000 do
    b := sqrt(b);
    bb := sqrt(bb);
    bbb := sqrt(bbb);
    bbbb := sqrt(bbbb);

...to:

  for i := 1 to 100000000 do  // 100mn loop
  begin;
    asm
      movlpd xmm1, b      // loading the first variable "b" into the lower part of xmm1
      movhpd xmm1, bb     // loading the second variable "bb" into the higher part of xmm1
      SQRTPD xmm1, xmm1   // batch processing both variables for their square root, with one SIMD command
      movlpd xmm2, bbb    // loading the third variable "bbb" into the lower part of xmm2
      movhpd xmm2, bbbb   // loading the fourth variable "bbbb" into the higher part of xmm2
      SQRTPD xmm2, xmm2   // batch processing their square roots
      movlpd b, xmm1      //
      movhpd bb, xmm1     // returning all results from the registers back to the Pascal variables
      movlpd bbb, xmm2    //
      movhpd bbbb, xmm2   //
    end;

...and voila, my times went down to 2.2s.

So: Pascal ~4.5s, C ~3.6s, Pascal with simple, rational SSE use by someone who is not even a coder and goes to RTFM about what the SSE instructions do in order to use them = 2.2s.

Ladies and gentlemen, it is official.
Our language compilers SUCK BALLS. I had the 4 variable-assignment / sqrt lines lined up one after another, so that it was made extremely easy for the compiler to do batch processing with SSE. I even issued a #pragma directive to gcc to force it, and it didn't do anything. No, the compilers "know better".

...That's how "C is a fast language" goes down the drain. With a simple -O2 compilation it would be at 3.8s (by "using" SSE, or, more precisely, misusing it) vs my 2.2s of manual tampering in Pascal. So C became ~70% slower even when faced with almost ideally placed source code that it could exploit.

(Side by side Pascal / C / ASM inside Pascal): http://s23.postimg.org/j74spnqc9/wastingtimewithbenchmarks.jpg
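[Editor's note: for readers who want the same packed square root from C without dropping to inline assembly, here is a minimal sketch using SSE2 intrinsics. It is not AlexGR's actual benchmark: the seed values come from his listing, but the simplified +12.5 re-seed and the loop structure are illustrative. _mm_sqrt_pd is the intrinsic form of SQRTPD, so each call performs two double-precision square roots with one instruction.]

  #include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_sqrt_pd, _mm_loadu_pd, _mm_storeu_pd */
  #include <stdio.h>

  int main(void)
  {
      /* same starting constants as the forum benchmark */
      double vals[4] = {911798473.0, 143314345.0, 531432117.0, 343211418.0};
      long i;
      int k;

      for (i = 0; i < 100000000; i++) {
          __m128d lo = _mm_loadu_pd(&vals[0]);   /* pack vals[0], vals[1] into one 128-bit register */
          __m128d hi = _mm_loadu_pd(&vals[2]);   /* pack vals[2], vals[3] into another */
          lo = _mm_sqrt_pd(lo);                  /* two square roots per instruction (SQRTPD) */
          hi = _mm_sqrt_pd(hi);
          _mm_storeu_pd(&vals[0], lo);
          _mm_storeu_pd(&vals[2], hi);
          for (k = 0; k < 4; k++)                /* crude re-seed so the values don't decay to 1 */
              if (vals[k] <= 1.0000001) vals[k] += i + 12.5;
      }
      /* printing the results keeps the compiler from discarding the loop */
      printf("%f %f %f %f\n", vals[0], vals[1], vals[2], vals[3]);
      return 0;
  }

Unlike inline asm, the compiler still handles register allocation and scheduling around the intrinsics, and SSE2 is baseline on x86-64, so this builds with an ordinary gcc invocation.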
|
|
|
|
smooth
Legendary
Offline
Activity: 2968
Merit: 1198
|
|
April 13, 2016, 11:46:53 PM |
|
Looking at the asm output from the original, my guess is that you subtly changed the behavior of the program, probably having to do with rounding, overflows or NaN, setting of errno, or some combination. You didn't include the entire disassembly, so it is hard to see where the jumps are going.
|
|
|
|
AlexGR
Legendary
Offline
Activity: 1708
Merit: 1049
|
|
April 13, 2016, 11:58:06 PM Last edit: April 15, 2016, 06:50:23 PM by AlexGR |
|
I have a logic test at the end (not displayed here) to always check the numbers for correctness. It goes like:

  Writeln();
  Write('Final number: ', b+bb+bbb+bbbb:0:22, ' ');
  if (b+bb+bbb+bbbb) > 4.0000032938759028 then Write('Result [INCORRECT - 4.0000032938759027 expected]');
  if (b+bb+bbb+bbbb) < 4.0000032938759026 then Write('Result [INCORRECT - 4.0000032938759027 expected]');

...anyway, the source for C is:

  #include <math.h>
  #include <stdio.h>
  #include <time.h>

  int main()
  {
      printf("\n");

      const double a = 911798473;
      const double aa = 143314345;
      const double aaa = 531432117;
      const double aaaa = 343211418;
      unsigned int i;
      double score;

      double b;
      double bb;
      double bbb;
      double bbbb;

      b = a;
      bb = aa;
      bbb = aaa;
      bbbb = aaaa;

      double total_time;
      clock_t start, end;
      start = clock();

      for (i = 0; i < 100000000; i++) {
          b = sqrt(b);
          bb = sqrt(bb);
          bbb = sqrt(bbb);
          bbbb = sqrt(bbbb);
          if (b <= 1.0000001)    { b = b + i + 12.432432432; }
          if (bb <= 1.0000001)   { bb = bb + i + 15.4324442; }
          if (bbb <= 1.0000001)  { bbb = bbb + i + 19.42884; }
          if (bbbb <= 1.0000001) { bbbb = bbbb + i + 34.481; }
      }

      end = clock();

      total_time = ((double) (end - start)) / CLOCKS_PER_SEC * 1000;
      score = (10000000 / total_time);
      printf("\nTime elapsed: %0.0f msecs", total_time);
      printf("\nScore: %0.0f\n", score);
      return 0;
  }
And the Pascal/asm (FreePascal / 64-bit / Linux), including the logic test:

  {$ASMMODE intel}
  Uses sysutils;

  Const  // some randomly chosen constants to begin the math functions
    a: double = 911798473;
    aa: double = 143314345;
    aaa: double = 531432117;
    aaaa: double = 343211418;

  Var
    b, bb, bbb, bbbb: double;  // variables that will be used for storing square roots
    time1, score: single;      // how much time the program took, and what the benchmark score is
    i: longword;               // loop counter

  Begin
    Writeln();  // just printing an empty line

    b := a;     // assign some large values in order to start finding square roots
    bb := aa;
    bbb := aaa;
    bbbb := aaaa;

    sleep(100); // a 100ms delay before we start the timer, so that any I/O has stopped

    time1 := GetTickCount64();

    for i := 1 to 100000000 do  // 100mn loop
    begin;
      asm
        movlpd xmm1, b      // loading the first variable "b" into the lower part of xmm1
        movhpd xmm1, bb     // loading the second variable "bb" into the higher part of xmm1
        SQRTPD xmm1, xmm1   // batch processing both variables for their square root, in the same register, with one SIMD command
        movlpd xmm2, bbb    // loading the third variable "bbb" into the lower part of xmm2
        movhpd xmm2, bbbb   // loading the fourth variable "bbbb" into the higher part of xmm2
        SQRTPD xmm2, xmm2   // batch processing their square roots
        movlpd b, xmm1      //
        movhpd bb, xmm1     // returning all results from the registers back to memory (the Pascal program variables)
        movlpd bbb, xmm2    //
        movhpd bbbb, xmm2   //
      end;

      { b := sqrt(b);       // This entire part was replaced with the asm above.
        bb := sqrt(bb);     // On my machine this code gives me ~4530ms while the asm above gives 2240ms.
        bbb := sqrt(bbb);
        bbbb := sqrt(bbbb); }

      if b <= 1.0000001 then b := b + i + 12.432432432;    // increase b/bb/bbb/bbbb back to higher values by
      if bb <= 1.0000001 then bb := bb + i + 15.4324442;   // adding integers and decimals to them, in order
      if bbb <= 1.0000001 then bbb := bbb + i + 19.42884;  // to keep the variables large and continue the
      if bbbb <= 1.0000001 then bbbb := bbbb + i + 34.481; // process of finding square roots, instead of the
                                                           // variables going to "1" due to finite decimal precision.
    end;

    time1 := GetTickCount64() - time1;
    score := 10000000 / time1;  // Just a way to give a "score" instead of just the time elapsed.
                                // Baseline calibration is 1000 points awarded for a 10000ms run;
                                // in other words, if you finish 5 times faster, say 2000ms, you get 5000 points.

    Writeln();
    Write('Final number: ', b+bb+bbb+bbbb:0:22, ' ');
    if (b+bb+bbb+bbbb) > 4.0000032938759028 then Write('Result [INCORRECT - 4.0000032938759027 expected]');  // checking result
    if (b+bb+bbb+bbbb) < 4.0000032938759026 then Write('Result [INCORRECT - 4.0000032938759027 expected]');  // checking result

    Writeln();
    Writeln('Time elapsed: ', time1:0:0, ' msecs.');  // Time elapsed announced to the user
    Writeln('Score: ', FloatToStr(round(score)));     // Score announced to the user
  End.
|
|
|
|
smooth
Legendary
Offline
Activity: 2968
Merit: 1198
|
|
April 14, 2016, 12:27:52 AM |
|
I have a logic test at the end (not displayed here) to always check the numbers for correctness.
I didn't say you changed the result, but you changed the behavior of the relevant portion of the program for certain inputs. sqrt() has edge-case behavior that you didn't replicate when replacing it with 'asm SQRTPD'. If you expect the compiler to optimize the entire thing globally based on specific inputs that you know won't hit the edge cases, well, that's kind of arbitrary. It could optimize the whole thing away and just return a number. High-performance code usually uses well-tuned math libraries that avoid these pitfalls. The standard library math functions aren't really suitable for anything beyond casual use.
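[Editor's note: to make that edge case concrete, here is a minimal sketch of the behavior the library sqrt() is specified to provide and a bare SQRTPD instruction is not. It assumes a glibc-style libm where math_errhandling includes MATH_ERRNO, so a domain error sets errno; link with -lm.]

  #include <errno.h>
  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      errno = 0;
      double r = sqrt(-1.0);   /* domain error: the result is NaN */
      printf("result = %f, errno = %d (%s)\n",
             r, errno, errno == EDOM ? "EDOM" : "errno not set");
      return 0;
  }

It is exactly this errno bookkeeping (plus the NaN checks visible as ucomisd/jp in the disassembly) that forces the compiler to keep the four sqrt() calls distinguishable instead of packing them.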
|
|
|
|
AlexGR
Legendary
Offline
Activity: 1708
Merit: 1049
|
|
April 14, 2016, 12:44:33 AM |
|
1. The result is OK, so no problem there with code behavior.

2. The sqrt is already producing SSE code; I could see it in the disassembler. The problem is that it is not in line with the SIMD spirit, meaning that the whole concept is a *single* instruction processing *multiple* data.

If I have 4 x sqrt in the code and 4 x commands reaching the CPU, then where is the SIMD? It's 4 instructions doing 4 pieces of data. That's, well, Single Instruction Single Data... and on a 128-bit register (using just 64-bit lengths).

When you see the disassembler giving you 4 SIMD instructions when it should be 2 (because the variables are 64-bit), you know it's all fucked up right there. I could use the 387 unit as well. Actually I did that out of curiosity. It was slower than the SSE. Apparently the SSE unit is better at that.
  // The x87 way / 5150ms
  // fld b
  // fsqrt
  // fstp b
  // fld bb
  // fsqrt
  // fstp bb
  // fld bbb
  // fsqrt
  // fstp bbb
  // fld bbbb
  // fsqrt
  // fstp bbbb
...so back to SSE for doing it right (2 commands, processing 2 pieces of data each). If I were using single precision I could do it with 1 command processing 4 pieces of data at once (a 128-bit register fits 4 x 32-bit values) - see the single-precision sketch below.

3. My C-equivalent code *was* using the C math library - which should be fast, right? Still, very slow at ~3.8s with a normal -O2 build and at best 3.5s after thorough tampering.
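[Editor's note: a minimal sketch of that single-precision case, with illustrative values only (not part of the benchmark): with 32-bit floats, one 128-bit register holds four values, so a single _mm_sqrt_ps (SQRTPS) does four square roots at once.]

  #include <stdio.h>
  #include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_sqrt_ps */

  int main(void)
  {
      float v[4] = {911798473.0f, 143314345.0f, 531432117.0f, 343211418.0f};
      __m128 x = _mm_loadu_ps(v);   /* pack all four floats into one 128-bit register */
      x = _mm_sqrt_ps(x);           /* SQRTPS: one instruction, four square roots */
      _mm_storeu_ps(v, x);
      printf("%f %f %f %f\n", v[0], v[1], v[2], v[3]);
      return 0;
  }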
|
|
|
|
smooth
Legendary
Offline
Activity: 2968
Merit: 1198
|
|
April 14, 2016, 12:47:31 AM |
|
I added -fno-math-errno, which removes some of the error-handling logic performed by the sqrt() function that is not performed by the instruction, and indeed the compiler is smart enough to remove the whole loop because you never used the result at all:

  Time elapsed: 0 msecs
  Score: inf

The reason you see four steps in the original asm output is that the error checking (if it were used) would require doing the operations sequentially, so that the correct invocation of sqrt is the one generating the error. Of course, this doesn't really apply if you apply sufficiently global optimizations, but then we are back to the above result and what I said earlier.

EDIT: I added something to display the results at the end so it doesn't drop the entire loop, and while using the right compiler options improves things a bit, it is still generating ucomisd, which clearly indicates some sort of range/error/NaN checking. I didn't go through the code carefully to figure out what it is doing, but it suffices to say that sqrt() and 'asm SQRTPD' are not functionally equivalent.

If you write some code that doesn't pull in floating point (especially library function) minutiae you will often see actual vectorization.
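[Editor's note: a minimal sketch of both points, as an illustration rather than smooth's exact test. Printing the results at the end keeps them live, so the optimizer cannot legally delete the loop; -fno-math-errno (a standard GCC flag) tells the compiler it need not preserve errno for sqrt(), which removes much of the per-call checking seen in the disassembly. The file name below is a placeholder.]

  /* build, for example, with: gcc -O2 -fno-math-errno bench.c -lm */
  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      double b = 911798473.0, bb = 143314345.0;
      long i;

      for (i = 0; i < 100000000; i++) {
          b = sqrt(b);
          bb = sqrt(bb);
          if (b <= 1.0000001)  b = b + i + 12.432432432;
          if (bb <= 1.0000001) bb = bb + i + 15.4324442;
      }
      printf("%.22f\n", b + bb);  /* using the results keeps the loop from being dropped */
      return 0;
  }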
|
|
|
|
AlexGR
Legendary
Offline
Activity: 1708
Merit: 1049
|
|
April 14, 2016, 01:01:16 AM |
|
EDIT: I added something to display the results at the end so it doesn't drop the entire loop, and while using the right compiler options improves things a bit, it is still generating ucomisd, which clearly indicates some sort of range/error/NaN checking. I didn't go through the code carefully to figure out what it is doing, but it suffices to say that sqrt() and 'asm SQRTPD' are not functionally equivalent.

If you write some code that doesn't pull in floating point (especially library function) minutiae you will often see actual vectorization.

Yeah, I can't really tell what it's doing either, but seeing 4x SIMD instructions for 4 variables, well, that's a "winner" right there for "FAIL". If the instructions aren't fewer than the data variables, you are doing it wrong. And that's not related to the various checks btw.
|
|
|
|
smooth
Legendary
Offline
Activity: 2968
Merit: 1198
|
|
April 14, 2016, 01:04:00 AM |
|
EDIT: I added something to display the results at the end so it doesn't drop the entire loop, and while using the right compiler options improves things a bit, it is still generating ucomisd, which clearly indicates some sort of range/error/NaN checking. I didn't go through the code carefully to figure out what it is doing, but it suffices to say that sqrt() and 'asm SQRTPD' are not functionally equivalent.

If you write some code that doesn't pull in floating point (especially library function) minutiae you will often see actual vectorization.

Yeah, I can't really tell what it's doing either, but seeing 4x SIMD instructions for 4 variables, well, that's a "winner" right there for "FAIL". If the instructions aren't fewer than the data variables, you are doing it wrong. And that's not related to the various checks btw.

It's just a straightforward translation of your source code with four separate sqrt() calls. It is using the SIMD instructions (in a SISD mode) because they are faster than the FPU instructions, as you pointed out. It is related to the error checks, because the errors are specific to a particular sqrt() call. For example, if the code is interrupted at some point, then the values of the intermediate variables and the error state would have to be consistent even if that happens after, say, 2 of the 4 sqrt() calls and before the other two. As I said, nobody expects the standard library stuff with error checking to be highly optimized. Use a math library, or use a language where math functions aren't defined to do things like set a global variable if they have weird inputs.
|
|
|
|
AlexGR
Legendary
Offline
Activity: 1708
Merit: 1049
|
|
April 14, 2016, 01:06:44 AM |
|
EDIT: I added something to display the results at the end so it doesn't drop the entire loop, and while using the right compiler options improves things a bit, it is still generating ucomisd, which clearly indicates some sort of range/error/NaN checking. I didn't go through the code carefully to figure out what it is doing, but it suffices to say that sqrt() and 'asm SQRTPD' are not functionally equivalent.

If you write some code that doesn't pull in floating point (especially library function) minutiae you will often see actual vectorization.

Yeah, I can't really tell what it's doing either, but seeing 4x SIMD instructions for 4 variables, well, that's a "winner" right there for "FAIL". If the instructions aren't fewer than the data variables, you are doing it wrong. And that's not related to the various checks btw.

It's just a straightforward translation of your source code with four separate sqrt() calls. It is using the SIMD instructions (in a SISD mode) because they are faster than the FPU instructions, as you pointed out.

I'm just a "noob", but is it too much of an audacious expectation that gcc will actually group things that can be grouped, in order to process them faster? I mean, I couldn't make it any easier for the compiler: I ordered the calls one after the other, without other logic steps interfering and making the compiler question whether it is safe to do it (in case other stuff might depend on a "sequential" result). Sequential but separate = safe.
|
|
|
|
TPTB_need_war (OP)
|
|
April 14, 2016, 06:53:41 AM |
|
Hehe, tech shop in my thread. Cool.
Feeling strong again today. Have more Oregano oil on mail order tomorrow so I can maintain the high doses.
60 points on "Mamba Out". Had to grab my basketball for a few minutes even though my right rotator cuff is not entirely healed yet, but I easily elevated 25" without a warm up (age 51 in June), so I am excited to see what is still possible (maybe by next month start training again).
We'll be probably seeing Kobe potentially as a mogul in social media.
What a way to go out, dazzling glimpses of the younger Kobe. Great way to start my day (except for the brownout which forced me to watch only the highlights from 4th quarter on YouTube).
|
|
|
|
smooth
Legendary
Offline
Activity: 2968
Merit: 1198
|
|
April 14, 2016, 06:57:34 AM |
|
EDIT: I added something to display the results at the end so it doesn't drop the entire loop, and while using the right compiler options improves things a bit, it is still generating ucomisd, which clearly indicates some sort of range/error/NaN checking. I didn't go through the code carefully to figure out what it is doing, but it suffices to say that sqrt() and 'asm SQRTPD' are not functionally equivalent.

If you write some code that doesn't pull in floating point (especially library function) minutiae you will often see actual vectorization.

Yeah, I can't really tell what it's doing either, but seeing 4x SIMD instructions for 4 variables, well, that's a "winner" right there for "FAIL". If the instructions aren't fewer than the data variables, you are doing it wrong. And that's not related to the various checks btw.

It's just a straightforward translation of your source code with four separate sqrt() calls. It is using the SIMD instructions (in a SISD mode) because they are faster than the FPU instructions, as you pointed out.

I'm just a "noob", but is it too much of an audacious expectation that gcc will actually group things that can be grouped, in order to process them faster? I mean, I couldn't make it any easier for the compiler: I ordered the calls one after the other, without other logic steps interfering and making the compiler question whether it is safe to do it (in case other stuff might depend on a "sequential" result). Sequential but separate = safe.

It would do exactly that (sometimes at least) when you are dealing with operations that don't have bizarre side effects and complicated rules about specific input values, as the sqrt() library function does. For example, in the Monero mining code there is a loop that XORs some values one byte at a time, but when you look at the compiled code it actually does it 16 bytes at a time using SSE (PXOR instruction IIRC). A better compiler certainly could do better with sqrt() in some cases, especially with the flag I mentioned (and even without, given sufficient global analysis, but as I said, how much of that to do is somewhat of a judgement call), but I'm just pointing out that the program you fed it was not as simple as it appeared, in terms of what you were asking for. I'm not sure what the deal is with Pascal; I never use it.
|
|
|
|
|