Bitcoin Forum

Other => CPU/GPU Bitcoin mining hardware => Topic started by: 1l1l11ll1l on June 18, 2012, 11:09:19 PM



Title: Xeon Phi
Post by: 1l1l11ll1l on June 18, 2012, 11:09:19 PM
Interesting. Turned the Knights Corner into a PCIe product.

http://www.engadget.com/2012/06/18/intel-christens-its-MIC-products-xeon-phi/

http://newsroom.intel.com/servlet/JiveServlet/showImage/102-2851-9-2346/Intel_Xeon_Phi_PCIe_Card.jpg



Title: Re: Xeon Phi
Post by: crazyates on June 18, 2012, 11:58:51 PM
The question is: can it mine? 50 cores isn't a lot compared to ATI's thousands, but IIRC, these would be WAY higher clocked?


Title: Re: Xeon Phi
Post by: rjk on June 19, 2012, 12:37:15 AM
Cool! Wonder if it will be the Litecoin endgame. ;D


Title: Re: Xeon Phi
Post by: DiabloD3 on June 19, 2012, 01:14:17 AM
So what is it, exactly?


Title: Re: Xeon Phi
Post by: rjk on June 19, 2012, 01:15:41 AM
So what is it, exactly?
50-core x86 co-processor, code named Knights Corner, in a PCIe card.


Title: Re: Xeon Phi
Post by: Electricbees on June 19, 2012, 01:29:11 AM
I've been waiting for someone to make something like this...


Title: Re: Xeon Phi
Post by: DiabloD3 on June 19, 2012, 01:52:03 AM
So what is it, exactly?
50-core x86 co-processor, code named Knights Corner, in a PCIe card.

50 normal cores? And 22nm trigate fab, so it's probably some derivative of Sandy Bridge (but with Failabee-like ring busses). I wonder what sort of onboard memory it has and if it's considered local memory and normally addressable.


Title: Re: Xeon Phi
Post by: rjk on June 19, 2012, 01:53:40 AM
So what is it, exactly?
50-core x86 co-processor, code named Knights Corner, in a PCIe card.

50 normal cores? And 22nm trigate fab, so it's probably some derivative of Sandy Bridge (but with Failabee-like ring busses). I wonder what sort of onboard memory it has and if it's considered local memory and normally addressable.
Well, apparently they are x86 compatible, but I can't imagine that they are totally complete in the sense that they could be used as a main processor. I seem to recall reduced cache and a few other limitations.


Title: Re: Xeon Phi
Post by: DiabloD3 on June 19, 2012, 01:56:44 AM
So what is it, exactly?
50-core x86 co-processor, code named Knights Corner, in a PCIe card.

50 normal cores? And 22nm trigate fab, so it's probably some derivative of Sandy Bridge (but with Failabee-like ring busses). I wonder what sort of onboard memory it has and if it's considered local memory and normally addressable.
Well, apparently they are x86 compatible, but I can't imagine that they are totally complete in the sense that they could be used as a main processor. I seem to recall reduced cache and a few other limitations.

Cache isn't the issue, really. But if it still can perform just as well on highly branchy code, I might have a use for one of those.


Title: Re: Xeon Phi
Post by: 1l1l11ll1l on June 19, 2012, 02:20:19 AM
http://blogs.intel.com/technology/files/2012/06/phi_coprocessor_TEASER1.jpg


Title: Re: Xeon Phi
Post by: Bitcoin Oz on June 19, 2012, 02:25:13 AM
Someone should make a coin that is optimised for nvidia hardware...

NCoin lol


Title: Re: Xeon Phi
Post by: pekv2 on June 19, 2012, 02:42:59 AM
Just ran across this on hardocp.com.

I was going to post the article until I saw this thread, so I'll ask the same questions.

What kind of results would this bring to Bitcoin and Litecoin mining?


Title: Re: Xeon Phi
Post by: BinaryMage on June 19, 2012, 03:14:19 AM
Just ran across this on hardocp.com.

I was going to post the article until I saw this thread, so I'll ask the same questions.

What kind of results would this bring to Bitcoin and Litecoin mining?

I would guess nothing spectacular, especially for its likely price point.

For comparison, a 5870 puts out ~2.7 TFLOPS. Bitcoin mining isn't floating point, but that indicates to some degree that this unit would likely have ~150-200 MH/s. (For comparison, one high-end Xeon core puts out around 3-4 MH/s; multiply that by 50.)

As an enterprise product, its pricing will be in the multiple thousands at least. With less than 0.1 MH per dollar, this isn't going to change anything.
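The estimate above is simple scaling arithmetic. A minimal sketch of it, assuming (as the post does) that every Phi core hashes like one desktop Xeon core. The $3000 price is a hypothetical stand-in for "multiple thousands", not a real list price.

```python
# Back-of-envelope: naive per-core scaling plus cost efficiency.
# All inputs are rough assumptions taken from the post, not measurements.

PER_CORE_MHS = (3.0, 4.0)  # one high-end Xeon core, per the post
CORES = 50

def card_hashrate_mhs(per_core_mhs: float, cores: int = CORES) -> float:
    """Assume hashing scales linearly with core count."""
    return per_core_mhs * cores

low, high = (card_hashrate_mhs(r) for r in PER_CORE_MHS)

HYPOTHETICAL_PRICE_USD = 3000.0  # illustrative only
best_mh_per_dollar = high / HYPOTHETICAL_PRICE_USD

print(f"{low:.0f}-{high:.0f} MH/s")          # 150-200 MH/s
print(f"{best_mh_per_dollar:.3f} MH per $")  # 0.067 MH per $
```

Any price above $2000 keeps even the optimistic 200 MH/s figure under 0.1 MH per dollar.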




Title: Re: Xeon Phi
Post by: goxed on June 19, 2012, 05:09:06 AM
Mind you it's 1TFLOPS Double Precision.. http://newsroom.intel.com/community/intel_newsroom/blog/2012/06/17/latest-intel-xeon-processors-e5-product-family-achieves-fastest-adoption-of-new-technology-on-top500-list
IMO, no single chip GPU has that kind of performance for double precision arithmetic.


Title: Re: Xeon Phi
Post by: Gladamas on June 19, 2012, 05:17:42 AM
By "highly parallel tasks" does that include two rounds of SHA-256?


Title: Re: Xeon Phi
Post by: BinaryMage on June 19, 2012, 06:15:29 AM
Mind you it's 1TFLOPS Double Precision.. http://newsroom.intel.com/community/intel_newsroom/blog/2012/06/17/latest-intel-xeon-processors-e5-product-family-achieves-fastest-adoption-of-new-technology-on-top500-list
IMO, no single chip GPU has that kind of performance for double precision arithmetic.

That makes a difference; I just read the post a few above, not the press release. Still, I doubt this will be a gamechanger. ;)

By "highly parallel tasks" does that include two rounds of SHA-256?

Well, I'm sure Intel would say so, but GPUs are also excellent at "highly parallel tasks".


Title: Re: Xeon Phi
Post by: tgmarks on June 19, 2012, 06:21:31 AM
I wonder what the power draw of something like this is.


Title: Re: Xeon Phi
Post by: goxed on June 19, 2012, 06:51:13 AM
power draw
I think 200 Watts. Here are some pics; I can see the two 3x2 PCI-E power connectors.
http://cdn.itproportal.com/photos/knightsferry-ins-situ.jpg

http://cdn.itproportal.com/photos/knightsferryrear.jpg

http://cdn.itproportal.com/photos/knightsferry-demo.jpg



Title: Re: Xeon Phi
Post by: Miner99er on June 19, 2012, 07:06:04 AM
It's actually 50+ original Pentium cores (think after 486), just shrunk down to 22nm and built with their 3D Trigate design, not Sandy Bridge at all. I'm sure they've added some extra instructions but still, it's not much more than that.


Title: Re: Xeon Phi
Post by: John (John K.) on June 19, 2012, 07:15:44 AM
power draw
I think 200 Watts. Here are some pics; I can see the two 3x2 PCI-E power connectors.
http://cdn.itproportal.com/photos/knightsferry-ins-situ.jpg

http://cdn.itproportal.com/photos/knightsferryrear.jpg

http://cdn.itproportal.com/photos/knightsferry-demo.jpg



I see 8TH of power staring at me. :o My goodness, won't the cards overheat from being so packed against each other?


Title: Re: Xeon Phi
Post by: BR0KK on June 19, 2012, 08:56:17 AM
It's actually 50+ original Pentium cores (think after 486), just shrunk down to 22nm and built with their 3D Trigate design, not Sandy Bridge at all. I'm sure they've added some extra instructions but still, it's not much more than that.

Yep, I read that too. Something about Pentium, if not Pentium 3 or 4.


Title: Re: Xeon Phi
Post by: mrb on June 19, 2012, 09:07:43 AM
It's actually 50+ original Pentium cores (think after 486), just shrunk down to 22nm and built with their 3D Trigate design, not Sandy Bridge at all. I'm sure they've added some extra instructions but still, it's not much more than that.

It is a lot more than that. Xeon Phi implements 512-bit SIMD units, so it can execute 16x more operations per clock than one (non-MMX) Pentium core.
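The 16x figure is just 512 / 32: one 512-bit register holds 16 independent 32-bit lanes. The sketch below models only that data layout, using a Python int as a stand-in for a vector register; it is not Xeon Phi's actual instruction set.

```python
# One 512-bit register = 16 independent 32-bit lanes, so one vector
# instruction does the work of 16 scalar 32-bit instructions.

VECTOR_BITS = 512
LANE_BITS = 32
LANES = VECTOR_BITS // LANE_BITS  # 16
LANE_MASK = (1 << LANE_BITS) - 1

def pack(lanes):
    """Pack 16 uint32 values into one 512-bit integer (lane 0 = low bits)."""
    assert len(lanes) == LANES
    v = 0
    for i, x in enumerate(lanes):
        v |= (x & LANE_MASK) << (i * LANE_BITS)
    return v

def unpack(v):
    """Split a 512-bit integer back into its 16 uint32 lanes."""
    return [(v >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]

a = list(range(LANES))
b = [0xFFFFFFFF] * LANES
# A single "vector XOR" performs 16 scalar 32-bit XORs at once.
packed_xor = unpack(pack(a) ^ pack(b))
assert packed_xor == [x ^ y for x, y in zip(a, b)]
print(LANES, "lanes per 512-bit operation")
```

XOR is used here because bitwise ops never cross lane boundaries. A lane-wise add on this packed int would need explicit per-lane carry masking, which real SIMD hardware does for free.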


Title: Re: Xeon Phi
Post by: Lethos on June 19, 2012, 09:22:50 AM
I have to admit the Intel Phi looks good. However, they are targeting a rather small market (supercomputing), so I hope they are considering very competitive prices, since their competition is GPUs whose performance numbers already match it.

The advantage they claim is that, since it's CPU-based, it's easier to program for. But in supercomputing you aren't exactly dealing with average Joes; you're dealing with programmers and scientists with Master's degrees and PhDs, not the sort who are fazed by that.

I don't know why Intel did this. Is it just a form of multi-core CPU, or is it more like a GPU?
After all, there is a reason why Bitcoin mining moved away from CPUs up to GPUs, and I expect it will move mostly to FPGAs in the next few years.
Supercomputing is on the same path; it has become more GPGPU-powered in recent years.
I can't see Intel's attempt to hold supercomputing back to CPU-based designs lasting. However, this isn't really a normal CPU, so I'll still be interested in some real results.



Title: Re: Xeon Phi
Post by: BR0KK on June 19, 2012, 09:23:50 AM
So is that good for mining?

What are the odds we can get hold of one unit? It will be expensive and you'd need some enterprise contacts to buy one (it's not a consumer product), like Tesla units ....


Title: Re: Xeon Phi
Post by: Lethos on June 19, 2012, 09:47:48 AM
So is that good for mining?

What are the odds we can get hold of one unit? It will be expensive and you'd need some enterprise contacts to buy one (it's not a consumer product), like Tesla units ....

It would be as good for mining as 50-60 CPUs would be. It can be done, but it's not a very good idea. I doubt they've made the architecture and design GPU-like enough for these to be worth hashing on. I expect these to cost thousands, like most Fermi GPUs do.

So no, not good for mining.


Title: Re: Xeon Phi
Post by: Gabi on June 19, 2012, 12:13:01 PM
It is NOT good for mining. An ATI GPU is probably much much faster.

This thing is 1TFLOP in double precision in x86

No need to mess with CUDA, OpenCL and other fail GPU languages with tons of problems. This is x86. Everything will run on it.

Do you know BOINC? Nice, now buy some of these Xeon Phi, run BOINC and suddenly epic computing power.


Title: Re: Xeon Phi
Post by: Miner99er on June 19, 2012, 12:35:39 PM
It's actually 50+ original Pentium cores (think after 486), just shrunk down to 22nm and built with their 3D Trigate design, not Sandy Bridge at all. I'm sure they've added some extra instructions but still, it's not much more than that.

It is a lot more than that. Xeon Phi implements 512-bit SIMD units, so it can execute 16x more operations per clock than one (non-MMX) Pentium core.

Well, it's built on the same platform as Intel's defunct Larrabee GPU design. You're right about what it can do. I should have written a more thorough post myself... but below is the wiki page for Larrabee. I don't think Phi will be any different (sans the graphics capabilities).


http://en.wikipedia.org/wiki/Larrabee_(microarchitecture)

Edit: Reading through the Wiki on Larrabee... it was going to be a 2 TFLOP card, while Phi is only 1 TFLOP. Wonder what happened.


Title: Re: Xeon Phi
Post by: goxed on June 19, 2012, 12:40:20 PM
Quote
Edit: Reading through the Wiki on Larrabee... it was going to be a 2 TFLOP card, while Phi is only 1 TFLOP. Wonder what happened.
The 2TFLOP estimate for Larrabee is for single precision floating point operations, while the 1TFLOP quoted for Phi is for double precision.
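Why the two figures can describe similar hardware: a 512-bit register holds 16 single-precision values but only 8 double-precision ones. A minimal sketch, assuming each lane retires the same number of FLOPs per clock regardless of width (so DP peak is exactly half of SP peak):

```python
VECTOR_BITS = 512

def lanes(element_bits: int) -> int:
    """How many elements of the given width fit in one vector register."""
    return VECTOR_BITS // element_bits

sp_lanes = lanes(32)  # single precision (float32)
dp_lanes = lanes(64)  # double precision (float64)

# Under the equal-FLOPs-per-lane assumption, DP peak = SP peak * dp/sp.
dp_to_sp_ratio = dp_lanes / sp_lanes
larrabee_sp_tflops = 2.0
print(sp_lanes, dp_lanes, larrabee_sp_tflops * dp_to_sp_ratio)  # 16 8 1.0
```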


Title: Re: Xeon Phi
Post by: mrb on June 19, 2012, 12:49:47 PM
AFAIK Intel targets ~1.5GHz for the Xeon Phi.
So, 50 cores with their 512-bit SIMD instruction set would execute 1200 billion 32-bit instructions per second.
Assuming a core can execute each SHA-256 operation (rotate, shift, add, xor, or, and) in 1 clock cycle, and does not have an instruction like BFI_INT to optimize ch() and maj(), it would take about 4300 clocks to compute a Bitcoin hash.
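What an instruction like BFI_INT buys: SHA-256's Ch() is exactly a per-bit select, and Maj() can be rewritten as one XOR plus one select. The sketch below models a generic bit-select operation; it does not reproduce BFI_INT's exact operand order, and the sample inputs are simply the first three SHA-256 initial hash values.

```python
MASK32 = 0xFFFFFFFF

def ch(x: int, y: int, z: int) -> int:
    """SHA-256 Ch ("choose"): bit from y where x is 1, else bit from z."""
    return ((x & y) ^ (~x & z)) & MASK32

def maj(x: int, y: int, z: int) -> int:
    """SHA-256 Maj: per-bit majority vote of x, y, z."""
    return (x & y) ^ (x & z) ^ (y & z)

def bitselect(mask: int, a: int, b: int) -> int:
    """Bits of a where mask is 1, bits of b where mask is 0.

    Costs three plain ops here (xor, and, xor); hardware with a
    bit-select / bitfield-insert instruction does it in one."""
    return b ^ (mask & (a ^ b))

# Ch is one bit-select. For Maj: where x and y differ, z decides the
# majority; where they agree, the result equals y.
x, y, z = 0x6A09E667, 0xBB67AE85, 0x3C6EF372
assert ch(x, y, z) == bitselect(x, y, z)
assert maj(x, y, z) == bitselect(x ^ y, z, y)
```

Without such an instruction, each Ch and Maj costs several ALU ops per SHA-256 round, which is part of why the per-hash instruction count stays in the thousands.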

Given all these assumptions, a Xeon Phi card should mine at roughly 280 Mhash/s, or about as fast as a low-end HD 7850. Not impressive.
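The estimate reduces to: cores × 16 lanes × clock ÷ ops per hash. A sketch reproducing it, where every input (50 cores, ~1.5 GHz, one vector op per core per clock, ~4300 ops per hash) is one of the post's assumptions rather than a measured figure:

```python
SIMD_LANES = 512 // 32  # 16 32-bit lanes per vector instruction

def vector_ops_per_second(cores: int, clock_hz: float) -> float:
    """32-bit lane-operations per second, one vector op per core per clock."""
    return cores * SIMD_LANES * clock_hz

def hashrate_mhs(cores: int, clock_hz: float, ops_per_hash: int = 4300) -> float:
    """Mhash/s if every lane retires one SHA-256 op per clock."""
    return vector_ops_per_second(cores, clock_hz) / ops_per_hash / 1e6

ops = vector_ops_per_second(50, 1.5e9)
rate = hashrate_mhs(50, 1.5e9)
print(f"{ops / 1e9:.0f} billion 32-bit ops/s")  # 1200 billion 32-bit ops/s
print(f"~{rate:.0f} MH/s")                      # ~279 MH/s
```

The result scales linearly with clock: at ~1 GHz instead of 1.5 GHz, `hashrate_mhs(50, 1.0e9)` comes out around 186 MH/s.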


Title: Re: Xeon Phi
Post by: BR0KK on June 19, 2012, 12:51:59 PM
Quote
Edit: Reading through the Wiki on Larrabee... it was going to be a 2 TFLOP card, while Phi is only 1 TFLOP. Wonder what happened.

I read something about them wanting to implement more than 80 (100) cores, but they failed to do that.


Title: Re: Xeon Phi
Post by: mc_lovin on June 19, 2012, 04:52:08 PM
so that's what it would look like if Intel made a GPU :)

F'n sexy!


Title: Re: Xeon Phi
Post by: rjk on June 19, 2012, 05:20:07 PM
so that's what it would look like if Intel made a GPU :)

F'n sexy!
Yeah I love the industrial metal look without all the plastic.


Title: Re: Xeon Phi
Post by: cmg5461 on June 19, 2012, 06:09:43 PM
so that's what it would look like if Intel made a GPU :)

F'n sexy!
Yeah I love the industrial metal look without all the plastic.

If Intel made a *working* GPU, I bet it would kick the shit out of AMD compute-wise..


Title: Re: Xeon Phi
Post by: DiabloD3 on June 19, 2012, 06:16:17 PM
so that's what it would look like if Intel made a GPU :)

F'n sexy!
Yeah I love the industrial metal look without all the plastic.

If Intel made a *working* GPU, I bet it would kick the shit out of AMD compute-wise..

Then why don't they?


Title: Re: Xeon Phi
Post by: BR0KK on June 19, 2012, 06:20:31 PM
Cause they suck at it ;)


Title: Re: Xeon Phi
Post by: bulanula on June 19, 2012, 06:23:47 PM
so that's what it would look like if Intel made a GPU :)

F'n sexy!
Yeah I love the industrial metal look without all the plastic.

Helps heat dissipation too  ;) This is not a crappy AMD card with a plastic cover  :D


Title: Re: Xeon Phi
Post by: DiabloD3 on June 19, 2012, 06:25:29 PM
so that's what it would look like if Intel made a GPU :)

F'n sexy!
Yeah I love the industrial metal look without all the plastic.

Helps heat dissipation too  ;) This is not a crappy AMD card with a plastic cover  :D

You do realize that's an air channel, right?


Title: Re: Xeon Phi
Post by: bulanula on June 19, 2012, 06:26:42 PM
so that's what it would look like if Intel made a GPU :)

F'n sexy!
Yeah I love the industrial metal look without all the plastic.

Helps heat dissipation too  ;) This is not a crappy AMD card with a plastic cover  :D

You do realize that's an air channel, right?

Hot air channel will heat up top metal cover by convection ( hot air rises ) and conduction and the metal will get hot up there.

Even the plastic on my AMD cards is quite hot to the touch and not at normal room temp, even if it does not conduct heat as well as metal


Title: Re: Xeon Phi
Post by: crazyates on June 19, 2012, 06:37:56 PM
so that's what it would look like if Intel made a GPU :)

F'n sexy!
Yeah I love the industrial metal look without all the plastic.

Helps heat dissipation too  ;) This is not a crappy AMD card with a plastic cover  :D

You do realize that's an air channel, right?

Hot air channel will heat up top metal cover by convection ( hot air rises ) and conduction and the metal will get hot up there.

Even the plastic on my AMD cards is quite hot to the touch and not at normal room temp, even if it does not conduct heat as well as metal

Ur an idiot.


Title: Re: Xeon Phi
Post by: rjk on June 19, 2012, 06:39:36 PM
so that's what it would look like if Intel made a GPU :)

F'n sexy!
Yeah I love the industrial metal look without all the plastic.

Helps heat dissipation too  ;) This is not a crappy AMD card with a plastic cover  :D

You do realize that's an air channel, right?

Hot air channel will heat up top metal cover by convection ( hot air rises ) and conduction and the metal will get hot up there.

Even the plastic on my AMD cards is quite hot to the touch and not at normal room temp, even if it does not conduct heat as well as metal

Ur an idiot.
Not totally, a metal shroud would conduct more heat, resulting in different thermal dynamics. But I don't know how much of an actual temperature difference there would be.


Title: Re: Xeon Phi
Post by: bulanula on June 19, 2012, 07:02:44 PM
so that's what it would look like if Intel made a GPU :)

F'n sexy!
Yeah I love the industrial metal look without all the plastic.

Helps heat dissipation too  ;) This is not a crappy AMD card with a plastic cover  :D

You do realize that's an air channel, right?

Hot air channel will heat up top metal cover by convection ( hot air rises ) and conduction and the metal will get hot up there.

Even the plastic on my AMD cards is quite hot to the touch and not at normal room temp, even if it does not conduct heat as well as metal

Ur an idiot.

What an insightful post. Buy one of these and show me that the metal case is not hotter than room temperature then call me an idiot.

Until then you are a fool ... metal will surely be hot and help with heat dissipation.


Title: Re: Xeon Phi
Post by: cmg5461 on June 19, 2012, 07:07:54 PM
so that's what it would look like if Intel made a GPU :)

F'n sexy!
Yeah I love the industrial metal look without all the plastic.

Helps heat dissipation too  ;) This is not a crappy AMD card with a plastic cover  :D

You do realize that's an air channel, right?

Hot air channel will heat up top metal cover by convection ( hot air rises ) and conduction and the metal will get hot up there.

Even the plastic on my AMD cards is quite hot to the touch and not at normal room temp, even if it does not conduct heat as well as metal

Ur an idiot.

So.. you can remove heat from metal to air... but you can't impose heat by air to metal.. hmm yes, makes sense!


Title: Re: Xeon Phi
Post by: Miner99er on June 19, 2012, 07:10:01 PM
Wow, this whole thread went to shit. (http://www.youtube.com/watch?v=5hfYJsQAhl0)


Title: Re: Xeon Phi
Post by: DiabloD3 on June 19, 2012, 07:10:39 PM
so that's what it would look like if Intel made a GPU :)

F'n sexy!
Yeah I love the industrial metal look without all the plastic.

Helps heat dissipation too  ;) This is not a crappy AMD card with a plastic cover  :D

You do realize that's an air channel, right?

Hot air channel will heat up top metal cover by convection ( hot air rises ) and conduction and the metal will get hot up there.

Even the plastic on my AMD cards is quite hot to the touch and not at normal room temp, even if it does not conduct heat as well as metal

Ur an idiot.
Not totally, a metal shroud would conduct more heat, resulting in different thermal dynamics. But I don't know how much of an actual temperature difference there would be.

Technically you'd want a material that conducts zero heat, so the heat is properly vented out of the case and not leaked into it. Plastic isn't entirely appropriate, but it's cheap. Metal isn't appropriate.


Title: Re: Xeon Phi
Post by: cmg5461 on June 19, 2012, 07:11:24 PM

errr.. don't they mean 2 pcie slots? LOL


Title: Re: Xeon Phi
Post by: DiabloD3 on June 19, 2012, 07:12:29 PM

Single slot, double wide.


Title: Re: Xeon Phi
Post by: cmg5461 on June 19, 2012, 07:13:06 PM
Gotcha, didn't think of that. Used to counting 'spaces' for cards :p


Technically you'd want a material that conducts zero heat, so the heat is properly vented out of the case and not leaked into it. Plastic isn't entirely appropriate, but it's cheap. Metal isn't appropriate.

Correct, but the setups these will be used in have large amounts of air flowing over the cards, i.e. server racks. So while this isn't suited to a regular computer, a server rack filled with these will do just fine.


Title: Re: Xeon Phi
Post by: rjk on June 19, 2012, 07:13:46 PM
Yep, that's why I said different thermal dynamics, not better thermal dynamics. But in a server, which is where these will be used, it is likely that the case fans will take care of excess dissipated heat with no problems. It also appears to have a plastic coating on the metal shell, unless that is just paint.

EDIT: cmg5461 beat me to it.


Title: Re: Xeon Phi
Post by: cmg5461 on June 19, 2012, 07:14:17 PM
Yep, that's why I said different thermal dynamics, not better thermal dynamics. But in a server, which is where these will be used, it is likely that the case fans will take care of excess dissipated heat with no problems. It also appears to have a plastic coating on the metal shell, unless that is just paint.

Beat you :P


Title: Re: Xeon Phi
Post by: 1l1l11ll1l on June 19, 2012, 09:21:20 PM
Wow, this whole thread went to shit. (http://www.youtube.com/watch?v=5hfYJsQAhl0)

+1


Title: Re: Xeon Phi
Post by: 1l1l11ll1l on June 19, 2012, 09:23:52 PM

Wait, whahhh!? Why you no call idiot first?

This is an excellent example of how to correct someone.

Thank you D3


Title: Re: Xeon Phi
Post by: Electricbees on June 19, 2012, 09:26:29 PM
I wonder if anyone realizes that people with functioning brains design these products with their features, ON PURPOSE.

Shocking, I know... But true. :P


Title: Re: Xeon Phi
Post by: bulanula on June 19, 2012, 10:57:11 PM
I wonder if anyone realizes that people with functioning brains design these products with their features, ON PURPOSE.

Shocking, I know... But true. :P

No way, the metal casing was just a design feature to make it look expensive. Plastic looks cheap :P


Title: Re: Xeon Phi
Post by: release on June 20, 2012, 12:36:52 AM

Single PCIe slot. 2 expansion slots. So it depends on which slot you're talking about.


Title: Re: Xeon Phi
Post by: multi#lord on June 20, 2012, 03:13:00 AM
So word on the street is the Xeon Phi is aimed at competing with the nVidia Tesla presence in the supercomputer market? Seems like a bit of a war will roll out: x86 vs CUDA vs OpenCL? Did AMD stop developing FireStream cards?


Title: Re: Xeon Phi
Post by: rjk on June 20, 2012, 01:39:59 PM
So word on the street is the Xeon Phi is aimed at competing with the nVidia Tesla presence in the supercomputer market? Seems like a bit of a war will roll out: x86 vs CUDA vs OpenCL? Did AMD stop developing FireStream cards?
I'm pretty sure each x86 core is significantly more capable than any of nVidia's CUDA cores. But because of this, they are larger and there are fewer of them. It depends on how complex your stuff is whether you use this or whether you use CUDA.


Title: Re: Xeon Phi
Post by: ice_chill on June 20, 2012, 01:54:29 PM
This 50-core card is designed to do double precision calculations; consumer GPUs only do single precision.


Title: Re: Xeon Phi
Post by: mrb on June 20, 2012, 02:05:42 PM
This 50-core card is designed to do double precision calculations; consumer GPUs only do single precision.

High-end AMD consumer GPUs (69xx, 77xx, 78xx, 79xx) do support double precision.
Most Nvidia ones support it too (albeit artificially throttled).


Title: Re: Xeon Phi
Post by: BCMan on June 20, 2012, 04:36:30 PM
This 50-core card is designed to do double precision calculations; consumer GPUs only do single precision.

High-end AMD consumer GPUs (69xx, 77xx, 78xx, 79xx) do support double precision.
Most Nvidia ones support it too (albeit artificially throttled).

Well, Nvidia just always sucks hard.


Title: Re: Xeon Phi
Post by: rjk on June 20, 2012, 04:44:37 PM
This 50-core card is designed to do double precision calculations; consumer GPUs only do single precision.

High-end AMD consumer GPUs (69xx, 77xx, 78xx, 79xx) do support double precision.
Most Nvidia ones support it too (albeit artificially throttled).

Well, Nvidia just always sucks hard.
Yeah, that's why their Tesla stuff shows up in pretty much all of the new supercomputer builds these days, right?  ::)


Title: Re: Xeon Phi
Post by: cmg5461 on June 20, 2012, 06:19:21 PM
Yeah, that's why their Tesla stuff shows up in pretty much all of the new supercomputer builds these days, right?  ::)

Their high-end Teslas kick some major ass.


Title: Re: Xeon Phi
Post by: multi#lord on June 21, 2012, 03:17:59 PM
Probably covered some of the stuff in thread, but interesting read on the Xeon Phi:

http://vr-zone.com/articles/intel-xeon-family-finally-accepts-the-larrabee-in-xeon-phi-and-its-futures/16361.html



Title: Re: Xeon Phi
Post by: rjk on June 21, 2012, 03:26:46 PM
Probably covered some of the stuff in thread, but interesting read on the Xeon Phi:

http://vr-zone.com/articles/intel-xeon-family-finally-accepts-the-larrabee-in-xeon-phi-and-its-futures/16361.html


Interesting!
Quote
The 50+ simple two-way in-order Pentium (yes, 1995 Pentium!) like cores feed the same number of 512-bit wide SIMD FP units, with the ability to deliver around 1 TFLOPs peak in double precision at around 1 GHz.
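A quick check that "50+ cores, 512-bit SIMD, ~1 GHz" lands near 1 TFLOPS DP. This sketch assumes each of the 8 float64 lanes retires one fused multiply-add (counted as 2 FLOPs) per clock; core counts above 50 are hypothetical, since the quote only says "50+". Note the article's ~1 GHz is lower than the ~1.5 GHz assumed earlier in the thread.

```python
def peak_dp_gflops(cores: int, clock_ghz: float,
                   vector_bits: int = 512, flops_per_lane: int = 2) -> float:
    """Theoretical peak: cores x DP lanes x FLOPs per lane per clock x clock."""
    dp_lanes = vector_bits // 64  # 8 float64 lanes in a 512-bit register
    return cores * dp_lanes * flops_per_lane * clock_ghz

for n in (50, 55, 60, 64):
    print(n, "cores:", peak_dp_gflops(n, 1.0), "GFLOPS")
```

Under these assumptions 50 cores give 800 GFLOPS, and it takes roughly 60+ cores at 1 GHz to reach the ~1 TFLOPS figure.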


Title: Re: Xeon Phi
Post by: cmg5461 on June 21, 2012, 03:32:06 PM
Quote
And, at least behind the closed doors, both AMD and Nvidia GPUs have been shown booting Linux on their own, without requiring a CPU.

umm. wat


Title: Re: Xeon Phi
Post by: rjk on June 21, 2012, 03:33:53 PM
Quote
And, at least behind the closed doors, both AMD and Nvidia GPUs have been shown booting Linux on their own, without requiring a CPU.

umm. wat
True story, but only in the hands of the engineers that designed them, no one else that I know of has been able to make it happen.


Title: Re: Xeon Phi
Post by: cmg5461 on June 21, 2012, 03:36:02 PM
True story, but only in the hands of the engineers that designed them, no one else that I know of has been able to make it happen.

Ah. I never knew it was possible. I guess you could think of a GPU as a slower CPU. It must need a heavily modified kernel, though.


Title: Re: Xeon Phi
Post by: crazyates on June 21, 2012, 03:49:35 PM
True story, but only in the hands of the engineers that designed them, no one else that I know of has been able to make it happen.

Ah. I never knew it was possible. I guess you could think of a GPU as a slower CPU. It must need a heavily modified kernel, though.

Still. Gentoo on an APU is gonna be awesome come 2015!

Architecture:
[ ] x86
[ ] amd64
[X] opencl
[ ] arm


Title: Re: Xeon Phi
Post by: DiabloD3 on June 22, 2012, 12:29:32 AM
Quote
And, at least behind the closed doors, both AMD and Nvidia GPUs have been shown booting Linux on their own, without requiring a CPU.

umm. wat

AMD's Fusion is a product of years of research. AMD "demo'ed" an all-HyperTransport Radeon about a year after they bought ATI, and they've also been showing off prototype Fusions that don't just have Radeon pipes on-die* but also make them usable from the x86 interface side. What "usable" means is still up in the air, but if they've managed to use them as the backend for SIMD instructions (i.e., no more dedicated FPU units; instead the x86 instruction scheduler issues as many ops as it can in parallel to 512 Radeon ALUs across the entire CPU, rather than just, say, 2 per core), this could mean a huge goddamned increase in FP performance without needing a dedicated HAL API like OpenCL.

* On-die Fusion Radeons don't have a Radeon memory controller and natively speak HyperTransport. The upside is, they have direct access to system memory as a native processor and can access stuff directly out of on-die cache: this means you have basically zero wait time to send stuff to the GPU for processing and you have zero-cost cache coherency.


Title: Re: Xeon Phi
Post by: crazyates on June 22, 2012, 03:52:19 AM
Quote
And, at least behind the closed doors, both AMD and Nvidia GPUs have been shown booting Linux on their own, without requiring a CPU.

umm. wat

AMD's Fusion is a product of years of research. AMD "demo'ed" an all-HyperTransport Radeon about a year after they bought ATI, and they've also been showing off prototype Fusions that don't just have Radeon pipes on-die* but also make them usable from the x86 interface side. What "usable" means is still up in the air, but if they've managed to use them as the backend for SIMD instructions (i.e., no more dedicated FPU units; instead the x86 instruction scheduler issues as many ops as it can in parallel to 512 Radeon ALUs across the entire CPU, rather than just, say, 2 per core), this could mean a huge goddamned increase in FP performance without needing a dedicated HAL API like OpenCL.

* On-die Fusion Radeons don't have a Radeon memory controller and natively speak HyperTransport. The upside is, they have direct access to system memory as a native processor and can access stuff directly out of on-die cache: this means you have basically zero wait time to send stuff to the GPU for processing and you have zero-cost cache coherency.
http://media.forumpcs.com.br/wp-content/blogs.dir/34/files/llano_a8-3850-3066433/llano1.jpg/1200_0,0,0,0/llano1.jpg/llano1.jpg

This is an old slide, but it gives a good picture of AMD's overall goal. We are somewhere between step 2 and step 3, and it's only going to get better! AMD has one of the most creative and innovative visions for the future of consumer computing (as opposed to Intel just shrinking its process node), and I think it's progressing quite well (just look at the success of their APU sales). I also think it's only going to get better for them as they move along with even more amazing features like what you just described.

/amdfanboyrant


Title: Re: Xeon Phi
Post by: DiabloD3 on June 22, 2012, 04:17:24 AM
Quote
And, at least behind the closed doors, both AMD and Nvidia GPUs have been shown booting Linux on their own, without requiring a CPU.

umm. wat

AMD's Fusion is a product of years of research. AMD "demo'ed" an all-HyperTransport Radeon about a year after they bought ATI, and they've also been showing off prototype Fusions that don't just have Radeon pipes on-die* but also make them usable from the x86 interface side. What "usable" means is still up in the air, but if they've managed to use them as the backend for SIMD instructions (i.e., no more dedicated FPU units; instead the x86 instruction scheduler issues as many ops as it can in parallel to 512 Radeon ALUs across the entire CPU, rather than just, say, 2 per core), this could mean a huge goddamned increase in FP performance without needing a dedicated HAL API like OpenCL.

* On-die Fusion Radeons don't have a Radeon memory controller and natively speak HyperTransport. The upside is, they have direct access to system memory as a native processor and can access stuff directly out of on-die cache: this means you have basically zero wait time to send stuff to the GPU for processing and you have zero-cost cache coherency.
http://media.forumpcs.com.br/wp-content/blogs.dir/34/files/llano_a8-3850-3066433/llano1.jpg/1200_0,0,0,0/llano1.jpg/llano1.jpg

This is an old slide, but it gives a good picture of AMD's overall goal. We are somewhere between step 2 and step 3, and it's only going to get better! AMD has one of the most creative and innovative visions for the future of consumer computing (as opposed to Intel just shrinking its process node), and I think it's progressing quite well (just look at the success of their APU sales). I also think it's only going to get better for them as they move along with even more amazing features like what you just described.

/amdfanboyrant

Yeah, what I described is clearly Step 3 or later. Intel also seems to have finally sold a "step 3" type of device in the Phi, depending on what it actually can do.


Title: Re: Xeon Phi
Post by: crazyates on June 22, 2012, 04:28:05 AM
Yeah, what I described is clearly Step 3 or later. Intel also seems to have finally sold a "step 3" type of device in the Phi, depending on what it actually can do.

Intel seems more interested in incorporating the CPU into the GPU, while AMD is incorporating the GPU into the CPU. Totally different mindsets/endgames/results.


Title: Re: Xeon Phi
Post by: AzN1337c0d3r on June 22, 2012, 04:54:03 AM
But if it still can perform just as well on highly branchy code, I might have a use for one of those.

That would make it crazy awesome for raytracing.

Yeah, that's why their Tesla stuff shows up in pretty much all of the new supercomputer builds these days, right?  ::)

Their high-end Teslas kick some major ass.

On the wallet maybe. Last I checked, a Tesla M2090 was north of $4000.

Also, 1 TFLOP is not that impressive. The HD7970 does 947 DP GFLOPS, it was released in January, and it doesn't have access to Intel's 22 nm 3D tri-gate tech.
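Where the 947 figure comes from, assuming the reference HD 7970 specs: 2048 stream processors at 925 MHz, one fused multiply-add (2 FLOPs) per processor per clock, and double precision at a quarter of the single-precision rate.

```python
STREAM_PROCESSORS = 2048
CLOCK_GHZ = 0.925
FLOPS_PER_SP_PER_CLOCK = 2  # one fused multiply-add
DP_RATE = 1 / 4             # DP runs at a quarter of the SP rate

sp_gflops = STREAM_PROCESSORS * FLOPS_PER_SP_PER_CLOCK * CLOCK_GHZ
dp_gflops = sp_gflops * DP_RATE
print(f"SP {sp_gflops:.1f} GFLOPS, DP {dp_gflops:.1f} GFLOPS")
# SP 3788.8 GFLOPS, DP 947.2 GFLOPS
```

These are theoretical peaks on both sides; they say nothing about how much of the peak real code reaches on either chip.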


Title: Re: Xeon Phi
Post by: DiabloD3 on June 22, 2012, 05:42:50 AM
Yeah, what I described is clearly Step 3 or later. Intel also seems to have finally sold a "step 3" type of device in the Phi, depending on what it actually can do.

Intel seems more interested in incorporating the CPU into the GPU, while AMD is incorporating the GPU into the CPU. Totally different mindsets/endgames/results.

They both want branch/loop-happy, highly parallel computation. The Radeon's biggest "problem" (and I'm using the term loosely) is that wavefronts are run in lockstep: both sides of a branch are the same length, even if that requires inserting no-ops, and loops whose lengths are set at runtime (instead of statically at compile time) are just as nasty.
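To make the lockstep cost concrete, here is a toy model (an illustration only, not AMD's actual hardware behavior) of one 64-lane wavefront, the GCN wavefront width, running an if/else: if any lane takes a side, the whole wavefront pays for that side with the other lanes masked off, and a runtime-length loop runs until the slowest lane finishes. The function names and cycle costs are hypothetical.

```python
from typing import Sequence

def wavefront_branch_cycles(conds: Sequence[bool], then_cost: int, else_cost: int) -> int:
    """Cycles a lockstep wavefront spends on an if/else.

    A side of the branch is executed (with non-participating lanes masked off)
    whenever at least one lane takes it, so divergence costs both sides.
    """
    cycles = 0
    if any(conds):        # at least one lane takes the 'then' side
        cycles += then_cost
    if not all(conds):    # at least one lane takes the 'else' side
        cycles += else_cost
    return cycles

def loop_cycles(trip_counts: Sequence[int], body_cost: int) -> int:
    """A loop with a runtime trip count runs until the slowest lane is done."""
    return max(trip_counts) * body_cost

uniform = [True] * 64
divergent = [lane % 2 == 0 for lane in range(64)]
print(wavefront_branch_cycles(uniform, 10, 30))    # 10: only the 'then' side runs
print(wavefront_branch_cycles(divergent, 10, 30))  # 40: both sides run
print(loop_cycles([1] * 63 + [100], 5))            # 500: one slow lane holds up all 64
```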

CPUs, OTOH, can't do highly parallel calculations because all the hardware dedicated to branching, branch prediction, cache prediction, etc. takes up a lot of room, produces a lot of heat, and uses a lot of power. I wonder how much stuff Intel removed to put 50 cores on a card.


Title: Re: Xeon Phi
Post by: goxed on June 22, 2012, 06:21:53 AM
I wonder how much stuff Intel removed to put 50 cores on a card.
Here's a pdf depicting the organization of Larrabee, the precursor of Phi.
http://users.ece.gatech.edu/lanterma/mpg08/Larrabee_ECE4893.pdf


Title: Re: Xeon Phi
Post by: DiabloD3 on June 22, 2012, 07:40:37 AM
I wonder how much stuff Intel removed to put 50 cores on a card.
Here's a pdf depicting the organization of Larrabee, the precursor of Phi.
http://users.ece.gatech.edu/lanterma/mpg08/Larrabee_ECE4893.pdf

I'm already well aware of how they designed that. It's more butchered than Atom. But from what I've heard, Phi isn't nearly as bad.


Title: Re: Xeon Phi
Post by: Gabi on June 22, 2012, 11:26:23 AM

Also, 1 TFLOPS is not that impressive. The HD 7970 does 947 DP GFLOPS, was released in January, and doesn't have access to Intel's 22 nm 3D tri-gate tech.
Protip: Xeon Phi runs x86 code

Good luck using the computing power of the 7970 (or NVIDIA cards) while having to fight with OpenCL and CUDA.


Title: Re: Xeon Phi
Post by: Gabi on June 22, 2012, 11:30:19 AM
Probably covers some of the stuff already in this thread, but it's an interesting read on the Xeon Phi:

http://vr-zone.com/articles/intel-xeon-family-finally-accepts-the-larrabee-in-xeon-phi-and-its-futures/16361.html


This article fail:

Quote
So, how does it stand performance wise? Its double precision FP throughput is the same as the typical AMD Radeon HD7970 card which costs one quarter of the amount but with much smaller memory, 3 GB, and no ECC.
No ECC? The 7970 has ECC  ::)


Title: Re: Xeon Phi
Post by: rjk on June 22, 2012, 11:53:45 AM

The 7970 has ECC  ::)
Are you sure? That's usually reserved for the expensive enterprisey cards like FirePro.


Title: Re: Xeon Phi
Post by: DiabloD3 on June 22, 2012, 02:46:37 PM
Probably covers some of the stuff already in this thread, but it's an interesting read on the Xeon Phi:

http://vr-zone.com/articles/intel-xeon-family-finally-accepts-the-larrabee-in-xeon-phi-and-its-futures/16361.html


This article fail:

Quote
So, how does it stand performance wise? Its double precision FP throughput is the same as the typical AMD Radeon HD7970 card which costs one quarter of the amount but with much smaller memory, 3 GB, and no ECC.
No ECC? The 7970 has ECC  ::)

No, it doesn't. What GCN did was add ECC to all internal on-die memory (caches, local stores, etc.), but the only AMD cards with ECC GDDR5 are the FirePro/FireStream cards, and although they use normal GCN chips, they're not referred to as such.


Title: Re: Xeon Phi
Post by: multi#lord on June 22, 2012, 05:34:06 PM
I was also under the impression the 7970 did have ECC, but I thought it was not used. Would it cost performance? My impression is based on some articles and postings such as the two below.

I read this sometime back on: http://www.anandtech.com/Show/Index/4455?cPage=4&all=False&sort=0&page=6&slug=amds-graphics-core-next-preview-amd-architects-for-compute

Quote
Finally on the memory side, AMD is adding proper ECC support to supplement their existing EDC (Error Detection & Correction) functionality, which is used to ensure the integrity of memory transmissions across the GDDR5 memory bus. Both the SRAM and VRAM memory can be ECC protected. For the SRAM this is a free operation, while for the VRAM there will be a performance overhead. We’re assuming that AMD will be using a virtual ECC scheme like NVIDIA, where ECC data is distributed across VRAM rather than using extra memory chips/controllers.

Shamino did some LN2 overclocking when the 7970 was released; on his forum he wrote about the 7970:

Quote
actually 1800 ram is easy, i ran 2000 ram and it got the ECC correction and the score was worse.

http://kingpincooling.com/forum/showthread.php?t=1559


Title: Re: Xeon Phi
Post by: AzN1337c0d3r on June 22, 2012, 08:53:50 PM
Protip: Xeon Phi runs x86 code

Good luck using the computing power of the 7970 (or NVIDIA cards) while having to fight with OpenCL and CUDA.

Highly unlikely Phi is going to run x86 code unmodified.

Also, OpenCL isn't bad, it's the whole making things "parallel" that is the hard part of any problem.


Title: Re: Xeon Phi
Post by: DiabloD3 on June 22, 2012, 09:25:44 PM
I was also under the impression the 7970 did have ECC, but I thought it was not used. Would it cost performance? My impression is based on some articles and postings such as the two below.

I read this sometime back on: http://www.anandtech.com/Show/Index/4455?cPage=4&all=False&sort=0&page=6&slug=amds-graphics-core-next-preview-amd-architects-for-compute

Quote
Finally on the memory side, AMD is adding proper ECC support to supplement their existing EDC (Error Detection & Correction) functionality, which is used to ensure the integrity of memory transmissions across the GDDR5 memory bus. Both the SRAM and VRAM memory can be ECC protected. For the SRAM this is a free operation, while for the VRAM there will be a performance overhead. We’re assuming that AMD will be using a virtual ECC scheme like NVIDIA, where ECC data is distributed across VRAM rather than using extra memory chips/controllers.

Shamino did some LN2 overclocking when the 7970 was released; on his forum he wrote about the 7970:

Quote
actually 1800 ram is easy, i ran 2000 ram and it got the ECC correction and the score was worse.

http://kingpincooling.com/forum/showthread.php?t=1559

ECC on external RAM like that is done by adding more chips. If these were DIMMs, you'd have DIMMs with 9 chips instead of 8.

The easy way to figure this out is for someone to find a picture of the bare reference board and count the chips.
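Where the 9-instead-of-8 chip count comes from: a SECDED (single-error-correct, double-error-detect) Hamming code over m data bits needs the smallest r with 2^r >= m + r + 1 check bits, plus one overall parity bit. For a 64-bit word that is 7 + 1 = 8 check bits, i.e. a 72-bit word, which on a DIMM built from x8 chips is exactly one extra chip. `secded_check_bits` is a hypothetical helper name, and GDDR5 boards are organized differently, so this says nothing definitive about the 7970.

```python
def secded_check_bits(data_bits: int) -> int:
    """Number of check bits for a SECDED Hamming code over `data_bits` data bits."""
    r = 0
    while (1 << r) < data_bits + r + 1:   # Hamming bound for single-error correction
        r += 1
    return r + 1                          # +1 overall parity bit for double-error detection

for m in (8, 32, 64):
    c = secded_check_bits(m)
    print(f"{m} data bits -> {c} check bits, {m + c}-bit codeword")
# 8 data bits -> 5 check bits, 13-bit codeword
# 32 data bits -> 7 check bits, 39-bit codeword
# 64 data bits -> 8 check bits, 72-bit codeword
```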


Title: Re: Xeon Phi
Post by: DiabloD3 on June 22, 2012, 09:27:53 PM
Protip: Xeon Phi runs x86 code

Good luck using the computing power of the 7970 (or NVIDIA cards) while having to fight with OpenCL and CUDA.

Highly unlikely Phi is going to run x86 code unmodified.

Also, OpenCL isn't bad, it's the whole making things "parallel" that is the hard part of any problem.

Actually, it might run x86 code unmodified. That just isn't the best way to performance on those machines.


Title: Re: Xeon Phi
Post by: multi#lord on June 22, 2012, 10:36:25 PM
I was also under the impression the 7970 did have ECC, but I thought it was not used. Would it cost performance? My impression is based on some articles and postings such as the two below.

I read this sometime back on: http://www.anandtech.com/Show/Index/4455?cPage=4&all=False&sort=0&page=6&slug=amds-graphics-core-next-preview-amd-architects-for-compute

Quote
Finally on the memory side, AMD is adding proper ECC support to supplement their existing EDC (Error Detection & Correction) functionality, which is used to ensure the integrity of memory transmissions across the GDDR5 memory bus. Both the SRAM and VRAM memory can be ECC protected. For the SRAM this is a free operation, while for the VRAM there will be a performance overhead. We’re assuming that AMD will be using a virtual ECC scheme like NVIDIA, where ECC data is distributed across VRAM rather than using extra memory chips/controllers.

Shamino did some LN2 overclocking when the 7970 was released; on his forum he wrote about the 7970:

Quote
actually 1800 ram is easy, i ran 2000 ram and it got the ECC correction and the score was worse.

http://kingpincooling.com/forum/showthread.php?t=1559

ECC on external RAM like that is done by adding more chips. If these were DIMMs, you'd have DIMMs with 9 chips instead of 8.

The easy way to figure this out is for someone to find a picture of the bare reference board and count the chips.

But would counting the number of chips tell whether the GPU supports ECC? On DIMMs, the extra memory chips are probably there so the bits can be spread evenly, as with chipkill (e.g. a 13-bit word = 8 data bits plus 5 parity bits, spread across 13 DRAM chips). The 7970 most likely uses a virtual scheme to implement ECC, for example a BCH or Hamming code; the AnandTech article I posted earlier noted that it is probably a virtual ECC implementation. DIMMs with 9 chips do parity on 1 bit; multiple ECC DIMMs can do multiple parity bits across DIMMs. As for the 7970, relying on a virtual ECC implementation makes correcting/detecting multi-bit errors more convenient and cheaper.
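The 13-bit example above is exactly a SECDED Hamming code over 8 data bits: 4 Hamming check bits at the power-of-two positions plus one overall parity bit. (Standard ECC DIMMs apply the same construction to 64-bit words, with 8 check bits per 64 data bits.) Below is a minimal sketch of encoding and single-error correction using the textbook bit layout; it is only an illustration, not how any particular GPU or DIMM arranges its check bits, and `encode`/`decode` are hypothetical names.

```python
from typing import List, Optional, Tuple

CHECK_POS = [1, 2, 4, 8]                             # Hamming check bits at powers of two
DATA_POS = [p for p in range(1, 13) if p & (p - 1)]  # 3, 5, 6, 7, 9, 10, 11, 12

def encode(byte: int) -> List[int]:
    """Encode 8 data bits as a 13-bit SECDED codeword; index 0 holds overall parity."""
    code = [0] * 13
    for i, p in enumerate(DATA_POS):
        code[p] = (byte >> i) & 1
    for c in CHECK_POS:
        # check bit c makes the parity of every position with bit c set even
        code[c] = sum(code[p] for p in range(1, 13) if p & c) % 2
    code[0] = sum(code[1:]) % 2                      # overall parity enables double-error detection
    return code

def decode(code: List[int]) -> Tuple[Optional[int], str]:
    """Return (data byte, status); corrects one flipped bit, flags two flipped bits."""
    code = list(code)                                # don't modify the caller's word
    syndrome = 0
    for c in CHECK_POS:
        if sum(code[p] for p in range(1, 13) if p & c) % 2:
            syndrome |= c
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd and syndrome <= 12:
        code[syndrome] ^= 1                          # syndrome 0 means the parity bit itself flipped
        status = "corrected"
    else:
        return None, "uncorrectable error detected"  # even number of flips, or garbage syndrome
    return sum(code[p] << i for i, p in enumerate(DATA_POS)), status

word = encode(0xA5)
word[6] ^= 1
print(decode(word))   # (165, 'corrected')
word[9] ^= 1
print(decode(word))   # (None, 'uncorrectable error detected')
```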


Title: Re: Xeon Phi
Post by: DiabloD3 on June 22, 2012, 11:23:16 PM
I was also under the impression the 7970 did have ECC, but I thought it was not used. Would it cost performance? My impression is based on some articles and postings such as the two below.

I read this sometime back on: http://www.anandtech.com/Show/Index/4455?cPage=4&all=False&sort=0&page=6&slug=amds-graphics-core-next-preview-amd-architects-for-compute

Quote
Finally on the memory side, AMD is adding proper ECC support to supplement their existing EDC (Error Detection & Correction) functionality, which is used to ensure the integrity of memory transmissions across the GDDR5 memory bus. Both the SRAM and VRAM memory can be ECC protected. For the SRAM this is a free operation, while for the VRAM there will be a performance overhead. We’re assuming that AMD will be using a virtual ECC scheme like NVIDIA, where ECC data is distributed across VRAM rather than using extra memory chips/controllers.

Shamino did some LN2 overclocking when the 7970 was released; on his forum he wrote about the 7970:

Quote
actually 1800 ram is easy, i ran 2000 ram and it got the ECC correction and the score was worse.

http://kingpincooling.com/forum/showthread.php?t=1559

ECC on external RAM like that is done by adding more chips. If these were DIMMs, you'd have DIMMs with 9 chips instead of 8.

The easy way to figure this out is for someone to find a picture of the bare reference board and count the chips.

But would counting the number of chips tell whether the GPU supports ECC? On DIMMs, the extra memory chips are probably there so the bits can be spread evenly, as with chipkill (e.g. a 13-bit word = 8 data bits plus 5 parity bits, spread across 13 DRAM chips). The 7970 most likely uses a virtual scheme to implement ECC, for example a BCH or Hamming code; the AnandTech article I posted earlier noted that it is probably a virtual ECC implementation. DIMMs with 9 chips do parity on 1 bit; multiple ECC DIMMs can do multiple parity bits across DIMMs. As for the 7970, relying on a virtual ECC implementation makes correcting/detecting multi-bit errors more convenient and cheaper.

I really doubt they're doing that; it would add too much complexity to the memory controllers. So, yes, find a 7970 photo, count the memory chips, and post the number here.


Title: Re: Xeon Phi
Post by: multi#lord on June 22, 2012, 11:40:26 PM
I was also under the impression the 7970 did have ECC, but I thought it was not used. Would it cost performance? My impression is based on some articles and postings such as the two below.

I read this sometime back on: http://www.anandtech.com/Show/Index/4455?cPage=4&all=False&sort=0&page=6&slug=amds-graphics-core-next-preview-amd-architects-for-compute

Quote
Finally on the memory side, AMD is adding proper ECC support to supplement their existing EDC (Error Detection & Correction) functionality, which is used to ensure the integrity of memory transmissions across the GDDR5 memory bus. Both the SRAM and VRAM memory can be ECC protected. For the SRAM this is a free operation, while for the VRAM there will be a performance overhead. We’re assuming that AMD will be using a virtual ECC scheme like NVIDIA, where ECC data is distributed across VRAM rather than using extra memory chips/controllers.

Shamino did some LN2 overclocking when the 7970 was released; on his forum he wrote about the 7970:

Quote
actually 1800 ram is easy, i ran 2000 ram and it got the ECC correction and the score was worse.

http://kingpincooling.com/forum/showthread.php?t=1559

ECC on external RAM like that is done by adding more chips. If these were DIMMs, you'd have DIMMs with 9 chips instead of 8.

The easy way to figure this out is for someone to find a picture of the bare reference board and count the chips.

But would counting the number of chips tell whether the GPU supports ECC? On DIMMs, the extra memory chips are probably there so the bits can be spread evenly, as with chipkill (e.g. a 13-bit word = 8 data bits plus 5 parity bits, spread across 13 DRAM chips). The 7970 most likely uses a virtual scheme to implement ECC, for example a BCH or Hamming code; the AnandTech article I posted earlier noted that it is probably a virtual ECC implementation. DIMMs with 9 chips do parity on 1 bit; multiple ECC DIMMs can do multiple parity bits across DIMMs. As for the 7970, relying on a virtual ECC implementation makes correcting/detecting multi-bit errors more convenient and cheaper.

I really doubt they're doing that; it would add too much complexity to the memory controllers. So, yes, find a 7970 photo, count the memory chips, and post the number here.

Nah, I'm sure you can search for a ref board and count the RAM chips yourself. Besides, it is a new architecture, so why not implement ECC? ECC does not necessarily mean adding more RAM chips, as I mentioned before. There are a number of articles out there that refer to ECC on the 7970, but yes, I'm sure you can search for them if you want to find out.


Title: Re: Xeon Phi
Post by: AzN1337c0d3r on June 23, 2012, 12:34:20 AM
Counting memory chips isn't going to tell you if a graphics board is ECC capable.

The Quadro 6000s I have at work have the traditional 384-bit memory bus (6x64 bits) found in the GTX480.


Title: Re: Xeon Phi
Post by: 2112 on June 27, 2012, 05:00:34 PM
Given all these assumptions, a Xeon Phi card should mine at roughly 280 Mhash/s, or about as fast as a low end HD 7850. Not impressive.
I wonder how this number is going to change once we include the information that the basic core resembles the Pentium, which was dual-pipelined, and that the cores are now 4-way hyperthreaded.

From the Pentium days I remember straightforward Fortran & C code easily retiring more than 1 instruction per clock: 1.3-1.6 with nothing more than "-Ofast".

I would double your number to 560 Mhash/s. It should be safe to assume that pipeline utilization could get close to 100%.


Title: Re: Xeon Phi
Post by: crazyates on June 27, 2012, 05:04:08 PM
Given all these assumptions, a Xeon Phi card should mine at roughly 280 Mhash/s, or about as fast as a low end HD 7850. Not impressive.
I wonder how this number is going to change once we include the information that the basic core resembles the Pentium, which was dual-pipelined, and that the cores are now 4-way hyperthreaded.

From the Pentium days I remember straightforward Fortran & C code easily retiring more than 1 instruction per clock: 1.3-1.6 with nothing more than "-Ofast".

I would double your number to 560 Mhash/s. It should be safe to assume that pipeline utilization could get close to 100%.

I thought hyperthreading only gives like a 25% performance boost, tops?


Title: Re: Xeon Phi
Post by: mrb on June 27, 2012, 05:33:17 PM
I thought hyperthreading only gives like a 25% performance boost, tops?

2112 is not talking about hyperthreading. He is talking about the U and V pipelines of the original Pentium CPU, where a single core, a single thread, can execute up to 2 instructions per clock.

It is unclear whether Xeon Phi can dual-issue LRBni (512-bit) instructions (and has enough execution units to execute 2 per cycle), or can only do it for x86-64 (32/64-bit) instructions. I assumed the latter, hence my 280 Mhash/s estimate. If it can dual-issue LRBni, performance would be 560 Mhash/s, as 2112 pointed out.
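To see why dual issue is a per-thread, per-clock effect (unlike hyperthreading), here is a toy in-order U/V pairing model: two adjacent instructions issue in the same clock when the second one neither reads nor overwrites the first one's result. Real Pentium-style pairing rules are far more detailed and this ignores latencies entirely; the instruction tuples and register names are made up. A perfectly pairable stream halves the cycle count, which is the factor of two between the 280 and 560 Mhash/s figures.

```python
from typing import List, Tuple

Instr = Tuple[str, Tuple[str, ...]]   # (destination register, source registers)

def dual_issue_cycles(program: List[Instr]) -> int:
    """Cycles for an in-order core that can issue up to 2 instructions per clock."""
    cycles, i = 0, 0
    while i < len(program):
        cycles += 1
        dest, _ = program[i]                        # first instruction goes down the U pipe
        if i + 1 < len(program):
            next_dest, next_srcs = program[i + 1]
            if dest not in next_srcs and dest != next_dest:
                i += 2                              # independent: pairs into the V pipe
                continue
        i += 1                                      # dependent (or last): issues alone
    return cycles

independent = [("a", ("x",)), ("b", ("y",)), ("c", ("x",)), ("d", ("y",))]
chain = [("a", ("x",)), ("b", ("a",)), ("c", ("b",)), ("d", ("c",))]
print(dual_issue_cycles(independent))  # 2 cycles for 4 instructions
print(dual_issue_cycles(chain))        # 4 cycles: every instruction waits on the previous one
```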


Title: Re: Xeon Phi
Post by: DiabloD3 on June 27, 2012, 05:41:20 PM
Given all these assumptions, a Xeon Phi card should mine at roughly 280 Mhash/s, or about as fast as a low end HD 7850. Not impressive.
I wonder how this number is going to change once we include the information that the basic core resembles the Pentium, which was dual-pipelined, and that the cores are now 4-way hyperthreaded.

From the Pentium days I remember straightforward Fortran & C code easily retiring more than 1 instruction per clock: 1.3-1.6 with nothing more than "-Ofast".

I would double your number to 560 Mhash/s. It should be safe to assume that pipeline utilization could get close to 100%.

I thought hyperthreading only gives like a 25% performance boost, tops?

Superscalar is not hyperthreading.


Title: Re: Xeon Phi
Post by: crazyates on June 27, 2012, 06:22:25 PM
I wonder how this number is going to change once we include the information that the basic core resembles the Pentium, which was dual-pipelined, and that the cores are now 4-way hyperthreaded.

Sorry, I got a little confused.


Title: Re: Xeon Phi
Post by: 2112 on June 27, 2012, 09:52:12 PM
It is unclear whether Xeon Phi can dual-issue LRBni (512-bit) instructions (and has enough execution units to execute 2 per cycle), or can only do it for x86-64 (32/64-bit) instructions. I assumed the latter, hence my 280 Mhash/s estimate. If it can dual-issue LRBni, performance would be 560 Mhash/s, as 2112 pointed out.
I agree that unclear is the operative word. I have a feeling that Intel's James Reinders is heavily under the influence of the marketing department. There's some talk that the current board is a coprocessor, yet the ISA manual clearly shows the unit booting in 16-bit segmented real mode.

Intel is either doing artificial market segmentation or something didn't work out in the memory controller/quickpath/chipset interface portion of the design.

I also wonder if the references to the "original Pentium" are similar to the branding exercise that happened with the announcement of the original Atoms. The Atoms had a completely redesigned microarchitecture called Bonnell, but with various features disabled. Yet marketing described them as a "reissue of the classic Pentium", while pretty much the only things they had in common were the lack of deep speculation and in-order execution.


Title: Re: Xeon Phi
Post by: DiabloD3 on June 27, 2012, 10:18:04 PM
It is unclear whether Xeon Phi can dual-issue LRBni (512-bit) instructions (and has enough execution units to execute 2 per cycle), or can only do it for x86-64 (32/64-bit) instructions. I assumed the latter, hence my 280 Mhash/s estimate. If it can dual-issue LRBni, performance would be 560 Mhash/s, as 2112 pointed out.
I agree that unclear is the operative word. I have a feeling that Intel's James Reinders is heavily under the influence of the marketing department. There's some talk that the current board is a coprocessor, yet the ISA manual clearly shows the unit booting in 16-bit segmented real mode.

Intel is either doing artificial market segmentation or something didn't work out in the memory controller/quickpath/chipset interface portion of the design.

I also wonder if the references to the "original Pentium" are similar to the branding exercise that happened with the announcement of the original Atoms. The Atoms had a completely redesigned microarchitecture called Bonnell, but with various features disabled. Yet marketing described them as a "reissue of the classic Pentium", while pretty much the only things they had in common were the lack of deep speculation and in-order execution.

Well, the real problem is, they want to be able to boot existing x86 code on it. Not merely run, but boot.

I'm thinking they're like this: Atom-like cores, dual issue, in-order execution, no x87 FPU, a 512-bit SIMD unit that does both integer and FP, 32(?) KB of L1, and a small amount of L2.

Now, granted, that sounds shitty, but if I can run normal threads on those instead of lockstep thread clusters, and the SIMD units support booleans (512 at a time) or chars (64 at a time), this could actually end up with surprisingly fast mining.


Title: Re: Xeon Phi
Post by: 2112 on June 28, 2012, 01:50:48 AM
Well, the real problem is, they want to be able to boot existing x86 code on it. Not merely run, but boot.
Well, I was thinking of a coprocessor as something directly accessible through QuickPath that doesn't require an OS at all, for example what AMD does to support FPGAs in Opteron sockets over HyperTransport. Such a coprocessor wouldn't need to boot in the classic OS sense; it would just need to support "reset" without resetting the neighboring CPU.

I'm thinking they're like this: Atom-like cores, dual issue, in-order execution, no x87 FPU, a 512-bit SIMD unit that does both integer and FP, 32(?) KB of L1, and a small amount of L2.

Now, granted, that sounds shitty, but if I can run normal threads on those instead of lockstep thread clusters, and the SIMD units support booleans (512 at a time) or chars (64 at a time), this could actually end up with surprisingly fast mining.
I think the Knight has granted your wishes, mostly. There's still support for legacy FP, but the XMM & YMM registers are replaced by ZMM. There's no support for chars, but there is for Int32 and Int64. If you were thinking of a bit-slice parallel implementation for the miner, then those Int* types will allow that. Multiprocessing and multithreading are all OpenMP-compliant.
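For anyone unfamiliar with bit-slicing: instead of one value per integer, you store bit i of many independent values in one word, so a single AND/XOR operates on all of them at once (a 512-bit register would hold one slice for 512 values). A minimal sketch of bitsliced ripple-carry addition of 32 independent 8-bit numbers, using Python ints as stand-ins for 32-bit lanes; the helper names are hypothetical. A bitsliced SHA-256 would build its 32-bit additions the same way, with 32 slices instead of 8.

```python
from typing import List

LANES, WIDTH = 32, 8   # 32 independent values, 8 bits each

def to_slices(values: List[int]) -> List[int]:
    """slices[i] holds bit i of every value; bit j of slices[i] belongs to values[j]."""
    return [sum(((v >> i) & 1) << j for j, v in enumerate(values)) for i in range(WIDTH)]

def from_slices(slices: List[int]) -> List[int]:
    """Inverse of to_slices: reassemble the LANES individual values."""
    return [sum(((slices[i] >> j) & 1) << i for i in range(WIDTH)) for j in range(LANES)]

def bitsliced_add(a: List[int], b: List[int]) -> List[int]:
    """Ripple-carry adder; each step works on all 32 lanes at once (result mod 2**WIDTH)."""
    out, carry = [], 0
    for x, y in zip(a, b):                     # from least to most significant bit
        out.append(x ^ y ^ carry)              # sum bit for every lane
        carry = (x & y) | (carry & (x ^ y))    # carry bit for every lane
    return out                                 # final carry dropped: wraps like fixed-width ints

xs = list(range(32))
ys = [200 + k for k in range(32)]
sums = from_slices(bitsliced_add(to_slices(xs), to_slices(ys)))
print(sums == [(x + y) % 256 for x, y in zip(xs, ys)])  # True
```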

The docs for architecture are near the bottom of this page:

http://software.intel.com/en-us/forums/showthread.php?t=105443

The instruction set is supported by the recent Intel C/C++ and Fortran compilers. The GNU port was just to compile the Linux kernel and doesn't really support the new instructions.


Title: Re: Xeon Phi
Post by: AzN1337c0d3r on June 28, 2012, 04:15:08 AM
Well, the real problem is, they want to be able to boot existing x86 code on it. Not merely run, but boot.

According to Anandtech:

Quote
Meanwhile on the software side of things in an interesting move Intel is going to be equipping Xeon Phi co-processors with their own OS, in effect making them stand-alone computers (despite the co-processor designation) and significantly deviating from what we’ve seen on similar products (i.e. Tesla). Xeon Phis will be independently running an embedded form of Linux, which Intel has said will be of particular benefit for cluster users. Drivers of course will still be necessary for a host device to interface with the co-processor, with the implication being that these drivers will be fairly thin and simple since the co-processor itself is already running a full OS.

Which means it'll actually boot an OS. Don't know if you can boot your own, though. Can't imagine why you wouldn't be able to.


Title: Re: Xeon Phi
Post by: 2112 on June 28, 2012, 10:36:27 AM
Which means it'll actually boot an OS. Don't know if you can boot your own, though. Can't imagine why you wouldn't be able to.
Yeah, after further thought I now assume that calling it a co-processor is just artificial market segmentation. Intel probably has an agreement with Cray, SGI, etc. to let them announce their supercomputers as the first standalone systems using Xeon Phi. Then maybe later second-tier vendors like Microway will announce single/dual/quad Xeon Phi workstations.

This is very clearly a product targeted for the OpenMP market.


Title: Re: Xeon Phi
Post by: Gabi on June 28, 2012, 07:22:07 PM
I remember reading that the Xeon Phi will run a version of Linux, and you can run things from it.

Get a computer with 3-4 of these and run BOINC on them. Epic computing power @Home  :D


Title: Re: Xeon Phi
Post by: mrb on June 29, 2012, 01:38:53 AM
I wonder how this number is going to change once we include the information that the basic core resembles Pentium which was dual pipeline and that now the cores are 4 way hyperthreaded.

Sorry, I got a little confused.

I see why you are confused. 2112 meant superscalar, not hyperthreaded, as pointed out by others.


Title: Re: Xeon Phi
Post by: 2112 on June 29, 2012, 03:11:08 AM
I see why you are confused. 2112 meant superscalar, not hyperthreaded, as pointed out by others.
I think the confusion runs deeper than just me.

Here's the quote from the "Knights Corner Performance Monitoring Units",
Intel document number: 327357-001

Quote
2. 4-Way Threaded: Each Knights Corner core is able to process 4 threads concurrently.


Title: Re: Xeon Phi
Post by: DiabloD3 on June 29, 2012, 04:59:50 AM
I see why you are confused. 2112 meant superscalar, not hyperthreaded, as pointed out by others.
I think the confusion runs deeper than just me.

Here's the quote from the "Knights Corner Performance Monitoring Units",
Intel document number: 327357-001

Quote
2. 4-Way Threaded: Each Knights Corner core is able to process 4 threads concurrently.

Yeah, but I suspect that's the Radeon trick: have a pipeline 4 issues deep, so memory latency is effectively hidden. It probably can't switch on demand.


Title: Re: Xeon Phi
Post by: AzN1337c0d3r on June 29, 2012, 05:39:48 AM
I see why you are confused. 2112 meant superscalar, not hyperthreaded, as pointed out by others.
I think the confusion runs deeper than just me.

Here's the quote from the "Knights Corner Performance Monitoring Units",
Intel document number: 327357-001

Quote
2. 4-Way Threaded: Each Knights Corner core is able to process 4 threads concurrently.

Yeah, but I suspect that's the Radeon trick: have a pipeline 4 issues deep, so memory latency is effectively hidden. It probably can't switch on demand.

I remember reading somewhere that MIC was supposed to be barrel-threaded (i.e. fine-grained multithreading), somewhat akin to the UltraSPARC T1. I can't find the source now, though.


Title: Re: Xeon Phi
Post by: DiabloD3 on June 29, 2012, 05:57:05 AM
I see why you are confused. 2112 meant superscalar, not hyperthreaded, as pointed out by others.
I think the confusion runs deeper than just me.

Here's the quote from the "Knights Corner Performance Monitoring Units",
Intel document number: 327357-001

Quote
2. 4-Way Threaded: Each Knights Corner core is able to process 4 threads concurrently.

Yeah, but I suspect that's the Radeon trick: have a pipeline 4 issues deep, so memory latency is effectively hidden. It probably can't switch on demand.

I remember reading somewhere that MIC was supposed to be barrel-threaded (i.e. fine-grained multithreading), somewhat akin to the UltraSPARC T1. I can't find the source now, though.

That's almost the same trick. UltraSPARC T1-T4s and newer IBM POWERs have multiple thread decoders, and switch to the next on anything that would block execution. AMD Bulldozers do the same, but have a semi-unified scheduler that schedules the next instruction (from one of two already decoded streams) onto the next ALU (2 threads -> 4 integer ALUs and 2 FP ALUs).

Radeons, however, don't switch on block. They automatically assume there is memory latency and results won't be available for four pipeline executions later. It makes the hardware simpler and easier to design compilers for. I suspect Knights Corner is more like a Radeon than a Niagara in this case.
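A toy model of the fixed interleaving described above: one ALU issues from its threads strictly round-robin, switching every cycle whether or not anything is stalled, and each instruction depends on the same thread's previous one, whose result takes `latency` cycles. With at least as many threads as cycles of latency the ALU never idles; with fewer, the slot is wasted. The latency of 4 just mirrors the "four pipeline executions" figure above; nothing here is measured Radeon or Knights Corner behavior, and the function name is hypothetical.

```python
def round_robin_utilization(threads: int, latency: int, instrs_per_thread: int = 1000) -> float:
    """Fraction of cycles the ALU issues work under strict round-robin thread switching.

    Each thread's next instruction needs the result of its previous one, which is
    ready `latency` cycles after issue. If a thread's turn comes up before that,
    the slot becomes a bubble (there is no switch-on-stall to fill it).
    """
    ready_at = [0] * threads                 # cycle at which each thread may issue again
    remaining = [instrs_per_thread] * threads
    cycle = issued = t = 0
    while any(remaining):
        if remaining[t] and ready_at[t] <= cycle:
            issued += 1
            remaining[t] -= 1
            ready_at[t] = cycle + latency
        cycle += 1
        t = (t + 1) % threads                # switch thread every cycle, unconditionally
    return issued / cycle

for n in (1, 2, 4, 8):
    print(n, round(round_robin_utilization(n, latency=4), 2))
# 1 0.25
# 2 0.5
# 4 1.0
# 8 1.0
```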


Title: Re: Xeon Phi
Post by: mrb on June 29, 2012, 06:25:11 AM
I see why you are confused. 2112 meant superscalar, not hyperthreaded, as pointed out by others.
I think the confusion runs deeper than just me.

Here's the quote from the "Knights Corner Performance Monitoring Units",
Intel document number: 327357-001

Quote
2. 4-Way Threaded: Each Knights Corner core is able to process 4 threads concurrently.

Xeon Phi is hyperthreaded (the vendor-neutral term for this is SMT = simultaneous multithreading), but as I am sure you know, SMT does not increase the performance of ALU-bound workloads at all. Therefore we can ignore SMT when making theoretical estimates of bitcoin mining performance.


Title: Re: Xeon Phi
Post by: DiabloD3 on June 29, 2012, 06:29:12 AM
I see why you are confused. 2112 meant superscalar, not hyperthreaded, as pointed out by others.
I think the confusion runs deeper than just me.

Here's the quote from the "Knights Corner Performance Monitoring Units",
Intel document number: 327357-001

Quote
2. 4-Way Threaded: Each Knights Corner core is able to process 4 threads concurrently.

Xeon Phi is hyperthreaded (the vendor-neutral term for this is SMT = simultaneous multithreading), but as I am sure you know, SMT does not increase the performance of ALU-bound workloads at all. Therefore we can ignore SMT when making theoretical estimates of bitcoin mining performance.


It's only hyperthreaded? That's kinda pointless altogether.


Title: Re: Xeon Phi
Post by: mrb on June 29, 2012, 06:54:06 AM
It's only hyperthreaded? That's kinda pointless altogether.

I use this term liberally. I have no idea what type of threading Xeon Phi will implement (the GPU way: switching unconditionally to the next thread on each instruction; or the CPU way: switching to the next thread when the current one would wait on memory).

But either way, it does not matter to us. Mining is an embarrassingly parallel workload, so an implementation can be adjusted to fully exploit the ALU resources of Xeon Phi, and whatever type of threading Xeon Phi implements will not add any extra performance.
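What "embarrassingly parallel" means in practice: each worker scans its own disjoint slice of the 32-bit nonce space, double-SHA-256ing the 80-byte block header with the nonce in its last 4 bytes (little-endian), and no worker needs to talk to another until one finds a hash at or below the target. A minimal sketch; `nonce_ranges` and `scan` are hypothetical names, and the header and targets in the usage lines are dummy values, not a real block.

```python
import hashlib
import struct
from typing import List, Optional, Tuple

def nonce_ranges(workers: int, space: int = 1 << 32) -> List[Tuple[int, int]]:
    """Split [0, space) into `workers` contiguous, disjoint half-open ranges."""
    step, extra = divmod(space, workers)
    ranges, start = [], 0
    for w in range(workers):
        end = start + step + (1 if w < extra else 0)   # spread any remainder over the first workers
        ranges.append((start, end))
        start = end
    return ranges

def scan(header76: bytes, start: int, end: int, target: int) -> Optional[int]:
    """Return the first nonce in [start, end) whose double SHA-256 is <= target, else None."""
    for nonce in range(start, end):
        header = header76 + struct.pack("<I", nonce)   # nonce is the last 4 header bytes
        digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
        if int.from_bytes(digest, "little") <= target: # hash compared as a little-endian number
            return nonce
    return None

print(nonce_ranges(4)[0], nonce_ranges(4)[-1])  # (0, 1073741824) (3221225472, 4294967296)
print(scan(bytes(76), 5, 10, (1 << 256) - 1))   # 5: the maximum target accepts any hash
```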


Title: Re: Xeon Phi
Post by: 2112 on June 29, 2012, 08:06:03 PM
It's only hyperthreaded? That's kinda pointless altogether.
It is both superscalar (2-way) and hyperthreaded (4-way).

Another quote from the same manual:
Quote
0x00 0x16 INSTRUCTIONS_EXECUTED Number of instructions executed (up to two per clock)
0x00 0x17 INSTRUCTIONS_EXECUTED_V_PIPE Number of instructions executed in the V_pipe. The event indicates the number of instructions that were paired.
0x20 0x16 VPU_INSTRUCTIONS_EXECUTED Counts the number of VPU instructions executed in both u- and v-pipes.
0x20 0x17 VPU_INSTRUCTIONS_EXECUTED_V_PIPE Counts the number of VPU instructions that paired and executed in the v-pipe.
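Those counter pairs are enough to estimate how often the V pipe actually gets used. Assuming every V-pipe instruction is the second half of a U/V pair (as the event description suggests), the fraction of instructions that issued as part of a pair is 2 x V_PIPE / EXECUTED, which tops out at 1.0. A small helper, with a hypothetical name and made-up counter readings rather than real measurements:

```python
def pairing_fraction(executed: int, executed_v_pipe: int) -> float:
    """Fraction of instructions that issued as half of a U/V pair.

    Assumes each V-pipe instruction was paired with exactly one U-pipe
    instruction, so every V-pipe count stands for two paired instructions.
    Works the same for INSTRUCTIONS_EXECUTED and VPU_INSTRUCTIONS_EXECUTED.
    """
    if executed == 0:
        return 0.0
    if not 0 <= 2 * executed_v_pipe <= executed:
        raise ValueError("V-pipe count cannot exceed half of all executed instructions")
    return 2 * executed_v_pipe / executed

# Hypothetical counter readings, not real measurements:
print(pairing_fraction(1_000_000, 300_000))   # 0.6
print(pairing_fraction(1_000_000, 500_000))   # 1.0: every instruction was paired
```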

As mrb said:
Mining is an embarrassingly parallel workload, so an implementation can be adjusted to fully exploit the ALU resources of Xeon Phi,
but hyperthreading should make the "adjustment" work easier. The threads will not be fighting for cache lines, which is the most common reason hyperthreaded processors fail to gain performance.

Anyway, we'll see. I'm not really up to downloading Intel compilers, compiling the code and analyzing the assembly. Maybe in winter?


Title: Re: Xeon Phi
Post by: AzN1337c0d3r on June 30, 2012, 12:14:22 AM
switch to the next on anything that would block execution.
That's the definition of coarse-grained multithreading and I believe you're mistaken. No major processor architecture implements coarse-grained multithreading.

The Ultrasparc T1/T2 switches thread on every cycle, which is the definition of fine-grained multithreading.

The IBM Power series have implemented true SMT since Power5.

AMD Bulldozers do the same, but have a semi-unified scheduler that schedules the next instruction (from one of two already decoded streams) onto the next ALU (2 threads -> 4 integer ALUs and 2 FP ALUs).

AMD's implementation is actually SMT if you only regard the int ALU resources. The innovation they made is that they share FP resources across two cores (or they have dedicated integer resources for each core if you want to look at it that way).

Quote
Radeons, however, don't switch on block. They automatically assume there is memory latency and results won't be available for four pipeline executions later. It makes the hardware simpler and easier to design compilers for. I suspect Knights Corner is more like a Radeon than a Niagara in this case.

That's fine-grained multithreading then because it's basically switching to a new thread every cycle.
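To make the coarse/fine distinction concrete, here is a toy model of a single-issue core choosing which hardware thread runs each cycle. It is an illustration only, not a model of any real chip. `pick_coarse`, `pick_fine` and the `example_stall` predicate are hypothetical. Coarse-grained keeps the current thread until it stalls, while fine-grained (barrel) rotates to the next ready thread every cycle.

```c
#include <assert.h>
#include <stdbool.h>

/* Caller-supplied predicate: true if hardware thread `thread` cannot issue in
 * cycle `cycle` (for example, it is waiting on a cache miss). */
typedef bool (*stall_fn)(unsigned thread, unsigned cycle);

/* First thread at or after `start` (wrapping around) that is ready, or -1. */
static int next_ready(unsigned start, unsigned threads, unsigned cycle, stall_fn stalled)
{
    assert(threads > 0);
    for (unsigned i = 0; i < threads; i++) {
        unsigned t = (start + i) % threads;
        if (!stalled(t, cycle))
            return (int)t;
    }
    return -1; /* every thread is stalled: the pipeline bubbles */
}

/* Coarse-grained: stay on the current thread and switch only when it stalls. */
static int pick_coarse(int current, unsigned threads, unsigned cycle, stall_fn stalled)
{
    return next_ready(current < 0 ? 0u : (unsigned)current, threads, cycle, stalled);
}

/* Fine-grained (barrel): move on to the next ready thread every cycle. */
static int pick_fine(int previous, unsigned threads, unsigned cycle, stall_fn stalled)
{
    return next_ready(previous < 0 ? 0u : (unsigned)previous + 1u, threads, cycle, stalled);
}

/* Example predicate: thread 0 stalls in cycle 2 only; all others are always ready. */
static bool example_stall(unsigned thread, unsigned cycle)
{
    return thread == 0 && cycle == 2;
}
```

With four threads and `example_stall`, the coarse-grained picker runs thread 0 in cycles 0 and 1 and only moves to thread 1 when thread 0 stalls in cycle 2. The fine-grained picker visits threads 0, 1, 2, 3 in turn regardless of stalls, skipping only a thread that is not ready.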


Title: Re: Xeon Phi
Post by: DiabloD3 on June 30, 2012, 02:42:14 AM
I wasn't arguing coarse vs fine. I was just arguing on who does what.

Niagaras really do switch on block, but the newer ones might just switch every cycle now that they have an FP unit per core instead of per socket.

Radeons switch every VLIW clause (which can be up to 128 instructions long) due to the unique register layout.

AMD is SMT if you look at it that way, but I look at it much finer grained than SMT: not only do you get SMT, but instructions are scheduled to run on the next free ALU. Fine grained "switch every cycle" would waste resources as ALUs would have no work to run most of the time.

Intel Hyperthreading switches on block on P4s, and I think the i* chips that have it switch every cycle.

I mean, what I'm trying to say is that all modern high-performance archs use the same set of tricks; the difference is at what level they exploit them. It's rumored that future POWER chips (9? 10?) will just be something like 64 threads piping into one core, where that one core has something like 128 int ALUs and a similar number of FPUs, which is probably the future of non-lockstep highly parallel programming.


Title: Re: Xeon Phi
Post by: AzN1337c0d3r on June 30, 2012, 09:46:32 AM
Quote from: DiabloD3
AMD is SMT if you look at it that way, but I look at it much finer grained than SMT: not only do you get SMT, but instructions are scheduled to run on the next free ALU. Fine grained "switch every cycle" would waste resources as ALUs would have no work to run most of the time.

Quote from: DiabloD3
Intel Hyperthreading switches on block on P4s, and I think on i*s that have it they switched to every cycle.

I think you have a fundamental misunderstanding of what SMT is as evidenced by the quotes above. SMT is neither fine-grained nor coarse-grained multithreading.

SMT is simultaneous multithreading so there is no switching. Instructions from all threads are eligible to be issued at any given clock cycle.
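The difference shows up in how a wide core's issue slots get filled. The sketch below is a toy model that assumes no stalls or dependencies, for clarity. With one thread per cycle, a 4-wide core fills at most `ilp` slots per cycle. With SMT, slots in the same cycle can be filled from several threads. `fill_issue_slots` and its parameters are hypothetical names.

```c
#include <assert.h>

#define EMPTY (-1)

typedef enum { ONE_THREAD_PER_CYCLE, SMT } issue_policy;

/* Toy issue-slot model with no stalls: `width` slots per cycle, `threads`
 * hardware threads, each able to supply at most `ilp` independent
 * instructions per cycle. grid[c * width + s] receives the id of the thread
 * occupying slot s in cycle c, or EMPTY. Returns the number of filled slots. */
static unsigned fill_issue_slots(issue_policy policy, unsigned cycles,
                                 unsigned width, unsigned threads,
                                 unsigned ilp, int *grid)
{
    assert(width > 0 && threads > 0);
    unsigned filled = 0;
    for (unsigned c = 0; c < cycles; c++) {
        for (unsigned s = 0; s < width; s++) {
            int owner = EMPTY;
            if (policy == ONE_THREAD_PER_CYCLE) {
                /* Fine-grained: the whole cycle belongs to one thread, round-robin. */
                if (s < ilp)
                    owner = (int)(c % threads);
            } else {
                /* SMT: interleave threads across the slots of the same cycle. */
                if (s / threads < ilp)
                    owner = (int)(s % threads);
            }
            grid[c * width + s] = owner;
            if (owner != EMPTY)
                filled++;
        }
    }
    return filled;
}
```

With `width = 4`, `threads = 4` and `ilp = 1`, the one-thread-per-cycle policy fills 4 of the 16 slots over four cycles, while SMT fills all 16.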

Quote from: DiabloD3
I wasn't arguing coarse vs fine. I was just arguing on who does what.

Niagaras really do switch on block, but the newer ones might just switch every cycle now that they have a FP unit per core instead of per socket.

From Wikipedia's UltraSPARC T1 article:

Quote
Each core is a barrel processor, meaning it switches between available threads each cycle.

It doesn't switch on a block; it switches EVERY cycle. You can confirm this if you read Sun's architecture manuals (T1 has been open-sourced).


Title: Re: Xeon Phi
Post by: DiabloD3 on June 30, 2012, 10:00:20 AM
I know what SMT is, and it only describes one core doing multiple threads simultaneously. It doesn't describe how, coarse or not. T1s, Intel P4s, Intel i*s with HT, Bulldozers, and Radeons can all be described as SMT. They just don't all do it the same way.

As for T1s doing it on block, this is what Sun advertised it as. I'm not surprised their marketing department got it slightly wrong, so I'll let you have that one.


Title: Re: Xeon Phi
Post by: AzN1337c0d3r on June 30, 2012, 10:12:47 AM

I know what SMT is, and it only describes one core doing multiple threads simultaneously. It doesn't describe how, coarse or not. T1s, Intel P4s, Intel i*s with HT, Bulldozers, and Radeons can all be described as SMT. They just don't all do it the same way.

You seriously still do not understand what SMT is. If you have Computer Architecture: A Quantitative Approach, I suggest you flip to the chapter on multithreading and read it.

Also look at slide 35 (http://www.cs.utexas.edu/~witchel/352H/lectures/Lecture_24.ppt) from this Powerpoint presentation on threading.

The UltraSPARC T1 cannot possibly be SMT. From the Wikipedia article on SMT (http://en.wikipedia.org/wiki/Simultaneous_multithreading):

Quote
The key factor to distinguish them is to look at how many instructions the processor can issue in one cycle and how many threads from which the instructions come. For example, Sun Microsystems' UltraSPARC T1 (known as "Niagara" until its November 14, 2005 release) is a multicore processor combined with fine-grain multithreading technique instead of simultaneous multithreading because each core can only issue one instruction at a time.
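The criterion in that quote can be written as a small decision procedure. This is a sketch only. The enum and `classify_core` are hypothetical names, and real designs blur these categories. The key input is how many distinct threads can issue in the same cycle, not how many threads a core holds.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    NO_MULTITHREADING,
    COARSE_GRAINED_MT, /* switch threads only on a long-latency event */
    FINE_GRAINED_MT,   /* switch threads every cycle, one thread per cycle */
    SIMULTANEOUS_MT    /* several threads can issue in the same cycle */
} mt_kind;

/* Classify a core by how many threads the instructions issued in a single
 * cycle can come from, and (if only one) whether it rotates every cycle. */
static mt_kind classify_core(unsigned hw_threads,
                             unsigned max_threads_issuing_per_cycle,
                             bool switches_every_cycle)
{
    assert(hw_threads > 0);
    assert(max_threads_issuing_per_cycle > 0 &&
           max_threads_issuing_per_cycle <= hw_threads);
    if (hw_threads == 1)
        return NO_MULTITHREADING;
    if (max_threads_issuing_per_cycle > 1)
        return SIMULTANEOUS_MT;
    return switches_every_cycle ? FINE_GRAINED_MT : COARSE_GRAINED_MT;
}
```

An UltraSPARC T1 core holds four threads but issues from only one of them per cycle, rotating each cycle, so `classify_core(4, 1, true)` gives `FINE_GRAINED_MT`. Any core that can issue from two threads in the same cycle lands in `SIMULTANEOUS_MT`.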


Title: Re: Xeon Phi
Post by: DiabloD3 on June 30, 2012, 10:17:57 AM
That's an unusually strict definition of SMT. Issuing instructions from more than one thread at a time to fill load requirements over multiple ALUs is not a requirement for SMT.


Title: Re: Xeon Phi
Post by: AzN1337c0d3r on June 30, 2012, 10:19:40 AM
That's an unusually strict definition of SMT. Issuing instructions from more than one thread at a time to fill load requirements over multiple ALUs is not a requirement for SMT.

Dude, why do you think it's called simultaneous multithreading then?

If you have found a less strict definition somewhere from an authoritative source, please do share. All the research papers I've read on SMT have used that definition.


Title: Re: Xeon Phi
Post by: 2112 on July 10, 2012, 10:00:35 PM
Just an interesting tidbit I've found on the 2nd pass through the Knights Corner documentation:
Quote
EBX[23:16] = 248; // Maximum number of logical processors
248 logical processors / 4 threads per core = 62 cores, not 50.
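The arithmetic is a bit-field extraction followed by a division. This is a minimal sketch that assumes 4 hardware threads per core (as on the shipping cards: 60 cores, 240 threads). `bit_field` and `cores_from_ebx` are hypothetical helpers. How the register is actually read (which CPUID leaf, which intrinsic) is not shown, and the EBX value in the test is made up apart from the 248 in bits 23:16.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the inclusive bit field [hi:lo] from a 32-bit register value. */
static uint32_t bit_field(uint32_t reg, unsigned hi, unsigned lo)
{
    assert(hi < 32 && lo <= hi);
    unsigned width = hi - lo + 1;
    uint32_t mask = (width == 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1u);
    return (reg >> lo) & mask;
}

/* Cores implied by EBX[23:16], assuming a fixed number of threads per core. */
static uint32_t cores_from_ebx(uint32_t ebx, uint32_t threads_per_core)
{
    assert(threads_per_core > 0);
    return bit_field(ebx, 23, 16) / threads_per_core;
}
```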

Why could that be?

1) A leftover from Knights Ferry?
2) Yield with all 62 cores enabled would be zero?
3) Something else?

Please share your guesses.


Title: Re: Xeon Phi
Post by: markodude on November 18, 2012, 09:37:30 AM
I'm getting a shot at one next week. Has anyone got code to get it mining? Thanks


Title: Re: Xeon Phi
Post by: 2112 on November 18, 2012, 12:50:25 PM
I'm getting a shot at one next week. Has anyone got code to get it mining? Thanks
pooler's cpuminer will mine both Bitcoins and Litecoins. You'll have to recompile it to take advantage of the new instructions and massive multithreading. To fully utilize the long vector units, you'll probably need to restructure the loops somewhat.

https://bitcointalk.org/index.php?topic=55038.0
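As an illustration of what "restructuring the loops" can mean, here is a sketch of one SHA-256 compression round in struct-of-arrays form for 16 independent messages. Each working variable is stored as 16 consecutive 32-bit words, and the lane loop is innermost and branch-free. That is the shape an auto-vectorizing compiler can map onto 512-bit registers (16 × 32 bits). The names `sha256_state16` and `sha256_round16` are hypothetical, not taken from cpuminer. Whether a given compiler actually vectorizes this should be checked in the generated assembly.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16 /* 512-bit register / 32-bit words */

static inline uint32_t rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* Struct-of-arrays state: v[i][lane] is working variable i (a..h) of one lane. */
typedef struct { uint32_t v[8][LANES]; } sha256_state16;

/* One SHA-256 round applied to 16 independent messages. `k` is the round
 * constant; w[lane] is each lane's message-schedule word for this round. */
static void sha256_round16(sha256_state16 *st, uint32_t k, const uint32_t w[LANES])
{
    for (int l = 0; l < LANES; l++) {
        uint32_t a = st->v[0][l], b = st->v[1][l], c = st->v[2][l], d = st->v[3][l];
        uint32_t e = st->v[4][l], f = st->v[5][l], g = st->v[6][l], h = st->v[7][l];
        uint32_t s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25);
        uint32_t ch = (e & f) ^ (~e & g);
        uint32_t t1 = h + s1 + ch + k + w[l];
        uint32_t s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22);
        uint32_t maj = (a & b) ^ (a & c) ^ (b & c);
        uint32_t t2 = s0 + maj;
        /* Shift the working variables down by one position. */
        st->v[7][l] = g; st->v[6][l] = f; st->v[5][l] = e; st->v[4][l] = d + t1;
        st->v[3][l] = c; st->v[2][l] = b; st->v[1][l] = a; st->v[0][l] = t1 + t2;
    }
}
```

The array-of-structs alternative (one full state per message) forces the compiler to gather across structures, which is what the restructuring avoids.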


Title: Re: Xeon Phi
Post by: kiyominer on March 27, 2014, 02:42:53 PM
I ported pooler's cpuminer to the Intel Xeon Phi so that it takes advantage of the 512-bit registers and computes 16 hashes at once.
I was able to test it on an Intel Xeon Phi 5100 series card (60 cores, 240 threads @ 1.053 GHz).

So far, I have measured 140 MHash/s using 240 threads.
Interestingly, with 60 threads (one thread per core) I achieved 65 MHash/s. Since each core runs 4 threads, that suggests there is some room for optimisation: with perfect scaling, 4 × 65 = 260 MHash/s could be achieved.

On the other hand, I could only achieve 13.8 MHash/s on 240 threads using the plain C code
(i.e. without using the 512-bit registers).
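Working the numbers above through, and assuming perfect linear scaling with threads per core as the ideal:

```latex
\begin{aligned}
\text{ideal rate at 240 threads} &= 4 \times 65~\text{MHash/s} = 260~\text{MHash/s} \\
\text{scaling efficiency} &= \frac{140}{260} \approx 0.54 \\
\text{speedup over plain C} &= \frac{140}{13.8} \approx 10.1\times \quad (\text{vs. } 16 \text{ lanes per register})
\end{aligned}
```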

The code (Linux only) can be downloaded at https://github.com/kiyominer/cpuminer

Cheers,

kiyo


Title: Re: Xeon Phi
Post by: redmonski on March 27, 2014, 03:31:40 PM
Hi, does it work with scrypt? Thanks.


Title: Re: Xeon Phi
Post by: kiyominer on March 27, 2014, 03:48:41 PM
Quote
does it work with scrypt?

Sorry, only sha256d is supported at this time.
At first glance, an efficient scrypt implementation does not look as easy as sha256d.