Bitcoin Forum
Topic: Run my program in your Nvidia GPU (for bitcoins)
jtimon (OP)
Legendary
*
Offline Offline

Activity: 1372
Merit: 1002


View Profile WWW
June 11, 2012, 10:17:23 AM
 #1

Do you have a relatively new Nvidia graphics card (it must support CUDA)? Do you have Linux installed?
You could run my project so that I can compare performance times against a better graphics card than my 9500 GT and a newer CPU than my old single-core AMD.
You will need to install a few programs: nvcc (the CUDA compiler), nasm (an assembler), gnuplot (to draw some charts) and maybe some basic C++ dev tools if you don't have them yet. I'll help you set up the program, and then you only need to wait for it to run. I'm not sure how long it will take, but it could be a few hours, during which you shouldn't use your computer much.

Here's the software prepared to be installed and run with just "make":
http://content.wuala.com/contents/jtimon/temp/preann.tar.gz

The project is free software and anyone can also download it from this repository:
http://sourceforge.net/projects/preann/develop

Please, contact me if you're interested in helping me with your computing power.
Although I have no idea about what would be fair, I'm willing to pay you some bitcoins for this.
I can also pay you with village's hours.

2 different forms of free-money: Freicoin (free of basic interest because it's perishable), Mutual credit (no interest because it's abundant)
Jay_Pal
Legendary
*
Offline Offline

Activity: 1493
Merit: 1003



View Profile
June 11, 2012, 06:37:29 PM
 #2

I have an onboard GT 8600, are you interested?

Best faucet EVER! - Freebitco.in
Don't Panic... - 1G8zjUzeZBfJpeCbz1MLTc6zQHbLm78vKc
Why not mine from the browser?
jtimon (OP)
June 12, 2012, 09:15:36 AM
 #3

Quote from: Jay_Pal on June 11, 2012, 06:37:29 PM
I have an onboard GT 8600, are you interested?

I don't know. It seems to have the same core configuration (32 cores) as my 9500 GT, and I got "disappointing" results with that card the last time I tried. Maybe the old motherboard and CPU were responsible.
Now it seems that my old desktop PC has broken down just from not being turned on for so long (or maybe it's just the power supply; there is no beep), so I may take you up on your offer if no one else shows up.
I was hoping to use an NVIDIA card from the 500 or 600 series, with at least 192 cores or so (ideally a GeForce GTX 680, which has 1536 of them), but thank you for your interest anyway.

Another clarification: I don't need towers with several GPUs, because my program is not adapted to work with multiple GPUs anyway.

Luceo
Sr. Member
****
Offline Offline

Activity: 350
Merit: 250


Per aspera ad astra!


View Profile
June 12, 2012, 11:30:11 AM
 #4

I have a GTX 570, will check over the source code and make sure I'd be comfortable running it.

Edit: Hmm, it's bigger than I expected. Does it need net access? If not, I can run it off an isolated LiveCD.

Edit: CPU is a Phenom II X6 1100T.

Jay_Pal
June 12, 2012, 11:45:06 AM
 #5

Quote from: jtimon on June 12, 2012, 09:15:36 AM
I don't know. It seems to have the same core configuration (32 cores) as my 9500 GT, and I got "disappointing" results with that card the last time I tried. Maybe the old motherboard and CPU were responsible.
Now it seems that my old desktop PC has broken down just from not being turned on for so long (or maybe it's just the power supply; there is no beep), so I may take you up on your offer if no one else shows up.
I was hoping to use an NVIDIA card from the 500 or 600 series, with at least 192 cores or so (ideally a GeForce GTX 680, which has 1536 of them), but thank you for your interest anyway.

Another clarification: I don't need towers with several GPUs, because my program is not adapted to work with multiple GPUs anyway.

Anytime you need, you're welcome!

Quote from: Luceo on June 12, 2012, 11:30:11 AM
I have a GTX 570, will check over the source code and make sure I'd be comfortable running it.

Edit: Hmm, it's bigger than I expected. Does it need net access? If not, I can run it off an isolated LiveCD.

Edit: CPU is a Phenom II X6 1100T.

That's a great idea!

jtimon (OP)
June 12, 2012, 12:38:42 PM
 #6

Quote from: Luceo on June 12, 2012, 11:30:11 AM
I have a GTX 570, will check over the source code and make sure I'd be comfortable running it.

Edit: Hmm, it's bigger than I expected. Does it need net access? If not, I can run it off an isolated LiveCD.

Edit: CPU is a Phenom II X6 1100T.

Great!! That should be more than enough.
It doesn't need internet access, but I'm worried about the impact a live CD could have on performance, since the whole purpose is to measure performance.
The project may be large, but you won't actually be running all of it.
The tasks, game, genetic, etc. parts aren't necessary.
If you use the version here, the makefile is set up to compile and run just chronoBuffers, chronoConnections and chronoFunctions (I forgot to include this last one the last time I prepared the package), and these main files don't use the whole code. I think it's just the neural, optimization, template, common and loop parts plus ChronoPlotter and these main .cpp files. You can verify this by following the includes.

If the reason you want to use the live CD is that you're concerned about what the program can do, we could first run a test with the live CD and small values for the loops, so you can be sure that it won't break anything.
Wait... that won't work, because you need to install nasm, nvcc and gnuplot first, and you can't install things using the live CD, right?

Probably the safest solution for you is to run it in a separate partition, but that will take you more time.
I'm open to other suggestions, but I fear the live CD won't be a feasible solution.

speedmann
Newbie
*
Offline Offline

Activity: 63
Merit: 0


View Profile
June 14, 2012, 05:05:01 PM
 #7

Hi there,

are you interested in testing a GeForce GT 420M (yeah, the mobile version :( ) with an Intel Core i3 370M processor?
If yes, please contact me ;)
jtimon (OP)
June 14, 2012, 05:28:07 PM
 #8

Quote from: speedmann on June 14, 2012, 05:05:01 PM
Hi there,

are you interested in testing a GeForce GT 420M (yeah, the mobile version :( ) with an Intel Core i3 370M processor?
If yes, please contact me ;)

Thanks for your offer.
Luceo's GTX 570 was more attractive, but I'm not sure if he's still willing to do it.
If you don't mind, I'll wait until next week for his answer (or others') and tell you then, OK?
Even though yours is a mobile version, it still has 96 cores instead of my 32, and the CPU has 2 cores instead of one (so system processes shouldn't get in the way).
If no one says anything else, you're the winner for now :)

Jello
Newbie
*
Offline Offline

Activity: 10
Merit: 0


View Profile
June 17, 2012, 09:34:14 PM
 #9

I have a card with 96 CUDA cores on a 2nd-gen Sandy Bridge on CentOS. Not sure if that's enough for your needs.
mokahless
Sr. Member
****
Offline Offline

Activity: 471
Merit: 256



View Profile
June 18, 2012, 12:07:04 AM
 #10

What does your program do? Mining? Or are you cracking government passwords or something sketchy?

jtimon (OP)
June 18, 2012, 01:35:07 PM
 #11

Quote from: Jello on June 17, 2012, 09:34:14 PM
I have a card with 96 CUDA cores on a 2nd-gen Sandy Bridge on CentOS. Not sure if that's enough for your needs.

I guess CentOS will have g++ and gnuplot in its repositories, but I'm not so sure about nasm (the assembler). I've developed everything under Ubuntu, but that shouldn't be a problem. I also guess Sandy Bridge supports SSE2 and the XMM registers; after all, it's Intel.
Do you know which GPU model you have? 96 cores will be fine anyway, but I'm not sure if your machine is better or worse than speedmann's.

Quote from: mokahless on June 18, 2012, 12:07:04 AM
What does your program do? Mining? Or are you cracking government passwords or something sketchy?

It goes by different names: evolutionary artificial neural networks, neuro-evolution... It's basically a machine learning technique combining neural networks and genetic algorithms.
What does my program learn?
To correctly calculate AND or OR from 2 bit vectors (very basic classification tasks), XOR (harder; you need a neural network with more than one layer) and to play Reversi/Othello, a board game.

Luceo
June 18, 2012, 06:32:14 PM
Last edit: June 18, 2012, 07:08:48 PM by Luceo
 #12

I've had a look at the code and I'm happy running this on my machine 'naked'.

A LiveCD wouldn't affect performance as I'd reboot into it and everything'd be running in RAM, but getting all the software on one would be a pain.

I'm using Arch Linux x86_64.

Edit: CXX_BASE isn't set by default, set that. Next issue:

(3:527)~ make
g++ -ggdb -I src/   -c src/test/testMemoryLosses.cpp -o build/test/testMemoryLosses.o
g++ -ggdb -I src/   -c src/common/chronometer.cpp -o build/common/chronometer.o
g++ -ggdb -I src/   -c src/common/dummy.cpp -o build/common/dummy.o
g++ -ggdb -I src/   -c src/common/enumerations.cpp -o build/common/enumerations.o
g++ -ggdb -I src/   -c src/common/util.cpp -o build/common/util.o
src/common/util.cpp: In static member function ‘static void MemoryManagement::free(void*)’:
src/common/util.cpp:61:48: error: cast from ‘void*’ to ‘unsigned int’ loses precision [-fpermissive]
make: *** [build/common/util.o] Error 1

Set CXX_BASE to -fpermissive, not sure if this'll affect the test.

Also have to change the cuda directory. Makefile looks pretty different.

Hmm, didn't run right, any ideas?

...

EDIT 2: OK, got a little further. Your 'if' block in the Makefile was being ignored, so I manually set these in the Makefile:

FACT_OBJ = $(FULL_OBJ)
FACT_FLAGS += -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL

Then I added my CUDA 'bin' path to the PATH variable.

Now, I get the following when I try to compile:

Code:
(3:587)# make
mkdir -p build/common
mkdir -p build/optimization
mkdir -p build/neural
mkdir -p build/genetic
mkdir -p build/game
mkdir -p build/tasks
mkdir -p build/loop
mkdir -p build/loopTest
mkdir -p build/test/
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/test/testMemoryLosses.cpp -o build/test/testMemoryLosses.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/chronometer.cpp -o build/common/chronometer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/dummy.cpp -o build/common/dummy.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/enumerations.cpp -o build/common/enumerations.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/util.cpp -o build/common/util.o
src/common/util.cpp: In static member function ‘static void MemoryManagement::free(void*)’:
src/common/util.cpp:61:48: warning: cast from ‘void*’ to ‘unsigned int’ loses precision [-fpermissive]
nasm -f elf src/optimization/sse2_code.asm -o build/optimization/sse2_code.o
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11   src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(48): error: identifier "__atomic_fetch_add" is undefined

/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(52): error: identifier "__atomic_fetch_add" is undefined

src/optimization/cuda_code.cu(60): error: more than one instance of overloaded function "min" matches the argument list:
            function "min(int, unsigned int)"
            function "min(unsigned int, unsigned int)"
            argument types are: (unsigned long, unsigned int)

src/optimization/cuda_code.cu(175): error: more than one instance of overloaded function "min" matches the argument list:
            function "min(int, unsigned int)"
            function "min(unsigned int, unsigned int)"
            argument types are: (unsigned long, unsigned int)

4 errors detected in the compilation of "/tmp/tmpxft_000046f1_00000000-4_cuda_code.cpp1.ii".
make: *** [build/optimization/cuda_code.o] Error 2

Any suggestions? Please PM me, I'd like to get this running as I've already spent a fair amount of time hacking at the Makefile to get it to.

jtimon (OP)
June 19, 2012, 08:56:16 AM
 #13

Sorry about all the problems you're having. I compiled the program with older software, and this is the result.
Since CUDA is proprietary software, when they decided to remove the emulator I got stuck with version 2.3, which means g++ 4.3, and I'm using make 3.81.
The -fpermissive flag seems to solve the problems with the newer g++, and you've hacked the makefile to make it work with the newer make.
I forgot that the path for nvcc is different in newer versions.

It doesn't run well because it hasn't compiled correctly. I don't even know why the executables are created.

I've changed cuda_code.cu to try to solve those problems. You can get it here:
http://preann.svn.sourceforge.net/viewvc/preann/preann/src/optimization/cuda_code.cu?view=log

If you have more problems, please post them. Maybe the easiest solution (if you can get a live CD with g++-4.3) is to install the legacy version of CUDA: http://developer.nvidia.com/cuda-toolkit-23-downloads

Thank you for taking the time to try this.

Luceo
June 19, 2012, 05:55:16 PM
 #14

No problem, it clearly needs working through so glad I can be of assistance. Current output still fails, though:

Code:
(3:600)# make
mkdir -p build/common
mkdir -p build/optimization
mkdir -p build/neural
mkdir -p build/genetic
mkdir -p build/game
mkdir -p build/tasks
mkdir -p build/loop
mkdir -p build/loopTest
mkdir -p build/test/
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/test/testMemoryLosses.cpp -o build/test/testMemoryLosses.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/chronometer.cpp -o build/common/chronometer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/dummy.cpp -o build/common/dummy.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/enumerations.cpp -o build/common/enumerations.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/util.cpp -o build/common/util.o
src/common/util.cpp: In static member function ‘static void MemoryManagement::free(void*)’:
src/common/util.cpp:61:48: warning: cast from ‘void*’ to ‘unsigned int’ loses precision [-fpermissive]
nasm -f elf src/optimization/sse2_code.asm -o build/optimization/sse2_code.o
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11   src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(48): error: identifier "__atomic_fetch_add" is undefined

/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(52): error: identifier "__atomic_fetch_add" is undefined

2 errors detected in the compilation of "/tmp/tmpxft_00006ce2_00000000-4_cuda_code.cpp1.ii".
make: *** [build/optimization/cuda_code.o] Error 2

This issue was present in some of the CUDA demos, and was fixed by adding this to the top of cuda_code.cu (although I'm totally unsure what this does, it's just a forum suggestion):

Code:
#undef _GLIBCXX_ATOMIC_BUILTINS
#undef _GLIBCXX_USE_INT128

Then 'make' produced the following output (this could be related to my edit described in the last code block, though):

Code:
(3:602)# make
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11   src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
src/optimization/cuda_code.cu(382) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumBitsConnectionsKernel<(BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(382) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumBitsConnectionsKernel<(BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(457) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumBitsInvertedConnectionsKernel<(BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(457) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumBitsInvertedConnectionsKernel<(BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)512u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)256u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)128u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)64u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)32u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)16u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)8u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)4u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)2u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)1u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)512u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)256u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)128u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)64u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)32u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)16u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)8u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)4u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)2u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)1u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)512u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)256u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)128u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)64u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)32u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)16u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)8u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)4u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)2u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)1u, (BufferType)2> ") is not allowed

34 errors detected in the compilation of "/tmp/tmpxft_00006d0d_00000000-4_cuda_code.cpp1.ii".
make: *** [build/optimization/cuda_code.o] Error 2

jtimon (OP)
June 19, 2012, 06:43:33 PM
 #15

Quote from: Luceo on June 19, 2012, 05:55:16 PM
No problem, it clearly needs working through so glad I can be of assistance. Current output still fails, though:

...

This issue was present in some of the CUDA demos, and was fixed by adding this to the top of cuda_code.cu (although I'm totally unsure what this does, it's just a forum suggestion):

Code:
#undef _GLIBCXX_ATOMIC_BUILTINS
#undef _GLIBCXX_USE_INT128

Thank you. I'll find out what this means. Probably disabling new stuff by default or something.

Quote from: Luceo on June 19, 2012, 05:55:16 PM
Then, the following return was given to a 'make' (this could be related to my edit as described in the last codeblock, though):

...

That happens when you change things and only test your "improvements" with the emulator. Sorry again.
I've uploaded the file to the repository again with a change that tries to solve that (and with your change included).
Can you try it again?


Luceo
June 19, 2012, 10:56:46 PM
Last edit: June 20, 2012, 12:05:13 AM by Luceo
 #16

OK, I had to swap around the #undef and #include statements at the top of the cuda_code.cu (undef directives must come before importing the header), and add /opt/cuda-toolkit/open64/bin to path, got past that issue and found another one:

Code:
(3:624)# make
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11   src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/optimization/factory.cpp -o build/optimization/factory.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/interface.cpp -o build/neural/interface.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/connection.cpp -o build/neural/connection.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/neuralNet.cpp -o build/neural/neuralNet.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/layer.cpp -o build/neural/layer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/inputLayer.cpp -o build/neural/inputLayer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/buffer.cpp -o build/neural/buffer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/genetic/individual.cpp -o build/genetic/individual.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/genetic/task.cpp -o build/genetic/task.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/genetic/population.cpp -o build/genetic/population.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/game/reversiBoard.cpp -o build/game/reversiBoard.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/game/board.cpp -o build/game/board.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/tasks/reversiTask.cpp -o build/tasks/reversiTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/tasks/binaryTask.cpp -o build/tasks/binaryTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/tasks/classificationTask.cpp -o build/tasks/classificationTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/rangeLoop.cpp -o build/loop/rangeLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/enumLoop.cpp -o build/loop/enumLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/genericPlotter.cpp -o build/loop/genericPlotter.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/plot.cpp -o build/loop/plot.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/loop.cpp -o build/loop/loop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/joinEnumLoop.cpp -o build/loop/joinEnumLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/test.cpp -o build/loop/test.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/parametersMap.cpp -o build/loop/parametersMap.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loopTest/taskPlotter.cpp -o build/loopTest/taskPlotter.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loopTest/chronoPlotter.cpp -o build/loopTest/chronoPlotter.o
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -L/opt/cuda-toolkit/lib64 -lcudart  build/test/testMemoryLosses.o build/common/chronometer.o build/common/dummy.o build/common/enumerations.o build/common/util.o build/optimization/factory.o build/neural/interface.o build/neural/connection.o build/neural/neuralNet.o build/neural/layer.o build/neural/inputLayer.o build/neural/buffer.o build/genetic/individual.o build/genetic/task.o build/genetic/population.o build/game/reversiBoard.o build/game/board.o build/tasks/reversiTask.o build/tasks/binaryTask.o build/tasks/classificationTask.o build/loop/rangeLoop.o build/loop/enumLoop.o build/loop/genericPlotter.o build/loop/plot.o build/loop/loop.o build/loop/joinEnumLoop.o build/loop/test.o build/loop/parametersMap.o build/loopTest/taskPlotter.o build/loopTest/chronoPlotter.o build/optimization/sse2_code.o build/optimization/cuda_code.o -o bin/testMemoryLosses.exe
/usr/bin/ld: i386 architecture of input file `build/optimization/sse2_code.o' is incompatible with i386:x86-64 output
collect2: error: ld returned 1 exit status
make: *** [bin/testMemoryLosses.exe] Error 1

Feel like we're getting somewhere. I'll try not compiling the SSE2 code, since I assume you only need me to test the CUDA stuff, so here is the Makefile change:

Code:
FULL_OBJ = [s]$(SSE2_OBJ)[/s] $(CUDA_OBJ)
FACT_FLAGS += -DCPP_IMPL [s]-DSSE2_IMPL[/s] -DCUDA_IMPL

Maybe I edited the Makefile wrong, because I got a bunch of outputs similar to:

Code:
 while executing GenericPlotFillAction ChronoFillAction Connection_reset at state Size_450000 : 
Implementation SSE2 is not allowed.

However, it looks like the software compiled. I'm now getting outputs (separated by reasonably long pauses) like:

Code:
 while executing GenericPlotFillAction ChronoFillAction Connection_calculateAndAddTo at state Size_4500 : 
The maximum float input size is 4032.

Complete output from this run has been pasted here.

Can you confirm whether this is expected behaviour? I'm assuming it's not (the 4500 step is being repeated over and over, I assume because it's trying to use a float input size that is too high). If it is expected, I'll leave this running overnight (next time I nap, probably in about 5-6 hours).

jtimon (OP)
June 20, 2012, 07:18:50 AM
 #17

Quote from: Luceo
OK, I had to swap around the #undef and #include statements at the top of cuda_code.cu (undef directives must come before importing the header) and add /opt/cuda-toolkit/open64/bin to the PATH. I got past that issue and found another one:

I'll correct that, thanks.

Quote from: Luceo
Code:
/usr/bin/ld: i386 architecture of input file `build/optimization/sse2_code.o' is incompatible with i386:x86-64 output
collect2: error: ld returned 1 exit status
make: *** [bin/testMemoryLosses.exe] Error 1

Oh, I hadn't thought about that. The assembly code is incompatible with a 64-bit operating system: it is assembled as a 32-bit object, which the linker can't combine into a 64-bit executable.

Quote from: Luceo
Feel like we're getting somewhere. I'll try not compiling the SSE2 code, since I assume you only need me to test the CUDA stuff, so here is the Makefile change:

Code:
FULL_OBJ = [s]$(SSE2_OBJ)[/s] $(CUDA_OBJ)
FACT_FLAGS += -DCPP_IMPL [s]-DSSE2_IMPL[/s] -DCUDA_IMPL

Maybe I edited the Makefile wrong, because I got a bunch of outputs similar to:

Code:
 while executing GenericPlotFillAction ChronoFillAction Connection_reset at state Size_450000 : 
Implementation SSE2 is not allowed.

Actually, I wanted to compare the SSE2 version on the same CPU.
If you can't use a 32-bit OS, I guess I can repeat some of the charts and compare the results independently.
After removing the SSE2 version, getting rid of those errors requires changing the main files that run the benchmarks (chronoBuffers.cpp, chronoConnections.cpp, chronoFunctions.cpp) to remove the SSE2 option, but those errors aren't really important.
Anyway, the changes would be something like this:

In chronoBuffers.cpp, replace
EnumLoop linesLoop(ET_IMPLEMENTATION, 3, IT_C, IT_SSE2, IT_CUDA);
with
EnumLoop linesLoop(ET_IMPLEMENTATION, 2, IT_C, IT_CUDA);

In chronoConnections.cpp, replace
EnumLoop linesLoop(ET_IMPLEMENTATION);
with
EnumLoop linesLoop(ET_IMPLEMENTATION, 4, IT_C, IT_CUDA, IT_CUDA_REDUC, IT_CUDA_INV);

In chronoFunctions.cpp, keep
linesLoop.addInnerLoop(new EnumLoop(ET_IMPLEMENTATION, 2, IT_C, IT_CUDA));
as it is.
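
For readers unfamiliar with this calling style, here is a minimal sketch of how a count-prefixed variadic call such as `(ET_IMPLEMENTATION, 2, IT_C, IT_CUDA)` can be read. It is not the project's actual EnumLoop code: the enum definitions and the `selectedValues` helper are hypothetical stand-ins, and the real class may work differently. The point is only that the number after the enum type has to match how many values follow it.

```cpp
#include <cstdarg>
#include <vector>

// Hypothetical stand-ins for the project's enums; the real definitions may differ.
enum EnumType { ET_IMPLEMENTATION };
enum ImplementationType { IT_C, IT_SSE2, IT_CUDA, IT_CUDA_REDUC, IT_CUDA_INV };

// Hypothetical helper: reads exactly `count` enum values from the variadic
// arguments and returns them in the order they were passed.
std::vector<int> selectedValues(EnumType /*type*/, unsigned count, ...)
{
    std::vector<int> values;
    va_list args;
    va_start(args, count);
    for (unsigned i = 0; i < count; ++i) {
        // Unscoped enums are promoted to int when passed through "..."
        values.push_back(va_arg(args, int));
    }
    va_end(args);
    return values;
}
```

With this reading, `selectedValues(ET_IMPLEMENTATION, 2, IT_C, IT_CUDA)` yields only the C and CUDA implementations, which is the effect the edits above are after. If the count is larger than the number of values actually passed, the behaviour is undefined, so the two must be kept in sync by hand.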

Quote from: Luceo
However, it looks like the software compiled. I'm now getting outputs (separated by reasonably long pauses) like:

Code:
 while executing GenericPlotFillAction ChronoFillAction Connection_calculateAndAddTo at state Size_4500 : 
The maximum float input size is 4032.

Complete output from this run has been pasted here.

Don't worry, that's expected. One of the CUDA implementations doesn't allow certain sizes.
I don't understand why it says
Cmake: *** [cuda_emu] Interrupt

By default the target should be [all] and not [cuda_emu], but it doesn't matter.
The CUDA part is compiled without "--device-emulation", so everything's fine:

Code:
(3:624)# make
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11   src/optimization/cuda_code.cu -o build/optimization/cuda_code.o


Quote from: Luceo
Can you confirm whether this is expected behaviour? I'm assuming it's not (the 4500 step is being repeated over and over, I assume because it's trying to use a float input size that is too high). If it is expected, I'll leave this running overnight (next time I nap, probably in about 5-6 hours).

Sorry for not answering earlier. I would prefer to run this with a 32-bit OS and the SSE2 version, but again, if that's not possible, I'll work it out somehow.
The other errors are fine: it's just a more limited implementation being asked to run layer sizes larger than it can handle. That's expected.
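
A rough sketch of why such messages are harmless, under stated assumptions: the limit 4032 comes from the log message above, while the function names and the use of `std::runtime_error` are hypothetical and not taken from the project. The limited implementation rejects a size it cannot handle, the benchmark loop records the failure and moves on to the next size.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <vector>

// Limit taken from the log message quoted above; everything else here is hypothetical.
const unsigned MAX_FLOAT_INPUT_SIZE = 4032;

// Stand-in for a benchmarked operation of the more limited implementation.
void calculateAndAddTo(unsigned inputSize)
{
    if (inputSize > MAX_FLOAT_INPUT_SIZE) {
        std::ostringstream message;
        message << "The maximum float input size is " << MAX_FLOAT_INPUT_SIZE << ".";
        throw std::runtime_error(message.str());
    }
    // ... the real work would go here ...
}

// Tries every size, records the ones that were rejected, and keeps going.
std::vector<unsigned> rejectedSizes(const std::vector<unsigned>& sizes)
{
    std::vector<unsigned> rejected;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        try {
            calculateAndAddTo(sizes[i]);
        } catch (const std::runtime_error&) {
            rejected.push_back(sizes[i]); // note the failure, continue with the next size
        }
    }
    return rejected;
}
```

In this sketch, sizes up to 4032 run normally and larger ones only produce a message, so a run that keeps printing the error for the bigger sizes is still making progress on everything else.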

Thank you again for all your effort and patience.

Luceo
June 20, 2012, 12:03:50 PM
Last edit: June 20, 2012, 12:21:17 PM by Luceo
 #18

I actually only have 64-bit operating systems, but since SSE2 is CPU-based, it's not really comparable to the CUDA (GPU) results and should be able to be tested on just about anything.

If you want to update the assembler code for a 64-bit CPU, I'd be willing to test it, but I expect that's a lot of work and not as simple as just using 64-bit registers (my asm knowledge is very rusty).

I'll run the program next time I nap and post results here if it finishes during that nap.

jtimon (OP)
June 20, 2012, 06:52:02 PM
 #19

Quote from: Luceo
I actually only have 64-bit operating systems, but since SSE2 is CPU-based, it's not really comparable to the CUDA (GPU) results and should be able to be tested on just about anything.

If you want to update the assembler code for a 64-bit CPU, I'd be willing to test it, but I expect that's a lot of work and not as simple as just using 64-bit registers (my asm knowledge is very rusty).

I'll run the program next time I nap and post results here if it finishes during that nap.

On my old computer, with poor communication between CPU and GPU memory, the SSE2 implementation was actually superior. That's what I didn't want to show to my teachers. But yes, I can compare SSE2 and CUDA against C separately.
And if I find it easy to port the assembly code to 64 bits, I'll do it just to make it nicer.

2112
June 20, 2012, 07:05:10 PM
 #20

I'm using Arch Linux x86_64.
I just checked that Arch Linux has support for multilib. This means that a 64-bit OS can run 32-bit programs, provided that:

1) the multilib support packages are installed;
2) gcc/g++ are invoked with the -m32 flag.

So there's no need to laboriously rewrite the assembly code. All you need to do is modify the makefiles.
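
One quick way to check that a modified makefile really produces a 32-bit build is to look at the pointer width from C++. This is only a sketch: the helper names `pointerBits` and `matches32BitObjects` are made up for illustration and are not part of the project.

```cpp
#include <climits>

// Width of a pointer in bits for the current build:
// 32 when compiled with -m32, 64 for a default x86_64 build.
inline unsigned pointerBits()
{
    return static_cast<unsigned>(sizeof(void*) * CHAR_BIT);
}

// True only in a 32-bit build, i.e. the kind of build that a 32-bit
// object file such as sse2_code.o can be linked into.
inline bool matches32BitObjects()
{
    return pointerBits() == 32;
}
```

Compiling a small program that prints `pointerBits()` once with `g++ -m32` and once without it should report 32 and 64 respectively on an x86_64 system, assuming the multilib packages are installed.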

Have fun.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0