jtimon (OP)
Legendary
June 11, 2012, 10:17:23 AM
Do you have a relatively new Nvidia graphics card (it must support CUDA)? Do you have Linux installed? You could run my project so that I can compare performance times with a better graphics card than my 9500 GT and a newer CPU than my old single-core AMD. You will need to install a few programs: nvcc (the CUDA compiler), nasm (an assembler), gnuplot (to draw some charts) and maybe some basic C++ development tools if you don't have them yet. I'll help you set up the program, and then you only need to wait for it to run. I'm not sure how long it will take, but it could be a few hours, during which you shouldn't be using your computer much.
Here's the software, prepared to be built and run with just "make": http://content.wuala.com/contents/jtimon/temp/preann.tar.gz
The project is free software, and anyone can also download it from this repository: http://sourceforge.net/projects/preann/develop
Please contact me if you're interested in helping me with your computing power. Although I have no idea what would be fair, I'm willing to pay you some bitcoins for this. I can also pay you with village's hours.
Jay_Pal
Legendary
June 11, 2012, 06:37:29 PM
I have an onboard GT 8600, are you interested?
jtimon (OP)
Legendary
June 12, 2012, 09:15:36 AM
Quote from: Jay_Pal
I have an onboard GT 8600, are you interested?
I don't know. It seems to have the same core configuration (32 cores) as my 9500 GT, and with that card I got "disappointing" results last time I tried. Maybe the old motherboard and CPU were responsible. Now it seems that my old desktop PC has broken down just from not being turned on for so long (or maybe it's just the power supply; there's no beep), so I may be interested in your offer if no one else shows up. My hope was to use an NVIDIA card from the 500 or 600 series, with at least 192 cores or so (ideally a GeForce GTX 680, which has 1536 of them), but thank you for your interest anyway. Another clarification: I don't need rigs with several GPUs, because my program isn't adapted to use multiple GPUs anyway.
Luceo
Sr. Member
June 12, 2012, 11:30:11 AM
I have a GTX 570, will check over the source code and make sure I'd be comfortable running it.
Edit: Hmm, it's bigger than I expected. Does it need net access? If not, I can run it off an isolated LiveCD.
Edit: CPU is a Phenom II X6 1100T.
Jay_Pal
Legendary
June 12, 2012, 11:45:06 AM
Quote from: jtimon
Quote from: Jay_Pal
I have an onboard GT 8600, are you interested?
I don't know. It seems to have the same core configuration (32 cores) as my 9500 GT, and with that card I got "disappointing" results last time I tried. Maybe the old motherboard and CPU were responsible. Now it seems that my old desktop PC has broken down just from not being turned on for so long (or maybe it's just the power supply; there's no beep), so I may be interested in your offer if no one else shows up. My hope was to use an NVIDIA card from the 500 or 600 series, with at least 192 cores or so (ideally a GeForce GTX 680, which has 1536 of them), but thank you for your interest anyway. Another clarification: I don't need rigs with several GPUs, because my program isn't adapted to use multiple GPUs anyway.
Anytime you need, you're welcome!
Quote from: Luceo
I have a GTX 570, will check over the source code and make sure I'd be comfortable running it.
Edit: Hmm, it's bigger than I expected. Does it need net access? If not, I can run it off an isolated LiveCD.
Edit: CPU is a Phenom II X6 1100T.
That's a great idea!
jtimon (OP)
Legendary
June 12, 2012, 12:38:42 PM
Quote from: Luceo
I have a GTX 570, will check over the source code and make sure I'd be comfortable running it.
Edit: Hmm, it's bigger than I expected. Does it need net access? If not, I can run it off an isolated LiveCD.
Edit: CPU is a Phenom II X6 1100T.
Great!! That should be more than enough. It doesn't need internet access, but I'm afraid of the impact a live CD could have on performance, since the whole point is to measure performance. The project may be large, but you won't actually be running all of it: the tasks, game, genetic, etc. parts aren't needed. If you use the version here, the makefile is prepared to just compile and run chronoBuffers, chronoConnections and chronoFunctions (I forgot to include this last one the previous time I uploaded the package), and these main files don't use the whole code. I think it's just the neural, optimization, template, common and loop parts, plus ChronoPlotter and these main .cpp files, but you can check by following the includes. If the reason you want to use the live CD is that you're concerned about what the program might do, we could first run a test on the live CD with small values for the loops, so you can be sure it won't break anything. Wait... that won't work, because you need to install nasm, nvcc and gnuplot first, and you can't install things on a live CD, right? Then the safest solution for you is probably to run it on a separate partition, but that will take more of your time. I'm open to other suggestions, but I fear the live CD won't be a feasible option.
speedmann
Newbie
June 14, 2012, 05:05:01 PM
Hi there, are you interested in testing a GeForce GT 420M (yeah, the mobile version) with an Intel Core i3 370M processor? If so, please contact me.
jtimon (OP)
Legendary
June 14, 2012, 05:28:07 PM
Quote from: speedmann
Hi there, are you interested in testing a GeForce GT 420M (yeah, the mobile version) with an Intel Core i3 370M processor? If so, please contact me.
Thanks for your offer. Luceo's GTX 570 was more attractive, but I'm not sure whether he's still willing to do it. If you don't mind, I'll wait until next week for his answer (or others') and tell you then, OK? Even though yours is the mobile version, it still has 96 cores instead of my 32, and the CPU has 2 cores instead of one (so system processes shouldn't get in the way). If no one else says anything, you're the winner for now.
Jello
Newbie
June 17, 2012, 09:34:14 PM
I have a card with 96 CUDA cores on a 2nd-gen Sandy Bridge running CentOS. Not sure if that's enough for your needs.
mokahless
June 18, 2012, 12:07:04 AM
What does your program do? Mining? Or are you cracking government passwords or something sketchy?
jtimon (OP)
Legendary
June 18, 2012, 01:35:07 PM
Quote from: Jello
I have a card with 96 CUDA cores on a 2nd-gen Sandy Bridge running CentOS. Not sure if that's enough for your needs.
I guess CentOS will have g++ and gnuplot in its repositories, but I'm not so sure about nasm (the assembler). I've developed everything under Ubuntu, but it shouldn't be a problem. I also guess Sandy Bridge supports SSE2 and the XMM registers; after all, it's Intel. Do you know which GPU model you have? 96 cores will be fine anyway, but I'm not sure whether your machine is better or worse than speedmann's.
Quote from: mokahless
What does your program do? Mining? Or are you cracking government passwords or something sketchy?
It goes by different names: evolutionary artificial neural networks, neuroevolution... It's basically a machine learning technique that combines neural networks and genetic algorithms. What does my program learn? To correctly compute AND or OR of two bit vectors (very basic classification tasks), XOR (harder: you need a network with more than one layer) and to play Reversi/Othello, a board game.
Luceo
Sr. Member
June 18, 2012, 06:32:14 PM (last edit: June 18, 2012, 07:08:48 PM by Luceo)
I've had a look at the code and I'm happy running this on my machine 'naked'. A LiveCD wouldn't affect performance, as I'd reboot into it and everything would be running in RAM, but getting all the software onto one would be a pain. I'm using Arch Linux x86_64.
Edit: CXX_BASE isn't set by default, set that. Next issue:
(3:527)~ make
g++ -ggdb -I src/ -c src/test/testMemoryLosses.cpp -o build/test/testMemoryLosses.o
g++ -ggdb -I src/ -c src/common/chronometer.cpp -o build/common/chronometer.o
g++ -ggdb -I src/ -c src/common/dummy.cpp -o build/common/dummy.o
g++ -ggdb -I src/ -c src/common/enumerations.cpp -o build/common/enumerations.o
g++ -ggdb -I src/ -c src/common/util.cpp -o build/common/util.o
src/common/util.cpp: In static member function ‘static void MemoryManagement::free(void*)’:
src/common/util.cpp:61:48: error: cast from ‘void*’ to ‘unsigned int’ loses precision [-fpermissive]
make: *** [build/common/util.o] Error 1
Set CXX_BASE to -fpermissive, not sure if this'll affect the test. Also had to change the CUDA directory. The Makefile looks pretty different. Hmm, didn't run right, any ideas? ...
EDIT 2: OK, got a little further. Your 'if' block in the Makefile was being ignored, so I manually set these in the Makefile:
FACT_OBJ = $(FULL_OBJ)
FACT_FLAGS += -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL
Then I added my CUDA 'bin' path to the PATH variable.
Now, I get the following when I try to compile:
(3:587)# make
mkdir -p build/common
mkdir -p build/optimization
mkdir -p build/neural
mkdir -p build/genetic
mkdir -p build/game
mkdir -p build/tasks
mkdir -p build/loop
mkdir -p build/loopTest
mkdir -p build/test/
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/test/testMemoryLosses.cpp -o build/test/testMemoryLosses.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/chronometer.cpp -o build/common/chronometer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/dummy.cpp -o build/common/dummy.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/enumerations.cpp -o build/common/enumerations.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/util.cpp -o build/common/util.o
src/common/util.cpp: In static member function ‘static void MemoryManagement::free(void*)’:
src/common/util.cpp:61:48: warning: cast from ‘void*’ to ‘unsigned int’ loses precision [-fpermissive]
nasm -f elf src/optimization/sse2_code.asm -o build/optimization/sse2_code.o
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11 src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(48): error: identifier "__atomic_fetch_add" is undefined
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(52): error: identifier "__atomic_fetch_add" is undefined
src/optimization/cuda_code.cu(60): error: more than one instance of overloaded function "min" matches the argument list:
            function "min(int, unsigned int)"
            function "min(unsigned int, unsigned int)"
            argument types are: (unsigned long, unsigned int)
src/optimization/cuda_code.cu(175): error: more than one instance of overloaded function "min" matches the argument list:
            function "min(int, unsigned int)"
            function "min(unsigned int, unsigned int)"
            argument types are: (unsigned long, unsigned int)
4 errors detected in the compilation of "/tmp/tmpxft_000046f1_00000000-4_cuda_code.cpp1.ii".
make: *** [build/optimization/cuda_code.o] Error 2
Any suggestions? Please PM me, I'd like to get this running as I've already spent a fair amount of time hacking at the Makefile to get it to.
jtimon (OP)
Legendary
June 19, 2012, 08:56:16 AM
Sorry about all the problems you're having. I compiled the program with older software, and this is the result. Since CUDA is proprietary software, when NVIDIA removed the emulator I got stuck with version 2.3, which means g++ 4.3, and I'm using make 3.81. The -fpermissive flag seems to solve the problems with the newer g++, and you've hacked the makefile to make it work with the newer make. I forgot that the path for nvcc is different in newer versions. It doesn't run well because it hasn't compiled correctly; I don't even know why the executables get created. I've changed cuda_code.cu to try to solve those problems. You can get it here: http://preann.svn.sourceforge.net/viewvc/preann/preann/src/optimization/cuda_code.cu?view=log
If you have more problems, please post them. Maybe the easiest solution (if you can get a live CD with g++ 4.3) is to install the legacy version of CUDA: http://developer.nvidia.com/cuda-toolkit-23-downloads
Thank you for taking the time to try this.
Luceo
Sr. Member
June 19, 2012, 05:55:16 PM
No problem, it clearly needs working through, so glad I can be of assistance. Current output still fails, though:
(3:600)# make
mkdir -p build/common
mkdir -p build/optimization
mkdir -p build/neural
mkdir -p build/genetic
mkdir -p build/game
mkdir -p build/tasks
mkdir -p build/loop
mkdir -p build/loopTest
mkdir -p build/test/
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/test/testMemoryLosses.cpp -o build/test/testMemoryLosses.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/chronometer.cpp -o build/common/chronometer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/dummy.cpp -o build/common/dummy.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/enumerations.cpp -o build/common/enumerations.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/util.cpp -o build/common/util.o
src/common/util.cpp: In static member function ‘static void MemoryManagement::free(void*)’:
src/common/util.cpp:61:48: warning: cast from ‘void*’ to ‘unsigned int’ loses precision [-fpermissive]
nasm -f elf src/optimization/sse2_code.asm -o build/optimization/sse2_code.o
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11 src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(48): error: identifier "__atomic_fetch_add" is undefined
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(52): error: identifier "__atomic_fetch_add" is undefined
2 errors detected in the compilation of "/tmp/tmpxft_00006ce2_00000000-4_cuda_code.cpp1.ii".
make: *** [build/optimization/cuda_code.o] Error 2
This issue was present in some of the CUDA demos, and was fixed by adding this to the top of cuda_code.cu (although I'm totally unsure what this does, it's just a forum suggestion):
#undef _GLIBCXX_ATOMIC_BUILTINS
#undef _GLIBCXX_USE_INT128
Then, the following return was given to a 'make' (this could be related to my edit as described in the last codeblock, though):
(3:602)# make
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11 src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
src/optimization/cuda_code.cu(382) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumBitsConnectionsKernel<(BufferType)1> ") is not allowed
src/optimization/cuda_code.cu(382) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumBitsConnectionsKernel<(BufferType)2> ") is not allowed
src/optimization/cuda_code.cu(457) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumBitsInvertedConnectionsKernel<(BufferType)1> ") is not allowed
src/optimization/cuda_code.cu(457) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumBitsInvertedConnectionsKernel<(BufferType)2> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)512u, (BufferType)0> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)256u, (BufferType)0> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)128u, (BufferType)0> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)64u, (BufferType)0> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)32u, (BufferType)0> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)16u, (BufferType)0> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)8u, (BufferType)0> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)4u, (BufferType)0> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)2u, (BufferType)0> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)1u, (BufferType)0> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)512u, (BufferType)1> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)256u, (BufferType)1> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)128u, (BufferType)1> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)64u, (BufferType)1> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)32u, (BufferType)1> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)16u, (BufferType)1> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)8u, (BufferType)1> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)4u, (BufferType)1> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)2u, (BufferType)1> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)1u, (BufferType)1> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)512u, (BufferType)2> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)256u, (BufferType)2> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)128u, (BufferType)2> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)64u, (BufferType)2> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)32u, (BufferType)2> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)16u, (BufferType)2> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)8u, (BufferType)2> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)4u, (BufferType)2> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)2u, (BufferType)2> ") is not allowed
src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min<unsigned long> ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)1u, (BufferType)2> ") is not allowed
34 errors detected in the compilation of "/tmp/tmpxft_00006d0d_00000000-4_cuda_code.cpp1.ii".
make: *** [build/optimization/cuda_code.o] Error 2
jtimon (OP)
Legendary
June 19, 2012, 06:43:33 PM
Quote from: Luceo
No problem, it clearly needs working through, so glad I can be of assistance. Current output still fails, though:
...
This issue was present in some of the CUDA demos, and was fixed by adding this to the top of cuda_code.cu (although I'm totally unsure what this does, it's just a forum suggestion):
#undef _GLIBCXX_ATOMIC_BUILTINS
#undef _GLIBCXX_USE_INT128
Thank you. I'll find out what this means. It's probably disabling newer features by default, or something like that.
Quote from: Luceo
Then, the following return was given to a 'make' (this could be related to my edit as described in the last codeblock, though):
...
That happens when you change things and only test your "improvements" with the emulator. Sorry again. I've uploaded the file to the repository again, with a change to try to solve that (and with your change included). Can you try it again?
Luceo
Sr. Member
June 19, 2012, 10:56:46 PM (last edit: June 20, 2012, 12:05:13 AM by Luceo)
OK, I had to swap around the #undef and #include statements at the top of cuda_code.cu (the undef directives must come before importing the header) and add /opt/cuda-toolkit/open64/bin to PATH; that got me past that issue, and I found another one:
(3:624)# make
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11 src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/optimization/factory.cpp -o build/optimization/factory.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/interface.cpp -o build/neural/interface.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/connection.cpp -o build/neural/connection.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/neuralNet.cpp -o build/neural/neuralNet.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/layer.cpp -o build/neural/layer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/inputLayer.cpp -o build/neural/inputLayer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/buffer.cpp -o build/neural/buffer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/genetic/individual.cpp -o build/genetic/individual.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/genetic/task.cpp -o build/genetic/task.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/genetic/population.cpp -o build/genetic/population.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/game/reversiBoard.cpp -o build/game/reversiBoard.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/game/board.cpp -o build/game/board.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/tasks/reversiTask.cpp -o build/tasks/reversiTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/tasks/binaryTask.cpp -o build/tasks/binaryTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/tasks/classificationTask.cpp -o build/tasks/classificationTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/rangeLoop.cpp -o build/loop/rangeLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/enumLoop.cpp -o build/loop/enumLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/genericPlotter.cpp -o build/loop/genericPlotter.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/plot.cpp -o build/loop/plot.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/loop.cpp -o build/loop/loop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/joinEnumLoop.cpp -o build/loop/joinEnumLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/test.cpp -o build/loop/test.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/parametersMap.cpp -o build/loop/parametersMap.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loopTest/taskPlotter.cpp -o build/loopTest/taskPlotter.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loopTest/chronoPlotter.cpp -o build/loopTest/chronoPlotter.o
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -L/opt/cuda-toolkit/lib64 -lcudart build/test/testMemoryLosses.o build/common/chronometer.o build/common/dummy.o build/common/enumerations.o build/common/util.o build/optimization/factory.o build/neural/interface.o build/neural/connection.o build/neural/neuralNet.o build/neural/layer.o build/neural/inputLayer.o build/neural/buffer.o build/genetic/individual.o build/genetic/task.o build/genetic/population.o build/game/reversiBoard.o build/game/board.o build/tasks/reversiTask.o build/tasks/binaryTask.o build/tasks/classificationTask.o build/loop/rangeLoop.o build/loop/enumLoop.o build/loop/genericPlotter.o build/loop/plot.o build/loop/loop.o build/loop/joinEnumLoop.o build/loop/test.o build/loop/parametersMap.o build/loopTest/taskPlotter.o build/loopTest/chronoPlotter.o build/optimization/sse2_code.o build/optimization/cuda_code.o -o bin/testMemoryLosses.exe
/usr/bin/ld: i386 architecture of input file `build/optimization/sse2_code.o' is incompatible with i386:x86-64 output
collect2: error: ld returned 1 exit status
make: *** [bin/testMemoryLosses.exe] Error 1
Feel like we're getting somewhere. I'll try not compiling SSE2 code since I assume you only need me to test the CUDA stuff, so in the makefile I removed $(SSE2_OBJ) from FULL_OBJ and -DSSE2_IMPL from FACT_FLAGS:
FULL_OBJ = $(CUDA_OBJ)
FACT_FLAGS += -DCPP_IMPL -DCUDA_IMPL
Maybe I edited the Makefile wrong, because I got a bunch of outputs similar to:
while executing GenericPlotFillAction ChronoFillAction Connection_reset at state Size_450000 : Implementation SSE2 is not allowed.
However, it looks like the software compiled. I'm now getting outputs (separated by reasonably long pauses) like:
while executing GenericPlotFillAction ChronoFillAction Connection_calculateAndAddTo at state Size_4500 : The maximum float input size is 4032.
Complete output from this run has been pasted here. Can you confirm whether this is expected behaviour, as I'm assuming it's not (the 4500 step is being repeated over and over, I assume due to trying to use a float too high)? If so, I'll leave this running overnight (next time I nap, probably in about 5-6 hours).
jtimon (OP)
Legendary
June 20, 2012, 07:18:50 AM
OK, I had to swap around the #undef and #include statements at the top of the cuda_code.cu (undef directives must come before importing the header), and add /opt/cuda-toolkit/open64/bin to path, got past that issue and found another one: I'll correct that, thanks. /usr/bin/ld: i386 architecture of input file `build/optimization/sse2_code.o' is incompatible with i386:x86-64 output collect2: error: ld returned 1 exit status make: *** [bin/testMemoryLosses.exe] Error 1 Oh, I didn't thought about that. The assembly code is incompatible with the 64bit operative system. Feel like we're getting somewhere. I'll try not compiling SSE2 code since I assume you only need me to test the CUDA stuff, so makefile: FULL_OBJ = [s]$(SSE2_OBJ)[/s] $(CUDA_OBJ) FACT_FLAGS += -DCPP_IMPL [s]-DSSE2_IMPL[/s] -DCUDA_IMPL Maybe I edited the Makefile wrong, because I got a bunch of outputs similar to: while executing GenericPlotFillAction ChronoFillAction Connection_reset at state Size_450000 : Implementation SSE2 is not allowed. Actually I wanted to compare the SSE2 version with the same CPU. If you can't use a 32 bit OS, I guess I can repeat some of the charts and compare results independently. After removing the SSE2 version, to not having those errors it is necessary to change the running main files (chronoBuffers.cpp, chronoConnections.cpp, chronoFunctions.cpp) to remove the SSE2 option, but those errors aren't really important. Anyway, the changes would be something like this: In chronoBuffers.cpp EnumLoop linesLoop(ET_IMPLEMENTATION, 3, IT_C, IT_SSE2, IT_CUDA); for EnumLoop linesLoop(ET_IMPLEMENTATION, 2, IT_C, IT_CUDA); In chronoConnections.cpp EnumLoop linesLoop(ET_IMPLEMENTATION); for EnumLoop linesLoop(ET_IMPLEMENTATION, 4, IT_C, IT_SSE2, IT_CUDA, IT_CUDA_REDUC, IT_CUDA_INV); Maintain linesLoop.addInnerLoop(new EnumLoop(ET_IMPLEMENTATION, 2, IT_C, IT_CUDA)); for chronoFunctions.cpp However, it looks like the software compiled. 
I'm now getting outputs (separated by reasonably long pauses) like:
while executing GenericPlotFillAction ChronoFillAction Connection_calculateAndAddTo at state Size_4500 : The maximum float input size is 4032.
Complete output from this run has been pasted here.
Don't worry, that's expected. One of the CUDA implementations doesn't allow certain sizes.
I don't understand why it says:
Cmake: *** [cuda_emu] Interrupt
By default it should be [all] and not [cuda_emu], but it doesn't matter. The CUDA part is compiled without "--device-emulation", so everything's fine:
(3:624)# make
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11 src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
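The thread doesn't say where the 4032 limit comes from. One plausible reading, offered only as an assumption: the sm_11 target has 16 KB of shared memory per block, and on compute capability 1.x kernel parameters are also passed through shared memory (up to 256 bytes), so a kernel that keeps a whole float input vector in shared memory could hold at most (16384 - 256) / 4 = 4032 values. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Assumed budget for compute capability 1.x (the sm_11 target): 16 KB of
// shared memory per block, of which up to 256 bytes may hold the kernel's
// parameters. Whether PreANN's limit really comes from this is a guess.
constexpr std::size_t kSharedMemoryBytes = 16 * 1024;
constexpr std::size_t kKernelParamBytes = 256;

// Number of floats that fit in the shared memory left after the reserve.
std::size_t maxFloatInputSize(std::size_t sharedBytes, std::size_t reservedBytes) {
    assert(reservedBytes <= sharedBytes);  // avoid unsigned underflow
    return (sharedBytes - reservedBytes) / sizeof(float);
}

// Hypothetical guard producing the message seen in the log.
void checkFloatInputSize(std::size_t inputSize) {
    const std::size_t limit = maxFloatInputSize(kSharedMemoryBytes, kKernelParamBytes);
    if (inputSize > limit) {
        throw std::runtime_error("The maximum float input size is " +
                                 std::to_string(limit) + ".");
    }
}
```

Under these assumptions a layer of 4500 inputs fails the check while 4032 still passes, which matches the Size_4500 state in the log.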
Can you confirm whether this is expected behaviour? I'm assuming it's not (the 4500 step is being repeated over and over, I assume due to trying to use a float size that is too high). If it is expected, I'll leave this running overnight (next time I nap, probably in about 5-6 hours).
Sorry for not answering earlier. I would prefer to run this with a 32-bit OS and the SSE2 version, but again, if that's not possible, I'll work it out somehow. The other errors are fine, though: it's just a more limited implementation trying to run larger layer sizes than it can handle. That's expected. Thank you again for all your effort and patience.
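Why the run keeps going after such an error: judging from the log format, each (implementation, size) combination appears to be timed separately, so an implementation that rejects a size only loses that point of the chart. A minimal sketch of that pattern, with hypothetical names (Implementation, runOnce, runAll) rather than the project's classes:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of one implementation under test.
struct Implementation {
    std::string name;
    std::size_t maxSize;  // largest layer size it accepts
};

// Stand-in for one timed run; a limited implementation rejects large sizes.
void runOnce(const Implementation& impl, std::size_t size) {
    if (size > impl.maxSize) {
        throw std::runtime_error("The maximum float input size is " +
                                 std::to_string(impl.maxSize) + ".");
    }
    // ... the timed work would go here ...
}

// Runs every (size, implementation) pair. A rejected pair is recorded in
// `errors` and skipped, so the rest of the chart is still produced.
// Returns the number of pairs that completed.
std::size_t runAll(const std::vector<Implementation>& impls,
                   const std::vector<std::size_t>& sizes,
                   std::vector<std::string>& errors) {
    const std::size_t errorsBefore = errors.size();
    std::size_t completed = 0;
    for (std::size_t size : sizes) {
        for (const Implementation& impl : impls) {
            try {
                runOnce(impl, size);
                ++completed;
            } catch (const std::exception& e) {
                errors.push_back("while executing " + impl.name + " at state Size_" +
                                 std::to_string(size) + " : " + e.what());
            }
        }
    }
    // Every pair either completed or was logged.
    assert(completed + (errors.size() - errorsBefore) == impls.size() * sizes.size());
    return completed;
}
```

With a C implementation and one limited to 4032 inputs, sizes 500 and 4500 give three completed runs and a single logged error for the limited implementation at Size_4500.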
|
|
|
|
Luceo
Sr. Member
Offline
Activity: 350
Merit: 250
Per aspera ad astra!
|
|
June 20, 2012, 12:03:50 PM Last edit: June 20, 2012, 12:21:17 PM by Luceo |
|
I actually only have 64-bit operating systems, but since SSE2 is CPU-based, it's not really comparable to the CUDA (GPU) results and can be tested on just about any machine.
If you want to update the assembler code for a 64-bit CPU, I'd be willing to test it, but I expect that's a lot of work and not as simple as just using 64-bit registers (my asm knowledge is very rusty).
I'll run the program next time I nap and post results here if it finishes during that nap.
|
|
|
|
jtimon (OP)
Legendary
Offline
Activity: 1372
Merit: 1002
|
|
June 20, 2012, 06:52:02 PM |
|
On my old computer, with poor communication between CPU and GPU memory, the SSE2 implementation was actually superior. That's what I didn't want to show my teachers. But yes, I can compare SSE2 and CUDA against C separately. Still, if I find it easy to port the assembly code to 64 bits, I'll do it just to make it nicer.
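Comparing SSE2 and CUDA against C separately amounts to reporting each version as a speedup over the C baseline timed on the same machine, so charts from different computers can be read side by side. For the CUDA versions the wall-clock time also includes copying data between CPU and GPU memory, which is the cost a slow link inflates. A small sketch with hypothetical helper names:

```cpp
#include <cassert>

// Speedup of an implementation over the plain C version timed on the same
// machine: above 1.0 means faster than C, below 1.0 means slower.
double speedupOverC(double cSeconds, double implSeconds) {
    assert(cSeconds > 0.0 && implSeconds > 0.0);
    return cSeconds / implSeconds;
}

// Wall-clock time of a GPU run: host<->device copies plus kernel time.
// A slow CPU-GPU link grows transferSeconds and can erase the kernel's gain.
double gpuWallSeconds(double transferSeconds, double kernelSeconds) {
    assert(transferSeconds >= 0.0 && kernelSeconds >= 0.0);
    return transferSeconds + kernelSeconds;
}
```

With made-up timings, speedupOverC(2.0, 0.5) is 4.0, while speedupOverC(1.0, gpuWallSeconds(0.75, 0.5)) is 0.8: a kernel twice as fast as C still loses once 0.75 s of copying is added.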
|
|
|
|
2112
Legendary
Offline
Activity: 2128
Merit: 1073
|
|
June 20, 2012, 07:05:10 PM |
|
I'm using Arch Linux x86_64.
I just checked that Arch Linux has support for multilib. This means that a 64-bit OS can run 32-bit programs, provided that: 1) the multilib support packages are installed; 2) gcc/g++ are invoked with the -m32 flag. So there's no need to laboriously rewrite the assembly code. All you need to do is modify the makefiles. Have fun.
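Before rebuilding, it may help to confirm that the multilib toolchain really produces 32-bit code, since the linker error came from mixing the 32-bit sse2_code.o into a 64-bit build. A hedged sketch: GCC defines __i386__ when compiling with -m32 and __x86_64__ for a default x86_64 build. The function names and the #error guard are hypothetical additions, not part of the project; the guard turns the cryptic linker message into a clear one when -DSSE2_IMPL is used without -m32.

```cpp
#include <cstddef>

// Hypothetical guard: the SSE2 assembly is 32-bit only, so refuse to build
// it into a 64-bit target instead of failing later at link time.
#if defined(SSE2_IMPL) && !defined(__i386__)
#error "sse2_code.o is 32-bit assembly: build with -m32 or drop -DSSE2_IMPL"
#endif

// True when this file was compiled for 32-bit x86 (e.g. g++ -m32).
constexpr bool isI386Build() {
#if defined(__i386__)
    return true;
#else
    return false;
#endif
}

// 4 on an i386 build, 8 on x86_64.
constexpr std::size_t pointerBytes() { return sizeof(void*); }
```

Built with g++ -m32, isI386Build() should be true and pointerBytes() 4; a default x86_64 build gives false and 8.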
|
|
|
|
|