Bitcoin Forum
  Show Posts
1  Bitcoin / Bitcoin Wallet for Android / Use a Tor hidden service as trusted node? on: August 30, 2013, 05:08:21 PM
Should this be possible? I tried it, but it didn't work.
I set up the app to use a Tor hidden service (.onion address) running on my computer as the trusted node, but got no connection.
My computer is reachable if I don't use Tor and add my external IP as the trusted node instead.

Dia
2  Bitcoin / Development & Technical Discussion / seeking Qt capable developer / supporters on: January 08, 2013, 05:39:22 PM
Currently Bitcoin-Qt lacks development speed and big innovations, and help from the community would be greatly appreciated.
I thought it would be a good idea to ask you guys here in the forums for some input. I would love to see others contribute via pull requests or bug reports (with good descriptions, so that we can look into them and fix even more stuff). Over the last months we made some very good progress in maturing the code and ironing out Qt bugs, plus some nice Windows security-related things (e.g. DEP / ASLR support) ... I hope more is to come :).

Dia
3  Bitcoin / Development & Technical Discussion / standard procedure to handle security vulnerabilities? on: September 23, 2012, 11:20:13 AM
As we recently added version information to bitcoin-qt.exe, I asked Secunia to add Bitcoin-Qt to their Secunia PSI database, and as of today Bitcoin-Qt is found and listed after a scan :)!
Secunia PSI (https://secunia.com/vulnerability_scanning/personal/) is a tool for Windows that checks installed programs and warns if it finds versions with a known security vulnerability. It offers auto-updates or at least a link with useful information on what a user can do to fix the problem.



IMHO it would be nice if we reported our CVEs (https://en.bitcoin.it/wiki/Common_Vulnerabilities_and_Exposures) or at least known security problems to them, so they can add this information to their database.

https://secunia.com/community/advisories/report_vulnerability/

The headline of this thread is "standard procedure to handle security vulnerabilities?", so I'm asking: is there one, and do others agree that it is valuable to report security problems (or is this already being done)?

Dia
4  Bitcoin / Development & Technical Discussion / Any quick way to mine some Coins in the testnet? on: June 09, 2012, 12:24:52 AM
I need to mine a few coins on the testnet and am looking for a quick and dirty way to do it. Any ideas?

Dia
5  Bitcoin / Development & Technical Discussion / Github - bitcoin commit break? on: April 02, 2012, 06:07:34 AM
Perhaps this is the wrong place to ask, but I would like to know how long the commit break will last. I added a few commits which all come from one local source, and it gets hard to keep track of the different commits :D. Is the normal process to switch to 0.7 on Git now?

Dia
6  Bitcoin / Development & Technical Discussion / compiling Bitcoin-Qt on Windows (solved - can a dev have a look inside ^^?) on: March 20, 2012, 07:06:04 AM
Setup is as follows:

Windows 7 x64 SP1
Qt SDK 1.2 with Qt Creator 2.4.1 and included MinGW 4.4.0
contents of qtgui_deps_1.zip were extracted into the MinGW folder in D:\QtSDK\mingw

I have the current Git version of Bitcoin as a local repo, started Qt Creator by double-clicking on bitcoin-qt.pro and added the "USE_UPNP=-" flag to qmake. As it didn't work out of the box, I upgraded MinGW to 4.6.2 via mingw-get-inst-20111118.exe.

Now the build-log is as follows:

Quote
./build\bitcoin.o:bitcoin.cpp:(.text$_ZN5boost12interprocess6winapi23get_wmi_class_attributeERSbIwSt11char_traitsIwESaIwEEPKwS8_[boost::interprocess::winapi::get_wmi_class_attribute(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >&, wchar_t const*, wchar_t const*)]+0x368): undefined reference to `_imp__VariantClear@4'
./build\util.o:util.cpp:(.text+0xfa1): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\util.o:util.cpp:(.text+0xfc7): undefined reference to `boost::filesystem3::path_traits::convert(char const*, char const*, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >&, std::codecvt<wchar_t, char, int> const&)'
./build\util.o:util.cpp:(.text+0xfd7): undefined reference to `boost::filesystem3::detail::create_directory(boost::filesystem3::path const&, boost::system::error_code*)'
./build\util.o:util.cpp:(.text+0x2d15): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\util.o:util.cpp:(.text+0x2d52): undefined reference to `boost::filesystem3::path::root_name() const'
./build\util.o:util.cpp:(.text+0x2d8f): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\util.o:util.cpp:(.text+0x2dc0): undefined reference to `boost::filesystem3::path::operator/=(boost::filesystem3::path const&)'
./build\util.o:util.cpp:(.text+0x2e09): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\util.o:util.cpp:(.text+0x2e5b): undefined reference to `boost::filesystem3::path_traits::convert(wchar_t const*, wchar_t const*, std::string&, std::codecvt<wchar_t, char, int> const&)'
./build\util.o:util.cpp:(.text+0x2e7e): undefined reference to `boost::filesystem3::path_traits::convert(char const*, char const*, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >&, std::codecvt<wchar_t, char, int> const&)'
./build\util.o:util.cpp:(.text+0x2e94): undefined reference to `boost::filesystem3::path::root_directory() const'
./build\util.o:util.cpp:(.text+0x2ece): undefined reference to `boost::filesystem3::path_traits::convert(char const*, char const*, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >&, std::codecvt<wchar_t, char, int> const&)'
./build\util.o:util.cpp:(.text+0x3069): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\util.o:util.cpp:(.text+0x30a6): undefined reference to `boost::filesystem3::path::root_name() const'
./build\util.o:util.cpp:(.text+0x30e3): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\util.o:util.cpp:(.text+0x3114): undefined reference to `boost::filesystem3::path::operator/=(boost::filesystem3::path const&)'
./build\util.o:util.cpp:(.text+0x315d): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\util.o:util.cpp:(.text+0x31af): undefined reference to `boost::filesystem3::path_traits::convert(wchar_t const*, wchar_t const*, std::string&, std::codecvt<wchar_t, char, int> const&)'
./build\util.o:util.cpp:(.text+0x31d2): undefined reference to `boost::filesystem3::path_traits::convert(char const*, char const*, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >&, std::codecvt<wchar_t, char, int> const&)'
./build\util.o:util.cpp:(.text+0x31e8): undefined reference to `boost::filesystem3::path::root_directory() const'
./build\util.o:util.cpp:(.text+0x3222): undefined reference to `boost::filesystem3::path_traits::convert(char const*, char const*, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >&, std::codecvt<wchar_t, char, int> const&)'
./build\util.o:util.cpp:(.text+0x3eac): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\util.o:util.cpp:(.text+0x3edd): undefined reference to `boost::filesystem3::detail::system_complete(boost::filesystem3::path const&, boost::system::error_code*)'
./build\util.o:util.cpp:(.text+0x3eee): undefined reference to `boost::filesystem3::detail::status(boost::filesystem3::path const&, boost::system::error_code*)'
./build\util.o:util.cpp:(.text+0x3f5b): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\util.o:util.cpp:(.text+0x3fdc): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\util.o:util.cpp:(.text+0x43b6): undefined reference to `boost::filesystem3::path_traits::convert(char const*, char const*, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >&, std::codecvt<wchar_t, char, int> const&)'
./build\util.o:util.cpp:(.text+0x43d8): undefined reference to `boost::filesystem3::path_traits::convert(char const*, char const*, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >&, std::codecvt<wchar_t, char, int> const&)'
./build\util.o:util.cpp:(.text+0x482f): undefined reference to `boost::filesystem3::path_traits::convert(wchar_t const*, wchar_t const*, std::string&, std::codecvt<wchar_t, char, int> const&)'
./build\util.o:util.cpp:(.text+0x4889): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\util.o:util.cpp:(.text+0x48ba): undefined reference to `boost::filesystem3::detail::system_complete(boost::filesystem3::path const&, boost::system::error_code*)'
./build\util.o:util.cpp:(.text+0x48d0): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\util.o:util.cpp:(.text+0x49be): undefined reference to `boost::filesystem3::path_traits::convert(char const*, char const*, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >&, std::codecvt<wchar_t, char, int> const&)'
./build\util.o:util.cpp:(.text+0x49e1): undefined reference to `boost::filesystem3::path_traits::convert(wchar_t const*, wchar_t const*, std::string&, std::codecvt<wchar_t, char, int> const&)'
./build\util.o:util.cpp:(.text.startup+0x38): undefined reference to `boost::system::generic_category()'
./build\util.o:util.cpp:(.text.startup+0x42): undefined reference to `boost::system::generic_category()'
./build\util.o:util.cpp:(.text.startup+0x4c): undefined reference to `boost::system::system_category()'
./build\main.o:main.cpp:(.text+0x1307): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\main.o:main.cpp:(.text+0x1334): undefined reference to `boost::filesystem3::detail::space(boost::filesystem3::path const&, boost::system::error_code*)'
./build\main.o:main.cpp:(.text+0x1492): undefined reference to `boost::filesystem3::path_traits::convert(char const*, char const*, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >&, std::codecvt<wchar_t, char, int> const&)'
./build\main.o:main.cpp:(.text.startup+0x3c): undefined reference to `boost::system::generic_category()'
./build\main.o:main.cpp:(.text.startup+0x46): undefined reference to `boost::system::generic_category()'
./build\main.o:main.cpp:(.text.startup+0x50): undefined reference to `boost::system::system_category()'
./build\init.o:init.cpp:(.text+0x2e0): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\init.o:init.cpp:(.text+0x306): undefined reference to `boost::filesystem3::path_traits::convert(char const*, char const*, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >&, std::codecvt<wchar_t, char, int> const&)'
./build\init.o:init.cpp:(.text+0x316): undefined reference to `boost::filesystem3::detail::status(boost::filesystem3::path const&, boost::system::error_code*)'
./build\init.o:init.cpp:(.text+0x6b6): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\init.o:init.cpp:(.text+0x6db): undefined reference to `boost::filesystem3::detail::remove(boost::filesystem3::path const&, boost::system::error_code*)'
./build\init.o:init.cpp:(.text+0x894): undefined reference to `boost::filesystem3::path_traits::convert(char const*, char const*, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >&, std::codecvt<wchar_t, char, int> const&)'
./build\init.o:init.cpp:(.text.startup+0x36): undefined reference to `boost::system::generic_category()'
./build\init.o:init.cpp:(.text.startup+0x40): undefined reference to `boost::system::generic_category()'
./build\init.o:init.cpp:(.text.startup+0x4a): undefined reference to `boost::system::system_category()'
./build\db.o:db.cpp:(.text+0x4b35): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\db.o:db.cpp:(.text+0x4b77): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\db.o:db.cpp:(.text+0x4ba1): undefined reference to `boost::filesystem3::detail::status(boost::filesystem3::path const&, boost::system::error_code*)'
./build\db.o:db.cpp:(.text+0x4bca): undefined reference to `boost::filesystem3::detail::copy_file(boost::filesystem3::path const&, boost::filesystem3::path const&, boost::filesystem3::copy_option::enum_type, boost::system::error_code*)'
./build\db.o:db.cpp:(.text+0x4bcf): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\db.o:db.cpp:(.text+0x4cff): undefined reference to `boost::filesystem3::path_traits::convert(char const*, char const*, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >&, std::codecvt<wchar_t, char, int> const&)'
./build\db.o:db.cpp:(.text+0x4d1f): undefined reference to `boost::filesystem3::path_traits::convert(wchar_t const*, wchar_t const*, std::string&, std::codecvt<wchar_t, char, int> const&)'
./build\db.o:db.cpp:(.text+0x4d42): undefined reference to `boost::filesystem3::path_traits::convert(char const*, char const*, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >&, std::codecvt<wchar_t, char, int> const&)'
./build\db.o:db.cpp:(.text+0x4d58): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\db.o:db.cpp:(.text+0x4d92): undefined reference to `boost::filesystem3::path::operator/=(boost::filesystem3::path const&)'
./build\db.o:db.cpp:(.text+0x4e36): undefined reference to `boost::filesystem3::path_traits::convert(char const*, char const*, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >&, std::codecvt<wchar_t, char, int> const&)'
./build\db.o:db.cpp:(.text+0x56c3): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\db.o:db.cpp:(.text+0x56f1): undefined reference to `boost::filesystem3::path_traits::convert(char const*, char const*, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >&, std::codecvt<wchar_t, char, int> const&)'
./build\db.o:db.cpp:(.text+0x5701): undefined reference to `boost::filesystem3::detail::create_directory(boost::filesystem3::path const&, boost::system::error_code*)'
./build\db.o:db.cpp:(.text$_ZNK5boost11filesystem34path6stringEv[boost::filesystem3::path::string() const]+0x9): undefined reference to `boost::filesystem3::path::wchar_t_codecvt_facet()'
./build\db.o:db.cpp:(.text$_ZNK5boost11filesystem34path6stringEv[boost::filesystem3::path::string() const]+0x3f): undefined reference to `boost::filesystem3::path_traits::convert(wchar_t const*, wchar_t const*, std::string&, std::codecvt<wchar_t, char, int> const&)'
./build\db.o:db.cpp:(.text.startup+0x38): undefined reference to `boost::system::generic_category()'
./build\db.o:db.cpp:(.text.startup+0x42): undefined reference to `boost::system::generic_category()'
./build\db.o:db.cpp:(.text.startup+0x4c): undefined reference to `boost::system::system_category()'
./build\bitcoinrpc.o:bitcoinrpc.cpp:(.text+0xc80): undefined reference to `boost::system::system_category()'
./build\bitcoinrpc.o:bitcoinrpc.cpp:(.text+0x4864): undefined reference to `boost::system::generic_category()'
./build\bitcoinrpc.o:bitcoinrpc.cpp:(.text+0x486e): undefined reference to `boost::system::generic_category()'
./build\bitcoinrpc.o:bitcoinrpc.cpp:(.text+0x4878): undefined reference to `boost::system::system_category()'
./build\bitcoinrpc.o:bitcoinrpc.cpp:(.text+0x48ac): undefined reference to `boost::system::system_category()'
./build\bitcoinrpc.o:bitcoinrpc.cpp:(.text+0x48b6): undefined reference to `boost::system::system_category()'
./build\bitcoinrpc.o:bitcoinrpc.cpp:(.text+0x48c0): undefined reference to `boost::system::system_category()'
./build\bitcoinrpc.o:bitcoinrpc.cpp:(.text+0x5919): undefined reference to `boost::system::system_category()'
./build\bitcoinrpc.o:bitcoinrpc.cpp:(.text+0x5988): more undefined references to `boost::system::system_category()' follow

I excluded some warnings in the beginning that were all like this:
Quote
In file included from src/util.h:27:0,
                 from src\qt\/walletmodel.h:6,
                 from src\qt\sendcoinsentry.cpp:6:
src/netbase.h:11:0: warning: "_WIN32_WINNT" redefined [enabled by default]
d:\qtsdk\mingw\bin\../lib/gcc/mingw32/4.6.2/../../../../include/windef.h:20:0: note: this is the location of the previous definition

To me, as a novice Qt user with basic programming skills, this looks like some sort of problem with the Boost library. Any help would be great :). One last thing I noticed: the supplied libs in qtgui_deps_1.zip seem to be based on Boost 1.43, but 1.47 is mentioned in https://github.com/bitcoin/bitcoin/blob/master/doc/build-msw.txt. Perhaps I need to compile the libs myself with 1.47? The next question would be how to do that ^^.

Dia
7  Bitcoin / Bitcoin Technical Support / permanently decrypt an encrypted wallet? on: March 18, 2012, 08:43:21 PM
Perhaps I'm missing something, but when I played around with wallet encryption I was sure I could go back the same way.
Instead of "encrypt wallet", I expected a "decrypt wallet" option in the Bitcoin client GUI (0.6 RC4), but it is not there.

Is that not implemented? How can I decrypt the wallet again?

Dia
8  Bitcoin / Development & Technical Discussion / Client 0.5.2 - how many incoming connections are considered normal? on: February 29, 2012, 03:32:19 PM
My router forwards port 8333 to my PC, which runs the client in server mode. But I currently only have 9 connections from the internet, which seems low to me.
What are your numbers?

Dia
9  Bitcoin / Mining software (miners) / DiaKGCN kernel for CGMINER + Phoenix 2 (79XX / 78XX / 77XX / GCN) - 2012-05-25 on: January 27, 2012, 06:56:42 PM
DiaKGCN is a work-in-progress GCN-optimised mining kernel for CGMINER and Phoenix 2. So far it has taken weeks of hard work and trial and error. It will run on VLIW4 and VLIW5 GPUs just fine, but it's not optimised for them.

As the kernel has been part of CGMINER since version 2.2.7, there is no need to download additional files; you can use it out of the box. I will supply an updated kernel package for Phoenix 2 when the final version is available!

I'd like to get feedback, performance results and ideas to optimise it even further!
To support the further development of this kernel please donate to: 1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x (0.94 BTC donated so far, thanks!)

Diapolo



CGMINER thread with download links and documentation:
https://bitcointalk.org/index.php?topic=28402.0

DiaKGCN - Phoenix 2 download history:
https://anonfiles.com/file/a88219997407050d4b2ec153b35b2c0a
http://www.filedropper.com/diakgcnphoenix2
http://www.filedropper.com/diakgcnphoenix2preview_1

DiaKGCN - Phoenix 1 download history (just for reference):
http://www.filedropper.com/diakgcn04-02-2012
http://www.filedropper.com/diakgcn03-02-2012_1
http://www.filedropper.com/diakgcn02-02-2012
http://www.filedropper.com/diakgcn29-01-2012
http://www.filedropper.com/diakgcn28-01-2012



instructions for CGMINER

To use the current optimal settings on 79XX cards, add these parameters to your CGMINER command line:
Code:
-k diakgcn -v 2 -w 256

You need CGMINER >= 2.2.7 to be able to use diakgcn!



instructions for Phoenix 2

Place the folder diakgcn in phoenix2\plugins and use this for your config-file on 79XX cards (here it's for platform and device 0):
Code:
[cl:0:0]
kernel = diakgcn
aggression = 12
goffset = true
vectors2 = true
vectors4 = false
vectors8 = false
worksize = 256

For VLIW4 / VLIW5 you should use:
Code:
[cl:0:0]
kernel = diakgcn
aggression = 12
goffset = true
vectors2 = false
vectors4 = false
vectors8 = true
worksize = 128
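
The two config variants above differ only in the vector switch and the worksize. As a rough illustration (not part of the kernel package), a small Python helper could render the section for either GPU family. The function name `phoenix2_section` and the family labels "gcn" / "vliw" are made up for this sketch; the values are the ones posted above.

```python
# Hypothetical helper, not part of Phoenix 2 or the diakgcn package.
# It only reproduces the settings recommended in this post.
SETTINGS = {
    # family: (vector switch to enable, worksize)
    "gcn": ("vectors2", 256),   # 79XX cards
    "vliw": ("vectors8", 128),  # VLIW4 / VLIW5 cards
}


def phoenix2_section(family, platform=0, device=0):
    """Return the [cl:P:D] config section for 'gcn' or 'vliw' GPUs."""
    vector_switch, worksize = SETTINGS[family]
    lines = [
        "[cl:%d:%d]" % (platform, device),
        "kernel = diakgcn",
        "aggression = 12",
        "goffset = true",
    ]
    # Exactly one of the three vector switches is set to true.
    for switch in ("vectors2", "vectors4", "vectors8"):
        value = "true" if switch == vector_switch else "false"
        lines.append("%s = %s" % (switch, value))
    lines.append("worksize = %d" % worksize)
    return "\n".join(lines)


print(phoenix2_section("gcn"))
```

Calling `phoenix2_section("vliw", 0, 1)` would give the VLIW variant for platform 0, device 1.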

With the current Phoenix 2 version, don't use 1 instance with mixed GCN and VLIW4 / VLIW5 GPUs, as this will lead to very poor performance!



instructions for Phoenix 1

Place the folder diakgcn in phoenix\kernels and use this command line on 79XX cards:
Code:
-k diakgcn AGGRESSION=12 VECTORS2 WORKSIZE=256

For VLIW4 / VLIW5 you should use:
Code:
-k diakgcn AGGRESSION=12 VECTORS4 WORKSIZE=128
or
Code:
-k diakgcn AGGRESSION=12 VECTORS8 WORKSIZE=128

If you encounter high CPU usage and use multiple cards, try to give each Phoenix instance a single CPU core (set a CPU affinity)!



DiaKGCN parameter description for Phoenix

BFI_INT
Use BFI_INT instruction patching (default is true).

GOFFSET
Use OpenCL 1.1 global offset parameter (default is true).

VECTORS2
Enable uint2 vector support in the kernel (default is false).

VECTORS4
Enable uint4 vector support in the kernel (default is false).

VECTORS8
Enable uint8 vector support in the kernel (default is false).



BFI_INT patching whitelist (only VLIW4 / VLIW5 GPUs)

Barts
BeaverCreek
Caicos
Cayman
Cedar
Cypress
Devastator
Juniper
Loveland
Redwood
Scrapper
Turks
WinterPark
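
How the whitelist is meant to be used can be sketched in Python (the language of the Phoenix init). This is only an illustration and not the code from the package: the function name and its arguments are hypothetical. It combines the whitelist above with the cl_amd_media_ops check mentioned in the changelog of 03-02-2012.

```python
# Illustrative only; names are hypothetical, the set is the whitelist from this post.
BFI_INT_WHITELIST = {
    "Barts", "BeaverCreek", "Caicos", "Cayman", "Cedar", "Cypress",
    "Devastator", "Juniper", "Loveland", "Redwood", "Scrapper", "Turks",
    "WinterPark",
}


def use_bfi_int_patching(device_name, extensions, bfi_int=True):
    """Decide whether BFI_INT patching should be applied.

    device_name: OpenCL device name, e.g. "Cypress"
    extensions:  the device's extension string (space separated)
    bfi_int:     the user's BFI_INT switch (default is true)
    """
    if not bfi_int:
        return False
    if device_name.strip() not in BFI_INT_WHITELIST:
        return False
    # A whitelisted device must also report the AMD media ops extension.
    return "cl_amd_media_ops" in extensions.split()


print(use_bfi_int_patching("Cypress", "cl_khr_fp64 cl_amd_media_ops"))  # True
print(use_bfi_int_patching("Tahiti", "cl_amd_media_ops"))               # False
```

"Tahiti" is used here as an example of a GCN (79XX) device name, which is not on the list.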



changelog 04-02-2012:
- added uint8 vectors support in the kernel and the init (use VECTORS8 switch to activate it)
- added GOFFSET switch to be able to disable global offset parameter (use GOFFSET=False to disable it)
  -> perhaps GOFFSET is slower for some, now you can try the alternative
- changed some kernel parameter descriptions
- removed unused VECTORS3 code, never got it working :-/
- renamed OpenCL11 flag to hasOpenCL11 in the init
- removed some unneeded references to phatk from the init
- added a few comments in the init
- upped init revision to 127

changelog 03-02-2012:
- fixed the VECTORS4 code-path, which is now usable again
  -> VECTORS4 should be beneficial for VLIW4 / VLIW5, but not for GCN
- removed the (u) typecasts in the non BFI_INT Ch() and Ma() versions
  -> the hex values which are used directly in Ch() or Ma() were changed to be unsigned
- added 2 different Ma() versions, one for VECTORS2 or VECTORS4 defined (was in before), the other for the scalar version of the kernel (new)
  -> new scalar version saves 4 Bytes in compiled GPU ISA code (but VECTORS2 is still fastest for GCN)
- hardened the BFI_INT auto patching code in the init
  -> a whitelisted OpenCL device is now checked for cl_amd_media_ops extension
- fixed a small bug where I tried to use the C operator "&" as a "logical and" in the init
  -> changed into a Python "and" ^^
- removed a few lines of unused code from the init
- upped init revision to 126

changelog 02-02-2012:
- added an automatic usage of the OpenCL 1.1 global offset parameter, on OpenCL >= 1.1 platforms -> Thanks DiabloD3 for the idea
- removed both __constant arrays in the kernel, values are now used directly
- changed Ma() function from a general one into faster ones for the BFI_INT path and the non BFI_INT path
- added new kernel parameters (W16addK16, W17addK17, state0A and state0B)
- added 2 new local variables state0AaddV0 and state0BaddV0
- rewrote some rounds to use new kernel parameters and variables for faster execution
- fixed a write to output buffer bug for the non VECTORS path in the kernel
- changed the BFI_INT whitelisted flag code in the init
- added an OpenCL >= 1.1 flag in the init used for activating the global offset parameter
- reactivated PyOpenCL version output in the init
- upped init revision to 125
- removed unneeded code or comments from the kernel and the init
- added DiabloMiner kernel as an additional reference for getting new ideas in the kernel header
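
The changelogs mention separate Ch() and Ma() versions for the BFI_INT and the non BFI_INT path. The reason is that both SHA-256 functions can be written as a single bit-select, which is what a BFI_INT-style instruction provides. The Python sketch below only demonstrates these standard identities on 32-bit values; it is not the kernel source, and the function names are chosen for this example.

```python
import random

MASK = 0xFFFFFFFF  # work on 32-bit words


def bitselect(a, b, c):
    """OpenCL-style bitselect: take the bit from b where c is 1, else from a."""
    return ((a & ~c) | (b & c)) & MASK


def ch_plain(x, y, z):
    """SHA-256 Ch written with plain logic operations."""
    return ((x & y) ^ (~x & z)) & MASK


def ma_plain(x, y, z):
    """SHA-256 Maj written with plain logic operations."""
    return ((x & y) ^ (x & z) ^ (y & z)) & MASK


def ch_select(x, y, z):
    """Ch as one select: x chooses between y and z."""
    return bitselect(z, y, x)


def ma_select(x, y, z):
    """Maj as one select: where x and z differ y decides, else take x."""
    return bitselect(x, y, z ^ x)


# Compare both forms on random 32-bit inputs.
rng = random.Random(1)
for _ in range(1000):
    x, y, z = (rng.getrandbits(32) for _ in range(3))
    assert ch_plain(x, y, z) == ch_select(x, y, z)
    assert ma_plain(x, y, z) == ma_select(x, y, z)
print("select forms match the plain forms")
```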

changelog 29-01-2012:
- reordered kernel parameters in order of usage in the kernel
- removed unused kernel parameters (B1addF1addK6, C1addG1addK5, D1addH1)
- added new kernel parameter (PreVal0addK7)
- rewrote first 4 rounds to speed up the kernel
- VECTORS4 parameter is not finished, it currently uses VECTORS2 code-path
10  Bitcoin / Mining software (miners) / howto: start mining software with CPU affinity (Win only) to avoid "core eating" on: July 15, 2011, 03:17:47 PM
I saw quite a few topics from Phoenix users, who complain that Phoenix sucks up 1 full CPU core for each running instance.
Not all are seeing this behaviour, but the ones who do may be interested in this.

That is how I start Phoenix for my HD5830 via a batch script:

Code:
REM $PATH_TO_YOUR_MINING_EXE$ is a placeholder for the folder of your miner (C:\Bitcoin\Phoenix for example)
CD $PATH_TO_YOUR_MINING_EXE$
START /AFFINITY 0x01 phoenix.exe -v -u http://XYZ:8080 -a 50 -k phatk AGGRESSION=12 BFI_INT DEVICE=1 FASTLOOP=false VECTORS WORKSIZE=256

Now let's take a look at /AFFINITY 0x01. This parameter allows you to set the CPU affinity via a hex mask. I have a six-core Phenom II and want Phoenix to use only the first core.

Think of your CPU cores as bits, so for the Phenom the mask would be 000001 (the rightmost bit is the first core, the bit to its left is the second core and so on).
Now convert your CPU mask into hex, which is easy for 000001: it is 0x01. To use only core 2 (000010) the hex value is 0x02 and so on.
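
If you don't want to do the bit counting by hand, the conversion can be sketched in a few lines of Python. The helper name `affinity_mask` is made up for this example; cores are counted from 0 here, so core 0 is the first core.

```python
def affinity_mask(cores):
    """Return the hex mask for START /AFFINITY; core 0 is the first core."""
    mask = 0
    for core in cores:
        mask |= 1 << core  # one bit per core, first core is the rightmost bit
    return "0x%02X" % mask


print(affinity_mask([0]))     # 0x01 -> first core only
print(affinity_mask([1]))     # 0x02 -> second core only
print(affinity_mask([0, 1]))  # 0x03 -> first and second core
print(affinity_mask([5]))     # 0x20 -> sixth core
```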

You can now try to set the affinity of every Phoenix instance to the same CPU core, or distribute the instances over your cores.

Setting an affinity gives you a great advantage: there is no core hopping of processes, PLUS you can make Phoenix (or whatever you use) use only one core and not more.

Give it a try and if it works, consider a small donation :D.

Dia

PS: I had to use CD $PATH_TO_YOUR_MINING_EXE$, because the affinity was not set when using START /D:$PATH_TO_YOUR_MINING_EXE$!
11  Bitcoin / Mining software (miners) / further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 on: July 04, 2011, 08:15:55 AM
This is a repost from the Newbies forum, because I'm now allowed to post here :).
The original thread is located here: http://forum.bitcoin.org/index.php?topic=25135.0

If it works, please post here and consider a small donation @ 1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x :).



Important (2012-01-13): The FASTLOOP=False parameter is not needed anymore, because FASTLOOP defaults to false in this version. Update: FASTLOOP=True works now, I uploaded a fixed version!

Important: with OpenCL SDK / Runtime version 2.6 AMD updated their OpenCL compiler, so some older kernels and the optimizations in them seem to no longer work or are no longer needed. To reflect this change I had to edit the kernel performance section of this thread.

Important: since version 2011-08-27 you don't need to supply the BFI_INT switch anymore. If your HW supports it, it's enabled automatically. To disable it use BFI_INT=false.

Important: since version 2011-08-04 (pre-release) you have to use the switch VECTORS2 instead of VECTORS. I made this change to be clear what vectors are used in the kernel (2- or 4-component). To use 4-component vectors use switch VECTORS4.

Important: since version 2011-07-17 a modified version of __init__.py (for the Phoenix miner) is included in this package and has to be used! The kernel won't work for other miners without modifications to them, see kernel.cl for further information.



This is the preferred switch for Phoenix with phatk_dia in order to achieve comparable performance:
Code:
-k phatk AGGRESSION=12 VECTORS2 WORKSIZE=128


Download version 2012-01-13: http://www.mediafire.com/?xzk6b1yvb24r4dg
Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9
Download version 2011-08-27: http://www.mediafire.com/?697r8t2pdk419ji
Download version 2011-08-11: http://www.mediafire.com/?s5c7h4r91r4ad4j
Download version 2011-08-04 (pre-release): http://www.mediafire.com/?upwwud7kfyx7788
Download version 2011-07-17: http://www.mediafire.com/?4zxdd5557243has
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6
Download version 2011-07-07: http://www.mediafire.com/?o7jfp60s7xefrg4
Download version 2011-07-06: http://www.mediafire.com/?f8b8q3w5u5p0ln0
Download version 2011-07-03: http://www.mediafire.com/?xlkcc08jvp5a43v
Download version 2011-07-01: http://www.mediafire.com/?5jmt7t0e83k3eox

Kernel performance (BFI_INT / VECTORS2 / WORKSIZE=128 / SDK 2.6 / APP KernelAnalyzer 1.11 - Cal 11.12 profile):
HD5870
2011-08-20: 22 GPR / 1427 ALU OPs / 66 CF OPs
2011-08-27: 22 GPR / 1426 ALU OPs / 66 CF OPs
2011-12-21: 20 GPR / 1400 ALU OPs / 66 CF OPs
2012-01-13: 21 GPR / 1394 ALU OPs / 67 CF OPs

HD6970
2011-08-20: 21 GPR / 1687 ALU OPs / 66 CF OPs
2011-08-27: 23 GPR / 1688 ALU OPs / 68 CF OPs
2011-12-21: 21 GPR / 1687 ALU OPs / 66 CF OPs
2012-01-13: 20 GPR / 1687 ALU OPs / 66 CF OPs



Kernel performance (BFI_INT / VECTORS2 / SDK 2.5 / APP KernelAnalyzer 1.9 - Cal 11.7 profile):
HD5870
original phatk 1.X: 1393 ALU OPs
2011-07-01: 1389 ALU OPs
2011-07-03: 1385 ALU OPs
2011-07-06: 1380 ALU OPs
2011-07-07: 1380 ALU OPs
2011-07-11: 1378 ALU OPs
2011-07-17: 1376 ALU OPs
2011-08-04 (pre-release): 1368 ALU OPs
2011-08-11: 1364 ALU OPs
2011-08-27: 1363 ALU OPs (30 less compared to original phatk 1.X)
HD6970
original phatk 1.X: 1707 ALU OPs
2011-07-01: 1710 ALU OPs
2011-07-03: 1706 ALU OPs
2011-07-06: 1702 ALU OPs
2011-07-07: 1702 ALU OPs
2011-07-11: 1701 ALU OPs
2011-07-17: 1699 ALU OPs
2011-08-04 (pre-release): 1689 ALU OPs
2011-08-11: 1687 ALU OPs
2011-08-27: 1687 ALU OPs (20 less compared to original phatk 1.X)



changelog:

2012-01-13
Kernel:
- modified: Disclaimer is now the same as in original Phoenix package
- removed: all (u) typecasts in front of scalars, where vectors and scalars were used together because per OpenCL definition this is not needed
- removed: all () brackets around n in the #define parts of the kernel
- removed: S0(), which is now again merged into s0()
- removed: brackets around the commands in t1W(), t1(), t2() and W() were removed, to allow the compiler to reorder these
- added: the W() function was missing a ; at its end
- added: init variable B1addK6 used in round 6 to save an add -> THX to DiabloD3
- added: a (uint) typecast in front of get_local_id() and get_group_id() calls, because return value could be 64 bits long, which is not wanted
- modified: replaced all ma() + s0() or s0() + ma() calls with t2()
- modified: round 6 now uses the new B1addK6 variable
- modified: reordered W[] calculation for rounds 32, 91 and 92
- modified: rounds 121, 122 and 123 to not compute Vals[4], Vals[5] and Vals[6], because they are not needed for final computation of Vals[7] -> THX to jhajduk
- modified: removed + H[7] from round 124 and use -0xec9fcd13 to check for valid nonces
- added: result_r124 variable to take the result of the last round 124, this saves a few ALU OPs on VLIW5 GPUs
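
The -0xec9fcd13 check above relies on 32-bit wraparound: testing whether v + c is zero is the same as comparing v with -c modulo 2^32, so the final add can be dropped from the kernel. Below is a small Python check of that arithmetic. It is just an illustration (the kernel itself is OpenCL) and the function names are made up. As a side note, the constant equals the sum of the standard SHA-256 values H[7] = 0x5be0cd19 and K[60] = 0x90befffa, which appears to be where it comes from.

```python
MASK = 0xFFFFFFFF  # 32-bit arithmetic

H7 = 0x5be0cd19   # SHA-256 initial hash value H[7]
K60 = 0x90befffa  # SHA-256 round constant K[60]
C = 0xec9fcd13    # constant from the changelog entry


def is_zero_with_add(v):
    """Check with the add still in place: (v + C) wraps around to 0."""
    return ((v + C) & MASK) == 0


def is_zero_without_add(v):
    """Same check with the add folded into the comparison value."""
    return v == ((-C) & MASK)


print(hex((H7 + K60) & MASK))  # 0xec9fcd13
print(hex((-C) & MASK))        # 0x136032ed
```

Both functions agree for every 32-bit value; only 0x136032ed passes the check.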
Python Init:
- modified: replaced spaces with tabs in the source code formatting (I really dislike this part in Python ^^)
- modified: a few comments and commands were reformatted for better readability or to be better understandable
- modified: FASTLOOP parameter now defaults to False, which means you don't need to supply FASTLOOP=False anymore
- removed: OUTPUT_SIZE is not used anymore so all references to it were removed
- modified: changed REVISION to 122
- modified: moved the WORKSIZE checks below the part that decides whether and which vectors are used;
            this takes into account that the global worksize passed to the kernel is influenced by vector usage and vector size
            (currently the use of FASTLOOP can break this, because of the "dynamic" number of iterations)
- added: some debug info about worksize and pyOpenCL is displayed at the start
- added: B1 + K[6] is passed as new kernel parameter
- modified: made enqueue_read_buffer() / enqueue_write_buffer() blocking and removed finish() after the read, as per AMD's recommendations
            to minimize API overhead
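
A WORKSIZE check of the kind mentioned above might look like the following Python sketch. It follows the rule from the 2011-08-27 changelog (fall back to the device maximum if no WORKSIZE is supplied, if it exceeds the maximum, or if it is not a power of 2). The function and parameter names are hypothetical and this is not the code from __init__.py.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; a power of two has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def pick_worksize(requested, device_max):
    """Return the WORKSIZE to use; requested may be None if not supplied."""
    if requested is None or requested > device_max or not is_power_of_two(requested):
        return device_max
    return requested


print(pick_worksize(128, 256))   # 128
print(pick_worksize(None, 256))  # 256
print(pick_worksize(512, 256))   # 256
print(pick_worksize(96, 256))    # 256
```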

2011-08-27:
Kernel:
- added: code path for 3-component Vectors, activated via VECTORS3 (currently not usable, because of a bug in the AMD drivers up to Cat 11.8)
- removed: BITALIGN option from the kernel, BFI_INT is now used automatically, if the HW supports it (disabled via BFI_INT=false)
- modified: non BFI_INT Ch() function, which was broken in 2011-08-11 -> THX to Vince
- modified: kernel output buffer is now an ulong array and not an uint array
- removed: OUTPUT_SIZE argument is not passed and used in the kernel anymore
- modified: WORKSIZEx4, WORKSIZEx3 and WORKSIZEx2 arguments were merged into WORKSIZExVECSIZE
- modified: removed, reordered and added some brackets and type-casting stuff in the kernel
- modified: restored command order for round 108 - 123 to free a GPR
- modified: added H[7] into round 124 calculation
- modified: changed the checking for positive nonces again to cover the H[7] change
- modified: writing of nonces to output now uses 1 write for Vec2 and max. 2 writes for Vec4, because 2x uints are now encoded into 1x ulong
Python Init:
- added: code for 3-component Vectors, activated via VECTORS3 (currently not usable, because of a bug in the AMD drivers up to Cat 11.8)
- removed: BITALIGN option from the Python init, BFI_INT is now used automatically, if the HW supports it (disabled via BFI_INT=false)
- added: detection of maximum supported WORKSIZE per Device, which is used if no WORKSIZE is supplied, if supplied WORKSIZE > max. supported WORKSIZE
    or if WORKSIZE is not a power of 2
- added: code to decode the ulong from the output buffer into 2x uint and process the results
- modified: comments, code formatting and line breaks for better readability
- modified: output buffer size is now the WORKSIZE -> THX to Phateus

2011-08-11:
- modified: reverted a former change to the Ma() function to save an ALU OP for 69XX cards
- added: S0() and S1() function, which is a compiler help -> THX Phateus
- modified: a few brackets and layout of all helper functions for better readability and compatibility
- added: t2() function, which is (s0(n) + ma(n)) and saves a few GPRs -> THX Phateus and myself (had this in earlier, but removed it sometime ^^)
- modified: changed layout of kernel definition for better readability
- modified: all values which for example had a 10u now have a 10U (uppercase) to be consistent in the whole kernel
- modified: modified round 94 W calculation for better performance
- modified: round 108 - 123 now consists of 2 W() blocks followed by 2 sharoundW() blocks to save a GPR
- modified: changed the checking for positive nonces again to never create an invalid share and lower ALU OP usage

2011-08-04 (pre-release):
- added: user Vince into disclaimer -> THX Vince :)
- added: kernel is now able to work with 4-component vectors (switch VECTORS4) -> THX to Phateus
- modified: to use 2-component vectors I renamed the switch VECTORS to VECTORS2
- added: __attribute__((reqd_work_group_size(WORKSIZE, 1, 1))) -> THX to Phateus
- added: constants PreW31 and PreW32, which store P2() + P4() for round 31 and 32 -> THX to Phateus
- renamed - modified: W17_2 is now PreW19, W2 is now PreW18, PreVal4addT1 is now PreVal4 (= PreVal4 + T1), state0subT1 is now PreVal0 (= Preval4 + state0)
- modified: base is now declared as u to save the addition of uint2(0, 1) or uint4(0, 1, 2, 3) for W_3 init -> THX to Phateus
- modified: nonce calculation now uses the local Work-Item ID, the group ID and the WORKSIZE instead of only the global Work-Item ID -> THX to Phateus
- added: saved a multiplication by passing WORKSIZEx2 and WORKSIZEx4 constants to the kernel
- modified: calculation for W[18 - O] was optimized so that P2(18) is only calculated for x component (if Vectors are used), because x and y only differ
       in the LSB and afterwards Bit 14 and 25 are rotated for W[18 - O].y -> THX to Phateus
- modified: saved an addition for Vals[0] init, because of the change to PreVal0
- modified: reordered code for round 4 - 95 to optimize for less ALU OPs used -> THX Phateus and myself ^^
- modified: ordering of variables in additions for Round 124 was changed to optimize for less ALU OPs used
- modified: rewrote the part where nonces are checked, if they are positive and where they are written into output buffer
       (saves 2 global writes per work-item and saves additional ALU OPs)
- modified: changed variables W_3, P2_18_x, P2_18 and nonce into a constant
- modified: changed code formatting for rounds 4 - 124 for better readability
- removed: some comments to cleanup the code

2011-07-17:
- added: offset for W[] array to reduce its size -> THX to user Vince
- modified: function t1() renamed to t1W() / function sharound() renamed to sharoundW()
- added: function t1() and sharound() which are used where the W[] addition can be left out, because W[] == 0
    (I guess the compiler already does this optimization, but it doesn't hurt) -> THX to user Vince
- modified: P1() - P4() and W() to make use of the offset
- modified: quite a few kernel parameters have new values or were added (mixed ideas from User Vince with own ones)
    C1addK5: C1addK5 = C1 + K[5]: C1addK5 = C1 + 0x59f111f1
    D1: D1 = D1 + K[4] + W[4]: D1 = D1 + 0xe9b5dba5 + 0x80000000U
    W2: W2 + W16 in P1(): W2 = P1(18) + P4(18)
    W17_2: 0x80000000U in P2() = 0x11002000 + W17 in P1(): W17_2 = P1(19) + P2(19)
    PreValaddT1: PreValaddT1 = PreVal4 + T1
    T1substate0: T1substate0 = T1 - substate0
- added: variable W_3, which stores the first value formerly held in W[3]
- added: Temp variable used to speed up calculation for rounds 4 and 5
- modified: changed round 3 so that it's more efficient (uses: Vals[0] and Vals[4])
- modified: W[0] - W[14] are now kind of hard-coded or left out, where they were 0
- modified: optimized P1(18) + P2(18) + P4(18)
- modified: optimized P1(19) + P2(19) + P4(19)
- modified: optimized round 4 + 5
- modified: rounds 6 - 14 and 73 - 78 now use new sharound() without W[] addition
- modified: offset added for all parts, where W[] is used
- modified: W_3 is used as result instead of W[3] (W[3] is still used to generate a random position in the output buffer) -> THX to user Vince

2011-07-11:
- modified: constant H[7] has a new value (saves an addition in round 124)
- modified: non BFI_INT Ch() function now uses OpenCL built-in bitselect
- modified: reordered W[] calculations for round 18 - 30, 87 and 94
- modified: reordered calculation for round 5
- modified: W[] calculation for round 80 - 86 is now a block before sharound() is called
- removed: K[60] from round 124 (because of new H[7] value)

2011-07-07:
- removed: some large comments in the source were removed
- modified: Ma() function is now unique in the kernel, no matter if BFI_INT is used or not -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- added: Ch() function which uses OpenCL bitselect() command (but it's not active, so you are free to try it) -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- modified: u W[128] is replaced with u W[124] because no more than 124 values are used
- modified: initialisation for Vals[0], Vals[3], Vals[4] and Vals[7] is now processed in other places to save some unneeded writes to these variables
- fixed: some hex values, which were used in vector additions are now properly type-casted, which hopefully restores AMD APP SDK 2.1 compatibility
- modified: rounds 3, 4 and 5 were modified for better performance (guess this can be tuned, if I have a working KernelAnalyzer)

2011-07-06:
- modified: H[] constants were reordered (2 were not used because of earlier mods)
- added: ulong L constant added (its value doesn't fit into an uint)
- modified: new Ma() for non BFI_INT capable cards, should be faster -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- removed: t1W()
- modified: t1() reordered function calls for better performance
- modified: W() reordered function calls for better performance
- modified: sharound() removed writing to t1, now t1() is called twice, which makes this function FASTER (OpenCL compiler optimization)
- removed: sharound2() (if needed W() + sharound() is used instead)
- removed: partround() not needed because of another solution for round 3 and 124
- removed: t1 and t1W variables
- modified: rounds 3, 19, 30, 81, 87, 94 and 124 were modified for better performance

2011-07-03:
- removed: t2(), w(n), r0(x), r1(x), R0(n) and R1(n)
- renamed - modified: R(x) to W(x) plus now uses P1, P2, P3 and P4 directly
- modified: P1(x) and P2(x) to not use R1(x - 2), R0(x - 15) but do that directly
- modified: SHA rounds 31, 32, 47 - 61, 86, 87, 114 - 119 now use sharound2() instead of W() + sharound()
- modified: reordered code for SHA rounds 66 - 94 -> saw no decrease in performance -> better readability
- modified: SHA rounds 18, 19, 20, 80, 93, 94 now use a simpler calculation because of removed zero additions
--> 1x P1(x), 2x P2(x), 4x P3(x) and 2x P4(x) were removed which should give a little MHash/sec boost
- modified: sharound() so that a double execution of t1() is avoided -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- added: "u t1W" variable, which is used in sharound2() to avoid double execution of t1W()

2011-07-01:
Code:
Vals[7] = 0xb0edbdd0 + K[0] +  W[64] + 0x08909ae5U; -> Vals[7] = 0xfc08884d + W[64];
Vals[3] = 0xa54ff53a + 0xb0edbdd0 + K[0] +  W[64]; -> Vals[3] = 0x198c7e2a2 + W[64];
- removed the Vals[7] += H[7] addition and replaced the final if-statements in the kernel
- reordered some W[n] = statements to remove some unneeded additions
- replaced all additions like 64 + 5 with the corresponding integer value (guess it was in there for readability reasons, so here it got worse :D)
- removed some unneeded brackets
- re-formatted for better readability

If it works, please post here and consider a small donation @ 1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x :).

Thanks,
Dia
12  Other / Beginners & Help / AMD Catalyst 11.7 preview (official source) on: July 01, 2011, 10:23:12 PM
Cat 11.7 preview can be downloaded here: http://developer.amd.com/Downloads/110619a-121104E.zip

Has a newer OpenCL Runtime - OpenCL 1.1 AMD-APP-SDK-v2.5 (684.211), supports cl_khr_fp64 (for GPUs that support DPFP) and adds cl_ext_atomic_counters_32 extension.
Would be nice to see if you get higher MHash/s :). Post here!

Dia

13  Other / Beginners & Help / further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 on: July 01, 2011, 05:59:34 PM
If it works, please post here and consider a small donation @ 1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x :).



Important (2012-01-13): The FASTLOOP=False parameter is not needed anymore, because FASTLOOP defaults to false in this version. Update: FASTLOOP=True works now, I uploaded a fixed version!

Important: since OpenCL SDK / Runtime version 2.6 AMD updated their OpenCL compiler, so that some older kernels and optimizations in them seem to not work anymore or are not needed anymore. In order to reflect this change I had to edit the kernel performance section of this thread.

Important: since version 2011-08-27 you don't need to supply the BFI_INT switch anymore. If your HW supports it, it's enabled automatically. To disable it use BFI_INT=false.

Important: since version 2011-08-04 (pre-release) you have to use the switch VECTORS2 instead of VECTORS. I made this change to be clear what vectors are used in the kernel (2- or 4-component). To use 4-component vectors use switch VECTORS4.

Important: since version 2011-07-17 a modified version of __init__.py (for the Phoenix miner) is included in this package and has to be used! The kernel won't work for other Miners without modifications to them, see kernel.cl for further infos.



This is the preferred switch for Phoenix with phatk_dia in order to achieve comparable performance:
Code:
-k phatk AGGRESSION=12 VECTORS2 WORKSIZE=128


Download version 2012-01-13: http://www.mediafire.com/?xzk6b1yvb24r4dg
Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9
Download version 2011-08-27: http://www.mediafire.com/?697r8t2pdk419ji
Download version 2011-08-11: http://www.mediafire.com/?s5c7h4r91r4ad4j
Download version 2011-08-04 (pre-release): http://www.mediafire.com/?upwwud7kfyx7788
Download version 2011-07-17: http://www.mediafire.com/?4zxdd5557243has
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6
Download version 2011-07-07: http://www.mediafire.com/?o7jfp60s7xefrg4
Download version 2011-07-06: http://www.mediafire.com/?f8b8q3w5u5p0ln0
Download version 2011-07-03: http://www.mediafire.com/?xlkcc08jvp5a43v
Download version 2011-07-01: http://www.mediafire.com/?5jmt7t0e83k3eox

Kernel performance (BFI_INT / VECTORS2 / WORKSIZE=128 / SDK 2.6 / APP KernelAnalyzer 1.11 - Cal 11.12 profile):
HD5870
2011-08-20: 22 GPR / 1427 ALU OPs / 66 CF OPs
2011-08-27: 22 GPR / 1426 ALU OPs / 66 CF OPs
2011-12-21: 20 GPR / 1400 ALU OPs / 66 CF OPs
2012-01-13: 21 GPR / 1394 ALU OPs / 67 CF OPs

HD6970
2011-08-20: 21 GPR / 1687 ALU OPs / 66 CF OPs
2011-08-27: 23 GPR / 1688 ALU OPs / 68 CF OPs
2011-12-21: 21 GPR / 1687 ALU OPs / 66 CF OPs
2012-01-13: 20 GPR / 1687 ALU OPs / 66 CF OPs



Kernel performance (BFI_INT / VECTORS2 / SDK 2.5 / APP KernelAnalyzer 1.9 - Cal 11.7 profile):
HD5870
original phatk 1.X: 1393 ALU OPs
2011-07-01: 1389 ALU OPs
2011-07-03: 1385 ALU OPs
2011-07-06: 1380 ALU OPs
2011-07-07: 1380 ALU OPs
2011-07-11: 1378 ALU OPs
2011-07-17: 1376 ALU OPs
2011-08-04 (pre-release): 1368 ALU OPs
2011-08-11: 1364 ALU OPs
2011-08-27: 1363 ALU OPs (30 less compared to original phatk 1.X)
HD6970
original phatk 1.X: 1707 ALU OPs
2011-07-01: 1710 ALU OPs
2011-07-03: 1706 ALU OPs
2011-07-06: 1702 ALU OPs
2011-07-07: 1702 ALU OPs
2011-07-11: 1701 ALU OPs
2011-07-17: 1699 ALU OPs
2011-08-04 (pre-release): 1689 ALU OPs
2011-08-11: 1687 ALU OPs
2011-08-27: 1687 ALU OPs (20 less compared to original phatk 1.X)



changelog:

2012-01-13
Kernel:
- modified: Disclaimer is now the same as in original Phoenix package
- removed: all (u) typecasts in front of scalars, where vectors and scalars were used together because per OpenCL definition this is not needed
- removed: all () brackets around n in the #define parts of the kernel
- removed: S0(), which is now again merged into s0()
- removed: brackets around the commands in t1W(), t1(), t2() and W() were removed, to allow the compiler to reorder these
- added: W() function missed a ; at its end
- added: init variable B1addK6 used in round 6 to save an add -> THX to DiabloD3
- added: a (uint) typecast in front of get_local_id() and get_group_id() calls, because return value could be 64 bits long, which is not wanted
- modified: replaced all ma() + s0() or s0() + ma() calls with t2()
- modified: round 6 now uses the new B1addK6 variable
- modified: reordered W[] calculation for rounds 32, 91 and 92
- modified: rounds 121, 122 and 123 to not compute Vals[4], Vals[5] and Vals[6], because they are not needed for final computation of Vals[7] -> THX to jhajduk
- modified: removed + H[7] from round 124 and use -0xec9fcd13 to check for valid nonces
- added: result_r124 variable to take the result of the last round 124, this saves a few ALU OPs on VLIW5 GPUs
Python Init:
- modified: replaced spaces with tabs in the source code formatting (I really dislike this part in Python ^^)
- modified: a few comments and commands were reformatted for better readability or to be better understandable
- modified: FASTLOOP parameter now defaults to False, which means you don't need to supply FASTLOOP=False anymore
- removed: OUTPUT_SIZE is not used anymore so all references to it were removed
- modified: changed REVISION to 122
- modified: moved the WORKSIZE checks below the part that decides whether and which vectors are used,
            this takes into account that the global worksize passed to the kernel is influenced by vector usage and vector size
            (currently the use of FASTLOOP can break this, because of the "dynamic" number of iterations)
- added: some debug info about worksize and pyOpenCL is displayed at the start
- added: B1 + K[6] is passed as new kernel parameter
- modified: made enqueue_read_buffer() / enqueue_write_buffer() blocking and removed finish() after the read, as per AMD's recommendations
            to minimize API overhead
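
The -0xec9fcd13 used in the round 124 check can be sanity-checked with plain 32-bit arithmetic. This is only a sketch: it assumes the constant is the standard SHA-256 initial value H[7] (0x5be0cd19) plus the round constant K[60] (0x90befffa), which fits the 2011-07-11 note that K[60] was dropped because of the new H[7] value. The function names are made up for illustration and are not taken from kernel.cl.

```python
MASK32 = 0xFFFFFFFF

# Standard SHA-256 constants (assumed to be what the kernel folds together).
H7 = 0x5be0cd19
K60 = 0x90befffa

# H[7] and K[60] merged into one constant.
FOLDED = (H7 + K60) & MASK32


def is_candidate_with_add(result_r124):
    """Older style check: add the folded constant, then compare with 0."""
    return (result_r124 + FOLDED) & MASK32 == 0


def is_candidate_without_add(result_r124):
    """Newer style check: compare directly against -FOLDED (mod 2^32)."""
    return result_r124 == (-FOLDED) & MASK32


print(hex(FOLDED))  # 0xec9fcd13
```

Both checks accept exactly the same values, the second one just skips the addition per work-item.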

2011-08-27:
Kernel:
- added: code path for 3-component Vectors, activated via VECTORS3 (currently not usable, because of a bug in the AMD drivers up to Cat 11.8)
- removed: BITALIGN option from the kernel, BFI_INT is now used automatically, if the HW supports it (disabled via BFI_INT=false)
- modified: non BFI_INT Ch() function, which was broken in 2011-08-11 -> THX to Vince
- modified: kernel output buffer is now an ulong array and not an uint array
- removed: OUTPUT_SIZE argument is not passed and used in the kernel anymore
- modified: WORKSIZEx4, WORKSIZEx3 and WORKSIZEx2 arguments were merged into WORKSIZExVECSIZE
- modified: removed, reordered and added some brackets and type-casting stuff in the kernel
- modified: restored command order for round 108 - 123 to free a GPR
- modified: added H[7] into round 124 calculation
- modified: changed the checking for positive nonces again to cover the H[7] change
- modified: writing of nonces to output now uses 1 write for Vec2 and max. 2 writes for Vec4, because 2x uints are now encoded into 1x ulong
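
To illustrate the "2x uint encoded into 1x ulong" idea, here is a small Python sketch of packing and unpacking. Which nonce ends up in the high half and which in the low half is an assumption here, and pack_nonces / unpack_nonces are hypothetical helper names, not the ones used in the package.

```python
MASK32 = 0xFFFFFFFF


def pack_nonces(lo, hi):
    """Encode two 32-bit values into one 64-bit value (lo in bits 0-31, hi in bits 32-63)."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def unpack_nonces(packed):
    """Decode a 64-bit value back into its two 32-bit halves (lo, hi)."""
    return packed & MASK32, (packed >> 32) & MASK32


packed = pack_nonces(0x12345678, 0x9abcdef0)
print(hex(packed))  # 0x9abcdef012345678
```

With this layout one 64-bit write carries both components of a Vec2 result, and a Vec4 result needs at most two such writes.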
Python Init:
- added: code for 3-component Vectors, activated via VECTORS3 (currently not usable, because of a bug in the AMD drivers up to Cat 11.8)
- removed: BITALIGN option from the Python init, BFI_INT is now used automatically, if the HW supports it (disabled via BFI_INT=false)
- added: detection of maximum supported WORKSIZE per Device, which is used if no WORKSIZE is supplied, if supplied WORKSIZE > max. supported WORKSIZE
    or if WORKSIZE is not a power of 2
- added: code to decode the ulong from the output buffer into 2x uint and process the results
- modified: comments, code formatting and line breaks for better readability
- modified: output buffer size is now the WORKSIZE -> THX to Phateus
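
The WORKSIZE fallback described above can be sketched like this. It is a simplified stand-in for the logic in __init__.py, not a copy of it: choose_worksize and its parameters are hypothetical names, and device_max is assumed to be the maximum work-group size reported by the OpenCL device.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (a power of two has exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def choose_worksize(requested, device_max):
    """Fall back to the device maximum if WORKSIZE is missing, too big or not a power of 2."""
    if requested is None:
        return device_max
    if requested > device_max or not is_power_of_two(requested):
        return device_max
    return requested


print(choose_worksize(128, 256))   # 128
print(choose_worksize(None, 256))  # 256
print(choose_worksize(100, 256))   # 256
```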

2011-08-11:
- modified: reverted a former change to the Ma() function to save an ALU OP for 69XX cards
- added: S0() and S1() function, which is a compiler help -> THX Phateus
- modified: a few brackets and layout of all helper functions for better readability and compatibility
- added: t2() function, which is (s0(n) + ma(n)) and saves a few GPRs -> THX Phateus and myself (had this in earlier, but removed it sometime ^^)
- modified: changed layout of kernel definition for better readability
- modified: all values which for example had a 10u now have a 10U (uppercase) to be consistent in the whole kernel
- modified: modified round 94 W calculation for better performance
- modified: round 108 - 123 now consists of 2 W() blocks followed by 2 sharoundW() blocks to save a GPR
- modified: changed the checking for positive nonces again to never create an invalid share and lower ALU OP usage

2011-08-04 (pre-release):
- added: user Vince into disclaimer -> THX Vince :)
- added: kernel is now able to work with 4-component vectors (switch VECTORS4) -> THX to Phateus
- modified: to use 2-component vectors I renamed the switch VECTORS to VECTORS2
- added: __attribute__((reqd_work_group_size(WORKSIZE, 1, 1))) -> THX to Phateus
- added: constants PreW31 and PreW32, which store P2() + P4() for round 31 and 32 -> THX to Phateus
- renamed - modified: W17_2 is now PreW19, W2 is now PreW18, PreVal4addT1 is now PreVal4 (= PreVal4 + T1), state0subT1 is now PreVal0 (= Preval4 + state0)
- modified: base is now declared as u to save the addition of uint2(0, 1) or uint4(0, 1, 2, 3) for W_3 init -> THX to Phateus
- modified: nonce calculation now uses the local Work-Item ID, the group ID and the WORKSIZE instead of only the global Work-Item ID -> THX to Phateus
- added: saved a multiplication by passing WORKSIZEx2 and WORKSIZEx4 constants to the kernel
- modified: calculation for W[18 - O] was optimized so that P2(18) is only calculated for x component (if Vectors are used), because x and y only differ
       in the LSB and afterwards Bit 14 and 25 are rotated for W[18 - O].y -> THX to Phateus
- modified: saved an addition for Vals[0] init, because of the change to PreVal0
- modified: reordered code for round 4 - 95 to optimize for less ALU OPs used -> THX Phateus and myself ^^
- modified: ordering of variables in additions for Round 124 was changed to optimize for less ALU OPs used
- modified: rewrote the part where nonces are checked, if they are positive and where they are written into output buffer
       (saves 2 global writes per work-item and saves additional ALU OPs)
- modified: changed variables W_3, P2_18_x, P2_18 and nonce into a constant
- modified: changed code formatting for rounds 4 - 124 for better readability
- removed: some comments to cleanup the code
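
A rough Python model of the nonce calculation from local Work-Item ID, group ID and WORKSIZE, for anyone who wants to follow the idea. The exact formula in the kernel may differ, so treat the layout below (and the name nonce_lanes) as an assumption. It relies on the OpenCL rule that, without a global offset, the global ID equals group ID * work-group size + local ID.

```python
MASK32 = 0xFFFFFFFF


def nonce_lanes(base, group_id, local_id, worksize, vecsize):
    """Return the nonces one work-item tests, one per vector component."""
    # WORKSIZE * VECSIZE is constant, so the host can pass it in precomputed
    # and the kernel saves a multiplication.
    worksize_x_vecsize = worksize * vecsize
    first = (base + group_id * worksize_x_vecsize + local_id * vecsize) & MASK32
    return [(first + lane) & MASK32 for lane in range(vecsize)]


print(nonce_lanes(0, 1, 3, 128, 2))  # [262, 263]
```

With an even base and 2-component vectors the x and y nonces differ only in the LSB, which is what the W[18 - O] optimization above makes use of.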

2011-07-17:
- added: offset for W[] array to reduce its size -> THX to user Vince
- modified: function t1() renamed to t1W() / function sharound() renamed to sharoundW()
- added: function t1() and sharound() which are used where the W[] addition can be left out, because W[] == 0
    (I guess the compiler already does this optimization, but it doesn't hurt) -> THX to user Vince
- modified: P1() - P4() and W() to make use of the offset
- modified: quite a few kernel parameters have new values or were added (mixed ideas from User Vince with own ones)
    C1addK5: C1addK5 = C1 + K[5]: C1addK5 = C1 + 0x59f111f1
    D1: D1 = D1 + K[4] + W[4]: D1 = D1 + 0xe9b5dba5 + 0x80000000U
    W2: W2 + W16 in P1(): W2 = P1(18) + P4(18)
    W17_2: 0x80000000U in P2() = 0x11002000 + W17 in P1(): W17_2 = P1(19) + P2(19)
    PreValaddT1: PreValaddT1 = PreVal4 + T1
    T1substate0: T1substate0 = T1 - substate0
- added: variable W_3, which stores the first value formerly held in W[3]
- added: Temp variable used to speed up calculation for rounds 4 and 5
- modified: changed round 3 so that it's more efficient (uses: Vals[0] and Vals[4])
- modified: W[0] - W[14] are now kind of hard-coded or left out, where they were 0
- modified: optimized P1(18) + P2(18) + P4(18)
- modified: optimized P1(19) + P2(19) + P4(19)
- modified: optimized round 4 + 5
- modified: rounds 6 - 14 and 73 - 78 now use new sharound() without W[] addition
- modified: offset added for all parts, where W[] is used
- modified: W_3 is used as result instead of W[3] (W[3] is still used to generate a random position in the output buffer) -> THX to user Vince

2011-07-11:
- modified: constant H[7] has a new value (saves an addition in round 124)
- modified: non BFI_INT Ch() function now uses OpenCL built-in bitselect
- modified: reordered W[] calculations for round 18 - 30, 87 and 94
- modified: reordered calculation for round 5
- modified: W[] calculation for round 80 - 86 is now a block before sharound() is called
- removed: K[60] from round 124 (because of new H[7] value)
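
For reference, this is why bitselect can replace the usual SHA-256 Ch() (and Ma()) formulas. The Python below only emulates OpenCL's bitselect(a, b, c), which takes each bit from a where c is 0 and from b where c is 1. The argument order shown is one commonly used form and is an assumption about the kernel, not a quote from it.

```python
import random

MASK32 = 0xFFFFFFFF


def bitselect(a, b, c):
    """Emulate OpenCL bitselect on 32-bit values: bit of a where c is 0, bit of b where c is 1."""
    return ((a & ~c) | (b & c)) & MASK32


def ch_ref(x, y, z):
    """Textbook SHA-256 Ch."""
    return ((x & y) ^ (~x & z)) & MASK32


def ma_ref(x, y, z):
    """Textbook SHA-256 Maj."""
    return ((x & y) ^ (x & z) ^ (y & z)) & MASK32


def ch_bitselect(x, y, z):
    return bitselect(z, y, x)


def ma_bitselect(x, y, z):
    return bitselect(x, y, z ^ x)


random.seed(1)
for _ in range(1000):
    x, y, z = (random.getrandbits(32) for _ in range(3))
    assert ch_bitselect(x, y, z) == ch_ref(x, y, z)
    assert ma_bitselect(x, y, z) == ma_ref(x, y, z)
print("bitselect forms match the reference formulas")
```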

2011-07-07:
- removed: some large comments in the source were removed
- modified: Ma() function is now unique in the kernel, no matter if BFI_INT is used or not -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- added: Ch() function which uses OpenCL bitselect() command (but it's not active, so you are free to try it) -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- modified: u W[128] is replaced with u W[124] because no more than 124 values are used
- modified: initialisation for Vals[0], Vals[3], Vals[4] and Vals[7] is now processed in other places to save some unneeded writes to these variables
- fixed: some hex values, which were used in vector additions are now properly type-casted, which hopefully restores AMD APP SDK 2.1 compatibility
- modified: rounds 3, 4 and 5 were modified for better performance (guess this can be tuned, if I have a working KernelAnalyzer)

2011-07-06:
- modified: H[] constants were reordered (2 were not used because of earlier mods)
- added: ulong L constant added (its value doesn't fit into an uint)
- modified: new Ma() for non BFI_INT capable cards, should be faster -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- removed: t1W()
- modified: t1() reordered function calls for better performance
- modified: W() reordered function calls for better performance
- modified: sharound() removed writing to t1, now t1() is called twice, which makes this function FASTER (OpenCL compiler optimization)
- removed: sharound2() (if needed W() + sharound() is used instead)
- removed: partround() not needed because of another solution for round 3 and 124
- removed: t1 and t1W variables
- modified: rounds 3, 19, 30, 81, 87, 94 and 124 were modified for better performance

2011-07-03:
- removed: t2(), w(n), r0(x), r1(x), R0(n) and R1(n)
- renamed - modified: R(x) to W(x) plus now uses P1, P2, P3 and P4 directly
- modified: P1(x) and P2(x) to not use R1(x - 2), R0(x - 15) but do that directly
- modified: SHA rounds 31, 32, 47 - 61, 86, 87, 114 - 119 now use sharound2() instead of W() + sharound()
- modified: reordered code for SHA rounds 66 - 94 -> saw no decrease in performance -> better readability
- modified: SHA rounds 18, 19, 20, 80, 93, 94 now use a simpler calculation because of removed zero additions
--> 1x P1(x), 2x P2(x), 4x P3(x) and 2x P4(x) were removed which should give a little MHash/sec boost
- modified: sharound() so that a double execution of t1() is avoided -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- added: "u t1W" variable, which is used in sharound2() to avoid double execution of t1W()

2011-07-01:
Code:
Vals[7] = 0xb0edbdd0 + K[0] +  W[64] + 0x08909ae5U; -> Vals[7] = 0xfc08884d + W[64];
Vals[3] = 0xa54ff53a + 0xb0edbdd0 + K[0] +  W[64]; -> Vals[3] = 0x198c7e2a2 + W[64];
- removed the Vals[7] += H[7] addition and replaced the final if-statements in the kernel
- reordered some W[n] = statements to remove some unneeded additions
- replaced all additions like 64 + 5 with the corresponding integer value (guess it was in there for readability reasons, so here it got worse :D)
- removed some unneeded brackets
- re-formatted for better readability
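
The two folded constants from the Code lines above can be re-checked in a few lines of Python. The only thing assumed here is that K[0] is the standard SHA-256 round constant 0x428a2f98.

```python
MASK32 = 0xFFFFFFFF
K0 = 0x428a2f98  # standard SHA-256 K[0] (assumed to be what the kernel uses)

# Vals[7]: three constants folded into one 32-bit value.
vals7_const = (0xb0edbdd0 + K0 + 0x08909ae5) & MASK32

# Vals[3]: the plain sum needs 33 bits, so it does not fit into an uint.
vals3_full = 0xa54ff53a + 0xb0edbdd0 + K0
vals3_low32 = vals3_full & MASK32

print(hex(vals7_const))  # 0xfc08884d
print(hex(vals3_full))   # 0x198c7e2a2
print(hex(vals3_low32))  # 0x98c7e2a2
```

Since uint additions wrap modulo 2^32, only the low 32 bits of the Vals[3] constant matter for the result.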

If it works, please post here and consider a small donation @ 1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x :).

Thanks,
Dia