1 2010-10-24 00:05:20 <Keefe> ArtForz: how packed are your ALUs? looking at the ISA for my kernel, it looks like mine are 91% packed
  2 2010-10-24 00:05:36 <Diablo-D3> thats not too unoptimal
  3 2010-10-24 00:05:42 <Diablo-D3> the real fucking problem is register rape
  4 2010-10-24 00:05:54 <Keefe> i'm not using vectors
  5 2010-10-24 00:06:09 <Keefe> dunno how to determine my register usage
  6 2010-10-24 00:06:34 <ArtForz> I think I'm ~93%
  7 2010-10-24 00:06:39 <Diablo-D3> well, the problem with mining, from what I can tell, is you run out of registers
  8 2010-10-24 00:06:45 <Keefe> the .isa file says "GprPoolSize = 0" near the end, but obviously that's not correct
  9 2010-10-24 00:07:12 <Keefe> or it doesn't mean what i thought
 10 2010-10-24 00:07:15 <ArtForz> mainly seems to be the T unit having nothing useful to do
 11 2010-10-24 00:07:40 <Diablo-D3> texture unit?
 12 2010-10-24 00:07:51 <ArtForz> nope, the 5th ALU
 13 2010-10-24 00:07:53 <Keefe> doesn't the compiler leave the T unit to last when trying to fill ALU's?
 14 2010-10-24 00:08:05 <ArtForz> it pretty much has to
 15 2010-10-24 00:08:15 <Diablo-D3> is it a special use ALU?
 16 2010-10-24 00:08:18 <ArtForz> yep
 17 2010-10-24 00:08:26 <ArtForz> Transcendental unit
 18 2010-10-24 00:08:27 <Diablo-D3> let me guess, very limited functionality?
 19 2010-10-24 00:08:41 <ArtForz> well, not really
 20 2010-10-24 00:08:50 <ArtForz> it doesn't have floating point add/mul
 21 2010-10-24 00:08:59 <Diablo-D3> http://www.beyond3d.com/content/reviews/53/8
 22 2010-10-24 00:09:32 <ArtForz> still has basic int ops, but mainly is for sin/cos/...
 23 2010-10-24 00:09:50 <Keefe> it can do a lot of the same int ops the others can do
 24 2010-10-24 00:09:55 <ArtForz> yep
 25 2010-10-24 00:10:01 <Diablo-D3> Moving to the artist formerly known as the RysUnit, currently known as the transcendental ALU after it threatened to sue, it remains a rather special chap, being higher precision than the other units (FP40 versus FP32). It can handle transcendentals, just like before, each being single cycle at least according to our measurements. It can also do a single 32-bit INT MUL per cycle, by virtue of it's more accommodating man
 26 2010-10-24 00:10:41 <ArtForz> so we have 4 simple + 1 complex 32-bit ALUs
 27 2010-10-24 00:10:41 <Diablo-D3> also, this entire article is shit
 28 2010-10-24 00:10:46 <Diablo-D3> it was written by someone who doesnt get this shit
 29 2010-10-24 00:11:03 <ArtForz> small tiny problem, the external data paths are... 128 bit
 30 2010-10-24 00:11:33 <Diablo-D3> I wanna see an updated opencl opt guide from AMD for 6xxx
 31 2010-10-24 00:11:47 <Diablo-D3> because the 69xx look potentially interesting
 32 2010-10-24 00:12:10 <Keefe> do the TEX sections of the isa code run in parallel with the ALU sections?
 33 2010-10-24 00:12:22 <ArtForz> well, kinda but not really
 34 2010-10-24 00:12:38 <Diablo-D3> I wish I could read m0's kernel
 35 2010-10-24 00:12:43 <Diablo-D3> but its insanely packed
 36 2010-10-24 00:12:51 <ArtForz> remember theres several "threads" executing in a SMT-like fashion
 37 2010-10-24 00:13:31 <Diablo-D3> SMT doesnt even entirely define it
 38 2010-10-24 00:13:48 <Diablo-D3> its a multi-stage pipeline that can be stuffed due to the insane design of the ALUs
 39 2010-10-24 00:13:54 <ArtForz> well, it's kinda SMTs ugly cousin :P
 40 2010-10-24 00:14:15 <Diablo-D3> its 5 round 4 item SIMD
 41 2010-10-24 00:14:24 <Diablo-D3> and it checks in on it every 20 items
 42 2010-10-24 00:15:00 <ArtForz> and then we have independent ALUs and load/store units :P
 43 2010-10-24 00:15:15 <Diablo-D3> yes
 44 2010-10-24 00:15:25 <Diablo-D3> the load/store units are batshit
 45 2010-10-24 00:15:29 <Diablo-D3> but dont get me wrong
 46 2010-10-24 00:15:33 <Diablo-D3> its the only way to design this
 47 2010-10-24 00:15:39 <ArtForz> = a modern GPU is a pretty complicated beast
 48 2010-10-24 00:15:50 <Diablo-D3> nvidia has all this fucking design overhead because they keep making the pipeline longer
 49 2010-10-24 00:16:08 <Diablo-D3> instead of allowing multi-issue pipeline shit like AMD did
 50 2010-10-24 00:16:12 <ArtForz> well, it feels kinda similar to old parallel supercomputers
 51 2010-10-24 00:16:33 <Diablo-D3> ArtForz: well
 52 2010-10-24 00:16:41 <Diablo-D3> nvidia, from what I can tell, it feels like normal SIMD
 53 2010-10-24 00:16:46 <Diablo-D3> and not a very optimum one at that
 54 2010-10-24 00:17:17 <Diablo-D3> amd is VLIW on top of SIMD
 55 2010-10-24 00:17:23 <Diablo-D3> heavy on the VLIW side
 56 2010-10-24 00:17:26 <ArtForz> while ATI went with crazy paradigm mix
 57 2010-10-24 00:17:56 <Diablo-D3> well, VLIW with huge ass register banks that any ALU can access works great
 58 2010-10-24 00:18:03 <ArtForz> yep
 59 2010-10-24 00:18:19 <Diablo-D3> well optimized code just babysits ALU in/out
 60 2010-10-24 00:19:05 <ArtForz> yep
 61 2010-10-24 00:19:21 <Diablo-D3> it just means you have these insanely huge ALUs
 62 2010-10-24 00:19:46 <Diablo-D3> but, on the flip side, you can send stuff into the pipeline while stuff is still waiting to come out
 63 2010-10-24 00:20:03 <Diablo-D3> because it has coupled stages instead of one giant monolithic pipe
 64 2010-10-24 00:20:33 <Diablo-D3> and I bet it can exit stages early for various reasons
 65 2010-10-24 00:20:52 <ArtForz> what feels weird, the whole thing appears as 2-group register latency in ASM
 66 2010-10-24 00:21:23 <ArtForz> = write to a reg in VLIW group 1, you can read it again in group 3
 67 2010-10-24 00:22:02 <ArtForz> of course if you don't use indexed regs, you can just use the prev vector/scalar path and also use the output in group 2
 68 2010-10-24 00:22:37 <Diablo-D3> the problem is writing the compiler
 69 2010-10-24 00:22:44 <ArtForz> yep
 70 2010-10-24 00:22:47 <Diablo-D3> a compiler that can actually count timing is difficult as fuck
 71 2010-10-24 00:22:51 <Diablo-D3> ask the gcc guys
 72 2010-10-24 00:22:58 <ArtForz> yep
 73 2010-10-24 00:23:24 <Kiba> why you say yep all the time
 74 2010-10-24 00:23:34 <Kiba> being a yesman for Diablo-D3
 75 2010-10-24 00:23:40 <Diablo-D3> heh
 76 2010-10-24 00:23:45 <ArtForz> well, because thats how it is
 77 2010-10-24 00:23:57 <Diablo-D3> ArtForz: I really should look into doing opencl
 78 2010-10-24 00:24:00 <Diablo-D3> it cant be THAT hard
 79 2010-10-24 00:24:18 <Diablo-D3> I know glsl, I know how to code massively parallel code
 80 2010-10-24 00:24:18 <Keefe> my kernel's isa has: 1103 ADD, 364 AND, 1076 BIT_ALIGN, 175 LSHR, 241 OR, 1074 XOR
 81 2010-10-24 00:24:28 <ArtForz> if I find the time I wanna move away from OpenCL
 82 2010-10-24 00:24:37 <Diablo-D3> what, into straight IL?
 83 2010-10-24 00:24:42 <ArtForz> yep
 84 2010-10-24 00:24:54 <ArtForz> CAL + IL kernel
 85 2010-10-24 00:25:05 <Diablo-D3> yeah, thats too much like coding in assembly for me
 86 2010-10-24 00:25:05 <Keefe> was going to ask if you already had
 87 2010-10-24 00:25:40 <Keefe> only about 4K ops, not too crazy to attempt :)
 88 2010-10-24 00:25:50 <ArtForz> yep
 89 2010-10-24 00:26:07 <Diablo-D3> Keefe: well the big thing is
 90 2010-10-24 00:26:12 <Diablo-D3> this code SHOULD run quickly
 91 2010-10-24 00:26:18 <Diablo-D3> its not especially complex code
 92 2010-10-24 00:26:22 <ArtForz> and 90% of that is just the same thing 122 times
 93 2010-10-24 00:26:24 <Diablo-D3> its just vastly repetitive
 94 2010-10-24 00:26:28 <Diablo-D3> yeah
 95 2010-10-24 00:26:45 <Diablo-D3> I should bang out m0
 96 2010-10-24 00:26:46 <Diablo-D3> ser
 97 2010-10-24 00:26:50 <Diablo-D3> I should bang out m0's code on java
 98 2010-10-24 00:27:00 <Diablo-D3> just to try out that AMD thing
 99 2010-10-24 00:27:05 <Diablo-D3> java bytecode -> opencl
100 2010-10-24 00:27:28 <ArtForz> I kinda like what the CALPP guys did
101 2010-10-24 00:27:57 <ArtForz> C++ -> IL using mainly templating...
102 2010-10-24 00:27:59 <Diablo-D3> not particularly interested in c++
103 2010-10-24 00:29:00 <ArtForz> at least it appears that way
104 2010-10-24 00:29:39 <ArtForz> as in, it's using the C++ compiler to produce IL, then hopes the CAL IL -> ASM compiler is smart enough to optimize the result
105 2010-10-24 00:30:19 <Diablo-D3> well
106 2010-10-24 00:30:24 <Diablo-D3> Im interested in how AMD's shit works
107 2010-10-24 00:30:27 <Diablo-D3> because it doesnt work on nvidia
108 2010-10-24 00:30:34 <Diablo-D3> Im wondering if its outputting IL or ASM directly
109 2010-10-24 00:31:28 <Diablo-D3> ArtForz: btw, I think I need a list of test data
110 2010-10-24 00:36:01 <Keefe> ArtForz: how many alu ops in your kernel isa? my total is 4036 (including a few odd ones, not including tex ops)
111 2010-10-24 00:36:39 <Keefe> i'm thinking you must have squeezed it down to fewer ops, to get 6% more mhps with only 2% better alu packing
112 2010-10-24 00:38:40 <Diablo-D3> I wonder if I can beat the 75m art thinks I should get
113 2010-10-24 00:39:59 <Keefe> i guess i should try m0's code sometime for comparison
114 2010-10-24 00:40:16 <Diablo-D3> because I don't think anybody's code is optimal enough
115 2010-10-24 00:40:34 <Diablo-D3> ArtForz: is there a test list of shit?
116 2010-10-24 00:41:45 <Keefe> test for what?
117 2010-10-24 00:41:56 <Diablo-D3> for mine attempts
118 2010-10-24 00:42:11 <Diablo-D3> if you have x input, the output should be y
119 2010-10-24 00:43:10 <ArtForz> 3931
120 2010-10-24 00:43:50 <Keefe> just modify the bitcoin code such that it does some cpu hashing of the same data at the same time
121 2010-10-24 00:44:01 <Keefe> as the gpu code
122 2010-10-24 00:44:11 <Diablo-D3> Keefe: but I dont use bitcoin for this
123 2010-10-24 00:44:32 <Diablo-D3> nor do I wanna touch that code with a ten foot pole
124 2010-10-24 00:44:50 <Keefe> guess i don't know what you're talking about
125 2010-10-24 00:44:53 <ArtForz> in 848 VLIW clauses
126 2010-10-24 00:45:03 <Diablo-D3> Keefe: the miner isnt part of bitcoin
127 2010-10-24 00:45:07 <ArtForz> so ~ 92.7% packing
128 2010-10-24 00:45:08 <Diablo-D3> it uses the getwork patch
129 2010-10-24 00:45:15 <Diablo-D3> and art's does some other weird shit
130 2010-10-24 00:45:35 <Keefe> mine is custom also
131 2010-10-24 00:46:02 <Keefe> so it's not hard for me to code it to run cpu at the same time for a subset as a test
132 2010-10-24 00:47:12 <Keefe> i have 887 alu and 12 tex clauses
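
The packing figures quoted above can be checked with a little arithmetic. This is only a sketch: it assumes that the "clauses" counted here are five-slot VLIW instruction groups (four simple ALUs plus the T unit), which is the reading under which the numbers line up with the percentages given in the chat. The function and constant names are made up for illustration.

```python
# Sketch: ALU packing = ALU ops issued / (instruction groups * 5 slots).
# Assumption: each counted "clause" is one 5-wide VLIW group (x, y, z, w, t).
SLOTS_PER_GROUP = 5


def packing(alu_ops: int, groups: int) -> float:
    """Fraction of ALU slots that hold a useful op."""
    return alu_ops / (groups * SLOTS_PER_GROUP)


# Numbers taken from the chat.
artforz = packing(3931, 848)
keefe = packing(4036, 887)

# Keefe's per-opcode counts; the remainder are the "few odd ones".
keefe_ops = {"ADD": 1103, "AND": 364, "BIT_ALIGN": 1076,
             "LSHR": 175, "OR": 241, "XOR": 1074}
listed = sum(keefe_ops.values())

print(f"ArtForz packing: {artforz:.1%}")            # 92.7%
print(f"Keefe packing:   {keefe:.1%}")              # 91.0%
print(f"listed ops: {listed}, other ops: {4036 - listed}")
print(f"group count ratio (Keefe/ArtForz): {887 / 848:.3f}")
```

Under the same assumption, Keefe's kernel issues about 4.6% more instruction groups than ArtForz's, so fewer ops and tighter packing together may account for most of the 6% speed difference, though the chat does not confirm that.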
133 2010-10-24 00:47:18 <Diablo-D3> Keefe: yes but
134 2010-10-24 00:47:24 <Diablo-D3> the getwork patch only gets work
135 2010-10-24 00:47:33 <Diablo-D3> theres no way to ask the client for the right answer
136 2010-10-24 00:47:49 <Keefe> i see
137 2010-10-24 00:48:31 <ArtForz> Diablo-D3: blkXXXX.dat has ~ 80k test vectors ;)
138 2010-10-24 00:49:00 <Diablo-D3> ArtForz: yeah, but how do I read it?
139 2010-10-24 00:50:33 <ArtForz> you only need the block header, bitcointools has the needed parts
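
A minimal sketch of the "x input, y output" check, assuming you have already pulled an 80-byte block header out of the block file by some means (parsing blkXXXX.dat is not shown, and no bitcointools call is used here). It uses only Python's hashlib; the helper names are hypothetical. The header below is the well-known genesis block header, used as a single test vector.

```python
import hashlib


def block_hash(header: bytes) -> str:
    """Double SHA-256 of an 80-byte header, shown in the usual big-endian hex."""
    if len(header) != 80:
        raise ValueError("block header must be exactly 80 bytes")
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()


def target_from_bits(bits: int) -> int:
    """Expand the compact 'bits' field (assumes exponent >= 3, true for real blocks)."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    return mantissa << (8 * (exponent - 3))


def meets_target(header: bytes) -> bool:
    """True if the header hash is at or below the target encoded in the header."""
    bits = int.from_bytes(header[72:76], "little")
    return int(block_hash(header), 16) <= target_from_bits(bits)


# Genesis block header: version, prev hash, merkle root, time, bits, nonce.
GENESIS_HEADER = bytes.fromhex(
    "01000000"
    + "00" * 32
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"
    + "29ab5f49"
    + "ffff001d"
    + "1dac2b7c"
)

print(block_hash(GENESIS_HEADER))
print(meets_target(GENESIS_HEADER))
```

A GPU kernel could be fed the same header with the nonce field blanked and should report the nonce stored in the last four bytes; every block already in the chain gives one such known answer.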
140 2010-10-24 00:58:42 <Diablo-D3> bitcointools?
141 2010-10-24 00:58:51 <ArtForz> http://github.com/gavinandresen/bitcointools
142 2010-10-24 01:05:49 <Diablo-D3> ugh
143 2010-10-24 01:05:54 <Diablo-D3> ArtForz: how do I get the shit out?
144 2010-10-24 01:08:01 <Keefe> i think i remember a link to download it in one tar
145 2010-10-24 01:08:11 <Diablo-D3> Keefe: not use git you fool
146 2010-10-24 01:08:11 <theymos> Click the "downloads" button.
147 2010-10-24 01:08:29 <Diablo-D3> Im talking about which python script does what I want
148 2010-10-24 01:09:49 <Keefe> you'll probably want to modify one to output just what you want, either that or lots of post processing
149 2010-10-24 01:10:05 <Keefe> isn't there a readme?
150 2010-10-24 01:10:12 <Diablo-D3> the readme is retarded
151 2010-10-24 01:11:44 <Keefe> read dbdump.py and figure out what it can do. that's what i did
152 2010-10-24 01:12:02 <Diablo-D3> fucking python
153 2010-10-24 01:12:09 <Diablo-D3> why the fuck are people using python to begin with
154 2010-10-24 01:12:19 <Keefe> not the worst language :)
155 2010-10-24 01:12:26 <Diablo-D3> its pretty up there
156 2010-10-24 01:12:30 <nameless|> Diablo-D3: Because it's better than ruby
157 2010-10-24 01:12:39 <Diablo-D3> nameless|: thats not a good excuse.
158 2010-10-24 01:12:40 <Keefe> or perl
159 2010-10-24 01:12:44 <Diablo-D3> Keefe: fuck you.
160 2010-10-24 01:12:49 <Diablo-D3> perl > python every day of the week
161 2010-10-24 01:13:24 <Keefe> ugh, i realize it's really popular. but last time i tried to understand perl i really didn't like it
162 2010-10-24 01:13:36 <ArtForz> if you write code, yes, if you have to maintain someone elses code... not so much
163 2010-10-24 01:13:37 <Keefe> i'd rather use c
164 2010-10-24 01:14:02 <Keefe> my native lang is vb.net
165 2010-10-24 01:14:04 <ArtForz> the designers made it pretty damn hard to write unreadable python
166 2010-10-24 01:14:13 <Diablo-D3> Keefe: no wonder you didnt get perl, you're braindamaged
167 2010-10-24 01:14:24 <ArtForz> my fav lang is C
168 2010-10-24 01:14:30 <Diablo-D3> ArtForz: they made it also pretty damn hard to write useful python
169 2010-10-24 01:14:41 <ArtForz> yep
170 2010-10-24 01:14:54 <nameless|> Diablo-D3: It's better than PHP?
171 2010-10-24 01:14:59 <nameless|> It's better than basic?
172 2010-10-24 01:15:07 <Keefe> i'll admit vb.net has made me lazy
173 2010-10-24 01:15:09 <nameless|> It's better than brainfuck?
174 2010-10-24 01:15:13 <ArtForz> it's also better than brainfuck
175 2010-10-24 01:15:19 <Diablo-D3> brainfuck is a different classification of language
176 2010-10-24 01:15:27 <Diablo-D3> python is for complete utter noobs
177 2010-10-24 01:15:34 <Diablo-D3> why would I want to use code by noobs
178 2010-10-24 01:15:34 <nameless|> Diablo-D3: it is a language and you can code it in brainfuck
179 2010-10-24 01:15:42 <ArtForz> might be on par with whitespace
180 2010-10-24 01:15:50 <Diablo-D3> ffff whitespace
181 2010-10-24 01:15:59 <Diablo-D3> speaking of whitespace
182 2010-10-24 01:16:04 <Diablo-D3> fuck you python
183 2010-10-24 01:16:18 <Diablo-D3> and fuck all of you fuckers who think its okay to use tab outside of java.
184 2010-10-24 01:16:23 <ArtForz> thats one area where the python designers fucked up royally
185 2010-10-24 01:16:35 <Diablo-D3> two fucking spaces.
186 2010-10-24 01:16:37 <Diablo-D3> not a tab.
187 2010-10-24 01:16:41 <Diablo-D3> not one or three or more spaces.
188 2010-10-24 01:16:42 <Diablo-D3> two.
189 2010-10-24 01:16:44 <Diablo-D3> TWO.
190 2010-10-24 01:16:45 <ArtForz> either it's spaces or tabs, don't f*ing allow both in a language with syntactic whitespace