1 2017-02-25 00:02:23 0|cfields|sipa: want me to PR that? Or you have something else in mind?
2 2017-02-25 00:03:00 0|BlueMatt|bad_alloc and leaving pcoinsTip in an inconsistent state, but std::bad_alloc is not an std::runtime_error so the catch at the end of FlushStateToDisk didnt catch it. OK, phew, so now we go to connect the next block and fail to connect its inputs (without allocating new disk space). I /think/ that explains your debug.log without hand waving about hardware errors
3 2017-02-25 00:03:00 0|BlueMatt|sipa: ok, here's a theory......"10:39:08 Prune:" is the prune called by AcceptBlock's FlushStateToDisk call (seems obvious given the Pre-allocating blk just above)....now, that flush call hit fDoFullFlush triggering a pcoinsTip->Flush(). OK, so now in CCoinsViewDB::BatchWrite we iterate over the entries in mapCoins, allocating memory for each one in the leveldb batch, erasing *AS WE GO*...at some point it ran out of memory, throwing
4 2017-02-25 00:04:37 0|sipa|BlueMatt: sounds like a plausible theory
5 2017-02-25 00:04:41 0|sipa|cfields: PR what?
6 2017-02-25 00:04:50 0|cfields|BlueMatt: that was my reading of it as well
7 2017-02-25 00:05:13 0|cfields|sipa: https://github.com/theuni/bitcoin/commit/28afe574567ba838d46959047282460dbab39b91
8 2017-02-25 00:05:26 0|sipa|if that's the problem (and that seems to become more plausible as dbcache gets set higher, as you get the duplication only at flush time)
9 2017-02-25 00:05:44 0|sipa|we should probably make the batch construction not erase things for ccoinscache
10 2017-02-25 00:06:14 0|BlueMatt|sipa: yes, an easy way to make this particular, very specific issue, less likely is to not erase as we go
11 2017-02-25 00:06:21 0|BlueMatt|ofc that means more memory usage, but...better than blowing up
12 2017-02-25 00:08:17 0|sipa|BlueMatt: alternatively, we could fail to flush, and still delete everything
13 2017-02-25 00:08:20 0|sipa|from the cache
14 2017-02-25 00:08:37 0|BlueMatt|sipa: hmm?
15 2017-02-25 00:08:58 0|sipa|... and then reset pcoinsTip->hashBlock and chainActive to the last succesfully flushed block (which may be 1000s ago)
16 2017-02-25 00:09:22 0|BlueMatt|sipa: in that case we must AbortNode, I think?
17 2017-02-25 00:09:29 0|sipa|oh, absolutely
18 2017-02-25 00:10:19 0|sipa|but i'm worried we may not be able to to instantly abort (due to locks being held all over the place while the OOM occurs)
19 2017-02-25 00:11:12 0|BlueMatt|yea, I mean ideally we'd broaden the catch in FlushStateToDisk and use the AbortNode there
20 2017-02-25 00:11:25 0|BlueMatt|just this time dont let the coins map get into an inconsistent state
21 2017-02-25 00:12:21 0|sipa|cfields: looks good
22 2017-02-25 00:13:19 0|BlueMatt|cfields: oh, somehow I missed that that /is/ only for bad_alloc
23 2017-02-25 00:13:23 0|BlueMatt|yea, lets totally do that!
24 2017-02-25 00:13:34 0|cfields|<BlueMatt> wait, does this apply to more than bad_alloc?
25 2017-02-25 00:13:34 0|cfields|<cfields> no
26 2017-02-25 00:13:36 0|cfields|:)
27 2017-02-25 00:13:45 0|BlueMatt|oh, reading comprehension fail
28 2017-02-25 00:18:09 0|cfields|sipa: want to switch prevector to new[] and ditch realloc()? Or call get_new_handler()() if they return null?
29 2017-02-25 00:18:25 0|cfields|(or just leave it alone and let something write to 0 :p)
30 2017-02-25 00:18:42 0|sipa|cfields: just switch it to new[]
31 2017-02-25 00:18:58 0|cfields|ok
32 2017-02-25 00:19:00 0|BlueMatt|it would suck to drop realloc?
33 2017-02-25 00:19:21 0|BlueMatt|just assert(new_ptr)
34 2017-02-25 00:19:40 0|sipa|how often does realloc actually help?
35 2017-02-25 00:20:03 0|BlueMatt|no idea, but the difference between get_new_handler()() and assert(new_ptr) is minimal
36 2017-02-25 00:20:53 0|sipa|well i don't like making prevector assume that OOMs are fatal
37 2017-02-25 00:21:00 0|sipa|that's something application level
38 2017-02-25 00:21:31 0|BlueMatt|sipa: yea, and in our application they always are :p
39 2017-02-25 00:22:48 0|sipa|still, it's nicer to have that knowledge in only one place
40 2017-02-25 00:26:16 0|BlueMatt|sipa: internet says realloc is useful, though in our usage of prevector its likely not super noticeable
41 2017-02-25 00:26:24 0|BlueMatt|so I'm fine with whatever
42 2017-02-25 00:30:43 0|sipa|BlueMatt: oh, i just realized... calling get_new_handler()() in prevector doesn't imply it's going to fail
43 2017-02-25 00:31:08 0|BlueMatt|huh? you mean doesnt imply its going to std::terminate instead of throwing or so?
44 2017-02-25 00:31:59 0|sipa|yes
45 2017-02-25 00:32:06 0|sipa|so, i don't care.
46 2017-02-25 00:32:07 0|BlueMatt|ok, if you prefer that, do that
47 2017-02-25 00:41:09 0|bitcoin-git|[13bitcoin] 15jtimon opened pull request #9855: RPC: Use integer satoshis instead BTC with decimals (06master...060.15-rpc-satoshis) 02https://github.com/bitcoin/bitcoin/pull/9855
48 2017-02-25 03:35:25 0|sipa|cfields, BlueMatt: crap, it seems i reported this exact same bug a year ago already...
49 2017-02-25 04:06:46 0|cfields|sipa: heh. Well, at least we can get it in for 0.14
50 2017-02-25 04:07:00 0|cfields|sipa: does that mean it takes 1+ years to sync your rpi? :)
51 2017-02-25 04:18:36 0|sipa|cfields: it's a new one
52 2017-02-25 04:18:50 0|sipa|it took 2 days to sync to the point where it was
53 2017-02-25 05:06:29 0|cfields|sipa: i have no clue how representative the CCheckQueueSpeedPrevectorJob bench is, but there's a noticible difference when switching to new[] and dropping realloc
54 2017-02-25 05:07:38 0|cfields|sipa: for the sake of being conservative this late in the release cycle, i think get_new_handler()() is likely to have much less impact
55 2017-02-25 05:29:20 0|sipa|cfields: ack
56 2017-02-25 05:30:31 0|bitcoin-git|[13bitcoin] 15theuni opened pull request #9856: Terminate immediately when allocation fails (06master...06bad_alloc_terminate2) 02https://github.com/bitcoin/bitcoin/pull/9856
57 2017-02-25 05:32:37 0|cfields|damn fanquake, you're fast!
58 2017-02-25 05:59:36 0|cfields|sipa: well i guess that discussion has been rendered moot. assert for 0.14, and do an efficient specialized unsigned char* prevector for master?
59 2017-02-25 05:59:49 0|sipa|cfields: ok
60 2017-02-25 07:46:08 0|wump|bah, #9856 is really ugly :/
61 2017-02-25 07:46:10 0|gribble|https://github.com/bitcoin/bitcoin/issues/9856 | Terminate immediately when allocation fails by theuni ÷ Pull Request #9856 ÷ bitcoin/bitcoin ÷ GitHub
62 2017-02-25 07:52:02 0|wumpus|I understand wanting to terminate immediately on a memory error in some cases, but can't you just catch the exception at some point instead of registereing a global handler?
63 2017-02-25 07:52:44 0|sipa|wumpus: the problem is exactly that it is being caught... and then we continue in an inconsistent state
64 2017-02-25 07:53:01 0|wumpus|well then crash the program at the catch site!
65 2017-02-25 07:53:14 0|sipa|that's what this does
66 2017-02-25 07:53:36 0|sipa|tell the program to crash whenever new fails
67 2017-02-25 07:53:41 0|wumpus|no, this uses some poorly documented feature of c++ to change a global callback for every failed allocation
68 2017-02-25 07:53:49 0|wumpus|instead of just in the consensus code or wherever it is actually dangerous
69 2017-02-25 07:55:07 0|wumpus|some allocation errors, either in bitcoin itself or in libraries may actually be recoverable, and they would now cause a crash
70 2017-02-25 07:56:15 0|sipa|oh, sure, longer term i think we can find something more specific
71 2017-02-25 07:56:40 0|sipa|but as long as we have "catch std::exception& e" everywhere, which catches everything, that's pretty hard
72 2017-02-25 07:56:56 0|wumpus|I think this is a bad direction to go in with the code base
73 2017-02-25 07:57:01 0|wumpus|shoudln't we be addressing that then?
74 2017-02-25 07:57:15 0|sipa|in 0.14?
75 2017-02-25 07:57:42 0|wumpus|for 0.14 this is ok, but this is a PR to master
76 2017-02-25 07:58:19 0|sipa|hmm, you think this should go in just 0.14?
77 2017-02-25 07:58:43 0|wumpus|the way the PR is now? yes, I think so. This just too aspecific an hack
78 2017-02-25 07:58:48 0|sipa|i think terminating when OOM is a very reasonable thing, and certainly more reasonable than trying to continuing
79 2017-02-25 07:59:03 0|wumpus|I think terminating on OOM in select cases is a reasonable thing
80 2017-02-25 08:00:09 0|sipa|there may be more user-friendly things to do in some cases, but it's very hard... allocation failures can occur pretty much everywhere, and it seems really hard to avoid inconsistent states
81 2017-02-25 08:00:38 0|wumpus|sure, it's difficult
82 2017-02-25 08:00:46 0|wumpus|but this just feels like giving up on any kind of sane error handling
83 2017-02-25 08:01:22 0|sipa|i don't think so... OOM is different from disk errors, for example
84 2017-02-25 08:01:31 0|sipa|you can cleanly recover from that, usually
85 2017-02-25 08:01:35 0|wumpus|I really dislike global handlers like this, and I don't think much software uses this
86 2017-02-25 08:01:53 0|wumpus|it just changes the expectation of the c++ standard that allocations return an exception
87 2017-02-25 08:02:18 0|wumpus|which is adding surprising behavior at a low level
88 2017-02-25 08:04:28 0|sipa|i'm surprised to read that
89 2017-02-25 08:04:47 0|gmaxwell|on many systems allocation failures are just silently ignored and the program just crashes when it tries to use the memory.
90 2017-02-25 08:05:00 0|sipa|i think that most software just doesn't try to catch bad_alloc
91 2017-02-25 08:05:14 0|sipa|and it just crashes
92 2017-02-25 08:05:19 0|gmaxwell|In virtually none of the software could an allocation even possibly be handled, esp as almost everything we do involves allocation.
93 2017-02-25 08:05:45 0|wumpus|ok ,never mind
94 2017-02-25 08:06:02 0|wumpus|seems you all agree, just do this
95 2017-02-25 08:06:37 0|gmaxwell|but I agree that just fixing the badalloc is not a full solution. What about other exceptions? If bad alloc can cause spurrious block invalidations how long until the next thing.
96 2017-02-25 08:07:19 0|sipa|well, i think we should get rid of the catch std::exception and catch ... everywhere
97 2017-02-25 08:07:22 0|wumpus|yes if the code is not exception-safe, that should be addressed
98 2017-02-25 08:07:29 0|sipa|and only catch expected exceptions
99 2017-02-25 08:07:34 0|gmaxwell|So I am of the opinion that bad alloc should just crash but also that the software needs to deal better with unexpected exceptions in block validation... as I mentioned before this is the third case of exceptions causing bogus block rejection that we've had.
100 2017-02-25 08:07:58 0|sipa|but by doing that, we'd still just crash in case of an OOM
101 2017-02-25 08:08:08 0|sipa|at least with this PR, we write to debug.log what happened
102 2017-02-25 08:08:28 0|wumpus|but unlike AbortNode() no attempt is made to show a dialog to the user
103 2017-02-25 08:08:48 0|gmaxwell|can't show a dialog without allocating.
104 2017-02-25 08:09:07 0|sipa|well in many cases, that would be possible
105 2017-02-25 08:09:08 0|wumpus|well the point is: maybe it needs less allocation than the original allocation request
106 2017-02-25 08:09:12 0|wumpus|not all allocations are equal
107 2017-02-25 08:09:15 0|sipa|indeed
108 2017-02-25 08:09:23 0|sipa|typically those that trip an OOM are large ones
109 2017-02-25 08:09:34 0|wumpus|someone might be trying to allocate a 4GB buffer, which fails, then after that the program could continue keeping in mind that it can't have that buffer
110 2017-02-25 08:09:38 0|gmaxwell|That is a fair point. Also if it did something like shut down the mempool it might free enough to display a message.
111 2017-02-25 08:10:15 0|gmaxwell|I wonder what the distribution of vm_overcommit settings is, it used to be ubiquitious, the fact that sipa even saw this suggests that it isn't anymore.
112 2017-02-25 08:11:06 0|sipa|for this specific issue, it may be possible to catch the bad_alloc inside the leveldb wrapper, and if so, mark the db object as read-only
113 2017-02-25 08:11:13 0|gmaxwell|apparently some hurestic is now the default.
114 2017-02-25 08:11:17 0|wumpus|in many cases showing a simple modal dialog woudl still work, especially on windows where it's sort of a direct sytem operation. It should at least we attempted
115 2017-02-25 08:11:26 0|sipa|that's fair
116 2017-02-25 08:11:41 0|sipa|but i think it requires too many changes for 0.14
117 2017-02-25 08:11:43 0|gmaxwell|I have no issue with that, so long as if it fails the system still shuts down.
118 2017-02-25 08:11:44 0|wumpus|and we have a general 'A fatal issuee has occured check debug.log' already in AbortNode
119 2017-02-25 08:12:17 0|sipa|i really want to think harder about possible inconsistent states
120 2017-02-25 08:12:29 0|wumpus|well an allocation failure in qt will just propagate the std::alloc_error
121 2017-02-25 08:12:32 0|wumpus|qt is exception safe
122 2017-02-25 08:12:43 0|gmaxwell|sipa: I'm kinda surprised that cfield's patch works. IIRC from gdbing boost asio it used to try to allocate 4TB of memory and the fail. I'm surprised nothing else in boost does.
123 2017-02-25 08:12:59 0|wumpus|so just catch that and abort() if there is one
124 2017-02-25 08:13:05 0|sipa|gmaxwell: it's c++11
125 2017-02-25 08:13:09 0|sipa|boost does not assume c++11
126 2017-02-25 08:13:30 0|sipa|for example, do we want to prevent writing to mempool.dat after some unhandled system error, to prevent corrupting it?
127 2017-02-25 08:14:03 0|wumpus|yes
128 2017-02-25 08:14:18 0|sipa|if catching of the exception occurs higher up, there is less risk of those things
129 2017-02-25 08:14:28 0|sipa|but what about writes that happen in unrelated background threads
130 2017-02-25 08:14:35 0|sipa|the wallet gets flushed in a background thread
131 2017-02-25 08:14:46 0|wumpus|set a flag and don't flush anything anymore
132 2017-02-25 08:14:57 0|wumpus|I mean all the databases are atomic right?
133 2017-02-25 08:15:11 0|wumpus|you can call writes, and as long as you don't flush, they'll be ignored on shutdown
134 2017-02-25 08:15:30 0|sipa|but a flag where? an exception can only propagate up to its threads' main function
135 2017-02-25 08:15:38 0|wumpus|a global flag
136 2017-02-25 08:15:43 0|wumpus|checked before each database flag
137 2017-02-25 08:15:45 0|wumpus|flush*
138 2017-02-25 08:16:16 0|sipa|if an OOM occurs inside the background runner thread, it will - if uncaught - just kill the background runner
139 2017-02-25 08:16:20 0|wumpus|a kind of alarm state, which makes sure nothing is permanently committed
140 2017-02-25 08:16:38 0|wumpus|catch the exception and call a handler to set that flag?
141 2017-02-25 08:16:49 0|sipa|that's really ugly...
142 2017-02-25 08:17:36 0|wumpus|why is that ugly? you should catch exceptions such as std::alloc anyway to warn the user, log something , and initiaite shutdown
143 2017-02-25 08:17:39 0|sipa|doesn't leveldb have some background writer thread?
144 2017-02-25 08:18:00 0|wumpus|why is leveldb's background thread a problem here?
145 2017-02-25 08:18:08 0|wumpus|we're concerned with corruptions of our own state, right?
146 2017-02-25 08:18:24 0|wumpus|leveldb won't be writing anything that we haven't committed
147 2017-02-25 08:18:49 0|wumpus|so as soon as there is the suspicion of inconsistent state, don't write anything to leveldb anymore
148 2017-02-25 08:20:31 0|sipa|i'm worried about an OOM caused by a thread we don't control
149 2017-02-25 08:20:46 0|wumpus|then the inconsistent state won't be committed to disk, and leveldb should be fine. Same for the wallet
150 2017-02-25 08:20:47 0|sipa|but i assume that would just cause a crash, as we can't catch a bad_alloc there anyway
151 2017-02-25 08:21:28 0|wumpus|well yes the writer thread could die in the middle of writing, sure, but there's no way to avoid that. The database needs t obe able to handle that, also for other crash robustness
152 2017-02-25 08:22:21 0|wumpus|what it won't do, however, is write inconsistent state if we haven't told it to
153 2017-02-25 08:22:31 0|sipa|yes, indeed
154 2017-02-25 08:23:36 0|sipa|i think you're right - with some changes it may be possible to do this safely
155 2017-02-25 08:23:54 0|sipa|if we make sure there is no code that may unintentionally catch a bad_alloc
156 2017-02-25 08:24:04 0|wumpus|right
157 2017-02-25 08:24:14 0|sipa|and that all the catches that do, also call AbortNode, which sets this flag
158 2017-02-25 08:24:25 0|wumpus|I think there should be a rule: if you catch a general std::exception, you can't continue there, and need to stop the program
159 2017-02-25 08:24:49 0|wumpus|as you have no idea what kind of error you have and whether it's recoverable
160 2017-02-25 08:24:52 0|sipa|right
161 2017-02-25 08:24:59 0|wumpus|(so that's for top-level handlers only)
162 2017-02-25 08:25:53 0|sipa|we have 53 places where std::exception is caught
163 2017-02-25 08:26:07 0|sipa|and 22 where ... is caught (even worse, as you can't even inspect the object)
164 2017-02-25 08:26:14 0|wumpus|the ptential worry is in the networking code... some parsing problems throw exceptions, we need to be really sure to catch them and continue otherwise there's a DoS
165 2017-02-25 08:26:57 0|wumpus|e.g. serialization throws std::runtime_error instead of something specific
166 2017-02-25 08:27:30 0|sipa|serialization throws std::ios_base
167 2017-02-25 08:27:38 0|sipa|serialization throws std::ios_base::failure
168 2017-02-25 08:28:00 0|wumpus|okay That's always an i/o error. Better than I thought
169 2017-02-25 08:28:13 0|sipa|there are some other things that can throw that
170 2017-02-25 08:28:32 0|wumpus|catching that and just reporting it and continuing in the network code would be ok
171 2017-02-25 08:28:54 0|sipa|network code or net_handling?
172 2017-02-25 08:29:20 0|sipa|net handling should catch serialization failures, and disconnect the offending peer
173 2017-02-25 08:29:31 0|wumpus|well the packet processing. It should make sure that one node is disconnected and not thewhole program comes tumbling down
174 2017-02-25 08:29:36 0|wumpus|yes
175 2017-02-25 08:29:45 0|sipa|right
176 2017-02-25 08:29:55 0|sipa|in fact, it's that except catch this is the culprit here
177 2017-02-25 08:30:03 0|sipa|*that exact
178 2017-02-25 08:30:25 0|wumpus|right, it should be more specific
179 2017-02-25 08:30:43 0|wumpus|catching ios::base and potentially a few others instead of std::exception would go a long way
180 2017-02-25 08:30:50 0|sipa|agree
181 2017-02-25 08:33:24 0|wumpus|and some exceptions like std::bad_alloc should initiate a quick shutdown. It could still be a DoS, but the program is probably in such a bad state when it happens that continuing with the other nodes is hopeless.
182 2017-02-25 08:33:55 0|wumpus|(though theoretically it could stil be a botched alloc due to the remote peer specifying a very large buffer which would never occur in reality)
183 2017-02-25 08:36:45 0|wumpus|so I guess uploading executables for rc2 is no longer necessary and we should merge #9856 and do rc3?
184 2017-02-25 08:36:47 0|gribble|https://github.com/bitcoin/bitcoin/issues/9856 | Terminate immediately when allocation fails by theuni ÷ Pull Request #9856 ÷ bitcoin/bitcoin ÷ GitHub
185 2017-02-25 08:42:59 0|wumpus|speaking about memory usage: another report of running out of memory with 0.14. #9857
186 2017-02-25 08:43:01 0|gribble|https://github.com/bitcoin/bitcoin/issues/9857 | Sync problem crashes Bitcoin: EXCEPTION: St9bad_alloc ÷ Issue #9857 ÷ bitcoin/bitcoin ÷ GitHub
187 2017-02-25 08:43:10 0|wumpus|maybe 0.14 increased steady-state memory usage?
188 2017-02-25 08:49:24 0|sipa|maybe
189 2017-02-25 08:49:39 0|sipa|the reuse of mempool memory by coincache may contribute to it
190 2017-02-25 08:50:01 0|sipa|though that doesn't affect worst-case memory usage, it may increase the average
191 2017-02-25 08:50:56 0|sipa|for RPi we should recommend lowering the default settings, i'm afraid
192 2017-02-25 08:56:44 0|wumpus|bah, what really helps when creating a gdb parachute is propagating the exit code, if not it will just always return success https://github.com/bitcoin/bitcoin/pull/9851/commits/6d7a789fb510d540906955c377143b92acb1fbb6 . Forgot this and might have been spinning for nothing for a while :p
193 2017-02-25 09:07:04 0|wumpus|uhm. Caught that issue in the act now. Unfortunately, the traceback is useless. Gdb near-crashes while generating it!
194 2017-02-25 09:07:29 0|wumpus|"Error while running hook_stop: Cannot access memory at address 0x24"
195 2017-02-25 09:08:03 0|wumpus|okay, so much for gdb...
196 2017-02-25 09:11:09 0|wumpus|that leaves trying to make it cough up a core dump, and uploading that somewhere...
197 2017-02-25 09:12:24 0|jeremyrubin|wumpus: cuckoocache in theory may have increase memory usage a bit
198 2017-02-25 09:13:07 0|jeremyrubin|(if you're curious for all steadystate memory usage increases)
199 2017-02-25 09:13:31 0|wumpus|by how much?
200 2017-02-25 09:13:36 0|jeremyrubin|unclear
201 2017-02-25 09:14:02 0|jeremyrubin|it always uses 100% of whatever's it's allocated
202 2017-02-25 09:14:27 0|jeremyrubin|whereas the previous one only uses space when it has entries (in theory)
203 2017-02-25 09:14:43 0|sipa|the default max is 32 MB instead of 40 MB though
204 2017-02-25 09:14:49 0|sipa|so at peak, it's smaller
205 2017-02-25 09:14:50 0|jeremyrubin|so at startup, would be using 32 MB whereas the prior was 0
206 2017-02-25 09:15:54 0|jeremyrubin|it's just unclear how long you have to be on the network to experience a full cache w/ old code
207 2017-02-25 09:16:33 0|jeremyrubin|anyways I'm pretty sure this isn't the cause :)
208 2017-02-25 09:17:24 0|sipa|1GB of memory is not much
209 2017-02-25 09:18:11 0|wumpus|indeed, runninga node with 1GB of memory has always required tweaking
210 2017-02-25 09:18:28 0|sipa|it's more surprising to me that 0.13.2 worked, though
211 2017-02-25 09:18:34 0|sipa|he's even reducing dbcache
212 2017-02-25 09:19:05 0|wumpus|I do see increased reporting of out-of-memory errors with 0.14
213 2017-02-25 09:19:43 0|jeremyrubin|yeah, totally could be the sigcache, could definitely put someone over-edge
214 2017-02-25 09:19:50 0|jeremyrubin|especially if they set the value higher
215 2017-02-25 09:19:59 0|jeremyrubin|e.g., 128 mb
216 2017-02-25 09:20:15 0|jeremyrubin|prior version you could set it to a GB and run happily for a while
217 2017-02-25 09:20:29 0|wumpus|they don't touch that setting, just dbcache
218 2017-02-25 09:20:44 0|wumpus|possibly they should. We should update the recommedations for low-memory nodes.
219 2017-02-25 09:21:29 0|jeremyrubin|I don't think practically they'd need to adjust it down, just be less agressive adjusting it up -- not much benefit in making it too large
220 2017-02-25 09:22:34 0|jeremyrubin|Can we (by any chance) fast track sipa's hack to get cuckoocache back to be non pow 2?
221 2017-02-25 09:23:15 0|jeremyrubin|it sucks if they're picking between 8 16 32 MB sizes, some more resolution there would be good
222 2017-02-25 09:23:42 0|wumpus|which # is that?
223 2017-02-25 09:24:10 0|jeremyrubin|#9533
224 2017-02-25 09:24:12 0|gribble|https://github.com/bitcoin/bitcoin/issues/9533 | Allow non-power-of-2 signature cache sizes by sipa ÷ Pull Request #9533 ÷ bitcoin/bitcoin ÷ GitHub
225 2017-02-25 09:26:06 0|wumpus|jeremyrubin: would help to have more than a concept ACK there from you
226 2017-02-25 09:26:44 0|jeremyrubin|k I'll do some more thorough review
227 2017-02-25 09:26:48 0|wumpus|thanks
228 2017-02-25 09:29:01 0|wumpus|also like gmaxwell I wonder what the performance difference would be, 8 multiplication would be slower than 8 logical AND
229 2017-02-25 09:31:38 0|jeremyrubin|I'm not currently set up to run great performance tests currently (will take some work to reset up my benching environment)
230 2017-02-25 09:31:58 0|jeremyrubin|But honestly performance was fine even when it was %
231 2017-02-25 09:32:08 0|wumpus|'fine' hehe
232 2017-02-25 09:32:18 0|jeremyrubin|*shrug*
233 2017-02-25 09:32:21 0|sipa|jeremyrubin: i'm sure that for the sigcache it doesn't matter
234 2017-02-25 09:32:35 0|sipa|but maybe it does for other things in the future
235 2017-02-25 09:32:37 0|wumpus|well it's obviously a compromise between memory usage and performance
236 2017-02-25 09:32:49 0|jeremyrubin|I mostly changed it to hash_mask to make sipa happy :p
237 2017-02-25 09:33:04 0|wumpus|neither '&' or '* >>' is 'better', but we need to be careful what to choose
238 2017-02-25 09:33:17 0|sipa|well you violated the core tenet of never using integer division by a variable!
239 2017-02-25 09:33:25 0|sipa|it invokes the wrath of maxwell
240 2017-02-25 09:33:33 0|jeremyrubin|It was checked to not be 0 ;)
241 2017-02-25 09:34:00 0|jeremyrubin|anyways...
242 2017-02-25 09:34:06 0|wumpus|division? where?
243 2017-02-25 09:34:20 0|jeremyrubin|(old version of cuckoocache used % in the hash computation)
244 2017-02-25 09:34:21 0|wumpus|I saw only multiplies, if there's a division, I'm going to NACK this one
245 2017-02-25 09:34:24 0|wumpus|ooh :p
246 2017-02-25 09:35:09 0|sipa|and that PR brings back the flexibility of variable sizes, but with a * and a >> instead of a %
247 2017-02-25 09:35:12 0|jeremyrubin|I recall benchmarking and not being able to see a difference between %, &, >>
248 2017-02-25 09:35:41 0|wumpus|yes using % would be more straightforward code readability wise but division is even worse than multiplication
249 2017-02-25 09:35:46 0|jeremyrubin|yeah
250 2017-02-25 09:35:56 0|jeremyrubin|so I think it's fairly safe to take this one.
251 2017-02-25 09:36:01 0|wumpus|yes
252 2017-02-25 09:36:08 0|jeremyrubin|My one suggestion sipa is to convert size to a uint64_t
253 2017-02-25 09:36:12 0|jeremyrubin|like the member
254 2017-02-25 09:36:22 0|jeremyrubin|so you can get rid of all the casts in the hash part
255 2017-02-25 09:36:35 0|wumpus|would this result in more efficient code on 32-bit though?
256 2017-02-25 09:36:42 0|wumpus|as it guarantees that the top bits are zero
257 2017-02-25 09:36:51 0|jeremyrubin|I'm not sure
258 2017-02-25 09:37:03 0|jeremyrubin|but they have to be cast to uint64_t in hashing part now
259 2017-02-25 09:37:19 0|jeremyrubin|so my thought is not having to do that cast every hash is slightly more efficient code
260 2017-02-25 09:37:35 0|wumpus|casts don't generally generate code
261 2017-02-25 09:37:50 0|jeremyrubin|uint32_t to uint64_t even?
262 2017-02-25 09:38:01 0|jeremyrubin|there are instructions that auto-0 it to the alu?
263 2017-02-25 09:38:04 0|jeremyrubin|that's fine then
264 2017-02-25 09:38:16 0|wumpus|no. on 32-bit, it just means 'assume top bits are 0'. On 64 bit evrything is stored in 64-bit registers and the top bits are always 0
265 2017-02-25 09:38:38 0|jeremyrubin|cool, then that's fine as is
266 2017-02-25 09:38:45 0|wumpus|at least on x86_64 you can't load something into the lower 32 bits without zeroing the upper bits
267 2017-02-25 09:39:12 0|jeremyrubin|I'd consider it an Ack from me then.
268 2017-02-25 09:40:01 0|jeremyrubin|The tests are pretty agressive on cuckoocache, so if the hashing idea were broken in some way it would either catch it or it would be broken in a very minor way.
269 2017-02-25 09:40:01 0|wumpus|of course the considereations depend on the specific architecture, but in general casts are very cheap or generate no extra code at all especially between big types. Arithmetic with 8-bit and 16-bit values can result in extra 'ANDs' in some architectures you're right
270 2017-02-25 09:40:19 0|wumpus|but then it's not so much the cast that generates the code
271 2017-02-25 09:40:55 0|wumpus|yes, having good tests is great!
272 2017-02-25 09:41:41 0|wumpus|the state of testing in bitcoin core improved a lot last year
273 2017-02-25 09:42:08 0|jeremyrubin|*cough* merge my checkqueue tests *cough* :P
274 2017-02-25 09:42:23 0|wumpus|which reminds me I should merge #9847
275 2017-02-25 09:42:24 0|gribble|https://github.com/bitcoin/bitcoin/issues/9847 | Extra test vector for BIP32 by sipa ÷ Pull Request #9847 ÷ bitcoin/bitcoin ÷ GitHub
276 2017-02-25 09:42:28 0|wumpus|jeremyrubin: oh, aren't they yet?
277 2017-02-25 09:42:56 0|jeremyrubin|#9497
278 2017-02-25 09:42:58 0|gribble|https://github.com/bitcoin/bitcoin/issues/9497 | CCheckQueue Unit Tests by JeremyRubin ÷ Pull Request #9497 ÷ bitcoin/bitcoin ÷ GitHub
279 2017-02-25 09:43:00 0|wumpus|usually test-only pulls are merged very quickly
280 2017-02-25 09:43:27 0|wumpus|then again I cannot really keep up anymore with the volume of pulls
281 2017-02-25 09:43:53 0|jeremyrubin|It's actually fine that they weren't, I found a mixed up loop condition (such that one test wasn't doing much) when I was sitting with kalle
282 2017-02-25 09:43:56 0|wumpus|which is good and bad
283 2017-02-25 09:44:01 0|bitcoin-git|13bitcoin/06master 1430aedcb 15Pieter Wuille: BIP32 extra test vector
284 2017-02-25 09:44:01 0|bitcoin-git|[13bitcoin] 15laanwj pushed 2 new commits to 06master: 02https://github.com/bitcoin/bitcoin/compare/f19afdbfb4cb...6206252e5073
285 2017-02-25 09:44:02 0|bitcoin-git|13bitcoin/06master 146206252 15Wladimir J. van der Laan: Merge #9847: Extra test vector for BIP32...
286 2017-02-25 09:44:02 0|sipa|but division is even worse than multiplication -> nehalem can issue a 64-bit mul every 2 cycles; a division every 19-69 cycles
287 2017-02-25 09:44:15 0|jeremyrubin|But that's all fixed now
288 2017-02-25 09:44:20 0|bitcoin-git|[13bitcoin] 15laanwj closed pull request #9847: Extra test vector for BIP32 (06master...06bip32up) 02https://github.com/bitcoin/bitcoin/pull/9847
289 2017-02-25 09:44:24 0|wumpus|sipa: yeah, that would classify as 'bad' in my eyes :)
290 2017-02-25 09:44:38 0|sipa|wumpus: the mul or the div?
291 2017-02-25 09:44:42 0|wumpus|the div
292 2017-02-25 09:44:47 0|jeremyrubin|nehalem?
293 2017-02-25 09:44:52 0|sipa|yes, absolutely
294 2017-02-25 09:45:10 0|sipa|but 'even worse' sounds like multiplication is already pretty bad... every 2 cycles is pretty good i'd say :)
295 2017-02-25 09:45:24 0|wumpus|divs have historically been really bad, some CPUs don't even have a specific instruction for it, but even those that have it's generally terrible
296 2017-02-25 09:45:43 0|wumpus|because of the underlying computational structure of course, not because those CPUs are badly implemented or so :)
297 2017-02-25 09:45:50 0|jeremyrubin|ah, nehalem: an intel arch name I'd thoroughly forgotten :)
298 2017-02-25 09:46:33 0|wumpus|2 cycles is pretty good, espeically if it's *every* 2 cycles
299 2017-02-25 09:46:48 0|jeremyrubin|and that hash routine should pipeline well
300 2017-02-25 09:46:55 0|sipa|the latency is higher than 2 cycles
301 2017-02-25 09:46:57 0|wumpus|usually mul is quite quick but the number of multiplication units is limited, so issuing a lot will bottleneck on that
302 2017-02-25 09:47:02 0|wumpus|but not on nehalem, apparently :)
303 2017-02-25 09:47:06 0|sipa|but the throughput is once every 2 cycles
304 2017-02-25 09:47:20 0|sipa|same on skylaky, btw
305 2017-02-25 09:47:23 0|wumpus|what would be the throughput for logical and?
306 2017-02-25 09:47:54 0|sipa|0.5 or 1, depending on the type of inputs
307 2017-02-25 09:47:55 0|wumpus|can't be much faster than per 2 cycles
308 2017-02-25 09:47:57 0|wumpus|oh!
309 2017-02-25 09:48:06 0|sipa|0.25 for and with a constant, even
310 2017-02-25 09:48:23 0|sipa|http://www.agner.org/optimize/instruction_tables.pdf
311 2017-02-25 09:48:24 0|wumpus|modern CPU architectures are so fast we can almost ignore computation overhead. Too bad memory didn't keep up :)
312 2017-02-25 09:48:44 0|jeremyrubin|Also if you want to really make that hashing faster, could switch to balke2b or something
313 2017-02-25 09:49:35 0|wumpus|sipa: thanks , that seems like a very useful document
314 2017-02-25 09:50:11 0|jeremyrubin|wumpus: +1 thanks!
315 2017-02-25 09:51:00 0|sipa|LOCK ADD... 18 cycles!
316 2017-02-25 09:53:27 0|wumpus|ieeeh
317 2017-02-25 09:57:40 0|jeremyrubin|so I'm trying to write some test software
318 2017-02-25 09:57:48 0|jeremyrubin|and I want to force segwit active
319 2017-02-25 09:58:06 0|jeremyrubin|seems that isn't trivial to do?
320 2017-02-25 09:58:11 0|sipa|regtest?
321 2017-02-25 09:59:21 0|jeremyrubin|it seems to still require some blocks to vote for it before it can be made active?
322 2017-02-25 09:59:37 0|sipa|yes
323 2017-02-25 09:59:57 0|sipa|3*144 blocks
324 2017-02-25 10:00:49 0|jeremyrubin|hmm... would be kinda nice to not have to do that, similar to how I can set BIPxxxHeight = 0.
325 2017-02-25 10:14:14 0|jeremyrubin|would it be safe to change VersionBitsState to read VersionBitsDeploymentInfo for an override state parameter?
326 2017-02-25 10:19:04 0|jeremyrubin|no I guess I'd neet to change GetStateFor
327 2017-02-25 12:42:14 0|btcdrak|jeremyrubin you could just turn down the number of blocks required on regtest from 144 to something lower like 10.
328 2017-02-25 15:45:27 0|cfields|woohoo, 0.13 ec2 sync finally finished :)
329 2017-02-25 15:58:46 0|cfields|for those interested... ec2 xlarge 4core, 16gb ram. All default settings:
330 2017-02-25 15:59:04 0|cfields|v0.13.2: 01:12:39:43
331 2017-02-25 15:59:04 0|cfields|v0.14.0: 00:06:24:17
332 2017-02-25 16:01:10 0|paveljanik|nice!
333 2017-02-25 16:03:55 0|cfields|Setting a reasonable dbcache cuts the 0.14 time in half. Unsure what it does for 0.13. Similar, I assume
334 2017-02-25 16:04:34 0|cfields|Actually no... 0.13 is doing lots more validation. So it'd still be much more than half.
335 2017-02-25 16:48:26 0|wumpus|FYI rc2 executables are online: https://bitcoin.org/bin/bitcoin-core-0.14.0/test.rc2/
336 2017-02-25 17:06:48 0|Lauda|sync is faster with 0.14.0?
337 2017-02-25 19:57:49 0|bitcoin-git|[13bitcoin] 15jameshilliard opened pull request #9858: remove TestBlockValidity from CreateNewBlock critical path (06master...06remove-tbv) 02https://github.com/bitcoin/bitcoin/pull/9858
338 2017-02-25 21:48:49 0|cfields|luke-jr: jinx :)
339 2017-02-25 21:48:55 0|luke-jr|?
340 2017-02-25 21:49:09 0|cfields|luke-jr: 9858
341 2017-02-25 21:54:21 0|luke-jr|i c â˺
342 2017-02-25 22:24:47 0|luke-jr|would be nice if everything that went into stable branches were daggy fixes.
343 2017-02-25 22:25:36 0|sipa|daggy?
344 2017-02-25 22:27:35 0|luke-jr|sipa: http://wiki.secondlife.com/wiki/Creating_a_version_control_repository#The_.22Daggy.22_Fix
345 2017-02-25 22:31:19 0|luke-jr|basically means bugfix commits are parented to the bug-introducing commit, allowing a merge into each branch of the same fix commit
346 2017-02-25 22:37:18 0|gmaxwell|are really that many of our fixes ones where there is a obvious bug inducing commit which is recent enough for that to be useful?
347 2017-02-25 22:41:35 0|luke-jr|gmaxwell: maybe not. as a middle-ground, I typically just try to rebase to the most recent branching point that is a clean merge to master
348 2017-02-25 22:42:02 0|luke-jr|eg, everything I do now would be based on 9828f9a at the latest (if possible)
349 2017-02-25 22:42:26 0|luke-jr|(which I have simple-tagged as "branch-0.14" in my repo)