1 2017-02-24 00:00:03 0|sipa|but the progress estimation code was changed significantly in 0.14
2 2017-02-24 00:02:08 0|gmaxwell|reindexing spends something like 20 minutes up front scanning for headers, which might be distorting your numbers.
3 2017-02-24 00:35:15 0|pfeerpedr|who do i need to talk to in order to speed up my transaction?
4 2017-02-24 00:39:06 0|bitcoin-git|[bitcoin] MarcoFalke opened pull request #9846: doc: Small release notes fixups in the list of pulls (0.14...Mf1702-014doc) https://github.com/bitcoin/bitcoin/pull/9846
5 2017-02-24 02:02:55 0|bitcoin-git|[bitcoin] sipa opened pull request #9847: Extra test vector for BIP32 (master...bip32up) https://github.com/bitcoin/bitcoin/pull/9847
6 2017-02-24 02:42:21 0|achow101|cfields: just reset my gitian and got 8d4bb27b5ab1916f04b74a2bcdccf8781c46fea96a3d5eb4a4a7f587577df64c bitcoin-0.14.0-osx-unsigned.dmg
7 2017-02-24 02:42:25 0|achow101|does that match yours?
8 2017-02-24 02:42:43 0|achow101|It's probably doing the alternating thing again
9 2017-02-24 02:56:28 0|fanquake|achow101 looks like it does match
10 2017-02-24 02:57:00 0|fanquake|So you've got the alternating builds again? I'm just about to finish mine.
11 2017-02-24 03:11:30 0|bitcoin-git|[bitcoin] appop opened pull request #9848: update (master...master) https://github.com/bitcoin/bitcoin/pull/9848
12 2017-02-24 03:12:23 0|bitcoin-git|[bitcoin] fanquake closed pull request #9848: update (master...master) https://github.com/bitcoin/bitcoin/pull/9848
13 2017-02-24 03:21:37 0|fanquake|achow101 Interestingly, my osx gitian results now match cfields. Which is weird, because nothing's changed since rc1 that could have fixed gitian issues.
14 2017-02-24 03:42:46 0|achow101|actually, just ran gitian again and it got cfields's results. I'll run it a few more times to make sure it is deterministic
15 2017-02-24 04:01:35 0|bitcoin-git|[bitcoin] luke-jr opened pull request #9849: Qt: Network Watch tool (master...gui_netwatch) https://github.com/bitcoin/bitcoin/pull/9849
16 2017-02-24 04:03:04 0|cfields|achow101: it'd be really helpful if you could upload the .o files from a non-matching build
17 2017-02-24 04:03:54 0|achow101|I think I can give you the kvm image of the non-matching build. I just need to make sure it is the right one
18 2017-02-24 04:04:02 0|cfields|achow101: "on-target" after the build gives you a shell
19 2017-02-24 04:09:33 0|achow101|cfields: well that build ended a while ago and I have since done other builds. right now I am trying to start the vm with that image of the mismatching build which I saved and then ssh'ing into it, but it doesn't seem to be working now
20 2017-02-24 04:16:02 0|cfields|achow101: ok, let me know if you manage to get them. I'll check back in the morning
21 2017-02-24 04:39:45 0|achow101|cfields: I got all of the build stuff off of the vm and tar'ed it. It should contain all of the .o files. Download: https://drive.google.com/file/d/0Bxw3ip9QfNOUVzkwUnlhMTExYjg/view?usp=sharing
22 2017-02-24 04:40:06 0|achow101|also I can give you the vm which contains all of that stuff too. I'm waiting for the upload of that to finish
23 2017-02-24 04:46:31 0|achow101|cfields: vm with the mismatching build: https://drive.google.com/file/d/0Bxw3ip9QfNOUN0E2aDZZQU1Pd2s/view?usp=sharing
24 2017-02-24 05:25:06 0|cfields|achow101: er, you sure that's a broken build?
25 2017-02-24 05:30:13 0|luke-jr|(we have 3 sigs on rc2)
26 2017-02-24 05:30:27 0|luke-jr|oh, but not all matching
27 2017-02-24 05:31:27 0|cfields|luke-jr: yea, i think i'll delay signing until morning once a few more are in
28 2017-02-24 05:31:50 0|cfields|now that we have achow101's objects for comparison, I'm hoping it'll point to the culprit
29 2017-02-24 05:31:54 0|achow101|cfields: I'm pretty sure that's the broken build
30 2017-02-24 05:35:51 0|achow101|luke-jr: my matching osx ones are pr'ed
31 2017-02-24 05:36:12 0|cfields|achow101: are you positive? All of my object files are identical as far as i can tell
32 2017-02-24 05:37:12 0|achow101|yes.
33 2017-02-24 05:37:23 0|achow101|you can fire up the vm image I gave you to check as well
34 2017-02-24 05:44:05 0|cfields|achow101: ok nm, got the diff now
35 2017-02-24 05:48:06 0|achow101|cool
36 2017-02-24 05:53:11 0|cfields|achow101: mmm, they're different kernels
37 2017-02-24 05:53:37 0|cfields|that's the only obvious thing i see
38 2017-02-24 05:54:26 0|cfields|maybe qt embeds uname output?
39 2017-02-24 05:56:22 0|achow101|but why would it only affect osx?
40 2017-02-24 05:56:40 0|achow101|also, how are they different kernels? I thought the vms were built exactly the same
41 2017-02-24 05:56:56 0|cfields|they should be
42 2017-02-24 05:57:26 0|achow101|oh, maybe the upgrade that happens every time was failing some of the time?
43 2017-02-24 05:57:29 0|cfields|-uname -r = 3.13.0-108-generic
44 2017-02-24 05:57:29 0|cfields|+uname -r = 3.13.0-77-generic
45 2017-02-24 05:57:30 0|luke-jr|LXC uses the host's kernel
46 2017-02-24 05:57:45 0|luke-jr|so no matter what, we can't rely on kernels to match
47 2017-02-24 05:57:46 0|achow101|luke-jr: I'm using kvm
48 2017-02-24 05:59:30 0|cfields|luke-jr: well the fact that the kernels don't match is indicative that they're not using the same base
49 2017-02-24 05:59:42 0|cfields|in which case glibc (or something) may be different
50 2017-02-24 05:59:54 0|luke-jr|hm
51 2017-02-24 06:00:49 0|cfields|so it seems to be some kind of gitian issue
52 2017-02-24 07:42:24 0|luke-jr|jonasschnelli: what kind of locking issues? can you elaborate?
53 2017-02-24 07:43:23 0|jonasschnelli|luke-jr: the app is unresponsive. I had to force shut down... will take a closer look
54 2017-02-24 07:43:28 0|jonasschnelli|luke-jr: but I like the PR
55 2017-02-24 07:48:52 0|wumpus|so it looks like someone had the test_bitcoin issue outside of travis: #9850
56 2017-02-24 07:48:53 0|gribble|https://github.com/bitcoin/bitcoin/issues/9850 | test_bitcoin: /usr/include/boost/thread/pthread/recursive_mutex.hpp:104: boost::recursive_mutex::~recursive_mutex(): Assertion `!pthread_mutex_destroy() failed. · Issue #9850 · bitcoin/bitcoin · GitHub
57 2017-02-24 07:50:28 0|jonasschnelli|yes
58 2017-02-24 07:51:01 0|jonasschnelli|I tried to reproduce on ubuntu 14.04, but did not have the issue
59 2017-02-24 07:51:10 0|wumpus|same here.
60 2017-02-24 07:51:28 0|wumpus|did a depends build, just like travis, on 14.04, just like travis
61 2017-02-24 07:51:42 0|wumpus|so that means the same version of boost, gcc, etc
62 2017-02-24 07:52:07 0|wumpus|this is really strange
63 2017-02-24 07:52:29 0|jonasschnelli|Oh. Even that.
64 2017-02-24 07:53:17 0|gmaxwell|hurray! (?)
65 2017-02-24 07:53:30 0|jonasschnelli|I ran test_bitcoin in valgrind and I could see some uninitialised value
66 2017-02-24 07:54:15 0|jonasschnelli|invoked by the toggle_network RPC tests
67 2017-02-24 08:00:17 0|wumpus|jonasschnelli: that is a potential concern, however what happens in the RPC tests shouldn't affect test_bitcoin?
68 2017-02-24 08:00:38 0|jonasschnelli|wumpus: I meant the RPC unit tests...
69 2017-02-24 08:00:42 0|wumpus|no valgrind errors in test_bitcoin?
70 2017-02-24 08:00:44 0|wumpus|ooh!
71 2017-02-24 08:01:02 0|jonasschnelli|look for rpc_togglenetwork
72 2017-02-24 08:01:12 0|jonasschnelli|rpc_tests.cpp
73 2017-02-24 08:01:29 0|jonasschnelli|Not sure if its related... we have added this a couple of weeks (or even months) ago
74 2017-02-24 08:02:11 0|jonasschnelli|Here's my valgrind run: https://0bin.net/paste/2xS-7aRGhWA11BlS#uwUOiDB9X4h+puz6AxdtnWiMXF5KJlUhC-WFL8bCy4k
75 2017-02-24 08:02:32 0|jonasschnelli|This also frightens me: ==59692== Conditional jump or move depends on uninitialised value(s)
76 2017-02-24 08:05:54 0|gmaxwell|what version are you running it against? those line numbers do not agree with my code here.
77 2017-02-24 08:06:26 0|jonasschnelli|9949ebfa6a548260858df429f4d0e716e0a26065
78 2017-02-24 08:06:31 0|jonasschnelli|I think this is 0.14.0rc1
79 2017-02-24 08:07:41 0|jonasschnelli|my setup: ./configure --enable-zmq --enable-glibc-back-compat --enable-reduce-exports CPPFLAGS=-DDEBUG_LOCKORDER --with-incompatible-bdb
80 2017-02-24 08:08:02 0|jonasschnelli|(same as the failing travis setup)
81 2017-02-24 08:08:11 0|gmaxwell|oh geesh we have source files with the same name. bet that'll be fun for anyone trying to build with msvc.
82 2017-02-24 08:09:12 0|jonasschnelli|you mean the problem when we removed the rpc_ prefix and moved them into the rpc/ folder?
83 2017-02-24 08:09:41 0|gmaxwell|yea, at least last time I used it MSVC couldn't handle source files having the same name even if they were in different directories. :)
84 2017-02-24 08:09:42 0|jonasschnelli|My IDE's find-by-filename also doesn't like this
85 2017-02-24 08:10:16 0|jonasschnelli|We could have kept the rpc_ prefix even after moving them into the specific folder
86 2017-02-24 08:13:17 0|gmaxwell|so in those rpc tests I don't see anything that sets up the connman object. But if it's executing those objects it's not null. How is g_connman set up in the tests?
87 2017-02-24 08:13:47 0|fanquake|jonasschnelli I can see the same results with valgrind
88 2017-02-24 08:13:49 0|jonasschnelli|TestingSetup() has a g_connman = std::unique_ptr<CConnman>(new CConnman(0x1337, 0x1337)); // Deterministic randomness for tests.
89 2017-02-24 08:14:48 0|fanquake|https://0bin.net/paste/DLBX7+ZYaQ79TrRS#ACJ-Fp8c8aAZrLW2jDShhRMKbnTlxlnRJDkCRhXfpcI
90 2017-02-24 08:15:16 0|jonasschnelli|Thanks fanquake
91 2017-02-24 08:33:12 0|cfields|https://github.com/theuni/bitcoin/commit/72aa3324bc69640937f2fda6a63634bcf1e8c6c1
92 2017-02-24 08:33:26 0|cfields|should fix the connman issue, though i seriously doubt that's the crasher
93 2017-02-24 08:33:58 0|cfields|(thanks marcofalke for pointing that out earlier)
94 2017-02-24 08:35:04 0|cfields|i'll PR that in the morning
95 2017-02-24 09:22:35 0|bitcoin-git|bitcoin/master f81f0d0 Russell Yanofsky: Update sendfrom RPC help to correct coin selection misconception
96 2017-02-24 09:22:35 0|bitcoin-git|[bitcoin] laanwj pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/692c9eddba67...00285cece814
97 2017-02-24 09:22:36 0|bitcoin-git|bitcoin/master 00285ce Wladimir J. van der Laan: Merge #9840: Update sendfrom RPC help to correct coin selection misconception...
98 2017-02-24 09:22:58 0|bitcoin-git|[bitcoin] laanwj closed pull request #9840: Update sendfrom RPC help to correct coin selection misconception (master...pr/fromacct) https://github.com/bitcoin/bitcoin/pull/9840
99 2017-02-24 09:54:06 0|bitcoin-git|bitcoin/master ef9f495 Marko Bencun: Trivial: fix comments referencing AppInit2...
100 2017-02-24 09:54:06 0|bitcoin-git|[bitcoin] laanwj pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/00285cece814...dd6e0d630167
101 2017-02-24 09:54:07 0|bitcoin-git|bitcoin/master dd6e0d6 Wladimir J. van der Laan: Merge #9833: Trivial: fix comments referencing AppInit2...
102 2017-02-24 09:54:26 0|bitcoin-git|[bitcoin] laanwj closed pull request #9833: Trivial: fix comments referencing AppInit2 (master...stalecomments) https://github.com/bitcoin/bitcoin/pull/9833
103 2017-02-24 09:58:08 0|paveljanik|FWIW - I'm not able to reproduce test_bitcoin failures on any of my machines (different unices) :-(
104 2017-02-24 10:03:07 0|wumpus|darn
105 2017-02-24 10:03:22 0|bitcoin-git|[bitcoin] laanwj closed pull request #9846: doc: Small release notes fixups in the list of pulls (0.14...Mf1702-014doc) https://github.com/bitcoin/bitcoin/pull/9846
106 2017-02-24 10:05:43 0|wumpus|there seems to be nothing *special* in the config.log posted in #9850
107 2017-02-24 10:05:45 0|gribble|https://github.com/bitcoin/bitcoin/issues/9850 | test_bitcoin: /usr/include/boost/thread/pthread/recursive_mutex.hpp:104: boost::recursive_mutex::~recursive_mutex(): Assertion `!pthread_mutex_destroy() failed. · Issue #9850 · bitcoin/bitcoin · GitHub
108 2017-02-24 10:06:12 0|wumpus|standard ubuntu 16.04 versions of everything
109 2017-02-24 10:08:46 0|wumpus|no arguments to configure
110 2017-02-24 10:22:57 0|paveljanik|I suspect some travis issue
111 2017-02-24 10:23:11 0|paveljanik|(even if it was reproduced outside of it)
112 2017-02-24 10:24:54 0|wumpus|I forgot something in my testing yesterday: the travis build passes --enable-glibc-back-compat --enable-reduce-exports and LDFLAGS=-static-libstdc++. No difference in reproduction, though
113 2017-02-24 10:26:07 0|wumpus|I also test it faster now, launch test_bitcoin and kill it after a second (after all, the problem happens just before the Running ... line so there's no need to go all the way)
114 2017-02-24 10:30:09 0|wumpus|in any case it just works perfectly, every time, no matter what I do. Almost feels like travis is trolling us
115 2017-02-24 10:35:37 0|gmaxwell|"Why do the patterns of failures seem to be spelling ascii digits? ...'wouldnt want to give yo..'"
116 2017-02-24 10:36:31 0|wumpus|hehe, yes that would be a giveaway
117 2017-02-24 10:45:14 0|wumpus|never felt so unhappy to see "*** No errors detected"
118 2017-02-24 10:54:54 0|wumpus|well, so much for trying to reproduce locally, going to try set up a trap for this on travis
119 2017-02-24 11:05:48 0|wumpus|ok my gdb script is working, this should work
120 2017-02-24 11:10:51 0|bitcoin-git|[bitcoin] laanwj opened pull request #9851: [do not merge] travis gdb parachute for #9825 (master...2017_02_travisissue) https://github.com/bitcoin/bitcoin/pull/9851
121 2017-02-24 11:41:53 0|bitcoin-git|[bitcoin] zcc0721 opened pull request #9852: Merge remote-tracking branch 'refs/remotes/bitcoin/master' (master...master) https://github.com/bitcoin/bitcoin/pull/9852
122 2017-02-24 11:42:48 0|bitcoin-git|[bitcoin] laanwj closed pull request #9852: Merge remote-tracking branch 'refs/remotes/bitcoin/master' (master...master) https://github.com/bitcoin/bitcoin/pull/9852
123 2017-02-24 11:49:17 0|bitcoin-git|bitcoin/master dc222f8 Karl-Johan Alm: Trivial: Rephrase the definition of difficulty in the code.
124 2017-02-24 11:49:17 0|bitcoin-git|[bitcoin] laanwj pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/dd6e0d630167...f19afdbfb4cb
125 2017-02-24 11:49:18 0|bitcoin-git|bitcoin/master f19afdb Wladimir J. van der Laan: Merge #9612: [trivial] Rephrase the definition of difficulty....
126 2017-02-24 11:49:33 0|bitcoin-git|[bitcoin] laanwj closed pull request #9612: [trivial] Rephrase the definition of difficulty. (master...clarify-difficulty) https://github.com/bitcoin/bitcoin/pull/9612
127 2017-02-24 12:00:41 0|wumpus|wth, one of the builds in #9825 is rebuilding all the dependencies?
128 2017-02-24 12:00:42 0|gribble|https://github.com/bitcoin/bitcoin/issues/9825 | Intermittent FAIL: test/test_bitcoin in Travis · Issue #9825 · bitcoin/bitcoin · GitHub
129 2017-02-24 12:01:14 0|wumpus|eh #9851
130 2017-02-24 12:01:15 0|gribble|https://github.com/bitcoin/bitcoin/issues/9851 | [do not merge] travis gdb parachute for #9825 by laanwj · Pull Request #9851 · bitcoin/bitcoin · GitHub
131 2017-02-24 12:04:34 0|wumpus|Everything that can go wrong is going wrong, man, it's hard to think of a more nightmarish way to debug things. Well maybe debugging the kernel for GPU cache issues wins by a bit :/
132 2017-02-24 12:06:19 0|wumpus|I'm going to cancel all other travis builds to give this one priority, sorry
133 2017-02-24 12:11:06 0|wumpus|ah the builds are starting, let's see what surprises await this time
134 2017-02-24 12:11:11 0|wumpus|NOOOOOOO don't start building ccache :(
135 2017-02-24 12:41:29 0|wumpus|cfields: what would be the best way to skip building of dependencies for a PR, for debugging?
136 2017-02-24 12:43:41 0|wumpus|I don't understand why all three builds of #9851 trigger a complete dependency rebuild, but this way it's not going to work, I need a fast iteration time to have any chance of reproducing the issue
137 2017-02-24 12:43:43 0|gribble|https://github.com/bitcoin/bitcoin/issues/9851 | [do not merge] travis gdb parachute for #9825 by laanwj · Pull Request #9851 · bitcoin/bitcoin · GitHub
138 2017-02-24 12:49:42 0|wumpus|oh not all three, just #3, which is the nowallet one. Could just remove that one.
139 2017-02-24 12:49:44 0|gribble|https://github.com/bitcoin/bitcoin/issues/3 | Encrypt wallet · Issue #3 · bitcoin/bitcoin · GitHub
140 2017-02-24 14:18:37 0|achow101|did the signed binary detached sigs come out yet?
141 2017-02-24 14:27:47 0|BlueMatt|wumpus: you could do it on your own personal fork?
142 2017-02-24 14:45:56 0|jonasschnelli|Any idea why the LXC gitian initialization takes that long?
143 2017-02-24 14:46:09 0|jonasschnelli|Here it takes >5mins during "Upgrading system, may take a while"... seems to be very long
144 2017-02-24 14:46:33 0|jonasschnelli|(step between "install.log" and starting of "build.log")
145 2017-02-24 15:36:42 0|cfields|wumpus: note that DEBUG=1 is used for the crash case. That adds the extra bounds checking from libstdc++
146 2017-02-24 15:41:35 0|cfields|wumpus: as for rebuilding depends, the travis cache depends on the env vars set. So if you change an env var, it will create a new cache because it looks like a new build that it shouldn't clobber
147 2017-02-24 15:42:04 0|cfields|where "change" also includes adding/removing env vars
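A toy model of the caching behaviour cfields describes, assuming the cache key is derived from the full set of environment variables. Travis's real key derivation is not shown in this log, so `CacheKey` and the variable names used with it are purely illustrative.

```cpp
#include <map>
#include <string>

// Toy model only: not Travis's actual algorithm.
// The key is built from every environment variable name and value, so
// changing, adding, or removing a variable produces a different key,
// which would look like a brand-new build with an empty cache.
std::string CacheKey(const std::map<std::string, std::string>& env)
{
    std::string key;
    for (const auto& [name, value] : env) {  // std::map iterates in sorted order
        key += name + "=" + value + ";";
    }
    return key;
}
```

Under this model, adding a single debugging variable to a build's environment is enough to miss the existing depends cache, which would explain the full dependency rebuild wumpus saw.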
148 2017-02-24 15:48:33 0|cfields|gitian builders: sigs for v0.14.0rc2 are pushed
149 2017-02-24 16:13:22 0|wumpus|ah so the env vars are the secret :)
150 2017-02-24 16:29:04 0|bitcoin-git|[13bitcoin] 15jnewbery opened pull request #9853: Fix error codes from various RPCs (06master...06fixerrorcodes) 02https://github.com/bitcoin/bitcoin/pull/9853
151 2017-02-24 16:29:52 0|bitcoin-git|[13bitcoin] 15jnewbery closed pull request #9713: Fix error causes and messages in rpc/net.cpp (06master...06fixsetbanerrormessages) 02https://github.com/bitcoin/bitcoin/pull/9713
152 2017-02-24 16:29:58 0|bitcoin-git|[13bitcoin] 15jnewbery closed pull request #9714: Return correct error codes from bumpfee() (06master...06bumpfeeerrormessages) 02https://github.com/bitcoin/bitcoin/pull/9714
153 2017-02-24 16:39:36 0|BlueMatt|so now that we have named args someone should probably do a pass and fix the million places that we reject args that are null even when they have a default value, I suppose?
154 2017-02-24 16:41:56 0|wumpus|BlueMatt: yes - null should be interpreted as the default value, on a call by call basis
155 2017-02-24 16:42:08 0|wumpus|I intend to get around to that for 0.15
156 2017-02-24 16:46:19 0|wumpus|in most cases it's trivial
157 2017-02-24 16:46:54 0|wumpus|there are a few such as getbalance that have slightly different functionality based on the number of arguments, some discussion will be needed there
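A minimal sketch of the "null means default" idea, assuming a missing or null named argument is modelled as an empty `std::optional`. `ArgOrDefault`, `ExampleParams`, and `EffectiveMinconf` are hypothetical names for illustration, not Bitcoin Core's RPC code.

```cpp
#include <optional>

// Hypothetical helper: a null argument (modelled as std::nullopt) selects
// the documented default instead of being rejected as invalid.
template <typename T>
T ArgOrDefault(const std::optional<T>& arg, const T& default_value)
{
    return arg.has_value() ? *arg : default_value;
}

// Hypothetical parameter set for an RPC that accepts named arguments.
struct ExampleParams {
    std::optional<int> minconf;             // assumed default: 1
    std::optional<bool> include_watchonly;  // assumed default: false
};

int EffectiveMinconf(const ExampleParams& params)
{
    return ArgOrDefault(params.minconf, 1);
}
```

Calls such as getbalance, whose behaviour depends on how many arguments were given, would not fit this pattern directly, which matches wumpus's note that those need discussion.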
158 2017-02-24 21:17:25 0|sipa|i can't file an issue right now, but my RPi3 bitcoind OOMed, and marked a block invalid as a result
159 2017-02-24 21:17:32 0|sipa|that's very bad...
160 2017-02-24 21:17:37 0|sipa|on 0.14.0rc1
161 2017-02-24 21:28:10 0|cfields|sipa: yikes
162 2017-02-24 21:28:28 0|cfields|sipa: any idea where it oom'd?
163 2017-02-24 22:23:05 0|sipa|cfields: #9854
164 2017-02-24 22:23:06 0|gribble|https://github.com/bitcoin/bitcoin/issues/9854 | Bitcoind 0.14.0rc1: OOM -> block marked invalid · Issue #9854 · bitcoin/bitcoin · GitHub
165 2017-02-24 23:10:36 0|cfields|sipa: seems i just managed to bring down my dev box while testing a fix (forcing OOM). Hope you're happy :)
166 2017-02-24 23:11:33 0|cfields|woohoo, rescued
167 2017-02-24 23:16:46 0|BlueMatt|cfields: so you have a fix? or should I go look into it?
168 2017-02-24 23:17:11 0|cfields|BlueMatt: yea, i have a patch ready. I'm uneasy about it though, so debate welcome
169 2017-02-24 23:17:12 0|cfields|sec
170 2017-02-24 23:18:18 0|BlueMatt|k
171 2017-02-24 23:22:04 0|cfields|BlueMatt: see 9854
172 2017-02-24 23:22:09 0|BlueMatt|oh
173 2017-02-24 23:22:12 0|BlueMatt|hmmmm, I like that
174 2017-02-24 23:22:29 0|BlueMatt|wait, does this apply to more than bad_alloc?
175 2017-02-24 23:23:15 0|cfields|no
176 2017-02-24 23:23:19 0|BlueMatt|if we can make it apply only to std::bad_alloc then I'm all for it (or is there a list of all the things this could apply to?)
177 2017-02-24 23:23:19 0|gmaxwell|cfields: next time replace malloc with a wrapper. :P
178 2017-02-24 23:23:23 0|BlueMatt|lol
179 2017-02-24 23:23:55 0|gmaxwell|BlueMatt: sipa pointed out the error to me earlier in private, my comment:
180 2017-02-24 23:23:58 0|cfields|gmaxwell: you mean new? :)
181 2017-02-24 23:23:58 0|gmaxwell|11:55 <gmaxwell> God damnit. it really should not reject the block because of a fucking exception!
182 2017-02-24 23:24:01 0|gmaxwell|11:55 <gmaxwell> I hate that we use exceptions for error handling in the serialization.
183 2017-02-24 23:24:04 0|gmaxwell|11:56 <gmaxwell> maybe we can wrap the allocator so that failures kill the process.
184 2017-02-24 23:24:21 0|gmaxwell|cfields: well I mean the underlying libc function new calls, which is malloc. (same way tcmalloc replaces the allocator)
185 2017-02-24 23:24:27 0|cfields|gmaxwell: this isn't our exception. This is a c++ feature.
186 2017-02-24 23:24:53 0|cfields|gmaxwell: right, this overrides what happens when "new" fails. So this is essentially what you're asking for
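One standard way to override what happens when "new" fails is `std::set_new_handler`. The sketch below assumes that is the mechanism meant here; the patch itself is not quoted in this log, and `TerminateOnOom` and `InstallOomHandler` are illustrative names.

```cpp
#include <cstdio>
#include <exception>
#include <new>

// Invoked by operator new when an allocation fails. It must not allocate,
// so it writes a fixed message to stderr and terminates the process
// rather than letting std::bad_alloc propagate into validation code.
[[noreturn]] void TerminateOnOom()
{
    std::fputs("Error: out of memory. Terminating.\n", stderr);
    std::terminate();
}

// Install the handler once, early in startup (placement is an assumption).
void InstallOomHandler()
{
    std::set_new_handler(TerminateOnOom);
}
```

A new-handler only covers allocations made through operator new; code that calls malloc directly never reaches it.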
187 2017-02-24 23:25:11 0|gmaxwell|cfields: no no: Our mistake is that a var int decode failure is an exception. Because of this we cannot wrap block processing with a catch * {tell user their hardware is befucked or something bad happened}.
188 2017-02-24 23:25:55 0|cfields|gmaxwell: oh, i see what you mean
189 2017-02-24 23:26:04 0|gmaxwell|Which basically means that random programming errors that throw exceptions can cause blocks to be rejected instead of the node shutting down, which is exactly what produced the bdb locks as a fork rather than a brief DOS.
190 2017-02-24 23:26:44 0|BlueMatt|wait, ok, so has someone identified what actually happened here?
191 2017-02-24 23:26:50 0|gmaxwell|There are basically three states for block processing: "I have a valid block", "I have an invalid block.", and "I notice that I am confused." the latter should shut down without marking the block invalid.
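The three states could be sketched as below, assuming consensus failures are reported through a return value and anything thrown is an internal error. Per the discussion that is not how deserialization behaves today, so this shows the target design rather than existing code; `BlockCheckResult` and `ClassifyBlock` are hypothetical names.

```cpp
#include <functional>

// The three outcomes gmaxwell lists for block processing.
enum class BlockCheckResult {
    Valid,          // "I have a valid block"
    Invalid,        // "I have an invalid block"
    InternalError,  // "I notice that I am confused": shut down, mark nothing invalid
};

// Hypothetical wrapper. consensus_check returns true/false for valid/invalid
// and throws only on internal failures (out of memory, database errors, ...).
BlockCheckResult ClassifyBlock(const std::function<bool()>& consensus_check)
{
    try {
        return consensus_check() ? BlockCheckResult::Valid
                                 : BlockCheckResult::Invalid;
    } catch (...) {
        // Never translate an unexpected exception into "invalid block".
        return BlockCheckResult::InternalError;
    }
}
```

A caller would treat `InternalError` as a reason to stop the node, leaving the block's validity undecided.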
192 2017-02-24 23:26:52 0|sipa|gmaxwell: i think you're overgeneralizing
193 2017-02-24 23:27:05 0|cfields|gmaxwell: i completely agree. but this is a specific case that can be easily detected
194 2017-02-24 23:27:13 0|sipa|gmaxwell: problems during deserialization shouldn't _ever_ cause a block to be marked invalid
195 2017-02-24 23:27:20 0|gmaxwell|yes, this one we can work around. But where is the next one? this is the second one of those btw.
196 2017-02-24 23:27:27 0|gmaxwell|Leveldb internal errors also used to do this to us.
197 2017-02-24 23:27:37 0|gmaxwell|Third if you count bdb's internal errors.
198 2017-02-24 23:27:42 0|cfields|gmaxwell: so let's fix that independently :)
199 2017-02-24 23:27:53 0|gmaxwell|I'm fine with your general fix approach for now.
200 2017-02-24 23:28:31 0|cfields|there's one gotcha there, though... prevector calls malloc directly
201 2017-02-24 23:28:40 0|gmaxwell|I am lamenting that C++ code randomly throws exceptions without documenting the possibility clearly. And that we make use of exceptions to mark invalidity. Which means that random internal errors can mark invalidity. And you all know I hate exceptions, so that bias is not in question. :)
202 2017-02-24 23:29:08 0|gmaxwell|cfields: why are you replacing new and not malloc? (I don't have a strong opinion, it's just a question)
203 2017-02-24 23:29:11 0|sipa|cfields: it could use new[] instead, i think
204 2017-02-24 23:29:20 0|sipa|gmaxwell: how do you replace malloc?
205 2017-02-24 23:29:40 0|gmaxwell|glibc has a specific override. But perhaps there is no portable way?
206 2017-02-24 23:29:42 0|sipa|you'd need to do it with link-time magic, and hope that libstdc++ doesn't bypass it somehow
207 2017-02-24 23:29:48 0|BlueMatt|I'm still confused, where do we use such exceptions to mark invalidity?
208 2017-02-24 23:29:54 0|sipa|BlueMatt: i don't know!
209 2017-02-24 23:29:58 0|sipa|we shouldn't!
210 2017-02-24 23:30:12 0|BlueMatt|yes, I dont see the specific issue here, yet
211 2017-02-24 23:30:24 0|gmaxwell|sipa: your logs showed we did exactly that.
212 2017-02-24 23:30:29 0|BlueMatt|gmaxwell: no they dont
213 2017-02-24 23:30:37 0|BlueMatt|"ERROR: ConnectBlock(): inputs missing/spent"
214 2017-02-24 23:30:39 0|cfields|BlueMatt: my take-away from the above was that if we didn't throw in deserialization, we could just wrap acceptblock and activatebestchain in try/catch(), and abort any time something's caught
215 2017-02-24 23:30:44 0|BlueMatt|that was after the bad_alloc
216 2017-02-24 23:31:02 0|cfields|gmaxwell: memory allocation failed, then the _next block_ was rejected
217 2017-02-24 23:31:08 0|BlueMatt|cfields: yes, and we should do that, probably still
218 2017-02-24 23:31:14 0|sipa|gmaxwell: my assumption is that the error _is_ caught somewhere, not passed up, and as a result a normal "fail" return value is returned, and a higher layer interprets that as invalid block
219 2017-02-24 23:31:27 0|sipa|gmaxwell: i don't think we have anywhere a direct "exception? mark invalid!" logic
220 2017-02-24 23:31:43 0|BlueMatt|sipa: script interpreter does
221 2017-02-24 23:31:46 0|BlueMatt|but thats it i believe
222 2017-02-24 23:32:26 0|BlueMatt|(in the debug log you posted I do not believe that was the error, either)
223 2017-02-24 23:32:58 0|gmaxwell|we do all over the place! we have a generic catch that returns false on functions that must be true for validity.
224 2017-02-24 23:33:11 0|BlueMatt|gmaxwell: we do?
225 2017-02-24 23:33:20 0|gmaxwell|Open up validation.cpp basically every catch in there does this.
226 2017-02-24 23:33:21 0|cfields|sipa: my take was that the block was accepted, but we didn't switch to the new tip, so the next block failed when looking up inputs
227 2017-02-24 23:33:24 0|BlueMatt|only script interpreter i believe
228 2017-02-24 23:34:33 0|gmaxwell|okay it's not as bad as I thought.
229 2017-02-24 23:34:40 0|sipa|did we in 0.14 introduce the SendRejectsAndCheckIfBanned(pfrom, connman) call in net_processing:2754 ?
230 2017-02-24 23:34:42 0|BlueMatt|well now that i check it is worse than I thought :p
231 2017-02-24 23:34:48 0|sipa|which before used to be inside the catch block?
232 2017-02-24 23:34:52 0|BlueMatt|some disk reads shit that probably should be smarter than it is
233 2017-02-24 23:35:11 0|BlueMatt|sipa: yes, and no, before it didnt exist
234 2017-02-24 23:35:17 0|BlueMatt|(was only in SendMessages)
235 2017-02-24 23:35:17 0|cfields|sipa: it's new, we used to only send rejects+ban from SendMessages()
236 2017-02-24 23:36:09 0|sipa|i see
237 2017-02-24 23:38:10 0|cfields|imo the throw happened somewhere around SetBestChain, it was just caught in ProcessMessages because that's the only place we do a generic catch(...)
238 2017-02-24 23:38:13 0|gmaxwell|BlueMatt: well I thought it was _every_ one of them, but I checked readblockfromdisk and it's not.