1 2018-06-10 02:46:59	0|gmaxwell|sipa: did you look at their implementation? https://github.com/armfazh/flo-shani-aesni/blob/master/sha256/flo-shani.c
 2 2018-06-10 02:51:39	0|sipa|gmaxwell: yes, just interleaving
 3 2018-06-10 07:36:16	0|ilufang|quit
 4 2018-06-10 09:07:12	0|provoostenator|sipa: thanks for the extra context. Maintaining a large dbcache is mainly useful during IBD, so can't the problem of reorgs be avoided by only doing the optimization for very deep blocks?
 5 2018-06-10 09:09:34	0|provoostenator|And just in case, if during IBD an alternative set of headers is found that would trigger a deep reorg, you'd flush the cache and turn off the optimization, before switching to that new branch.
 6 2018-06-10 09:13:22	0|provoostenator|Right now it seems that 500 MB < dbcache < 7000 MB is a performance dead zone. Though I can try tweaking #11658 to see where the diminishing returns are.
 7 2018-06-10 09:13:24	0|gribble|https://github.com/bitcoin/bitcoin/issues/11658 | During IBD, when doing pruning, prune 10% extra to avoid pruning again soon after by luke-jr · Pull Request #11658 · bitcoin/bitcoin · GitHub
 8 2018-06-10 15:57:07	0|provoostenator|Do I understand correctly that the only way for a coin cache entry to be dirty, is if the UTXO existed before the last flush and was spent since then? Would it be worth trying to bypass the cache in those cases and update the disk when spending a UTXO that's not in the cache?
 9 2018-06-10 16:00:35	0|provoostenator|I wonder if OS's make any effort to optimize a write to the same physical place on disk that you just read from.
10 2018-06-10 16:01:17	0|sipa|provoostenator: it can be dirty because it's created after the last flush, or spent after the last flush while it was created before
11 2018-06-10 16:02:18	0|sipa|and of course we can bypass the cache... if we don't care about the performance it offers
12 2018-06-10 16:03:28	0|sipa|provoostenator: i guess we could only do the background flushing during IBD, but that's still very scary
13 2018-06-10 16:06:14	0|provoostenator|"dirty because it's created after the last flush" - how does that work? I thought they always get the FRESH flag in that case.
14 2018-06-10 16:06:54	0|provoostenator|(I meant DIRTY flag, not dirty in general db terminology)
15 2018-06-10 16:08:10	0|provoostenator|Of course I do care about the performance impact of such a change. My working theory is that too many DIRTY entries slow things down to a state that's worse than a smaller cache. So perhaps preventing accumulation of DIRTY entries would prevent that.
16 2018-06-10 16:09:13	0|provoostenator|(my "aggressive" pruning branch is much slower than master, despite the cache growing much bigger)
17 2018-06-10 16:10:24	0|provoostenator|I'm currently running IBD from block 320,000 - 480,000 on my iMac several times with decreasing dbcache (and once from genesis without interrupting) to see what happens.
18 2018-06-10 16:17:20	0|provoostenator|My hypothesis, based on what I've seen so far, is that when running from genesis with "infinite" cache, going from 320K to 480K will be fastest. Followed by starting at 320K with infinite cache. A 3 GB cache will be slower, but a 500 MB cache will be _faster_ than a 3 GB cache. Possibly regardless of pruning.
19 2018-06-10 16:18:24	0|gmaxwell|I think that would be very surprising.
20 2018-06-10 16:18:51	0|provoostenator|Indeed
21 2018-06-10 16:23:02	0|bitcoin-git|[bitcoin] ken2812221 opened pull request #13426: [WIP, bugfix] Add u8path and u8string to boost to fix #13103 (master...u8path_u8string) https://github.com/bitcoin/bitcoin/pull/13426
22 2018-06-10 16:29:52	0|sipa|provoostenator: FRESH implies DIRTY
23 2018-06-10 16:30:37	0|sipa|provoostenator: too many dirty entries slows things down... there may be a memory locality effect from just having many entries, but i don't see any way how dirtyness can impact that
24 2018-06-10 16:30:53	0|provoostenator|sipa: ah I see, so I should have said "DIRTY but not FRESH"
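[editor's note] The flag semantics discussed above (FRESH implies DIRTY; an entry can also be DIRTY without being FRESH) can be illustrated with a toy model. This is a minimal sketch in Python, not Bitcoin Core's code: the class `ToyCoinsCache`, its methods, and the plain dict standing in for the on-disk UTXO set are all hypothetical, and the exact rules are an assumption based on the conversation.

```python
DIRTY = 1 << 0  # entry may differ from what the backing store holds
FRESH = 1 << 1  # backing store has no copy of this coin


class ToyCoinsCache:
    """Toy write-back cache; `disk` is a dict standing in for the UTXO database."""

    def __init__(self, disk):
        self.disk = disk    # outpoint -> value
        self.entries = {}   # outpoint -> [value, or None if spent; flags]

    def add_coin(self, outpoint, value):
        # A newly created coin is always DIRTY. It is also FRESH when the
        # backing store has never seen it, so FRESH never appears alone.
        flags = DIRTY
        if outpoint not in self.disk and outpoint not in self.entries:
            flags |= FRESH
        self.entries[outpoint] = [value, flags]

    def spend_coin(self, outpoint):
        if outpoint not in self.entries:
            if outpoint not in self.disk:
                return False
            # Pulled in from the backing store: starts out clean (no flags).
            self.entries[outpoint] = [self.disk[outpoint], 0]
        entry = self.entries[outpoint]
        if entry[0] is None:
            return False  # already spent
        if entry[1] & FRESH:
            # Created and spent between flushes: the disk never needs to know.
            del self.entries[outpoint]
        else:
            # "DIRTY but not FRESH": the spend must reach the disk on flush.
            entry[0] = None
            entry[1] |= DIRTY
        return True

    def flush(self):
        """Write every DIRTY entry back and return how many writes were needed."""
        writes = 0
        for outpoint, (value, flags) in self.entries.items():
            if not flags & DIRTY:
                continue
            if value is None:
                self.disk.pop(outpoint, None)
            else:
                self.disk[outpoint] = value
            writes += 1
        self.entries.clear()
        return writes
```

In this sketch the work done in `flush` grows with the number of DIRTY entries, while lookups in `entries` do not depend on the flags at all.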
25 2018-06-10 16:32:10	0|provoostenator|Is there any sorting going on when entries are added?
26 2018-06-10 16:33:01	0|sipa|no
27 2018-06-10 16:33:05	0|sipa|it's a hash table
28 2018-06-10 16:33:32	0|sipa|provoostenator: i meant to say "about too many dirty entries slowing things down"
29 2018-06-10 16:33:39	0|sipa|i don't believe that can be the case
30 2018-06-10 16:38:36	0|sipa|provoostenator: the time to flush itself may be proportional or worse to the number of dirty entries, though
31 2018-06-10 16:41:07	0|provoostenator|From what I saw on my AWS nodes, the pruning (which usually coincided with a cache flush) took just minutes and happened just a dozen or so times, on an IBD measured in days.
32 2018-06-10 16:42:08	0|sipa|right
33 2018-06-10 16:42:15	0|sipa|that seems expected
34 2018-06-10 16:42:35	0|provoostenator|So if an entry is not found in the cache, it starts walking through the disk looking for it? But there's no reason to assume that would be slower than without cache.
35 2018-06-10 16:43:10	0|sipa|of course disk will be slower than cache
36 2018-06-10 16:43:47	0|sipa|is it possible you're running into swap space?
37 2018-06-10 16:44:18	0|provoostenator|Amazon Ubuntu images don't have swap on by default, so I don't think so, but I already deleted those machines.
38 2018-06-10 16:48:05	0|provoostenator|At least I can rule that out in this current experiment, since I have 48 GB RAM
39 2018-06-10 16:51:16	0|provoostenator|When there's a cache, every time it calls CCoinsViewCache::FetchCoin it walks through the memory cache and if nothing is found walks through the disk cache. So there's potentially some duplicate effort, maybe that becomes a problem?
40 2018-06-10 16:51:53	0|provoostenator|Oh no, because it's a hash table, it's not walking, it just fetches it.
41 2018-06-10 16:52:42	0|provoostenator|The term "iterator" confused me there.
42 2018-06-10 16:52:56	0|sipa|yes
43 2018-06-10 16:53:22	0|sipa|and on disk, it just fetches from leveldb, which has indexes and other structure to guide the search - it isn't really iterating either
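[editor's note] The lookup path described above (a hash-table hit in memory, otherwise a single keyed fetch from the database) might look roughly like the following. This is a hedged sketch, not the actual `CCoinsViewCache::FetchCoin`: `ToyLookupCache` and `fetch_coin` are hypothetical names, and a Python dict stands in for LevelDB.

```python
class ToyLookupCache:
    """Two-level lookup: in-memory dict first, backing store on a miss."""

    def __init__(self, disk):
        self.disk = disk        # dict standing in for the on-disk database
        self.cache = {}         # in-memory hash table: outpoint -> coin
        self.disk_reads = 0     # counts how often the backing store was hit

    def fetch_coin(self, outpoint):
        # Hash-table lookup: no walking or sorting, just a keyed fetch.
        coin = self.cache.get(outpoint)
        if coin is not None:
            return coin
        # Cache miss: one keyed read from the backing store.
        self.disk_reads += 1
        coin = self.disk.get(outpoint)
        if coin is not None:
            self.cache[outpoint] = coin  # remember it for next time
        return coin
```

A repeated fetch of the same outpoint only touches the backing store once, which is why a miss costs one extra in-memory lookup rather than any duplicated search.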
44 2018-06-10 16:55:09	0|provoostenator|If a big cache causes a slowdown compared to a small cache, it has to be the in-memory stuff I would guess.
45 2018-06-10 17:03:29	0|sipa|how long does flushing take?
46 2018-06-10 17:03:42	0|sipa|it can be minutes even on high end systems for multi-gb caches
47 2018-06-10 17:06:20	0|provoostenator|Minutes as far as I know, let me upload the logs...
48 2018-06-10 17:16:39	0|provoostenator|https://ufile.io/tlvv3 (prune3000_sjors.log was the slowest, I gave up after 5 days)
49 2018-06-10 17:19:35	0|provoostenator|TIL about OnionShare, so here you go: http://4nzykwc37ncqcwhp.onion/recall-shiftless
50 2018-06-10 23:33:59	0|sipa|gmaxwell: i win
51 2018-06-10 23:34:39	0|sipa|intel's SSE4 sha256 code, transliterated to sse4 intrinsics... is 8% faster than the asm version
52 2018-06-10 23:34:48	0|sipa|(on a Ryzen system)
53 2018-06-10 23:48:20	0|sipa|on i7 the intrinsics version is slightly slower (0.7% slower for long hashes, 1.5% slower for double-SHA256, 4% slower for 32-byte hashes)