Another unresponsive (testnet) node issue
Summary
Testnet node went unresponsive: crown-cli commands hang with no response. The daemon is still running but is not doing any useful work. Three threads are blocked waiting on a lock (__lll_lock_wait), compared with two in issue #341.
Steps to reproduce
Run several testnet nodes and wait until one of them stops responding.
Expected behavior
Nodes should not deadlock
Problematic behavior
Nodes sometimes deadlock and stop doing useful work or responding to users over RPC.
Crown-core environment info
Ubuntu 16.04
Crown-core application info
Crown version v0.13.9.0-d5fd2fe3 (2019-06-21 15:57:45 +0300)
Relevant logs, dumps and/or screenshots
Attached gdb to the unresponsive daemon and found the following:
crown@crown-testnet-02:~/.crown/testnet3$ sudo gdb /usr/local/bin/crownd 1344
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/bin/crownd...(no debugging symbols found)...done.
Attaching to program: /usr/local/bin/crownd, process 1344
[New LWP 1345]
[New LWP 1346]
[New LWP 1347]
[New LWP 1348]
[New LWP 1349]
[New LWP 1350]
[New LWP 1368]
[New LWP 1370]
[New LWP 1372]
[New LWP 1373]
[New LWP 1374]
[New LWP 1375]
[New LWP 1376]
[New LWP 1377]
[New LWP 1378]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
pthread_cond_wait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185 ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt
#0 pthread_cond_wait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000080b1cd in boost::thread::join_noexcept() ()
#2 0x0000000000424781 in AppInit(int, char**) ()
#3 0x0000000000414d2f in main ()
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7f2b3991f740 (LWP 1344) "crownd" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
2 Thread 0x7f2b38188700 (LWP 1345) "crownd" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
3 Thread 0x7f2b37987700 (LWP 1346) "crown-scriptch" pthread_cond_wait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
4 Thread 0x7f2b37186700 (LWP 1347) "crownd" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
5 Thread 0x7f2b36985700 (LWP 1348) "crownd" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
6 Thread 0x7f2b36184700 (LWP 1349) "crownd" 0x00007f2b38568a13 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
7 Thread 0x7f2b35983700 (LWP 1350) "crownd" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
8 Thread 0x7f2b2c58d700 (LWP 1368) "crownd" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
9 Thread 0x7f2b291a4700 (LWP 1370) "crown-legacysig" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
10 Thread 0x7f2b23fff700 (LWP 1372) "crown-net" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
11 Thread 0x7f2b237fe700 (LWP 1373) "crown-addcon" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
12 Thread 0x7f2b22ffd700 (LWP 1374) "crown-opencon" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
13 Thread 0x7f2b227fc700 (LWP 1375) "crown-msghand" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
14 Thread 0x7f2b21ffb700 (LWP 1376) "crown-dumpaddr" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
15 Thread 0x7f2b217fa700 (LWP 1377) "crown-miner" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
16 Thread 0x7f2b20ff9700 (LWP 1378) "crown-wallet" pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
(gdb) thread 10
[Switching to thread 10 (Thread 0x7f2b23fff700 (LWP 1372))]
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135 ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f2b38834e42 in __GI___pthread_mutex_lock (mutex=0xedb620 <cs_main>) at ../nptl/pthread_mutex_lock.c:115
#2 0x0000000000454845 in CMutexLock<AnnotatedMixin<boost::recursive_mutex> >::CMutexLock(AnnotatedMixin<boost::recursive_mutex>&, char const*, char const*, int, bool) ()
#3 0x000000000048bb05 in (anonymous namespace)::InitializeNode(int, CNode const*) ()
#4 0x0000000000523bd5 in boost::signals2::detail::signal_impl<void (int, CNode const*), boost::signals2::optional_last_value<void>, int, std::less<int>, boost::function<void (int, CNode const*)>, boost::function<void (boost::signals2::connection const&, int, CNode const*)>, boost::signals2::mutex>::operator()(int, CNode const*) ()
#5 0x000000000050fbc1 in CNode::CNode(unsigned int, CAddress, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) ()
#6 0x0000000000513404 in ThreadSocketHandler() ()
#7 0x000000000051b408 in void TraceThread<void (*)()>(char const*, void (*)()) ()
#8 0x0000000000809bf2 in thread_proxy ()
#9 0x00007f2b388326ba in start_thread (arg=0x7f2b23fff700) at pthread_create.c:333
#10 0x00007f2b3856841d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) thread 13
[Switching to thread 13 (Thread 0x7f2b227fc700 (LWP 1375))]
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135 in ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) bt
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f2b38834e42 in __GI___pthread_mutex_lock (mutex=0xedb620 <cs_main>) at ../nptl/pthread_mutex_lock.c:115
#2 0x0000000000454845 in CMutexLock<AnnotatedMixin<boost::recursive_mutex> >::CMutexLock(AnnotatedMixin<boost::recursive_mutex>&, char const*, char const*, int, bool) ()
#3 0x000000000049d039 in GetTransaction(uint256 const&, CTransaction&, uint256&, bool) ()
#4 0x000000000071b4e0 in IsBudgetCollateralValid(uint256, uint256, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, long&, int&) ()
#5 0x000000000071bde2 in CBudgetProposal::IsValid(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool) const ()
#6 0x0000000000722509 in CBudgetManager::CheckAndRemove() ()
#7 0x000000000072283e in CBudgetManager::NewBlock() ()
#8 0x00000000004b4790 in ProcessNewBlock(CValidationState&, CNode*, CBlock*, CDiskBlockPos*) ()
#9 0x00000000004bbb01 in ProcessMessage(CNode*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, CDataStream&, long) ()
#10 0x00000000004bedbf in ProcessMessages(CNode*) ()
#11 0x000000000050d86c in ThreadMessageHandler() ()
#12 0x000000000051b408 in void TraceThread<void (*)()>(char const*, void (*)()) ()
#13 0x0000000000809bf2 in thread_proxy ()
#14 0x00007f2b388326ba in start_thread (arg=0x7f2b227fc700) at pthread_create.c:333
#15 0x00007f2b3856841d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) thread 15
[Switching to thread 15 (Thread 0x7f2b217fa700 (LWP 1377))]
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135 in ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) bt
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f2b38834e42 in __GI___pthread_mutex_lock (mutex=0xf31c10 <budget+48>) at ../nptl/pthread_mutex_lock.c:115
#2 0x0000000000454845 in CMutexLock<AnnotatedMixin<boost::recursive_mutex> >::CMutexLock(AnnotatedMixin<boost::recursive_mutex>&, char const*, char const*, int, bool) ()
#3 0x0000000000714dee in CBudgetManager::IsBudgetPaymentBlock(int) const ()
#4 0x00000000007319f7 in FillBlockPayee(CMutableTransaction&, long) ()
#5 0x00000000004f762f in CreateNewBlock(CScript const&, CWallet*, bool) ()
#6 0x00000000004f8a2f in CreateNewBlockWithKey(CReserveKey&, CWallet*, bool) ()
#7 0x00000000004f8e58 in BitcoinMiner(CWallet*, bool) ()
#8 0x0000000000505057 in ThreadStakeMiner() ()
#9 0x000000000051b408 in void TraceThread<void (*)()>(char const*, void (*)()) ()
#10 0x0000000000809bf2 in thread_proxy ()
#11 0x00007f2b388326ba in start_thread (arg=0x7f2b217fa700) at pthread_create.c:333
#12 0x00007f2b3856841d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Full logs and dumps: https://nextcloud.crownplatform.com/index.php/s/Q6H8enXNmJsQYCD
Possible fixes
/cc @artem