Image Source: WikiMedia Commons
** If you’re reading this because your node is down and you need it fixed fast, scroll down to skip the reading and go straight to our checklist **
Introduction:
Anyone with experience operating a Bitcoin Lightning Network node has been here: node is down, red alert! Hopefully, this isn’t a situation you find yourself in too often, especially as the various implementations mature, reliability improves and so does the troubleshooting knowledge base of node operators.
That said, at times everyone’s Lightning node will go down, whether it’s the most built-up, heavily monitored, industrial grade routing node on the network, or a two-week old node with 2 channels running in your basement. Unplanned downtime happens. It’s just a fact of life in tech.
This is a guide to help you and your team to bring that beautiful node of yours, whether big or small, back to life as quickly and painlessly as possible. Our guide focuses on the LND (Lightning Network Daemon) implementation of the Lightning Network, and we’re assuming that it is running on Ubuntu Linux. So the specific commands listed would need to be adapted to other implementations. However, the strategy should translate, so hopefully it’s helpful to all!
Is this helpful or cool? Send us some sats or a note at ⚡ealvar13@getalby.com
Scenario:
Ok so here you are: you’ve received the notice that your node is offline. What now?
Likely you’re running some monitoring and automation software to alert you when your node goes offline. Monitoring is a subject for another blog post, but if you don’t have anything set up yet, a service like LNWatch Bot is a great place to start.
Image Source: WikiMedia Commons
LND Node Troubleshooting Guide
First (and most important step):
Don’t panic! Remember slow is smooth, and smooth is fast.
That said, let’s get going!
Let’s talk about the types of things that can go wrong with a node setup. Of course this list is potentially huge, but let’s just cover the top 5 categories of issues in roughly chronological order.
1. Connection Issues
Connection issues, meaning trouble getting one part of your setup to communicate with another or with the network at large, are a common problem with unique or custom setups. If you are using a service like Voltage, or software such as Umbrel, you likely won’t encounter this category of problems.
However, if you’ve got LND running on one machine and trying to connect to a Bitcoin backend on another machine, you’re probably familiar with connection issues.
The first things to think about here are IP addresses, ports, and firewalls. LND and bitcoind are connected to each other via their respective config files. You need to tell LND where (IP address and port) to find bitcoind’s RPC interface and ZMQ publishers, and give it the RPC credentials. You’ll also need to be sure that bitcoind is listening for connections from either LND’s IP address or all IP addresses. Both machines will need to be set up to send and accept traffic on the appropriate ports (check firewalls). If you’ve got a fancy AWS setup you may need to mess with security groups, etc.
A very important security issue here is that by default the RPC connection between LND and bitcoind is not encrypted. You’ll need to ensure that traffic has some other method of protection. Perhaps you can have both machines running on the same virtual private cloud; SSH tunneling is another common option.
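Before digging into config files, it can help to confirm that the LND machine can reach bitcoind’s ports at all. The sketch below is one way to do that from bash. The function name check_port and the address 10.0.0.5 are hypothetical, and it assumes bash (for /dev/tcp) and the coreutils timeout command are available, as they normally are on Ubuntu.

```shell
#!/usr/bin/env bash
# check_port HOST PORT
# Prints "open" if a TCP connection to HOST:PORT succeeds within 3 seconds,
# otherwise prints "closed". (check_port is a hypothetical helper name.)
check_port() {
  local host="$1"
  local port="$2"
  # bash treats /dev/tcp/HOST/PORT as a TCP socket; timeout keeps a silently
  # filtered port from hanging the check.
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Example, run on the LND machine (10.0.0.5 stands in for your bitcoind host):
#   check_port 10.0.0.5 8332     # bitcoind RPC
#   check_port 10.0.0.5 28332    # ZMQ raw blocks
```

A "closed" result usually means nothing is listening on that port, a firewall or security group is dropping the traffic, or bitcoind is bound to localhost only.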
2. Config Issues
Test your config! LND config changes take effect when you restart LND. So please don’t make a config change, skip the restart, and then just leave the node. It may reboot for some other reason in the middle of the night, LND won’t like the config, the node goes down, and on Monday morning you realize you’ve lost half your channels!!! (I may have some experience with this particular issue 😬)
Note: Setting up node monitoring can help avoid this one.
LND really has a wide variety of configuration options. Likely there is already a config option for whatever it is that you are trying to do. Check this doc for a well commented list of options.
While many in the Lightning world don’t like to use testnet, this is one (of many) situations where it can be a huge help. It’s great when you can have a testing node where you can try out any config changes to see what sort of impact they will have on the operation of your node without risking losing real funds.
Also when upgrading between versions of LND, if you run into any trouble you’ll want to check if any changes to the config file are necessary.
We highly recommend getting very familiar with this file as a good 80% of the time whatever problem you’re having will get fixed here.
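One way to get familiar with the file, and to review a change before you restart, is to compare only the settings that are actually in effect against a known-good copy. This is a minimal sketch, assuming you keep such a copy around. The helper names active_settings and config_changes and the file name lnd.conf.good are hypothetical.

```shell
#!/usr/bin/env bash
# active_settings FILE
# Prints only the lines of an lnd.conf-style file that are in effect:
# blank lines and comment lines (starting with ';' or '#') are dropped, and
# the result is sorted so two files can be compared line by line.
active_settings() {
  grep -Ev '^[[:space:]]*([;#]|$)' "$1" | sort
}

# config_changes OLD NEW
# Shows the settings that differ between a known-good copy and an edited file.
# Lines starting with '<' exist only in OLD, lines with '>' only in NEW.
config_changes() {
  local old_tmp new_tmp
  old_tmp="$(mktemp)"
  new_tmp="$(mktemp)"
  active_settings "$1" > "$old_tmp"
  active_settings "$2" > "$new_tmp"
  diff "$old_tmp" "$new_tmp" | grep '^[<>]' || true
  rm -f "$old_tmp" "$new_tmp"
}

# Example (lnd.conf.good is a hypothetical backup you made before editing):
#   config_changes ~/.lnd/lnd.conf.good ~/.lnd/lnd.conf
```

Because the lines are sorted, section headers lose their position, so this works best when options are written with their full names. If the output shows anything you did not intend to change, fix it before the next restart rather than after.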
3. Liquidity Issues
Liquidity issues are defo the most common when it comes to failed payments. There is a lot of “the payment failed, my node is not working!” when actually your node is working just fine, it just doesn’t have the appropriate liquidity to handle that particular payment.
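A quick way to tell a liquidity problem from a broken node is to total up what your channels can send and receive. The sketch below sums a balance field from the output of lncli listchannels. The helper name sum_field is hypothetical, and it uses a plain text match rather than a JSON parser, assuming each balance is printed as an integer with or without quotes.

```shell
#!/usr/bin/env bash
# sum_field FIELD
# Reads `lncli listchannels` JSON on stdin and prints the sum of every
# integer stored under "FIELD" (for example local_balance).
# This is a text match, not a real JSON parser.
sum_field() {
  grep -Eo "\"$1\": *\"?[0-9]+" \
    | grep -Eo '[0-9]+$' \
    | awk '{ total += $1 } END { print total + 0 }'
}

# Example:
#   lncli listchannels | sum_field local_balance     # roughly what you can send
#   lncli listchannels | sum_field remote_balance    # roughly what you can receive
```

These totals are upper bounds. Channel reserves reduce what is usable, and a payment still has to fit through individual channels, so a failed payment can be a liquidity issue even when the total looks large enough.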
Now we could, and perhaps at some point will, write an entire post on how to get and manage liquidity. But here we’ll just summarize and list some tools.
First off, liquidity, inbound liquidity, outbound liquidity, is a complex topic and instead of reinventing the wheel, we’ll direct you to this documentation on the topic. Enjoy!
Tools and resources for obtaining liquidity:
Tools and resources for managing liquidity:
4. Backups and Security
Your LND node is essentially a hot wallet and funds can be lost if things go wrong. Lightning Labs has written a better guide on security and backups than we could. So here we’ll link you to these fabulous docs on security and recovery, and then summarize the top to-dos.
LND offers a cool feature called the Static Channel Backup file. Long story short, if you were to lose your channel.db file or it became corrupted, you likely would not be able to get your node back up and running the way it was, but if you have your seed and the SCB file, you’ll likely be able to keep your node identity and recover all your funds.
What you need to do to be sure you have this option is to make sure that you are generating and storing (on another machine) your SCB file whenever there is a channel open or close event.
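As a rough sketch of the storing half, the helper below copies the SCB file into a backup directory only when its contents have changed, keeping a timestamped copy each time. The function name backup_if_changed and the destination /mnt/backup/lnd are hypothetical, and the source path is an assumption about a default mainnet install, so check where your node writes channel.backup. Getting the copy onto another machine (a remote mount, rsync, scp) is left to your setup.

```shell
#!/usr/bin/env bash
# backup_if_changed SRC DEST_DIR
# Copies SRC into DEST_DIR with a timestamp suffix, but only if its contents
# differ from the most recent copy. Prints "copied" or "unchanged".
backup_if_changed() {
  local src="$1"
  local dest_dir="$2"
  local latest="$dest_dir/channel.backup.latest"
  local stamp
  mkdir -p "$dest_dir"
  # cmp -s is silent and succeeds only when both files are byte-identical.
  if [ -f "$latest" ] && cmp -s "$src" "$latest"; then
    echo "unchanged"
    return 0
  fi
  stamp="$(date +%Y%m%d-%H%M%S)"
  cp "$src" "$dest_dir/channel.backup.$stamp" \
    && cp "$src" "$latest" \
    && echo "copied"
}

# Example (both paths are assumptions; adjust them to your node):
#   backup_if_changed ~/.lnd/data/chain/bitcoin/mainnet/channel.backup /mnt/backup/lnd
```

Run from cron or a file watcher, this keeps a history of backups without writing a new file on every run.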
5. Databases
Due to the current nature of the Lightning Network, a record needs to be kept for every HTLC that your node has handled… and that can be a lot! Database management and bloat can be a big issue for node operators.
In a standard setup, the channel.db file is located at ~/.lnd/data/graph/[network]/channel.db
Keep an eye on this file! It can get out of hand and cause you some issues. Here are some things you can do to keep it in check…
Enable database compaction via the lnd.conf file
Use a database compaction tool such as bbolt
Ensure that your setup provides enough storage for this file to be quite large.
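To keep an eye on the file, a small size check can be run by hand or from cron. This is a minimal sketch: the helper name db_size_check and the 4096 MiB threshold in the example are hypothetical, so pick a limit that suits your disk.

```shell
#!/usr/bin/env bash
# db_size_check FILE LIMIT_MIB
# Prints a WARN line if FILE is at least LIMIT_MIB mebibytes, else an OK line.
db_size_check() {
  local file="$1"
  local limit="$2"
  local bytes mib
  bytes="$(wc -c < "$file" | tr -d ' ')"
  mib=$(( bytes / 1048576 ))
  if [ "$mib" -ge "$limit" ]; then
    echo "WARN: $file is ${mib} MiB (limit ${limit} MiB)"
  else
    echo "OK: $file is ${mib} MiB"
  fi
}

# Example for a mainnet node (the 4096 MiB limit is an arbitrary assumption):
#   db_size_check ~/.lnd/data/graph/mainnet/channel.db 4096
```

For the compaction option itself, recent LND releases document db.bolt.auto-compact=true (with db.bolt.auto-compact-min-age) in the sample lnd.conf. Check the sample config for your version before relying on it.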
As of LND 0.15 there are increased database options including support for switching to a SQL based database. This may be a great idea for enterprise level setups that may want to maintain the LND database via a database service perhaps via AWS, etc.
LND Node Troubleshooting Checklist
- Is the machine/server up and running?
- Can you access it?
- Is LND running?
- Can you check its status via systemctl/crontab?
- Does $ lncli getinfo return JSON or an error?
- Gather all the error data you can.
- Is lncli returning an error?
- A good way to test is $ lncli getinfo and carefully read the results
- (Note: recently “Synced to Graph=False” and “Synced to Chain=False” are indications of a bug that requires upgrading LND)
- Can you access and read the error logs?
- For a mainnet node the error log is located at ~/.lnd/logs/bitcoin/mainnet/lnd.log
- Check out this guide
- If you are running LND with the bitcoind backend, any error there may come in handy
- ~/.bitcoin/debug.log
- $ tail -f /home/ubuntu/.bitcoin/debug.log
- What’s in your config file?
- By default your config file will be at ~/.lnd/lnd.conf
- If you’re unsure of what a line of config is doing, you can check here
- Remember that you’ll need to restart LND for the new config to take effect.
- Connection or gRPC issues are common
- Can LND talk to its Bitcoin node?
- Check that you have RPC enabled and that you have the correct authentication info in both the lnd.conf file and the config file for your backend.
- Could something else be using your ports?!
- Mainnet ports:
- 8332 Bitcoin rpc
- 8333 Bitcoin p2p network
- 8334 Tor
- 28332 ZMQ blocks
- 28333 ZMQ transactions
- 10009 LND rpc
- 9735 Lightning p2p network
- 8080 LND REST API
- Check with some variety of the $ lsof command or favorite networking tool
- Perhaps $ sudo lsof -i -P -n | grep LISTEN
- If LND needs to communicate with another machine (like a remote bitcoind node), look at the lnd.conf file to see how it’s connecting and/or use curl to check which machines you can reach.
- Firewall settings
- $ sudo ufw status
- Which version of LND are you running?
- Check with $ lnd --version
- You can also find the location of the program with $ which lnd
- Are you in full on disaster recovery mode?
- Don’t panic, but do read this
- Maybe someone already solved this issue for you?
- Did you ask the interwebs about the problem?
- Run a search along the lines of “Lightning LND [error message and/or problem]”
- Did you search for similar issues here?
- Be sure to search for both open and closed issues.
- Maybe you found a bug and should open an issue, but first, did you have a chat about it?
- Slack: https://lightning.engineering/slack.html
- Telegram: https://t.me/lightninglab
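The first few checklist items can be sketched as two small helpers: one that pulls the sync flags out of lncli getinfo, and one that shows the latest error lines from the LND log. The names sync_status and recent_errors are hypothetical. The sketch assumes getinfo prints synced_to_chain and synced_to_graph as true or false, and that log lines carry [ERR] or [CRT] level tags.

```shell
#!/usr/bin/env bash
# sync_status
# Reads `lncli getinfo` JSON on stdin and prints the two sync flags,
# or "unknown" for a flag that is missing from the input.
sync_status() {
  local info key value
  info="$(cat)"
  for key in synced_to_chain synced_to_graph; do
    value="$(printf '%s\n' "$info" \
      | grep -Eo "\"$key\": *(true|false)" \
      | grep -Eo '(true|false)$' || true)"
    echo "$key=${value:-unknown}"
  done
}

# recent_errors LOGFILE [COUNT]
# Prints the last COUNT (default 20) error or critical lines from an LND log.
recent_errors() {
  grep -E '\[(ERR|CRT)\]' "$1" | tail -n "${2:-20}"
}

# Example for a mainnet node:
#   lncli getinfo | sync_status
#   recent_errors ~/.lnd/logs/bitcoin/mainnet/lnd.log 10
```

If lncli getinfo itself fails, sync_status prints "unknown" for both flags, which is a cue to go straight to the logs.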