Unfortunate Downtime

Hello users!

I just wanted to write up a post on what happened yesterday. If you were online around 3 – 5 PM (CT) you may have noticed the server went down and you were unable to see your contacts online. I first noticed the problem around 3:50 PM (CT) and started investigating. Prosody was using 100% of one core on the temporary server. This choked everything up: sometimes you could sign in, other times you could not. If you were online, you may have noticed that all of your contacts appeared offline. Sadly this was the case until around 5 – 6 PM (CT).

The downtime was caused by a flood of XMPP spam directed at a single offline user (DDoS protection won’t protect against this). The way Prosody (the XMPP server) handles offline messages is flawed and has been mentioned here. This attack has occurred before but never seriously interrupted service like it did yesterday. Since the temp server wasn’t as powerful, it couldn’t handle the load.

I took some drastic action and enabled debug logs for the time being to see exactly what was going on. There were so many messages coming in that the log file is now huge (over 20GB) and still climbing. Here’s a snippet from the debug log. I’m posting the full one without censoring anything either. I believe this user may be involved in the attack.

Oct 04 14:20:36 s2sin55dee11bc500 debug Received[s2sin]: <presence type='subscribe' to='xsndr@xmpp.is' from='fhr72w1vkcet@keinhorn.de'>
Oct 04 14:20:36 xmpp.is:presence debug inbound presence subscribe from fhr72w1vkcet@keinhorn.de for xsndr@xmpp.is
Oct 04 14:20:36 rostermanager debug load_roster: asked for: xsndr@xmpp.is
Oct 04 14:20:36 rostermanager debug load_roster: loading for offline user: xsndr@xmpp.is
Oct 04 14:20:36 stanzarouter debug Routing to remote…
Oct 04 14:20:36 s2sout55dee4948a30 debug going to send stanza to keinhorn.de from xmpp.is
Oct 04 14:20:36 s2sout55dee4948a30 debug sending: <presence type='unavailable' to='fhr72w1vkcet@keinhorn.de' from='xsndr@xmpp.is'>
Oct 04 14:20:36 s2sout55dee4948a30 debug stanza sent over s2sout
Oct 04 14:20:36 rostermanager debug load_roster: asked for: xsndr@xmpp.is
Oct 04 14:20:36 rostermanager debug load_roster: loading for offline user: xsndr@xmpp.is
Oct 04 14:20:36 rostermanager debug load_roster: asked for: xsndr@xmpp.is
Oct 04 14:20:36 rostermanager debug load_roster: loading for offline user: xsndr@xmpp.is
Oct 04 14:20:36 xmpp.is:auth_internal_hashed debug account not found for username 'xsndr'
Oct 04 14:20:36 rostermanager debug not saving roster for xsndr@xmpp.is: the user doesn't exist
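
With a log this large, reading it by eye isn't practical. Here is a minimal Python sketch of how the subscribe requests could be tallied by sender domain and by target. It assumes the log lines look like the "inbound presence subscribe" line in the snippet above. The function name and the regular expression are my own illustration, not part of Prosody.

```python
import re
from collections import Counter

# Matches debug lines such as:
#   "... debug inbound presence subscribe from user@host for target@host"
# The pattern is an assumption based on the snippet above.
SUBSCRIBE_RE = re.compile(
    r"inbound presence subscribe from (?P<sender>\S+) for (?P<target>\S+)"
)


def count_subscribe_requests(lines):
    """Return (requests per sender domain, requests per targeted JID)."""
    by_domain = Counter()
    by_target = Counter()
    for line in lines:
        match = SUBSCRIBE_RE.search(line)
        if match is None:
            continue  # not a subscribe request, skip it
        # Everything after the last "@" is the sender's domain.
        domain = match.group("sender").rpartition("@")[2]
        by_domain[domain] += 1
        by_target[match.group("target")] += 1
    return by_domain, by_target
```

Because it goes through the input one line at a time, it can be fed an open file object directly (for example `count_subscribe_requests(open("prosody.debug"))`, where the file name is a placeholder) without loading a 20GB log into memory.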

The temporary fix was to disable offline messages entirely. I also migrated XMPP to a better server (temporarily) using my beta deploy script, which turned out to be missing some important things that I had to fix later on… I haven’t had time to get my dedicated hypervisor reconfigured yet as I’ve been too busy with work.

I’m writing scripts and configs on GitHub, both to push updates easily and for transparency. I’m going to dedicate more time to XMPP.is in the near future to improve how everything is managed and prevent downtime like this from ever happening again!

New Domains & Let’s Encrypt!

Hello users!

If you were online on 7/21/17 you should have received an announcement via XMPP about the new domains that were being added.

As of a couple of days ago I’m happy to say they’ve been set up for public use and you can register an account on any of them you’d like. If someone has taken a cool username on, say, @xmpp.is, you could just register it on @xmpp.co, @xmpp.cx or @xmpp.xyz, and vice versa. All of the domains support DNSSEC and the same security practices have been applied (no logs kept, full disk encryption, PFS supported & TLS required). All requests go to the same server, so communicating between these domains is even more secure. If you’re interested in grabbing an account go to the register page!

I’d also like to mention that the site and XMPP server (Prosody) now use Let’s Encrypt exclusively. In the past I wasn’t able to automate the renewal of certificates because Prosody would have to be restarted every time the cert was renewed (every 3 months). To minimize downtime, I found that you can use mod_reload_modules to reload just the Prosody TLS module when the service is reloaded. This seemed to work flawlessly in tests. With this, a systemd service for renewals and my new script in place, all certs are now provided by Let’s Encrypt!

Donation Goal Reached!

Hello users!

I’d just like to thank everyone who donated via PayPal & Bitcoin recently. We have finally hit our donation goal of $40 (PayPal) and I received a generous $20 Bitcoin donation last week. I’ll be increasing the goal every time it is hit from now on. This will help us properly keep track of the amount of PayPal donations we receive. I’m planning to use the money to buy new domains for XMPP.is that people will be able to use primarily for XMPP accounts. If you have any ideas for a domain, let me know! I already own the domain “xmpp.co” but I’m looking to acquire more, and would even buy them from their current owners if needed.

In other news… I have since disabled the Prosody throttle_presence module because it was very buggy and generated a massive amount of error logs. I’ve also added a few more donation methods, including Ethereum, Litecoin and Auroracoin, which you can find at xmpp.is/donate.

Update On The XMPP DDoS Attack

Hello users!

I sent out a notification to all online users yesterday informing them of an emergency Prosody restart due to an attack. Don’t worry though, it doesn’t affect the security of the server itself! It seems the goal was to exhaust resources (CPU/memory/storage). I’ve implemented some countermeasures to prevent this from causing the server’s load to spike as high as it did. As of yesterday, mod_limits has been enabled. This should prevent any single connection, whether C2S or S2S, from hogging system resources. I’ve also witnessed attacks that spam offline users with messages, with the stored messages reaching up to 20GB in some cases. XMPP.is stores offline messages, and no limit was set until I noticed this. I’ve set up a cron job that runs every minute to find and delete offline message files above a certain size, so a malicious spammer can’t fill up the entire disk.
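
For anyone curious what such a cleanup job could look like, here is a minimal Python sketch of the idea. It is not the actual script running on the server: the function name, the storage path and the size limit shown in the usage note are all placeholders I picked for illustration.

```python
import os


def prune_oversized_files(directory, max_bytes):
    """Delete regular files under `directory` larger than `max_bytes`.

    Returns the list of paths that were removed.
    """
    removed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.path.islink(path):
                    continue  # never follow or delete symlinks
                if os.path.getsize(path) > max_bytes:
                    os.remove(path)
                    removed.append(path)
            except OSError:
                # The file vanished or is not accessible; skip it.
                continue
    return removed
```

A cron entry could then call something like `prune_oversized_files("/path/to/offline-store", 5 * 1024 * 1024)` once a minute. Both the path and the 5 MiB limit are hypothetical and would need to match the real storage layout.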

This attack didn’t really do all that much though… It brought down the registration page a few times but everything else stayed up.

Full attack traffic (48 hours):

CPU usage (48 hours):