Unfortunate Downtime

Hello users!

I wanted to write up what happened yesterday. If you were online between roughly 3 and 5 PM (CT), you may have noticed that the server went down and that you could not see your contacts online. I first noticed the problem around 3:50 PM (CT) and started investigating. Prosody was using 100% of one core on the temporary server, which choked everything up: sometimes you could sign in, other times you could not. If you did get online, you may have noticed that all of your contacts appeared offline. Sadly, this remained the case until around 5 – 6 PM (CT).

The downtime was caused by a flood of XMPP spam directed at a single user who was offline (DDoS protection won’t protect against this). The way Prosody (the XMPP server) handles offline messages is flawed and has been mentioned here. This attack has occurred before, but it never seriously interrupted service the way it did yesterday. Because the temporary server wasn’t as powerful, it couldn’t handle the load.

I took some drastic action and enabled debug logging for the time being to see exactly what was going on. So many messages were coming in that the log file is now huge (over 20 GB) and still climbing. Here’s a snippet from the debug log. I’m posting it in full, without censoring anything. I believe this user may be involved in the attack.

Oct 04 14:20:36 s2sin55dee11bc500 debug Received[s2sin]: <presence type='subscribe' to='xsndr@xmpp.is' from='fhr72w1vkcet@keinhorn.de'>
Oct 04 14:20:36 xmpp.is:presence debug inbound presence subscribe from fhr72w1vkcet@keinhorn.de for xsndr@xmpp.is
Oct 04 14:20:36 rostermanager debug load_roster: asked for: xsndr@xmpp.is
Oct 04 14:20:36 rostermanager debug load_roster: loading for offline user: xsndr@xmpp.is
Oct 04 14:20:36 stanzarouter debug Routing to remote…
Oct 04 14:20:36 s2sout55dee4948a30 debug going to send stanza to keinhorn.de from xmpp.is
Oct 04 14:20:36 s2sout55dee4948a30 debug sending: <presence type='unavailable' to='fhr72w1vkcet@keinhorn.de' from='xsndr@xmpp.is'>
Oct 04 14:20:36 s2sout55dee4948a30 debug stanza sent over s2sout
Oct 04 14:20:36 rostermanager debug load_roster: asked for: xsndr@xmpp.is
Oct 04 14:20:36 rostermanager debug load_roster: loading for offline user: xsndr@xmpp.is
Oct 04 14:20:36 rostermanager debug load_roster: asked for: xsndr@xmpp.is
Oct 04 14:20:36 rostermanager debug load_roster: loading for offline user: xsndr@xmpp.is
Oct 04 14:20:36 xmpp.is:auth_internal_hashed debug account not found for username 'xsndr'
Oct 04 14:20:36 rostermanager debug not saving roster for xsndr@xmpp.is: the user doesn't exist
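
With a log this large, reading it by eye isn't practical. Below is a rough Python sketch of how the subscription flood could be tallied. It only assumes that the relevant lines look like the "inbound presence subscribe from ... for ..." line in the snippet above. The function names (count_subscribes, summarize) are made up for illustration and are not part of Prosody.

```python
import re
from collections import Counter

# Matches debug lines of the form seen in the snippet:
#   "... inbound presence subscribe from <sender JID> for <target JID>"
SUBSCRIBE_RE = re.compile(r"inbound presence subscribe from (\S+) for (\S+)")


def count_subscribes(lines):
    """Tally subscription requests per target JID and per sender domain."""
    targets = Counter()
    sender_domains = Counter()
    for line in lines:
        match = SUBSCRIBE_RE.search(line)
        if match is None:
            continue
        sender, target = match.groups()
        targets[target] += 1
        # Drop any "/resource" suffix, then keep what follows the "@".
        bare = sender.split("/", 1)[0]
        sender_domains[bare.rsplit("@", 1)[-1]] += 1
    return targets, sender_domains


def summarize(path, top=10):
    """Print the most-targeted JIDs and the busiest sender domains."""
    # Iterate over the file object so a multi-gigabyte log is streamed
    # line by line instead of being loaded into memory.
    with open(path, encoding="utf-8", errors="replace") as log:
        targets, sender_domains = count_subscribes(log)
    for jid, n in targets.most_common(top):
        print(f"{n:>10}  to    {jid}")
    for domain, n in sender_domains.most_common(top):
        print(f"{n:>10}  from  {domain}")
```

Calling summarize() with the path to the debug log (wherever yours is written) would show which accounts are being targeted and which remote domains the requests come from.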

The temporary fix was to disable offline messages entirely. I also (temporarily) migrated XMPP to a better server using my beta deploy script, which turned out to be missing some important things that I had to fix later on. I haven’t had time to reconfigure my dedicated hypervisor yet, as I’ve been too busy with work.

I’m writing scripts and configs and putting them on GitHub, both to make pushing updates easier and for transparency. I’m going to dedicate more time to XMPP.is in the near future to improve how everything is managed and to prevent downtime like this from happening again!