Server Woes

Figuring out how to scale InstructBot as the user base has grown.

Published: 2022-06-17 by Omsad

I have been wanting to release version 3.7 of InstructBot to my testers for a couple of months now. "Wanting" is the important word, as unfortunately I've been held back by issues with the backend servers.

When I first started to investigate what was wrong, the picture didn't look good. Whilst I wasn't getting hundreds of requests per minute, the requests I was receiving were taking longer and longer to process.

Past request count and duration of InstructBot.

As you can see above, the request count flattens as the average time per request rises (the server is hitting 100% CPU usage, so requests are rejected). You can also see this below, where requests are taking more than a reasonable amount of time, e.g. anything over a second for simple requests or over a couple of seconds for more complicated ones.

Previous duration of each request to InstructBot.

This was ultimately what was impacting the ability to download shared commands.

If you were lucky enough to have your request processed just after the server had restarted, everything went well. But make the same request an hour later and it likely would not have been processed for some time.

This made meaningfully testing my changes slow: although I could load test to a degree in my development environment, the live environment didn't always behave the same way.

However, over the last couple of months I have been slowly improving the situation by rolling out backend changes to fix bugs and improve performance, then waiting to see whether each change actually made the improvement I thought it would.

Sometimes this meant changing the database to be more efficient at the cost of taking more space, which necessitated downtime. Other times it meant rolling back a change because it had no effect (or made things worse).

Today, however, the servers are in a pretty good position. For example, the graph below shows the last 24 hours and is much nicer to look at, as the average time per request stays under a second.

Current request count and duration of InstructBot.

You can see this more clearly below: 95% of requests are being processed within 0.88 seconds, although a few are taking slightly longer.
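The "95% within 0.88 seconds" figure is the 95th percentile (p95) of request durations. As a rough sketch of how such a number can be derived from raw timings, the snippet below uses the nearest-rank method, assuming the durations are available as a plain list of seconds. The `percentile` helper and the sample values are hypothetical and made up for illustration; they are not InstructBot's monitoring code or data, and monitoring tools may interpolate between samples instead.

```python
import math


def percentile(durations, pct):
    """Return the pct-th percentile of durations using the nearest-rank method."""
    if not durations:
        raise ValueError("durations must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(durations)
    # Nearest rank: the smallest sample such that at least pct% of samples are <= it.
    # Multiplying before dividing keeps this exact for integer pct values.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical request durations in seconds, not real InstructBot measurements.
    sample = [0.12, 0.30, 0.45, 0.51, 0.60, 0.72, 0.80, 0.85, 0.88, 1.40]
    print(f"p95 = {percentile(sample, 95):.2f}s")
    print(f"median = {percentile(sample, 50):.2f}s")
```

A percentile is more telling than the average here, because a handful of very slow requests can hide behind a healthy-looking mean.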

Current duration of each request to InstructBot.

So why all the effort?

Well, version 3.7 of InstructBot adds several new command types and triggers which, when shared, need more information than just the command itself.

For example, there is a new poll trigger which will execute a command when a specified poll result is received. This poses a problem: if you share such a command, you must also share the poll linked to it, otherwise the trigger would be pointless.

Another example is the new disable profile command. Sharing this command would be pointless if you didn't also get the linked profiles (and all the commands linked to those profiles, and so on).
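Sharing a command therefore means collecting everything it transitively depends on. A minimal sketch of that idea is a breadth-first traversal over dependency links, assuming a hypothetical in-memory model where each command, poll or profile lists the IDs of the items it is linked to. The `Item` class, `collect_share_bundle` function and the IDs below are illustrative only, not InstructBot's actual data model or backend code.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical shareable item: a command, poll or profile."""
    item_id: str
    kind: str  # "command", "poll" or "profile"
    depends_on: list = field(default_factory=list)  # IDs of linked items


def collect_share_bundle(root_id, items):
    """Return every item reachable from root_id by following depends_on links.

    The seen set stops cycles (e.g. a profile containing a command that
    disables the same profile) from looping forever, and ensures each
    item appears in the bundle only once.
    """
    bundle = []
    seen = {root_id}
    queue = deque([root_id])
    while queue:
        current = items[queue.popleft()]
        bundle.append(current)
        for dep_id in current.depends_on:
            if dep_id not in seen:
                seen.add(dep_id)
                queue.append(dep_id)
    return bundle


if __name__ == "__main__":
    # Hypothetical data: a disable profile command linked to one profile,
    # whose commands include one with a poll trigger.
    items = {
        "cmd-disable": Item("cmd-disable", "command", ["profile-a"]),
        "profile-a": Item("profile-a", "profile", ["cmd-hello", "cmd-poll"]),
        "cmd-hello": Item("cmd-hello", "command"),
        "cmd-poll": Item("cmd-poll", "command", ["poll-1"]),
        "poll-1": Item("poll-1", "poll"),
    }
    for item in collect_share_bundle("cmd-disable", items):
        print(item.kind, item.item_id)
```

Tracking visited IDs matters because profiles and commands can refer back to each other; without it, a shared profile that disables itself would never finish downloading.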

This is the extent of the update needed for version 3.7, meaning that I can now finish my own testing before handing version 3.7 off to my testers.

In the future I may also allow profiles to be shared explicitly, e.g. via a download profile window. I could also allow sharing of polls and predictions, although I think there would be less value in this.

Ultimately, whether any of this happens will depend upon user feedback, which you can give by joining our Discord.