Category Archives: Tech Stuff

Under the hood of my Hack24 entry

I competed by myself at Hack24. I don’t see competing alone as a massive issue personally, as it cuts down communication overhead enormously. 😉

For those that haven’t heard of hackathons before, the concept is fairly simple:

  • Teams of 1-4 people.
  • Multiple challenges (at least in this event)
  • 24 hours to build a solution.
  • Only code you have written during the event (or Open Source libraries/apps) can be used.

These events are quite gruelling to compete in: time is limited, and you need to prioritise carefully across features and components so you don’t end up with a complete backend but no frontend, or similar.

Generally you omit anything non-essential, such as login systems, logging or some error checking.

I used the event to test some of the ideas and applications (microservices, RabbitMQ, Riak, Nginx, php-fpm etc.) I have been using on larger projects in a smaller, time-constrained project – and I think the outcome shows that the approach works.

I am a big fan of microservices; breaking your app into small testable components with fixed interfaces is awesome for quick building and easy debugging – want to test whether something is working? Inject a message into the input and see what the outputs are. Once a component works you can mostly treat it as a single unit and move onto the next piece. It keeps your mind focused and, at least for me, reduces stress.
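To make that concrete, here is a minimal sketch (the worker and message fields are hypothetical, not taken from my actual entry) of a worker reduced to a pure function over JSON payloads – exactly the shape that lets you inject a message and inspect the output:

```php
<?php
// Hypothetical filter worker reduced to a pure function:
// JSON string in, JSON string out. No queue required to test it.
function filter_worker(string $input): string
{
    $msg = json_decode($input, true);
    // Illustrative transformation: uppercase the message body.
    $msg['body'] = strtoupper($msg['body']);
    return json_encode($msg);
}

// Inject a message into the input and see what the output is:
$out = filter_worker('{"to":"+440000000000","body":"hello"}');
```

Once this behaves correctly in isolation, wiring it up to the message queue is just plumbing around the same function.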

I am a long-time PHP coder, so when it came to quick prototyping it made sense to use PHP with the lightweight Slim framework – both for the frontend and all the workers on the backend, all running on top of a single Vultr VPS [disclosure: url contains affiliate link].


SMS proxy – my hack24 entry

I broke the project into 4 main parts:

  • Frontend (the website plus the SMS callback hook for receiving messages from Esendex, the challenge sponsor)
  • Receiving worker (takes messages from the callback, looks up entries in Riak and routes them to the next processing step – either the filter or directly to the sending worker)
  • Filter (transforms the message before routing it on to the sending worker)
  • Sending worker (sends the message out via Esendex)

I am open-sourcing the code as a learning tool for other people (and perhaps so I can abuse it in later events…)

You are welcome to ask any questions and I will do my best to answer them.

If you want to contact me, please use twitter or email mike @ technomonk . com

Microservices payload design

There are a lot of people talking about microservices at present. I understand it is fashionable and a lot of people are trying to get rich from consulting in the domain, but a lot of the things I hear are just plain wrong or bad practice.

People are just throwing microservices out there with little consideration of the basic requirements of operations: no idea of how they will monitor them, no concept of how they can debug them and no regard for how they will upgrade them.

I came from a dev background and over time migrated into operations. I dislike fragile, hard to debug components.

I have learnt the hard way just how difficult a distributed microservice architecture can be – but it is still (when designed correctly) a better choice than a hulking monolith of code.

I like my payloads to be debuggable. For this reason I like JSON as an interchange format. It isn’t the most efficient representation, but it is easily human-parsed and supported by most languages.

I like to know how long a request has taken through the system, so I attach a UUID and a microsecond timestamp when each request is first seen. I also like to know how long each step takes, so I append a further id and timestamp for every process the request passes through.

Doing these in a uniform way allows you to take a payload from anywhere in the system and be able to use the same tools to perform basic analysis.

It can bloat the payload a little, but the payoff is being able to see at a glance how quickly each worker is dealing with requests, how long each request takes end to end and what bottlenecks you have. That lets you define and keep contracts with your service consumers, detect potential failures before they happen and automatically trip circuit breakers to mitigate some of these issues.
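As a rough sketch of the envelope described above (field and function names here are my own illustration, not a fixed format):

```php
<?php
// Envelope carrying trace metadata alongside the actual message body.
function new_envelope(array $body): array
{
    return [
        'id'    => bin2hex(random_bytes(16)), // stand-in for a UUID
        'ts'    => microtime(true),           // timestamp when first seen
        'trace' => [],                        // one entry appended per hop
        'body'  => $body,
    ];
}

// Each worker appends its own id and a timestamp before passing it on.
function stamp_hop(array $env, string $workerId): array
{
    $env['trace'][] = ['worker' => $workerId, 'ts' => microtime(true)];
    return $env;
}

$env = stamp_hop(stamp_hop(new_envelope(['text' => 'hi']), 'receiver'), 'filter');

// End-to-end latency so far: last hop timestamp minus first-seen timestamp.
$elapsed = end($env['trace'])['ts'] - $env['ts'];
```

Because every payload carries the same fields, the same one-liner works on a payload grabbed from any queue in the system.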

I don’t have all the answers, but this is starting to work well for me.

Instant Messaging

I was asked after my last post what I used for instant messaging. The answer is kinda interesting in my opinion.

As some of you know I am a privacy advocate, but try to also balance that with ease of use and ease of integrating with other people.

I run my own XMPP server, which I share with a number of colleagues. This runs on Prosody, an open-source XMPP server. We use the MUC module to host a number of conference rooms where our bots and other notifications are sent.

In addition to this I also use Skype, not because I trust Microsoft to not sell my contact info out to the NSA, but because it reduces the friction of contacting me for a number of people. Forcing them to use my private XMPP server (even through federation) is too big a hurdle for them to jump.

A look at the new rackspace cloud

I have been very lucky to be included in the first rollout of the new rackspace cloud and while it has been live (to me) for a few days, I have only just had a few minutes to have a play with it.

10 things any newbie web developer needs to know

PHP or other language

I’m not going to preach about the pros and cons of different languages; suffice to say that while any sufficiently advanced website may be indistinguishable from magic, it requires more than just static files to make it work. PHP is my language of choice (and has been for over a decade) but there are plenty of other options out there (Python, Ruby or even (shudder) .net).

Which you choose depends more on what is most comfortable to you and what support network you have to call upon. It isn’t generally a good idea to learn Python in a Microsoft shop for example unless there are other pressing reasons to do so.

Whatever you end up learning, you need to be competent in its use, make use of the extensions available (no point reinventing the wheel when there are dozens already made for you) and know how to use the online documentation. Coding style helps, but as long as you are consistent it really isn’t that pressing unless you are working with others.


XHTML and CSS

Even if you don’t deal with much front-end code yourself – perhaps you use templates – you still need to understand at least the basics of XHTML and CSS to make it easier for the front-end developers to deal with your output. At the very least your code needs to output well-formed, correctly nested markup. We mean

<div><p><b><i>hello world</i></b></p></div>

rather than

<div><p><i><b>hello world</i></b></div>

(note the lack of a closing </p> tag and the improperly nested <i> and <b> tags). An XHTML validator will help, but only on the finished output – validators don’t work on fragments of code like this.
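If you want to check well-formedness programmatically, one rough option (a sketch; this checks nesting and closing only, not full XHTML validity) is PHP’s DOMDocument:

```php
<?php
// Well-formedness check for a markup fragment using DOMDocument.
// Note: this verifies tags are closed and properly nested; it does
// not validate against the XHTML DTD.
function is_well_formed(string $fragment): bool
{
    $dom = new DOMDocument();
    // @ suppresses the warning loadXML emits on malformed input.
    return @$dom->loadXML($fragment) !== false;
}

$good = is_well_formed('<div><p><b><i>hello world</i></b></p></div>');
$bad  = is_well_formed('<div><p><i><b>hello world</i></b></div>');
```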

MVCs, frameworks and libraries

As stated above, and especially if you are being paid per project, you want to reduce the number of times you reinvent the wheel. This is known as Don’t Repeat Yourself, or DRY. There is no point writing yet another PDF library, for example, when FPDF, Zend_PDF, Pdflib and many others already exist (or docbook/ps via converters etc.) and do an adequate job for most tasks.

MVCs, or Model-View-Controllers to give them their full name, are a design pattern that lets you separate the data-access parts from the actual logic and the templates used in your projects. This is useful in a team-based environment: the designers can design, the coders can code, and the coders and DBAs can control the models and how they interact with the databases. It separates the roles more cleanly and makes later modifications easier. If you want to add sharding to the database, you can simply modify the models while keeping the interface the same, and suddenly your app is cluster-aware. It’s really a win-win situation.
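A tiny sketch of that “swap the model, keep the interface” point (class and method names are made up for illustration):

```php
<?php
// Controllers code against an interface...
interface UserModel
{
    public function find(int $id): array;
}

// ...implemented first by a single-database model...
class DbUserModel implements UserModel
{
    public function find(int $id): array
    {
        return ['id' => $id, 'source' => 'single-db'];
    }
}

// ...which can later be replaced by a shard-aware model without
// touching any of the calling code.
class ShardedUserModel implements UserModel
{
    public function find(int $id): array
    {
        $shard = $id % 4; // derive the shard from the id
        return ['id' => $id, 'source' => "shard-$shard"];
    }
}

function load_user(UserModel $model, int $id): array
{
    return $model->find($id);
}

$a = load_user(new DbUserModel(), 42);
$b = load_user(new ShardedUserModel(), 42);
```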

I use Zend Framework under PHP to provide an MVC and other libraries. I’ve tried CodeIgniter and Kohana and while they work, they either feel dated (in the case of CodeIgniter) or are just poorly documented (in the case of Kohana). Zend Framework, while not trivial to get up and running for the novice, worked well for the way I work.


OOP

OOP is one of those things you either love or hate. I personally tolerate it with a mild disdain – but that is just me. (I predate the whole OOP revolution.) Put simply, OOP allows a set of related variables and functions to be pulled together into a single package called a class. You can then either extend the class in your own projects, overriding or extending the various functions (called methods), or use it as is. The biggest advantage, especially when you are importing a lot of libraries, is avoiding clashes: unless you are careful, two libraries may have a function or variable with the same name. (This is solved with namespaces in the latest PHP versions.) At best the interpreter or compiler throws an error; at worst one library may end up calling a function from, or trampling over the data of, another unintentionally.
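A minimal sketch of the class/extend/override idea (names invented for the example):

```php
<?php
// A class bundles related data and functions (methods) together.
class Logger
{
    public function format(string $msg): string
    {
        return date('Y-m-d') . ' ' . $msg;
    }
}

// Extending the class overrides a method without editing the original.
class PrefixedLogger extends Logger
{
    public function format(string $msg): string
    {
        return '[app] ' . parent::format($msg);
    }
}

$log  = new PrefixedLogger();
$line = $log->format('started');
```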


Regular expressions

At its simplest level a regex is a way of matching patterns within strings. Used to their full ability, a regex can sometimes replace 10-20 lines of non-regex code with a single line. They are often used in mod_rewrite rules, input validation and searching for certain things within large quantities of text. Learn them, use them and master them – they will save you a lot of time.
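For example, a single pattern both validates a simple date string and captures its parts – something that would otherwise take a fair few lines of manual parsing:

```php
<?php
// One line validates the format and captures year/month/day.
$ok  = preg_match('/^(\d{4})-(\d{2})-(\d{2})$/', '2011-12-25', $m);

// The same pattern rejects malformed input outright.
$bad = preg_match('/^(\d{4})-(\d{2})-(\d{2})$/', '25/12/2011');
```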

Source Control

Even when working on your own projects, you sometimes wish you could turn back time and undo something stupid you did a month ago in some code. Your backups have been recycled and a lot of changes have been made since – it sounds like a lost cause. This is where version control comes in. Every time you are happy with a set of modifications you check them in. The differences between the old version and the current version are worked out and saved. At a later date you can go back and see who made a change, when it was made and exactly what was changed. If you want to check out a copy of the code as it was at that time, you can do that. This is almost essential in a team-based environment, as it allows auditing of code in a more detailed way than simply looking at backups from a particular point in time. What if one of the coders inserted a backdoor into your code? Wouldn’t you want to be able to identify every other change they made and check it for similar nefariousness?


Effective searching

As a developer you will often hit questions and problems that you don’t have the answers to. You can try to puzzle through them yourself, but there is a whole world of people out there, many of whom have already solved similar issues and put their answers online. Being able to search effectively puts those answers at your fingertips.


Accessibility

Not everyone in this world has perfect vision, perfect hearing or the ability to use a mouse, yet many sites are designed as though every user has all of these. If your site assumes that all readers can use your fixed-size font, will have JavaScript enabled and will read the page rather than listen to it through a screen reader, then it probably won’t be accessible. There are many tools and concepts that make your site easier for users with impaired abilities, and most don’t take much effort to apply.

Monitoring and SEO

A site is a success if it reaches its goals, but it is hard to know whether you are hitting them if you have no way of measuring the values those goals specify. Are you being seen by 100 people a day? Are you selling at least 3 widgets from your site a week? Does your advertising campaign result in increased sales and traffic? You need to measure anything you intend to set goals on – a simple idea, but it’s surprising how few people ‘get it’.

SEO relates to monitoring your traffic much as energy efficiency relates to watching how much power you use. Unless you know where you started from and where you are now, you can’t know how much you have improved – and, if you are paying for AdWords or similar advertising, how much each extra visitor or sale is costing you.

JQuery and other JS toolkits

It is all well and good being able to produce dynamic pages and process forms from the server-side, but sometimes you really need something extra on the clientside as well. Enter JavaScript.

JavaScript (JS) has been around since the mid 90s and is standard on all major browsers. The big problem is that certain features exposed to JavaScript (Ajax, storage, location, the DOM/BOM etc.) vary from browser to browser.

The easy solution is to use a library that hides these differences and makes it easy to write code once that will run on most browsers. jQuery is one option, but not the only one.

In addition there are many libraries, shims and other utilities that make doing more complex things easier. jQuery, for example, has the UI library, which makes tasks such as tabs and date pickers trivial to add to your page.

This isn’t an exhaustive list and I’m sure there are many things that could be added. If you feel I have missed something, please add it in the comments section below.


MongoDB Journalling

A quick one if you are wondering about MongoDB and journalling (especially on Debian/Ubuntu from the MongoDB repo).

By default MongoDB will generate 3 GB of journal files in /var/lib/mongodb/journal. This is far from ideal if you are running on a Rackspace CloudServer or another small VPS.

There are two easy solutions:

  • If you are running as part of a ReplicaSet then you can choose to go without journalling altogether with the

    nojournal = true

    option in mongodb.conf.

  • Alternatively you can request that MongoDB uses smaller preallocated journal files with the

    smallfiles = true

    option in mongodb.conf.

To also reduce some usage in your data files there is also the

noprealloc = true

option, however I don’t see this as particularly useful considering the preallocation starts pretty small and only grows as your data does.

Mongo Benchmarking #1

These aren’t perfect benchmarks – far from it in fact – but I just wanted to get a rough idea of the relative tradeoffs between fsync and safe over normal unsafe writes…

Time taken for 10,000 inserts – (no indexes, journalling on MongoDB 2.0.1 on Debian (Rackspace CloudServer 256Meg))

default      - time taken: 0.2401921749115 seconds
fsync        - time taken: 358.55523014069 seconds
safe=true    - time taken: 1.1818060874939 seconds

Edit: and for journalling disabled and smallfiles=true in mongo.conf

default      - time taken: 0.15036606788635 seconds 
fsync        - time taken: 34.175970077515 seconds 
safe=true    - time taken: 1.0593159198761 seconds

The results aren’t perfect, but do show how big the difference is…


$mongo = new Mongo();
$db = $mongo->selectDB('bench');
$collection = new MongoCollection($db, 'bench');

$start = microtime(TRUE);
for ($i = 0; $i < 10000; $i++) {
  $collection->insert(array('data' => sha1(rand())));
}
$end = microtime(TRUE);
echo 'default      - time taken: '.($end-$start)." seconds \n";

$start = microtime(TRUE);
for ($i = 0; $i < 10000; $i++) {
  $collection->insert(array('data' => sha1(rand())), array('fsync' => true));
}
$end = microtime(TRUE);
echo 'fsync        - time taken: '.($end-$start)." seconds \n";

$start = microtime(TRUE);
for ($i = 0; $i < 10000; $i++) {
  $collection->insert(array('data' => sha1(rand())), array('safe' => true));
}
$end = microtime(TRUE);
echo 'safe=true    - time taken: '.($end-$start)." seconds \n";

I’m not sure that the existing number of records makes a massive amount of difference, besides through the pre-allocation of files, which we have little control over anyway – but there doesn’t look to be an increase between runs even when there are a lot of entries… (perhaps we’d see more with indexes enabled)

Each run will add an extra 20,000 entries into the collection with little perceptible slowdown.

root@test:/var/www/test# php bench1.php
default      - time taken: 0.53534507751465 seconds
safe=true    - time taken: 1.2793118953705 seconds
root@test:/var/www/test# php bench1.php
default      - time taken: 0.203537940979 seconds
safe=true    - time taken: 1.2887620925903 seconds
root@test:/var/www/test# php bench1.php
default      - time taken: 0.22933197021484 seconds
safe=true    - time taken: 1.6565799713135 seconds
root@test:/var/www/test# php bench1.php
default      - time taken: 0.19606184959412 seconds
safe=true    - time taken: 1.5315411090851 seconds
root@test:/var/www/test# php bench1.php
default      - time taken: 0.2510199546814 seconds
safe=true    - time taken: 1.2419080734253 seconds

It is hard testing on a cloud server as you are at the mercy of other users impacting the available bandwidth and processor utilisation, but you can at least see trends. I hope this has been enlightening and I hope to expand on this in future…

Edit: for the one person that asked me about storage efficiency… here goes…

 > db.bench.stats()
{
    "ns" : "bench.bench",
    "count" : 140001,
    "size" : 10640108,
    "avgObjSize" : 76.00022856979594,
    "storageSize" : 21250048,
    "numExtents" : 7,
    "nindexes" : 1,
    "lastExtentSize" : 10067968,
    "paddingFactor" : 1,
    "flags" : 1,
    "totalIndexSize" : 4562208,
    "indexSizes" : {
        "_id_" : 4562208
    },
    "ok" : 1
}

So based on this we can work out that size/storageSize = approx 50% efficiency – MongoDB on this dataset is using about the same space again on top of the raw data.

If we add in the indexes, size/(storageSize+totalIndexSize), then the result is only about 41% efficient. Personally, I think this is a reasonable tradeoff for the raw speed it gives…
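Spelling the arithmetic out with the numbers from the stats output above:

```php
<?php
// Figures taken from db.bench.stats() above.
$size           = 10640108;  // raw size of the documents
$storageSize    = 21250048;  // space allocated for the data files
$totalIndexSize = 4562208;   // space used by the _id index

$dataEfficiency  = $size / $storageSize;                     // ~0.50
$totalEfficiency = $size / ($storageSize + $totalIndexSize); // ~0.41
```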

Bash Tip #1

Often when operating on the command line you may want to re-execute something with elevated privileges. There is a shorthand way to do this rather than cutting and pasting or retyping the line.

mike@mike-P35C-DX3R:~$ updatedb
updatedb: can not open a temporary file for `/var/lib/mlocate/mlocate.db'
mike@mike-P35C-DX3R:~$ sudo !!
sudo updatedb

The first updatedb command needs to be run as root; initially I forgot the sudo, so it couldn’t update the database. Running it again with !! meant that I didn’t have to retype the whole command… Obviously this is more impressive with longer lines…

Nginx virtual host ordering

Just a quick note.

Nginx will include configuration files based on ASCII ordering. This means that if you have two files, default (with the default config) and cloud, then cloud will be included before default and will answer for any undefined hostnames.

The easy way to solve this (and obvious if you have been around Linux for a while) is to prefix all config files with a number. For example, renaming the two files above:

00-default
10-cloud

This will result in 00-default being included in the configuration first.

I hope this is useful for someone out there.

Case Study: Optimising a Cloud Application

I was recently brought in to examine the infrastructure of a small startup. This wasn’t anything really special, I do it quite often for various reasons. What was different was that they didn’t have issues with scaling out particularly – they had that working well with their shared nothing web application and mongodb backend. What they were having issues with was their infrastructure costs.

I normally work through a six-step process that has been built up over time –

  1. Monitor and gather stats/measure the problem,
  2. Standardise on a reference architecture,
  3. Add configuration management and version control,
  4. Start to define a playbook of how to do things (like up/downscale or provision new machines and clusters) and start to automate them,
  5. Bring everything to reference architecture/consolidate underutilised servers and eliminate unused infrastructure,
  6. Consider architecture changes to make it more efficient.
  7. …and repeat

I will take you through a case study showing how this process was used to lower their monthly costs. Names and details have been changed in places to protect the guilty… 😉