
Mongo Benchmarking #1

These aren’t perfect benchmarks (far from it, in fact), but I just wanted to get a rough idea of the relative cost of fsync and safe writes over normal unsafe writes…

Time taken for 10,000 inserts (no indexes, journalling on; MongoDB 2.0.1 on Debian, on a 256MB Rackspace Cloud Server):

default      - time taken: 0.2401921749115 seconds
fsync        - time taken: 358.55523014069 seconds
safe=true    - time taken: 1.1818060874939 seconds

Edit: and with journalling disabled and smallfiles=true in mongo.conf:

default      - time taken: 0.15036606788635 seconds 
fsync        - time taken: 34.175970077515 seconds 
safe=true    - time taken: 1.0593159198761 seconds
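For reference, the relevant mongo.conf lines for that second run would look something like this (a sketch of the 2.0-era INI-style config; on Debian the file is typically /etc/mongodb.conf):

# disable journalling and use smaller pre-allocated data files
nojournal = true
smallfiles = true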

The results aren’t perfect, but do show how big the difference is…

Source:

<?php
// Rough write-concern benchmark using the original PHP Mongo driver
// against a local mongod.
$mongo = new Mongo();
$db = $mongo->selectDB('bench');
$collection = new MongoCollection($db, 'bench');

// Default writes: fire-and-forget, no acknowledgement from the server.
$start = microtime(TRUE);
for ($i = 0; $i < 10000; $i++) {
  $collection->insert(array('data' => sha1(rand())));
}
$end = microtime(TRUE);
echo 'default      - time taken: '.($end - $start)." seconds \n";

// fsync: force a flush to disk for every single insert.
$start = microtime(TRUE);
for ($i = 0; $i < 10000; $i++) {
  $collection->insert(array('data' => sha1(rand())), array('fsync' => true));
}
$end = microtime(TRUE);
echo 'fsync        - time taken: '.($end - $start)." seconds \n";

// safe: wait for the server to acknowledge each write.
$start = microtime(TRUE);
for ($i = 0; $i < 10000; $i++) {
  $collection->insert(array('data' => sha1(rand())), array('safe' => true));
}
$end = microtime(TRUE);
echo 'safe=true    - time taken: '.($end - $start)." seconds \n";
?>

I’m not sure that the existing number of records makes a massive difference, other than through the pre-allocation of files, which we have little control over anyway. There doesn’t appear to be any increase between runs even once the collection holds a lot of entries… (perhaps we’d see more with indexes enabled, as sketched below)
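For a future indexed run, adding a secondary index is a one-liner in the old PHP driver (a sketch; data is the field the benchmark script above inserts):

$collection->ensureIndex(array('data' => 1));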

Each run will add an extra 20,000 entries into the collection with little perceptible slowdown.

root@test:/var/www/test# php bench1.php
default      - time taken: 0.53534507751465 seconds
safe=true    - time taken: 1.2793118953705 seconds
root@test:/var/www/test# php bench1.php
default      - time taken: 0.203537940979 seconds
safe=true    - time taken: 1.2887620925903 seconds
root@test:/var/www/test# php bench1.php
default      - time taken: 0.22933197021484 seconds
safe=true    - time taken: 1.6565799713135 seconds
root@test:/var/www/test# php bench1.php
default      - time taken: 0.19606184959412 seconds
safe=true    - time taken: 1.5315411090851 seconds
root@test:/var/www/test# php bench1.php
default      - time taken: 0.2510199546814 seconds
safe=true    - time taken: 1.2419080734253 seconds

Testing on a cloud server is hard, as you are at the mercy of other users impacting the available bandwidth and processor utilisation, but you can at least see the trends. I hope this has been enlightening, and I hope to expand on it in future…

Edit: for the one person who asked me about storage efficiency… here goes…

 > db.bench.stats()
{
    "ns" : "bench.bench",
    "count" : 140001,
    "size" : 10640108,
    "avgObjSize" : 76.00022856979594,
    "storageSize" : 21250048,
    "numExtents" : 7,
    "nindexes" : 1,
    "lastExtentSize" : 10067968,
    "paddingFactor" : 1,
    "flags" : 1,
    "totalIndexSize" : 4562208,
    "indexSizes" : {
        "_id_" : 4562208
    },
    "ok" : 1
}

So based on this we can work out that size/storageSize = 10640108/21250048 ≈ 50% efficiency… so on this dataset MongoDB is using about the same again on top of the raw data for storage.

If we add in the indexes, size/(storageSize + totalIndexSize) = 10640108/25812256 comes out at only about 41% efficient. Personally, I think this is a reasonable tradeoff for the raw speed it gives…
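As a rough sketch, the same numbers can be pulled straight out of the collStats command from PHP (assuming the same bench database and collection as above):

<?php
$mongo = new Mongo();
$db = $mongo->selectDB('bench');

// collStats returns the same document as db.bench.stats() in the shell
$stats = $db->command(array('collStats' => 'bench'));

$data  = $stats['size'] / $stats['storageSize'];
$total = $stats['size'] / ($stats['storageSize'] + $stats['totalIndexSize']);

echo 'data efficiency:  '.round($data * 100)."%\n";  // ~50% on the stats above
echo 'total efficiency: '.round($total * 100)."%\n"; // ~41% on the stats above
?>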

Bash Tip #1

Often when operating on the command line you will want to re-execute something with elevated privileges. There is a shorthand way to do this rather than cutting and pasting or retyping the line.

mike@mike-P35C-DX3R:~$ updatedb
updatedb: can not open a temporary file for `/var/lib/mlocate/mlocate.db'
mike@mike-P35C-DX3R:~$ sudo !!
sudo updatedb

The first updatedb command needs to be run as root; initially I forgot the sudo, so it couldn’t update the database. Re-running it with !! (which bash expands to the previous command line) meant that I didn’t have to retype the whole command… Obviously this is more impressive with longer commands…

Nginx virtual host ordering

Just a quick note.

Nginx includes configuration files in ASCII order. This means that if you have two files, default (with the default config) and cloud, cloud will be included before default; and because the first server block for a listen address acts as its default, cloud will answer for any undefined hostnames.

The easy way to solve this (and obvious if you have been around Linux for a while) is to prefix all config files with a number. For example:

00-default
10-wildcard.example.com
10-wildcard.foo.com

This will result in 00-default being included in the configuration first.
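The ordering itself comes from the include glob in the main config; on a Debian-style layout the relevant nginx.conf line typically looks something like this (paths vary by distro):

http {
    include /etc/nginx/sites-enabled/*;
}

(On reasonably recent nginx versions, adding the default_server flag to a listen directive is another way to pin down which server block answers for unmatched hostnames, regardless of file ordering.)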

I hope this is useful for someone out there.