Premium Hosted Website & Server Monitoring Tool.

(Sysadmin / Devops blog)

Author Archives: David Mytton

About David Mytton

David Mytton is the founder of Server Density. He has been programming in PHP and Python for over 10 years, regularly speaks about MongoDB (including running the London MongoDB User Group), co-founded the Open Rights Group and can often be found cycling in London or drinking tea in Japan. Follow him on Twitter and Google+.
  1. What’s new in Server Density – Jan 2014

    2 Comments

    Each month we’ll round up all the feature changes and improvements we made this month to our server and website monitoring product, Server Density.

    New dashboard widgets – RSS feeds, cloud status, open alerts

    Yesterday we officially announced our ops dashboard which includes a range of new widgets for our dashboard. These are all useful for displaying an overview of your entire infrastructure, e.g. on a big TV!

    Dashboard in office

    There are several new widgets available:

    Cloud status

    Choose your cloud vendor and product/region and we’ll pull in the latest status from their public status feeds. This is useful to see if any alerts are being caused by known issues in your region. We support status feeds from Amazon Web Services, Rackspace Cloud, Digital Ocean, Joyent, Google Compute, Linode, IBM SoftLayer and Microsoft Azure.

    RSS feed

    If you want to see the latest items from a generic RSS feed or a cloud provider we don’t support, you can enter the URL and we’ll pull in the latest items.

    Open alerts

    Display how many open alerts there are on your entire account, at a group level or on specific devices or service checks.

    Group alerts for service checks

    You have been able to create alerts on a group level for devices/servers for some time, but we have now extended this functionality to service web checks too. This means you only need to create the alerts once on a group level and all members of that group will inherit the alert config.

    You can configure group alerts when viewing the Alerting tab for a particular web check or by clicking the name of the group in the services list.

    Service check group alerts

    What’s next?

    We’re planning to submit our iPhone app to Apple at the end of this month and the Android app will follow shortly afterwards at the start of Feb. We’ll then be moving on to more detailed process monitoring as well as a range of improvements to the device view with better default graphs for specific metrics.

  2. Picking server hostnames

    8 Comments

    Last night we had an issue in our failover data center in San Jose, USA. We use the internal DNS resolvers of our data center provider (Softlayer) in /etc/resolv.conf on all our servers, and their resolvers went down. This didn’t cause any customer impact but meant that all the servers in that data center were unable to resolve remote hostnames.

    Our workaround for this was to change the resolvers to something else and because we’re using Puppet, we can do this very quickly across a large cluster of machines. However, we only wanted to apply the change to the SJC data center.

    Our hostname policy is fairly straightforward. An example web server is:

    hcluster3-web1.sjc.sl.serverdensity.net

    This is made up of several parts:

    • hcluster3 – this describes what the server is used for. In this case, it’s part of cluster 3, which hosts our alerting and notification service (all of Server Density is built using a service orientated architecture). Other examples could be mtx2 (our time series metrics storage cluster, version 2) or sdcom (servers which power our website).
    • web1 – this is a web server (either Apache or nginx) and is number 1 in the cluster. We have multiple load balanced web servers.
    • sjc – this is the data center location code, San Jose in this case. We also have locations like wdc (Washington DC) or tyo (Tokyo).
    • sl – this is the facility vendor name, Softlayer in this case. We also have vendors like rax (Rackspace) and aws (Amazon Web Services).
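    As a rough illustration of why a structured name helps with filtering, here is a minimal Python sketch that splits a hostname into the parts listed above. The regular expression, the HostInfo type and the parse_hostname function are assumptions made for this example and are not part of our tooling.

```python
import re
from typing import NamedTuple


class HostInfo(NamedTuple):
    cluster: str   # e.g. "hcluster3"
    role: str      # e.g. "web"
    index: int     # e.g. 1
    location: str  # e.g. "sjc"
    vendor: str    # e.g. "sl"


# Hypothetical pattern: <cluster>-<role><index>.<location>.<vendor>.<base domain>
HOSTNAME_RE = re.compile(
    r"^(?P<cluster>[a-z0-9]+)-(?P<role>[a-z]+)(?P<index>\d+)"
    r"\.(?P<location>[a-z]+)\.(?P<vendor>[a-z]+)\."
)


def parse_hostname(hostname: str) -> HostInfo:
    """Split a hostname that follows the convention into its parts."""
    match = HOSTNAME_RE.match(hostname.lower())
    if match is None:
        raise ValueError(f"hostname does not follow the convention: {hostname}")
    return HostInfo(
        cluster=match["cluster"],
        role=match["role"],
        index=int(match["index"]),
        location=match["location"],
        vendor=match["vendor"],
    )


info = parse_hostname("hcluster3-web1.sjc.sl.serverdensity.net")
print(info.location, info.vendor)
```

    With the parts separated like this, selecting every server at one vendor or in one location becomes a simple comparison on a field.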

    When we had a much smaller number of servers, the naming convention was based on characters in His Dark Materials by Philip Pullman. So for example, a master database server was Lyra with the slave being Pan. Picking names like this doesn’t scale after 10 or so servers but then you can transition to other things, like names of stars, lakes, rivers, etc.

    xkcd: Permanence

    We moved to the current naming structure a few years ago and this now allows us to quickly identify key information about our servers but also helps when we want to filter by provider or specific locations.

    In our Puppet /etc/resolv.conf template we can then do things like:

    <%# Dots are escaped so they match literally, not any character -%>
    <% if (domain =~ /(^|\.)sl\./) -%>
    <% if (domain =~ /(^|\.)sjc\.sl\./) -%>
    # google DNS - temp until SL fixed
    nameserver 8.8.8.8
    nameserver 8.8.4.4
    <% else -%>
    # Internal Softlayer DNS
    nameserver 10.0.80.11
    nameserver 10.0.80.12
    <% end -%>
    ...

    And when it comes to using the Puppet console, we can quickly trigger puppet runs or take actions against specific nodes:

    Puppet Filtering

    How do other people do this?

    If we dissect a traceroute to the Server Density website we can pick out locations (lon, nyc, wdc) and router names:

    22/01/14 12:28:18 david@dm-mba ~: mtr serverdensity.com
     1. 10.0.0.1
     2. c.gormless.thn.aa.net.uk
     3. a.aimless.thn.aa.net.uk
     4. bbr01.lon01.networklayer.com
     5. ae1.bbr02.tl01.nyc01.networklayer.com
     6. ae7.bbr01.tl01.nyc01.networklayer.com
     7. ae1.bbr01.eq01.wdc02.networklayer.com
     8. ae0.dar02.sr01.wdc01.networklayer.com
     9. po2.fcr03.sr02.wdc01.networklayer.com
    10. gip.wdc.sl.serverdensity.net

    Our ISP uses names related to the use case with “-less” appended, because when they ordered their first piece of equipment they took the structure from the Dilbert cartoon of the day!

    Looking at how Amazon route traffic for the static website hosting feature of S3, you can even look at network maps to visualise where packets are flowing from the UK, to Osaka and then Tokyo in Japan.

    22/01/14 12:28:50 david@dm-mba ~: mtr serverdensity.jp
     1. 10.0.0.1
     2. c.gormless.thn.aa.net.uk
     3. d.aimless.thn.aa.net.uk
     4. 83.217.238.93
     5. ae-7.r23.londen03.uk.bb.gin.ntt.net
     6. as-0.r22.osakjp01.jp.bb.gin.ntt.net
     7. ae-0.r22.osakjp02.jp.bb.gin.ntt.net
     8. 61.200.91.154
     9. 27.0.0.199
    10. 27.0.0.2011
    11. 103.246.151.32
    12. 103.246.151.38
    13. s3-website-ap-northeast-1.amazonaws.com

    This works nicely for a relatively small number of servers which are not transient and have defined workloads but it could be problematic if you have a huge number of machines and/or frequently launch/destroy transient cloud instances or VMs. You can see how Amazon deals with this by setting the hostname to the internal IP address but I could see other solutions where you use an instance ID instead. I’d be interested to learn how other people are doing this.

  3. What’s new in Server Density – Dec 2013

    Leave a Comment

    Each month we’ll round up all the feature changes and improvements we made this month to our server and website monitoring product, Server Density.

    December is always a strange month because the final weeks are taken up with holidays, so we decided to focus on many smaller improvements. There were a lot of changes, bug fixes and performance improvements behind the scenes but these are the main things you might notice.

    Resizable dashboard widgets

    All dashboard widgets can now be resized to any reasonable size you want. This allows you to fit multiple graphs on the same line and reorder the widgets as you like. Simply hover over the right side bar of the widget you want to resize and click, hold and drag the widget left or right.

    Resizable dashboard

    Linux/Mac/FreeBSD agent: aggregate CPU stats & i/o stats

    The latest version of the Linux, Mac and FreeBSD Python-based monitoring agent now returns aggregated CPU stats so you can see a value for “ALL” as well as each individual CPU core. This is useful if you have many cores.

    i/o stats are also now collected on OS X with metrics for kilobytes per transfer, transfers per second and megabytes per second on disk0.

    CPU stats are now collected on OS X too.

    These new metrics are for v2 users only and are available just by updating your agent. There are other bug fixes too, mentioned in the full release notes.

    CPU Stats

    Windows agent: i/o stats

    The new Windows agent release also introduces new stats for disk i/o: disk read/writes (bytes/s), disk % utilization, average disk queue length, disk reads/writes per second, average seconds per transfer.

    These new metrics are for v2 users only and are available just by updating your agent. There are other bug fixes too, mentioned in the full release notes.

    Windows i/o stats

    Improved MySQL replication error monitoring

    When there is an error in MySQL replication, MySQL reports the Seconds_Behind_Master status as NULL. This is now handled and reported as -1 to allow alerting.
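    To show why the -1 value matters, here is a small Python sketch of the idea. It is not the agent's actual code: the function names and the 60 second max_lag default are assumptions for illustration.

```python
from typing import Optional

REPLICATION_ERROR = -1  # sentinel reported when replication is broken


def normalise_seconds_behind(raw: Optional[int]) -> int:
    """Map MySQL's NULL (None in Python) to -1 so a numeric alert can fire."""
    return REPLICATION_ERROR if raw is None else int(raw)


def should_alert(seconds_behind: int, max_lag: int = 60) -> bool:
    """Alert on a replication error (-1) or when lag exceeds max_lag seconds."""
    return seconds_behind == REPLICATION_ERROR or seconds_behind > max_lag


print(should_alert(normalise_seconds_behind(None)))
```

    Because NULL becomes a number, an alert can be written against the value -1 in the same way as any other threshold.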

    Updated product documentation

    With the migration of users from the v1 product, our old support site was out of date and confusing as some articles referenced v1 and some v2. We’ve now updated all the docs so they reference v2 only.

    This includes documentation on group alerts, the notification center filters, deploying the agent automatically and various time saving features like multi select.

    For those still on v1, we recommend migrating your account. We’ll be announcing the ability to have an engineer support you with migrations in 2014.

    Displaying agent version

    The currently installed agent version is now shown when viewing each device, with links through to the release notes.

    Agent version

    Improvements for small screens

    We’ve made some initial improvements to help users on smaller screen sizes e.g. laptops, to ensure the UI resizes and adjusts without wrapping.

    Resize window

    Triggered alerts highlighted

    When viewing alert configs, any config with a triggered alert will be highlighted.

    Highlight triggered

    Mb/s network traffic alerting

    We’ve supported graphing network traffic data in MB/s (megabytes per second) and Mb/s (megabits per second) for a long time but only supported alerting in MB/s. Alert configs can now be created in Mb/s too.
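    The two units differ by a factor of 8 (bits per byte), so a threshold in one can be converted to the other. A minimal sketch, with helper names made up for this example:

```python
BITS_PER_BYTE = 8


def megabytes_to_megabits(mb_per_s: float) -> float:
    """Convert MB/s (megabytes per second) to Mb/s (megabits per second)."""
    return mb_per_s * BITS_PER_BYTE


def megabits_to_megabytes(mbit_per_s: float) -> float:
    """Convert Mb/s (megabits per second) to MB/s (megabytes per second)."""
    return mbit_per_s / BITS_PER_BYTE


# A 100 Mb/s link is saturated at 12.5 MB/s.
print(megabits_to_megabytes(100))
```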

    What’s next?

    The main release for January will be our mobile apps for both iPhone and Android, supporting push notifications. We’re also working on detailed process level monitoring.

  4. Saving $500k per month buying your own hardware – cloud vs colocation

    15 Comments

    This post was originally published on GigaOm on 7th Dec.

    Last week I compared cloud instances against dedicated servers, showing that for long running uses such as databases it’s significantly cheaper not to use the cloud, but that’s not the end of it. Because you are still paying on a monthly basis, if you project the costs out 1 or 3 years you end up paying much more than it would have cost to purchase the hardware outright. This is where buying your own hardware and colocating it becomes a better option.

    Continuing the comparison with the same specs for a long running database instance, if we price a basic Dell R415 with 2 processors, each with 8 cores, 32GB RAM, a 500GB SATA system drive and a 400GB SSD, then the one-time list price is around $4000 – less than half the price of the SoftLayer server at $9,468/year in the previous article.

    Dell PowerEdge R415 front

    Of course, the price you pay SoftLayer includes power and bandwidth and these are fees which depend on where you locate your server. Power usage is difficult to calculate because you need to actually stress test the server to figure out the maximum draw and then run real workloads to see what your normal usage is.

    My company, Server Density, just started experimenting with running our own hardware in London. We tested our 1U Dell, which has very similar specs to those discussed above: it drew 0.6A in normal use and 1.2A when stress tested with everything maxed out. Hosting this with the ISP who supplies our office works out at $161/month or $1932/year (it would work out cheaper to get a whole rack at a big data centre but this was just our first step).

    This makes the total annual cost look as follows:

    serverdensitychart

    Remember, again, that this is a database server so whilst with Rackspace, Amazon and SoftLayer you pay that price every year, after the first year with colocation the annual cost drops to $1932 because you already own the hardware. Further, the hardware can also be considered an asset which has tax benefits.
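    The effect of owning the hardware is easiest to see as a running total. This is a rough sketch using only the figures quoted above (the approximate $4000 list price, $1932/year hosting and the $9,468/year SoftLayer server); the function names are made up for the example and it ignores tax, failures and staff time.

```python
HARDWARE_ONE_OFF = 4000      # approximate Dell R415 list price (USD)
COLO_PER_YEAR = 1932         # $161/month hosting
SOFTLAYER_PER_YEAR = 9468    # $789/month rented dedicated server


def colo_total(years: int) -> int:
    """Hardware is bought once; hosting is paid every year."""
    return HARDWARE_ONE_OFF + COLO_PER_YEAR * years


def rented_total(years: int) -> int:
    """A rented server costs the same amount every year."""
    return SOFTLAYER_PER_YEAR * years


for years in (1, 2, 3):
    print(years, colo_total(years), rented_total(years))
```

    On these numbers the colocated server costs $5,932 in the first year and $9,796 over three years, against $28,404 over three years for the rented server.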

    Server Density is still experimenting on a small scale but I spoke to Mark Schliemann, VP of Technical Operations at Moz.com, because they run a hybrid environment. They recently moved the majority of their environment off AWS and into a colo facility with Nimbix but are still using AWS for processing batch jobs (the perfect use case for elastic cloud resources).

    Moz worked on detailed cost comparisons to factor in the cost of the hardware leases (routers, switches, firewalls, load balancers, SAN/NAS storage & VPN), virtualization platforms, misc software, monitoring software/services, connectivity/bandwidth, vendor support, colo and even travel costs. Using this to calculate their per server costs means on AWS they would spend $3,200/m vs $668/m with their own hardware. Projecting out 1 year results in costs of $8,096 vs AWS at $38,400.

    Moz’s goal for the end of Q1 2014 is to be paying $173,000/month for their own environment plus $100,000/month for elastic AWS cloud usage. If they remained entirely on AWS it would work out at $842,000/month.
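    Putting those monthly figures side by side gives the saving in the title. A quick sketch of the arithmetic, using only the numbers Moz shared (the variable names are just for this example):

```python
OWN_ENVIRONMENT = 173_000   # target monthly cost of Moz's own environment
ELASTIC_AWS = 100_000       # target monthly elastic AWS usage
ALL_AWS = 842_000           # estimated monthly cost if entirely on AWS

hybrid_total = OWN_ENVIRONMENT + ELASTIC_AWS
monthly_saving = ALL_AWS - hybrid_total

print(hybrid_total, monthly_saving)
```

    The hybrid setup comes to $273,000/month, which is $569,000/month less than staying entirely on AWS.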

    Optimizing utilization is much more difficult on the cloud because of the fixed instance sizes. Moz found they were much more efficient running their own systems virtualized because they could create the exact instance sizes they needed. Cloud providers often increase CPU allocation alongside memory when in real world uses you tend to need one or the other. Running your own environment allows you to optimize this and was one of the big areas Moz have used to improve their utilization. This has helped them become much more efficient with spend.

    Right now we are able to demonstrate that our colo is about 1/5th the cost of Amazon but with RAM upgrades to our servers to increase capacity we are confident we can drive this down to something closer to 1/7th the cost of Amazon.

    Colocation has its benefits once you’re established

    Colocation looks like a winner but there are some important caveats:

    • First and foremost, you need in-house expertise because you need to build and rack your own equipment and design the network. Networking hardware can be expensive and if things go wrong, you need to have the knowledge about how to deal with the problem. This can involve support contracts with vendors and/or training your own staff. However, this does not usually require hiring new people because the same team that has to deal with cloud architecture, redundancy, failover, APIs, programming, etc, can work on the ops side of things running your own environment.
    • The data centers chosen have to be easily accessible 24/7 because you may need to visit at unusual times. This means having people on-call and available to travel, or paying remote hands at the data center high hourly fees to fix things.
    • You have to purchase the equipment upfront which means large capital outlay but this can be mitigated by leasing.

    So what does this mean for the cloud? On a pure cost basis, buying your own hardware and colocating it is significantly cheaper. Many will say that the real cost is hidden with staffing requirements but that’s not the case because you still need a technical team to build your cloud infrastructure.

    At a basic level, compute and storage are commodities. The way the cloud providers differentiate is with supporting services. Amazon has been able to iterate very quickly on innovative features, offering a range of supporting products like DNS, mail, queuing, databases, auto scaling and the like. Rackspace has been slower to do this but is now starting to offer similar features.

    Flexibility of cloud needs to be highlighted again too. Once you buy hardware you’re stuck with it for the long term but the point of the example above was that you had a known workload.

    Considering the hybrid model

    Perhaps a hybrid model makes sense, then? This is where I believe a good middle ground is and we can see Moz making good use of such a model. You can service your known workloads with dedicated servers and then connect to the public cloud when you need extra flexibility. Data centers like Equinix offer Direct Connect services into the big cloud providers for this very reason, and SoftLayer offers its own public cloud to go alongside dedicated instances. Rackspace is placing bets in all camps with public cloud, traditional managed hosting, a hybrid of the two and support services for OpenStack.

    And when should you consider switching? Dell cloud exec Nnamdi Orakwue said companies often start looking at alternatives when their monthly AWS bill hits $50,000, but is even this too high?

  5. Cloud vs dedicated pricing – which is cheaper?

    Leave a Comment

    This post was originally published on GigaOm on 29th Nov.

    Cloud infrastructure is the natural starting point for any new project because unknown requirements are one of its two ideal use cases; the other is where you need elasticity to run workloads for short periods at large scale, or to handle traffic spikes. The problem comes months later when you know your baseline resource requirements.

    Let’s consider a high throughput database as an example. Most web applications have a database storing customer information behind the scenes but whatever the project, requirements are very similar – you need a lot of memory and high performance disk I/O.

    Evaluating pure cloud

    Looking at the costs for a single instance illustrates the requirements. In the real world you would need multiple instances for redundancy and replication but we will just work with a single instance for now:

    Amazon EC2 c3.4xlarge (we can’t consider m2.2xlarge because it is not SSD backed)
    = 30GB RAM, 320GB SSD storage
    = $1.20/hr or $3726 + $0.298/hr heavy utilization reserved

    Rackspace Cloud 30GB Performance
    = 30GB RAM, 300GB SSD storage
    = $1.36/hr

    Databases also tend to exist for a long time and so don’t generally fit into the elastic model. This means you can’t take advantage of the hourly or minute based pricing that makes cloud infrastructure cheap in short bursts.

    So extend those costs on an annual basis:

    Amazon EC2 c3.4xlarge heavy utilization reserved
    = $3,726 + ($0.298 * 24 * 365)
    = $6,336.48
    Rackspace Cloud 30GB Performance
    = $1.36 * 24 * 365
    = $11,913.60

    Another issue with databases is they tend not to behave nicely if you’re contending for I/O on a busy host so both Rackspace and Amazon let you pay for dedicated instances — on Amazon this has a separate fee structure and on Rackspace you effectively have to get their largest instance type. Calculating those costs out for our annual database instance would look like this:

    Amazon EC2 c3.4xlarge dedicated heavy utilization reserved
    = $4099 + ($0.328 + $2.00) * 24 * 365
    = $24,492.28
    Rackspace Cloud 120GB Performance
    = $5.44 * 24 * 365
    = $47,654.40

    (The extra $2 per hour on EC2 is charged once per region)
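    The annual figures above all follow the same formula: any upfront fee plus the hourly rate multiplied by the hours in a year. A small sketch to reproduce them, where annual_cost is a helper invented for this example and the prices are the list prices quoted in this post:

```python
HOURS_PER_YEAR = 24 * 365


def annual_cost(hourly: float, upfront: float = 0.0) -> float:
    """Upfront reservation fee plus a full year of hourly charges."""
    return round(upfront + hourly * HOURS_PER_YEAR, 2)


shared = {
    "ec2_reserved": annual_cost(0.298, upfront=3726),
    "rackspace_30gb": annual_cost(1.36),
}
dedicated = {
    # $2.00/hr is the per-region dedicated instance fee
    "ec2_dedicated_reserved": annual_cost(0.328 + 2.00, upfront=4099),
    "rackspace_120gb": annual_cost(5.44),
}

print(shared)
print(dedicated)
```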

    Note that because we have to go for the largest Rackspace instance, the comparison isn’t direct — you’re paying Rackspace for 120GB RAM and x4 300GB SSDs. On one hand this isn’t a fair comparison because the specs are entirely different but on the other hand, Rackspace doesn’t have the flexibility to give you a dedicated 30GB instance.

    Consider the dedicated hardware option…

    Given the annual cost of these instances, the next logical step is to consider dedicated hardware where you rent the resources and the provider is responsible for upkeep. Here at Server Density, we use Softlayer, now owned by IBM, and have dedicated hardware for our database nodes. IBM is becoming very competitive with Amazon and Rackspace so let’s add a similarly spec’d dedicated server from SoftLayer, at list prices:

    To match a similar spec we can choose the Dual Processor Hex Core Xeon 2620 – 2.0GHz Sandy Bridge with 32GB RAM, 32GB system disk and 400GB secondary disk. This costs $789/month or $9,468/year. This is 80 percent cheaper than Rackspace and 61 percent cheaper than Amazon before you add data transfer costs – SoftLayer includes 5,000GB of data transfer per month which would cost $600/month on both Amazon and Rackspace, a saving of $7,200/year.
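    For anyone who wants to check the percentages, they compare the SoftLayer annual price against the dedicated instance prices calculated earlier in this post. A minimal sketch, with percent_cheaper being a helper name made up for the example:

```python
def percent_cheaper(price: float, compared_to: float) -> int:
    """Whole-number percentage by which price undercuts compared_to."""
    return round((1 - price / compared_to) * 100)


SOFTLAYER = 789 * 12             # $9,468/year
RACKSPACE_DEDICATED = 47_654.40  # Rackspace Cloud 120GB Performance, annual
AMAZON_DEDICATED = 24_492.28     # EC2 c3.4xlarge dedicated reserved, annual

print(percent_cheaper(SOFTLAYER, RACKSPACE_DEDICATED))
print(percent_cheaper(SOFTLAYER, AMAZON_DEDICATED))
```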

    … or buy your own

    There is another step you can take as you continue to grow — purchasing your own hardware and renting data center space i.e. colocation. We’ll look into the tradeoffs on that scenario in a post to come so make sure you subscribe.

  6. Measuring load against temperature

    Leave a Comment

    We’ve had our first colocated server running live for just over a week now and have written some custom plugins for Server Density to help us monitor hardware metrics like temperature and power draw. We’ll be releasing these in our plugin directory shortly (there are already 2 plugins for Dell temperature monitoring and Dell fan monitoring) but there are some fun, interesting initial observations regarding load and temperature.

    Anyone who has ever run a full screen flash video or Google Hangouts knows that they are good stress tests for your CPU, and after a while the fans start spinning as the CPU usage causes temperatures to increase.

    Now we’re graphing that, we can actually see and prove the direct correlation between CPU load and temperature. Look how closely they match each other:

    Load against temperature

    Load against temp last week

    Of course this also means more power is being drawn and we can see a similar correlation between load and power usage:

    Load against power

    Temperature against power

    These graphs are cool to look at, and there’s a purpose behind monitoring metrics of this kind – making sure they stay within acceptable ranges. We want to know if temperatures suddenly go up (could indicate a failed fan or a data center issue), if power suddenly drops to zero on one PSU (again, failure of that PSU) and same for fan speed if it drops too low.
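    As a sketch of what those range checks could look like, here is a small Python example. The metric names and the limits are hypothetical, chosen only for illustration; sensible values depend on your hardware and data center, and this is not the code of our plugins.

```python
from typing import Dict, List

# Hypothetical limits; real values depend on the hardware and data center.
MAX_TEMP_C = 35.0
MIN_FAN_RPM = 2000.0


def hardware_alerts(sample: Dict[str, float]) -> List[str]:
    """Return a message for every metric outside its acceptable range."""
    alerts = []
    if sample["temp_c"] > MAX_TEMP_C:
        alerts.append("temperature high: possible failed fan or data center issue")
    for psu in ("psu1_watts", "psu2_watts"):
        if sample[psu] == 0:
            alerts.append(f"{psu} draws no power: possible PSU failure")
    if sample["fan_rpm"] < MIN_FAN_RPM:
        alerts.append("fan speed too low")
    return alerts


# Made-up sample reading: one PSU has stopped drawing power.
print(hardware_alerts(
    {"temp_c": 24.0, "psu1_watts": 0.0, "psu2_watts": 180.0, "fan_rpm": 4200.0}
))
```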

    For more information on how you could start monitoring metrics like this, it’s worth trying out our hosted server monitoring software. After all, it’s free for 15 days!

  7. What’s new in Server Density – Nov 2013

    Leave a Comment

    Each month we’ll round up all the feature changes and improvements we made this month to our server and website monitoring product, Server Density.

    Dashboard and custom graph builder

    At the start of the month we released the ability to create custom graphs by combining any metric from any device or service check. These graphs can then be arranged onto custom dashboards alongside status widgets for service checks, showing uptime, response time and status.

    Full details are in the announcement post and you can read more about the frontend engineering behind the project, which was based on a number of design principles.

    hosted server monitoring dashboard

    Improved notification center filtering

    The notification center allows you to see all open and closed alerts across your account. Previously the red sign that shows the count of open alerts would be a global count across your whole account. You can now toggle a filter to have this only count alerts where you are a recipient so the count only reflects the alerts you care about.

    There’s also a new toggle on the far right which allows you to disable or enable the context sensitive nature of the notification center. As you browse the app the notification center will change to show only device or service specific alerts depending on where you are in the app. This toggle allows you to disable this if you always want to see the global view.

    Notification center filters

    Postbacks API

    You can now post back data to us through the Server Density v2 API without needing to install the agent. Many customers used this in Server Density v1 to send data through from custom devices, scripts or other areas of their infrastructure. The API documentation has been updated to expose this new method.

    Users API

    The Server Density API has been updated to add user management so you can add, edit and delete users on your account.

    New pricing packages

    We found we were often getting requests for packages in between the old server and web check quotas of our 1, 10, 50 and 100 packages. As such, we’ve added new packages in between at 2, 5, 25 and 75. Pricing is mostly the same but has been slightly decreased or increased to make the jumps logical. Existing customers remain on their existing packages with no changes to prices but you can just switch to the new ones from within your account.

    Server Density pricing packages

    Increased free SMS alerts

    We are now offering more free SMS credits for each pricing package. The limits have been set high enough to make them effectively unlimited, although there is still a cap which is based on the package you are on. You can see the new included SMS credits here.

    For existing customers the increased SMS credits will be automatically applied to your account on your next billing date.

    Geckoboard integration

    Server Density v2 is now integrated into Geckoboard so you can pull out stats into your existing Geckoboard dashboards. It supports pulling graphs and current values as well as service web check monitoring response times and statuses.

    Server Density Geckoboard

    Flexiant integration

    The Flexiant cloud platform now has an official plugin in their latest release which enables you to easily add, remove or change the status of servers being monitored automatically as you make the same changes inside your Flexiant platform. Flexiant is a cloud management platform which gives service providers, telcos and others the ability to create and sell cloud services.

    Flexiant

    What’s next?

    We’ve started work on our mobile apps for iPhone and Android so you can get push notification alerts and manage your notifications from your device. These will be out in January.

  8. Building your own Chaos Monkey

    4 Comments

    Back in 2012, Netflix released perhaps the most famous component from their range of tools that help them run their cloud environment on Amazon Web Services – the Chaos Monkey – something they had been using since at least 2010.

    Chaos Monkey is a service which runs in the Amazon Web Services (AWS) that seeks out Auto Scaling Groups (ASGs) and terminates instances (virtual machines) per group.

    The idea is to randomly kill parts of your infrastructure to check the redundancy of components and ensure you can handle failover.

    We decided to build our own lightweight version as a simple Python script because we don’t use AWS or Java. When doing this, or using the actual Netflix code, there are some considerations to keep in mind:

    Design principles

    Trigger chaos events during business hours

    Although real failures can happen at any time, you want deliberate failures to happen when people are around to a) respond to them and b) fix them. It’s not fair to be waking people up with unnecessary on-call events in the middle of the night! Consider what business hours are not just for you but your entire team, especially if you have remote workers or several offices in different timezones.

    Decide what level of mystery you want

    When our script triggers a chaos event, it posts into our Hipchat room saying it has done so, but not what the details are. This partially simulates a real outage because in the initial stages you need to triage the alerts to see where the failures are but it means that everyone knows to look out for strange things. This prevents issues going unnoticed so you can see if you need to improve your monitoring not by being told stuff is broken by customers, but by discovering the results of a known chaos event.

    Have several failure modes

    Killing instances is just one way to simulate failure but doesn’t cover all possible options. It’s good to try and simulate as many different complete or partial failures as possible. We use the Softlayer API to trigger server power downs but also use their API to disable public and/or private networking interfaces too. This gives you full failure with a power off but also a network failure mode where the host still remains up (and may even still report up to your monitoring).

    Don’t trigger sequential events

    After one chaos event you don’t want to have to deal with another one just a short time later, especially if the bugs discovered aren’t fixed yet. Have a wait period so after triggering one event another one won’t be triggered for a few hours. You don’t want people constantly firefighting.

    Play around with the event probability

    Events should be infrequent and random and there may be none triggered for several days. This helps to test your on-call response to keep the unexpected nature of these kinds of events real.
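    The principles above can be combined into a single decision function. This is a minimal sketch and not our actual script: the business hours, the 4 hour wait, the 5% probability and the failure mode names are all assumptions for illustration, and actually triggering the failure (for example through your provider's API) and posting to chat are left out.

```python
import random
from datetime import datetime, timedelta
from typing import Optional

BUSINESS_HOURS = range(10, 17)   # 10:00-16:59, hypothetical
MIN_GAP = timedelta(hours=4)     # wait period between events, hypothetical
EVENT_PROBABILITY = 0.05         # chance per scheduled check, hypothetical
FAILURE_MODES = ("power_off", "disable_public_network", "disable_private_network")


def pick_chaos_event(
    now: datetime,
    last_event: Optional[datetime],
    rng: random.Random,
) -> Optional[str]:
    """Return a failure mode to trigger, or None if no event should fire."""
    # Only during business hours on weekdays (Monday=0 .. Friday=4).
    if now.weekday() > 4 or now.hour not in BUSINESS_HOURS:
        return None
    # Don't trigger sequential events: respect the wait period.
    if last_event is not None and now - last_event < MIN_GAP:
        return None
    # Keep events infrequent and random.
    if rng.random() >= EVENT_PROBABILITY:
        return None
    # Several failure modes, chosen at random.
    return rng.choice(FAILURE_MODES)


print(pick_chaos_event(datetime.now(), None, random.Random()))
```

    Run from a scheduler every few minutes, most calls return None, which is what keeps the events rare. Passing in the clock and the random number generator makes the rules easy to test.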

    Initial findings

    None of the issues that we discovered from these chaos events were in the server level software – failover in load balancers (Nginx) and databases (MongoDB), for example, works very well. Every problem we have found has been in our own code, mostly in how the code interacts with databases in failover mode and mostly in libraries we’ve not written. This has allowed us to report bugs upstream and improve the resiliency of our own software but does require some engineering time and effort to get right.

    Using a Chaos Monkey is really the only way to test how your infrastructure will behave under unknown failure conditions. Failure will happen so you have to engineer around it – doing so in a theoretical way only goes so far and the only test is to trigger random events in the real world.

  9. Custom graphs and dashboards in Server Density

    Leave a Comment

    Today we have released the ability to create custom graphs by combining any metric from any device or service check. These graphs can then be arranged onto custom dashboards alongside status widgets for service checks, showing uptime, response time and status.

    hosted server monitoring dashboard

    Custom graphs allow you to compare metrics across multiple servers and design the graphs you need for troubleshooting or monitoring metrics across clusters. Design your own graphs with any metric, including plugins and with the choice of which axis to plot the line on.

    These can then be placed onto dashboards designed to be viewed on a big TV or separate display, so you can see the status at a glance. You can create multiple dashboards across multiple time ranges, all shared within your account.

    This is available now for all Server Density v2 accounts. Initial dashboard widgets are service check status, uptime and response time plus graphs – we want to hear what other widgets you’d like to see next!

    Read about how we designed the graphs in our previous post.

    Custom graph builder - pick the metrics you want to graph.

    Pick metrics across any device or service check to graph, and choose which axis to plot.

  10. What’s new in Server Density – Oct 2013

    3 Comments

    Each month we’ll round up all the feature changes and improvements we made this month to our server and website monitoring product, Server Density.

    Global notification center

    An expandable right hand panel reveals the new notification center which gives you a global view of alerting on your account. You can see all open alerts across devices and service checks, filtering by specific group, device, service or whether the alert is notifying you or across the whole account. It’s context sensitive so changes depending on what your current view is and allows you to view the alert history for closed alerts.

    It will also display any errors for your cloud servers, such as failing to start cloud instances.

    Notification center

    This will form the basis for our upcoming mobile apps, which will initially focus on alerts before expanding to all functionality.

    Improved service monitoring accuracy

    We received a number of reports of service monitor web checks reporting services as down or timing out because the monitoring nodes were very sensitive to network issues. We’ve pushed out changes to fix this, so you will see these false positives resolved: no more incorrect alerts saying sites are down when they are up because of transient timeouts. The check methodology now retries failed requests within a few seconds to verify that the service is actually down.
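    The general retry idea can be sketched in a few lines of Python. This is an illustration of the technique rather than our monitoring node code: the confirmed_down name, the retry count and the delay are assumptions.

```python
import time
from typing import Callable


def confirmed_down(
    check: Callable[[], bool],
    retries: int = 2,
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if the check fails on the first attempt and every retry."""
    for attempt in range(retries + 1):
        if check():
            return False  # service responded, so this is not an outage
        if attempt < retries:
            sleep(delay_seconds)  # brief pause before retrying
    return True


# A check that times out once and then succeeds is not reported as down.
results = iter([False, True])
print(confirmed_down(lambda: next(results), sleep=lambda s: None))
```

    Here check is any function that returns True when the site responds correctly, so a single transient timeout no longer raises an alert.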

    Left / right graph axis

    You can choose which axis each graph series will be plotted on, which allows you to view series with different scales or units on the same graph. This makes it easier to compare metrics which have very different values and still be able to see the spikes.

    Left right graph axis

    This is part of new base functionality released ready for our custom graph builder and dashboard feature, due at the start of November.

    Group level “no data received” alerts

    You can now set up “no data received” alerts on a group level rather than needing to do it on individual servers. This was originally postponed as it requires active checking on a regular schedule across all members of a group, which is more complex to implement than on individual devices.

    v1 migrations

    All v1 users can now self migrate their own accounts to v2 by clicking the tab in-app. This will migrate all settings but still let you use v1 alongside v2, so as not to affect production monitoring. Full details are here.

    What’s next?

    At the start of November we’ll be releasing our custom graph builder and dashboards to allow you to create graphs combining metrics across devices and service checks, plus creating custom dashboards to display your metrics and graphs.