Last week we started using Pingdom at Futurice. Traditionally, monitoring and uptime information is kept secret, but we didn't find any reason to continue this tradition. If we are doing a good job of keeping our services up, we can be proud of it, and if not, we need to improve.
Our goal is to have only green markers on the front page, meaning 99.9% uptime for every service, but we are still pretty far from it. However, things are improving, and after a few months we'll have better historical data.
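To put the 99.9% target in perspective, here is a small sketch of how much downtime it leaves room for. The function name `allowed_downtime` and the 30-day month are our own assumptions for illustration, not anything Pingdom defines.

```python
from datetime import timedelta


def allowed_downtime(uptime_target: float, period: timedelta) -> timedelta:
    """Return the downtime that still fits within `period` at the given target.

    `uptime_target` is a fraction between 0 and 1, e.g. 0.999 for 99.9%.
    """
    if not 0.0 <= uptime_target <= 1.0:
        raise ValueError("uptime_target must be between 0 and 1")
    return period * (1.0 - uptime_target)


if __name__ == "__main__":
    # Assumption: a 30-day month and a 365-day year.
    per_month = allowed_downtime(0.999, timedelta(days=30))
    per_year = allowed_downtime(0.999, timedelta(days=365))
    print(f"Per month: about {per_month.total_seconds() / 60:.1f} minutes")
    print(f"Per year:  about {per_year.total_seconds() / 3600:.1f} hours")
```

In other words, 99.9% allows roughly 43 minutes of downtime in a 30-day month, so a single longer outage is enough to turn a marker from green to something else.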
Some checks are more complex than a simple ping, which explains their longer response times (over 200 ms). For example, the checks for www.futurice.com and blog.futurice.com actually load the HTML page and verify that no error messages are present. At the time of writing, the Berlin router shows red. It's not actually down; they are just using a temporary internet connection after relocating to new premises.
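For the curious, the idea behind the page checks can be sketched roughly like this. This is not how Pingdom implements it; it is only a minimal illustration using the Python standard library, and the error strings in `ERROR_MARKERS`, the function names, and the 10-second timeout are all hypothetical choices of ours.

```python
import time
import urllib.error
import urllib.request

# Hypothetical error strings; a real check would use whatever the site shows on failure.
ERROR_MARKERS = ("Internal Server Error", "Fatal error", "Database connection failed")


def page_looks_healthy(status: int, body: str, error_markers=ERROR_MARKERS) -> bool:
    """Return True if the status is 200 and no known error string is in the body."""
    if status != 200:
        return False
    lowered = body.lower()
    return not any(marker.lower() in lowered for marker in error_markers)


def check_url(url: str, timeout: float = 10.0):
    """Fetch `url` and return (healthy, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 500.
        status, body = exc.code, ""
    except OSError:
        # DNS failure, refused connection, timeout and similar (URLError is an OSError).
        return False, time.monotonic() - start
    return page_looks_healthy(status, body), time.monotonic() - start


if __name__ == "__main__":
    healthy, elapsed = check_url("https://www.futurice.com")
    print(f"healthy={healthy} response_time={elapsed * 1000:.0f} ms")
```

Downloading and scanning the whole page takes longer than a bare ping, which is why these checks report higher response times.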
Pingdom only monitors some of our services and network connectivity. Internally, we use Zabbix to monitor server performance and availability, and all information in Zabbix is available to every employee. Zabbix is not perfect, but it was the best alternative that met our requirements.
One of our key principles at Futurice is openness, and we try to improve it by increasing transparency one small step at a time. With public status information, users get the relevant information without needing to contact the IT team. Unfortunately, Zabbix will keep its current "employee only" status, as it contains a great deal of confidential client information.