Tom Peruzzi's thoughts on digital, innovation, IT and operations

DevOps as the solution?

Posted in organizational by opstakes on October 18, 2010

Over the last few weeks I have picked up more and more information about DevOps and how good it is supposed to be. I even started posting on some DevOps blogs, and at first I really liked it: it looked like a good approach for pushing the idea of an operational platform. Nothing new, but another good driver for bringing Operations to the front as a discipline of operating and engineering.

The more I think about it, the more I believe that this is just one more approach, and that it will take much longer, and many more approaches like DevOps, to turn Operations from a barrier into a driver in a great many organizations.

You disagree? Here is why. In the past we have seen some very interesting scenarios. One, supported for a very long time by all departments within companies, was to see IT operators as the gatekeepers of truth. Whoever survived talking to them, and however they managed it, was a hero. Introducing new functionality was close to impossible, and the IT operators always believed that they were saving the business’s life by acting as a barrier to innovation. Keeping things slow was king to them.

The other typical operations department was a little more open, because its members were seen as the IT engineers who were unable to write code. These third-class employees needed work, so why not let them act on the simple infrastructure level? They really did not like Development, and you know why … They developed their own style and their own culture, and this was probably not what the business intended. They tried to establish themselves as the better IT within IT …

Those two types of department really need DevOps or alternatives such as our Platform Engineering, dialects of ITIL or others. The question is how to transform barrier thinking into business enabling.

If you understand operations as the foundation of the business, then you definitely need a strong, accepted operations team that is rich in engineers: not barrier-minded people, not third-class developers, not people who stopped inventing in the mainframe era. Those will accept neither DevOps nor anything else; everything has been there before, and they know when and why …

Beyond DevOps, think less about the tool (DevOps may be one) and more about how you can transform your IT ops organization into a living one that is well accepted, with good and strong communication, transparent operations and a willingness to be measured by KPIs. Starting the cultural change may well lead you to DevOps, but DevOps by itself will not give your organization good communication, transparent ops …


On Demand FTEs

Posted in general failures, organizational, startup failures by opstakes on May 12, 2010

You definitely know this story: you come in on a smooth Monday morning and the first thing you hear about is a lack of resources. But why? How could you run out of resources over the weekend? Private accidents? Did one of your technicians fall in love and decide not to come back? Or is it just the simple fact that project no. 30 has kicked off (and indeed 10 of them are prio 1 :-))?

So how do you plan for scenarios like that? Whenever you build up a new resource (or hire, which may be the better word), keep in mind that you should build internally those resources you will need in the long term. So ask the following:

  1. Is the resource for day-to-day operations?
  2. Is the FTE still needed after successful close of project X and deprovisioning of the current platform?
  3. How do you ensure that knowledge is taken over after the external FTE has gone?
  4. Is it easier/faster to get that special skill/know-how via a service, or are we really talking about an FTE?
  5. Is it within or outside the budget, and how do you handle that?
  6. How fast will you get that person, and is that fast enough for the demand?
  7. Can you upgrade internally?

I want to point out one aspect, the knowledge transfer. Whenever you bring in an external, you should be able to manage him or her in more or less the same way you manage your employees. An external is only as good as the direction, empathy and loyalty he or she gets, even if you need him or her for a longer time.

Second point: keep in mind that the external will leave the company after day X. So prepare your organization to take over the know-how and transfer the experience in good time. Otherwise you will not be able to deprovision the external party successfully. Any lack of expertise or know-how after the external has left will fall directly back on you as the internally responsible person.

I tend to say: put at least 50% of your externals on the existing tasks, and support your internal FTEs in bringing themselves up to speed on the new technology. After the switch, only 50% of your current staff has to be “upgraded”. This can be done by your already transformed employees, so that both operations and the new fancy stuff are well organized and understood by your employees.

I know that it is quite hard to bring in externals under pressure, to get them up to speed on the old, to-be-replaced services, and at the same time to use the very employees who train and support the externals to get a new project up and running. Nevertheless, you will reach the goal. Maybe not the first time, maybe not even the second or third time, but your employees will learn to live with permanent transformation. If they understand the message, the next project will survive, and know-how transfer will be accepted as a core rule for bringing in “on demand FTEs”.

Ops is only in focus if it fails

Posted in 1, BCM, organizational by opstakes on April 19, 2010

You know the story: you work as well as you can, and you will never get that funky stuff until your apps get public attention. So how do you mostly get attention? Through failure, because nearly all major operations projects only affect people (in a bad way) if something goes wrong. A user will never see any change in his day-to-day life from getting a new IP address, but what happens if exactly that IP isn’t working? Just a silly change of an address, with potentially fatal consequences.

So is it good or bad to get money only on demand? How does the business think? As an IT leader you definitely have plenty of options to intervene, but mainly two:

  • accept and push your budget
  • start building awareness via BCM or quality improvement programmes

The first one is the more reactive: you ask for more money, and you don’t get it. If anything fails you always have the option to push back and say something like “I asked you for more money to mitigate those risks but I did not get it …” So your direct superior is responsible for your personal fiasco …

The second is much more aggressive. You offer solutions for potential scenarios and interact directly with all related departments, and they will start putting pressure on the CEO to get money to mitigate or diminish their business risks (you hear it: it is no longer an IT risk, it has become a business risk). You drive the BCM initiative and get the money via the involved departments. Everybody is happy, but:

  • BCM never stops; don’t forget to reserve money and resources in your budget for the upcoming years
  • after a BCM initiative is before a BCM initiative: risks and the business change, and you should discuss those changes in an open-minded environment
  • a failure after such an initiative could put your job at risk, so make sure that you deliver what you promised!

There will always be some risks you cannot mitigate that easily, but most of them should fit into a BCM initiative.

So how to start? Talk to the departments, understand their demands, get their OK for BCM and start your BCM lobbying parties … After a while, give a good presentation at C* level and explain how, when and what happens if BCM does not start. It sounds easy and it is easy, but it needs you and the awareness of all involved departments. It is not a fight of IT versus the board, it is the “we make it better” party!

But what to do if it still fails? Open dialogue, fresh information about how you plan to react, and a project that closes fast once you have installed some workarounds are the only way to win back respect and trust.

Deliver and own services

Posted in general failures, organizational, technical by opstakes on February 1, 2010

Technicians mostly seem to have an incredible understanding of service delivery. For them it means that they own and control the whole delivery chain, from each stored bit and byte through the databases, the apps, the network, the associated (and hopefully existing) security, the frontend, the user training and so on and so on … and if possible please forget documentation, we know what we do 🙂

But the world changes. Even now we are taking another step forward into a really service-oriented, cloud-based environment, and the more you think about clouds, the more you have to dematerialize service delivery. It is not a bunch of servers connected by a bunch of network devices and secured by a bunch of security appliances that creates the service. The service is much more, and the goal of modern IT should not be to deliver hardware-related stuff to non-IT staff. For them IT does not matter (by the way, thanks to Nicholas Carr for the great book); they just want to use it. And non-IT people think differently: they think, as we tend to say, emotionally rather than rationally about IT. Either it works appropriately and the service desk is OK, or it is not delivered sufficiently. And they think in terms of economics.

An IT service delivered to non-IT people should be competitive in terms of service and pricing, and it should be interoperable and portable. As we know, lots of offers out there try to do so, and a deeper look into them reveals incredible stuff …

And what happens now? All the cloud offers, IaaS, PaaS, SaaS, internal and external, private, enterprise or public: in short, the clouds offer new, innovative services with much more speed, power, resources and economies of scale.

Why should I continue maintaining my own hardware, software … if I do commodity stuff? It will be more expensive, with more to integrate and more to maintain … so my resources are secured, but nobody knows for how long.

Right now there are only a few real reasons for not going to a cloud:

  • cloud-to-cloud data exchange still lacks truly interoperable and usable security
  • the right size: once you have reached a size where you can profit from the top discounts too, the cost gap will be closed
  • real time: if you need real time, you will have to build it yourself (for now)
  • compliance: especially in the financial industry, you will potentially not be allowed to move your user data out of the country where it is processed

So the goal or mission of the IT of the (near) future will be to aggregate service delivery that is spread all over the world. IT will take care of

  • interoperability
  • portability
  • first level
  • combined service catalogue
  • economics

If so, what should you change today? Yes, maybe ITIL, but ITIL should be no more than it is: a good practice. Use what is usable for you, but think about your service definition and how to get deep enough into the organization to know which IT is needed and why. Acting as an account manager and treating your own company as the key customer would potentially help. Leave IT behind, think in solutions and services, and deliver them on time and with appropriate objectives (nobody asks who delivers, so if it looks like you but it comes from anywhere else ….)

So please start thinking about the tremendous change that will happen in the near future, and don’t repeat the standard opstake of the last 20 years: you can’t deliver and own all services in an ever more complex IT world.

SLA Mania

Posted in general failures, ITSM, organizational by opstakes on January 21, 2010

Even if processes do not exist, or exist only in fragments, people start writing weird SLAs with implicit and explicit definitions of services, without any service catalogue or portfolio, and without even understanding the difference between infrastructure, an app and a service …

What happens then? They start to build opinions like “Hey, we have an SLA framework …” or “Hey, we have a service catalogue …” Asking them how services are defined, built, maintained, monitored and reported results in answers like “this is still done on a manual and on-demand basis …” or “… we are working on that topic …” or “hey dude, we still have a long way to go, keep on waiting …”

To be clear, thinking about SLAs is quite important to the organization, and defining a structured approach (aka an SLA framework) is a really good way to go, but:

  • if your service support is still foggy, a major SLA part will be foggy too
  • if your ownerships are not defined, a major SLA part will be undefined
  • if you do not have any process that defines what an offered service is and why, you will define either the customer’s wish list or a very technical perspective

We could list 100 or more additional complaints, but that is not our goal. The goal should be to answer the question: what is a better or more appropriate way?

I would suggest the following:

  • Know your Service Support processes
  • Get a Service Responsible/Representative in
  • Know what you will be able to monitor and report
  • Define your service level internally, know what you are able to deliver (aka OLA)
  • If possible show numbers; economic values mostly help when talking to your (even internal) customer
  • Only define achievable goals/metrics
  • Only define countable goals/metrics; at best they should be monitored and reported automatically by systems/machines
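
As a minimal sketch of what a countable, automatically checked metric could look like, the following Python computes availability from recorded outage minutes and compares it with an internally agreed target. The 99.5% target, the 30-day reporting period and the function names are assumptions for illustration only; they are not taken from any real SLA or framework.

```python
def availability_percent(outage_minutes, period_minutes):
    """Return availability in percent for one reporting period."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    if not 0 <= outage_minutes <= period_minutes:
        raise ValueError("outage_minutes must be between 0 and period_minutes")
    return 100.0 * (period_minutes - outage_minutes) / period_minutes


def meets_target(outage_minutes, period_minutes, target_percent):
    """True if the measured availability reaches the agreed target."""
    return availability_percent(outage_minutes, period_minutes) >= target_percent


# Hypothetical figures: a 30-day month and a 99.5% internal target (OLA).
PERIOD = 30 * 24 * 60  # 43200 minutes
TARGET = 99.5

print(meets_target(120, PERIOD, TARGET))  # True: 120 minutes of outage is within target
print(meets_target(300, PERIOD, TARGET))  # False: 300 minutes of outage misses it
```

With these assumed numbers the target allows 216 minutes of outage per month, which is the kind of figure both engineers and customers can count and verify.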

Start by defining the most valuable (business) processes for your customer, and always think in terms of your customer and his potential end-to-end view. And, maybe the essential point, communicate the SLA to all engineers who are needed to run the service. Defining is step 1, running is step 2, and running is king!

You can use frameworks like ITIL, CobiT and others to get an understanding of what and how you should define within an SLA and what the process can look like (how to arrive at a service portfolio, a catalogue and the associated SLAs), but this will potentially be far too much as a first step, and business pressure will not allow you to run a one-year project just for a bunch of SLAs.

So keep in mind: be as well defined as possible, know what you can deliver and how to measure it, agree internally first, know your support processes and escalations, and only then go to your customer. Ensure that your customer is familiar with those processes too, and start defining the mutual understanding of the service (the SLA). At the end, communicate the results to your IT department and start implementing. And don’t forget to monitor not only the values but the SLA too … (is it still valid, is it useful, is there anything to define better …)

All the best with your SLAs!

Test is evil

Posted in organizational, technical by opstakes on January 11, 2010

It does not matter whether we talk about testing software, hardware, configuration, processes or all of them. You will lack either the time, the resources, the money or the will to do so. It is interesting to see that the more people talk about why they cannot test, the more time they waste talking about it.

And QA/test is a matter for the management team. If you don’t live QA and testing, why should your employees do so? If you don’t give them time for all the housekeeping and for keeping code clean and tidy, why should they use their spare time for it?

Two things on testing. First: do it whenever you change anything relevant for you, your IT, your department, your business! Second: if you test, think about how and why. It makes less sense to test the resilience of a cluster that is known to work than to test the new code and how the business processes are built into it.

This leads me to the last point, the acceptance criteria. If you start a project, think about what the desired goal should look like. I am not talking about the GUI, I am talking about the functionality. At the end, test whether the functionality is met or not. It is very often really hard to find correct numbers and values for acceptance criteria, but the longer you run your QA, the better you will get.
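
To illustrate what acceptance criteria with concrete numbers might look like, here is a small sketch in Python. The function `order_total_cents`, the 20% VAT rate and the two criteria are hypothetical and exist only for this example; the point is that each criterion is written down as a value that can be checked at the end of the project.

```python
def order_total_cents(net_cents, vat_percent):
    """Hypothetical business function under test: gross order total in cents."""
    net = sum(net_cents)
    # Integer arithmetic in cents avoids rounding surprises with floats.
    return net + net * vat_percent // 100


def check_acceptance_criteria():
    # Criterion 1 (assumed): an empty order costs nothing.
    assert order_total_cents([], 20) == 0
    # Criterion 2 (assumed): VAT is applied to the sum of all items.
    assert order_total_cents([1000, 500], 20) == 1800


check_acceptance_criteria()
print("acceptance criteria met")
```

Checks like these only cover functionality, not the GUI, which matches the distinction made above.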

QA consists of much more: an adequate (near-live) test environment, systems and routines, unit testing, functional testing, integration testing, load testing and so on. But if you don’t start, you will never get to the point of thinking about which test you need when and why.

Continual improvement is the key to successful QA!


Posted in organizational, startup failures by opstakes on December 22, 2009

Yes, you know, you will need them. You will need those crazy guys talking about systems, services, processes and tasks you have potentially never heard of before. And yes, they will let you know who the know-how owner is. And yes, you will need them because:

  • your know-how can only be scaled up if you have people who have already transformed an idea into a working business.
  • your know-how can only be pushed to the next level if you have people in place who have already seen the next level.
  • there is compliance demand out there and you need people who know how to work with it.
  • and you will need technical, organizational and even cultural guidance.

This should never be an anthem to old, formerly hard-working employees. Seniority means that those people have seen different company cultures and different levels of expertise, and that they know how to transform. Potentially 3 years are enough, maybe it takes 20 or more, and maybe they never reach seniority level. It really depends on the person and his or her personality, attitude, behaviour and culture (the famous ABC).

You will not need hundreds of seniors, maybe one or two, but you will have to enable them to transfer their knowledge to the teams and to you.
So keep in mind: even if you are the founder, funder, owner or whatever CxO, you will need to have an ear for them, their ideas and their expertise. If you enable them, they will enable your organization based on the values and objectives you have shown them to be important for your business model.
Not having seniors means trying to reinvent the wheel and losing time, resources, nerves and money. Young rebels are good; focused young rebels make the business.


We know it better

Posted in BCM, ISMS, ITSM, KnowHow, Skill, startup failures, technical by opstakes on November 17, 2009

Potentially yes, but how can you be sure? It is of tremendous importance that a growing organization knows what it is able to deliver and how to get additional know-how, expertise and resources on board. This is not a plea for externals, but:

  • understaffed organizations fail in the long term
  • there is no space left for innovation, so you stop developing yourself, your employees and your organization
  • peaks should be offloaded, and the innovation stream should not be broken
  • you cannot know everything the way the market does

So keep in mind that you should only run your strategic stuff yourself, and if you start innovating, bring the market experience in. This does not mean that external partners do all the work. Where necessary they should support, assist and coach you, but you are the one who knows your business best. And if you bring in externals, do not forget to manage them. They are just people, maybe good ones with special expertise, but they are people like your employees: they need order, direction and management. An uncontrolled external is a potentially high risk for your organization, as he is seen as a carrier of special know-how, and he can not only break your project but damage your organization too.

So to sum up, two things: bring in external partners where necessary, since nobody expects you and your org to know everything better than the market. And finally, manage your partners so that projects are delivered successfully and on time.

BCM is not Infrastructure

Posted in BCM by opstakes on November 16, 2009

It is always the same. Nobody knows why, but the myth of “we must have BCM” is in everyone’s ear within the IT department. The first enthusiasts start reading what BCM (Business Continuity Management) should look like, and immediately the first failure begins, because they overlook that:

  • BCM is a Business topic
  • IT should be invited to join business wide BCM and not vice versa
  • BCM is a process driven topic, not infrastructure

So BCM is much more of a company-wide umbrella, and as IT Service Continuity Management in ITIL v2 already states, the mission of IT is to support Business Continuity, not to run or drive it.

What we often see is that a direct link is drawn between resilience, high availability and BCM. Yes, in the end all those ways of providing a more robust service will likely be part of the mitigation plan of the BCM initiative, but keep in mind that BCM for IT is:

  • Define business critical processes
  • Define technical services supporting those
  • Assess and analyse risks associated with those services
  • Define action plan for those risks (accept, mitigate, delegate, solve)
  • Define operations handbook and declare how the disaster process is invoked by ops staff
  • ….

Reading those lines shows that only the mitigate/solve part of the risk plan is a real “technology” part.
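
The steps above can be sketched as a tiny risk register. This is only an illustration in Python: the `Risk` fields, the sample entries and the assumption that exactly the actions "mitigate" and "solve" lead to technical work are mine, not part of ITIL or any BCM standard.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"
    MITIGATE = "mitigate"
    DELEGATE = "delegate"
    SOLVE = "solve"


# Assumption for this sketch: only these actions require technology changes.
TECHNICAL_ACTIONS = {Action.MITIGATE, Action.SOLVE}


@dataclass
class Risk:
    business_process: str  # business critical process
    service: str           # technical service supporting it
    description: str       # assessed risk
    action: Action         # decision from the action plan


def technical_work(risks):
    """Return the risks whose action plan requires technology changes."""
    return [r for r in risks if r.action in TECHNICAL_ACTIONS]


# Hypothetical entries, for illustration only.
register = [
    Risk("order handling", "web shop", "single database server", Action.MITIGATE),
    Risk("order handling", "payment gateway", "provider outage", Action.DELEGATE),
    Risk("invoicing", "print service", "printer failure", Action.ACCEPT),
]

for risk in technical_work(register):
    print(risk.service, "->", risk.action.value)  # prints "web shop -> mitigate"
```

Everything else in the register is a business decision, which is why the list starts with business processes and not with servers.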

It is definitely always a good idea to think about risks and how to mitigate or solve them. But BCM is much more a method to direct your thoughts and ideas towards the mission-critical services; hardly anybody will need a resilient lab 🙂 in the event of a disaster (unless you are a research organization). So focus on your value-supporting technology, do your best, and stop thinking in a tech-only way: it is the business that should be supported, not the nice blinking light in your rack ^^


2 people are a NOC

Posted in organizational, startup failures by opstakes on November 10, 2009

Despite the fact that people cost money, you will never be able to run a commercial 24/7 site securely and safely with just 2 people. On the one hand you burn out your employees, on the other hand you will potentially break local law regarding workers’ rights.
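
A quick back-of-the-envelope calculation shows why. The sketch below assumes a 40-hour working week and a single seat that must be staffed around the clock; both figures are assumptions for illustration, and vacation, sickness and shift handover are not even included.

```python
import math

HOURS_PER_WEEK = 24 * 7  # 168 hours to cover every week


def hours_per_person(staff, seats=1):
    """Weekly hours each person must cover if the load is split evenly."""
    return HOURS_PER_WEEK * seats / staff


def minimum_staff(weekly_hours_per_person, seats=1):
    """Smallest headcount that covers all hours at the given weekly limit."""
    return math.ceil(HOURS_PER_WEEK * seats / weekly_hours_per_person)


print(hours_per_person(2))  # 84.0 hours per week for each of 2 people
print(minimum_staff(40))    # 5 people at an assumed 40-hour week
```

So even under these optimistic assumptions, two people would each have to work more than double a normal week.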

And, to be honest, if you want to run your application you have to think about what the work of a NOC will be. Is it?

  • monitoring, remediation
  • reporting, escalations

or even?

  • application support
  • ops tool maintenance
  • tasks like backup, rollback
  • qa topics?

The more you think about it, the more you will come to the conclusion that an active NOC can be a major advantage for your organisation and business. So if it is built not as a technical call center but, as the name says, as an operations center, then you will gain major advantages. But this means that you need a structure and the right people: not 2, and potentially not hundreds. And you need time; a working NOC is not a matter of a bunch of definitions and nice mission statements. You need role separation (dev, sys engineering, ops, NOC), technical clarification and the setup of the NOC itself, including processes, space, people and resources.

So what does this mean for startups?

You should potentially think about a shared NOC, or think about when the right moment will be to consider a NOC at all. And believe me, for a long time there will be no real need to get such an org structure up and running. Try to work with on-call procedures for as long as possible. A NOC costs money, even if there is a major benefit. And a NOC requires working structures and procedures. So only start building a NOC once you already have a handle on your processes.
