Tom Peruzzi's thoughts on digital, innovation, IT and operations

The art of Ops projects

Posted in general failures, organizational by opstakes on October 25, 2010

While writing my article about DevOps I started thinking about how Ops projects work, or should work. Doing so, I concluded that they are well worth a deeper look. In general you can distinguish between three types of projects within Ops:

  • Business Projects delivering new functionality, driven and owned within Business
  • Internal Ops projects, mostly new or adapted platforms, or new ops or management platforms
  • External Projects driven by Hardware or Software Updates, Security Upgrades and others

With that in mind, let’s look at how these projects work and what you need to deliver them well:

Business Projects are, from an Ops point of view, more project controlling and coordination tasks than real project management: scope, timeline and so on are mostly driven by the Business, even if Ops had the chance to plug into the project very early. So it is very useful to have one person acting as a project coordination instance, chasing people, timelines and so on within Ops, but not doing full project management, as this will fail. Note that this project coordination instance works across all departments of IT Operations: each department has to agree that its people's time is used and coordinated by the Ops Project Coordinator. But it will help the Ops Manager, all his or her Heads Of and the overall organization, because projects coordinated by one instance pass through more successfully than projects managed by the engineers themselves.

Internal Ops Projects are usually hard to handle because they are managed by engineers and, even worse, often don’t fit the company’s focus. To sort this out, the Ops Manager has to talk to the company’s Steering Committee (if one exists) or communicate the goal of the project very early and clearly to all related stakeholders. Remember: the goal should not be the “much better” firewall, it should be countable business value. If you are lucky you will get a PM resource from another department, or you have a very skilled engineer within your department … but always keep in mind: engineering is a skill and PM is a skill too. A very good PM engineer isn’t met that often, and secondly your organization has to be able to support a PM entity. (A lot of you think you do, but to be honest PM means day2day transparency and most of us don’t like it in that detail; we often feel too much like geniuses to submit tasks and end dates, and to explain how and why we will reach those dates …)

Externally driven projects seem easy to drive, but there is one major conflict: the external release date often will not fit the maintenance window calendar of your organization and, even worse, the features that change may not fit your organization either, or may even threaten exactly the one you use most … So keep in mind that even though it looks easy to cover, you need to communicate very early and clearly to avoid pain later. Who does that? In my personal view this is the only type of project an engineer can cover alongside his or her normal operations tasks, as it is mostly the most technical kind of change within the IT landscape.

So the result is the following:

  • get a project coordination instance in for all IT ops disciplines
  • use internal/external PM help for larger IT ops projects if you have no extra skilled resources
  • use your engineers for updating/patching if feasible and if their workload allows it

One could keep discussing this, and what I deliberately leave out of this article is the question of the right PM methodology for each flavour of project, or whether one would fit all (most organizations tend to think so, knowing that they all differ by > 80% …)

The Agility stuff

Posted in general failures, startup failures by opstakes on September 23, 2010

Whenever we hear terms like agility, Scrum, XP, Kanban or whatever, most people think “This is cool development and innovation stuff, ops doesn’t have to care about that.” NOT TRUE!!!

Whenever you hear something about a new development methodology, framework or anything else, be prepared: changing the developers’ life will change your interfaces, and hence your operational life too!

And it is better to act on interfaces than to react. We are currently investigating cloud, agile and how they change our ops life a lot, but to be honest, agility drives the operational need for clouds.

Think about the following: you act best with Scrum teams if you show them your borders and limitations (aka frameworks, standards, tech recommendations) and act as an active stakeholder with and within the Scrum team. The better the teams become, the more they will need agile resources from your ops department. Flexibility or agility can be achieved with a bunch of technologies and different investment scenarios, but the one that probably fits best is responding with cloud computing resources or highly available virtual resources (hence highly automated and “near cloud”) and providing proper feedback to the agile teams.
Doing so, you will get very high throughput within your IT organization, tons of congratulations as one of the very rare operators thinking in business terms and needs, and a very effective and efficient ops team with strict and accepted borders. The better and clearer those borders are, and the better your automation is, the better agility is supported and the better the feedback will be.
If not, do what you have to do with such developers 😉
I will keep on writing about agility and cloud operations, as I personally believe this is the way we will operate for years to come.


all time cloud

Posted in general failures, KnowHow, startup failures, technical by opstakes on July 20, 2010

I will stop writing only about operational failures; this blog will probably go on to cover “hot” topics within IT operations too. It will still stay focused on operations, as I am an ops guy.

A good moment to explain why I stopped writing for the last few weeks: I was wondering where cloud computing will go!

We are doing quite a lot of different cloud projects, and right now it seems that either there is no room left to deal with clouds or, on the other hand, there is still a lack of experience out there on all sides of the business. This is just a short draft of my ongoing thoughts; discussion welcome!

Cloud topics on the business side:

Why do we still believe that it is as easy as written in the brochure? Haven’t we learned from all the functionality promised in the past? Yes, there is high potential to get things done and delivered in a smarter and more cost-sensitive way, but at what risk and cost? And what do operations look like afterwards?

Cloud topics on the IT side:

Clouds are nothing we can bypass. Clouds have to be worked with; IT has to understand the pros and cons of clouds and how to live with them for the next decade. Clouds are neither friends nor enemies; they are a new way of delivering services to customers, more service-based than ever before. Clouds are not VMware, Xen or KVM; clouds are a business case, so IT has to understand the business and business methodology, otherwise it will just deliver virtualization. Not bad at all, but only a few percent of the power of clouds.

As IT, I would strongly recommend not putting too much pressure on compliance, legal and data security. Several organisations are already covering that topic, and it is at most a question of weeks or months to get it fully done. Secondly, there are already SAS 70-ready solutions out there and other standards are met too. If you cover that topic it is OK, because it is a risk, but nothing more. Using compliance as an IT argument against the cloud will mark you as the one securing your own office place …

Cloud topics on the operations side:

Clouds mean that you are no longer the prime operations partner. To be honest, when thinking about all the complexity, more and more clouds can even help you reduce YOUR complexity and get things done. Yes, the number of systems will potentially go down, a lot will be delivered out of the cloud, and partly you will act as a cloud offering yourself. BUT, and this is good news, you can transform yourself from a 24/7 operator into a platform architect handling tons and tons of different systems without dealing with the day2day problems; those are handled by others within the cloud 🙂


On Demand FTEs

Posted in general failures, organizational, startup failures by opstakes on May 12, 2010

You definitely know that story: you come in on a smooth Monday morning and the first thing you hear about is a lack of resources. But why? How could you run out of resources over the weekend? Private accidents? Did one of your technicians fall in love with a girl and is not coming back? Or is it just the simple fact that project no. 30 has kicked off? (and of course 10 of them are prio 1 :-))

So how do you plan for scenarios like that? Whenever you build a new resource (or hire, maybe the better word), keep in mind to build internally only the resources you will need in the long term. So ask the following:

  1. Is the resource for day2day operations?
  2. Is the FTE still needed after the successful close of project X and the deprovisioning of the current platform?
  3. How do you make sure the knowledge is taken over after the external FTE has gone?
  4. Is it easier/faster to get that special skill/know-how via a service, or are we really talking about an FTE?
  5. Is it in or out of budget, and how do you handle that?
  6. How fast will you get that person; is it within the time you need them?
  7. Can you upgrade someone internally?

I want to point out one aspect in particular, knowledge transfer. Whenever you bring in an external, you should be able to manage him or her more or less the same way you manage your employees. An external is only as good as the direction, empathy and loyalty he or she gets, even if you need him or her for a longer time.

Second point: keep in mind that the external will leave the company after day X. So prepare your organization to take over the know-how and to transfer the experience in good time. Otherwise you will not be able to successfully deprovision the external party. Any lack of expertise or know-how after the external has left will fall directly back on you as the internally responsible person.

I tend to say: put at least 50% of all your externals onto the existing tasks and support your internal FTEs in getting up to speed on the new technology first. After the switch, only 50% of your current staff has to be “upgraded”. This can be done by your already transformed employees, so that both operations and the new fancy stuff are well organized and understood by your employees.

I know that it is quite hard to bring in externals under pressure, get them up to speed on the old services (the ones to be replaced) and at the same time use the very employees who train and support the externals to get a new project up and running. You will reach the goal more or less. Maybe not the first time, maybe not even the second or third time, but your employees will learn to live with permanent transformation, and once they understand the message, the next project will succeed and knowledge transfer will be accepted as a core rule for bringing in “on demand FTEs”.

Threaten Business

Posted in financial, general failures by opstakes on March 24, 2010

I still wonder why so many IT people claim that they talk to the business, involve the business and respond to the business, yet when you ask them how and how often, you get no concrete answer.

Recently a colleague asked me why IT isn’t interested in involving the business directly via a project steering board. The IT directors are part of an internal change project aimed at better response times, lean IT and massive cost reduction. Those are all business-related values, and if the business is neither involved in nor partnering on the project, nobody will ever know what happened to it!

The road to a good understanding on both sides is long and stony. Both sides have to learn to talk to each other and to understand each other’s situation, attitudes and behaviour. The goal is to convert the business culture and the IT culture into one corporate culture. Sounds easy? It isn’t: potentially tons of arguments will stress, disrupt or even destroy that idea, and plenty of (personal) interests will have to be aligned with the business idea itself. IT has to understand that the business should be a partner and sponsor; the business has to understand that it needs a strong IT.

So stop merely living side by side within the same building!

Deliver and own services

Posted in general failures, organizational, technical by opstakes on February 1, 2010

Technicians often seem to have an incredible understanding of service delivery. For them it means owning and controlling the whole delivery chain, starting with each stored bit and byte and going on to the databases, the apps, the network, the associated (and hopefully existing) security, the frontend, the user training and so on and so on … and, if possible, please forget documentation, we know what we’re doing 🙂

But the world changes: even now we are taking another step towards a truly service-oriented, clouded environment, and the more you think about clouds, the more you have to dematerialize service delivery. It is not a bunch of servers connected by a bunch of network devices and secured by a bunch of security appliances that creates the service; the service is much more, and the goal of modern IT should not be to deliver hardware-related stuff to non-IT staff. For them IT does not matter (by the way, thanks to Nicholas Carr for the great book); they just want to use it. And non-IT people think differently: they think, as we tend to say, emotionally rather than rationally about IT. Either it works properly and the service desk is OK, or it is insufficiently delivered. And they think in economic terms.

An IT service delivered to non-IT people should be competitive in terms of service and pricing, and it should be interoperable and portable. As we know, lots of offers out there try to do so, and taking a deeper look into them reveals incredible stuff …

And what happens now? All the cloud offers, IaaS, PaaS, SaaS, internal and external, private, enterprise or public, in short the clouds, offer innovative new services with much more speed, power, resources and economies of scale.

Why should I continue maintaining my own hardware, software … if all I do is commodity stuff? It will be more expensive, with more to integrate and more to maintain … so my resources are secured, but nobody knows for how long.

Right now there are only a few real reasons for not going to the cloud:

  • cloud-to-cloud data exchange still lacks truly interoperable and usable security
  • the right size: once you have reached a size where you get the top discounts too, the cost gap closes
  • real time: if you need real time, you will have to build it yourself (for now)
  • compliance: especially in the financial industry, you may not be allowed to move your user data out of the country where it is processed

So the goal or mission of IT in the (near) future will be to aggregate service delivery that is spread all over the world. IT will take care of

  • interoperability
  • portability
  • first level
  • combined service catalogue
  • economics

If so, what should you change today? Yes, maybe ITIL, but ITIL should be no more than what it is: a good practice. Use what’s usable for you, but think about your service definition and how to get deep enough into the organization to know which IT is needed and why. Acting as an account manager and treating your own company as the key customer would probably help. Leave IT behind, think in solutions and services, and deliver them on time and with appropriate objectives (nobody asks who delivers, so if it looks like you but comes from somewhere else ….)

So please start thinking about the tremendous change that will happen in the near future, and don’t repeat the standard opstake of the last 20 years: you can’t deliver and own all services in an ever more complex IT world.

SLA Mania

Posted in general failures, ITSM, organizational by opstakes on January 21, 2010

Even where processes do not exist, or exist only in fragments, people start writing weird SLAs with implicit and explicit definitions of services, without any service catalogue or portfolio, or even without any understanding of the difference between infrastructure, an app and a service …

What happens then? They start saying things like “Hey, we have an SLA framework …” or “Hey, we have a service catalogue …”. Asking them how services are defined, built, maintained, monitored and reported results in answers like “this is still done on a manual and on-demand basis …” or “… we are working on that topic …” or “hey dude, we still have a long way to go, keep on waiting …”

To be clear, thinking about SLAs is quite important to the organization, and defining a structured approach (aka SLA framework) is a really good way to go, but:

  • if your service support is still foggy, a major SLA part will be foggy too
  • if your ownerships are not defined, a major SLA part will be undefined
  • if you do not have any process defining what an offered service is and why, you will define either the customer’s wishlist or a very technical perspective

We could list 100 or more additional complaints, but that is not our goal. The goal should be to answer the question: what is a better or more appropriate way?

I would suggest the following:

  • Know your Service Support processes
  • Get a Service Responsible/Representative in
  • Know what you will be able to monitor and report
  • Define your Service level internally, know what you are able to deliver (aka OLA)
  • If possible, show numbers; economic values usually help when talking to your (even internal) customer
  • Only define achievable goals/metrics
  • Only define countable goals/metrics; at best they should be monitored and reported automatically by systems/machines
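The last two points (achievable, countable metrics, reported automatically) can be illustrated with a small availability calculation. This is a minimal sketch, assuming your monitoring already records outages as start/end timestamps; the `Outage` record, the function names, the 30-day reporting window and the 99.5% target are all hypothetical, and "availability = 1 - downtime / period" is just one common convention, not a prescribed SLA formula.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Outage:
    """One monitored outage of a service (hypothetical record format)."""
    start: datetime
    end: datetime


def availability(outages, period_start, period_end):
    """Return the fraction (0.0 to 1.0) of the reporting period the service was up.

    Outages are clipped to the reporting period and overlapping outages are
    merged, so the same downtime is never counted twice.
    """
    period_seconds = (period_end - period_start).total_seconds()
    # Keep only outages that touch the period, clipped to its boundaries.
    clipped = sorted(
        (max(o.start, period_start), min(o.end, period_end))
        for o in outages
        if o.end > period_start and o.start < period_end
    )
    downtime = 0.0
    merged_start = merged_end = None
    for start, end in clipped:
        if merged_end is None or start > merged_end:
            # Non-overlapping outage: close the previous merged interval.
            if merged_end is not None:
                downtime += (merged_end - merged_start).total_seconds()
            merged_start, merged_end = start, end
        else:
            # Overlapping outage: extend the current merged interval.
            merged_end = max(merged_end, end)
    if merged_end is not None:
        downtime += (merged_end - merged_start).total_seconds()
    return 1.0 - downtime / period_seconds


def meets_target(measured, target):
    """True if the measured availability reaches the agreed target."""
    return measured >= target


if __name__ == "__main__":
    june_start, june_end = datetime(2010, 6, 1), datetime(2010, 7, 1)
    outages = [
        Outage(datetime(2010, 6, 3, 1, 0), datetime(2010, 6, 3, 3, 0)),
        Outage(datetime(2010, 6, 3, 2, 30), datetime(2010, 6, 3, 4, 0)),
    ]
    measured = availability(outages, june_start, june_end)
    print(f"{measured:.4%}")  # 3 h of merged downtime in 30 days -> 99.5833%
    print(meets_target(measured, 0.995), meets_target(measured, 0.999))  # True False
```

The point is less the formula than the fact that the number comes out of monitoring data without anyone assembling it by hand, so the same figure can be shown to engineers and to the customer.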

Start by defining the most valuable (business) processes for your customer, and always think in terms of your customer and his or her end-to-end view. And (maybe the essential point) communicate the SLA to all engineers who are needed to run the service. Defining is step 1, running is step 2, and running is king!

You can use frameworks like ITIL, CobiT and others to get an understanding of what you should define within an SLA and how, and what the process can look like (how to get to a service portfolio, a catalogue and the associated SLAs), but this may well be far too much as a first step, and business pressure will not allow you to run a one-year project just for a bunch of SLAs.

So keep in mind: be as well defined as possible, know what you can deliver and how to measure it, agree internally first, know your support processes and escalations, and only then go to your customer. Make sure that your customer is familiar with those processes too, and start defining the mutual understanding of the service (the SLA). At the end, communicate the results to your IT department and start implementing. And don’t forget to monitor not only the values but the SLA itself too … (is it still valid, useful, is there anything to define better …)

All the best with your SLAs!

Our own datacenter is the best

Posted in financial, general failures, technical by opstakes on December 8, 2009

Probably not, but there is still a parallel reality out there convincing IT operators or facility managers that owning their own datacenter is much cheaper, cooler, greener, leaner and whatever, even if they just want to run 50 servers.

To be honest, it could make sense, depending on your location, some geographic factors and your growth. But even Google did not start by building its own datacenters. They rented first and are now migrating step by step to their own, because they have reached a size at which having their own pays off!

And if you are thinking about your own datacenter, think about:

  • who’s operating the facility
  • who takes care of the UPS, diesel generators …
  • who takes care of getting all the licence stuff done?
  • who cleans the filters
  • who is responsible for CCU & friends
  • who is the cabling expert?
  • who is the power expert?
  • who is responsible for rack planning and provisioning …

and all of that in a 24/7 environment …? Do you really want to be the facility guy PLUS the Ops Leader? Can you combine those two roles, or run them in a professional manner? Will you, as an IT Ops Leader, get headcount for an electrical expert? Tons of questions, and nearly all of them, especially if you think about the associated risks, should lead to the decision NOT to run your own DC before you have reached critical size.

The next interesting question that always comes up: if we still do not have critical size, why don’t we do some shared hosting too? Because you do not have the skill set for it? Because you are an internal service unit and are not set up to offer market prices on the external market? Because your SLAs are not that strong? Not really, or at least not mainly. The main answer is: because it is not your business!

So when do you reach critical size? It does not depend on the number of systems; it depends on strategic and economic questions:

  • Is running your own datacenters a potential USP, and can you run them cheaper than or on par with the market?
  • Is the sum of volume discounts smaller than the savings from your own DC (economic value)?
  • Is extra flexibility needed? (Take care: weigh flexibility against price, and flexibility against coolness.)

The how and when will vary, and we will probably have to work on a new definition of critical size with regard to cloud computing, its price model and the new (generation IV) datacenters, which should reduce costs too.
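The second, purely economic question can be written down as a yearly comparison. This is a deliberately simplified sketch under stated assumptions: the own DC's yearly cost is taken as build capex spread linearly over a depreciation period plus yearly opex, "list price" means what you would pay a housing or cloud provider per year without any discount, and all function names and figures are hypothetical. The strategic (USP) and flexibility questions do not reduce to a formula and are left out.

```python
def volume_discount_amount(list_price, discount_rate):
    """Yearly money saved by buying external capacity at a volume discount."""
    return list_price * discount_rate


def own_dc_savings(list_price, build_capex, depreciation_years, annual_opex):
    """Yearly money saved by running your own DC instead of paying list price.

    Assumes straight-line depreciation of the build cost; negative if the
    own DC is more expensive than list price.
    """
    own_annual_cost = build_capex / depreciation_years + annual_opex
    return list_price - own_annual_cost


def own_dc_pays_off(list_price, discount_rate, build_capex,
                    depreciation_years, annual_opex):
    """The criterion from the list: discount sum smaller than own-DC savings."""
    return volume_discount_amount(list_price, discount_rate) < own_dc_savings(
        list_price, build_capex, depreciation_years, annual_opex
    )


if __name__ == "__main__":
    # Small footprint (made-up numbers): renting with discount wins.
    print(own_dc_pays_off(1_000_000, 0.25, 5_000_000, 10, 400_000))
    # Larger footprint (made-up numbers): own DC saves more than the discount.
    print(own_dc_pays_off(4_000_000, 0.25, 12_000_000, 10, 1_000_000))
```

With the first set of made-up numbers the own DC saves 100,000 a year against list price, while the volume discount alone saves 250,000, so renting wins; with the second, larger footprint the own DC saves 1,800,000 against a discount of 1,000,000 and the comparison flips.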

Developers are good Operators

Posted in general failures, startup failures by opstakes on November 9, 2009

This is definitely something that often happens in startup companies. You have an idea, you have a development team and you have business/expenditure pressure. The easiest way is to let developers do IT Ops stuff (backup, monitoring, day2day operations …). But what happens if you do so?

First, there is no distinction between the inventor (developer) and the operator, and thus no border between fancy new stuff and a stable platform. Second, there is no declared transition process that moves the proven and checked app onto the operational platform. Third, those two groups think totally differently: while a developer is interested in getting new fancy stuff done, an operator’s holy grail is to provide a stable, reliable and proven (never-changing) platform.

Next, if you don’t separate the two, you will have real trouble keeping dev tasks out of the online/production environment. How much faster would it be for a developer to fix a bug online if he didn’t have to test it beforehand? But this means acting online, with no protection line and no safety border in between.

There will be endless discussion, and yes, there are even examples out there showing that developer and operator can be one person without any problems, but the vast majority of IT people out there will not be able to do the same.
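One way to make the border between development and operations explicit is a promotion gate that refuses to deploy anything that was not tested outside production and handed over to operations. This is a minimal sketch, assuming a hypothetical `Release` record and two hypothetical checks; it does not describe any particular deployment tool, and the actual deployment step is only a placeholder `print`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Release:
    """A build moving from development towards production (hypothetical model)."""
    version: str
    tests_passed_in_staging: bool = False
    ops_sign_off: Optional[str] = None  # operator who accepted the handover


def promotion_blockers(release: Release) -> List[str]:
    """List the reasons why a release may not (yet) go to production."""
    blockers = []
    if not release.tests_passed_in_staging:
        blockers.append("not tested outside production")
    if not release.ops_sign_off:
        blockers.append("no handover to / sign-off by operations")
    return blockers


def promote_to_production(release: Release) -> None:
    """Deploy only if the gate is clear; otherwise refuse loudly."""
    blockers = promotion_blockers(release)
    if blockers:
        raise PermissionError(f"{release.version} blocked: " + "; ".join(blockers))
    # Placeholder for the real deployment step (hypothetical).
    print(f"deploying {release.version} to production")


if __name__ == "__main__":
    promote_to_production(Release("1.4.1", tests_passed_in_staging=True,
                                  ops_sign_off="ops-oncall"))
    try:
        promote_to_production(Release("1.4.2-quickfix"))  # the "fix it online" case
    except PermissionError as err:
        print(err)
```

The gate itself is trivial; what matters is that the quick online fix now needs an explicit handover to operations instead of bypassing it silently.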

So keep in mind: Different tasks with different attitude and behaviour need different people!
