Tom Peruzzi's thoughts on digital, innovation, IT and operations

the wrong trust in cloud computing

Posted in BCM, general failures, startup failures by opstakes on April 29, 2011

What we saw last week is that even large companies like Amazon can fail, for whatever reason, despite their tons of engineers, processes, procedures, technologies and masses of systems and servers. The question for you as a (potential) customer should not be: why did they fail?

They grew rapidly, and even with the best engineers ever, growth and quality cannot run at the same speed; there is always some risk, as we are still talking about systems built by humans. And to be honest, I really believe that it can happen to any service provider out there, maybe tomorrow, next week or never: the likelihood of failure is a built-in function.

The question you should ask and answer for yourself is: how could I survive if my instances or data (wherever they are) go down? It is not your provider's fault if you lose all your data; it is your fault if you had no strategy for dealing with such a disaster. Nobody will expect you to be back up and operating within 30 minutes when such a case occurs, but you should have done your BCM (business continuity management) work beforehand. I know: if you, like many companies now running on clouds, are in online business where time and speed matter, risk is OK as long as nothing happens; afterwards you get asked what happened and why you had no plan against it …

So keep in mind that data security (integrity, authenticity, availability) is always your job. You can outsource parts of the technical work (move them to the cloud), but the management and umbrella function always belongs to you. Yes, it is a pity if your service provider goes down, but it is a shame if you have no plan for covering such scenarios and getting back to operations immediately.

This is the wrong trust in cloud computing. Cloud computing can help you a lot: it can mitigate your volatility and enable immediate growth, fast testing, betas and whatever else, but you should know who your cloud provider is and what it delivers. Do not overtrust. The provider delivers technology, you do the mash-up, so keep an eye on the availability and security of your mash-up!


the missing IT and Ops strategy

Posted in general failures, organizational, startup failures by opstakes on February 8, 2011

It often sounds as if operators, or specifically IT operators, just operate on a day-to-day basis, independent of what is coming from the business and where the business is going.

In fact this is bullshit. You cannot act as an operator if you do not know where your company wants to go! Nor can you operate if your IT and IT Ops departments don’t know how to answer the business challenge and how to challenge their own IT department. There is a difference between “headless” and “strategy-less”. We often see organisations with strong management in terms of discipline, procedures and routines that still fail. The reason is not bad engineering … it is a lack of understanding that besides discipline and processes you need two more factors (I would not call them soft or whatever, and I will not write about culture!):

  • a strategy showing people where to go
  • challenge from the market

It is quite interesting to see that the less IT strategy exists, the more you hear things like “we are so extra complex and not comparable to the market … we have superior engineering on board …. we cannot compare ourselves to the market as we have special self-written applications …. the market will not understand our demand …. ” We could probably name tons more of these bullshit arguments.

For a seriously long time I worked as a systems engineer with much the same “ideas” about our rocket-science ops platform ;-). Once the CTO asked me to have a look at a particular solution on the market. I spent about an hour telling him the pros and cons and explaining why it was shit. At the end he pointed out: if 1000 people think this is a good solution and I think it is shit …. who is more likely to be right? The funny thing: we used that solution and were quite happy. It was close to market standards, we started to make our special ops platform market-conformant and gained tons of new possibilities, in both economic and technical value!

Why is this important? The CTO’s strategy was to be as market-compliant as possible while staying rocket science in the business-related tasks, processes and programmes. He showed us that this strategy works and how the company benefits from it (he did not mention in detail that engineers are easier to replace if you use market-standard hardware and software 😉 )

Next, if you do not go it alone, organisationally and technically, you can take part in market innovation and inspiration. Most of the time the market will be much faster and more innovative than you are, and this should especially help you in security. Keep an eye on being as secure as the market allows you to be. The most innovative internal solution will not help if you cannot keep up with the speed of security development!

Let’s summarize: have a strategy, give your people a mission, a scope and an idea of how to move forward, don’t forget to check the market, and do not hesitate to accept that the market is faster and more innovative than you and your department. That is nothing to be ashamed of; the only shame is thinking you can be much faster than the rest of the world. Hopefully you will then be able to tell your business exactly the “I’m the fastest” story instead of talking about core processes, metrics and IT/business behaviour!

Ops Predictions 2011

Posted in general failures, ITSM, organizational, startup failures by opstakes on December 21, 2010

The end of the year is coming, time for review and predictions …

What we have seen this year is the emerging trend of trying to move to the cloud. Why do I say trying? Because many different shortcomings delayed decisions: lack of experience, lack of manageability, lack of security, lack of commodity, lack of portability and much more. But the train cannot be stopped anymore. We will continue to see diverse ways to the cloud: the aggressive one (we just do it), the ones moving via private virtualization, the ones out-tasking to the cloud and the ones not knowing that they are already in the cloud.

So what’s next? According to the analysts, cloud is heading straight into the phase of disillusionment. Sounds bad, but it isn’t. We are now reaching the working phase: the marketing wow is over and we can start working in a deep and permanent way. So think about it: cloud will become a commodity in 2011; we will stop talking about who is in the cloud or not and will just start using it.

This leads to another trend for 2011: cloud operations. We have done central operations, decentralized operations, virtual operations, outsourced operations, out-tasking and whatever else; next is cloud operations. Maybe you do not care about it yet, but you will probably have to think about how to operate your IT when parts of it are somewhere you do not even know exactly (you only know the name/identifier of the cloud).

This affects all ITSM processes in many ways: especially change management (do you still own your virtual cloud environment … how do you combine those releases …), incident, event and problem management (who manages what?), service level management and all the others, with a special focus on IT financials.

The next trend, partly driven by ideas like DevOps, is agile operations. The more agile the company and its development, the more event-driven the IT. This leads to agile operations for the IT ops department. So how do you get there?

Agility means being very flexible and self-responsible within a certain frame/border. Agile operations means being very reactive, fast and flexible within a fixed set of frameworks/standards, delivering IT resources promptly on a strictly cost-based ($$) approach.

So agile operations relies on cloud operations and vice versa. In my understanding and strong belief, the trend per se for 2011 should then be called

agile Ops operations

So what does this mean for you? Think about strong boundaries and frameworks married with a high level of ops automation. This superset is then offered to your company / development, enabling them to use ops resources on demand and in a cost-sensitive way. You, as the ops entity, do all the cloud work, whether private, hybrid or public, within your defined subset to deliver predictable IT on a regular yet flexible basis.
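To make “strong boundaries married with automation” a bit more concrete, here is a minimal Python sketch of an automated check that a self-service resource request stays inside ops-defined frameworks. The `Framework` and `Request` types, the size names and the budget figures are hypothetical illustrations, not part of any real platform or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Framework:
    """Hypothetical ops-defined boundaries for self-service resources."""
    allowed_sizes: frozenset[str]   # standard catalogue of instance sizes
    max_instances_per_team: int
    monthly_budget_eur: float


@dataclass(frozen=True)
class Request:
    """Hypothetical resource request coming from a development team."""
    team: str
    size: str
    instances: int
    est_monthly_cost_eur: float


def check_request(req: Request, fw: Framework, team_spend_eur: float) -> list[str]:
    """Return the boundaries a request violates; an empty list means it can be auto-provisioned."""
    violations: list[str] = []
    if req.size not in fw.allowed_sizes:
        violations.append(f"size {req.size!r} is not in the standard catalogue")
    if req.instances > fw.max_instances_per_team:
        violations.append(
            f"{req.instances} instances exceed the limit of {fw.max_instances_per_team}")
    if team_spend_eur + req.est_monthly_cost_eur > fw.monthly_budget_eur:
        violations.append("request would exceed the team's monthly budget")
    return violations
```

In this sketch, requests with no violations would go straight to automated provisioning, and anything else goes back to the team with the list as feedback. Keeping the check a pure function makes the boundaries easy to review and version together with the rest of the ops framework.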
For me this sounds reasonably good. Remember, I’m an ops man … doing agile ops operations even means you create your ops platform (DevOps) and keep the releases within your responsibility, but you stop merely reacting and being the holy grail nobody in your company knows about. Ops becomes public, viable and business-enabling for the company! This has always been our goal, and it must be the goal for all of us.

We will see what exactly happens in 2011; hopefully 80 % of my predictions come true.

cloud reloaded

Posted in financial, startup failures, technical by opstakes on October 7, 2010

Yesterday I had the chance to present some general thoughts on cloud computing at an aicooma and Microsoft event.
While I am generally a pro-cloud geek, especially for operations, I picked up a few more points to cover:

  • Scrum and cloud really work well together on a very high level (aicooma will present a whitepaper on that topic soon)
  • the deeper you look at all the potential hidden costs, the less attractive a cloud offer looks at first, but keep in mind that you always have to take a service lifecycle perspective
  • moving from managed services to a real cloud offering is quite hard: on the one hand for the moving organization, which has to get an understanding and feeling for the cloud, and on the other hand for the partner; right now nearly all major outsourcing parties claim to offer cloud, but the contract looks quite different afterwards …
  • even cloud vendors now tell the truth: a cloud will never fit every setup

Dealing with these topics shows that there is still some FUD about how cloud computing could help me, my department and my organization, and whether it fits or not. A good approach is the one I generally use:

  1. Set up your service catalogue and your service portfolio
  2. Include lifecycle information in the portfolio (time of reinvestment …)
  3. Take out the 5 services where reinvestment is due within the next 18 months
  4. Have a very deep look (organizational, technical, financial) at those 5
  5. Find a potential cloud substitute for each
  6. Compare in depth
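Step 3 is mechanical enough to sketch in code. The following minimal Python example assumes each portfolio entry carries a reinvestment date (the lifecycle information from step 2); the `Service` type and the month-based arithmetic are illustrative assumptions, and in this sketch services that are already overdue are simply skipped.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Service:
    """Hypothetical portfolio entry: a service plus its lifecycle info (step 2)."""
    name: str
    reinvest_due: date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (days ignored, negative if end is earlier)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def reinvest_candidates(portfolio: list[Service], today: date,
                        horizon_months: int = 18, top_n: int = 5) -> list[Service]:
    """Step 3: up to top_n services whose reinvestment falls within the horizon, soonest first.

    Services already overdue (negative month distance) are skipped in this sketch.
    """
    due = [s for s in portfolio
           if 0 <= months_between(today, s.reinvest_due) <= horizon_months]
    due.sort(key=lambda s: s.reinvest_due)
    return due[:top_n]
```

The resulting short list is what steps 4 to 6 then examine in depth; the filtering itself is the cheap part, the deep look is where the work is.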

After doing this once or twice it gets quite easy. It is not as much work as it looks at the beginning, but it gives you a very transparent view of your portfolio and of the potential of the cloud offerings that are available and stable right now. More or less, it demystifies cloud offerings and makes them comparable to your internal or external managed services, like comparing apples with apples. And that’s the goal: no emotions, coolness or hype; realistic and transparent decision-making is king.


I do not like requirements engineering

Posted in startup failures by opstakes on October 4, 2010

What’s the worst? An organization that is not in sync with itself. If you make a decision, it should be a viable and livable decision supported by you and the management, and it should not be reversed tomorrow.
Organizations that do not rely on their own decisions tend to implement requirements engineering as a method to support their daily decision flip-flops. This leads to unusable requirements that support today’s political situation, not the strategic or tactical direction the company should take.
If you believe in decisions and if you believe in requirements, you will have a good, feasible and cost/usage-driven process.
You will also have enough management support to get good requirements out of your organization. I would strongly recommend distinguishing between functional and non-functional requirements, sorting them by company, divisional and departmental interests, and asking whether they are requirements, needs or expectations and whether they are valid now or in 12 or 24 months.
If you do so, you start moving your organization from being decision-driven (“we need a new datacenter”) to being requirements-driven (“we need additional space based on the current capacity plan and current architecture”).
A last short tip for getting started: start with KO (knock-out) criteria only, nothing more; once those are established, you will soon be able to move on to deeper requirements processing.
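As a hedged illustration of the classification above and of starting with KO criteria, here is a minimal Python sketch. The field names, the enum values and the idea of recording which requirements an option satisfies as a set of strings are assumptions made for this example only.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    FUNCTIONAL = "functional"
    NON_FUNCTIONAL = "non-functional"


class Nature(Enum):
    REQUIREMENT = "requirement"
    NEED = "need"
    EXPECTATION = "expectation"


@dataclass(frozen=True)
class Item:
    """One collected requirement, tagged along the dimensions suggested above."""
    text: str
    kind: Kind
    owner: str               # company, division or department
    nature: Nature
    horizon_months: int      # 0 = valid now, otherwise e.g. 12 or 24
    is_ko: bool = False      # knock-out criterion: failing it rules an option out


def ko_filter(options: dict[str, set[str]], items: list[Item]) -> list[str]:
    """Keep the options that satisfy every KO criterion.

    `options` maps an option name to the set of item texts it satisfies.
    """
    ko = {item.text for item in items if item.is_ko}
    return [name for name, met in options.items() if ko <= met]
```

Only the `is_ko` flag drives the filter at this stage; the other fields are already captured so that deeper requirements processing can build on them later.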

The Agility stuff

Posted in general failures, startup failures by opstakes on September 23, 2010

Whenever we hear terms like agility, Scrum, XP, Kanban or whatever, most people think “This is cool development and innovation stuff, ops doesn’t have to care about that.” NOT TRUE!!!

Whenever you hear about a new development methodology, framework or anything else, be prepared: a change in development’s life will change your interfaces and hence your operational life too!

And it is better to act on interfaces than to react. We are currently doing a lot of investigation into cloud, agile and how they change our ops life, but to be honest, agility drives the operational need for clouds.

Think about the following: you work best with Scrum teams if you show them your borders and limitations (aka frameworks, standards, tech recommendations) and act as an active stakeholder with and within the Scrum team. The better the teams become, the more they will need agile resources from your ops department. Flexibility or agility can be achieved with a bunch of technologies and different investment scenarios, but the one that probably fits best is reacting with cloud computing resources or highly available virtual resources (hence highly automated and “near cloud”) and providing proper feedback to the agile teams.
Doing so, you will get very high throughput within your IT organization, tons of congratulations for being one of the very rare operators thinking in business terms and needs, and a very effective and efficient ops team with strict and accepted borders. The better and clearer the borders are and the better your automation is, the better agility is supported and the better the feedback will be.
If not, do what you have to do with such developers 😉
I will keep writing about agility and cloud operations, as I personally believe this is how we will operate for years to come.


all time cloud

Posted in general failures, KnowHow, startup failures, technical by opstakes on July 20, 2010

I am going to stop writing only about operational failures; this blog will probably move on to “hot” topics within IT operations. It will still stay focused on operations, as I am an ops guy.

A good moment to explain why I have not written for the last few weeks: I was wondering where cloud computing is heading!

We are doing quite a lot of different cloud projects, and right now it seems that either there is no room left to deal with clouds or there is still a lack of experience out there on all sides of the business. This is just a short draft of my ongoing thoughts; discussion welcome!

Cloud topics on business side:

Why do we still believe that it is as easy as written in the brochure? Haven’t we learned from all the functionality promised in the past? Yes, there is high potential to get things done and delivered in a smarter and more cost-sensitive way, but at what risk and cost? And what do operations look like afterwards?

Cloud topics on IT side:

Clouds are not something we can pass by. Clouds have to be worked with; IT has to understand the pros and cons of clouds and how to live with them for the next decade. Clouds are neither friends nor enemies; they are a new way of delivering services to customers, more service-based than ever before. Clouds are not VMware and are not Xen or KVM; clouds are a business case, so IT has to understand the business and business methodology, otherwise it will only deliver virtualization. Not bad at all, but only a few percent of cloud power.

As IT, I would strongly recommend not putting too much pressure on compliance, legal and data security. Several organisations already cover that topic, and it is at most a question of weeks or months to get it fully done. Secondly, there are already SAS 70-ready solutions out there, and other standards are met too. It is right to cover that topic, because it is a risk, but nothing more. If you, as IT, use compliance against the cloud, you will be marked as someone protecting their own job …

Cloud topics on operations side:

Clouds mean you are no longer the prime operations partner. To be honest, when thinking about all the complexity, more and more clouds can even help you reduce YOUR complexity and get things done. Yes, the number of your systems will probably go down, a lot will be delivered out of the cloud, and partly you will act as a cloud offering yourself. BUT this is good news: you can transform yourself from a 24/7 operator into a platform architect handling tons and tons of different systems without dealing with the day-to-day problems, which are handled by others within the cloud 🙂


On Demand FTEs

Posted in general failures, organizational, startup failures by opstakes on May 12, 2010

You definitely know that story: you come in on a quiet Monday morning and the first thing you hear about is a lack of resources. But why? How could you run out of resources over the weekend? Private accidents? One of your technicians fell in love with a girl and will not come back? Or is it just the simple fact that project no. 30 has kicked off? (and indeed 10 of them are prio 1 :-))

So how do you plan for scenarios like that? Whenever you build a new resource (or hire, maybe the better word), keep in mind to build internally only those resources you will need in the long term. So ask the following:

  1. Is the resource for day-to-day operations?
  2. Is the FTE still needed after the successful close of project X and the deprovisioning of the current platform?
  3. How do you make sure the knowledge is taken over after the external FTE has gone?
  4. Is it easier/faster to get that special skill/know-how via a service, or are we really talking about an FTE?
  5. Is it within or outside the budget, and how do you handle it?
  6. How fast will you get that person; is it within the time of demand?
  7. Can you upgrade internally?

I want to point out one aspect, the knowledge transfer: whenever you bring in an external, you should be able to manage him or her more or less the same way you manage your employees. An external is only as good as the direction, empathy and loyalty he or she gets, even if you need him/her for a longer time.

Second point: keep in mind that the external will leave the company after day X. So prepare your organization to take over the know-how and transfer the experience in good time. Otherwise you will not be able to successfully deprovision the external party. Any lack of expertise or know-how after the external has left will fall directly back on you as the internally responsible person.

I tend to say: put a minimum of 50 % of all your externals on the existing tasks and support your internal FTEs in moving ahead onto the new technology. After the switch, only 50 % of your current staff has to be “upgraded”. This can be done by your already transformed employees, so that operations and the new fancy stuff are well organized and understood by your employees.

I know that it is quite hard to bring in externals under pressure, bring them up to speed on the old, to-be-replaced services, and at the same time use the very employees who train/support the externals to get a new project up and running. Sooner or later you will reach the goal. Maybe not the first time, maybe not even the second or third time, but your employees will learn to live with permanent transformation, and if they understand the message, the next project will survive and knowledge transfer as a core rule for bringing in “on demand FTEs” will be accepted.


Posted in organizational, startup failures by opstakes on December 22, 2009

Yes, you know you will need them. You will need those crazy guys talking about systems, services, processes and tasks you have probably never heard of before. And yes, they will let you know who the know-how owner is. And yes, you will need them because:

  • your know-how can only be scaled up if you have people who have already transformed an idea into a working business.
  • your know-how can only be pushed to the next level if you have people in place who have already seen the next level.
  • there is compliance demand out there, and you need people who know how to work with it.
  • and you will need technical, organizational and even cultural guidance.

This should never be read as an anthem to old employees who used to work hard. Seniority means that those people have seen different company cultures and different levels of expertise, and they know how to transform. Maybe 3 years are enough, maybe it takes 20 or more, maybe they never reach seniority level. It really depends on the person: their personality, attitude, behaviour and culture (the famous ABC).

You will not need hundreds of seniors, maybe one or two, but you will have to enable them to transfer their knowledge to the teams and to you.
So keep in mind: even if you are the founder, funder, owner or whatever CxO, you need to have an ear for them, their ideas and their expertise. If you can enable them, they will enable your organization based on the values and objectives you showed them to be important for your business model.
Not having seniors means reinventing the wheel and losing time, resources, nerves and money. Young rebels are good; focused young rebels make the business.


Borders for prod. environment

Posted in startup failures, technical by opstakes on December 3, 2009

A good developer will probably never be a good operator, and vice versa. But there is a grey area of work in between, mostly in the transition from the development to the production environment; technically speaking, the test, validation and pre-/near-live environments.

The question is, who is responsible for that environment and why and at what level?

Traditionally, development feels capable of and responsible for that environment, arguing that it is their effort to get it running. I would bring in another party/role named Quality Assurance (QA). QA should at minimum run the test environment and the associated tests (functional, integration, scenarios, plans, procedures, acceptance criteria ….)

Next is the load test, which is the gate to the production environment. As it is the last border, it is top priority for operations to be responsible for that system and test. If it passes, it goes live. Who runs the live system? Ops, so they must do their load test job too. Otherwise software will go live that they never had a chance to look at before.

And don’t forget: being responsible for it does not mean doing the whole job; the load test itself can potentially be run by QA, advised by operations. But setting up the near-live environment, evaluating the results and signing off should always be part of operations, not QA, not dev.

Another reason why ops should do this: if you run your load test on a near-live environment, you get a bunch of numbers showing your overall requests per system, which serve as a benchmark for capacity planning of the live environment. As we know, those numbers change release by release (more functions coming in, complexity decreasing/increasing …). So not being responsible for those tests and leaving them to dev means not being proactive about environmental capacity. As a result, you will be reacting on the live environment all the time, binding resources and not fulfilling the goal of a good operations framework.
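The capacity-planning arithmetic behind this can be sketched in a few lines. This minimal Python example assumes the near-live load test yields a sustainable requests-per-second figure per node and that you keep a fixed headroom fraction; `nodes_needed` and the 30 % default headroom are hypothetical choices for illustration, not a standard formula from any tool.

```python
import math


def nodes_needed(peak_rps: float, rps_per_node: float, headroom: float = 0.3) -> int:
    """Nodes required so that the expected peak uses at most (1 - headroom) of the
    per-node throughput measured in the near-live load test."""
    if rps_per_node <= 0:
        raise ValueError("measured per-node throughput must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_per_node = rps_per_node * (1 - headroom)
    # Round up: a fractional node still means one more node to provision.
    return math.ceil(peak_rps / usable_per_node)
```

Rerunning it with each release's measured per-node figure shows how a release shifts the capacity you need: with the default headroom, an expected peak of 1200 req/s at a measured 250 req/s per node needs 7 nodes, while a release that only sustains 200 req/s per node needs 9 for the same peak.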
