Rainforest Brook

Tuesday, March 31, 2015

On-boarding an Application for DevOps Support

Application Development Completed, Production Release Done, Party… Now What?

Now comes the bigger phase of any solution, tool or product: the Support phase. Once the production wheel starts spinning for an application (internal, external or COTS), the DevOps or Support workforce is the first to feel the impact of any incident, outage, etc. This is especially true when the application runs on a stack that is new to the Support team and is intertwined with many other systems in the infrastructure.

Is the Support team always equipped with the right set of skills, details, credentials and tools to tackle a newly inducted application? By Support team I mean the team that tickets are ultimately raised against. Typically this team sits at Tier 3 or Tier 4, beyond which only the developers or the product vendor (who usually own the source code) can help.

Here we look at how a T3/T4 Operations team can be inducted to support a new application, with the right set of details to run the show well.

Application Eco-system

In today's service-layered assembly of solutions and tools, it is rare to see an application serve independently. The team has to be given this ecosystem knowledge of the application, for example:

· What is the tool about?

· Where does the application fit in the company's overall offering?

· What are the different tiers of that application? DB, Message Queue, Storage, etc.

· Is the application clustered or redundant? This tells the team what level of impact restarting or bringing down the application will have.

· Is the application spread across multiple Data Centers but still served as one URL?

· How is the application accessed by end users? E.g. thick client, web browser, headless, from within other code, etc.

· Who is the contact person in the architecture team for any clarification?

All these details about the application should be prepared as a knowledge base page, with diagrams detailing the stack.

Sandbox for Practice

Theory alone does not complete the knowledge. A sandbox or practice system helps the team in many more ways: practicing installations and upgrades, reproducing production issues, etc. When providing such a system, we also need to consider the additional license cost, learning material and training required if the product was purchased from an external vendor. Also, sandboxing may not be possible for every application or solution, if it is a major one or involves cost-incurring interdependencies in the infrastructure (e.g. an additional storage unit).

Monitoring Setup

Once a product is rolled out, it is monitored in one way or another: port, URL, disk usage, counters, queue size, etc. These details have to be shared with the Support team to give them a clear picture of what is being monitored and why each KPI matters to the functioning of the whole system. It also helps to tell them which monitoring system is in use, e.g. Netcool, Nagios, etc. Sometimes we may get two conflicting alert indications. At such times it is better to check the monitoring system itself, to see whether it has a correct view of the application or whether it is just a network glitch. This cross-verification avoids false alerts and escalations, as well as ignoring a vital alert.
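To illustrate what such checks actually measure, here is a minimal Python sketch of a port check and a disk-usage check. It is not how Netcool or Nagios work internally; the function names and the 80%/90% thresholds are hypothetical, chosen only to show how a raw measurement maps to an alert state.

```python
import shutil
import socket
from typing import Tuple

# Hypothetical thresholds; real values belong in the monitoring tool's configuration.
DISK_WARN_PERCENT = 80.0
DISK_CRIT_PERCENT = 90.0


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out or unresolvable: all count as "down".
        return False


def disk_status(path: str = "/") -> Tuple[str, float]:
    """Classify usage of the filesystem holding `path` as OK, WARNING or CRITICAL."""
    usage = shutil.disk_usage(path)
    percent = usage.used / usage.total * 100
    if percent >= DISK_CRIT_PERCENT:
        return "CRITICAL", percent
    if percent >= DISK_WARN_PERCENT:
        return "WARNING", percent
    return "OK", percent
```

Running the same port check from a second host is one cheap form of the cross-verification described above: if the port answers from there while the monitoring system reports it down, the monitoring path itself deserves a look.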

Administration Activities

When T3/T4 teams become the top Support team for everyone to fall back on, it's no wonder they may also be involved in installation, upgrade, recovery, restoration, patching and decommissioning of the application. The team has to be trained well on all these activities. Most vendors nowadays have a common portal for product downloads, license renewal, patch downloads, issue tracking, knowledge base, etc. The portal credentials need to be shared with the right people so they can manage the product. If the application is in-house, an equivalent standard arrangement should be made for all the above items. Periodic meetings with the vendor, developers or architect are also fruitful during the initial period of support, while the team becomes well versed in managing the application.

Release Activities

Usually a T3/T4 Support team is the one doing the periodic release activities for the application: patching, upgrades, feature additions, deployments, bringing up new instances, etc. The team has to be trained well on the change approval process, the release process and the seriousness of violating them. The release calendar for the company or the application should be published in advance on a common intranet page, where it is always available for reference. The team should also be briefed on the urgent change process that may be required during outages impacting revenue, which may be completely different from the regular approval and change process.
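The decision the team faces before every change can be written down as a small rule. The Python sketch below assumes a simplified, hypothetical policy: a change that fixes a revenue-impacting outage takes the urgent (emergency) path, a change that fits entirely inside a published release window takes the standard path, and anything else is rescheduled. The window dates are made up; real ones would come from the release calendar.

```python
from datetime import datetime
from typing import List, Tuple

Window = Tuple[datetime, datetime]


def change_path(start: datetime, end: datetime, windows: List[Window],
                revenue_outage: bool = False) -> str:
    """Suggest which process a planned change should follow (hypothetical policy).

    "emergency":  fixes an outage that impacts revenue (urgent change process)
    "standard":   fits entirely inside a published release window
    "reschedule": anything else moves to a window or goes back for approval
    """
    if revenue_outage:
        return "emergency"
    for window_start, window_end in windows:
        if window_start <= start and end <= window_end:
            return "standard"
    return "reschedule"


# Example window as it might be copied from the release calendar (made-up dates).
RELEASE_WINDOWS = [(datetime(2015, 4, 4, 22, 0), datetime(2015, 4, 5, 2, 0))]
```

Keeping the rule this explicit makes it easy to review with the change approval board, and any exception becomes a visible policy change rather than an individual judgment call.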

Support

Now comes the actual day-to-day work. Incidents and requests are always the top activities in any application management. Though requests look less serious than incidents, they have to be handled carefully, with all approvals, etc.

Requests

Before any user addition, group addition or grant of additional permissions, it is not enough just to get the approvals; we should also check whether the request is short-lived or long-lived, and whether it duplicates existing access. For example, a team might have common credentials for a certain activity, but a member of that team may request a separate privilege for the same activity, which duplicates effort. If such items go unchecked, the application's authorization system will later hold hundreds of users and groups, most of them unused and in a confused state, leading to non-compliance.
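The two checks above (duplicate access and request lifetime) are mechanical enough to script against an export of the authorization system. The Python sketch below assumes that groups, their members and their permissions can be loaded into plain dictionaries; the `AccessRequest` shape and the note wording are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class AccessRequest:
    user: str
    permission: str
    expires_in_days: Optional[int]  # None means a long-lived (permanent) grant


def review_request(req: AccessRequest,
                   group_members: Dict[str, Set[str]],
                   group_permissions: Dict[str, Set[str]]) -> List[str]:
    """Return review notes for an already-approved access request.

    group_members maps a group name to its users; group_permissions maps a
    group name to the permissions granted to that group.
    """
    notes = []
    for group, members in group_members.items():
        # Duplicate: the user already holds this permission through a team group.
        if req.user in members and req.permission in group_permissions.get(group, set()):
            notes.append(f"duplicate: {req.user} already has '{req.permission}' via '{group}'")
    if req.expires_in_days is None:
        notes.append("long-lived: record an owner and a periodic review date")
    return notes
```

An empty list means nothing stood out; any note is a prompt to go back to the requester before the grant is made.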

Incidents

Incidents are the complex activity the Support team may always struggle with. But with the right set of details and tools, this work can be almost standardized. Incidents arrive through many paths: alerts from monitoring systems, T1/T2 teams, failure of a dependent system, failure inside the application stack, network issues, etc. The team should be given clarity on all these paths and on how to handle each one in a standard way. It is very common for the team to log in to servers immediately to debug the issue, irrespective of which path the issue came from.

Credentials

Having the right set of credentials for the infrastructure makes a difference to MTTR. When we talk about credentials, it is not just Production but also the TEST, QA, STAGE and DEV instances. When the application consists of a DB, MQ and protected web services in the backend, the number of credentials the team needs to manage multiplies quickly, since every component needs its own credentials in every environment. On top of that, to stay compliant, the company might need to change all of these passwords periodically. This poses a new problem of keeping the passwords in sync. Individuals usually use an Excel sheet, KeePass or similar tools to store these kinds of passwords. The best way is to host a password portal with a single login and have everyone refer to it when required.
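To put numbers on this: with the five environments above and three backend components, there are already 5 × 3 = 15 service credentials before any individual user accounts are counted. The Python sketch below builds that inventory and lists the entries that are past a rotation period. The 90-day period, the `ENV/COMPONENT` naming scheme and the idea of tracking only last-change dates are assumptions for illustration; it deliberately holds no passwords, which belong in the password portal.

```python
from datetime import date, timedelta
from itertools import product
from typing import Dict, List

ENVIRONMENTS = ["PROD", "TEST", "QA", "STAGE", "DEV"]
COMPONENTS = ["DB", "MQ", "WebService"]

# Hypothetical compliance policy: rotate every service password within 90 days.
ROTATION_PERIOD = timedelta(days=90)


def credential_inventory() -> List[str]:
    """One service credential per (environment, component) pair, e.g. 'PROD/DB'."""
    return [f"{env}/{comp}" for env, comp in product(ENVIRONMENTS, COMPONENTS)]


def due_for_rotation(last_changed: Dict[str, date], today: date) -> List[str]:
    """Credentials changed more than ROTATION_PERIOD ago; unknown ones count as due."""
    return sorted(
        name for name in credential_inventory()
        if today - last_changed.get(name, date.min) > ROTATION_PERIOD
    )
```

Treating a credential with no recorded change date as overdue is a deliberate choice: an entry nobody can account for is exactly the kind that falls out of sync.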

Alert Category

Alert categorization is another detail that should not be left to individuals. Though it is generally agreed that P1 marks business or revenue impact, P2 to P4 are usually where the confusion lies. Correct categorization is essential for other teams to understand the issue and act on it per the SLA.
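One common way to take the guesswork out of P2 to P4 is an impact × urgency matrix, as used in ITIL-style incident management. The Python sketch below is one such matrix; the three levels and the exact mapping are assumptions for illustration, and each company's SLA document should define its own.

```python
from enum import IntEnum


class Level(IntEnum):
    """Shared scale for impact and urgency (lower value = more severe)."""
    HIGH = 1    # e.g. business or revenue impact, many users affected
    MEDIUM = 2
    LOW = 3


def priority(impact: Level, urgency: Level) -> str:
    """Map impact and urgency to P1..P4 with a simple additive matrix (hypothetical)."""
    score = int(impact) + int(urgency)  # 2 (HIGH/HIGH) .. 6 (LOW/LOW)
    return {2: "P1", 3: "P2", 4: "P3"}.get(score, "P4")
```

With this mapping, HIGH/HIGH is P1, HIGH/MEDIUM is P2, MEDIUM/MEDIUM or HIGH/LOW is P3, and everything milder is P4. The point is less the exact table than that everyone on the team applies the same one.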

Knowledge Base

This is nothing new for a Support team. Every team has some knowledge base source, whether an intranet page, Excel sheet, document, database, portal, etc. Someone has to be held responsible for creating, updating and purging these records. Such systems usually have more consumers than creators; everybody thinks someone else is creating the content. There has to be a standard way of creating and updating them, otherwise the knowledge base becomes big, ugly and unreliable.

Troubleshooting tool set

The toolset can range from a simple shell on the server to an advanced diagnostic portal. Based on their own expertise, each team member tends to use their own set of tools to troubleshoot application issues: command shell, grep, wget, curl, SoapUI, browser built-ins, telnet, netstat, etc. The first step is to make this toolset uniform and available as one package deployed on all machines involved in troubleshooting. Next, a standard set of steps should be defined to narrow down the issue using that toolset. The steps should always lead close to a particular root cause. This removes inconsistencies in troubleshooting between individual team members.
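As an example of "standard steps that lead close to a root cause", the Python sketch below checks a URL layer by layer (name resolution, then TCP connection, then HTTP response) and stops at the first layer that fails, roughly what a person would do by hand with nslookup, telnet and curl. The function name and messages are hypothetical, and a real runbook would add application-specific steps after the HTTP check.

```python
import socket
import urllib.error
import urllib.request
from urllib.parse import urlparse


def diagnose(url: str, timeout: float = 5.0) -> str:
    """Check DNS, then TCP, then HTTP for `url` and report the first layer that fails."""
    parsed = urlparse(url)
    host = parsed.hostname
    port = parsed.port or (443 if parsed.scheme == "https" else 80)

    # Step 1: name resolution (what nslookup or dig would show).
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror as exc:
        return f"DNS: cannot resolve {host} ({exc})"

    # Step 2: TCP reachability (what telnet host port would show).
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        return f"TCP: cannot connect to {host}:{port} ({exc})"

    # Step 3: application response (what curl would show). Proxy environment
    # variables are ignored so this step tests the same direct path as step 2.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return f"OK: HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        return f"HTTP: server answered {exc.code}"
    except OSError as exc:
        # URLError and read timeouts both end up here.
        return f"HTTP: no usable response ({exc})"
```

Because every team member runs the same steps in the same order, two people investigating the same alert should arrive at the same layer instead of two different theories.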

User Groups to Keep in the Loop

Sometimes it is not enough for someone to be assigned the incident ticket and start working on it. Some issues require the appropriate stakeholders to be updated on time: internal management, customers, vendors, etc. At the same time, the team shouldn't ring the wrong bell. This knowledge about escalation and the keep-informed culture has to be standardized across team members, preferably documented and available in a common area like the intranet.

Handoffs and Continuing Incidents

For a distributed Support team working across geographies, handing off issues is an additional task. Likewise, handing over knowledge about continuing issues (most of the time P1s) has to be done well, since it is essential for the other team to continue troubleshooting effectively. Often this happens by reading a mail chain or through a one-to-one chat between the handing-over and receiving team members. In that case, anyone else on the receiving team who wants to identify the root cause or contribute to the troubleshooting is left without enough information. To some extent this procedure can be standardized through a common portal form, or by filling in a structured document and sending it to the entire distribution list.
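The structured document mentioned above could be as simple as a fixed set of fields rendered to plain text for the distribution list. The Python sketch below assumes one possible set of fields (ticket, priority, summary, status, actions taken, next steps, contacts); the field names and the ticket ID format are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Handoff:
    """One continuing incident handed from one shift or region to the next."""
    ticket: str
    priority: str
    summary: str
    status: str
    actions_taken: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)
    contacts: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Plain-text body suitable for mailing to the whole distribution list."""
        lines = [f"[{self.priority}] {self.ticket}: {self.summary}",
                 f"Status: {self.status}"]
        sections = (("Actions taken", self.actions_taken),
                    ("Next steps", self.next_steps),
                    ("Contacts", self.contacts))
        for title, items in sections:
            lines.append(f"{title}:")
            # An explicit "(none)" shows the field was considered, not forgotten.
            lines.extend(f"  - {item}" for item in (items or ["(none)"]))
        return "\n".join(lines)
```

For example, `Handoff("INC-1001", "P1", "Checkout errors", "Mitigated, root cause open", ["Restarted app node 2"]).render()` produces a note in which the empty "Next steps" and "Contacts" sections stand out, prompting the sender to fill them in before going off shift.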

Daily Standups

Standups don't belong only to Scrum and Agile projects. A Support team can hold a daily standup, perhaps at the end of their day, to discuss the issues that occurred during the day and how they were resolved. This brings private knowledge to the table and helps share it. It should be an open discussion; a lead could take notes on the issues and their resolution steps day to day, so they can be applied to improving root cause identification, reducing MTTR, avoiding duplicated work and further standardizing processes. The same meeting can also be used to communicate upcoming release events for the application.

Vendor/Developer Meet

The Support team can meet periodically with either the architect/developers (for internal applications) or the vendor to discuss repeated issues that could be resolved by a design change, feature suggestion, etc. This also helps the Support team receive knowledge from the other side to better manage the application.

Reports & MIS Data

Assume there have been no incidents or requests for an application for some time and everyone is at peace. Is that really a state of peace unless it is backed by supporting data? This is where MIS data comes into the picture. Application usage, its internal resource usage, uptime reports, resolution-time graphs, etc. should be captured and discussed periodically, to ensure the application is behaving normally and as expected, and that issues are resolved effectively within SLA.
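The core figures in such a report are simple arithmetic. For example, 43.2 minutes of downtime in a 30-day month (43,200 minutes) is 0.1% downtime, i.e. 99.9% uptime. The Python sketch below computes uptime, mean resolution time and the share of incidents resolved within an SLA target; the function names and the idea of feeding it resolution times in hours are assumptions for illustration.

```python
from datetime import timedelta
from typing import List


def uptime_percent(period: timedelta, outages: List[timedelta]) -> float:
    """Percentage of `period` during which the application was up."""
    downtime = sum(outages, timedelta())
    return 100.0 * (1 - downtime / period)


def mean_resolution_hours(resolution_hours: List[float]) -> float:
    """Mean time to resolve, in hours; needs at least one incident."""
    return sum(resolution_hours) / len(resolution_hours)


def sla_met_percent(resolution_hours: List[float], target_hours: float) -> float:
    """Share of incidents resolved within the SLA target."""
    if not resolution_hours:
        return 100.0  # no incidents, so no breaches
    met = sum(1 for hours in resolution_hours if hours <= target_hours)
    return 100.0 * met / len(resolution_hours)
```

With resolution times of 2, 5, 9 and 3 hours against a 4-hour target, the mean is 4.75 hours and only half the incidents met the SLA: a quiet-looking month can still hide a trend worth discussing.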

Conclusion

Handing over a new application to Support isn't just about giving the team a one-day session and letting them fight the war day to day. I discussed sandboxes, the right set of credentials, the application ecosystem, architect contacts, daily standups, continuing incidents, application release activities, monitoring setup, etc. When the handover is done effectively, the team can fight incidents and contribute to improving how the tool is managed, instead of struggling with access, troubleshooting and narrowing down issues. In fact, "automation" of Support-level tasks can yield a lot of advantages and make the issues discussed above disappear. I'll cover it in future posts.

Friday, September 19, 2014

Growth

There is no time in this world when we are stagnant (mentally); we keep growing daily in some way or other. But the happy news is that we can choose the path along which we grow. If we do not choose one, our thought process and surroundings choose every step for us from time to time, and lead us somewhere we never planned to go.

Wednesday, September 17, 2014

(un) Planning

Not planning always burdens someone or other.

Tuesday, September 16, 2014

AIM

If you do not have an AIM or have not found one yet, don't worry. Just begin doing what you like best, with only one rule: your action should be a service to humanity in some form. At your deepest involvement in it, you will find your AIM.

Thursday, January 16, 2014

The Laptop

It was a time when I had rarely seen a laptop from a few feet away. It was around 2002. Sometimes we watched our MD use it on important occasions, like giving presentations to clients. One day I got a chance to work on that laptop to make some last-minute changes to the PowerPoint slides we needed to show a client the next day. It was getting late in the evening and my MD had to leave early, so he asked me to take the laptop home and go directly to the client's office the next morning. I was so happy to have the chance to take the laptop home and show it to my family. I finished all my work and left for home.

I told my sisters I had a laptop in my bag. Everyone was very eager to see what it looked like. I took out a big, flat black box. They were amazed by the flat keyboard and the flip-up monitor. My Mom asked whether it could do everything a normal PC does. I said YES with exhilaration. My sisters were so excited to touch the cute keyboard and the small touchpad. They also played Pinball and drew in Paintbrush on it. My Father asked me how much the laptop cost. I told him it could be around Rs 50,000. He was so proud that the company trusted me to carry such a valuable object.

I felt very proud and happy carrying that big leather bag with the laptop on one shoulder while riding my motorcycle. Anyone could tell that such a black leather bag contained a laptop, and that the person carrying it must be a techie or a software engineer.

Today I carry my laptop to work and home every day. I can't remember when I last opened Paintbrush to draw something; it is only ever used to save a screen-captured image. Sometimes it has slipped from my hand while I was carrying it, damaging the corners. The screen holds dust that stays for many days.

I shut it down once every few weeks, so as not to lose my open windows and browser tabs. Most of the time I only hibernate it. Some days it stays on all night on the bed while I sleep after tiresome work. My 4-year-old nephew always waits for me to finish my work and close the lid, so he can grab it to play games.

I carry it with me whenever I go to my hometown. Sometimes I find it cumbersome to carry it around all the time. I can't remember a time when my laptop was not around me.

Recently there was a critical customer issue on a weekend, and I had not taken my laptop to my hometown. My Manager called me to address it, and I told him I didn't have my laptop with me. I got scolded for it.

Tuesday, August 20, 2013

Hindrances of Blogging

I often mistook blogging for some perfect storytelling technique, but it turned out not to be. It is just recording the reflections of our mind and thought process in a form that others, or the target audience, can understand. In the past I made many attempts to blog, but failed once the initial enthusiasm drained away. Otherwise, I blogged only when my thoughts and experiences were overflowing so much that I couldn't keep them within.

Blogging on Multiple Domains

I had another misconception: that a person blogging has to have a separate blog for each domain he writes about, such as travel, technical, socializing, economics, etc. The basic idea is not to confuse a reader who is interested in one particular set of articles with a jumble of articles from different domains. But this too was a misconception on my part. Tagging helps overcome this problem by categorizing each post accordingly. I have yet to explore more advanced methods for this, so that I can blog on one common site covering all the topics I'm interested in.

Thinking of Blogging as a Scientific Invention

I used to think that blogging is mainly about telling or recording something new that the world has never heard of or experienced. From one angle this is quite a natural thought: why would I write about something that everyone already knows or experiences day to day? But this too proved to be wrong. Blogging is about our experiences of, or thoughts on, something we went through. No two experiences are the same to the letter; there is always room for some variation. Joined together, these variations form knowledge about the subject and help others avoid the mistakes those experiences covered. So writing about something everyone has gone through is not a big waste of time. It helps sometimes.

Discouraged by finding the same type of Blog already floating around

Yes, this is true, and more or less related to the reason above. I always wanted to write something that no one had ever written about. Finding a similar blog on the internet discouraged me from writing about that topic. I used to think I was adding more junk to the World Wide Web, and that people might not be interested in reading yet another post on the same topic. But as I said in the previous section, this may not always be true either. What if your blog reaches somebody before the others do? What if your experience, rather than the other similar ones, relates more closely to the person reading? As a classic example, I searched the internet before writing this post as well, and found the blog 3 Hindrances to Successful Blogging, which is more or less similar yet does not cover some of the angles I thought about. After all, we humans are all the same yet different. Do we need another human in this world if everyone is already human? The answer is YES: we need new minds.

Thinking that Blogging Takes More Time

I used to think that blogging on a topic is a very deep process of thinking and recording, which takes quite a lot of time away from regular life. But once I started writing, I realized that writing one article takes about as long as I spend reading the newspaper. It's quite simple: as the HOPEpreneurs blog says, the fix is to avoid perfectionism.

OK, let's blog now without waiting for more content on this topic.