Description:
Provides 24/7 technical operations support for cloud-based solutions to clients across the supported application, DevOps, middleware, security, and infrastructure layers.
Operates as a single, integrated end-to-end (E2E) operations support function that includes Level 1.5 remediation across the fleet.
Following standard workflows and incident-handling processes, they receive and record incident-related information using a variety of tools and processes, select appropriate actions to resolve issues, and communicate the solution or action plan to the client.
Supports a number of tools as part of the integrated toolchain.
They use professional knowledge and problem determination/source identification skills to resolve problems involving APIs, application services, IaaS, PaaS, SaaS, microservices, containers, Kubernetes nodes, ICP management, and middleware components, as well as network, security, and infrastructure issues.
If they cannot resolve an incident, they triage it and route it to the appropriate level of support.
Understand high-level cloud application architecture and perform an initial analysis of incidents.
Provide Application ID management support.
Provide cloud elasticity by auto-scaling resources up or down based on business requirements. Provide disaster recovery (DR) and manual redundancy failovers.
Provide daily, weekly, and monthly integrated service management reports across the solution.
Will work with ticketing tools such as ServiceNow, IBM Control Desk, and Remedy, and with automation tools such as Rundeck and IBM Runbook Automation.
Should understand the ITIL processes for Incident Management (including Critical Incident Management), Problem Management, Change Management, and Integrated Service Level Management.
Should have experience using monitoring tools such as IBM APM, New Relic, Runscope, and Netcool/OMNIbus to monitor the client's environment.
Technical understanding of IBM Cloud Platform (Bluemix PaaS), container management, Kubernetes nodes, ICP management, HA infrastructure, and load balancers.
Requirements:
Two (2) to four (4) years of experience with AWS cloud-based technologies.
Two (2) to four (4) years of experience writing system integration and/or systems requirements is required.
Strong experience building and maintaining production systems on AWS using EC2, RDS, S3, ELB, CloudFormation, etc., and familiarity with the AWS APIs.
Understanding of software release processes and release management.
Proficiency in high-level scripting languages and scripting environments. Experience administering Linux systems. Thorough understanding of configuration management concepts. Ansible experience is highly desired.
Experience with monitoring, metrics, and visualization tools for network, server, and application status (DataDog, Zenoss, Sensu, Nagios, Graphite, e