KKStream is seeking site reliability engineers who are passionate about DevOps technologies. We serve enterprise-level customers from top Japanese companies and support 24x7 operations.
As SRE members at KKStream, we participate in project development and refactoring, and work closely with R&D and service teams to build scalable, operable environments. We like to share anything that helps our projects, solve all kinds of puzzles, and embrace a wide range of technologies. If you have fresh ideas, love to offer unique viewpoints, and enjoy collaborating with cross-functional teams to develop real-world solutions and great user experiences, we welcome you to join us!
Responsibilities:
- Participate in the on-call rotation.
- Develop and maintain the service monitoring software stack.
- Develop and maintain infrastructure orchestration on cloud platforms.
- Govern policies for development and resource access.
- Engage in and improve the whole lifecycle of services.
- Improve the reliability and scalability of released projects.
Requirements:
- Bachelor's degree in Computer Science or a related technical field involving software or systems engineering, or equivalent practical experience.
- Experience with cloud services such as AWS, Azure, or Elastic Cloud.
- Working knowledge of cloud networking, storage, and computing services (e.g. EC2, S3, CDN, Lambda).
- Working knowledge of Git.
- Working knowledge of database technologies, such as relational and NoSQL databases.
- Ability to develop automation and operations tools, such as alarms as code.
Nice to Have:
- Strong communication skills and experience.
- Enthusiasm for open source and for sharing knowledge in the DevOps area.
- Experience with programming languages (e.g. Python, Golang, or Bash scripting).
- Experience with container orchestration technologies, such as Kubernetes (AWS EKS or self-hosted Kubernetes) or AWS ECS/Fargate.
- Experience with infrastructure deployment automation tools (e.g. Terraform or AWS CloudFormation).
- Experience with CI/CD stacks such as GitLab.
- Experience in operating and deploying services on AWS.
- Experience with monitoring dashboards and with collecting metrics, logs, and traces using tools such as AWS CloudWatch, Container Insights, Prometheus/Grafana, Fluentd, and Athena.
- Experience in proactive approaches to spotting problems, areas for improvement, and performance tuning.
- Experience in designing, analyzing, and troubleshooting large-scale distributed systems.
*** The salary range is for reference only; the actual salary will be discussed further with the candidate. ***