Software Engineer III - Java, Kafka, Kubernetes
JPMorganChase
We have an exciting and rewarding opportunity for you to take your software engineering career to the next level.
As a Software Engineer III at JPMorgan Chase within the Commercial & Investment Bank, you serve as a seasoned member of an agile team to design and deliver trusted market-leading technology products in a secure, stable, and scalable way. You are responsible for carrying out critical technology solutions across multiple technical areas within various business functions in support of the firm's business objectives.
Job responsibilities
- Provide Level 3 production support for critical Java microservices and batch/streaming workloads.
- Own major incident (P1/P2) triage, troubleshooting, mitigation, and restoration within SLA. Perform deep Root Cause Analysis (RCA) including log/metric/trace analysis; deliver corrective and preventive actions.
- Support and tune Kafka-based event streaming: consumer lag issues, rebalancing, partition strategy, retries/DLQ patterns, idempotency, ordering concerns.
- Support Spring Boot services: thread dumps, heap analysis, GC behavior, connection pooling, dependency issues.
- Diagnose and resolve database (Oracle) issues: slow queries, locks/deadlocks, indexing, execution plans, connection pool saturation.
- Support Elasticsearch (Gaia): index health, query performance, mapping issues, shard/replica allocation, ingestion failures.
- Collaborate with development teams on bug fixes, hotfix validation, release readiness, and post-deployment verification.
- Improve observability: enhance dashboards/alerts, log correlation, runbooks, and operational KPIs.
- Participate in on-call rotations, change management, and planned maintenance activities.
- Drive problem management: identify recurring issues, reduce noise, and implement automation/self-healing where feasible.
- Leverage enterprise-authorized AI coding assist tools within the work environment to improve code quality, delivery speed, and productivity across complex deliverables (e.g., code generation/refactoring, unit test creation, documentation), while validating outputs through peer review, automated testing, and secure coding standards; contribute learnings and reusable patterns to improve broader team effectiveness.
- Apply knowledge of tools within the Software Development Life Cycle toolchain, including enterprise-authorized AI-assisted development and automation capabilities, to improve the value realized by automation.
Required qualifications, capabilities, and skills
- Formal training or certification in software engineering concepts and 3+ years of applied experience.
- Strong experience in Java production troubleshooting (Java 11+).
- Strong hands-on experience with Spring Boot (REST APIs, configuration, actuator/metrics, dependency management).
- Experience supporting Apache Kafka in production (topics, partitions, consumer groups, offsets, retries/DLQ, schema/versioning concepts).
- Strong knowledge of Oracle (SQL tuning, indexing, locking, performance troubleshooting).
- Experience supporting Elasticsearch (cluster health, index lifecycle, query DSL basics, performance diagnosis).
- Proven ability with incident management and RCA (problem statements, timeline, 5-Whys, action items).
- Strong Linux skills: process inspection, file/log handling, networking basics (netstat/ss, DNS, TLS concepts).
- Exposure to CI/CD and release support (any of): Jenkins, GitLab CI, ArgoCD, etc.
- Scripting/automation: basic Shell/Python for diagnostics and operational tooling.
- Experience with Kubernetes and cloud platform operations (e.g., GKE/GCP or an internal platform such as GKP): pods, deployments, configmaps/secrets, scaling, resource limits, troubleshooting restarts/OOM.
- Hands-on experience using enterprise-authorized AI-assisted software development tools within the work environment (e.g., for coding, test creation, troubleshooting, or documentation) with demonstrated ability to critically evaluate, validate, and refine AI-generated outputs for correctness, performance, and security.
- Understanding of responsible AI use in engineering workflows, including data sensitivity considerations, secure handling of inputs/outputs, and adherence to resiliency and security expectations; ability to guide peers on safe and effective usage within team practices.
Preferred qualifications, capabilities, and skills
- Experience with schema registries (e.g., Avro/Protobuf concepts) and message compatibility strategies.
- Exposure to performance testing and capacity planning (load patterns, bottleneck identification).
- Knowledge of ITIL processes (Incident/Problem/Change) and service management tools (e.g., ServiceNow).
- 3+ years of experience in L3/production support for distributed systems preferred.
- Familiarity with security basics: secrets handling, certificate rotation, least privilege, vulnerability remediation support.
- Experience in regulated/high-availability environments (financial services is a plus).
- Familiarity with observability tools (any of): Splunk/ELK, Prometheus, Grafana, AppDynamics/Dynatrace, OpenTelemetry.
How to apply
To apply for this job, you need to sign in on our website. If you don't have an account yet, please register.