Kubernetes pod auto-restarts with exit code 137


These logs are from an exited container on one of my Kubernetes nodes.

Can anyone please help? I think it's a memory issue, but I have set sufficient resources for the pod.

Memory gradually increases over time, so there may be a memory leak. Please help with this, thanks.

It works perfectly on staging, but in production it keeps restarting. I was also wondering whether, because I'm using the python-slim image in Docker, the kernel or Linux itself is killing my Python process.

Thanks in advance

Nov 26 00:24:03 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
Nov 29 06:45:02 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: python3 invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=993
Nov 29 06:45:02 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel:  oom_kill_process+0x23e/0x490
Nov 29 06:45:02 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel:  out_of_memory+0x100/0x4c0
Nov 29 06:45:02 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel:  mem_cgroup_out_of_memory+0x3f/0x60
Nov 29 06:45:02 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel:  mem_cgroup_oom_synchronize+0x2dd/0x300
Nov 29 06:45:02 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel:  pagefault_out_of_memory+0x25/0x56
Nov 29 06:45:02 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: memory: usage 1048576kB, limit 1048576kB, failcnt 2106
Nov 29 06:45:02 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: memory+swap: usage 1048576kB, limit 9007199254740988kB, failcnt 0
Nov 29 06:45:02 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: Memory cgroup stats for /kubepods/burstable/pod2ef4b832-1101-11ea-9b9a-42010a8000a9: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 29 06:45:02 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: Memory cgroup stats for /kubepods/burstable/pod2ef4b832-1101-11ea-9b9a-42010a8000a9/4a728f33240d29d15761e3224c1c08a41943c233e8d2970b5068a19c95f1f3e1: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:48KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 29 06:45:02 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: Memory cgroup stats for /kubepods/burstable/pod2ef4b832-1101-11ea-9b9a-42010a8000a9/7dd87463773c32fbffad267b50f3986cdb969bd9915ab32cc371a50c9e2dc16f: cache:128KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 29 06:45:02 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: Memory cgroup stats for /kubepods/burstable/pod2ef4b832-1101-11ea-9b9a-42010a8000a9/6006c2b3ae7dcc7e6ddf41e765c747db71ed3b09c49e83cec281501ff848419e: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:132KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 29 06:45:02 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: Memory cgroup stats for /kubepods/burstable/pod2ef4b832-1101-11ea-9b9a-42010a8000a9/7f54c48546807d9430b82469e1968da2e83772b60c2c6b65a308d78b50eefc56: cache:0KB rss:1041756KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:1042088KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 29 06:45:02 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Nov 29 06:45:02 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: Memory cgroup out of memory: Kill process 3951244 (python3) score 2004 or sacrifice child
Nov 29 06:45:02 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: oom_reaper: reaped process 3951244 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Nov 29 06:45:24 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: python3 invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=993
Nov 29 06:45:25 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel:  oom_kill_process+0x23e/0x490
Nov 29 06:45:25 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel:  out_of_memory+0x100/0x4c0
Nov 29 06:45:25 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel:  mem_cgroup_out_of_memory+0x3f/0x60
Nov 29 06:45:25 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel:  mem_cgroup_oom_synchronize+0x2dd/0x300
Nov 29 06:45:25 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel:  pagefault_out_of_memory+0x25/0x56
Nov 29 06:45:25 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: memory: usage 1048576kB, limit 1048576kB, failcnt 795
Nov 29 06:45:25 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: memory+swap: usage 1048576kB, limit 9007199254740988kB, failcnt 0
Nov 29 06:45:25 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: Memory cgroup stats for /kubepods/burstable/pod06691003-1101-11ea-9b9a-42010a8000a9: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 29 06:45:25 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: Memory cgroup stats for /kubepods/burstable/pod06691003-1101-11ea-9b9a-42010a8000a9/c1265c7dc67ee140d0033c3527adcb4e47fded0e8ac27822701d2e56acbb528f: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:40KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 29 06:45:25 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: Memory cgroup stats for /kubepods/burstable/pod06691003-1101-11ea-9b9a-42010a8000a9/471908fde52e37475d1e454fd23755ac0066fd16f324aa1b8dcdae70ae3ee4db: cache:128KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 29 06:45:25 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: Memory cgroup stats for /kubepods/burstable/pod06691003-1101-11ea-9b9a-42010a8000a9/d29c34fa96c3350d5b5caf09f19be16d68d07bbd54dd80c7bb709f7d55937ae7: cache:0KB rss:44KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 29 06:45:25 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: Memory cgroup stats for /kubepods/burstable/pod06691003-1101-11ea-9b9a-42010a8000a9/10686aeb2ec1f3054f2d6da37b75a74076c3c1ad61d0fda16601bcca8f66f8c2: cache:12KB rss:1042092KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:132KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:1042244KB inactive_file:0KB active_file:0KB unevictable:0KB
Nov 29 06:45:25 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Nov 29 06:45:25 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: Memory cgroup out of memory: Kill process 3956354 (python3) score 2004 or sacrifice child
Nov 29 06:45:25 gke-cluster-highmem-pool-gen2-f2743e02-msv2 kernel: oom_reaper: reaped process 3956354 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Answer

It looks like your Kubernetes pod is being killed by the kernel's OOM (Out of Memory) killer in production. The log you provided shows the pod's memory cgroup hitting its limit (usage 1048576kB against a limit of 1048576kB, i.e. 1 GiB), at which point the OOM killer terminates the python3 process.

Here are some steps you can take to investigate and potentially resolve the issue:

1. Memory Limits and Requests

  • Verify Resource Requests and Limits: Ensure that you are setting appropriate memory requests and limits for your pod. In Kubernetes, requests are what the scheduler reserves for the pod, while limits define the maximum it may consume; if a container exceeds its memory limit, the kernel kills it, which is exactly what your logs show. The cgroup path /kubepods/burstable/... also tells you the pod is in the Burstable QoS class (its requests and limits are not equal). Double-check that the current limit of 1 GiB (memory: usage 1048576kB, limit 1048576kB) is actually high enough for the workload.
  • Example configuration:
    resources:
      requests:
        memory: "512Mi"
      limits:
        memory: "2Gi"
    
  • If you're using python-slim or another minimal base image, the base image itself is unlikely to be the root cause: the kernel enforces the same cgroup memory limit regardless of which image you build from. At most, differences in preinstalled libraries can shift memory usage slightly (see step 3 below for how to rule this out).

2. Memory Leak

  • Since you mentioned that memory usage gradually increases, it's possible you have a memory leak in your Python application.
    • You can use memory-profiling tools such as memory_profiler or the standard library's tracemalloc to monitor memory usage and identify allocations that grow unexpectedly (see the sketch after this list).
    • Ensure that your application is properly cleaning up unused objects and resources. For example, large datasets in memory or open file handles can contribute to memory leaks.
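  • As a rough illustration, here is a minimal sketch using the standard library's tracemalloc to see which allocation sites grow across one operation (handle_request below is just a placeholder for one unit of your real workload):
    import tracemalloc

    def handle_request():
        # placeholder for one request / batch of your real application
        return [b"x" * 1024 for _ in range(1000)]

    tracemalloc.start()
    baseline = tracemalloc.take_snapshot()

    result = handle_request()          # run the suspect operation

    snapshot = tracemalloc.take_snapshot()
    # top 10 allocation sites that grew since the baseline
    for stat in snapshot.compare_to(baseline, "lineno")[:10]:
        print(stat)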

3. Using the python-slim Image

  • The python-slim image mainly omits build tools and extra system libraries to keep the image small; it does not meaningfully change how much memory your Python process uses at runtime, and it is not the reason the kernel is killing the process. If you want to rule it out, temporarily switch to the full python:3.x image and compare, but don't expect a large difference in memory consumption.

4. Heap Dump and Garbage Collection

  • In Python, memory is usually "leaked" by objects that are still referenced somewhere (module-level caches, ever-growing lists or dicts, reference cycles), so the garbage collector never frees them. You can check whether a manual collection releases anything by running:
    import gc
    gc.collect()
    
  • You can also use the third-party objgraph package (pip install objgraph) to see which object types are growing between calls:
    import objgraph
    objgraph.show_growth()
    
  • This will help you identify if certain objects are not being freed properly.
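  • As a sketch of how to put these together (objgraph is a third-party package installed with pip install objgraph; do_work is just a stand-in for your own code), you can compare object counts before and after one unit of work:
    import gc

    import objgraph

    def do_work():
        # placeholder for one request / batch of your application
        return list(range(100_000))

    objgraph.show_growth()   # establish a baseline of object counts
    result = do_work()
    gc.collect()             # discard anything that is merely uncollected garbage
    objgraph.show_growth()   # types that still grew here are being retained somewhere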

5. Container Resource Monitoring

  • Monitor the container's memory over time (on GKE, the built-in Cloud Monitoring dashboards show per-container memory) so you can see whether usage climbs steadily or spikes under specific conditions.
  • You can also use kubectl top pod (backed by the Metrics API / metrics-server) to check memory and CPU usage in near real time:
    kubectl top pod <pod-name>
    

6. Check for Python Process Memory Usage

  • You can monitor the memory usage of your Python process programmatically with the third-party psutil package (pip install psutil):
    import os
    import psutil  # third-party: pip install psutil

    process = psutil.Process(os.getpid())
    rss_mib = process.memory_info().rss / (1024 * 1024)  # resident set size in MiB
    print(f"Memory usage: {rss_mib:.1f} MiB")
    

7. Kernel OOM Killer

  • The OOM killer is a last-resort mechanism the Linux kernel uses when a memory cgroup (here, your container) exceeds its limit and nothing can be reclaimed; the process is killed with SIGKILL, which Kubernetes reports as exit code 137. If your pod keeps hitting this, the limit is too low for the workload or the workload's memory grows without bound. Raise the limit and observe whether the OOM kills stop or merely happen later; the latter points back to a leak.
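  • As a quick sanity check of how the exit code in your question maps to a signal, this tiny snippet shows the usual 128 + signal-number convention:
    import signal

    # Containers killed by a signal report 128 + the signal number;
    # SIGKILL is 9, so an OOM-killed process shows up as exit code 137.
    print(128 + signal.SIGKILL)  # prints 137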

8. Application Logging

  • Add more detailed logging to your application to see whether specific operations lead to memory spikes; loading large files into memory, building big intermediate data structures, or running memory-intensive computations are typical culprits.
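  • One lightweight way to do this (again assuming psutil is installed; log_memory and its label argument are just illustrative names) is to log the process RSS around suspect operations:
    import logging
    import os

    import psutil

    logging.basicConfig(level=logging.INFO)

    def log_memory(label):
        """Log the current resident set size with a label for correlation."""
        rss_mib = psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)
        logging.info("memory [%s]: %.1f MiB", label, rss_mib)

    log_memory("before load")
    data = [b"x" * 1024 for _ in range(10_000)]   # stand-in for a heavy operation
    log_memory("after load")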

9. Testing in Staging vs. Production

  • Since this issue only happens in production, try reproducing it in staging with the same resource limits and with production-like traffic and data volumes; a slow leak often only becomes visible under real load.
  • Ensure that staging and production use the same configuration: the same memory requests and limits, the same image version, and comparable node pool sizing.

10. Consider Horizontal Scaling

  • If your application can be distributed, consider scaling horizontally by running more replicas so that each pod handles less load. Note that this only helps when memory usage scales with traffic; it will not fix a genuine per-process leak, only slow it down.

By following these steps, you should be able to identify the root cause of the memory issue and fix the pod's memory usage. Let me know if you'd like further assistance with any of these steps!