<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>LucasBG0.com</title><link>https://lucasbg0.com/</link><description>Lucas Barbosa Gomes' blog — DevOps, Kubernetes, Linux, cloud and infrastructure engineering. English edition.</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Fri, 12 Jun 2026 19:58:20 GMT</lastBuildDate><atom:link href="https://lucasbg0.com/index.xml" rel="self" type="application/rss+xml"/><item><title>How to size Java memory in Kubernetes (MaxRAMPercentage and OOMKill)</title><link>https://lucasbg0.com/en/2026/06/09/jvm-memoria-em-containers/</link><guid isPermaLink="true">https://lucasbg0.com/en/2026/06/09/jvm-memoria-em-containers/</guid><pubDate>Tue, 09 Jun 2026 19:30:00 GMT</pubDate><description>&lt;p&gt;This post came out of a real problem. A fleet of Java applications running on
Kubernetes had standardized JVMs on
&lt;code&gt;-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75&lt;/code&gt; and sized memory &lt;code&gt;requests&lt;/code&gt;
and &lt;code&gt;limits&lt;/code&gt; by looking at &lt;code&gt;container.memory.usage&lt;/code&gt; and &lt;code&gt;working_set&lt;/code&gt; in
observability. On paper it looked right: the JVM takes 75% of the limit, 25% is
left for everything else, and you track RSS to tune.&lt;/p&gt;
&lt;p&gt;In practice, two symptoms showed up:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;RSS for &lt;strong&gt;every&lt;/strong&gt; application sat pinned at ~75% of the limit — including
&lt;code&gt;dev&lt;/code&gt; environments that barely saw traffic. There was no way to tell who was
actually wasting memory.&lt;/li&gt;
&lt;li&gt;When we tried to cut limits based on that RSS, several applications started
getting &lt;strong&gt;OOMKilled (exit 137)&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is one of those cases where the metric you&amp;rsquo;re looking at misleads you, not
because it&amp;rsquo;s wrong but because you&amp;rsquo;re reading it wrong. I&amp;rsquo;ll document what
I found investigating data from real environments, then walk through a
&lt;strong&gt;reproducible PoC&lt;/strong&gt; (Docker + Java 21) that backs each claim with measured
numbers. All PoC code lives in the public repo
&lt;a href="https://github.com/LucasBG0/poc-jvm-memory-containers" target="_blank" rel="noopener"&gt;LucasBG0/poc-jvm-memory-containers&lt;/a&gt;
and runs with &lt;code&gt;./run.sh&lt;/code&gt;.&lt;/p&gt;</description><content:encoded><![CDATA[<p>This post came out of a real problem. A fleet of Java applications running on
Kubernetes had standardized JVMs on
<code>-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75</code> and sized memory <code>requests</code>
and <code>limits</code> by looking at <code>container.memory.usage</code> and <code>working_set</code> in
observability. On paper it looked right: the JVM takes 75% of the limit, 25% is
left for everything else, and you track RSS to tune.</p>
<p>In practice, two symptoms showed up:</p>
<ol>
<li>RSS for <strong>every</strong> application sat pinned at ~75% of the limit — including
<code>dev</code> environments that barely saw traffic. There was no way to tell who was
actually wasting memory.</li>
<li>When we tried to cut limits based on that RSS, several applications started
getting <strong>OOMKilled (exit 137)</strong>.</li>
</ol>
<p>This is one of those cases where the metric you&rsquo;re looking at misleads you, not
because it&rsquo;s wrong but because you&rsquo;re reading it wrong. I&rsquo;ll document what
I found investigating data from real environments, then walk through a
<strong>reproducible PoC</strong> (Docker + Java 21) that backs each claim with measured
numbers. All PoC code lives in the public repo
<a href="https://github.com/LucasBG0/poc-jvm-memory-containers" target="_blank" rel="noopener">LucasBG0/poc-jvm-memory-containers</a>
and runs with <code>./run.sh</code>.</p>
<blockquote>
  <p>The &ldquo;production&rdquo; numbers throughout the text come from a real Java app fleet on
K8s, anonymized. The &ldquo;PoC&rdquo; numbers were measured on my machine and you can
reproduce them.</p>

</blockquote>
<h2>The basics everyone gets wrong<span class="hx:absolute hx:-mt-20" id="the-basics-everyone-gets-wrong"></span>
    <a href="#the-basics-everyone-gets-wrong" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>Before anything else, you need to separate four flags that look like they do the
same thing and don&rsquo;t.</p>
<h3><code>-Xms</code> and <code>-Xmx</code> (absolute)<span class="hx:absolute hx:-mt-20" id="-xms-and--xmx-absolute"></span>
    <a href="#-xms-and--xmx-absolute" class="subheading-anchor" aria-label="Permalink for this section"></a></h3><p>These are the <strong>initial</strong> (<code>-Xms</code>) and <strong>maximum</strong> (<code>-Xmx</code>) heap sizes, in
absolute values (<code>-Xmx512m</code>). The classic container problem: for a long time,
the JVM <strong>did not see the cgroup</strong> and, when these flags were unset, derived its default heap size from the <strong>host&rsquo;s
full RAM</strong>. You set a 512 MiB pod limit and the JVM thought it had 64 GiB to
play with → OOMKill under the first real load. That was fixed (JDK 8u191+ and 10+ are
container-aware), but fixed <code>-Xms</code>/<code>-Xmx</code> is still <strong>manually coupled</strong>: if
someone changes the container limit and forgets to change <code>-Xmx</code>, the two drift
apart.</p>
<h3><code>-XX:MaxRAMPercentage</code> (the ceiling, cgroup-aware)<span class="hx:absolute hx:-mt-20" id="-xxmaxrampercentage-the-ceiling-cgroup-aware"></span>
    <a href="#-xxmaxrampercentage-the-ceiling-cgroup-aware" class="subheading-anchor" aria-label="Permalink for this section"></a></h3><p>Sets the <strong>maximum</strong> heap as a percentage of <strong>available</strong> memory (which, in a
container, is the cgroup limit). This is the modern, recommended approach: you
change the pod limit and the heap follows. <code>MaxRAMPercentage=75</code> in a 768 MiB
container → 576 MiB max heap.</p>
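<p>A minimal sketch of that arithmetic, plus a way to see which ceiling the running JVM actually picked. The class and method names are illustrative and not part of the PoC; <code>Runtime.maxMemory()</code> is the standard API and should report a value close to the configured ceiling.</p>

```java
// Sketch only: class and method names are illustrative, not from the PoC.
class HeapCeiling {
    static final long MIB = 1024L * 1024L;

    // Heap ceiling implied by MaxRAMPercentage for a given container limit.
    static long expectedMaxHeapMiB(long limitMiB, double maxRamPercentage) {
        return (long) (limitMiB * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        // 768 MiB limit with MaxRAMPercentage=75, as in the text.
        System.out.println("expected: " + expectedMaxHeapMiB(768, 75.0) + " MiB");
        // What this JVM actually chose; compare it with the value above.
        System.out.println("actual:   " + Runtime.getRuntime().maxMemory() / MIB + " MiB");
    }
}
```

<p>Running it with <code>java -XX:MaxRAMPercentage=75 HeapCeiling.java</code> inside a container started with <code>docker run --memory=768m</code> lets you compare the two lines.</p>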
<h3><code>-XX:InitialRAMPercentage</code> (the initial size, cgroup-aware)<span class="hx:absolute hx:-mt-20" id="-xxinitialrampercentage-the-initial-size-cgroup-aware"></span>
    <a href="#-xxinitialrampercentage-the-initial-size-cgroup-aware" class="subheading-anchor" aria-label="Permalink for this section"></a></h3><p>Same idea, but for the <strong>initial</strong> heap size (the percentage equivalent of
<code>-Xms</code>). This is where much of the confusion in this post lives: <strong>setting a high
<code>InitialRAMPercentage</code> does not mean your app needs that much — it means the JVM
will <em>commit</em> that much at boot.</strong></p>
<h3><code>-XX:MinRAMPercentage</code> (the gotcha)<span class="hx:absolute hx:-mt-20" id="-xxminrampercentage-the-gotcha"></span>
    <a href="#-xxminrampercentage-the-gotcha" class="subheading-anchor" aria-label="Permalink for this section"></a></h3><p>This is the JDK&rsquo;s most misleading name. <code>MinRAMPercentage</code> does <strong>not</strong> define
a heap floor, despite the name. It only kicks in when available memory is
<strong>small</strong> (below ~256 MiB by default) and, in that case, sets the max heap to
that percentage. For any container with &ldquo;normal&rdquo; memory, <code>MinRAMPercentage</code> is
<strong>simply ignored</strong> and <code>MaxRAMPercentage</code> wins. I&rsquo;ll prove that with the PoC
later.</p>
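<p>The selection rule can be sketched as follows. This is a simplified model, not HotSpot&rsquo;s actual code: the 256 MiB cutoff is the approximate figure quoted above, and the percentages passed in <code>main</code> are assumed values for illustration.</p>

```java
// Simplified model of how the heap ceiling is picked; not HotSpot's real code.
class MinRamSketch {
    // Approximate cutoff below which MinRAMPercentage applies (assumption from the text).
    static final long SMALL_MEMORY_CUTOFF_MIB = 256;

    static long maxHeapMiB(long limitMiB, double minRamPct, double maxRamPct) {
        // Small container: MinRAMPercentage decides the ceiling.
        // Anything larger: MinRAMPercentage is ignored and MaxRAMPercentage decides.
        double pct = limitMiB < SMALL_MEMORY_CUTOFF_MIB ? minRamPct : maxRamPct;
        return (long) (limitMiB * pct / 100.0);
    }

    public static void main(String[] args) {
        System.out.println(maxHeapMiB(200, 50.0, 75.0) + " MiB"); // small container
        System.out.println(maxHeapMiB(768, 50.0, 75.0) + " MiB"); // normal container
    }
}
```

<p>The point of the model is the branch: the same <code>MinRAMPercentage</code> value changes the result for the 200 MiB container and has no effect on the 768 MiB one.</p>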
<h2>The PoC: what it measures<span class="hx:absolute hx:-mt-20" id="the-poc-what-it-measures"></span>
    <a href="#the-poc-what-it-measures" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>The PoC is a single Java program (<code>MemReport.java</code>) run in Java 21 source-file
mode (no separate compile step), inside an <code>eclipse-temurin:21-jdk</code> image, with a fixed container limit
via <code>docker run --memory</code>. It:</p>
<ul>
<li>retains a controlled <strong>live set</strong> (30 MiB of <code>byte[]</code> that survive GC) —
representing the memory the app actually needs;</li>
<li>generates <strong>churn</strong> (short-lived garbage) to fill eden;</li>
<li>takes two snapshots: <strong>BOOT</strong> (right after startup, before allocating) and
<strong>POST</strong> (after retaining the live set + churn);</li>
<li>forces a <code>System.gc()</code> at the end and measures the <strong>post-GC live set</strong>
(<code>old + survivor</code>);</li>
<li>reads <code>/sys/fs/cgroup/memory.current</code> and <code>memory.max</code> <strong>from inside the
container</strong> to report real RSS and limit.</li>
</ul>
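<p>For reference, reading those two cgroup files next to the heap figures takes only a few lines. This is a sketch written for this explanation, not the PoC&rsquo;s <code>MemReport.java</code>; it assumes the cgroup v2 layout, where <code>memory.max</code> holds either a byte count or the literal <code>max</code> when no limit is set.</p>

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: not the PoC's MemReport.java. Assumes the cgroup v2 file layout.
class CgroupSnapshot {
    static final long MIB = 1024L * 1024L;

    // Parses a cgroup v2 value: a byte count, or "max" for "no limit" (returned as -1).
    static long parseCgroupValue(String raw) {
        String value = raw.trim();
        return value.equals("max") ? -1L : Long.parseLong(value);
    }

    // Reads one cgroup file; returns -1 when it is missing (for example outside a container).
    static long readCgroupBytes(String file) {
        try {
            return parseCgroupValue(Files.readString(Path.of("/sys/fs/cgroup", file)));
        } catch (IOException | NumberFormatException e) {
            return -1L;
        }
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("cgroup current : " + readCgroupBytes("memory.current") + " bytes");
        System.out.println("cgroup max     : " + readCgroupBytes("memory.max") + " bytes");
        System.out.println("heap used      : " + heap.getUsed() / MIB + " MiB");
        System.out.println("heap committed : " + heap.getCommitted() / MIB + " MiB");
        System.out.println("heap max       : " + heap.getMax() / MIB + " MiB");
    }
}
```

<p>Printing the cgroup values and the heap values side by side is what makes the gap between committed heap and touched memory visible.</p>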
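<p>I ran eight scenarios, all retaining the same data (30 MiB of arrays), varying only the JVM flags and the container limit. Seven run to
completion; the eighth is built to be OOMKilled. Here is the full table (values in MiB, except <code>threads</code>, which is a count):</p>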
<table>
	<thead>
			<tr>
					<th>scenario</th>
					<th style="text-align: right">limit</th>
					<th style="text-align: right">heap max (Xmx)</th>
					<th style="text-align: right">committed BOOT</th>
					<th style="text-align: right">RSS BOOT</th>
					<th style="text-align: right">committed POST</th>
					<th style="text-align: right">RSS POST</th>
					<th style="text-align: right">heap used POST</th>
					<th style="text-align: right">non-heap committed</th>
					<th style="text-align: right">threads</th>
					<th style="text-align: right">live set (post-GC)</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td><code>default</code></td>
					<td style="text-align: right">768</td>
					<td style="text-align: right">192</td>
					<td style="text-align: right">27</td>
					<td style="text-align: right">85</td>
					<td style="text-align: right">192</td>
					<td style="text-align: right">175</td>
					<td style="text-align: right">82</td>
					<td style="text-align: right">21</td>
					<td style="text-align: right">6</td>
					<td style="text-align: right">64</td>
			</tr>
			<tr>
					<td><code>init-max-75</code></td>
					<td style="text-align: right">768</td>
					<td style="text-align: right">576</td>
					<td style="text-align: right">576</td>
					<td style="text-align: right">104</td>
					<td style="text-align: right">576</td>
					<td style="text-align: right">239</td>
					<td style="text-align: right">212</td>
					<td style="text-align: right">21</td>
					<td style="text-align: right">6</td>
					<td style="text-align: right">64</td>
			</tr>
			<tr>
					<td><code>init-max-75-pretouch</code></td>
					<td style="text-align: right">768</td>
					<td style="text-align: right">576</td>
					<td style="text-align: right">576</td>
					<td style="text-align: right">645</td>
					<td style="text-align: right">576</td>
					<td style="text-align: right">654</td>
					<td style="text-align: right">212</td>
					<td style="text-align: right">21</td>
					<td style="text-align: right">6</td>
					<td style="text-align: right">64</td>
			</tr>
			<tr>
					<td><code>low-init-max-75</code></td>
					<td style="text-align: right">768</td>
					<td style="text-align: right">576</td>
					<td style="text-align: right">118</td>
					<td style="text-align: right">108</td>
					<td style="text-align: right">576</td>
					<td style="text-align: right">392</td>
					<td style="text-align: right">187</td>
					<td style="text-align: right">21</td>
					<td style="text-align: right">6</td>
					<td style="text-align: right">64</td>
			</tr>
			<tr>
					<td><code>xmx-xms</code></td>
					<td style="text-align: right">768</td>
					<td style="text-align: right">512</td>
					<td style="text-align: right">512</td>
					<td style="text-align: right">114</td>
					<td style="text-align: right">512</td>
					<td style="text-align: right">231</td>
					<td style="text-align: right">160</td>
					<td style="text-align: right">21</td>
					<td style="text-align: right">6</td>
					<td style="text-align: right">64</td>
			</tr>
			<tr>
					<td><code>minram-small</code></td>
					<td style="text-align: right">200</td>
					<td style="text-align: right">100</td>
					<td style="text-align: right">25</td>
					<td style="text-align: right">67</td>
					<td style="text-align: right">100</td>
					<td style="text-align: right">122</td>
					<td style="text-align: right">70</td>
					<td style="text-align: right">21</td>
					<td style="text-align: right">6</td>
					<td style="text-align: right">64</td>
			</tr>
			<tr>
					<td><code>minram-large</code></td>
					<td style="text-align: right">768</td>
					<td style="text-align: right">576</td>
					<td style="text-align: right">21</td>
					<td style="text-align: right">92</td>
					<td style="text-align: right">576</td>
					<td style="text-align: right">379</td>
					<td style="text-align: right">188</td>
					<td style="text-align: right">21</td>
					<td style="text-align: right">6</td>
					<td style="text-align: right">64</td>
			</tr>
			<tr>
					<td><code>oom-xmx-acima-do-limit</code></td>
					<td style="text-align: right">600</td>
					<td style="text-align: right">—</td>
					<td style="text-align: right">—</td>
					<td style="text-align: right">—</td>
					<td style="text-align: right">—</td>
					<td style="text-align: right"><strong>exit 137</strong></td>
					<td style="text-align: right">—</td>
					<td style="text-align: right">—</td>
					<td style="text-align: right">—</td>
					<td style="text-align: right">—</td>
			</tr>
	</tbody>
</table>
<p>I&rsquo;ll unpack each lesson using this table.</p>
<h2>Pitfall #1: <code>InitialRAMPercentage</code> commits, but that&rsquo;s not &ldquo;usage&rdquo;<span class="hx:absolute hx:-mt-20" id="pitfall-1-initialrampercentage-commits-but-thats-not-usage"></span>
    <a href="#pitfall-1-initialrampercentage-commits-but-thats-not-usage" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>Look at the <strong>committed BOOT</strong> column (heap committed at boot, before any
allocation):</p>
<ul>
<li><code>default</code> (default Initial ~1.5%): <strong>27 MiB</strong></li>
<li><code>low-init-max-75</code> (Initial=15%): <strong>118 MiB</strong></li>
<li><code>init-max-75</code> (Initial=75%): <strong>576 MiB</strong></li>
<li><code>xmx-xms</code> (<code>-Xms512m</code>): <strong>512 MiB</strong></li>
</ul>
<p>In other words: <code>InitialRAMPercentage</code>/<code>-Xms</code> controls <strong>how much heap the JVM
reserves and commits at startup</strong>, regardless of what the app needs. With
<code>Initial=75</code>, the JVM commits 576 MiB of heap before the app does anything
useful.</p>
<p>But — and this &ldquo;but&rdquo; is the heart of the problem — <strong>committing is not touching</strong>.
Look at the <strong>RSS BOOT</strong> column: even after committing 576 MiB of heap,
<code>init-max-75</code> boots with RSS of only <strong>104 MiB</strong>, practically the same as
<code>default</code> (85 MiB). The kernel only counts in RSS the pages that were <strong>actually
accessed</strong> (page fault). Committed but untouched heap is reserved address space,
not physical memory.</p>
<p>Here is the boot snapshot from the <code>init-max-75</code> scenario, straight from the PoC
log:</p>
<div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">

<div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">### BOOT (before allocating)
</span></span><span class="line"><span class="cl">  container limit (cgroup) : 768 MiB
</span></span><span class="line"><span class="cl">  container RSS   (cgroup) : 120 MiB
</span></span><span class="line"><span class="cl">  heap max (effective Xmx) : 576 MiB
</span></span><span class="line"><span class="cl">  heap used                : 10 MiB
</span></span><span class="line"><span class="cl">  heap committed           : 576 MiB   &lt;-- committed 75% of limit</span></span></code></pre></div></div>
</div>
<h3>So why did production RSS stay pinned at 75%?<span class="hx:absolute hx:-mt-20" id="so-why-did-production-rss-stay-pinned-at-75"></span>
    <a href="#so-why-did-production-rss-stay-pinned-at-75" class="subheading-anchor" aria-label="Permalink for this section"></a></h3><p>Two reasons, and the PoC shows both.</p>
<p><strong>(a) <code>AlwaysPreTouch</code>.</strong> If the JVM starts with <code>-XX:+AlwaysPreTouch</code> (common in
setups that pin the heap for predictable latency), it <strong>touches every committed
page at boot</strong>. See the <code>init-max-75-pretouch</code> scenario: RSS BOOT jumps from 104
to <strong>645 MiB</strong>. Now RSS reflects <code>committed</code>, not demand.</p>
<p><strong>(b) Fixed heap + real load.</strong> With <code>Initial = Max</code>, the heap never shrinks, and
as the app runs (allocation, GC evacuation), pages get touched until RSS hits
committed and stays there. In production, with continuous traffic for days, that&rsquo;s
exactly what happens: RSS saturates at ~75% and stays.</p>
<p>The outcome is the same either way: <strong><code>working_set</code>/<code>container.memory.usage</code> stop
reflecting real demand and start showing ~75% of the limit for every service.</strong> That&rsquo;s
why <code>dev</code>, with no load, showed the same RSS as production. Sizing <code>request</code> from
that number means sizing from your own <code>MaxRAMPercentage</code>, not from what the app
needs.</p>
<h2>Pitfall #2: &ldquo;real usage&rdquo; is the post-GC live set, and it&rsquo;s invariant<span class="hx:absolute hx:-mt-20" id="pitfall-2-real-usage-is-the-post-gc-live-set-and-its-invariant"></span>
    <a href="#pitfall-2-real-usage-is-the-post-gc-live-set-and-its-invariant" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>If RSS lies, what number doesn&rsquo;t? The <strong>live set</strong>: what remains on the heap
<strong>after</strong> a GC — the objects the app actually holds.</p>
<p>The JVM splits the heap into generations. The identity is exact:</p>
<div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">

<div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">heap_used = eden + survivor + old</span></span></code></pre></div></div>
</div>
<ul>
<li><strong>eden</strong>: where new objects are born. It&rsquo;s <strong>churn</strong> — short-lived garbage the
GC sweeps. It grows and shrinks with available heap.</li>
<li><strong>survivor + old</strong>: what <strong>survived</strong> GC. That&rsquo;s the <strong>live set</strong> — memory the
app actually retains.</li>
</ul>
<p>The proof is in the PoC. In <strong>all</strong> seven scenarios that ran to completion, the
post-GC live set was exactly <strong>64 MiB</strong> (last column), because it&rsquo;s always the
same app retaining the same blocks. (I retained 30 arrays of 1 MiB, but each one
spills into a G1 region and rounds to two → ~60 MiB of humongous objects +
retained classes ≈ 64 MiB; the detail doesn&rsquo;t matter, what matters is that it&rsquo;s
<strong>constant</strong>.) Heap configuration doesn&rsquo;t change what the app needs — only how
much empty space surrounds it.</p>
<p>Now look at the perverse side effect in the <strong>heap used POST</strong> column (peak heap
used during churn):</p>
<ul>
<li><code>default</code> (192 MiB heap): peak of <strong>82 MiB</strong></li>
<li><code>init-max-75</code> (576 MiB heap): peak of <strong>212 MiB</strong></li>
</ul>
<p>Same app, same 64 MiB live set, but the <code>heap_used</code> peak is <strong>about 2.6× larger</strong> just
because the heap is bigger. Why? Bigger heap → GC runs <strong>less often</strong> → more
floating garbage (eden + dead objects not yet collected) accumulates between
collections. That&rsquo;s another reason looking at the <strong>peak</strong> of <code>heap_used</code> (or
RSS, which follows touched heap) overestimates real need. The honest number is
the <strong>post-GC trough</strong>: <code>old + survivor</code>.</p>
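<p>From inside the JVM, that trough can be approximated with the standard memory pool MXBeans. A sketch, assuming a generational collector whose pool names contain <code>Old Gen</code>, <code>Tenured</code> or <code>Survivor</code> (this matches G1, Parallel and Serial naming; other collectors name their pools differently and would need their own match):</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Sketch only: approximates the post-GC live set from standard JMX memory pools.
class LiveSet {
    // Pool-name matching is an assumption that fits G1, Parallel and Serial naming.
    static boolean isLiveSetPool(String poolName) {
        return poolName.contains("Old Gen")
                || poolName.contains("Tenured")
                || poolName.contains("Survivor");
    }

    // Sums usage as of the most recent collection for the old + survivor pools.
    static long liveSetBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage afterGc = pool.getCollectionUsage(); // null when unsupported
            if (afterGc != null && isLiveSetPool(pool.getName())) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        byte[][] retained = new byte[30][1024 * 1024]; // keep 30 arrays of 1 MiB reachable
        System.gc(); // a request, not a guarantee; the PoC forces a GC the same way
        System.out.println("live set: " + liveSetBytes() / (1024 * 1024) + " MiB");
        System.out.println("retained arrays: " + retained.length);
    }
}
```

<p>Because it reads usage as of the last collection instead of current usage, eden churn between collections does not inflate the result.</p>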
<p>The metrics that survive the distortion are:</p>
<ul>
<li><code>jvm.gc.old_gen_size</code> + <code>jvm.gc.survivor_size</code> → live set (retained heap);</li>
<li><code>jvm.non_heap_memory</code> (Metaspace, Code Cache, Compressed Class) → off-heap,
grows on demand;</li>
<li><code>jvm.buffer_pool.direct.used</code> (DirectByteBuffer) and <code>jvm.thread_count</code> (≈ 1 MiB
per thread stack) → <strong>native</strong> memory, off-heap, but counted in RSS.</li>
</ul>
<p>And the ones you should <strong>stop using</strong> for sizing while <code>Initial</code> is high:
<code>container.memory.usage</code>, <code>working_set</code>, and <code>jvm.heap_memory_committed</code> — all
inflated.</p>
<h2>Pitfall #3: the one that hurts — OOMKill 137<span class="hx:absolute hx:-mt-20" id="pitfall-3-the-one-that-hurts--oomkill-137"></span>
    <a href="#pitfall-3-the-one-that-hurts--oomkill-137" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>This is the part that cost us. With &ldquo;real usage&rdquo; in hand, the first attempt was
the intuitive formula:</p>
<div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">

<div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">request = limit = real_live_usage × 1.2</span></span></code></pre></div></div>
</div>
<p>Take <code>old + survivor + non_heap_used + direct + threads</code>, multiply by 1.2 for
headroom, and cut the limit. Looked great on the dashboard: ~50% savings.</p>
<p>Result in <code>dev</code>/<code>qa</code>: a wave of applications getting <strong>OOMKilled (exit 137)</strong>.</p>
<p>The root cause had two parts, both ignored by the naive formula:</p>
<ol>
<li><strong><code>non_heap_committed</code>, not <code>non_heap_used</code>.</strong> Metaspace and Code Cache reserve
(commit) blocks slightly above what they use and almost never give them back.
It&rsquo;s <code>committed</code> that counts in RSS and triggers OOM, not <code>used</code>. The gap
between the two is small, but it&rsquo;s worth using <code>non_heap_committed</code> out of
conservatism.</li>
<li><strong>Invisible native overhead.</strong> GC and JIT internal structures, page cache, and
especially <strong>APM/monitoring agents</strong> (Datadog Agent, New Relic, AppDynamics,
Elastic APM…) — none of this shows up in <code>jvm.*</code> metrics, but all of it
occupies RSS. Reconciling against real RSS in that fleet, this native residue
was <strong>66–254 MiB</strong> (average ~130 MiB). The <strong>150 MiB</strong> constant was the value
that worked for that set of services; the right number for your fleet depends
on what runs inside the container. How to calibrate is described in the
formula section below.</li>
</ol>
<p>Add the two together and you can see why limits cut with the &ldquo;live × 1.2&rdquo; formula
landed below the JVM&rsquo;s <strong>physical floor</strong> and died.</p>
<p>The PoC reproduces exit 137 deterministically in the <code>oom-xmx-acima-do-limit</code>
scenario: a 600 MiB container with <code>-Xms700m -Xmx700m -XX:+AlwaysPreTouch</code>. The
configured heap (700 MiB) doesn&rsquo;t fit in the container (600 MiB), and
<code>AlwaysPreTouch</code> tries to touch everything at boot:</p>
<div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">

<div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">&gt;&gt; scenario: oom-xmx-acima-do-limit  (--memory=600m)  flags: -Xms700m -Xmx700m -XX:+AlwaysPreTouch
</span></span><span class="line"><span class="cl">   [!] container exited with code 137 (137 = OOMKill)</span></span></code></pre></div></div>
</div>
<p>Same mechanism, more explicit: when <code>heap_max + non_heap_committed + native</code>
exceeds <code>limit</code> and those pages actually get touched, the kernel kills the process. In production it happened quietly
because nobody was adding non-heap and native to the math.</p>
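<p>That condition is easy to encode as a pre-flight check before a limit is rolled out. A sketch with illustrative names; the native figure is whatever you measured for your own fleet, and the 150 MiB used in the second call is only the constant mentioned earlier:</p>

```java
// Sketch only: names are illustrative. All values in MiB.
class LimitCheck {
    // Worst-case footprint once the whole heap has been touched.
    static long worstCaseMiB(long heapMaxMiB, long nonHeapCommittedMiB, long nativeMiB) {
        return heapMaxMiB + nonHeapCommittedMiB + nativeMiB;
    }

    static boolean fitsInLimit(long heapMaxMiB, long nonHeapCommittedMiB,
                               long nativeMiB, long limitMiB) {
        return worstCaseMiB(heapMaxMiB, nonHeapCommittedMiB, nativeMiB) <= limitMiB;
    }

    public static void main(String[] args) {
        // The OOM scenario: 700 MiB heap in a 600 MiB container.
        // 21 MiB non-heap comes from the PoC table; native is left at 0 to be generous.
        System.out.println("oom scenario fits: " + fitsInLimit(700, 21, 0, 600));
        // 576 MiB heap, 21 MiB non-heap, 768 MiB limit, 150 MiB native assumed.
        System.out.println("init-max-75 fits:  " + fitsInLimit(576, 21, 150, 768));
    }
}
```

<p>The first case fails on the heap term alone, which is why the PoC scenario dies at boot once <code>AlwaysPreTouch</code> touches every page.</p>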
<h2>The formula that survived<span class="hx:absolute hx:-mt-20" id="the-formula-that-survived"></span>
    <a href="#the-formula-that-survived" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>After the 137s, the sizing formula became this (<code>request = limit</code> policy, i.e.
<strong>Guaranteed</strong> QoS — the JVM tends to grow to the ceiling, so <code>request &lt; limit</code>
doesn&rsquo;t help):</p>
<div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">

<div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">limit = live_heap_pico / occ
</span></span><span class="line"><span class="cl">      + non_heap_committed
</span></span><span class="line"><span class="cl">      + direct.used
</span></span><span class="line"><span class="cl">      + thread_count × 1 MiB
</span></span><span class="line"><span class="cl">      + N MiB   (native residue: calibrated per fleet, see below)</span></span></code></pre></div></div>
</div>
<p>The terms, mapped to metrics that survive the distortion:</p>
<table>
	<thead>
			<tr>
					<th>term</th>
					<th>metric</th>
					<th>role</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td><code>live_heap_pico</code></td>
					<td><code>max(old_gen_size + survivor_size)</code></td>
					<td>peak heap the app retains (<em>pico</em> is Portuguese for peak)</td>
			</tr>
			<tr>
					<td><code>occ</code></td>
					<td>—</td>
					<td>target heap occupancy (live as fraction of heap)</td>
			</tr>
			<tr>
					<td><code>non_heap_committed</code></td>
					<td><code>max(non_heap_memory_committed)</code></td>
					<td>Metaspace + Code Cache (reserved)</td>
			</tr>
			<tr>
					<td><code>direct.used</code></td>
					<td><code>max(buffer_pool.direct.used)</code></td>
					<td>DirectByteBuffer (native)</td>
			</tr>
			<tr>
					<td><code>thread_count × 1 MiB</code></td>
					<td><code>max(thread_count)</code></td>
					<td>thread stacks (native)</td>
			</tr>
			<tr>
					<td><code>N MiB</code></td>
					<td>calibrated constant</td>
					<td>native overhead with no direct metric</td>
			</tr>
	</tbody>
</table>
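<p>The same formula as a small function. The occupancy of 0.5 and the inputs in <code>main</code> are placeholders chosen for illustration (the PoC table&rsquo;s 64 MiB live set, 21 MiB non-heap and 6 threads, plus a 150 MiB residue); they are not a recommendation.</p>

```java
// Sketch only: a direct transcription of the sizing formula. All sizes in MiB.
class LimitFormula {
    static long limitMiB(double liveHeapPeak, double occ, double nonHeapCommitted,
                         double directUsed, int threadCount, double nativeResidue) {
        if (occ <= 0.0 || occ > 1.0) {
            throw new IllegalArgumentException("occ must be in (0, 1]");
        }
        double heap = liveHeapPeak / occ;   // heap sized so live data fills `occ` of it
        double threads = threadCount * 1.0; // about 1 MiB of stack per thread
        return (long) Math.ceil(heap + nonHeapCommitted + directUsed + threads + nativeResidue);
    }

    public static void main(String[] args) {
        // Placeholder inputs: 64 MiB live set, occ = 0.5, 21 MiB non-heap,
        // no direct buffers, 6 threads, N = 150 MiB.
        System.out.println(limitMiB(64, 0.5, 21, 0, 6, 150) + " MiB");
    }
}
```

<p>Feeding it the per-service maxima from the table above, rather than averages, keeps the result on the conservative side.</p>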
<h3>How to calibrate native residue (<code>N</code>)<span class="hx:absolute hx:-mt-20" id="how-to-calibrate-native-residue-n"></span>
    <a href="#how-to-calibrate-native-residue-n" class="subheading-anchor" aria-label="Permalink for this section"></a></h3><p><code>N</code> has no dedicated JVM metric because it lives outside managed heap and
non-heap. In practice, it&rsquo;s the gap between measured RSS and everything you can
sum directly:</p>
<div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">

<div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">N  ≈  RSS_stable  −  heap_committed  −  non_heap_committed  −  direct.used  −  (threads × 1 MiB)</span></span></code></pre></div></div>
</div>
<p><strong>Stable RSS</strong> is <code>container.memory.usage</code> (or cgroup <code>memory.current</code>) read when
the app is warmed up and under representative load, but <strong>without</strong> high
<code>InitialRAMPercentage</code> or <code>AlwaysPreTouch</code>. Otherwise <code>heap_committed</code> can include
pages that were never touched and so never reached RSS, and the calculated <code>N</code> comes out skewed (too low, or even negative). Use a low
<code>InitialRAMPercentage</code> config (e.g. 25%) for this measurement.</p>
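<p>As a minimal sketch of that subtraction, assuming all inputs are already in MiB and were sampled at the same moment. The function name and the sample readings are hypothetical, chosen only to show the arithmetic:</p>

```python
def native_residue_mib(rss_stable, heap_committed, non_heap_committed,
                       direct_used, threads, stack_mib=1.0):
    """Estimate N (MiB): the part of stable RSS that no JVM metric explains."""
    explained = (heap_committed + non_heap_committed + direct_used
                 + threads * stack_mib)
    # Clamp at zero: a negative gap usually means RSS was sampled while part
    # of the committed heap was still untouched, so repeat the measurement.
    return max(0.0, rss_stable - explained)


# Hypothetical readings in MiB, not taken from the fleet in this post.
n = native_residue_mib(rss_stable=640, heap_committed=256,
                       non_heap_committed=180, direct_used=16, threads=60)
print(f"N = {n:.0f} MiB")
```

<p>Run it over several samples per service and keep the high end of what you see, since underestimating <code>N</code> is the expensive mistake.</p>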
<p>The main components of this residue:</p>
<ul>
<li><strong>APM agent</strong> (Datadog Agent, New Relic, AppDynamics, Elastic APM…): the Java
agent is loaded via <code>-javaagent</code> and allocates its own native memory, from 30 to
100+ MiB depending on the agent and instrumentation level.</li>
<li><strong>Metrics exporter</strong> (Prometheus JMX Exporter, Micrometer…): smaller impact,
but not zero.</li>
<li><strong>Native thread stacks</strong> beyond the nominal <code>1 MiB</code>: each thread&rsquo;s real stack
(default <code>-Xss1m</code> on Linux) plus associated kernel structures.</li>
<li><strong>GC internal overhead</strong>: G1GC maintains card tables, remembered sets, and
marking bitmaps that scale with heap size (typically 1–5% of max heap).</li>
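<li><strong>Kernel page cache and I/O buffers</strong>: <code>mmap</code>&rsquo;d files and network buffers. The
kernel charges them to the container&rsquo;s cgroup, so they show up in <code>container.memory.usage</code> even though no JVM metric reports them.</li>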
</ul>
<p>In the fleet that originated this post, <code>N = 150 MiB</code> covered most services well
(actual range: 66–254 MiB). If you run a heavy APM agent or have many threads,
<strong>measure and adjust</strong>; 150 MiB is a starting point, not a universal constant.
The worst case is underestimating: you&rsquo;ll see OOMKill. The second worst is
overestimating a lot: you waste memory but the app survives.</p>
<p>With JVM metrics collected, you can build a dashboard that applies this formula
automatically and shows the recommended <code>request</code>/<code>limit</code> per service:</p>
<p><img src="/2026/06/09/jvm-memoria-em-containers/example-dashboard-jvm-recomendation.png" alt="Dashboard with JVM metrics and calculated request/limit recommendation"  loading="lazy" /></p>
<h3>Why divide by <code>occ</code>?<span class="hx:absolute hx:-mt-20" id="why-divide-by-occ"></span>
    <a href="#why-divide-by-occ" class="subheading-anchor" aria-label="Permalink for this section"></a></h3><p>The heap must be <strong>larger</strong> than the live set to fit eden allocation between GCs,
G1 evacuation working space, and spikes. Rule of thumb: the live set shouldn&rsquo;t exceed
~70% of the heap, or the GC starts thrashing (chained full GCs → high CPU → <code>OutOfMemoryError</code>, e.g. <em>GC
overhead limit exceeded</em> on collectors that enforce it).</p>
<ul>
<li><code>occ = 0.70</code> (aggressive): <code>heap = live × 1.43</code>. Saves more memory, more
frequent GC.</li>
<li><code>occ = 0.60</code> (recommended): <code>heap = live × 1.67</code>. More headroom, less GC, a
bit more memory.</li>
</ul>
<p>The central trade-off is <strong>memory ↔ CPU/safety</strong>. We settled on <code>0.60</code> as the
default.</p>
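<p>The multipliers above are just <code>1 / occ</code>. A small sketch makes the trade-off concrete; the 300 MiB live set is a hypothetical value, not a measurement from the fleet:</p>

```python
def target_heap_mib(live_set_mib, occ):
    """Heap size (MiB) that keeps the post-GC live set at `occ` of the heap."""
    if not (0.0 < occ <= 1.0):
        raise ValueError("occ must be in (0, 1]")
    return live_set_mib / occ


live = 300  # hypothetical post-GC live set, in MiB
for occ in (0.70, 0.60):
    heap = target_heap_mib(live, occ)
    print(f"occ={occ:.2f}: heap = {heap:.0f} MiB (x{1 / occ:.2f})")
```

<p>For this live set, moving from <code>0.70</code> to <code>0.60</code> costs roughly 71 MiB of extra heap per pod, which is the price of the additional GC headroom.</p>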
<h3>Two equivalent ways to apply it<span class="hx:absolute hx:-mt-20" id="two-equivalent-ways-to-apply-it"></span>
    <a href="#two-equivalent-ways-to-apply-it" class="subheading-anchor" aria-label="Permalink for this section"></a></h3><p>The target heap (<code>live/occ</code>) is the same; only how you express it changes:</p>
<p><strong>(A) Percentage:</strong></p>
<div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">

<div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">-XX:InitialRAMPercentage=&lt;P&gt; -XX:MaxRAMPercentage=&lt;P&gt;
</span></span><span class="line"><span class="cl">where P = (live/occ) / limit × 100</span></span></code></pre></div></div>
</div>
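<p>The heap ceiling follows the limit and, for any <code>P</code> below 100, the heap alone <strong>never exceeds it</strong>.
<code>MaxRAMPercentage</code> gives you that guardrail for free, although it says nothing about non-heap and native memory.</p>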
<p><strong>(B) Explicit:</strong></p>
<div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">

<div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">-Xms&lt;live/occ&gt; -Xmx&lt;live/occ&gt;</span></span></code></pre></div></div>
</div>
<p>Direct, but <strong>decoupled from the limit</strong>. Requires a guardrail in Helm/CI
ensuring <code>-Xmx + non_heap + native ≤ limit</code>, or you hit the PoC&rsquo;s exit 137
scenario.</p>
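<p>One way such a guardrail could look, as a sketch only: a check that a CI job or chart test runs before deploy. The function name and all numbers are hypothetical, and values are assumed to be in MiB:</p>

```python
def xmx_fits_limit(xmx_mib, non_heap_mib, native_mib, limit_mib):
    """True when -Xmx plus non-heap plus native memory fits within the limit."""
    return xmx_mib + non_heap_mib + native_mib <= limit_mib


# Hypothetical service: -Xmx512m, 227 MiB non-heap, 150 MiB native residue.
for limit in (768, 1024):
    ok = xmx_fits_limit(512, 227, 150, limit)
    print(f"limit={limit} MiB: {'ok' if ok else 'would risk exit 137'}")
```

<p>A real pipeline would fail the build when the check returns <code>False</code> instead of printing.</p>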
<p>In real microservice apps, that <code>P</code> landed in the <strong>26–45%</strong> range — far from the
standardized 75%. That was the waste.</p>
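<p>To see how <code>P</code> ends up in that range, here is the option (A) arithmetic as a short sketch. The live set and limit are hypothetical values in MiB:</p>

```python
def ram_percentage(live_set_mib, occ, limit_mib):
    """P = (live / occ) / limit * 100, the value for Initial/MaxRAMPercentage."""
    return (live_set_mib / occ) / limit_mib * 100


# Hypothetical service: 240 MiB live set, occ = 0.60, 1024 MiB limit.
p = ram_percentage(live_set_mib=240, occ=0.60, limit_mib=1024)
print(f"-XX:InitialRAMPercentage={p:.1f} -XX:MaxRAMPercentage={p:.1f}")
```

<p>With these inputs the target heap is 400 MiB, so <code>P</code> comes out near 39, well under 75.</p>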
<h2>Bonus pitfall: high <code>MaxRAMPercentage</code> embeds OOM risk<span class="hx:absolute hx:-mt-20" id="bonus-pitfall-high-maxrampercentage-embeds-oom-risk"></span>
    <a href="#bonus-pitfall-high-maxrampercentage-embeds-oom-risk" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>There&rsquo;s a dangerous structural detail. Because <code>MaxRAMPercentage</code> ties <code>heap_max</code>
to the limit, in small apps the JVM&rsquo;s worst-case footprint (full heap plus non-heap) can already exceed the limit:</p>
<div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">

<div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">heap_max (75% of 768) = 576 MiB
</span></span><span class="line"><span class="cl">+ non_heap_committed   = 227 MiB   (real case)
</span></span><span class="line"><span class="cl">= 803 MiB  &gt;  limit of 768 MiB</span></span></code></pre></div></div>
</div>
<p>And that&rsquo;s <strong>before</strong> counting native overhead. With <code>MaxRAMPercentage=75</code>, a
non-heap-heavy app is born with a theoretical ceiling above the limit. It works
as long as the heap doesn&rsquo;t fill, but it&rsquo;s a time bomb. The immediate mitigation in
production was lowering <code>InitialRAMPercentage</code>/<code>MaxRAMPercentage</code> from 75 to 65 on the tightest apps, then
applying the formula per service.</p>
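<p>The same check can be scripted across a fleet. This sketch reuses the 768 MiB / 227 MiB case from above; the function name is hypothetical:</p>

```python
def headroom_after_heap_and_non_heap(limit_mib, max_ram_percentage,
                                     non_heap_committed_mib):
    """MiB left under the limit once heap_max and non-heap are accounted for.

    A negative result means the theoretical ceiling is already above the
    limit, before any native overhead is added.
    """
    heap_max = limit_mib * max_ram_percentage / 100
    return limit_mib - (heap_max + non_heap_committed_mib)


for pct in (75, 65):
    left = headroom_after_heap_and_non_heap(768, pct, 227)
    print(f"MaxRAMPercentage={pct}: {left:+.1f} MiB left for native memory")
```

<p>At 65 the app gets roughly 42 MiB back, which is still less than the lowest native residue measured in the fleet (66 MiB). That is why the change was a stopgap and the per-service formula was the actual fix.</p>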
<h2>The <code>MinRAMPercentage</code> gotcha, proven<span class="hx:absolute hx:-mt-20" id="the-minrampercentage-gotcha-proven"></span>
    <a href="#the-minrampercentage-gotcha-proven" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>Back to the misleading name. Compare the two PoC scenarios, both with
<code>-XX:MinRAMPercentage=50 -XX:MaxRAMPercentage=75</code>:</p>
<ul>
<li><code>minram-small</code> (<strong>200 MiB</strong> container): heap max = <strong>100 MiB</strong> = 50% of 200
→ <strong><code>MinRAMPercentage</code></strong> governed.</li>
<li><code>minram-large</code> (<strong>768 MiB</strong> container): heap max = <strong>576 MiB</strong> = 75% of 768
→ <code>MinRAMPercentage</code> was <strong>ignored</strong>, <code>MaxRAMPercentage</code> governed.</li>
</ul>
<p>The rule: below ~256 MiB of available memory, <code>MinRAMPercentage</code> sets the
ceiling; above that, it does nothing. In practice, for 99% of app containers,
<strong>setting <code>MinRAMPercentage</code> has no effect</strong> — and it&rsquo;s a recurring source of
confusion. If you want to control heap, the lever is <code>MaxRAMPercentage</code>.</p>
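<p>The rule can be written down as a toy model. This is a simplification of the behavior described in this post, not HotSpot&rsquo;s actual ergonomics code, and the 256 MiB threshold is the approximate figure quoted above:</p>

```python
def modeled_heap_max_mib(container_mib, min_ram_pct, max_ram_pct,
                         small_threshold_mib=256):
    """Simplified model: which percentage flag decides the heap ceiling."""
    if container_mib < small_threshold_mib:
        pct = min_ram_pct   # tiny container: MinRAMPercentage governs
    else:
        pct = max_ram_pct   # everything else: MaxRAMPercentage governs
    return container_mib * pct / 100


for name, size in (("minram-small", 200), ("minram-large", 768)):
    heap = modeled_heap_max_mib(size, min_ram_pct=50, max_ram_pct=75)
    print(f"{name}: {size} MiB container, heap max = {heap:.0f} MiB")
```

<p>The model reproduces both PoC results (100 MiB and 576 MiB), which is all it is meant to do. For an authoritative answer on a given JVM build, check the effective heap size the JVM reports at startup.</p>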
<h2>Actionable best-practices checklist<span class="hx:absolute hx:-mt-20" id="actionable-best-practices-checklist"></span>
    <a href="#actionable-best-practices-checklist" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>What we learned, summarized:</p>
<ol>
<li><strong>Don&rsquo;t size JVM memory from <code>container.memory.usage</code>/<code>working_set</code></strong> if you
use high <code>InitialRAMPercentage</code> or <code>AlwaysPreTouch</code>. Those numbers mark
~<code>MaxRAMPercentage</code> of the limit, not demand.</li>
<li><strong>Measure real usage from the post-GC live set</strong>: <code>old_gen_size + survivor_size</code>. Eden is churn, don&rsquo;t add it.</li>
<li><strong>Don&rsquo;t forget non-heap and native.</strong> <code>non_heap_committed</code> + thread stacks +
direct buffers + native residue (<code>N</code>). Ignoring them caused the OOMKill 137s.
<code>N</code> depends on what runs in the container, so calibrate it by measuring
<code>RSS_stable − heap_committed − non_heap_committed − direct.used − threads × 1 MiB</code>.
150 MiB worked for that fleet; yours may differ.</li>
<li><strong><code>request = limit</code></strong> for JVM memory, because the heap grows to its ceiling. (The pod only gets the <code>Guaranteed</code> QoS class if CPU requests equal limits too.)</li>
<li><strong><code>MaxRAMPercentage</code> vs fixed <code>-Xmx</code></strong>: <code>MaxRAMPercentage</code> follows the limit
and gives you a guardrail. If you use <code>-Xmx</code>, add an explicit guardrail in
CI/Helm.</li>
<li><strong><code>MinRAMPercentage</code> is almost never what you want.</strong> It only acts in tiny
containers (&lt;256 MiB).</li>
<li><strong>Watch high <code>MaxRAMPercentage</code> on small apps</strong>: <code>heap_max + non_heap_committed</code> may already exceed the limit.</li>
</ol>
<h2>Appendix: running the PoC<span class="hx:absolute hx:-mt-20" id="appendix-running-the-poc"></span>
    <a href="#appendix-running-the-poc" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>Everything is in <a href="https://github.com/LucasBG0/poc-jvm-memory-containers"target="_blank" rel="noopener"><code>poc</code></a>.
Prerequisite: Docker with cgroup v2 — runs on the <code>eclipse-temurin:21-jdk</code> image.</p>
<div class="hextra-code-block hx:relative hx:mt-6 hx:first:mt-0 hx:group/code">

<div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">git clone git@github.com:LucasBG0/poc-jvm-memory-containers.git
</span></span><span class="line"><span class="cl"><span class="nb">cd</span> poc-jvm-memory-containers
</span></span><span class="line"><span class="cl">./run.sh</span></span></code></pre></div></div>
</div>
<p>The script builds the image, runs the scenarios listed below (most under the same
container limit; <code>minram-small</code> uses a 200 MiB container), and generates <code>results.md</code> (the table in this post) plus one <code>logs/&lt;scenario&gt;.log</code>
per scenario with two memory snapshots per run. Scenarios cover:</p>
<ul>
<li><code>default</code>, <code>init-max-75</code>, <code>init-max-75-pretouch</code>, <code>low-init-max-75</code> — effect of
<code>Initial</code>/<code>Max</code> on committed, RSS, and heap peak;</li>
<li><code>xmx-xms</code> — explicit form, equivalent to percentage;</li>
<li><code>minram-small</code> / <code>minram-large</code> — the <code>MinRAMPercentage</code> gotcha;</li>
<li><code>oom-xmx-acima-do-limit</code> — deterministic exit 137.</li>
</ul>
<p>Numbers vary slightly between runs (RSS is instantaneous and oscillates with GC),
but the deterministic signals — heap committed at boot, heap ceiling, RSS with
<code>AlwaysPreTouch</code>, post-GC live set — are stable and tell the whole story.</p>
]]></content:encoded><category>java</category><category>jvm</category><category>kubernetes</category><category>oomkill</category><category>maxrampercentage</category><category>docker</category><category>g1gc</category><category>heap</category><category>cgroups</category><category>sre</category><category>performance</category></item><item><title>About</title><link>https://lucasbg0.com/en/about/</link><guid isPermaLink="true">https://lucasbg0.com/en/about/</guid><pubDate>Mon, 01 Jan 0001 00:00:00 GMT</pubDate><description>&lt;p&gt;My name is Lucas Barbosa Gomes, and I&amp;rsquo;m a DevOps/SRE/Platform Engineer — or whatever the next job title they come up with turns out to be, lol. I have nearly a decade of professional experience, specializing in cloud-native architectures and Kubernetes. I currently work as a Senior DevOps Engineer, focused on high availability, infrastructure automation, DevSecOps practices, and building scalable solutions to make developers&amp;rsquo; lives easier. I hold the &lt;strong&gt;CKA (Certified Kubernetes Administrator)&lt;/strong&gt;, &lt;strong&gt;Azure Administrator Associate (AZ-104)&lt;/strong&gt;, and &lt;strong&gt;Azure DevOps Engineer Expert (AZ-400)&lt;/strong&gt; certifications.&lt;/p&gt;</description><content:encoded><![CDATA[<p>My name is Lucas Barbosa Gomes, and I&rsquo;m a DevOps/SRE/Platform Engineer — or whatever the next job title they come up with turns out to be, lol. I have nearly a decade of professional experience, specializing in cloud-native architectures and Kubernetes. I currently work as a Senior DevOps Engineer, focused on high availability, infrastructure automation, DevSecOps practices, and building scalable solutions to make developers&rsquo; lives easier. I hold the <strong>CKA (Certified Kubernetes Administrator)</strong>, <strong>Azure Administrator Associate (AZ-104)</strong>, and <strong>Azure DevOps Engineer Expert (AZ-400)</strong> certifications.</p>
<h2>My journey<span class="hx:absolute hx:-mt-20" id="my-journey"></span>
    <a href="#my-journey" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>My path into tech started with technical courses — IT, computer assembly and maintenance, and later computer graphics. What pulled me in at first was the dream of making games (I was a hardcore gamer), but what really stuck was the curiosity to understand how things work under the hood.</p>
<p>I started my career while still in college, as an intern at <strong>a news portal</strong>, assembling newsletters in a fairly manual, repetitive job. That&rsquo;s where I discovered what I actually enjoy: instead of repeating manual tasks, I started automating them, building tools, and provisioning environments. I spent 4 years there and learned a lot about development on a real product — it&rsquo;s where I first worked with WordPress and where I built, the hard way, the habit of learning on my own. I eventually became responsible for the IT department, and that autonomy let me focus on exactly what interested me most: development, automation, and infrastructure.</p>
<p>At <strong>a WordPress-focused agency</strong>, my growth in DevOps practices accelerated a lot. That&rsquo;s where I felt firsthand how important scalable solutions are, and the real cost of duplicated pipelines and repetitive processes. I deepened my knowledge of cloud, IaC tools like Ansible, and shell scripting, working alongside more developers and seeing operations at scale. There, I helped clients running that CMS reach a high level of security and performance.</p>
<p>Then I joined <strong>a tech consultancy</strong>, where I deepened my work with Kubernetes and platform engineering in production — on-call rotation, incidents, and controlled changes — with responsibility for the GitLab CI/CD platform and production Kubernetes clusters (EKS and AKS).</p>
<p>I&rsquo;ve also participated in the technical leadership of major migrations — from on-premises environments to cloud and from Kubernetes clusters to Azure — while evolving shared pipeline libraries, operational automation, and infrastructure as code with Terraform, Helm, ArgoCD, and Ansible.</p>
<h2>About this blog<span class="hx:absolute hx:-mt-20" id="about-this-blog"></span>
    <a href="#about-this-blog" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>This blog is a space to document real experiences — what worked, what didn&rsquo;t, and why. I write about DevOps, Kubernetes, Linux, cloud, automation, self-hosting, and everything that orbits the infrastructure and platform engineering world.</p>
<p>Most of what I&rsquo;ve learned came less from courses and more from broken production environments, late-night incidents, outdated documentation, and hours staring at logs in a terminal. Since I learned to study on my own early on, I know how much it helps to find an honest account from someone who has already been through the problem. That&rsquo;s what I try to bring here: real context, not Hello World tutorials.</p>
<p>I don&rsquo;t write as a guru or as a tool evangelist. The goal is to share what I&rsquo;ve learned — including the mistakes.</p>
<h2>Where to find me<span class="hx:absolute hx:-mt-20" id="where-to-find-me"></span>
    <a href="#where-to-find-me" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><ul>
<li><a href="https://github.com/LucasBG0"target="_blank" rel="noopener">GitHub</a></li>
<li><a href="https://www.linkedin.com/in/lucasbg0/"target="_blank" rel="noopener">LinkedIn</a></li>
</ul>
<p>If anything here helps you in some way, then this blog has already done its job.</p>
]]></content:encoded></item></channel></rss>