Michael DiLeo on GoToSocial<p>This weekend I did something very funny and disastrous in the setup of my <a href="https://gotosocial.michaeldileo.org/tags/talos" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>talos</span></a> <a href="https://gotosocial.michaeldileo.org/tags/kubernetes" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>kubernetes</span></a> cluster. I got my first node up and running with various services and saw that I was using about 5GB of RAM just for infrastructure stuff - <a href="https://gotosocial.michaeldileo.org/tags/longhorn" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>longhorn</span></a>, <a href="https://gotosocial.michaeldileo.org/tags/openobserve" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>openobserve</span></a>, etc. So I decided to add another node with my <a href="https://gotosocial.michaeldileo.org/tags/netcup" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>netcup</span></a> provider and add a VLAN, which isn't something they advertise well.</p><p>Anyway, I purchased an identical VPS (10 ARM vCPUs, 16GB RAM, 512GB storage), copied the machine config, patched the names, and applied it to the new VPS after installing talos. It came online fine and joined the cluster. Then I wanted to add the VLAN, so I attached it to the VMs and restarted n1 first (I think - I kinda forget the order). It turned out I didn't quite have the right networking configuration for the VLAN interface. Despite configuring <code>dhcp: false</code>, talos was trying to get <a href="https://gotosocial.michaeldileo.org/tags/dhcp" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>dhcp</span></a> on the new interface and failing, which kept apid from starting, so I couldn't access the node. I was totally locked out. Eventually the same thing happened to n1, and on top of that, restarting the node to apply the VLAN interface cost the cluster quorum because, guess what? 50% is not >50%. Whoops.</p><p>So the cluster was down and I was totally locked out. With the way the interfaces work, I wound up wiping the disks and reinstalling talos on n2 until I could find the right magic.</p><p>I found a solution, but I noticed that <code>external-dns</code> was trying to use the internal IP and kubelet didn't know about the external IP. I got around that by using explicit IP addresses in the external-dns annotations for now (there's a sketch of that at the end of this post), and also setting the kubelet <code>node-ip</code> in the configs. Here's the final version. Notice that <code>eth0</code> no longer works; I had to use <code>enp7s0</code>.</p><p>Networking config:</p><pre><code>machine:
  network:
    hostname: n2
    interfaces:
      - dhcp: true
        interface: enp7s0
        addresses:
          - <my external node ip>/22 # /22 is how it's reported in netcup
      - dhcp: false
        interface: enp9s0
        addresses:
          - 10.132.0.20/24
</code></pre><pre><code>machine:
  kubelet:
    extraArgs:
      node-ip: "<my external node ip>"
</code></pre><p><a href="https://gotosocial.michaeldileo.org/tags/selfhosting" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>selfhosting</span></a></p>
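<p>For the curious, the external-dns workaround looks roughly like this - a sketch only, with a made-up app name, host, and ingress class (and the usual placeholder for my external IP). The <code>target</code> annotation is what pins the DNS record to the external address instead of whatever internal IP external-dns would otherwise discover:</p><pre><code># Sketch - name, host, and ingress class are placeholders, not my real values.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    # Force the created record to point at the node's external address.
    external-dns.alpha.kubernetes.io/target: "<my external node ip>"
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.org
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
</code></pre>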
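<p>There may also be a cleaner alternative to the <code>node-ip</code> kubelet arg that I haven't settled on yet: Talos has a <code>machine.kubelet.nodeIP</code> field that selects the kubelet's node IP by subnet. Treat this as a sketch rather than something I've verified on this cluster:</p><pre><code>machine:
  kubelet:
    # Sketch: pick the kubelet node IP by subnet instead of hardcoding it.
    # The subnet below is a placeholder for the external /22 that netcup reports.
    nodeIP:
      validSubnets:
        - <my external subnet>/22
</code></pre>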