Kubernetes · Tag
Active · Public

Details

Description

A tag for anything related to Kubernetes. For the discussion see T147187: Create a tag for #kubernetes.

Recent Activity

Today

kamila created T377857: Cookbook to roll-reimage k8s nodes.
Tue, Oct 22, 2:56 PM · Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host wikikube-worker2089.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2089 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202410221237_jayme_1935895_wikikube-worker2089.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Tue, Oct 22, 12:55 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host wikikube-worker2086.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2086 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202410221234_jayme_1935371_wikikube-worker2086.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Tue, Oct 22, 12:53 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host wikikube-worker2085.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2085 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202410221231_jayme_1935361_wikikube-worker2085.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Tue, Oct 22, 12:50 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host wikikube-worker2088.codfw.wmnet with OS bookworm completed:

  • wikikube-worker2088 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202410221227_jayme_1935630_wikikube-worker2088.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Tue, Oct 22, 12:45 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host wikikube-worker2089.codfw.wmnet with OS bookworm

Tue, Oct 22, 12:10 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host wikikube-worker2088.codfw.wmnet with OS bookworm

Tue, Oct 22, 12:09 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host wikikube-worker2086.codfw.wmnet with OS bookworm

Tue, Oct 22, 12:08 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host wikikube-worker2085.codfw.wmnet with OS bookworm

Tue, Oct 22, 12:08 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
gerritbot added a comment to T362408: Migration to containerd and away from docker.

Change #1081910 merged by JMeybohm:

[operations/puppet@production] Migrate wikikube-worker208[5689] to containerd

https://gerrit.wikimedia.org/r/1081910

Tue, Oct 22, 12:01 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
gerritbot added a comment to T362408: Migration to containerd and away from docker.

Change #1082191 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/cookbooks@master] k8s.pool-depool-node: Add support for multiple nodes

https://gerrit.wikimedia.org/r/1082191

Tue, Oct 22, 11:32 AM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
akosiaris added a comment to T376762: Remove `.cluster.local.` suffix in PTR responses.

Tracked separately in T377805

Tue, Oct 22, 9:04 AM · Kubernetes

Yesterday

CDanis added a comment to T376762: Remove `.cluster.local.` suffix in PTR responses.

21.75.192.10.in-addr.arpa. 5 IN PTR 10-192-75-21.eventstreams-production-tls-service.eventstreams.svc.cluster.local.
This is actually "eventstreams-production" on the staging-codfw cluster. Impossible to tell from this DNS response.

This is badly named.

Mon, Oct 21, 3:19 PM · Kubernetes
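
The PTR target quoted in the comment above can be taken apart mechanically, which makes the ambiguity easy to see. The following is a minimal sketch, not anything from the task itself: the function name `parse_pod_ptr` and the reading of the two middle labels as service and namespace are assumptions based only on the record shown, and it handles dashed IPv4 addresses only.

```python
import ipaddress
from typing import NamedTuple


class PodPtr(NamedTuple):
    ip: str
    service: str    # assumed meaning of the second label
    namespace: str  # assumed meaning of the third label


def parse_pod_ptr(target: str, suffix: str = "svc.cluster.local") -> PodPtr:
    """Split '<dashed-ip>.<label>.<label>.svc.cluster.local.' into its parts."""
    name = target.rstrip(".")
    if not name.endswith("." + suffix):
        raise ValueError(f"unexpected suffix in {target!r}")
    # Drop the trailing '.svc.cluster.local' and split what is left.
    labels = name[: -len(suffix) - 1].split(".")
    if len(labels) != 3:
        raise ValueError(f"expected 3 labels before the suffix in {target!r}")
    dashed_ip, service, namespace = labels
    # '10-192-75-21' -> '10.192.75.21'; ip_address() validates the result.
    ip = str(ipaddress.ip_address(dashed_ip.replace("-", ".")))
    return PodPtr(ip, service, namespace)


record = ("10-192-75-21.eventstreams-production-tls-service."
          "eventstreams.svc.cluster.local.")
print(parse_pod_ptr(record))
```

None of the recovered fields names the cluster, and the suffix is the same `cluster.local.` everywhere, so the response alone cannot distinguish staging-codfw from any other cluster.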
akosiaris added a comment to T376762: Remove `.cluster.local.` suffix in PTR responses.

Should we attempt this on a cluster or two? One of the stagings, and then perhaps aux?

Mon, Oct 21, 2:59 PM · Kubernetes
jijiki triaged T365571: Rename wikikube worker nodes during OS reimage as Medium priority.
Mon, Oct 21, 2:59 PM · Kubernetes, Prod-Kubernetes, serviceops
CDanis renamed T376762 from "CoreDNS upgrade so we can rewrite `.cluster.local.` suffix in PTR responses" to "Remove `.cluster.local.` suffix in PTR responses".
Mon, Oct 21, 1:51 PM · Kubernetes
CDanis closed T344171: Reverse DNS for k8s pods IPs as Resolved.
root@db1169:~# ss -tr | grep :mysql
ESTAB 0      945     db1169.eqiad.wmnet:mysql   10-67-163-233.mediawiki.mw-api-ext.svc.cluster.local:57846            
ESTAB 0      0       db1169.eqiad.wmnet:mysql       10-67-163-180.mediawiki.mw-web.svc.cluster.local:54236            
ESTAB 0      0       db1169.eqiad.wmnet:mysql        10-67-172-94.mediawiki.mw-web.svc.cluster.local:40890            
ESTAB 0      0       db1169.eqiad.wmnet:mysql   10-67-163-151.mediawiki.mw-api-ext.svc.cluster.local:42186            
ESTAB 0      0       db1169.eqiad.wmnet:mysql   10-67-133-187.mediawiki.mw-api-int.svc.cluster.local:57268            
ESTAB 0      674     db1169.eqiad.wmnet:mysql   10-67-190-118.mediawiki.mw-api-ext.svc.cluster.local:42074            
ESTAB 0      11      db1169.eqiad.wmnet:mysql    10-67-158-92.mediawiki.mw-api-ext.svc.cluster.local:33186            
ESTAB 0      753     db1169.eqiad.wmnet:mysql       10-67-172-220.mediawiki.mw-web.svc.cluster.local:60126            
ESTAB 0      0       db1169.eqiad.wmnet:mysql       10-67-189-105.mediawiki.mw-web.svc.cluster.local:32812            
ESTAB 0      673     db1169.eqiad.wmnet:mysql        10-67-167-12.mediawiki.mw-web.svc.cluster.local:35374
Mon, Oct 21, 1:46 PM · Patch-For-Review, Traffic, serviceops, Prod-Kubernetes, Kubernetes
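
With reverse DNS in place, `ss -tr` output like the listing above can be summarised per peer group. This is a rough sketch under assumptions: the helper name `count_peers_by_label` is hypothetical, the peer is taken to be the fifth whitespace-separated field, and the third dot-separated label of the peer name is used as the grouping key because that is where `mw-web`, `mw-api-ext` and `mw-api-int` appear in the sample.

```python
from collections import Counter


def count_peers_by_label(ss_output: str, label_index: int = 2) -> Counter:
    """Count established connections per peer-name label in `ss -tr` output."""
    counts: Counter = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Expect: state, recv-q, send-q, local address, peer address.
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        # Strip the trailing ':port' from the peer column.
        peer_host = fields[4].rsplit(":", 1)[0]
        labels = peer_host.split(".")
        if len(labels) > label_index:
            counts[labels[label_index]] += 1
    return counts


# Three lines copied from the listing above.
sample = """\
ESTAB 0      945     db1169.eqiad.wmnet:mysql   10-67-163-233.mediawiki.mw-api-ext.svc.cluster.local:57846
ESTAB 0      0       db1169.eqiad.wmnet:mysql       10-67-163-180.mediawiki.mw-web.svc.cluster.local:54236
ESTAB 0      0       db1169.eqiad.wmnet:mysql   10-67-133-187.mediawiki.mw-api-int.svc.cluster.local:57268
"""
print(count_peers_by_label(sample))
```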
CDanis added a comment to T376762: Remove `.cluster.local.` suffix in PTR responses.

Should we attempt this on a cluster or two? One of the stagings, and then perhaps aux?

Mon, Oct 21, 1:40 PM · Kubernetes
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.k8s.reimage-stacked-control-plane started by jayme@cumin1002 Reimaging k8s control planes of cluster staging-eqiad: containerd migration completed:

  • kubestagemaster1003 (PASS)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202410210826_jayme_1708770_kubestagemaster1003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Mon, Oct 21, 10:08 AM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.k8s.reimage-stacked-control-plane started by jayme@cumin1002 Reimaging k8s control planes of cluster staging-eqiad: containerd migration completed:

  • kubestagemaster1003 (PASS)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202410210826_jayme_1708770_kubestagemaster1003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Mon, Oct 21, 9:27 AM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.k8s.reimage-stacked-control-plane started by jayme@cumin1002 Reimaging k8s control planes of cluster staging-eqiad: containerd migration completed:

  • kubestagemaster1003 (PASS)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202410210826_jayme_1708770_kubestagemaster1003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Mon, Oct 21, 8:48 AM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
gerritbot added a comment to T362408: Migration to containerd and away from docker.

Change #1081910 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Migrate wikikube-worker208[5689] to containerd

https://gerrit.wikimedia.org/r/1081910

Mon, Oct 21, 8:23 AM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops

Fri, Oct 18

JMeybohm closed T377132: containerd logs are not properly parsed during ingestion to logstash as Resolved.

This looks great, thanks!

While checking I saw that for non-JSON logs, timestamp, stream and tags seem to not (always?) be stripped (example) - is that expected?

Fri, Oct 18, 7:51 PM · Patch-For-Review, Observability-Logging, observability, Prod-Kubernetes, Kubernetes, serviceops
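
For context on the timestamp, stream and tags mentioned above: containerd writes container logs in the CRI format, where each line is `<timestamp> <stream> <tag> <message>`, with stream being `stdout` or `stderr` and the tag marking a partial (`P`) or full (`F`) line. The sketch below illustrates stripping that prefix for both JSON and non-JSON messages. It is not the logstash configuration from the patches referenced in this task; the names `CRI_LINE` and `parse_cri_line` and the sample lines are made up for illustration, and only a single `P`/`F` tag is assumed.

```python
import json
import re

# <timestamp> <stdout|stderr> <P|F> <message>
CRI_LINE = re.compile(
    r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<tag>[PF]) (?P<message>.*)$",
    re.DOTALL,
)


def parse_cri_line(line: str) -> dict:
    """Strip the CRI prefix; decode the message as JSON when it is an object."""
    text = line.rstrip("\n")
    match = CRI_LINE.match(text)
    if match is None:
        # Not in CRI format: keep the line as an opaque message.
        return {"message": text}
    event = match.groupdict()
    try:
        payload = json.loads(event["message"])
    except json.JSONDecodeError:
        # Non-JSON entry: the prefix is still stripped, message stays plain.
        return event
    if isinstance(payload, dict):
        event["json"] = payload
    return event


# Illustrative sample lines, not taken from real logs.
print(parse_cri_line('2024-10-18T19:51:00.123456789Z stdout F {"level": "info"}'))
print(parse_cri_line("2024-10-18T19:51:00.123456789Z stderr F plain text entry"))
```

Matching the prefix first and attempting JSON decoding second means a non-JSON entry still loses its timestamp, stream and tag, which is the behaviour the question above asks about.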
JMeybohm closed T377132: containerd logs are not properly parsed during ingestion to logstash, a subtask of T362408: Migration to containerd and away from docker, as Resolved.
Fri, Oct 18, 7:47 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster2005.codfw.wmnet with OS bookworm completed:

  • kubestagemaster2005 (PASS)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202410181502_jayme_1220197_kubestagemaster2003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Fri, Oct 18, 4:54 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
gerritbot added a comment to T377132: containerd logs are not properly parsed during ingestion to logstash.

Change #1081401 merged by Cwhite:

[operations/puppet@production] logstash/containerd: fix regexp to match also non-json entries

https://gerrit.wikimedia.org/r/1081401

Fri, Oct 18, 4:25 PM · Patch-For-Review, Observability-Logging, observability, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestagemaster2005.codfw.wmnet with OS bookworm

Fri, Oct 18, 4:10 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster2004.codfw.wmnet with OS bookworm completed:

  • kubestagemaster2004 (PASS)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202410181502_jayme_1220197_kubestagemaster2003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Fri, Oct 18, 4:10 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestagemaster2004.codfw.wmnet with OS bookworm

Fri, Oct 18, 3:26 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster2003.codfw.wmnet with OS bookworm completed:

  • kubestagemaster2003 (PASS)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202410181502_jayme_1220197_kubestagemaster2003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Fri, Oct 18, 3:26 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
BTullis edited projects for T373195: Migrate Search Platform-owned helm charts to Calico Network Policies, added: Data-Platform-SRE (2024.10.19 - 2024.11.08); removed Data-Platform-SRE (2024.09.28 - 2024.10.18).
Fri, Oct 18, 3:13 PM · Data-Platform-SRE (2024.10.19 - 2024.11.08), Data-Engineering, Event-Platform, Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
JMeybohm updated the task description for T362408: Migration to containerd and away from docker.
Fri, Oct 18, 2:40 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestagemaster2003.codfw.wmnet with OS bookworm

Fri, Oct 18, 2:39 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
gerritbot added a comment to T377132: containerd logs are not properly parsed during ingestion to logstash.

Change #1081401 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] logstash/containerd: fix regexp to match also non-json entries

https://gerrit.wikimedia.org/r/1081401

Fri, Oct 18, 1:19 PM · Patch-For-Review, Observability-Logging, observability, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster2005.codfw.wmnet with OS bookworm completed:

  • kubestagemaster2005 (PASS)
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202410181121_jayme_1191485_kubestagemaster2005.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Fri, Oct 18, 11:43 AM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
ops-monitoring-bot added a comment to T362408: Migration to containerd and away from docker.

Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestagemaster2005.codfw.wmnet with OS bookworm

Fri, Oct 18, 11:00 AM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
gerritbot added a comment to T362408: Migration to containerd and away from docker.

Change #1081224 merged by JMeybohm:

[operations/puppet@production] etcd::v3: Ensure trusted-ca-file is not set on first puppet run with 3.4

https://gerrit.wikimedia.org/r/1081224

Fri, Oct 18, 9:33 AM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
gerritbot added a comment to T362408: Migration to containerd and away from docker.

Change #1081377 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/cookbooks@master] Add a cookbook to roll-reimage stacked k8s control planes

https://gerrit.wikimedia.org/r/1081377

Fri, Oct 18, 9:17 AM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
JMeybohm added a comment to T377132: containerd logs are not properly parsed during ingestion to logstash.

This looks great, thanks!

Fri, Oct 18, 7:19 AM · Patch-For-Review, Observability-Logging, observability, Prod-Kubernetes, Kubernetes, serviceops

Thu, Oct 17

gerritbot added a comment to T377132: containerd logs are not properly parsed during ingestion to logstash.

Change #1080603 merged by Cwhite:

[operations/puppet@production] logstash: parse new containerd log format

https://gerrit.wikimedia.org/r/1080603

Thu, Oct 17, 8:38 PM · Patch-For-Review, Observability-Logging, observability, Prod-Kubernetes, Kubernetes, serviceops
Maintenance_bot removed a project from T370934: Build and publish multiple MediaWiki production images for a given set of PHP versions: Patch-For-Review.
Thu, Oct 17, 7:30 PM · Kubernetes, Deployments, Release-Engineering-Team (Priority Backlog 📥)
CodeReviewBot added a comment to T370934: Build and publish multiple MediaWiki production images for a given set of PHP versions.

dancy merged https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/484

Thu, Oct 17, 6:58 PM · Kubernetes, Deployments, Release-Engineering-Team (Priority Backlog 📥)
gerritbot added a comment to T362408: Migration to containerd and away from docker.

Change #1081224 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] etcd::v3: Ensure trusted-ca-file is not set on first puppet run with 3.4

https://gerrit.wikimedia.org/r/1081224

Thu, Oct 17, 5:19 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
gerritbot added a comment to T362408: Migration to containerd and away from docker.

Change #992629 merged by JMeybohm:

[operations/puppet@production] etcd::v3: Don't set trusted-ca-file if client-cert-auth is false

https://gerrit.wikimedia.org/r/992629

Thu, Oct 17, 12:52 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
gerritbot added a comment to T362408: Migration to containerd and away from docker.

Change #992629 had a related patch set uploaded (by JMeybohm; author: Mxmxchere):

[operations/puppet@production] etcd::v3: Don't set trusted-ca-file if client-cert-auth is false

https://gerrit.wikimedia.org/r/992629

Thu, Oct 17, 12:01 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
gerritbot added a comment to T377132: containerd logs are not properly parsed during ingestion to logstash.

Change #1081089 merged by JMeybohm:

[operations/puppet@production] etcd::v3: Add an etcd_version fact

https://gerrit.wikimedia.org/r/1081089

Thu, Oct 17, 11:36 AM · Patch-For-Review, Observability-Logging, observability, Prod-Kubernetes, Kubernetes, serviceops
gerritbot added a comment to T377132: containerd logs are not properly parsed during ingestion to logstash.

Change #992629 had a related patch set uploaded (by JMeybohm; author: Mxmxchere):

[operations/puppet@production] etcd::v3: Don't set trusted-ca-file if client-cert-auth is false

https://gerrit.wikimedia.org/r/992629

Thu, Oct 17, 10:08 AM · Patch-For-Review, Observability-Logging, observability, Prod-Kubernetes, Kubernetes, serviceops
gerritbot added a comment to T377132: containerd logs are not properly parsed during ingestion to logstash.

Change #1081089 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] etcd::v3: Add an etcd_version fact

https://gerrit.wikimedia.org/r/1081089

Thu, Oct 17, 9:31 AM · Patch-For-Review, Observability-Logging, observability, Prod-Kubernetes, Kubernetes, serviceops

Wed, Oct 16

Etonkovidova moved T357122: linkrecommendation-internal regularly uses more than 95% of its memory limit from Inbox to Triaged on the Growth-Team board.
Wed, Oct 16, 11:06 PM · Observability-Tracing, Patch-For-Review, Growth-Team, Add-Link, serviceops, Prod-Kubernetes, Kubernetes
CodeReviewBot added a project to T370934: Build and publish multiple MediaWiki production images for a given set of PHP versions: Patch-For-Review.

swfrench opened https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/484

Wed, Oct 16, 9:44 PM · Kubernetes, Deployments, Release-Engineering-Team (Priority Backlog 📥)