Milestone for Data Platform SRE work
Details
Yesterday
Change #1081911 merged by Brouberol:
[operations/puppet@production] ceph/server: fix typo in caps
Change #1081911 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] ceph/server: fix typo in caps
Change #1081905 merged by Brouberol:
[operations/puppet@production] ceph/server: fix the dse-k8s-csi-cephfs according to the CSI doc
Change #1081903 merged by Brouberol:
[operations/deployment-charts@master] ceph-csi-cephs: fix RBAC by granting cluster-wide permissions on PVC and storageclasses
Change #1081905 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] ceph/server: fix the dse-k8s-csi-cephfs according to the CSI doc
Change #1081903 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/deployment-charts@master] ceph-csi-cephs: fix RBAC by granting cluster-wide permissions on PVC and storageclasses
Fri, Oct 18
Ah, there is a slight problem with an-worker1176. It won't take long to sort out.
The partition table on /dev/sda looks like it was created for the operating system disk, which makes sense.
We can see here that /dev/sda1 is only 953 MB in size, leaving roughly 3.6 TB unused in /dev/sda2:
btullis@an-worker1176:~$ lsblk /dev/sda
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  3.6T  0 disk
├─sda1   8:1    0  953M  0 part /var/lib/hadoop/data/m
└─sda2   8:2    0  3.6T  0 part
I think that we will have to modify the partition table and resize the file system.
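As a minimal sketch of that fix, assuming sda1 carries an ext4 filesystem and sda2 is genuinely unused (the exact steps depend on the real layout, and the node must be out of service first):

```shell
# Hedged sketch only: verify the layout with lsblk/parted before running any of this.
umount /var/lib/hadoop/data/m        # take the datanode partition offline first
parted /dev/sda rm 2                 # drop the unused second partition
parted /dev/sda resizepart 1 100%    # grow sda1 to the end of the disk
e2fsck -f /dev/sda1                  # check the filesystem before resizing
resize2fs /dev/sda1                  # grow the filesystem to fill the partition
mount /var/lib/hadoop/data/m
```

If sda2 actually holds data, it would need to be migrated off before deleting the partition.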
Change #1081261 merged by Bking:
[labs/private@master] analytics_test_cluster: add secret
The custom-config.json file has been added and is now available on Wikikube.
Change #975006 abandoned by Btullis:
[operations/puppet@production] Set a non-default mapreduce file committer algorithm for spark
Reason:
See: https://phabricator.wikimedia.org/T351388#10237936
Change #1081382 merged by jenkins-bot:
[operations/deployment-charts@master] wikidata-query-gui: fix volumeMount with subPath
Change #1081382 had a related patch set uploaded (by Jelto; author: Jelto):
[operations/deployment-charts@master] wikidata-query-gui: fix volumeMount with subPath
Change #1079466 merged by jenkins-bot:
[operations/deployment-charts@master] wikidata-query-gui: mount custom-config.json into pod
Change #1079465 merged by jenkins-bot:
[operations/deployment-charts@master] miscweb: add support to mount additional configmaps
When I search for discovery in DataHub, I can see 17 Hive tables returned.
I exported the results as a CSV and the entity URLs are here:
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.webrequest_metrics,PROD)
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.wikibase_rdf_subgraphs,PROD)
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.cirrus_index,PROD)
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.cirrus_index_without_content,PROD)
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.search_satisfaction_metrics,PROD)
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.wikibase_rdf,PROD)
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.subgraph_pair_metrics,PROD)
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.general_subgraph_metrics,PROD)
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.popularity_score,PROD)
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.query_clicks_daily,PROD)
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.subgraph_pair_query_metrics,PROD)
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.query_clicks_hourly,PROD)
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.general_query_metrics,PROD)
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.general_subgraph_query_metrics,PROD)
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.per_subgraph_metrics,PROD)
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.per_subgraph_query_metrics,PROD)
- https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,discovery.processed_external_sparql_query,PROD)
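Each of those entity URLs embeds a DataHub dataset URN of the shape urn:li:dataset:(urn:li:dataPlatform:PLATFORM,DB.TABLE,ENV). As an illustration, the fields can be pulled out of a URN like so (this is just a regex sketch, not an official DataHub tool):

```shell
# Illustrative: extract platform, table, and environment from a DataHub dataset URN.
parse_dataset_urn() {
  echo "$1" | sed -E \
    's/^.*urn:li:dataset:\(urn:li:dataPlatform:([^,]+),([^,]+),([^)]+)\).*$/platform=\1 name=\2 env=\3/'
}

parse_dataset_urn 'urn:li:dataset:(urn:li:dataPlatform:hive,discovery.webrequest_metrics,PROD)'
# platform=hive name=discovery.webrequest_metrics env=PROD
```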
Yes. Each Airflow instance will have a Kerberos principal associated with it anyway. The workers will need this principal in order to access HDFS, YARN, and Hive.
We then supply each instance with a keytab, which is a means of ensuring that services can automatically authenticate with their given Kerberos principal.
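For illustration, this wiring typically lands in the [kerberos] section of airflow.cfg; the principal, realm, and paths below are placeholders, not the actual production values:

```ini
# Illustrative airflow.cfg fragment; principal and paths are made up for the example.
[kerberos]
principal = airflow/an-airflow-host.example.wikimedia.org@EXAMPLE.REALM
keytab = /etc/airflow/airflow.keytab
ccache = /tmp/airflow_krb5_ccache
reinit_frequency = 3600
```

The airflow kerberos daemon then uses the keytab to renew the ticket cache on that schedule, so workers never need an interactive kinit.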
The hosts are now set back to Active in Netbox; they have rejoined the Hadoop cluster and are catching up with the production hosts. Keeping an eye on this.
Thu, Oct 17
Change #1081285 had a related patch set uploaded (by Eevans; author: Eevans):
[operations/puppet@production] Add jebe to airflow-analytics-product-admins per access request
Approved
Change #1081268 had a related patch set uploaded (by Bking; author: Bking):
[operations/puppet@production] airflow: make 'secret_key' configurable
Change #1081261 had a related patch set uploaded (by Bking; author: Bking):
[labs/private@master] analytics_test_cluster: add secret
FYI, I have just updated the Ops week page with docs on using the Airflow CLI to rerun tasks.
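A sketch of the sort of invocation those docs cover (the DAG ID, task regex, and dates here are placeholders, and flag spellings vary slightly between Airflow versions):

```shell
# Clear a task instance's state so the scheduler reruns it.
# my_dag / my_task and the dates are placeholders for this example.
airflow tasks clear my_dag \
    --task-regex my_task \
    --start-date 2024-10-17 \
    --end-date 2024-10-18 \
    --yes
```

Clearing the state is usually preferable to a full backfill when only a few task instances failed.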
I approve membership in airflow-analytics-product-admins
Change #1081230 merged by jenkins-bot:
[operations/deployment-charts@master] airflow-analytic-test: disable remote logging