Kubeflow pitfall notes
Kubeflow UI Jupyter permission problem
User None is not authorized to list … for namespace: anonymous
I tried the fixes suggested in the official issue, but none of them took effect. One of the issues mentioned that the source can be switched to dev mode to skip the permission check.
User None is not authorized to list … · kubeflow/kubeflow
Jupyter source
kubeflow/kubeflow
# Modify the Jupyter kustomize manifest: add the dev-mode settings (FLASK_ENV, command, and args; highlighted in red in the original post), then restart Jupyter
# .cache/manifests/manifests-0.7-branch/jupyter/jupyter-web-app/base/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment
spec:
  replicas: 1
  template:
    spec:
      containers:
      - env:
        - name: ROK_SECRET_NAME
          valueFrom:
            configMapKeyRef:
              name: parameters
              key: ROK_SECRET_NAME
        - name: UI
          valueFrom:
            configMapKeyRef:
              name: parameters
              key: UI
        - name: USERID_HEADER
          value: $(userid-header)
        - name: USERID_PREFIX
          value: $(userid-prefix)
        - name: FLASK_ENV
          value: development
        image: gcr.io/kubeflow-images-public/jupyter-web-app:v0.5.0
        imagePullPolicy: $(policy)
        command: ["python3", "main.py"]
        args: ["--dev"]
        name: jupyter-web-app
        ports:
        - containerPort: 5000
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
      serviceAccountName: service-account
      volumes:
      - configMap:
          name: config
        name: config-volume
# Restart by deleting and re-applying the manifests
kustomize build | kubectl delete -f -
kustomize build | kubectl apply -f -
Error message in Charge UI
Error: mysql_query failed: errno: 2006, error: MySQL server has gone away. Code: 13
Restarting the metadata gRPC pod, as suggested in the official issue, resolved the error.
Error: mysql_query failed: errno: 2006, error: MySQL server has gone away. Code: 13 · Issue #4604 · kubeflow/kubeflow
Root cause
mysql_query failed: errno: 2006, error: MySQL server has gone away · Issue #198 · kubeflow/metadata
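The restart mentioned above could be done roughly as follows. This is a sketch under assumptions: the namespace `kubeflow` and the deployment name `metadata-grpc-deployment` are what a default Kubeflow 0.7 install typically uses, but check yours with `kubectl -n kubeflow get deploy`. The helper name `restart_deploy` is hypothetical, and `kubectl rollout restart` needs kubectl 1.15 or newer. By default the helper only prints the command; set `RUN=1` to execute it.

```shell
# Hypothetical helper: print (or, with RUN=1, execute) a rolling restart.
# $1 = namespace, $2 = deployment name (both are assumptions, verify on your cluster)
restart_deploy() {
  cmd="kubectl -n $1 rollout restart deployment/$2"
  if [ "${RUN:-0}" = "1" ]; then
    $cmd          # actually restart the pods
  else
    echo "$cmd"   # dry run: only show what would be executed
  fi
}

restart_deploy kubeflow metadata-grpc-deployment
```

On an older kubectl, deleting the pod and letting the Deployment recreate it has the same effect.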
Cannot connect to the notebook server
Sorry, /notebook is not a valid page #5010
Check whether port-forward works
kubectl port-forward svc/kenwood-test -n anonymous 8080:80 --address 10.10.62.180
According to the official issue, the notebook-controller deployment has its USE_ISTIO parameter hardcoded, and it is not enabled
Sorry, /notebook is not a valid page · Issue #5010 · kubeflow/kubeflow
Modify the notebook-controller parameters
# .cache/manifests/manifests-0.7-branch/jupyter/notebook-controller/base/deployment.yaml
# Change the USE_ISTIO value to true
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment
spec:
  template:
    spec:
      containers:
      - name: manager
        image: gcr.io/kubeflow-images-public/notebook-controller:v20190614-v0-160-g386f2749-e3b0c4
        command:
        - /manager
        env:
        - name: USE_ISTIO
          value: "true"
        - name: POD_LABELS
          value: $(POD_LABELS)
        imagePullPolicy: IfNotPresent
        livenessProbe:
          httpGet:
            path: /metrics
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 30
      serviceAccountName: service-account
Restart the notebook-controller
kustomize build | kubectl delete -f -
kustomize build | kubectl apply -f -
Image problems
- Some images use the Always pull policy, which needs to be changed to IfNotPresent
- Some images are referenced by SHA256 digest and need to be changed to a tag
- GCR images cannot be pulled directly; I use a GitHub Action to sync them to Docker Hub
You can fork my project and adapt it:
kenwoodjw/sync_gcr
SHA Digest used in knative-install · Issue #1521 · SSE
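The first two fixes in the list above can be applied as stream filters over the rendered manifests. This is a minimal sketch: the function names `relax_pull_policy` and `digest_to_tag` are hypothetical, and the tag you substitute for a digest must be one you have confirmed points at the same image.

```shell
# Read manifest text on stdin, write the rewritten text to stdout.

# Change every "imagePullPolicy: Always" to IfNotPresent.
relax_pull_policy() {
  sed 's/imagePullPolicy:[[:space:]]*Always/imagePullPolicy: IfNotPresent/'
}

# Replace an "@sha256:<64 hex digits>" suffix with ":<tag>".
# $1 = tag to use instead of the digest (must not contain "/" or "&")
digest_to_tag() {
  sed -E "s/@sha256:[0-9a-f]{64}/:$1/"
}

# Demo on a single line of input
echo 'imagePullPolicy: Always' | relax_pull_policy
```

The filters can sit between the two commands used elsewhere in these notes, for example `kustomize build | relax_pull_policy | kubectl apply -f -`.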
Conclusion
- FUCK GFW, pulling images is a waste of time
- Every problem was solved by following the GitHub issues
KFServing model deployment
Complete Kubeflow tutorial: developing ML models, running distributed training, and deploying services
KFServing is built on Knative and Istio, so two versions of a model can be deployed at the same time for canary deployments and A/B tests.
With Kubeflow v0.7, Knative 0.8 and Istio 1.1.6 are installed by default as part of the Kubeflow installation.
With Kubeflow 1.0, Knative 0.11.1 and Istio 1.1.6 are installed by default.
kubeflow/kfserving
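To make the canary idea concrete, here is a sketch of a manifest that serves two model versions side by side. It assumes the `serving.kubeflow.org/v1alpha2` InferenceService API; verify the API version against the KFServing release you actually installed. The resource name, the `tensorflow` predictor, and both `storageUri` values are hypothetical placeholders.

```shell
# Write a hypothetical canary InferenceService manifest to the current directory.
cat > canary-inferenceservice.yaml <<'EOF'
apiVersion: serving.kubeflow.org/v1alpha2   # assumed API version
kind: InferenceService
metadata:
  name: my-model                            # hypothetical name
spec:
  default:                                  # current version, gets the remaining traffic
    predictor:
      tensorflow:
        storageUri: gs://example-bucket/models/v1   # hypothetical location
  canaryTrafficPercent: 10                  # share of requests routed to the canary
  canary:                                   # new version under test
    predictor:
      tensorflow:
        storageUri: gs://example-bucket/models/v2   # hypothetical location
EOF

# Apply it once the placeholders are filled in:
#   kubectl apply -f canary-inferenceservice.yaml
```

Raising `canaryTrafficPercent` step by step shifts more traffic to the new version.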
Modify the Knative image tags
gcr.io/knative-releases/knative.dev/serving/cmd/activator:v0.8.0
gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler-hpa:v0.8.0
gcr.io/knative-releases/knative.dev/serving/cmd/autoscaler:v0.8.0
gcr.io/knative-releases/knative.dev/serving/cmd/controller:v0.8.0
gcr.io/knative-releases/knative.dev/serving/cmd/networking/istio:v0.8.0
gcr.io/knative-releases/knative.dev/serving/cmd/webhook:v0.8.0
gcr.io/knative-releases/knative.dev/serving/cmd/queue:v0.8.0
Sync the images referenced in the KFServing ConfigMap
mcr.microsoft.com/onnxruntime/server:v0.5.0
gcr.io/kfserving/sklearnserver:0.2.0
gcr.io/kfserving/xgbserver:0.2.0
gcr.io/kfserving/pytorchserver:0.2.2
nvcr.io/nvidia/tensorrtserver:19.05-py3
gcr.io/kfserving/alibi-explainer:0.2.2
gcr.io/kfserving/storage-initializer:0.2.2
gcr.io/kfserving/logger:0.2.2
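Syncing one of the images listed above to Docker Hub comes down to a pull, a tag, and a push. The sketch below only prints those commands so they can be reviewed first. The helper name `sync_plan`, the Docker Hub user `myuser`, and the naming scheme (drop the registry host, turn `/` into `.`) are all assumptions for illustration, not necessarily what sync_gcr does.

```shell
# Print the docker commands that copy one image to Docker Hub.
# $1 = Docker Hub user (hypothetical), $2 = source image reference
sync_plan() {
  src="$2"
  # Assumed naming scheme: strip the registry host, replace "/" with "."
  dst="$1/$(printf '%s' "${src#*/}" | tr '/' '.')"
  printf 'docker pull %s\n' "$src"
  printf 'docker tag %s %s\n' "$src" "$dst"
  printf 'docker push %s\n' "$dst"
}

sync_plan myuser gcr.io/kfserving/logger:0.2.2

# After reviewing the output, run it on a host that can reach the source registry:
#   sync_plan myuser gcr.io/kfserving/logger:0.2.2 | sh
```

After the sync, the image references in the manifests and in the KFServing ConfigMap have to point at the Docker Hub names.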