By SuKai, August 8, 2021
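Once a model has been trained, how do you serve inference requests from it? Seldon Core is a popular component for deploying machine learning models on Kubernetes. Put simply, Seldon Core wraps a model as a production-grade REST/gRPC microservice. It integrates with Istio, Jaeger and Prometheus, and supports canary releases, A/B testing, distributed tracing and metrics monitoring.
This post shows the most minimal setup: Seldon Core alone, with no other open-source components. All I want from Seldon Core is to load my model and expose it as an API.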
| Deploying the Seldon Core Operator
Edit values.yaml to disable Ambassador and Istio:
```yaml
ambassador:
  enabled: false
istio:
  enabled: false
```
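My Kubernetes cluster runs v1.22.2, so the webhook protocol version in webhook.yaml has to be changed. Kubernetes 1.22 removed the `admissionregistration.k8s.io/v1beta1` webhook API, and the `v1` API requires every webhook to declare `sideEffects` and `admissionReviewVersions`: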
```yaml
sideEffects: None
admissionReviewVersions:
  - v1beta1
```
| Installation
Install the chart from the local directory that contains the edited files:
```shell
helm install -n seldon-system seldon-core-operator seldon-core-operator
```
| Prefect workflow
Give the Prefect agent's role permission to operate on the Seldon API:
```yaml
- apiGroups:
    - machinelearning.seldon.io
  resources:
    - seldondeployments
  verbs:
    - '*'
```
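The `'*'` verb covers every call the deploy task makes. The sketch below shows how an RBAC rule of this shape matches a request. It is a simplification that ignores `resourceNames`, subresources and non-resource URLs. `rule_allows` is a hypothetical helper and is not part of any Kubernetes client library.

```python
from typing import Dict, List

# The rule from the Role above, expressed as a plain dict
SELDON_RULE: Dict[str, List[str]] = {
    "apiGroups": ["machinelearning.seldon.io"],
    "resources": ["seldondeployments"],
    "verbs": ["*"],
}


def _matches(allowed: List[str], value: str) -> bool:
    # "*" is the RBAC wildcard: it matches any value in that field
    return "*" in allowed or value in allowed


def rule_allows(rule: Dict[str, List[str]], api_group: str, resource: str, verb: str) -> bool:
    """Simplified RBAC check: every field of the rule must match the request."""
    return (
        _matches(rule["apiGroups"], api_group)
        and _matches(rule["resources"], resource)
        and _matches(rule["verbs"], verb)
    )


# deploy_model calls create, get and replace (replace maps to the "update" verb)
for verb in ("create", "get", "update"):
    print(verb, rule_allows(SELDON_RULE, "machinelearning.seldon.io", "seldondeployments", verb))
```

The same rule grants nothing outside its API group and resource. For example, `rule_allows(SELDON_RULE, "apps", "deployments", "create")` is `False`.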
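| Modifying the model training task
The training task now returns the MLflow run_id. When it logs the model files, it no longer registers a model version; registration moves to a separate task.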
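```python
import mlflow
import mlflow.sklearn
import numpy as np
from prefect import task
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


# Metric helper used below (not shown in the post; this follows the MLflow wine example)
def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2


@task
def train_model(data, mlflow_experiment_id, alpha=0.5, l1_ratio=0.5):
    mlflow.set_tracking_uri("http://mlflow.platform.sukai.com/")
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    with mlflow.start_run(experiment_id=mlflow_experiment_id):
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        # Log the model artifact only; the register_model task registers it
        # mlflow.sklearn.log_model(lr, "model", registered_model_name="ElasticnetWineModel")
        mlflow.sklearn.log_model(lr, "model")
        run_id = mlflow.active_run().info.run_id
    return run_id
```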
| Adding the model registration task
The task returns the model's download location (the artifact source URI).
```python
import time

import mlflow
from mlflow.entities.model_registry.model_version_status import ModelVersionStatus
from mlflow.tracking import MlflowClient
from prefect import task


# Wait until the model version is ready (polls up to 10 times, 3 s apart)
def wait_until_ready(model_name, model_version):
    client = MlflowClient()
    for _ in range(10):
        model_version_details = client.get_model_version(
            name=model_name,
            version=model_version,
        )
        status = ModelVersionStatus.from_string(model_version_details.status)
        print("Model status: %s" % ModelVersionStatus.to_string(status))
        if status == ModelVersionStatus.READY:
            break
        time.sleep(3)


@task
def register_model(run_id: str, model_name: str, stage: str = "staging"):
    client = MlflowClient()
    artifact_path = "model"
    model_uri = f"runs:/{run_id}/{artifact_path}"
    model_details = mlflow.register_model(model_uri=model_uri, name=model_name)
    wait_until_ready(model_details.name, model_details.version)
    client.transition_model_version_stage(
        name=model_details.name,
        version=model_details.version,
        stage=stage,
    )
    # Artifact-store URI of the logged model; Seldon's init container downloads it
    return model_details.source
```
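`wait_until_ready` gives up silently after ten attempts, so `register_model` may transition a version that is not yet `READY`. If you would rather fail hard, a generic poll helper can raise `TimeoutError` instead. This is a minimal sketch: `poll_until` and its parameters are hypothetical names. The status lookup is passed in as a callable, so it can wrap `client.get_model_version(name, version).status` without the sketch depending on MLflow.

```python
import time
from typing import Callable


def poll_until(
    get_status: Callable[[], str],
    ready: str = "READY",
    attempts: int = 10,
    interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns `ready`; raise TimeoutError if it never does."""
    status = ""
    for attempt in range(attempts):
        status = get_status()
        if status == ready:
            return status
        # Don't sleep after the final attempt
        if attempt < attempts - 1:
            sleep(interval)
    raise TimeoutError(f"status still {status!r} after {attempts} attempts")
```

In `wait_until_ready` this could replace the loop:

`poll_until(lambda: client.get_model_version(model_name, model_version).status)`

The `sleep` parameter exists only so the helper can be exercised without real delays.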
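| Adding the deploy_model task
`modelUri` points at the trained model, and the task creates a SeldonDeployment custom resource (CR) in Kubernetes. By default Seldon runs the container as UID 8888. In my setup it failed at startup because it had no write permission on a directory, so here the container runs as root.
Installing the dependency packages after the container starts takes a long time, so the probes' initial delay is set to 120 seconds.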
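The probe settings in the manifest below (`initialDelaySeconds: 120`, `periodSeconds: 5`, `failureThreshold: 100`) give the container a long grace period. The sketch below estimates how long a container that never becomes healthy survives before the liveness probe restarts it. It ignores `timeoutSeconds` and the kubelet's probe-scheduling jitter, so treat the result as approximate.

```python
def seconds_until_restart(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate seconds from container start until a liveness-triggered restart,
    assuming every probe fails.

    Probes fire at initial_delay, initial_delay + period, ...; the kubelet
    restarts the container on the failure_threshold-th consecutive failure.
    """
    return initial_delay + (failure_threshold - 1) * period


print(seconds_until_restart(120, 5, 100))  # 615 -> a little over 10 minutes
```

The readiness probe uses the same numbers. Traffic is therefore not routed to the pod until at least 120 seconds after it starts.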
seldon_deployment = """
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
name: wines-classifier
namespace: ai
spec:
name: wines-classifier
predictors:
- graph:
implementation: MLFLOW_SERVER
modelUri: dummy
envSecretRefName: seldon-init-container-secret
name: classifier
name: default
replicas: 1
componentSpecs:
- spec:
# We are setting high failureThreshold as installing conda dependencies
# can take long time and we want to avoid k8s killing the container prematurely
containers:
- name: classifier
image: seldonio/mlflowserver:1.11.2-dev
securityContext:
runAsUser: 0
livenessProbe:
initialDelaySeconds: 120
failureThreshold: 100
periodSeconds: 5
successThreshold: 1
httpGet:
path: /health/ping
port: http
scheme: HTTP
readinessProbe:
initialDelaySeconds: 120
failureThreshold: 100
periodSeconds: 5
successThreshold: 1
httpGet:
path: /health/ping
port: http
scheme: HTTP
"""
CUSTOM_RESOURCE_INFO = dict(
group="machinelearning.seldon.io",
version="v1",
plural="seldondeployments",
)
@task
def deploy_model(model_uri: str, namespace: str = "seldon"):
logger = prefect.context.get("logger")
logger.info(f"Deploying model {model_uri} to enviroment {namespace}")
config.load_incluster_config()
custom_api = client.CustomObjectsApi()
dep = yaml.safe_load(seldon_deployment)
dep["spec"]["predictors"][0]["graph"]["modelUri"] = model_uri
try:
resp = custom_api.create_namespaced_custom_object(
**CUSTOM_RESOURCE_INFO,
namespace=namespace,
body=dep,
)
logger.info("Deployment created. status='%s'" % resp["status"]["state"])
except:
logger.info("Updating existing model")
existing_deployment = custom_api.get_namespaced_custom_object(
**CUSTOM_RESOURCE_INFO,
namespace=namespace,
name=dep["metadata"]["name"],
)
existing_deployment["spec"]["predictors"][0]["graph"]["modelUri"] = model_uri
resp = custom_api.replace_namespaced_custom_object(
**CUSTOM_RESOURCE_INFO,
namespace=namespace,
name=existing_deployment["metadata"]["name"],
body=existing_deployment,
)
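Templating the manifest as a YAML string and then overwriting `modelUri` works. You can also build the same resource directly as a Python dict, which avoids both the `dummy` placeholder and the YAML parse. The sketch below assumes the same field values as the manifest above. `build_seldon_deployment` and `_probe` are hypothetical helpers, and the `s3://` URI in the example is made up.

```python
import json
from typing import Any, Dict


def _probe() -> Dict[str, Any]:
    # Same settings as the liveness/readiness probes in the YAML manifest;
    # a fresh dict per call so the two probes never share state
    return {
        "initialDelaySeconds": 120,
        "failureThreshold": 100,
        "periodSeconds": 5,
        "successThreshold": 1,
        "httpGet": {"path": "/health/ping", "port": "http", "scheme": "HTTP"},
    }


def build_seldon_deployment(
    model_uri: str,
    name: str = "wines-classifier",
    namespace: str = "ai",
    image: str = "seldonio/mlflowserver:1.11.2-dev",
) -> Dict[str, Any]:
    """Return a SeldonDeployment body for create_namespaced_custom_object."""
    container = {
        "name": "classifier",
        "image": image,
        "securityContext": {"runAsUser": 0},
        "livenessProbe": _probe(),
        "readinessProbe": _probe(),
    }
    predictor = {
        "name": "default",
        "replicas": 1,
        "graph": {
            "name": "classifier",
            "implementation": "MLFLOW_SERVER",
            "modelUri": model_uri,
            "envSecretRefName": "seldon-init-container-secret",
        },
        "componentSpecs": [{"spec": {"containers": [container]}}],
    }
    return {
        "apiVersion": "machinelearning.seldon.io/v1",
        "kind": "SeldonDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"name": name, "predictors": [predictor]},
    }


print(json.dumps(build_seldon_deployment("s3://example-bucket/model"), indent=2))
```

Passing this dict as `body=` would replace `yaml.safe_load(seldon_deployment)` and the `modelUri` overwrite in `deploy_model`.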
| Adding deploy_model to the flow
```python
from prefect import Flow, Parameter

# schedule, storage, result, run_config and fetch_data are defined elsewhere
# in wine-quality-pipeline.py (not shown in this post)
with Flow("train-wine-quality-model", schedule, storage=storage, result=result, run_config=run_config) as flow:
    alpha = Parameter("alpha", default=0.3)
    l1_ratio = Parameter("l1_ratio", default=0.3)
    data = fetch_data()
    run_id = train_model(data=data, mlflow_experiment_id=4, alpha=alpha, l1_ratio=l1_ratio)
    source = register_model(run_id=run_id, model_name="ElasticnetWineModel", stage="staging")
    deploy_model(model_uri=source, namespace="ai")
```
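When the Seldon model server container starts, it creates a conda environment. This is slow and often fails while installing packages. I therefore use a rebuilt image that already contains the conda environment, together with a modified script that skips creating it.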
Dockerfile-seldon-mlflowserver:
```dockerfile
FROM seldonio/mlflowserver:1.11.2
COPY conda.yaml /tmp/conda.yaml
COPY conda_env_create.py /microservice/conda_env_create.py
# Use the Tsinghua mirrors for conda and pip
RUN conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/ \
 && conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/ \
 && conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/ \
 && pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
# Make /microservice writable and give UID 8888 a cache directory
RUN chmod 777 /microservice && mkdir /.cache && chown 8888:8888 /.cache
# Pre-build the conda environment the model server would otherwise create at startup
RUN conda env create -n mlflow --file /tmp/conda.yaml
```
The modified conda_env_create.py skips `conda env create`, but the model's dependency packages are still installed:
```python
import logging
import os
from shlex import quote
from subprocess import run

log = logging.getLogger(__name__)

# Fallback env name, matching the docstring below
DEFAULT_CONDA_ENV_NAME = "mlflow"


def create_env(env_file_path):
    """Creates Conda environment from YAML.
    Creates a Conda environment from a YAML file describing Python version,
    dependencies, etc.
    The new environment name is read from the `CONDA_ENV_NAME` environment
    variable.
    If the variable is not defined, it falls back to `mlflow`.
    """
    env_file_name = os.path.basename(env_file_path)
    env_name = os.getenv("CONDA_ENV_NAME", DEFAULT_CONDA_ENV_NAME)
    env_name = quote(env_name)
    env_file_path = quote(env_file_path)
    log.info(f"Creating Conda environment '{env_name}' from {env_file_name}")
    cmd = f"conda env create -n {env_name} --file {env_file_path}"
    # Disabled: the image already contains the pre-built "mlflow" environment
    # run(cmd, shell=True, check=True)
```
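Commenting out `run(...)` means you have to rebuild the image to switch the behaviour back. Gating the call on an environment variable keeps both paths in one script. This is a minimal sketch that assumes a hypothetical `SKIP_CONDA_ENV_CREATE` variable, which is not something Seldon defines. The runner can be swapped out so the logic can be tested without conda.

```python
import os
from shlex import quote
from subprocess import run
from typing import Callable, Mapping, Optional

DEFAULT_CONDA_ENV_NAME = "mlflow"


def create_env(
    env_file_path: str,
    environ: Mapping[str, str] = os.environ,
    runner: Callable[..., object] = run,
) -> Optional[str]:
    """Create the Conda env unless SKIP_CONDA_ENV_CREATE=1 (hypothetical flag).

    Returns the command that was run, or None when creation was skipped.
    """
    if environ.get("SKIP_CONDA_ENV_CREATE") == "1":
        # The image already ships a pre-built environment
        return None
    env_name = quote(environ.get("CONDA_ENV_NAME", DEFAULT_CONDA_ENV_NAME))
    cmd = f"conda env create -n {env_name} --file {quote(env_file_path)}"
    runner(cmd, shell=True, check=True)
    return cmd
```

In the Dockerfile, the flag would then be set with `ENV SKIP_CONDA_ENV_CREATE=1`.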
| Running the pipeline
```
(base) jovyan@jupyter-0:~/ai-demo/cicd$ python wine-quality-pipeline.py
[2021-11-20 14:37:34+0000] INFO - prefect.S3 | Uploading train-wine-quality-model/2021-11-20t14-37-34-655792-00-00 to prefect-sukai
Flow URL: http://localhost:8080/default/flow/1b3c08e5-3405-432d-8f68-d895769d7ea4
 └── ID: eba90e34-6095-4aa5-b49d-5ccf263637c3
 └── Project: wine-quality-project
 └── Labels: []
```
| Viewing the pipeline


| Checking Kubernetes
```
sukai@sukai:~$ kubectl -n ai get SeldonDeployment
NAME               AGE
wines-classifier   17h
sukai@sukai:~$ kubectl -n ai get pods
NAME                                                    READY   STATUS      RESTARTS       AGE
codeserver-0                                            1/1     Running     0              7d21h
sukai-ss-0-0                                            1/1     Running     0              9d
jupyter-0                                               1/1     Running     0              46h
minio-console-68f6f6466f-g5klf                          1/1     Running     0              10d
minio-operator-75f99f579-mpjng                          1/1     Running     0              10d
mlflow-0                                                1/1     Running     0              9d
optuna-dashboard-848fbbc75b-rvw5k                       1/1     Running     0              6d1h
prefect-agent-6455b6897d-h8jst                          1/1     Running     5 (4d4h ago)   4d5h
prefect-apollo-679b57c674-jcjz9                         1/1     Running     2 (4d5h ago)   4d5h
prefect-create-tenant-job--1-925pb                      0/1     Completed   6              4d5h
prefect-graphql-6db5c5d5c-42t2j                         1/1     Running     0              4d5h
prefect-hasura-f74f8b6c8-zh9hb                          1/1     Running     4 (4d5h ago)   4d5h
prefect-postgresql-0                                    1/1     Running     0              4d5h
prefect-towel-5f4bfc58c7-nm8lj                          1/1     Running     0              4d5h
prefect-ui-795c9cb7b9-qkm7p                             1/1     Running     0              4d5h
wines-classifier-default-0-classifier-d5b687dcf-v8nms   2/2     Running     0              17h
sukai@sukai:~$ kubectl -n ai get svc
NAME                                  TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
codeserver                            ClusterIP      None             <none>          8443/TCP                     7d22h
console                               ClusterIP      10.211.92.95     <none>          9090/TCP,9443/TCP            10d
sukai-console                         ClusterIP      10.211.251.129   <none>          9090/TCP                     10d
sukai-hl                              ClusterIP      None             <none>          9000/TCP                     10d
jupyter                               ClusterIP      10.211.23.122    <none>          8888/TCP,7777/TCP,2222/TCP   4d23h
jupyter-headless                      ClusterIP      None             <none>          8888/TCP,7777/TCP,2222/TCP   4d23h
minio                                 ClusterIP      10.211.148.215   <none>          80/TCP                       10d
mlflow                                ClusterIP      None             <none>          5000/TCP                     9d
operator                              ClusterIP      10.211.198.175   <none>          4222/TCP                     10d
optuna-dashboard                      ClusterIP      10.211.126.243   <none>          80/TCP                       6d1h
prefect-apollo                        LoadBalancer   10.211.18.199    <pending>       4200:31555/TCP               4d5h
prefect-graphql                       ClusterIP      10.211.78.161    <none>          4201/TCP                     4d5h
prefect-hasura                        ClusterIP      10.211.75.152    <none>          3000/TCP                     4d5h
prefect-postgresql                    ClusterIP      10.211.175.82    <none>          5432/TCP                     4d5h
prefect-postgresql-headless           ClusterIP      None             <none>          5432/TCP                     4d5h
prefect-ui                            LoadBalancer   10.211.152.173   192.168.0.119   8080:30889/TCP               4d5h
wines-classifier-default              ClusterIP      10.211.74.144    <none>          8000/TCP,5001/TCP            17h
wines-classifier-default-classifier   ClusterIP      10.211.62.109    <none>          9000/TCP,9500/TCP            17h
sukai@sukai:~$
sukai@sukai:~$ kubectl -n ai get ingress
NAME                 CLASS    HOSTS                            ADDRESS   PORTS   AGE
codeserver           <none>   codeserver.platform.sukai.com              80      7d22h
sukai-console        <none>   sukai-minio.platform.sukai.com             80      9d
sukai-minio          <none>   s3.platform.sukai.com                      80      9d
jupyter              <none>   jupyter.platform.sukai.com                 80      4d23h
minio-console        <none>   minio.platform.sukai.com                   80      10d
mlflow               <none>   mlflow.platform.sukai.com                  80      9d
optuna-dashboard     <none>   optuna.platform.sukai.com                  80      6d1h
prefect-apollo       <none>   apollo.platform.sukai.com                  80      4d5h
prefect-ui           <none>   prefect.platform.sukai.com                 80      4d5h
wine-quality-model   <none>   wine.platform.sukai.com                    80      4h51m
sukai@sukai:~$
```
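The SeldonDeployment produced two services. `wines-classifier-default` (ports 8000/5001) is named after the deployment and the predictor. `wines-classifier-default-classifier` (ports 9000/9500) additionally carries the container name. The Ingress below targets the second one, i.e. the model container directly. The hypothetical helper below derives both names. The pattern is inferred from this `kubectl` output rather than from Seldon's source, and it ignores any shortening Seldon may apply to very long names.

```python
from typing import Tuple


def seldon_service_names(deployment: str, predictor: str, container: str) -> Tuple[str, str]:
    """Return (predictor-level service, model-container service) as seen in the output above."""
    predictor_svc = f"{deployment}-{predictor}"
    return predictor_svc, f"{predictor_svc}-{container}"


print(seldon_service_names("wines-classifier", "default", "classifier"))
```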
| Creating an Ingress for the model service
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: wine-quality-model
  namespace: ai
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: wine.platform.sukai.com
    http:
      paths:
      - path: /
        pathType: ImplementationSpecific
        backend:
          service:
            name: wines-classifier-default-classifier
            port:
              name: http
```
| Calling the service API
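Any HTTP client can call the model through the Ingress. The stdlib sketch below assumes that the model container serves the Seldon protocol's `/api/v1.0/predictions` route and accepts a `{"data": {"names": [...], "ndarray": [[...]]}}` payload. Column names are sent because the model was trained on a pandas DataFrame. `build_prediction_request` and `predict` are hypothetical helpers, and the feature row in the usage note is only a sample.

```python
import json
import urllib.request
from typing import List, Sequence

# Feature columns of the wine-quality dataset ("quality" is the label)
WINE_FEATURES: List[str] = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]


def build_prediction_request(
    base_url: str, rows: Sequence[Sequence[float]], names: List[str] = WINE_FEATURES
) -> urllib.request.Request:
    """Build a POST request with a Seldon-protocol JSON body."""
    payload = {"data": {"names": names, "ndarray": [list(r) for r in rows]}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1.0/predictions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url: str, rows: Sequence[Sequence[float]]) -> dict:
    """Send the request and return the decoded JSON response."""
    req = build_prediction_request(base_url, rows)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage:

`predict("http://wine.platform.sukai.com", [[7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4]])`

This returns Seldon's JSON response, with the predicted quality in `data`.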
