Automating Machine Learning with DevOps: a Mini SageMaker!

Deepanshu Yadav
Sep 24, 2020

What is AWS SageMaker?

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high quality models.

Traditional ML development is a complex, expensive, iterative process made even harder because there are no integrated tools for the entire machine learning workflow. You need to stitch together tools and workflows, which is time-consuming and error-prone. SageMaker solves this challenge by providing all of the components used for machine learning in a single toolset so models get to production faster with much less effort and at lower cost.

Why do ML startups often fail?

ML/DL involves a lot of manual work, like changing activation functions, layers, etc. to find the best accuracy, and doing this by hand every time is not possible. This problem is solved by SageMaker. But today we will look at the backend of SageMaker!

Now let's try to understand how it works, and then we will move on to building our own.

Below is the task we are going to do: a completely automated deep learning model that will give you a feel of SageMaker and help you understand how it works.

Let me first share what we are going to do:

1. Create a container image that has Python 3 and Keras or NumPy installed, using a Dockerfile.

2. When we launch this image, it should automatically start training the model in the container.

3. Create a job chain of job1, job2, job3, job4 and job5 using the Build Pipeline plugin in Jenkins.

4. Job1: Pull the GitHub repo automatically when a developer pushes to GitHub.

5. Job2: By looking at the code or program file, Jenkins should automatically start the container image that has the matching machine learning software and interpreter installed, deploy the code and start training (e.g. if the code uses a CNN, Jenkins should start the container that already has all the software required for CNN processing installed).

6. Job3: Train your model and report its accuracy or metrics.

7. Job4: If the accuracy is less than 80%, tweak the machine learning model architecture.

8. Job5: Retrain the model or notify that the best model has been created.

9. Create one extra job, job6, for monitoring: if the container where the app is running fails for any reason, this job should automatically start the container again from where the last trained model left off.
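To give a feel for Job2, here is a minimal sketch in Python of how a script could pick a container image by looking at the pushed code. The keywords and image names (cnn-image:v1 and so on) are made up for illustration; they are not the ones used in this task.

```python
# Hypothetical rules: keywords to look for in the code, and the image to use.
# Order matters: the more specific rule (CNN layers) is checked first.
IMAGE_RULES = [
    (("Conv2D", "MaxPooling2D"), "cnn-image:v1"),
    (("keras", "tensorflow"), "dl-image:v1"),
    (("sklearn",), "ml-image:v1"),
]


def pick_image(source_code, default="python-image:v1"):
    """Return the image of the first rule whose keyword appears in the code."""
    for keywords, image in IMAGE_RULES:
        if any(keyword in source_code for keyword in keywords):
            return image
    return default


if __name__ == "__main__":
    print(pick_image("from keras.layers import Conv2D"))
```

A plain keyword search like this is crude, but it is enough to route CNN code to one image and classic ML code to another.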

Prerequisites: Git, GitHub, Jenkins, Docker, ML and DL, scripting, Linux

Let me share the Dockerfile I created so that we can train our model in a Docker container. I have installed libraries according to my model's needs; you can install whatever yours requires.

Now you can create the image using the command → docker build -t task3:v1 . and it will look like this. Read the previous article if you have any problems building a Docker image.

link- https://medium.com/@deepanshuyadavv11/task2-launching-docker-server-accourding-to-the-type-of-code-and-sending-email-notification-5d5ec7c651bc

After building the image, when you launch a container it will automatically start training, as you can see from the last line of the Dockerfile.

Use the command below to launch the container (use the image tag you built):

docker run -dit -v /root/wstask3mlops/task3:/acc/ --name task3 task3:v7

If you want to see what is happening inside, you can use the command → docker logs task3 (or whatever name you have given your container). I have mounted a directory from my base system into the container so that we can store the accuracy.
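If you would rather launch the container from a script, the same docker run command can be assembled in Python. This is only a sketch: docker_run_command is a helper name I made up, and the paths and names are the ones from the command above.

```python
import shlex


def docker_run_command(host_dir, container_dir, name, image):
    """Build the argument list for a detached container with one bind mount."""
    return [
        "docker", "run", "-dit",
        "-v", f"{host_dir}:{container_dir}",  # host directory : container directory
        "--name", name,
        image,
    ]


cmd = docker_run_command("/root/wstask3mlops/task3", "/acc/", "task3", "task3:v7")
# Print the command as it would be typed in a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

The list can be passed to subprocess.run(cmd) on a machine where Docker is installed.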

Now let's start building the jobs:

1. Create a new item in Jenkins as a freestyle project and configure this job as shown in the screenshots below.

task3 is the folder where Jenkins puts the code. You can give it any name.

2. Now, to start the training according to the code pushed to GitHub, follow the screenshots below to configure it.

As soon as the container runs, it starts training the model and stores the accuracy in the acc file. We have written this in our code; refer to the code below.
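As a rough idea of what "storing the accuracy" can look like, here is a small Python sketch. It is not copied from the repo; the function names and the percentage format are my own assumptions, and the accuracy value would come from your training code (for example the final accuracy reported by Keras).

```python
from pathlib import Path


def save_accuracy(accuracy, path):
    """Write the accuracy (a fraction such as 0.87) as a percentage on one line."""
    Path(path).write_text(f"{accuracy * 100:.2f}\n")


def load_accuracy(path):
    """Read the percentage back as a float, e.g. for a later Jenkins job."""
    return float(Path(path).read_text().strip())
```

If the file is written inside the mounted directory, it becomes visible on the base system, so the next job can read it without entering the container.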

link- https://github.com/dipuyadav/task3mlops

Now let's configure job 4, which is the main part: we have to tweak the model to get more accuracy if the accuracy obtained is less than 80%. With this, the container is launched again and again and the model is trained repeatedly until we reach an accuracy of 80%. See below how to configure it.
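The retrain-until-good-enough idea can be sketched in a few lines of Python. This is an illustration only, not the actual Jenkins configuration: train and tweak are placeholders you would supply, and fake_train stands in for a real training run.

```python
def train_until(target, train, tweak, config, max_rounds=10):
    """Train; while accuracy is below target, tweak the config and retrain."""
    history = []
    for round_number in range(1, max_rounds + 1):
        accuracy = train(config)
        history.append((config, accuracy))
        # Stop when the target is reached or the round budget is used up.
        if accuracy >= target or round_number == max_rounds:
            break
        config = tweak(config)
    return config, history


def fake_train(config):
    """Stand-in for real training: pretend each layer adds 10 points."""
    return 60 + 10 * config["layers"]


def add_layer(config):
    """Return a copy of the config with one more layer."""
    return {**config, "layers": config["layers"] + 1}


best, history = train_until(80, fake_train, add_layer, {"layers": 1})
print(best, history)
```

The max_rounds limit is there so the loop cannot run forever if the target accuracy is never reached.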

If you want more accuracy, you can change 80 to 90 or any other value. Here we have set it to add more layers to increase accuracy. You can add further tweaks, like changing the activation function, to get more accuracy.
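One possible way to express such a tweak step in Python is shown below. It is a sketch under my own assumptions: the config dictionary, the limit of 4 layers and the list of activation functions are all illustrative, not what the Jenkins job in this task does.

```python
ACTIVATIONS = ("relu", "tanh", "sigmoid")  # illustrative choices


def tweak(config, max_layers=4):
    """Add one layer; once max_layers is reached, try the next activation."""
    new_config = dict(config)  # leave the caller's config untouched
    if new_config["layers"] < max_layers:
        new_config["layers"] += 1
    else:
        index = ACTIVATIONS.index(new_config["activation"])
        new_config["activation"] = ACTIVATIONS[(index + 1) % len(ACTIVATIONS)]
        new_config["layers"] = 1
    return new_config
```

Calling tweak repeatedly walks through every layer count for one activation function before moving to the next one.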

The last step is to configure Jenkins to monitor the containers. If any container fails for any reason, Jenkins will restart it as shown below.
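The monitoring check itself is simple, and a hedged Python sketch of it could look like this. It uses the docker inspect and docker start commands; the function names are my own, and the run parameter exists only so the logic can be tried without Docker installed.

```python
import subprocess


def is_running(name, run=subprocess.run):
    """Ask Docker whether the named container is currently running."""
    result = run(
        ["docker", "inspect", "-f", "{{.State.Running}}", name],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and result.stdout.strip() == "true"


def ensure_running(name, run=subprocess.run):
    """Start the container if it is not running; return True if a start was issued."""
    if is_running(name, run):
        return False
    run(["docker", "start", name], capture_output=True, text=True)
    return True
```

A Jenkins job could call ensure_running("task3") on a schedule. Note that docker start brings back the same container with its files intact, but whether training resumes from the last saved model depends on how the training code is written.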

So, this is how we can make our model change automatically, and we can try all combinations of activation functions, layers, etc. to get the best accuracy. For this you only need to be good at scripting and to know the concepts of DevOps and ML/DL. Here I have shown you just a small automation, but you can easily take it to the next level and build your own product like SageMaker.

For this you also need good computing power, which we can easily get from the cloud. If you have any questions about it, you can contact me anytime on LinkedIn or by mail.

This was actually a task given by Vimal Daga sir in his MLOps training. He was the one who gave me a path to think, do and make such things. All credit goes to him 🧡.
