Author: LAKSHAY ARORA | Compiled by: VK | Source: Analytics Vidhya
Overview
- Deploying a machine learning model is a key part of every ML project
- Learn how to deploy machine learning models into production using Flask
- Model deployment is a core topic in data scientist interviews
Introduction
I remember my early days in machine learning. I enjoyed working on multiple problems and was interested in every phase of a machine learning project. Like many before me, I was fascinated by building models and following them through their whole life cycle.
I talked to domain experts, project managers, and everyone involved to make sure their input was included in the model. But then I hit a roadblock — how on earth do I get my model to my client? I can’t give them a Jupyter notebook!
Everything I learned focused on the model building component. Not many people talk about how to deploy your machine learning model. What does it mean to put your model into production? What does it need?
These are key career-defining questions that every data scientist needs to answer. That’s why I decided to write this tutorial to demonstrate how to deploy a machine learning model using Flask.
We’ll first look at the concept of model deployment, then discuss what Flask is, how to install it, and finally, we’ll dive into a problem statement to learn how to deploy machine learning models using Flask.
Table of contents
- What is model deployment?
- What is Flask?
- Installing Flask on your machine
- Understand the problem statement
- Build our machine learning model
- Set up the Twitter API
- Create web pages
- Connect the web page to the model
- Viewing the deployed model
What is model deployment?
In a typical machine learning and deep learning project, we usually start by defining a problem statement, then move on to data collection and preparation, data understanding, and model building, right?
But, in the end, we want our model to be available to end users so they can take advantage of it. Model deployment is one of the final stages of any machine learning project and can be a bit tricky. How do you deliver machine learning models to customers and stakeholders? What do you need to pay attention to when your model goes into production? And how do you start deploying a model?
This is where Flask comes in.
What is Flask?
Flask is a Web application framework written in Python. It has multiple modules that make it easier for Web developers to write applications without worrying about the details of protocol management, thread management, and so on.
Flask is one of the options for developing Web applications, and it provides us with the tools and libraries necessary to build web applications.
In this tutorial, we will use Flask to deploy our own machine learning model. You will love working with Flask!
Installing Flask on your machine
Installing Flask is straightforward. At this point, I assume that you have Python 3 and pip installed. To install Flask on Debian or Ubuntu, run the following command (on other systems, pip3 install flask works as well):
    sudo apt-get install python3-flask
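To check that the installation works, a minimal sketch like the following can help. The route, the function name hello, and the response text are illustrative assumptions rather than part of the project; the built-in test client is used so that no server has to be started.

```python
from flask import Flask

# Create the application object; Flask uses the module name to locate resources.
app = Flask(__name__)


@app.route("/")
def hello():
    # Whatever a view function returns becomes the HTTP response body.
    return "Hello from Flask!"


# Exercise the route with Flask's built-in test client (no running server needed).
with app.test_client() as client:
    response = client.get("/")
    print(response.status_code, response.get_data(as_text=True))
```

If this runs without an ImportError and reports a successful response, Flask is ready to use.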
That’s it! Now we are ready to dive into the problem statement and move one step closer to deploying the machine learning model.
Understand the problem statement
In this section, we will use the Twitter data set. Our goal is to find hate speech on Twitter. For simplicity, if the tweet is racist or sexist, we say it contains hate speech.
We will create a web page containing a text box in which the user can search for any text:
For any search query, we’ll crawl tweets related to that text in real time, and for all of those crawled tweets, we’ll classify racist and sexist tweets using a hate speech detection model.
Setting up the project workflow
- Model building: We will build a logistic regression model pipeline to classify whether tweets contain hate speech or not. Our focus here is not on building a very accurate classification model, but on deploying this model using Flask
- Set up the Twitter application: We will create a Twitter application on the Twitter developer website and get the authentication keys. We’ll write a Python script to fetch the tweets associated with a particular text query
- Web page template: Here we will design a user interface where the user can submit a query
- Get tweets: After getting the query from the user, we will use the Twitter API to get the tweets related to the search query
- Predict the class and send the results: Next, we predict the class of the tweets using the saved model and send the results back to the web page
Here’s a schematic of the steps we just saw:
Build our machine learning model
We have tweet data in a CSV file, with each tweet mapped to a label. We will use a logistic regression model to predict whether tweets contain hate speech.
You can download the full code and data set here.
Github.com/lakshay-aro…
Start by importing some of the required libraries:
    import pandas as pd
    from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split
Next, we’ll read the dataset and look at the top rows:
    data = pd.read_csv('dataset/twitter_content.csv')
    # view the top rows
    data.head()
The dataset has 31,962 rows and 3 columns:
- id: Unique number of each row
- label: 0 for normal tweets and 1 for racist or sexist tweets. There are 29,720 zeros and 2,242 ones
- tweet: The text of a tweet posted on Twitter
Now, we’ll use scikit-learn’s train_test_split function to split the data into training and test sets. We hold out 20% of the data for testing and stratify on the label column so that the target labels are distributed in the same proportion in the training and test data:
    # train test split
    train, test = train_test_split(data, test_size=0.2, stratify=data['label'])
    # get the shape of the train and test splits
    train.shape, test.shape
    ## >> ((25569, 3), (6393, 3))
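To see what stratification does, here is a small self-contained sketch on made-up labels (90 normal, 10 hate speech). The toy data, the variable names, and the random_state value are assumptions for illustration and are not taken from the tutorial's dataset.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy data: 100 fake tweet IDs with a 90/10 label imbalance (illustrative only).
tweet_ids = list(range(100))
labels = [0] * 90 + [1] * 10

ids_train, ids_test, y_train, y_test = train_test_split(
    tweet_ids, labels, test_size=0.2, stratify=labels, random_state=0
)

# With stratify=labels, both splits keep the original 9:1 class ratio.
train_counts = Counter(y_train)
test_counts = Counter(y_test)
print(train_counts, test_counts)
```

Without the stratify argument, a small test set could by chance end up with very few positive examples, which would make the test score unreliable.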
Now, we’ll use TfidfVectorizer to create TF-IDF vectors for the tweet column, setting the lowercase argument to True so that the text is converted to lowercase first. We will also cap max_features at 1000 and pass the predefined list of stop words from scikit-learn.
First, create the TfidfVectorizer object and fit it on the training tweets:
    # create the TfidfVectorizer object
    # (some scikit-learn versions reject a frozenset for stop_words, so pass a list)
    tfidf_vectorizer = TfidfVectorizer(lowercase=True, max_features=1000, stop_words=list(ENGLISH_STOP_WORDS))
    # fit the vectorizer on the training tweets
    tfidf_vectorizer.fit(train.tweet)
Transform the tweets of the training and test data using the fitted vectorizer:
    # transform the train and test data
    train_idf = tfidf_vectorizer.transform(train.tweet)
    test_idf = tfidf_vectorizer.transform(test.tweet)
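As a rough illustration of what these settings do, the sketch below fits a TfidfVectorizer on three invented sentences with a much smaller max_features. The sentences and the value 5 are assumptions chosen only to keep the output readable.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Three made-up "tweets" (illustrative only).
toy_tweets = [
    "I love sunny days",
    "Sunny days make people happy",
    "Rainy days make me sad",
]

# Same idea as in the tutorial, but keeping only the 5 most frequent terms.
vectorizer = TfidfVectorizer(lowercase=True, max_features=5, stop_words="english")
matrix = vectorizer.fit_transform(toy_tweets)

# One row per tweet, one column per retained term.
print(matrix.shape)
print(sorted(vectorizer.vocabulary_))
```

Each row of the matrix is the numeric representation of one tweet, which is what the logistic regression model consumes.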
Now we will create a logistic regression model object.
Remember, our point is not to build a very accurate classification model, but to see how we deploy the prediction model to get results.
    # create the LogisticRegression model object
    model_lr = LogisticRegression()
    # fit the model on the training data
    model_lr.fit(train_idf, train.label)
    # predict the labels of the training data
    predict_train = model_lr.predict(train_idf)
    # predict the labels of the test data
    predict_test = model_lr.predict(test_idf)
    # F1 score on the training data
    f1_score(y_true=train.label, y_pred=predict_train)
    # F1 score on the test data
    f1_score(y_true=test.label, y_pred=predict_test)
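One likely reason to report the F1 score rather than plain accuracy is the class imbalance in this dataset. As a back-of-the-envelope sketch using only the label counts given earlier, consider a trivial baseline that labels every tweet as normal; the baseline itself is an assumption for illustration, not something the tutorial builds.

```python
# Class counts reported for the dataset: 29,720 normal tweets and 2,242 hate speech tweets.
n_normal, n_hate = 29720, 2242
total = n_normal + n_hate

# A trivial baseline that predicts 0 (normal) for every tweet.
true_positives = 0
false_positives = 0
false_negatives = n_hate
true_negatives = n_normal

accuracy = (true_positives + true_negatives) / total

# F1 = 2*TP / (2*TP + FP + FN); treated as 0 when there are no true positives.
denominator = 2 * true_positives + false_positives + false_negatives
f1 = 2 * true_positives / denominator if denominator else 0.0

print(f"accuracy = {accuracy:.3f}, F1 = {f1:.3f}")  # accuracy = 0.930, F1 = 0.000
```

The baseline looks strong on accuracy while finding no hate speech at all, which the F1 score makes visible.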
Let’s define the steps of the pipeline:
- Step 1: Create a TF-IDF vector of the tweet text with 1000 features, as defined above
- Step 2: Use a logistic regression model to predict the target labels
Both steps are performed when we call the fit() function on the pipeline object. After training, we use the predict() function to generate the predictions.
    # define the stages of the pipeline
    pipeline = Pipeline(steps=[('tfidf', TfidfVectorizer(lowercase=True, max_features=1000, stop_words=list(ENGLISH_STOP_WORDS))),
                               ('model', LogisticRegression())])
    # fit the pipeline on the training data
    pipeline.fit(train.tweet, train.label)
Now we’ll test the pipeline with an example tweet:
    # example tweet
    text = ["Virat Kohli, AB de Villiers set to auction their 'Green Day' kits from 2016 IPL match to raise funds"]
    # predict the label using the pipeline
    pipeline.predict(text)
    ## >> array([0])
Having successfully built the machine learning pipeline, we will use the dump function from the joblib library to save the pipeline object. Simply pass the pipeline object and a filename:
    # import joblib
    from joblib import dump
    # save the pipeline model
    dump(pipeline, filename="text_classification.joblib")
It will create a file named “text_classification.joblib”. Now, we’ll open another Python file and use joblib’s load function to load the pipeline model.
Let’s see how to use the saved model:
    # import joblib
    from joblib import load
    # sample tweet text
    text = ["Virat Kohli, AB de Villiers set to auction their 'Green Day' kits from 2016 IPL match to raise funds"]
    # load the saved pipeline model
    pipeline = load("text_classification.joblib")
    # predict on the sample tweet text
    pipeline.predict(text)
    ## >> array([0])
Set up the Twitter API
The first thing we need to do is get the API Key, API Secret Key, Access Token, Access Token Secret from the Twitter developer site. These keys will help the API with authentication. First, go to this page and fill out the form.
Developer.twitter.com/en/apps/cre…
Once you’ve filled out the form, you’ll get the keys.
Install tweepy
Now, we’ll install Tweepy, a Python library that allows us access to the Twitter API.
    !pip3 install tweepy
Import the required libraries and add the authentication key received from Twitter. Tweepy tries to make authentication as painless as possible for you.
To start the process, create an instance of OAuthHandler and pass it the API key and API secret key. The instance is then authorized using the access token and access token secret.
    # import required libraries
    import tweepy
    import time
    import pandas as pd
    pd.set_option('display.max_colwidth', 1000)

    # api key
    api_key = "Enter API Key Here"
    # api secret key
    api_secret_key = "Enter API Secret Key Here."
    # access token
    access_token = "Enter Access Token Here"
    # access token secret
    access_token_secret = "Enter Access Token Secret Here."

    # authorize the API key
    authentication = tweepy.OAuthHandler(api_key, api_secret_key)
    # authorize with the access token and access token secret
    authentication.set_access_token(access_token, access_token_secret)
    # call the API
    api = tweepy.API(authentication, wait_on_rate_limit=True)
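Pasting keys into the script is fine for a quick demo. If you would rather keep them out of the source file, one possible approach is to read them from environment variables, as in the sketch below. The variable names and the helper load_twitter_credentials are hypothetical and not part of the original project.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup.
REQUIRED_VARS = [
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET_KEY",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
]


def load_twitter_credentials(environ=os.environ):
    """Return the four credentials as a dict, or raise if any of them is missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


# Demonstration with a fake environment so the sketch runs without real keys.
fake_env = {name: "dummy-" + name.lower() for name in REQUIRED_VARS}
credentials = load_twitter_credentials(fake_env)
print(sorted(credentials))
```

The returned values could then be passed to tweepy.OAuthHandler and set_access_token in place of the hard-coded strings.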
Next, we’ll define a function “get_related_tweets” that will take the parameter text_query and return 50 tweets related to that particular text query. We’ll use the search API to get the results from Twitter.
Some of the parameters of the search API are:
- q: A search query string of up to 500 characters
- geocode: Returns tweets by users located within a given radius of a given latitude/longitude
- lang: Restricts tweets to a given language, given by its ISO 639-1 code
- result_type: Specifies the type of search results you want to receive. The current default is “mixed”. Valid values include:
    - mixed: Returns both popular and real-time results
    - recent: Returns only the most recent results
    - popular: Returns only the most popular results
- count: The number of results to try to retrieve per page. You can request up to 100 tweets at a time
- max_id: Returns only statuses whose ID is less than (that is, older than) or equal to the specified ID. With this option, you can page through a large number of unique tweets
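The way max_id enables paging can be sketched without touching the real API. In the example below, fake_search is a hypothetical stand-in for a search call (it is not a Tweepy function) that returns made-up tweet IDs, newest first; collect_ids then walks backwards through them page by page.

```python
def fake_search(q, count, max_id=None):
    """Hypothetical stand-in for a search call: newest-first IDs that are <= max_id."""
    all_ids = list(range(1000, 0, -7))  # pretend tweet IDs, largest (newest) first
    eligible = [i for i in all_ids if max_id is None or i <= max_id]
    return eligible[:count]


def collect_ids(q, total, page_size=100):
    """Collect up to `total` unique IDs by requesting successive, older pages."""
    collected = []
    max_id = None
    while len(collected) < total:
        page = fake_search(q, count=min(page_size, total - len(collected)), max_id=max_id)
        if not page:
            break  # nothing older is available
        collected.extend(page)
        # Ask next for tweets strictly older than the oldest one seen so far.
        max_id = page[-1] - 1
    return collected


ids = collect_ids("iplt20", total=120, page_size=50)
print(len(ids), len(set(ids)))
```

Subtracting one from the oldest ID avoids fetching the same tweet twice, since max_id is inclusive.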
We will request 50 tweets for a given text query, along with the time each tweet was created, the tweet ID, and the tweet text. The function returns a DataFrame containing all the tweets:
    def get_related_tweets(text_query):
        # list to store tweets
        tweets_list = []
        # number of tweets to request
        count = 50
        try:
            # pull individual tweets matching the query
            # (Tweepy 4.x renamed this method to api.search_tweets)
            for tweet in api.search(q=text_query, count=count):
                print(tweet.text)
                # add to the list of tweets
                tweets_list.append({'created_at': tweet.created_at,
                                    'tweet_id': tweet.id,
                                    'tweet_text': tweet.text})
            return pd.DataFrame.from_dict(tweets_list)
        except BaseException as e:
            print('failed on_status,', str(e))
            time.sleep(3)
Create web pages
Here we will create a web page that looks something like the following:
It will have a text box in which the user can type a text query and then click the Search button to get the results of the search text query.
We need to add a form tag around the search container to collect the data, with the method set to POST and the input named “search”. This way, our back-end code knows it has received data under the name “search”, which it then needs to process before sending back a response.
The form is only part of the HTML file. You can download the complete code and other files related to the project here.
Github.com/lakshay-aro…
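For readers who want a rough picture of the form before opening the repository, here is a minimal sketch. The markup is an assumption, not the project's actual home.html; it is inlined with render_template_string only so that the example is self-contained, and the placeholder text and button label are invented.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Assumed markup: a POST form containing one text input named "search".
HOME_TEMPLATE = """
<form action="/" method="POST">
  <input type="text" name="search" placeholder="Search tweets">
  <button type="submit">Search</button>
</form>
"""


@app.route("/")
def home():
    # The real project renders home.html from the templates folder instead.
    return render_template_string(HOME_TEMPLATE)


# Fetch the page with the test client and confirm the named input is present.
with app.test_client() as client:
    html = client.get("/").get_data(as_text=True)
    print('name="search"' in html)
```

The name attribute is what matters for the back end: the submitted value is looked up under that key.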
Connect the web page to the model
We’ve finished the front end, so now we connect the model to the web page. The first step is to load the saved pipeline model. We’ll then define a function requestResults that gets the tweets for the requested query, uses the pipeline to predict their labels, and returns the final results to be sent.
    # import the required libraries
    from flask import Flask, render_template, request, redirect, url_for
    from joblib import load
    from get_tweets import get_related_tweets

    # load the saved pipeline model
    pipeline = load("text_classification.joblib")

    # function to get results for a particular text query
    def requestResults(name):
        # get the tweets for the query
        tweets = get_related_tweets(name)
        # predict the label of each tweet
        tweets['prediction'] = pipeline.predict(tweets['tweet_text'])
        # count of each predicted class, followed by the tweets themselves
        data = str(tweets.prediction.value_counts()) + '\n\n'
        return data + str(tweets)
Now, first, create an object of the Flask class, which takes the current module name as an argument. The route decorator tells the Flask application which URL should call which function.
When the Flask server is running, the application routes to the default URL path and calls the home function, which renders the home.html file.
Whenever someone sends a text query, Flask detects the POST method and calls the get_data function, where we use the name search to get the form data and then redirect to the success function.
Finally, the success function uses the requestResults function to get the data and send it back to the web page.
    # start the Flask application
    app = Flask(__name__)

    # render the default web page
    @app.route('/')
    def home():
        return render_template('home.html')

    # when the POST method is detected, redirect to the success function
    @app.route('/', methods=['POST', 'GET'])
    def get_data():
        if request.method == 'POST':
            user = request.form['search']
            return redirect(url_for('success', name=user))

    # get the results for the requested query
    @app.route('/success/<name>')
    def success(name):
        return "<xmp>" + str(requestResults(name)) + " </xmp> "
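The POST, redirect, and success flow described above can be tried out without Twitter credentials or a trained model. The sketch below replaces requestResults with a hypothetical stub, request_results_stub, and drives the routes with Flask's test client; everything else mirrors the routing idea, not the exact project code.

```python
from flask import Flask, redirect, request, url_for

app = Flask(__name__)


def request_results_stub(name):
    # Hypothetical stand-in for requestResults: no Twitter call and no model.
    return "results for " + name


@app.route("/", methods=["POST"])
def get_data():
    # Read the value submitted under the form field named "search".
    query = request.form["search"]
    return redirect(url_for("success", name=query))


@app.route("/success/<name>")
def success(name):
    return request_results_stub(name)


with app.test_client() as client:
    # The POST itself answers with a redirect pointing at /success/<name>.
    first = client.post("/", data={"search": "iplt20"})
    print(first.status_code, first.headers["Location"])

    # Following the redirect reaches the success view.
    final = client.post("/", data={"search": "iplt20"}, follow_redirects=True)
    print(final.get_data(as_text=True))
```

Putting the query into the URL via the redirect means the results page can be reloaded or shared without resubmitting the form.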
Now, call the run function to start the Flask server:
    app.run(debug=True)
Viewing the deployed model
The Flask server has started successfully! Open your browser and go to http://127.0.0.1:5000/. You will see that the Flask server has rendered the default template. Now search for any query, such as iplt20:
The Flask server receives the query iplt20, requests new tweets related to it, and uses the model to predict their labels and return the results.
Amazing! Here, out of 50 tweets, our model predicted 3 that contained hate speech. We can add more features, like requesting tweets from a particular country.
Conclusion
This is how you deploy your model using Flask! Deploying a machine learning model may sound like a complex and onerous task, but once you understand what it is and how it works, you’re halfway there.
Original link: www.analyticsvidhya.com/blog/2020/0…