Abstract: Python has become the most popular language of the machine learning era. Which Python libraries are people using? Today we look at the 10 most popular Python machine learning libraries of 2017.

December is when everyone reviews the achievements of the past year and makes plans for the future. For programmers, it is also a good time to look back at the open source libraries that were released this year or became popular recently, since they are great tools for solving problems for some time to come.



With the rapid development of AI, machine learning (ML) is at the height of its popularity. Today we look at the most popular ML libraries of 2017, hoping that you will find the right tools here for your future work.

1. Pipenv

Pipenv is the officially recommended tool for managing dependencies, and it was released earlier this year. It was originally created by Kenneth Reitz to bring ideas from other package managers, such as npm or yarn, into Python. With it, you no longer need to install virtualenv and virtualenvwrapper separately, and the versions of your dependencies’ own dependencies become reproducible. With Pipenv, you specify all dependencies in a Pipfile, usually by running commands that add, remove, or update dependencies. The tool can generate a Pipfile.lock file that makes your builds deterministic, which helps you avoid hard-to-catch bugs.

2. PyTorch

PyTorch, a deep learning (DL) framework introduced by Facebook this year, has become popular in the deep learning community. PyTorch builds on the popular Torch framework, with the key difference that it is Python-based (Torch itself uses Lua). Given how many people have been using Python for data science over the past few years, it is no surprise that most deep learning libraries are Python libraries.

Most notably, PyTorch has become one of the frameworks of choice for many researchers because it implements the novel dynamic computational graph paradigm. When writing code with frameworks such as TensorFlow, CNTK, or MXNet, you must first define something called a computational graph. This graph specifies all the operations your code will run; the framework later compiles and optimizes it so that it runs faster and in parallel on a GPU. This paradigm is called a static computational graph. It is attractive because you can take advantage of all sorts of optimizations, and the graph, once built, can run on different devices. However, in tasks such as natural language processing, the amount of work is often variable. You can resize images to a fixed resolution before feeding them to an algorithm, but you cannot do the same with sentences, which vary in length. This is exactly where PyTorch and dynamic graphs shine. Because you can use standard Python control flow in your code, the graph is defined at execution time, which gives you a lot of freedom that is essential for several tasks.

Of course, PyTorch calculates gradients automatically, too, and is very fast and scalable.
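
To make the idea of a graph defined at execution time more concrete, here is a minimal sketch in plain Python. It is not PyTorch’s API or implementation: the Value class and the power function are hypothetical names invented for this illustration. Each arithmetic operation records its inputs as it runs, so an ordinary Python loop decides how deep the graph becomes, and gradients are then computed by walking the recorded graph backwards.

```python
class Value:
    """A scalar that remembers how it was computed (hypothetical, not PyTorch)."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents    # Values this one was computed from
        self._grad_fns = grad_fns  # maps the output gradient to each parent's share

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Topologically sort the graph recorded during the forward pass
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Propagate gradients from the output back to the inputs
        for node in reversed(order):
            for parent, fn in zip(node._parents, node._grad_fns):
                parent.grad += fn(node.grad)


def power(x, n):
    """Compute x**n with an ordinary loop; the graph depth depends on n."""
    result = Value(1.0)
    for _ in range(n):
        result = result * x
    return result


x = Value(2.0)
y = power(x, 3)   # three multiplications, recorded while the loop runs
y.backward()
print(y.data, x.grad)
```

Calling power with a different n builds a different graph without any separate graph definition step, which is the freedom described above. A real framework applies the same idea to tensors and adds GPU execution.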

3. Caffe2

As surprising as it may sound, Facebook also released another DL framework this year: Caffe2. The original Caffe framework has been widely used for many years and is known for very good performance and a well-tested code base. However, recent trends in DL have made it look outdated in some ways. Caffe2 is its replacement.

Caffe2 supports distributed training and deployment, as well as the latest CPU and CUDA hardware. Although PyTorch may be better suited for research, Caffe2 is better suited for large-scale deployment. In fact, you can build and train models in PyTorch and deploy them with Caffe2! Isn’t that great?

4. Pendulum

Last year, Arrow, a library designed to make working with dates and times in Python easier, made the list. This year, it is Pendulum’s turn.

One of the nice things about Pendulum is that it is a drop-in replacement for Python’s standard datetime class, so you can easily integrate it with existing code and use its extra functionality only when you need it. The authors take special care to ensure that time zones are handled correctly, making every instance timezone-aware by default. You also get an extended timedelta that makes date-time arithmetic easier.

Unlike other libraries, it strives to make the API behave predictably. If your work involves dates and times, check out the documentation for more.
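
To show why timezone awareness by default matters, here is a small sketch that uses only the standard library’s datetime module, not Pendulum itself. The dates and the fixed UTC+9 offset are arbitrary values chosen for the example.

```python
from datetime import datetime, timedelta, timezone

naive = datetime(2017, 12, 1, 9, 0)                       # no time zone attached
aware = datetime(2017, 12, 1, 9, 0, tzinfo=timezone.utc)  # explicitly UTC

print(naive.tzinfo)  # None
print(aware.tzinfo)  # UTC

# Ordering a naive value against an aware one fails, and only at run time
try:
    naive < aware
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False

# Aware values stay unambiguous when moved to another UTC offset
tokyo = timezone(timedelta(hours=9))
same_instant = aware.astimezone(tokyo)
print(same_instant.hour)      # 18
print(same_instant == aware)  # True
```

With the standard library, every call site has to remember to pass tzinfo. A library that makes instances aware by default removes that whole class of mistakes.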

5. Dash

If you’re doing data science, you probably use excellent tools from the Python ecosystem such as pandas and scikit-learn, and perhaps Jupyter Notebook to manage your workflow. But what do you do when you work with people who don’t know how to use those tools? How do you build an interface that makes it easy for them to play with the data and visualize it along the way? In the past, you might have needed a dedicated JavaScript front-end team to build these GUIs.

Dash is a recently released open source library for building web applications in pure Python, especially applications that make heavy use of data visualization. It is built on top of Flask, Plotly.js, and React, and provides abstractions over them so that you don’t have to learn these frameworks to develop efficiently. To learn more about the interesting apps built with Dash, see the Dash documentation.

6. PyFlux

Python has many libraries for data science and ML, but when your data consists of measurements that change over time (such as stock prices or instrument readings), most of them handle the problem poorly.

PyFlux is an open source Python library developed specifically for working with time series. Time series analysis is a subfield of statistics and econometrics whose goals are to describe how time series behave (in terms of latent components or features of interest) and to predict how they will behave in the future.

PyFlux supports time series modeling and implements several modern time series models, such as GARCH.
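
For readers who have not met GARCH before, the following sketch simulates the simplest variant, GARCH(1,1), in plain Python. It does not use PyFlux or reproduce its API. The function name simulate_garch11 and the parameter values are made up for illustration. In this model, today’s variance depends on yesterday’s squared shock and yesterday’s variance, which is what produces the volatility clustering often seen in financial returns.

```python
import math
import random


def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate n returns from a GARCH(1,1) process.

    sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1]
    eps[t]    = sqrt(sigma2[t]) * z[t],  with z[t] ~ N(0, 1)
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite long-run variance")
    rng = random.Random(seed)
    sigma2 = omega / (1 - alpha - beta)  # start at the long-run variance
    returns, variances = [], []
    for _ in range(n):
        eps = math.sqrt(sigma2) * rng.gauss(0, 1)
        returns.append(eps)
        variances.append(sigma2)
        # Next period's variance reacts to the size of the current shock
        sigma2 = omega + alpha * eps ** 2 + beta * sigma2
    return returns, variances


# Illustrative parameter values, not estimated from any real data
returns, variances = simulate_garch11(1000, omega=0.1, alpha=0.1, beta=0.8)
print(len(returns), min(variances) > 0)
```

A time series library does the reverse of this simulation: given observed returns, it estimates omega, alpha, and beta.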

7. Fire

You will often need to create a command line interface (CLI) for your project. Besides the traditional argparse, Python has several tools for this, such as Click and docopt. Fire, a library released by Google this year, takes a different approach to the problem.

Fire is an open source library that automatically generates a CLI for any Python project. The key word is automatic: you hardly need to write any code or documentation to build your CLI! All you need to do is call a Fire method and pass it whatever you want turned into a CLI.

If you want to learn more about this, read the guide, as this library can save you a lot of time.
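
The idea of deriving a CLI from existing code can be sketched with the standard library alone. The helper below, auto_cli, is a hypothetical name and is not how Fire is implemented. It inspects a function’s signature and builds an argparse parser from it, turning parameters without defaults into positional arguments and parameters with defaults into optional flags. It is deliberately simplistic: it converts flag values using the type of the default, which would misbehave for bool or None defaults.

```python
import argparse
import inspect


def auto_cli(func, argv=None):
    """Build an argparse CLI from func's signature and call func (hypothetical helper)."""
    parser = argparse.ArgumentParser(description=func.__doc__)
    for name, param in inspect.signature(func).parameters.items():
        if param.default is inspect.Parameter.empty:
            parser.add_argument(name)  # required positional argument
        else:
            # Optional flag; reuse the default's type to convert the input
            parser.add_argument("--" + name, default=param.default,
                                type=type(param.default))
    args = parser.parse_args(argv)
    return func(**vars(args))


def greet(name, greeting="Hello", times=1):
    """Return a greeting repeated a number of times."""
    return " ".join(f"{greeting}, {name}!" for _ in range(times))


# Passing argv explicitly keeps the example runnable without a real shell
print(auto_cli(greet, ["World", "--times", "2"]))
```

With Fire itself, the equivalent is a single call such as fire.Fire(greet), and the library also handles classes, objects, and nested commands.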

8. Imbalanced-learn

In an ideal world, we would have perfectly balanced data sets, but unfortunately the real world doesn’t work that way, and some tasks come with very imbalanced data. For example, when predicting fraud in credit card transactions, you would expect the vast majority of transactions (99.9 percent) to be legitimate. Training ML algorithms naively on such data can lead to disappointing performance, so these data sets need special care.

Fortunately, imbalanced-learn is a Python package that implements several known techniques for dealing with such problems. It is compatible with scikit-learn and is part of the scikit-learn-contrib project.
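
One of the simplest rebalancing techniques is random over-sampling, which duplicates randomly chosen minority-class samples until every class is as large as the biggest one. The sketch below shows the idea in plain Python. The function random_oversample and the toy data are invented for this illustration and do not use imbalanced-learn’s API.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until all classes are the same size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw with replacement from the existing samples of this class
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy data: 8 legitimate transactions (label 0) and 2 fraudulent ones (label 1)
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Resampling like this should be applied only to the training split, never to the data used for evaluation. Otherwise, duplicated samples leak into the test set and inflate the scores.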

9. FlashText

If you need to search for some text and replace it with something else (as in most data cleaning pipelines), you usually turn to regular expressions, and in general they work perfectly well. But when the number of terms you need to search for runs into the thousands, regular expressions can become painfully slow. In that case FlashText is a better choice, because it cuts the overall run time dramatically (from 5 days to 15 minutes in the author’s case). The beauty of FlashText is that the run time stays about the same no matter how many search terms there are, whereas with regular expressions the run time grows almost linearly with the number of terms.

FlashText demonstrates the importance of algorithm and data structure design: even for simple problems, a better algorithm can easily outperform a naive implementation running on the fastest CPUs.
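
The reason the number of search terms stops mattering can be illustrated with a character trie. The sketch below is an independent illustration of the trie idea, not FlashText’s code or API. The function names and the sample keywords are made up. All keywords are stored in one trie, and the text is scanned from left to right, so the work done at each position depends on keyword length and not on how many keywords exist. Matching here is case-sensitive and limited to whole words.

```python
def build_trie(replacements):
    """Store every keyword in a character trie; '_end' marks a complete keyword."""
    trie = {}
    for keyword, replacement in replacements.items():
        node = trie
        for ch in keyword:
            node = node.setdefault(ch, {})
        node["_end"] = replacement
    return trie


def is_word_char(ch):
    return ch.isalnum() or ch == "_"


def replace_keywords(text, trie):
    """Replace whole-word keyword matches while scanning text left to right."""
    out = []
    i, n = 0, len(text)
    while i < n:
        at_word_start = i == 0 or not is_word_char(text[i - 1])
        match_end, match_value = None, None
        if at_word_start:
            node, j = trie, i
            while j < n and text[j] in node:
                node = node[text[j]]
                j += 1
                # Accept only matches ending on a word boundary; keep the longest
                if "_end" in node and (j == n or not is_word_char(text[j])):
                    match_end, match_value = j, node["_end"]
        if match_end is None:
            out.append(text[i])
            i += 1
        else:
            out.append(match_value)
            i = match_end
    return "".join(out)


trie = build_trie({"New York": "NYC", "Big Apple": "NYC", "py": "Python"})
print(replace_keywords("I love the Big Apple and New York, not pyramids; py rocks", trie))
```

Adding a thousand more keywords makes the trie larger but does not add more passes over the text. A regular expression built as one large alternation, by contrast, may try many alternatives at each position.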

10. Luminoth

Images are ubiquitous in the real world, and understanding their content is critical for multiple applications. Thankfully, image processing technology has improved a lot thanks to the development of DL.

Luminoth is an open source Python toolkit for computer vision built using TensorFlow and Sonnet. Currently, it supports object detection in the form of a model called Faster R-CNN.

Luminoth is not just an implementation of one specific model. It is built to be modular and extensible, so you can customize existing parts or extend it with new models for different problems while reusing as much code as possible. It takes care of the engineering work needed to build DL models: converting your data into the format used to feed data pipelines (TFRecords for TensorFlow), performing data augmentation, training on multiple GPUs, running evaluation metrics, visualizing results in TensorBoard, and deploying trained models behind a simple API or browser interface for people to use.

Other excellent Python libraries:

1. PyVips

You’ve probably never heard of the libvips library. It is an image processing library, like Pillow or ImageMagick, and it supports a wide range of formats. However, libvips is faster and uses less memory than those libraries. PyVips is a recently released Python binding for libvips that is compatible with Python 2.7 through 3.6 (and even PyPy) and easy to install with pip. If you need some form of image manipulation in your application, consider it.

2. Requestium

There will be times when you need to automate actions on the web, such as crawling websites, testing applications, or filling out web forms, because the site you need to work with doesn’t expose an API. Python has the excellent Requests library, which lets you do some of this. Unfortunately, many sites rely heavily on client-side JavaScript, so the HTML that Requests fetches may not even contain the form you are trying to fill in for your automation task. One solution is to reverse engineer the requests made by the JavaScript code, which can mean spending a lot of time debugging. Another option is to switch to a library like Selenium, which lets you interact with a web browser programmatically and run JavaScript code, and that solves the problem.

The Requestium library works as a drop-in replacement for Requests. It lets you start with Requests and switch seamlessly to Selenium when you need a real browser. It also integrates with Parsel, so writing the selectors for finding elements in a page is much cleaner than it would otherwise be.

3. skorch

Let’s say you enjoy using the scikit-learn API but need to use PyTorch to get things done. Don’t worry: skorch is a package that wraps PyTorch in a scikit-learn-like interface. If you are familiar with both libraries, the syntax will be easy to understand. With skorch, some of the code is abstracted away for you, so you can focus more on the things that really matter, like doing data science.

This article is translated by Ali Yunqi Community Organization.

The original title of this article was top-10-python-libraries-of-2017.

Author: Alan Descoins blog: https://tryolabs.com/blog/authors/alan-descoins/

Translator: The tiger said eight things

The article is a brief translation. For more details, please refer to the original text