Tags: GitHub, Machine Learning, Open Source, Python, scikit-learn

We examine the top Python machine learning open source projects on GitHub, in terms of both contributors and commits, and identify the most popular and most active ones.

By Bhavya Geethika Peddibhotla.

We analyze the top 20 Python machine learning projects on GitHub and find that scikit-learn, Pylearn2 and NuPIC are the projects that receive the most contributions. Explore these popular projects on GitHub!



Fig. 1: Python machine learning projects on GitHub, with color corresponding to the ratio of commits to contributors. Bob, IEPY, Nilearn, and NuPIC have the highest such value.

  1. Scikit-learn, 18845 commits, 404 contributors, www.github.com/scikit-lear… scikit-learn is a Python module for machine learning built on top of SciPy. It features various classification, regression and clustering algorithms including support vector machines, logistic regression, naive Bayes, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
  2. Pylearn2, 7027 commits, 117 contributors, www.github.com/lisa-lab/py… Pylearn2 is a library designed to make machine learning research easy. It is based on Theano.
  3. NuPIC, 4392 commits, 60 contributors, www.github.com/numenta/nup… The Numenta Platform for Intelligent Computing (NuPIC) is a machine intelligence platform that implements the HTM learning algorithms. HTM is a detailed computational theory of the neocortex. At the core of HTM are time-based continuous learning algorithms that store and recall spatial and temporal patterns. NuPIC is suited to a variety of problems, particularly anomaly detection and prediction of streaming data sources.
  4. Nilearn, 2742 commits, 28 contributors, www.github.com/nilearn/nil… Nilearn is a Python module for fast and easy statistical learning on NeuroImaging data. It leverages the scikit-learn Python toolbox for multivariate statistics with applications such as predictive modeling, classification, decoding, or connectivity analysis.
  5. PyBrain, 969 commits, 27 contributors, www.github.com/pybrain/pyb… PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for machine learning tasks and a variety of predefined environments to test and compare your algorithms.
  6. Pattern, 943 commits, 20 contributors, www.github.com/clips/patte… Pattern is a web mining module for Python. It has tools for data mining, natural language processing, network analysis and machine learning. It supports a vector space model, clustering, and classification using KNN, SVM and Perceptron.
  7. Fuel, 497 commits, 12 contributors, www.github.com/mila-udem/f… Fuel provides your machine learning models with the data they need to learn. It has interfaces to common datasets such as MNIST and CIFAR-10 (image datasets) and Google’s One Billion Words (text). It gives you the ability to iterate over your data in a variety of ways, such as in minibatches with shuffled or sequential examples.
  8. Bob, 5080 commits, 11 contributors, www.github.com/idiap/bob Bob is a free signal-processing and machine learning toolbox. The toolbox is written in a mix of Python and C++ and is designed to be both efficient and to reduce development time. It is composed of a reasonably large number of packages that implement tools for image, audio and video processing, machine learning and pattern recognition.
  9. Skdata, 441 commits, 10 contributors, www.github.com/jaberg/skda… Skdata is a library of data sets for machine learning and statistics. This module provides standardized Python access to toy problems as well as popular computer vision and natural language processing data sets.
  10. MILK, 687 commits, 9 contributors, www.github.com/luispedro/m… Milk is a machine learning toolkit in Python. Its focus is on supervised classification with several classifiers available: SVMs, k-NN, random forests, decision trees. It also performs feature selection. These classifiers can be combined in many ways to form different classification systems. For unsupervised learning, milk supports k-means clustering and affinity propagation.
  11. IEPY, 1758 commits, 9 contributors, www.github.com/machinalis/… IEPY is an open source tool for Information Extraction focused on Relation Extraction. It’s aimed at users needing to perform Information Extraction on a large dataset and at scientists wanting to experiment with new IE algorithms.
  12. Quepy, 131 commits, 9 contributors, www.github.com/machinalis/… Quepy is a Python framework to transform natural language questions to queries in a database query language. It can be easily customized to different kinds of questions in natural language and database queries. So, with little coding you can build your own system for natural language access to your database. Currently Quepy provides support for the SPARQL and MQL query languages, with plans to extend it to other database query languages.
  13. Hebel, 244 commits, 5 contributors, www.github.com/hannes-brt/… Hebel is a library for deep learning with neural networks in Python using GPU acceleration with CUDA through PyCUDA. It implements the most important types of neural network models and offers a variety of different activation functions and training methods such as momentum, Nesterov momentum, dropout, and early stopping.
  14. Mlxtend, 135 commits, 5 contributors, www.github.com/rasbt/mlxte… It is a library consisting of useful tools and extensions for day-to-day data science tasks.
  15. Nolearn, 192 commits, 4 contributors, www.github.com/dnouri/nole… This package contains a number of utility modules that are helpful with machine learning tasks. Most of the modules work together with scikit-learn, others are more generally useful.
  16. Ramp, 179 commits, 4 contributors, www.github.com/kvh/ramp Ramp is a Python library for rapid prototyping of machine learning solutions. It’s a light-weight pandas-based machine learning framework pluggable with existing Python machine learning and statistics tools (scikit-learn, rpy2, etc.). Ramp provides a simple, declarative syntax for exploring features, algorithms and transformations quickly and efficiently.
  17. Feature Forge, 219 commits, 3 contributors, www.github.com/machinalis/… A set of tools for creating and testing machine learning features, with a scikit-learn compatible API. This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, etc.), and particularly helpful if you use scikit-learn (although this can work if you have a different algorithm).
  18. REP, 50 commits, 3 contributors, www.github.com/yandex/rep REP is an environment for conducting data-driven research in a consistent and reproducible way. It has a unified classifier wrapper for a variety of implementations such as TMVA, Sklearn, XGBoost and uBoost. It can train classifiers in parallel on a cluster, and it supports interactive plots.
  19. Python Machine Learning Samples, 15 commits, 3 contributors, www.github.com/awslabs/mac… A collection of sample applications built using Amazon Machine Learning.
  20. Python-ELM, 17 commits, 1 contributor, www.github.com/dclambert/P… This is an implementation of the Extreme Learning Machine in Python, based on scikit-learn.
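
The commits-per-contributor value that Fig. 1 encodes as color can be approximated from the counts in the list above. The following is a minimal sketch, assuming the value is simply commits divided by contributors and that every second figure in the list is a contributor count. The names `PROJECTS` and `commits_per_contributor` are illustrative only and do not come from the original analysis.

```python
# Commit and contributor counts copied from the ranked list above.
# Assumption: the second number for every project is its contributor count.
PROJECTS = {
    "scikit-learn": (18845, 404),
    "Pylearn2": (7027, 117),
    "NuPIC": (4392, 60),
    "Nilearn": (2742, 28),
    "PyBrain": (969, 27),
    "Pattern": (943, 20),
    "Fuel": (497, 12),
    "Bob": (5080, 11),
    "Skdata": (441, 10),
    "MILK": (687, 9),
    "IEPY": (1758, 9),
    "Quepy": (131, 9),
    "Hebel": (244, 5),
    "Mlxtend": (135, 5),
    "Nolearn": (192, 4),
    "Ramp": (179, 4),
    "Feature Forge": (219, 3),
    "REP": (50, 3),
    "Python Machine Learning Samples": (15, 3),
    "Python-ELM": (17, 1),
}


def commits_per_contributor(projects):
    """Return (name, ratio) pairs sorted from highest to lowest ratio."""
    ratios = {
        name: commits / contributors
        for name, (commits, contributors) in projects.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Hypothetical helper output: the full ranking by commits per contributor.
ranking = commits_per_contributor(PROJECTS)

if __name__ == "__main__":
    for name, ratio in ranking[:5]:
        print(f"{name:<15}{ratio:8.1f}")
```

A high ratio means a small group of people produced many commits, so it says more about how concentrated the work is than about how popular a project is.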

This post used some content from www.pansop.com/1039/





