By: Water Without Sugar | Produced by: OSC Open Source Community (ID: oschina2013)

InfoWorld has released its list of the best open source software for 2021.

InfoWorld is an international technology media brand that keeps IT decision makers on the cutting edge of technology. Every year it selects the winners of its Best of Open Source Software Awards (Bossies) based on each project’s contribution to the open source community. The awards have been presented for more than a decade.

According to InfoWorld, the 28 winning open source projects represent the best and most innovative software that open source has to offer today, spanning software development, DevOps, cloud native computing, machine learning, and more.

Next, take a look at each project in detail.

Svelte and SvelteKit

InfoWorld comments that of the many innovative, open source, front-end JavaScript frameworks, Svelte and its full-stack counterpart, SvelteKit, are perhaps the most ambitious and far-sighted. Svelte has upended the status quo from its inception by adopting a compile-time strategy and has moved forward with outstanding performance, continued growth, and a great developer experience. SvelteKit, now in public beta, continues the Svelte tradition of making a leap forward by adopting the latest tools and deploying to a serverless environment as a built-in feature.

Minikube

InfoWorld believes that Minikube can be considered an alternative to Docker Desktop. Minikube is a tool for running Kubernetes locally: it creates a single-node Kubernetes cluster in a virtual machine on your laptop, which makes it easy to try Kubernetes or to use it for day-to-day development.

Pixie

Pixie is an observability tool for Kubernetes applications. It provides high-level views of cluster state, such as service maps, cluster resources, and application traffic, and lets you drill down to more detailed views, such as pod status, flame graphs, and individual full-body application requests. Pixie uses eBPF to collect telemetry data automatically, and it collects, stores, and queries all of that data locally in the cluster, using less than 5% of the cluster CPU. Use cases include network monitoring, infrastructure health, service performance, and database query profiling within a cluster.

FastAPI

FastAPI is a high-performance web framework for building APIs. Main features:

  • Fast: Very high performance, on par with Node.js and Go
  • Fast coding: Speeds up feature development by about 200% to 300%
  • Fewer errors: About 40% fewer human-induced errors
  • Intuitive: Powerful editor support, autocompletion everywhere, and less debugging time
  • Simplicity: Designed to be easy to use and learn, reducing the time spent reading documentation
  • Brevity: Reduces code duplication
  • Robust: Produces production-ready code, with automatic interactive documentation
  • Standards-based: Based on, and fully compatible with, the open API standards OpenAPI and JSON Schema

Crystal

Crystal has been in development for several years as a project to provide a programming language with the speed of C and the expressiveness of Ruby. With the release of Crystal 1.0 earlier this year, the language is now stable enough for general workloads. Crystal uses static typing and the LLVM compiler to achieve high speed and to avoid common problems such as null references at runtime. Crystal can interface with existing C code for further speed and convenience, and it can use compile-time macros to extend the syntax of the base language.

Windows Terminal

Windows Terminal is a new, popular, and powerful command-line terminal tool. It has many of the features the community has been asking for, such as multi-tab support, rich text, multi-language support, configurability, themes and styles, emoji support, and GPU-based text rendering. At the same time, the terminal is designed to remain fast and efficient without consuming a lot of memory or power.

InfoWorld says that, given time, Windows Terminal will one day replace the old console in Windows.

OBS Studio

OBS Studio is real-time streaming and screen recording software designed to efficiently capture, compose, encode, record and stream video content across all streaming media platforms.

Features:

  • High-performance real-time video/audio capture and mixing. Create scenes from multiple sources, including window captures, images, text, browser windows, webcams, capture cards, and more.
  • Set up an unlimited number of scenes and switch between them seamlessly through custom transitions.
  • Intuitive audio mixer with per-source filters such as noise gate, noise suppression, and gain. Take full control with VST plug-in support.
  • Powerful and easy-to-use configuration options. Add new sources, duplicate existing ones, and easily adjust their properties.
  • A streamlined settings panel gives users access to a wide range of configuration options for adjusting every aspect of a broadcast or recording.
  • The modular Dock UI allows users to completely rearrange the layout as needed. Users can even pop each individual dock out into its own window.

Shotcut

Shotcut is a cross-platform video editing tool that lets people make all the standard corrections to audio and video tracks while applying effects and layering. Shotcut has a very active community and offers plenty of how-to videos and instructions to help both novice and advanced videographers. It runs on Mac, Linux, BSD, and Windows. Although it is cross-platform, its interface is responsive and relatively simple to use compared to similar tools.

Weave GitOps Core

Weave GitOps supports efficient GitOps workflows for continuous delivery of applications to Kubernetes clusters. It is based on the leading GitOps engine CNCF Flux.

Apache Solr

Apache Solr is a full-text search server based on Lucene and one of the most popular enterprise-class search engines. Apache Lucene is the core search technology behind the search capabilities of much of the software you use, including other search engines such as Elasticsearch. Unlike Elasticsearch, which dropped its open source license, Solr is still free and open source. Solr is clusterable, cloud-deployable, and powerful enough to build cloud-scale search services. It even includes learning-to-rank (LTR) algorithms to help automatically tune and weight results.

MLflow

MLflow, created by Databricks and hosted by the Linux Foundation, is an MLOps platform that lets people track, manage, and maintain various machine learning models, experiments, and their deployment. It gives you tools to record and query experiments (code, data, configuration, results), package data science code into projects, and chain those projects into workflows.

Orange

Orange aims to make data mining “productive and fun”. It lets users build data analysis workflows that perform various machine learning and analysis functions as well as visualizations. Compared to programmatic or text-based tools like RStudio and Jupyter, Orange is very intuitive: you drag widgets onto a canvas to load files, analyze the data with models, and visualize the results.

Flutter

Flutter was built by a team of Google engineers to create high-performance, cross-platform mobile applications. It is optimized for current and future mobile devices, focusing on low-latency input and high frame rates on Android and iOS.

Flutter gives developers a simple and efficient way to build and deploy cross-platform, high-performance mobile applications, and gives users a beautiful, fast, jank-free app experience.

Apache Superset

Apache Superset is an open source data exploration and visualization platform originally created at Airbnb (and formerly known as Panoramix and Caravel). It emphasizes rich visualizations, ease of use, and interactivity, so users can easily visualize and analyze data. Apache Superset is also an enterprise-class business intelligence web application.

Presto

Presto is an open source distributed SQL engine for online analytical processing (OLAP) that runs in a cluster. Presto can query a wide variety of data sources, from files to databases, and return the results to many business intelligence and analytics environments. More importantly, Presto lets you query data where it resides, including Hive, Cassandra, relational databases, and proprietary data stores. A single Presto query can combine data from multiple sources. Facebook uses Presto for interactive queries against several internal data stores, including its 300PB data warehouse.
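
The idea of one query spanning several sources can be sketched with nothing but the Python standard library. The example below does not use Presto; it assumes two separate SQLite databases as stand-ins for independent data sources, and the table names, column names, and rows are made up for illustration.

```python
import sqlite3

# Two separate SQLite databases stand in for two independent data sources.
# (Assumption for illustration: Presto itself would use catalog connectors.)
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

conn.execute("CREATE TABLE main.users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE warehouse.orders (user_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO main.users VALUES (?, ?)", [(1, "ana"), (2, "bo")])
conn.executemany(
    "INSERT INTO warehouse.orders VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (2, 7.5)],
)

# One SQL statement joins tables that live in different databases.
totals = conn.execute(
    """
    SELECT u.name, SUM(o.amount)
    FROM main.users AS u
    JOIN warehouse.orders AS o ON o.user_id = u.id
    GROUP BY u.name
    ORDER BY u.name
    """
).fetchall()
conn.close()

print(totals)
```

In a real deployment the two sides of the join could be entirely different systems, which is the point of a federated engine; here both are SQLite only to keep the sketch self-contained.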

Facebook, Uber, Twitter and Alibaba created the Presto Foundation. Other members now include Alluxio, Ahana, Upsolver and Intel.

Apache Arrow

Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead. Arrow libraries are available for C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.
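
To make the columnar idea concrete, here is a minimal sketch using only the Python standard library. It does not use the Arrow libraries or the actual Arrow format; the `rows` data and column names are hypothetical. Each column is stored as one contiguous typed buffer, and a `memoryview` slice reads part of a column without copying it.

```python
from array import array

# Hypothetical row-oriented records (names and values are illustrative only).
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 12.0},
    {"id": 3, "price": 7.25},
    {"id": 4, "price": 3.0},
]

# Columnar layout: one contiguous, typed buffer per column.
columns = {
    "id": array("q", (r["id"] for r in rows)),        # 64-bit signed integers
    "price": array("d", (r["price"] for r in rows)),  # 64-bit floats
}

# An analytic scan touches only the column it needs.
total_price = sum(columns["price"])

# A memoryview slice shares the underlying buffer instead of copying it.
price_view = memoryview(columns["price"])
last_two = price_view[2:]

# Writing through the original array is visible through the view,
# which shows that no copy was made.
columns["price"][3] = 4.0
print(total_price, last_two.tolist())
```

The same principle, applied to a standardized layout shared across languages, is what lets different tools exchange data without a serialization step.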

InterpretML

InterpretML is an open source explainable AI (XAI) package that contains several state-of-the-art machine learning interpretability techniques. InterpretML lets you train interpretable glass-box models and explain black-box systems. It helps you understand the global behavior of your model or the reasons behind individual predictions. Among its many features, InterpretML includes a “glass box” model from Microsoft Research called the Explainable Boosting Machine, and it supports Lime for post-hoc interpretation through approximations of black-box models.

Lime

Lime (short for Local Interpretable Model-agnostic Explanations) is a post-hoc technique that explains the predictions of any machine learning classifier by perturbing the features of the input and examining the resulting predictions. Lime can explain any black-box classifier with two or more classes, for both text and image domains. Lime is also included in InterpretML.
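
The perturb-and-observe idea can be sketched in a few lines of plain Python. This is not the Lime library or its API: the `black_box` classifier is a hypothetical stand-in, and the estimator is deliberately simplified. Real Lime samples perturbations at random and fits a locally weighted linear model, whereas this sketch enumerates every on/off combination of words and compares average predictions.

```python
from itertools import product

def black_box(words):
    """Stand-in classifier (hypothetical): returns a positive-sentiment score."""
    score = 0.5
    if "great" in words:
        score += 0.4
    if "boring" in words:
        score -= 0.3
    return score

def explain(words, predict):
    """Estimate each word's effect by switching words on and off.

    Simplification: every on/off mask is enumerated, and a word's effect is
    the mean prediction with the word present minus the mean prediction
    with it absent. Assumes the words are unique.
    """
    present = {w: [] for w in words}
    absent = {w: [] for w in words}
    for mask in product([0, 1], repeat=len(words)):
        kept = [w for w, keep in zip(words, mask) if keep]
        p = predict(kept)
        for w, keep in zip(words, mask):
            (present if keep else absent)[w].append(p)
    return {
        w: sum(present[w]) / len(present[w]) - sum(absent[w]) / len(absent[w])
        for w in words
    }

sentence = ["great", "plot", "boring", "ending"]
effects = explain(sentence, black_box)
for word, effect in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"{word:>7}: {effect:+.2f}")
```

The explainer never looks inside `black_box`; it only calls it, which is why this family of techniques works with any classifier.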

Dask

Dask is an open source library for parallel computing that scales Python packages across multiple machines. Dask can distribute data and computation across multiple GPUs, whether in the same system or in a multi-node cluster. Dask integrates with RAPIDS cuDF, XGBoost, and RAPIDS cuML for GPU-accelerated data analysis and machine learning. It also integrates with NumPy, pandas, and scikit-learn to parallelize their workflows.

BlazingSQL

BlazingSQL is a GPU-accelerated SQL engine built on the RAPIDS ecosystem. RAPIDS is based on the Apache Arrow columnar memory format, and cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and manipulating data.

BlazingSQL is an SQL interface to cuDF with various capabilities to support large-scale data science workflows and enterprise data sets.

Rapids

Nvidia’s Rapids suite of open source software libraries and APIs lets you run end-to-end data science and analytics pipelines entirely on GPUs. Rapids uses Nvidia CUDA primitives for low-level compute optimization and exposes the parallelism and high-bandwidth memory speed of the GPU through user-friendly Python interfaces. Rapids relies on the Apache Arrow columnar memory format and includes cuDF, a DataFrame library similar to pandas; cuML, a collection of machine learning libraries that provide GPU versions of most of the algorithms in scikit-learn; and cuGraph, an accelerated graph analysis library similar to NetworkX.

PostHog

PostHog is an open source product analytics platform built for developers. It automatically collects every event on your website or app without sending data to third parties. It provides event-based analytics at the user level, capturing your product usage data so you can see which users are performing which actions in your application. Instead of requiring you to push events manually, it automatically captures clicks and page views to analyze what your users are doing.

LakeFS

LakeFS provides a way to “manage your data lake as you manage your code”, adding a layer of Git-like version control to object storage. Applying Git semantics lets users create their own isolated, zero-copy branches of the data on which to work, experiment, and model analyses without the risk of breaking shared objects. LakeFS brings useful commit notes, metadata fields, and rollback options to your data, as well as validation hooks that maintain data integrity and quality by running format and schema checks before an unvetted branch is accidentally merged back into production. With LakeFS, familiar techniques for managing and securing code bases can be extended to modern data lakes built on object stores such as Amazon S3 and Azure Blob storage.
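
The branching and hook ideas above can be illustrated with a toy in-memory model. This is not the LakeFS API; the `ToyRepo` class, its methods, and the `is_csv_with_header` check are all hypothetical names invented for this sketch. Objects are stored once under a content hash, a branch is only a mapping from paths to hashes, and a merge can be guarded by a validation hook.

```python
import hashlib

class ToyRepo:
    """Hypothetical in-memory model of Git-like branching over an object store."""

    def __init__(self):
        self.objects = {}                 # content hash -> bytes (stored once)
        self.branches = {"main": {}}      # branch name -> {path: content hash}

    def put(self, branch, path, data):
        key = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(key, data)
        self.branches[branch][path] = key

    def get(self, branch, path):
        return self.objects[self.branches[branch][path]]

    def create_branch(self, name, source="main"):
        # Only the path -> hash mapping is copied; object bytes are shared.
        self.branches[name] = dict(self.branches[source])

    def merge(self, source, target="main", hook=None):
        # A pre-merge hook can reject data that fails a format or schema check.
        if hook is not None:
            for path, key in self.branches[source].items():
                if not hook(path, self.objects[key]):
                    raise ValueError(f"validation failed for {path}")
        self.branches[target].update(self.branches[source])

def is_csv_with_header(path, data):
    """Example hook: CSV files must start with the expected header line."""
    return not path.endswith(".csv") or data.startswith(b"id,amount\n")

repo = ToyRepo()
repo.put("main", "sales.csv", b"id,amount\n1,10\n")
repo.create_branch("experiment")
repo.put("experiment", "sales.csv", b"id,amount\n1,10\n2,25\n")

main_before_merge = repo.get("main", "sales.csv")   # main is still untouched
repo.merge("experiment", hook=is_csv_with_header)
main_after_merge = repo.get("main", "sales.csv")
```

Creating the branch copies no data, and work on `experiment` stays invisible to `main` until a merge passes the hook.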

Meltano

Spun out of GitLab this year, Meltano is a free, open source DataOps alternative to a traditional ELT (extract, load, transform) tool chain. Meltano’s data warehouse framework makes it easy to model, extract, and transform data for your projects, and it complements the integration and transformation pipeline with built-in analysis tools and dashboards that simplify reporting. With a reliable library of extractors and loaders, as well as support for Singer-standard data taps and data loading targets, Meltano has become a powerhouse for data orchestration.

Trino

Trino (formerly PrestoSQL) is a distributed SQL analytics engine capable of running extremely fast queries against large distributed data sources. Trino lets you run queries against data lakes, relational stores, or multiple sources simultaneously without copying or moving data for processing. Trino also works well with whatever business intelligence and analytics tools your data scientists might use, whether interactive or ad hoc, which minimizes the learning curve. As data engineers strive to support complex analysis across a growing number of data sources, Trino provides a way to optimize query execution and accelerate results from disparate sources.

StreamNative

StreamNative is a highly scalable messaging and event streaming platform that greatly simplifies building data pipelines for real-time reporting and analysis tools and for enterprise application streams. StreamNative combines Apache Pulsar’s powerful distributed stream processing architecture with enterprise extras such as Kubernetes and hybrid cloud support, a large library of data connectors, easy authentication and authorization, and dedicated tools for health and performance monitoring. It simplifies both the development of Pulsar-based real-time applications and the deployment and management of large-scale messaging backplanes.

Hugging Face

Hugging Face provides the most important open source deep learning repository that is not in itself a deep learning framework. Its goal is to expand beyond text to cover images, audio, video, object detection, and more. InfoWorld points out that deep learning practitioners should keep a close eye on this repository for years to come.

EleutherAI

EleutherAI is a distributed group of machine learning researchers that aims to bring GPT-3 to everyone. At the beginning of 2021, EleutherAI released The Pile, an 825GB diverse text data set for training. In June, it announced GPT-J, a 6 billion parameter model that is roughly equivalent to OpenAI’s Curie variant of GPT-3. With GPT-NeoX, EleutherAI plans to push the parameter count all the way up to 175 billion to compete with the largest GPT-3 model available today.

InfoWorld commented, “Hackers versus the world’s biggest company? That’s the power of open source.”

Colab notebooks for generative art

InfoWorld says that Bossie winners are usually the libraries, frameworks, platforms, and operating systems that form the backbone of open source. This year, however, it argued that some outstanding open source creations of a different kind should also be recognized.

The first is OpenAI’s CLIP (Contrastive Language-Image Pre-training) model, a multimodal model that generates vector embeddings for text and images. While CLIP is completely open source, OpenAI’s generative neural network, DALL-E, is not. To fill this gap, Ryan Murdock and Katherine Crowson developed Colab notebooks that combine CLIP with other open source models, such as BigGAN and VQGAN, to produce prompt-based generative artworks. The MIT-licensed notebooks have spread widely across the internet over the past few months and have been remixed, altered, translated, and used to produce amazing works of art. See ai_curio for details.

These are the InfoWorld Bossie Awards 2021. For more information on the winners and finalists for each of the Awards, see the original website:

www.infoworld.com/article/363…
