SW King

Kaggle: Abstraction and Reasoning Challenge Top 1

 

Background of the problem

Can a computer learn complex, abstract tasks from a few examples?

Current machine learning techniques are data-intensive and brittle: they can only make sense of patterns they have seen before. With current methods, an algorithm can acquire new skills through exposure to large amounts of data, but cognitive abilities that generalize broadly across many tasks remain elusive. This makes it hard to build systems that can handle the variability and unpredictability of the real world, such as home robots or self-driving cars.

However, other approaches, such as inductive programming, offer the potential for more human-like abstraction and reasoning. The Abstraction and Reasoning Corpus (ARC) provides a benchmark for measuring an AI's skill acquisition on unknown tasks, with the constraint that only a few demonstrations are available to learn each complex task. An AI that can do this would be able to quickly solve problems it has never encountered before. The Kaggle Abstraction and Reasoning Challenge invites you to try to bring that future into the present!

The contest was moderated by François Chollet, creator of the Keras neural network library. Chollet's paper On the Measure of Intelligence provides the background and motivation behind the ARC benchmark.

In this competition, you will create an AI that can solve reasoning tasks it has never seen before. Each ARC task contains 3-5 pairs of training inputs and outputs, as well as one test input; you need to predict the corresponding output using the patterns learned from the training examples.

If successful, you'll help bring computers closer to human cognition, and you'll open the door to a whole new set of AI applications!

Evaluation

1. Evaluation metric

The competition uses the top-3 error rate as its evaluation metric. For each task in the test set, you may predict up to 3 outputs per test input grid. Each task output has a ground truth. For a given task output, if any of the 3 predicted outputs matches the ground truth, the error for that task is 0; otherwise, it is 1. The final score is the average error across all tasks.

Mathematically, for each task your algorithm may make up to three predictions $p_1, p_2, p_3$ for the ground-truth output $t$. The error on that task is

$$e = \min_{i \in \{1, 2, 3\}} \mathbb{1}[p_i \neq t]$$

where $\mathbb{1}[p_i \neq t]$ is 0 if prediction $p_i$ exactly matches $t$ and 1 otherwise. The overall error score over $N$ tasks is

$$\text{score} = \frac{1}{N} \sum_{k=1}^{N} e_k$$
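A minimal Python sketch of this metric (the nested-list grid representation and the function names are illustrative assumptions, not competition code):

def task_error(predictions, truth):
    # 0 if any of the (at most 3) predicted grids exactly matches the
    # ground-truth grid, otherwise 1
    return 0 if any(p == truth for p in predictions[:3]) else 1

def overall_score(all_predictions, all_truths):
    # Mean top-3 error over all task outputs
    errors = [task_error(p, t) for p, t in zip(all_predictions, all_truths)]
    return sum(errors) / len(errors)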

2. Output prediction format

The training, evaluation, and test input data are all in the same JSON format, with inputs and outputs stored as 2D Python lists. Output predictions, however, must be flattened into strings, with rows delimited by the | character.

For example, the output [[1, 2], [3, 4]] should be reformatted as |12|34| in your prediction. Each task output can have up to three predictions, which should be separated by spaces. See sample_submission.csv.

The following Python code converts a 2D list pred into the correct format:

def flattener(pred):
    # Render the nested list as text, e.g. [[1, 2], [3, 4]] -> '[[1, 2], [3, 4]]'
    str_pred = str([row for row in pred])
    # Remove the ', ' separators, then turn the brackets into '|' row delimiters
    str_pred = str_pred.replace(', ', '')
    str_pred = str_pred.replace('[[', '|')
    str_pred = str_pred.replace('][', '|')
    str_pred = str_pred.replace(']]', '|')
    return str_pred
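For example, the earlier grid flattens as expected:

print(flattener([[1, 2], [3, 4]]))  # prints |12|34|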

3. Submission file

You may make up to 3 predictions for each output_id in the test set. An output_id consists of the task ID plus an index indicating which output you are predicting for that task. Most tasks have only one test output (index 0), although some tasks have two outputs that must be predicted (indices 0 and 1). The file should contain a header and have the following format:

output_id,output
00576224_0,|32|78| |32|78| |00|00|...
12997ef3_0,|00000000000|01100000000|11000000000|...
12997ef3_1,|00000000000|01100000000|11000000000|...
etc.
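A minimal sketch of assembling such a file with Python's csv module, reusing flattener from above (the predictions dictionary and its contents are placeholders, not real outputs):

import csv

# Placeholder predictions: output_id -> list of up to three 2D grids
predictions = {
    '00576224_0': [[[3, 2], [7, 8]], [[3, 2], [7, 8]]],
}

with open('submission.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['output_id', 'output'])
    for output_id, preds in predictions.items():
        # Up to three flattened grids, separated by single spaces
        writer.writerow([output_id, ' '.join(flattener(p) for p in preds[:3])])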

Data description

The goal of this competition is to create an algorithm that can solve abstract reasoning tasks. The format of the competition is quite different from previous competitions, so please read this information carefully and refer to the supplementary documentation as required.

When viewing a task, a test-taker has access to the inputs and outputs of the training pairs, as well as the input of the test pair. The goal is to construct the output grid corresponding to the test input grid, using three trials per test input. "Constructing the output grid" involves selecting the height and width of the output grid and then filling each cell of the grid with a symbol (an integer between 0 and 9, which can be thought of as a color). Only an exact solution (every cell matches the expected answer) counts as correct.

Additional information, along with an interactive app for exploring the goals of the contest, can be found on the GitHub page of the Abstraction and Reasoning Corpus. It is highly recommended that you download and explore the interactive app; it is the best way to get familiar with the objectives of the contest.

Task file format

Task files are split into two sets:

  • Training: contains the task files for training (400 tasks). Use these to prototype your algorithm, or to train it to acquire ARC-relevant cognitive priors;
  • Evaluation: contains the task files for evaluation (400 tasks).

Tasks are stored in JSON format; each JSON file contains a dictionary with two fields:

  • train: demonstration input/output pairs; a list of “pairs”;
  • test: test input/output pairs; a list of “pairs” (typically a single pair).

A “pair” is a dictionary with two fields: “input”, the input “grid” of the pair, and “output”, the output “grid” of the pair.

A “grid” is a rectangular matrix of integers between 0 and 9 (inclusive). The smallest possible grid size is 1×1 and the largest is 30×30.
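A minimal sketch of reading one task file (the file path is a placeholder):

import json

with open('training/00576224.json') as f:  # placeholder path
    task = json.load(f)

# Demonstration pairs: learn the transformation from these
for pair in task['train']:
    grid_in, grid_out = pair['input'], pair['output']

# Test pairs: predict the output grid for each test input
for pair in task['test']:
    test_input = pair['input']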

Note: For this contest, we retain the terminology of the ARC dataset, but this can create some ambiguity. Both training and evaluation tasks contain training and test pairs of inputs and outputs. The leaderboard is scored using 100 unseen test tasks. Each test task contains training pairs of inputs and outputs, but you are only given the task's test input(s); your algorithm must predict the test output(s). For most of the 100 tasks used for the leaderboard score, only one test input will require a corresponding output prediction, although for a few tasks you will be asked to predict outputs for two test inputs. The output_id in sample_submission.csv combines the task ID with the index of the test input.

Files

  • Training: a folder of task files containing the training data (400 tasks);
  • Evaluation: a folder of task files containing the 400 tasks used for final algorithm evaluation;
  • Test:
    • The test folder on the data page contains the first 100 tasks copied from the evaluation folder, but without answers; its purpose is to ensure that your code runs properly in the synchronous rerun;
    • In the synchronous rerun, the test folder contains the 100 unseen tasks used to score your model; the same 100 unseen tasks are used for the public and private leaderboards. You will not have access to these tasks, and your public/private score is computed during the synchronous rerun;
  • sample_submission.csv: a correctly formatted sample submission file. The output_id column contains the task and test output IDs that need to be predicted. The list of output_ids in the sample submission on this data page will differ from the one in the synchronous notebook rerun.

Top 1 solution explained

The main component of the solution is a DSL that applies up to 4 of 142 unary transformations (based on 42 different functions, some of which have multiple variants). I enumerate the transformations efficiently by eliminating duplicates, and then combine them by greedily stacking them to fit the training samples. Everything is implemented in C++ (with no dependencies) and runs in parallel. A simple scheduler tries to make full use of the 9-hour / 16 GB memory budget.

1. Transformation selection and engineering

The most important image transformations (a Python sketch of a few of them follows the list):

  • Cut (image) -> list of images
    • Tries to find a background color and splits the remaining pixels into corner-connected groups;
  • filterCol (image, color) -> image
    • Deletes all colors except the given one (sets them to 0);
  • colShape (image, color) -> image
    • Changes all non-zero pixels to “color”;
  • composeGrowing (list of images) -> image
    • Stacks the list of images on top of each other, treating 0 as transparent; the image with the fewest non-zero pixels ends up on top;
  • compress (image) -> image
    • Extracts the smallest subimage containing all non-zero pixels;
  • rigid (image, id) -> image
    • Rotations (0/90/180/270) and flips;
  • pickMax (list of images, id) -> image
    • Picks the image maximizing some attribute; e.g. id=0 picks the image with the most non-zero pixels.
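The original implementation is in C++; below is a rough Python sketch of three of these primitives, assuming a grid is a list of lists of integers with 0 as the background color (the details are illustrative, not the author's code):

def filterCol(image, color):
    # Keep only the given color; every other pixel becomes 0
    return [[v if v == color else 0 for v in row] for row in image]

def colShape(image, color):
    # Recolor every non-zero pixel to the given color
    return [[color if v else 0 for v in row] for row in image]

def compress(image):
    # Smallest subgrid containing all non-zero pixels
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    if not rows:
        return [[0]]  # all-background image: collapse to a 1x1 grid
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]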

I built up the set of transformations by manually solving 100 training tasks and 100 evaluation tasks and then extracting the useful functions, generalizing when it seemed reasonable (such as adding all rotations if I used one of them). I didn't try to prune the transformations, because the given tasks didn't seem representative of what was needed on the leaderboard.

The transformations stack very well, and even solved a few tasks that I had originally solved manually using transformations the model didn't have available.
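As a rough illustration of the search, here is a simplified Python sketch of depth-limited enumeration with deduplication (the real solution is C++, and its greedy stacker combines partially matching pipelines rather than requiring a single exact end-to-end fit as this sketch does):

def search(transforms, train_pairs, max_depth=3):
    # transforms: dict mapping a name to a unary image -> image function
    targets = [p['output'] for p in train_pairs]
    frontier = [((), [p['input'] for p in train_pairs])]
    seen = set()
    for _ in range(max_depth):
        next_frontier = []
        for pipeline, images in frontier:
            for name, fn in transforms.items():
                try:
                    outs = [fn(img) for img in images]
                except Exception:
                    continue  # transformation not applicable to these images
                key = repr(outs)
                if key in seen:
                    continue  # deduplicate pipelines with identical results
                seen.add(key)
                if outs == targets:
                    return pipeline + (name,)  # fits every training sample
                next_frontier.append((pipeline + (name,), outs))
        frontier = next_frontier
    return None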

2. Ensembling

For the final model, I ran four different configurations and ensembled the predicted results.

I run the transformation search at depth 3; at depth 3 with diagonally flipped versions of the samples added (doubling the work for each of the two diagonal flips); and finally at depth 4 until I run out of time or memory. The best predictions are chosen by the following criteria, from most important to least important (a small ranking sketch follows the list):

  • Handles the largest number of training samples;
  • Has the lowest search depth;
  • Stacks the fewest images in the greedy stacker.
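A minimal sketch of ranking candidates by these criteria (the field names are assumptions for illustration):

def rank_candidates(candidates):
    # Best first: most training samples handled, then lowest depth,
    # then fewest images used by the greedy stacker
    return sorted(candidates,
                  key=lambda c: (-c['train_handled'], c['depth'], c['stacked']))

# The top 3 ranked predictions become the submission for an output_id:
# top3 = [c['prediction'] for c in rank_candidates(candidates)[:3]]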

3. Tricks

A simple trick that gave me significantly better scores was augmenting my samples with diagonally flipped versions of each task. Preprocessing all samples by remapping colors according to a heuristic also worked surprisingly well.
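A minimal sketch of the diagonal-flip augmentation and a color-remapping heuristic (the specific remapping shown, ordering colors by frequency, is an assumption; the write-up does not spell out the exact rule):

from collections import Counter

def transpose(grid):
    # Diagonal flip: reflect a grid across its main diagonal
    return [list(row) for row in zip(*grid)]

def flip_pairs(pairs):
    # Augmentation: a diagonally flipped copy of each training pair
    return [{'input': transpose(p['input']), 'output': transpose(p['output'])}
            for p in pairs]

def remap_colors(grids):
    # Relabel non-background colors consistently across a task's grids,
    # most frequent color first (illustrative heuristic; 0 stays background)
    counts = Counter(v for g in grids for row in g for v in row if v != 0)
    mapping = {0: 0}
    for new, (old, _) in enumerate(counts.most_common(), start=1):
        mapping[old] = new
    return [[[mapping[v] for v in row] for row in g] for g in grids]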

I believe my main advantage over the competition was my background in competitive programming. It let me write a large number of image transformations in C++ quickly and efficiently, which allowed me to search far more transformation combinations than Python implementations or other less optimized solutions could.

4. Running time

A natural way to make my method run faster is to reduce the search depth. Given the full 9 hours, I could run about half of the problems at depth 4; depth 3 is roughly 20 times faster (and uses 20 times less memory). During development I would run at depth 2, which is another 15 times faster than depth 3 while still solving 80% of the tasks on the evaluation set.

References

  1. www.kaggle.com/c/abstracti…
  2. www.kaggle.com/c/abstracti…