Learning Automata and Languages
This chapter introduces the problem of learning languages. This is a classical problem, explored since the early days of formal language theory and computer science, and there is a very large body of literature dealing with related mathematical questions. We give a brief introduction to this problem, with particular attention to the problem of learning finite automata, which itself has been studied in many forms in thousands of technical papers. We study two frameworks for learning automata and give an algorithm for each. In particular, we describe an algorithm for learning automata when the learner has access to several types of queries, and we discuss an algorithm for identifying a subfamily of automata in the limit.
16.1 Introduction
Language learning is one of the earliest problems discussed in linguistics and computer science. It is motivated by the remarkable ability of humans to learn natural languages. Humans are able to produce well-formed new sentences at an early age, despite being exposed to only a limited number of sentences, and, even at an early age, they can make accurate judgments about the grammaticality of new sentences. In computer science, learning a language is directly related to learning the representation of the computational device that generates that language: learning regular languages, for instance, is equivalent to learning finite automata, and learning context-free languages is equivalent to learning pushdown automata. There are several reasons for studying specifically the problem of learning finite automata. Automata provide natural modeling representations in a variety of domains, including systems, networks, image processing, text and speech processing, logic, and many others.
Figure 16.1 (a) Graphical representation of a finite automaton. (b) Equivalent (minimal) deterministic automaton.
Automata can also serve as simpler or more efficient approximations for more complex devices; in natural language processing, for example, they can be used to approximate context-free languages. Learning automata is therefore of interest wherever they apply, but, as we shall see, the problem is hard in many natural settings. As a result, learning more complex devices or languages is even harder. We consider two general learning frameworks: the model of efficient exact learning and the model of identification in the limit. For each model, we briefly discuss the problem of learning automata and describe an algorithm. The chapter first briefly reviews basic definitions and algorithms related to automata, then discusses the efficient exact learning of automata, and finally their identification in the limit.
16.2 Finite Automata
We will denote by $\Sigma$ a finite alphabet. The length of a string $x \in \Sigma^*$ over that alphabet is denoted by $|x|$. The empty string is denoted by $\epsilon$, thus $|\epsilon| = 0$. For any string $x = x_1 \cdots x_k \in \Sigma^*$ of length $k > 0$, we denote by $x[j] = x_1 \cdots x_j$ its prefix of length $j \leq k$.
Definition 16.1 (Finite automata)
A finite automaton $A$ is a 5-tuple $(\Sigma, Q, I, F, E)$, where $\Sigma$ is a finite alphabet, $Q$ a finite set of states, $I \subseteq Q$ a set of initial states, $F \subseteq Q$ a set of final states, and $E \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times Q$ a finite set of transitions. Figure 16.1a shows a simple example of a finite automaton. States are represented by circles. A bold circle indicates an initial state and a double circle a final state. Each transition is represented by an arrow from its origin state to its destination state, labeled with an element of $\Sigma \cup \{\epsilon\}$.
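To make the 5-tuple concrete, here is a minimal sketch of how such an automaton could be encoded. The class name `Automaton`, the field names, and the use of the empty Python string for the label $\epsilon$ are our own illustrative choices, not notation from the text.

```python
from dataclasses import dataclass

EPSILON = ""  # the empty Python string stands for the empty-string label epsilon


@dataclass(frozen=True)
class Automaton:
    """A finite automaton as the 5-tuple (Sigma, Q, I, F, E)."""
    alphabet: frozenset    # Sigma: finite alphabet
    states: frozenset      # Q: finite set of states
    initial: frozenset     # I, a subset of Q: initial states
    final: frozenset       # F, a subset of Q: final states
    transitions: frozenset # E: set of (origin, label, destination) triples,
                           # with label in the alphabet or EPSILON

# The automaton of figure 16.1a would be built by listing its states and
# transition triples explicitly.
```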
A path from an initial state to a final state is called an accepting path. An automaton is said to be trim if all of its states are accessible from an initial state and admit a path to a final state. A string $x \in \Sigma^*$ is accepted by the automaton $A$ if and only if it labels an accepting path. For convenience, we will say that $x \in \Sigma^*$ is rejected by $A$ when it is not accepted by $A$. The set of all strings accepted by $A$ defines the language accepted by $A$, denoted by $L(A)$. The family of languages accepted by finite automata coincides with that of regular languages, that is, languages that can be described by regular expressions. Any finite automaton admits an equivalent automaton with no $\epsilon$-transition, that is, no transition labeled with the empty string: there exists a general $\epsilon$-removal algorithm that takes an automaton as input and returns an equivalent automaton with no $\epsilon$-transition. An automaton with no $\epsilon$-transition is deterministic if it has a unique initial state and if no two transitions sharing the same label leave any given state. Deterministic finite automata are often referred to by the acronym DFA, and arbitrary automata by the acronym NFA, for non-deterministic finite automata. Any NFA admits an equivalent DFA: there exists a general (exponential-time) determinization algorithm that takes as input an NFA with no $\epsilon$-transition and returns an equivalent DFA. Thus, the class of languages accepted by DFAs coincides with that accepted by NFAs, namely the regular languages. For any string $x \in \Sigma^*$ and any DFA $A$, we denote by $A(x)$ the state reached in $A$ when reading $x$ from its unique initial state. A DFA is said to be minimal if it admits no equivalent deterministic automaton with fewer states. There exists a general minimization algorithm that takes a deterministic automaton as input and returns a minimal one, running in time $O(|E| \log |Q|)$. When the input DFA is acyclic, that is, when it admits no path forming a cycle, it can be minimized in linear time $O(|Q| + |E|)$. Figure 16.1b shows the minimal DFA equivalent to the NFA of figure 16.1a.
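To make the notation $A(x)$ and the acceptance condition concrete, the following sketch (our own illustration, with hypothetical helper names) reads a string symbol by symbol through a DFA stored as a transition dictionary and checks whether the state reached is final.

```python
def dfa_state(delta, initial, x):
    """Return A(x): the state reached in a DFA when reading x from its unique
    initial state, or None if some transition is missing (x is then rejected).

    delta: dict mapping (state, symbol) -> next state
    """
    state = initial
    for symbol in x:
        state = delta.get((state, symbol))
        if state is None:
            return None
    return state


def dfa_accepts(delta, initial, final, x):
    """A string is accepted iff reading it ends in a final state."""
    return dfa_state(delta, initial, x) in final


# Hypothetical two-state DFA over {a, b} accepting the strings ending in 'a':
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}
assert dfa_accepts(delta, initial=0, final={1}, x="bba")
assert not dfa_accepts(delta, initial=0, final={1}, x="ab")
```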
16.3 Efficient Exact Learning
In the efficient exact learning framework, the problem consists of identifying a target concept $c$ exactly from a finite set of examples, in time polynomial in the size of the representation of the concept and in an upper bound on the size of the representation of an example. In this model, unlike the PAC-learning framework, there is no stochastic assumption: instances are not assumed to be drawn according to some unknown distribution. Moreover, the objective is to identify the target concept exactly, without any approximation. A concept class $\mathcal{C}$ is said to be efficiently exactly learnable if there is an algorithm that efficiently exactly learns any $c \in \mathcal{C}$. We will consider two different scenarios within the framework of efficient exact learning: a passive and an active learning scenario. The passive learning scenario is similar to the standard supervised learning scenario discussed in previous chapters, but without any stochastic assumption: the learning algorithm passively receives data instances, as in the PAC model, and returns a hypothesis, but here the instances are not assumed to be drawn from any distribution. In the active learning scenario, the learner actively participates in the selection of the training samples by using various types of queries that we will describe. In both cases, we will focus more specifically on the problem of learning automata.
16.3.1 Passive learning
The problem of learning finite automata in this scenario is known as the minimum consistent DFA learning problem. It can be formulated as follows: the learner receives a finite sample $S = ((x_1, y_1), \ldots, (x_m, y_m))$ with $x_i \in \Sigma^*$ and $y_i \in \{-1, +1\}$; if $y_i = +1$, then $x_i$ is an accepted string, otherwise it is rejected. The problem consists of using this sample to learn the smallest DFA $A$ consistent with $S$, that is, the automaton with the smallest number of states that accepts the strings of $S$ labeled with $+1$ and rejects those labeled with $-1$. Note that seeking the smallest DFA consistent with $S$ can be viewed as an instance of Occam's razor. The problem just described is distinct from the standard minimization of DFAs. A minimal DFA accepting exactly the positively labeled strings of $S$ may not have the smallest number of states: in general, there may be DFAs with fewer states that accept a superset of these strings while rejecting the negatively labeled sample strings. For example, in the simple case $S = ((a, +1), (b, -1))$, the minimal deterministic automaton accepting the unique positively labeled string $a$ (or the unique negatively labeled string $b$) has two states; however, the deterministic automaton accepting the language $a^*$ accepts $a$, rejects $b$, and has only one state. Passive learning of finite automata turns out to be a computationally hard problem. The following theorems give several known negative results for this problem.

Theorem 16.2 The problem of finding the smallest deterministic automaton consistent with a set of accepted or rejected strings is NP-complete.

Hardness results are known even for a polynomial approximation, as stated by the following theorem.

Theorem 16.3 If P ≠ NP, then no polynomial-time algorithm is guaranteed to find a DFA consistent with a set of accepted or rejected strings whose size is smaller than a polynomial function of the size of the smallest consistent DFA, even when the alphabet is reduced to just two elements.

Other strong negative results are known for the passive learning of finite automata under various cryptographic assumptions. These negative results invite us to consider alternative learning scenarios for finite automata. The next section describes a scenario leading to more positive results, in which the learner can actively participate in the data selection process using various types of queries.
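Returning to the example above, the notion of consistency is easy to express in code. The sketch below, with hypothetical names of our own, checks the consistency of a candidate acceptor with a labeled sample and verifies that on $S = ((a, +1), (b, -1))$ the one-state acceptor of $a^*$ is consistent, just like the larger acceptor of the single string $a$.

```python
def consistent(accepts, sample):
    """An acceptor is consistent with S if it accepts every string labeled +1
    and rejects every string labeled -1. `accepts` is any predicate on strings."""
    return all(accepts(x) == (y == +1) for x, y in sample)


S = [("a", +1), ("b", -1)]

# One-state DFA accepting the language a*: it accepts exactly the strings made of a's.
accepts_a_star = lambda x: all(symbol == "a" for symbol in x)
assert consistent(accepts_a_star, S)

# The two-state DFA accepting only the string 'a' is also consistent, but larger.
accepts_only_a = lambda x: x == "a"
assert consistent(accepts_only_a, S)
```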
16.3.2 Learning with Queries
The model of learning with queries corresponds to that of a (minimal) teacher or oracle and an active learner. In this model, the learner can make the following two types of queries, to which the oracle responds. Membership query: the learner requests the target label $f(x) \in \{-1, +1\}$ of an instance $x$ and receives that label. Equivalence query: the learner conjectures a hypothesis $h$; it receives the response yes if $h = f$, a counter-example otherwise. We will say that a concept class $\mathcal{C}$ is efficiently exactly learnable with membership and equivalence queries when it is efficiently exactly learnable within this model. This model is not realistic, since no such oracle is typically available in practice. Nevertheless, it provides a natural framework which, as we shall see, leads to positive results. Note also that for this model to be meaningful, equivalence must be computable; this is not the case for some concept classes, such as context-free grammars, for which the equivalence problem is undecidable. Furthermore, testing equivalence must be efficient; otherwise, it could not provide a response to the learner within a reasonable amount of time. For a human oracle, answering membership queries can also become very difficult in some cases, for instance when the query is close to the class boundary, which also makes the model hard to adopt in practice. Efficient exact learning within this model of learning with queries implies the following variant of PAC-learning: we will say that a concept class $\mathcal{C}$ is PAC-learnable with membership queries if it is PAC-learnable by an algorithm that has access to a polynomial number of membership queries.

Theorem 16.4 Let $\mathcal{C}$ be a concept class that is efficiently exactly learnable with membership and equivalence queries. Then $\mathcal{C}$ is PAC-learnable with membership queries.

Proof: Let $\mathcal{A}$ be an algorithm that efficiently exactly learns $\mathcal{C}$ using membership and equivalence queries. Fix $\epsilon, \delta > 0$. In the execution of $\mathcal{A}$ for learning a target $c \in \mathcal{C}$, we replace each equivalence query by a test of the current hypothesis on a polynomial number of labeled examples. Let $\mathcal{D}$ be the distribution according to which points are drawn. To simulate the $t$-th equivalence query, we draw $m_t = \frac{1}{\epsilon}\big(\log\frac{1}{\delta} + t \log 2\big)$ points i.i.d. according to $\mathcal{D}$ to test the current hypothesis $h_t$. If $h_t$ is consistent with all of these points, the algorithm stops and returns $h_t$. Otherwise, one of the points drawn does not belong to $h_t$, which provides a counter-example. Since $\mathcal{A}$ learns $c$ exactly, it makes at most $T$ equivalence queries, where $T$ is polynomial in the size of the representation of the target concept and in the upper bound on the size of the representation of an example. Thus, if every simulated equivalence query before the last is answered with a counter-example, the algorithm terminates after at most $T$ equivalence queries and returns the correct concept $c$. Otherwise, the algorithm stops at the first equivalence query to which the simulation responds positively. The hypothesis it returns fails to be an $\epsilon$-approximation only if the equivalence query stopping the algorithm is incorrectly answered positively.
Since for any fixed $t \in [T]$, $\mathbb{P}[R(h_t) > \epsilon] \leq (1-\epsilon)^{m_t} \leq e^{-\epsilon m_t}$, the probability that $R(h_t) > \epsilon$ for some $t \in [T]$ can be bounded by:
$$\sum_{t=1}^{T} e^{-\epsilon m_t} = \sum_{t=1}^{T} \delta\, 2^{-t} \leq \delta.$$
Thus, with probability at least $1-\delta$, the hypothesis returned by the algorithm is an $\epsilon$-approximation. Finally, the maximum total number of points drawn is $\sum_{t=1}^{T} m_t = \frac{1}{\epsilon}\big(T \log\frac{1}{\delta} + \frac{T(T+1)}{2}\log 2\big)$, which is polynomial in $1/\epsilon$, $1/\delta$, and $T$. Since the remaining computational cost of $\mathcal{A}$ is also polynomial by assumption, this proves PAC-learning with membership queries.
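The central step of the proof above is the replacement of each equivalence query by a statistical test. The following sketch illustrates that simulation under our own naming assumptions: `sample_point` draws a point from the unknown distribution $\mathcal{D}$, `hypothesis` and `target` are membership predicates for $h_t$ and $c$, and $m_t$ follows the formula used in the proof.

```python
import math


def simulate_equivalence_query(hypothesis, target, sample_point, t, eps, delta):
    """Simulate the t-th equivalence query by testing the current hypothesis
    on m_t points drawn i.i.d. from D.

    Returns None if the hypothesis agrees with the target on all drawn points
    (simulated answer: "yes"), otherwise returns a counter-example.
    """
    m_t = math.ceil((math.log(1 / delta) + t * math.log(2)) / eps)
    for _ in range(m_t):
        x = sample_point()              # draw x ~ D
        if hypothesis(x) != target(x):  # disagreement found
            return x                    # counter-example handed back to the learner
    return None                         # no disagreement: stop and return h_t
```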
16.3.3 Learning Automata with Queries
In this section, we describe an algorithm for the efficient exact learning of DFAs with membership and equivalence queries. We denote by $A$ the target DFA and by $\widehat{A}$ the DFA that is the current hypothesis of the algorithm. Without loss of generality, we assume that $A$ is a minimal DFA. The algorithm uses two sets of strings, $U$ and $V$. $U$ is a set of access strings: reading an access string $u \in U$ from the initial state of $A$ leads to a state $A(u)$. The algorithm ensures that the states $A(u)$, $u \in U$, are all distinct. To do so, it uses a set $V$ of distinguishing strings. Since $A$ is minimal, for any two distinct states $q$ and $q'$ of $A$ there must exist at least one string leading to a final state from $q$ and not from $q'$, or vice versa; such a string helps distinguish $q$ and $q'$. The sets $U$ and $V$ are illustrated by the classification tree of figure 16.2.
Figure 16.2 Classification tree, with the sets $U$ and $V$ of access and distinguishing strings used to construct the current automaton $\widehat{A}$; the distinguishing strings help distinguish any pair of access strings for the target automaton.
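One way to picture the role of the distinguishing strings in code is through signatures: the vector of membership-query answers for $u \cdot v$, $v \in V$, which separates access strings once $V$ distinguishes the corresponding states. The sketch below is our own illustration of this idea (a plain linear scan rather than the classification tree of figure 16.2), with `member` standing for the membership-query oracle.

```python
def signature(member, u, V):
    """Signature of an access string u with respect to the distinguishing
    strings V: the tuple of membership answers for u.v, v in V.

    member: membership-query oracle, member(x) in {True, False}
    """
    return tuple(member(u + v) for v in V)


def find_equivalent_access_string(member, U, V, w):
    """Return the access string in U whose state is not distinguished from the
    state reached by w (same signature), or None if w reaches a state
    distinguished from all of them."""
    sig_w = signature(member, w, V)
    for u in U:
        if signature(member, u, V) == sig_w:
            return u
    return None
```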
Together, $U$ and $V$ define a partition of all strings. The objective of the algorithm is to find, at each iteration, a new access string distinguished from all previous access strings, until the number of access strings equals the number of states of $A$. The states of $A$ can then be identified with the access strings $u$, via $A(u)$. To find the destination of the transition labeled with $a \in \Sigma$ leaving state $A(u)$, it suffices to determine, using the partition induced by $V$, the access string $u'$ that belongs to the same equivalence class as $ua$. The finality of each state can be determined in a similar way.
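Putting these pieces together, once the access strings identify all the states of the target, the transitions and final states of the hypothesis automaton can be filled in with membership queries alone. The sketch below is a simplified illustration reusing the hypothetical `signature` helpers from the previous sketch; it assumes, as is standard for this construction, that the empty string belongs to both $U$ and $V$, and it is not the full classification-tree algorithm.

```python
def build_hypothesis(member, U, V, alphabet):
    """Build a hypothesis DFA whose states are the access strings in U.

    The destination of the transition labeled a leaving state u is the access
    string equivalent to u.a under the partition induced by V; a state u is
    final iff member(u) is positive (this uses the assumption that the empty
    string is one of the distinguishing strings).
    """
    delta = {}
    for u in U:
        for a in alphabet:
            delta[(u, a)] = find_equivalent_access_string(member, U, V, u + a)
    final = {u for u in U if member(u)}  # finality via one membership query per state
    initial = ""                         # the empty access string reaches the initial state
    return delta, initial, final
```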