Melting Down the Number of Variables for the Inclusion Problem for Pattern Languages and Logspace-Decidability of Determinism of Regular Languages

Abstract

This dissertation examines a total of three subject areas: The construction of a deterministic 2-counter-automaton with input for simulating a universal Turing machine with input. The more in depth investigation of the inclusion problem for erasing and non-erasing pattern languages in comparison to preceding publications. And the problem of deciding whether a language is deterministic regular if it is given in the form of a deterministic finite automaton, a nondeterministic finite automaton or a regular expression.

A deterministic 2-counter-automaton with input has a finite set of states. The machine has two counters as memory in which natural numbers are stored. The input is initially stored in the first counter and the second counter is set to zero. The current state and the information about whether the natural numbers stored in the counters are equal to zero or greater than zero are the input for the transition function. With each transition, the counters can be increased or decreased independently of each other by a maximum of one. A counter can only be reduced if the counter is not zero. As the transition function does not have to be completely defined, the machine can terminate. If it terminates in an accepting state, then the machine accepts the input.

It is already known that it is possible to transform a Turing machine with input into a 2-counter-automaton with input. This requires two transfer steps, which increase the set of states so much that a practical construction does not make sense. They are purely conceptual constructions. You would first convert the Turing machine into a 5-counter-automaton and then simulate this 5-counter-automaton by a 2-counter-automaton. Through a construction without intermediate steps and through some optimizations, I directly construct a 2-counter-automaton with input. The basis for this is an already known universal Turing machine with just 15 states and a binary tape alphabet, where one of the two tape symbols is also the blank symbol. This means that the contents of the tape to the left of the head position and to the right of the head position can each be interpreted as a natural number. Let these natural numbers be l and r. Then the input for the 2-counter-automaton has the form 3l · 5 r , from which l and r can be calculated. The symbol at the head is stored in the states and is initially always the same for the Turing machine. Sub-automata can now be constructed that can check the divisibility properties of the counters using loops and can multiply and divide the counters with small numbers. Thus, for each transition of the original universal Turing machine, a sub-automaton of the 2-counter-automaton is constructed. Once all of these required sub-automata are constructed, they can be put together to get a 2-counter-automaton with input, which consists of 1151 states and 1682 transitions, simulating the universal Turing machine with input.

In addition to the first direct construction of such a 2-counter-automaton, such an automaton is required for the chapter on pattern languages. By this construction exact numbers can be used in the needed patterns instead of variables representing these numbers, as the number of states and the number of transitions of such a 2-counterautomaton are now known.

A pattern is a string composed of symbols from a finite terminal alphabet and a variable alphabet of unlimited size. As an example, take the terminal alphabet, which consists only of the characters 0 and 1 and the variables x and y. Each of these characters can appear any number of times in a pattern. The pattern x 1 y would represent the language containing exactly those words over the terminal alphabet that contain at least one 1. We get these words by replacing the variables x and y respectively with any words over the terminal alphabet. A distinction is made whether variables can also be replaced by the empty word. If so, then this is called an erasing pattern language, and if not, then it is called a nonerasing pattern language. The nonerasing pattern language for the pattern x 1 y would then be a little more complicated, namely exactly the words over the terminal alphabet that contain at least one 1 that is not at the beginning or at the end of the words. For example, things become more complex with the pattern x 1 x, as the variable x now appears several times in the pattern. To form the language, x is still replaced by any words over the terminal alphabet, but both occurrences of the variable x have to be replaced by the same string. Thus, the erasing pattern language for the pattern x 1 x is the set of all words over the terminal alphabet with an odd length, where there is a 1 in the middle of the words and the same string of terminal symbols is in front of and behind this 1. For the nonerasing pattern language and the pattern x 1 x, the words have to be at least of length three. In the following, for the sake of simplicity, I will only refer to erasing pattern languages and instead just write pattern languages. It is obvious that the pattern language of the pattern x 1 x is completely contained in the pattern language of the pattern x 1 y. The inclusion problem deals with exactly this question whether one pattern language is completely contained in the other pattern language for two given patterns.

The inclusion problem for pattern languages is undecidable for finite terminal alphabets from size two and above. This result has been proven in several steps in several publications. Some parts of it are contributions from me, which I will not break down in greater detail at this point. A completely new part is the significant reduction in the number of different variables required for this result to still be valid. The problem remains open for patterns with even fewer variables. Increasing the number of variables would always be easily possible without greatly changing the proof, which is why we are interested in cases with few variables.

For now, just terminal alphabets with exact size two are examined. In the end an infinite set of pattern pairs is constructed. The pattern, which is supposed to represent the supposedly larger language for the inclusion, always remains the same and contains 67 pairwise distinct variables. The other patterns each contain two distinct variables and just differ in one section, which consists of a long chain of one of the terminal symbols. The length of this chain corresponds to the input of the 2-counter-automaton which was already mentioned. In the end, the undecidability is shown by a contradiction. If the inclusion problem could be decided for the patterns, then we would know whether the 2-counter-automaton would terminate and accept the corresponding input. And so we would also know for the universal Turing machine whether it would terminate and accept.

The result is then extended to larger terminal alphabets and cases where patterns do not contain terminal symbols. Finally, all of this is also redone for nonerasing pattern languages. For all of these variants, the number of variables required increases slightly beyond the 67 already mentioned.

Last of all, deterministic regular languages are investigated, which are defined using so-called deterministic regular expressions. Deterministic regular expressions are special regular expressions. The following has to apply to deterministic regular expressions: for every word from the language described by the expression, if you read the word letter by letter from the left to the right, it is clearly determined at any time which letter in the expression created the currently read letter in the word. Thus, for a given word and a deterministic regular expression, it is easy to check whether the word is part of the described language, since when going through the word, the path through the deterministic regular expression cannot split into multiple possibilities.

An algorithm, which decides for minimal deterministic finite automata whether the described language is deterministically regular, is already known. This requires a maximum of polynomial amount of time and of linear amount of space, compared to the size of the automaton. The algorithms now constructed are based on the basic structure of the already existing algorithm.

First, minimal deterministic finite automata are examined again, extending the result also to automata which are not minimal. By adding nondeterminism, it is shown that this problem can be decided even with a logarithmic amount of space. In the original algorithm the automaton was gradually broken down into smaller automata. This is no longer possible with logarithmic space. The problem class of problems that can be solved nondeterministically in logarithmic space is abbreviated by NL. Nondeterminism allows to focus on just one of these subautomata. Due to special properties, this subautomaton can be constructed with just a small amount of information from the input automaton and can therefore be stored in logarithmic space. The construction is never actually carried out, but is based on testing properties, which at the lowest level are then based on reachability tests, nondeterministically requiring a maximum of logarithmic space. In consequence of this and its provable NL-hardness, the problem is NL-complete.

Some of these modifications are then used to further investigate the decision problem whether for given nondeterministic finite automata or regular expressions the described languages are deterministic regular. In consequence the problem will be deterministically decidable in polynomial space - square space to be precise.

Details

Subject

Investigations;
Determinism;
Computer science

Classification

0984: Computer science

Identifier / keyword

Pattern languages; Finite terminal alphabet

URL

https://edoc.hu-berlin.de/items/e8b89f31-173e-4e5d-83c5-a54b5c749298

Title

Melting Down the Number of Variables for the Inclusion Problem for Pattern Languages and Logspace-Decidability of Determinism of Regular Languages

Author

Bremer, Joachim

Number of pages

176

Publication year

2024

Degree date

2024

School code

5416

Source

DAI-B 87/5(E), Dissertation Abstracts International

ISBN

9798265414724

Advisor

Schweikardt, Nicole; Köbler, Johannes; Freydenberger, Dominik

University/institution

Humboldt Universitaet zu Berlin (Germany)

University location

Germany

Degree

Ph.D.

Source type

Dissertation or Thesis

Language

English

Document type

Dissertation/Thesis

Dissertation/thesis number

32319715

ProQuest document ID

3278195457

Document URL

https://www.proquest.com/dissertations-theses/melting-down-number-variables-inclusion-problem/docview/3278195457/se-2?accountid=208611

Go to Library Record

https://edoc.hu-berlin.de/items/e8b89f31-173e-4e5d-83c5-a54b5c749298

Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.

Database

ProQuest One Academic

Melting Down the Number of Variables for the Inclusion Problem for Pattern Languages and Logspace-Decidability of Determinism of Regular Languages

Content area

Abstract

Details