APPLICATION PROBLEMS
Sense's Standards and Machine Understanding of Texts in the System for Computer-Aided Testing of Knowledge
G. M. Emelyanov and D. V. Mikhailov
Novgorod State University, Veliky Novgorod, 173003 Russia
Abstract. This paper proposes an approach to interpreting a response to an open-form test task in a system for computer-aided testing of knowledge. The notion of a sense's standard is introduced, and a method for building it from a set of semantically equivalent natural-language phrases is described, together with the use of sense's standards for compressing a textual knowledge base.
Keywords: computer-aided testing of knowledge, test task of an open form, sense's standard, Formal Concept Analysis, compression of texts.
DOI: 10.1134/S1054661811040067
INTRODUCTION
A test task of an open form in a system for computer-aided testing of knowledge [7] implies a trainee's response in the form of one or several sentences in Natural Language (NL). In the general case, the developer of a test must describe his or her variant of a correct response to a test question by a set of Semantically Equivalent (SE) NL statements that specify a Usage Situation for Natural Language (USNL), drawing on his or her own knowledge of the given subject field. No restrictions are imposed on the initial set of SE statements, and a separate statement can consist of more than one phrase, each of which corresponds to a simple extended sentence.
Traditionally, a trainee's response is interpreted by a simple search among the correct variants [6]. As we have shown in [5], estimating the semantic proximity of a trainee's answer to a given correct response requires a thesaurus built on the sets of variants of correct responses for the set of tests on the given themes. When a USNL is used as an information unit of the thesaurus, the separate NL phrases on which it is based must describe the corresponding fact of reality as exactly as possible (they must express the sense smoothly). The task is thus to separate knowledge about similar language forms that describe different situations of reality, on the one hand, from knowledge about the explicitly different forms that give the most compact description of each situation for presentation in the thesaurus, on the other.
This work solves the problem of minimizing this thesaurus by introducing the sense's standard of a USNL, and it suggests a method for the semantic interpretation of NL statements based on patterns built for the sets of SE phrases from the results of distinguishing sense's standards.
A SENSE'S STANDARD AND INTERPRETATION OF AN ANSWER
Let a unit of knowledge corresponding to one USNL be fixed by the triple

$$K = (G, M, I), \tag{1}$$

named a Formal Context (FC, [2]). Here, the set of objects $G$ is formed by the stems of the words that are syntactically subordinated to other words in the SE phrases making up a USNL. The attributes (signs) from the set $M$ are put into correspondence with these objects by the relation $I \subseteq G \times M$. The set $M$ itself includes the following subsets:
Indications of the stem of a syntactically major word;
Indications of the inflection of the major word;
Stem-flexion relations for a syntactically major word;
Combinations of the flexions of a dependent word and the major word. Here, a preposition (if present) is written after the flexion of the major word, separated by a colon, to bind the major word and the word depending on it in a phrase;
Indications of the flexion of the dependent word.
Let us suppose that this formal context is formed from SE statements, each of which consists of maximally projective phrases [3]. In this case, the total length of the syntactic links inside each such phrase is less than or equal to the length of the phrase, and model (1) will be regarded as the model of the sense's standard of a USNL.
Note. For the reasoning below, let us assume that each NL phrase in the set of SE statements has a prototype, in the form of a phrase equivalent to it in sense, within each of the statements under consideration (otherwise, the condition of semantic equivalence is not fulfilled for the NL statements of the given USNL). This permits one to restrict the consideration of semantic equivalence to the situation when each statement includes one phrase, and to use the term NL phrase instead of the term NL statement below.
Received June 16, 2011
ISSN 1054-6618, Pattern Recognition and Image Analysis, 2011, Vol. 21, No. 4, pp. 705-719. Pleiades Publishing, Ltd., 2011.
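As an illustration of model (1), the following minimal Python sketch stores a small formal context as a set of (object, attribute) pairs and computes the two derivation operators of Formal Concept Analysis. The pairs are an assumed fragment that imitates labels visible in Fig. 2 for the word combinations "empiricheskogo riska" and "nezhelatelnogo pereobuchenija"; the function names are hypothetical and are not taken from the authors' system.

```python
from typing import Iterable, Set, Tuple

# Assumed fragment of the relation I: each object is the stem of a
# dependent word, each attribute describes its link to the major word.
I: Set[Tuple[str, str]] = {
    ("empirichesk", "the main thing is the stem: risk"),
    ("empirichesk", "the main thing is the flexion: a"),
    ("empirichesk", "flexion: ogo"),
    ("empirichesk", "ogo:a"),
    ("nezhelateln", "the main thing is the stem: pereobucheni"),
    ("nezhelateln", "the main thing is the flexion: ya"),
    ("nezhelateln", "flexion: ogo"),
    ("nezhelateln", "ogo:ya"),
}
G = {g for g, _ in I}  # objects
M = {m for _, m in I}  # attributes


def common_attributes(objects: Iterable[str]) -> Set[str]:
    """Attributes shared by every object in the given set."""
    objs = list(objects)
    return {m for m in M if all((g, m) in I for g in objs)}


def common_objects(attributes: Iterable[str]) -> Set[str]:
    """Objects that possess every attribute in the given set."""
    attrs = list(attributes)
    return {g for g in G if all((g, m) in I for m in attrs)}


# A formal concept is a pair (extent, intent) closed under both operators.
shared = common_attributes({"empirichesk", "nezhelateln"})
extent = common_objects(shared)
```

Here the two stems share only the attribute describing their own flexion, so the pair (`extent`, `shared`) forms one node of the concept lattice for this fragment.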
Let us transform model (1) into the following form:

$$S = (T, K), \tag{2}$$

where the set $T$ is obtained from the initial set of SE phrases by replacing each word by the pair $(b_i, f_i)$, in which $b_i$ corresponds to the word stem and $f_i$ corresponds to the inflection of this word.
Note that the initial SE phrases include both the phrases determining the sense's standard of a USNL and those that do not. To relate such phrases to the standard, let us put a variable $x_i$ into correspondence with each stem $b_i$ for which there is either an attribute $m \in M$: $m = pbs \circ b_i$ or an object $g \in G$: $g = b_i$. Here, $pbs$ corresponds to the symbolic constant "the major thing is a stem", and the symbol $\circ$ designates the operation of concatenation. In this case, the USNL pattern (the superscript $P$ comes from the English word Pattern) is built based on model (2):
$$S^P = (Id^P, T^P, K^P). \tag{3}$$

This pattern replaces all the designations of stems in the names of the objects and attributes of the formal context for the standard of a concrete USNL by variables and gives a list of specifying quadruples of the form

$$(Id^P, Id^S, x_i, b_i), \tag{4}$$

where $Id^S$ is the identification number of the USNL, and $Id^P$ is the number of its pattern. For the USNL of a newly distinguished pattern, we can take, for example, $Id^P = Rnd + 1$ and an $Id^S$ computed from $Rnd$, the size of $WSX$, and the stem lengths $len(b_i)$ [formula illegible in the source], where $WSX$ is the set of the stems specifying the variables of this USNL, $len(b_i)$ is the length of the stem $b_i$ (in symbols), and $0 \le Rnd < 1$ is a random number.
An example of the FC of a standard for a USNL specified by the set of SE phrases in Fig. 1 is shown in Fig. 2, and the pattern $K^P$ for the formal context of the USNL corresponding to this standard is given in Fig. 3. The NL phrases included in the standard are presented in Table 1. The set $T^P$ for the example under consideration is shown in Fig. 4 (each stem-flexion pair is represented by the compound object wm of the Prolog language, from the English Word Marking), and the variables are specified in Table 2.
Note that in a significant number of testing situations, interpreting a trainee's answer amounts to an attempt to apply pattern (3) built for the correct response formulated by the developer of the test. In this case, it is not necessary to analyze the response with external programs for syntactic analysis: it is sufficient to superpose the analyzed phrase on one of the patterns included in the set $T^P$, forming variable-stem pairs that are compared with the structures of form (4) for the correct response. The interpretation itself takes linear time proportional to $|T^P|$.
The suggested concept of the pattern for a USNL in the form of the formal context $K^P$ within structure (3) completely agrees with the determination of the sense of a text by the set of characteristic functions discussed in [1]. The objects of the formal context form the definitional domains and the sets of values of these functions, and the classes of formal concepts in a lattice for $K^P$ specify the types of relations
Nezhelatelnoe pereobuchenie privodit k zanizhennosti empiricheskogo riska.
Nezhelatelnoe pereobuchenie, sledstviem kotorogo javljaetsja zanizhennost empiricheskogo riska.
Zanizhennost empiricheskogo riska javljaetsja sledstviem nezhelatelnogo pereobuchenija.
Zanizhennost empiricheskogo riska, javljajushhajasja sledstviem nezhelatelnogo pereobuchenija.
Empiricheskij risk, zanizhennost kotorogo javljaetsja sledstviem nezhelatelnogo pereobuchenija.
Empiricheskij risk, zanizhennyj vsledstvie nezhelatelnogo pereobuchenija.
Empiricheskij risk, k zanizhennosti kotorogo vedjot nezhelatelnoe pereobuchenie.
Empiricheskij risk po prichine, vyzvannoj nezhelatelnym pereobucheniem, mozhet byt zanizhennym.
Empiricheskij risk po prichine, obuslovlennoj nezhelatelnym pereobucheniem, mozhet okazatsja zanizhennym.
Empiricheskij risk v silu obstojatelstv, svjazannyh s nezhelatelnym pereobucheniem, mozhet okazatsja zanizhennym.
Empiricheskij risk, k zanizhennosti kotorogo privodit nezhelatelnoe pereobuchenie.
Nezhelatelnoe pereobuchenie sluzhit prichinoj zanizhennosti empiricheskogo riska.
Zanizhennost empiricheskogo riska, prichinoj kotoroj javljaetsja nezhelatelnoe pereobuchenie.
Zanizhennost empiricheskogo riska javljaetsja rezultatom nezhelatelnogo pereobuchenija.
Nezhelatelnoe pereobuchenie, s kotorym svjazana zanizhennost empiricheskogo riska.
Empiricheskij risk, s pereobucheniem svjazana ego zanizhennost.
Zanizhennost empiricheskogo riska svjazana s pereobucheniem.
Zanizhennost empiricheskogo riska, javljajushhajasja rezultatom nezhelatelnogo pereobuchenija.
Nezhelatelnoe pereobuchenie, rezultatom kotorogo javljaetsja zanizhennost empiricheskogo riska.
Nezhelatelnoe pereobuchenie, rezultat kotorogo est zanizhennost empiricheskogo riska.
Nezhelatelnoe pereobuchenie, sluzhashhee prichinoj zanizhennosti empiricheskogo riska.
Zanizhennost empiricheskogo riska otnositsja k sledstviju nezhelatelnogo pereobuchenija.
Zanizhennost empiricheskogo riska svjazana s nezhelatelnym pereobucheniem.
Nezhelatelnoe pereobuchenie javljaetsja prichinoj zanizhennosti empiricheskogo riska.
Zanizhennost empiricheskogo riska, prichinoj kotoroj sluzhit nezhelatelnoe pereobuchenie.
Nezhelatelnoe pereobuchenie, privodjashhee k zanizhennosti empiricheskogo riska.
Fig. 1. Initial set of SE phrases.
[Figure: formal concept lattice. Object nodes are labeled with the stems of dependent words (for example, empirichesk, nezhelatel'n, zanizhenn, risk); attribute nodes carry labels such as "the main thing is the stem: risk", "the main thing is the flexion: a", "flexion: ogo", "pereobucheni:em", "ogo:a", and "osti:it:k".]
Fig. 2. Formal context of the sense's standard for the example of USNL in Fig. 1.
[Figure: the lattice of Fig. 2 with every stem replaced by a variable X0 to X11 in the node labels, for example "the main thing is the stem: X9", "X8:em", "X4:it:k", "X2:osti"; labels built from flexions only, such as "flexion: ogo" and "ogo:a", are unchanged.]
Fig. 3. Pattern for the formal context of the standard for the example in Fig. 1.
Table 1. NL phrases in the composition of the sense's standard
Stem           Inflectional part + preposition (in the order of the phrases)
zanizhenn      ost, osti, ost, osti, ost, osti
empirichesk    ogo, ogo, ogo, ogo, ogo, ogo
risk           a, a, a, a, a, a
nezhelateln    ogo, oe, ogo, oe, ym, oe
pereobucheni   ya, e, ya, e, em, e
yavlya         etsya, etsya, etsya
sledstvi       em
sluzh          it
prichin        oi, oi
resultat       om
svyazan        a:s
privod         it:k
Table 2. Specific definitions of the variables for the pattern in Fig. 3
xi     bi
X0     yavlya
X1     prichin
X2     zanizhenn
X3     sluzh
X4     privod
X5     svyazan
X6     sledstvi
X7     resultat
X8     pereobucheni
X9     risk
X10    nezhelateln
X11    empirichesk
within the dependences described by the $\lambda$-expressions. Here, by introducing variables into the definitional domains and the sets of permissible values, the characteristic functions themselves describe the interpretations of classes of lexical meanings with a varied degree of abstraction rather than those of separate lexical meanings.
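The superposition of an analyzed phrase on the patterns of the set $T^P$, described in this section, can be sketched as follows. This is only an assumed reading of the procedure: a pattern is taken to be a list of wm pairs as in Fig. 4, a name of the form X followed by digits is treated as a variable, and the stem of a word is whatever remains after removing the flexion required by the pattern. The functions `superpose` and `interpret` and the dictionary `spec` (the variable-stem part of structures (4), here taken from Table 2) are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Optional, Tuple

Marking = Tuple[str, str]  # wm(variable or constant word, flexion)


def is_variable(name: str) -> bool:
    """Assumed convention: variables are written X0, X1, ..."""
    return name.startswith("X") and name[1:].isdigit()


def superpose(words: List[str], pattern: List[Marking]) -> Optional[Dict[str, str]]:
    """Lay a phrase over one pattern; return variable-stem pairs or None."""
    if len(words) != len(pattern):
        return None
    bindings: Dict[str, str] = {}
    for word, (head, flexion) in zip(words, pattern):
        if not is_variable(head):
            if word != head + flexion:  # constant word must match literally
                return None
            continue
        if not word.endswith(flexion) or len(word) == len(flexion):
            return None
        stem = word[: len(word) - len(flexion)]
        if bindings.setdefault(head, stem) != stem:  # one stem per variable
            return None
    return bindings


def interpret(words: List[str], patterns: List[List[Marking]],
              spec: Dict[str, str]) -> bool:
    """True if some pattern fits and all bound stems agree with spec."""
    for pattern in patterns:
        found = superpose(words, pattern)
        if found is not None and all(spec.get(x) == s for x, s in found.items()):
            return True
    return False


# One pattern from Fig. 4 and the matching entries of Table 2.
pattern = [("X10", "oe"), ("X8", "e"), ("X3", "it"), ("X1", "oi"),
           ("X2", "osti"), ("X11", "ogo"), ("X9", "a")]
spec = {"X1": "prichin", "X2": "zanizhenn", "X3": "sluzh", "X8": "pereobucheni",
        "X9": "risk", "X10": "nezhelateln", "X11": "empirichesk"}
answer = "nezhelatelnoe pereobuchenie sluzhit prichinoi zanizhennosti empiricheskogo riska".split()
```

Each pattern is scanned once, word by word, so the cost of `interpret` grows linearly with the number of patterns, in line with the estimate given above.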
FORMATION OF A SENSE'S STANDARD
The patterns of the formal contexts within structures (3) can be used to syntactically analyze NL phrases from a trainee's response. As the result of the analysis, the formal context given by (1) is built with respect to the given and related subject fields. In this case, the specifying structures (4) are not compulsory for each phrase.
Marking of SE phrases in the composition of the USNL pattern
[[wm["X10","oe"], wm["X8","e"], wm["X4","it"], wm["k",""], wm["X2","osti"], wm["X11","ogo"], wm["X9","a"]]
[wm["X10","oe"], wm["X8","e"], wm["X6","em"], wm["kotorogo",""], wm["X0","etsya"], wm["X2","ost'"], wm["X11","ogo"], wm["X9","a"]]
[wm["X2","ost'"], wm["X11","ogo"], wm["X9","a"], wm["X0","etsya"], wm["X6","em"], wm["X10","ogo"], wm["X8","ya"]]
[wm["X11","ii"], wm["X9",""], wm["X2","ost'"], wm["kotorogo",""], wm["X0","etsya"], wm["X6","em"], wm["X10","ogo"], wm["X8","ya"]]
[wm["X11","ii"], wm["X9",""], wm["X2","yi"], wm["vsledstvie",""], wm["X10","ogo"], wm["X8","ya"]]
[wm["X11","ii"], wm["X9",""], wm["k",""], wm["X2","osti"], wm["kotorogo",""], wm["vedet",""], wm["X10","oe"], wm["X8","e"]]
[wm["X9",""], wm["X2","yi"], wm["kak",""], wm["X6",""], wm["X8","ya"]]
[wm["X11","ii"], wm["X9",""], wm["po",""], wm["X1","e"], wm["obuslovlennoi",""], wm["X10","ym"], wm["X8","em"], wm["mozhet",""], wm["okazatsya",""], wm["X2","ym"]]
[wm["X11","ii"], wm["X9",""], wm["v",""], wm["silu",""], wm["obstoyatelstv",""], wm["X5","nykh"], wm["s",""], wm["X10","ym"], wm["X8","em"], wm["mozhet",""], wm["okazatsya",""], wm["X2","ym"]]
[wm["X11","ii"], wm["X9",""], wm["po",""], wm["X1","e"], wm["vyzvannoi",""], wm["X10","ym"], wm["X8","em"], wm["mozhet",""], wm["byt'",""], wm["X2","ym"]]
[wm["X11","ii"], wm["X9",""], wm["k",""], wm["X2","osti"], wm["kotorogo",""], wm["X4","it"], wm["X10","oe"], wm["X8","e"]]
[wm["X10","oe"], wm["X8","e"], wm["X3","it"], wm["X1","oi"], wm["X2","osti"], wm["X11","ogo"], wm["X9","a"]]
[wm["X2","ost'"], wm["X11","ogo"], wm["X9","a"], wm["X1","oi"], wm["kotoroi",""], wm["X0","etsya"], wm["X10","oe"], wm["X8","e"]]
[wm["X2","ost'"], wm["X11","ogo"], wm["X9","a"], wm["X0","etsya"], wm["X7","om"], wm["X10","ogo"], wm["X8","ya"]]
[wm["X10","oe"], wm["X8","e"], wm["s",""], wm["kotorym",""], wm["X5","a"], wm["X2","ost'"], wm["X11","ogo"], wm["X9","a"]]
[wm["X11","ii"], wm["X9",""], wm["s",""], wm["X8","em"], wm["X5","a"], wm["ego",""], wm["X2","ost'"]]
[wm["X2","ost'"], wm["X11","ogo"], wm["X9","a"], wm["X5","a"], wm["s",""], wm["X8","em"]]
[wm["X2","ost'"], wm["X11","ogo"], wm["X9","a"], wm["X0","yuschayasya"], wm["X7","om"], wm["X10","ogo"], wm["X8","ya"]]
[wm["X10","oe"], wm["X8","e"], wm["X7","om"], wm["kotorogo",""], wm["X0","etsya"], wm["X2","ost'"], wm["X11","ogo"], wm["X9","a"]]
[wm["X10","oe"], wm["X8","e"], wm["X7",""], wm["kotorogo",""], wm["est'",""], wm["X2","ost'"], wm["X11","ogo"], wm["X9","a"]]
[wm["X10","oe"], wm["X8","e"], wm["X4","yashchee"], wm["k",""], wm["X2","osti"], wm["X11","ogo"], wm["X9","a"]]
[wm["X10","oe"], wm["X8","e"], wm["X3","ashchee"], wm["X1","oi"], wm["X2","osti"], wm["X11","ogo"], wm["X9","a"]]
[wm["X2","ost'"], wm["X11","ogo"], wm["X9","a"], wm["otnositsya",""], wm["k",""], wm["X6","yu"], wm["X10","ogo"], wm["X8","ya"]]
[wm["X2","ost'"], wm["X11","ogo"], wm["X9","a"], wm["X5","a"], wm["s",""], wm["X10","ym"], wm["X8","em"]]
[wm["X10","oe"], wm["X8","e"], wm["X0","etsya"], wm["X1","oi"], wm["X2","osti"], wm["X11","ogo"], wm["X9","a"]]]
Fig. 4. Marking of SE phrases in the composition of the USNL pattern from the example in Fig. 1.
Now let us consider the problem of building the formal context of a sense's standard itself as the basis of models (2) and (3). As follows from the informal definition of a sense's standard at the beginning of the previous section, the approach to distinguishing and classifying syntagmatic dependences that we suggested in [3] can serve as the basis of the mechanism for building this FC. Let us specify the conditions under which the objects and attributes of the formal context $K = (G, M, I)$ representing the sense's standard are bound by the relation $I \subseteq G \times M$.
To build the standard, NL phrases are selected by analyzing the models of their linear structures. For the subsequent statement, we recall several conventions adopted in [3].
If $J$ is the index set of the invariant parts of all the words used in all the phrases of the initial set, then the model $L(T_i)$ of the linear structure of the phrase $T_i$ is understood as the ordered set of the indices $j \in J$ of the invariant parts of the words that are present in $T_i$.
Let $h(j, L(T_i))$ be the position of the index $j$ in the model $L(T_i)$. Then, the set of links with respect to $L(T_i)$ is

$$D(T_i) = \{ d_{qi} : d_{qi} = (h(j, L(T_i)), h(k, L(T_i))),\ j \ne k \}.$$

The link $d_{qi} = (h(j, L(T_i)), h(k, L(T_i)))$ is permissible for the model $L(T_i)$ if $\exists \{T_l, T_m\} \subseteq T$, $l \ne m$, such that $L(T_l)$ and $L(T_m)$ contain either $\{j, k\}$ or $\{k, j\}$ as a subsequence. In this case, the pair of indices $(j, k)$ corresponds to one syntagma, and the index $q$ corresponds to the type of the syntactic relation corresponding to it.
Let us assume that all $d_{qi} \in D(T_i)$ are permissible for each phrase $T_i$ from the set of phrases specifying a USNL. If the phrase $T_i$ is among the phrases specifying the standard, then its model $L(T_i)$ must satisfy the following restriction on projectivity:

$$\sum_{d_{qi} \in D(T_i)} |d_{qi}| \le |L(T_i)|,$$

where $d_{qi} = (h(j, L(T_i)), h(k, L(T_i)))$ and $|d_{qi}|$ is the length of the link.
The graph of syntagmas $(V_J, I_J)$ is formed based on $D(T_i)$. The sets of pairs $(j, k)$, $\{j, k\} \subseteq J$, grouped by some index $k$ common to them are the elements of the vertex set $V_J$ of this graph. Sets $E_1$ and $E_2$ included in $V_J$ are joined by an edge from $I_J$ if $\exists \{j, k, m\} \subseteq J$: $(j, k) \in E_1$, $(k, m) \in E_2$, and $j \ne m$.
By analyzing $(V_J, I_J)$, a tree $(V^1_J, I^1_J)$ is built for the phrases $T_i$, where $i = 1, \ldots, |T|$ for the set $T$ in formula (2). Formally,

$$V^1_J = J, \quad I^1_J = \{ (j, k) : \exists E \in V_J,\ (j, k) \in E \}. \tag{5}$$

In this case, the index $k$ corresponds to the root of the tree $(V^1_J, I^1_J)$ if $\exists E_1 \in V_J$ in which the pairs of indices are grouped by $k$, $|E_1| > 1$, and $k$ is not contained in any pair of indices of $E_2 \in V_J$: $E_1 \ne E_2$.
From the standpoint of the symbols that form the phrase $T_i$, we have

$$T_i = T^C_i \cup T^F_i,$$

where $T^C_i$ is the common invariant part for all NL phrases making up a given usage situation for natural language, and $T^F_i$ is an inflectional part.
If $T_i = \bigcup_j W_{ij}$, then, correspondingly,

$$W_{ij} = W^C_{ij} \cup W^F_{ij}. \tag{6}$$

Here, $W_{ij}$ is the letter composition of a word, $W^C_{ij} \subseteq T^C_i$ is an invariant part, and $W^F_{ij} \subseteq T^F_i$ is an inflectional part.
Comparing the $W_{ij}$ of different $T_i$ pairwise, it is necessary to find the following information:
(1) $W^C_{ij}$ and $W^F_{ij}$ for each $W_{ij}$ at $|W^C_{ij}| \to \max$;
(2) the relation $R_q$ determining the permissibility of the combination $(W^F_{ij}, W^F_{ik})$, $k \ne j$.
Let us consider

$$T^{Cnc}_i = \{ w_{ij} : (w_{ij} = W_{ij}) \}.$$

Let us also assume that $T_i$ gives the sequence

$$P^{Cnc}_i = \{ u_k : (u_k = W^P_k),\ \bigcup_k W^P_k = T^P_i \},$$

where $T^P_i$ is the sequence of symbols of the words for which presentation (6) has not been found.
Lemma 1. The sequence $P^{Cnc}_i$ contains a predicate word (a verb or a verbal noun) if $\exists \{j, 0, k\} \subseteq L(T_i)$: $\{w_{ij}, u_1, \ldots, u_p, w_{ik}\} \subseteq T^{Cnc}_i \cup P^{Cnc}_i$, where $\{u_1, \ldots, u_p\} = P^{Cnc}_i$, $p = |P^{Cnc}_i|$.
The proof is presented in [3].
Let the condition of Lemma 1 be fulfilled for the sequence $P^{Cnc}_i$.
Lemma 2. The word $u_k \in P^{Cnc}_i$ belongs to a splintered predicative value if $\exists T_j \in T$: $L(T_j)$ […] $L(T_i)$, and $u_k \in P^{Cnc}_j$, where $P^{Cnc}_j$ also meets the condition of Lemma 1. In this case, $P^{Cnc}_k$ […] $P^{Cnc}_i$, $L(T_k)$ […] $L(T_j)$, and $L(T_k)$ […] $L(T_i)$ (the relation symbols marked […] are illegible in the source).
The proof follows from the proof of Lemma 1 and the definition of the set of edges in the graph $(V_J, I_J)$.
Note. If the condition of Lemma 2 is fulfilled, $u_k$ can also be a dependent word within a splintered predicative value. The splintered predicative value is understood here as the combination of an auxiliary verb (a link) and some noun naming a situation, as in [5].
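The restriction on projectivity that a phrase of the standard must satisfy (the total length of its syntactic links does not exceed the length of the phrase) can be checked with a few lines of Python. The sketch below assumes that the length of a link is the absolute difference of the positions of its two indices in the model $L(T_i)$; the index values follow the variable numbers of Table 2, and the list of links is a hypothetical dependency analysis of the phrase "Nezhelatelnoe pereobuchenie sluzhit prichinoj zanizhennosti empiricheskogo riska", not data from the paper.

```python
from typing import List, Sequence, Tuple

Link = Tuple[int, int]  # (index of major word, index of dependent word)


def position(index: int, model: Sequence[int]) -> int:
    """1-based position h(j, L) of a stem index in the linear model."""
    return model.index(index) + 1


def total_link_length(model: Sequence[int], links: List[Link]) -> int:
    """Sum of |h(j, L) - h(k, L)| over all links (assumed length measure)."""
    return sum(abs(position(j, model) - position(k, model)) for j, k in links)


def satisfies_projectivity(model: Sequence[int], links: List[Link]) -> bool:
    """Restriction from the text: total link length <= phrase length."""
    return total_link_length(model, links) <= len(model)


# Linear model: nezhelateln(10) pereobucheni(8) sluzh(3) prichin(1)
#               zanizhenn(2) empirichesk(11) risk(9)
model = [10, 8, 3, 1, 2, 11, 9]
# Hypothetical links, each written as (major, dependent).
links = [(8, 10), (3, 8), (3, 1), (1, 2), (2, 9), (9, 11)]
```

With these assumed links the total length equals the phrase length, so the phrase passes the check; adding one long-distance link, for example between the indices 3 and 9, makes it fail.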
Let $P^{Cnc'}_i$ be the sequence of words satisfying the condition of Lemma 2.
Theorem 1. In order to form the sense's standard of a given USNL, it is necessary and sufficient to find a subset $T'$ in the initial set of SE phrases meeting the condition

$$T' = \{ T_i : |P^{Cnc'}_i| \to \max \}. \tag{7}$$

The proof follows from the proof of Lemma 2.
Besides fulfilling the condition of Theorem 1, the key requirement for the selection of phrases is the minimum number of words that cannot be represented by relation (6). For $u_k \in P^{Cnc'}_i$, $T_i \in T'$, presentation (6) is formed by comparing the letter composition with all $u_j \in P^{Cnc}_l$, where the phrase $T_l$ belongs to the subset of initial SE phrases not included in $T'$. In this case, it is necessary that $2|W^C_k| > |W^F_k| + |W^F_j|$, where $W^P_k = W^C_k \cup W^F_k$ and $W^P_j = W^C_j \cup W^F_j$.
Note. If $P^{Cnc'}_i \subset P^{Cnc}_i$, then $u_m \in (P^{Cnc}_i \setminus P^{Cnc'}_i)$ is a preposition and is joined with the word standing to the left of it in $P^{Cnc}_i$.
With consideration for $P^{Cnc'}_i$, tree (5) is transformed as follows:
(1) The root changes from $k = 0$ to the value $k$ for the $u_k \in P^{Cnc'}_i$ that has the maximum occurrence in different $T^{Cnc}_i$.
(2) The left subtree remains unchanged.
(3) The right subtree is attached to the node $j$ for the $u_j \in P^{Cnc'}_i$ of the least occurrence.
(4) In a pair $\{u_l, u_m\} \subseteq P^{Cnc'}_i$, the node for the word with the lesser occurrence becomes the child.
Let us introduce designations for the symbolic constants used below: $pfl$ for "a flexion" and $pbf$ for "the main thing is a flexion". If $\{j, k\} \subseteq J$ and $\exists E \in V_J$: $(j, k) \in E$ in structure (5) transformed with the conditions of Lemma 2 and Theorem 1 taken into consideration (below, the extended tree (5)), then, for the stems $b_j$ and $b_k$ and flexions $f_j$ and $f_k$, the corresponding elements of the sets $G$ and $M$, as well as the elements of the relation $I$ in structure (1), are formed as follows.
Case 1. The index $k$ corresponds to the parent node and the index $j$ to the child node in extended tree (5), and the linear structure of the NL phrase does not contain a preposition between the words with the indices $j$ and $k$. In this case, the set of attributes of FC (1) will include $m_1 = pbs \circ b_k$, $m_2 = pbf \circ f_k$, $m_3 = pfl \circ f_j$, and $m_4 = f_j : f_k$; the stem $b_j$ enters the set of objects of this formal context; and the pairs $(b_j, m_1)$, $(b_j, m_2)$, $(b_j, m_3)$, and $(b_j, m_4)$ are included in the relation $I$.
Case 2. The index $k$ corresponds to the parent node and the index $j$ to the child node in extended tree (5), and the linear structure of the phrase contains a preposition $p_y$ between the words with the indices $j$ and $k$. Here, the attributes $m_1$ and $m_3$ are formed analogously to Case 1, $m_2 = pbf \circ f_k : p_y$, $m_4 = f_j : f_k : p_y$, and the pairs $(b_j, m_1)$, $(b_j, m_2)$, $(b_j, m_3)$, and $(b_j, m_4)$ are included in the relation $I$.
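Cases 1 and 2 can be illustrated by a short Python sketch that produces the attributes $m_1$ to $m_4$ for one parent-child pair of extended tree (5) and records them in a formal context stored as a dictionary from objects to attribute sets. The label strings imitate those visible in Fig. 2; the function names and the dictionary representation are assumptions made for this illustration only.

```python
from typing import Dict, List, Optional, Set, Tuple

Word = Tuple[str, str]  # (stem b, flexion f)

PBS = "the main thing is the stem: "     # symbolic constant pbs
PBF = "the main thing is the flexion: "  # symbolic constant pbf
PFL = "flexion: "                        # symbolic constant pfl


def link_attributes(major: Word, dependent: Word,
                    preposition: Optional[str] = None) -> Tuple[str, List[str]]:
    """Return the object b_j and the attributes m1..m4 for one link."""
    b_k, f_k = major
    b_j, f_j = dependent
    # Case 2: the preposition follows the major flexion after a colon.
    suffix = f":{preposition}" if preposition else ""
    m1 = PBS + b_k
    m2 = PBF + f_k + suffix
    m3 = PFL + f_j
    m4 = f"{f_j}:{f_k}{suffix}"
    return b_j, [m1, m2, m3, m4]


def add_link(context: Dict[str, Set[str]], major: Word, dependent: Word,
             preposition: Optional[str] = None) -> None:
    """Add the pairs (b_j, m1)..(b_j, m4) to the relation I."""
    obj, attrs = link_attributes(major, dependent, preposition)
    context.setdefault(obj, set()).update(attrs)


context: Dict[str, Set[str]] = {}
add_link(context, ("risk", "a"), ("empirichesk", "ogo"))       # Case 1
add_link(context, ("privod", "it"), ("zanizhenn", "osti"), "k")  # Case 2
```

For the second call the sketch yields the labels "the main thing is the flexion: it:k" and "osti:it:k", which agree with the form of the labels in Fig. 2.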
In particular, the formal context whose formal concept lattice is shown in Fig. 2 was formed by the recursive traversal of extended tree (5) based on the set of SE phrases from the example in Fig. 1. The accuracy of forming the sense's standard by the method described in this section can be estimated numerically by the error indicator (8), which is expressed in percent through the quantities $N^M_i$ and $N_i$ [formula illegible in the source], where $N^M_i$ is the number of unrecognized attributes of the $i$th object of FC (1), and $N_i$ is the total number of attributes describing this object in the structure of this FC.
The value of indicator (8) is the higher, the lower the frequency with which the word combinations underlying the object-attribute relation of the formal context of the standard occur jointly in different phrases specifying the usage situation for natural language. This completely corresponds to the hypothesis about latent relations [4], according to which pairs of words occurring in similar patterns tend to have a close semantic dependence.
SYNTACTIC RELATIONS AND THE FORMAL CONTEXT OF A SENSE'S STANDARD
Let us consider the situation when the necessary pattern (3) is absent for the interpretation of an NL phrase, and analyze the possibility of finding an approximate solution in the form of a sense's standard by compiling the formal contexts of the patterns of several USNLs.
Let there be a set of patterns given by (3), built according to the results of distinguishing the standards
Potential position of a predicate word
d_no marked > d_no marked[11.9838354,10.32453989
[wm["mozhet", ""], wm["okazatsya""]],4.313553596]
Fig. 5. Syntactic environment of the place where a predi cate word may be present.
Syntactic relation
d_syn_rel > d_syn_rel[11.9838354
[wm["X5", "a], wm["X2", "ost"]],["a", "nykh"], ["ym",","yi", "ost", "osti"], "X2"], ["the main thing is the stem:X5", the main thing is the flexion:a"flexion:ost", "X5:a", "ost:a"]]
Fig. 6. First of the relations of the predicate word for the example in Fig. 5.
Id1R
for usage situations for natural language with respect to some fixed subject field (informally, the field, for which knowledge is tested). The set of syntactic rela tions is distinguished based on each pattern of this sort; the set of all the syntactic relations distinguished using the USNL patterns is to be designated below as
RP. A separate relation RP is presented by the six relations
(9)
where I is the identification number of the relation
;
is the sequence of the inflection stem pairs for
the word combination implementing the relation (in the direction from the major word to the dependent one) within the standard pattern, and ,
TP in the assigned pattern (3);
and are the set of possible variants for the inflection of the major and dependent word, respec
tively, as applied to the relation , but for all TP;
is the variable designating the stem of the
dependent word in the composition of the relation ;
is the list of the names of attributes describing
the relation within the framework of the formal context of the standard pattern in the corresponding structure given by (3). By analogy with the identifica
tion numbers for an USNL and its pattern, I =
Rnd .
Let us also assume that the descriptions of a possi ble presence in the analyzed phrase of the pairs of the syntactic relations binding an unrecognized predica tive word with the words directly subordinated to it are built based on the formed set RP. For a separate unrec ognized predicative word, such relations are to be pre sented by the four relations
(10)
RiP
RlP ldiR TiPR FiM FiD ViR MiPR
where IdP is the number of an USNL pattern; and
are the numbers of the first and second relation;
is the sequence of the stem inflection pairs from
some TP; this being the case, the extreme mem bers of the sequence correspond to the dependent words in and and there is TP, ,
such that both and are subsequences in having a common major word. Structure (10) and cor
responding structures (9) are exemplified in Fig. 5, 6, and 7. Structure (9) is presented by a compound object d_synt_rel, and structure (10) is presented by a com posite object d_no_marked of the Prolog Language (in the example of Fig. 5, the first component of struc ture (10) is presented by the last one, and the second and third components are presented by the first and second one, respectively.
Rule 1. Let us designate the set of quadruples (10) distinguished according to the patterns for the assem
bly of USNLs as and the sequence of words in the interpreted phrase as WK. Then, if there are sets RP and
with respect to the assigned set of patterns (3), FC(1) for the senses standard of the analyzed phrase is built recursively by distinguishing in the initial WK some word assembly , answering one of the fol
lowing conditions:
(1) is the sequence of WK (after taking into
consideration reversion), WK =
(below, the latter element in will be designated as
w1, w1 = b1 f1, and the first element in will be
designated as w2, w2 = b2 f2, and ( ,
RP, RP), where = , and, for the assigned IdP, there are structures given by (4) that put word stems into correspondence with the variables included in the pairs from and . The first elements of
the sequences and coincide, and the last ones are (x1, f1) and (x2, f2); moreover, the variable x1 is
Id2R
TkP
TiP
T1PR T2PR TjP TiP TjP
T1PR T2PR TjP
RP
=
, , , , ,
( ),
diR
RiP
TiPR
RiP
TiPR TjP TjP
FiM FiD
RiP TjP
ViR
RP
RiP
WmidK
MiPR
RiP
WmidK
WbefK WmidK WrestK
WbefK
diR
TP
WrestK
RkP RP R1P
R2P TkP WmidK
T1PR T2PR
T1PR T2PR
RkP IdP Id1R Id2R TkP
=
, , ,
( ),
PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 21 No. 4 2011
SENSES STANDARDS AND MACHINE UNDERSTANDING 713
Syntactic relation
Syntactic relation
d_syn_rel > d_syn_rel[10.32453989
[wm["X5", "a"], wm["s", ""]],wm[", "em"]] ["a", "nykh"], ["em","ya","e",], "X8",
the main thing is the flexion:a:s",
["the main thing is the stem:X5",
"flexion:em", ", "X5:a:s", "em:a:s"]]
[wm["X4", "e"], wm["X0","i"]], ["em","ya","e",],["i"], "X0",["the main thing is the stem:X4",the main thing is the flexion:e","flexion:i", "X4:e", "i:e"]]
Fig. 7. Second of the relations of the predicate word for the example in Fig. 5.
d_syn_rel > d_syn_rel[11.32115861
Fig. 8. Syntactic relation for Russian pereuslozhnenie modeli.
P f 1
P FiM
FiD
WmidK
TiPR f p
P f 1
P
specified by the stem b1, and the variable x2 is concret ized by the stem b2. This being the case, objects b1 and b2 will be added into the formal context of the stan
dard, and the set of attributes for the added objects will be formed by the elements from the lists and
, where the variables are replaced by the values from structures (4) for the assigned IdP. Subsequently, the
FC of the standard is built for the sequences {bp}
and {bp} , where bp is the stem specifying a vari
able for the first elements in and .
(2) $W_{mid}^K = \{w_p, w_1\}$, and $w_p$ and $w_1$ do not necessarily form a subsequence in $W^K$ (taking reversion into consideration), but there exists $R_i^P$ for which there are structures given by (4) that put word stems into correspondence with the variables included in the pairs from $T_i^{PR}$; the words $w_p$ and $w_1$ are identified within the framework of $R_i^P$ as main and dependent, respectively; $w_p = b_p f_p$, $w_1 = b_1 f_1$, $T_i^{PR} = \{(x_p, f_p), (x_1, f_1)\}$, and the variables $x_p$ and $x_1$ are specified by the stems $b_p$ and $b_1$, respectively. The object $b_1$ is added to the formal context of the standard being formed, and the set of its attributes is formed by the elements of the list $M_i^{PR}$, in which the variables are replaced by their values.
(3) With $W_{mid}^K = \{w_p, p_y, w_1\}$, the requirements on $w_p$ and $w_1$ are analogous to condition (2), except that the word $w_p$ is bound to $w_1$ through the preposition $p_y$ within $R_i^P$. For this condition and the next three, information is added to the formal context of the standard by analogy with the addition performed under condition (2).
(4) For $W_{mid}^K = \{w_p, w_1\}$, the requirements on $w_p$ and $w_1$ are analogous to condition (2), except that $T_i^{PR} = \{(x_p, f_p^P), (x_1, f_1^P)\}$, $w_p = b_p f_p$, $w_1 = b_1 f_1$, and $((f_p^P = f_p) \wedge (f_1^P = f_1)) \neq \mathrm{true}$, but $f_p \in F_i^M$ and $f_1 \in F_i^D$.
(5) $W_{mid}^K = \{w_p, p_y, w_1\}$, where $p_y$ is a preposition; the requirements are analogous to condition (3), except that $T_i^{PR} = \{(x_p, f_p^P), (p_y, \text{""}), (x_1, f_1^P)\}$.
(6) For $W_{mid}^K = \{w_p, w_1\}$, as in conditions (2)–(5), $w_p$ and $w_1$ are permitted not to form a subsequence in $W^K$ (after taking reversion into consideration), and there exists $R_i^P$ for which there are structures (4) that put word stems into correspondence with the variables included in the pairs from $T_i^{PR}$, but with respect to a fixed USNL. In this case, $T_i^{PR} = \{(x_p, f_p^P), (x_1, f_1^P)\}$, $w_p = b_p f_p$, $w_1 = b_1 f_1$, $((f_p^P = f_p) \wedge (f_1^P = f_1)) \neq \mathrm{true}$, and either $f_p \in F_i^M$ or $f_1 \in F_i^D$.
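The flexion checks in conditions (2), (4), and (6) can be illustrated with a minimal Python sketch. It assumes that the two flexion lists in a syntactic relation listing play the role of the admissible flexions of the main and dependent words; the class `SynRel`, the functions `instantiate` and `flexion_match`, and the returned labels are hypothetical names introduced only for this illustration. The sample values are those of the listing for "sluzhit prichinoj" (variant 1) with the stems "sluzh" and "prichin".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynRel:
    """One syntactic relation, laid out like a d_syn_rel listing (hypothetical model)."""
    weight: float
    pairs: tuple          # ((variable, flexion), ...), main word first
    main_flexions: tuple  # assumed: admissible flexions of the main word
    dep_flexions: tuple   # assumed: admissible flexions of the dependent word
    dep_var: str
    attributes: tuple     # attribute templates that mention variables


def instantiate(rel, bindings):
    """Replace variable names in the attribute templates by word stems."""
    result = []
    for template in rel.attributes:
        parts = template.split(":")
        # Replace whole tokens only, so "X1" never matches inside "X10".
        result.append(":".join(bindings.get(part, part) for part in parts))
    return result


def flexion_match(rel, f_main, f_dep):
    """Classify how the observed flexions agree with the relation."""
    stored_main, stored_dep = rel.pairs[0][1], rel.pairs[-1][1]
    if (stored_main, stored_dep) == (f_main, f_dep):
        return "exact"        # the case read here as condition (2)
    in_main = f_main in rel.main_flexions
    in_dep = f_dep in rel.dep_flexions
    if in_main and in_dep:
        return "both listed"  # the case read here as condition (4)
    if in_main or in_dep:
        return "one listed"   # the case read here as condition (6)
    return None


fig10 = SynRel(
    5.436243939,
    (("X2", "it"), ("X5", "oi")),
    ("it",),
    ("oi", "e"),
    "X5",
    ("the main thing is the stem:X2", "the main thing is the flexion:it",
     "flexion:oi", "X2:it", "oi:it"),
)

print(instantiate(fig10, {"X2": "sluzh", "X5": "prichin"}))
print(flexion_match(fig10, "it", "oi"))
```

Splitting each template on the colon, rather than using plain substring replacement, is a design choice of the sketch: it keeps a variable such as X1 from being substituted inside X10 or X11.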
If any of the six enumerated conditions is fulfilled, the words for which relations have already been found are removed from the list of words not yet considered at each subsequent recursive pass of the procedure for building the FC of the standard. Before the procedure starts, all the words from $W^K$ are entered in this list. When the list becomes empty, the procedure exits and yields the formed formal context as the result. In addition, in each recursive pass of the procedure for conditions (2)–(6), the pair or triad of words for which a relation has been determined is remembered in order to avoid an endless loop.
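The bookkeeping described above (a list of words not yet considered, plus a memory of word groups already handled) can be sketched as follows. This is only an outline under stated assumptions: the callback `find_relation` stands in for the check of the six conditions and is hypothetical, as are the table `KNOWN`, the function `toy_finder`, and the abbreviated attribute sets used in the toy data.

```python
def build_context(words, find_relation):
    """Recursively collect a formal context (object -> set of attributes)."""
    context = {}
    pending = list(words)  # words not yet considered
    tried = set()          # word pairs or triads already handled

    def step():
        if not pending:
            return                      # the list is empty: exit
        found = find_relation(words, pending, tried)
        if found is None:
            return                      # no condition applies: stop early
        group, rows = found
        if group in tried:
            return                      # guard against an endless loop
        tried.add(group)
        for obj, attrs in rows.items():
            context.setdefault(obj, set()).update(attrs)
        for word in group:
            if word in pending:
                pending.remove(word)
        step()

    step()
    return context, pending


# Toy stand-in for the knowledge base of relations (illustrative data only).
KNOWN = {
    ("sluzhit", "prichinoj"): {"prichin": {"flexion:oi", "sluzh:it", "oi:it"}},
    ("prichinoj", "zanizhennosti"): {
        "zanizhenn": {"flexion:osti", "prichin:oi", "osti:oi"}
    },
}


def toy_finder(words, pending, tried):
    """Return the first untried word group that still involves a pending word."""
    for group, rows in KNOWN.items():
        if (group not in tried
                and all(word in words for word in group)
                and any(word in pending for word in group)):
            return group, rows
    return None


context, left = build_context(
    ["sluzhit", "prichinoj", "zanizhennosti"], toy_finder
)
print(sorted(context), left)
```

The function also returns the words that were never covered, so a caller can tell a complete context from a partial one.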
Let us consider the building of the sense's standard in the form of FC (1) for the Russian sentence "Nezhelatelnoe pereobuchenie sluzhit prichinoj zanizhennosti srednej oshibki na trenirovochnoj vyborke" as an example. Let us assume that the current content of the knowledge base does not permit this phrase to be interpreted using one of the patterns given by (3), but the structures given by (9) shown in Figs. 8–19 exist and correspond to the syntactic relations within the Russian phrases "Pereuslozhnenie modeli sluzhit prichinoj zanizhennosti srednej oshibki na trenirovochnoj vyborke" and "Nezhelatelnoe pereobuchenie sluzhit prichinoj zanizhennosti jempiricheskogo riska". In addition, there are specific definitions shown in Fig. 1, which correspond to the two indicated phrases for all the variables included in the structures (9) under consideration.
Syntactic relation
d_syn_rel > d_syn_rel[13.42677194, [wm["X2", "it"], wm["X4", "e"]], ["it"], ["em", "ya", "e"], "X4", ["the main thing is the stem:X2", "the main thing is the flexion:it", "flexion:e", "X2:it", "e:it"]]
Fig. 9. Syntactic relation for Russian "sluzhit pereuslozhnenie".
Syntactic relation
d_syn_rel > d_syn_rel[5.436243939, [wm["X2", "it"], wm["X5", "oi"]], ["it"], ["oi", "e"], "X5", ["the main thing is the stem:X2", "the main thing is the flexion:it", "flexion:oi", "X2:it", "oi:it"]]
Fig. 10. Syntactic relation for Russian "sluzhit prichinoj" (variant 1).
Syntactic relation
d_syn_rel > d_syn_rel[24.212913645, [wm["X1", "oi"], wm["X2", "osti"]], ["oi", "e"], ["ym", "yi", "ost", "osti"], "X2", ["the main thing is the stem:X1", "the main thing is the flexion:oi", "flexion:osti", "X2:oi", "osti:oi"]]
Fig. 11. Syntactic relation for Russian "prichinoj zanizhennosti".
Syntactic relation
d_syn_rel > d_syn_rel[11.52121784, [wm["X10", "i"], wm["X9", "ei"]], ["a", "i"], ["ei"], "X9", ["the main thing is the stem:X10", "the main thing is the flexion:i", "flexion:ei", "X10:i", "ei:i"]]
Fig. 12. Syntactic relation for Russian "oshibki srednej".
Syntactic relation
d_syn_rel > d_syn_rel[8.26473393, [wm["X10", "i"], wm["na", ""], wm["X11", "e"]], ["a", "i"], ["e"], "X11", ["the main thing is the stem:X10", "the main thing is the flexion:i:na", "flexion:ei", "X10:i:na", "e:i:na"]]
Fig. 13. Syntactic relation for Russian "oshibki na vyborke".
Syntactic relation
d_syn_rel > d_syn_rel[19.98187161, [wm["X11", "e"], wm["X12", "oi"]], ["e"], ["oi"], "X12", ["the main thing is the stem:X11", "the main thing is the flexion:e", "flexion:oi", "X11:e", "oi:e"]]
Fig. 14. Syntactic relation for Russian "vyborke trenirovochnoj".
The sequence numbers of USNLs in Table 3 are taken from Table 5: (1) is the relation between retraining and empirical risk; (2) is the relation between the excessive complication of the model and the underestimation of an average error based on a training sample.
Table 3. Specific definitions of the variables for the structures in Figs. 8–19 and Fig. 21
x_i   stem            USNL    x_i   stem          USNL
X0    model           2       X1    prichin       1
X2    sluzh           2       X2    zanizhenn     1
X4    pereuslozhneni  2       X3    sluzh         1
X5    prichin         2       X8    pereobucheni  1
X9    sredn           2       X9    risk          1
X11   vybork          2       X10   nezhelateln   1
X12   trenirovochn    2       X11   empirichesk   1
X6    zanizhenn       2       X10   oshibk        2
Syntactic relation
d_syn_rel > d_syn_rel[18.74911802, [wm["X8", "e"], wm["X10", "oe"]], ["em", "ya", "e"], ["ym", "ogo", "oe"], "X10", ["the main thing is the stem:X8", "the main thing is the flexion:e", "flexion:oe", "X8:e", "oe:e"]]
Fig. 15. Syntactic relation for Russian "pereobuchenie nezhelatelnoe".
Syntactic relation
d_syn_rel > d_syn_rel[8.988432908, [wm["X3", "it"], wm["X8", "e"]], ["ashchee", "it"], ["em", "ya", "e"], "X8", ["the main thing is the stem:X3", "the main thing is the flexion:it", "flexion:e", "X3:it", "e:it"]]
Fig. 16. Syntactic relation for Russian "sluzhit pereobuchenie".
Syntactic relation
d_syn_rel > d_syn_rel[14.46134958, [wm["X3", "it"], wm["X1", "oi"]], ["ashchee", "it"], ["oi", "e"], "X1", ["the main thing is the stem:X3", "the main thing is the flexion:it", "flexion:oi", "X3:it", "oi:it"]]
Fig. 17. Syntactic relation for Russian "sluzhit prichinoj" (variant 2).
Syntactic relation
d_syn_rel > d_syn_rel[5.798477273, [wm["X2", "osti"], wm["X9", "a"]], ["ym", "yi", "ost", "osti"], ["", "a"], "X9", ["the main thing is the stem:X2", "the main thing is the flexion:osti", "flexion:a", "X2:osti", ...]]
Fig. 18. Syntactic relation for Russian "zanizhennosti riska".
Suppose that a structure given by (9) whose second component is $T_i^{PR} = \{(x_p, \text{osti}), (x_1, \text{i})\}$, together with specific quadruples (4) for the pairs ($x_p$, "zanizhenn") and ($x_1$, "oshibk"), has not been found for the word combination "the underestimate of an error" within the USNL with number 2 from Table 3. Meanwhile, there is a structure given by (9), shown in Fig. 21, for which the variable X6 is specified by the stem "zanizhenn" and the variable X10 is specified by the stem "oshibk" with respect to the indicated USNL; moreover, the inflection "i" is included in the list of possible variants for the inflection of the dependent word.
Then, the building of the sought FC of the standard is ensured by the fulfillment of the conditions of rule 1 for the word combinations in the analyzed sentence, in the manner shown in Table 1. The resultant FC is presented as a lattice in Fig. 20.
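How a lattice arises from a formal context can be shown with a small Python sketch of the standard derivation operators of Formal Concept Analysis [2]. The brute-force enumeration below is chosen only for brevity and is not claimed to be the procedure used in the system; the three-object context is an abbreviated illustration assembled from attributes of the kind shown in the syntactic relation listings, and the function names are hypothetical.

```python
from itertools import combinations


def intent(context, objects):
    """Attributes shared by all the given objects (all attributes if none)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def extent(context, attrs):
    """Objects that possess every attribute in attrs."""
    return frozenset(o for o, own in context.items() if attrs <= own)


def concepts(context):
    """All formal concepts (extent, intent) of a small context."""
    found = set()
    names = sorted(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            b = intent(context, subset)
            a = extent(context, b)
            found.add((a, b))
    return found


toy_context = {
    "prichin": {"flexion:oi", "sluzh:it", "oi:it"},
    "pereobucheni": {"flexion:e", "sluzh:it", "e:it"},
    "zanizhenn": {"flexion:osti", "prichin:oi", "osti:oi"},
}

found = concepts(toy_context)
for a, b in sorted(found, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(a), sorted(b))
```

In this toy context the two objects that depend on the same main word share the attribute "sluzh:it", so they form a common concept; the nodes of a lattice diagram correspond to such concepts.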
It should be noted that rule 1 is also applicable to building the pattern for the formal context of the USNL of the analyzed sentence. In this case, variables are used instead of word stems in the names of objects and attributes of the formed formal context; moreover, exactly one stem is put into correspondence with each variable, and the name of each variable must be unique, independently of the existing names of variables in the descriptions of components of structures (9). For each variable involved in the built pattern of the formal context, a specific quadruple given by (4) is assigned with respect to the USNL of the analyzed sentence, based on the specific definitions given by (4) that are used when building the formal context for the sense's standard of the same sentence according to rule 1.
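The replacement of stems by fresh variables can be sketched as follows. The sketch is an assumption-laden illustration: the function `to_pattern`, the variable prefix "V", and the definition records of the form (variable, stem, USNL number) are hypothetical; the records mirror the columns of Table 3 and do not model the full quadruple (4).

```python
STEM_LABEL = "the main thing is the stem"


def to_pattern(context, usnl, prefix="V"):
    """Turn a context over stems into a pattern over uniquely named variables."""
    stems = set(context)
    for attrs in context.values():
        for attr in attrs:
            head, _, tail = attr.partition(":")
            if head == STEM_LABEL:
                stems.add(tail)          # stems of main words that are not objects
    # One variable per stem, numbered in sorted order so names are unique.
    variables = {stem: f"{prefix}{i}" for i, stem in enumerate(sorted(stems))}

    def rename(attr):
        return ":".join(variables.get(part, part) for part in attr.split(":"))

    pattern = {
        variables[obj]: {rename(attr) for attr in attrs}
        for obj, attrs in context.items()
    }
    definitions = [(var, stem, usnl) for stem, var in variables.items()]
    return pattern, definitions


standard = {
    "prichin": {
        "the main thing is the stem:sluzh",
        "the main thing is the flexion:it",
        "flexion:oi",
        "sluzh:it",
        "oi:it",
    }
}
pattern, definitions = to_pattern(standard, usnl=2)
print(pattern)
print(definitions)
```

Generating the variable names inside the function, instead of reusing the names from the relation listings, reflects the requirement that names be unique independently of those already present in structures (9).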
THE TYPICAL ARCHITECTURE OF THE SYSTEM FOR COMPUTER-AIDED TESTING OF KNOWLEDGE
Taking into consideration the properties of the suggested way of interpreting an answer to a test task of an open form, it is necessary to distinguish thirteen major components, presented in Fig. 22, in the system for computer-aided testing of knowledge. These components are as follows:
BFS: Block Forming the Standards;
BFP: Block Forming the Patterns;
BMP: Block Merging the Patterns;
BFTh: Block Forming the Thesaurus;
BST: Block Selecting a Test;
TEST: Block Performing a Test;
BFTs: Block Forming the Tasks placed in the database named TASKS.
The components THESAURUS, SPECIFIC DEFINITIONS, and PATTERNS form the basis of the system's subject linguistic knowledge. SR is the Database of Syntactic Relations, presented by the structures given by (10) and formed by means of BFR, the Block Forming the Relations, based on the preliminarily formed database of patterns given by (3).
The assembly of sets of SE phrases, from each of which the block forming the standards builds a standard in the form of formal context (1), serves as the initial data for forming the system's subject linguistic knowledge. This context enters the block forming the patterns, where the pattern given by (3) is built from the entered standard, and the corresponding quadruples (4) are introduced into the base of specific definitions. The block forming the thesaurus enters the units of knowledge presented by the formal contexts given by (1) into the thesaurus base. The thesaurus itself is presented as a lattice model, which we considered in detail in [5]. The thesaurus can also store exclusively the patterns of sense's standards rather than the standards themselves. In this case, it is compulsory to introduce specific structures (4) for each USNL into the base of specific definitions.
Syntactic relation
d_syn_rel > d_syn_rel[25.08897144, [wm["X9", "a"], wm["X11", "ogo"]], ["", "a"], ["ii", "ogo"], "X11", ["the main thing is the stem:X9", "the main thing is the flexion:a", "flexion:ogo", "X9:a", "ogo:a"]]
Fig. 19. Syntactic relation for Russian "riska jempiricheskogo".
Fig. 20. FC of the sense's standard according to the results of compiling the patterns of two USNLs.
Regardless of the form in which the units of knowledge are presented in the thesaurus (standards or their patterns), the purpose of the base of patterns is to store the patterns of the USNLs described by structures (3). For a pair of structures $S_1^P = (Id_1^P, T_1^P, K_1^P)$ and $S_2^P = (Id_2^P, T_2^P, K_2^P)$ from the base of patterns, the block merging the patterns builds a new structure $S_3^P = (Id_3^P, T_3^P, K_3^P)$: $T_3^P = T_1^P \cup T_2^P$, $K_3^P = K_1^P \cup K_2^P$ only when it finds, for each USNL corresponding to $S_1^P$, a USNL corresponding to $S_2^P$ that describes the same fact of reality (in the opinion of a native speaker of the NL) on the basis of all the specific definitions available in the base. The procedure for merging the patterns is initiated by the user.
Syntactic relation
d_syn_rel > d_syn_rel[15.01982574, [wm["X6", "ost'"], wm["X10", "i"]], ["oi", "ost'"], ["a", "i"], "X10", ["the main thing is the stem:X6", "the main thing is the flexion:ost'", "flexion:i", "X6:ost'", "i:ost'"]]
Fig. 21. Syntactic relation for Russian "zanizhennost oshibki".
The purpose of the block forming the tasks is to organize the interaction between the system and an expert teacher in an assigned subject field in the process of composing a test. Each test task is the combination of a question and a pattern given by (3) for a correct response, together with the corresponding specific structures (4) based on specific definitions. The text of a question is entered in a special text editor, and the teacher immediately puts the required pattern, chosen from those formed preliminarily, together with the accompanying structures (4), into correspondence with the entered question. With allowance for the properties of real tests, it makes sense to join tasks into packages; the TASKS base in Fig. 22 separately stores the tasks themselves and the packages of tasks in order to organize tests with varying degrees of complexity.
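The separation of tasks and packages of tasks can be pictured with a minimal data model. The classes `Task` and `Package`, their field names, and the helper `next_task` are hypothetical; the sketch only assumes that a task refers to its pattern by some identifier and carries the accompanying definition records. The question texts are invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    question: str
    pattern_id: str                                   # reference to a pattern (3)
    definitions: list = field(default_factory=list)   # accompanying structures (4)


@dataclass
class Package:
    name: str
    tasks: list = field(default_factory=list)


def next_task(package, index):
    """Return the task after position index, or None when the package is done."""
    following = index + 1
    if following < len(package.tasks):
        return package.tasks[following]
    return None


first = Task("What causes an underestimated training error?", "pattern-1")
second = Task("How does retraining affect empirical risk?", "pattern-2")
package = Package("overfitting basics", [first, second])

print(next_task(package, 0).question)
print(next_task(package, 1))
```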
Table 4. Correspondence of word combinations to the conditions of rule 1
Word combination                 Condition    Word combination                Condition
undesirable retraining           2            the underestimate of an error   6
retraining serves as             2            an average error                2
serves as a reason               2            an error based on a sample      3
a reason for the underestimate   2            a training sample               2
Table 5. Usage situations for natural language
Number  What is described by a USNL
1       Relation between retraining and empirical risk
2       Relation between the excessive complication of the model and underestimate of an average error based on a training sample
3       Influence of readjustment on the frequency of errors of the decision-making tree
4       Reason for the underestimate of the generalizing capability of the algorithm
5       Dependence of the estimate of a recognition error on the choice of a decisive rule
6       Dependence of the generalizing capability of the logical classification algorithm on the number of regularities of an algorithmic composition
Fig. 22. Architecture of the system for computer-aided testing of knowledge. Diagram labels: SD, SE phrases, BFS, BFTh, BFR, thesaurus, SR, BFP, BMP, Patterns, BFTs, Tasks, BST, Student, Question, Test, Answer, correct/incorrect.
According to the tasks chosen by the student, the TEST block implements the testing procedure. After a trainee enters a response, the system attempts to apply pattern (3) of the correct response, taking into account the specific structures (4) assigned to this pattern within the task. If the comparison succeeds, the response of the tested person is identified as correct, the work with the task ends, and either the transition to the next task in the package takes place (if the package contains more than one task) or the package is exited. If the comparison does not succeed, an attempt is made to apply other patterns and, in the case of success, to prove the presence of the similarity relation between the usage situations for natural language of the trainee's response and of the correct response, according to the definition of this relation (including the numeric value of the similarity measure) formulated in [5]. If the proof succeeds, the similarity measure is calculated for the USNLs. The obtained numeric value is used as the basis for giving marks to tested persons, as well as for collecting statistics. If there is no appropriate pattern in the base, an attempt is made to find an approximate solution in the form of formal context (1) for the trainee's response, using the recursive procedure based on rule 1. If this procedure is applied successfully, an attempt is made to prove that there is a similarity between the USNL of the correct response and that of the trainee's response. In this case, the built formal context represents the USNL of the trainee's response. If the similarity of the USNLs is proved successfully, the value of its measure is calculated. If either the procedure based on rule 1 or the proof of similarity fails, the value of the indicated measure is zero, and the response to the question of the test is identified as incorrect.
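The order of the checks in the testing procedure can be summarized in a short sketch. All names are hypothetical: `matches`, `similarity`, and `build_by_rule1` are placeholder callbacks for the pattern comparison, the similarity proof with its measure from [5], and the recursive procedure based on rule 1. The verdict label "similar" and the choice to move on to rule 1 when another pattern matches but similarity is not proved are assumptions of the sketch, and the toy patterns and the value 0.8 are arbitrary illustration data.

```python
def assess(response, correct_pattern, other_patterns,
           matches, similarity, build_by_rule1):
    """Return (verdict, measure) for one trainee response."""
    if matches(response, correct_pattern):
        return ("correct", None)
    for pattern in other_patterns:
        if matches(response, pattern):
            measure = similarity(pattern, correct_pattern)
            if measure is not None:       # similarity proved
                return ("similar", measure)
    context = build_by_rule1(response)    # approximate solution, or None
    if context is not None:
        measure = similarity(context, correct_pattern)
        if measure is not None:
            return ("similar", measure)
    return ("incorrect", 0.0)


# Toy stand-ins: a "pattern" is just a set of accepted phrasings.
correct = frozenset({"overfitting lowers the training error"})
near = frozenset({"an overcomplicated model lowers the training error"})


def toy_matches(response, pattern):
    return response in pattern


def toy_similarity(a, b):
    return 0.8 if a == near and b == correct else None


def toy_rule1(response):
    return None


for answer in ("overfitting lowers the training error",
               "an overcomplicated model lowers the training error",
               "something unrelated"):
    print(assess(answer, correct, [near], toy_matches, toy_similarity, toy_rule1))
```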
The ratio between the sizes of the thesaurus when it is built from the formal contexts given by (1) for all the SE phrases of each USNL and when it is built from the sense's standards, for a given number of usage situations for natural language currently presented in the thesaurus, can serve as a qualitative characteristic of the process of forming sense's standards as a whole. As an example, this ratio is presented in Fig. 23 for the usage situations for natural language from Table 5, which correspond to the NL descriptions of facts of the subject field "Mathematical Methods for Learning by Precedents".
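The ratio can be computed directly from the two thesaurus sizes. The function name `size_reduction` and the byte counts below are hypothetical placeholders chosen for the illustration; they are not values read from Fig. 23.

```python
def size_reduction(size_before, size_after):
    """Return (ratio, reduction) for two thesaurus sizes given in bytes."""
    if size_before <= 0:
        raise ValueError("size_before must be positive")
    ratio = size_after / size_before
    return ratio, 1.0 - ratio


# Hypothetical sizes in bytes for one number of USNLs in the thesaurus.
ratio, reduction = size_reduction(60000, 30000)
print(f"ratio = {ratio:.2f}, reduction = {reduction:.0%}")
```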
CONCLUSIONS
By introducing the sense's standard based on the set of SE phrases, the size of the knowledge base used to estimate the similarity between USNLs is reduced, on average, by 40–50% in the case of their independent generation.
Fig. 23. Size of the thesaurus for the different number of USNLs. Horizontal axis: number of USNLs in the thesaurus (1–6); vertical axis: size in bytes (0–80000). Series: size of the thesaurus before distinguishing standards; size of the thesaurus based on standards.
Forming the model of the sense's standard by distinguishing the syntagmatic dependences relative to a set of SE phrases permits the properties of a specific restricted subject subset of Natural Language to be taken into account most fully. In this case, the accuracy of the solution is estimated numerically by the average number of undistinguished (missed) attributes per object of the formal context for the formed standard.
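The accuracy estimate can be written as a short function. The sketch assumes that, for each object, a set of expected attributes is available for comparison (for example, from an expert); the source does not state how missed attributes are identified, so the argument `expected` and the function name are hypothetical, and the toy data are illustrative.

```python
def mean_missed_attributes(built, expected):
    """Average number of expected attributes missing per object of the built context."""
    if not built:
        raise ValueError("the formal context has no objects")
    missed = sum(
        len(expected.get(obj, set()) - attrs) for obj, attrs in built.items()
    )
    return missed / len(built)


built = {
    "prichin": {"flexion:oi", "oi:it"},
    "zanizhenn": {"flexion:osti"},
}
expected = {
    "prichin": {"flexion:oi", "oi:it", "sluzh:it"},
    "zanizhenn": {"flexion:osti"},
}
print(mean_missed_attributes(built, expected))
```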
The suggested USNL pattern model actually serves as a basis for forming syntactic strategies and rules for a restricted subject subset of NL. When the thesaurus of a subject field is built in the form of a formal context based on patterns of usage situations for natural language, the patterns themselves enable the types of syntactic relations to be distinguished automatically as classes of formal concepts.
The accuracy of building the sense's standard increases when the base of syntactic relations, formed from the patterns of different USNLs in the assigned subject field, is involved.
ACKNOWLEDGMENTS
The work was supported by the Russian Foundation for Basic Research, project no. 10-01-00146.
REFERENCES
1. G. M. Emelyanov and D. V. Mikhailov, "Formalization of the Word's Lexical Meaning in a Problem of Recognition of Natural Language's Statements's Synonymy's Situations," in Proc. 8th Int. Conf. Pattern Recognition and Image Analysis: New Information Technologies (PRIA-8-2007) (Mari State Technical Univ., Yoshkar-Ola, 2007), Vol. 2, pp. 253–257.
2. B. Ganter and R. Wille, Formal Concept Analysis: Mathematical Foundations (Springer-Verlag, Berlin, 1999).
3. D. V. Mikhailov and G. M. Emelyanov, "Forming and Clustering of Syntactic Relations on the Basis of Natural Language's Using's Situations," in Interactive Systems and Technologies: the Problems of Human-Computer Interaction. Collection of Scientific Papers (UlSTU, Ulyanovsk, 2009), Vol. 3, pp. 295–307 [in Russian].
4. P. D. Turney, "The Latent Relation Mapping Engine: Algorithm and Experiments," J. Artificial Intell. Res., No. 33, 615–655 (2008).
5. D. V. Mikhailov and G. M. Emelyanov, "Text's Semantic Similarity in the Problem of Automatic Knowledge Inspection," in Proc. 8th Int. Conf. Intellectualization of Information Processing IIP-2010 (Maks Press, Moscow, 2010), pp. 516–519 [in Russian].
6. K. S. Ostanin, "TestEkzamenator System for Computer Testing," in Proc. Int. Conf. Information Technologies in Education (ITE 2003) (Moscow, 2003), Available from: http://www.bitpro.ru/ito/2003/VI/VI 0 2562.html
7. M. B. Chelyshkova, Theory and Practice for Generating the Educational Tests (Logos, Moscow, 2002) [in Russian].
Gennadii M. Emelyanov. Born 1943. Graduated from the Leningrad Institute of Electrical Engineering in 1966. Obtained his PhD (Kandidat Nauk) and his Doctoral (Doctor Nauk) degrees in 1971 and 1990, respectively. From 1993 to 2003, Dean of the Faculty of Mathematics and Computer Science at Yaroslav the Wise Novgorod State University. Now he is a professor of the Department of Information Technologies and Systems at the same university. Scientific interests: construction of problem-oriented computing systems for image processing and analysis.
Dmitrii V. Mikhailov. Born 1974. Graduated from Yaroslav the Wise Novgorod State University, Novgorod, in 1997. Obtained his PhD (Kandidat Nauk) degree in Physics and Mathematics in 2003. From 2000 to 2007, worked at the Department of Computer Software of Novgorod State University. Now he is a Docent of the Department of Information Technologies and Systems at the same university. Since 2002, a member of the Russian Association for Pattern Recognition and Image Analysis. Scientific interests: computational linguistics and artificial intelligence. Author of 25 papers in the area of pattern recognition and image analysis.
PATTERN RECOGNITION AND IMAGE ANALYSIS Vol. 21 No. 4 2011
Pleiades Publishing, Ltd. 2011