Abstract: Original short videos are popular with the general public for their originality and personality. A large amount of information must be digitized before it can be transmitted or received, and video is the most complex type of multimedia information because it integrates images, sound, and text. Video is widely used because it carries a large amount of information, is easy to disseminate, and is intuitive and vivid. Online English course vocabulary learning can make effective use of short video resources together with intelligent retrieval and mining technologies to build a short video system with multi-dimensional annotations (grammar, semantics, and themes) and a shared platform for multi-dimensional automated retrieval. This paper presents a short video mining teaching model based on intelligent retrieval technology. Short videos, as a new media dissemination method, have extended to the field of art education. Their concise, interesting, and fragmented characteristics have brought a major breakthrough to traditional art education.
Keywords: Short Videos; Data mining; Online English course vocabulary learning
(ProQuest: ... denotes formulae omitted.)
1. Introduction
From an industry perspective, the growth of the short video market will not stop here. Mobile media carrying short video transmission has occupied the center of people's daily lives, and popular culture represented by original short videos remains the mainstream in the consumer culture market (Liu et al. 2014). This rectification is conducive to promoting the demand for high-quality original content in the short video market. In the face of massive video data, traditional ways of browsing videos, such as fast forwarding or rewinding to find the main information, can no longer meet users' need to obtain professional knowledge quickly (Hsiao, 2014). In order to better manage and utilize these rich video resources, automatic video description can help users improve the indexing speed and search quality of online videos. Short videos are usually disseminated through new media on the internet, with durations ranging from a few seconds to 20 minutes. They carry a huge amount of traffic and strongly attract both the subject and the object of art education (Paker et al. 2017). This new model reflects the transformation of traditional educational ideas, educational concepts, and teaching forms, and plays an extremely important role in the development of art education. In recent years, research on multimedia learning in the field of second language acquisition has continued to deepen, especially regarding the impact of visual and auditory inputs on vocabulary acquisition (Juffs and Friedline, 2014). Research has shown that visual input can promote vocabulary memory, while auditory input contributes to vocabulary comprehension and use (Chatpunnarangsee, 2013). Short videos, as a medium that contains both visual and auditory elements, are expected to play an important role in English vocabulary acquisition.
This article adopts quantitative analysis methods to evaluate the impact of short videos on English vocabulary acquisition by collecting and analyzing data (Sharma et al. 2014). First, we selected a series of short videos containing English vocabulary and identified and classified the vocabulary within them. Then, we designed an experiment in which a number of English learners watched these short videos and learned and memorized the vocabulary in them. Finally, we evaluated the impact of short videos on English vocabulary acquisition by testing the learners' vocabulary mastery (Belkhir et al. 2013).
Multimedia computer-assisted foreign language teaching offers a good teaching method and auxiliary means. Interpretation teachers can use its auxiliary functions to construct a learning environment suited to students' cognitive characteristics and learning habits, and to stimulate students' enthusiasm and initiative. The application of short videos in English classrooms is still a new research field with great room for exploration. The exploration of teaching models based on short videos has enriched the research content of linguistics. With their rich themes, practical information, and convenient functions, short videos provide a large number of examples for ESP teaching, facilitate students' exposure to real and vivid language materials, and offer new ideas, perspectives, and methods for foreign language teaching and research (Caws et al. 2013). With the development of parallel short video research and application, the demand for parallel short video database construction continues to increase. In addition to the basic work of collecting short videos and organizing corpora, parallel short videos, unlike general monolingual corpora, also require corresponding units to be identified and aligned during construction (Mohamad et al. 2017; Attia et al. 2014). But how can these impressions be objectively presented to third parties? Solving this problem requires the electronic measurement advantages of text data mining (Jeaco et al. 2015).
According to a questionnaire survey of students and their feedback, the short videos can be obtained directly from online resources and distributed CDs. Meanwhile, the classification of genres (oral and written) is based on the standards of large general corpora such as BNC and LLC, and is then subdivided according to the principle of "subject based, source based" (Shah et al. 2015). In practical applications, text data mining has three main elements: data extraction, data analysis, and visualization of the analysis results. That is to say, text data mining must consider how to reduce errors, how to collect the necessary information correctly and effectively, and which methods should be used to analyze the information scientifically (Lihui, 2015). We then develop an automatic recognition software module for English noun phrases.
The main function of the noun phrase recognition module is to call the CLAWS coding software automatically and to recognize and align noun-related units (Schneider et al. 2016). We also develop a parallel short video construction platform tool, whose main function is to provide a testing environment for the automatic recognition and alignment of corresponding units. Integrated into this platform, the English noun phrase recognition module achieves automatic recognition of English noun phrases. This not only verifies the applicability of short video platforms in ESP teaching, but also further expands the application scope of existing theories (Wang, 2023).
The main content of this paper is an analysis of strategy optimization for vocabulary learning in web-based English courses. Combining theory with practice, the paper focuses on a solution for vocabulary learning in web-based English courses based on short video data mining. The main contributions of this paper are as follows:
1. In this paper, a data mining algorithm based on Short Videos is adopted.
2. This paper designs a data mining-based vocabulary learning strategy optimization model for online English courses.
3. This paper proposes a Short Videos-based data mining algorithm. Based on the algorithm, a semantic analysis model is established to provide technical support for solving the problem of vocabulary learning strategy optimization in online English courses.
4. This study is based on the forefront of disciplinary development. The application of short video platforms will provide new perspectives and methods for ESP teaching and research, which is a cutting-edge research field at home and abroad.
2. Related Work
In the past 20 years, with the rapid development of network technology, many different types of short videos have emerged for various purposes. They play an increasingly important role in the theoretical and applied research of linguistics, as well as in the compilation of dictionaries and textbooks. The theoretical research and teaching practice of short videos are receiving increasing attention, and teaching and research abroad have developed significantly. Research areas include classification, teaching methods, needs analysis, textbook design, teacher training, short video research, and evaluation testing. With the emergence and development of parallel short videos based on comparative language analysis and translation research, researchers have further proposed the corresponding unit on the basis of the translation unit. A corresponding unit is any segment or sequence in the target language text that fully corresponds in meaning to a segment of the source language text and has clear boundaries. In the 1990s, driven by the needs of foreign exchange in fields such as the natural sciences, humanities, social sciences, economy, and trade, vocabulary learning and teaching in online English courses gradually emerged in China. While introducing relevant theories, Chinese scholars have begun to pay attention to the design of online English textbooks and to successful cases of online English courses at home and abroad.
Establishing diversified short video resources suitable for the autonomous learning of college English learners will become an important topic in college English teaching reform and short video linguistics. In recent years, the sharing of short video indexing tools has made it possible for corpora to become an advanced learning tool and resource in foreign language teaching. In fact, the corresponding unit is the embodiment of the translation unit in parallel short videos; it is the substantive translation unit. The two sequences in a corresponding unit are instances of the "source language expression" and the "equivalent expression of target language text" in the translation unit. Based on the personalized, interactive, and adaptive characteristics of short video data mining, exploratory teaching methods have shifted English teaching from a traditional teacher-centered and teaching-centered approach to a student-centered and social-service approach, marking a significant change in language teaching and a strategic shift in linguistic research. With the rapid development of the internet, its rich data sources and convenient communication methods have made up for the shortcomings of traditional multimedia teaching. With the help of the internet, we can further promote and develop computer-aided interpretation teaching practices.
3. Methodology
A short videos database is a large-scale database that uses computer technology to process and store large amounts of natural language material for automatic retrieval, indexing, and statistics. Since the 1950s, the construction and use of the first generation of large electronic corpora, the BROWN and LOB short videos, has marked the arrival of the world's first machine-readable short videos. With its development and growing popularity, the network has become the largest and fastest information dissemination platform. The timeliness of online short videos is unmatched by traditional media (such as newspapers and periodicals). Undoubtedly, the network should be the main collection place for political news corpora. We then use text data mining analysis tools to construct a keyword database on top of this database and analyze both the data as a whole and specific keywords. Finally, the analysis results are evaluated comprehensively and discussed in depth. Students are given the methods of short videos collection and the standard of short videos annotation, together with tools for making multimedia short videos, and are invited to participate in short videos collection and construction. Under the guidance of teachers, students can create a short videos database and carry out meaningful language practice activities at the same time.
All documents in the test data are segmented and the word segmentation time is recorded. The documents are then divided by size into a 0-10 word group and a 10-20 word group, and the average word segmentation time of each group is calculated. The composition of each group of documents and the time consumed for word segmentation are shown in Table 1 and Figure 1.
Three articles were randomly selected as test cases for word segmentation accuracy. The results of the tests are shown in Table 2 and Figure 2. The experimental results show that the word segmentation precision of the designed natural language-based lexical analyzer is directly related to the length of the document to be processed. In general, the longer the document, the higher the word segmentation accuracy.
Vocabulary learning in online English courses has unique advantages. Rich network resources provide a large number of natural, real, and vivid language materials for English teaching. Both students and teachers can learn English and Chinese vocabulary and master new and popular words through network resources. The network can provide a real and natural environment for language learning and communication in online English courses. It provides valuable practical materials for the teaching of related courses, and advanced short videos retrieval methods provide technical support for inquiry learning, which is conducive to students' autonomous learning. Short videos play an increasingly important role in modern linguistics research and education because of their large capacity, representativeness, authenticity, and fast and accurate retrieval. The recognition of noun phrases is an important subtask in natural language processing. Its results can simplify sentence structure, reduce the difficulty and complexity of syntactic analysis, and provide a basis for further phrase analysis and syntactic analysis. Text data mining does not prescribe a specific analysis method or process. It can take many forms: some approaches extract high-frequency words from text data and summarize descriptive statistics, while others group the observed data.
Word segmentation is performed on all documents in the test data and the time spent on keyword calculation is recorded. Documents are divided by size into a 0-10 word group and a 10-20 word group. There are 2 articles in each group and 4 articles in total. The average time of Chinese words in each group is calculated. For the feature selection program designed in this paper, the calculation time and processing speed of keywords are shown in Table 3 and Figure 3.
For comparison, keyword analysis was also run with the word segmentation demo of the massive word segmentation research version; its keyword calculation time is shown in Table 4 and Figure 4. The experimental results show that the time complexity of feature selection is linear in the length of the document to be processed. Compared with the massive tool, the feature extraction of our system still lags in efficiency. The main reasons are that the design of the thesaurus, the choice of data structures, and the algorithm for calculating document frequency still need improvement, which is left for future work.
Real-time vocabulary exercises, synonym comparisons, collocation exercises, and similar activities can be developed through word indexing. Interactive exercises based on short videos indexing are generated from the contextual co-occurrence of vocabulary in discourse and then combined with a dynamic hypertext format to produce courseware that can be transmitted online for use in remote or LAN classrooms. Students can communicate with English learners of different levels through the Internet, and even with native English speakers, to practice more authentic expressions. This also improves students' cross-cultural knowledge and helps cultivate their comprehensive English ability. The network makes "individualized teaching" possible. In the process of noun phrase recognition, the software combines and calculates parameters of the English text according to a mathematical model designed beforehand in order to judge and recognize noun phrases. That is, the method obtains statistics by calculating these parameters and identifies noun phrases according to the statistics. Although the recognition rate is high, software built on this algorithm has a complex structure and requires supporting data of high quality and in large quantity. In addition, linking highly correlated keywords into a network not only shows the relationships between entries, but also helps reveal the relationship between entries and their times.
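The grouping-and-timing procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `segment` function is a stand-in regular-expression tokenizer, and the bucket width of 10 words simply mirrors the two groups named in the text.

```python
import re
import time
from collections import defaultdict


def segment(text):
    """Stand-in tokenizer (assumption): runs of letters, digits, or apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", text)


def size_group(n_words, width=10):
    """Label a document by its word-count bucket, e.g. '0-10' or '10-20'."""
    low = (n_words // width) * width
    return f"{low}-{low + width}"


def average_segmentation_time(documents):
    """Return {group label: mean seconds spent segmenting one document}."""
    timings = defaultdict(list)
    for text in documents:
        start = time.perf_counter()
        tokens = segment(text)
        elapsed = time.perf_counter() - start
        timings[size_group(len(tokens))].append(elapsed)
    return {group: sum(ts) / len(ts) for group, ts in timings.items()}
```

Real measurements would repeat each run several times, since a single call on a short document is dominated by timer noise.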
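As one concrete form of the high-frequency word extraction mentioned above, the following sketch counts word occurrences with the standard library. The tokenization rule and the optional `stopwords` parameter are illustrative assumptions rather than part of the described system.

```python
import re
from collections import Counter


def high_frequency_words(text, top_n=5, stopwords=frozenset()):
    """Return the top_n (word, count) pairs, ignoring case and stopwords."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", text)]
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(top_n)
```

Passing a stopword set is usually necessary in practice, because function words otherwise dominate the top of the list.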
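Word indexing of the kind described above is commonly realized as a keyword-in-context (KWIC) concordance. The sketch below is one simple way to produce such lines from a token list; the function name and the `window` parameter are hypothetical and chosen only for illustration.

```python
def kwic(tokens, keyword, window=3):
    """Return (left context, match, right context) for each hit of keyword."""
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines
```

Each returned triple could be rendered as a gap-fill exercise by blanking the middle element and asking the learner to supply the word from its context.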
4. Result Analysis and Discussion
English short videos play a supporting role in vocabulary learning and teaching in online English courses. First, English short videos can reflect the frequency of English vocabulary in actual use. High-frequency words, phrases, and idioms can be given emphasis or priority in teaching. Second, English short videos can reflect the common collocations of vocabulary. The application of short videos in college English teaching has so far been confined to the lexical and grammatical levels. Research remains insufficient on the construction of short videos-based vocabulary learning platforms for online English courses, on teaching modes and empirical studies of their effectiveness, and on how to apply short videos to teaching practice and combine them effectively with online English course vocabulary teaching. In the construction of the short videos, the most critical, cumbersome, and laborious task is to determine the corresponding units in the source language text and the target language text and to align them. For a long time, experts and scholars have tried to reduce this labor burden by writing software tools that support computer-aided manual identification and alignment of corresponding units, with good results. A detailed analysis can then examine how the frequency of these entries rises and falls in external and internal speech, and how terms related to business philosophy are used in different periods.
Two articles were randomly selected as test cases to analyze the feature selection accuracy. The lengths of the two articles fall in the ranges 0-400 and 400-800. For each article, the first 24 keywords are extracted and compared with the keywords extracted by the massive tool, and the number of identical keywords is recorded, as shown in Table 5 and Figure 5. The experimental results show that the agreement with the keyword extraction of the massive technology company is lower when the document is longer.
All the words that appear in Y can be regarded as P, or all the phrases appearing in Y can be used as F, thereby improving the accuracy of the feature representation. P is generally defined as the probability of occurrence in Y, and its expression is:
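The comparison step above (count how many of the first 24 keywords two extractors share) can be sketched as follows. The function name and the returned agreement ratio are assumptions added for illustration; the text itself only records the number of identical keywords.

```python
def keyword_agreement(system_keywords, reference_keywords, top_n=24):
    """Return (number of shared keywords, share of the system's top_n they cover)."""
    system_top = system_keywords[:top_n]
    reference_top = set(reference_keywords[:top_n])
    shared = [k for k in system_top if k in reference_top]
    return len(shared), len(shared) / max(len(system_top), 1)
```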
... (1)
If this happens frequently, it is necessary to modify the predefined categories and then re-perform the above training and classification process. When calculating P, there are many ways to choose. The simplest method is to consider only the degree of overlap of terms contained in two feature vectors. Namely:
... (2)
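Because the formula itself is omitted in this copy, the following is only a sketch of one common way to measure the degree of term overlap between two feature vectors. It uses the overlap coefficient (shared terms divided by the size of the smaller term set), which is an assumption and may differ from the normalization used in equation (2).

```python
def term_overlap(terms_a, terms_b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|), or 0.0 if either set is empty."""
    a, b = set(terms_a), set(terms_b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0
```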
Among them, P is the same number of entries as X, and I is the same number of entries as A. The most commonly used method is to consider the cosine of the angle between the two feature vectors, that is:
... (3)
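The cosine measure referred to above has a standard definition: the dot product of the two vectors divided by the product of their lengths. The sketch below assumes each feature vector is stored as a dictionary from term to weight, which is a representation chosen here for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for an empty vector
    return dot / (norm_u * norm_v)
```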
The Boolean model is a simplification of the mathematical model. It defines a binary mapping function W, so that the value assigned to an entry A is no longer a weight but a Boolean value. The text is then represented as a binary vector, given by:
... (4)
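A Boolean text representation of the kind described above can be sketched in a few lines. The fixed `vocabulary` list that determines the vector positions is an assumption of this example.

```python
def boolean_vector(document_terms, vocabulary):
    """Return a 0/1 vector marking which vocabulary terms occur in the document."""
    present = set(document_terms)
    return [1 if term in present else 0 for term in vocabulary]
```

Repeated occurrences of a term make no difference here, which is exactly what distinguishes the Boolean model from a weighted one.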
Information gain is an entropy-based evaluation method that draws on a fair amount of mathematical theory and entropy formulas. It is defined as the difference in information entropy before and after a feature appears in the document. According to the training data, the information gain of each feature word is calculated, words with small information gain are deleted, and the rest are sorted by information gain in descending order. The information gain evaluation function is defined as:
... (5)
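Since equation (5) is omitted in this copy, the sketch below follows the textbook definition of information gain for feature selection: the entropy of the class labels minus the expected entropy after splitting documents by presence or absence of the term. The document representation, a list of `(set_of_terms, label)` pairs, is an assumption, and the exact form may differ from the authors' equation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(term, documents):
    """Entropy of the labels minus expected entropy after splitting on the term."""
    total = len(documents)
    if total == 0:
        return 0.0
    labels = [label for _, label in documents]
    with_term = [label for terms, label in documents if term in terms]
    without_term = [label for terms, label in documents if term not in terms]
    conditional = (len(with_term) / total) * entropy(with_term) \
        + (len(without_term) / total) * entropy(without_term)
    return entropy(labels) - conditional
```

Ranking the vocabulary by this score and dropping the lowest-scoring words corresponds to the selection step described in the paragraph above.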
Assuming an entry K and a class I, the mutual information F is defined as:
... (6)
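With equation (6) omitted, the following sketch uses the pointwise mutual information commonly applied in text feature selection, log2 of P(t, c) / (P(t) P(c)), estimated from document counts. The data layout and the choice to return negative infinity when the term never occurs in the class are assumptions of this example.

```python
import math


def mutual_information(term, category, documents):
    """Pointwise mutual information (bits) between a term and a category."""
    total = len(documents)
    n_term = sum(1 for terms, _ in documents if term in terms)
    n_cat = sum(1 for _, label in documents if label == category)
    n_both = sum(1 for terms, label in documents
                 if term in terms and label == category)
    if n_both == 0:
        return float("-inf")  # term and category never co-occur
    return math.log2((n_both * total) / (n_term * n_cat))
```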
The total number of words in the experiment was 163,170 of which were used to train the classifier and 103 were used to verify the training effect. Of the 193 test sets, 49 were legitimate and 10 were spam. By synthesizing and comparing the neural network algorithm, the distribution diagrams of recall rate and accuracy rate of the following experimental data are obtained, as shown in Figure 6 below.
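The recall and accuracy (precision) rates reported for the classifier can be computed from true and predicted labels as sketched below. Treating "spam" as the positive class is an assumption made for this example; the data in the sketch is not taken from the experiment.

```python
def precision_recall(true_labels, predicted_labels, positive="spam"):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```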
Network resources can provide a channel for obtaining short videos, but they raise two problems: the authenticity of the short videos source and the long process of screening the nearly unlimited information on the network. One solution is to build a small short videos first and gradually grow it into a large one by accumulating data. At the same time, research on the evaluation mechanism of the online English vocabulary teaching model based on short videos has not yet attracted scholars' attention. The online English vocabulary teaching model based on a short videos resource platform, and its evaluation mechanism, are therefore areas that urgently need to be explored in our country's research. A corresponding unit is a word or word combination (hereinafter referred to as a word sequence) whose meaning remains relatively stable after it is taken out of context. The size of the text segment in a corresponding unit is usually described by one of two terms: the corresponding unit granularity or the corresponding level. If the text segments in the corresponding units of a short videos are sentences, we say that the corresponding unit granularity of the short videos is the sentence, or that its corresponding level is the sentence. In summary, this method not only encourages students to use their time to learn English, but also expands the scope of short videos collection and, through a controlled screening process, yields short videos with more substantial content.
For words with equal conditional probability, rare words score higher than common words, so scores are not comparable across words with widely different frequencies. As a result, the mutual information evaluation function is not well suited to selecting high-frequency words and may choose rare words as the best features of a text. The S statistical assessment is defined as follows:
... (7)
Expected cross-entropy only takes into account the feature items that occur in the text. Therefore, the expected cross-entropy is better than the information gain in feature selection. The formula for calculating the expected cross-entropy is defined as follows:
... (8)
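Equation (8) is omitted in this copy. A form of expected cross-entropy often used in the feature selection literature is P(t) multiplied by the sum over classes of P(c|t) log(P(c|t) / P(c)); the sketch below implements that form as an assumption, using a list of `(set_of_terms, label)` pairs as hypothetical input.

```python
import math
from collections import Counter


def expected_cross_entropy(term, documents):
    """P(t) * sum_c P(c|t) * log2(P(c|t) / P(c)), estimated from document counts."""
    total = len(documents)
    with_term = [label for terms, label in documents if term in terms]
    if total == 0 or not with_term:
        return 0.0  # only terms that actually occur contribute
    p_term = len(with_term) / total
    class_counts = Counter(label for _, label in documents)
    score = 0.0
    for label, n in Counter(with_term).items():
        p_c_given_t = n / len(with_term)
        p_c = class_counts[label] / total
        score += p_c_given_t * math.log2(p_c_given_t / p_c)
    return p_term * score
```

Unlike information gain, the sum here never visits the case where the term is absent, which matches the remark that only occurring feature items are taken into account.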
Text evidence weight is a relatively new evaluation function, which measures the difference between the probability of a class and the conditional probability of a given feature. The calculation formulas are as follows:
... (9)
The conditional probability of the i class when the term t appears:
... (10)
The probability that entry t does not appear in category I:
... (11)
The co-occurrence probability of entries M and N:
... (12)
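The three probabilities named above can be estimated from document counts as sketched below. Equations (10) to (12) are omitted in this copy, so the estimators are assumptions: in particular, "the probability that entry t does not appear in category I" is read here as the share of documents in that category that lack the term.

```python
def p_class_given_term(term, category, documents):
    """Estimate P(category | term) as a ratio of document counts."""
    with_term = [label for terms, label in documents if term in terms]
    if not with_term:
        return 0.0
    return sum(1 for label in with_term if label == category) / len(with_term)


def p_term_absent_in_class(term, category, documents):
    """Estimate the share of documents in the category that do not contain the term."""
    in_class = [terms for terms, label in documents if label == category]
    if not in_class:
        return 0.0
    return sum(1 for terms in in_class if term not in terms) / len(in_class)


def p_cooccurrence(term_m, term_n, documents):
    """Estimate the probability that both terms occur in the same document."""
    if not documents:
        return 0.0
    both = sum(1 for terms, _ in documents if term_m in terms and term_n in terms)
    return both / len(documents)
```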
In order to select the best parameters of the mail filter, the classification experiment is carried out using the verification data, and the best parameters are selected from the experimental results to adjust the filter. The experimental results of the verification data classification are shown in Figure 7.
Constructivism holds that learning is the process by which learners continually construct cognitive information. The short videos, with its large number of real language materials and co-occurrence functions, allows learners to observe the characteristics and rules of language use in a faithful reproduction of language information. In addition, software can automatically generate a short videos with a larger corresponding unit granularity from one with a smaller granularity. Taking a short videos whose corresponding unit granularity is the word sequence as an example, the software uses the saved corresponding units and their alignment information. Learners should also analyze and sort out the information they hear in a limited time, quickly grasp its explicit and implicit meanings, and then take effective notes in time and keep them. Therefore, from the perspective of text data mining technology, shorthand teaching design should be based on the ability to apply the source language and on a reserve of pan-professional knowledge. Teachers' guidance is still the most approachable and direct form of teaching, and students' initiative is still the main driving factor of learning. The network and short videos will not weaken the role of teachers; rather, they should be placed within the overall teaching plan to save time, provide convenience and room for imagination for teachers, and integrate organically with real classroom learning.
5. Conclusions
From the perspective of industry development, short videos still have a wide audience base and significant room for development, and the popular culture they represent remains the mainstream of cultural market consumption. In order to better manage and utilize these rich video resources, automatic video description can help users improve the indexing speed and search quality of online videos. This article proposes a short video classification and retrieval technology based on image semantic description and elaborates on its application in art education, in order to promote the progress of short video semantic description technology and the development of art education. The teaching mode based on short videos has broken through the time and space limitations of traditional art education, shifting from teacher-led instruction to students watching short videos, which helps to share high-quality art education course resources. By using information retrieval technology and sharing teaching resources in the network environment, we can make effective use of short videos resources and intelligent retrieval and mining technologies. We have created a short videos system with syntax, semantics, and topic annotations, together with a shared platform for multi-dimensional automatic retrieval. On the basis of short videos resources, we have also constructed a teaching model built on a language warehouse, data mining, and intelligent retrieval technologies.
References
Liu H J, Lan Y J, Ho Y Y, (2019). Exploring the Relationship between Self-Regulated Vocabulary Learning and Web-Based Collaboration, Journal of Educational Technology & Society, 17(4): 404-419.
Hsiao-I H. (2017). Teaching Specialized Vocabulary by Integrating a Short videos-Based Approach: Implications for ESP Course Design at the University Level, English Language Teaching, 7(5): 6-12.
Paker T, Yeliz E. (2019). The Effectiveness of Using Short videos-Based Materials in Vocabulary Teaching., Online Submission, 5(14): 62-81.
Juffs A, Friedline B E. (2021). Sociocultural influences on the use of a web-based tool for learning English vocabulary, System, 42:48-59.
Chatpunnarangsee K. (2018). Incorporating Short videos Technology to Facilitate Learning of English Collocations in a Thai University EFL Writing Course., ProQuest LLC, 240-256.
Sharma A K, Kaur P, Anand S K. (2019). Evaluation of Content Based Spam Filtering Using Data Mining Approach Applied on Text and Image Short videos, Advances in Intelligent Systems and Computing, 258:561-577.
Belkhir Z F. (2018). A Survey on Teachers' Awareness and Attitudes on Computer short videos Data: An Assisted Technology-based EFL Vocabulary Selection and Instruction Source, Procedia - Social and Behavioral Sciences, 103:77-85.
Caws C G. (2023). Evaluating a web-based video short videos through an analysis of user interactions, ReCALL, 5(1): 20-25.
Mohamad A F N, Puteh S N. (2017). A Short videos-Based Evaluation on Two Different English for Nursing Purposes (ENP) Course Books, Advances in Language & Literary Studies, 5(15): 8-13.
Attia M, Pecina P, Toral A, et al. (2014). A short videos-based finite-state morphological toolkit for contemporary arabic, Journal of Logic and Computation, 24(2): 455-472.
Jeaco, Stephen. (2015). The Prime Machine : a user-friendly short videos tool for English language teaching and self-tutoring based on the Lexical Priming theory of language, 12(2): 36-42.
Shah M, Chakrabarti C, Spanias A. (2020). Within and cross-short videos speech emotion recognition using latent topic model-based features, EURASIP Journal on Audio, Speech, and Music Processing, 2015(1): 4-10.
Lihui Z A. (2020). Short videos-based Study of Collocational Use in Oral Production by Chinese EFL Learners, Foreign Language Learning Theory & Practice.
Schneider N, Hwang J D, Srikumar V, et al. (2016). A short videos of preposition supersenses in English web reviews, 36(2): 456-478.
Wang M. (2023). The Impact of Animation and Film English Education Environment on Students' Psychological Health, Revista Ibérica de Sistemas e Tecnologias de Informação, (E62): 660-671.
© 2023. This work is published under https://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Details
1 Dept of English, Faculty of Modern Languages and Communication, Universiti Putra Malaysia, 43400, UPM, Serdang, Selangor, Malaysia
2 Dept of Foreign Language, Faculty of Modern Languages and Communication, Universiti Putra Malaysia, 43400, UPM, Serdang, Selangor, Malaysia