A Behavior-Driven Forum Spammer Recognition Method with Its Application in Automobile Forums

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

In recent years, with the rise of social media, forums have become dedicated communities for users who share the same interests. An increasing number of users post related reviews in forums [1]. These reviews cover a wide variety of content, ranging from breaking news and discussions on various topics to posts about one’s personal life and the sharing of activities and interests [2]. As significant platforms for user discussion, some forums maintain a high level of user activity. In addition, feedback from forum users is usually an important source of information for potential consumers learning about product features. Enterprises also aim to discover product defects and real users’ requirements via reviews in forums.

Because people respond strongly to their initial exposure to erroneous information, it is difficult to correct its influence later. Once a network agrees on what happened, the collective memory becomes relatively resistant to competing information [3]. Thus, fake reviews in forums have become a serious problem for forum users and enterprises.

Many current studies indirectly identify fake reviews by recognizing forum spammers based on behavioral features or sentiment analysis methods [4–7]. However, forum spammers constantly update their technology or change their posting methods to avoid detection by fake review recognition systems, which renders many methods ineffective for recognizing forum spammers. Although forum spammers try to disguise themselves as ordinary users, such purposeful posting eventually exhibits behaviors that differ from those of ordinary users. Therefore, this paper changes the research target from understanding abnormal reviews and the suspicious relationships among forum spammers to discovering how they must behave (follow or be followed) to achieve their monetary goals. First, we classify forum users as automated spammers, marketing spammers, and normal users according to their different behavior patterns. Automated spammers are forum users controlled by spam software. They disguise themselves as normal users who display an intention to purchase a related product or express dissatisfaction toward it. Normally, automated spammers mislead forum users by posting reviews with a biased emotional tendency. Marketing spammers are real users who are hired by a spam company. In contrast to automated spammers, marketing spammers disguise themselves as leading users in forums to promote related products. They post deep, detailed, and positive reviews to overstate the quality of related products. In general, the more detailed the analysis, the more useful the information is to forum users [8–10]. Moreover, marketing spammers, as a new but contemptible marketing mode, are emerging in many forums [11]. Then, we propose a behavior-driven automated spammer recognition (ASR) model and a marketing spammer recognition (MSR) model to recognize forum spammers based on the above three types of forum users. The final experimental results illustrate that our behavior-driven recognition models are able to accurately detect forum spammers.

The paper is organized as follows: Section 2 reviews the related works. In Section 3, we define some variables to measure the behavior features of forum users. The proposed ASR and MSR models are introduced in Section 4. Subsequently, we describe the experimental dataset and discuss the main experimental results in Section 5. Finally, we conclude with a summary in Section 6.

2. Related Works

At present, research on recognizing spammers and fake reviews mainly focuses on social media such as Twitter. Some e-business websites, such as Amazon and Taobao, have also attracted growing research attention. In terms of recognizing forum spammers, a few studies have been conducted in recent years, mainly focusing on the recognition of fake forums and forum spam automator tools. Some recognition methods based on abnormal text content have also been proposed. Some researchers attempt to use abnormal URL characteristics in reviews and the link structure of the graph rooted at the posted URL to recognize posts from forum spammers [12, 13]. Additionally, contents unrelated to the target posts in the forum were used to recognize forum spammers [14]. Shin et al. [15] discovered some features and operational mechanisms of a forum spam automator tool named XRumer. This study provided some ideas for recognizing the forum spammers who used this tool. Some researchers proposed an approach that uses features such as the submission time of replies, thread activeness, position of replies, and spamicity of a forum user’s first post to construct a forum spammer recognition model [5]. The significant differences in the action time and action frequency between forum spammers and normal users were also used to construct a forum spammer recognition model [7]. The performance of the classifier in [6], with an integrated semantic analysis, was quite promising in the real-world case study, as confirmed with both supervised learning and unsupervised learning techniques by comparing a nonsemantic and semantic analysis. As demonstrated in [16], by analyzing the features of forum users, forum spammers, and forums, the authors found that every forum has many fake reviews, including some forums with good reputations.

However, our work found that the methods mentioned above no longer work well. For instance, most users are now able to easily distinguish rough and fake websites with many advertisements, so the number of fake reviews with URLs [12, 13] has become much lower. Additionally, we found that the recognition effect of the method in [14] would be compromised if a large number of forum spammers have occupied the forums. In our study, the abnormal feature named spamicity of the first post in [5] no longer works for recognizing forum spammers. At the same time, we found that marketing spammers exhibit the abnormal behavioral feature named the submission time of replies in [5], but we cannot find the same behavioral pattern among automated spammers. In [16], the method that recognizes spam pages based on spam content features is still effective, but this method cannot efficiently recognize forum spammers who have many reviews that are similar to those of normal users. In [6], the authors mentioned that once a mission is finished, a paid spam poster normally discards the user ID and never uses it again; that is, potential paid spam posters are not willing to continue their activities for a long time.

In recent years, research on spammers in social media and e-business websites has been increasing. Liu et al. [17] proposed a two-stage cascading model, named ProZombie, which balanced effectiveness and accuracy well in recognizing spammers in Weibo. In [18], message content, user behavior, and social relationship information were fully used to recognize spammers in Weibo. The work by Hayati et al. [19] proposed using a self-organizing map and neural networks to determine the features of spammers on the Internet. They classified spammers into four categories based on their different behavioral patterns: content submitters, profile editors, content viewers, and mixed behavior. Radford et al. [20] constructed an unsupervised representation learning system, which reached an accuracy of 91.8% in sentiment analysis by using Amazon reviews as training data. Furthermore, the authors in [12, 21] recognized fake reviews via the differences in emoticons, URLs, @ symbols, and photos between reviews from spammers and those from normal users. Dewang et al. [22] proposed a spam detection framework combining the PageRank algorithm to detect the spam hosts of websites. In [6], the authors distinguished fake reviews by using word segmentation for the text and calculating the emotional tendency. Jiang et al. [23] and Ratkiewicz et al. [24] found that spammers have a “synchronized” behavioral pattern for a particular target and that it is significantly different from that of normal users. A spam detection model called SkyNet, which uses user social networks and the photos posted in reviews, has been proposed by Sun and Loparo [25]. In [26], the final recognition accuracy for spammers was improved by 9.73% by integrating the social network and content information into a matrix decomposition-based learning model. The above recognition methods for spammers in social media and e-business websites are well developed. However, our work found that these methods cannot be directly used to recognize forum spammers, as they are not well adapted to the special behavioral patterns of forum spammers.

Our work is inspired by the idea of using noncontent-based features. Furthermore, Asghar et al. [27] also illustrated the effectiveness of spam-related features on improving the performance of spam detection works. Thus, we construct behavior-driven forum spammer recognition models by understanding how forum spammers must behave (follow or be followed) for monetary purposes. To the best of our knowledge, this work is the first to construct forum spammer recognition models based on forum users’ different behavioral patterns. In addition, we achieved promising experimental results on real-world forum datasets.

3. Observed Features

Automated spammers and marketing spammers often cooperate with each other to mislead forum users via the different roles they play in forums. In addition, the differences in roles they play inevitably lead to differences in the behavioral patterns they exhibit in forums. To recognize these forum spammers, in this section, the features of abnormal behaviors that are likely to be linked with the forum spammers are proposed and some variables are defined to measure these features. Subsequently, these variables can be exploited in our recognition models.

3.1. Automated Spammer Features

In this section, we perform a statistical analysis to investigate the objective features that are useful in capturing the reply behavior of automated spammers, and we define a relevant variable for each feature. The four features of automated spammers are described as follows.

3.1.1. Reply Manner

The work in [6] reported that spammers usually tend to post new comments because they do not have enough patience to read the comments and replies of others. The authors also proposed a response indicator (whether a comment is a new comment or a reply to another comment) to capture this abnormal behavior. However, automated spammers in forums never reply to the comments of others; they only post new replies. To recognize this more extreme abnormal behavioral pattern in forums, we define ${RM}_{i}$ as an indicator of whether forum user $i$ only posts new replies or has replied to other comments (even if only once): $${RM}_{i} = \begin{cases} 0, & \text{user } i \text{ never replies to another comment}, \\ 1, & \text{otherwise}. \end{cases} \tag{1}$$

As shown in Table 1, in the labelled dataset, 100% of automated spammers never reply to another comment, whereas only 1.68% of normal users behave this way. In contrast, most normal users in forums not only post new replies but also post many replies to the comments of others.

Table 1

Reply indicators.

RM	0 (%)	1 (%)
Automated spammers	100	0
Normal users	1.68	98.32
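
The indicator in equation (1) can be illustrated with a minimal sketch. The `Reply` record and its `parent_comment_id` field are hypothetical names introduced only for this example; the paper does not describe its data schema. A user with no replies at all is mapped to 0 here, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Reply:
    user_id: str
    # ID of the comment being answered, or None for a new reply posted
    # directly under the original post (hypothetical field name).
    parent_comment_id: Optional[str]


def reply_manner(user_id: str, replies: Iterable[Reply]) -> int:
    """Return RM_i: 0 if the user never replies to another comment, else 1."""
    for reply in replies:
        if reply.user_id == user_id and reply.parent_comment_id is not None:
            return 1
    return 0


# Toy data: "bot_1" only posts new replies, "alice" answers one comment.
sample_replies = [
    Reply("bot_1", None),
    Reply("bot_1", None),
    Reply("alice", None),
    Reply("alice", "c42"),
]
rm_bot = reply_manner("bot_1", sample_replies)
rm_alice = reply_manner("alice", sample_replies)
```

In this toy data, `rm_bot` is 0 and `rm_alice` is 1, matching the two cases of equation (1).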

3.1.2. Replies Number

Posting a large number of replies within a single minute also indicates abnormal behavior. As shown in Table 2, in the labelled dataset, some automated spammers post 30 or more replies in a single minute, which means that they post a reply every 2 seconds or less on average. To capture this abnormal behavioral pattern, we define ${MRN}_{i}$ as the maximum number of replies posted by forum user $i$ within a single minute. However, relying only on the maximum replies number may cause misjudgment, because normal users may also post a large number of replies at a certain point in time. Considering that this behavior pattern is frequent for automated spammers but only occasional for normal users, we define ${AVG\_MRN}_{i}^{n}$ as the average of the top $n$ per-minute reply counts of forum user $i$. Empirically, the value of $n$ is set to 10.

Table 2

Percentage of the number of replies.

MRN ≥	10 (%)	20 (%)	30 (%)
Automated spammers	6.29	0.98	0.39
Normal users	1.63	0.16	0
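
As a rough sketch of how these two variables might be computed, the example below buckets one user's reply timestamps by calendar minute. Two points are assumptions rather than details from the paper: "within a single minute" is read as a fixed calendar-minute bucket (not a sliding 60-second window), and when a user has fewer than $n$ active minutes the average is taken over the minutes available.

```python
from collections import Counter
from datetime import datetime
from typing import List


def replies_per_minute(timestamps: List[datetime]) -> List[int]:
    """Count replies in each calendar minute, sorted from largest to smallest."""
    buckets = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    return sorted(buckets.values(), reverse=True)


def mrn(timestamps: List[datetime]) -> int:
    """MRN_i: the largest number of replies posted within one minute."""
    counts = replies_per_minute(timestamps)
    return counts[0] if counts else 0


def avg_mrn(timestamps: List[datetime], n: int = 10) -> float:
    """AVG_MRN_i^n: mean of the top-n per-minute reply counts."""
    top = replies_per_minute(timestamps)[:n]
    return sum(top) / len(top) if top else 0.0


# Toy data: three replies at 10:00, one at 10:01, two at 10:05.
sample = [
    datetime(2016, 9, 1, 10, 0, 1),
    datetime(2016, 9, 1, 10, 0, 20),
    datetime(2016, 9, 1, 10, 0, 45),
    datetime(2016, 9, 1, 10, 1, 5),
    datetime(2016, 9, 1, 10, 5, 10),
    datetime(2016, 9, 1, 10, 5, 50),
]
mrn_value = mrn(sample)
avg_value = avg_mrn(sample, n=2)
```

Averaging the top $n$ counts dampens the effect of a single burst, which is the motivation the text gives for preferring it over the maximum alone.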

3.1.3. Cooccurrence Frequency

To avoid being detected, automated spammers in the forum frequently draw different reply content from their databases to reply to different original posts. It has become rare for a forum spammer to reply to an original post repeatedly with the same content. However, spam teams made up of different automated spammers now post fake replies to target posts in succession, which leads to cooccurrence behavior: many automated spammers appear together at the same time or within a short time period. As shown in Table 3, in our labelled dataset, 59.14% of the automated spammers post replies together with another forum user within one minute five or more times. In contrast, only 3.52% of normal users have the same behavioral pattern. Therefore, we define ${MCF}_{i}$ as the maximum cooccurrence frequency between user $i$ and other forum users who simultaneously post a reply within one minute. As with the replies number, the reply time of normal users may occasionally coincide with that of automated spammers. Therefore, ${AVG\_MCF}_{i}^{n}$ is defined as the average of the top $n$ cooccurrence frequencies between user $i$ and other forum users who simultaneously post a reply within one minute.

Table 3

Percentage of the cooccurrence frequency.

CF ≥	3 (%)	4 (%)	5 (%)	6 (%)	7 (%)
Automated spammers	74.26	64.44	59.14	54.42	40.47
Normal users	8.23	5.25	3.52	2.72	2.20
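
One possible way to compute the cooccurrence variables is sketched below. It assumes that two users cooccur when both post at least one reply in the same calendar minute, and it ignores which thread the replies belong to; the paper does not specify either detail, so both choices are illustrative assumptions.

```python
from collections import Counter, defaultdict
from datetime import datetime
from itertools import combinations
from typing import Dict, List, Tuple


def cooccurrence_counts(replies: List[Tuple[str, datetime]]) -> Dict[str, Counter]:
    """For each user, count the minutes shared with every other user."""
    users_by_minute = defaultdict(set)
    for user, ts in replies:
        users_by_minute[ts.replace(second=0, microsecond=0)].add(user)

    counts: Dict[str, Counter] = defaultdict(Counter)
    for users in users_by_minute.values():
        for a, b in combinations(sorted(users), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def mcf(counts: Dict[str, Counter], user: str) -> int:
    """MCF_i: highest cooccurrence frequency with any single other user."""
    return max(counts.get(user, {}).values(), default=0)


def avg_mcf(counts: Dict[str, Counter], user: str, n: int = 10) -> float:
    """AVG_MCF_i^n: mean of the top-n cooccurrence frequencies."""
    top = sorted(counts.get(user, {}).values(), reverse=True)[:n]
    return sum(top) / len(top) if top else 0.0


# Toy data: two bots post together in three different minutes,
# and a normal user happens to share one of those minutes.
day = (2016, 9, 1)
sample = [
    ("bot_a", datetime(*day, 10, 0, 5)), ("bot_b", datetime(*day, 10, 0, 40)),
    ("bot_a", datetime(*day, 10, 3, 10)), ("bot_b", datetime(*day, 10, 3, 50)),
    ("alice", datetime(*day, 10, 3, 20)),
    ("bot_a", datetime(*day, 10, 7, 0)), ("bot_b", datetime(*day, 10, 7, 30)),
]
pair_counts = cooccurrence_counts(sample)
mcf_bot = mcf(pair_counts, "bot_a")
mcf_alice = mcf(pair_counts, "alice")
avg_bot = avg_mcf(pair_counts, "bot_a", n=2)
```

Here the two bots share three minutes, while the normal user coincides with each of them only once, which is the kind of gap Table 3 reports.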

3.1.4. Duplicate Replies (DR)

Automated spammers usually post duplicate replies under different original posts [28]. Our study finds that a few normal users also post some duplicate replies, such as “I support the original poster.” However, the higher the ratio of a user’s duplicate replies, the more likely he/she is an automated spammer in the forum. To capture this abnormal behavior, we define ${DRR}_{i}$ as the duplicate replies rate of forum user $i$, which can be calculated by the following equation: $${DRR}_{i} = \frac{2 \sum_{j}^{N} \sum_{k}^{N} sim(r_{j}, r_{k})}{N(N + 1)}, \tag{2}$$ where $N$ denotes the total number of replies posted by user $i$, $r_{j}$ represents the text vector of the $j^{th}$ reply, and $sim(r_{j}, r_{k})$ denotes the text similarity of the $j^{th}$ reply and the $k^{th}$ reply. In this paper, the text similarity between two replies is measured by the TF-IDF weighted word embedding. Reply $R_{j}$ can be represented as $$r_{j} = \sum_{t \in R_{j}} w_{t} \cdot {TFIDF}_{t}, \tag{3}$$ where $t$ denotes a word in $R_{j}$, $w_{t}$ represents the word vector of word $t$ generated by a pretrained word embedding model, and ${TFIDF}_{t}$ denotes the TF-IDF value of word $t$. Then, for any two replies $j$ and $k$, their text similarity can be measured by the following equation: $$sim(r_{j}, r_{k}) = \frac{r_{j} \cdot r_{k}}{\| r_{j} \| \times \| r_{k} \|}. \tag{4}$$

As shown in Table 4, 55.40% of automated spammers have a duplicate replies rate of more than 0.5, but the rate for the normal users is extremely low.

Table 4

Percentage of the ratio of duplicate replies.

DRR ≥	0.3 (%)	0.4 (%)	0.5 (%)	0.6 (%)	0.7 (%)
Automated spammers	58.74	56.19	55.40	44.79	36.74
Normal users	15.93	8.70	3.65	1.05	0.03
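
The duplicate replies rate can be sketched in plain Python as follows. Several details are assumptions made only for illustration: the two-dimensional `EMBEDDINGS` table stands in for a real pretrained model, the TF-IDF weights use a smoothed IDF computed over the user's own replies, the similarity is taken to be cosine similarity, and the double sum is taken over pairs with $j \le k$ so that the number of terms equals $N(N+1)/2$.

```python
import math
from collections import Counter
from typing import Dict, List

# Toy 2-dimensional "pretrained" word vectors (hypothetical values).
EMBEDDINGS: Dict[str, List[float]] = {
    "support": [1.0, 0.0],
    "poster": [0.8, 0.2],
    "engine": [0.0, 1.0],
    "noisy": [0.1, 0.9],
}


def tfidf_weights(reply: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Term count times a smoothed IDF (the smoothing is an assumption)."""
    n_docs = len(corpus)
    weights = {}
    for word, count in Counter(reply).items():
        df = sum(1 for doc in corpus if word in doc)
        weights[word] = count * (math.log((1 + n_docs) / (1 + df)) + 1.0)
    return weights


def reply_vector(reply: List[str], corpus: List[List[str]]) -> List[float]:
    """TF-IDF weighted sum of word vectors for one tokenized reply."""
    dim = len(next(iter(EMBEDDINGS.values())))
    vec = [0.0] * dim
    for word, weight in tfidf_weights(reply, corpus).items():
        if word in EMBEDDINGS:  # out-of-vocabulary words are skipped
            for d in range(dim):
                vec[d] += weight * EMBEDDINGS[word][d]
    return vec


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def drr(replies: List[List[str]]) -> float:
    """Duplicate replies rate over all pairs (j, k) with j <= k."""
    n = len(replies)
    if n == 0:
        return 0.0
    vectors = [reply_vector(r, replies) for r in replies]
    total = sum(
        cosine(vectors[j], vectors[k]) for j in range(n) for k in range(j, n)
    )
    return 2.0 * total / (n * (n + 1))


spam_history = [["support", "poster"]] * 3
mixed_history = [["support", "poster"], ["engine", "noisy"]]
drr_spam = drr(spam_history)
drr_mixed = drr(mixed_history)
```

Under the $j \le k$ reading, each reply is also compared with itself, so very short reply histories score higher than their content alone would suggest; that effect shrinks as $N$ grows.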

3.2. Marketing Spammer Features

As discussed before, marketing spammers usually disguise themselves as the leading users in the forums. These spammers not only post replies but also publish many original posts as do normal users. In other words, they are real forum users but they do what the spammers always do. Therefore, it is difficult to recognize marketing spammers using a recognition model that is constructed based on the abnormal behavioral features of automated spammers. In this section, three abnormal behavior features are identified in terms of the posting behavior of marketing spammers.

3.2.1. Posting in Many Forums

Due to the increasingly strict registration process in forums, a forum account, especially a reputable one, is becoming a rare resource for marketing spammers. To maximize their commercial interests, the forum accounts of marketing spammers normally work in several forums. In other words, marketing spammers may publish fake original posts for different targeted products in several forums. As shown in Table 5, in the labelled dataset, the average number of forums in which marketing spammers publish original posts is much higher than that of normal users. Therefore, the variable $NF$ is defined as the number of forums in which a forum user publishes original posts within a year.

Table 5

The number of forums in which marketing spammers publish original posts.

Marketing spammer	The number of forums
MS₁	45
MS₂	33
MS₃	57
MS₄	132
MS₅	73
MS₆	35
MS₇	52
MS₈	66
MS₉	75
MS₁₀	136
MS₁₁	49
Average	68.45
Average (normal user)	3.56
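
A small sketch of the $NF$ variable follows, together with a check of the average reported in Table 5. Reading "within a year" as a calendar year is an assumption, and the `(forum, date)` tuples and forum names are a hypothetical input format.

```python
from datetime import date
from typing import Iterable, Tuple


def number_of_forums(posts: Iterable[Tuple[str, date]], year: int) -> int:
    """NF: distinct forums in which a user published original posts in `year`."""
    return len({forum for forum, day in posts if day.year == year})


# Toy posting history for one user (forum names are illustrative).
history = [
    ("forum_a", date(2016, 9, 3)),
    ("forum_a", date(2016, 10, 1)),
    ("forum_b", date(2016, 11, 12)),
    ("forum_c", date(2017, 1, 5)),
]
nf_2016 = number_of_forums(history, 2016)

# The eleven marketing spammer counts listed in Table 5.
table5_counts = [45, 33, 57, 132, 73, 35, 52, 66, 75, 136, 49]
table5_average = round(sum(table5_counts) / len(table5_counts), 2)
```

The recomputed average of the eleven listed counts agrees with the 68.45 shown in the table.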

3.2.2. Posting Intensity Is High and Uneven

To strengthen the marketing effort, marketing spammers usually publish a series of original posts and actively interact with other forum users during the marketing period. In this period, marketing spammers promote the targeted product by spreading a large number of positive word-of-mouth recommendations. Moreover, they sometimes publish many negative word-of-mouth comments to slander their competitors. All of these serve their marketing purpose. Therefore, once the marketing period is finished, the activity of marketing spammers declines sharply, or the users even disappear completely. Moreover, the time at which marketing spammers publish original posts is usually highly correlated with the targeted product’s marketing events. As shown in Figure 1, a new car named Tiggo7 went on sale in September 2016, and as the search number (yellow line) rose, the activity of marketing spammers also began to increase. The average number of postings of marketing spammers reached its maximum 3 months after the new car was put on the market. However, with the decline of the search numbers and the end of the marketing period, the average number of postings by marketing spammers declined sharply or even reached zero. In contrast, the average number of postings of normal users was always stable and low. That is, the posting and replying activities of marketing spammers show alternating or cyclical fluctuations. As such, two variables, $NOP$ and $SDNP$, are defined to measure this difference. The former denotes the number of original posts published by a forum user within a year, and the latter denotes the standard deviation of the number of posts published by a forum user over 12 months.

[figure omitted; refer to PDF]
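
The two variables can be sketched from a list of 12 monthly counts. The monthly numbers below are invented for illustration and are not read from Figure 1. The paper does not say whether the population or the sample standard deviation is used, so the population form (`statistics.pstdev`) is an assumption.

```python
import statistics
from typing import List


def nop(monthly_original_posts: List[int]) -> int:
    """NOP: original posts published by a user within the year."""
    return sum(monthly_original_posts)


def sdnp(monthly_posts: List[int]) -> float:
    """SDNP: standard deviation of the monthly post counts over 12 months."""
    if len(monthly_posts) != 12:
        raise ValueError("expected exactly 12 monthly counts")
    return statistics.pstdev(monthly_posts)


# Hypothetical profiles: a burst around a product launch versus steady posting.
bursty_user = [0, 0, 5, 20, 40, 30, 5, 0, 0, 0, 0, 0]
steady_user = [2] * 12

nop_bursty = nop(bursty_user)
sdnp_bursty = sdnp(bursty_user)
sdnp_steady = sdnp(steady_user)
```

A user who posts heavily for a few months and then stops gets both a high $NOP$ and a high $SDNP$, whereas a steady poster has an $SDNP$ of zero.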

In addition, we notice that a few forum users are automobile evaluators who posted many original posts and replies in many forums. Their behavior patterns are similar to those of marketing spammers, so they may be considered marketing spammers by the MSR model. As a special user group in the automobile forum, these automobile evaluators are not considered in our experiments because there are no such users in other types of forums. Eventually, the ASR and MSR models recognized 41 forum spammers in all the Baojun610 forums. The experimental results show that our behavior-driven recognition models are effective and accurate.

More interestingly, we noticed that a forum user named “Baidu Knows” (in Chinese), indicated by the green circle in Figure 4, and the forum user named “Secret Passage” (in Chinese), indicated by the yellow circle in Figure 4, surprisingly posted original posts in 140 and 118 forums, respectively. As we can see in Figure 3, they completely stopped posting after many original posts. The number of original posts that they posted is significantly higher than the average number of original posts of other forum users. We then accessed their user profiles on the Bitauto website, as seen in Figures 5 and 6.

[figure omitted; refer to PDF]

As shown in Figure 5, the forum user named “Baidu Knows” (in Chinese) posted many original posts in forums on March 25, 2015. In the morning, he complained that his automobile, a VW Golf, could not be started. Then, in the afternoon, he watched a DCD in his automobile, an Infiniti QX70. His last original post was posted on August 04, 2017. Currently, his original posts and replies have been deleted by the officials, and the account has been closed. This supports the effectiveness of our MSR model and the precision of its recognition result.

As seen from Figure 6, the forum user named “Secret Passage” (in Chinese) is an officially verified forum user who has a high level of influence. He posted original posts in many forums in a single day, and this behavior is similar to that of the forum user named “Baidu Knows” (in Chinese). He not only praised his automobile, a Geely Vision that has been driven 60,000 km with few serious problems so far, but also complained about the idling problem of his Buick Regal automobile, which has been driven 20,000 km. In addition, he also wishes to sell his Senova D50 automobile. From his contradictory words, we can infer that he is a forum spammer.

Table 9

Comparison experiment with other models.

Model	Precision	Recall	F1-score
Hu’s model [4]	0.886	0.918	0.902
Chen’s model [5]	0.878	0.922	0.897
Yu’s model [18]	0.924	0.943	0.933
The proposed architecture	0.964	0.938	0.951

5.2.3. Experiment 3: Comparison with Other Methods

In this section, the proposed architecture is compared with three representative models [4, 5, 18]. Table 9 shows the precision, recall, and F1-score of each model on the Tiggo7 dataset. The proposed model achieves the highest precision (0.964) and F1-score (0.951), although its recall (0.938) is slightly lower than that of Yu’s model (0.943). We believe that this is because we take more account of the users’ behavior features. This suggests that the behavior feature-based method compares favorably with the previous methods.
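
For reference, the F1-score is conventionally the harmonic mean of precision and recall. The short sketch below assumes that standard definition (the paper does not restate it) and recomputes the value for the proposed architecture from the precision and recall in Table 9.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the proposed architecture, taken from Table 9.
proposed_f1 = round(f1_score(0.964, 0.938), 3)
```

Because the table entries are already rounded to three decimals, recomputed F1 values for other rows may differ slightly in the last digit.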

5.2.4. Experiment 4: Analysis of Running Time

Finally, we measure the running time of the proposed model, including feature extraction and the two-level model, as shown in Table 10. Feature extraction takes up most of the time, because we need to calculate not only the personal behavior features of users but also the interactive behavior features between different users, which increases the computational burden. In addition, according to the feature extraction method described in Section 3, we can infer that the complexity of feature extraction depends on the total number of forum users, the number of forum posts, and the length of forum posts.

Table 10

Running time of the proposed model.

Total time (min)	Feature extraction (min)	Two-level model: ASR (min)	Two-level model: MSR (min)
16.16	12.85	1.59	1.72
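
The breakdown in Table 10 can be checked with simple arithmetic. The sketch below only re-adds the reported components and derives the share of time spent on feature extraction; the variable names are illustrative.

```python
# Running times in minutes, as reported in Table 10.
feature_extraction = 12.85
asr = 1.59
msr = 1.72

total = round(feature_extraction + asr + msr, 2)
extraction_share = round(100 * feature_extraction / total, 1)  # percent
```

The components sum to the reported 16.16 minutes, and feature extraction accounts for roughly 79.5% of the total.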

6. Conclusion

Fake reviews in forums are a persistent obstacle for enterprises seeking to make effective use of the information in forums. Moreover, forum spammers constantly update their technology or change their posting methods to avoid detection by fake review recognition systems. Although forum spammers try to disguise themselves as ordinary users, such purposeful posting eventually exhibits behaviors that differ from those of ordinary users. Therefore, this paper changes the research target from understanding abnormal reviews and the suspicious relationships among forum spammers to discovering how they must behave (follow or be followed) to achieve their monetary goals. Based on different behavior features, forum spammers can be classified into automated forum spammers and marketing forum spammers. The support vector machine-based ASR model and the k-means clustering-based MSR model are developed, and their applications are demonstrated by using car forum reviews written in Chinese. The final experimental results illustrate the effectiveness of our behavior-driven recognition models.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (nos. 72101075 and 72101078), the Fundamental Research Funds for the Central Universities (nos. JZ2020HGQA0168 and JZ2021HGQA0204), and the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (no. 71521001).

References

[1] X. T. Vu, P. Morizet-Mahoudeaux, A User-Centered Approach for Integrating Social Data into Groups of Interest, 2015.

[2] B. Zhao, Z. Zhang, W. Qian, A. Zhou, "Identification of collective viewpoints on microblogs," Data & Knowledge Engineering, vol. 87 no. 9, pp. 374-393, DOI: 10.1016/j.datak.2013.05.003, 2013.

[3] L. Spinney, "How Facebook, fake news and friends are warping your memory," Nature, vol. 543, pp. 168-170, DOI: 10.1038/543168a, 2017.

[4] X. Hu, T. Jiliang, G. Huiji, L. Huan, "Social spammer detection with sentiment information," Proceedings of the IEEE International Conference on Data Mining, IEEE, pp. 180-189, DOI: 10.1109/icdm.2014.141.

[5] Y. R. Chen, H. H. Chen, "Opinion spam detection in web forum: a real case study," In Proceedings of the, International Conference, pp. 173-183, DOI: 10.1145/2736277.2741085.

[6] C. Chen, K. Wu, V. Srinivasan, X. Zhang, "Battling the internet water army: detection of hidden paid posters," 2011. http://arxiv.org/abs/1111.4297

[7] P. Hayati, K. Chai, V. Potdar, A. Talevski, "Behaviour-based web spambot detection by utilising action time and action frequency," pp. 351-360, DOI: 10.1007/978-3-642-12165-4_28.

[8] J. P. Singh, S. Irani, N. P. Rana, Y. K. Dwivedi, S. Saumya, P. Kumar Roy, "Predicting the “helpfulness” of online consumer reviews," Journal of Business Research, vol. 70 no. 70, pp. 346-355, DOI: 10.1016/j.jbusres.2016.08.008, 2017.

[9] Y.-M. Li, H.-M. Chen, J.-H. Liou, L.-F. Lin, "Creating social intelligence for product portfolio design," Decision Support Systems, vol. 66, pp. 123-134, DOI: 10.1016/j.dss.2014.06.013, 2014.

[10] S. M. Mudambi, D. Schuff, "What makes a helpful online review? a study of customer reviews on amazon.com," Social Science Electronic Publishing, vol. 34 no. 1, pp. 185-200, 2012.

[11] Y. Chen, J. Xie, "Online consumer review: word-of-mouth as a new element of marketing communication mix," Management Science, vol. 54 no. 3, pp. 477-491, DOI: 10.1287/mnsc.1070.0810, 2008.

[12] M. Ghannoum, "Prevalence and mitigation of forum spamming," vol. 34 no. 17, pp. 2309-2317, DOI: 10.1109/INFCOM.2011.5935048.

[13] Y. Shin, S. Myers, M. Gupta, P. Radivojac, "A link graph-based approach to identify forum spam," Security and Communication Networks, vol. 8 no. 2, pp. 176-188, DOI: 10.1002/sec.970, 2015.

[14] Y. J. Lee, J.-M. Shim, H.-G. Cho, G. Woo, "Detecting and visualizing the dispute structure of the replying comments in the internet forum sites," pp. 456-463, DOI: 10.1109/cyberc.2010.90.

[15] Y. Shin, M. Gupta, S. Myers, "The nuts and bolts of a forum spam automator."

[16] Y. Niu, W. Yi-Min, C. Hao, M. Ming, H. Francis, "A quantitative study of forum spamming using context-based analysis."

[17] H. Liu, Z. Yuchao, L. Hao, W. Junjie, W. Zhiang, Z. Xu, "How many zombies around you," Proceedings of the 2013 International Conference on Data Mining, pp. 1133-1138, DOI: 10.1109/icdm.2013.166.

[18] D. Yu, N. Chen, F. Jiang, B. Fu, A. Qin, "Constrained NMF-based semi-supervised learning for social media spammer detection," Knowledge-Based Systems, vol. 125, pp. 64-73, DOI: 10.1016/j.knosys.2017.03.025, 2017.

[19] P. Hayati, V. Potdar, K. Chai, A. Talevski, "Characterization of web spambots using self organizing maps," Computer Systems Science and Engineering, vol. 26 no. 2, 2011.

[20] A. Radford, R. Jozefowicz, I. Sutskever, "Learning to generate reviews and discovering sentiment," 2017. http://arxiv.org/abs/1704.01444

[21] L. Akoglu, M. Mcglohon, C. Faloutsos, "Oddball: spotting anomalies in weighted graphs," pp. 410-421, DOI: 10.1007/978-3-642-13672-6_40.

[22] R. K. Dewang, A. K. Singh, "State-of-art approaches for review spammer detection: a survey," Journal of Intelligent Information Systems, vol. 50 no. 2, pp. 231-264, DOI: 10.1007/s10844-017-0454-7, 2018.

[23] M. Jiang, C. Peng, B. Alex, F. Christos, Y. Shiqiang, "Inferring strange behavior from connectivity pattern in social networks," Proceedings of the 2014 Pacific-Asia Conference on Knowledge Discovery and Data Mining.

[24] J. Ratkiewicz, M. Conover, B. G. Alves, A. Flammini, F. Menczer, "Detecting and tracking political abuse in social media."

[25] Y. Sun, K. Loparo, "Opinion spam detection based on heterogeneous information network," DOI: 10.1109/ictai.2019.00277.

[26] S. Ghosh, V. Bimal, K. Farshad, S. Naveen Kumar, K. Gautam, B. Fabricio, G. Niloy, G. Krishna Phani, "Understanding and combating link farming in the twitter social network," Proceedings of the 21st International Conference on World Wide Web, ACM, pp. 61-70, DOI: 10.1145/2187836.2187846.

[27] M. Z. Asghar, A. Ullah, S. Ahmad, A. Khan, "Opinion spam detection framework using hybrid classification scheme," Soft Computing, vol. 24 no. 5, pp. 3475-3498, DOI: 10.1007/s00500-019-04107-y, 2020.

[28] E. P. Lim, V.-A. Nguyen, N. Jindal, B. Liu, H. Wirawan Lauw, "Detecting product review spammers using rating behaviors," Proceedings of the ACM International Conference on Information and Knowledge Management, ACM, pp. 939-948, DOI: 10.1145/1871437.1871557.

[29] JATO. http://www.jato.com/global-car-sales-5-6-2016-due-soaring/

[30] iResearch, "The monthly report about internet advertising of chinese automotive industry," 2016 (in Chinese).

[31] A. Mukherjee, B. Liu, N. Glance, "Spotting fake reviewer groups in consumer reviews," Proceedings of the International Conference on World Wide Web, ACM, pp. 191-200, DOI: 10.1145/2187836.2187863.

Copyright © 2021 Han Su et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. https://creativecommons.org/licenses/by/4.0/

Abstract

Forum comments are valuable information for enterprises to discover public preferences and market trends. However, extensive marketing and malicious attack behaviors in forums are a persistent obstacle for enterprises seeking to make effective use of this information. Moreover, forum spammers constantly update their technology to prevent detection. Therefore, how to accurately recognize forum spammers has become an important issue. Aiming to accurately recognize forum spammers, this paper changes the research target from understanding abnormal reviews and the suspicious relationships among forum spammers to discovering how they must behave (follow or be followed) to achieve their monetary goals. First, we classify forum spammers into automated forum spammers and marketing forum spammers based on different behavioral features. Then, we propose a support vector machine-based automated spammer recognition (ASR) model and a k-means clustering-based marketing spammer recognition (MSR) model. The experimental results on a real-world labelled dataset illustrate the effectiveness of our methods in distinguishing spammers from common users. To the best of our knowledge, this work is among the first to construct behavior-driven recognition models according to the different behavioral patterns of forum spammers.

Details

Title

A Behavior-Driven Forum Spammer Recognition Method with Its Application in Automobile Forums

Author

Han, Su¹; Ren, Minglun¹; Wang, Anning¹; Tang, Xiaoan¹; Ni, Xin²; Zhao, Fang³

¹ School of Management, Hefei University of Technology, Hefei 230009, China
² Department of Design, Information System and Inventive Processes, INSA de Strasbourg, Strasbourg, France
³ School of Management, Hefei University of Technology, Hefei 230009, China; Department of Information Systems and Analytics, National University of Singapore, 13 Computing Drive, Singapore

Editor

Xin Tian

Publication year

2021

Publication date

2021

Publisher

John Wiley & Sons, Inc.

ISSN

1024123X

e-ISSN

15635147

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1155/2021/7682579

ProQuest document ID

2571755830
