Abstract. The paper investigates the relationship between world events as reported in newspapers and the characteristics of the newspapers in terms of political alignment and economic conditions. We propose a novel methodology that includes selection, representation and clustering of the newspapers to analyse the relationship between the events and the characteristics of the newspapers. We represent world events by a set of concepts and content categories of the news articles reporting on the event. Each newspaper is represented by the set of events it reported on over several years. We investigate different similarity measures between the newspapers to see whether newspapers with the same characteristics report on similar events over a given time span.
The results indicate that: 1) representing news events with Wikipedia-concepts and DMOZ-categories appears to be an appropriate way to understand relationships between newspapers; 2) the economic conditions of the country of the newspaper publisher are better reflected by Wikipedia-concepts than by DMOZ-categories, whereas DMOZ-categories appear more suitable for identifying politically aligned groups of newspapers; 3) for capturing economic groups, clustering using Dynamic Time Warping similarity between the trend lines of newspapers is better aligned with the ground truth groups than the other tested similarities, whereas for capturing political groups, Jaccard distance using the frequent terms and Euclidean distance between the trend lines prove more useful.
Keywords. Information Propagation Barriers, Political Alignment, Economic Conditions, Dynamic Time Warping (DTW), Euclidean Similarity, Jaccard Similarity.
(ProQuest: ... denotes formulae omitted.)
1 Introduction
The economic strength and political situation of a country are strongly associated with its news prominence. Many news-flow studies assume that external variables, such as a country's economic power and political events, determine how prominent the country is in news around the world. In fact, various internal factors, such as the structure of international telecommunications, the presence of news agencies, and editorial practices and traditions, also affect a country's news prominence (Segev, 2015). According to news-flow theories, multiple determinants affect how international news spreads. The economic power of a country is one of these factors, and one way to represent a country's economic condition is its economic growth or income level. Moreover, the magnitude of economic interaction between countries can also affect news flow (Wu, 2007).
The idea of event-centric news spreading has emerged internationally and become popular due to globalization (Hong et al., 2017). Global events become widely known and attract attention in every corner of the world, and news agencies and news publishers play a central role in this process. Differences in lifestyles, cultures, economic conditions, time zones, and the geographical proximity of countries play a significant role in how news events are reported (Wei et al., 2020; Quezada et al., 2015, pp. 935-938). To spread widely, news must cross multiple barriers: linguistic, economic, geographical, political, time-zone, and cultural. News publishers differ in characteristics such as reporting language, political alignment, economic conditions, cultural values, time zone, and the geographical position of their headquarters. In this paper, we focus on two characteristics of news publishers, namely political alignment and economic situation. We select the ten most-read daily newspapers in the world in 2020 and use Event Registry to collect information about the news they published. Event Registry is a system that analyses news articles, identifies groups of articles describing the same event, and represents each group as a single event (Leban et al., 2014, pp. 107-110). The metadata of an event is described in Table 1.
Following are the main scientific contributions of this paper:
* A novel methodology that includes selection, representation and clustering of the newspapers to analyze the relationship between the reported events and the characteristics of the newspapers.
* Evaluation of the proposed methodology on real-world news data, representing world events by sets of concepts and content categories and testing different similarity measures between newspapers: Dynamic Time Warping (DTW) with cosine similarity or with Euclidean distance on trend lines of concepts and categories, and Jaccard similarity between the concepts and categories.
* We show which types of features are more suitable for clustering newspapers based on economic and political characteristics.
2 Related Work
International news about different events leads us to investigate why news regarding specific events does or does not spread to certain geographic areas. Media focus on specific foreign and regional events based on certain factors. For instance, coverage of events may tilt toward developed countries such as the United States, the United Kingdom, or Russia. Coverage may also depend on the geographical proximity (latitude, longitude) of countries (Wilke et al., 2012). There is a great deal of negotiation between political actors and journalists in news production, as political actors seek to increase their influence on news coverage. Fake news is produced under the influence of many factors, among which political influence is paramount (Martens et al., 2018). Therefore, the political alignment of publishers can affect their coverage of different events to a greater or lesser degree. Two further determinants of news coverage are economic conditions and the relationship between countries, which also affect information selection, analysis, and propagation (Chang et al., 1992).
Few studies have examined the relationship between news outlets and political and economic activities. In general, news reporting about events such as elections is inclined toward certain characteristics of newspapers. Because newspapers tend to support candidates covertly and indirectly, research interest in the reporting characteristics of each newspaper has grown (Jo et al., 2018). The study of information flows between media sources from different countries explores the dynamics underlying transnational communication spaces. Castells sees the emergent Euro-state not only as a political-economic zone but, by virtue of its network character, also as a specific kind of communicative space (Veltri, 2012). Local newspapers have also been found to have a relatively distinctive content emphasis (Lin et al., 2001). Filla and Johnson investigate political participation in elections in relation to local news outlets and find a relationship between political participation and the availability of local news outlets (Filla et al., 2010, pp. 679-692). Another study examined the correlation, at the outlet level, between public trust and experts' evaluations, comparing experts' assessments of the accuracy of news outlets with public trust scores (Schulz et al., 2020). News agencies tend to follow the national context in which their journalists operate. One related example is the SARS epidemic study, which found that cross-national contextual values such as political and economic situations affect news selection (Camaj, 2010, pp. 635-653). Much of the work on fake news focuses on different strategies, while few studies have considered political alignment to have a compelling effect on news spreading (Bakshy et al., 2015, pp. 1130-1132). Maurer and Beiler (2018, pp. 2024-2041) show that political alignment is a major strategy for controlling the news, arising from the interaction between journalists and political actors.
Although previous work has examined the relationship between outlets and political activities or public interests, our work focuses directly on the relationship between news outlets and their political alignment and economic conditions. This requires a representation and a computational method that reflect political and economic characteristics; the objective is to find a representation that clusters the input texts into their original groups. Previously, a Brazilian Portuguese newspaper corpus covering five knowledge fields (Human Sciences, Biological Sciences, Social Sciences, Religion and Thought, Exact Sciences) was used to verify whether an automated clustering process could recover the correct clusters of newspapers (Afonso et al., 2014). We use more than 65,000 news events published in English by the top ten newspapers across different countries. In addition, we consider the temporal information of events when computing similarities between newspapers. Text can be clustered with hierarchical or non-hierarchical methods. We focus on a non-hierarchical text clustering method, which is applied when the goal is to produce text clusters that do not fit into a specific knowledge hierarchy (Afonso et al., 2014).
3 Data Description
3.1 Data Statistics
We choose the ten most-read daily newspapers in the world in 2020 (https://www.trendrr.net/) and use Event Registry to collect the events they reported over the period 2016-2020. Approximately 8,000 events belong to each newspaper except "Zaman" (only 900 events) (see Table 1). Figure 1 shows the number of events reported by the selected newspapers per year.
Table 1 lists the attributes of an event with their descriptions. Several attributes are self-explanatory, such as URI, title, summary, date, source, and total number of news articles. Concepts are annotations of events and can represent entities (locations, people, organizations) or non-entities (things such as a personal computer or a toy). In Event Registry, Wikipedia URLs are used as concept URIs. DMOZ-categories indicate the topic of the content. The DMOZ project is a hierarchical collection of web page links organized by subject matter (https://dmozodp.org/). Event Registry uses the top three levels of this taxonomy, which amount to about 50,000 categories (https://eventregistry.org/documentation?tab=terminology).
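Both the frequency-based concept selection in the methodology and the set-based comparisons start from a newspaper's events and their concepts. The following is a minimal Python sketch, assuming a simplified event record: the `Event` class and its field names are hypothetical stand-ins for the attributes in Table 1, not Event Registry's actual API or JSON schema, and counting a concept at most once per event is our assumption.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """Hypothetical record for one event (field names are illustrative,
    not Event Registry's actual JSON keys)."""
    uri: str
    title: str
    summary: str
    date: str            # ISO date, e.g. "2016-03-01"
    source: str          # newspaper that reported the event
    total_articles: int
    concepts: List[str] = field(default_factory=list)    # Wikipedia URLs
    categories: List[str] = field(default_factory=list)  # DMOZ category paths


def most_frequent_concepts(events: List[Event], n: int) -> List[str]:
    """Return the n concepts that annotate the largest number of events."""
    # Count each concept at most once per event.
    counts = Counter(c for event in events for c in set(event.concepts))
    return [concept for concept, _ in counts.most_common(n)]
```

With n set to 100, 150 or 200, this yields the candidate concept lists whose trend lines are compared later.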
Each newspaper leans toward a different position in the political spectrum. We estimate the political alignment of a newspaper through the political alignment of its publisher, and its economic conditions through the country in which its publisher is headquartered (see Figure 2). We obtained the headquarters and the political alignment of each newspaper from its Wikipedia infobox. The newspapers are headquartered in different countries with varying economic rankings and income levels (see Table 2).
There are four main income levels of economies, defined by GNI per capita: Low-Income Economies ($1,035 or less), Lower-Middle-Income Economies ($1,036 to $4,045), Upper-Middle-Income Economies ($4,046 to $12,535), and High-Income Economies (more than $12,535) (https://datahelpdesk.worldbank.org/knowledgebase/articles/906519). The overall ranking (from 1 to 149) of each country is based on 12 features: Safety and Security, Personal Freedom, Governance, Social Capital, Investment Environment, Enterprise Conditions, Market Access and Infrastructure, Economic Quality, Living Conditions, Health, Education, and Natural Environment. Table 2 shows the political alignment of each newspaper and the economic conditions (ranking, income level) of the country in which each newspaper is headquartered (https://www.prosperity.com/rankings).
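The income-group thresholds above amount to a simple lookup on GNI per capita. A minimal sketch, assuming the thresholds quoted above (the World Bank revises them every year, so they should not be reused for other years); `income_level` is a hypothetical helper, not a World Bank API:

```python
def income_level(gni_per_capita_usd: float) -> str:
    """Map GNI per capita (US$) to an income group using the thresholds
    quoted in the text; other years use different thresholds."""
    if gni_per_capita_usd <= 1035:
        return "Low income"
    if gni_per_capita_usd <= 4045:
        return "Lower-middle income"
    if gni_per_capita_usd <= 12535:
        return "Upper-middle income"
    return "High income"
```

Using "at most" comparisons makes the groups non-overlapping, so a value of exactly $12,535 falls in the upper-middle group.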
3.2 Similarity Measures
We propose to estimate the similarity between newspapers by looking at the events they report on over a period of time. In the first case, we consider trend lines of the concepts or content categories characterising the events and apply Dynamic Time Warping or Euclidean distance. In the second case, we ignore the time dimension and simply take the union of all concepts or categories from events over the years.
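Building a trend line first requires a time series per concept. The text does not state the time granularity, so the sketch below assumes monthly counts of events mentioning a concept; `monthly_series` and its (date, concepts) input format are hypothetical:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def monthly_series(events: Iterable[Tuple[str, List[str]]], concept: str,
                   start: str, end: str) -> List[int]:
    """Number of events per month ("YYYY-MM", inclusive range) whose
    concepts include `concept`. Each event is an (ISO date, concepts) pair."""
    per_month = Counter(date[:7] for date, concepts in events
                        if concept in concepts)
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    series = []
    while (year, month) <= (end_year, end_month):
        series.append(per_month.get(f"{year:04d}-{month:02d}", 0))
        month += 1
        if month == 13:  # roll over into January of the next year
            year, month = year + 1, 1
    return series
```

Months without any matching event are filled with zeros, so every concept of a newspaper gets a series of the same length over the same period.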
3.2.1 Dynamic Time Warping (DTW)
Dynamic Time Warping is a method for calculating the similarity between two time series that may vary in timing or speed. Its ability to warp the time axis and find an optimal alignment between time series has made it very popular. DTW has been used in several disciplines, such as speech recognition, gesture recognition, data mining, robotics, manufacturing, and medicine. DTW aligns two time series so that a distance measure between aligned points (usually Euclidean or cosine distance) is minimized. The optimal alignment (the minimum-distance warp path) is obtained by allowing multiple successive values of one time series to be assigned to a single value of the other; therefore, DTW can also be calculated on time series of different lengths (Strle et al., 2009).
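The dynamic-programming recurrence behind DTW can be sketched in a few lines of Python. This is the textbook unconstrained version (no warping window) with the local distance passed in as a parameter; the default absolute difference equals the Euclidean distance for one-dimensional values, and the cosine variant used in the methodology would be supplied in its place. It is an illustrative sketch, not the authors' implementation:

```python
import math
from typing import Callable, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float],
                 dist: Callable[[float, float], float] = lambda x, y: abs(x - y)
                 ) -> float:
    """Unconstrained DTW distance between series a and b.

    cost[i][j] is the cheapest cumulative cost of aligning a[:i] with b[:j].
    Each step advances in a, in b, or in both, so several successive values
    of one series may be matched to a single value of the other."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Same shape, different length: DTW aligns them at zero cost.
print(dtw_distance([0, 1, 2, 3], [0, 0, 1, 2, 3]))  # 0.0
```

Because the second series merely repeats its first value, DTW aligns the two series at zero cost even though they have different lengths, which a point-by-point Euclidean comparison could not do.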
3.2.2 Euclidean Distance
The Euclidean distance between two points is the length of the line segment connecting them, computed with the Pythagorean theorem as shown below.
...
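Assuming the omitted formula is the standard definition, the distance between points p and q is the square root of the sum of the squared differences of their coordinates. A minimal Python sketch of that standard definition (Python 3.8+ also provides `math.dist` for the same computation):

```python
import math
from typing import Sequence


def euclidean_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Length of the line segment between points p and q (Pythagorean theorem)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of coordinates")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```

The length check makes explicit that, unlike DTW, this measure only compares series point by point at equal lengths.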
3.2.3 Jaccard Distance
The Jaccard distance is one minus the Jaccard similarity coefficient. The coefficient between two sets is the number of features common to both sets divided by the total number of distinct features in either set, as shown below (Niwattanakul et al., 2013, pp. 380-384).
...
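Assuming the omitted formula is the standard definition J(A, B) = |A ∩ B| / |A ∪ B|, with Jaccard distance 1 - J(A, B), a minimal Python sketch on sets of concepts or categories is shown below. Returning 1.0 for two empty sets is our own convention, since the ratio is undefined there:

```python
from typing import Hashable, Set


def jaccard_similarity(a: Set[Hashable], b: Set[Hashable]) -> float:
    """|A ∩ B| / |A ∪ B|; 1.0 for two empty sets (a chosen convention)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a: Set[Hashable], b: Set[Hashable]) -> float:
    """One minus the Jaccard similarity coefficient."""
    return 1.0 - jaccard_similarity(a, b)


print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```

Two shared features out of four distinct ones give a coefficient of 0.5 and therefore a distance of 0.5.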
4 Methodology
We present a novel methodology for clustering the most-read daily newspapers based on political and economic characteristics using different similarity mechanisms. It is based on the trend lines of Wikipedia-concepts and on simple counts of DMOZ-categories (Directory Mozilla) and Wikipedia-concepts related to the news events (see Figure 4). We build a hierarchy of clusters and compare the generated clusters with the ground truth.
We extract the 100 most-frequent Wikipedia-concepts over all news events reported by each newspaper. Afterward, for each newspaper, we generate a trend line for each of these Wikipedia-concepts using the Hodrick-Prescott filter (Bhowmik et al., 2021, pp. 7-17). An example of the trend lines of two concepts is shown in Figure 3. Given these trend lines, we calculate the pairwise Dynamic Time Warping (DTW) distance between them using the cosine measure, separately for each newspaper. We discard pairs of Wikipedia-concepts whose distance exceeds the threshold value of 0.1 (0.0 means the trend lines are identical, whereas the maximum value depends on how much the trend lines differ). At this stage, for each newspaper we have the pairs of Wikipedia-concepts that have similar trends over time. To calculate the distance between two newspapers, we measure the overlap between their sets of concept pairs. We then build a hierarchy of clusters using agglomerative hierarchical clustering, visualized as a dendrogram: at each step, the two most similar clusters are merged into a single new cluster. We cut the hierarchy into four and six clusters (see Figure 5) for economic and political characteristics, respectively. We repeat this process for the 150 and 200 most-frequent Wikipedia-concepts for all newspapers.
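The Hodrick-Prescott trend of a series y is the series tau that minimizes the squared deviations from y plus lamb times the sum of squared second differences of tau, which reduces to solving the linear system (I + lamb D^T D) tau = y, where D is the second-difference matrix. The sketch below solves that system with NumPy. The smoothing parameter value of 1600 (the conventional choice for quarterly data) is an assumption, as the text does not state the value used; statsmodels offers a ready-made version as `statsmodels.tsa.filters.hp_filter.hpfilter`, which returns the cycle and the trend.

```python
import numpy as np


def hp_trend(y, lamb: float = 1600.0) -> np.ndarray:
    """Hodrick-Prescott trend of a 1-D series.

    Minimizes sum((y - tau)**2) + lamb * sum((second difference of tau)**2)
    by solving the normal equations (I + lamb * D.T @ D) tau = y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 3:
        return y.copy()  # fewer than three points: no curvature to penalize
    # D is the (n - 2) x n second-difference operator with rows [1, -2, 1].
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return np.linalg.solve(np.eye(n) + lamb * D.T @ D, y)
```

A straight line passes through unchanged, since its second differences are zero; larger values of `lamb` give smoother trend lines, and `lamb=0` returns the input series itself.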
We apply a second mechanism to calculate the similarity between the trend lines. We choose the same number of Wikipedia-concepts and set the same threshold value as in the previous method to filter out dissimilar Wikipedia-concepts. The only difference is that we compute the distance between the aligned values of the trend lines using Euclidean distance rather than DTW. Since this method cannot handle trend lines of different lengths over time, we cut off the extra part of the longer line and keep only the common length of the two trend lines (DTW does handle this situation). We then build a hierarchy of clusters as in the previous mechanism.
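The Euclidean variant, the 0.1 threshold, and the newspaper-level overlap can be sketched together as follows. Two parts are assumptions: trend lines are truncated to their common prefix (our reading of cutting off the extra part), and the overlap between two newspapers' sets of similar concept pairs is measured with the Jaccard coefficient, since the text does not name the overlap measure. All function names are hypothetical:

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Sequence, Set

Pair = FrozenSet[str]


def truncated_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance over the common length of two trend lines;
    the extra tail of the longer line is cut off."""
    k = min(len(a), len(b))
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(k)))


def similar_pairs(trends: Dict[str, Sequence[float]],
                  threshold: float = 0.1) -> Set[Pair]:
    """Concept pairs of one newspaper whose trend lines are within threshold."""
    return {frozenset((c1, c2))
            for c1, c2 in combinations(sorted(trends), 2)
            if truncated_euclidean(trends[c1], trends[c2]) <= threshold}


def pair_overlap(p: Set[Pair], q: Set[Pair]) -> float:
    """Assumed overlap measure: Jaccard coefficient of two sets of pairs."""
    union = p | q
    return len(p & q) / len(union) if union else 1.0
```

Representing each pair as a `frozenset` makes ("A", "B") and ("B", "A") the same pair, so two newspapers are compared on unordered concept pairs.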
Lastly, we extract two lists of the unique Wikipedia-concepts and DMOZ-categories, and two lists of all Wikipedia-concepts and all DMOZ-categories. We then compute the Jaccard similarity between the newspapers based on these lists, build a hierarchy of clusters, and generate four and six clusters for economic and political characteristics, respectively.
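The clustering step can be sketched as agglomerative clustering on Jaccard distance, merging the two closest clusters until the desired number (four or six) remains. The text does not state the linkage criterion, so average linkage here is an assumption; in practice `scipy.cluster.hierarchy.linkage` with `fcluster(..., criterion="maxclust")` provides an equivalent cut. The function names are hypothetical:

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return 1.0 - (len(a & b) / len(union) if union else 1.0)


def agglomerative(features: Dict[str, Set[str]], k: int) -> List[List[str]]:
    """Average-linkage agglomerative clustering of newspapers on Jaccard
    distance, merging the two closest clusters until k clusters remain."""
    names = sorted(features)
    dist = {frozenset((x, y)): jaccard_distance(features[x], features[y])
            for x, y in combinations(names, 2)}

    def linkage(c1: List[str], c2: List[str]) -> float:
        # Mean distance over all cross-cluster pairs of newspapers.
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters
```

For example, with toy concept sets {a, b, c} and {a, b, c, d} for two newspapers and {x, y} and {x, y, z} for two others, cutting at two clusters groups each pair together.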
The GitHub repository containing the scripts is available at https://github.com/cleopatraitn/Trends_Clustering.
5 Experimental Evaluation
5.1 Evaluation Metric
To provide insight into how the economic conditions and political alignment of newspapers may be reflected in the events they report on, we first generate a hierarchical clustering of the newspapers using the proposed methodology. We then cut the hierarchy to obtain as many clusters as there are predefined economic groups or political groups. Finally, we manually compare the generated clusters with our ground truth clusters (see Figure 2) by calculating accuracy on the economic condition and on the political alignment.
Accuracy: Accuracy is the most intuitive performance measure and is best used with symmetric datasets, in which the numbers of false positives and false negatives are almost the same.
We consider each newspaper as one instance. To calculate the number of correctly grouped instances, we compare the output clusters with our original clusters (see Figure 2) of economic and political characteristics. For instance, in the case of economic characteristics, if the output contains a cluster consisting only of "Zaman", one instance is correctly grouped, because the original clusters also contain a cluster with only that newspaper. Similarly, if one of the four output clusters contains the two newspapers "China Daily" and "Dawn", one instance is correctly grouped and one is incorrectly grouped, because the original cluster contains "The Times of India" and "Dawn". We use Algorithm 1 to compare the output clusters with the original groups.
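Algorithm 1 itself is not reproduced here. One plausible reading of the scoring described above, sketched in Python, counts for each output cluster the members that fall into its best-matching ground-truth group, which makes the measure resemble cluster purity. `grouping_accuracy` is a hypothetical helper, not the authors' code:

```python
from typing import Iterable, List, Set


def grouping_accuracy(predicted: Iterable[Set[str]],
                      truth: List[Set[str]]) -> float:
    """Fraction of newspapers counted as correctly grouped.

    Each predicted cluster is scored by its largest overlap with any
    ground-truth group; the remaining members count as incorrect."""
    correct = total = 0
    for cluster in predicted:
        total += len(cluster)
        correct += max(len(cluster & group) for group in truth)
    return correct / total
```

Under this reading, the cluster of "China Daily" and "Dawn" from the example above scores one correct member out of two. Note that two output clusters can both be matched to the same ground-truth group, which a one-to-one matching would forbid.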
5.2 Results and Analysis
Figure 5 shows three hierarchies of clusters built with three different similarity mechanisms. The first two diagrams present the hierarchies of clusters obtained with Dynamic Time Warping (DTW) and Euclidean distance, whereas the third diagram shows the hierarchy of clusters obtained with Jaccard similarity between the unique Wikipedia-concepts. Since we follow the same procedure to obtain four and six clusters in all three diagrams, we explain only the first diagram. Considering our original clusters (see Figure 2), we cut the hierarchy into four and six clusters for economic and political characteristics, respectively. For example, using the first diagram in Figure 5, we create the following four clusters to compare with our original economic clusters (see Figure 2):
* "Zaman"
* "Dawn" and "The Times of India"
* "The Asahi Shimbun"
* "The Guardian", "The Washington Post", "The Sydney Morning Herald", "The Wall Street Journal", "New York Times"
Similarly, using the same diagram in Figure 5, we create the following six clusters to compare with our original political clusters (see Figure 2):
* "Zaman"
* "Dawn" and "The Times of India"
* "The Asahi Shimbun"
* "The Sydney Morning Herald"
* "The Wall Street Journal", "New York Times"
* "The Guardian", "The Washington Post"
Table 3 shows the results in terms of overall accuracy for both economic and political attributes. For the economic attribute, the highest accuracy is achieved using DTW on the trend lines of the most-frequent Wikipedia-concepts (88.89%), followed by Jaccard similarity (80%) and Euclidean distance (77.78%). For the political attribute, the best-performing mechanisms are Euclidean distance (77.78%) and Jaccard similarity (70.0%).
Based on the overall accuracy, the representation with Wikipedia-concepts appears better than DMOZ-categories for capturing the economic characteristic of newspapers, while the opposite holds for the political attribute. Furthermore, DTW-based similarity is more suitable for the economic characteristic than for the political one.
6 Conclusions and Future Work
Newspapers differ in characteristics such as political alignment, economic values, cultural values, reporting language, and geography. In this paper, we focused on finding representations of news events, separately for two characteristics (political and economic), that cluster the input texts into their original groups. We represent the news events with sets of concepts and content categories separately, create hierarchical clusters, and compare the output clusters with the original groups. Instead of using only keywords, we also consider the trends of DMOZ-categories and Wikipedia-concepts. Moreover, we apply different similarity mechanisms (Dynamic Time Warping, Jaccard distance, Euclidean distance) before creating the clusters and examine which mechanism is suitable for economic and political groups.
Our results (see Section 5.2) suggest that the economic conditions of the country of the newspaper publisher are better reflected by a set of concepts than by content categories, whereas content categories are more suitable for identifying politically aligned groups of newspapers. Furthermore, the results show that clustering using Dynamic Time Warping similarity between the trend lines of newspapers is better aligned with the ground truth of economic groups, while Jaccard distance using the frequent terms and Euclidean distance between the trend lines are more useful for the ground truth of political groups.
Our experiments centered only on economic and political attributes. In future work, we would like to explore other characteristics such as cultural, time-zone, geographical, and linguistic ones. We also plan to use advanced Natural Language Processing (NLP) tools to detect cultural differences in news events.
Acknowledgments
This work was supported by the Slovenian Research Agency under the project J2-1736 Causalify and co-financed by the Slovenian Research Agency and the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 812997.
References
Afonso, A. R., & Duque, C. G. (2014). Automated text clustering of newspaper and scientific texts in Brazilian Portuguese: analysis and comparison of methods. JISTEM-Journal of Information Systems and Technology Management, 11, 415-436.
Lin, C. A., & Jeffres, L. W. (2001). Comparing distinctions and similarities across websites of newspapers, radio stations, and television stations. Journalism & Mass Communication Quarterly, 78(3), 555-57.
Veltri, G. A. (2012). Information flows and centrality among elite European newspapers. European Journal of Communication, 27(4), 354-375.
Chang, T.-K., & Lee, J.-W. (1992). Factors affecting gatekeepers' selection of foreign news: A national survey of newspaper editors. Journalism Quarterly, 69(3), 554-561.
Wilke, J., Heimprecht, C., & Cohen, A. (2012). The geography of foreign news on television: A comparative study of 17 countries. International Communication Gazette, 74(4), 301-322.
Jo, H., & Park, C. (2018). Analysis of reporting characteristics of newspapers in the 19th presidential election based on random forest. The Korean Data & Information Science Society, 29(2), 367-375.
Wei, H., Sankaranarayanan, J., & Samet, H. (2020). Enhancing local live tweet stream to detect news. GeoInformatica, 1-31.
Hong, X., Yu, Z., Tang, M., & Xian, Y. (2017). Cross-lingual event-centered news clustering based on elements semantic correlations of different news. Multimedia Tools and Applications, 76(23), 25129-25143.
Leban, G., Fortuna, B., Brank, J., & Grobelnik, M. (2014, April). Event registry: learning about world events from news. In Proceedings of the 23rd International Conference on World Wide Web (pp. 107-110).
Quezada, M., Pena-Araya, V., & Poblete, B. (2015, August). Location-aware model for news events in social media. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 935-938).
Bhowmik, D., & Poddar, S. (2021). Cyclical and seasonal patterns of India's GDP growth rate through the eyes of Hamilton and Hodrick Prescott Filter models. Asia-Pacific Journal of Management and Technology, 1(3), 7-17.
Heywood, A. (2017). Political ideologies: An introduction. Macmillan International Higher Education.
Strle, B., Mozina, M., & Bratko, I. (2009, June). Qualitative approximation to Dynamic Time Warping similarity between time series data. In Proceedings of the Workshop on Qualitative Reasoning.
Niwattanakul, S., Singthongchai, J., Naenudorn, E., & Wanapu, S. (2013, March). Using of Jaccard coefficient for keywords similarity. In Proceedings of the international multiconference of engineers and computer scientists (Vol. 1, No. 6, pp. 380-384).
Segev, E. (2015). Visible and invisible countries: News flow theory revised. Journalism, 16(3), 412-428.
Wu, H. D. (2007). A brave new world for international news? Exploring the determinants of the coverage of foreign news on US websites. International Communication Gazette, 69(6), 539-551.
Filla, J., & Johnson, M. (2010). Local news outlets and political participation. Urban Affairs Review, 45(5), 679-692.
Schulz, A., Fletcher, R., & Popescu, M. (2020). Are news outlets viewed in the same way by experts and the public? A comparison across 23 European Countries. Reuters Institute for the Study of Journalism.
Camaj, L. (2010). Media framing through stages of a political discourse: International news agencies' coverage of Kosovo's status negotiations. International Communication Gazette, 72(7), 635-653.
Martens, B., Aguiar, L., Gomez-Herrera, E., & Mueller-Langer, F. (2018). The digital transformation of news media and the rise of disinformation and fake news.
Bakshy, E., Messing, S., & Adamic, L. A. (2015). Exposure to ideologically diverse news and opinion on Facebook. Science, 348(6239), 1130-1132.
Maurer, P., & Beiler, M. (2018). Networking and political alignment as strategies to control the news: Interaction between journalists and politicians. Journalism Studies, 19(14), 2024-2041.
© 2021. This work is published under http://archive.ceciis.foi.hr/app/index.php/ceciis/archive (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.