Abstract-To defend against increasingly frequent, varied, and sophisticated cyber-attacks, timely analysis of cybersecurity threat information is critical. Open Source Intelligence (OSINT) is a method of collecting information from publicly available sources and analyzing it to derive actionable intelligence. This paper presents the design and implementation of a prototype system named TwitterOSINT, which automates the collection and analysis of cybersecurity-related information (e.g., threats and vulnerabilities) posted on Twitter, which serves as the OSINT source. The prototype was implemented in Java and used the Twitter Streaming API to download relevant tweets based on a set of user-provided keywords. The selected tweets were then processed and analyzed by a Natural Language Processing (NLP) module, which used the Stanford CoreNLP library and language models for foundational NLP capabilities, together with the Stucco cyber domain-specific entity extraction library developed and provided by Oak Ridge National Laboratory. The processed tweets were stored in a JavaScript Object Notation (JSON) formatted file along with all the annotations produced for them by the NLP module. Finally, the TwitterOSINT system used the Elastic Stack (Elasticsearch, Logstash, and Kibana) to collect, index, store, analyze, manage, and visualize the annotated tweets, and to help derive intelligence using the built-in data analytics and machine learning capabilities of the Elastic Stack. A preliminary experiment was conducted to gain operational experience and insight with the system.
Keywords-Cyber intelligence, OSINT, Twitter, natural language processing, Elastic Stack
I. Introduction
The impact of cyber-crime has compelled governments and private-sector organizations across the world to tackle cyber threats. All sectors now face the same dilemma: how best to mitigate cyber-crime and implement best practices effectively. Extracting actionable, high-value intelligence by harvesting public information is rapidly emerging as an important means of cyber defense. As the amount of information available from open sources grows, countering cyber-crime increasingly depends on advanced software tools and techniques to collect, process, analyze, and leverage that information effectively and efficiently.
Open Source Intelligence (OSINT) refers to intelligence that has been derived from publicly available sources such as news articles, blogs, and social media. Using automated OSINT collection and analysis tools and methods, government and private sectors alike can be better prepared to avert...