The advent of Large Language Models (LLMs) has revolutionised natural language processing, providing unprecedented capabilities in text generation and analysis. This paper examines the utility of Artificial-Intelligence-assisted (AI-assisted) content analysis (CA), supported by LLMs, as a methodological tool for research in Information Science (IS) and Cyber Security. It reviews current applications, methodological practices, and challenges, illustrating how LLMs can augment traditional approaches to qualitative data analysis. Key distinctions between CA and other qualitative methods are outlined, alongside the traditional steps involved in CA. To demonstrate relevance, examples from Information Science and Cyber Security are highlighted, along with a new example detailing the steps involved. A hybrid workflow is proposed that integrates human oversight with AI capabilities, grounded in the principles of Responsible AI. Within this model, human researchers remain central to guiding research design, interpretation, and ethical decision-making, while LLMs support efficiency and scalability. Both deductive and inductive AI-assisted frameworks are introduced. Overall, AI-assisted CA is presented as a valuable approach for advancing rigorous, replicable, and ethical scholarship in Information Science and Cyber Security. This paper builds on prior LLM-assisted coding work, proposing that this hybrid model is preferable to fully manual content analysis.
1. Introduction
Generative Artificial Intelligence (AI) has advanced rapidly since late 2022, when ChatGPT 3.5 was first released to the public (30 November 2022) [1,2]. Universities have struggled to respond to the use of tools such as OpenAI’s ChatGPT in assessments, amid concerns that students may outsource work and therefore fail to master core knowledge and skills or demonstrate independent learning [3,4,5,6]. Alongside these integrity risks, it is equally important to examine how AI can be used responsibly to support scholarship—for example, to brainstorm, locate literature, assist with coding and data cleaning, improve accessibility, and scaffold writing—within clear pedagogical guidelines and assessment designs [7,8,9,10].
AI, especially large language models (LLMs), can also assist researchers across the project lifecycle [11,12,13,14,15,16,17]. Tools such as Rayyan and related platforms can assist in the dual coding and screening of systematic literature reviews [18,19]. For scientific writing and analysis, LLMs can enhance writing quality and reduce task time, with the most significant gains observed for writers with lower baseline skills [20,21].
This paper proposes that Artificial Intelligence (AI), particularly large language models (LLMs), can serve as a valuable tool for conducting content analysis (CA). Drawing on the disciplines of Information Science (IS) and Cyber Security, this paper illustrates, through a review of the literature, how AI-assisted CA can enhance methodological rigour, while acknowledging that the approach can be applied across a wide range of disciplines. The discussion begins by outlining the relevance of CA as a methodology for Information Science (IS) and Cyber Security, followed by an overview of its core principles. It then provides a set of guidelines to ensure validity and reliability when employing AI-assisted CA. To demonstrate the practical application of this approach, this paper presents examples of a CA of disinformation, illustrating how researchers can systematically integrate AI tools into their analysis.
1.1. Rationale
Advances in AI and natural language processing (NLP) make it timely to articulate how AI-assisted content analysis (CA) can strengthen empirical inquiry in IS and Cyber Security. Traditional CA provides a transparent and systematic approach for describing the manifest and latent features of texts [22], yet the volume, velocity, and heterogeneity of contemporary digital corpora—threat reports, incident tickets, logs, social media streams, forums, and policy documents—strain purely manual approaches common in social science methods. AI-assisted CA offers a potential solution to the challenges posed by ‘big data’ by enabling researchers to process large and complex datasets while maintaining methodological rigour. Importantly, such approaches can enhance the reliability and validity of research by creating reproducible audit trails, thereby supporting transparency and replicability. At the same time, AI-assisted CA retains the critical role of human oversight, ensuring that construct validity and contextual interpretation are preserved [23]. These considerations are particularly pertinent in IS and Cyber Security, where many researchers originate from technical or applied traditions and may have limited formal training in qualitative/quantitative social-science methodologies [24]. AI-assisted CA offers a pragmatic bridge: it encodes the procedural discipline of CA—clear units of analysis, explicit coding rules, and reliability checks—while utilising algorithms to manage scale and to standardise repetitive tasks, thereby reducing researcher burden and variance [25].
Crucially, the promise of AI-assisted CA hinges on rigorous governance to mitigate construct drift, dataset bias, and spurious correlations. Best practice includes preregistered coding protocols, reliability assessment for human and model-assisted labels, and documentation of the process [23,26,27]. Such guardrails ensure transparency and replicability—core scientific values that remain unevenly implemented in Information Science and Cyber Security—while allowing these fields to exploit AI’s strengths for scale and consistency without sacrificing theoretical fidelity or interpretability [23].
1.2. Differences Between CA and Thematic Analysis (TA) and Discourse Analysis (DA)
Content analysis (CA), thematic analysis (TA), and discourse analysis (DA) represent three prominent approaches to studying textual and communicative materials, differing in their epistemological stance, analytic focus, and intended outcomes [28,29,30]. CA is traditionally systematic and rule-based, oriented toward coding manifest or latent content into predefined categories to generate replicable and, often, quantifiable results [22]. TA, by contrast, emphasises the identification and interpretation of patterned meanings or themes across datasets, offering flexibility but typically less emphasis on replicability or quantification [31]. DA differs further, treating language not as a neutral medium but as a constitutive social practice that shapes identities, power relations, and ideologies [32]. Because CA is built on coding frameworks, explicit decision rules, and reliability testing, it aligns more directly with the affordances of AI, particularly LLMs [33,34]. AI-assisted CA can automate classification, support consistent application of coding rules, and scale analyses across large corpora, all while maintaining the possibility of reliability checks against human coders [26]. In contrast, the more interpretive and reflexive orientations of TA and DA pose challenges for automation, as they rely heavily on researcher judgment, contextualisation, and theoretical framing. These two approaches also tend to be more qualitative than quantitative. For these reasons, this paper focuses on the promise of AI-assisted CA.
1.3. Current Study
This paper examines the utility of AI, particularly LLMs, as a methodological tool for conducting CA. Situated within the field of Information Science and Cyber Security, the discussion illustrates how AI-assisted CA can strengthen methodological rigour while recognising that the approach is broadly transferable across disciplinary contexts. This paper synthesises insights from prior research and introduces a new applied example that illustrates the practical steps involved in implementation. In addition, it offers a critical appraisal of the strengths and limitations of AI-assisted CA, evaluating both its promise and its challenges for future research practice.
2. Defining Content Analysis
CA is a systematic, rule-governed method for making valid and replicable inferences from texts (or other meaningful material, such as photographs or other visual materials) to the contexts of their use [22,26]. At its core, it transforms qualitative materials, such as social media posts, news articles, interview transcripts, policy documents, or images, into analysable data [35,36,37]. The process involves deciding on the content that requires analysis, sampling from this content, coding, and finally drawing inferences [38,39]. Disciplines that more commonly utilise this methodology include psychology, media and communications, health, political science, and, more recently, information science [40,41] and Cyber Security research [42,43].
CA can be either quantitative or qualitative. Quantitative content analysis seeks to measure manifest or latent variables and to test hypotheses using counts, cross-tabulations, or statistical modelling [29,44]. It relies on a priori categories, carefully trained coders, and formal reliability estimation (e.g., Krippendorff’s α or Cohen’s Kappa) to measure coder agreement [22]. In contrast, qualitative content analysis is oriented toward interpreting meaning and context. It may use inductive coding to build categories from the data or integrate theory more flexibly, while still maintaining transparency through audit trails and explicit decision rules [45,46].
There are four primary purposes for conducting CA. First, it maps and describes the distribution of topics, frames, or sentiments across large corpora (e.g., how often a newspaper takes a conservative compared to a liberal perspective) and over time [47,48]. Second, it tests theory by operationalising abstract constructs—such as moral foundations, stigma, or strategic framing—into observable categories that can be associated with predictors or outcomes [49]. Third, it compares groups and contexts (e.g., focus groups, cultural groups) [39,50,51]. Fourth, it supports mixed-methods inference, such as triangulation [52] or as a starting point to guide future studies [53].
Notably, CA can be theoretically or atheoretically driven [54]. A theoretical content analysis starts from existing theory, models, and hypotheses and uses them to define categories and decision rules a priori [55]. In contrast, an atheoretical approach, often referred to as inductive content analysis, begins with minimal a priori categories and allows patterns and themes to emerge directly from the data [30,56].
As aforementioned, content analysis distinguishes between coding manifest and latent content. Manifest content refers to what is explicitly present and readily observable, such as the presence or frequency of specific words, phrases or images [57,58]. Methodologically, manifest coding prioritises operational clarity and replicability. Categories are often defined a priori with clear inclusion rules, which allows researchers or automated procedures, such as Generative AI, to code. This approach typically achieves higher intercoder reliability because coders evaluate observable features with limited discretion. The trade-off is that manifest coding can underrepresent constructs whose meaning is context-dependent. Latent content, in contrast, involves inferring the meaning behind what is written, such as identifying a ‘moral panic’ code or an attitude position that is not stated verbatim [59,60]. Because latent coding is designed to capture meaning, connotation, and underlying patterns in text, coders are required to interpret segments in light of the broader context, theoretical constructs and discourse conventions. Reliability is, therefore, more challenging for latent coding [53]. The strength of latent coding lies in construct validity, which is essentially how well a theoretical concept is measured.
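To make the manifest/latent distinction concrete, the sketch below shows how manifest coding can be operationalised as explicit, rule-based matching; the category names and keyword patterns are illustrative assumptions, not drawn from any study cited here.

```python
import re

# Minimal sketch of manifest coding: a priori categories with explicit keyword
# rules. Category names and patterns are illustrative assumptions only.
CODING_RULES = {
    "fear_appeal": [r"\bdanger(ous)?\b", r"\bthreat(s)?\b", r"\bwarning\b"],
    "authority": [r"\bexperts? say\b", r"\bofficials?\b", r"\baccording to\b"],
}

def code_manifest(text: str) -> dict:
    """Return a presence/absence judgement for each category based only on
    observable surface features, leaving no discretion to the coder."""
    return {
        category: any(re.search(p, text, re.IGNORECASE) for p in patterns)
        for category, patterns in CODING_RULES.items()
    }

print(code_manifest("Experts say the new variant is dangerous."))
# -> {'fear_appeal': True, 'authority': True}
```

Latent codes such as ‘moral panic’, by contrast, cannot be reduced to such surface rules, which is why interpretation and reliability checks remain necessary.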
CA has notable strengths: versatility across media types, alignment with theory testing, and compatibility with both qualitative and quantitative paradigms. Its limitations arise when categories are not clearly defined (needed for validity and reliability), coder training is insufficient (needed for reliability), or the contextual nuance is lost through over-reduction [47]. In brief, content analysis provides researchers with a useful tool to transform meaning-rich materials into evidence that can test theories or develop new ones.
Similarities and Differences Between Content, Thematic, and Discourse Analyses
Qualitative and mixed-methods researchers frequently draw upon content, thematic and discourse analyses for examining textual and communication materials. While these approaches share some broad similarities, it is important to distinguish the differences with clarity (see also summary in Table 1). The choice of method should be guided by the researcher’s aims, objectives, and specific research questions, as well as by the epistemological paradigm underpinning the study—for example, a positivist orientation versus a postmodern or constructionist perspective [61,62].
At a general level, all three methods provide systematic frameworks for moving from raw textual data toward more abstract interpretations of meaning. Each involves processes of coding, whether this is operationalised as assigning categories to manifest or latent content (content analysis), clustering codes into themes (thematic analysis), or identifying discursive repertoires and linguistic strategies (discourse analysis) [63]. Moreover, all approaches rely on transparent documentation of coding decisions, reflexivity about the researcher’s interpretive role, and explicit links between analytic claims and textual evidence [63,64].
One of the primary distinctions lies in their epistemological foundations. CA is traditionally rooted in a positivist/post-positivist orientation, aiming to produce replicable, reliable measures of manifest and latent variables within texts [22]. In contrast, thematic analysis is more flexible and can be applied across positivist, critical realist, or constructionist paradigms, with reflexive thematic analysis emphasising researcher subjectivity and meaning-making [31,65]. Discourse analysis, particularly in traditions such as critical discourse analysis or discursive psychology, is explicitly constructionist, treating language as a social practice that constructs realities and reproduces power relations [32].
Another key distinction between these qualitative methods lies in their analytical focus. CA focuses on the content (what is written) and the frequency and distribution of categories or codes, often producing quantitative summaries alongside qualitative interpretation [22]. In contrast, thematic analysis seeks to identify patterns of meaning across a dataset, offering rich, narrative accounts of themes and how they relate to one another and to broader research questions [65]. Discourse analysis, by contrast, investigates how language is used to perform actions, construct subjectivities, and embed ideologies, often attending to grammar, metaphors, intertextuality, and rhetorical strategies [32].
The procedures for carrying out these analyses also differ. CA typically employs a structured coding frame, explicit decision rules, and intercoder reliability checks, sometimes incorporating computational tools for large-scale text analysis [22]. Thematic analysis can be more interactive, iterative, and flexible, with phases including data familiarisation, code generation, theme development, and refinement. As in CA, inter-coder reliability can be important; however, in more recent, reflexive versions of thematic analysis, reliability is conceptualised in terms of reflexivity and transparency [31,66]. Prescriptive procedures are not required for discourse analysis, which emphasises the close reading of extracts, contextualisation within broader social and institutional structures, and theoretically informed interpretation of how discourse enacts power and identity [32].
3. The Utility of CA for Information Science and Cyber Security Research
Information Science and Cyber Security are closely linked through their shared focus on information as both a resource and an asset to be protected. Information Science examines the creation, organisation, retrieval, and use of information within socio-technical systems [67,68,69], while Cyber Security seeks to safeguard that same information and the systems that process it from unauthorised access, disruption, or misuse [69]. Their connection is evident in overlapping concerns with privacy, trust, data governance, and human behaviour. However, as argued by von Solms and van Niekerk (2013), cyber security extends “traditional information security to include not only the protection of information resources, but also that of other assets, including the person him/herself. In information security, reference to the human factor usually relates to the role(s) of humans in the security process” (p. 97) [69]. It is argued here that as emerging technologies such as cloud computing, AI, and digital infrastructures continue to blur disciplinary boundaries, collaboration between Information Science and Cyber Security has become essential for designing environments that are not only efficient and usable but also secure and trustworthy.
In Information Science, CA may enable research on how information is organised, communicated, and used across platforms and communities. Researchers can assess metadata, case studies, information behaviour captured through interviews and observational studies, and policies (e.g., organisational, defence, intelligence, and government).
With respect to Cyber Security, CA could be used to examine phishing emails, security awareness materials, threat-intelligence reports, and policy and regulatory guidance. More broadly, CA could support threat and vulnerability landscape mapping, policy and governance analysis, human factors and security culture research, and malicious communication analysis.
Examples of how CA has been used in Information Science and Cyber Security research are shown in Table 2.
3.1. Steps Involved in a Traditional CA
3.1.1. Reliability and Validity
Ensuring that research is both valid and replicable requires strict adherence to established methodological procedures. This is particularly critical in Information Science and Cyber Security, where many researchers are trained in technical or applied domains but often lack systematic exposure to methodological traditions common in the social sciences [24]. The absence of such training risks undermining the rigour of empirical inquiry, leading to studies that may be difficult to replicate, insufficiently transparent, or methodologically inconsistent. Without careful engagement with established research methods, findings in these fields risk being interpreted as ad hoc or lacking in scientific credibility [22]. Accordingly, cultivating methodological literacy is not merely a matter of academic thoroughness but a prerequisite for producing research that withstands critical scrutiny and contributes meaningfully to interdisciplinary scholarship.
3.1.2. Research Questions, Coding Frames and Unit of Analysis
As stated in this paper, CA is a systematic and replicable method for examining communication materials to identify patterns, themes, and meanings [26], and it follows a series of steps (see Table 3). The process typically begins with the formulation of clear research questions that guide the analysis, followed by the development of a coding framework. Next, the body of content to be examined is selected. Defining the unit of analysis—whether words, sentences, paragraphs, or entire documents—is the next crucial step, as it shapes the scope and granularity of the findings [22]. Researchers must then select an appropriate sampling strategy to ensure the representativeness of the material under investigation.
3.1.3. Coding Scheme
Once materials are collected, researchers construct a coding scheme based on either deductive categories derived from theory or inductive categories emerging from the data itself [76]. Coders then systematically assign codes to segments of text, ensuring that coding rules are applied consistently across the dataset. To maintain rigour, inter-coder reliability checks are often conducted, using measures such as Cohen’s Kappa (0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect) or Krippendorff’s α (≥0.80 for confirmatory studies and 0.67–0.80 for exploratory work) to assess the level of agreement among independent coders [77].
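As a concrete illustration of these reliability checks, the sketch below computes Cohen’s κ with scikit-learn and Krippendorff’s α with the third-party krippendorff package (both assumed to be installed); the coder labels are invented for demonstration.

```python
from sklearn.metrics import cohen_kappa_score
import krippendorff  # third-party 'krippendorff' package, assumed installed

# Hypothetical nominal codes from two independent coders over ten units.
coder_a = ["fear", "fear", "authority", "none", "fear",
           "none", "authority", "fear", "none", "fear"]
coder_b = ["fear", "none", "authority", "none", "fear",
           "none", "authority", "fear", "fear", "fear"]

# Cohen's kappa works directly on the two label sequences.
kappa = cohen_kappa_score(coder_a, coder_b)

# krippendorff.alpha expects one row per coder with numeric values,
# so map the nominal labels to integers first.
labels = sorted(set(coder_a) | set(coder_b))
to_int = {label: i for i, label in enumerate(labels)}
reliability_data = [[to_int[c] for c in coder_a], [to_int[c] for c in coder_b]]
alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")

print(f"Cohen's kappa: {kappa:.2f}, Krippendorff's alpha: {alpha:.2f}")
```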
3.1.4. Analysis and Interpretation
After coding is complete, the data are analysed to identify frequency distributions, co-occurrences, and patterns that reveal the underlying structure of the communication. Depending on the epistemological orientation, the analysis may emphasise quantitative aspects—such as the frequency of word categories—or qualitative interpretations that explore meanings and latent content [53]. The final stage involves interpreting the findings in light of the research questions, theory, and broader context, and presenting them in a way that enhances understanding of the phenomena under study [22].
4. Employing AI-Assisted CA
In the last few years, researchers have increasingly used large language models (LLMs) to automate pieces of the CA workflow—building or adapting codebooks, coding text at scale, and serving as ‘second coders’ for reliability checks. Several empirical studies report that for certain well-specified tasks (e.g., sentiment, simple categorical codes), modern LLMs can reach agreement levels comparable to human coders, though performance varies by construct and model, and reliability on subtle, latent constructs (e.g., sarcasm) remains uneven [78,79]. For example, Bojić et al. (2025) employed multiple LLMs with 33 human annotators and found that humans and leading LLMs were similarly reliable for sentiment and political leaning; however, both struggled on sarcasm [78].
One of the obvious benefits of AI-assisted CA is the saving in researchers’ time and resources [80,81,82]. Morgan (2023) found that ChatGPT conducted a thematic analysis in 2 h, compared with 23 h when human coders performed the same task [80]. Similarly, Fuller et al. (2024) reported that ChatGPT was able to analyse 470 free-text survey responses within 10–15 min, whereas human researchers required an average of 27.5 min to complete the same task [81].
A natural concern is whether the substantial reduction in analysis time achieved through AI-assisted content analysis (CA) comes at the cost of accuracy. As noted earlier in this paper, current evidence suggests that AI-assisted CA can achieve levels of reliability comparable to human coding, and this reliability is expected to improve as the technology continues to advance [83]. For example, Bijker et al. (2024) employed ChatGPT to analyse forum posts in which individuals shared their experiences of reducing sugar consumption and found that inter-rater reliability with a human coder was almost perfect [83]. Nevertheless, the accuracy of AI coding remains contingent on the type and complexity of data being analysed, with some constructs—particularly latent or nuanced categories—posing greater challenges. Importantly, methodological refinements, such as providing contextual information and iteratively tailoring prompts, have been shown to improve reliability [84].
AI-assisted content analysis (CA) has the potential to generate novel insights into data beyond those typically achieved through manual approaches. For instance, AI systems can be prompted repeatedly to yield alternative interpretations of the same dataset, thereby exposing different analytical angles [85]. Moreover, they can be directed to apply specific theoretical or methodological lenses, enabling researchers to explore data through multiple perspectives in a systematic and scalable manner [85]. Hitch (2024) has suggested that AI could be used as a tool to supplement human coding as part of a reflexive, collaborative approach to analysis [85]. Recent empirical work illustrates these strengths while highlighting the importance of methodological safeguards. Yan et al. demonstrated that ChatGPT not only improved coding efficiency and supported initial data exploration but also provided granular quantitative insights—even for users with less methodological experience—underscoring the need for transparent explanatory mechanisms and iterative validation in human–AI workflows [86]. Similarly, Turobov et al. (2024) found that ChatGPT can serve as a capable assistant in thematic analysis, improving methodological efficacy when used in a validated, transparent, and reliability-conscious manner [87].
Examples of studies that have employed AI-assisted content analysis are shown in Table 4. As shown in Table 4, the studies employed a range of LLMs, although some do not report which models were used. Bojić et al. (2025) reported that both LLMs and humans struggled with coding for sarcasm [78]. Inter-coder reliability was typically assessed using a statistical test, such as Krippendorff’s alpha or Cohen’s κ.
4.1. Responsible AI
There has been much discussion in recent times about Responsible AI. Responsible AI refers to the principles and governance practices that ensure AI systems are developed and used in ways that are lawful, ethical, transparent, safe, and accountable [91,92]. The OECD Principles on AI (OECD, 2019) emphasise inclusive growth, human-centred values, transparency, robustness, and accountability, while the European Commission’s Ethics Guidelines for Trustworthy AI (EU, 2019) outline seven core requirements, including human agency, technical robustness, data governance, and societal well-being [91,92].
In empirical settings, Responsible AI considerations often surface when unintended harms become evident. For example, Buolamwini and Gebru’s well-known study on facial recognition revealed high error rates for women and people with darker skin tones, underscoring the need for fairness audits and bias mitigation [93]. Similarly, studies on predictive policing algorithms highlight risks of amplifying structural inequalities unless explicit governance interventions are applied [94]. These cases demonstrate that Responsible AI is not simply aspirational but necessary to prevent harms in high-stakes contexts [94].
Arguably, Responsible AI also reinforces research-ethics obligations when CA informs real-world decisions, such as policy around securing a nation state. It may also involve careful consideration of texts containing personal or sensitive information. Privacy-respecting data governance—including careful sampling, minimisation, lawful basis for processing, and attention to contextual integrity—is essential, particularly when scraping or analysing user-generated content [95]. Equally important are accountability in employing AI-assisted CA and concerns around validity and reliability. Researchers need to clearly set out the data used, the procedures employed in their studies, the development of a codebook, and evidence of intercoder reliability [26]. In summary, Responsible AI provides the normative and procedural infrastructure that allows AI-assisted CA to scale analysis while safeguarding validity, fairness, and reproducibility. Embedding these practices moves AI-assisted CA beyond convenience automation toward theory-consistent, ethically defensible, and auditable research that can withstand interdisciplinary scrutiny in Information Science and Cyber Security.
4.2. Hallucination Phenomenon and Error Mitigation
One important limitation of AI-assisted CA is the hallucination phenomenon, a well-documented challenge in the use of LLMs. As Ahmadi (2024) observes, “as these language juggernauts sift through immense volumes of data, the line between accurate interpretation and creation generation can blur” (p. 1) [96]. In practice, generative AI systems can produce content that appears linguistically fluent and conceptually plausible but is ultimately nonfactual [97,98,99]. This poses a particular risk for researchers, as outputs may be accepted at face value without sufficient domain expertise to recognise inaccuracies or fabrications. For AI-assisted CA, such hallucinations have direct implications for both reliability and validity: reliability may be undermined if the same input generates inconsistent outputs across iterations, while validity is threatened when fabricated or distorted information is coded and treated as empirical evidence. Addressing this limitation requires rigorous oversight, triangulation with human coders, and transparent reporting of verification procedures.
4.2.1. Verification Playbook
It is also important to consider the broader issue of ‘model fallibility’ [100]. LLMs are not neutral coders but probabilistic systems that generate outputs based on statistical associations rather than grounded understanding. This necessitates the introduction of verifiable controls to safeguard the integrity of AI-assisted CA. Recommended practices include conducting cross-run agreement checks to ensure stability across repeated model queries, using evidence-anchored prompts that require citation of specific text spans from the source corpus, and enforcing a strict rule that no model assertion is accepted unless it links directly to a verifiable snippet in the underlying data. To operationalise these safeguards, researchers can employ a ‘verification playbook’, which documents common failure types, such as hallucinated references, misclassification of nuanced categories, or omission of contextual qualifiers, and prescribes targeted countermeasures for each. Such measures not only strengthen the reproducibility and credibility of AI-assisted CA but also make explicit the limitations of LLMs, underscoring the need for human oversight and methodological transparency.
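The following sketch illustrates how two of these safeguards, cross-run agreement and evidence anchoring, might be operationalised in code. The ask_llm wrapper is hypothetical (any chat-completion API could sit behind it) and is assumed to return a dict with 'code' and 'evidence' keys.

```python
from collections import Counter

def ask_llm(text: str, codebook: str) -> dict:
    """Hypothetical wrapper around any chat-completion API; assumed to return
    a dict such as {"code": "fear_appeal", "evidence": "exact quoted span"}."""
    raise NotImplementedError  # plug in a real client here

def code_with_safeguards(text: str, codebook: str, runs: int = 3):
    """Apply two verification-playbook controls before accepting a model label."""
    outputs = [ask_llm(text, codebook) for _ in range(runs)]

    # Control 1 - cross-run agreement: the same code must be returned on every run.
    counts = Counter(o["code"] for o in outputs)
    code, count = counts.most_common(1)[0]
    if count < runs:
        return None  # unstable across runs: route the unit to a human coder

    # Control 2 - evidence anchoring: no assertion is accepted unless it links
    # to a verbatim snippet in the underlying data.
    evidence = outputs[0].get("evidence", "")
    if not evidence or evidence not in text:
        return None  # hallucinated or paraphrased evidence: route to a human coder

    return {"code": code, "evidence": evidence}
```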
4.2.2. Errors Related to Cyber Security and Information Science
In the domain of Cyber Security, AI-assisted analysis faces unique risks stemming from the highly dynamic and adversarial nature of the field. Edge cases include fabricated indicators of compromise (IoCs) inserted into threat reports and possibly synthetic vulnerability identifiers that mimic established formats without grounding in authoritative registries. If unverified, such artefacts risk being coded as legitimate, undermining both the validity of findings and the reliability of subsequent analyses. This demonstrates the importance of the human-in-the-loop for CA—especially in these disciplines.
4.3. Hybrid Approach
Given the current state of generative AI and large language models (LLMs), a hybrid approach that combines the strengths of humans and AI is recommended. As demonstrated in this paper, the combination of AI and human expertise may improve accuracy and reduce the time required to conduct this type of research. Such an approach allows researchers to leverage the efficiency of AI-assisted content analysis (CA) while ensuring that its application remains responsible and ethically grounded [82]. While LLMs can rapidly process large volumes of text, generate coding suggestions, and reveal alternative interpretive frames, human coders remain essential for safeguarding construct validity, identifying hallucinations, and ensuring that coding decisions align with theoretical frameworks and empirical realities. This balance mitigates risks associated with over-reliance on automation and ensures that AI outputs are subject to critical scrutiny, verification, and contextual interpretation. In this sense, hybrid approaches not only embody ethical and responsible practice but also enable CA to benefit from the scalability of AI while preserving the interpretive depth and validity that underpin rigorous social science research. There may, of course, be some caveats to choosing AI-assisted CA. In cases where complex themes, irony, and sarcasm require coding, a manual approach may be preferred.
5. Stages Recommended for an AI-Assisted CA
Drawing from the research summarised in this paper, a hybrid workflow pipeline with humans leading the research has been outlined (see Table 5 and Table 6). In this recommended approach, humans lead research question formation and codebook development, while LLMs support the generation of categories/codes and their definitions, the coding, and the interpretation.
5.1. Deductive AI-Assisted CA
5.1.1. Research Questions, Population Texts, Samples and Units of Analysis
In a deductive approach (see Table 5), the process begins with the formulation of research questions by human researchers, grounded in theoretical frameworks and prior empirical evidence. Researchers then identify the population of texts to be analysed and select an appropriate sample from this corpus in line with methodological requirements. The next step involves determining the units of analysis—such as words, sentences, paragraphs, documents, or visual artefacts—which are defined by the human researchers as the basis for coding.
Once the units are established, both human researchers and LLMs contribute to the development of the coding scheme. Human researchers initially generate categories or codes with accompanying definitions, informed by research questions, theory, and prior studies. They also specify inclusion and exclusion criteria. In parallel, LLMs are prompted with the same research questions, theoretical perspectives, and key findings from the literature to generate their own proposed categories, definitions, and criteria. The human researchers critically evaluate these outputs and make the final decision on the coding scheme. The human researchers then construct a formal codebook, typically in a spreadsheet, containing the categories, definitions, and inclusion/exclusion rules, ideally supplemented with illustrative examples and counterexamples.
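A minimal sketch of such a codebook, and of how it can be rendered into an evidence-anchored coding prompt, is shown below; the category, definitions, criteria, and examples are invented for illustration rather than taken from any study discussed here.

```python
# Illustrative deductive codebook entry; all content is invented for demonstration.
CODEBOOK = [
    {
        "category": "emotional_appeal",
        "definition": "An approach used to affect the emotions of others.",
        "include": "Explicit appeals to fear, sadness, joy, humour, love, or security.",
        "exclude": "Neutral factual statements without emotive wording.",
        "example": "Act now before it is too late!",
        "counterexample": "The report was published on Tuesday.",
    },
]

def build_coding_prompt(text: str) -> str:
    """Render the codebook into an instruction for an LLM acting as a coder."""
    rules = "\n".join(
        f"- {e['category']}: {e['definition']} "
        f"Include: {e['include']} Exclude: {e['exclude']} "
        f"Example: {e['example']!r} Counterexample: {e['counterexample']!r}"
        for e in CODEBOOK
    )
    return (
        "Code the text against each category below. For every category answer "
        "yes/no, quote the exact supporting span, and give your confidence "
        "(low/medium/high).\n\n"
        f"Categories:\n{rules}\n\nText:\n{text}"
    )

print(build_coding_prompt("Act now before it is too late!"))
```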
5.1.2. Coding and Reliability
Coding is carried out by both humans and LLMs through an iterative process of training and refinement. A subsample of data is independently coded using the same scheme, and the outputs are compared to identify discrepancies and refine category definitions. This cycle continues until satisfactory consistency is achieved. Once the scheme is stabilised, the entire dataset is coded. Reliability is subsequently assessed statistically by comparing agreement between human coders and LLM outputs, using established measures such as Krippendorff’s α or Cohen’s κ.
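One way to support this iterative refinement loop is to compute agreement per category and flag those falling below a chosen threshold (here, a hypothetical cut-off of 0.61, the lower bound of ‘substantial’ agreement) for definition revision; a sketch follows.

```python
from sklearn.metrics import cohen_kappa_score

def categories_needing_refinement(human_labels, llm_labels, threshold=0.61):
    """Flag categories whose human-LLM agreement falls below the threshold.

    Agreement is computed per category as Cohen's kappa on presence/absence
    judgements; flagged categories have their definitions revisited before
    the next coding cycle. Inputs are parallel lists of assigned codes.
    """
    flagged = []
    for category in sorted(set(human_labels) | set(llm_labels)):
        human = [label == category for label in human_labels]
        llm = [label == category for label in llm_labels]
        if human == llm:
            continue  # identical judgements: perfect agreement, nothing to refine
        if cohen_kappa_score(human, llm) < threshold:
            flagged.append(category)
    return flagged

# Hypothetical subsample: codes assigned by a human coder and by an LLM.
human = ["fear", "authority", "fear", "none", "authority", "none"]
llm = ["fear", "fear", "fear", "none", "authority", "fear"]
print(categories_needing_refinement(human, llm))
```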
5.1.3. Synthesising or Quantifying and Interpreting
The next stage of analysis depends on whether a qualitative or quantitative approach is chosen. Qualitative analysis concentrates on identifying and synthesising thematic patterns within the data, while quantitative analysis employs suitable statistical techniques to examine relationships, frequencies, or distributions across codes. Ultimately, the interpretation and reporting of findings are primarily carried out by human researchers. Although LLMs can be cautiously used to generate alternative interpretive perspectives—when provided with research questions, theoretical frameworks, and results—such outputs must be critically validated against the academic literature to reduce the risk of hallucination. In the end, responsibility for interpretation and reporting lies with human researchers to ensure the validity and integrity of the analysis.
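For the quantitative route, the kind of descriptive summaries typically reported (code frequencies and cross-tabulations) can be produced directly from the coded output; the sketch below uses pandas with an invented coded dataset.

```python
import pandas as pd

# Hypothetical coded output: one row per unit of analysis, with the source
# group and the code assigned during the coding stage.
coded = pd.DataFrame({
    "source": ["forum", "forum", "news", "news", "news", "forum"],
    "code": ["fear_appeal", "authority", "fear_appeal",
             "fear_appeal", "authority", "fear_appeal"],
})

# Frequency distribution of codes across the corpus.
print(coded["code"].value_counts())

# Cross-tabulation of codes by source, row-normalised to proportions,
# the kind of summary a quantitative CA would report.
print(pd.crosstab(coded["source"], coded["code"], normalize="index").round(2))
```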
5.2. Inductive AI-Assisted CA
5.2.1. Immersion, Code Development and Definitions
The Inductive AI-Assisted CA proposed here is very similar to the Deductive approach (Table 6). It differs at Stage 5, where human researchers begin by reading and re-reading the dataset to develop a deep understanding of its context and nuances. From this immersion, they generate an initial set of codes and definitions that address the research questions. In parallel, large language models (LLMs) are provided with the research questions and tasked with producing a preliminary set of codes and definitions. The outputs from humans and LLMs are then compared, enabling researchers to reconsider, cluster, and refine the categories where appropriate.
5.2.2. Developing a Codebook
Subsequently, both human researchers and LLMs conduct a second round of independent coding, which may involve the development of subcategories or more fine-grained codes. At this stage, inclusion and exclusion criteria are further clarified. The human researchers critically examine the combined sets of findings, make final decisions regarding the coding scheme, and resolve any discrepancies. The definitive codebook is then produced, typically in a spreadsheet, and includes the categories or codes, their operational definitions, and explicit inclusion and exclusion criteria, ideally supplemented with illustrative examples and counterexamples.
5.2.3. Reliability and Reflexivity
Stage 7 in this workflow differs slightly from the deductive AI-assisted CA process. While reliability may still be assessed quantitatively using statistical measures, it can alternatively be enhanced through qualitative strategies such as coder triangulation, reflexivity, and iterative consensus-building. These approaches place greater emphasis on collective judgment and critical reflection, thereby strengthening the robustness of the findings.
5.3. Practical Implications
The steps set out in Table 5 and Table 6 are a general guide. In practical terms, researchers may need to adjust these steps according to the type of data being examined. Piloting may be important for more complex data, and researchers may wish to obtain confidence scores and take these into account when assessing reliability. Human time will also need to be planned carefully, taking into account task complexity and corpus size.
5.4. Application in Information Science and Cyber Security Research
While the hybrid workflows outlined above are broadly applicable across disciplines, AI-assisted content analysis is argued to hold promise for researchers in Information Science and Cyber Security, where the scale, complexity, and velocity of textual data demand innovative methodological solutions. Both disciplines frequently involve working with large, fast-moving, and heterogeneous textual corpora, including technical reports, threat intelligence feeds, logs, social media discussions, and policy documents [101]. Traditional, fully manual CA is often infeasible in these contexts due to the scale and velocity of the data. By leveraging large language models (LLMs), researchers can expedite the coding of massive datasets, identify emergent patterns, and even simulate multiple interpretive frames, thereby supporting richer and more timely analyses [86]. For Information Science, AI-assisted CA can enhance understanding of how information is produced, shared, and trusted across digital networks, while in Cyber Security it provides tools to monitor online threat landscapes, detect persuasive techniques in malicious communication (e.g., phishing, disinformation), and analyse the diffusion of security-relevant information [15,102,103].
5.5. Limitations
There are notable limitations that also require critical reflection. Most prominently, the hallucination phenomenon—where LLMs generate fluent but factually incorrect or unsupported content—poses acute risks in these domains [99]. In Cyber Security, hallucinations could manifest as fabricated technical indicators of compromise, misattributed attack vectors, or non-existent vulnerabilities, leading to flawed or even harmful intelligence reporting. In Information Science, hallucinations may distort analyses of trust, misinformation, or communication flows by producing spurious themes or invented references. Such errors can undermine both reliability—if repeated runs generate inconsistent categories—and validity, when fabricated content is mistakenly treated as empirical evidence.
Additional challenges include the risk of algorithmic bias, particularly when LLMs are trained on data that underrepresents technical jargon, non-Western threat actors, or niche information ecosystems, thereby skewing the resulting codes or themes [104]. The opacity of LLM decision-making also raises concerns for transparency and replicability, key principles of scientific inquiry [104]. Finally, over-reliance on AI tools may encourage researchers to under-engage with the interpretive and theoretical dimensions of CA, which are essential for contextualising findings in socio-technical settings.
At present, given the limitations of AI-assisted CA for highly nuanced constructs with low inter-rater reliability, human content analysis is still preferable. However, that is likely to change in the near future as AI continues to develop. Additionally, where the data raise legal or safety concerns, human coders may be preferred. As a further note, there are nuances between the different types of LLMs. Whilst a discussion of these differences is beyond the scope of this paper, future reviews and AI-assisted content analyses may consider these differences.
6. Practical Example
For demonstration purposes, a small deductive AI-assisted CA was run on a subset of a database of disinformation and genuine news posts circulated during the COVID-19 pandemic [105] that had previously been coded in a thematic analysis of persuasive themes informed by the Elaboration Likelihood Model (ELM) [106]. ChatGPT 5.0 was used to code two of the themes previously coded by two researchers. The data comprised anonymised Facebook posts.
The first theme was ‘emotional appeals’. ChatGPT was prompted with the exact definition that the researchers used to code for this theme: “An approach used to affect the emotions of others (for example, but not excluding other emotional appeals, fear, sadness, joy, humour, love, beauty, security)”. It was told to search for phrases or words. ChatGPT was also asked to report its confidence in each decision, categorised as low, medium, or high, and to provide text extracts to support its answers. In its first iteration, it matched 62.5% of the researchers’ coding, yielding a Cohen’s Kappa of 0.45, which is moderately reliable. The posts in which it failed to identify the theme were all flagged with low confidence. When these were compared with the researchers’ original coding, it was found that ChatGPT had missed emotions associated with anxiety, stress and depression. It was then asked to recode, this time also coding text that mentioned anxiety, stress and depression. On this iteration it reached 95% agreement, yielding a Cohen’s Kappa of 0.90, which is almost perfect inter-rater reliability. The one post that ChatGPT identified as containing the theme, but which the coders had not, was re-examined, and it was found that the coders had missed this text. The coded text was also compared with the extracts identified by the coders, confirming that ChatGPT and the coders were considering the same text extracts.
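For illustration, the coding instruction might be assembled along the following lines; this is a reconstruction from the details reported above, not the authors’ verbatim prompt.

```python
# A reconstruction of the kind of instruction given to ChatGPT for the
# 'emotional appeals' theme, including the refinement added after the first
# iteration. Illustrative only, not the verbatim prompt used in the study.
PROMPT_TEMPLATE = """You are coding anonymised Facebook posts for the theme 'emotional appeals',
defined as: "An approach used to affect the emotions of others (for example,
but not excluding other emotional appeals, fear, sadness, joy, humour, love,
beauty, security)". Search for relevant phrases or words. Also code text that
mentions anxiety, stress, or depression.

For each post, report:
1. Whether the theme is present (yes/no).
2. Your confidence (low/medium/high).
3. The exact text extracts supporting your decision.

Post: {post}"""

print(PROMPT_TEMPLATE.format(post="Stay strong - we will get through this together."))
```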
The second theme was repetition. The definition this time was “repeated phrases (can be verbatim or similar)”. ChatGPT was also prompted to exclude generic terms like ‘COVID’, ‘vaccine’, ‘United States’, or trivial function phrases. There were two discrepancies: ChatGPT coded two posts as containing repetition that the coders had not. A closer inspection revealed that in one case ChatGPT had flagged a trivial phrase (‘in other words’), but in the other it had identified a relevant repeated phrase (‘1151 claims’) that the coders had missed.
This example demonstrates the importance of keeping the ‘human-in-the-loop’ with AI-assisted CA, as well as the utility of this tool (given that it identified themes missed by the coders). It may, therefore, serve to improve reliability. Notably, this was a small dataset with two relatively uncomplicated themes. Nonetheless, it highlights the utility of this approach for content analysis coding.
7. Conclusions
This paper has advanced the case for adopting a hybrid approach to AI-assisted content analysis (CA), with particular application in the fields of Information Science and Cyber Security. Compared to disciplines such as the social sciences, researchers in these areas are often less familiar with CA as a methodological tool. Nonetheless, the approach offers considerable potential to address a wide range of research questions, from the analysis of information flows to the study of online harms and cyber threat communications.
At the same time, concerns regarding validity and reliability must be taken seriously—even, and perhaps especially, in these technically oriented disciplines. To address this, this paper presented a framework for conducting AI-assisted CA that places human researchers firmly in the driver’s seat. The model sets out a hybrid workflow process, with clearly defined steps to guide researchers in conducting research that is valid, reliable, replicable, transparent, and ethical. Particular attention was given to issues of Responsible AI, including the risks of over-reliance on automated tools, which may discourage researchers from engaging with the interpretive and theoretical dimensions of CA. The discussion also underscored the problem of hallucination, whereby large language models generate plausible but nonfactual content, posing risks to both the reliability and validity of findings. Caveats were also raised regarding the use of AI-assisted CA, particularly in cases where complex themes, such as irony and sarcasm, necessitate a manual approach.
Future methodological work should prioritise benchmarking LLM agreement on complex constructs, developing robust approaches for multi-coder reliability, and establishing transparent protocols for managing version drift. By foregrounding these challenges, this paper positions itself not as a fixed recipe but as a foundation for a broader research agenda that advances the rigour and reproducibility of AI-assisted content analysis. The proposed framework should not be viewed as a definitive solution but rather as a foundation that will evolve alongside technological advances. As AI models improve, workflows for integrating them into CA may need to be redeveloped and refined. To preserve validity and integrity, future adaptations must remain anchored in Responsible AI practices. These include rigorous reliability testing, transparent documentation, triangulation with human coding, and the critical evaluation of outputs against existing theory and literature. By embedding such practices, researchers in Information Science and Cyber Security can leverage the advantages of AI-assisted CA while safeguarding the standards of robust and ethical scholarship.
No new data were created or analysed in this study. Data sharing is not applicable to this article.
During the preparation of this manuscript/study, the author(s) used ChatGPT 5.0 for the purposes of developing the icon graphics in Tables 5 and 6.
The author declares no conflicts of interest.
Table 1. Similarities and differences between content, thematic and discourse analyses.
| Dimension | Content Analysis | Thematic Analysis | Discourse Analysis |
|---|---|---|---|
| Epistemological orientation | Positivist/post-positivist. | Flexible—positivist/constructionist/reflexive. | Constructionist/post-modernist/critical. |
| Focus of Analysis | What is present in the text, frequency and distribution of categories, manifest and latent features. | Patterns of meaning—identifying, developing, and interpreting themes across the dataset. | How language works—discursive practices, rhetorical strategies, ideological functions. |
| Coding process | Structured codebook with explicit rules; intercoder reliability checks. | Iterative coding and theme generation; phases include familiarisation, coding, theme development and refinement. | Close reading of extracts; interpretative identification of discursive repertories; metaphors and strategies. |
| Reliability and Validity | Intercoder reliability measured by Krippendorff’s α or Cohen’s Kappa; Validity addressed through clear coding scheme that measure constructs and includes criterion checks. | Reliability conceptualised as transparency and reflexivity, can use interrater reliability or member checking and peer debriefing; requires an audit trail. | Reliability achieved through rigorous, systematic methods, clear research questions and intersubjectivity shared by other researchers; validity concerns include the accuracy of the reflection. |
| Outputs | Quantitative summaries; descriptive statistics; cross-tabulations; visualisations of patterns. | Thematic maps, rich narrative accounts of themes, supported by illustrative extracts; tabulations; visualisations of patterns. | Critical interpretations of discourse practices; accounts of ideology, power, and identity construction. |
| Strengths | Scalability, replicability, ability to combine with statistical analysis. | Flexibility, accessibility, capacity to capture patterned meanings. | Deep contextual insight; capacity to interrogate power, identity and ideology. |
| Limitations | Can overlook nuance; risk of reductionism, especially if categories are poorly defined. | Risk of superficiality if rigour is not applied; themes may reflect researcher bias. | Limited replicability; findings are typically not generalisable. |
Table 2. Examples of Content Analysis in Information Science and Cyber Security Studies.
| Study | Materials | Methods | Key Findings |
|---|---|---|---|
| Trends in information behaviour research 1999–2008: A content analysis [ | 749 articles. | CA on the human information behaviour literature using the search terms ‘information needs’ and ‘information uses’. | Over time, scholarly researchers increased and practitioners decreased. |
| Content analysis of cyber insurance policies: how do carriers price cyber risk? [ | 235 documents. | CA on the losses covered by cyber insurance policies and those excluded; the questions carriers pose to applicants in order to assess risk; and how cyber insurance premiums are determined. | The most important firm characteristics used to compute insurance premiums were the firm’s asset value base rate, rather than specific technology or governance controls. |
| What security features and crime prevention advice is communicated in consumer IoT device manuals and support pages? [ | Manuals and associated support pages for 270 consumer IoT devices produced by 220 different manufacturers. | CA on examined security features. | Manufacturers do not provide enough information about the security features of their devices. |
| Twenty-five years of cyber threats in the news: A study of Swedish newspaper coverage (1995–2019) [ | 1269 newspaper articles. | CA examined threats along several dimensions: modality, ambiguous themes, how threat has changed over time and event orientation. | Swedish papers cover multiple threats, hacking has multiple meanings, coverage has changed over time. |
| An investigation of the impact of data breach severity on the readability of mandatory data breach notification letters: Evidence from U.S. firms [ | 512 data breach incidents from 281 U.S. firms across 2012–2015. | CA examined data breach severity attributes and readability measures. | Data breach severity has a positive impact on reading complexity, and a negative impact on numerical terms. |
| Scamming higher ed: An analysis of phishing content and trends [ | 2300 phishing emails from 2010 to 2023. | CA examined persuasive and emotional appeals, topics, linguistic features and metadata. | Topics in phishing emails have shifted over time; emails persuaded using sources of authority, scarcity and fear appeals. |
Table 3. Steps involved in carrying out a CA.
| Step | Description |
|---|---|
| Define research questions | Specify focused questions that the analysis will address. |
| Select Content | Identify the corpus and choose an appropriate sampling strategy. |
| Select sample of content | Draw a sample from the corpus according to the sampling strategy. |
| Determine units of analysis | Decide the textual unit (e.g., word, sentence, paragraph, document). |
| Develop coding scheme | Construct clear, mutually exclusive and exhaustive categories (deductive and/or inductive). |
| Train coders and code | Provide coder training; apply coding rules systematically and consistently. |
| Assess reliability | Evaluate intercoder agreement (e.g., Cohen’s κ) and refine the scheme if needed. |
| Analyse coded data | Conduct descriptive/quantitative summaries and/or qualitative pattern interpretation. |
| Interpret and report | Link results to questions and theory, discuss implications, limitations, and transparency of procedures. |
Table 4. Examples of studies employing AI-assisted content analysis.
| Study | Materials | AI Role in CA | Findings & Conclusions | Type AI |
|---|---|---|---|---|
| Beyond manual media coding: Evaluating large language models and news agents for news content analysis [ | 200 news articles on U.S. tariff policies. | 7 LLMs assessed under a unified zero-shot prompt. | Strong reliability. Structured prompting and agentic reasoning improved accuracy. Human-in-the-loop validation may be essential to ensure reliability. | Not stated |
| Comparing large language models and human annotators in latent content analysis of sentiment, political leaning, emotional intensity and sarcasm [ | 100 curated textual items. | 33 human annotators compared with multiple LLMs. | Both humans and most LLMs exhibit high inter-rater reliability in sentiment analysis and political leaning assessments. LLMs demonstrate higher reliability than humans. Both groups struggled with sarcasm detection. | GPT-4, Gemini, Llama 3.1-70B and Mixtral 8x7B |
| Using large language models for narrative analysis: A novel application of generative AI [ | 138 short stories written by young people about social media, identity formation, and food choices. | Analysed by human researchers and two different LLMs (Claude and GPT-o1). | LLMs were fast and comparable to human researchers. LLMs provided additional insights that enhanced the analysis. | ChatGPT (version not stated) |
| LLM-Assisted Content Analysis: Using large language models to support deductive coding [ | 4 publicly available datasets. | Compared GPT-3.5 with human researchers. | GPT-3.5 can perform deductive coding at levels similar to humans and can help refine prompts for deductive coding. | GPT-3.5 |
| LLM-Assisted qualitative data analysis: Security and privacy concerns in gamified workforce studies [ | 23 interview transcripts. | Compared LLaMA, Gemma and Phi. | LLaMA excelled in detailed security insights. Gemma focused on compliance. Phi focused on ethical transparency. Useful as a preliminary step to enhance the efficiency of more traditional qualitative analysis methods. | LLaMA, Gemma, and Phi |
| Coding latent concepts: A human and LLM-coordinated content analysis procedure [ | 1000 public comments. | Compared human researchers with GPT-4o and GPT-3.5-turbo-1106. | Fine-tuned GPT-3.5-turbo with smaller datasets can surpass GPT-4o’s performance. | GPT-3.5 and GPT-4o |
Table 5. Deductive approach—AI-Assisted Content Analysis.
Step 1: Define research questions
Step 2: Select content
Step 3: Sample
Step 4: Determine units of analysis
Step 5: Develop coding scheme and codebook
Step 6: Train coders and code; verification playbook
Step 7: Assess reliability; verification playbook
Step 8: Analyse coded data
Step 9: Interpret and report
Table 6. Inductive approach—AI-Assisted Content Analysis.
Step 1: Define research questions
Step 2: Select content
Step 3: Sample
Step 4: Determine units of analysis
Step 5: Develop coding scheme and codebook
Step 6: Train coders and code
Step 7: Assess reliability
Step 8: Analyse coded data
Step 9: Interpret and report
References
1. Jiayimei, W.; Tao, N.; Wei-Bin, L.; Qingchuan, Z. A Contemporary Survey of Large Language Model Assisted Program Analysis. Trans. Artif. Intell.; 2025; 1, pp. 105-129. [DOI: https://dx.doi.org/10.53941/tai.2025.100006]
2. Kharma, M.; Choi, S.; AlKhanafseh, M.; Mohaisen, D. Security and Quality in LLM-Generated Code: A Multi-Language, Multi-Model Analysis. arXiv; 2025; arXiv: 2502.01853
3. Du, X.; Du, M.; Zhou, Z.; Bai, Y. Facilitator or hindrance? The impact of AI on university students’ higher-order thinking skills in complex problem solving. Int. J. Educ. Technol. High. Educ.; 2025; 22, 39. [DOI: https://dx.doi.org/10.1186/s41239-025-00534-0]
4. Jie, A.L.X.; Kamrozzaman, N.A. The Challenges of Higher Education Students Face in Using Artificial Intelligence (AI) against Their Learning Experiences. Open J. Soc. Sci.; 2024; 12, pp. 362-387. [DOI: https://dx.doi.org/10.4236/jss.2024.1210025]
5. Ergene, O.; Ergene, B.C. AI ChatBots’ solutions to mathematical problems in interactive e-textbooks: Affordances and constraints from the eyes of students and teachers. Educ. Inf. Technol.; 2025; 30, pp. 509-545. [DOI: https://dx.doi.org/10.1007/s10639-024-13121-z]
6. Sijing, L.; Lan, W. Artificial Intelligence Education Ethical Problems and Solutions. Proceedings of the 2018 13th International Conference on Computer Science & Education (ICCSE); Colombo, Sri Lanka, 8–11 August 2018; pp. 1-5. [DOI: https://dx.doi.org/10.1109/ICCSE.2018.8468773]
7. Chiu, T.K.F.; Moorhouse, B.L.; Chai, C.S.; Ismailov, M. Teacher support and student motivation to learn with Artificial Intelligence (AI) based chatbot. Interact. Learn. Environ.; 2024; 32, pp. 3240-3256. [DOI: https://dx.doi.org/10.1080/10494820.2023.2172044]
8. Kim, J.; Lee, H.; Cho, Y.H. Learning design to support student-AI collaboration: Perspectives of leading teachers for AI in education. Educ. Inf. Technol.; 2022; 27, pp. 6069-6104. [DOI: https://dx.doi.org/10.1007/s10639-021-10831-6]
9. Guo, K.; Zhang, E.D.; Li, D.; Yu, S. Using AI-supported peer review to enhance feedback literacy: An investigation of students’ revision of feedback on peers’ essays. Br. J. Educ. Technol.; 2025; 56, pp. 1612-1639. [DOI: https://dx.doi.org/10.1111/bjet.13540]
10. Guo, K.; Pan, M.; Li, Y.; Lai, C. Effects of an AI-supported approach to peer feedback on university EFL students’ feedback quality and writing ability. Internet High. Educ.; 2024; 63, 100962. [DOI: https://dx.doi.org/10.1016/j.iheduc.2024.100962]
11. Whitfield, S.; Hofmann, M.A. Elicit: AI literature review research assistant. Public Serv. Q.; 2023; 19, pp. 201-207. [DOI: https://dx.doi.org/10.1080/15228959.2023.2224125]
12. Al-Zahrani, A.M. The impact of generative AI tools on researchers and research: Implications for academia in higher education. Innov. Educ. Teach. Int.; 2024; 61, pp. 1029-1043. [DOI: https://dx.doi.org/10.1080/14703297.2023.2271445]
13. Khalifa, M.; Albadawy, M. Using artificial intelligence in academic writing and research: An essential productivity tool. Comput. Methods Programs Biomed. Update; 2024; 5, 100145. [DOI: https://dx.doi.org/10.1016/j.cmpbup.2024.100145]
14. Whitty, M.T. Leveraging Behaviour Sequence Analysis (BSA) in Information Systems: Examining malicious insider threat as an example. Proceedings of the International Conferences IADIS Information Systems 2025 and e-Society 2025; Madeira Island, Portugal, 1–3 March 2025; pp. 517-521.
15. Whitty, M.T. How the voice lost its voice: Applying the Dual Processing Theory to explain how mis/disinformation can deceive and persuade voting decision-making. J. Inf. Warf.; 2025; 24, pp. 93-108.
16. Whitty, M.T.; Abdulgalimov, D.; Oliver, P.; Ruddy, C.; Seguin, J.; Young, G. Inside the Threat Matrix: Using Hybrid Computer Simulations to Educate Adults on Malicious Insider Threat and Technology Misuse. HCI for Cybersecurity, Privacy and Trust; Moallem, A. Springer Nature: Cham, Switzerland, 2025; pp. 298-312. [DOI: https://dx.doi.org/10.1007/978-3-031-92833-8_18]
17. Whitty, M.T.; Ruddy, C.; Keatley, D.; Butavicius, M.; Grobler, M. The prince of insiders: A multiple pathway approach to understanding IP theft insider attacks. Inf. Comput. Secur.; 2024; 32, pp. 509-522. [DOI: https://dx.doi.org/10.1108/ICS-11-2023-0210]
18. Valizadeh, A.; Moassefi, M.; Nakhostin-Ansari, A.; Hosseini Asl, S.H.; Saghab Torbati, M.; Aghajani, R.; Maleki Ghorbani, Z.; Faghani, S. Abstract screening using the automated tool Rayyan: Results of effectiveness in three diagnostic test accuracy systematic reviews. BMC Med. Res. Methodol.; 2022; 22, 160. [DOI: https://dx.doi.org/10.1186/s12874-022-01631-8] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35655155]
19. Gill, M.; Ng, J.; Szydlowski, N.; Fusco, N.; Ruiz, K. MSR95 Artificial Intelligence for Targeted Literature Review Screening within the Rayyan Platform. Value Health; 2024; 27, S278. [DOI: https://dx.doi.org/10.1016/j.jval.2024.03.1528]
20. Noy, S.; Zhang, W. Experimental evidence on the productivity effects of generative artificial intelligence. Science; 2023; 381, pp. 187-192. [DOI: https://dx.doi.org/10.1126/science.adh2586]
21. Pack, A.; Barrett, A.; Escalante, J. Large language models and automated essay scoring of English language learner writing: Insights into validity and reliability. Comput. Educ. Artif. Intell.; 2024; 6, 100234. [DOI: https://dx.doi.org/10.1016/j.caeai.2024.100234]
22. Neuendorf, K.A. The Content Analysis Guidebook; Sage: Los Angeles, CA, USA, 2017.
23. Van Atteveldt, W.; Trilling, D.; Calderon, C.A. Computational Analysis of Communication; John Wiley & Sons: Hoboken, NJ, USA, 2022.
24. Mingers, J. Combining IS research methods: Towards a pluralist methodology. Inf. Syst. Res.; 2001; 12, pp. 240-259. [DOI: https://dx.doi.org/10.1287/isre.12.3.240.9709]
25. Nicmanis, M.; Spurrier, H. Getting Started with Artificial Intelligence Assisted Qualitative Analysis: An Introductory Guide to Qualitative Research Approaches with Exploratory Examples from Reflexive Content Analysis. Int. J. Qual. Methods; 2025; 24, 16094069251354863. [DOI: https://dx.doi.org/10.1177/16094069251354863]
26. Krippendorff, K. Content Analysis: An Introduction to Its Methodology; Sage Publications: Los Angeles, CA, USA, 2018.
27. Gebru, T.; Morgenstern, J.; Vecchione, B.; Vaughan, J.W.; Wallach, H.; Daumé, H., III; Crawford, K. Datasheets for datasets. Commun. ACM; 2021; 64, pp. 86-92. [DOI: https://dx.doi.org/10.1145/3458723]
28. Whitty, M. Possible Selves: An Exploration of the Utility of a Narrative Approach. Identity; 2002; 2, pp. 211-228. [DOI: https://dx.doi.org/10.1207/S1532706XID0203_02]
29. Badzinski, D.M.; Woods, R.H.; Nelson, C.M. Content analysis. The Routledge Handbook of Research Methods in the Study of Religion; Routledge: Oxfordshire, UK, 2021; pp. 180-193.
30. Baker, L.M.; King, A.E. Let’s get theoretical: A quantitative content analysis of theories and models used in the Journal of Applied Communications. J. Appl. Commun.; 2016; 100, 5. [DOI: https://dx.doi.org/10.4148/1051-0834.1021]
31. Braun, V.; Clarke, V. Supporting best practice in reflexive thematic analysis reporting in Palliative Medicine: A review of published research and introduction to the Reflexive Thematic Analysis Reporting Guidelines (RTARG). Palliat. Med.; 2024; 38, pp. 608-616. [DOI: https://dx.doi.org/10.1177/02692163241234800] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38469804]
32. Potter, J.; Wetherell, M. Discourse and social psychology. The Discourse Studies Reader; John Benjamins Publishing Company: Amsterdam, The Netherlands, 2014; pp. 244-255.
33. Wong, M.-F.; Guo, S.; Hang, C.-N.; Ho, S.-W.; Tan, C.-W. Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review. Entropy; 2023; 25, 888. [DOI: https://dx.doi.org/10.3390/e25060888] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37372232]
34. Hassani, H.; Silva, E.S. The Role of ChatGPT in Data Science: How AI-Assisted Conversational Interfaces Are Revolutionizing the Field. Big Data Cogn. Comput.; 2023; 7, 62. [DOI: https://dx.doi.org/10.3390/bdcc7020062]
35. Fazeli, S.; Sabetti, J.; Ferrari, M. Performing qualitative content analysis of video data in social sciences and medicine: The visual-verbal video analysis method. Int. J. Qual. Methods; 2023; 22, 16094069231185452. [DOI: https://dx.doi.org/10.1177/16094069231185452]
36. McCashin, D.; Murphy, C.M. Using TikTok for public and youth mental health—A systematic review and content analysis. Clin. Child Psychol. Psychiatry; 2023; 28, pp. 279-306. [DOI: https://dx.doi.org/10.1177/13591045221106608]
37. Benjamin, S.; Bottone, E.; Lee, M. Beyond accessibility: Exploring the representation of people with disabilities in tourism promotional materials. Justice and Tourism; Routledge: Oxfordshire, UK, 2021; pp. 153-171. [DOI: https://dx.doi.org/10.1080/09669582.2020.1755295]
38. Drisko, J.W.; Maschi, T. Content Analysis; Oxford University Press: Oxford, UK, 2016.
39. Riffe, D.; Lacy, S.; Watson, B.R.; Lovejoy, J. Analyzing Media Messages: Using Quantitative Content Analysis in Research; Routledge: Oxfordshire, UK, 2023.
40. Chu, H. Research methods in library and information science: A content analysis. Libr. Inf. Sci. Res.; 2015; 37, pp. 36-41. [DOI: https://dx.doi.org/10.1016/j.lisr.2014.09.003]
41. Choi, W.; Zhang, Y.; Stvilia, B. Exploring applications and user experience with generative AI tools: A content analysis of reddit posts on ChatGPT. Proc. Assoc. Inf. Sci. Technol.; 2023; 60, pp. 543-546. [DOI: https://dx.doi.org/10.1002/pra2.823]
42. Eriksson, J.; Giacomello, G. International relations, cybersecurity, and content analysis: A constructivist approach. The Global Politics of Science and Technology-Vol. 2: Perspectives, Cases and Methods; Springer: Berlin/Heidelberg, Germany, 2014; pp. 205-219. [DOI: https://dx.doi.org/10.1007/978-3-642-55010-2_12]
43. Walker, J.T. Mixed Method Content Analysis of the Cybersecurity Corpus of the US National Security System; Capitol Technology University: Laurel, MD, USA, 2024.
44. Nicmanis, M. Reflexive Content Analysis: An Approach to Qualitative Data Analysis, Reduction, and Description. Int. J. Qual. Methods; 2024; 23, 16094069241236603. [DOI: https://dx.doi.org/10.1177/16094069241236603]
45. Vears, D.F.; Gillam, L. Inductive content analysis: A guide for beginning qualitative researchers. Focus Health Prof. Educ. Multi-Prof. J.; 2022; 23, pp. 111-127. [DOI: https://dx.doi.org/10.11157/fohpe.v23i1.544]
46. Russman, U.; Flick, U. Designing Qualitative Research for Working with Facebook Data. The SAGE Handbook of Qualitative Research Design; Sage: Los Angeles, CA, USA, 2022; pp. 851-885.
47. Harwood, T.G.; Garry, T. An overview of content analysis. Mark. Rev.; 2003; 3, pp. 479-498. [DOI: https://dx.doi.org/10.1362/146934703771910080]
48. Bretl, D.J.; Cantor, J. The portrayal of men and women in US television commercials: A recent content analysis and trends over 15 years. Sex Roles; 1988; 18, pp. 595-609. [DOI: https://dx.doi.org/10.1007/BF00287963]
49. Stemler, S. An overview of content analysis. Pract. Assess. Res. Eval.; 2000; 7, 17.
50. Moretti, F.; van Vliet, L.; Bensing, J.; Deledda, G.; Mazzi, M.; Rimondini, M.; Zimmermann, C.; Fletcher, I. A standardized approach to qualitative content analysis of focus group discussions from different countries. Patient Educ. Couns.; 2011; 82, pp. 420-428. [DOI: https://dx.doi.org/10.1016/j.pec.2011.01.005]
51. De Wever, B.; Schellens, T.; Valcke, M.; Van Keer, H. Content analysis schemes to analyze transcripts of online asynchronous discussion groups: A review. Comput. Educ.; 2006; 46, pp. 6-28. [DOI: https://dx.doi.org/10.1016/j.compedu.2005.04.005]
52. Oleinik, A. Mixing quantitative and qualitative content analysis: Triangulation at work. Qual. Quant.; 2011; 45, pp. 859-873. [DOI: https://dx.doi.org/10.1007/s11135-010-9399-4]
53. Schreier, M. Qualitative Content Analysis in Practice; Sage Publications: Los Angeles, CA, USA, 2012.
54. Forman, J.; Damschroder, L. Qualitative Content Analysis. Empirical Methods for Bioethics: A Primer; Jacoby, L.; Siminoff, L.A. Emerald Group Publishing Limited: Leeds, UK, 2007; Volume 11, pp. 39-62.
55. Hsieh, H.-F.; Shannon, S.E. Three Approaches to Qualitative Content Analysis. Qual. Health Res.; 2005; 15, pp. 1277-1288. [DOI: https://dx.doi.org/10.1177/1049732305276687] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/16204405]
56. Mayring, P. Qualitative Content Analysis: Theoretical Background and Procedures. Approaches to Qualitative Research in Mathematics Education: Examples of Methodology and Methods; Bikner-Ahsbahs, A.; Knipping, C.; Presmeg, N. Springer: Amsterdam, The Netherlands, 2015; pp. 365-380.
57. Kleinheksel, A.J.; Rockich-Winston, N.; Tawfik, H.; Wyatt, T.R. Demystifying Content Analysis. Am. J. Pharm. Educ.; 2020; 84, 7113. [DOI: https://dx.doi.org/10.5688/ajpe7113]
58. Dooley, K.J. Using manifest content analysis in purchasing and supply management research. J. Purch. Supply Manag.; 2016; 22, pp. 244-246. [DOI: https://dx.doi.org/10.1016/j.pursup.2016.08.004]
59. Lee, J.-H.; Kim, Y.-G. A stage model of organizational knowledge management: A latent content analysis. Expert Syst. Appl.; 2001; 20, pp. 299-311. [DOI: https://dx.doi.org/10.1016/S0957-4174(01)00015-X]
60. Egberg Thyme, K.; Wiberg, B.; Lundman, B.; Graneheim, U.H. Qualitative content analysis in art psychotherapy research: Concepts, procedures, and measures to reveal the latent meaning in pictures and the words attached to the pictures. Arts Psychother.; 2013; 40, pp. 101-107. [DOI: https://dx.doi.org/10.1016/j.aip.2012.11.007]
61. Denzin, N.K. The new paradigm dialogs and qualitative inquiry. Int. J. Qual. Stud. Educ.; 2008; 21, pp. 315-325. [DOI: https://dx.doi.org/10.1080/09518390802136995]
62. Sławecki, B. Paradigms in Qualitative Research. Qualitative Methodologies in Organization Studies: Volume I: Theories and New Approaches; Ciesielska, M.; Jemielniak, D. Springer International Publishing: Berlin/Heidelberg, Germany, 2018; pp. 7-26.
63. Jaspal, R. Content analysis, thematic analysis and discourse analysis. Res. Methods Psychol.; 2020; 1, pp. 285-312.
64. Lytvyn, V.; Vysotska, V.; Peleshchak, I.; Basyuk, T.; Kovalchuk, V.; Kubinska, S.; Rusyn, B.; Pohreliuk, L.; Chyrun, L.; Salo, T. Identifying Textual Content Based on Thematic Analysis of Similar Texts in Big Data. Proceedings of the 2019 IEEE 14th International Conference on Computer Sciences and Information Technologies (CSIT); Lviv, Ukraine, 17–20 September 2019; Volume 2, pp. 84-91. [DOI: https://dx.doi.org/10.1109/STC-CSIT.2019.8929808]
65. Clarke, V.; Braun, V. Teaching thematic analysis: Overcoming challenges and developing strategies for effective learning. Psychologist; 2013; 26, pp. 120-123.
66. Braun, V.; Clarke, V. Reflecting on reflexive thematic analysis. Qual. Res. Sport Exerc. Health; 2019; 11, pp. 589-597. [DOI: https://dx.doi.org/10.1080/2159676X.2019.1628806]
67. Ekstrom, J.J.; Lunt, B.M.; Parrish, A.; Raj, R.K.; Sobiesk, E. Information Technology as a Cyber Science. Proceedings of the 18th Annual Conference on Information Technology Education; Rochester, NY, USA, 4–7 October 2017.
68. Kaur, J.; Ramkumar, K.R. The recent trends in cyber security: A review. J. King Saud Univ.-Comput. Inf. Sci.; 2022; 34, Pt B, pp. 5766-5781. [DOI: https://dx.doi.org/10.1016/j.jksuci.2021.01.018]
69. von Solms, R.; van Niekerk, J. From information security to cyber security. Comput. Secur.; 2013; 38, pp. 97-102. [DOI: https://dx.doi.org/10.1016/j.cose.2013.04.004]
70. Julien, H.; Pecoskie, J.; Reed, K. Trends in information behavior research, 1999–2008: A content analysis. Libr. Inf. Sci. Res.; 2011; 33, pp. 19-24. [DOI: https://dx.doi.org/10.1016/j.lisr.2010.07.014]
71. Romanosky, S.; Ablon, L.; Kuehn, A.; Jones, T. Content analysis of cyber insurance policies: How do carriers price cyber risk?. J. Cybersecur.; 2019; 5, tyz002. [DOI: https://dx.doi.org/10.1093/cybsec/tyz002]
72. Blythe, J.M.; Sombatruang, N.; Johnson, S.D. What security features and crime prevention advice is communicated in consumer IoT device manuals and support pages?. J. Cybersecur.; 2019; 5, tyz005. [DOI: https://dx.doi.org/10.1093/cybsec/tyz005]
73. Boholm, M. Twenty-five years of cyber threats in the news: A study of Swedish newspaper coverage (1995–2019). J. Cybersecur.; 2021; 7, tyab016. [DOI: https://dx.doi.org/10.1093/cybsec/tyab016]
74. Jackson, S.; Vanteeva, N.; Fearon, C. An investigation of the impact of data breach severity on the readability of mandatory data breach notification letters: Evidence from US firms. J. Assoc. Inf. Sci. Technol.; 2019; 70, pp. 1277-1289. [DOI: https://dx.doi.org/10.1002/asi.24188]
75. Morrow, E. Scamming higher ed: An analysis of phishing content and trends. Comput. Hum. Behav.; 2024; 158, 108274. [DOI: https://dx.doi.org/10.1016/j.chb.2024.108274]
76. Brown, S.A.; Upchurch, S.L.; Acton, G.J. A Framework for Developing a Coding Scheme for Meta-Analysis. West. J. Nurs. Res.; 2003; 25, pp. 205-222. [DOI: https://dx.doi.org/10.1177/0193945902250038] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/12666644]
77. Lombard, M.; Snyder-Duch, J.; Bracken, C.C. Content Analysis in Mass Communication: Assessment and Reporting of Intercoder Reliability. Hum. Commun. Res.; 2006; 28, pp. 587-604. [DOI: https://dx.doi.org/10.1111/j.1468-2958.2002.tb00826.x]
78. Bojić, L.; Zagovora, O.; Zelenkauskaite, A.; Vuković, V.; Čabarkapa, M.; Veseljević Jerković, S.; Jovančević, A. Comparing large language models and human annotators in latent content analysis of sentiment, political leaning, emotional intensity and sarcasm. Sci. Rep.; 2025; 15, 11477. [DOI: https://dx.doi.org/10.1038/s41598-025-96508-3]
79. Chew, R.; Bollenbacher, J.; Wenger, M.; Speer, J.; Kim, A. LLM-assisted content analysis: Using large language models to support deductive coding. arXiv; 2023; arXiv: 2306.14924
80. Morgan, D.L. Exploring the Use of Artificial Intelligence for Qualitative Data Analysis: The Case of ChatGPT. Int. J. Qual. Methods; 2023; 22, 16094069231211248. [DOI: https://dx.doi.org/10.1177/16094069231211248]
81. Fuller, K.A.; Morbitzer, K.A.; Zeeman, J.M.; Persky, A.M.; Savage, A.C.; McLaughlin, J.E. Exploring the use of ChatGPT to analyze student course evaluation comments. BMC Med. Educ.; 2024; 24, 423. [DOI: https://dx.doi.org/10.1186/s12909-024-05316-2]
82. Jenner, S.; Raidos, D.; Anderson, E.; Fleetwood, S.; Ainsworth, B.; Fox, K.; Kreppner, J.; Barker, M. Using large language models for narrative analysis: A novel application of generative AI. Methods Psychol.; 2025; 12, 100183. [DOI: https://dx.doi.org/10.1016/j.metip.2025.100183]
83. Bijker, R.; Merkouris, S.S.; Dowling, N.A.; Rodda, S.N. ChatGPT for Automated Qualitative Research: Content Analysis. J. Med. Internet Res.; 2024; 26, e59050. [DOI: https://dx.doi.org/10.2196/59050]
84. Theelen, H.; Vreuls, J.; Rutten, J. Doing Research with Help from ChatGPT: Promising Examples for Coding and Inter-Rater Reliability. Int. J. Technol. Educ.; 2024; 7, pp. 1-18. [DOI: https://dx.doi.org/10.46328/ijte.537]
85. Hitch, D. Artificial Intelligence Augmented Qualitative Analysis: The Way of the Future?. Qual. Health Res.; 2024; 34, pp. 595-606. [DOI: https://dx.doi.org/10.1177/10497323231217392]
86. Yan, L.; Echeverria, V.; Fernandez-Nieto, G.M.; Jin, Y.; Swiecki, Z.; Zhao, L.; Gašević, D.; Martinez-Maldonado, R. Human-AI collaboration in thematic analysis using ChatGPT: A user study and design recommendations. Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems; Honolulu, HI, USA, 11–16 May 2024; pp. 1-7.
87. Turobov, A.; Coyle, D.; Harding, V. Using ChatGPT for thematic analysis. arXiv; 2024; arXiv: 2405.08828
88. Doropoulos, S.; Karapalidou, E.; Charitidis, P.; Karakeva, S.; Vologiannidis, S. Beyond manual media coding: Evaluating large language models and agents for news content analysis. Appl. Sci.; 2025; 15, 8059. [DOI: https://dx.doi.org/10.3390/app15148059]
89. Adeseye, A.; Isoaho, J.; Mohammad, T. LLM-Assisted Qualitative Data Analysis: Security and Privacy Concerns in Gamified Workforce Studies. Procedia Comput. Sci.; 2025; 257, pp. 60-67. [DOI: https://dx.doi.org/10.1016/j.procs.2025.03.011]
90. Fan, J.; Ai, Y.; Liu, X.; Deng, Y.; Li, Y. Coding latent concepts: A human and LLM-coordinated content analysis procedure. Commun. Res. Rep.; 2024; 41, pp. 324-334. [DOI: https://dx.doi.org/10.1080/08824096.2024.2410263]
91. Brumen, B.; Göllner, S.; Tropmann-Frick, M. Aspects and views on responsible artificial intelligence. International Conference on Machine Learning, Optimization, and Data Science; Springer: Berlin/Heidelberg, Germany, 2022; pp. 384-398.
92. Akbarighatar, P. Operationalizing responsible AI principles through responsible AI capabilities. AI Ethics; 2025; 5, pp. 1787-1801. [DOI: https://dx.doi.org/10.1007/s43681-024-00524-4]
93. Buolamwini, J.; Gebru, T. Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of the Conference on Fairness, Accountability and Transparency; New York, NY, USA, 23–24 February 2018; pp. 77-91.
94. Richardson, R.; Schultz, J.M.; Crawford, K. Dirty data, bad predictions: How civil rights violations impact police data, predictive policing systems, and justice. NYUL Rev. Online; 2019; 94, pp. 15-55.
95. Nissenbaum, H. Privacy in context: Technology, policy, and the integrity of social life. Privacy in Context; Stanford University Press: Redwood, CA, USA, 2009.
96. Ahmadi, A. Unravelling the mysteries of hallucination in large language models: Strategies for precision in artificial intelligence language generation. Asian J. Comput. Sci. Technol.; 2024; 13, pp. 1-10. [DOI: https://dx.doi.org/10.70112/ajcst-2024.13.1.4144]
97. Huang, L.; Yu, W.; Ma, W.; Zhong, W.; Feng, Z.; Wang, H.; Chen, Q.; Peng, W.; Feng, X.; Qin, B. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. arXiv; 2023; arXiv: 2311.05232
98. Banerjee, S.; Agarwal, A.; Singla, S. LLMs will always hallucinate, and we need to live with this. arXiv; 2024; arXiv: 2409.05746
99. Ji, Z.; Lee, N.; Frieske, R.; Yu, T.; Su, D.; Xu, Y.; Ishii, E.; Bang, Y.J.; Madotto, A.; Fung, P. Survey of Hallucination in Natural Language Generation. ACM Comput. Surv.; 2023; 55, 248. [DOI: https://dx.doi.org/10.1145/3571730]
100. Williamson, S.M.; Prybutok, V. The Era of Artificial Intelligence Deception: Unraveling the Complexities of False Realities and Emerging Threats of Misinformation. Information; 2024; 15, 299. [DOI: https://dx.doi.org/10.3390/info15060299]
101. Clairoux-Trepanier, V.; Beauchamp, I.-M.; Ruellan, E.; Paquet-Clouston, M.; Paquette, S.-O.; Clay, E. The use of large language models (LLM) for cyber threat intelligence (CTI) in cybercrime forums. arXiv; 2024; arXiv: 2408.03354
102. Whitty, M.T.; Moustafa, N.; Grobler, M. Cybersecurity when working from home during COVID-19: Considering the human factors. J. Cybersecur.; 2024; 10, tyae001. [DOI: https://dx.doi.org/10.1093/cybsec/tyae001]
103. Whitty, M.T. Mass-Marketing Fraud: A Growing Concern. IEEE Secur. Priv.; 2015; 13, pp. 84-87. [DOI: https://dx.doi.org/10.1109/MSP.2015.85]
104. Jobin, A.; Ienca, M.; Vayena, E. The global landscape of AI ethics guidelines. Nat. Mach. Intell.; 2019; 1, pp. 389-399. [DOI: https://dx.doi.org/10.1038/s42256-019-0088-2]
105. Ruddy, C.; Whitty, M.; Doherty, S. Fake and True Pandemic News Database; Zenodo: Honolulu, HI, USA, 2025; [DOI: https://dx.doi.org/10.5281/zenodo.16603669]
106. Whitty, M.T.; Ruddy, C. COVID-19 lies and truths: Employing the Elaboration Likelihood Model (ELM) and Linguistic Inquiry and Word Count (LIWC) to gain insights into the persuasive techniques evident in disinformation (fake news). Comput. Hum. Behav. Rep.; 2025; 20, 100797. [DOI: https://dx.doi.org/10.1016/j.chbr.2025.100797]
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).