This paper explores the rise of voice-based social media as a pivotal transformation in digital communication, situated within the broader era of chatbots and voice AI. Platforms such as Clubhouse, X Spaces, Discord, and similar services foreground vocal interaction, reshaping norms of participation, identity construction, and platform governance. This shift from text-centered communication to hybrid digital orality presents new sociological and methodological challenges and calls for the development of voice-centered analytical approaches. In response, the paper introduces a multidimensional methodological framework for analyzing voice-based social media platforms in the context of surveillance capitalism and AI-driven conversational technologies. We propose a high-level reference architecture for a machine-learning-for-social-science pipeline that integrates digital methods techniques, automatic speech recognition (ASR) models, and natural language processing (NLP) models within a reflexive and ethically grounded framework. To illustrate its potential, we outline possible stages of a proof-of-concept (PoC) machine learning pipeline for audio analysis, demonstrated through a conceptual use case involving the collection, ingestion, and analysis of X Spaces. While not a comprehensive empirical study, this pipeline proposal highlights technical and ethical challenges in voice analysis. By situating the voice as a central axis of online sociality and examining it in relation to AI-driven conversational technologies within an era of post-orality, the study contributes to ongoing debates on surveillance capitalism, platform affordances, and the evolving dynamics of digital interaction. In this rapidly evolving landscape, we urgently need a robust vocal methodology to ensure that voice is not just processed but understood.
Details
Surveillance;
Communication;
Conversation;
Machine learning;
Methodological problems;
User generated content;
Models;
Ethical standards;
Capitalism;
Neoliberalism;
Social sciences;
Sociology;
Artificial intelligence;
Speech recognition;
Ingestion;
Social media;
Mass media effects;
Social discrimination learning;
Algorithms;
Subjectivity;
Personal information;
User behavior;
Social networks;
Governance;
Learning algorithms;
Methodology;
Voice recognition;
Natural language processing;
Computer platforms;
Data collection;
Automatic speech recognition;
Concept learning;
Digital technology;
Digital media;
Ethical dilemmas;
Research methodology;
Frame analysis;
Mass media;
Transformation;
Voice analysis
