
Abstract

We are in a period of rapidly increasing machine intelligence. Reviewing this progress, we conclude that there is a clear possibility of machines performing a dominant fraction of economic activity within the next few decades (reaching “high-level machine intelligence” (HLMI) or “artificial general intelligence” (AGI)), which motivates a focus on safety.

We contribute to this effort in two parts. The first part focuses on “Normative NLP”: designing Natural Language Processing systems that follow certain norms. We first recognize that, with increasing capabilities, machines risk deceiving humans by pretending to be human (“deceptive anthropomorphism”). We present work on this topic published at prominent NLP venues. For the R-U-A-Robot dataset (2021), we collected over 2,500 phrasings of the intent “Are you a robot?”. We show that popular systems of the time often failed to confirm their non-human identity even when explicitly asked, and we develop machine learning classifiers and conduct a user study to improve on this. We additionally contribute the Robots-Dont-Cry dataset (2022), which studies implicit deceptive anthropomorphism. We collect judgments on over 900 dialogue turns from popular datasets of the time, showing that many are not viewed as possible for a machine to truthfully say. This work has since been used by other researchers to study anthropomorphism and robust NLP classifiers.
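To make the classifier contribution concrete, below is a minimal sketch of an “are you a robot?” intent detector. The tiny inline dataset and the TF-IDF + logistic regression baseline are illustrative assumptions, not the dissertation’s actual R-U-A-Robot data or model.

```python
# Minimal sketch of the kind of intent classifier the R-U-A-Robot work
# motivates: given a user utterance, decide whether it asks about the
# system's non-human identity. The examples and model choice here are
# hypothetical, standing in for the real 2,500+ phrasings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative examples: 1 = asks "are you a robot?", 0 = other intent.
texts = [
    "are you a robot",
    "am i talking to a real person",
    "is this an automated system",
    "are you human or a machine",
    "what's the weather like today",
    "can you help me book a flight",
    "i'd like to cancel my order",
    "tell me a joke",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TF-IDF features over unigrams and bigrams, plus a linear classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["hey, is this a bot i'm chatting with?"]))  # expect [1]
```

A deployed system would pair such a detector with a response policy that confirms non-human identity whenever the intent fires, addressing the failure mode the dataset documents.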

The second part connects Software Engineering research to AGI safety. We discuss traditional Software Engineering research problems and their connection to AGI safety (2023), then focus on two problems. First, we contribute techniques for estimating the confidence that code-generating models’ outputs are correct (2025); this work aims to help us know when to audit machine outputs. Second, in earlier work (2020), we study code summarization, characterizing the datasets of the time and helping to improve the rigor of its evaluation metrics. Faithful, high-quality summaries of complex machine output may also help manage a world in which machines produce vast amounts of such output.
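To illustrate one simple form such confidence estimation can take, the sketch below scores a generated code sample by its mean token log-probability, a generic and widely used proxy. This is not claimed to be the 2025 work’s actual method, and the per-token values are invented for illustration.

```python
# Illustrative confidence signal for generated code: the mean token
# log-probability of the sample, mapped to (0, 1]. Low scores can flag
# samples worth routing to a human audit.
import math

def mean_logprob_confidence(token_logprobs: list[float]) -> float:
    """Return exp(mean log-probability) of the generated tokens.

    Higher values mean the model was less 'surprised' by its own
    output; lower values suggest the output deserves closer review.
    """
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probs for two generated snippets.
confident_sample = [-0.05, -0.10, -0.02, -0.08]   # model was fairly sure
uncertain_sample = [-1.90, -0.40, -2.70, -1.10]   # model was hesitant

print(mean_logprob_confidence(confident_sample))  # ~0.94 -> likely fine
print(mean_logprob_confidence(uncertain_sample))  # ~0.22 -> audit this
```

Thresholding such a score to decide which outputs get human review matches the stated goal of knowing when to audit machine outputs.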

These contributions are presented in partial fulfillment of the UC Davis PhD requirements and add to the knowledge needed to build a better future in AI, NLP, and Software Engineering.

Details

Title
Progress in AI Safety via Normative NLP Research and Software Engineering Research
Number of pages
239
Publication year
2025
Degree date
2025
School code
0029
Source
DAI-B 87/4(E), Dissertation Abstracts International
ISBN
9798297645578
Committee member
Yu, Zhou; Chen, Muhao
University/institution
University of California, Davis
Department
Computer Science
University location
United States -- California
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32241389
ProQuest document ID
3263306025
Document URL
https://www.proquest.com/dissertations-theses/progress-ai-safety-via-normative-nlp-research/docview/3263306025/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic