
Abstract

Instruction finetuning is a popular paradigm for aligning large language models (LLMs) with human intent. Despite its popularity, this idea is less explored for aligning existing foundation models with scientific disciplines, concepts, and goals. In this work, we present SciTune, a tuning framework that improves the ability of LLMs to follow scientific multimodal instructions. To test our methodology, we use a human-generated scientific instruction tuning dataset and train LLaMA-SciTune, a large multimodal model that connects a vision encoder and an LLM for science-focused visual and language understanding. In comparison to models finetuned with machine-generated data only, LLaMA-SciTune surpasses human performance on average and in many sub-categories on the ScienceQA benchmark.
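The record does not include implementation details, but the abstract's description of "connecting a vision encoder and an LLM" is commonly realized with a learned projection that maps visual features into the language model's embedding space. The following is a minimal sketch of that idea, assuming a linear connector and illustrative feature dimensions (1024 for the vision encoder, 4096 for the LLM); it is not the paper's actual architecture or code.

```python
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Projects vision-encoder features into an LLM's embedding space (illustrative)."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # Assumed: a single linear projection between the two feature spaces.
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, image_features: torch.Tensor, text_embeddings: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim) from a vision encoder
        # text_embeddings: (batch, seq_len, llm_dim) from the LLM's token embedding layer
        visual_tokens = self.proj(image_features)
        # Prepend projected visual tokens so the LLM attends over both modalities.
        return torch.cat([visual_tokens, text_embeddings], dim=1)

# Toy usage with random tensors standing in for real encoder outputs.
connector = VisionLanguageConnector()
img = torch.randn(2, 256, 1024)   # e.g. patch features from a ViT-style encoder
txt = torch.randn(2, 32, 4096)    # e.g. token embeddings from the LLM
fused = connector(img, txt)
print(fused.shape)                # torch.Size([2, 288, 4096])
```

During multimodal instruction tuning, sequences fused in this way would be fed to the LLM, which is trained to follow the scientific instruction conditioned on the image; the exact training recipe is described in the paper itself, not in this record.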

Details

Title
SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions
Publication title
arXiv.org; Ithaca
Publication year
2023
Publication date
Jul 3, 2023
Section
Computer Science
Publisher
Cornell University Library, arXiv.org
Source
arXiv.org
Place of publication
Ithaca
Country of publication
United States
University/institution
Cornell University Library arXiv.org
e-ISSN
2331-8422
Source type
Working Paper
Language of publication
English
Document type
Working Paper
Publication history
Online publication date
2023-07-04
Milestone dates
2023-07-03 (Submission v1)
First posting date
04 Jul 2023
ProQuest document ID
2832891468
Document URL
https://www.proquest.com/working-papers/scitune-aligning-large-language-models-with/docview/2832891468/se-2?accountid=208611
Copyright
© 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2023-07-05
Database
ProQuest One Academic