Deceptive Tendencies of Language Models | Olli Järviniemi | EAGxNordics 2024

Description

AI systems deceiving humans, particularly about their alignment, pose significant challenges for ensuring their safety. Olli Järviniemi talks about his recent research on the deceptive tendencies of language models: will LLMs take deceptive actions without external instruction or pressure to do so? The basic approach is to create a realistic simulation environment in which opportunities for deception arise naturally. The talk focuses on the experimental setup and results, with some discussion of future research directions. Watch on YouTube: https://www.youtube.com/watch?v=ynF8QuyO_9Q

More Episodes

Wild Animal Welfare Through the Lens of Population Ethics | Tim Campbell | EAGxNordics 2024

According to one recent estimate, there are one sextillion animals on Earth that may be sentient, most living in the wild. Yet wild animal welfare is neglected by intergovernmental bodies such as the IPCC. This talk discusses the importance and difficulty of developing a framework for evaluating...

Published 10/24/24

EAG Talks

EA Forum AMA: Darren Margolias, BeastPhilanthropy

Darren Margolias, Executive Director of @BeastPhilanthropy, answers questions from EA Forum users, posted here: https://forum.effectivealtruism.org/posts/7QfKaF2bnCbuREJNx/ama-beast-philanthropy-s-darren-margolias/ Watch on YouTube: https://www.youtube.com/watch?v=0ylphNrBjWI

Published 10/24/24