The Japan Times - AI systems are already deceiving us -- and that's a problem, experts warn


Photo: OLIVIER MORIN - AFP/File

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.


Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argue in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide-ranging review by Park and colleagues found this was just one of many cases in which AI systems used deception to achieve their goals without being explicitly instructed to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into solving an "I'm not a robot" CAPTCHA for it.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see a risk of AI being used to commit fraud or tamper with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarks for AI-generated content, and techniques for detecting AI deception by comparing a system's internal "thought processes" against its external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

S.Yamamoto--JT