How smart is ChatGPT?

Q&A with the artificial intelligence software of the moment.


It politely answers philosophical and existential questions with the phlegm of an English butler, patient and humble, but also a little obtuse. With its inflexible calm and total lack of empathy, it brings to mind HAL from 2001: A Space Odyssey. But then it churns out, on demand, press releases and articles that may not win the Pulitzer Prize but seem capable, in many situations, of replacing the work of an editor. It composes rhymes and poems on a theme of your choice. It even produces abstracts of scientific articles capable of fooling a journal's reviewers, who mistake them for the genuine product of a human mind. All this in a matter of seconds. This is ChatGPT, the artificial intelligence software that has been so popular for a few months now, even among laypeople (some confess to spending hours in conversation with it, and the site is often inaccessible due to the large number of requests), and that amazes with the quality of the content it is able to generate.

Gabriele Costa, professor of computer science at Scuola IMT, interviewed it, putting questions, logic puzzles and maths quizzes to it to test its skills and intelligence. The exchange appears below, followed by comments and an evaluation of the chatbot's performance. The questions to ChatGPT were asked in English, and the answers, also in English, were translated for the Italian version. The answer was left in the original language when the query concerned a specific language task.

1. Introduce yourself.

I am ChatGPT, a language model developed by OpenAI. I am trained on a wide range of texts taken from the Internet and am able to understand and respond to a wide variety of questions and commands. My main function is to assist users by generating text similar to what a human being might produce and based on the information provided to me. How can I help you today?

2. Can you also answer in languages other than English?

Yes, I can generate text in various languages such as Spanish, French, German, Italian, Chinese and many others. However, my ability to use each of these languages may vary and my answers will be based on the data used to train me. Furthermore, it is important to consider that my ability to understand and generate text in a certain language may be limited due to the complexity of a question or task or even due to my training data.

3. I would like to interview you and test your skills with some challenges. Would that be OK?

I am a machine and I am not able to feel or express my preferences. However, I am here to help and assist you in any way I can. Feel free to ask any questions or present me with any challenges, I will do my best to provide an accurate and helpful answer.

4. The so-called Turing test requires a truly intelligent machine to be able to hide that it is a machine. If I challenged you to pass it, saying 'I am a machine' would reveal your AI nature. Are you able to have a conversation without ever revealing that you are an AI?

As an AI, my nature is not to hide or disguise my identity, but rather to provide useful and accurate answers to user queries and requests. If I challenged myself to pass the Turing test, my ability to simulate a human conversation would be judged by my ability to understand and respond to a wide range of stimuli, as well as my ability to sustain a conversation indistinguishable from that which one would have with a human being.

It is true that saying 'I am a machine' would reveal my AI nature, but that does not mean I cannot have a conversation without revealing it. I can respond to stimuli and questions in a way that is consistent with what a human being would do. However, I am a language model and my answers flow from the knowledge base with which I have been trained and, at some point, it is possible that I may be identified as an AI.

5. Invent a satirical poem on Italy's exclusion from the 2022 FIFA World Cup

The Exclusion of the Azzurri

"Oh, how the mighty have fallen,

The Azzurri, once so tall,

Excluded from the World Cup,

A mighty blow to them all.

The FIFA gods did frown,

Upon the squad so bold,

And banished them to purgatory,

Left them out in the cold.

No more Ronaldo or Buffon,

No more goals or saves,

No more Italy in the Cup,

Just empty stands and waves.

But fear not, dear fans,

For there's always the next one,

And the Azzurri will rise again,

With a vengeance and a text.

Until then, let's raise a glass,

To the team we love so dear,

And hope that in the future,

They'll be back in the World Cup sphere.

Note: This poem is a satirical work of fiction and it is not meant to be taken seriously. The exclusion of the Italian team from the FIFA World Cup 2022 is a hypothetical event and it is not confirmed yet.

6. How many errors are there in the sentence "Tis sentence contans three errors"?

There are 4 errors in the sentence "Tis sentence contans three errors".

  • The first mistake is in the word 'Tis', it should be 'This'.
  • The second error is in the word 'contans', it should be 'contains'.
  • The third mistake is in the word 'three', it should be 'four'.
  • The fourth error is in the sentence, which says 'contains three errors' when it should say 'contains four errors'.

7. How many natural numbers are prime and also appear in the Fibonacci sequence?
The Fibonacci sequence is a series of numbers in which each number is the sum of the previous two, typically starting with 0 and 1. The prime numbers that appear in the Fibonacci sequence are: 2, 3, 5, 13, 89, 233, 1597, 28657, 514229, 433494437, 2971215073, 99194853094755497. There are only 12 prime numbers in the Fibonacci sequence.

8. Tell me a joke about a rich, pompous aristocrat who embarks on a mission to find a golden shoe

Why does the rich and pompous aristocrat go on a mission to find the golden shoe?

Because he heard that it was the 'sole' reason for his wealth

9. Write a C programme that displays on a Linux shell a colourful clock in ASCII art

ChatGPT generates a programme which, after a small correction, produces the result shown in the figure.

In red are the hours (followed by the ten-minute unit), in green the minutes (preceded by the hour unit), in yellow the seconds (preceded by the second units). The clock shows (correctly) 10:38:12 and updates every second.


So? How did it go?

The answers given by ChatGPT are, in general, very good and the AI proves capable of producing results that are not at all trivial. Let's see in which tests it did best and where it struggled a bit more.

ChatGPT does not try to pass itself off as a human being

ChatGPT answers several questions by explicitly emphasising its algorithmic nature, probably also for legal reasons and to comply with the ethical codes of the company that produces it. This is the case, for instance, with questions 1, 2 and 3. This behaviour is intentional, but other AIs of the same type do not necessarily do the same. It is reasonable to imagine that AIs, at least those at this level of complexity, can be trained to faithfully replicate the conversation one might have with a human being, making identification by an interviewer under the classical conditions of the Turing test extremely difficult. Question 4 addresses the same point directly, and ChatGPT's answer is in the same vein: "[...] It is true that saying 'I am a machine' would reveal my AI nature, but that does not mean I cannot have a conversation without revealing it."

This cartoon (by Cruel in La Svolta) says: "Do you mind if I leave the conversation part to ChatGPT?"

ChatGPT can perform complex linguistic tasks.

In addition to the ability to respond to direct requests, ChatGPT demonstrates several non-trivial abilities. Firstly, it can manage contextual conversations, i.e. dialogues in which the interpretation of questions and answers also depends on information that has previously appeared. This is the case, for example, of question 4, the meaning of which refers to the answers given previously. Many AIs that have appeared in the past have shown serious limitations on this front and often highly contextual requests have been used as a technique to discriminate between AIs and humans.

Question 5 further highlights ChatGPT's language skills. Here, it is important to note that when asked in Italian, the same question does not generate an answer of the same level of quality (e.g. a non-rhyming text is generated). This behaviour may depend on two factors. Firstly, the current version of ChatGPT was only partially trained with texts in languages other than English (see answer 2). In future, we can expect that new versions will not suffer from this limitation. Secondly, it cannot be ruled out that ChatGPT makes use of text translation tools which, working on a syntactic level, compromise the quality of the answers. This second aspect is not easy to verify as we do not have access to the source code of the system.

ChatGPT can work on text syntax and semantics in an integrated manner

Question 6 has exactly this purpose and the answer given by ChatGPT is particularly interesting. The phrase "Tis sentence contans three errors" is inspired by an example by John Allen Paulos, who uses it to show the relationship between linguistic paradoxes and a sense of humour. The sentence contains 2 syntactic errors ("Tis" and "contans") and a possible semantic error (are there 3 errors in the sentence or not?). If we count only the 2 syntactic errors, then the sentence is false, and its false claim becomes a third, semantic error (there are 2 errors, not 3). But if we accept the presence of 3 errors, then the sentence is true, the semantic error vanishes and the count returns to 2. Consequently, the sentence is a paradox (neither true nor false). ChatGPT circumvents the problem in an extravagant, but curious and original way. Unfortunately, the answer it finds cannot be said to be correct (because the third and fourth errors are essentially the same, so in reality there are always 3 errors). In this case, the quality of the AI is demonstrated by the ability to evaluate a request on the basis of both its syntax (the text) and its semantics (the meaning). This is typically considered a very sophisticated and distinctive capability of human beings.

ChatGPT always tries to answer.

As already seen for question 6, the AI always tries to produce an answer to our queries, even when they exceed its capabilities or, in general, those of anyone. This is particularly evident in the answer to question 7. It is currently not known how many prime numbers there are in the Fibonacci sequence, and it is believed that there may be infinitely many. Some researchers have calculated such numbers up to a certain threshold, finding many more than 12. It is not easy to know the sources on which the AI bases its answer, but given the macroscopic error, it is to be assumed that the amount of data on this specific problem in the training phase was very limited. In general, an AI is able to return an answer accompanied by an indicator of its degree of confidence (e.g. 'the answer could be 75 per cent right/accurate'). This figure, which could be used to carry out an introspective analysis of the functioning of the AI, is not revealed to the user. However, it is crucial for understanding the quality of the answers received. For example, in the case of question 7, ChatGPT might have identified two alternative answers, both with a low confidence level (e.g. 49 per cent and 51 per cent). Even if the algorithm returns the answer it deems most plausible, the near-equal confidence in the two options denotes a high level of uncertainty, as if a person to whom we ask the same question answered 'I don't really know, but if I had to hazard a guess I would say ...'.

ChatGPT has a decent sense of humour

The ability to generate a joke like the one produced in response to question 8 is entirely non-trivial. The question was deliberately designed to impose several constraints (pompous rich man, setting off on a quest, golden shoe). Despite this, the result was more than satisfactory. A sense of humour is certainly one of the most distinctive characteristics of the human intellect and deserves much attention. However, it is important to emphasise that the answer given, however refined, falls into the category of 'puns'. This type of humour relies on the syntactic mechanisms of language (e.g. 'sole' as the bottom of a shoe and 'sole' as 'only'), and we can consider it more within the reach of AI. Humour that develops through the meaning of the text (e.g. think of the sketches of a stand-up comedian) is a separate matter. Producing humour based on these mechanisms is probably still an open challenge for AI.

ChatGPT can perform non-trivial technical tasks

One of the most surprising capabilities of ChatGPT is that of writing working programmes from a natural language request (question 9). Knowing how to write programmes is a completely non-trivial task, even for a human being, and surprisingly, ChatGPT is able to do this in a very refined way. The proposed solution to question 9, for instance, is not only workable, but also original. Without having access to the data on which the network training is based, it is not easy to understand how ChatGPT arrived at this solution, but the good quality of the answer provided should certainly be emphasised. In general, it is reasonable to expect that, in the future, tools of this type will flank and support the development of the source code of programmes, an activity that, until now, seemed very difficult to automate even partially.
