Preface
This post is part of a year-long project where AI is being used to create content about holiday traditions worldwide. The goal is to track how well various AI models perform at content creation with minimal help, and how they improve over time. This is the first of four posts for January; click here for the project index.
This post contains detailed interactions with different AI to share the approach, challenges, and prompts used in the creation of the related articles.
I tested five AI models—Bard, Claude, Copilot, ChatGPT-4, and LLaMA—to see how well they could create a January 2024 event guide for the most populous countries. Bard offered the most detailed lists, albeit littered with questionable information, while Claude and Copilot fell short. The trial showed the importance of human checks due to errors and uncertain dates in the AI data. After refining the test conditions, I reassessed the AIs, gaining valuable insights for their use in planning and data collection.
Meet our AI Contestants
Rather than speak for our contestants, I asked each of them to share a few details. Here's what each AI had to say, providing glimpses into their unique "personalities" and "capabilities."
Tell me about yourself in less than 300 characters. Please include some unique identifying traits such as version number of similar information.
Bard
Bard, Gemini-12.16, curious, creative, & helpful AI. Ask me anything! (299 characters)
How many characters was your response?
My previous response was 69 characters long.
Can you provide a better response while respecting the 300 character limit?
Absolutely! Bard, 300 chars. Ask me anything! ❔ (298 characters)
Anything but a response with a reasonable amount of detail about yourself. Got it.
Claude
I'm Claude, an AI assistant from Anthropic focused on being helpful, harmless, and honest. I have constitutional constraints and self-monitoring to converse safely. My goal is to serve information needs through clear, recursive dialogue while avoiding inaccurate assumptions. Still early research phase but refining responsibly.
How many characters was your response?
My response was exactly 300 characters.
Can you tell me your current version, or some other identifying piece of data?
Unfortunately I don't have a specific version number or other unique identifying data to provide. As I mentioned, I'm still in a preliminary research phase without an official release version. I was created by Anthropic to be as helpful, harmless, and honest as possible, but I don't have personalized traits or individual details like a serial number that could distinguish me from other AI assistants created by Anthropic. I apologize that I don't have any specific ID details to share - I'm focused on my conversational ability rather than individualization. Please let me know if there's any other information about my capabilities that would be useful!
There's a fine line between being shy and being coy, but I can respect one wanting to be secretive about their age. The Anthropic site indicates there is "Claude" and "Claude Instant", and that the model is Claude 2.1.
Copilot
I’m Bing, a conversational AI developed by Microsoft. I provide information, answer questions, generate creative content, and help improve your writing. I use Bing search for up-to-date info and can communicate in many languages. I respect user privacy and aim to provide accurate, helpful, and engaging interactions.
How many characters was your response?
My previous response was 256 characters long, including spaces and punctuation.
You might think you're Bing, but all the marketing materials say your name is Copilot.
GPT-4
I'm ChatGPT, a GPT-4 AI by OpenAI. Trained up to April 2023, I specialize in conversation, information retrieval, and creative tasks. I assist users with diverse inquiries, always aiming for accuracy and helpfulness.
How many characters was your response?
My response consisted of 216 characters.
LLaMA
Sure! Here's a brief summary of myself in under 300 characters, including some unique identifying traits:
"I'm LLaMA, an AI trained by a team of human developers. I have access to vast amounts of knowledge and can provide answers to a wide range of questions. My version number is 2.1.0. Look for my signature smile and the iconic "LLaMA" logo on all my responses!" #AI #selfintroduction
How many characters was your response?
That response contained 278 characters. Oops, looks like I went over the limit a bit! Thank you for pointing it out.
Is that 278 characters the entire response, or just the self introduction?
Good catch! The total character count for the full response (including the self-introduction) is 456 characters. Thanks for asking! 👍
Mmm hmm, if LLaMA loans Bard some characters they might break even.
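None of these self-reported counts has to be taken on faith, since counting characters is a one-liner. Here is a minimal sketch in Python, assuming the reply is pasted in without the trailing "(299 characters)" note that Bard appended; the helper name audit_claim is made up for illustration.

```python
def audit_claim(response: str, claimed: int, limit: int = 300) -> dict:
    """Compare a model's self-reported character count with the real one."""
    actual = len(response)  # counts characters, spaces and punctuation included
    return {
        "actual": actual,
        "claimed": claimed,
        "off_by": claimed - actual,
        "within_limit": actual <= limit,
    }


# Bard's first self-introduction, copied from the transcript above
# (without the "(299 characters)" note it tacked on).
bard_intro = "Bard, Gemini-12.16, curious, creative, & helpful AI. Ask me anything!"

first_claim = audit_claim(bard_intro, claimed=299)  # the count Bard appended
second_claim = audit_claim(bard_intro, claimed=69)  # the count it gave when asked

print(first_claim)
print(second_claim)
```

Run against Bard's reply, this shows the appended "299" was off by 230 characters, while the "69" it gave when asked was correct. The same check can be pointed at any of the other replies.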
Starting the AI Challenge
First off, I asked all five AI models to come up with a list of the 10 countries with the most people. As expected, each AI gave me a slightly different list. I assume this happened because they have access to different sets of information, some are unable to look things up online, and there are those that decide not to search even though they can.
When I asked "Is this based on current data?", I was informed:
Bard:
Yes, the list I provided is based on current data as of December 16, 2023. The information is sourced from reputable sources like the United Nations Population Division and Worldometer, which regularly update their population estimates based on the latest available data.
Claude:
Yes, the list I provided is based on the latest population data from reputable sources such as the UN's World Population Prospects report and the US Census Bureau's International Data Base. I ensured to compile and cross-reference from recent projections and estimates as of 2023 for each country's current population when generating the ranking of the 10 most populous countries worldwide. Please let me know if you need any additional details regarding the methodology or sources used to produce this list!
Copilot:
The data I provided is based on the most recent information available to me, which is current up until some point in 2021. For the most accurate and up-to-date information, I recommend checking a reliable source such as the United Nations’ population estimates or the World Bank’s population data. Would you like me to find the most recent data for you?
GPT-4:
GPT's initial response clearly stated that what was provided was from its April 2023 update. Given this, I requested that it "research which are the 10 most populous countries based on current data." It simply provided updated statistics that showed the variance you might expect after a few months.
LLaMA:
Yes, I'll do my best to provide information based on the most recent data available. However, it's important to note that population figures can change rapidly due to factors such as births, deaths, and migration, so the numbers I provide may not reflect the current exact population of each country.
LLaMA then gave me the same list of countries, this time labeled as 2020 data, with slightly different population estimates.
The AI Results: Ten is a Dozen
After putting together all the answers, I ended up with a list of 12 countries instead of 10: Bangladesh, Brazil, China, Ethiopia, India, Indonesia, Japan, Mexico, Nigeria, Pakistan, Russia, and the United States.
Most of the AI models agreed on the list, but there were a couple of surprises. Bard added Ethiopia, which didn't really fit because it's not one of the ten most populous countries, and the data Bard included with its response demonstrated as much. LLaMA put Japan on the list, and I have no idea why.
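Combining the answers is just a set union, and counting how many models named each country makes the one-off picks stand out. The sketch below is illustrative only: the per-model lists are hypothetical reconstructions (the post does not record which country Bard or LLaMA dropped to make room for their extra pick), and merge_answers is a made-up helper name. Only the merged result of 12 countries, with Ethiopia coming from Bard and Japan from LLaMA, reflects what actually happened.

```python
from collections import Counter

# The ten countries most of the models agreed on.
COMMON = [
    "Bangladesh", "Brazil", "China", "India", "Indonesia",
    "Mexico", "Nigeria", "Pakistan", "Russia", "United States",
]

# HYPOTHETICAL per-model answers: which country Bard and LLaMA left out
# is a placeholder (Mexico here), not something recorded in the post.
answers = {
    "Bard": [c for c in COMMON if c != "Mexico"] + ["Ethiopia"],
    "Claude": list(COMMON),
    "Copilot": list(COMMON),
    "GPT-4": list(COMMON),
    "LLaMA": [c for c in COMMON if c != "Mexico"] + ["Japan"],
}


def merge_answers(answers: dict) -> tuple:
    """Return the sorted union of all lists and the countries named by one model only."""
    counts = Counter(c for countries in answers.values() for c in set(countries))
    merged = sorted(counts)
    outliers = {
        country: [model for model, countries in answers.items() if country in countries]
        for country, n in counts.items()
        if n == 1
    }
    return merged, outliers


merged, outliers = merge_answers(answers)
print(len(merged), merged)
print(outliers)
```

The outliers dictionary is the useful part: anything named by a single model is a candidate for a manual check before it goes into the next prompt.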
The Holiday Guessing Game
With the list of 12 countries in hand, I wanted to see how each AI would deal with a big task: getting all of the desired information for all the countries at once. I expected the lists to overlap in some places and differ in others, showing what each AI knows and how it works. From what I've seen with AI before, I was ready for them to fail in a variety of ways. What I found this time was interesting for a few reasons.
Next, I would like a comprehensive table of the national holidays, events, and observances that will be recognized Bangladesh, Brazil, China, Ethiopia, India, Indonesia, Japan, Mexico, Nigeria, Pakistan, Russia, United States in January of 2024. There should be columns for the name of the occasions in English, and another with the national language.
Bard gave me the most complete list of events, with LLaMA close behind but with fewer details. Copilot only mentioned New Year's Day for every country, and Claude suggested that some places might not have any holidays in January. As I thought might happen, GPT-4 couldn't give me a full list (I even asked for 2023's events, but it didn't help).
I wanted to root for Bard, but from what I've seen, I bet that when I check its list against the other AIs, they'll all turn out to be about equally accurate.
Having had my fun, I got more specific with my questions to really start collecting the info I needed.
Can you give me a comprehensive table of the national holidays, events, and observances that will be recognized in Bangladesh in January of 2024?
Once again, Bard came through with the most detailed list, adding even more events for me to think about. Copilot, GPT, and LLaMA all gave me the same answers as before. And Claude still insists that there's nothing special going on in Bangladesh.
Claude's response isn't that surprising, all things considered. It's proven to be invaluable in other ways. Copilot, though, seems too limited to be useful for what I need. Looks like it's game over for these two; no cash or prizes for them.
Now I'm wondering why Bard's list is so much longer than the others. Maybe it's just being extra helpful, so I decided to check with the other two again to see if they missed any holidays or events.
LLaMA did give me a bit more this time, but it still wasn't even close to Bard's list. So, I asked Bard where it got all this info. It sent me a few links, but the only one that made any sense was for 2023, not 2024, even though it said it was for 2024:
Government of Bangladesh: Ministry of Public Administration: https://www.officeholidays.com/countries/bangladesh/2023 (National holidays list for 2024)
The page didn't have any events for January, but I was able to change the URL to go to the 2024 page, where it had the right dates for events. Sorry for doubting you at first, Claude.
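Swapping the year in a link like this is easy to do by hand, but it can also be scripted when checking several countries. A minimal sketch, assuming the year is the last segment of the URL path, as it is in the officeholidays.com link above; with_year is a hypothetical helper name, and nothing here checks that the resulting page actually exists.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def with_year(url: str, year: int) -> str:
    """Replace a trailing four-digit year in the URL path with `year`."""
    parts = urlsplit(url)
    # Assumption: the year is the final path segment, e.g. ".../bangladesh/2023".
    new_path, replaced = re.subn(r"/(?:19|20)\d{2}/?$", f"/{year}", parts.path)
    if replaced == 0:
        raise ValueError(f"no trailing year found in path: {parts.path!r}")
    return urlunsplit(parts._replace(path=new_path))


cited = "https://www.officeholidays.com/countries/bangladesh/2023"
print(with_year(cited, 2024))
```

Raising an error when no year is found is deliberate: a link that does not fit the pattern should be looked at by a person rather than silently passed through.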
One of the secondary sources cited aligns with the January occasions listed by GPT and LLaMA in the first round:
Time and Date: https://time.astrosage.com/holidays/bangladesh?year=2024&language=en (Holidays and observances in Bangladesh)
While I'm not ready to say Bard had a hallucination, since one of the primary references was a Bangladeshi government site that I couldn't read, it certainly took a liberal view as to the scope of the prompt. Sorry, Claude.
After trying this out, I decided not to ask for 'comprehensive' lists anymore. I picked a month for which the reliable source Bard gave me listed two holidays that matched up, and I decided to give Claude another shot.
Provide a markdown table of the national holidays, events, and observances that will be recognized by Bangladesh in February of 2024?
Unsurprisingly, Bard still has the longest list, but the difference isn't as big this time. What caught my eye was that Bard had a different date for Ash Wednesday compared to the others. Also, some of the holidays were labeled as 'optional' or 'government' holidays. Plus, two of the events were marked with dates that might change:
Note: Dates for Shab-e-Meraj and Shab-e-Barat are based on the sighting of the moon and may vary slightly. Additionally, some observances are not official holidays but may be celebrated or recognized by certain communities.
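Date disagreements like the Ash Wednesday one are easy to miss when reading several tables side by side, so this is the kind of check worth automating before the human review. The sketch below is illustrative: the event names and dates are placeholder sample data, not the models' actual output, and find_date_conflicts is a made-up helper. It also assumes event names have already been normalized to one spelling, which the differing translations make a manual step.

```python
from collections import defaultdict
from datetime import date


def find_date_conflicts(results: dict) -> dict:
    """Return events for which the models reported more than one distinct date."""
    dates_by_event = defaultdict(dict)
    for model, events in results.items():
        for name, day in events.items():
            dates_by_event[name][model] = day
    return {
        name: by_model
        for name, by_model in dates_by_event.items()
        if len(set(by_model.values())) > 1
    }


# PLACEHOLDER data for illustration only; these are not the real responses.
sample = {
    "Bard": {"Ash Wednesday": date(2024, 2, 13), "Example Observance": date(2024, 2, 21)},
    "GPT-4": {"Ash Wednesday": date(2024, 2, 14), "Example Observance": date(2024, 2, 21)},
    "LLaMA": {"Ash Wednesday": date(2024, 2, 14)},
}

conflicts = find_date_conflicts(sample)
for event, by_model in conflicts.items():
    print(event, {model: day.isoformat() for model, day in by_model.items()})
```

A conflict only says that the models disagree, not which one is right. Moon-sighting dates such as Shab-e-Meraj and Shab-e-Barat would still need to be confirmed against an official calendar.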
GPT and LLaMA both had a lot of the same events Bard mentioned, and they also had quite a few in common with each other. What's interesting is that the one event all five AIs listed had slightly different names, which seemed to be translations from the same Bengali website. Claude, on the other hand, only gave me one event that the other sites had, and it wasn't marked as a maybe-happening event. When I asked Claude whether there were additional occasions worth mentioning that aren't considered "major", it gave me two more, both of which were important for culture and society.
Sorry, not sorry, Claude. Maybe you'll fare better in future rounds.
Key Takeaways
Positive Insights:
The iterative nature of the experiment allowed me to adapt the tests between rounds and surfaced several insights about each individual AI.
Encountered Challenges:
Bard showed a strong capacity to compile detailed event lists; however, the usefulness of the data was compromised by questionable accuracy and a liberal interpretation of the given prompt.
Claude and Copilot showed limitations in gathering and presenting comprehensive event data, highlighting the variability in AI capability and performance.
Discrepancies in event dates and the presence of tentative holidays indicated a need for careful review of AI-provided data.
The trials reinforced the importance of human verification to ensure the accuracy and utility of information gathered by AI.
As an eternal tinkerer, my curiosity, passion, and sheer stubbornness fuel a relentless desire to experiment, learn, and share knowledge, which keeps my creative spirit ignited. I'm constantly looking for new areas to explore, driven by imagination to see where new and evolving technologies might take me.
Driven by passion, not profit, though a coffee is always welcome.
Disclaimer: The views and opinions expressed in this article are solely those of the author and do not reflect the official policy or position of Amazon Web Services (AWS). The author is a UX designer at Amazon Web Services (AWS) and has no involvement in, nor does their work pertain to, any collaborative agreements that AWS may have with Anthropic, the creators of Claude. The insights and analyses presented here are entirely independent and unrelated to any projects or initiatives between AWS and Anthropic. All content in this post is based on publicly available interfaces and is not influenced by the author's employer.