The AI Wave: Exhilarating Advancements and Potential Risks
The rapid evolution of artificial intelligence is truly astonishing, even a bit unsettling. Just as ChatGPT took the world by storm, a relentless wave of generative AI technologies is emerging, each more impressive than the last. It's like a meteor shower of innovation, leaving us both thrilled and apprehensive.
Since my last article on ChatGPT, the AI landscape has undergone a dramatic transformation. On March 14th, Google unveiled the PaLM API, an interface to its large language model. A mere 24 hours later, OpenAI countered with GPT-4, a major upgrade to the model behind ChatGPT. Microsoft wasted no time announcing GPT-4's integration into its Office suite. On March 16th, Baidu launched Ernie Bot, marking China's entry into the large language model arena. That same day, Midjourney unveiled its fifth iteration, generating images of astonishing realism. Not to be outdone, Huawei, Alibaba, 360, and SenseTime have all introduced their own large language models.
What's even more significant is that we're no longer just witnessing the emergence of foundational models. Companies are actively integrating these models into practical applications. We'll delve deeper into this later. This breathtaking pace of development is both exhilarating and a cause for concern. Elon Musk's 2018 warning still resonates: "Mark my words — AI is far more dangerous than nukes." We'll address the specific concerns at the end of this article.
However, judging by the current trajectory, this AI wave seems unstoppable. Therefore, I felt compelled to write another article about AI, to update both myself and my readers on the recent advancements and discuss the potential risks they might entail. We can't afford to fall behind.
This article isn't about comparing or ranking the competing companies. It's still early days: many of these technologies are in their infancy, with products in beta testing, so declaring a winner now would be both premature and not very meaningful. Instead, let's explore the recent breakthroughs, applications, and products emerging in this exciting field.
Generative AI: A New Era of AI Capabilities
While artificial intelligence and machine learning have been around for a while, they've mainly focused on analytical tasks, such as big data analysis, AlphaGo's prowess in the game of Go, and facial recognition. The current wave, however, revolves around "generative AI"—AI that can create entirely new content, like text, images, code, audio, video, and more. This signifies a leap forward, implying that AI is developing creative abilities and producing outputs that are strikingly realistic.
GPT-4: Enhancing ChatGPT's Capabilities
Let's start with large language models, specifically GPT-4, the upgraded model now powering ChatGPT. Subscribing to ChatGPT Plus gives users access to GPT-4. I've already envisioned names for future iterations: ChatGPT Pro, ChatGPT Max, ChatGPT Pro Max.
We've previously discussed ChatGPT, based on GPT-3.5, and its ability to answer almost any question, write code, generate summaries, and more, albeit with accuracy limitations. So, what enhancements does GPT-4 offer?
The most noticeable improvement is its ability to understand images. It's not merely recognizing objects within an image; it's demonstrating common sense and even a sense of humor. For example, if you ask GPT-4 what's funny about a picture showing a VGA port plugged into a smartphone, it'll point out the absurdity of connecting outdated technology to a modern device. Show it an image of someone ironing clothes in the back of a taxi, and it'll identify the unusual activity. Ask what would happen if you cut the string holding a balloon, and it'll accurately predict that the balloon would fly away.
Present GPT-4 with a rough sketch of a website, and it can generate the corresponding code. While the resulting website might not be perfect, it's impressive that GPT-4 can interpret a hand-drawn sketch.
It's important to note that while subscribing to ChatGPT Plus grants access to GPT-4, the image input functionality is currently only available through the API.
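For readers curious about what such a call might look like, here is a minimal sketch using the OpenAI Python SDK. The model identifier, the example image URL, and the assumption that your account has access to vision input are all placeholders; treat this as an illustration rather than a definitive recipe.

```python
# Minimal sketch: asking a vision-capable GPT-4 model about an image.
# Model name and image URL are placeholders; image input may require special access.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed identifier for a vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is funny about this picture?"},
                # Hypothetical URL for the "VGA cable plugged into a smartphone" photo
                {"type": "image_url", "image_url": {"url": "https://example.com/vga-phone.jpg"}},
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```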
Beyond image input, GPT-4 brings significant improvements in accuracy, addressing one of ChatGPT's major drawbacks. Across a range of standardized human exams, GPT-4 scores noticeably higher than GPT-3.5. In a simulated bar exam, for instance, GPT-4 scored around the 90th percentile, a dramatic jump from GPT-3.5, which landed in the bottom 10%.
Besides these headline features, GPT-4 brings several other improvements, including a much larger context window (it can take in far more text at once), better adherence to content restrictions, and reduced costs.
You might be accustomed to ChatGPT's ability to answer questions, but GPT-4 takes it to another level. Users are getting creative, using it to generate Swift code for animations, create Snake games, draft legal letters, and even generate smart contracts for Ethereum.
The remarkable aspect is that these advancements occurred in less than six months. Imagine the potential in two years if GPT-4 can evolve from a metaphorical elementary school student to a college student in such a short timeframe. It's conceivable that it could surpass the capabilities of many humans and experts.
The reasons behind GPT-4's rapid progress, the specifics of the model improvements, and even the number of parameters it uses are all undisclosed. OpenAI, once a non-profit organization that open-sourced GPT-2, has shifted to a capped-profit structure backed by Microsoft's investment, and with market competition heating up, these details are now closely guarded trade secrets.
The Rise of Large Language Models
While OpenAI's GPT-4 enjoys a first-mover advantage in the large language model space, other tech giants are scrambling to catch up. Everyone with the resources is training their own models, leading to a surge in development. Notable examples include Google's PaLM and LaMDA, Baidu's Ernie Bot, Alibaba's Tongyi Qianwen, Meta's LLaMA, Huawei's PanGu, and Anthropic's Claude (a company founded by former OpenAI employees). The market is now flooded with dozens of large language models; Google alone has seven or eight.
However, this doesn't mean that only tech giants can participate in this AI revolution. Stanford University, starting from Meta's LLaMA model and spending roughly $600, fine-tuned its own model, Alpaca, which performs comparably to GPT-3.5 on many tasks, and then open-sourced the code. Soon, individuals might be able to train their own large language models on their smartphones.
Assessing the performance of these models is a complex task. Currently, ChatGPT is generally perceived as more mature. However, given the rapid pace of development, the landscape could change dramatically in six months.
Each company claims its model has unique strengths. This highlights a significant characteristic of machine learning—the generated models are challenging to compare directly, unlike smartphones or computers with easily quantifiable specifications.
Evaluating a large language model's capabilities often relies on subjective testing and observation, similar to assessing a student's abilities based on their exam performance rather than solely on the number of books they've read.
It's likely that dedicated AI evaluation organizations will emerge, similar to financial rating agencies, to assess and rank these models objectively.
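Until such organizations exist, the most common approach is exactly this kind of informal exam: send the same prompts to several models and compare the answers by hand. Below is a minimal sketch of such a harness using the OpenAI Python SDK; the model identifiers and prompts are placeholders, and models from other vendors would need their own client code.

```python
# Minimal sketch of an informal "exam": ask each model the same questions
# and collect the answers for side-by-side review.
from openai import OpenAI

client = OpenAI()

PROMPTS = [
    "Explain the difference between supervised and unsupervised learning in two sentences.",
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?",
]
MODELS = ["gpt-3.5-turbo", "gpt-4"]  # assumed model identifiers

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep outputs stable so runs are easier to compare
    )
    return resp.choices[0].message.content

for prompt in PROMPTS:
    print(f"\n=== {prompt}")
    for model in MODELS:
        print(f"--- {model}\n{ask(model, prompt)}")
```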
Applications of Large Language Models
The large language models discussed so far are foundational models, designed for general-purpose tasks. They are analogous to a child's general education. However, training an AI for a specific professional field requires specialized training and data. While companies like Bloomberg, with vast resources, can train their own financial AI models, smaller companies can leverage existing large language models for fine-tuning.
This involves taking a pre-trained model and training it further on domain-specific data, similar to providing specialized education to a child with a general education. Alternatively, they can utilize the model's API, integrating it as part of their services.
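As a concrete illustration of the fine-tuning route, here is a minimal sketch using the Hugging Face transformers library to continue training a small open model (GPT-2, chosen only because it is small and freely available) on a plain-text file of domain documents. The file name and hyperparameters are placeholders; a real project would start from a much stronger base model, use instruction-formatted data, and likely apply parameter-efficient techniques such as LoRA to keep costs down.

```python
# Minimal sketch of domain fine-tuning: continue training a small pre-trained
# model on your own texts. Paths and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # stand-in for whichever open base model you fine-tune
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# One domain document per line: contracts, research notes, support tickets...
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-model", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
)
trainer.train()
trainer.save_model("domain-model")
```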
This adaptability is a key strength of large language models. Their potential applications extend far beyond simple chatbots or image generators. Let's explore some of the current and emerging application areas:
1. Search: The integration of large language models into search engines is already well underway. Microsoft has built OpenAI's GPT technology into its revamped Bing, while Google is integrating LaMDA and PaLM into Bard, its conversational search assistant. Other search engines like You.com, Baidu, and 360 are following suit. This signifies a shift from traditional lists of search results to conversational interactions with AI-powered search assistants.
2. Productivity: Given their proficiency in language processing, large language models are well-suited for assisting with various work tasks. Tasks like summarizing documents, improving grammar, and generating content can be significantly streamlined with the help of AI.
For instance, Notion, the popular note-taking app, built GPT-based AI into its platform as Notion AI. Microsoft, which owns the Office suite and is OpenAI's biggest backer, unveiled Microsoft 365 Copilot, integrating GPT-4 into applications like Excel, Word, PowerPoint, and Outlook. Users can now leverage AI to generate content, create presentations from notes, analyze data in spreadsheets, and more.
While the current capabilities are impressive, they are still evolving. However, the potential for seamless integration and automation across various tasks is vast.
3. Image Generation: Generative AI extends beyond text, enabling the creation of realistic images from simple descriptions or by mimicking the style of specific artists. Platforms like Midjourney, OpenAI's DALL·E 2, and Stable Diffusion are pushing the boundaries of AI-generated imagery.
Adobe, the leader in image editing software, introduced its own image generation model, Firefly. This integration of image generation and editing capabilities opens up a world of possibilities. Imagine effortlessly changing seasons in an image, replacing objects, animating static elements, or adding realistic details like a flowing river. These effects, once time-consuming even for professionals, can now be achieved with AI assistance.
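Commercial services like Midjourney and Firefly are closed, but the underlying idea can be tried locally with open models. Here is a minimal sketch using Stable Diffusion through the Hugging Face diffusers library; the model repository, the prompt, and the assumption of a CUDA GPU are all illustrative choices, not a recommendation.

```python
# Minimal sketch of text-to-image generation with an open model
# (Stable Diffusion via the diffusers library). Assumes a CUDA GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed model repository
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a watercolor painting of a mountain village in autumn, golden leaves, soft light",
    num_inference_steps=30,  # more steps is slower but usually cleaner
    guidance_scale=7.5,      # how strongly the image should follow the prompt
).images[0]

image.save("village_autumn.png")
```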
4. Video and Music Generation:
Beyond image generation, AI is making strides in video generation, music composition, and other creative domains. Platforms like Runway allow users to generate video clips from text prompts or images. AI-powered music generators can create compositions based on specific parameters like genre, tempo, and key.
The rapid evolution of AI in these multimedia generation areas is remarkable. The quality and realism of AI-generated content, particularly in image synthesis, have improved drastically in a short period, hinting at a future where AI plays a significant role in content creation.
5. Finance: Bloomberg, drawing on decades of proprietary financial data, has developed BloombergGPT, a 50-billion-parameter model specialized for finance, and early tests have shown promising results. Numerous banks and brokerage firms are likewise incorporating GPT-style models into their operations. In China, Tonghuashun (Hithink RoyalFlush) has been an early adopter of AI in finance; while the company acknowledges a gap between its technology and the international state of the art, its stock price has roughly doubled, reflecting the market's confidence in AI's potential to reshape finance.
6. AutoGPT: A particularly fascinating project within the AI community is AutoGPT, which can be understood as an autonomous agent. It's an open-source project on GitHub that leverages the GPT-4 API to create a self-operating agent: give it a goal, such as "help me start a business and make it profitable," and AutoGPT attempts to handle the rest.
AutoGPT differs from ChatGPT by not simply providing answers. Instead, it leverages GPT-4's capabilities in programming, web searching, and long-term memory to achieve goals. It essentially asks itself questions, executes tasks, conducts research, and optimizes its approach through continuous self-improvement.
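To make the idea concrete, here is a deliberately simplified sketch of such an agent loop built on the chat completions API. This is not AutoGPT's actual code: the goal, the JSON action format, the stubbed search tool, and the five-step cap are all invented for illustration, and a real agent needs robust parsing, real tools, and persistent memory.

```python
# Highly simplified agent loop: the model repeatedly chooses the next action
# toward a goal, a dispatcher runs it, and the result is fed back as context.
import json
from openai import OpenAI

client = OpenAI()
GOAL = "Research three popular note-taking apps and summarize their pricing."

def web_search(query: str) -> str:
    # Stub: a real agent would call a search API and return real results.
    return f"(pretend search results for: {query})"

TOOLS = {"web_search": web_search}

SYSTEM = (
    "You are an autonomous agent. Goal: " + GOAL + "\n"
    'Reply with JSON only: {"thought": ..., "tool": "web_search" or "finish", "arg": ...}. '
    'Use "finish" with your final answer in "arg" when the goal is met.'
)

history = []  # crude stand-in for the agent's memory of past steps
for step in range(5):  # cap the loop so it cannot run forever
    messages = [{"role": "system", "content": SYSTEM}] + history
    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    action = json.loads(reply.choices[0].message.content)  # assumes well-formed JSON
    if action["tool"] == "finish":
        print("Final answer:", action["arg"])
        break
    result = TOOLS[action["tool"]](action["arg"])  # run the chosen tool
    history += [
        {"role": "assistant", "content": json.dumps(action)},   # what the agent decided
        {"role": "user", "content": f"Tool result: {result}"},  # what actually happened
    ]
```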
7. Education: AI's language proficiency makes it a natural fit for virtual language tutoring. Duolingo, for instance, has integrated GPT-4 into its platform, offering Duolingo Max.
8. E-commerce: Shopify utilizes ChatGPT to assist merchants in writing detailed product descriptions.
9. Programming: GitHub Copilot, powered by OpenAI's code models, has become an indispensable tool for many programmers.
These are just a few examples of generative AI's wide-ranging applications. As the technology matures, we can expect to see even more innovative and impactful use cases.
Addressing the Risks of Artificial General Intelligence (AGI)
While the current capabilities of large language models are impressive, they are not without risks. The emergence of AGI, artificial general intelligence with human-level cognitive abilities and potentially even consciousness, is a concern for many.
Some argue that large language models are simply sophisticated word-prediction machines, lacking true understanding or consciousness. Others, however, point to the rapid progress and the increasingly complex tasks these models can perform, suggesting that AGI might be closer than we think.
Microsoft, in its 155-page paper titled "Sparks of Artificial General Intelligence: Early experiments with GPT-4," argues that GPT-4 exhibits early signs of AGI. This has sparked debate within the AI community, with some experts refuting the claim.
Concerns about AGI's potential risks, including the possibility of AI turning against humanity, have led over 26,000 individuals, including Elon Musk, to sign an open letter calling for a six-month pause on training AI systems more powerful than GPT-4.
While the open letter highlights valid concerns, it's unlikely to halt the ongoing AI race. The potential benefits and commercial incentives driving AI development are too significant to ignore.
The fear of AI replacing human jobs is another valid concern. However, history suggests that technological advancements often create new opportunities alongside job displacement. It's crucial to adapt and acquire new skills to remain relevant in an evolving job market.
Conclusion: Embracing the Future of AI
The current wave of generative AI is both exciting and potentially disruptive. While the risks are real and need to be addressed, the potential benefits are too significant to ignore.
Open discussions, ethical guidelines, and ongoing research are crucial to ensure that AI development benefits humanity. The future of AI is in our hands, and it's up to us to shape it responsibly.