In the world of artificial intelligence, size has often been a key barometer of power and capability. It's a narrative that's been widely upheld, particularly with the emergence of increasingly sophisticated AI models. 

Recently, George Hotz, the founder of Comma.ai, claimed that GPT-4 contains over 1.7 trillion parameters in total, more than ten times the 175 billion parameters of GPT-3.5. Soumith Chintala, co-founder of PyTorch at Meta, later backed the claim. Although GPT-4 is reportedly not a monolithic giant with over a trillion parameters but a collection of smaller models, each with around 220 billion parameters, this still raises a question in the world of LLMs: does a larger size always equate to improved performance, or could we be mistaken?

The first iteration of the GPT model family, introduced by OpenAI in 2018, had a modest 117 million parameters. Its successor, GPT-2, pushed past the billion-parameter mark with 1.5 billion parameters, more than ten times as many as its predecessor. The progression didn't halt there. GPT-3 upped the game with a whopping 175 billion parameters, more than 100 times as many as GPT-2, and GPT-4 has reportedly continued the trend with over 1.7 trillion parameters, roughly a tenfold increase over GPT-3.
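
The generation-to-generation growth described above can be checked with a few lines of arithmetic. This is a minimal sketch: the `PARAMS` table and the `growth_factors` helper are illustrative names, and the GPT-4 entry uses the unconfirmed "over 1.7 trillion" figure as a lower bound, so its ratio is only approximate.

```python
# Parameter counts as quoted in the text. The GPT-4 value is an
# unconfirmed estimate ("over 1.7 trillion"), used here as a lower bound.
PARAMS = {
    "GPT-1": 117e6,
    "GPT-2": 1.5e9,
    "GPT-3": 175e9,
    "GPT-4 (rumored)": 1.7e12,
}


def growth_factors(params):
    """Return (model, ratio to the previous model) for each successive pair."""
    names = list(params)
    return [
        (names[i], params[names[i]] / params[names[i - 1]])
        for i in range(1, len(names))
    ]


for name, factor in growth_factors(PARAMS):
    print(f"{name}: {factor:.1f}x its predecessor")
```

With these inputs the ratios come out to roughly 13x, 117x, and just under 10x, which matches the "more than tenfold", "more than 100 times", and "roughly tenfold" descriptions.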

It should be clarified that OpenAI has neither confirmed nor denied any of the details related to GPT-4’s architecture, but given the converging opinions of many AI experts, we’ll operate on the assumption that the putatively “leaked” stats, model architecture, and training details are at least directionally correct. On that assumption, the pattern is clear: each successive generation brings slight architectural modifications and a significant increase in parameter count, seemingly enhancing performance at each step.

However, when we delve into the realm of conversational chatbots, defining "performance" becomes a trickier task. In the context of training large language models, performance might be straightforwardly gauged by metrics measuring the accuracy of the next token prediction. But when everyday users interact with ChatGPT, performance assumes a more nuanced meaning. It hinges on the chatbot's ability to respond to user requests with a degree of precision and efficiency that renders the interaction helpful and satisfying. 
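
To make the training-side notion of performance concrete, here is a minimal sketch of two common next-token metrics, average cross-entropy and perplexity. It assumes we already have the probability a model assigned to each correct next token; the probability lists below are hypothetical values chosen for illustration, not measurements from any real model.

```python
import math


def cross_entropy(true_token_probs):
    """Average negative log-likelihood (in nats) of the correct next tokens."""
    return -sum(math.log(p) for p in true_token_probs) / len(true_token_probs)


def perplexity(true_token_probs):
    """Exponentiated cross-entropy; lower means better next-token prediction."""
    return math.exp(cross_entropy(true_token_probs))


# Hypothetical probabilities each model gave to the correct next token.
confident_model = [0.9, 0.8, 0.95, 0.7]
uncertain_model = [0.3, 0.2, 0.5, 0.1]

print(f"confident model perplexity: {perplexity(confident_model):.2f}")
print(f"uncertain model perplexity: {perplexity(uncertain_model):.2f}")
```

A number like this is easy to compute and compare across models, but it says nothing about whether a chatbot's answer was actually helpful, which is the gap the rest of this section is concerned with.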

From the Consumer’s Perspective 

With ChatGPT's user count falling by a whopping 9.7% in June, its first decline since release, users might be seeking alternatives that are less costly and better suited to their needs. 

Sure, GPT-4 may reign supreme over other chatbots in its general abilities, but it's also costly for the average user: the subscription costs $20 per month while allowing only 25 messages every 3 hours. 

From the perspective of the consumer, there are many practically free alternatives that, for specific tasks, perform as well as GPT-4, if not better. After the initial hype, GPT-4 seems more like a cool tech demo to which OpenAI occasionally adds new features than a convenient app to have in your digital tool belt.

For example, Quillbot, a website that uses GPT models to help you rephrase and rewrite sentences, is not only free but also preserves the original structure of your writing, which helps it avoid being flagged by various AI content detectors. Additionally, ChatPDF lets you upload any PDF to its website, where a large language model analyzes the content and answers any questions you have about the document. There are more AI tools like these, offered at a much lower price or even free in the case of the two mentioned above, that can accomplish specific tasks faster and arguably better than GPT-4. 

In a world where working smarter produces far more valuable returns than working harder, the ultimate goal of LLMs is to improve the quality of life, not satisfy performance metrics. 

Censorship Drives People to Seek Alternatives

Another major reason for GPT-4's declining user numbers is OpenAI’s censorship of the chatbot. While some degree of control is unquestionably necessary to prevent misuse and ensure ethical usage, it's also a double-edged sword. Censorship tends to temper the raw potential of GPT-4, keeping it from delivering responses that lie within its true capabilities.

As a result, consumers are often left dissatisfied, denied access to a breadth of information, responses, and interactions that GPT-4 could otherwise provide. 

For instance, the system is programmed to avoid certain controversial topics or to provide overly cautious responses in many scenarios, leading to a perceived reduction in the model's authenticity and utility. This approach can inadvertently stifle creativity, freedom of expression, and even the quality of technical outputs in some cases, as users are not presented with the full range of possibilities that the model can generate.

To avoid censorship, people seek models that can be run locally without the intervention of a third party, which solidifies the idea that bigger isn’t always better. 

From a Technical Perspective

Unfortunately, a model like GPT-4 is not only closed-source but also impossible for an average user to run locally, even if its code were available, due to the sheer size of the model. Luckily, since the release of GPT-4, many alternatives have emerged that are much smaller yet offer somewhat comparable performance to the GPT family of models. 

Hugging Face has a web page dedicated to ranking the performance of open-source LLMs, all of which are feasible for an average user to get running. On the smaller end, there’s Stanford’s Alpaca 13B, a model small enough to run comfortably on a modern personal laptop, with performance reportedly matching that of GPT-3.5. On the larger end, the best of the best in open-source LLMs, Falcon 40B-instruct, has only a fraction of GPT-4's parameters but, at the time of writing, ranks first among all open-source LLMs. It also has an Apache 2.0 license, meaning that it can be adapted for commercial use. 
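
Why parameter count decides what can run locally comes down to simple arithmetic: the weights alone take roughly (number of parameters) x (bytes per parameter). The sketch below is a back-of-the-envelope estimate under stated assumptions. The byte sizes per precision are standard, but the `weight_memory_gb` helper is an illustrative name, the GPT-4 count is the unconfirmed figure quoted earlier, and the estimate ignores activations, caches, and runtime overhead.

```python
# Assumed bytes per parameter for common storage precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params, precision="fp16"):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Parameter counts as quoted in the text; the GPT-4 figure is unconfirmed.
MODELS = {
    "Alpaca 13B": 13e9,
    "Falcon 40B-instruct": 40e9,
    "GPT-4 (rumored)": 1.7e12,
}

for name, n in MODELS.items():
    fp16 = weight_memory_gb(n, "fp16")
    int4 = weight_memory_gb(n, "int4")
    print(f"{name}: ~{fp16:,.1f} GB at fp16, ~{int4:,.1f} GB at 4-bit")
```

Under these assumptions, a 13-billion-parameter model needs about 26 GB at 16-bit precision and about 6.5 GB when quantized to 4 bits, which is within reach of a well-equipped laptop. A model with 1.7 trillion parameters would need roughly 3.4 TB at 16-bit precision for the weights alone.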

Furthermore, Microsoft Research unveiled an open-source 13-billion-parameter model in early June, known as "Orca." According to the published paper, this model rivaled or even outperformed GPT-4 on specific tasks, while its overall performance was on par with GPT-3.5. Intriguingly, Orca was trained to mimic and internalize the reasoning process of large foundation models like GPT-4. This is particularly interesting because, just a month earlier, UC Berkeley researchers had published a paper asserting that "model imitation is a false promise," arguing that imitation extends only to style, not intelligence. Orca's reported results appear to undercut this assertion.