I have been experimenting and creating small projects with OpenAI for over a year now. While it seems OpenAI is staying at the top of the charts, there are a couple of alternatives that don't get as much press but have competitive offerings. AWS Bedrock and Google Gemini are two very interesting options, especially if you already operate within those ecosystems.
In this article I'm going to explore Google Gemini. It has popped up on my feeds more recently, and I saw some claims that Gemini Chat was much faster than OpenAI's ChatGPT. When I did a quick test of Gemini Chat, the claim seemed to stand up, so I'm digging in to explore further.
This will not be a full comparison of Gemini to the competitors; I'm just taking you along on my exploration journey. Hope you enjoy the ride.
I'm interested in the AI chat capabilities and the API offerings. I also saw part of a video that showed Gemini evaluating a profit and loss statement and debugging a Make.com workflow. That last bit is really interesting since it has the potential to be really powerful. If we are lucky, I'll find some additional features to explore that I wasn't expecting.
My first step is to check out the Chat feature since that seems to be the most common usage for OpenAI that everyone is familiar with. You can find Gemini Chat at gemini.google.com.
As you can see, the interface is very similar to OpenAI with my chat history on the left and a prompt field. If you've used OpenAI or any chat, this will seem very familiar. The first difference I'm seeing is that the response times are blazing fast relative to OpenAI. I tried it out with a chat about what a restaurant website should include and the responses were fast and seemed accurate and helpful.
In my chat window, there was a disclaimer that my chats were not used to improve the models. This got my attention since that is an issue lots of folks have with OpenAI. When I dug a bit deeper, I saw that this applies because I have a work account (long story, but I've had an organization account for my personal use). Qualifying editions of Google Workspace provide access to Gemini Apps with "enterprise-grade data protections". I'm not sure exactly what enterprise-grade data protections means, but it sure sounds good if I'm worried about employees putting proprietary or sensitive data into chat windows. You can find the details I found here.
If you are using Google Gemini with a personal account, the terms are very different. It appears that your chats will be used to improve the model, similar to OpenAI. You can learn more here. I recommend you consider this when putting information into your chats.
I tried Gemini on a couple of coding examples and for the most part was happy with the results. In one case, I asked Gemini to write a Regex for .NET that parses a person's full name into first and last name. The first response was very quick and generated a C# console application with a Regex, and it even included a description of the Regex.
When I tested the code and Regex, the samples Gemini provided did not all work. When I explained the issues to Gemini, it was apologetic and revised the pattern. The new results were better but still not perfect. After a few tries, we got a working solution. It would have been nice if Gemini nailed it on the first try, but this is a tricky problem to solve and I was able to arrive at a solution in a fraction of the time it would take to code the Regex myself.
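To give a feel for why this problem is tricky, here is a minimal sketch of the idea. It is not the pattern Gemini produced, and it is written in Python rather than C# so it can be run quickly. The pattern and the `split_full_name` helper are my own illustrative assumptions: the first word is treated as the first name, the last word as the last name, and anything in between is ignored.

```python
import re

# Illustrative pattern (an assumption, not Gemini's output): first token is
# the first name, last token is the last name, middle tokens are skipped.
NAME_PATTERN = re.compile(r"^\s*(?P<first>\S+)(?:\s+.*?)?\s+(?P<last>\S+)\s*$")


def split_full_name(full_name: str):
    """Return (first, last), or None when the input has fewer than two words."""
    match = NAME_PATTERN.match(full_name)
    if match is None:
        return None
    return match.group("first"), match.group("last")


if __name__ == "__main__":
    for name in ["Brad Jolicoeur", "Mary Ann Smith", "Cher"]:
        print(repr(name), "->", split_full_name(name))
```

Even this simple version gets suffixes such as "Jr." and multi-word surnames wrong, which is the kind of edge case that took a few rounds with Gemini to sort out.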
At one point, I decided I needed a Polyglot Notebook to test what I was getting from Gemini. When I asked Gemini to generate the output in .NET for a Polyglot Notebook, it quickly generated code in Python, C# and F# that can be pasted into a Polyglot Notebook. This surprised me. These are all languages that Polyglot Notebook supports, so in hindsight it makes sense. I was mostly surprised that Gemini realized I might want the three options.
In exploring the Gemini API, I found Google AI Studio. Google AI Studio is a site that helps you experiment with creating solutions with generative AI. It has options to create and tune prompts and to improve models with additional data, plus links to the Gemini API documentation.
I generated a free API key from Google AI Studio to use in a sample project. At this time, you can generate a key without a credit card and the free tier is very generous. For most personal projects, you are not likely to exceed the limits on the free tier. This is exciting since OpenAI requires a credit card and the purchase of credits to use the API.
Note that the free tier will use your data for improving the model. The Pay-as-you-go pricing does require billing setup and it does not use your data to improve their model.
I was excited to see articles on using Gemini with Semantic Kernel, but found it is definitely early days. The required package for Google is still alpha versioned and requires a pragma to compile. I updated my article scan example to use Gemini instead of OpenAI in a few minutes and ran it. Disappointment quickly ensued. The output was not in JSON format and was not coherent. I used the gemini-1.5-pro model, which seems to be the latest and greatest model.
After a quick search, it looks like Gemini supports structured output. I'm guessing that the alpha package for Semantic Kernel has not implemented this feature yet. The JSON output option for Semantic Kernel is still in evaluation status and also requires a pragma to compile, so that seems reasonable.
After reading the Google AI documentation on JSON output for a bit, I decided to try revising my prompt and had success. The output makes sense now and is mostly in JSON format. For some reason the Gemini output is wrapped in Markdown, which is inconvenient but not a showstopper. I wonder if some fiddling with the prompt or other settings could resolve that too.
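One way to work around the Markdown wrapping on the client side is to strip the code fence before parsing. The sketch below is a hedged illustration in Python, not what my notebook does; the `parse_model_json` helper and its pattern are hypothetical names I made up for this example.

```python
import json
import re

# Matches a reply wrapped in a Markdown fence such as ```json ... ``` and
# captures what sits between the opening and closing fence lines.
FENCE_PATTERN = re.compile(r"^\s*```[a-zA-Z]*\s*\n(.*?)\n?\s*```\s*$", re.DOTALL)


def parse_model_json(raw: str):
    """Parse JSON from a model reply, with or without a Markdown code fence."""
    match = FENCE_PATTERN.match(raw)
    payload = match.group(1) if match else raw
    return json.loads(payload)


if __name__ == "__main__":
    fenced = "```json\n{\"Author\": \"Brad Jolicoeur\"}\n```"
    print(parse_model_json(fenced)["Author"])
```

Replies that are already plain JSON pass straight through to `json.loads`, so the same helper can be used for both platforms.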
Output from Gemini (yep, it included the ```json in the output):
```json
{
  "Author": "Brad Jolicoeur",
  "PublishDate": "2024-09-28T00:00:00Z",
  "Title": "Convert HTML into JSON using Semantic Kernel and OpenAI",
  "Summary": "Improved page scanning accuracy by converting HTML to markdown and extracting metatags with XPath, then leveraging OpenAI's JSON output for structured data.",
  "KeyWords": "Semantic Kernel, OpenAI, HTML, JSON, Web Scraping, Data Extraction",
  "ImageUrl": "https://storage.googleapis.com/blastcms-prod/blog-blastcms/3d223ea9-420e-4622-90b1-b8beba986840-20240928183627.jpg"
}
```
Output from OpenAI:
```json
{
  "Author": "Brad Jolicoeur",
  "PublishDate": "2024-09-28T00:00:00Z",
  "Title": "Convert HTML into JSON using Semantic Kernel and OpenAI",
  "Summary": "Utilized Semantic Kernel and OpenAI to enhance page scanning accuracy by converting HTML to Markdown and extracting metatags, resulting in structured data outputs.",
  "KeyWords": "OpenAI, Semantic Kernel, HTML, JSON, Data Extraction",
  "ImageUrl": "https://storage.googleapis.com/blastcms-prod/blog-blastcms/3d223ea9-420e-4622-90b1-b8beba986840-20240928183627.jpg"
}
```
Since I'm just doing an exploration, I'm not going to get into the details of the prompt and Semantic Kernel, but you can find my notebook in this GitHub repo if you are interested.
While it is early days, it is very encouraging that I was able to take a Semantic Kernel example that uses OpenAI and in a few minutes revise it to use Gemini. This is powerful since I spent no time learning the Gemini API to get my example to work. My knowledge of Semantic Kernel transferred.
The reality that the prompts need to be reworked between the two platforms is an eye opener though. The API abstraction works, but prompt refinement is the most time-consuming part of integrating with the API. Expecting Semantic Kernel to let you switch between platforms without reworking your prompts is not realistic.
Beyond Semantic Kernel, I found a couple of .NET SDKs for accessing Gemini directly. As you might guess, the Google-provided SDKs are for Python and Go.
Both of the .NET SDK options I found seemed relatively simple and straightforward to use. Creating a solution using one or both of these SDKs is now on my to-do list.
You also have the option to craft your own client for the Gemini REST API.
https://ai.google.dev/gemini-api/docs/quickstart?lang=rest
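To give a rough idea of what a hand-rolled client involves, here is a minimal Python sketch using only the standard library. I have not built this into a project; the endpoint path, the `contents`/`parts` request shape, the `candidates` response shape and the `responseMimeType` setting reflect my reading of the REST documentation and should be checked against the quickstart above. The function names are hypothetical, and the model name is simply the one used earlier in this article.

```python
import json
import urllib.request

# Assumed endpoint shape; verify against the REST quickstart before relying on it.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(api_key: str, prompt: str, model: str = "gemini-1.5-pro",
                  json_output: bool = False) -> urllib.request.Request:
    """Build (but do not send) a generateContent request."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    if json_output:
        # Assumed structured-output switch: ask for JSON instead of free text.
        body["generationConfig"] = {"responseMimeType": "application/json"}
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}:generateContent?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Join the text parts of the first candidate in a response."""
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


def generate(api_key: str, prompt: str, **options) -> str:
    """Send the request and return the reply text (needs network access)."""
    with urllib.request.urlopen(build_request(api_key, prompt, **options)) as resp:
        return extract_text(json.load(resp))
```

Keeping the request building and response parsing separate from the network call makes the pieces easy to test without an API key. Calling `generate(my_key, "Summarize this page", json_output=True)` would be the structured-output variant, assuming the setting behaves as documented.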
My initial impression of Gemini, after a minimum of exploration and testing, is that it compares closely with OpenAI. It is easy to use, responds quickly and has good tooling around it for building solutions. I really liked Google AI Studio and that the API doesn't require payment to use for small projects.
If your organization already uses the Google ecosystem, then I think Gemini is where you probably want to start with building solutions. The data protections you get are key, and you likely don't need to pay any extra fees to get them.
For my personal use, I'm probably going to spend more time using Gemini over OpenAI. I've had really good responses from Gemini that are fast and accurate. The free API access is a big plus as well.