In the last post, I explained the basic concepts of agentic coding - the terminology, the constructs, when to use what. It was mostly theoretical. This post is the practical follow-up that describes how I actually build features with AI agents day to day.
I hesitate to call these ‘best practices’ because in this fast-moving space, that just means ‘what someone discovered last week’. So think of this as a field report: here’s what works for me today, and why.
I used to think I loved coding, but over the last year I came to realize that what I love more is building - creating something useful and beautiful. This time last year, my coding workflow was to fire up VS Code with Claude in the browser or GitHub Copilot in ‘Ask’ mode, brainstorm with the model, review the solutions it suggested, copy code into the editor, test, and deploy 🚀. This was fun in the beginning, but the context-switching soon became tedious and broke the flow of building.
In our previous post, Mobile On-device AI: Smarter Faster Private Apps, we explored the fundamentals of running AI locally on mobile devices. Now, it’s time to get hands-on and see this technology in action!
This practical guide walks you through implementing mobile on-device AI using Google’s powerful Gemma model family, including the cutting-edge Gemma 3n. You’ll learn to deploy these models across iOS, Android, and web platforms using industry-standard frameworks.
What You’ll Learn Here
- Test various Gemma models, including Gemma 3n, in Google AI Studio.
- Run sample on-device AI applications on Android using tools like the Google AI Edge Gallery App and MediaPipe.
- Implement on-device Large Language Model (LLM) inference on iOS using MediaPipe.
- Explore how to run LLMs in mobile web browsers with JavaScript and MediaPipe.
- Gain practical experience deploying and interacting with Gemma models across different mobile platforms.
Prerequisites: Basic mobile development knowledge is helpful but not required. To get a feel for the API before we dive in, see the Android sketch just below.
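Here is a minimal sketch of what the Android path looks like with MediaPipe’s LLM Inference API, assuming the com.google.mediapipe:tasks-genai dependency is declared in Gradle and a Gemma model file has already been pushed to the device. The model path, token limit, and helper function shown here are illustrative, and option names may differ slightly between library versions.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal sketch: single-shot, on-device text generation with a Gemma model.
// The model path below is illustrative; push your downloaded model file to the
// device (e.g. with adb) and point setModelPath at that location.
fun runGemma(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma.task") // assumed location of the model
        .setMaxTokens(512)                              // cap on combined input + output tokens
        .build()

    // Creating the engine loads the model, which is expensive; a real app
    // would do this once and reuse the instance across prompts.
    val llm = LlmInference.createFromOptions(context, options)
    return llm.generateResponse(prompt)                 // blocking call; an async variant also exists
}
```

The iOS and web flavors follow the same create-the-task-then-generate pattern through MediaPipe’s Swift and JavaScript packages, which is what the rest of this guide walks through.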
While cloud computing drives many AI breakthroughs, a parallel revolution is happening right in our hands - running LLMs locally on mobile devices. This emerging field, known as Mobile On-device AI, enables us to build more private, faster, and smarter app experiences - especially as mobile devices become increasingly powerful. As a developer passionate about AI and mobile, I am fascinated by the convergence of these two worlds and the possibilities it brings.
Every day we’re seeing fantastic advancements in AI, thanks to more data and more powerful computers. This may make it seem like the future of AI is all about gathering even more data and building even bigger computers. But I believe a critical and rapidly evolving piece of the puzzle is bringing the Intelligence of Artificial Intelligence onto the devices where the data originates (e.g. our phones, cameras, and IoT devices) and doing the “smarts” using their own computing capabilities. This is the essence of Edge AI, the topic we’ll explore in this post.
We know that RAG (Retrieval-Augmented Generation) is a reliable mechanism to augment LLMs with up-to-date data and ground them in facts relevant to the context of the user query, thereby reducing hallucination. When set up properly, it works quite well. Companies like Perplexity AI, as well as many enterprise applications, use RAG extensively.
However, building a RAG pipeline on your own from scratch can be complex and high-maintenance. You need to assemble your data sources, chunk the data, index it, generate embeddings, and store them in a vector database. At inference time, you need to generate an embedding of the user query with an embedding model, retrieve the relevant data from the indexed store, and pass it to the LLM to produce a meaningful, context-aware response for the user. On top of that, any change in the data source means you need to re-index the data, regenerate the embeddings, and update the store. Rinse and repeat.
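To make those moving parts concrete, here is a toy, in-memory sketch of the flow in Kotlin. The embed and generate parameters are hypothetical stand-ins for a real embedding model and LLM call, and a production pipeline would use a vector database instead of a plain list, but the shape of the work - chunk, embed, index, then retrieve and ground the prompt - is the same.

```kotlin
import kotlin.math.sqrt

// Hypothetical, in-memory sketch of a RAG pipeline; not tied to any specific library.
data class Chunk(val text: String, val embedding: FloatArray)

fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var na = 0f; var nb = 0f
    for (i in a.indices) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
    return dot / (sqrt(na) * sqrt(nb))
}

// Index time: split the sources into chunks and embed each one.
fun buildIndex(documents: List<String>, embed: (String) -> FloatArray): List<Chunk> =
    documents.flatMap { it.chunked(500) }     // naive fixed-size chunking
             .map { Chunk(it, embed(it)) }

// Query time: embed the query, retrieve the closest chunks, and ground the LLM prompt.
fun answer(
    query: String,
    index: List<Chunk>,
    embed: (String) -> FloatArray,   // stand-in for an embedding model
    generate: (String) -> String,    // stand-in for an LLM call
    topK: Int = 3,
): String {
    val queryVec = embed(query)
    val context = index
        .sortedByDescending { cosine(queryVec, it.embedding) }
        .take(topK)
        .joinToString("\n") { it.text }
    return generate("Answer using only this context:\n$context\n\nQuestion: $query")
}
```

Every piece in that sketch is something you own and have to keep in sync with the data source, which is exactly the maintenance burden described above.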
Today I tried something fun - I built a Pomodoro timer app mostly by talking to an AI instead of typing the code myself. And guess what - there is a term for it: vibe coding, coined by Andrej Karpathy 😎.
I have done it a few times before, but this is the first time I have used it to build a full app. I wanted to create something useful that worked well, so I chose a Pomodoro timer. Here’s how it went and my key takeaways from this way of building products.
Today I tried Claude Code, the new agentic coding tool announced by Anthropic this morning. Unlike other agentic tools, Claude Code is a CLI tool.
Claude has been my favorite AI coding partner so far. I use it via GitHub Copilot and standalone through its web interface. I was curious to see how it works as a CLI, so I decided to give it a try.
In this post, I share my first impressions of using Claude Code - how I set it up, what I loved about it, what I didn’t, and how it compares to other similar tools.
Last week, I watched an interview with Aravind Srinivas, the CEO of Perplexity AI (https://www.perplexity.ai). It is a three-hour conversation with Lex Fridman in which Aravind talks about the major breakthroughs in AI that brought us to LLMs, the mission of Perplexity, how the technology works, his vision for the future of search and the web in general, and some valuable advice for startup founders and young people.
Fascinating interview - highly recommended for everyone to watch. Personally, it opened my eyes to the fact that Perplexity is very different from other chatbots - not only in how it works, but in what it is trying to solve. So I started using it, and within a few days I was blown away by the results 💯. I realized it is one of those tools that gives you so much value that you cannot imagine going back to your old way of doing things.
Perplexity AI (https://www.perplexity.ai) has been gaining attention in the world of chatbots and large language models. I had heard about it in a few forums and seen it mentioned by industry leaders like Jensen Huang and Kelsey Hightower. In fact, I had created an account and tried it out a few times earlier this year, but didn’t take it very seriously.
All that changed last week when I watched this recent interview of Perplexity CEO Aravind Srinivas by Lex Fridman. It is a fascinating interview - highly recommended for everyone to watch. Personally, it opened my eyes to the fact that Perplexity is very different from other chatbots, not only in how it works but in what it is trying to solve. So I started using it, and within a few days I was blown away by the results 💯. I realized it is one of those tools that gives you so much value that you cannot imagine going back to your old way of doing things.
The standout feature unveiled at this week’s Apple WWDC 2024 event was Apple Intelligence, a personal intelligence system that will be integrated into multiple platforms - iOS 18, iPadOS 18 and macOS Sequoia.
What is Apple Intelligence?
Apple Intelligence comprises multiple highly capable and efficient generative models - large language models and diffusion models. These include on-device models as well as server-based foundation models.
The foundation models are trained with Apple’s open-source AXLearn library for deep learning, built on top of JAX (a Python library for accelerated numerical computing and program transformation) and XLA (Accelerated Linear Algebra, an open-source ML compiler). The branding of Apple Intelligence is intriguing, positioning it as Apple’s take on “AI”.