Can they do it, or not? AI companies claim, very enthusiastically, that their models range from good to amazing at almost everything under the sun. Supposed prowess at programming, coding and debugging tasks is one of the reasons for that confidence. There’s a laundry list of AI models laying claim to it. OpenAI’s GPT-4 and GPT-4o, Anthropic’s Claude (whose models also underpin Canva’s new Code feature; we’ll talk more about that in a bit), Google Gemini (and potentially even AlphaCode) and Microsoft’s own Copilot are among the general-purpose AI models; and amongst the developer-focused AI tools, there’s of course a case to be made for Amazon’s CodeWhisperer, as well as GitHub Copilot, which is built on OpenAI’s models. In 2023, GitHub CEO Thomas Dohmke predicted that “Sooner than later, 80% of the code is going to be written by Copilot. And that doesn’t mean the developer is going to be replaced.” There’s more to this story than unbridled optimism.

A new study by Microsoft Research (you can read more, if this piques your interest) suggests that many AI models still fail to correctly debug the software issues they are tasked with, a finding illuminated by the software development benchmarks SWE-bench Lite, Mini-nightmare and Aider. The researchers compared nine models: from OpenAI (GPT-4o, GPT-4o mini, o1 Preview and o3-mini), Anthropic (Claude 3.7 Sonnet), Meta (Llama-3.2-3B-Instruct and Llama-3.3-70B-Instruct), as well as DeepSeek (DeepSeek-R1-Distill-Llama-70B and DeepSeek-R1-Distill-Qwen-32B).
The results are worrying. No AI model crossed a 50% success rate on the debugging tasks posed by SWE-bench Lite. Anthropic’s Claude 3.7 Sonnet led the charts with 48.4%, followed by OpenAI’s o1 (30.2%) and o3-mini (22.1%). There is potentially light at the end of this dark tunnel, if one keeps walking nonetheless. “We believe this is due to the scarcity of data representing sequential decision-making behaviour (e.g., debugging traces) in the current LLM training corpus,” the researchers say. If data is the weak link, then getting more training data shouldn’t be a problem. It was said long ago that data is the new oil, and AI companies have lots of it.
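To make the researchers’ point about “debugging traces” concrete, here is a minimal, entirely hypothetical sketch of the kind of sequential record (tool calls, observations, an eventual fix) that they argue is scarce in today’s training corpora. The field names, the pdb commands and the toy success-rate calculation are illustrative assumptions, not taken from the paper or from the benchmarks.

```python
# Hypothetical sketch of a sequential debugging trace: a chain of tool calls
# (pdb-style commands) and observations ending in a fix. All field names and
# values are illustrative only.
debug_trace = {
    "issue": "TypeError when summing scores that may contain None",
    "steps": [
        {"action": "pdb: b scores.py:42",        "observation": "breakpoint set"},
        {"action": "pdb: c",                     "observation": "stopped at line 42, score=None"},
        {"action": "pdb: p scores",              "observation": "[10, None, 7]"},
        {"action": "edit: drop None before sum", "observation": "tests pass"},
    ],
    "resolved": True,
}

def success_rate(traces):
    """Share of issues resolved -- the metric behind the percentages quoted above."""
    return sum(t["resolved"] for t in traces) / len(traces)

print(f"Success rate: {success_rate([debug_trace]):.1%}")
```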
REALIGNMENT?
We are hearing that iPadOS 19 (the first glimpses of which we’ll see at this summer’s WWDC conference) will be closer to the workings and behaviour of macOS than it has ever been. That is perhaps the best news for iPad users, at least the ones (I’ll admit, I’m in that demographic) who use a keyboard and make genuine attempts to replicate the productivity, multitasking and app window management of macOS on iPadOS. That dream may well be coming true; for now, let us keep our fingers crossed.
Our coverage of the Apple iPad and iPadOS evolution…
SPEED
The pace of AI development is at the blink-and-you-miss-it stage. OpenAI has introduced GPT-4.1, a successor to last year’s GPT-4o multimodal AI model. The claims, of course, are that it is better in every respect, with particular improvements cited for coding (OpenAI’s benchmark tests peg it at as much as 21.4 percentage points better than GPT-4o) and instruction following (a claimed 10.5 point increase over GPT-4o). GPT-4.1 arrives alongside GPT-4.1 Mini and GPT-4.1 Nano, and OpenAI confirms these models have larger context windows, supporting up to 1 million tokens of context. This comes a few weeks after Google released the Gemini 2.5 Pro ‘thinking model’, something we’d analysed in detail.
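For readers who want a feel for what a million-token context window means in practice, here is a minimal sketch using OpenAI’s Python SDK and its standard Chat Completions call. The exact model identifiers (“gpt-4.1”, “gpt-4.1-mini”, “gpt-4.1-nano”) and the file being read are assumptions for illustration, not something OpenAI or this newsletter prescribes.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# A very large context window matters most when an entire codebase or document
# set can be sent in a single request, rather than in chunks.
with open("repo_dump.txt", encoding="utf-8") as f:  # hypothetical flattened codebase
    codebase = f.read()

response = client.chat.completions.create(
    model="gpt-4.1",  # assumed identifier; "gpt-4.1-mini" / "gpt-4.1-nano" for lighter variants
    messages=[
        {"role": "system", "content": "You are a careful code reviewer."},
        {"role": "user", "content": f"List likely bugs in this codebase:\n\n{codebase}"},
    ],
)
print(response.choices[0].message.content)
```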
DECISIVENESS

It is a delicate dance, one with no particular cohesion or sequence of steps, for big tech companies still trying to understand what happens next with the trade war. From an India perspective, I’ve tried to illustrate the opportunity and the importance of this moment. India should benefit not just from its large consumer base, but also from existing electronics manufacturing infrastructure that has well and truly plugged into global supply chains. The need of the hour is to corner a large share of the pie, and with it, long-term partners and big numbers.
The Production-Linked Incentive (PLI) scheme announced in March now gains even more importance with the passage of time (no one expected Donald Trump’s tariff uncertainties to go on this long). Here’s what has transpired.
- The Government of India has announced significantly increased budget allocations for key sectors under the Production-Linked Incentive (PLI) scheme for 2025-26.
- This includes Electronics and IT Hardware, where the allocation rises from ₹5,777 crore (the revised estimate for 2024-25) to ₹9,000 crore, and Automobiles and Auto Components, earmarked for an increase from ₹346.87 crore to ₹2,818.85 crore.
Below are some examples of the base from which India must build quickly.
- Apple, via suppliers such as Foxconn, Pegatron and Tata Electronics, has detailed plans to increase manufacturing in India; one barometer is that exports from India are projected to cross $15 billion in 2025.
- Micron Technology has lined up investments in Gujarat for its assembly, testing, marking and packaging (ATMP) plant.
- Tata Electronics has committed ₹152,000 crore (around $18 billion) for India’s first major semiconductor fab, in Gujarat’s Dholera, making chips at 28nm and above.
EVOLUTION

This week, the annual Canva Create keynote gave us a view of what the still-one-of-its-kind creative suite has in store by way of new upgrades. I had a keen eye on this one, considering Canva made some major forward movement at last year’s Create. Canva’s competition isn’t one rival, as I often point out; its competitive landscape includes not just creative and workspace tools from the likes of Adobe, Google and Microsoft, but also an increasing number of generative AI tools that rightly stake a claim to creative prowess themselves. The Ghibli-style smarts of OpenAI’s ChatGPT are an example with image generation, while Google’s Gemini 2.5 stakes a claim as the most proficient coding model. Among all that is new (and there is a lot), a few things stand out in terms of relevance.
There is, of course, Canva AI, which underpins everything that is new. Cliff Obrecht, Canva’s co-founder and chief operating officer, tells us that increasingly powerful open-source models are helping the company build its own omni-models (multimodal AI systems that understand the relationships between text, images, audio and video in a prompt), as have smart acquisitions such as Leonardo.ai, which helped unlock a powerful image generation foundation model. Canva Code, meanwhile, is built in partnership with Anthropic.

Anthropic’s Claude 3.5 Sonnet, as well as other Claude models, are among the most refined AI models for coding tasks; they compete with Google’s Gemini 1.5 Pro, the new Gemini 2.5 and OpenAI’s GPT-4o. AI tools such as Cursor, Aider and Windsurf already offer coders a choice of these models (among others). As the name suggests, Canva Code tries to make it simpler and visually more intuitive for users to try their hand at building an app, with a text prompt explaining what they’d like it to do.
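As a rough illustration of what sits underneath prompt-to-app tools of this kind, here is a minimal sketch using Anthropic’s Python SDK to turn a plain-English brief into a small web page. The model identifier and the prompt are assumptions for illustration; Canva has not described its implementation in this detail.

```python
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

# Turn a plain-English brief into a small, self-contained web page -- conceptually
# the step a prompt-to-app tool wraps in a visual, shareable experience.
message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # assumed identifier; pick from Anthropic's current model list
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": (
            "Write a single-file HTML/JavaScript page that lists today's cricket "
            "matches from a hard-coded JSON array and lets me filter them by country."
        ),
    }],
)
print(message.content[0].text)  # the generated HTML/JS
```

What a product like Canva Code adds on top of this basic loop, presumably, is the visual layer: previewing, tweaking and sharing the result without ever looking at the code.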
Obrecht and I built an experimental app that compiled an exhaustive list of all cricket matches being played around the world on any particular date, with scores as well as details of the streaming platform or TV channel each will be broadcast on in the India region. The result, generated around a minute and a half after we submitted our prompt, was vividly detailed and impressive.
TRIAL
Proceedings have begun in the US in the landmark antitrust case against Meta, based on allegations that the company (previously Facebook) hurt competition and behaved like a monopoly when it bought the instant messaging app WhatsApp and the social media platform Instagram many years ago. As Federal Trade Commission (FTC) lawyer Daniel Matheson put it, “They decided that competition was too hard and it would be easier to buy out their rivals than to compete with them.” Meta, of course, doesn’t agree with this line of thinking. For its part, Meta may have a point that it wanted these apps to grow alongside Facebook; after all, the company never let its own services cannibalise each other (except perhaps Messenger, in favour of WhatsApp and Instagram direct messages). Could Meta be forced to spin off WhatsApp or Instagram (or, worst case for them, both)? This one will not have an easy answer. Or a quick one.