December 2025: Healthcare LLM Evaluations, Sakhi's Story, and Splice Beta
Benchmarking AI models for maternal health across English, Hindi, and Marathi. The story behind Sakhi, our WhatsApp-based maternal health literacy agent. Splice Beta 2025 in Chiang Mai, and the GTS Innovation Dialogue with Carnegie India.
We are benchmarking leading AI models using the Sakhi Dataset, a parallel set of maternal health questions in English, Hindi, and Marathi that is currently undergoing expert validation. Early findings reveal a critical safety concern: AI models perform inconsistently across languages, which means they could deliver incomplete or inaccurate maternal health guidance at scale.
Sakhi is a WhatsApp-based maternal health literacy conversational agent providing reliable, verified information in local languages. Over recent months, we surveyed women in Jalgaon, Maharashtra, and piloted Sakhi using a human-in-the-loop approach, where healthcare professionals reviewed content and handled queries beyond our curated, expert-verified knowledge base.
AI healthcare tools reach millions of people in India, but safety evaluations focus overwhelmingly on English. Little is known about how models perform in Hindi and Marathi, which risks delivering misleading guidance at scale to the populations who need it most. Our goal is to help ensure that future health AI systems provide safe and equitable guidance across all languages and communities.
We attended the Splice Beta 2025 conference in Chiang Mai, Thailand, connecting with the global community working at the intersection of AI, digital discourse, and social impact. We also attended the GTS Innovation Dialogue, co-hosted by Carnegie India and the Ministry of External Affairs, India, to discuss AI adoption frameworks.
- AI Scientist Thinks: Watching Artificial Intelligence Reason
An interface that makes AI Scientist-v2's reasoning process interpretable, allowing users to observe how artificial intelligence thinks.
- Distributed GPU Training
A comprehensive breakdown of distributed GPU training for large-scale models, covering silicon-level details, parallelism paradigms, and infrastructure challenges.
- C++ Design Patterns for Low-Latency Applications
Advanced C++ design patterns tailored for low-latency applications, with a focus on high-frequency trading systems.
- Unpacking Deceptive Design
Google Public Policy explores how deceptive designs (dark patterns) affect user trust online, based on a survey of 12,000 users across six European countries.
- Law and Policy: Digital Services Act
Legal implications of the DSA, focusing on digital platform regulation and fair, transparent online services.
- DSA Platforms: Digital Services Act
Practical implementation and legal framework of the DSA, emphasizing platform regulation and fundamental rights in the digital space.
- How Conversational Structure and Style Shape Online Community Experiences
How the structure and linguistic style of conversations predict a sense of virtual community, using data from over 2,800 Reddit users.
- ChatGPT Does Not Replicate Human Moral Judgments
While ChatGPT ratings of moral scenarios correlate strongly with average human judgments, the AI systematically deviates in predictable ways.
- Polarization Is Increasing Online
A PNAS paper finds that both ideological divides and emotional intensity are on the rise across major social media platforms.
- The Emerging Market for Intelligence: Pricing, Supply, and Demand for LLMs
Analysis of the LLM market using API usage data from OpenRouter and Azure, showing rapid growth, significant price declines, and competitive open-source models.
- Global Claims: A Multilingual Dataset of Fact-Checked Claims
A large-scale dataset of 67,000 fact-checked claims from over 200 fact-checking websites in 50 languages.
- Global YouTube Trending Dataset (2022-2025)
Three years of YouTube Trending videos, from July 2022 to June 2025, with four daily snapshots across 104 countries, totaling 78.4M video entries.
