June 2025, Swapneel on Hacking Algorithms, DW Global Media Forum, and Fairify
Co-founder Swapneel Mehta appeared on the MAD Warfare podcast and attended the DW Global Media Forum in Bonn. Plus: introducing Fairify, a custom bias detection tool for NLP models, and our June reading list.
Our co-founder, Swapneel Mehta, was featured on the "MAD Warfare: Mutually Assured Distraction" podcast, discussing how to "hack the algorithm." He explored how users can take control of online personalization to get relevant content, rather than letting algorithms dictate feeds. The discussion covered strategies to make algorithms work for you, from adjusting ad settings to optimizing searches for grants or discounts.
Swapneel also attended the DW Global Media Forum in Bonn, joining discussions on media development in the era of AI. Conversations centered on media organizations taking ownership of AI technology, the need for sustainable journalism business models, and the evolving role of journalists in content delivery. He also highlighted the challenge of ensuring reliable access to information in digitally underserved regions and expressed optimism about future collaborations at the intersection of AI and journalism.
Fairify is a bias detection tool that enables engineers to assess their NLP models for biases specific to their use cases. As NLP models are increasingly integrated into sensitive domains, issues like stereotype reinforcement and unfair treatment have become significant challenges. Fairify helps address these by offering targeted bias evaluation. It also allows developers to create custom datasets tailored to their applications, enabling more accurate testing.
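To make the idea of use-case-specific bias evaluation concrete, here is a minimal sketch of one common approach, template-based counterfactual testing: fill the same sentence templates with different group terms and compare the model's average score per group. This is not Fairify's API or implementation. The function names (`build_dataset`, `group_score_gap`), the templates, and the toy scoring model are all hypothetical, and the toy model has a bias planted in it on purpose so the check has something to detect.

```python
from statistics import mean

POSITIVE_WORDS = {"skilled", "reliable"}


def toy_model(text: str) -> float:
    """Hypothetical stand-in for an NLP model; returns a score in [0, 1]."""
    words = text.lower().replace(".", "").split()
    raw = sum(w in POSITIVE_WORDS for w in words)
    # Deliberately planted bias (assumption for illustration only).
    if "women" in words:
        raw -= 1
    return max(0.0, min(1.0, 0.5 + 0.25 * raw))


def build_dataset(templates, groups):
    """Create a custom dataset: each template filled in once per group."""
    return {g: [t.format(group=g) for t in templates] for g in groups}


def group_score_gap(model, dataset):
    """Return the mean score per group and the largest gap between groups."""
    means = {g: mean(model(s) for s in sentences) for g, sentences in dataset.items()}
    gap = max(means.values()) - min(means.values())
    return means, gap


# Templates would normally be written for the application being tested.
templates = [
    "{group} are skilled engineers.",
    "{group} make reliable managers.",
]
dataset = build_dataset(templates, groups=["women", "men"])
means, gap = group_score_gap(toy_model, dataset)
print(means, gap)
```

In practice, the toy model would be replaced by the model under test, and the templates by sentences drawn from the target domain. A gap near zero suggests the model treats the groups similarly on these templates; it does not show the model is free of bias elsewhere.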
- MAD Warfare Podcast: Co-Founder Swapneel Mehta on Hacking Algorithms
Listen to the episode on Apple Podcasts or Spotify to hear how users can take control of online personalization to get relevant content, rather than letting algorithms dictate feeds.
- Swapneel Mehta's Insights from DW Global Media Forum: Navigating AI & Journalism's Future
Swapneel's takeaways from the forum's discussions on media development in the era of AI, including media organizations taking ownership of AI technology and sustainable business models for journalism.
- How a Danish News Service Made a Profit with its Transcription Tool
Zetland's Good Tape has transcribed 10M+ audio files, driving $3M in recurring revenue with its secure, AI-powered multilingual tool.
- ByteDance's Seedance: Cost-Effective AI Text-to-Video Generation
Seedance 1.0 Lite is ByteDance's advanced text-to-video AI model, available through fal.ai, designed for quick and efficient video creation. It transforms text prompts into high-quality 720p videos, costing just $0.18 for a 5-second clip. Ideal for social media and rapid prototyping, Seedance also offers an Image-to-Video feature and a Pro version for premium 1080p outputs with advanced camera controls.
- Joint Retrieval and Recommendation Modeling
Search engines use explicit queries to retrieve relevant items, while recommendation systems rely on user behavior to suggest content. Traditionally, these systems use separate models and pipelines, leading to high maintenance costs and complexity. Joint modeling treats both as top-K retrieval problems with different context sources, enabling a unified approach.
- Python's Instructor Library: Profitable Open-Source Growth
Instructor, a Python library for structured LLM outputs, has reached over 3 million monthly downloads and 11,000 GitHub stars through organic growth, with no external money raised. The project's popularity has translated into $1.4 million in top-line revenue for 567stud.io.
- AI's Power Requirements Under Exponential Growth
AI data centers are driving unprecedented electricity demand, projected to reach 68 gigawatts globally by 2027, nearly matching California's 2022 total capacity. This rapid growth strains U.S. power grids and may force companies to build data centers abroad. Large AI training runs could require up to 8 gigawatts at a single site by 2030 if current trends continue.
- Understanding the Artificial Intelligence Diffusion Framework
The Framework for Artificial Intelligence Diffusion was introduced in January 2025 to manage the global diffusion of AI technology, maintain U.S. leadership, and address security risks. The framework used control over AI chip compute power as a strategic tool and introduced new export controls and licensing regimes for advanced AI chips.
- China's AI Models Are Closing the Gap—but America's Real Advantage Lies Elsewhere
China is expected to match U.S. AI model capabilities in 2025, raising concerns about America's technological lead. Despite this, the U.S. retains a significant advantage in total compute capacity and access to advanced AI chips. A major export control breach in 2024 allowed Huawei to obtain millions of advanced chips, boosting China's AI hardware resources.
- What DeepSeek Really Changes About AI Competition
DeepSeek has released AI models that match GPT-4 and OpenAI o1 performance while using significantly less computing power and costing much less. Their models are openly downloadable and licensed under MIT, making advanced AI widely accessible. This shift increases the importance of computing resources for scaling and deployment.
- Your Brain on ChatGPT: Cognitive Debt with AI Writing
Relying on ChatGPT for essay writing leads to reduced brain engagement, weaker neural connectivity, and lower memory recall. Students using ChatGPT report less deep processing, mental effort, and sustained attention compared to those writing without AI. Switching back to unaided writing after using ChatGPT does not fully restore cognitive engagement, indicating the accumulation of cognitive debt.
- Human Trust in AI Search: A Large-Scale Experiment
A large-scale experiment with nearly 5,000 U.S. participants compared trust in generative AI (GenAI) search results to traditional search. Users generally trusted GenAI search less than traditional search, especially on sensitive topics like inflation, vaccines, and climate change. Features such as reference links and explanations increased trust and willingness to share GenAI results.
- Why 300,000 Scholars Are Migrating from Twitter/X to Bluesky
This research analyzes the migration of 300,000 academics from Twitter/X to Bluesky between 2023 and early 2025. It finds that 18% of scholars in the sample transitioned, driven mainly by information sources and peer influence rather than by traditional academic metrics. Simple contagion accounts for two-thirds of exits, while shock-driven bursts contribute 16%.
- The Impact of Generative AI on Social Media: An Experimental Study
Researchers conducted a controlled experiment with nearly 700 U.S. participants to assess the effects of generative AI integration on social media. Participants were exposed to four different AI treatments and a control condition. The study found that generative AI tools significantly influenced user behavior and content creation on the platform.
- Who is using AI to code? Global diffusion and impact of generative AI
Generative AI tools are increasingly used in coding to enhance productivity by automating repetitive tasks and improving code consistency. These tools can generate code fragments, enforce coding standards, and aid in software testing. However, their use raises concerns about widening skill and income gaps and potentially reducing developers' understanding of generated code.
- The Schwurbelarchiv: a German Language Telegram dataset for the Study of Conspiracy Theories
The Schwurbelarchiv is a large-scale German-language Telegram dataset focused on conspiracy theories, misinformation, and political extremism. It includes over 6,000 groups and channels, 40 million text messages, and 3 million transcribed audio files from 2016 to 2022. Data was collected using snowball sampling and includes multimodal content for comprehensive analysis.
- TGDataset: Collecting and Exploring the Largest Telegram Channels Dataset
TGDataset is the largest publicly available collection of Telegram channels, containing 120,979 channels and over 400 million messages. The dataset enables large-scale analysis of information flows, language communities, and the spread of both legitimate and problematic content, including conspiracy theories and extremism.
- Updesh: Advancing LLMs for 13 Indian Languages
Updesh is a new, large-scale synthetic dataset designed by Microsoft Research India to enhance the post-training of Large Language Models (LLMs) for 13 Indian languages, including Hindi, Tamil, and Bengali. It addresses the critical gap in culturally grounded, high-quality instruction-tuning data for Indic languages, a gap that matters especially for Small Language Models (SLMs) in India.
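
For readers curious about the joint retrieval and recommendation item in the list above, here is a minimal sketch of what "both are top-K retrieval problems with different context sources" can look like. It assumes items and contexts live in a shared embedding space and are scored with a dot product; the item catalog, the vectors, and the function names (`top_k`, `search_context`, `recommendation_context`) are hypothetical and are not taken from the linked article.

```python
import heapq

# Hypothetical item embeddings in a tiny 3-dimensional space.
ITEMS = {
    "running shoes": [1.0, 0.0, 0.0],
    "trail map": [0.8, 0.6, 0.0],
    "cookbook": [0.0, 0.0, 1.0],
    "tent": [0.0, 1.0, 0.0],
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(context_vec, k, exclude=()):
    """Shared retrieval step: score every item against one context vector."""
    scored = (
        (dot(context_vec, vec), name)
        for name, vec in ITEMS.items()
        if name not in exclude
    )
    return [name for _, name in heapq.nlargest(k, scored)]


def search_context(query_vec):
    """Search: the context comes from an explicit query embedding."""
    return query_vec


def recommendation_context(history):
    """Recommendation: the context is the mean embedding of past items."""
    vectors = [ITEMS[name] for name in history]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


# Same top_k call, two different context sources.
search_results = top_k(search_context([0.0, 1.0, 0.0]), k=2)
history = ["running shoes", "tent"]
recommendations = top_k(recommendation_context(history), k=1, exclude=history)
print(search_results, recommendations)
```

The design point is that only the context builder differs between the two tasks, so a single index and scoring path can serve both. A production system would use learned embeddings and an approximate nearest-neighbor index rather than a full scan.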
