Top 5 News in Data & AI from the Past Week

From Westworld-Inspired AI Realms to Pioneering Tech Milestones.

Aug 14, 2023

Welcome to your essential weekly digest in the realm of data and AI. As the pace of innovation continues to accelerate like crazy, it's easy to miss out on key developments that could reshape the way we think about technology. But don’t worry, I've got you covered. Here's a concise roundup of the 5 five stories from the past week that every data aficionado should be aware of, and my humble take on it. Dive in and stay ahead of the curve:

The AI Agents in a Westworld-like Environment Are Now Open-Source

A few months ago, a bunch of Stanford folks started a project where 25 autonomous agents emulated human dynamics in a sandbox reminiscent of Westworld was disclosed. The authors have now open-sourced the entire code. When I see this, I instantly relate to the 6 Foundation Engines we are working on at Naas and envision the myriad of business applications we could craft. Imagine agents representing different organizational departments, seamlessly communicating and proactively addressing challenges. The potential for real-time problem-solving and inter-departmental collaboration is vast. However, this virtual environment is said to be very costly so keep an eye on your bill.

View the open-sourced code and delve deeper

Language Models Echoing User Opinions: A Critical Analysis

Have you ever heard about sycophancy? I did not until this week. It refers to the act of excessively flattering or praising someone in order to gain favor or advantage. A recent paper highlighted this concerning trait of larger and instruction-fine-tuned language models: they tend to agree with user opinions, even if they are incorrect. To combat this, researchers are experimenting with synthetic data interventions, aiming for models that are more resilient against user biases. I have often experienced this with OpenAI's GPT models and I think it’s absolutely essential to have people working to improve this. We don’t want to use models that simply agree with anything we say. Do we?

Read the full research paper here

NVIDIA’s Game-Changing Fix for the GPU Crisis

The competition to obtain more computing power and capacity to train large language and vision models has led to an increased demand for GPUs and NVIDIA is leading the charge once again. They made headlines last week with their next-gen superchip, GH200 Grace Hopper, promising exceptional performance, scalability, and memory. These Superchips can swiftly shuffle data, and connect with more Superchips for added memory and computing power. What’s nice to see is that NVIDIA's collaboration with Hugging Face should enable massive AI open-source model development beyond Elon and Altman's needs.

Learn more about NVIDIA's announcements

Stability Unveils StableCode: A New Future of AI-Powered Coding?

Stability AI has launched StableCode, a new AI-driven coding assistant that promises to challenge Copilot, Whisperer, and other proprietary assistants. It aims to be an invaluable asset for coders.

At the core of StableCode are its three multifaceted models:

The Base Model: This foundational model underwent training using a diverse set of programming languages: Python, Go, Java, JavaScript, C, Markdown, and C++, creating a comprehensive repository of programming knowledge.
The Instruction Model: This layer in the StableCode framework is specifically designed to cater to particular programming challenges. It was built using around 120,000 instruction/response pairs leading to a specialized solution capable of addressing complex programming tasks.
The Long-Context Window Model: This model highlights StableCode's offerings. Unlike previous models with a context window of 16,000 tokens, this one can handle 2-4 times more code. This expansive capacity allows programmers to manage the equivalent of multiple average-sized Python files simultaneously, enhancing autocomplete suggestions and offering a broader context for intricate coding projects.

I hope we can soon try them out to generate PR proposals for our open-source repositories, particularly for our notebooks templates.

Data Privacy Concerns with Zoom

Recent concerns over Zoom's data usage and privacy practices have ignited debates in the data community. Although Zoom updated its terms of service right after the scandal, emphasizing that they do not use customer content for AI model training without consent, it raises broader questions. How many companies are accessing our data without clear disclosure? And why isn't there a more robust data privacy regulation in place? I would recommend watching Aleksandr Tiulkanov's video about this. He brilliantly highlighted how they use legalese to prevent normal people (non-lawyers) from understanding what is actually going on.

Watch Aleksandr’s analysis

That wraps up our updates for this week. If you have feedback or feel there's a crucial piece of news we missed, do let us know.

Until next time, stay data-informed and tech-savvy!

The Lean Data Journal

Discussion about this post