AI & Data Last Week Update: 5 picks for Lean Data Practitioners
Revolutionary Robots and Game-Changing AI Usability
Greetings everyone! A bit later than usual, since it is sometimes hard to step away from what we are cooking up at Naas, here's a dive into last week's AI and data developments and what they mean for lean data practitioners. I wanted to include the big news from Anaconda and Microsoft that Python is finally being integrated into Excel, but I will save that for a dedicated article after a good testing session. So here are the top 5 things (+ bonus) that caught our attention last week:
Self-Training and the Rise of Humpback
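Meta's new AI model makes a splash. Humpback is a LLaMA model fine-tuned with a new self-training method named instruction backtranslation, and it follows instructions better than other LLaMA-based models that do not rely on distilled data. The method essentially reverses the roles of instructions and answers: instead of writing answers for given instructions, the model writes instructions for text that already exists. The two-phase process involves: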
Self-augmentation: Generating instructions for unlabeled data.
Self-curation: Selecting high-quality augmented data based on a 5-point scoring system.
This method offers a fresh perspective on unlabeled data utility. Lean data practitioners can potentially reduce dependency on expensive labeled datasets and instead harness vast pools of unlabeled data, paving the way for cost-effective model training and refinement.
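To make the two phases concrete, here is a minimal sketch of the loop described above. It is not Meta's implementation: `toy_backward_model` and `toy_scorer` are hypothetical stand-ins for the language model calls, and the `min_score` threshold of 5 is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pair:
    instruction: str
    response: str
    score: int = 0


def self_augment(docs: List[str], backward_model: Callable[[str], str]) -> List[Pair]:
    """Phase 1: treat each unlabeled text as a response and generate an instruction for it."""
    return [Pair(instruction=backward_model(doc), response=doc) for doc in docs]


def self_curate(pairs: List[Pair], scorer: Callable[[str, str], int],
                min_score: int = 5) -> List[Pair]:
    """Phase 2: rate each pair on a 1-5 scale and keep only the high-scoring ones."""
    kept = []
    for pair in pairs:
        pair.score = scorer(pair.instruction, pair.response)
        if not 1 <= pair.score <= 5:
            raise ValueError(f"score must be between 1 and 5, got {pair.score}")
        if pair.score >= min_score:
            kept.append(pair)
    return kept


# Hypothetical stand-ins: in practice both roles would be played by a language model.
def toy_backward_model(text: str) -> str:
    # Build a fake instruction from the first sentence of the text.
    return "Write a passage about: " + text.split(".")[0]


def toy_scorer(instruction: str, response: str) -> int:
    # Toy heuristic: longer responses score higher, clamped to the 1-5 range.
    return max(1, min(5, len(response.split()) // 3))


unlabeled = [
    "Pandas is a Python library for data analysis. "
    "It offers DataFrames and many tools for cleaning data.",
    "Short text.",
]
candidates = self_augment(unlabeled, toy_backward_model)
curated = self_curate(candidates, toy_scorer)
for pair in curated:
    print(pair.score, "|", pair.instruction)
```

The curated pairs would then serve as fine-tuning data for the next round, which is where the savings on labeled data come from.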
RoboAgent: A Multitasking Marvel
Meta and CMU unveiled RoboAgent, a multitask agent that has learned 12 non-trivial manipulation skills (beyond picking and pushing, including articulated object manipulation and object re-orientation) across 38 tasks. It can generalize these skills to hundreds of diverse unseen scenarios involving unseen objects, unseen tasks, and even completely unseen kitchens.
This represents the future of automation. Instead of deploying multiple specialized bots for different tasks, a single, versatile RoboAgent can streamline operations, reducing overheads and increasing efficiency. This is a step towards a more integrated and cost-effective robotic automation landscape.
Learn more on their GitHub Pages website
OpenAI Acquires Global Illumination
OpenAI's acquisition of Global Illumination, the company behind an open-source, Minecraft-like game, hints at the potential development of an AI simulation platform. Key features of the game include in-browser creation and play, eliminating the need for installations.
Beyond gaming, the move towards browser-based platforms suggests that data-intensive tasks might soon shift to the cloud entirely. We should be prepared for a future where heavy installations become obsolete, and browser-based data operations become the norm, simplifying deployment and scaling.
Read the announcement published by OpenAI about it
Unitree's Affordable Humanoid Robot
China-based Unitree launched H1, an economically priced humanoid robot at under $90K. The company has not specified a release date, but said H1 would reach commercialization in the next 3 to 10 years. Tesla is thought to be targeting a price point of around $20,000 for its Optimus, but it is also not yet on the market. What's interesting is that Unitree's robots have already been involved in significant events like the Winter Olympics and the Super Bowl, and they also play roles in inspection and rescue missions.
The democratization of robotics means even smaller organizations can adopt advanced robotics. It's a hint toward a more physically interactive AI future.
Check out the post they did on X (Twitter)
Platypus: The New LLM Prodigy
This recently introduced large language model has secured the top spot on Hugging Face's Open LLM Leaderboard. Its outstanding performance is attributed to:
Curated datasets, with the open-source subset called OpenPlatypus.
Proactive measures against test data leaks and contamination.
Advanced fine-tuning techniques.
The meticulous data curation approach reinforces the age-old adage: quality over quantity. For lean data practitioners, it's a lesson in optimizing resources. Instead of massive, uncurated data lakes, the focus could shift to smaller, high-quality datasets, making data management more efficient and less resource-intensive.
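As an illustration of what a guard against test data leaks can look like, the sketch below drops training examples that overlap heavily with benchmark questions. It is not the Platypus pipeline: word n-gram Jaccard overlap is a simple stand-in technique, and the function names and the 0.5 threshold are assumptions made for this example.

```python
import re
from typing import List, Set, Tuple


def ngrams(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Lowercase the text, split it into alphanumeric tokens, and return its word n-grams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    """Jaccard similarity of two n-gram sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_contaminated(train: List[str], test: List[str],
                        n: int = 3, threshold: float = 0.5) -> List[str]:
    """Keep only training examples whose overlap with every test item is below the threshold."""
    test_grams = [ngrams(item, n) for item in test]
    kept = []
    for example in train:
        grams = ngrams(example, n)
        if all(jaccard(grams, tg) < threshold for tg in test_grams):
            kept.append(example)
    return kept


benchmark = ["What is the capital of France?"]
training = [
    "What is the capital of France",       # near-duplicate of a benchmark question
    "Explain how a binary search works.",  # unrelated, should be kept
]
clean = filter_contaminated(training, benchmark)
print(clean)
```

A smaller dataset that has passed a check like this is easier to trust than a large one that has not, which is the quality-over-quantity point in practice.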
Check the ranking on HF’s Open LLM Leaderboard
Bonus: Sketch a Sketch!
Sketch a Sketch, built by a Stanford student, transforms partial sketches into detailed images. This tool exemplifies the potential of AI in bridging gaps in incomplete data. In the future, similar tools might auto-complete missing data pieces, reducing the need for exhaustive data collection and manual imputation.
The past week hints at a future where efficiency, cost-effectiveness, and resource optimization are central. Lean data practitioners, gear up for exciting times ahead! Drop a comment if I missed any gems, and see you in the next update. Cheers!