Welcome to another edition of 'The Lean Data Journal'. Every Monday, we curate the most significant updates and advancements in the world of data and AI. In line with our commitment to lean data practices, we bring you five carefully selected picks from last week that matter to data professionals and enthusiasts. These picks highlight key developments that align with our mission of promoting minimalism, efficiency, and sustainability in the data landscape. Let's dive in and explore why these picks are crucial for the lean data movement:
Jupyter AI: AI-Powered Assistance for Jupyter Notebooks
Project Jupyter has introduced AI-powered assistance for Jupyter Notebooks through its jupyter-ai extension, and it's huge news for us notebook lovers! It could change the way data professionals work with this essential tool. With AI capabilities to explain code, fix errors, and generate code and entire notebooks from natural language prompts, Jupyter AI boosts productivity and supports a lean data workflow. I installed it last Friday on my local JupyterLab instance, and it helped me craft an analysis of my LinkedIn content in no time. Once we are done with V2 Alpha, we will surely integrate it into NaasLab.
Check out the repository on GitHub and start using it now
Python Aims to Remove the Global Interpreter Lock (GIL)
Python's ambition to remove the Global Interpreter Lock (GIL) could be a game-changer for data processing and analysis. Today the GIL allows only one thread at a time to execute Python bytecode, so CPU-bound code gains little from extra threads. Without it, threads could run Python code in parallel across multiple CPU cores, making multithreading far more effective for compute-heavy workloads. The GIL has long been a source of frustration for Python users, and removing it aligns perfectly with the lean data principles of maximizing resources and optimizing data operations, making Python an even more powerful tool.
Learn more about it on infoworld.com
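To see why the GIL matters, here is a minimal sketch (not taken from the InfoWorld article) that runs the same CPU-bound job with one thread and with four threads using the standard library's ThreadPoolExecutor. The prime-counting workload and the job sizes are arbitrary illustrative choices. On a standard CPython build, the four-thread run is not expected to be meaningfully faster, because only one thread executes Python bytecode at a time; on a free-threaded build, the threads can use several cores. The check relies on sys._is_gil_enabled(), a private helper that only exists on Python 3.13 and later, so the sketch assumes the GIL is on when that helper is missing.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit: int) -> int:
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run(workers: int, jobs: list[int]) -> tuple[list[int], float]:
    """Run every job on a thread pool of `workers` threads and time it."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(count_primes, jobs))
    return results, time.perf_counter() - start


# sys._is_gil_enabled() exists only on Python 3.13+; assume the GIL is on otherwise.
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()

jobs = [20_000] * 4
serial_results, serial_time = run(1, jobs)
threaded_results, threaded_time = run(4, jobs)

print(f"GIL enabled: {gil_enabled}")
print(f"1 thread: {serial_time:.2f}s, 4 threads: {threaded_time:.2f}s")
```

Timings vary by machine, so compare the two numbers rather than reading too much into either one alone.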
Polars Raises $4 Million Seed Funding for Fast Data Frame Library
Polars, the fast data frame library, secured $4 million in seed funding, a testament to its potential to handle large datasets efficiently. With its impressive speed, lightweight design, and ability to handle data larger than available memory, Polars offers a lean alternative to traditional data frame libraries. This funding will further support the development of Polars and integration with other data science tools (like PyTorch, maybe?), empowering lean data practitioners to work with big data effectively and sustainably.
Check out the announcement on BusinessWire.com
NASA and IBM Openly Release Geospatial AI Foundation Model for NASA Earth Observation Data
NASA and IBM have openly released the HLS Geospatial Foundation Model (HLS Geospatial FM), an AI model built using NASA's Harmonized Landsat Sentinel-2 (HLS) dataset. This open-source geospatial AI model can be used to track land use changes, monitor natural disasters, and predict crop yields. The release aligns with the lean data principles we believe in: it reduces the need for extensive training datasets and lets teams build data products on top of the model. This collaborative effort represents an inclusive and transparent approach to Earth science research.
Deep dive into this topic on the NASA website
Together.ai Extends Llama 2 to a 32K Context Window
LLaMA-2-7B-32K, a model with a 32K-token context window, has made a big splash in the open-source world because it narrows the gap with OpenAI's GPT models on context length. Together.ai built it by applying position interpolation to extend Llama 2's original 4K context window. The longer window opens up applications such as summarizing long documents and answering questions across several documents at once. With improved speed and support for longer context, this model contributes to advancing lean data practices, and it is available on platforms like Hugging Face. We are dying to use it in our MyChatGPT interface; it's in the backlog.
Check out the model page on Hugging Face.
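Position interpolation, the technique mentioned above, works with rotary position embeddings (RoPE): instead of feeding the model positions it never saw during pre-training, every position is linearly rescaled so the longer window maps back into the trained range. The sketch below is a simplified illustration, not Together.ai's training code. It assumes Llama 2 7B's trained length of 4096 tokens, a head dimension of 128, and the usual RoPE base of 10000; the helper names are hypothetical.

```python
def rope_angles(position: float, dim: int = 128, base: float = 10000.0) -> list[float]:
    """Rotation angles RoPE applies at `position`, one per pair of head dimensions."""
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]


def interpolated_position(position: int, trained_len: int = 4096, target_len: int = 32768) -> float:
    """Position interpolation: scale positions by trained_len / target_len so the
    extended window maps back into the range seen during pre-training."""
    return position * trained_len / target_len


last = 32767  # index of the last token in a 32K window
scaled = interpolated_position(last)
print(scaled)  # 4095.875, inside the original 0..4095 range

# The model then computes its rotations from the scaled (fractional) position.
angles = rope_angles(scaled)
```

The trade-off is resolution: neighboring tokens end up only 1/8 of a position apart, which is why models extended this way are usually fine-tuned briefly on long sequences afterwards.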
That’s it for this week.
If you liked this update or have suggestions for improving its length and format, we welcome your feedback!