Last week (Aug. 23-24), the LineaPy team attended the 2022 Ray Summit in downtown San Francisco. We went into the conference hoping to understand attendee workflows as well as the pains and successes that occur when moving from research to production. We’re excited to say that the conference proved even more fruitful than we had hoped. Here we share some of our observations, along with our plans for integrating LineaPy with tools like Ray.

Breaking Out of Local Minima 

A central theme of this year’s event was that advances in data science and machine learning come from breaking out of local minima. One of the ways Ray helps do this is by providing the core abstractions of Actors, Futures, and Objects. Each of these concepts has a corresponding API that is simple to grasp yet gives users a powerful set of tools for scaling their increasingly complex workloads.

Finding the DS/DE Balance 

We see a dynamic emerging between the art of data science, which draws on a deep toolkit of powerful libraries, and the process of data engineering, which values predictability, reliability, and scalability. We believe that businesses that enable both of these viewpoints, rather than trying to force the methodologies of one discipline onto the other, are going to be the most successful at advancing ideas into the marketplace.

That’s why LineaPy embraces this reality and aims to be one of the critical tools in the machine learning data stack. We want to make the handoff between data science and data engineering teams seamless. As open source software, LineaPy is a low-code solution that embraces the highly interactive and iterative nature of data science by making it easy to identify and extract the code behind a given result. From there, that code can be automatically transformed into pipelines that run on a variety of workflow engines.
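
To illustrate what "identify and extract code" can mean in practice, here is a toy sketch built only on Python's standard `ast` module. It is not LineaPy's implementation or API. The function `slice_for` and the `NOTEBOOK` script are hypothetical names we made up for this example, and the approach handles only plain top-level assignments and imports (no augmented assignment, in-place mutation, or control flow).

```python
import ast


def slice_for(source: str, target: str) -> str:
    """Return only the top-level statements needed to compute `target`.

    Toy backward slice: walk the statements from last to first and keep
    each one that assigns or imports a name we still need.
    """
    tree = ast.parse(source)
    needed = {target}
    kept = []
    for stmt in reversed(tree.body):
        # Names this statement assigns to.
        defined = {
            node.id
            for node in ast.walk(stmt)
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
        }
        # Imports bind names too.
        if isinstance(stmt, (ast.Import, ast.ImportFrom)):
            defined |= {(a.asname or a.name).split(".")[0] for a in stmt.names}
        if defined & needed:
            # Names this statement reads become the new requirements.
            used = {
                node.id
                for node in ast.walk(stmt)
                if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
            }
            needed = (needed - defined) | used
            kept.append(stmt)
    # Restore the original order and the original source text.
    return "\n".join(ast.get_source_segment(source, s) for s in reversed(kept))


# A messy, exploratory script of the kind a notebook session produces.
NOTEBOOK = """
import math
radius = 3
debug = [radius] * 2
area = math.pi * radius ** 2
print(debug)
label = "circle"
rounded = round(area, 2)
"""

sliced = slice_for(NOTEBOOK, "rounded")
print(sliced)
```

The exploratory lines (`debug`, `print`, `label`) drop out, leaving only the four statements that `rounded` depends on. A cleaned-up slice like this is the kind of unit that can then be wrapped as a pipeline step.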

Collaboration: The Core Value of Open Source 

The LineaPy and Anyscale teams collaborated on a proof of concept to automate the creation of Ray workflows. The resulting code demonstrates how LineaPy’s graph and artifact abstractions make development-to-production transformations easy to implement.

The LineaPy team also co-hosted an #AfterRay party with Featureform and Gantry. Nearly 150 people, from universities to big tech companies, signed up to mix and mingle with us. It was great to meet so many new faces!

What’s Next 

LineaPy’s superpower is its workflow abstraction derived directly from raw user Python code, which allows it to automatically generate data pipelines compatible with different frameworks. 

A key integration for LineaPy is with Ray. That’s one of the reasons we were so excited to attend last week’s event and interact face-to-face with Ray users and developers alike. The Ray integration by our collaborators at Anyscale is the first major community contribution to LineaPy. Integrations with a few more frameworks, such as Kubeflow and MLflow Pipelines, are in the works!

We want to thank Anyscale for hosting the 2022 Ray Summit, and the Ray community for their efforts to push the boundaries of distributed machine learning and data science. We’re looking forward to more collaborations and technologies that spawn from this event.

We’d love to connect with you – whether it’s a technical collaboration or learning how we can help you bring LineaPy into your daily workflow. Find us on Twitter, Slack, GitHub or LinkedIn.