Feature Store: Uncover Uber's Secret to Large Scale Machine Learning

Enterprises often fail to extract value from data science projects because the machine learning workflow is complicated and presents challenges previously unseen in software development. To overcome these challenges and deliver data-driven value to their customers, enterprises must reimagine how they build and deploy digital products.


Data management is at the top of this list of new challenges. More than in any other area along the data value chain, enterprises measure Chief Data Officers (CDOs) by their ability to manage and integrate an ever-increasing quantity and variety of data. Fortunately, CDOs don't have to solve these daunting challenges in a vacuum.

Instead of reinventing the wheel for data management, the wise CDO should first learn from technology leaders who have already solved similar problems effectively. Companies like Uber, Facebook, Google, and Netflix were among the first to encounter the most extensive data management challenges, and they converted these challenges into value for both their customers and their organizations.

For example, technology leaders built their foundational data infrastructure stacks around open source software (OSS) for maximum versatility. CDOs outside the technology sector would do well to take their cues and follow suit when possible. When CDOs lack the in-house expertise to work with OSS directly, there are other options: enterprises may choose to work with platforms like Databricks that package OSS technology like Apache Spark into a consumable offering.

Tools like Databricks deliver broad data management capabilities from technology leaders to enterprises in other sectors. Is this a pattern CDOs can repeat in more specific data management domains to deploy machine learning models more effectively? At any given time, Uber has thousands of machine learning models in production that make millions of predictions per second, from estimated time-of-arrival (ETA) to rider demand based on geography. Can CDOs learn from and implement similar successful approaches to deploying machine learning models?

To help answer this question, I recently spoke with Mike Del Balso, who left Google in 2013 and joined Uber to help build its first machine learning platform. Mike is now the CEO & Co-founder of Tecton, which provides an enterprise solution to the top data management challenge he faced at Uber. Here's what you need to know:

  1. Operational machine learning challenges

  2. Feature store for operational machine learning

  3. Feature store solutions and outcomes

Operational machine learning challenges

Feature Engineering For Each New Model

Enterprises fail to develop an operational machine learning practice because they don't have an adequate machine learning pipeline to support their efforts. The current state of machine learning is reminiscent of the environment in which business analysts found themselves before the data warehouse or lakehouse paradigms emerged.

It was not scalable for business analysts to extract, transform, and load business-critical metrics like costs and revenue into business intelligence systems every time they needed them. This duplication of effort was inefficient. Business analysts required a solution.

Del Balso and the Uber team faced a challenge similar to the one business analysts had faced before them, albeit in a different domain. Instead of grappling with business-critical metrics, data scientists struggled to repeatedly extract, integrate, and serve consumable data to their machine learning models. While business intelligence systems consume business metrics, machine learning models require data features as inputs for model training and predictions.

Both business intelligence and machine learning use cases require structured data inputs. Raw data must be aggregated, integrated, normalized, and placed in a neat schema, like rows and columns, before it is useful. For example, an e-commerce platform may want to predict the next product you want to buy, and its machine learning model may rely on a feature like each user's aggregated clicks within product categories over the last 24 hours to serve a product recommendation.
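
To make that feature concrete, here is a minimal Python sketch of how it might be computed from a raw click log. The column names, timestamps, and pandas-based approach are illustrative assumptions, not any particular platform's implementation.

```python
import pandas as pd

# Hypothetical raw click log: one row per user click on a product.
clicks = pd.DataFrame({
    "user_id": [101, 101, 102, 101, 102],
    "category": ["shoes", "shoes", "books", "hats", "books"],
    "clicked_at": pd.to_datetime([
        "2021-06-01 08:00", "2021-06-01 09:30", "2021-06-01 10:15",
        "2021-05-30 12:00",  # older than 24 hours: excluded from the feature
        "2021-06-01 11:45",
    ]),
})

# Feature: per-user click counts by product category over the last 24 hours.
now = pd.Timestamp("2021-06-01 12:00")
recent = clicks[clicks["clicked_at"] >= now - pd.Timedelta(hours=24)]
feature = (
    recent.groupby(["user_id", "category"])
          .size()
          .rename("clicks_24h")
          .reset_index()
)
print(feature)
```

In production, a feature store would run a transformation like this on a schedule (or over a stream) to keep the values fresh, rather than computing it ad hoc for each model.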

As with business intelligence, it becomes next to impossible to scale machine learning operations if data scientists must rely on data engineers to build a feature pipeline that pulls and transforms raw data for each use case. And once data engineers have built these feature pipelines, keeping them current and accurate is a major undertaking.

Machine learning use cases go a step further than requiring structured rather than raw data: they also need feature data delivered at different speeds, depending on the stage of the machine learning life cycle. When training machine learning models "offline," data scientists need access to reasonably current features with updated values. Once deployed into a production ("online") application, machine learning models require features that are up to date at the moment the user makes a request. For instance, a model demands the most current information about the user, like time of day and location, to accurately predict estimated arrival time.
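
These two access patterns are often exposed as two interfaces over the same logical store: a bulk, point-in-time "offline" read for training, and a low-latency "online" lookup for serving. The sketch below illustrates only the pattern; the class, method names, and in-memory dictionaries are hypothetical stand-ins, not any product's API.

```python
class FeatureStoreSketch:
    """Illustrative feature store with offline (training) and online (serving) reads."""

    def __init__(self):
        # Stand-ins: in practice these would be a data warehouse or data lake
        # and a low-latency key-value store, respectively.
        self.offline = {}  # (entity_id, feature_name, as_of) -> value
        self.online = {}   # (entity_id, feature_name) -> freshest value

    def get_historical_features(self, entity_ids, feature_names, as_of):
        """Offline path: bulk, point-in-time reads used to assemble training sets."""
        return [
            {name: self.offline.get((eid, name, as_of)) for name in feature_names}
            for eid in entity_ids
        ]

    def get_online_features(self, entity_id, feature_names):
        """Online path: single-entity, millisecond-scale lookup at prediction time."""
        return {name: self.online.get((entity_id, name)) for name in feature_names}
```

The key design point is that both paths read the same feature definitions, so the values a model trains on match the values it sees in production.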

When Mike arrived at Uber, data scientists had to rely on data engineers to manually build machine learning pipelines to extract, surface, and update these features for every use case. Also, there wasn't a system whereby data scientists could reuse standard features across different use cases. For example, most machine learning models at Uber rely on the feature of user location as an input to make a prediction. It would be impossible to scale any machine learning practice if data engineers had to recreate this feature from scratch for every use case. 

And beyond the technology limits of scaling such a manual process, there are also regulatory concerns. Global companies must comply with regulations including the EU General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), which grant consumers a "right to know" how companies use their data. If, for instance, a machine learning model at a bank rejects a customer loan request due to high predicted risk, the customer has the right to request an explanation. To comply, the bank would likely need to provide the exact machine learning features the model considered (customer age, income, etc.) at the time of the decision. This task is next to impossible to achieve at scale with a purely manual process for feature engineering.
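
One common pattern that makes such explanations tractable is to log the exact feature vector alongside every prediction. The sketch below illustrates the idea; the function, the model object, and the in-memory log are hypothetical, and a real system would write to durable, access-controlled storage.

```python
import json
import time

AUDIT_LOG = []  # stand-in for durable, append-only audit storage


def predict_and_audit(model, customer_id, features):
    """Make a prediction and record exactly which feature values the model saw."""
    decision = model.predict(features)  # hypothetical model; e.g. returns "reject"
    AUDIT_LOG.append(json.dumps({
        "customer_id": customer_id,
        "timestamp": time.time(),
        "features": features,  # e.g. {"age": 34, "income": 52000}
        "decision": decision,
    }))
    return decision
```

Because a feature store centralizes feature definitions, the logged values can later be traced back to the exact transformation that produced them.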

As Mike and his team discovered at Uber, they needed to create a better system for deploying machine learning models and scaling machine learning operations.

Feature store for operational machine learning

Feature Store: Build Features Once, and Reuse Them Across Teams and Machine Learning Models

Del Balso and his team set out on a mission to democratize machine learning at Uber and create a fast path to production. They understood that, to scale, data scientists needed self-service capabilities throughout the machine learning life cycle: every data scientist should be able to access up-to-date data features, train machine learning models, deploy those models to production, and own live production model performance, with little (if any) help from the data engineering team.

In addition to reducing heavy reliance on the data engineering team, Del Balso's team sought to eliminate duplicate effort. It makes little sense to recreate machine learning features from raw data every time data scientists need them. If machine learning features already exist for one model and use case, they should be discoverable by data scientists on other teams so those teams can reuse the pre-built features in their own machine learning models.

Lessons learned in software engineering inspired these objectives to some degree. At Facebook, for instance, new software developers push code (albeit small changes) to live production on their first day on the job. Shipping code to production this quickly is only possible because Facebook provides developers with a self-service, hardened development and testing framework, a continuous delivery pipeline, and appropriate governance measures, which we collectively refer to as "DevOps."

Machine learning projects require a similarly rigorous approach, often called machine learning DevOps or MLOps, if enterprises wish to deploy machine learning models to production systems at scale. At Uber, Mike and his team built a machine learning platform named Michelangelo to do just that. They wanted to provide data scientists with self-service access to features without heavy data engineering involvement. They also wanted to build a machine learning pipeline that supported and managed every stage of the machine learning life cycle.

Michelangelo is Uber's internal and centralized "ML-as-a-service" platform. It makes scaling AI as easy as requesting a ride and supports every stage of the machine learning life cycle. Mike shared that Michelangelo's feature store component was the most important catalyst to scaling Uber's machine learning operations. 

The Michelangelo "feature store" is essentially a new, purpose-built data platform for machine learning features. It serves data scientists much as the data warehouse serves business analysts: transform raw data into a feature once, place it in the feature store, fetch it offline for training, serve it online to machine learning models, and reuse the feature for any other machine learning model and use case.
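
The "build once, reuse everywhere" idea can be sketched as a registry of named feature definitions that any team resolves by name. The decorator, registry, and feature below are illustrative assumptions, not Michelangelo's actual interface.

```python
from datetime import datetime, timedelta

FEATURE_REGISTRY = {}  # feature name -> transformation function


def feature(name):
    """Register a transformation under a stable, shared feature name."""
    def decorator(fn):
        FEATURE_REGISTRY[name] = fn
        return fn
    return decorator


@feature("user_clicks_24h")
def user_clicks_24h(click_timestamps, now, window_hours=24):
    """Count a user's clicks in the trailing window; defined once, reused by all."""
    cutoff = now - timedelta(hours=window_hours)
    return sum(1 for t in click_timestamps if t >= cutoff)


# Any model, on any team, resolves the same definition by name:
transform = FEATURE_REGISTRY["user_clicks_24h"]
print(transform([datetime(2021, 6, 1, 9)], now=datetime(2021, 6, 1, 12)))  # 1
```

Centralizing definitions this way is what prevents two teams from re-implementing, and subtly disagreeing on, the same feature.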

The feature store also acts as a bridge from the human-driven, ad-hoc analytics world to the machine-driven, automated world. Feature stores accelerate the time-to-value of initial machine learning model building and training. They also make it possible for machine learning models to consume pre-prepared features and deliver high-speed predictions at massive scale in production applications. Customers benefit from the feature store every time they open the Uber app and request a ride.

Once a customer requests an Uber ride, the feature store engages (along with supporting systems) in three steps, sketched in code after this list:

  1. Capture, transform, and store relevant feature sets in short-term, high-speed storage, including customer location, time of day, and supply of available Uber drivers in the area.

  2. Serve these feature sets to machine learning models at high speed so the models can quickly make predictions for customers via the Uber app, including ETA.

  3. Update feature sets in long-term storage, so data scientists have the most recent view of the world available to train machine learning models.
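
Here is the promised sketch of that flow, reusing the hypothetical FeatureStoreSketch from earlier. Every name, field, and the model object are illustrative assumptions; the point is only the shape of the three steps.

```python
import time


def on_ride_request(user_id, event, model, store):
    """Hypothetical handler walking through the three steps above."""
    # 1. Capture and transform the relevant signals, then write them to
    #    short-term, high-speed (online) storage.
    features = {
        "pickup_location": event["location"],
        "hour_of_day": time.localtime(event["timestamp"]).tm_hour,
        "drivers_nearby": event["available_drivers"],
    }
    for name, value in features.items():
        store.online[(user_id, name)] = value

    # 2. Serve the fresh features to the model at low latency so the app
    #    can show a prediction, such as the rider's ETA.
    eta = model.predict(store.get_online_features(user_id, list(features)))

    # 3. Append the same values to long-term (offline) storage so future
    #    training jobs see the most recent view of the world.
    store.offline[(user_id, "ride_request", event["timestamp"])] = features
    return eta
```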

As Mike emphasized, "Feature stores are a critical part of the machine learning stack. We take data from the analytics world and bring it to the operational world."

Feature store solutions and outcomes

Before: Without Feature Store

After: With Tecton Feature Store

Now that we've seen the value of automating the data engineering component of the machine learning process with a feature store, how can you do the same?

While you could attempt to build a feature store at your organization, this may prove challenging for most enterprises because the technology sector employs most of the data science talent. And even for enterprises within the technology sector, it often makes more sense to purchase a pre-built solution to accelerate time-to-value.

Tecton provides one of the first commercially available feature stores on the market. Mike translated his first-hand experience building Michelangelo at Uber into a packaged feature store solution for enterprises that wish to deliver increased value with operational machine learning in a short time.

Tecton Feature Store Capabilities

One of Tecton's early customers is Atlassian, which provides Jira, the number one software development tool used by agile teams. Atlassian aims to provide intelligent experiences using machine learning within its various offerings. It initially dedicated three team members to building an internal feature store; a year later, the team had a working solution that solved some of the main challenges of serving features online. Unfortunately, the solution wasn't scalable across the company, and data science teams still operated in silos.

After a successful proof of concept, Atlassian deployed the Tecton feature store within the company. The outcomes for Atlassian with Tecton are self-evident:

  • Accelerated time to build and deploy new features from 1–3 months to 1 day

  • Improved the prediction accuracy of existing models by 2%

  • Improved the accuracy of online features from 95–97% to 99.9%

  • Freed up 2–3 FTEs from maintaining the Atlassian feature store to focus on other priorities

"As a net result of using the Tecton feature store, we've improved over 200,000 customer interactions every day. This is a monumental improvement for us."
— Geoff Sims, Data Scientist at Atlassian

Considering these outcomes, if you'd like to accelerate the time-to-value of machine learning operations at your enterprise, do yourself a favor and look to technology leaders like Uber for guidance. Consider implementing an enterprise feature store like Tecton to complement the data strategy at your organization. With a feature store, you can empower your team to deploy machine learning models to production at greater velocity and scale, deliver increased value to your customers, and capture more value for your business.

