5 Tips to Curb Vendor Lock-In: Data Science and Analytics Technologies

(Vendor landscape: Databricks, Snowflake, MathWorks, RapidMiner, SAS, GCP, AWS, TIBCO, IBM, Domino Data Lab, KNIME, Altair, Alteryx, Dataiku, Microsoft Azure, H2O.ai, DataRobot, Anaconda, Gigantum, Stata, Weights & Biases, Cubonacci, Tecton, Saturn Cloud)

CTOs and CIOs seem to have learned their lessons when it comes to avoiding vendor lock-in with data storage and processing. These days, very few mission-critical data applications are built upon proprietary database technologies with high switching costs. Instead, companies use a mix of open-source and “low lock-in” vendors such as Snowflake and AWS Redshift for their data processing needs.

(The definition of lock-in can get complicated, but to keep it simple, let’s say that lock-in is anything that makes it hard to switch from a lower-ROI software product to a higher-ROI alternative.)

As data science and machine learning software tools are purchased with greater regularity, I thought it would make sense to offer some tips for avoiding the painful lessons of vendor lock-in in analytics.

First, let’s take a look at what freaks me out when it comes to potential vendor lock-in.

Lock-in “danger zone”

  1. APIs and bundled applications: If you take something to production in proprietary software, it will be hard to move off of it.

  2. Cloud-only platforms: If you use cloud-native AWS SageMaker or Azure Machine Learning, your work will be tough to migrate away from that cloud.

  3. Requires experts to use and administer: Lock-in is about switching costs, and if you need experts to use the software, that specialized human capital will be difficult to move away from.

  4. GUI-centric: GUIs are unique and specific to each technology. The more clicks, the more lock-in. Tool-specific training increases the chances of lock-in.

  5. Owns more of the stack: Data science projects require data access, munging and manipulation, modeling, and deployment. If a piece of software is an end-to-end solution and you use all of its parts, you are more susceptible to lock-in.

5 Ways to reduce lock-in potential

  1. Use POC time to replicate a sample project outside of the vendor software: If you use an ML platform to build and deploy a model, take that process and rebuild one use case entirely in FOSS (Free and Open Source Software) alternatives (see the sketch after this list).

  2. Prototype in a commercial tool and productionalize from the ground up in FOSS: You don’t build churches for Easter Sunday, but you had better build APIs that way. You’ll need to tweak them anyway, so to limit lock-in and maximize performance, use what GAFA (Google, Apple, Facebook, Amazon) uses: FOSS to take APIs to production.

  3. Keep purchases departmental at first: Don’t jump all in at once. Buy a little bit, try it out for a year or two, rack up some successes, and see if the software is worth committing to.

  4. Favor tools that do one piece of the process well: DataRobot does automated modeling really, really well. Alteryx is great for analysts doing data wrangling. Maybe use both in an analytics process flow.

  5. Make sure the proprietary parts are the most productive and valuable parts: Follow my “why I buy data software tools” reasoning and make sure you are buying data science tools for the right reasons.
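
To make tips 1 and 2 concrete, here is a minimal sketch of what “rebuild one use case in FOSS” might look like, using scikit-learn for the model and FastAPI to serve it. The file names, column names, and model choice are placeholders standing in for whatever your vendor-platform use case actually does, and this is just one possible FOSS stack, not a prescription.

    # train.py -- rebuild the vendor-platform use case with open-source tooling.
    # "churn_sample.csv" and the "churned" label column are hypothetical placeholders.
    import joblib
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("churn_sample.csv")
    X, y = df.drop(columns=["churned"]), df["churned"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    model = GradientBoostingClassifier().fit(X_train, y_train)
    print("holdout accuracy:", model.score(X_test, y_test))

    # Persist the model so a FOSS API layer can serve it.
    joblib.dump(model, "churn_model.joblib")

And the production side, a bare-bones FOSS API in front of that model:

    # serve.py -- minimal FastAPI wrapper around the FOSS-trained model.
    # Run locally with: uvicorn serve:app
    import joblib
    import pandas as pd
    from fastapi import FastAPI

    app = FastAPI()
    model = joblib.load("churn_model.joblib")

    @app.post("/predict")
    def predict(record: dict):
        # Single-record scoring; input validation omitted for brevity.
        X = pd.DataFrame([record])
        return {"churn_probability": float(model.predict_proba(X)[0, 1])}

If you can get one use case all the way through this kind of open-source path during the POC, you know exactly what it would cost to walk away from the vendor later.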



