5 Tips to Curb Vendor Lock-In: Data Science and Analytics Technologies

(Vendor landscape: Databricks, Snowflake, MathWorks, RapidMiner, SAS, GCP, AWS, TIBCO, IBM, Domino Data Lab, KNIME, Altair, Alteryx, Dataiku, Microsoft Azure, H2O.ai, DataRobot, Anaconda, Gigantum, Stata, Weights & Biases, Cubonacci, Tecton, Saturn Cloud)

CTOs and CIOs seem to have learned their lessons when it comes to avoiding vendor lock-in with data storage and processing. These days, very few mission-critical data applications are built upon proprietary database technologies with high switching costs. Instead, companies use a mix of open-source and “low lock-in” vendors such as Snowflake and AWS Redshift for their data processing needs.

(The definition of lock-in can get complicated, but to keep it simple, let’s say that lock-in is anything that makes it hard to switch from a lower-ROI software product to a higher-ROI alternative.)

As data science and machine learning software tools are purchased with greater regularity, I thought it would make sense to offer some tips for avoiding the painful lessons of vendor lock-in in analytics.

First, let’s take a look at what freaks me out when it comes to potential vendor lock-in.

Lock-in “danger zone”

  1. APIs and bundled applications: If you take something to production in proprietary software, it will be hard to move off of it.

  2. Cloud-only platforms: If you use cloud-native AWS SageMaker or Azure Machine Learning, your work will be tough to migrate away from that cloud.

  3. Requires experts to use and administer: Lock-in is about switching costs, and if you need experts to use the software, that specialized human capital will be difficult to move away from.

  4. GUI-centric: GUIs are unique and specific to each technology. The more clicks, the more lock-in. Tool-specific training increases the chances of lock-in.

  5. Owns more of the stack: Data science projects require data access, munging and manipulation, modeling, and deployment. If a piece of software is an end-to-end solution and you use all of its parts, you are more susceptible to lock-in.

5 Ways to reduce lock-in potential

  1. Use POC time to replicate a sample project outside of the vendor software: If you use an ML platform to build and deploy a model, take that process and rebuild one use case entirely in FOSS (Free and Open Source Software) alternatives (see the sketch after this list).

  2. Prototype in a commercial tool and productionalize from the ground up in FOSS: You don’t build churches for Easter Sunday, but you had better build APIs that way. You’ll need to tweak them anyway, so to limit lock-in and maximize performance, use what GAFA (Google, Apple, Facebook, Amazon) uses: FOSS to take APIs to production.

  3. Keep purchases departmental at first: Don’t jump all in at once. Buy a little bit, try it out for a year or two, rack up some successes, and see if the software is worth committing to.

  4. Favor tools that do one piece of the process well: DataRobot does automated modeling really, really well. Alteryx is great for analysts doing data wrangling. Maybe use both in an analytics process flow.

  5. Make sure the proprietary parts are the most productive and valuable parts: Follow my “why I buy data software tools” reasoning and make sure you are buying data science tools for the right reasons.
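
To make tips 1 and 2 concrete, here is a minimal sketch of what “rebuild one use case in FOSS” might look like, using scikit-learn for the model and FastAPI to serve it. The file names, column names, and model choice are placeholders standing in for whatever your vendor-platform use case actually does, and this is just one possible FOSS stack, not a prescription.

    # train.py -- rebuild the vendor-platform use case with open-source tooling.
    # "churn_sample.csv" and the "churned" label column are hypothetical placeholders.
    import joblib
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("churn_sample.csv")
    X, y = df.drop(columns=["churned"]), df["churned"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    model = GradientBoostingClassifier().fit(X_train, y_train)
    print("holdout accuracy:", model.score(X_test, y_test))

    # Persist the model so a FOSS API layer can serve it.
    joblib.dump(model, "churn_model.joblib")

And the production side, a bare-bones FOSS API in front of that model:

    # serve.py -- minimal FastAPI wrapper around the FOSS-trained model.
    # Run locally with: uvicorn serve:app
    import joblib
    import pandas as pd
    from fastapi import FastAPI

    app = FastAPI()
    model = joblib.load("churn_model.joblib")

    @app.post("/predict")
    def predict(record: dict):
        # Single-record scoring; input validation omitted for brevity.
        X = pd.DataFrame([record])
        return {"churn_probability": float(model.predict_proba(X)[0, 1])}

If you can get one use case all the way through this kind of open-source path during the POC, you know exactly what it would cost to walk away from the vendor later.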



