This article originally appeared in the July/August 2017 issue of IT Pro, published by the IEEE Computer Society.
Most of the work that data scientists do is “data janitorial” work rather than science, and there is a gulf between prototype and production: between the sandbox where innovation happens and the production systems that must run it. In addition, pockets of knowledge and expertise scattered throughout the enterprise become a problem when that knowledge is not institutionalized or captured in a system, because it may disappear when an employee leaves. Organizations are best served by understanding their own data, focusing on the business problems they are trying to solve, and building the semantic layers that allow data to be ported across platforms. This lets them take advantage of best-of-breed solutions without becoming locked into a particular vendor whose platform does not abstract the business-problem, analytic, data, and platform layers required to operationalize fast-evolving advanced machine learning, analytics, and AI technologies.