Most companies I talk to, from pre-product startups to Fortune 500s, are hiring data scientists. They see that data science and machine intelligence will play a key role in the business and in their products, so they want to build a competitive advantage with data. If you’re one of them, you may still wonder what kind of background a good candidate should have, given that Data Scientist is a broad job title. You may also realise that hiring a data science expert is extremely tough in this market.
The good news is that you don’t need a team of Ph.D.s and statisticians to get started. While working in this industry, I’ve seen many businesses start their data science initiatives in a lean, cost-effective way by having just one person and by using free, open-source software like mine. Some of them even let a software engineer or a data analyst on their existing team take on the role.
What should the first hire do?
If you’ve decided to hire someone, what should you look for in this first data scientist? I suggest finding someone who can establish the following framework:
1. Define a data problem based on your business need.
2. Set up a software platform to collect data from multiple sources. Conduct data sanity checks to ensure there aren’t mistakes in the collection pipeline.
3. Define an evaluation measurement based on your business goal.
4. Measure the current business performance as a baseline.
5. Understand the domain knowledge and data of your business.
6. Implement the first simple solution on a production system.
7. Conduct A/B tests with the evaluation measurement to compare the solution and the baseline.
Then repeat steps six and seven with different solutions to improve results.
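To make the comparison between a new solution and the baseline concrete, here is a minimal sketch of how an A/B test result might be evaluated. It assumes the evaluation measurement is a conversion rate and uses a standard two-proportion z-test; the function name `ab_test`, the 5% significance level and the visitor counts are all made up for illustration and are not taken from any particular business or tool.

```python
from math import sqrt
from statistics import NormalDist


def ab_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided two-proportion z-test.

    A is the baseline (current system), B is the new solution.
    conv_* are conversion counts, n_* are the number of users in each group.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("cannot test: no variation in outcomes")
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "baseline_rate": rate_a,
        "variant_rate": rate_b,
        "lift": (rate_b - rate_a) / rate_a,  # relative change vs. baseline
        "z": z,
        "p_value": p_value,
        "significant": p_value < alpha,
    }


# Hypothetical numbers: 10,000 users per group.
result = ab_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(result)
```

The point of the sketch is the shape of the process rather than the specific test: every candidate solution is judged with the same measurement, against the same baseline, before anyone decides whether it was worth building.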
These tasks are not rocket science. At the very least, they don’t require Ph.D.-level theory. Basically, your first hire helps get three things ready: your data, a clear problem to be solved and a process to evaluate the business impact of any new solution.
After this framework is established, you might hire more data scientists or bring in external consultants if you want to create more sophisticated algorithms and solutions. The benefit of this process is that the problem is now very clear and you have a way to evaluate their work.
Next, when you are doing your initial interviewing, consider the following five points:
- Not every data problem is a big data problem. Don’t mix up data science and big data. A few million records do not make a big data problem. The data of many small and medium-sized companies is small enough to be processed by a single machine. While you want a data storage system that is scalable for future growth, whether you need to analyze terabytes of data efficiently today depends on your business. Some experienced data scientists and engineers, especially those from tech giants like LinkedIn and Twitter, specialize in large-scale data processing. Their impressive experience may not be helpful to you if your data is not that large, and they may not be interested in your data problem anyway.
- You cannot optimise what you cannot measure. Whether or not to build a data science team is an ROI question. You want to hire someone who cares about evaluation — who wants to measure the impact of their work on your business. Good candidates should be able to define the data problems based directly on your business needs and be able to propose evaluation methods for different data science approaches. They should care about delivering measurable results; otherwise, you may spend months doing something fancy but be left wondering if there was any business gain.
- Domain knowledge is key. Black-box predictive solutions often don’t work well because every business is unique, and so is its data. A generic algorithm that doesn’t take domain knowledge into consideration has its limits. Your data team should understand and make use of your unique data to develop competitive advantages. Therefore, hire someone who is eager to acquire domain knowledge about your industry and your particular business by looking into your data.
- Avoid solutions that are looking for problems. Sometimes, an expert who is too deep in certain approaches or research areas may have a tendency to solve every problem the same way. This is not uncommon for candidates who have spent years investigating a single algorithm. But when you’re just starting out, you likely don’t know which methodology works best for your business. You need someone who is interested in conducting experiments and helping solve problems with the most suitable solutions. Look for open-minded candidates.
- Don’t be a perfectionist. When you have a prediction or optimisation problem, don’t aim for perfect accuracy. It’s not realistic. Instead, benchmark the data solution against what you currently have. Quickly having a practical solution that improves your business is much better than spending years chasing the impossible, perfect solution. For most companies, especially small businesses and startups, becoming a research lab isn’t a good idea.
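As a rough illustration of benchmarking against what you currently have rather than against perfection, the sketch below compares a candidate prediction with the existing approach using mean absolute error. The metric, the function names and the tiny sales figures are all assumptions chosen for the example; your own evaluation measurement should come from your business goal.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between what happened and what was predicted."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def improvement_over_baseline(actual, baseline_pred, candidate_pred):
    """Report how much the candidate reduces error relative to the baseline."""
    baseline_err = mean_absolute_error(actual, baseline_pred)
    candidate_err = mean_absolute_error(actual, candidate_pred)
    relative = (baseline_err - candidate_err) / baseline_err if baseline_err else 0.0
    return {
        "baseline_mae": baseline_err,
        "candidate_mae": candidate_err,
        "relative_improvement": relative,
    }


# Made-up weekly sales figures, purely for illustration.
actual_sales = [100, 120, 90, 110]
current_rule = [110, 110, 110, 110]   # e.g. "always predict the usual level"
new_model = [104, 115, 95, 108]       # a hypothetical first simple solution

report = improvement_over_baseline(actual_sales, current_rule, new_model)
print(report)
```

In this made-up case the candidate is far from perfect, yet it cuts the error well below the current approach, which is the comparison that matters for the business.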
It is always better to start lean and start early. I’d be happy to hear about your own experience in the comments.
Simon Chan is the CEO and Co-founder of PredictionIO, an open source Machine Learning server for developers to build predictive applications in a fraction of the time.
The Young Entrepreneur Council (YEC) is an invite-only organisation comprising the world’s most promising young entrepreneurs. In partnership with Citi, YEC recently launched BusinessCollective, a free virtual mentorship programme that helps millions of entrepreneurs start and grow businesses.