The way we use and think about data has changed radically over the last several decades. I am a statistician by training and a data scientist by trade, and I have picked up assorted data engineering and cloud software development skills throughout my career. Modern data practitioners need to fill more roles than ever before, so this post explains how I see myself in each of them.

Statistician

I design experiments and studies, conduct appropriate and honest analyses thereof, build models to predict outcomes from potential new data, and help decide how much and what sort of data are required to test a hypothesized result that would be of practical interest. I have a Ph.D. in statistics from Montana State University, Bozeman, where I taught introductory statistics for three semesters and worked as a statistics research assistant for three years.

Data Scientist

This is a trendy rebranding of “statistician” that typically implies a focus on enterprise-scale predictive modeling at the expense of the other activities described above. This usually entails computing large quantities of statistical insights, making them available inside software, and assuring the quality of the outputs. Compared to statisticians, data scientists are more likely to use black-box deep learning techniques, but traditional statistical models and classifiers still make up the majority of “data science” applications. As of this writing, I have held the job title of “data scientist” or “data science consultant” at two different companies over nearly 5 years.

Data Engineer

Enterprise organizations frequently require a division of labor between (1) the data scientists who use tools to analyze, summarize, and make decisions using data, and (2) the engineers who manage connections to datastores and software consumption layers while writing rigorous code to efficiently move data from one location to another in the required format. The toolkit of the data engineer has more in common with that of the software engineer than that of the data scientist, and sometimes overlaps with that of the database administrator. At my previous data scientist positions, I gained experience with data engineering while projects were in their early phases and the need for a dedicated engineer had not yet been recognized.

Putting It Together

I am able and happy to consult on all of these types of projects. For a pure statistical research project, I would be pleased to be involved in each phase: design, analysis, conclusion, and dissemination. The turnaround for each phase can be relatively quick (a few weeks, depending largely on your timeline and on my other obligations).

For a data science/engineering project, we will need to do extensive requirements gathering up front to understand what state your data collection/storage systems are in, what new build may be necessary, and whether I am the best person to do the engineering work. A proof-of-concept may be necessary to determine whether your request is even possible at scale, and these projects can take 4-8 months to fully complete. I can do all of the data science build and much of the data engineering build, and I may be able to assist with an app build. However, in the event that I am not able to complete your project, I am versed in project management and will assist in writing requirements for later work that other teams may do. I can also provide referrals to colleagues at consulting firms with larger teams.

We are in an exciting era where analysts who were taught that 70-year-old statistical techniques are the only way to do things are discovering the power of brand-new methods, while large data warehousing organizations are seeing the value of simple statistics applied at scale. I am excited to be at the center of this intersection.
