Making a change

Looking Back on My First Year as a Practising Data Scientist

I’m a physicist, but for over a year now I have worked as a data scientist. I’ll tell you a bit about my experience: how I got interested in data science in the first place, what steps I took to move from a successful academic career path to being a data scientist, and what it has been like to work in the industry.

Background

I began my academic career in 2003 with physics studies at the University of Tartu. I wanted to learn something real about this world. Ten years later I graduated with a PhD in physics and with the experience of making a small dent in the boundaries of human knowledge in the field of ultrafast optics. Then I did my postdoc at the University of St Andrews in Scotland, transferring my knowledge from laser pulse measurement to shaping light for a special kind of microscope.

  • In optics, I enjoyed the experimental side of the research the most and had the impression that I wouldn’t want to spend all my time just looking at data on a computer.
  • However, things change, and as time passed, I spent more and more of it analysing data on a computer, programming experiments, creating appropriate simulations, and writing papers.
  • When I returned home from abroad with my family, the software industry didn’t seem so intimidating after all. Besides, twiddling with computers and programming is what I had been doing more or less since early middle school, right?
  • So a little more than a year ago I joined Mooncascade as their first data scientist, and put all my transferable skills to practice.

Let me say that during my physics studies I hadn’t dealt with data science per se, or at least had never thought of data analysis as data science, or linear regression as machine learning. But we’ll get into that later.

From Physics to Data Science

My interest in data science was sparked in the middle of my two-year postdoc by a talk from a recent astrophysics PhD graduate who had been through Pivigo’s S2DS (Science to Data Science) program and who spoke about her own experience. I didn’t have time to go through the program myself, so I started to learn on my own, spending roughly 5 to 10 hours per week over a half-year period.

Going along with the trend was certainly part of why I made this decision, as was wanting to come back to Tartu with my family; the favourable salary level in the IT sector was a bonus as well. Continuing my research would have required endlessly applying for grants, both for my salary and for equipment. I also applied for a few industry positions in optics around Europe, but it was much easier to get attention in my home country. Data science seemed to be a subject in which I could put my physics PhD to use.

But real life is trickier than that.

Data science is practised in many kinds of organisations, and the term “data scientist” can mean very different things from one company to the next, depending on:

  • which development stage the company is in;
  • how big the team is;
  • whether there are separate data engineers;
  • what the expected outcome of a data scientist’s work is (a report, a model, a data pipeline, or a service);
  • and how much data there is to work with.

Consultancies appealed to me the most because they offer the broadest scope of potential work. But to succeed, I had a lot of work and learning to do. This is how I prepared myself for my career transition:

  • First, I started with online courses: “Statistical Learning” (by Trevor Hastie and Rob Tibshirani), “Machine Learning” (by Andrew Ng), and later “Mining Massive Datasets” (by Jure Leskovec, Anand Rajaraman and Jeffrey D. Ullman);
  • Secondly, I focused on improving my programming skills in Python: “Think Python” (by Allen B. Downey), and probably every talk by Raymond Hettinger that I could find on YouTube;
  • Thirdly, I also tried a few Kaggle competitions and tutorials;
  • And last but not least, I started attending Scotland Data Science meetings and subscribed to data science aggregators like DataTau, Data Science Weekly, Data Elixir, and Hacker News. I have found recordings of PyData presentations especially useful for getting technical details.

Data Scientist at Mooncascade

As a PhD student I had to deal with a variety of tasks, from learning to teaching and from building a laser to fixing the plumbing. This has helped me to respond to problems with a curious and creative mindset.

The most valuable skills that I have taken with me from my previous career as a physicist, and that have proved handy as a practising data scientist, include learning skills, problem-solving, technical reading skills, the scientific method, a background in physical measurements, the ability to perform repeatable analysis, and data visualisation.

These skills have helped me to acquire the highly technical, software development-specific knowledge and competencies that a practising data scientist needs, such as clean code, test-driven development, and design patterns, as well as source code control, containerisation, continuous integration, and continuous delivery. Source code for repeatable analysis, rather than reports, has been one of the main outputs of my work as a data scientist, and the move from so-called academic hacking to software development is where my learning curve has been steepest.

Compared to my previous experience in academia, life at Mooncascade is definitely paced much faster:

  • During my first year I have been involved in six different projects, and four of them have included some part of natural language processing;
  • There’s also a long-running research project with the University of Tartu for analysing population movement based on mobile data, which I’m involved in;
  • Every project has had its own focus, which has meant getting things done and learning in parallel for much of the time;
  • Extensive time-tracking at the task level is part of the job. Surprisingly, it isn’t a pain at all when done continuously rather than semi-fictitiously after the fact, as I had to do on a few occasions in academia;
  • Daily project meetings with reports on progress, plans, and problems have kept me well on track and provided me with a feeling of accomplishment and relevance;
  • On the other hand, there can be certain difficulties in the tight interaction with customers, and meeting clients’ expectations. Owing to the hype around data science, those expectations may in some cases be unrealistically optimistic, given the constraints on time and money;
  • Because we work mostly in weekly or biweekly cycles, constant deadline pressure can build up;
  • During my academic career I never needed to use any of the ordinary software development tools (Git, JIRA, Docker, etc.). However, the LPU (Least Publishable Unit) in academia is, after all, just a form of MVP (Minimum Viable Product).

The first Garage48 Open & Big Data hackathon, held in Tartu in 2016. Photo: Maido Parv

There are also things that you can only learn by practising them yourself

Taking online courses gives an overview of how things should work, but there are several things in data science that can only be learned by practice.

For example, how do you develop models under looming deadlines, when you need to get your model working on the first try? What is a plausible baseline for the model, and what level of error is acceptable within the timeframe? And how do you then integrate that not-so-good model into a full application? Data processing is usually understood to take up most of a data scientist’s working hours, but I have found that a big portion also goes to all the support structures around a trained model.

Conclusions

To summarise my experience, I’m really happy that I made the decision to change from a physicist to a data scientist. Projects of shorter duration and smaller, more concrete tasks have come in handy in taming the procrastinator in me, and the daily recaps have supported that.

Software development in its essence is an enjoyable process for me thanks to its short feedback loop in contrast to most of the research I have done. Yet there are no signs of routine in the job, and there continue to be lots of things to explore and learn.

This has been my experience as a practising data scientist at Mooncascade, one of the many possibilities out there for what the title can mean. What has been yours?

Published by Peeter Piksarv

Peeter is a data scientist at Mooncascade, a leading Product Development and AI Consultancy Company. He enjoys building products that have data science embedded in them and has experience with machine learning, natural language processing, bots, and computer vision. He has a PhD in physics (optics).