What tools, other than Python, R, and SQL, are all data scientists expected to know?

Updated on : December 3, 2021 by Landen Donovan



This answer does not strictly cover technical tools, but it has been so crucial to my own growth that I have to share it.

Until I started working with my current team, I was focused on acquiring the proper technical and domain tools. I spent hours familiarizing myself with R, becoming fluent in descriptive statistics, and reading the code of data scientists at my first companies.

Of course, all this exposure to the subject was crucial, especially for me, because I don't have a very rich background in science or math, and I have had to learn statistics and programming from square 1 (or 0). I wouldn't recommend that anyone trying to do data science stop prioritizing technical learning.

But as I've moved into more formal analysis roles, I've clearly seen that no tool is a magic bullet. I have had several projects, completed quickly in a single Excel sheet, that added tangible value to the business, and others, involving weeks of work and hundreds of lines of prototype code, that were near-complete failures.

Through these, I have come to realize that three fundamental skills underpin the effective use of virtually all tools:

(1) Evaluate and articulate the priority of a project,

(2) Obtain and document project requirements, and

(3) Measure the ability of the team to complete a certain body of work (especially important if "the team" means "you").

These are so crucial to the success of any data project that they have become something of a mantra for me. I encourage you to adopt them too: priority, requirements, capacity. I started filtering every data request I receive through these three considerations and have only seen good results.
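For readers who think in code, the filter described above can be sketched as a small checklist. This is only an illustration: the `DataRequest` fields, the `open_questions` helper, and the hour-based capacity check are hypothetical choices for the example, not a method described in this answer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataRequest:
    """A hypothetical record of an incoming data request."""
    title: str
    priority: str = ""            # why the work matters and how urgent it is
    requirements: str = ""        # documented deliverables and constraints
    hours_needed: float = 0.0     # rough estimate of effort (assumed unit: hours)
    hours_available: float = 0.0  # time the team (or you) can actually spend


def open_questions(request: DataRequest) -> List[str]:
    """Return which of the three considerations are still unresolved."""
    gaps = []
    if not request.priority.strip():
        gaps.append("priority")
    if not request.requirements.strip():
        gaps.append("requirements")
    # Capacity is unclear if there is no estimate or the estimate exceeds the time available.
    if request.hours_needed <= 0 or request.hours_needed > request.hours_available:
        gaps.append("capacity")
    return gaps


request = DataRequest(
    title="Churn dashboard",
    priority="Needed for the quarterly review",
    requirements="",
    hours_needed=10,
    hours_available=4,
)
print(open_questions(request))
```

An empty list means all three considerations have at least been addressed; anything else is a prompt to go back to the requester before writing any analysis code.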

So much time, development effort, and potential business value can be wasted if any of these three is unclear, misaligned, or worse, not considered at all.

It doesn't matter how many programming languages, analysis paradigms, or visualization engines a data analyst / scientist has in their toolbox. For a functional data project, a firm understanding of these core elements is essential.

I think there is only one programming language that all data scientists are expected to know, and you already mentioned it: SQL.

In many places, data scientists are expected to be familiar with Python, but it is rarely a strict requirement.

As a traditional software developer, you are often limited by the technology stack your company uses, but as a data scientist, not so much. If a data scientist comes up with a working model in R, it can always be re-implemented in a more production-friendly language.

The most important contribution of a data scientist is the model itself, not its technical implementation. You can go a long way just by knowing Python and SQL if your modeling skills are up to the task.
