Quick Set-Up Guide: Python for Data Science
Welcome to the second post!
Quick Recap: This post is part of my blog series on how Python (and its various libraries) can increase your prowess as a data analyst.
Ready to join me on this journey? Awesome! But first, let's get you set up. If you're anything like me, learning by doing is the most efficient way to dive into a new subject. Let me guide you through setting up your environment so that you can actually follow along with the examples in this blog series. It shouldn't take more than an hour.
3 Quick Notes:
This set-up guide is geared towards newcomers to the world of programming. If you feel comfortable in a terminal or command prompt, then this guide is not for you. If you are looking for a summary of system requirements, then scroll down to the end of this post.
If you are looking to understand why one set-up is preferred over another, this guide is also not for you. This guide tells you which set-up you need on your local machine to use Python for data science work and visualizations. Just cut-and-dried instructions to get you ready to code for data analysis. If you are looking for detailed explanations of setting up environments, or you just want a deeper dive, scroll down to the end of this post for a list of resources.
Most important note: one extremely helpful lesson I have learned in my professional life is that using Google and sources such as stackexchange.com to search for troubleshooting solutions is a vital skill. Knowing how to search (i.e. knowing which keywords and sources to use) to find answers as you work through your specific projects and business problems is a skill that can be honed and become part of your arsenal as a professional. If you have any hiccups during set-up or at any point throughout this blog series, Google your issue for solutions! It is a good habit for any budding programmer or data analyst. Also, feel free to reach out to me!
What exactly are we downloading here?
Since the goal is to use Python for data science, we need to make sure we have all the right packages, versions, libraries, and so forth to do just that. The closest thing to waving a magic wand and having an instant spin-up is to download the Anaconda Distribution, which comes with Anaconda Navigator.
Straight from the source: "...the open source Anaconda Distribution is the easiest way to do Python data science and machine learning. It includes 250+ popular data science packages and the conda package and virtual environment manager for Windows, Linux, and MacOS."
Anaconda Navigator includes Jupyter Notebook, which we will be using throughout this whole blog series. For more information about using Navigator, see the Anaconda Navigator documentation.
STEP 1: Determine Your System Details
Your system does not have to match mine. If you have a 32-bit CPU or a Windows OS, be sure to choose the appropriate installation package (as indicated below).
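If a Python interpreter happens to be on your machine already (many Mac and Linux systems ship with one), a few lines using the standard-library platform module can report the details you need when picking an installer. Treat this as an optional sketch: the function name describe_system is my own invention, and looking up the same details in your system settings works just as well.

```python
import platform
import struct


def describe_system():
    """Collect the details needed to pick an installer."""
    return {
        "os": platform.system(),        # e.g. 'Darwin' (macOS), 'Windows', 'Linux'
        "machine": platform.machine(),  # CPU architecture string reported by the OS
        # Pointer size of the running Python build: 32 or 64 bits.
        # This describes the interpreter, which can be 32-bit even on a 64-bit CPU.
        "python_bits": struct.calcsize("P") * 8,
    }


if __name__ == "__main__":
    for key, value in describe_system().items():
        print(f"{key}: {value}")
```

The "machine" value is the one to compare against the 32-bit and 64-bit installer options.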
STEP 2: Download Anaconda
Go to this link:
Choose your operating system, as indicated below:
Choose the Graphical Installer and latest version of Python, as indicated below:
For Windows or Linux OS, choose the appropriate package for your CPU (32- or 64-bit), as indicated below:
STEP 3: Install Anaconda
Once you click on the download link, open the downloaded file (check your downloads folder if you don't see the file at the bottom of your screen):
Let's not waste great resources. The Anaconda documentation provides a well-written set of detailed instructions, which include screenshots and helpful notes for common errors during the installation process. There are even specific guides for each OS!
Click on the installation guide for your OS below, and start following from Step #3. Come back to this post when you're done:
Mac OS -- Follow steps #3 - 10: https://docs.anaconda.com/anaconda/install/mac-os#macos-graphical-install
Windows -- Follow steps #3 - 15: https://docs.anaconda.com/anaconda/install/windows
STEP 4: Install Bokeh
In this blog series, we will be primarily using Bokeh to create data visualizations. If you prefer or need to use other data-viz libraries, such as seaborn and matplotlib, please do! And consider this step optional. I prefer Bokeh for several reasons, but mainly because I created a Bokeh data visualization guide for myself.
Let's take a look at the easiest way to install Bokeh, straight from the source:
Okay, let's do that!
For Mac OS -- You'll need to open a terminal, as shown below:
Press 'Command ⌘' and the space bar to open Spotlight search. Then type in 'terminal' and press Enter.
For Windows OS -- You'll need to open a command prompt. Go to this link for a walkthrough:
Once you have the shell program open, type in the following command "conda install bokeh", as shown below:
You will eventually see the following display:
Once you see the display below, you'll be ready to test your new environment:
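If you would like a quick confirmation that the install landed where Python can see it, the sketch below asks Python whether it can locate a package without actually importing it. It relies only on the standard library; check_packages is a helper name I made up, and I am assuming the Jupyter Notebook package is importable under the name "notebook". Run it with the Python that Anaconda installed (for example, type python in the same shell and paste it in).

```python
import importlib.util


def check_packages(names):
    """Return {name: True/False} for whether each top-level package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # "bokeh" comes from the conda command above; "notebook" is assumed
    # to be the import name of Jupyter Notebook.
    for name, found in check_packages(["bokeh", "notebook"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If bokeh shows up as MISSING, the most likely cause is that the shell is using a different Python than the one Anaconda installed.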
STEP 5: Test Your New Environment!
First, let's create a new Jupyter Notebook file (extension .ipynb). There are several ways you can go about this, but I'm just going to walk you through the most efficient route.
Enter the command "jupyter notebook" in a terminal or command prompt, as shown below:
At first, it will seem like nothing has happened. Just wait a few seconds, and the shell will eventually begin spitting out lines that look like this:
Then your default web browser will open up automatically, and you will see a new Jupyter Notebook Navigation Screen as follows:
In the next blog post, I'll spend a little more time on the various features of Jupyter notebooks and how to use them. For now, we are just trying to verify that the set-up process was successful.
If you want, you can navigate to a specific folder where you plan to save your .ipynb files. I navigated to my 'Documents' folder.
To create a new Jupyter notebook file, click on 'New' to open a drop-down menu. Then click on 'Python 3' (or whatever version you have installed), as shown below:
A new tab should open up in your browser. Welcome to your first Jupyter notebook!
Again, I will go over how to use these things in the next blog post. For now, let's just run some code to make sure we have what we need.
Make sure that 'Code' is selected, as shown above.
If you want, click on 'Untitled' to rename your file.
Now let's try some code! Copy the code below and paste it into the first cell in your notebook:
from bokeh.plotting import figure, output_notebook, show

# prepare some data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# output in notebook
output_notebook()

# create a new plot with a title and axis labels
# (recent Bokeh releases use height/width and legend_label;
# older tutorials show plot_height/plot_width and legend)
p = figure(height=300, width=400, title="simple line example", x_axis_label='x', y_axis_label='y')

# add a line renderer with legend and line thickness
p.line(x, y, legend_label="Temp.", line_width=2)

# show the results
show(p)
Once you have all the code in the cell, click the 'run cell' button as shown below:
You should see the following output:
Success! I hope. If you got the expected output, go on to the next blog post for the real fun (or head back to the ebook if you are coming from the Python Visualization Handbook)!
If you've encountered some errors, be sure to check out the detailed resources below. Or feel free to reach out to me with questions!
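When something does go wrong, it helps to know exactly which Python your notebook is running before you start Googling. Here is a small sketch you can paste into a notebook cell; it uses only the standard library, and environment_report is just a name I picked. Including its output in your search or in a message to me usually speeds things up.

```python
import shutil
import sys


def environment_report():
    """Gather facts that are useful when asking for help."""
    return {
        "python_version": sys.version.split()[0],
        "python_executable": sys.executable,         # interpreter running this code
        "conda_on_path": shutil.which("conda"),      # None if conda cannot be found
        "jupyter_on_path": shutil.which("jupyter"),  # None if jupyter cannot be found
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If python_executable does not point somewhere inside your Anaconda folder, the notebook is probably running a different Python installation than the one you just set up.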
SUMMARY OF SYSTEM REQUIREMENTS
License: Free use and redistribution under the terms of the Anaconda End User License Agreement.
Operating system: Windows Vista or newer, 64-bit macOS 10.10+, or Linux, including Ubuntu, RedHat, CentOS 6+, and others.
Windows XP supported on Anaconda versions 2.2 and earlier. See Old package lists. Download it from the archive.
System architecture: 64-bit x86, 32-bit x86 with Windows or Linux, or Power8.
Minimum 3 GB disk space to download and install.
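The disk space requirement is easy to check from Python itself, if you have an interpreter available. The sketch below uses shutil.disk_usage from the standard library; free_space_gb is a hypothetical helper name, and I am assuming Anaconda will be installed under your home folder, which is the usual default but may not match your machine.

```python
import os
import shutil

REQUIRED_GB = 3  # minimum quoted in the requirements above


def free_space_gb(path):
    """Return free disk space, in gigabytes, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


if __name__ == "__main__":
    home = os.path.expanduser("~")  # assumption: install location is the home folder
    free = free_space_gb(home)
    status = "enough" if free >= REQUIRED_GB else "NOT enough"
    print(f"{free:.1f} GB free at {home}: {status} for the {REQUIRED_GB} GB minimum")
```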
-- Detailed Installation Information --
YouTube -- How to Install Anaconda (Windows 10):
Video -- Setting Python & Conda Path (Windows):
How to Setup a Python Environment for Machine Learning and Deep Learning with Anaconda:
Bokeh Documentation -- Quickstart: