Python is one of the most important languages for data science, machine learning, and general software development in academia and industry.
It is suitable not only for research and prototyping but also for building production systems (the Julia programming language is another option that aims to do both).
NumPy, short for Numerical Python, has long been a cornerstone of numerical computing in Python. It provides the data structures, algorithms, and library glue needed for most scientific applications involving numerical data in Python.
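A minimal sketch of NumPy's core data structure, the ndarray: arithmetic applies elementwise to the whole array without an explicit Python loop. The values are arbitrary and only for illustration.

```python
import numpy as np

# A 2 x 3 array of floats built from a nested list
data = np.array([[1.5, -0.1, 3.0], [0.0, -3.0, 6.5]])

# Elementwise arithmetic over every entry, no explicit loop needed
scaled = data * 10
summed = data + data

print(data.shape)  # (2, 3)
print(data.dtype)  # float64
print(scaled)
print(summed)
```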
pandas provides high-level data structures and functions designed to make working with structured or tabular data intuitive and flexible.
It provides convenient indexing functionality that lets you reshape, slice and dice, perform aggregations, and select subsets of data. Since data manipulation, preparation, and cleaning are such important skills in data analysis, pandas is a central tool for them.
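The indexing operations named above can be sketched on a tiny DataFrame. The table is hypothetical; the calls shown (boolean selection, `groupby`, `pivot`) are standard pandas methods.

```python
import pandas as pd

# Hypothetical table of sales records
df = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "product": ["a", "a", "b", "b"],
    "units": [10, 4, 7, 12],
})

# Select a subset of rows with a boolean mask
big_orders = df[df["units"] > 5]

# Aggregate: total units per region
totals = df.groupby("region")["units"].sum()

# Reshape: one row per region, one column per product
wide = df.pivot(index="region", columns="product", values="units")

print(big_orders)
print(totals)
print(wide)
```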
matplotlib is the most popular Python library for producing plots and other two-dimensional data visualizations.
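A minimal matplotlib sketch using the `pyplot` interface. It assumes you want the example to run without a display, so it selects the non-interactive `Agg` backend and writes the PNG to an in-memory buffer; in a Jupyter notebook you would normally drop the backend line and let the figure render inline.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.legend()

# Save the figure as PNG bytes in memory instead of a file on disk
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
png_bytes = buf.getvalue()
```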
SciPy is a collection of packages addressing a number of foundational problems in scientific computing.
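Two of SciPy's foundational subpackages, sketched on problems with known answers so the results are easy to check: numerical integration with `scipy.integrate.quad` and root finding with `scipy.optimize.brentq`. The choice of functions is illustrative only.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: area under sin(x) from 0 to pi (exact answer is 2)
area, abs_err = integrate.quad(np.sin, 0, np.pi)

# Root finding: solve x**2 - 2 = 0 on the bracket [0, 2] (exact answer is sqrt(2))
root = optimize.brentq(lambda x: x**2 - 2, 0, 2)

print(area, abs_err)
print(root)
```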
scikit-learn is a general-purpose machine learning toolkit for Python programmers.
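scikit-learn estimators share a `fit`/`predict` pattern. A minimal sketch with `LinearRegression`, assuming made-up data that lies exactly on the line y = 3x + 1 so the learned coefficients are easy to verify:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data on the line y = 3x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
y = 3 * X.ravel() + 1

model = LinearRegression()
model.fit(X, y)

print(model.coef_, model.intercept_)

# Predict for a new input x = 20
pred = model.predict(np.array([[20.0]]))
print(pred)
```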
statsmodels is a statistical analysis package.
We can install Python packages using pip or conda. Read more about pip vs. conda.
The author recommends:
Miniconda, a minimal installation of the conda package manager, along with conda-forge, a community-maintained software distribution based on conda.
This book uses Python 3.10 throughout.
Conda is a packaging tool and installer that aims to do more than pip does: it handles library dependencies outside of Python packages as well as the Python packages themselves. Conda also creates virtual environments, as virtualenv does.
Miniforge is the community-driven (conda-forge) minimal conda installer; subsequent package installations therefore come from the conda-forge channel.
Miniconda is the minimal conda installer driven by Anaconda (the company); subsequent package installations come from the Anaconda channels (defaults or otherwise).
Miniforge started because Miniconda didn't support aarch64. The PyPy developers quickly jumped on board, and in the meantime Miniforge versions have appeared for all Linux architectures as well as macOS.
AARCH64, sometimes also referred to as ARM64, is a CPU architecture developed by ARM Ltd., and a 64-bit extension of the pre-existing ARM architecture. ARM architectures are primarily known for their energy efficiency and low power consumption. For that reason, virtually all mobile phones and tablets today use ARM architecture-based CPUs.
Although AARCH64 and x64 (Intel, AMD, …) are both 64-bit CPU architectures, their internals are vastly different. Programs compiled for one platform won't run on the other (except through emulation or translation layers), and vice versa. That means software not only needs to be recompiled but often requires extensive optimization for each platform.
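Because installers and compiled packages are built per architecture, it can help to check which one your Python interpreter is running on. A small standard-library sketch; the exact strings vary by operating system (for example, Linux on ARM typically reports `aarch64`, macOS on Apple silicon reports `arm64`, and x64 machines report `x86_64` or `AMD64`).

```python
import platform
import sys

# Machine architecture as seen by this interpreter, e.g. "x86_64", "AMD64", "aarch64", "arm64"
arch = platform.machine()

# Pointer width of this Python build: 64 on a 64-bit interpreter
bits = 64 if sys.maxsize > 2**32 else 32

print(arch, bits)
```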
The first step is to configure conda-forge as your default package channel by running the following commands in a shell:
! conda config --add channels conda-forge
! conda config --set channel_priority strict
Warning: 'conda-forge' already in 'channels' list, moving to the top
Next, create a new environment, activate it, and install the essential packages used throughout the book (along with their dependencies) with conda install:
conda create -y -n pydata-book python=3.10     # create environment with Python 3.10 installed
conda activate pydata-book                     # activate environment
conda install -y pandas jupyter matplotlib     # install the essential packages
Then install the complete set of packages used in the book:
conda install lxml beautifulsoup4 html5lib openpyxl \
    requests sqlalchemy seaborn scipy statsmodels \
    patsy scikit-learn pyarrow pytables numba
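After installing, a quick way to confirm that each package is importable from the active environment is the standard library's `importlib.util.find_spec`, which locates a module without importing it. This is a sketch: the mapping from conda package names to import names is written out by hand (the helper name `check_installed` is hypothetical), and jupyter is left out because it is a metapackage of applications rather than one importable library.

```python
import importlib.util

# conda package name -> import name (they differ for a few packages)
IMPORT_NAMES = {
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "lxml": "lxml",
    "beautifulsoup4": "bs4",
    "html5lib": "html5lib",
    "openpyxl": "openpyxl",
    "requests": "requests",
    "sqlalchemy": "sqlalchemy",
    "seaborn": "seaborn",
    "scipy": "scipy",
    "statsmodels": "statsmodels",
    "patsy": "patsy",
    "scikit-learn": "sklearn",
    "pyarrow": "pyarrow",
    "pytables": "tables",
    "numba": "numba",
}


def check_installed(names=IMPORT_NAMES):
    """Return {package_name: bool}, True if the module can be found (not imported)."""
    return {pkg: importlib.util.find_spec(mod) is not None for pkg, mod in names.items()}


status = check_installed()
missing = [pkg for pkg, ok in status.items() if not ok]
print("missing:", missing or "none")
```

Anything listed as missing is a candidate for another conda install, falling back on pip only if conda does not provide it.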
While you can use both conda and pip to install packages, you should avoid updating packages originally installed with conda using pip (and vice versa), as doing so can lead to environment problems. I recommend sticking to conda if you can and falling back on pip only for packages which are unavailable with conda install.
Prefer conda install, but some packages are not available through conda, so if conda install $package_name fails, try pip install $package_name.
conda provides many commands for managing environments: creating, activating, deleting, and listing them.
Install tldr (https://github.com/tldr-pages/tldr): the tldr-pages project is a collection of community-maintained help pages for command-line tools that aims to be a simpler, more approachable complement to traditional man pages.
The Python community has adopted a number of naming conventions for commonly used modules:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import statsmodels as sm
This means that when you see np.arange, it refers to the arange function in NumPy. The convention exists because it is considered bad practice in Python software development to import everything from a large package like NumPy (from numpy import *).
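Keeping the `np.` prefix also makes it obvious which of several same-named functions you are calling. For example, Python's built-in `sum` and `np.sum` behave differently on a 2-D array:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

# Built-in sum iterates over the first axis (the rows) and adds them together
builtin_result = sum(arr)   # array([4, 6])

# NumPy's sum reduces over all elements by default
numpy_result = np.sum(arr)  # 10

print(builtin_result, numpy_result)
```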
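import numpy as np
x = np.random.random((64, 3, 32, 10))  # 4-D array of uniform random floats in [0, 1)
y = np.random.random((32, 10))         # 2-D array matching x's last two dimensions
z = np.maximum(x, y)                   # elementwise maximum; y is broadcast to x's shape, so z.shape == (64, 3, 32, 10)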
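The example works because `np.maximum` broadcasts `y` against `x`. The shape rule can be checked without allocating any data using `np.broadcast_shapes` (available in NumPy 1.20 and later); the incompatible `(3, 10)` shape is a made-up counterexample.

```python
import numpy as np

# Shapes are compared from the trailing dimension backwards: each pair of sizes
# must be equal or one of them must be 1; missing leading dimensions count as 1.
ok_shape = np.broadcast_shapes((64, 3, 32, 10), (32, 10))
print(ok_shape)  # (64, 3, 32, 10)

# (3, 10) fails: its 3 lines up against 32, and neither size is 1
try:
    np.broadcast_shapes((64, 3, 32, 10), (3, 10))
    incompatible = False
except ValueError as exc:
    incompatible = True
    print("cannot broadcast:", exc)
```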