[Ted Petrou] Data Analysis with Python (2021)


VIP member
Apr 1, 2021
Author: Ted Petrou
Title: Data Analysis with Python (2021)


The most comprehensive course on data analysis and visualization in Python.
350 exercises, 800 pages of text, and numerous projects with their solutions.

What we get:

70+ Jupyter Notebooks where you can read the text, complete exercises, and add notes
800+ pages of PDF text that lets you search for specific content or read offline
240+ pages of PDF with detailed solutions to all exercises
13 hours of video with detailed, clear explanations of the text and the exercise solutions


1. Intro to pandas

In Python, pandas is a popular and powerful library to explore, analyze, and visualize data. You will be introduced to the DataFrame and the Series, the two main containers of data within pandas. You will learn the components of these objects and a few basic operations.
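
As a rough illustration of the two containers, here is a minimal sketch. The data and column names are made up for this example and are not taken from the course.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration
df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cy"], "score": [88, 92, 75]},
    index=["a", "b", "c"],
)

# A DataFrame has three components: the index, the columns, and the values
print(df.index)
print(df.columns)
print(df.to_numpy())

# Selecting a single column returns a Series, which has an index and values
score = df["score"]
print(type(score))
```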

Intro to pandas is available to take for free and is bundled together with the next part, Selecting Subsets of Data.

2. Selecting Subsets of Data

One of the most common tasks during a data analysis is to select some subset of the data. In pandas, you can select data by row/column label or integer location as well as with conditional logic applied to the values. Although this is a rather simple task, pandas offers multiple ways to complete it, which causes confusion to the novice user.

In this part, you will get clear guidance on best practices for subset selection. You will also learn which methods of subset selection to avoid.
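
The three kinds of selection mentioned above might look like the sketch below. It uses `loc`, `iloc`, and a boolean condition on a small made-up DataFrame; whether these are exactly the practices the course recommends is an assumption.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration
df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cy"], "age": [31, 25, 40]},
    index=["r1", "r2", "r3"],
)

# Select by row and column label
by_label = df.loc["r2", "age"]

# Select by integer location (second row, second column)
by_position = df.iloc[1, 1]

# Select with conditional logic applied to the values
older = df[df["age"] > 30]
print(older)
```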

3. Essential Series Commands

In this part, you'll begin performing calculations on your data. You'll start by learning how to operate on a single column of data, a pandas Series. You'll learn the difference between methods that aggregate (return a single value) and those that do not. You'll also learn how to access string-only and datetime-only operations to process Series with those specific data types.
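
A small sketch of these ideas, using made-up values: `sum` aggregates to a single value, `cumsum` does not, and the `.str` and `.dt` accessors expose the string-only and datetime-only operations.

```python
import pandas as pd

s = pd.Series([3, 1, 2])

# An aggregating method returns a single value
total = s.sum()

# A non-aggregating method returns a Series of the same length
running = s.cumsum()

# String-only operations are reached through the .str accessor
names = pd.Series(["ana", "ben"])
upper = names.str.upper()

# Datetime-only operations are reached through the .dt accessor
dates = pd.Series(pd.to_datetime(["2021-04-01", "2021-12-25"]))
months = dates.dt.month
```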

4. Essential DataFrame Commands

After learning how to operate on a single column of data, you'll learn how to operate on multiple columns at the same time by calling methods on a DataFrame. You'll learn how to change the direction of the operations from vertical to horizontal.
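
Changing the direction of an operation is usually done with the `axis` parameter. A minimal sketch with invented data:

```python
import pandas as pd

# Hypothetical data, invented purely for illustration
df = pd.DataFrame({"q1": [1, 2], "q2": [10, 20]})

# Default direction (axis="index"): one result per column
col_sums = df.sum()

# Horizontal direction (axis="columns"): one result per row
row_sums = df.sum(axis="columns")
```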

5. Data Types

There are a huge number of data types that are available for your DataFrames. In this part, you'll get a comprehensive tour of the exact definitions of each data type and how to convert to and from each one.

You'll also learn about the categorical data type, which is unique to pandas and can save a large amount of memory for columns with few distinct values.
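
The memory effect can be checked directly. The sketch below builds a made-up column with only three distinct strings and compares its memory usage before and after converting to the categorical type; the exact byte counts depend on the pandas version, so none are shown.

```python
import pandas as pd

# 30,000 strings, but only three distinct values (made-up data)
colors = pd.Series(["red", "green", "blue"] * 10_000)

# Convert to the categorical data type
as_category = colors.astype("category")

string_bytes = colors.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(string_bytes, category_bytes)
print(as_category.cat.categories)
```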

6. Grouping Data

Up to this point in the course, all operations were applied to the entire dataset. You will learn how to apply operations to independent groups within your data instead of the whole.

You will also learn how to display the results of grouping in a more human-readable way with pivot tables.

Grouping data can be tricky in pandas and has the potential to be one of the slowest operations. You will learn best practices for optimizing performance, along with the newest syntax available.
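
For a sense of what grouping and pivoting look like, here is a minimal sketch on invented data. It uses named aggregation in `agg`, which is one relatively recent syntax; that this is the syntax the course refers to is an assumption.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration
df = pd.DataFrame({
    "dept": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [10, 20, 30, 50],
})

# Apply operations to each group independently (named aggregation)
summary = df.groupby("dept").agg(
    total_sales=("sales", "sum"),
    mean_sales=("sales", "mean"),
)

# Show the same kind of grouping in a wider, more readable layout
wide = df.pivot_table(index="dept", columns="year", values="sales", aggfunc="sum")
print(summary)
print(wide)
```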

7. Time Series

A time series is a sequence of data observed over a period of time. The entire set of observed data is ordered by its time component. You will learn how to sample time series data at evenly spaced intervals, operate over a rolling window of time, and group by any time period you desire.
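
A short sketch of two of these operations on a made-up daily series: `resample` groups the data into evenly spaced periods, and `rolling` operates over a moving window.

```python
import pandas as pd

# Six consecutive days of made-up values
idx = pd.date_range("2021-01-01", periods=6, freq="D")
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Group into two-day periods and sum each period
every_two_days = s.resample("2D").sum()

# Mean over a rolling window of three observations
# (the first two results are missing because the window is not yet full)
rolling_mean = s.rolling(window=3).mean()
```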

8. Regular Expressions

Regular expressions are a miniature programming language on their own that help you match patterns within text. They can be extremely useful when combined with the pandas string-only methods to manipulate and analyze strings in almost any way.
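
As an illustration, the sketch below combines a simple pattern (`\d+`, one or more digits) with two of the pandas string methods. The sample strings are made up.

```python
import pandas as pd

# Hypothetical text data, invented purely for illustration
s = pd.Series(["order 123 shipped", "no number here", "order 45 pending"])

# Does each string contain at least one run of digits?
has_number = s.str.contains(r"\d+", regex=True)

# Pull out the first run of digits; rows without a match become missing
numbers = s.str.extract(r"(\d+)", expand=False)
```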

9. Tidy Data

Tidy data is a structure of data that makes analysis easier. Often, it is necessary to rearrange, transform, and extract data so that it conforms to tidy data principles. You will learn how to tidy a variety of 'messy' data sets with the tools given to you by pandas.
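
One common kind of 'messy' data stores values of a variable (here, years) as column names. A minimal sketch of tidying such a table with `melt`, using invented data:

```python
import pandas as pd

# Messy: the year is spread across column names (made-up data)
messy = pd.DataFrame({"name": ["Ana", "Ben"], "2020": [1, 2], "2021": [3, 4]})

# Tidy: one row per (name, year) observation, one column per variable
tidy = messy.melt(id_vars="name", var_name="year", value_name="score")
print(tidy)
```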

10. Joining Data

In this part, you will learn how to work with multiple data sets together. You will learn how pandas implicitly aligns on the index when combining datasets, which often causes problems for novices. You will also learn how to make SQL-like joins by interacting with a relational database.
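
The alignment behavior is easiest to see on a small example. In the sketch below (made-up data), adding two Series matches values by index label rather than by position, and labels found in only one Series produce missing values. The second half shows a SQL-like inner join done in memory with `merge` rather than through an actual database.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "w"])

# Values are matched by label, not position; "x" and "w" have no partner
total = a + b
print(total)

# A SQL-like inner join on a shared key column (hypothetical tables)
left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [88, 92, 75]})
inner = left.merge(right, on="id", how="inner")
print(inner)
```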

11. Visualization with Matplotlib

A good visualization can make for easier understanding and decision making. In this part, you will learn a straightforward approach to using the powerful, yet confusing library matplotlib.
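
One approach that many find straightforward is to create a figure and axes explicitly and call methods on the axes. Whether this is the approach the course teaches is an assumption; the sketch below uses made-up data and a non-interactive backend so it can run without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, chosen only for this sketch
import matplotlib.pyplot as plt

# Create the figure and a single axes explicitly
fig, ax = plt.subplots(figsize=(4, 3))

# All plotting and labeling is done through methods on the axes
ax.plot([1, 2, 3], [2, 4, 8], label="made-up data")
ax.set_title("Example")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```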

12. Visualization with Pandas and Seaborn

You will learn how to plot data using pandas, before simplifying the process with the seaborn library.
