A free extract from the online version of the Manning book Pandas Workout.

6 Grouping, joining, and sorting

 

So far, we have looked at how to create data frames, read data into them, clean the data, and then analyze that clean, imported data in a number of ways. But analysis often requires more than just the basics: we often need to break our input data apart, zoom in on particularly interesting subsets, combine data from different sources, transform the data into a new format or value, and then sort it according to a variety of criteria. This pattern is known in the Pandas world as "split-apply-combine," and it is our focus in this chapter. If you have experience with SQL and relational databases, you’ll find many similarities, in both principle and name, with functionality in Pandas.

For example, a company might want to find out its total sales in the last quarter. But it might also want to find out which countries have done particularly well (or poorly). Or perhaps the head of sales would like to see how much each individual salesperson has brought in, or how much each product has contributed to the company’s income.

These types of questions can be answered using a technique known as "grouping." Much like the GROUP BY clause in an SQL query, we can use grouping in Pandas to ask the same question for various subsets of our data.
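
The sales questions above can be sketched in a few lines. This is only an illustration, not one of the chapter's exercises: the `sales` data frame, its column names, and all of its values are invented for the example. It asks the same question (what is the sum of `amount`?) first for the whole data set and then for each country, salesperson, and product.

```python
import pandas as pd

# Hypothetical sales records; every name and number here is made up for illustration
sales = pd.DataFrame({
    'country': ['US', 'US', 'France', 'France', 'Japan'],
    'salesperson': ['Ana', 'Ben', 'Ana', 'Chloe', 'Ben'],
    'product': ['widget', 'gadget', 'widget', 'widget', 'gadget'],
    'amount': [100, 250, 75, 125, 300],
})

# Total sales across all rows
total_sales = sales['amount'].sum()

# The same question, asked once per subset: split by a column,
# apply sum to each group, and combine the results into a series
by_country = sales.groupby('country')['amount'].sum()
by_salesperson = sales.groupby('salesperson')['amount'].sum()
by_product = sales.groupby('product')['amount'].sum()

# Sort the per-country totals to see which countries did best
top_countries = by_country.sort_values(ascending=False)
print(top_countries)
```

Each `groupby` call returns one value per distinct key, much as a `SELECT country, SUM(amount) ... GROUP BY country` query would return one row per country in SQL.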

6.1 Useful references

6.2 Exercise 29: Longest taxi rides

6.2.1 Working it out

6.2.2 Solution

6.2.3 Beyond the exercise

6.3 Exercise 30: Taxi ride comparison

6.3.1 Working it out

6.3.2 Solution

6.3.3 Beyond the exercise

6.4 Exercise 31: Tourist spending per country

6.4.1 Working it out

6.4.2 Solution

6.4.3 Beyond the exercise

6.5 Summary
