Data Structures for Data Analysis for Data Visualization

Meelad Doroodchi

With the 2022 World Cup looming this fall, I wanted to analyze the two most recent World Cups (2014 and 2018). Specifically, I wanted to compare the total number of goals in the 2018 World Cup with the total in 2014, to see the difference and what could possibly cause it. Each team's path through a World Cup is different, and I wanted to see how that played out in 2014 and 2018. In each dataset I looked at three variables: "Team", "Goals For", and "Goals Against". I then used two barplots, one per year, to compare the teams with the highest total goals, where total goals is goals for and goals against combined.

Top 5 Teams 2014 Top 5 Teams 2018

1.The Data

I drew the data from Kaggle. You can look at it for yourself below:

Importing Necessary Packages

The code below imports the packages I used to read the .csv files and to create the visualizations for my data.

picture of packages

2.The Structure

The 2014 dataset was approximately 1070 kilobytes, encoded as a .csv file. I opened the csv_2014 file, read it line by line with csv.reader, and for each row appended the team, goals for, and goals against to list_2014 as a dictionary, which keeps later lookups by field name fast. The 2018 dataset was approximately 1030 kilobytes, also encoded as a .csv file. I read it the same way, appending the same three variables to list_2018 as dictionaries.
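The loading step described above can be sketched roughly as follows. This is a minimal sketch, not the code from the screenshots: the helper name load_rows, the assumption that the first line of the file holds the column names, and the sample rows are all mine. Only csv.reader and the three column names come from the description above.

```python
import csv
import io


def load_rows(csv_file):
    """Read team rows from an open CSV file into a list of dictionaries.

    load_rows is a hypothetical helper name; it assumes the first line
    of the file is a header containing the three columns used here.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    team_col = header.index("Team")
    for_col = header.index("Goals For")
    against_col = header.index("Goals Against")

    rows = []
    for line in reader:
        rows.append({
            "Team": line[team_col],
            "Goals For": int(line[for_col]),
            "Goals Against": int(line[against_col]),
        })
    return rows


# Made-up sample standing in for a real file (not actual World Cup data).
sample = io.StringIO(
    "Team,Goals For,Goals Against\n"
    "Team A,10,4\n"
    "Team B,7,9\n"
)
list_sample = load_rows(sample)
```

With a real file, the same function would be called inside a `with open(path, newline="") as csv_2014:` block, where path is wherever the Kaggle .csv was saved.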

picture of code 2014
2014
picture of code 2018
2018

3.Binary Search Tree Code!

I wanted to do something that made this a little tougher. The code below builds the BST used to print the top 5 teams with the most total goals (goals for and goals against combined). I built one tree from list_2014 and one from list_2018. I used a Binary Search Tree so that the elements (each team's total goals) are kept in sorted order: every new total is compared with the nodes already in the tree and placed to the left if it is smaller or to the right if it is greater. The highest total therefore ends up at the rightmost node, and an in-order traversal visits the totals from lowest to highest.

picture of class code
Inorder code
picture of inorder code
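For readers who cannot see the screenshots, here is a rough sketch of how a BST keyed on total goals and its in-order traversal might look. It is an illustration under my own assumptions, not the exact code from the images: the class names, the method names, the choice to send ties to the right, and the sample rows are all hypothetical.

```python
class Node:
    """One team in the tree, keyed on its total goals."""

    def __init__(self, total, team):
        self.total = total
        self.team = team
        self.left = None
        self.right = None


class BST:
    def __init__(self):
        self.root = None

    def insert(self, total, team):
        """Place smaller totals to the left, equal or larger to the right."""
        new_node = Node(total, team)
        if self.root is None:
            self.root = new_node
            return
        current = self.root
        while True:
            if total < current.total:
                if current.left is None:
                    current.left = new_node
                    return
                current = current.left
            else:
                if current.right is None:
                    current.right = new_node
                    return
                current = current.right

    def inorder(self):
        """Return (total, team) pairs from lowest to highest total."""
        result = []

        def visit(node):
            if node is None:
                return
            visit(node.left)
            result.append((node.total, node.team))
            visit(node.right)

        visit(self.root)
        return result

    def top(self, n):
        """Return the n highest (total, team) pairs, highest first."""
        return self.inorder()[::-1][:n]


# Made-up rows in the same shape as list_2014 / list_2018.
rows = [
    {"Team": "Team A", "Goals For": 10, "Goals Against": 4},
    {"Team": "Team B", "Goals For": 7, "Goals Against": 9},
    {"Team": "Team C", "Goals For": 3, "Goals Against": 2},
    {"Team": "Team D", "Goals For": 12, "Goals Against": 8},
]

tree = BST()
for row in rows:
    tree.insert(row["Goals For"] + row["Goals Against"], row["Team"])

top_two = tree.top(2)
```

In this sketch the root is simply the first team inserted (Team A), while the highest total sits at the rightmost node, which is why the top teams are read from the end of the in-order traversal.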

4.The Analysis/Code

The code shown below prints the top 5 teams in order of total goals (along with the totals themselves) for the two datasets, which have been converted to lists.
Top 5 Teams code
The Binary Search Tree printed the following top 5 teams by total goals for 2014 and 2018.
Top 5 Teams printed

5.Visualization

Top 5 Teams 2014 Top 5 Teams 2018

After finishing the code and printing the top 5 teams for both years, I wanted to visualize the results with two seaborn barplots, to compare the total number of goals and make some inferences. Overall, I was pleased with how this project turned out, and I have a few thoughts and questions:


 
  1. In both World Cups, the team with the highest number of total goals did not finish 1st, which is interesting.
  2. In 2018 France won the World Cup with the 3rd-highest total goals. Could a team like the Netherlands have done the same in 2014?
  3. Could a team like Switzerland or Argentina still win the World Cup with the lowest total goals of the top 5?
  4. What other data structures could I have used?
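On question 4, one possible alternative is a heap. The sketch below uses heapq.nlargest from Python's standard library to pick the teams with the highest total goals without building a tree. This is a swapped-in technique, not what the project used, and the function name top_teams and the sample rows are made up for illustration.

```python
import heapq


def total_goals(row):
    """Total goals as defined in this post: goals for plus goals against."""
    return row["Goals For"] + row["Goals Against"]


def top_teams(rows, n=5):
    """Return the n rows with the highest total goals, highest first."""
    return heapq.nlargest(n, rows, key=total_goals)


# Made-up rows in the same shape as list_2014 / list_2018.
sample_rows = [
    {"Team": "Team A", "Goals For": 10, "Goals Against": 4},
    {"Team": "Team B", "Goals For": 7, "Goals Against": 9},
    {"Team": "Team C", "Goals For": 3, "Goals Against": 2},
    {"Team": "Team D", "Goals For": 12, "Goals Against": 8},
]

best_two = [row["Team"] for row in top_teams(sample_rows, n=2)]
```

For a list this small, simply calling sorted with the same key and reverse=True would work just as well.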

Looking back, one thing I could have done differently is to show the code with the HTML code tag instead of embedding it as images, but all in all I was pleased.