Overview and Motivation

As fans of European league soccer, we want to visualize soccer data to show how the sport is developing. Because much research already exists on the outcomes of games, we decided to explore our data from another perspective: the transfer market, which reflects not only the loyalty of players to their teams but also the development of leagues and teams. Our topic is player transfers in European league soccer, examined at two levels (league and team) and from two perspectives (the number of players transferred and the amount of money spent).

Thanks to the visualization class, we have learned not only technical methods but also many visualization concepts and criteria. Following the concept of “overview plus detail”, we built our visualization in two views: an intuitive overview of the data set and more accurate qualitative details.


The following questions have been answered at both the league level and the team level:

The following questions have been answered at the team level:


Our data comes from the Kaggle European Soccer Database, and the link is .

The data set contains 11 European leagues, more than 25,000 matches, and at least 10,000 players from the 2008 season to the 2014 season.

The data set is stored as several tables in a SQLite database, so we plan to join the tables to determine which team each player belongs to from year to year. From this we can infer the transfers of players.
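The comparison step can be sketched as follows. This is only a minimal illustration, assuming the joined tables have already been reduced to (player, season, team) records with one team per player per season; the record layout, the sample names, and the function `find_transfers` are hypothetical and are not taken from the database schema.

```python
from collections import defaultdict


def find_transfers(records):
    """Return (player, from_team, to_team, season) for every change of team.

    `records` is an iterable of (player, season, team) tuples; this layout
    is an assumption made for illustration.
    """
    by_player = defaultdict(list)
    for player, season, team in records:
        by_player[player].append((season, team))

    transfers = []
    for player, history in by_player.items():
        history.sort()  # order each player's seasons chronologically
        # Compare each season with the one before it.
        for (_, prev_team), (season, team) in zip(history, history[1:]):
            if team != prev_team:
                transfers.append((player, prev_team, team, season))
    return transfers


# Hypothetical sample data
records = [
    ("Player A", 2008, "Team X"),
    ("Player A", 2009, "Team X"),
    ("Player A", 2010, "Team Y"),
    ("Player B", 2008, "Team Z"),
    ("Player B", 2009, "Team X"),
]
transfers = find_transfers(records)
```

A transfer is recorded in the first season in which the player appears for the new team.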

We then plan to explore transfers from a different angle: the transfer fees among leagues and teams.

We crawled the source data from , but the team names we got differ slightly from those in our database, e.g. ‘Arsenal’ and ‘Arsenal F.C.’. To match teams between the transfer fee data and the database, we use the Levenshtein distance algorithm to measure the similarity of two team names; if the similarity is larger than a fixed threshold, we treat them as the same team.
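The matching idea can be sketched in plain Python. This is an illustrative sketch, not our exact code: the normalization of the distance by the longer name, the threshold value of 0.5, and the helper names `similarity` and `best_match` are all assumptions chosen for the example.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Scale the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def best_match(name, candidates, threshold=0.5):
    """Return the most similar candidate, or None if it is not above the threshold.

    The default threshold is a hypothetical value for illustration.
    """
    best = max(candidates, key=lambda c: similarity(name, c), default=None)
    if best is not None and similarity(name, best) > threshold:
        return best
    return None


db_teams = ["Arsenal", "Chelsea", "Liverpool"]  # sample database names
match = best_match("Arsenal F.C.", db_teams)
```

Here ‘Arsenal F.C.’ differs from ‘Arsenal’ by five inserted characters, so the similarity is 1 - 5/12, which is above the assumed threshold.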

Exploratory Data Analysis

The database is well structured and clean, so we do not need to do any data cleaning other than removing some None values.

The data is in a relational format, so we first need to join the useful tables, such as Player, Match, Team, and League, and then transform the data into a matrix format, which represents the transfer relationships among leagues and teams well. We then generate a CSV file that records which teams belong to each league.
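The matrix format can be illustrated with a small sketch. It assumes the transfers are already available as (source, target) pairs; the league names and the function `build_matrix` are placeholders for illustration only.

```python
def build_matrix(transfers, names):
    """Count transfers between entities as a square matrix.

    matrix[i][j] is the number of players moving from names[i] to names[j].
    """
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for source, target in transfers:
        matrix[index[source]][index[target]] += 1
    return matrix


# Hypothetical sample data
leagues = ["League A", "League B", "League C"]
transfers = [
    ("League A", "League B"),
    ("League A", "League B"),
    ("League B", "League C"),
    ("League C", "League A"),
]
matrix = build_matrix(transfers, leagues)
for name, row in zip(leagues, matrix):
    print(name, row)
```

The same function works at the team level by passing team names instead of league names.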

After these steps, we found that we could not get many insights from the data we already had, which only shows how many players transfer from league to league or from team to team, so we decided to find out how much each team in a league spends or earns from player transfers.

We found several websites that provide soccer player transfer fees for each year, and we decided to crawl the data from , which has the most comprehensive data for our needs. We used Python and the Beautiful Soup library to crawl the data.

The data we got is somewhat noisy because it includes records we do not need, such as loan fees and players transferred from outside the 11 leagues. We removed these records and then transformed the rest into the same matrix format that we used for the data from the database.
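The cleaning step can be sketched as a simple filter. The field names (`from_team`, `to_team`, `fee`, `is_loan`), the sample rows, and the function `clean_transfers` are hypothetical; the sketch also assumes that a transfer is dropped when either side is outside the tracked leagues, so that every remaining row fits in the matrix.

```python
def clean_transfers(rows, known_teams):
    """Keep only permanent transfers between teams in the tracked leagues."""
    kept = []
    for row in rows:
        if row["is_loan"]:
            continue  # drop loan fee records
        if row["from_team"] not in known_teams or row["to_team"] not in known_teams:
            continue  # drop transfers involving a team outside the tracked leagues
        kept.append(row)
    return kept


# Hypothetical sample data
known_teams = {"Team X", "Team Y", "Team Z"}
rows = [
    {"from_team": "Team X", "to_team": "Team Y", "fee": 10.0, "is_loan": False},
    {"from_team": "Team Y", "to_team": "Team Z", "fee": 1.5, "is_loan": True},
    {"from_team": "Outside Team", "to_team": "Team X", "fee": 4.0, "is_loan": False},
]
cleaned = clean_transfers(rows, known_teams)
```

Summing the `fee` of the remaining rows per (from_team, to_team) pair gives a fee matrix with the same shape as the player-count matrix.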

Using all this data, we can get a good sense of:

Design Evolution

Initial design

Our initial design contains three charts: a chord diagram, a force-directed diagram, and a line chart. We chose the chord diagram to show transfer relations between leagues because it is concise and space-saving compared with our alternative design, in which the league column is duplicated and lines are drawn between the two columns, like the links between layers in a neural network. A scale is added outside the circle to compensate for the circle's weakness at conveying quantities.

As we learned in class, vertices and edges can show relationships, so we chose the force-directed diagram to express the transfer relationships between teams. We planned to zoom in on a team to show its name and transfer numbers; this was discarded in our final design following the TAs’ suggestion, and more functions were added to this chart, which will be introduced later.

We plan to use a line chart to show the trend of player transfers for each league. One line stands for the number of players who transfer into the league, and another line is the net increase in the number of players in the league, obtained by subtracting transfer-out players from transfer-in players. The distance between the two lines represents the number of players who transfer out of the league. In the final design, we add a cumulative line chart, modeled on the baby name website shown in class, when two or more leagues are chosen. Moreover, to better reflect the mobility of players, we use the sum of the players transferred in and transferred out as the upper line and the number of players transferred in as the lower line, so the number of players transferred out can be read from the distance between the two lines.
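The arithmetic behind the lines can be sketched as follows. The per-season counts and the function `line_series` are made up for illustration; only the formulas follow the description above.

```python
def line_series(transfers_in, transfers_out):
    """Compute the per-season values of the lines for one league.

    Both arguments map a season to a player count.
    """
    series = []
    for season in sorted(transfers_in):
        n_in = transfers_in[season]
        n_out = transfers_out.get(season, 0)
        series.append({
            "season": season,
            "lower": n_in,          # final design: players transferred in
            "upper": n_in + n_out,  # final design: in plus out (mobility)
            "net": n_in - n_out,    # initial design: net increase
        })
    return series


# Hypothetical sample counts
transfers_in = {2008: 30, 2009: 25}
transfers_out = {2008: 20, 2009: 28}
series = line_series(transfers_in, transfers_out)
```

In both designs the gap between the two lines equals the number of players transferred out: `upper - lower` in the final design and `lower - net` in the initial one.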

Additionally, we plan to add a year brush to make it easier to choose a single year or a period spanning multiple years, and to add league logos to make it easier to choose a league. Both of these are used in our final design.

We have two optional charts. One is a table showing the ten players with the most transfers, and the other is a map showing the transfer path of a player selected from the table. The optional charts are not included in the final design, because we decided to focus on transfers at the league and team levels and, following the TA’s suggestion, adding the money information is more useful than our optional designs.

Our initial design can be seen in Figure 1 and Figure 2.