R You Curious About Sport Data?

How programming tools can help spark curiosity in sport science

By Dr Alice Sweeting

Data is now everywhere in professional team sports. Sport scientists can measure how far and how fast teams move during matches, estimate how much energy athletes use during training sessions, and record the number of skilled actions, like kicks or passes, that happen on the field or court. Off the field, fans can follow along with live, in-game statistics on their mobile or desktop devices. This rise in data availability, affordability and accessibility gives us the opportunity to dig into some really complex problems. However, given the large volume of data captured and the complexity of sport, it can be tricky to work with and make sense of it all. The challenge grows when you factor in how limited humans are, relative to machines, in their ability to process information and make decisions. How can we use analytical approaches and data tools to tackle these complex problems?

A Typical (Time-Consuming) Sport Science Workflow

Sport scientists are curious about how to improve something, and most likely that something is performance. Whilst this curiosity typically focuses on the athletes or teams with whom they work, it is also important that sport scientists broaden their horizons and stay curious about their own development. One skillset that may be useful for sport scientists to develop, whether they work heavily with technology or not, is computational thinking. This is the ability to clearly define an issue and apply rules or concepts to solve it. In simple terms, it is the capacity to break a problem down into small pieces and tackle each one individually, then repeat and refine the process until it becomes more accurate, efficient and/or automated.

This point about efficiency and automation is an important one. In professional team sport, sport scientists generally spend excessive amounts of time clicking, dragging and dropping data. Manually tagging match vision (video footage), for example, typically involves a sport scientist (or performance analyst) pressing a key in specialised software each time an athlete passes the ball to a teammate or a goal is scored. This data is then usually exported manually into a spreadsheet program, like Excel. The sport scientist will inevitably spend more time clicking, dragging and dropping to create figures and tables by hand. These graphs and tables normally make up a report or dashboard for other staff, coaches or athletes to read and digest. The sport scientist then repeats this laborious process, often daily or even more frequently.
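To make the idea of automating this workflow a little more concrete, here is a minimal Python sketch that breaks it into small steps, in the spirit of computational thinking: import the tagged events, count them, and format a summary table. It assumes the tagging software can export a CSV with `athlete`, `event` and `time_s` columns; those column names, the athlete IDs and the values are all made up for illustration, and real exports will differ.

```python
import csv
import io
from collections import Counter

# Made-up export from video-tagging software (hypothetical columns and values):
# one row per tagged event.
RAW_EXPORT = """athlete,event,time_s
A01,pass,12.4
A02,pass,30.1
A01,goal,45.9
A01,pass,61.0
A02,pass,75.3
"""


def load_events(text):
    """Step 1: import the tagged events as a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def count_events(events):
    """Step 2: count how often each athlete performed each tagged event."""
    return Counter((row["athlete"], row["event"]) for row in events)


def format_table(counts):
    """Step 3: turn the counts into a plain-text table for a report."""
    lines = [f"{'Athlete':<10} {'Event':<6} {'Count':>5}"]
    for (athlete, event), n in sorted(counts.items()):
        lines.append(f"{athlete:<10} {event:<6} {n:>5}")
    return "\n".join(lines)


def build_report(text):
    """Chain the small steps so the whole workflow reruns with one call."""
    return format_table(count_events(load_events(text)))


if __name__ == "__main__":
    print(build_report(RAW_EXPORT))
```

With a real export you would read the file (for example with `open(path, newline="")`) instead of the inline string; either way, rerunning the report after the next session is a single call to `build_report` rather than another round of clicking, dragging and dropping.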

An Excel-lent Problem

Spreadsheet programs are ubiquitous in sport science. There’s an undeniable usefulness in entering data into neatly defined cells, applying formulas, and being able to go back and edit or delete everything if needed. Spreadsheet programs can open files of different formats and even convert from one format to another, all at the click of a button. Data can be manually exported from a spreadsheet, for example into a PDF, which can then be (again, manually) attached to an email or dropped into a shared folder. However, spreadsheets are fraught with risk. Across 13 audits of real-world spreadsheets, an average of 88% contained errors. These errors work like compound interest: small errors accumulate as calculations feed into one another, so a reported error rate of 2% may translate into a 10% chance of the final result actually containing an error. Beyond the numbers, an error rate this high can mean the difference between an athlete receiving too much or too little training load, as calculated within the spreadsheet. Just like Goldilocks, we need to ensure our processes are just right!
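One way to build intuition for how errors compound is to assume, purely for illustration, that every cell in a chain of dependent calculations independently has a 2% chance of being wrong. The chance that at least one of n cells is wrong is then `1 - (1 - 0.02) ** n`, which for a chain of just five cells is already about 9.6%, close to the 10% figure above. A minimal Python sketch of that arithmetic:

```python
def prob_at_least_one_error(cell_error_rate, n_cells):
    """Chance that at least one of n_cells is wrong, assuming each cell
    independently has the same error rate (a simplifying assumption)."""
    return 1 - (1 - cell_error_rate) ** n_cells


if __name__ == "__main__":
    # Assumed 2% per-cell error rate, for chains of increasing length.
    for n in (1, 5, 20, 100):
        print(f"{n:>3} cells: {prob_at_least_one_error(0.02, n):.1%}")
```

Real spreadsheet errors are rarely independent, so treat this as a rough intuition for why longer chains of formulas are riskier, not as a model of any particular audit.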

Another real issue with spreadsheets is the limit on how much data can be entered or stored (current versions of Excel, for example, cap a worksheet at 1,048,576 rows). This was something I discovered during my own PhD, when I had rows and rows of netball athlete tracking data but had to split it by individual athlete and by quarter, then try to merge it all together again to (manually) create a figure. What a nightmare! In another case, a colleague added a new tab to one of my spreadsheets that contained transformations of the raw data. I had no idea what had happened or how the raw data had been altered, so I had to (manually) trace my way back and compare. These things may seem minor, but the lack of transparency in the workflow can be frustrating and extremely time-consuming to fix. I am sure I am not the only sport scientist who has spent considerable time collecting data, only to spend even more time trying to transform it all in a spreadsheet. The time spent clicking, dragging and dropping takes away from crucial time that could be spent interpreting the data and making recommendations.
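Splitting tracking data by athlete and quarter, then recombining it, is the kind of step a few lines of code can do transparently. The sketch below assumes hypothetical tracking samples stored as `(athlete_id, quarter, speed_m_per_s)` tuples; the IDs and speeds are invented. Every transformation lives in one documented function and the raw data are never overwritten, so a colleague can read exactly what was done.

```python
import statistics
from collections import defaultdict

# Hypothetical raw tracking samples: (athlete_id, quarter, speed in m/s).
# Stored as a tuple so the raw data cannot be edited in place.
RAW_SAMPLES = (
    ("A01", 1, 3.2), ("A01", 1, 4.0), ("A01", 2, 2.8),
    ("A02", 1, 5.1), ("A02", 2, 4.7), ("A02", 2, 4.9),
)


def mean_speed_by_athlete_quarter(samples):
    """Split samples by (athlete, quarter), then recombine them into one
    summary of mean speed per group. The input is read, never modified."""
    groups = defaultdict(list)
    for athlete, quarter, speed in samples:
        groups[(athlete, quarter)].append(speed)
    return {key: statistics.mean(speeds) for key, speeds in sorted(groups.items())}


if __name__ == "__main__":
    for (athlete, quarter), speed in mean_speed_by_athlete_quarter(RAW_SAMPLES).items():
        print(f"{athlete} Q{quarter}: {speed:.2f} m/s")
```

With a full season of tracking data, a data-frame library such as pandas would be the more usual choice, but the principle is the same: the transformation is written down once, in one place, instead of hidden in an extra tab.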

There R Better Ways

Thankfully, there are a number of tools other than spreadsheet programs to help sport scientists tackle modern, complex problems. These include free, open-source programming languages such as R and Python. The benefit of open-source languages for automating repetitive sport science tasks, as opposed to licence-based alternatives, is that code can be openly shared, so others can review and debug it, which reduces errors and ambiguity in workflows. Programming languages can automate the repetitive tasks sport scientists perform, importing, tidying and visualising data in just a few lines of code. The ability to quickly look at the distribution of a dataset or visualise the uncertainty in a recommendation, for example, is a valuable and time-efficient task that sport scientists can easily perform in an open-source language.
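As a rough illustration of looking at a distribution and its uncertainty, the sketch below summarises a small set of made-up session distances and uses a percentile bootstrap (one of several possible approaches) to put an interval around the mean. It sticks to Python's standard library so it runs anywhere; in practice you would probably reach for a data-frame and plotting library to visualise the result.

```python
import random
import statistics

# Made-up total distances (m) for ten athletes in one session.
distances = [5210, 4980, 6120, 5530, 4870, 5890, 6010, 5150, 5760, 5320]


def describe(values):
    """Quick look at the distribution: centre, spread and quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


def bootstrap_mean_ci(values, n_resamples=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for the mean: resample with replacement,
    recompute the mean each time, and read off the central `level` share."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lo_idx = round((1 - level) / 2 * n_resamples)
    hi_idx = round((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    print(describe(distances))
    print("95% interval for the mean:", bootstrap_mean_ci(distances))
```

Reporting the interval alongside the mean is one simple way to show staff how much confidence a recommendation based on ten athletes can really carry.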

Another benefit of learning a programming language is the ability to apply different analytical techniques to the same dataset. For example, applying a linear and a non-linear approach to a single dataset may lead to different questions and considerations. Both approaches, along with merging different data sources or handling continuous athlete tracking data, can be performed easily in a programming language but would require substantial time (and computing power) in a spreadsheet program. Similarly, data can be easily imported into a programming language and a report generated automatically, so the sport scientist can reclaim the time that would otherwise be spent clicking, dragging and dropping, and focus on more important things.

---

If you are curious about sport data, analytics and programming languages, take a look at our new Graduate Certificate in Data Analytics for Sports Performance at Victoria University. Learn how to critically appraise sport technologies, contextualise knowledge from analytical insights and devise ways that decision-support systems can help humans synthesise data.
