What?
In the world of fitness and personal training, keeping track of exercise repetitions is crucial for tracking progress, setting goals, and ensuring proper form. While manually counting reps can be tedious and error-prone, modern computer vision techniques offer a more accurate and convenient solution. On top of that, there's huge value in processing historical data (time series analysis), which can be used to predict future outcomes, such as potential injuries, performance plateaus, or areas for improvement. In this blog post, we'll explore how to leverage the power of YOLOv8, a state-of-the-art object detection and pose estimation model, in conjunction with PyTorch and NVIDIA CUDA (which I covered in the previous post), to build an exercise counting application.
Why?
The YOLOv8-Pose Model: YOLOv8 is Ultralytics' iteration of the popular You Only Look Once (YOLO) family of object detection models. In addition to object detection, YOLOv8 also comes in pose estimation variants, capable of detecting 17 key points on the human body. This capability makes it an ideal choice for tracking exercises that involve specific body movements and positions. While tools like Google's MediaPipe also offer pose estimation capabilities, YOLOv8 stands out with its end-to-end approach, combining person detection and pose estimation into a single, unified model. This streamlined architecture can lead to faster inference times and improved overall performance, making it a compelling choice for real-time applications like exercise tracking.
The Power of PyTorch: One of the key advantages of YOLOv8 is its integration with PyTorch, a powerful and flexible deep learning framework. PyTorch's dynamic computation graph and intuitive programming style make it an excellent choice for research and innovation in the field of computer vision. By utilizing PyTorch, we can not only deploy pre-trained models like YOLOv8-Pose but also experiment with model architectures, fine-tune existing models, and develop custom solutions tailored to our specific needs. This flexibility is particularly valuable in the rapidly evolving field of computer vision, where new techniques and approaches are constantly emerging.
NVIDIA CUDA and Hardware Acceleration: While PyTorch provides a powerful framework for model development, harnessing the full potential of these models often requires hardware acceleration. This is where NVIDIA CUDA comes into play, offering a parallel computing platform and programming model that can significantly boost performance on NVIDIA GPUs. By leveraging CUDA, we can offload computationally intensive tasks, such as model inference and training, to the GPU, achieving faster processing times and enabling real-time applications like exercise tracking. This hardware acceleration not only improves the user experience but also facilitates more efficient model development and experimentation.
How?
The Approach: The exercise counting application I'm describing here utilizes the YOLOv8-Pose model to detect key points in the human body during exercise routines. By identifying discriminative key points and calculating the angles between specific key point lines, we can determine when a repetition is completed. For example, during a squat, we can track the angle between the hip, knee, and ankle points to determine when the user reaches the lowest position and returns to the starting position, counting one repetition.
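The counting logic described above can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the `joint_angle` and `knee_angle` helpers, the `RepCounter` class, and the 100/160 degree thresholds are assumptions made for the example. The key point indices follow the COCO ordering used by YOLOv8-Pose (11 = left hip, 13 = left knee, 15 = left ankle).

```python
import math

# COCO key point indices for the left leg (YOLOv8-Pose uses the COCO ordering)
LEFT_HIP, LEFT_KNEE, LEFT_ANKLE = 11, 13, 15


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    angle = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    angle = abs(angle)
    # Always report the inner angle (0..180 degrees)
    return 360.0 - angle if angle > 180.0 else angle


def knee_angle(keypoints):
    """Left knee angle from a sequence of 17 (x, y) key points."""
    return joint_angle(
        keypoints[LEFT_HIP], keypoints[LEFT_KNEE], keypoints[LEFT_ANKLE]
    )


class RepCounter:
    """Counts one repetition per 'down then back up' cycle (hypothetical thresholds)."""

    def __init__(self, down_angle=100.0, up_angle=160.0):
        self.down_angle = down_angle  # below this, the user is in the low position
        self.up_angle = up_angle      # above this, the user is standing again
        self.is_down = False
        self.count = 0

    def update(self, angle):
        if not self.is_down and angle < self.down_angle:
            self.is_down = True
        elif self.is_down and angle > self.up_angle:
            self.is_down = False
            self.count += 1
        return self.count


# Synthetic knee angles for two squats: stand, descend, stand, descend, stand
counter = RepCounter()
for angle in [170, 120, 90, 120, 170, 90, 170]:
    counter.update(angle)
print(counter.count)  # prints 2
```

Using two separate thresholds instead of one keeps small jitter around a single cut-off angle from being counted as extra repetitions.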
NOTE: I won't cover the environment setup, code prep, and compilation here, as all these routines are well-covered in the README file in my GitHub repository.
Sample code:
import cv2
import numpy as np
import math
from ultralytics import YOLO
from ultralytics.utils.plotting import Annotator, Colors
from copy import deepcopy


def calculate_angle(key_points, left_points_idx, right_points_idx):
    def _calculate_angle(line1, line2):
        # Calculate the direction (in radians) of each line, given as (x1, y1, x2, y2)
        slope1 = math.atan2(line1[3] - line1[1], line1[2] - line1[0])
        slope2 = math.atan2(line2[3] - line2[1], line2[2] - line2[0])
        # Convert radians to degrees
        angle1 = math.degrees(slope1)
        angle2 = math.degrees(slope2)
        ...
The source code contains the most interesting parts, such as loading the YOLO model, video capture prep, plotting the skeleton, and visualizing the results.
Before running the pose estimation, we need to prepare a model:
yolo export model=yolov8s-pose.pt format=engine device=0
This command uses the Ultralytics CLI to export the pre-trained yolov8s-pose.pt model as a TensorRT engine file on the first GPU (device=0). The engine can then be used for efficient pose estimation inference on NVIDIA GPUs. TensorRT engines are optimized for fast inference and can significantly improve the performance of deep learning models on NVIDIA hardware. Here's a quick outline:
On my Razer laptop (with NVIDIA GeForce RTX 4080), it took a while with quite heavy GPU utilization:
Now, after running the app:
python demo.py --sport squat --model yolov8s-pose.pt --show True --input 20240316_orig.mp4
We're getting the following results:
Please note the FPS rate... wow ;)
A higher FPS rate is generally desirable because it allows for more accurate tracking of fast movements, smoother visualization, and the ability to capture subtle details in the exercise form. In essence, we achieved this by running on an NVIDIA GPU with CUDA support and using efficient deep learning tooling, such as PyTorch with TorchScript or TensorRT integration. As a result, real-time processing is as smooth as processing pre-recorded videos.
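For readers who want to reproduce an FPS figure like this, here is a minimal sketch of how average throughput could be measured. The `measure_fps` helper and its `warmup` parameter are assumptions for illustration, and `process_frame` is a hypothetical stand-in for whatever runs inference on a single frame; it is not the measurement code used in the demo.

```python
import time


def measure_fps(process_frame, frames, warmup=0):
    """Average frames per second of process_frame over frames.

    The first `warmup` frames are processed but not timed, since the first
    few inference calls on a GPU are often slower than steady state.
    """
    timed = 0
    start = None
    for i, frame in enumerate(frames):
        if i == warmup:
            start = time.perf_counter()
        process_frame(frame)
        if i >= warmup:
            timed += 1
    if start is None or timed == 0:
        return 0.0  # nothing was timed (no frames, or all were warm-up)
    elapsed = time.perf_counter() - start
    return timed / elapsed if elapsed > 0 else float("inf")


# Stand-in workload: pretend each frame takes about one millisecond
fps = measure_fps(lambda frame: time.sleep(0.001), range(50), warmup=5)
print(f"{fps:.1f} FPS")
```

In a real run, `frames` would come from the video capture loop and `process_frame` would wrap the model call plus any drawing, so the number reflects the whole pipeline and not only inference.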
What's next?
While the current implementation of this exercise counting application is a solid basis, there's plenty of room to improve the quality, precision, and performance of such apps. The field of computer vision and AI is rapidly evolving, with new techniques, algorithms, and hardware capabilities emerging regularly. Application-wise, here are a few areas to think about:
Scoring System. While counting repetitions is valuable, evaluating the quality and precision of the exercises performed is also important. To address this, we can introduce a scoring system that takes into account factors such as range of motion, stability, and tempo.
def score_exercise(keypoints, exercise_type):
    # Calculate range of motion score
    rom_score = calculate_rom_score(keypoints, exercise_type)
    # Calculate stability score
    stability_score = calculate_stability_score(keypoints)
    # Calculate tempo score
    tempo_score = calculate_tempo_score(keypoints)
    # Combine scores with appropriate weights
    total_score = 0.4 * rom_score + 0.3 * stability_score + 0.3 * tempo_score
    return total_score
These scoring functions can be implemented using various techniques, such as tracking the distance between key points, analyzing the smoothness of key point trajectories, and monitoring the time between repetitions.
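To make this more concrete, here is one possible sketch of the three component scores. Everything in it is an assumption for illustration: the function names, their inputs (a series of joint angles, a key point trajectory, and repetition timestamps rather than raw key points), the target angles, and the jitter scale are all hypothetical. Only the 0.4/0.3/0.3 weights come from the snippet above.

```python
import numpy as np


def rom_score(angles, target_min=90.0, start_angle=170.0):
    """Fraction of the target range of motion that was reached, in [0, 1]."""
    reached = start_angle - float(np.min(angles))
    return float(np.clip(reached / (start_angle - target_min), 0.0, 1.0))


def stability_score(trajectory, scale=5.0):
    """Smoothness of a key point path given as (T, 2) pixel coordinates.

    Uses the mean magnitude of the second difference (a rough jitter
    measure): a perfectly steady path scores 1.0, shaky paths score lower.
    """
    traj = np.asarray(trajectory, dtype=float)
    if len(traj) < 3:
        return 1.0
    jitter = np.linalg.norm(np.diff(traj, n=2, axis=0), axis=1).mean()
    return float(1.0 / (1.0 + jitter / scale))


def tempo_score(rep_times):
    """Consistency of the time between repetitions, in [0, 1]."""
    intervals = np.diff(np.asarray(rep_times, dtype=float))
    if len(intervals) < 2 or intervals.mean() <= 0:
        return 1.0
    variation = intervals.std() / intervals.mean()  # coefficient of variation
    return float(np.clip(1.0 - variation, 0.0, 1.0))


# Example: a full-depth squat, a steady hip path, and evenly spaced reps
total = (
    0.4 * rom_score([170, 130, 90, 130, 170])
    + 0.3 * stability_score([(0, 0), (1, 1), (2, 2), (3, 3)])
    + 0.3 * tempo_score([0.0, 2.0, 4.0, 6.0])
)
print(round(total, 2))
```

Each component is normalized to the 0..1 range so that the weights stay meaningful; the right targets and scales would depend on the exercise and on the camera setup.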
Model Improvement. The combination of PyTorch and NVIDIA CUDA opens up exciting possibilities for model improvement. By leveraging techniques such as transfer learning, data augmentation, and advanced optimization algorithms, we can fine-tune and enhance the YOLOv8-Pose model to better suit our specific exercise tracking needs!
import torch
from ultralytics import YOLO

# Fine-tune the YOLOv8-Pose model on a custom exercise dataset
model = YOLO("yolov8n-pose.pt")
# Train on a CUDA-enabled GPU if one is available, otherwise fall back to the CPU
device = 0 if torch.cuda.is_available() else "cpu"
# `data` should point to the dataset configuration describing images and key points
model.train(data="path/to/exercise/dataset", epochs=100, imgsz=640, device=device)
For example, we could collect a custom dataset of exercise videos, annotate the key points, and use transfer learning to adapt the YOLOv8-Pose model to our data. This tailored model could potentially offer improved accuracy and robustness for detecting and tracking exercise repetitions, leading to a more reliable and effective exercise counting application.
Potential Areas for Further Research and Improvement. To further enhance the quality, precision, and performance of exercise tracking and analysis systems, several areas could be explored:
- Biomechanical Modeling: Integrating biomechanical models and physics simulations could improve the evaluation of exercise form, joint angles, and potential injury risks, providing more detailed feedback to users
- Personalized Exercise Programs: By combining pose estimation with user profiles and fitness goals, personalized exercise programs could be generated and tailored to individual needs and preferences
- Augmented Reality (AR) Guidance: Leveraging AR technology, real-time visual cues and overlays could be displayed, guiding users through proper exercise form and correcting any deviations
- Explainable AI: Developing more interpretable and transparent AI models could help users better understand the reasoning behind exercise evaluations and recommendations, fostering trust and encouraging active engagement
- Edge Computing and Deployment: Exploring efficient deployment strategies on edge devices, such as smartphones or wearables, could enable exercise tracking and analysis in more diverse settings without the need for powerful workstations or cloud infrastructure (NVIDIA Jetson is something definitely worthwhile exploring)
- Predictive Analysis: By combining pose estimation data with historical user performance, machine learning models could be trained to predict future outcomes, such as potential injuries, performance plateaus, or areas for improvement. This predictive analysis could be achieved by incorporating additional tools and techniques like:
  - Time-series Analysis: Analyzing the temporal patterns in pose data and exercise metrics to identify trends and make forecasts
  - Recurrent Neural Networks (RNNs): Utilizing RNNs, which are well-suited for sequential data, to capture the temporal dependencies in exercise data and make predictions
  - Reinforcement Learning: Employing reinforcement learning algorithms to learn optimal exercise programs or form adjustments based on predicted outcomes and user feedback
The predictive analysis could provide users with valuable insights and recommendations, helping them proactively address potential issues and optimize their training routines for better results and injury prevention.
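As a very small taste of the time-series side, the sketch below fits a straight line to recent per-session results and flags a possible performance plateau when the trend is nearly flat. The `trend_slope` and `is_plateau` helpers, the window size, and the slope threshold are all assumptions chosen for illustration; a real system would need far more data and a proper forecasting model.

```python
import numpy as np


def trend_slope(values):
    """Least-squares slope of values against their session index."""
    y = np.asarray(values, dtype=float)
    x = np.arange(len(y))
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)


def is_plateau(values, window=5, min_slope=0.05):
    """True if the last `window` sessions show almost no upward or downward trend."""
    recent = values[-window:]
    if len(recent) < 3:
        return False  # too little history to say anything
    return abs(trend_slope(recent)) < min_slope


# Reps per session: steady progress, then five sessions stuck at 20
history = [10, 12, 14, 16, 18, 20, 20, 20, 20, 20]
print(is_plateau(history[:5]))  # still improving
print(is_plateau(history))      # recent sessions are flat
```

The same idea extends to per-session scores or joint angle ranges; the linear fit is only the simplest possible trend estimate.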
Conclusion
Combining the power of YOLOv8-Pose, PyTorch, and NVIDIA CUDA, I've built an exercise counting application that tracks repetitions and provides a basis for evaluating the quality and precision of the exercises performed. This approach offers a more engaging and effective way for users to monitor their fitness progress, receive valuable feedback, and adjust their technique accordingly.
As we've seen, PyTorch's flexibility and NVIDIA CUDA's performance benefits enable us to push the boundaries of computer vision applications, continually improving and refining our models to better meet users' evolving needs. While this application currently supports squats, push-ups, and sit-ups, the modular nature of the codebase makes it easy to extend to other exercise types by adjusting the key point selection and angle thresholds. Additionally, the scoring system can be further refined and customized based on specific training goals and preferences. Technologies continue to advance, and NVIDIA Jetson is something I definitely want to play with next time!