Jul 6, 2017 @ 04:44 |
Researchers at Saarland University have presented the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB webcam.
More and more applications require capturing a person and their movements as a digital 3D model in real time: from virtual characters in computer games to motion analysis in sports and medical examinations. Until now, this was only possible with expensive camera systems. Computer scientists at the Max Planck Institute for Informatics have now developed a system that requires only a webcam. It can even compute the 3D pose from a pre-recorded video, for example from the YouTube online platform. This enables completely new applications, including motion analysis via smartphone.
“With our system, you can even create a 3D motion model in the Alps, in real time and with the camera of your smartphone,” says Dushyant Mehta, PhD student at the Max Planck Institute for Informatics (MPI). He developed the system together with his colleagues from the “Graphics, Vision and Video” group, which is headed by Professor Christian Theobalt.
“Up to now, this was only possible with several cameras or a so-called depth camera, which is, for example, integrated into Microsoft’s Kinect,” explains Srinath Sridhar, who also conducts research at the MPI.
The progress is made possible by a special kind of neural network, a so-called “convolutional neural network”, which has been causing a sensation in industry and business under the term “deep learning”. The researchers from Saarbrücken have developed a new method that computes a three-dimensional model of a person from the two-dimensional information of the video stream in a very short time. A short video on their website demonstrates this: a researcher juggles clubs in the depth of a room, while a monitor in the foreground shows the corresponding video recording, with the researcher’s figure overlaid by a thin red stick figure. No matter how fast the researcher moves or how far he reaches out, the line skeleton follows the same movements.
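The VNect paper describes the network’s 2D-to-3D step as follows: for each body joint, the CNN predicts a 2D heatmap together with three “location maps” holding root-relative X, Y and Z coordinates, and the joint’s 3D position is read off at the pixel where its heatmap peaks. A minimal sketch of that readout, using tiny hand-written arrays in place of real network outputs:

```python
# Sketch of the per-joint 3D readout: find the heatmap maximum, then look up
# the root-relative X/Y/Z coordinates at that pixel in the location maps.
# The 3x3 arrays below are toy stand-ins for real CNN output channels.

def argmax_2d(heatmap):
    """Return (row, col) of the largest value in a 2D list-of-lists."""
    best, best_rc = float("-inf"), (0, 0)
    for r, row in enumerate(heatmap):
        for c, v in enumerate(row):
            if v > best:
                best, best_rc = v, (r, c)
    return best_rc

def read_joint_3d(heatmap, x_map, y_map, z_map):
    """Look up the root-relative 3D position at the heatmap maximum."""
    r, c = argmax_2d(heatmap)
    return (x_map[r][c], y_map[r][c], z_map[r][c])

# Tiny example: the heatmap for one joint peaks at row 1, column 2.
heat = [[0.0, 0.1, 0.0],
        [0.0, 0.2, 0.9],
        [0.1, 0.0, 0.0]]
xm = [[0, 0, 0], [0, 0, 0.25], [0, 0, 0]]
ym = [[0, 0, 0], [0, 0, -0.10], [0, 0, 0]]
zm = [[0, 0, 0], [0, 0, 0.55], [0, 0, 0]]

print(read_joint_3d(heat, xm, ym, zm))  # -> (0.25, -0.1, 0.55)
```

In the actual system this lookup runs once per joint per frame; the resulting per-frame positions are then temporally filtered to obtain the stable skeleton seen in the video.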
The researchers have named their system “VNect”. Before estimating the person’s 3D pose, it first determines the person’s position in the image, so that no computing power is wasted on image regions that do not show the person. The neural network was trained on more than ten thousand annotated images. The current 3D pose is then output in the form of the corresponding joint angles, which can easily be mapped onto virtual characters.
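The last step above, going from estimated 3D joint positions to joint angles, is ordinary vector geometry: the flexion angle at a middle joint is the angle between the two adjacent bone vectors. A minimal sketch with hypothetical coordinates (the positions below are illustrative, not output of the real system):

```python
# Compute a joint's flexion angle from three 3D joint positions
# (parent, joint, child), e.g. shoulder -> elbow -> wrist.
import math

def joint_angle(parent, joint, child):
    """Angle in degrees at `joint` between the bones joint->parent and joint->child."""
    u = [p - j for p, j in zip(parent, joint)]   # bone toward the parent
    v = [c - j for c, j in zip(child, joint)]    # bone toward the child
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (nu * nv)))

# Hypothetical positions in metres: a right angle at the elbow.
shoulder = (0.0, 0.0, 0.0)
elbow    = (0.3, 0.0, 0.0)
wrist    = (0.3, -0.3, 0.0)
print(round(joint_angle(shoulder, elbow, wrist)))  # -> 90
```

Angles like this, one per articulated joint, are exactly the kind of parameterization a game engine or animation rig consumes, which is why a joint-angle output transfers so easily to virtual characters.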
“With VNect, more people will be able to control computer games through body movement in the future. They need neither an expensive depth camera nor several cameras, nor do they have to wear special markers. A webcam is enough. It even makes completely new experiences possible in virtual reality,” explains Mehta. In addition, VNect is the first system that requires only a video to create the 3D motion model of a person. “The range of possible applications for VNect is therefore enormous,” explains Professor Christian Theobalt, who heads the “Graphics, Vision and Video” group at the MPI. “It extends from human-computer interaction and human-robot interaction to Industry 4.0, where humans and robots work side by side. Or think of autonomous driving.”
Yet VNect still has its limits. The accuracy of the new system is somewhat lower than that of systems based on multiple cameras or markers. VNect also runs into trouble when the person’s face is hidden, or when movements are too fast or deviate too much from the poses seen during training. Several people in front of the camera likewise cause VNect problems.
However, MPI researcher Srinath Sridhar is convinced that VNect will continue to evolve and will eventually handle such complex scenes, so that it can be used without problems in everyday life.
- Keywords: off-the-shelf webcams, 3D motion model, VNect, machine learning.
- Source: Saarland University
- Image: Saarland University