Facial landmark detection is a computer vision topic that deals with the problem of automatically detecting distinctive features in human faces. Typical landmarks include:

  • Tip of the nose
  • Corners of the eyes
  • Corners of the eyebrows
  • Corners of the mouth
  • Eye pupils

The detected landmarks are used in several different applications.

Introduction

The following quote introduces facial landmark detection: "As computer vision engineers and researchers we have been trying to understand the human face since the very early days. The most obvious application of facial analysis is Face Recognition. But to be able to identify a person in an image we first need to find where in the image a face is located. Therefore, face detection — locating a face in an image and returning a bounding rectangle / square that contains the face — was a hot research area. In 2001, Paul Viola and Michael Jones pretty much nailed the problem with their seminal paper titled “Rapid Object Detection using a Boosted Cascade of Simple Features.” In the early days of OpenCV and to some extent even now, the killer application of OpenCV was a good implementation of the Viola and Jones face detector." (1)

Facial landmark detection is also referred to as “facial feature detection”, “facial keypoint detection” and “face alignment” in the literature. Today, a growing number of approaches are based on neural networks, and these outperform classical approaches.

Metrics

There are three different metrics which are commonly used to measure the performance of facial landmark detection.

  • Mean Distance: The mean distance between the detected landmarks and the ground truth landmarks.
  • Mean Error Rate: The mean distance between the detected landmarks and the ground truth landmarks, divided by the inter-ocular distance of the face in the image. The normalization makes the error independent of the size of the face.
  • Mean Failure Rate: A failure rate for detection is also an important metric. This rate is calculated as the percentage of the detected landmarks whose error or mean distance exceeds a certain threshold.
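The three metrics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: landmarks are assumed to be (x, y) tuples, distances are assumed to be Euclidean, a landmark counts as a failure when its normalized error exceeds the threshold, and the function names, the eye indices and the threshold of 0.1 are all hypothetical choices.

```python
import math

def point_errors(pred, truth):
    """Euclidean distance between each detected and ground truth landmark."""
    return [math.hypot(p[0] - t[0], p[1] - t[1]) for p, t in zip(pred, truth)]

def ocular_distance(truth, left_eye, right_eye):
    """Distance between the two ground truth eye landmarks (given by index)."""
    (lx, ly), (rx, ry) = truth[left_eye], truth[right_eye]
    return math.hypot(rx - lx, ry - ly)

def mean_distance(pred, truth):
    errors = point_errors(pred, truth)
    return sum(errors) / len(errors)

def mean_error_rate(pred, truth, left_eye, right_eye):
    """Mean distance normalized by the ocular distance."""
    return mean_distance(pred, truth) / ocular_distance(truth, left_eye, right_eye)

def failure_rate(pred, truth, left_eye, right_eye, threshold=0.1):
    """Fraction of landmarks whose normalized error exceeds the threshold (assumed convention)."""
    ocular = ocular_distance(truth, left_eye, right_eye)
    errors = point_errors(pred, truth)
    return sum(1 for e in errors if e / ocular > threshold) / len(errors)

# Toy data: left eye, right eye, nose tip (made-up coordinates).
truth = [(30, 40), (70, 40), (50, 60)]
pred = [(33, 44), (70, 40), (50, 65)]

md = mean_distance(pred, truth)
mer = mean_error_rate(pred, truth, left_eye=0, right_eye=1)
fr = failure_rate(pred, truth, left_eye=0, right_eye=1)
print(md, mer, fr)
```

In this toy example the ocular distance is 40 pixels, so a landmark that is 5 pixels off has a normalized error of 0.125 and is counted as a failure at the assumed threshold of 0.1.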

Network Structures

A typical state-of-the-art structure for a facial landmark detection neural network is shown in Figure 1.

Figure 1: Network architecture of a facial landmark detection neural network. Convolutions are applied to the input image first, followed by fully connected layers. (2)

The first part contains convolution and max pooling operations. This can be interpreted as searching for key features inside the input image. Subsequently, a fully connected layer is applied, whose output neurons represent the location of each landmark. At present, these networks can achieve a mean error rate of 8% (2).
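The data flow through such a network can be illustrated with a toy forward pass. The sketch below is not the architecture from Figure 1: it assumes a single convolution stage with ReLU activation, 2x2 max pooling, one fully connected layer and random (untrained) weights, and all names and sizes are hypothetical. A real system would use a deep learning framework and trained parameters.

```python
import random

def conv_relu(image, kernel):
    """Valid 2D convolution (cross-correlation, as in CNN layers) followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[max(0.0, sum(image[y + i][x + j] * kernel[i][j]
                          for i in range(kh) for j in range(kw)))
             for x in range(out_w)]
            for y in range(out_h)]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    return [[max(fmap[y + i][x + j] for i in range(size) for j in range(size))
             for x in range(0, len(fmap[0]) - size + 1, size)]
            for y in range(0, len(fmap) - size + 1, size)]

def fully_connected(features, weights, biases):
    """One output per weight row: dot product with the features plus a bias."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]

def predict_landmarks(image, kernels, weights, biases):
    """Convolution + pooling per kernel, flatten, then regress (x, y) pairs."""
    features = []
    for kernel in kernels:
        pooled = max_pool(conv_relu(image, kernel))
        features.extend(value for row in pooled for value in row)
    out = fully_connected(features, weights, biases)
    return list(zip(out[0::2], out[1::2]))

rng = random.Random(0)
num_landmarks = 5
image = [[rng.random() for _ in range(12)] for _ in range(12)]   # 12x12 grayscale input
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(2)]
num_features = 2 * 5 * 5        # 2 kernels, each 12x12 -> 10x10 -> 5x5 after pooling
weights = [[rng.uniform(-0.1, 0.1) for _ in range(num_features)]
           for _ in range(2 * num_landmarks)]
biases = [0.0] * (2 * num_landmarks)

landmarks = predict_landmarks(image, kernels, weights, biases)
print(landmarks)
```

The output layer has two neurons per landmark, one for each coordinate, which is one common way to read "neurons which represent the location of each landmark".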

Applications of Facial Keypoint Detection

There are several applications of keypoint detection in human faces. A few of them are listed below. (1)

Facial feature detection improves face recognition

Facial landmarks can be used to align facial images to an intermediary face shape so that the locations of the facial landmarks in all images are approximately the same after the alignment.
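One possible way to perform such an alignment is to fit a similarity transform (rotation, uniform scale and translation) that maps the detected landmarks onto the reference shape in the least-squares sense. The sketch below assumes 2D landmarks given as (x, y) tuples and represents them as complex numbers so that the fit has a closed form; the function names and the toy coordinates are hypothetical, and the choice of a similarity transform is an assumption rather than something prescribed by the cited work.

```python
def fit_similarity(source, target):
    """Least-squares similarity transform mapping source landmarks onto target landmarks.

    Points are treated as complex numbers z = x + yj. The result (a, b) defines
    the mapping z -> a * z + b, where a encodes rotation and scale and b the translation.
    """
    src = [complex(x, y) for x, y in source]
    dst = [complex(x, y) for x, y in target]
    src_mean = sum(src) / len(src)
    dst_mean = sum(dst) / len(dst)
    numerator = sum((w - dst_mean) * (z - src_mean).conjugate() for z, w in zip(src, dst))
    denominator = sum(abs(z - src_mean) ** 2 for z in src)
    a = numerator / denominator
    b = dst_mean - a * src_mean
    return a, b

def apply_similarity(a, b, points):
    """Apply z -> a * z + b to a list of (x, y) points."""
    mapped = [a * complex(x, y) + b for x, y in points]
    return [(z.real, z.imag) for z in mapped]

# Reference shape: left eye, right eye, mouth center (made-up coordinates).
reference = [(30.0, 40.0), (70.0, 40.0), (50.0, 70.0)]
# Simulated detection: the same face rotated by 90 degrees, scaled by 2 and shifted.
detected = apply_similarity(2j, complex(10, 5), reference)

a, b = fit_similarity(detected, reference)
aligned = apply_similarity(a, b, detected)
print(aligned)
```

The same transform could then be used to warp the image pixels, so that all faces in a dataset share roughly the same landmark positions.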

Male/Female Distinction

Automatic landmark detection can be utilized to distinguish between male and female faces. The spatial distribution of the landmarks is characteristic of male and female faces, respectively. Using neural nets and large datasets, this pattern can be learned and applied.

Facial Expression Distinction

It is possible to obtain information about the facial expression of a person using landmarks.

Head pose estimation

Once a few landmark points are known, it is possible to estimate the pose of the head as well. In other words, it is possible to determine how the head is oriented in space and in which direction the person is looking. This information can be used in several applications, for example driver assistance systems.
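A very rough impression of how landmarks relate to head pose can be obtained from 2D geometry alone. The sketch below is a strong simplification and only an assumption about how such an estimate might start: it computes the in-plane rotation (roll) from the line joining the two eyes and a crude, unitless left/right turn indicator from the horizontal offset of the nose tip. A full 3D pose estimate would typically fit a 3D face model to the 2D landmarks instead. The function names and coordinates are hypothetical.

```python
import math

def estimate_roll(left_eye, right_eye):
    """In-plane head rotation in degrees, from the line joining the eyes.

    left_eye is the eye that appears on the left side of the image. With image
    coordinates (y pointing down), a positive angle means the right eye is lower.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

def turn_indicator(left_eye, right_eye, nose_tip):
    """Offset of the nose tip along the eye axis, relative to the eye distance.

    Roughly 0 for a frontal face; the sign shows to which side the nose is shifted.
    This is a crude proxy for a left/right head turn, not an angle.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    eye_dist = math.hypot(dx, dy)
    mid_x = (left_eye[0] + right_eye[0]) / 2
    mid_y = (left_eye[1] + right_eye[1]) / 2
    # Project the nose offset onto the unit vector along the eye line.
    along = ((nose_tip[0] - mid_x) * dx + (nose_tip[1] - mid_y) * dy) / eye_dist
    return along / eye_dist

roll_frontal = estimate_roll((30, 40), (70, 40))
roll_tilted = estimate_roll((0, 0), (10, 10))
turn_frontal = turn_indicator((30, 40), (70, 40), (50, 60))
turn_side = turn_indicator((30, 40), (70, 40), (60, 60))
print(roll_frontal, roll_tilted, turn_frontal, turn_side)
```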

Face Morphing

Facial landmarks can be used to align faces that can then be morphed to produce in-between images. Figure 2 shows a visualization.
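The geometric part of morphing can be illustrated by linearly interpolating corresponding landmarks of two faces. This is only a sketch of the shape interpolation, assuming both faces have the same landmarks in the same order; producing the actual in-between image additionally requires warping and blending the pixels, which is not shown. The function name and the blending parameter alpha are hypothetical.

```python
def interpolate_landmarks(shape_a, shape_b, alpha):
    """Blend two landmark sets: alpha = 0 gives shape_a, alpha = 1 gives shape_b."""
    return [((1 - alpha) * xa + alpha * xb, (1 - alpha) * ya + alpha * yb)
            for (xa, ya), (xb, yb) in zip(shape_a, shape_b)]

face_a = [(0, 0), (10, 20)]     # made-up landmark coordinates
face_b = [(10, 10), (20, 40)]

halfway = interpolate_landmarks(face_a, face_b, 0.5)
print(halfway)
```

Evaluating the function for a sequence of alpha values between 0 and 1 yields the intermediate shapes of a morphing sequence.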

Virtual Makeover

The detected landmarks can be used to calculate the contours of the mouth, eyes, etc. in order to render makeup virtually. Figure 3 visualizes a virtual makeover application.

Face Replacement

If facial feature point estimates are available for two faces, it is possible to align one face to the other and seamlessly clone one face onto the other. Figure 4 shows the result of a face replacement application.

Figure 2: Face Morphing Example (3)

Figure 3: Virtual Makeover Example (4)

Figure 4: Face Replacement Example (5)

  1. https://arxiv.org/pdf/1511.04031v2.pdf (Facial Landmark Detection with Tweaked Convolutional Neural Networks; Yue Wu, Tal Hassner, KangGeon Kim, Gerard Medioni, Prem Natarajan)
  2. https://www.ics.uci.edu/~xzhu/paper/face-cvpr12.pdf (Face Detection, Pose Estimation, and Landmark Localization in the Wild; Xiangxin Zhu, Deva Ramanan)
  3. http://mmlab.ie.cuhk.edu.hk/projects/TCDCN.html (Facial Landmark Detection by Deep Multi-task Learning; Zhanpeng Zhang, Ping Luo, Chen Change Loy, Xiaoou Tang)
  4. http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Wu_Robust_Facial_Landmark_ICCV_2015_paper.pdf (Robust Facial Landmark Detection under Significant Head Poses and Occlusion; Yue Wu, Qiang Ji)
  5. http://www.learnopencv.com/face-swap-using-opencv-c-python/ (Face Swapping Example)
  6. http://www.taaz.com/ (Virtual Makeover - Application Examples)
  7. http://jivp.eurasipjournals.springeropen.com/articles/10.1186/s13640-016-0103-z (Head Pose Estimation - Application Example)
  8. https://www.kaggle.com/competitions (One of the most famous facial landmark detection competitions; Kaggle)

2 Comments

  1. Unknown user (ga69taq) says:

    General problems and suggestions:

    • Please refer to your images with figure labels if you wish to explain them in your text.
    • Do not use phrases such as "below", "above", "following" etc. .
    • Add image sources to your bibliography/weblinks instead of under the image without a hyperlink.
    • Do not use phrases such as "one can..." or "you can.." instead say "it is possible to..." or any passive formulation.
    • Avoid using subjective pronouns such as "we", "us" etc., use passive versions of the sentences.
    • Your bibliography style is inconsistent with the rest of the wiki.
    • Most of your weblinks are actually scientific papers and therefore belong to literature

    Corrections (note that some of the corrections have more than one changes):

    • Facial Landmark Detection is a cumputer vision topic and means detecting destinctive features in human faces automatically.
    • Facial Landmark Detection is a computer vision topic and it deals with the problem of detecting distinctive features in human faces automatically.
    • Eye-pupils
    • Eye pupils
    • These landmarks can be later used for several applications.
    • The detected landmarks are used in several different applications.
    • Computer vision engineers and researchers we have been trying to understand the human face since the very early days.
    • The human face and its features have been one of the important topics in computer vision.
    • The most obvious application of facial analysis is Face Recognition.
    • The most common application of facial analysis is Face Recognition. ("Obvious" is obviously your subjective assessment.)
    • But to be able to identify a person in an image we first need to find where in the image a face is located.
    • However, in order to identify a person in an image the location of the face within the image needs to be found.
    • was a hot research area.
    • has been a hot research area. ("had been" if you think past tense is an absolute must.)
    • In 2001, Paul Viola and Michael Jones pretty much nailed the problem with their seminal paper
    • In 2001, Paul Viola and Michael Jones worked on this problem providing a good solution in their seminal paper
    • In the early days of OpenCV and to some extent even now, the killer application of OpenCV was a good implementation of the Viola and Jones face detector.
    • The Viola and Jones' face detection algorithm is featured both in the early versions and to some extent in the later versions of OpenCV.
    • Once you have a bounding box around the face, the obvious research problem is to see if you can find the location of different facial features ( e.g. corners of the eyes, eyebrows, and the mouth, the tip of the nose etc ) accurately.
    • Once the bounding box around the face is found, the next research problem is to find the location of different facial features (such as corners of the eyes, eyebrows, and the mouth, the tip of the nose etc.) accurately.
    • in the literature, and you can use those keywords in Google for finding additional material on the topic.
    • (It is both lazy and a bad idea in general to tell your reader to "go Google it". Instead, you do the googling and give the links to those extra reading materials in your weblinks)
    • The first part contains convolutions and max-pooling operations.
    • The first part contains convolution operations and max pooling operations.
    • This can be understand as
    • This can be interpreted as
    • After that a fully connected layer is applied.
    • Subsequently, a fully connected layer is applied.
    • are activated respectivly.
    • are activated respectively.
    • With this networks a mean error of 8% (2) can be achieved today.
    • In the present day, these networks can achieve a mean error of 8% (2).
    • There are several interesting applications
    • There are several applications
    • to a mean face shape, so that after alignment the location of facial landmarks in all images is approximately the same.
    • to an intermediary face shape so that the location of the facial landmarks in all images are approximately the same after the alignment.
    • Intuitively it makes sense that facial recognition algorithms trained with aligned images would perform much better, and this intuition has been confirmed by many research papers.
    • Intuitively it makes sense that the facial recognition algorithms trained with the aligned images would perform much better, and this intuition is indeed supported by numerous work in the literature. (the sentence still needs a citation to those "research papers")
    • Automatical Landmark Detection can help to
    • Automatic landmark detection can be utilized to
    • With neural nets
    • Using neural nets
    • It is easily possible to get information about the facial expression of someone with use of landmarks.
    • It is possible to obtain information about the facial expression of a person using landmarks.
    • Once you know a few landmark points, you can also estimate the pose of the head.
    • Once a few landmark points are known, it is possible to estimate the pose of the head as well.
    • In other words you can figure out how the head is oriented in space, or where the person is looking.
    • In other words, it is possible to figure out how the head is oriented in space, to which direction the person is looking at.
    • This information can be used for several applications for example driver assistance systems or
    • This information can be used in several applications; for example driver assistance systems or (The sentence abruptly ends here, I believe you forgot to finish it)
    • If you have facial feature points estimated on two faces, you can align one face to the other, and then seamlessly clone one face onto the other.
    • If facial feature points estimations on two faces are present, it is possible to align one face on the other one, and seamlessly clone one face onto the other one.

    Final comments:

    • It is important to note that some of the corrections / suggestions are subjective and for you to decide.
    • Some of your sentences are formulated extremely subjectively.
    • Most of the issues I pointed out under the general topics are omitted in the correction list but I am pretty sure I still missed some stuff. Hopefully the next review will catch any mistakes I skipped.
  2. Unknown user (ga63muv) says:

    Hi, Martin,

    All the following arguments are highly subjective

    Aykin's comments covered almost all the shortcomings of your wiki, so I just added something from my view.

    Suggestions:

    • Since the images are not labeled with numbers, you should put them in the right place next to your description.
    • For the main part "Network Structure" in your wiki, I believe that a single picture with several sentences is not enough for an explanation.
    • Due to the fact that your wiki is an introduction to "Face Landmark Detection" from papers, it would be better (or not) if you add a conclusion part to your wiki to show the results.
    • You attached 8 web links, but in fact you used only 3 of them, how about the other 5 links? I guess you want the readers to go to the links after seeing the brief introduction behind them. It would be better if you write the introductions to those links in the form of "Related Work".

    Corrections:

    • The detected landmarks can be used to the calculate contours of the mouth, eyes etc. to render makeup virtually.

    • The detected landmarks can be used to the calculate contours of, for instance, mouth, eyes and so on, to render makeup virtually.

     

    Final Comments:

    • The suggestions and corrections above come just from my view, it may not be better than your original version.