Important Dates
- Submission Open: 20th June 2024
- Submission Deadline (for both Codalab submission and Paper): 6th August 2024 (FIRM) (extended from 29th July 2024)
- Notification of Results and Accepted Papers: 12th August 2024 (extended from 5th August 2024)
- Camera-Ready Deadline: 19th August 2024 (FIRM)
Cross-Cultural Spotting (CCS) Task
To facilitate the establishment of robust and transferable ME spotting methods, an unseen cross-cultural long video test set will be used to validate the efficacy of spotting algorithms. By “unseen” and “cross-cultural”, we mean that the data has not been publicly released, and consists of subjects from diverse ethnicities and cultures. All participating algorithms are required to run on this test set and submit their results for spotting micro- and macro-expressions.
The unseen test set, which was first used in MEGC2023, will contain at least 30 long videos curated from unreleased data of the SAMM and CAS(ME)3 datasets. Both these datasets have different video frame rates (SAMM: 200 fps; CAS(ME)3: 30 fps), which will challenge participants to submit techniques that are also robust toward temporal sampling. For learning-based techniques, participants can use any kind of training set or combination of training sets, from the vast selection of databases available.
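Because the recommended training sets and the unseen test set run at different frame rates, a duration threshold defined in seconds maps to a different number of frames per dataset. The sketch below is a simple illustrative helper (not part of any official tooling); the fps values are taken from the dataset list that follows.

```python
# Convert a duration threshold in seconds into frame counts for datasets
# recorded at different frame rates (fps values from the dataset list below).
DATASET_FPS = {"SAMM": 200, "CAS(ME)3": 30, "SMIC-E-long": 100, "4DME": 60}

def seconds_to_frames(duration_s: float, fps: int) -> int:
    """Number of frames spanned by duration_s at the given frame rate."""
    return round(duration_s * fps)

if __name__ == "__main__":
    # e.g. a 0.5 s micro-expression duration threshold (used later for auto-labelling)
    for name, fps in DATASET_FPS.items():
        print(f"{name} ({fps} fps): 0.5 s ~ {seconds_to_frames(0.5, fps)} frames")
```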
Recommended Training Databases
- SAMM Long Videos with 147 long videos at 200 fps (average duration: 35.5s).
- To download the dataset, please visit: http://www2.docm.mmu.ac.uk/STAFF/M.Yap/dataset.php. Download and fill in the license agreement form, then email it to M.Yap@mmu.ac.uk with the email subject: SAMM long videos.
- Reference: Yap, C. H., Kendrick, C., & Yap, M. H. (2020, November). SAMM long videos: A spontaneous facial micro- and macro-expressions dataset. In 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020) (pp. 771-776). IEEE.
- CAS(ME)2 with 97 long videos at 30 fps (average duration: 148s).
- To download the dataset, please visit: http://casme.psych.ac.cn/casme/e3. Download and fill in the license agreement form, then submit it through the website.
- Reference: Qu, F., Wang, S. J., Yan, W. J., Li, H., Wu, S., & Fu, X. (2017). CAS(ME)^2: A database for spontaneous macro-expression and micro-expression spotting and recognition. IEEE Transactions on Affective Computing, 9(4), 424-436.
- SMIC-E-long with 162 long videos at 100 fps (average duration: 22s).
- To download the dataset, please visit: https://www.oulu.fi/cmvs/node/41319. Download and fill in the license agreement form (please indicate which version/subset you need), then email it to Xiaobai.Li@oulu.fi.
- Reference: Tran, T. K., Vo, Q. N., Hong, X., Li, X., & Zhao, G. (2021). Micro-expression spotting: A new benchmark. Neurocomputing, 443, 356-368.
- CAS(ME)3 with 1300 long videos at 30 fps (average duration: 98s).
- To download the dataset, please visit: http://casme.psych.ac.cn/casme/e4. Download and fill in the license agreement form, then submit it through the website.
- Reference: Li, J., Dong, Z., Lu, S., Wang, S. J., Yan, W. J., Ma, Y., ... & Fu, X. (2022). CAS(ME)3: A third generation facial spontaneous micro-expression database with depth information and high ecological validity. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 2782-2800, doi: 10.1109/TPAMI.2022.3174895.
- 4DME with 270 long videos at 60 fps (average duration: 2.5s).
- To download the dataset, please visit: https://www.oulu.fi/en/university/faculties-and-units/faculty-information-technology-and-electrical-engineering/center-machine-vision-and-signal-analysis. Download and fill in the license agreement form, then email it to Xiaobai.Li@oulu.fi.
- Reference: Li, X., Cheng, S., Li, Y., Behzad, M., Shen, J., Zafeiriou, S., ... & Zhao, G. (2022). 4DME: A spontaneous 4D micro-expression dataset with multimodalities. IEEE Transactions on Affective Computing.
Unseen Test Dataset
- This year, we will be using the same unseen cross-cultural long-video test set as MEGC2023 to evaluate spotting algorithms' performances in a fairer manner.
- The unseen testing set (MEGC2023-testSet) contains 30 long videos: 10 long videos from SAMM (the SAMM Challenge dataset) and 20 clips cropped from previously unreleased videos in CAS(ME)3. The frame rate of the SAMM Challenge dataset is 200 fps and the frame rate of CAS(ME)3 is 30 fps. Participants should test on this unseen dataset.
- To obtain the MEGC2023-testSet, download and fill in the license agreement form of the SAMM Challenge dataset and the license agreement form of CAS(ME)3_clip, then upload the file through this link: https://www.wjx.top/vm/PpmFKf7.aspx#
- For requests from a bank or company, participants are required to ask their director or CEO to sign the form.
- Reference:
- Li, J., Dong, Z., Lu, S., Wang, S.J., Yan, W.J., Ma, Y., Liu, Y., Huang, C. and Fu, X. (2023). CAS(ME)3: A Third Generation Facial Spontaneous Micro-Expression Database with Depth Information and High Ecological Validity. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 2782-2800, 1 March 2023, doi: 10.1109/TPAMI.2022.3174895.
- Davison, A. K., Lansley, C., Costen, N., Tan, K., & Yap, M. H. (2016). SAMM: A spontaneous micro-facial movement dataset. IEEE Transactions on Affective Computing, 9(1), 116-129.
Evaluation Protocol
- Submissions will use the Codalab Competition Leaderboard.
- Participants should test their proposed algorithm on the unseen dataset by uploading the predicted result to the Codalab Leaderboard (https://codalab.lisn.upsaclay.fr/competitions/18524) where the evaluation metrics will be calculated.
- Evaluation metrics: F1-Score (Overall, SAMM, CAS), Precision (Overall, SAMM, CAS), Recall (Overall, SAMM, CAS)
- Submissions to the Leaderboard must be made in the form of a zip file containing the predicted CSV files with the following filenames (see the packaging sketch after this list):
  - cas_pred.csv (for the CAS(ME)3 samples)
  - samm_pred.csv (for the SAMM samples)
- An example submission is provided here: example_submission and example_submission_withoutExpType.
- Note: For submissions that do not label the expression type (me or mae), the labelling will be done automatically using an ME duration threshold of 0.5 s (15 frames for CAS and 100 frames for SAMM).
- The baseline method can be found in the following paper (please cite):
Zhang, L. W., Li, J., Wang, S. J., Duan, X. H., Yan, W. J., Xie, H. Y., & Huang, S. C. (2020, November). Spatio-temporal fusion for macro- and micro-expression spotting in long video sequences. In 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020) (pp. 734-741). IEEE.
- Previous submissions to MEGC2023 can be found here: Leaderboard
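To illustrate the expected packaging, the sketch below writes dummy spotting predictions into the two required CSV files and bundles them into a zip for the Codalab leaderboard. The file names come from the list above; the column layout (video id, onset frame, offset frame, expression type) is an assumption for illustration only, so follow the linked example_submission for the exact schema expected by the scorer.

```python
# Hedged sketch: package spotting predictions into the zip expected by the
# Codalab leaderboard. File names are those required above; the column layout
# (video id, onset frame, offset frame, expression type) is an illustrative
# assumption -- check the linked example_submission for the exact schema.
import csv
import zipfile

def write_predictions(path, rows):
    """Write rows of (video_id, onset_frame, offset_frame, exp_type) to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(rows)

# Dummy predictions, one per dataset (values are placeholders).
write_predictions("cas_pred.csv", [("cas_video_01", 120, 135, "me")])
write_predictions("samm_pred.csv", [("samm_video_01", 950, 1040, "me")])

# The leaderboard expects both CSV files inside a single zip archive.
with zipfile.ZipFile("submission.zip", "w") as zf:
    zf.write("cas_pred.csv")
    zf.write("samm_pred.csv")
```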
Spot-then-Recognize (STR) Task
Since the rapid advancement of ME research began about a decade ago, most works have focused on two separate tasks: spotting and recognition. Recognizing only the ME class can be unrealistic in real-world settings since it assumes that the ME sequence has already been identified - an ill-posed problem in the case of a continuously running video. On the other hand, spotting alone is limited in its applicability since it cannot interpret the actual emotional state of the person observed.
A more realistic setting, also known as "spot-then-recognize", performs spotting followed by recognition in a sequential manner. Only samples that have been correctly spotted in the spotting step (i.e. true positives) are passed on to the recognition step to be classified into their emotion classes. The task applies leave-one-subject-out (LOSO) cross-validation and is evaluated individually on the CAS(ME)2 and SAMM Long Video datasets using selected metrics. A minimal sketch of this pipeline is given after the reference below.
Reference:
- Liong, G.-B., See, J., & Chan, C. S. (2023). Spot-then-recognize: A micro-expression analysis network for seamless evaluation of long videos. Signal Processing: Image Communication, vol. 110, pp. 116875, January 2023, doi: 10.1016/j.image.2022.116875.
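A minimal sketch of the spot-then-recognize flow described above is shown here. The IoU >= 0.5 criterion for deciding whether a spotted interval is a true positive is an assumption based on common MEGC spotting practice, and spot_intervals / classify_emotion are placeholders for a participant's own spotting and recognition models.

```python
# Minimal sketch of the spot-then-recognize pipeline: spotting first, then
# emotion classification applied only to correctly spotted intervals.
# The IoU >= 0.5 true-positive criterion is an assumption, not the official rule.

def iou(a, b):
    """Temporal intersection-over-union of two (onset, offset) frame intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union

def spot_then_recognize(video, ground_truth, spot_intervals, classify_emotion):
    spotted = spot_intervals(video)                       # step 1: spotting
    true_positives = [s for s in spotted
                      if any(iou(s, g) >= 0.5 for g in ground_truth)]
    # step 2: only correctly spotted intervals are classified
    return [(s, classify_emotion(video, s)) for s in true_positives]
```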
Evaluation Protocol
- Submissions will use the Codalab Competition Leaderboard.
- Participants should upload the predicted results for both CAS(ME)2 and SAMM Long Video datasets to the Codalab Leaderboard (https://codalab.lisn.upsaclay.fr/competitions/19359) where specific evaluation metrics will be calculated.
- Since there are no specific train-test partitions, leave-one-subject-out (LOSO) cross-validation must be used to perform spotting on the held-out samples. The true positive (TP) intervals from the Spotting step should be passed on to the Analysis step to predict the emotion class. Sample code from here can be used as a template or reference.
- Evaluation metrics (for SAMM, CAS):
  - F1-score, for the Spotting and Analysis steps (higher is better).
  - Spot-then-Recognize Score (STRS), which is the product of the Spotting and Analysis F1-scores (higher is better); see the short sketch after this list.
- Submissions to the Leaderboard must be made in the form of a zip file containing the predicted CSV files with the following filenames:
  - cas_pred.csv (for the CAS(ME)2 samples)
  - samm_pred.csv (for the SAMM Long Video samples)
- An example submission is provided here: example_submission_STR.
- The evaluation script is available at https://github.com/genbing99/STRS-Metric.
- The baseline method can be found in the following paper (please cite):
Liong, G.-B., See, J., & Chan, C. S. (2023). Spot-then-recognize: A micro-expression analysis network for seamless evaluation of long videos. Signal Processing: Image Communication, vol. 110, pp. 116875.
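As stated in the metric list above, STRS is simply the product of the Spotting and Analysis F1-scores. The short sketch below makes the computation explicit; the f1_score helper is a generic definition and the numbers are placeholders, not output of the official evaluation script linked above.

```python
# STRS = Spotting F1 x Analysis F1, as defined in the evaluation protocol above.

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from true positive / false positive / false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def strs(spotting_f1: float, analysis_f1: float) -> float:
    """Spot-then-Recognize Score: product of the two step-level F1-scores."""
    return spotting_f1 * analysis_f1

# e.g. spotting F1 = 0.40, analysis F1 = 0.60 -> STRS = 0.24 (placeholder values)
print(strs(0.40, 0.60))
```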
Submission
Please note: The submission deadline is 11:59 p.m. Anywhere on Earth (AoE) on the stated deadline date.
- Challenge submission platform for CCS task: https://codalab.lisn.upsaclay.fr/competitions/18524
- Challenge submission platform for STR task: https://codalab.lisn.upsaclay.fr/competitions/19359
- Submission guidelines:
- Submitted papers (.pdf format) must use the ACM Article Template https://www.acm.org/publications/proceedings-template as used by regular ACM MM submissions. Please use the template in traditional double-column format to prepare your submissions. For example, Word users may use the Word Interim Template, and LaTeX users may use the sample-sigconf template.
- Grand challenge papers will go through a single-blind review process. Each grand challenge paper submission is limited to 4 pages with 1-2 extra pages for references only.
- For all other required files besides the paper, please submit them in a single zip file and upload it to the submission system as supplementary material. It is compulsory to include:
  - A GitHub repository URL containing the code of your implemented method, and all other relevant files such as feature/parameter data.
  - CSV files reporting the results, i.e. cas_pred.csv and samm_pred.csv (for both CCS and STR).
Winners
This year, fifteen teams participated in Track 1 (CCS) and four teams participated in Track 2 (STR). The top two submissions in each track (ranked by Overall F1-score for CCS and STRS for STR) were accepted for publication in the MM'24 ACM Proceedings after a review process.
Track 1: Cross-Cultural Spotting (CCS) Task
Rank | Contestant | Affiliation | Article Title | GitHub Link |
---|---|---|---|---|
1st Place 🥇 | Jun Yu¹, Yaohui Zhang¹, Gongpeng Zhao¹, Peng He¹, Zerui Zhang¹, Zhongpeng Cai¹, Qingsong Liu², Jianqing Sun², and Jiaen Liang² | ¹ University of Science and Technology of China (USTC); ² Unisound AI Technology Co. Ltd. | Micro-expression Spotting based on Optical Flow Feature with Boundary Calibration | https://github.com/new11-ops/2024 |
2nd Place 🥈 | Zhengye Zhang, Sirui Zhao, Xinglong Mao, Shifeng Liu, Hao Wang, Tong Xu, and Enhong Chen | University of Science and Technology of China (USTC) | A Multi-scale Feature Learning Network with Optical Flow Correction for Micro- and Macro-expression Spotting | https://github.com/zzy188zzy/megc_spotting_code |
Track 2: Spot-then-Recognize (STR) Task
Rank | Contestant | Affiliation | Article Title | GitHub Link |
---|---|---|---|---|
1st Place 🥇 | Jun Yu¹, Gongpeng Zhao¹, Yaohui Zhang¹, Peng He¹, Zerui Zhang¹, Zhao Yang², Qingsong Liu³, Jianqing Sun³, and Jiaen Liang³ | ¹ University of Science and Technology of China (USTC); ² Xi'an Jiaotong University, China; ³ Unisound AI Technology Co. Ltd. | Temporal-Informative Adapters in VideoMAE V2 and Multi-Scale Feature Fusion for Micro-Expression Spotting-then-Recognize | https://github.com/zgp123-wq/MEGC2024-STR |
2nd Place 🥈 | Yuhong He, Wenchao Liu, Guangyu Wang, Lin Ma, and Haifeng Li | Harbin Institute of Technology, China | Enhancing Micro-Expression Analysis Performance by Effectively Addressing Data Imbalance | https://github.com/hitheyuhong/MEGC2024-CODE |
Frequently Asked Questions
- Q: How should spotted intervals that overlap be handled?
  A: We consider that each ground-truth interval corresponds to at most one spotted interval. If your algorithm detects multiple overlapping intervals, you should merge them into an optimal interval (a simple merging sketch is given after this FAQ). The fusion method is also part of your algorithm, and the final evaluation only considers the optimal interval obtained.
- Q: For the STR challenge, how many classes are used in the classification part?
  A: You are required to classify emotions into only three classes: "negative", "positive", and "surprise". Only correctly spotted micro-expressions are passed on to the classification part, also known as Analysis (on the Leaderboard). The "other" class is not included in the evaluation calculation for the Analysis part. However, all occurrences, including those labelled with the "other" class, are considered in the Spotting part as they are micro-expressions.
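One simple way to implement the merging described in the first answer is to greedily fuse overlapping (onset, offset) predictions. This is only an illustrative strategy, not a required one, since the fusion method is part of each participant's algorithm.

```python
# Illustrative only: a greedy strategy for fusing overlapping spotted intervals
# into one interval each, as discussed in the FAQ above. Participants may use
# any fusion method; this is not part of the official evaluation.

def merge_overlapping(intervals):
    """Merge overlapping (onset, offset) frame intervals into non-overlapping ones."""
    merged = []
    for onset, offset in sorted(intervals):
        if merged and onset <= merged[-1][1]:         # overlaps the previous interval
            merged[-1] = (merged[-1][0], max(merged[-1][1], offset))
        else:
            merged.append((onset, offset))
    return merged

print(merge_overlapping([(10, 30), (25, 40), (100, 120)]))  # [(10, 40), (100, 120)]
```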