Results

Part-of-Speech Tagging – Open Task

Participants	POS TA	CPOS TA	POS UWTA	CPOS UWTA	Rank
SemaWiki 2	96.75%	97.03%	94.62%	95.30%	1
SemaWiki 1	96.44%	96.73%	94.27%	95.07%	2
SemaWiki 4	96.38%	96.67%	93.13%	93.81%	3
SemaWiki 3	96.14%	96.42%	92.55%	93.24%	4
Pianta	96.06%	96.36%	92.21%	93.24%	5
Lesmo	95.95%	96.26%	92.33%	93.01%	6
Tamburini 1	95.93%	96.40%	90.95%	92.67%	7
Tamburini 2	95.63%	96.16%	91.07%	92.78%	8

Part-of-Speech Tagging – Closed Task

Participants	POS TA	CPOS TA	POS UWTA	CPOS UWTA	Rank
Felice_ILC	96.34%	96.91%	91.07%	93.36%	1
Gesmundo	95.85%	96.48%	91.41%	93.81%	2
SemaWiki 2	95.73%	96.52%	90.15%	93.47%	3
SemaWiki 1	95.24%	96.00%	87.40%	90.72%	4
Pianta	93.54%	94.10%	85.45%	87.74%	5
Rigutini 2	93.37%	94.15%	86.03%	88.43%	6
Rigutini 3	93.31%	94.15%	6.03%	88.55%	7
Rigutini 4	93.29%	94.17%	85.34%	88.09%	8
Rigutini 1	93.10%	93.76%	84.54%	87.06%	9
CSTSøgaard 1	91.90%	93.21%	86.03%	89.58%	10
CSTSøgaard 2	91.64%	93.21%	86.14%	89.92%	11

Dependency Parsing Track

Dependency parsing MDS: evaluation on the full test set (240 sentences)

Participants	LAS	UAS	p-value
UniTo Lesmo DPAR	88.73	92.28	0.472
UniPi Attardi DPAR	88.67	92.72	0.0001
FBKirst Lavelli DPAR	86.5	90.96	0.005
UniAmsterdam Sangati DPAR	84.98	89.07	0.0001
UniCopenhagen Soegaard DPAR	80.42	89.05	0.0001
CELI Dini DPAR	68	77.95	–

Dependency parsing PDS: evaluation on the full test set (260 sentences)

Participants	LAS	UAS	p-value
UniPi Attardi DPAR	83.38	87.71	0.0001
FBKirst Lavelli DPAR	80.54	84.85	0.0012
UniCopenhagen Soegaard DPAR	78.51	85.81	0.0001
UniTo Lesmo DPAR	73.44	80.80	0.0001
CELI Dini DPAR	57.81	64.10	–

Dependency parsing MDS: separate evaluation on the shared test set (100 sentences from newspapers), the civil law test set (100 sentences) and the passage test set (40 sentences)

Participants	shared		civillaw		passage
	LAS	UAS	LAS	UAS	LAS	UAS
UniPi Attardi DPAR	82.60	95.02	92.63	95.38	90.10	92.90
UniTo Lesmo DPAR	84.68	89.73	91.54	94.64	89.36	91.58
FBKirst Lavelli DPAR	79.91	87.15	90.23	93.33	89.11	91.75
UniAmsterdam Sangati DPAR	76.66	87.99	89.93	95.51	87.87	93.89
UniCopenhagen Soegaard DPAR	72.84	81.93	86.04	90.27	80.94	85.31
CELI Dini DPAR	63.86	70.15	70.74	74.97	68.89	73.35

Dependency parsing PDS: separate evaluation on the shared test set (100 sentences from newspapers) and the remaining test corpus (160 sentences)

Participants	shared		rest
	LAS	UAS	LAS	UAS
UniPi Attardi DPAR	84.67	88.99	82.70	87.04
FBKirst Lavelli DPAR	81.12	85.02	80.24	84.76
UniCopenhagen Soegaard DPAR	78.61	85.26	78.45	86.10
UniTo Lesmo DPAR	75.12	82.58	72.56	79.88
CELI Dini DPAR	60.78	67.07	56.27	62.55

Constituency Parsing Track

Constituency parsing: evaluation on the full test set (200 sentences)

Participants	LF	LR	LP	P for LR	P for LP
FBKirst Lavelli CPAR	78.73	80.02	77.48	0.1592	0.0021
UniAmsterdam Sangati CPAR	75.79	78.53	73.24	–	–

Constituency parsing: separate evaluation on the newspaper (100 sentences) and civil law (100 sentences) test sets

Participants	newspaper			civillaw
	LF	LR	LP	LF	LR	LP
FBKirst Lavelli CPAR	76.21	76.08	76.34	80.66	83.15	78.33
UniAmsterdam Sangati CPAR	74.33	76.08	72.65	76.93	80.47	73.69

Lexical Substitution

Results obtained using the scoring type best

Participants	Prec.	Rec.	F	mode P	mode R
uniba2	8.16	7.18	7.64	10.58	10.58
baroniCutugnoLenciPucci	6.26	6.01	6.13	11.28	10.84
uniba1	6.80	5.53	6.10	8.90	8.90
uniba3	6.28	5.46	5.84	8.13	8.13
decao3	3.95	3.21	3.54	6.58	6.58
decao2	3.90	3.17	3.50	6.71	6.71
decao1	3.16	3.16	3.16	6.97	6.97
decao4	3.52	2.80	3.12	5.03	5.03

baseline psc	10.86	9.06	9.88	13.94	13.94
baseline iwn psc	9.71	8.19	8.89	13.16	13.16
baseline iwn	2.72	1.78	2.15	2.19	2.19

Results obtained using the scoring type oot

Participants	Prec.	Rec.	F	mode P	mode R
uniba2	41.46	36.50	38.82	47.23	47.23
uniba1	37.74	30.69	33.85	34.84	34.84
uniba3	28.54	24.79	26.53	34.58	34.58
decao3	23.48	19.11	21.07	26.58	26.58
decao2	23.00	18.72	20.64	26.32	26.32
decao1	20.09	20.09	20.09	27.74	27.74
decao4	18.62	14.78	16.48	20.52	20.52
baroniCutugnoLenciPucci	16.65	16.00	16.32	24.97	24.00

baseline iwn psc	27.52	23.23	25.19	37.24	32.39
baseline psc	23.00	19.20	20.93	26.97	26.97
baseline iwn	14.51	9.51	11.49	12.77	12.77

Named Entity Recognition

Systems’ results in terms of F-Measure, Precision and Recall

Participants	Overall			F1
	FB1	Prec.	Rec.	GPE	LOC	ORG	PER
FBK_ZanoliPianta	82.00	84.07	80.02	85.13	51.24	70.56	88.31
UniGen_Gesmundo_r2	81.46	86.06	77.33	83.36	50.81	71.08	87.41
UniTN-FBK-RGB_r2	81.09	83.20	79.08	85.25	52.24	69.61	86.69
UniTN-FBK-RGB_r1	80.90	83.05	78.86	85.19	54.62	69.41	86.30
UniTN_Nguyen_r1	79.77	82.26	77.43	82.85	42.34	67.89	86.44
UniTN_Nguyen_r2	79.61	81.65	77.67	82.49	50.85	67.38	86.25
UniGen_Gesmundo_r1	76.21	83.92	69.79	79.07	47.06	64.67	82.04
UniTN_Rigo_r2	74.98	81.08	69.73	75.96	38.32	60.36	83.18
UniTN_Rigo_r1	74.34	80.71	68.91	75.77	31.16	59.87	82.38
UniPI-ILC-CNR_r2	69.67	75.42	64.74	71.42	38.91	58.37	76.38
UniPI-ILC-CNR_r1	67.98	73.65	63.11	71.66	27.45	57.02	73.85
ECNU_Cai	61.03	65.55	57.09	69.25	28.72	51.49	63.49
BASELINE	43.99	42.80	45.25	69.00	37.07	45.54	32.06
BASELINE –u	39.14	40.58	37.80	52.75	28.57	44.23	32.10
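
The overall FB1 column can be cross-checked against the overall precision and recall columns. The sketch below assumes FB1 is the usual balanced F-measure (harmonic mean of precision and recall); the page itself does not spell out the formula, and the function name `f1` is only illustrative. The three rows are copied from the table above.

```python
def f1(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Inputs and output share the same unit (here: percentage points).
    Assumption: this is the definition behind the FB1 column.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (overall precision, overall recall, reported FB1), copied from the table
rows = {
    "FBK_ZanoliPianta": (84.07, 80.02, 82.00),
    "UniGen_Gesmundo_r2": (86.06, 77.33, 81.46),
    "BASELINE": (42.80, 45.25, 43.99),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {f1(p, r):.2f}, reported {reported:.2f}")
```

Under that assumption the computed values agree with the reported FB1 figures to within rounding for these rows.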

Local Entity Detection and Recognition

Percentages for Value, Precision, Recall and F-measure of the participating system

Participants	LEDR				EMD
	Value	Prec.	Rec.	F	Value	Prec.	Rec.	F
FBKirst_UNITN	36.7%	78.5%	61.1%	68.7%	65.7%	78.1%	74.1%	76.1%

Textual Entailment

Run	Correct	Accuracy
FBKirst_run1.txt	285	0.71
FBKirst_run2.txt	282	0.71
ofe_semTypes_1.txt	257	0.64
ofe_semTypes_2.txt	228	0.57
ofe_lexical_2.txt	230	0.58
ofe_lexical_1.txt	225	0.56
FBKirst_run4.txt	202	0.51
FBKirst_run3.txt	199	0.50

Connected Digits Recognition

Clean ASR task. Results are ordered by Word Accuracy. In the last column, T means that a non-official training was used, L means that the results were delivered late.

Systems	Sentence Acc.	Word Acc.	Words	Corr.	Err	Del+Ins+Sub
ISTC-SONIC_2	96.44%	99.45%	2360	2353	13	7+6+0
ISTC-SONIC_1	96.44%	99.45%	2360	2350	13	8+3+2
ISTC-SPHINX_1	96.16%	99.32%	2360	2352	16	4+8+4
ABLA-NUANCE	95.89%	99.28%	2360	2345	17	6+2+9	T
ISTC-OGI_1	95.62%	99.19%	2360	2346	19	6+5+8
ISTC-OGI_2	94.25%	98.94%	2360	2342	25	11+7+7
ISTC-SPHINX_2	93.70%	98.77%	2360	2345	29	6+14+9
CEDAT85	89.59%	98.05%	2360	2333	46	5+19+22	T
ABLA-TSPEECH	81.64%	96.06%	2360	2270	93	34+3+56	T
UNINA	18.36%	77.84%	2360	1941	523	116+104+303	L

Noisy ASR task. Results are ordered by Word Accuracy. In the last column, T means that a non-official training was used.

Participants	Sentence Acc.	Word Acc.	Words	Corr.	Err	Del+Ins+Sub
ISTC-SONIC_2	87.77%	96.21%	4036	3896	153	104+13+36
ISTC-SONIC_1	86.45%	95.91%	4036	3882	165	105+11+49
ISTC-OGI_2	81.82%	93.95%	4036	3821	244	121+29+94
ISTC-SPHINX_1	79.17%	93.06%	4036	3807	280	136+51+93
ISTC-OGI_1	81.65%	92.42%	4036	3767	306	135+37+134
ISTC-SPHINX_2	72.56%	91.63%	4036	3779	338	133+81+124
CEDAT85	78.02%	91.03%	4036	3710	362	255+36+71	T
ABLA-NUANCE	77.69%	88.65%	4036	3604	458	268+26+164	T
ABLA-TSPEECH	69.09%	82.23%	4036	3375	717	467+56+194	T
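
The Word Acc., Corr. and Err columns in the two ASR tables can be derived from the Words count and the Del+Ins+Sub breakdown. The sketch below assumes the common definitions, which this page does not state: word accuracy is (Words - Del - Ins - Sub) / Words, and Corr. is Words - Del - Sub. The helper names are illustrative only. The two example rows are ISTC-SONIC_2 from the clean and the noisy table.

```python
def parse_errors(breakdown: str) -> tuple[int, int, int]:
    """Split a 'Del+Ins+Sub' cell such as '7+6+0' into three integers."""
    deletions, insertions, substitutions = (int(x) for x in breakdown.split("+"))
    return deletions, insertions, substitutions


def word_accuracy(words: int, deletions: int, insertions: int, substitutions: int) -> float:
    """Word accuracy in percent (assumed definition: insertions count as errors)."""
    errors = deletions + insertions + substitutions
    return 100.0 * (words - errors) / words


def correct_words(words: int, deletions: int, substitutions: int) -> int:
    """Correctly recognized reference words (assumed: insertions do not reduce this count)."""
    return words - deletions - substitutions


# (task, Words, Del+Ins+Sub) for ISTC-SONIC_2, copied from the tables
for task, words, cell in [("clean", 2360, "7+6+0"), ("noisy", 4036, "104+13+36")]:
    d, i, s = parse_errors(cell)
    print(task,
          f"Err={d + i + s}",
          f"Corr={correct_words(words, d, s)}",
          f"WordAcc={word_accuracy(words, d, i, s):.2f}%")
```

A check of this kind is a quick way to spot transcription slips in individual cells, since each row carries enough information to recompute three of its own columns.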

Spoken Dialogue Systems Evaluation

Dialog level statistics

Participants	Duration (sec)	Duration (# Turns)
UniNA	145.8±72.7	11.0±5.7
Loquendo	182.2±84.7	18.9±8.9
UniTN	206.4±81.7	24.4±10.1

Task durations (#turns: mean±std.dev.) and success rates

Task	UniNA		Loquendo		UniTN
	Duration (turns)	Tsr (corr/req)	Duration (turns)	Tsr (corr/req)	Duration (turns)	Tsr (corr/req)
Identify representative	1.9 ± 0.4	100.0% (19/19)	2.4 ± 0.8	95.0% (19/20)	3.1 ± 0.5	90.5% (19/21)
Ask customer detail	2.0 ± 0.0	83.3% (5/6)	2.3 ± 0.5	88.9% (8/9)	3.4 ± 1.6	54.6% (12/22)
List orders	2.5 ± 1.5	0.0% (0/8)	2.0 ± 0.0	80.0% (4/5)	3.0 ± 0.0	75.0% (3/4)
Show last order	2.0 ± 0.0	100.0% (1/1)	–	–	–	–
List customers	2.0 ± 0.0	50.0% (2/4)	2.0 ± 0.0	0.0% (0/8)	3.0 ± 0.0	66.7% (2/3)
New order	4.6 ± 1.5	36.4% (4/11)	4.3 ± 1.8	42.9% (9/21)	7.5 ± 2.8	63.2% (12/19)
List products by category	3.0 ± 1.0	14.3% (1/7)	–	–	3.0 ± 0.0	100.0% (3/3)
List products by brand	–	–	–	–	3.0 ± 0.0	50.0% (1/2)
List products – other	2.0 ± 0.0	0.0% (0/4)	3.0 ± 0.8	25.0% (2/8)	3.8 ± 1.6	44.4% (4/9)
Search single product	2.3 ± 0.4	55.6% (5/9)	2.8 ± 1.6	77.8% (14/18)	3.5 ± 2.5	78.6% (11/14)
Ask for help	2.0 ± 0.0	100.0% (3/3)	–	–	2.0 ± 0.0	100.0% (2/2)
Exit application	2.5 ± 0.5	100.0% (5/5)	–	0.0% (0/1)	2.4 ± 0.8	25.0% (4/16)
OVERALL (corr/req)	–	58.4% (45/77)	–	62.2% (56/90)	–	63.5% (73/115)
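
The OVERALL row pools the per-task counts rather than averaging the per-task percentages. As a small sketch, assuming Tsr is simply corr/req expressed as a percentage (the variable and function names are illustrative), the UniNA column can be reproduced from the counts in the table:

```python
# (corr, req) per task for UniNA, copied from the table above;
# tasks with no UniNA data ("–") are omitted.
unina = {
    "Identify representative": (19, 19),
    "Ask customer detail": (5, 6),
    "List orders": (0, 8),
    "Show last order": (1, 1),
    "List customers": (2, 4),
    "New order": (4, 11),
    "List products by category": (1, 7),
    "List products - other": (0, 4),
    "Search single product": (5, 9),
    "Ask for help": (3, 3),
    "Exit application": (5, 5),
}


def tsr(corr: int, req: int) -> float:
    """Task success rate in percent (assumed: correct / requested)."""
    return 100.0 * corr / req


total_corr = sum(corr for corr, _ in unina.values())
total_req = sum(req for _, req in unina.values())

print(f"OVERALL: {tsr(total_corr, total_req):.1f}% ({total_corr}/{total_req})")
```

Pooling the counts weights each task by how often it was requested, so frequent tasks such as "Identify representative" dominate the overall figure.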

Speaker Identity Verification

Application Track.

DCF analysis for TS1. Axes are log base 10 of normalized DCF.

DCF analysis for TS2. Axes are log base 10 of normalized DCF.

Forensic Track.

For detailed information about the results, please see the report.