
Results of EVALITA 2009

Part-of-Speech Tagging - Open Task

Participants  POS TAC  POS TA  POS UWTAC  POS UWTA  Rank
SemaWiki 2    96.75%   97.03%  94.62%     95.30%    1
SemaWiki 1    96.44%   96.73%  94.27%     95.07%    2
SemaWiki 4    96.38%   96.67%  93.13%     93.81%    3
SemaWiki 3    96.14%   96.42%  92.55%     93.24%    4
Pianta        96.06%   96.36%  92.21%     93.24%    5
Lesmo         95.95%   96.26%  92.33%     93.01%    6
Tamburini 1   95.93%   96.40%  90.95%     92.67%    7
Tamburini 2   95.63%   96.16%  91.07%     92.78%    8

Part-of-Speech Tagging - Closed Task

Participants  POS TAC  POS TA  POS UWTAC  POS UWTA  Rank
Felice_ILC    96.34%   96.91%  91.07%     93.36%    1
Gesmundo      95.85%   96.48%  91.41%     93.81%    2
SemaWiki 2    95.73%   96.52%  90.15%     93.47%    3
SemaWiki 1    95.24%   96.00%  87.40%     90.72%    4
Pianta        93.54%   94.10%  85.45%     87.74%    5
Rigutini 2    93.37%   94.15%  86.03%     88.43%    6
Rigutini 3    93.31%   94.15%  6.03%      88.55%    7
Rigutini 4    93.29%   94.17%  85.34%     88.09%    8
Rigutini 1    93.10%   93.76%  84.54%     87.06%    9
CSTSøgaard 1  91.90%   93.21%  86.03%     89.58%    10
CSTSøgaard 2  91.64%   93.21%  86.14%     89.92%    11

Dependency Parsing Track

Dependency parsing MDS: evaluation on the whole test set (240 sentences)

Participants                 LAS    UAS    p-value
UniTo_Lesmo_PAR              88.73  92.28  0.472
UniPi Attardi DPAR           88.67  92.72  0.0001
FBKirst Lavelli DPAR         86.5   90.96  0.005
UniAmsterdam Sangati DPAR    84.98  89.07  0.0001
UniCopenhagen Soegaard DPAR  80.42  89.05  0.0001
CELI Dini DPAR               68     77.95  -

Dependency parsing PDS: evaluation on the whole test set (260 sentences)

Participants                 LAS    UAS    p-value
UniPi Attardi DPAR           83.38  87.71  0.0001
FBKirst Lavelli DPAR         80.54  84.85  0.0012
UniCopenhagen Soegaard DPAR  78.51  85.81  0.0001
UniTo Lesmo DPAR             73.44  80.80  0.0001
CELI Dini DPAR               57.81  64.10  -

Dependency parsing MDS: evaluation on the shared test set (100 sentences from newspapers), the civil law set (100 sentences) and the passage set (40 sentences)

Participants                 Shared LAS  Shared UAS  Civil law LAS  Civil law UAS  Passage LAS  Passage UAS
UniPi Attardi DPAR           82.60       95.02       92.63          95.38          90.10        92.90
UniTo Lesmo DPAR             84.68       89.73       91.54          94.64          89.36        91.58
FBKirst Lavelli DPAR         79.91       87.15       90.23          93.33          89.11        91.75
UniAmsterdam Sangati DPAR    76.66       87.99       89.93          95.51          87.87        93.89
UniCopenhagen Soegaard DPAR  72.84       81.93       86.04          90.27          80.94        85.31
CELI Dini DPAR               63.86       70.15       70.74          74.97          68.89        73.35

Dependency parsing PDS: evaluation on the shared test set (100 sentences from newspapers), and the remaining test corpus (160 sentences)

Participants                 Shared LAS  Shared UAS  Rest LAS  Rest UAS
UniPi Attardi DPAR           84.67       88.99       82.70     87.04
FBKirst Lavelli DPAR         81.12       85.02       80.24     84.76
UniCopenhagen Soegaard DPAR  78.61       85.26       78.45     86.10
UniTo Lesmo DPAR             75.12       82.58       72.56     79.88
CELI Dini DPAR               60.78       67.07       56.27     62.55

Constituency Parsing Track

Constituency parsing: evaluation on the whole test set (200 sentences)

Participants               LF     LR     LP     P for LR  P for LP
FBKirst Lavelli CPAR       78.73  80.02  77.48  0.1592    0.0021
UniAmsterdam Sangati CPAR  75.79  78.53  73.24  -         -

Constituency parsing: separate evaluation on the newspaper (100 sentences) and civil law (100 sentences) test sets

Participants               Newspaper LF  Newspaper LR  Newspaper LP  Civil law LF  Civil law LR  Civil law LP
FBKirst Lavelli DPAR       76.21         76.08         76.34         80.66         83.15         78.33
UniAmsterdam Sangati DPAR  74.33         76.08         72.65         76.93         80.47         73.69

Lexical Substitution

Results obtained using the scoring type best

Participants             Prec.  Rec.  F     mode P  mode R
uniba2                   8.16   7.18  7.64  10.58   10.58
baroniCutugnoLenciPucci  6.26   6.01  6.13  11.28   10.84
uniba1                   6.80   5.53  6.10  8.90    8.90
uniba3                   6.28   5.46  5.84  8.13    8.13
decao3                   3.95   3.21  3.54  6.58    6.58
decao2                   3.90   3.17  3.50  6.71    6.71
decao1                   3.16   3.16  3.16  6.97    6.97
decao4                   3.52   2.80  3.12  5.03    5.03

baseline psc             10.86  9.06  9.88  13.94   13.94
baseline iwn psc         9.71   8.19  8.89  13.16   13.16
baseline iwn             2.72   1.78  2.15  2.19    2.19

Results obtained using the scoring type oot

Participants             Prec.  Rec.   F      mode P  mode R
uniba2                   41.46  36.50  38.82  47.23   47.23
uniba1                   37.74  30.69  33.85  34.84   34.84
uniba3                   28.54  24.79  26.53  34.58   34.58
decao3                   23.48  19.11  21.07  26.58   26.58
decao2                   23.00  18.72  20.64  26.32   26.32
decao1                   20.09  20.09  20.09  27.74   27.74
decao4                   18.62  14.78  16.48  20.52   20.52
baroniCutugnoLenciPucci  16.65  16.00  16.32  24.97   24.00

baseline iwn psc         27.52  23.23  25.19  37.24   32.39
baseline psc             23.00  19.20  20.93  26.97   26.97
baseline iwn             14.51  9.51   11.49  12.77   12.77
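
The F column in the lexical substitution results appears to be the balanced harmonic mean of Prec. and Rec.; this is inferred from the reported numbers rather than stated on this page. A minimal Python sketch under that assumption (the function name `f_measure` is illustrative):

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are 0
    return 2 * precision * recall / (precision + recall)


# Prec. and Rec. reported for uniba2 with the scoring type "best".
print(round(f_measure(8.16, 7.18), 2))  # 7.64
```

The same check on the uniba2 row for the scoring type oot (41.46 and 36.50) gives 38.82, which matches the reported F value.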

Named Entity Recognition

Systems’ results in terms of F-Measure, Precision and Recall

Participants        Overall FB1  Overall Prec.  Overall Rec.  F1 GPE  F1 LOC  F1 ORG  F1 PER
FBK_ZanoliPianta    82.00        84.07          80.02         85.13   51.24   70.56   88.31
UniGen_Gesmundo_r2  81.46        86.06          77.33         83.36   50.81   71.08   87.41
UniTN-FBK-RGB_r2    81.09        83.20          79.08         85.25   52.24   69.61   86.69
UniTN-FBK-RGB_r1    80.90        83.05          78.86         85.19   54.62   69.41   86.30
UniTN_Nguyen_r1     79.77        82.26          77.43         82.85   42.34   67.89   86.44
UniTN_Nguyen_r2     79.61        81.65          77.67         82.49   50.85   67.38   86.25
UniGen_Gesmundo_r1  76.21        83.92          69.79         79.07   47.06   64.67   82.04
UniTN_Rigo_r2       74.98        81.08          69.73         75.96   38.32   60.36   83.18
UniTN_Rigo_r1       74.34        80.71          68.91         75.77   31.16   59.87   82.38
UniPI-ILC-CNR_r2    69.67        75.42          64.74         71.42   38.91   58.37   76.38
UniPI-ILC-CNR_r1    67.98        73.65          63.11         71.66   27.45   57.02   73.85
ECNU_Cai            61.03        65.55          57.09         69.25   28.72   51.49   63.49
BASELINE            43.99        42.80          45.25         69.00   37.07   45.54   32.06
BASELINE –u         39.14        40.58          37.80         52.75   28.57   44.23   32.10

Local Entity Detection and Recognition

Percentages for Value, Precision, Recall and F-measure of the participating system

Participants   LEDR Value  LEDR Prec.  LEDR Rec.  LEDR F  EMD Value  EMD Prec.  EMD Rec.  EMD F
FBKirst_UNITN  36.7%       78.5%       61.1%      68.7%   65.7%      78.1%      74.1%     76.1%

Textual Entailment

Run                 Correct  Accuracy
FBKirst_run1.txt    285      0.71
FBKirst_run2.txt    282      0.71
ofe_semTypes_1.txt  257      0.64
ofe_semTypes_2.txt  228      0.57
ofe_lexical_2.txt   230      0.58
ofe_lexical_1.txt   225      0.56
FBKirst_run4.txt    202      0.51
FBKirst_run3.txt    199      0.50

Connected Digits Recognition

Clean ASR task

Results are ordered by Word Accuracy. In the last column, T means that a non-official training was used, L means that the results were delivered late.

Systems        Sentence Acc.  Word Acc.  Words  Corr.  Err  Del+Ins+Sub
ISTC-SONIC_2   96.44%         99.45%     2360   2353   13   7+6+0
ISTC-SONIC_1   96.44%         99.45%     2360   2350   13   8+3+2
ISTC-SPHINX_1  96.16%         99.32%     2360   2352   16   4+8+4
ABLA-NUANCE    95.89%         99.28%     2360   2345   17   6+2+9        T
ISTC-OGI_1     95.62%         99.19%     2360   2346   19   6+5+8
ISTC-OGI_2     94.25%         98.94%     2360   2342   25   11+7+7
ISTC-SPHINX_2  93.70%         98.77%     2360   2345   29   6+14+9
CEDAT85        89.59%         98.05%     2360   2333   46   5+19+22      T
ABLA-TSPEECH   81.64%         96.06%     2360   2270   93   34+3+56      T
UNINA          18.36%         77.84%     2360   1941   523  116+104+303  L
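
The columns of the clean ASR table appear to be linked by simple arithmetic: Err is the sum of deletions, insertions and substitutions, Corr. is Words minus deletions and substitutions, and Word Acc. is (Words - Err) / Words. These relations are inferred from the reported figures, not taken from the task guidelines, so the Python sketch below is only an illustration (the function name `asr_scores` is hypothetical):

```python
def asr_scores(words: int, deletions: int, insertions: int, substitutions: int):
    """Return (correct words, total errors, word accuracy in percent).

    Assumes the relations inferred from the table, not an official definition.
    """
    errors = deletions + insertions + substitutions
    correct = words - deletions - substitutions
    word_accuracy = 100.0 * (words - errors) / words
    return correct, errors, word_accuracy


# ISTC-SONIC_2 on the clean task: 2360 words, Del+Ins+Sub = 7+6+0.
correct, errors, accuracy = asr_scores(2360, 7, 6, 0)
print(correct, errors, f"{accuracy:.2f}%")  # 2353 13 99.45%
```

Because insertions count as errors but do not reduce the number of correct words, Corr. plus Err can exceed Words.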

Noisy ASR task

Results are ordered by Word Accuracy. In the last column, T means that a non-official training was used.

Participants   Sentence Acc.  Word Acc.  Words  Corr.  Err  Del+Ins+Sub
ISTC-SONIC_2   87.77%         96.21%     4036   3896   153  104+13+36
ISTC-SONIC_1   86.45%         95.91%     4036   3882   165  105+11+49
ISTC-OGI_2     81.82%         93.95%     4036   3821   244  121+29+94
ISTC-SPHINX_1  79.17%         93.06%     4036   3807   280  136+51+93
ISTC-OGI_1     81.65%         92.42%     4036   3767   306  135+37+134
ISTC-SPHINX_2  72.56%         91.63%     4036   3779   338  133+81+124
CEDAT85        78.02%         91.03%     4036   3710   362  255+36+71    T
ABLA-NUANCE    77.69%         88.65%     4036   3604   458  268+26+164   T
ABLA-NUANCE    69.09%         82.23%     4036   3375   717  467+56+194   T

Spoken Dialogue Systems Evaluation

Dialog level statistics

Participants  Duration (sec)  Duration (# turns)
UniNA         145.8 ± 72.7    11.0 ± 5.7
Loquendo      182.2 ± 84.7    18.9 ± 8.9
UniTN         206.4 ± 81.7    24.4 ± 10.1

Task durations (#turns: mean±std.dev.) and success rates

Task                        UniNA duration (turns)  UniNA Tsr (corr/req)  Loquendo duration (turns)  Loquendo Tsr (corr/req)  UniTN duration (turns)  UniTN Tsr (corr/req)
Identify representative     1.9 ± 0.4               100.0% (19/19)        2.4 ± 0.8                  95.0% (19/20)            3.1 ± 0.5               90.5% (19/21)
Ask customer detail         2.0 ± 0.0               83.3% (5/6)           2.3 ± 0.5                  88.9% (8/9)              3.4 ± 1.6               54.6% (12/22)
List orders                 2.5 ± 1.5               0.0% (0/8)            2.0 ± 0.0                  80.0% (4/5)              3.0 ± 0.0               75.0% (3/4)
Show last order             2.0 ± 0.0               100.0% (1/1)          -                          -                        -                       -
List customers              2.0 ± 0.0               50.0% (2/4)           2.0 ± 0.0                  0.0% (0/8)               3.0 ± 0.0               66.7% (2/3)
New order                   4.6 ± 1.5               36.4% (4/11)          4.3 ± 1.8                  42.9% (9/21)             7.5 ± 2.8               63.2% (12/19)
List products by category   3.0 ± 1.0               14.3% (1/7)           -                          -                        3.0 ± 0.0               100.0% (3/3)
List products by brand      -                       -                     -                          -                        3.0 ± 0.0               50.0% (1/2)
List products – other       2.0 ± 0.0               0.0% (0/4)            3.0 ± 0.8                  25.0% (2/8)              3.8 ± 1.6               44.4% (4/9)
Search single product       2.3 ± 0.4               55.6% (5/9)           2.8 ± 1.6                  77.8% (14/18)            3.5 ± 2.5               78.6% (11/14)
Ask for help                2.0 ± 0.0               100.0% (3/3)          -                          -                        2.0 ± 0.0               100.0% (2/2)
Exit application            2.5 ± 0.5               100.0% (5/5)          -                          0.0% (0/1)               2.4 ± 0.8               25.0% (4/16)
OVERALL (corr/req)          -                       58.4% (45/77)         -                          62.2% (56/90)            -                       63.5% (73/115)

Speaker Identity Verification

Application Track

DCF analysis for TS1. Axes are log base 10 of normalized DCF.

DCF analysis for TS2. Axes are log base 10 of normalized DCF.

Forensic Track

For detailed information about the results, please see the report