改善多模式数据的标记:减少评估者,增加层次

你好!我们-ITMO机器学习实验室的科学家以及VKontakte的Core ML团队-正在进行联合研究。VK的重要任务之一是帖子的自动分类:不仅需要生成主题提要,而且还必须识别不需要的内容。评估人员参与了此类记录的处理。同时,使用诸如主动学习的机器学习范例可以显着降低他们的工作成本。



本文将讨论它在多模式数据分类中的应用。我们将向您介绍主动学习的一般原理和方法,将其应用于任务的特殊性以及在研究过程中获得的见解。



图片



介绍



— machine learning, . , , , .



, (, Amazon Mechanical Turk, .) . — reCAPTCHA, , , , — Google Street View. — .



. , Voyage — , . , , . , .



Amazon DALC (Deep Active Learning from targeted Crowds). , . Monte Carlo Dropout ( ). — noisy annotation. , « , », .



Amazon . : / . , , . , : , . .



! , . pool-based sampling.



数字: 1.基于池的主动学习方案的一般方案

. 1. pool-based



. , , ( ). : , .



, — . (. — query). , . ( , ) .



, , .





, — . ( ). ≈250 . . () 50 — — :



  1. , (. embedding), ;
  2. .


, (. . 2).



. 2 —

. 2 —





ML — . , .



. , . , , , . , , early stopping. , .



. residual , highway , (. encoder). , (. fusion): , .

— , . -.



, — , . , .



. , (. 3):



. 3.

. 3.



. , . , , . , ( + ) — .



, . 3, :



. 4.

. 4.



, , . , ó , , .



, : ? :



  1. ;
  2. ;
  3. .


. : maximum likelihood , - . :



L=1σ12L1+1σ22L2+1σ32L3+logσ1+logσ2+logσ3



L1,L2,L3 — ( -), σ1,σ2,σ3 — , .



Pool-based sampling



— , . pool-based sampling :



  1. - .
  2. .
  3. , , .
  4. .
  5. ( ).
  6. 3–5 (, ).


, 3–6 — .



, , :



  1. , . , : . , , , . . , 2 000.



  2. . , . ( ). , , . , . 20 .

    . , . — , . 100 200.





, , , .



№1: batch size



baseline , ( ) (. 5).



. 5.   baseline-  .

. 5. baseline- .



random state. .



. «» , , .



, (. batch size). 512 — - (50). , batch size . . :



  1. upsample, ;
  2. , .


batch size: (1).



current_batch_size=b+nmodbnb[1]



b — batch size, n — .

“” (. 6).



. 6.     batch size (passive  )   (passive + flexible  )

. 6. batch size (passive ) (passive + flexible )



: c . , , batch size . .



.



Uncertainty



— uncertainty sampling. , , .



:



1. (. Least confident sampling)



, :



xLC=argmaxx 1Pθ(y^|x)[2]



y^=argmaxy Pθ(y|x) — , y — , x — , xLC — , .



该措施可以理解如下。假设对象丢失函数看起来像1y^. , . .



. , : {0,5; 0,49; 0,01}, — {0,49; 0,255; 0,255}. , (0,49) , (0,5). , ó : . , .



2. (. Margin sampling)



, , , :



xM=argminx Pθ(y^1|x)Pθ(y^2|x)[3]



y^1x, y^2 — .



, . , . , , MNIST ( ) — , . .



3. (. Entropy sampling)



:



xH=argmaxx Pθ(yi|x)logPθ(yi|x)[4]



yii- x .



, , . :



  • , , ;
  • , .


, , . , entropy sampling .



(. 7).



. 7.      uncertainty sampling    ( —    ,   —    ,  —    )

. 7. uncertainty sampling ( — , — , — )



, least confident entropy sampling , . margin sampling .



, , : MNIST. , , entropy sampling , . , .



. O(plogq), p — , q — . , .



BALD



, , — BALD sampling (Bayesian Active Learning by Disagreement). .



, query-by-committee (QBC). — . uncertainty sampling. , . QBC Monte Carlo Dropout, .



, , — . dropout . dropout , ( ). , dropout- (. 8). Monte Carlo Dropout (MC Dropout) . , . ( dropout) Mutual Information (MI). MI , , — , . .



. 8.  MC Dropout   BALD

. 8. MC Dropout BALD



, QBC MC Dropout uncertainty sampling. , (. 9).



. 9.      uncertainty sampling   QBC       ( -    ,   -    ,  -    )

. 9. uncertainty sampling ( QBC ) ( — , — , — )



BALD. , Mutual Information :



aBALD=H(y1,...,yn)E[H(y1,...,yn|ω)][5]



E[H(y1,...,yn|w)]=1ki=1nj=1kH(yi|wj)[6]



n — , k — .



(5) , — . , , . BALD . 10.



. 10.    BALD

. 10. BALD



, , .

query-by-committee BALD , . , uncertainty sampling. , — O(kplog(q)), p — , q — , k — , .



BALD tf.keras, . PyTorch, dropout , batch normalization , .



№2: batch normalization



batch normalization. batch normalization — , . , , , , batch normalization. , . , . BALD. (. 11).



. 11.   batch normalization   BALD

. 11. batch normalization BALD



, , .



batch normalization, . , .



Learning loss



. , . , .



, . — . , . learning loss, . , (. 12).



. 12.   Learning loss

. 12. Learning loss



learning loss . .

. , . «» learning loss: , , . ideal learning loss (. 13).



. 13.   ideal learning loss

. 13. ideal learning loss



, learning loss.

, . , , - , . :



  1. (2000 ), ;
  2. 10000 ( );
  3. ;
  4. ;
  5. 100 ;
  6. , , 1;
  7. .


, , . , ( margin sampling).



1.



p-value
loss -0,2518 0,0115
margin 0,2461 0,0136


, margin sampling — , , , . c .



: ?

, , (. 14).



. 14.    ideal learning loss        ideal learning loss

. 14. ideal learning loss ideal learning loss



, MNIST :



2. MNIST



p-value
loss 0,2140 0,0326
0,2040 0,0418


ideal learning loss , (. 15).



. 15.        MNIST  ideal learning loss.   —  ideal learning loss,  —

. 15. MNIST ideal learning loss. — ideal learning loss, —



, , , , . .



learning loss , uncertainty sampling: O(plogq), p — , q — . , , . , .





, . . , margin sampling — . 16.

. 16.       ( )   ,   margin sampling

. 16. ( ) , margin sampling



: ( — margin sampling), — , , . ≈25 . . 25% — .



, . , , .



, , . , :



  • batch size;
  • , , — , batch normalization.



All Articles