Research Projects

  • Better answerability prediction with rewritten questions
    Although answering open-domain questions involves complex steps such as passage retrieval and span prediction, merely judging whether a question is answerable should be a much easier task. However, zero-shot transfer between different question answering datasets performs surprisingly poorly, and this poor performance has little to do with the model's predictive ability. Instead, we find that the cause lies in annotation: different datasets lack a clear, unique criterion for whether a question is answerable, and subjective factors such as ambiguity have a strong negative effect. In this paper, we propose a method that automatically identifies such problematic questions and rewrites the original questions into unambiguous ones. The rewritten questions contain less ambiguity and place more emphasis on undersensitivity. Automatic question rewriting helps build a uniform criterion, under which a model should learn to predict unanswerability more reliably while also performing better on answerable questions (a minimal sketch of the zero-shot answerability check appears after this list).

  • Automatic evaluation metric for generated narrative stories
    Story generation has developed rapidly in recent years, but the existing evaluation metrics are too naive to help improve the quality of the generated text. I therefore propose a new method that automatically evaluates the narrative quality of generated text, which also points to new directions worth pursuing (a simple coherence baseline of this kind of automatic scoring is sketched after this list).
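    A minimal sketch of the zero-shot answerability check described in the first project, using an off-the-shelf SQuAD 2.0 reader from Hugging Face transformers. The model name, the example context, and the hand-written rewrite are illustrative assumptions, not the project's actual pipeline or rewriting method.

```python
# Illustrative sketch only: an off-the-shelf reader used as a zero-shot answerability judge.
# The model choice and the manually rewritten question are assumptions for illustration.
from transformers import pipeline

# A SQuAD 2.0 reader, which can abstain (empty answer) on unanswerable questions.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

context = (
    "The Amazon rainforest covers much of the Amazon basin of South America. "
    "The majority of the forest is contained within Brazil."
)

# An ambiguous question (unclear referent) vs. a rewritten, unambiguous version.
original_question = "How much of it is in the country?"
rewritten_question = "How much of the Amazon rainforest is contained within Brazil?"

for question in (original_question, rewritten_question):
    out = qa(question=question, context=context, handle_impossible_answer=True)
    answerable = out["answer"].strip() != ""  # empty answer means "predicted unanswerable"
    print(f"{question!r}: answerable={answerable}, score={out['score']:.3f}")
```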
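    For the second project, a minimal reference-free baseline of the kind the proposed metric is meant to improve on: averaging cosine similarity between adjacent sentence embeddings as a crude coherence proxy. This is an illustrative sketch under assumed tooling (sentence-transformers with a MiniLM model), not the proposed narrative metric.

```python
# Illustrative coherence baseline (assumed tooling, not the proposed narrative metric).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def coherence_score(story: str) -> float:
    """Average cosine similarity between adjacent sentences as a crude coherence proxy."""
    sentences = [s.strip() for s in story.split(".") if s.strip()]
    if len(sentences) < 2:
        return 0.0
    emb = model.encode(sentences, convert_to_tensor=True)
    sims = [float(util.cos_sim(emb[i], emb[i + 1])) for i in range(len(emb) - 1)]
    return sum(sims) / len(sims)

print(coherence_score(
    "The knight left the castle. He rode toward the dark forest. The forest hid an old ruin."
))
```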

Open Source Projects

  • Wenxin Retrieval: Large-Scale Chinese Corpus Retrieval Platform

    Also known as the “DCC: National Language Resources Dynamic Circulation Corpus Retrieval System”. The platform provides a dependency syntax conversion tool and applies the enhanced dependency syntax within Wenxin Retrieval, achieving good results in applications such as collocation extraction and language point retrieval.
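    A minimal illustration of the kind of dependency-based indexing such retrieval builds on, using Stanza's Chinese pipeline to extract (head, relation, dependent) triples. The library and the mapping to collocation keys are assumptions for illustration; they are not the platform's actual conversion tool or internals.

```python
# Illustrative sketch: dependency triples as candidate keys for collocation and
# language-point retrieval. Stanza is an assumed stand-in, not the platform's own tool.
import stanza

stanza.download("zh")  # Chinese models; downloaded once
nlp = stanza.Pipeline("zh", processors="tokenize,pos,lemma,depparse")

def dependency_triples(text):
    """Return (head_word, relation, dependent_word) triples for each sentence."""
    doc = nlp(text)
    triples = []
    for sent in doc.sentences:
        for word in sent.words:
            head = sent.words[word.head - 1].text if word.head > 0 else "ROOT"
            triples.append((head, word.deprel, word.text))
    return triples

for triple in dependency_triples("国家语言资源动态流通语料库提供依存句法检索。"):
    print(triple)
```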