Fast syntactic searching in very large corpora for many languages

Year: 2010
Type of Publication: In Proceedings
Keywords: corpus search, large corpora, CQL, syntactic search
Authors:
  • , 28 42
Editor: Ryo Otoguro, Kiyoshi Ishikawa, Hiroshi Umemoto, Kei Yoshimoto, Yasunari Harada
Book title: Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation (PACLIC 24)
Pages: 741-747
Address: Tokyo, Japan
Organization: PACLIC 24
Month: November 4-7
ISBN: 9784905166009
Abstract:
For many linguistic investigations, the first step is to find examples. In the 21st century, they should all be found, not invented. Thus linguists need flexible tools for finding even quite rare phenomena. To support linguists well, these tools need to be fast even where corpora are very large and queries are complex. We present extensions to CQL (Corpus Query Language) for intuitive creation of syntactically rich queries, and demonstrate that they can be computed quickly within our tool even on multi-billion-word corpora.