Incremental Distant Supervision for Extraction of Regulation Relations from Text

1:35pm to 1:45pm
Kachina Lounge

Biological Incremental Distant Supervision, or BIDS, is a system for identifying sentences expressing regulation relations in the primary literature for cancer biology.
BIDS extracts regulation relations using a classifier whose training set is built up from three preprocessed sources. The first contains expert-labelled text from the GENIA-Event subcorpus; the second, from Pathway Commons, contains relations as triples of relation type, controller, and input to a regulated event; the third is unlabelled text from articles available on PubMed.
The data sources are combined using a version of a machine learning method called Distant Supervision. BIDS works in several rounds to extract sentences, preprocess them with the REACH system from Arizona's Clulab, and describe them with syntactic and semantic features. In the initial round, descriptions of GENIA-Event sentences annotated as expressing regulation relations are used as a “seed”; in subsequent rounds, sentences are first extracted based on the co-occurrence in them of controller and event-input words from Pathway Commons, and are then filtered by the minimum distance from their descriptions to the descriptions of the sentences already accepted.
Once the Pathway Commons relations are exhausted over all rounds, the classifier is trained.
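
The rounds described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the BIDS implementation: the names `Relation`, `describe`, `distance`, `co_occurs`, and `incremental_selection` are hypothetical, a bag-of-words vector stands in for the syntactic and semantic features computed after REACH preprocessing, Euclidean distance stands in for whatever distance BIDS uses, and the `threshold` and `batch_size` parameters (how close a sentence must be, and how many Pathway Commons relations are consumed per round) are assumptions. The example sentences are made up for illustration.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Relation:
    """A Pathway Commons style triple (field names are hypothetical)."""
    relation_type: str
    controller: str
    event_input: str


def describe(sentence):
    """Stand-in description: lowercase bag-of-words counts.
    BIDS itself uses syntactic and semantic features instead."""
    feats = {}
    for tok in sentence.lower().split():
        tok = tok.strip(".,;:()")
        if tok:
            feats[tok] = feats.get(tok, 0.0) + 1.0
    return feats


def distance(a, b):
    """Euclidean distance between two sparse descriptions (an assumption)."""
    keys = set(a) | set(b)
    return sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def co_occurs(sentence, rel):
    """True if both the controller and the event input appear in the sentence."""
    tokens = set(describe(sentence))
    return rel.controller.lower() in tokens and rel.event_input.lower() in tokens


def incremental_selection(seed, relations, unlabelled, threshold, batch_size=1):
    """Consume relations round by round; keep co-occurrence candidates whose
    minimum distance to any already-accepted description is within threshold."""
    if not seed:
        raise ValueError("at least one seed sentence is required")
    accepted = [describe(s) for s in seed]   # initial round: the seed
    selected, seen = [], set()
    remaining = list(relations)
    while remaining:                          # one iteration per round
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        new_descriptions = []
        for rel in batch:
            for sent in unlabelled:
                if sent in seen or not co_occurs(sent, rel):
                    continue
                desc = describe(sent)
                if min(distance(desc, a) for a in accepted) <= threshold:
                    selected.append((sent, rel))
                    seen.add(sent)
                    new_descriptions.append(desc)
        # Sentences accepted in this round inform the filter in later rounds.
        accepted.extend(new_descriptions)
    return selected  # relations exhausted: this set would feed classifier training


seed = ["TP53 positively regulates expression of MDM2."]
relations = [
    Relation("positive_regulation", "TP53", "CDKN1A"),
    Relation("negative_regulation", "MDM2", "TP53"),
]
unlabelled = [
    "TP53 positively regulates expression of CDKN1A.",
    "TP53 and CDKN1A were both measured in forty tumour samples from the second cohort.",
    "MDM2 negatively regulates expression of TP53.",
]
selected = incremental_selection(seed, relations, unlabelled, threshold=2.0)
for sentence, rel in selected:
    print(rel.relation_type, "|", sentence)
```

In this toy run the second unlabelled sentence mentions both entities of the first relation but is far from every accepted description, so the distance filter drops it; that is the role the filter plays against the noise that plain co-occurrence matching lets through.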

David Sidi, Graduate Student Researcher