
PennTURBO Documentation

The GitHub Pages site for PennTURBO


Transforming and Unifying Research with Biomedical Ontologies.

PennTURBO accelerates the process of finding and connecting key information from clinical records by semantically modeling the processes that generated the data. This makes it possible to discover previously unappreciated relations in the data, for both research and operational tasks. The PennTURBO Group uses ontologies, primarily from the Open Biological and Biomedical Ontologies (OBO) Foundry, to provide a common semantic framework for UPHS/PennMedicine data. Transforming clinical data in this way allows the use of graph database technologies for navigating highly heterogeneous data.

PennTURBO uses shortcut reification to simplify the process of instantiating Electronic Health Records from relational sources. The shortcuts are then expanded into triples following the principles of ontological realism. Documentation is available for the current shortcut reification process and the resulting types of expanded axioms.
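As a rough illustration of the idea, the sketch below expands a single shortcut triple into several explicit triples. It uses plain Python tuples rather than an RDF library, and every IRI, predicate name, and class name in it (`shortcut_birthDate`, `BirthProcess`, `DateOfBirthDatum`, and so on) is a hypothetical placeholder, not a term from the TURBO ontology or the actual expansion rules.

```python
# Hypothetical namespace and term names, chosen only for illustration.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand_birth_date_shortcut(triple):
    """Expand one shortcut triple into explicit instance-level triples."""
    subject, predicate, value = triple
    if predicate != EX + "shortcut_birthDate":
        raise ValueError(f"unsupported shortcut: {predicate}")
    # Mint IRIs for the entities that the shortcut left implicit.
    birth = subject + "/birth"
    datum = subject + "/birthDateDatum"
    return [
        (subject, RDF_TYPE, EX + "Person"),
        (birth, RDF_TYPE, EX + "BirthProcess"),
        (subject, EX + "participatesIn", birth),
        (datum, RDF_TYPE, EX + "DateOfBirthDatum"),
        (datum, EX + "isAbout", birth),
        (datum, EX + "hasValue", value),
    ]


shortcut = (EX + "patient/1", EX + "shortcut_birthDate", "1970-01-01")
expanded = expand_birth_date_shortcut(shortcut)
for t in expanded:
    print(t)
```

The point of the pattern is that the relational source only has to supply one compact statement per value, while the expansion step makes the person, the process, and the datum about that process explicit as separate instances.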

PennTURBO makes use of the Carnival project, a JVM property graph data unification framework.

Additional reading:


PennTURBO has its own application ontology, which is based on the Ontology for Biobanking and uses OBO Foundry terms wherever possible.

Additionally, the PennTURBO graph imports several OBO Foundry ontologies in their entirety. That enables tasks such as mapping ICD codes to disease classes.
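To show why having the full ontology hierarchy in the graph helps, here is a minimal sketch that maps an ICD code to a disease class and then walks up the subclass hierarchy. The `ex:` class names, the single-parent hierarchy, and the code-to-class mapping table are all assumptions made for illustration; real ontologies allow multiple parents, and the actual PennTURBO mappings are not shown here.

```python
# Illustrative data only: class names and the code-to-class mapping are assumptions.
SUBCLASS_OF = {
    "ex:type_2_diabetes_mellitus": "ex:diabetes_mellitus",
    "ex:diabetes_mellitus": "ex:metabolic_disease",
    "ex:metabolic_disease": "ex:disease",
}
ICD_TO_CLASS = {"ICD10:E11": "ex:type_2_diabetes_mellitus"}


def disease_classes_for(icd_code):
    """Return the mapped class followed by its ancestors, nearest first."""
    current = ICD_TO_CLASS.get(icd_code)
    classes = []
    while current is not None:
        classes.append(current)
        # Simplification: each class has at most one parent in this sketch.
        current = SUBCLASS_OF.get(current)
    return classes


print(disease_classes_for("ICD10:E11"))
```

With the hierarchy available, a query for a broad class such as "metabolic disease" can also retrieve records coded with more specific ICD codes.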


The PennTURBO group has developed a technology stack/pipeline that transforms tabular data into semantic triples, which are stored in a Resource Description Framework (RDF) triple store. The subjects of those triples are instances of classes present in the TURBO Ontology.
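The tabular-to-triples step can be pictured with a small sketch like the one below. It is not the PennTURBO pipeline: the column names, the `http://example.org/` namespace, and the `StudyParticipant` and `hasBirthDate` terms are hypothetical, and triples are held as plain tuples instead of being loaded into a triple store.

```python
import csv
import io

# Hypothetical namespace and terms; the real pipeline uses TURBO ontology classes.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_triples(csv_text):
    """Turn each CSV row into triples whose subject is a typed instance."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        participant = f"{EX}participant/{row['participant_id']}"
        # Each subject is declared as an instance of an ontology class.
        triples.append((participant, RDF_TYPE, EX + "StudyParticipant"))
        triples.append((participant, EX + "hasBirthDate", row["birth_date"]))
    return triples


SAMPLE = "participant_id,birth_date\n1,1970-01-01\n2,1985-06-15\n"
triples = rows_to_triples(SAMPLE)
for t in triples:
    print(t)
```

In a real deployment the resulting triples would be serialized and loaded into the RDF triple store rather than printed.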

PennTURBO also uses text analytics and machine learning for tasks like mapping medication orders from an EHR to drug classes, along with the pharmaceutical roles of the mapped drugs.

Overview of steps in the PennTURBO pipeline

Current TURBO Cohort pipeline

We are now using the TURBO Cohort pipeline described here, which uses the TURBO Carnival server. The TURBO Semantic repository component of the TURBO Cohort pipeline was formerly called Drivetrain.