The next great frontier in science is the design, development, and deployment of systems capable of managing the overwhelming tide of scientific information produced each year. Managing that information is central to the challenges facing current and future scientists. Never before has more useful scientific research been available, but most of it is either genuinely inaccessible or effectively unavailable because we cannot properly curate it.
These challenges reflect the central problems of scientific inquiry today: How do we find relevant research to assist us in what we want to do? How do we fit our research into the larger picture of how the world works? How do we statistically integrate our findings with the work of others? (This last question is central to the enterprise of meta-analysis, the mechanism by which many similar studies are combined mathematically to provide deeper insight than any one study can provide.) How do we conduct our research in a way that is reproducible? (It may surprise you to learn that, at present, most scientific research is not reproducible based only on the published results and available data.)
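As a rough illustration of what "combined mathematically" can mean, the sketch below pools effect estimates from several studies using inverse-variance weights, one common fixed-effect approach. It is a minimal example under stated assumptions, not a description of the laboratory's own methods: the function name `fixed_effect_pool` and the three study values are made up for illustration.

```python
import math


def fixed_effect_pool(effects, variances):
    """Pool per-study effect estimates using inverse-variance weights.

    Each study is weighted by 1 / variance, so more precise studies
    contribute more to the pooled estimate.
    """
    if len(effects) != len(variances) or not effects:
        raise ValueError("need one variance per effect and at least one study")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return pooled, standard_error


# Made-up effect sizes and variances for three hypothetical studies.
effects = [0.30, 0.10, 0.20]
variances = [0.04, 0.01, 0.02]

pooled, standard_error = fixed_effect_pool(effects, variances)
print(f"pooled effect = {pooled:.3f}, standard error = {standard_error:.3f}")
```

The pooled standard error is smaller than that of any single study, which is the sense in which combining studies can offer deeper insight than one study alone.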
The Imaging Genetics and Informatics laboratory conducts original research into neuroinformatics as part of a larger multi-laboratory and multi-institution informatics research program. We are interested in all of the following areas of informatics:
- Machine Learning & Text Mining — One of the central weaknesses of current scientific publishing is the lack of relevant metadata for the published scientific literature. The understanding and interpretation of scientific papers requires expert knowledge simply to decode the contents, which are often obscured by the very words used to express the ideas. The proper assignment of metadata makes explicit (syntactically represented) knowledge that lies implicit (semantically available) within the text of scientific publications. But the placement of this metadata requires human workers (who would usually rather be doing science than curating it). We seek methods to allow smart machines to do a lot of this drudgery.
- Ontologies and Controlled Vocabularies — The fluidity and variability of language are wonderful…for poetry. They are not the best way for databases of scientific publications to communicate, or for machines to scan when looking for relevant articles to present to us. We develop ontologies (structured systems of relations) and controlled vocabularies (specialized languages) for organizing scientific knowledge. We are particularly interested in “machine readable” ontologies, which may or, more importantly, may not be human readable.
- Computer Assisted Research Tools — We develop tools to manage the world’s scientific information, ranging from tools that assist working researchers to machines that better organize scientific knowledge.
- Web Intercommunication Architectures — We believe that the future of scientific informatics is the interaction of hundreds or thousands of small information systems, not the construction of large “portals” or web servers that are centrally managed. We are developing open-source and open-standard architectures for building machine-to-machine interactions and open information resource discovery. Currently we are developing an API and set of protocol conventions called Maa(R)S, short for Markup as a (Research) Service. This set of conventions is designed to allow anyone to package machine learning or annotation services in a way that will make them automatically discoverable and accessible to agents (human and machine) via the internet.
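To make the ideas in the list above more concrete, here is a minimal sketch of an annotation service that assigns controlled-vocabulary terms to free text and publishes a small self-description that a client could use to discover it. Everything in it is an assumption made for illustration: the vocabulary entries, the function names `annotate`, `describe`, and `handle`, and the descriptor fields are hypothetical and are not taken from the Maa(R)S conventions.

```python
import json
import re

# Hypothetical controlled vocabulary: canonical term -> accepted surface forms.
VOCABULARY = {
    "functional magnetic resonance imaging": ["fmri", "functional mri"],
    "meta-analysis": ["meta analysis", "pooled analysis"],
}


def annotate(text):
    """Return the sorted canonical terms whose surface forms occur in text."""
    lowered = text.lower()
    found = set()
    for canonical, forms in VOCABULARY.items():
        for form in [canonical] + forms:
            # Word boundaries keep short forms from matching inside longer words.
            if re.search(r"\b" + re.escape(form) + r"\b", lowered):
                found.add(canonical)
                break
    return sorted(found)


def describe():
    """Hypothetical self-description that lets an agent discover the service."""
    return {
        "name": "vocabulary-annotator",
        "input": "text/plain",
        "output": "application/json",
        "terms": sorted(VOCABULARY),
    }


def handle(text):
    """Annotate text and wrap the result as a JSON response string."""
    return json.dumps(
        {"service": describe()["name"], "annotations": annotate(text)}
    )


print(handle("We ran a pooled analysis of fMRI studies."))
```

The point of the sketch is the pairing: the annotation turns knowledge implicit in the text into explicit metadata, and the descriptor lets another machine learn what the service accepts and returns without a human reading its documentation.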
Informatics collaborators currently include / have included: Georgia State University (department of computer science), the University of New Mexico (departments of psychology and computer science), Florida International University (departments of physics and computer science), the Emory University Center for Comprehensive Informatics, and the Mind Research Network (New Mexico).