Speech/Language Disorders Database

This web-enabled database is a domain-specific resource for the study of heritable disorders of speech and language. The database includes:

  • Curated lists of genes that have been associated with speech- or language-related phenotypes
  • Genetic loci that have been linked to speech or language disorders
  • Brain imaging results that demonstrate neurological differences (either functional or structural differences) between individuals diagnosed with a heritable speech/language disorder and control subjects.

Please note that the database does not provide a comprehensive set of the results that have been published in the literature. It is a work in progress, which we will update regularly. We encourage your assistance in our efforts (see below).

Our hope is that this database and web application can help to bring together geneticists, clinicians, informaticists, and cognitive neuroscientists who study different aspects of the problem, but who may "speak different languages." The database focuses on the brain, and we provide web links to relevant gene expression data from the Allen Human Brain Atlas. In future work we aim to give users direct access to gene expression data through the Allen Institute's web API.


The web application is powered by a backend MySQL database, in which a set of tables and relations are defined, and in which all data are stored. Basic gene-specific information (Entrez identifier, gene name, type, symbol, etc.) is loaded into our database by parsing snapshots of the Entrez Gene database.
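As an illustration of the loading step described above, the following is a minimal sketch of parsing a tab-delimited Entrez Gene snapshot (NCBI's gene_info file) into the basic fields mentioned. It is not the project's actual loader: the function name, the output keys, and the sample rows are assumptions made for this example, and the MySQL insert is left out.

```python
import csv
import io

# First ten columns of NCBI's tab-delimited gene_info file, in order.
GENE_INFO_COLUMNS = [
    "tax_id", "GeneID", "Symbol", "LocusTag", "Synonyms",
    "dbXrefs", "chromosome", "map_location", "description", "type_of_gene",
]


def parse_gene_info(handle, tax_id="9606"):
    """Yield one dict per gene for the requested taxon (9606 = human).

    The output keys (entrez_id, symbol, name, type) are hypothetical names
    chosen for this sketch, not the database's real column names.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and the '#'-prefixed header line.
        if not row or row[0].startswith("#"):
            continue
        if row[0] != tax_id:
            continue
        record = dict(zip(GENE_INFO_COLUMNS, row))
        yield {
            "entrez_id": int(record["GeneID"]),
            "symbol": record["Symbol"],
            "name": record["description"],
            "type": record["type_of_gene"],
        }


# Illustrative rows only, not a real snapshot. The second data row is a
# made-up non-human entry that the taxon filter should skip.
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms\tdbXrefs\tchromosome\t"
    "map_location\tdescription\ttype_of_gene\n"
    "9606\t93986\tFOXP2\t-\t-\t-\t7\t7q31.1\tforkhead box P2\tprotein-coding\n"
    "10090\t1\tEXAMPLE\t-\t-\t-\t1\t-\tmade-up entry\tprotein-coding\n"
)

genes = list(parse_gene_info(io.StringIO(SAMPLE)))
```

Each yielded dict could then be written to a gene table with a parameterized INSERT through whichever MySQL client is in use.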

The database consists primarily of results manually curated from the primary literature. We have adopted an approach in which we attempt to minimize subjective evaluation or interpretation of individual published results; instead, we aim only to represent the findings of these studies as accurately as possible in a structured model. We have established internal criteria for the inclusion of studies, which either define new genes or loci of interest or follow up on previous studies (providing either supporting or conflicting evidence). Coverage of neuroimaging results is currently sparser than coverage of genetics results (and treats only stuttering and SLI), and we are working toward remedying this issue. The ultimate goal is to bring results together in order to allow for new interpretations and integration with other data types.

We allow users to add to the knowledge base by leaving public comments on individual assertions in the database, which reflect relationships from the literature. Users must register to leave feedback, but unregistered users can read any comments.


The Genes view (see below) is the primary entry point into the web application. It provides a table of genes that have previously been implicated in language-relevant phenotypes, with associated metadata and links to external resources.

In the main sortable table, entries hyperlink to the Entrez Gene database entry for the gene of interest and to relevant gene expression data for that gene from the Allen Human Brain Atlas. Clicking a row will link to a more detailed view of records related to that gene's possible involvement in speech and language disorders. Each gene is annotated with the number of studies entered into the database to date that describe an association with that gene, as well as the number of individual "GenePhenotype" records (assertions of association with a particular phenotypic variable). Note that a single study may include multiple such assertions about one or more genes.
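Because one study can contribute several assertions, the two per-gene counts generally differ. The sketch below shows one way such counts could be derived; the record tuples, gene labels, and study identifiers are placeholders invented for this example and do not come from the database.

```python
from collections import defaultdict

# Hypothetical assertions: (gene symbol, study identifier, phenotype).
records = [
    ("GENE_A", "study-1", "phenotype X"),
    ("GENE_A", "study-1", "phenotype Y"),  # same study, second assertion
    ("GENE_A", "study-2", "phenotype X"),
    ("GENE_B", "study-1", "phenotype Z"),  # same study, different gene
]


def summarize(records):
    """Return per-gene counts of distinct studies and of assertion records."""
    studies = defaultdict(set)
    record_counts = defaultdict(int)
    for gene, study, _phenotype in records:
        studies[gene].add(study)
        record_counts[gene] += 1
    return {
        gene: {"studies": len(studies[gene]), "records": record_counts[gene]}
        for gene in record_counts
    }


summary = summarize(records)
```

Here GENE_A has three records drawn from two studies, and "study-1" is counted once for each gene it mentions.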

The more detailed gene-specific view is shown below for the gene DYX1C1.

This is again a sortable table, with links to external resources such as Entrez Gene. Each row is an individual assertion, and a separate table provides any negative results (in the form of failed replications of previous assertions). The right-most column provides links to download the GenePhenotype record corresponding to each row as either a JSON- or XML-formatted text file. Note that the "disorder" column refers to the population in which an assertion was made: for example, if an association study involved patients with dyslexia, the term "dyslexia" will appear in that column. If the study was conducted in the general population or without regard to a specific clinical diagnosis, that column will be blank. For some disorders, clicking the disorder name leads to the portion of our database containing neuroimaging results from comparisons of individuals with the disorder to controls.
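A downloaded GenePhenotype record could be read with standard tooling in either format. The sketch below is only an assumption about what such a file might look like: the element and field names (gene, phenotype, disorder) and the sample values are hypothetical, and the real export may be structured differently. It also shows one way to treat a blank "disorder" value as "no specific clinical population".

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record layouts; the real export's field names may differ.
SAMPLE_JSON = (
    '{"gene": "DYX1C1", "phenotype": "example phenotype", '
    '"disorder": "dyslexia"}'
)
SAMPLE_XML = (
    "<GenePhenotype>"
    "<gene>DYX1C1</gene>"
    "<phenotype>example phenotype</phenotype>"
    "<disorder></disorder>"
    "</GenePhenotype>"
)


def normalize(data):
    """Map a raw record to a dict; a blank disorder becomes None."""
    disorder = (data.get("disorder") or "").strip()
    return {
        "gene": data["gene"],
        "phenotype": data["phenotype"],
        "disorder": disorder or None,
    }


def from_json(text):
    """Parse a JSON-formatted record."""
    return normalize(json.loads(text))


def from_xml(text):
    """Parse an XML-formatted record with one child element per field."""
    root = ET.fromstring(text)
    return normalize({child.tag: (child.text or "") for child in root})


json_record = from_json(SAMPLE_JSON)
xml_record = from_xml(SAMPLE_XML)
```

Normalizing both formats to the same dictionary keeps downstream code independent of which download link was used.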

Expanding each table row allows the user to obtain relevant details from the original article including information about the study type, cohort, and genotyping procedures, and also allows users to leave comments on the individual assertion that a gene of interest is implicated in a particular phenotype. Users must register through a simple form in order to leave comments.


The Loci View (see below) provides a tabular entry point to curated data on regions of the genome that have been implicated in speech and language disorders, typically through genetic linkage studies. Each row corresponds to an assertion from the literature, with a brief phenotype description, a disorder name (if appropriate), a link to the primary reference, and a description of the cytogenetic region of interest.

Expanding the rows of this table again provides further metadata about the cited study, including markers that were specifically described.

Brain Imaging Data

Finally, the Brain Imaging View (below) provides curated lists of neuroimaging results gathered from the primary literature. These results point to specific coordinates and/or names of brain areas that showed differences between patients with a particular disorder (such as Specific Language Impairment, shown below) and control subjects. These differences can be structural or functional. The table gives the imaging phenotype (i.e., the difference observed) as well as a somewhat more detailed description of the effect of interest. All coordinate data are in MNI space.


We have constructed a RESTful web services API to allow automated query and XML / JSON download of database contents. More information coming soon.

PubMed Searches and RSS feeds

Studies of interest are located by means of PubMed searches using three criteria aimed at locating different types of articles. We make the searches and corresponding RSS feeds publicly available below (note that these searches may be updated over time).

Registered users may also suggest an article for inclusion here.
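For readers who would like to run a comparable literature search programmatically, the sketch below builds a query URL for NCBI's E-utilities esearch endpoint. The search term is a made-up example and is not one of the three searches used by this project; the function name and the default result limit are likewise assumptions. No network request is made here.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term, max_results=20):
    """Return an esearch URL that asks PubMed for matching article IDs."""
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": term,           # query string, as typed into PubMed
        "retmode": "json",      # request a JSON response
        "retmax": max_results,  # cap the number of returned IDs
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Hypothetical query, for illustration only.
EXAMPLE_TERM = '(stuttering OR "specific language impairment") AND genetics'
url = build_pubmed_search_url(EXAMPLE_TERM)
```

Fetching the resulting URL with any HTTP client would return the matching PubMed identifiers, subject to NCBI's usage guidelines.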


This project is partially funded by generous support from the Boston University NSF-sponsored Center of Excellence for Learning in Education, Science, and Technology (CELEST), PI Barbara Shinn-Cunningham.