Small Molecule Enzymes in E. coli - Structural and Sequence Family Assignments

We obtained the majority of our assignments from the Gene3D database. You can view or download four different assignment files. All files contain the following columns:

blattnerid

The Blattner number for the gene. See the E. coli Genome Project homepage for more information.

gene_name

The name of the gene encoding the protein to which the domain is assigned.

pid

The GenBank accession ID for the protein, as used by Gene3D.

cath_number

CATH is a hierarchical classification of protein domain structures, which clusters proteins at four major levels: Class (C), Architecture (A), Topology (T) and Homologous superfamily (H). Structural domains with the same four-number CATH code are evolutionarily related.
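As an illustration of the four-level code, the following minimal sketch splits a CATH code into its C, A, T and H numbers and checks whether two domains share the same four-number code. It assumes the code is written as four dot-separated integers; the names CathCode, parse_cath and same_superfamily are hypothetical helpers, not part of Gene3D or CATH.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    """The four levels of a CATH code (hypothetical helper type)."""
    cath_class: int      # C
    architecture: int    # A
    topology: int        # T
    superfamily: int     # H


def parse_cath(code: str) -> CathCode:
    """Split a dot-separated code such as '3.40.50.720' into its four levels.

    Assumption: codes are four integers joined by dots.
    """
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated numbers, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def same_superfamily(code_a: str, code_b: str) -> bool:
    """True when both codes agree at all four levels (C, A, T and H)."""
    return parse_cath(code_a) == parse_cath(code_b)
```

For example, same_superfamily("3.40.50.720", "3.40.50.720") is true, while two codes that differ only in the last number share a topology but not a homologous superfamily.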

CATH codes beginning with 1, 2, 3 or 4 are structural domains. We also use a CATH-like encoding strategy to classify our sequence families. Domains with a cath_number starting with 6 or 88 are sequence families.
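The prefix rule above can be sketched as a small lookup. This is a minimal example that assumes "starting with" refers to the first dot-separated number of the code; the function name classify_cath_number and the returned labels are hypothetical.

```python
STRUCTURAL_PREFIXES = {"1", "2", "3", "4"}
SEQUENCE_FAMILY_PREFIXES = {"6", "88"}


def classify_cath_number(cath_number: str) -> str:
    """Label a cath_number by its first dot-separated number.

    Assumption: the prefix rule applies to the whole first field,
    so '88.x.x.x' is a sequence family and '8.x.x.x' is not.
    """
    first = cath_number.strip().split(".")[0]
    if first in STRUCTURAL_PREFIXES:
        return "structural"
    if first in SEQUENCE_FAMILY_PREFIXES:
        return "sequence_family"
    return "other"
```

Comparing the whole first field, rather than the first character, avoids mistaking a code such as 88.x.x.x for one that begins with 8.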

final_cath_number

To maximise our structural coverage of E. coli proteins we used a number of techniques. Proteins assigned to a sequence family but with significant homology to a structural family are reclassified into that structural family.

Furthermore, we cluster certain CATH domain families that are classified separately in the CATH database (e.g. Rossmann folds). Such clustered families start with 77.

Assignments prior to such modifications are given in the cath_number column; assignments after such modifications are given in the final_cath_number column. We used the final_cath_number assignments in our paper.
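The relationship between the two columns can be sketched as a comparison of one row's cath_number and final_cath_number. This is only an illustration under the assumption that codes are dot-separated and that the prefixes (1 to 4 structural, 6 or 88 sequence family, 77 clustered) apply to the first number; describe_assignment and its labels are hypothetical.

```python
STRUCTURAL = {"1", "2", "3", "4"}
SEQUENCE_FAMILY = {"6", "88"}
CLUSTERED = "77"


def first_field(code: str) -> str:
    """Return the first dot-separated number of a code, as a string."""
    return code.strip().split(".")[0]


def describe_assignment(cath_number: str, final_cath_number: str) -> str:
    """Summarise how final_cath_number differs from cath_number."""
    if cath_number.strip() == final_cath_number.strip():
        return "unchanged"
    if first_field(final_cath_number) == CLUSTERED:
        # Families clustered together carry the 77 prefix.
        return "clustered"
    if (first_field(cath_number) in SEQUENCE_FAMILY
            and first_field(final_cath_number) in STRUCTURAL):
        # Sequence family with significant homology to a structural family.
        return "reassigned_to_structural"
    return "other_change"
```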

concensus_start, concensus_end, extreme_start, extreme_end

These are domain boundaries as derived from the Gene3D assignment procedure. The boundaries are described here.
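A file with these columns could be read along the following lines. This is a sketch under stated assumptions: the file is taken to be tab-delimited with a header row naming the columns exactly as listed above (the actual delimiter and layout are not stated here), and the single sample row is invented placeholder data, not a real assignment.

```python
import csv
import io

BOUNDARY_COLUMNS = (
    "concensus_start", "concensus_end", "extreme_start", "extreme_end",
)

# Invented placeholder row; the values do not describe a real E. coli protein.
SAMPLE = (
    "blattnerid\tgene_name\tpid\tcath_number\tfinal_cath_number\t"
    "concensus_start\tconcensus_end\textreme_start\textreme_end\n"
    "b9999\txyzA\t0000000\t3.40.50.720\t77.1.1.1\t10\t150\t5\t160\n"
)


def read_assignments(handle):
    """Read rows into dicts, converting the boundary columns to integers.

    Assumption: tab-delimited text with a header row.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for key in BOUNDARY_COLUMNS:
            row[key] = int(row[key])
        rows.append(row)
    return rows


rows = read_assignments(io.StringIO(SAMPLE))
```

With a real file, the same function could be called on an open file handle instead of the in-memory sample.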