Small Molecule Enzymes in E. coli - Structural and Sequence Family Assignments
blattnerid
The Blattner number for the gene. See the E. coli Genome Project homepage for more information.
gene_name
The given name for the gene encoding the protein assigned a domain.
pid
The GenBank accession ID for the protein; Gene3D uses these.
cath_number
CATH is a hierarchical classification of protein domain structures that clusters proteins at four major levels: Class (C), Architecture (A), Topology (T) and Homologous superfamily (H). Structural domains with the same four-number CATH code are evolutionarily related.
CATH codes beginning with 1, 2, 3 or 4 are structural domains. We also use a CATH-like encoding strategy to classify our sequence families. Domains with a cath_number starting with 6 or 88 are sequence families.
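As an illustration of the prefix rule above, the following minimal Python sketch sorts a cath_number into a structural or a sequence family. It assumes that the value is stored as a dot-separated string and that "starting with" refers to the first dot-separated number; the function name and the example codes are hypothetical and are not part of the data set.

```python
# Sketch only: assumes cath_number is a dot-separated string such as "3.40.50.720".
STRUCTURAL_PREFIXES = {"1", "2", "3", "4"}   # structural domains
SEQUENCE_PREFIXES = {"6", "88"}              # sequence families


def classify_cath_number(cath_number: str) -> str:
    """Return 'structural', 'sequence' or 'unknown' for a cath_number string."""
    # Take the first dot-separated component, e.g. "88" from "88.2.3.4".
    first = cath_number.strip().split(".")[0]
    if first in STRUCTURAL_PREFIXES:
        return "structural"
    if first in SEQUENCE_PREFIXES:
        return "sequence"
    # Any other prefix is not covered by the rule described above.
    return "unknown"
```

Comparing the whole first component, rather than the first character, keeps a code such as 88.x.x.x from being confused with a single-digit prefix.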
final_cath_number
To maximise our structural coverage of E. coli proteins, we used a number of techniques. Proteins assigned to a sequence family but with significant homology to a structural family are reclassified into that structural family.
Furthermore, we cluster certain CATH domain families that are classified separately in the CATH database (e.g. Rossmann folds). Such clustered families start with 77.
Assignments prior to these modifications are given in the cath_number column; assignments after them are given in the final_cath_number column. We used the final_cath_number assignments in our paper.
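The relationship between the two columns can be sketched in Python as a comparison of the value before and after modification. This is only an illustrative reading of the description above, not the procedure used to build the table: it assumes both values are dot-separated strings, and the function names, labels and example codes are hypothetical.

```python
# Sketch only: assumes both columns hold dot-separated strings.
STRUCTURAL_PREFIXES = {"1", "2", "3", "4"}
SEQUENCE_PREFIXES = {"6", "88"}
CLUSTERED_PREFIX = "77"


def first_level(code: str) -> str:
    """Return the first dot-separated component of a CATH-like code."""
    return code.strip().split(".")[0]


def describe_modification(cath_number: str, final_cath_number: str) -> str:
    """Label how final_cath_number differs from cath_number."""
    if cath_number.strip() == final_cath_number.strip():
        return "unchanged"
    final_first = first_level(final_cath_number)
    if final_first == CLUSTERED_PREFIX:
        # Separately classified CATH families merged into one cluster.
        return "clustered family"
    if first_level(cath_number) in SEQUENCE_PREFIXES and final_first in STRUCTURAL_PREFIXES:
        # Sequence family with significant homology to a structural family.
        return "reassigned to structural family"
    return "other change"
```

For example, a row whose cath_number begins with 6 and whose final_cath_number begins with 3 would be labelled as reassigned to a structural family.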
concensus_start, concensus_end, extreme_start, extreme_end
These are domain boundaries as derived from the Gene3D assignment procedure. The boundaries are described here.