ERPIN Home Page

GO TO THE NEW ERPIN HOME PAGE


ERPIN (Easy RNA Profile IdentificatioN) is an RNA motif search program developed by Daniel Gautheret and André Lambert. Unlike most RNA pattern matching programs, ERPIN does not require users to write complex descriptors before starting a search. Instead, ERPIN reads a sequence alignment and secondary structure, and automatically infers a statistical "secondary structure profile" (SSP). An original Dynamic Programming algorithm then matches this SSP onto any target database, finding solutions and their associated scores.
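To make the idea of a profile inferred from an alignment more concrete, here is a minimal Python sketch. It is not ERPIN's code or algorithm: it only builds a log-odds weight matrix for a single ungapped strand and slides it along a target sequence, ignoring helices, gaps and the Dynamic Programming step. The function names, the uniform background frequency and the pseudocount are all assumptions made for illustration.

```python
import math
from collections import Counter

BASES = "ACGU"

def column_log_odds(column, background=0.25, pseudocount=1.0):
    """Log-odds score of each base in one alignment column (assumed uniform background)."""
    counts = Counter(b for b in column if b in BASES)
    total = sum(counts.values()) + pseudocount * len(BASES)
    return {
        b: math.log2(((counts[b] + pseudocount) / total) / background)
        for b in BASES
    }

def strand_profile(aligned_rows):
    """One log-odds table per column of an ungapped alignment of equal-length rows."""
    return [column_log_odds(col) for col in zip(*aligned_rows)]

def score_window(profile, window):
    """Sum of per-position scores; bases outside ACGU contribute 0."""
    return sum(col.get(b, 0.0) for col, b in zip(profile, window))

def best_hit(profile, target):
    """Return (score, start) of the highest-scoring window in the target."""
    width = len(profile)
    scored = [
        (score_window(profile, target[i:i + width]), i)
        for i in range(len(target) - width + 1)
    ]
    return max(scored)

# Toy alignment of three 4-nt strands (hypothetical data).
toy_profile = strand_profile(["GGAC", "GGAC", "GGAU"])
toy_score, toy_start = best_hit(toy_profile, "AAAGGACAAA")
```

In this toy case the best window starts at index 3 of the target, where the sequence matches the consensus of the alignment. A real secondary structure profile would also need to score base pairs in helices and allow for gaps.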

    Please cite: Gautheret D, Lambert A. (2001) Direct RNA Motif Definition and Identification from Multiple Sequence Alignments using Secondary Structure Profiles. J Mol Biol. 313:1003-11 (abstract).


VERSION 3.1 NOW AVAILABLE

Download ERPIN 3.1.2 (documentation)

ERPIN 1.x & 2.x Archive and datasets

Training sets (ERPIN 2/3 format):

| Alignment | Suggested command | Comments |
|---|---|---|
| Nuclear tRNAs adapted from Sprinzl's '98 alignments. | erpin trna-allnuclear.epn <database> -2,2 -umask 8 13 -nomask | General euk/archae/bac tRNA. No intron. Quite specific at default cutoff. |
| Nuclear type I tRNAs (Sprinzl '98) | erpin trna-typeI.epn <database> -2,2 -umask 8 13 -nomask | Type I euk/archae/bac tRNA. No intron. |
| Nuclear type II tRNAs (Sprinzl '98) | erpin trna-typeII.epn <database> -2,2 -umask 8 13 -nomask | Type II tRNAs with aligned extra-stem in variable loop. |
| Bacterial SRP RNAs extracted from C. Zwieb's alignments. | erpin srpeub.epn <database> -2,2 -umask 14 13 -umask 12 11 -nomask -cutoff -3 -2 15 | Captures domain IV only. Highly specific at default cutoff. Here, we use lower cutoffs in order to discover new instances. |
| Iron uptake IRE Iron Response Element (home-made). | erpin uptake-ire.epn <database file> -2,2 -nomask | Transferrin receptor, etc. (as in JMB paper) |
| Iron storage IRE Iron Response Element (home-made). | erpin storage-ire.epn <database file> -2,2 -nomask | Ferritin, etc. (as in JMB paper) |
| Eukaryotic Selenocysteine Insertion SECIS Element (home-made). | erpin secis9.epn <database file> -2,2 -umask 4 7 -mask 3 5 -cutoff 5 28 | Full SECIS model including apical loop. Highly specific: 7 hits per 100 Mb at score 29; 0.1 hit per 100 Mb at score 40. |
| Human Polyadenylation Site (with Downstream Sequence Element) | erpin polya.epn <database file> 2,3 -umask 2 -umask 2 3 -cutoff 70% 74% -unifstat -smp | From a database of 2327 human polyadenylation sites. Script will identify the region comprising the hexameric signal (AAUAAA or AUUAAA only) and the 50 nt downstream region. Specificity: 3.7 false positives per 100 kb in CDS, 22 per 100 kb in UTR sequences, 39 per 100 kb in intron sequences. Sensitivity: 56%. |
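For scripted runs, the suggested commands can be assembled programmatically. The Python sketch below is only an illustration: the option strings are copied verbatim from three rows of the table, while the helper name `erpin_argv`, the dictionary, and the database file name `genome.fasta` are hypothetical. It builds the argument list without launching anything; the `subprocess` call is left commented out and assumes an `erpin` executable on the PATH.

```python
import shlex

# Option strings copied from the table of training sets, keyed by training-set file.
SUGGESTED_OPTIONS = {
    "trna-allnuclear.epn": "-2,2 -umask 8 13 -nomask",
    "srpeub.epn": "-2,2 -umask 14 13 -umask 12 11 -nomask -cutoff -3 -2 15",
    "secis9.epn": "-2,2 -umask 4 7 -mask 3 5 -cutoff 5 28",
}

def erpin_argv(training_set, database, executable="erpin"):
    """Return the argument list: executable, training set, database, then options."""
    options = shlex.split(SUGGESTED_OPTIONS[training_set])
    return [executable, training_set, database, *options]

if __name__ == "__main__":
    argv = erpin_argv("trna-allnuclear.epn", "genome.fasta")  # hypothetical database file
    print(" ".join(argv))
    # To actually run the search (requires ERPIN to be installed):
    # import subprocess
    # subprocess.run(argv, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when database paths contain spaces.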

Changes From Version 1 To Version 2

Changes From Version 2 To Version 3

    The main advance in ERPIN 3 is DYNAMIC MULTI-LEVEL SEARCHES. During multi-level searches, mask elements used at level n that were already identified at level n-1 are fixed. When properly used, this considerably reduces the number of mask configurations to be explored at any level, and thus CPU and memory usage. Very large motifs can now be handled by ERPIN.

    Another novelty is the "-add" mask option, which adds an element to the mask used at the previous step without requiring the entire list of elements to be rewritten.
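The succession of search levels in a command can be read off its mask options. The Python sketch below is a hedged illustration, not part of ERPIN: it assumes that each mask option ("-mask", "-umask", "-nomask", "-add") opens one search level and is followed by a list of element numbers, which matches the suggested commands on this page but should be checked against the documentation. The function name `split_levels` is hypothetical.

```python
# Assumption: each of these options starts a new search level.
MASK_OPTIONS = {"-mask", "-umask", "-nomask", "-add"}

def split_levels(args):
    """Group command-line tokens into (mask option, element numbers) per level."""
    levels = []
    collecting = False
    for token in args:
        if token in MASK_OPTIONS:
            levels.append((token, []))
            collecting = True
        elif collecting and token.isdigit():
            levels[-1][1].append(int(token))
        else:
            # Any other token (region, -cutoff and its values, ...) ends the element list.
            collecting = False
    return levels

# Options of the bacterial SRP RNA command listed among the training sets.
srp_options = "-2,2 -umask 14 13 -umask 12 11 -nomask -cutoff -3 -2 15".split()
srp_levels = split_levels(srp_options)
```

Under these assumptions the SRP command has three levels: elements 14 and 13 first, then 12 and 11, then a final level with no mask.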

    More information about mask usage in the documentation...

D. Gautheret's Home Page