A Brief Guide to the APD (Updated Jan 2024)
1. Database scope
Antimicrobial peptides (AMPs), or host defense peptides, are essential components of innate immune systems. AMPs not only eliminate invading pathogens rapidly (faster than bacterial replication) but also initiate additional immune responses to further clean up the system. They can do more than we imagine! The antimicrobial peptide field continues to grow for at least the following three reasons: (1) the urgent need for novel antimicrobial agents against drug-resistant pathogens such as superbugs, viruses, fungi, and parasites; (2) our desire to better understand the biological functions of natural AMPs in innate immunity; and (3) our growing interest in microbiota shaped by AMPs (Wang, 2024).
To promote research, education, information retrieval, and knowledge discovery in the field, we created, and have been updating and expanding, the Antimicrobial Peptide Database and data analysis system (APD). It is a system dedicated to the discovery timeline, glossary, nomenclature, classification, structure, function, information search, prediction, design, statistics, and tools of AMPs and beyond. The peptide data stored in this original database were gleaned manually from the literature (PubMed, PDB, Google Scholar, and Swiss-Prot) over the past 20 years. For scientific rigor, the APD first established a set of criteria for peptide registration, leading to a core data set, which can be downloaded from the search interface in FASTA format. Peptides are registered into this database if:
(1). they are "natural AMPs", "predicted", or "synthetic" peptides;
(2). antimicrobial activities are demonstrated (MIC <100 uM or 100 ug/ml);
(3). the amino acid sequences of the mature peptides have been elucidated, at least partially; and
(4). the peptides contain fewer than 100 amino acid (aa) residues (starting in October 2012, the APD also includes some small yet important antimicrobial proteins with fewer than 200 aa).
Consequently, the peptides in the APD are expanding and can now be classified into three major categories: (a) natural AMPs, which include peptides from the six life kingdoms; (b) predicted peptides, which are predicted by machine learning or other in silico technologies such as sequence alignment and tested to inhibit microbial replication; (c) synthetic peptides, which are derivatives of natural AMPs or de novo peptides designed based on knowledge from natural AMPs. To increase sequence uniqueness, peptides from different species but sharing the same peptide sequence occupy the same entry (search using "found in multiple species" in the Additional Info field).
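The four registration criteria listed above can be sketched as a simple check. This is only a minimal illustration in Python, not the APD's own curation code (registration is done manually); the Candidate class, its field names, and the max_length parameter are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional

# Criterion (1): the three peptide categories named in the guide.
ALLOWED_SOURCES = {"natural", "predicted", "synthetic"}


@dataclass
class Candidate:
    """Hypothetical record for a peptide being considered for registration."""
    sequence: str                       # mature peptide sequence (may be partial)
    source: str                         # "natural", "predicted", or "synthetic"
    mic_um: Optional[float] = None      # MIC in uM, if reported
    mic_ug_ml: Optional[float] = None   # MIC in ug/ml, if reported


def meets_criteria(c: Candidate, max_length: int = 100) -> bool:
    """Return True if the candidate passes all four criteria.

    max_length=100 is the peptide limit; pass 200 for the small
    antimicrobial proteins accepted since October 2012.
    """
    # (1) peptide category
    if c.source not in ALLOWED_SOURCES:
        return False
    # (2) demonstrated activity: MIC < 100 uM or < 100 ug/ml
    active = (c.mic_um is not None and c.mic_um < 100) or (
        c.mic_ug_ml is not None and c.mic_ug_ml < 100
    )
    if not active:
        return False
    # (3) sequence elucidated, at least partially
    if not c.sequence:
        return False
    # (4) fewer than max_length residues
    return len(c.sequence) < max_length
```

For example, meets_criteria(Candidate("KKLLKKLLKK", "synthetic", mic_um=8.0)) returns True for this made-up peptide, while the same peptide with mic_um=250.0 is rejected under criterion (2).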
2. Database capabilities and AMP sub-databases
Our goal is to develop the antimicrobial peptide database into a comprehensive tool for discovery timeline, naming (nomenclature), classification, structure, function, information search, statistics, prediction, and design of antimicrobial peptides covering natural, predicted, and synthetic peptides. The database works as an online e-Dictionary: you can search it for detailed information on any peptide (using the standard 20 aa) in many ways, as listed below.
(1). Peptide full sequence, partial sequence, or motifs in the single-letter amino acid code such as DP and ILIKEAPD;
(2). In the name field, peptide name such as magainin, thanatin, gomesin, kalata B1, cecropin, aurein 1.2, dermaseptin;
(3). Peptide family name such as defensin, cathelicidin, and histatin;
(4). Common name for peptide source species such as human, cattle, frog, fish, spider, snake, and fungii (actual search words);
(5). peptide sources: (a) "natural AMPs", e.g. three domains (bacteria, archaea, eukaryotes) or six kingdoms (bacteria, archaea, protists, fungii, plants, and animals); (b) "predicted" (e.g., machine learning); and (c) "synthetic" peptides. Quoted words are searchable.
(6). Peptide post-translational modification using search keys such as XXA (amidation), XXC (cyclization), XXD (D-amino acids), XXE (N-terminal acetylation), XXG (glycosylation), XXH (halogenation), XXK (hydroxylation), XXO (oxidation), XXP (phosphorylation), XXQ (N-terminal cyclic glutamate), and XXS (sulfation). A more complete list can be downloaded here.
(7). Still in the name field, AMPs that form dimers can be searched by entering "dimer" (hetero or homo);
(8). Peptide-binding partners or targets using search keys such as BBS (carbohydrates), BBL (lipopolysaccharides, LPS, endotoxin), BBII (cations such as Zn2+), BBN (nucleic acids), and BBMm (biological membranes) (listed in the Glossary);
(9). Peptide original location: PDB, SwissProt, or Reference;
(10). Cationic, neutral, or anionic AMPs based on the net charge: >0, =0, or <0;
(11). Hydrophobic residue content (Pho%);
(12). Peptide length (size);
(13). Biophysical methods for structure determination: NMR spectroscopy, X-ray crystallography, or Circular Dichroism (CD);
(14). Structural classification such as alpha-helix, beta-sheet, alphabeta, or non-alphabeta;
(15). Contributing authors or year of AMP publication in the references;
(16). In the biological activity field, antibacterial (Gram-positive only or Gram-negative only), antiviral, antifungal, and anticancer peptides. It appears that AMPs have multiple functions, which the APD continues to annotate over time;
(17). Source species search in scientific name (e.g. Homo sapiens). This allows for the search of a collection of AMPs from a specific living species so far collected into the APD.
(18). In 2020, the APD also enabled "microbiota" search in the Source Organism field;
(19). Antimicrobial activity against any species can be searched in the Additional Information field by entering "E. coli", "S. aureus", or "C. albicans". Note that E.coli (searchable) is not a typo but indicates "species not inhibited by the peptide at the tested concentration, e.g. > 100 uM". In 2022, the APD established a full activity annotation by creating a third "uncertain" category, e.g., E-coli (searchable), for activity reported only as greater than a tested concentration that does not exceed 100 uM (e.g., >16 uM).
(20). In the Additional info field, synergistic effect using syner;
(21). Antimicrobial robustness info: e.g., salt-sensitive and salt-insensitive peptides can be searched in the Additional info field;
(22). MOA (mechanism of action) in the Additional info field;
(23). Animal model and more (chemical modification, recombinant expression, resistance development, transgenic plants, peptide engineering, etc.) in the Additional info field; and
(24). Any combination of two or more options above.
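The search keys and conventions in the list above can be collected into small lookup helpers. The sketch below is an illustration only: the keys and their meanings are copied from items (6), (8), (10), and (19), while the function names and the returned labels are hypothetical and not part of the APD. Reading "E. coli" (with a space) as the active case is inferred from item (19).

```python
# Post-translational modification search keys, from item (6).
PTM_KEYS = {
    "XXA": "amidation",
    "XXC": "cyclization",
    "XXD": "D-amino acids",
    "XXE": "N-terminal acetylation",
    "XXG": "glycosylation",
    "XXH": "halogenation",
    "XXK": "hydroxylation",
    "XXO": "oxidation",
    "XXP": "phosphorylation",
    "XXQ": "N-terminal cyclic glutamate",
    "XXS": "sulfation",
}

# Binding partner/target search keys, from item (8).
BINDING_KEYS = {
    "BBS": "carbohydrates",
    "BBL": "lipopolysaccharides (LPS, endotoxin)",
    "BBII": "cations such as Zn2+",
    "BBN": "nucleic acids",
    "BBMm": "biological membranes",
}


def describe_key(key: str) -> str:
    """Translate a search key into a readable description (hypothetical helper)."""
    if key in PTM_KEYS:
        return f"modification: {PTM_KEYS[key]}"
    if key in BINDING_KEYS:
        return f"binding target: {BINDING_KEYS[key]}"
    return "unknown key"


def charge_class(net_charge: int) -> str:
    """Item (10): cationic (>0), neutral (=0), or anionic (<0)."""
    if net_charge > 0:
        return "cationic"
    if net_charge < 0:
        return "anionic"
    return "neutral"


def activity_status(tag: str) -> str:
    """Item (19): decode the species-name convention of the activity annotation.

    "E. coli" -> active, "E.coli" -> not inhibited at the tested
    concentration, "E-coli" -> uncertain.
    """
    if len(tag) >= 3 and tag[1:3] == ". ":
        return "active"
    if len(tag) >= 2 and tag[1] == ".":
        return "inactive"
    if len(tag) >= 2 and tag[1] == "-":
        return "uncertain"
    return "unrecognized"
```

For instance, describe_key("XXA") gives "modification: amidation", and activity_status("E-coli") gives "uncertain".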
The above annotations make the APD a comprehensive resource (see LL-37 for instance). In addition, the annotation in the NAME field allows users to search AMP information from a specified life kingdom or domain such as bacteria, archaea, fungi (search with fungii), plants, and animals. For example, a subdatabase for bacterial AMPs can be generated by entering the word "bacteria" into one of the search boxes in the name field, and any properties of the bacterial AMPs (i.e. bacteriocins) can be searched in the usual way. Likewise, users can search AMPs from plants or animals (amphibians, fish, reptiles, birds, invertebrates, insects, spiders, mollusks, worms, crustaceans, etc.).
As one of the most recent additions, the APD has also annotated AMPs from microbiota. This information can be searched via "Source" (e.g., human microbiota:gut).
The database has included Prediction and Design interfaces since its first release in 2003. The APD has a unique peptide prediction program. After you input the sequence, the program calculates selected properties of the peptide (e.g., net charge, length, hydrophobic residue %, and amino acid composition). If the calculated parameters fall outside the APD-defined space for natural AMPs, the program stops (Nov 2013 version). Otherwise, the system traverses the database and performs pairwise sequence alignment. The APD then lists thousands of peptide sequences that are similar to your input. The program calculates a similarity score and displays the differences between the input and database sequences. You can improve the activity of the peptide you designed based on the alignment results. Hence, it is useful to compare your sequences with the entries in the APD before you report your new AMPs.
The APD also provides statistical information on peptide sequence, structure, and function for all entries or for a group of peptide entries with similar properties (such as anticancer) or from the same sources (such as bacteria). Examples of statistical data about AMPs can be viewed here.
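The property calculation and similarity ranking described above can be sketched as follows. This is a rough illustration under stated assumptions, not the APD's program: net charge is taken as (K + R) minus (D + E), ignoring histidine and the termini; the hydrophobic residue set is an assumption; and Python's difflib.SequenceMatcher stands in for the APD's pairwise alignment and similarity score. The check against the APD-defined parameter space is omitted because those ranges are not given in this guide. All function names and the example sequences are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

# Assumption: residues counted as hydrophobic; the APD's definition may differ.
HYDROPHOBIC = set("IVLFCMAW")


def peptide_properties(seq: str) -> dict:
    """Length, net charge, hydrophobic residue %, and amino acid composition."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    # Assumption: charge = basic (K, R) minus acidic (D, E) residues.
    net_charge = counts["K"] + counts["R"] - counts["D"] - counts["E"]
    hydrophobic = sum(n for aa, n in counts.items() if aa in HYDROPHOBIC)
    return {
        "length": len(seq),
        "net_charge": net_charge,
        "hydrophobic_pct": 100.0 * hydrophobic / len(seq),
        "composition": dict(counts),
    }


def rank_similar(query: str, database: dict, top: int = 5) -> list:
    """Score the query against every entry and return the best matches.

    database maps an entry name to its sequence. The score is
    SequenceMatcher.ratio(), a stand-in for a real alignment score.
    """
    scored = [
        (SequenceMatcher(None, query, seq, autojunk=False).ratio(), name, seq)
        for name, seq in database.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    toy_db = {"entry_a": "KKLLKKLLKK", "entry_b": "DDDDEEEE"}
    print(peptide_properties("KKLLKKLLKK"))
    print(rank_similar("KKLLKKLLKK", toy_db))
```

A real alignment tool with a substitution matrix would give more meaningful scores; the ratio used here only reflects shared character runs.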
For the definition of selected terms or abbreviations related to AMPs, please go to Glossary.
For selected papers contributed by the Wang laboratory and its collaborators, please go to Selected Publications.
For frequently asked questions from users, please refer to the FAQ page.
Selected peptide parameters and properties can be calculated using the peptide calculator of the APD. In addition, the APD has also generated links to Tools for users to calculate other properties of newly discovered peptides.
3. Database history, update and further development
The Antimicrobial Peptide Database (APD) was originally created by a graduate student, Zhe Wang, as his master's thesis in the laboratory of Dr. Guangshun Wang. The project was initiated in 2002 and the first version of the database was opened to the public in August 2003. It contained 525 peptide entries, which could be searched in multiple ways, including APD ID, peptide name, amino acid sequence, peptide length, net charge, hydrophobic content, antibacterial, antifungal, antiviral, anticancer, hemolytic activity, original location, PDB ID, 3D structure, and methods for structural determination. The first interfaces for peptide prediction and design were also programmed in this version. Peptide statistics for all peptides, a group of peptides, or a single peptide could also be calculated. Some results of this bioinformatics tool were reported in the 2004 database paper.
The 2nd version of our database (APD2) reported 1,228 antimicrobial peptide entries. Major features added are peptide source, family, chemical modifications, and binding targets. A summary of the new developments and database findings, including a demo of database-aided peptide design, is described in the 2009 database paper. The new web design is credited to Sophie Wang.
The 3rd version of the APD (APD3) reported 2,619 peptides. New educational web pages were created for FAQs, interesting AMP discovery timeline, classification, nomenclature, AMP facts, additional tools, Sequence downloads, and APD News (What's New). A unified peptide classification scheme has been introduced and updated. Peptide registration criteria were proposed so that this model database focuses on natural peptides and generates a needed core data set to decipher the design principles of natural antimicrobial peptides. Peptide biological sources were finally classified into six life kingdoms. In addition, the prediction interface has been improved and more peptide properties can be calculated.
In 2021, the APD database was reprogrammed to enhance cybersecurity. For database milestones and new features, please refer to APD2021 with 3273 entries.
In 2023, the APD was 20 years old. The database has been re-configured to cover three general classes of peptides: (1) natural AMPs; (2) synthetic peptides; and (3) predicted peptides (Protein Sci). The "predicted" group from machine learning prediction (followed by experimental validation) is the newest addition. However, peptides (e.g., cathelicidins) predicted by sequence alignment were registered into the APD even in the first version. This expanded version is temporarily referred to as APD2023.
The APD is regularly updated and annotated. By the end of Dec 2023, the AMP database contained a total of 3,940 peptides and proteins, including 3,146 natural, 314 synthetic, and 198 predicted AMPs with antimicrobial activity data (not counting over 200 natural peptides without antimicrobial activity that were collected before we defined the criteria for data registration in the APD3).
We appreciate users who emailed us their depositions, suggestions, corrections, and additions. We incorporated them into the current version and gave credit by including the provider's name in the related entries (e.g. T. Stein, Erik Martin, Adel Ghorani-Azam, Hadi Zare-Zardini, M. Bassam Alkotaini, Brice Felden, Sven-Ulrik Gorr, and Jun Wang). While we are further developing this database, you are welcome to make contributions. If your new AMPs escaped our attention, please contact us.
Disclaimer: Although every effort is made to make the database as complete and accurate as possible, we do not assume liability for any claim due to the use of the APD and its derivatives.