For researchers entering the field of peptide science, the nomenclature and conventions used to describe peptide structures can be overwhelming at first. This primer covers the essentials needed to read, understand, and communicate about peptide sequences, from the basic amino acid codes to common modifications and structural notation.
The 20 Standard Amino Acids
Nearly all naturally occurring proteins and most synthetic peptides are built from the same 20 standard (proteinogenic) amino acids. Each amino acid has a common name, a three-letter abbreviation, and a one-letter code:
Nonpolar (Hydrophobic) Amino Acids
| Name | Three-Letter | One-Letter | Key Feature |
|---|---|---|---|
| Glycine | Gly | G | Smallest amino acid; side chain is a single hydrogen |
| Alanine | Ala | A | Methyl side chain; simple hydrophobic |
| Valine | Val | V | Branched chain; beta-branched |
| Leucine | Leu | L | Branched chain; commonly in helices |
| Isoleucine | Ile | I | Branched chain; beta-branched |
| Proline | Pro | P | Cyclic; introduces rigidity in chain |
| Phenylalanine | Phe | F | Aromatic ring; hydrophobic |
| Tryptophan | Trp | W | Largest amino acid; indole ring |
| Methionine | Met | M | Thioether; oxidation-sensitive |
Polar Uncharged Amino Acids
| Name | Three-Letter | One-Letter | Key Feature |
|---|---|---|---|
| Serine | Ser | S | Hydroxyl group; phosphorylation site |
| Threonine | Thr | T | Hydroxyl group; beta-branched |
| Cysteine | Cys | C | Thiol group; forms disulfide bonds |
| Tyrosine | Tyr | Y | Phenol ring; phosphorylation site |
| Asparagine | Asn | N | Amide; deamidation-prone |
| Glutamine | Gln | Q | Amide; deamidation-prone |
Charged Amino Acids
| Name | Three-Letter | One-Letter | Charge at pH 7 |
|---|---|---|---|
| Aspartic acid | Asp | D | Negative (-1) |
| Glutamic acid | Glu | E | Negative (-1) |
| Lysine | Lys | K | Positive (+1) |
| Arginine | Arg | R | Positive (+1) |
| Histidine | His | H | ~Neutral (pKa 6.0; partially positive) |
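
The "partially positive" entry for histidine can be made concrete with the Henderson-Hasselbalch relationship. The following is a minimal sketch, assuming the side-chain pKa of 6.0 quoted in the table; the function name `fraction_protonated` is illustrative only, and real pKa values shift with the local environment of the residue.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of a basic group that carries its proton (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Histidine side chain, using the pKa of 6.0 from the table (an approximation)
his_fraction = fraction_protonated(pka=6.0, ph=7.0)
print(f"His side chain protonated at pH 7: {his_fraction:.1%}")
```

Under these assumptions roughly one histidine side chain in eleven is protonated at pH 7, which is why the residue is usually counted as close to neutral.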
Memorization Aid
The one-letter codes are not always intuitive. Some mnemonics:
- Letters that match the first letter of the name: Gly (G), Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Ser (S), Thr (T), Cys (C), His (H), Met (M)
- Phonetic and visual connections: F (phenylalanine starts with an "F" sound), W (tryptophan has a double ring; W is a "double V")
- Remaining assignments to memorize: D (aspartic acid), E (glutamic acid), K (lysine), R (arginine), N (asparagine), Q (glutamine), Y (tyrosine)
Reading Peptide Sequences
Convention: N-Terminus to C-Terminus
Peptide sequences are written from left (N-terminus, the amino/NH2 end) to right (C-terminus, the carboxyl/COOH end). This convention mirrors the direction of ribosomal protein synthesis (N to C). Solid-phase peptide synthesis typically assembles the chain in the opposite direction (C to N), but the finished sequence is still written N to C.
Example: Gly-His-Lys (GHK)
- Gly is at the N-terminus (left)
- Lys is at the C-terminus (right)
- The peptide has two peptide bonds: Gly-His and His-Lys
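
The reading rules in the GHK example can be sketched in a few lines of Python. This is only an illustration, assuming a linear, unmodified peptide written in dashed three-letter notation; the function name `describe_linear_peptide` is hypothetical.

```python
def describe_linear_peptide(sequence: str) -> dict:
    """Summarize a dashed three-letter sequence written N-terminus to C-terminus."""
    residues = sequence.split("-")
    return {
        "n_terminus": residues[0],           # leftmost residue
        "c_terminus": residues[-1],          # rightmost residue
        "length": len(residues),
        "peptide_bonds": len(residues) - 1,  # one bond between each adjacent pair
    }


info = describe_linear_peptide("Gly-His-Lys")
print(info)
```

A linear peptide of n residues always has n - 1 peptide bonds, which is the same count that appears in molecular weight calculations.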
Three-Letter vs. One-Letter Notation
Both notation systems are widely used:
Three-letter notation: More explicit and less prone to misreading. Used in product descriptions, certificates of analysis (COAs), and detailed structural discussions.
- Example: Gly-Glu-Pro-Pro-Pro-Gly-Lys-Pro-Ala-Asp-Asp-Ala-Gly-Leu-Val (BPC-157)
One-letter notation: More compact. Used in databases, bioinformatics, and when space is limited.
- Example: GEPPPGKPADDAGLV (BPC-157)
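
Converting between the two notations is a simple table lookup. The sketch below assumes only the 20 standard amino acids and no modifications; the names `THREE_LETTER`, `to_three_letter`, and `to_one_letter` are illustrative and not part of any particular library.

```python
# One-letter to three-letter codes for the 20 standard amino acids
THREE_LETTER = {
    "G": "Gly", "A": "Ala", "V": "Val", "L": "Leu", "I": "Ile",
    "P": "Pro", "F": "Phe", "W": "Trp", "M": "Met", "S": "Ser",
    "T": "Thr", "C": "Cys", "Y": "Tyr", "N": "Asn", "Q": "Gln",
    "D": "Asp", "E": "Glu", "K": "Lys", "R": "Arg", "H": "His",
}
# Reverse lookup: three-letter to one-letter
ONE_LETTER = {three: one for one, three in THREE_LETTER.items()}


def to_three_letter(one_letter_seq: str) -> str:
    """Expand a one-letter sequence into dashed three-letter notation."""
    return "-".join(THREE_LETTER[aa] for aa in one_letter_seq)


def to_one_letter(three_letter_seq: str) -> str:
    """Collapse dashed three-letter notation into a one-letter sequence."""
    return "".join(ONE_LETTER[res] for res in three_letter_seq.split("-"))


bpc157 = "GEPPPGKPADDAGLV"
print(to_three_letter(bpc157))
print(to_one_letter(to_three_letter(bpc157)) == bpc157)
```

An unrecognized code raises a `KeyError`, which is a useful signal that the sequence contains a non-standard residue or a modification that needs separate handling.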
Dashes and Notation
- Dashes between residues (Gly-His-Lys) indicate peptide bonds in three-letter notation
- No dashes in one-letter notation (GHK)
- Dashes in one-letter notation sometimes indicate chain breaks or modifications
Common Modifications
Research peptides frequently incorporate chemical modifications that alter their properties. Understanding the notation for these modifications is essential.
Terminal Modifications
Acetylation (Ac- or N-Ac):
- An acetyl group (CH3-CO-) is added to the N-terminus
- Protects against aminopeptidase degradation
- Notation: Ac-Gly-His-Lys or Ac-GHK
- Effect: Removes the positive charge at the N-terminus, may improve stability
Amidation (-NH2):
- The C-terminal carboxyl group is converted to an amide (-CONH2)
- Protects against carboxypeptidase degradation
- Notation: Gly-His-Lys-NH2 or GHK-NH2
- Effect: Removes the negative charge at the C-terminus, often improves biological activity
Both modifications together: Ac-Gly-His-Lys-NH2 (an acetylated and amidated tripeptide with no terminal charges)
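
As a rough illustration of how the terminal notation maps onto charge, the sketch below recognizes only the `Ac-` prefix and `-NH2` suffix described above. The function name `parse_termini` is hypothetical, the charges are the usual approximations near neutral pH, and the simple dash split does not handle prefixes such as `D-`.

```python
def parse_termini(notation: str) -> dict:
    """Split dashed three-letter notation and report approximate terminal charges."""
    parts = notation.split("-")
    acetylated = parts[0] == "Ac"    # N-terminal acetyl cap
    amidated = parts[-1] == "NH2"    # C-terminal amide
    if acetylated:
        parts = parts[1:]
    if amidated:
        parts = parts[:-1]
    return {
        "residues": parts,
        # A free amine is treated as +1 and a free carboxyl as -1 near pH 7
        "n_terminal_charge": 0 if acetylated else 1,
        "c_terminal_charge": 0 if amidated else -1,
    }


print(parse_termini("Gly-His-Lys"))
print(parse_termini("Ac-Gly-His-Lys-NH2"))
```

The capped form reports zero charge at both ends, matching the description of Ac-Gly-His-Lys-NH2 as having no terminal charges.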
Non-Natural Amino Acids
Many research peptides incorporate amino acids not found in standard proteins:
D-amino acids:
- Mirror images of the standard L-amino acids
- Notation: D-Arg, D-Trp, D-Phe, or using lowercase letters (r, w, f)
- Effect: Resistant to most proteases, which are stereospecific for L-amino acids
- Example: Ipamorelin contains D-2-Nal (D-2-naphthylalanine) and D-Phe
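
The lowercase convention for D-amino acids can be expanded mechanically. This is a minimal sketch, assuming a one-letter string that contains only the 20 standard residues; `expand_with_stereo` is an illustrative name, and the example sequence is hypothetical rather than a real research peptide.

```python
THREE_LETTER = {
    "G": "Gly", "A": "Ala", "V": "Val", "L": "Leu", "I": "Ile",
    "P": "Pro", "F": "Phe", "W": "Trp", "M": "Met", "S": "Ser",
    "T": "Thr", "C": "Cys", "Y": "Tyr", "N": "Asn", "Q": "Gln",
    "D": "Asp", "E": "Glu", "K": "Lys", "R": "Arg", "H": "His",
}


def expand_with_stereo(one_letter_seq: str) -> str:
    """Expand a one-letter sequence, treating lowercase letters as D-amino acids."""
    names = []
    for letter in one_letter_seq:
        name = THREE_LETTER[letter.upper()]
        if letter.islower():
            if name == "Gly":
                # Glycine is achiral, so a D-form does not exist
                raise ValueError("glycine has no D-enantiomer")
            name = f"D-{name}"
        names.append(name)
    return "-".join(names)


print(expand_with_stereo("rwf"))   # the three examples from the notation bullet
print(expand_with_stereo("YrFK"))  # hypothetical mixed L/D sequence
```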
Aib (alpha-aminoisobutyric acid):
- A non-natural amino acid with two methyl groups on the alpha-carbon
- Promotes helical structure
- Found in Ipamorelin: Aib-His-D-2-Nal-D-Phe-Lys-NH2
Dmt (2',6'-dimethyltyrosine):
- A modified tyrosine with methyl groups on the aromatic ring
- Found in SS-31: D-Arg-Dmt-Lys-Phe-NH2
Orn (ornithine):
- Similar to lysine but with one fewer carbon in the side chain
- Sometimes used in peptide design for altered spacing
Disulfide Bonds
Cysteine residues can form covalent disulfide bonds that create loops or connect separate peptide chains:
- Notation: Parenthetical numbering showing which cysteines are linked
- Example: A peptide with Cys at positions 3 and 8 forming a disulfide: (Cys3-Cys8)
- Brackets may also be used: [Cys3-Cys8]
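
Position numbering for disulfide notation counts residues from the N-terminus, starting at 1. The sketch below locates cysteines in a one-letter sequence and builds the parenthetical label; the sequence used is hypothetical, chosen only so that the cysteines fall at positions 3 and 8, and both function names are illustrative.

```python
def cysteine_positions(one_letter_seq: str) -> list:
    """Return 1-based positions of every cysteine, counted from the N-terminus."""
    return [i for i, aa in enumerate(one_letter_seq, start=1) if aa == "C"]


def disulfide_label(one_letter_seq: str, first: int, second: int) -> str:
    """Build a (CysX-CysY) label after checking both positions are cysteines."""
    for pos in (first, second):
        if one_letter_seq[pos - 1] != "C":
            raise ValueError(f"position {pos} is not a cysteine")
    return f"(Cys{first}-Cys{second})"


seq = "AGCKLMNCRG"  # hypothetical sequence for illustration only
print(cysteine_positions(seq))
print(disulfide_label(seq, 3, 8))
```

With more than two cysteines, several pairings are possible, so the positions alone do not define the bridge pattern; the notation has to state which residues are linked.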
Cyclization
Cyclic peptides contain a ring, most commonly formed by connecting the N-terminus to the C-terminus:
- Notation: cyclo- prefix, or brackets: cyclo(Arg-Gly-Asp-D-Phe-Lys)
- Head-to-tail cyclization creates a ring structure with no free termini
- Side-chain-to-side-chain cyclization (e.g., lactam bridges) may also occur
PEGylation
Attachment of polyethylene glycol (PEG) chains:
- Notation: PEG-peptide or peptide-PEG, with PEG molecular weight specified
- Example: PEG-40K-GHK (GHK with a 40 kDa PEG chain)
- Effect: Dramatically extends half-life by increasing molecular size and reducing renal clearance
Drug Affinity Complex (DAC)
A specialized modification used in CJC-1295:
- A maleimidopropionic acid-lysine linker that covalently binds to serum albumin
- Notation: CJC-1295 DAC or CJC-1295 with Drug Affinity Complex
- Effect: Extends half-life from minutes to days
Molecular Weight Calculation
For researchers who need to calculate molar concentrations, the molecular weight of a peptide can be estimated:
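MW = Sum of free amino acid weights - (n-1) x 18.02
Where:
- Free amino acid weights are the molecular weights of the intact amino acids (e.g., 75.07 for glycine)
- n = number of amino acids, so (n-1) is the number of peptide bonds, each formed with the loss of one water molecule
- 18.02 = molecular weight of water
- Equivalently, using residue weights (amino acid minus water): MW = Sum of residue weights + 18.02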
- Additional adjustments for terminal modifications, counter ions, etc.
Online peptide molecular weight calculators are readily available and handle modifications automatically.
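
For a quick check without an online tool, the calculation can be sketched in Python. Because residue masses already exclude the water lost on bond formation, a linear peptide's weight is the sum of its residue masses plus one water for the free termini. The residue masses below are commonly tabulated average values and should be verified against a preferred reference; the acetyl (+42.04) and amide (-0.98) adjustments are likewise approximate, and `peptide_mw` is an illustrative name.

```python
# Average residue masses in Da (amino acid minus water); commonly tabulated values
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015


def peptide_mw(one_letter_seq: str, acetylated: bool = False, amidated: bool = False) -> float:
    """Estimate the average molecular weight of a linear peptide (free base)."""
    mw = sum(RESIDUE_MASS[aa] for aa in one_letter_seq) + WATER
    if acetylated:
        mw += 42.04  # acetyl group replaces one N-terminal hydrogen
    if amidated:
        mw -= 0.98   # C-terminal OH replaced by NH2
    return mw


print(f"GHK:     {peptide_mw('GHK'):.2f} Da")
print(f"BPC-157: {peptide_mw('GEPPPGKPADDAGLV'):.2f} Da")
```

This gives the weight of the peptide itself, without counter ions or moisture; salt forms add to the gross weight of the supplied powder.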
Counter Ions and Salt Forms
Synthetic peptides are typically supplied as salts. The counter ion affects the gross molecular weight and peptide content:
Trifluoroacetate (TFA) salt:
- Most common salt form from HPLC purification (TFA is used in the mobile phase)
- TFA molecular weight: 114.02 Da per TFA molecule
- Basic sites (Arg, Lys, and His side chains plus the free N-terminus) can each carry one TFA counter ion, although the actual stoichiometry varies between batches
- Can be exchanged to acetate form if TFA interferes with research applications
Acetate salt:
- Commonly used alternative to TFA
- Acetic acid molecular weight: 60.05 Da per molecule (59.04 Da as the acetate anion)
- More biocompatible for cell culture and in vivo applications
- Produced by salt exchange from TFA form
Hydrochloride (HCl) salt:
- Used less commonly
- HCl molecular weight: 36.46 Da per HCl molecule
- Simple, well-characterized counter ion
Impact on peptide content: A highly basic peptide (multiple Arg/Lys residues) in TFA salt form may have a peptide content of only 60-65%, meaning that 35-40% of the powder mass is TFA counter ions plus residual moisture. This is not an impurity; it is simply the salt form. Researchers must account for this when calculating molar concentrations.
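
The correction for peptide content is simple arithmetic, sketched below. The theoretical fraction assumes counter ions are the only non-peptide mass (moisture is ignored), so a measured peptide content from the certificate of analysis should be preferred when available. All function names and the example numbers are illustrative assumptions.

```python
TFA_MW = 114.02  # Da per TFA molecule, as quoted above


def theoretical_peptide_fraction(peptide_mw: float, n_counter_ions: int,
                                 counter_ion_mw: float = TFA_MW) -> float:
    """Peptide mass fraction if counter ions were the only other component."""
    return peptide_mw / (peptide_mw + n_counter_ions * counter_ion_mw)


def gross_mass_needed(net_peptide_mg: float, peptide_content: float) -> float:
    """Powder mass (mg) to weigh out to obtain a given net peptide mass."""
    if not 0 < peptide_content <= 1:
        raise ValueError("peptide_content must be a fraction between 0 and 1")
    return net_peptide_mg / peptide_content


def micromoles(gross_mg: float, peptide_content: float, peptide_mw: float) -> float:
    """Micromoles of peptide in a weighed amount of powder (mg / (g/mol) = mmol)."""
    return gross_mg * peptide_content / peptide_mw * 1000.0


# Hypothetical peptide: MW 1000 Da with four basic sites as the TFA salt
print(f"Theoretical content: {theoretical_peptide_fraction(1000.0, 4):.1%}")
# Powder needed for 1 mg net peptide at 65% peptide content
print(f"Weigh out: {gross_mass_needed(1.0, 0.65):.2f} mg")
```

Ignoring peptide content at 65% would overstate the molar concentration by roughly half, which is large enough to shift dose-response results.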
Common Peptide Naming Conventions
Research peptides may be referred to by various names:
- Systematic name: Based on the amino acid sequence (e.g., Gly-His-Lys)
- Trade/common name: An informal name used in research communities (e.g., GHK-Cu, BPC-157, Ipamorelin)
- Code name: An alphanumeric designation from the developing laboratory (e.g., AOD-9604, CJC-1295, SS-31)
- CAS number: A unique numerical identifier assigned by Chemical Abstracts Service (e.g., 137525-51-0 for BPC-157)
When ordering research peptides, the CAS number is the most unambiguous identifier. Product names and sequence descriptions can vary between vendors.
Conclusion
Peptide nomenclature follows logical conventions that become intuitive with practice. Understanding how sequences are written, what modifications mean, and how salt forms affect the physical product allows researchers to communicate precisely about their materials and interpret vendor product descriptions accurately. When in doubt, the CAS number and the full amino acid sequence (including all modifications) provide the most unambiguous identification of a research peptide.
This article is for educational purposes related to peptide chemistry and research. All peptides discussed are for laboratory research use only and are not intended for human consumption.
