Understanding Peptide Sequences and Amino Acid Nomenclature

For researchers entering the field of peptide science, the nomenclature and conventions used to describe peptide structures can be overwhelming at first. This primer covers the essential knowledge needed to read, understand, and communicate about peptide sequences, from the basic amino acid codes to common modifications and structural notation.

The 20 Standard Amino Acids

Naturally occurring proteins and most synthetic peptides are built from the same 20 standard (proteinogenic) amino acids. Each amino acid has a common name, a three-letter abbreviation, and a one-letter code:

Nonpolar (Hydrophobic) Amino Acids

Name	Three-Letter	One-Letter	Key Feature
Glycine	Gly	G	Smallest amino acid; side chain is a single hydrogen atom
Alanine	Ala	A	Methyl side chain; simple hydrophobic
Valine	Val	V	Branched chain; beta-branched
Leucine	Leu	L	Branched chain; commonly in helices
Isoleucine	Ile	I	Branched chain; beta-branched
Proline	Pro	P	Cyclic; introduces rigidity in chain
Phenylalanine	Phe	F	Aromatic ring; hydrophobic
Tryptophan	Trp	W	Largest amino acid; indole ring
Methionine	Met	M	Thioether; oxidation-sensitive

Polar Uncharged Amino Acids

Name	Three-Letter	One-Letter	Key Feature
Serine	Ser	S	Hydroxyl group; phosphorylation site
Threonine	Thr	T	Hydroxyl group; beta-branched
Cysteine	Cys	C	Thiol group; forms disulfide bonds
Tyrosine	Tyr	Y	Phenol ring; phosphorylation site
Asparagine	Asn	N	Amide; deamidation-prone
Glutamine	Gln	Q	Amide; deamidation-prone

Charged Amino Acids

Name	Three-Letter	One-Letter	Charge at pH 7
Aspartic acid	Asp	D	Negative (-1)
Glutamic acid	Glu	E	Negative (-1)
Lysine	Lys	K	Positive (+1)
Arginine	Arg	R	Positive (+1)
Histidine	His	H	~Neutral (pKa 6.0; partially positive)

Memorization Aid

The one-letter codes are not always intuitive. Some mnemonics:

Letters that match the first letter of the name: Gly, Ala, Val, Leu, Ile, Pro, Ser, Thr, Cys, His, Met
Phonetic connections: F (Phenylalanine sounds like F), W (tryptophan = double-ring, W = double-V)
Remaining assignments: D (aspartic acid), E (glutamic acid), K (lysine), R (arginine), N (asparagine), Q (glutamine), Y (tyrosine)

Reading Peptide Sequences

Convention: N-Terminus to C-Terminus

Peptide sequences are always written from left (N-terminus, the amino/NH2 end) to right (C-terminus, the carboxyl/COOH end). This convention mirrors the direction of ribosomal protein synthesis (N to C). Solid-phase peptide synthesis usually assembles the chain in the opposite direction (C to N, starting from the resin-bound C-terminal residue), but the finished sequence is still written N to C.

Example: Gly-His-Lys (GHK)

Gly is at the N-terminus (left)
Lys is at the C-terminus (right)
The peptide has two peptide bonds: Gly-His and His-Lys
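The reading convention above can be sketched in a few lines of Python. This is only an illustration; the function name describe_peptide is hypothetical, and it assumes an unmodified sequence in dash-separated three-letter notation.

```python
def describe_peptide(three_letter_seq: str) -> dict:
    """Identify the termini and count peptide bonds in e.g. 'Gly-His-Lys'.

    Assumes plain three-letter notation with no terminal modifications.
    """
    residues = three_letter_seq.split("-")
    return {
        "n_terminus": residues[0],           # leftmost residue
        "c_terminus": residues[-1],          # rightmost residue
        "peptide_bonds": len(residues) - 1,  # one bond between each neighboring pair
    }


print(describe_peptide("Gly-His-Lys"))
```

For Gly-His-Lys this reports Gly as the N-terminus, Lys as the C-terminus, and two peptide bonds.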

Three-Letter vs. One-Letter Notation

Both notation systems are widely used:

Three-letter notation: More explicit and less prone to misreading. Used in product descriptions, COAs, and detailed structural discussions.

Example: Gly-Glu-Pro-Pro-Pro-Gly-Lys-Pro-Ala-Asp-Asp-Ala-Gly-Leu-Val (BPC-157)

One-letter notation: More compact. Used in databases, bioinformatics, and when space is limited.

Example: GEPPPGKPADDAGLV (BPC-157)
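Converting between the two notations is a simple table lookup. The sketch below uses the codes from the tables above; the function names are hypothetical, and it assumes only the 20 standard residues with no modifications.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Gly": "G", "Ala": "A", "Val": "V", "Leu": "L", "Ile": "I",
    "Pro": "P", "Phe": "F", "Trp": "W", "Met": "M", "Ser": "S",
    "Thr": "T", "Cys": "C", "Tyr": "Y", "Asn": "N", "Gln": "Q",
    "Asp": "D", "Glu": "E", "Lys": "K", "Arg": "R", "His": "H",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(three_letter_seq: str) -> str:
    """Convert 'Gly-His-Lys' to 'GHK'. Raises KeyError on unknown residues."""
    return "".join(THREE_TO_ONE[res] for res in three_letter_seq.split("-"))


def to_three_letter(one_letter_seq: str) -> str:
    """Convert 'GHK' to 'Gly-His-Lys'. Raises KeyError on unknown letters."""
    return "-".join(ONE_TO_THREE[letter] for letter in one_letter_seq)


bpc157 = "Gly-Glu-Pro-Pro-Pro-Gly-Lys-Pro-Ala-Asp-Asp-Ala-Gly-Leu-Val"
print(to_one_letter(bpc157))   # GEPPPGKPADDAGLV
print(to_three_letter("GHK"))  # Gly-His-Lys
```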

Dashes and Notation

Dashes between residues (Gly-His-Lys) indicate peptide bonds in three-letter notation
No dashes in one-letter notation (GHK)
Dashes in one-letter notation sometimes indicate chain breaks or modifications

Common Modifications

Research peptides frequently incorporate chemical modifications that alter their properties. Understanding the notation for these modifications is essential.

Terminal Modifications

Acetylation (Ac- or N-Ac):

An acetyl group (CH3-CO-) is added to the N-terminus
Protects against aminopeptidase degradation
Notation: Ac-Gly-His-Lys or Ac-GHK
Effect: Removes the positive charge at the N-terminus, may improve stability

Amidation (-NH2):

The C-terminal carboxyl group is converted to an amide (-CONH2)
Protects against carboxypeptidase degradation
Notation: Gly-His-Lys-NH2 or GHK-NH2
Effect: Removes the negative charge at the C-terminus, often improves biological activity

Both modifications together: Ac-Gly-His-Lys-NH2 (an acetylated and amidated tripeptide with no terminal charges)
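The effect of terminal modifications on charge can be illustrated with a rough integer estimate at pH 7. This is a simplification, not a pKa-based calculation: it assumes Asp/Glu count as -1, Lys/Arg as +1, His as approximately neutral, a free N-terminus as +1, and a free C-terminus as -1. The function name is hypothetical, and only standard L-residues in three-letter notation are handled.

```python
# Approximate side-chain charges at pH 7; His is treated as ~0 (assumption).
SIDE_CHAIN_CHARGE = {"Asp": -1, "Glu": -1, "Lys": +1, "Arg": +1}


def approx_net_charge(notation: str) -> int:
    """Rough net charge at pH 7 for e.g. 'Ac-Gly-His-Lys-NH2'."""
    tokens = notation.split("-")
    acetylated = tokens[0] == "Ac"
    amidated = tokens[-1] == "NH2"
    start = 1 if acetylated else 0
    end = len(tokens) - 1 if amidated else len(tokens)
    residues = tokens[start:end]

    charge = sum(SIDE_CHAIN_CHARGE.get(res, 0) for res in residues)
    if not acetylated:
        charge += 1  # free N-terminal amine
    if not amidated:
        charge -= 1  # free C-terminal carboxylate
    return charge


for name in ("Gly-His-Lys", "Ac-Gly-His-Lys", "Gly-His-Lys-NH2", "Ac-Gly-His-Lys-NH2"):
    print(name, approx_net_charge(name))
```

Under these assumptions, unmodified GHK comes out at +1, acetylation alone lowers it to 0, amidation alone raises it to +2, and the doubly modified form returns to +1, carried entirely by the Lys side chain.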

Non-Natural Amino Acids

Many research peptides incorporate amino acids not found in standard proteins:

D-amino acids:

Mirror images of the standard L-amino acids
Notation: D-Arg, D-Trp, D-Phe, or using lowercase letters (r, w, f)
Effect: Resistant to most proteases, which are stereospecific for L-amino acids
Example: Ipamorelin contains D-2-Nal (D-2-naphthylalanine) and D-Phe
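The lowercase convention for D-residues can be expanded mechanically. The sketch below is an illustration with a hypothetical function name; it assumes a one-letter sequence of standard residues in which lowercase marks the D-form, and it treats a lowercase g as plain Gly because glycine is achiral.

```python
ONE_TO_THREE = {
    "G": "Gly", "A": "Ala", "V": "Val", "L": "Leu", "I": "Ile",
    "P": "Pro", "F": "Phe", "W": "Trp", "M": "Met", "S": "Ser",
    "T": "Thr", "C": "Cys", "Y": "Tyr", "N": "Asn", "Q": "Gln",
    "D": "Asp", "E": "Glu", "K": "Lys", "R": "Arg", "H": "His",
}


def expand_d_notation(one_letter_seq: str) -> str:
    """Expand 'RGDfK' to 'Arg-Gly-Asp-D-Phe-Lys' (lowercase = D-amino acid)."""
    parts = []
    for letter in one_letter_seq:
        name = ONE_TO_THREE[letter.upper()]
        # Glycine has no chiral alpha-carbon, so it has no distinct D-form.
        is_d_form = letter.islower() and name != "Gly"
        parts.append(f"D-{name}" if is_d_form else name)
    return "-".join(parts)


print(expand_d_notation("RGDfK"))  # Arg-Gly-Asp-D-Phe-Lys
```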

Aib (alpha-aminoisobutyric acid):

A non-natural amino acid with two methyl groups on the alpha-carbon
Promotes helical structure
Found in Ipamorelin: Aib-His-D-2-Nal-D-Phe-Lys-NH2

Dmt (2',6'-dimethyltyrosine):

A modified tyrosine with methyl groups on the aromatic ring
Found in SS-31: D-Arg-Dmt-Lys-Phe-NH2

Orn (ornithine):

Similar to lysine but with one fewer carbon in the side chain
Sometimes used in peptide design for altered spacing

Disulfide Bonds

Cysteine residues can form covalent disulfide bonds that create loops or connect separate peptide chains:

Notation: Parenthetical numbering showing which cysteines are linked
Example: A peptide with Cys at positions 3 and 8 forming a disulfide: (Cys3-Cys8)
Brackets may also be used: [Cys3-Cys8]
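A quick consistency check is to confirm that the positions named in a disulfide annotation really are cysteines. The sketch below assumes the (CysN-CysM) or [CysN-CysM] form described above and 1-based numbering from the N-terminus; the function name and the example sequence GACSGRGC are hypothetical.

```python
import re


def disulfide_matches_sequence(one_letter_seq: str, annotation: str) -> bool:
    """Check that both positions in e.g. '(Cys3-Cys8)' are Cys in the sequence."""
    match = re.fullmatch(r"[\(\[]Cys(\d+)-Cys(\d+)[\)\]]", annotation)
    if match is None:
        raise ValueError(f"Unrecognized disulfide annotation: {annotation!r}")
    positions = [int(group) for group in match.groups()]
    return all(
        1 <= pos <= len(one_letter_seq) and one_letter_seq[pos - 1] == "C"
        for pos in positions
    )


seq = "GACSGRGC"  # hypothetical peptide with Cys at positions 3 and 8
print(disulfide_matches_sequence(seq, "(Cys3-Cys8)"))  # True
print(disulfide_matches_sequence(seq, "[Cys2-Cys8]"))  # False: position 2 is Ala
```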

Cyclization

Cyclic peptides contain a ring, most commonly formed by connecting the N-terminus to the C-terminus:

Notation: cyclo- prefix, or brackets: cyclo(Arg-Gly-Asp-D-Phe-Lys)
Head-to-tail cyclization creates a ring structure with no free termini
Side-chain-to-side-chain cyclization (e.g., lactam bridges) may also occur

PEGylation

Attachment of polyethylene glycol (PEG) chains:

Notation: PEG-peptide or peptide-PEG, with PEG molecular weight specified
Example: PEG-40K-GHK (GHK with a 40 kDa PEG chain)
Effect: Dramatically extends half-life by increasing molecular size and reducing renal clearance

Drug Affinity Complex (DAC)

A specialized modification used in CJC-1295:

A maleimidopropionic acid-lysine linker that covalently binds to serum albumin
Notation: CJC-1295 DAC or CJC-1295 with Drug Affinity Complex
Effect: Extends half-life from minutes to days

Molecular Weight Calculation

For researchers who need to calculate molar concentrations, the molecular weight of a peptide can be estimated:

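MW = Sum of free amino acid molecular weights - (n-1) x 18.02

Equivalently, using residue weights:

MW = Sum of residue weights + 18.02

Where:

Free amino acid molecular weights are the weights of the intact, unlinked amino acids
Residue weights are the molecular weights of each amino acid minus water (since water is lost during peptide bond formation)
n = number of amino acids, so (n-1) is the number of peptide bonds
18.02 = molecular weight of water; in the residue-weight form it accounts for the H and OH remaining at the free termini
Additional adjustments apply for terminal modifications, counter ions, etc.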

Online peptide molecular weight calculators are readily available and handle modifications automatically.
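As a minimal sketch of what such a calculator does, the code below sums average residue masses (each free amino acid minus one water) and adds one water for the free termini. The residue masses are standard average values rounded to two decimals. The function name and the two terminal-modification shifts (+42.04 Da for acetylation, -0.98 Da for amidation) are assumptions added for illustration, and counter ions are not included.

```python
# Average residue masses in Da (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.11, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02
ACETYL_SHIFT = 42.04  # N-terminal H replaced by CH3CO (illustrative assumption)
AMIDE_SHIFT = -0.98   # C-terminal OH replaced by NH2 (illustrative assumption)


def peptide_mw(one_letter_seq: str, acetylated: bool = False, amidated: bool = False) -> float:
    """Estimate the average molecular weight of a linear peptide (free base, no salt)."""
    mw = sum(RESIDUE_MASS[aa] for aa in one_letter_seq) + WATER
    if acetylated:
        mw += ACETYL_SHIFT
    if amidated:
        mw += AMIDE_SHIFT
    return round(mw, 2)


print(peptide_mw("GHK"))              # about 340.38
print(peptide_mw("GEPPPGKPADDAGLV"))  # about 1419.6 (BPC-157)
```

Because the table is rounded to two decimals, results for longer peptides can differ from vendor values by a few hundredths of a dalton.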

Counter Ions and Salt Forms

Synthetic peptides are typically supplied as salts. The counter ion affects the gross molecular weight and peptide content:

Trifluoroacetate (TFA) salt:

Most common salt form from HPLC purification (TFA is used in the mobile phase)
TFA molecular weight: 114.02 Da per TFA molecule
Basic sites (Arg, Lys, His, and the free N-terminus) can each carry one TFA counter ion
Can be exchanged to acetate form if TFA interferes with research applications

Acetate salt:

Commonly used alternative to TFA
Acetic acid molecular weight: 60.05 Da per molecule (59.04 Da as the acetate anion)
More biocompatible for cell culture and in-vivo applications
Produced by salt exchange from TFA form

Hydrochloride (HCl) salt:

Used less commonly
HCl molecular weight: 36.46 Da per HCl molecule
Simple, well-characterized counter ion

Impact on peptide content: A highly basic peptide (multiple Arg/Lys residues) in TFA salt form may have a peptide content of only 60-65%, meaning that 35-40% of the powder mass is TFA counter ions plus residual moisture. This is not an impurity — it is simply the salt form. Researchers must account for this when calculating molar concentrations.
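The correction for peptide content is a one-line calculation. The sketch below is illustrative: the function names are hypothetical, the 65% figure and the 1 mg mass are example inputs rather than measured values, and the second function gives only a theoretical upper bound on peptide content for a fully TFA-paired peptide, ignoring moisture.

```python
TFA_MW = 114.02  # Da, per TFA molecule


def micromoles_of_peptide(gross_mass_mg: float, peptide_content: float, peptide_mw: float) -> float:
    """Micromoles of peptide in a weighed amount of salt-form powder.

    peptide_content is the net peptide fraction (e.g. 0.65 for 65%).
    """
    net_mass_mg = gross_mass_mg * peptide_content
    return net_mass_mg / peptide_mw * 1000.0  # mg / (g/mol) = mmol, x1000 = umol


def max_peptide_fraction_as_tfa_salt(peptide_mw: float, n_basic_sites: int) -> float:
    """Peptide mass fraction if every basic site carries one TFA and no water is present."""
    return peptide_mw / (peptide_mw + n_basic_sites * TFA_MW)


# Example: 1 mg of GHK powder (MW 340.38) at an assumed 65% peptide content.
print(micromoles_of_peptide(1.0, 0.65, 340.38))    # about 1.91 umol, not 2.94
# GHK has three basic sites: N-terminus, His, and Lys.
print(max_peptide_fraction_as_tfa_salt(340.38, 3)) # about 0.50
```

Ignoring peptide content in this example would overstate the amount of peptide by roughly half, which is why the value reported on the COA should be used when preparing stock solutions.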

Common Peptide Naming Conventions

Research peptides may be referred to by various names:

Systematic name: Based on the amino acid sequence (e.g., Gly-His-Lys)
Trade/common name: An informal name used in research communities (e.g., GHK-Cu, BPC-157, Ipamorelin)
Code name: An alphanumeric designation from the developing laboratory (e.g., AOD-9604, CJC-1295, SS-31)
CAS number: A unique numerical identifier assigned by Chemical Abstracts Service (e.g., 137525-51-0 for BPC-157)

When ordering research peptides, the CAS number is the most unambiguous identifier. Product names and sequence descriptions can vary between vendors.
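CAS numbers carry a built-in check digit, which makes transcription errors easy to catch. The final digit equals the weighted sum of the preceding digits (the rightmost weighted 1, the next 2, and so on) modulo 10. The sketch below implements that check with a hypothetical function name; it confirms only that a number is well formed, not that it refers to the intended compound.

```python
def cas_is_well_formed(cas: str) -> bool:
    """Validate the format and check digit of a CAS number such as '137525-51-0'."""
    parts = cas.split("-")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        return False
    first, second, check = parts
    if not (2 <= len(first) <= 7 and len(second) == 2 and len(check) == 1):
        return False
    digits = first + second
    # Weight digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(check)


print(cas_is_well_formed("137525-51-0"))  # True
print(cas_is_well_formed("137525-51-1"))  # False: wrong check digit
```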

Conclusion

Peptide nomenclature follows logical conventions that become intuitive with practice. Understanding how sequences are written, what modifications mean, and how salt forms affect the physical product allows researchers to communicate precisely about their materials and interpret vendor product descriptions accurately. When in doubt, the CAS number and the full amino acid sequence (including all modifications) provide the most unambiguous identification of a research peptide.

This article is for educational purposes related to peptide chemistry and research. All peptides discussed are for laboratory research use only and are not intended for human consumption.