Hanns J. Neubert

Genomics was yesterday - proteomics is tomorrow.

Science. Decoding the 3.2 billion letters of the human genome 20 years ago was a sensation. Now researchers are taking the next step - deciphering the proteins, the actual carriers of life.

When the need was great, things suddenly moved very quickly. At the end of March 2020, after only six weeks of development, Bosch presented the first rapid Corona test. Using antibodies, the proteins of the immune system, SARS-CoV-2 virus fragments could be detected after a Corona illness.

This is an example of the importance of elucidating protein function. "If you look at the exact protein composition of different cells of patients, their proteome, you get detailed information about which proteins play a role in certain diseases," explains Jürgen Cox, research group leader at the Max Planck Institute of Biochemistry in Martinsried near Munich.

Proteins are the actual carriers of life. They are both building materials and tools. Muscles, nerves, organs and hair are all made of proteins. So are the red haemoglobin in blood cells that transports oxygen, the enzymes that accelerate chemical reactions in the body, the hormones that act as messengers, and the antibodies of the immune defence.

What hardly anyone knows is that most non-infectious diseases are caused by faulty proteins. And many modern therapies are based on proteins, such as the diabetes drug insulin or the most effective cancer drugs.

If it were possible to decode the individual proteome of each patient, science would be a lot closer to the dream of personalised medicine. However, this is much more difficult than discovering defective gene segments in DNA. This is because DNA exists as a double strand on which the genes are strung together as a sequence of letters. Every cell in the body has the same genetic make-up in its nucleus, which remains unchanged throughout life.

It is quite different with proteins. Each cell of the body's organs always contains the same genome, but the protein composition of a liver cell is not comparable to that of nerve or brain cells.

Moreover, the protein composition of a cell changes over the course of life. Thus, the proteome of young people looks different from that of older people. An example from the insect kingdom illustrates this. Even if its life begins as a caterpillar and ends as a butterfly, the insect's genes remain the same in every cell. The types of proteins in the cells, however, change fundamentally. "There are therefore almost innumerable possible combinations. It is therefore crucial that we develop methods in the future with which thousands of proteins can be analysed in a very short time," explains Cox.

Of particular interest to researchers is the structure of protein macromolecules. Unlike DNA, they are spatially folded. This three-dimensionality can be used to find drugs that fit like a key into the lock of a protein molecule and thus unlock or lock it. This happens when, during immune defence, an antibody attaches itself to the antigenic counterpart in the shell of a bacterium or virus that causes illness, thereby rendering it harmless.

Unravelling such a structure is no easy task. It requires large-scale research tools such as the 2.3-kilometre-long storage ring PETRA III at the DESY research centre (Deutsches Elektronen-Synchrotron) in Hamburg. It generates the world's brightest X-ray radiation with extremely short-wavelength light beams. This makes it possible to observe the tiniest structures, such as the folding of individual proteins.

Recently, a group of researchers used so-called X-ray screening to examine 7000 substances in a short time, checking whether any of them has a three-dimensional structure that could bind to an important enzyme of the SARS-CoV-2 virus and thus block it. They identified 37 compounds as candidate anti-corona drugs that could now be developed further.

"Another way to get better information about protein function is through sequencing, the breakdown of proteins into their individual amino acid building blocks. This involves identifying how many proteins of certain types are present," Cox explains. One research approach is to introduce a foreign molecule into a cell, such as a pollutant or a potential drug. "Then we can look to see which proteins are now present in the cell in higher or lower numbers, and compare those to healthy cells." This helps determine which proteins play a role in certain diseases, such as cancer.
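
The comparison Cox describes can be illustrated with a small sketch. It is not the method of any particular laboratory: the protein names and amounts below are invented, and the log2 ratio is simply one common way to express "more" or "less" of a protein relative to a healthy reference.

```python
import math


def log2_fold_changes(treated, healthy):
    """Return log2(treated / healthy) for every protein measured in both samples."""
    changes = {}
    for protein, healthy_amount in healthy.items():
        treated_amount = treated.get(protein)
        # Skip proteins that are missing or have non-positive amounts,
        # because the ratio (and its logarithm) would be undefined.
        if treated_amount is None or treated_amount <= 0 or healthy_amount <= 0:
            continue
        changes[protein] = math.log2(treated_amount / healthy_amount)
    return changes


# Hypothetical abundances (arbitrary units), purely for illustration.
healthy = {"protein_A": 100.0, "protein_B": 50.0, "protein_C": 80.0}
treated = {"protein_A": 400.0, "protein_B": 50.0, "protein_C": 20.0}

changes = log2_fold_changes(treated, healthy)
for name in sorted(changes):
    print(name, round(changes[name], 2))
```

A positive value means the protein is more abundant in the treated cell, a negative value means it is less abundant, and zero means no change.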

The researchers' tool of choice here is what is known as mass spectrometry. In the course of this procedure, the analyzer removes electrons from the molecules, making them positively charged and thus electrically measurable. The result is a spectrum that looks like a jagged curve when printed out. The height and position of the individual peaks provide information on how many protein molecules of a certain mass are present in a sample.
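
How peaks can be read out of such a jagged curve is shown in the following minimal sketch. It assumes the spectrum is available as a list of (mass-to-charge, intensity) pairs and uses a deliberately naive rule (a point higher than both neighbours). The data and the function name `find_peaks` are invented for illustration; real analysis software works far more elaborately.

```python
def find_peaks(spectrum, min_height=0.0):
    """Return (mass_to_charge, intensity) pairs that are local maxima."""
    peaks = []
    # The first and last points have only one neighbour and are skipped.
    for i in range(1, len(spectrum) - 1):
        mz, intensity = spectrum[i]
        left = spectrum[i - 1][1]
        right = spectrum[i + 1][1]
        if intensity > left and intensity > right and intensity >= min_height:
            peaks.append((mz, intensity))
    return peaks


# Toy spectrum: position on the mass-to-charge axis and signal intensity.
toy_spectrum = [
    (500.0, 1.0), (500.5, 8.0), (501.0, 2.0), (501.5, 1.0),
    (502.0, 15.0), (502.5, 3.0), (503.0, 4.0), (503.5, 1.0),
]

peaks = find_peaks(toy_spectrum, min_height=5.0)
for mz, intensity in peaks:
    print(f"peak at {mz} with intensity {intensity}")
```

The position of each peak indicates which kind of molecule was detected, and its height indicates how much of it was present. The small bump at 503.0 falls below the chosen threshold and is ignored as noise.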

The Herculean task is then to compare the data with other samples. The amount of data generated is so large that only an extremely fast computer is able to sort it. Cox's research group at the Max Planck Institute of Biochemistry has developed a powerful software package called MaxQuant specifically for this purpose. With its help, it is possible to compare the data of analysed cells with each other, but also with data in databases.


The largest of these protein databases is called UniProt, which has been operated and maintained since 2002 by the European Bioinformatics Institute, the Swiss Institute of Bioinformatics and the Protein Information Resource at Georgetown University in Washington, D.C. Information on well over 100,000 proteins is now stored here and can be accessed free of charge. "It's a huge treasure. New findings are added almost daily, especially from organisms that have not been so well studied scientifically," explains Cox. The information stored here allows conclusions to be drawn about the functions that proteins have in biology.

Because proteins are also becoming increasingly important as tools and active substances in industrial processes and products, sequencing them is becoming ever more important for product development. The economy is increasingly moving towards a more environmentally friendly bioeconomy. Enzymes and surfactants in detergents have therefore long been produced from proteins. By now, amino acid chains are even used in adhesives, high-performance lubricants or as reaction accelerators in the chemical industry.

However, in order to advance industrial development on the basis of proteins, hundreds of thousands of proteins have to be investigated and determined with regard to their properties. This can only be achieved with so-called "high throughput" methods.

The US company Quantum-Si, founded by Jonathan M. Rothberg, seems to have found a particularly fast way to sequence proteins more easily and cheaply. His invention is based on a semiconductor chip that can apparently be used to analyse and digitise hundreds of protein samples in a very short time. Rothberg calls the process "next generation protein sequencing". In any case, he should not lack the capital to develop the technology further. In mid-February 2021, the company succeeded in going public through the special purpose acquisition company (SPAC) HighCape Capital. After the transaction, the company has more than $500 million in liquidity. "We want to democratize medicine by using the field of proteomics to understand not only what might be happening in the body, but what is actually happening right now," said Jonathan M. Rothberg, announcing the ambitious goal on the occasion of the IPO.

Jürgen Cox, however, is skeptical. "Quantum-Si is being quite cagey about it. You don't learn much about how the technology works exactly," comments the Max Planck researcher, even if the basic principle is clear to him. "There seems to be something in this silicon chip that can measure the protons, the positively charged particles in the amino acid chains."

He thinks the term "next generation protein sequencing" is mostly marketing. "The American system, after all, is based on organizing a great deal of venture capital in order to be able to push technologies whose success no one can yet predict." But it is not enough for Quantum-Si's new sequencing process merely to function, Cox points out. "After all, it has to work at least as well as mass spectrometry, while being cheaper, to replace established standard methods."

Analysing proteins alone is not enough either. Industry in particular requires more and more new, tailor-made amino acid combinations with very specific properties. These are not always of natural origin. In most cases, natural protein sequences have been modified in order to make them more efficient, more stable and suitable for specific applications, for example for particularly high or particularly low temperatures.

In the past, this required long analyses based on the principle of trial and error. These were extremely expensive and time-consuming laboratory experiments to test millions of protein variants for useful properties. Soon, this could be done by an artificial intelligence, such as the one recently developed at Chalmers University of Technology in Gothenburg.

"Accelerating the speed at which we develop proteins on the computer is very important to reduce the cost of enzyme catalysts, for example," said Martin Engqvist, one of the researchers involved from Chalmers' Department of Biology and Biological Engineering. "This is key to realizing environmentally sound industrial processes and consumer products."

Moving forward. To be sure, understanding the human proteome in all its interactions will take another decade or two. But applications such as MaxQuant, the artificial intelligence of Chalmers protein researchers, or perhaps chip-based methods such as those of Quantum-Si, combined with ever-faster and more powerful computers, could put protein research and use on an exponential path. "Eventually, we might be able to use high-throughput methods, like mass spectrometry, directly in diagnostics. And thus measure the entire proteome of a patient simultaneously," says Jürgen Cox. That would indeed be a revolution.

–––––––––––––––––––––––––––

The protein - the secret of life.

Proteins are macromolecules consisting of long chains in which 23 different amino acids line up in different sequences like letters.

The sequence of amino acids and the length of such a chain are determined by the genetic code. Many genes code for more than just one protein, so that there are far more proteins in the human body than genes. In theory, however, the 23 amino acids that can form protein chains can be combined to form 26 sextillion combinations - a 26 followed by 21 zeros. If the nine-million-square-kilometer Sahara held that many grains of sand, it would be covered throughout with a three-meter-thick layer of sand.
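
The sand comparison can be checked with a few lines of arithmetic. The sketch below takes only the figures given in the text (a 26 followed by 21 zeros, nine million square kilometres, three metres) and asks how much space each grain would occupy. It ignores the gaps between grains, so it is a plausibility check rather than a precise calculation.

```python
# Figures taken from the text.
grains = 26 * 10**21                 # a 26 followed by 21 zeros
area_m2 = 9_000_000 * 1_000_000      # 9 million km², with 1 km² = 1,000,000 m²
depth_m = 3                          # thickness of the sand layer in metres

volume_m3 = area_m2 * depth_m        # total volume of the layer in cubic metres

# 1 cubic metre = 1e9 cubic millimetres.
volume_per_grain_mm3 = volume_m3 * 1e9 / grains
edge_mm = volume_per_grain_mm3 ** (1 / 3)

print(f"volume per grain: {volume_per_grain_mm3:.2f} cubic millimetres")
print(f"edge of an equivalent cube: {edge_mm:.2f} millimetres")
```

Each grain would take up roughly one cubic millimetre, a plausible size for a coarse grain of sand, so the comparison holds up in order of magnitude.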

"Beyond that, there are many more amino acids that are not incorporated into proteins," Jürgen Cox explains. "These, like other small molecules, also float in the cells and perform very specific tasks there."

As if that weren't complex enough, the proteins built according to the genes' blueprint can also be altered many times within the cells by a chemical reaction called phosphorylation. "This is, so to speak, an additional code of its own that specifies at which sites of a protein this reaction starts. It happens very dynamically and rapidly," Cox explains. "In proteomics, this information is extremely important because the change happens afterwards, after these proteins are already assembled." In this way, signals are sent out very quickly in a cell so that it can, for example, bring an important, necessary molecule into the cell from outside or dispose of "junk" to the outside.

–––––––––––––––––––––––––––

Author: Hanns-J. Neubert