08-14 February 2023

Professor Doctor James R. Bozeman Jr.

Home Institution
American University of Malta, Malta

Host Contact
Dr. Veronica Andrea Gonzalez Lopez and Dr. Jesus Enrique Garcia

Host Institution
Universidad de Campinas, Brazil

Aim of the mission
The purpose of this STSM is to complete work already begun on finding the probability that sequences of bases likely to adopt the left-handed form, i.e. the Z-DNA conformation, occur in certain full DNA sequences. One of the researchers in Brazil whom the proposer will be visiting on this mission, Dr. Veronica Andrea Gonzalez Lopez, has developed, with colleagues, a local metric from a distance measure between samples coming from discrete Markovian processes. They use the metric to decide whether two independent samples are governed by the same stochastic law. This technique was applied to the stochastic profile of strains of the Zika virus, utilizing the four bases of DNA. They find the probability of one base following another in the genomic sequence. This calculation is carried out by a computer program written by the other researcher with whom the proposer will be working while in Brazil, Dr. Jesus Enrique Garcia. While in Brazil, we will apply the methodology above to find the probability of potential Z-DNA forming sequences (ZFSs) occurring in rodent parvoviruses, Salmonella, and certain carcinogens. The work already begun on SARS-CoV-2 will also be completed.
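
As an illustration of "the probability of one base following another", the following minimal Python sketch estimates first-order transition probabilities from adjacent-pair counts. It is not Dr. Garcia's program, which is not described here; the function name `transition_probabilities` and the choice to skip non-ACGT symbols are assumptions made for the example.

```python
from collections import Counter

BASES = "ACGT"

def transition_probabilities(sequence):
    """Estimate P(next base | current base) from adjacent-pair counts.

    Hypothetical helper for illustration only: pairs containing a symbol
    other than A, C, G or T (for example N) are skipped.
    """
    seq = sequence.upper()
    pair_counts = Counter(
        (a, b) for a, b in zip(seq, seq[1:]) if a in BASES and b in BASES
    )
    probs = {}
    for a in BASES:
        row_total = sum(pair_counts[(a, b)] for b in BASES)
        # A base that is never followed by another base gets an all-zero row.
        probs[a] = {
            b: (pair_counts[(a, b)] / row_total if row_total else 0.0)
            for b in BASES
        }
    return probs

if __name__ == "__main__":
    table = transition_probabilities("ACGCGT")
    for base, row in table.items():
        print(base, row)
```

Each row of the resulting table is the empirical distribution of the base that follows a given base; a real analysis would run this over a full genome rather than a toy string.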

Summary of the Results

During the STSM, the probabilities of potential Z-DNA forming sequences (ZFS) occurring in numerous genomes were found. These genomes included SARS-CoV-2, Salmonella, rodent parvoviruses and some cancer-causing chemicals, which were already known to have ZFS. More importantly, we also found such sequences in the Epstein-Barr virus, which had not previously been checked for ZFS, and calculated the probabilities of those subsequences occurring. As part of this work, we also found the probability with which the different bases appeared before the beginning of a potential ZFS. The work in this paragraph was mostly done by Dr. Jesus Enrique Garcia, who is a probabilist.
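
The report does not state how potential ZFS were identified. Purely as a sketch, the Python below assumes a common heuristic, namely that runs of alternating purines (A, G) and pyrimidines (C, T) are Z-DNA candidates, and tabulates which base precedes each run. The names `find_alternating_runs` and `preceding_base_frequencies` and the default `min_length=6` are hypothetical choices, not the criteria used during the STSM.

```python
from collections import Counter

PURINES = set("AG")
PYRIMIDINES = set("CT")

def _alternates(a, b):
    """True if a and b are one purine and one pyrimidine, in either order."""
    return (a in PURINES and b in PYRIMIDINES) or (
        a in PYRIMIDINES and b in PURINES
    )

def find_alternating_runs(sequence, min_length=6):
    """Return (start, end) index pairs of maximal alternating runs.

    Assumption for illustration: a candidate ZFS is a maximal run of
    alternating purine/pyrimidine bases at least min_length long
    (min_length should be 2 or more). end is exclusive.
    """
    seq = sequence.upper()
    runs = []
    start = 0
    for i in range(1, len(seq) + 1):
        # Close the current run at the end of the sequence or when the
        # purine/pyrimidine alternation breaks.
        if i == len(seq) or not _alternates(seq[i - 1], seq[i]):
            if i - start >= min_length:
                runs.append((start, i))
            start = i
    return runs

def preceding_base_frequencies(sequence, runs):
    """Relative frequency of the base just before each run."""
    seq = sequence.upper()
    counts = Counter(seq[start - 1] for start, _ in runs if start > 0)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()} if total else {}

if __name__ == "__main__":
    genome = "TTCGCGCGAA"
    runs = find_alternating_runs(genome)
    print(runs, preceding_base_frequencies(genome, runs))
```

Because the runs are maximal, the base before a run necessarily breaks the alternation, so under this particular heuristic it always belongs to the same class (purine or pyrimidine) as the first base of the run. A different definition of ZFS would give a different preceding-base distribution.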

The other collaborator, Dr. Veronica Andrea Gonzalez Lopez, a statistician, applied the local metric from a distance measure between samples coming from discrete Markovian processes to these genomic sequences and the ZFS subsequences. For instance, she looked at one cancer-causing chemical with ZFS and one without and compared the two. A partition Markov model is to be applied to all of these sequences, and ultimately we plan to form a classification of them (see the next section).

The proposer, who is a low-dimensional topologist, is utilizing the work above while examining the three-dimensional conformation of the DNA molecule. For example, DNA supercoils in the cell, and negative supercoiling favors the left-handed Z-form of the molecule (note that topoisomerases can relax negative supercoiling). There are also proteins which bind only to Z-regions. Finally, there are anti-Z-DNA antibodies associated with the cancer-causing chemicals noted above.

New avenues of research were also discovered. One is the application of our techniques to the Epstein-Barr virus, as noted above. The other is the realization that we can form a classification of the DNA sequences and ZFS subsequences, utilizing the partition Markov model. Some others include:
1. Finding the probability of the different bases that appear at the end of a ZFS.
2. Counting the occurrences of ZFS in as many genomes as possible, then calculating the total number of appearances and comparing the result with the findings in the literature.
3. Answering the following questions: If a single different base appears in a ZFS, can it still take on the left-handed form? (We plan to use the Levenshtein distance to answer this question.) Can different ZFS which are concatenated still flip to the left-handed form?
4. Finally, applying the idea of microsatellites, which we have just discovered, to our work.
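
For the single-base question in item 3, the Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The Python sketch below is the standard dynamic-programming formulation; the function name and the example sequences are illustrative assumptions and are not taken from the mission's data.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,          # delete char_a
                    current[j - 1] + 1,       # insert char_b
                    previous[j - 1] + substitution_cost,
                )
            )
        previous = current
    return previous[-1]

if __name__ == "__main__":
    # Hypothetical example: a CG repeat and a copy with one base changed.
    print(levenshtein("CGCGCG", "CGCACG"))
```

A candidate subsequence at distance 1 from a known ZFS differs from it by exactly one inserted, deleted or substituted base, which is the situation the first question in item 3 asks about.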
The work already done in Brazil is publishable; however, we may wait to prepare a paper until we complete the new items outlined above. Without doubt, we will be presenting our findings at conferences this year.