E-mail: rossk [at] chalmers.se
Office: Room 1006A (Fysik Origo, Kemigården 1)
The Automation of Science
My main research interest is the automation of scientific research. I originated the idea of a ‘Robot Scientist’: a physically implemented computer/robotic system that utilises techniques from artificial intelligence (AI) to execute cycles of scientific experimentation. A Robot Scientist can, fully automatically, originate hypotheses to explain observations, devise experiments to test these hypotheses, physically run the experiments using laboratory robotics, interpret the results, and then repeat the cycle. My Robot Scientist ‘Adam’ was the first machine to autonomously hypothesise and experimentally confirm novel scientific knowledge. My Robot Scientist ‘Eve’ automates early-stage drug design, especially for neglected tropical diseases such as malaria and Chagas disease. Eve integrates and automates library screening, hit confirmation, and lead generation through cycles of quantitative structure activity relationship learning and testing. Using econometric modelling, we have demonstrated that the use of AI to select compounds economically outperforms standard drug screening.
I am currently developing a 3rd-generation Robot Scientist, ‘Genesis’, designed to automate thousands of closed-loop cycles of experiment in parallel. To achieve this physically, Genesis will have 10,000 micro-chemostats and associated analytical equipment. The application domain for Genesis is automatically learning computational models of eukaryotic cells. This task is one of the most important and challenging in modern science. It is important because understanding how cells work is fundamental to medicine and biotechnology. It is extremely challenging because even the simple ‘model’ eukaryotic cell, yeast (S. cerevisiae), has thousands of different genes, proteins, and small molecules, all interacting in complex spatio-temporal ways. Constructing high-fidelity models will require many thousands of cycles of designed experiments and model improvement, yet little current systems biology research completes even a single cycle of model improvement.
I am involved in organising the international ‘Nobel Turing Grand Challenge’ to develop AI Scientists: AI systems capable of making Nobel-quality scientific discoveries highly autonomously at a level comparable, and possibly superior, to the best human scientists by 2050.
The idea behind DNA computing is to use DNA rather than silicon as the substrate from which to build computers. Using DNA, my colleagues and I demonstrated the first physical Nondeterministic Universal Turing Machine (NUTM). For the most important class of problems in computer science, the NP-complete problems (NP: nondeterministic polynomial time), NUTMs are theoretically exponentially faster than both classical computers and quantum computers – assuming P ≠ NP. The NUTM design was based on Thue string rewriting systems, and thereby avoids the limitations of most previous DNA computing schemes: all the computation is local (simple edits to strings), so there is no need for communication, and there is no need to order operations. The design exploits DNA’s ability to replicate to execute an exponential number of computational paths in polynomial time. Each Thue rewriting step is embodied in a DNA edit implemented using a novel combination of polymerase chain reactions and site-directed mutagenesis. In an NUTM, the resource limitation is space, which contrasts with classical computers and quantum computers, where it is time. This fundamental difference enables an NUTM to trade space for time, which is significant for both theoretical computer science and physics. It is also of practical importance, for to quote Richard Feynman, ‘there’s plenty of room at the bottom’. This means that a desktop DNA computer could potentially utilise more processors than all the electronic computers in the world combined, and thereby outperform the world’s current fastest supercomputer while consuming a tiny fraction of its energy.
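As a purely illustrative sketch of nondeterministic string rewriting (it is not the DNA implementation, and the function names and example rule are assumptions made here), the Python below explores every path of a small Thue system breadth-first. A classical program has to enumerate the paths one after another, so the work can grow exponentially with the number of steps; the DNA design instead follows the paths in parallel through replication.

```python
def successors(s, rules):
    """Return every string obtainable from s by one local rewrite."""
    out = set()
    for lhs, rhs in rules:
        # A Thue system is symmetric: each rule may be applied in either direction.
        for a, b in ((lhs, rhs), (rhs, lhs)):
            start = s.find(a)
            while start != -1:
                out.add(s[:start] + b + s[start + len(a):])
                start = s.find(a, start + 1)
    return out


def reachable(start, target, rules, max_steps):
    """Breadth-first search over all computational paths.

    Returns the minimum number of rewrites needed to turn start into
    target, or None if target is not reached within max_steps rewrites.
    """
    frontier = {start}
    seen = {start}
    for step in range(max_steps + 1):
        if target in frontier:
            return step
        # Expand every path by one rewrite, dropping strings already visited.
        frontier = {t for s in frontier for t in successors(s, rules)} - seen
        seen |= frontier
    return None


# Hypothetical example rule: adjacent "a" and "b" may swap places.
rules = [("ab", "ba")]
steps = reachable("aabb", "bbaa", rules, max_steps=10)
```

Here each rewrite is a local edit to one string, and no path needs to communicate with any other, which is the property the paragraph above attributes to the Thue-based design.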
Machine learning (ML) is currently the hottest technology on the planet. I have worked in the field for over 35 years, mostly on applications to biology and chemistry: I published 5 of the first 30 papers in PubMed on ‘machine learning’; there are now >43,000 listed. I have been particularly involved in ‘relational learning’, the use of first-order predicate logic as the model language for learning. This approach is particularly suitable for problems where the internal structure of the examples is important, as is the case when the structure of molecules is involved. ML is a generic technology that can be applied to a vast number of areas. My current application interests are:
Drug design: I am interested in ML methods for quantitative structure activity relationship (QSAR) learning. The standard QSAR learning problem is: given a target (usually a protein) and a set of chemical compounds (small molecules) with associated bioactivities (e.g. inhibition of the target), learn a predictive mapping from molecular representation to activity. I am particularly interested in ‘active’ and ‘meta’ ML methods for QSAR learning. In active ML, the ML method gets to choose its next example, which is closely related to the drug design problem of which compound to synthesise next in a series. In meta ML, machine learning is used to learn how to better apply ML.
Cancer: I am developing ML methods to learn to better model cell-signalling in cancer and healthy cells, and to better predict what anti-cancer drugs to give to patients. This requires the integration of large amounts of bioinformatic, chemoinformatic, and medical information. Relational ML is well suited to learning from this complicated data structure.
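The active ML idea described under drug design, where the learner chooses its next example, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by Eve: compounds are represented by hypothetical numeric descriptor tuples, `assay` stands in for a laboratory measurement, prediction is by nearest tested neighbour, and the selection rule (test the compound farthest from anything already tested) is just one simple choice among many.

```python
import math


def nearest(x, labelled):
    """Return the tested compound closest to x in descriptor space."""
    return min(labelled, key=lambda d: math.dist(d, x))


def predict(x, labelled):
    """Predict activity as that of the nearest tested compound (1-NN)."""
    return labelled[nearest(x, labelled)]


def choose_next(untested, labelled):
    """Assumed selection rule: pick the compound least like anything tested."""
    return max(untested, key=lambda x: math.dist(x, nearest(x, labelled)))


def active_learn(pool, assay, seed, budget):
    """Run `budget` cycles of choose -> test -> update, starting from `seed`."""
    labelled = {seed: assay(seed)}
    untested = [x for x in pool if x != seed]
    for _ in range(budget):
        if not untested:
            break
        x = choose_next(untested, labelled)
        labelled[x] = assay(x)  # stand-in for running the experiment
        untested.remove(x)
    return labelled


# Hypothetical one-dimensional descriptors and a made-up activity function.
pool = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
model = active_learn(pool, assay=lambda x: x[0] ** 2, seed=(0.0,), budget=2)
```

With a budget of two experiments, the learner spreads its tests across the descriptor range rather than testing compounds in the order they happen to be listed, which mirrors the question of which compound to synthesise next in a series.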