safety tips for those designing and analyzing AI

Here are some ways to design AI so that it is inherently safer and more predictable and interpretable.

0. Do not trust anything that a for-profit corporation (or even a non-profit) says about AI. They only want profit, so everything that they say about safety is carefully curated not to maximize safety but instead to maximize their own personal profit.

If we are going to keep on using the same kind of AI models that we have now, we need to design them to be more interpretable.

1. Language models have these things called word embeddings. The problem with our current word embeddings is that we are currently representing tokens as vectors instead of matrices. We need to instead use matrix-valued word embeddings where every token is represented as a matrix instead of just a vector. Why should we do this? Vectors are good at encoding a single meaning of a single token. But in natural language processing, tokens tend to have several meanings. With matrices, we can let every row (or column) represent a distinct meaning of the token. There are other ways of doing this too. For example, the token in a particular context can be represented as rank-1 positive semidefinite matrices, while the token in general can be represented as a positive semidefinite matrix of higher rank. This is analogous to how in quantum information theory, pure states are rank-1 positive semidefinite matrices with trace 1 (which are just unit vectors modulo a scalar on the unit circle) while mixed states are positive semidefinite matrices with trace 1. Matrix-valued word embeddings will neatly encode all of the data in language models so that it is more interpretable. Of course, after this, the language model needs to figure out how to take the matrix-valued word embeddings and work with them as matrices. That is not a problem. One can just make the weight matrices into Kronecker products or Kronecker sums so that they work nicely with matrices. Of course, after several layers, the transformer may want to discard all of the inappropriate and inapplicable contexts of the token, so the transformer may want to turn the matrices into vectors.

We need to get rid of neural networks. Neural networks are opague. They are inherently uninterpretable. We need to design systems that are inherently more interpretable than neural networks.

aleph_0. Machine learning systems should still be trained using gradient ascent/descent, but the fitness/loss functions should be designed so that there is typically just one local optimum or few local optima. If there is just one local optimum, then that local optimum is free from any random information and it is unlikely to contain any pseudorandom information either. It is also likely to be much more mathematical than fitness/loss functions with many local optima. But this means that we need to get rid of neural networks because neural networks are unlikely to have this nice and other nice mathematical properties.

Inaccessible. Machine learning systems should use functions of several complex variables or perhaps several quaternionic variables instead of just the real numbers. With that being said, fitness functions need to be maximized, so they need to have a real output, so the machine learning systems should instead use plurisubharmonic functions. We all know that functions of a complex variable behave better than functions of a real variable, and functions of several complex variables behave much better than functions of several real variables. One benefit of plurisubharmonity/subharmonicity is that these functions satisfy maximum principles. This means that if we have a plurisubharmonic fitness function, then the local maxima will always be on the boundary. But this also means that for plurisubharmonic functions, there are fewer local maxima to think about. Why is this? If we restrict the function to the boundary, there will probably be more local maxima, but once we extend the function, those local maxima will no longer be local maxima since there will be a way to escape those local maxima. If we make the 'Shilov boundary' small (as is the case with the closed polydisc), then the plurisubharmonic functions will have a domain of real dimension 2n but a boundary of real dimension n, so there are n extra dimensions for the local maxima in the boundary to no longer be local maxima on the entire closed polydisc. Plurisubharmonicity will ensure not only that the fitness functions have fewer local optima, but also that they will behave much more mathematically.

Weakly compact. Instead of just using functions of several complex or quaternionic variables, it is even better to use matrix-valued functions as our new fitness functions. Plenty of functions in mathematics extend to matrix-valued functions. For example, we may apply any polynomial function to matrices. We may also apply any holomorphic function to matrices, and every real-valued continuous function may be applied to Hermitian matrices with no problem. Matrices are also endowed with several operator norms, so one can easily use these several matrix-valued functions to produce machine learning models. AI systems constructed using these matrix-valued functions will be more interpretable than your neural networks since they will behave more mathematically.

Measurable. The Hessian of your loss function at your local minimum should not have exceptionally small eigenvalues. The value min(lambda)/max(lambda) (which is the ratio of the smallest to the largest eigenvalues of the Hessian at the local minimum) should not be exceedingly small. An exceedingly small value of min(lambda)/max(lambda) means that your local minimum is only barely a local minimum and a local minimum by some accident. Furthermore, if min(lambda)/max(lambda) is small, then a small perturbation of your loss function will kill your local minimum, and we can't have that. The local minimum should be robust to this kind of noise, and the best way to guarantee robustness will be to have a highly non-singular Hessian at the local minimum.

Huge. We need to design systems that are naturally free from adversarial examples and especially uninterpretable adversarial examples. By naturally, I mean that we should not need to use adversarial training to get rid of these adversarial examples. By interpretable, the adversarial examples should look more like optical illusions. It is easy to explain why optical illusions confuse people, but we have no such explanation for adversarial examples in machine learning. We need to fix this by constructing AI without such adversarial examples, or at least where the adversarial examples are interpretable in the same way that optical illusions are interpretable.

Rank-into-Rank. The resulting AI model should be smooth. The loss function should be smooth. Similar inputs to the function should always result in similar outputs. This means that it is better to use an analytic function or at least infinitely differentiable that ReLU (but we need to make other changes for this approach to be sensible). I want to be able to take a Hessian at a local maximum. Those adversarial examples where we have two nearly identical images of cats and the neural network thinks one is a cat and the other is a sheep are not supposed to happen simply because the neural network should not exaggerate such small differences so much. Such non-smooth functions are also not quite as mathematical as smooth functions, so smooth functions should be more interpretable than non-smooth functions.

Rank+1-into-Rank+1-The AI models should be mathematical in the sense that it should be easy to prove mathematical theorems about them. For example, the spectrum of the Laplacian of a graph can be used in machine learning, and there are many theorems about the spectra of matrices. On the other hand, if we are given a neural network for some task, one cannot prove anything about the particular local optimum that we have obtained nor can one prove mathematical theorems about the distribution of all possible local optima for one's fitness function. Sure. We have the universal approximation theorem that tells us that neural networks even with a single hidden layer can approximate anything if the hidden layer is wide enough. We can also approximate arbitrary continuous functions with polynomials. And in the complex case, Mergelyan's theorem allows us to uniformly approximate arbitrary continuous functions on an arbitrary compact subset of the complex plane with a polynomial function as long the original function is holomorphic on the interior. Unless you have a theorem that is about as good as Mergelyan's theorem for neural networks, I will consider them as objects that are not quite as mathematical as the objects that we find in areas such as complex analysis (1 or several complex variables), quantum information theory, or random matrix theory. Or maybe I simply do not know enough about the mathematical theorems of neural networks, and this is more of a knowledge and communication problem.

I personally like the more mathematical AI models. For example, if I want to obtain statistics about a data set, I want to use an statistical model that always converges to the same local optimum so that the process of obtaining those statistics itself does not add more noise to the data. At the moment, we currently do not have AI systems that can satisfy all of these requirements which is not good for actually understanding the inner workings of AI system and how they are processing data.

-Joseph Van Name Ph.D.