{"title": "Training a Quantum Neural Network", "book": "Advances in Neural Information Processing Systems", "page_first": 1019, "page_last": 1026, "abstract": "", "full_text": "Training a Quantum Neural Network\n\nBob Ricks\n\nDepartment of Computer Science\n\nBrigham Young University\n\nProvo, UT 84602\n\nDan Ventura\n\nDepartment of Computer Science\n\nBrigham Young University\n\nProvo, UT 84602\n\ncyberbob@cs.byu.edu\n\nventura@cs.byu.edu\n\nAbstract\n\nMost proposals for quantum neural networks have skipped over the prob-\nlem of how to train the networks. The mechanics of quantum computing\nare different enough from classical computing that the issue of training\nshould be treated in detail. We propose a simple quantum neural network\nand a training method for it. It can be shown that this algorithm works\nin quantum systems. Results on several real-world data sets show that\nthis algorithm can train the proposed quantum neural networks, and that\nit has some advantages over classical learning algorithms.\n\n1 Introduction\n\nMany quantum neural networks have been proposed [1], but very few of these proposals\nhave attempted to provide an in-depth method of training them. Most either do not mention\nhow the network will be trained or simply state that they use a standard gradient descent\nalgorithm. This assumes that training a quantum neural network will be straightforward and\nanalogous to classical methods. While some quantum neural networks seem quite similar\nto classical networks [2], others have proposed quantum networks that are vastly different\n[3, 4, 5]. Several different network structures have been proposed, including lattices [6]\nand dots [4]. Several of these networks also employ methods which are speculative or\ndif\ufb01cult to do in quantum systems [7, 8]. 
These significant differences between classical networks and quantum neural networks, as well as the problems associated with quantum computation itself, require us to look more deeply at the issue of training quantum neural networks. Furthermore, no one has done empirical testing on their training methods to show that their methods work with real-world problems.\n\nIt is an open question what advantages a quantum neural network (QNN) would have over a classical network. It has been shown that QNNs should have roughly the same computational power as classical networks [7]. Other results have shown that QNNs may work best with some classical components as well as quantum components [2].\n\nQuantum searches can be proven to be faster than comparable classical searches. We leverage this idea to propose a new training method for a simple QNN. This paper details such a network and how training could be done on it. Results from testing the algorithm on several real-world problems show that it works.\n\n2 Quantum Computation\n\nSeveral necessary ideas that form the basis for the study of quantum computation are briefly reviewed here. For a good treatment of the subject, see [9].\n\n2.1 Linear Superposition\n\nLinear superposition is closely related to the familiar mathematical principle of linear combination of vectors. Quantum systems are described by a wave function \u03c8 that exists in a Hilbert space. The Hilbert space has a set of states, |\u03c6ii, that form a basis, and the system is described by the quantum state |\u03c8i = \u03a3i ci |\u03c6ii. |\u03c8i is said to be coherent or to be in a linear superposition of the basis states |\u03c6ii, and in general the coefficients ci are complex. A postulate of quantum mechanics is that if a coherent system interacts in any way with its environment (by being measured, for example), the superposition is destroyed. This loss of coherence is governed by the wave function \u03c8.
The coefficients ci are called probability amplitudes, and |ci|^2 gives the probability of |\u03c8i being measured in the state |\u03c6ii. Note that the wave function \u03c8 describes a real physical system that must collapse to exactly one basis state. Therefore, the probabilities governed by the amplitudes ci must sum to unity. A two-state quantum system is used as the basic unit of quantum computation. Such a system is referred to as a quantum bit or qubit and, naming the two states |0i and |1i, it is easy to see why this is so.\n\n2.2 Operators\n\nOperators on a Hilbert space describe how one wave function is changed into another and they may be represented as matrices acting on vectors (the notation |\u00b7i indicates a column vector and h\u00b7| a [complex conjugate] row vector). Using operators, an eigenvalue equation can be written A|\u03c6ii = ai |\u03c6ii, where ai is the eigenvalue. The solutions |\u03c6ii to such an equation are called eigenstates and can be used to construct the basis of a Hilbert space as discussed in Section 2.1. In the quantum formalism, all properties are represented as operators whose eigenstates are the basis for the Hilbert space associated with that property and whose eigenvalues are the quantum allowed values for that property. It is important to note that operators in quantum mechanics must be linear operators and further that they must be unitary.\n\n2.3 Interference\n\nInterference is a familiar wave phenomenon. Wave peaks that are in phase interfere constructively while those that are out of phase interfere destructively. This is a phenomenon common to all kinds of wave mechanics from water waves to optics. The well known double slit experiment demonstrates empirically that at the quantum level interference also applies to the probability waves of quantum mechanics.
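To make the measurement postulate concrete, here is a minimal classical sketch (our own illustration, not part of the paper; the function name and normalization check are assumptions) of collapsing an amplitude vector to a single basis state with probability |ci|^2:

```python
import random

# Toy "measurement" of a state given by real/complex amplitudes c_i.
# The probability of collapsing to basis state i is |c_i|^2.
def measure(amplitudes, rng=random.random):
    """Return the index of the basis state the superposition collapses to."""
    probs = [abs(c) ** 2 for c in amplitudes]
    assert abs(sum(probs) - 1.0) < 1e-9, "amplitudes must be normalized"
    r, acc = rng(), 0.0
    for state, p in enumerate(probs):
        acc += p
        if r < acc:
            return state
    return len(probs) - 1  # guard against floating-point round-off

# Equal superposition (|0> + |1>)/sqrt(2): each outcome occurs about half the time.
amps = [2 ** -0.5, 2 ** -0.5]
counts = [0, 0]
for _ in range(10000):
    counts[measure(amps)] += 1
```

Repeating the measurement many times recovers the |ci|^2 statistics; a basis state (amplitude 1 on one entry) is returned deterministically, matching the collapse-to-one-basis-state postulate above.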
The wave function interferes with itself through the action of an operator \u2013 the different parts of the wave function interfere constructively or destructively according to their relative phases just like any other kind of wave.\n\n2.4 Entanglement\n\nEntanglement is the potential for quantum systems to exhibit correlations that cannot be accounted for classically. From a computational standpoint, entanglement seems intuitive enough \u2013 it is simply the fact that correlations can exist between different qubits \u2013 for example, if one qubit is in the |1i state, another will be in the |1i state. However, from a physical standpoint, entanglement is little understood. The questions of what exactly it is and how it works are still not resolved. What makes it so powerful (and so little understood) is the fact that since quantum states exist as superpositions, these correlations exist in superposition as well. When coherence is lost, the proper correlation is somehow communicated between the qubits, and it is this \u201ccommunication\u201d that is the crux of entanglement. Mathematically, entanglement may be described using the density matrix formalism.
The density matrix \u03c1\u03c8 of a quantum state |\u03c8i is defined as \u03c1\u03c8 = |\u03c8ih\u03c8|. For example, the quantum state |\u03bei = (1/\u221a2)|00i + (1/\u221a2)|01i appears in vector form as |\u03bei = (1/\u221a2)(1, 1, 0, 0)^T and it may also be represented as the density matrix\n\n\u03c1\u03be = |\u03beih\u03be| = (1/2) [1 1 0 0; 1 1 0 0; 0 0 0 0; 0 0 0 0]\n\nwhile the state |\u03c8i = (1/\u221a2)|00i + (1/\u221a2)|11i is represented as\n\n\u03c1\u03c8 = |\u03c8ih\u03c8| = (1/2) [1 0 0 1; 0 0 0 0; 0 0 0 0; 1 0 0 1]\n\nwhere the matrices and vectors are indexed by the state labels 00, ..., 11 and [r1; r2; r3; r4] lists the rows of a matrix. Notice that \u03c1\u03be can be factorized as\n\n\u03c1\u03be = [1 0; 0 0] \u2297 (1/2)[1 1; 1 1]\n\nwhere \u2297 is the normal tensor product. On the other hand, \u03c1\u03c8 can not be factorized. States that can not be factorized are said to be entangled, while those that can be factorized are not. There are different degrees of entanglement and much work has been done on better understanding and quantifying it [10, 11]. Finally, it should be mentioned that while interference is a quantum property that has a classical cousin, entanglement is a completely quantum phenomenon for which there is no classical analog.
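The product/entangled distinction just described can be checked numerically. The sketch below (our own illustration, not the paper's; the helper names are assumptions) builds \u03c1 = |\u03c8ih\u03c8| for both example states and tests whether tracing out one qubit leaves a pure state, which holds exactly when the state factorizes:

```python
def outer(v):
    """Density matrix rho = |psi><psi| for a real-amplitude state vector."""
    return [[a * b for b in v] for a in v]

def partial_trace_first(rho):
    """Trace out the first qubit of a 2-qubit density matrix (4x4 -> 2x2)."""
    return [[rho[i][j] + rho[i + 2][j + 2] for j in range(2)] for i in range(2)]

def purity(rho):
    """Tr(rho^2): equals 1 iff the (reduced) state is pure."""
    n = len(rho)
    return sum(rho[i][j] * rho[j][i] for i in range(n) for j in range(n))

s = 2 ** -0.5
xi  = [s, s, 0, 0]   # (|00> + |01>)/sqrt(2): a product state
psi = [s, 0, 0, s]   # (|00> + |11>)/sqrt(2): entangled (a Bell state)

# For the product state the remaining qubit stays pure (purity 1.0);
# for the Bell state it is maximally mixed (purity 0.5).
```

This mirrors the factorization argument: \u03c1\u03be splits into a tensor product, so its reduced state is pure, while \u03c1\u03c8 does not, and its reduced state is mixed.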
It has proven to be a powerful computational resource in some cases and a major hindrance in others.\n\nTo summarize, quantum computation can be defined as representing the problem to be solved in the language of quantum states and then producing operators that drive the system (via interference and entanglement) to a final state such that when the system is observed there is a high probability of finding a solution.\n\n2.5 An Example \u2013 Quantum Search\n\nOne of the best known quantum algorithms searches an unordered database quadratically faster than any classical method [12, 13]. The algorithm begins with a superposition of all N data items and depends upon an oracle that can recognize the target of the search. Classically, searching such a database requires O(N) oracle calls; however, on a quantum computer, the task requires only O(\u221aN) oracle calls. Each oracle call consists of a quantum operator that inverts the phase of the search target. An \u201cinversion about average\u201d operator then shifts amplitude towards the target state. After \u03c0/4 \u2217 \u221aN repetitions of this process, the system is measured and with high probability, the desired datum is the result.\n\n3 A Simple Quantum Neural Network\n\nWe would like a QNN with features that make it easy for us to model, yet powerful enough to leverage quantum physics. We would like our QNN to:\n\n\u2022 use known quantum algorithms and gates\n\u2022 have weights which we can measure for each node\n\u2022 work in classical simulations of reasonable size\n\u2022 be able to transfer knowledge to classical systems\n\nWe propose a QNN that operates much like a classical ANN composed of several layers of perceptrons \u2013 an input layer, one or more hidden layers and an output layer. Each layer is fully connected to the previous layer. Each hidden layer computes a weighted sum of the outputs of the previous layer.
If this sum is above a threshold, the node goes high; otherwise it stays low. The output layer does the same thing as the hidden layer(s), except that it also checks its accuracy against the target output of the network. The network as a whole computes a function by checking which output bit is high. There are no checks to make sure exactly one output is high. This allows the network to learn data sets which have one output high or binary-encoded outputs.\n\nFigure 1: Simple QNN to compute XOR function\n\nThe QNN in Figure 1 is an example of such a network, with sufficient complexity to compute the XOR function. Each input node i is represented by a register, |\u03b1ii. The two hidden nodes compute a weighted sum of the inputs, |\u03c8ii1 and |\u03c8ii2, and compare the sum to a threshold weight, |\u03c8ii0. If the weighted sum is greater than the threshold the node goes high. The |\u03b2ik represent internal calculations that take place at each node. The output layer works similarly, taking a weighted sum of the hidden nodes and checking against a threshold. The QNN then checks each computed output and compares it to the target output, |\u2126ij, sending |\u03d5ij high when they are equivalent. The performance of the network is denoted by |\u03c1i, which is the number of computed outputs equivalent to their corresponding target output.\n\nAt the quantum gate level, the network will require O(blm + m^2) gates for each node of the network. Here b is the number of bits used for floating point arithmetic in |\u03b2i, l is the number of bits for each weight and m is the number of inputs to the node [14, 15].\n\nThe overall network works as follows on a training set. In our example, the network has two input parameters, so all n training examples will have two input registers. These are represented as |\u03b1i11 to |\u03b1in2. The target answers are kept in registers |\u2126i11 to |\u2126in2.
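Classically, the node semantics just described (weighted sum compared against a threshold weight) can be sketched as follows. The weight values below are hypothetical illustrations chosen by us to realize XOR in a network shaped like Figure 1; they are not values learned or stored by the paper's quantum registers:

```python
def node(inputs, weights, threshold):
    """Goes high (1) when the weighted sum of the inputs exceeds the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

def forward(x, hidden, output):
    """One hidden layer feeding an output layer, as in the Figure 1 topology."""
    h = [node(x, w, t) for (w, t) in hidden]
    return [node(h, w, t) for (w, t) in output]

# Hypothetical integer weights (ours, for illustration):
# hidden node 1 fires on x1 OR x2; hidden node 2 fires on x1 AND x2;
# the output node computes OR AND NOT AND, i.e. XOR.
hidden = [([1, 1], 0), ([1, 1], 1)]
output = [([1, -1], 0)]
```

Running `forward` over all four input pairs reproduces the XOR truth table, which is exactly the function the two-hidden-node network of Figure 1 is sized to compute.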
Each hidden or output node has a weight vector, represented by |\u03c8ii, each vector containing weights for each of its inputs. After classifying a training example, the registers |\u03d5i1 and |\u03d5i2 reflect the network's ability to classify that training example. As a simple measure of performance, we increment |\u03c1i by the sum of all |\u03d5ii. When all training examples have been classified, |\u03c1i will be the sum of the output nodes that have the correct answer throughout the training set and will range between zero and the number of training examples times the number of output nodes.\n\nFigure 2: QNN Training\n\n4 Using Quantum Search to Learn Network Weights\n\nOne possibility for training this kind of network is to search through the possible weight vectors for one which is consistent with the training data. Quantum searches have already been used in quantum learning [16] and many of the problems associated with them have already been explored [17]. We would like to find a solution which classifies all training examples correctly; in other words, we would like |\u03c1i = n \u2217 m, where n is the number of training examples and m is the number of output nodes. Since we generally do not know how many weight vectors will do this, we use a generalization of the original search algorithm [18], intended for problems where the number of solutions t is unknown. The basic idea is that we will put |\u03c8i into a superposition of all possible weight vectors and search for one which classifies all training examples correctly.\n\nWe start out with |\u03c8i as a superposition of all possible weight vectors. All other registers (|\u03b2i, |\u03d5i, |\u03c1i) besides the inputs and target outputs are initialized to the state |0i. We then classify each training example, updating the performance register, |\u03c1i.
By using a superposition we classify the training examples with respect to every possible weight vector simultaneously. Each weight vector is now entangled with |\u03c1i in such a way that |\u03c1i corresponds with how well every weight vector classifies all the training data. In this case, the oracle for the quantum search is |\u03c1i = n \u2217 m, which corresponds to searching for a weight vector which correctly classifies the entire set.\n\nUnfortunately, searching the weight vectors while entangled with |\u03c1i would cause unwanted weight vectors to grow that would be entangled with the performance metric we are looking for. The solution is to disentangle |\u03c8i from the other registers after inverting the phase of those weights which match the search criteria, based on |\u03c1i. To do this the entire network will need to be uncomputed, which will unentangle all the registers and set them back to their initial values. This means that the network will need to be recomputed each time we make an oracle call and after each measurement.\n\nThere are at least two things about this algorithm that are undesirable. First, some training sets will have no solution network that correctly classifies all training instances. In that case nothing will be marked by the search oracle, so every weight vector will have an equal chance of being measured. It is also possible that even when a solution does exist, it is not desirable because it overfits the training data. Second, the amount of time needed to find a vector which correctly classifies the training set is O(\u221a(2^b/t)), which is exponential with respect to the number of bits b in the weight vector.\n\nOne way to deal with the first problem is to search until we find a solution which covers an acceptable percentage, p, of the training data. In other words, the search oracle is modified to be |\u03c1i \u2265 n \u2217 m \u2217 p.
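The amplitude dynamics that such an oracle call feeds into can be simulated classically at small scale. The sketch below is our own illustration (names and sizes are assumptions): indices stand in for candidate weight vectors, the oracle marks the "good" ones, and the loop applies the phase inversion and inversion-about-average steps from Section 2.5:

```python
import math

def grover_search(n_items, is_target, iterations):
    """Amplitude-level Grover simulation: phase flip on marked items,
    then reflect every amplitude about the mean, `iterations` times."""
    amps = [1.0 / math.sqrt(n_items)] * n_items
    for _ in range(iterations):
        # Oracle call: invert the phase of every marked item.
        amps = [-a if is_target(i) else a for i, a in enumerate(amps)]
        # "Inversion about average": reflect each amplitude about the mean.
        mean = sum(amps) / n_items
        amps = [2.0 * mean - a for a in amps]
    return amps

N = 64                 # stand-in for 2^b candidate weight vectors
target = 42            # stand-in for the single consistent weight vector
k = int(round(math.pi / 4 * math.sqrt(N)))   # ~pi/4 * sqrt(N) repetitions
amps = grover_search(N, lambda i: i == target, k)
p_target = amps[target] ** 2
```

After roughly \u03c0/4 \u2217 \u221aN oracle calls nearly all probability sits on the marked item, which is the quadratic speed-up the training search relies on; with t marked items the same loop needs ~\u221a(N/t) calls, matching the unknown-t generalization of [18].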
The second problem is addressed in the next section.\n\n5 Piecewise Weight Learning\n\nOur quantum search algorithm gives us a good polynomial speed-up on the exponential task of finding a solution to the QNN. Still, this algorithm does not scale well; in fact, it is exponential in the total number of weights in the network and the bits per weight. Therefore, we propose a randomized training algorithm which searches each node's weight vector independently. The network starts off, once again, with training examples in |\u03b1i, the corresponding answers in |\u2126i, and zeros in all the other registers. A node is randomly selected and its weight vector, |\u03c8ii, is put into superposition. All other weight vectors start with random classical initial weights. We then search for a weight vector for this node that causes the entire network to classify a certain percentage, p, of the training examples correctly. This is repeated, iteratively decreasing p, until a new weight vector is found. That weight is fixed classically and the process is repeated randomly for the other nodes.\n\nSearching each node's weight vector separately is, in effect, a random search through the weight space where we select weight vectors which give a good level of performance for each node. Each node takes on weight vectors that tend to increase performance, with some amount of randomness that helps keep it out of local minima. This search can be terminated when an acceptable level of performance has been reached.\n\nThere are a few improvements to the basic design which help speed convergence. First, to ensure that hidden nodes find weight vectors that compute something useful, a small performance penalty is added to weight vectors which cause a hidden node to output the same value for all training examples. This helps select weight vectors which contain useful information for the output nodes.
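The piecewise scheme can be caricatured classically. In the sketch below (our own stand-in, not the paper's algorithm: enumeration of one node's small weight space replaces the quantum search, and the constant-output penalty is omitted), a random node's weights are searched while the others stay fixed, with the acceptance target p relaxed in steps:

```python
import itertools
import random

def node_out(x, w):
    """Threshold node: the last weight plays the role of the threshold."""
    return 1 if sum(a * b for a, b in zip(x, w[:-1])) > w[-1] else 0

def net_out(x, W):
    """W holds the hidden weight vectors followed by the output vector."""
    h = [node_out(x, w) for w in W[:-1]]
    return node_out(h, W[-1])

def accuracy(W, data):
    return sum(net_out(x, W) == y for x, y in data) / len(data)

def train(data, n_hidden=2, levels=(1.0, 0.75, 0.5), passes=200, seed=0):
    rng = random.Random(seed)
    space = [list(w) for w in itertools.product([-1, 0, 1], repeat=3)]
    W = [rng.choice(space)[:] for _ in range(n_hidden + 1)]  # random init
    for _ in range(passes):
        if accuracy(W, data) == 1.0:
            break
        i = rng.randrange(len(W))          # randomly select a node
        for p in levels:                   # iteratively decrease the target p
            good = [w for w in space
                    if accuracy(W[:i] + [w] + W[i + 1:], data) >= p]
            if good:
                W[i] = rng.choice(good)[:] # fix the found weights classically
                break
    return W

xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
W = train(xor)
```

Each accepted update keeps whole-network accuracy at or above the current p, while the random node order and random choice among acceptable vectors supply the stochasticity that helps the search escape poor configurations.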
Since each output node's performance is independent of the performance of all other output nodes, the algorithm only considers the accuracy of the output node being trained when training an output node.\n\n6 Results\n\nWe first consider the canonical XOR problem. Each of the hidden and output nodes is a thresholded node with three weights, one for each input and one for the threshold. Two bits are used for each weight. Quantum search did well on this problem, finding a solution in an average of 2.32 searches.\n\nThe randomized search algorithm also did well on the XOR problem. After an average of 58 weight updates, the algorithm was able to correctly classify the training data. Since this algorithm is randomized both in the number of iterations of the search algorithm before measuring and in the order in which nodes update their weight vectors, the standard deviation for this method was much higher, but still reasonable. In the randomized search algorithm, an epoch refers to finding and fixing the weight of a single node.\n\nWe also tried the randomized search algorithm on a few real-world machine learning problems: the lenses, Hayes-Roth and iris datasets [19]. The lenses data set tries to predict whether people will need soft contact lenses, hard contact lenses or no contacts. The iris dataset details features of three different classes of irises. The Hayes-Roth dataset classifies people into different classes depending on several attributes.\n\nData Set     # Weight Qubits   Epochs      Weight Updates   Output Accuracy   Training Accuracy   Backprop\nIris         32                23,000      225              98.23%            97.79%              96%\nLenses       42                22,500      145              98.35%            100.0%              92%\nHayes-Roth   68                5 \u00d7 10^6    9,200            88.76%            82.98%              83%\n\nTable 1: Training Results\n\nThe lenses data set can be solved with a network that has three hidden nodes. After between a few hundred and a few thousand iterations it usually finds a solution.
This may be because it has a hard time with 2-bit weights, or because it is searching for perfect accuracy. The number of times a weight was fixed and updated was only 225 for this data set. The iris data set was normalized so that each input had a value between zero and one. The randomized search algorithm found the correct target for 97.79% of the output nodes.\n\nOur results for the Hayes-Roth problem were also quite good. We used four hidden nodes with 2-bit weights for the hidden nodes. We had to normalize the inputs to range from zero to one once again so the larger inputs would not dominate the weight vectors. The algorithm found the correct target for 88.76% of the output nodes in about 5,000,000 epochs. Note that this does not mean that it classified 88.76% of the training examples correctly, since we are checking each output node for accuracy on each training example. The algorithm actually classified 82.98% of the training set correctly, which compares well with backpropagation's 83% [20].\n\n7 Conclusions and Future Work\n\nThis paper proposes a simple quantum neural network and a method of training it which works well in quantum systems. By using a quantum search we are able to use a well-known algorithm for quantum systems which has already been used for quantum learning. The algorithm is able to search for solutions that cover an arbitrary percentage of the training set. This could be very useful for problems which require a very accurate solution. The drawback is that it is an exponential algorithm, even with the significant quadratic speedup.\n\nA randomized version avoids some of the exponential increases in complexity with problem size. This algorithm is exponential in the number of qubits of each node's weight vector instead of in the composite weight vector of the entire network.
This means the complexity of the algorithm grows only with the number of connections to a node and the precision of each individual weight, rather than with the size of the entire network, dramatically decreasing complexity for problems with large numbers of nodes. This could be a great improvement for larger problems. Preliminary results for both algorithms have been very positive.\n\nThere may be quantum methods which could be used to improve current gradient descent and other learning algorithms. It may also be possible to combine some of these with a quantum search. An example would be to use gradient descent to refine a composite weight vector found by quantum search. Conversely, a quantum search could start with the weight vector found by a gradient descent search. This would allow the search to start with an accurate weight vector and search locally for weight vectors which improve overall performance. Finally, the two methods could be used simultaneously to try to take advantage of the benefits of each technique.\n\nOther types of QNNs may be able to use a quantum search as well, since the algorithm only requires a weight space which can be searched in superposition. In addition, more traditional gradient descent techniques might benefit from a quantum speed-up themselves.\n\nReferences\n\n[1] Alexandr Ezhov and Dan Ventura. Quantum neural networks. In N. Kasabov, editor, Future Directions for Intelligent Systems and Information Science. Physica-Verlag, 2000.\n\n[2] Ajit Narayanan and Tammy Menneer. Quantum artificial neural network architectures and components. In Information Sciences, volume 124, nos. 1-4, pages 231\u2013255, 2000.\n\n[3] M. V. Altaisky. Quantum neural network. Technical report, 2001. http://xxx.lanl.gov/quant-ph/0107012.\n\n[4] E. C. Behrman, J. Niemel, J. E. Steck, and S. R. Skinner. A quantum dot neural network. In Proceedings of the 4th Workshop on Physics of Computation, pages 22\u201324.
Boston, 1996.\n\n[5] Fariel Shafee. Neural networks with c-not gated nodes. Technical report, 2002. http://xxx.lanl.gov/quant-ph/0202016.\n\n[6] Yukari Fujita and Tetsuo Matsui. Quantum gauged neural network: U(1) gauge theory. Technical report, 2002. http://xxx.lanl.gov/cond-mat/0207023.\n\n[7] S. Gupta and R. K. P. Zia. Quantum neural networks. In Journal of Computer and System Sciences, volume 63, no. 3, pages 355\u2013383, 2001.\n\n[8] E. C. Behrman, V. Chandrasheka, Z. Wank, C. K. Belur, J. E. Steck, and S. R. Skinner. A quantum neural network computes entanglement. Technical report, 2002. http://xxx.lanl.gov/quant-ph/0202131.\n\n[9] Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000.\n\n[10] V. Vedral, M. B. Plenio, M. A. Rippin, and P. L. Knight. Quantifying entanglement. In Physical Review Letters, volume 78(12), pages 2275\u20132279, 1997.\n\n[11] R. Jozsa. Entanglement and quantum computation. In S. Hugget, L. Mason, K. P. Tod, T. Tsou, and N. M. J. Woodhouse, editors, The Geometric Universe, pages 369\u2013379. Oxford University Press, 1998.\n\n[12] Lov K. Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the 28th ACM STOC, pages 212\u2013219, 1996.\n\n[13] Lov K. Grover. Quantum mechanics helps in searching for a needle in a haystack. In Physical Review Letters, volume 78, pages 325\u2013328, 1997.\n\n[14] Peter Shor. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. In SIAM Journal of Computing, volume 26, no. 5, pages 1484\u20131509, 1997.\n\n[15] Vlatko Vedral, Adriano Barenco, and Artur Ekert. Quantum networks for elementary arithmetic operations. In Physical Review A, volume 54, no. 1, pages 147\u2013153, 1996.\n\n[16] Dan Ventura and Tony Martinez. Quantum associative memory. In Information Sciences, volume 124, nos.
1-4, pages 273\u2013296, 2000.\n\n[17] Alexandr Ezhov, A. Nifanova, and Dan Ventura. Distributed queries for quantum associative memory. In Information Sciences, volume 128, nos. 3-4, pages 271\u2013293, 2000.\n\n[18] Michel Boyer, Gilles Brassard, Peter H\u00f8yer, and Alain Tapp. Tight bounds on quantum searching. In Proceedings of the Fourth Workshop on Physics and Computation, pages 36\u201343, 1996.\n\n[19] C. L. Blake and C. J. Merz. UCI repository of machine learning databases, 1998. http://www.ics.uci.edu/\u223cmlearn/MLRepository.html.\n\n[20] Frederick Zarndt. A comprehensive case study: An examination of machine learning and connectionist algorithms. Master's thesis, Brigham Young University, 1995.\n", "award": [], "sourceid": 2363, "authors": [{"given_name": "Bob", "family_name": "Ricks", "institution": null}, {"given_name": "Dan", "family_name": "Ventura", "institution": null}]}