We demonstrate that the problem of training neural networks with small (average) squared error is computationally intractable. Consider a data set of $M$ points $(X_i, Y_i)$, $i = 1, 2, \ldots, M$, where the $X_i$ are input vectors from $\mathbb{R}^d$ and the $Y_i$ are real outputs ($Y_i \in \mathbb{R}$). For a network $f_0$ in some class $F$ of neural networks,
$$\left(\frac{1}{M}\sum_{i=1}^{M}\bigl(f_0(X_i)-Y_i\bigr)^2\right)^{1/2} - \inf_{f \in F}\left(\frac{1}{M}\sum_{i=1}^{M}\bigl(f(X_i)-Y_i\bigr)^2\right)^{1/2}$$
is the (average) relative error incurred when one tries to fit the data set by $f_0$. We will prove for several classes $F$ of neural networks that achieving a relative error smaller than some fixed positive threshold (independent of the size of the data set) is NP-hard.
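To make the definition concrete, the following is a minimal numerical sketch (not from the paper; the function names and the toy model are illustrative). It computes the RMS error of a candidate network $f_0$ and its relative error against a supplied best-in-class RMS value; the infimum over $F$ is passed in as a number precisely because computing it is the hard part the paper is about.

```python
import numpy as np

def rms_error(f, X, Y):
    """Empirical RMS error of model f on data (X, Y):
    ((1/M) * sum_i (f(X_i) - Y_i)^2)^(1/2)."""
    preds = np.array([f(x) for x in X])
    return np.sqrt(np.mean((preds - Y) ** 2))

def relative_error(f0, best_in_class_rms, X, Y):
    """Relative error of f0: its RMS error minus inf_{f in F} of the
    RMS error over the class F. The infimum is taken as a given input
    (best_in_class_rms), since evaluating it is itself intractable."""
    return rms_error(f0, X, Y) - best_in_class_rms

# Toy usage: M = 4 points in R^2 with a hypothetical fitted network f0.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
Y = np.array([1.0, 1.0, 0.0, 0.0])
f0 = lambda x: 0.5 * (x[0] + x[1])          # stand-in for a trained network
print(relative_error(f0, 0.25, X, Y))        # 0.25 is an assumed inf over F
```

The design point the sketch highlights is that the first term is trivial to evaluate for any fixed $f_0$; the hardness result concerns driving the gap to the (uncomputed) infimum below a fixed threshold.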