R#2 and R#3 generally liked the paper. R#1 submitted a brief review that raised a concern about the novelty of the method. The rebuttal addressed the concerns well and led all reviewers to increase their scores. We have also collected comments from an additional reviewer, who pointed out further issues with the writing and the theoretical results (see below). We advise the authors to address these issues in the revision.

==== Pros: The work proposes a new approach to uncertainty-aware semi-supervised learning on graph data; the empirical results are promising.

Cons:
1) The method is complicated, combining many different existing methods in a rather ad hoc, engineering-driven way.
2) The presentation and clarity need significant improvement, especially in the statement of mathematical results. For example, in Theorem 1, u_diss, u_alea, u_epis, and u_en are never explicitly defined; the reader needs to guess what they are from the surrounding section. Unfortunately, the notation does not appear to be consistent across sections (e.g., it is unclear how P(y|p) in Section 3.2 relates to P(y|x, \theta) in Section 3.3). As a result, Theorem 1 is not self-contained, and it is unclear what rigorous mathematical conditions are needed. The notation "\gg" and "\approx" should be avoided when stating rigorous mathematical results; it should be stated clearly in what sense the approximation holds.
3) It is unclear whether the proposed quantities (when estimated from data) actually measure the types of uncertainty their names promise. In fact, it is unclear if and when they are identifiable from empirical observations (even with infinite data). I suggest rephrasing the writing to clarify the heuristic nature of the proposed method and to avoid giving the impression that these are theoretically rigorous quantifications of different uncertainties. In addition, Proposition 1 is very informal, and no rigorous conditions are presented under which the stated results hold.
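For reference, one standard way to make quantities such as u_en, u_alea, and u_epis precise is the entropy decomposition commonly used in the Bayesian literature (stated here only as a possible convention the authors could adopt, not necessarily the definition they intend):

\[
\underbrace{\mathcal{H}\!\left[\,\mathbb{E}_{\theta \sim p(\theta \mid \mathcal{D})} P(y \mid x, \theta)\,\right]}_{\text{total entropy, } u_{\mathrm{en}}}
\;=\;
\underbrace{\mathcal{I}\!\left[\,y;\theta \mid x, \mathcal{D}\,\right]}_{\text{epistemic, } u_{\mathrm{epis}}}
\;+\;
\underbrace{\mathbb{E}_{\theta \sim p(\theta \mid \mathcal{D})}\,\mathcal{H}\!\left[\,P(y \mid x, \theta)\,\right]}_{\text{aleatoric, } u_{\mathrm{alea}}}.
\]

Stating definitions in this explicit form (and giving u_diss its own formal definition, e.g. from subjective logic if that is the intended source) would make Theorem 1 self-contained.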
Also, Eq 11 is very confusing. It is unclear how the second and third terms depend on \theta, especially given that \theta has been integrated out in P(y|A, r; G), as shown in Eq 8. If variational inference is used somehow, should we not optimize over the distribution parameters of \theta rather than over \theta itself?

Misc:
- y_{ij} in Eq 9 is inconsistent with the y_i that appears earlier.
- Line 107: if a_k = 1/K, why does it still appear in Eq (2)?
- Eq 5: it is unclear whether the LHS is conditioned on x.
- Line 143: how is OOD related to \alpha = (1, 1, 1)?
- Line 181: is Prob() the same as P()?
- Section 5.2: \theta and f_i are undefined when they first appear; is the \theta here the same as the \theta in Section 3.3?
- It is unclear how the variational inference objective combines Eq 9 and Eq 11; this should be specified explicitly in the paper.
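To illustrate the variational-inference point: in standard VI one maximizes the ELBO with respect to the parameters \phi of a variational distribution q_\phi(\theta), not with respect to \theta itself (this is a generic sketch for reference, not the authors' specific objective):

\[
\mathcal{L}(\phi) \;=\; \mathbb{E}_{q_\phi(\theta)}\!\left[\log P(y \mid x, \theta)\right] \;-\; \mathrm{KL}\!\left(q_\phi(\theta)\,\|\,p(\theta)\right),
\qquad \phi^{*} = \arg\max_\phi \mathcal{L}(\phi),
\]

with gradients taken with respect to \phi. If Eq 11 is meant to be an objective of this form, the paper should state explicitly which parameters are being optimized and how \theta enters after the marginalization in Eq 8.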