__ Summary and Contributions__: In this paper, the authors propose a novel NN model that incorporates time-reversal symmetry into ODE networks. They develop a loss for this model, which can be combined with variants of ODE networks such as Hamiltonian ODE networks. They empirically validate their model in several setups, i.e., conservative vs. non-conservative and reversible vs. irreversible systems.

__ Strengths__: Time-reversal symmetry is an important symmetry in physical systems. I think the proposed scheme is a reasonable and principled way to incorporate this property into ODE networks, which have recently attracted much attention. Empirical results show that their model works reasonably well for simple physical systems.

__ Weaknesses__: I think the (not major) weak points of this paper are twofold. The first is in the empirical evaluation. They use one data-generating system and cover the different setups (e.g., conservative vs. non-conservative) by varying parameters with Gaussian noise. Although I agree the proposed model manages to learn the system, it is difficult to see its empirical properties, limitations, and so on from this experiment. The second is the part of their loss where the balance between the two types of losses is controlled by the regularization term. To be honest, I am not dissatisfied with this part, but I wonder what its premise is.
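To make the second point concrete, the balance I am asking about can be written as below. This assumes the objective is a weighted sum of a trajectory-fitting term and a time-reversal term; the symbols $\mathcal{L}_{\text{ODE}}$, $\mathcal{L}_{\text{TRS}}$, and $\lambda$ are illustrative notation and may not match the paper's.

```latex
\mathcal{L}(\theta)
  = \mathcal{L}_{\text{ODE}}(\theta)
  + \lambda \, \mathcal{L}_{\text{TRS}}(\theta),
\qquad \lambda \ge 0 .
```

Under this assumed form, $\lambda = 0$ recovers the unregularized ODE network, and the question is on what basis a larger $\lambda$ should be chosen.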

__ Correctness__: Because the basic idea is simple and principled, I have no concerns about the correctness of the descriptions in the paper.

__ Clarity__: The paper is well-written with sufficient information, and the readability is high.

__ Relation to Prior Work__: The paper describes clearly the relation with existing works and their contributions upon those.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: The authors present a regularizer for learning approximately time-symmetric governing equations for ordinary dynamical systems. They demonstrate that using the proposed regularization results in improved learning of dynamics from noisy data. The paper's main contribution is the recognition that time-symmetry may be an important inductive bias for learning real-world physical systems.

__ Strengths__: The work is well presented. It is novel and of interest to anyone learning dynamics from observed data.

__ Weaknesses__: The principal limitation of this work is that, like so many things in machine learning, we don't have real guarantees. The regularization is presented as a strength, and it is one when we would like something that is "near" a time-symmetric solution. But a solution that guarantees time symmetry would be very welcome.
Other smaller, more specific points:
How general is this choice of R?
How difficult is it to choose the correct R?
What are the consequences of choosing the wrong R?
What is the impact on training time or convergence?

__ Correctness__: The approach and evaluation appear to be correct.

__ Clarity__: The paper is very well written and easy to follow.
The authors should explicitly state the definition of R, presumably (p,-q), that they used in their experiments.
The authors clearly state that their data generation model is time-reversible whenever gamma=delta=0.
A small typo: line 190 "lean" -> "learn"?

__ Relation to Prior Work__: The authors try to delineate their work from previous works. However, they appear to be setting themselves up to differentiate their work from approaches that use regularizations to encode physical knowledge (lines 27,28). Yet, the authors' work comes down to exactly a regularization that encodes physical knowledge. On line 29 the authors suggest that these approaches are somehow problem specific and thus don't generalize, the implication being that their method does. The authors should be more explicit as to what these other regularizers are and how specific they might be.

__ Reproducibility__: Yes

__ Additional Feedback__: I found the rebuttal to be solid. The authors have noted several additional experiments and points developed during the rebuttal period. They have addressed my criticisms and as a result, I am more firm in my score.

__ Summary and Contributions__: This paper proposes a new method for learning ordinary differential equations that underlie observed time series that biases the ordinary differential equations towards being time-reversal symmetric.

__ Strengths__: The paper was well-written and easy to follow.

__ Weaknesses__: With the new experiments, the main weaknesses that I previously listed here have been addressed. I think the only remaining weakness is that this will be most applicable to classical mechanics, but those systems are often already well understood from basic physics, and so one could instead learn a dimensionality-reduced model of the enlarged physics model rather than using this method.

__ Correctness__: Yes, yes.

__ Clarity__: Yes.

__ Relation to Prior Work__: Yes.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This paper proposes a time-reversal symmetric ODE network that uses a novel loss function measuring how well the ODE network complies with time-reversal symmetry. The main idea is to use the time-reversal symmetry of classical dynamics to measure the discrepancy between the forward and backward time evolution of the ODE network. The proposed time-reversal symmetric ODE network is shown to be more sample efficient and to achieve better prediction accuracy compared with vanilla ODE neural networks and Hamiltonian ODE neural networks.
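As I understand the main idea, it can be sketched as follows. This is a minimal illustration under my own assumptions, not the authors' implementation: the reversing operator is taken to be R(q, p) = (q, -p), the dynamics are a hand-written (damped) harmonic oscillator rather than a neural network, and the names `reverse`, `make_oscillator`, `rk4_step`, `rollout`, and `trs_loss` are hypothetical. The loss compares the reversed forward rollout with a backward rollout started from the reversed initial state.

```python
import numpy as np

def reverse(x):
    """Assumed reversing operator R(q, p) = (q, -p)."""
    return x * np.array([1.0, -1.0])

def make_oscillator(gamma):
    """Vector field of a unit oscillator with damping gamma (gamma=0 is reversible)."""
    def f(x):
        q, p = x
        return np.array([p, -q - gamma * p])
    return f

def rk4_step(f, x, dt):
    """One classical Runge-Kutta step; dt may be negative."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

def rollout(f, x0, dt, steps):
    """Integrate from x0 and return all states, shape (steps + 1, 2)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(rk4_step(f, xs[-1], dt))
    return np.stack(xs)

def trs_loss(f, x0, dt=0.1, steps=20):
    """Mean squared gap between R(forward rollout) and the backward rollout from R(x0)."""
    forward = rollout(f, x0, dt, steps)
    backward = rollout(f, reverse(x0), -dt, steps)
    return float(np.mean((reverse(forward) - backward) ** 2))

x0 = np.array([1.0, 0.0])
loss_reversible = trs_loss(make_oscillator(0.0), x0)   # undamped case
loss_damped = trs_loss(make_oscillator(0.5), x0)       # damping breaks the symmetry
```

In this toy setting the discrepancy vanishes (up to floating-point error) for the undamped oscillator and is clearly positive once damping is added, which is the behavior the loss is meant to penalize.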
------------------------------------------------------------------------------------------------------
I think the rebuttal addressed my main concern, which was the lack of evaluation on real-world datasets. I'd like to keep my score (7-accept).

__ Strengths__: 1) The paper provides a simple but very effective way to incorporate a physics-based bias into neural networks.
2) The paper provides a very thorough empirical evaluation of the proposed neural network on data generated from physical systems with different characteristics.

__ Weaknesses__: 1) The paper did not provide any evaluation results for the system on real-world data.

__ Correctness__: The claims and methods seem to be correct.

__ Clarity__: The paper is generally well-written with some minor grammatical errors.
Suggestions:
1) line 13: "better predictive errors" -> "smaller predictive errors" or "better predictive performance"
2) line 106: "Furthermore, they can lean" -> "Furthermore, they can learn"
3) line 107: "because they fully exploit the nature of" -> "since they fully exploit the nature of"

__ Relation to Prior Work__: The paper clearly discusses how this work differs from previous contributions, such as Hamiltonian ODE neural networks, which do not work properly for non-conservative systems.

__ Reproducibility__: Yes

__ Additional Feedback__: