Understanding Weight Distribution In Neural Network Cross Validated - Reinforcement Learning Loss Function Cannot Converge During Training