By: Woods Prewitt
David D’Onofrio, a senior at Fordham College Rose Hill studying Computational Neuroscience, has been conducting research since high school. During the summer of 2021, D’Onofrio began his research with the Lakatos Lab at the Nathan Kline Institute, studying neural networks. His research fits into the larger field of biomimetics: the creation of models, systems, or machines that mimic biological processes. In this case, D’Onofrio and his team aim to build biologically accurate computer models of the human brain, often referred to as neural networks.
These biologically detailed networks are made up of nodes, computational units that represent neurons, belonging to three layers: sensory, association, and motor (See Figure 1). The three layers work together to complete a given task. In D’Onofrio’s research, the network attempts to succeed at the game CartPole, whose objective is to keep a pole upright on a cart for 200 seconds by moving the cart left and right (See Figure 2). The sensory nodes measure four variables from the game environment: the cart’s position, the cart’s velocity, the pole’s angle from vertical, and the pole’s angular velocity, or the rate at which its angle changes. The sensory layer passes this information to the association layer, which in turn passes information to the motor layer; the motor layer uses this input to decide which direction the cart moves. The nodes in one layer are probabilistically connected to nodes in the next layer. If a node fires, it sends a signal to its connected nodes in the next layer. Each connection has a numerical weight that determines the strength of the signal sent to the receiving node. If a node receives a strong enough collective signal from its connections, it fires.
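The three-layer, probabilistically connected, threshold-firing structure described above can be sketched in a few lines. This is an illustrative toy, not the team's actual model: the layer sizes, connection probability, and firing threshold are all assumed values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed layer sizes: 4 sensory nodes for the four CartPole observations,
# a small association layer, and 2 motor nodes (push left / push right).
N_SENS, N_ASSOC, N_MOTOR = 4, 8, 2

# Probabilistic connectivity: each possible connection exists with
# probability p, and existing connections carry a numerical weight.
p = 0.5
W_sa = rng.random((N_SENS, N_ASSOC)) * (rng.random((N_SENS, N_ASSOC)) < p)
W_am = rng.random((N_ASSOC, N_MOTOR)) * (rng.random((N_ASSOC, N_MOTOR)) < p)

THRESHOLD = 0.5  # a node fires if its collective weighted input exceeds this


def step(observation):
    """Propagate one observation (cart position, cart velocity, pole angle,
    angular velocity) through the three layers and pick a cart action."""
    sensory = np.asarray(observation, dtype=float)
    # An association node fires (1.0) when the weighted signal from its
    # sensory connections is strong enough; otherwise it stays silent (0.0).
    assoc = (sensory @ W_sa > THRESHOLD).astype(float)
    # The motor layer accumulates input from the association layer and the
    # stronger of the two motor nodes decides the cart's direction.
    motor = assoc @ W_am
    return 0 if motor[0] >= motor[1] else 1  # 0 = push left, 1 = push right


action = step([0.0, 0.1, 0.02, -0.1])
```

The forward pass is deliberately minimal; the real model is a spiking network with richer dynamics, but the pattern of weighted connections plus a firing threshold is the same idea.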
By adjusting the weights of the connections in the network, the model's performance can be optimized. A fascinating aspect of these neural networks is that they can adjust these weights on their own. This self-optimization is possible because of the actor-critic model. The model created by D’Onofrio’s team contains two parts: the actor and the critic. The aforementioned sensory, association, and motor layers make up the actor, which decides the best action to take based on input from its environment, in this case CartPole. The critic provides feedback, either rewarding or punishing the actor based on its performance. A reward from the critic communicates to the actor that it should replicate the patterns that produced the successful action; a punishment communicates that it should adjust those patterns. More specifically, feedback adjusts only the weights of connections between two nodes that both fired. This method for adjusting connection weights is known as spike-timing-dependent plasticity (STDP). D’Onofrio’s team decided that the critic should reward the model if its action decreases the pole’s angle (meaning the pole approaches perpendicularity to the cart) and punish it if the angle increases. The actor-critic system is a reinforcement learning technique: the model optimizes its performance, or learns, by changing the weights between nodes according to patterns that have either succeeded or failed.
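The critic's rule and the resulting weight update can be sketched as follows. This is a simplified stand-in for the team's actual STDP rule: the learning rate, the ±1 feedback values, and the function names are all hypothetical, and real STDP also depends on precise spike timing rather than a simple "both fired" test.

```python
def critic_feedback(prev_angle, new_angle):
    """Reward (+1) if the action brought the pole closer to vertical,
    punish (-1) if the pole's angle increased."""
    return 1.0 if abs(new_angle) < abs(prev_angle) else -1.0


def apply_feedback(W, pre_fired, post_fired, reward, lr=0.01):
    """Adjust only the connections between node pairs that both fired:
    strengthen them after a reward, weaken them after a punishment."""
    for i, pre in enumerate(pre_fired):
        for j, post in enumerate(post_fired):
            if pre and post:  # feedback touches co-firing pairs only
                W[i][j] += lr * reward
    return W
```

For example, if the pole's angle shrinks from 0.05 to 0.02 radians, `critic_feedback` returns +1.0, and every connection between a node that fired in the earlier layer and a node that fired in the later layer is nudged upward by the learning rate.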
A critical shortcoming of the actor-critic model paired with STDP is its timeframe. By the time the critic rewards the actor, the actor's nodes are no longer firing in the pattern that produced the good or bad action, so the critic might attribute positive or negative feedback to the wrong nodes and connections. This issue is known as the temporal credit assignment problem. Senior neural network researchers at the Nathan Kline Institute introduced a method called eligibility tracing to address it. Eligibility tracing flags the connection between two neurons that fired, signaling to the critic that it should assign a reward or punishment to that connection depending on the motor layer's action. The magnitude of a flagged connection's reward or punishment decreases exponentially over time to avoid misattributing feedback.
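An eligibility trace with exponential decay can be captured in a tiny class. The decay time constant `TAU` and the learning rate are assumed values for illustration, not the team's tuned parameters.

```python
import math

TAU = 0.5  # hypothetical decay time constant for the trace, in seconds


class EligibilityTrace:
    """Flag a connection when its two neurons fire, then decay the flag
    exponentially so that late-arriving feedback is attributed less
    strongly to it."""

    def __init__(self):
        self.trace = 0.0

    def flag(self):
        # The connection just participated in a spike pair: raise the flag.
        self.trace = 1.0

    def decay(self, dt):
        # Exponential decay: after dt seconds the flag shrinks by e^(-dt/TAU).
        self.trace *= math.exp(-dt / TAU)

    def weight_change(self, reward, lr=0.01):
        # Feedback scales with the (decayed) trace, so only recently
        # flagged connections receive much credit or blame.
        return lr * reward * self.trace
```

A connection flagged just before the critic's verdict receives nearly the full update, while one flagged long before receives almost none, which is exactly how the trace sidesteps the temporal credit assignment problem.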
D’Onofrio’s team was tasked with optimizing STDP in a network in the CartPole environment so that the mechanism best promotes learning. His team tested the effects of changing variables such as the decay time of the eligibility-trace flags and how much the model changes the weights of connections between layers. Through these experiments, D’Onofrio’s team believes it has maximized its model's learning capabilities using STDP. The optimization of STDP in the relatively simple CartPole environment can be applied to somewhat more complex tasks by changing the inputs to the sensory layer and the outputs from the motor layer according to the given task and environment. However, the team’s insights on optimizing STDP may prove more valuable than the optimized model itself. Because the field of computational neuroscience is so young, the applications of research like D’Onofrio’s are rather limited. Instead, the goal of such projects is often to expand our knowledge of neural networks: the more advanced and accurate our models are, the better we can use them to analyze and predict how the human brain would behave in a given situation.
D’Onofrio’s team is now experimenting with combining a different learning algorithm, known as evolutionary learning, with STDP, its previously used algorithm. Evolutionary learning is more discrete and randomized than STDP and imitates biological evolution. The team believes that combining the two algorithms can improve its model and yield insight into the effects of mixing learning algorithms. After graduation, D’Onofrio plans to continue lab work before applying to a Ph.D. program in machine learning, computational neuroscience, or a related field.
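The variation-and-selection idea behind evolutionary learning can be sketched in a minimal loop. Everything here is illustrative: the fitness function is a placeholder (in the real project it would be the model's CartPole score), and the population size, mutation scale, and generation count are assumed values.

```python
import random


def evaluate(weights):
    # Placeholder fitness standing in for a CartPole score; it simply
    # favors weight vectors near 0.5 so the sketch is self-contained.
    return -sum((w - 0.5) ** 2 for w in weights)


def evolve(population, n_generations=20, mutation_scale=0.1):
    """Minimal evolutionary loop: keep the fittest weight vectors, spawn
    randomly mutated copies of them, and repeat, imitating variation and
    selection in biological evolution."""
    for _ in range(n_generations):
        scored = sorted(population, key=evaluate, reverse=True)
        survivors = scored[: len(population) // 2]  # selection
        children = [
            [w + random.gauss(0, mutation_scale) for w in parent]  # mutation
            for parent in survivors
        ]
        population = survivors + children
    return max(population, key=evaluate)
```

Unlike STDP, which makes small feedback-driven adjustments during a task, this search jumps randomly through weight space between episodes, which is one way to read the article's description of it as more randomized; combining the two would let evolution set coarse weights that STDP then fine-tunes.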