3 Important Considerations in DDPG Reinforcement Algorithm


Deep Deterministic Policy Gradient (DDPG) is a reinforcement learning algorithm for learning continuous actions. You can learn more about it in the video below on YouTube:

https://youtu.be/4jh32CvwKYw?si=FPX38GVQ-yKESQKU

Here are three important considerations you will have to work through while solving a problem with DDPG. Please note that this is not a how-to guide on DDPG but a what-to guide: it only covers the areas you will need to look into.

Noise

Ornstein-Uhlenbeck

The original DDPG paper used noise for exploration and suggested that the noise at a step depend on the noise at the previous step; this is implemented as the Ornstein-Uhlenbeck process. Later implementations dropped this constraint and simply used uncorrelated random noise. Depending on your problem domain, correlated noise may not be what you want: if the noise at each step depends on the noise at the previous step, it will stay on one side of the noise mean for stretches of time, which can limit exploration. For the problem I am solving with DDPG, simple random noise works just fine.
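To make the contrast concrete, here is a minimal sketch of both options. The parameter values (theta = 0.15, sigma = 0.2) are the conventional defaults, not values from this post; tune them to your domain. Note how each Ornstein-Uhlenbeck sample is computed from the previous one, while the Gaussian samples are independent:

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Temporally correlated noise: each sample drifts from the previous
    one back toward the mean mu, so consecutive samples are related."""
    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = mu  # state of the process; carries over between samples

    def sample(self):
        self.x += (self.theta * (self.mu - self.x) * self.dt
                   + self.sigma * np.sqrt(self.dt) * np.random.randn())
        return self.x

def gaussian_noise(mu=0.0, sigma=0.2):
    """Uncorrelated alternative: each sample is independent of the last."""
    return np.random.normal(mu, sigma)
```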

Size of Noise

The size of the noise you use for exploration is also important. If the valid actions in your problem domain range from -0.01 to 0.01, there is little benefit in using noise with a mean of 0 and a standard deviation of 0.2: most noise samples will be far larger than any valid action and will push the algorithm into exploring invalid regions.
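One simple way to respect this, sketched below, is to scale the noise to the action range and clip the result. The 10% factor and the bounds are illustrative assumptions of mine, not recommendations from the paper:

```python
import numpy as np

action_low, action_high = -0.01, 0.01      # valid action range (example)
sigma = 0.1 * (action_high - action_low)   # noise on the order of 10% of the range

policy_action = 0.004                      # hypothetical actor output
noisy_action = np.clip(policy_action + np.random.normal(0.0, sigma),
                       action_low, action_high)  # never leave the valid range
```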

Noise decay

Many blogs talk about decaying the noise slowly during training, while many others do not and keep the noise un-decayed throughout training. I think a well-trained algorithm will work fine with either option. If you do not decay the noise, you can simply drop it at prediction time; a well-trained network and algorithm will be fine with that.
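If you do want to decay the noise, a multiplicative schedule with a floor is a common pattern. The sketch below uses illustrative values (initial scale 0.2, floor 0.01, per-episode factor 0.999), none of which come from this post:

```python
sigma = 0.2        # initial exploration noise scale
sigma_min = 0.01   # floor so exploration never vanishes during training
decay = 0.999      # per-episode multiplicative decay factor

for episode in range(10_000):
    # ... run one training episode, adding N(0, sigma) noise to actions ...
    sigma = max(sigma_min, sigma * decay)

# At prediction time, drop the noise entirely and act greedily:
# action = actor(state)
```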

Soft update of the target networks

As you update your policy networks, you will periodically have to pass a fraction of what they have learned to the target networks. There are two aspects to consider here: at what frequency you pass the learning to the target networks (the original paper does so after every update of the policy network), and what fraction of the learning you pass on. A hard update to the target networks is generally not recommended, as it destabilizes the neural network.

But a hard update to the target network worked fine for me. Here is my thought process: say your learning rate for the policy network is 0.001 and you update the target network with a fraction of 0.01 every time you update the policy network. In effect, you are passing 0.001 × 0.01 of the learning to the target network. If your neural network is stable with this, it will be just as stable with a hard update (passing all of the learning from the policy network to the target network at every policy update), provided you keep the learning rate very low.
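Both variants reduce to the same one-line Polyak update. The PyTorch sketch below is a generic illustration (the network and tau names are mine, not from the post), with tau = 1.0 recovering the hard update described above:

```python
import torch

def soft_update(target_net, policy_net, tau):
    """Polyak averaging: target <- tau * policy + (1 - tau) * target.
    Small tau (e.g. 0.01) gives the paper's soft update; tau = 1.0 is a
    hard update that copies the policy network into the target network."""
    with torch.no_grad():
        for t, p in zip(target_net.parameters(), policy_net.parameters()):
            t.mul_(1.0 - tau).add_(tau * p)

# soft_update(target_actor, actor, tau=0.01)  # soft, as in the paper
# soft_update(target_actor, actor, tau=1.0)   # hard, with a very low learning rate
```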

Neural network design

While you are optimizing your DDPG algorithm's parameters, you also need to design a good neural network for predicting the action and the value. This is where the challenge lies: it is difficult to tell whether the poor performance of your solution is due to a badly designed neural network or an unoptimized DDPG setup. You will need to keep optimizing on both fronts.

While a simplistic neural network can solve OpenAI Gym problems, it will not be sufficient for a complex real-world problem. The principle I follow when designing a neural network is that the network is an implementation of your (or the domain expert's) mental framework of the solution, so you need to understand that mental framework at a fundamental level in order to encode it in a network. You also need to understand which features to pass to the network and how to engineer them so that the network can interpret them and predict successfully. That is where the art of the craft lies.
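For reference, a common baseline actor/critic pair for DDPG looks like the PyTorch sketch below. The hidden sizes (256) and layer counts are conventional illustrative choices, not prescriptions from this post; as argued above, a real-world problem will need an architecture and features informed by the domain expert's mental framework:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a state to a continuous action, squashed into the valid range."""
    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # output in [-1, 1]
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Maps a (state, action) pair to a scalar Q-value estimate."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```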

I still have not explored the discount rate (used to discount rewards across time steps) and have not yet developed a strong intuition about it, which is very important.

I hope you liked the article and did not find it overly simplistic or stupid. If you liked it, please do not forget to clap!

