A TensorForce-based Bitcoin trading bot (algo-trader). Uses deep reinforcement learning to automatically buy/sell/hold BTC based on price history. This project goes with Episode 26+.
This data is used by our BO or Boost algo to search for better hyper combos. You can keep the runs table in your history database if you want - one and the same. I keep them separate because I want the history DB on localhost for performance reasons (it's a major perf difference, you'll see), and runs as a publicly hosted DB, which lets me collect runs from separate AWS p3 instances.
Then, when you're ready for live mode, you'll want a live database that's real-time, constantly collecting exchange ticker data. Again, all three can be the same database if you want; I'm just doing it my way for performance. I have them broken out of the hypersearch since they're so different - they kinda deserve their own runs DB each - but if someone can consolidate them into the hypersearch framework, please do.
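As a sketch of that layout, here's one way to keep the three database roles in a single config. All names and connection strings below are placeholders I made up, not the project's actual config:

```python
# Hypothetical three-database layout: history (local, fast), runs (hosted,
# shared), live (real-time ticker). Any or all roles can point at one DB.
DB_URLS = {
    "history": "postgresql://localhost/btc_history",  # localhost for perf
    "runs": "postgresql://runs-host/btc_runs",        # hosted, collects runs from many boxes
    "live": "postgresql://live-host/btc_live",        # constantly-updated ticker data
}

def db_url(role: str) -> str:
    """Look up the connection string for one of the three database roles."""
    if role not in DB_URLS:
        raise KeyError(f"unknown db role: {role!r}")
    return DB_URLS[role]
```

Collapsing to one database is then just pointing all three keys at the same URL.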
In my own experience, in colleagues' experience, and in papers I've read (here's one), we're all coming to the same conclusion: CNNs tend to beat LSTMs on this problem. We're not sure why. Maybe LSTM can only go so far with time-series. Another possibility: deep reinforcement learning is most commonly researched, published, and open-sourced using CNNs, because RL is super video-game-centric - self-driving cars, all the vision stuff. So maybe the math behind these models lends itself better to CNNs?
Who knows. The point is: experiment with both, and report your own findings back on GitHub. So how does a CNN even make sense for time-series? Well, we construct an "image" of a time-slice, where the x-axis is time (obviously), the y-axis (height) is nothing - just 1 - and the channels hold our features. TensorForce has all sorts of models you can play with. PPO is the second-most state-of-the-art, so we're using that. DDPG I haven't put much thought into.
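A minimal numpy sketch of that "image" construction, assuming a window of shape (timesteps, features); the sizes are invented:

```python
import numpy as np

def to_image(window: np.ndarray) -> np.ndarray:
    """Reshape a (timesteps, features) slice into a (height=1, width=timesteps,
    channels=features) 'image' that a 2D convolution can slide across time."""
    timesteps, features = window.shape
    return window.reshape(1, timesteps, features)

window = np.random.rand(128, 6)   # 128 timesteps of 6 features (e.g. OHLCV+)
img = to_image(window)            # img.shape == (1, 128, 6)
```

The convolution then scans along the time (width) axis, treating each feature as a channel, the same way it would scan pixels.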
PPO and DDPG are the policy-gradient models. The other family - DQN and its variants - only supports discrete actions, not continuous actions, so we're not using those. Our agent has one discrete action (buy | sell | hold) and one continuous action (how much?). Without that "how much" continuous flexibility, building an algo-trader would be rough.

You're likely familiar with grid search and random search when hunting for optimal hyperparameters for machine learning models.
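That composite action - a discrete choice plus a continuous amount - can be sketched as a plain Python structure (illustrative only; this is not TensorForce's actual action-spec API):

```python
from dataclasses import dataclass

@dataclass
class TradeAction:
    """One agent step: a discrete choice plus a continuous amount."""
    kind: str      # "buy" | "sell" | "hold"
    amount: float  # fraction of cash/holdings to move, in [0, 1]

    def __post_init__(self):
        if self.kind not in ("buy", "sell", "hold"):
            raise ValueError(f"bad action kind: {self.kind!r}")
        if not 0.0 <= self.amount <= 1.0:
            raise ValueError(f"amount must be in [0, 1], got {self.amount}")

action = TradeAction("buy", 0.25)  # buy using 25% of available cash
```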
Random search throws a dart at random hyper combos over and over; you just kill it eventually and take the best so far. Super naive - it works OK for other ML setups, but in RL the hypers are make-or-break, more so than model selection.
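For contrast, random search really is just a few lines. The hyper names and the stand-in score function below are invented:

```python
import random

def sample_hypers():
    """Throw a dart: one random hyper combo."""
    return {"lr": 10 ** random.uniform(-5, -2),   # learning rate, log-uniform
            "layers": random.randint(1, 4)}       # network depth

def score(hypers):
    # Stand-in for "train an agent with these hypers, return its reward"
    # (in reality this is the hours-long part).
    return -abs(hypers["lr"] - 1e-3) - abs(hypers["layers"] - 2)

best = max((sample_hypers() for _ in range(50)), key=score)
```

Nothing learned from one dart informs the next - which is exactly the waste BO avoids.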
That's why we're using Bayesian optimization (BO) - see gp.py. BO starts off like random search, since it doesn't have anything to work with, and over time it hones in on the best hyper combo using Bayesian inference. Super meta - using ML to find the best hypers for your ML - but it makes sense. Wait, why not use RL to find the best hypers? We could (and I tried), but deep RL takes tens of thousands of runs before it starts converging, and each run takes some 8 hours.
BO converges much quicker. I've also implemented my own flavor of hypersearch via gradient boosting (use --boost during training); that's more for my own experimentation. Our BO implementation uses scikit-learn's built-in GP functions.
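Here's a toy version of that loop using scikit-learn's GaussianProcessRegressor - a heavily simplified sketch with a fake one-dimensional objective, not the project's gp.py:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):
    # Stand-in for "train an agent with hyper value x, return its reward".
    return -(x - 0.3) ** 2

X = [[0.0], [1.0]]                       # a couple of seed points, like random search
y = [objective(0.0), objective(1.0)]
candidates = np.linspace(0, 1, 101).reshape(-1, 1)

for _ in range(10):
    gp = GaussianProcessRegressor().fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    # Upper-confidence-bound acquisition: exploit high mean, explore high variance.
    nxt = float(candidates[np.argmax(mu + 1.96 * sigma)][0])
    X.append([nxt])
    y.append(objective(nxt))

best = X[int(np.argmax(y))][0]           # hyper value with the best observed reward
```

Each iteration refits the GP on everything seen so far, so every expensive evaluation informs the next pick - that's the whole edge over random search.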
As for tuning BO itself: luckily, I hear you can pretty safely use its defaults. If anyone wants to explore any of that territory, please do! It's GPL'd, so we share our findings. Community effort, right? Boats and tides.

Chat on Gitter! Setup: Python 3. Create your databases - you can call these whatever you want, and just use one DB instead of two if you prefer (see the Data section).
Instead, you want to try a hunch or two of your own first. Open hypersearch.py and have a look; searching the whole space blindly - what a waste. You can use --gpu-split 3 to split your V100 three ways in three separate tabs, getting more bang for your buck. BO is more exploratory and thorough; gradient boosting is more "find the best solution now." Some papers have listed optimal default hypers. I'll keep my own "best defaults" updated in this project, but YMMV and you'll very likely need to try different hyper combos yourself.
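The boosting flavor can be sketched like this: fit a gradient-boosted regressor on (hyper combo → reward) pairs from past runs, then greedily pick the fresh candidate it predicts highest. Everything here (hypers, reward function) is made up for illustration, not the project's exact code:

```python
import random
from sklearn.ensemble import GradientBoostingRegressor

def sample_hypers():
    return [10 ** random.uniform(-5, -2), random.randint(1, 4)]  # [lr, layers]

def true_reward(h):
    # Stand-in for an actual (hours-long) training run.
    return -abs(h[0] - 1e-3) - abs(h[1] - 2)

history = [sample_hypers() for _ in range(30)]   # hypers from past runs
rewards = [true_reward(h) for h in history]      # their observed rewards

model = GradientBoostingRegressor().fit(history, rewards)
candidates = [sample_hypers() for _ in range(200)]
# "Find the best solution now": exploit the model's predictions directly.
best = max(candidates, key=lambda h: model.predict([h])[0])
```

Note there's no exploration bonus here, unlike BO's acquisition function - which is why it exploits hard and fast.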
Boost will suck in the early runs, since it needs a history of runs to learn from. Run: once you've found a good hyper combo from above (this could take days or weeks!), train with it; without one, it'll run from the hard-coded hyper defaults.
I'm gonna let you figure out how to plug it in on your own, 'cause that's danger territory - I ain't responsible for shit. This will start monitoring a live-updated database from config. In particular, PPO can give you great performance for a long time and then crash-and-burn. That kind of behavior will be obvious in your visualization (below), so you can tell your run to stop after x consecutive positive episodes (depends on the agent - some find an optimum and roll for 3 positive episodes, some 8; just eyeball your graph).
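That stop rule can be sketched in a few lines; the threshold is whatever your graph suggests for your agent:

```python
def should_stop(episode_rewards, n_consecutive=5):
    """True once the last n_consecutive episode rewards are all positive."""
    if len(episode_rewards) < n_consecutive:
        return False
    return all(r > 0 for r in episode_rewards[-n_consecutive:])
```

So a run with rewards [-2, 1, 3, 4] stops with a threshold of 3, but keeps going with a threshold of 4.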
Visualize: TensorForce comes pre-built with reward visualization via TensorBoard - check out their GitHub, you'll see. I needed much more customization than that for viz, so we're not using TensorBoard.
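For a bare-bones stand-in (not the project's actual viz), a reward curve takes a few lines of matplotlib; the rewards below are made up:

```python
import matplotlib
matplotlib.use("Agg")                  # headless backend: render straight to file
import matplotlib.pyplot as plt

rewards = [-5, -3, -4, 0, 2, 1, 4, 6, 5, 8]    # made-up episode rewards

plt.plot(rewards, marker="o")
plt.axhline(0, color="gray", linestyle="--")   # break-even line
plt.xlabel("episode")
plt.ylabel("total reward")
plt.savefig("rewards.png")
```

The crash-and-burn pattern mentioned above shows up as a curve that climbs for a long stretch and then falls off a cliff.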
That's all well and good - supervised learning learns what makes a time-series tick so it can predict the next step into the future. But that's where it stops. It says "the price will go up next," but it doesn't tell you what to do. Well, that's simple - buy, right? Ah, buy low, sell high - it's not that simple. Thousands of lines of code go into trading rules, "if this then that" style.
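To make the contrast concrete, here's a toy "if this then that" rule - the kind of hand-coded decision layer a bare prediction still needs, and which RL learns instead. All thresholds are invented:

```python
def rule_based_decision(predicted_change, cash, holdings):
    """Hand-coded 'if this then that' trading rule; thresholds are invented."""
    if predicted_change > 0.01 and cash > 0:
        return ("buy", 0.5 * cash)        # spend half the cash
    if predicted_change < -0.01 and holdings > 0:
        return ("sell", 0.5 * holdings)   # dump half the position
    return ("hold", 0.0)
```

Real rule systems pile on fees, slippage, position limits, stop-losses, and so on - hence the thousands of lines.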
It's beautiful stuff! Those episodes are tutorials for this project, including an intro to deep RL, hyperparameter decisions, etc.

Data: For this project I recommend the Kaggle dataset described in Setup. It's a really solid dataset - the best I've found! I'm personally using a friend's live-ticker DB. Unfortunately, you can't: it's his personal thing; he may one day open it up as a paid API or something, we'll see.
If any of y'all find anything better than the Kaggle set, LMK. Import it, train on it.
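A quick pandas sketch of importing it. I've inlined two stand-in rows using the column layout of the common bitstamp Kaggle dump; swap the StringIO for your real CSV path, and note your copy's columns may differ:

```python
import io
import pandas as pd

# Two stand-in rows with the bitstamp-dump column layout; in practice you'd
# pass your CSV path to read_csv instead of this StringIO.
csv = io.StringIO(
    "Timestamp,Open,High,Low,Close,Volume_(BTC)\n"
    "1325317920,4.39,4.39,4.39,4.39,0.455\n"
    "1325317980,4.39,4.40,4.39,4.40,1.000\n"
)
df = pd.read_csv(csv)
df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="s")  # epoch seconds -> datetime
df = df.dropna()   # the real dump has gaps; drop (or forward-fill) them
```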