An open-source library implementing the AlphaGo Zero algorithm in pure Python

01-07 Everyday Knowledge Submitted by: 幻城

This is a pure Python implementation of a neural-network based Go AI, using TensorFlow. While inspired by DeepMind's AlphaGo algorithm, this project is not a DeepMind project nor is it affiliated with the official AlphaGo project.

This is NOT an official version of AlphaGo

Repeat, this is not the official AlphaGo program by DeepMind. This is an independent effort by Go enthusiasts to replicate the results of the AlphaGo Zero paper ("Mastering the Game of Go without Human Knowledge," Nature), with some resources generously made available by Google.

Minigo is based on Brian Lee's "MuGo" -- a pure Python implementation of the first AlphaGo paper, "Mastering the Game of Go with Deep Neural Networks and Tree Search", published in Nature. This implementation adds features and architecture changes present in the more recent AlphaGo Zero paper, "Mastering the Game of Go without Human Knowledge". More recently, this architecture was extended to Chess and Shogi in "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". These papers are often abbreviated in Minigo documentation as AG (for AlphaGo), AGZ (for AlphaGo Zero), and AZ (for AlphaZero) respectively.

Goals of the Project

Provide a clear set of learning examples using TensorFlow, Kubernetes, and Google Cloud Platform for establishing Reinforcement Learning pipelines on various hardware accelerators.

Reproduce the methods of the original DeepMind AlphaGo papers as faithfully as possible, through an open-source implementation and open-source pipeline tools.

Provide our data, results, and discoveries in the open to benefit the Go, machine learning, and Kubernetes communities.

An explicit non-goal of the project is to produce a competitive Go program that establishes itself as the top Go AI. Instead, we strive for a readable, understandable implementation that can benefit the community, even if that means our implementation is not as fast or efficient as possible.

While this project might produce such a strong model, we hope to focus on the process. Remember, getting there is half the fun. :)

We hope this project gives interested developers access to a strong Go model on an easy-to-understand platform of Python code that is available for extension, adaptation, etc.

If you'd like to read about our experiences training models, see RESULTS.md.

To see our guidelines for contributing, see CONTRIBUTING.md.

Getting Started

This project assumes you have the following:

virtualenv / virtualenvwrapper
Python 3.5+
Docker
gcloud

The Hitchhiker's Guide to Python has a good intro to Python development and virtualenv usage. The instructions after this point haven't been tested in environments that are not using virtualenv.

pip3 install virtualenv
pip3 install virtualenvwrapper

Install TensorFlow

First set up and enter your virtualenv. Then start by installing TensorFlow and the dependencies:

pip3 install -r requirements.txt

The requirements.txt file assumes you'll use a GPU; if you wish to run on GPU you must install CUDA 8.0 or later (see TensorFlow documentation).

If you don't want to run on GPU or don't have one, you can downgrade:

pip3 uninstall tensorflow-gpu
pip3 install tensorflow

Or just install the CPU requirements:

pip3 install -r requirements-cpu.txt

Setting up the Environment

You may want to use a cloud project for resources. If so, set:

PROJECT=foo-project

Then, running

source cluster/common.sh

will set defaults for the other environment variables.

Running unit tests

BOARD_SIZE=9 python3 -m unittest discover tests

Basics

All commands are compatible with either Google Cloud Storage as a remote file system, or your local file system. The examples here use GCS, but local file paths will work just as well.

To use GCS, set the BUCKET_NAME variable and authenticate with gcloud (gcloud auth application-default login). Otherwise, every command that fetches files from GCS will hang.

For instance, this would set a bucket, authenticate, and then look for the most recent model.

export BUCKET_NAME=your_bucket;
gcloud auth application-default login
gsutil ls gs://minigo/models | tail -3

Which might look like:

gs://$BUCKET_NAME/models/000193-trusty.data-00000-of-00001
gs://$BUCKET_NAME/models/000193-trusty.index
gs://$BUCKET_NAME/models/000193-trusty.meta

These three files comprise the model, and commands that take a model as an argument usually need the path to the model basename, e.g. gs://$BUCKET_NAME/models/000193-trusty
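To illustrate the basename convention, here is a minimal Python sketch that strips the checkpoint suffix from a file name and picks the highest-numbered model. The helper names `model_basename` and `latest_model` are hypothetical and not part of Minigo, and the sketch assumes file names follow the zero-padded `NNNNNN-name.suffix` pattern shown in the listing, so that plain string ordering matches model order.

```python
import re

# Suffixes of the three checkpoint files shown in the listing above.
# Assumption: no other suffixes appear in the models directory.
_CHECKPOINT_SUFFIX = re.compile(r"\.(index|meta|data-\d+-of-\d+)$")


def model_basename(path):
    """Strip the checkpoint suffix: '.../000193-trusty.index' -> '.../000193-trusty'."""
    return _CHECKPOINT_SUFFIX.sub("", path)


def latest_model(paths):
    """Return the basename that sorts last, or None when there are no paths."""
    basenames = {model_basename(p) for p in paths}
    return max(basenames) if basenames else None


files = [
    "gs://my-bucket/models/000192-older.index",
    "gs://my-bucket/models/000193-trusty.data-00000-of-00001",
    "gs://my-bucket/models/000193-trusty.index",
    "gs://my-bucket/models/000193-trusty.meta",
]
print(latest_model(files))
```

The same functions work on local paths, since they only manipulate strings.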

You'll need to copy them to your local disk. This fragment copies the latest model to the directory specified by MINIGO_MODELS:

MINIGO_MODELS=$HOME/minigo-models
mkdir -p $MINIGO_MODELS
gsutil ls gs://minigo/models | tail -3 | xargs -I{} gsutil cp "{}" $MINIGO_MODELS

Selfplay

To watch Minigo play a game, you need to specify a model. Here's an example that plays using the latest model in your bucket:

python rl_loop.py selfplay --readouts=$READOUTS -v 2

where READOUTS is how many searches to make per move. Timing information and statistics will be printed at each move. Setting verbosity (-v) to 3 or higher will print a board at each move.

Playing Against Minigo

Minigo speaks GTP (the Go Text Protocol), so you can use any GTP-compliant program with it.
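For readers new to GTP, the sketch below shows the plain-text shape of the protocol: a command is one line with an optional numeric id, and a response starts with `=` on success or `?` on failure. This is an independent illustration, not Minigo's own GTP code; `format_command` and `parse_response` are hypothetical helper names.

```python
def format_command(name, *args, command_id=None):
    """Build one GTP command line: '[id] name [args...]' plus a newline."""
    parts = [] if command_id is None else [str(command_id)]
    parts.append(name)
    parts.extend(str(arg) for arg in args)
    return " ".join(parts) + "\n"


def parse_response(text):
    """Split a GTP response into (ok, command_id, payload).

    '=' marks success and '?' marks failure; an optional numeric id
    follows the marker directly, then the payload.
    """
    body = text.strip()
    if not body or body[0] not in "=?":
        raise ValueError("not a GTP response: %r" % (text,))
    ok = body[0] == "="
    rest = body[1:]
    id_digits = ""
    while rest and rest[0] in "0123456789":
        id_digits += rest[0]
        rest = rest[1:]
    command_id = int(id_digits) if id_digits else None
    return ok, command_id, rest.strip()


print(format_command("genmove", "b", command_id=1), end="")
print(parse_response("=1 D4\n\n"))
```

A GTP front end would write such command lines to the engine's standard input and read the responses from its standard output.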

# Latest model should look like: /path/to/models/000123-something
LATEST_MODEL=$(ls -d $MINIGO_MODELS [...]

[...] *.sgf | wc -l

Or search for e.g. "B+R" instead to count how many games were won by resignation.
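The same kind of tally (how many games carry a result starting with "B+", "B+R", and so on) can be sketched in Python. This is a minimal illustration rather than a Minigo tool: `game_result` and `count_results` are hypothetical names, and the sketch assumes each SGF file records its result in the standard `RE` property.

```python
import re

# SGF stores the game result in the RE property, e.g. RE[B+R] or RE[W+3.5].
# The lookbehind avoids matching a longer property name that ends in "RE".
_RESULT = re.compile(r"(?<![A-Z])RE\[([^\]]*)\]")


def game_result(sgf_text):
    """Return the result string such as 'B+R', or None if none is recorded."""
    match = _RESULT.search(sgf_text)
    return match.group(1) if match else None


def count_results(sgf_texts, prefix):
    """Count the games whose result starts with the given prefix."""
    return sum(1 for text in sgf_texts if (game_result(text) or "").startswith(prefix))


games = [
    "(;GM[1]SZ[9]RE[B+R];B[dd])",
    "(;GM[1]SZ[9]RE[W+3.5];B[dd])",
    "(;GM[1]SZ[9]RE[B+1.5];B[cc])",
]
print(count_results(games, "B+"), count_results(games, "B+R"))
```

To run it over a directory tree, read each file found by `pathlib.Path(".").rglob("*.sgf")` and pass the contents to `count_results`.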

A histogram of game lengths (uses the 'ministat' package):

find . -name "*.sgf" -exec /bin/sh -c 'tr -cd \; < {} | wc -c' \; | ministat
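The shell pipeline measures game length by counting `;` characters, since every SGF node (the root node and each move) begins with one. A rough Python equivalent is sketched below; `node_count` and `length_histogram` are hypothetical names, and like the shell version it assumes no stray semicolons inside comments.

```python
from collections import Counter


def node_count(sgf_text):
    # Same measure as the shell pipeline: the number of ';' characters.
    return sgf_text.count(";")


def length_histogram(sgf_texts, bucket=10):
    """Map the lower edge of each bucket to the number of games inside it."""
    hist = Counter((node_count(text) // bucket) * bucket for text in sgf_texts)
    return dict(sorted(hist.items()))


games = [
    "(;GM[1];B[aa];W[bb])",
    "(;GM[1];B[aa])",
    "(;GM[1]" + ";B[aa]" * 12 + ")",
]
print(length_histogram(games))
```

Because the root node is counted too, each value is one more than the number of moves in the game.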

List the most frequent first moves (the ** glob needs zsh, or bash with globstar enabled):

grep -oh -m 1 '^;B\[[a-s]*\]' **/*.sgf | sort | uniq -c | sort -n
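A Python version of the same count is sketched below, as an illustration only. `first_black_move` and `first_move_counts` are hypothetical names. Unlike the grep pattern, the regular expression here is not anchored to the start of a line, so it does not assume one move per line.

```python
import re
from collections import Counter

# The first black move node looks like ';B[dd]'; an empty 'B[]' is a pass.
_FIRST_BLACK = re.compile(r";B\[([a-s]*)\]")


def first_black_move(sgf_text):
    """Return the coordinates of the first black move, or None if there is none."""
    match = _FIRST_BLACK.search(sgf_text)
    return match.group(1) if match else None


def first_move_counts(sgf_texts):
    """Count how often each first move occurs across the games."""
    moves = (first_black_move(text) for text in sgf_texts)
    return Counter(move for move in moves if move is not None)


games = [
    "(;GM[1]SZ[19]\n;B[dd]\n;W[pp])",
    "(;GM[1]SZ[19]\n;B[dd]\n;W[dp])",
    "(;GM[1]SZ[19]\n;B[pd])",
    "(;GM[1]SZ[19])",
]
print(first_move_counts(games).most_common())
```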

Distribution of game-winning margin (ministat, again):

find . -name "*.sgf" -exec /bin/sh -c 'grep -o -m 1 "W+[[:digit:]]*" {} | cut -c3-' \; | ministat
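The margin extraction can also be sketched in Python. This is a hedged illustration, not a Minigo script: `winning_margin` and `margin_summary` are hypothetical names, and the sketch assumes the result is stored in the standard `RE` property. It keeps the fractional part of the margin (for example 3.5), whereas the grep pattern keeps only the leading digits, and it skips games won by resignation.

```python
import re
import statistics

# Matches point wins such as RE[W+3.5] or RE[B+12]; RE[W+R] does not match.
_MARGIN = re.compile(r"(?<![A-Z])RE\[([BW])\+(\d+(?:\.\d+)?)\]")


def winning_margin(sgf_text):
    """Return (winner, margin) for a game decided on points, else None."""
    match = _MARGIN.search(sgf_text)
    return (match.group(1), float(match.group(2))) if match else None


def margin_summary(sgf_texts, winner="W"):
    """Summarise the point margins of games won by the given colour."""
    results = (winning_margin(text) for text in sgf_texts)
    margins = [r[1] for r in results if r is not None and r[0] == winner]
    if not margins:
        return None
    return {
        "n": len(margins),
        "min": min(margins),
        "median": statistics.median(margins),
        "max": max(margins),
    }


games = [
    "(;GM[1]RE[W+3.5];B[dd])",
    "(;GM[1]RE[W+10.5];B[dd])",
    "(;GM[1]RE[B+R];B[dd])",
    "(;GM[1]RE[W+0.5];B[dd])",
    "(;GM[1]RE[W+R];B[dd])",
]
print(margin_summary(games))
```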

Also check the 'oneoffs' directory for interesting scripts that analyze, e.g., the resignation threshold.
