Installing a specific version of tensorflow
CUDA 9.0 would not install,
so I had no choice but to downgrade tensorflow.
C:> conda create -n tensorflow1.1 python=3.5
C:> activate tensorflow1.1
(tensorflow1.1) C:> pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-win_amd64.whl
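After the install, a quick check from inside the activated environment shows which version Python actually picks up. The sketch below is a hypothetical helper (installed_version is not part of TensorFlow); it only assumes that the package exposes a __version__ attribute.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    # Prints None when tensorflow cannot be imported in this environment.
    print("tensorflow:", installed_version("tensorflow"))
```

If this prints None, the wheel was most likely installed into a different environment than the one that is currently active.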
The original is here: https://github.com/carpedm20/multi-Speaker-tacotron-tensorflow
1. Install prerequisites
After preparing Tensorflow, install prerequisites with:
Click "Anaconda Prompt"
activate tensorflow
If nltk is not installed: pip install nltk
pip3 install -r requirements.txt
python -c "import nltk; nltk.download('punkt')"
If you want to synthesize speech in Korean directly, follow 2-3. Download pre-trained models.
2-1. Generate custom datasets
The datasets directory should look like:
datasets
├── son
│   ├── alignment.json
│   └── audio
│       ├── 1.mp3
│       ├── 2.mp3
│       ├── 3.mp3
│       └── ...
└── YOUR_DATASET
    ├── alignment.json
    └── audio
        ├── 1.mp3
        ├── 2.mp3
        ├── 3.mp3
        └── ...
and YOUR_DATASET/alignment.json should look like:
{
    "./datasets/YOUR_DATASET/audio/001.mp3": "My name is Taehoon Kim.",
    "./datasets/YOUR_DATASET/audio/002.mp3": "The buses aren't the problem.",
    "./datasets/YOUR_DATASET/audio/003.mp3": "They have discovered a new particle."
}
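For a custom dataset, a file in this format can be written with a few lines of Python. The sketch below is a minimal illustration and not part of the repository: build_alignment, save_alignment, and the transcripts mapping are hypothetical names, and it assumes you already have one transcript per audio file name.

```python
import json


def build_alignment(dataset_dir, transcripts):
    """Map each audio path under dataset_dir/audio to its transcript.

    transcripts is a dict such as {"001.mp3": "My name is Taehoon Kim."}.
    Keys are built with forward slashes to match the format shown above.
    """
    alignment = {}
    for file_name in sorted(transcripts):
        audio_path = "{}/audio/{}".format(dataset_dir.rstrip("/"), file_name)
        alignment[audio_path] = transcripts[file_name]
    return alignment


def save_alignment(alignment, json_path):
    """Write the mapping as UTF-8 JSON, keeping Korean text readable."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(alignment, f, ensure_ascii=False, indent=4)
```

For example, save_alignment(build_alignment("./datasets/YOUR_DATASET", transcripts), "./datasets/YOUR_DATASET/alignment.json") would produce the file. json.dump never emits a trailing comma, which a strict JSON parser would reject.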
After preparing the dataset as described, generate the preprocessed data with:
python3 -m datasets.generate_data ./datasets/YOUR_DATASET/alignment.json
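Before running the command above, it can save time to confirm that every audio file named in alignment.json actually exists. This is a minimal sketch, assuming the JSON keys are paths relative to the directory you run it from; missing_audio_files is a hypothetical helper, not something the repository provides.

```python
import json
import os


def missing_audio_files(alignment_path, base_dir="."):
    """Return the alignment.json keys whose audio file does not exist."""
    with open(alignment_path, encoding="utf-8") as f:
        alignment = json.load(f)
    return [
        audio_path
        for audio_path in alignment
        if not os.path.isfile(os.path.join(base_dir, audio_path))
    ]
```

An empty list means every entry points at a real file. json.load raises a ValueError if the file is not valid JSON, so the same call also catches formatting mistakes.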
2-2. Generate Korean datasets
Follow the commands below (explained using the son dataset).
-
To automate the alignment between audio and text, prepare GOOGLE_APPLICATION_CREDENTIALS so that the Google Speech Recognition API can be used:
export GOOGLE_APPLICATION_CREDENTIALS="YOUR-GOOGLE.CREDENTIALS.json"
To get credentials, read this:
https://developers.google.com/identity/protocols/application-default-credentials
Sign up there, install the Cloud SDK from https://cloud.google.com/sdk/docs/quickstart-windows, and then run the gcloud auth application-default login command.
-
Download the speech (or video) and text.
pip install m3u8 requests bs4
python3 -m datasets.son.download
-
Segment all audio files on silence.
pip install pydub
python3 -m audio.silence --audio_pattern "./datasets/son/audio/*.wav" --method=pydub
-
Using the Google Speech Recognition API, predict sentences for all segmented audio files.
pip install google.cloud
python3 -m recognition.google --audio_pattern "./datasets/son/audio/*.*.wav"
-
By comparing the original text with the recognised text, save the audio<->text pair information into ./datasets/son/alignment.json.
pip install tinytag jamo
python3 -m recognition.alignment --recognition_path "./datasets/son/recognition.json" --score_threshold=0.5
-
Finally, generate the numpy files that will be used in training.
python3 -m datasets.generate_data ./datasets/son/alignment.json
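The idea behind the comparison step, keeping only pairs whose recognised text is close enough to the original, can be illustrated with the standard library. This is only a sketch of threshold-based filtering: it uses difflib.SequenceMatcher as a stand-in similarity measure, which is an assumption and not necessarily how recognition.alignment computes its score, and keep_confident_pairs is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity(original, recognised):
    """Character-level similarity in [0, 1] (a stand-in for the real score)."""
    return SequenceMatcher(None, original, recognised).ratio()


def keep_confident_pairs(candidates, score_threshold=0.5):
    """Keep audio -> original text pairs whose recognised text is close enough.

    candidates maps an audio path to an (original, recognised) tuple.
    """
    return {
        audio_path: original
        for audio_path, (original, recognised) in candidates.items()
        if similarity(original, recognised) >= score_threshold
    }
```

Raising score_threshold keeps fewer but cleaner pairs; lowering it keeps more data at the cost of more mismatched transcripts.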
Because the automatic generation is extremely naive, the dataset is noisy. However, if you have enough data (20+ hours with random initialization or 5+ hours with pretrained model initialization), you can expect acceptable audio synthesis quality.
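To see where a dataset stands relative to the 20-hour and 5-hour figures above, the total duration of the segmented audio can be summed. This is a minimal sketch using only the standard library wave module, so it assumes uncompressed .wav files (mp3 files would need a different reader); total_hours is a hypothetical helper.

```python
import glob
import wave


def total_hours(audio_pattern):
    """Sum the duration of all .wav files matching the glob pattern, in hours."""
    seconds = 0.0
    for path in glob.glob(audio_pattern):
        with wave.open(path, "rb") as wav_file:
            # Duration of one file = number of frames / frames per second.
            seconds += wav_file.getnframes() / float(wav_file.getframerate())
    return seconds / 3600.0
```

For example, total_hours("./datasets/son/audio/*.wav") gives a rough figure to compare against those thresholds.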
3. Train a model
The important hyperparameters for a model are defined in hparams.py.
(Change cleaners in hparams.py from korean_cleaners to english_cleaners to train with an English dataset.)
To train a single-speaker model:
python3 train.py --data_path=datasets/son
python3 train.py --data_path=datasets/son --initialize_path=PATH_TO_CHECKPOINT
To train a multi-speaker model:
# after changing `model_type` in `hparams.py` to `deepvoice` or `simple`
python3 train.py --data_path=datasets/son1,datasets/son2
To resume training from a previous experiment such as logs/son-20171015:
python3 train.py --data_path=datasets/son --load_path logs/son-20171015
If you don't have a good, sufficiently large (10+ hours) dataset, it is better to use --initialize_path so that a well-trained model provides the initial parameters.
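The options in this section combine as follows. The helper below is a hypothetical convenience wrapper, not part of the repository; it only assembles the train.py arguments shown above (--data_path, --load_path, --initialize_path) and assumes that resuming and initializing from a checkpoint are alternatives rather than options to combine.

```python
def train_command(data_paths, load_path=None, initialize_path=None):
    """Build the argument list for train.py from the options in this section."""
    if load_path and initialize_path:
        # Assumption: resuming and initializing from a checkpoint are alternatives.
        raise ValueError("use either load_path or initialize_path, not both")
    # Several datasets are joined with commas, as in the multi-speaker example.
    command = ["python3", "train.py", "--data_path=" + ",".join(data_paths)]
    if load_path:
        command += ["--load_path", load_path]
    if initialize_path:
        command.append("--initialize_path=" + initialize_path)
    return command
```

The resulting list can be passed to subprocess.call, for example subprocess.call(train_command(["datasets/son"], initialize_path="PATH_TO_CHECKPOINT")).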
4. Synthesize audio
You can load your trained model in the demo app with:
python3 app.py --load_path logs/son-20171015 --num_speakers=1
or generate audio directly with:
python3 synthesizer.py --load_path logs/son-20171015 --text "이거 실화냐?"