
tensorflow Tacotron~ ha ha ha. It doesn't work well. Hee hee. 2017. 11. 19.

by BABEL-II 2019. 10. 5.

Installing a specific version of tensorflow

CUDA 9.0 would not install, so I had no choice but to downgrade the tensorflow version.

 

C:> conda create -n tensorflow1.1 python=3.5

C:> activate tensorflow1.1

(tensorflow1.1) C:> pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-win_amd64.whl
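
To confirm the wheel installed and the GPU is visible, a quick check from inside the activated env can help (a minimal sketch; device_lib is an internal TF 1.x module, so treat it as best-effort):

# sanity_check.py -- minimal sketch for verifying the TF 1.1 GPU install
import tensorflow as tf
print(tf.__version__)  # expect 1.1.0

# device_lib is an internal TF 1.x API; a "/gpu:0" entry in the output
# means CUDA and the GPU build are working.
from tensorflow.python.client import device_lib
print([d.name for d in device_lib.list_local_devices()])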

 

 

The original source is here: https://github.com/carpedm20/multi-Speaker-tacotron-tensorflow

1. Install prerequisites

After preparing TensorFlow, install the prerequisites with:

Click "Anaconda Prompt"

activate tensorflow
pip install nltk        (if nltk is missing)
pip3 install -r requirements.txt
python -c "import nltk; nltk.download('punkt')"
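
Optionally, you can confirm the punkt tokenizer data actually landed (nltk.data.find raises a LookupError when it is missing):

python -c "import nltk; nltk.data.find('tokenizers/punkt')"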

If you want to synthesize speech in Korean directly, follow 2-3. Download pre-trained models.

2-1. Generate custom datasets

The datasets directory should look like:

datasets
├── son
│   ├── alignment.json
│   └── audio
│       ├── 1.mp3
│       ├── 2.mp3
│       ├── 3.mp3
│       └── ...
└── YOUR_DATASET
    ├── alignment.json
    └── audio
        ├── 1.mp3
        ├── 2.mp3
        ├── 3.mp3
        └── ...

and YOUR_DATASET/alignment.json should look like:

{ "./datasets/YOUR_DATASET/audio/001.mp3": "My name is Taehoon Kim.", "./datasets/YOUR_DATASET/audio/002.mp3": "The buses aren't the problem.", "./datasets/YOUR_DATASET/audio/003.mp3": "They have discovered a new particle.", }

After you prepare as described, you should generate preprocessed data with:

python3 -m datasets.generate_data ./datasets/YOUR_DATASET/alignment.json

2-2. Generate Korean datasets

Follow the commands below (explained using the son dataset).

  1. To automate the alignment between sounds and texts, prepare GOOGLE_APPLICATION_CREDENTIALS to use the Google Speech Recognition API. To get credentials, read https://developers.google.com/identity/protocols/application-default-credentials.

    export GOOGLE_APPLICATION_CREDENTIALS="YOUR-GOOGLE.CREDENTIALS.json"
  2. Sign up there, install the Cloud SDK from https://cloud.google.com/sdk/docs/quickstart-windows, and run the gcloud auth application-default login command.

  3. Download speech (or video) and text.

    pip install m3u8 requests bs4
    python3 -m datasets.son.download
  4. Segment all audio files on silence (a toy pydub illustration appears after this list).

    pip install pydub
    python3 -m audio.silence --audio_pattern "./datasets/son/audio/*.wav" --method=pydub
  5. Using the Google Speech Recognition API, predict sentences for all segmented audio files.

    pip install google.cloud
    python3 -m recognition.google --audio_pattern "./datasets/son/audio/*.*.wav"
  6. By comparing the original text and the recognised text, save audio<->text pair information into ./datasets/son/alignment.json (a rough sketch of this scoring idea follows the list).

    pip install tinytag jamo
    python3 -m recognition.alignment --recognition_path "./datasets/son/recognition.json" --score_threshold=0.5
  7. Finally, generate the numpy files that will be used in training.

    python3 -m datasets.generate_data ./datasets/son/alignment.json
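
For step 4, here is a toy illustration of what splitting one file on silence with pydub looks like (this is not the repo's audio.silence module, and the threshold values are guesses to tune per dataset):

from pydub import AudioSegment
from pydub.silence import split_on_silence

# Split one recording wherever at least 500 ms stays below -40 dBFS.
audio = AudioSegment.from_wav("./datasets/son/audio/1.wav")
chunks = split_on_silence(audio, min_silence_len=500, silence_thresh=-40)

for i, chunk in enumerate(chunks):
    chunk.export("./datasets/son/audio/1.{:04d}.wav".format(i), format="wav")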

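For step 6, a rough sketch of the scoring idea (this is not the repo's actual implementation, and the recognition.json layout assumed here, path -> [original, recognised], is hypothetical):

import difflib
import json

SCORE_THRESHOLD = 0.5  # mirrors --score_threshold=0.5 above

def similarity(original, recognised):
    # Character-level similarity in [0, 1]; the repo's metric may differ.
    return difflib.SequenceMatcher(None, original, recognised).ratio()

# Hypothetical layout: audio path -> [original text, recognised text].
with open("./datasets/son/recognition.json", encoding="utf-8") as f:
    recognition = json.load(f)

alignment = {path: pair[0] for path, pair in recognition.items()
             if similarity(pair[0], pair[1]) >= SCORE_THRESHOLD}

with open("./datasets/son/alignment.json", "w", encoding="utf-8") as f:
    json.dump(alignment, f, ensure_ascii=False, indent=2)
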
Because the automatic generation is extremely naive, the dataset is noisy. However, if you have enough data (20+ hours with random initialization, or 5+ hours when initializing from a pretrained model), you can expect acceptable audio synthesis quality.

3. Train a model

The important hyperparameters for a model are defined in hparams.py.

(Change cleaners in hparams.py from korean_cleaners to english_cleaners to train with an English dataset.)
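
Illustratively, that switch is a one-line edit (the surrounding hparams.py layout shown here is a guess, not the file's actual contents):

# hparams.py (illustrative excerpt)
cleaners = 'korean_cleaners'     # default, for Korean datasets
# cleaners = 'english_cleaners'  # swap in for English datasets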

To train a single-speaker model:
python3 train.py --data_path=datasets/son
python3 train.py --data_path=datasets/son --initialize_path=PATH_TO_CHECKPOINT

To train a multi-speaker model:
# after changing `model_type` in `hparams.py` to `deepvoice` or `simple`
python3 train.py --data_path=datasets/son1,datasets/son2

To restart training from a previous experiment such as logs/son-20171015:
python3 train.py --data_path=datasets/son --load_path logs/son-20171015

If you don't have a good enough dataset (10+ hours), it is better to pass --initialize_path to start from a well-trained model's parameters.


4. Synthesize audio

You can run a demo server for audio synthesis with:

python3 app.py --load_path logs/son-20171015 --num_speakers=1

or generate audio directly with:

python3 synthesizer.py --load_path logs/son-20171015 --text "이거 실화냐?"