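tf.compat.v1.train.MonitoredTrainingSession
Creates a MonitoredSession for training.
Migrate to TF2
This API is not compatible with eager execution or tf.function. To migrate
to TF2, rewrite the code to be compatible with eager execution. See the
migration guide
(https://www.tensorflow.org/guide/migrate#1_replace_v1sessionrun_calls)
on replacing Session.run calls. In Keras, session hooks can be replaced by
Callbacks; see, for example, the logging hook notebook
(https://github.com/tensorflow/docs/blob/master/site/en/guide/migrate/logging_stop_hook.ipynb).
For more details, read "Better performance with tf.function"
(https://www.tensorflow.org/guide/function).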
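As a rough illustration of what an eager-compatible rewrite might look like, the sketch below replaces the session loop with a plain Python loop around a tf.function step and uses tf.train.CheckpointManager in place of the checkpoint hooks. This is only one possible approach, not the migration the guide prescribes. The toy loss, the variable names, the helper name train_eagerly, and the checkpoint_every parameter are all assumptions made for the example.

```python
import tempfile

import tensorflow as tf


def train_eagerly(num_steps=5, checkpoint_every=2, learning_rate=0.1):
    """Hypothetical TF2-style loop: no Session, no hooks."""
    w = tf.Variable(0.0)                    # toy model parameter (assumption)
    step = tf.Variable(0, dtype=tf.int64)   # plays the role of the global step

    # Track both variables so a restore would recover the training state.
    ckpt = tf.train.Checkpoint(step=step, w=w)
    manager = tf.train.CheckpointManager(ckpt, tempfile.mkdtemp(), max_to_keep=3)

    @tf.function
    def train_step():
        with tf.GradientTape() as tape:
            loss = tf.square(w - 3.0)       # toy loss, minimized at w == 3
        grad = tape.gradient(loss, w)
        w.assign_sub(learning_rate * grad)  # plain gradient-descent update
        step.assign_add(1)
        return loss

    for _ in range(num_steps):
        train_step()
        current = int(step.numpy())
        if current % checkpoint_every == 0:
            # Step-based saving, done explicitly instead of through a hook.
            manager.save(checkpoint_number=current)

    return int(step.numpy()), list(manager.checkpoints)


if __name__ == "__main__":
    final_step, checkpoints = train_eagerly()
    print(final_step, checkpoints)
```

The saving frequency is expressed in the loop itself, so the step-based or time-based policy that MonitoredTrainingSession configures through arguments becomes ordinary Python control flow.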
For a chief, this utility sets up the proper session initializer/restorer. It
also creates hooks related to checkpoint and summary saving. For workers, this
utility sets up a session creator that waits for the chief to
initialize/restore. See tf.compat.v1.train.MonitoredSession for more
information.
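The chief-side behavior described above can be sketched with a small single-process run in graph mode. This is a minimal example under stated assumptions, not a recommended training setup: the toy loss, the variable name w, the learning rate, the helper name run_training, and the temporary checkpoint directory are all invented for illustration. Summary saving is switched off by setting both summary arguments to None.

```python
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1


def run_training(num_steps=5):
    """Runs a toy training loop as chief and returns (iterations, latest checkpoint)."""
    checkpoint_dir = tempfile.mkdtemp()

    # Build everything inside an explicit graph, since this API is graph-only.
    with tf.Graph().as_default():
        global_step = tf1.train.get_or_create_global_step()
        w = tf1.get_variable("w", shape=[], initializer=tf1.zeros_initializer())
        loss = tf.square(w - 3.0)  # toy loss (assumption)
        train_op = tf1.train.GradientDescentOptimizer(0.1).minimize(
            loss, global_step=global_step)

        # Ask the session to stop once the global step reaches num_steps.
        hooks = [tf1.train.StopAtStepHook(last_step=num_steps)]

        iterations = 0
        with tf1.train.MonitoredTrainingSession(
                is_chief=True,               # this process initializes/restores
                checkpoint_dir=checkpoint_dir,
                hooks=hooks,
                save_summaries_steps=None,   # both None: no default summary saver
                save_summaries_secs=None) as sess:
            while not sess.should_stop():
                sess.run(train_op)
                iterations += 1

    return iterations, tf.train.latest_checkpoint(checkpoint_dir)


if __name__ == "__main__":
    print(run_training())
```

In a distributed job, worker processes would pass is_chief=False along with the appropriate master string and would wait for the chief rather than initializing variables themselves.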
Args
master
String. The TensorFlow master to use.
is_chief
If True, it will take care of initialization and recovery of the
underlying TensorFlow session. If False, it will wait on a chief to
initialize or recover the TensorFlow session.
checkpoint_dir
A string. Optional path to a directory from which to restore
variables.
scaffold
A Scaffold used for gathering or building supportive ops. If not
specified, a default one is created. It's used to finalize the graph.
hooks
Optional list of SessionRunHook objects.
chief_only_hooks
Optional list of SessionRunHook objects. These hooks are activated if
is_chief==True and ignored otherwise.
save_checkpoint_secs
The frequency, in seconds, at which a checkpoint is saved
using a default checkpoint saver. If both save_checkpoint_steps and
save_checkpoint_secs are set to None, then the default checkpoint
saver isn't used. If both are provided, then only save_checkpoint_secs
is used. Default 600.
save_summaries_steps
The frequency, in number of global steps, at which the
summaries are written to disk using a default summary saver. If both
save_summaries_steps and save_summaries_secs are set to None, then
the default summary saver isn't used. Default 100.
save_summaries_secs
The frequency, in seconds, at which the summaries are written
to disk using a default summary saver. If both save_summaries_steps and
save_summaries_secs are set to None, then the default summary saver
isn't used. Default not enabled.
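config
An instance of tf.compat.v1.ConfigProto used to configure the session.
It is passed as the config argument of the tf.compat.v1.Session
constructor.
stop_grace_period_secs
Number of seconds given to threads to stop after
close() has been called. Default 120.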
log_step_count_steps
The frequency, in number of global steps, at which the
global step/sec is logged. Default 100.
max_wait_secs
Maximum time, in seconds, that workers should wait for the session to
become available. This should be kept relatively short to help detect
incorrect code, but may need to be increased if the chief takes a while to
start up. Default 7200.
save_checkpoint_steps
The frequency, in number of global steps, at which a
checkpoint is saved using a default checkpoint saver. If both
save_checkpoint_steps and save_checkpoint_secs are set to None, then
the default checkpoint saver isn't used. If both are provided, then only
save_checkpoint_secs is used. Default not enabled.
summary_dir
A string. Optional path to a directory in which to save
summaries. If None, checkpoint_dir is used instead.
save_graph_def
Whether to save the GraphDef and MetaGraphDef to
checkpoint_dir. The GraphDef is saved after the session is created as
graph.pbtxt. A MetaGraphDef is saved for every checkpoint as
model.ckpt-*.meta. Default True.
Returns
A MonitoredSession object.
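The interaction between the save_checkpoint_* and save_summaries_* arguments is easier to follow as a small decision function. The pure-Python sketch below models only what the argument descriptions state; it is not TensorFlow's implementation, and the library may behave differently in edge cases. The sentinel object, the helper names, and the rule that setting only one argument of a pair leaves the other disabled are assumptions made for the example.

```python
# Stand-in for "argument not passed"; a hypothetical sentinel for this sketch.
USE_DEFAULT = object()


def _pick(secs, steps, default_secs, default_steps):
    """Applies the documented defaults only when neither argument was passed."""
    if secs is USE_DEFAULT and steps is USE_DEFAULT:
        return default_secs, default_steps
    # Assumption: an argument left unset next to an explicit one is disabled.
    secs = None if secs is USE_DEFAULT else secs
    steps = None if steps is USE_DEFAULT else steps
    return secs, steps


def resolve_checkpoint_frequency(save_checkpoint_secs=USE_DEFAULT,
                                 save_checkpoint_steps=USE_DEFAULT):
    """Returns (secs, steps); (None, None) means no default checkpoint saver."""
    secs, steps = _pick(save_checkpoint_secs, save_checkpoint_steps, 600, None)
    if secs is not None and steps is not None:
        # Documented rule: if both are provided, only the seconds value is used.
        steps = None
    return secs, steps


def resolve_summary_frequency(save_summaries_steps=USE_DEFAULT,
                              save_summaries_secs=USE_DEFAULT):
    """Returns (steps, secs); (None, None) means no default summary saver."""
    secs, steps = _pick(save_summaries_secs, save_summaries_steps, None, 100)
    return steps, secs


if __name__ == "__main__":
    print(resolve_checkpoint_frequency())                            # (600, None)
    print(resolve_checkpoint_frequency(save_checkpoint_steps=1000))  # (None, 1000)
    print(resolve_summary_frequency())                               # (100, None)
```

Passing None for both arguments of a pair is how the corresponding default saver is turned off, which the function reports as (None, None).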