TensorFlow Restore Only Some Variables Then Save Again

The phrase "Saving a TensorFlow model" typically means one of two things:

  1. Checkpoints, OR
  2. SavedModel.

Checkpoints capture the exact value of all parameters (tf.Variable objects) used by a model. Checkpoints do not contain any description of the computation defined by the model and thus are typically only useful when source code that will use the saved parameter values is available.

The SavedModel format on the other hand includes a serialized description of the computation defined by the model in addition to the parameter values (checkpoint). Models in this format are independent of the source code that created the model. They are thus suitable for deployment via TensorFlow Serving, TensorFlow Lite, TensorFlow.js, or programs in other programming languages (the C, C++, Java, Go, Rust, C# etc. TensorFlow APIs).
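For contrast with checkpoints, here is a minimal sketch of exporting and reloading a SavedModel; the tiny Sequential model and the 'my_saved_model' path are illustrative only and are not part of this guide.

    import tensorflow as tf

    # Hypothetical model and path, used only to illustrate the SavedModel format.
    model = tf.keras.Sequential([tf.keras.layers.Dense(5, input_shape=(1,))])
    tf.saved_model.save(model, 'my_saved_model')      # serialized computation + parameter values
    restored = tf.saved_model.load('my_saved_model')  # usable without the original Python code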

This guide covers APIs for writing and reading checkpoints.

Setup

    import tensorflow as tf

    class Net(tf.keras.Model):
      """A simple linear model."""

      def __init__(self):
        super(Net, self).__init__()
        self.l1 = tf.keras.layers.Dense(5)

      def call(self, x):
        return self.l1(x)

    net = Net()

Saving from tf.keras training APIs

See the tf.keras guide on saving and restoring.

tf.keras.Model.save_weights saves a TensorFlow checkpoint.

    net.save_weights('easy_checkpoint')

Writing checkpoints

The persistent state of a TensorFlow model is stored in tf.Variable objects. These can be constructed directly, but are often created through high-level APIs like tf.keras.layers or tf.keras.Model.

The easiest way to manage variables is by attaching them to Python objects, then referencing those objects.

Subclasses of tf.train.Checkpoint, tf.keras.layers.Layer, and tf.keras.Model automatically track variables assigned to their attributes. The following example constructs a simple linear model, then writes checkpoints which contain values for all of the model's variables.

You can easily save a model checkpoint with Model.save_weights.
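As a rough sketch (the checkpoint prefix is illustrative), saving and reloading weights with the tf.keras APIs looks like this:

    net = Net()
    net(tf.zeros([1, 1]))                # call once so the variables are created
    net.save_weights('easy_checkpoint')  # writes a TensorFlow checkpoint

    fresh_net = Net()
    fresh_net(tf.zeros([1, 1]))          # create matching variables to restore into
    fresh_net.load_weights('easy_checkpoint')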

Manual checkpointing

Setup

To help demonstrate all the features of tf.train.Checkpoint, define a toy dataset and optimization step:

    def toy_dataset():
      inputs = tf.range(10.)[:, None]
      labels = inputs * 5. + tf.range(5.)[None, :]
      return tf.data.Dataset.from_tensor_slices(
        dict(x=inputs, y=labels)).repeat().batch(2)

    def train_step(net, example, optimizer):
      """Trains `net` on `example` using `optimizer`."""
      with tf.GradientTape() as tape:
        output = net(example['x'])
        loss = tf.reduce_mean(tf.abs(output - example['y']))
      variables = net.trainable_variables
      gradients = tape.gradient(loss, variables)
      optimizer.apply_gradients(zip(gradients, variables))
      return loss

Create the checkpoint objects

Use a tf.train.Checkpoint object to manually create a checkpoint, where the objects you want to checkpoint are set as attributes on the object.

A tf.train.CheckpointManager can also be helpful for managing multiple checkpoints.

    opt = tf.keras.optimizers.Adam(0.1)
    dataset = toy_dataset()
    iterator = iter(dataset)
    ckpt = tf.train.Checkpoint(step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator)
    manager = tf.train.CheckpointManager(ckpt, './tf_ckpts', max_to_keep=3)

Train and checkpoint the model

The following training loop creates an instance of the model and of an optimizer, then gathers them into a tf.train.Checkpoint object. It calls the training step in a loop on each batch of data, and periodically writes checkpoints to disk.

    def train_and_checkpoint(net, manager):
      ckpt.restore(manager.latest_checkpoint)
      if manager.latest_checkpoint:
        print("Restored from {}".format(manager.latest_checkpoint))
      else:
        print("Initializing from scratch.")

      for _ in range(50):
        example = next(iterator)
        loss = train_step(net, example, opt)
        ckpt.step.assign_add(1)
        if int(ckpt.step) % 10 == 0:
          save_path = manager.save()
          print("Saved checkpoint for step {}: {}".format(int(ckpt.step), save_path))
          print("loss {:1.2f}".format(loss.numpy()))

    train_and_checkpoint(net, manager)

Initializing from scratch.
Saved checkpoint for step 10: ./tf_ckpts/ckpt-1
loss 32.44
Saved checkpoint for step 20: ./tf_ckpts/ckpt-2
loss 25.86
Saved checkpoint for step 30: ./tf_ckpts/ckpt-3
loss 19.30
Saved checkpoint for step 40: ./tf_ckpts/ckpt-4
loss 12.79
Saved checkpoint for step 50: ./tf_ckpts/ckpt-5
loss 6.51

Restore and continue training

After the first training cycle you can pass a new model and manager, but pick up training exactly where you left off:

    opt = tf.keras.optimizers.Adam(0.1)
    net = Net()
    dataset = toy_dataset()
    iterator = iter(dataset)
    ckpt = tf.train.Checkpoint(step=tf.Variable(1), optimizer=opt, net=net, iterator=iterator)
    manager = tf.train.CheckpointManager(ckpt, './tf_ckpts', max_to_keep=3)

    train_and_checkpoint(net, manager)

Restored from ./tf_ckpts/ckpt-5
Saved checkpoint for step 60: ./tf_ckpts/ckpt-6
loss 1.58
Saved checkpoint for step 70: ./tf_ckpts/ckpt-7
loss 0.88
Saved checkpoint for step 80: ./tf_ckpts/ckpt-8
loss 0.54
Saved checkpoint for step 90: ./tf_ckpts/ckpt-9
loss 0.43
Saved checkpoint for step 100: ./tf_ckpts/ckpt-10
loss 0.23

The tf.train.CheckpointManager object deletes old checkpoints. Above, it's configured to keep only the three most recent checkpoints.

    print(manager.checkpoints)  # List the three remaining checkpoints
['./tf_ckpts/ckpt-8', './tf_ckpts/ckpt-9', './tf_ckpts/ckpt-10']

These paths, e.g. './tf_ckpts/ckpt-10', are not files on disk. Instead they are prefixes for an index file and one or more data files which contain the variable values. These prefixes are grouped together in a single checkpoint file ('./tf_ckpts/checkpoint') where the CheckpointManager saves its state.

          ls ./tf_ckpts        
checkpoint                   ckpt-8.data-00000-of-00001  ckpt-9.index
ckpt-10.data-00000-of-00001  ckpt-8.index
ckpt-10.index                ckpt-9.data-00000-of-00001
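If you want the most recent prefix programmatically rather than reading the checkpoint file yourself, tf.train.latest_checkpoint does the lookup. A small illustration using the directory above:

    latest = tf.train.latest_checkpoint('./tf_ckpts')
    print(latest)  # e.g. './tf_ckpts/ckpt-10'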

Loading mechanics

TensorFlow matches variables to checkpointed values by traversing a directed graph with named edges, starting from the object being loaded. Edge names typically come from attribute names in objects, for example the "l1" in self.l1 = tf.keras.layers.Dense(5). tf.train.Checkpoint uses its keyword argument names, as in the "step" in tf.train.Checkpoint(step=...).

The dependency graph from the example above looks like this:

Visualization of the dependency graph for the example training loop

The optimizer is in red, regular variables are in blue, and the optimizer slot variables are in orange. The other nodes, for example the one representing the tf.train.Checkpoint, are in black.

Slot variables are part of the optimizer's state, but are created for a specific variable. For example, the 'm' edges above correspond to momentum, which the Adam optimizer tracks for each variable. Slot variables are only saved in a checkpoint if the variable and the optimizer would both be saved, thus the dashed edges.
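As a quick illustration of slot variables (this assumes the TF2 tf.keras.optimizers.Optimizer API, which exposes get_slot, and that `opt` has already applied gradients so its slots exist):

    # Adam keeps an 'm' (momentum) and 'v' (second moment) slot per variable.
    m_for_kernel = opt.get_slot(net.l1.kernel, 'm')
    v_for_kernel = opt.get_slot(net.l1.kernel, 'v')
    print(m_for_kernel.shape, v_for_kernel.shape)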

Calling restore on a tf.train.Checkpoint object queues the requested restorations, restoring variable values as soon as there's a matching path from the Checkpoint object. For example, you can load just the bias from the model you defined above by reconstructing one path to it through the network and the layer.

    to_restore = tf.Variable(tf.zeros([5]))
    print(to_restore.numpy())  # All zeros
    fake_layer = tf.train.Checkpoint(bias=to_restore)
    fake_net = tf.train.Checkpoint(l1=fake_layer)
    new_root = tf.train.Checkpoint(net=fake_net)
    status = new_root.restore(tf.train.latest_checkpoint('./tf_ckpts/'))
    print(to_restore.numpy())  # This gets the restored value.

[0. 0. 0. 0. 0.]
[3.2460225 3.2595956 3.360168  4.5620303 4.827786 ]

The dependency graph for these new objects is a much smaller subgraph of the larger checkpoint you wrote above. It includes only the bias and a save counter that tf.train.Checkpoint uses to number checkpoints.

Visualization of a subgraph for the bias variable

restore returns a status object, which has optional assertions. All of the objects created in the new Checkpoint have been restored, so status.assert_existing_objects_matched passes.

          status.assert_existing_objects_matched()                  
<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7ff53842c150>        

There are many objects in the checkpoint which haven't matched, including the layer's kernel and the optimizer's variables. status.assert_consumed only passes if the checkpoint and the program match exactly, and would throw an exception here.
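A small sketch of that behaviour (assuming, as in recent TensorFlow versions, that the failed assertion surfaces as an AssertionError):

    try:
      status.assert_consumed()
    except AssertionError as e:
      print("Checkpoint not fully consumed:", e)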

Deferred restorations

Layer objects in TensorFlow may defer the creation of variables to their first call, when input shapes are available. For example, the shape of a Dense layer's kernel depends on both the layer's input and output shapes, so the output shape required as a constructor argument is not enough information to create the variable on its own. Since calling a Layer also reads the variable's value, a restore must happen between the variable's creation and its first use.

To support this idiom, tf.train.Checkpoint defers restores which don't yet have a matching variable.

    deferred_restore = tf.Variable(tf.zeros([1, 5]))
    print(deferred_restore.numpy())  # Not restored; still zeros
    fake_layer.kernel = deferred_restore
    print(deferred_restore.numpy())  # Restored

[[0. 0. 0. 0. 0.]]
[[4.4692097 4.6525683 4.7541327 4.786906  4.8251786]]

Manually inspecting checkpoints

tf.train.load_checkpoint returns a CheckpointReader that gives lower level access to the checkpoint contents. It contains mappings from each variable's key to the shape and dtype of each variable in the checkpoint. A variable's key is its object path, as in the graphs displayed above.

    reader = tf.train.load_checkpoint('./tf_ckpts/')
    shape_from_key = reader.get_variable_to_shape_map()
    dtype_from_key = reader.get_variable_to_dtype_map()

    sorted(shape_from_key.keys())

['_CHECKPOINTABLE_OBJECT_GRAPH',
 'iterator/.ATTRIBUTES/ITERATOR_STATE',
 'net/l1/bias/.ATTRIBUTES/VARIABLE_VALUE',
 'net/l1/bias/.OPTIMIZER_SLOT/optimizer/m/.ATTRIBUTES/VARIABLE_VALUE',
 'net/l1/bias/.OPTIMIZER_SLOT/optimizer/v/.ATTRIBUTES/VARIABLE_VALUE',
 'net/l1/kernel/.ATTRIBUTES/VARIABLE_VALUE',
 'net/l1/kernel/.OPTIMIZER_SLOT/optimizer/m/.ATTRIBUTES/VARIABLE_VALUE',
 'net/l1/kernel/.OPTIMIZER_SLOT/optimizer/v/.ATTRIBUTES/VARIABLE_VALUE',
 'optimizer/beta_1/.ATTRIBUTES/VARIABLE_VALUE',
 'optimizer/beta_2/.ATTRIBUTES/VARIABLE_VALUE',
 'optimizer/decay/.ATTRIBUTES/VARIABLE_VALUE',
 'optimizer/iter/.ATTRIBUTES/VARIABLE_VALUE',
 'optimizer/learning_rate/.ATTRIBUTES/VARIABLE_VALUE',
 'save_counter/.ATTRIBUTES/VARIABLE_VALUE',
 'step/.ATTRIBUTES/VARIABLE_VALUE']

So if you're interested in the value of net.l1.kernel you can get its value with the following code:

    key = 'net/l1/kernel/.ATTRIBUTES/VARIABLE_VALUE'

    print("Shape:", shape_from_key[key])
    print("Dtype:", dtype_from_key[key].name)

Shape: [1, 5]
Dtype: float32

It also provides a get_tensor method allowing you to inspect the value of a variable:

          reader.get_tensor(key)                  
array([[4.4692097, 4.6525683, 4.7541327, 4.786906 , 4.8251786]],
      dtype=float32)

Object tracking

Checkpoints save and restore the values of tf.Variable objects by "tracking" any variable or trackable object set in one of their attributes. When executing a save, variables are gathered recursively from all of the reachable tracked objects.

As with direct attribute assignments like self.l1 = tf.keras.layers.Dense(5), assigning lists and dictionaries to attributes will track their contents.

    save = tf.train.Checkpoint()
    save.listed = [tf.Variable(1.)]
    save.listed.append(tf.Variable(2.))
    save.mapped = {'one': save.listed[0]}
    save.mapped['two'] = save.listed[1]
    save_path = save.save('./tf_list_example')

    restore = tf.train.Checkpoint()
    v2 = tf.Variable(0.)
    assert 0. == v2.numpy()  # Not restored yet
    restore.mapped = {'two': v2}
    restore.restore(save_path)
    assert 2. == v2.numpy()

You may notice wrapper objects for lists and dictionaries. These wrappers are checkpointable versions of the underlying data structures. Just like the attribute-based loading, these wrappers restore a variable's value as soon as it's added to the container.

    restore.listed = []
    print(restore.listed)  # ListWrapper([])
    v1 = tf.Variable(0.)
    restore.listed.append(v1)  # Restores v1, from restore() in the previous cell
    assert 1. == v1.numpy()

ListWrapper([])        

Trackable objects include tf.train.Checkpoint, tf.Module and its subclasses (e.g. keras.layers.Layer and keras.Model), and recognized Python containers:

  • dict (and collections.OrderedDict)
  • list
  • tuple (and collections.namedtuple, typing.NamedTuple)

Other container types are not supported, including:

  • collections.defaultdict
  • set

All other Python objects are ignored, including:

  • int
  • string
  • float
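For example (a sketch; the './ignored_example' path and the attribute names are made up), a plain Python int attached to a Checkpoint is simply not saved, while a tf.Variable is:

    ckpt = tf.train.Checkpoint()
    ckpt.plain_int = 7                    # ignored: plain Python objects are not tracked
    ckpt.tracked_value = tf.Variable(7.)  # tracked and saved
    path = ckpt.save('./ignored_example')

    # Only 'tracked_value' (plus the save counter) appears in the checkpoint.
    print([name for name, _ in tf.train.list_variables(path)])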

Summary

TensorFlow objects provide an easy automatic mechanism for saving and restoring the values of the variables they use.


Source: https://www.tensorflow.org/guide/checkpoint
