synbols package

Submodules

synbols.data_io module

class synbols.data_io.H5Stack(file, name, n_samples, chunk_size=10, compression='gzip')

Bases: object

add(x)
synbols.data_io.add_splits(fd, split_dict, random_seed)
synbols.data_io.load_attributes_h5(file_path)

Load the dataset from h5py format

Parameters:
  • file_path – path to the hdf5 dataset
Returns:
  • attributes – list of length n_samples, containing a dictionary of attributes for each image
  • splits – dict of the different types of splits for this dataset. Each split is a binary array of shape (n_samples, n_subset) representing a specific partition.
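
As a minimal sketch of the split layout described above, the snippet below builds a binary matrix with one row per sample and one column per subset, so each row flags the subset a sample belongs to. The helpers make_split_mask and subset_indices are hypothetical and not part of synbols, plain Python lists stand in for arrays, and the ratios and seed are borrowed from the write_h5 defaults. The library's actual splitting logic may differ.

```python
import random


def make_split_mask(n_samples, ratios=(0.6, 0.2, 0.2), seed=42):
    """Return n_samples rows of 0/1 flags, one column per subset.

    Assumption: the ratios sum to 1, so every sample lands in one subset.
    """
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)

    # Cumulative end position of each subset in the shuffled order.
    bounds, total = [], 0.0
    for ratio in ratios:
        total += ratio
        bounds.append(round(total * n_samples))
    bounds[-1] = n_samples  # guard against floating-point drift

    mask = [[0] * len(ratios) for _ in range(n_samples)]
    start = 0
    for subset, end in enumerate(bounds):
        for idx in order[start:end]:
            mask[idx][subset] = 1
        start = end
    return mask


def subset_indices(mask, subset):
    """Indices of the samples flagged for the given subset column."""
    return [i for i, row in enumerate(mask) if row[subset] == 1]


mask = make_split_mask(10)
train_idx, valid_idx, test_idx = (subset_indices(mask, k) for k in range(3))
```

Reading a column of the matrix gives the members of one subset, and reading a row tells which subset a sample was assigned to.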
synbols.data_io.load_dataset_jpeg_sequential(file_path, max_samples=None)
synbols.data_io.load_h5(file_path)

Load the dataset from h5py format

Parameters:
  • file_path – path to the hdf5 dataset
Returns:
  • x – array of shape (n_samples, width, height, n_channels), containing the images
  • mask – array of shape (n_samples, width, height, n_symbols), containing the mask of each symbol in the image
  • attributes – list of length n_samples, containing a dictionary of attributes for each image
  • splits – dict of the different types of splits for this dataset. Each split is a list of masks, one for each subset.
synbols.data_io.load_minibatch_h5(file_path, indices)
synbols.data_io.load_npz(file_path)

Load the dataset from compressed numpy format (npz).

synbols.data_io.pack_dataset(generator)

Turn the output of a generator of (x, y) pairs into a numpy array containing the full dataset.

synbols.data_io.write_h5(file_path, dataset_generator, n_samples, split_function=None, ratios=(0.6, 0.2, 0.2), random_seed=42)
synbols.data_io.write_jpg_zip(directory, generator)

Write the dataset in a zipped directory using jpeg and json for each image.

synbols.data_io.write_npz(file_path, generator)

synbols.drawing module

class synbols.drawing.Camouflage(stroke_length=0.4, stroke_width=0.05, stroke_angle=0.7853981633974483, stroke_noise=0.02, n_stroke=500, seed=None)

Bases: synbols.drawing.RandomPattern

draw(ctxt)
set_as_source(ctxt)
to_json()
class synbols.drawing.Gradient(alpha=1, types=('radial', 'linear'), random_color=None, seed=None)

Bases: synbols.drawing.RandomPattern

Uses linear or radial gradients to render patterns.

set_as_source(ctxt)
class synbols.drawing.Image(symbols, resolution=(32, 32), background=<synbols.drawing.NoPattern object>, inverse_color=False, pixel_noise_scale=0.01, is_gray=False, max_contrast=True, seed=None)

Bases: object

High level class for generating an image with symbols, based on attributes.

symbols

a list of objects of type Symbol

resolution

a pair of integers describing the resolution of the image. Defaults to (32, 32).

background

an object of type Pattern for rendering the background of the image. Defaults to NoPattern.

inverse_color

Boolean, specifying if the colors should be inverted. Defaults to False.

pixel_noise_scale

The standard deviation of the pixel noise. Defaults to 0.01.

max_contrast

Boolean, specifying if the image contrast should be maximized after rendering. If True, the pixel values will be linearly mapped to the range [0, 1] within an image. Defaults to True.

seed

The random seed of an image. For the same seed, the same image will be rendered. Defaults to None.

add_symbol(symbol)
attribute_dict()
make_image()
make_mask()
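
The max_contrast and inverse_color attributes above describe simple per-image pixel transforms. The following is a minimal sketch of both, using a flat list of floats as a stand-in for an image. The function names are hypothetical, and the handling of a constant image as well as the order in which the library applies these steps are assumptions.

```python
def maximize_contrast(pixels):
    """Linearly map pixel values so the smallest becomes 0 and the largest 1."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # Assumption: a constant image is mapped to all zeros.
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]


def invert(pixels):
    """Invert pixel values that are already in [0, 1]."""
    return [1.0 - p for p in pixels]


row = [0.25, 0.5, 0.75]
stretched = maximize_contrast(row)  # [0.0, 0.5, 1.0]
inverted = invert(stretched)        # [1.0, 0.5, 0.0]
```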
class synbols.drawing.ImagePattern(root='/images', rotation=0, translation=0.0, crop=True, min_crop_size=0.2, seed=None)

Bases: synbols.drawing.RandomPattern

Uses natural images to render patterns.

Parameters:
  • root – str, Base path to search for images.
  • rotation – float, Maximum random rotation in radians, default 0.
  • translation – float, Maximum random translation in proportion, default 0.0.
  • crop – bool, Whether to take a random crop of the image or not, default True.
  • min_crop_size – float, Crop’s minimal proportion from the image, default 0.2.
  • seed – Optional[int], Random seed to use for transformation, default to None
draw(ctxt)
set_as_source(ctxt)
class synbols.drawing.MultiGradient(alpha=0.5, n_gradients=2, types=('radial', 'linear'), random_color=None, seed=None)

Bases: synbols.drawing.RandomPattern

Renders multiple gradient patterns with transparency.

draw(ctxt)
set_as_source(ctxt)
class synbols.drawing.NoPattern

Bases: synbols.drawing.Pattern

draw(ctxt)
set_as_source(ctxt)
class synbols.drawing.Pattern

Bases: object

Base class for all patterns

attribute_dict()
draw(ctxt)
set_as_source(ctxt)
surface(width, height)
class synbols.drawing.RandomPattern

Bases: synbols.drawing.Pattern

Base class for patterns using a seed.

attribute_dict()
class synbols.drawing.SolidColor(color=None)

Bases: synbols.drawing.Pattern

Uses a fixed color to render the pattern.

draw(ctxt)
set_as_source(ctxt)
class synbols.drawing.Symbol(alphabet, char, font, foreground, is_slant, is_bold, rotation, scale, translation)

Bases: object

Class containing attributes describing each symbol

alphabet

Object of type Alphabet

char

string of 1 or more characters in the image

font

string describing the font used to draw characters

foreground

object of type Pattern, used for the foreground of the symbol

is_slant

bool describing if char is italic or not

is_bold

bool describing if char is bold or not

rotation

float, rotation angle of the text

scale

float, scale of the text. A scale of 1 will have the longest extent of the symbol cover the whole image.

translation

relative (x, y) translation of the text. A translation in the range [-1, 1] will ensure that the symbol fits entirely in the image. Note if the scale i

attribute_dict()

Returns a dict of all attributes of the symbol.

draw(ctxt)
make_mask(resolution)

Creates a grey scale image corresponding to the mask of the symbol.
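
The translation attribute above is relative: values in [-1, 1] keep the symbol inside the image, so the absolute shift depends on how much room is left once the symbol is scaled. One way to picture this, as a sketch only, is to split the free space evenly on both sides of the symbol. The functions below are hypothetical, work on a single axis in image-normalized units, and do not reproduce the library's actual geometry code.

```python
def absolute_offset(translation, symbol_extent, image_size=1.0):
    """Convert a relative translation into an offset of the symbol's center.

    Assumption: the free space (image size minus symbol extent) is split
    evenly on both sides, so a relative value of +/-1 touches the border.
    """
    free_space = image_size - symbol_extent
    return translation * free_space / 2.0


def fits_in_image(translation, symbol_extent, image_size=1.0):
    """True if the translated symbol stays inside [0, image_size] on this axis."""
    center = image_size / 2.0 + absolute_offset(translation, symbol_extent, image_size)
    half = symbol_extent / 2.0
    return center - half >= 0.0 and center + half <= image_size
```

Under this assumption, a symbol covering half of the image with a translation of 1 just touches the border, while a translation of 1.5 pushes part of it outside the image.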

synbols.drawing.color_sampler(rng=<module 'numpy.random'>, brightness_range=(0, 1))
synbols.drawing.draw_symbol(ctxt, attributes)

Core function drawing the characters as described in attributes

Parameters:
  • ctxt – cairo context to draw the image
  • attributes – Object of type Symbol
Returns:
  • extent – rectangle containing the text in the coordinates of the context
  • extent_main_char – rectangle containing the central character in the coordinates of the context

synbols.generate module

synbols.generate.add_occlusion(attr_sampler, n_occlusion=None, occlusion_char=None, rotation=None, scale=None, translation=None, foreground=None)

Augment an attribute sampler to add occlusions over the other symbols.

Parameters:
  • attr_sampler – a callable returning an object of type drawing.Image.
  • n_occlusion – integer or a distribution over it. Specifies the number of occlusions to draw. Defaults to Uniform([1 .. 5]).
  • occlusion_char – string or a distribution over it. Specifies the unicode symbols used to make occlusions. Defaults to Uniform(['■', '▲', '●']).
  • rotation – float or a distribution over it. Rotation of the symbol in radians, in the range [-pi .. pi]. Defaults to Uniform([-pi .. pi]).
  • scale – float or a distribution over it. Scale of the symbol. A scale of 1 will have either the width or height cover the whole image. Defaults to 0.3 * exp(Normal(0, 0.1)).
  • translation – a pair of floats or a distribution over it. Numbers in [-1 .. 1] will make sure the symbol stays within the image, i.e. the actual translation depends on the remaining space after the symbol is scaled. Defaults to Uniform(-1.5, 1.5).
  • foreground – object of type drawing.Pattern or a distribution over it. Defines how the foreground will be rendered. Defaults to drawing.Gradient.
Returns:

A callable taking an optional seed as an argument and returning an object of type drawing.Image.
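
The wrapping behaviour described above (take a sampler, return a new sampler that adds occlusions) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: images are represented by plain dicts with a "symbols" list instead of drawing.Image objects, the standard-library random module replaces the library's random number generator, and add_occlusion_sketch and base_sampler are hypothetical names.

```python
import math
import random

OCCLUSION_CHARS = ["■", "▲", "●"]


def add_occlusion_sketch(attr_sampler, n_occlusion=None):
    """Wrap attr_sampler so that each sampled image gains occlusion symbols.

    Hypothetical stand-in: images are dicts with a "symbols" list rather
    than drawing.Image objects.
    """

    def sampler(seed=None):
        rng = random.Random(seed)
        image = attr_sampler(seed)
        # Assumption: Uniform([1 .. 5]) includes both endpoints.
        n = rng.randint(1, 5) if n_occlusion is None else n_occlusion
        for _ in range(n):
            image["symbols"].append({
                "char": rng.choice(OCCLUSION_CHARS),
                "rotation": rng.uniform(-math.pi, math.pi),
                "scale": 0.3 * math.exp(rng.gauss(0.0, 0.1)),
                "translation": (rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5)),
                "is_occlusion": True,
            })
        return image

    return sampler


def base_sampler(seed=None):
    """Hypothetical base sampler producing a single symbol."""
    return {"symbols": [{"char": "a", "is_occlusion": False}]}


occluded_sampler = add_occlusion_sketch(base_sampler)
image = occluded_sampler(seed=0)
```

Because the wrapper returns a callable with the same shape as its input, several such augmentations can be chained before the sampler is handed to a dataset generator.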

synbols.generate.basic_attribute_sampler(alphabet=None, char=None, font=None, background=None, foreground=None, is_slant=None, is_bold=None, rotation=None, scale=None, translation=None, inverse_color=None, max_contrast=None, pixel_noise_scale=None, resolution=(32, 32), is_gray=False, n_symbols=1)

Returns a function that generates a new Image object on every call.

This function is the high level interface for defining a new distribution over images. On every call, it will return a drawing.Image object, containing all the attributes needed to render the final image into a numpy array. All arguments to this function have a proper default value. When no arguments are passed, this is referred to as the “default” synbols dataset.

All arguments can be either a constant, a callable, or None. If None is passed, the default distribution is used. A callable can be used to define a distribution over the specific argument. The callable must take one argument: the random number generator.
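
The constant / callable / None convention described above can be pictured with a small resolver. This is only a sketch: resolve and default_rotation are hypothetical helpers, and the standard-library random.Random is used as the random number generator, which may differ from the generator type the library passes to callables.

```python
import random


def resolve(value, default_sampler, rng):
    """Turn an argument into a concrete value for one image."""
    if value is None:
        return default_sampler(rng)  # fall back to the default distribution
    if callable(value):
        return value(rng)            # user-defined distribution
    return value                     # constant


def default_rotation(rng):
    """Hypothetical default distribution for a rotation argument."""
    return rng.gauss(0.0, 0.3)


rng = random.Random(0)
fixed = resolve(0.5, default_rotation, rng)
custom = resolve(lambda r: r.choice([0.0, 1.0]), default_rotation, rng)
sampled = resolve(None, default_rotation, rng)
```

Here fixed is always 0.5, custom is drawn from the user-supplied callable, and sampled comes from the default distribution.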

Parameters:
  • alphabet – Object of type utils.Alphabet or a distribution over it. An alphabet can be created easily using Language.get_alphabet(). This argument is only used to specify the default distributions over char and font. If these arguments are specified, alphabet is ignored.
  • char – string or a distribution over strings. Defaults to Uniform(alphabet.symbols).
  • font – string or a distribution over strings. Defaults to Uniform(alphabet.fonts).
  • background – object of type drawing.Pattern or a distribution over it. Defines how the background will be rendered. Defaults to drawing.Gradient.
  • foreground – object of type drawing.Pattern or a distribution over it. Defines how the foreground will be rendered. Defaults to drawing.Gradient.
  • is_slant – bool or a distribution over bool. Defines if the character is drawn italic or normal. For wider support, this is done using a 2D transformation instead of relying on the font’s italic. Defaults to Uniform{True, False}.
  • is_bold – bool or a distribution over bool. Whether the character is rendered in bold or not. Note: some fonts do not support bold, in which case it will have no effect. To obtain a collection of fonts that support bold, use Language.get_alphabet(… support_bold=True).
  • rotation – float or a distribution over it. Rotation of the symbol in radians, in the range [-pi .. pi]. Defaults to Normal(0, 0.3).
  • scale – float or a distribution over it. Scale of the symbol. A scale of 1 will have either the width or height cover the whole image. Defaults to 0.6 * exp(Normal(0, 0.2)).
  • translation – a pair of floats or a distribution over it. Numbers in [-1 .. 1] will make sure the symbol stays within the image, i.e. the actual translation depends on the remaining space after the symbol is scaled. Defaults to Uniform(-1, 1).
  • inverse_color – bool or a distribution over it. If True, returns 1 - pixel_value to invert the value of all pixels. Defaults to Uniform([True, False]).
  • max_contrast – bool or a distribution over it. If True, pixel values will be rescaled to span 0..1 inside each image. Defaults to True.
  • pixel_noise_scale – float or a distribution over it. The standard deviation of the pixel noise. Defaults to 0.01.
  • resolution – A pair of integers. Defines the resolution of the image. Defaults to (32, 32).
  • is_gray – bool. If True, the color channels are averaged into a single channel. Defaults to False.
  • n_symbols – integer or a distribution over it. Number of symbols to render in the image. All arguments that are distributions will be sampled multiple times to provide different symbols. Defaults to 1. Note: if the number of symbols is variable, you will have to provide a proper mask_aggregator when calling dataset_generator, e.g. flatten_mask.
Returns:

A callable taking an optional seed as an argument and returning an object of type drawing.Image.
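
To make the documented defaults concrete, here is a minimal sketch that draws per-symbol attributes from the distributions listed above and samples them once per symbol when n_symbols is itself a distribution. The function names are hypothetical, dicts stand in for drawing.Symbol objects, the standard-library random module replaces the library's generator, and the second argument of Normal is assumed to be a standard deviation.

```python
import math
import random


def sample_symbol(rng):
    """Draw one symbol's attributes from the documented defaults."""
    return {
        # Assumption: the second argument of Normal is a standard deviation.
        "rotation": rng.gauss(0.0, 0.3),
        "scale": 0.6 * math.exp(rng.gauss(0.0, 0.2)),
        "translation": (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)),
        "is_slant": rng.choice([True, False]),
    }


def sample_image_attributes(n_symbols=1, seed=None):
    """Resolve n_symbols, then sample every symbol independently."""
    rng = random.Random(seed)
    n = n_symbols(rng) if callable(n_symbols) else n_symbols
    return [sample_symbol(rng) for _ in range(n)]


symbols = sample_image_attributes(n_symbols=lambda rng: rng.randint(3, 10), seed=1)
```

The exponential keeps the scale strictly positive and centers it around 0.6, and reusing one seeded generator makes the whole draw reproducible for a given seed.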

synbols.generate.dataset_generator(attr_sampler, n_samples, mask_aggregator=None, dataset_seed=None)

High level function generating the dataset from an attribute sampler.

synbols.generate.flatten_mask(masks)
synbols.generate.flatten_mask_except_first(masks)
synbols.generate.generate_and_write_dataset(file_path, attr_sampler, n_samples, preview_shape=(10, 10), seed=None)

Calls the attribute sampler n_samples times to generate a dataset and saves it to disk.

Parameters:
  • file_path – the destination of the dataset; an extension .h5py will be automatically added.
  • attr_sampler – a callable returning objects of type drawing.Image.
  • n_samples – integer specifying the number of samples required.
  • preview_shape – pair of integers or None. Specifies the size of the image grid to render a preview. The png will be saved alongside the dataset.
  • seed – integer or None. Specifies the seed of the random number generator.

synbols.generate.generate_char_grid(language, n_char, n_font, seed=None, **kwargs)

Generate a dense grid of n_char x n_font. Mainly for visualization purposes.

synbols.generate.make_preview(generator, file_name, n_row=10, n_col=10)

Augment a generator to save a preview when the first n_row * n_col images are generated.
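
The idea of augmenting a generator so that a side effect fires once the first n_row * n_col items have been produced can be sketched with a pass-through generator. The name with_preview and the on_preview callback are hypothetical. In this sketch, a callback replaces writing an image file, and no preview is produced if the stream ends early, which may differ from the library's behaviour.

```python
def with_preview(generator, on_preview, n_row=10, n_col=10):
    """Yield items unchanged; hand the first n_row * n_col items to on_preview."""
    n_needed = n_row * n_col
    buffer = []
    for item in generator:
        if buffer is not None:
            buffer.append(item)
            if len(buffer) == n_needed:
                on_preview(buffer)  # e.g. tile the images and save a PNG
                buffer = None       # stop collecting afterwards
        yield item


previews = []
stream = with_preview(iter(range(10)), previews.append, n_row=2, n_col=3)
items = list(stream)
```

The consumer sees exactly the same items as without the wrapper, and the callback receives the first six of them once.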

synbols.generate.rand_seed(rng)
synbols.generate.text_generator(char_list, seed=None, **kwargs)

Generate a string of synbols. Mainly for advertisement purposes.

synbols.predefined_datasets module

synbols.predefined_datasets.all_chars(n_samples, seed=None, **kwarg)

Combines the symbols of all languages (up to 200 per language). Note: some fonts may appear rarely.

synbols.predefined_datasets.generate_balanced_font_chars_dataset(n_samples, seed=None, **kwarg)

Samples uniformly from all fonts (max 200 per alphabet) or uniformly from all symbols (max 200 per alphabet) with probability 50%.

synbols.predefined_datasets.generate_camouflage_dataset(n_samples, language='english', texture='camouflage', seed=None, **kwarg)

Generate a dataset where the pixel distribution is the same for the foreground and background.

synbols.predefined_datasets.generate_counting_dataset(n_samples, language='english', resolution=(128, 128), n_symbols=None, scale_variation=0.5, seed=None, **kwarg)

Generate 3-10 symbols at various scales. Samples ‘a’ with prob 70% or a Latin lowercase letter otherwise.
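
The character distribution described above is a simple two-way mixture. The sketch below illustrates it with the standard library. The function name is hypothetical, the 3-10 range is assumed to include both endpoints, and the fallback is assumed to draw from all 26 letters (including 'a'), which the description does not specify.

```python
import random
import string


def sample_counting_chars(rng, p_target=0.7, target="a"):
    """Sample the characters of one counting image."""
    # Assumption: "3-10 symbols" includes both endpoints.
    n_symbols = rng.randint(3, 10)
    chars = []
    for _ in range(n_symbols):
        if rng.random() < p_target:
            chars.append(target)
        else:
            # Assumption: the fallback draws from all 26 letters, 'a' included.
            chars.append(rng.choice(string.ascii_lowercase))
    return chars


rng = random.Random(0)
chars = sample_counting_chars(rng)
n_target = chars.count("a")  # one possible counting label (an assumption)
```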

synbols.predefined_datasets.generate_counting_dataset_crowded(n_samples, seed=None, **kwargs)

Generate 30-50 symbols at fixed scale. Samples ‘a’ with prob 70% or a Latin lowercase letter otherwise.

synbols.predefined_datasets.generate_counting_dataset_scale_fix(n_samples, seed=None, **kwargs)

Generate 3-10 symbols at fixed scale. Samples ‘a’ with prob 70% or a Latin lowercase letter otherwise.

synbols.predefined_datasets.generate_default_dataset(n_samples, language='english', seed=None, **kwarg)

Generate the default dataset, using gradients as foreground and background.

synbols.predefined_datasets.generate_korean_1k_dataset(n_samples, seed=None, **kwarg)

Uses the first 1000 Korean symbols.

synbols.predefined_datasets.generate_large_translation(n_samples, language='english', seed=None, **kwarg)

Synbols are translated beyond the border of the image to create a cropping effect. Scale is fixed to 0.5.

synbols.predefined_datasets.generate_many_small_occlusions(n_samples, language='english', seed=None, **kwarg)

Add small occlusions on all images. The number of occlusions is sampled uniformly in [0, 5).

synbols.predefined_datasets.generate_natural_images_dataset(n_samples, language='english', seed=None, **kwargs)

Same as default dataset, but uses natural images as foreground and background.

synbols.predefined_datasets.generate_non_camou_bw_dataset(n_samples, language='english', seed=None, **kwargs)

Generate a black and white dataset with the same attribute distribution as the camouflage dataset.

synbols.predefined_datasets.generate_non_camou_shade_dataset(n_samples, language='english', seed=None, **kwargs)

Generate a gradient foreground and background dataset with same attribute distribution as the camouflage dataset.

synbols.predefined_datasets.generate_pixel_noise(n_samples, language='english', seed=None, **kwarg)

Add large pixel noise with probability 0.5.

synbols.predefined_datasets.generate_plain_dataset(n_samples, language='english', seed=None, **kwargs)

Generate white on black, centered symbols. The only factors of variation are font and char.

synbols.predefined_datasets.generate_segmentation_dataset(n_samples, language='english', resolution=(128, 128), seed=None, **kwarg)

Generate 3-10 symbols at various scales, rotations and translations (no bold).

synbols.predefined_datasets.generate_solid_bg_dataset(n_samples, language='english', seed=None, **kwarg)

Same as the default dataset, but uses white on black.

synbols.predefined_datasets.generate_some_large_occlusions(n_samples, language='english', seed=None, **kwarg)

With probability 20%, add a large occlusion over the existing symbol.

synbols.predefined_datasets.generate_tiny_dataset(n_samples, language='english', seed=None, **kwarg)

Generate a dataset of 8x8 resolution in gray scale with scale of 1 and minimal variations.

synbols.predefined_datasets.less_variations(n_samples, language='english', seed=None, **kwarg)

Fewer variations in scale and rotation. Also, no bold and no italic. This makes for a more accessible font classification task.

synbols.predefined_datasets.missing_symbol_dataset(n_samples, language='english', seed=None, **kwarg)

With 10% probability, no symbols are drawn

synbols.utils module

class synbols.utils.Alphabet(name, fonts, symbols)

Bases: object

Combines fonts and symbols for a given language.

class synbols.utils.Language(locale_file, font_blacklist_dir)

Bases: object

get_alphabet(standard=True, auxiliary=True, lower=True, upper=False, support_bold=False, include_blacklisted_fonts=False)
synbols.utils.flatten_attr(attr, ctxt=None)
synbols.utils.language_map_statistics()
synbols.utils.load_all_languages(override_locale_path=None)

Loads all supported languages. Returns a dictionary of Language objects indexed by their name.

synbols.utils.make_img_grid(x, y, h_axis='char', v_axis='font', n_row=20, n_col=40)

synbols.visualization module

synbols.visualization.plot_dataset(x, y, h_axis='char', v_axis='font', n_row=20, n_col=40, hide_axis=False)

Module contents