icarus.scenarios package

Submodules

icarus.scenarios.algorithms module

Various algorithms used for optimal cache placement.

pam(distances, k, n_iter=10)[source]

Compute k-medoids using the PAM algorithm

Parameters:

distances : 2-d NumPy array

Array of distances between points

k : int

Number of clusters

n_iter : int

Number of iterations to repeat. Each repetition is executed using a different initial random assignment. Repeating the experiment allows reaching different local optima, possibly yielding a better solution.

Notes

Implementation based on: https://github.com/salspaugh/machine_learning/blob/master/clustering/kmedoids.py
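The PAM swap heuristic can be illustrated with a minimal NumPy sketch. This is not the module's implementation, just an assumption-laden illustration of the same idea: random restarts, nearest-medoid assignment, and greedy medoid/non-medoid swaps until no swap improves the cost.

```python
import numpy as np

def pam_sketch(distances, k, n_iter=10, seed=0):
    """Illustrative k-medoids (PAM) on a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    n = distances.shape[0]
    best_cost, best_medoids = np.inf, None
    for _ in range(n_iter):
        # each repetition starts from a different random medoid set
        medoids = rng.choice(n, size=k, replace=False)
        while True:
            # assign each point to its nearest medoid and compute total cost
            labels = np.argmin(distances[:, medoids], axis=1)
            cost = distances[np.arange(n), medoids[labels]].sum()
            improved = False
            # try swapping each medoid with each non-medoid point
            for i in range(k):
                for cand in range(n):
                    if cand in medoids:
                        continue
                    trial = medoids.copy()
                    trial[i] = cand
                    t_labels = np.argmin(distances[:, trial], axis=1)
                    t_cost = distances[np.arange(n), trial[t_labels]].sum()
                    if t_cost < cost:
                        medoids, cost, improved = trial, t_cost, True
            if not improved:
                break   # local optimum reached for this restart
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    return best_medoids, best_cost
```

The result is only locally optimal per restart, which is why repeating with different initial assignments helps.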

extract_cluster_level_topology(topology)[source]

Build a cluster-level topology.

Each node in the topology must have the ‘cluster’ attribute

Parameters:

topology : Topology

The router-level topology

Returns:

topology : Topology

The cluster-level topology

Notes

  • Each router must have a cache deployed
  • All sources and receivers must have a single attachment point with a cache
  • Each node must be labelled with the cluster attribute
deploy_clusters(topology, clusters, assign_src_rcv=True)[source]

Annotate topology with cluster information

This function checks that all ICR candidate nodes are assigned to exactly one cluster.

If assign_src_rcv is True, then it also labels source and receiver nodes to the closest cluster.

This function assumes that:
  • each node of the topology is either an icr_candidate, a source or a receiver
  • each source and receiver must have degree equal to 1
Parameters:

topology : Topology

The topology onto which deploy clusters

clusters : list of sets

Router-cluster assignment. Each element of a list is a set of node identifiers. Nodes in the same set belong to the same cluster. The length of the list therefore corresponds to the number of clusters.

assign_src_rcv : bool, optional

If True, the function labels source and receiver nodes with the cluster label of the router they are attached to.
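The clusters format and the labelling logic can be sketched in pure Python. The real function operates on an fnss Topology; the names and dict-based representation below are hypothetical stand-ins:

```python
def deploy_clusters_sketch(icr_candidates, attachment, clusters, assign_src_rcv=True):
    """Label each ICR candidate with its cluster index; optionally propagate
    the label to degree-1 sources/receivers via their attachment point.

    clusters is a list of sets of node identifiers, one set per cluster.
    attachment maps each source/receiver to its single neighbouring router.
    """
    cluster_label = {}
    for idx, cluster in enumerate(clusters):
        for node in cluster:
            cluster_label[node] = idx
    # every ICR candidate must belong to exactly one cluster
    if set(cluster_label) != set(icr_candidates):
        raise ValueError("clusters must cover exactly the ICR candidates")
    if sum(len(c) for c in clusters) != len(cluster_label):
        raise ValueError("a node appears in more than one cluster")
    if assign_src_rcv:
        for node, neighbor in attachment.items():
            # source/receiver inherits the cluster of its attachment router
            cluster_label[node] = cluster_label[neighbor]
    return cluster_label
```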

compute_clusters(topology, k, distance='delay', nbunch=None, n_iter=10)[source]

Cluster nodes of a topology so as to minimize the intra-cluster latency.

This function assumes that every link is labelled with latencies and performs clustering using the k-medoids method with the PAM algorithm.

Parameters:

topology : Topology

The topology

k : int

The number of clusters

distance : str, optional

The link metric used to represent distance between nodes. If None, hop count is used instead

n_iter : int, optional

The number of iterations

compute_p_median(distances, p, n_iter=20)[source]

Compute p-median solution using the Adjusted Vertex Substitution (AVS) algorithm.

Parameters:

distances : dict of dicts

Distance between nodes

p : int

Number of facilities

icarus.scenarios.cacheplacement module

Cache placement strategies

This module provides algorithms for performing cache placement, i.e., given a cumulative cache size and a topology where each possible node candidate is labelled, these functions deploy caching space to the nodes of the topology.

uniform_cache_placement(topology, cache_budget, **kwargs)[source]

Places cache budget uniformly across cache nodes.

Parameters:

topology : Topology

The topology object

cache_budget : int

The cumulative cache budget

degree_centrality_cache_placement(topology, cache_budget, **kwargs)[source]

Places cache budget proportionally to the degree of the node.

Parameters:

topology : Topology

The topology object

cache_budget : int

The cumulative cache budget

betweenness_centrality_cache_placement(topology, cache_budget, **kwargs)[source]

Places cache budget proportionally to the betweenness centrality of the node.

Parameters:

topology : Topology

The topology object

cache_budget : int

The cumulative cache budget
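Both the degree-based and betweenness-based strategies follow the same pattern: split the cumulative budget proportionally to a per-node metric. A minimal sketch with hypothetical names and a plain dict in place of a Topology:

```python
def proportional_cache_placement_sketch(centrality, cache_budget):
    """Split cache_budget across nodes proportionally to a centrality metric.

    centrality maps node -> metric value (e.g. degree or betweenness).
    """
    total = sum(centrality.values())
    # integer cache sizes; rounding may leave a small unallocated remainder
    return {v: int(cache_budget * c / total) for v, c in centrality.items()}
```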

uniform_consolidated_cache_placement(topology, cache_budget, spread=0.5, metric_dict=None, target='top', **kwargs)[source]

Consolidate caches in nodes with top centrality.

Unlike other cache placement strategies, which assign cache space to all nodes proportionally to their centrality, this strategy places equally sized caches on a set of selected nodes.

Parameters:

topology : Topology

The topology object

cache_budget : int

The cumulative cache budget

spread : float [0, 1], optional

The spread factor. The greater it is, the more the cache budget is spread among nodes. If it is 1, all candidate nodes are assigned a cache; if it is 0, only the node with the highest/lowest centrality is assigned a cache

metric_dict : dict, optional

The centrality metric according to which nodes are selected. If not specified, betweenness centrality is selected.

target : (“top” | “bottom”), optional

The subsection of the ranked nodes on which to deploy caches.
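The spread/target mechanics can be sketched as follows (hypothetical names, plain dicts; the real function takes a Topology and defaults to betweenness centrality):

```python
def consolidated_cache_placement_sketch(metric, cache_budget, spread=0.5, target='top'):
    """Deploy equally sized caches on a fraction `spread` of the nodes,
    ranked by `metric` from the top or the bottom."""
    ranked = sorted(metric, key=metric.get, reverse=(target == 'top'))
    # at least one node always gets a cache, even when spread == 0
    n = max(1, int(round(len(ranked) * spread)))
    cache_size = cache_budget // n
    return {v: cache_size for v in ranked[:n]}
```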

random_cache_placement(topology, cache_budget, n_cache_nodes, seed=None, **kwargs)[source]

Deploy caching nodes randomly

Parameters:

topology : Topology

The topology object

cache_budget : int

The cumulative cache budget

n_cache_nodes : int

The number of caching nodes to deploy

optimal_median_cache_placement(topology, cache_budget, n_cache_nodes, hit_ratio, weight='delay', **kwargs)[source]

Deploy caching nodes in locations that minimize overall latency, assuming a partitioned strategy (a la Google Global Cache). Under this strategy, a set of caching nodes is deployed in the network and each receiver is mapped to one and only one caching node. Requests from a receiver are always sent to its designated caching node. In case of a cache miss, requests are forwarded to the original source.

This placement problem can be mapped to the p-median location-allocation problem. This function solves it using the vertex substitution heuristic, which works much like the k-medoids PAM algorithm and is in turn similar to k-means clustering. The result is guaranteed to be only locally, not globally, optimal.

Parameters:

topology : Topology

The topology object

cache_budget : int

The cumulative cache budget

n_cache_nodes : int

The number of caching nodes to deploy

hit_ratio : float

The expected cache hit ratio of a single cache

weight : str

The weight attribute

Notes

This placement assumes that all receivers have degree = 1 and are connected to an ICR candidate node. It also assumes that contents are uniformly assigned to sources.

optimal_hashrouting_cache_placement(topology, cache_budget, n_cache_nodes, hit_ratio, weight='delay', **kwargs)[source]

Deploy caching nodes for hashrouting in optimized locations

Parameters:

topology : Topology

The topology object

cache_budget : int

The cumulative cache budget

n_cache_nodes : int

The number of caching nodes to deploy

hit_ratio : float

The expected global cache hit ratio

weight : str, optional

The weight attribute. Default is ‘delay’

clustered_hashrouting_cache_placement(topology, cache_budget, n_clusters, policy, distance='delay', **kwargs)[source]

Deploy caching nodes for hashrouting with clusters

Parameters:

topology : Topology

The topology object

cache_budget : int

The cumulative cache budget

n_clusters : int

The number of clusters

policy : str (node_const | cluster_const)

The cache allocation policy, either node_const or cluster_const

distance : str

The attribute used to quantify distance between pairs of nodes. Default is ‘delay’
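A plausible reading of the two policy names, sketched below, is that node_const gives every caching node the same cache size while cluster_const gives every cluster the same aggregate size. This interpretation is an assumption, not confirmed by the source:

```python
def clustered_cache_sizes_sketch(clusters, cache_budget, policy):
    """ASSUMED semantics of the two policies:
    node_const    -> every node gets the same cache size;
    cluster_const -> every cluster gets the same aggregate size,
                     split evenly among its nodes."""
    if policy == 'node_const':
        size = cache_budget // sum(len(c) for c in clusters)
        return {v: size for cluster in clusters for v in cluster}
    if policy == 'cluster_const':
        per_cluster = cache_budget // len(clusters)
        return {v: per_cluster // len(cluster)
                for cluster in clusters for v in cluster}
    raise ValueError("unknown policy: %s" % policy)
```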

icarus.scenarios.contentplacement module

Content placement strategies.

This module contains functions to decide the allocation of content objects to source nodes.

uniform_content_placement(topology, contents, seed=None)[source]

Places content objects on source nodes randomly, following a uniform distribution.

Parameters:

topology : Topology

The topology object

contents : iterable

Iterable of content objects

source_nodes : list

List of nodes of the topology which are content sources

Returns:

cache_placement : dict

Dictionary mapping content objects to source nodes

Notes

A deterministic placement of objects (e.g., for reproducing results) can be achieved by using a fixed seed value
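The core of uniform placement, including the fixed-seed reproducibility note above, can be sketched with the standard library (hypothetical name; the real function also takes a Topology):

```python
import random

def uniform_content_placement_sketch(contents, source_nodes, seed=None):
    """Assign each content object to a source chosen uniformly at random.

    A fixed seed makes the placement deterministic and reproducible.
    """
    rng = random.Random(seed)
    return {c: rng.choice(source_nodes) for c in contents}
```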

weighted_content_placement(topology, contents, source_weights, seed=None)[source]

Places content objects on source nodes randomly according to the weight of the source node.

Parameters:

topology : Topology

The topology object

contents : iterable

Iterable of content objects

source_weights : dict

Dict mapping the nodes of the topology which are content sources to the weight according to which the content placement decision is made.

Returns:

cache_placement : dict

Dictionary mapping content objects to source nodes

Notes

A deterministic placement of objects (e.g., for reproducing results) can be achieved by using a fixed seed value
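Weighted placement differs from the uniform case only in how the source is drawn. A minimal sketch under the same hypothetical-naming caveat:

```python
import random

def weighted_content_placement_sketch(contents, source_weights, seed=None):
    """Assign each content to a source with probability proportional
    to that source's weight."""
    rng = random.Random(seed)
    nodes = list(source_weights)
    weights = [source_weights[v] for v in nodes]
    return {c: rng.choices(nodes, weights=weights)[0] for c in contents}
```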

icarus.scenarios.topology module

Functions for creating or importing topologies for experiments.

To create a custom topology, create a function returning an instance of the IcnTopology class. An IcnTopology is simply a subclass of a Topology class provided by FNSS.

A valid ICN topology must have the following attributes:
  • Each node must have one stack among: source, receiver, router
  • The topology must have an attribute called icr_candidates which is a set of router nodes on which a cache may be possibly deployed. Caches are not deployed directly at topology creation, instead they are deployed by a cache placement algorithm.
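The required attributes can be illustrated with a pure-dict stand-in for an IcnTopology (the real class subclasses fnss Topology; this sketch only shows the shape of a valid ICN topology, with a path as the example):

```python
def icn_path_topology_sketch(n):
    """Dict-based stand-in for an ICN path topology: receiver at node 0,
    source at node n-1, routers (all ICR candidates) in between."""
    # each node carries exactly one stack: source, receiver or router
    stacks = {0: ('receiver', {}), n - 1: ('source', {})}
    for v in range(1, n - 1):
        stacks[v] = ('router', {})
    return {
        'stacks': stacks,
        'edges': [(v, v + 1) for v in range(n - 1)],
        # caches are NOT deployed here; a placement algorithm picks among these
        'icr_candidates': set(range(1, n - 1)),
    }
```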
class IcnTopology(data=None, name='', **kwargs)[source]

Bases: fnss.topologies.topology.Topology

Class modelling an ICN topology

An ICN topology is a simple FNSS Topology with additional methods that return sets of caching nodes, sources and receivers.

Attributes

name

Methods

add_cycle(nodes, **attr) Add a cycle.
add_edge(u, v[, attr_dict]) Add an edge between u and v.
add_edges_from(ebunch[, attr_dict]) Add all the edges in ebunch.
add_node(n[, attr_dict]) Add a single node n and update node attributes.
add_nodes_from(nodes, **attr) Add multiple nodes.
add_path(nodes, **attr) Add a path.
add_star(nodes, **attr) Add a star.
add_weighted_edges_from(ebunch[, weight]) Add all the edges in ebunch as weighted edges with specified weights.
adjacency_iter() Return an iterator of (node, adjacency dict) tuples for all nodes.
adjacency_list() Return an adjacency list representation of the graph.
adjlist_dict_factory alias of dict
applications() Return a dictionary of all applications deployed, keyed by node
buffers() Return a dictionary of all buffer sizes, keyed by interface
cache_nodes() Return a dictionary mapping nodes with a cache and respective cache
capacities() Return a dictionary of all link capacities, keyed by link
clear() Remove all nodes and edges from the graph.
copy() Return a copy of the topology.
degree([nbunch, weight]) Return the degree of a node or nodes.
degree_iter([nbunch, weight]) Return an iterator for (node, degree).
delays() Return a dictionary of all link delays, keyed by link
edge_attr_dict_factory alias of dict
edges([nbunch, data, default]) Return a list of edges.
edges_iter([nbunch, data, default]) Return an iterator over the edges.
get_edge_data(u, v[, default]) Return the attribute dictionary associated with edge (u,v).
has_edge(u, v) Return True if the edge (u,v) is in the graph.
has_node(n) Return True if the graph contains the node n.
is_directed() Return True if graph is directed, False otherwise.
is_multigraph() Return True if graph is a multigraph, False otherwise.
nbunch_iter([nbunch]) Return an iterator of nodes contained in nbunch that are also in the graph.
neighbors(n) Return a list of the nodes connected to the node n.
neighbors_iter(n) Return an iterator over all neighbors of node n.
node_dict_factory alias of dict
nodes([data]) Return a list of the nodes in the graph.
nodes_iter([data]) Return an iterator over the nodes.
nodes_with_selfloops() Return a list of nodes with self loops.
number_of_edges([u, v]) Return the number of edges between two nodes.
number_of_nodes() Return the number of nodes in the graph.
number_of_selfloops() Return the number of selfloop edges.
order() Return the number of nodes in the graph.
receivers() Return a set of receiver nodes
remove_edge(u, v) Remove the edge between u and v.
remove_edges_from(ebunch) Remove all edges specified in ebunch.
remove_node(n) Remove node n.
remove_nodes_from(nodes) Remove multiple nodes.
selfloop_edges([data, default]) Return a list of selfloop edges.
size([weight]) Return the number of edges.
sources() Return a set of source nodes
stacks() Return a dictionary of all node stacks, keyed by node
subgraph(nbunch) Return the subgraph induced on nodes in nbunch.
to_directed() Return a directed representation of the topology.
to_undirected() Return an undirected copy of the topology.
weights() Return a dictionary of all link weights, keyed by link
cache_nodes()[source]

Return a dictionary mapping nodes with a cache and respective cache size

Returns:

cache_nodes : dict

Dictionary mapping node identifiers and cache size

receivers()[source]

Return a set of receiver nodes

Returns:

receivers : set

Set of receiver nodes

sources()[source]

Return a set of source nodes

Returns:

sources : set

Set of source nodes

topology_tree(k, h, delay=1, **kwargs)[source]

Returns a tree topology, with a source at the root, receivers at the leaves and caches at all intermediate nodes.

Parameters:

h : int

The height of the tree

k : int

The branching factor of the tree

delay : float

The link delay in milliseconds

Returns:

topology : IcnTopology

The topology object

topology_path(n, delay=1, **kwargs)[source]

Return a path topology with a receiver at node 0 and a source at node ‘n-1’

Parameters:

n : int (>=3)

The number of nodes

delay : float

The link delay in milliseconds

Returns:

topology : IcnTopology

The topology object

topology_ring(n, delay_int=1, delay_ext=5, **kwargs)[source]

Returns a ring topology

This topology comprises a ring of n routers, each attached to a receiver. In addition, one router is attached to a source. The topology therefore has 2n + 1 nodes in total.

It models a metro ring network, with many receivers and a single source towards the core network.

Parameters:

n : int

The number of routers in the ring

delay_int : float

The internal link delay in milliseconds

delay_ext : float

The external link delay in milliseconds

Returns:

topology : IcnTopology

The topology object

topology_mesh(n, m, delay_int=1, delay_ext=5, **kwargs)[source]

Returns a mesh topology

This topology comprises a mesh of n routers, each attached to a receiver. In addition, m routers are each attached to a source. The topology therefore has 2n + m nodes in total.

Parameters:

n : int

The number of routers in the mesh

m : int

The number of sources

delay_int : float

The internal link delay in milliseconds

delay_ext : float

The external link delay in milliseconds

Returns:

topology : IcnTopology

The topology object

topology_geant(**kwargs)[source]

Return a scenario based on GEANT topology

Parameters:

seed : int, optional

The seed used for random number generation

Returns:

topology : fnss.Topology

The topology object

topology_tiscali(**kwargs)[source]

Return a scenario based on Tiscali topology, parsed from RocketFuel dataset

Parameters:

seed : int, optional

The seed used for random number generation

Returns:

topology : fnss.Topology

The topology object

topology_wide(**kwargs)[source]

Return a scenario based on WIDE topology

Parameters:

seed : int, optional

The seed used for random number generation

Returns:

topology : fnss.Topology

The topology object

topology_garr(**kwargs)[source]

Return a scenario based on GARR topology

Parameters:

seed : int, optional

The seed used for random number generation

Returns:

topology : fnss.Topology

The topology object

topology_rocketfuel_latency(asn, source_ratio=0.1, ext_delay=34, **kwargs)[source]

Parse a generic RocketFuel topology with annotated latencies

An artificial receiver node is attached to each node of the parsed topology. A source node is also attached to each of the highest-degree routers.

Parameters:

asn : int

AS number

source_ratio : float

Ratio between number of source nodes (artificially attached) and routers

ext_delay : float

Delay on external nodes

icarus.scenarios.workload module

Traffic workloads

Every traffic workload to be used with Icarus must be modelled as an iterable class, i.e. a class with at least an __init__ method (through which it is initialized, with values taken from the configuration file) and an __iter__ method that is called to return a new event.

Each event yielded by the iterator must be a 2-tuple in which the first element is the timestamp at which the event occurs and the second is a dictionary, describing the event, which must contain at least the three following attributes:

  • receiver: The name of the node issuing the request
  • content: The name of the content for which the request is issued
  • log: A boolean value indicating whether this request should be logged or not for measurement purposes.

Each workload must expose the contents attribute which is an iterable of all content identifiers. This is needed for content placement.
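The contract above (iterable of (timestamp, event) 2-tuples, plus a contents attribute) can be sketched with a minimal custom workload. Names and the Poisson arrival choice are illustrative, not taken from the package:

```python
import random

class MinimalWorkload:
    """Sketch of the workload contract: iterable yielding (timestamp, event)
    2-tuples and exposing a `contents` attribute for content placement."""

    def __init__(self, receivers, contents, rate=1.0, n_events=10, seed=None):
        self.receivers = list(receivers)
        self.contents = list(contents)   # required by content placement
        self.rate = rate
        self.n_events = n_events
        self.seed = seed

    def __iter__(self):
        rng = random.Random(self.seed)
        t = 0.0
        for _ in range(self.n_events):
            t += rng.expovariate(self.rate)  # Poisson arrival process
            yield (t, {'receiver': rng.choice(self.receivers),
                       'content': rng.choice(self.contents),
                       'log': True})
```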

class StationaryWorkload(topology, n_contents, alpha, beta=0, rate=1.0, n_warmup=100000, n_measured=400000, seed=None, **kwargs)[source]

Bases: object

This workload generates events on the fly, i.e. instead of creating an event schedule to be kept in memory, it returns an iterator that generates events when needed.

This is useful for running large schedules of events where RAM is limited as its memory impact is considerably lower.

Request arrivals follow a Poisson process, while content popularity is Zipf-distributed.

All requests are mapped to receivers uniformly unless a positive beta parameter is specified.

If a beta parameter is specified, then receivers issue requests at different rates. The algorithm used to determine the request rate of each receiver is the following:

  • All receivers are sorted in decreasing order of the degree of the PoP they are attached to. This assumes that all receivers have degree = 1 and are attached to a node with degree > 1
  • Rates are then assigned following a Zipf distribution of coefficient beta, where nodes attached to higher-degree PoPs have a higher request rate
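The two rate-assignment steps above can be sketched as follows (hypothetical names; pop_degree maps each receiver to the degree of its PoP):

```python
def zipf_request_rates_sketch(pop_degree, beta, total_rate=1.0):
    """Rank receivers by PoP degree, then weight rank r (1-based) by r**-beta
    and normalize so the per-receiver rates sum to total_rate."""
    ranked = sorted(pop_degree, key=pop_degree.get, reverse=True)
    weights = [(rank + 1) ** -beta for rank in range(len(ranked))]
    norm = sum(weights)
    return {r: total_rate * w / norm for r, w in zip(ranked, weights)}
```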
Parameters:

topology : fnss.Topology

The topology to which the workload refers

n_contents : int

The number of content objects

alpha : float

The Zipf alpha parameter

beta : float, optional

Spatial skewness of request rates across receivers

rate : float, optional

The mean rate of requests per second

n_warmup : int, optional

The number of warmup requests (i.e. requests executed to fill cache but not logged)

n_measured : int, optional

The number of logged requests after the warmup

Returns:

events : iterator

Iterator of events. Each event is a 2-tuple where the first element is the timestamp at which the event occurs and the second element is a dictionary of event attributes.

name = 'STATIONARY'
class GlobetraffWorkload(topology, reqs_file, contents_file, beta=0, **kwargs)[source]

Bases: object

Parse requests from GlobeTraff workload generator

All requests are mapped to receivers uniformly unless a positive beta parameter is specified.

If a beta parameter is specified, then receivers issue requests at different rates. The algorithm used to determine the request rate of each receiver is the following:

  • All receivers are sorted in decreasing order of the degree of the PoP they are attached to. This assumes that all receivers have degree = 1 and are attached to a node with degree > 1
  • Rates are then assigned following a Zipf distribution of coefficient beta, where nodes attached to higher-degree PoPs have a higher request rate
Parameters:

topology : fnss.Topology

The topology to which the workload refers

reqs_file : str

The GlobeTraff request file

contents_file : str

The GlobeTraff content file

beta : float, optional

Spatial skewness of request rates

Returns:

events : iterator

Iterator of events. Each event is a 2-tuple where the first element is the timestamp at which the event occurs and the second element is a dictionary of event attributes.

name = 'GLOBETRAFF'
class TraceDrivenWorkload(topology, reqs_file, contents_file, n_contents, n_warmup, n_measured, rate=1.0, beta=0, **kwargs)[source]

Bases: object

Parse requests from a generic request trace.

This workload requires two text files:
  • a requests file, where each line corresponds to a string identifying the content requested
  • a contents file, which lists all unique content identifiers appearing in the requests file.

Since the trace does not provide timestamps, requests are scheduled according to a Poisson process of rate rate. All requests are mapped to receivers uniformly unless a positive beta parameter is specified.

If a beta parameter is specified, then receivers issue requests at different rates. The algorithm used to determine the request rate of each receiver is the following:

  • All receivers are sorted in decreasing order of the degree of the PoP they are attached to. This assumes that all receivers have degree = 1 and are attached to a node with degree > 1
  • Rates are then assigned following a Zipf distribution of coefficient beta, where nodes attached to higher-degree PoPs have a higher request rate
Parameters:

topology : fnss.Topology

The topology to which the workload refers

reqs_file : str

The path to the requests file

contents_file : str

The path to the contents file

n_contents : int

The number of content objects (i.e. the number of lines of contents_file)

n_warmup : int

The number of warmup requests (i.e. requests executed to fill cache but not logged)

n_measured : int

The number of logged requests after the warmup

rate : float, optional

The network-wide mean rate of requests per second

beta : float, optional

Spatial skewness of request rates

Returns:

events : iterator

Iterator of events. Each event is a 2-tuple where the first element is the timestamp at which the event occurs and the second element is a dictionary of event attributes.

name = 'TRACE_DRIVEN'
class YCSBWorkload(workload, n_contents, n_warmup, n_measured, alpha=0.99, seed=None, **kwargs)[source]

Bases: object

Yahoo! Cloud Serving Benchmark (YCSB)

The YCSB is a set of reference workloads used to benchmark databases and, more generally, any storage/caching system. It comprises five workloads:

Workload           Operations               Record selection
A - Update heavy   Read: 50%, Update: 50%   Zipfian
B - Read heavy     Read: 95%, Update: 5%    Zipfian
C - Read only      Read: 100%               Zipfian
D - Read latest    Read: 95%, Insert: 5%    Latest
E - Short ranges   Scan: 95%, Insert: 5%    Zipfian/Uniform

Notes

At the moment only workloads A, B and C are implemented, since they are the most relevant for caching systems.

name = 'YCSB'

Module contents

This package contains the code for generating simulation scenarios.