icarus.scenarios package

Submodules

icarus.scenarios.algorithms module

Various algorithms used for optimal cache placement.

pam(distances, k, n_iter=10)[source]

Compute k-medoids using the PAM algorithm

Parameters:

distances : 2-d NumPy array

Array of distances between points

k : int

Number of clusters

n_iter : int

Number of iterations to repeat. Each repetition is executed using a different initial random assignment. Repeating the experiment allows reaching different local optima, possibly yielding a better solution.

Notes

Implementation based on: https://github.com/salspaugh/machine_learning/blob/master/clustering/kmedoids.py
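The PAM swap heuristic can be illustrated with a minimal NumPy sketch. This is not the module's implementation, just an assumption-laden illustration of the same idea: random restarts, nearest-medoid assignment, and greedy medoid/non-medoid swaps until no swap improves the cost.

```python
import numpy as np

def pam_sketch(distances, k, n_iter=10, seed=0):
    """Illustrative k-medoids (PAM) on a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    n = distances.shape[0]
    best_cost, best_medoids = np.inf, None
    for _ in range(n_iter):
        # each repetition starts from a different random medoid set
        medoids = rng.choice(n, size=k, replace=False)
        while True:
            # assign each point to its nearest medoid and compute total cost
            labels = np.argmin(distances[:, medoids], axis=1)
            cost = distances[np.arange(n), medoids[labels]].sum()
            improved = False
            # try swapping each medoid with each non-medoid point
            for i in range(k):
                for cand in range(n):
                    if cand in medoids:
                        continue
                    trial = medoids.copy()
                    trial[i] = cand
                    t_labels = np.argmin(distances[:, trial], axis=1)
                    t_cost = distances[np.arange(n), trial[t_labels]].sum()
                    if t_cost < cost:
                        medoids, cost, improved = trial, t_cost, True
            if not improved:
                break   # local optimum reached for this restart
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    return best_medoids, best_cost
```

The result is only locally optimal per restart, which is why repeating with different initial assignments helps.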

extract_cluster_level_topology(topology)[source]

Build a cluster-level topology.

Each node in the topology must have the ‘cluster’ attribute

Parameters:

topology : Topology

The router-level topology

Returns:

topology : Topology

The cluster-level topology

Notes

  • Each router must have a cache deployed
  • All sources and receivers must have a single attachment point with a cache
  • Each node must be labelled with the cluster attribute
deploy_clusters(topology, clusters, assign_src_rcv=True)[source]

Annotate topology with cluster information

This function checks that all ICR candidate nodes are assigned to exactly one cluster.

If assign_src_rcv is True, then it also labels source and receiver nodes to the closest cluster.

This function assumes that:
  • each node of the topology is either an icr_candidate, a source or a receiver
  • each source and receiver must have degree equal to 1
Parameters:

topology : Topology

The topology onto which deploy clusters

clusters : list of sets

Router-cluster assignment. Each element of a list is a set of node identifiers. Nodes in the same set belong to the same cluster. The length of the list therefore corresponds to the number of clusters.

assign_src_rcv : bool, optional

If True, the function labels source and receiver nodes with the cluster label of the router they are attached to.
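The clusters format and the labelling logic can be sketched in pure Python. The real function operates on an fnss Topology; the names and dict-based representation below are hypothetical stand-ins:

```python
def deploy_clusters_sketch(icr_candidates, attachment, clusters, assign_src_rcv=True):
    """Label each ICR candidate with its cluster index; optionally propagate
    the label to degree-1 sources/receivers via their attachment point.

    clusters is a list of sets of node identifiers, one set per cluster.
    attachment maps each source/receiver to its single neighbouring router.
    """
    cluster_label = {}
    for idx, cluster in enumerate(clusters):
        for node in cluster:
            cluster_label[node] = idx
    # every ICR candidate must belong to exactly one cluster
    if set(cluster_label) != set(icr_candidates):
        raise ValueError("clusters must cover exactly the ICR candidates")
    if sum(len(c) for c in clusters) != len(cluster_label):
        raise ValueError("a node appears in more than one cluster")
    if assign_src_rcv:
        for node, neighbor in attachment.items():
            # source/receiver inherits the cluster of its attachment router
            cluster_label[node] = cluster_label[neighbor]
    return cluster_label
```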

compute_clusters(topology, k, distance='delay', nbunch=None, n_iter=10)[source]

Cluster nodes of a topology so as to minimize the intra-cluster latency.

This function assumes that every link is labelled with latencies and performs clustering using the k-medoids method with the PAM algorithm.

Parameters:

topology : Topology

The topology

k : int

The number of clusters

distance : str, optional

The link metric used to represent distance between nodes. If None, hop count is used instead

n_iter : int, optional

The number of iterations

compute_p_median(distances, p, n_iter=20)[source]

Compute p-median solution using the Adjusted Vertex Substitution (AVS) algorithm.

Parameters:

distances : dict of dicts

Distance between nodes

p : int

Number of facilities

icarus.scenarios.cacheplacement module

Cache placement strategies

This module provides algorithms for performing cache placement, i.e., given a cumulative cache size and a topology where each possible node candidate is labelled, these functions deploy caching space to the nodes of the topology.

uniform_cache_placement(topology, cache_budget, **kwargs)[source]

Places cache budget uniformly across cache nodes.

Parameters:

topology : Topology

The topology object

cache_budget : int

The cumulative cache budget

degree_centrality_cache_placement(topology, cache_budget, **kwargs)[source]

Places cache budget proportionally to the degree of the node.

Parameters:

topology : Topology

The topology object

cache_budget : int

The cumulative cache budget

betweenness_centrality_cache_placement(topology, cache_budget, **kwargs)[source]

Places cache budget proportionally to the betweenness centrality of the node.

Parameters:

topology : Topology

The topology object

cache_budget : int

The cumulative cache budget
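Both the degree-based and betweenness-based strategies follow the same pattern: split the cumulative budget proportionally to a per-node metric. A minimal sketch with hypothetical names and a plain dict in place of a Topology:

```python
def proportional_cache_placement_sketch(centrality, cache_budget):
    """Split cache_budget across nodes proportionally to a centrality metric.

    centrality maps node -> metric value (e.g. degree or betweenness).
    """
    total = sum(centrality.values())
    # integer cache sizes; rounding may leave a small unallocated remainder
    return {v: int(cache_budget * c / total) for v, c in centrality.items()}
```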

uniform_consolidated_cache_placement(topology, cache_budget, spread=0.5, metric_dict=None, target='top', **kwargs)[source]

Consolidate caches in nodes with top centrality.

Unlike other cache placement strategies, which assign cache space to all nodes proportionally to their centrality, this strategy places equally sized caches on a set of selected nodes.

Parameters:

topology : Topology

The topology object

cache_budget : int

The cumulative cache budget

spread : float [0, 1], optional

The spread factor. The greater it is, the more the cache budget is spread among nodes. If it is 1, all candidate nodes are assigned a cache; if it is 0, only the node with the highest/lowest centrality is assigned a cache

metric_dict : dict, optional

The centrality metric according to which nodes are selected. If not specified, betweenness centrality is selected.

target : (“top” | “bottom”), optional

The subsection of the ranked nodes on which to deploy caches.
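The spread/target mechanics can be sketched as follows (hypothetical names, plain dicts; the real function takes a Topology and defaults to betweenness centrality):

```python
def consolidated_cache_placement_sketch(metric, cache_budget, spread=0.5, target='top'):
    """Deploy equally sized caches on a fraction `spread` of the nodes,
    ranked by `metric` from the top or the bottom."""
    ranked = sorted(metric, key=metric.get, reverse=(target == 'top'))
    # at least one node always gets a cache, even when spread == 0
    n = max(1, int(round(len(ranked) * spread)))
    cache_size = cache_budget // n
    return {v: cache_size for v in ranked[:n]}
```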

random_cache_placement(topology, cache_budget, n_cache_nodes, seed=None, **kwargs)[source]

Deploy caching nodes randomly

Parameters:

topology : Topology

The topology object

cache_budget : int

The cumulative cache budget

n_cache_nodes : int

The number of caching nodes to deploy

optimal_median_cache_placement(topology, cache_budget, n_cache_nodes, hit_ratio, weight='delay', **kwargs)[source]

Deploy caching nodes in locations that minimize overall latency, assuming a partitioned strategy (a la Google Global Cache). Under this strategy, a set of caching nodes is deployed in the network and each receiver is mapped to one and only one caching node. Requests from a receiver are always sent to its designated caching node. In case of a cache miss, requests are forwarded to the original source.

This placement problem can be mapped to the p-median location-allocation problem. This function solves it using the vertex substitution heuristic, which works much like the k-medoids PAM algorithm and is in turn similar to k-means clustering. The result is guaranteed to be only locally, not globally, optimal.

Parameters:

topology : Topology

The topology object

cache_budget : int

The cumulative cache budget

n_cache_nodes : int

The number of caching nodes to deploy

hit_ratio : float

The expected cache hit ratio of a single cache

weight : str

The weight attribute

Notes

This placement assumes that all receivers have degree = 1 and are connected to an ICR candidate node. It also assumes that contents are uniformly assigned to sources.

optimal_hashrouting_cache_placement(topology, cache_budget, n_cache_nodes, hit_ratio, weight='delay', **kwargs)[source]

Deploy caching nodes for hashrouting in optimized locations

Parameters:

topology : Topology

The topology object

cache_budget : int

The cumulative cache budget

n_cache_nodes : int

The number of caching nodes to deploy

hit_ratio : float

The expected global cache hit ratio

weight : str, optional

The weight attribute. Default is ‘delay’

clustered_hashrouting_cache_placement(topology, cache_budget, n_clusters, policy, distance='delay', **kwargs)[source]

Deploy caching nodes for hashrouting with clusters

Parameters:

topology : Topology

The topology object

cache_budget : int

The cumulative cache budget

n_clusters : int

The number of clusters

policy : str (node_const | cluster_const)

The cache allocation policy, either node_const or cluster_const

distance : str

The attribute used to quantify distance between pairs of nodes. Default is ‘delay’
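A plausible reading of the two policy names, sketched below, is that node_const gives every caching node the same cache size while cluster_const gives every cluster the same aggregate size. This interpretation is an assumption, not confirmed by the source:

```python
def clustered_cache_sizes_sketch(clusters, cache_budget, policy):
    """ASSUMED semantics of the two policies:
    node_const    -> every node gets the same cache size;
    cluster_const -> every cluster gets the same aggregate size,
                     split evenly among its nodes."""
    if policy == 'node_const':
        size = cache_budget // sum(len(c) for c in clusters)
        return {v: size for cluster in clusters for v in cluster}
    if policy == 'cluster_const':
        per_cluster = cache_budget // len(clusters)
        return {v: per_cluster // len(cluster)
                for cluster in clusters for v in cluster}
    raise ValueError("unknown policy: %s" % policy)
```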

icarus.scenarios.contentplacement module

Content placement strategies.

This module contains functions to decide the allocation of content objects to source nodes.

uniform_content_placement(topology, contents, seed=None)[source]

Places content objects on source nodes randomly, following a uniform distribution.

Parameters:

topology : Topology

The topology object

contents : iterable

Iterable of content objects

source_nodes : list

List of nodes of the topology which are content sources

Returns:

cache_placement : dict

Dictionary mapping content objects to source nodes

Notes

A deterministic placement of objects (e.g., for reproducing results) can be achieved by using a fixed seed value
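The core of uniform placement, including the fixed-seed reproducibility note above, can be sketched with the standard library (hypothetical name; the real function also takes a Topology):

```python
import random

def uniform_content_placement_sketch(contents, source_nodes, seed=None):
    """Assign each content object to a source chosen uniformly at random.

    A fixed seed makes the placement deterministic and reproducible.
    """
    rng = random.Random(seed)
    return {c: rng.choice(source_nodes) for c in contents}
```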

weighted_content_placement(topology, contents, source_weights, seed=None)[source]

Places content objects on source nodes randomly according to the weight of the source node.

Parameters:

topology : Topology

The topology object

contents : iterable

Iterable of content objects

source_weights : dict

Dict mapping the nodes of the topology which are content sources to the weight according to which the content placement decision is made.

Returns:

cache_placement : dict

Dictionary mapping content objects to source nodes

Notes

A deterministic placement of objects (e.g., for reproducing results) can be achieved by using a fixed seed value
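Weighted placement differs from the uniform case only in how the source is drawn. A minimal sketch under the same hypothetical-naming caveat:

```python
import random

def weighted_content_placement_sketch(contents, source_weights, seed=None):
    """Assign each content to a source with probability proportional
    to that source's weight."""
    rng = random.Random(seed)
    nodes = list(source_weights)
    weights = [source_weights[v] for v in nodes]
    return {c: rng.choices(nodes, weights=weights)[0] for c in contents}
```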

icarus.scenarios.topology module

Functions for creating or importing topologies for experiments.

To create a custom topology, create a function returning an instance of the IcnTopology class. An IcnTopology is simply a subclass of a Topology class provided by FNSS.

A valid ICN topology must have the following attributes:
  • Each node must have one stack among: source, receiver, router
  • The topology must have an attribute called icr_candidates which is a set of router nodes on which a cache may be possibly deployed. Caches are not deployed directly at topology creation, instead they are deployed by a cache placement algorithm.
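The required attributes can be illustrated with a pure-dict stand-in for an IcnTopology (the real class subclasses fnss Topology; this sketch only shows the shape of a valid ICN topology, with a path as the example):

```python
def icn_path_topology_sketch(n):
    """Dict-based stand-in for an ICN path topology: receiver at node 0,
    source at node n-1, routers (all ICR candidates) in between."""
    # each node carries exactly one stack: source, receiver or router
    stacks = {0: ('receiver', {}), n - 1: ('source', {})}
    for v in range(1, n - 1):
        stacks[v] = ('router', {})
    return {
        'stacks': stacks,
        'edges': [(v, v + 1) for v in range(n - 1)],
        # caches are NOT deployed here; a placement algorithm picks among these
        'icr_candidates': set(range(1, n - 1)),
    }
```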
class IcnTopology(data=None, name='', **kwargs)[source]

Bases: fnss.topologies.topology.Topology

Class modelling an ICN topology

An ICN topology is a simple FNSS Topology with additional methods that return sets of caching nodes, sources and receivers.

Attributes

name

Methods

add_cycle(nodes, **attr) Add a cycle.
add_edge(u, v[, attr_dict]) Add an edge between u and v.
add_edges_from(ebunch[, attr_dict]) Add all the edges in ebunch.
add_node(n[, attr_dict]) Add a single node n and update node attributes.
add_nodes_from(nodes, **attr) Add multiple nodes.
add_path(nodes, **attr) Add a path.
add_star(nodes, **attr) Add a star.
add_weighted_edges_from(ebunch[, weight]) Add all the edges in ebunch as weighted edges with specified weights.
adjacency_iter() Return an iterator of (node, adjacency dict) tuples for all nodes.
adjacency_list() Return an adjacency list representation of the graph.
adjlist_dict_factory alias of dict
applications() Return a dictionary of all applications deployed, keyed by node
buffers() Return a dictionary of all buffer sizes, keyed by interface
cache_nodes() Return a dictionary mapping nodes with a cache and respective cache
capacities() Return a dictionary of all link capacities, keyed by link
clear() Remove all nodes and edges from the graph.
copy() Return a copy of the topology.
degree([nbunch, weight]) Return the degree of a node or nodes.
degree_iter([nbunch, weight]) Return an iterator for (node, degree).
delays() Return a dictionary of all link delays, keyed by link
edge_attr_dict_factory alias of dict
edges([nbunch, data, default]) Return a list of edges.
edges_iter([nbunch, data, default]) Return an iterator over the edges.
get_edge_data(u, v[, default]) Return the attribute dictionary associated with edge (u,v).
has_edge(u, v) Return True if the edge (u,v) is in the graph.
has_node(n) Return True if the graph contains the node n.
is_directed() Return True if graph is directed, False otherwise.
is_multigraph() Return True if graph is a multigraph, False otherwise.
nbunch_iter([nbunch]) Return an iterator of nodes contained in nbunch that are also in the graph.
neighbors(n) Return a list of the nodes connected to the node n.
neighbors_iter(n) Return an iterator over all neighbors of node n.
node_dict_factory alias of dict
nodes([data]) Return a list of the nodes in the graph.
nodes_iter([data]) Return an iterator over the nodes.
nodes_with_selfloops() Return a list of nodes with self loops.
number_of_edges([u, v]) Return the number of edges between two nodes.
number_of_nodes() Return the number of nodes in the graph.
number_of_selfloops() Return the number of selfloop edges.
order() Return the number of nodes in the graph.
receivers() Return a set of receiver nodes
remove_edge(u, v) Remove the edge between u and v.
remove_edges_from(ebunch) Remove all edges specified in ebunch.
remove_node(n) Remove node n.
remove_nodes_from(nodes) Remove multiple nodes.
selfloop_edges([data, default]) Return a list of selfloop edges.
size([weight]) Return the number of edges.
sources() Return a set of source nodes
stacks() Return a dictionary of all node stacks, keyed by node
subgraph(nbunch) Return the subgraph induced on nodes in nbunch.
to_directed() Return a directed representation of the topology.
to_undirected() Return an undirected copy of the topology.
weights() Return a dictionary of all link weights, keyed by link
cache_nodes()[source]

Return a dictionary mapping nodes with a cache and respective cache size

Returns:

cache_nodes : dict

Dictionary mapping node identifiers and cache size

receivers()[source]

Return a set of receiver nodes

Returns:

receivers : set

Set of receiver nodes

sources()[source]

Return a set of source nodes

Returns:

sources : set

Set of source nodes

topology_tree(k, h, delay=1, **kwargs)[source]

Returns a tree topology, with a source at the root, receivers at the leaves and caches at all intermediate nodes.

Parameters:

h : int

The height of the tree

k : int

The branching factor of the tree

delay : float

The link delay in milliseconds

Returns:

topology : IcnTopology

The topology object

topology_path(n, delay=1, **kwargs)[source]

Return a path topology with a receiver at node 0 and a source at node ‘n-1’

Parameters:

n : int (>=3)

The number of nodes

delay : float

The link delay in milliseconds

Returns:

topology : IcnTopology

The topology object

topology_ring(n, delay_int=1, delay_ext=5, **kwargs)[source]

Returns a ring topology

This topology comprises a ring of n routers, each attached to a receiver. In addition, one router is attached to a source. The topology therefore has 2n + 1 nodes in total.

It models a metro ring network, with many receivers and a single source towards the core network.

Parameters:

n : int

The number of routers in the ring

delay_int : float

The internal link delay in milliseconds

delay_ext : float

The external link delay in milliseconds

Returns:

topology : IcnTopology

The topology object

topology_mesh(n, m, delay_int=1, delay_ext=5, **kwargs)[source]

Returns a mesh topology

This topology comprises a mesh of n routers, each attached to a receiver. In addition, m routers are each attached to a source. The topology therefore has 2n + m nodes in total.

Parameters:

n : int

The number of routers in the mesh

m : int

The number of sources

delay_int : float

The internal link delay in milliseconds

delay_ext : float

The external link delay in milliseconds

Returns:

topology : IcnTopology

The topology object

topology_geant(**kwargs)[source]

Return a scenario based on GEANT topology

Parameters:

seed : int, optional

The seed used for random number generation

Returns:

topology : fnss.Topology

The topology object

topology_tiscali(**kwargs)[source]

Return a scenario based on Tiscali topology, parsed from RocketFuel dataset

Parameters:

seed : int, optional

The seed used for random number generation

Returns:

topology : fnss.Topology

The topology object

topology_wide(**kwargs)[source]

Return a scenario based on WIDE topology

Parameters:

seed : int, optional

The seed used for random number generation

Returns:

topology : fnss.Topology

The topology object

topology_garr(**kwargs)[source]

Return a scenario based on GARR topology

Parameters:

seed : int, optional

The seed used for random number generation

Returns:

topology : fnss.Topology

The topology object

topology_rocketfuel_latency(asn, source_ratio=0.1, ext_delay=34, **kwargs)[source]

Parse a generic RocketFuel topology with annotated latencies

An artificial receiver node is attached to each node of the parsed topology. A source node is also attached to each of the highest-degree routers.

Parameters:

asn : int

AS number

source_ratio : float

Ratio between number of source nodes (artificially attached) and routers

ext_delay : float

Delay on external nodes

icarus.scenarios.workload module

Traffic workloads

Every traffic workload to be used with Icarus must be modelled as an iterable class, i.e. a class with at least an __init__ method (through which it is initialized, with values taken from the configuration file) and an __iter__ method that is called to return a new event.

Each event yielded by the iterator must be a 2-tuple in which the first element is the timestamp at which the event occurs and the second is a dictionary, describing the event, which must contain at least the three following attributes:

  • receiver: The name of the node issuing the request
  • content: The name of the content for which the request is issued
  • log: A boolean value indicating whether this request should be logged or not for measurement purposes.

Each workload must expose the contents attribute which is an iterable of all content identifiers. This is needed for content placement.
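The contract above (iterable of (timestamp, event) 2-tuples, plus a contents attribute) can be sketched with a minimal custom workload. Names and the Poisson arrival choice are illustrative, not taken from the package:

```python
import random

class MinimalWorkload:
    """Sketch of the workload contract: iterable yielding (timestamp, event)
    2-tuples and exposing a `contents` attribute for content placement."""

    def __init__(self, receivers, contents, rate=1.0, n_events=10, seed=None):
        self.receivers = list(receivers)
        self.contents = list(contents)   # required by content placement
        self.rate = rate
        self.n_events = n_events
        self.seed = seed

    def __iter__(self):
        rng = random.Random(self.seed)
        t = 0.0
        for _ in range(self.n_events):
            t += rng.expovariate(self.rate)  # Poisson arrival process
            yield (t, {'receiver': rng.choice(self.receivers),
                       'content': rng.choice(self.contents),
                       'log': True})
```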

class StationaryWorkload(topology, n_contents, alpha, beta=0, rate=1.0, n_warmup=100000, n_measured=400000, seed=None, **kwargs)[source]

Bases: object

This workload generates events on the fly, i.e. instead of creating an event schedule to be kept in memory, it returns an iterator that generates events when needed.

This is useful for running large schedules of events where RAM is limited as its memory impact is considerably lower.

Request arrivals follow a Poisson process, while content popularity is Zipf-distributed.

All requests are mapped to receivers uniformly unless a positive beta parameter is specified.

If a beta parameter is specified, then receivers issue requests at different rates. The algorithm used to determine the request rate of each receiver is the following:

  • All receivers are sorted in decreasing order of the degree of the PoP they are attached to. This assumes that all receivers have degree = 1 and are attached to a node with degree > 1
  • Rates are then assigned following a Zipf distribution of coefficient beta, where nodes attached to higher-degree PoPs have a higher request rate
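The two rate-assignment steps above can be sketched as follows (hypothetical names; pop_degree maps each receiver to the degree of its PoP):

```python
def zipf_request_rates_sketch(pop_degree, beta, total_rate=1.0):
    """Rank receivers by PoP degree, then weight rank r (1-based) by r**-beta
    and normalize so the per-receiver rates sum to total_rate."""
    ranked = sorted(pop_degree, key=pop_degree.get, reverse=True)
    weights = [(rank + 1) ** -beta for rank in range(len(ranked))]
    norm = sum(weights)
    return {r: total_rate * w / norm for r, w in zip(ranked, weights)}
```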
Parameters:

topology : fnss.Topology

The topology to which the workload refers

n_contents : int

The number of content objects

alpha : float

The Zipf alpha parameter

beta : float, optional

Spatial skewness of request rates across receivers

rate : float, optional

The mean rate of requests per second

n_warmup : int, optional

The number of warmup requests (i.e. requests executed to fill cache but not logged)

n_measured : int, optional

The number of logged requests after the warmup

Returns:

events : iterator

Iterator of events. Each event is a 2-tuple where the first element is the timestamp at which the event occurs and the second element is a dictionary of event attributes.

name = 'STATIONARY'
class GlobetraffWorkload(topology, reqs_file, contents_file, beta=0, **kwargs)[source]

Bases: object

Parse requests from GlobeTraff workload generator

All requests are mapped to receivers uniformly unless a positive beta parameter is specified.

If a beta parameter is specified, then receivers issue requests at different rates. The algorithm used to determine the request rate of each receiver is the following:

  • All receivers are sorted in decreasing order of the degree of the PoP they are attached to. This assumes that all receivers have degree = 1 and are attached to a node with degree > 1
  • Rates are then assigned following a Zipf distribution of coefficient beta, where nodes attached to higher-degree PoPs have a higher request rate
Parameters:

topology : fnss.Topology

The topology to which the workload refers

reqs_file : str

The GlobeTraff request file

contents_file : str

The GlobeTraff content file

beta : float, optional

Spatial skewness of request rates

Returns:

events : iterator

Iterator of events. Each event is a 2-tuple where the first element is the timestamp at which the event occurs and the second element is a dictionary of event attributes.

name = 'GLOBETRAFF'
class TraceDrivenWorkload(topology, reqs_file, contents_file, n_contents, n_warmup, n_measured, rate=1.0, beta=0, **kwargs)[source]

Bases: object

Parse requests from a generic request trace.

This workload requires two text files:
  • a requests file, where each line corresponds to a string identifying the content requested
  • a contents file, which lists all unique content identifiers appearing in the requests file.

Since the trace does not provide timestamps, requests are scheduled according to a Poisson process of rate rate. All requests are mapped to receivers uniformly unless a positive beta parameter is specified.

If a beta parameter is specified, then receivers issue requests at different rates. The algorithm used to determine the request rate of each receiver is the following:

  • All receivers are sorted in decreasing order of the degree of the PoP they are attached to. This assumes that all receivers have degree = 1 and are attached to a node with degree > 1
  • Rates are then assigned following a Zipf distribution of coefficient beta, where nodes attached to higher-degree PoPs have a higher request rate
Parameters:

topology : fnss.Topology

The topology to which the workload refers

reqs_file : str

The path to the requests file

contents_file : str

The path to the contents file

n_contents : int

The number of content objects (i.e. the number of lines of contents_file)

n_warmup : int

The number of warmup requests (i.e. requests executed to fill cache but not logged)

n_measured : int

The number of logged requests after the warmup

rate : float, optional

The network-wide mean rate of requests per second

beta : float, optional

Spatial skewness of request rates

Returns:

events : iterator

Iterator of events. Each event is a 2-tuple where the first element is the timestamp at which the event occurs and the second element is a dictionary of event attributes.

name = 'TRACE_DRIVEN'
class YCSBWorkload(workload, n_contents, n_warmup, n_measured, alpha=0.99, seed=None, **kwargs)[source]

Bases: object

Yahoo! Cloud Serving Benchmark (YCSB)

The YCSB is a set of reference workloads used to benchmark databases and, more generally, any storage/caching system. It comprises five workloads:

Workload           Operations               Record selection
A - Update heavy   Read: 50%, Update: 50%   Zipfian
B - Read heavy     Read: 95%, Update: 5%    Zipfian
C - Read only      Read: 100%               Zipfian
D - Read latest    Read: 95%, Insert: 5%    Latest
E - Short ranges   Scan: 95%, Insert: 5%    Zipfian/Uniform

Notes

At the moment only workloads A, B and C are implemented, since they are the most relevant for caching systems.

name = 'YCSB'

Module contents

This package contains the code for generating simulation scenarios.