icarus.tools package

Submodules

icarus.tools.cacheperf module

Functions for modelling and evaluating the performance of cache replacement policies.

che_characteristic_time(pdf, cache_size, target=None)[source]

Return the characteristic time of an item or of all items, as defined by Che et al.

Parameters:

pdf : array-like

The probability density function of an item being requested

cache_size : int

The size of the cache (in number of items)

target : int, optional

The item index [1,N] for which characteristic time is requested. If not specified, the function calculates the characteristic time of all the items in the population.

Returns:

r : array of float or float

If target is None, returns an array with the characteristic times of all items in the population. If a target is specified, then it returns the characteristic time of only the specified item.

che_per_content_cache_hit_ratio(pdf, cache_size, target=None)[source]

Estimate the cache hit ratio of an item or of all items using the Che’s approximation.

Parameters:

pdf : array-like

The probability density function of an item being requested

cache_size : int

The size of the cache (in number of items)

target : int, optional

The item index for which cache hit ratio is requested. If not specified, the function calculates the cache hit ratio of all the items in the population.

Returns:

cache_hit_ratio : array of float or float

If target is None, returns an array with the cache hit ratios of all items in the population. If a target is specified, then it returns the cache hit ratio of only the specified item.

che_cache_hit_ratio(pdf, cache_size)[source]

Estimate the overall cache hit ratio of an LRU cache under generic IRM demand using the Che’s approximation.

Parameters:

pdf : array-like

The probability density function of an item being requested

cache_size : int

The size of the cache (in number of items)

Returns:

cache_hit_ratio : float

The overall cache hit ratio

che_characteristic_time_simplified(pdf, cache_size)[source]

Return the characteristic time of an LRU cache under a given IRM workload, as defined by Che et al. This function computes one single characteristic time for all contents. This further approximation is normally accurate for workloads with reduced skewness in their popularity distribution.

Parameters:

pdf : array-like

The probability density function of an item being requested

cache_size : int

The size of the cache (in number of items)

Returns:

r : float

The characteristic time.

che_per_content_cache_hit_ratio_simplified(pdf, cache_size, target=None)[source]

Estimate the cache hit ratio of an item or of all items using the Che’s approximation. This version uses a single characteristic time for all contents.

Parameters:

pdf : array-like

The probability density function of an item being requested

cache_size : int

The size of the cache (in number of items)

target : int, optional

The item index for which cache hit ratio is requested. If not specified, the function calculates the cache hit ratio of all the items in the population.

Returns:

cache_hit_ratio : array of float or float

If target is None, returns an array with the cache hit ratios of all items in the population. If a target is specified, then it returns the cache hit ratio of only the specified item.

che_cache_hit_ratio_simplified(pdf, cache_size)[source]

Estimate the overall cache hit ratio of an LRU cache under generic IRM demand using the Che’s approximation. This version uses a single characteristic time for all contents.

Parameters:

pdf : array-like

The probability density function of an item being requested

cache_size : int

The size of the cache (in number of items)

Returns:

cache_hit_ratio : float

The overall cache hit ratio

che_characteristic_time_generalized(pdf, cache_size, policy, **policy_args)[source]

Return the characteristic time of a cache under a given IRM demand according to the the extension of Che’s approximation proposed by Martina et al. This function computes one single characteristic time for all content items.

Parameters:

pdf : array-like

The probability density function of an item being requested

cache_size : int

The size of the cache (in number of items)

policy : str

The cache replacement policy (‘LRU’, ‘q-LRU’, ‘FIFO’, ‘RANDOM’)

Returns:

r : float

The characteristic time.

che_per_content_cache_hit_ratio_generalized(pdf, cache_size, policy, **policy_args)[source]

Estimate the cache hit ratio of an item or of all items in a cache subject to IRM demand according to the extension of Che’s approximation proposed by Martina et al.

Parameters:

pdf : array-like

The probability density function of an item being requested

cache_size : int

The size of the cache (in number of items)

policy : str, optional

The cache replacement policy (‘LRU’, ‘q-LRU’, ‘FIFO’, ‘RANDOM’)

Returns:

cache_hit_ratio : array of float or float

If target is None, returns an array with the cache hit ratios of all items in the population. If a target is specified, then it returns the cache hit ratio of only the specified item.

che_cache_hit_ratio_generalized(pdf, cache_size, policy='LRU', **policy_args)[source]

Estimate the overall cache hit ratio of a cache subject to IRM demand according to the extension of Che’s approximation proposed by Martina et al.

Parameters:

pdf : array-like

The probability density function of an item being requested

cache_size : int

The size of the cache (in number of items)

policy : str, optional

The cache replacement policy (‘LRU’, ‘q-LRU’, ‘FIFO’, ‘RANDOM’)

Returns:

cache_hit_ratio : float

The overall cache hit ratio

laoutaris_characteristic_time(alpha, population, cache_size, order=3)[source]

Estimates the Che’s characteristic time of an LRU cache under general power-law demand using the Laoutaris approximation.

Parameters:

alpha : float

The coefficient of the demand power-law

population : int

The content population

cache_size : int

The cache size

order : int, optional

The order of the Taylor expansion. Supports only 2 and 3

Returns:

cache_hit_ratio : float

The cache hit ratio

References

http://arxiv.org/pdf/0705.1970.pdf

laoutaris_per_content_cache_hit_ratio(alpha, population, cache_size, order=3, target=None)[source]

Estimates the per-content cache hit ratio of an LRU cache under general power-law demand using the Laoutaris approximation.

Parameters:

alpha : float

The coefficient of the demand power-law distribution

population : int

The content population

cache_size : int

The cache size

order : int, optional

The order of the Taylor expansion. Supports only 2 and 3

target : int, optional

The item index [1,N] for which cache hit ratio is requested. If not specified, the function calculates the cache hit ratio of all the items in the population.

Returns:

cache_hit_ratio : array of float or float

If target is None, returns an array with the cache hit ratios of all items in the population. If a target is specified, then it returns the cache hit ratio of only the specified item.

References

http://arxiv.org/pdf/0705.1970.pdf

laoutaris_cache_hit_ratio(alpha, population, cache_size, order=3)[source]

Estimate the cache hit ratio of an LRU cache under general power-law demand using the Laoutaris approximation.

Parameters:

alpha : float

The coefficient of the demand power-law distribution

population : int

The content population

cache_size : int

The cache size

order : int, optional

The order of the Taylor expansion. Supports only 2 and 3

Returns:

cache_hit_ratio : float

The cache hit ratio

References

http://arxiv.org/pdf/0705.1970.pdf

optimal_cache_hit_ratio(pdf, cache_size)[source]

Return the value of the optimal cache hit ratio of a cache under IRM stationary demand with a given pdf.

In practice this function returns the probability of a cache hit if cache is filled with the cache_size most popular times. This value also corresponds to the steady-state cache hit ratio of an LFU cache.

Parameters:

pdf : array-like

The probability density function of an item being requested

cache_size : int

The size of the cache (in number of items)

Returns:

cache_hit_ratio : float

The optimal cache hit ratio

numeric_per_content_cache_hit_ratio(pdf, cache, warmup=None, measure=None, seed=None, target=None)[source]

Numerically compute the per-content cache hit ratio of a cache under IRM stationary demand with a given pdf.

Parameters:

pdf : array-like

The probability density function of an item being requested

cache : Cache

The cache object (i.e. the instance of a class subclassing icarus.Cache)

warmup : int, optional

The number of warmup requests to generate. If not specified, it is set to 10 times the content population

measure : int, optional

The number of measured requests to generate. If not specified, it is set to 30 times the content population

seed : int, optional

The seed used to generate random numbers

target : int, optional

The item index [1, N] for which cache hit ratio is requested. If not specified, the function calculates the cache hit ratio of all the items in the population.

Returns:

cache_hit_ratio : array of float or float

If target is None, returns an array with the cache hit ratios of all items in the population. If a target is specified, then it returns the cache hit ratio of only the specified item.

numeric_cache_hit_ratio(pdf, cache, warmup=None, measure=None, seed=None)[source]

Numerically compute the cache hit ratio of a cache under IRM stationary demand with a given pdf.

Parameters:

pdf : array-like

The probability density function of an item being requested

cache : Cache

The cache object (i.e. the instance of a class subclassing icarus.Cache)

warmup : int, optional

The number of warmup requests to generate. If not specified, it is set to 10 times the content population

measure : int, optional

The number of measured requests to generate. If not specified, it is set to 30 times the content population

seed : int, optional

The seed used to generate random numbers

Returns:

cache_hit_ratio : float

The cache hit ratio

numeric_cache_hit_ratio_2_layers(pdf, l1_cache, l2_cache, warmup=None, measure=None, seed=None)[source]

Numerically compute the cache hit ratio of a two-layer cache under IRM stationary demand with a given pdf.

Differently from the numeric_cache_hit_ratio function, this function allows users to compute the hits at layer 1, layer 2 and overall.

Parameters:

pdf : array-like

The probability density function of an item being requested

cache : Cache

The cache object (i.e. the instance of a class subclassing icarus.Cache)

warmup : int, optional

The number of warmup requests to generate. If not specified, it is set to 10 times the content population

measure : int, optional

The number of measured requests to generate. If not specified, it is set to 30 times the content population

seed : int, optional

The seed used to generate random numbers

Returns:

cache_hit_ratio : dict

Dictionary with keys “l1_hits”, “l2_hits” and “total_hits”

trace_driven_cache_hit_ratio(workload, cache, warmup_ratio=0.25)[source]

Compute cache hit ratio of a cache under an arbitrary trace-driven workload.

Parameters:

workload : list or array

List of URLs or content identifiers extracted from a trace. This list only needs to contains content identifiers and not timestamps

cache : Cache

Instance of a cache object

warmup_ratio : float, optional

Ratio of requests of the workload used to warm up the cache (i.e. whose cache hit/miss results are discarded)

Returns:

cache_hit_ratio : float

The cache hit ratio

icarus.tools.stats module

Provides statistical utilities functions used by the simulator

class DiscreteDist(pdf, seed=None)[source]

Bases: object

Implements a discrete distribution with finite population.

The support must be a finite discrete set of contiguous integers {1, ..., N}. This definition of discrete distribution.

Attributes

cdf Return the Cumulative Density Function (CDF)
pdf Return the Probability Density Function (PDF)

Methods

rv() Get rand value from the distribution
cdf

Return the Cumulative Density Function (CDF)

Returns:

cdf : Numpy array

Array representing cdf

pdf

Return the Probability Density Function (PDF)

Returns:

pdf : Numpy array

Array representing the probability density function of the distribution

rv()[source]

Get rand value from the distribution

class TruncatedZipfDist(alpha=1.0, n=1000, seed=None)[source]

Bases: icarus.tools.stats.DiscreteDist

Implements a truncated Zipf distribution, i.e. a Zipf distribution with a finite population, which can hence take values of alpha > 0.

Attributes

alpha
cdf Return the Cumulative Density Function (CDF)
pdf Return the Probability Density Function (PDF)

Methods

rv() Get rand value from the distribution
alpha
means_confidence_interval(data, confidence=0.95)[source]

Computes the confidence interval for a given set of means.

Parameters:

data : array-like

The set of samples whose confidence interval is calculated

confidence : float, optional

The confidence level. It must be a value in the interval (0, 1)

Returns:

mean : float

The mean of the sample

err : float

The standard error of the sample

References

[1] N. Matloff, From Algorithms to Z-Scores: Probabilistic and Statistical
Modeling in Computer Science. Available: http://heather.cs.ucdavis.edu/probstatbook
proportions_confidence_interval(data, confidence)[source]

Computes the confidence interval of a proportion.

Parameters:

data : array-like of bool

The sample of data whose proportion of True values needs to be estimated

confidence : float, optional

The confidence level. It must be a value in the interval (0, 1)

References

[1] N. Matloff, From Algorithms to Z-Scores: Probabilistic and Statistical
Modeling in Computer Science. Available: http://heather.cs.ucdavis.edu/probstatbook
cdf(data)[source]

Return the empirical CDF of a set of 1D data

Parameters:

data : array-like

Array of data

Returns:

x : array

All occurrences of data sorted

cdf : array

The CDF of data. More specifically cdf[i] is the probability that x < x[i]

pdf(data, n_bins)[source]

Return the empirical PDF of a set of 1D data

Parameters:

data : array-like

Array of data

n_bins : int

The number of bins

Returns

x : array

The center point of all bins

pdf : array

The PDF of data.

icarus.tools.traces module

Functions for importing and analyzing traffic traces

frequencies(data)[source]

Extract frequencies from traces. Returns array of sorted frequencies

Parameters:

data : array-like

An array of generic data (i.e. URLs of web pages)

Returns:

frequencies : array of int

The frequencies of the data sorted in descending order

Notes

This function does not return the mapping between data elements and their frequencies, it only returns frequencies. This function can be used to get frequencies to pass to the zipf_fit function given a set of data, e.g. content request traces.

one_timers(data)[source]

Return fraction of contents requested only once (i.e., one-timers)

Parameters:

data : array-like

An array of generic data (i.e. URLs of web pages)

Returns:

one_timers : float

Fraction of content objects requested only once.

trace_stats(data)[source]

Print full stats of a trace

Parameters:

data : array-like

An array of generic data (i.e. URLs of web pages)

zipf_fit(obs_freqs, need_sorting=False)[source]

Returns the value of the Zipf’s distribution alpha parameter that best fits the data provided and the p-value of the fit test.

Parameters:

obs_freqs : array

The array of observed frequencies sorted in descending order

need_sorting : bool, optional

If True, indicates that obs_freqs is not sorted and this function will sort it. If False, assume that the array is already sorted

Returns:

alpha : float

The alpha parameter of the best Zipf fit

p : float

The p-value of the test

Notes

This function uses the method described in http://stats.stackexchange.com/questions/6780/how-to-calculate-zipfs-law-coefficient-from-a-set-of-top-frequencies

parse_url_list(path)[source]

Parse traces from a text file where each line contains a URL requested without timestamp or counters

Parameters:

path : str

The path to the trace file to parse

Returns:

trace : iterator of strings

An iterator whereby each element is dictionary expressing all attributes of an entry of the trace

parse_wikibench(path)[source]

Parses traces from the Wikibench dataset

Parameters:

path : str

The path to the trace file to parse

Returns:

trace : iterator of dicts

An iterator whereby each element is dictionary expressing all attributes of an entry of the trace

parse_squid(path)[source]

Parses traces from a Squid log file. Parse a Squid log file.

Squid is an HTTP reverse proxy. Its logs contains traces of all HTTP requests served and can be used for trace-driven simulations based on realistic HTTP workloads. Traces from the IRCache dataset are in this format.

Parameters:

path : str

The path to the trace file to parse

Returns:

trace : iterator of dicts

An iterator whereby each element is dictionary expressing all attributes of an entry of the trace

Notes

Documentation describing the Squid log format is available here: http://wiki.squid-cache.org/Features/LogFormat

parse_youtube_umass(path)[source]

Parse YouTube collected at UMass campus network [1]_.

These data were collected at UMass campus network over a a measurement period between June 2007 and March 2008.

This function parses the request traces, named youtube.parsed.X.Y.dat. Each entry of the trace provides the following information elements:

  • Timestamp
  • YouTube server IP (anonymized)
  • Client IP (anonymized)
  • Request
  • Video ID
  • Content server IP

Traces are available at http://traces.cs.umass.edu/index.php/Network/Network

Parameters:

path : str

The path to the trace file to parse

Returns:

trace : iterator of dicts

An iterator whereby each element is dictionary expressing all attributes of an entry of the trace

References

..[1] Michael Zink, Kyoungwon Suh, Yu Gu and Jim Kurose,
Watch Global Cache Local: YouTube Network Traces at a Campus Network - Measurements and Implications, in Proc. of IEEE MMCN‘08
parse_common_log_format(path)[source]

Parse files saved in the Common Log Format (CLF)

Parameters:

path : str

The path to the Common Log Format file to parse

Returns:

events : iterator

iterator over the events parsed from the file

Notes

Common Log Format specifications: http://www.w3.org/Daemon/User/Config/Logging.html#common-logfile-format

Module contents

This package contains code providing various functionalities useful in caching research.

Examples include code for importing and analyzing traffic traces, modeling the behavior of caches and statistical utilities.