icarus.tools package¶
Submodules¶
icarus.tools.cacheperf module¶
Functions for modelling and evaluating the performance of cache replacement policies.
-
che_characteristic_time
(pdf, cache_size, target=None)[source]¶ Return the characteristic time of an item or of all items, as defined by Che et al.
Parameters: - pdf : array-like
The probability density function of an item being requested
- cache_size : int
The size of the cache (in number of items)
- target : int, optional
The item index [1,N] for which characteristic time is requested. If not specified, the function calculates the characteristic time of all the items in the population.
Returns: - r : array of float or float
If target is None, returns an array with the characteristic times of all items in the population. If a target is specified, then it returns the characteristic time of only the specified item.
-
che_per_content_cache_hit_ratio
(pdf, cache_size, target=None)[source]¶ Estimate the cache hit ratio of an item or of all items using the Che’s approximation.
Parameters: - pdf : array-like
The probability density function of an item being requested
- cache_size : int
The size of the cache (in number of items)
- target : int, optional
The item index for which cache hit ratio is requested. If not specified, the function calculates the cache hit ratio of all the items in the population.
Returns: - cache_hit_ratio : array of float or float
If target is None, returns an array with the cache hit ratios of all items in the population. If a target is specified, then it returns the cache hit ratio of only the specified item.
-
che_cache_hit_ratio
(pdf, cache_size)[source]¶ Estimate the overall cache hit ratio of an LRU cache under generic IRM demand using the Che’s approximation.
Parameters: - pdf : array-like
The probability density function of an item being requested
- cache_size : int
The size of the cache (in number of items)
Returns: - cache_hit_ratio : float
The overall cache hit ratio
-
che_characteristic_time_simplified
(pdf, cache_size)[source]¶ Return the characteristic time of an LRU cache under a given IRM workload, as defined by Che et al. This function computes one single characteristic time for all contents. This further approximation is normally accurate for workloads with reduced skewness in their popularity distribution.
Parameters: - pdf : array-like
The probability density function of an item being requested
- cache_size : int
The size of the cache (in number of items)
Returns: - r : float
The characteristic time.
-
che_per_content_cache_hit_ratio_simplified
(pdf, cache_size, target=None)[source]¶ Estimate the cache hit ratio of an item or of all items using the Che’s approximation. This version uses a single characteristic time for all contents.
Parameters: - pdf : array-like
The probability density function of an item being requested
- cache_size : int
The size of the cache (in number of items)
- target : int, optional
The item index for which cache hit ratio is requested. If not specified, the function calculates the cache hit ratio of all the items in the population.
Returns: - cache_hit_ratio : array of float or float
If target is None, returns an array with the cache hit ratios of all items in the population. If a target is specified, then it returns the cache hit ratio of only the specified item.
-
che_cache_hit_ratio_simplified
(pdf, cache_size)[source]¶ Estimate the overall cache hit ratio of an LRU cache under generic IRM demand using the Che’s approximation. This version uses a single characteristic time for all contents.
Parameters: - pdf : array-like
The probability density function of an item being requested
- cache_size : int
The size of the cache (in number of items)
Returns: - cache_hit_ratio : float
The overall cache hit ratio
-
che_characteristic_time_generalized
(pdf, cache_size, policy, **policy_args)[source]¶ Return the characteristic time of a cache under a given IRM demand according to the the extension of Che’s approximation proposed by Martina et al. This function computes one single characteristic time for all content items.
Parameters: - pdf : array-like
The probability density function of an item being requested
- cache_size : int
The size of the cache (in number of items)
- policy : str
The cache replacement policy (‘LRU’, ‘q-LRU’, ‘FIFO’, ‘RANDOM’)
Returns: - r : float
The characteristic time.
-
che_per_content_cache_hit_ratio_generalized
(pdf, cache_size, policy, **policy_args)[source]¶ Estimate the cache hit ratio of an item or of all items in a cache subject to IRM demand according to the extension of Che’s approximation proposed by Martina et al.
Parameters: - pdf : array-like
The probability density function of an item being requested
- cache_size : int
The size of the cache (in number of items)
- policy : str, optional
The cache replacement policy (‘LRU’, ‘q-LRU’, ‘FIFO’, ‘RANDOM’)
Returns: - cache_hit_ratio : array of float or float
If target is None, returns an array with the cache hit ratios of all items in the population. If a target is specified, then it returns the cache hit ratio of only the specified item.
-
che_cache_hit_ratio_generalized
(pdf, cache_size, policy='LRU', **policy_args)[source]¶ Estimate the overall cache hit ratio of a cache subject to IRM demand according to the extension of Che’s approximation proposed by Martina et al.
Parameters: - pdf : array-like
The probability density function of an item being requested
- cache_size : int
The size of the cache (in number of items)
- policy : str, optional
The cache replacement policy (‘LRU’, ‘q-LRU’, ‘FIFO’, ‘RANDOM’)
Returns: - cache_hit_ratio : float
The overall cache hit ratio
-
laoutaris_characteristic_time
(alpha, population, cache_size, order=3)[source]¶ Estimates the Che’s characteristic time of an LRU cache under general power-law demand using the Laoutaris approximation.
Parameters: - alpha : float
The coefficient of the demand power-law
- population : int
The content population
- cache_size : int
The cache size
- order : int, optional
The order of the Taylor expansion. Supports only 2 and 3
Returns: - cache_hit_ratio : float
The cache hit ratio
References
-
laoutaris_per_content_cache_hit_ratio
(alpha, population, cache_size, order=3, target=None)[source]¶ Estimates the per-content cache hit ratio of an LRU cache under general power-law demand using the Laoutaris approximation.
Parameters: - alpha : float
The coefficient of the demand power-law distribution
- population : int
The content population
- cache_size : int
The cache size
- order : int, optional
The order of the Taylor expansion. Supports only 2 and 3
- target : int, optional
The item index [1,N] for which cache hit ratio is requested. If not specified, the function calculates the cache hit ratio of all the items in the population.
Returns: - cache_hit_ratio : array of float or float
If target is None, returns an array with the cache hit ratios of all items in the population. If a target is specified, then it returns the cache hit ratio of only the specified item.
References
-
laoutaris_cache_hit_ratio
(alpha, population, cache_size, order=3)[source]¶ Estimate the cache hit ratio of an LRU cache under general power-law demand using the Laoutaris approximation.
Parameters: - alpha : float
The coefficient of the demand power-law distribution
- population : int
The content population
- cache_size : int
The cache size
- order : int, optional
The order of the Taylor expansion. Supports only 2 and 3
Returns: - cache_hit_ratio : float
The cache hit ratio
References
-
optimal_cache_hit_ratio
(pdf, cache_size)[source]¶ Return the value of the optimal cache hit ratio of a cache under IRM stationary demand with a given pdf.
In practice this function returns the probability of a cache hit if cache is filled with the cache_size most popular times. This value also corresponds to the steady-state cache hit ratio of an LFU cache.
Parameters: - pdf : array-like
The probability density function of an item being requested
- cache_size : int
The size of the cache (in number of items)
Returns: - cache_hit_ratio : float
The optimal cache hit ratio
-
numeric_per_content_cache_hit_ratio
(pdf, cache, warmup=None, measure=None, seed=None, target=None)[source]¶ Numerically compute the per-content cache hit ratio of a cache under IRM stationary demand with a given pdf.
Parameters: - pdf : array-like
The probability density function of an item being requested
- cache : Cache
The cache object (i.e. the instance of a class subclassing icarus.Cache)
- warmup : int, optional
The number of warmup requests to generate. If not specified, it is set to 10 times the content population
- measure : int, optional
The number of measured requests to generate. If not specified, it is set to 30 times the content population
- seed : int, optional
The seed used to generate random numbers
- target : int, optional
The item index [1, N] for which cache hit ratio is requested. If not specified, the function calculates the cache hit ratio of all the items in the population.
Returns: - cache_hit_ratio : array of float or float
If target is None, returns an array with the cache hit ratios of all items in the population. If a target is specified, then it returns the cache hit ratio of only the specified item.
-
numeric_cache_hit_ratio
(pdf, cache, warmup=None, measure=None, seed=None)[source]¶ Numerically compute the cache hit ratio of a cache under IRM stationary demand with a given pdf.
Parameters: - pdf : array-like
The probability density function of an item being requested
- cache : Cache
The cache object (i.e. the instance of a class subclassing icarus.Cache)
- warmup : int, optional
The number of warmup requests to generate. If not specified, it is set to 10 times the content population
- measure : int, optional
The number of measured requests to generate. If not specified, it is set to 30 times the content population
- seed : int, optional
The seed used to generate random numbers
Returns: - cache_hit_ratio : float
The cache hit ratio
-
numeric_cache_hit_ratio_2_layers
(pdf, l1_cache, l2_cache, warmup=None, measure=None, seed=None)[source]¶ Numerically compute the cache hit ratio of a two-layer cache under IRM stationary demand with a given pdf.
Differently from the numeric_cache_hit_ratio function, this function allows users to compute the hits at layer 1, layer 2 and overall.
Parameters: - pdf : array-like
The probability density function of an item being requested
- cache : Cache
The cache object (i.e. the instance of a class subclassing icarus.Cache)
- warmup : int, optional
The number of warmup requests to generate. If not specified, it is set to 10 times the content population
- measure : int, optional
The number of measured requests to generate. If not specified, it is set to 30 times the content population
- seed : int, optional
The seed used to generate random numbers
Returns: - cache_hit_ratio : dict
Dictionary with keys “l1_hits”, “l2_hits” and “total_hits”
-
trace_driven_cache_hit_ratio
(workload, cache, warmup_ratio=0.25)[source]¶ Compute cache hit ratio of a cache under an arbitrary trace-driven workload.
Parameters: - workload : list or array
List of URLs or content identifiers extracted from a trace. This list only needs to contains content identifiers and not timestamps
- cache : Cache
Instance of a cache object
- warmup_ratio : float, optional
Ratio of requests of the workload used to warm up the cache (i.e. whose cache hit/miss results are discarded)
Returns: - cache_hit_ratio : float
The cache hit ratio
icarus.tools.stats module¶
Provides statistical utilities functions used by the simulator
-
class
DiscreteDist
(pdf, seed=None)[source]¶ Bases:
object
Implements a discrete distribution with finite population.
The support must be a finite discrete set of contiguous integers {1, …, N}. This definition of discrete distribution.
Attributes: Methods
rv
()Get rand value from the distribution -
cdf
¶ Return the Cumulative Density Function (CDF)
Returns: - cdf : Numpy array
Array representing cdf
-
pdf
¶ Return the Probability Density Function (PDF)
Returns: - pdf : Numpy array
Array representing the probability density function of the distribution
-
-
class
TruncatedZipfDist
(alpha=1.0, n=1000, seed=None)[source]¶ Bases:
icarus.tools.stats.DiscreteDist
Implements a truncated Zipf distribution, i.e. a Zipf distribution with a finite population, which can hence take values of alpha > 0.
Attributes: Methods
rv
()Get rand value from the distribution -
alpha
¶
-
-
means_confidence_interval
(data, confidence=0.95)[source]¶ Computes the confidence interval for a given set of means.
Parameters: - data : array-like
The set of samples whose confidence interval is calculated
- confidence : float, optional
The confidence level. It must be a value in the interval (0, 1)
Returns: - mean : float
The mean of the sample
- err : float
The standard error of the sample
References
- [1] N. Matloff, From Algorithms to Z-Scores: Probabilistic and Statistical
- Modeling in Computer Science. Available: http://heather.cs.ucdavis.edu/probstatbook
-
proportions_confidence_interval
(data, confidence)[source]¶ Computes the confidence interval of a proportion.
Parameters: - data : array-like of bool
The sample of data whose proportion of True values needs to be estimated
- confidence : float, optional
The confidence level. It must be a value in the interval (0, 1)
References
- [1] N. Matloff, From Algorithms to Z-Scores: Probabilistic and Statistical
- Modeling in Computer Science. Available: http://heather.cs.ucdavis.edu/probstatbook
icarus.tools.traces module¶
Functions for importing and analyzing traffic traces
-
frequencies
(data)[source]¶ Extract frequencies from traces. Returns array of sorted frequencies
Parameters: - data : array-like
An array of generic data (i.e. URLs of web pages)
Returns: - frequencies : array of int
The frequencies of the data sorted in descending order
Notes
This function does not return the mapping between data elements and their frequencies, it only returns frequencies. This function can be used to get frequencies to pass to the zipf_fit function given a set of data, e.g. content request traces.
-
one_timers
(data)[source]¶ Return fraction of contents requested only once (i.e., one-timers)
Parameters: - data : array-like
An array of generic data (i.e. URLs of web pages)
Returns: - one_timers : float
Fraction of content objects requested only once.
-
trace_stats
(data)[source]¶ Print full stats of a trace
Parameters: - data : array-like
An array of generic data (i.e. URLs of web pages)
-
zipf_fit
(obs_freqs, need_sorting=False)[source]¶ Returns the value of the Zipf’s distribution alpha parameter that best fits the data provided and the p-value of the fit test.
Parameters: - obs_freqs : array
The array of observed frequencies sorted in descending order
- need_sorting : bool, optional
If True, indicates that obs_freqs is not sorted and this function will sort it. If False, assume that the array is already sorted
Returns: - alpha : float
The alpha parameter of the best Zipf fit
- p : float
The p-value of the test
Notes
This function uses the method described in http://stats.stackexchange.com/questions/6780/how-to-calculate-zipfs-law-coefficient-from-a-set-of-top-frequencies
-
parse_url_list
(path)[source]¶ Parse traces from a text file where each line contains a URL requested without timestamp or counters
Parameters: - path : str
The path to the trace file to parse
Returns: - trace : iterator of strings
An iterator whereby each element is dictionary expressing all attributes of an entry of the trace
-
parse_wikibench
(path)[source]¶ Parses traces from the Wikibench dataset
Parameters: - path : str
The path to the trace file to parse
Returns: - trace : iterator of dicts
An iterator whereby each element is dictionary expressing all attributes of an entry of the trace
-
parse_squid
(path)[source]¶ Parses traces from a Squid log file. Parse a Squid log file.
Squid is an HTTP reverse proxy. Its logs contains traces of all HTTP requests served and can be used for trace-driven simulations based on realistic HTTP workloads. Traces from the IRCache dataset are in this format.
Parameters: - path : str
The path to the trace file to parse
Returns: - trace : iterator of dicts
An iterator whereby each element is dictionary expressing all attributes of an entry of the trace
Notes
Documentation describing the Squid log format is available here: http://wiki.squid-cache.org/Features/LogFormat
-
parse_youtube_umass
(path)[source]¶ Parse YouTube collected at UMass campus network [1]_.
These data were collected at UMass campus network over a a measurement period between June 2007 and March 2008.
This function parses the request traces, named youtube.parsed.X.Y.dat. Each entry of the trace provides the following information elements:
- Timestamp
- YouTube server IP (anonymized)
- Client IP (anonymized)
- Request
- Video ID
- Content server IP
Traces are available at http://traces.cs.umass.edu/index.php/Network/Network
Parameters: - path : str
The path to the trace file to parse
Returns: - trace : iterator of dicts
An iterator whereby each element is dictionary expressing all attributes of an entry of the trace
References
- ..[1] Michael Zink, Kyoungwon Suh, Yu Gu and Jim Kurose,
- Watch Global Cache Local: YouTube Network Traces at a Campus Network - Measurements and Implications, in Proc. of IEEE MMCN‘08
-
parse_common_log_format
(path)[source]¶ Parse files saved in the Common Log Format (CLF)
Parameters: - path : str
The path to the Common Log Format file to parse
Returns: - events : iterator
iterator over the events parsed from the file
Notes
Common Log Format specifications: http://www.w3.org/Daemon/User/Config/Logging.html#common-logfile-format
Module contents¶
This package contains code providing various functionalities useful in caching research.
Examples include code for importing and analyzing traffic traces, modeling the behavior of caches and statistical utilities.