About Me
I am a distributed systems researcher at NVIDIA. Before joining NVIDIA, I was a senior researcher in the network intelligence unit at RISE. I completed my doctoral studies in the Network Systems Laboratory (NSLab) at KTH Royal Institute of Technology under the supervision of Professor Dejan Kostić and Professor Gerald Q. Maguire Jr. I received my B.Sc. in Electrical Engineering (Electronics) from Sharif University of Technology, Tehran, Iran, and my M.Sc. in Electrical Engineering (Digital Electronic Systems) from Amirkabir University of Technology, Tehran, Iran.
For more information, please check my résumé or full CV.
I received a 2021 Google Ph.D. Fellowship in Systems and Networking; see my interview with KTH EECS.
Research
My research interests include computer networks and networked systems. During my doctoral studies, I improved the performance of Network Functions Virtualization (NFV) service chains running at 100/200 Gbps by using low-level optimization techniques. You can read more in my licentiate thesis and doctoral dissertation. For a highlight of my research in Swedish, see the Framtidens Forskning article “Optimerar cacheminnet för snabbare internettjänster”.
Publications
FAJITA: Stateful Packet Processing at 100 Million pps (CoNEXT’24)
Hamid Ghasemirahni, Alireza Farshin, Mariano Scazzariello, Gerald Q. Maguire Jr., Dejan Kostić, Marco Chiesa
[Paper]
FAJITA proposes an optimized processing pipeline for stateful network functions to minimize memory accesses and overcome the overheads of accessing shared data structures while ensuring efficient batch processing at every stage of the pipeline.
NetConfEval: Can LLMs Facilitate Network Configuration? (CoNEXT’24)
Changjie Wang, Mariano Scazzariello, Alireza Farshin, Simone Ferlin, Dejan Kostić, Marco Chiesa
[Paper] [GitHub Repository] [HuggingFace Dataset]
We propose a benchmark, called NetConfEval, to quantify the benefits and challenges of using Large Language Models (LLMs) for configuring networks.
Deploying Stateful Network Functions Efficiently using Large Language Models (EuroMLSys’24)
Hamid Ghasemirahni, Alireza Farshin, Mariano Scazzariello, Marco Chiesa, Dejan Kostić
[Paper] [Slides]
We use large language models (LLMs) to extract useful information from stateful network functions and find the optimal RSS configuration to minimize shared memory accesses.
Overcoming the IOTLB wall for multi-100-Gbps Linux-based networking (PeerJ CS)
Alireza Farshin, Luigi Rizzo, Khaled Elmeleegy, Dejan Kostić
[Paper] [Reviews]
We (i) characterize the performance of the IOMMU and IOTLB on recent Intel Xeon Scalable and AMD EPYC processors at 200 Gbps and (ii) explore opportunities to mitigate their performance overheads in the Linux kernel. Our evaluation shows that using hugepage-backed buffers can completely recover the throughput drop of up to 20% introduced by the IOMMU.
Packet Order Matters! Improving Application Performance by Deliberately Delaying Packets (NSDI’22)
Hamid Ghasemirahni, Tom Barbette, Georgios Katsikas, Alireza Farshin, Massimo Girondi, Amir Roozbeh, Marco Chiesa, Gerald Q. Maguire Jr., Dejan Kostić
(Acceptance Rate Spring: 28/104 ≈ 26.9%)
[🏅Community Award Winner!]
[Paper] [Slides] [Video]
We systematically study the impact of temporal and spatial traffic locality on the performance of commodity servers equipped with high-speed network interfaces. Our results show that (i) the performance of a variety of widely deployed applications degrades substantially with even the slightest lack of traffic locality, and (ii) a traffic trace from our organization reveals poor traffic locality, as networking protocols, drivers, and the underlying switching/routing fabric spread packets out in time (reducing locality). To address these issues, we built Reframer, a software solution that deliberately delays packets and reorders them to increase traffic locality. Despite introducing µs-scale delays for some packets, we show that Reframer increases the throughput of a network service chain by up to 84% and reduces the flow completion time of a web server by 11% while improving its throughput by 20%. This work has been featured in the Ericsson Blog and KTH news.
PacketMill: Toward per-core 100-Gbps Networking (ASPLOS’21)
Alireza Farshin, Tom Barbette, Amir Roozbeh, Gerald Q. Maguire Jr., and Dejan Kostic
(Acceptance Rate: 75/398 ≈ 18.8%)
[Paper] [Extended Abstract] [Slides] [Video] [Webpage] [FOSDEM’21]
We present PacketMill, a system for optimizing software packet processing, which (i) introduces a new model to efficiently manage packet metadata and (ii) employs code-optimization techniques to better utilize commodity hardware. PacketMill grinds the whole packet processing stack, from the high-level network function configuration file to the low-level userspace network (specifically DPDK) drivers, to mitigate inefficiencies and produce a customized binary for a given network function. Our evaluation results show that PacketMill increases throughput (by up to 36.4 Gbps, or 70%) and reduces latency (by up to 101 µs, or 28%), and that it enables nontrivial packet processing (e.g., a router) at ≈100 Gbps, when new packets arrive >10× faster than main memory access times, while using only one processing core. This work has been featured in the Ericsson Blog and on Twitter.
Reexamining Direct Cache Access to Optimize I/O Intensive Applications for Multi-hundred-gigabit Networks (ATC’20)
Alireza Farshin, Amir Roozbeh, Gerald Q. Maguire Jr., and Dejan Kostic
(Acceptance Rate: 65/348 ≈ 18.6%)
[Paper] [Slides] [Poster] [Video]
We study the current implementation of Direct Cache Access (DCA) in Intel processors, called Data Direct I/O (DDIO) technology. Our paper shows that it is important to understand the details of DDIO and to tune/optimize it appropriately for a given Internet service to achieve high performance, especially with the introduction of multi-hundred-gigabit networks. A preliminary version of this paper was presented in the EuroSys’20 poster session.
Make the Most out of Last Level Cache in Intel Processors (EuroSys’19)
Alireza Farshin, Amir Roozbeh, Gerald Q. Maguire Jr., and Dejan Kostic
(Acceptance Rate: 45/207 ≈ 21.7%)
[Paper] [Slides] [Poster] [Video]
We exploited the characteristics of non-uniform cache architecture (NUCA) in recent Intel processors to introduce a new memory management scheme. Our results showed that the proposed scheme could reduce tail latencies in latency-critical Network Functions Virtualization (NFV) service chains by 21.5%. Furthermore, our work demonstrated that optimizing computer systems and taking advantage of nanosecond improvements could have a higher impact on the performance of networking applications. This work has been featured in the Ericsson Blog, Tech Xplore, AlphaGalileo, Twitter, KTH main page, and KTH EECS news.
Personal
In my free time, I play piano and bass guitar. I am also trying to teach myself to compose minimal music; check out my SoundCloud playlist.