“I have been working with Mahfuzur for over 2 years now, and I would highly recommend him for being a great team player, a keen observer, and above all a very down-to-earth person.”
Mahfuzur Rahman
Greater Minneapolis-St. Paul Area
921 followers
500+ connections
About
I am a Machine Learning Engineer with proven experience building data-driven solutions to…
Experience
Education
-
University of Minnesota-Twin Cities
-
Activities and Societies: • UMN Squash Club • Bangladeshi Student Association
Research focus in Bioinformatics and Computational biology (Applied Data Science & ML)
-
-
Majored in machine learning and pattern recognition. Devised a bioinformatics method to calculate the evolutionary relationships between species as part of my undergraduate thesis.
Licenses & Certifications
Volunteer Experience
-
Computer Instructor
CAFFE (http://www.caffebd.org/)
- 1 year 3 months
Education
CAFFE is a non-profit organization devoted to providing education and computer training to underprivileged children in Bangladesh. As a member of CAFFE, I:
• Volunteered as a computer instructor to educate children through technology.
• Tutored students in preliminary mathematics through educational computer games.
• Trained them to use various entertainment software (music, painting, etc.).
Treasurer
Bangladeshi Student Association
- 3 years
Arts and Culture
Managed the funding and treasury of the Bangladeshi Student Association, a cultural group at the University of Minnesota.
-
Open Source Developer
KDE
- 2 months
Education
Mentored high school students in coding and documentation as part of Google Code-in 2011.
-
Scientific Reviewer
ISMB, ECCB, RECOMB, JOSS
- Present 7 years 5 months
Education
● Review research papers on applications of machine learning and statistical approaches in computational biology/bioinformatics.
● Reviewed 10+ research articles for various conferences to date.
Analytics and Strategy Planning Chair
Omdena Bangladesh Chapter
- Present 3 years 2 months
Science and Technology
Promote AI/ML education in Bangladesh and empower Bangladeshi organizations to utilize AI to solve real-world problems.
Publications
-
Dimensionality reduction methods for extracting functional networks from large-scale CRISPR screens. Arshia Hassan, Henry Ward, Mahfuzur Rahman, et al.
Molecular Systems Biology / EMBOpress
CRISPR-Cas9 screens facilitate the discovery of gene functional relationships and phenotype-specific dependencies. The Cancer Dependency Map (DepMap) is the largest compendium of whole-genome CRISPR screens aimed at identifying cancer-specific genetic dependencies across human cell lines. A mitochondria-associated bias has been previously reported to mask signals for genes involved in other functions, and thus, methods for normalizing this dominant signal to improve co-essentiality networks are of interest. In this study, we explore three unsupervised dimensionality reduction methods—autoencoders, robust, and classical principal component analyses (PCA)—for normalizing the DepMap to improve functional networks extracted from these data. We propose a novel “onion” normalization technique to combine several normalized data layers into a single network. Benchmarking analyses reveal that robust PCA combined with onion normalization outperforms existing methods for normalizing the DepMap. Our work demonstrates the value of removing low-dimensional signals from the DepMap before constructing functional gene networks and provides generalizable dimensionality reduction-based normalization tools.
-
A method for benchmarking genetic screens reveals a predominant mitochondrial bias. Mahfuzur Rahman*, Maximilian Billmann* et al.
Molecular Systems Biology
We present FLEX (Functional evaluation of experimental perturbations), a pipeline that leverages several functional annotation resources to establish reference standards for benchmarking human genome-wide CRISPR screen data and methods for analyzing them. FLEX provides a quantitative measurement of the functional information captured by a given gene-pair dataset and a means to explore the diversity of functions captured by the input dataset. We apply FLEX to analyze data from the diverse cell line screens generated by the DepMap project. We identify a predominant mitochondria-associated signal within co-essentiality networks derived from these data and explore the basis of this signal. Our analysis and time-resolved CRISPR screens in a single cell line suggest that the variable phenotypes associated with mitochondria genes across cells may reflect screen dynamics and protein stability effects rather than genetic dependencies. We characterize this functional bias and demonstrate its relevance for interpreting differential hits in any CRISPR screening context. More generally, we demonstrate the utility of the FLEX pipeline for performing robust comparative evaluations of CRISPR screens or methods for processing them.
-
Environmental robustness of the global yeast genetic interaction network. Michael Costanzo*, Jing Hou*, Vincent Messier*, Justin Nelson*, Mahfuzur Rahman*, et al.
Science
Phenotypes associated with genetic variants can be altered by interactions with other genetic variants (GxG), with the environment (GxE), or both (GxGxE). Yeast genetic interactions have been mapped on a global scale, but the environmental influence on the plasticity of genetic networks has not been examined systematically. To assess environmental rewiring of genetic networks, we examined 14 diverse conditions and scored 30,000 functionally representative yeast gene pairs for dynamic, differential interactions. Different conditions revealed novel differential interactions, which often uncovered new functional connections between distantly related gene pairs. However, the majority of observed genetic interactions remained unchanged in different conditions, suggesting that the global yeast genetic interaction network is robust to environmental perturbation and captures the fundamental functional architecture of a eukaryotic cell.
-
τ-SGA: synthetic genetic array analysis for systematically screening and quantifying trigenic interactions in yeast. Elena Kuzmin, Mahfuzur Rahman, et al.
Nature Protocols
Systematic complex genetic interaction studies have provided insight into high-order functional redundancies and genetic network wiring of the cell. Here, we describe a method for screening and quantifying trigenic interactions from ordered arrays of yeast strains grown on agar plates as individual colonies. The protocol instructs users on the trigenic synthetic genetic array analysis technique, τ-SGA, for high-throughput screens. The steps describe construction of the double-mutant query strains and the corresponding single-mutant control query strains, which are screened in parallel in two replicates. The screening experimental set-up consists of sequential replica-pinning steps that enable automated mating, meiotic recombination and successive haploid selection steps for the generation of triple mutants, which are scored for colony size as a proxy for fitness, which enables the calculation of trigenic interactions. The procedure described here was used to conduct 422 trigenic interaction screens, which generated ~460,000 yeast triple mutants for trigenic interaction analysis. Users should be familiar with robotic equipment required for high-throughput genetic interaction screens and be proficient at the command line to execute the scoring pipeline. Large-scale screen computational analysis is achieved by using MATLAB pipelines that score raw colony size data to produce τ-SGA interaction scores. Additional recommendations are included for optimizing experimental design and analysis of smaller-scale trigenic interaction screens by using a web-based analysis system, SGAtools. This protocol provides a resource for those who would like to gain a deeper, more practical understanding of trigenic interaction screening and quantification methodology.
-
Systematic mapping of genetic interactions for de novo fatty acid synthesis identifies C12orf49 as a regulator of lipid metabolism
Nature Metabolism
The de novo synthesis of fatty acids has emerged as a therapeutic target for various diseases, including cancer. Because cancer cells are intrinsically buffered to combat metabolic stress, it is important to understand how cells may adapt to the loss of de novo fatty acid biosynthesis. Here, we use pooled genome-wide CRISPR screens to systematically map genetic interactions (GIs) in human HAP1 cells carrying a loss-of-function mutation in fatty acid synthase (FASN), whose product catalyses the formation of long-chain fatty acids. FASN-mutant cells show a strong dependence on lipid uptake that is reflected in negative GIs with genes involved in the LDL receptor pathway, vesicle trafficking and protein glycosylation. Further support for these functional relationships is derived from additional GI screens in query cell lines deficient in other genes involved in lipid metabolism, including LDLR, SREBF1, SREBF2 and ACACA. Our GI profiles also identify a potential role for the previously uncharacterized gene C12orf49 (which we call LUR1) in regulation of exogenous lipid uptake through modulation of SREBF2 signalling in response to lipid starvation. Overall, our data highlight the genetic determinants underlying the cellular adaptation associated with loss of de novo fatty acid synthesis and demonstrate the power of systematic GI mapping for uncovering metabolic buffering mechanisms in human cells.
-
Genome-wide identification of quantitative genetic interactions in human cells using CRISPR/Cas9 screens. Maximilian Billmann, Michael Costanzo, A H M Mahfuzur Rahman et al.
In preparation
A major focus of systems biology and genomic medicine is to link genotype to phenotype, yet accurately predicting disease states from genome sequence remains a major challenge. Using lessons learned from the model organism yeast, we systematically mapped genome-wide genetic interactions using CRISPR/Cas9 in human cells. We performed 180 genome-wide screens using HAP1 query cell lines carrying loss-of-function mutations in genes in diverse bioprocesses, along with more than 30 screens in wildtype (wt) HAP1 cells. Overall, we screened more than 3 million unique gene pairs for interactions, representing the largest effort to date to study double mutant phenotypes in isogenic human cells. We developed a computational pipeline to identify quantitative genetic interactions (qGI) from these data. We identified several unexpected statistical artifacts in loss-of-function screens including interactions caused by variation between wt HAP1 screens and potential clonal effects of HAP1 cells harboring a loss-of-function mutation. We describe statistical elements of the qGI scoring pipeline designed to normalize these effects and insights we gained about interpreting phenotypes from CRISPR/Cas9 screens in the process. We also describe what we have learned about the topology of negative and positive genetic interactions in human cells, the power of genetic interaction profiles to define gene function across the genome, and their connections to other types of functional relationships, many of which are conserved from yeast to human cells. In summary, we performed a large number of genome-wide CRISPR/Cas9 screens in specific genetic backgrounds and developed a computational pipeline that will guide the generation of a genome-wide reference genetic interaction network in human cells.
-
A genome-wide screen reveals a role for the HIR histone chaperone complex in preventing mislocalization of budding yeast CENP-A
Genetics
Centromeric localization of the evolutionarily conserved centromere-specific histone H3 variant CENP-A (Cse4 in yeast) is essential for faithful chromosome segregation. Overexpression and mislocalization of CENP-A lead to chromosome segregation defects in yeast, flies, and human cells. Overexpression of CENP-A has been observed in human cancers; however, the molecular mechanisms preventing CENP-A mislocalization are not fully understood. Here, we used a genome-wide synthetic genetic array (SGA) to identify gene deletions that exhibit synthetic dosage lethality (SDL) when Cse4 is overexpressed. Deletion for genes encoding the replication-independent histone chaperone HIR complex (HIR1, HIR2, HIR3, HPC2) and a Cse4-specific E3 ubiquitin ligase, PSH1, showed highest SDL. We defined a role for Hir2 in proteolysis of Cse4 that prevents mislocalization of Cse4 to noncentromeric regions for genome stability. Hir2 interacts with Cse4 in vivo, and hir2∆ strains exhibit defects in Cse4 proteolysis and stabilization of chromatin-bound Cse4. Mislocalization of Cse4 to noncentromeric regions with a preferential enrichment at promoter regions was observed in hir2∆ strains. We determined that Hir2 facilitates the interaction of Cse4 with Psh1, and that defects in Psh1-mediated proteolysis contribute to increased Cse4 stability and mislocalization of Cse4 in the hir2∆ strain. In summary, our genome-wide screen provides insights into pathways that regulate proteolysis of Cse4 and defines a novel role for the HIR complex in preventing mislocalization of Cse4 by facilitating proteolysis of Cse4, thereby promoting genome stability.
-
Evaluation and design of genome-wide CRISPR/SpCas9 knockout screens
G3: Genes, Genomes, Genetics
The adaptation of CRISPR/SpCas9 technology to mammalian cell lines is transforming the study of human functional genomics. Pooled libraries of CRISPR guide RNAs (gRNAs) targeting human protein-coding genes and encoded in viral vectors have been used to systematically create gene knockouts in a variety of human cancer and immortalized cell lines, in an effort to identify whether these knockouts cause cellular fitness defects. Previous work has shown that CRISPR screens are more sensitive and specific than pooled-library shRNA screens in similar assays, but currently there exists significant variability across CRISPR library designs and experimental protocols. In this study, we reanalyze 17 genome-scale knockout screens in human cell lines from three research groups, using three different genome-scale gRNA libraries. Using the Bayesian Analysis of Gene Essentiality algorithm to identify essential genes, we refine and expand our previously defined set of human core essential genes from 360 to 684 genes. We use this expanded set of reference core essential genes, CEG2, plus empirical data from six CRISPR knockout screens to guide the design of a sequence-optimized gRNA library, the Toronto KnockOut version 3.0 (TKOv3) library. We then demonstrate the high effectiveness of the library relative to reference sets of essential and nonessential genes, as well as other screens using similar approaches. The optimized TKOv3 library, combined with the CEG2 reference set, provide an efficient, highly optimized platform for performing and assessing gene knockout screens in human cell lines.
-
Effective Sparse Dynamic Programming Algorithms for Merged and Block Merged LCS Problems
Journal of Computers
The longest common subsequence problem has been widely studied and used to find out the relationship between sequences. In this paper, we study the interleaving relationship between sequences. Given a target sequence T and two merging sequences A and B, we need to find out the LCS between M(A, B) and T, where M(A, B) denotes the merging sequence of A and B. We first present an O((Rr + Pm) log log r)-time algorithm, where |T| = n, |A| = m, |B| = r, R is the total number of ordered pairs of positions at which the two strings A and T match, and P denotes the total number of ordered pairs of positions at which the two strings B and T match. We also propose an algorithm to solve a variation of the problem where a block constraint arises. The running time of the blocked version is O(max{Rβ log log r, Pα log log r}), where α denotes the number of blocks in A and β denotes the number of blocks in B.
Courses
-
Advanced Genetics and Genomics
GCD 8131
-
Algorithms
CSE 207
-
Artificial Intelligence
CSE 401
-
Bayesian Statistics: From Concept to Data Analysis
Coursera
-
Biostatistics I
PUBH 6450
-
Computational Techniques for Genomics
CSCI 5481
-
Data Structures
CSE 203
-
Functional Genomics, Systems Biology, and Bioinformatics
CSCI 5461
-
Introduction to Data Engineering
DataCamp
-
Introduction to Data Mining
CSCI 5523
-
Introduction to Machine Learning
CSCI 5521
-
Machine Learning Crash Course
Google
-
Matrix, Vectors, Fourier Analysis and Laplace Transforms
MATH 243
-
Molecular Cell Biology
GCD 5036
-
Python for Genomic Data Science
Coursera
-
Writing in English at University
Coursera
Projects
-
A reference network of human genetic interactions
- Present
• Quantified reproducibility of genetic interactions (generated by collaborators) and systematically evaluated biological importance of genetic interaction scores.
• Calculated statistical enrichment of biological modules (i.e. protein complexes) in the network.
• Predicted a set of ~1600 essential genes for human HAP1 cell lines using Random Forest.
• Collaborated in a cross-disciplinary setting resulting in two publications (one in Nature Metabolism) and four more in preparation.
TECHNICAL SKILLS
• R (ggplot2, caret, ranger), Python (pyMC3, scikit-learn)
• Hierarchical clustering (Cluster 3.0), MCL (Markov Clustering), PCA, random forest classifier, class imbalance, SMOTE, cross-validation, interpretable machine learning, probabilistic modeling, Markov chain Monte Carlo (MCMC; Gibbs within Metropolis)
Benchmarking and visual interpretation of genome-wide experimental screens
-
• Devised and implemented an R package named FLEX (also available in MATLAB) for systematic performance evaluation of competing methods to score experimental screens.
• Upgraded FLEX for automatic data visualization and extensibility with additional evaluation datasets (currently supports Complex, Pathway, and Gene Ontology).
• Generated biological insights by applying FLEX to the state-of-the-art DepMap CRISPR screens.
TECHNICAL SKILLS
• Data cleaning, machine learning, PCA, performance comparison, precision-recall, similarity metrics, local regression (loess), class imbalance, data visualization.
• R (Bioconductor), MATLAB (bioinformatics toolbox and machine learning toolbox)
• CRISPR, genome-wide screens, cancer biology.
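Benchmarking a gene-pair dataset against a reference standard (for example, co-complex gene pairs) comes down to ranking pairs by score and measuring how many top-ranked pairs are in the reference. The Python sketch below illustrates that generic precision-recall idea; it is not FLEX's actual R implementation, and the function name `pr_points` and the (gene, gene, score) input layout are hypothetical.

```python
from typing import Iterable, List, Tuple


def pr_points(
    scored_pairs: Iterable[Tuple[str, str, float]],
    reference_pairs: Iterable[Tuple[str, str]],
) -> List[Tuple[int, float, float]]:
    """Return (true positives, precision, recall) at each rank.

    Hypothetical sketch: pairs are ranked by descending score, and a pair counts
    as a true positive if it appears (in either order) in the reference standard.
    """
    reference = {frozenset(pair) for pair in reference_pairs}
    ranked = sorted(scored_pairs, key=lambda item: item[2], reverse=True)
    points: List[Tuple[int, float, float]] = []
    true_positives = 0
    for rank, (gene_a, gene_b, _score) in enumerate(ranked, start=1):
        if frozenset((gene_a, gene_b)) in reference:
            true_positives += 1
        recall = true_positives / len(reference) if reference else 0.0
        points.append((true_positives, true_positives / rank, recall))
    return points


if __name__ == "__main__":
    pairs = [("A", "B", 0.9), ("C", "D", 0.8), ("A", "C", 0.5)]
    reference = [("B", "A"), ("A", "C")]
    for tp, precision, recall in pr_points(pairs, reference):
        print(tp, round(precision, 3), recall)
```

Treating pairs as unordered (`frozenset`) matters because similarity-based gene-pair datasets are usually symmetric, so (A, B) and (B, A) should match the same reference entry.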
Yeast Genetic Interaction (GI) Network
-
• Maintained a yeast GI scoring pipeline (> 10,000 lines of MATLAB and Python code) and applied it to new experimental screens.
• Enhanced the yeast GI scoring pipeline to solve new biological problems.
• Collaborated cross-functionally with 4 labs on 3 projects, resulting in 2 publications (one in Nature Protocols) and one more in peer review (in Science).
TECHNICAL SKILLS
• MATLAB, Synthetic Genetic Array (SGA), GitHub, clustering, Treeview
Automated photo enhancements for Samsung Cameras
-
• Researched and developed several image enhancement algorithms including cartoon effect, night vision effect, and pencil sketch (black and white) effect.
• Implemented a fully automated non-photorealistic rendering technique for color pencil sketch drawing, making it 5 times more efficient than previous state-of-the-art algorithms.
• Deployed the solution using Microsoft .NET and MS Azure.
TECHNICAL SKILLS
• C++, C#, Microsoft Azure
• Image processing, image segmentation, edge detection
Document Layout Recognition (GSOC 2011)
-
• Implemented a document layout recognition engine to recognize a variety of document types (PDF, DjVu, epub, etc.) using recursive XY cut-based algorithms.
• Upgraded the engine to tolerate ill-formed technical documents.
• Integrated the layout recognition engine with Okular, an open-source document viewer.
TECHNICAL SKILLS
• Open-source development, C++, Qt, optical character recognition, recursive XY cut
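Recursive XY cut splits a page image at whitespace gaps in its horizontal and vertical projection profiles, alternating directions until no further cut is possible; each leaf is a layout block. The Python sketch below is a generic textbook version, not the engine integrated into Okular. It assumes the page is a binary image given as a list of 0/1 rows, and the `min_gap` threshold and function names are hypothetical.

```python
from typing import List, Optional, Tuple

# (top, left, bottom, right), with bottom and right exclusive
Box = Tuple[int, int, int, int]


def _segments(profile: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Group non-empty positions of a projection profile into [start, end) runs,
    closing a run only after at least `min_gap` consecutive empty positions."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    end, gap = 0, 0
    for idx, ink in enumerate(profile):
        if ink:
            if start is None:
                start = idx
            end, gap = idx + 1, 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, end))
                start = None
    if start is not None:
        segments.append((start, end))
    return segments


def xy_cut(page: List[List[int]], min_gap: int = 1) -> List[Box]:
    """Recursively split a binary page (1 = ink) into rectangular blocks."""
    blocks: List[Box] = []

    def recurse(top: int, left: int, bottom: int, right: int, axis: int, tried_other: bool) -> None:
        region = [row[left:right] for row in page[top:bottom]]
        # axis 0: project ink onto rows (horizontal cuts); axis 1: onto columns (vertical cuts)
        if axis == 0:
            profile = [sum(row) for row in region]
        else:
            profile = [sum(col) for col in zip(*region)]
        segments = _segments(profile, min_gap)
        if not segments:
            return  # blank region
        if len(segments) == 1:
            start, end = segments[0]
            # Tighten the box to the inked extent along this axis.
            if axis == 0:
                top, bottom = top + start, top + end
            else:
                left, right = left + start, left + end
            if tried_other:
                blocks.append((top, left, bottom, right))  # no cut in either direction
            else:
                recurse(top, left, bottom, right, 1 - axis, True)
            return
        for start, end in segments:
            if axis == 0:
                recurse(top + start, left, top + end, right, 1, False)
            else:
                recurse(top, left + start, bottom, left + end, 0, False)

    if page and page[0]:
        recurse(0, 0, len(page), len(page[0]), 0, False)
    return blocks


if __name__ == "__main__":
    # Two side-by-side blocks above one full-width block.
    page = [
        [1, 1, 0, 0, 1, 1],
        [1, 1, 0, 0, 1, 1],
        [0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1],
    ]
    print(xy_cut(page))
```

Raising `min_gap` makes the cut ignore narrow gaps (for example, spaces between words) so that only wider gutters between columns or paragraphs separate blocks.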
Algorithms for Merged LCS problem (Undergraduate Thesis)
-
Given two sequences A and B, we created a merged sequence M(A,B) by taking different subsequences from A and B and merging (interleaving) them. We then compared M(A,B) against a third sequence T by computing the longest common subsequence (LCS) of T and M(A,B). The resulting LCS is called the Merged LCS; the Block Merged LCS problem arises when the input sequences A and B are blocked sequences.
The number of matching position pairs between A and T is denoted R, and between B and T it is denoted P. Instead of comparing all possible characters pairwise, we kept track of the matches and computed values only at matching positions. When the number of matches is relatively small, the algorithm runs considerably faster than previous state-of-the-art algorithms, and its space complexity is also significantly lower than that of competing algorithms in both the blocked and non-blocked cases.
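For comparison, the Merged LCS can also be computed with a dense dynamic program over prefixes of T, A, and B in O(|T|·|A|·|B|) time and space. The Python sketch below is that straightforward baseline, not the sparse match-based algorithm described above; the function name `merged_lcs` is hypothetical.

```python
def merged_lcs(t: str, a: str, b: str) -> int:
    """Length of the longest common subsequence of t and some merge of a and b.

    Dense baseline: dp[i][j][k] is the Merged LCS of t[:i], a[:j], b[:k].
    """
    n, m, r = len(t), len(a), len(b)
    dp = [[[0] * (r + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(m + 1):
            for k in range(r + 1):
                best = dp[i - 1][j][k]  # t[i-1] left unmatched
                if j > 0:
                    best = max(best, dp[i][j - 1][k])  # a[j-1] left unused
                    if t[i - 1] == a[j - 1]:
                        best = max(best, dp[i - 1][j - 1][k] + 1)  # match t with a
                if k > 0:
                    best = max(best, dp[i][j][k - 1])  # b[k-1] left unused
                    if t[i - 1] == b[k - 1]:
                        best = max(best, dp[i - 1][j][k - 1] + 1)  # match t with b
                dp[i][j][k] = best
    return dp[n][m][r]


if __name__ == "__main__":
    # The merge "abcd" of "ad" and "bc" matches all of T.
    print(merged_lcs("abcd", "ad", "bc"))
```

Every cell is filled regardless of whether characters match; a sparse approach instead visits only the R + P matching position pairs, which is where the speedup comes from when matches are few.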
Organizations
-
International Society for Computational Biology
-
- Present
'A scholarly society for advancing understanding of living systems through computation and for communicating scientific advances worldwide.'
Recommendations received
3 people have recommended Mahfuzur