Mahfuzur Rahman

Greater Minneapolis-St. Paul Area
921 followers 500+ connections

About

I am a Machine Learning Engineer with proven experience building data-driven solutions to…

Experience

  • Lowe's Companies, Inc.

    Greater Minneapolis-St. Paul Area

Education

  • University of Minnesota-Twin Cities

    Activities and Societies: UMN Squash Club, Bangladeshi Student Association

    Research focus in bioinformatics and computational biology (applied data science and ML)

  • -

    Majored in machine learning and pattern recognition. Devised a bioinformatics method to calculate the evolutionary relationships between species as part of my undergraduate thesis.

Volunteer Experience

  • Computer Instructor

    CAFFE (http://www.caffebd.org/)

    1 year 3 months

    Education

    CAFFE is a non-profit organization devoted to providing education and computer training to underprivileged children in Bangladesh. As a member of CAFFE, I:
    • Volunteered as a computer instructor to educate children through technology.
    • Tutored students in preliminary mathematics through educational computer games.
    • Trained students in the use of various entertainment-related software (music, painting, etc.).

  • Treasurer

    Bangladeshi Student Association

    3 years

    Arts and Culture

    Managed the funding and treasury of the Bangladeshi Student Association, a cultural group at the University of Minnesota.

  • Open Source Developer

    KDE

    2 months

    Education

    Mentored high school students in coding and documentation as part of Google Code-in 2011.

  • Scientific Reviewer

    ISMB, ECCB, RECOMB, JOSS

    Present (7 years 5 months)

    Education

    • Review research papers on applications of machine learning and statistical approaches in computational biology/bioinformatics.
    • Reviewed 10+ research articles for different conferences to date.

  • Analytics and Strategy Planning Chair

    Omdena Bangladesh Chapter

    Present (3 years 2 months)

    Science and Technology

    Promote AI/ML education in Bangladesh and empower Bangladeshi organizations to utilize AI to solve real-world problems.

Publications

  • Dimensionality reduction methods for extracting functional networks from large-scale CRISPR screens. Arshia Hassan, Henry Ward, Mahfuzur Rahman, et al.

    Molecular Systems Biology / EMBOpress

    CRISPR-Cas9 screens facilitate the discovery of gene functional relationships and phenotype-specific dependencies. The Cancer Dependency Map (DepMap) is the largest compendium of whole-genome CRISPR screens aimed at identifying cancer-specific genetic dependencies across human cell lines. A mitochondria-associated bias has been previously reported to mask signals for genes involved in other functions, and thus, methods for normalizing this dominant signal to improve co-essentiality networks are of interest. In this study, we explore three unsupervised dimensionality reduction methods—autoencoders, robust, and classical principal component analyses (PCA)—for normalizing the DepMap to improve functional networks extracted from these data. We propose a novel “onion” normalization technique to combine several normalized data layers into a single network. Benchmarking analyses reveal that robust PCA combined with onion normalization outperforms existing methods for normalizing the DepMap. Our work demonstrates the value of removing low-dimensional signals from the DepMap before constructing functional gene networks and provides generalizable dimensionality reduction-based normalization tools.

  • A method for benchmarking genetic screens reveals a predominant mitochondrial bias. Mahfuzur Rahman*, Maximilian Billmann* et al.

    Molecular Systems Biology

    We present FLEX (Functional evaluation of experimental perturbations), a pipeline that leverages several functional annotation resources to establish reference standards for benchmarking human genome-wide CRISPR screen data and methods for analyzing them. FLEX provides a quantitative measurement of the functional information captured by a given gene-pair dataset and a means to explore the diversity of functions captured by the input dataset. We apply FLEX to analyze data from the diverse cell line screens generated by the DepMap project. We identify a predominant mitochondria-associated signal within co-essentiality networks derived from these data and explore the basis of this signal. Our analysis and time-resolved CRISPR screens in a single cell line suggest that the variable phenotypes associated with mitochondria genes across cells may reflect screen dynamics and protein stability effects rather than genetic dependencies. We characterize this functional bias and demonstrate its relevance for interpreting differential hits in any CRISPR screening context. More generally, we demonstrate the utility of the FLEX pipeline for performing robust comparative evaluations of CRISPR screens or methods for processing them.

  • Environmental robustness of the global yeast genetic interaction network. Michael Costanzo*, Jing Hou*, Vincent Messier*, Justin Nelson*, Mahfuzur Rahman*, et al.

    Science

    Phenotypes associated with genetic variants can be altered by interactions with other genetic variants (GxG), with the environment (GxE), or both (GxGxE). Yeast genetic interactions have been mapped on a global scale, but the environmental influence on the plasticity of genetic networks has not been examined systematically. To assess environmental rewiring of genetic networks, we examined 14 diverse conditions and scored 30,000 functionally representative yeast gene pairs for dynamic, differential interactions. Different conditions revealed novel differential interactions, which often uncovered new functional connections between distantly related gene pairs. However, the majority of observed genetic interactions remained unchanged in different conditions, suggesting that the global yeast genetic interaction network is robust to environmental perturbation and captures the fundamental functional architecture of a eukaryotic cell.

  • τ-SGA: synthetic genetic array analysis for systematically screening and quantifying trigenic interactions in yeast. Elena Kuzmin, Mahfuzur Rahman, et al.

    Nature Protocols

    Systematic complex genetic interaction studies have provided insight into high-order functional redundancies and genetic network wiring of the cell. Here, we describe a method for screening and quantifying trigenic interactions from ordered arrays of yeast strains grown on agar plates as individual colonies. The protocol instructs users on the trigenic synthetic genetic array analysis technique, τ-SGA, for high-throughput screens. The steps describe construction of the double-mutant query strains and the corresponding single-mutant control query strains, which are screened in parallel in two replicates. The screening experimental set-up consists of sequential replica-pinning steps that enable automated mating, meiotic recombination and successive haploid selection steps for the generation of triple mutants, which are scored for colony size as a proxy for fitness, which enables the calculation of trigenic interactions. The procedure described here was used to conduct 422 trigenic interaction screens, which generated ~460,000 yeast triple mutants for trigenic interaction analysis. Users should be familiar with robotic equipment required for high-throughput genetic interaction screens and be proficient at the command line to execute the scoring pipeline. Large-scale screen computational analysis is achieved by using MATLAB pipelines that score raw colony size data to produce τ-SGA interaction scores. Additional recommendations are included for optimizing experimental design and analysis of smaller-scale trigenic interaction screens by using a web-based analysis system, SGAtools. This protocol provides a resource for those who would like to gain a deeper, more practical understanding of trigenic interaction screening and quantification methodology.

  • Systematic mapping of genetic interactions for de novo fatty acid synthesis identifies C12orf49 as a regulator of lipid metabolism

    Nature Metabolism

    The de novo synthesis of fatty acids has emerged as a therapeutic target for various diseases, including cancer. Because cancer cells are intrinsically buffered to combat metabolic stress, it is important to understand how cells may adapt to the loss of de novo fatty acid biosynthesis. Here, we use pooled genome-wide CRISPR screens to systematically map genetic interactions (GIs) in human HAP1 cells carrying a loss-of-function mutation in fatty acid synthase (FASN), whose product catalyses the formation of long-chain fatty acids. FASN-mutant cells show a strong dependence on lipid uptake that is reflected in negative GIs with genes involved in the LDL receptor pathway, vesicle trafficking and protein glycosylation. Further support for these functional relationships is derived from additional GI screens in query cell lines deficient in other genes involved in lipid metabolism, including LDLR, SREBF1, SREBF2 and ACACA. Our GI profiles also identify a potential role for the previously uncharacterized gene C12orf49 (which we call LUR1) in regulation of exogenous lipid uptake through modulation of SREBF2 signalling in response to lipid starvation. Overall, our data highlight the genetic determinants underlying the cellular adaptation associated with loss of de novo fatty acid synthesis and demonstrate the power of systematic GI mapping for uncovering metabolic buffering mechanisms in human cells.

  • Genome-wide identification of quantitative genetic interactions in human cells using CRISPR/Cas9 screens. Maximilian Billmann, Michael Costanzo, A H M Mahfuzur Rahman et al.

    In preparation

    A major focus of systems biology and genomic medicine is to link genotype to phenotype, yet accurately predicting disease states from genome sequence remains a major challenge. Using lessons learned from the model organism yeast, we systematically mapped genome-wide genetic interactions using CRISPR/Cas9 in human cells. We performed 180 genome-wide screens using HAP1 query cell lines carrying loss-of-function mutations in genes in diverse bioprocesses, along with more than 30 screens in wildtype (wt) HAP1 cells. Overall, we screened more than 3 million unique gene pairs for interactions, representing the largest effort to date to study double mutant phenotypes in isogenic human cells. We developed a computational pipeline to identify quantitative genetic interactions (qGI) from these data. We identified several unexpected statistical artifacts in loss-of-function screens including interactions caused by variation between wt HAP1 screens and potential clonal effects of HAP1 cells harboring a loss-of-function mutation. We describe statistical elements of the qGI scoring pipeline designed to normalize these effects and insights we gained about interpreting phenotypes from CRISPR/Cas9 screens in the process. We also describe what we have learned about the topology of negative and positive genetic interactions in human cells, the power of genetic interaction profiles to define gene function across the genome, and their connections to other types of functional relationships, many of which are conserved from yeast to human cells. In summary, we performed a large number of genome-wide CRISPR/Cas9 screens in specific genetic backgrounds and developed a computational pipeline that will guide the generation of a genome-wide reference genetic interaction network in human cells.

  • A genome-wide screen reveals a role for the HIR histone chaperone complex in preventing mislocalization of budding yeast CENP-A

    Genetics

    Centromeric localization of the evolutionarily conserved centromere-specific histone H3 variant CENP-A (Cse4 in yeast) is essential for faithful chromosome segregation. Overexpression and mislocalization of CENP-A lead to chromosome segregation defects in yeast, flies, and human cells. Overexpression of CENP-A has been observed in human cancers; however, the molecular mechanisms preventing CENP-A mislocalization are not fully understood. Here, we used a genome-wide synthetic genetic array (SGA) to identify gene deletions that exhibit synthetic dosage lethality (SDL) when Cse4 is overexpressed. Deletion for genes encoding the replication-independent histone chaperone HIR complex (HIR1, HIR2, HIR3, HPC2) and a Cse4-specific E3 ubiquitin ligase, PSH1, showed highest SDL. We defined a role for Hir2 in proteolysis of Cse4 that prevents mislocalization of Cse4 to noncentromeric regions for genome stability. Hir2 interacts with Cse4 in vivo, and hir2∆ strains exhibit defects in Cse4 proteolysis and stabilization of chromatin-bound Cse4. Mislocalization of Cse4 to noncentromeric regions with a preferential enrichment at promoter regions was observed in hir2∆ strains. We determined that Hir2 facilitates the interaction of Cse4 with Psh1, and that defects in Psh1-mediated proteolysis contribute to increased Cse4 stability and mislocalization of Cse4 in the hir2∆ strain. In summary, our genome-wide screen provides insights into pathways that regulate proteolysis of Cse4 and defines a novel role for the HIR complex in preventing mislocalization of Cse4 by facilitating proteolysis of Cse4, thereby promoting genome stability.

  • Evaluation and design of genome-wide CRISPR/SpCas9 knockout screens

    G3: Genes, Genomes, Genetics

    The adaptation of CRISPR/SpCas9 technology to mammalian cell lines is transforming the study of human functional genomics. Pooled libraries of CRISPR guide RNAs (gRNAs) targeting human protein-coding genes and encoded in viral vectors have been used to systematically create gene knockouts in a variety of human cancer and immortalized cell lines, in an effort to identify whether these knockouts cause cellular fitness defects. Previous work has shown that CRISPR screens are more sensitive and specific than pooled-library shRNA screens in similar assays, but currently there exists significant variability across CRISPR library designs and experimental protocols. In this study, we reanalyze 17 genome-scale knockout screens in human cell lines from three research groups, using three different genome-scale gRNA libraries. Using the Bayesian Analysis of Gene Essentiality algorithm to identify essential genes, we refine and expand our previously defined set of human core essential genes from 360 to 684 genes. We use this expanded set of reference core essential genes, CEG2, plus empirical data from six CRISPR knockout screens to guide the design of a sequence-optimized gRNA library, the Toronto KnockOut version 3.0 (TKOv3) library. We then demonstrate the high effectiveness of the library relative to reference sets of essential and nonessential genes, as well as other screens using similar approaches. The optimized TKOv3 library, combined with the CEG2 reference set, provide an efficient, highly optimized platform for performing and assessing gene knockout screens in human cell lines.

  • Effective Sparse Dynamic Programming Algorithms for Merged and Block Merged LCS Problems

    Journal of Computers

    The longest common subsequence problem has been widely studied and used to find out the relationship between sequences. In this paper, we study the interleaving relationship between sequences. Given a target sequence T and two merging sequences A and B, we need to find out the LCS between M(A, B) and T, where M(A, B) denotes the merging sequence of A and B. We first present an O((Rr + Pm) log log r) time algorithm, where |T| = n, |A| = m, |B| = r, R is the total number of ordered pairs of positions at which the two strings A and T match, and P denotes the total number of ordered pairs of positions at which the two strings B and T match. We also propose an algorithm to solve a variation of the problem where a block constraint arises. The running time of the blocked version is O(max{Rβ log log r, Pα log log r}), where α denotes the number of blocks in A and β denotes the number of blocks in B.

    Other authors
    • Dr. Mohammad Sohel Rahman

Courses

  • Advanced Genetics and Genomics

    GCD 8131

  • Algorithms

    CSE 207

  • Artificial Intelligence

    CSE 401

  • Bayesian Statistics: From Concept to Data Analysis

    Coursera

  • Biostatistics I

    PUBH 6450

  • Computational Techniques for Genomics

    CSCI 5481

  • Data Structures

    CSE 203

  • Functional Genomics, Systems Biology, and Bioinformatics

    CSCI 5461

  • Introduction to Data Engineering

    DataCamp

  • Introduction to Data Mining

    CSCI 5523

  • Introduction to Machine Learning

    CSCI 5521

  • Machine Learning Crash Course

    Google

  • Matrix, Vectors, Fourier Analysis and Laplace Transforms

    MATH 243

  • Molecular Cell Biology

    GCD 5036

  • Python for Genomic Data Science

    Coursera

  • Writing in English at University

    Coursera

Projects

  • A reference network of human genetic interactions

    Present

    • Quantified reproducibility of genetic interactions (generated by collaborators) and systematically evaluated biological importance of genetic interaction scores.
    • Calculated statistical enrichment of biological modules (i.e. protein complexes) in the network.
    • Predicted a set of ~1600 essential genes for human HAP1 cell lines using Random Forest.
    • Collaborated in a cross-disciplinary setting resulting in two publications (one in Nature Metabolism) and four more in preparation.

    TECHNICAL SKILLS
    • R (ggplot2, caret, ranger), Python (PyMC3, scikit-learn)
    • Hierarchical clustering (Cluster 3.0), MCL (Markov Clustering), PCA, random forest classifier, class imbalance, SMOTE, cross-validation, interpretable machine learning, probabilistic modeling, Markov chain Monte Carlo (MCMC; Gibbs within Metropolis)

  • Benchmarking and visual interpretation of genome-wide experimental screens

    • Devised and implemented an R package named FLEX (also available in MATLAB) for systematic performance evaluation of competing methods to score experimental screens.
    • Upgraded FLEX for automatic data visualization and extensibility with additional evaluation datasets (currently supports Complex, Pathway, and Gene Ontology).
    • Generated biological insights upon application of FLEX on the state-of-the-art DepMap CRISPR screens.

    TECHNICAL SKILLS
    • Data cleaning, machine learning, PCA, performance comparison, precision-recall, similarity metrics, local regression (loess), class imbalance, data visualization.
    • R (Bioconductor), MATLAB (bioinformatics toolbox and machine learning toolbox)
    • CRISPR, genome-wide screens, Cancer biology.

  • Yeast Genetic Interaction (GI) Network

    • Maintained a yeast GI scoring pipeline (> 10,000 lines of MATLAB and Python code) and applied it to new experimental screens.
    • Enhanced the yeast GI scoring pipeline to solve new biological problems.
    • Collaborated across 4 labs on 3 projects, resulting in 2 publications (one in Nature Protocols) and one more in peer review (at Science).

    TECHNICAL SKILLS
    • MATLAB, Synthetic Genetic Array (SGA), GitHub, Clustering, Treeview

  • Automated photo enhancements for Samsung Cameras

    • Researched and developed several image enhancement algorithms including cartoon effect, night vision effect, and pencil sketch (black and white) effect.
    • Implemented a fully automated non-photorealistic rendering technique for color pencil sketch drawing, making it 5 times more efficient than previous state-of-the-art algorithms.
    • Deployed the solution using Microsoft .NET and MS Azure.

    TECHNICAL SKILLS
    • C++, C#, Microsoft Azure
    • Image processing, Image segmentation, Edge detection

  • Document Layout Recognition (GSOC 2011)

    • Implemented a document layout recognition engine to recognize a variety of document types (PDF, DjVu, epub, etc.) using recursive XY cut-based algorithms.
    • Upgraded the engine for fault tolerance to ill-formed technical documents.
    • Integrated the layout recognition engine with Okular, an open-source document viewer.

    TECHNICAL SKILLS
    • Open source development, C++, Qt, optical character recognition, recursive XY cut

  • Algorithms for Merged LCS problem (Undergraduate Thesis)

    Given two sequences A and B, we created a merged sequence M(A,B) by taking different subsequences from A and B and merging (interleaving) them. We then compared M(A,B) against a third sequence T by taking the longest common subsequence (LCS) of T and M(A,B). The resulting LCS is called the Merged LCS; the Block Merged LCS problem arises when the input sequences A and B are blocked sequences.

    The matches between A and T are denoted R and the matches between B and T are denoted P. Instead of checking all possible characters in a pairwise manner, we kept track of the matches and computed only at the matching positions. When the number of matches is relatively low, the algorithm runs considerably faster than previous state-of-the-art algorithms; its space complexity is also significantly lower than that of competing algorithms in both the blocked and non-blocked cases.

    Other creators
    • Dr. Mohammad Sohel Rahman
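
    To make the Merged LCS definition above concrete, here is a minimal sketch of the plain (non-sparse) dynamic program that computes the Merged LCS length in O(nmr) time. It is not the sparse algorithm from the thesis, which visits only matching positions; the function name merged_lcs_length and the rolling-table layout are assumptions made for this illustration.

```python
def merged_lcs_length(t: str, a: str, b: str) -> int:
    """Length of the longest common subsequence of t and the best interleaving of a and b.

    Illustrative cubic baseline only (hypothetical helper, not the sparse method).
    """
    m, r = len(a), len(b)
    # prev[j][k]: best length using the first i-1 characters of t,
    # the first j characters of a and the first k characters of b.
    prev = [[0] * (r + 1) for _ in range(m + 1)]
    for ch in t:
        cur = [[0] * (r + 1) for _ in range(m + 1)]
        for j in range(m + 1):
            for k in range(r + 1):
                best = prev[j][k]  # skip the current character of t
                if j > 0:
                    best = max(best, cur[j - 1][k])  # skip a[j-1]
                    if ch == a[j - 1]:
                        best = max(best, prev[j - 1][k] + 1)  # match t with a
                if k > 0:
                    best = max(best, cur[j][k - 1])  # skip b[k-1]
                    if ch == b[k - 1]:
                        best = max(best, prev[j][k - 1] + 1)  # match t with b
                cur[j][k] = best
        prev = cur
    return prev[m][r]
```

    For example, with T = "abc", A = "ac" and B = "b", the interleaving "abc" matches T completely, so the function returns 3.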

Organizations

  • International Society of Computational Biology

    Present

    'A scholarly society for advancing understanding of living systems through computation and for communicating scientific advances worldwide.'
