Thursday, 19 May 2011

Principal Component Analysis (PCA) and SAS

Introduction

PCA can be performed using either the PRINCOMP or the FACTOR procedure in SAS.
However, the FACTOR procedure is more flexible, as it can also perform exploratory factor analysis. Because the analysis is performed using the FACTOR procedure, the output will at times refer to factors rather than to principal components (i.e., component 1 will be referred to as FACTOR1 in the output, component 2 as FACTOR2, and so forth). PCA is typically conducted when a large number of variables is to be analyzed.

SAS Programming

 PROC FACTOR DATA=Dummy SIMPLE CORR
      MINEIGEN=1
      SCREE
      OUT=D2 OUTSTAT=stat;
   VAR region PhysicianType clinicsize enrollpot setting healthcov
       racenation;
 RUN;

 
Steps in Conducting Principal Component Analysis
Principal component analysis is normally conducted in a sequence of steps, with subjective decisions being made at many of these steps.

Step 1: Initial Extraction of the Components

In principal component analysis, the number of components extracted is equal to the number of variables being analyzed. Because seven variables are analyzed in the present study, seven components will be extracted. The first component can be expected to account for a fairly large amount of the total variance. Each succeeding component will account for progressively smaller amounts of variance. Although a large number of components may be extracted in this way, only the first few components will be important enough to be retained for interpretation.
Step 2: Determining the Number of “Meaningful” Components to Retain

A. The eigenvalue-one criterion.

In principal component analysis, one of the most commonly used criteria for solving the number-of-components problem is the eigenvalue-one criterion, also known as the Kaiser criterion (Kaiser, 1960). With this approach, we retain and interpret any component with an eigenvalue greater than 1.00.

 Each observed variable contributes one unit of variance to the total variance in the data set. Any component that displays an eigenvalue greater than 1.00 is accounting for a greater amount of variance than had been contributed by one variable. Such a component is therefore accounting for a meaningful amount of variance, and is worthy of being retained.

With the SAS System, the eigenvalue-one criterion can be implemented by including the MINEIGEN=1 option in the PROC FACTOR statement, and not including the NFACT option. The use of MINEIGEN=1 will cause PROC FACTOR to retain any component with an eigenvalue greater than 1.00.
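The same criterion is easy to sketch outside SAS. The following Python snippet (a toy illustration with a made-up 4x4 correlation matrix, not the study data) extracts the eigenvalues of a correlation matrix and keeps those above 1.00, mirroring what MINEIGEN=1 asks PROC FACTOR to do:

```python
import numpy as np

# Hypothetical correlation matrix for four observed variables
# (in a real analysis this would be computed from the data,
#  e.g. the matrix printed by the CORR option of PROC FACTOR).
R = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])

# The eigenvalues of the correlation matrix are the component variances;
# eigvalsh returns them in ascending order, so reverse to descending.
eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Eigenvalue-one (Kaiser) criterion: keep components whose eigenvalue
# exceeds 1.00, which is what MINEIGEN=1 does in PROC FACTOR.
retained = eigenvalues[eigenvalues > 1.0]

print(len(retained))       # number of components retained
print(eigenvalues.sum())   # total variance equals the number of variables
```

Because each standardized variable contributes one unit of variance, the eigenvalues always sum to the number of variables analyzed.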
B. The scree test.

 With the scree test (Cattell, 1966), you plot the eigenvalues associated with each component and look for a “break” between the components with relatively large eigenvalues and those with small eigenvalues. The components that appear before the break are assumed to be meaningful and are retained for rotation; those appearing after the break are assumed to be unimportant and are not retained. Sometimes a scree plot will display several large breaks. When this is the case, you should look for the last big break before the eigenvalues begin to level off. Only the components that appear before this last large break should be retained.
Specifying the SCREE option in the PROC FACTOR statement causes the SAS System to print an eigenvalue plot as part of the output.
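As a rough numerical stand-in for reading the plot, one can examine the drops between successive eigenvalues. The sketch below (Python, with hypothetical eigenvalues for seven components) treats the largest drop as the break; a real scree test remains a visual judgement on the plotted eigenvalues:

```python
import numpy as np

# Hypothetical eigenvalues for seven components, already sorted in
# descending order (their total is 7, one unit of variance per variable).
eigenvalues = np.array([3.2, 2.1, 0.7, 0.6, 0.25, 0.10, 0.05])

# Drop between each eigenvalue and the next; the largest drop is taken
# as the "break", and the components before it are retained.
# (This argmax rule is only a crude approximation of the visual test.)
drops = -np.diff(eigenvalues)
n_retained = int(np.argmax(drops)) + 1

print(n_retained)
```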

C. Proportion of variance

  A third criterion for solving the number-of-components problem involves retaining a component if it accounts for a specified proportion (or percentage) of variance in the data set. This proportion can be calculated with a simple formula:
Proportion = Eigenvalue for the component of interest / Total of the eigenvalues of the correlation matrix
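As a quick illustration of the formula (Python, with hypothetical eigenvalues for seven standardized variables):

```python
import numpy as np

# Hypothetical eigenvalues from a PCA of seven standardized variables;
# their total is 7 because each variable contributes one unit of
# variance to the correlation matrix.
eigenvalues = np.array([2.8, 1.6, 1.1, 0.6, 0.4, 0.3, 0.2])

# Proportion = eigenvalue of the component / total of all eigenvalues
proportions = eigenvalues / eigenvalues.sum()

# The cumulative proportion shows how much variance the first k
# components account for together.
cumulative = np.cumsum(proportions)

print(round(proportions[0], 2))   # first component: 2.8 / 7
print(round(cumulative[2], 2))    # first three components together
```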


Wednesday, 23 March 2011

Sample Size Calculation: The Basics

Important considerations for estimating sample size in clinical trials:
  1. Study design
       Many statistical designs can be used to achieve the study objectives; the most common are the parallel-group design and the crossover design. For calculating sample size, the study design should be explicitly stated in the trial objectives, as each design has a different approach and formula for estimating sample size.

  2. One-sided or two-sided test
     This is another important parameter for sample size estimation, and it follows from the objective of the study. The objective can be equality, non-inferiority, superiority, or equivalence. Equality and equivalence trials are two-sided, whereas non-inferiority and superiority trials are one-sided. Superiority or non-inferiority trials can be conducted only if prior information is available about the test drug on a specific end point.

  3. Primary end point of the study
     The sample size calculation depends on the primary end point of the study. The description of the primary end point should state whether it is discrete, continuous, or time-to-event, since sample size is estimated differently for each. The sample size must also be adjusted if the primary end point involves multiple comparisons.

  4. Expected response to the treatment
     Information about the expected response is usually obtained from previous trials of the test drug. If this information is not available, it may be obtained from previously published literature.

  5. Clinically meaningful difference
     This is one of the most critical and most challenging parameters. The challenge is to define a difference between test and reference that can be considered clinically meaningful. The choice of difference might take account of the severity of the illness being treated (a treatment effect that reduces mortality by one percent might be clinically important, while a treatment effect that reduces transient asthma by 20% may be of little interest). It might also take account of the existence of alternative treatments, and of the treatment's cost and side effects.


  6. Level of significance
     This is usually set at 5%. Sample size is inversely related to the Type I error rate: the smaller the chosen level of significance, the larger the required sample size.

  7. Power of the test
     As per the ICH E9 guideline, power should not be less than 80%. The larger the required power (i.e., the smaller the acceptable Type II error rate), the larger the sample size.

  8. Withdrawals, missing data and loss to follow-up
     Any sample size calculation is based on the total number of subjects needed in the final study. In practice, eligible subjects will not always be willing to take part, and it will be necessary to approach more subjects than are needed in the first instance. Subjects may fail or refuse to give valid responses to particular questions, physical measurements may suffer from technical problems, and in studies involving follow-up (e.g. trials or cohort studies) there will always be some degree of attrition. It may therefore be necessary to calculate the number of subjects that need to be approached in order to achieve the final desired sample size. More formally, suppose a total of N subjects are required in the final study but a proportion q are expected to refuse to participate or to drop out before the study ends. In this case, N / (1 - q) subjects would have to be approached at the outset to ensure that the final sample size is achieved.
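This N / (1 - q) adjustment can be sketched in a couple of lines (Python; the function name is illustrative):

```python
import math

def subjects_to_approach(n_required: int, dropout_rate: float) -> int:
    """Number of subjects to approach so that, after a proportion
    `dropout_rate` (q) refuse to participate or drop out, roughly
    `n_required` (N) evaluable subjects remain: N / (1 - q), rounded up."""
    return math.ceil(n_required / (1.0 - dropout_rate))

# e.g. 200 subjects needed in the final analysis, 15% attrition expected
print(subjects_to_approach(200, 0.15))
```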

General Rules for calculating Sample Size in clinical trials
The rules are as follows:
  1. Level of significance: It is most commonly taken as 5%. The sample size is inversely related to the level of significance, i.e., sample size increases as the level of significance decreases.
  2. Power: For calculating sample size, the power of the test should be 80% or more. Sample size increases as power increases; the higher the power, the lower the chance of missing a real treatment effect.
  3. Clinically meaningful difference: To detect a smaller difference, one needs a larger sample, and vice versa.
  4. The sample size required to demonstrate equivalence is the highest, and that required to demonstrate equality is the lowest.
    Sample size estimation is more challenging for complex designs, such as non-inferiority trials or time-to-event end points. The estimate also needs adjustment to accommodate
    • unplanned interim analysis
    • planned interim analysis and
    • adjustment for covariates.
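To make rules 1 to 3 concrete, here is a sketch of the classical per-arm sample size formula for a two-sided test of equality of two means in a parallel-group design (a textbook normal-approximation formula with equal allocation and a known common SD, not tied to any specific trial):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta: float, sigma: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size for a two-sided test of equality of two means:
    n = 2 * ((z_{1-alpha/2} + z_{1-beta}) * sigma / delta)^2, rounded up,
    where delta is the clinically meaningful difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_beta = NormalDist().inv_cdf(power)           # power requirement
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# A smaller clinically meaningful difference needs a larger sample:
print(n_per_group(delta=0.5, sigma=1.0))
print(n_per_group(delta=0.25, sigma=1.0))  # half the difference, ~4x the n
```

Note how the formula reflects the general rules: shrinking alpha or delta, or raising the power, all increase n.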

Friday, 11 March 2011

Mis-Randomization and Analysis Populations

Introduction
A common question arises when a subject is randomized to the wrong treatment arm: to which analysis population should that subject belong during the statistical analysis? This post tries to answer this question based on widely published papers.
“Intention to treat” is a strategy for the analysis of randomized controlled trials that compares patients in the groups to which they were originally randomly assigned. This is generally interpreted as including all patients, regardless of whether they actually satisfied the entry criteria, the treatment actually received, and subsequent withdrawal or deviation from the protocol.

For example, in a trial comparing active and placebo vaccination there is the potential for placebo vaccine to be incorrectly administered in place of active, but this could not occur outside the trial and so need not be accounted for in estimates of potential efficacy. However, most types of deviations from protocol would continue to occur in routine practice and so should be included in the estimated benefit of a change in treatment policy. Exclusion of subjects and events from the analysis can introduce bias: for example, subjects who do not receive the assigned treatment, receive the wrong treatment assignment, die before treatment is given, do not adhere to or comply with the study protocol, or drop out of the study.

As per ICH E9 the statistical section of the protocol should address anticipated problems prospectively in terms of how these affect the subjects and data to be analyzed. The protocol should also specify procedures aimed at minimizing any anticipated irregularities in study conduct that might impair a satisfactory analysis, including various types of protocol violations, withdrawals and missing values. The protocol should consider ways both to reduce the frequency of such problems and to handle the problems that do occur in the analysis of data. Possible amendments to the way in which the analysis will deal with protocol violations should be identified during the blind review.


The problem of treatment deviation is not an anticipated error in a clinical trial. The frequency and type of protocol violations, missing values, and other problems should be documented in the clinical study report, and their potential influence on the trial results should be described (see ICH E3).
However, this ITT analysis has been criticized because it does not provide a true test of treatment efficacy (the effect of treatment in those who follow the study protocol) but rather of treatment effectiveness (the effect of treatment given to everyone). Thus, other methods have been proposed and used that exclude some subjects and events. For example, the “per protocol” analysis excludes subjects who did not adhere to the protocol. As per ICH E9, treatment deviation is one of the relevant protocol deviations in a trial.
Intention to treat analysis is therefore most suitable for pragmatic trials of effectiveness rather than for explanatory investigations of efficacy.
No method of analysis can completely account for large numbers of study subjects who deviate from the study protocol, resulting in high rates of non-adherence, dropout, or missing data. If non-adherence is anticipated to be a problem in advance of the trial, the study design and the objectives of the study must be reconsidered.

Additional Notes
Pragmatic research asks whether an intervention works under real-life conditions and whether it works in terms that matter to the patient. It is simply concerned with whether the intervention works, not how or why. Pragmatic studies are most useful for deciding what services should be provided but give only limited insight into why interventions do or do not work.
Patient selection for a pragmatic study should reflect routine practice. All patients who might receive the intervention should be studied. Selection criteria should be broad, with exclusions limited to patient groups for whom either the intervention or control are contraindicated. Thus we will know whether the intervention works for patients in general.
 Explanatory research asks whether an intervention works under ideal or selected conditions. It is more concerned with how and why an intervention works. Explanatory studies are valuable for understanding questions of efficacy but are of limited value for telling us whether we should provide a service to a wide variety of patients in a wide variety of circumstances.
For an explanatory study recruitment may be more selective. By excluding patients with co-morbidity or patients with a doubtful diagnosis we can establish whether the intervention works under ideal conditions. However, we will not know how the intervention works in the rather more complex “real-life” setting.



CDISC SDTM Standards (3)

Creating new domain
  1. Confirm that none of the existing published domains will fit the need. A custom domain may be created only if the data are different in nature and do not fit into an existing published domain.
  2. Check the Submission Data Standards in the SDTM IG for the general rules on creating variables.
  3. Look for an existing, relevant domain model to serve as a prototype. If no existing model seems appropriate, choose the general observation class (Interventions, Events, or Findings) that best fits the data by considering the topic of the observation.

General Conventions for SDTM Modeling

  • For dates, Findings domains use xxDTC while Interventions and Events domains use xxSTDTC/xxENDTC.
  • User-defined domains are named using the XX format.
  • Variables are positioned as specified in the SDTM IG.
  • Handling of “Other, specify” situations:
         If only one response, put it into SUPPxx.
         If more than one response, consider the FA domain (if Findings data) and other options.

CDISC SDTM Standards (2)


General Observation Classes (contd)
Events
    Capture planned protocol milestones (such as randomization and study completion) as well as occurrences such as adverse reactions
Findings
   Evaluations/examinations that address specific questions (when in doubt, it’s a finding)
Others
   Special-purpose domains (such as Demographics and Comments), trial design datasets, and relationship datasets
Core Variables :
 
         A required variable is any variable that is basic to the identification of a data record (i.e., it cannot be null).
         An expected variable is any variable necessary to make a record meaningful in the context of a specific domain (the variable should be included); some values may be null.
         Permissible variables should be used as appropriate when collected or derived.

Fundamentals of SDTM
 
The SDTM is built around the concept of observations collected about subjects who participated in a clinical study.
 Each observation can be described by a series of variables, corresponding to a row in a dataset or table.
 Each variable can be classified according to its Role.
 A Role determines the type of information conveyed by the variable about each distinct observation and how it can be used.
Variables can be classified into five major roles:

Identifier variables, such as those that identify the study, subject, domain, and sequence number of the record
Topic variables, which specify the focus of the observation (such as the name of a lab test)
Timing variables, which describe the timing of the observation (such as start date and end date)
Qualifier variables, which include additional illustrative text or numeric values that describe the results or additional traits of the observation (such as units or descriptive adjectives)
Rule variables, which express an algorithm or executable method to define start, end, and branching or looping conditions in the Trial Design model

The set of Qualifier variables can be further categorized into five sub-classes:
         Grouping Qualifiers are used to group together a collection of observations within the same domain. Examples include --CAT and --SCAT.
         Result Qualifiers describe the specific results associated with the topic variable in a Findings dataset. They answer the question raised by the topic variable. Result Qualifiers are --ORRES, --STRESC, and --STRESN.
          Synonym Qualifiers specify an alternative name for a particular variable in an observation. Examples include --MODIFY and --DECOD, which are equivalent terms for a --TRT or --TERM topic variable, and --TEST and --LOINC, which are equivalent terms for a --TESTCD.

         Record Qualifiers define additional attributes of the observation record as a whole (rather than describing a particular variable within a record). Examples include --REASND, AESLIFE, and all other SAE flag variables in the AE domain; AGE, SEX, and RACE in the DM domain; and --BLFL, --POS, --LOC, --SPEC and --NAM in a Findings domain
         Variable Qualifiers are used to further modify or describe a specific variable within an observation and are only meaningful in the context of the variable they qualify. Examples include --ORRESU, --ORNRHI, and --ORNRLO, all of which are Variable Qualifiers of --ORRES; and --DOSU, which is a Variable Qualifier of --DOSE.

Each domain dataset is distinguished by a unique, two-character code that should be used consistently throughout the submission. This code, which is stored in the SDTM variable named DOMAIN, is used in four ways:
  1. as the dataset name,
  2. as the value of the DOMAIN variable in that dataset,
  3. as a prefix for most variable names in that dataset, and
  4. as a value in the RDOMAIN variable in relationship tables.
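These four uses can be illustrated in a few lines (a toy Python sketch using the Adverse Events code "AE"; the values are made up):

```python
# Illustrative sketch of the four uses of the two-character domain code,
# taking the Adverse Events domain ("AE") as the example.
domain = "AE"

dataset_name = domain.lower()           # 1. the dataset name (e.g. ae.xpt)
record = {"DOMAIN": domain,             # 2. value of the DOMAIN variable
          f"{domain}TERM": "HEADACHE"}  # 3. "AE" prefixes most variable names
supp = {"RDOMAIN": domain}              # 4. RDOMAIN in relationship tables

print(dataset_name, sorted(record))
```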


Submission Metadata Model uses seven distinct metadata attributes to be defined for each dataset variable in the metadata definition document:
         The Variable Name (limited to 8 characters for compatibility with the SAS System Transport format)
         A descriptive Variable Label, using up to 40 characters, which should be unique for each variable in the dataset
         The data Type (e.g., whether the variable value is a character or numeric)
         The set of controlled terminology for the value or the presentation format of the variable (Controlled Terms or Format)
         The Origin or source of each variable
         The Role of the variable, which determines how the variable is used in the dataset. Roles are used to represent the categories of variables as Identifier, Topic, Timing, or the five types of Qualifiers. Since these roles are predefined for all domains that follow the general classes, they do not need to be specified by sponsors in their define data definition document.
         Comments or other relevant information about the variable or its data
General Rules

         The Identifier variables, STUDYID, USUBJID, DOMAIN, and --SEQ are required in all domains based on the general observation classes. Other Identifiers may be added as needed.
          Any Timing variables are permissible for use in any submission dataset based on a general observation class except where restricted by specific domain assumptions.

          Any additional Qualifier variables from the same general observation class may be added to a domain model except where restricted by specific domain assumptions.

          The SDTM allows for the inclusion of the sponsor's non-SDTM variables using Supplemental Qualifiers.
         Standard variables must not be renamed or modified.
         As long as no data were collected for Permissible variables, a sponsor is free to drop them.
 

Thursday, 10 March 2011

CDISC SDTM Standards (1)

CDISC (Clinical Data Interchange Standards Consortium) Milestones
  • Founded around 1997
  • Summer 1998 - invited to form a DIA SIAC
  • Feb 2000 - formed an independent non-profit organization
  • Dec 2001 - global presentation
CDISC Mission Statement
To develop and support global, platform-independent data standards that enable information system interoperability to improve medical research and related areas of healthcare
 
What is SDTM?
The Study Data Tabulation Model describes a general abstract model for representing clinical study data that are submitted to regulatory agencies. The CDISC standard provides standard descriptions of the most commonly used domains, with metadata attributes. In July 2004, the SDTM was selected as the standard specification for submitting tabulation data to the FDA.
 
The General Observation Classes
CDISC has classified the domains into four categories: Interventions, Events, Findings, and Others.
 Interventions
Are related to the therapeutic and experimental treatments administered to the subject.